Terms and conditions
The information contained in this website is for general information purposes only. Nothing on this website constitutes, or is meant to constitute, advice of any kind.
While the University of Leicester Genetics and Genome Biology department ("the Department") endeavours to keep the information up to date and correct, it makes no representations or warranties, express or implied, about the completeness, accuracy, reliability or availability of the website or the information contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
In no event will "the Department" be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this website.
Through this website you are able to link to other websites which are not under "the Department's" control. "The Department" has no control over the nature, content and availability of those sites. The inclusion of any links does not necessarily imply a recommendation or an endorsement of the views expressed within them.
Every effort is made to keep the website up and running smoothly. However, "the Department" takes no responsibility for, and will not be liable for, the website being temporarily unavailable due to technical issues beyond its control.
Cookie Usage
This site uses "cookies" to track whether you have accepted our terms and conditions, which saves you from having to accept them multiple times, and to monitor the number of visitors to the site and the pages visited. By clicking the accept button you are agreeing to the use of cookies on this website.
Creative Commons licence summary
This work is licensed under a Creative Commons Licence. Under this licence you may share and adapt material, for non-commercial purposes only, provided you credit this website (referencing "the Department" and the current curator) as the source of the information. Any work made using the materials in this website must also be made available under the same terms. A full description of the licence terms and conditions can be found at the link below.
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
Health data research does not have to involve the use of expensive proprietary software. While GDPR does place restrictions on access to and use of health-related data, once suitable legal and ethical approval has been gained, the software below can be used to query the data to gain some useful insights. With the exception of Excel (which most people will have access to), the software has been developed by the open source community and is available for free or for a small suggested donation to fund further development. If you are a member of University of Leicester staff or a student, then additional statistical software is available on university managed computers via the self-service system. Further details regarding what software is available on university computers can be found here for those who are accessing this site from on campus.
Anaconda is an integrated environment underpinned by the Python programming language. It uses plug-in modules (from which the user can select) to provide a variety of tools for tasks such as programming, data manipulation, and the documentation and explanation of code. Some of the plug-ins (such as the Orange data analysis software) are available separately. However, installing them as packages within the Anaconda environment can be done without administrator privileges, which may be required if the packages are installed separately. Anaconda is available to those at the University of Leicester via the software self-service (Mac) or the software centre (Windows).
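As a rough illustration of the kind of data manipulation a Python environment such as Anaconda supports, the sketch below summarises a small table using only the Python standard library. The data, the column names (group, measurement) and the function name mean_by_group are all made up for this example and are not taken from any real data set.

```python
import csv
import io
import statistics

# Hypothetical, made-up data standing in for a small approved extract.
RAW = """group,measurement
A,4.0
A,6.0
B,10.0
B,14.0
"""


def mean_by_group(text):
    """Return {group: mean measurement} for CSV text with those two columns."""
    values = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Collect each measurement under its group label.
        values.setdefault(row["group"], []).append(float(row["measurement"]))
    return {group: statistics.mean(nums) for group, nums in values.items()}


if __name__ == "__main__":
    for group, mean in mean_by_group(RAW).items():
        print(group, mean)
```

In practice the text would be read from a file rather than a string, and the data analysis packages that can be selected within Anaconda offer more convenient ways of doing the same thing.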
With all the talk of "big data" in data science it can be easy to forget the basics. If you are not handling a data set with thousands of lines of data, you may well be able to use Excel to visualise and analyse your data. Excel has expanded its capabilities over the years and now includes a wide variety of functions, including various graphing and statistical offerings. Microsoft have a training section to help you get up to speed with the basics as well as some more advanced features of Excel.
Galaxy is a workspace environment for the processing of raw next-generation sequencing (NGS) data. The interface provides a way of using common NGS analysis tools without the need to learn the command line, and individual tools can also be combined into a pipeline. While there are online servers that run the Galaxy software, it can also be installed locally by selecting the correct version of the Galaxy container for your desired analysis at this link. Note that you will need to install an environment on your computer in which to run the images; the Docker environment is recommended. Alternatively, to install Galaxy natively on your Unix/Linux or macOS computer from the command line, the series of commands (which can simply be copied and pasted into a terminal window) can be found here. Users should note that an independent FTP server may need to be installed on the target system, and that administrator privileges must be set up on the installation of Galaxy in order for settings to be saved.
Gephi takes a more visual approach to data exploration. It allows the user to take a global view of large multifactorial data sets, apply different predefined analyses to visualise data clusters, networks and linkage, and play with algorithm parameters to see in real time how changing them affects the outcome. It is also possible to select data elements in both the table view and the graphical view and then perform other analyses relative to the selected data points. The overview video gives a good idea of the types of analysis that are possible. The website also has a dedicated training page, with some tutorials in French, Spanish and Chinese in addition to English.
While Tulip allows individual algorithms to be applied to the data one at a time, Orange allows users to construct analysis pipelines using a visual interface, in which blocks representing algorithms can be linked together. The parameters for each algorithm can be altered by clicking on its block, and the output of the algorithm is shown by double-clicking on the block. Pipelines are built up by linking blocks together, and specific data can be passed between blocks by selecting it in each analysis block. The training videos are under the documentation section of the website. Informative YouTube videos on how the various "widgets" work, along with a simple explanation of the underlying statistics and algorithms, can be found here.
When a data set becomes too large to handle in Excel, you will need to analyse the data via some form of command-driven process, rather than the more familiar point-and-click approach of Excel. However, this does not necessarily mean you have to dive right in at the deep end and learn programming. R is a programming language that was developed to allow statisticians to analyse data, and RStudio makes using R a little friendlier than hammering out commands on the command line or writing scripts. While it is not completely point and click, it does allow a lot of the functionality of R to be accessed via such an interface. The RStudio website has tutorials that cover the basics of data processing.
Tulip takes a more structured approach to analysis, and allows predefined algorithms to be applied to tabulated data. The parameters of each algorithm can be altered in a panel, similar to the approach taken in more familiar software such as office applications and graphics packages. The tutorial section of the website takes the form of worked examples that lead the user by the hand through several analyses to get them used to Tulip's features.