The Statistics and Data Science team plays an integral part in the research of all scientific programmes in the MRC Epidemiology Unit.
Statistics work, led by Stephen Sharp, includes:
- Good Analytical Practice (see below).
- Statistical input into ongoing Unit research work.
- A library of exemplar Stata code for applying specific statistical methods.
- Cambridge Epidemiology and Trials Unit (CETU).
- National Diet and Nutrition Survey (NDNS) rolling programme.
- Quality control (QC), imputation and analysis of big data from genome-wide and omics platforms.
- Collaborations with external statisticians (e.g. MRC Biostatistics Unit) on statistical topics of relevance to Unit research.
- Contributions to the training of statisticians and epidemiologists (e.g. University of Cambridge MPhil in Population Health Sciences).
- Provision of statistical reviews for papers submitted to medical and epidemiological journals.
Data science work, led by Tom Bishop, includes:
- Federated meta-analysis, which enables cross-cohort analyses without physically pooling data from each study (InterConnect, EUCAN-Connect projects).
- Application of novel methods for data acquisition and processing, including web-scraping techniques and deep neural networks.
- Collaboration with external experts (e.g. University of Cambridge Department of Applied Mathematics and Theoretical Physics, Department of Computer Science, Health Data Research UK) on data science issues of relevance to Unit research.
- Development of a Trusted Research Environment (TRE) for the Unit, which allows researchers to access and analyse our data without being able to remove it from the environment.
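The idea behind federated meta-analysis can be illustrated with a minimal sketch. It assumes that each cohort fits the same model locally and shares only a summary estimate and its standard error, which are then combined by standard inverse-variance (fixed-effect) weighting. This is an illustration of the general technique, not the InterConnect or EUCAN-Connect implementation; the class and function names and the cohort figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CohortSummary:
    """Summary statistics shared by one cohort; individual-level data stay local."""
    name: str
    estimate: float   # hypothetical: e.g. a log odds ratio estimated within the cohort
    std_error: float  # standard error of that estimate


def fixed_effect_meta_analysis(summaries):
    """Pool cohort-level estimates with inverse-variance (fixed-effect) weights."""
    weights = [1.0 / s.std_error ** 2 for s in summaries]
    total_weight = sum(weights)
    pooled = sum(w * s.estimate for w, s in zip(weights, summaries)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Illustrative numbers only; no real cohort results are implied.
    cohorts = [
        CohortSummary("cohort_A", 0.20, 0.10),
        CohortSummary("cohort_B", 0.35, 0.20),
    ]
    est, se = fixed_effect_meta_analysis(cohorts)
    print(f"pooled = {est:.3f}, 95% CI = ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

Only the estimate and standard error leave each study, which is what makes pooling possible without physically combining the underlying data. A random-effects model would be the usual alternative where heterogeneity between cohorts is expected.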
The MRC Epidemiology Unit has a Standard Operating Procedure (SOP) for Good Analytical Practice. This SOP applies to all employees, students and visitors affiliated with any of the Unit's research programmes who perform any type of analysis using data and generate outputs for which the MRC Epidemiology Unit has primary responsibility. Examples of outputs include papers, reports, PhD theses, MPhil project reports and conference presentations. The purpose of this SOP is to ensure that all analytical work is clearly justified, accurate, transparent and reproducible.
Topics covered by the SOP include:
- Statistical Analysis Plans.
- Analysis software.
- Analysis programs.
- Datasets.
- Location of analysis work.
- Internal peer review of analysis work.
For further information about the SOP or the work of the Statistics and Data Science team, please email Stephen Sharp (sjs207@cam.ac.uk).