Individual Course

Principles, Statistical and Computational Tools for Reproducible Data Science

Course Length

8 weeks

3-8 hours per week

Featuring faculty from:

Harvard Faculty of Arts & Sciences

Enroll as Individual

Certificate Price:

$149

Learn skills and tools that support data science and reproducible research.

Today, the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced. This course is therefore for anyone doing data-intensive research. While many of us come from a biomedical background, this course is for a broad audience of data scientists.

To meet the needs of the scientific community, this course will examine the fundamentals of methods and tools for reproducible research. Led by experienced faculty from the Harvard T.H. Chan School of Public Health, you will participate in six modules that will include several case studies that illustrate the significant impact of reproducible research methods on scientific discovery.

This course will appeal to students and professionals in biostatistics, computational biology, bioinformatics, and data science. The course content will blend video lectures, case studies, peer-to-peer engagements, and use of computational tools and platforms (such as R/RStudio and Git/GitHub), culminating in a final presentation of a reproducible research project.

We’ll cover Fundamentals of Reproducible Science; Case Studies; Data Provenance; Statistical Methods for Reproducible Science; Computational Tools for Reproducible Science; and Reproducible Reporting Science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Consider this course a survey of best practices: we’d like to make you aware of pitfalls in reproducible data science, some past failure and success stories, and tools and design patterns that might help make it all easier. But ultimately it’ll be up to you to take the skills you learn from this course and create your own environment in which you can easily carry out reproducible research, and to encourage and integrate with similar environments for your collaborators and colleagues. We look forward to seeing you in this course and the research you do in the future!

The course will be delivered via edX and connect learners around the world.

Self-Guided

EDX

Learning Outcome

Understand a series of concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research.

Understand fundamentals of reproducible science using case studies that illustrate various practices.

Learn about computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows.

  • Learn from Harvard faculty
  • Do it on your own time
  • Get a certificate, add it to your resume
  • Be part of the Harvard Community

Your Instructor

Christine Choirat

Research Associate at Harvard University

Dr. Christine Choirat is a research associate at the Institute for Quantitative Social Science (IQSS) and a research scientist in the Biostatistics Department at the Harvard T.H. Chan School of Public Health. Christine was trained as a statistician and has over twenty journal and conference publications. Her research focuses on quantitative methods applied to economics, management, decision theory, and psychology, with a special emphasis on experimental approaches. She is particularly interested in issues related to governance and ethics.

Your Instructor

Lorenzo Trippa

Associate Professor of Biostatistics at Harvard University

Dr. Lorenzo Trippa is an associate professor of Biostatistics in the Biostatistics Department at the Harvard T.H. Chan School of Public Health. He is interested in the development of adaptive clinical trial designs, and his research includes the study of algorithms and methodologies for the analysis of data generated by adaptive trials. He is also particularly interested in Bayesian nonparametrics, a great source of modeling opportunity in biomedical applications as well as both computational and theoretical problems.

Your Instructor

John Quackenbush

Professor of Computational Biology and Bioinformatics at Harvard University

Dr. John Quackenbush is a professor of Computational Biology and Bioinformatics in the Biostatistics Department at the Harvard T.H. Chan School of Public Health and Dana-Farber Cancer Institute, as well as the director of the Center for Cancer Computational Biology (CCCB). His research group focuses on methods spanning the laboratory to the laptop that are designed to use genomic and computational approaches to reveal the underlying biology. In particular, they have been looking at patterns of gene expression in cancer with the goal of elucidating the networks and pathways that are fundamental in the development and progression of the disease.

Your Instructor

Curtis Huttenhower

Associate Professor of Computational Biology and Bioinformatics at Harvard University

Dr. Curtis Huttenhower is an associate professor of Computational Biology and Bioinformatics in the Biostatistics Department at the Harvard T.H. Chan School of Public Health and director of the Huttenhower Lab. His research focuses on understanding the function of microbial communities, particularly that of the human microbiome in health and disease. This work entails a combination of computational methods development for wrangling large data collections, as well as biological analyses and laboratory experiments to link the microbiome in human populations to specific microbiological mechanisms.

Ways to take this course

Audit or Pursue a Verified Certificate

A Verified Certificate costs $149 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate.

Alternatively, learners can audit the course for free and have access to select course material, activities, tests, and forums. Please note that this track does not offer a certificate for learners who earn a passing grade.
