Duke HTS Course Tutorial - Introduction to Computing for HTS Experiments

Course Components

In this course, you will learn how to generate and analyze RNAseq data. Roughly, we have the following components:

  • Experimental Design and Statistics: how do we design experiments so that results are easily interpretable and answer the question(s) we are interested in?

  • Analysis of Data: how do we properly analyze experimental data so that results are correct?

  • Computational Procedure: the analysis pipeline (bioinformatics).

Finally, we want to do all of the above in a REPRODUCIBLE fashion.

The analysis pipeline consists of several different applications. Some are written in R, and so require some knowledge of the R programming language (minimal, but some proficiency is needed). Some have components in Python, and some are binaries. Both of these last types of applications require moderate proficiency in the 'bash shell' or 'unix command line'. Therefore, we will cover the following topics:

  • Basic R
  • Basic Unix/Linux commands

Additionally, we will use the bootcamp to reinforce the statistical lecture materials by walking you through some of the examples using R, and we will cover some 'data visualization' techniques that include graphics in R.

All of this will be done within the Jupyter notebook tool, which allows for what is called 'literate programming' and reproducible pipelines.

How this is all set up

Detailed instructions follow, but for clarity here is a summary of what needs to be done.

Materials

The course materials are archived in a git repository for download. They are also available here as a compressed archive file. If you are a git user, clone the repository using:

$ git clone https://gitlab.oit.duke.edu/janice/hts_final_for_distribution.git
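By default, git places the cloned materials in a new directory named after the repository. The following is a minimal sketch of how that name is derived, assuming default git clone behaviour; repo_dir_name is a hypothetical helper and is not part of the course materials.

```shell
# URL from the clone command above.
REPO_URL="https://gitlab.oit.duke.edu/janice/hts_final_for_distribution.git"

# Hypothetical helper: print the directory name that `git clone` creates by
# default, i.e. the last path component of the URL without the .git suffix.
repo_dir_name() {
    basename "$1" .git
}

# Show the directory to look for after cloning.
echo "Course materials will be in: ./$(repo_dir_name "$REPO_URL")"
```

You can choose a different location by giving git clone a second argument with the target directory.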

Docker image

A docker image is a packaged system that can be used to create an isolated computing environment (a container) on your computer or on some other server. Our docker image is stored on Docker Hub and may be downloaded from there with the docker pull command. This image has an Ubuntu 18.04 environment and all of the software necessary to run the HTS pipeline. As such, it is rather large (XX GB).

Docker program

You will need to install the docker program to run the docker image.

Linking the image to files

You will need to have the course materials and any data files on your computer or the server on which you are running the image. You will tell the docker program where these are in the docker run command.
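The link between your files and the image is made with the -v option of docker run, which takes a value of the form HOST_PATH:CONTAINER_PATH. Docker treats a host path that is not absolute as a volume name rather than a directory, so it helps to resolve the directory to an absolute path first. Here is a minimal sketch; volume_arg is a hypothetical helper, and /home/jovyan/work is the container path used in this tutorial's run command.

```shell
# Hypothetical helper: turn a (possibly relative) host directory into the
# HOST_PATH:CONTAINER_PATH value expected by `docker run -v`.
# Fails (non-zero status) if the directory does not exist.
volume_arg() {
    host_dir=$(cd "$1" && pwd) || return 1
    printf '%s:/home/jovyan/work\n' "$host_dir"
}

# Example: map the current directory (assumed here to hold the course materials).
echo "-v $(volume_arg .)"
```

Anything the notebooks write under /home/jovyan/work then appears in that directory on your computer, and survives after the container is removed.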

Detailed Instructions

Download the course materials

Download the materials here. Double-click on the archive file and extract the materials to the desired folder.

Install docker

Follow these links to install docker for your operating system:

  • Windows
  • Mac

For Linux systems, use your package manager to install docker.
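Once the installation has finished, it is worth confirming that the docker program can be found from a terminal before going on. This is a small sketch of such a check; have_cmd is a hypothetical helper name.

```shell
# Report (via exit status) whether a command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the installed docker version, or a reminder if docker is missing.
if have_cmd docker; then
    docker --version
else
    echo "docker not found on PATH; install it before continuing"
fi
```

If the reminder is printed even though you installed docker, opening a new terminal window so that the PATH is reloaded is a reasonable first thing to try.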

Get the docker image

You will need to open a terminal window or command prompt to enter the following commands.

$ docker pull dukehtscourse/jupyter-hts-2019

Start the image running

$ docker run --name hts-course \
    -v YOUR_DIRECTORY_WITH_COURSE_MATERIAL:/home/jovyan/work \
    -d -p 127.0.0.1:9999:8888 \
    -e PASSWORD="YOUR_CHOSEN_NOTEBOOK_PASSWORD" \
    -e NB_UID=1000 \
    -t dukehtscourse/jupyter-hts-2019
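In this command, --name labels the container, -v links your course directory to /home/jovyan/work inside the container, -d runs the container in the background, -p publishes container port 8888 on port 9999 of your computer, each -e sets an environment variable that the image is assumed to read, and -t allocates a terminal. The sketch below fills in the two placeholders from shell variables and prints the resulting command so that you can inspect it before running it. HTS_DIR, HTS_PASSWORD and hts_run_command are hypothetical names chosen for illustration, and the default values are assumptions.

```shell
# Hypothetical variables standing in for the two placeholders in the command.
HTS_DIR="${HTS_DIR:-$HOME/hts_course}"       # assumed location of the course materials
HTS_PASSWORD="${HTS_PASSWORD:-change-me}"    # assumed notebook password

# Print this tutorial's docker run command with the placeholders filled in.
# $1 = host directory with the course materials, $2 = notebook password.
hts_run_command() {
    printf 'docker run --name hts-course -v %s:/home/jovyan/work' "$1"
    printf ' -d -p 127.0.0.1:9999:8888 -e PASSWORD="%s" -e NB_UID=1000' "$2"
    printf ' -t dukehtscourse/jupyter-hts-2019\n'
}

# Dry run: show the command instead of executing it.
hts_run_command "$HTS_DIR" "$HTS_PASSWORD"
```

The printed command can be pasted into the terminal once it looks right. Note that this sketch does not quote the directory, so a path containing spaces would need quoting by hand.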

Run the notebooks

Running on your computer

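Open a browser window and go to the address https://localhost:9999 (the docker run command above maps the notebook's port 8888 inside the container to port 9999 on your computer).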
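The port to use in the browser address is the host side of the -p option given to docker run, whose value has the form [HOST_IP:]HOST_PORT:CONTAINER_PORT. As a rough sketch, the host port can be read off a mapping like this; host_port is a hypothetical helper.

```shell
# Hypothetical helper: print the host port from a docker -p mapping of the
# form [HOST_IP:]HOST_PORT:CONTAINER_PORT.
host_port() {
    without_container=${1%:*}                   # drop the trailing :CONTAINER_PORT
    printf '%s\n' "${without_container##*:}"    # keep what follows the last remaining colon
}

# Mapping used in this tutorial's docker run command.
echo "Notebook is published on host port $(host_port 127.0.0.1:9999:8888)"
```

If you change the mapping in the docker run command, the port in the browser address changes with it.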

Running on a server

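Open a browser window on your computer and go to the address https://<your server url>:9999. Note that the docker run command above publishes the port on 127.0.0.1 only, so the notebook is reachable only from the server itself; to reach it from your computer, either publish the port on all interfaces (for example, -p 9999:8888) or forward the port over SSH.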

Ready to go

You should now see the directory listing in your browser window with all of the course materials.