Overview

After a brief introduction, I will cover the steps involved in processing Hi-C data, from raw fastq reads to a contact matrix. We will then jump straight into visualizing the data and exploring features of interest.


Getting Started with Linux

We will use the Linux command line to run most of today's tools. If you are new to Linux, please complete the Intro to Command Line Workshop.


Logging in

We have several servers that you can log in to. For this practical we will use bifx-core3. No matter where you log in, you will have access to the same files and programs. If you are not on the University network, you will need to connect through a VPN first.

There are several ways to log in to our machines. You can use the Terminal app on a Mac, or the equivalent Command Prompt or MobaXTerm on Windows. Log in via X2Go if you want a graphical interface.

To log in via the command line: ssh

Log in with your user credentials for our servers.

If you are using MobaXTerm, an alternative way of logging in to the server is shown in the MobaXTerm demo.

Once you have typed in your password, you should see some welcome text and a prompt that looks something like this:

[USERNAME]@bifx-core3:~$


1. Creating A Web Directory

In order to view files created on the server, we need to create a public_html directory.

After logging in you should be in your $HOME directory. Check with:

pwd

This should show the path of your present working directory, which should be your home directory as you have just logged in. You can return to your home directory at any time using the change directory command with no arguments.

cd

You have permission to create files and directories under your home folder. Let's create some now which we will use later on.

mkdir ~/public_html
mkdir ~/public_html/TMP

Here we have used the absolute path name for each directory using ~/ as a shortcut for your $HOME directory. Nested directories are separated by the forward slash ‘/’ sign.
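
As a side note, the two mkdir calls above can be collapsed into one with the -p option, which creates any missing parent directories and does not complain if a directory already exists. The sketch below runs in a throwaway directory (demo_home is a hypothetical stand-in for your $HOME, not part of the workshop setup), so it is safe to try:

```shell
# demo_home is a hypothetical stand-in for $HOME so nothing real is touched
demo_home=$(mktemp -d)

# -p creates public_html and the nested TMP directory in one step
mkdir -p "$demo_home/public_html/TMP"

# Running it again is harmless: -p does not fail on existing directories
mkdir -p "$demo_home/public_html/TMP"

# List the directories that now exist
ls -d "$demo_home/public_html" "$demo_home/public_html/TMP"
```

On the server itself the equivalent would be mkdir -p ~/public_html/TMP.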

Now that you have created ~/public_html, the contents of this directory are available online with any web browser.

To see it, enter the following URL, changing yourUserName to whatever your username is.

http://bifx-core3.bio.ed.ac.uk/~yourUserName

For some of you, this may be found here instead:

https://bifx-core3.bio.ed.ac.uk/Public/yourUserName/


2. Data

First, make a new directory for this tutorial and move into it. Then link the directory into your public_html folder, as we are going to make everything public in this tutorial.

cd 
mkdir Hi-C_workshop
cd Hi-C_workshop
ln -s $PWD ~/public_html/
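
If you want to confirm what the ln -s step did, the following sketch repeats it in a throwaway directory and inspects the result. It assumes nothing about the server: demo_home is a hypothetical stand-in for your $HOME, and $PWD is quoted in case a path ever contains spaces.

```shell
# Work in a throwaway directory; demo_home is a hypothetical stand-in for $HOME
demo_home=$(mktemp -d)
mkdir -p "$demo_home/public_html" "$demo_home/Hi-C_workshop"
cd "$demo_home/Hi-C_workshop"

# Same command as above: the link takes the name of the directory it points to
ln -s "$PWD" "$demo_home/public_html/"

# readlink prints the target that the symbolic link points to
link_target=$(readlink "$demo_home/public_html/Hi-C_workshop")
echo "$link_target"
```

In your real home directory, ls -l ~/public_html shows the same information, with an arrow from the link name to its target.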

As Hi-C processing is slow, we are using some test data from the Dixon et al. 2012 paper.

These files contain a small subset of the reads (~200,000).

Let’s copy the data into our project folder. We are copying multiple folders and files, so we need the recursive ‘-r’ argument.

cp -r /homes2/drober19/hic_workshop/test_data .

The data is also available here

When you receive data from a sequencing centre, each file should be accompanied by an alphanumeric string known as an md5 checksum. We can think of this as a file’s passport or fingerprint and use it to verify that our data was not corrupted or truncated during download. The md5 checksums for these files are below. Let’s check them now using the md5sum command:

md5 checksum                      filename
94662045313249dec4d74f1eb337ddea  test_data/dixon_2M_2/SRR400264_01_R1.fastq
43fcb38e4c64146a4ff7872176f7b698  test_data/dixon_2M_2/SRR400264_01_R2.fastq
41842bf0dcc77fb9cd4107b103d2588e  test_data/dixon_2M/SRR400264_00_R1.fastq
09db06e20633213743a3081f0a3121a9  test_data/dixon_2M/SRR400264_00_R2.fastq

md5sum test_data/*/* > md5
cat md5 # prints out the contents of md5
# To check the files and md5 sums match at any time
md5sum -c md5
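
To see how md5sum -c behaves when a file has changed, here is a small sketch that you can run without touching the workshop data. It uses a throwaway directory and a tiny made-up file (reads.fastq and reads.md5 are hypothetical names, not part of the test data): the checksum is recorded, verified, and then verified again after the file has been altered.

```shell
# Throwaway demo so nothing in your project folder is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'ACGT\n' > reads.fastq    # hypothetical stand-in for a fastq file

# Record the checksum, then verify it: md5sum -c exits with status 0 on a match
md5sum reads.fastq > reads.md5
md5sum -c reads.md5 && intact_status=OK

# Simulate a truncated download by changing the file, then verify again:
# this time md5sum -c reports a failure and exits with a non-zero status
printf 'ACG\n' > reads.fastq
md5sum -c reads.md5 || corrupt_status=FAILED

echo "before: $intact_status, after: $corrupt_status"
```

Note that a checksum file generated from your own copy only detects changes made after that point. To verify against the checksums supplied with the data, save the supplied lines (checksum, two spaces, then the file path) to a file and pass that file to md5sum -c from the directory that contains test_data.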


HiGlass

We will be using HiGlass later on.

Again, you’ll need a VPN to access the local server if you’re using your own laptop:

http://bifx-core3.bio.ed.ac.uk:32771/app

Or use the official HiGlass site.



Key Aims:

  • Be able to log in to the bifx servers
  • Create a personal web directory
  • Create a project directory for fastq files