8. Creating Tracks

Converting BAM files to bigWig

Visually inspecting data via a genome browser is often the first step in any analysis. Has the sequencing worked as expected? Are there noticeable differences between my samples by eye? BAM files are typically large and contain detailed information for every read, thus they are slow to access and view on a genome browser. If we are only interested in read profiles we can convert our BAM files to graphs of sequencing depth per base. The wiggle file format has three columns to represent a genomic interval and a 4th as a score which will represent the number of reads overlapping that region. We will create a compressed version of this format called bigWig for use with genome browsers. To do this we are going to use the deepTools package which has a lot of useful tools particularly for ChIP-seq analysis.

The bamCoverage tool in deepTools has many different options for normalising, smoothing and filtering your BAM files before converting to bigWig and the documentation is worth a read.

cat samples.txt | parallel -j 4 "bamCoverage -bs 1 --normalizeUsing BPM -e 200 -b bwa_out/{}/{}.uniq.bam --outFileName bwa_out/{}/{}.uniq.bw"


9. Scripting

Putting all of your commands into a script is good practice for keeping track of your analyses and for reproducibility. We want to write scripts so they can be used again on different datasets and avoid hardcoding the names of files.

We now have our alignments (BAM) and visualisation files (bigWig) and this is normally a branching point for downstream analyses that quantify and annotate the data.

Here is a pipeline containing everything we have run so far; pipeline.sh

Looking at this pipeline script you’ll notice we are not running parallel every time we run a command, instead we launch the script using parallel and the sample names are passed using the special ‘$1’ bash variable. This is more efficient as the samples will run through each command independently.

Using this we can re-run everything from start to end in one go.

mkdir CS_workshop_tmp #Create a temporary directory
cd CS_workshop_tmp #Move into that directory
cp /homes/library/training/ChIP-seq_workshop/pipeline.sh . # Copy the pipeline into the new directory
cp ../samples.txt . #Copy the samples file
nohup cat samples.txt | parallel -j 4 bash pipeline.sh {} > pipeline.log & #Run the shell script (See Below) 
cd .. #Move back to the main directory

You can keep track of your pipeline by using ps or looking at the log file.

ps f
tail CS_workshop_tmp/pipeline.log # Shows the end of a file

Note any commands run on multiple samples will need be to run separately (after pipeline finished)

multiqc CS_workshop_tmp/fastq -o CS_workshop_tmp/fastq


Tidy Up!

Files are large, disk space is expensive, remove any unwanted or temporary files from your folder. We should always keep the raw data (fastq) and our final processed datasets (BAM, bigWig etc) and the script we used to generate them. SAM files are large and should be removed once converted to BAM.

rm fastq/*trim*fq.gz #Remove trimmed fastq temp files
rm bwa_out/*/*.sam #Remove sam files


Integrative Genomics Viewer (IGV)

IGV (Integrative genomics viewer) is a powerful genome browser from the Broad institute. It is perfect for viewing your sequencing files as loading data is easy, common formats are understood, views are customisable and navigation is quick.

First let’s put all of our visualisation files in one folder.

mkdir visualisation
ln -s $PWD/bwa_out/*/*bw visualisation
ln -s $PWD/bwa_out/*/*uniq.bam* visualisation

It is best to download the desktop app for use on your own machine but if you are using a university managed computer you can access the web app at https://igv.org/app/.

Open IGV and set the genome to sacCer3, then find your visualisation folder online. In the desktop version you can drag and drop files into IGV. If you are using the webapp you will need to download the files you require and open them using the tracks menu.

Open a BAM file and a corresponding bigWig file into IGV. Make sure the .bai index file is in the same folder as the BAM.

First, let’s navigate the chromosome:

Now let’s customise our view:

DESKTOP APP

  1. Select Tracks->Fit Data To Window To automatically set track heights to fill the view
  2. Right click on the Refseq genes track and switch between display modes Collapsed, Expanded and Squished. (Use the horizontal panel divider to adjust height)
  3. Use the right click menu on the bigwig track name to customise the display. Try the following:
  4. Rename the track
  5. Change the track colour
  6. Change the graph type
  7. Change the windowing function
  8. Adjust the data range
  9. Set the scale to Autoscale. This automatically sets the Y-axis to the height of the tallest peak in your view.

WEB APP

  1. Use the cog on the Ensembl genes track to switch between display modes Collapsed, Expanded and Squished.
  2. Customise the bigwig track by trying the following:
  3. Rename the track
  4. Change the track colour
  5. Change the track height
  6. Set a minimum and maximum Y-axis
  7. Set the scale to Autoscale. This automatically sets the Y-axis to the height of the tallest peak in your view.


Challenge:

One BAM vs bigWig

  • Why are the coverage profiles different?

  • Which format is faster to view?

  • Can you identify mismatches in read alignment?

  • What other information do the BAM files give us?

Now let’s load all our bigwig files

  • Can you identify peaks in the ChIP samples

  • Do you see any peaks that are also present in the Input?



Other genome browsers


Key Aims:

  • Create tracks from our data
  • Visualise alignments on a genome browser