Monday, January 28, 2013

Re-ordering a BAM file using Picard tools

Some programs, such as RNA-SeQC (a quality control program for NGS data), require BAM (alignment) files to be sorted in the same order as the reference genome file. For example, if the reference genome file that was used to create the alignments lists its chromosomes in the order chr1, chr21, chr4, chr5, chr6, ... then the BAM file must be sorted in exactly that order.
To do that, Picard tools has a program called "ReorderSam", but this program requires something called a "sequence dictionary file" generated from the reference genome file.
This dictionary file can be created using another Picard tools utility called "CreateSequenceDictionary". The problem I found is that its manual doesn't say exactly what the output file name and format should be.

In order to run "ReorderSam", a reference sequence dictionary file with the extension ".dict" must be present in the same directory as the reference genome sequence file, and it must have the same base name. So if your genome file is "hg19.fa", the dictionary file should be "hg19.dict".
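
The naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of Picard itself: the helper names, the "picard.jar" location, and the single-jar invocation style are my assumptions (some Picard releases ship one jar per tool instead, e.g. CreateSequenceDictionary.jar), so adjust the command to match your installation.

```python
from pathlib import Path


def dict_path_for(fasta):
    """Return the .dict path that sits next to the FASTA file.

    Only the last extension is swapped, so "hg19.fa" becomes "hg19.dict".
    """
    return Path(fasta).with_suffix(".dict")


def create_dictionary_command(fasta, picard_jar="picard.jar"):
    """Build (but do not run) a CreateSequenceDictionary command line.

    "picard_jar" is a hypothetical location; point it at your own copy.
    """
    fasta = Path(fasta)
    return [
        "java", "-jar", str(picard_jar), "CreateSequenceDictionary",
        f"REFERENCE={fasta}",
        f"OUTPUT={dict_path_for(fasta)}",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(" ".join(create_dictionary_command("/data/genomes/hg19.fa")))
```

Writing OUTPUT explicitly as the ".dict" path next to the FASTA file avoids any guessing about where the dictionary ends up.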

When you run "ReorderSam", give the full path to the reference genome FASTA file (not the dictionary file) as the value of the "REFERENCE" parameter. "ReorderSam" will look for the matching ".dict" file in the same directory.
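
As a rough sketch of that last step, the Python helper below builds a ReorderSam command line and first checks that the ".dict" file is sitting next to the FASTA file. The function name, the "picard.jar" location, and the single-jar invocation style are my assumptions. The INPUT, OUTPUT and REFERENCE parameters are what the Picard version I used accepts; other releases may expect different parameter names, so check the usage message of your own copy.

```python
from pathlib import Path


def reorder_sam_command(input_bam, output_bam, reference_fasta,
                        picard_jar="picard.jar"):
    """Build (but do not run) a ReorderSam command line.

    Raises FileNotFoundError if the .dict file is not next to the FASTA,
    since ReorderSam looks for it there.
    "picard_jar" is a hypothetical location; point it at your own copy.
    """
    reference_fasta = Path(reference_fasta)
    dictionary = reference_fasta.with_suffix(".dict")
    if not dictionary.is_file():
        raise FileNotFoundError(
            f"Expected sequence dictionary at {dictionary}; "
            "create it with CreateSequenceDictionary first."
        )
    return [
        "java", "-jar", str(picard_jar), "ReorderSam",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        # Pass the FASTA file here, not the .dict file.
        f"REFERENCE={reference_fasta}",
    ]
```

For example, reorder_sam_command("sample.bam", "sample.reordered.bam", "/data/genomes/hg19.fa") returns the argument list only if "/data/genomes/hg19.dict" exists. The list can then be passed to subprocess.run or joined with spaces and pasted into a terminal.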