Monday, October 10, 2011

How to tweak the BLAT code

Lately I've been working on long RNA-Seq reads (454 reads) and have been using BLAT (BLAST-like alignment tool) by Jim Kent to map sequencing reads to the reference genome. I like BLAT for this purpose particularly because it stitches together alignment blocks scattered over a span (mainly separated by putative introns) and outputs them as a single "gene-oriented" alignment.
I've been trying to run BLAT on the whole genome at once (all chromosome sequences in one file) rather than running it on each chromosome.
If I run each chromosome separately, at the end of the run I have to merge the BLAT output files from the different chromosomes and sort them on the read ID, because I would like to have all the possible genomic hits for each read together in the file.

Every time I try to run BLAT on the whole genome, it fails (especially for the human genome) and terminates with the error "needHugeMem: Out of huge memory - request size 957189248 bytes". I browsed through the help pages on the UCSC genome website (the host and support website for the BLAT program) but didn't find anything conclusive. Finally, I started to look at the source code of BLAT itself. Here is the trick/tweak to get around the memory allocation problem.

1. Download the source code of BLAT.
2. Unzip the file and enter the directory.
3. Now open the file lib/memalloc.c.
4. Go to line number 76, where "static size_t maxAlloc" is defined.
5. Change 128*8*1024*1024*(sizeof(size_t)/4)*(sizeof(size_t)/4) to
(128*8*1024*1024*(sizeof(size_t)/4)*(sizeof(size_t)/4))*2
6. Save and close the file.
7. Now follow the instructions given in the BLAT README to compile the edited code and create the binaries.

This doubles the maxAlloc limit, so BLAT is able to load a large genome at once.
