I've been trying to run BLAT on the whole genome at once (all chromosome sequences in one file) rather than running it separately for each chromosome.
If I run each chromosome separately, then at the end of the run I have to merge the BLAT output files from the different chromosomes and sort them by read ID, because I would like all the possible genomic hits for each read to sit together in the file.
Every time I try to run BLAT on the whole genome like that, it fails (especially for the human genome) and terminates with the error "needHugeMem: Out of huge memory - request size 957189248 bytes". I browsed through the help pages on the UCSC genome website (the host and support site for the BLAT program) but didn't find anything conclusive. Finally, I started to look at the source code of BLAT itself. Here is the trick/tweak to get around the memory allocation problem.
1. Download the source code of BLAT.
2. Unzip the file and enter the directory.
3. Now open the file lib/memalloc.c in a text editor.
4. Go to line number 76 where "static size_t maxAlloc" is defined.
5. Change the value from 128*8*1024*1024*(sizeof(size_t)/4)*(sizeof(size_t)/4) to
(128*8*1024*1024*(sizeof(size_t)/4)*(sizeof(size_t)/4))*2, which doubles the limit.
6. Save and close the file.
7. Now follow the instructions given in the BLAT README to compile the edited code and build the binaries.
This makes BLAT able to load a large genome all at once.
Wow.. Thanks for the information. I have bookmarked your blog and will follow it.
Nice solution Vinay..!!