SRA (NCBI) stores each sequencing run as a single "sra" or "lite.sra" file. If you want to use data from paired-end sequencing, you will usually want the mates in separate files. When I run the SRA Toolkit's "fastq-dump" utility on paired-end SRA files, I sometimes get only one file in which all the mate pairs are stored together, rather than two or three files.
The solution is to always run fastq-dump with the "--split-3" option. If the experiment is single-end sequencing, only one fastq file will be generated. If it is paired-end sequencing, there may be two or three fastq files.
The two files with the suffixes "_1" and "_2" contain the matched mate-pair reads, whereas the third one (without any suffix) contains all the reads that do not have a mate (or for which SRA couldn't resolve the mate).
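As a rough sketch of the workflow above, the snippet below shows a fastq-dump call with "--split-3" and a small helper that reports what kind of output was produced, based only on the file suffixes described above. The helper name classify_split3_output and the fastq_out directory are made up for illustration, the accession SRR030257 is borrowed from the comments below, and the fastq-dump call is left commented out so the snippet runs without the toolkit installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SRA Toolkit): report which files
# a --split-3 run left behind for one accession in a given directory.
classify_split3_output() {
    acc=$1
    dir=${2:-.}
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        if [ -f "$dir/${acc}.fastq" ]; then
            # Third file without suffix holds reads that have no mate
            echo "paired-end with unmatched reads"
        else
            echo "paired-end"
        fi
    elif [ -f "$dir/${acc}.fastq" ]; then
        echo "single-end"
    else
        echo "no fastq output found"
    fi
}

# Assumed usage; uncomment once the SRA Toolkit is on your PATH:
# fastq-dump --split-3 --outdir fastq_out SRR030257
# classify_split3_output SRR030257 fastq_out
```

Note that a lone file without a suffix is reported as single-end here; for a paired-end run it could also mean that no mates were resolved at all.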
Hope my experiences with NCBI SRA data handling help the readership.
That post saved me a lot of trouble. Thank you!
Very useful, thank you.
Great!! This post was a time and a frustration saver
Thank you very much. Your post helps me a lot!
Very useful!
--split-files should also do the same; it creates as many files as there are reads per spot.
On the NCBI SRA website, there is often no information on whether the runs are single-end or paired-end. I guess it is useful to use the option you mention by default, but does anyone know how to find this information on the SRA website?
Awesome Vinay! Was looking for just this option
Thanks :)
Thanks Vinay :) Was looking for exactly this
Thank you so much.
I have had this problem for months, and I didn't know what I was doing wrong :)
Thank you once more
That is one blog I would like to follow! Thanks!!
Very cool
ReplyDeleteHi On using the following command:
ReplyDelete$ fastq-dump --split-files -A SRR030257.lite.sra
I am not able produce a fastq file. I shows an error. I have set up the environmental variable, but still it does not work. However on using the following command:
$./illumina-dump --table-path ./SRRO30257.sra --outdir foldername -qseq 1
I am able to generate a file which i later convert it to fastq via a perl program !!
Hi Alok,
Do not use the "-A" argument. Try this command instead:
/path_to_folder/fastq-dump --split-3 SRR030257.lite.sra
Thank you soooo much
Thanks a lot.
Thanks a lot
Thanks, this is very helpful!
Thanks a lot. Exactly what I was looking for.
Great tip! It's easy to miss that split-3 argument.
Thank you very much!
Wow man, you saved my life.
I was stuck with fastq-dump generating two incompatible files. When I ran the STAR aligner on them, I got this error:
EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted
In the end I used the --split-3 option and it generated three files: _1.fastq, _2.fastq and .fastq.
I used _1.fastq and _2.fastq and I got a 95% alignment score!!
Again, thank you.
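For anyone hitting the same STAR error, one quick sanity check is to confirm that the two mate files hold the same number of reads before aligning. The sketch below assumes standard four-line FASTQ records; the function names count_reads and check_mates and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count FASTQ records, assuming four lines per read.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Hypothetical helper: succeed only if both mate files hold the same
# number of reads; print the counts either way.
check_mates() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads"
        return 1
    fi
}

# Assumed usage on the files written by --split-3:
# check_mates SRR030257_1.fastq SRR030257_2.fastq
```

Equal counts do not prove that the mates are in the same order, but a mismatch is a cheap early warning that the two files cannot be used as a pair.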
Great to see your post. Some weeks ago I got into the same situation dealing with fastq-dump for paired-end data, and I discovered the "--split-3" argument. I ran some tests with fastq-dump using different parameter configurations. You can check the tests in this Biostars post: https://www.biostars.org/p/213348/#213457.
Greetings.
Thank you for this post. It helped me a lot!