Bioinformatics blog: How to extract paired-end reads from SRA files

Monday, February 27, 2012

How to extract paired-end reads from SRA files

NCBI's SRA stores each sequencing run as a single "sra" or "lite.sra" file. For paired-end data you usually want the mates in separate files. When I run the SRA toolkit's "fastq-dump" utility on a paired-end SRA file, I sometimes get a single file containing both mates of every pair rather than two or three files.
The solution is to always run fastq-dump with the "--split-3" option. If the experiment is single-end sequencing, only one fastq file is generated. If it is paired-end sequencing, there may be two or three fastq files.
The two files with the suffixes "_1" and "_2" hold the matched mate-pair reads, whereas the third one (without any suffix) contains the reads that have no mate (or whose mate SRA could not resolve).
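As a rough sketch of the behaviour described above, the shell helper below looks at an output directory after a "fastq-dump --split-3" run and reports which of the three cases occurred. The helper name "classify_split3_output", the directory name "out", and the assumption that the outputs are named "<accession>_1.fastq", "<accession>_2.fastq" and "<accession>.fastq" are illustrative choices, not something the toolkit guarantees for every input name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SRA toolkit): report which files a
# "fastq-dump --split-3" run left behind for one accession.
#   $1 = output directory, $2 = accession / file prefix (assumed naming)
classify_split3_output() {
    dir=$1
    acc=$2
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        if [ -f "$dir/$acc.fastq" ]; then
            # Matched mates plus a third file of reads without a mate
            echo "paired+unpaired"
        else
            echo "paired"
        fi
    elif [ -f "$dir/$acc.fastq" ]; then
        # Only the unsuffixed file exists: single-end run
        echo "single"
    else
        echo "none"
    fi
}

# Example use (requires the SRA toolkit on PATH; file names are assumptions):
#   fastq-dump --split-3 --outdir out SRR030257.sra
#   classify_split3_output out SRR030257
```

Only the "_1" and "_2" files should be passed to a paired-end aligner; the unsuffixed file can be handled separately as single-end reads.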

Hope my experiences with NCBI SRA data handling help the readership.

29 comments:

Kai, September 4, 2012 at 10:48 AM
That post saved me a lot of trouble. Thank you!
Anonymous, September 26, 2012 at 12:36 PM
very useful, thank you.
Johan, November 1, 2012 at 8:38 AM
Great!! This post was a time and a frustration saver
vout, November 17, 2012 at 10:47 AM
Thank you very much. Your post helps me a lot!
Kendomaniac, December 21, 2012 at 9:41 AM
very useful!
Unknown, January 28, 2013 at 5:06 PM
--split-files should do much the same; it creates as many files as there are reads per spot.
Unknown, January 29, 2013 at 11:42 AM
On the NCBI SRA web site there is often no information on whether a run is single-end or paired-end. I guess it is useful to use the option you mention as the default, but does anyone know how to find this information on the SRA web site?
Deepak Purushotham, April 4, 2013 at 12:36 PM
Awesome Vinay! Was looking for just this option

Thanks :)

Deepak Purushotham, April 4, 2013 at 12:38 PM
Thanks Vinay :) Was looking for exactly this
Unknown, April 11, 2013 at 7:13 PM
Thank you so much.
I have had this problem for months, and I didn't know what I was doing wrong :)
Thank you once more
KR, May 2, 2013 at 8:30 PM
That is one blog I would like to follow! Thanks!!
Unknown, August 13, 2013 at 11:46 AM
Very cool
Unknown, August 24, 2013 at 11:30 AM
Hi. Using the following command:
$ fastq-dump --split-files -A SRR030257.lite.sra
I am not able to produce a fastq file; it shows an error. I have set up the environment variable, but it still does not work. However, using the following command:
$ ./illumina-dump --table-path ./SRR030257.sra --outdir foldername -qseq 1
I am able to generate a file, which I later convert to fastq with a Perl program!
Vinay Mittal, August 26, 2013 at 12:26 PM
Hi Alok,
Do not use the "-A" argument. Try this command instead:
/path_to_folder/fastq-dump --split-3 SRR030257.lite.sra
Mads Bak, January 15, 2014 at 9:15 AM
Thank you soooo much
Unknown, January 27, 2014 at 9:10 PM
thanks a lot.
Anonymous, February 4, 2014 at 9:43 PM
Thanks, this is very helpful!
Anonymous, March 6, 2014 at 8:44 AM
Thanks a lot. Exactly what I was looking for.
Keith, March 12, 2014 at 11:33 PM
Great tip! It's easy to miss that --split-3 argument.
Anonymous, May 7, 2014 at 10:45 AM
Thank you very much!
Assom, October 22, 2016 at 3:30 PM
WoooW man, you saved my life.

I was stuck with fastq-dump generating two incompatible files. When I ran the STAR aligner on them, I got this error:

EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted

In the end I used the --split-3 option and it generated 3 files: _1.fastq, _2.fastq and .fastq.

I used _1.fastq and _2.fastq and I got a 95% alignment score!!

Again, thank you.
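
One quick way to catch the mismatch behind that STAR error before aligning is to compare the number of records in the two mate files. The sketch below assumes uncompressed FASTQ with exactly four lines per record; the function names "count_fastq_reads" and "mates_consistent" are hypothetical helpers, not part of the SRA toolkit or STAR.

```shell
#!/bin/sh
# Count FASTQ records in file $1, assuming 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed (exit status 0) only if both mate files hold the same
# number of records. $1 = "_1" file, $2 = "_2" file.
mates_consistent() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}

# Example use (file names are assumptions):
#   if mates_consistent SRR030257_1.fastq SRR030257_2.fastq; then
#       echo "mate files have equal read counts"
#   else
#       echo "mate files differ; re-extract with --split-3" >&2
#   fi
```

Equal counts do not prove that the reads are in matching order, so this is only a cheap first check.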
Unknown, December 3, 2016 at 9:16 PM
Great to see your post. Some weeks ago I ran into the same situation dealing with fastq-dump for paired-end data, and I discovered the "--split-3" argument. I ran some tests with fastq-dump using different parameter configurations. You can check the tests in this Biostars post: https://www.biostars.org/p/213348/#213457.

Greetings.
Unknown, January 9, 2017 at 2:50 AM
Thank you for this post. It helped me a lot!