Lately I've been using the cufflinks, cuffcompare, cuffdiff and cummeRbund pipeline (Trapnell et al.) to analyze my RNA-Seq datasets. I annotate the cufflinks-assembled transcripts against the annotated human reference transcripts using cuffcompare. When cuffcompare finds a match between a known (annotated) ORF and an assembled transcript, it attaches the attribute "p_id" (short for protein ID) to that transcript in the output combined.gtf file. This attribute is required for running cummeRbund successfully in the downstream analysis.
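A quick way to see whether a combined.gtf actually carries the attribute is to count the records whose ninth column contains p_id. The snippet below is only a minimal sketch: the function name count_p_id_records is my own, and it assumes the usual tab-separated GTF layout with attributes written as key "value"; pairs.

```python
import re

# Matches an attribute such as: p_id "P123";
P_ID_PATTERN = re.compile(r'\bp_id\s+"([^"]+)"')


def count_p_id_records(gtf_lines):
    """Return (records_with_p_id, total_records) for an iterable of GTF lines.

    Hypothetical helper, not part of the Cufflinks suite.
    """
    with_p_id = 0
    total = 0
    for line in gtf_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has 9 tab-separated columns; attributes are in the last one.
        if len(fields) < 9:
            continue
        total += 1
        if P_ID_PATTERN.search(fields[8]):
            with_p_id += 1
    return with_p_id, total
```

To use it, open the file and pass the handle, for example count_p_id_records(open("combined.gtf")). If the first number comes back as 0, the file has no p_id attributes at all.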
For some reason, cuffcompare was generating output files without the p_id attribute when I ran it on my data. From a discussion on the SEQanswers forum, I figured out that cuffcompare needs to be run with the "-s" argument in order to get the protein IDs in the output file.
The "-s" option directs cuffcompare to look for repeats in the transcripts.
I am not sure whether this is a bug or a logic error in the program, but the trick worked for me.
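For reference, here is roughly what the call looks like when assembled from Python. Treat it as a sketch under assumptions: build_cuffcompare_command is a name I made up, every file path is a placeholder, and as far as I know "-s" expects the path to the genome sequence (a multi-FASTA file or a directory of per-contig FASTA files), "-r" the reference annotation and "-o" the output prefix.

```python
def build_cuffcompare_command(reference_gtf, genome_seq_path, assembled_gtf,
                              out_prefix="cuffcmp"):
    """Build the argument list for a cuffcompare run that includes -s.

    Hypothetical helper; all paths passed in are placeholders.
    """
    return [
        "cuffcompare",
        "-r", reference_gtf,    # reference annotation (GTF)
        "-s", genome_seq_path,  # genome sequence; the option that gave me p_id
        "-o", out_prefix,       # prefix for the output files
        assembled_gtf,          # transcripts assembled by cufflinks
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where cuffcompare is on the PATH. The combined GTF should then be written under the chosen prefix.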