Tuesday, November 12, 2013

Difference between Cuffcompare and Cuffmerge



Cufflinks is one of the most commonly used programs for reference-genome-based transcriptome assembly, expression estimation, and differential expression analysis. Among a few other utilities, Cufflinks comes with two supplementary programs for the post-Cufflinks workflow: Cuffcompare and Cuffmerge.

Although Cuffcompare and Cuffmerge may seem to perform the same task when it comes to handling multiple transcriptome assemblies, there are still substantial differences between the two. The Cufflinks manual, the Tuxedo pipeline (Bowtie-TopHat-Cufflinks-Cuffdiff) paper, and forum posts from the developers have pointed out these differences, but I still get a lot of questions about them. I will try to explain the differences to make them clearer.

Cuffcompare and Cuffmerge are both used to combine multiple transcript assemblies, but in slightly different ways.
Cuffcompare takes all the transcripts from multiple assemblies (in GTF format) and creates a union of them in which redundant transcripts are removed. Cuffcompare does not change any assembled transcript in any of the assemblies; it simply compares the coordinates of the transcripts.
The resulting file, "combined.gtf", contains a set of "unified" transcripts across all the assemblies. The "combined.gtf" file can be used as the reference GTF file for quantification across the samples using Cuffdiff (another program in the Cufflinks toolkit).
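To make the "union without modification" idea concrete, here is a minimal Python sketch. It is a toy illustration, not Cuffcompare's actual algorithm: the function names and the data layout are hypothetical, and I assume that two transcripts count as redundant only when they share chromosome, strand, and an identical intron chain (the real program applies more nuanced matching rules).

```python
# Toy illustration only: transcripts are reduced to tuples of exon coordinates.
# All names here are hypothetical and not part of the Cufflinks toolkit.
from typing import Dict, List, Tuple

Exons = Tuple[Tuple[int, int], ...]
Transcript = Tuple[str, str, Exons]  # (chromosome, strand, exons)


def intron_chain(exons: Exons) -> Tuple[Tuple[int, int], ...]:
    """Return the introns as the gaps between consecutive exons."""
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def union_of_assemblies(assemblies: Dict[str, List[Transcript]]):
    """Collapse transcripts that share chromosome, strand and intron chain.

    Returns a dict mapping each structure key to the list of samples in which
    that structure was seen. No input coordinates are altered.
    """
    combined = {}
    for sample, transcripts in assemblies.items():
        for chrom, strand, exons in transcripts:
            chain = intron_chain(exons)
            if chain:
                key = (chrom, strand, "spliced", chain)
            else:
                # Single-exon transcripts have no introns; key them on their span.
                key = (chrom, strand, "single_exon", exons)
            seen_in = combined.setdefault(key, [])
            if sample not in seen_in:
                seen_in.append(sample)
    return combined


assemblies = {
    "sample1": [("chr1", "+", ((100, 200), (300, 400)))],
    "sample2": [
        ("chr1", "+", ((120, 200), (300, 450))),  # same intron, different ends
        ("chr1", "+", ((100, 200), (500, 600))),  # different splicing
    ],
}
combined = union_of_assemblies(assemblies)
for key, samples in combined.items():
    print(key, samples)
```

In this made-up input, the first transcript of each sample collapses into one entry because both share the same intron, while the differently spliced transcript stays separate, so the union holds two structures.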

Cuffmerge, on the other hand, creates a "merged" set of transcripts from multiple assemblies. During this merging, transcripts from all the assemblies (GTF files) are converted to representative reads in SAM format, and Cufflinks (the original assembly program) is run internally to see whether there are any gaps that can be filled and whether a longer consensus transcript can be created. Basically, Cuffmerge merges transcripts that overlap and share a similar exon structure (or splicing structure) to generate a longer chain of connected exons.
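The outcome of such a merge can be sketched in a few lines of Python. This is only a toy model of the result, not of how Cuffmerge works internally (as described above, it re-runs Cufflinks on SAM-converted transcripts). The names are hypothetical, and I assume a simple compatibility rule: two transcripts on the same chromosome and strand can be merged if they overlap and agree on every intron that touches the shared region.

```python
# Toy model of merging compatible, overlapping transcripts.
# Names and the compatibility rule are assumptions made for illustration.
from typing import List, Tuple

Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons (exons sorted by start)."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def compatible(a: List[Exon], b: List[Exon]) -> bool:
    """True if a and b overlap and agree on all introns touching the shared span."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][1], b[-1][1])
    if lo > hi:
        return False  # the two transcripts do not overlap at all

    def touching(exons: List[Exon]) -> List[Tuple[int, int]]:
        return [i for i in introns(exons) if i[0] < hi and i[1] > lo]

    return touching(a) == touching(b)


def merge(a: List[Exon], b: List[Exon]) -> List[Exon]:
    """Union the exons of two compatible transcripts into one longer chain."""
    merged: List[Exon] = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlapping exons are fused into a single, longer exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


left = [(100, 200), (300, 400)]
right = [(350, 500), (600, 700)]       # overlaps the last exon of `left`
alt_spliced = [(100, 200), (250, 400)]  # uses a different intron than `left`

if compatible(left, right):
    print(merge(left, right))
print(compatible(left, alt_spliced))
```

Here `left` and `right` are chained into one longer transcript because they overlap without disagreeing on any splice junction, whereas `alt_spliced` is kept apart from `left` because its intron differs.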


Overall, Cuffcompare generates a non-redundant set of transcripts, while Cuffmerge generates a consensus assembly from a set of multiple assemblies. So from Cuffmerge you get a cleaner, somewhat more complete assembly and, generally, fewer assembled transcripts than you get from Cuffcompare.

Additional note:
Cuffcompare is a more comprehensive program than simply a tool to combine assemblies. For example, the ".tracking" file generated by Cuffcompare contains information about how many samples each transcript was present in, so that you can get an idea of the multiplicity (recurrence) of each transcript across multiple samples. Cuffcompare can also annotate your transcript assemblies using a reference annotation file (in GTF format), assigning the reference transcript ID (such as an Ensembl ID) and gene symbol to the assembled transcripts.
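As a rough illustration of the multiplicity idea, the Python sketch below counts in how many samples each transcript appears. It assumes the ".tracking" file is tab-delimited, that the first four columns describe the transcript (ID, locus, reference match, class code), and that every following column stands for one sample, holding "-" when the transcript is absent from that sample. Check this layout against your own file before relying on it. The function name and the two example rows are made up.

```python
# Sketch: count in how many samples each transcript occurs in a .tracking file.
# Assumed layout (verify against your own file): tab-delimited, four leading
# descriptive columns, then one column per sample with "-" meaning "absent".
import csv
import io


def samples_per_transcript(handle):
    """Map each transcript ID (column 1) to the number of samples containing it."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        transcript_id = row[0]
        counts[transcript_id] = sum(1 for col in row[4:] if col.strip() != "-")
    return counts


# Two made-up rows standing in for a real file with two samples.
example = io.StringIO(
    "TCONS_00000001\tXLOC_000001\tGeneA|tx1\t=\t"
    "q1:g1|t1|100|10.0|8.0|12.0|20.0|1500\tq2:g7|t9|100|5.0|4.0|6.0|11.0|1500\n"
    "TCONS_00000002\tXLOC_000001\t-\tu\t"
    "q1:g1|t2|100|3.0|2.0|4.0|6.0|900\t-\n"
)
counts = samples_per_transcript(example)
print(counts)
```

For a real file, replace the `io.StringIO` object with `open("your_prefix.tracking")`, where the file name is whatever prefix you gave Cuffcompare.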



4 comments:

  1. Hi, Vinay Mittal. I'm very happy to have found your blog and to learn so much about RNA-seq from it. I have a question about Cuffcompare: I think the tmap output should contain one row per input transcript ID. However, I find that my tmap file seems to have been filtered and is missing some transcript IDs. Have you met this situation, and do you know why? Thank you!

  2. Some of the small fragments that are unlikely to represent a functional transcript are removed by Cuffcompare. This is why you are seeing transcripts missing from the tmap file. I think you can tell Cuffcompare not to remove any transcripts during the filtering process.

  3. My understanding was that Cuffmerge also runs Cuffcompare automatically. The Cufflinks manual says, "Cufflinks includes a script called cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts." Does this always occur automatically, and is there a way to run Cuffmerge without running Cuffcompare in the background?
    http://cufflinks.cbcb.umd.edu/manual.html

  4. Thanks Vinay for sharing your knowledge about Cufflinks.

    Best wishes,
    Amit
