Salih Ulu Blog


Compare transcriptome assembly to the reference transcriptome (optional) ● Timing <5 min
You can use a utility program included in the Cufflinks suite called Cuffcompare to compare assemblies against a reference transcriptome. Cuffcompare makes it possible to separate new genes from known ones, and new isoforms of known genes from known splice variants. Run Cuffcompare on each of the replicate assemblies as well as the merged transcriptome file:

$ find . -name transcripts.gtf > gtf_out_list.txt

$ cuffcompare -i gtf_out_list.txt -r genes.gtf

$ for i in `find . -name '*.tmap'`; do echo $i; awk 'NR > 1 { s[$3]++ } END { for (j in s) { print j, s[j] }} ' $i; done;

The first command creates a file called gtf_out_list.txt that lists all of the GTF files in the working directory (or its subdirectories). The second command runs Cuffcompare, which compares each assembly GTF in the list to the reference annotation file genes.gtf. Cuffcompare produces a number of output files and statistics, and a full description of its behavior and functionality is out of the scope of this protocol. Please see the Cufflinks manual for more details on Cuffcompare’s output files and their formats. The third command prints a simple table for each assembly that lists how many transcripts in each assembly are complete matches to known transcripts, how many are partial matches and so on.
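To see what the awk step in the third command does, here is a minimal sketch run on a made-up file. The helper name tally_class_codes, the temporary file, and its three rows are all hypothetical; real .tmap files have more columns, but only the third one (the class code) is counted. The sort at the end is an addition that keeps the output order stable.

```shell
#!/bin/sh
# Hypothetical helper: count how often each class code (column 3) occurs
# in one .tmap file, skipping the header line (NR > 1).
tally_class_codes() {
    awk 'NR > 1 { s[$3]++ } END { for (j in s) print j, s[j] }' "$1" | LC_ALL=C sort
}

# Made-up, truncated .tmap content used only for illustration.
tmp=$(mktemp)
printf 'ref_gene_id\tref_id\tclass_code\tcuff_gene_id\n' > "$tmp"
printf 'geneA\ttxA1\t=\tCUFF.1\n' >> "$tmp"
printf 'geneA\ttxA2\tj\tCUFF.1\n' >> "$tmp"
printf 'geneB\ttxB1\t=\tCUFF.2\n' >> "$tmp"

# Prints one "code count" line per class code found in the file.
tally_class_codes "$tmp"
rm -f "$tmp"
```

Quoting "$1" keeps the helper working for paths that contain spaces, which the unquoted $i in the loop above would not handle.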


salih-MacBook-Pro:~ simacpro$ find . -name transcripts.gtf > gtf_out_list.txt

salih-MacBook-Pro:~ simacpro$ cuffcompare -i gtf_out_list.txt -r Bowtie2Index/genome.gtf
You are using Cufflinks v2.2.1, which is the most recent release.

salih-MacBook-Pro:~ simacpro$ find . -name *.tmap | while read file; do echo $file; awk 'NR > 1 { s[$3]++ } END { for (j in s) { print j, s[j] }} ' $file; done
e 21
j 969
= 9433
c 1055
e 20
j 928
= 9553
c 987
e 21
j 917
= 9530
c 978
e 19
j 881
= 9274
c 979
e 19
j 880
o 1
= 9269
c 959
e 22
j 868
= 9315
c 965
salih-MacBook-Pro:~ simacpro$
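
The single-character codes in the tables above are Cuffcompare class codes. As a reading aid, the sketch below attaches a short description to each "code count" line. The helper name describe_class_codes is hypothetical, and the descriptions are paraphrased from memory of the Cufflinks manual, so please check them against the manual before relying on them. The sample input is the first block of counts from the session above.

```shell
#!/bin/sh
# Hypothetical helper: read "code count" lines on stdin and append a short
# description of each Cuffcompare class code (paraphrased, not authoritative).
describe_class_codes() {
    awk '
    BEGIN {
        d["="] = "complete match of intron chain"
        d["c"] = "contained in a reference transcript"
        d["j"] = "potentially novel isoform sharing at least one splice junction"
        d["e"] = "single-exon transfrag overlapping a reference exon and intron (possible pre-mRNA)"
        d["o"] = "generic exonic overlap with a reference transcript"
    }
    {
        # Codes not listed above fall back to a pointer to the manual.
        desc = ($1 in d) ? d[$1] : "see the Cufflinks manual"
        printf "%s\t%s\t%s\n", $1, $2, desc
    }'
}

# Counts copied from the first table in the session above.
printf 'e 21\nj 969\n= 9433\nc 1055\n' | describe_class_codes
```

Only the five codes that appear in this session are covered; Cuffcompare defines several more.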

Categories: Bioinformatics, Tuxedo
