Transcripts with no homology to known protein-coding genes or to InterPro domain-containing genes were collected as candidate non-protein-coding transcripts.
To identify non-coding RNA (ncRNA) genes, we first conducted a sequence homology search against all known RNA genes (collected from various databases) using BLASTN, and then examined supporting functional evidence by human curation. Transcripts with a significant BLASTN hit were classified into two categories, "Identical to known ncRNA" and "Similar to known ncRNA". Additionally, in each category, "short ncRNA" (snRNA, snoRNA, and scaRNA) or "long ncRNA" (other functional ncRNA) was recorded as a comment.
1. Identical to known ncRNA (BLASTN E-value < 10^-10, with functional evidence)
2. Similar to known ncRNA (BLASTN E-value < 10^-10, without functional evidence)
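The E-value filter and the split between the first two categories can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes BLASTN was run with tabular output (`-outfmt 6`, where the E-value is the 11th column), and the function names `best_evalue` and `classify_known` as well as the boolean `has_functional_evidence` (standing in for the human curation step) are hypothetical.

```python
from typing import Iterable, Optional

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def best_evalue(blast_lines: Iterable[str], query_id: str) -> Optional[float]:
    """Return the smallest E-value among hits for query_id, or None if there are no hits.

    Assumes BLAST tabular format (-outfmt 6): tab-separated, E-value in column 11.
    """
    best = None
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != query_id:
            continue
        evalue = float(fields[10])
        if best is None or evalue < best:
            best = evalue
    return best


def classify_known(evalue: Optional[float], has_functional_evidence: bool) -> Optional[str]:
    """Assign category 1 or 2; return None if there is no hit below the cutoff."""
    if evalue is None or evalue >= E_VALUE_CUTOFF:
        return None
    if has_functional_evidence:  # hypothetical flag for the manual curation result
        return "Identical to known ncRNA"
    return "Similar to known ncRNA"
```

A transcript for which `classify_known` returns `None` has no sufficiently similar known RNA gene and would move on to the next step described below.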
For transcripts with no sequence similarity to currently known RNA genes and no putative CDS (coding sequence) whose deduced amino acid sequence is longer than 20 residues, we next conducted discriminant analysis to detect functional ncRNA candidates. Additionally, in the category "Putative ncRNA", "short ncRNA" (snRNA, snoRNA, and scaRNA), "long ncRNA" (other functional ncRNA), or "both long and short ncRNA" was recorded as a comment.
3. Putative ncRNA (functional ncRNA candidates by discriminant analysis)
4. Uncharacterized transcript (non-functional ncRNA candidates)
5. Unclassifiable transcript (possible genomic fragments or partial sequences)
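The five categories above can be read as one decision procedure, sketched below under several assumptions. The text does not specify how the discriminant analysis is computed, how genomic fragments or partial sequences are recognized, or in which order categories 3 to 5 are tested, so the fields `discriminant_functional` and `looks_fragmentary`, and the order of the last three checks, are hypothetical placeholders. Transcripts with no RNA homology but a putative CDS longer than 20 residues are not covered by this section, so the sketch returns `None` for them.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 1e-10  # BLASTN threshold stated in the text
MAX_CDS_AA = 20         # CDS length limit (amino acid residues) stated in the text


@dataclass
class Transcript:
    best_evalue: Optional[float]   # best BLASTN E-value against known RNA genes, None if no hit
    has_functional_evidence: bool  # outcome of human curation
    longest_cds_aa: int            # length of the longest putative CDS, in residues
    discriminant_functional: bool  # hypothetical: discriminant analysis calls it functional
    looks_fragmentary: bool        # hypothetical: flagged as genomic fragment or partial sequence


def classify(t: Transcript) -> Optional[str]:
    # Categories 1 and 2: significant similarity to a known RNA gene.
    if t.best_evalue is not None and t.best_evalue < E_VALUE_CUTOFF:
        if t.has_functional_evidence:
            return "Identical to known ncRNA"
        return "Similar to known ncRNA"
    # A putative CDS longer than 20 residues is outside the scope of this section.
    if t.longest_cds_aa > MAX_CDS_AA:
        return None
    # Assumed order: fragments are set aside before the discriminant result is used.
    if t.looks_fragmentary:
        return "Unclassifiable transcript"
    if t.discriminant_functional:
        return "Putative ncRNA"
    return "Uncharacterized transcript"
```

For example, `classify(Transcript(None, False, 12, True, False))` yields "Putative ncRNA", while the same transcript with `discriminant_functional=False` yields "Uncharacterized transcript".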