Description: Many mRNA sequences cannot be aligned correctly to the human reference genome, for several reasons. We annotated these unmapped (UM) mRNA sequences and categorized them by the reason the mapping failed. Here we provide the annotated set of UM genes that are predicted to be transcribed from gap regions or unsequenced regions of the reference genome. Each of these UM genes is supported by at least one line of evidence other than alignment to the human reference genome. We also provide annotation for chimeric transcripts, which may include trans-splicing candidates or fusion transcripts transcribed from a rearranged genome.
This topic includes three sub-topics:
| HIX | Representative HIT | Similarity category | Definition | Supporting evidence (comma-separated) | Position | Annotation status (auto or manual) |
|---|---|---|---|---|---|---|
| HIX0060313 | HIT000323848 | 3 | von Willebrand factor, type D domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0041649 | HIT000243877 | 3 | Homeobox domain containing protein. | 5-1,5-5 | - | manual |
| HIX0060423 | HIT000192492 | 3 | von Willebrand factor, type D domain containing protein. | 5-5 | - | manual |
| HIX0159179 | HIT000326960 | 2 | Mucin (Fragment). | 5-5 | - | auto |
| HIX0060216 | HIT000026013 | 3 | Plectin repeat containing protein. | 5-2,5-3,5-5 | - | manual |
| HIX0080310 | HIT000339687 | 3 | Coatomer, gamma subunit family protein. | 5-2,5-3,5-5,5-6-1,5-6-2,5-6-3 | - | auto |
| HIX0060229 | HIT000048686 | 4 | Conserved hypothetical protein. | 5-2,5-5 | - | auto |
| HIX0044052 | HIT000067464 | 3 | Aconitase, mitochondrial-like family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0027940 | HIT000043513 | 5 | Hypothetical protein. | 5-1,5-2,5-5,5-6-1 | - | manual |
| HIX0055490 | HIT000327117 | 3 | Pyruvate kinase family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0042028 | HIT000099040 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0043567 | HIT000064221 | 3 | Homeobox domain containing protein. | 5-5,5-6-3 | - | manual |
| HIX0159184 | HIT000332427 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto |
| HIX0028063 | HIT000046651 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0018463 | HIT000084380 | 3 | Ras family protein. | 5-1,5-3,5-5 | - | manual |
| HIX0017710 | HIT000013787 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0019573 | HIT000026389 | 3 | Protein of unknown function DUF1193 family protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0042975 | HIT000061896 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0080315 | HIT000017555 | 3 | Zinc finger, C2H2-type domain containing protein. | 5-5 | - | manual |
| HIX0055480 | HIT000327092 | 4 | Conserved hypothetical protein. | 5-5,5-6-3 | - | auto |
| HIX0041483 | HIT000216695 | 3 | Replication initiation factor family protein. | 5-5 | - | manual |
| HIX0027949 | HIT000052647 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0159189 | HIT000063765 | 2 | PHD finger protein 2 (GRC5). | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0097813 | HIT000259351 | 3 | Chlorophyll A-B binding protein family protein. | 5-5 | - | auto |
| HIX0159191 | HIT000249427 | 2 | Transgelin-2 (SM22-alpha homolog). | 5-2,5-3,5-5 | - | auto |
| HIX0042029 | HIT000031758 | 5 | Hypothetical protein. | 5-2,5-3,5-5 | - | auto |
| HIX0159192 | HIT000045138 | 5 | Hypothetical protein. | 5-2,5-5 | - | auto |
| HIX0061037 | HIT000332435 | 4 | Conserved hypothetical protein. | 5-5 | - | auto |
| HIX0044950 | HIT000070322 | 3 | Potassium channel, voltage dependent, Kv4.2 family protein. | 5-2,5-3,5-5,5-6-3,5-6-4 | - | auto |
| HIX0023407 | HIT000015937 | 4 | Conserved hypothetical protein. | 5-5,5-6-1,5-6-2,5-6-4 | - | auto |
| HIX0080399 | HIT000065455 | 4 | Conserved hypothetical protein. | 5-5 | - | auto |
| HIX0112563 | HIT000386642 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159194 | HIT000393102 | 4 | Conserved hypothetical protein. | 5-5 | - | auto |
| HIX0159195 | HIT000332095 | 2 | IGHM protein. | 5-5,5-6-4 | - | auto |
| HIX0097979 | HIT000392343 | 4 | Conserved hypothetical protein. | 5-2,5-5 | - | manual |
| HIX0080309 | HIT000026018 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0043236 | HIT000063482 | 3 | Protein of unknown function DUF1234 family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0049630 | HIT000252513 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0054342 | HIT000321000 | 3 | ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0159197 | HIT000191412 | 2 | Cytochrome P450 2A6 (EC 1.14.14.1) (CYPIIA6) (Coumarin 7-hydroxylase) (P450 IIA3) (CYP2A3) (P450(I)). | 5-2,5-3,5-5 | - | auto |
| HIX0159199 | HIT000191957 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto |
| HIX0068099 | HIT000195934 | 3 | ATPase, V1 complex, subunit B family protein. | 5-2,5-3,5-5,5-6-1,5-6-2,5-6-4 | - | auto |
| HIX0159200 | HIT000334029 | 1 | Uncharacterized serine/threonine-protein kinase SgK494 (EC 2.7.11.1) (Sugen kinase 494). Isoform 2. | 5-4,5-5 | - | manual |
| HIX0037289 | HIT000336562 | 3 | Mammalian taste receptor family protein. | 5-2,5-3,5-5,5-6-1 | - | auto |
| HIX0113155 | HIT000393254 | 5 | Hypothetical protein. | 5-1,5-5,5-6-1 | - | auto |
| HIX0019325 | HIT000038191 | 3 | Twist family protein. | 5-2,5-5,5-6-2,5-6-3,5-6-4 | - | manual |
| HIX0060419 | HIT000074350 | 3 | Glyceraldehyde 3-phosphate dehydrogenase family protein. | 5-5 | - | auto |
| HIX0060255 | HIT000056150 | 3 | GPCR kinase domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0042074 | HIT000193050 | 3 | Homeobox domain containing protein. | 5-5 | - | manual |
| HIX0159202 | HIT000250723 | 1 | Dynein light chain 1, cytoplasmic (Dynein light chain LC8-type 1) (8 kDa dynein light chain) (DLC8) (Protein inhibitor of neuronal nitric oxide synthase) (PIN). | 5-4,5-5 | - | manual |
| HIX0022691 | HIT000047610 | 4 | Conserved hypothetical protein. | 5-3,5-5 | - | auto |
| HIX0159205 | HIT000022694 | 2 | Carboxyl-ester lipase. | 5-2,5-5 | - | auto |
| HIX0042026 | HIT000094695 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0051085 | HIT000099581 | 4 | Conserved hypothetical protein. | 5-5,5-6-1 | - | auto |
| HIX0159206 | HIT000017288 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | auto |
| HIX0021405 | HIT000079603 | 3 | Zinc finger, C2H2-type domain containing protein. | 5-2,5-3,5-5,5-6-1 | - | auto |
| HIX0075704 | HIT000332430 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto |
| HIX0050885 | HIT000037321 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual |
| HIX0159209 | HIT000026212 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto |
| HIX0159210 | HIT000195782 | 2 | Mucin-3A precursor (MUC-3A) (Intestinal mucin-3A). | 5-1,5-5 | - | auto |
| HIX0080322 | HIT000045447 | 3 | RNA polymerase II, heptapeptide repeat, eukaryotic containing protein. | 5-5,5-6-1 | - | manual |
| HIX0159211 | HIT000332425 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto |
| HIX0159212 | HIT000392815 | 2 | Imuunoglobulin mu-chain D-J4-region (Fragment). | 5-5,5-6-4 | - | auto |
| HIX0017615 | HIT000006117 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159213 | HIT000061897 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0072430 | HIT000251585 | 4 | Conserved hypothetical protein. | 5-3,5-5 | - | auto |
| HIX0046983 | HIT000243957 | 3 | Steroid/nuclear receptor, oncofetal protein p65 family protein. | 5-5 | - | manual |
| HIX0159215 | HIT000324082 | 5 | Hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0159216 | HIT000332437 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto |
| HIX0061364 | HIT000011443 | 5 | Hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0028061 | HIT000046507 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0017743 | HIT000022348 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual |
| HIX0097994 | HIT000393071 | 4 | Conserved hypothetical protein. | 5-5 | - | auto |
| HIX0159217 | HIT000090650 | 2 | Plus agglutinin (Fragment). | 5-2,5-3,5-5 | - | auto |
| HIX0159218 | HIT000250428 | 5 | Hypothetical protein. | 5-2,5-3,5-5 | - | manual |
| HIX0027993 | HIT000043320 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159219 | HIT000332098 | 2 | Anti-FactorVIII scFv (Fragment). | 5-2,5-3,5-5 | - | auto |
| HIX0055510 | HIT000327141 | 4 | Conserved hypothetical protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0159220 | HIT000325725 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-2,5-3,5-5,5-6-4 | - | auto |
| HIX0041484 | HIT000061895 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0159222 | HIT000246382 | 2 | Integrin-linked protein kinase (EC 2.7.11.1). | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0052343 | HIT000215687 | 3 | Gonadotropin, beta chain family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0055478 | HIT000327090 | 4 | Conserved hypothetical protein. | 5-2,5-5,5-6-3,5-6-4 | - | auto |
| HIX0159228 | HIT000255501 | 2 | ATP synthase protein 8 (ATPase subunit 8) (A6L). | 5-5 | - | auto |
| HIX0017611 | HIT000018015 | 5 | Hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159230 | HIT000392895 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-2,5-3,5-5 | - | auto |
| HIX0028051 | HIT000046129 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159232 | HIT000049837 | 2 | SJCHGC03017 protein. | 5-2,5-5 | - | auto |
| HIX0159233 | HIT000063098 | 2 | Rheumatoid factor C6 heavy chain (Fragment). | 5-2,5-3,5-5 | - | auto |
| HIX0159234 | HIT000017517 | 2 | Pherophorin-S precursor. | 5-1,5-2,5-5 | - | manual |
| HIX0159235 | HIT000191941 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto |
| HIX0159236 | HIT000332103 | 2 | Imuunoglobulin mu-chain D-J4-region (Fragment). | 5-5,5-6-3 | - | auto |
| HIX0023340 | HIT000005681 | 3 | Protein of unknown function DUF1630 family protein. | 5-1,5-5,5-6-1 | - | manual |
| HIX0051547 | HIT000192814 | 3 | Aromatic amino acid hydroxylase family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0159237 | HIT000334596 | 1 | T-complex protein 1 subunit delta (TCP-1-delta) (CCT-delta) (Stimulator of TAR RNA-binding). | 5-4,5-5 | - | manual |
| HIX0060170 | HIT000018866 | 3 | Sushi/SCR/CCP domain containing protein. | 5-2,5-5,5-6-1 | - | auto |
| HIX0029968 | HIT000091421 | 3 | SH2 motif domain containing protein. | 5-2,5-3,5-5,5-6-1 | - | auto |
| HIX0159238 | HIT000072102 | 1 | RNase K. | 5-4,5-5 | - | manual |
| HIX0028055 | HIT000046236 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159240 | HIT000332431 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto |
| HIX0159242 | HIT000332429 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto |
| HIX0159243 | HIT000218979 | 1 | Methionine aminopeptidase 2 (EC 3.4.11.18) (MetAP 2) (MAP 2) (Peptidase M 2) (Initiation factor 2-associated 67 kDa glycoprotein) (p67) (p67eIF2). | 5-4,5-5,5-6-3,5-6-4 | - | manual |
| HIX0053957 | HIT000222065 | 3 | Isocitrate/isopropylmalate dehydrogenase family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0053189 | HIT000218438 | 3 | ABC transporter-like domain containing protein. | 5-2,5-3,5-5,5-6-3,5-6-4 | - | auto |
| HIX0018300 | HIT000017407 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159249 | HIT000218478 | 2 | Homeobox-like sequence. Part of tandem repeat (Fragment). | 5-1,5-5 | - | manual |
| HIX0112556 | HIT000386444 | 3 | TMC family protein. | 5-2,5-3,5-5,5-6-3 | - | auto |
| HIX0050926 | HIT000091700 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual |
| HIX0159252 | HIT000332438 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-5 | - | auto |
| HIX0045747 | HIT000076540 | 3 | Ornithine decarboxylase antizyme family protein. | 5-5,5-6-3,5-6-4 | - | auto |
| HIX0061387 | HIT000013837 | 3 | Homeobox domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0028276 | HIT000055328 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual |
| HIX0159253 | HIT000191834 | 2 | Mucin-6 precursor (MUC-6) (Gastric mucin-6). | 5-1,5-2,5-5 | - | auto |
| HIX0028096 | HIT000048362 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual |
| HIX0159255 | HIT000042835 | 1 | Stomatin-like 3. | 5-4,5-5 | - | manual |
| HIX0159257 | HIT000220924 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto |
| HIX0159258 | HIT000061898 | 2 | Mucin (Fragment). | 5-1,5-5 | - | manual |
| HIX0045028 | HIT000071453 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual |
| HIX0028253 | HIT000053809 | 3 | Krueppel-associated box domain containing protein. | 5-5,5-6-1 | - | auto |
| HIX0061651 | HIT000045110 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | auto |
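Rows of the table above can be read into records for filtering, for example by similarity category or by evidence code. The following is a minimal sketch, assuming the rows are available as pipe-delimited text exactly as shown; the names `UmGene`, `parse_row`, and `has_evidence` are hypothetical and not part of H-InvDB.

```python
from typing import NamedTuple, Tuple


class UmGene(NamedTuple):
    """One row of the UM gene table (field names are illustrative)."""
    hix: str
    representative_hit: str
    similarity_category: int
    definition: str
    evidence: Tuple[str, ...]
    position: str
    annotation_status: str


def parse_row(line: str) -> UmGene:
    """Split one pipe-delimited table row into its seven columns."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 7:
        raise ValueError(f"expected 7 columns, got {len(cells)}")
    hix, hit, category, definition, evidence, position, status = cells
    # The evidence column is a comma-separated list of codes such as "5-2".
    codes = tuple(code.strip() for code in evidence.split(",") if code.strip())
    return UmGene(hix, hit, int(category), definition, codes, position, status)


def has_evidence(gene: UmGene, code: str) -> bool:
    """Return True if the gene lists the given evidence code."""
    return code in gene.evidence


row = ("| HIX0080310 | HIT000339687 | 3 | Coatomer, gamma subunit family protein. "
       "| 5-2,5-3,5-5,5-6-1,5-6-2,5-6-3 | - | auto |")
gene = parse_row(row)
```

Splitting on the pipe first and on the comma second keeps commas inside the Definition column (as in the row above) from being mistaken for evidence separators.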
Description: A normal mRNA sequence aligns to a single location in the genome. The alignment of a fused mRNA, however, is split into two parts that map to different locations (e.g. the 5’ part of the transcript maps to chromosome 1, but the 3’ part maps to chromosome 2). Several possibilities should be considered for the formation of such a chimeric/fusion transcript:

1. experimental artifact (cDNA/cDNA recombination)
2. transcription from a rearranged genome (translocation, transposition, segmental duplication)
3. trans-splicing

We identified chimeric transcripts among the unmapped mRNAs by examining the alignment results. We further analyzed the splice site motif at the fusion boundary, consistency with the exon/intron structure, overlap with normal transcripts, and so on. These data should help classify the chimeric transcripts by type of fusion (for example, discriminating trans-splicing from the other types).
[Download (2)-1] Annotation summary for the chimeric (fusion) transcripts

File name: um_dlfile2.txt

Format:

1. HIT
2. Accession #
3. Internal ID for Chimera alignment
4. Cluster-ID (chimera cluster)
5. Alignment position for the 5’ part (genome coordinate)
6. Alignment position for the 5’ part (transcript coordinate)
7. Alignment position for the 3’ part (genome coordinate)
8. Alignment position for the 3’ part (transcript coordinate)
9. Alignment position for the 3’ part
10. Total identity
11. Total coverage
12. Splice site motif for the fusion boundary
13. Overlapping transcript for the 5’ part
14. Overlapping transcript for the 3’ part
[Download (2)-2] Alignment file for chimeric (fusion) transcripts

File name: um_dlfile3.txt

Header lines: >tID chim-id tDesc ident cover qstt qend tstt tend qleng

- tID --- H-InvDB's transcript ID (HIT*********)
- chim-id --- internal ID for chimera alignment
- tDesc --- absolute coordinate of the fused genome in the reference genome; format=chrom1:stt1(strand +or-)end1__chrom2:stt2(strand +or-)end2
- ident --- total identity for the chimeric alignment
- cover --- total coverage for the chimeric alignment
- qstt --- chimera alignment start position (transcript position)
- qend --- chimera alignment end position (transcript position)
- tstt --- chimera alignment start position (fused genome position)
- tend --- chimera alignment end position (fused genome position)
- qleng --- length of the chimera transcript

A header line starts with ">". All lines are tab-delimited.

The chimeraAln format describes the alignment of a chimeric transcript that maps to multiple locations in the genome. Each set of chimeraAln alignments starts with a header line (a line beginning with ">"), contains two or more alignment data lines, and ends with a junction data line.
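The header layout described for um_dlfile3.txt can be read with a short parser. This is a minimal sketch, not an official H-InvDB tool: it assumes the ten header fields follow ">" in the listed order separated by tabs, that ident and cover are numeric, and that the positions and length are integers. The names `ChimeraHeader`, `parse_header`, and `iter_records` are hypothetical, and the sample values are invented for illustration. Because the column layout of the alignment and junction data lines is not described here, those lines are kept as raw tab-split lists.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class ChimeraHeader:
    tid: str       # H-InvDB transcript ID
    chim_id: str   # internal ID for the chimera alignment
    tdesc: str     # coordinate description of the fused genome
    ident: float   # total identity (assumed numeric)
    cover: float   # total coverage (assumed numeric)
    qstt: int      # alignment start, transcript position
    qend: int      # alignment end, transcript position
    tstt: int      # alignment start, fused genome position
    tend: int      # alignment end, fused genome position
    qleng: int     # length of the chimera transcript


def parse_header(line: str) -> ChimeraHeader:
    """Parse one header line of the form '>tID<TAB>chim-id<TAB>...'."""
    if not line.startswith(">"):
        raise ValueError("header lines must start with '>'")
    fields = line[1:].rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 header fields, got {len(fields)}")
    tid, chim_id, tdesc, ident, cover = fields[:5]
    qstt, qend, tstt, tend, qleng = (int(value) for value in fields[5:])
    return ChimeraHeader(tid, chim_id, tdesc, float(ident), float(cover),
                         qstt, qend, tstt, tend, qleng)


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[ChimeraHeader, List[List[str]]]]:
    """Yield (header, data lines) for each set that begins with a header line."""
    header = None
    body: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = parse_header(line), []
        elif header is not None:
            body.append(line.split("\t"))
    if header is not None:
        yield header, body


# Invented sample values, for illustration only.
sample_header = "\t".join([">HIT000000001", "chim1", "chr1:100(+)200__chr2:300(-)400",
                           "99.5", "98.0", "1", "500", "1", "500", "500"])
sample_lines = [sample_header, "data\tline\tone", "data\tline\ttwo", "junction\tline"]
records = list(iter_records(sample_lines))
```

To read the real file, pass an open file handle to `iter_records`, since it accepts any iterable of lines.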
[Download (2)-3] Overlaps between H-InvDB transcripts and chimeric (fusion) transcripts

File name: um_dlfile4.txt

Header lines: >tID chim-id tDesc

- tID --- H-InvDB's transcript ID (HIT*********)
- chim-id --- internal ID for chimera alignment
- tDesc --- absolute coordinate of the fused genome in the reference genome; format=chrom1:stt1(strand +or-)end1__chrom2:stt2(strand +or-)end2

The chimeraAln format describes the alignment of a chimeric transcript that maps to multiple locations in the genome. Each set of chimeraAln alignments starts with a header line (a line beginning with ">"), followed by lines for the representative overlapping transcripts and, if there are any overlaps, overlap status lines.
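The tDesc field packs the coordinates of each fused segment into one string. The sketch below splits such a string into segments. It rests on an assumption about the literal rendering: that "chrom1:stt1(strand +or-)end1" appears in the files as, for example, "chr1:100(+)200", with segments joined by a double underscore. Check this against the actual files before relying on it. The names `Segment` and `parse_tdesc` are hypothetical, and the sample string is invented.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    strand: str
    end: int


# Assumed rendering of one segment: chrom:start(strand)end, strand being + or -.
_SEGMENT_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\((?P<strand>[+-])\)(?P<end>\d+)$")


def parse_tdesc(tdesc: str) -> List[Segment]:
    """Split a tDesc string into its genome segments (two or more expected)."""
    segments = []
    for part in tdesc.split("__"):
        match = _SEGMENT_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized tDesc segment: {part!r}")
        segments.append(Segment(match["chrom"], int(match["start"]),
                                match["strand"], int(match["end"])))
    return segments


# Invented sample value, for illustration only.
segments = parse_tdesc("chr1:100(+)200__chr2:300(-)400")
```

The parser accepts any number of segments rather than exactly two, because the format description says a chimeric transcript may map to multiple locations.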