H-InvDB x AHG DB
H-InvDB_8.3 released on March 26, 2013.

H-InvDB Annotation Topics

UM: Annotation of transcripts that cannot be mapped to the genome (UM transcripts) [Download]

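Definition: Many human mRNA sequences cannot be mapped properly onto the human reference genome sequence. We annotate these transcripts and classify them by the reason they are unmapped (UM). We provide a set of high-confidence UM genes that are supported by evidence other than alignment to the reference genome and that are thought to lie in unsequenced regions. We also provide annotation of chimeric transcripts, which may be useful for finding trans-splicing candidates and for studying genome rearrangements.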

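Three subtopics:

  1. High-confidence UM genes that are supported by evidence other than alignment to the NCBI reference genome and that are thought to lie in unsequenced regions
  2. Chimeric transcripts
  3. Partially mapped transcripts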

1. High-confidence UM genes that are supported by evidence other than alignment to the NCBI reference genome and that are thought to lie in unsequenced regions [Download]
[Download (1)-1] Annotated UM genes supported by evidence other than alignment to the NCBI reference genome

File name: um_dlfile1.txt
Format:
1: HIX
2: Representative HIT
3: Similarity category
4: Definition
5: Supporting evidence (comma separated)
     5-1 Watson genome
     5-2 Venter genome
     5-3 Celera alternative genome (alternative b.36.3)
     5-4 Experimental evidence for translation
     5-5 Supported by two or more transcripts (reproducible)
     5-6 Conserved in another mammalian genome
         5-6-1 chimpanzee
         5-6-2 macaque
         5-6-3 mouse
         5-6-4 rat
6: Position
7: Annotation status (auto or manual)
HIX | Representative HIT | Similarity category | Definition | Supporting evidence (comma separated) | Position | Annotation status (auto or manual)
HIX0060313 | HIT000323848 | 3 | von Willebrand factor, type D domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0041649 | HIT000243877 | 3 | Homeobox domain containing protein. | 5-1,5-5 | - | manual
HIX0060423 | HIT000192492 | 3 | von Willebrand factor, type D domain containing protein. | 5-5 | - | manual
HIX0159179 | HIT000326960 | 2 | Mucin (Fragment). | 5-5 | - | auto
HIX0060216 | HIT000026013 | 3 | Plectin repeat containing protein. | 5-2,5-3,5-5 | - | manual
HIX0080310 | HIT000339687 | 3 | Coatomer, gamma subunit family protein. | 5-2,5-3,5-5,5-6-1,5-6-2,5-6-3 | - | auto
HIX0060229 | HIT000048686 | 4 | Conserved hypothetical protein. | 5-2,5-5 | - | auto
HIX0044052 | HIT000067464 | 3 | Aconitase, mitochondrial-like family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0027940 | HIT000043513 | 5 | Hypothetical protein. | 5-1,5-2,5-5,5-6-1 | - | manual
HIX0055490 | HIT000327117 | 3 | Pyruvate kinase family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0042028 | HIT000099040 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0043567 | HIT000064221 | 3 | Homeobox domain containing protein. | 5-5,5-6-3 | - | manual
HIX0159184 | HIT000332427 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto
HIX0028063 | HIT000046651 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0018463 | HIT000084380 | 3 | Ras family protein. | 5-1,5-3,5-5 | - | manual
HIX0017710 | HIT000013787 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0019573 | HIT000026389 | 3 | Protein of unknown function DUF1193 family protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0042975 | HIT000061896 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual
HIX0080315 | HIT000017555 | 3 | Zinc finger, C2H2-type domain containing protein. | 5-5 | - | manual
HIX0055480 | HIT000327092 | 4 | Conserved hypothetical protein. | 5-5,5-6-3 | - | auto
HIX0041483 | HIT000216695 | 3 | Replication initiation factor family protein. | 5-5 | - | manual
HIX0027949 | HIT000052647 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0159189 | HIT000063765 | 2 | PHD finger protein 2 (GRC5). | 5-5,5-6-3,5-6-4 | - | auto
HIX0097813 | HIT000259351 | 3 | Chlorophyll A-B binding protein family protein. | 5-5 | - | auto
HIX0159191 | HIT000249427 | 2 | Transgelin-2 (SM22-alpha homolog). | 5-2,5-3,5-5 | - | auto
HIX0042029 | HIT000031758 | 5 | Hypothetical protein. | 5-2,5-3,5-5 | - | auto
HIX0159192 | HIT000045138 | 5 | Hypothetical protein. | 5-2,5-5 | - | auto
HIX0061037 | HIT000332435 | 4 | Conserved hypothetical protein. | 5-5 | - | auto
HIX0044950 | HIT000070322 | 3 | Potassium channel, voltage dependent, Kv4.2 family protein. | 5-2,5-3,5-5,5-6-3,5-6-4 | - | auto
HIX0023407 | HIT000015937 | 4 | Conserved hypothetical protein. | 5-5,5-6-1,5-6-2,5-6-4 | - | auto
HIX0080399 | HIT000065455 | 4 | Conserved hypothetical protein. | 5-5 | - | auto
HIX0112563 | HIT000386642 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159194 | HIT000393102 | 4 | Conserved hypothetical protein. | 5-5 | - | auto
HIX0159195 | HIT000332095 | 2 | IGHM protein. | 5-5,5-6-4 | - | auto
HIX0097979 | HIT000392343 | 4 | Conserved hypothetical protein. | 5-2,5-5 | - | manual
HIX0080309 | HIT000026018 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0043236 | HIT000063482 | 3 | Protein of unknown function DUF1234 family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0049630 | HIT000252513 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0054342 | HIT000321000 | 3 | ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0159197 | HIT000191412 | 2 | Cytochrome P450 2A6 (EC 1.14.14.1) (CYPIIA6) (Coumarin 7-hydroxylase) (P450 IIA3) (CYP2A3) (P450(I)). | 5-2,5-3,5-5 | - | auto
HIX0159199 | HIT000191957 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto
HIX0068099 | HIT000195934 | 3 | ATPase, V1 complex, subunit B family protein. | 5-2,5-3,5-5,5-6-1,5-6-2,5-6-4 | - | auto
HIX0159200 | HIT000334029 | 1 | Uncharacterized serine/threonine-protein kinase SgK494 (EC 2.7.11.1) (Sugen kinase 494). Isoform 2. | 5-4,5-5 | - | manual
HIX0037289 | HIT000336562 | 3 | Mammalian taste receptor family protein. | 5-2,5-3,5-5,5-6-1 | - | auto
HIX0113155 | HIT000393254 | 5 | Hypothetical protein. | 5-1,5-5,5-6-1 | - | auto
HIX0019325 | HIT000038191 | 3 | Twist family protein. | 5-2,5-5,5-6-2,5-6-3,5-6-4 | - | manual
HIX0060419 | HIT000074350 | 3 | Glyceraldehyde 3-phosphate dehydrogenase family protein. | 5-5 | - | auto
HIX0060255 | HIT000056150 | 3 | GPCR kinase domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0042074 | HIT000193050 | 3 | Homeobox domain containing protein. | 5-5 | - | manual
HIX0159202 | HIT000250723 | 1 | Dynein light chain 1, cytoplasmic (Dynein light chain LC8-type 1) (8 kDa dynein light chain) (DLC8) (Protein inhibitor of neuronal nitric oxide synthase) (PIN). | 5-4,5-5 | - | manual
HIX0022691 | HIT000047610 | 4 | Conserved hypothetical protein. | 5-3,5-5 | - | auto
HIX0159205 | HIT000022694 | 2 | Carboxyl-ester lipase. | 5-2,5-5 | - | auto
HIX0042026 | HIT000094695 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0051085 | HIT000099581 | 4 | Conserved hypothetical protein. | 5-5,5-6-1 | - | auto
HIX0159206 | HIT000017288 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | auto
HIX0021405 | HIT000079603 | 3 | Zinc finger, C2H2-type domain containing protein. | 5-2,5-3,5-5,5-6-1 | - | auto
HIX0075704 | HIT000332430 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto
HIX0050885 | HIT000037321 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual
HIX0159209 | HIT000026212 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto
HIX0159210 | HIT000195782 | 2 | Mucin-3A precursor (MUC-3A) (Intestinal mucin-3A). | 5-1,5-5 | - | auto
HIX0080322 | HIT000045447 | 3 | RNA polymerase II, heptapeptide repeat, eukaryotic containing protein. | 5-5,5-6-1 | - | manual
HIX0159211 | HIT000332425 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto
HIX0159212 | HIT000392815 | 2 | Imuunoglobulin mu-chain D-J4-region (Fragment). | 5-5,5-6-4 | - | auto
HIX0017615 | HIT000006117 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159213 | HIT000061897 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual
HIX0072430 | HIT000251585 | 4 | Conserved hypothetical protein. | 5-3,5-5 | - | auto
HIX0046983 | HIT000243957 | 3 | Steroid/nuclear receptor, oncofetal protein p65 family protein. | 5-5 | - | manual
HIX0159215 | HIT000324082 | 5 | Hypothetical protein. | 5-1,5-5 | - | manual
HIX0159216 | HIT000332437 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto
HIX0061364 | HIT000011443 | 5 | Hypothetical protein. | 5-1,5-5 | - | manual
HIX0028061 | HIT000046507 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0017743 | HIT000022348 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual
HIX0097994 | HIT000393071 | 4 | Conserved hypothetical protein. | 5-5 | - | auto
HIX0159217 | HIT000090650 | 2 | Plus agglutinin (Fragment). | 5-2,5-3,5-5 | - | auto
HIX0159218 | HIT000250428 | 5 | Hypothetical protein. | 5-2,5-3,5-5 | - | manual
HIX0027993 | HIT000043320 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159219 | HIT000332098 | 2 | Anti-FactorVIII scFv (Fragment). | 5-2,5-3,5-5 | - | auto
HIX0055510 | HIT000327141 | 4 | Conserved hypothetical protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0159220 | HIT000325725 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-2,5-3,5-5,5-6-4 | - | auto
HIX0041484 | HIT000061895 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual
HIX0159222 | HIT000246382 | 2 | Integrin-linked protein kinase (EC 2.7.11.1). | 5-5,5-6-3,5-6-4 | - | auto
HIX0052343 | HIT000215687 | 3 | Gonadotropin, beta chain family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0055478 | HIT000327090 | 4 | Conserved hypothetical protein. | 5-2,5-5,5-6-3,5-6-4 | - | auto
HIX0159228 | HIT000255501 | 2 | ATP synthase protein 8 (ATPase subunit 8) (A6L). | 5-5 | - | auto
HIX0017611 | HIT000018015 | 5 | Hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0159230 | HIT000392895 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-2,5-3,5-5 | - | auto
HIX0028051 | HIT000046129 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159232 | HIT000049837 | 2 | SJCHGC03017 protein. | 5-2,5-5 | - | auto
HIX0159233 | HIT000063098 | 2 | Rheumatoid factor C6 heavy chain (Fragment). | 5-2,5-3,5-5 | - | auto
HIX0159234 | HIT000017517 | 2 | Pherophorin-S precursor. | 5-1,5-2,5-5 | - | manual
HIX0159235 | HIT000191941 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto
HIX0159236 | HIT000332103 | 2 | Imuunoglobulin mu-chain D-J4-region (Fragment). | 5-5,5-6-3 | - | auto
HIX0023340 | HIT000005681 | 3 | Protein of unknown function DUF1630 family protein. | 5-1,5-5,5-6-1 | - | manual
HIX0051547 | HIT000192814 | 3 | Aromatic amino acid hydroxylase family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0159237 | HIT000334596 | 1 | T-complex protein 1 subunit delta (TCP-1-delta) (CCT-delta) (Stimulator of TAR RNA-binding). | 5-4,5-5 | - | manual
HIX0060170 | HIT000018866 | 3 | Sushi/SCR/CCP domain containing protein. | 5-2,5-5,5-6-1 | - | auto
HIX0029968 | HIT000091421 | 3 | SH2 motif domain containing protein. | 5-2,5-3,5-5,5-6-1 | - | auto
HIX0159238 | HIT000072102 | 1 | RNase K. | 5-4,5-5 | - | manual
HIX0028055 | HIT000046236 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159240 | HIT000332431 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | auto
HIX0159242 | HIT000332429 | 4 | Conserved hypothetical protein. | 5-1,5-3,5-5 | - | auto
HIX0159243 | HIT000218979 | 1 | Methionine aminopeptidase 2 (EC 3.4.11.18) (MetAP 2) (MAP 2) (Peptidase M 2) (Initiation factor 2-associated 67 kDa glycoprotein) (p67) (p67eIF2). | 5-4,5-5,5-6-3,5-6-4 | - | manual
HIX0053957 | HIT000222065 | 3 | Isocitrate/isopropylmalate dehydrogenase family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0053189 | HIT000218438 | 3 | ABC transporter-like domain containing protein. | 5-2,5-3,5-5,5-6-3,5-6-4 | - | auto
HIX0018300 | HIT000017407 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5,5-6-1 | - | manual
HIX0159249 | HIT000218478 | 2 | Homeobox-like sequence. Part of tandem repeat (Fragment). | 5-1,5-5 | - | manual
HIX0112556 | HIT000386444 | 3 | TMC family protein. | 5-2,5-3,5-5,5-6-3 | - | auto
HIX0050926 | HIT000091700 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | manual
HIX0159252 | HIT000332438 | 4 | Conserved hypothetical protein. | 5-1,5-2,5-5 | - | auto
HIX0045747 | HIT000076540 | 3 | Ornithine decarboxylase antizyme family protein. | 5-5,5-6-3,5-6-4 | - | auto
HIX0061387 | HIT000013837 | 3 | Homeobox domain containing protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0028276 | HIT000055328 | 5 | Hypothetical protein. | 5-1,5-2,5-3,5-5 | - | manual
HIX0159253 | HIT000191834 | 2 | Mucin-6 precursor (MUC-6) (Gastric mucin-6). | 5-1,5-2,5-5 | - | auto
HIX0028096 | HIT000048362 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5,5-6-1 | - | manual
HIX0159255 | HIT000042835 | 1 | Stomatin-like 3. | 5-4,5-5 | - | manual
HIX0159257 | HIT000220924 | 2 | Immunglobulin heavy chain variable region (Fragment). | 5-5,5-6-3 | - | auto
HIX0159258 | HIT000061898 | 2 | Mucin (Fragment). | 5-1,5-5 | - | manual
HIX0045028 | HIT000071453 | 4 | Conserved hypothetical protein. | 5-1,5-5 | - | manual
HIX0028253 | HIT000053809 | 3 | Krueppel-associated box domain containing protein. | 5-5,5-6-1 | - | auto
HIX0061651 | HIT000045110 | 4 | Conserved hypothetical protein. | 5-2,5-3,5-5 | - | auto
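
The seven-column format of um_dlfile1.txt described above can be read with a few lines of Python. The sketch below is illustrative only: it assumes the downloaded file is tab-delimited with one gene per line (the page states this only for the chimera alignment file), and the names `UMGene`, `parse_um_gene` and `describe_evidence` are hypothetical helpers, not part of H-InvDB. The evidence labels are taken from the format list for column 5.

```python
from dataclasses import dataclass

# Evidence codes as listed in the column-5 description of um_dlfile1.txt.
EVIDENCE_LABELS = {
    "5-1": "Watson genome",
    "5-2": "Venter genome",
    "5-3": "Celera alternative genome",
    "5-4": "Experimental evidence for translation",
    "5-5": "Supported by two or more transcripts",
    "5-6-1": "Conserved in chimpanzee",
    "5-6-2": "Conserved in macaque",
    "5-6-3": "Conserved in mouse",
    "5-6-4": "Conserved in rat",
}


@dataclass
class UMGene:
    hix: str
    hit: str
    similarity_category: str
    definition: str
    evidence: list
    position: str
    status: str


def parse_um_gene(line: str) -> UMGene:
    """Parse one row, ASSUMING seven tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    hix, hit, category, definition, evidence, position, status = fields
    # Column 5 is a comma-separated list of evidence codes.
    codes = [code.strip() for code in evidence.split(",") if code.strip()]
    return UMGene(hix, hit, category, definition, codes, position, status)


def describe_evidence(gene: UMGene) -> list:
    """Translate evidence codes into the labels from the format description."""
    return [EVIDENCE_LABELS.get(code, f"unknown code {code}") for code in gene.evidence]
```

For example, a row carrying the codes 5-5 and 5-6-3 would be reported as supported by two or more transcripts and conserved in mouse.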
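2. Chimeric transcripts
Description: An ordinary mRNA sequence maps to a single location in the genome. A fusion-type mRNA, however, maps to two or more locations that are far apart from each other (for example, the 5' part maps to chromosome 1 and the 3' part maps to chromosome 2).

Several causes of transcript chimerism are possible:
1. Experimental error (cDNA/cDNA rearrangement)
2. Transcription from a site where genome rearrangement has occurred (transposition, translocation, segmental duplication)
3. trans-splicing
We identify these chimeric transcripts by analyzing the alignment results. We then analyze the splice-site motifs at the fusion site of each chimeric transcript and assess whether the fusion boundary agrees with the exon-intron structure of the mapped genes. These data should help classify chimeric transcripts by cause (for example, separating trans-splicing from other chimeric transcripts).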
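
The kind of check described above can be sketched in Python. This is a hypothetical heuristic for illustration, not the criteria H-InvDB actually applies: it relies only on the well-known intron boundary dinucleotides (GT-AG as the canonical pair, GC-AG and AT-AC as minor classes), and the function names and the `matches_exon_boundaries` flag are assumptions introduced here.

```python
def classify_splice_motif(donor: str, acceptor: str) -> str:
    """Classify the dinucleotide pair flanking a fusion boundary."""
    pair = (donor.upper(), acceptor.upper())
    if pair == ("GT", "AG"):
        return "canonical"
    if pair in {("GC", "AG"), ("AT", "AC")}:
        return "minor"
    return "non-canonical"


def is_trans_splicing_candidate(donor: str, acceptor: str,
                                matches_exon_boundaries: bool) -> bool:
    """Illustrative rule (an assumption): a splice-like motif at the fusion
    boundary plus agreement with known exon-intron structure."""
    motif = classify_splice_motif(donor, acceptor)
    return motif != "non-canonical" and matches_exon_boundaries
```

A fusion boundary without any splice-like motif would, under this rule, be left among the other chimeric transcripts, such as those caused by experimental error or genome rearrangement.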
[Download (2)-1] Annotation summary for the chimeric (fusion) transcripts

File name: um_dlfile2.txt
Format:
1. HIT
2. Accession #
3. Internal ID for Chimera alignment
4. Cluster-ID (chimera cluster)
5. Alignment position for the 5’ part (genome coordinate)
6. Alignment position for the 5’ part (transcript coordinate)
7. Alignment position for the 3’ part (genome coordinate)
8. Alignment position for the 3’ part (transcript coordinate)
9. Alignment position for the 3’ part
10. Total identity
11. Total coverage
12. Splice site motif for the fusion boundary
13. Overlapping transcript for the 5’ part
14. Overlapping transcript for the 3’ part
[Download (2)-2] Alignment file for chimeric (fusion) transcripts

File name: um_dlfile3.txt
Header lines:
>tID chim-id tDesc ident cover qstt qend tstt tend qleng
- tID --- H-InvDB's transcript ID (HIT*********)
- chim-id --- internal ID for chimera alignment
- tDesc --- absolute coordinates of the fused genome on the reference genome, in the format chrom1:stt1(strand + or -)end1__chrom2:stt2(strand + or -)end2
- ident --- total identity for the chimeric alignment
- cover --- total coverage for the chimeric alignment
- qstt --- chimera alignment start position (transcript position)
- qend --- chimera alignment end position (transcript position)
- tstt --- chimera alignment start position (fused genome position)
- tend --- chimera alignment end position (fused genome position)
- qleng --- length of the chimera transcript

Each header line starts with ">". All lines are tab-delimited.
The chimeraAln format describes the alignment of a chimeric transcript that maps to multiple locations in the genome. Each set of chimeraAln alignments starts with a header line (a line starting with ">"), contains two or more alignment data lines, and ends with a junction data line.
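
A minimal Python sketch of reading such records follows. It groups every line after a ">" header with that header and parses the ten header fields. Two points are assumptions: the literal layout of tDesc is taken to be like chr1:100(+)200__chr2:300(-)400, and the columns of the alignment and junction lines are not described here, so they are kept as raw field lists. `ChimeraRecord`, `parse_tdesc` and `read_chimera_records` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

HEADER_FIELDS = ["tID", "chim_id", "tDesc", "ident", "cover",
                 "qstt", "qend", "tstt", "tend", "qleng"]

# ASSUMED literal layout of one tDesc segment, e.g. "chr1:100(+)200".
SEGMENT_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)\((?P<strand>[+-])\)(?P<end>\d+)$"
)


@dataclass
class ChimeraRecord:
    header: dict
    segments: list
    body: list = field(default_factory=list)  # alignment and junction lines, unparsed


def parse_tdesc(tdesc: str) -> list:
    """Split a fused-genome description into (chrom, start, strand, end) tuples."""
    segments = []
    for part in tdesc.split("__"):
        match = SEGMENT_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized tDesc segment: {part!r}")
        segments.append((match["chrom"], int(match["start"]),
                         match["strand"], int(match["end"])))
    return segments


def read_chimera_records(lines) -> list:
    """Group tab-delimited lines into records, one per '>' header line."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            header = dict(zip(HEADER_FIELDS, line[1:].split("\t")))
            records.append(ChimeraRecord(header, parse_tdesc(header["tDesc"])))
        elif records:
            records[-1].body.append(line.split("\t"))
    return records
```

Passing an open file handle for um_dlfile3.txt to `read_chimera_records` would return one record per chimeric alignment; a record whose segments name different chromosomes corresponds to the inter-chromosomal case.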
[Download (2)-3] Overlaps between H-InvDB transcripts and chimeric (fusion) transcripts
File name: um_dlfile4.txt
Header lines:
>tID chim-id tDesc
     tID --- H-InvDB's transcript ID (HIT*********)
     chim-id --- internal ID for chimera alignment
     tDesc --- absolute coordinates of the fused genome on the reference genome, in the format chrom1:stt1(strand + or -)end1__chrom2:stt2(strand + or -)end2
The chimeraAln format describes the alignment of a chimeric transcript that maps to multiple locations in the genome. Each set of chimeraAln alignments starts with a header line (a line starting with ">"), contains lines for the representative overlapping transcripts and, if there are any overlaps, overlap status lines.
3. Partially mapped transcripts
[Download (3)] Overlaps between H-InvDB transcripts and partially mapped transcripts
File name: um_dlfile5.txt
Format:
1: HIT
2: overlapping transcripts (comma delimited)
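
As a small illustration of the two-column layout above, the following Python sketch splits each row and builds a reverse index from an overlapping transcript to the partially mapped transcripts that overlap it. It assumes the two columns are separated by a tab, which is not stated for this file; `parse_overlap_line` and `index_by_overlap` are hypothetical helper names.

```python
from collections import defaultdict


def parse_overlap_line(line: str):
    """Return (HIT, [overlapping transcripts]) for one row.

    ASSUMES a tab between the two columns; column 2 is comma-delimited.
    """
    hit, _, overlaps = line.rstrip("\n").partition("\t")
    return hit, [t.strip() for t in overlaps.split(",") if t.strip()]


def index_by_overlap(lines) -> dict:
    """Map each overlapping transcript to the HITs listed against it."""
    index = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        hit, overlaps = parse_overlap_line(line)
        for transcript in overlaps:
            index[transcript].append(hit)
    return dict(index)
```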