ReadMe | Previous dataset |
H-InvDB_9.0 released on May 27, 2015. (Erratum released on June 18, 2015.) | |
List of H-Invitational IDs |
acc2hinv_id.txt.gz | 2.7M | Text File |
Format: 1: DNA databank accession number, 2: HIT (H-Invitational transcript), 3: HIX (H-Invitational cluster), 4: cDNA data provider |
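The four-column layout above can be read with a few lines of Python. This is a minimal sketch, not an official parser: it assumes the columns are tab-separated (the page does not state the delimiter), and the names `AccMapping`, `parse_acc2hinv_line` and `read_acc2hinv` are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class AccMapping(NamedTuple):
    accession: str  # column 1: DNA databank accession number
    hit: str        # column 2: H-Invitational transcript ID (HIT)
    hix: str        # column 3: H-Invitational cluster ID (HIX)
    provider: str   # column 4: cDNA data provider


def parse_acc2hinv_line(line: str, sep: str = "\t") -> AccMapping:
    """Split one line into the four documented columns.

    ASSUMPTION: the separator is a tab; pass another `sep` if the
    downloaded file turns out to use something else.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    return AccMapping(*fields[:4])


def read_acc2hinv(path: str, sep: str = "\t") -> Iterator[AccMapping]:
    """Yield one AccMapping per non-empty line of the gzipped text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_acc2hinv_line(line, sep)
```

For example, `{m.accession: m.hit for m in read_acc2hinv("acc2hinv_id.txt.gz")}` would build an accession-to-HIT lookup, provided the delimiter assumption holds.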
List of new, deleted and updated H-Invitational IDs |
new_del_update_hinvid.txt | 720K | Text File |
Annotation data sets |
H-InvDB-HIX_9.0 2015. 6 | ||
SET of all H-Inv clusters (Flat File) | 43M | Flat File |
SET of all H-Inv clusters (XML) | 56M | XML File |
Annotation for all H-Inv clusters is provided in both flat file and XML formats. | ||
H-InvDB-HIT_9.0 2015. 6 | ||
SET of all H-Inv transcripts (Flat File) | 614M | Flat File |
SET 1 of all H-Inv transcripts (XML) | 197M | XML File |
SET 2 of all H-Inv transcripts (XML) | 189M | XML File |
SET 3 of all H-Inv transcripts (XML) | 180M | XML File |
SET 4 of all H-Inv transcripts (XML) | 105M | XML File |
SET 5 of all H-Inv transcripts (XML) | 59M | XML File |
NOTES: Annotation for all H-Inv transcripts is provided in both flat file and XML formats. | ||
H-InvDB-HIP_9.0 2015. 6 | ||
SET of all H-Inv proteins (Flat File) | 413M | Flat File |
SET 1 of all H-Inv proteins (XML) | 165M | XML File |
SET 2 of all H-Inv proteins (XML) | 158M | XML File |
SET 3 of all H-Inv proteins (XML) | 116M | XML File |
NOTES: Annotation for all H-Inv proteins is provided in both flat file and XML formats. |
Section data of annotation |
H-InvDB-HIX-Annotation_9.0 2015. 6 | ||
SET of all H-Inv clusters (Flat File) | 35M | Flat File |
The basic annotation section for H-Inv clusters is provided in flat file format. | ||
H-InvDB-HIT-Annotation_9.0 2015. 6 | ||
SET of all H-Inv transcripts (Flat File) | 436M | Flat File |
NOTES: The basic annotation section for H-Inv transcripts is provided in flat file format. | ||
H-InvDB-Expression_8.1 | ||
SET of H-Inv clusters (Flat File) | 7.9M | Flat File |
The section of tissue-specific expression data is provided in flat file format. | ||
H-InvDB-DiseaseInfo_8.0 | ||
SET of H-Inv clusters (Flat File) | 7.5M | Flat File |
The section of disease information is provided in flat file format. | ||
H-InvDB-Evolution_7.1 | ||
SET of all H-Inv transcripts (Flat File) | 24M | Flat File |
The section of molecular evolutionary analysis is provided in flat file format. | ||
H-InvDB-3Dstructure_8.0 | ||
SET of all H-Inv transcripts (Flat File) | 29M | Flat File |
NOTES: The section on 3-D structure prediction for H-Inv transcripts by GTOP (http://sybock.genes.nig.ac.jp/~hinv3/gtop.html) is provided in flat file format. | ||
H-InvDB-Subcellular_9.0 2015. 6 | ||
SET of all H-Inv transcripts (Flat File) | 18M | Flat File |
NOTES: The section on subcellular localization prediction by WoLF PSORT, TargetP, TMHMM and SOSUI is provided in flat file format. |
Sequence data sets |
Nucleotide data sets 2015. 6 | ||
All H-Invitational transcripts (HITs) | 124M | Flat File |
Representative H-Invitational transcripts (HITs) | 31M | Flat File |
Representative alternative variants (RASV) of H-Invitational transcripts (HITs) | 46M | Flat File |
H-Inv2 full-length cDNA dataset (HITs) | 47M | Flat File |
H-Invitational cluster genome sequence (HIXs) | 644M | Flat File |
NOTES:
1) Nucleotide sequences of all H-Inv transcripts (HITs) in FASTA format.
2) Nucleotide sequences of representative H-Inv transcripts (HITs) in FASTA format.
3) Nucleotide sequences of representative alternative splicing variants (RASV) of H-Inv transcripts (HITs) in FASTA format.
4) Nucleotide sequences of all H-Inv cDNAs (HITs) in FASTA format.
5) Nucleotide sequences of the genome for all H-Inv clusters (HIXs) in FASTA format.
Headline> HIT version | HIX version | HIP version | DNA databank accession number | "FS" (frame shift error) if revised, or "NO" for no revision | "IM" (remaining intronic sequence) if revised, or "NO" for no revision | "HC" if the function is human curated, or "AA" if auto-annotated | Frame (+3 to -3) | Position of CDS (start..end) | Definition
Body: cDNA sequences |
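The ten-field headline described in the notes above can be split into named fields as follows. This is a sketch under stated assumptions: it takes the fields to be separated by a literal `|` after the leading `>`, as the headline layout suggests, and the names `HInvHeader`, `parse_hinv_header` and `cds_range` are hypothetical, not part of any H-InvDB tool.

```python
from typing import NamedTuple, Tuple


class HInvHeader(NamedTuple):
    hit: str         # HIT version
    hix: str         # HIX version
    hip: str         # HIP version
    accession: str   # DNA databank accession number
    frameshift: str  # "FS" if a frame shift error was revised, else "NO"
    intron: str      # "IM" if remaining intronic sequence was revised, else "NO"
    curation: str    # "HC" if human curated, "AA" if auto-annotated
    frame: str       # reading frame, +3 to -3
    cds: str         # position of CDS, "start..end"
    definition: str  # free-text definition


def parse_hinv_header(line: str) -> HInvHeader:
    """Parse one FASTA headline into its ten documented fields.

    ASSUMPTION: fields are pipe-separated. The split is capped at nine
    so that a pipe inside the definition text stays in the last field.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA headline: " + repr(line))
    parts = [p.strip() for p in line[1:].rstrip("\n").split("|", 9)]
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    return HInvHeader(*parts)


def cds_range(cds: str) -> Tuple[int, int]:
    """Convert a 'start..end' string into a pair of integers."""
    start, end = cds.split("..")
    return int(start), int(end)
```

Keeping the frame and the revision flags as strings avoids guessing at values the page does not spell out; only the CDS position, whose `start..end` shape is documented, is converted to integers.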
Amino acid data sets 2015. 6 | ||
All H-Invitational transcripts (HITs) | 47M | Flat File |
Representative H-Inv transcripts (HITs) | 9.7M | Flat File |
Representative alternative variants (RASV) of H-Inv transcripts (HITs) | 18M | Flat File |
H-Inv2 full-length cDNA dataset (HITs) | 14M | Flat File |
All H-Invitational proteins (HIPs) | 37M | Flat File |
NOTES:
1) Translation of all H-Inv transcripts (HITs) in FASTA format.
2) Translation of representative H-Inv transcripts (HITs) in FASTA format.
3) Translation of representative alternative splicing variants (RASV) of H-Inv transcripts (HITs) in FASTA format.
4) Translation of all H-Inv cDNAs (HITs) in FASTA format.
5) Translation of all H-Inv proteins (HIPs) in FASTA format.
Headline> HIT version | HIX version | HIP version | DNA databank accession number | "FS" (frame shift error) if revised, or "NO" for no revision | "IM" (remaining intronic sequence) if revised, or "NO" for no revision | "HC" if the function is human curated, or "AA" if auto-annotated | Frame (+3 to -3) | Position of CDS (start..end) | Definition
Body: translation |
Results of computational analysis New! (revised) |
Multiple alignment information 2015. 5 | ||
SET of all multiple alignments (Flat File) | 347M | Flat File |
NOTES: "FMULTI" provides multiple alignments of all H-Inv transcripts and RefSeq sequences mapped to the same H-Inv cluster (HIX) against the human genome sequence. | ||
Positional information of mapping 2015. 6 | ||
Positional information for all exons (Flat File) | 124M | Flat File |
NOTES: "FMULTIP"; FMULTIP.tar.gz (mALNp/.tbl) provides positional information for all exons of all H-Inv transcripts and RefSeq sequences mapped to the same H-Inv cluster (HIX) against the human genome sequence.
Format: 01: HIT (or acc), 02: HIX (or temporary cluster ID), 03: key (=acc_chr_sno), 04: seq1 ("genome"), 05: seq2 (cDNA_acc), 06: strand, 07: exon_no, 08: type [N: not aligned part of seq1 (genome); U: unmapped part of seq2 (cDNA); A: alignment; D: deletion; I: insertion; G: gap due to other member], 09: start_seq1 (genome), 10: end_seq1 (genome), 11: start_seq2 (cDNA), 12: end_seq2 (cDNA) (or gap_length if 08: type='G') |
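The twelve-column FMULTIP row format above, including the double meaning of column 12, can be sketched like this. The tab delimiter is an assumption (the page does not name one), and `ExonSegment`, `SEGMENT_TYPES` and `parse_fmultip_row` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional

# Column 08 codes, copied from the format notes.
SEGMENT_TYPES = {
    "N": "not aligned part of seq1 (genome)",
    "U": "unmapped part of seq2 (cDNA)",
    "A": "alignment",
    "D": "deletion",
    "I": "insertion",
    "G": "gap due to other member",
}


class ExonSegment(NamedTuple):
    hit: str
    hix: str
    key: str
    seq1: str
    seq2: str
    strand: str
    exon_no: str
    seg_type: str
    start_seq1: Optional[int]
    end_seq1: Optional[int]
    start_seq2: Optional[int]
    end_seq2: Optional[int]    # column 12 when seg_type is not "G"
    gap_length: Optional[int]  # column 12 when seg_type is "G"


def _to_int(text: str) -> Optional[int]:
    """Return an int, or None when the cell is empty or non-numeric."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_fmultip_row(line: str, sep: str = "\t") -> ExonSegment:
    """Parse one .tbl row. ASSUMPTION: columns are tab-separated."""
    f = line.rstrip("\n").split(sep)
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    if f[7] not in SEGMENT_TYPES:
        raise ValueError(f"unknown segment type {f[7]!r}")
    last = _to_int(f[11])
    is_gap = f[7] == "G"
    return ExonSegment(
        *f[:8],
        _to_int(f[8]), _to_int(f[9]), _to_int(f[10]),
        None if is_gap else last,
        last if is_gap else None,
    )
```

Coordinates are parsed leniently (`None` for non-numeric cells) because the page does not say whether every segment type fills every coordinate column.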
The positional information of the exon, CDS and UTR for representative H-Inv transcripts (HITs) in GFF3 format. | 7.6M | GFF3 File |
The positional information of the exon, CDS and UTR for all H-Inv transcripts (HITs) in GFF3 format. | 25M | GFF3 File |
NOTES: "h-inv_pub_rep.gff3.gz", "h-inv_pub.gff3.gz"; the positional information of the exon, CDS and UTR in GFF3 format.
Format: 01: "seqid (HIT)", 02: "source", 03: "type", 04: "start", 05: "end", 06: "score", 07: "strand", 08: "phase", 09: "attributes"
Attributes (exon): ID=HIX, Name=HIT, Note=HUGO gene symbol
Attributes (CDS): ID=HIT, Parent=HIX, Name=HIT, Alias=definition, Note=HUGO gene symbol, accession
Attributes (UTR): Parent=HIT, Name=HIT |
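The nine GFF3 columns and the `key=value` attributes listed above can be read with the standard library alone. The sketch below follows the general GFF3 conventions (tab-separated columns, semicolon-separated attributes, percent-encoded values); `Gff3Feature`, `parse_attributes` and `parse_gff3_line` are hypothetical names, and nothing here has been checked against the actual H-InvDB files.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class Gff3Feature(NamedTuple):
    seqid: str
    source: str
    feature_type: str  # e.g. exon, CDS or UTR
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_attributes(text: str) -> Dict[str, str]:
    """Turn 'ID=...;Parent=...;Name=...' into a dictionary."""
    attrs: Dict[str, str] = {}
    for item in text.split(";"):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        # GFF3 percent-encodes reserved characters inside values.
        attrs[key.strip()] = unquote(value.strip())
    return attrs


def parse_gff3_line(line: str) -> Optional[Gff3Feature]:
    """Parse one feature line; return None for blank or '#' lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    return Gff3Feature(
        cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
        cols[5], cols[6], cols[7], parse_attributes(cols[8]),
    )
```

With this in place, CDS features can be grouped by cluster through `feature.attributes.get("Parent")`, which the notes describe as the HIX for CDS rows.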
H-ANGEL matrix | ||
Gene expression matrix of H-ANGEL (Flat File) | 25M | Flat File |
NOTES: "H-ANGEL_matrix.txt.gz" provides the gene expression matrix of H-ANGEL. The columns are described below.
Format: 1: type of platform, 2: experimental ID assigned by each provider, 3: primer/probe ID (e.g. GeneChip identifier), 4: 10 categories collapsed by the average of 40 categories, 5: 10 categories collapsed by the maximum value of 40 categories, 6: 40 categories, 7: Acc corresponding to the primer/probe, 8: start site of the primer/probe on the genome, 9: end site of the primer/probe on the genome, 10: absolute value of the expression data for EST and SAGE only, NaN otherwise, 11: HIX, 12: UniGene ID, 13: start site of HIX on the genome, 14: end site of HIX on the genome, 15: strand of the locus, 16: Acc(s) included within HIX. |
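As an illustration of the sixteen H-ANGEL columns, the sketch below labels each field of a row and converts column 10, which the notes say is NaN outside EST and SAGE data, to a float. The tab delimiter and the short column names in `HANGEL_COLUMNS` are assumptions made for this example; how the category columns (4 to 6) are encoded is not described on this page, so they are left as raw strings.

```python
import math
from typing import Dict, Union

# Hypothetical short names for the 16 documented columns, in order.
HANGEL_COLUMNS = [
    "platform", "experiment_id", "probe_id",
    "categories10_mean", "categories10_max", "categories40",
    "probe_acc", "probe_start", "probe_end", "absolute_value",
    "hix", "unigene_id", "hix_start", "hix_end", "strand", "hix_accs",
]


def parse_hangel_row(line: str, sep: str = "\t") -> Dict[str, Union[str, float]]:
    """Label the fields of one matrix row.

    ASSUMPTION: columns are tab-separated. Only column 10 is converted,
    since its numeric-or-NaN content is stated in the notes.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(HANGEL_COLUMNS):
        raise ValueError(f"expected 16 columns, got {len(fields)}")
    row: Dict[str, Union[str, float]] = dict(zip(HANGEL_COLUMNS, fields))
    row["absolute_value"] = float(fields[9])  # float() accepts "NaN"
    return row


def has_absolute_value(row: Dict[str, Union[str, float]]) -> bool:
    """True when column 10 holds a real number rather than NaN."""
    value = row["absolute_value"]
    return isinstance(value, float) and not math.isnan(value)
```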
Molecular evolutionary annotation (Evola) | ||
SET of Evola ortholog list | Flat File | |
NOTES: "Evola.txt.gz" provides the list of ortholog accession numbers from Evola. |
Inter-species multi-FASTA (Transcript) | ||
SET of Evola ortholog sequences (Transcript) | Flat File | |
NOTES: "NFAS.tar.gz" provides multiple FASTA of transcript nucleotide sequences of human and other species orthologs. | ||
Inter-species multi-FASTA (Protein) | ||
SET of Evola ortholog sequences (Protein) | Flat File | |
NOTES: "PFAS.tar.gz" provides multiple FASTA of protein amino acid sequences of human and other species orthologs. | ||
Phylogenetic trees | ||
SET of Evola duplicate gene family trees (Flat File) | Flat File | |
NOTES: "NJ.tar.gz" provides the phylogenetic trees (phb files) constructed by the Neighbor-joining method (amino acid) | ||
Human protein complex database with quality index (PCDq), data set New! 2015. 11 | ||
Protein complex list, their subunits (members), and related annotation. | 1.5M | TSV Files (tar.gz file) |
NOTES: "complexList.tsv" provides complex ID, name, etc.
"subunitMembers.tsv" provides subunits (members) of each complex and related annotations.
"public_ppi.tsv" provides PPI data used for complex prediction. The file format is described in the README file included in the download package. |
A subset of H-InvDB annotation data sets with supporting proteome evidence 2015. 7 | ||
SET of H-Inv clusters with supporting proteome evidence (Flat File) | 25M | Flat File |
SET of H-Inv clusters with supporting proteome evidence (XML) | 33M | XML |
SET of H-Inv transcripts with supporting proteome evidence (Flat File) | 146M | Flat File |
SET of H-Inv transcripts with supporting proteome evidence (XML) | 178M | XML |
SET of H-Inv proteins with supporting proteome evidence (Flat File) | 57M | Flat File |
SET of H-Inv proteins with supporting proteome evidence (XML) | 61M | XML |
NOTES: Subsets of H-InvDB loci, transcripts, and proteins with supporting evidence of expression confirmed in comprehensive proteomic experiments. Those classified as "protein level" or "transcript level" expression in C-HPP are included. | ||
Transcripts and clusters unmapped to the human genome 2015. 5 | ||
Clusters of transcripts unmapped to human genome (Flat File) | 2.7M | Flat File |
Clusters of transcripts unmapped to human genome (XML) | 3.0M | XML |
Transcripts unmapped to human genome (Flat File) | 14M | Flat File |
Transcripts unmapped to human genome (XML) | 15M | XML |
NOTES: Data set of transcripts (HITs) and clusters (HIXs) that are not mapped to the human genome. |
FTP Download |
FTP site for downloading data files in H-Invitational Database (DNA Data Bank of Japan). ftp://ftp.ddbj.nig.ac.jp/mirror_database/hinv/ |
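Files on the FTP site above could be listed and fetched with Python's standard `ftplib`. This is a sketch under assumptions: it presumes the server accepts anonymous logins and is still reachable at the URL given, neither of which is confirmed here, and the helper names `split_ftp_url`, `list_remote_files` and `download_file` are hypothetical.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlsplit

HINV_FTP_URL = "ftp://ftp.ddbj.nig.ac.jp/mirror_database/hinv/"


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Return (host, directory) for an ftp:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError("not an ftp URL: " + repr(url))
    return parts.hostname, parts.path or "/"


def list_remote_files(url: str = HINV_FTP_URL, timeout: float = 30.0) -> List[str]:
    """List the names in the remote directory.

    ASSUMPTION: anonymous login is allowed on this server.
    """
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # anonymous
        ftp.cwd(directory)
        return ftp.nlst()


def download_file(filename: str, dest: str, url: str = HINV_FTP_URL,
                  timeout: float = 30.0) -> None:
    """Download one file from the remote directory in binary mode."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp, open(dest, "wb") as out:
        ftp.login()
        ftp.cwd(directory)
        ftp.retrbinary("RETR " + filename, out.write)
```

Calling `list_remote_files()` first is a cheap way to check which file names actually exist before attempting a download, since the names shown on this page may not match the mirror's directory layout.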