About PANDA System


1. Motivation

The mission of this study is to extract new disease-susceptibility genes using H-InvDB. Our working hypothesis is that new candidate genes share characteristics with genes already reported to be disease-related. "Similar characteristics" covers not only sequence homology or domain structure but also membership in the same metabolic pathway or a similar expression pattern. H-InvDB is well suited to this approach because it holds these kinds of information for each gene. Based on this concept, we constructed the Priority Analysis for Disease Association (PANDA) system.

Our method uses 7 kinds of annotation. Paralogous genes that arose from a duplication prior to the divergence of vertebrates have highly similar sequences and may share some functional redundancy. Genes associated with a disease may also be identified through their functional similarity to known disease genes or through their position on the same physiological pathway as known disease genes. We therefore compared genes on the basis of InterPro, EC numbers, KEGG pathways, and GO terms. Next, we converted this information into scores using scoring formulas, which gives 7 kinds of score for each gene. Finally, we selected new disease-susceptibility candidates by discriminant analysis.
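The page does not spell out how gene annotations were compared. As a minimal sketch, assuming each gene's InterPro, EC, KEGG, and GO annotations are held as sets of term identifiers, one simple comparison is the Jaccard similarity per category. The record layout, the placeholder term names, and the use of Jaccard similarity are illustrative assumptions, not the PANDA scoring formula.

```python
# Hypothetical annotation record: category name -> set of term identifiers.
# Category names follow the text; the terms below are placeholders.
Annotation = dict[str, set[str]]

CATEGORIES = ["InterPro", "EC", "KEGG", "GO"]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_genes(g1: Annotation, g2: Annotation, categories: list[str]) -> dict[str, float]:
    """Per-category annotation similarity between two genes."""
    return {c: jaccard(g1.get(c, set()), g2.get(c, set())) for c in categories}


gene_a = {"InterPro": {"ipr1", "ipr2"}, "EC": {"ec1"}, "KEGG": {"path1"}, "GO": {"go1", "go2"}}
gene_b = {"InterPro": {"ipr1"}, "EC": {"ec1"}, "KEGG": {"path2"}, "GO": {"go1"}}

print(compare_genes(gene_a, gene_b, CATEGORIES))
# → {'InterPro': 0.5, 'EC': 1.0, 'KEGG': 0.0, 'GO': 0.5}
```

Jaccard similarity is used here only because it is a common, easy-to-read set overlap measure; any of the per-category comparisons could be swapped for another measure without changing the overall shape of a per-gene score vector.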

2. Selecting the Training Set

First, we selected disease-related genes using OMIM and LocusLink (LL). We queried OMIM and LL with a disease, and each returned a list of genes related to that disease. We then read the PubMed abstracts describing the relationship between each gene and the disease and pruned the list: for example, a gene whose relationship to the disease had been reported only in mouse experiments was removed. The remaining genes form the "reported disease-related gene set" in H-InvDB. We call these genes "known genes", and all the remaining genes are called "Others".
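The curation step above can be sketched as a filter over query hits. This assumes a hypothetical `Hit` record in which a curator has noted, after reading the PubMed abstracts, the organisms in which each gene's disease relationship was reported; `db_genes` stands in for the set of H-InvDB genes, and all gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A gene returned by an OMIM/LocusLink disease query (hypothetical record)."""
    gene: str
    # Organisms in which the gene-disease relationship is reported,
    # as noted by a curator after reading the PubMed abstracts.
    evidence_species: set[str]


def select_known_genes(hits: list[Hit]) -> set[str]:
    """Keep a gene only if some organism other than mouse supports the relationship."""
    return {h.gene for h in hits if h.evidence_species - {"mouse"}}


def split_training_set(db_genes: set[str], hits: list[Hit]) -> tuple[set[str], set[str]]:
    """Return (known genes, Others), restricted to genes present in the database."""
    known = select_known_genes(hits) & db_genes
    return known, db_genes - known


hits = [
    Hit("GENE1", {"human"}),
    Hit("GENE2", {"mouse"}),           # mouse-only evidence: removed
    Hit("GENE3", {"human", "mouse"}),
    Hit("GENE9", {"human"}),           # not in the database: ignored
]
db_genes = {"GENE1", "GENE2", "GENE3", "GENE4"}
known, others = split_training_set(db_genes, hits)
print(sorted(known), sorted(others))  # → ['GENE1', 'GENE3'] ['GENE2', 'GENE4']
```

Note that a gene dropped by the curation filter (GENE2 here) ends up among the Others, since the text defines Others as every gene that is not a known gene.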

3. Structure of PANDA

The PANDA system works as follows. First, we select the known genes. Next, we convert the gene information into scores using scoring formulas and calculate frequency scores for the target disease. These scores serve as the training set for machine learning. We then score the unknown genes (the Others) in the same way. Finally, we prioritize the Others by discriminant analysis and select new candidates.
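The frequency scores are not defined on this page. One hypothetical reading, used here purely for illustration: for each annotation category, compute the fraction of known genes carrying each term, then score any gene by the highest frequency among its own terms in that category. The functions `build_frequency_tables` and `score_gene` and the example data are invented for this sketch.

```python
from collections import Counter

# Hypothetical annotation record: category name -> set of term identifiers.
Annotation = dict[str, set[str]]


def term_frequencies(known: list[Annotation], category: str) -> dict[str, float]:
    """Fraction of known genes annotated with each term of one category."""
    # Each gene contributes a set, so a term is counted at most once per gene.
    counts = Counter(t for gene in known for t in gene.get(category, set()))
    return {term: n / len(known) for term, n in counts.items()}


def build_frequency_tables(known: list[Annotation], categories: list[str]) -> dict[str, dict[str, float]]:
    """One term-frequency table per category, built from the known genes only."""
    return {c: term_frequencies(known, c) for c in categories}


def score_gene(gene: Annotation, tables: dict[str, dict[str, float]]) -> list[float]:
    """Per category, the highest frequency among the gene's terms (0.0 if none)."""
    return [
        max((table.get(t, 0.0) for t in gene.get(c, set())), default=0.0)
        for c, table in tables.items()
    ]


known_genes = [
    {"GO": {"go1", "go2"}, "KEGG": {"path1"}},
    {"GO": {"go1"}, "KEGG": {"path2"}},
    {"GO": {"go3"}, "KEGG": {"path1"}},
    {"GO": {"go1"}, "KEGG": set()},
]
tables = build_frequency_tables(known_genes, ["GO", "KEGG"])
candidate = {"GO": {"go2", "go9"}, "KEGG": {"path1"}}
print(score_gene(candidate, tables))  # → [0.25, 0.5]
```

Building the tables once from the training set and then applying `score_gene` to every gene mirrors the order described above: known genes define the scores, and the Others are scored afterwards in the same way.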

4. Explanation of the Mahalanobis Distance

We used the Mahalanobis distance for discriminant analysis. Each gene has 7 kinds of score, so we calculated the Mahalanobis distance in 7 dimensions, weighting each score equally. First, we calculated the centers of gravity (C.G.) of the known genes and of the unknown genes. Second, we calculated the distance of gene X from the C.G. of the known genes (MD1) and from the C.G. of the unknown genes (MD2). Finally, we compared the two by computing MD2 - MD1. A value greater than 0 means that gene X lies closer to the known genes than to the unknown genes, and we considered such a gene a candidate.
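The comparison of MD1 and MD2 can be sketched with NumPy. The Mahalanobis distance of a score vector x from a group with center mu and covariance matrix S is sqrt((x - mu)^T S^-1 (x - mu)). This sketch assumes each group uses its own covariance matrix and takes a pseudo-inverse in case a matrix is singular; the page does not state either choice. The 7-column score matrices are random stand-ins, not PANDA data.

```python
import numpy as np


def mahalanobis_to_group(X: np.ndarray, group: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row of X from the center of gravity of `group`."""
    center = group.mean(axis=0)
    # Pseudo-inverse guards against a singular covariance matrix (an assumption).
    cov_inv = np.linalg.pinv(np.cov(group, rowvar=False))
    D = np.atleast_2d(X) - center
    # Row-wise quadratic form d^T S^-1 d; clip tiny negative rounding errors.
    sq = np.einsum("ij,jk,ik->i", D, cov_inv, D)
    return np.sqrt(np.clip(sq, 0.0, None))


def md_difference(X: np.ndarray, known: np.ndarray, others: np.ndarray) -> np.ndarray:
    """MD2 - MD1 for each row of X; a positive value means closer to the known genes."""
    md1 = mahalanobis_to_group(X, known)
    md2 = mahalanobis_to_group(X, others)
    return md2 - md1


# Random stand-in score matrices: one row per gene, one column per score (7 scores).
rng = np.random.default_rng(0)
known = rng.normal(1.0, 1.0, size=(40, 7))
others = rng.normal(0.0, 1.0, size=(400, 7))

scores = md_difference(others, known, others)
candidates = np.flatnonzero(scores > 0)  # row indices of Others flagged as candidates
print(len(candidates), "candidates out of", len(others))
```

Each score enters the distance as one dimension with no extra weights. Pooling the two covariance matrices instead would give a linear rather than quadratic decision boundary; which variant PANDA used is not stated here.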


Last Modified: 2006/03/31 08:50:29