MUSC/UIUC Joint Text Mining Project

This wiki is for the joint research project "Automatic Literature-based Protein Annotation", funded by grant R01 LM009153-01A1 from the National Library of Medicine, NIH.

Data Repositories

Projects

GOGrapher

The goal of this project is to develop GOGrapher, a Python package for representing the graph structure of the Gene Ontology (GO). A GOGraph uses the Gene Ontology to build a network that relates terms to each other and proteins to terms. Given a species name or the name of one of the origin databases for the GODB, together with a GO aspect, GOGrapher constructs a directed, unweighted graph (a GODiGraph, built on networkx).
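The kind of structure described above can be illustrated with a small, dependency-free sketch. This is not GOGrapher's API: the class `TinyGODiGraph`, its methods, and the term and protein identifiers are all hypothetical, and a plain dictionary stands in for the networkx graph that the package itself uses.

```python
from collections import defaultdict


class TinyGODiGraph:
    """Toy directed, unweighted graph: child term -> parent term, protein -> term.

    Hypothetical illustration only; not the GOGrapher implementation.
    """

    def __init__(self):
        self.parents = defaultdict(set)      # term -> set of parent terms
        self.annotations = defaultdict(set)  # protein -> directly annotated terms

    def add_is_a(self, child, parent):
        """Add a directed edge from a child term to its parent term."""
        self.parents[child].add(parent)

    def annotate(self, protein, term):
        """Add a directed edge from a protein to a term."""
        self.annotations[protein].add(term)

    def ancestors(self, term):
        """Return every term reachable from `term` by following parent edges."""
        seen = set()
        stack = [term]
        while stack:
            node = stack.pop()
            for parent in self.parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def implied_terms(self, protein):
        """Return the direct annotations of a protein plus all their ancestors."""
        terms = set(self.annotations.get(protein, ()))
        for term in list(terms):
            terms |= self.ancestors(term)
        return terms


# Placeholder identifiers, not real GO terms or proteins.
g = TinyGODiGraph()
g.add_is_a("GO:a", "GO:root")
g.add_is_a("GO:b", "GO:a")
g.annotate("P1", "GO:b")
print(sorted(g.implied_terms("P1")))
```

With networkx, the two dictionaries would collapse into a single `networkx.DiGraph`, with term-to-term and protein-to-term edges added through `add_edge`.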

Sentence-based correspondence LDA model

This project aims to identify the relationship between the content of a MEDLINE document and the GO annotations associated with that document. The underlying assumption is that the sentences of a document are generated by (or belong to) the topics contained in the document. The associated GO annotations are likewise related to those topics. The model is meant to capture the correspondence between the observed GO annotations and the sentences of the document. A restricted project web page is here.
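One way to read the assumption above is as a generative story in the general style of correspondence LDA. The sketch below is only an illustration of that reading, not the project's actual model: it assumes one topic per sentence, it assumes each annotation is tied to a uniformly chosen sentence, and the names `generate_document`, `word_dists`, and `go_dists` are hypothetical.

```python
import random


def sample_dirichlet(rng, alpha, k):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw an index according to the given probability weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(rng, n_sentences, n_annotations, word_dists, go_dists,
                      alpha=1.0, words_per_sentence=5):
    """Generate sentences and GO annotations that share per-sentence topics.

    word_dists[z] is the word distribution of topic z;
    go_dists[z] is the GO-term distribution of topic z (both assumed inputs).
    """
    theta = sample_dirichlet(rng, alpha, len(word_dists))  # document topic mix
    sentence_topics, sentences = [], []
    for _ in range(n_sentences):
        z = sample_index(rng, theta)  # assumption: one topic per sentence
        sentence_topics.append(z)
        sentences.append(
            [sample_index(rng, word_dists[z]) for _ in range(words_per_sentence)]
        )
    annotations = []
    for _ in range(n_annotations):
        s = rng.randrange(n_sentences)  # annotation corresponds to one sentence
        go_term = sample_index(rng, go_dists[sentence_topics[s]])
        annotations.append((s, go_term))
    return sentences, sentence_topics, annotations


# Made-up, deterministic topic distributions over a 2-word, 2-term vocabulary.
rng = random.Random(0)
word_dists = [[1.0, 0.0], [0.0, 1.0]]
go_dists = [[1.0, 0.0], [0.0, 1.0]]
sentences, sentence_topics, annotations = generate_document(
    rng, n_sentences=4, n_annotations=3, word_dists=word_dists, go_dists=go_dists
)
```

Because each annotation is drawn from the topic of a specific sentence, the annotations and the sentences are coupled through shared topics, which is the correspondence the paragraph describes.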

Stochastic graph-based multi-label classification

This project performs multi-label classification to annotate proteins. Given a document related to a protein, the task is to predict a set of GO annotations, on the assumption that a paper can discuss multiple aspects of the protein. If the Gene Ontology contains G GO terms in total, the task can be viewed as producing a G-dimensional vector in which some elements are set to 1 and the rest to 0. Ideally, the output would be probabilistic, giving the posterior probability of each such vector. The space of all possible output vectors (2^G of them) is too large to enumerate, thus various stochastic ... Furthermore, we would like to use the structure of the GO graph to improve the accuracy of the prediction by stochastically searching the graph to probabilistically ...
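The G-dimensional output and the role of the graph structure can be made concrete with a small sketch. Everything here is an assumption for illustration and not the project's method: the four-term ontology, the per-term scores, and the independent sampling scheme are made up, and the only use of the graph is to force each sampled label set to include the ancestors of every selected term.

```python
import random

# Hypothetical toy ontology: each term lists its parent terms.
TERMS = ["root", "a", "b", "c"]
PARENTS = {"root": [], "a": ["root"], "b": ["a"], "c": ["root"]}


def close_under_ancestors(selected):
    """Return the selected terms together with all of their ancestors."""
    closed = set()
    stack = list(selected)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def to_vector(selected):
    """Encode a set of terms as a G-dimensional 0/1 vector (G = len(TERMS))."""
    return [1 if term in selected else 0 for term in TERMS]


def sample_annotations(term_scores, n_samples, seed=0):
    """Estimate a distribution over label vectors by Monte Carlo sampling.

    Each term is switched on independently with its score as probability,
    then the set is closed under ancestors so it is consistent with the graph.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        picked = {t for t in TERMS if rng.random() < term_scores[t]}
        vec = tuple(to_vector(close_under_ancestors(picked)))
        counts[vec] = counts.get(vec, 0) + 1
    return {vec: count / n_samples for vec, count in counts.items()}


# Made-up per-term scores, e.g. as a text classifier might produce.
scores = {"root": 0.9, "a": 0.6, "b": 0.5, "c": 0.1}
estimates = sample_annotations(scores, n_samples=1000)
best = max(estimates, key=estimates.get)
print(best, estimates[best])
```

Sampling sidesteps enumerating all 2^G vectors: only the vectors that are actually drawn receive an estimated probability, and none of them violates the parent-child structure of the graph.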

Projects/NLMTextMining (last edited 2008-09-17 18:51:02 by mullerb)