Text Mining Resources
Categories
The following links are categorized to make them more accessible:

  • Text/Documents:
     
    • Mining microarray expression data by literature profiling (2002) (Online Version)
      Damien Chaussabel and Alan Sher, NIH, Bethesda, MD
            The rapidly expanding fields of genomics and proteomics have prompted the development of computational methods for managing, analyzing and visualizing expression data derived from microarray screening. Nevertheless, the lack of efficient techniques for assessing the biological implications of gene-expression data remains an important obstacle in exploiting this information. To address this need, we have developed a mining technique based on the analysis of literature profiles generated by extracting the frequencies of certain terms from thousands of abstracts stored in the Medline literature database. Terms are then filtered on the basis of both repetitive occurrence and co-occurrence among multiple gene entries. Finally, clustering analysis is performed on the retained frequency values, shaping a coherent picture of the functional relationship among large and heterogeneous lists of genes. Such data treatment also provides information on the nature and pertinence of the associations that were formed.
       
    • Use of keyword hierarchies to interpret gene expression patterns (2001) (PDF, 128KB, 8pgs)
      Masys et al., University of California, San Diego
            High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but they provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
       
    • Using Text Analysis to Identify Functionally Coherent Gene Groups (2002) (Online version)
      Raychaudhuri et al., Stanford University
            The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature.
       
    • Finding the evidence for protein-protein interactions from PubMed abstracts (2006) (PDF, 442KB, 7pgs)
      Jang et al., Bioinformatics Team, Electronics and Telecommunications Research Institute (ETRI) in Korea
            As a result of a search for two proteins, PubMed frequently returns hundreds of abstracts. In this paper, a method is introduced that validates protein-protein interactions from PubMed abstracts.
       
    • GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data (2003) (PDF, 426KB, 11pgs)
      Rzhetsky et al., Columbia University, New York
            The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction of information using natural-language processing, information visualization, and generation of specialized knowledge bases for molecular biology. GeneWays is an integrated system that combines several such subtasks. It analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks. GeneWays is designed as an open platform, allowing researchers to query, review, and critique stored information.
       
  • Software Downloads:
     
  • Slide Presentations:
     
  • Video Lectures:
     
  • Sample Scripts:
     
  • Sample Data:
     
  • General Web Resources:
     
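For readers who want a feel for the literature-profiling idea summarized in the Chaussabel and Sher entry above (term frequencies per gene, filtering on repeated occurrence and co-occurrence across genes, then clustering), the following is a minimal Python sketch. It is not the authors' implementation: the tokenizer, the thresholds, the threshold-based single-linkage grouping that stands in for their clustering analysis, and the toy gene names and abstracts are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: gene name -> abstracts mentioning it (not real Medline text).
TOY_ABSTRACTS = {
    "GENE_A": ["interferon signaling in macrophages", "interferon response to virus"],
    "GENE_B": ["interferon induced antiviral state", "virus triggers interferon"],
    "GENE_C": ["ribosome assembly in yeast", "ribosome biogenesis factors"],
    "GENE_D": ["ribosome subunit export", "yeast ribosome maturation"],
}


def term_profile(abstracts):
    """Fraction of a gene's abstracts in which each term appears."""
    counts = Counter()
    for text in abstracts:
        # Assumed tokenizer: lowercase words of 4+ letters, counted once per abstract.
        counts.update(set(re.findall(r"[a-z]{4,}", text.lower())))
    return {term: n / len(abstracts) for term, n in counts.items()}


def filter_terms(profiles, min_freq=0.5, min_genes=2):
    """Keep terms that reach min_freq in at least min_genes gene profiles."""
    support = Counter()
    for profile in profiles.values():
        support.update(t for t, f in profile.items() if f >= min_freq)
    return sorted(t for t, n in support.items() if n >= min_genes)


def cosine(u, v):
    """Cosine similarity of two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Single-linkage grouping: a gene joins every cluster it is similar to."""
    clusters = []
    for gene in sorted(vectors):
        linked = [c for c in clusters
                  if any(cosine(vectors[gene], vectors[m]) >= threshold for m in c)]
        merged = [gene]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


def literature_clusters(gene_abstracts):
    """Profile each gene, filter shared terms, and group genes by profile similarity."""
    profiles = {g: term_profile(a) for g, a in gene_abstracts.items()}
    terms = filter_terms(profiles)
    vectors = {g: [p.get(t, 0.0) for t in terms] for g, p in profiles.items()}
    return terms, cluster(vectors)


if __name__ == "__main__":
    kept_terms, groups = literature_clusters(TOY_ABSTRACTS)
    print(kept_terms)
    print(groups)
```

On the toy data, the retained terms are "interferon", "ribosome", "virus", and "yeast", and the genes fall into two groups: GENE_A with GENE_B, and GENE_C with GENE_D. Real abstracts would need a proper stop-word list and a more careful clustering method than this sketch uses.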
Site Recommendations
Do you know of any resources of interest or ways to make this page more informative? Let us know!
 







Copyright © 2004-2009 Donaghey College of Information Science and Systems Engineering
Created and maintained by MidSouth Bioinformatics Center
 
This page was created on September 23, 2009.