Biomedical hypothesis generation by text mining and gene prioritization

Ingrid Petrič, Balázs Ligeti, Balázs Gyorffy, Sándor Pongor

Research output: Contribution to journal › Article

3 Citations (Scopus)


Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petrič et al., J. Biomed. Inform. 42(2): 219-227, 2009), in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated in ovarian cancer, where, using MEDLINE abstracts and the STRING database, it recovers known gene-disease associations and suggests potential new ones.
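The abstract does not spell out how terms are counted or how genes are scored, so the following is only a minimal Python sketch of the two ingredients it names, under stated assumptions; it is not the authors' enhanced RaJoLink implementation. The function names `rare_terms` and `prioritize` and the parameter `max_df` are hypothetical. Documents are assumed to be lists of terms already mapped to a standardized vocabulary, the protein network is assumed to be a plain adjacency dict standing in for STRING interactions, and the scoring (counting how many known disease genes are direct network neighbours of a candidate) is a simple guilt-by-association heuristic swapped in for illustration.

```python
from collections import Counter


def rare_terms(docs: list[list[str]], max_df: int) -> set[str]:
    """Return terms that appear in at most max_df documents of the target-domain corpus.

    Assumption: each document is a list of terms already normalized to a
    standardized vocabulary (hypothetical preprocessing, not shown here).
    """
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term at most once per document
    return {term for term, n in doc_freq.items() if n <= max_df}


def prioritize(candidates: list[str], known: set[str],
               network: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank candidate genes by how many known disease genes are their direct neighbours.

    Assumption: `network` is a hypothetical adjacency dict standing in for a
    protein interaction network such as STRING; genes absent from it score 0.
    """
    scores = {g: len(network.get(g, set()) & known) for g in candidates}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with placeholder gene labels, `rare_terms([["ovarian cancer", "geneA"], ["ovarian cancer", "geneA", "geneC"]], max_df=1)` returns `{"geneC"}`. Counting document frequency rather than raw occurrences keeps a single repetitive abstract from hiding how rarely a term co-occurs with the target domain.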

Original language: English
Pages (from-to): 847-857
Number of pages: 11
Journal: Protein and Peptide Letters
Issue number: 8
Publication status: Published, June 2014


Keywords

  • Biomedical hypothesis generation
  • Disease gene prediction
  • Gene prioritization
  • Ovarian cancer
  • Text mining

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry

