Knowledge Discovery in Biomedical Literature

A. Beryl Joylin; Nancy Victor

doi:10.5958/0974-360X.2017.00335.3

Research Journal of Pharmacy and Technology

ISSN: 0974-360X (Online), 0974-3618 (Print)

Author(s): A. Beryl Joylin, Nancy Victor

Email(s): beryljoylin@gmail.com, nancyvictor@vit.ac.in

DOI: 10.5958/0974-360X.2017.00335.3

Address: A. Beryl Joylin*, Prof. Nancy Victor
School of Information Technology and Engineering, VIT University, Vellore
*Corresponding Author

Published In: Volume - 10, Issue - 6, Year - 2017

ABSTRACT:
One active and challenging research topic in data mining is extracting useful knowledge from collections of unstructured data, especially in the biomedical domain, whose content is complex and intricate. The volume of biomedical literature generated every day is making the field information-saturated, so automated extraction tools that can handle large volumes of published literature are becoming more important. However, making effective use of this enormous and steadily growing body of data remains a challenge for many researchers. The goal of this project is to build a framework with which researchers can query and visualize the latest progress in biomedical publications or in their areas of interest. PubMed, the largest repository of medical data, maintained by the United States National Library of Medicine, is used as the source for journal abstracts and other metadata. Named Entity Recognition, specially developed for biomedical literature, is performed on the abstracts to extract entities such as chemicals, drugs, proteins, and genes. The proposed system uses a multinomial logistic regression model combined with rule-based methods to identify biomedical entities in the abstracts. A graph database is built from the identified entities along with metadata from each publication. A graph database is used because it represents relationships between nodes explicitly. This allows semantic querying of the database, which makes complex queries easy to express. The graph schema features a tree representation of timelines, which makes time-related queries much more efficient. The graph database used is Neo4j, an open-source graph database implemented in Java. The result is a system that researchers and pharmaceutical companies can use to identify research trends.
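To make the timeline tree concrete, the following is a minimal sketch of how a publication and its recognized entities might be attached to a Year, Month, Day tree in Neo4j. The abstract does not give the actual schema, so every label and relationship name here (`Year`, `Month`, `Day`, `Publication`, `Entity`, `HAS_MONTH`, `HAS_DAY`, `PUBLISHED_ON`, `MENTIONS`) and both helper functions are hypothetical. The code only builds parameterized Cypher text and does not connect to a database.

```python
from datetime import date
from typing import Dict, List, Tuple


def build_publication_statement(
    pmid: str, published: date, entities: List[Tuple[str, str]]
) -> Tuple[str, Dict[str, object]]:
    """Return (cypher, params) that link one publication to a time tree.

    All labels and relationship types are illustrative assumptions,
    not the schema used in the paper.
    """
    cypher = (
        # Time tree: each Month hangs under its Year, each Day under its Month.
        "MERGE (y:Year {value: $year})\n"
        "MERGE (y)-[:HAS_MONTH]->(m:Month {value: $month})\n"
        "MERGE (m)-[:HAS_DAY]->(d:Day {value: $day})\n"
        # The publication is attached to the leaf (Day) node.
        "MERGE (p:Publication {pmid: $pmid})\n"
        "MERGE (p)-[:PUBLISHED_ON]->(d)\n"
        "WITH p\n"
        # One Entity node per recognized (name, type) pair.
        "UNWIND $entities AS e\n"
        "MERGE (n:Entity {name: e.name, type: e.type})\n"
        "MERGE (p)-[:MENTIONS]->(n)"
    )
    params = {
        "year": published.year,
        "month": published.month,
        "day": published.day,
        "pmid": pmid,
        "entities": [{"name": name, "type": kind} for name, kind in entities],
    }
    return cypher, params


def build_trend_query(entity_name: str) -> Tuple[str, Dict[str, object]]:
    """Return (cypher, params) counting publications per year for one entity."""
    cypher = (
        "MATCH (y:Year)-[:HAS_MONTH]->(:Month)-[:HAS_DAY]->(:Day)"
        "<-[:PUBLISHED_ON]-(p:Publication)-[:MENTIONS]->(:Entity {name: $name})\n"
        "RETURN y.value AS year, count(DISTINCT p) AS publications\n"
        "ORDER BY year"
    )
    return cypher, {"name": entity_name}
```

With a schema of this shape, a trend query walks down from the `Year` nodes instead of filtering every publication by a date property, which is one plausible reading of why the tree makes time-related queries cheaper. The returned statement and parameters could then be passed to a Neo4j client, for example a session's `run` method in the official Python driver.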

Cite this article:
A. Beryl Joylin, Nancy Victor. Knowledge Discovery in Biomedical Literature. Research J. Pharm. and Tech. 2017; 10(6): 1911-1918. doi: 10.5958/0974-360X.2017.00335.3

Cite(Electronic):
A. Beryl Joylin, Nancy Victor. Knowledge Discovery in Biomedical Literature. Research J. Pharm. and Tech. 2017; 10(6): 1911-1918. doi: 10.5958/0974-360X.2017.00335.3 Available on: https://www.rjptonline.org/AbstractView.aspx?PID=2017-10-6-59