BioLink: A Platform for Building Integrated Biological Data Systems
A central challenge of bioinformatics is the integration of diverse
public and proprietary biological datasets, and the follow-on
construction of software applications that leverage the resulting
integrated content and are easily accessible throughout the
enterprise. Bioinformatics groups looking to develop a new internal
application face a number of recurring challenges, including:
- Parsing, formatting, and loading of data from heterogeneous data sources.
- Developing the administrative tools for managing the automatic update of each data source in a timely manner.
- Enabling flexible querying of the integrated data for scientific users wishing to pose a wide variety of research questions.
- Ensuring scalability and technical support of the application to provide for future functional, analytical, or data-content enhancements.
Figure 1.
Tracing connections to ask a research question.
The development of integrated biological databases is motivated by
the need for scientists to ask deeper scientific questions. For example,
the question "What compounds show activity against targets associated
with pathway X?" can be investigated by tracing connections from the
pathway to its associated genes, the bioassays that target those genes,
and the compounds that show activity when screened against those assays
(Figure 1).
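The traversal described above can be sketched in a few lines of Python. This is only an illustration of the idea, not BioLink's implementation: the three lookup tables, the identifiers in them, and the function name `compounds_for_pathway` are all hypothetical, and a real system would query the integrated database instead of in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical toy data; every identifier below is illustrative only.
PATHWAY_GENES = {"pathway_x": ["GENE_A", "GENE_B"]}
GENE_ASSAYS = {"GENE_A": ["assay_1"], "GENE_B": ["assay_2", "assay_3"]}
ASSAY_ACTIVE_COMPOUNDS = {
    "assay_1": ["cmpd_10", "cmpd_11"],
    "assay_2": ["cmpd_11"],
    "assay_3": [],
}


def compounds_for_pathway(pathway):
    """Follow pathway -> genes -> assays -> active compounds.

    Returns a dict mapping each compound to the sorted list of
    (gene, assay) pairs that connect it to the pathway.
    """
    evidence = defaultdict(list)
    for gene in PATHWAY_GENES.get(pathway, []):
        for assay in GENE_ASSAYS.get(gene, []):
            for compound in ASSAY_ACTIVE_COMPOUNDS.get(assay, []):
                evidence[compound].append((gene, assay))
    return {compound: sorted(pairs) for compound, pairs in evidence.items()}


result = compounds_for_pathway("pathway_x")
for compound, pairs in sorted(result.items()):
    print(compound, pairs)
```

Keeping the (gene, assay) pairs alongside each compound preserves the chain of connections, so a scientist can see why a compound was returned and not just that it was.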
BioLink is a platform for facilitating the integration of biological
data, for updating the data automatically, and for building consistent
and powerful web-based interfaces that enable the integrated data
content to be searched, explored, extracted, and analyzed (Figure 2).
It provides an end-to-end solution for the rapid delivery of biological
databases. It includes a database layer that holds both public
and proprietary datasets along with metadata descriptors that define
the content and how it should be rendered, a JSP-based service layer for
extracting data content in response to user queries, and a user-interface
layer built upon a growing collection of powerful and re-usable container
and charting widgets. These widgets can be readily organized into pages
(perspectives) and separate tabs as required.
Figure 2.
Architecture Overview.
| Layer | Description |
|---|---|
| Web UI | A toolkit of reusable Web 2.0 widgets for displaying structured content. Pages are organized into customizable perspectives, each consisting of one or more tabs. Each tab has a distinct layout which contains one or more widgets. |
| Data Services | JSP framework for serving up annotations (JSON structured content), meta-data (detailed description of the content), and UI definitions (mappings between Web UI widgets and metadata). |
| Integrated Data | Integrated data content, including data update services, and an API that serves up both public and proprietary data content as requested by asynchronous data services. |
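To make the three kinds of service content more concrete, the following Python sketch bundles annotations, metadata, and a UI definition into a single JSON response. The field names, the response layout, and the `build_response` helper are assumptions made for illustration; the actual BioLink service layer is JSP-based and its payload format is not described here.

```python
import json


def build_response(annotations, metadata, ui_definition):
    """Bundle annotations, metadata, and a UI definition into one JSON string.

    Raises ValueError if the UI definition refers to a column that has
    no metadata descriptor, since a widget could not render it.
    """
    known_fields = {field["name"] for field in metadata["fields"]}
    for column in ui_definition["columns"]:
        if column not in known_fields:
            raise ValueError(f"UI column {column!r} has no metadata descriptor")
    payload = {
        "annotations": annotations,
        "metadata": metadata,
        "ui": ui_definition,
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical example content; names and values are illustrative only.
metadata = {
    "fields": [
        {"name": "gene_symbol", "label": "Gene Symbol", "type": "string"},
        {"name": "description", "label": "Description", "type": "string"},
    ]
}
ui_definition = {"widget": "DataTable", "columns": ["gene_symbol", "description"]}
annotations = [{"gene_symbol": "GENE_A", "description": "example gene"}]

response = build_response(annotations, metadata, ui_definition)
print(response)
```

Checking the UI definition against the metadata before responding is one way to keep widget configuration and data descriptors from drifting apart.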
For example, our DataTable widget (Figure 3) is a convenient
container for the display of tabular data in any context throughout the
interface. It automatically provides support for configuring column
layout, multi-column sorting, data export, field-specific URLs, row
selection, and a menu of actions that can be triggered against selected
items.
Figure 3.
DataTable widget displaying the gene results of a search for gpcr*, sorted by Gene Symbol, with several highlighted genes.
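Two of the DataTable features listed above, multi-column sorting and data export, can be illustrated with a short Python sketch. It is not the widget's code, which runs in the browser; the row data, the column names, and the `multi_sort` and `export_csv` helpers are hypothetical.

```python
import csv
import io
from operator import itemgetter


def multi_sort(rows, sort_keys):
    """Sort rows by several columns.

    sort_keys is a list of (field, descending) pairs in priority order.
    Keys are applied from lowest to highest priority, relying on the
    stability of Python's sort to preserve earlier orderings among ties.
    """
    ordered = list(rows)
    for field, descending in reversed(sort_keys):
        ordered.sort(key=itemgetter(field), reverse=descending)
    return ordered


def export_csv(rows, columns):
    """Return the chosen columns of rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows; symbols and counts are illustrative only.
rows = [
    {"symbol": "GENE_C", "chromosome": "1", "assay_count": 2},
    {"symbol": "GENE_A", "chromosome": "2", "assay_count": 7},
    {"symbol": "GENE_B", "chromosome": "1", "assay_count": 5},
]

# Chromosome ascending, then assay count descending within each chromosome.
ordered = multi_sort(rows, [("chromosome", False), ("assay_count", True)])
print(export_csv(ordered, ["symbol", "assay_count"]))
```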
Our charting widgets (Figure 4) leverage the Google charting API
to enable insertion of both static and dynamically constructed
histograms, pie charts, scatter plots, and other visualizations. They can
be used for both data analysis and system administration.
Figure 4.
3D pie chart and bar chart with interquartile range (IQR), median, and mean.
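The summary statistics named in the Figure 4 caption can be computed as in the Python sketch below, before the values are handed to a charting widget. The sample values and the `summarize` helper are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; the convention used by the BioLink charts is not stated.

```python
import statistics


def summarize(values):
    """Return mean, median, quartiles, and IQR for a list of numbers."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical measurements; values are illustrative only.
summary = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Other quartile conventions give slightly different values on small samples, so it is worth stating which one a chart uses.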
Our BioLink Explorer system is a publicly accessible application that
exemplifies some of the key features of the full Enterprise platform.
It integrates knowledge of human genes, biological pathways and
interactions, disease associations, functional annotations, PubChem
bioassays, references, and clinical trials. BioLink Enterprise is
flexible enough to be the basis for a wide range of integrated
data-centric applications where ease-of-use, scalability,
maintainability, and cost-effectiveness are key project objectives.
Please contact us for more information about how BioLink can address
your data integration and application development needs.