We are excited to announce that automated annotations of
pharmacogenomic information in the scientific literature are now available from
PharmGKB. These annotations were produced by PGxMine, a text-mining project developed in collaboration with
Dr. Jake Lever at Stanford University.
PGxMine uses a supervised machine learning algorithm to text-mine PubMed abstracts and full-text articles from PubMed Central. Sentences that mention both a chemical and a variant are first identified using the PubTator Central resource; those that the algorithm judges highly likely to contain PGx information are then displayed as automated annotations. PharmGKB curators will also use automated annotations to identify papers for manual curation.
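
The two-stage filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PGxMine's actual implementation: the `Sentence` type, the `toy_score` function, the example sentences (including the made-up variant ID `rs0000`) and the 0.9 threshold are all hypothetical, and a keyword stub stands in for the trained classifier. The sketch keeps only sentences that mention both a chemical and a variant, then keeps only those whose score clears a high threshold.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sentence:
    """A sentence with entity mentions already tagged (hypothetical structure)."""
    text: str
    chemicals: list[str] = field(default_factory=list)
    variants: list[str] = field(default_factory=list)


def candidate_sentences(sentences: list[Sentence]) -> list[Sentence]:
    """Stage 1: keep sentences mentioning at least one chemical and one variant."""
    return [s for s in sentences if s.chemicals and s.variants]


def automated_annotations(
    sentences: list[Sentence],
    score: Callable[[Sentence], float],
    threshold: float = 0.9,  # hypothetical cut-off; a high value favours precision
) -> list[Sentence]:
    """Stage 2: keep candidates whose classifier score meets the threshold."""
    return [s for s in candidate_sentences(sentences) if score(s) >= threshold]


def toy_score(s: Sentence) -> float:
    """Hypothetical stand-in for a trained classifier: simple keyword matching."""
    cues = ("associated with", "response", "toxicity", "dose")
    return 0.95 if any(cue in s.text.lower() for cue in cues) else 0.2


if __name__ == "__main__":
    # Invented example sentences, for illustration only.
    sentences = [
        Sentence("rs0000 was associated with reduced response to warfarin.",
                 chemicals=["warfarin"], variants=["rs0000"]),
        Sentence("Warfarin was administered to all patients.",
                 chemicals=["warfarin"]),
        Sentence("rs0000 and warfarin were both listed in Table 2.",
                 chemicals=["warfarin"], variants=["rs0000"]),
    ]
    for s in automated_annotations(sentences, toy_score):
        print(s.text)
```

Here the second sentence is dropped at stage 1 (no variant) and the third at stage 2 (low score), so only the first is reported. Raising the threshold trades recall for precision.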
The new automated annotations tab can now be found on drug,
gene, variant and haplotype pages on the PharmGKB website. Each automated
annotation displays the relevant sentence identified by PGxMine as well as
information about the article where the sentence was found. PGxMine was
deliberately designed for high precision at the cost of lower recall: most sentences it reports should contain genuine PGx information, but it will miss some that do. As a result, PGx associations mentioned in multiple papers are likely to be captured, whereas associations mentioned in only a single paper may be missed.
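
The reasoning behind this trade-off can be made concrete with a simple model. Assume (a simplification, not a documented property of PGxMine) that each paper mentioning an association is captured independently with the same probability r, the per-paper recall. The probability that an association mentioned in n papers appears in at least one automated annotation is then 1 - (1 - r)^n. The value r = 0.5 below is purely hypothetical.

```python
def capture_probability(recall: float, n_papers: int) -> float:
    """Probability that at least one of n independent mentions is captured.

    Assumes each paper is detected independently with the same per-paper recall.
    """
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    if n_papers < 0:
        raise ValueError("n_papers must be non-negative")
    return 1.0 - (1.0 - recall) ** n_papers


if __name__ == "__main__":
    r = 0.5  # hypothetical per-paper recall, for illustration only
    for n in (1, 2, 3, 5):
        print(f"{n} paper(s): {capture_probability(r, n):.3f}")
```

With r = 0.5, an association reported in one paper has a 50% chance of being captured, while one reported in five papers has about a 97% chance. Real mentions are not independent, so this is an intuition for why a precision-oriented design still surfaces well-replicated associations, not an estimate of PGxMine's coverage.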
Unlike variant annotations and clinical annotations, which are manually curated by PharmGKB curators, automated annotations are generated solely by computational methods, and PharmGKB staff have not checked their accuracy or relevance. Users should therefore expect some noise in these annotations. Users should also note that automated annotations do not cover all published articles: articles that are only accessible through a journal subscription cannot be annotated by PGxMine and will not be displayed in the automated annotations section.
A paper describing PGxMine in greater detail has been accepted by the Pacific Symposium on Biocomputing and will be available online soon; we will add its URL as a comment to this blog post once it is published. An FAQ page about automated annotations and the PGxMine project is available on the PharmGKB website.
Future updates of our automated annotations will follow the update schedule of PubTator Central. The PGxMine code is open source and available on GitHub, and a full data dump is available on Zenodo.