PharmGKB Blog: PharmGKB releases automated annotations

Tuesday, October 1, 2019

We are excited to announce that automated annotations of pharmacogenomic information in the scientific literature are now available from PharmGKB. These annotations have been produced using the PGxMine project, the result of a collaboration with Dr. Jake Lever at Stanford University.

PGxMine uses a supervised machine learning algorithm to carry out text mining of PubMed abstracts and full-text articles from PubMed Central. Sentences that contain both a chemical and a variant are found using the PubTator Central resource. Those that are subsequently identified as highly likely to contain PGx information are highlighted as automated annotations. Automated annotations will also be used by PharmGKB curators to identify papers for manual curation.
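The two-stage process described above (find sentences that mention both a chemical and a variant, then keep only those a classifier scores highly) can be sketched roughly as follows. This is an illustrative toy, not the PGxMine code: the `Mention` and `Sentence` structures, the keyword-based `toy_score` function standing in for the trained classifier, the threshold value and the example sentences are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mention:
    text: str
    kind: str  # hypothetical labels: "Chemical" or "Variant"


@dataclass
class Sentence:
    text: str
    mentions: List[Mention] = field(default_factory=list)


def candidate_pairs(sentence: Sentence) -> List[Tuple[str, str]]:
    """Return every (chemical, variant) pair that co-occurs in one sentence."""
    chemicals = [m.text for m in sentence.mentions if m.kind == "Chemical"]
    variants = [m.text for m in sentence.mentions if m.kind == "Variant"]
    return [(c, v) for c in chemicals for v in variants]


def toy_score(sentence: Sentence) -> float:
    """Stand-in for a trained classifier: fraction of cue words present."""
    cues = ("associated", "response", "dose")
    lowered = sentence.text.lower()
    return sum(cue in lowered for cue in cues) / len(cues)


def annotate(sentences: List[Sentence], threshold: float = 0.5):
    """Keep co-occurring pairs only from sentences scoring at or above threshold."""
    annotations = []
    for sentence in sentences:
        pairs = candidate_pairs(sentence)
        if pairs and toy_score(sentence) >= threshold:
            annotations.extend((c, v, sentence.text) for c, v in pairs)
    return annotations


# Made-up example sentences with entity mentions already tagged.
EXAMPLE_SENTENCES = [
    Sentence(
        "rs9923231 was associated with warfarin dose requirements.",
        [Mention("warfarin", "Chemical"), Mention("rs9923231", "Variant")],
    ),
    Sentence(
        "Patients received warfarin and were genotyped for rs9923231.",
        [Mention("warfarin", "Chemical"), Mention("rs9923231", "Variant")],
    ),
    Sentence(
        "Warfarin response varied widely.",
        [Mention("Warfarin", "Chemical")],
    ),
]
```

Calling `annotate(EXAMPLE_SENTENCES)` keeps only the first sentence. The second mentions both entities but scores below the threshold, and the third has no variant, so neither yields an annotation.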

The new automated annotations tab can now be found on drug, gene, variant and haplotype pages on the PharmGKB website. Each automated annotation displays the relevant sentence identified by PGxMine as well as information about the article where the sentence was found. PGxMine was deliberately designed to have a high level of precision at the expense of a lower recall rate. This means that PGx associations that are mentioned in multiple papers should be captured by the algorithm while associations mentioned in only one paper may be missed.
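To make the precision/recall trade-off concrete, here is a minimal sketch using invented classifier scores and labels (not PGxMine data). It shows how raising a score threshold can increase precision while lowering recall. Under the simplifying assumption that each mention is caught independently, it also shows why an association mentioned in several papers is more likely to be captured at least once.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when scores >= threshold count as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def chance_captured(recall, mentions):
    """Probability that at least one of `mentions` sentences is caught.

    Assumes each mention is detected independently with probability `recall`,
    which is a simplification made for this sketch.
    """
    return 1 - (1 - recall) ** mentions


# Invented scores and ground-truth labels for eight candidate sentences.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, True, False, True, False, True, False]

for threshold in (0.5, 0.75):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

for n in (1, 3):
    print(f"{n} mention(s): chance captured = {chance_captured(0.6, n):.3f}")
```

With these toy numbers, moving the threshold from 0.5 to 0.75 removes the one false positive but also drops a true positive, so precision rises and recall falls.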

Unlike variant annotations or clinical annotations, which are manually curated by PharmGKB curators, automated annotations are found using computational methods only. The accuracy or relevance of these annotations has not been checked by PharmGKB staff. Users should therefore be aware that there is some noise associated with these annotations. Users should also note that this is not a comprehensive annotation of all published articles. Articles which are only accessible through a journal subscription cannot be annotated by PGxMine and will not be displayed in the automated annotations section.

A paper describing PGxMine in greater detail has been accepted by the Pacific Symposium on Biocomputing and will be available online soon. We will add the URL as a comment to this blog post as soon as it is available. An FAQ page about automated annotations and the PGxMine project can be found on the PharmGKB website.

Future updates of our automated annotations will be tied to the update schedule of PubTator Central. The PGxMine code is open source and available on GitHub, and a full data dump is available on Zenodo.

1 comment:

Anonymous said (December 5, 2019 at 5:10 PM): The PGxMine paper can be accessed here: https://psb.stanford.edu/psb-online/proceedings/psb20/Lever.pdf