Monday, July 11, 2016

Improvements for Variant data in PharmGKB

PharmGKB is constantly improving the way we organize and annotate data in the field of pharmacogenomics. One of our recent improvements involves how we store and refer to Variants. Genetic variation is central to what we do here at PharmGKB, so we need a clear and effective way to store and use genetic variations throughout the site and the services we offer.

The primary change is that Variant objects are now assigned PharmGKB Accession Identifiers (i.e. PA numbers). You’ve probably already seen that Genes and Chemicals have Accession Identifiers (e.g. CYP2C19 has id PA124), and now that same system of identifiers is extended to the Variants that we annotate. This may not seem like a big change on the outside, but it has big implications for how we can use Variants in our tools and through our API. One of these implications is that we’ll be able to more easily integrate rare variants and variations that aren’t tied to records in dbSNP.

The second change is that we will only store data on variants that we’ve annotated. It was previously possible to link to https://www.pharmgkb.org/rsid/{rsid} and get a summary of the variant whether PharmGKB had annotated that dbSNP record or not. PharmGKB has a large number of annotations on pharmacogenomic knowledge, but they cover a very small percentage of the 153,953,962 variants cataloged in the current release of dbSNP. Storing every dbSNP record made our system bloated and more complex, so we trimmed it down. PharmGKB curators now add Variant records as they annotate them and can update Variant information when necessary.

Other improvements to Variant records include:
  • Each variant gets a mandatory name and an optional symbol (e.g. rs# if applicable).
  • Variants have a primary sequence location that we consider canonical for our annotations, but this location is optional for variants that haven’t been located on a specific sequence yet.
  • Variants can also have alternate sequence locations. For example, our primary location for a variant is on GRCh37 but we may have an alternate sequence location for GRCh38.
  • We now have better support for tracking alternate names given to a Variant in publications.
  • Variants will use the GRCh37 assembly for their primary location when possible.
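
The properties listed above can be pictured as a small record type. The following is only a rough sketch in Python: the class and field names (Variant, SequenceLocation, accession_id and so on) are our own illustrative assumptions, not PharmGKB's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SequenceLocation:
    """A position on a named assembly (field names are hypothetical)."""
    assembly: str    # e.g. "GRCh37" or "GRCh38"
    chromosome: str
    position: int


@dataclass
class Variant:
    """Illustrative Variant record; not PharmGKB's real data model."""
    accession_id: str                                     # PA number
    name: str                                             # mandatory
    symbol: Optional[str] = None                          # e.g. an rs# when one applies
    primary_location: Optional[SequenceLocation] = None   # optional until located
    alternate_locations: List[SequenceLocation] = field(default_factory=list)
    alternate_names: List[str] = field(default_factory=list)

    def location_on(self, assembly: str) -> Optional[SequenceLocation]:
        """Return the primary or an alternate location on the given assembly, if any."""
        candidates = list(self.alternate_locations)
        if self.primary_location is not None:
            candidates.insert(0, self.primary_location)
        for location in candidates:
            if location.assembly == assembly:
                return location
        return None
```

In this sketch a variant with no known position simply leaves primary_location unset, and a GRCh38 position would sit in alternate_locations next to a GRCh37 primary location.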

Existing URLs for Variant pages on PharmGKB keep the same format (i.e. https://www.pharmgkb.org/rsid/{rsid}), and we also support a new URL format, https://www.pharmgkb.org/variant/{PA#}. The layout of the Variant page has been updated slightly to add the new information about sequence locations and other properties.
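
For anyone generating links programmatically, the two URL formats could be handled along these lines. This is a minimal sketch: the helper name variant_url is hypothetical, and the identifier patterns ("rs" or "PA" followed by digits) are assumptions based on the examples in this post rather than a published specification.

```python
import re

BASE_URL = "https://www.pharmgkb.org"

# Assumed identifier shapes: "rs" + digits for dbSNP IDs,
# "PA" + digits for PharmGKB Accession Identifiers.
_RSID = re.compile(r"rs\d+")
_PA_ID = re.compile(r"PA\d+")


def variant_url(identifier: str) -> str:
    """Build a Variant page URL from an rsID or a PA accession identifier."""
    if _RSID.fullmatch(identifier):
        return f"{BASE_URL}/rsid/{identifier}"
    if _PA_ID.fullmatch(identifier):
        return f"{BASE_URL}/variant/{identifier}"
    raise ValueError(f"Unrecognized variant identifier: {identifier!r}")
```

Keep in mind that an rsID link will only lead to a Variant page if PharmGKB has annotated that variant.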

Keep an eye out for more changes to Variant pages and data as we take advantage of these new features.
