SynVar

Variant expansion and normalisation

Background

Genetic variants are attracting growing interest for their role in disease, in the design of new drugs, and in refining treatment efficacy through patient stratification. However, variant interpretation depends on time-consuming curation. To support interpretation efforts and decisions based on the latest evidence, we propose Variomes [2], a service that performs variant-specific triage of publications.

To increase the comprehensiveness of Variomes, we developed SynVar, a tool for variant expansion and for normalisation of variant representations across description levels. This task faces several challenges:

While several variant databases and registries exist, such as ClinVar, dbSNP, and the ClinGen Allele Registry, relying on them as reference terminologies presents several limitations:

Description

To enable smooth and effective retrieval of variants in the literature, we developed a variant expansion and normalisation tool. For a given variant, including variants not described in existing databases, it generates the corresponding descriptions at the genomic (g.), cDNA (transcript-based, c.), and protein (p.) levels, both in HGVS format and in many non-standard yet frequently used forms found in the literature. The input variant can be supplied at any of these description levels.
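The three description levels are distinguished by their HGVS prefix, which a client can inspect before submitting a query. The following is a minimal sketch, not SynVar's own detection logic: the function name `description_level` and the level labels are illustrative assumptions, and non-standard descriptions without a prefix are simply reported as unknown.

```python
import re

# Prefix-to-level mapping, following the g./c./p. prefixes named in the text.
# The label strings are illustrative, not values defined by SynVar.
LEVEL_BY_PREFIX = {"g": "genomic", "c": "cDNA", "p": "protein"}


def description_level(description):
    """Return the level implied by an HGVS-style prefix, or None if absent."""
    # Optional reference accession followed by a colon (e.g. "NM_000546.6:"),
    # then the one-letter level prefix and a dot.
    match = re.match(r"^(?:[^:\s]+:)?([gcp])\.", description.strip())
    return LEVEL_BY_PREFIX[match.group(1)] if match else None


if __name__ == "__main__":
    for text in ("p.Arg248Trp", "NM_000546.6:c.742C>T", "R248W"):
        print(text, "->", description_level(text))
```

A description such as `R248W` carries no prefix, so the sketch returns `None` for it; handling such forms requires more context than a prefix check can provide.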

Supported variant types

SynVar supports the following variant types according to HGVS nomenclature:

Isoform support

SynVar can recognize and process variants specified on protein isoforms. When the optional parameter iso=true is provided, the tool expands the variant to all available isoforms of the gene. The system accepts:

Example: TP53 R248W with iso=true returns variant representations for all 9 TP53 isoforms. The variant is first validated on the canonical isoform (P04637-1). If not valid there, the system automatically searches other isoforms. With iso=true, all 9 isoforms are returned regardless of which isoform was initially validated.
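The validation order described above (canonical isoform first, other isoforms only if that fails) can be illustrated with a small sketch. It assumes the isoform sequences are already available as a dictionary and checks only that the reference residue sits at the stated position; the function name, the isoform identifiers, and the toy sequences are hypothetical, and the sketch does not reproduce the `iso=true` expansion itself.

```python
def find_valid_isoforms(isoforms, ref_aa, position, canonical):
    """Return the isoform IDs whose residue at `position` (1-based) is `ref_aa`.

    The canonical isoform is checked first; the remaining isoforms are
    searched only when the canonical one does not match.
    """
    def matches(sequence):
        return 1 <= position <= len(sequence) and sequence[position - 1] == ref_aa

    if matches(isoforms[canonical]):
        return [canonical]
    return [
        iso_id
        for iso_id, sequence in isoforms.items()
        if iso_id != canonical and matches(sequence)
    ]


if __name__ == "__main__":
    # Toy sequences with made-up identifiers, not real UniProt entries.
    toy = {"ISO-1": "MEEPQRDP", "ISO-2": "MEEPQSRP"}
    print(find_valid_isoforms(toy, "R", 6, canonical="ISO-1"))  # ['ISO-1']
    print(find_valid_isoforms(toy, "R", 7, canonical="ISO-1"))  # ['ISO-2']
```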

Workflow

Use-cases

Protein variant: by default, the change is validated against the reference sequence of the canonical isoform, as retrieved through the UniProt API [3]. The valid variant is then back-translated into the possible cDNA variants using the back-translator tool from Mutalyzer [4]. Finally, each cDNA variant is mapped onto its genomic position (GRCh37 and GRCh38 builds) using VariantValidator [5].

cDNA variant: the variant is validated and mapped onto its genomic position using VariantValidator [5], which also translates it into the corresponding protein variant.

Genomic variant: if the variant is not intergenic, it is validated and converted to the corresponding cDNA variants using VariantValidator [5], which also provides the translation into protein variants. If the variant is intergenic, only genomic variant representations are generated.

dbSNP ID: the genomic variants associated with the dbSNP [6] ID are retrieved through the NCBI E-utilities services. The conversion and translation procedure from each genomic variant is the same as described above.

ClinGen Allele Registry ID: the genomic variant corresponding to the ClinGen Allele Registry ID (CA ID) is retrieved through the ClinGen Allele Registry [7]. The genomic mapping and translation are the same as described above.
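To make the back-translation step of the protein use-case more concrete, the sketch below enumerates the single-nucleotide substitutions that turn a reference codon into a codon for the target amino acid, using the standard genetic code. It is a simplified illustration and not the Mutalyzer algorithm: it ignores changes affecting two or three bases of the codon, and the reference codon is passed in by the caller rather than looked up on a transcript. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_backtranslations(ref_codon, aa_position, alt_aa):
    """List c. substitutions (one base changed) that yield `alt_aa`.

    `aa_position` is the 1-based residue number; `alt_aa` is a one-letter code.
    """
    first_base = 3 * (aa_position - 1) + 1  # c. position of the codon's first base
    changes = []
    for offset, ref_base in enumerate(ref_codon):
        for alt_base in "ACGT":
            if alt_base == ref_base:
                continue
            codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
            if CODON_TABLE[codon] == alt_aa:
                changes.append(f"c.{first_base + offset}{ref_base}>{alt_base}")
    return changes


if __name__ == "__main__":
    # Codon CGG (Arg) at residue 248 is assumed here for illustration.
    print(single_base_backtranslations("CGG", 248, "W"))  # ['c.742C>T']
```

Each candidate produced this way would still need to be validated and mapped onto the genome, as the use-case describes.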

Output

Results are returned as a list of genomic variants (defined by chromosome, position, reference allele and alternate allele), along with their corresponding cDNA (transcript-based) and protein variants, grouped by genes and isoforms. The output is in XML format. The main elements are the following:
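When exploring a response for the first time, it can help to list which elements it contains before writing a dedicated parser. The sketch below counts the element tags of any XML document with the Python standard library; it makes no assumption about SynVar's element names, and the tags in the sample string are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarise_tags(xml_text):
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


if __name__ == "__main__":
    # Hypothetical structure, used only to demonstrate the function.
    sample = "<result><gene><isoform><variant/><variant/></isoform></gene></result>"
    for tag, count in summarise_tags(sample).items():
        print(tag, count)
```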

Programmatic access

URL

https://synvar.sibils.org/generate/literature/fromMutation
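A query URL can be assembled from this endpoint with the Python standard library. In the sketch below, the parameter name `variant` and all example values are assumptions made for illustration; `ref` and `iso` are parameter names mentioned elsewhere on this page, but their accepted values should be checked against the parameter lists.

```python
from urllib.parse import urlencode

BASE_URL = "https://synvar.sibils.org/generate/literature/fromMutation"


def build_query_url(variant, **optional):
    """Build a request URL; `variant` as a parameter name is an assumption."""
    params = {"variant": variant}
    # Keep only the optional parameters that were actually given a value.
    params.update({key: value for key, value in optional.items() if value is not None})
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("R248W", ref="TP53", iso="true"))
```

`urlencode` percent-encodes characters such as `>` in cDNA or genomic descriptions, so descriptions like `c.742C>T` can be passed unchanged.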

Parameters

Optional parameters

Examples

Substitutions (SNPs)

Deletions

Duplications and Insertions

Isoform-specific queries

Database identifiers

Special cases with map parameter

Automatic detection (without ref or level parameters)

Variant extraction from complex text

Normalization only (norm parameter)

Output formats

Search interface

Fields

Template programs

Example scripts to query the service and parse the output:
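As an illustration of what such a script might look like, the following minimal sketch downloads a response and parses it as XML using only the Python standard library. It is not one of the original template programs: the function name, the `opener` argument (which lets the network call be replaced in tests), and the timeout value are assumptions.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url, opener=urllib.request.urlopen, timeout=30):
    """Download `url` and return the root element of the parsed XML response."""
    with opener(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


def print_tree(element, depth=0):
    """Print the element hierarchy with indentation, one tag per line."""
    print("  " * depth + element.tag)
    for child in element:
        print_tree(child, depth + 1)
```

Calling `print_tree(fetch_xml(url))` with a full query URL prints the element hierarchy of the response, which is a convenient starting point for writing a parser for the fields of interest.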

References

  1. Mottaz A, Pasche E, Michel PA, Mottin L, Teodoro D, Ruch P. Designing an Optimal Expansion Method to Improve the Recall of a Genomic Variant Curation-Support Service. Stud Health Technol Inform. 2022 May 25;294:839-843. doi: 10.3233/SHTI220603.
  2. Pasche E, Mottaz A, Caucheteur D, Gobeill J, Michel PA, Ruch P. Variomes: a high recall search engine to support the curation of genomic variants. Bioinformatics. 2022 Apr 28;38(9):2595-2601. doi: 10.1093/bioinformatics/btac146.
  3. The UniProt Consortium (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523–D531. https://doi.org/10.1093/nar/gkac1052
  4. den Dunnen J. T. (2016). Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer. Current protocols in human genetics, 90, 7.13.1–7.13.19. https://doi.org/10.1002/cphg.2
  5. Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J., & Dalgleish, R. (2018). VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions. Human mutation, 39(1), 61–68. https://doi.org/10.1002/humu.23348
  6. Smigielski, E. M., Sirotkin, K., Ward, M., & Sherry, S. T. (2000). dbSNP: a database of single nucleotide polymorphisms. Nucleic acids research, 28(1), 352–355. https://doi.org/10.1093/nar/28.1.352
  7. Pawliczek, P., Patel, R. Y., Ashmore, L. R., Jackson, A. R., Bizon, C., Nelson, T., Powell, B., Freimuth, R. R., Strande, N., Shah, N., Riegel, B., Meeks, M., Levy, M. A., Kattman, B., Berg, J. S., & Harrison, S. M. (2018). ClinGen Allele Registry links information about genetic variants. Human mutation, 39(11), 1690–1701. https://doi.org/10.1002/humu.23637