SynVar

Variant expansion and normalisation

Background

Genetic variants are drawing increasing interest for their role in pathologies, for the design of new drugs, and for refining treatment efficacy through stratification. However, variant interpretation depends on time-consuming curation tasks. To support variant interpretation efforts and decisions based on the latest evidence, we propose Variomes [1, 2], a service performing variant-specific triage of publications.

To increase the comprehensiveness of Variomes, we developed SynVar. This tool performs variant expansion and normalisation of representations across description levels. This task faces several challenges:

While several variant databases and registries exist, such as ClinVar, dbSNP, and the ClinGen Allele Registry, relying on them as reference terminologies presents several limitations:

Description

To enable smooth and effective retrieval of variants in the literature, we developed a variant expansion and normalisation tool. For a given variant (including variants not described in existing databases), it generates the corresponding descriptions at the genomic (g.), cDNA (transcript-based, c.), and protein (p.) levels, both in HGVS format and in many non-standard yet frequently used forms found in the literature. Expansion and normalisation can start from any description level.

Supported variant types

SynVar supports the following variant types according to HGVS nomenclature:

Supported input formats

In addition to HGVS notation, variants can be provided in the following formats, which are automatically converted to HGVS before processing:

Expansion options

The following optional parameters generate additional synonyms in mode=expand:

Workflow

Use-cases

Protein variant: by default, the change is validated against the reference sequence of the canonical isoform, as retrieved through the UniProt API [3]. The valid variant is then back-translated into the possible cDNA variants using the back-translator tool from Mutalyzer [4]. Finally, each cDNA variant is mapped onto its genomic position (GRCh37 and GRCh38 builds) using VariantValidator [5].

cDNA variant: the variant is validated and mapped onto its genomic position using VariantValidator [5], which also translates it into the corresponding protein variant.

Genomic variant: the variant is validated and converted to the cDNA variants using VariantValidator [5], if not intergenic. VariantValidator also provides the translation into protein variants. If intergenic, only genomic variant representations are generated.

dbSNP id: the genomic variants associated with the dbSNP [6] id are retrieved through the NCBI E-utilities services. Conversion and translation then proceed as described above for genomic variants.

ClinGen Allele Registry ID: the genomic variant corresponding to the ClinGen Allele Registry ID (CA ID) is retrieved through the ClinGen Allele Registry [7]. Conversion and translation then proceed as described above for genomic variants.
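The use-cases above can be summarised as a small routing table. The following is only an illustrative sketch of the described order of steps: the function name `plan_steps` and the step wording are hypothetical, and nothing here calls UniProt, Mutalyzer, VariantValidator, NCBI, or ClinGen.

```python
from typing import List


def plan_steps(level: str, intergenic: bool = False) -> List[str]:
    """Return the processing steps described in the use-cases for one input level.

    `level` reuses values of the `level` parameter (protein, cdna, genome,
    dbsnp, clingen). The step strings paraphrase the documentation; they are
    not produced by the service itself.
    """
    # Steps applied to a genomic variant; intergenic variants stop early.
    genome_steps = ["validate genomic variant (VariantValidator)"]
    if intergenic:
        genome_steps.append("generate genomic representations only")
    else:
        genome_steps.append("convert to cDNA variants (VariantValidator)")
        genome_steps.append("translate to protein variants (VariantValidator)")

    if level == "protein":
        return [
            "validate on canonical isoform sequence (UniProt API)",
            "back-translate to cDNA variants (Mutalyzer back-translator)",
            "map cDNA variants to GRCh37 and GRCh38 (VariantValidator)",
        ]
    if level == "cdna":
        return [
            "validate and map to genome (VariantValidator)",
            "translate to protein variant (VariantValidator)",
        ]
    if level == "genome":
        return genome_steps
    if level == "dbsnp":
        # Identifier is first resolved to genomic variants, then handled as such.
        return ["retrieve genomic variants (NCBI E-utilities)"] + genome_steps
    if level == "clingen":
        return ["retrieve genomic variant (ClinGen Allele Registry)"] + genome_steps
    raise ValueError(f"unsupported level: {level!r}")
```

For example, `plan_steps("dbsnp")` yields the retrieval step followed by the same steps as `plan_steps("genome")`.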

Output

Results are returned as a list of genomic variants, along with their corresponding cDNA (transcript-based) and protein variants, grouped by genes and isoforms. The output content depends on the mode parameter:

mode=expand (default): full variant expansion with the following elements:

mode=normalize: normalized identifiers without syntactic variations:

The output format is controlled by the format parameter: xml (default), json (same structure in JSON), or vrs (GA4GH VRS-structured JSON including a VRS Allele object derived from the SPDI representation; the vrs format implies mode=normalize).
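The interaction between the mode and format parameters can be sketched as follows. This is a client-side illustration of the documented defaults and of the rule that vrs implies mode=normalize; the helper name `resolve_output` is hypothetical, and the service's own handling of conflicting values is not described here.

```python
from typing import Optional, Tuple

MODES = ("expand", "normalize")
FORMATS = ("xml", "json", "vrs")


def resolve_output(mode: Optional[str] = None, fmt: Optional[str] = None) -> Tuple[str, str]:
    """Return the effective (mode, format) pair for a request.

    Defaults follow the documentation: mode=expand and format=xml.
    The vrs format always results in mode=normalize.
    """
    fmt = fmt or "xml"
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if fmt == "vrs":
        # vrs implies normalize, whatever mode was requested.
        return "normalize", "vrs"
    mode = mode or "expand"
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode, fmt
```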

Programmatic access

URL

https://synvar.sibils.org/api

The previous URL /generate/literature/fromMutation is still supported for backward compatibility.

Parameters

Each parameter is listed with its description, possible values, an example, and its default value.

variant (mandatory)
Description: Variant description, ClinGen Allele Registry ID, or dbSNP id. Can include the gene/reference or free text containing variants. Also accepts SPDI, VCF, IVS, and HGVS repeat notation as input.
Possible values: free text (variant in standard or non-standard format, with or without gene/reference)
Example: V600E, BRAF V600E, c.1799T>A, NM_004333.6:c.1799T>A, rs113488022, CA251544, NC_000007.14:140753335:A:T (SPDI), 7:140753336:A:T (VCF), IVS1+1G>A
Default value: no default value

ref (optional)
Description: Gene name or chromosome number/name. Optional when included in the variant field, or when using dbSNP/ClinGen identifiers.
Possible values: free text (gene name, chromosome number/name, sequence accession: RefSeq NM_/NP_/NC_, Ensembl ENST/ENSP, LRG)
Example: BRAF, JAK2, 9, X
Default value: no default value

level (optional)
Description: Level of the variant description. When set to any, the level is detected automatically. The genome38/genome37 shortcuts combine level=genome with assembly filtering.
Possible values: protein, cdna (or transcript), genome, genome38 (or genome_grch38), genome37 (or genome_grch37), dbsnp, clingen, any
Example: protein
Default value: any

iso (optional)
Description: Expand to all available isoforms of the gene.
Possible values: true, false
Example: true
Default value: false

map (optional)
Description: Require genome mapping. When true, results are only returned if genome mapping succeeds. When false, syntactic variations are output even without successful genome mapping.
Possible values: true, false
Example: true
Default value: false

mode (optional)
Description: Processing mode. expand generates all synonyms and syntactic variations. normalize returns only normalized identifiers (HGVS, dbSNP, ClinGen, SPDI, VCF) without syntactic variations. The previous parameter norm=true is equivalent to mode=normalize.
Possible values: expand, normalize
Example: normalize
Default value: expand

format (optional)
Description: Output format. xml and json return the same structure in different formats. vrs returns a GA4GH VRS-structured JSON with HGVS, SPDI, VCF and VRS Allele (implies mode=normalize).
Possible values: xml, json, vrs
Example: json
Default value: xml

startMet (optional)
Description: Enable Start Met ±1 shift. Generates additional protein synonyms at position−1 and accepts input with +1 fallback (e.g. BRAF V600E also generates V599E).
Possible values: true, false
Example: true
Default value: false

insForDup (optional)
Description: Generate insertion-equivalent synonyms for duplications (e.g. A763dup → A763_Y764insA).
Possible values: true, false
Example: true
Default value: false

leftAlign (optional)
Description: Generate left-aligned (shifted) synonyms for deletions and duplications in repetitive regions. HGVS mandates 3' alignment; this adds left-aligned and intermediate forms.
Possible values: true, false
Example: true
Default value: false

assembly (optional)
Description: Restrict genomic mapping to a specific genome assembly. When not specified, both GRCh38 and GRCh37 mappings are returned. Alternatively, the assembly can be specified via the level parameter: genome38 or genome_grch38 for GRCh38, genome37 or genome_grch37 for GRCh37.
Possible values: GRCh38, GRCh37, hg38, hg19
Example: GRCh38
Default value: both assemblies
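As an illustration of how these parameters might be combined into a request, here is a minimal sketch that validates values against the lists above and assembles a URL. It assumes the parameters are passed as a GET query string on the URL given earlier, which this page does not state explicitly; `build_url` and `ALLOWED` are hypothetical names.

```python
from urllib.parse import urlencode

BASE_URL = "https://synvar.sibils.org/api"

BOOLEAN = {"true", "false"}
# Possible values copied from the parameter list; `variant` and `ref` are free text.
ALLOWED = {
    "level": {"protein", "cdna", "transcript", "genome", "genome38",
              "genome_grch38", "genome37", "genome_grch37", "dbsnp",
              "clingen", "any"},
    "iso": BOOLEAN,
    "map": BOOLEAN,
    "mode": {"expand", "normalize"},
    "format": {"xml", "json", "vrs"},
    "startMet": BOOLEAN,
    "insForDup": BOOLEAN,
    "leftAlign": BOOLEAN,
    "assembly": {"GRCh38", "GRCh37", "hg38", "hg19"},
}


def build_url(variant: str, **params) -> str:
    """Build a request URL, rejecting unknown parameters and values."""
    if not variant:
        raise ValueError("variant is mandatory")
    query = {"variant": variant}
    for name, value in params.items():
        if isinstance(value, bool):
            # Python booleans are sent as the documented true/false strings.
            value = "true" if value else "false"
        if name == "ref":
            query[name] = str(value)
            continue
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid value for {name}: {value!r}")
        query[name] = value
    # urlencode percent-encodes characters such as '>' and ':'.
    return f"{BASE_URL}?{urlencode(query)}"
```

With this sketch, `build_url("V600E", ref="BRAF", level="protein", format="json")` produces a URL ending in `?variant=V600E&ref=BRAF&level=protein&format=json`.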

Examples

Substitutions (SNPs)

Deletions

Duplications and Insertions

Deletion-insertions (delins)

Frameshifts

Isoform-specific queries

Database identifiers

SPDI and VCF input

IVS (Intervening Sequence) notation

HGVS repeat notation

Assembly-specific queries

Special cases with map parameter

Automatic detection (without ref or level parameters)

Variant extraction from complex text

Normalization (mode parameter)

Output formats

Advanced options

Search interface

Fields

Template programs

Example scripts to query the service and parse the output:
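A minimal Python sketch of such a script is given below. It is not one of the official template programs: it assumes the service accepts GET requests with query-string parameters on the URL given above, requests format=json, and returns the decoded JSON as-is, since the JSON structure is not detailed on this page. The names `make_request_url` and `fetch_synvar` are hypothetical.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://synvar.sibils.org/api"


def make_request_url(variant: str, ref: Optional[str] = None,
                     level: Optional[str] = None) -> str:
    """Assemble the request URL; format=json is always requested."""
    params = {"variant": variant, "format": "json"}
    if ref:
        params["ref"] = ref
    if level:
        params["level"] = level
    return API_URL + "?" + urlencode(params)


def fetch_synvar(variant: str, ref: Optional[str] = None,
                 level: Optional[str] = None, timeout: float = 30.0) -> Any:
    """Query the service and return the decoded JSON response.

    Network and HTTP errors from urlopen are left to propagate to the caller.
    """
    url = make_request_url(variant, ref, level)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_synvar("V600E", ref="BRAF", level="protein")` would send one request for the BRAF V600E protein variant; the returned object can then be inspected to locate the genes, isoforms, and variant lists described in the Output section.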

References

  1. Mottaz A, Pasche E, Michel PA, Mottin L, Teodoro D, Ruch P. Designing an Optimal Expansion Method to Improve the Recall of a Genomic Variant Curation-Support Service. Stud Health Technol Inform. 2022 May 25;294:839-843. doi: 10.3233/SHTI220603.
  2. Pasche E, Mottaz A, Caucheteur D, Gobeill J, Michel PA, Ruch P. Variomes: a high recall search engine to support the curation of genomic variants. Bioinformatics. 2022 Apr 28;38(9):2595-2601. doi: 10.1093/bioinformatics/btac146.
  3. The UniProt Consortium (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523–D531. https://doi.org/10.1093/nar/gkac1052
  4. den Dunnen J. T. (2016). Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer. Current protocols in human genetics, 90, 7.13.1–7.13.19. https://doi.org/10.1002/cphg.2
  5. Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J., & Dalgleish, R. (2018). VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions. Human mutation, 39(1), 61–68. https://doi.org/10.1002/humu.23348
  6. Smigielski, E. M., Sirotkin, K., Ward, M., & Sherry, S. T. (2000). dbSNP: a database of single nucleotide polymorphisms. Nucleic acids research, 28(1), 352–355. https://doi.org/10.1093/nar/28.1.352
  7. Pawliczek, P., Patel, R. Y., Ashmore, L. R., Jackson, A. R., Bizon, C., Nelson, T., Powell, B., Freimuth, R. R., Strande, N., Shah, N., Riegel, B., Meeks, M., Levy, M. A., Kattman, B., Berg, J. S., & Harrison, S. M. (2018). ClinGen Allele Registry links information about genetic variants. Human mutation, 39(11), 1690–1701. https://doi.org/10.1002/humu.23637