Overview

The primer command generates primers for verifying genotype calls in a VCF. Primers can be generated for verifying genotypes using snip-SNPs, indels (based on product size), or Sanger sequencing.

All of the primer subcommands require you to specify a reference using --ref. A reference can be obtained using the vk genome command.

Options

  • --ref - Set the reference genome to use. Obtain genomes using vk genome.
  • --region - Restrict primer generation to a specific region.
  • --samples - Output genotypes for a sample or set of samples.
  • --template - The template to use for generating primers.
  • --size - For the template command, sets the size of the region output, specified as an integer (e.g. 300). For sanger, sets the target amplicon size, specified as a range (e.g. 400-600).
  • --polymorphic - Only outputs variants where there is at least one 0/0 and one 1/1 genotype.
  • --nprimers - Specify the number of primer sets desired for each variant.

Output

Output is in tab-separated value (TSV) format. The following columns are always output:

  • CHROM - chromosome/contig
  • POS - chromosomal position
  • region - The region used as a template to generate primers. Note that the amplicons will be smaller than this region.
  • REF - Reference allele
  • ALT - Alternative allele
  • template_sample - Template used for generating primers. Default is to use ALT calls.

These additional columns are output for snip, indel, and sanger options:

  • variant_count - Number of variable sites within interval among the samples specified. Includes missing/heterozygous sites. A high variant count should be avoided.
  • primer_left - Left PCR primer
  • primer_right - Right PCR primer
  • melting_temperature - Melting temperature for the left and right primers (TM), separated by a comma.
  • amplicon_length - Length of the amplified piece of DNA, in base pairs
  • amplicon_region - Genomic region (Chromosome:Start-End) of the amplified region.
  • amplicon_sequence - Sequence of the amplified region.
  • 0/0 - Comma-delimited samples with homozygous reference genotypes
  • 1/1 - Comma-delimited samples with homozygous alternative genotypes
  • polymorphic - True when there is at least one 0/0 and one 1/1 genotype.
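Because the output is plain TSV, it can be post-processed with a few lines of Python. The following is a minimal sketch, not part of VCF-kit: the function name polymorphic_rows is hypothetical, and it assumes the polymorphic column is printed as TRUE/FALSE (compared case-insensitively) and that the 0/0 and 1/1 columns hold comma-delimited sample names as described above.

```python
import csv
import io


def polymorphic_rows(tsv_text):
    """Return the rows of `vk primer` TSV output flagged as polymorphic.

    Each row is a dict keyed by column name. The 0/0 and 1/1 columns
    are split into lists of sample names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # Assumption: the flag is written as TRUE/FALSE (any letter case).
        if row["polymorphic"].strip().lower() != "true":
            continue
        for genotype in ("0/0", "1/1"):
            # Empty cells become empty lists rather than [""].
            row[genotype] = [s for s in row[genotype].split(",") if s]
        kept.append(row)
    return kept
```

Pass the function the full text of an output file, header line included.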

Additionally, there are columns specific to the type of primers being generated.

snip output columns

  • ref_sites - Cut positions : Product sizes for the reference genotype (see example below)
  • alt_sites - Cut positions : Product sizes for the alternative genotype (see example below)
  • restriction_enzyme - The restriction enzyme to use.
  • restriction_site - The sequence/motif used by the restriction enzyme
  • restriction_site_length - Length of the restriction site

ref_sites/alt_sites are denoted as the following example illustrates:

ref_sites: 183:183,472 - There is one restriction site at position 183; Cutting that site produces two products. One is 183 bp, the other is 472 bp.

The alternative genotypes produce a different set of restriction sites and product sizes:

alt_sites: 183,258:183,75,397 - Cut at 183 and 258; Product sizes are 183, 75, and 397.
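The notation above is straightforward to parse. The sketch below is illustrative only and its function names are hypothetical. It splits a ref_sites/alt_sites value into cut positions and product sizes, recomputes product sizes from the cut positions and the amplicon length (which matches the examples on this page), and lists the fragment sizes that distinguish the two genotypes on a gel.

```python
def parse_sites(sites):
    """Split 'cut,cut:size,size' into (cut_positions, product_sizes)."""
    cuts, products = sites.split(":")
    return (
        [int(c) for c in cuts.split(",")],
        [int(p) for p in products.split(",")],
    )


def product_sizes(cuts, amplicon_length):
    """Fragment sizes produced by cutting an amplicon at each position."""
    edges = [0] + sorted(cuts) + [amplicon_length]
    return [end - start for start, end in zip(edges, edges[1:])]


def distinguishing_products(ref_sites, alt_sites):
    """Product sizes unique to the reference and to the alternative genotype."""
    ref_products = set(parse_sites(ref_sites)[1])
    alt_products = set(parse_sites(alt_sites)[1])
    return sorted(ref_products - alt_products), sorted(alt_products - ref_products)


print(parse_sites("183,258:183,75,397"))   # ([183, 258], [183, 75, 397])
print(product_sizes([183, 258], 655))      # [183, 75, 397]
print(distinguishing_products("183:183,472", "183,258:183,75,397"))  # ([472], [75, 397])
```

In the example above, the 472 bp band appears only for the reference genotype, while the 75 bp and 397 bp bands appear only for the alternative genotype.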

indel output columns

  • indel_size - basepair size of interval
  • indel_type - Deletion or Insertion relative to the reference
  • REF_product_size - PCR product size with homozygous reference genotype
  • ALT_product_size - PCR product size with homozygous alternative genotype
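As a sanity check on these columns, the two product sizes would be expected to differ by the size of the indel. The sketch below rests on assumptions not confirmed by this page: that indel_size is the number of bases gained or lost, and that indel_type contains the word "deletion" or "insertion" (compared case-insensitively). The function name is hypothetical.

```python
def expected_alt_product_size(ref_product_size, indel_size, indel_type):
    """Expected ALT product size given the REF product size and the indel.

    Assumes a deletion shortens the product by indel_size and an
    insertion lengthens it by the same amount.
    """
    kind = indel_type.strip().lower()
    if kind == "deletion":
        return ref_product_size - indel_size
    if kind == "insertion":
        return ref_product_size + indel_size
    raise ValueError(f"unrecognized indel_type: {indel_type!r}")
```

A row whose ALT_product_size disagrees with this expectation is worth inspecting by hand before ordering primers.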

sanger output columns

  • variant_distance - Indicates the distance from the start of the amplicon to the variant position.

Primer3 Records

The section below describes how primers are generated using Primer3. When VCF-kit generates primers, it creates a primer configuration file in the Boulder-IO format. The options are set as follows.

The options below are always used:

PRIMER_GC_CLAMP=1
PRIMER_MAX_SIZE=20
PRIMER_MIN_SIZE=18
PRIMER_OPT_SIZE=20
PRIMER_TASK=pick_pcr_primers

The options below depend on specified options or the type of primers being generated.

PRIMER_NUM_RETURN=5 # Can be set using --nprimers
PRIMER_PRODUCT_SIZE_RANGE=600-800 # Can be set for template and sanger options.
PRIMER_THERMODYNAMIC_PARAMETERS_PATH # set to either /usr/local/share/primer3_config/ or /usr/local/share/primer3/primer3_config/

The option below changes for each variant. It is set to the region upstream and downstream of the variant:

SEQUENCE_TEMPLATE
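To make the record format concrete, here is a minimal sketch of how such a Boulder-IO record could be assembled. It is not VCF-kit's own code: the function name and its parameters are hypothetical, and the defaults simply mirror the values listed above. Boulder-IO records are TAG=VALUE lines terminated by a line containing a single "=".

```python
def boulder_record(template, num_return=5, product_size_range="600-800",
                   thermo_path=None):
    """Build a Primer3 Boulder-IO record for one variant's template sequence."""
    tags = {
        "SEQUENCE_TEMPLATE": template,
        "PRIMER_TASK": "pick_pcr_primers",
        "PRIMER_GC_CLAMP": 1,
        "PRIMER_MAX_SIZE": 20,
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_NUM_RETURN": num_return,
        "PRIMER_PRODUCT_SIZE_RANGE": product_size_range,
    }
    if thermo_path is not None:
        # Only emitted when a path is supplied by the caller.
        tags["PRIMER_THERMODYNAMIC_PARAMETERS_PATH"] = thermo_path
    lines = [f"{tag}={value}" for tag, value in tags.items()]
    lines.append("=")  # a lone "=" terminates a Boulder-IO record
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to primer3_core as its input.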

Commands

template

Template can be used to fetch the region surrounding variants. Set the size of the desired region using --size=<int>.

snip

snip-SNPs are single-nucleotide polymorphisms that modify a restriction site, resulting in a restriction fragment length polymorphism (RFLP). They are useful because they can be used to determine the genotype of a sample using only PCR and restriction enzymes. VCF-kit takes the context of SNPs and determines whether a snip-SNP is present. Then it generates primers and estimates product sizes with and without the restriction site present. snip-SNP primers can be generated using vk primer snip.

--enzymes

Restriction enzymes can be set using the --enzymes option as a comma-delimited list or by specifying one of the groups shown below:

vk primer snip --ref=WBcel235 --enzymes=DraI data/test.vcf.gz # Specify a single restriction enzyme
vk primer snip --ref=WBcel235 --enzymes=DraI,BssAI <vcf> # Specify a list by delimiting with a comma
vk primer snip --ref=WBcel235 --enzymes=HF <vcf> # Specify a group of enzymes; All/Common/HF

Examples

vk primer snip --ref=WBcel235 data/test.vcf.gz

The table has been formatted for easier reading. Normal output is tab-separated.

 CHROM | POS     | region            | REF | ALT | template_sample | variant_count | ref_sites           | alt_sites                   | restriction_enzyme | restriction_site | restriction_site_length | primer_left          | primer_right         | melting_temperature | amplicon_length | amplicon_region   | amplicon_sequence | 0/0 | 1/1   | polymorphic |
-------|---------|-------------------|-----|-----|-----------------|---------------|---------------------|-----------------------------|--------------------|------------------|-------------------------|----------------------|----------------------|---------------------|-----------------|-------------------|-------------------|-----|-------|-------------|
 I     | 1198228 | I:1197728-1198727 | G   | A   | ALT             | 1             | 93,283:93,190,471   | 93:93,661                   | NdeI               | CATATG           | 6                       | gtattcagtgggcaagcagc | GGATTAGGCCACCATCCGAG | 59.547,59.965       | 754             | I:1197943-1198697 | gtatt…            |     |       | FALSE       |
 I     | 1487691 | I:1487191-1488190 | A   | C   | ALT             | 1             | 548,654:548,106,100 | 356,548,654:356,192,106,100 | BsuRI              | GGCC             | 4                       | TCAAAGCTGTTTTTGGCGGG | CTTCCCGACAACTTTGCTGC | 59.896,60.04        | 754             | I:1487335-1488089 | TCAAA…            |     |       | FALSE       |
 I     | 1487691 | I:1487191-1488190 | A   | C   | ALT             | 1             | 548,654:548,106,100 | 356,548,654:356,192,106,100 | BsnI               | GGCC             | 4                       | TCAAAGCTGTTTTTGGCGGG | CTTCCCGACAACTTTGCTGC | 59.896,60.04        | 754             | I:1487335-1488089 | TCAAA…            |     |       | FALSE       |

Note: Amplicon sequences are truncated in the above output.

Indels

Sanger

The sanger command generates PCR primers that can be used to amplify a region of interest. The left primer can then also be used to initiate Sanger sequencing. sanger can be used to verify both indels and SNPs.

Tip

Set the amplicon size (--size) to a range between 500 and 800 bp. Your variant should lie 50-600 bp from the start of the amplicon, as indicated by variant_distance.