Overview

  vk calc sample_hom_gt <vcf>
  vk calc genotypes [--frequency] <vcf>
  vk calc spectrum <vcf>

The calc command can be used to examine the frequency or count of genotypes and alleles in a VCF in several ways.

Calculate shared homozygous genotypes

    vk calc sample_hom_gt <vcf>

The above command calculates, for each sample, how often its homozygous genotypes occur among the samples in the VCF (e.g. the number of singletons, doubletons, tripletons, etc.).

An example of the output appears in the table below. The first column gives the name of the sample. The second column gives how often the genotype occurs in the population (within the VCF). The third column gives the number of variants at which that sample carries a genotype occurring at that frequency. The first row therefore indicates that ED3052 has two singletons, and the second row indicates that ED3052 shares a genotype with exactly one other sample at three variants.

sample freq_of_gt n_gt_at_freq
ED3052 1 2
ED3052 2 3
ED3052 3 2
ED3052 4 2
ED3052 5 2
ED3052 6 1
ED3052 7 2
ED3052 8 2
ED3052 9 1
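
The tally described above can be illustrated with a minimal Python sketch on toy data. This is not vcf-kit's implementation: the `variants` structure and the helper names are hypothetical, and the sketch assumes that any homozygous call (reference or alternative) is counted, which the description above does not state explicitly.

```python
from collections import Counter

# Toy data (hypothetical): one dict per variant, mapping sample name to genotype.
variants = [
    {"A": "1/1", "B": "0/0", "C": "0/0"},
    {"A": "1/1", "B": "1/1", "C": "0/1"},
    {"A": "0/0", "B": "0/0", "C": "0/0"},
]


def is_homozygous(gt):
    """True when every allele is called and all alleles are identical."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) == 1


def sample_hom_gt(variants):
    """Count, per sample, how many variants carry a homozygous genotype
    that occurs in exactly k samples, keyed by (sample, k)."""
    tally = Counter()
    for calls in variants:
        # Keep only homozygous calls, with phasing separators normalised.
        hom = {s: gt.replace("|", "/") for s, gt in calls.items() if is_homozygous(gt)}
        shared = Counter(hom.values())  # genotype -> number of samples carrying it
        for sample, gt in hom.items():
            tally[(sample, shared[gt])] += 1
    return tally


tally = sample_hom_gt(variants)
print("sample freq_of_gt n_gt_at_freq")
for (sample, freq), n in sorted(tally.items()):
    print(sample, freq, n)
```

In this toy input, sample A has one singleton (its 1/1 call at the first variant), one genotype shared with one other sample, and one genotype shared by all three samples.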

Calculate genotype distributions

The following command outputs a table of genotype distributions:

    vk calc genotypes <vcf>

It produces output like this:

n ref het alt mis
937 14 0 0 0
328 13 0 0 1
242 13 0 1 0
168 12 0 0 2
101 11 0 0 3
94 12 0 1 1
89 12 0 2 0
73 10 0 0 4

Where:

  • n - number of variants/records with the genotype distribution shown
  • ref - number of homozygous reference genotypes
  • het - number of heterozygous genotypes
  • alt - number of homozygous alternative genotypes (biallelic variants only)
  • mis - number of missing genotypes

In the example above, the first row represents 937 records, each with 14 homozygous reference genotypes and no heterozygous, homozygous alternative, or missing genotypes.
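
As a rough illustration of how such a table can be built, the Python sketch below classifies each genotype, counts the classes per record, and then counts how many records share each combination. It is a sketch under assumptions, not vcf-kit's code: `classify`, `genotype_distribution`, and the toy `records` list are hypothetical names, and any homozygous non-reference call is treated as `alt`.

```python
from collections import Counter


def classify(gt):
    """Assign a genotype string such as '0/0' or '0|1' to ref, het, alt, or mis."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "mis"
    if len(set(alleles)) > 1:
        return "het"
    return "ref" if alleles[0] == "0" else "alt"


def genotype_distribution(records):
    """Map each (ref, het, alt, mis) count tuple to the number of records showing it."""
    dist = Counter()
    for genotypes in records:
        c = Counter(classify(gt) for gt in genotypes)
        dist[(c["ref"], c["het"], c["alt"], c["mis"])] += 1
    return dist


# Toy data (hypothetical): one list of sample genotypes per record.
records = [
    ["0/0", "0/0", "0/0"],
    ["0/0", "0/0", "./."],
    ["0/0", "0/0", "0/0"],
    ["0/0", "1/1", "0/0"],
]

dist = genotype_distribution(records)
print("n ref het alt mis")
for (ref, het, alt, mis), n in dist.most_common():
    print(n, ref, het, alt, mis)
```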

Optionally, specify the --frequency flag to report each genotype class as a frequency rather than a count:

    vk calc genotypes --frequency <vcf>

This produces output like the following:

n ref het alt mis
937 1 0 0 0
328 0.928571 0 0 0.0714286
242 0.928571 0 0.0714286 0
168 0.857143 0 0 0.142857
101 0.785714 0 0 0.214286
94 0.857143 0 0.0714286 0.0714286
89 0.857143 0 0.142857 0
73 0.714286 0 0 0.285714
62 0.785714 0 0.214286 0
46 0.714286 0 0.285714 0
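
The frequencies in this example are consistent with dividing each count by the total number of genotypes in the record (14 here): 13 reference genotypes and 1 missing genotype give 13/14 ≈ 0.928571 and 1/14 ≈ 0.0714286. The Python sketch below reproduces that arithmetic; `to_frequencies` is a hypothetical helper, not part of vcf-kit.

```python
def to_frequencies(ref, het, alt, mis):
    """Convert per-record genotype counts into fractions of all genotypes."""
    total = ref + het + alt + mis
    if total == 0:
        raise ValueError("record has no genotypes")
    return tuple(count / total for count in (ref, het, alt, mis))


# Counts taken from the second row of the count table (13 ref, 1 missing).
row = to_frequencies(13, 0, 0, 1)

# The 'g' format keeps six significant digits, as in the output shown above.
line = " ".join(format(x, "g") for x in row)
print(line)  # 0.928571 0 0 0.0714286
```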

Calculate the frequency of alleles

    vk calc spectrum <vcf>

The above command will generate the output below:

n alt_allele_freq
3 1
10 0.928571
1 0.923077
1 0.9
7 0.857143
1 0.846154
4 0.833333
2 0.818182
5 0.785714
1 0.777778
1 0.769231
7 0.75
7 0.714286

Each row gives the number of records (n) that have the given alternative allele frequency. All non-reference alleles are counted as alternative.
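
A minimal Python sketch of this kind of tally follows. It is an illustration only: `alt_allele_freq` and the toy `records` are hypothetical, and the sketch assumes that missing alleles are left out of the denominator. Values such as 0.923077 (12/13) in the output above are consistent with that assumption, but it is an inference rather than documented behaviour.

```python
from collections import Counter


def alt_allele_freq(genotypes):
    """Fraction of called alleles in one record that are non-reference.

    Returns None when the record has no called alleles.
    """
    alleles = [
        a
        for gt in genotypes
        for a in gt.replace("|", "/").split("/")
        if a != "."  # assumption: missing alleles are excluded
    ]
    if not alleles:
        return None
    return sum(a != "0" for a in alleles) / len(alleles)


# Toy data (hypothetical): one list of sample genotypes per record.
records = [
    ["1/1", "1/1", "0/0"],
    ["1/1", "./.", "0/0"],
    ["1/2", "0/0", "./."],
]

# Count records per frequency, formatted to six significant digits.
spectrum = Counter(
    format(f, "g") for f in map(alt_allele_freq, records) if f is not None
)
print("n alt_allele_freq")
for freq, n in sorted(spectrum.items(), reverse=True):
    print(n, freq)
```

Here the second and third records both have an alternative allele frequency of 0.5, because the 1 and 2 alleles in the third record are both counted as non-reference.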