Overview
vk genome location [<path>]
vk genome list
vk genome --search=<term>
vk genome ncbi [options] --ref=<asm_name> [--accession-chrom-names]
vk genome wormbase [options] --ref=<asm_name>
The genome command can be used to download and prepare reference genomes for use with other tools within vcf-kit. vcf-kit will do the following when downloading a reference genome:
- Download the reference genome.
- If downloaded from NCBI, vcf-kit will attempt to replace chromosome names with shorthand descriptors if possible (e.g. Chromosome I).
- Unzip the gzipped reference and re-compress it using bgzip.
- Create a bwa index.
- Create a samtools index.
- Create a blast database.
- Remove temporary files.
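For readers who want to see what these steps amount to, the sketch below reproduces the post-download preparation by hand. It is not vcf-kit's implementation: the helper names run and prepare_genome, the DRY_RUN switch, and the exact flags are assumptions chosen for illustration. The BLAST database is built before re-compression here because makeblastdb is given an uncompressed FASTA file. Chromosome renaming and cleanup are omitted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a manual approximation of the preparation steps.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_genome() {
  gz="$1"          # gzipped FASTA, e.g. ref.fa.gz
  fa="${gz%.gz}"   # uncompressed name, e.g. ref.fa

  run gunzip -f "$gz"                                  # unzip the download
  run makeblastdb -in "$fa" -dbtype nucl -out "$fa"    # nucleotide BLAST database
  run bgzip -f "$fa"                                   # re-compress with bgzip
  run bwa index "$gz"                                  # bwa index
  run samtools faidx "$gz"                             # samtools (faidx) index
}

# Print the commands without executing them.
DRY_RUN=1 prepare_genome ref.fa.gz
```

With DRY_RUN=1 the five commands are only printed, which makes it easy to check the plan before running it on a real file.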
Usage
View/set genome directory
By default, genomes are stored in a .genome folder within your home directory. The location of this directory can be viewed by typing:
vk genome location
Tip
You can use a shell variable to access the genome directory currently set with vcf-kit. Add this to your .bash_profile (this assumes vk genome location prints only the path):
export GENOME="$(vk genome location)"
You can then refer to the genome directory as $GENOME.
Additionally, you can change the location by adding a path:
vk genome location /path-to-my-new-genome-directory
List genomes
A list of downloaded genomes can be viewed using:
vk genome list
Download Genomes
Search NCBI
vcf-kit makes it easy to obtain and prepare genomes from the NCBI genome database. To do this, it downloads a text file listing all available genomes and searches that file. To search for a genome, you can type:
vk genome --search="Human cyclovirus"
The results of the search will be output in a table:
Genome Directory: /Users/dancook/.genome
Searching...
assembly_accession    bioproject    organism_name               asm_name         ...
--------------------  ------------  --------------------------  ---------------  ...
GCF_000908835.1       PRJNA209365   Human cyclovirus VS5700009  ViralProj209365  ...
GCF_000918035.1       PRJNA243497   Human cyclovirus            ViralProj243497  ...
To download a genome and setup for use:
vk genome --ref=<asm_name>
As the instructions illustrate, you can download a genome by providing its asm_name (assembly name).
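The search step described above can be pictured as a case-insensitive text match against a tab-separated summary file. The sketch below only illustrates that idea and is not vcf-kit's code: search_genomes is a hypothetical helper, and the four-column sample file is a cut-down stand-in built from the rows in the table (the real list has more columns).

```shell
#!/usr/bin/env bash
# Build a small tab-separated sample file (hypothetical stand-in for the genome list).
summary="$(mktemp)"
printf '%s\t%s\t%s\t%s\n' \
  assembly_accession bioproject organism_name asm_name \
  GCF_000908835.1 PRJNA209365 "Human cyclovirus VS5700009" ViralProj209365 \
  GCF_000918035.1 PRJNA243497 "Human cyclovirus" ViralProj243497 > "$summary"

# Print the header plus every row containing the term (case-insensitive).
search_genomes() {
  awk -F '\t' -v term="$1" '
    NR == 1 { print; next }
    index(tolower($0), tolower(term)) > 0 { print }
  ' "$2"
}

# Show all matches, then keep only the asm_name column (field 4) of one match.
search_genomes "human cyclovirus" "$summary"
search_genomes "VS5700009" "$summary" | tail -n +2 | cut -f 4
```

The value printed by the last command is the kind of name that --ref expects.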
Download from NCBI
Set --ref to an asm_name from the search results table to download a genome.
vk genome ncbi --ref=ViralProj209365
WormBase
Reference genomes can also be obtained from WormBase by setting --ref to a release name:
vk genome wormbase --ref=WS245
Custom Directories
Use the --directory parameter to choose the directory a genome is downloaded to. The following command downloads the genome into the current directory.
vk genome ncbi --directory="." --ref=ViralProj15089
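A per-project genome directory can be set up along the same lines. The sketch below assumes --directory accepts any existing path (the example above only shows "."). The genome_dir variable and its default are hypothetical, and the download is skipped when vk is not on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical per-project genome directory; override by setting GENOME_DIR.
genome_dir="${GENOME_DIR:-./genomes}"
mkdir -p "$genome_dir"

if command -v vk >/dev/null 2>&1; then
  # Same command as above, pointed at the project directory (assumed to be accepted).
  vk genome ncbi --directory="$genome_dir" --ref=ViralProj15089
else
  echo "vk not found on PATH; skipping download" >&2
fi
```

Keeping genomes next to the analysis makes a project easier to move, at the cost of storing duplicate copies across projects.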