anvilproject/ibd-king

KING

KING (Kinship-based Inference for GWAS) is a relationship inference tool that estimates kinship coefficients for all pairwise relationships. Unrelated pairs can be precisely separated from close relatives with no false positives, with accuracy up to 3rd- or 4th-degree (depending on array or WGS) for --related and --ibdseg analyses, and up to 2nd-degree for --kinship analysis.

The KING orchestration workflow estimates kinship coefficients from VCF files. At least two samples must be used as input to the workflow. Samples for analysis can either be in multiple single-sample VCFs, a single joint VCF, or multiple joint VCFs that do not have overlapping samples. There are four options for running KING: "ibdseg", "kinship", "related", and "duplicate". This workflow will run with the "related" option by default. Each option uses a different algorithm or specifications to estimate kinship. The output file types vary for each flag that can be run with KING. Refer to the original KING manual for more information on the differences between the options for running KING.

Note: The "ibdseg" option will fail to produce an output if no IBD segments are found. It can also fail if there are fewer than 10 samples in the analysis, in which case the "kinship" option is better suited.
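The note above can be turned into a small pre-flight check. The sketch below is a hypothetical helper and not part of the workflow; the name `choose_run_type` and the fallback to "kinship" are assumptions based only on the sample-count guidance in the note.

```python
VALID_RUN_TYPES = {"ibdseg", "kinship", "related", "duplicate"}
MIN_SAMPLES_FOR_IBDSEG = 10  # threshold taken from the note above


def choose_run_type(requested: str, num_samples: int) -> str:
    """Return a run type suited to the sample count (hypothetical helper)."""
    if requested not in VALID_RUN_TYPES:
        raise ValueError(f"run_type must be one of {sorted(VALID_RUN_TYPES)}")
    if num_samples < 2:
        raise ValueError("at least two samples are required")
    # Assumption: fall back to "kinship" when "ibdseg" is likely to fail.
    if requested == "ibdseg" and num_samples < MIN_SAMPLES_FOR_IBDSEG:
        return "kinship"
    return requested
```

The sample count could come from the `num_samples` output of the VCF tasks described later in this document.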

KING Orchestration Input Parameters

A few example JSON input files for running KING are provided in the example folder of this repo. The inputs within the JSON files are dummy paths and are not meant to be used as is. Descriptions of each input are outlined below:

| Type | Name | Req'd | Description | Default Value |
| --- | --- | --- | --- | --- |
| Array[File] | input_vcfs | Yes | VCFs for identifying related individuals; VCFs within the array must not have overlapping samples and should be gzipped | |
| Array[File] | input_vcfs_idx | Yes | Index files for input_vcfs | |
| File | input_bed | No | BED file for filtering joint VCFs; Use a BED file to increase efficiency of merging dataset VCFs | |
| String | output_basename | Yes | Basename for file outputs | |
| String | run_type | No | Type of flag to be used for running KING; Either "ibdseg", "kinship", "related", or "duplicate" | "related" |
| Int | degree | No | The maximum degree of relatedness to include in KING output | 3 |
| Boolean | missing_to_ref | No | If true, all missing variant calls will be converted to reference genotypes (0/0) when merging VCFs | false |
| String | bcftools_docker_image | No | Docker image for bcftools | "us-central1-docker.pkg.dev/mgb-lmm-gcp-infrast-1651079146/mgbpmbiofx/bcftools:1.17" |
| String | king_docker_iamge | No | Docker image with KING tools | "uwgac/topmed-master@sha256:0bb7f98d6b9182d4e4a6b82c98c04a244d766707875ddfd8a48005a9f5c5481e" |
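As an illustration of how these inputs fit together, here is a minimal sketch that assembles an inputs JSON document. The workflow name `KingOrchestration` used as the key prefix is an assumption (check the WDL for the real name), `build_inputs` is a hypothetical helper, and the paths in the usage example are placeholders.

```python
import json

WORKFLOW = "KingOrchestration"  # assumed workflow name; not confirmed by this README
RUN_TYPES = {"ibdseg", "kinship", "related", "duplicate"}


def build_inputs(vcfs, indexes, output_basename, run_type="related",
                 degree=3, missing_to_ref=False, input_bed=None):
    """Build a dict of workflow inputs using the parameter names listed above."""
    if len(vcfs) != len(indexes):
        raise ValueError("each VCF needs exactly one index file")
    if run_type not in RUN_TYPES:
        raise ValueError(f"run_type must be one of {sorted(RUN_TYPES)}")
    inputs = {
        f"{WORKFLOW}.input_vcfs": list(vcfs),
        f"{WORKFLOW}.input_vcfs_idx": list(indexes),
        f"{WORKFLOW}.output_basename": output_basename,
        f"{WORKFLOW}.run_type": run_type,
        f"{WORKFLOW}.degree": degree,
        f"{WORKFLOW}.missing_to_ref": missing_to_ref,
    }
    if input_bed is not None:  # optional input, omitted when not supplied
        inputs[f"{WORKFLOW}.input_bed"] = input_bed
    return inputs


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    example = build_inputs(
        ["cohort_a.vcf.gz", "cohort_b.vcf.gz"],
        ["cohort_a.vcf.gz.tbi", "cohort_b.vcf.gz.tbi"],
        "my_king_run",
    )
    print(json.dumps(example, indent=2))
```

Leaving the Docker image inputs out of the JSON lets the workflow fall back to the defaults in the table.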

KING Orchestration Output Parameters

| Type | Name | When | Description |
| --- | --- | --- | --- |
| File | kin_output | When either the --kinship or --related flag is used | .kin file that contains kinship coefficients of individuals |
| File | kin0_output | When either the --kinship or --related flag is used | Second .kin file that contains kinship coefficients of between-family relationship checking |
| File | seg_output | When the --ibdseg flag is used | .seg file that contains kinship coefficients and inferred relationships of samples |
| File | con_output | When the --duplicate flag is used | .con file that contains only duplicate individuals |
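The relationship between run type and outputs can be summarized as a lookup. This is a sketch only: `expected_outputs` is a hypothetical helper, and the assumption that each file is named `output_basename` plus the extension shown in the table has not been checked against the WDL.

```python
# Output names and extensions as listed in the table above.
KIN_OUTPUTS = {"kin_output": ".kin", "kin0_output": ".kin0"}
OUTPUTS_BY_RUN_TYPE = {
    "kinship": KIN_OUTPUTS,
    "related": KIN_OUTPUTS,
    "ibdseg": {"seg_output": ".seg"},
    "duplicate": {"con_output": ".con"},
}


def expected_outputs(run_type: str, output_basename: str) -> dict:
    """Map each expected workflow output to an assumed file name."""
    try:
        suffixes = OUTPUTS_BY_RUN_TYPE[run_type]
    except KeyError:
        raise ValueError(f"unknown run_type: {run_type!r}") from None
    return {name: output_basename + ext for name, ext in suffixes.items()}
```

A caller could use this to decide which optional outputs to look for after a run, since outputs for the other run types will be absent.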

KING WDL Tasks

The WDL tasks used by the KING Orchestration workflow are contained within the KingTasks.wdl document within this repo. This includes tasks that manipulate VCFs and tasks that will run KING. Below are the inputs and outputs for each task:

FilterVcfTask

This task will filter the input VCF file to contain only regions within the input BED file. It will then count the number of SNPs and samples in the resulting VCF. If no input BED is given, the task will simply count the number of SNPs and samples in the input VCF.

Input Parameters

| Type | Name | Req'd | Description | Default Value |
| --- | --- | --- | --- | --- |
| File | input_vcf | Yes | VCF or gzipped VCF to filter | |
| File | input_vcf_idx | Yes | Index file corresponding to input VCF | |
| File | input_bed | No | BED file containing regions for filtering | |
| String | output_basename | No | Basename for output filtered VCF | Defaults to basename of input VCF |
| Int | addldisk | No | Additional disk space to add to the final runtime disk space in GB | 10 |
| Int | preemptible | No | Number of retries for VM | 1 |

Output Parameters

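| Type | Name | When | Description |
| --- | --- | --- | --- |
| File | output_vcf_gz | If an input BED is supplied | VCF filtered to regions in the input BED file |
| Int | num_snps | Always | Number of SNPs in the filtered VCF, or in the input VCF if no BED file is given |
| Int | num_samples | Always | Number of samples in the filtered VCF, or in the input VCF if no BED file is given |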
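For orientation, the filter-then-count logic of this task could be expressed with bcftools roughly as follows. This is a sketch under assumptions, not the task's actual command block: `filter_and_count_commands` and the `.filtered.vcf.gz` suffix are hypothetical, and the task may use different bcftools options.

```python
def filter_and_count_commands(input_vcf, output_basename, input_bed=None):
    """Return bcftools argument lists that filter (optionally) and then count."""
    commands = []
    target = input_vcf
    if input_bed is not None:
        target = f"{output_basename}.filtered.vcf.gz"  # assumed naming
        # -R restricts to the BED regions and needs an indexed VCF.
        commands.append(["bcftools", "view", "-R", input_bed,
                         "-O", "z", "-o", target, input_vcf])
    # Prints one sample ID per line.
    commands.append(["bcftools", "query", "-l", target])
    # Prints one record per SNP, without header lines.
    commands.append(["bcftools", "view", "-H", "-v", "snps", target])
    return commands


def count_lines(stdout: str) -> int:
    """Count non-empty lines, e.g. in the captured output of the commands above."""
    return sum(1 for line in stdout.splitlines() if line.strip())
```

Each list can be passed to `subprocess.run(..., capture_output=True, text=True)`, with `count_lines` applied to the captured output of the last two commands to get the sample and SNP counts.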

MergeVcfsTask

This task will merge all the input VCF files into a single VCF. The input VCF files must not have any overlapping samples. If a BED file is supplied, it will simultaneously filter the VCFs to the regions within the BED file. When merging the VCFs, there is an option to convert missing variant calls for any samples to reference calls. Finally, the task will count the number of SNPs and the number of samples in the resulting merged VCF.

Input Parameters

| Type | Name | Req'd | Description | Default Value |
| --- | --- | --- | --- | --- |
| Array[File] | input_vcfs | Yes | VCFs with non-overlapping samples to merge into one VCF | |
| Array[File] | input_vcfs_idx | No | Index files corresponding to input VCFs; must be in the same order as the input VCF array | |
| File | input_bed | No | BED file with regions for filtering | |
| Boolean | missing_to_ref | No | If true, all missing variant calls will be converted to reference genotypes (0/0) | false |
| String | output_basename | Yes | Basename for output files | |
| String | docker_image | Yes | Docker image for bcftools | |
| Int | addldisk | No | Additional disk space to add to the final runtime disk space in GB | 10 |
| Int | mem_size | No | Memory for runtime | Defaults to 4; If the size of input VCFs is greater than 10, defaults to 8 |
| Int | preemptible | No | Number of retries for VM | 2 |

Output Parameters

| Type | Name | When | Description |
| --- | --- | --- | --- |
| File | merged_vcf | Always | Merged VCF of all input VCFs, filtered to regions in the input BED file if given |
| Int | num_snps | Always | Number of SNPs in the merged VCF |
| Int | num_samples | Always | Number of samples in the merged VCF |
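The two constraints of this task, non-overlapping samples and the optional missing-to-reference conversion, can be sketched as below. Both helpers are hypothetical, the `.merged.vcf.gz` suffix is an assumption, and the task's real bcftools invocation may differ; `--missing-to-ref` and `-R` are standard `bcftools merge` options.

```python
def overlapping_samples(samples_by_vcf):
    """Return sample IDs that appear more than once across the given VCFs."""
    seen, duplicates = set(), set()
    for samples in samples_by_vcf.values():
        for sample in samples:
            if sample in seen:
                duplicates.add(sample)
            seen.add(sample)
    return duplicates


def merge_command(input_vcfs, output_basename, input_bed=None,
                  missing_to_ref=False):
    """Build a bcftools merge argument list for two or more VCFs."""
    if len(input_vcfs) < 2:
        raise ValueError("bcftools merge expects at least two VCFs")
    cmd = ["bcftools", "merge"]
    if input_bed is not None:
        cmd += ["-R", input_bed]          # restrict to BED regions while merging
    if missing_to_ref:
        cmd.append("--missing-to-ref")    # treat missing genotypes as 0/0
    cmd += ["-O", "z", "-o", f"{output_basename}.merged.vcf.gz"]  # assumed name
    return cmd + list(input_vcfs)
```

Running `overlapping_samples` on the per-file sample lists before merging gives an early, readable error instead of a failed merge.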

Vcf2BedTask

This task will convert a VCF to PLINK bed, bim, and fam files for use with KING.

Input Parameters

| Type | Name | Req'd | Description | Default Value |
| --- | --- | --- | --- | --- |
| File | input_vcf | Yes | VCF to convert to PLINK BED | |
| String | output_basename | No | Basename for output files | Defaults to basename of input VCF |
| String | docker_image | Yes | Docker image that contains PLINK | |
| Int | addldisk | No | Additional disk space to add to the final runtime disk space in GB | 10 |
| Int | plink_mem | No | Memory to use for PLINK in GB; Actual runtime memory will be twice the size of the input PLINK memory | 4 |
| Int | preemptible | No | Number of retries for VM | 1 |

Output Parameters

| Type | Name | When | Description |
| --- | --- | --- | --- |
| File | bed_file | Always | PLINK BED from VCF |
| File | bim_file | Always | BIM file corresponding to output PLINK BED |
| File | fam_file | Always | FAM file corresponding to output PLINK BED |
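A conversion of this kind is typically a single PLINK call. The sketch below assumes a PLINK 1.9-style executable named `plink` and assumes that `plink_mem` is passed through `--memory`, which takes a value in MiB; neither detail is confirmed by this README, and both helper names are hypothetical.

```python
def plink_command(input_vcf, output_basename, plink_mem_gb=4):
    """Build a PLINK argument list that writes .bed, .bim and .fam files."""
    memory_mib = plink_mem_gb * 1024  # assumption: GB is treated as GiB
    return ["plink", "--vcf", input_vcf, "--make-bed",
            "--out", output_basename, "--memory", str(memory_mib)]


def runtime_memory_gb(plink_mem_gb=4):
    """Runtime memory is twice the PLINK memory, per the input table above."""
    return 2 * plink_mem_gb
```

With the default of 4, the VM would request 8 GB while PLINK itself is limited to 4, leaving headroom for everything outside PLINK's main workspace.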

RunKingTask

This task will run KING, a kinship estimation tool. This tool has several flags to run different relationship inferences, each using a different algorithm or specifications to estimate kinship. The output file types vary for each flag that can be run with KING. Refer to the KING manual for further descriptions on each flag.

Input Parameters

| Type | Name | Req'd | Description | Default Value |
| --- | --- | --- | --- | --- |
| File | bed_file | Yes | PLINK BED file from converting input VCF to BED | |
| File | fam_file | Yes | PLINK FAM file corresponding to input BED | |
| File | bim_file | Yes | PLINK BIM file corresponding to input BED | |
| Int | degree | No | Largest degree of relatedness allowed for KING relationships | 3 |
| String | flag | Yes | Flag to run a specified KING algorithm; either "ibdseg", "related", "kinship" or "duplicate" | |
| String | output_basename | Yes | Basename for output files | |
| String | docker_image | Yes | Docker image for running KING | |
| Int | addldisk | No | Additional disk space to add to the final runtime disk space in GB | 10 |
| Int | cpu | No | CPU for runtime | 2 |
| Int | mem_size | No | Memory for runtime | 4 |
| Int | preemptible | No | Number of retries for VM | 2 |
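To show how these inputs might map onto a KING command line, here is a minimal sketch. `king_command` is a hypothetical helper and the task's real command may differ. It uses the KING options `-b`, `--prefix`, `--cpus` and `--degree`; leaving `--degree` out for "duplicate" is a choice made here, not something this README states.

```python
KING_FLAGS = {"ibdseg", "related", "kinship", "duplicate"}


def king_command(bed_file, flag, output_basename, degree=3, cpu=2):
    """Build a KING argument list for one of the four supported analyses."""
    if flag not in KING_FLAGS:
        raise ValueError(f"flag must be one of {sorted(KING_FLAGS)}")
    # By default KING finds the .bim and .fam files that share the .bed prefix.
    cmd = ["king", "-b", bed_file, f"--{flag}",
           "--prefix", output_basename, "--cpus", str(cpu)]
    if flag != "duplicate":  # assumption: degree is irrelevant for duplicates
        cmd += ["--degree", str(degree)]
    return cmd
```

For example, `king_command("cohort.bed", "related", "cohort")` corresponds to `king -b cohort.bed --related --prefix cohort --cpus 2 --degree 3`.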

Output Parameters

| Type | Name | When | Description |
| --- | --- | --- | --- |
| File | kin_output | When either the --kinship or --related flag is used | .kin file that contains kinship coefficients of individuals |
| File | kin0_output | When either the --kinship or --related flag is used | Second .kin file that contains kinship coefficients of between-family relationship checking |
| File | seg_output | When the --ibdseg flag is used | .seg file that contains kinship coefficients and inferred relationships of samples |
| File | con_output | When the --duplicate flag is used | .con file that contains only duplicate individuals |
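Downstream of these outputs, a common step is to translate kinship coefficients into relationship degrees. The sketch below uses the inference ranges given in the KING manual (above 0.354 for duplicate/MZ twin, then 0.177, 0.0884 and 0.0442 as the lower bounds for 1st-, 2nd- and 3rd-degree). The helper names are hypothetical, and the parser assumes a whitespace-delimited file with a header row containing `ID1`, `ID2` and `Kinship` columns.

```python
def relationship_from_kinship(kinship: float) -> str:
    """Classify a kinship coefficient using the KING manual's ranges."""
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship >= 0.177:
        return "1st-degree"
    if kinship >= 0.0884:
        return "2nd-degree"
    if kinship >= 0.0442:
        return "3rd-degree"
    return "unrelated or more distant"


def related_pairs(kin_text: str):
    """Yield (ID1, ID2, relationship) for each row of a .kin-style table."""
    lines = [line for line in kin_text.splitlines() if line.strip()]
    if not lines:
        return
    header = lines[0].split()
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        yield (row["ID1"], row["ID2"],
               relationship_from_kinship(float(row["Kinship"])))
```

Reading the file with `open(path).read()` and passing the text to `related_pairs` keeps the parsing independent of where the workflow wrote the output.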

References

Original KING paper: Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-2873

KING tutorial and manual: https://www.kingrelatedness.com/manual.shtml
