[med-svn] [vcftools] 04/08: Use official manpage

Andreas Tille tille at debian.org
Sun Jul 3 20:26:46 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository vcftools.

commit de5be31edee1f02434b3d80d404ac5183eacbe16
Author: Andreas Tille <tille at debian.org>
Date:   Sun Jul 3 22:09:59 2016 +0200

    Use official manpage
---
 debian/manpages        |   1 +
 debian/mans/vcftools.1 | 472 -------------------------------------------------
 2 files changed, 1 insertion(+), 472 deletions(-)

diff --git a/debian/manpages b/debian/manpages
index 4f4649b..5c1104c 100644
--- a/debian/manpages
+++ b/debian/manpages
@@ -1 +1,2 @@
 debian/mans/*.1
+src/cpp/*.1
diff --git a/debian/mans/vcftools.1 b/debian/mans/vcftools.1
deleted file mode 100644
index 84bcae8..0000000
--- a/debian/mans/vcftools.1
+++ /dev/null
@@ -1,472 +0,0 @@
-.TH VCFTOOLS "1" "July 2011" "vcftools 0.1.5" "User Commands"
-.SH NAME
-vcftools \- analyse VCF files
-.SH SYNOPSIS
-.B vcftools \fR[\fIOPTIONS\fR] 
-.SH DESCRIPTION
-The vcftools program is run from the command line. The interface is 
-inspired by PLINK, and so should be largely familiar to users of that 
-package. Commands take the following form:
-
-  vcftools \-\-vcf file1.vcf \-\-chr 20 \-\-freq
-
-The above command tells vcftools to read in the file file1.vcf, extract 
-sites on chromosome 20, and calculate the allele frequency at each site. 
-The resulting allele frequency estimates are stored in the output file, 
-out.freq. As in the above example, output from vcftools is mainly sent to 
-output files, as opposed to being shown on the screen.
-
-Note that some commands may only be available in the latest version of 
-vcftools. To obtain the latest version, you should use SVN to checkout the 
-latest code, as described on the home page.
-
-Also note that polyploid genotypes are not currently supported.
-
-.SS Basic Options
-.TP
-\fB\-\-vcf\fR <filename>
-This option defines the VCF file to be processed. The files need to be 
-decompressed prior to use with vcftools. vcftools expects files in VCF 
-format v4.0, a specification of which can be found here.
-.TP
-\fB\-\-gzvcf\fR <filename>
-This option can be used in place of the \-\-vcf option to read compressed 
-(gzipped) VCF files directly. Note that this option can be quite slow when 
-used with large files.
-.TP
-\fB\-\-out\fR <prefix>
-This option defines the output filename prefix for all files generated by 
-vcftools. For example, if <prefix> is set to output_filename, then all 
-output files will be of the form output_filename.*** . If this option is 
-omitted, all output files will have the prefix 'out.'.
-
-.SS Site Filter Options
-
-.TP
-\fB\-\-chr\fR <chromosome>
-Only process sites with a chromosome identifier matching <chromosome>.
-.TP
-\fB\-\-from\-bp\fR <integer>
-.TP
-\fB\-\-to\-bp\fR <integer>
-These options define the physical range of sites that will be processed. Sites 
-outside of this range will be excluded. These options can only be used in 
-conjunction with \-\-chr.
-.TP
-\fB\-\-snp\fR <string>
-Include SNP(s) with matching ID. This command can be used multiple times 
-in order to include more than one SNP.
-.TP
-\fB\-\-snps\fR <filename>
-Include a list of SNPs given in a file. The file should contain a list of 
-SNP IDs, with one ID per line.
-.TP
-\fB\-\-exclude\fR <filename>
-Exclude a list of SNPs given in a file. The file should contain a list of 
-SNP IDs, with one ID per line.
-.TP
-\fB\-\-positions\fR <filename>
-Include a set of sites on the basis of a list of positions. Each line of 
-the input file should contain a (tab-separated) chromosome and position. 
-The file should have a header line. Sites not included in the list are 
-excluded.
-.TP
-\fB\-\-bed\fR <filename>
-.TP
-\fB\-\-exclude\-bed\fR <filename>
-Include or exclude a set of sites on the basis of a BED file. Only the 
-first three columns (chrom, chromStart and chromEnd) are required. The 
-BED file should have a header line.
-.TP
-\fB\-\-remove\-filtered\-all\fR
-.TP
-\fB\-\-remove\-filtered\fR <string>
-.TP
-\fB\-\-keep\-filtered\fR <string>
-These options are used to filter sites on the basis of their FILTER flag. 
-The first option removes all sites with a FILTER flag. The second option 
-can be used to exclude sites with a specific filter flag. The third option 
-can be used to select sites on the basis of specific filter flags. 
-The second and third options can be used multiple times to specify multiple 
-FILTERs. The \-\-keep\-filtered option is applied before 
-the \-\-remove\-filtered 
-option.
-.TP
-\fB\-\-minQ\fR <float>
-Include only sites with Quality above this threshold.
-.TP
-\fB\-\-min\-meanDP\fR <float>
-.TP
-\fB\-\-max\-meanDP\fR <float>
-Include sites with mean Depth within the thresholds defined by these options.
-.TP
-\fB\-\-maf\fR <float>
-.TP
-\fB\-\-max\-maf\fR <float>
-Include only sites with Minor Allele Frequency within the specified range.
-.TP
-\fB\-\-non\-ref\-af\fR <float>
-.TP
-\fB\-\-max\-non\-ref\-af\fR <float>
-Include only sites with Non-Reference Allele Frequency within the specified 
-range.
-.TP
-\fB\-\-hwe\fR <float>
-Assesses sites for Hardy-Weinberg Equilibrium using an exact test, as 
-defined by Wigginton, Cutler and Abecasis (2005). Sites with a p-value 
-below the threshold defined by this option are taken to be out of HWE, 
-and therefore excluded.
-.TP
-\fB\-\-geno\fR <float>
-Exclude sites on the basis of the proportion of missing data (defined to 
-be between 0 and 1).
-.TP
-\fB\-\-min\-alleles\fR <int>
-.TP
-\fB\-\-max\-alleles\fR <int>
-Include only sites with a number of alleles within the specified range. 
-For example, to include only bi\-allelic sites, one could use:
-
-      vcftools \-\-vcf file1.vcf \-\-min\-alleles 2 \-\-max\-alleles 2
-
-.TP
-\fB\-\-mask\fR <filename>
-.TP
-\fB\-\-invert\-mask\fR <filename>
-.TP
-\fB\-\-mask\-min\fR <int>
-Include sites on the basis of a FASTA-like file. The provided file contains 
-a sequence of integer digits (between 0 and 9) for each position on a 
-chromosome that specify if a site at that position should be filtered or not. 
-An example mask file would look like:
-
-      >1
-      0000011111222...
-
-In this example, sites in the VCF file located within the first 5 bases of 
-the start of chromosome 1 would be kept, whereas sites at position 6 onwards 
-would be filtered out. The threshold integer that determines if sites are 
-filtered or not is set using the \-\-mask\-min option, which defaults to 0. 
-The chromosomes contained in the mask file must be sorted in the same order 
-as the VCF file. The \-\-mask option is used to specify the mask file to be 
-used, whereas the \-\-invert\-mask option can be used to specify a mask file 
-that will be inverted before being applied.
-
-.SS Individual Filters
-
-.TP
-\fB\-\-indv\fR <string>
-Specify an individual to be kept in the analysis. This option can be used 
-multiple times to specify multiple individuals.
-.TP
-\fB\-\-keep\fR <filename>
-Provide a file containing a list of individuals to include in subsequent 
-analysis. Each individual ID (as defined in the VCF headerline) should be 
-included on a separate line.
-.TP
-\fB\-\-remove\-indv\fR <string>
-Specify an individual to be removed from the analysis. This option can be 
-used multiple times to specify multiple individuals. If the \-\-indv option 
-is also specified, then the \-\-indv option is executed before 
-the \-\-remove\-indv option.
-.TP
-\fB\-\-remove\fR <filename>
-Provide a file containing a list of individuals to exclude in subsequent 
-analysis. Each individual ID (as defined in the VCF headerline) should be 
-included on a separate line. If both the \-\-keep and the \-\-remove options 
-are used, then the \-\-keep option is executed before the \-\-remove option.
-.TP
-\fB\-\-min\-indv\-meanDP\fR <float>
-.TP
-\fB\-\-max\-indv\-meanDP\fR <float>
-Calculate the mean coverage on a per-individual basis. Only individuals with 
-coverage within the range specified by these options are included in 
-subsequent analyses.
-.TP
-\fB\-\-mind\fR <float>
-Specify the minimum call rate threshold for each individual.
-.TP
-\fB\-\-phased\fR
-First excludes all individuals having all genotypes unphased, and 
-subsequently excludes all sites with unphased genotypes. The remaining data 
-therefore consists of phased data only.
-
-.SS Genotype Filters
-.TP
-\fB\-\-remove\-filtered\-geno\-all\fR
-.TP
-\fB\-\-remove\-filtered\-geno\fR <string>
-The first option removes all genotypes with a FILTER flag. The second option 
-can be used to exclude genotypes with a specific filter flag.
-.TP
-\fB\-\-minGQ\fR <float>
-Exclude all genotypes with a quality below the threshold specified by 
-this option (GQ).
-.TP
-\fB\-\-minDP\fR <float>
-Exclude all genotypes with a sequencing depth below that specified by 
-this option (DP).
-
-.SS Output Statistics
-.TP
-\fB\-\-freq\fR
-.TP
-\fB\-\-counts\fR
-.TP
-\fB\-\-freq2\fR
-.TP
-\fB\-\-counts2\fR
-Output per\-site frequency information. The \-\-freq option outputs the allele 
-frequency in a file with the suffix '.frq'. The \-\-counts option outputs a 
-similar file with the suffix '.frq.count', that contains the raw allele 
-counts at each site.
-The \-\-freq2 and \-\-counts2 options are used to suppress allele information in 
-the output file. In this case, the order of the freqs/counts depends on the
-numbering in the VCF file.
-.TP
-\fB\-\-depth\fR
-Generates a file containing the mean depth per individual. This file has 
-the suffix '.idepth'.
-.TP
-\fB\-\-site\-depth\fR
-.TP
-\fB\-\-site\-mean\-depth\fR
-Generates a file containing the depth per site. The \-\-site\-depth option 
-outputs the depth for each site summed across individuals. This file has 
-the suffix '.ldepth'. Likewise, the \-\-site\-mean\-depth outputs the mean 
-depth for each site, and the output file has the suffix '.ldepth.mean'.
-.TP
-\fB\-\-geno\-depth\fR
-Generates a (possibly very large) file containing the depth for each 
-genotype in the VCF file. Missing entries are given the value \-1. The 
-file has the suffix '.gdepth'.
-.TP
-\fB\-\-site\-quality\fR
-Generates a file containing the per\-site SNP quality, as found in the QUAL 
-column of the VCF file. This file has the suffix '.lqual'.
-.TP
-\fB\-\-het\fR
-Calculates a measure of heterozygosity on a per\-individual basis. 
-Specifically, the inbreeding coefficient, F, is estimated for each 
-individual using a method of moments. The resulting file has the suffix '.het'.
-.TP
-\fB\-\-hardy\fR
-Reports a p\-value for each site from a Hardy\-Weinberg Equilibrium test 
-(as defined by Wigginton, Cutler and Abecasis (2005)). The resulting file 
-(with suffix '.hwe') also contains the Observed numbers of Homozygotes and 
-Heterozygotes and the corresponding Expected numbers under HWE. 
-.TP
-\fB\-\-missing\fR
-Generates two files reporting the missingness on a per\-individual and 
-per\-site basis. The two files have suffixes '.imiss' and '.lmiss' 
-respectively.
-.TP
-\fB\-\-hap\-r2\fR
-.TP
-\fB\-\-geno\-r2\fR
-.TP
-\fB\-\-ld\-window\fR <int>
-.TP
-\fB\-\-ld\-window\-bp\fR <int>
-.TP
-\fB\-\-min\-r2\fR <float>
-These options are used to report Linkage Disequilibrium (LD) statistics 
-as summarised by the r2 statistic. The \-\-hap\-r2 option informs vcftools 
-to output a file reporting the r2 statistic using phased haplotypes. This 
-is the traditional measure of LD often reported in the population genetics 
-literature. If phased haplotypes are unavailable then the \-\-geno\-r2 option 
-may be used, which calculates the squared correlation coefficient between 
-genotypes encoded as 0, 1 and 2 to represent the number of non-reference 
-alleles in each individual. This is the same as the LD measure reported 
-by PLINK. The haplotype version outputs a file with the suffix '.hap.ld', 
-whereas the genotype version outputs a file with the suffix '.geno.ld'. 
-The haplotype version implies the option \-\-phased.
-
-The \-\-ld\-window option defines the maximum SNP separation for the 
-calculation of LD. Likewise, the \-\-ld\-window\-bp option can be used to 
-define the maximum physical separation of SNPs included in the LD 
-calculation. Finally, the \-\-min\-r2 sets a minimum value for r2 below 
-which the LD statistic is not reported.
-.TP
-\fB\-\-SNPdensity\fR <int>
-Calculates the number and density of SNPs in bins of size defined by this 
-option. The resulting output file has the suffix '.snpden'.
-.TP
-\fB\-\-TsTv\fR <int>
-Calculates the Transition / Transversion ratio in bins of size defined by 
-this option. The resulting output file has the suffix '.TsTv'. A summary 
-is also supplied in a file with the suffix '.TsTv.summary'.
-.TP
-\fB\-\-FILTER\-summary\fR
-Generates a summary of the number of SNPs and Ts/Tv ratio for each FILTER 
-category. The output file has the suffix '.FILTER.summary'.
-.TP
-\fB\-\-filtered\-sites\fR
-Creates two files listing sites that have been kept or removed after 
-filtering. The first file, with suffix '.kept.sites', lists sites kept 
-by vcftools after filters have been applied. The second file, with the 
-suffix '.removed.sites', lists sites removed by the applied filters.
-.TP
-\fB\-\-singletons\fR
-This option will generate a file detailing the location of singletons, and 
-the individual they occur in. The file reports both true singletons, and 
-private doubletons (i.e. SNPs where the minor allele only occurs in a 
-single individual and that individual is homozygotic for that allele). 
-The output file has the suffix '.singletons'.
-.TP
-\fB\-\-site\-pi\fR
-.TP
-\fB\-\-window\-pi\fR <int>
-These options are used to estimate levels of nucleotide diversity. The first 
-option does this on a per\-site basis, and the output file has the 
-suffix '.sites.pi'. The second option calculates the nucleotide diversity in 
-windows, with the window size defined in the option argument. Output for 
-this option has the suffix '.windowed.pi'. The windowed version requires 
-phased data, and hence use of this option implies the \-\-phased option.
-
-.SS Output in Other Formats
-.TP
-\fB\-\-012\fR
-This option outputs the genotypes as a large matrix. Three files are 
-produced. The first, with suffix '.012', contains the genotypes of each 
-individual on a separate line. Genotypes are represented as 0, 1 and 2, 
-where the number represents the number of non-reference alleles. Missing 
-genotypes are represented by \-1. The second file, with suffix '.012.indv' 
-details the individuals included in the main file. The third file, with 
-suffix '.012.pos' details the site locations included in the main file.
-.TP
-\fB\-\-IMPUTE\fR
-This option outputs phased haplotypes in IMPUTE reference\-panel format. As 
-IMPUTE requires phased data, using this option also implies \-\-phased. 
-Unphased individuals and genotypes are therefore excluded. Only bi\-allelic 
-sites are included in the output. Using this option generates three files. 
-The IMPUTE haplotype file has the suffix '.impute.hap', and the IMPUTE 
-legend file has the suffix '.impute.hap.legend'. The third file, with 
-suffix '.impute.hap.indv', details the individuals included in the 
-haplotype file, although this file is not needed by IMPUTE.
-.TP
-\fB\-\-ldhat\fR
-.TP
-\fB\-\-ldhat\-geno\fR
-These options output data in LDhat format. Use of these options also 
-requires the \-\-chr option to be used. The \-\-ldhat option outputs phased 
-data only, and therefore also implies \-\-phased, leading to unphased 
-individuals and genotypes being excluded. Alternatively, the \-\-ldhat\-geno 
-option treats all of the data as unphased, and therefore outputs LDhat 
-files in genotype/unphased format. In either case, two files are generated 
-with the suffixes '.ldhat.sites' and '.ldhat.locs', which correspond to the 
-LDhat 'sites' and 'locs' input files respectively.
-.TP
-\fB\-\-BEAGLE\-GL\fR
-This option outputs genotype likelihood information for input into the 
-BEAGLE program. This option requires the VCF file to contain the FORMAT 
-GL tag, which can generally be output by SNP callers such as the GATK. 
-Use of this option requires a chromosome to be specified via the
-\-\-chr option. The resulting output file (with the suffix '.BEAGLE.GL') 
-contains genotype likelihoods for biallelic sites, and is suitable for 
-input into BEAGLE via the 'like=' argument.
-.TP
-\fB\-\-plink\fR
-This option outputs the genotype data in PLINK PED format. Two files are 
-generated, with suffixes '.ped' and '.map'. Note that only bi\-allelic loci 
-will be output. Further details of these files can be found in the PLINK 
-documentation.
-
-Note: This option can be very slow on large datasets. Using the \-\-chr option 
-to divide up the dataset is advised.
-.TP
-\fB\-\-plink\-tped\fR
-The \-\-plink option above can be extremely slow on large datasets. An 
-alternative that might be considerably quicker is to output in the 
-PLINK transposed format. This can be achieved using the \-\-plink\-tped 
-option, which produces two files with suffixes '.tped' and '.tfam'.
-.TP
-\fB\-\-recode\fR
-The \-\-recode option is used to generate a VCF file from the input VCF file 
-having applied the options specified by the user. The output file has the 
-suffix '.recode.vcf'.
-
-By default, the INFO fields are removed from the output file, as the INFO 
-values may be invalidated by the recoding (e.g. the total depth may need to 
-be recalculated if individuals are removed). This default functionality can 
-be overridden by using the \-\-keep\-INFO <string> option, where <string> 
-defines the INFO key to keep in the output file. The \-\-keep\-INFO flag can 
-be used multiple times. Alternatively, the option \-\-keep\-INFO-all can be 
-used to retain all INFO fields.
-
-.SS Miscellaneous
-.TP
-\fB\-\-extract\-FORMAT\-info\fR <string>
-Extract information from the genotype fields in the VCF file relating to a 
-specified FORMAT identifier. For example, using the
-option '\-\-extract\-FORMAT\-info GT' would extract all of the GT 
-(i.e. Genotype) 
-entries. The resulting output file has the suffix '.<FORMAT_ID>.FORMAT'.
-.TP
-\fB\-\-get\-INFO\fR <string>
-This option is used to extract information from the INFO field in the VCF 
-file. The <string> argument specifies the INFO tag to be extracted, and the 
-option can be used multiple times in order to extract multiple INFO entries. 
-The resulting file, with suffix '.INFO', contains the required INFO 
-information in a tab\-separated table. For example, to extract the NS and 
-DB flags, one would use the command:
-
-      vcftools \-\-vcf file1.vcf \-\-get\-INFO NS \-\-get\-INFO DB
-
-.SS VCF File Comparison Options
-
-The file comparison options are currently in a state of flux and likely buggy. 
-If you find a bug, please report it. Note that genotype\-level filters are not 
-supported in these options.
-
-.TP
-\fB\-\-diff\fR <filename>
-.TP
-\fB\-\-gzdiff\fR <filename>
-Select a VCF file for comparison with the file specified by the \-\-vcf option. 
-Outputs two files describing the sites and individuals common / unique to 
-each file. These files have the suffixes '.diff.sites_in_files' 
-and '.diff.indv_in_files' respectively. The \-\-gzdiff version can be used to 
-read compressed VCF files.
-.TP
-\fB\-\-diff\-site\-discordance\fR
-Used in conjunction with the \-\-diff option to calculate discordance on a 
-site by site basis. The resulting output file has the suffix '.diff.sites'.
-.TP
-\fB\-\-diff\-indv\-discordance\fR
-Used in conjunction with the \-\-diff option to calculate discordance on a 
-per-individual basis. The resulting output file has the suffix '.diff.indv'.
-.TP
-\fB\-\-diff\-discordance\-matrix\fR
-Used in conjunction with the \-\-diff option to calculate a discordance matrix. 
-This option only works with bi\-allelic loci with matching alleles that are 
-present in both files. The resulting output file has the 
-suffix '.diff.discordance.matrix'.
-.TP
-\fB\-\-diff\-switch\-error\fR
-Used in conjunction with the \-\-diff option to calculate phasing errors 
-(specifically 'switch errors'). This option generates two output files 
-describing switch errors found between sites, and the average switch error 
-per individual. These two files have the suffixes '.diff.switch'
-and '.diff.indv.switch' respectively.
-
-.SS Options still in development
-
-The following options are yet to be finalised, are likely to contain bugs, 
-and are likely to change in the future.
-.TP
-\fB\-\-fst\fR <filename>
-.TP
-\fB\-\-gzfst\fR <filename>
-Calculate FST for a pair of VCF files, with the second file being specified 
-by this option. FST is currently calculated using the formula described in 
-the supplementary material of the Phase I HapMap paper. Currently, only 
-pairwise FST calculations are supported, although this will likely change 
-in the future. The \-\-gzfst option can be used to read compressed VCF files.
-
-.TP
-\fB\-\-LROH\fR
-Identify Long Runs of Homozygosity.
-.TP
-\fB\-\-relatedness\fR
-Output Individual Relatedness Statistics. 

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/vcftools.git


