[med-svn] [Git][med-team/swarm-cluster][upstream] New upstream version 2.2.2+dfsg

Mon Feb 19 10:28:52 UTC 2018

Andreas Tille pushed to branch upstream at Debian Med / swarm-cluster


Commits:
a5b46547 by Andreas Tille at 2018-02-19T11:04:07+01:00
New upstream version 2.2.2+dfsg
- - - - -


16 changed files:

- README.md
- man/swarm.1
- src/Makefile
- src/algo.cc
- src/algod1.cc
- src/bitmap.h
- src/db.cc
- src/derep.cc
- src/nw.cc
- src/qgram.cc
- src/scan.cc
- src/search16.cc
- src/search8.cc
- src/swarm.cc
- src/swarm.h
- src/util.cc


Changes:

=====================================
README.md
=====================================

--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# swarm #
+# swarm
 
 A robust and fast clustering method for amplicon-based studies.
 
@@ -16,8 +16,8 @@ To help users, we describe
 starting from raw fastq files, clustering with **swarm** and producing
 a filtered OTU table.
 
-**swarm** 2.0 introduces several novelties and improvements over
-  swarm 1.0:
+swarm 2.0 introduces several novelties and improvements over swarm
+1.0:
 * built-in breaking phase now performed automatically,
 * possibility to output OTU representatives in fasta format (option
   `-w`),
@@ -26,30 +26,7 @@ a filtered OTU table.
 * a new option called *fastidious* that refines *d* = 1 results and
   reduces the number of small OTUs,
 
-Table of Contents
-=================
-
-* [Common misconceptions](#common_misconceptions)
-* [Quick start](#quick_start)
-* [Install](#install)
-* [Prepare amplicon fasta files](#prepare_amplicon)
-  * [Linearization](#linearization)
-  * [Dereplication](#dereplication)
-  * [Launch swarm](#launch)
-* [Frequently asked questions](#FAQ)
-  * [Refine swarm OTUs](#refine_OTUs)
-  * [Count the number of amplicons per OTU](#OTU_sizes)
-  * [Get the seed sequence for each swarm](#extract_seeds)
-  * [Get fasta sequences for all amplicons in a swarm](#extract_all)
-* [Troubleshooting](#troubleshooting)
-* [Citation](#citation)
-* [Contact](#contact)
-* [Third-party pipelines](#pipelines)
-* [Alternatives](#alternatives)
-* [Version history](#history)
-
-<a name="common_misconceptions"/>
-## Common misconceptions ##
+## Common misconceptions
 
 **swarm** is a single-linkage clustering method, with some superficial
   similarities with other clustering methods (e.g.,
@@ -79,8 +56,7 @@ Table of Contents
   that molecular markers have limitations too.
 
 
-<a name="quick_start"/>
-## Quick start ##
+## Quick start
 
 **swarm** most simple usage is:
 
@@ -117,10 +93,9 @@ can increase significantly. See options `-c` and `-y` to control and
 cap swarm's memory consumption.
 
 
-<a name="install"/>
 ## Install ##
 
-Get the latest binaries for GNU/Linux or MacOSX from
+Get the latest binaries for GNU/Linux or macOS from
 [the release page](https://github.com/torognes/swarm/releases "swarm
 tagged releases"). Get the source code from
 [GitHub](https://github.com/torognes/swarm "swarm public repository")
@@ -149,7 +124,6 @@ Once installed, the man page can be accessed with the command `man
 swarm`.
 
 
-<a name="prepare_amplicon"/>
 ## Prepare amplicon fasta files ##
 
 To facilitate the use of **swarm**, we provide examples of shell
@@ -164,7 +138,6 @@ from adaptors and primers (with
 converted to fasta.
 
 
-<a name="linearization"/>
 ### Linearization ###
 
 **swarm** accepts wrapped fasta files as well as linear fasta
@@ -180,7 +153,6 @@ awk 'NR==1 {print ; next} {printf /^>/ ? "\n"$0"\n" : $1} END {printf "\n"}' amp
 ```
 
 
-<a name="dereplication"/>
 ### Dereplication (mandatory) ###
 
 In a sample, or collection of sample, a given sequence is likely to
@@ -248,7 +220,7 @@ Alternatively, you may specify a default abundance value with
 **swarm**'s `--append-abundance` (`-a`) option to be used when
 abundance information is missing from a sequence.
 
-<a name="launch"/>
+
 ### Launch swarm ###
 
 Here is a typical way to use **swarm**:
@@ -268,7 +240,6 @@ See the
 for details on swarm's options and parameters.
 
 
-<a name="FAQ"/>
 ## Frequently asked questions ##
 
 To facilitate the use of **swarm**, we provide examples of options or
@@ -277,7 +248,6 @@ that the amplicon fasta file was prepared as describe above
 (linearization and dereplication).
 
 
-<a name="refine_OTUs"/>
 ### Refine swarm OTUs ###
 
 The chain-breaking, which used to be performed in a second step in
@@ -292,7 +262,6 @@ is described in the figure below:
 ![](https://github.com/frederic-mahe/swarm/blob/master/figures/swarm_2.0_fastidious_reduced.png)
 
 
-<a name="OTU_sizes"/>
 ### Count the number of amplicons per OTU ###
 
 You might want to check the size distribution of OTUs (number of
@@ -303,7 +272,6 @@ represents an OTU and provides different metrics. See the manual for a
 complete description.
 
 
-<a name="extract_seeds"/>
 ### Get the seed sequence for each OTU ###
 
 It is frequent for subsequent analyses to keep only one representative
@@ -312,7 +280,6 @@ burden. That operation is easily done with **swarm** by using the `-w
 filename` option.
 
 
-<a name="extract_all"/>
 ### Get fasta sequences for all amplicons in a OTU ###
 
 For each OTU, get the fasta sequences for all amplicons. Warning, this
@@ -338,7 +305,6 @@ rm "${AMPLICONS}"
 ```
 
 
-<a name="troubleshooting"/>
 ## Troubleshooting ##
 
 If **swarm** exits with an error message saying `This program
@@ -348,7 +314,6 @@ on CPUs with the SSE2 instructions, i.e. most Intel and AMD CPUs
 released since 2004.
 
 
-<a name="citation"/>
 ## Citation ##
 
 To cite **swarm**, please refer to:
@@ -362,7 +327,6 @@ Swarm v2: highly-scalable and high-resolution amplicon clustering.
 PeerJ 3:e1420 doi: [10.7717/peerj.1420](http://dx.doi.org/10.7717/peerj.1420)
 
 
-<a name="contact"/>
 ## Contact ##
 
 You are welcome to:
@@ -372,7 +336,6 @@ You are welcome to:
 * compose a friendly e-mail to: Frédéric Mahé <mahe at rhrk.uni-kl.de> and Torbjørn Rognes <torognes at ifi.uio.no>
 
 
-<a name="pipelines"/>
 ## Third-party pipelines ##
 
 **swarm** is available in third-party pipelines:
@@ -388,35 +351,57 @@ You are welcome to:
   performing microbiome analysis from raw DNA sequencing data.
 
 
-<a name="alternatives"/>
 ## Alternatives ##
 
 If you want to try alternative free and open-source clustering
 methods, here are some links:
 
-* [Vsearch](https://github.com/torognes/vsearch)
+* [VSEARCH](https://github.com/torognes/vsearch)
 * [Oligotyping](http://merenlab.org/projects/oligotyping/)
 * [DNAclust](http://dnaclust.sourceforge.net/)
 * [Sumaclust](http://metabarcoding.org/sumatra)
 * [Crunchclust](https://code.google.com/p/crunchclust/)
 
 
-<a name="history"/>
 ## Version history##
 
-<a name="version2112"/>
+### version 2.2.2 ###
+
+**swarm** 2.2.2 fixes a bug causing Swarm to wait forever in very rare
+cases when multiple threads were used.
+
+### version 2.2.1 ###
+
+**swarm** 2.2.1 fixes a memory allocation bug for d=1.
+
+### version 2.2.0 ###
+
+**swarm** 2.2.0 fixes several problems and improves usability.
+Corrected output to structure and uclust files when using fastidious
+mode. Corrected abundance output in some cases. Added check for
+duplicated sequences and fixed check for duplicated sequence
+IDs. Checks for empty sequences. Sorts sequences by additional fields
+to improve stability. Improves compatibility with compilers and
+operating systems.  Outputs sequences in upper case. Allows 64-bit
+abundances. Shows message when waiting for input from stdin. Improves
+error messages and warnings. Improves checking of command line
+options. Fixes remaining errors reported by test suite. Updates
+documentation.
+
+### version 2.1.13 ###
+
+**swarm** 2.1.13 removes a bug in the progress bar when writing seeds.
+
 ### version 2.1.12 ###
 
 **swarm** 2.1.12 removes a debugging message.
 
-<a name="version2111"/>
 ### version 2.1.11 ###
 
 **swarm** 2.1.11 fixes two bugs related to the SIMD implementation
 of alignment that might result in incorrect alignments and scores.
 The bug only applies when d>1.
 
-<a name="version2110"/>
 ### version 2.1.10 ###
 
 **swarm** 2.1.10 fixes two bugs related to gap penalties of
@@ -426,47 +411,39 @@ use a slightly higher gap extension penalty than specified. The
 default gap extension penalty used have actually been 4.5 instead of
 4.
 
-<a name="version219"/>
 ### version 2.1.9 ###
 
 **swarm** 2.1.9 fixes a problem when compiling with GCC version 6.
 
-<a name="version218"/>
 ### version 2.1.8 ###
 
 **swarm** 2.1.8 fixes a rare bug triggered when clustering extremely
 short undereplicated sequences. Also, alignment parameters are not
 shown when d=1.
 
-<a name="version217"/>
 ### version 2.1.7 ###
 
 **swarm** 2.1.7 fixes more problems with seed output. Ignore CR
   characters in FASTA files. Improved help and error messsages.
 
-<a name="version216"/>
 ### version 2.1.6 ###
 
 **swarm** 2.1.6 fixes problems with older compilers that do not have
 the x86intrin.h header file. It also fixes a bug in the output of seeds
 with the `-w` option when d>1.
 
-<a name="version215"/>
 ### version 2.1.5 ###
 
 **swarm** 2.1.5 fixes minor bugs.
 
-<a name="version214"/>
 ### version 2.1.4 ###
 
 **swarm** 2.1.4 fixes minor bugs in the swarm algorithm used for d=1.
 
-<a name="version213"/>
 ### version 2.1.3 ###
 
 **swarm** 2.1.3 adds checks of numeric option arguments.
 
-<a name="version212"/>
 ### version 2.1.2 ###
 
 **swarm** 2.1.2 adds the -a (--append-abundance) option to set a
@@ -475,29 +452,24 @@ missing from the input file. If this option is not specified, missing
 abundance information will result in a fatal error. The error message
 in that case is improved.
 
-<a name="version211"/>
 ### version 2.1.1 ###
 
 **swarm** 2.1.1 fixes a bug with the fastidious option that caused it
 to ignore some connections between heavy and light swarms.
 
-<a name="version210"/>
 ### version 2.1.0 ###
 
 **swarm** 2.1.0 marks the first official release of swarm 2.
 
-<a name="version207"/>
 ### version 2.0.7 ###
 
 **swarm** 2.0.7 writes abundance information in usearch style when using
 options `-w` (`--seeds`) in combination with `-z` (`--usearch-abundance`).
 
-<a name="version206"/>
 ### version 2.0.6 ###
 
 **swarm** 2.0.6 fixes a minor bug.
 
-<a name="version205"/>
 ### version 2.0.5 ###
 
 **swarm** 2.0.5 improves the implementation of the fastidious option
@@ -507,41 +479,34 @@ representatives sequences with updated abundances (sum of all
 abundances inside each OTU). This version also enables dereplication
 when `d = 0`.
 
-<a name="version204"/>
 ### version 2.0.4 ###
 
 **swarm** 2.0.4 includes a fully parallelized fastidious option.
 
-<a name="version203"/>
 ### version 2.0.3 ###
 
 **swarm** 2.0.3 includes a working fastidious option.
 
-<a name="version202"/>
 ### version 2.0.2 ###
 
 **swarm** 2.0.2 fixes SSSE3 problems.
 
-<a name="version201"/>
 ### version 2.0.1 ###
 
 **swarm** 2.0.1 is a development release that partially implements the 
 fastidious option.
 
-<a name="version200"/>
 ### version 2.0.0 ###
 
 **swarm** 2.0.0 simplifies the usage of swarm by using the fast
 algorithm and the built-in OTU breaking by default. Some options are
 changed and some new output options are introduced.
 
-<a name="version1221"/>
 ### version 1.2.21 ###
 
 **swarm** 1.2.21 is supposed to fix some problems related to the use of the
 SSSE3 CPU instructions which are not always available.
 
-<a name="version1220"/>
 ### version 1.2.20 ###
 
 **swarm** 1.2.20 presents a production-ready version of the
@@ -554,32 +519,27 @@ rigourously identical to the results previously produced with
 swarm. That release also introduces new options to control swarm
 output (options `-i` and `-l`).
 
-<a name="version1219"/>
 ### version 1.2.19 ###
 
 **swarm** 1.2.19 fixes a problem related to abundance information when
   the sequence identifier includes multiple underscore characters.
 
-<a name="version1218"/>
 ### version 1.2.18 ###
 
 **swarm** 1.2.18 reenables the possibility of reading sequences from
   `stdin` if no file name is specified on the command line. It also
   fixes a bug related to CPU features detection.
 
-<a name="version1217"/>
 ### version 1.2.17 ###
 
 **swarm** 1.2.17 fixes a memory allocation bug introduced in
   version 1.2.15.
 
-<a name="version1216"/>
 ### version 1.2.16 ###
 
 **swarm** 1.2.16 fixes a bug in the abundance sort introduced in
   version 1.2.15.
 
-<a name="version1215"/>
 ### version 1.2.15 ###
 
 **swarm** 1.2.15 sorts the input sequences in order of decreasing
@@ -587,29 +547,24 @@ output (options `-i` and `-l`).
   the alternative algorithm for *d* = 1 it also sorts all subseeds in
   order of decreasing abundance.
 
-<a name="version1214"/>
 ### version 1.2.14 ###
 
 **swarm** 1.2.14 fixes a bug in the output with the swarm breaker
   option (`-b`) when using the alternative algorithm (`-a`).
 
-<a name="version1213"/>
 ### version 1.2.13 ###
 
 **swarm** 1.2.13 updates the citation.
 
-<a name="version1212"/>
 ### version 1.2.12 ###
 
 **swarm** 1.2.12 improves speed of new search strategy for *d* = 1.
 
-<a name="version1211"/>
 ### version 1.2.11 ###
 
 **swarm** 1.2.11 corrects the number of differences reported in the
   break swarms output.
 
-<a name="version1210"/>
 ### version 1.2.10 ###
 
 **swarm** 1.2.10 allows amplicon abundances to be specified using the
@@ -617,7 +572,6 @@ output (options `-i` and `-l`).
   `-z` option is chosen. Also fixes the bad URL shown in the previous
   version of swarm.
 
-<a name="version129"/>
 ### version 1.2.9 ###
 
 **swarm** 1.2.9 includes a parallelized variant of the new search
@@ -630,7 +584,6 @@ output (options `-i` and `-l`).
   differences between the seed and the amplicon is indicated in the
   last column.
 
-<a name="version128"/>
 ### version 1.2.8 ###
 
 **swarm** 1.2.8 fixes an error with the gap extension
@@ -641,13 +594,11 @@ output (options `-i` and `-l`).
   sequences or more. The new strategy can be turned on with the `-a`
   option.
 
-<a name="version127"/>
 ### version 1.2.7 ###
 
 **swarm** 1.2.7 incorporates a few small changes and improvements to
   make it ready for integration into QIIME.
 
-<a name="version126"/>
 ### version 1.2.6 ###
 
 **swarm** 1.2.6 add an option (`-r` or `--mothur`) to format the
@@ -656,7 +607,6 @@ output (options `-i` and `-l`).
   input sequences it will now report the illegal character and the
   line number.
 
-<a name="version125"/>
 ### version 1.2.5 ###
 
 **swarm** 1.2.5 can be run on CPUs without the POPCNT feature. It
@@ -664,7 +614,6 @@ output (options `-i` and `-l`).
   the appropriate code.  The code that avoids POPCNT is just slightly
   slower. Only basic SSE2 is now required.
 
-<a name="version124"/>
 ### version 1.2.4 ###
 
 **swarm** 1.2.4 changes the name of the new option from
@@ -672,7 +621,6 @@ output (options `-i` and `-l`).
   options, and also adds a companion script `swarm_breaker.py` to
   refine swarm results (`scripts` folder).
 
-<a name="version123"/>
 ### version 1.2.3 ###
 
 **swarm** 1.2.3 adds an option (`-b` or `--break_swarms`) to output
@@ -681,20 +629,18 @@ output (options `-i` and `-l`).
   the inline assembly code is also changed for compatibility with more
   compilers.
 
-<a name="version122"/>
 ### version 1.2.2 ###
 
 **swarm** 1.2.2 fixes an issue with incorrect values in the statistics
   file (maximum generation and radius of swarms). This version is also
   a bit faster.
 
-<a name="version121"/>
 ### version 1.2.1 ###
 
 **swarm** 1.2.1 removes the need for a SSE4.1 capable CPU and should
   now be able to run on most servers, desktops and laptops.
 
-<a name="version120"/>
+
 ### version 1.2.0 ###
 
 **swarm** 1.2.0 introduces a pre-filtering of similar amplicons based
@@ -705,7 +651,7 @@ output (options `-i` and `-l`).
   a computational overhead, but becomes more and more efficient as the
   size of the amplicon set increases.
 
-<a name="version111"/>
+
 ### version 1.1.1 ###
 
 **swarm** now works on Apple computers. This version also corrects an
@@ -713,7 +659,7 @@ output (options `-i` and `-l`).
   sub-optimal alignments. Slightly different alignments may result
   relative to previous version, giving slightly different swarms.
 
-<a name="version110"/>
+
 ### version 1.1.0 ###
 
 **swarm** 1.1.0 introduces new optimizations and is 20% faster than


=====================================
man/swarm.1
=====================================
--- a/man/swarm.1
+++ b/man/swarm.1
@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH swarm 1 "January 16, 2017" "version 2.1.12" "USER COMMANDS"
+.TH swarm 1 "December 12, 2017" "version 2.2.2" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 swarm \(em find clusters of nearly-identical nucleotide amplicons
@@ -42,22 +42,24 @@ extremely fast Needleman-Wunsch algorithm making use of the Streaming
 SIMD Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are
 not available, \fBswarm\fR exits with an error message.
 .PP
-\fBswarm\fR reads the named input \fIfilename\fR, a fasta file of
-nucleotide amplicons. The amplicon identifier is defined as the string
-comprised between the ">" symbol and the first space or the end of the
+\fBswarm\fR can read nucleotide amplicons in fasta format from a
+normal file or from the standard input (using a pipe or a
+redirection). The amplicon identifier is defined as the string
+comprised between the '>' symbol and the first space or the end of the
 line, whichever comes first. As \fBswarm\fR outputs lists of amplicon
 identifiers, amplicon identifiers must be unique to avoid ambiguity;
-swarm exits with an error message if identifiers are not
-unique. Amplicon identifiers must end with a "_" followed by a
-positive integer representing the amplicon copy number (or abundance
-annotation; usearch/vsearch users can use the option \-z to change
-that behavior). Abundance annotations play a crucial role in the
-clustering process, and swarm exits with an error message if that
+swarm exits with an error message if identifiers are not unique.
+Amplicon identifiers must end with a '_' followed by a positive
+integer representing the amplicon copy number (or abundance
+annotation; usearch/vsearch users can use the option \-z to use
+the ';size=' annotation). Abundance annotations play a crucial role in
+the clustering process, and swarm exits with an error message if that
 information is not available. The amplicon sequence is defined as a
-string of [acgt] or [acgu] symbols (case insensitive), starting after
-the end of the identifier line and ending before the next identifier
-line or the file end; \fBswarm\fR exits with an error message if any
-other symbol is present.
+string of [ACGT] or [ACGU] symbols (case insensitive, 'U' is replaced
+with 'T' internally), starting after the end of the identifier line
+and ending before the next identifier line or the file end;
+\fBswarm\fR silently removes newline symbols ('\\n' or '\\r') and
+exits with an error message if any other symbol is present.
 .\" ----------------------------------------------------------------------------
 .SS General options
 .TP 9
@@ -74,8 +76,8 @@ output version information and exit successfully.
 .TP
 .B \-\-
 delimit the option list. Later arguments, if any, are treated as
-operands even if they begin with "\-". For example, "swarm \-\-
-\-file.fasta" reads from the file "\-file.fasta".
+operands even if they begin with '\-'. For example, 'swarm \-\-
+\-file.fasta' reads from the file '\-file.fasta'.
 .LP
 .\" ----------------------------------------------------------------------------
 .SS Clustering options
@@ -86,9 +88,9 @@ that two amplicons will be grouped if they have \fIinteger\fR (or
 less) differences. This is \fBswarm\fR's most important parameter. The
 number of differences is calculated as the number of mismatches
 (substitutions, insertions or deletions) between the two amplicons
-once the optimal pairwise global alignment has been found (see
-"pairwise alignment advanced options" to influence that step). Any
-\fIinteger\fR between 0 and 256 can be used, but high \fId\fR values
+once the optimal pairwise global alignment has been found
+(see 'pairwise alignment advanced options' to influence that step).
+Any \fIinteger\fR from 0 to 255 can be used, but high \fId\fR values
 will decrease the taxonomical resolution of \fBswarm\fR
 results. Commonly used \fId\fR values are 1, 2 or 3, rarely
 higher. When using \fId\fR = 0, \fBswarm\fR will output results
@@ -122,7 +124,7 @@ of \fBswarm\fR results. Default mass of a large OTU is 3.
 when using the option \-\-fastidious (\-f), define \fBswarm\fR's
 maximum memory footprint (in megabytes). \fBswarm\fR will adjust the
 \-\-bloom\-bits (\-y) value of the Bloom filter to fit within the
-specified amount of memory.
+specified amount of memory. The value must be at least 3.
 .TP
 .B \-f\fP,\fB\ \-\-fastidious
 when working with \fId\fR = 1, perform a second clustering pass to
@@ -142,8 +144,9 @@ clustering results: the output files produced by the options \-\-log
 (\-l), \-\-output\-file (\-o), \-\-mothur (\-r), \-\-uclust\-file, and
 \-\-seeds (\-w) are updated to reflect these modifications; the file
 \-\-statistics\-file (\-s) is partially updated (columns 6 and 7 are
-not updated); the output file \-\-internal\-structure (\-i) is not
-updated.
+not updated); the output file \-\-internal\-structure (\-i) is
+partially updated (column 5 is not updated for amplicons that belonged to
+the small OTU).
 .TP
 .BI \-y\fP,\fB\ \-\-bloom\-bits\~ "positive integer"
 when using the option \-\-fastidious (\-f), define the size (in bits)
@@ -159,9 +162,10 @@ alternative way to control the memory footprint.
 .TP 9
 .BI \-a\fP,\fB\ \-\-append\-abundance\~ "positive integer"
 set abundance value to use when some or all amplicons in the input
-file lack abundance values. Warning, it is not recommended to use
-\fBswarm\fR on datasets where abundance values are all identical. We
-provide that option as a courtesy to advanced users, please use it
+file lack abundance values (_\fIinteger\fR, or ;size=\fIinteger\fR;
+when using \-z). Warning, it is not recommended to use \fBswarm\fR on
+datasets where abundance values are all identical. We provide that
+option as a courtesy to advanced users, please use it
 carefully. \fBswarm\fR exits with an error message if abundance values
 are missing and if this option is not used.
 .TP
@@ -183,8 +187,12 @@ OTU number (\fIpositive integer\fR). OTUs are numbered in their order
 of delineation, starting from 1. All pairs of amplicons belonging to
 the same OTU will receive the same number.
 .IP \n+[step].
-number of steps from the OTU seed to amplicon B (\fIpositive
-integer\fR).
+cummulated number of steps from the OTU seed to amplicon B
+(\fIpositive integer\fR). When using the option \-\-fastidious (\-f),
+the actual number of steps between grafted amplicons and the OTU seed
+cannot be re-computed efficiently and is always set to 2 for the
+amplicon pair linking the small OTU to the big OTU. Cummulated number
+of steps in the small OTU (if any) are left unchanged.
 .RE
 .RE
 .TP
@@ -222,17 +230,54 @@ initial seed abundance,
 number of amplicons with an abundance of 1 in the OTU,
 .IP \n+[step].
 maximum number of iterations before the OTU reached its natural
-limit),
+limit,
 .IP \n+[step].
-theoretical maximum radius of the OTU (i.e., number of cummulated
-differences between the seed and the furthermost amplicon in the
-OTU). The actual maximum radius of the OTU is often much smaller.
+cummulated number of steps along the path joining the seed and the
+furthermost amplicon in the OTU. Please note that the actual number of
+differences between the seed and the furthermost amplicon is usually
+much smaller. When using the option \-\-fastidious (\-f), grafted
+amplicons are not taken into account.
 .RE
 .RE
 .TP
 .BI \-u\fP,\fB\ \-\-uclust\-file \0filename
-output clustering results in uclust-like file format to the specified
-file. That option does not modify \fBswarm\fR's default output format.
+output clustering results in \fIfilename\fR using a tab-separated
+uclust-like format with 10 columns and 3 different type of entries (S,
+H or C). That option does not modify \fBswarm\fR's default output
+format. Each fasta sequence in the input file can be either a cluster
+centroid (S) or a hit (H) assigned to a cluster. Cluster records (C)
+summarize information (size, centroid label) for each cluster. Column
+content varies with the type of entry (S, H or C):
+.RS
+.RS
+.nr step 1 1
+.IP \n[step]. 4
+Record type: S, H, or C.
+.IP \n+[step].
+Cluster number (zero-based).
+.IP \n+[step].
+Centroid length (S), query length (H), or cluster size (C).
+.IP \n+[step].
+Percentage of similarity with the centroid sequence (H), or set to '*'
+(S, C).
+.IP \n+[step].
+Match orientation + or - (H), or set to '*' (S, C).
+.IP \n+[step].
+Not used, always set to '*' (S, C) or to zero (H).
+.IP \n+[step].
+Not used, always set to '*' (S, C) or to zero (H).
+.IP \n+[step].
+set to '*' (S, C) or, for H, compact representation of the pairwise
+alignment using the CIGAR format (Compact Idiosyncratic Gapped
+Alignment Report): M (match), D (deletion) and I (insertion). The
+equal sign '=' indicates that the query is identical to the centroid
+sequence.
+.IP \n+[step].
+Label of the query sequence (H), or of the centroid sequence (S, C).
+.IP \n+[step].
+Label of the centroid sequence (H), or set to '*' (S, C).
+.RE
+.RE
 .TP
 .BI \-w\fP,\fB\ \-\-seeds \0filename
 output OTU representatives to \fIfilename\fR in fasta format. The
@@ -344,13 +389,37 @@ New features and important modifications of \fBswarm\fR (short lived
 or minor bug releases are not mentioned):
 .RS
 .TP
+.BR v2.2.2\~ "released December 12, 2017"
+Version 2.2.2 fixes a bug that would cause Swarm to wait forever in
+very rare cases when multiple threads were used.
+.TP
+.BR v2.2.1\~ "released October 27, 2017"
+Version 2.2.1 fixes a memory allocation bug for \fId\fR = 1 and
+duplicated sequences.
+.TP
+.BR v2.2.0\~ "released October 17, 2017"
+Version 2.2.0 fixes several problems and improves usability. Corrected
+output to structure and uclust files when using fastidious
+mode. Corrected abundance output in some cases. Added check for
+duplicated sequences and fixed check for duplicated sequence
+IDs. Checks for empty sequences. Sorts sequences by additional fields
+to improve stability. Improves compatibility with compilers and
+operating systems.  Outputs sequences in upper case. Allows 64-bit
+abundances. Shows message when waiting for input from stdin. Improves
+error messages and warnings. Improves checking of command line
+options. Fixes remaining errors reported by test suite. Updates
+documentation.
+.TP
+.BR v2.1.13\~ "released March 8, 2017"
+Version 2.1.13 removes a bug with the progress bar when writing seeds.
+.TP
 .BR v2.1.12\~ "released January 16, 2017"
 Version 2.1.12 removes a debugging message.
 .TP
 .BR v2.1.11\~ "released January 16, 2017"
 Version 2.1.11 fixes two bugs related to the SIMD implementation
 of alignment that might result in incorrect alignments and scores.
-The bug only applies when d>1.
+The bug only applies when \fId\fR > 1.
 .TP
 .BR v2.1.10\~ "released December 22, 2016"
 Version 2.1.10 fixes two bugs related to gap penalties of alignments.
@@ -370,7 +439,7 @@ shown when \fId\fR = 1.
 .BR v2.1.7\~ "released February 24, 2016"
 Version 2.1.7 fixes a bug in the output of seeds with the \-w option
 when \fId\fR > 1 that was not properly fixed in version 2.1.6. It also
-handles ascii character n°13 (CR) in FASTA files better. Swarm will
+handles ascii character #13 (CR) in FASTA files better. Swarm will
 now exit with status 0 if the \-h or the \-v option is specified. The
 help text and some error messages have been improved.
 .TP
@@ -484,7 +553,7 @@ has been noticeably improved.
 .TP
 .BR v1.2.10\~ "released August 8, 2014"
 Version 1.2.10 allows amplicon abundances to be specified using the
-usearch style in the sequence header (e.g. ">id;size=1") when the \-z
+usearch style in the sequence header (e.g. '>id;size=1') when the \-z
 option is chosen.
 .TP
 .BR v1.2.8\~ "released August 5, 2014"


=====================================
src/Makefile
=====================================
--- a/src/Makefile
+++ b/src/Makefile
@@ -25,13 +25,12 @@
 #COMMON=-pg -g
 COMMON=-g
 
-COMPILEOPT=-Wall -Wsign-compare -O3 -msse2 -mtune=core2 -Icityhash
-
 LIBS=-lpthread
 LINKFLAGS=$(COMMON)
 
 CXX=g++
-CXXFLAGS=$(COMPILEOPT) $(COMMON)
+WARNINGS=-Wall -Wsign-compare -Wextra -Wpedantic -Wno-long-long
+CXXFLAGS=$(COMMON) $(WARNINGS) -O3 -msse2 -mtune=core2 -Icityhash
 
 PROG=swarm
 
@@ -41,11 +40,6 @@ OBJS=swarm.o db.o search8.o search16.o nw.o matrix.o util.o scan.o \
 DEPS=Makefile swarm.h bitmap.h bloom.h cityhash/config.h cityhash/city.h \
 	threads.h
 
-.SUFFIXES:.o .cc
-
-%.o : %.cc $(DEPS)
-	$(CXX) $(CXXFLAGS) -c -o $@ $<
-
 all : $(PROG)
 
 swarm : $(OBJS)
@@ -57,4 +51,4 @@ clean :
 	rm -rf swarm *.o *~ ../bin/ gmon.out cityhash/*.o ../man/*~ ../*~
 
 ssse3.o : ssse3.cc $(DEPS)
-	$(CXX) -mssse3 $(CXXFLAGS) -c -o $@ $<
+	$(CXX) $(CXXFLAGS) -mssse3 -c -o $@ $<


=====================================
src/algo.cc
=====================================
--- a/src/algo.cc
+++ b/src/algo.cc
@@ -55,14 +55,12 @@ void algo_run()
   count_comparisons_8 = 0;
   count_comparisons_16 = 0;
 
-  unsigned long searches = 0;
-
 #ifdef VERBOSE
+  unsigned long searches = 0;
   unsigned long estimates = 0;
 #endif
 
   unsigned long largestswarm = 0;
-  unsigned long swarmsize = 0;
 
   unsigned long maxgenerations = 0;
 
@@ -113,7 +111,7 @@ void algo_run()
   
   unsigned long bits;
 
-  if ((unsigned long)resolution <= diff_saturation)
+  if ((unsigned long)opt_differences <= diff_saturation)
     bits = 8;
   else
     bits = 16;
@@ -131,6 +129,7 @@ void algo_run()
       
       swarmid++;
 
+      unsigned long swarmsize = 0;
       unsigned long amplicons_copies = 0;
       unsigned long singletons = 0;
       unsigned long hitcount = 0;
@@ -184,7 +183,7 @@ void algo_run()
           unsigned poolampliconid = qgramamps[i];
           long diff = qgramdiffs[i];
           amps[swarmed+i].diffestimate = diff;
-          if (diff <= resolution)
+          if (diff <= opt_differences)
             {
               targetindices[targetcount] = swarmed+i;
               targetampliconids[targetcount] = poolampliconid;
@@ -196,7 +195,9 @@ void algo_run()
         {
           search_do(seedampliconid, targetcount, targetampliconids, 
                     scores, diffs, alignlengths, bits);
+#ifdef VERBOSE
           searches++;
+#endif
 
           if (bits == 8)
             count_comparisons_8 += targetcount;
@@ -218,7 +219,7 @@ void algo_run()
 
               unsigned diff = diffs[t];
 
-              if (diff <= (unsigned long) resolution)
+              if (diff <= (unsigned long) opt_differences)
                 {
                   unsigned i = targetindices[t];
 
@@ -294,7 +295,7 @@ void algo_run()
               for(unsigned long i=swarmed; i<amplicons; i++)
                 {
                   unsigned long targetampliconid = amps[i].ampliconid;
-                  if ((amps[i].diffestimate <= subseedradius + resolution) &&
+                  if ((amps[i].diffestimate <= subseedradius + opt_differences) &&
                       ((opt_no_otu_breaking) || 
                        (db_getabundance(targetampliconid)
                         <= subseedabundance)))
@@ -313,7 +314,7 @@ void algo_run()
 #endif
 
               for(unsigned long i=0; i < listlen; i++)
-                if ((long)qgramdiffs[i] <= resolution)
+                if ((long)qgramdiffs[i] <= opt_differences)
                   {
                     targetindices[targetcount] = qgramindices[i];
                     targetampliconids[targetcount] = qgramamps[i];
@@ -324,7 +325,9 @@ void algo_run()
                 {
                   search_do(subseedampliconid, targetcount, targetampliconids, 
                             scores, diffs, alignlengths, bits);
+#ifdef VERBOSE
                   searches++;
+#endif
 
                   if (bits == 8)
                     count_comparisons_8 += targetcount;
@@ -335,7 +338,7 @@ void algo_run()
                     {
                       unsigned diff = diffs[t];
               
-                      if (diff <= (unsigned long) resolution)
+                      if (diff <= (unsigned long) opt_differences)
                         {
                           unsigned i = targetindices[t];
                 
@@ -486,12 +489,12 @@ void algo_run()
       char sep_amplicons;
       char sep_swarms;
 
-      if (mothur)
+      if (opt_mothur)
         {
           /* mothur list file output */
           sep_amplicons = ',';
           sep_swarms = '\t';
-          fprintf(outfile, "swarm_%ld\t%lu\t", resolution, swarmid);
+          fprintf(outfile, "swarm_%ld\t%lu\t", opt_differences, swarmid);
         }
       else
         {


=====================================
src/algod1.cc
=====================================
--- a/src/algod1.cc
+++ b/src/algod1.cc
@@ -42,7 +42,7 @@ static struct ampinfo_s
   int parent;
   int generation; 
   int next;       /* amp id of next amplicon in swarm */
-  int graft_cand; /* amp id of potential grafting parent (fastitdious) */
+  int graft_cand; /* amp id of potential grafting parent (fastidious) */
 } * ampinfo = 0;
 
 /* Information about each swarm (OTU) */
@@ -375,7 +375,7 @@ void * worker(void * vp)
   while (tip->work >= 0)
     {
       /* wait for work available */
-      if (tip->work == 0)
+      while (tip->work == 0)
         pthread_cond_wait(&tip->workcond, &tip->workmutex);
       if (tip->work > 0)
         {
@@ -397,10 +397,10 @@ void threads_init()
   /* allocate memory for thread info, incl the variant sequences */
   unsigned long longestamplicon = db_getlongestsequence();
   ti = (struct thread_info_s *) 
-    xmalloc(threads * sizeof(struct thread_info_s));
+    xmalloc(opt_threads * sizeof(struct thread_info_s));
   
   /* init and create worker threads */
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       tip->varseq = (unsigned char*) xmalloc(longestamplicon+1);
@@ -417,7 +417,7 @@ void threads_init()
 void threads_done()
 {
   /* finish and clean up worker threads */
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       
@@ -442,37 +442,18 @@ void threads_done()
   pthread_attr_destroy(&attr);
 }
 
-void swarm_breaker_info(int amp)
-{
-  /* output info for swarm_breaker script */
-  if (opt_internal_structure)
-    {
-      long seed = ampinfo[amp].parent;
-      fprint_id_noabundance(internal_structure_file, seed);
-      fprintf(internal_structure_file, "\t");
-      fprint_id_noabundance(internal_structure_file, amp);
-      fprintf(internal_structure_file, "\t%d", 1);
-      fprintf(internal_structure_file, "\t%d\t%d", 
-              ampinfo[seed].swarmid + 1,
-              ampinfo[amp].generation);
-      fprintf(internal_structure_file, "\n");
-    }
-}
-
 void add_amp_to_swarm(int amp)
 {
   /* add to swarm */
   ampinfo[current_swarm_tail].next = amp;
   current_swarm_tail = amp;
-
-  swarm_breaker_info(amp);
 }
 
-void process_seed(int seed, int subseed)
+void process_seed(int subseed)
 {
   unsigned long seqlen = db_getsequencelen(subseed);
 
-  threads_used = threads;
+  threads_used = opt_threads;
   if (threads_used > seqlen + 1)
     threads_used = seqlen+1;
 
@@ -510,7 +491,8 @@ void process_seed(int seed, int subseed)
     {
       if (global_hits_count + ti[t].hits_count > global_hits_alloc)
         {
-          global_hits_alloc <<= 1;
+          while (global_hits_count + ti[t].hits_count > global_hits_alloc)
+            global_hits_alloc <<= 1;
           global_hits_data = (int*)xrealloc(global_hits_data,
                                             global_hits_alloc * sizeof(int));
         }
@@ -648,7 +630,12 @@ int attach_candidates(int amplicons)
       int parent = graft_array[i].parent;
       int child  = graft_array[i].child;
 
-      if (!swarminfo[ampinfo[child].swarmid].attached)
+      if (swarminfo[ampinfo[child].swarmid].attached)
+        {
+          /* this light swarm is already attached */
+          ampinfo[child].graft_cand = NO_SWARM;
+        }
+      else
         {
           /* attach child to parent */
           attach(parent, child);
@@ -949,6 +936,8 @@ BloomFilter * bloomp;
 
 void mark_light_thread(long t)
 {
+  (void) t;
+
   char * buffer1 = (char*) xmalloc(db_getlongestsequence() + 2);
   pthread_mutex_lock(&light_mutex);
   while (light_progress < light_amplicon_count)
@@ -976,6 +965,8 @@ static long amplicons;
 
 void check_heavy_thread(long t)
 {
+  (void) t;
+
   char * buffer1 = (char*) xmalloc(db_getlongestsequence() + 2);
   char * buffer2 = (char*) xmalloc(db_getlongestsequence() + 3);
   pthread_mutex_lock(&heavy_mutex);
@@ -1071,7 +1062,7 @@ void algo_d1_run()
           global_hits_count = 0;
 
           /* find the first generation matches */
-          process_seed(seed, seed);
+          process_seed(seed);
 
           /* sort hits */
           qsort(global_hits_data, global_hits_count,
@@ -1089,7 +1080,7 @@ void algo_d1_run()
               global_hits_count = 0;
               while(subseed != NO_SWARM)
                 {
-                  process_seed(seed, subseed);
+                  process_seed(subseed);
                   update_stats(subseed);
                   subseed = ampinfo[subseed].next;
                 }
@@ -1227,7 +1218,7 @@ void algo_d1_run()
             }
 
           fprintf(logfile,
-                  "Bloom filter: bits=%ld, m=%ld, k=%ld, size=%.1lfMB\n",
+                  "Bloom filter: bits=%ld, m=%ld, k=%ld, size=%.1fMB\n",
                   bits, m, k, 1.0 * m / (8*1024*1024));
           
           bloomp = new BloomFilter(m, k);
@@ -1247,7 +1238,7 @@ void algo_d1_run()
           light_progress = 0;
           light_amplicon_count = amplicons_in_small_otus;
           light_amplicon = amplicons - 1;
-          ThreadRunner * tr = new ThreadRunner(threads, mark_light_thread);
+          ThreadRunner * tr = new ThreadRunner(opt_threads, mark_light_thread);
           tr->run();
           delete tr;
           pthread_mutex_destroy(&light_mutex);
@@ -1286,7 +1277,7 @@ void algo_d1_run()
           heavy_progress = 0;
           heavy_amplicon_count = amplicons_in_large_otus;
           heavy_amplicon = 0;
-          ThreadRunner * heavy_tr = new ThreadRunner(threads, 
+          ThreadRunner * heavy_tr = new ThreadRunner(opt_threads, 
                                                      check_heavy_thread);
           heavy_tr->run();
           delete heavy_tr;
@@ -1331,8 +1322,8 @@ void algo_d1_run()
 
   progress_init("Writing swarms:   ", swarmcount);
 
-  if (mothur)
-    fprintf(outfile, "swarm_%ld\t%ld", resolution, swarmcount_adjusted);
+  if (opt_mothur)
+    fprintf(outfile, "swarm_%ld\t%ld", opt_differences, swarmcount_adjusted);
 
   for(int i = 0; i < swarmcount; i++)
     {
@@ -1343,7 +1334,7 @@ void algo_d1_run()
                a >= 0;
                a = ampinfo[a].next)
             {
-              if (mothur)
+              if (opt_mothur)
                 {
                   if (a == seed)
                     fputc('\t', outfile);
@@ -1357,13 +1348,13 @@ void algo_d1_run()
                 }
               fprint_id(outfile, a);
             }
-          if (!mothur)
+          if (!opt_mothur)
             fputc('\n', outfile);
         }
       progress_update(i+1);
     }
 
-  if (mothur)
+  if (opt_mothur)
     fputc('\n', outfile);
   
   progress_done();
@@ -1373,7 +1364,7 @@ void algo_d1_run()
 
   if (opt_seeds)
     {
-      progress_init("Writing seeds:    ", swarmcount_adjusted);
+      progress_init("Writing seeds:    ", swarmcount);
       for(int i=0; i < swarmcount; i++)
         {
           if (!swarminfo[i].attached)
@@ -1390,10 +1381,80 @@ void algo_d1_run()
     }
 
 
+  /* output internal structure */
+
+  if (opt_internal_structure)
+    {
+      unsigned int cluster_no = 0;
+
+      progress_init("Writing structure:", swarmcount);
+
+      for(unsigned int swarmid = 0; swarmid < swarmcount ; swarmid++)
+        {
+          if (!swarminfo[swarmid].attached)
+            {
+              int seed = swarminfo[swarmid].seed;
+
+              struct ampinfo_s * bp = ampinfo + seed;
+
+              for (int a = bp->next;
+                   a >= 0;
+                   a = ampinfo[a].next)
+                {
+                  long graft_parent = ampinfo[a].graft_cand;
+                  if (graft_parent != NO_SWARM)
+                    {
+                      fprint_id_noabundance(internal_structure_file, graft_parent);
+                      fprintf(internal_structure_file, "\t");
+                      fprint_id_noabundance(internal_structure_file, a);
+                      fprintf(internal_structure_file,
+                              "\t%d\t%d\t%d\n",
+                              2,
+                              cluster_no + 1,
+                              ampinfo[graft_parent].generation + 1);
+                    }
+
+                  long parent = ampinfo[a].parent;
+                  if (parent != NO_SWARM)
+                    {
+                      int diff = 1;
+                      if (duplicates_found)
+                        {
+                          unsigned long parentseqlen = db_getsequencelen(parent);
+                          unsigned long ampseqlen = db_getsequencelen(a);
+                          if (parentseqlen == ampseqlen)
+                            {
+                              unsigned char * parentseq = (unsigned char *) db_getsequence(parent);
+                              unsigned char * ampseq = (unsigned char *) db_getsequence(a);
+                              if (memcmp(parentseq, ampseq, parentseqlen) == 0)
+                                diff = 0;
+                            }
+                        }
+                      fprint_id_noabundance(internal_structure_file, parent);
+                      fprintf(internal_structure_file, "\t");
+                      fprint_id_noabundance(internal_structure_file, a);
+                      fprintf(internal_structure_file,
+                              "\t%d\t%d\t%d\n",
+                              diff,
+                              cluster_no + 1,
+                              ampinfo[a].generation);
+                    }
+                }
+
+              cluster_no++;
+            }
+          progress_update(swarmid);
+        }
+      progress_done();
+    }
+
+
   /* output swarm in uclust format */
 
   if (uclustfile)
     {
+      unsigned int cluster_no = 0;
+
       progress_init("Writing UCLUST:   ", swarmcount);
 
       for(unsigned int swarmid = 0; swarmid < swarmcount ; swarmid++)
@@ -1405,13 +1466,13 @@ void algo_d1_run()
               struct ampinfo_s * bp = ampinfo + seed;
               
               fprintf(uclustfile, "C\t%u\t%d\t*\t*\t*\t*\t*\t",
-                      swarmid,
+                      cluster_no,
                       swarminfo[swarmid].size);
               fprint_id(uclustfile, seed);
               fprintf(uclustfile, "\t*\n");
               
               fprintf(uclustfile, "S\t%u\t%lu\t*\t*\t*\t*\t*\t",
-                      swarmid,
+                      cluster_no,
                       db_getsequencelen(seed));
               fprint_id(uclustfile, seed);
               fprintf(uclustfile, "\t*\n");
@@ -1440,7 +1501,7 @@ void algo_d1_run()
                   
                   fprintf(uclustfile,
                           "H\t%d\t%lu\t%.1f\t+\t0\t0\t%s\t",
-                          ampinfo[seed].swarmid,
+                          cluster_no,
                           db_getsequencelen(a),
                           percentid, 
                           nwdiff > 0 ? nwalignment : "=");
@@ -1453,6 +1514,8 @@ void algo_d1_run()
                   if (nwalignment)
                     free(nwalignment);
                 }
+
+              cluster_no++;
             }
           progress_update(swarmid);
         }


=====================================
src/bitmap.h
=====================================
--- a/src/bitmap.h
+++ b/src/bitmap.h
@@ -29,7 +29,7 @@ class Bitmap
   
  public:
 
-  Bitmap(size_t _size)
+  explicit Bitmap(size_t _size)
   {
     size = _size;
     data = (unsigned char *) xmalloc((size+7)/8);


=====================================
src/db.cc
=====================================
--- a/src/db.cc
+++ b/src/db.cc
@@ -90,21 +90,31 @@ void showseq(char * seq)
 
 void fprint_id(FILE * stream, unsigned long x)
 {
-  fprintf(stream, "%.*s", seqindex[x].headeridlen, seqindex[x].header);
+  seqinfo_t * sp = seqindex + x;
+  char * h = sp->header;
+  int hdrlen = sp->headerlen;
+
+  if (opt_append_abundance && (sp->abundance_start == sp->abundance_end))
+    if (opt_usearch_abundance)
+      fprintf(stream, "%.*s;size=%lu;", hdrlen, h, sp->abundance);
+    else
+      fprintf(stream, "%.*s_%lu", hdrlen, h, sp->abundance);
+  else
+    fprintf(stream, "%.*s", hdrlen, h);
 }
 
 void fprint_id_noabundance(FILE * stream, unsigned long x)
 {
   seqinfo_t * sp = seqindex + x;
   char * h = sp->header;
-  int hdrlen = sp->headeridlen;
+  int hdrlen = sp->headerlen;
 
   if (sp->abundance_start < sp->abundance_end)
     {
       /* print start of header */
       fprintf(stream, "%.*s", sp->abundance_start, h);
       
-      if (usearch_abundance)
+      if (opt_usearch_abundance)
         {
           /* print semicolon if the abundance is not at either end */
           if ((sp->abundance_start > 0) && (sp->abundance_end < hdrlen))
@@ -115,9 +125,7 @@ void fprint_id_noabundance(FILE * stream, unsigned long x)
         }
     }
   else
-    {
-      fprintf(stream, "%s", h);
-    }
+    fprintf(stream, "%.*s", hdrlen, h);
 }
 
 void fprint_id_with_new_abundance(FILE * stream,
@@ -126,14 +134,14 @@ void fprint_id_with_new_abundance(FILE * stream,
 {
   seqinfo_t * sp = seqindex + seqno;
 
-  if (usearch_abundance)
+  if (opt_usearch_abundance)
     fprintf(stream,
             "%.*s%ssize=%lu;%.*s",
             sp->abundance_start,
             sp->header,
             sp->abundance_start > 0 ? ";" : "",
             abundance,
-            sp->headeridlen - sp->abundance_end,
+            sp->headerlen - sp->abundance_end,
             sp->header + sp->abundance_end);
   else
     fprintf(stream,
@@ -153,14 +161,7 @@ int db_compare_abundance(const void * a, const void * b)
   else if (x->abundance < y->abundance)
     return +1;
   else 
-    {
-      if (x < y)
-        return -1;
-      else if (x > y)
-        return +1;
-      else
-        return 0;
-    }
+    return strcmp(x->header, y->header);
 }
 
 void db_read(const char * filename)
@@ -189,14 +190,15 @@ void db_read(const char * filename)
 
   /* get file size */
 
-  long filesize = 0;
-  if (filename)
-    {
-      if (fseek(fp, 0, SEEK_END))
-        fatal("Error: Unable to seek in database file (%s)", filename);
-      filesize = ftell(fp);
-      rewind(fp);
-    }
+  struct stat fs;
+
+  if (fstat(fileno(fp), & fs))
+    fatal("Unable to fstat on input file (%s)", filename);
+  bool is_regular = S_ISREG(fs.st_mode);
+  long filesize = is_regular ? fs.st_size : 0;
+
+  if (! is_regular)
+    fprintf(logfile, "Waiting for data... (Hit Ctrl-C and run swarm -h if you meant to read data from a file.)\n");
 
   char line[LINEALLOC];
   line[0] = 0;
@@ -263,10 +265,11 @@ void db_read(const char * filename)
 
       while (line[0] && (line[0] != '>'))
         {
-          char m;
           char c;
           char * p = line;
           while((c = *p++))
+	    {
+	    char m;
             if ((m = map_nt[(int)c]) >= 0)
               {
                 while (datalen >= dataalloc)
@@ -280,13 +283,19 @@ void db_read(const char * filename)
               }
             else if ((c != 10) && (c != 13))
               {
-                char msg[100];
                 if ((c >= 32) && (c <= 126))
-                  snprintf(msg, 100, "Illegal character '%c' in sequence on line %u", c, lineno);
+                  fprintf(stderr,
+                          "Illegal character '%c' in sequence on line %u",
+                          c,
+                          lineno);
                 else
-                  snprintf(msg, 100, "Illegal character (ascii no %d) in sequence on line %u", c, lineno);
-                fatal(msg);
+                  fprintf(stderr,
+                          "Illegal character (ascii no %d) in sequence on line %u",
+                          c,
+                          lineno);
+                exit(1);
               }
+	    }
           line[0] = 0;
           if (!fgets(line, LINEALLOC, fp))
             line[0] = 0;
@@ -301,6 +310,12 @@ void db_read(const char * filename)
       
       long length = datalen - seqbegin;
 
+      if (length == 0)
+        {
+          fprintf(stderr, "\nError: Empty sequence found on line %u.\n\n", lineno-1);
+          exit(1);
+        }
+
       nucleotides += length;
 
       if (length > longest)
@@ -311,7 +326,7 @@ void db_read(const char * filename)
 
       sequences++;
       
-      if (filename)
+      if (is_regular)
         progress_update(ftell(fp));
     }
   progress_done();
@@ -322,12 +337,25 @@ void db_read(const char * filename)
 
   unsigned long hdrhashsize = 2 * sequences;
 
-  seqinfo_t * * hdrhashtable = 
+  seqinfo_t * * hdrhashtable =
     (seqinfo_t **) xmalloc(hdrhashsize * sizeof(seqinfo_t *));
   memset(hdrhashtable, 0, hdrhashsize * sizeof(seqinfo_t *));
 
   unsigned long duplicatedidentifiers = 0;
-  
+
+  /* set up hash to check for unique sequences */
+
+  unsigned long seqhashsize = 2 * sequences;
+
+  seqinfo_t * * seqhashtable = 0;
+
+  if (opt_differences > 0)
+    {
+      seqhashtable =
+        (seqinfo_t **) xmalloc(seqhashsize * sizeof(seqinfo_t *));
+      memset(seqhashtable, 0, seqhashsize * sizeof(seqinfo_t *));
+    }
+
   /* create indices */
 
   seqindex = (seqinfo_t *) xmalloc(sequences * sizeof(seqinfo_t));
@@ -336,7 +364,7 @@ void db_read(const char * filename)
   regex_t db_regexp;
   regmatch_t pmatch[4];
 
-  if (usearch_abundance)
+  if (opt_usearch_abundance)
     {
       if (regcomp(&db_regexp, "(^|;)size=([0-9]+)(;|$)", REG_EXTENDED))
         fatal("Regular expression compilation failed");
@@ -347,11 +375,12 @@ void db_read(const char * filename)
         fatal("Regular expression compilation failed");
     }
 
-  long lastabundance = LONG_MAX;
+  seqinfo_t * lastseq = 0;
 
   int presorted = 1;
   int missingabundance = 0;
   unsigned int missingabundance_lineno = 0;
+  char * missingabundance_header = 0;
 
   char * p = datap;
   progress_init("Indexing database:", sequences);
@@ -364,26 +393,40 @@ void db_read(const char * filename)
       /* get header */
       seqindex_p->header = p;
       seqindex_p->headerlen = strlen(seqindex_p->header);
-      seqindex_p->headeridlen = seqindex_p->headerlen;
       p += seqindex_p->headerlen + 1;
 
+      /* and sequence */
+      seqindex_p->seq = p;
+      seqindex_p->seqlen = strlen(p);
+      p += seqindex_p->seqlen + 1;
+
       /* get amplicon abundance */
-      seqindex_p->abundance = 0;
       if (!regexec(&db_regexp, seqindex_p->header, 4, pmatch, 0))
         {
           seqindex_p->abundance = atol(seqindex_p->header + pmatch[2].rm_so);
           seqindex_p->abundance_start = pmatch[0].rm_so;
           seqindex_p->abundance_end = pmatch[0].rm_eo;
+
+          if (seqindex_p->abundance == 0)
+            {
+              fprintf(stderr,
+                      "\nError: Illegal abundance value on line %u:\n%s\n"
+                      "Abundance values should be positive integers.\n\n",
+                      lineno,
+                      seqindex_p->header);
+              exit(1);
+            }
         }
       else
         {
-          seqindex_p->abundance_start = 0;
-          seqindex_p->abundance_end = 0;
+          seqindex_p->abundance_start = seqindex_p->headerlen;
+          seqindex_p->abundance_end = seqindex_p->headerlen;
+          seqindex_p->abundance = 0;
         }
       
       if (seqindex_p->abundance < 1)
         {
-          if (opt_append_abundance > 0)
+          if (opt_append_abundance)
             {
               seqindex_p->abundance = opt_append_abundance;
             }
@@ -391,42 +434,85 @@ void db_read(const char * filename)
             {
               missingabundance++;
               if (missingabundance == 1)
-                missingabundance_lineno = lineno;
+                {
+                  missingabundance_lineno = lineno;
+                  missingabundance_header = seqindex_p->header;
+                }
+            }
+        }
+
+      if (seqindex_p->abundance_start == 0)
+          fatal("Empty sequence identifier");
+
+
+      /* check if the sequences are presorted by abundance and header */
+
+      if (presorted && lastseq)
+        {
+          if (lastseq->abundance < seqindex_p->abundance)
+            presorted = 0;
+          else if (lastseq->abundance == seqindex_p->abundance)
+            {
+              if (strcmp(lastseq->header, seqindex_p->header) > 0)
+                presorted = 0;
             }
         }
 
-      if (seqindex_p->abundance > lastabundance)
-        presorted = 0;
+      lastseq = seqindex_p;
 
-      lastabundance = seqindex_p->abundance;
 
-      /* check hash, fatal error if found, otherwize insert new */
-      unsigned long hdrhash = HASH((unsigned char*)seqindex_p->header, seqindex_p->headeridlen);
+      /* check for duplicated identifiers using hash table */
+
+      unsigned long hdrhash = HASH((unsigned char*)seqindex_p->header, seqindex_p->abundance_start);
       seqindex_p->hdrhash = hdrhash;
-      unsigned long hashindex = hdrhash % hdrhashsize;
+      unsigned long hdrhashindex = hdrhash % hdrhashsize;
 
-      seqinfo_t * found;
+      seqinfo_t * hdrfound = 0;
     
-      while ((found = hdrhashtable[hashindex]))
+      while ((hdrfound = hdrhashtable[hdrhashindex]))
         {
-          if ((found->hdrhash == hdrhash) &&
-              (found->headeridlen == seqindex_p->headeridlen) &&
-              (strncmp(found->header, seqindex_p->header, found->headeridlen) == 0))
+          if ((hdrfound->hdrhash == hdrhash) &&
+              (hdrfound->abundance_start == seqindex_p->abundance_start) &&
+              (strncmp(hdrfound->header, seqindex_p->header, hdrfound->abundance_start) == 0))
             break;
-          hashindex = (hashindex + 1) % hdrhashsize;
+          hdrhashindex = (hdrhashindex + 1) % hdrhashsize;
         }
 
-      if (found)
+      if (hdrfound)
         {
           duplicatedidentifiers++;
-          fprintf(logfile, "WARNING: Duplicated sequence identifier: %s\n", seqindex_p->header);
+          fprintf(stderr, "\nError: Duplicated sequence identifier: %.*s\n\n",
+                  seqindex_p->abundance_start,
+                  seqindex_p->header);
+          exit(1);
         }
 
-      hdrhashtable[hashindex] = seqindex_p;
+      hdrhashtable[hdrhashindex] = seqindex_p;
     
-      seqindex_p->seq = p;
-      seqindex_p->seqlen = strlen(p);
-      p += seqindex_p->seqlen + 1;
+
+      if (opt_differences > 0)
+        {
+          /* check for duplicated sequences using hash table */
+          unsigned long seqhash = HASH((unsigned char*)seqindex_p->seq,
+                                       seqindex_p->seqlen);
+          seqindex_p->seqhash = seqhash;
+          unsigned long seqhashindex = seqhash % seqhashsize;
+          seqinfo_t * seqfound = 0;
+
+          while ((seqfound = seqhashtable[seqhashindex]))
+            {
+              if ((seqfound->seqhash == seqhash) &&
+                  (seqfound->seqlen == seqindex_p->seqlen) &&
+                  (memcmp(seqfound->seq, seqindex_p->seq, seqfound->seqlen) == 0))
+                break;
+              seqhashindex = (seqhashindex + 1) % seqhashsize;
+            }
+
+          if (seqfound)
+            duplicates_found++;
+          else
+            seqhashtable[seqhashindex] = seqindex_p;
+        }
 
       seqindex_p++;
       progress_update(i);
@@ -435,17 +521,26 @@ void db_read(const char * filename)
 
   if (missingabundance)
     {
-      char * msg;
-      if (asprintf(&msg,
-                   "Abundance annotations not found for %d sequences, starting on line %u.\n"
-                   "Fasta headers must end with abundance annotations (_INT or ;size=INT).\n"
-                   "The -z option must be used if the abundance annotation is in the latter format.\n"
-                   "Abundance annotations can be produced by dereplicating the sequences.\n",
-                   missingabundance,
-                   missingabundance_lineno) == -1)
-        fatal("Out of memory");
-      else
-        fatal(msg);
+      fprintf(stderr,
+              "\nError: Abundance annotations not found for %d sequences, starting on line %u.\n"
+              ">%s\n"
+              "Fasta headers must end with abundance annotations (_INT or ;size=INT).\n"
+              "The -z option must be used if the abundance annotation is in the latter format.\n"
+              "Abundance annotations can be produced by dereplicating the sequences.\n"
+              "The header is defined as the string comprised between the \">\" symbol\n"
+              "and the first space or the end of the line, whichever comes first.\n\n",
+              missingabundance,
+              missingabundance_lineno,
+              missingabundance_header);
+      exit(1);
+    }
+
+  if (duplicates_found)
+    {
+      fprintf(logfile,
+              "WARNING: %lu duplicated sequences detected.\n"
+              "Please consider dereplicating your data for optimal results.\n",
+              duplicates_found);
     }
 
   if (!presorted)
@@ -459,8 +554,11 @@ void db_read(const char * filename)
 
   free(hdrhashtable);
 
-  if (duplicatedidentifiers)
-    exit(1);
+  if (seqhashtable)
+    {
+      free(seqhashtable);
+      seqhashtable = 0;
+    }
 }
 
 void db_qgrams_init()


=====================================
src/derep.cc
=====================================
--- a/src/derep.cc
+++ b/src/derep.cc
@@ -213,20 +213,20 @@ void dereplicate()
 
   progress_init("Writing swarms:   ", swarmcount);
 
-  if (mothur)
-    fprintf(outfile, "swarm_%ld\t%ld", resolution, swarmcount);
+  if (opt_mothur)
+    fprintf(outfile, "swarm_%ld\t%ld", opt_differences, swarmcount);
 
   for(int i = 0; i < swarmcount; i++)
     {
       int seed = hashtable[i].seqno_first;
-      if (mothur)
+      if (opt_mothur)
         fputc('\t', outfile);
       fprint_id(outfile, seed);
       int a = nextseqtab[seed];
 
       while (a)
         {
-          if (mothur)
+          if (opt_mothur)
             fputc(',', outfile);
           else
             fputc(SEPCHAR, outfile);
@@ -234,13 +234,13 @@ void dereplicate()
           a = nextseqtab[a];
         }
       
-      if (!mothur)
+      if (!opt_mothur)
         fputc('\n', outfile);
 
       progress_update(i+1);
     }
 
-  if (mothur)
+  if (opt_mothur)
     fputc('\n', outfile);
   
   progress_done();
@@ -345,7 +345,7 @@ void dereplicate()
           fprint_id_noabundance(statsfile, sp->seqno_first);
           fprintf(statsfile, "\t%lu\t%u\t%u\t%u\n", 
                   db_getabundance(sp->seqno_first),
-                  sp->singletons, 0, 0);
+                  sp->singletons, 0U, 0U);
           progress_update(i);
         }
       progress_done();


=====================================
src/nw.cc
=====================================
--- a/src/nw.cc
+++ b/src/nw.cc
@@ -125,7 +125,6 @@ void nw(char * dseq,
      hearray must point to at least 2*qlen longs of allocated memory (8*qlen bytes) */
 
   long n, e;
-  long unsigned *hep;
 
   long qlen = qend - qseq;
   long dlen = dend - dseq;
@@ -142,6 +141,7 @@ void nw(char * dseq,
 
   for(j=0; j<dlen; j++)
   {
+    long unsigned *hep;
     hep = hearray;
     long f = 2 * gapopen + (j+2) * gapextend;
     long h = (j == 0) ? 0 : (gapopen + j * gapextend);


=====================================
src/qgram.cc
=====================================
--- a/src/qgram.cc
+++ b/src/qgram.cc
@@ -210,7 +210,8 @@ void * qgram_worker(void * vp)
   while (tip->work >= 0)
     {
       /* wait for work available */
-      pthread_cond_wait(&tip->workcond, &tip->workmutex);
+      while (tip->work == 0)
+        pthread_cond_wait(&tip->workcond, &tip->workmutex);
       if (tip->work > 0)
         {
           qgram_work_diff(tip);
@@ -228,11 +229,11 @@ void qgram_diff_init()
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
   
   /* allocate memory for thread info */
-  ti = (struct thread_info_s *) xmalloc(threads * 
+  ti = (struct thread_info_s *) xmalloc(opt_threads * 
                                         sizeof(struct thread_info_s));
   
   /* init and create worker threads */
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       tip->work = 0;
@@ -246,7 +247,7 @@ void qgram_diff_init()
 void qgram_diff_done()
 {
   /* finish and clean up worker threads */
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       
@@ -273,7 +274,7 @@ void qgram_diff_fast(unsigned long seed,
                      unsigned long * amplist,
                      unsigned long * difflist)
 {
-  long thr = threads;
+  long thr = opt_threads;
   
   const unsigned long m = 150;
 


=====================================
src/scan.cc
=====================================
--- a/src/scan.cc
+++ b/src/scan.cc
@@ -237,7 +237,8 @@ void * search_worker(void * vp)
   while (tip->work >= 0)
     {
       /* wait for work available */
-      pthread_cond_wait(&tip->workcond, &tip->workmutex);
+      while (tip->work == 0)
+        pthread_cond_wait(&tip->workcond, &tip->workmutex);
       if (tip->work > 0)
         {
           search_worker_core(t);
@@ -268,7 +269,7 @@ void search_do(unsigned long query_no,
   master_alignlengths = alignlengths;
   master_bits = bits;
 
-  unsigned long thr = threads;
+  unsigned long thr = opt_threads;
 
   if (bits == 8)
     {
@@ -315,9 +316,9 @@ void search_begin()
 {
   longestdbsequence = db_getlongestsequence();
   
-  sd = (struct search_data *) xmalloc(sizeof(search_data) * threads);
+  sd = (struct search_data *) xmalloc(sizeof(search_data) * opt_threads);
 
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     search_alloc(sd+t);
 
   /* start threads */
@@ -326,11 +327,11 @@ void search_begin()
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
   
   /* allocate memory for thread info */
-  ti = (struct thread_info_s *) xmalloc(threads * 
+  ti = (struct thread_info_s *) xmalloc(opt_threads *
                                         sizeof(struct thread_info_s));
   
   /* init and create worker threads */
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       tip->work = 0;
@@ -346,7 +347,7 @@ void search_end()
 {
   /* finish and clean up worker threads */
 
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     {
       struct thread_info_s * tip = ti + t;
       
@@ -367,7 +368,7 @@ void search_end()
   free(ti);
   pthread_attr_destroy(&attr);
 
-  for(unsigned long t=0; t<threads; t++)
+  for(long t=0; t<opt_threads; t++)
     search_free(sd+t);
   free(sd);
 }


=====================================
src/search16.cc
=====================================
--- a/src/search16.cc
+++ b/src/search16.cc
@@ -570,7 +570,7 @@ void search16(WORD * * q_start,
   unsigned long next_id = 0;
   unsigned long done;
   
-  T0 = _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, 0xffff);
+  T0 = _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, -1);
   Q  = _mm_set1_epi16(gap_open_penalty+gap_extend_penalty);
   R  = _mm_set1_epi16(gap_extend_penalty);
 


=====================================
src/search8.cc
=====================================
--- a/src/search8.cc
+++ b/src/search8.cc
@@ -854,8 +854,7 @@ void search8(BYTE * * q_start,
   unsigned long next_id = 0;
   unsigned long done;
   
-  T0 = _mm_set_epi8(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
-                    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff);
+  T0 = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1);
   Q  = _mm_set1_epi8(gap_open_penalty+gap_extend_penalty);
   R  = _mm_set1_epi8(gap_extend_penalty);
 


=====================================
src/swarm.cc
=====================================
--- a/src/swarm.cc
+++ b/src/swarm.cc
@@ -23,47 +23,34 @@
 
 #include "swarm.h"
 
-/* ARGUMENTS AND THEIR DEFAULTS */
-
-#define DEFAULT_GAPOPEN 12
-#define DEFAULT_GAPEXTEND 4
-#define DEFAULT_MATCHSCORE 5
-#define DEFAULT_MISMATCHSCORE (-4)
-#define DEFAULT_THREADS 1
-#define DEFAULT_RESOLUTION 1
-#define DEFAULT_BREAK_SWARMS 0
-#define DEFAULT_MOTHUR 0
-#define DEFAULT_USEARCH_ABUNDANCE 0
-#define DEFAULT_INTERNAL_STRUCTURE 0
-#define DEFAULT_LOG 0
-#define DEFAULT_NO_OTU_BREAKING 0
-#define DEFAULT_FASTIDIOUS 0
-#define DEFAULT_BOUNDARY 3
-
-char * outfilename;
-char * statsfilename;
-char * uclustfilename;
+/* OPTIONS */
+
 char * progname;
-char * databasename;
-long gapopen;
-long gapextend;
-long matchscore;
-long mismatchscore;
-unsigned long threads;
-long resolution;
-long break_swarms;
-long mothur;
-long usearch_abundance;
+char * input_filename;
 
-char * opt_log;
 char * opt_internal_structure;
+char * opt_log;
+char * opt_output_file;
 char * opt_seeds;
-long opt_no_otu_breaking;
-long opt_fastidious;
-long opt_boundary;
+char * opt_statistics_file;
+char * opt_uclust_file;
+
+long opt_append_abundance;
 long opt_bloom_bits;
+long opt_boundary;
 long opt_ceiling;
-long opt_append_abundance;
+long opt_differences;
+long opt_fastidious;
+long opt_gap_extension_penalty;
+long opt_gap_opening_penalty;
+long opt_help;
+long opt_match_reward;
+long opt_mismatch_penalty;
+long opt_mothur;
+long opt_no_otu_breaking;
+long opt_threads;
+long opt_usearch_abundance;
+long opt_version;
 
 long penalty_factor;
 long penalty_gapextend;
@@ -85,6 +72,8 @@ long avx2_present = 0;
 
 unsigned long dbsequencecount = 0;
 
+unsigned long duplicates_found = 0;
+
 FILE * outfile;
 FILE * statsfile;
 FILE * uclustfile;
@@ -92,12 +81,23 @@ FILE * logfile = stderr;
 FILE * internal_structure_file;
 FILE * fp_seeds = 0;
 
-char sym_nt[] = "-acgt                           ";
+char sym_nt[] = "-ACGT                           ";
 
-#define cpuid(f1, f2, a, b, c, d)                                       \
-  __asm__ __volatile__ ("cpuid"                                         \
-                        : "=a" (a), "=b" (b), "=c" (c), "=d" (d)        \
+char * DASH_FILENAME = (char*) "-";
+char * STDIN_NAME = (char*) "/dev/stdin";
+char * STDOUT_NAME = (char*) "/dev/stdout";
+
+void cpuid(unsigned int f1,
+           unsigned int f2,
+           unsigned int & a,
+           unsigned int & b,
+           unsigned int & c,
+           unsigned int & d)
+{
+  __asm__ __volatile__ ("cpuid"
+                        : "=a" (a), "=b" (b), "=c" (c), "=d" (d)
                         : "a" (f1), "c" (f2));
+}
 
 void cpu_features_detect()
 {
@@ -165,24 +165,33 @@ long args_long(char * str, const char * option)
 void args_show()
 {
   cpu_features_show();
-  fprintf(logfile, "Database file:     %s\n", databasename ? databasename : "(stdin)");
-  fprintf(logfile, "Output file:       %s\n", outfilename ? outfilename : "(stdout)");
-  if (statsfilename)
-    fprintf(logfile, "Statistics file:   %s\n", statsfilename);
-  if (uclustfilename)
-    fprintf(logfile, "Uclust file:       %s\n", uclustfilename);
-  fprintf(logfile, "Resolution (d):    %ld\n", resolution);
-  fprintf(logfile, "Threads:           %lu\n", threads);
-
-  if (resolution > 1)
+  fprintf(logfile, "Database file:     %s\n", input_filename);
+  fprintf(logfile, "Output file:       %s\n", opt_output_file);
+  if (opt_statistics_file)
+    fprintf(logfile, "Statistics file:   %s\n", opt_statistics_file);
+  if (opt_uclust_file)
+    fprintf(logfile, "Uclust file:       %s\n", opt_uclust_file);
+  fprintf(logfile, "Resolution (d):    %ld\n", opt_differences);
+  fprintf(logfile, "Threads:           %ld\n", opt_threads);
+
+  if (opt_differences > 1)
     {
-  fprintf(logfile, "Scores:            match: %ld, mismatch: %ld\n", matchscore, mismatchscore);
-  fprintf(logfile, "Gap penalties:     opening: %ld, extension: %ld\n", gapopen, gapextend);
-  fprintf(logfile, "Converted costs:   mismatch: %ld, gap opening: %ld, gap extension: %ld\n", penalty_mismatch, penalty_gapopen, penalty_gapextend);
+      fprintf(logfile,
+              "Scores:            match: %ld, mismatch: %ld\n",
+              opt_match_reward, opt_mismatch_penalty);
+      fprintf(logfile,
+              "Gap penalties:     opening: %ld, extension: %ld\n",
+              opt_gap_opening_penalty, opt_gap_extension_penalty);
+      fprintf(logfile,
+              "Converted costs:   mismatch: %ld, gap opening: %ld, "
+              "gap extension: %ld\n",
+              penalty_mismatch, penalty_gapopen, penalty_gapextend);
     }
-  fprintf(logfile, "Break OTUs:        %s\n", opt_no_otu_breaking ? "No" : "Yes");
+  fprintf(logfile, "Break OTUs:        %s\n",
+          opt_no_otu_breaking ? "No" : "Yes");
   if (opt_fastidious)
-    fprintf(logfile, "Fastidious:        Yes, with boundary %ld\n", opt_boundary);
+    fprintf(logfile, "Fastidious:        Yes, with boundary %ld\n",
+            opt_boundary);
   else
     fprintf(logfile, "Fastidious:        No\n");
 }
@@ -192,7 +201,7 @@ void args_usage()
   /*               0         1         2         3         4         5         6         7          */
   /*               01234567890123456789012345678901234567890123456789012345678901234567890123456789 */
 
-  fprintf(stderr, "Usage: swarm [OPTIONS] [filename]\n");
+  fprintf(stderr, "Usage: swarm [OPTIONS] [FASTAFILE]\n");
   fprintf(stderr, "\n");
   fprintf(stderr, "General options:\n");
   fprintf(stderr, " -h, --help                          display this help and exit\n");
@@ -200,11 +209,13 @@ void args_usage()
   fprintf(stderr, " -v, --version                       display version information and exit\n");
   fprintf(stderr, "\n");
   fprintf(stderr, "Clustering options:\n");
+  fprintf(stderr, " -d, --differences INTEGER           resolution (1)\n");
+  fprintf(stderr, " -n, --no-otu-breaking               never break OTUs (not recommended!)\n");
+  fprintf(stderr, "\n");
+  fprintf(stderr, "Fastidious options (only when d = 1):\n");
   fprintf(stderr, " -b, --boundary INTEGER              min mass of large OTU for fastidious (3)\n");
   fprintf(stderr, " -c, --ceiling INTEGER               max memory in MB used for fastidious\n");
-  fprintf(stderr, " -d, --differences INTEGER           resolution (1)\n");
   fprintf(stderr, " -f, --fastidious                    link nearby low-abundance swarms\n");
-  fprintf(stderr, " -n, --no-otu-breaking               never break OTUs\n");
   fprintf(stderr, " -y, --bloom-bits INTEGER            bits used per Bloom filter entry (16)\n");
   fprintf(stderr, "\n");
   fprintf(stderr, "Input/output options:\n");
@@ -218,7 +229,7 @@ void args_usage()
   fprintf(stderr, " -w, --seeds FILENAME                write seed seqs with abundances to FASTA\n");
   fprintf(stderr, " -z, --usearch-abundance             abundance annotation in usearch style\n");
   fprintf(stderr, "\n");
-  fprintf(stderr, "Pairwise alignment advanced options:\n");
+  fprintf(stderr, "Pairwise alignment advanced options (only when d > 1):\n");
   fprintf(stderr, " -m, --match-reward INTEGER          reward for nucleotide match (5)\n");
   fprintf(stderr, " -p, --mismatch-penalty INTEGER      penalty for nucleotide mismatch (4)\n");
   fprintf(stderr, " -g, --gap-opening-penalty INTEGER   gap open penalty (12)\n");
@@ -249,217 +260,315 @@ void args_init(int argc, char **argv)
   /* Set defaults */
 
   progname = argv[0];
+  input_filename = DASH_FILENAME;
 
-  databasename = NULL;
-  outfilename = NULL;
-  statsfilename = NULL;
-  resolution = DEFAULT_RESOLUTION;
-  threads = DEFAULT_THREADS;
-  matchscore = DEFAULT_MATCHSCORE;
-  mismatchscore = DEFAULT_MISMATCHSCORE;
-  gapopen = DEFAULT_GAPOPEN;
-  gapextend = DEFAULT_GAPEXTEND;
-  mothur = DEFAULT_MOTHUR;
-  usearch_abundance = DEFAULT_USEARCH_ABUNDANCE;
-  opt_log = DEFAULT_LOG;
-  opt_internal_structure = DEFAULT_INTERNAL_STRUCTURE;
-  opt_no_otu_breaking = DEFAULT_NO_OTU_BREAKING;
-  opt_fastidious = DEFAULT_FASTIDIOUS;
-  opt_boundary = DEFAULT_BOUNDARY;
+  opt_append_abundance = 0;
   opt_bloom_bits = 16;
-  opt_seeds = 0;
+  opt_boundary = 3;
   opt_ceiling = 0;
-  opt_append_abundance = 0;
+  opt_differences = 1;
+  opt_fastidious = 0;
+  opt_gap_extension_penalty = 4;
+  opt_gap_opening_penalty = 12;
+  opt_help = 0;
+  opt_internal_structure = 0;
+  opt_log = 0;
+  opt_match_reward = 5;
+  opt_mismatch_penalty = 4;
+  opt_mothur = 0;
+  opt_no_otu_breaking = 0;
+  opt_output_file = DASH_FILENAME;
+  opt_seeds = 0;
+  opt_statistics_file = 0;
+  opt_threads = 1;
+  opt_uclust_file = 0;
+  opt_usearch_abundance = 0;
+  opt_version = 0;
 
   opterr = 1;
 
-  char short_options[] = "d:ho:t:vm:p:g:e:s:u:rzi:l:nfb:w:y:c:a:";
+  char short_options[] = "a:b:c:d:e:fg:hi:l:m:no:p:rs:t:u:vw:y:z";
+
+  /* unused short option letters: jkqx */
 
   static struct option long_options[] =
   {
+    {"append-abundance",      required_argument, NULL, 'a' },
+    {"boundary",              required_argument, NULL, 'b' },
+    {"ceiling",               required_argument, NULL, 'c' },
     {"differences",           required_argument, NULL, 'd' },
+    {"gap-extension-penalty", required_argument, NULL, 'e' },
+    {"fastidious",            no_argument,       NULL, 'f' },
+    {"gap-opening-penalty",   required_argument, NULL, 'g' },
     {"help",                  no_argument,       NULL, 'h' },
-    {"output-file",           required_argument, NULL, 'o' },
-    {"threads",               required_argument, NULL, 't' },
-    {"version",               no_argument,       NULL, 'v' },
+    {"internal-structure",    required_argument, NULL, 'i' },
+    {"log",                   required_argument, NULL, 'l' },
     {"match-reward",          required_argument, NULL, 'm' },
+    {"no-otu-breaking",       no_argument,       NULL, 'n' },
+    {"output-file",           required_argument, NULL, 'o' },
     {"mismatch-penalty",      required_argument, NULL, 'p' },
-    {"gap-opening-penalty",   required_argument, NULL, 'g' },
-    {"gap-extension-penalty", required_argument, NULL, 'e' },
+    {"mothur",                no_argument,       NULL, 'r' },
     {"statistics-file",       required_argument, NULL, 's' },
+    {"threads",               required_argument, NULL, 't' },
     {"uclust-file",           required_argument, NULL, 'u' },
-    {"mothur",                no_argument,       NULL, 'r' },
-    {"usearch-abundance",     no_argument,       NULL, 'z' },
-    {"internal-structure",    required_argument, NULL, 'i' },
-    {"log",                   required_argument, NULL, 'l' },
-    {"no-otu-breaking",       no_argument,       NULL, 'n' },
-    {"fastidious",            no_argument,       NULL, 'f' },
-    {"boundary",              required_argument, NULL, 'b' },
+    {"version",               no_argument,       NULL, 'v' },
     {"seeds",                 required_argument, NULL, 'w' },
     {"bloom-bits",            required_argument, NULL, 'y' },
-    {"ceiling",               required_argument, NULL, 'c' },
-    {"append-abundance",      required_argument, NULL, 'a' },
+    {"usearch-abundance",     no_argument,       NULL, 'z' },
     { 0, 0, 0, 0 }
   };
   
+  int used_options[26] = { 0, 0, 0, 0, 0,
+                           0, 0, 0, 0, 0,
+                           0, 0, 0, 0, 0,
+                           0, 0, 0, 0, 0,
+                           0, 0, 0, 0, 0,
+                           0 };
+
   int option_index = 0;
   int c;
   
   while ((c = getopt_long(argc, argv, short_options, long_options, &option_index)) != -1)
   {
-    switch(c)
-    {
-    case 'd':
-      /* differences (resolution) */
-      resolution = args_long(optarg, "-d or --differences");
-      break;
-          
-    case 'h':
-      /* help */
-      show_header();
-      args_usage();
-      exit(0);
-      break;
 
-    case 'o':
-      /* output-file */
-      outfilename = optarg;
-      break;
-          
-    case 't':
-      /* threads */
-      threads = args_long(optarg, "-t or --threads");
-      break;
-          
-    case 'v':
-      /* version */
-      show_header();
-      exit(0);
-      break;
+    /* check if any option is specified more than once */
+
+    if ((c >= 'a') && (c <= 'z'))
+      {
+        int optindex = c - 'a';
+        if (used_options[optindex] == 1)
+          {
+            int longoptindex = 0;
+            while (long_options[longoptindex].name)
+              {
+                if (long_options[longoptindex].val == c)
+                  break;
+                longoptindex++;
+              }
+
+            fprintf(stderr,
+                    "Error: Option -%c or --%s specified more than once.\n",
+                    c,
+                    long_options[longoptindex].name);
+            exit(1);
+          }
+        used_options[optindex] = 1;
+      }
 
-    case 'm':
-      /* match-reward */
-      matchscore = args_long(optarg, "-m or --match-reward");
-      break;
-          
-    case 'p':
-      /* mismatch-penalty */
-      mismatchscore = - args_long(optarg, "-p or --mismatch-penalty");
-      break;
-          
-    case 'g':
-      /* gap-opening-penalty */
-      gapopen = args_long(optarg, "-g or --gap-opening-penalty");
-      break;
-          
-    case 'e':
-      /* gap extension penalty */
-      gapextend = args_long(optarg, "-e or --gap-extension-penalty");
-      break;
-          
-    case 's':
-      /* statistics-file */
-      statsfilename = optarg;
-      break;
-          
-    case 'u':
-      /* uclust-file */
-      uclustfilename = optarg;
-      break;
-          
-    case 'r':
-      /* mothur */
-      mothur = 1;
-      break;
-          
-    case 'z':
-      /* usearch-abundance */
-      usearch_abundance = 1;
-      break;
-          
-    case 'i':
-      /* internal-structure */
-      opt_internal_structure = optarg;
-      break;
-          
-    case 'l':
-      /* log */
-      opt_log = optarg;
-      break;
-          
-    case 'n':
-      /* no-otu-breaking */
-      opt_no_otu_breaking = 1;
-      break;
-          
-    case 'f':
-      /* fastidious */
-      opt_fastidious = 1;
-      break;
-          
-    case 'b':
-      /* boundary */
-      opt_boundary = args_long(optarg, "-b or --boundary");
-      break;
+    switch(c)
+      {
+      case 'a':
+        /* append-abundance */
+        opt_append_abundance = args_long(optarg, "-a or --append-abundance");
+        break;
+
+      case 'b':
+        /* boundary */
+        opt_boundary = args_long(optarg, "-b or --boundary");
+        break;
+
+      case 'c':
+        /* ceiling */
+        opt_ceiling = args_long(optarg, "-c or --ceiling");
+        break;
+
+      case 'd':
+        /* differences (resolution) */
+        opt_differences = args_long(optarg, "-d or --differences");
+        break;
+
+      case 'e':
+        /* gap extension penalty */
+        opt_gap_extension_penalty = args_long(optarg, "-e or --gap-extension-penalty");
+        break;
           
-    case 'w':
-      /* seeds */
-      opt_seeds = optarg;
-      break;
-      
-    case 'y':
-      /* bloom-bits */
-      opt_bloom_bits = args_long(optarg, "-y or --bloom-bits");
-      break;
-      
-    case 'c':
-      /* ceiling */
-      opt_ceiling = args_long(optarg, "-c or --ceiling");
-      break;
-      
-    case 'a':
-      /* append-abundance */
-      opt_append_abundance = args_long(optarg, "-a or --append-abundance");
-      break;
-      
-    default:
-      show_header();
-      args_usage();
-      exit(1);
-      break;
+      case 'f':
+        /* fastidious */
+        opt_fastidious = 1;
+        break;
+
+      case 'g':
+        /* gap-opening-penalty */
+        opt_gap_opening_penalty = args_long(optarg, "-g or --gap-opening-penalty");
+        break;
+
+      case 'h':
+        /* help */
+        opt_help = 1;
+        break;
+
+      case 'i':
+        /* internal-structure */
+        opt_internal_structure = optarg;
+        break;
+
+      case 'l':
+        /* log */
+        opt_log = optarg;
+        break;
+
+      case 'm':
+        /* match-reward */
+        opt_match_reward = args_long(optarg, "-m or --match-reward");
+        break;
+
+      case 'n':
+        /* no-otu-breaking */
+        opt_no_otu_breaking = 1;
+        break;
+
+      case 'o':
+        /* output-file */
+        opt_output_file = optarg;
+        break;
+
+      case 'p':
+        /* mismatch-penalty */
+        opt_mismatch_penalty = args_long(optarg, "-p or --mismatch-penalty");
+        break;
+
+      case 'r':
+        /* mothur */
+        opt_mothur = 1;
+        break;
+
+      case 's':
+        /* statistics-file */
+        opt_statistics_file = optarg;
+        break;
+
+      case 't':
+        /* threads */
+        opt_threads = args_long(optarg, "-t or --threads");
+        break;
+
+      case 'u':
+        /* uclust-file */
+        opt_uclust_file = optarg;
+        break;
+
+      case 'v':
+        /* version */
+        opt_version = 1;
+        break;
+
+      case 'w':
+        /* seeds */
+        opt_seeds = optarg;
+        break;
+
+      case 'y':
+        /* bloom-bits */
+        opt_bloom_bits = args_long(optarg, "-y or --bloom-bits");
+        break;
+
+      case 'z':
+        /* usearch-abundance */
+        opt_usearch_abundance = 1;
+        break;
+
+      default:
+        show_header();
+        args_usage();
+        exit(1);
+        break;
     }
   }
   
   if (optind < argc)
-    databasename = argv[optind];
+    input_filename = argv[optind];
   
-  if (resolution < 0)
-    fatal("Illegal resolution specified.");
-
-  if ((threads < 1) || (threads > MAX_THREADS))
-    fatal("Illegal number of threads specified");
+  if ((opt_threads < 1) || (opt_threads > MAX_THREADS))
+    {
+      fprintf(stderr, "\nError: Illegal number of threads specified with -t or --threads, must be in the range 1 to %d.\n", MAX_THREADS);
+      exit(1);
+    }
   
-  if ((gapopen < 0) || (gapextend < 0) || ((gapopen + gapextend) < 1))
-    fatal("Illegal gap penalties specified.");
+  if ((opt_differences < 0) || (opt_differences > 255))
+    fatal("Illegal number of differences specified with -d or --differences, must be in the range 0 to 255.");
+
+  if (opt_fastidious && (opt_differences != 1))
+    fatal("Fastidious mode (specified with -f or --fastidious) only works when the resolution (specified with -d or --differences) is 1.");
+
+  if (!opt_fastidious)
+    {
+      if (used_options[1])
+        fatal("Option -b or --boundary specified without -f or --fastidious.\n");
+      if (used_options[2])
+        fatal("Option -c or --ceiling specified without -f or --fastidious.\n");
+      if (used_options[24])
+        fatal("Option -y or --bloom-bits specified without -f or --fastidious.\n");
+    }
+
+  if (opt_differences < 2)
+    {
+      if (used_options[12])
+        fatal("Option -m or --match-reward specified when d < 2.");
+      if (used_options[15])
+        fatal("Option -p or --mismatch-penalty specified when d < 2.");
+      if (used_options[6])
+        fatal("Option -g or --gap-opening-penalty specified when d < 2.");
+      if (used_options[4])
+        fatal("Option -e or --gap-extension-penalty specified when d < 2.");
+    }
+
+  if (opt_gap_opening_penalty < 0)
+    fatal("Illegal gap opening penalty specified with -g or --gap-opening-penalty, must not be negative.");
+
+  if (opt_gap_extension_penalty < 0)
+    fatal("Illegal gap extension penalty specified with -e or --gap-extension-penalty, must not be negative.");
+
+  if ((opt_gap_opening_penalty + opt_gap_extension_penalty) < 1)
+    fatal("Illegal gap penalties specified, the sum of the gap open and the gap extension penalty must be at least 1.");
 
-  if (matchscore < 1)
-    fatal("Illegal match reward specified.");
+  if (opt_match_reward < 1)
+    fatal("Illegal match reward specified with -m or --match-reward, must be at least 1.");
 
-  if (mismatchscore > -1)
-    fatal("Illegal mismatch penalty specified.");
+  if (opt_mismatch_penalty < 1)
+    fatal("Illegal mismatch penalty specified with -p or --mismatch-penalty, must be at least 1.");
+
+  if (opt_boundary < 2)
+    fatal("Illegal boundary specified with -b or --boundary, must be at least 2.");
+
+  if (used_options[2] && ((opt_ceiling < 3) || (opt_ceiling > 1073741824)))
+    fatal("Illegal memory ceiling specified with -c or --ceiling, must be in the range 3 to 1073741824 MB.");
 
   if ((opt_bloom_bits < 2) || (opt_bloom_bits > 64))
-    fatal("Illegal number of Bloom filter bits specified (must be 2..64).");
+    fatal("Illegal number of Bloom filter bits specified with -y or --bloom-bits, must be in the range 2 to 64.");
+
+  if (used_options[0] && (opt_append_abundance < 1))
+    fatal("Illegal abundance value specified with -a or --append-abundance, must be at least 1.");
+
+
+  /* replace filename "-" by "/dev/stdin" for input file options */
+
+  if (!strcmp(input_filename, DASH_FILENAME))
+    input_filename = STDIN_NAME;
+
+  /* replace filename "-" by "/dev/stdout" for output file options */
+
+  char * * stdout_options[] =
+    {
+      & opt_internal_structure,
+      & opt_log,
+      & opt_output_file,
+      & opt_statistics_file,
+      & opt_uclust_file,
+      & opt_seeds,
+      0
+    };
+
+  int o = 0;
+  while(char * * stdout_opt = stdout_options[o++])
+    if ((*stdout_opt) && (!strcmp(*stdout_opt, DASH_FILENAME)))
+      *stdout_opt = STDOUT_NAME;
 
-  if (opt_ceiling < 0)
-    fatal("Illegal memory ceiling");
 
-  if (opt_append_abundance < 0)
-    fatal("Illegal abundance value specified");
+  /* open files */
 
-  if (outfilename)
+  if (opt_output_file)
     {
-      outfile = fopen(outfilename, "w");
+      outfile = fopen(opt_output_file, "w");
       if (! outfile)
         fatal("Unable to open output file for writing.");
     }
@@ -475,18 +584,18 @@ void args_init(int argc, char **argv)
   else
     fp_seeds = 0;
   
-  if (statsfilename)
+  if (opt_statistics_file)
     {
-      statsfile = fopen(statsfilename, "w");
+      statsfile = fopen(opt_statistics_file, "w");
       if (! statsfile)
         fatal("Unable to open statistics file for writing.");
     }
   else
     statsfile = 0;
   
-  if (uclustfilename)
+  if (opt_uclust_file)
     {
-      uclustfile = fopen(uclustfilename, "w");
+      uclustfile = fopen(opt_uclust_file, "w");
       if (! uclustfile)
         fatal("Unable to open uclust file for writing.");
     }
@@ -509,10 +618,7 @@ void args_init(int argc, char **argv)
         fatal("Unable to open internal structure file for writing.");
     }
   else
-    internal_structure_file = stderr;
-
-  if (opt_fastidious && (resolution != 1))
-    fatal("The fastidious option only works when the resolution (d) is 1.\n");
+    internal_structure_file = 0;
 }
 
 int main(int argc, char** argv)
@@ -524,9 +630,17 @@ int main(int argc, char** argv)
   
   args_init(argc, argv);
 
-  penalty_mismatch = 2 * matchscore - 2 * mismatchscore;
-  penalty_gapopen = 2 * gapopen;
-  penalty_gapextend = matchscore + 2 * gapextend;
+  if (opt_version || opt_help)
+    {
+      show_header();
+      if (opt_help)
+        args_usage();
+      exit(0);
+    }
+
+  penalty_mismatch = 2 * opt_match_reward + 2 * opt_mismatch_penalty;
+  penalty_gapopen = 2 * opt_gap_opening_penalty;
+  penalty_gapextend = opt_match_reward + 2 * opt_gap_extension_penalty;
 
   penalty_factor = gcd(gcd(penalty_mismatch, penalty_gapopen), penalty_gapextend);
   
@@ -540,7 +654,7 @@ int main(int argc, char** argv)
 
   fprintf(logfile, "\n");
 
-  db_read(databasename);
+  db_read(input_filename);
   
   fprintf(logfile, "Database info:     %lu nt", db_getnucleotidecount());
   fprintf(logfile, " in %lu sequences,", db_getsequencecount());
@@ -552,7 +666,7 @@ int main(int argc, char** argv)
 
   search_begin();
   
-  switch (resolution)
+  switch (opt_differences)
     {
     case 0:
       dereplicate();
@@ -582,6 +696,6 @@ int main(int argc, char** argv)
   if (statsfile)
     fclose(statsfile);
 
-  if (outfilename)
+  if (opt_output_file)
     fclose(outfile);
 }


=====================================
src/swarm.h
=====================================
--- a/src/swarm.h
+++ b/src/swarm.h
@@ -29,8 +29,10 @@
 #include <regex.h>
 #include <limits.h>
 #include <city.h>
+#include <stdarg.h>
 #include <sys/types.h>
 #include <sys/resource.h>
+#include <sys/stat.h>
 
 #ifdef __APPLE__
 #include <sys/sysctl.h>
@@ -52,7 +54,7 @@
 #define LINE_MAX 2048
 #endif
 
-#define SWARM_VERSION "2.1.12"
+#define SWARM_VERSION "2.2.2"
 #define WIDTH 32
 #define WIDTH_SHIFT 5
 #define BLOCKWIDTH 32
@@ -86,12 +88,12 @@ struct seqinfo_s
 {
   char * header;
   char * seq;
-  unsigned int headerlen;
-  unsigned int headeridlen;
+  int headerlen;
   unsigned int seqlen;
-  unsigned int abundance;
+  unsigned long abundance;
   unsigned int clusterid;
-  unsigned int hdrhash;
+  unsigned long hdrhash;
+  unsigned long seqhash;
   int abundance_start;
   int abundance_end;
 };
@@ -112,18 +114,33 @@ typedef struct queryinfo queryinfo_t;
 
 /* common data */
 
+extern char * opt_internal_structure;
+extern char * opt_log;
+extern char * opt_output_file;
+extern char * opt_seeds;
+extern char * opt_statistics_file;
+extern char * opt_uclust_file;
+extern long opt_append_abundance;
+extern long opt_bloom_bits;
+extern long opt_boundary;
+extern long opt_ceiling;
+extern long opt_differences;
+extern long opt_fastidious;
+extern long opt_gap_extension_penalty;
+extern long opt_gap_opening_penalty;
+extern long opt_help;
+extern long opt_match_reward;
+extern long opt_mismatch_penalty;
+extern long opt_mothur;
+extern long opt_no_otu_breaking;
+extern long opt_threads;
+extern long opt_usearch_abundance;
+extern long opt_version;
+
 extern char * queryname;
 extern char * matrixname;
-extern long gapopen;
-extern long gapextend;
-extern long gapopenextend;
-extern long matchscore;
-extern long mismatchscore;
-extern unsigned long threads;
-extern char * databasename;
-extern long resolution;
-extern long mothur;
-extern long usearch_abundance;
+
+extern char * input_filename;
 
 extern char map_ncbi_nt4[];
 extern char map_ncbi_nt16[];
@@ -142,15 +159,6 @@ extern FILE * internal_structure_file;
 extern FILE * logfile;
 extern FILE * fp_seeds;
 
-extern char * opt_log;
-extern char * opt_internal_structure;
-extern char * opt_seeds;
-extern long opt_no_otu_breaking;
-extern long opt_fastidious;
-extern long opt_boundary;
-extern long opt_bloom_bits;
-extern long opt_ceiling;
-extern long opt_append_abundance;
 
 extern long SCORELIMIT_7;
 extern long SCORELIMIT_8;
@@ -180,6 +188,7 @@ extern unsigned long longestdbsequence;
 
 extern queryinfo_t query;
 
+extern unsigned long duplicates_found;
 
 /* functions in util.cc */
 
@@ -188,7 +197,6 @@ void fatal(const char * msg);
 void fatal(const char * format, const char * message);
 void * xmalloc(size_t size);
 void * xrealloc(void * ptr, size_t size);
-char * xstrchrnul(char *s, int c);
 unsigned long hash_fnv_1a_64(unsigned char * s, unsigned long n);
 unsigned int hash_fnv_1a_32(unsigned char * s, unsigned long n);
 unsigned long hash_djb2(unsigned char * s, unsigned long n);


=====================================
src/util.cc
=====================================
--- a/src/util.cc
+++ b/src/util.cc
@@ -105,16 +105,6 @@ void * xrealloc(void *ptr, size_t size)
   return t;
 }
 
-char * xstrchrnul(char *s, int c)
-{
-  char * r = strchr(s, c);
-
-  if (r)
-    return r;
-  else
-    return (char *)s + strlen(s);
-}
-
 unsigned long hash_fnv_1a_64(unsigned char * s, unsigned long n)
 {
   const unsigned long fnv_offset = 14695981039346656037UL;



View it on GitLab: https://salsa.debian.org/med-team/swarm-cluster/commit/a5b46547e7e1cb448d45220116fa5682559dd510

---
View it on GitLab: https://salsa.debian.org/med-team/swarm-cluster/commit/a5b46547e7e1cb448d45220116fa5682559dd510
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/debian-med-commit/attachments/20180219/77259e6e/attachment-0001.html>