[med-svn] [last-align] 01/06: New upstream version 755

Andreas Tille tille at debian.org
Tue Oct 18 11:46:17 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository last-align.

commit 9dfdbad7cbedc932eb5470a7e7b239f82753dd76
Author: Andreas Tille <tille at debian.org>
Date:   Sat Sep 10 22:37:04 2016 +0200

    New upstream version 755
---
 ChangeLog.txt                     |  23 ++++++-
 doc/last-map-probs.html           |   6 +-
 doc/last-map-probs.txt            |   6 +-
 doc/last-pair-probs.html          |  16 ++---
 doc/last-pair-probs.txt           |  16 ++---
 doc/last-parallel.html            |   4 +-
 doc/last-parallel.txt             |   4 +-
 doc/last-split.html               |  28 ++++----
 doc/last-split.txt                |  29 +++++----
 doc/last-tuning.html              |  37 ++++++-----
 doc/last-tuning.txt               |  43 ++++++-------
 doc/last-tutorial.html            | 129 ++++++++++++++++++-------------------
 doc/last-tutorial.txt             | 130 +++++++++++++++++---------------------
 doc/lastal.html                   | 104 +++++++++++++++---------------
 doc/lastal.txt                    |  89 +++++++++++++-------------
 doc/lastdb.html                   |  55 ++++++++--------
 doc/lastdb.txt                    |  55 ++++++++--------
 examples/last-bisulfite-paired.sh |   8 +--
 examples/last-bisulfite.sh        |   4 +-
 src/LastalArguments.cc            |  45 +++++++------
 src/LastdbArguments.cc            |  12 ++--
 src/split/last-split-main.cc      |   2 +-
 src/split/last-split.cc           |   2 +-
 src/version.hh                    |   2 +-
 24 files changed, 432 insertions(+), 417 deletions(-)

diff --git a/ChangeLog.txt b/ChangeLog.txt
index 6caa061..d49aba7 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -1,8 +1,29 @@
+2016-09-01  Martin C. Frith  <Martin C. Frith>
+
+	* doc/last-map-probs.txt, doc/last-pair-probs.txt, doc/last-
+	parallel.txt, doc/last-split.txt, doc/last-tutorial.txt, examples
+	/last-bisulfite-paired.sh, examples/last-bisulfite.sh,
+	test/bs100.maf, test/maf-convert-test.out:
+	Replaced score thresholds with significance thresholds.
+	[d48e9d4a7268] [tip]
+
+	* doc/last-split.txt, src/split/last-split-main.cc, src/split/last-
+	split.cc, test/last-split-test.out:
+	Reduced the default score threshold for spliced alignment.
+	[e632c170bdd2]
+
+2016-08-30  Martin C. Frith  <Martin C. Frith>
+
+	* doc/last-tuning.txt, doc/lastal.txt, doc/lastdb.txt,
+	src/LastalArguments.cc, src/LastdbArguments.cc:
+	Reduced lastal memory usage with multi volumes or threads.
+	[e0e21e5549dd]
+
 2016-08-01  Martin C. Frith  <Martin C. Frith>
 
 	* scripts/last-train:
 	Fixed last-train crash with gap existence cost < 0.
-	[9a03622aa2c0] [tip]
+	[9a03622aa2c0]
 
 	* src/TwoQualityScoreMatrix.cc, src/TwoQualityScoreMatrix.hh,
 	src/lastal.cc:
diff --git a/doc/last-map-probs.html b/doc/last-map-probs.html
index 89e5031..63c59eb 100644
--- a/doc/last-map-probs.html
+++ b/doc/last-map-probs.html
@@ -329,8 +329,8 @@ probability > 0.01.</p>
 <h2>Typical usage</h2>
 <p>These commands map DNA reads to the human genome:</p>
 <pre class="literal-block">
-lastdb -uNEAR hu human/chr*.fa
-lastal -Q1 -e120 hu reads.fastq | last-map-probs > myalns.maf
+lastdb -uNEAR -R01 hu human/chr*.fa
+lastal -Q1 -D1000 hu reads.fastq | last-map-probs > myalns.maf
 </pre>
 </div>
 <div class="section" id="options">
@@ -378,7 +378,7 @@ need too much memory.</p>
 <h2>Using multiple CPUs</h2>
 <p>This will run the pipeline on all your CPU cores:</p>
 <pre class="literal-block">
-parallel-fastq "lastal -Q1 -e120 hu | last-map-probs" < reads.fastq > myalns.maf
+parallel-fastq "lastal -Q1 -D1000 hu | last-map-probs" < reads.fastq > myalns.maf
 </pre>
 <p>It requires GNU parallel to be installed
 (<a class="reference external" href="http://www.gnu.org/software/parallel/">http://www.gnu.org/software/parallel/</a>).</p>
diff --git a/doc/last-map-probs.txt b/doc/last-map-probs.txt
index 7b49a8f..fbea50e 100644
--- a/doc/last-map-probs.txt
+++ b/doc/last-map-probs.txt
@@ -15,8 +15,8 @@ Typical usage
 
 These commands map DNA reads to the human genome::
 
-  lastdb -uNEAR hu human/chr*.fa
-  lastal -Q1 -e120 hu reads.fastq | last-map-probs > myalns.maf
+  lastdb -uNEAR -R01 hu human/chr*.fa
+  lastal -Q1 -D1000 hu reads.fastq | last-map-probs > myalns.maf
 
 Options
 -------
@@ -54,7 +54,7 @@ Using multiple CPUs
 
 This will run the pipeline on all your CPU cores::
 
-  parallel-fastq "lastal -Q1 -e120 hu | last-map-probs" < reads.fastq > myalns.maf
+  parallel-fastq "lastal -Q1 -D1000 hu | last-map-probs" < reads.fastq > myalns.maf
 
 It requires GNU parallel to be installed
 (http://www.gnu.org/software/parallel/).
diff --git a/doc/last-pair-probs.html b/doc/last-pair-probs.html
index 7b9393f..59c776f 100644
--- a/doc/last-pair-probs.html
+++ b/doc/last-pair-probs.html
@@ -349,15 +349,15 @@ probability > 0.01.</p>
 next two reads are paired, and so on.  We can align them to the human
 genome like this:</p>
 <pre class="literal-block">
-lastdb -uNEAR hg human-genome.fasta
-lastal -Q1 -e120 -i1 hg interleaved.fastq > temp.maf
+lastdb -uNEAR -R01 hg human-genome.fasta
+lastal -Q1 -D1000 -i1 hg interleaved.fastq > temp.maf
 last-pair-probs temp.maf > out.maf
 </pre>
 <p>Suppose we have paired reads in two files, where the two first reads
 are paired, the two second reads are paired, and so on.  We can
 interleave them like this:</p>
 <pre class="literal-block">
-fastq-interleave x.fastq y.fastq | lastal -Q1 -e120 -i1 hg > temp.maf
+fastq-interleave x.fastq y.fastq | lastal -Q1 -D1000 -i1 hg > temp.maf
 </pre>
 </div>
 <div class="section" id="reads-from-potentially-spliced-rna-molecules">
@@ -379,13 +379,13 @@ distribution, and then to estimate alignment probabilities.  It is
 more efficient to estimate the distance distribution from a small
 sample of the data:</p>
 <pre class="literal-block">
-lastal -Q1 -e120 -i1 hg sample.fastq | last-pair-probs -e
+lastal -Q1 -D1000 -i1 hg sample.fastq | last-pair-probs -e
 </pre>
 <p>Suppose this tells us that the mean distance is 250 and the standard
 deviation is 38.5.  We can use that to estimate the alignment
 probabilities:</p>
 <pre class="literal-block">
-lastal -Q1 -e120 -i1 hg all.fastq | last-pair-probs -f250 -s38.5 > out.maf
+lastal -Q1 -D1000 -i1 hg all.fastq | last-pair-probs -f250 -s38.5 > out.maf
 </pre>
 </div>
 <div class="section" id="going-faster-by-parallelization">
@@ -393,7 +393,7 @@ lastal -Q1 -e120 -i1 hg all.fastq | last-pair-probs -f250 -s38.5 > out.maf
 <p>This will run the pipeline on all your CPU cores:</p>
 <pre class="literal-block">
 fastq-interleave x.fastq y.fastq |
-parallel-fastq "lastal -Q1 -e120 -i1 hg | last-pair-probs -f250 -s38.5" > out.maf
+parallel-fastq "lastal -Q1 -D1000 -i1 hg | last-pair-probs -f250 -s38.5" > out.maf
 </pre>
 <p>It requires GNU parallel to be installed
 (<a class="reference external" href="http://www.gnu.org/software/parallel/">http://www.gnu.org/software/parallel/</a>).</p>
@@ -421,8 +421,8 @@ lastal) describing the alignment parameters.</p>
 </li>
 <li><p class="first">It is also possible to supply the alignments in two files:</p>
 <pre class="literal-block">
-lastal -Q1 -e120 -i1 hg x.fastq > temp1.maf
-lastal -Q1 -e120 -i1 hg y.fastq > temp2.maf
+lastal -Q1 -D1000 -i1 hg x.fastq > temp1.maf
+lastal -Q1 -D1000 -i1 hg y.fastq > temp2.maf
 last-pair-probs temp1.maf temp2.maf > out.maf
 </pre>
 </li>
diff --git a/doc/last-pair-probs.txt b/doc/last-pair-probs.txt
index 1c7a19e..e093a2f 100644
--- a/doc/last-pair-probs.txt
+++ b/doc/last-pair-probs.txt
@@ -32,15 +32,15 @@ Suppose we have paired DNA reads in a file called "interleaved.fastq"
 next two reads are paired, and so on.  We can align them to the human
 genome like this::
 
-  lastdb -uNEAR hg human-genome.fasta
-  lastal -Q1 -e120 -i1 hg interleaved.fastq > temp.maf
+  lastdb -uNEAR -R01 hg human-genome.fasta
+  lastal -Q1 -D1000 -i1 hg interleaved.fastq > temp.maf
   last-pair-probs temp.maf > out.maf
 
 Suppose we have paired reads in two files, where the two first reads
 are paired, the two second reads are paired, and so on.  We can
 interleave them like this::
 
-  fastq-interleave x.fastq y.fastq | lastal -Q1 -e120 -i1 hg > temp.maf
+  fastq-interleave x.fastq y.fastq | lastal -Q1 -D1000 -i1 hg > temp.maf
 
 Reads from potentially-spliced RNA molecules
 --------------------------------------------
@@ -63,13 +63,13 @@ distribution, and then to estimate alignment probabilities.  It is
 more efficient to estimate the distance distribution from a small
 sample of the data::
 
-  lastal -Q1 -e120 -i1 hg sample.fastq | last-pair-probs -e
+  lastal -Q1 -D1000 -i1 hg sample.fastq | last-pair-probs -e
 
 Suppose this tells us that the mean distance is 250 and the standard
 deviation is 38.5.  We can use that to estimate the alignment
 probabilities::
 
-  lastal -Q1 -e120 -i1 hg all.fastq | last-pair-probs -f250 -s38.5 > out.maf
+  lastal -Q1 -D1000 -i1 hg all.fastq | last-pair-probs -f250 -s38.5 > out.maf
 
 Going faster by parallelization
 -------------------------------
@@ -77,7 +77,7 @@ Going faster by parallelization
 This will run the pipeline on all your CPU cores::
 
   fastq-interleave x.fastq y.fastq |
-  parallel-fastq "lastal -Q1 -e120 -i1 hg | last-pair-probs -f250 -s38.5" > out.maf
+  parallel-fastq "lastal -Q1 -D1000 -i1 hg | last-pair-probs -f250 -s38.5" > out.maf
 
 It requires GNU parallel to be installed
 (http://www.gnu.org/software/parallel/).
@@ -105,8 +105,8 @@ Details
 
 * It is also possible to supply the alignments in two files::
 
-    lastal -Q1 -e120 -i1 hg x.fastq > temp1.maf
-    lastal -Q1 -e120 -i1 hg y.fastq > temp2.maf
+    lastal -Q1 -D1000 -i1 hg x.fastq > temp1.maf
+    lastal -Q1 -D1000 -i1 hg y.fastq > temp2.maf
     last-pair-probs temp1.maf temp2.maf > out.maf
 
 Options
diff --git a/doc/last-parallel.html b/doc/last-parallel.html
index 070c1c0..5d2ad15 100644
--- a/doc/last-parallel.html
+++ b/doc/last-parallel.html
@@ -364,11 +364,11 @@ parallel-fasta "lastal mydb" < queries.fa > myalns.maf
 </pre>
 <p>Instead of this:</p>
 <pre class="literal-block">
-lastal -Q1 -e120 db q.fastq | last-split > out.maf
+lastal -Q1 -D100 db q.fastq | last-split > out.maf
 </pre>
 <p>try this:</p>
 <pre class="literal-block">
-parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+parallel-fastq "lastal -Q1 -D100 db | last-split" < q.fastq > out.maf
 </pre>
 <p>Instead of this:</p>
 <pre class="literal-block">
diff --git a/doc/last-parallel.txt b/doc/last-parallel.txt
index b2ca8f7..730157f 100644
--- a/doc/last-parallel.txt
+++ b/doc/last-parallel.txt
@@ -53,11 +53,11 @@ try this::
 
 Instead of this::
 
-  lastal -Q1 -e120 db q.fastq | last-split > out.maf
+  lastal -Q1 -D100 db q.fastq | last-split > out.maf
 
 try this::
 
-  parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+  parallel-fastq "lastal -Q1 -D100 db | last-split" < q.fastq > out.maf
 
 Instead of this::
 
diff --git a/doc/last-split.html b/doc/last-split.html
index 99ddaf0..c54d670 100644
--- a/doc/last-split.html
+++ b/doc/last-split.html
@@ -333,8 +333,8 @@ breakpoints, or RNA queries that cross splice junctions.</p>
 format), and the genome is in "genome.fasta" (in fasta format).  We
 can do the alignment like this:</p>
 <pre class="literal-block">
-lastdb -uNEAR db genome.fasta
-lastal -Q1 -e120 db q.fastq | last-split > out.maf
+lastdb -uNEAR -R01 db genome.fasta
+lastal -Q1 -D100 db q.fastq | last-split > out.maf
 </pre>
 </div>
 <div class="section" id="spliced-alignment-of-rna-reads-to-a-genome">
@@ -344,8 +344,8 @@ time, we provide the genome information to last-split, which causes it
 to do spliced instead of split alignment, and also tells it where the
 splice signals are (GT, AG, etc):</p>
 <pre class="literal-block">
-lastdb -uNEAR db genome.fasta
-lastal -Q1 -e120 db q.fastq | last-split -g db > out.maf
+lastdb -uNEAR -R01 db genome.fasta
+lastal -Q1 -D10 db q.fastq | last-split -g db > out.maf
 </pre>
 <p>This will favour splices starting at GT (and to a lesser extent GC and
 AT), and ending at AG (and to a lesser extent AC).  However, it allows
@@ -353,12 +353,16 @@ splices starting and ending anywhere.  It also favours splices with
 introns of typical length, specified by a log-normal distribution
 (i.e. cis-splices).  However, it allows arbitrary trans-splices
 between any two places in the genome.</p>
+<p>-D10 sets a very loose significance threshold, so that we can find
+very short parts of a spliced alignment (e.g. short exons).  Note that
+last-split discards the lowest-significance alignments, but it uses
+them to estimate the ambiguity of higher-significance alignments.</p>
 </div>
 <div class="section" id="alignment-of-two-whole-genomes">
 <h3>Alignment of two whole genomes</h3>
 <p>We can align the cat and rat genomes like this:</p>
 <pre class="literal-block">
-lastdb -cR11 -uMAM8 catdb cat.fasta
+lastdb -uMAM8 -cR11 catdb cat.fasta
 lastal -m100 -E0.05 catdb rat.fasta | last-split -m1 > out.maf
 </pre>
 <p>This will align each rat base-pair to at most one cat base-pair, but
@@ -391,7 +395,7 @@ matches.</p>
 <h2>Going faster by parallelization</h2>
 <p>For example, split alignment of DNA reads to a genome:</p>
 <pre class="literal-block">
-parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+parallel-fastq "lastal -Q1 -D100 db | last-split" < q.fastq > out.maf
 </pre>
 <p>This requires GNU parallel to be installed
 (<a class="reference external" href="http://www.gnu.org/software/parallel/">http://www.gnu.org/software/parallel/</a>).</p>
@@ -539,8 +543,8 @@ of the query.)</p>
 <p>Spliced alignment can be slow.  It can be sped up, at a small cost in
 accuracy, by not favouring cis-splices:</p>
 <pre class="literal-block">
-lastdb -uNEAR db genome.fasta
-lastal -Q1 -e120 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
+lastdb -uNEAR -R01 db genome.fasta
+lastal -Q1 -D10 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
 </pre>
 <p>The -c0 turns off cis-splicing, and the -t0.004 specifies a higher
 probability of trans-splicing.</p>
@@ -551,8 +555,8 @@ probability of trans-splicing.</p>
 middle of the query, we can do "spliced" alignment without considering
 splice signals or favouring cis-splices:</p>
 <pre class="literal-block">
-lastdb -uNEAR db genome.fasta
-lastal -Q1 -e120 db q.fastq | last-split -c0 > out.maf
+lastdb -uNEAR -R01 db genome.fasta
+lastal -Q1 -D100 db q.fastq | last-split -c0 > out.maf
 </pre>
 </div>
 </div>
@@ -605,11 +609,11 @@ increase this value!</td></tr>
 <p>For SPLIT alignment, the default value is e (the lastal score
 threshold).  Alignments with score just above INT will get
 high mismap probabilities.</p>
-<p class="last">For SPLICED alignment, the default value is e + t * ln(1000),
+<p class="last">For SPLICED alignment, the default value is e + t * ln(100),
 where t is a scale factor that is written in the lastal
 header.  This roughly means that, for every alignment it
 writes, it has considered alternative alignments with
-one-thousandth the probability.  Alignments with score just
+one-hundredth the probability.  Alignments with score just
 above INT will not necessarily get high mismap probabilities.</p>
 </td></tr>
 <tr><td class="option-group">
diff --git a/doc/last-split.txt b/doc/last-split.txt
index d781100..609f3ad 100644
--- a/doc/last-split.txt
+++ b/doc/last-split.txt
@@ -20,8 +20,8 @@ Assume the DNA reads are in a file called "q.fastq" (in fastq-sanger
 format), and the genome is in "genome.fasta" (in fasta format).  We
 can do the alignment like this::
 
-  lastdb -uNEAR db genome.fasta
-  lastal -Q1 -e120 db q.fastq | last-split > out.maf
+  lastdb -uNEAR -R01 db genome.fasta
+  lastal -Q1 -D100 db q.fastq | last-split > out.maf
 
 Spliced alignment of RNA reads to a genome
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -31,8 +31,8 @@ time, we provide the genome information to last-split, which causes it
 to do spliced instead of split alignment, and also tells it where the
 splice signals are (GT, AG, etc)::
 
-  lastdb -uNEAR db genome.fasta
-  lastal -Q1 -e120 db q.fastq | last-split -g db > out.maf
+  lastdb -uNEAR -R01 db genome.fasta
+  lastal -Q1 -D10 db q.fastq | last-split -g db > out.maf
 
 This will favour splices starting at GT (and to a lesser extent GC and
 AT), and ending at AG (and to a lesser extent AC).  However, it allows
@@ -41,12 +41,17 @@ introns of typical length, specified by a log-normal distribution
 (i.e. cis-splices).  However, it allows arbitrary trans-splices
 between any two places in the genome.
 
+-D10 sets a very loose significance threshold, so that we can find
+very short parts of a spliced alignment (e.g. short exons).  Note that
+last-split discards the lowest-significance alignments, but it uses
+them to estimate the ambiguity of higher-significance alignments.
+
 Alignment of two whole genomes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 We can align the cat and rat genomes like this::
 
-  lastdb -cR11 -uMAM8 catdb cat.fasta
+  lastdb -uMAM8 -cR11 catdb cat.fasta
   lastal -m100 -E0.05 catdb rat.fasta | last-split -m1 > out.maf
 
 This will align each rat base-pair to at most one cat base-pair, but
@@ -70,7 +75,7 @@ Going faster by parallelization
 
 For example, split alignment of DNA reads to a genome::
 
-  parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+  parallel-fastq "lastal -Q1 -D100 db | last-split" < q.fastq > out.maf
 
 This requires GNU parallel to be installed
 (http://www.gnu.org/software/parallel/).
@@ -151,8 +156,8 @@ Faster spliced alignment
 Spliced alignment can be slow.  It can be sped up, at a small cost in
 accuracy, by not favouring cis-splices::
 
-  lastdb -uNEAR db genome.fasta
-  lastal -Q1 -e120 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
+  lastdb -uNEAR -R01 db genome.fasta
+  lastal -Q1 -D10 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
 
 The -c0 turns off cis-splicing, and the -t0.004 specifies a higher
 probability of trans-splicing.
@@ -164,8 +169,8 @@ If we do not wish to allow arbitrarily large unaligned parts in the
 middle of the query, we can do "spliced" alignment without considering
 splice signals or favouring cis-splices::
 
-  lastdb -uNEAR db genome.fasta
-  lastal -Q1 -e120 db q.fastq | last-split -c0 > out.maf
+  lastdb -uNEAR -R01 db genome.fasta
+  lastal -Q1 -D100 db q.fastq | last-split -c0 > out.maf
 
 Options
 -------
@@ -213,11 +218,11 @@ Options
          threshold).  Alignments with score just above INT will get
          high mismap probabilities.
 
-         For SPLICED alignment, the default value is e + t * ln(1000),
+         For SPLICED alignment, the default value is e + t * ln(100),
          where t is a scale factor that is written in the lastal
          header.  This roughly means that, for every alignment it
          writes, it has considered alternative alignments with
-         one-thousandth the probability.  Alignments with score just
+         one-hundredth the probability.  Alignments with score just
          above INT will not necessarily get high mismap probabilities.
 
   -n, --no-split
diff --git a/doc/last-tuning.html b/doc/last-tuning.html
index fc16735..11ec408 100644
--- a/doc/last-tuning.html
+++ b/doc/last-tuning.html
@@ -369,18 +369,6 @@ alphabetically earliest.</p>
 </div>
 <div class="section" id="other-options">
 <h2>Other options</h2>
-<div class="section" id="lastdb-i">
-<h3>lastdb -i</h3>
-<p>This option <strong>makes lastdb faster</strong>, but disables some lastal options.
-If lastdb is too slow, try -i10.</p>
-</div>
-<div class="section" id="lastdb-c">
-<h3>lastdb -C</h3>
-<p>This option may make lastal a bit <strong>faster</strong>, but <strong>uses more memory
-and disk</strong>, and makes lastdb slower.  If these downsides are no
-problem, you may as well try it.  -C3 is fastest (at least sometimes)
-but uses most memory, -C2 is almost as fast.</p>
-</div>
 <div class="section" id="lastal-m">
 <h3>lastal -m</h3>
 <p>This option <strong>trades speed for sensitivity</strong>.  It sets the rareness
@@ -397,6 +385,13 @@ the minimum length of initial matches, e.g. -l50 means length 50.
 sensitivity is adequate if the alignments contain long, gapless,
 high-identity matches.</p>
 </div>
+<div class="section" id="lastal-c">
+<h3>lastal -C</h3>
+<p>This option (gapless alignment culling) can make lastal <strong>faster</strong> but
+<strong>less sensitive</strong>.  It can also <strong>reduce redundant output</strong>.  For
+example, -C2 makes it discard alignments (before gapped extension)
+whose query coordinates lie in those of 2 or more stronger alignments.</p>
+</div>
 <div class="section" id="lastal-x">
 <h3>lastal -x</h3>
 <p>This option can make lastal <strong>faster</strong> but <strong>less sensitive</strong>.  It
@@ -408,13 +403,6 @@ parameters and the database size.  You can see it in the lastal header
 after "x=", e.g. by running lastal with no queries.  Then try, say,
 halving it.</p>
 </div>
-<div class="section" id="lastal-c">
-<h3>lastal -C</h3>
-<p>This option (gapless alignment culling) can make lastal <strong>faster</strong> but
-<strong>less sensitive</strong>.  It can also <strong>reduce redundant output</strong>.  For
-example, -C2 makes it discard alignments (before gapped extension)
-whose query coordinates lie in those of 2 or more stronger alignments.</p>
-</div>
 <div class="section" id="id2">
 <h3>lastal -M</h3>
 <p>This option requests "minimum-difference" alignment, which is <strong>faster
@@ -432,6 +420,17 @@ is faster because it skips the gapping phase entirely.)</p>
 <h3>lastal -f</h3>
 <p>Option -fTAB <strong>reduces the output size</strong>, which can improve speed.</p>
 </div>
+<div class="section" id="lastdb-i">
+<h3>lastdb -i</h3>
+<p>This option <strong>makes lastdb faster</strong>, but disables some lastal options.
+If lastdb is too slow, try -i10.</p>
+</div>
+<div class="section" id="lastdb-c2">
+<h3>lastdb -C2</h3>
+<p>This option may make lastal a bit <strong>faster</strong>, but <strong>uses more memory
+and disk</strong>, and makes lastdb slower.  If these downsides are no
+problem, you may as well try it.</p>
+</div>
 <div class="section" id="repeat-masking">
 <h3>Repeat masking</h3>
 <p>This can make LAST <strong>much faster</strong>, produce <strong>less output</strong>, and
diff --git a/doc/last-tuning.txt b/doc/last-tuning.txt
index 90af428..4fe79b2 100644
--- a/doc/last-tuning.txt
+++ b/doc/last-tuning.txt
@@ -64,20 +64,6 @@ The fraction of positions that are "minimum" is roughly: 2 / (W + 1).
 Other options
 ~~~~~~~~~~~~~
 
-lastdb -i
----------
-
-This option **makes lastdb faster**, but disables some lastal options.
-If lastdb is too slow, try -i10.
-
-lastdb -C
----------
-
-This option may make lastal a bit **faster**, but **uses more memory
-and disk**, and makes lastdb slower.  If these downsides are no
-problem, you may as well try it.  -C3 is fastest (at least sometimes)
-but uses most memory, -C2 is almost as fast.
-
 lastal -m
 ---------
 
@@ -96,6 +82,14 @@ the minimum length of initial matches, e.g. -l50 means length 50.
 sensitivity is adequate if the alignments contain long, gapless,
 high-identity matches.
 
+lastal -C
+---------
+
+This option (gapless alignment culling) can make lastal **faster** but
+**less sensitive**.  It can also **reduce redundant output**.  For
+example, -C2 makes it discard alignments (before gapped extension)
+whose query coordinates lie in those of 2 or more stronger alignments.
+
 lastal -x
 ---------
 
@@ -109,14 +103,6 @@ parameters and the database size.  You can see it in the lastal header
 after "x=", e.g. by running lastal with no queries.  Then try, say,
 halving it.
 
-lastal -C
----------
-
-This option (gapless alignment culling) can make lastal **faster** but
-**less sensitive**.  It can also **reduce redundant output**.  For
-example, -C2 makes it discard alignments (before gapped extension)
-whose query coordinates lie in those of 2 or more stronger alignments.
-
 lastal -M
 ---------
 
@@ -137,6 +123,19 @@ lastal -f
 
 Option -fTAB **reduces the output size**, which can improve speed.
 
+lastdb -i
+---------
+
+This option **makes lastdb faster**, but disables some lastal options.
+If lastdb is too slow, try -i10.
+
+lastdb -C2
+----------
+
+This option may make lastal a bit **faster**, but **uses more memory
+and disk**, and makes lastdb slower.  If these downsides are no
+problem, you may as well try it.
+
 Repeat masking
 --------------
 
diff --git a/doc/last-tutorial.html b/doc/last-tutorial.html
index eecd70a..62f419a 100644
--- a/doc/last-tutorial.html
+++ b/doc/last-tutorial.html
@@ -349,11 +349,11 @@ a score=27 EG2=4.7e+04 E=2.6e-05
 s humanMito 2170 145 + 16571 AGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTT...
 s fuguMito  1648 142 + 16447 AGTAGGCTTAGAAGCAGCCACCA--CAAGAAAGCGTT...
 </pre>
-<p>The score is a measure of how strong the similarity is.  EG2 and E are
-explained at <a class="reference external" href="last-evalues.html">last-evalues.html</a>.  Lines starting with "s" contain:
-the sequence name, the start coordinate of the alignment, the number
-of bases spanned by the alignment, the strand, the sequence length,
-and the aligned bases.</p>
+<p>The score is a measure of how significant the similarity is.  EG2 and
+E are explained at <a class="reference external" href="last-evalues.html">last-evalues.html</a>.  Lines starting with "s"
+contain: the sequence name, the start coordinate of the alignment, the
+number of bases spanned by the alignment, the strand, the sequence
+length, and the aligned bases.</p>
 <p>The start coordinates are zero-based.  This means that, if the
 alignment begins right at the start of a sequence, the coordinate is
 0.  If the strand is "-", the start coordinate is in the reverse
@@ -394,14 +394,12 @@ lastal -pPAM30 invdb vertebrate.fa
 <p>(How short is "very short"?  It depends on the amount of sequence data
 we are searching, but perhaps roughly less than 40 amino acids.)</p>
 </div>
-<div class="section" id="example-5-align-human-dna-reads-to-the-human-genome">
-<h2>Example 5: Align human DNA reads to the human genome</h2>
-<p>Suppose we have DNA reads in a file called reads.fastq, in
-fastq-sanger format.  We can align them to the human genome like
-this:</p>
+<div class="section" id="example-5-align-human-dna-sequences-to-the-human-genome">
+<h2>Example 5: Align human DNA sequences to the human genome</h2>
+<p>We can align human DNA sequences to the human genome like this:</p>
 <pre class="literal-block">
-lastdb -uNEAR humandb human/chr*.fa
-lastal -Q1 -e120 humandb reads.fastq | last-split > myalns.maf
+lastdb -uNEAR -R01 humandb human/chr*.fa
+lastal humandb queries.fa | last-split > myalns.maf
 </pre>
 <p>This will use about 15 gigabytes of memory.</p>
 <ul>
@@ -409,19 +407,59 @@ lastal -Q1 -e120 humandb reads.fastq | last-split > myalns.maf
 better at finding short-and-strong similarities.  (It also changes
 the default scoring scheme.)</p>
 </li>
-<li><p class="first">The -Q1 option indicates that the reads are in fastq-sanger format.</p>
-</li>
-<li><p class="first">The -e120 option requests alignments with score ≥ 120.  This is
-intentionally a somewhat low score (high E-value): last-split then
-discards low-confidence alignments, but it uses them to estimate the
-ambiguity of high-confidence alignments.</p>
+<li><p class="first">-R01 tells it to mark simple sequences (such as cacacacacacacacaca)
+by lowercase, but not suppress them.  This has no effect on the
+alignment, but it allows us to see simple sequences in the output,
+and gives us the option to do <a class="reference external" href="last-postmask.html">post-alignment masking</a>.</p>
 </li>
 <li><p class="first">last-split reads the alignments produced by lastal, and looks for a
-unique best alignment for each part of each read.  It allows
-different parts of one read to match different parts of the genome.
-It has several useful options, please see <a class="reference external" href="last-split.html">last-split.html</a>.</p>
+unique best alignment for each part of each query.  It allows
+different parts of one query to match different parts of the genome,
+which may happen due to rearrangements.  It has several useful
+options, please see <a class="reference external" href="last-split.html">last-split.html</a>.</p>
+</li>
+</ul>
+</div>
+<div class="section" id="example-6-find-very-short-dna-alignments">
+<h2>Example 6: Find very short DNA alignments</h2>
+<p>By default, LAST is quite strict, and only reports significant
+alignments that will rarely occur by chance.  In the preceding
+example, the minimum alignment length is about 28 bases (less for
+smaller genomes).  To find shorter alignments, we must down-tune the
+strictness:</p>
+<pre class="literal-block">
+lastdb -uNEAR -R01 humandb human/chr*.fa
+lastal -D100 humandb queries.fa | last-split -m1 > myalns.maf
+</pre>
+<ul>
+<li><p class="first">-D100 makes lastal report alignments that could occur by chance once
+per hundred query letters.  (The default is once per million.)</p>
+</li>
+<li><p class="first">-m1 tells last-split to keep low-confidence alignments.</p>
 </li>
 </ul>
+<p>In this example, the minimum alignment length is about 20 bases (less
+for smaller genomes).</p>
+</div>
+<div class="section" id="example-7-align-human-fastq-sequences-to-the-human-genome">
+<h2>Example 7: Align human fastq sequences to the human genome</h2>
+<p>DNA sequences are not always perfectly accurate, and they are
+sometimes provided in fastq format, which indicates the reliability of
+each base.  LAST can use this information to improve alignment
+accuracy.  (It assumes the reliabilities reflect substitution errors,
+not insertion/deletion errors: if that is not true, it may be better
+to use fasta format.)  Option -Q1 indicates fastq-sanger format:</p>
+<pre class="literal-block">
+lastdb -uNEAR -R01 humandb human/chr*.fa
+lastal -Q1 -D100 humandb queries.fastq | last-split > myalns.maf
+</pre>
+</div>
+<div class="section" id="fastq-format-confusion">
+<h2>Fastq format confusion</h2>
+<p>Unfortunately, there is more than one fastq format (see
+<a class="reference external" href="http://nar.oxfordjournals.org/content/38/6/1767.long">http://nar.oxfordjournals.org/content/38/6/1767.long</a>).  Recently
+(2013) fastq-sanger seems to be dominant, but if you have another
+variant you need to change the -Q option (see <a class="reference external" href="lastal.html">lastal.html</a>).</p>
 </div>
 <div class="section" id="paired-reads">
 <h2>Paired reads</h2>
@@ -437,51 +475,6 @@ the two reads in a pair to match (e.g.) different chromosomes.</p>
 </li>
 </ol>
 </div>
-<div class="section" id="fastq-format-confusion">
-<h2>Fastq format confusion</h2>
-<p>Unfortunately, there is more than one fastq format (see
-<a class="reference external" href="http://nar.oxfordjournals.org/content/38/6/1767.long">http://nar.oxfordjournals.org/content/38/6/1767.long</a>).  Recently
-(2013) fastq-sanger seems to be dominant, but if you have another
-variant you need to change the -Q option (see <a class="reference external" href="lastal.html">lastal.html</a>).</p>
-</div>
-<div class="section" id="example-6-align-human-fasta-reads-to-the-human-genome">
-<h2>Example 6: Align human fasta reads to the human genome</h2>
-<p>If our reads are in fasta instead of fastq format, we simply omit -Q:</p>
-<pre class="literal-block">
-lastdb -uNEAR humandb human/chr*.fa
-lastal -e120 humandb reads.fa | last-split > myalns.maf
-</pre>
-<p>(In older versions of LAST, we had to set a short-and-strong scoring
-scheme, but this is now done automatically by -uNEAR.)</p>
-</div>
-<div class="section" id="example-7-align-aardvark-fastq-reads-to-the-human-genome">
-<h2>Example 7: Align aardvark fastq reads to the human genome</h2>
-<p>In this case we expect weak similarities, so we omit -uNEAR.  We also
-need to change the scoring scheme (because with -Q1 it defaults to a
-short-and-strong scoring scheme):</p>
-<pre class="literal-block">
-lastdb -cR01 humandb human/chr*.fa
-lastal -Q1 -r5 -q5 -a35 -b5 humandb reads.fastq > myalns.maf
-</pre>
-<p>Option -r5 sets the match score to 5, -q5 sets the mismatch cost to 5,
-while -a35 and -b5 set the gap cost to 35 + 5×(gap length).</p>
-<p>(Why use 5:5:35:5 rather than 1:1:7:1?  The reason is that 5:5:35:5
-has roughly the same scale as the fastq quality scores.  lastal uses
-the quality scores to modify the alignment scores, and then rounds the
-modified scores to integers.  If we used 1:1:7:1, the integer-rounding
-would lose information.)</p>
-</div>
-<div class="section" id="very-short-reads">
-<h2>Very short reads</h2>
-<p>WARNING!  The standard score parameters do not align very short reads.
-This is because the match score is 6 and the score threshold is 120,
-so at least 20 high-quality matches are required (or a greater number
-of low-quality matches).  In addition, last-split discards
-low-confidence alignments.  To align very short reads, reduce lastal's
-score threshold (-e) or increase last-split's error threshold (-m).</p>
-<p>If the score threshold is too low, you will get meaningless, random
-alignments.</p>
-</div>
 <div class="section" id="tuning-speed-sensitivity-memory-and-disk-usage">
 <h2>Tuning speed, sensitivity, memory and disk usage</h2>
 <ul>
@@ -496,7 +489,7 @@ alignments.</p>
 <p>If you have ~50 GB of memory and don't mind waiting a few days, this
 is a good way to compare such genomes:</p>
 <pre class="literal-block">
-lastdb -cR11 -uMAM8 catdb cat.fa
+lastdb -uMAM8 -cR11 catdb cat.fa
 lastal -m100 -E0.05 catdb rat.fa | last-split -m1 > out.maf
 </pre>
 <p>This looks for a unique best alignment for each part of each rat
@@ -515,7 +508,7 @@ maf-swap out.maf | last-split -m1 > out2.maf
 <p>For strongly similar genomes (e.g. 99% identity), something like this
 is more appropriate:</p>
 <pre class="literal-block">
-lastdb -cR11 -uNEAR human human.fa
+lastdb -uNEAR -cR11 human human.fa
 lastal -m50 -E0.05 human chimp.fa | last-split -m1 > out.maf
 </pre>
 </div>
diff --git a/doc/last-tutorial.txt b/doc/last-tutorial.txt
index 62f1326..cbff52a 100644
--- a/doc/last-tutorial.txt
+++ b/doc/last-tutorial.txt
@@ -36,11 +36,11 @@ Each alignment looks like this::
   s humanMito 2170 145 + 16571 AGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTT...
   s fuguMito  1648 142 + 16447 AGTAGGCTTAGAAGCAGCCACCA--CAAGAAAGCGTT...
 
-The score is a measure of how strong the similarity is.  EG2 and E are
-explained at `<last-evalues.html>`_.  Lines starting with "s" contain:
-the sequence name, the start coordinate of the alignment, the number
-of bases spanned by the alignment, the strand, the sequence length,
-and the aligned bases.
+The score is a measure of how significant the similarity is.  EG2 and
+E are explained at `<last-evalues.html>`_.  Lines starting with "s"
+contain: the sequence name, the start coordinate of the alignment, the
+number of bases spanned by the alignment, the strand, the sequence
+length, and the aligned bases.
 
 The start coordinates are zero-based.  This means that, if the
 alignment begins right at the start of a sequence, the coordinate is
@@ -86,15 +86,13 @@ scoring scheme may work well::
 (How short is "very short"?  It depends on the amount of sequence data
 we are searching, but perhaps roughly less than 40 amino acids.)
 
-Example 5: Align human DNA reads to the human genome
-----------------------------------------------------
+Example 5: Align human DNA sequences to the human genome
+--------------------------------------------------------
 
-Suppose we have DNA reads in a file called reads.fastq, in
-fastq-sanger format.  We can align them to the human genome like
-this::
+We can align human DNA sequences to the human genome like this::
 
-  lastdb -uNEAR humandb human/chr*.fa
-  lastal -Q1 -e120 humandb reads.fastq | last-split > myalns.maf
+  lastdb -uNEAR -R01 humandb human/chr*.fa
+  lastal humandb queries.fa | last-split > myalns.maf
 
 This will use about 15 gigabytes of memory.
 
@@ -102,30 +100,50 @@ This will use about 15 gigabytes of memory.
   better at finding short-and-strong similarities.  (It also changes
   the default scoring scheme.)
 
-* The -Q1 option indicates that the reads are in fastq-sanger format.
-
-* The -e120 option requests alignments with score ≥ 120.  This is
-  intentionally a somewhat low score (high E-value): last-split then
-  discards low-confidence alignments, but it uses them to estimate the
-  ambiguity of high-confidence alignments.
+* -R01 tells it to mark simple sequences (such as cacacacacacacacaca)
+  by lowercase, but not suppress them.  This has no effect on the
+  alignment, but it allows us to see simple sequences in the output,
+  and gives us the option to do `post-alignment masking
+  <last-postmask.html>`_.
 
 * last-split reads the alignments produced by lastal, and looks for a
-  unique best alignment for each part of each read.  It allows
-  different parts of one read to match different parts of the genome.
-  It has several useful options, please see `<last-split.html>`_.
+  unique best alignment for each part of each query.  It allows
+  different parts of one query to match different parts of the genome,
+  which may happen due to rearrangements.  It has several useful
+  options, please see `<last-split.html>`_.
 
-Paired reads
-------------
+Example 6: Find very short DNA alignments
+-----------------------------------------
 
-If you have paired reads, there are two options:
+By default, LAST is quite strict, and only reports significant
+alignments that will rarely occur by chance.  In the preceding
+example, the minimum alignment length is about 28 bases (less for
+smaller genomes).  To find shorter alignments, we must down-tune the
+strictness::
 
-1. Use last-pair-probs (see `<last-pair-probs.html>`_).
+  lastdb -uNEAR -R01 humandb human/chr*.fa
+  lastal -D100 humandb queries.fa | last-split -m1 > myalns.maf
 
-2. Ignore the pairing information, and align the reads individually
-   (using last-split as above).  This may be useful because
-   last-pair-probs does not currently allow different parts of one
-   read to match different parts of the genome, though it does allow
-   the two reads in a pair to match (e.g.) different chromosomes.
+* -D100 makes lastal report alignments that could occur by chance once
+  per hundred query letters.  (The default is once per million.)
+
+* -m1 tells last-split to keep low-confidence alignments.
+
+In this example, the minimum alignment length is about 20 bases (less
+for smaller genomes).
+
+Example 7: Align human fastq sequences to the human genome
+----------------------------------------------------------
+
+DNA sequences are not always perfectly accurate, and they are
+sometimes provided in fastq format, which indicates the reliability of
+each base.  LAST can use this information to improve alignment
+accuracy.  (It assumes the reliabilities reflect substitution errors,
+not insertion/deletion errors: if that is not true, it may be better
+to use fasta format.)  Option -Q1 indicates fastq-sanger format::
+
+  lastdb -uNEAR -R01 humandb human/chr*.fa
+  lastal -Q1 -D100 humandb queries.fastq | last-split > myalns.maf
 
 Fastq format confusion
 ----------------------
@@ -135,48 +153,18 @@ http://nar.oxfordjournals.org/content/38/6/1767.long).  Recently
 (2013) fastq-sanger seems to be dominant, but if you have another
 variant you need to change the -Q option (see `<lastal.html>`_).
 
-Example 6: Align human fasta reads to the human genome
-------------------------------------------------------
-
-If our reads are in fasta instead of fastq format, we simply omit -Q::
-
-  lastdb -uNEAR humandb human/chr*.fa
-  lastal -e120 humandb reads.fa | last-split > myalns.maf
-
-(In older versions of LAST, we had to set a short-and-strong scoring
-scheme, but this is now done automatically by -uNEAR.)
-
-Example 7: Align aardvark fastq reads to the human genome
----------------------------------------------------------
-
-In this case we expect weak similarities, so we omit -uNEAR.  We also
-need to change the scoring scheme (because with -Q1 it defaults to a
-short-and-strong scoring scheme)::
-
-  lastdb -cR01 humandb human/chr*.fa
-  lastal -Q1 -r5 -q5 -a35 -b5 humandb reads.fastq > myalns.maf
-
-Option -r5 sets the match score to 5, -q5 sets the mismatch cost to 5,
-while -a35 and -b5 set the gap cost to 35 + 5×(gap length).
-
-(Why use 5:5:35:5 rather than 1:1:7:1?  The reason is that 5:5:35:5
-has roughly the same scale as the fastq quality scores.  lastal uses
-the quality scores to modify the alignment scores, and then rounds the
-modified scores to integers.  If we used 1:1:7:1, the integer-rounding
-would lose information.)
+Paired reads
+------------
 
-Very short reads
-----------------
+If you have paired reads, there are two options:
 
-WARNING!  The standard score parameters do not align very short reads.
-This is because the match score is 6 and the score threshold is 120,
-so at least 20 high-quality matches are required (or a greater number
-of low-quality matches).  In addition, last-split discards
-low-confidence alignments.  To align very short reads, reduce lastal's
-score threshold (-e) or increase last-split's error threshold (-m).
+1. Use last-pair-probs (see `<last-pair-probs.html>`_).
 
-If the score threshold is too low, you will get meaningless, random
-alignments.
+2. Ignore the pairing information, and align the reads individually
+   (using last-split as above).  This may be useful because
+   last-pair-probs does not currently allow different parts of one
+   read to match different parts of the genome, though it does allow
+   the two reads in a pair to match (e.g.) different chromosomes.
 
 Tuning speed, sensitivity, memory and disk usage
 ------------------------------------------------
@@ -193,7 +181,7 @@ Example 8: Compare the cat and rat genomes
 If you have ~50 GB of memory and don't mind waiting a few days, this
 is a good way to compare such genomes::
 
-  lastdb -cR11 -uMAM8 catdb cat.fa
+  lastdb -uMAM8 -cR11 catdb cat.fa
   lastal -m100 -E0.05 catdb rat.fa | last-split -m1 > out.maf
 
 This looks for a unique best alignment for each part of each rat
@@ -213,7 +201,7 @@ Example 9: Compare the human and chimp genomes
 For strongly similar genomes (e.g. 99% identity), something like this
 is more appropriate::
 
-  lastdb -cR11 -uNEAR human human.fa
+  lastdb -uNEAR -cR11 human human.fa
   lastal -m50 -E0.05 human chimp.fa | last-split -m1 > out.maf
 
 Example 10: Ambiguity of alignment columns
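
Note on the tutorial text in this diff: it lists the fields of an "s" line (sequence name, start coordinate, number of bases spanned, strand, sequence length, aligned bases) and says that start coordinates are zero-based. A minimal Python sketch of reading such a line, assuming whitespace-separated fields in exactly that order. The names SLine, parse_s_line and one_based_range are illustrative and are not part of LAST:

```python
from typing import NamedTuple


class SLine(NamedTuple):
    name: str        # sequence name
    start: int       # zero-based start coordinate
    span: int        # number of bases spanned by the alignment
    strand: str      # "+" or "-"
    seq_length: int  # length of the whole sequence
    bases: str       # aligned bases, with "-" marking gaps


def parse_s_line(line: str) -> SLine:
    """Split one "s" line into its six documented fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not an 's' line: %r" % line)
    _, name, start, span, strand, seq_length, bases = fields
    return SLine(name, int(start), int(span), strand, int(seq_length), bases)


def one_based_range(s: SLine) -> tuple:
    """Return (first, last) as one-based inclusive coordinates on the stated strand."""
    return s.start + 1, s.start + s.span


# Line taken from the tutorial excerpt, with the trailing "..." dropped.
example = "s fuguMito 1648 142 + 16447 AGTAGGCTTAGAAGCAGCCACCA--CAAGAAAGCGTT"
rec = parse_s_line(example)
print(rec.name, one_based_range(rec))
```

Because the start is zero-based, the one-based first position is start + 1, while the one-based last position is simply start + span.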
diff --git a/doc/lastal.html b/doc/lastal.html
index 319ba5d..0dea52a 100644
--- a/doc/lastal.html
+++ b/doc/lastal.html
@@ -410,7 +410,8 @@ alignment length, mismatches, gap opens, query start, query end,
 reference start, reference end, E-value, bit score.  The start
 coordinates are one-based.  <em>Warning:</em> this is a lossy format,
 because it does not show gap positions.  <em>Warning:</em> the other
-LAST programs cannot read this format.</p>
+LAST programs cannot read this format.  <em>Warning:</em> <a class="reference external" href="last-evalues.html">"bit score"
+is not the same as "score"</a>.</p>
 <p><strong>BlastTab+</strong> format is the same as BlastTab, with 2 extra
 columns at the end: length of query sequence and length of
 reference sequence.  More columns might be added in future.</p>
@@ -421,6 +422,26 @@ MAF.</p>
 </table>
 </blockquote>
 </div>
+<div class="section" id="e-value-options">
+<h3>E-value options</h3>
+<blockquote>
+<table class="docutils option-list" frame="void" rules="none">
+<col class="option" />
+<col class="description" />
+<tbody valign="top">
+<tr><td class="option-group">
+<kbd><span class="option">-D <var>LENGTH</var></span></kbd></td>
+<td>Report alignments that are expected by chance at most once per
+LENGTH query letters.  This option only affects the default
+value of -E, so if you specify -E then -D has no effect.</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-E <var>THRESHOLD</var></span></kbd></td>
+<td>Maximum EG2 (<a class="reference external" href="last-evalues.html">expected alignments per square giga</a>).  This option only affects the default
+value of -e, so if you specify -e then -E has no effect.</td></tr>
+</tbody>
+</table>
+</blockquote>
+</div>
 <div class="section" id="score-options">
 <h3>Score options</h3>
 <blockquote>
@@ -519,27 +540,6 @@ both, -e will prevail.)</td></tr>
 </table>
 </blockquote>
 </div>
-<div class="section" id="e-value-options">
-<h3>E-value options</h3>
-<blockquote>
-<table class="docutils option-list" frame="void" rules="none">
-<col class="option" />
-<col class="description" />
-<tbody valign="top">
-<tr><td class="option-group">
-<kbd><span class="option">-D <var>LENGTH</var></span></kbd></td>
-<td>Report alignments that are expected by chance at most once per
-LENGTH query letters.  This option only affects the default
-value of -E, so if you specify -E then -D has no effect.</td></tr>
-<tr><td class="option-group">
-<kbd><span class="option">-E <var>THRESHOLD</var></span></kbd></td>
-<td>Maximum EG2 (expected alignments per square giga).  This option
-only affects the default value of -e, so if you specify -e then
--E has no effect.</td></tr>
-</tbody>
-</table>
-</blockquote>
-</div>
 <div class="section" id="initial-match-options">
 <h3>Initial-match options</h3>
 <blockquote>
@@ -599,6 +599,36 @@ aligned to either strand of the query.  1 means that the matrix
 applies to either strand of the reference aligned to the forward
 strand of the query.</td></tr>
 <tr><td class="option-group">
+<kbd><span class="option">-K <var>LIMIT</var></span></kbd></td>
+<td>Omit any alignment whose query range lies in LIMIT or more other
+alignments with higher score (and on the same strand).  This is
+a useful way to get just the top few hits to each part of each
+query (P Berman et al. 2000, J Comput Biol 7:293-302).</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-C <var>LIMIT</var></span></kbd></td>
+<td>Before extending gapped alignments, discard any gapless
+alignment whose query range lies in LIMIT or more others (for
+the same strand and volume) with higher score-per-length.  This
+can reduce run time and output size (MC Frith & R Kawaguchi
+2015, Genome Biol 16:106).</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-P <var>THREADS</var></span></kbd></td>
+<td>Divide the work between this number of threads running in
+parallel.  0 means use as many threads as your computer claims
+it can handle simultaneously.  Single query sequences are not
+divided between threads, so you need multiple queries per batch
+for this option to take effect.</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-i <var>BYTES</var></span></kbd></td>
+<td><p class="first">Search queries in batches of at most this many bytes.  If a
+single sequence exceeds this amount, however, it is not split.
+You can use suffixes K, M, and G to specify KibiBytes,
+MebiBytes, and GibiBytes.  This option has no effect on the
+results (apart from their order).</p>
+<p class="last">If the reference was split into volumes by lastdb, then each
+volume will be read into memory once per query batch.</p>
+</td></tr>
+<tr><td class="option-group">
 <kbd><span class="option">-M</span></kbd></td>
 <td><p class="first">Find minimum-difference alignments, which is faster but cruder.
 This treats all matches the same, and minimizes the number of
@@ -638,36 +668,6 @@ start at one query position, if it gets COUNT successful
 extensions, it skips any remaining initial matches starting at
 that position.</td></tr>
 <tr><td class="option-group">
-<kbd><span class="option">-C <var>LIMIT</var></span></kbd></td>
-<td>Before extending gapped alignments, discard any gapless
-alignment whose query range lies in LIMIT or more others (for
-the same strand and volume) with higher score-per-length.  This
-can reduce run time and output size (MC Frith & R Kawaguchi
-2015, Genome Biol 16:106).</td></tr>
-<tr><td class="option-group">
-<kbd><span class="option">-K <var>LIMIT</var></span></kbd></td>
-<td>Omit any alignment whose query range lies in LIMIT or more other
-alignments with higher score (and on the same strand).  This is
-a useful way to get just the top few hits to each part of each
-query (P Berman et al. 2000, J Comput Biol 7:293-302).</td></tr>
-<tr><td class="option-group">
-<kbd><span class="option">-i <var>BYTES</var></span></kbd></td>
-<td><p class="first">Search queries in batches of at most this many bytes.  If a
-single sequence exceeds this amount, however, it is not split.
-You can use suffixes K, M, and G to specify KibiBytes,
-MebiBytes, and GibiBytes.  This option has no effect on the
-results (apart from their order).</p>
-<p class="last">If the reference was split into volumes by lastdb, then each
-volume will be read into memory once per query batch.</p>
-</td></tr>
-<tr><td class="option-group">
-<kbd><span class="option">-P <var>THREADS</var></span></kbd></td>
-<td>Divide the work between this number of threads running in
-parallel.  0 means use as many threads as your computer claims
-it can handle simultaneously.  Single query sequences are not
-divided between threads, so you need multiple queries per batch
-for this option to take effect.</td></tr>
-<tr><td class="option-group">
 <kbd><span class="option">-R <var>DIGITS</var></span></kbd></td>
 <td><p class="first">Specify lowercase-marking of repeats, by two digits (e.g. "-R 01"),
 with the following meanings.</p>
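
Note on the -i option moved in this diff: its description accepts the suffixes K, M and G for KibiBytes, MebiBytes and GibiBytes. A rough Python sketch of that convention, assuming powers of 1024 and a single uppercase suffix letter. parse_bytes is a hypothetical helper, and lastal's own parser may accept or reject other spellings:

```python
# Multipliers for the binary suffixes named in the -i description.
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_bytes(text: str) -> int:
    """Convert a string such as "8M" to a byte count (hypothetical helper)."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    if text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


for arg in ("1", "64K", "8M", "2G"):
    print(arg, parse_bytes(arg))
```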
diff --git a/doc/lastal.txt b/doc/lastal.txt
index 7e93663..a8f6b32 100644
--- a/doc/lastal.txt
+++ b/doc/lastal.txt
@@ -90,7 +90,8 @@ Cosmetic options
       reference start, reference end, E-value, bit score.  The start
       coordinates are one-based.  *Warning:* this is a lossy format,
       because it does not show gap positions.  *Warning:* the other
-      LAST programs cannot read this format.
+      LAST programs cannot read this format.  *Warning:* `"bit score"
+      is not the same as "score" <last-evalues.html>`_.
 
       **BlastTab+** format is the same as BlastTab, with 2 extra
       columns at the end: length of query sequence and length of
@@ -99,6 +100,19 @@ Cosmetic options
       For backwards compatibility, a NAME of 0 means TAB and 1 means
       MAF.
 
+E-value options
+~~~~~~~~~~~~~~~
+
+  -D LENGTH
+      Report alignments that are expected by chance at most once per
+      LENGTH query letters.  This option only affects the default
+      value of -E, so if you specify -E then -D has no effect.
+
+  -E THRESHOLD
+      Maximum EG2 (`expected alignments per square giga
+      <last-evalues.html>`_).  This option only affects the default
+      value of -e, so if you specify -e then -E has no effect.
+
 Score options
 ~~~~~~~~~~~~~
 
@@ -188,19 +202,6 @@ Score options
       option -j1, then -d and -e mean the same thing.  If you set
       both, -e will prevail.)
 
-E-value options
-~~~~~~~~~~~~~~~
-
-  -D LENGTH
-      Report alignments that are expected by chance at most once per
-      LENGTH query letters.  This option only affects the default
-      value of -E, so if you specify -E then -D has no effect.
-
-  -E THRESHOLD
-      Maximum EG2 (expected alignments per square giga).  This option
-      only affects the default value of -e, so if you specify -e then
-      -E has no effect.
-
 Initial-match options
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -247,6 +248,36 @@ Miscellaneous options
       applies to either strand of the reference aligned to the forward
       strand of the query.
 
+  -K LIMIT
+      Omit any alignment whose query range lies in LIMIT or more other
+      alignments with higher score (and on the same strand).  This is
+      a useful way to get just the top few hits to each part of each
+      query (P Berman et al. 2000, J Comput Biol 7:293-302).
+
+  -C LIMIT
+      Before extending gapped alignments, discard any gapless
+      alignment whose query range lies in LIMIT or more others (for
+      the same strand and volume) with higher score-per-length.  This
+      can reduce run time and output size (MC Frith & R Kawaguchi
+      2015, Genome Biol 16:106).
+
+  -P THREADS
+      Divide the work between this number of threads running in
+      parallel.  0 means use as many threads as your computer claims
+      it can handle simultaneously.  Single query sequences are not
+      divided between threads, so you need multiple queries per batch
+      for this option to take effect.
+
+  -i BYTES
+      Search queries in batches of at most this many bytes.  If a
+      single sequence exceeds this amount, however, it is not split.
+      You can use suffixes K, M, and G to specify KibiBytes,
+      MebiBytes, and GibiBytes.  This option has no effect on the
+      results (apart from their order).
+
+      If the reference was split into volumes by lastdb, then each
+      volume will be read into memory once per query batch.
+
   -M  Find minimum-difference alignments, which is faster but cruder.
       This treats all matches the same, and minimizes the number of
       differences (mismatches plus gaps).
@@ -279,36 +310,6 @@ Miscellaneous options
       extensions, it skips any remaining initial matches starting at
       that position.
 
-  -C LIMIT
-      Before extending gapped alignments, discard any gapless
-      alignment whose query range lies in LIMIT or more others (for
-      the same strand and volume) with higher score-per-length.  This
-      can reduce run time and output size (MC Frith & R Kawaguchi
-      2015, Genome Biol 16:106).
-
-  -K LIMIT
-      Omit any alignment whose query range lies in LIMIT or more other
-      alignments with higher score (and on the same strand).  This is
-      a useful way to get just the top few hits to each part of each
-      query (P Berman et al. 2000, J Comput Biol 7:293-302).
-
-  -i BYTES
-      Search queries in batches of at most this many bytes.  If a
-      single sequence exceeds this amount, however, it is not split.
-      You can use suffixes K, M, and G to specify KibiBytes,
-      MebiBytes, and GibiBytes.  This option has no effect on the
-      results (apart from their order).
-
-      If the reference was split into volumes by lastdb, then each
-      volume will be read into memory once per query batch.
-
-  -P THREADS
-      Divide the work between this number of threads running in
-      parallel.  0 means use as many threads as your computer claims
-      it can handle simultaneously.  Single query sequences are not
-      divided between threads, so you need multiple queries per batch
-      for this option to take effect.
-
   -R DIGITS
       Specify lowercase-marking of repeats, by two digits (e.g. "-R 01"),
       with the following meanings.
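
Note on the E-value options moved in this diff: -D only sets the default of -E, and -E only sets the default of -e. As a back-of-envelope sketch in Python, the EG2 implied by a given -D can be estimated as below. This assumes that EG2 counts expected chance alignments per 10^18 letter pairs (a giga of reference letters times a giga of query letters) and it ignores strand counting and any other detail of lastal's real calculation. The function names and the 3e9 reference size are illustrative only:

```python
def default_eg2(ref_letters: float, d_length: float) -> float:
    """EG2 at which one chance alignment is expected per d_length query letters.

    Assumption: expected count = EG2 * ref_letters * query_letters / 1e18.
    """
    return 1e18 / (ref_letters * d_length)


def effective_eg2(ref_letters: float, d_length: float = 1e6, eg2=None) -> float:
    """Follow the documented precedence: an explicit -E makes -D irrelevant."""
    if eg2 is None:
        return default_eg2(ref_letters, d_length)
    return eg2


print(effective_eg2(3e9))                # -D left at one per million query letters
print(effective_eg2(3e9, d_length=100))  # -D100, as in the tutorial examples
print(effective_eg2(3e9, eg2=0.05))      # explicit -E0.05 takes over
```

Under these assumptions a smaller -D gives a larger EG2 limit, which matches the tutorial's use of -D100 to find shorter alignments.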
diff --git a/doc/lastdb.html b/doc/lastdb.html
index aeb828d..c5dfc0b 100644
--- a/doc/lastdb.html
+++ b/doc/lastdb.html
@@ -388,6 +388,18 @@ for ~80% AT-rich genomes.</p>
 these sequences to some other sequences using lastal, lowercase
 letters will be excluded from initial matches.  This will apply
 to lowercase letters in both sets of sequences.</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-u <var>NAME</var></span></kbd></td>
+<td><p class="first">Specify a seeding scheme.  The -m option will then be ignored.
+The built-in schemes are described in <a class="reference external" href="last-seeds.html">last-seeds.html</a>.</p>
+<p class="last">Any other NAME is assumed to be a file name.  For an example of
+the format, see the seed files in the data directory.  You can
+set other lastdb options on lines starting with <tt class="docutils literal">#lastdb</tt>, but
+command line options override them.  You can also set lastal
+options on lines starting with <tt class="docutils literal">#lastal</tt>, which are overridden
+by options from a <a class="reference external" href="last-matrices.html">scoring scheme</a> or the
+lastal command line.</p>
+</td></tr>
 </tbody>
 </table>
 </blockquote>
@@ -400,13 +412,6 @@ to lowercase letters in both sets of sequences.</td></tr>
 <col class="description" />
 <tbody valign="top">
 <tr><td class="option-group">
-<kbd><span class="option">-Q <var>NUMBER</var></span></kbd></td>
-<td>Specify the input format.  0 means fasta, 1 means fastq-sanger,
-2 means fastq-solexa, and 3 means fastq-illumina.  The fastq
-formats provide sequence quality data, which will be stored by
-lastdb and then used by lastal.  These formats are described in
-<a class="reference external" href="lastal.html">lastal.html</a>.</td></tr>
-<tr><td class="option-group">
 <kbd><span class="option">-w <var>STEP</var></span></kbd></td>
 <td>Allow initial matches to start only at every STEP-th position in
 each of the sequences given to lastdb.  This reduces the memory
@@ -445,6 +450,19 @@ lastdb will refuse to process any single sequence longer than
 about 4 billion.</p>
 </td></tr>
 <tr><td class="option-group">
+<kbd><span class="option">-Q <var>NUMBER</var></span></kbd></td>
+<td>Specify the input format.  0 means fasta, 1 means fastq-sanger,
+2 means fastq-solexa, and 3 means fastq-illumina.  The fastq
+formats provide sequence quality data, which will be stored by
+lastdb and then used by lastal.  These formats are described in
+<a class="reference external" href="lastal.html">lastal.html</a>.</td></tr>
+<tr><td class="option-group">
+<kbd><span class="option">-P <var>THREADS</var></span></kbd></td>
+<td>Divide the work between this number of threads running in
+parallel.  0 means use as many threads as your computer claims
+it can handle simultaneously.  Currently, multi-threading is
+used for tantan masking only.</td></tr>
+<tr><td class="option-group">
 <kbd><span class="option">-m <var>PATTERN</var></span></kbd></td>
 <td><p class="first">Specify a spaced seed pattern, for example "-m 110101".  In this
 example, mismatches will be allowed at every third and fifth
@@ -465,24 +483,6 @@ Alternatively, you can use Iedera's notation, for example
 and/or using "-m" multiple times.</p>
 </td></tr>
 <tr><td class="option-group">
-<kbd><span class="option">-u <var>NAME</var></span></kbd></td>
-<td><p class="first">Specify a seeding scheme.  The -m option will then be ignored.
-The built-in schemes are described in <a class="reference external" href="last-seeds.html">last-seeds.html</a>.</p>
-<p class="last">Any other NAME is assumed to be a file name.  For an example of
-the format, see the seed files in the data directory.  You can
-set other lastdb options on lines starting with <tt class="docutils literal">#lastdb</tt>, but
-command line options override them.  You can also set lastal
-options on lines starting with <tt class="docutils literal">#lastal</tt>, which are overridden
-by options from a <a class="reference external" href="last-matrices.html">scoring scheme</a> or the
-lastal command line.</p>
-</td></tr>
-<tr><td class="option-group">
-<kbd><span class="option">-P <var>THREADS</var></span></kbd></td>
-<td>Divide the work between this number of threads running in
-parallel.  0 means use as many threads as your computer claims
-it can handle simultaneously.  Currently, multi-threading is
-used for tantan masking only.</td></tr>
-<tr><td class="option-group">
 <kbd><span class="option">-a <var>SYMBOLS</var></span></kbd></td>
 <td>Specify your own alphabet, e.g. "-a 0123".  The default (DNA)
 alphabet is equivalent to "-a ACGT".  The protein alphabet (-p)
@@ -507,8 +507,9 @@ at most one byte per possible match start position.</td></tr>
 <td>Specify the type of "child table" to make: 0 means none, 1 means
 byte-size (uses a little more memory), 2 means short-size (uses
 somewhat more memory), 3 means full (uses a lot more memory).
-Choices > 0 make lastdb slower, and may make lastal successively
-faster.</td></tr>
+Choices > 0 make lastal a bit faster, but make lastdb slower,
+and have no effect on lastal's results.  Some tests suggest that
+-C2 is a good choice: faster than -C1 and no slower than -C3.</td></tr>
 <tr><td class="option-group">
 <kbd><span class="option">-x</span></kbd></td>
 <td>Just count sequences and letters.  This is much faster.  Letter
diff --git a/doc/lastdb.txt b/doc/lastdb.txt
index 32a530c..b0485fb 100644
--- a/doc/lastdb.txt
+++ b/doc/lastdb.txt
@@ -61,16 +61,21 @@ Main Options
       letters will be excluded from initial matches.  This will apply
       to lowercase letters in both sets of sequences.
 
+  -u NAME
+      Specify a seeding scheme.  The -m option will then be ignored.
+      The built-in schemes are described in `<last-seeds.html>`_.
+
+      Any other NAME is assumed to be a file name.  For an example of
+      the format, see the seed files in the data directory.  You can
+      set other lastdb options on lines starting with ``#lastdb``, but
+      command line options override them.  You can also set lastal
+      options on lines starting with ``#lastal``, which are overridden
+      by options from a `scoring scheme <last-matrices.html>`_ or the
+      lastal command line.
+
 Advanced Options
 ~~~~~~~~~~~~~~~~
 
-  -Q NUMBER
-      Specify the input format.  0 means fasta, 1 means fastq-sanger,
-      2 means fastq-solexa, and 3 means fastq-illumina.  The fastq
-      formats provide sequence quality data, which will be stored by
-      lastdb and then used by lastal.  These formats are described in
-      `<lastal.html>`_.
-
   -w STEP
       Allow initial matches to start only at every STEP-th position in
       each of the sequences given to lastdb.  This reduces the memory
@@ -113,6 +118,19 @@ Advanced Options
       lastdb will refuse to process any single sequence longer than
       about 4 billion.
 
+  -Q NUMBER
+      Specify the input format.  0 means fasta, 1 means fastq-sanger,
+      2 means fastq-solexa, and 3 means fastq-illumina.  The fastq
+      formats provide sequence quality data, which will be stored by
+      lastdb and then used by lastal.  These formats are described in
+      `<lastal.html>`_.
+
+  -P THREADS
+      Divide the work between this number of threads running in
+      parallel.  0 means use as many threads as your computer claims
+      it can handle simultaneously.  Currently, multi-threading is
+      used for tantan masking only.
+
   -m PATTERN
       Specify a spaced seed pattern, for example "-m 110101".  In this
       example, mismatches will be allowed at every third and fifth
@@ -136,24 +154,6 @@ Advanced Options
       You can specify multiple patterns by separating them with commas
       and/or using "-m" multiple times.
 
-  -u NAME
-      Specify a seeding scheme.  The -m option will then be ignored.
-      The built-in schemes are described in `<last-seeds.html>`_.
-
-      Any other NAME is assumed to be a file name.  For an example of
-      the format, see the seed files in the data directory.  You can
-      set other lastdb options on lines starting with ``#lastdb``, but
-      command line options override them.  You can also set lastal
-      options on lines starting with ``#lastal``, which are overridden
-      by options from a `scoring scheme <last-matrices.html>`_ or the
-      lastal command line.
-
-  -P THREADS
-      Divide the work between this number of threads running in
-      parallel.  0 means use as many threads as your computer claims
-      it can handle simultaneously.  Currently, multi-threading is
-      used for tantan masking only.
-
   -a SYMBOLS
       Specify your own alphabet, e.g. "-a 0123".  The default (DNA)
       alphabet is equivalent to "-a ACGT".  The protein alphabet (-p)
@@ -178,8 +178,9 @@ Advanced Options
       Specify the type of "child table" to make: 0 means none, 1 means
       byte-size (uses a little more memory), 2 means short-size (uses
       somewhat more memory), 3 means full (uses a lot more memory).
-      Choices > 0 make lastdb slower, and may make lastal successively
-      faster.
+      Choices > 0 make lastal a bit faster, but make lastdb slower,
+      and have no effect on lastal's results.  Some tests suggest that
+      -C2 is a good choice: faster than -C1 and no slower than -C3.
 
   -x  Just count sequences and letters.  This is much faster.  Letter
       counting is never case-sensitive.
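
Note on the -m option documented above: a spaced seed such as "110101" requires matches where the pattern has 1 and tolerates mismatches where it has 0. The following is a minimal sketch of that idea only, not lastdb's implementation. The function name `seedMatches` is hypothetical, and it assumes the simplest case: a pattern made of '1' and '0' only, applied once rather than cyclically.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LAST): returns true if `a` (from aPos) and
// `b` (from bPos) agree at every position where the seed pattern has '1'.
// Positions marked '0' are "don't care" and may mismatch.
bool seedMatches(const std::string& pattern,
                 const std::string& a, std::size_t aPos,
                 const std::string& b, std::size_t bPos) {
  // Reject spans that would run past the end of either sequence.
  if (aPos > a.size() || pattern.size() > a.size() - aPos) return false;
  if (bPos > b.size() || pattern.size() > b.size() - bPos) return false;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '1' && a[aPos + i] != b[bPos + i]) return false;
  }
  return true;
}
```

With the pattern "110101", the sequences ACGTAC and ACTTGC match, because they differ only at the third and fifth positions.
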
diff --git a/examples/last-bisulfite-paired.sh b/examples/last-bisulfite-paired.sh
index 41582a8..c94428f 100755
--- a/examples/last-bisulfite-paired.sh
+++ b/examples/last-bisulfite-paired.sh
@@ -30,13 +30,13 @@ trap 'rm -f $tmp.*' EXIT
 cat > $tmp.script << 'EOF'
 t=$1.$$
 
-lastal -pBISF -s1 -Q1 -e120 -i1 "$2" "$4" > $t.t1f
-lastal -pBISR -s0 -Q1 -e120 -i1 "$3" "$4" > $t.t1r
+lastal -pBISF -s1 -Q1 -D1000 -i1 "$2" "$4" > $t.t1f
+lastal -pBISR -s0 -Q1 -D1000 -i1 "$3" "$4" > $t.t1r
 last-merge-batches $t.t1f $t.t1r > $t.t1
 rm $t.t1f $t.t1r
 
-lastal -pBISF -s0 -Q1 -e120 -i1 "$2" "$5" > $t.t2f
-lastal -pBISR -s1 -Q1 -e120 -i1 "$3" "$5" > $t.t2r
+lastal -pBISF -s0 -Q1 -D1000 -i1 "$2" "$5" > $t.t2f
+lastal -pBISR -s1 -Q1 -D1000 -i1 "$3" "$5" > $t.t2r
 last-merge-batches $t.t2f $t.t2r > $t.t2
 rm $t.t2f $t.t2r
 
diff --git a/examples/last-bisulfite.sh b/examples/last-bisulfite.sh
index 4946614..59dec48 100755
--- a/examples/last-bisulfite.sh
+++ b/examples/last-bisulfite.sh
@@ -30,8 +30,8 @@ trap 'rm -f $tmp.*' EXIT
 # Convert C to t, and all other letters to uppercase:
 perl -pe 'y/Cca-z/ttA-Z/ if $. % 4 == 2' "$@" > "$tmp".q
 
-lastal -pBISF -s1 -Q1 -e120 "$my_f" "$tmp".q > "$tmp".f
-lastal -pBISR -s0 -Q1 -e120 "$my_r" "$tmp".q > "$tmp".r
+lastal -pBISF -s1 -Q1 -D1000 "$my_f" "$tmp".q > "$tmp".f
+lastal -pBISR -s0 -Q1 -D1000 "$my_r" "$tmp".q > "$tmp".r
 
 last-merge-batches "$tmp".f "$tmp".r | last-split -m0.1 |
 perl -F'(\s+)' -ane '$F[12] =~ y/ta/CG/ if /^s/ and $s++ % 2; print @F'
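
Note on the perl one-liner in last-bisulfite.sh above: `y/Cca-z/ttA-Z/ if $. % 4 == 2` rewrites only the sequence line of each four-line FASTQ record. As a rough illustration, the same transformation could be sketched in C++ as below. The function names are hypothetical and the sketch assumes strictly four-line FASTQ records, as the perl condition does.

```cpp
#include <iostream>
#include <string>

// Illustrative equivalent of the perl transliteration y/Cca-z/ttA-Z/:
// 'C' and 'c' become 't'; every other lowercase letter becomes uppercase.
std::string convertForBisulfite(std::string seq) {
  for (char& ch : seq) {
    if (ch == 'C' || ch == 'c') {
      ch = 't';
    } else if (ch >= 'a' && ch <= 'z') {
      ch = static_cast<char>(ch - 'a' + 'A');
    }
  }
  return seq;
}

// Applies the conversion to lines 2, 6, 10, ... of the input, which is the
// perl condition `$. % 4 == 2`; all other lines are copied unchanged.
void convertFastq(std::istream& in, std::ostream& out) {
  std::string line;
  for (unsigned long lineNumber = 1; std::getline(in, line); ++lineNumber) {
    out << (lineNumber % 4 == 2 ? convertForBisulfite(line) : line) << '\n';
  }
}
```

After conversion, the only lowercase letter left in a read is `t`, marking positions that were originally C.
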
diff --git a/src/LastalArguments.cc b/src/LastalArguments.cc
index 82ecca1..a4f71f2 100644
--- a/src/LastalArguments.cc
+++ b/src/LastalArguments.cc
@@ -89,10 +89,21 @@ LastalArguments::LastalArguments() :
 void LastalArguments::fromArgs( int argc, char** argv, bool optionsOnly ){
   programName = argv[0];
   std::string usage = "Usage: " + std::string(programName) +
-    " [options] lastdb-name fasta-sequence-file(s)";
+    " [options] lastdb-name fasta-sequence-file(s)\n\
+Find and align similar sequences.\n\
+\n\
+Cosmetic options:\n\
+-h, --help: show all options and their default settings, and exit\n\
+-V, --version: show version information, and exit\n\
+-v: be verbose: write messages about what lastal is doing\n\
+-f: output format: TAB, MAF, BlastTab, BlastTab+ (default=MAF)";
 
   std::string help = usage + "\n\
-Find local sequence alignments.\n\
+\n\
+E-value options (default settings):\n\
+-D: query letters per random alignment ("
+    + stringify(queryLettersPerRandomAlignment) + ")\n\
+-E: maximum expected alignments per square giga (1e+18/D/refSize/numOfStrands)\n\
 \n\
 Score options (default settings):\n\
 -r: match score   (2 if -M, else  6 if 0<Q<5, else 1 if DNA)\n\
@@ -110,17 +121,6 @@ Score options (default settings):\n\
 -d: minimum score for gapless alignments (min[e, t*ln(1000*refSize/n)])\n\
 -e: minimum score for gapped alignments\n\
 \n\
-E-value options (default settings):\n\
--D: query letters per random alignment ("
-    + stringify(queryLettersPerRandomAlignment) + ")\n\
--E: maximum expected alignments per square giga (1e+18/D/refSize/numOfStrands)\n\
-\n\
-Cosmetic options (default settings):\n\
--h, --help: show all options and their default settings, and exit\n\
--V, --version: show version information, and exit\n\
--v: be verbose: write messages about what lastal is doing\n\
--f: output format: TAB, MAF, BlastTab, BlastTab+ (MAF)\n\
-\n\
 Initial-match options (default settings):\n\
 -m: maximum initial matches per query position ("
     + stringify(oneHitMultiplicity) + ")\n\
@@ -135,15 +135,15 @@ Miscellaneous options (default settings):\n\
 -s: strand: 0=reverse, 1=forward, 2=both (2 for DNA, 1 for protein)\n\
 -S: score matrix applies to forward strand of: 0=reference, 1=query ("
     + stringify(isQueryStrandMatrix) + ")\n\
+-K: omit alignments whose query range lies in >= K others with > score (off)\n\
+-C: omit gapless alignments in >= C others with > score-per-length (off)\n\
+-P: number of parallel threads ("
+    + stringify(numOfThreads) + ")\n\
+-i: query batch size (8 KiB, unless there is > 1 thread or lastdb volume)\n\
 -M: find minimum-difference alignments (faster but cruder)\n\
 -T: type of alignment: 0=local, 1=overlap ("
     + stringify(globality) + ")\n\
 -n: maximum gapless alignments per query position (infinity if m=0, else m)\n\
--C: omit gapless alignments in >= C others with > score-per-length (off)\n\
--K: omit alignments whose query range lies in >= K others with > score (off)\n\
--i: query batch size (8 KiB, unless there is > 1 thread or lastdb volume)\n\
--P: number of parallel threads ("
-    + stringify(numOfThreads) + ")\n\
 -R: repeat-marking options (the same as was used for lastdb)\n\
 -u: mask lowercase during extensions: 0=never, 1=gapless,\n\
     2=gapless+postmask, 3=always (2 if lastdb -c and Q<5, else 0)\n\
@@ -472,7 +472,8 @@ void LastalArguments::setDefaultsFromAlphabet( bool isDna, bool isProtein,
   if( batchSize == 0 ){
     // With voluming, we want the batches to be as large as will
     // comfortably fit into memory, because each volume gets read from
-    // disk once per batch.
+    // disk once per batch.  With multi-threads, we want large batches
+    // so that long query sequences can be processed in parallel.
     if( !isVolumes && realNumOfThreads == 1 )
       batchSize = 0x2000;  // 8 Kbytes (?)
     else if( inputFormat == sequenceFormat::pssm )
@@ -482,8 +483,10 @@ void LastalArguments::setDefaultsFromAlphabet( bool isDna, bool isProtein,
     else if( inputFormat == sequenceFormat::prb )
       batchSize = 0x2000000;  // 32 Mbytes (?)
     else
-      batchSize = 0x8000000;  // 128 Mbytes
-    // (should we reduce the 128 Mbytes, for fewer out-of-memory errors?)
+      batchSize = 0x4000000;  // 64 Mbytes (?)
+    // 128 Mbytes seemed to sometimes use excessive memory to store
+    // the alignments.  I suspect 64 Mbytes may still be too much
+    // sometimes.
     if( verbosity )
       std::cerr << programName << ": batch size=" << batchSize << '\n';
   }
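
Note on the batch-size hunks above: the default depends on whether the lastdb has volumes, on the thread count, and on the input format. The sketch below restates only the branches visible in these hunks. The enum and function name are hypothetical, and the pssm branch (plus anything else between the two hunks) is left out because its value is not shown here.

```cpp
#include <cstddef>

// Hypothetical stand-in for lastal's sequenceFormat; only the cases whose
// batch sizes are visible in the hunks are modelled.
enum class InputFormat { prb, other };

// Sketch of the default query batch size in bytes, following the visible
// branches of LastalArguments::setDefaultsFromAlphabet.
std::size_t defaultBatchSize(bool isVolumes, unsigned numOfThreads,
                             InputFormat format) {
  // Single thread and a single volume: small batches are enough.
  if (!isVolumes && numOfThreads == 1) return 0x2000;  // 8 KiB
  if (format == InputFormat::prb) return 0x2000000;    // 32 MiB
  return 0x4000000;  // 64 MiB (previously 0x8000000 = 128 MiB)
}
```

Halving the general default trades some parallelism on long queries for a lower peak memory footprint, which is the concern stated in the new source comment.
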
diff --git a/src/LastdbArguments.cc b/src/LastdbArguments.cc
index c274142..274d248 100644
--- a/src/LastdbArguments.cc
+++ b/src/LastdbArguments.cc
@@ -56,22 +56,22 @@ Main Options:\n\
 -p: interpret the sequences as proteins\n\
 -R: repeat-marking options (default="
     + stringify(isKeepLowercase) + stringify(tantanSetting) + ")\n\
--c: soft-mask lowercase letters";
+-c: soft-mask lowercase letters\n\
+-u: seeding scheme (default: YASS for DNA, else exact-match seeds)";
 
   std::string help = usage + "\n\
 \n\
 Advanced Options (default settings):\n\
--Q: input format: 0=fasta, 1=fastq-sanger, 2=fastq-solexa, 3=fastq-illumina ("
-      + stringify(inputFormat) + ")\n\
--s: volume size (unlimited)\n\
--m: seed pattern (non-DNA: 1)\n\
--u: seeding scheme (DNA: YASS)\n\
 -w: use initial matches starting at every w-th position in each sequence ("
     + stringify(indexStep) + ")\n\
 -W: use \"minimum\" positions in sliding windows of W consecutive positions ("
     + stringify(minimizerWindow) + ")\n\
+-s: volume size (unlimited)\n\
+-Q: input format: 0=fasta, 1=fastq-sanger, 2=fastq-solexa, 3=fastq-illumina ("
+      + stringify(inputFormat) + ")\n\
 -P: number of parallel threads ("
     + stringify(numOfThreads) + ")\n\
+-m: seed pattern\n\
 -a: user-defined alphabet\n\
 -i: minimum limit on initial matches per query position ("
     + stringify(minSeedLimit) + ")\n\
diff --git a/src/split/last-split-main.cc b/src/split/last-split-main.cc
index e281829..01cea8f 100644
--- a/src/split/last-split-main.cc
+++ b/src/split/last-split-main.cc
@@ -56,7 +56,7 @@ Options:\n\
     + cbrc::stringify(opts.sdev) + ")\n\
  -m, --mismap=PROB  maximum mismap probability (default="
     + cbrc::stringify(opts.mismap) + ")\n\
- -s, --score=INT    minimum alignment score (default=e OR e+t*ln[1000])\n\
+ -s, --score=INT    minimum alignment score (default=e OR e+t*ln[100])\n\
  -n, --no-split     write original, not split, alignments\n\
  -v, --verbose      be verbose\n\
  -V, --version      show version information and exit\n\
diff --git a/src/split/last-split.cc b/src/split/last-split.cc
index 8a24191..7db0432 100644
--- a/src/split/last-split.cc
+++ b/src/split/last-split.cc
@@ -307,7 +307,7 @@ void lastSplit(LastSplitOptions& opts) {
 	    err("unsupported Q format");
 	  if (opts.score < 0)
 	    opts.score = lastalScoreThreshold +
-	      (opts.isSplicedAlignment ? scoreFromProb(1000, scale) : 0);
+	      (opts.isSplicedAlignment ? scoreFromProb(100, scale) : 0);
 	  int restartCost =
 	    opts.isSplicedAlignment ? -(INT_MIN/2) : opts.score - 1;
 	  double jumpProb = opts.isSplicedAlignment
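
Note on the last-split change above: the default minimum score for spliced alignment drops from e+t*ln[1000] to e+t*ln[100]. The sketch below shows the arithmetic under an assumption: that the score for a probability ratio is scale * ln(prob), rounded to the nearest integer. The body of the real `scoreFromProb` is not shown in this diff, so `scoreFromProbSketch` and `defaultMinScore` are hypothetical names, and `scale` is taken to play the role of t in the help text.

```cpp
#include <cmath>

// Assumption: converts a probability ratio to a score by rounding
// scale * ln(prob). The real scoreFromProb in last-split may differ.
int scoreFromProbSketch(double prob, double scale) {
  return static_cast<int>(std::floor(scale * std::log(prob) + 0.5));
}

// Default minimum score: lastal's threshold e, plus the score equivalent of
// a ratio of 100 when doing spliced alignment.
int defaultMinScore(int lastalScoreThreshold, bool isSplicedAlignment,
                    double scale) {
  return lastalScoreThreshold +
         (isSplicedAlignment ? scoreFromProbSketch(100, scale) : 0);
}
```

Under this assumption, the new default is lower than the old one by about t*ln(10), roughly 2.3*t.
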
diff --git a/src/version.hh b/src/version.hh
index 2a9b445..7f9046d 100644
--- a/src/version.hh
+++ b/src/version.hh
@@ -1 +1 @@
-"752"
+"755"

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/last-align.git


