[med-svn] [examl] 01/02: New upstream version 3.0.18

Andreas Tille tille at debian.org
Wed Feb 15 16:08:47 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository examl.

commit 2c674372298b9869a7c3d985b81a8372939a0266
Author: Andreas Tille <tille at debian.org>
Date:   Wed Feb 15 17:07:52 2017 +0100

    New upstream version 3.0.18
---
 README.md                                 |   29 +
 README_MIC.txt                            |   80 +
 codeDocumentation/PSR.txt                 |    2 +
 codeDocumentation/startupIllustration.pdf |  Bin 0 -> 75681 bytes
 codeDocumentation/startupIllustration.svg |  818 ++++
 examl/Makefile.AVX.gcc                    |   54 +
 examl/Makefile.MIC.icc                    |   56 +
 examl/Makefile.OMP.AVX.gcc                |   54 +
 examl/Makefile.OMP.SSE3.gcc               |   51 +
 examl/Makefile.SSE3.gcc                   |   52 +
 examl/avxLikelihood.c                     | 4052 +++++++++++++++++++
 examl/axml.c                              | 2782 +++++++++++++
 examl/axml.h                              | 1418 +++++++
 examl/bipartitionList.c                   |  592 +++
 examl/byteFile.c                          |  435 ++
 examl/byteFile.h                          |   60 +
 examl/communication.c                     |  182 +
 examl/evaluateGenericSpecial.c            | 2083 ++++++++++
 examl/evaluatePartialGenericSpecial.c     | 1058 +++++
 examl/globalVariables.h                   |  180 +
 examl/makenewzGenericSpecial.c            | 2747 +++++++++++++
 examl/mic_native.h                        |   96 +
 examl/mic_native_aa.c                     | 1323 ++++++
 examl/mic_native_dna.c                    |  661 +++
 examl/models.c                            | 4243 ++++++++++++++++++++
 examl/newviewGenericSpecial.c             | 6218 +++++++++++++++++++++++++++++
 examl/optimizeModel.c                     | 3134 +++++++++++++++
 examl/partitionAssignment.c               |  693 ++++
 examl/partitionAssignment.h               |   64 +
 examl/quartets.c                          |  615 +++
 examl/restartHashTable.c                  |  357 ++
 examl/searchAlgo.c                        | 2651 ++++++++++++
 examl/topologies.c                        |  653 +++
 examl/trash.c                             |   78 +
 examl/treeIO.c                            | 1184 ++++++
 gpl-3.0.txt                               |  674 ++++
 manual/ExaML.backup.odt                   |  Bin 0 -> 87753 bytes
 manual/ExaML.odt                          |  Bin 0 -> 102973 bytes
 manual/ExaML.pdf                          |  Bin 0 -> 412169 bytes
 parser/Makefile.SSE3.gcc                  |   29 +
 parser/Makefile.check.warnings            |   26 +
 parser/USAGE                              |    1 +
 parser/axml.c                             | 2895 ++++++++++++++
 parser/axml.h                             | 1295 ++++++
 parser/globalVariables.h                  |  195 +
 parser/parsePartitions.c                  | 1427 +++++++
 testData/140                              |  142 +
 testData/140.model                        |    3 +
 testData/140.tree                         |    1 +
 testData/354.tree                         |    1 +
 testData/49                               |   50 +
 testData/49.model                         |    4 +
 testData/49.tree                          |    1 +
 versionHeader/version.h                   |    4 +
 54 files changed, 45503 insertions(+)

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..02092f2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+ExaML
+=====
+
+Exascale Maximum Likelihood (ExaML) code for phylogenetic inference using MPI.
+
+This code implements the popular RAxML search algorithm for maximum likelihood based inference 
+of phylogenetic trees.
+
+It uses a radically new MPI parallelization approach that yields improved parallel efficiency, 
+in particular on partitioned multi-gene or whole-genome datasets.
+
+When using ExaML please cite the following paper: 
+
+Alexey M. Kozlov, Andre J. Aberer, Alexandros Stamatakis: "ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers." Bioinformatics (2015) 31 (15): 2577-2579.
+
+ExaML is up to 4 times faster than RAxML-Light [1].
+
+Like RAxML-Light, ExaML also implements checkpointing, SSE3 and AVX vectorization, and
+memory-saving techniques.
+
+[1] A. Stamatakis,  A.J. Aberer, C. Goll, S.A. Smith, S.A. Berger, F. Izquierdo-Carrasco: 
+    "RAxML-Light: A Tool for computing TeraByte Phylogenies", 
+    Bioinformatics 2012; doi: 10.1093/bioinformatics/bts309.
+
+
+Intel Xeon Phi
+--------------
+
+For details on running ExaML on Intel MIC (aka Xeon Phi), please refer to README_MIC.txt.
\ No newline at end of file
diff --git a/README_MIC.txt b/README_MIC.txt
new file mode 100644
index 0000000..d6e7683
--- /dev/null
+++ b/README_MIC.txt
@@ -0,0 +1,80 @@
+Using ExaML on the Intel MIC/Intel Xeon Phi coprocessors
+
+Compiling under Linux
+---------------------
+
+Please set your MPI/MIC environment (ask your sysadmin if unsure) and then run:
+
+   make -f Makefile.AVX.gcc
+   make -f Makefile.MIC.icc clean
+   make -f Makefile.MIC.icc
+
+This will create two executables, one for the host (CPU) and one for the MIC; they will be
+named examl-AVX and examl-MIC, respectively.
+
+
+Running
+-------
+
+1. Use parse-examl to generate a binary alignment file as usual.
+
+2. You might want to allocate MPI ranks on both host CPUs and MICs (hybrid mode)
+or just on the MICs, depending on your configuration.
+
+Sample command line for running ExaML in hybrid mode (16 CPU cores + 2 MIC cards):
+
+  mpiexec -host myhost-ib -n 16 /scratch/examl-AVX -n mictest -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch : \
+          -host myhost-mic0 -n 30 -env OMP_NUM_THREADS 4 -env KMP_AFFINITY "granularity=fine,balanced" /scratch/examl-MIC -n mictest \
+          -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch : \
+          -host myhost-mic1 -n 30 -env OMP_NUM_THREADS 4 -env KMP_AFFINITY "granularity=fine,balanced" /scratch/examl-MIC -n mictest \
+          -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch
+
+Here, we use 1 MPI rank per core on the host CPUs. On each MIC, we start 30 ranks x 4 OpenMP threads,
+which gives 120 threads in total, or 2 threads per MIC core. Changing the ratio of CPU:MIC ranks lets
+you fine-tune the load balance for the specific hardware configuration at hand.
+
+
+Limitations & caveats
+---------------------
+
+1. Supported on the MIC:
+
+   + DNA and AA alignments
+   + GAMMA model of rate heterogeneity
+   + multiple partitions 
+   + all AA substitution matrices supported by ExaML, including LG4
+
+2. Currently NOT supported:
+
+   - binary and generic multi-state alignments
+   - PSR model
+   - memory saving for gappy alignments (-S option)
+
+3. Memory 
+
+  Compared to traditional CPUs, MIC cards have significantly less memory per core,
+  which poses a problem for memory-intensive ML computations. Thus you should plan carefully
+  and split your run over multiple cards, if needed.
+
+  To estimate memory requirements for your dataset, you can use the web calculator here:
+
+    http://sco.h-its.org/exelixis/web/software/raxml/index.html#memcalc
+
+  A similar tool tailored for MICs is coming soon, stay tuned :)
+
+4. Performance
+  
+  ExaML-MIC performs best on alignments with a large number of sites and few taxa.
+  The latter is due to the limited on-card memory of the MICs (see above), so you
+  might need to use multiple cards if the number of taxa is large.
+
+  For details, please refer to: http://www.hicomb.org/papers/HICOMB2014-04.pdf
+
+
+Contact & Support
+-----------------
+
+Please use the RAxML Google group to ask questions:
+
+https://groups.google.com/forum/?hl=en#!forum/raxml
+
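The thread arithmetic in the hybrid-mode sample of README_MIC.txt (30 ranks x 4 OpenMP
threads per card, 2 threads per MIC core) can be double-checked with a small sketch. The
helper name mic_layout is hypothetical and not part of ExaML; the core count it returns is
only what the README's numbers imply, not a stated hardware specification.

```python
def mic_layout(ranks_per_card, omp_threads_per_rank, threads_per_core):
    """Return (total threads per card, cores occupied) for one MIC card.

    All three inputs are taken from the mpiexec sample in README_MIC.txt;
    the function itself is an illustrative helper, not an ExaML tool.
    """
    total_threads = ranks_per_card * omp_threads_per_rank
    cores_used, leftover = divmod(total_threads, threads_per_core)
    if leftover:
        # The chosen threads-per-core value does not divide the thread count evenly.
        raise ValueError("thread count is not a multiple of threads_per_core")
    return total_threads, cores_used


# Values from the sample command line: -n 30, OMP_NUM_THREADS 4, 2 threads per core.
threads, cores = mic_layout(30, 4, 2)
print(f"{threads} threads per card on {cores} cores")
```

Substituting other rank and thread counts shows quickly whether a planned layout
oversubscribes or underuses a card before a job is submitted.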
diff --git a/codeDocumentation/PSR.txt b/codeDocumentation/PSR.txt
new file mode 100644
index 0000000..4e0b81b
--- /dev/null
+++ b/codeDocumentation/PSR.txt
@@ -0,0 +1,2 @@
+To disable per-site rate category scaling in ExaML, it suffices to comment out the invocations of
+updatePerSiteRates() and checkPerSiteRates() in the source code.
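The startup illustration later in this diff (codeDocumentation/startupIllustration.svg)
labels the bytefile header with five fields in this order: int sizeof(size_t), int numTax,
size_t numPattern, int numPartitions, double gappyness. Below is a minimal sketch of
writing and reading such a header. It assumes little-endian byte order, 4-byte int,
8-byte double and no padding between fields; the diagram confirms none of these, and the
function names are hypothetical rather than the API of examl/byteFile.c.

```python
import struct

# Assumption: size_t is either 4 or 8 bytes, as announced by the first header field.
_SIZE_T_CODES = {4: "I", 8: "Q"}


def pack_header(num_tax, num_pattern, num_partitions, gappyness, size_t_bytes=8):
    """Serialize the five header fields in the order shown in the diagram."""
    fmt = "<ii" + _SIZE_T_CODES[size_t_bytes] + "id"
    return struct.pack(fmt, size_t_bytes, num_tax, num_pattern,
                       num_partitions, gappyness)


def unpack_header(buf):
    """Read the header back, using the first int to pick the size_t width."""
    (size_t_bytes,) = struct.unpack_from("<i", buf, 0)
    if size_t_bytes not in _SIZE_T_CODES:
        raise ValueError(f"unsupported size_t width: {size_t_bytes}")
    fmt = "<i" + _SIZE_T_CODES[size_t_bytes] + "id"
    num_tax, num_pattern, num_partitions, gappyness = struct.unpack_from(fmt, buf, 4)
    return {
        "size_t_bytes": size_t_bytes,
        "num_tax": num_tax,
        "num_pattern": num_pattern,
        "num_partitions": num_partitions,
        "gappyness": gappyness,
    }


# Round trip with made-up values (not taken from any real alignment).
header = unpack_header(pack_header(140, 1000, 3, 0.25))
print(header)
```

Storing sizeof(size_t) first, as the diagram does, lets a reader pick the width of the
later size_t fields before decoding them.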
diff --git a/codeDocumentation/startupIllustration.pdf b/codeDocumentation/startupIllustration.pdf
new file mode 100644
index 0000000..42683d3
Binary files /dev/null and b/codeDocumentation/startupIllustration.pdf differ
diff --git a/codeDocumentation/startupIllustration.svg b/codeDocumentation/startupIllustration.svg
new file mode 100644
index 0000000..b4cfc2f
--- /dev/null
+++ b/codeDocumentation/startupIllustration.svg
@@ -0,0 +1,818 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="2023.8151"
+   height="1020.7793"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.48.4 r9939"
+   sodipodi:docname="img-5.pdf">
+  <defs
+     id="defs4">
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend"
+       style="overflow:visible">
+      <path
+         id="path4196"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)"
+         inkscape:connector-curvature="0" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-8"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-6"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-4"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-66"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-6"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-4"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-5"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-69"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-1"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-8"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-46"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-7"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-65"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-2"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-40"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-25"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend-82"
+       style="overflow:visible">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4196-44"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+         style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+    </marker>
+  </defs>
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="0.35"
+     inkscape:cx="678.71845"
+     inkscape:cy="456.86792"
+     inkscape:document-units="px"
+     inkscape:current-layer="svg2"
+     showgrid="false"
+     inkscape:window-width="1916"
+     inkscape:window-height="1057"
+     inkscape:window-x="0"
+     inkscape:window-y="19"
+     inkscape:window-maximized="0"
+     fit-margin-top="0"
+     fit-margin-left="0"
+     fit-margin-right="0"
+     fit-margin-bottom="0" />
+  <metadata
+     id="metadata7">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title></dc:title>
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="<1->"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(-30,-13.400127)">
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="34.285713"
+       y="85.219322"
+       id="text3753"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan3755"
+         x="34.285713"
+         y="85.219322">bytefile layout</tspan></text>
+    <rect
+       style="fill:#cccccc;fill-opacity:1;stroke:none"
+       id="rect3757"
+       width="692.85718"
+       height="918.57141"
+       x="30"
+       y="103.79076" />
+    <g
+       id="g3948">
+      <text
+         sodipodi:linespacing="125%"
+         id="text3759"
+         y="136.64789"
+         x="54.285717"
+         style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="136.64789"
+           x="54.285717"
+           id="tspan3761"
+           sodipodi:role="line">int sizeof(size_t),</tspan><tspan
+           id="tspan3763"
+           y="149.14789"
+           x="54.285717"
+           sodipodi:role="line">int numTax,</tspan><tspan
+           id="tspan3765"
+           y="161.64789"
+           x="54.285717"
+           sodipodi:role="line">size_t numPattern, </tspan><tspan
+           id="tspan3767"
+           y="174.14789"
+           x="54.285717"
+           sodipodi:role="line">int numPartitions, </tspan><tspan
+           id="tspan3769"
+           y="186.64789"
+           x="54.285717"
+           sodipodi:role="line">double gappyness </tspan></text>
+      <text
+         transform="scale(-0.86513234,1.1558925)"
+         sodipodi:linespacing="125%"
+         id="text3771"
+         y="155.88176"
+         x="-304.42612"
+         style="font-size:45.68457413px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="155.88176"
+           x="-304.42612"
+           id="tspan3773"
+           sodipodi:role="line">{</tspan></text>
+      <text
+         sodipodi:linespacing="125%"
+         id="text3775"
+         y="172.61049"
+         x="264.66064"
+         style="font-size:22.46417236px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           style="font-size:20px"
+           y="172.61049"
+           x="264.66064"
+           id="tspan3777"
+           sodipodi:role="line">header</tspan></text>
+    </g>
+    <g
+       id="g3908"
+       transform="translate(-7.0710681,154.55334)">
+      <text
+         sodipodi:linespacing="125%"
+         id="text3779"
+         y="275.93359"
+         x="55"
+         style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="275.93359"
+           x="55"
+           id="tspan3781"
+           sodipodi:role="line">int len1, </tspan><tspan
+           id="tspan3783"
+           y="288.43359"
+           x="55"
+           sodipodi:role="line">char taxonName[len1], </tspan><tspan
+           id="tspan3785"
+           y="300.93359"
+           x="55"
+           sodipodi:role="line">int len2, </tspan><tspan
+           id="tspan3787"
+           y="313.43359"
+           x="55"
+           sodipodi:role="line">char taxonName[len2],</tspan><tspan
+           id="tspan3789"
+           y="325.93359"
+           x="55"
+           sodipodi:role="line">...</tspan></text>
+      <text
+         transform="scale(-0.75650437,1.3218694)"
+         sodipodi:linespacing="125%"
+         id="text3771-8"
+         y="237.30026"
+         x="-348.03134"
+         style="font-size:52.24451065px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="237.30026"
+           x="-348.03134"
+           id="tspan3773-9"
+           sodipodi:role="line">{</tspan></text>
+      <text
+         sodipodi:linespacing="125%"
+         id="text3812"
+         y="300.21933"
+         x="270"
+         style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="300.21933"
+           x="270"
+           id="tspan3814"
+           sodipodi:role="line">taxon names</tspan></text>
+    </g>
+    <g
+       id="g3920"
+       transform="translate(-6.0609153,95.964492)">
+      <text
+         sodipodi:linespacing="125%"
+         id="text3816"
+         y="424.50504"
+         x="56.42857"
+         style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="424.50504"
+           x="56.42857"
+           id="tspan3818"
+           sodipodi:role="line">partition1{ </tspan><tspan
+           id="tspan3852"
+           y="437.00504"
+           x="56.42857"
+           sodipodi:role="line">int states, </tspan><tspan
+           id="tspan3820"
+           y="449.50504"
+           x="56.42857"
+           sodipodi:role="line">int maxTipStates,</tspan><tspan
+           id="tspan3822"
+           y="462.00504"
+           x="56.42857"
+           sodipodi:role="line">size_t lower,</tspan><tspan
+           id="tspan3824"
+           y="474.50504"
+           x="56.42857"
+           sodipodi:role="line">size_t upper,</tspan><tspan
+           id="tspan3826"
+           y="487.00504"
+           x="56.42857"
+           sodipodi:role="line">size_t width, (unused)</tspan><tspan
+           id="tspan3828"
+           y="499.50504"
+           x="56.42857"
+           sodipodi:role="line">int dataType,</tspan><tspan
+           id="tspan3830"
+           y="512.005"
+           x="56.42857"
+           sodipodi:role="line">int protModels,</tspan><tspan
+           id="tspan3832"
+           y="524.505"
+           x="56.42857"
+           sodipodi:role="line">int autoProtModels,</tspan><tspan
+           id="tspan3836"
+           y="537.005"
+           x="56.42857"
+           sodipodi:role="line">int protFreqs,</tspan><tspan
+           id="tspan3840"
+           y="549.505"
+           x="56.42857"
+           sodipodi:role="line">boolean nonGTR,</tspan><tspan
+           id="tspan3842"
+           y="562.005"
+           x="56.42857"
+           sodipodi:role="line">boolean optimizeBaseFrequencies,</tspan><tspan
+           id="tspan3844"
+           y="574.505"
+           x="56.42857"
+           sodipodi:role="line">int numberOfCategories,</tspan><tspan
+           id="tspan3846"
+           y="587.005"
+           x="56.42857"
+           sodipodi:role="line">int len,</tspan><tspan
+           id="tspan3848"
+           y="599.505"
+           x="56.42857"
+           sodipodi:role="line">char partitionName[len],</tspan><tspan
+           id="tspan3850"
+           y="612.005"
+           x="56.42857"
+           sodipodi:role="line">double frequencies[states]</tspan><tspan
+           id="tspan3854"
+           y="624.505"
+           x="56.42857"
+           sodipodi:role="line">}</tspan><tspan
+           id="tspan3856"
+           y="637.005"
+           x="56.42857"
+           sodipodi:role="line">partition 2{</tspan><tspan
+           id="tspan3858"
+           y="649.505"
+           x="56.42857"
+           sodipodi:role="line">....</tspan><tspan
+           id="tspan3862"
+           y="662.005"
+           x="56.42857"
+           sodipodi:role="line">}</tspan><tspan
+           id="tspan3860"
+           y="674.505"
+           x="56.42857"
+           sodipodi:role="line">....</tspan></text>
+      <text
+         transform="scale(-0.2978097,3.357849)"
+         sodipodi:linespacing="125%"
+         id="text3771-8-8"
+         y="185.44205"
+         x="-839.49341"
+         style="font-size:83.03351593px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="185.44205"
+           x="-839.49341"
+           id="tspan3773-9-3"
+           sodipodi:role="line">{</tspan></text>
+      <text
+         sodipodi:linespacing="125%"
+         id="text3885"
+         y="544.50507"
+         x="260.71429"
+         style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="544.50507"
+           x="260.71429"
+           id="tspan3887"
+           sodipodi:role="line">partition infos</tspan></text>
+    </g>
+    <rect
+       style="fill:#808080;fill-opacity:1;stroke:none"
+       id="rect3960"
+       width="403.05087"
+       height="159.6041"
+       x="54.548237"
+       y="227.06755" />
+    <text
+       xml:space="preserve"
+       style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="57.823799"
+       y="220.41492"
+       id="text3962"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan3964"
+         x="57.823799"
+         y="220.41492">int weights[numPattern]</tspan></text>
+    <rect
+       style="fill:#808080;fill-opacity:1;stroke:none"
+       id="rect3960-1"
+       width="403.05087"
+       height="159.6041"
+       x="44.951782"
+       y="811.81976" />
+    <text
+       xml:space="preserve"
+       style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="45.714287"
+       y="802.36218"
+       id="text3984"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan3986"
+         x="45.714287"
+         y="802.36218">char yVector[numPattern]</tspan></text>
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer2"
+     inkscape:label="<2->"
+     transform="translate(-30,-13.400127)"
+     style="display:inline">
+    <g
+       id="g4683">
+      <rect
+         y="213.79076"
+         x="785.71429"
+         height="185.71428"
+         width="320"
+         id="rect3995"
+         style="fill:#00ffff;fill-opacity:1;stroke:none" />
+      <text
+         sodipodi:linespacing="125%"
+         id="text3997"
+         y="203.79076"
+         x="790"
+         style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="203.79076"
+           x="790"
+           id="tspan3999"
+           sodipodi:role="line">ByteFile *bFile</tspan></text>
+      <text
+         sodipodi:linespacing="125%"
+         id="text4651"
+         y="276.64789"
+         x="801.42859"
+         style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+         xml:space="preserve"><tspan
+           y="276.64789"
+           x="801.42859"
+           id="tspan4653"
+           sodipodi:role="line">....</tspan><tspan
+           id="tspan4655"
+           y="301.64789"
+           x="801.42859"
+           sodipodi:role="line">pInfo* partitions</tspan><tspan
+           id="tspan4657"
+           y="326.64789"
+           x="801.42859"
+           sodipodi:role="line">....</tspan></text>
+    </g>
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer3"
+     inkscape:label="<2>"
+     style="display:inline"
+     transform="translate(-30,-13.400127)">
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="825.71429"
+       y="43.790752"
+       id="text3988"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan3990"
+         x="825.71429"
+         y="43.790752">1. read header, taxa, partitions into ByteFile struct</tspan><tspan
+         sodipodi:role="line"
+         x="825.71429"
+         y="93.790756"
+         id="tspan3992">(use seekPos() to navigate in bytefile)</tspan></text>
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+       d="m 368.57143,160.93361 c 104.28571,57.14286 407.14286,80 407.14286,80"
+       id="path4001"
+       inkscape:connector-curvature="0" />
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+       d="M 417.79138,448.34805 C 524.93423,432.63377 774.93424,331.2052 774.93424,331.2052"
+       id="path4001-4"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+       d="M 405.78958,605.47122 C 512.93243,589.75694 778.64673,381.18552 778.64673,381.18552"
+       id="path4001-4-5"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer5"
+     inkscape:label="<3->"
+     style="display:inline"
+     transform="translate(-30,-13.400127)">
+    <rect
+       style="fill:#00ffff;stroke:none;display:inline"
+       id="rect4718"
+       width="370"
+       height="195.71428"
+       x="1298.5714"
+       y="203.79076" />
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;display:inline;font-family:Sans"
+       x="1299.9999"
+       y="192.36218"
+       id="text4720"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4722"
+         x="1299.9999"
+         y="192.36218">PartitionAssignment *pAss</tspan></text>
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer4"
+     inkscape:label="<3>"
+     style="display:inline"
+     transform="translate(-30,-13.400127)">
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="777.14288"
+       y="59.505039"
+       id="text4714"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4716"
+         x="777.14288"
+         y="59.505039">2. every process computes partition assignment</tspan></text>
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="m 1062.9324,318.32837 c 108.5715,-55.71428 251.4286,-8.57142 251.4286,-8.57142"
+       id="path4001-4-9"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer6"
+     inkscape:label="<4>"
+     style="display:inline"
+     transform="translate(-30,-13.400127)">
+    <rect
+       style="fill:#ff0000;fill-opacity:1;stroke:none"
+       id="rect4767"
+       width="215.71428"
+       height="37.142857"
+       x="151.42857"
+       y="226.6479" />
+    <rect
+       style="fill:#ff0000;fill-opacity:1;stroke:none"
+       id="rect4769"
+       width="205.71428"
+       height="35.714287"
+       x="251.28572"
+       y="351.07645" />
+    <rect
+       style="fill:#ff0000;fill-opacity:1;stroke:none"
+       id="rect4767-5"
+       width="215.71428"
+       height="37.142857"
+       x="141.42857"
+       y="811.64789" />
+    <rect
+       style="fill:#ff0000;fill-opacity:1;stroke:none"
+       id="rect4769-3"
+       width="205.71428"
+       height="35.714287"
+       x="241.42859"
+       y="936.64789" />
+    <text
+       xml:space="preserve"
+       style="font-size:27.47451591px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="812.2901"
+       y="435.74408"
+       id="text4792"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4794"
+         x="812.2901"
+         y="435.74408">partitions[0].yVector</tspan><tspan
+         sodipodi:role="line"
+         x="812.2901"
+         y="470.08722"
+         id="tspan4796">partitions[0].wgt</tspan><tspan
+         sodipodi:role="line"
+         x="812.2901"
+         y="504.43036"
+         id="tspan4800">partitions[4].yVector</tspan><tspan
+         sodipodi:role="line"
+         x="812.2901"
+         y="538.7735"
+         id="tspan4804">partitions[4].wgt</tspan></text>
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="m 365.84088,245.71715 c 258.07408,-7.22696 437.29667,183.35756 437.29667,183.35756"
+       id="path4001-4-9-2"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="791.95959"
+       y="70.493912"
+       id="text4831"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         x="791.95959"
+         y="70.493912"
+         id="tspan4839">3. process only reads data assigned to it (exa_fread/exa_fseek)</tspan></text>
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="m 446.65309,368.95576 c 120.69333,19.03701 358.50477,130.82963 358.50477,130.82963"
+       id="path4001-4-9-2-2"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="M 438.57187,958.88485 C 642.09771,927.41423 795.05633,534.13058 795.05633,534.13058"
+       id="path4001-4-9-2-2-5"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="M 337.55662,833.62593 C 423.7393,651.2704 759.25848,484.19643 799.09694,465.4402"
+       id="path4001-4-9-2-2-5-7"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+  </g>
+  <g
+     inkscape:groupmode="layer"
+     id="layer7"
+     inkscape:label="&lt;5&gt;"
+     transform="translate(-30,-13.400127)"
+     style="display:inline">
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="783.87836"
+       y="62.412689"
+       id="text4916"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4918"
+         x="783.87836"
+         y="62.412689">4. tree struct is initialized; bFile and pAss are deleted</tspan></text>
+    <flowRoot
+       xml:space="preserve"
+       id="flowRoot4920"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"><flowRegion
+         id="flowRegion4922"><rect
+           id="rect4924"
+           width="359.61432"
+           height="436.38589"
+           x="1004.0916"
+           y="599.81384" /></flowRegion><flowPara
+         id="flowPara4926"></flowPara></flowRoot>    <rect
+       style="fill:#00ffff;fill-opacity:1;stroke:none"
+       id="rect4928"
+       width="393.9595"
+       height="307.08636"
+       x="1016.2134"
+       y="727.09308" />
+    <text
+       xml:space="preserve"
+       style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="1018.2338"
+       y="706.89001"
+       id="text4930"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4932"
+         x="1018.2338"
+         y="706.89001">tree *tr</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="1038.4368"
+       y="832.14893"
+       id="text4934"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4936"
+         x="1038.4368"
+         y="832.14893">....</tspan><tspan
+         sodipodi:role="line"
+         x="1038.4368"
+         y="857.14893"
+         id="tspan4938">pInfo *partitionData</tspan><tspan
+         sodipodi:role="line"
+         x="1038.4368"
+         y="882.14893"
+         id="tspan4940">....</tspan><tspan
+         sodipodi:role="line"
+         x="1038.4368"
+         y="907.14893"
+         id="tspan4942" /><tspan
+         sodipodi:role="line"
+         x="1038.4368"
+         y="932.14893"
+         id="tspan4944">Assign* assignments</tspan></text>
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="m 909.83212,402.22245 c -105.58083,249.35179 89.80411,439.93631 89.80411,439.93631"
+       id="path4001-4-9-2-8"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+       d="m 1428.4762,395.11161 c 37.8607,285.71728 -158.6935,528.82973 -158.6935,528.82973"
+       id="path4001-4-9-2-8-8"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <text
+       xml:space="preserve"
+       style="font-size:186.88011169px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="1409.8387"
+       y="373.53967"
+       id="text4995"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4997"
+         x="1409.8387"
+         y="373.53967"
+         style="fill:#ff0000">X</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-size:186.88011169px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       x="861.42468"
+       y="368.92688"
+       id="text4995-0"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4997-2"
+         x="861.42468"
+         y="368.92688"
+         style="fill:#ff0000">X</tspan></text>
+  </g>
+</svg>
diff --git a/examl/Makefile.AVX.gcc b/examl/Makefile.AVX.gcc
new file mode 100644
index 0000000..08e20aa
--- /dev/null
+++ b/examl/Makefile.AVX.gcc
@@ -0,0 +1,54 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D__SIM_SSE3 -D__AVX -D_OPTIMIZED_FUNCTIONS -msse3 -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE #-Wall   -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototypes -Wpointer-sign -We [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mavx
+
+RM = rm -f
+
+objs    = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o avxLikelihood.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-AVX
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-AVX : $(objs)
+	$(CC) -o examl-AVX $(objs) $(LIBRARIES) 
+
+avxLikelihood.o : avxLikelihood.c $(GLOBAL_DEPS)
+	$(CC) $(CFLAGS) -mavx -c -o avxLikelihood.o avxLikelihood.c
+
+models.o : models.c $(GLOBAL_DEPS)
+	 $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c  $(GLOBAL_DEPS) 
+communication.o : communication.c $(GLOBAL_DEPS) 
+
+
+clean : 
+	$(RM) *.o examl-AVX
+
+dev : examl-AVX
\ No newline at end of file
diff --git a/examl/Makefile.MIC.icc b/examl/Makefile.MIC.icc
new file mode 100644
index 0000000..1cd6e9a
--- /dev/null
+++ b/examl/Makefile.MIC.icc
@@ -0,0 +1,56 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+MICFLAGS = -D__MIC_NATIVE -mmic -opt-streaming-cache-evict=0 -openmp -D_USE_OMP #-D_PROFILE_MPI
+COMMON_FLAGS = -std=c99 -D__SIM_SSE3 -D_OPTIMIZED_FUNCTIONS -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE  $(MICFLAGS) # -Wall   -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototypes -Wpointer- [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mmic -openmp
+
+RM = rm -f
+
+objs    = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o byteFile.o partitionAssignment.o communication.o mic_native_dna.o mic_native_aa.o quartets.o
+
+all : clean examl-MIC
+
+GLOBAL_DEPS = axml.h globalVariables.h
+
+examl-MIC : $(objs)
+	$(CC) -o examl-MIC $(objs) $(LIBRARIES)
+
+models.o : models.c $(GLOBAL_DEPS)
+	$(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+partitionAssignment.o: partitionAssignment.c $(GLOBAL_DEPS)
+	$(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o partitionAssignment.o partitionAssignment.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+communication.o : communication.c $(GLOBAL_DEPS) 
+mic_native_dna.o : mic_native_dna.c $(GLOBAL_DEPS)
+mic_native_aa.o : mic_native_aa.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean : 
+	$(RM) *.o examl-MIC
+
+dev : examl-MIC
diff --git a/examl/Makefile.OMP.AVX.gcc b/examl/Makefile.OMP.AVX.gcc
new file mode 100644
index 0000000..13d21e0
--- /dev/null
+++ b/examl/Makefile.OMP.AVX.gcc
@@ -0,0 +1,54 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D__SIM_SSE3 -D__AVX -D_USE_OMP -fopenmp -D_OPTIMIZED_FUNCTIONS -msse3 -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE  -Wall #  -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototyp [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mavx -fopenmp
+
+RM = rm -f
+
+objs    = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o avxLikelihood.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-OMP-AVX
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-OMP-AVX : $(objs)
+	$(CC) -o examl-OMP-AVX $(objs) $(LIBRARIES) 
+
+avxLikelihood.o : avxLikelihood.c $(GLOBAL_DEPS)
+	$(CC) $(CFLAGS) -mavx -c -o avxLikelihood.o avxLikelihood.c
+
+models.o : models.c $(GLOBAL_DEPS)
+	 $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c  $(GLOBAL_DEPS) 
+communication.o : communication.c $(GLOBAL_DEPS) 
+
+
+clean : 
+	$(RM) *.o examl-OMP-AVX
+
+dev : examl-OMP-AVX
diff --git a/examl/Makefile.OMP.SSE3.gcc b/examl/Makefile.OMP.SSE3.gcc
new file mode 100644
index 0000000..5ee3bc2
--- /dev/null
+++ b/examl/Makefile.OMP.SSE3.gcc
@@ -0,0 +1,51 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D_USE_OMP -fopenmp -D_GNU_SOURCE -D__SIM_SSE3  -msse3 -fomit-frame-pointer -funroll-loops -D_OPTIMIZED_FUNCTIONS -D_USE_ALLREDUCE -Wall #-Wunused-parameter -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict [...]
+
+OPT_FLAG_1 = -O1 
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -fopenmp
+
+RM = rm -f
+
+objs    = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o  treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o  byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-OMP
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-OMP : $(objs)
+	$(CC) -o examl-OMP $(objs) $(LIBRARIES) 
+
+models.o : models.c $(GLOBAL_DEPS)
+	 $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+models.o : models.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c  $(GLOBAL_DEPS) 
+communication.o : communication.c $(GLOBAL_DEPS) 
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean : 
+	$(RM) *.o examl-OMP
+
+dev : examl-OMP
diff --git a/examl/Makefile.SSE3.gcc b/examl/Makefile.SSE3.gcc
new file mode 100644
index 0000000..c15f0fc
--- /dev/null
+++ b/examl/Makefile.SSE3.gcc
@@ -0,0 +1,52 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+
+COMMON_FLAGS = -D_GNU_SOURCE -D__SIM_SSE3  -msse3 -fomit-frame-pointer -funroll-loops -D_OPTIMIZED_FUNCTIONS -D_USE_ALLREDUCE #-Wall -Wunused-parameter -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototypes -Wpointe [...]
+
+OPT_FLAG_1 = -O1 
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs    = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o  treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o  byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl : $(objs)
+	$(CC) -o examl $(objs) $(LIBRARIES) 
+
+models.o : models.c $(GLOBAL_DEPS)
+	 $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+models.o : models.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c  $(GLOBAL_DEPS) 
+communication.o : communication.c $(GLOBAL_DEPS) 
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean : 
+	$(RM) *.o examl
+
+dev : examl
\ No newline at end of file
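
Reading aid for the avxLikelihood.c diff that follows: the AVX kernels lean on two small helpers (hadd3, hadd4) and on a per-site rescaling step that guards against numerical underflow. The scalar sketch below restates what those pieces compute, as far as can be read from the diff itself. It is not part of the upstream commit. All names prefixed with sketch_ are made up for illustration, and the two constants are assumptions (2^256 and 2^-256); the real twotothe256 and minlikelihood are defined in axml.h, which is not shown in this part of the mail.

```c
#include <stddef.h>

/* ASSUMED values; the real constants live in axml.h (not shown here). */
#define SKETCH_TWOTOTHE256   0x1p256   /* 2^256, C99 hex float literal  */
#define SKETCH_MINLIKELIHOOD 0x1p-256  /* 2^-256, C99 hex float literal */

/* Scalar counterpart of hadd3(): the sum of the four lanes.
   The AVX version stores this same sum in every lane of its result. */
static double sketch_hadd3(const double v[4])
{
  return (v[0] + v[1]) + (v[2] + v[3]);
}

/* Scalar counterpart of hadd4(): product of the two lane sums. */
static double sketch_hadd4(const double v[4], const double u[4])
{
  return sketch_hadd3(v) * sketch_hadd3(u);
}

/* Rescaling rule used per site: only if ALL entries are smaller in
   magnitude than the threshold, multiply every entry by 2^256.
   Returns the amount the caller should add to its scaling counter
   (wgt when the site was rescaled, 0 otherwise). */
static int sketch_rescale_site(double *x, size_t len, int wgt)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      double a = (x[i] < 0.0) ? -x[i] : x[i];   /* |x[i]| without math.h */

      if (!(a < SKETCH_MINLIKELIHOOD))
        return 0;                               /* one entry is large enough */
    }

  for (i = 0; i < len; i++)
    x[i] *= SKETCH_TWOTOTHE256;

  return wgt;
}
```

In the kernels the same test is done four doubles at a time: the absolute value is taken by AND-ing with absMask_AVX, the comparison mask is collected with _mm256_movemask_pd, and a mask of 15 means all four lanes were below the threshold. The returned weight corresponds to the "addScale += wgt[i]" statements.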
diff --git a/examl/avxLikelihood.c b/examl/avxLikelihood.c
new file mode 100644
index 0000000..f4438f3
--- /dev/null
+++ b/examl/avxLikelihood.c
@@ -0,0 +1,4052 @@
+#include <unistd.h>
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+#include "axml.h"
+#include <assert.h>
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+#include <immintrin.h>
+
+#ifdef _FMA
+#include <x86intrin.h>
+#define FMAMACC(a,b,c) _mm256_macc_pd(b,c,a)
+#endif
+
+extern const unsigned int mask32[32];
+
+const union __attribute__ ((aligned (BYTE_ALIGNMENT)))
+{
+  uint64_t i[4];
+  __m256d m;
+  
+} absMask_AVX = {{0x7fffffffffffffffULL, 0x7fffffffffffffffULL, 0x7fffffffffffffffULL, 0x7fffffffffffffffULL}};
+
+
+/* hadd4: every lane of the result holds (sum of the lanes of v) * (sum of the lanes of u) */
+static inline __m256d hadd4(__m256d v, __m256d u)
+{ 
+  __m256d
+    a, b;
+  
+  v = _mm256_hadd_pd(v, v);
+  a = _mm256_permute2f128_pd(v, v, 1);
+  v = _mm256_add_pd(a, v);
+
+  u = _mm256_hadd_pd(u, u);
+  b = _mm256_permute2f128_pd(u, u, 1);
+  u = _mm256_add_pd(b, u);
+
+  v = _mm256_mul_pd(v, u);	
+  
+  return v;
+}
+/* hadd3: horizontal sum, every lane of the result holds v[0] + v[1] + v[2] + v[3] */
+static inline __m256d hadd3(__m256d v)
+{ 
+  __m256d
+    a;
+  
+  v = _mm256_hadd_pd(v, v);
+  a = _mm256_permute2f128_pd(v, v, 1);
+  v = _mm256_add_pd(a, v);
+  
+  return v;
+}
+
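+/* DNA kernel for GAMMA: 16 doubles per site in x3 (4 rate categories x 4 states); *scalerIncrement receives the summed wgt[] of all rescaled sites */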
+void  newviewGTRGAMMA_AVX(int tipCase,
+			 double *x1, double *x2, double *x3,
+			 double *extEV, double *tipVector,
+			 unsigned char *tipX1, unsigned char *tipX2,
+			 const int n, double *left, double *right, int *wgt, int *scalerIncrement
+			 )
+{
+ 
+  int  
+    i, 
+    k, 
+    scale, 
+    addScale = 0;
+ 
+  __m256d 
+    minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+    twoto = _mm256_set1_pd(twotothe256);
+ 
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double 
+	  *uX1, 
+	  umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+	  *uX2, 
+	  umpX2[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for (i = 1; i < 16; i++)
+	  {
+	    __m256d 
+	      tv = _mm256_load_pd(&(tipVector[i * 4]));
+
+	    int 
+	      j;
+	    
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&left[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+		}
+	  
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&right[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX2[i * 64 + j * 16 + k * 4], left1);
+		}	    
+	  }   	
+	  
+
+	for(i = 0; i < n; i++)
+	  {	    		 	    
+	    uX1 = &umpX1[64 * tipX1[i]];
+	    uX2 = &umpX2[64 * tipX2[i]];		  
+	    
+	    for(k = 0; k < 4; k++)
+	      {
+		__m256d	   
+		  xv = _mm256_setzero_pd();
+	       
+		int 
+		  l;
+		
+		for(l = 0; l < 4; l++)
+		  {	       	     				      	      																	   
+		    __m256d
+		      x1v =  _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+		
+		    __m256d 
+		      evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+		    xv = FMAMACC(xv,x1v,evv);
+#else						  
+		    xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+		  }
+		
+		_mm256_store_pd(&x3[16 * i + 4 * k], xv);
+	      }	         	   	    
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	double 
+	  *uX1, 
+	  umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for (i = 1; i < 16; i++)
+	  {
+	    __m256d 
+	      tv = _mm256_load_pd(&(tipVector[i*4]));
+
+	    int 
+	      j;
+	    
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&left[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+		}	 	   
+	  }   	
+	
+	for(i = 0; i < n; i++)
+	  { 
+	    __m256d
+	      xv[4];	    	   
+	    
+	    scale = 1;
+	    uX1 = &umpX1[64 * tipX1[i]];
+
+	    for(k = 0; k < 4; k++)
+	      {
+		__m256d	   		 
+		  xvr = _mm256_load_pd(&(x2[i * 16 + k * 4]));
+
+		int 
+		  l;
+
+		xv[k]  = _mm256_setzero_pd();
+		  
+		for(l = 0; l < 4; l++)
+		  {	       	     				      	      															
+		    __m256d  
+		      x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),		     
+		      x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+			
+		    x2v = hadd3(x2v);
+		    x1v = _mm256_mul_pd(x1v, x2v);			
+		
+		    __m256d 
+		      evv = _mm256_load_pd(&extEV[l * 4]);
+			
+#ifdef _FMA
+		    xv[k] = FMAMACC(xv[k],x1v,evv);
+#else			  
+		    xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+		  }
+		    
+		if(scale)
+		  {
+		    __m256d 	     
+		      v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+		    v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		    
+		    if(_mm256_movemask_pd( v1 ) != 15)
+		      scale = 0;
+		  }
+	      }	    
+
+	    if(scale)
+	      {
+		xv[0] = _mm256_mul_pd(xv[0], twoto);
+		xv[1] = _mm256_mul_pd(xv[1], twoto);
+		xv[2] = _mm256_mul_pd(xv[2], twoto);
+		xv[3] = _mm256_mul_pd(xv[3], twoto);
+		addScale += wgt[i];
+	      }
+
+	    _mm256_store_pd(&x3[16 * i],      xv[0]);
+	    _mm256_store_pd(&x3[16 * i + 4],  xv[1]);
+	    _mm256_store_pd(&x3[16 * i + 8],  xv[2]);
+	    _mm256_store_pd(&x3[16 * i + 12], xv[3]);
+	  }
+      }
+      break;
+    case INNER_INNER:
+      {
+	for(i = 0; i < n; i++)
+	  {	
+	    __m256d
+	      xv[4];
+	    
+	    scale = 1;
+
+	    for(k = 0; k < 4; k++)
+	      {
+		__m256d	   
+		 
+		  xvl = _mm256_load_pd(&(x1[i * 16 + k * 4])),
+		  xvr = _mm256_load_pd(&(x2[i * 16 + k * 4]));
+
+		int 
+		  l;
+
+		xv[k] = _mm256_setzero_pd();
+
+		for(l = 0; l < 4; l++)
+		  {	       	     				      	      															
+		    __m256d 
+		      x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+		      x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+			
+		    x1v = hadd4(x1v, x2v);			
+		
+		    __m256d 
+		      evv = _mm256_load_pd(&extEV[l * 4]);
+						  
+		    xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+		  }
+		
+		if(scale)
+		  {
+		    __m256d 	     
+		      v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+		    v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		    
+		    if(_mm256_movemask_pd( v1 ) != 15)
+		      scale = 0;
+		  }
+	      }
+
+	     if(scale)
+	      {
+		xv[0] = _mm256_mul_pd(xv[0], twoto);
+		xv[1] = _mm256_mul_pd(xv[1], twoto);
+		xv[2] = _mm256_mul_pd(xv[2], twoto);
+		xv[3] = _mm256_mul_pd(xv[3], twoto);
+		addScale += wgt[i];
+	      }
+		
+	    _mm256_store_pd(&x3[16 * i],      xv[0]);
+	    _mm256_store_pd(&x3[16 * i + 4],  xv[1]);
+	    _mm256_store_pd(&x3[16 * i + 8],  xv[2]);
+	    _mm256_store_pd(&x3[16 * i + 12], xv[3]);
+	  }
+      }
+      break;
+    default:
+      assert(0);
+    }
+
+  
+  *scalerIncrement = addScale;
+  
+}
+
+
+
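+/* DNA kernel for CAT: 4 doubles per site; cptr[i] selects the 16-entry left/right matrices used for site i */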
+void newviewGTRCAT_AVX(int tipCase,  double *EV,  int *cptr,
+			   double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+			   unsigned char *tipX1, unsigned char *tipX2,
+			   int n,  double *left, double *right, int *wgt, int *scalerIncrement)
+{
+  double
+    *le,
+    *ri,
+    *x1,
+    *x2;
+    
+  int 
+    i,     
+    addScale = 0;
+   
+  __m256d 
+    minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+    twoto = _mm256_set1_pd(twotothe256);
+  
+  switch(tipCase)
+    {
+    case TIP_TIP:      
+      for (i = 0; i < n; i++)
+	{	 
+	  int 
+	    l;
+	  
+	  le = &left[cptr[i] * 16];
+	  ri = &right[cptr[i] * 16];
+
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+	  
+	  __m256d	   
+	    vv = _mm256_setzero_pd();
+	   	   	    
+	  for(l = 0; l < 4; l++)
+	    {	       	     				      	      															
+	      __m256d 
+		x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+			
+	      x1v = hadd4(x1v, x2v);			
+		
+	      __m256d 
+		evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+	      vv = FMAMACC(vv,x1v,evv);
+#else				
+	      vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));						      	
+#endif
+	    }	  		  
+
+	  _mm256_store_pd(&x3_start[4 * i], vv);	    	   	    
+	}
+      break;
+    case TIP_INNER:      
+      for (i = 0; i < n; i++)
+	{
+	  int 
+	    l;
+
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &x2_start[4 * i];	 
+	  
+	  le =  &left[cptr[i] * 16];
+	  ri =  &right[cptr[i] * 16];
+
+	  __m256d	   
+	    vv = _mm256_setzero_pd();
+	  
+	  for(l = 0; l < 4; l++)
+	    {	       	     				      	      															
+	      __m256d 
+		x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+			
+	      x1v = hadd4(x1v, x2v);			
+		
+	      __m256d 
+		evv = _mm256_load_pd(&EV[l * 4]);
+				
+#ifdef _FMA
+	      vv = FMAMACC(vv,x1v,evv);
+#else	      
+	      vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+	    }	  		  
+	  
+	  
+	  __m256d 	     
+	    v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+	  v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	    
+	  if(_mm256_movemask_pd( v1 ) == 15)
+	    {	     	      
+	      vv = _mm256_mul_pd(vv, twoto);	      
+	      addScale += wgt[i];
+	    }       
+	  
+	  _mm256_store_pd(&x3_start[4 * i], vv);	 	  	  
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  int 
+	    l;
+
+	  x1 = &x1_start[4 * i];
+	  x2 = &x2_start[4 * i];
+	  
+	  
+	  le =  &left[cptr[i] * 16];
+	  ri =  &right[cptr[i] * 16];
+
+	  __m256d	   
+	    vv = _mm256_setzero_pd();
+	  
+	  for(l = 0; l < 4; l++)
+	    {	       	     				      	      															
+	      __m256d 
+		x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+			
+	      x1v = hadd4(x1v, x2v);			
+		
+	      __m256d 
+		evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+	      vv = FMAMACC(vv,x1v,evv);
+#else						
+	      vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));						      	
+#endif
+	    }	  		  
+
+	 
+	  __m256d 	     
+	    v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+	  v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	    
+	  if(_mm256_movemask_pd( v1 ) == 15)
+	    {	
+	      vv = _mm256_mul_pd(vv, twoto);	      
+	      addScale += wgt[i];
+	    }	
+
+	  _mm256_store_pd(&x3_start[4 * i], vv);
+	  	  
+	}
+      break;
+    default:
+      assert(0);
+    }
+
+  
+  *scalerIncrement = addScale;
+}
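+/* protein kernel for CAT: 20 doubles per site; cptr[i] selects the 400-entry left/right matrices used for site i */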
+void newviewGTRCATPROT_AVX(int tipCase, double *extEV,
+			       int *cptr,
+			       double *x1, double *x2, double *x3, double *tipVector,
+			       unsigned char *tipX1, unsigned char *tipX2,
+			       int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+  double
+    *le, *ri, *v, *vl, *vr;
+
+  int i, l, scale, addScale = 0;
+
+#ifdef _FMA
+  int k;
+#endif
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	for (i = 0; i < n; i++)
+	  {	   
+	    le = &left[cptr[i] * 400];
+	    ri = &right[cptr[i] * 400];
+
+	    vl = &(tipVector[20 * tipX1[i]]);
+	    vr = &(tipVector[20 * tipX2[i]]);
+	    v  = &x3[20 * i];	    	    	   	    
+
+	    __m256d vv[5];
+	    
+	    vv[0] = _mm256_setzero_pd();
+	    vv[1] = _mm256_setzero_pd();
+	    vv[2] = _mm256_setzero_pd();
+	    vv[3] = _mm256_setzero_pd();
+	    vv[4] = _mm256_setzero_pd();	   	    
+
+	    for(l = 0; l < 20; l++)
+	      {	       
+		__m256d 
+		  x1v = _mm256_setzero_pd(),
+		  x2v = _mm256_setzero_pd();	
+				
+		double 
+		  *ev = &extEV[l * 20],
+		  *lv = &le[l * 20],
+		  *rv = &ri[l * 20];														
+
+#ifdef _FMA		
+		for(k = 0; k < 20; k += 4) 
+		  {
+		    __m256d vlv = _mm256_load_pd(&vl[k]);
+		    __m256d lvv = _mm256_load_pd(&lv[k]);
+		    x1v = FMAMACC(x1v,vlv,lvv);
+		    __m256d vrv = _mm256_load_pd(&vr[k]);
+		    __m256d rvv = _mm256_load_pd(&rv[k]);
+		    x2v = FMAMACC(x2v,vrv,rvv);
+		  }
+#else		
+		x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+		x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+		x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+		x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+		x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+		x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+		x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+		x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+		x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+		x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));	
+#endif
+
+		x1v = hadd4(x1v, x2v);			
+#ifdef _FMA
+		for(k = 0; k < 5; k++) 
+		  {
+		    __m256d evv = _mm256_load_pd(&ev[k*4]);
+		    vv[k] = FMAMACC(vv[k],x1v,evv);
+		  }	  
+#else		
+		__m256d 
+		  evv[5];
+	    	
+		evv[0] = _mm256_load_pd(&ev[0]);
+		evv[1] = _mm256_load_pd(&ev[4]);
+		evv[2] = _mm256_load_pd(&ev[8]);
+		evv[3] = _mm256_load_pd(&ev[12]);
+		evv[4] = _mm256_load_pd(&ev[16]);		
+		
+		vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+		vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+		vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+		vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+		vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      		      	  
+#endif
+	      }
+	    _mm256_store_pd(&v[0], vv[0]);
+	    _mm256_store_pd(&v[4], vv[1]);
+	    _mm256_store_pd(&v[8], vv[2]);
+	    _mm256_store_pd(&v[12], vv[3]);
+	    _mm256_store_pd(&v[16], vv[4]);
+	  }
+      }
+      break;
+    case TIP_INNER:      	
+      for (i = 0; i < n; i++)
+	{
+	  le = &left[cptr[i] * 400];
+	  ri = &right[cptr[i] * 400];
+	  
+	  vl = &(tipVector[20 * tipX1[i]]);
+	  vr = &x2[20 * i];
+	  v  = &x3[20 * i];	   
+	  
+	  __m256d vv[5];
+	  
+	  vv[0] = _mm256_setzero_pd();
+	  vv[1] = _mm256_setzero_pd();
+	  vv[2] = _mm256_setzero_pd();
+	  vv[3] = _mm256_setzero_pd();
+	  vv[4] = _mm256_setzero_pd();
+	  
+	 
+
+	  for(l = 0; l < 20; l++)
+	    {	       
+	      __m256d 
+		x1v = _mm256_setzero_pd(),
+		x2v = _mm256_setzero_pd();	
+	      
+	      double 
+		*ev = &extEV[l * 20],
+		*lv = &le[l * 20],
+		*rv = &ri[l * 20];														
+#ifdef _FMA
+	      for(k = 0; k < 20; k += 4) 
+		{
+		  __m256d vlv = _mm256_load_pd(&vl[k]);
+		  __m256d lvv = _mm256_load_pd(&lv[k]);
+		  x1v = FMAMACC(x1v,vlv,lvv);
+		  __m256d vrv = _mm256_load_pd(&vr[k]);
+		  __m256d rvv = _mm256_load_pd(&rv[k]);
+		  x2v = FMAMACC(x2v,vrv,rvv);
+		}
+#else	      
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+	      
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+
+	      x1v = hadd4(x1v, x2v);			
+	      
+	      __m256d 
+		evv[5];
+	      
+	      evv[0] = _mm256_load_pd(&ev[0]);
+	      evv[1] = _mm256_load_pd(&ev[4]);
+	      evv[2] = _mm256_load_pd(&ev[8]);
+	      evv[3] = _mm256_load_pd(&ev[12]);
+	      evv[4] = _mm256_load_pd(&ev[16]);		
+
+#ifdef _FMA
+	      for(k = 0; k < 5; k++)
+		vv[k] = FMAMACC(vv[k],x1v,evv[k]);		 
+#else	      
+	      vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+	      vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+	      vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+	      vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+	      vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      	
+#endif
+	    }	  
+
+	  /* Rescale all 20 entries by twotothe256 only if every one is below minlikelihood in magnitude. */
+	  __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+	  
+	  scale = 1;
+	  
+	  for(l = 0; scale && (l < 20); l += 4)
+	    {	       
+	      __m256d 
+		v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+	      v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	      
+	      if(_mm256_movemask_pd( v1 ) != 15)
+		scale = 0;
+	    }	    	  	  
+	 
+
+	  if(scale)
+	    {
+	      __m256d 
+		twoto = _mm256_set1_pd(twotothe256);
+	      
+	      for(l = 0; l < 20; l += 4)
+		vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);		    		 
+	  
+	     
+	      addScale += wgt[i];
+	     	      
+	    }
+
+	  _mm256_store_pd(&v[0], vv[0]);
+	  _mm256_store_pd(&v[4], vv[1]);
+	  _mm256_store_pd(&v[8], vv[2]);
+	  _mm256_store_pd(&v[12], vv[3]);
+	  _mm256_store_pd(&v[16], vv[4]);	       
+	}
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  le = &left[cptr[i] * 400];
+	  ri = &right[cptr[i] * 400];
+
+	  vl = &x1[20 * i];
+	  vr = &x2[20 * i];
+	  v = &x3[20 * i];
+
+	  __m256d vv[5];
+	  
+	  vv[0] = _mm256_setzero_pd();
+	  vv[1] = _mm256_setzero_pd();
+	  vv[2] = _mm256_setzero_pd();
+	  vv[3] = _mm256_setzero_pd();
+	  vv[4] = _mm256_setzero_pd();
+	  
+	  for(l = 0; l < 20; l++)
+	    {	       
+	      __m256d 
+		x1v = _mm256_setzero_pd(),
+		x2v = _mm256_setzero_pd();	
+	      
+	      double 
+		*ev = &extEV[l * 20],
+		*lv = &le[l * 20],
+		*rv = &ri[l * 20];														
+	      
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+	      x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+	      
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+	      x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+
+	      x1v = hadd4(x1v, x2v);			
+#ifdef _FMA
+	       for(k = 0; k < 5; k++) 
+		 {
+		   __m256d evv = _mm256_load_pd(&ev[k*4]);
+		   vv[k] = FMAMACC(vv[k],x1v,evv);
+		 }
+#else	      
+	      __m256d 
+		evv[5];
+	      
+	      evv[0] = _mm256_load_pd(&ev[0]);
+	      evv[1] = _mm256_load_pd(&ev[4]);
+	      evv[2] = _mm256_load_pd(&ev[8]);
+	      evv[3] = _mm256_load_pd(&ev[12]);
+	      evv[4] = _mm256_load_pd(&ev[16]);		
+	      
+	      vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+	      vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+	      vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+	      vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+	      vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      	
+#endif
+	    }	  
+
+	  /* Rescale all 20 entries by twotothe256 only if every one is below minlikelihood in magnitude. */
+	  __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+	  
+	  scale = 1;
+	  
+	  for(l = 0; scale && (l < 20); l += 4)
+	    {	       
+	      __m256d 
+		v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+	      v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	      
+	      if(_mm256_movemask_pd( v1 ) != 15)
+		scale = 0;
+	    }	    	  	  
+
+	  if(scale)
+	    {
+	      __m256d 
+		twoto = _mm256_set1_pd(twotothe256);
+	      
+	      for(l = 0; l < 20; l += 4)
+		vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);		    		 
+	  
+	     
+	      addScale += wgt[i];	      
+	    }
+
+	  _mm256_store_pd(&v[0], vv[0]);
+	  _mm256_store_pd(&v[4], vv[1]);
+	  _mm256_store_pd(&v[8], vv[2]);
+	  _mm256_store_pd(&v[12], vv[3]);
+	  _mm256_store_pd(&v[16], vv[4]);
+	 
+	}
+      break;
+    default:
+      assert(0);
+    }
+  
+  
+  *scalerIncrement = addScale;
+}
+
+
+
+void newviewGTRGAMMAPROT_AVX_LG4(int tipCase,
+				 double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+				 int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n, 
+				 double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling) 
+{
+  double	
+    *uX1, 
+    *uX2, 
+    *v, 
+    x1px2, 
+    *vl, 
+    *vr;
+  
+  int	
+    i, 
+    j, 
+    l, 
+    k, 
+    scale, 
+    addScale = 0;
+
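+  /* bitmask enables only the lowest 64-bit lane, so each _mm256_maskstore_pd below writes a single double. */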
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+   __m256d
+    bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+  __m256i
+    bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1);
+#endif 
+  
+  switch(tipCase) 
+    {
+    case TIP_TIP: 
+      {
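+	/* umpX1/umpX2 cache, for each of the 23 tip states, the 80 dot products with the rows of left/right. */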
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+	  umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	
+	for(i = 0; i < 23; i++) 
+	  {	    	    
+	    for(k = 0; k < 80; k++) 
+	      {
+		double 
+		  *ll =  &left[k * 20],
+		  *rr =  &right[k * 20];
+		
+		__m256d 
+		  umpX1v = _mm256_setzero_pd(),
+		  umpX2v = _mm256_setzero_pd();
+		
+		v = &(tipVector[k / 20][20 * i]);
+
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+		    __m256d llv = _mm256_load_pd(&ll[l]);
+		    umpX1v = FMAMACC(umpX1v,vv,llv);
+		    __m256d rrv = _mm256_load_pd(&rr[l]);
+		    umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else		    
+		    umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+		    umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+		  }
+		
+		umpX1v = hadd3(umpX1v);
+		umpX2v = hadd3(umpX2v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+		_mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+	      } 
+	  }
+
+	for(i = 0; i < n; i++) 
+	  {	    
+	    uX1 = &umpX1[80 * tipX1[i]];
+	    uX2 = &umpX2[80 * tipX2[i]];
+	   
+	    for(j = 0; j < 4; j++) 
+	      {     	
+		__m256d vv[5];  
+
+		v = &x3[i * 80 + j * 20];
+			
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+
+		for(k = 0; k < 20; k++) 
+		  {			 
+		    x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+		    __m256d x1px2v = _mm256_set1_pd(x1px2);		    
+		    
+		    __m256d extEvv = _mm256_load_pd(&extEV[j][20 * k]);
+#ifdef _FMA
+		    vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+		    vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[0],vv[0]);
+		    
+		    extEvv = _mm256_load_pd(&extEV[j][20 * k + 4]);
+#ifdef _FMA
+		    vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+		    vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[4],vv[1]);
+
+		    extEvv = _mm256_load_pd(&extEV[j][20 * k + 8]);
+#ifdef _FMA
+		    vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+		    vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[8],vv[2]);
+
+		    extEvv = _mm256_load_pd(&extEV[j][20 * k + 12]);
+#ifdef _FMA
+		    vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+		    vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[12],vv[3]);
+
+		    extEvv = _mm256_load_pd(&extEV[j][20 * k + 16]);
+#ifdef _FMA
+		    vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+		    vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[16],vv[4]);
+		  } 
+	      } 
+	  } 
+      } 
+      break;
+    case TIP_INNER: 
+      {
+
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+	  ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for(i = 0; i < 23; i++) 
+	  {	   
+	    for(k = 0; k < 80; k++) 
+	      {
+		__m256d umpX1v = _mm256_setzero_pd();
+		
+		 v = &(tipVector[k / 20][20 * i]);
+
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+		   
+		    umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+		    umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+		  }
+		umpX1v = hadd3(umpX1v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+	      } 
+	  }
+	
+	for (i = 0; i < n; i++) 
+	  {	   
+	    uX1 = &umpX1[80 * tipX1[i]];
+	   	    
+	    for(k = 0; k < 4; k++) 
+	      {
+		v = &(x2[80 * i + k * 20]);
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    __m256d ump_x2v = _mm256_setzero_pd();
+		    		  
+		    __m256d vv = _mm256_load_pd(&v[0]);
+		    __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    vv = _mm256_load_pd(&v[4]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[8]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[12]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[16]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    ump_x2v = hadd3(ump_x2v);
+		    _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+		  }
+		
+		v = &(x3[80 * i + 20 * k]);
+	
+
+		__m256d vv[5]; 
+
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    x1px2 = uX1[k * 20 + l]	* ump_x2[l];
+		    __m256d x1px2v = _mm256_set1_pd(x1px2);	
+	    		 
+#ifdef _FMA
+		    __m256d ev = _mm256_load_pd(&extEV[k][l * 20 + 0]);
+		    vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+		    vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 0])));
+#endif
+		    _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[k][l * 20 + 4]);
+		    vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+		    vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 4])));
+#endif
+		    _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[k][l * 20 + 8]);
+		    vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+		    vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 8])));
+#endif
+		    _mm256_store_pd(&v[8],vv[2]);
+		    
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[k][l * 20 + 12]);
+		    vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+		    vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 12])));
+#endif
+		    _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[k][l * 20 + 16]);
+		    vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+		    vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 16])));
+#endif
+		    _mm256_store_pd(&v[16],vv[4]);
+
+		  } 
+	      }
+	   
+	    v = &x3[80 * i];
+	    __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+	    scale = 1;
+	    for(l = 0; scale && (l < 80); l += 4) 
+	      {
+		__m256d vv = _mm256_load_pd(&v[l]);
+		__m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+		vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+		if(_mm256_movemask_pd(vv_abs) != 15)
+		  scale = 0;
+	      }
+	    
+	    if(scale) 
+	      {		
+		__m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+		for(l = 0; l < 80; l += 4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		  }
+		if(useFastScaling)
+		  addScale += wgt[i];				
+		else
+		  ex3[i] += 1;
+	      } 
+	  } 
+      } 
+      break;
+    case INNER_INNER:      
+      for(i = 0; i < n; i++) 
+	{ 
+	  scale = 1;
+	  
+	  for(k = 0; k < 4; k++) 
+	    {
+	      vl = &(x1[80 * i + 20 * k]);
+	      vr = &(x2[80 * i + 20 * k]);
+	      v  = &(x3[80 * i + 20 * k]);	      	   
+
+	      __m256d vv[5]; 
+	      
+	      vv[0] = _mm256_setzero_pd();
+	      vv[1] = _mm256_setzero_pd();
+	      vv[2] = _mm256_setzero_pd();
+	      vv[3] = _mm256_setzero_pd();
+	      vv[4] = _mm256_setzero_pd();
+	      
+	      for(l = 0; l < 20; l++) 
+		{		  
+		  __m256d al = _mm256_setzero_pd();
+		  __m256d ar = _mm256_setzero_pd();
+       		  
+		  __m256d leftv  = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+		  __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+		  __m256d vlv = _mm256_load_pd(&vl[0]);
+		  __m256d vrv = _mm256_load_pd(&vr[0]);
+		  
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));		  
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+		  vlv = _mm256_load_pd(&vl[4]);
+		  vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+		  vlv = _mm256_load_pd(&vl[8]);
+		  vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+		  vlv = _mm256_load_pd(&vl[12]);
+		  vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+		  vlv = _mm256_load_pd(&vl[16]);
+		  vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  /**************************************************************************************************************/
+
+		  al = hadd3(al);
+		  ar = hadd3(ar);
+		  al = _mm256_mul_pd(ar,al);
+		  
+		  /************************************************************************************************************/
+#ifdef _FMA		    
+		  __m256d ev =  _mm256_load_pd(&extEV[k][20 * l + 0]);
+		  vv[0] = FMAMACC(vv[0], al, ev);		 
+#else
+		  vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 0])));			  		 		  
+#endif
+		  _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[k][20 * l + 4]);
+		  vv[1] = FMAMACC(vv[1], al, ev);		 
+#else
+		  vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 4])));		  		 
+#endif
+		  _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[k][20 * l + 8]);
+		  vv[2] = FMAMACC(vv[2], al, ev);		 
+#else
+		  vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 8])));		  		 
+#endif
+		  _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[k][20 * l + 12]);
+		  vv[3] = FMAMACC(vv[3], al, ev);		 
+#else
+		  vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 12])));		  		 
+#endif
+		  _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[k][20 * l + 16]);
+		  vv[4] = FMAMACC(vv[4], al, ev);		 
+#else
+		  vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 16])));			 	  
+#endif
+		  _mm256_store_pd(&v[16],vv[4]);		 
+		} 
+	    }
+	  v = &(x3[80 * i]);
+	  scale = 1;
+	  __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);	 
+
+	  for(l = 0; scale && (l < 80); l += 4) 
+	    {
+	      __m256d vv = _mm256_load_pd(&v[l]);
+	      __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+	      vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+	      if(_mm256_movemask_pd(vv_abs) != 15)
+		scale = 0;	     
+	    }
+
+	  if(scale) 
+	    {		     	      
+	      __m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+	      for(l = 0; l < 80; l += 4) 
+		{
+		  __m256d vv = _mm256_load_pd(&v[l]);
+		  _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		}
+	      if(useFastScaling)
+		addScale += wgt[i];					
+	      else
+		ex3[i] += 1;
+	    } 
+	}
+      break;
+    default:
+      assert(0);
+    }
+ 
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+}
+
+ 
+
+void newviewGTRGAMMAPROT_AVX(int tipCase,
+			     double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+			     unsigned char *tipX1, unsigned char *tipX2, int n, 
+			     double *left, double *right, int *wgt, int *scalerIncrement) 
+{
+  double	
+    *uX1, 
+    *uX2, 
+    *v, 
+    x1px2, 
+    *vl, 
+    *vr;
+  
+  int	
+    i, 
+    j, 
+    l, 
+    k, 
+    scale, 
+    addScale = 0;
+
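+  /* bitmask enables only the lowest 64-bit lane, so each _mm256_maskstore_pd below writes a single double. */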
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+   __m256d
+    bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+  __m256i
+    bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1);
+#endif 
+  
+  switch(tipCase) 
+    {
+    case TIP_TIP: 
+      {
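+	/* umpX1/umpX2 cache, for each of the 23 tip states, the 80 dot products with the rows of left/right. */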
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+	  umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for(i = 0; i < 23; i++) 
+	  {
+	    v = &(tipVector[20 * i]);
+	    
+	    for(k = 0; k < 80; k++) 
+	      {
+		double 
+		  *ll =  &left[k * 20],
+		  *rr =  &right[k * 20];
+		
+		__m256d 
+		  umpX1v = _mm256_setzero_pd(),
+		  umpX2v = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+		    __m256d llv = _mm256_load_pd(&ll[l]);
+		    umpX1v = FMAMACC(umpX1v,vv,llv);
+		    __m256d rrv = _mm256_load_pd(&rr[l]);
+		    umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else		    
+		    umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+		    umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+		  }
+		
+		umpX1v = hadd3(umpX1v);
+		umpX2v = hadd3(umpX2v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+		_mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+	      } 
+	  }
+
+	for(i = 0; i < n; i++) 
+	  {	    
+	    uX1 = &umpX1[80 * tipX1[i]];
+	    uX2 = &umpX2[80 * tipX2[i]];
+	   
+	    for(j = 0; j < 4; j++) 
+	      {     	
+		__m256d vv[5];  
+
+		v = &x3[i * 80 + j * 20];
+			
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+
+		for(k = 0; k < 20; k++) 
+		  {			 
+		    x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+		    __m256d x1px2v = _mm256_set1_pd(x1px2);		    
+		    
+		    __m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+		    vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+		    vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[0],vv[0]);
+		    
+		    extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+		    vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+		    vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[4],vv[1]);
+
+		    extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+		    vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+		    vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[8],vv[2]);
+
+		    extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+		    vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+		    vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[12],vv[3]);
+
+		    extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+		    vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+		    vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		    _mm256_store_pd(&v[16],vv[4]);
+		  } 
+	      } 
+	  } 
+      } 
+      break;
+    case TIP_INNER: 
+      {
+
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+	  ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for(i = 0; i < 23; i++) 
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++) 
+	      {
+		__m256d umpX1v = _mm256_setzero_pd();
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+		   
+		    umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+		    umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+		  }
+		umpX1v = hadd3(umpX1v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+	      } 
+	  }
+	
+	for (i = 0; i < n; i++) 
+	  {	   
+	    uX1 = &umpX1[80 * tipX1[i]];
+	   	    
+	    for(k = 0; k < 4; k++) 
+	      {
+		v = &(x2[80 * i + k * 20]);
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    __m256d ump_x2v = _mm256_setzero_pd();
+		    		  
+		    __m256d vv = _mm256_load_pd(&v[0]);
+		    __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    vv = _mm256_load_pd(&v[4]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[8]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[12]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[16]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    ump_x2v = hadd3(ump_x2v);
+		    _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+		  }
+		
+		v = &(x3[80 * i + 20 * k]);
+	
+
+		__m256d vv[5]; 
+
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    x1px2 = uX1[k * 20 + l]	* ump_x2[l];
+		    __m256d x1px2v = _mm256_set1_pd(x1px2);	
+	    		 
+#ifdef _FMA
+		    __m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+		    vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+		    vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+		    _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+		    vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+		    vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+		    _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+		    vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+		    vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+		    _mm256_store_pd(&v[8],vv[2]);
+		    
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+		    vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+		    vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+		    _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+		    vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+		    vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+		    _mm256_store_pd(&v[16],vv[4]);
+
+		  } 
+	      }
+	   
+	    v = &x3[80 * i];
+	    __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+	    scale = 1;
+	    for(l = 0; scale && (l < 80); l += 4) 
+	      {
+		__m256d vv = _mm256_load_pd(&v[l]);
+		__m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+		vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+		if(_mm256_movemask_pd(vv_abs) != 15)
+		  scale = 0;
+	      }
+	    
+	    if(scale) 
+	      {		
+		__m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+		for(l = 0; l < 80; l += 4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		  }
+	
+		addScale += wgt[i];				
+	
+	      } 
+	  } 
+      } 
+      break;
+    case INNER_INNER:      
+      for(i = 0; i < n; i++) 
+	{ 
+	  scale = 1;
+	  
+	  for(k = 0; k < 4; k++) 
+	    {
+	      vl = &(x1[80 * i + 20 * k]);
+	      vr = &(x2[80 * i + 20 * k]);
+	      v  = &(x3[80 * i + 20 * k]);	      	   
+
+	      __m256d vv[5]; 
+	      
+	      vv[0] = _mm256_setzero_pd();
+	      vv[1] = _mm256_setzero_pd();
+	      vv[2] = _mm256_setzero_pd();
+	      vv[3] = _mm256_setzero_pd();
+	      vv[4] = _mm256_setzero_pd();
+	      
+	      for(l = 0; l < 20; l++) 
+		{		  
+		  __m256d al = _mm256_setzero_pd();
+		  __m256d ar = _mm256_setzero_pd();
+       		  
+		  __m256d leftv  = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+		  __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+		  __m256d vlv = _mm256_load_pd(&vl[0]);
+		  __m256d vrv = _mm256_load_pd(&vr[0]);
+		  
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));		  
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+		  vlv = _mm256_load_pd(&vl[4]);
+		  vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+		  vlv = _mm256_load_pd(&vl[8]);
+		  vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+		  vlv = _mm256_load_pd(&vl[12]);
+		  vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+		  rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+		  vlv = _mm256_load_pd(&vl[16]);
+		  vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA		    
+		  al = FMAMACC(al, vlv, leftv);
+		  ar = FMAMACC(ar, vrv, rightv);
+#else
+		  al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		  ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+		  /**************************************************************************************************************/
+
+		  al = hadd3(al);
+		  ar = hadd3(ar);
+		  al = _mm256_mul_pd(ar,al);
+		  
+		  /************************************************************************************************************/
+#ifdef _FMA		    
+		  __m256d ev =  _mm256_load_pd(&extEV[20 * l + 0]);
+		  vv[0] = FMAMACC(vv[0], al, ev);		 
+#else
+		  vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));			  		 		  
+#endif
+		  _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[20 * l + 4]);
+		  vv[1] = FMAMACC(vv[1], al, ev);		 
+#else
+		  vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));		  		 
+#endif
+		  _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[20 * l + 8]);
+		  vv[2] = FMAMACC(vv[2], al, ev);		 
+#else
+		  vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));		  		 
+#endif
+		  _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[20 * l + 12]);
+		  vv[3] = FMAMACC(vv[3], al, ev);		 
+#else
+		  vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));		  		 
+#endif
+		  _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA		    
+		  ev =  _mm256_load_pd(&extEV[20 * l + 16]);
+		  vv[4] = FMAMACC(vv[4], al, ev);		 
+#else
+		  vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));			 	  
+#endif
+		  _mm256_store_pd(&v[16],vv[4]);		 
+		} 
+	    }
+	  v = &(x3[80 * i]);
+	  scale = 1;
+	  __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);	 
+
+	  for(l = 0; scale && (l < 80); l += 4) 
+	    {
+	      __m256d vv = _mm256_load_pd(&v[l]);
+	      __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+	      vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+	      if(_mm256_movemask_pd(vv_abs) != 15)
+		scale = 0;	     
+	    }
+
+	  if(scale) 
+	    {		     	      
+	      __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+	      for(l = 0; l < 80; l += 4) 
+		{
+		  __m256d vv = _mm256_load_pd(&v[l]);
+		  _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		}
+	     
+	      addScale += wgt[i];						    
+	    } 
+	}
+      break;
+    default:
+      assert(0);
+    }
+ 
+  
+  *scalerIncrement = addScale;
+}
+
+
+/***** functions with memory saving ******************************/
+
+void  newviewGTRGAMMA_AVX_GAPPED_SAVE(int tipCase,
+				      double *x1_start, double *x2_start, double *x3_start,
+				      double *extEV, double *tipVector,
+				      int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				      const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				      unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+				      double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+				      )
+{
+ 
+  int  
+    i, 
+    k, 
+    scale,
+    scaleGap,
+    addScale = 0;
+ 
+  __m256d 
+    minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+    twoto = _mm256_set1_pd(twotothe256);
+ 
+  double
+    *x1,
+    *x2,
+    *x3,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double 
+	  *uX1, 
+	  umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+	  *uX2, 
+	  umpX2[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+	for (i = 1; i < 16; i++)
+	  {
+	    __m256d 
+	      tv = _mm256_load_pd(&(tipVector[i * 4]));
+
+	    int 
+	      j;
+	    
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&left[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+		}
+	  
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&right[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX2[i * 64 + j * 16 + k * 4], left1);
+		}	    
+	  }   	
+	  
+	x3 = x3_gapColumn;
+
+	{
+	  uX1 = &umpX1[960]; /* 960 = 15 * 64: precomputed block for tip code 15 */
+	  uX2 = &umpX2[960]; /* same block in the right-branch table */
+	  
+	  for(k = 0; k < 4; k++)
+	    {
+	      __m256d	   
+		xv = _mm256_setzero_pd();
+	      
+	      int 
+		l;
+	      
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      																	   
+		  __m256d
+		    x1v =  _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+		  
+		  __m256d 
+		    evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+		  xv = FMAMACC(xv,x1v,evv);
+#else						  
+		  xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+		}
+		    
+	      _mm256_store_pd(&x3[4 * k], xv);
+	    }
+	}
+	
+	x3 = x3_start;
+
+	for(i = 0; i < n; i++)
+	  {		    	    	
+	    if(!(x3_gap[i / 32] & mask32[i % 32])) /* bit i clear: site i gets its own 16-double entry in x3 */
+	      {
+		uX1 = &umpX1[64 * tipX1[i]];
+		uX2 = &umpX2[64 * tipX2[i]];		  
+	    
+		for(k = 0; k < 4; k++)
+		  {
+		    __m256d	   
+		      xv = _mm256_setzero_pd();
+	       
+		    int 
+		      l;
+		
+		    for(l = 0; l < 4; l++)
+		      {	       	     				      	      																	   
+			__m256d
+			  x1v =  _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+			
+			__m256d 
+			  evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+			xv = FMAMACC(xv,x1v,evv);
+#else						  
+			xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+		      }
+		    
+		    _mm256_store_pd(&x3[4 * k], xv);
+		  }
+
+		x3 += 16;
+	      }
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	double 
+	  *uX1, 
+	  umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+       
+	for (i = 1; i < 16; i++)
+	  {
+	    __m256d 
+	      tv = _mm256_load_pd(&(tipVector[i*4]));
+
+	    int 
+	      j;
+	    
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m256d 
+		    left1 = _mm256_load_pd(&left[j * 16 + k * 4]);		  		  		  
+
+		  left1 = _mm256_mul_pd(left1, tv);		  
+		  left1 = hadd3(left1);
+		  		  		  
+		  _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+		}	 	   
+	  }	
+
+	{ 
+	  __m256d
+	    xv[4];
+	  
+	  scaleGap = 1;
+	  uX1 = &umpX1[960];
+
+	  x2 = x2_gapColumn;			 
+	  x3 = x3_gapColumn;
+
+	  for(k = 0; k < 4; k++)
+	    {
+	      __m256d	   		 
+		xvr = _mm256_load_pd(&(x2[k * 4]));
+
+	      int 
+		l;
+
+	      xv[k]  = _mm256_setzero_pd();
+		  
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      															
+		  __m256d  
+		    x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),		     
+		    x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+			
+		  x2v = hadd3(x2v);
+		  x1v = _mm256_mul_pd(x1v, x2v);			
+		
+		  __m256d 
+		    evv = _mm256_load_pd(&extEV[l * 4]);
+			
+#ifdef _FMA
+		  xv[k] = FMAMACC(xv[k],x1v,evv);
+#else			  
+		  xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+		}
+		    
+	      if(scaleGap)
+		{
+		  __m256d 	     
+		    v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+		  
+		  v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		    
+		  if(_mm256_movemask_pd( v1 ) != 15)
+		    scaleGap = 0;
+		}
+	    }
+	
+	  if(scaleGap)
+	    {
+	      xv[0] = _mm256_mul_pd(xv[0], twoto);
+	      xv[1] = _mm256_mul_pd(xv[1], twoto);
+	      xv[2] = _mm256_mul_pd(xv[2], twoto);
+	      xv[3] = _mm256_mul_pd(xv[3], twoto);	    
+	    }
+
+	  _mm256_store_pd(&x3[0],  xv[0]);
+	  _mm256_store_pd(&x3[4],  xv[1]);
+	  _mm256_store_pd(&x3[8],  xv[2]);
+	  _mm256_store_pd(&x3[12], xv[3]);
+	}
+	
+	x3 = x3_start;
+	
+	for(i = 0; i < n; i++)
+	  {
+	    if((x3_gap[i / 32] & mask32[i % 32]))
+	      {
+		if(scaleGap)
+		  {
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i]  += 1;
+		  }
+	      }
+	    else
+	      {
+		if(x2_gap[i / 32] & mask32[i % 32])
+		  x2 = x2_gapColumn;
+		else
+		  {
+		    x2 = x2_ptr;
+		    x2_ptr += 16;
+		  }
+		
+		__m256d
+		  xv[4];	    	   
+		
+		scale = 1;
+		uX1 = &umpX1[64 * tipX1[i]];
+		
+		for(k = 0; k < 4; k++)
+		  {
+		    __m256d	   		 
+		      xvr = _mm256_load_pd(&(x2[k * 4]));
+		    
+		    int 
+		      l;
+		    
+		    xv[k]  = _mm256_setzero_pd();
+		    
+		    for(l = 0; l < 4; l++)
+		      {	       	     				      	      															
+			__m256d  
+			  x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),		     
+			  x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+			
+			x2v = hadd3(x2v);
+			x1v = _mm256_mul_pd(x1v, x2v);			
+			
+			__m256d 
+			  evv = _mm256_load_pd(&extEV[l * 4]);
+			
+#ifdef _FMA
+			xv[k] = FMAMACC(xv[k],x1v,evv);
+#else			  
+			xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+		      }
+		    
+		    if(scale)
+		      {
+			__m256d 	     
+			  v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+			
+			v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+			
+			if(_mm256_movemask_pd( v1 ) != 15)
+			  scale = 0;
+		      }
+		  }	    
+	      
+		if(scale)
+		  {
+		    xv[0] = _mm256_mul_pd(xv[0], twoto);
+		    xv[1] = _mm256_mul_pd(xv[1], twoto);
+		    xv[2] = _mm256_mul_pd(xv[2], twoto);
+		    xv[3] = _mm256_mul_pd(xv[3], twoto);
+
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i] += 1;		   
+		  }
+	      
+		_mm256_store_pd(&x3[0],  xv[0]);
+		_mm256_store_pd(&x3[4],  xv[1]);
+		_mm256_store_pd(&x3[8],  xv[2]);
+		_mm256_store_pd(&x3[12], xv[3]);
+	      
+		x3 += 16;
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      {          
+	{		
+	  x1 = x1_gapColumn;	     	    
+	  x2 = x2_gapColumn;	    
+	  x3 = x3_gapColumn;
+
+	  __m256d
+	    xv[4];
+	    
+	  scaleGap = 1;
+
+	  for(k = 0; k < 4; k++)
+	    {
+	      __m256d	   
+		
+		xvl = _mm256_load_pd(&(x1[k * 4])),
+		xvr = _mm256_load_pd(&(x2[k * 4]));
+
+	      int 
+		l;
+
+	      xv[k] = _mm256_setzero_pd();
+
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      															
+		  __m256d 
+		    x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+		    x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+		  
+		  x1v = hadd4(x1v, x2v);			
+		  
+		  __m256d 
+		    evv = _mm256_load_pd(&extEV[l * 4]);
+		  
+		  xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+		}
+		
+	      if(scaleGap)
+		  {
+		    __m256d 	     
+		      v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+		    v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		    
+		    if(_mm256_movemask_pd( v1 ) != 15)
+		      scaleGap = 0;
+		  }
+	    }
+
+	  if(scaleGap)
+	    {
+	      xv[0] = _mm256_mul_pd(xv[0], twoto);
+	      xv[1] = _mm256_mul_pd(xv[1], twoto);
+	      xv[2] = _mm256_mul_pd(xv[2], twoto);
+	      xv[3] = _mm256_mul_pd(xv[3], twoto);	       
+	    }
+		
+	  _mm256_store_pd(&x3[0],  xv[0]);
+	  _mm256_store_pd(&x3[4],  xv[1]);
+	  _mm256_store_pd(&x3[8],  xv[2]);
+	  _mm256_store_pd(&x3[12], xv[3]);
+	}	  
+      
+	x3 = x3_start;
+
+	for(i = 0; i < n; i++)
+	  {
+	    if(x3_gap[i / 32] & mask32[i % 32])
+	      {	     
+		if(scaleGap)
+		  {
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i]  += 1; 	       
+		  }
+	      }
+	    else
+	      {	
+		if(x1_gap[i / 32] & mask32[i % 32])
+		  x1 = x1_gapColumn;
+		else
+		  {
+		    x1 = x1_ptr;
+		    x1_ptr += 16;
+		  }
+	     
+		if(x2_gap[i / 32] & mask32[i % 32])
+		  x2 = x2_gapColumn;
+		else
+		  {
+		    x2 = x2_ptr;
+		    x2_ptr += 16;
+		  }
+
+		__m256d
+		  xv[4];
+	    
+		scale = 1;
+
+		for(k = 0; k < 4; k++)
+		  {
+		    __m256d	   
+		      
+		      xvl = _mm256_load_pd(&(x1[k * 4])),
+		      xvr = _mm256_load_pd(&(x2[k * 4]));
+		    
+		    int 
+		      l;
+		    
+		    xv[k] = _mm256_setzero_pd();
+		    
+		    for(l = 0; l < 4; l++)
+		      {	       	     				      	      															
+			__m256d 
+			  x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+			  x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));			    
+			
+			x1v = hadd4(x1v, x2v);			
+			
+			__m256d 
+			  evv = _mm256_load_pd(&extEV[l * 4]);
+			
+			xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+		      }
+		    
+		    if(scale)
+		      {
+			__m256d 	     
+			  v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+			
+			v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+			
+			if(_mm256_movemask_pd( v1 ) != 15)
+			  scale = 0;
+		      }
+		  }
+
+		if(scale)
+		  {
+		    xv[0] = _mm256_mul_pd(xv[0], twoto);
+		    xv[1] = _mm256_mul_pd(xv[1], twoto);
+		    xv[2] = _mm256_mul_pd(xv[2], twoto);
+		    xv[3] = _mm256_mul_pd(xv[3], twoto);
+		    
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i] += 1;
+		  }
+		
+		_mm256_store_pd(&x3[0],  xv[0]);
+		_mm256_store_pd(&x3[4],  xv[1]);
+		_mm256_store_pd(&x3[8],  xv[2]);
+		_mm256_store_pd(&x3[12], xv[3]);
+	      
+		x3 += 16;
+	      }
+	  }
+      }
+      break;
+    default:
+      assert(0);
+    }
+
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+  
+}
+
+
+void newviewGTRCAT_AVX_GAPPED_SAVE(int tipCase,  double *EV,  int *cptr,
+				   double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+				   int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				   int n,  double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				   unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				   double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+  double
+    *le,
+    *ri,
+    *x1,
+    *x2, 
+    *x3,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start, 
+    *x3_ptr = x3_start;
+  
+  int 
+    i, 
+    scaleGap = 0,
+    addScale = 0;
+   
+  __m256d 
+    minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+    twoto = _mm256_set1_pd(twotothe256);
+  
+
+  {
+    int 
+      l;
+
+    x1 = x1_gapColumn;	      
+    x2 = x2_gapColumn;
+    x3 = x3_gapColumn;    	 
+	  	  
+    le =  &left[maxCats * 16];
+    ri =  &right[maxCats * 16];
+
+    __m256d	   
+      vv = _mm256_setzero_pd();
+	  
+    for(l = 0; l < 4; l++)
+      {	       	     				      	      															
+	__m256d 
+	  x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+	  x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+	
+	x1v = hadd4(x1v, x2v);			
+	
+	__m256d 
+	  evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+	vv = FMAMACC(vv,x1v,evv);
+#else						
+	vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));						      	
+#endif
+      }	  		  
+
+    if(tipCase != TIP_TIP)
+      {
+	__m256d 	     
+	  v1 = _mm256_and_pd(vv, absMask_AVX.m);
+    
+	v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+    
+	if(_mm256_movemask_pd( v1 ) == 15) /* all four lanes are below minlikelihood */
+	  {
+	    vv = _mm256_mul_pd(vv, twoto);	      	 
+	    scaleGap = 1;
+	  }
+      }
+    
+    _mm256_store_pd(x3, vv);    
+  }
+
+  switch(tipCase)
+    {
+    case TIP_TIP:      
+      for (i = 0; i < n; i++)
+	{ 
+	  if(noGap(x3_gap, i))
+	    {	 
+	      int 
+		l;
+	      
+	      x1 = &(tipVector[4 * tipX1[i]]);
+	      x2 = &(tipVector[4 * tipX2[i]]);
+
+	      x3 = x3_ptr;
+
+	      if(isGap(x1_gap, i))
+		le =  &left[maxCats * 16];
+	      else	  	  
+		le =  &left[cptr[i] * 16];	  
+	      
+	      if(isGap(x2_gap, i))
+		ri =  &right[maxCats * 16];
+	      else	 	  
+		ri =  &right[cptr[i] * 16];
+	  	  
+	      __m256d	   
+		vv = _mm256_setzero_pd();
+	      
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      															
+		  __m256d 
+		    x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		    x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+		  
+		  x1v = hadd4(x1v, x2v);			
+		  
+		  __m256d 
+		    evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+		  vv = FMAMACC(vv,x1v,evv);
+#else				
+		  vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));						      	
+#endif
+		}	  		  
+
+	      _mm256_store_pd(x3, vv);	 
+	      
+	      x3_ptr += 4;
+	    }
+	}
+      break;
+    case TIP_INNER:      
+      for (i = 0; i < n; i++)
+	{ 
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)
+		{
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i] += 1;		   		    
+		}	       
+	    }
+	  else
+	    {
+	      int 
+		l;
+
+	      x1 = &(tipVector[4 * tipX1[i]]);    
+	      x3 = x3_ptr;
+
+	      if(isGap(x1_gap, i))
+		le =  &left[maxCats * 16];
+	      else
+		le =  &left[cptr[i] * 16];
+	  
+	      if(isGap(x2_gap, i))
+		{		 
+		  ri =  &right[maxCats * 16];
+		  x2 = x2_gapColumn;
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 16];
+		  x2 = x2_ptr;
+		  x2_ptr += 4;
+		}	  	 
+
+	      __m256d	   
+		vv = _mm256_setzero_pd();
+	      
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      															
+		  __m256d 
+		    x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		    x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+		  
+		  x1v = hadd4(x1v, x2v);			
+		  
+		  __m256d 
+		    evv = _mm256_load_pd(&EV[l * 4]);
+		  
+#ifdef _FMA
+		  vv = FMAMACC(vv,x1v,evv);
+#else	      
+		  vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+		}	  		  
+	  
+	  
+	      __m256d 	     
+		v1 = _mm256_and_pd(vv, absMask_AVX.m);
+	      
+	      v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	      
+	      if(_mm256_movemask_pd( v1 ) == 15)
+		{	     	      
+		  vv = _mm256_mul_pd(vv, twoto);	      
+		  
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i] += 1;		 
+		}       
+	  
+	      _mm256_store_pd(x3, vv);	 	  	  
+
+	      x3_ptr += 4;
+	    }
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)		   		    
+		{
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i] += 1;
+		}	      
+	    }
+	  else
+	    {
+	      int 
+		l;
+	      
+	      x3 = x3_ptr;
+	      
+	      if(isGap(x1_gap, i))
+		{
+		  x1 = x1_gapColumn;
+		  le =  &left[maxCats * 16];
+		}
+	      else
+		{
+		  le =  &left[cptr[i] * 16];
+		  x1 = x1_ptr;
+		  x1_ptr += 4;
+		}
+
+	      if(isGap(x2_gap, i))	
+		{
+		  x2 = x2_gapColumn;
+		  ri =  &right[maxCats * 16];	    
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 16];
+		  x2 = x2_ptr;
+		  x2_ptr += 4;
+		}	 	  	  	  
+	  
+	      __m256d	   
+		vv = _mm256_setzero_pd();
+	      
+	      for(l = 0; l < 4; l++)
+		{	       	     				      	      															
+		  __m256d 
+		    x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+		    x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));			    
+		  
+		  x1v = hadd4(x1v, x2v);			
+		  
+		  __m256d 
+		    evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+		  vv = FMAMACC(vv,x1v,evv);
+#else						
+		  vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));						      	
+#endif
+		}	  		  
+	      
+	      
+	      __m256d 	     
+		v1 = _mm256_and_pd(vv, absMask_AVX.m);
+	      
+	      v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	      
+	      if(_mm256_movemask_pd( v1 ) == 15)
+		{	
+		  vv = _mm256_mul_pd(vv, twoto);	      
+		  
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i] += 1;		
+		}	
+	      
+	      _mm256_store_pd(x3, vv);
+	      
+	      x3_ptr += 4;
+	    }	  	  
+	}
+      break;
+    default:
+      assert(0);
+    }
+
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+}
+
+void newviewGTRCATPROT_AVX_GAPPED_SAVE(int tipCase, double *extEV,
+				       int *cptr,
+				       double *x1, double *x2, double *x3, double *tipVector,
+				       int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				       int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				       unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				       double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+  double
+    *le, 
+    *ri, 
+    *v, 
+    *vl, 
+    *vr,
+    *x1_ptr = x1,
+    *x2_ptr = x2, 
+    *x3_ptr = x3;
+  
+  int 
+    i, 
+    l, 
+    scale, 
+    addScale = 0,
+    scaleGap = 0;
+
+#ifdef _FMA
+  int k;
+#endif
+
+  {
+    le = &left[maxCats * 400];
+    ri = &right[maxCats * 400];
+    
+    vl = x1_gapColumn;
+    vr = x2_gapColumn;
+    v  = x3_gapColumn;
+
+    __m256d vv[5];
+    
+    vv[0] = _mm256_setzero_pd();
+    vv[1] = _mm256_setzero_pd();
+    vv[2] = _mm256_setzero_pd();
+    vv[3] = _mm256_setzero_pd();
+    vv[4] = _mm256_setzero_pd();
+    
+    for(l = 0; l < 20; l++)
+      {	       
+	__m256d 
+	  x1v = _mm256_setzero_pd(),
+	  x2v = _mm256_setzero_pd();	
+	
+	double 
+	  *ev = &extEV[l * 20],
+	  *lv = &le[l * 20],
+	  *rv = &ri[l * 20];														
+	
+	x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+	x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+	x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+	x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+	x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+	
+	x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+	x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+	x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+	x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+	x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+	
+	x1v = hadd4(x1v, x2v);			
+#ifdef _FMA
+	for(k = 0; k < 5; k++) 
+	  {
+	    __m256d evv = _mm256_load_pd(&ev[k*4]);
+	    vv[k] = FMAMACC(vv[k],x1v,evv);
+	  }
+#else	      
+	__m256d 
+	  evv[5];
+	
+	evv[0] = _mm256_load_pd(&ev[0]);
+	evv[1] = _mm256_load_pd(&ev[4]);
+	evv[2] = _mm256_load_pd(&ev[8]);
+	evv[3] = _mm256_load_pd(&ev[12]);
+	evv[4] = _mm256_load_pd(&ev[16]);		
+	
+	vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+	vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+	vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+	vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+	vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      	
+#endif
+      }	  
+
+
+     if(tipCase != TIP_TIP)
+       {
+	 __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+	  
+	 scale = 1;
+	  
+	 for(l = 0; scale && (l < 20); l += 4)
+	   {	       
+	     __m256d 
+	       v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+	     v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+	     
+	     if(_mm256_movemask_pd( v1 ) != 15)
+	       scale = 0;
+	   }	    	  	  
+
+	 if(scale)
+	   {
+	      __m256d 
+		twoto = _mm256_set1_pd(twotothe256);
+	      
+	      for(l = 0; l < 20; l += 4)
+		vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);		    		 	      	     	      
+	   
+	      scaleGap = 1;
+	   }
+       }
+
+     _mm256_store_pd(&v[0], vv[0]);
+     _mm256_store_pd(&v[4], vv[1]);
+     _mm256_store_pd(&v[8], vv[2]);
+     _mm256_store_pd(&v[12], vv[3]);
+     _mm256_store_pd(&v[16], vv[4]);     
+  }
+
+
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	for (i = 0; i < n; i++)
+	  {
+	    if(noGap(x3_gap, i))	   
+	      {	    
+		vl = &(tipVector[20 * tipX1[i]]);
+		vr = &(tipVector[20 * tipX2[i]]);
+		v  = x3_ptr;	    	    	   	    
+
+		if(isGap(x1_gap, i))
+		  le =  &left[maxCats * 400];
+		else	  	  
+		  le =  &left[cptr[i] * 400];	  
+		
+		if(isGap(x2_gap, i))
+		  ri =  &right[maxCats * 400];
+		else	 	  
+		  ri =  &right[cptr[i] * 400];
+
+		__m256d vv[5];
+		
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();	   	    
+		
+		for(l = 0; l < 20; l++)
+		  {	       
+		    __m256d 
+		      x1v = _mm256_setzero_pd(),
+		      x2v = _mm256_setzero_pd();	
+		    
+		    double 
+		      *ev = &extEV[l * 20],
+		      *lv = &le[l * 20],
+		      *rv = &ri[l * 20];														
+		    
+#ifdef _FMA		
+		    for(k = 0; k < 20; k += 4) 
+		      {
+			__m256d vlv = _mm256_load_pd(&vl[k]);
+			__m256d lvv = _mm256_load_pd(&lv[k]);
+			x1v = FMAMACC(x1v,vlv,lvv);
+			__m256d vrv = _mm256_load_pd(&vr[k]);
+			__m256d rvv = _mm256_load_pd(&rv[k]);
+			x2v = FMAMACC(x2v,vrv,rvv);
+		      }
+#else		
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+		    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));	
+#endif
+		    
+		    x1v = hadd4(x1v, x2v);			
+#ifdef _FMA
+		    for(k = 0; k < 5; k++) 
+		      {
+			__m256d evv = _mm256_load_pd(&ev[k*4]);
+			vv[k] = FMAMACC(vv[k],x1v,evv);
+		      }	  
+#else		
+		    __m256d 
+		      evv[5];
+		    
+		    evv[0] = _mm256_load_pd(&ev[0]);
+		    evv[1] = _mm256_load_pd(&ev[4]);
+		    evv[2] = _mm256_load_pd(&ev[8]);
+		    evv[3] = _mm256_load_pd(&ev[12]);
+		    evv[4] = _mm256_load_pd(&ev[16]);		
+		    
+		    vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+		    vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+		    vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+		    vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+		    vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      		      	  
+#endif
+		  }
+		
+		_mm256_store_pd(&v[0], vv[0]);
+		_mm256_store_pd(&v[4], vv[1]);
+		_mm256_store_pd(&v[8], vv[2]);
+		_mm256_store_pd(&v[12], vv[3]);
+		_mm256_store_pd(&v[16], vv[4]);
+
+		x3_ptr += 20;
+	      }
+	  }
+      }
+      break;
+    case TIP_INNER:      	
+      for (i = 0; i < n; i++)
+	{
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)
+		{
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i] += 1;		   		    
+		}	     
+	    }
+	  else
+	    {
+	      vl = &(tipVector[20 * tipX1[i]]);
+
+	      vr = x2_ptr;
+	      v = x3_ptr;
+	      
+	      if(isGap(x1_gap, i))
+		le =  &left[maxCats * 400];
+	      else
+		le =  &left[cptr[i] * 400];
+	      
+	      if(isGap(x2_gap, i))
+		{		 
+		  ri =  &right[maxCats * 400];
+		  vr = x2_gapColumn;
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 400];
+		  vr = x2_ptr;
+		  x2_ptr += 20;
+		}	  	  
+	  
+	      __m256d vv[5];
+	      
+	      vv[0] = _mm256_setzero_pd();
+	      vv[1] = _mm256_setzero_pd();
+	      vv[2] = _mm256_setzero_pd();
+	      vv[3] = _mm256_setzero_pd();
+	      vv[4] = _mm256_setzero_pd();
+	      	      	      
+	      for(l = 0; l < 20; l++)
+		{	       
+		  __m256d 
+		    x1v = _mm256_setzero_pd(),
+		    x2v = _mm256_setzero_pd();	
+		  
+		  double 
+		    *ev = &extEV[l * 20],
+		    *lv = &le[l * 20],
+		    *rv = &ri[l * 20];														
+#ifdef _FMA
+		  for(k = 0; k < 20; k += 4) 
+		    {
+		      __m256d vlv = _mm256_load_pd(&vl[k]);
+		      __m256d lvv = _mm256_load_pd(&lv[k]);
+		      x1v = FMAMACC(x1v,vlv,lvv);
+		      __m256d vrv = _mm256_load_pd(&vr[k]);
+		      __m256d rvv = _mm256_load_pd(&rv[k]);
+		      x2v = FMAMACC(x2v,vrv,rvv);
+		    }
+#else	      
+		  x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+		  x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+		  x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+		  x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+		  x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+		  
+		  x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+		  x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+		  x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+		  x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+		  x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+		  
+		  x1v = hadd4(x1v, x2v);			
+		  
+		  __m256d 
+		    evv[5];
+		  
+		  evv[0] = _mm256_load_pd(&ev[0]);
+		  evv[1] = _mm256_load_pd(&ev[4]);
+		  evv[2] = _mm256_load_pd(&ev[8]);
+		  evv[3] = _mm256_load_pd(&ev[12]);
+		  evv[4] = _mm256_load_pd(&ev[16]);		
+		  
+#ifdef _FMA
+		  for(k = 0; k < 5; k++)
+		    vv[k] = FMAMACC(vv[k],x1v,evv[k]);		 
+#else	      
+		  vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+		  vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+		  vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+		  vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+		  vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      	
+#endif
+		}	  
+
+	   	     
+	      __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+	  
+	      scale = 1;
+	      
+	      for(l = 0; scale && (l < 20); l += 4)
+		{	       
+		  __m256d 
+		    v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+		  v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		  
+		  if(_mm256_movemask_pd( v1 ) != 15)
+		    scale = 0;
+		}	    	  	  
+	 
+	      if(scale)
+		{
+		  __m256d 
+		    twoto = _mm256_set1_pd(twotothe256);
+		  
+		  for(l = 0; l < 20; l += 4)
+		    vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);		    		 
+		  
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i]  += 1;	      
+		}
+
+	      _mm256_store_pd(&v[0], vv[0]);
+	      _mm256_store_pd(&v[4], vv[1]);
+	      _mm256_store_pd(&v[8], vv[2]);
+	      _mm256_store_pd(&v[12], vv[3]);
+	      _mm256_store_pd(&v[16], vv[4]);	       
+	      
+	      x3_ptr += 20;
+	    }
+	}    
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	   if(isGap(x3_gap, i))
+	     {
+	       if(scaleGap)		   		    
+		 {
+		   if(useFastScaling)
+		     addScale += wgt[i];
+		   else
+		     ex3[i] += 1;
+		 }		 	       
+	     }
+	   else
+	     {
+
+	        v = x3_ptr;
+
+		if(isGap(x1_gap, i))
+		  {
+		    vl = x1_gapColumn;
+		    le =  &left[maxCats * 400];
+		  }
+		else
+		  {
+		    le =  &left[cptr[i] * 400];
+		    vl = x1_ptr;
+		    x1_ptr += 20;
+		  }
+		
+		if(isGap(x2_gap, i))	
+		  {
+		    vr = x2_gapColumn;
+		    ri =  &right[maxCats * 400];	    
+		  }
+		else
+		  {
+		    ri =  &right[cptr[i] * 400];
+		    vr = x2_ptr;
+		    x2_ptr += 20;
+		  }	 	  	 
+		
+		__m256d vv[5];
+		
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l++)
+		  {	       
+		    __m256d 
+		      x1v = _mm256_setzero_pd(),
+		      x2v = _mm256_setzero_pd();	
+		    
+		    double 
+		      *ev = &extEV[l * 20],
+		      *lv = &le[l * 20],
+		      *rv = &ri[l * 20];														
+		    
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+		    x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+		    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));			    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));				    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));			    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));				    
+		    x2v = _mm256_add_pd(x2v,  _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+		    
+		    x1v = hadd4(x1v, x2v);			
+#ifdef _FMA
+		    for(k = 0; k < 5; k++) 
+		      {
+			__m256d evv = _mm256_load_pd(&ev[k*4]);
+			vv[k] = FMAMACC(vv[k],x1v,evv);
+		      }
+#else	      
+		    __m256d 
+		      evv[5];
+		    
+		    evv[0] = _mm256_load_pd(&ev[0]);
+		    evv[1] = _mm256_load_pd(&ev[4]);
+		    evv[2] = _mm256_load_pd(&ev[8]);
+		    evv[3] = _mm256_load_pd(&ev[12]);
+		    evv[4] = _mm256_load_pd(&ev[16]);		
+		    
+		    vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+		    vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+		    vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+		    vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+		    vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));				      	
+#endif
+		  }	  
+
+	   	     
+		__m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+		
+		scale = 1;
+		
+		for(l = 0; scale && (l < 20); l += 4)
+		  {	       
+		    __m256d 
+		      v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+		    v1 = _mm256_cmp_pd(v1,  minlikelihood_avx, _CMP_LT_OS);
+		    
+		    if(_mm256_movemask_pd( v1 ) != 15)
+		      scale = 0;
+		  }	    	  	  
+		
+		if(scale)
+		  {
+		    __m256d 
+		      twoto = _mm256_set1_pd(twotothe256);
+		    
+		    for(l = 0; l < 20; l += 4)
+		      vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);		    		 
+		    
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i]  += 1;	      
+		  }
+
+		_mm256_store_pd(&v[0], vv[0]);
+		_mm256_store_pd(&v[4], vv[1]);
+		_mm256_store_pd(&v[8], vv[2]);
+		_mm256_store_pd(&v[12], vv[3]);
+		_mm256_store_pd(&v[16], vv[4]);
+
+		 x3_ptr += 20;
+	     }
+	}   
+      break;
+    default:
+      assert(0);
+    }
+  
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+}
+
+void newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(int tipCase,
+					 double *x1_start, double *x2_start, double *x3_start, double *extEV, double *tipVector,
+					 int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n, 
+					 double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+					 unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+					 double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn) 
+{
+  double	
+    *x1 = x1_start,
+    *x2 = x2_start,
+    *x3_ptr = x3_start,
+    *x2_ptr = x2_start,
+    *x1_ptr = x1_start,
+    *uX1, 
+    *uX2, 
+    *v, 
+    x1px2, 
+    *vl, 
+    *vr;
+  
+  int	
+    i, 
+    j, 
+    l, 
+    k, 
+    gapScaling = 0,
+    scale, 
+    addScale = 0;
+
+ 
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+   __m256d
+    bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+  __m256i
+    bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1);
+#endif 
+  
+  switch(tipCase) 
+    {
+    case TIP_TIP: 
+      {       
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+	  umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+
+	for(i = 0; i < 23; i++) 
+	  {
+	    v = &(tipVector[20 * i]);
+	    
+	    for(k = 0; k < 80; k++) 
+	      {
+		double 
+		  *ll =  &left[k * 20],
+		  *rr =  &right[k * 20];
+		
+		__m256d 
+		  umpX1v = _mm256_setzero_pd(),
+		  umpX2v = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+		    __m256d llv = _mm256_load_pd(&ll[l]);
+		    umpX1v = FMAMACC(umpX1v,vv,llv);
+		    __m256d rrv = _mm256_load_pd(&rr[l]);
+		    umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else		    
+		    umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+		    umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+		  }
+		
+		umpX1v = hadd3(umpX1v);
+		umpX2v = hadd3(umpX2v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+		_mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+	      } 
+	  }
+
+	
+	{	    
+	  uX1 = &umpX1[1760];
+	  uX2 = &umpX2[1760];
+	  
+	  for(j = 0; j < 4; j++) 
+	    {     	
+	      __m256d vv[5];  
+	      
+	      v = &x3_gapColumn[j * 20];
+	      
+	      vv[0] = _mm256_setzero_pd();
+	      vv[1] = _mm256_setzero_pd();
+	      vv[2] = _mm256_setzero_pd();
+	      vv[3] = _mm256_setzero_pd();
+	      vv[4] = _mm256_setzero_pd();
+	      
+	      for(k = 0; k < 20; k++) 
+		{			 
+		  x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+		  
+		  __m256d x1px2v = _mm256_set1_pd(x1px2);		    
+		  
+		  __m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+		  vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+		  vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		  _mm256_store_pd(&v[0],vv[0]);
+		  
+		  extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+		  vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+		  vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		  _mm256_store_pd(&v[4],vv[1]);
+		  
+		  extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+		  vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+		  vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		  _mm256_store_pd(&v[8],vv[2]);
+		  
+		  extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+		  vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+		  vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		  _mm256_store_pd(&v[12],vv[3]);
+		  
+		  extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+		  vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+		  vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+		  _mm256_store_pd(&v[16],vv[4]);
+		} 
+	    } 
+	}
+
+	
+	for(i = 0; i < n; i++) 
+	  {
+	    if(!(x3_gap[i / 32] & mask32[i % 32]))
+	      {	    
+		uX1 = &umpX1[80 * tipX1[i]];
+		uX2 = &umpX2[80 * tipX2[i]];
+	   
+		for(j = 0; j < 4; j++) 
+		  {     	
+		    __m256d vv[5];  
+		    
+		    v = &x3_ptr[j * 20];
+			
+		    vv[0] = _mm256_setzero_pd();
+		    vv[1] = _mm256_setzero_pd();
+		    vv[2] = _mm256_setzero_pd();
+		    vv[3] = _mm256_setzero_pd();
+		    vv[4] = _mm256_setzero_pd();
+
+		    for(k = 0; k < 20; k++) 
+		      {			 
+			x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+			
+			__m256d x1px2v = _mm256_set1_pd(x1px2);		    
+			
+			__m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+			vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+			vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+			_mm256_store_pd(&v[0],vv[0]);
+			
+			extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+			vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+			vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+			_mm256_store_pd(&v[4],vv[1]);
+			
+			extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+			vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+			vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+			_mm256_store_pd(&v[8],vv[2]);
+			
+			extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+			vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+			vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+			_mm256_store_pd(&v[12],vv[3]);
+			
+			extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+			vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+			vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+			_mm256_store_pd(&v[16],vv[4]);
+		      } 
+		  }
+		x3_ptr += 80;		  
+	      }
+	  }
+      }
+      break;
+    case TIP_INNER: 
+      {
+	double 
+	  umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+	  ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+
+	for(i = 0; i < 23; i++) 
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++) 
+	      {
+		__m256d umpX1v = _mm256_setzero_pd();
+		for(l = 0; l < 20; l+=4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+		   
+		    umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+		    umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+		  }
+		umpX1v = hadd3(umpX1v);
+		_mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+	      } 
+	  }
+
+	{	   
+	  uX1 = &umpX1[1760];
+	   	    
+	  for(k = 0; k < 4; k++) 
+	    {
+	      v = &(x2_gapColumn[k * 20]);
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    __m256d ump_x2v = _mm256_setzero_pd();
+		    		  
+		    __m256d vv = _mm256_load_pd(&v[0]);
+		    __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    vv = _mm256_load_pd(&v[4]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[8]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[12]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+		    vv = _mm256_load_pd(&v[16]);
+		    rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+		    ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+		    ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+		    
+		    ump_x2v = hadd3(ump_x2v);
+		    _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+		  }
+		
+		v = &x3_gapColumn[20 * k];
+	
+		__m256d vv[5]; 
+
+		vv[0] = _mm256_setzero_pd();
+		vv[1] = _mm256_setzero_pd();
+		vv[2] = _mm256_setzero_pd();
+		vv[3] = _mm256_setzero_pd();
+		vv[4] = _mm256_setzero_pd();
+		
+		for(l = 0; l < 20; l++) 
+		  {
+		    x1px2 = uX1[k * 20 + l]	* ump_x2[l];
+		    __m256d x1px2v = _mm256_set1_pd(x1px2);	
+	    		 
+#ifdef _FMA
+		    __m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+		    vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+		    vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+		    _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+		    vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+		    vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+		    _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+		    vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+		    vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+		    _mm256_store_pd(&v[8],vv[2]);
+		    
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+		    vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+		    vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+		    _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+		    ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+		    vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+		    vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+		    _mm256_store_pd(&v[16],vv[4]);
+
+		  } 
+	      }
+	   
+	    v = x3_gapColumn;
+	    __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+	    scale = 1;
+	    for(l = 0; scale && (l < 80); l += 4) 
+	      {
+		__m256d vv = _mm256_load_pd(&v[l]);
+		__m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+		vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+		if(_mm256_movemask_pd(vv_abs) != 15)
+		  scale = 0;
+	      }
+	    
+	    if(scale) 
+	      {		
+		__m256d twotothe256v = _mm256_set1_pd(twotothe256);
+		gapScaling = 1;
+
+		for(l = 0; l < 80; l += 4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		  }	
+	      } 
+	}       
+	
+	for (i = 0; i < n; i++) 
+	  {	   
+	    if((x3_gap[i / 32] & mask32[i % 32]))
+	      {	       
+		if(gapScaling)
+		  {
+		    if(useFastScaling)
+		      addScale += wgt[i];
+		    else
+		      ex3[i]  += 1;
+		  }
+	      }
+	    else
+	      {		
+		uX1 = &umpX1[80 * tipX1[i]];
+		
+		if(x2_gap[i / 32] & mask32[i % 32])
+		  x2 = x2_gapColumn;
+		else
+		  {
+		    x2 = x2_ptr;
+		    x2_ptr += 80;
+		  }	      
+	    
+		for(k = 0; k < 4; k++) 
+		  {
+		    v = &(x2[k * 20]);
+		    
+		    for(l = 0; l < 20; l++) 
+		      {
+			__m256d ump_x2v = _mm256_setzero_pd();
+		    	
+			__m256d vv = _mm256_load_pd(&v[0]);
+			__m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+			ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+			ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+			
+			vv = _mm256_load_pd(&v[4]);
+			rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+			ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+			ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+			
+			vv = _mm256_load_pd(&v[8]);
+			rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+			ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+			ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+			
+			vv = _mm256_load_pd(&v[12]);
+			rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+			ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+			ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+			
+			vv = _mm256_load_pd(&v[16]);
+			rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+			ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+			ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+			
+			ump_x2v = hadd3(ump_x2v);
+			_mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+		      }
+		  
+		    
+		    v = &x3_ptr[k * 20];
+		    
+		    __m256d vv[5]; 
+		    
+		    vv[0] = _mm256_setzero_pd();
+		    vv[1] = _mm256_setzero_pd();
+		    vv[2] = _mm256_setzero_pd();
+		    vv[3] = _mm256_setzero_pd();
+		    vv[4] = _mm256_setzero_pd();
+		    
+		    for(l = 0; l < 20; l++) 
+		      {
+			x1px2 = uX1[k * 20 + l]	* ump_x2[l];
+			__m256d x1px2v = _mm256_set1_pd(x1px2);	
+			
+#ifdef _FMA
+			__m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+			vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+			vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+			_mm256_store_pd(&v[0],vv[0]);
+			
+#ifdef _FMA
+			ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+			vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+			vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+			_mm256_store_pd(&v[4],vv[1]);
+			
+#ifdef _FMA
+			ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+			vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+			vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+			_mm256_store_pd(&v[8],vv[2]);
+			
+#ifdef _FMA
+			ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+			vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+			vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+			_mm256_store_pd(&v[12],vv[3]);
+			
+			
+#ifdef _FMA
+			ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+			vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+			vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+			_mm256_store_pd(&v[16],vv[4]);
+			
+		      } 
+		  }
+		
+		v = x3_ptr;
+		__m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+		scale = 1;
+		for(l = 0; scale && (l < 80); l += 4) 
+		  {
+		    __m256d vv = _mm256_load_pd(&v[l]);
+		    __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+		    vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+		    if(_mm256_movemask_pd(vv_abs) != 15)
+		      scale = 0;
+		  }
+	    
+		if(scale) 
+		  {		
+		    __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+		    for(l = 0; l < 80; l += 4) 
+		      {
+			__m256d vv = _mm256_load_pd(&v[l]);
+			_mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		      }
+		    if(useFastScaling)
+		      addScale += wgt[i];				
+		    else
+		      ex3[i] += 1;
+		  }	      
+		x3_ptr += 80;
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:    	  
+      for(k = 0; k < 4; k++) 
+	{
+	  vl = &(x1_gapColumn[20 * k]);
+	  vr = &(x2_gapColumn[20 * k]);
+	  v  = &(x3_gapColumn[20 * k]);	      	   
+
+	  __m256d vv[5]; 
+	  
+	  vv[0] = _mm256_setzero_pd();
+	  vv[1] = _mm256_setzero_pd();
+	  vv[2] = _mm256_setzero_pd();
+	  vv[3] = _mm256_setzero_pd();
+	  vv[4] = _mm256_setzero_pd();
+	  
+	  for(l = 0; l < 20; l++) 
+	    {		  
+	      __m256d al = _mm256_setzero_pd();
+	      __m256d ar = _mm256_setzero_pd();
+	      
+	      __m256d leftv  = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+	      __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+	      __m256d vlv = _mm256_load_pd(&vl[0]);
+	      __m256d vrv = _mm256_load_pd(&vr[0]);
+	      
+#ifdef _FMA
+	      
+	      al = FMAMACC(al, vlv, leftv);
+	      ar = FMAMACC(ar, vrv, rightv);
+#else
+	      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+	      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));		  
+#endif
+	      
+	      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+	      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+	      vlv = _mm256_load_pd(&vl[4]);
+	      vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+	      
+	      al = FMAMACC(al, vlv, leftv);
+	      ar = FMAMACC(ar, vrv, rightv);
+#else
+	      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+	      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+	      
+	      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+	      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+	      vlv = _mm256_load_pd(&vl[8]);
+	      vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+	      
+	      al = FMAMACC(al, vlv, leftv);
+	      ar = FMAMACC(ar, vrv, rightv);
+#else
+	      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+	      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+	      
+	      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+	      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+	      vlv = _mm256_load_pd(&vl[12]);
+	      vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+	      
+	      al = FMAMACC(al, vlv, leftv);
+	      ar = FMAMACC(ar, vrv, rightv);
+#else
+	      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+	      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+	      
+	      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+	      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+	      vlv = _mm256_load_pd(&vl[16]);
+	      vrv = _mm256_load_pd(&vr[16]);
+	      
+#ifdef _FMA		    
+	      al = FMAMACC(al, vlv, leftv);
+	      ar = FMAMACC(ar, vrv, rightv);
+#else
+	      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+	      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+	      
+	      /**************************************************************************************************************/
+	      
+	      al = hadd3(al);
+	      ar = hadd3(ar);
+	      al = _mm256_mul_pd(ar,al);
+	      
+	      /************************************************************************************************************/
+#ifdef _FMA		    
+	      __m256d ev =  _mm256_load_pd(&extEV[20 * l + 0]);
+	      vv[0] = FMAMACC(vv[0], al, ev);		 
+#else
+	      vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));			  		 		  
+#endif
+	      _mm256_store_pd(&v[0],vv[0]);
+	      
+#ifdef _FMA		    
+	      ev =  _mm256_load_pd(&extEV[20 * l + 4]);
+	      vv[1] = FMAMACC(vv[1], al, ev);		 
+#else
+	      vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));		  		 
+#endif
+	      _mm256_store_pd(&v[4],vv[1]);
+	      
+#ifdef _FMA		    
+	      ev =  _mm256_load_pd(&extEV[20 * l + 8]);
+	      vv[2] = FMAMACC(vv[2], al, ev);		 
+#else
+	      vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));		  		 
+#endif
+	      _mm256_store_pd(&v[8],vv[2]);
+	      
+#ifdef _FMA		    
+	      ev =  _mm256_load_pd(&extEV[20 * l + 12]);
+	      vv[3] = FMAMACC(vv[3], al, ev);		 
+#else
+	      vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));		  		 
+#endif
+	      _mm256_store_pd(&v[12],vv[3]);
+	      
+#ifdef _FMA		    
+	      ev =  _mm256_load_pd(&extEV[20 * l + 16]);
+	      vv[4] = FMAMACC(vv[4], al, ev);		 
+#else
+	      vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));			 	  
+#endif
+	      _mm256_store_pd(&v[16],vv[4]);		 
+	    } 
+	}
+	
+      v = x3_gapColumn;
+      scale = 1;
+      __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);	 
+      
+      for(l = 0; scale && (l < 80); l += 4) 
+	{
+	  __m256d vv = _mm256_load_pd(&v[l]);
+	  __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+	  vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+	  if(_mm256_movemask_pd(vv_abs) != 15)
+	    scale = 0;	     
+	}
+
+      if(scale) 
+	{		     	      
+	  __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+	  gapScaling = 1;
+
+	  for(l = 0; l < 80; l += 4) 
+	    {
+	      __m256d vv = _mm256_load_pd(&v[l]);
+	      _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+	    }
+	  
+	} 
+   
+     
+
+      for(i = 0; i < n; i++) 
+	{   
+	  
+	  if(x3_gap[i / 32] & mask32[i % 32])
+	    {	     
+	      if(gapScaling)
+		{
+		  if(useFastScaling)
+		    addScale += wgt[i];
+		  else
+		    ex3[i]  += 1; 	       
+		}
+	    }
+	  else
+	    {
+	      if(x1_gap[i / 32] & mask32[i % 32])
+		x1 = x1_gapColumn;
+	      else
+		{
+		  x1 = x1_ptr;
+		  x1_ptr += 80;
+		}
+
+	      if(x2_gap[i / 32] & mask32[i % 32])
+		x2 = x2_gapColumn;
+	      else
+		{
+		  x2 = x2_ptr;
+		  x2_ptr += 80;
+		}	   
+	  
+	      for(k = 0; k < 4; k++) 
+		{
+		  vl = &(x1[20 * k]);
+		  vr = &(x2[20 * k]);
+		  v  = &(x3_ptr[20 * k]);	      	   
+		  
+		  __m256d vv[5]; 
+		  
+		  vv[0] = _mm256_setzero_pd();
+		  vv[1] = _mm256_setzero_pd();
+		  vv[2] = _mm256_setzero_pd();
+		  vv[3] = _mm256_setzero_pd();
+		  vv[4] = _mm256_setzero_pd();
+		  
+		  for(l = 0; l < 20; l++) 
+		    {		  
+		      __m256d al = _mm256_setzero_pd();
+		      __m256d ar = _mm256_setzero_pd();
+		      
+		      __m256d leftv  = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+		      __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+		      __m256d vlv = _mm256_load_pd(&vl[0]);
+		      __m256d vrv = _mm256_load_pd(&vr[0]);
+		      
+#ifdef _FMA
+		      
+		      al = FMAMACC(al, vlv, leftv);
+		      ar = FMAMACC(ar, vrv, rightv);
+#else
+		      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));		  
+#endif
+		      
+		      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+		      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+		      vlv = _mm256_load_pd(&vl[4]);
+		      vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+		      
+		      al = FMAMACC(al, vlv, leftv);
+		      ar = FMAMACC(ar, vrv, rightv);
+#else
+		      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+		      
+		      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+		      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+		      vlv = _mm256_load_pd(&vl[8]);
+		      vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+		      
+		      al = FMAMACC(al, vlv, leftv);
+		      ar = FMAMACC(ar, vrv, rightv);
+#else
+		      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+		      
+		      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+		      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+		      vlv = _mm256_load_pd(&vl[12]);
+		      vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+		      
+		      al = FMAMACC(al, vlv, leftv);
+		      ar = FMAMACC(ar, vrv, rightv);
+#else
+		      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+		      
+		      leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+		      rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+		      vlv = _mm256_load_pd(&vl[16]);
+		      vrv = _mm256_load_pd(&vr[16]);
+		      
+#ifdef _FMA		    
+		      al = FMAMACC(al, vlv, leftv);
+		      ar = FMAMACC(ar, vrv, rightv);
+#else
+		      al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+		      ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+		      
+		      /**************************************************************************************************************/
+		      
+		      al = hadd3(al);
+		      ar = hadd3(ar);
+		      al = _mm256_mul_pd(ar,al);
+		      
+		      /************************************************************************************************************/
+#ifdef _FMA		    
+		      __m256d ev =  _mm256_load_pd(&extEV[20 * l + 0]);
+		      vv[0] = FMAMACC(vv[0], al, ev);		 
+#else
+		      vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));			  		 		  
+#endif
+		      _mm256_store_pd(&v[0],vv[0]);
+		      
+#ifdef _FMA		    
+		      ev =  _mm256_load_pd(&extEV[20 * l + 4]);
+		      vv[1] = FMAMACC(vv[1], al, ev);		 
+#else
+		      vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));		  		 
+#endif
+		      _mm256_store_pd(&v[4],vv[1]);
+		      
+#ifdef _FMA		    
+		      ev =  _mm256_load_pd(&extEV[20 * l + 8]);
+		      vv[2] = FMAMACC(vv[2], al, ev);		 
+#else
+		      vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));		  		 
+#endif
+		      _mm256_store_pd(&v[8],vv[2]);
+		      
+#ifdef _FMA		    
+		      ev =  _mm256_load_pd(&extEV[20 * l + 12]);
+		      vv[3] = FMAMACC(vv[3], al, ev);		 
+#else
+		      vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));		  		 
+#endif
+		      _mm256_store_pd(&v[12],vv[3]);
+		      
+#ifdef _FMA		    
+		      ev =  _mm256_load_pd(&extEV[20 * l + 16]);
+		      vv[4] = FMAMACC(vv[4], al, ev);		 
+#else
+		      vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));			 	  
+#endif
+		      _mm256_store_pd(&v[16],vv[4]);		 
+		    }
+		}
+	      
+	      v = x3_ptr;
+	      scale = 1;
+	      
+	      __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);	 
+	      
+	      for(l = 0; scale && (l < 80); l += 4) 
+		{
+		  __m256d vv = _mm256_load_pd(&v[l]);
+		  __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+		  vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+		  if(_mm256_movemask_pd(vv_abs) != 15)
+		    scale = 0;	     
+		}
+	      
+	      if(scale) 
+		{		     	      
+		  __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+		  for(l = 0; l < 80; l += 4) 
+		    {
+		      __m256d vv = _mm256_load_pd(&v[l]);
+		      _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+		    }
+		  if(useFastScaling)
+		    addScale += wgt[i];					
+		  else
+		    ex3[i] += 1;
+		}  
+	      x3_ptr += 80;
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+ 
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+}
diff --git a/examl/axml.c b/examl/axml.c
new file mode 100644
index 0000000..605d096
--- /dev/null
+++ b/examl/axml.c
@@ -0,0 +1,2782 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifdef WIN32
+#include <direct.h>
+#endif
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdarg.h>
+#include <limits.h>
+#include <unistd.h>
+#include <getopt.h>
+
+#include <mpi.h>
+
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+#include <xmmintrin.h>
+/*
+  Special bug fix: forces denormalized numbers to be flushed to zero.
+  Without it the program is a tiny bit faster, though.
+  #include <emmintrin.h> 
+  #define MM_DAZ_MASK    0x0040
+  #define MM_DAZ_ON    0x0040
+  #define MM_DAZ_OFF    0x0000
+*/
+#endif
+
+#include "axml.h"
+
+
+#include "globalVariables.h"
+
+#include "byteFile.h"
+#include "partitionAssignment.h"
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+/***************** UTILITY FUNCTIONS **************************/
+
+/*pInfo *cleanPinfoInit()
+{
+  pInfo *p = (pInfo*)malloc(sizeof(pInfo));
+
+  
+  return p;
+  }*/
+
+
+void storeExecuteMaskInTraversalDescriptor(tree *tr)
+{
+   int model;
+      
+   for(model = 0; model < tr->NumberOfModels; model++)
+     tr->td[0].executeModel[model] = tr->executeModel[model];
+}
+
+void storeValuesInTraversalDescriptor(tree *tr, double *value)
+{
+   int model;
+      
+   for(model = 0; model < tr->NumberOfModels; model++)
+     tr->td[0].parameterValues[model] = value[model];
+}
+
+
+
+void myBinFwrite(void *ptr, size_t size, size_t nmemb, FILE *byteFile)
+{
+  size_t
+    items_written;
+  
+  items_written = fwrite(ptr, size, nmemb, byteFile);
+
+  assert(items_written == nmemb);
+}
+
+void myBinFread(void *ptr, size_t size, size_t nmemb, FILE *byteFile)
+{  
+  size_t
+    bytes_read;
+  
+  bytes_read = fread(ptr, size, nmemb, byteFile);
+
+  assert(bytes_read == nmemb);
+}
+
+
+static void outOfMemory(void)
+{
+  printf("ExaML process %d was not able to allocate enough memory.\n", processID);
+  printf("Please check the approximate memory consumption of your dataset using\n");
+  printf("the memory calculator at http://www.exelixis-lab.org/web/software/raxml/index.html.\n");
+  printf("ExaML will exit now\n");
+
+  
+  MPI_Abort(MPI_COMM_WORLD, -1);
+
+  exit(-1);
+ }
+
+void *malloc_aligned(size_t size) 
+{
+  void 
+    *ptr = (void *)NULL;
+ 
+  int 
+    res;
+  
+
+#ifdef WIN32
+  ptr = _aligned_malloc(size, BYTE_ALIGNMENT);
+#else
+  res = posix_memalign( &ptr, BYTE_ALIGNMENT, size );
+
+  if(res != 0)
+  {
+    outOfMemory();
+    assert(0);
+  }
+#endif 
+   
+  return ptr;
+}
+
+
+
+
+
+
+
+static void printBoth(FILE *f, const char* format, ... )
+{
+  if(processID == 0)
+    {
+      va_list args;
+      va_start(args, format);
+      vfprintf(f, format, args );
+      va_end(args);
+      
+      va_start(args, format);
+      vprintf(format, args );
+      va_end(args);
+    }
+}
+
+
+
+
+void printBothOpen(const char* format, ... )
+{
+  if(processID == 0)
+    {
+      FILE *f = myfopen(infoFileName, "ab");
+      
+      va_list args;
+      va_start(args, format);
+      vfprintf(f, format, args );
+      va_end(args);
+      
+      va_start(args, format);
+      vprintf(format, args );
+      va_end(args);
+      
+      fclose(f);
+    }
+}
+
+static void printBothOpenDifferentFile(char *fileName, const char* format, ... )
+{
+  if(processID == 0)
+    {
+      FILE 
+	*f = myfopen(fileName, "ab");
+      
+      va_list 
+	args;
+      
+      va_start(args, format);
+      vfprintf(f, format, args );
+      va_end(args);
+            
+      fclose(f);
+    }
+}
+
+
+
+boolean getSmoothFreqs(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].smoothFrequencies;
+}
+
+const unsigned int *getBitVector(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].bitVector;
+}
+
+
+int getStates(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].states;
+}
+
+int getUndetermined(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].undetermined;
+}
+
+
+
+char getInverseMeaning(int dataType, unsigned char state)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return  pLengths[dataType].inverseMeaning[state];
+}
+
+partitionLengths *getPartitionLengths(pInfo *p)
+{
+  int 
+    dataType  = p->dataType,
+    states    = p->states,
+    tipLength = p->maxTipStates;
+
+  assert(states != -1 && tipLength != -1);
+
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  pLength.leftLength = pLength.rightLength = states * states;
+  pLength.eignLength = states;
+  pLength.evLength   = states * states;
+  pLength.eiLength   = states * states;
+  pLength.substRatesLength = (states * states - states) / 2;
+  pLength.frequenciesLength = states;
+  pLength.tipVectorLength   = tipLength * states;
+  pLength.symmetryVectorLength = (states * states - states) / 2;
+  pLength.frequencyGroupingLength = states;
+  pLength.nonGTR = FALSE;
+
+  return (&pLengths[dataType]); 
+}
+
+
+
+
+
+
+
+
+
+
+size_t discreteRateCategories(int rateHetModel)
+{
+  size_t 
+    result;
+
+  switch(rateHetModel)
+    {
+    case CAT:
+      result = 1;
+      break;
+    case GAMMA:
+      result = 4;
+      break;
+    default:
+      assert(0);
+    }
+
+  return result;
+}
+
+
+
+double gettime(void)
+{
+#ifdef WIN32
+  time_t tp;
+  struct tm localtm;
+  tp = time(NULL);
+  localtm = *localtime(&tp);
+  return 60.0*localtm.tm_min + localtm.tm_sec;
+#else
+  struct timeval ttime;
+  gettimeofday(&ttime , NULL);
+  return ttime.tv_sec + ttime.tv_usec * 0.000001;
+#endif
+}
+
+int gettimeSrand(void)
+{
+#ifdef WIN32
+  time_t tp;
+  struct tm localtm;
+  tp = time(NULL);
+  localtm = *localtime(&tp);
+  return 24*60*60*localtm.tm_yday + 60*60*localtm.tm_hour + 60*localtm.tm_min  + localtm.tm_sec;
+#else
+  struct timeval ttime;
+  gettimeofday(&ttime , NULL);
+  return ttime.tv_sec + ttime.tv_usec;
+#endif
+}
+
+double randum (long  *seed)
+{
+  long  sum, mult0, mult1, seed0, seed1, seed2, newseed0, newseed1, newseed2;
+  double res;
+
+  mult0 = 1549;
+  seed0 = *seed & 4095;
+  sum  = mult0 * seed0;
+  newseed0 = sum & 4095;
+  sum >>= 12;
+  seed1 = (*seed >> 12) & 4095;
+  mult1 =  406;
+  sum += mult0 * seed1 + mult1 * seed0;
+  newseed1 = sum & 4095;
+  sum >>= 12;
+  seed2 = (*seed >> 24) & 255;
+  sum += mult0 * seed2 + mult1 * seed1;
+  newseed2 = sum & 255;
+
+  *seed = newseed2 << 24 | newseed1 << 12 | newseed0;
+  res = 0.00390625 * (newseed2 + 0.000244140625 * (newseed1 + 0.000244140625 * newseed0));
+
+  return res;
+}
+
+static int filexists(char *filename)
+{
+  FILE 
+    *fp;
+  
+  int 
+    res;
+  
+  fp = fopen(filename,"rb");
+
+  if(fp)
+    {
+      res = 1;
+      fclose(fp);
+    }
+  else
+    res = 0;
+
+  return res;
+}
+
+
+FILE *myfopen(const char *path, const char *mode)
+{
+  FILE *fp = fopen(path, mode);
+
+  if(strcmp(mode,"r") == 0 || strcmp(mode,"rb") == 0)
+    {
+      if(fp)
+	return fp;
+      else
+	{
+	  if(processID == 0)
+	    printf("The file %s you want to open for reading does not exist, exiting ...\n", path);
+	  errorExit(-1);
+	  return (FILE *)NULL;
+	}
+    }
+  else
+    {
+      if(fp)
+	return fp;
+      else
+	{
+	  if(processID == 0)
+	    printf("The file %s ExaML wants to open for writing or appending can not be opened [mode: %s], exiting ...\n",
+		   path, mode);
+	  errorExit(-1);
+	  return (FILE *)NULL;
+	}
+    }
+
+
+}
+
+
+
+
+
+/********************* END UTILITY FUNCTIONS ********************/
+
+
+/******************************some functions for the likelihood computation ****************************/
+
+
+boolean isTip(int number, int maxTips)
+{
+  assert(number > 0);
+
+  if(number <= maxTips)
+    return TRUE;
+  else
+    return FALSE;
+}
+
+
+
+
+
+
+
+
+
+void getxnode (nodeptr p)
+{
+  nodeptr  s;
+
+  if ((s = p->next)->x || (s = s->next)->x)
+    {
+      p->x = s->x;
+      s->x = 0;
+    }
+
+  assert(p->x);
+}
+
+
+
+
+
+void hookup (nodeptr p, nodeptr q, double *z, int numBranches)
+{
+  int i;
+
+  p->back = q;
+  q->back = p;
+
+  for(i = 0; i < numBranches; i++)
+    p->z[i] = q->z[i] = z[i];
+}
+
+void hookupDefault (nodeptr p, nodeptr q, int numBranches)
+{
+  int i;
+
+  p->back = q;
+  q->back = p;
+
+  for(i = 0; i < numBranches; i++)
+    p->z[i] = q->z[i] = defaultz;
+}
+
+
+/***********************reading and initializing input ******************/
+
+
+
+
+
+
+
+boolean whitechar (int ch)
+{
+  return (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r');
+}
+
+
+
+
+
+
+
+
+
+
+static unsigned int KISS32(void)
+{
+  static unsigned int 
+    x = 123456789, 
+    y = 362436069,
+    z = 21288629,
+    w = 14921776,
+    c = 0;
+
+  unsigned int t;
+
+  x += 545925293;
+  y ^= (y<<13); 
+  y ^= (y>>17); 
+  y ^= (y<<5);
+  t = z + w + c; 
+  z = w; 
+  c = (t>>31); 
+  w = t & 2147483647;
+
+  return (x+y+w);
+}
+
+static boolean setupTree (tree *tr)
+{
+  nodeptr  
+    p0, 
+    p, 
+    q;
+  
+  int
+    i,
+    j,   
+    tips,
+    inter; 
+  
+  tr->bigCutoff = FALSE;
+  
+  tr->maxCategories = MAX(4, tr->categories);
+  
+  tr->partitionContributions = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+  tr->partitionWeights       = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+  for(i = 0; i < tr->NumberOfModels; i++)
+    {
+      tr->partitionContributions[i] = -1.0;
+      tr->partitionWeights[i] = -1.0;
+    }
+  
+  tr->perPartitionLH = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+    
+  for(i = 0; i < tr->NumberOfModels; i++)    
+    tr->perPartitionLH[i] = 0.0;	    
+     
+  tips  = tr->mxtips;
+  inter = tr->mxtips - 1;
+
+  /* printf("%d tips\t%d inner\n", tips, inter); */
+   
+  
+  tr->treeStringLength = tr->mxtips * (nmlngth+128) + 256 + tr->mxtips * 2;
+
+  tr->tree_string  = (char*)calloc(tr->treeStringLength, sizeof(char)); 
+  tr->tree0 = (char*)calloc(tr->treeStringLength, sizeof(char));
+  tr->tree1 = (char*)calloc(tr->treeStringLength, sizeof(char));
+
+
+  /* TODO, must that be so long ? */
+  /* assert(0);  */
+
+
+  tr->td[0].count = 0;
+  tr->td[0].ti    = (traversalInfo *)malloc(sizeof(traversalInfo) * tr->mxtips);
+  tr->td[0].executeModel = (boolean *)malloc(sizeof(boolean) * tr->NumberOfModels);
+  tr->td[0].parameterValues = (double *)malloc(sizeof(double) * tr->NumberOfModels);  
+  
+  tr->constraintVector = (int *)malloc((2 * tr->mxtips) * sizeof(int));
+
+
+  if (!(p0 = (nodeptr) malloc((tips + 3*inter) * sizeof(node))))
+    {
+      printf("ERROR: Unable to obtain sufficient tree memory\n");
+      return  FALSE;
+    }
+  
+  tr->nodeBaseAddress = p0;
+
+
+  if (!(tr->nodep = (nodeptr *) malloc((2*tr->mxtips) * sizeof(nodeptr))))
+    {
+      printf("ERROR: Unable to obtain sufficient tree memory, too\n");
+      return  FALSE;
+    }
+
+  tr->nodep[0] = (node *) NULL;    /* Use as 1-based array */
+
+  for (i = 1; i <= tips; i++)
+    {
+      p = p0++;
+
+      p->hash   =  KISS32(); /* hash table stuff */
+      p->x      =  0;
+      p->xBips  =  0;
+      p->number =  i;
+      p->next   =  p;
+      p->back   = (node *)NULL;     
+      tr->nodep[i] = p;
+    }
+
+  for (i = tips + 1; i <= tips + inter; i++)
+    {
+      q = (node *) NULL;
+      for (j = 1; j <= 3; j++)
+	{	 
+	  p = p0++;
+	  if(j == 1)
+	    {
+	      p->xBips = 1;
+	      p->x = 1;
+	    }
+	  else
+	    {
+	      p->xBips = 0;
+	      p->x =  0;
+	    }
+	  p->number = i;
+	  p->next   = q;	  
+	  p->back   = (node *) NULL;
+	  p->hash   = 0;       
+	  q = p;
+	}
+      p->next->next->next = p;
+      tr->nodep[i] = p;
+    }
+
+  tr->likelihood  = unlikely;
+  tr->start       = (node *) NULL;  
+
+  tr->ntips       = 0;
+  tr->nextnode    = 0;
+ 
+  for(i = 0; i < tr->numBranches; i++)
+    tr->partitionSmoothed[i] = FALSE;
+
+  tr->bitVectors = (unsigned int **)NULL;
+
+  tr->vLength = 0;
+
+  tr->h = (hashtable*)NULL;
+  
+  tr->nameHash = initStringHashTable(10 * tr->mxtips);
+
+  return TRUE;
+}
+
+
+
+
+static void initAdef(analdef *adef)
+{   
+  adef->max_rearrange          = 21;
+  adef->stepwidth              = 5;
+  adef->initial                = 10;
+  adef->bestTrav               = 10;
+  adef->initialSet             = FALSE; 
+  adef->mode                   = BIG_RAPID_MODE; 
+  adef->likelihoodEpsilon      = 0.1;
+ 
+  adef->permuteTreeoptimize    = FALSE; 
+  adef->perGeneBranchLengths   = FALSE;  
+ 
+  adef->useCheckpoint          = FALSE;
+   
+  adef->useQuartetGrouping        = FALSE;
+  adef->numberRandomQuartets      = 0;
+
+  adef->quartetCkpInterval        = 1000;
+
+#ifdef _BAYESIAN 
+  adef->bayesian               = FALSE;
+#endif
+
+}
+
+
+
+static int modelExists(char *model, tree *tr)
+{  
+  if(strcmp(model, "PSR") == 0)
+    {
+      tr->rateHetModel = CAT;
+      return 1;
+    }
+
+  if(strcmp(model, "GAMMA") == 0)
+    {
+      tr->rateHetModel = GAMMA;
+      return 1;
+    }
+
+  
+  return 0;
+}
+
+
+
+
+
+
+/*********************************** *********************************************************/
+
+
+static void printVersionInfo(void)
+{
+  if(processID == 0)
+    printf("\n\nThis is %s version %s released by Alexandros Stamatakis, Andre J. Aberer, and Alexey Kozlov on %s.\n\n",  programName, programVersion, programDate); 
+}
+
+static void printMinusFUsage(void)
+{
+  printf("\n");
+ 
+
+  printf("              \"-f d\": new rapid hill-climbing \n");
+  printf("                      DEFAULT: ON\n");
+
+  printf("\n");
+
+  printf("               \"-f e\": compute the likelihood of a bunch of trees passed via -t\n");
+  printf("                this option will do a quick and dirty optimization without re-optimizing\n");
+  printf("                the model parameters for each tree\n");
+
+  printf("\n");
+
+  printf("               \"-f E\": compute the likelihood of a bunch of trees passed via -t\n");
+  printf("                this option will do a thorough optimization that re-optimizes\n");
+  printf("                the model parameters for each tree\n");
+
+  printf("\n");
+
+  printf("              \"-f o\": old and slower rapid hill-climbing without heuristic cutoff\n");
+
+  printf("\n");
+
+  printf("              \"-f q\": fast quartet calculator\n");
+  
+  printf("\n");
+
+  printf("              DEFAULT for \"-f\": new rapid hill climbing\n");
+
+  printf("\n");
+}
+
+
+static void printREADME(void)
+{
+  if(processID == 0)
+    {
+      printVersionInfo();
+      printf("\n");  
+      printf("\nTo report bugs use the RAxML google group\n");
+      printf("Please send me all input files, the exact invocation, details of the HW and operating system,\n");
+      printf("as well as all error messages printed to screen.\n\n\n");
+      
+      printf("examl|examl-AVX\n");
+      printf("      -s binarySequenceFileName\n");
+      printf("      -n outputFileNames\n");
+      printf("      -m rateHeterogeneityModel\n");
+      printf("      -t userStartingTree|-R binaryCheckpointFile|-g constraintTree -p randomNumberSeed\n");
+      printf("      [-a]\n");
+      printf("      [-B numberOfMLtreesToSave]\n"); 
+      printf("      [-c numberOfCategories]\n");
+      printf("      [-D]\n");
+      printf("      [-e likelihoodEpsilon] \n");
+      printf("      [-f d|e|E|o|q]\n");    
+      printf("      [-h] \n");
+      printf("      [-i initialRearrangementSetting] \n");
+      printf("      [-I quartetCheckpointInterval] \n");
+      printf("      [-M]\n");
+      printf("      [-r randomQuartetNumber] \n");
+      printf("      [-S]\n");
+      printf("      [-v]\n"); 
+      printf("      [-w outputDirectory] \n"); 
+      printf("      [-Y quartetGroupingFileName]\n");
+      printf("      [--auto-prot=ml|bic|aic|aicc]\n");
+      printf("\n");  
+      printf("      -a      use the median for the discrete approximation of the GAMMA model of rate heterogeneity\n");
+      printf("\n");
+      printf("              DEFAULT: OFF\n");
+      printf("\n");
+      printf("      -B      specify the number of best ML trees to save and print to file\n");
+      printf("\n");
+      printf("      -c      Specify number of distinct rate categories for ExaML when modelOfEvolution\n");
+      printf("              is set to GTRPSR\n");
+      printf("              Individual per-site rates are categorized into numberOfCategories rate \n");
+      printf("              categories to accelerate computations. \n");
+      printf("\n");
+      printf("              DEFAULT: 25\n");
+      printf("\n");
+      printf("      -D      ML search convergence criterion. This will break off ML searches if the relative \n");
+      printf("              Robinson-Foulds distance between the trees obtained from two consecutive lazy SPR cycles\n");
+      printf("              is smaller than or equal to 1%s. Usage recommended for very large datasets in terms of taxa.\n", "%");
+      printf("              On trees with more than 500 taxa this will yield execution time improvements of approximately 50%s\n",  "%");
+      printf("              While yielding only slightly worse trees.\n");
+      printf("\n");
+      printf("              DEFAULT: OFF\n");    
+      printf("\n");
+      printf("      -e      set model optimization precision in log likelihood units for final\n");
+      printf("              optimization of model parameters\n");
+      printf("\n");
+      printf("              DEFAULT: 0.1 \n"); 
+      printf("\n");
+      printf("      -f      select algorithm:\n");
+      
+      printMinusFUsage();
+ 
+      printf("\n");
+      printf("      -g      Pass a multi-furcating constraint tree to ExaML. The tree needs to contain all taxa of the alignment!\n");
+      printf("              When using this option you also need to specify a random number seed via \"-p\"\n");
+      printf("\n");
+      printf("      -h      Display this help message.\n");
+      printf("\n");  
+      printf("      -i      Initial rearrangement setting for the subsequent application of topological \n");
+      printf("              changes phase\n");
+      printf("\n");
+      printf("      -I      Set after how many quartet evaluations a new checkpoint will be printed.\n");
+      printf("\n");
+      printf("              DEFAULT: 1000\n");
+      printf("\n");
+      printf("      -m      Model of rate heterogeneity\n");
+      printf("\n"); 
+      printf("              select \"-m PSR\" for the per-site rate category model (this used to be called CAT in RAxML)\n");
+      printf("              select \"-m GAMMA\" for the gamma model of rate heterogeneity with 4 discrete rates\n");
+      printf("\n");
+      printf("      -M      Switch on estimation of individual per-partition branch lengths. Only has effect when used in combination with \"-q\"\n");
+      printf("              Branch lengths for individual partitions will be printed to separate files\n");
+      printf("              A weighted average of the branch lengths is computed by using the respective partition lengths\n");
+      printf("\n");
+      printf("              DEFAULT: OFF\n");
+      printf("\n");
+      printf("      -n      Specifies the name of the output file.\n"); 
+      printf("\n");
+      printf("      -p      Specify a random number seed, required in conjunction with the \"-g\" option for constraint trees\n");
+      printf("\n");
+      printf("      -R      read in a binary checkpoint file called ExaML_binaryCheckpoint.RUN_ID_number\n");
+      printf("\n");
+      printf("      -r      Pass the number of quartets to randomly sub-sample from the possible number of quartets for the given taxon set.\n");
+      printf("              Only works in combination with -f q !\n");
+      printf("\n");
+      printf("      -s      Specify the name of the BINARY alignment data file generated by the parser component\n");
+      printf("\n");
+      printf("      -S      turn on memory saving option for gappy multi-gene alignments. For large and gappy datasets specify -S to save memory\n");
+      printf("              This will produce slightly different likelihood values, may be a bit slower but can reduce memory consumption\n");
+      printf("              from 70GB to 19GB on very large and gappy datasets\n");
+      printf("\n");
+      printf("      -t      Specify a user starting tree file name in Newick format\n");
+      printf("\n");
+      printf("      -v      Display version information\n");
+      printf("\n");
+      printf("      -w      FULL (!) path to the directory into which ExaML shall write its output files\n");
+      printf("\n");
+      printf("              DEFAULT: current directory\n");  
+      printf("\n"); 
+      printf("      -Y      Pass a quartet grouping file name defining four groups from which to draw quartets\n");
+      printf("              The file input format must contain 4 groups in the following form:\n");
+      printf("              (Chicken, Human, Loach), (Cow, Carp), (Mouse, Rat, Seal), (Whale, Frog);\n");
+      printf("              Only works in combination with -f q !\n");
+      printf("\n");
+      
+      printf("\n");
+      printf("      --auto-prot=ml|bic|aic|aicc When using automatic protein model selection you can choose the criterion for selecting these models.\n");
+      printf("              RAxML will test all available prot subst. models except for LG4M, LG4X and GTR-based models, with and without empirical base frequencies.\n");
+      printf("              You can choose between ML score based selection and the BIC, AIC, and AICc criteria.\n");
+      printf("\n");
+      printf("              DEFAULT: ml\n");
+      printf("\n\n\n\n");
+    }
+}
+
+
+
+
+static void analyzeRunId(char id[128])
+{
+  int i = 0;
+
+  while(id[i] != '\0')
+    {    
+      if(i >= 128)
+	{
+	  printf("Error: run id after \"-n\" is too long, it has %d characters please use a shorter one\n", i);
+	  assert(0);
+	}
+      
+      if(id[i] == '/')
+	{
+	  printf("Error: character %c not allowed in run ID\n", id[i]);
+	  assert(0);
+	}
+
+
+      i++;
+    }
+
+  if(i == 0)
+    {
+      printf("Error: please provide a string for the run id after \"-n\" \n");
+      assert(0);
+    }
+
+}
+
+static void get_args(int argc, char *argv[], analdef *adef, tree *tr)
+{
+  boolean   
+    resultDirSet = FALSE;
+
+  char
+    resultDir[1024] = "",          
+    //*optarg,
+    model[1024] = "",       
+    modelChar;
+
+  double 
+    likelihoodEpsilon;
+  
+  int        
+    fOptionCount = 0,
+    c,
+    nameSet = 0,
+    treeSet = 0,   
+    modelSet = 0, 
+    byteFileSet = 0,
+    seedSet = 0;
+
+
+  /*********** tr inits **************/ 
+ 
+  tr->doCutoff = TRUE;
+  tr->secondaryStructureModel = SEC_16; /* default setting */
+  tr->searchConvergenceCriterion = FALSE;
+  tr->rateHetModel = GAMMA;
+ 
+  tr->multiStateModel  = GTR_MULTI_STATE;
+  tr->useGappedImplementation = FALSE;
+  tr->saveMemory = FALSE;
+  tr->constraintTree = FALSE;
+
+  tr->fastTreeEvaluation = FALSE;
+
+  /* tr->manyPartitions = FALSE; */
+
+  tr->categories             = 25;
+
+  tr->gapyness               = 0.0; 
+  tr->saveBestTrees          = 0;
+
+  tr->useMedian = FALSE;
+  
+  tr->autoProteinSelectionType = AUTO_ML;
+  
+  /********* tr inits end*************/
+	
+  //while(!bad_opt && ((c = mygetopt(argc,argv,"R:B:e:c:f:i:m:t:g:w:n:s:p:vhMSDa", &optind, &optarg))!=-1))
+	
+  static 
+    int flag;
+  
+  while(1)
+    {
+      static struct 
+	option long_options[2] =
+	{	 	 
+	  {"auto-prot",   required_argument, &flag, 1},	   	  	 	 
+	  {0, 0, 0, 0}
+	};
+      
+      int 
+	option_index;
+      
+      flag = 0;        
+
+      c = getopt_long(argc, argv, "R:B:Y:I:e:c:f:i:m:t:g:w:n:s:p:r:vhMSDa", long_options, &option_index);    
+    
+      if(c == -1)
+	break;
+      
+      if(flag > 0)
+	{
+	  switch(option_index)
+	    {
+	    case 0:
+	      {
+		char 
+		  *autoModels[4] = {"ml", "bic", "aic", "aicc"};
+
+		int 
+		  k;
+
+		for(k = 0; k < 4; k++)		  
+		  if(strcmp(optarg, autoModels[k]) == 0)
+		    break;
+
+		if(k == 4)
+		  {
+		    printf("\nError, unknown protein model selection type, you can specify one of the following selection criteria:\n\n");
+		    for(k = 0; k < 4; k++)
+		      printf("--auto-prot=%s\n", autoModels[k]);
+		    printf("\n");
+		    errorExit(-1);
+		  }
+		else
+		  {
+		    switch(k)
+		      {
+		      case 0:
+			tr->autoProteinSelectionType = AUTO_ML;
+			break;
+		      case 1:
+			tr->autoProteinSelectionType = AUTO_BIC;
+			break;
+		      case 2:
+			tr->autoProteinSelectionType = AUTO_AIC;
+			break;
+		      case 3:
+			tr->autoProteinSelectionType = AUTO_AICC;
+			break;
+		      default:
+			assert(0);
+		      }
+		  }
+	      }
+	      break;
+	    default:
+	      assert(0);
+	    }
+	}
+      else	
+	switch(c)
+	  {    
+	  case 'Y':
+	    adef->useQuartetGrouping = TRUE;	 
+	    strcpy(quartetGroupingFileName, optarg);
+	    break;
+	  case 'r':
+	    sscanf(optarg, "%lu", &(adef->numberRandomQuartets));	    
+	    assert(adef->numberRandomQuartets > 0);
+	    break; 
+	  case 'p':
+	    sscanf(optarg,"%u", &(tr->randomSeed));
+	    seedSet = 1;
+	    break;
+	  case 'a':
+	    tr->useMedian = TRUE;	
+	    break;
+	  case 'B':
+	    sscanf(optarg,"%d", &(tr->saveBestTrees));
+	    if(tr->saveBestTrees < 0)
+	      {
+		printf("Number of best trees to save must not be negative!\n");
+		errorExit(-1);	 
+	      }
+	    break;       
+	  case 's':	    
+	    strcpy(byteFileName, optarg);	 	
+	    byteFileSet = TRUE;
+	    /*printf("%s \n", byteFileName);*/
+	    break;      
+	  case 'S':
+	    tr->saveMemory = TRUE;
+	    break;
+	  case 'D':
+	    tr->searchConvergenceCriterion = TRUE;	
+	    break;
+	  case 'R':
+	    adef->useCheckpoint = TRUE;
+	    strcpy(binaryCheckpointInputName, optarg);
+	    break;          
+	  case 'I':
+	    sscanf(optarg, "%lu", &(adef->quartetCkpInterval));
+	    break;
+	  case 'M':
+	    adef->perGeneBranchLengths = TRUE;
+	    break;                                 
+	  case 'e':
+	    sscanf(optarg,"%lf", &likelihoodEpsilon);
+	    adef->likelihoodEpsilon = likelihoodEpsilon;
+	    break;    	    
+	  case 'v':
+	    printVersionInfo();
+	    errorExit(0);	    
+	  case 'h':
+	    printREADME();
+	    errorExit(0);     
+	  case 'c':
+	    sscanf(optarg, "%d", &tr->categories);
+	    break;     
+	  case 'f':
+	    sscanf(optarg, "%c", &modelChar);
+	    fOptionCount++;
+	    if(fOptionCount > 1) 
+	      {
+		printf("\nError: only one of the various \"-f \" options can be used per ExaML run!\n");
+		printf("They are mutually exclusive! exiting ...\n\n");
+		errorExit(-1);
+	      }
+	    switch(modelChar)
+	      {	 
+	      case 'e':
+		adef->mode = TREE_EVALUATION;
+		tr->fastTreeEvaluation = TRUE;
+		break;
+	      case 'E':
+		adef->mode = TREE_EVALUATION;
+		tr->fastTreeEvaluation = FALSE;
+		break;
+	      case 'd':
+		adef->mode = BIG_RAPID_MODE;
+		tr->doCutoff = TRUE;
+		break;	  
+	      case 'o':
+		adef->mode = BIG_RAPID_MODE;
+		tr->doCutoff = FALSE;
+		break;	    	  	  	     
+	      case 'q':
+		adef->mode = QUARTET_CALCULATION;
+		break;	
+	      default:
+		{
+		  if(processID == 0)
+		    {
+		      printf("Error: select one of the following algorithms via -f :\n");
+		      printMinusFUsage();
+		    }
+		  errorExit(-1);
+		}
+	      }
+	    break;
+	  case 'i':
+	    sscanf(optarg, "%d", &adef->initial);
+	    adef->initialSet = TRUE;
+	    break;
+	  case 'n':
+	    strcpy(run_id,optarg);
+	    analyzeRunId(run_id);
+	    nameSet = 1;
+	    break;
+	  case 'w':
+	    strcpy(resultDir, optarg);
+	    resultDirSet = TRUE;
+	    break;
+	  case 't':
+	    strcpy(tree_file, optarg);       
+	    treeSet = 1;       
+	    break;
+	  case 'g':
+	    strcpy(tree_file, optarg);       
+	    treeSet = 1;       
+	    tr->constraintTree = TRUE;
+	    break;	
+	  case 'm':
+	    strcpy(model,optarg);
+	    if(modelExists(model, tr) == 0)
+	      {
+		if(processID == 0)
+		  {
+		    printf("Rate heterogeneity Model %s does not exist\n\n", model);               
+		    printf("For per site rates (called CAT in previous versions) use: PSR\n");	
+		    printf("For GAMMA use: GAMMA\n");		
+		  }
+		errorExit(-1);
+	      }
+	    else
+	      modelSet = 1;
+	    break;     
+	  default:
+	    errorExit(-1);
+	  }
+    }
+  
+  if(adef->useQuartetGrouping && adef->mode != QUARTET_CALCULATION)
+    {
+      if(processID == 0)
+	printf("\nError, you must specify \"-Y quartetGroupingFileName\" in combination with \"-f q\"\n");
+      errorExit(-1);
+    }
+
+  if(adef->numberRandomQuartets > 0 && adef->mode != QUARTET_CALCULATION)
+    {
+       if(processID == 0)
+	printf("\nError, you must specify \"-r randomQuartetNumber\" in combination with \"-f q\"\n");
+      errorExit(-1);
+    }
+
+  if((adef->numberRandomQuartets > 0) && (adef->useQuartetGrouping))
+    {
+      if(processID == 0)
+	printf("\nError, you must specify either \"-r randomQuartetNumber\" or \"-Y quartetGroupingFileName\"\n");
+      errorExit(-1);
+    }  
+
+  if(tr->constraintTree && !seedSet)
+    {
+      if(processID == 0)
+	{
+	  printf("\nError, you must specify a random number seed via \"-p\" when using a constraint\n");
+	  printf("tree via \"-g\" \n");
+	}
+      errorExit(-1);
+    }
+
+  if(!byteFileSet)
+    {
+      if(processID == 0)
+	printf("\nError, you must specify a binary format data file with the \"-s\" option\n");
+      errorExit(-1);
+    }
+
+  if(!modelSet)
+    {
+      if(processID == 0)
+	printf("\nError, you must specify a model of rate heterogeneity with the \"-m\" option\n");
+      errorExit(-1);
+    }
+
+  if(!nameSet)
+    {
+      if(processID == 0)
+	printf("\nError: please specify a name for this run with -n\n");
+      errorExit(-1);
+    }
+
+  if(!treeSet && !adef->useCheckpoint)
+    {
+      if(processID == 0)
+	{
+	  printf("\nError: please either specify a starting tree for this run with -t\n");
+	  printf("or re-start the run from a checkpoint with -R\n");
+	}
+      
+      errorExit(-1);
+    }
+  
+   {
+
+    const 
+      char *separator = "/";
+
+    if(resultDirSet)
+      {
+	char 
+	  dir[1024] = "";
+	
+
+	if(resultDir[0] != separator[0])
+	  strcat(dir, separator);
+	
+	strcat(dir, resultDir);
+	
+	if(dir[strlen(dir) - 1] != separator[0]) 
+	  strcat(dir, separator);
+	strcpy(workdir, dir);
+      }
+    else
+      {
+	char 
+	  dir[1024] = "",
+	  *result = getcwd(dir, sizeof(dir));
+	
+	assert(result != (char*)NULL);
+	
+	if(dir[strlen(dir) - 1] != separator[0]) 
+	  strcat(dir, separator);
+	
+	strcpy(workdir, dir);		
+      }
+   }
+
+  return;
+}
+
+
+
+
+void errorExit(int e)
+{
+  MPI_Finalize();
+
+  exit(e);
+}
+
+
+
+static void makeFileNames(void)
+{
+  int 
+    infoFileExists = 0;
+    
+  strcpy(resultFileName,       workdir);
+  strcpy(logFileName,          workdir);  
+  strcpy(infoFileName,         workdir);
+  strcpy(treeFileName,         workdir);
+  strcpy(binaryCheckpointName, workdir);
+  strcpy(modelFileName, workdir);
+  strcpy(quartetFileName,         workdir);
+
+  strcat(resultFileName,       "ExaML_result.");
+  strcat(logFileName,          "ExaML_log.");  
+  strcat(infoFileName,         "ExaML_info.");
+  strcat(binaryCheckpointName, "ExaML_binaryCheckpoint.");
+  strcat(modelFileName,        "ExaML_modelFile.");
+  strcat(treeFileName,         "ExaML_TreeFile.");
+  strcat(quartetFileName,      "ExaML_quartets.");
+  
+  strcat(resultFileName,       run_id);
+  strcat(logFileName,          run_id);  
+  strcat(infoFileName,         run_id); 
+  strcat(binaryCheckpointName, run_id);
+  strcat(modelFileName,        run_id);
+  strcat(treeFileName,         run_id);
+  strcat(quartetFileName,         run_id);
+
+  infoFileExists = filexists(infoFileName);
+
+  if(infoFileExists)
+    {
+      if(processID == 0)
+	{
+	  printf("ExaML output files with the run ID <%s> already exist \n", run_id);
+	  printf("in directory %s ...... exiting\n", workdir);
+	}
+
+      errorExit(-1);	
+    }
+}
+
+
+
+
+ 
+
+
+
+
+/***********************reading and initializing input ******************/
+
+
+/********************PRINTING various INFO **************************************/
+
+
+static void printModelAndProgramInfo(tree *tr, analdef *adef, int argc, char *argv[])
+{
+  if(processID == 0)
+    {
+      int i, model;
+      FILE *infoFile = myfopen(infoFileName, "ab");
+      char modelType[128];
+
+      
+      if(tr->useMedian)
+	strcpy(modelType, "GAMMA with Median");
+      else
+	strcpy(modelType, "GAMMA");   
+     
+      printBoth(infoFile, "\n\nThis is %s version %s released by Alexandros Stamatakis, Andre Aberer, and Alexey Kozlov in %s.\n\n",  programName, programVersion, programDate);
+                     
+      printBoth(infoFile, "\nAlignment has %zu distinct alignment patterns\n\n",  tr->originalCrunchedLength);
+                 
+      printBoth(infoFile, "Proportion of gaps and completely undetermined characters in this alignment: %3.2f%s\n", 100.0 * tr->gapyness, "%");
+      
+      switch(adef->mode)
+	{	
+	case  BIG_RAPID_MODE:	 
+	  printBoth(infoFile, "\nExaML rapid hill-climbing mode\n\n");
+	  break;
+	case TREE_EVALUATION:
+	  printBoth(infoFile, "\nExaML %s tree evaluation mode\n\n", (tr->fastTreeEvaluation)?"fast":"slow");
+	  break;
+	case QUARTET_CALCULATION:
+	  printBoth(infoFile, "\nExaML quartet evaluation mode\n\n");
+	  break;
+	default:
+	  assert(0);
+	}
+     	  
+      if(adef->perGeneBranchLengths)
+	printBoth(infoFile, "Using %d distinct models/data partitions with individual per partition branch length optimization\n\n\n", tr->NumberOfModels);
+      else
+	printBoth(infoFile, "Using %d distinct models/data partitions with joint branch length optimization\n\n\n", tr->NumberOfModels);	
+	      
+      printBoth(infoFile, "All free model parameters will be estimated by ExaML\n");
+           	
+      if(tr->rateHetModel == GAMMA || tr->rateHetModel == GAMMA_I)
+	printBoth(infoFile, "%s model of rate heterogeneity, ML estimate of alpha-parameter\n\n", modelType);
+      else
+	{
+	  printBoth(infoFile, "ML estimate of %d per site rate categories\n\n", tr->categories);
+	 
+	}               
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  printBoth(infoFile, "Partition: %d\n", model);
+	  printBoth(infoFile, "Alignment Patterns: %d\n", tr->partitionData[model].upper - tr->partitionData[model].lower);
+	  printBoth(infoFile, "Name: %s\n", tr->partitionData[model].partitionName);
+	  
+	  switch(tr->partitionData[model].dataType)
+	    {
+	    case DNA_DATA:
+	      printBoth(infoFile, "DataType: DNA\n");	     
+	      printBoth(infoFile, "Substitution Matrix: GTR\n");
+	      if(tr->partitionData[model].optimizeBaseFrequencies)
+		printBoth(infoFile, "ML optimization of base frequencies\n");
+	      break;
+	    case AA_DATA:
+	      assert(tr->partitionData[model].protModels >= 0 && tr->partitionData[model].protModels < NUM_PROT_MODELS);
+	      printBoth(infoFile, "DataType: AA\n");	      
+	      printBoth(infoFile, "Substitution Matrix: %s\n", protModels[tr->partitionData[model].protModels]);
+	      if(!tr->partitionData[model].optimizeBaseFrequencies)
+		printBoth(infoFile, "Using %s Base Frequencies\n", (tr->partitionData[model].protFreqs == 1)?"empirical":"fixed");	     
+	      else		
+		printBoth(infoFile, "ML optimization of base frequencies\n");
+	      break;
+	    case BINARY_DATA:
+	      printBoth(infoFile, "DataType: BINARY/MORPHOLOGICAL\n");	      
+	      printBoth(infoFile, "Substitution Matrix: Uncorrected\n");
+	      break;
+	    
+	      /*
+		case SECONDARY_DATA:
+		printBoth(infoFile, "DataType: SECONDARY STRUCTURE\n");	     
+		printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+		break;
+		case SECONDARY_DATA_6:
+		printBoth(infoFile, "DataType: SECONDARY STRUCTURE 6 STATE\n");	     
+		printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+		break;
+		case SECONDARY_DATA_7:
+		printBoth(infoFile, "DataType: SECONDARY STRUCTURE 7 STATE\n");	      
+		printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+		break;
+		case GENERIC_32:
+		printBoth(infoFile, "DataType: Multi-State with %d distinct states in use (maximum 32)\n",tr->partitionData[model].states);		  
+		switch(tr->multiStateModel)
+		{
+		case ORDERED_MULTI_STATE:
+		printBoth(infoFile, "Substitution Matrix: Ordered Likelihood\n");
+		break;
+		case MK_MULTI_STATE:
+		printBoth(infoFile, "Substitution Matrix: MK model\n");
+		break;
+		case GTR_MULTI_STATE:
+		printBoth(infoFile, "Substitution Matrix: GTR\n");
+		break;
+		default:
+		assert(0);
+		}
+		break;
+		case GENERIC_64:
+		printBoth(infoFile, "DataType: Codon\n");		  
+		break;	
+	      */
+	    default:
+	      assert(0);
+	    }
+	  printBoth(infoFile, "\n\n\n");
+	}
+      
+      printBoth(infoFile, "\n");
+
+      printBoth(infoFile, "ExaML was called as follows:\n\n");
+      for(i = 0; i < argc; i++)
+	printBoth(infoFile,"%s ", argv[i]);
+      printBoth(infoFile,"\n\n\n");
+
+      fclose(infoFile);
+    }
+}
+
+void printResult(tree *tr, analdef *adef, boolean finalPrint)
+{
+  if(processID == 0)
+    {
+      FILE *logFile;
+      char temporaryFileName[1024] = "";
+      
+      strcpy(temporaryFileName, resultFileName);
+      
+      switch(adef->mode)
+	{    
+	case TREE_EVALUATION:
+	  Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, finalPrint, SUMMARIZE_LH, FALSE, FALSE);
+	  
+	  logFile = myfopen(temporaryFileName, "wb");
+	  fprintf(logFile, "%s", tr->tree_string);
+	  fclose(logFile);
+	  
+	  if(adef->perGeneBranchLengths)
+	    printTreePerGene(tr, adef, temporaryFileName, "wb");
+	  break;
+	case BIG_RAPID_MODE:     
+	  if(finalPrint)
+	    {
+	      switch(tr->rateHetModel)
+		{
+		case GAMMA:
+		case GAMMA_I:
+		  Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, finalPrint,
+			      SUMMARIZE_LH, FALSE, FALSE);
+		  
+		  logFile = myfopen(temporaryFileName, "wb");
+		  fprintf(logFile, "%s", tr->tree_string);
+		  fclose(logFile);
+		  
+		  if(adef->perGeneBranchLengths)
+		    printTreePerGene(tr, adef, temporaryFileName, "wb");
+		  break;
+		case CAT:
+		  /*Tree2String(tr->tree_string, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, finalPrint, adef,
+		    NO_BRANCHES, FALSE, FALSE);*/
+		  
+		  
+		  
+		  Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE,
+			      TRUE, SUMMARIZE_LH, FALSE, FALSE);
+		  
+		  
+		  
+		  
+		  logFile = myfopen(temporaryFileName, "wb");
+		  fprintf(logFile, "%s", tr->tree_string);
+		  fclose(logFile);
+		  
+		  break;
+		default:
+		  assert(0);
+		}
+	    }
+	  else
+	    {
+	      Tree2String(tr->tree_string, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, finalPrint,
+			  NO_BRANCHES, FALSE, FALSE);
+	      logFile = myfopen(temporaryFileName, "wb");
+	      fprintf(logFile, "%s", tr->tree_string);
+	      fclose(logFile);
+	    }    
+	  break;
+	default:
+	  printf("FATAL ERROR call to printResult from undefined STATE %d\n", adef->mode);
+	  exit(-1);
+	  break;
+	}
+    }
+}
+
+
+
+
+
+
+
+
+void printLog(tree *tr)
+{
+  if(processID == 0)
+    {
+      FILE *logFile;
+      double t;
+      
+      t = gettime() - masterTime;
+      
+      logFile = myfopen(logFileName, "ab");
+      
+      /* printf("%f %1.40f\n", t, tr->likelihood); */
+
+      fprintf(logFile, "%f %f\n", t, tr->likelihood);
+      
+      fclose(logFile);
+    }
+	     
+}
+
+
+
+
+
+
+
+
+
+void getDataTypeString(tree *tr, int model, char typeOfData[1024])
+{
+  switch(tr->partitionData[model].dataType)
+    {
+    case AA_DATA:
+      strcpy(typeOfData,"AA");
+      break;
+    case DNA_DATA:
+      strcpy(typeOfData,"DNA");
+      break;
+    case BINARY_DATA:
+      strcpy(typeOfData,"BINARY/MORPHOLOGICAL");
+      break;
+    case SECONDARY_DATA:
+      strcpy(typeOfData,"SECONDARY 16 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case SECONDARY_DATA_6:
+      strcpy(typeOfData,"SECONDARY 6 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case SECONDARY_DATA_7:
+      strcpy(typeOfData,"SECONDARY 7 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case GENERIC_32:
+      strcpy(typeOfData,"Multi-State");
+      break;
+    case GENERIC_64:
+      strcpy(typeOfData,"Codon"); 
+      break;
+    default:
+      assert(0);
+    }
+}
+static void printRatesDNA_BIN(int n, double *r, char **names, char *fileName)
+{
+  int i, j, c;
+
+  for(i = 0, c = 0; i < n; i++)
+    {
+      for(j = i + 1; j < n; j++)
+	{
+	  if(i == n - 2 && j == n - 1)
+	    printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], 1.0);
+	  else
+	    printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], r[c]);
+	  c++;
+	}
+    }
+}
+
+static void printRatesRest(int n, double *r, char **names, char *fileName)
+{
+  int i, j, c;
+
+  for(i = 0, c = 0; i < n; i++)
+    {
+      for(j = i + 1; j < n; j++)
+	{
+	  printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], r[c]);
+	  c++;
+	}
+    }
+}
+static double branchLength(int model, double *z, tree *tr)
+{
+  double x;
+  
+  x = z[model];
+  assert(x > 0);
+  if (x < zmin) 
+    x = zmin;  
+  
+ 
+  assert(x <= zmax);
+  
+  x = -log(x);
+  
+  return x;
+}
+
+
+static double treeLengthRec(nodeptr p, tree *tr, int model)
+{  
+  double 
+    x = branchLength(model, p->z, tr);
+
+  if(isTip(p->number, tr->mxtips))  
+    return x;    
+  else
+    {
+      double acc = 0;
+      nodeptr q;                
+     
+      q = p->next;      
+
+      while(q != p)
+	{
+	  acc += treeLengthRec(q->back, tr, model);
+	  q = q->next;
+	}
+
+      return acc + x;
+    }
+}
+
+static double treeLength(tree *tr, int model)
+{  
+  return treeLengthRec(tr->start->back, tr, model);
+}
+
+static void printFreqs(int n, double *f, char **names, char *fileName)
+{
+  int k;
+
+  for(k = 0; k < n; k++)
+    printBothOpenDifferentFile(fileName, "freq pi(%s): %f\n", names[k], f[k]);
+}
+
+static void printModelParams(tree *tr, analdef *adef, int treeIteration)
+{
+  int
+    model;
+
+  double
+    *f = (double*)NULL,
+    *r = (double*)NULL;
+
+  char 
+    fileName[2048],
+    buf[64];
+  
+  
+  strcpy(fileName, modelFileName);
+
+  if(treeIteration >= 0)
+    {
+      strcat(fileName, ".");
+      sprintf(buf, "%d", treeIteration);
+      strcat(fileName, buf);
+    }
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+      double tl;
+      char typeOfData[1024];
+
+      getDataTypeString(tr, model, typeOfData);      
+
+      printBothOpenDifferentFile(fileName, "\n\n");
+
+      printBothOpenDifferentFile(fileName, "Model Parameters of Partition %d, Name: %s, Type of Data: %s\n",
+				 model, tr->partitionData[model].partitionName, typeOfData);
+      
+      if(tr->rateHetModel == GAMMA)
+	printBothOpenDifferentFile(fileName, "alpha: %f\n", tr->partitionData[model].alpha);
+     
+
+      if(adef->perGeneBranchLengths)
+	tl = treeLength(tr, model);
+      else
+	tl = treeLength(tr, 0);
+
+      printBothOpenDifferentFile(fileName, "Tree-Length: %f\n", tl);
+
+      f = tr->partitionData[model].frequencies;
+      r = tr->partitionData[model].substRates;
+
+      switch(tr->partitionData[model].dataType)
+	{
+	case AA_DATA:
+	  {
+	    char *freqNames[20] = {"A", "R", "N", "D", "C", "Q", "E", "G",
+				   "H", "I", "L", "K", "M", "F", "P", "S",
+				   "T", "W", "Y", "V"};
+
+	     if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+	      {
+		int 
+		  k;
+		
+		for(k = 0; k < 4; k++)
+		  {
+		    printBothOpenDifferentFile(fileName, "LGM %d\n", k);
+		    printRatesRest(20, tr->partitionData[model].substRates_LG4[k], freqNames, fileName);
+		    printBothOpenDifferentFile(fileName, "\n");
+		    printFreqs(20, tr->partitionData[model].frequencies_LG4[k], freqNames, fileName);
+		  }
+	      }
+
+	    printRatesRest(20, r, freqNames, fileName);
+	    printBothOpenDifferentFile(fileName, "\n");
+	    printFreqs(20, f, freqNames, fileName);
+	  }
+	  break;
+	case DNA_DATA:
+	  {
+	    char *freqNames[4] = {"A", "C", "G", "T"};
+
+	    printRatesDNA_BIN(4, r, freqNames, fileName);
+	    printBothOpenDifferentFile(fileName, "\n");
+	    printFreqs(4, f, freqNames, fileName);
+	  }
+	  break;
+	case BINARY_DATA:
+	  {
+	    char *freqNames[2] = {"0", "1"};
+
+	    printRatesDNA_BIN(2, r, freqNames, fileName);
+	    printBothOpenDifferentFile(fileName, "\n");
+	    printFreqs(2, f, freqNames, fileName);
+	  }
+	  break;
+	default:
+	  assert(0);
+	}
+
+      printBothOpenDifferentFile(fileName, "\n");
+    }
+
+  printBothOpenDifferentFile(fileName, "\n");
+}
+
+
+static void finalizeInfoFile(tree *tr, analdef *adef)
+{
+  if(processID == 0)
+    {
+      double t;
+
+      t = gettime() - masterTime;
+      accumulatedTime = accumulatedTime + t;
+
+      switch(adef->mode)
+	{	
+	case  BIG_RAPID_MODE:	 
+	  printBothOpen("\n\nOverall Time for 1 Inference %f\n", t);
+	  printBothOpen("\nOverall accumulated Time (in case of restarts): %f\n\n", accumulatedTime);
+	  printBothOpen("Likelihood   : %f\n", tr->likelihood);
+	  printBothOpen("\n\n");	  	  
+	  printBothOpen("Model parameters written to:           %s\n", modelFileName);
+	  printBothOpen("Final tree written to:                 %s\n", resultFileName);
+	  printBothOpen("Execution Log File written to:         %s\n", logFileName);
+	  printBothOpen("Execution information file written to: %s\n",infoFileName);	
+	  break;
+	case TREE_EVALUATION:	
+	  printBothOpen("\n\nOverall Time for evaluating the likelihood of %d trees: %f secs\n\n", tr->numberOfTrees, t); 
+	  printBothOpen("\n\nThe model parameters of the trees have been written to files called %s.i\n", modelFileName);
+	  printBothOpen("where i is the number of the tree\n\n");
+	  printBothOpen("Note that, in case of a restart from a checkpoint, some tree model files will have been produced by previous runs!\n\n");
+	  printBothOpen("The trees with branch lengths have been written to file: %s\n", treeFileName);
+	  printBothOpen("They are in the same order as in the input file!\n\n");
+	  break;
+	case QUARTET_CALCULATION:
+	  printBothOpen("\n\nOverall quartet computation time: %f secs\n", t);
+	  printBothOpen("\nAll quartets and corresponding likelihoods written to file %s\n", quartetFileName);
+	  break;
+	default:
+	  assert(0);
+	}
+
+	 
+    }
+
+}
+
+
+/************************************************************************************/
+
+
+static int iterated_bitcount(unsigned int n)
+{
+    int 
+      count=0;    
+    
+    while(n)
+      {
+        count += n & 0x1u ;    
+        n >>= 1 ;
+      }
+    
+    return count;
+}
+
+/*static char bits_in_16bits [0x1u << 16];*/
+
+static void compute_bits_in_16bits(char *bits_in_16bits)
+{
+    unsigned int i;    
+    
+    /* size is 65536 */
+
+    for (i = 0; i < (0x1u<<16); i++)
+        bits_in_16bits[i] = iterated_bitcount(i);       
+
+    return ;
+}
+
+unsigned int precomputed16_bitcount (unsigned int n, char *bits_in_16bits)
+{
+  /* works only for 32-bit int*/
+    
+    return bits_in_16bits [n         & 0xffffu]
+        +  bits_in_16bits [(n >> 16) & 0xffffu] ;
+}
+
+
+static void clean_MPI_Exit(void)
+{
+  MPI_Barrier(MPI_COMM_WORLD);
+  MPI_Finalize();
+}
+
+static void error_MPI_Exit(void)
+{
+  MPI_Barrier(MPI_COMM_WORLD);
+  MPI_Finalize();
+
+  exit(1);
+}
+
+
+static void initializePartitions(tree *tr)
+{ 
+  size_t
+    i,
+    len, 
+    j,    
+    width;
+
+  int
+    model, 
+    maxCategories;
+
+  compute_bits_in_16bits(tr->bits_in_16bits);
+
+  maxCategories = tr->maxCategories;
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {    
+      const partitionLengths 
+	*pl = getPartitionLengths(&(tr->partitionData[model])); 
+
+      //must already be set as a consequence of alloc in function readPartitions
+      //and the subsequent copy of bf->partitions into tr->partitions!
+      assert(tr->partitionData[model].partitionName != (char*)NULL);
+
+      //printf("Partition name %s\n", tr->partitionData[model].partitionName);
+
+      width = tr->partitionData[model].width;
+	
+      /* 
+	 globalScaler needs to have length 2 * tr->mxtips so that scalers of inner AND tip nodes can be added without a case switch.
+	 To this end, it must also be initialized with zeros -> calloc
+       */
+      
+      len = 2 * tr->mxtips; 
+      tr->partitionData[model].globalScaler       = (unsigned int *)calloc(len, sizeof(unsigned int));
+
+#ifdef _USE_OMP
+      tr->partitionData[model].threadGlobalScaler = (unsigned int**) calloc(tr->nThreads, sizeof(unsigned int*));
+
+      tr->partitionData[model].reductionBuffer 	  = (double*) calloc(tr->nThreads, sizeof(double));
+      tr->partitionData[model].reductionBuffer2   = (double*) calloc(tr->nThreads, sizeof(double));
+
+      int 
+	t;
+      
+      for (t = 0; t < tr->maxThreadsPerModel; ++t)
+	{
+	  Assign*
+	    pAss = tr->partThreadAssigns[model * tr->maxThreadsPerModel + t];
+
+	  if (pAss)
+	    {
+	      int
+		tid = pAss->procId;
+
+	      tr->partitionData[model].threadGlobalScaler[tid]    = (unsigned int *)calloc(len, sizeof(unsigned int));
+	    }
+	}
+#endif
+
+      tr->partitionData[model].left              = (double *)malloc_aligned(pl->leftLength * (maxCategories + 1) * sizeof(double));
+      tr->partitionData[model].right             = (double *)malloc_aligned(pl->rightLength * (maxCategories + 1) * sizeof(double));
+      tr->partitionData[model].EIGN              = (double*)malloc(pl->eignLength * sizeof(double));
+      tr->partitionData[model].EV                = (double*)malloc_aligned(pl->evLength * sizeof(double));
+      tr->partitionData[model].EI                = (double*)malloc(pl->eiLength * sizeof(double));
+      
+      tr->partitionData[model].substRates        = (double *)malloc(pl->substRatesLength * sizeof(double));
+
+
+      //must already be set as a consequence of alloc in function readPartitions
+      //and the subsequent copy of bf->partitions into tr->partitions!
+      assert(tr->partitionData[model].frequencies != (double*)NULL);
+      //tr->partitionData[model].frequencies       = (double*)malloc(pl->frequenciesLength * sizeof(double));
+
+     
+
+      tr->partitionData[model].freqExponents     = (double*)malloc(pl->frequenciesLength * sizeof(double));
+      tr->partitionData[model].empiricalFrequencies       = (double*)malloc(pl->frequenciesLength * sizeof(double));
+      tr->partitionData[model].tipVector         = (double *)malloc_aligned(pl->tipVectorLength * sizeof(double));
+
+
+      if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)      
+	{	  	  
+	  int 
+	    k;
+	  
+	  for(k = 0; k < 4; k++)
+	    {	    
+	      tr->partitionData[model].rawEIGN_LG4[k]              = (double*)malloc(pl->eignLength * sizeof(double));
+	      tr->partitionData[model].EIGN_LG4[k]              = (double*)malloc(pl->eignLength * sizeof(double));
+	      tr->partitionData[model].EV_LG4[k]                = (double*)malloc_aligned(pl->evLength * sizeof(double));
+	      tr->partitionData[model].EI_LG4[k]                = (double*)malloc(pl->eiLength * sizeof(double));
+	      tr->partitionData[model].substRates_LG4[k]        = (double *)malloc(pl->substRatesLength * sizeof(double));
+	      tr->partitionData[model].frequencies_LG4[k]       = (double*)malloc(pl->frequenciesLength * sizeof(double));
+	      tr->partitionData[model].tipVector_LG4[k]         = (double *)malloc_aligned(pl->tipVectorLength * sizeof(double));
+	    }
+	}
+
+
+      tr->partitionData[model].symmetryVector    = (int *)malloc(pl->symmetryVectorLength  * sizeof(int));
+      tr->partitionData[model].frequencyGrouping = (int *)malloc(pl->frequencyGroupingLength  * sizeof(int));
+      
+      tr->partitionData[model].perSiteRates      = (double *)malloc(sizeof(double) * tr->maxCategories);
+            
+      //      tr->partitionData[model].nonGTR = FALSE; 
+      //      tr->partitionData[model].optimizeBaseFrequencies = FALSE; 
+      
+
+      //tr->partitionData[model].gammaRates = (double*)malloc(sizeof(double) * 4);
+
+      tr->partitionData[model].xVector = (double **)malloc(sizeof(double*) * tr->mxtips);   
+      	
+      for(j = 0; j < (size_t)tr->mxtips; j++)	        	  	  	  	 
+	  tr->partitionData[model].xVector[j]   = (double*)NULL;   
+
+      tr->partitionData[model].xSpaceVector = (size_t *)calloc(tr->mxtips, sizeof(size_t));  
+
+#ifdef __MIC_NATIVE
+      tr->partitionData[model].mic_EV                = (double*)malloc_aligned(4 * pl->evLength * sizeof(double));
+      tr->partitionData[model].mic_tipVector         = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+      tr->partitionData[model].mic_umpLeft           = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+      tr->partitionData[model].mic_umpRight           = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+
+      /* for Xeon Phi, sumBuffer must be padded to a multiple of 8 (because of site blocking in kernels) */
+      const int padded_width = GET_PADDED_WIDTH(width);
+      const int span = (size_t)(tr->partitionData[model].states) *
+              discreteRateCategories(tr->rateHetModel);
+
+      tr->partitionData[model].sumBuffer = (double *)malloc_aligned(padded_width *
+									   span * sizeof(double));
+
+      /* fill padding entries with 1. (will be corrected for with zero site weights in wgt) */
+      {
+          int k;
+          for (k = width*span; k < padded_width*span; ++k)
+              tr->partitionData[model].sumBuffer[k] = 1.;
+      }
+#else
+      tr->partitionData[model].sumBuffer = (double *)malloc_aligned(width *
+									   (size_t)(tr->partitionData[model].states) *
+									   discreteRateCategories(tr->rateHetModel) *
+									   sizeof(double));
+#endif
+
+      /* tr->partitionData[model].wgt = (int *)malloc_aligned(width * sizeof(int));	   */
+
+      /* rateCategory must be assigned using calloc(): at start-up there is only one rate category, 0, for all sites */
+
+      if(width > 0 && tr->saveMemory)
+	{
+	  tr->partitionData[model].gapVectorLength = ((int)width / 32) + 1;
+	  
+	  len = tr->partitionData[model].gapVectorLength * 2 * tr->mxtips; 
+	  tr->partitionData[model].gapVector = (unsigned int*)calloc(len, sizeof(unsigned int));	  	    	  	  
+	    
+	  tr->partitionData[model].gapColumn = (double *)malloc_aligned(((size_t)tr->mxtips) *								      
+									       ((size_t)(tr->partitionData[model].states)) *
+									       discreteRateCategories(tr->rateHetModel) * sizeof(double));
+	}
+      else
+	{
+	   tr->partitionData[model].gapVectorLength = 0;
+	    
+	   tr->partitionData[model].gapVector = (unsigned int*)NULL; 	  	    	   
+	    
+	   tr->partitionData[model].gapColumn = (double*)NULL;	    	    	   
+	}              
+    }
+
+
+  /* set up the averaged frac changes per partition such that no further reading accesses to aliaswgt are necessary
+     and we can free the array for the GAMMA model */
+ 
+  {      
+    /* definitions: 
+       sizeof(short) <= sizeof(int) <= sizeof(long)
+       size_t defined by address space (here: 64 bit). 
+       
+       size_t + MPI is a bad idea: in the mpi2.2 standard, they do
+	 not mention it once.
+    */
+
+    unsigned long 
+      *modelWeights = (unsigned long*) calloc(tr->NumberOfModels, sizeof(unsigned long)); 
+    
+    size_t
+      wgtsum = 0;  
+
+    /* determine my weights per partition    */
+    for(model = 0; model < tr->NumberOfModels; model++)      
+      {
+	const pInfo 
+	  partition =  tr->partitionData[model] ; 
+	   
+	size_t 
+	  i = 0; 
+	   
+	for(i = 0; i < partition.width; ++i)
+	  modelWeights[model] += (unsigned long) partition.wgt[i]; 
+      }
+    MPI_Allreduce(MPI_IN_PLACE, modelWeights, tr->NumberOfModels, MPI_UNSIGNED_LONG, MPI_SUM, MPI_COMM_WORLD); 
+       
+    /* determine sum */
+    for(model = 0; model < tr->NumberOfModels; ++model)
+      wgtsum += modelWeights[model]; 
+
+    for(model = 0; model < tr->NumberOfModels; model++)      	
+      {
+	tr->partitionWeights[model]       = (double)modelWeights[model];
+	tr->partitionContributions[model] = ((double)modelWeights[model]) / ((double)wgtsum); 
+      }
+       
+    free(modelWeights);
+  }
+
+  /* initialize gap bit vectors at tips when memory saving option is enabled */
+  
+  if(tr->saveMemory)
+    {
+      for(model = 0; model <tr->NumberOfModels; model++)
+	{
+	  int        
+	    undetermined = getUndetermined(tr->partitionData[model].dataType);
+	  	 
+	  width =  tr->partitionData[model].width;
+	    
+	  if(width > 0)
+	    {	   	    	      	    	     
+	      for(j = 1; j <= (size_t)(tr->mxtips); j++)
+		for(i = 0; i < width; i++)
+		  if(tr->partitionData[model].yVector[j][i] == undetermined)
+		    tr->partitionData[model].gapVector[tr->partitionData[model].gapVectorLength * j + i / 32] |= mask32[i % 32];	    
+	    }     
+	}
+    }
+}
+
+
+
+static void initializeTree(tree *tr, analdef *adef)
+{
+  size_t 
+    i ;
+
+  if(adef->perGeneBranchLengths)
+    tr->numBranches = tr->NumberOfModels;
+  else
+    tr->numBranches = 1;
+
+
+  if(NUM_BRANCHES < tr->numBranches)
+    {
+      if(processID == 0 )
+	printf("You have specified per-partition branch lengths (-M option) with %d  models. \n\
+Please set #define NUM_BRANCHES in axml.h to %d (or higher) and recompile %s\n", 
+	       tr->NumberOfModels,tr->NumberOfModels, programName );
+      error_MPI_Exit();
+    }
+
+
+  /* If we use the RF-based convergence criterion we will need to allocate some hash tables.
+     let's not worry about this right now, because it is indeed ExaML-specific */
+
+  tr->executeModel   = (boolean *)calloc( tr->NumberOfModels, sizeof(boolean));
+  
+  for(i = 0; i < (size_t)tr->NumberOfModels; i++)
+    tr->executeModel[i] = TRUE;
+  
+  setupTree(tr); 
+  
+  if(tr->searchConvergenceCriterion && processID == 0)
+    {                     
+      tr->bitVectors = initBitVector(tr->mxtips, &(tr->vLength));
+      tr->h = initHashTable(tr->mxtips * 4);     
+    }
+  
+  for(i = 1; i <= (size_t)tr->mxtips; i++)
+    addword(tr->nameList[i], tr->nameHash, i);
+   
+  initializePartitions(tr);
+
+  initModel(tr);
+}
+
+
+static int getNumberOfTrees(char *fileName, boolean getOffsets, exa_off_t *treeOffsets)
+{
+  FILE 
+    *f = myfopen(fileName, "r");
+
+  int 
+    trees = 0,
+    ch;
+
+  if(getOffsets)
+    treeOffsets[trees] = 0;
+
+  while((ch = fgetc(f)) != EOF)
+    {
+      if(ch == ';')
+	{
+	  trees++;
+	  if(getOffsets)	    
+	    treeOffsets[trees] = exa_ftell(f) + 1;	 	      	   
+	}
+    }
+
+  assert(trees > 0);
+
+  fclose(f);
+
+  return trees;
+}
+
+static void optimizeTrees(tree *tr, analdef *adef)
+{
+  exa_off_t
+    *treeOffsets;
+
+  int 
+    i;   
+
+  tr->numberOfTrees = getNumberOfTrees(tree_file, FALSE, (exa_off_t *)NULL);
+  
+  if(processID == 0)
+    accumulatedTime = 0.0;
+
+  treeOffsets = (exa_off_t *)malloc(sizeof(exa_off_t) * (tr->numberOfTrees + 1));
+
+  tr->likelihoods = (double *)malloc(sizeof(double) * tr->numberOfTrees);
+  tr->treeStrings = (char   *)malloc(sizeof(char) * (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees);
+
+  getNumberOfTrees(tree_file, TRUE, treeOffsets);
+  
+  if(processID == 0)   
+    printBothOpen("\n\nFound %d trees to evaluate\n\n", tr->numberOfTrees);
+  
+  i = 0;
+
+  if(adef->useCheckpoint)
+    {      
+      restart(tr, adef);       		   	    
+	  
+      i = ckp.treeIteration;
+	       
+      if(tr->fastTreeEvaluation && i > 0)	
+	treeEvaluate(tr, 2);	
+      else
+	modOpt(tr, 0.1, adef, i);
+      
+      tr->likelihoods[i] = tr->likelihood;
+      Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+      memcpy(&(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]), tr->tree_string, sizeof(char) * tr->treeStringLength);
+      
+
+      if(processID == 0)
+	printModelParams(tr, adef, i);
+      
+      i++;
+    }
+       
+  for(; i < tr->numberOfTrees; i++)
+    {     
+      FILE 
+	*treeFile = myfopen(tree_file, "rb");
+	    
+      if(exa_fseek(treeFile, treeOffsets[i], SEEK_SET) != 0)
+	assert(0);
+
+      tr->likelihood = unlikely;
+   
+      treeReadLen(treeFile, tr, FALSE, FALSE, FALSE);
+               
+      fclose(treeFile);
+ 
+      tr->start = tr->nodep[1];     
+	  
+      if(i > 0)
+	resetBranches(tr);
+      
+      evaluateGeneric(tr, tr->start, TRUE);	
+      	  
+      if(tr->fastTreeEvaluation && i > 0)
+	{
+	  ckp.state = MOD_OPT;	  	 
+
+	  ckp.treeIteration = i;
+	  
+	  writeCheckpoint(tr, adef);
+
+	  treeEvaluate(tr, 2);
+	}
+      else
+	{
+	  treeEvaluate(tr, 1);      
+	  modOpt(tr, 0.1, adef, i);
+	}
+      
+      tr->likelihoods[i] = tr->likelihood;
+      Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+      memcpy(&(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]), tr->tree_string, sizeof(char) * tr->treeStringLength);
+
+      if(processID == 0)
+	printModelParams(tr, adef, i);
+    }
+
+  if(processID == 0)
+    {
+      FILE 
+	*f = myfopen(treeFileName, "w");
+      
+      for(i = 0; i < tr->numberOfTrees; i++)
+	{
+	  printBothOpen("Likelihood tree %d: %f \n", i, tr->likelihoods[i]);    
+	  fprintf(f, "%s", &(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]));
+	}
+      
+      fclose(f);
+    }
+}
+
+
+
+
+
+
+
+
+
+
+static void readByteFile (tree *tr, int commRank, int commSize )
+{
+  /* read stuff that is cheap; do not change the order! */
+  ByteFile 
+    *bFile = NULL; 
+  
+  initializeByteFile(&bFile, byteFileName); 
+  readHeader(bFile);
+  readTaxa(bFile);
+  readPartitions(bFile); 
+
+  /* calculate optimal distribution of data */
+  PartitionAssignment 
+    *pAss = NULL; 
+  
+  initializePartitionAssignment(&pAss, bFile->partitions, bFile->numPartitions, commSize); 
+  assign(pAss);
+
+  if(commRank == 0 )
+    {
+      printf("\n"); 
+      printAssignments(pAss); 
+      printf("\n"); 
+      printLoad(pAss); 
+      printf("\n");
+    }
+
+  /* now the data of this process is in this struct */
+  readMyData(bFile,pAss, commRank );
+
+  /* carry over the information to the tree */
+  initializeTreeFromByteFile(bFile, tr); 
+  
+  /* just fills up tr->partAssigns that contains the representation of
+     the assignment that we will need */
+  copyAssignmentInfoToTree(pAss, tr);
+
+  deletePartitionAssignment(pAss);
+  deleteByteFile(bFile);
+}
+
+#ifdef _USE_OMP
+void allocateXVectors(tree* tr)
+{
+  nodeptr
+    p = tr->start,
+    q = p->back;
+
+  tr->td[0].ti[0].pNumber = p->number;
+  tr->td[0].ti[0].qNumber = q->number;
+
+  tr->td[0].count = 1;
+
+  computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, FALSE);
+
+  traversalInfo
+    *ti = tr->td[0].ti;
+
+  int
+    i,
+    model;
+
+  for(i = 1; i < tr->td[0].count; i++)
+    {
+      traversalInfo *tInfo = &ti[i];
+
+      /* now loop over all partitions for nodes p, q, and r of the current traversal vector entry */
+
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  /* printf("new view on model %d with width %d\n", model, width);  */
+
+	  size_t
+	    width  = (size_t)tr->partitionData[model].width;
+
+	  double
+	    *x3_start = tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1];
+
+	  size_t
+	    rateHet = discreteRateCategories(tr->rateHetModel),
+
+	    /* get the number of states in the data stored in partition model */
+
+	    states = (size_t)tr->partitionData[model].states,
+
+	    /* get the length of the current likelihood array stored at node p. This is
+	       important mainly for the SEV-based memory saving option described in here:
+
+	       F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees".
+
+	       So tr->partitionData[model].xSpaceVector[i] provides the length of the allocated conditional array of partition model
+	       and node i
+	    */
+
+	    availableLength = tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)],
+	    requiredLength = 0;
+
+	  /* memory saving stuff, not important right now, but if you are interested ask Fernando */
+
+	  if(tr->saveMemory)
+	    {
+	      size_t
+		j,
+		setBits = 0;
+
+	      unsigned int
+		*x1_gap = &(tr->partitionData[model].gapVector[tInfo->qNumber * tr->partitionData[model].gapVectorLength]),
+		*x2_gap = &(tr->partitionData[model].gapVector[tInfo->rNumber * tr->partitionData[model].gapVectorLength]),
+		*x3_gap = &(tr->partitionData[model].gapVector[tInfo->pNumber * tr->partitionData[model].gapVectorLength]);
+
+	      for(j = 0; j < (size_t)tr->partitionData[model].gapVectorLength; j++)
+		{
+		  x3_gap[j] = x1_gap[j] & x2_gap[j];
+		  setBits += (size_t)(precomputed16_bitcount(x3_gap[j], tr->bits_in_16bits));
+		}
+
+	      requiredLength = (width - setBits)  * rateHet * states * sizeof(double);
+	    }
+	  else
+	    /* if we are not trying to save memory the space required to store an inner likelihood array
+	       is the number of sites in the partition times the number of states of the data type in the partition
+	       times the number of discrete GAMMA rates (1 for CAT essentially) times 8 bytes */
+	    requiredLength  =  width * rateHet * states * sizeof(double);
+
+	  /* Initially, even when not using memory saving, no space is allocated for inner likelihood arrays, hence
+	     availableLength will be zero the very first time we traverse the tree.
+	     Hence we need to allocate something here */
+
+	  if(requiredLength != availableLength)
+	    {
+	      /* if there is a vector of incorrect length assigned here i.e., x3 != NULL we must free
+		 it first */
+	      if(x3_start)
+		free(x3_start);
+
+	      /* allocate memory: note that here we use a byte-boundary aligned malloc, because we need the vectors
+		 to be aligned at 16 BYTE (SSE3) or 32 BYTE (AVX) boundaries! */
+
+	      x3_start = (double*)malloc_aligned(requiredLength);
+
+	      /* update the data structures for consistent bookkeeping */
+	      tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] = x3_start;
+	      tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)] = requiredLength;
+	    }
+	} // for model
+    } // for traversal
+}
+
+void assignPartitionsToThreads(tree *tr, int commRank)
+{
+  pInfo** rankPartitions = (pInfo **)calloc(tr->NumberOfModels, sizeof(pInfo*) );
+  int i;
+  for (i = 0; i < tr->NumberOfModels; ++i)
+  {
+    rankPartitions[i] = (pInfo *)calloc(1, sizeof(pInfo));
+    rankPartitions[i]->lower = 0;
+    rankPartitions[i]->upper = tr->partitionData[i].width;
+    rankPartitions[i]->width = rankPartitions[i]->upper;
+    rankPartitions[i]->states = tr->partitionData[i].states;
+  }
+
+  PartitionAssignment *pAss = NULL;
+  initializePartitionAssignment(&pAss, rankPartitions, tr->NumberOfModels, tr->nThreads);
+
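+  /* express partition widths in blocks of VECTOR_PADDING sites (rounded up) before computing the assignment */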
+  for(i = 0; i < pAss->numPartitions; ++i)
+    {
+      Partition
+	*p = pAss->partitions + i;
+      p->width = (int) ceil((double) p->width / (double) VECTOR_PADDING);
+    }
+  assign(pAss);
+
+  /* Align partition sizes to the boundary (needed for site-blocking on the MIC) */
+  int j;
+  for(i = 0; i < pAss->numProc; ++i)
+    {
+      for(j = 0; j < pAss->numAssignPerProc[i] ; ++j)
+	{
+	  Assignment *a = &pAss->assignPerProc[i][j];
+	  a->offset *= VECTOR_PADDING;
+	  a->width *= VECTOR_PADDING;
+
+	  /* adjust width of last chunk -> must NOT include padding */
+	  size_t realWidth = rankPartitions[a->partId]->width;
+	  if (a->offset + a->width > realWidth)
+	    a->width = realWidth - a->offset;
+	}
+    }
+
+  printf("Partition assignments to threads: \n");
+  printAssignments(pAss);
+  printf("\n");
+  printLoad(pAss);
+  printf("\n");
+
+  copyThreadAssignmentInfoToTree(pAss, tr);
+
+  deletePartitionAssignment(pAss);
+  for (i = 0; i < tr->NumberOfModels; ++i)
+    free(rankPartitions[i]);
+  free(rankPartitions);
+}
+#endif
+
+
+int main (int argc, char *argv[])
+{ 
+  MPI_Init(&argc, &argv);
+  MPI_Comm_rank(MPI_COMM_WORLD, &processID);
+  MPI_Comm_size(MPI_COMM_WORLD, &processes);
+  printf("\nThis is ExaML FINE-GRAIN MPI Process Number: %d\n", processID);   
+  MPI_Barrier(MPI_COMM_WORLD);
+  
+  {
+    tree  
+      *tr = (tree*)malloc(sizeof(tree));
+  
+    analdef 
+      *adef = (analdef*)malloc(sizeof(analdef));   
+
+    /* 
+       tell the CPU to ignore exceptions generated by denormalized floating point values.
+       If this is not done, depending on the input data, the likelihood functions can exhibit 
+       substantial run-time differences for vectors of equal length.
+    */
+    
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+    _mm_setcsr( _mm_getcsr() | _MM_FLUSH_ZERO_ON);
+#endif   
+
+  /* get the start time */
+   
+    masterTime = gettime();         
+    
+  /* initialize the analysis parameters in struct adef to default values */
+    
+    initAdef(adef);
+
+  /* parse command line arguments: this has a side effect on tr struct and adef struct variables */
+  
+    get_args(argc, argv, adef, tr); 
+  
+  /* generate the ExaML output file names and store them in strings */
+    
+    makeFileNames();
+
+#ifdef _USE_OMP
+    if(tr->saveMemory)
+      {
+	printBothOpen("\nError: Memory saving option \"-S\" is not supported by the OpenMP version of ExaML!\n\n");
+	error_MPI_Exit();
+      }
+#endif
+
+    readByteFile(tr, processID, processes );
+
+#ifdef _USE_OMP
+    tr->nThreads = omp_get_max_threads();
+    assignPartitionsToThreads(tr, processID);
+#endif
+
+    initializeTree(tr, adef);
+
+    if(processID == 0)  
+      {	
+	printModelAndProgramInfo(tr, adef, argc, argv);  
+	printBothOpen("Memory Saving Option: %s\n", (tr->saveMemory == TRUE)?"ENABLED":"DISABLED");   	             
+      }
+	
+    /* do some error checks for the LG4 model and the binary models and the MIC and exit gracefully */
+
+    {
+      int 
+	countBinary = 0,
+	countLG4 = 0,
+	model;
+	
+#ifdef __MIC_NATIVE
+      if(tr->saveMemory)
+	{
+	  printBothOpen("Error: There is no MIC support yet for the memory saving option \"-S\"!\n\n");	  
+	  error_MPI_Exit();  	      
+	}
+      
+      if(tr->rateHetModel == CAT)
+	{
+	  printBothOpen("Error: There is no MIC support yet for the PSR model!\n\n");	  
+	  error_MPI_Exit(); 
+	}
+#endif
+
+
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  if(tr->partitionData[model].protModels == LG4M ||  tr->partitionData[model].protModels == LG4X)
+	    countLG4++;
+	  if(tr->partitionData[model].states == 2)
+	    countBinary++;
+	}
+
+      if(countLG4 > 0)
+	{
+	  if(tr->saveMemory == TRUE)
+	    {
+	      printBothOpen("Error: the LG4 substitution model does not work in combination with the \"-S\" memory saving flag!\n\n");	  
+	      error_MPI_Exit();
+	    }
+
+	  if(tr->rateHetModel == CAT)
+	    {
+	      printBothOpen("Error: the LG4 substitution model does not work for proportion of invariable sites estimates!\n\n");
+	      error_MPI_Exit();
+	    }
+	}
+
+      if(countBinary > 0)
+	{
+	  if(tr->saveMemory == TRUE)
+	    {
+	      printBothOpen("Error: Binary data partitions can not be used in combination with the \"-S\" memory saving flag!\n\n");	  
+	      error_MPI_Exit();
+	    }
+	  
+#ifdef __MIC_NATIVE
+	  printBothOpen("Error: There is no MIC support yet for binary data partitions!\n\n");	  
+	  error_MPI_Exit();  	      
+#endif
+	}
+    }
+	             
+    /* 
+       this will re-start ExaML exactly where it has left off from a checkpoint file,
+       while checkpointing is important and has to be implemented for the library we should not worry about this right now 
+    */
+  
+   
+
+    switch(adef->mode)
+      {
+      case TREE_EVALUATION:
+	optimizeTrees(tr, adef);	
+	break;
+      case BIG_RAPID_MODE:
+	if(adef->useCheckpoint)
+	  {      
+	    /* read checkpoint file */
+	    restart(tr, adef);       	
+	    
+	    /* continue tree search where we left it off */
+	    computeBIGRAPID(tr, adef, TRUE); 
+
+	    /* now print the model parameters to file */
+	    if(processID == 0)
+	      printModelParams(tr, adef, -1);
+	  }
+	else
+	  {
+	    /* not important, only used to keep track of total accumulated exec time 
+	       when checkpointing and restarts were used */
+	    
+	    if(processID == 0)
+	      accumulatedTime = 0.0;
+	    
+	    /* get the starting tree: here we just parse the tree passed via the command line 
+	   and do an initial likelihood computation traversal 
+	   which we maybe should skip, TODO */
+	    
+	    getStartingTree(tr); 
+	    
+#ifdef _USE_OMP
+	    allocateXVectors(tr);
+#endif
+
+	    /* 
+	       here we do an initial full tree traversal on the starting tree using the Felsenstein pruning algorithm 
+	       This should basically be the first call to the library that actually computes something :-)
+	    */
+	    
+	    evaluateGeneric(tr, tr->start, TRUE);	       	  
+
+	    /* the treeEvaluate() function repeatedly iterates over the entire tree to optimize branch lengths until convergence */
+	    
+	    treeEvaluate(tr, 1); 
+
+	    /* now start the ML search algorithm */
+	    
+	    computeBIGRAPID(tr, adef, TRUE); 			     
+
+	    /* now print the model parameters to file */
+	    if(processID == 0)
+	      printModelParams(tr, adef, -1);
+	    
+	  }         
+	break;
+      case QUARTET_CALCULATION:	 
+	computeQuartets(tr, adef);          
+	break;
+      default:
+	assert(0);
+      }
+      
+    /* print some more nonsense into the ExaML_info file */
+  
+    if(processID == 0)
+      finalizeInfoFile(tr, adef);
+  }
+  
+  /* return 0 which means that our unix program terminated correctly, the return value is not 1 here */
+
+  clean_MPI_Exit();
+
+  return 0;
+}
+
+
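The block arithmetic in assignPartitionsToThreads() above can be sketched in isolation. This is a minimal illustration, not the ExaML code: PAD stands in for VECTOR_PADDING, and the chunk type and both function names are hypothetical. It assumes, as the original does, that partition widths are first converted to whole blocks, the assignment is computed in blocks, and the result is scaled back to sites, with the last chunk trimmed so it excludes padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VECTOR_PADDING (8 in the __MIC_NATIVE build). */
#define PAD 8

/* Illustrative type: a run of sites given as start offset and length. */
typedef struct
{
  size_t offset;
  size_t width;
} chunk;

/* Number of PAD-sized blocks needed to cover `width` sites (integer ceiling,
   so no floating-point rounding is involved). */
static size_t blocksForWidth(size_t width)
{
  return (width + PAD - 1) / PAD;
}

/* Scale a chunk given in blocks back to sites and trim it to the real
   partition width. */
static chunk blocksToSites(size_t blockOffset, size_t blockCount, size_t realWidth)
{
  chunk c;

  c.offset = blockOffset * PAD;
  c.width  = blockCount * PAD;

  /* a chunk must start inside the partition, otherwise the subtraction
     below would wrap around */
  assert(c.offset <= realWidth);

  /* the last chunk must not include the padding sites */
  if(c.offset + c.width > realWidth)
    c.width = realWidth - c.offset;

  return c;
}
```

For a partition of 21 sites with PAD set to 8, blocksForWidth(21) gives 3 blocks. A thread that receives the third block, blocksToSites(2, 1, 21), ends up with offset 16 and width 5 rather than 8.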
diff --git a/examl/axml.h b/examl/axml.h
new file mode 100644
index 0000000..95e90c1
--- /dev/null
+++ b/examl/axml.h
@@ -0,0 +1,1418 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
+ *  with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef _AXML_H
+#define _AXML_H
+
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include "../versionHeader/version.h"
+
+#ifdef __MIC_NATIVE
+#define BYTE_ALIGNMENT 64
+#define VECTOR_PADDING 8
+#elif defined __AVX
+#define BYTE_ALIGNMENT 32
+#define VECTOR_PADDING 1
+#else
+#define BYTE_ALIGNMENT 16
+#define VECTOR_PADDING 1
+#endif
+
+#define GET_PADDED_WIDTH(w) ((w) % VECTOR_PADDING == 0 ? (w) : (w) + (VECTOR_PADDING - ((w) % VECTOR_PADDING)))
+
+#include <mpi.h>
+
+#ifdef _USE_OMP
+#include "omp.h"
+#endif
+
+/* BEGIN: file streams */
+#ifdef _GNU_SOURCE
+
+/* notice, that the gnu source macro implies posix compliance */
+
+
+/* these are posix compliant functions. They potentially work on files
+   larger than 2 GB (for gcc this can be ensured using the following
+   macro) */
+#define exa_fseek fseeko
+#define exa_ftell ftello 
+#define exa_off_t off_t
+
+/* only useful for ftello/fseeko: ensure that we are using 64-bit
+   types for representing an offset */
+#define _FILE_OFFSET_BITS 64 
+
+#else 
+
+#define exa_fseek fseek 
+#define exa_ftell ftell 
+#define exa_off_t long 
+
+#endif
+/* END: file streams  */
+
+
+#define MAX_TIP_EV     0.999999999 /* max tip vector value, sum of EVs needs to be smaller than 1.0, otherwise the numerics break down */
+#define smoothings     32          /* maximum smoothing passes through tree */
+#define iterations     10          /* maximum iterations of makenewz per insert */
+#define newzpercycle   1           /* iterations of makenewz per tree traversal */
+#define nmlngth        256         /* number of characters in species name */
+#define deltaz         0.00001     /* test of net branch length change in update */
+#define defaultz       0.9         /* value of z assigned as starting point */
+#define unlikely       -1.0E300    /* low likelihood for initialization */
+
+
+#define AUTO_ML   0
+#define AUTO_BIC  1
+#define AUTO_AIC  2
+#define AUTO_AICC 3
+
+#define SUMMARIZE_LENGTH -3
+#define SUMMARIZE_LH     -2
+#define NO_BRANCHES      -1
+
+#define MASK_LENGTH 32
+#define GET_BITVECTOR_LENGTH(x) (((x) % MASK_LENGTH) ? ((x) / MASK_LENGTH + 1) : ((x) / MASK_LENGTH))
+
+#define zmin       1.0E-15  /* max branch prop. to -log(zmin) (= 34) */
+#define zmax (1.0 - 1.0E-6) /* min branch prop. to 1.0-zmax (= 1.0E-6) */
+
+#define twotothe256  \
+  115792089237316195423570985008687907853269984665640564039457584007913129639936.0  
+                                                     /*  2**256 (exactly)  */
+
+#define minlikelihood  (1.0/twotothe256)
+#define minusminlikelihood -minlikelihood
+
+
+
+
+/* 18446744073709551616.0 */
+
+/*4294967296.0*/
+
+/* 18446744073709551616.0 */
+
+/*  2**64 (exactly)  */
+/* 4294967296 2**32 */
+
+#define badRear         -1
+
+#define NUM_BRANCHES     256
+
+#define TRUE             1
+#define FALSE            0
+
+
+
+#define LIKELIHOOD_EPSILON 0.0000001
+
+#define AA_SCALE 10.0
+#define AA_SCALE_PLUS_EPSILON 10.001
+
+/* ALPHA_MIN is critical -> numerical instability, eg for 4 discrete rate cats                    */
+/* and alpha = 0.01 the lowest rate r_0 is                                                        */
+/* 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487             */
+/* which leads to numerical problems Table for alpha settings below:                              */
+/*                                                                                                */
+/* 0.010000 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487    */
+/* 0.010000 yielded nasty numerical bugs in at least one case !                                   */
+/* 0.020000 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875    */
+/* 0.030000 0.00000000000000000000476844846859006690412039180149775802624789852441798419292220    */
+/* 0.040000 0.00000000000000049522423236954066431210260930029681736928018820007024736185030633    */
+/* 0.050000 0.00000000000050625351310359203371872643495343928538368616365517027588794007897377    */
+/* 0.060000 0.00000000005134625283884191118711474021861409372524676086868566926568746566772461    */
+/* 0.070000 0.00000000139080650074206434685544624965062437960128249869740102440118789672851562    */
+/* 0.080000 0.00000001650681201563587066858709818343436959153791576682124286890029907226562500    */
+/* 0.090000 0.00000011301977332931251259273962858978301859735893231118097901344299316406250000    */
+/* 0.100000 0.00000052651925834844387815526344648331402709118265192955732345581054687500000000    */
+
+
+#define ALPHA_MIN    0.02
+#define ALPHA_MAX    1000.0
+
+#define RATE_MIN     0.0000001
+#define RATE_MAX     1000000.0
+
+#define INVAR_MIN    0.0001
+#define INVAR_MAX    0.9999
+
+#define TT_MIN       0.0000001
+#define TT_MAX       1000000.0
+
+#define FREQ_MIN     0.001
+
+#define LG4X_RATE_MIN 0.0000001
+#define LG4X_RATE_MAX 1000.0
+
+/* 
+   previous values between 0.001 and 0.000001
+
+   TO AVOID NUMERICAL PROBLEMS WHEN FREQ == 0 IN PARTITIONED MODELS, ESPECIALLY WITH AA 
+   previous value of FREQ_MIN was: 0.000001, but this seemed to cause problems with some 
+   of the 7-state secondary structure models with some rather exotic small toy test datasets,
+   on the other hand 0.001 caused problems with some of the 16-state secondary structure models
+
+   For some reason the frequency settings seem to be repeatedly causing numerical problems
+   
+*/
+
+#define ITMAX 100
+
+
+
+#define SHFT(a,b,c,d)                (a)=(b);(b)=(c);(c)=(d);
+#define SIGN(a,b)                    ((b) > 0.0 ? fabs(a) : -fabs(a))
+
+#define ABS(x)    (((x)<0)   ?  (-(x)) : (x))
+#define MIN(x,y)  (((x)<(y)) ?    (x)  : (y))
+#define MAX(x,y)  (((x)>(y)) ?    (x)  : (y))
+#define NINT(x)   ((int) ((x)>0 ? ((x)+0.5) : ((x)-0.5)))
+
+
+#define LOG(x)  log(x)
+
+#define FABS(x) fabs(x)
+
+
+#define EXP(x)  exp(x)
+
+
+
+
+
+
+#define PointGamma(prob,alpha,beta)  PointChi2(prob,2.0*(alpha))/(2.0*(beta))
+
+//#define programName        "ExaML"
+//#define programVersion     "2.0.3"
+//#define programDate        "June 25 2014"
+
+
+#define  TREE_EVALUATION            0
+#define  BIG_RAPID_MODE             1
+#define  QUARTET_CALCULATION        2
+
+
+#define M_GTRCAT         1
+#define M_GTRGAMMA       2
+#define M_BINCAT         3
+#define M_BINGAMMA       4
+#define M_PROTCAT        5
+#define M_PROTGAMMA      6
+#define M_32CAT          7
+#define M_32GAMMA        8
+#define M_64CAT          9
+#define M_64GAMMA        10
+
+
+#define DAYHOFF    0
+#define DCMUT      1
+#define JTT        2
+#define MTREV      3
+#define WAG        4
+#define RTREV      5
+#define CPREV      6
+#define VT         7
+#define BLOSUM62   8
+#define MTMAM      9
+#define LG         10
+#define MTART      11
+#define MTZOA      12
+#define PMB        13
+#define HIVB       14
+#define HIVW       15
+#define JTTDCMUT   16
+#define FLU        17 
+#define STMTREV    18
+#define AUTO       19
+#define LG4M       20
+#define LG4X       21
+#define GTR        22  /* GTR always needs to be the last one */
+
+#define NUM_PROT_MODELS 23
+
+/* bipartition stuff */
+
+#define BIPARTITIONS_ALL       0
+#define GET_BIPARTITIONS_BEST  1
+#define DRAW_BIPARTITIONS_BEST 2
+#define BIPARTITIONS_BOOTSTOP  3
+#define BIPARTITIONS_RF  4
+
+
+
+/* bootstopping stuff */
+
+#define BOOTSTOP_PERMUTATIONS 100
+#define START_BSTOP_TEST      10
+
+#define FC_THRESHOLD          99
+#define FC_SPACING            50
+#define FC_LOWER              0.99
+#define FC_INIT               20
+
+#define FREQUENCY_STOP 0
+#define MR_STOP        1
+#define MRE_STOP       2
+#define MRE_IGN_STOP   3
+
+#define MR_CONSENSUS 0
+#define MRE_CONSENSUS 1
+#define STRICT_CONSENSUS 2
+
+
+
+/* bootstopping stuff end */
+
+
+#define TIP_TIP     0
+#define TIP_INNER   1
+#define INNER_INNER 2
+
+#define MIN_MODEL        -1
+#define BINARY_DATA      0
+#define DNA_DATA         1
+#define AA_DATA          2
+#define SECONDARY_DATA   3
+#define SECONDARY_DATA_6 4
+#define SECONDARY_DATA_7 5
+#define GENERIC_32       6
+#define GENERIC_64       7
+#define MAX_MODEL        8
+
+#define SEC_6_A 0
+#define SEC_6_B 1
+#define SEC_6_C 2
+#define SEC_6_D 3
+#define SEC_6_E 4
+
+#define SEC_7_A 5
+#define SEC_7_B 6
+#define SEC_7_C 7
+#define SEC_7_D 8
+#define SEC_7_E 9
+#define SEC_7_F 10
+
+#define SEC_16   11
+#define SEC_16_A 12
+#define SEC_16_B 13
+#define SEC_16_C 14
+#define SEC_16_D 15
+#define SEC_16_E 16
+#define SEC_16_F 17
+#define SEC_16_I 18
+#define SEC_16_J 19
+#define SEC_16_K 20
+
+#define ORDERED_MULTI_STATE 0
+#define MK_MULTI_STATE      1
+#define GTR_MULTI_STATE     2
+
+
+
+
+
+#define CAT         0
+#define GAMMA       1
+#define GAMMA_I     2
+
+
+
+typedef  int boolean;
+
+
+typedef struct {
+  double lh;
+  int tree;
+  double weight;
+} elw;
+
+struct ent
+{
+  unsigned int *bitVector;
+  unsigned int *treeVector;
+  unsigned int amountTips;
+  int *supportVector;
+  unsigned int bipNumber;
+  unsigned int bipNumber2;
+  unsigned int supportFromTreeset[2]; 
+  struct ent *next;
+};
+
+typedef struct ent entry;
+
+typedef unsigned int hashNumberType;
+
+
+
+/*typedef uint_fast32_t parsimonyNumber;*/
+
+#define PCF 32
+
+/*
+  typedef uint64_t parsimonyNumber;
+
+  #define PCF 16
+
+
+typedef unsigned char parsimonyNumber;
+
+#define PCF 2
+*/
+
+typedef struct
+{
+  hashNumberType tableSize;
+  entry **table;
+  hashNumberType entryCount;
+}
+  hashtable;
+
+
+struct stringEnt
+{
+  int nodeNumber;
+  char *word;
+  struct stringEnt *next;
+};
+
+typedef struct stringEnt stringEntry;
+ 
+typedef struct
+{
+  hashNumberType tableSize;
+  stringEntry **table;
+}
+  stringHashtable;
+
+
+
+
+
+typedef struct ratec
+{
+  double accumulatedSiteLikelihood;
+  double rate;
+}
+  rateCategorize;
+
+
+typedef struct
+{
+  int tipCase;
+  int pNumber;
+  int qNumber;
+  int rNumber;
+  double qz[NUM_BRANCHES];
+  double rz[NUM_BRANCHES];
+} traversalInfo;
+
+typedef struct
+{
+  traversalInfo *ti;
+  int count;
+  int functionType;
+  boolean traversalHasChanged;
+  boolean *executeModel;
+  double  *parameterValues;
+} traversalData;
+
+
+struct noderec;
+
+
+
+typedef struct
+{
+ 
+
+  unsigned int *vector; 
+  int support;   
+  struct noderec *oP;
+  struct noderec *oQ;
+} branchInfo;
+
+
+
+
+
+
+
+
+typedef struct
+{
+  boolean valid;
+  int partitions;
+  int *partitionList;
+}
+  linkageData;
+
+typedef struct
+{
+  int entries;
+  linkageData* ld;
+}
+  linkageList;
+
+
+typedef  struct noderec
+{
+  double           z[NUM_BRANCHES];
+#ifdef _BAYESIAN 
+  double           z_tmp[NUM_BRANCHES];
+#endif 
+  struct noderec  *next;
+  struct noderec  *back;
+  hashNumberType   hash;
+  int              number;
+  char             x;
+  char             xPars;
+  char             xBips;
+}
+  node, *nodeptr;
+
+typedef struct
+  {
+    double lh;
+    int number;
+  }
+  info;
+
+typedef struct bInf {
+  double likelihood;
+  nodeptr node;
+} bestInfo;
+
+typedef struct iL {
+  bestInfo *list;
+  int n;
+  int valid;
+} infoList;
+
+
+
+typedef unsigned int parsimonyNumber;
+
+
+
+
+typedef struct {
+  int     states;
+  int     maxTipStates;
+  size_t     lower;
+  size_t     upper;
+  size_t     width;
+
+  size_t offset; 		/* NEW: makes the data assigned to
+				   this process identifiable (since we
+				   know that all data from one
+				   partition must be in one contiguous
+				   chunk).  */
+
+  int     dataType;
+  int     protModels;
+  int     autoProtModels;
+  int     protFreqs;
+  boolean nonGTR;
+  boolean optimizeBaseFrequencies;
+  int     numberOfCategories;
+
+  char   *partitionName;
+  int    *symmetryVector;
+  int    *frequencyGrouping;
+    
+  double *sumBuffer; 
+  double gammaRates[4];
+  double *EIGN;
+  double *EV;
+  double *EI;
+  double *left;
+  double *right;
+
+   /* LG4 */
+
+  double *rawEIGN_LG4[4];
+  double *EIGN_LG4[4];
+  double *EV_LG4[4];
+  double *EI_LG4[4];   
+
+  double *frequencies_LG4[4];
+  double *tipVector_LG4[4];
+  double *substRates_LG4[4];
+  
+  /* LG4X */
+
+  double weights[4];
+  double weightExponents[4];
+
+  double weightsBuffer[4];
+  double weightExponentsBuffer[4];
+
+  /* LG4 */
+
+  double *frequencies;
+  double *freqExponents;
+  double *empiricalFrequencies;
+  double *tipVector; 
+  double *substRates;    
+  double *perSiteRates;
+  int    *wgt;			
+  int    *rateCategory;
+  double alpha;
+
+  double          **xVector;
+  size_t           *xSpaceVector;
+  unsigned char   **yVector;
+  unsigned char    *yResource; 	/* contains the entire array, that is referenced in yVector */
+  unsigned int     *globalScaler; 
+
+  int               gapVectorLength;
+  unsigned int     *gapVector;
+  double           *gapColumn; 
+
+  size_t parsimonyLength;
+  parsimonyNumber *parsVect; 
+
+  double *lhs;
+  double *patrat;
+
+#ifdef _USE_OMP
+  /* thread-private data for OMP version */
+  unsigned int **threadGlobalScaler;
+  double *reductionBuffer;
+  double *reductionBuffer2;
+#endif
+
+#ifdef __MIC_NATIVE
+  double *mic_EV;
+  double *mic_tipVector;
+
+  /* these arrays will store the precomputed product of tipVector and left/right P-matrix */
+  double *mic_umpLeft;
+  double *mic_umpRight;
+#endif  
+
+} pInfo;
+
+
+
+typedef struct 
+{
+  int left;
+  int right;
+  double likelihood;
+} lhEntry;
+
+
+typedef struct 
+{
+  int count;
+  int size;
+  lhEntry *entries;
+} lhList;
+
+
+typedef struct List_{
+  void *value; 			
+  struct List_ *next; 
+} List;
+
+
+#define REARR_SETTING 1
+#define FAST_SPRS     2
+#define SLOW_SPRS     3
+#define MOD_OPT       4
+#define QUARTETS      5
+
+typedef struct {
+  boolean useMedian;
+  int saveBestTrees;
+  boolean saveMemory;
+  boolean searchConvergenceCriterion;
+  boolean perGeneBranchLengths; //adef
+  double likelihoodEpsilon; //adef
+  int categories;
+  int mode; //adef
+  int fastTreeEvaluation;
+  boolean initialSet;//adef
+  int initial;//adef
+  int rateHetModel;
+  int autoProteinSelectionType;
+  
+  //quartets 
+  boolean useQuartetGrouping;//adef
+  unsigned long int numberRandomQuartets;//adef
+
+} commandLine;
+
+typedef struct {
+ 
+  int state;
+
+  /* search algorithm */
+
+  unsigned int vLength;
+
+  boolean constraintTree;
+  
+  int rearrangementsMax;
+  int rearrangementsMin;
+  int thoroughIterations;
+  int fastIterations;
+  int treeVectorLength;  
+  int mintrav;
+  int maxtrav;
+  int bestTrav;
+  int    Thorough;
+  int    optimizeRateCategoryInvocations;
+  
+  double accumulatedTime;
+
+  double startLH; 
+  double lh;
+  double previousLh;
+  double difference;
+  double epsilon;
+  
+  boolean impr;
+  boolean cutoff;  
+       
+  double tr_startLH;
+  double tr_endLH;
+  double tr_likelihood;
+  double tr_bestOfNode;
+  
+  double tr_lhCutoff;
+  double tr_lhAVG;
+  double tr_lhDEC;
+  int    tr_NumberOfCategories;
+  int    tr_itCount;  
+  int    tr_doCutoff;
+
+  /* modOpt */
+
+  int catOpt;
+  int treeIteration; 
+  /* quartets */
+
+  long seed;
+  int flavor;   
+  uint64_t quartetCounter;  
+  long filePosition;
+  char quartetFileName[1024];
+  //FILE NAME???
+
+  /* command line settings */
+
+  commandLine cmd;
+  
+} checkPointState;
+
+
+typedef struct {
+  double EIGN[19] __attribute__ ((aligned (BYTE_ALIGNMENT)));             
+  double EV[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));                
+  double EI[380] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double substRates[190];        
+  double frequencies[20] ;      
+  double tipVector[460] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double left[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double right[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+} siteAAModels;
+
+
+typedef struct assign
+{
+  int partitionId;
+  int procId;	     /* to which process is the partition assigned  */
+  size_t offset;     /* what is the offset of this assignment */
+  size_t width; 
+} Assign ; 
+
+
+typedef  struct  {
+
+  int *ti;
+
+  unsigned int randomSeed;
+  boolean constraintTree;
+  boolean useGappedImplementation;
+  boolean saveMemory;  
+  int              saveBestTrees;
+
+  stringHashtable  *nameHash;
+
+  pInfo            *partitionData;
+  
+
+  char             *secondaryStructureInput;
+
+  boolean          *executeModel;
+
+  double           *perPartitionLH;
+
+  traversalData    td[1];
+
+  int              maxCategories;
+  int              categories;
+  
+  double           coreLZ[NUM_BRANCHES];
+  int              numBranches;
+  
+  
+ 
+  branchInfo	   *bInf;
+
+  int              multiStateModel;
+
+
+  boolean curvatOK[NUM_BRANCHES];
+  /* the stuff below is shared among DNA and AA, span does
+     not change depending on datatype */
+
+  /* model stuff end */
+
+  unsigned char             **yVector;
+  int              secondaryStructureModel;
+  size_t           originalCrunchedLength;
+ 
+ 
+  int              *secondaryStructurePairs;
+
+
+  double            *partitionContributions;
+  double            *partitionWeights;
+
+  double            lhCutoff;
+  double            lhAVG;
+  unsigned long     lhDEC;
+  unsigned long     itCount;
+  int               numberOfInvariableColumns;
+  int               weightOfInvariableColumns;
+  int               rateHetModel;
+
+  double           startLH;
+  double           endLH;
+  double           likelihood;
+  
+ 
+  node           **nodep;
+  nodeptr          nodeBaseAddress;
+  node            *start;
+  int              mxtips;  
+
+  int              *constraintVector;
+  int              numberOfSecondaryColumns;
+  boolean          searchConvergenceCriterion;
+  int              ntips;
+  int              nextnode;  
+  int              NumberOfModels;    
+
+  boolean          bigCutoff;
+  boolean          partitionSmoothed[NUM_BRANCHES];
+  boolean          partitionConverged[NUM_BRANCHES];
+  boolean          rooted;
+  boolean          doCutoff;
+ 
+
+
+  double         gapyness;
+
+  char **nameList;
+  char *tree_string;
+  char *treeStrings;
+  char *tree0;
+  char *tree1;
+  int treeStringLength;
+ 
+  unsigned int bestParsimony;
+  unsigned int *parsimonyScore;
+  
+  double bestOfNode;
+  nodeptr removeNode;
+  nodeptr insertNode;
+
+  double zqr[NUM_BRANCHES];
+  double currentZQR[NUM_BRANCHES];
+
+  double currentLZR[NUM_BRANCHES];
+  double currentLZQ[NUM_BRANCHES];
+  double currentLZS[NUM_BRANCHES];
+  double currentLZI[NUM_BRANCHES];
+  double lzs[NUM_BRANCHES];
+  double lzq[NUM_BRANCHES];
+  double lzr[NUM_BRANCHES];
+  double lzi[NUM_BRANCHES];
+
+ 
+ 
+
+
+  unsigned int **bitVectors;
+
+  unsigned int vLength;
+
+  hashtable *h;
+
+  char bits_in_16bits [0x1u << 16];
+  
+  boolean useMedian;
+
+  int autoProteinSelectionType;
+
+  int numberOfTrees;
+
+  double *likelihoods;
+
+  boolean fastTreeEvaluation;
+  
+  int numAssignments; 
+  Assign *partAssigns;
+
+  /** 
+      IMPORTANT: 
+      
+      introducing a few resource pointers. All memory needed, for
+      example for per-partition patrats, is owned by these
+      base pointers; the per-partition pointer just points into the
+      contiguous block of memory.
+      
+      The big advantage, and the reason this is worth it, is that
+      these base pointers can be used to conveniently gather/scatter
+      all data at a master. The master still has to reorder the
+      gathered data, but less copying is necessary at the workers.
+
+      REQUIREMENTS: 
+
+      * all memory necessary for a partition must be in a contiguous
+      block,
+
+      * memory for partitions is ordered by partition id (first
+        partition 1, then partition 2, ...)
+   */ 
+
+  double *patrat_basePtr; 
+  int *rateCategory_basePtr; 
+  double *lhs_basePtr;
+
+#ifdef _USE_OMP
+  /* number of OMP threads*/
+  int nThreads;
+
+  /* maximum number of partitions assigned to a single thread */
+  int maxModelsPerThread;
+
+  /* maximum number of threads assigned to a single partition */
+  int maxThreadsPerModel;
+
+  /* partition-to-threads assignments: indexed by thread */
+  Assign **threadPartAssigns;
+
+  /* partition-to-threads assignments: indexed by partition id */
+  Assign **partThreadAssigns;
+
+#endif
+
+} tree;
+
+
+/***************************************************************/
+
+typedef struct {
+  int partitionNumber;
+  int partitionLength;
+} partitionType;
+
+typedef struct
+{
+  double z[NUM_BRANCHES];
+  nodeptr p, q;
+  int cp, cq;
+}
+  connectRELL, *connptrRELL;
+
+typedef  struct
+{
+  connectRELL     *connect; 
+  int             start;
+  double          likelihood;
+}
+  topolRELL;
+
+
+typedef  struct
+{
+  int max;
+  topolRELL **t;
+}
+  topolRELL_LIST;
+
+
+/**************************************************************/
+
+
+
+typedef struct conntyp {
+    double           z[NUM_BRANCHES];           /* branch length */
+    node            *p, *q;       /* parent and child sectors */
+    void            *valptr;      /* pointer to value of subtree */
+    int              descend;     /* pointer to first connect of child */
+    int              sibling;     /* next connect from same parent */
+    } connect, *connptr;
+
+typedef  struct {
+    double           likelihood;
+  int              initialTreeNumber;
+    connect         *links;       /* pointer to first connect (start) */
+    node            *start;
+    int              nextlink;    /* index of next available connect */
+                                  /* tr->start = tpl->links->p */
+    int              ntips;
+    int              nextnode;
+    int              scrNum;      /* position in sorted list of scores */
+    int              tplNum;      /* position in sorted list of trees */
+
+    } topol;
+
+typedef struct {
+    double           best;        /* highest score saved */
+    double           worst;       /* lowest score saved */
+    topol           *start;       /* starting tree for optimization */
+    topol          **byScore;
+    topol          **byTopol;
+    int              nkeep;       /* maximum topologies to save */
+    int              nvalid;      /* number of topologies saved */
+    int              ninit;       /* number of topologies initialized */
+    int              numtrees;    /* number of alternatives tested */
+    boolean          improved;
+    } bestlist;
+
+#define randomTree    0
+#define givenTree     1 
+#define parsimonyTree 2
+
+typedef  struct {
+  int              bestTrav;
+  int              max_rearrange;
+  int              stepwidth;
+  int              initial;
+  boolean          initialSet;
+  int              mode; 
+  boolean        perGeneBranchLengths;
+  boolean        permuteTreeoptimize; 
+  boolean        compressPatterns;
+  double         likelihoodEpsilon;
+  boolean        useCheckpoint;
+  boolean        useQuartetGrouping;
+  unsigned long int numberRandomQuartets;
+  unsigned long int quartetCkpInterval; 
+ 
+#ifdef _BAYESIAN 
+  boolean       bayesian;
+#endif
+} analdef;
+
+
+
+
+typedef struct 
+{
+  int leftLength;
+  int rightLength;
+  int eignLength;
+  int evLength;
+  int eiLength;
+  int substRatesLength;
+  int frequenciesLength;
+  int tipVectorLength;
+  int symmetryVectorLength;
+  int frequencyGroupingLength;
+
+  boolean nonGTR;
+
+  int undetermined;
+
+  const char *inverseMeaning;
+
+  int states;
+
+  boolean smoothFrequencies;
+
+  const unsigned  int *bitVector;
+
+} partitionLengths;
+
+/****************************** FUNCTIONS ****************************************************/
+
+#ifdef _BAYESIAN 
+extern void mcmc(tree *tr, analdef *adef);
+#endif
+
+
+boolean isThisMyPartition(tree *localTree, int tid, int model);
+
+extern boolean allSmoothed(tree *tr);
+
+extern int treeFindTipName(FILE *fp, tree *tr, boolean check);
+
+extern void computePlacementBias(tree *tr, analdef *adef);
+
+extern int lookupWord(char *s, stringHashtable *h);
+
+extern void getDataTypeString(tree *tr, int model, char typeOfData[1024]);
+
+extern unsigned int genericBitCount(unsigned int* bitVector, unsigned int bitVectorLength);
+extern int countTips(nodeptr p, int numsp);
+extern entry *initEntry(void);
+extern void computeRogueTaxa(tree *tr, char* treeSetFileName, analdef *adef);
+extern unsigned int precomputed16_bitcount(unsigned int n, char *bits_in_16bits);
+
+
+
+
+
+extern size_t discreteRateCategories(int rateHetModel);
+
+extern partitionLengths * getPartitionLengths(pInfo *p);
+extern boolean getSmoothFreqs(int dataType);
+extern const unsigned int *getBitVector(int dataType);
+extern int getUndetermined(int dataType);
+extern int getStates(int dataType);
+extern char getInverseMeaning(int dataType, unsigned char state);
+extern double gettime ( void );
+extern int gettimeSrand ( void );
+extern double randum ( long *seed );
+
+extern void getxnode ( nodeptr p );
+extern void hookup ( nodeptr p, nodeptr q, double *z, int numBranches);
+extern void hookupDefault ( nodeptr p, nodeptr q, int numBranches);
+extern boolean whitechar ( int ch );
+extern void errorExit ( int e );
+extern void printResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBootstrapResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBipartitionResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printLog ( tree *tr);
+extern void printStartingTree ( tree *tr, analdef *adef, boolean finalPrint );
+extern void writeInfoFile ( analdef *adef, tree *tr, double t );
+extern int main ( int argc, char *argv[] );
+extern void calcBipartitions ( tree *tr, analdef *adef, char *bestTreeFileName, char *bootStrapFileName );
+extern void initReversibleGTR (tree *tr, int model);
+extern double LnGamma ( double alpha );
+extern double IncompleteGamma ( double x, double alpha, double ln_gamma_alpha );
+extern double PointNormal ( double prob );
+extern double PointChi2 ( double prob, double v );
+extern void makeGammaCats (double alpha, double *gammaRates, int K, boolean useMedian);
+extern void initModel ( tree *tr);
+extern void doAllInOne ( tree *tr, analdef *adef );
+
+extern void classifyML(tree *tr, analdef *adef);
+
+extern void resetBranches ( tree *tr );
+extern void modOpt ( tree *tr, double likelihoodEpsilon, analdef *adef, int treeIteration);
+
+
+
+extern void computeBOOTRAPID (tree *tr, analdef *adef, long *radiusSeed);
+extern void optimizeRAPID ( tree *tr, analdef *adef );
+extern void thoroughOptimization ( tree *tr, analdef *adef, topolRELL_LIST *rl, int index );
+extern int treeOptimizeThorough ( tree *tr, int mintrav, int maxtrav);
+extern void computeQuartets(tree *tr, analdef *adef);
+
+extern void makeRandomTree ( tree *tr);
+extern void nodeRectifier ( tree *tr );
+extern void makeParsimonyTreeFast(tree *tr);
+extern void allocateParsimonyDataStructures(tree *tr);
+extern void freeParsimonyDataStructures(tree *tr);
+extern void parsimonySPR(nodeptr p, tree *tr);
+
+extern FILE *myfopen(const char *path, const char *mode);
+
+
+extern boolean initrav ( tree *tr, nodeptr p );
+extern void initravPartition ( tree *tr, nodeptr p, int model );
+extern boolean update ( tree *tr, nodeptr p );
+extern boolean smooth ( tree *tr, nodeptr p );
+extern boolean smoothTree ( tree *tr, int maxtimes );
+extern boolean localSmooth ( tree *tr, nodeptr p, int maxtimes );
+extern boolean localSmoothMulti(tree *tr, nodeptr p, int maxtimes, int model);
+extern void initInfoList ( int n );
+extern void freeInfoList ( void );
+extern void insertInfoList ( nodeptr node, double likelihood );
+extern boolean smoothRegion ( tree *tr, nodeptr p, int region );
+extern boolean regionalSmooth ( tree *tr, nodeptr p, int maxtimes, int region );
+extern nodeptr removeNodeBIG ( tree *tr, nodeptr p, int numBranches);
+extern nodeptr removeNodeRestoreBIG ( tree *tr, nodeptr p );
+extern boolean insertBIG ( tree *tr, nodeptr p, nodeptr q, int numBranches);
+extern boolean insertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern boolean testInsertBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void addTraverseBIG ( tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav );
+extern int rearrangeBIG ( tree *tr, nodeptr p, int mintrav, int maxtrav );
+extern void traversalOrder ( nodeptr p, int *count, nodeptr *nodeArray );
+extern double treeOptimizeRapid ( tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt, bestlist *bestML);
+extern boolean testInsertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void restoreTreeFast ( tree *tr );
+extern int determineRearrangementSetting ( tree *tr, analdef *adef, bestlist *bestT, bestlist *bt, bestlist *bestML);
+extern void computeBIGRAPID ( tree *tr, analdef *adef, boolean estimateModel);
+extern boolean treeEvaluate ( tree *tr, double smoothFactor );
+extern boolean treeEvaluatePartition ( tree *tr, double smoothFactor, int model );
+
+extern void meshTreeSearch(tree *tr, analdef *adef, int thorough);
+
+extern void initTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void freeTL ( topolRELL_LIST *rl);
+extern void restoreTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void resetTL ( topolRELL_LIST *rl );
+extern void saveTL ( topolRELL_LIST *rl, tree *tr, int index );
+
+extern int  saveBestTree (bestlist *bt, tree *tr, boolean keepIdenticalTrees);
+extern int  recallBestTree (bestlist *bt, int rank, tree *tr);
+extern int initBestTree ( bestlist *bt, int newkeep, int numsp );
+extern void resetBestTree ( bestlist *bt );
+extern boolean freeBestTree ( bestlist *bt );
+
+
+extern char *Tree2String ( char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood, 
+			   boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport);
+extern void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission);
+
+
+
+extern int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly);
+extern void treeReadTopologyString(char *treeString, tree *tr);
+extern boolean treeReadLenMULT ( FILE *fp, tree *tr, int *partCount);
+
+extern void getStartingTree ( tree *tr);
+
+extern void computeBootStopOnly(tree *tr, char *bootStrapFileName, analdef *adef);
+extern boolean bootStop(tree *tr, hashtable *h, int numberOfTrees, double *pearsonAverage, unsigned int **bitVectors, int treeVectorLength, unsigned int vectorLength);
+extern void computeConsensusOnly(tree *tr, char* treeSetFileName, analdef *adef);
+extern double evaluatePartialGeneric (tree *, int i, double ki, int _model);
+extern void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal);
+extern void newviewGeneric (tree *tr, nodeptr p, boolean masked);
+extern void newviewGenericMulti (tree *tr, nodeptr p, int model);
+extern void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask);
+extern void makenewzGenericDistance(tree *tr, int maxiter, double *z0, double *result, int taxon1, int taxon2);
+extern double evaluatePartitionGeneric (tree *tr, nodeptr p, int model);
+extern void newviewPartitionGeneric (tree *tr, nodeptr p, int model);
+extern double evaluateGenericVector (tree *tr, nodeptr p);
+extern void categorizeGeneric (tree *tr, nodeptr p);
+extern double makenewzPartitionGeneric(tree *tr, nodeptr p, nodeptr q, double z0, int maxiter, int model);
+extern boolean isTip(int number, int maxTips);
+extern void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal);
+
+
+
+extern void   newviewIterative(tree *tr, int startIndex);
+
+extern void evaluateIterative(tree *);
+
+extern void *malloc_aligned( size_t size);
+
+extern void storeExecuteMaskInTraversalDescriptor(tree *tr);
+extern void storeValuesInTraversalDescriptor(tree *tr, double *value);
+
+
+
+
+extern void makenewzIterative(tree *);
+extern void execCore(tree *, volatile double *dlnLdlz, volatile double *d2lnLdlz2);
+
+
+
+extern void determineFullTraversal(nodeptr p, tree *tr);
+/*extern void optRateCat(tree *, int i, double lower_spacing, double upper_spacing, double *lhs);*/
+
+
+
+
+
+extern double evaluateGenericInitravPartition(tree *tr, nodeptr p, int model);
+extern void evaluateGenericVectorIterative(tree *, int startIndex, int endIndex);
+extern void categorizeIterative(tree *, int startIndex, int endIndex);
+
+extern void fixModelIndices(tree *tr, int endsite, boolean fixRates);
+extern void calculateModelOffsets(tree *tr);
+extern void gammaToCat(tree *tr);
+extern void catToGamma(tree *tr, analdef *adef);
+
+
+extern nodeptr findAnyTip(nodeptr p, int numsp);
+
+extern void parseProteinModel(analdef *adef);
+
+
+
+extern void computeNextReplicate(tree *tr, long *seed, int *originalRateCategories, int *originalInvariant, boolean isRapid, boolean fixRates);
+/*extern void computeNextReplicate(tree *tr, analdef *adef, int *originalRateCategories, int *originalInvariant);*/
+
+extern void putWAG(double *ext_initialRates);
+
+extern void reductionCleanup(tree *tr, int *originalRateCategories, int *originalInvariant);
+extern void parseSecondaryStructure(tree *tr, analdef *adef, int sites);
+extern void printPartitions(tree *tr);
+extern void compareBips(tree *tr, char *bootStrapFileName, analdef *adef);
+extern void computeRF(tree *tr, char *bootStrapFileName, analdef *adef);
+
+
+extern  unsigned int **initBitVector(int mxtips, unsigned int *vectorLength);
+extern hashtable *copyHashTable(hashtable *src, unsigned int vectorLength);
+extern hashtable *initHashTable(unsigned int n);
+extern void cleanupHashTable(hashtable *h, int state);
+extern double convergenceCriterion(hashtable *h, int mxtips);
+extern void freeBitVectors(unsigned int **v, int n);
+extern void freeHashTable(hashtable *h);
+extern stringHashtable *initStringHashTable(hashNumberType n);
+extern void addword(char *s, stringHashtable *h, int nodeNumber);
+
+
+extern void printBothOpen(const char* format, ... );
+extern void initRateMatrix(tree *tr);
+
+extern void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf,
+				    int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF);
+
+extern int getIncrement(tree *tr, int model);
+
+
+
+extern void writeBinaryModel(tree *tr);
+extern void readBinaryModel(tree *tr);
+extern void treeEvaluateRandom (tree *tr, double smoothFactor);
+extern void treeEvaluateProgressive(tree *tr);
+
+extern void testGapped(tree *tr);
+
+extern boolean issubset(unsigned int* bipA, unsigned int* bipB, unsigned int vectorLen);
+extern boolean compatible(entry* e1, entry* e2, unsigned int bvlen);
+
+
+
+extern int *permutationSH(tree *tr, int nBootstrap, long _randomSeed);
+
+extern void checkPerSiteRates(const tree * const tr ); 
+
+extern void restart(tree *tr, analdef *adef);
+
+extern void writeCheckpoint(tree *tr, analdef *adef);
+
+extern boolean isGap(unsigned int *x, int pos);
+extern boolean noGap(unsigned int *x, int pos);
+
+extern void scaleLG4X_EIGN(tree *tr, int model);
+
+extern void myBinFwrite(void *ptr, size_t size, size_t nmemb, FILE *byteFile);
+extern void myBinFread(void *ptr, size_t size, size_t nmemb, FILE *byteFile);
+
+#ifdef __AVX
+
+extern void newviewGTRGAMMAPROT_AVX_LG4(int tipCase,
+					double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+					int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n, 
+					double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+extern void newviewGTRCAT_AVX(int tipCase,  double *EV,  int *cptr,
+			      double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+			      unsigned char *tipX1, unsigned char *tipX2,
+			      int n,  double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+extern void newviewGTRCATPROT_AVX(int tipCase, double *extEV,
+				  int *cptr,
+				  double *x1, double *x2, double *x3, double *tipVector,
+				  unsigned char *tipX1, unsigned char *tipX2,
+				  int n, double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+extern void newviewGTRGAMMA_AVX(int tipCase,
+				double *x1_start, double *x2_start, double *x3_start,
+				double *EV, double *tipVector,
+				unsigned char *tipX1, unsigned char *tipX2,
+				const int n, double *left, double *right, int *wgt, int *scalerIncrement
+				);
+
+extern void newviewGTRGAMMAPROT_AVX(int tipCase,
+				    double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+				    unsigned char *tipX1, unsigned char *tipX2, int n, 
+				    double *left, double *right, int *wgt, int *scalerIncrement);
+
+/* memory saving functions */
+
+void newviewGTRCAT_AVX_GAPPED_SAVE(int tipCase,  double *EV,  int *cptr,
+				   double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+				   int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				   int n,  double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				   unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				   double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+void newviewGTRCATPROT_AVX_GAPPED_SAVE(int tipCase, double *extEV,
+				       int *cptr,
+				       double *x1, double *x2, double *x3, double *tipVector,
+				       int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				       int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				       unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				       double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+void  newviewGTRGAMMA_AVX_GAPPED_SAVE(int tipCase,
+				      double *x1_start, double *x2_start, double *x3_start,
+				      double *extEV, double *tipVector,
+				      int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				      const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+				      unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+				      double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+				      );
+
+void newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(int tipCase,
+					 double *x1_start, double *x2_start, double *x3_start, double *extEV, double *tipVector,
+					 int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n, 
+					 double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+					 unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+					 double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn); 
+#endif
+
+
+
+/* from communication.c */
+void calculateLengthAndDisplPerProcess(tree *tr, int **length_result, int **disp_result);
+void scatterDistrbutedArray(tree *tr, void *src, void *destination, MPI_Datatype type, int *countPerProc, int *displPerProc);
+void gatherDistributedArray(tree *tr, void **destination, void *src, MPI_Datatype type, int* countPerProc, int *displPerProc);
+
+
+#endif
+
+
+
+#
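
Note on the resource-pointer comment in the tree struct of axml.h above: the layout it describes (one owning base pointer, per-partition pointers into a single contiguous block ordered by partition id) can be sketched as follows. This is only an illustration under stated assumptions, not the ExaML allocation code; demoPartition and assignPatratBlock are hypothetical names that do not exist in the source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a partition; NOT the pInfo struct from axml.h. */
typedef struct {
  int     width;   /* number of site patterns in this partition */
  double *patrat;  /* points into the shared block; never freed on its own */
} demoPartition;

/* Allocate one contiguous block and let every partition point into it,
   in order of partition id. The returned base pointer owns the memory. */
static double *assignPatratBlock(demoPartition *parts, int numParts)
{
  size_t total = 0, offset = 0;
  double *basePtr;
  int i;

  assert(parts != NULL && numParts >= 0);

  for(i = 0; i < numParts; i++)
    total += (size_t)parts[i].width;

  /* allocate at least one element so that an empty layout still
     yields a valid pointer that can be passed to free() */
  basePtr = (double *)calloc(total > 0 ? total : 1, sizeof(double));
  if(basePtr == NULL)
    return NULL;

  for(i = 0; i < numParts; i++)
    {
      parts[i].patrat = basePtr + offset;
      offset += (size_t)parts[i].width;
    }

  return basePtr;
}
```

With this layout only the base pointer is ever freed, and a gather or scatter at the master can address the whole block with a single pointer plus per-process counts and displacements.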
diff --git a/examl/bipartitionList.c b/examl/bipartitionList.c
new file mode 100644
index 0000000..7e3e80d
--- /dev/null
+++ b/examl/bipartitionList.c
@@ -0,0 +1,592 @@
+/*  RAxML-HPC, a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright March 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  stamatak at ics.forth.gr
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *  
+ *  Alexandros Stamatakis: "An Efficient Program for phylogenetic Inference Using Simulated Annealing". 
+ *  Proceedings of IPDPS2005,  Denver, Colorado, April 2005.
+ *  
+ *  AND
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+#ifndef WIN32  
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>  
+#endif
+
+#include <limits.h>
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include "axml.h"
+
+
+
+extern const unsigned int mask32[32];
+
+extern int processID;
+
+static void getxnodeBips (nodeptr p)
+{
+  nodeptr  s;
+
+  if ((s = p->next)->xBips || (s = s->next)->xBips)
+    {
+      p->xBips = s->xBips;
+      s->xBips = 0;
+    }
+
+  assert(p->xBips);
+}
+
+
+entry *initEntry(void)
+{
+  entry *e = (entry*)malloc(sizeof(entry));
+
+  e->bitVector     = (unsigned int*)NULL;
+  e->treeVector    = (unsigned int*)NULL;
+  e->supportVector = (int*)NULL;
+  e->bipNumber  = 0;
+  e->bipNumber2 = 0;
+  e->supportFromTreeset[0] = 0;
+  e->supportFromTreeset[1] = 0;
+  e->next       = (entry*)NULL;
+
+  return e;
+} 
+
+hashtable *initHashTable(hashNumberType n)
+{
+  /* 
+     init with primes 
+    
+     static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+     196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+     50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+  */
+
+  /* init with powers of two */
+
+  static const  hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+					      32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+					      4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+					      268435456, 536870912, 1073741824, 2147483648U};
+  
+  hashtable *h = (hashtable*)malloc(sizeof(hashtable));
+  
+  hashNumberType
+    tableSize,
+    i,
+    primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+    maxSize = (hashNumberType)-1;    
+
+  assert(n <= maxSize);
+
+  i = 0;
+
+  while(i < primeTableLength && initTable[i] < n)
+    i++;
+
+  assert(i < primeTableLength);
+
+  tableSize = initTable[i];
+
+ 
+
+  h->table = (entry**)calloc(tableSize, sizeof(entry*));
+  h->tableSize = tableSize;  
+  h->entryCount = 0;  
+
+  return h;
+}
+
+
+
+
+void freeHashTable(hashtable *h)
+{
+  hashNumberType
+    i,
+    entryCount = 0;
+   
+
+  for(i = 0; i < h->tableSize; i++)
+    {
+      if(h->table[i] != NULL)
+	{
+	  entry *e = h->table[i];
+	  entry *previous;	 
+
+	  do
+	    {
+	      previous = e;
+	      e = e->next;
+
+	      if(previous->bitVector)
+		free(previous->bitVector);
+
+	      if(previous->treeVector)
+		free(previous->treeVector);
+
+	      if(previous->supportVector)
+		free(previous->supportVector);
+	      
+	      free(previous);	      
+	      entryCount++;
+	    }
+	  while(e != NULL);	  
+	}
+
+    }
+
+  assert(entryCount == h->entryCount);
+ 
+  free(h->table);
+}
+
+
+
+void cleanupHashTable(hashtable *h, int state)
+{
+  hashNumberType
+    k,
+    entryCount = 0,
+    removeCount = 0;
+ 
+  assert(state == 1 || state == 0);
+
+  for(k = 0, entryCount = 0; k < h->tableSize; k++)	     
+    {      
+      if(h->table[k] != NULL)
+	{
+	  entry *e = h->table[k];
+	  entry *start     = (entry*)NULL;
+	  entry *lastValid = (entry*)NULL;
+	  	  
+	  do
+	    {	   	 	      	
+	      if(state == 0)
+		{
+		  e->treeVector[0] = e->treeVector[0] & 2;	
+		  assert(!(e->treeVector[0] & 1));
+		}
+	      else
+		{
+		  e->treeVector[0] = e->treeVector[0] & 1;
+		  assert(!(e->treeVector[0] & 2));
+		}
+	      
+	      if(e->treeVector[0] != 0)
+		{
+		  if(!start)
+		    start = e;
+		  lastValid = e;
+		  e = e->next;
+		}	  
+	      else
+		{
+		  entry *remove = e;
+		  e = e->next;
+		  
+		  removeCount++;
+
+		  if(lastValid)		    		    
+		    lastValid->next = remove->next;
+
+		  if(remove->bitVector)
+		    free(remove->bitVector);
+		  if(remove->treeVector)
+		    free(remove->treeVector);
+		  if(remove->supportVector)
+		    free(remove->supportVector);
+		  free(remove);		 
+		}
+	      
+	      entryCount++;	     	     
+	    }
+	  while(e != NULL);	 
+
+	  if(!start)
+	    {
+	      assert(!lastValid);
+	      h->table[k] = NULL;
+	    }
+	  else
+	    {
+	      h->table[k] = start;
+	    }	 	 
+	}    
+    }
+
+  assert(entryCount ==  h->entryCount);  
+
+  h->entryCount -= removeCount;
+}
+
+
+
+
+
+
+
+
+
+
+
+unsigned int **initBitVector(int mxtips, unsigned int *vectorLength)
+{
+  unsigned int 
+    **bitVectors = (unsigned int **)malloc(sizeof(unsigned int*) * 2 * mxtips);
+  
+  int 
+    i;
+
+  if(mxtips % MASK_LENGTH == 0)
+    *vectorLength = mxtips / MASK_LENGTH;
+  else
+    *vectorLength = 1 + (mxtips / MASK_LENGTH); 
+  
+  for(i = 1; i <= mxtips; i++)
+    {
+      bitVectors[i] = (unsigned int *)calloc(*vectorLength, sizeof(unsigned int));
+      assert(bitVectors[i]);
+      bitVectors[i][(i - 1) / MASK_LENGTH] |= mask32[(i - 1) % MASK_LENGTH];
+    }
+  
+  for(i = mxtips + 1; i < 2 * mxtips; i++) 
+    {
+      bitVectors[i] = (unsigned int *)malloc(sizeof(unsigned int) * *vectorLength);
+      assert(bitVectors[i]);
+    }
+
+  return bitVectors;
+}
+
+void freeBitVectors(unsigned int **v, int n)
+{
+  int i;
+
+  for(i = 1; i < n; i++)
+    free(v[i]);
+}
+
+
+
+
+
+static void newviewBipartitions(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength)
+{
+  
+  if(isTip(p->number, numsp))
+    return;
+  {
+    nodeptr 
+      q = p->next->back, 
+      r = p->next->next->back;
+    
+    
+    
+    unsigned int       
+      *vector = bitVectors[p->number],
+      *left  = bitVectors[q->number],
+      *right = bitVectors[r->number];
+    unsigned 
+      int i;      
+    
+    assert(processID == 0);
+    
+
+    while(!p->xBips)
+      {	
+	if(!p->xBips)
+	  getxnodeBips(p);
+      }
+
+    p->hash = q->hash ^ r->hash;
+
+    if(isTip(q->number, numsp) && isTip(r->number, numsp))
+      {		
+	for(i = 0; i < vectorLength; i++)
+	  vector[i] = left[i] | right[i];	  	
+      }
+    else
+      {	
+	if(isTip(q->number, numsp) || isTip(r->number, numsp))
+	  {
+	    if(isTip(r->number, numsp))
+	      {	
+		nodeptr tmp = r;
+		r = q;
+		q = tmp;
+	      }	   
+	    	    
+	    while(!r->xBips)
+	      {
+		if(!r->xBips)
+		  newviewBipartitions(bitVectors, r, numsp, vectorLength);
+	      }	   
+
+	    for(i = 0; i < vectorLength; i++)
+	      vector[i] = left[i] | right[i];	    	 
+	  }
+	else
+	  {	    
+	    while((!r->xBips) || (!q->xBips))
+	      {
+		if(!q->xBips)
+		  newviewBipartitions(bitVectors, q, numsp, vectorLength);
+		if(!r->xBips)
+		  newviewBipartitions(bitVectors, r, numsp, vectorLength);
+	      }	   	    	    	    	   
+
+	    for(i = 0; i < vectorLength; i++)
+	      vector[i] = left[i] | right[i];	 
+	  }
+
+      }     
+  }     
+}
+
+
+
+
+static void insertHashRF(unsigned int *bitVector, hashtable *h, unsigned int vectorLength, int treeNumber, int treeVectorLength, hashNumberType position, int support, 
+			 boolean computeWRF)
+{     
+  if(h->table[position] != NULL)
+    {
+      entry *e = h->table[position];     
+
+      do
+	{	 
+	  unsigned int i;
+	  
+	  for(i = 0; i < vectorLength; i++)
+	    if(bitVector[i] != e->bitVector[i])
+	      break;
+	  
+	  if(i == vectorLength)
+	    {
+	      e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+	      if(computeWRF)
+		{
+		  e->supportVector[treeNumber] = support;
+		 
+		  assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+		}
+	      return;
+	    }
+	  
+	  e = e->next;
+	}
+      while(e != (entry*)NULL); 
+
+      e = initEntry(); 
+       
+      /*e->bitVector  = (unsigned int*)calloc(vectorLength, sizeof(unsigned int));*/
+      e->bitVector = (unsigned int*)malloc_aligned(vectorLength * sizeof(unsigned int));
+      memset(e->bitVector, 0, vectorLength * sizeof(unsigned int));
+
+
+      e->treeVector = (unsigned int*)calloc(treeVectorLength, sizeof(unsigned int));
+      if(computeWRF)
+	e->supportVector = (int*)calloc(treeVectorLength * MASK_LENGTH, sizeof(int));
+
+      e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+      if(computeWRF)
+	{
+	  e->supportVector[treeNumber] = support;
+	 
+	  assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+	}
+
+      memcpy(e->bitVector, bitVector, sizeof(unsigned int) * vectorLength);
+     
+      e->next = h->table[position];
+      h->table[position] = e;          
+    }
+  else
+    {
+      entry *e = initEntry(); 
+       
+      /*e->bitVector  = (unsigned int*)calloc(vectorLength, sizeof(unsigned int)); */
+
+      e->bitVector = (unsigned int*)malloc_aligned(vectorLength * sizeof(unsigned int));
+      memset(e->bitVector, 0, vectorLength * sizeof(unsigned int));
+
+      e->treeVector = (unsigned int*)calloc(treeVectorLength, sizeof(unsigned int));
+      if(computeWRF)	
+	e->supportVector = (int*)calloc(treeVectorLength * MASK_LENGTH, sizeof(int));
+
+
+      e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+      if(computeWRF)
+	{
+	  e->supportVector[treeNumber] = support;
+	 
+	  assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+	}
+
+      memcpy(e->bitVector, bitVector, sizeof(unsigned int) * vectorLength);     
+
+      h->table[position] = e;
+    }
+
+  h->entryCount =  h->entryCount + 1;
+}
+
+
+
+void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf, 
+			     int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF)
+{
+  if(isTip(p->number, numsp))
+    return;
+  else
+    {
+      nodeptr 
+	q = p->next;          
+
+      do 
+	{
+	  bitVectorInitravSpecial(bitVectors, q->back, numsp, vectorLength, h, treeNumber, function, bInf, countBranches, treeVectorLength, traverseOnly, computeWRF);
+	  q = q->next;
+	}
+      while(q != p);
+           
+      newviewBipartitions(bitVectors, p, numsp, vectorLength);
+      
+      assert(p->xBips);
+
+      assert(!traverseOnly);     
+
+      if(!(isTip(p->back->number, numsp)))
+	{
+	  unsigned int 
+	    *toInsert  = bitVectors[p->number];
+	  
+	  hashNumberType 
+	    position = p->hash % h->tableSize;
+	 
+	  assert(!(toInsert[0] & 1));
+	  assert(!computeWRF);
+	  
+	  switch(function)
+	    {	     
+	    case BIPARTITIONS_RF:	     
+	      insertHashRF(toInsert, h, vectorLength, treeNumber, treeVectorLength, position, 0, computeWRF);
+	      *countBranches =  *countBranches + 1;
+	      break;
+	    default:
+	      assert(0);
+	    }	  	  
+	}
+      
+    }
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+double convergenceCriterion(hashtable *h, int mxtips)
+{
+  int      
+    rf = 0; 
+
+  unsigned int 
+    collisions = 0,
+    k = 0, 
+    entryCount = 0;
+  
+  double    
+    rrf;  
+
+  for(k = 0, entryCount = 0; k < h->tableSize; k++)	     
+    {      
+      if(h->table[k] != NULL)
+	{
+	  entry *e = h->table[k];
+
+	  unsigned int 
+	    slotCollisions = 0;
+
+	  do
+	    {
+	      unsigned int *vector = e->treeVector;	     
+	      if(((vector[0] & 1) > 0) + ((vector[0] & 2) > 0) == 1)
+		rf++;	     
+	      
+	      entryCount++;
+	      slotCollisions++;
+	      e = e->next;
+	    }
+	  while(e != NULL);
+
+	  collisions += (slotCollisions - 1);
+	}     
+    }
+
+  assert(entryCount == h->entryCount);  
+      
+  rrf = (double)rf/((double)(2 * (mxtips - 3)));  
+
+#ifdef _DEBUG_CHECKPOINTING
+  printf("Collisions: %u\n", collisions);
+#endif
+
+  return rrf;
+}
+
+
+
+
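Note on convergenceCriterion() above: it counts the hash-table entries whose tree vector has exactly one of its two lowest bits set, i.e. bipartitions found in only one of the two trees, and divides that count by 2 * (mxtips - 3). The sketch below restates this on a flat array. The name relative_rf and the flat-array layout are assumptions made for illustration and are not part of ExaML.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ExaML.
   tree_flags[i] has bit 0 set if bipartition i occurs in the first tree
   and bit 1 set if it occurs in the second tree. */
double relative_rf(const unsigned int *tree_flags, size_t n_bipartitions, int n_taxa)
{
  size_t i;
  int rf = 0;

  /* an unrooted binary tree needs more than 3 taxa to have inner branches */
  assert(n_taxa > 3);

  for(i = 0; i < n_bipartitions; i++)
    {
      int in_first  = (tree_flags[i] & 1u) != 0;
      int in_second = (tree_flags[i] & 2u) != 0;

      /* a bipartition contributes only if exactly one tree contains it */
      if(in_first + in_second == 1)
        rf++;
    }

  /* each of the two trees has n_taxa - 3 non-trivial bipartitions */
  return (double)rf / (double)(2 * (n_taxa - 3));
}
```

With five taxa and the flags { 1, 2, 3 } (one bipartition unique to each tree, one shared), two of the four possible differences are realized.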
diff --git a/examl/byteFile.c b/examl/byteFile.c
new file mode 100644
index 0000000..49453f0
--- /dev/null
+++ b/examl/byteFile.c
@@ -0,0 +1,435 @@
+#include <string.h> 
+
+#if defined(__APPLE__)
+#include <malloc/malloc.h>
+#else
+#include <malloc.h>
+#endif
+
+#include "byteFile.h"
+#include <stdlib.h>
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+#define READ_VAR(file,var) assert( fread(&var, sizeof(var),1, file ) == 1  )
+#define READ_ARRAY(file, arrPtr, numElem, size)  assert( fread(arrPtr, size, numElem, file) ==  (unsigned int) numElem)
+
+extern int processID;
+
+/** 
+    seekPos finds the position in the byte file where a certain type
+    of information is stored. See byteFile.h for possible values of
+    "pos" .
+
+    Note that this is a "fall-through" switch statement: if -- for
+    instance -- we want to get to the position of the taxa, we have to
+    skip everything that comes prior to the taxa in the file (but
+    naturally not the taxa themselves).
+ */ 
+static void seekPos(ByteFile *bf, int pos)
+{
+  exa_off_t
+    toSkip = 0;
+  
+  int 
+    i; 
+
+  switch(pos)
+    {
+    case ALN_ALIGNMENT: 	/* skips partitions */
+      {
+	assert(bf->hasRead & ALN_PARTITIONS); 
+	pInfo p ; 
+
+	toSkip +=  bf->numPartitions * ( sizeof(p.states) + sizeof(p.maxTipStates) + sizeof(p.lower) 
+					 + sizeof(p.upper) + sizeof(p.width) + sizeof(p.dataType) + sizeof(p.protModels) 
+					 + sizeof(p.protFreqs) + sizeof(p.nonGTR) + sizeof(p.optimizeBaseFrequencies)); 
+	
+	/* skip the names and their lengths */
+	for( i = 0 ; i < bf->numPartitions; ++i)
+	  {
+	    pInfo *p = bf->partitions[i]; 
+	    toSkip += (strlen(p->partitionName)+1 ) * sizeof(char) + sizeof(int);
+	    toSkip += sizeof(double) * p->states; /* also skip frequencies */
+	  }
+      }
+    case ALN_PARTITIONS: 	/* skips taxa  */
+      {
+	assert(bf->hasRead & ALN_TAXA); 
+	for(i = 0; i < bf->numTax; ++i)
+	  toSkip += (strlen(bf->taxaNames[i]) + 1)  * sizeof(char) + sizeof(int); 
+      }
+    case ALN_TAXA: 		/* skips weights */
+      {
+	assert(bf->hasRead & ALN_HEAD); 
+	toSkip += bf->numPattern * sizeof(int); 
+      }
+    case ALN_WEIGHTS: 		/* skips header */
+      {
+	toSkip += 
+	  sizeof(bf->numTax) + sizeof(bf->numPattern) 
+	  + sizeof(bf->numPartitions) + sizeof(bf->gappyness); 
+      }
+    case ALN_HEAD : 
+      toSkip += (3 * sizeof(int)); 		/* skips the initial int that tells us how many bytes a size_t has as well as the integer for the version number and the magic integer number */
+      break; 
+    default : 
+      assert(0); 
+    }
+  
+  exa_fseek(bf->fh, toSkip, SEEK_SET); 
+} 
+
+/** 
+    initializes ByteFile **bf 
+ */ 
+void initializeByteFile(ByteFile **bf, char *name)
+{
+  *bf = (ByteFile *)calloc(1,sizeof(ByteFile)); 
+  ByteFile *result = *bf; 
+  result->fh  = myfopen(name, "rb"); 
+
+  int 
+    sizeOfSizeT = 0, 
+    version = 0,
+    magicNumber = 0;
+  
+  READ_VAR(result->fh, sizeOfSizeT); 
+
+  if(sizeOfSizeT != sizeof(size_t))
+    {
+      if(processID == 0)
+	{
+	  printf("\nError: the address data type has a size of %d bits on the current system while on the system on which you created the binary alignment file using the parser the address size is %d bits!\n", 
+		 8 * (int)sizeof(size_t), 8 * sizeOfSizeT);
+	  printf("Usually this indicates that the parser was executed on a 32-bit system while you are trying to run ExaML on a 64-bit system.\n");
+	  printf("Please parse the binary alignment file on the same hardware on which you intend to run ExaML.\n\n\n"); 
+	}
+	  
+      MPI_Barrier(MPI_COMM_WORLD);
+      MPI_Finalize();
+      exit(-1);
+    }
+
+  //check that version numbers of parser and ExaML match
+  READ_VAR(result->fh, version); 
+
+  if(version != (int)programVersionInt)
+    {
+      if(processID == 0)
+	{
+	  printf("\nError: Version number %d of ExaML parser and version number %d of ExaML don't match.\n", version, (int)programVersionInt);
+	  printf("You are either using an outdated version of the parser or of ExaML.\n");
+	  printf("Hasta siempre comandante.\n\n\n");
+	}
+      
+      MPI_Barrier(MPI_COMM_WORLD);
+      MPI_Finalize();     
+      exit(-1);
+    }
+
+  READ_VAR(result->fh, magicNumber);
+
+  if(magicNumber != 6517718)
+    { 
+      if(processID == 0)
+	{
+	  printf("\nError: The magic number %d of ExaML parser and magic number %d of ExaML don't match.\n", magicNumber, 6517718);
+	  printf("Something went terribly wrong here.\n");
+	  printf("Hasta la victoria siempre.\n\n\n");
+	}
+
+      MPI_Barrier(MPI_COMM_WORLD);
+      MPI_Finalize();   
+      exit(-1);
+    }
+
+} 
+
+
+/** 
+    a shallow cleanup of ByteFile *bf. Note that various data may
+    have been copied (by pointer value) to our tree instance and
+    therefore must not be freed here.
+ */ 
+void deleteByteFile(ByteFile *bf) 
+{
+  /* only a shallow free! pointers inside the pInfo must persist */
+  int i; 
+
+  if(bf->partitions)
+    {
+      for( i = 0; i < bf->numPartitions; ++i)
+      	free(bf->partitions[i]);
+      free(bf->partitions); 
+    }
+
+  if(bf->fh)
+    fclose(bf->fh); 
+
+  if(bf->taxaNames )
+    {
+      for(i = 0; i < bf->numTax; ++i)
+	free(bf->taxaNames[i] ); 
+    }
+  free(bf->taxaNames); 
+  free(bf);
+} 
+
+
+
+
+/** 
+    only reads initial header information 
+ */ 
+void readHeader(ByteFile* bf)
+{
+  seekPos(bf, ALN_HEAD); 
+  READ_VAR(bf->fh, bf->numTax); 
+  READ_VAR(bf->fh, bf->numPattern); 
+  READ_VAR(bf->fh, bf->numPartitions); 
+  READ_VAR(bf->fh, bf->gappyness) ;
+  bf->hasRead |= ALN_HEAD;
+
+}
+
+
+/** 
+    reads partition information from the byte file.
+ */
+void readPartitions(ByteFile *bf)
+{
+  int i ; 
+  
+  seekPos(bf, ALN_PARTITIONS); 
+  
+  assert(bf->partitions == (pInfo **)NULL); 
+  bf->partitions = (pInfo **)calloc(bf->numPartitions, sizeof(pInfo*) );
+  for(i = 0; i < bf->numPartitions; ++i)
+    {
+      bf->partitions[i] = (pInfo*)calloc(1,sizeof(pInfo));
+      pInfo* p = bf->partitions[i];
+
+      p->frequencies = (double*)NULL;
+      p->partitionName = (char *)NULL;
+
+      READ_VAR(bf->fh, p->states);
+      READ_VAR(bf->fh, p->maxTipStates);
+      READ_VAR(bf->fh, p->lower);
+      READ_VAR(bf->fh, p->upper);
+
+      /* do NOT use the stored width: it is reset here and set per process in readMyData */
+      READ_VAR(bf->fh, p->width);
+      p->width = 0; 
+
+      READ_VAR(bf->fh, p->dataType);
+      READ_VAR(bf->fh, p->protModels);
+      //READ_VAR(bf->fh, p->autoProtModels);
+      READ_VAR(bf->fh, p->protFreqs);
+      READ_VAR(bf->fh, p->nonGTR);
+      READ_VAR(bf->fh, p->optimizeBaseFrequencies);
+      //      READ_VAR(bf->fh, p->numberOfCategories);
+
+      /* read string */
+      unsigned int len = 0; 
+      READ_VAR(bf->fh, len); 
+      p->partitionName = (char*)calloc(len,sizeof(char));
+      READ_ARRAY(bf->fh, p->partitionName, len, sizeof(char)); 
+
+      p->frequencies = (double*)calloc(p->states, sizeof(double)); 
+      READ_ARRAY(bf->fh, p->frequencies, p->states , sizeof(double)); 
+    }
+  
+  bf->hasRead |= ALN_PARTITIONS; 
+}
+
+
+/** 
+    reads the taxon names from the byte file  
+ */ 
+void readTaxa(ByteFile *bf)
+{
+  int i; 
+
+  assert(bf->taxaNames == (char **)NULL);
+  seekPos(bf,  ALN_TAXA); 
+
+  bf->taxaNames = (char **)calloc(bf->numTax, sizeof(char*));
+  for(i = 0; i < bf->numTax; ++i)
+    {
+      int len = 0; 
+      READ_VAR(bf->fh, len ); 
+      bf->taxaNames[i] = (char*)calloc(len, sizeof(char)); 
+      READ_ARRAY(bf->fh, bf->taxaNames[i], len, sizeof(char)); 
+    }
+
+  bf->hasRead |= ALN_TAXA; 
+}
+
+
+ // #define OLD_LAYOUT 
+
+/** 
+    uses the information in the PartitionAssignment to only extract
+    data relevant to this process (weights and alignment characters).
+ */ 
+void readMyData(ByteFile *bf, PartitionAssignment *pa, int procId)
+{
+  seekPos(bf, ALN_ALIGNMENT); 
+
+  exa_off_t
+    alnPos = exa_ftell(bf->fh); 
+
+  size_t 
+    len; 
+
+  int numAssign = pa->numAssignPerProc[procId];
+  Assignment *myAssigns = pa->assignPerProc[procId];
+
+  /* first read aln characters   */
+  int i,j ; 
+  for(i = 0; i < numAssign; ++i )
+    {
+      Assignment a = myAssigns[i]; 
+      /* printf("reading for: ") ;  */
+      /* printAssignment(a, procId);  */
+
+      pInfo *partition = bf->partitions[a.partId]; 
+      partition->width = a.width; 
+      partition->offset = a.offset; 
+      len = bf->numTax * a.width; 
+      partition->yResource = (unsigned char*)malloc_aligned( len * sizeof(unsigned char)); 
+      memset(partition->yResource,0,len * sizeof(unsigned char)); 
+      partition->yVector = (unsigned char**) calloc(bf->numTax + 1 , sizeof(unsigned char*)); 
+      for(j = 1; j <= bf->numTax; ++j)
+	partition->yVector[j] = partition->yResource + (j-1) * a.width; 
+
+#ifdef OLD_LAYOUT
+      for(j = 1; j <= bf->numTax; ++j )
+	{
+	  exa_off_t pos = alnPos + (  bf->numPattern * (j-1)    +  partition->lower + a.offset ) * sizeof(unsigned char); 
+	  assert(alnPos <= pos); 
+	  exa_fseek(bf->fh, pos, SEEK_SET); 
+	  READ_ARRAY(bf->fh, partition->yVector[j], a.width, sizeof(unsigned char));
+	}
+#else 
+      /*  if the entire partition is assigned to this process, read it
+          in one go. Otherwise, several seeks are necessary.  */
+      if( a.width == (partition->upper - partition->lower ) )
+        {
+	  exa_off_t
+            pos = alnPos + (partition->lower * bf->numTax) * sizeof(unsigned char); 
+
+	  assert(alnPos <= pos); 
+	  exa_fseek(bf->fh, pos, SEEK_SET); 
+	  READ_ARRAY(bf->fh, partition->yResource, a.width * bf->numTax, sizeof(unsigned char));
+        }
+      else 
+        {
+          for(j = 1; j <= bf->numTax; ++j )
+            {
+              exa_off_t 
+                pos = alnPos + sizeof(unsigned char) 
+                * ( 
+                   (partition->lower * bf->numTax ) /* until start of partition  */
+                   + ((j-1) * (partition->upper - partition->lower) ) /* until start of sequence of taxon within partition */
+                   + a.offset )  ; 
+
+              assert(alnPos <= pos); 
+              exa_fseek(bf->fh, pos, SEEK_SET); 
+              READ_ARRAY(bf->fh, partition->yVector[j], a.width, sizeof(unsigned char));
+            }
+        }
+#endif
+    }
+
+  
+  /* now read weights  */
+  seekPos(bf, ALN_WEIGHTS); 
+
+  exa_off_t
+    wgtPos = exa_ftell(bf->fh); 
+  assert( ! (wgtPos <  0) );
+
+  for(i = 0; i < numAssign; ++i)
+    {
+      Assignment a = myAssigns[i]; 
+      pInfo *partition = bf->partitions[a.partId];
+
+#ifdef __MIC_NATIVE
+     /* for Xeon Phi, wgt must be padded to a multiple of 8 (because of site blocking in kernels) */
+     const int padded_width = GET_PADDED_WIDTH(a.width);
+     len = padded_width * sizeof(int);
+#else
+     len = a.width * sizeof(int);
+#endif
+
+      partition->wgt = (int*)malloc_aligned( len); 
+      memset(partition->wgt, 0, len); 
+
+      exa_off_t pos = wgtPos +  (partition->lower  + a.offset) * sizeof(int); 
+      assert(wgtPos <= pos );
+      
+      exa_fseek(bf->fh, pos, SEEK_SET); 
+      READ_ARRAY(bf->fh, partition->wgt, a.width, sizeof(int)); 
+
+    }
+
+  bf->hasRead |= ALN_ALIGNMENT; 
+  bf->hasRead |= ALN_WEIGHTS; 
+} 
+
+
+/** 
+    copies all relevant information from our byte file to the tree
+    instance.
+ */ 
+void initializeTreeFromByteFile(ByteFile *bf, tree *tr)
+{
+  assert( ( bf->hasRead & ALN_HEAD )
+	  && (bf->hasRead & ALN_WEIGHTS)
+	  && (bf->hasRead & ALN_TAXA) 
+	  && (bf->hasRead & ALN_PARTITIONS)
+	  && (bf->hasRead & ALN_ALIGNMENT ) ); 
+ 
+  /* some additional stuff we read */
+  tr->mxtips = bf->numTax;
+  tr->originalCrunchedLength = bf->numPattern; 
+  tr->NumberOfModels = bf->numPartitions; 
+  tr->gapyness = bf->gappyness; 
+  
+  /* deep copy of taxa */
+  int i ; 
+  tr->nameList = (char **)calloc((size_t)(tr->mxtips + 1), sizeof(char *)  );
+  
+  tr->nameList[0] = (char *)NULL;
+
+  for(i = 1; i <= bf->numTax; ++i)
+    {
+      tr->nameList[i] = (char*)calloc(strlen(bf->taxaNames[i-1]) + 1, sizeof(char)); 
+      strcpy(tr->nameList[i], bf->taxaNames[i-1]);      
+    }
+
+  /* 
+   * shallow copy of partitions 
+   * 
+   * partition contains only shallow copies of a few data arrays that
+   * needed to be initialized at this point
+   */
+  int 
+    myLength = 0; 
+
+  tr->partitionData = (pInfo*)calloc(tr->NumberOfModels, sizeof(pInfo));
+
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    {     
+      tr->partitionData[i] = *(bf->partitions[i]);
+      myLength += tr->partitionData[i].width; 
+      assert( bf->partitions[i]->wgt != (int*)NULL || bf->partitions[i]->width == 0); 
+      assert( ( tr->partitionData[i].wgt != (int*)NULL)  || ( tr->partitionData[i].width == 0 ) ); 
+    }
+} 
+
+
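Note on seekPos() in byteFile.c above: the switch relies on deliberate fall-through, so that asking for a later section adds up the sizes of every section stored before it. The sketch below isolates that pattern. The section tags, the function name section_offset and its size parameters are hypothetical; the real code derives the sizes from the ByteFile fields.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical section tags, stored in the order HEAD, WEIGHTS, TAXA */
enum { SEC_HEAD = 1, SEC_WEIGHTS = 2, SEC_TAXA = 4 };

/* Returns the byte offset at which the requested section starts.
   Each case adds the size of the section stored immediately before it
   and then falls through to the earlier cases. */
size_t section_offset(int section, size_t preamble_bytes,
                      size_t header_bytes, size_t weight_bytes)
{
  size_t to_skip = 0;

  switch(section)
    {
    case SEC_TAXA:              /* skips weights */
      to_skip += weight_bytes;
      /* fall through */
    case SEC_WEIGHTS:           /* skips header */
      to_skip += header_bytes;
      /* fall through */
    case SEC_HEAD:              /* skips the fixed preamble */
      to_skip += preamble_bytes;
      break;
    default:
      assert(0);
    }

  return to_skip;
}
```

Ordering the cases from the last section to the first lets each section's size appear exactly once, at the cost of depending on the case order matching the file layout.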
diff --git a/examl/byteFile.h b/examl/byteFile.h
new file mode 100644
index 0000000..c4306ae
--- /dev/null
+++ b/examl/byteFile.h
@@ -0,0 +1,60 @@
+#ifndef _BYTE_FILE
+#define _BYTE_FILE
+
+#include "axml.h"
+
+#include "partitionAssignment.h"
+
+#define ALN_HEAD 1 
+#define ALN_WEIGHTS 2 
+#define ALN_TAXA 4 
+#define ALN_PARTITIONS 8  
+#define ALN_ALIGNMENT 16 
+
+
+
+typedef struct 
+{
+  int numTax; 
+  size_t numPattern; 
+  int numPartitions; 
+  double gappyness; 
+  pInfo **partitions;
+  char **taxaNames; 
+  FILE *fh; 
+  char hasRead ; 
+} ByteFile; 
+
+/* 
+   constructor  
+*/ 
+void initializeByteFile(ByteFile **bf, char *name); 
+/* 
+   destructor 
+*/ 
+void deleteByteFile(ByteFile *bf) ; 
+/* 
+   reads the header of a byte file   
+*/  
+void readHeader(ByteFile* bf); 
+/* 
+   reads partition information in a byte file 
+*/ 
+void readPartitions(ByteFile *bf); 
+/* 
+   reads the taxon names in a byte file   
+*/
+void readTaxa(ByteFile *bf); 
+/* 
+   reads weights and alignment characters in a byte file 
+*/ 
+void readMyData(ByteFile *bf, PartitionAssignment *pa, int procId); 
+/*
+  initializes a tree from a byte file.  
+
+  @notice Since shallow copies are involved, you cannot copy the
+  information from a byte file into multiple tree instances.
+ */
+void initializeTreeFromByteFile(ByteFile *bf, tree *tr); 
+
+#endif
diff --git a/examl/communication.c b/examl/communication.c
new file mode 100644
index 0000000..d700edc
--- /dev/null
+++ b/examl/communication.c
@@ -0,0 +1,182 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+
+#include <mpi.h>
+
+#include "axml.h"
+
+
+extern int processes; 
+extern int processID; 
+
+
+
+/** 
+    computes the count and displacement for gatherv/scatterv, assuming
+    that the new partition assignment algorithm was used
+*/ 
+void calculateLengthAndDisplPerProcess(tree *tr, int **length_result, int **disp_result)
+{
+  int i; 
+
+  *length_result = (int*) calloc((size_t) processes , sizeof(int)); 
+  *disp_result = (int*) calloc((size_t) processes, sizeof(int)); 
+
+  int* numPerProc = *length_result; 
+  int* displPerProc= *disp_result;
+  
+  for(i = 0; i < tr->numAssignments; ++i)
+    {
+      Assign* ass = &(tr->partAssigns[i]); 
+      numPerProc[ass->procId] += ass->width; 
+    }
+
+  displPerProc[0] = 0; 
+  for(i = 1; i < processes  ; ++i)
+    displPerProc[i] = displPerProc[i-1] + numPerProc[i-1];   
+}
+
+
+static size_t mapMpiTypeToSize(MPI_Datatype type)
+{
+  if(type == MPI_INT)
+    return sizeof(int); 
+  else if(type == MPI_DOUBLE)
+    return sizeof(double); 
+  else 
+    {
+      assert(0); 
+      return 0; 
+    }
+}
+
+
+/** 
+    scatters a distributed array (e.g., what used to be
+    tr->rateCategory) to partition-specific arrays (e.g.,
+    tr->partitionData[i].rateCategory). 
+
+    This works, because tr->partitionData[i].rateCategory is a
+    non-owning pointer to a position in the global resource array
+    (e.g., tr->rateCategory_basePtr).
+*/ 
+void scatterDistrbutedArray(tree *tr, void *src, void *destination, MPI_Datatype type, int *countPerProc, int *displPerProc)
+{
+  int 
+    i; 
+  
+  size_t 
+    typeLen = mapMpiTypeToSize(type); 
+  
+  char 
+    *srcReordered = (char *)NULL; 
+
+  /* master must reorder the data   */
+  if(processID == 0)
+    {
+      srcReordered = (char *)malloc(tr->originalCrunchedLength * typeLen); 
+      int *seenPerProcesses = (int *)calloc((size_t) processes, sizeof(int)); 
+      
+      Assign *aIter = tr->partAssigns; 
+      Assign *aEnd = &(tr->partAssigns[ tr->numAssignments ] ); 
+
+      while(aIter != aEnd)
+	{
+	  pInfo *partition = &(tr->partitionData[ aIter->partitionId ]) ; 
+	  memcpy( srcReordered +  ( (size_t) displPerProc[aIter->procId] + (size_t) seenPerProcesses[aIter->procId] )  * typeLen , 
+		  ((char*) src) + (partition->lower + aIter->offset) * typeLen, 
+		  aIter->width * typeLen); 
+	  seenPerProcesses[aIter->procId] += aIter->width; 
+	  ++aIter; 
+	}
+
+      for(i = 0; i < processes; ++i)
+	assert(seenPerProcesses[i] == countPerProc[i]) ; 
+
+      free(seenPerProcesses); 
+    }
+  
+  MPI_Scatterv(srcReordered, countPerProc, displPerProc, type, destination, countPerProc[processID], type, 0, MPI_COMM_WORLD); 
+ 
+  /* after this scatter, every process already has the data correctly
+     ordered at its respective base pointer */
+ 
+  if(processID == 0)
+    free(srcReordered); 
+}
+
+
+/** 
+    gathers partition-specific arrays (e.g.,
+    tr->partitionData[i].rateCategory) into one global array (e.g.,
+    what used to be tr->rateCategory).
+
+    This works, because tr->partitionData[i].rateCategory is a
+    non-owning pointer to a position in the global resource array
+    (e.g., tr->rateCategory_basePtr).
+*/ 
+void gatherDistributedArray(tree *tr, void **destinationPtr, void *src, MPI_Datatype type, int* countPerProc, int *displPerProc)
+{
+  /* this is the raw array that the master will obtain from its
+     peers. Data in this array are ordered per process */
+  char 
+    *destinationUnordered = (char*)NULL; 
+  
+  char
+    *destination = (char*)NULL; 
+  
+  size_t
+    typeLen = mapMpiTypeToSize(type); 
+  
+  if(processID == 0)
+    {
+      //TODO one pointer is of type void the other of type char, not really nice
+      *destinationPtr = (void *)malloc( tr->originalCrunchedLength *  typeLen); 
+      destinationUnordered = (char *)malloc( tr->originalCrunchedLength * typeLen); 
+      destination = *destinationPtr; 
+    }
+  
+  MPI_Gatherv(src, countPerProc[processID], type, destinationUnordered, countPerProc, displPerProc, type,0 , MPI_COMM_WORLD ); 
+  
+  /*
+    here the master reorders the array it has obtained. Afterwards,
+    destinationPtr is a pointer to the array that contains the global
+    array that can be indexed by alignment position (i.e., if we have
+    gathered tr->partitionData[i].lhs, then *destinationPtr
+    corresponds to what previously was tr->lhs). This strongly couples
+    the respective distributed array to tr->partAssigns.
+   */ 
+  if(processID == 0)
+    {
+      int
+	i, 
+	*seenPerProcesses = (int*) calloc(processes, sizeof(int)); 
+
+      Assign
+	*aIter = tr->partAssigns; 
+
+      Assign
+	*aEnd = tr->partAssigns + tr->numAssignments; 
+
+      while(aIter != aEnd)
+	{
+	  pInfo
+	    *partition  = &(tr->partitionData[aIter->partitionId]); 
+
+	  memcpy(destination + (size_t) (partition->lower + aIter->offset) * typeLen,
+		 destinationUnordered +  (size_t) (displPerProc[aIter->procId] + seenPerProcesses[aIter->procId]) * typeLen ,
+		 typeLen * aIter->width);
+	  seenPerProcesses[aIter->procId] += aIter->width; 
+	  ++aIter  ; 
+	}
+      
+      /* check that everything has been reordered */
+      for(i = 0; i < processes; ++i)
+      	assert(seenPerProcesses[i] == countPerProc[i]);
+
+      free(seenPerProcesses); 
+      free(destinationUnordered);
+    }
+}
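Note on calculateLengthAndDisplPerProcess() in communication.c above: the per-process counts are the summed widths of all assignments owned by a process, and the displacements are the exclusive prefix sum of those counts, which is the layout MPI_Scatterv and MPI_Gatherv expect. The sketch below reproduces the arithmetic without MPI. DemoAssign and counts_and_displs are hypothetical names; the caller is assumed to provide both output arrays with one slot per process.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical, reduced stand-in for the assignment record */
typedef struct
{
  int procId;   /* owning process */
  int width;    /* number of site patterns in this assignment */
} DemoAssign;

/* Fills counts[p] with the number of elements owned by process p and
   displs[p] with the index of its first element in the reordered array. */
void counts_and_displs(const DemoAssign *assigns, size_t n_assigns,
                       int processes, int *counts, int *displs)
{
  size_t i;
  int p;

  assert(processes > 0);

  for(p = 0; p < processes; p++)
    counts[p] = 0;

  for(i = 0; i < n_assigns; i++)
    {
      assert(0 <= assigns[i].procId && assigns[i].procId < processes);
      counts[assigns[i].procId] += assigns[i].width;
    }

  /* exclusive prefix sum: each block starts where the previous one ends */
  displs[0] = 0;
  for(p = 1; p < processes; p++)
    displs[p] = displs[p - 1] + counts[p - 1];
}
```

For the assignments (proc 0, width 3), (proc 1, width 2), (proc 0, width 4), (proc 2, width 5) on three processes, the counts come out as 7, 2, 5 and the displacements as 0, 7, 9.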
diff --git a/examl/evaluateGenericSpecial.c b/examl/evaluateGenericSpecial.c
new file mode 100644
index 0000000..27388f5
--- /dev/null
+++ b/examl/evaluateGenericSpecial.c
@@ -0,0 +1,2083 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein. 
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32 
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+/* the set of functions in here computes the log likelihood at a given branch (the virtual root of a tree) */
+
+/* includes for using SSE3 intrinsics */
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+/*#include <tmmintrin.h>*/
+#endif
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+
+/* 
+   global variables of pthreads version, reductionBuffer is the global array 
+   that is used for implementing deterministic reduction operations, that is,
+   the total log likelihood over the partial log likelihoods for the sites that each thread has computed 
+   
+   NumberOfThreads is just the number of threads.
+
+   Note the volatile modifier here, that guarantees that the compiler will not do weird optimizations 
+   or rearrangements of the code accessing those variables, because it does not know that several concurrent threads 
+   will access those variables simultaneously 
+*/
+
+
+extern const char inverseMeaningDNA[16];
+extern int processID;
+
+/* a pre-computed 32-bit integer mask */
+
+extern const unsigned int mask32[32];
+
+/* the function below computes the P matrix from the decomposition of the Q matrix and the respective rate categories for a single partition */
+   
+
+static void calcDiagptable(const double z, const int states, const int numberOfCategories, const double *rptr, const double *EIGN, double *diagptable)
+{
+  int 
+    i, 
+    l;
+  
+  double 
+    lz,
+    *lza = (double *)malloc(sizeof(double) * states);
+
+  /* transform the root branch length to the log and check if it is not too small */
+
+  if (z < zmin) 
+    lz = log(zmin);
+  else
+    lz = log(z);
+  
+  /* do some pre-computations to avoid redundant computations further below */
+
+  for(i = 0; i < states; i++)      
+    lza[i] = EIGN[i] * lz; 
+
+  /* loop over the number of per-site or discrete gamma rate categories */
+
+  for(i = 0; i < numberOfCategories; i++)
+    {	      	       
+      /* 
+	 diagptable is a pre-allocated array of doubles that stores the P-Matrix 
+	 the first entry is always 1.0 
+       */
+      diagptable[i * states] = 1.0;
+
+      /* compute the P matrix for all remaining states of the model */
+
+      for(l = 1; l < states; l++)
+	diagptable[i * states + l] = EXP(rptr[i] * lza[l]);
+    }
+  
+  free(lza);
+}
+
+
+static void calcDiagptableFlex_LG4(double z, int numberOfCategories, double *rptr, double *EIGN[4], double *diagptable, const int numStates)
+{
+  int 
+    i, 
+    l;
+  
+  double 
+    lz;
+  
+  assert(numStates <= 64);
+  
+  if (z < zmin) 
+    lz = log(zmin);
+  else
+    lz = log(z);
+
+  for(i = 0; i <  numberOfCategories; i++)
+    {	      	       
+      diagptable[i * numStates + 0] = 1.0;
+
+      for(l = 1; l < numStates; l++)
+	diagptable[i * numStates + l] = EXP(rptr[i] * EIGN[i][l] * lz);     	          
+    }        
+}
+
+
+
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+/* below is a slow generic implementation of the likelihood computation at the root under the GAMMA model */
+
+static double evaluateGAMMA_FLEX(int *wptr,
+				 double *x1_start, double *x2_start, 
+				 double *tipVector, 
+				 unsigned char *tipX1, const int n, double *diagptable, const int states)
+{
+  double   
+    sum = 0.0, 
+    term,
+    *x1,
+    *x2;
+
+  int     
+    i, 
+    j,
+    k;
+
+  /* span is the offset within the likelihood array at an inner node that gets us from the values 
+     of site i to the values of site i + 1 */
+
+  const int 
+    span = states * 4;
+
+  /* we distinguish between two cases here: either one of the two nodes defining the branch at which we put the virtual root is 
+     a tip, or both are inner nodes. Both nodes cannot be tips because we do not allow for two-taxon trees ;-) 
+     Note that, if a node is a tip, this will always be tipX1. This is done for code simplicity; the flipping of the nodes
+     is done beforehand, when we compute the traversal descriptor.     
+  */
+
+  /* the left node is a tip */
+  if(tipX1)
+    {          	
+      /* loop over the sites of this partition */
+      for (i = 0; i < n; i++)
+	{
+	  /* access pre-computed tip vector values via a lookup table */
+	  x1 = &(tipVector[states * tipX1[i]]);	 
+	  /* access the other (inner) node at the other end of the branch */
+	  x2 = &(x2_start[span * i]);	 
+	  
+	  /* loop over GAMMA rate categories, hard-coded as 4 in RAxML */
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    /* loop over states and multiply them with the P matrix */
+	    for(k = 0; k < states; k++)
+	      term += x1[k] * x2[j * states + k] * diagptable[j * states + k];	          	  	  	    	    	  
+	  	 
+	  /* take the log of the likelihood and multiply the per-gamma rate likelihood by 1/4.
+	     Under the GAMMA model the 4 discrete GAMMA rates all have the same probability 
+	     of 0.25 */
+
+	  term = LOG(0.25 * FABS(term));
+	 	 	  
+	  sum += wptr[i] * term;
+	}     
+    }
+  else
+    {        
+      for (i = 0; i < n; i++) 
+	{
+	  /* same as before, only that now we access two inner likelihood vectors x1 and x2 */
+	  	 	  	  
+	  x1 = &(x1_start[span * i]);
+	  x2 = &(x2_start[span * i]);	  	  
+	
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    for(k = 0; k < states; k++)
+	      term += x1[j * states + k] * x2[j * states + k] * diagptable[j * states + k];
+	          	  	  	      	  
+	  term = LOG(0.25 * FABS(term));
+	  	  
+	  sum += wptr[i] * term;
+	}                      	
+    }
+
+  return sum;
+} 
+
+
+/* a generic and slow implementation of the CAT model of rate heterogeneity */
+
+static double evaluateCAT_FLEX (int *cptr, int *wptr,
+				double *x1, double *x2, double *tipVector,
+				unsigned char *tipX1, int n, double *diagptable_start, const int states)
+{
+  double   
+    sum = 0.0, 
+    term,
+    *diagptable,  
+    *left, 
+    *right;
+  
+  int     
+    i, 
+    l;                           
+  
+  /* choosing between tip vectors and non-tip vectors is identical in all flavors of this function, regardless 
+     of whether we are using CAT, GAMMA, DNA or protein data etc */
+
+  if(tipX1)
+    {                 
+      for (i = 0; i < n; i++) 
+	{
+	  /* same as in the GAMMA implementation */
+	  left = &(tipVector[states * tipX1[i]]);
+	  right = &(x2[states * i]);
+	  
+	  /* important difference here, we do not have, as for GAMMA 
+	     4 P matrices assigned to each site, but just one. However those 
+	     P-Matrices can be different for the sites.
+	     Hence we index into the precalculated P-matrices for individual sites 
+	     via the category pointer cptr[i]
+	  */
+	  diagptable = &diagptable_start[states * cptr[i]];	           	 
+
+	  /* similar to gamma, with the only difference that we do not integrate (sum)
+	     over the discrete gamma rates, but simply compute the likelihood of the 
+	     site and the given P-matrix */
+
+	  for(l = 0, term = 0.0; l < states; l++)
+	    term += left[l] * right[l] * diagptable[l];	 	  	   
+	  
+	  /* take the log */
+
+	  term = LOG(FABS(term));
+	  	  
+	  /* 
+	     multiply the log with the pattern weight of this site. 
+	     The site pattern for which we just computed the likelihood may 
+	     represent several alignment columns that have been compressed 
+	     into one site pattern if they are exactly identical AND evolve under the same model,
+	     i.e., form part of the same partition.
+	  */	   	     
+
+	  sum += wptr[i] * term;
+	}      
+    }    
+  else
+    {    
+      for (i = 0; i < n; i++) 
+	{	
+	  /* as before we now access the likelihood arrays of two inner nodes */
+	  left  = &x1[states * i];
+	  right = &x2[states * i];
+	  
+	  diagptable = &diagptable_start[states * cptr[i]];	  	
+
+	  for(l = 0, term = 0.0; l < states; l++)
+	    term += left[l] * right[l] * diagptable[l];	
+	  
+	  term = LOG(FABS(term));	 
+	  
+	  sum += wptr[i] * term;      
+	}
+    }
+             
+  return  sum;         
+} 
+
+#endif
+
+/* below are the function headers for unreadable highly optimized versions of the above functions 
+   for DNA and protein data that also use SSE3 intrinsics and implement some memory saving tricks.
+   The actual functions can be found at the end of this source file. 
+   All other likelihood function implementation files:
+
+   newviewGenericSpecial.c
+   makenewzGenericSpecial.c
+   evaluatePartialGenericSpecial.c
+
+   are also structured like this 
+
+   To decide which set of function implementations to use you will have to undefine or define _OPTIMIZED_FUNCTIONS 
+   in the Makefile 
+*/
+   
+
+#ifdef _OPTIMIZED_FUNCTIONS
+static double evaluateGTRGAMMA_BINARY(int *ex1, int *ex2, int *wptr,
+                                      double *x1_start, double *x2_start, 
+                                      double *tipVector, 
+                                      unsigned char *tipX1, const int n, double *diagptable, const boolean fastScaling);
+
+static double evaluateGTRCAT_BINARY (int *ex1, int *ex2, int *cptr, int *wptr,
+                                     double *x1_start, double *x2_start, double *tipVector,                   
+                                     unsigned char *tipX1, int n, double *diagptable_start, const boolean fastScaling);
+
+static double evaluateGTRGAMMAPROT_LG4(int *ex1, int *ex2, int *wptr,
+				       double *x1, double *x2,  
+				       double *tipVector[4], 
+				       unsigned char *tipX1, int n, double *diagptable, const boolean fastScaling, double *weights);
+
+/* GAMMA for proteins with memory saving */
+
+static double evaluateGTRGAMMAPROT_GAPPED_SAVE (int *wptr,
+						double *x1, double *x2,  
+						double *tipVector, 
+						unsigned char *tipX1, int n, double *diagptable, 
+						double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+
+/* GAMMA for proteins */
+
+static double evaluateGTRGAMMAPROT (int *wptr,
+				    double *x1, double *x2,  
+				    double *tipVector, 
+				    unsigned char *tipX1, int n, double *diagptable);
+
+/* CAT for proteins */
+
+static double evaluateGTRCATPROT (int *cptr, int *wptr,
+				  double *x1, double *x2, double *tipVector,
+				  unsigned char *tipX1, int n, double *diagptable_start);
+
+
+/* CAT for proteins with memory saving */
+
+static double evaluateGTRCATPROT_SAVE (int *cptr, int *wptr,
+				       double *x1, double *x2, double *tipVector,
+				       unsigned char *tipX1, int n, double *diagptable_start, 
+				       double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+/* analogous DNA functions */
+
+static double evaluateGTRCAT_SAVE (int *cptr, int *wptr,
+				   double *x1_start, double *x2_start, double *tipVector, 		      
+				   unsigned char *tipX1, int n, double *diagptable_start,
+				   double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static double evaluateGTRGAMMA_GAPPED_SAVE(int *wptr,
+					   double *x1_start, double *x2_start, 
+					   double *tipVector, 
+					   unsigned char *tipX1, const int n, double *diagptable,
+					   double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static double evaluateGTRGAMMA(int *wptr,
+			       double *x1_start, double *x2_start, 
+			       double *tipVector, 
+			       unsigned char *tipX1, const int n, double *diagptable);
+
+
+static double evaluateGTRCAT (int *cptr, int *wptr,
+			      double *x1_start, double *x2_start, double *tipVector, 		      
+			      unsigned char *tipX1, int n, double *diagptable_start);
+
+
+#endif
+
+
+/* This is the core function for computing the log likelihood at a branch */
+
+void evaluateIterative(tree *tr)
+{
+  /* the branch lengths and node indices of the virtual root branch are always the first entries 
+     stored in the traversal array, the central data structure that describes a partial or full tree traversal */
+
+  /* get the branch length at the root */
+  double 
+    *pz = tr->td[0].ti[0].qz;   
+
+  /* get the node number of the node to the left and right of the branch that defines the virtual rooting */
+
+  int    
+    pNumber = tr->td[0].ti[0].pNumber, 
+    qNumber = tr->td[0].ti[0].qNumber;
+ 
+  /* before we can compute the likelihood at the virtual root, we need to do a partial or full tree traversal to compute 
+     the conditional likelihood vectors as specified in the traversal descriptor. Keeping this traversal descriptor consistent 
+     is unfortunately the responsibility of the user. This is tricky if, as planned here, we use a rooted view (described somewhere in Felsenstein's book)
+     of the conditional vectors with respect to the tree
+  */
+     
+  /* iterate over all valid entries in the traversal descriptor */
+  newviewIterative(tr, 1);
+
+  int 
+    m;
+
+#ifdef _USE_OMP
+#pragma omp parallel for
+#endif
+  for(m = 0; m < tr->NumberOfModels; m++)
+    {
+      /* check if this partition has to be processed now - otherwise no need to compute P matrix */
+	if(!tr->td[0].executeModel[m] || tr->partitionData[m].width == 0)
+	  continue;
+
+	int
+	  categories,
+	  states = tr->partitionData[m].states;
+
+	double
+	  z,
+	  *rateCategories,
+	  *diagptable = tr->partitionData[m].left;
+
+	/* if we are using a per-partition branch length estimate, the branch has an index, otherwise, for a joint branch length
+	   estimate over all partitions we just use the branch length value with index 0 */
+	if(tr->numBranches > 1)
+	  z = pz[m];
+	else
+	  z = pz[0];
+
+
+	  /*
+	     figure out if we are using the CAT or GAMMA model of rate heterogeneity
+	     and set pointers to the rate heterogeneity rate arrays and also set the
+	     number of distinct rate categories appropriately.
+
+	     Under GAMMA this is constant and hard-coded as 4, whereas under CAT
+	     the number of site-wise rate categories can vary in the course of computations
+	     up to a user defined maximum value of site categories (default: 25)
+	   */
+	if(tr->rateHetModel == CAT)
+	  {
+	    rateCategories = tr->partitionData[m].perSiteRates;
+	    categories = tr->partitionData[m].numberOfCategories;
+	  }
+	else
+	  {
+	    rateCategories = tr->partitionData[m].gammaRates;
+	    categories = 4;
+	  }
+
+	if(tr->partitionData[m].protModels == LG4M || tr->partitionData[m].protModels == LG4X)
+	  calcDiagptableFlex_LG4(z, 4, tr->partitionData[m].gammaRates, tr->partitionData[m].EIGN_LG4, diagptable, 20);
+	else
+	  calcDiagptable(z, states, categories, rateCategories, tr->partitionData[m].EIGN, diagptable);
+    }
+
+  /* after the newviewIterative() call above we are sure that we have properly and consistently computed the 
+     conditionals to the right and left of the virtual root and we can now invoke 
+     the log likelihood computation */
+
+  /* we need to loop over all partitions. Note that we may have a mix of DNA, protein, binary data, etc. partitions */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+  {
+    int
+      m,
+      model,
+      maxModel;
+
+#ifdef _USE_OMP
+    maxModel = tr->maxModelsPerThread;
+#else
+    maxModel = tr->NumberOfModels;
+#endif
+
+  for(m = 0; m < maxModel; m++)
+    {    
+      /* just defaults -> if partition wasn't assigned to this thread, it will be ignored later on */
+      size_t
+	width = 0,
+	offset = 0;
+
+      double
+	*diagptable     = (double*)NULL,
+	*perPartitionLH = (double*)NULL;
+
+      unsigned int
+	*globalScaler = (unsigned int*)NULL;
+
+
+#ifdef _USE_OMP
+    	  int
+    	    tid = omp_get_thread_num();
+
+    	  /* check if this thread should process this partition */
+    	  Assign* 
+	    pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+    	  if(pAss)
+	    {
+	      model  = pAss->partitionId;
+	      width  = pAss->width;
+	      offset = pAss->offset;
+	      
+	      assert(model < tr->NumberOfModels);
+	      
+	      diagptable = tr->partitionData[model].left;
+	      globalScaler = tr->partitionData[model].threadGlobalScaler[tid];
+	      perPartitionLH = &tr->partitionData[model].reductionBuffer[tid];
+	    }
+    	  else
+    	    break;
+	  
+#else
+    	  model = m;
+
+    	  /* number of sites in this partition */
+	  width  = (size_t)tr->partitionData[model].width;
+	  offset = 0;
+
+	  /* set this pointer to the memory area where space has been reserved a priori for storing the
+	     P matrix at the root */
+	  diagptable = tr->partitionData[model].left;
+	  globalScaler = tr->partitionData[model].globalScaler;
+	  perPartitionLH = &tr->perPartitionLH[model];
+#endif
+
+        
+      /* 
+	 Important part of the traversal descriptor: 
+	 figure out if we need to recalculate the likelihood of this 
+	 partition: 
+
+	 The reasons why this is important in terms of performance are given in the 
+	 following paper, which you should actually read:
+	 
+	 A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009, accepted for publication, Vienna, Austria, September 2009
+	 
+	 The width > 0 check tests whether, under the cyclic distribution of per-partition sites to threads, this thread actually holds a site 
+	 of the current partition.
+
+       */
+
+      if(tr->td[0].executeModel[model] && width > 0)
+	{	
+	  int 
+	    rateHet = (int)discreteRateCategories(tr->rateHetModel),
+	    
+	    /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+	    states = tr->partitionData[model].states,
+
+	    /* span for single alignment site (in doubles!) */
+	    span = rateHet * states;
+
+	  size_t
+	    /* offset for current thread's data in global xVector (in doubles!) */
+	    x_offset = offset * (size_t)span;
+
+	  int
+	    /* integer weight vector with pattern compression weights */
+	    *wgt = tr->partitionData[model].wgt + offset,
+
+	    /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+	    *rateCategory = tr->partitionData[model].rateCategory + offset;
+	  
+	  double 
+	    partitionLikelihood = 0.0, 	 
+	    *weights = tr->partitionData[model].weights,
+	    *x1_start   = (double*)NULL, 
+	    *x2_start   = (double*)NULL,
+	    *x1_gapColumn = (double*)NULL,
+	    *x2_gapColumn = (double*)NULL;
+	  	    	 	  
+	  unsigned int
+	    *x1_gap = (unsigned int*)NULL,
+	    *x2_gap = (unsigned int*)NULL;	 
+	  
+	  unsigned char 
+	    *tip = (unsigned char*)NULL;	  
+
+	  /* figure out if we need to address tip vectors (a char array that indexes into a precomputed tip likelihood 
+	     value array) or if we need to address inner vectors */
+
+	  /* either node p or node q is a tip */
+	  
+	  if(isTip(pNumber, tr->mxtips) || isTip(qNumber, tr->mxtips))
+	    {	        	    
+	      /* q is a tip */
+
+	      if(isTip(qNumber, tr->mxtips))
+		{	
+		  /* get the start address of the inner likelihood vector x2 for partition model,
+		     note that inner nodes are enumerated/indexed starting at 0 to save allocating some 
+		     space for additional pointers */
+		  		 
+		  x2_start = tr->partitionData[model].xVector[pNumber - tr->mxtips -1] + x_offset;
+
+		  /* get the corresponding tip vector */
+
+		  tip      = tr->partitionData[model].yVector[qNumber] + offset;
+
+		  /* memory saving stuff, let's deal with this later or ask Fernando ;-) */
+
+		  if(tr->saveMemory)
+		    {
+		      x2_gap         = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+		      x2_gapColumn   = &(tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet]);
+		    }
+		}           
+	      else
+		{	
+		  /* p is a tip, same as above */
+	 
+		  x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+		  tip = tr->partitionData[model].yVector[pNumber] + offset;
+
+		  if(tr->saveMemory)
+		    {
+		      x2_gap         = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+		      x2_gapColumn   = &(tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet]);
+		    }
+
+		}
+	    }
+	  else
+	    {  
+	 
+	      /* neither p nor q is a tip, hence we need to get the addresses of two inner vectors */
+    
+	      x1_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+	      x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+
+	      /* memory saving option */
+
+	      if(tr->saveMemory)
+		{
+		  x1_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+		  x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+		  x1_gapColumn   = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet];
+		  x2_gapColumn   = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet];
+		}
+	
+	    }
+	  
+#ifndef _OPTIMIZED_FUNCTIONS
+
+	  /* generic slow functions, memory saving option is not implemented for these */
+
+	  assert(!tr->saveMemory);
+
+	  /* decide whether CAT or GAMMA is used and compute the log likelihood */
+
+	  if(tr->rateHetModel == CAT)
+	     partitionLikelihood = evaluateCAT_FLEX(tr->partitionData[model].rateCategory, wgt,
+						    x1_start, x2_start, tr->partitionData[model].tipVector, 
+						    tip, width, diagptable, states);
+	  else
+	    partitionLikelihood = evaluateGAMMA_FLEX(wgt,
+						     x1_start, x2_start, tr->partitionData[model].tipVector,
+						     tip, width, diagptable, states);
+#else
+
+	  /* the optimized code path has a dedicated function implementation 
+	     for each combination of rate heterogeneity model and data type; we switch over the number of states 
+	     and the rate heterogeneity model */
+	  
+	  switch(states)
+	    { 	  
+	    case 2:
+#ifdef __MIC_NATIVE
+ 	      assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+	      assert(!tr->saveMemory);
+	      if(tr->rateHetModel == CAT)
+		partitionLikelihood = evaluateGTRCAT_BINARY((int *)NULL, (int *)NULL, rateCategory, wgt,
+				      x1_start, x2_start, tr->partitionData[model].tipVector, 
+				      tip, width, diagptable, TRUE);
+	      else				  
+		partitionLikelihood = evaluateGTRGAMMA_BINARY((int *)NULL, (int *)NULL, wgt,
+							     x1_start, x2_start, 
+							     tr->partitionData[model].tipVector,
+							     tip, width, diagptable, TRUE);	      	      
+#endif
+	      break;
+	    case 4: /* DNA */
+	      {
+		if(tr->rateHetModel == CAT)
+		  {		  		  
+		    if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+		      partitionLikelihood =  evaluateGTRCAT_SAVE(rateCategory, wgt,
+								 x1_start, x2_start, tr->partitionData[model].tipVector, 
+								 tip, width, diagptable, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		    else
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+		      partitionLikelihood =  evaluateGTRCAT(rateCategory, wgt,
+							    x1_start, x2_start, tr->partitionData[model].tipVector, 
+							    tip, width, diagptable);
+#endif
+		  }
+		else
+		  {		
+		    if(tr->saveMemory)		   
+#ifdef __MIC_NATIVE
+ 		      assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+		      partitionLikelihood =  evaluateGTRGAMMA_GAPPED_SAVE(wgt,
+									  x1_start, x2_start, tr->partitionData[model].tipVector,
+									  tip, width, diagptable,
+									  x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		    else
+#ifdef __MIC_NATIVE
+              partitionLikelihood = evaluateGAMMA_MIC(wgt,
+	                                 x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+	                                 tip, width, diagptable);
+#else
+		      partitionLikelihood =  evaluateGTRGAMMA(wgt,
+							      x1_start, x2_start, tr->partitionData[model].tipVector,
+							      tip, width, diagptable);
+#endif
+		  }
+	      }
+	      break;	  	   		   
+	    case 20: /* proteins */
+	      {
+		if(tr->rateHetModel == CAT)
+		  {		   		  
+		    if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+		      partitionLikelihood = evaluateGTRCATPROT_SAVE(rateCategory, wgt,
+								    x1_start, x2_start, tr->partitionData[model].tipVector,
+								    tip, width, diagptable,  x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		    else
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+		      partitionLikelihood = evaluateGTRCATPROT(rateCategory, wgt,
+							       x1_start, x2_start, tr->partitionData[model].tipVector,
+							       tip, width, diagptable);
+#endif
+		  }
+		else
+		  {		    		    		      
+		    if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ 		      assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+		      partitionLikelihood = evaluateGTRGAMMAPROT_GAPPED_SAVE(wgt,
+									     x1_start, x2_start, tr->partitionData[model].tipVector,
+									     tip, width, diagptable,
+									     x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		    else
+		      {
+			if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+#ifdef __MIC_NATIVE
+			 partitionLikelihood = evaluateGAMMAPROT_LG4_MIC(wgt,
+                               x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+                               tip, width, diagptable, weights);
+#else
+			  partitionLikelihood =  evaluateGTRGAMMAPROT_LG4((int *)NULL, (int *)NULL, wgt,
+									  x1_start, x2_start, tr->partitionData[model].tipVector_LG4,
+									  tip, width, diagptable, TRUE, weights);
+#endif
+			else
+#ifdef __MIC_NATIVE
+	            partitionLikelihood = evaluateGAMMAPROT_MIC(wgt,
+	                               x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+	                               tip, width, diagptable);
+#else
+			  partitionLikelihood = evaluateGTRGAMMAPROT(wgt,
+								     x1_start, x2_start, tr->partitionData[model].tipVector,
+								     tip, width, diagptable);
+#endif
+		      }
+		  }
+	      }
+	      break;	      		    
+	    default:
+	      assert(0);	    
+	    }	
+#endif
+	  
+	  /* now here is a nasty part: for each partition and each node we maintain an integer counter of 
+	     how many entries per node were scaled by a constant factor. Here we use this information, generated during Felsenstein's 
+	     pruning algorithm by the newview() functions, to undo the preceding scaling multiplications at the root. For mathematical details 
+	     you should actually read:
+
+	     A. Stamatakis: "Orchestrating the Phylogenetic Likelihood Function on Emerging Parallel Architectures". 
+	     In B. Schmidt, editor, Bioinformatics: High Performance Parallel Computer Architectures, 85-115, CRC Press, Taylor & Francis, 2010.
+
+	     There's a copy of this book in my office 
+	  */
+
+	  partitionLikelihood += (globalScaler[pNumber] + globalScaler[qNumber]) * LOG(minlikelihood);
+
+	  /* check that there was no major numerical screw-up, the log likelihood should be < 0.0 always */
+
+	  
+
+	  assert(partitionLikelihood < 0.0);
+
+	  /* now we have the correct log likelihood for the current partition after undoing scaling multiplications */	  	 
+	  
+	  /* finally, we also store the per partition log likelihood which is important for optimizing the alpha parameter 
+	     of this partition for example */
+
+	  *perPartitionLH = partitionLikelihood;
+	}
+      else
+	{
+	  /* if the current thread does not have a single site of this partition
+	     it is important to set the per-partition log likelihood to 0.0 because 
+	     of the reduction operation that will take place later on.
+	     That is, the values of tr->perPartitionLH across all threads 
+	     must always be in a consistent state!
+	  */
+
+	  if(width == 0)	    
+	    *perPartitionLH = 0.0;
+	  else
+	    {
+	      assert(tr->td[0].executeModel[model] == FALSE && *perPartitionLH < 0.0);
+	    }
+	}
+    }  /* for model */
+  }  /* OMP parallel */
+
+
+#ifdef _USE_OMP
+  /* perform reduction of per-partition LH scores */
+  int
+    model,
+    t;
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+     if (!tr->td[0].executeModel[model])
+       continue;
+
+      tr->perPartitionLH[model] = 0.0;
+      for(t = 0; t < tr->maxThreadsPerModel; t++)
+	{
+	  Assign*
+	    pAss = tr->partThreadAssigns[model * tr->maxThreadsPerModel + t];
+
+	  if (pAss)
+	    {
+	      int
+		tid = pAss->procId;
+
+	      tr->perPartitionLH[model] += tr->partitionData[model].reductionBuffer[tid];
+	    }
+	}
+    }
+#endif
+}
+
+
+
+
+void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal)
+{
+  /* now this may be the entry point of the library to compute 
+     the log likelihood at a branch defined by p and p->back == q */
+
+  volatile double 
+    result = 0.0;
+  
+  nodeptr 
+    q = p->back; 
+  
+  int 
+    i,
+    model;
+
+ 
+  /* set the first entry of the traversal descriptor to contain the indices
+     of nodes p and q */
+
+  tr->td[0].ti[0].pNumber = p->number;
+  tr->td[0].ti[0].qNumber = q->number;          
+  
+  /* copy the branch lengths of the tree into the first entry of the traversal descriptor.
+     If -M is not used, tr->numBranches must be 1 */
+
+  for(i = 0; i < tr->numBranches; i++)    
+    tr->td[0].ti[0].qz[i] =  q->z[i];
+  
+  /* now compute how many conditionals must be re-computed/re-oriented by newview
+     to be able to calculate the likelihood at the root defined by p and q.
+  */
+
+  /* one entry in the traversal descriptor is already used, hence set the traversal length counter to 1 */
+  tr->td[0].count = 1;
+
+  /* do we need to recompute any of the vectors at or below p ? */
+  
+  if(fullTraversal)
+    { 
+      assert(isTip(p->number, tr->mxtips));
+      computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, FALSE);     
+    }
+  else
+    {
+      if(!p->x)
+	computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+
+      /* recompute/reorient any descriptors at or below q ? 
+	 computeTraversalInfo computes and stores the newview() to be executed for the traversal descriptor */
+      
+      if(!q->x)
+	computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);  
+    }
+   
+      /* now we copy the partition execute mask into the traversal descriptor; the mask must come from the 
+	 calling program, the logic of this should not form part of the library */
+
+  storeExecuteMaskInTraversalDescriptor(tr);  
+  
+  /* also store in the traversal descriptor that something has changed i.e., in the parallel case that the 
+     traversal descriptor list of nodes needs to be broadcast once again */
+  
+  tr->td[0].traversalHasChanged = TRUE;
+
+
+  evaluateIterative(tr);  
+  
+  {
+    double 
+      *recv = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+    
+#ifdef _USE_ALLREDUCE   
+    MPI_Allreduce(tr->perPartitionLH, recv, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
+#else
+    MPI_Reduce(tr->perPartitionLH, recv, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+    MPI_Bcast(recv, tr->NumberOfModels, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+#endif
+    
+    memcpy(tr->perPartitionLH, recv, tr->NumberOfModels * sizeof(double));
+
+    for(model = 0; model < tr->NumberOfModels; model++)        
+      result += tr->perPartitionLH[model];
+         
+    free(recv);
+  }
+
+
+  /* set the tree data structure likelihood value to the total likelihood */
+
+  tr->likelihood = result;    
+  
+  /* 
+     MPI_Barrier(MPI_COMM_WORLD);
+     printf("Process %d likelihood: %f\n", processID, tr->likelihood);
+     MPI_Barrier(MPI_COMM_WORLD);
+  */
+
+  /* do some bookkeeping to have traversalHasChanged in a consistent state */
+
+  tr->td[0].traversalHasChanged = FALSE;  
+
+  
+  
+ 
+}
+
+
+
+
+
+
+
+/* below are the optimized function versions with geeky intrinsics */
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/* binary data */
+
+static double evaluateGTRCAT_BINARY (int *ex1, int *ex2, int *cptr, int *wptr,
+                                     double *x1_start, double *x2_start, double *tipVector,                   
+                                     unsigned char *tipX1, int n, double *diagptable_start, const boolean fastScaling)
+{
+  double  sum = 0.0, term;       
+  int     i;
+  double  *diagptable, *x1, *x2;                            
+ 
+  if(tipX1)
+    {          
+      for (i = 0; i < n; i++) 
+        {
+          double 
+	    t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+          x1 = &(tipVector[2 * tipX1[i]]);
+          x2 = &(x2_start[2 * i]);
+          
+          diagptable = &(diagptable_start[2 * cptr[i]]);                          
+        
+
+          _mm_store_pd(t, _mm_mul_pd(_mm_load_pd(x1), _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(diagptable))));
+          
+          if(fastScaling)
+            term = log(fabs(t[0] + t[1]));
+          else
+            term = log(fabs(t[0] + t[1])) + (ex2[i] * log(minlikelihood));                           
+
+          sum += wptr[i] * term;
+        }       
+    }               
+  else
+    {
+      for (i = 0; i < n; i++) 
+        {       
+          double 
+	    t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));                                 
+            
+          x1 = &x1_start[2 * i];
+          x2 = &x2_start[2 * i];
+          
+          diagptable = &diagptable_start[2 * cptr[i]];            
+
+          _mm_store_pd(t, _mm_mul_pd(_mm_load_pd(x1), _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(diagptable))));
+          
+          if(fastScaling)
+            term = log(fabs(t[0] + t[1]));
+          else
+            term = log(fabs(t[0] + t[1])) + ((ex1[i] + ex2[i]) * log(minlikelihood));                        
+
+          
+          sum += wptr[i] * term;
+        }          
+    }
+       
+  return  sum;         
+} 
+
+
+static double evaluateGTRGAMMA_BINARY(int *ex1, int *ex2, int *wptr,
+                                      double *x1_start, double *x2_start, 
+                                      double *tipVector, 
+                                      unsigned char *tipX1, const int n, double *diagptable, const boolean fastScaling)
+{
+  double   sum = 0.0, term;    
+  int     i, j;  
+  double  *x1, *x2;             
+
+  if(tipX1)
+    {          
+      for (i = 0; i < n; i++)
+        {
+          double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+          __m128d termv, x1v, x2v, dv;
+	  
+          x1 = &(tipVector[2 * tipX1[i]]);       
+          x2 = &x2_start[8 * i];                                
+
+          termv = _mm_set1_pd(0.0);                
+          
+          for(j = 0; j < 4; j++)
+            {
+              x1v = _mm_load_pd(&x1[0]);
+              x2v = _mm_load_pd(&x2[j * 2]);
+              dv   = _mm_load_pd(&diagptable[j * 2]);
+              
+              x1v = _mm_mul_pd(x1v, x2v);
+              x1v = _mm_mul_pd(x1v, dv);
+              
+              termv = _mm_add_pd(termv, x1v);                 
+            }
+          
+          _mm_store_pd(t, termv);               
+          
+          if(fastScaling)
+            term = log(0.25 * (fabs(t[0] + t[1])));
+          else
+            term = log(0.25 * (fabs(t[0] + t[1]))) + (ex2[i] * log(minlikelihood));       
+ 
+          
+          sum += wptr[i] * term;
+        }         
+    }
+  else
+    {         
+      for (i = 0; i < n; i++) 
+        {
+
+          double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+          __m128d termv, x1v, x2v, dv;
+                        
+          x1 = &x1_start[8 * i];
+          x2 = &x2_start[8 * i];
+                  
+
+          termv = _mm_set1_pd(0.0);                
+          
+          for(j = 0; j < 4; j++)
+            {
+              x1v = _mm_load_pd(&x1[j * 2]);
+              x2v = _mm_load_pd(&x2[j * 2]);
+              dv   = _mm_load_pd(&diagptable[j * 2]);
+              
+              x1v = _mm_mul_pd(x1v, x2v);
+              x1v = _mm_mul_pd(x1v, dv);
+              
+              termv = _mm_add_pd(termv, x1v);                 
+            }
+          
+          _mm_store_pd(t, termv);
+          
+          
+          if(fastScaling)
+            term = log(0.25 * (fabs(t[0] + t[1])));
+          else
+            term = log(0.25 * (fabs(t[0] + t[1]))) + ((ex1[i] +ex2[i]) * log(minlikelihood));     
+
+
+          sum += wptr[i] * term;
+        }                       
+    }
+
+  return sum;
+} 
+
+
+/* binary data end */
+
+
+static double evaluateGTRGAMMAPROT_LG4(int *ex1, int *ex2, int *wptr,
+				       double *x1, double *x2,  
+				       double *tipVector[4], 
+				       unsigned char *tipX1, int n, double *diagptable, const boolean fastScaling, double *weights)
+{
+  double   sum = 0.0, term;        
+  int     i, j, l;   
+  double  *left, *right;              
+  
+  if(tipX1)
+    {               
+      for (i = 0; i < n; i++) 
+	{
+#ifdef __SIM_SSE3  	  
+	  __m128d 
+	    tv = _mm_setzero_pd();
+	      	 	  	 
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double 
+		*d = &diagptable[j * 20];
+
+	      __m128d 
+		t = _mm_setzero_pd(),
+		w = _mm_set1_pd(weights[j]);
+	      
+	      
+	      left = &(tipVector[j][20 * tipX1[i]]);
+	      right = &(x2[80 * i + 20 * j]);
+	      
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  t = _mm_add_pd(t, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}
+	      
+	      tv = _mm_add_pd(tv, _mm_mul_pd(t, w));	      	      	     
+	    }
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+	  
+#else	  	  	  	  
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double 
+		t = 0.0;
+	      
+	      left = &(tipVector[j][20 * tipX1[i]]);
+	      right = &(x2[80 * i + 20 * j]);
+	      for(l = 0; l < 20; l++)
+		t += left[l] * right[l] * diagptable[j * 20 + l];	      
+
+	      term += weights[j] * t;
+	    }	  
+#endif
+	  
+	  if(fastScaling)
+	    term = LOG(FABS(term));
+	  else
+	    term = LOG(FABS(term)) + (ex2[i] * LOG(minlikelihood));	   
+	  
+	  sum += wptr[i] * term;
+	}    	        
+    }              
+  else
+    {
+      for (i = 0; i < n; i++) 
+	{	  	 	             
+#ifdef __SIM_SSE3        
+	  __m128d 
+	    tv = _mm_setzero_pd();	 	  	  
+	      
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double 
+		*d = &diagptable[j * 20];
+
+	      __m128d 
+		t = _mm_setzero_pd(),
+		w = _mm_set1_pd(weights[j]);
+	      
+	      left  = &(x1[80 * i + 20 * j]);
+	      right = &(x2[80 * i + 20 * j]);
+	      
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  t = _mm_add_pd(t, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}		 
+
+	       tv = _mm_add_pd(tv, _mm_mul_pd(t, w));
+	    }
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);	  
+	  
+#else
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double
+		t = 0.0;
+	      
+	      left  = &(x1[80 * i + 20 * j]);
+	      right = &(x2[80 * i + 20 * j]);	    
+	      
+	      for(l = 0; l < 20; l++)
+		t += left[l] * right[l] * diagptable[j * 20 + l];	
+
+	      term += weights[j] * t;
+	    }
+#endif
+	  
+	  if(fastScaling)
+	    term = LOG(FABS(term));
+	  else
+	    term = LOG(FABS(term)) + ((ex1[i] + ex2[i])*LOG(minlikelihood));
+	  
+	  sum += wptr[i] * term;
+	}         
+    }
+       
+  return  sum;
+}
+
+
+
+static double evaluateGTRGAMMAPROT_GAPPED_SAVE (int *wptr,
+						double *x1, double *x2,  
+						double *tipVector, 
+						unsigned char *tipX1, int n, double *diagptable, 
+						double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)					   
+{
+  double   sum = 0.0, term;        
+  int     i, j, l;   
+  double  
+    *left, 
+    *right,
+    *x1_ptr = x1,
+    *x2_ptr = x2,
+    *x1v,
+    *x2v;              
+  
+  if(tipX1)
+    {               
+      for (i = 0; i < n; i++) 
+	{
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2v = x2_gapColumn;
+	  else
+	    {
+	      x2v = x2_ptr;
+	      x2_ptr += 80;
+	    }
+
+	  __m128d tv = _mm_setzero_pd();
+	  left = &(tipVector[20 * tipX1[i]]);	  	  
+	  
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double *d = &diagptable[j * 20];
+	      right = &(x2v[20 * j]);
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}		 		
+	    }
+
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+	  
+
+	  
+	  term = LOG(0.25 * FABS(term));	  
+	  
+	  sum += wptr[i] * term;
+	}    	        
+    }              
+  else
+    {
+      for (i = 0; i < n; i++) 
+	{
+	  if(x1_gap[i / 32] & mask32[i % 32])
+	    x1v = x1_gapColumn;
+	  else
+	    {
+	      x1v = x1_ptr;
+	      x1_ptr += 80;
+	    }
+
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2v = x2_gapColumn;
+	  else
+	    {
+	      x2v = x2_ptr;
+	      x2_ptr += 80;
+	    }
+	  	 	             
+	  __m128d tv = _mm_setzero_pd();	 	  	  
+	      
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double *d = &diagptable[j * 20];
+	      left  = &(x1v[20 * j]);
+	      right = &(x2v[20 * j]);
+	      
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}		 		
+	    }
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);	  
+	  
+	 
+	  term = LOG(0.25 * FABS(term));
+	
+	  
+	  sum += wptr[i] * term;
+	}         
+    }
+       
+  return  sum;
+}
+
+
+
+static double evaluateGTRGAMMAPROT (int *wptr,
+				    double *x1, double *x2,  
+				    double *tipVector, 
+				    unsigned char *tipX1, int n, double *diagptable)
+{
+  double   sum = 0.0, term;        
+  int     i, j, l;   
+  double  *left, *right;              
+  
+  if(tipX1)
+    {               
+      for (i = 0; i < n; i++) 
+	{
+
+	  __m128d tv = _mm_setzero_pd();
+	  left = &(tipVector[20 * tipX1[i]]);	  	  
+	  
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double *d = &diagptable[j * 20];
+	      right = &(x2[80 * i + 20 * j]);
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}		 		
+	    }
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+	  
+	  
+	 
+	  term = LOG(0.25 * FABS(term));
+		 
+	  
+	  sum += wptr[i] * term;
+	}    	        
+    }              
+  else
+    {
+      for (i = 0; i < n; i++) 
+	{	  	 	             
+	  __m128d tv = _mm_setzero_pd();	 	  	  
+	      
+	  for(j = 0, term = 0.0; j < 4; j++)
+	    {
+	      double *d = &diagptable[j * 20];
+	      left  = &(x1[80 * i + 20 * j]);
+	      right = &(x2[80 * i + 20 * j]);
+	      
+	      for(l = 0; l < 20; l+=2)
+		{
+		  __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+		  tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));		   
+		}		 		
+	    }
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);	  
+	  
+	
+	  term = LOG(0.25 * FABS(term));
+	  
+	  
+	  sum += wptr[i] * term;
+	}
+    }
+       
+  return  sum;
+}
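+/* Weighted sum of per-site log likelihoods for 20-state (protein) data under CAT.
+   cptr[i] selects the 20-entry block of diagptable_start that applies to site i. */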
+static double evaluateGTRCATPROT (int *cptr, int *wptr,
+				  double *x1, double *x2, double *tipVector,
+				  unsigned char *tipX1, int n, double *diagptable_start)
+{
+  double   sum = 0.0, term;
+  double  *diagptable,  *left, *right;
+  int     i, l;                           
+  
+  if(tipX1)
+    {                 
+      for (i = 0; i < n; i++) 
+	{	       	
+	  left = &(tipVector[20 * tipX1[i]]);
+	  right = &(x2[20 * i]);
+	  
+	  diagptable = &diagptable_start[20 * cptr[i]];	           	 
+
+	  __m128d tv = _mm_setzero_pd();	    
+	  
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d lv = _mm_load_pd(&left[l]);
+	      __m128d rv = _mm_load_pd(&right[l]);
+	      __m128d mul = _mm_mul_pd(lv, rv);
+	      __m128d dv = _mm_load_pd(&diagptable[l]);
+	      
+	      tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));		   
+	    }		 		
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+  
+	  
+	  term = LOG(FABS(term));
+	  	  
+	  sum += wptr[i] * term;
+	}      
+    }    
+  else
+    {
+    
+      for (i = 0; i < n; i++) 
+	{		       	      	      
+	  left  = &x1[20 * i];
+	  right = &x2[20 * i];
+	  
+	  diagptable = &diagptable_start[20 * cptr[i]];	  	
+
+	  __m128d tv = _mm_setzero_pd();	    
+	      	    
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d lv = _mm_load_pd(&left[l]);
+	      __m128d rv = _mm_load_pd(&right[l]);
+	      __m128d mul = _mm_mul_pd(lv, rv);
+	      __m128d dv = _mm_load_pd(&diagptable[l]);
+	      
+	      tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));		   
+	    }		 		
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+	  	  
+	  term = LOG(FABS(term));	 
+	  
+	  sum += wptr[i] * term;      
+	}
+    }
+             
+  return  sum;         
+} 
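+/* Gap-aware variant of evaluateGTRCATPROT: sites flagged in x1_gap/x2_gap read the shared
+   gap column, and left_ptr/right_ptr advance only past sites that are actually stored. */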
+static double evaluateGTRCATPROT_SAVE (int *cptr, int *wptr,
+				       double *x1, double *x2, double *tipVector,
+				       unsigned char *tipX1, int n, double *diagptable_start, 
+				       double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  double   
+    sum = 0.0, 
+    term,
+    *diagptable,  
+    *left, 
+    *right,
+    *left_ptr = x1,
+    *right_ptr = x2;
+  
+  int     
+    i, 
+    l;                           
+  
+  if(tipX1)
+    {                 
+      for (i = 0; i < n; i++) 
+	{	       	
+	  left = &(tipVector[20 * tipX1[i]]);
+
+	  if(isGap(x2_gap, i))
+	    right = x2_gapColumn;
+	  else
+	    {
+	      right = right_ptr;
+	      right_ptr += 20;
+	    }	  	 
+	  
+	  diagptable = &diagptable_start[20 * cptr[i]];	           	 
+
+	  __m128d tv = _mm_setzero_pd();	    
+	  
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d lv = _mm_load_pd(&left[l]);
+	      __m128d rv = _mm_load_pd(&right[l]);
+	      __m128d mul = _mm_mul_pd(lv, rv);
+	      __m128d dv = _mm_load_pd(&diagptable[l]);
+	      
+	      tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));		   
+	    }		 		
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+    
+	  
+	  term = LOG(FABS(term));
+	  	  
+	  sum += wptr[i] * term;
+	}      
+    }    
+  else
+    {
+    
+      for (i = 0; i < n; i++) 
+	{		       	      	      	  
+	  if(isGap(x1_gap, i))
+	    left = x1_gapColumn;
+	  else
+	    {
+	      left = left_ptr;
+	      left_ptr += 20;
+	    }
+	  
+	  if(isGap(x2_gap, i))
+	    right = x2_gapColumn;
+	  else
+	    {
+	      right = right_ptr;
+	      right_ptr += 20;
+	    }
+	  
+	  diagptable = &diagptable_start[20 * cptr[i]];	  	
+
+	  __m128d tv = _mm_setzero_pd();	    
+	  
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d lv = _mm_load_pd(&left[l]);
+	      __m128d rv = _mm_load_pd(&right[l]);
+	      __m128d mul = _mm_mul_pd(lv, rv);
+	      __m128d dv = _mm_load_pd(&diagptable[l]);
+	      
+	      tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));		   
+	    }		 		
+	  
+	  tv = _mm_hadd_pd(tv, tv);
+	  _mm_storel_pd(&term, tv);
+	  	  
+	  term = LOG(FABS(term));	 
+	  
+	  sum += wptr[i] * term;      
+	}
+    }
+             
+  return  sum;         
+} 
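+/* Gap-aware variant of evaluateGTRCAT (4 states, CAT): sites flagged in x1_gap/x2_gap read
+   the shared gap column, and x1_ptr/x2_ptr advance only past sites that are actually stored. */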
+static double evaluateGTRCAT_SAVE (int *cptr, int *wptr,
+				   double *x1_start, double *x2_start, double *tipVector, 		      
+				   unsigned char *tipX1, int n, double *diagptable_start,
+				   double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  double  sum = 0.0, term;       
+  int     i;
+
+  double  *diagptable, 
+    *x1, 
+    *x2,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start;
+ 
+  if(tipX1)
+    {           
+      for (i = 0; i < n; i++) 
+	{	
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+	  x1 = &(tipVector[4 * tipX1[i]]);
+
+	  if(isGap(x2_gap, i))
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;
+	      x2_ptr += 4;
+	    }
+	  
+	  diagptable = &diagptable_start[4 * cptr[i]];
+	  	    	  
+	  x1v1 =  _mm_load_pd(&x1[0]);
+	  x1v2 =  _mm_load_pd(&x1[2]);
+	  x2v1 =  _mm_load_pd(&x2[0]);
+	  x2v2 =  _mm_load_pd(&x2[2]);
+	  dv1  =  _mm_load_pd(&diagptable[0]);
+	  dv2  =  _mm_load_pd(&diagptable[2]);
+	  
+	  x1v1 = _mm_mul_pd(x1v1, x2v1);
+	  x1v1 = _mm_mul_pd(x1v1, dv1);
+	  
+	  x1v2 = _mm_mul_pd(x1v2, x2v2);
+	  x1v2 = _mm_mul_pd(x1v2, dv2);
+	  
+	  x1v1 = _mm_add_pd(x1v1, x1v2);
+	  
+	  _mm_store_pd(t, x1v1);
+	  	  
+	  term = LOG(FABS(t[0] + t[1]));
+	      
+	 
+
+	  sum += wptr[i] * term;
+	}	
+    }               
+  else
+    {
+      for (i = 0; i < n; i++) 
+	{ 
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+	   
+	  if(isGap(x1_gap, i))
+	    x1 = x1_gapColumn;
+	  else
+	    {
+	      x1 = x1_ptr;
+	      x1_ptr += 4;
+	    }
+	  
+	  if(isGap(x2_gap, i))
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;
+	      x2_ptr += 4;
+	    }
+	  
+	  diagptable = &diagptable_start[4 * cptr[i]];	
+	  
+	  x1v1 =  _mm_load_pd(&x1[0]);
+	  x1v2 =  _mm_load_pd(&x1[2]);
+	  x2v1 =  _mm_load_pd(&x2[0]);
+	  x2v2 =  _mm_load_pd(&x2[2]);
+	  dv1  =  _mm_load_pd(&diagptable[0]);
+	  dv2  =  _mm_load_pd(&diagptable[2]);
+	  
+	  x1v1 = _mm_mul_pd(x1v1, x2v1);
+	  x1v1 = _mm_mul_pd(x1v1, dv1);
+	  
+	  x1v2 = _mm_mul_pd(x1v2, x2v2);
+	  x1v2 = _mm_mul_pd(x1v2, dv2);
+	  
+	  x1v1 = _mm_add_pd(x1v1, x1v2);
+	  
+	  _mm_store_pd(t, x1v1);
+	  
+	 
+	  term = LOG(FABS(t[0] + t[1]));
+	  
+	  sum += wptr[i] * term;
+	}    
+    }
+       
+  return  sum;         
+} 
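+/* Gap-aware variant of evaluateGTRGAMMA (4 states, 4 rate categories, 16 doubles per site):
+   sites flagged in x1_gap/x2_gap read the shared gap column instead of x1_ptr/x2_ptr. */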
+static double evaluateGTRGAMMA_GAPPED_SAVE(int *wptr,
+					   double *x1_start, double *x2_start, 
+					   double *tipVector, 
+					   unsigned char *tipX1, const int n, double *diagptable,
+					   double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  double   sum = 0.0, term;    
+  int     i, j;
+  double  
+    *x1, 
+    *x2,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start;
+
+ 
+
+  if(tipX1)
+    {        
+     
+      
+      for (i = 0; i < n; i++)
+	{
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d termv, x1v, x2v, dv;
+
+	  x1 = &(tipVector[4 * tipX1[i]]);	 
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;	 
+	      x2_ptr += 16;
+	    }
+	  
+	
+	  termv = _mm_set1_pd(0.0);	    	   
+	  
+	  for(j = 0; j < 4; j++)
+	    {
+	      x1v = _mm_load_pd(&x1[0]);
+	      x2v = _mm_load_pd(&x2[j * 4]);
+	      dv   = _mm_load_pd(&diagptable[j * 4]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	      
+	      x1v = _mm_load_pd(&x1[2]);
+	      x2v = _mm_load_pd(&x2[j * 4 + 2]);
+	      dv   = _mm_load_pd(&diagptable[j * 4 + 2]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	    }
+	  
+	  _mm_store_pd(t, termv);	  	 
+
+	 
+	  term = LOG(0.25 * FABS(t[0] + t[1]));
+	   
+	  
+	  sum += wptr[i] * term;
+	}     
+    }
+  else
+    {        
+      
+      for (i = 0; i < n; i++) 
+	{
+
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d termv, x1v, x2v, dv;
+
+	  if(x1_gap[i / 32] & mask32[i % 32])
+	    x1 = x1_gapColumn;
+	  else
+	    {
+	      x1 = x1_ptr; 	  	  
+	      x1_ptr += 16;
+	    }
+	 	      
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;
+	      x2_ptr += 16;
+	    }
+	
+	  termv = _mm_set1_pd(0.0);	  	 
+	  
+	  for(j = 0; j < 4; j++)
+	    {
+	      x1v = _mm_load_pd(&x1[j * 4]);
+	      x2v = _mm_load_pd(&x2[j * 4]);
+	      dv   = _mm_load_pd(&diagptable[j * 4]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	      
+	      x1v = _mm_load_pd(&x1[j * 4 + 2]);
+	      x2v = _mm_load_pd(&x2[j * 4 + 2]);
+	      dv   = _mm_load_pd(&diagptable[j * 4 + 2]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	    }
+	  
+	  _mm_store_pd(t, termv);
+
+	 
+	  term = LOG(0.25 * FABS(t[0] + t[1]));
+	 	  
+	  
+	  sum += wptr[i] * term;
+	}                      	
+    }
+
+  return sum;
+} 
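+/* Weighted sum of per-site log likelihoods for 4-state (DNA) data under GAMMA.
+   Each site holds 4 rate categories x 4 states; the factor 0.25 averages the categories. */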
+static double evaluateGTRGAMMA(int *wptr,
+			       double *x1_start, double *x2_start, 
+			       double *tipVector, 
+			       unsigned char *tipX1, const int n, double *diagptable)
+{
+  double   sum = 0.0, term;    
+  int     i, j;
+
+  double  *x1, *x2;             
+
+ 
+
+  if(tipX1)
+    {          	
+      for (i = 0; i < n; i++)
+	{
+
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d termv, x1v, x2v, dv;
+
+	  x1 = &(tipVector[4 * tipX1[i]]);	 
+	  x2 = &x2_start[16 * i];	 
+	  
+	
+	  termv = _mm_set1_pd(0.0);	    	   
+	  
+	  for(j = 0; j < 4; j++)
+	    {
+	      x1v = _mm_load_pd(&x1[0]);
+	      x2v = _mm_load_pd(&x2[j * 4]);
+	      dv   = _mm_load_pd(&diagptable[j * 4]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	      
+	      x1v = _mm_load_pd(&x1[2]);
+	      x2v = _mm_load_pd(&x2[j * 4 + 2]);
+	      dv   = _mm_load_pd(&diagptable[j * 4 + 2]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	    }
+	  
+	  _mm_store_pd(t, termv);
+	  
+	  
+	
+	  term = LOG(0.25 * FABS(t[0] + t[1]));
+	  
+	 
+	  
+	  sum += wptr[i] * term;
+	}     
+    }
+  else
+    {        
+      for (i = 0; i < n; i++) 
+	{
+
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d termv, x1v, x2v, dv;
+
+	  	 	  	  
+	  x1 = &x1_start[16 * i];
+	  x2 = &x2_start[16 * i];	  	  
+	
+	
+	  termv = _mm_set1_pd(0.0);	  	 
+	  
+	  for(j = 0; j < 4; j++)
+	    {
+	      x1v = _mm_load_pd(&x1[j * 4]);
+	      x2v = _mm_load_pd(&x2[j * 4]);
+	      dv   = _mm_load_pd(&diagptable[j * 4]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	      
+	      x1v = _mm_load_pd(&x1[j * 4 + 2]);
+	      x2v = _mm_load_pd(&x2[j * 4 + 2]);
+	      dv   = _mm_load_pd(&diagptable[j * 4 + 2]);
+	      
+	      x1v = _mm_mul_pd(x1v, x2v);
+	      x1v = _mm_mul_pd(x1v, dv);
+	      
+	      termv = _mm_add_pd(termv, x1v);
+	    }
+	  
+	  _mm_store_pd(t, termv);
+
+	  
+	  term = LOG(0.25 * FABS(t[0] + t[1]));
+	 	  
+
+	  
+	  sum += wptr[i] * term;
+	}                      	
+    }
+
+  return sum;
+} 
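+/* Weighted sum of per-site log likelihoods for 4-state (DNA) data under CAT.
+   cptr[i] selects the 4-entry block of diagptable_start that applies to site i. */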
+static double evaluateGTRCAT (int *cptr, int *wptr,
+			      double *x1_start, double *x2_start, double *tipVector, 		      
+			      unsigned char *tipX1, int n, double *diagptable_start)
+{
+  double  sum = 0.0, term;       
+  int     i;
+
+  double  *diagptable, *x1, *x2;                      	    
+ 
+  if(tipX1)
+    {           
+      for (i = 0; i < n; i++) 
+	{	
+
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &x2_start[4 * i];
+	  
+	  diagptable = &diagptable_start[4 * cptr[i]];
+	  
+	    	  
+	  x1v1 =  _mm_load_pd(&x1[0]);
+	  x1v2 =  _mm_load_pd(&x1[2]);
+	  x2v1 =  _mm_load_pd(&x2[0]);
+	  x2v2 =  _mm_load_pd(&x2[2]);
+	  dv1  =  _mm_load_pd(&diagptable[0]);
+	  dv2  =  _mm_load_pd(&diagptable[2]);
+	  
+	  x1v1 = _mm_mul_pd(x1v1, x2v1);
+	  x1v1 = _mm_mul_pd(x1v1, dv1);
+	  
+	  x1v2 = _mm_mul_pd(x1v2, x2v2);
+	  x1v2 = _mm_mul_pd(x1v2, dv2);
+	  
+	  x1v1 = _mm_add_pd(x1v1, x1v2);
+	  
+	  _mm_store_pd(t, x1v1);
+	  
+	  
+	  term = LOG(FABS(t[0] + t[1]));
+	  
+
+	  sum += wptr[i] * term;
+	}	
+    }               
+  else
+    {
+      for (i = 0; i < n; i++) 
+	{ 
+
+	  double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+	  __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+	  x1 = &x1_start[4 * i];
+	  x2 = &x2_start[4 * i];
+	  
+	  diagptable = &diagptable_start[4 * cptr[i]];	
+	  
+  
+	  x1v1 =  _mm_load_pd(&x1[0]);
+	  x1v2 =  _mm_load_pd(&x1[2]);
+	  x2v1 =  _mm_load_pd(&x2[0]);
+	  x2v2 =  _mm_load_pd(&x2[2]);
+	  dv1  =  _mm_load_pd(&diagptable[0]);
+	  dv2  =  _mm_load_pd(&diagptable[2]);
+	  
+	  x1v1 = _mm_mul_pd(x1v1, x2v1);
+	  x1v1 = _mm_mul_pd(x1v1, dv1);
+	  
+	  x1v2 = _mm_mul_pd(x1v2, x2v2);
+	  x1v2 = _mm_mul_pd(x1v2, dv2);
+	  
+	  x1v1 = _mm_add_pd(x1v1, x1v2);
+	  
+	  _mm_store_pd(t, x1v1);
+	  
+	 
+	  term = LOG(FABS(t[0] + t[1]));
+	  
+
+	  sum += wptr[i] * term;
+	}    
+    }
+       
+  return  sum;         
+} 
+
+
+
+
+
+#endif
+
+
diff --git a/examl/evaluatePartialGenericSpecial.c b/examl/evaluatePartialGenericSpecial.c
new file mode 100644
index 0000000..8ea1e1f
--- /dev/null
+++ b/examl/evaluatePartialGenericSpecial.c
@@ -0,0 +1,1058 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ 
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32 
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+#endif
+
+
+
+#if defined(_OPTIMIZED_FUNCTIONS) && !defined(__MIC_NATIVE)
+static inline void computeVectorGTRCATPROT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+					   traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					   unsigned  char **yVector, int mxtips);
+
+static double evaluatePartialGTRCATPROT(int i, double ki, int counter,  traversalInfo *ti, double qz,
+					int w, double *EIGN, double *EI, double *EV,
+					double *tipVector, unsigned char **yVector, 
+					int branchReference, int mxtips);
+
+static inline void computeVectorGTRGAMMAPROT(double *lVector, int *eVector, double *gammaRates, int i, double qz, double rz,
+					     traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					     unsigned  char **yVector, int mxtips);
+
+static double evaluatePartialGTRGAMMAPROT(int i, int counter,  traversalInfo *ti, double qz,
+					  int w, double *EIGN, double *EI, double *EV,
+					  double *tipVector, unsigned char **yVector, 
+					  double *gammaRates,
+					  int branchReference, int mxtips);
+
+static inline void computeVectorGTRCAT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+				       traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+				       unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRCAT(int i, double ki, int counter,  traversalInfo *ti, double qz,
+				    int w, double *EIGN, double *EI, double *EV,
+				    double *tipVector, unsigned  char **yVector, 
+				    int branchReference, int mxtips);
+
+
+static inline void computeVectorGTRCAT_BINARY(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+					      traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					      unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRCAT_BINARY(int i, double ki, int counter,  traversalInfo *ti, double qz,
+					   int w, double *EIGN, double *EI, double *EV,
+					   double *tipVector, unsigned  char **yVector, 
+					   int branchReference, int mxtips);
+
+
+#else
+
+static inline void computeVectorCAT_FLEX(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+					 traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					 unsigned char **yVector, int mxtips, const int states)
+{       
+  double  
+    *d1 =    (double *)malloc(sizeof(double) * states), 
+    *d2 =    (double *)malloc(sizeof(double) * states),  
+    *x1px2 = (double *)malloc(sizeof(double) * states), 
+    ump_x1, 
+    ump_x2,    
+    lz1, 
+    lz2,
+    *x1, 
+    *x2, 
+    *x3;
+  
+  int 
+    scale,
+    j, 
+    k,
+    pNumber = ti->pNumber,
+    rNumber = ti->rNumber,
+    qNumber = ti->qNumber;
+ 
+  x3  = &lVector[states * (pNumber  - mxtips)];  
+ 
+  switch(ti->tipCase)
+    {
+    case TIP_TIP:     
+      x1 = &(tipVector[states * yVector[qNumber][i]]);
+      x2 = &(tipVector[states * yVector[rNumber][i]]);    
+      break;
+    case TIP_INNER:     
+      x1 = &(tipVector[states * yVector[qNumber][i]]);
+      x2 = &(lVector[states * (rNumber - mxtips)]);           
+      break;
+    case INNER_INNER:            
+      x1 = &(lVector[states * (qNumber - mxtips)]);
+      x2 = &(lVector[states * (rNumber - mxtips)]);     
+      break;
+    default:
+      assert(0);
+    }
+     
+  lz1 = qz * ki;  
+  lz2 = rz * ki;
+  
+  d1[0] = x1[0];
+  d2[0] = x2[0];
+
+
+  for(j = 1; j < states; j++)
+    {
+      d1[j] = x1[j] * EXP(EIGN[j] * lz1);
+      d2[j] = x2[j] * EXP(EIGN[j] * lz2);	    
+    }
+ 
+ 
+  for(j = 0; j < states; j++)
+    {         
+      ump_x1 = 0.0;
+      ump_x2 = 0.0;
+
+      for(k = 0; k < states; k++)
+	{
+	  ump_x1 += d1[k] * EI[j * states + k];
+	  ump_x2 += d2[k] * EI[j * states + k];
+	}
+      
+      x1px2[j] = ump_x1 * ump_x2;
+    }
+  
+  for(j = 0; j < states; j++)
+    x3[j] = 0.0;
+
+  for(j = 0; j < states; j++)          
+    for(k = 0; k < states; k++)	
+      x3[k] +=  x1px2[j] *  EV[states * j + k];	   
+      
+  scale = 1;
+  for(j = 0; scale && (j < states); j++)
+    scale = ((x3[j] < minlikelihood) && (x3[j] > minusminlikelihood));
+  
+  if(scale)
+    {
+      for(j = 0; j < states; j++)
+	x3[j] *= twotothe256;       
+      *eVector = *eVector + 1;
+    }	              
+
+  free(d1);
+  free(d2);
+  free(x1px2);
+       
+  return;
+}
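+/* Log likelihood of a single site i with weight w and rate ki for an arbitrary number of states:
+   rebuilds the inner vectors along the traversal list ti[1..counter-1], then evaluates the branch qz. */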
+static double evaluatePartialCAT_FLEX(int i, double ki, int counter,  traversalInfo *ti, double qz,
+				      int w, double *EIGN, double *EI, double *EV,
+				      double *tipVector, unsigned  char **yVector, 
+				      int branchReference, int mxtips, const int states)
+{
+  int 
+    scale = 0, 
+    k;
+  
+  double 
+    *lVector = (double *)malloc_aligned(sizeof(double) * states * mxtips),
+    *d = (double *)malloc_aligned(sizeof(double) * states),
+    lz, 
+    term, 
+    *x1, 
+    *x2; 
+
+  traversalInfo 
+    *trav = &ti[0];
+ 
+  assert(isTip(trav->pNumber, mxtips));
+     
+  x1 = &(tipVector[states *  yVector[trav->pNumber][i]]);   
+
+  for(k = 1; k < counter; k++)    
+    {
+      double 
+	qz = ti[k].qz[branchReference],
+	rz = ti[k].rz[branchReference];
+      
+      qz = (qz > zmin) ? log(qz) : log(zmin);
+      rz = (rz > zmin) ? log(rz) : log(zmin);
+
+      computeVectorCAT_FLEX(lVector, &scale, ki, i, qz, rz, &ti[k], 
+			    EIGN, EI, EV, 
+			    tipVector, yVector, mxtips, states);       
+    }
+   
+  x2 = &lVector[states * (trav->qNumber - mxtips)]; 
+
+  assert(0 <=  (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);  
+  /* clamp qz to zmin before taking the log, as done for the inner branches above */
+  if(qz < zmin) 
+    qz = zmin;
+  lz  = log(qz); 
+  lz *= ki;  
+  
+  d[0] = 1.0;
+
+  for(k = 1; k < states; k++)
+    d[k] = EXP (EIGN[k] * lz);
+  
+  term = 0.0;
+
+  for(k = 0; k < states; k++) 
+    term += x1[k] * x2[k] * d[k];       
+
+  term = LOG(FABS(term)) + (scale * LOG(minlikelihood));   
+
+  term = term * w;
+
+  free(lVector);  
+  free(d);
+
+  return  term;
+}
+
+#endif
+
+double evaluatePartialGeneric (tree *tr, int i, double ki, int _model)
+{
+  double result;
+  int 
+    branchReference,
+    states = tr->partitionData[_model].states;
+    
+  int 
+    index;
+
+  index = i;
+
+  
+  if(tr->numBranches > 1)
+    branchReference = _model;
+  else
+    branchReference = 0;
+
+#ifndef _OPTIMIZED_FUNCTIONS
+  if(tr->rateHetModel == CAT)
+    result = evaluatePartialCAT_FLEX(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference], 
+				     tr->partitionData[_model].wgt[index],
+				     tr->partitionData[_model].EIGN, 
+				     tr->partitionData[_model].EI, 
+				     tr->partitionData[_model].EV,
+				     tr->partitionData[_model].tipVector,
+				     tr->partitionData[_model].yVector, branchReference, tr->mxtips, states);
+  else
+    /* 
+       The per-site likelihood function should only be called for the CAT model.
+       Under the GAMMA model it is needed only for estimating per-site protein models,
+       which have, however, been removed in this version of the code.
+    */
+    assert(0); 
+  
+ 
+#elif defined(__MIC_NATIVE)
+  if(tr->rateHetModel == CAT)
+    result = evaluatePartialCAT_FLEX(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+				     tr->partitionData[_model].wgt[index],
+				     tr->partitionData[_model].EIGN,
+				     tr->partitionData[_model].EI,
+				     tr->partitionData[_model].EV,
+				     tr->partitionData[_model].tipVector,
+				     tr->partitionData[_model].yVector, branchReference, tr->mxtips, states);
+  else
+    assert(0);
+
+#else
+  switch(states)
+    {
+    case 2:
+      assert(!tr->saveMemory);
+      assert(tr->rateHetModel == CAT);
+
+       result = evaluatePartialGTRCAT_BINARY(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference], 
+					    tr->partitionData[_model].wgt[index],
+					    tr->partitionData[_model].EIGN, 
+					    tr->partitionData[_model].EI, 
+					    tr->partitionData[_model].EV,
+					    tr->partitionData[_model].tipVector,
+					    tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+
+      
+
+      break;
+    case 4:   /* DNA */
+      assert(tr->rateHetModel == CAT);  
+      
+      result = evaluatePartialGTRCAT(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference], 
+				     tr->partitionData[_model].wgt[index],
+				     tr->partitionData[_model].EIGN, 
+				     tr->partitionData[_model].EI, 
+				     tr->partitionData[_model].EV,
+				     tr->partitionData[_model].tipVector,
+				     tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+      break;
+    case 20: /* proteins */
+      if(tr->rateHetModel == CAT)
+	result = evaluatePartialGTRCATPROT(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference], 
+					   tr->partitionData[_model].wgt[index],
+					   tr->partitionData[_model].EIGN, 
+					   tr->partitionData[_model].EI, 
+					   tr->partitionData[_model].EV,
+					   tr->partitionData[_model].tipVector, 
+					   tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+      else
+	result =  evaluatePartialGTRGAMMAPROT(index, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference], 
+					      tr->partitionData[_model].wgt[index],
+					      tr->partitionData[_model].EIGN, 
+					      tr->partitionData[_model].EI, 
+					      tr->partitionData[_model].EV,
+					      tr->partitionData[_model].tipVector, 
+					      tr->partitionData[_model].yVector, 
+					      tr->partitionData[_model].gammaRates,
+					      branchReference, tr->mxtips);
+      break;   
+    default:
+      assert(0);
+    }
+  #endif
+ 
+
+  return result;
+}
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+
+static inline void computeVectorGTRCAT_BINARY(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+					      traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					      unsigned char **yVector, int mxtips)
+{       
+  double  d1, d2,  ump_x1, ump_x2, x1px2[2], lz1, lz2; 
+  double *x1, *x2, *x3;
+  int 
+    j, k,
+    pNumber = ti->pNumber,
+    rNumber = ti->rNumber,
+    qNumber = ti->qNumber;
+ 
+  x3  = &lVector[2 * (pNumber  - mxtips)];  
+
+  switch(ti->tipCase)
+    {
+    case TIP_TIP:     
+      x1 = &(tipVector[2 * yVector[qNumber][i]]);
+      x2 = &(tipVector[2 * yVector[rNumber][i]]);   
+      break;
+    case TIP_INNER:     
+      x1 = &(tipVector[2 * yVector[qNumber][i]]);
+      x2 = &lVector[2 * (rNumber - mxtips)];                    
+      break;
+    case INNER_INNER:            
+      x1 = &lVector[2 * (qNumber - mxtips)];
+      x2 = &lVector[2 * (rNumber - mxtips)];               
+      break;
+    default:
+      assert(0);
+    }
+     
+  lz1 = qz * ki;  
+  lz2 = rz * ki;
+  
+ 
+  d1 = x1[1] * EXP(EIGN[1] * lz1);
+  d2 = x2[1] * EXP(EIGN[1] * lz2);	        
+ 
+  for(j = 0; j < 2; j++)
+    {     
+      ump_x1 = x1[0];
+      ump_x2 = x2[0];
+      
+      ump_x1 += d1 * EI[j * 2 + 1];
+      ump_x2 += d2 * EI[j * 2 + 1];
+	
+      x1px2[j] = ump_x1 * ump_x2;
+    }
+  
+  for(j = 0; j < 2; j++)
+    x3[j] = 0.0;
+
+  for(j = 0; j < 2; j++)          
+    for(k = 0; k < 2; k++)	
+      x3[k] +=  x1px2[j] *  EV[2 * j + k];	   
+      
+  
+  if (x3[0] < minlikelihood && x3[0] > minusminlikelihood &&
+      x3[1] < minlikelihood && x3[1] > minusminlikelihood
+      )
+    {	     
+      x3[0]   *= twotothe256;
+      x3[1]   *= twotothe256;     
+      *eVector = *eVector + 1;
+    }	              
+
+  return;
+}
+
+static double evaluatePartialGTRCAT_BINARY(int i, double ki, int counter,  traversalInfo *ti, double qz,
+					   int w, double *EIGN, double *EI, double *EV,
+					   double *tipVector, unsigned  char **yVector, 
+					   int branchReference, int mxtips)
+{
+  double lz, term;       
+  double  d;
+  double   *x1, *x2; 
+  int scale = 0, k;
+  double *lVector = (double *)malloc(sizeof(double) * 2 * mxtips);  
+  traversalInfo *trav = &ti[0];
+ 
+  assert(isTip(trav->pNumber, mxtips));
+     
+  x1 = &(tipVector[2 *  yVector[trav->pNumber][i]]);   
+
+  for(k = 1; k < counter; k++)  
+    {
+      double 
+	qz = ti[k].qz[branchReference],
+	rz = ti[k].rz[branchReference];
+      
+      qz = (qz > zmin) ? log(qz) : log(zmin);
+      rz = (rz > zmin) ? log(rz) : log(zmin);
+
+      computeVectorGTRCAT_BINARY(lVector, &scale, ki, i, qz, rz, &ti[k], 
+				 EIGN, EI, EV, 
+				 tipVector, yVector, mxtips);       
+    }
+   
+  x2 = &lVector[2 * (trav->qNumber - mxtips)];
+     
+  assert(0 <=  (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);  
+  /* clamp qz to zmin before taking the log, as done for the inner branches above */
+  if(qz < zmin) 
+    qz = zmin;
+  lz  = log(qz); 
+  lz *= ki;  
+  
+  d = EXP(EIGN[1] * lz);
+  
+  term =  x1[0] * x2[0];
+  term += x1[1] * x2[1] * d; 
+
+  term = LOG(FABS(term)) + (scale * LOG(minlikelihood));   
+
+  term = term * w;
+
+  free(lVector);
+  
+  return  term;
+}
+
+
+
+static inline void computeVectorGTRCATPROT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+					   traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					   unsigned  char **yVector, int mxtips)
+{       
+  double   *x1, *x2, *x3;  
+  int
+    pNumber = ti->pNumber,
+    rNumber = ti->rNumber,
+    qNumber = ti->qNumber;
+ 
+  x3  = &(lVector[20 * (pNumber  - mxtips)]);     
+
+  switch(ti->tipCase)
+    {
+    case TIP_TIP:    
+      x1 = &(tipVector[20 * yVector[qNumber][i]]);
+      x2 = &(tipVector[20 * yVector[rNumber][i]]);     
+      break;
+    case TIP_INNER:     
+      x1 = &(tipVector[20 * yVector[qNumber][i]]);
+      x2 = &(  lVector[20 * (rNumber - mxtips)]);                    
+      break;
+    case INNER_INNER:            
+      x1 = &(lVector[20 * (qNumber - mxtips)]);
+      x2 = &(lVector[20 * (rNumber - mxtips)]);                 
+      break;    
+    default:
+      assert(0);
+    }
+     
+  {
+    double  
+      e1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+      e2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+      d1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+      d2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+      lz1, 
+      lz2;  
+    
+    int 
+      l, 
+      k, 
+      scale;
+     
+    lz1 = qz * ki;            
+    lz2 = rz * ki;        
+
+    e1[0] = 1.0;
+    e2[0] = 1.0;
+    
+    for(l = 1; l < 20; l++)
+      {
+	e1[l] = EXP(EIGN[l] * lz1);
+	e2[l] = EXP(EIGN[l] * lz2);
+      }
+
+    for(l = 0; l < 20; l+=2)
+      {
+	__m128d d1v = _mm_mul_pd(_mm_load_pd(&x1[l]), _mm_load_pd(&e1[l]));
+	__m128d d2v = _mm_mul_pd(_mm_load_pd(&x2[l]), _mm_load_pd(&e2[l]));
+	
+	_mm_store_pd(&d1[l], d1v);
+	_mm_store_pd(&d2[l], d2v);	
+      }
+
+    __m128d zero = _mm_setzero_pd();
+
+    for(l = 0; l < 20; l+=2)
+      _mm_store_pd(&x3[l], zero);
+                
+    for(l = 0; l < 20; l++)
+      { 	      
+	double *ev = &EV[l * 20];
+	__m128d ump_x1v = _mm_setzero_pd();
+	__m128d ump_x2v = _mm_setzero_pd();
+	__m128d x1px2v;
+
+	for(k = 0; k < 20; k+=2)
+	  {       
+	    __m128d eiv = _mm_load_pd(&EI[20 * l + k]);
+	    __m128d d1v = _mm_load_pd(&d1[k]);
+	    __m128d d2v = _mm_load_pd(&d2[k]);
+	    ump_x1v = _mm_add_pd(ump_x1v, _mm_mul_pd(d1v, eiv));
+	    ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(d2v, eiv));	  
+	  }
+
+	ump_x1v = _mm_hadd_pd(ump_x1v, ump_x1v);
+	ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+	x1px2v = _mm_mul_pd(ump_x1v, ump_x2v);
+
+	for(k = 0; k < 20; k+=2)
+	  {
+	    __m128d ex3v = _mm_load_pd(&x3[k]);
+	    __m128d EVV  = _mm_load_pd(&ev[k]);
+	    ex3v = _mm_add_pd(ex3v, _mm_mul_pd(x1px2v, EVV));
+	    
+	    _mm_store_pd(&x3[k], ex3v);	   	   
+	  }
+      }                      
+    
+    scale = 1;
+    for(l = 0; scale && (l < 20); l++)
+      scale = ((x3[l] < minlikelihood) && (x3[l] > minusminlikelihood));	       	      	      	       	       
+    
+    if(scale)
+      {	      
+	__m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+	for(l = 0; l < 20; l+=2)
+	  {
+	    __m128d ex3v = _mm_mul_pd(_mm_load_pd(&x3[l]),twoto);
+	    _mm_store_pd(&x3[l], ex3v);	
+	  }
+ 	
+
+
+	*eVector = *eVector + 1;
+      }
+    
+    return;      
+  }
+}
+
+static double evaluatePartialGTRCATPROT(int i, double ki, int counter,  traversalInfo *ti, double qz,
+					int w, double *EIGN, double *EI, double *EV,
+					double *tipVector, unsigned char **yVector, 
+					int branchReference, int mxtips)
+{
+  double lz, term;       
+  double  d[20];
+  double   *x1, *x2; 
+  int scale = 0, k, l;
+  double 
+    *lVector = (double *)malloc_aligned(sizeof(double) * 20 * mxtips),
+    myEI[400]  __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+  traversalInfo *trav = &ti[0];
+
+  
+
+  for(k = 0; k < 20; k++)
+    {     
+      for(l = 0; l < 20; l++)
+	myEI[k * 20 + l] = EI[k * 20 + l];
+    }
+
+  assert(isTip(trav->pNumber, mxtips));
+     
+  x1 = &(tipVector[20 *  yVector[trav->pNumber][i]]);   
+
+  for(k = 1; k < counter; k++)                
+    {
+      double 
+	qz = ti[k].qz[branchReference],
+	rz = ti[k].rz[branchReference];
+      
+      qz = (qz > zmin) ? log(qz) : log(zmin);
+      rz = (rz > zmin) ? log(rz) : log(zmin);
+
+      computeVectorGTRCATPROT(lVector, &scale, ki, i, qz, rz, 
+			      &ti[k], EIGN, myEI, EV, 
+			      tipVector, yVector, mxtips);       
+    }
+   
+  x2 = &lVector[20 * (trav->qNumber - mxtips)];
+
+       
+
+  assert(0 <=  (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);  
+  
+  if(qz < zmin) 
+    qz = zmin;
+  lz  = log(qz); 
+  lz *= ki;
+  
+  d[0] = 1.0;
+  for(l = 1; l < 20; l++)
+    d[l] = EXP (EIGN[l] * lz);
+
+  term = 0.0;
+  
+  for(l = 0; l < 20; l++)
+    term += x1[l] * x2[l] * d[l];   
+
+  term = LOG(FABS(term)) + (scale * LOG(minlikelihood));   
+
+  term = term * w;
+
+  free(lVector);
+  
+ 
+  return  term;
+}
+static inline void computeVectorGTRGAMMAPROT(double *lVector, int *eVector, double *gammaRates, int i, double qz, double rz,
+					     traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+					     unsigned  char **yVector, int mxtips)
+{       
+  double   
+    *x1, 
+    *x2, 
+    *x3;  
+  
+  int
+    s,
+    pNumber = ti->pNumber,
+    rNumber = ti->rNumber,
+    qNumber = ti->qNumber,
+    index1[4],
+    index2[4];
+  
+ 
+  x3  = &(lVector[80 * (pNumber  - mxtips)]);     
+
+  switch(ti->tipCase)
+    {
+    case TIP_TIP:    
+      x1 = &(tipVector[20 * yVector[qNumber][i]]);
+      x2 = &(tipVector[20 * yVector[rNumber][i]]);     
+      for(s = 0; s < 4; s++)
+	{
+	  index1[s] = 0;
+	  index2[s] = 0;
+	}
+      break;
+    case TIP_INNER:     
+      x1 = &(tipVector[20 * yVector[qNumber][i]]);
+      x2 = &(  lVector[80 * (rNumber - mxtips)]);   
+      for(s = 0; s < 4; s++)       
+	index1[s] = 0;
+      for(s = 0; s < 4; s++)     
+	index2[s] = s;                     
+      break;
+    case INNER_INNER:            
+      x1 = &(lVector[80 * (qNumber - mxtips)]);
+      x2 = &(lVector[80 * (rNumber - mxtips)]); 
+      for(s = 0; s < 4; s++)
+	{
+	  index1[s] = s;
+	  index2[s] = s;
+	}                
+      break;    
+    default:
+      assert(0);
+    }
+     
+  {
+    double  
+      e1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+      e2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+      d1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+      d2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+      lz1, lz2;  
+    
+    int 
+      l, 
+      k, 
+      scale, 
+      j;
+     
+    for(j = 0; j < 4; j++)
+      {
+	lz1 = qz * gammaRates[j];            
+	lz2 = rz * gammaRates[j];        
+
+	e1[0] = 1.0;
+	e2[0] = 1.0;
+    
+	for(l = 1; l < 20; l++)
+	  {
+	    e1[l] = EXP(EIGN[l] * lz1);
+	    e2[l] = EXP(EIGN[l] * lz2);
+	  }
+
+	for(l = 0; l < 20; l+=2)
+	  {
+	    __m128d d1v = _mm_mul_pd(_mm_load_pd(&x1[20 * index1[j] + l]), _mm_load_pd(&e1[l]));
+	    __m128d d2v = _mm_mul_pd(_mm_load_pd(&x2[20 * index2[j] + l]), _mm_load_pd(&e2[l]));
+	    
+	    _mm_store_pd(&d1[l], d1v);
+	    _mm_store_pd(&d2[l], d2v);	
+	  }
+
+	__m128d zero = _mm_setzero_pd();
+
+	for(l = 0; l < 20; l+=2)
+	  _mm_store_pd(&x3[j * 20 + l], zero);
+                
+	for(l = 0; l < 20; l++)
+	  { 	      
+	    double *ev = &EV[l * 20];
+	    __m128d ump_x1v = _mm_setzero_pd();
+	    __m128d ump_x2v = _mm_setzero_pd();
+	    __m128d x1px2v;
+	    
+	    for(k = 0; k < 20; k+=2)
+	      {       
+		__m128d eiv = _mm_load_pd(&EI[20 * l + k]);
+		__m128d d1v = _mm_load_pd(&d1[k]);
+		__m128d d2v = _mm_load_pd(&d2[k]);
+		ump_x1v = _mm_add_pd(ump_x1v, _mm_mul_pd(d1v, eiv));
+		ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(d2v, eiv));	  
+	      }
+
+	    ump_x1v = _mm_hadd_pd(ump_x1v, ump_x1v);
+	    ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+	    x1px2v = _mm_mul_pd(ump_x1v, ump_x2v);
+
+	    for(k = 0; k < 20; k+=2)
+	      {
+		__m128d ex3v = _mm_load_pd(&x3[j * 20 + k]);
+		__m128d EVV  = _mm_load_pd(&ev[k]);
+		ex3v = _mm_add_pd(ex3v, _mm_mul_pd(x1px2v, EVV));
+		
+		_mm_store_pd(&x3[j * 20 + k], ex3v);	   	   
+	      }
+	  }        
+      }
+    
+    scale = 1;
+    for(l = 0; scale && (l < 80); l++)
+      scale = ((x3[l] < minlikelihood) && (x3[l] > minusminlikelihood));	       	      	      	       	       
+    
+    if(scale)
+      {	      
+	__m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+	for(l = 0; l < 80; l+=2)
+	  {
+	    __m128d ex3v = _mm_mul_pd(_mm_load_pd(&x3[l]),twoto);
+	    _mm_store_pd(&x3[l], ex3v);	
+	  }
+
+	*eVector = *eVector + 1;
+      }
+    
+    return;      
+  }
+}
+
+
+static double evaluatePartialGTRGAMMAPROT(int i, int counter,  traversalInfo *ti, double qz,
+					  int w, double *EIGN, double *EI, double *EV,
+					  double *tipVector, unsigned char **yVector, 
+					  double *gammaRates,
+					  int branchReference, int mxtips)
+{
+  double lz, term;       
+  double  d[80];
+  double   *x1, *x2; 
+  int scale = 0, k, l, j;
+  double 
+    *lVector = (double *)malloc_aligned(sizeof(double) * 80 * mxtips),
+    myEI[400]  __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+  traversalInfo 
+    *trav = &ti[0];
+
+  for(k = 0; k < 20; k++)
+    {     
+      for(l = 0; l < 20; l++)
+	myEI[k * 20 + l] = EI[k * 20 + l];
+    }
+
+  assert(isTip(trav->pNumber, mxtips));
+     
+  x1 = &(tipVector[20 *  yVector[trav->pNumber][i]]);   
+
+  for(k = 1; k < counter; k++)                
+    {
+      double 
+	qz = ti[k].qz[branchReference],
+	rz = ti[k].rz[branchReference];
+      
+      qz = (qz > zmin) ? log(qz) : log(zmin);
+      rz = (rz > zmin) ? log(rz) : log(zmin);
+
+      computeVectorGTRGAMMAPROT(lVector, &scale, gammaRates, i, qz, rz, 
+				&ti[k], EIGN, myEI, EV, 
+				tipVector, yVector, mxtips);
+    }
+   
+  x2 = &lVector[80 * (trav->qNumber - mxtips)];       
+
+  assert(0 <=  (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);  
+  
+  if(qz < zmin) 
+    qz = zmin;
+  lz  = log(qz); 
+  
+  for(j = 0; j < 4; j++)
+    {
+      d[20 * j] = 1.0;
+      for(l = 1; l < 20; l++)
+	d[20 * j + l] = EXP(EIGN[l] * lz * gammaRates[j]);
+    }
+
+ 
+  for(j = 0, term = 0.0; j < 4; j++)
+    {
+      for(l = 0; l < 20; l++)
+	term += x1[l] * x2[20 * j + l] * d[j * 20 + l];	      
+    }
+  
+  term = LOG(0.25 * FABS(term)) + (scale * LOG(minlikelihood));   
+
+  term = term * w;
+
+  free(lVector);
+  
+ 
+  return  term;
+}
+
+
+
+
+
+static inline void computeVectorGTRCAT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+				       traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector, 
+				       unsigned char **yVector, int mxtips)
+{       
+  double  d1[3], d2[3],  ump_x1, ump_x2, x1px2[4], lz1, lz2; 
+  double *x1, *x2, *x3;
+  int j, k,
+    pNumber = ti->pNumber,
+    rNumber = ti->rNumber,
+    qNumber = ti->qNumber;
+ 
+  x3  = &lVector[4 * (pNumber  - mxtips)];  
+ 
+
+  switch(ti->tipCase)
+    {
+    case TIP_TIP:     
+      x1 = &(tipVector[4 * yVector[qNumber][i]]);
+      x2 = &(tipVector[4 * yVector[rNumber][i]]);    
+      break;
+    case TIP_INNER:     
+      x1 = &(tipVector[4 * yVector[qNumber][i]]);
+      x2 = &lVector[4 * (rNumber - mxtips)];           
+      break;
+    case INNER_INNER:            
+      x1 = &lVector[4 * (qNumber - mxtips)];
+      x2 = &lVector[4 * (rNumber - mxtips)];     
+      break;
+    default:
+      assert(0);
+    }
+     
+  lz1 = qz * ki;  
+  lz2 = rz * ki;
+  
+  for(j = 0; j < 3; j++)
+    {
+      d1[j] = 
+	x1[j + 1] * 
+	EXP(EIGN[j + 1] * lz1);
+      d2[j] = x2[j + 1] * EXP(EIGN[j + 1] * lz2);	    
+    }
+ 
+ 
+  for(j = 0; j < 4; j++)
+    {     
+      ump_x1 = x1[0];
+      ump_x2 = x2[0];
+      for(k = 0; k < 3; k++)
+	{
+	  ump_x1 += d1[k] * EI[j * 4 + k + 1];
+	  ump_x2 += d2[k] * EI[j * 4 + k + 1];
+	}
+      x1px2[j] = ump_x1 * ump_x2;
+    }
+  
+  for(j = 0; j < 4; j++)
+    x3[j] = 0.0;
+
+  for(j = 0; j < 4; j++)          
+    for(k = 0; k < 4; k++)	
+      x3[k] +=  x1px2[j] *  EV[4 * j + k];	   
+      
+  
+  if (x3[0] < minlikelihood && x3[0] > minusminlikelihood &&
+      x3[1] < minlikelihood && x3[1] > minusminlikelihood &&
+      x3[2] < minlikelihood && x3[2] > minusminlikelihood &&
+      x3[3] < minlikelihood && x3[3] > minusminlikelihood)
+    {	     
+      x3[0]   *= twotothe256;
+      x3[1]   *= twotothe256;
+      x3[2]   *= twotothe256;     
+      x3[3]   *= twotothe256;     
+      *eVector = *eVector + 1;
+    }	              
+
+  return;
+}
+
+
+
+
+
+
+
+
+static double evaluatePartialGTRCAT(int i, double ki, int counter,  traversalInfo *ti, double qz,
+				    int w, double *EIGN, double *EI, double *EV,
+				    double *tipVector, unsigned  char **yVector, 
+				    int branchReference, int mxtips)
+{
+  double lz, term;       
+  double  d[3];
+  double   *x1, *x2; 
+  int scale = 0, k;
+  double *lVector = (double *)malloc_aligned(sizeof(double) * 4 * mxtips);    
+
+  traversalInfo *trav = &ti[0];
+ 
+  assert(isTip(trav->pNumber, mxtips));
+     
+  x1 = &(tipVector[4 *  yVector[trav->pNumber][i]]);   
+
+  for(k = 1; k < counter; k++)    
+    {
+      double 
+	qz = ti[k].qz[branchReference],
+	rz = ti[k].rz[branchReference];
+      
+      qz = (qz > zmin) ? log(qz) : log(zmin);
+      rz = (rz > zmin) ? log(rz) : log(zmin);
+
+      computeVectorGTRCAT(lVector, &scale, ki, i, qz, rz, &ti[k], 
+			  EIGN, EI, EV, 
+			  tipVector, yVector, mxtips);       
+    }
+   
+  x2 = &lVector[4 * (trav->qNumber - mxtips)]; 
+
+  assert(0 <=  (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);  
+       
+  if(qz < zmin) 
+    qz = zmin;
+  lz  = log(qz); 
+  lz *= ki;  
+  
+  d[0] = EXP (EIGN[1] * lz);
+  d[1] = EXP (EIGN[2] * lz);
+  d[2] = EXP (EIGN[3] * lz);       	   
+  
+  term =  x1[0] * x2[0];
+  term += x1[1] * x2[1] * d[0];
+  term += x1[2] * x2[2] * d[1];
+  term += x1[3] * x2[3] * d[2];     
+
+  term = LOG(FABS(term)) + (scale * LOG(minlikelihood));   
+
+  term = term * w;
+
+  free(lVector);  
+
+  return  term;
+}
+
+
+
+#endif
diff --git a/examl/globalVariables.h b/examl/globalVariables.h
new file mode 100644
index 0000000..7080dc9
--- /dev/null
+++ b/examl/globalVariables.h
@@ -0,0 +1,180 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+
+int processes;
+double *globalResult;
+
+
+int processID;
+infoList iList;
+
+int Thorough = 0;
+
+checkPointState ckp;
+
+char run_id[128] = "", 
+  workdir[1024] = "", 
+  seq_file[1024] = "", 
+  tree_file[1024]="", 
+  weightFileName[1024] = "",   
+  resultFileName[1024] = "", 
+  logFileName[1024] = "",   
+  infoFileName[1024] = "", 
+  randomFileName[1024] = "",     
+  proteinModelFileName[1024] = "", 
+  binaryCheckpointName[1024] = "",
+  binaryCheckpointInputName[1024] = "",
+  byteFileName[1024] = "",
+  modelFileName[1024] = "",
+  treeFileName[1024] = "",
+  quartetGroupingFileName[1024],
+  quartetFileName[1024];
+
+char *protModels[NUM_PROT_MODELS] = {"DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV", "VT", "BLOSUM62", "MTMAM", "LG", "MTART", "MTZOA", "PMB", 
+				     "HIVB", "HIVW", "JTTDCMUT", "FLU", "STMTREV", "AUTO", "LG4M", "LG4X", "GTR"};
+
+const char inverseMeaningBINARY[4] = {'_', '0', '1', '-'};
+const char inverseMeaningDNA[16]   = {'_', 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', '-'};
+const char inverseMeaningPROT[23]  = {'A','R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 
+			       'T', 'W', 'Y', 'V', 'B', 'Z', '-'};
+const char inverseMeaningGeneric32[33] = {'0', '1', '2', '3', '4', '5', '6', '7', 
+				    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+				    'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+				    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+				    '-'};
+const char inverseMeaningGeneric64[33] = {'0', '1', '2', '3', '4', '5', '6', '7', 
+				    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+				    'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+				    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+				    '-'};
+
+const unsigned int bitVectorIdentity[256] = {0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,25 ,26 ,
+					     27 ,28 ,29 ,30 ,31 ,32 ,33 ,34 ,35 ,36 ,37 ,38 ,39 ,40 ,41 ,42 ,43 ,44 ,45 ,46 ,47 ,48 ,49 ,50 ,51 ,
+					     52 ,53 ,54 ,55 ,56 ,57 ,58 ,59 ,60 ,61 ,62 ,63 ,64 ,65 ,66 ,67 ,68 ,69 ,70 ,71 ,72 ,73 ,74 ,75 ,76 ,
+					     77 ,78 ,79 ,80 ,81 ,82 ,83 ,84 ,85 ,86 ,87 ,88 ,89 ,90 ,91 ,92 ,93 ,94 ,95 ,96 ,97 ,98 ,99 ,100 ,101 ,
+					     102 ,103 ,104 ,105 ,106 ,107 ,108 ,109 ,110 ,111 ,112 ,113 ,114 ,115 ,116 ,117 ,118 ,119 ,120 ,121 ,122 ,
+					     123 ,124 ,125 ,126 ,127 ,128 ,129 ,130 ,131 ,132 ,133 ,134 ,135 ,136 ,137 ,138 ,139 ,140 ,141 ,142 ,143 ,
+					     144 ,145 ,146 ,147 ,148 ,149 ,150 ,151 ,152 ,153 ,154 ,155 ,156 ,157 ,158 ,159 ,160 ,161 ,162 ,163 ,164 ,
+					     165 ,166 ,167 ,168 ,169 ,170 ,171 ,172 ,173 ,174 ,175 ,176 ,177 ,178 ,179 ,180 ,181 ,182 ,183 ,184 ,185 ,
+					     186 ,187 ,188 ,189 ,190 ,191 ,192 ,193 ,194 ,195 ,196 ,197 ,198 ,199 ,200 ,201 ,202 ,203 ,204 ,205 ,206 ,
+					     207 ,208 ,209 ,210 ,211 ,212 ,213 ,214 ,215 ,216 ,217 ,218 ,219 ,220 ,221 ,222 ,223 ,224 ,225 ,226 ,227 ,
+					     228 ,229 ,230 ,231 ,232 ,233 ,234 ,235 ,236 ,237 ,238 ,239 ,240 ,241 ,242 ,243 ,244 ,245 ,246 ,247 ,248 ,
+					     249 ,250 ,251 ,252 ,253 ,254 ,255};
+
+
+
+const unsigned int bitVectorAA[23] = {1, 2, 4, 8, 16, 32, 64, 128, 
+				      256, 512, 1024, 2048, 4096, 
+				      8192, 16384, 32768, 65536, 131072, 262144, 
+				      524288, 12 /* N | D */, 96 /*Q | E*/, 1048575 /* - */};
+
+const unsigned int bitVectorSecondary[256] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
+					      10, 11, 12, 13, 14, 15, 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 
+					      208, 224, 240, 0, 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238, 
+					      255, 0, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328, 
+					      3584, 3840, 0, 257, 514, 771, 1028, 1285, 1542, 1799, 2056, 2313, 2570, 2827, 3084, 
+					      3341, 3598, 3855, 0, 272, 544, 816, 1088, 1360, 1632, 1904, 2176, 2448, 2720, 2992, 
+					      3264, 3536, 3808, 4080, 0, 273, 546, 819, 1092, 1365, 1638, 1911, 2184, 2457, 2730, 
+					      3003, 3276, 3549, 3822, 4095, 0, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768, 
+					      36864, 40960, 45056, 49152, 53248, 57344, 61440, 0, 4097, 8194, 12291, 16388, 20485, 24582, 
+					      28679, 32776, 36873, 40970, 45067, 49164, 53261, 57358, 61455, 0, 4112, 8224, 12336, 16448, 
+					      20560, 24672, 28784, 32896, 37008, 41120, 45232, 49344, 53456, 57568, 61680, 0, 4113, 8226, 
+					      12339, 16452, 20565, 24678, 28791, 32904, 37017, 41130, 45243, 49356, 53469, 57582, 61695, 
+					      0, 4352, 8704, 13056, 17408, 21760, 26112, 30464, 34816, 39168, 43520, 47872, 52224, 56576, 
+					      60928, 65280, 0, 4353, 8706, 13059, 17412, 21765, 26118, 30471, 34824, 39177, 43530, 47883, 
+					      52236, 56589, 60942, 65295, 0, 4368, 8736, 13104, 17472, 21840, 26208, 30576, 34944, 39312, 
+					      43680, 48048, 52416, 56784, 61152, 65520, 0, 4369, 8738, 13107, 17476, 21845, 26214, 30583, 
+					      34952, 39321, 43690, 48059, 52428, 56797, 61166, 65535};
+
+const unsigned int bitVector32[33] = {1,     2,    4,    8,   16,   32,    64,   128,
+                                      256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
+                                      65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
+                                      16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824, 2147483648u, 
+				      4294967295u};
+
+/*const unsigned int bitVector64[65] = {};*/
+
+const unsigned int mask32[32] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 
+					262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 
+					268435456, 536870912, 1073741824, 2147483648U};
+
+const char *secondaryModelList[21] = { "S6A (GTR)", "S6B", "S6C", "S6D", "S6E", "S7A (GTR)", "S7B", "S7C", "S7D", "S7E", "S7F", "S16 (GTR)", "S16A", "S16B", "S16C", 
+				       "S16D", "S16E", "S16F", "S16I", "S16J", "S16K"};
+
+double masterTime;
+double accumulatedTime;
+int optimizeRateCategoryInvocations = 1;
+
+
+
+
+
+partitionLengths pLengths[MAX_MODEL] = {
+  
+  /* BINARY */
+  //{4,   4,   2,  4,  2, 1, 2,  8, 2, 2, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+  //eiLength changed from 2 -> 4
+  {4,   4,   2,  4,  4, 1, 2,  8, 2, 2, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+  
+  /* DNA */
+  {16,  16,  4, 16, 16, 6, 4, 64, 6, 4, FALSE, 15, inverseMeaningDNA, 4, FALSE, bitVectorIdentity},
+        
+  /* AA */
+  {400, 400, 20, 400, 400, 190, 20, 460, 190, 20, FALSE, 22, inverseMeaningPROT, 20, TRUE, bitVectorAA},
+  
+  /* SECONDARY_DATA */
+
+  {256, 256, 16, 256, 256, 120, 16, 4096, 120, 16, FALSE, 255, (char*)NULL, 16, TRUE, bitVectorSecondary},
+
+  
+  /* SECONDARY_DATA_6 */
+  {36, 36,  6, 36, 36, 15, 6, 384, 15, 6, FALSE, 63, (char*)NULL, 6, TRUE, bitVectorIdentity},
+
+  
+  /* SECONDARY_DATA_7 */
+  {49,   49,    7,   49, 49,  21, 7, 896, 21, 7, FALSE, 127, (char*)NULL, 7, TRUE, bitVectorIdentity},
+
+  /* 32 states */
+  {1024, 1024, 32, 1024, 1024, 496, 32, 1056, 496, 32, FALSE, 32, inverseMeaningGeneric32, 32, TRUE, bitVector32},
+  
+  /* 64 states */
+  {4096, 4096, 64, 4096, 4096, 2016, 64, 4160, 64, 2016, FALSE, 64, (char*)NULL, 64, TRUE, (unsigned int*)NULL}
+};
+
+partitionLengths pLength;
+
+     
+
+
+
+
+
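
The bitVectorAA table above appears to encode each amino-acid state as a single bit, with ambiguity codes formed by OR-ing the bits of the states they allow: 12 is annotated as N | D, 96 as Q | E, and 1048575 sets all 20 bits for the gap character. The following is a minimal sketch of that encoding under this reading. It is not part of the upstream patch, and the helper names state_bit, undetermined_mask and allows_state are hypothetical.

```c
#include <assert.h>

/* Bit mask for a single 0-based state index, e.g. index 2 ('N') -> 4. */
static unsigned int state_bit(int state)
{
  assert(state >= 0 && state < 32);
  return 1u << state;
}

/* Mask with the lowest n_states bits set: a fully undetermined character. */
static unsigned int undetermined_mask(int n_states)
{
  assert(n_states > 0 && n_states <= 32);
  return (n_states == 32) ? 0xFFFFFFFFu : ((1u << n_states) - 1u);
}

/* Nonzero if the ambiguity code `code` allows the given state. */
static int allows_state(unsigned int code, int state)
{
  return (code & state_bit(state)) != 0;
}
```

With the state order given by inverseMeaningPROT (A, R, N, D, C, Q, E, ...), N and D are indices 2 and 3, so their combined code is 4 | 8 = 12, matching the table entry.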
diff --git a/examl/makenewzGenericSpecial.c b/examl/makenewzGenericSpecial.c
new file mode 100644
index 0000000..6d80c67
--- /dev/null
+++ b/examl/makenewzGenericSpecial.c
@@ -0,0 +1,2747 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with
+ *  thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+/*#include <tmmintrin.h>*/
+#endif
+
+/* includes MIC-optimized functions */
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+
+/* pointers to reduction buffers for storing and gathering the first and second derivative 
+   of the likelihood in Pthreads and MPI */
+
+
+extern int processID;
+extern const unsigned int mask32[32];
+
+/*******************/
+
+
+/* generic function to get the required pointers to the data associated with the left and right node that define a branch */
+
+static void getVects(tree *tr, unsigned char **tipX1, unsigned char **tipX2, double **x1_start, double **x2_start, int *tipCase, int model,
+		     double **x1_gapColumn, double **x2_gapColumn, unsigned int **x1_gap, unsigned int **x2_gap, size_t offset)
+{
+  int    
+    rateHet = (int)discreteRateCategories(tr->rateHetModel),
+    states = tr->partitionData[model].states,
+    span = rateHet * states;
+  
+  size_t
+    x_offset = offset * (size_t)span;
+
+  int
+    pNumber, 
+    qNumber; 
+    
+  /* get the left and right node number of the nodes defining the branch we want to optimize */
+ 
+  pNumber = tr->td[0].ti[0].pNumber;
+  qNumber = tr->td[0].ti[0].qNumber;
+   
+
+  /* initialize to NULL */
+
+  *x1_start = (double*)NULL;
+  *x2_start = (double*)NULL;
+  *tipX1 = (unsigned char*)NULL;
+  *tipX2 = (unsigned char*)NULL;
+
+  /* switch over the different tip cases again here */
+
+  if(isTip(pNumber, tr->mxtips) || isTip(qNumber, tr->mxtips))
+    {      
+      if(!( isTip(pNumber, tr->mxtips) && isTip(qNumber, tr->mxtips)) )
+	{
+	  *tipCase = TIP_INNER;
+	  if(isTip(qNumber, tr->mxtips))
+	    {
+	      *tipX1 = tr->partitionData[model].yVector[qNumber] + offset;
+	      *x2_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+	      
+	      if(tr->saveMemory)
+		{
+		  *x2_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+		  *x2_gapColumn   = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet];  
+		}
+	    }
+	  else
+	    {
+	      *tipX1 = tr->partitionData[model].yVector[pNumber] + offset;
+	      *x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+	      
+	      if(tr->saveMemory)
+		{
+		  *x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+		  *x2_gapColumn   = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet];
+		}
+	    }
+	}
+      else
+	{
+	  /* note that tip tip should normally not occur since this means that we are trying to optimize 
+	     a branch in a two-taxon tree. However, this has been inherited from some RAxML function 
+	     that optimized pair-wise distances between all taxa in a tree */
+
+	  *tipCase = TIP_TIP;
+	  *tipX1 = tr->partitionData[model].yVector[pNumber] + offset;
+	  *tipX2 = tr->partitionData[model].yVector[qNumber] + offset;
+	}
+    }
+  else
+    {
+      *tipCase = INNER_INNER;
+
+      *x1_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+      *x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+      
+      if(tr->saveMemory)
+	{
+	  *x1_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+	  *x1_gapColumn   = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet]; 
+      
+	  *x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+	  *x2_gapColumn   = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet]; 
+	}
+    }
+
+}
+
+
+/* this is actually a pre-computation and storage of values that remain constant while we change the value of the branch length 
+   we want to adapt. the target pointer sumtable is a single pre-allocated array that has the same 
+   size as a conditional likelihood vector at an inner node.
+
+   So if we want to do a Newton-Raphson optimization we only execute this function once in the beginning for each new branch we are considering!
+*/
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+static void sumCAT_FLEX(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			unsigned char *tipX1, unsigned char *tipX2, int n, const int states)
+{
+  int 
+    i, 
+    l;
+  
+  double 
+    *sum, 
+    *left, 
+    *right;
+
+  switch(tipCase)
+    {
+      
+      /* switch over possible configurations of the nodes p and q defining the branch */
+
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[states * tipX1[i]]);
+	  right = &(tipVector[states * tipX2[i]]);
+	  sum = &sumtable[states * i];
+
+	  /* just multiply the values with each other for each site, note the similarity with evaluate() 
+	     we precompute the product which will remain constant and then just multiply this pre-computed 
+	     product with the changing P matrix exponentiations that depend on the branch lengths */
+
+	  for(l = 0; l < states; l++)
+	    sum[l] = left[l] * right[l];
+	}
+      break;
+    case TIP_INNER:
+
+      /* same as for TIP_TIP only that 
+	 we now access one tip vector and one 
+	 inner vector. 
+
+	 You may also observe that we do not consider using scaling vectors anywhere here.
+
+	 This is because we are interested in the first and second derivatives of the likelihood and 
+	 hence the addition of the log() of the scaling factor times the number of scaling events
+	 becomes obsolete through the derivative */
+
+      for (i = 0; i < n; i++)
+	{
+	  left = &(tipVector[states * tipX1[i]]);
+	  right = &x2[states * i];
+	  sum = &sumtable[states * i];
+
+	  for(l = 0; l < states; l++)
+	    sum[l] = left[l] * right[l];
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  left  = &x1[states * i];
+	  right = &x2[states * i];
+	  sum = &sumtable[states * i];
+
+	  for(l = 0; l < states; l++)
+	    sum[l] = left[l] * right[l];
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+/* same thing for GAMMA models. The only noteworthy thing here is that we have an additional inner loop over the 
+   number of discrete gamma rates. The data access pattern is also different since for tip vector accesses through our 
+   lookup table, we do not distinguish between rates 
+
+   Note the different access pattern in TIP_INNER:
+
+   left = &(tipVector[states * tipX1[i]]);	  
+   right = &(x2[span * i + l * states]);
+
+*/
+
+static void sumGAMMA_FLEX(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			  unsigned char *tipX1, unsigned char *tipX2, int n, const int states)
+{
+  int 
+    i, 
+    l, 
+    k;
+  
+  const int 
+    span = 4 * states;
+
+  double 
+    *left, 
+    *right, 
+    *sum;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for(i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[states * tipX1[i]]);
+	  right = &(tipVector[states * tipX2[i]]);
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      sum = &sumtable[i * span + l * states];
+
+	      for(k = 0; k < states; k++)
+		sum[k] = left[k] * right[k];
+
+	    }
+	}
+      break;
+    case TIP_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  left = &(tipVector[states * tipX1[i]]);
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      right = &(x2[span * i + l * states]);
+	      sum = &sumtable[i * span + l * states];
+
+	      for(k = 0; k < states; k++)
+		sum[k] = left[k] * right[k];
+
+	    }
+	}
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  for(l = 0; l < 4; l++)
+	    {
+	      left  = &(x1[span * i + l * states]);
+	      right = &(x2[span * i + l * states]);
+	      sum   = &(sumtable[i * span + l * states]);
+
+
+	      for(k = 0; k < states; k++)
+		sum[k] = left[k] * right[k];
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+#endif
+
+/* optimized functions for branch length optimization */
+
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+static void sumGAMMA_BINARY(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+                            unsigned char *tipX1, unsigned char *tipX2, int n);
+static void coreGTRGAMMA_BINARY(const int upper, double *sumtable,
+                                volatile double *d1,   volatile double *d2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+static void coreGTRCAT_BINARY(int upper, int numberOfCategories, double *sum,
+                              volatile double *d1, volatile double *d2, 
+                              double *rptr, double *EIGN, int *cptr, double lz, int *wgt);
+static void sumCAT_BINARY(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+                          unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumCAT_SAVE(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+			unsigned char *tipX1, unsigned char *tipX2, int n, double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMA_GAPPED_SAVE(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+				 unsigned char *tipX1, unsigned char *tipX2, int n, 
+				 double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMA(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+		     unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumCAT(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+		   unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGAMMAPROT_GAPPED_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+				     unsigned char *tipX1, unsigned char *tipX2, int n, 
+				     double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMAPROT_LG4(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector[4],
+			     unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGAMMAPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			 unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGTRCATPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			  unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGTRCATPROT_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			       unsigned char *tipX1, unsigned char *tipX2, int n, 
+			       double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void coreGTRGAMMAPROT_LG4(double *gammaRates, double *EIGN[4], double *sumtable, int upper, int *wrptr,
+				 volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double lz, double *weights);
+
+static void coreGTRGAMMA(const int upper, double *sumtable,
+			 volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt);
+
+static void coreGTRCAT(int upper, int numberOfCategories, double *sum,
+		       volatile double *d1, volatile double *d2, int *wgt,
+		       double *rptr, double *EIGN, int *cptr, double lz);
+
+
+static void coreGTRGAMMAPROT(double *gammaRates, double *EIGN, double *sumtable, int upper, int *wgt,
+			     volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double lz);
+
+static void coreGTRCATPROT(double *EIGN, double lz, int numberOfCategories, double *rptr, int *cptr, int upper,
+			   int *wgt,  volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *sumtable);
+
+#endif
+
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+/* this is the core function of the Newton-Raphson based branch length optimization: it computes 
+   the first and second derivatives of the log likelihood for a newly proposed value lz = log(z) */
+
+
+static void coreCAT_FLEX(int upper, int numberOfCategories, double *sum,
+			 volatile double *d1, volatile double *d2, int *wgt,
+			 double *rptr, double *EIGN, int *cptr, double lz, const int states)
+{
+  int 
+    i, 
+    l;
+  
+  double 
+    *d, 
+    
+    /* arrays to store stuff we can pre-compute */
+
+    *d_start = (double *)malloc_aligned(numberOfCategories * states * sizeof(double)),
+    *e =(double *)malloc_aligned(states * sizeof(double)),
+    *s = (double *)malloc_aligned(states * sizeof(double)),
+    *dd = (double *)malloc_aligned(states * sizeof(double)),
+    inv_Li, 
+    dlnLidlz, 
+    d2lnLidlz2,
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0;
+
+  d = d_start;
+  
+  e[0] = 0.0;
+  s[0] = 0.0; 
+  dd[0] = 0.0;
+
+
+  /* we are pre-computing values for computing the first and second derivative of P(lz);
+     P(lz) is built from exponentials, so the exponential is the only term we actually have to differentiate here */
+
+  for(l = 1; l < states; l++)
+    { 
+      s[l]  = EIGN[l];
+      e[l]  = EIGN[l] * EIGN[l];     
+      dd[l] = s[l] * lz;
+    }
+
+  /* compute the P matrices and their derivatives for 
+     all per-site rate categories */
+
+  for(i = 0; i < numberOfCategories; i++)
+    {      
+      d[states * i] = 1.0;
+      for(l = 1; l < states; l++)
+	d[states * i + l] = EXP(dd[l] * rptr[i]);
+    }
+
+
+  /* now loop over the sites in this partition to obtain the per-site 1st and 2nd derivatives */
+
+  for (i = 0; i < upper; i++)
+    {    
+      double 
+	r = rptr[cptr[i]],
+	wr1 = r * wgt[i],
+	wr2 = r * r * wgt[i];
+
+      /* get the correct p matrix for the rate at the current site i */
+      
+      d = &d_start[states * cptr[i]];      
+          
+      /* this is the likelihood at site i, NOT the log likelihood, we don't need the log 
+	 likelihood to compute derivatives ! */
+
+      inv_Li     = sum[states * i]; 
+      
+      /* those are for storing the first and second derivative of the Likelihood at site i */
+
+      dlnLidlz   = 0.0;
+      d2lnLidlz2 = 0.0;
+
+      /* now multiply the likelihood and the first and second derivative with the 
+	 appropriate derivatives of P(lz) */
+
+      for(l = 1; l < states; l++)
+	{
+	  double
+	    tmpv = d[l] * sum[states * i + l];
+	  
+	  inv_Li     += tmpv;	 	  
+	  dlnLidlz   += tmpv * s[l];       
+	  d2lnLidlz2 += tmpv * e[l];
+	}     
+      
+      /* below we are implementing the other mathematical operations that are required 
+	 to obtain the derivatives */
+
+      inv_Li = 1.0/ FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      /* under the CAT model, wr1 and wr2 (computed at the top of this loop) extend the pattern weight:
+	 wr1 = wgt[i] * rptr[cptr[i]]
+	 and 
+	 wr2 = wgt[i] * rptr[cptr[i]] * rptr[cptr[i]] 
+
+	 this is required for the derivatives because differentiating the 
+	 exponential with respect to lz brings down one factor of the rate 
+	 per derivative 
+
+	 wgt is just the site pattern weight 
+      */
+
+      /* compute the accumulated first and second derivatives of this site */
+
+      dlnLdlz  += wr1 * dlnLidlz;
+      d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  /* 
+     set the result values, i.e., the sum of the per-site first and second derivatives of the likelihood function 
+     for this partition. 
+   */
+
+  *d1  = dlnLdlz;
+  *d2 = d2lnLdlz2;
+
+  /* free the temporary arrays */
+
+  free(d_start);
+  free(e);
+  free(s);
+  free(dd);
+}
+
+static void coreGAMMA_FLEX(int upper, double *sumtable, volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, 
+			   double *EIGN, double *gammaRates, double lz, int *wgt, const int states)
+{
+   double  
+    *sum, 
+     diagptable[1024], /* TODO make this dynamic */
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0,
+    ki, 
+    kisqr,
+    tmp,
+    inv_Li, 
+    dlnLidlz, 
+    d2lnLidlz2;
+
+  int     
+    i, 
+    j, 
+    l;  
+
+  const int 
+    gammaStates = 4 * states;
+
+  /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+  for(i = 0; i < 4; i++)
+    {
+      ki = gammaRates[i];
+      kisqr = ki * ki;
+
+      for(l = 1; l < states; l++)
+	{
+	  diagptable[i * gammaStates + l * 4]     = EXP(EIGN[l] * ki * lz);
+	  diagptable[i * gammaStates + l * 4 + 1] = EIGN[l] * ki;
+	  diagptable[i * gammaStates + l * 4 + 2] = EIGN[l] * EIGN[l] * kisqr;
+	}
+    }
+
+  /* loop over sites in this partition */
+
+  for (i = 0; i < upper; i++)
+    {
+      /* rptr and cptr do not exist in this function; ki and ki * ki are already in diagptable, so only the pattern weight applies */
+      double 
+	wr1 = wgt[i],
+	wr2 = wgt[i];
+
+      /* access the array with pre-computed values */
+      sum = &sumtable[i * gammaStates];
+
+      /* initial per-site likelihood and 1st and 2nd derivatives */
+
+      inv_Li   = 0.0;
+      dlnLidlz = 0.0;
+      d2lnLidlz2 = 0.0;
+
+      /* loop over discrete GAMMA rates */
+
+      for(j = 0; j < 4; j++)
+	{
+	  inv_Li += sum[j * states];
+
+	  for(l = 1; l < states; l++)
+	    {
+	      inv_Li     += (tmp = diagptable[j * gammaStates + l * 4] * sum[j * states + l]);
+	      dlnLidlz   +=  tmp * diagptable[j * gammaStates + l * 4 + 1];
+	      d2lnLidlz2 +=  tmp * diagptable[j * gammaStates + l * 4 + 2];
+	    }
+	}
+
+      /* finalize derivative computation */
+      /* note that, unlike in CAT above, the per-site weight here is just the 
+	 integer pattern weight of the current site 
+
+	 The operations:
+
+	 EIGN[l] * ki;
+	 EIGN[l] * EIGN[l] * kisqr;
+
+	 whose rate factors (the * ki and * ki * ki parts) CAT applies through its 
+	 per-site weight multipliers are done explicitly here 
+
+      */
+
+      inv_Li = 1.0 / FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz   += wr1 * dlnLidlz;
+      d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *ext_dlnLdlz   = dlnLdlz;
+  *ext_d2lnLdlz2 = d2lnLdlz2;
+  
+}
+
+#endif
+
+/* the function below is called only once at the very beginning of each Newton-Raphson procedure for optimizing branch lengths.
+   It initially invokes an iterative newview call to get a consistent pair of vectors at the left and the right end of the 
+   branch and thereafter invokes the one-time only precomputation of values (sumtable) that can be re-used in each Newton-Raphson 
+   iteration. Once this function has been called we can execute the actual NR procedure */
+
+void makenewzIterative(tree *tr)
+{
+  /* call newviewIterative to get the likelihood arrays to the left and right of the branch */
+
+  newviewIterative(tr, 1);
+
+
+  /*
+     loop over all partitions to do the precomputation of the sumTable buffer.
+     This is analogous to the newviewIterative() and evaluateIterative()
+     implementations.
+   */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+  {
+    int
+      m,
+      model,
+      maxModel,
+      tipCase;
+
+#ifdef _USE_OMP
+    maxModel = tr->maxModelsPerThread;
+#else
+    maxModel = tr->NumberOfModels;
+#endif
+
+  double
+    *x1_start = (double*)NULL,
+    *x2_start = (double*)NULL;
+  
+  unsigned char
+    *tipX1,
+    *tipX2;
+
+  double
+    *x1_gapColumn = (double*)NULL,
+    *x2_gapColumn = (double*)NULL;
+  
+  unsigned int
+    *x1_gap = (unsigned int*)NULL,
+    *x2_gap = (unsigned int*)NULL;			      
+  
+
+  for(m = 0; m < maxModel; m++)
+    { 
+      size_t
+	width = 0,
+	offset = 0;
+
+#ifdef _USE_OMP
+      int
+	tid = omp_get_thread_num();
+
+      /* check if this thread should process this partition */
+      Assign* 
+	pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+      if(pAss)
+	{
+	  model  = pAss->partitionId;
+	  width  = pAss->width;
+	  offset = pAss->offset;
+	}
+      else
+	break;
+#else
+      model = m;
+      
+      /* number of sites in this partition */
+      width  = (size_t)tr->partitionData[model].width;
+      offset = 0;
+#endif
+
+      
+      if(tr->td[0].executeModel[model] && width > 0)
+	{
+	  int 	   
+	    rateHet = (int)discreteRateCategories(tr->rateHetModel),
+
+	    /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+	    states = tr->partitionData[model].states,
+
+	    /* span for single alignment site (in doubles!) */
+	    span = rateHet * states;
+	  
+	  size_t
+	    /* offset for current thread's data in global xVector (in doubles!) */
+	    x_offset = offset * (size_t)span;
+	  
+	  getVects(tr, &tipX1, &tipX2, &x1_start, &x2_start, &tipCase, model, &x1_gapColumn, &x2_gapColumn, &x1_gap, &x2_gap, offset);
+
+	  double
+	    *sumBuffer = tr->partitionData[model].sumBuffer + x_offset;
+	 
+#ifndef _OPTIMIZED_FUNCTIONS
+	  assert(!tr->saveMemory);
+	  if(tr->rateHetModel == CAT)
+	    sumCAT_FLEX(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+			width, states);
+	  else
+	    sumGAMMA_FLEX(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+			  width, states);
+#else
+	  switch(states)
+	    {
+	    case 2:
+#ifdef __MIC_NATIVE
+ 	      assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+	      assert(!tr->saveMemory);
+	      if(tr->rateHetModel == CAT)
+		sumCAT_BINARY(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+			      tipX1, tipX2, width);
+	      else
+		sumGAMMA_BINARY(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+				tipX1, tipX2, width);
+#endif
+	      break;
+	    case 4: /* DNA */
+	      if(tr->rateHetModel == CAT)
+		{
+		  if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+		    sumCAT_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+				width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		   else
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+			sumCAT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+			   width);
+#endif
+		}
+	      else
+		{
+		  if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ 		      assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+			  sumGAMMA_GAPPED_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+					 width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		  else
+#ifdef __MIC_NATIVE
+             sumGAMMA_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+	                width);
+#else
+			  sumGAMMA(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+			     width);
+#endif
+		}
+	      break;		
+	    case 20: /* proteins */
+	      if(tr->rateHetModel == CAT)
+		{
+		  if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+			  sumGTRCATPROT_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+				       tipX1, tipX2, width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		  else	      	      
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+			  sumGTRCATPROT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+				  tipX1, tipX2, width);
+#endif
+		}
+	      else
+		{
+		  if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ 		    assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+		    sumGAMMAPROT_GAPPED_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+					     width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+		  else
+		    {
+		      if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)		      			   		
+#ifdef __MIC_NATIVE
+			sumGAMMAPROT_LG4_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+					     width);
+#else
+		      sumGAMMAPROT_LG4(tipCase,  sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector_LG4,
+				       tipX1, tipX2, width);
+#endif
+		      else
+#ifdef __MIC_NATIVE
+            sumGAMMAPROT_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+		                  width);
+#else
+			sumGAMMAPROT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+				     tipX1, tipX2, width);
+#endif
+		    }
+		   
+		}
+	      break;		
+	    default:
+	      assert(0);
+	    }
+#endif
+	}
+    }  // for model
+  }  // omp parallel region
+}
+
+
+
+/* this function actually computes the first and second derivatives of the likelihood for a given branch stored 
+   in tr->coreLZ[model]. Note that in the parallel case coreLZ must always be broadcast together with the 
+   traversal descriptor, at least for optimizing branch lengths */
+
+void execCore(tree *tr, volatile double *_dlnLdlz, volatile double *_d2lnLdlz2)
+{
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+  {
+    int
+      m,
+      model,
+      maxModel,
+      branchIndex;
+
+#ifdef _USE_OMP
+    int
+      tid = omp_get_thread_num(),
+      nModels = (tr->numBranches > 1) ? tr->NumberOfModels : 1,
+      p;
+
+    /* Clear reduction buffers: since in OMP version each thread works only on a subset of partitions,
+     * and their order is arbitrary, it's easier to perform this initialization before the main loop,
+     * just to be on the safe side. */
+    for(p = 0; p < nModels; p++)
+      {
+	tr->partitionData[p].reductionBuffer[tid] = 0.;
+	tr->partitionData[p].reductionBuffer2[tid] = 0.;
+      }
+
+    maxModel = tr->maxModelsPerThread;
+#else
+    maxModel = tr->NumberOfModels;
+#endif
+
+    double lz;
+    /*  double
+      buffer_dlnLdlz[NUM_BRANCHES],
+      buffer_d2lnLdlz2[NUM_BRANCHES];*/
+
+    /* loop over partitions */
+    for(m = 0; m < maxModel; m++)
+      {
+	size_t
+	  width = 0,
+	  offset = 0;
+
+#ifdef _USE_OMP
+	/* check if this thread should process this partition */
+	Assign* pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+	if (pAss)
+	{
+	  model  = pAss->partitionId;
+	  width  = GET_PADDED_WIDTH(pAss->width);
+	  offset = pAss->offset;
+	}
+	else
+	  break;
+#else
+	model = m;
+
+	/* number of sites in this partition */
+	width  = (size_t)tr->partitionData[model].width;
+	offset = 0;
+#endif
+
+	volatile double
+	  *d1acc   = (double*) NULL,
+	  *d2acc    = (double*) NULL;
+
+	  /* figure out if we are optimizing branch lengths individually per partition or jointly across
+	     all partitions. If we do this on a per partition basis, we also need to compute and store
+	     the per-partition derivatives of the likelihood separately, otherwise not */
+
+	if(tr->numBranches > 1)
+	  {
+	    branchIndex = model;
+	    lz = tr->td[0].parameterValues[model];
+	  }
+	else
+	  {
+	    branchIndex = 0;
+	    lz = tr->td[0].parameterValues[0];
+	  }
+
+#ifdef _USE_OMP
+	d1acc = &tr->partitionData[branchIndex].reductionBuffer[tid];
+	d2acc = &tr->partitionData[branchIndex].reductionBuffer2[tid];
+#else
+	d1acc = &_dlnLdlz[branchIndex];
+	d2acc = &_d2lnLdlz2[branchIndex];
+
+	/* We need to reset accumulated derivative values in two cases: a) per-partition derivatives or
+	 * b) joint derivatives AND we're processing the first partition */
+	if (branchIndex == model)
+	{
+	  *d1acc = 0.0;
+	  *d2acc = 0.0;
+	}
+#endif
+
+	/* check if we (the present thread, for instance) need to compute something at
+	    all for the present partition */
+
+       if(tr->td[0].executeModel[model] && width > 0)
+	{
+	  int
+	    rateHet = (int)discreteRateCategories(tr->rateHetModel),
+
+	    /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+	    states = tr->partitionData[model].states,
+
+	    /* span for single alignment site (in doubles!) */
+	    span = rateHet * states,
+
+	    /* offset for current thread's data in global xVector (in doubles!) */
+	    x_offset = offset * span,
+
+	    /* integer weight vector with pattern compression weights */
+	    *wgt = tr->partitionData[model].wgt + offset,
+
+	    /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+	    *rateCategory = tr->partitionData[model].rateCategory + offset;
+
+	  /* set a pointer to the part of the pre-computed sumBuffer we are going to access */
+	  double 
+	    *weights         = tr->partitionData[model].weights,
+	    *sumBuffer = tr->partitionData[model].sumBuffer + x_offset;
+
+	  volatile double
+	    dlnLdlz   = 0.0,
+	    d2lnLdlz2 = 0.0;
+
+  #ifndef _OPTIMIZED_FUNCTIONS
+
+	    /* compute first and second derivatives with the slow generic functions */
+
+	    if(tr->rateHetModel == CAT)
+	      coreCAT_FLEX(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+			   &dlnLdlz, &d2lnLdlz2, wgt,
+			   tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN, rateCategory, lz, states);
+	    else
+	      coreGAMMA_FLEX(width, sumBuffer,
+			     &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz,
+			     wgt, states);
+  #else
+	    switch(states)
+	      {
+	      case 2:
+#ifdef __MIC_NATIVE
+ 	      assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+		assert(!tr->saveMemory);
+		if(tr->rateHetModel == CAT)
+		  coreGTRCAT_BINARY(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+				    &dlnLdlz, &d2lnLdlz2,
+				    tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN,  rateCategory,
+				    lz, wgt);
+		else
+		   coreGTRGAMMA_BINARY(width, sumBuffer,
+				       &dlnLdlz, &d2lnLdlz2, 
+				       tr->partitionData[model].EIGN, 
+				       tr->partitionData[model].gammaRates, lz, wgt);
+#endif
+		break;
+	      case 4: /* DNA */
+		if(tr->rateHetModel == CAT)
+  #ifdef __MIC_NATIVE
+			       assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+  #else
+			    coreGTRCAT(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+					    &dlnLdlz, &d2lnLdlz2, wgt,
+					    tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN,  rateCategory, lz);
+  #endif
+		else
+  #ifdef __MIC_NATIVE
+			    coreGTRGAMMA_MIC(width, sumBuffer,
+			     &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+  #else
+			    coreGTRGAMMA(width, sumBuffer,
+					    &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+  #endif
+
+		break;
+	      case 20: /* proteins */
+		if(tr->rateHetModel == CAT)
+  #ifdef __MIC_NATIVE
+			       assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+  #else
+			    coreGTRCATPROT(tr->partitionData[model].EIGN, lz, tr->partitionData[model].numberOfCategories,  tr->partitionData[model].perSiteRates,
+					    rateCategory, width,
+					    wgt,
+					    &dlnLdlz, &d2lnLdlz2,
+					    sumBuffer);
+  #endif
+		else
+		{
+		  if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+#ifdef __MIC_NATIVE
+		    coreGTRGAMMAPROT_LG4_MIC(width, sumBuffer,
+					     &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN_LG4, tr->partitionData[model].gammaRates,
+					     lz, wgt, weights);
+#else
+		  {
+		    //printf("model %d weights %f %f %f %f\n", model, weights[0], weights[1], weights[2], weights[3]);
+		  coreGTRGAMMAPROT_LG4(tr->partitionData[model].gammaRates, tr->partitionData[model].EIGN_LG4,
+				       sumBuffer, width, wgt,
+				       &dlnLdlz, &d2lnLdlz2, lz, weights);
+		  }
+#endif
+		    else
+#ifdef __MIC_NATIVE
+		      coreGTRGAMMAPROT_MIC(width, sumBuffer,
+					   &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+#else
+		  coreGTRGAMMAPROT(tr->partitionData[model].gammaRates, tr->partitionData[model].EIGN,
+				   sumBuffer, width, wgt,
+				   &dlnLdlz, &d2lnLdlz2, lz);
+#endif
+		}
+		break;
+	      default:
+		assert(0);
+	      }
+  #endif
+
+	    /* store first and second derivative */
+
+	    *d1acc += dlnLdlz;
+	    *d2acc += d2lnLdlz2;
+	  }
+	 else
+	  {
+	    /* set to 0 to make the reduction operation consistent */
+
+	    if(width == 0 && (tr->numBranches > 1))
+	      {
+		*d1acc   = 0.0;
+		*d2acc   = 0.0;
+	      }
+
+	    if(width > 0 && (tr->numBranches > 1))
+	      {
+		assert(tr->td[0].executeModel[model] == FALSE);
+		/* _dlnLdlz[model]   = 0.0;
+		   _d2lnLdlz2[model] = 0.0;*/
+	      }
+
+	  }
+      }  // for model
+  }  // omp parallel section
+
+#ifdef _USE_OMP
+  /* perform reduction of 1st and 2nd derivative values */
+  int
+    model,
+    tid;
+
+  int nModels = (tr->numBranches > 1) ? tr->NumberOfModels : 1;
+  for(model = 0; model < nModels; model++)
+  {
+    _dlnLdlz[model] = 0.0;
+    _d2lnLdlz2[model] = 0.0;
+
+    for(tid = 0; tid < tr->nThreads; tid++)
+      {
+      _dlnLdlz[model] += tr->partitionData[model].reductionBuffer[tid];
+      _d2lnLdlz2[model] += tr->partitionData[model].reductionBuffer2[tid];
+      }
+  }
+#endif
+}
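The OpenMP path above has each thread accumulate into its own slot and sums the slots serially afterwards. A minimal serial sketch of that reduction step follows; it is not upstream code. The flat layout `buf[p * n_threads + t]` and the name `reduce_partials` are assumptions made for illustration, whereas the patch keeps one `reductionBuffer` array per partition.

```c
#include <stddef.h>

/* Hypothetical layout: buf[p * n_threads + t] holds the partial derivative
   that thread t accumulated for branch (or partition) p. Summing the slots
   in a fixed thread order keeps the result independent of which thread
   happened to finish first. */
static void reduce_partials(size_t n_branches, size_t n_threads,
                            const double *buf, double *out)
{
  for (size_t p = 0; p < n_branches; p++) {
    double acc = 0.0;

    for (size_t t = 0; t < n_threads; t++)
      acc += buf[p * n_threads + t];

    out[p] = acc;
  }
}
```

Slots belonging to threads that did not touch a given partition must be zero before the sum, which is why the parallel region clears them up front.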
+
+
+/* the function below actually implements the iterative Newton-Raphson procedure.
+   It is particularly messy and hard to read because, for the case of per-partition branch length 
+   estimates, it needs to keep track of whether the Newton-Raphson procedure has 
+   converged for each partition individually. 
+
+   The rationale for doing it like this is also provided in:
+
+   
+   A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009.
+
+*/
+
+static void topLevelMakenewz(tree *tr, double *z0, int _maxiter, double *result)
+{
+  double   z[NUM_BRANCHES], zprev[NUM_BRANCHES], zstep[NUM_BRANCHES];
+  double  dlnLdlz[NUM_BRANCHES], d2lnLdlz2[NUM_BRANCHES];
+  int i, maxiter[NUM_BRANCHES], model;
+  boolean firstIteration = TRUE;
+  boolean outerConverged[NUM_BRANCHES];
+  boolean loopConverged;
+
+
+  /* figure out if this is on a per partition basis or jointly across all partitions */
+  
+
+
+  /* initialize loop convergence variables etc. 
+     maxiter is the maximum number of NR iterations we are going to do before giving up */
+
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      z[i] = z0[i];
+      maxiter[i] = _maxiter;
+      outerConverged[i] = FALSE;
+      tr->curvatOK[i]       = TRUE;
+    }
+
+
+  /* nested do while loops of Newton-Raphson */
+
+  do
+    {
+
+      /* check if we are done for partition i or if we need to adapt the branch length again */
+
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  if(outerConverged[i] == FALSE && tr->curvatOK[i] == TRUE)
+	    {
+	      tr->curvatOK[i] = FALSE;
+	      zprev[i] = z[i];
+	      zstep[i] = (1.0 - zmax) * z[i] + zmin;
+	    }
+	}
+
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  /* other case, the outer loop hasn't converged but we are trying to approach 
+	     the maximum from the wrong side */
+	  
+	  if(outerConverged[i] == FALSE && tr->curvatOK[i] == FALSE)
+	    {
+	      double lz;
+
+	      if (z[i] < zmin) z[i] = zmin;
+	      else if (z[i] > zmax) z[i] = zmax;
+	      lz    = log(z[i]);
+
+	      tr->coreLZ[i] = lz;
+	    }
+	}
+
+
+      /* set the execution mask */
+
+      if(tr->numBranches > 1)
+	{
+	  assert(tr->numBranches == tr->NumberOfModels);
+	  
+	  for(model = 0; model < tr->NumberOfModels; model++)
+	    {
+	      if(tr->executeModel[model])
+		tr->executeModel[model] = !tr->curvatOK[model];
+	    }
+	}
+      else
+	{
+	  for(model = 0; model < tr->NumberOfModels; model++)
+	    tr->executeModel[model] = !tr->curvatOK[0];
+	}
+
+
+      /* store it in traversal descriptor */
+
+      storeExecuteMaskInTraversalDescriptor(tr); 
+
+      /* store the new branch length values to be tested in traversal descriptor */
+
+      storeValuesInTraversalDescriptor(tr, &(tr->coreLZ[0]));
+
+      /* sequential part, if this is the first Newton-Raphson iteration,
+	 do the precomputations as well, otherwise just execute the computation
+	 of the derivatives */
+
+      if(firstIteration)
+	{
+	  makenewzIterative(tr);
+	  firstIteration = FALSE;
+	}
+      
+      execCore(tr, dlnLdlz, d2lnLdlz2);
+
+      {
+	double 
+	  *send = (double *)malloc(sizeof(double) * tr->numBranches * 2),
+	  *recv = (double *)malloc(sizeof(double) * tr->numBranches * 2);		
+  
+	memcpy(&send[0],                dlnLdlz,   sizeof(double) * tr->numBranches);
+	memcpy(&send[tr->numBranches],  d2lnLdlz2, sizeof(double) * tr->numBranches);
+	
+#ifdef _USE_ALLREDUCE	  
+	/* the MPI_Allreduce implementation is apparently sometimes not deterministic */
+
+	MPI_Allreduce(send, recv, tr->numBranches * 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);	    	    
+#else
+	MPI_Reduce(send, recv, tr->numBranches * 2, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+	MPI_Bcast(recv,        tr->numBranches * 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+#endif   
+
+	memcpy(dlnLdlz,   &recv[0],               sizeof(double) * tr->numBranches);
+	memcpy(d2lnLdlz2, &recv[tr->numBranches], sizeof(double) * tr->numBranches);
+
+	free(send);
+	free(recv);
+      }
+     
+      /* do a NR step, if we are on the correct side of the maximum that's okay, otherwise 
+	 shorten branch */
+
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  if(outerConverged[i] == FALSE && tr->curvatOK[i] == FALSE)
+	    {
+	      if ((d2lnLdlz2[i] >= 0.0) && (z[i] < zmax))
+		zprev[i] = z[i] = 0.37 * z[i] + 0.63;  /*  Bad curvature, shorten branch */
+	      else
+		tr->curvatOK[i] = TRUE;
+	    }
+	}
+
+      /* do the standard NR step to obtain the next value, depending on the state for each partition */
+
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  if(tr->curvatOK[i] == TRUE && outerConverged[i] == FALSE)
+	    {
+	      if (d2lnLdlz2[i] < 0.0)
+		{
+		  double tantmp = -dlnLdlz[i] / d2lnLdlz2[i];
+		  if (tantmp < 100)
+		    {
+		      z[i] *= EXP(tantmp);
+		      if (z[i] < zmin)
+			z[i] = zmin;
+
+		      if (z[i] > 0.25 * zprev[i] + 0.75)
+			z[i] = 0.25 * zprev[i] + 0.75;
+		    }
+		  else
+		    z[i] = 0.25 * zprev[i] + 0.75;
+		}
+	      if (z[i] > zmax) z[i] = zmax;
+
+	      /* decrement the maximum number of iterations */
+
+	      maxiter[i] = maxiter[i] - 1;
+	      
+	      /* check if the outer loop has converged */
+	      
+	      //old code below commented out, integrated new PRELIMINARY BUG FIX !
+	      //this needs further work at some point!
+	      
+	      /*
+		if(maxiter[i] > 0 && (ABS(z[i] - zprev[i]) > zstep[i]))
+		outerConverged[i] = FALSE;
+		else
+		outerConverged[i] = TRUE;
+	      */
+	      
+	      if((ABS(z[i] - zprev[i]) > zstep[i]))
+		{
+		  /* We should make a more informed decision here,
+		     based on the log like improvement */
+
+		  if(maxiter[i] < -20)
+		    {
+		      z[i] = z0[i];
+		      outerConverged[i] = TRUE;
+		    }
+		  else
+		    outerConverged[i] = FALSE;
+		}
+	      else				
+		outerConverged[i] = TRUE;							    
+	    }
+	}
+
+      /* check if the loop has converged for all partitions */
+
+      loopConverged = TRUE;
+      for(i = 0; i < tr->numBranches; i++)
+	loopConverged = loopConverged && outerConverged[i];
+    }
+  while (!loopConverged);
+
+
+  /* reset  partition execution mask */
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    tr->executeModel[model] = TRUE;
+
+  /* copy the new branches in the result array of branches.
+     if we don't do a per partition estimate of 
+     branches this will only set result[0]
+  */
+  
+  for(i = 0; i < tr->numBranches; i++)    
+    result[i] = z[i]; 
+}
+
+/* function called from RAxML to optimize a given branch with current branch lengths z0 
+   between nodes p and q.
+   The new branch lengths will be stored in result */
+
+void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask)
+{
+  int 
+    i;
+  
+  /* the first entry of the traversal descriptor stores the node pair that defines 
+     the branch */
+
+  tr->td[0].ti[0].pNumber = p->number;
+  tr->td[0].ti[0].qNumber = q->number;
+  
+  for(i = 0; i < tr->numBranches; i++)
+    {     
+      tr->td[0].ti[0].qz[i] =  z0[i];
+      
+      if(mask)
+	{
+	  if(tr->partitionConverged[i])
+	    tr->executeModel[i] = FALSE;
+	  else
+	    tr->executeModel[i] = TRUE;
+	}
+      else
+	assert(tr->executeModel[i]);
+    }
+  
+
+  /* compute the traversal descriptor of the likelihood vectors  that need to be re-computed 
+     first in makenewzIterative */
+
+  tr->td[0].count = 1;
+  
+  if(!p->x)
+    computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+  if(!q->x)
+    computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE); 
+
+  /* call the Newton-Raphson procedure */
+  
+  topLevelMakenewz(tr, z0, maxiter, result);
+ 
+  /* reset executeModel; this seems to be a bit redundant with topLevelMakenewz */ 
+
+  for(i = 0; i < tr->numBranches; i++)
+      tr->executeModel[i] = TRUE;
+}
+
+
+/* below are, once again, the optimized functions */
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/**** binary ***/
+static void coreGTRCAT_BINARY(int upper, int numberOfCategories, double *sum,
+                              volatile double *d1, volatile double *d2, 
+                              double *rptr, double *EIGN, int *cptr, double lz, int *wgt)
+{
+  int i;
+  double
+    *d, *d_start,
+    tmp_0, inv_Li, dlnLidlz, d2lnLidlz2,
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0;
+  double e[2];
+  double dd1;
+
+  /*e[0] = EIGN[0];
+    e[1] = EIGN[0] * EIGN[0];*/
+  
+  e[0] = EIGN[1];
+  e[1] = EIGN[1] * EIGN[1];
+
+  d = d_start = (double *)malloc((size_t)numberOfCategories * sizeof(double));
+
+  dd1 = e[0] * lz;
+
+  for(i = 0; i < numberOfCategories; i++)
+    d[i] = exp(dd1 * rptr[i]);
+
+  for (i = 0; i < upper; i++)
+    {
+      double
+        r = rptr[cptr[i]],
+        wr1 = r * wgt[i],
+        wr2 = r * r * wgt[i];
+      
+      d = &d_start[cptr[i]];
+
+      inv_Li = sum[2 * i];
+      inv_Li += (tmp_0 = d[0] * sum[2 * i + 1]);
+
+      inv_Li = 1.0/fabs(inv_Li);
+
+      dlnLidlz   = tmp_0 * e[0];
+      d2lnLidlz2 = tmp_0 * e[1];
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz   += wr1 * dlnLidlz;
+      d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *d1 = dlnLdlz;
+  *d2 = d2lnLdlz2;
+
+  free(d_start);
+}
+
+static void coreGTRGAMMA_BINARY(const int upper, double *sumtable,
+                                volatile double *d1,   volatile double *d2, double *EIGN, double *gammaRates, double lz, int *wrptr)
+{
+  double 
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0,
+    ki, 
+    kisqr,  
+    inv_Li, 
+    dlnLidlz, 
+    d2lnLidlz2,  
+    *sum, 
+    diagptable0[8] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable1[8] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable2[8] __attribute__ ((aligned (BYTE_ALIGNMENT)));    
+    
+  int     
+    i, 
+    j;
+  
+  for(i = 0; i < 4; i++)
+    {
+      ki = gammaRates[i];
+      kisqr = ki * ki;
+      
+      diagptable0[i * 2] = 1.0;
+      diagptable1[i * 2] = 0.0;
+      diagptable2[i * 2] = 0.0;
+     
+      diagptable0[i * 2 + 1] = exp(EIGN[1] * ki * lz);
+      diagptable1[i * 2 + 1] = EIGN[1] * ki;
+      diagptable2[i * 2 + 1] = EIGN[1] * EIGN[1] * kisqr;    
+    }
+
+  for (i = 0; i < upper; i++)
+    { 
+      __m128d a0 = _mm_setzero_pd();
+      __m128d a1 = _mm_setzero_pd();
+      __m128d a2 = _mm_setzero_pd();
+
+      sum = &sumtable[i * 8];         
+
+      for(j = 0; j < 4; j++)
+        {                       
+          double           
+            *d0 = &diagptable0[j * 2],
+            *d1 = &diagptable1[j * 2],
+            *d2 = &diagptable2[j * 2];
+                         
+          __m128d tmpv = _mm_mul_pd(_mm_load_pd(d0), _mm_load_pd(&sum[j * 2]));
+          a0 = _mm_add_pd(a0, tmpv);
+          a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(d1)));
+          a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(d2)));
+                          
+        }
+
+      a0 = _mm_hadd_pd(a0, a0);
+      a1 = _mm_hadd_pd(a1, a1);
+      a2 = _mm_hadd_pd(a2, a2);
+
+      _mm_storel_pd(&inv_Li, a0);     
+      _mm_storel_pd(&dlnLidlz, a1);
+      _mm_storel_pd(&d2lnLidlz2, a2); 
+
+      inv_Li = 1.0 / fabs(inv_Li);
+     
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;     
+
+      dlnLdlz   += wrptr[i] * dlnLidlz;
+      d2lnLdlz2 += wrptr[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+ 
+  *d1   = dlnLdlz;
+  *d2 = d2lnLdlz2; 
+}
+
+
+static void sumGAMMA_BINARY(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+                            unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  double 
+    *x1, 
+    *x2, 
+    *sum;
+  
+  int 
+    i, 
+    j;
+
+  /* C-OPT once again switch over possible configurations at inner node */
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      /* C-OPT main for loop over alignment length */
+      for (i = 0; i < n; i++)
+        {
+          x1 = &(tipVector[2 * tipX1[i]]);
+          x2 = &(tipVector[2 * tipX2[i]]);
+          sum = &sumtable[i * 8];
+
+          for(j = 0; j < 4; j++)
+            _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));         
+        }
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+        {
+          x1  = &(tipVector[2 * tipX1[i]]);
+          x2  = &x2_start[8 * i];
+          sum = &sumtable[8 * i];
+
+          for(j = 0; j < 4; j++)
+            _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[j * 2] )));
+        }
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+        {
+          x1  = &x1_start[8 * i];
+          x2  = &x2_start[8 * i];
+          sum = &sumtable[8 * i];
+
+          for(j = 0; j < 4; j++)
+            _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[j * 2] ), _mm_load_pd( &x2[j * 2] )));
+        }
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumCAT_BINARY(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+                          unsigned char *tipX1, unsigned char *tipX2, int n)
+
+{
+  int 
+    i;
+  
+  double 
+    *x1, 
+    *x2;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+        {
+          x1 = &(tipVector[2 * tipX1[i]]);
+          x2 = &(tipVector[2 * tipX2[i]]);
+
+          _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));
+        }
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+        {
+          x1 = &(tipVector[2 * tipX1[i]]);
+          x2 = &x2_start[2 * i];
+
+          _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));  
+        }
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+        {
+          x1 = &x1_start[2 * i];
+          x2 = &x2_start[2 * i];
+
+          _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));   
+        }
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+/*** binary end ****/
+
+
+static void sumCAT_SAVE(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+			unsigned char *tipX1, unsigned char *tipX2, int n, double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  int i;
+  double 
+    *x1, 
+    *x2,    
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  if(isGap(x2_gap, i))
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;
+	      x2_ptr += 4;
+	    }
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  if(isGap(x1_gap, i))
+	    x1 = x1_gapColumn;
+	  else
+	    {
+	      x1 = x1_ptr;
+	      x1_ptr += 4;
+	    }
+
+	  if(isGap(x2_gap, i))
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2 = x2_ptr;
+	      x2_ptr += 4;
+	    }
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+
+	}    
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumGAMMA_GAPPED_SAVE(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+				 unsigned char *tipX1, unsigned char *tipX2, int n, 
+				 double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  double 
+    *x1, 
+    *x2, 
+    *sum,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start;
+  
+  int i, j, k; 
+
+  switch(tipCase)
+    {
+    case TIP_TIP:     
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+	  sum = &sumtable[i * 16];
+
+	  for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[k] )));
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1  = &(tipVector[4 * tipX1[i]]);
+	  
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2  = x2_ptr;
+	      x2_ptr += 16;
+	    }
+	  
+	  sum = &sumtable[16 * i];
+
+	  for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[j * 4 + k] )));
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  if(x1_gap[i / 32] & mask32[i % 32])
+	    x1 = x1_gapColumn;
+	  else
+	    {
+	      x1  = x1_ptr;
+	      x1_ptr += 16;
+	    }
+	  
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2 = x2_gapColumn;
+	  else
+	    {
+	      x2  = x2_ptr;
+	      x2_ptr += 16;
+	    }
+
+	  sum = &sumtable[16 * i];
+	  
+
+	   for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[j * 4 + k] ), _mm_load_pd( &x2[j * 4 + k] )));
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+
+
+static void sumGAMMA(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+		     unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  double *x1, *x2, *sum;
+  int i, j, k;
+
+  /* C-OPT once again switch over possible configurations at inner node */
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      /* C-OPT main for loop over alignment length */
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+	  sum = &sumtable[i * 16];
+
+	  for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[k] )));
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1  = &(tipVector[4 * tipX1[i]]);
+	  x2  = &x2_start[16 * i];
+	  sum = &sumtable[16 * i];
+
+	  for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[j * 4 + k] )));
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1  = &x1_start[16 * i];
+	  x2  = &x2_start[16 * i];
+	  sum = &sumtable[16 * i];
+
+	   for(j = 0; j < 4; j++)	    
+	    for(k = 0; k < 4; k+=2)
+	      _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[j * 4 + k] ), _mm_load_pd( &x2[j * 4 + k] )));
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumCAT(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+		   unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  int i;
+  double 
+    *x1, 
+    *x2;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &x2_start[4 * i];
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &x1_start[4 * i];
+	  x2 = &x2_start[4 * i];
+
+	  _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+	  _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+
+	}    
+      break;
+    default:
+      assert(0);
+    }
+}
+static void sumGAMMAPROT_GAPPED_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+				     unsigned char *tipX1, unsigned char *tipX2, int n, 
+				     double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  int i, l, k;
+  double 
+    *left, 
+    *right, 
+    *sum,
+    *x1_ptr = x1,
+    *x2_ptr = x2,
+    *x1v,
+    *x2v;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for(i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[20 * tipX1[i]]);
+	  right = &(tipVector[20 * tipX2[i]]);
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      sum = &sumtable[i * 80 + l * 20];
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+
+	    }
+	}
+      break;
+    case TIP_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  left = &(tipVector[20 * tipX1[i]]);
+	   
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2v = x2_gapColumn;
+	  else
+	    {
+	      x2v = x2_ptr;
+	      x2_ptr += 80;
+	    }
+	  
+	  for(l = 0; l < 4; l++)
+	    {
+	      right = &(x2v[l * 20]);
+	      sum = &sumtable[i * 80 + l * 20];
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+	    }
+	}
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  if(x1_gap[i / 32] & mask32[i % 32])
+	    x1v = x1_gapColumn;
+	  else
+	    {
+	      x1v  = x1_ptr;
+	      x1_ptr += 80;
+	    }
+
+	  if(x2_gap[i / 32] & mask32[i % 32])
+	    x2v = x2_gapColumn;
+	  else
+	    {
+	      x2v  = x2_ptr;
+	      x2_ptr += 80;
+	    }
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      left  = &(x1v[l * 20]);
+	      right = &(x2v[l * 20]);
+	      sum   = &(sumtable[i * 80 + l * 20]);
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumGAMMAPROT_LG4(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector[4],
+			     unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  int i, l, k;
+  double *left, *right, *sum;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for(i = 0; i < n; i++)
+	{	  
+	  for(l = 0; l < 4; l++)
+	    {
+	      left  = &(tipVector[l][20 * tipX1[i]]);
+	      right = &(tipVector[l][20 * tipX2[i]]);
+
+	      sum = &sumtable[i * 80 + l * 20];
+#ifdef __SIM_SSE3
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+#else
+	      for(k = 0; k < 20; k++)
+		sum[k] = left[k] * right[k];
+#endif
+	    }
+	}
+      break;
+    case TIP_INNER:
+      for(i = 0; i < n; i++)
+	{
+	 
+
+	  for(l = 0; l < 4; l++)
+	    { 
+	      left = &(tipVector[l][20 * tipX1[i]]);
+	      right = &(x2[80 * i + l * 20]);
+	      sum = &sumtable[i * 80 + l * 20];
+#ifdef __SIM_SSE3
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+#else
+	      for(k = 0; k < 20; k++)
+		sum[k] = left[k] * right[k];
+#endif
+	    }
+	}
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  for(l = 0; l < 4; l++)
+	    {
+	      left  = &(x1[80 * i + l * 20]);
+	      right = &(x2[80 * i + l * 20]);
+	      sum   = &(sumtable[i * 80 + l * 20]);
+
+#ifdef __SIM_SSE3
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+#else
+	      for(k = 0; k < 20; k++)
+		sum[k] = left[k] * right[k];
+#endif
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumGAMMAPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			 unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  int i, l, k;
+  double *left, *right, *sum;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for(i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[20 * tipX1[i]]);
+	  right = &(tipVector[20 * tipX2[i]]);
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      sum = &sumtable[i * 80 + l * 20];
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+
+	    }
+	}
+      break;
+    case TIP_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  left = &(tipVector[20 * tipX1[i]]);
+
+	  for(l = 0; l < 4; l++)
+	    {
+	      right = &(x2[80 * i + l * 20]);
+	      sum = &sumtable[i * 80 + l * 20];
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+
+	    }
+	}
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  for(l = 0; l < 4; l++)
+	    {
+	      left  = &(x1[80 * i + l * 20]);
+	      right = &(x2[80 * i + l * 20]);
+	      sum   = &(sumtable[i * 80 + l * 20]);
+
+
+	      for(k = 0; k < 20; k+=2)
+		{
+		  __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+		  
+		  _mm_store_pd(&sum[k], sumv);		 
+		}
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumGTRCATPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			  unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+  int i, l;
+  double *sum, *left, *right;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[20 * tipX1[i]]);
+	  right = &(tipVector[20 * tipX2[i]]);
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  left = &(tipVector[20 * tipX1[i]]);
+	  right = &x2[20 * i];
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  left  = &x1[20 * i];
+	  right = &x2[20 * i];
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+static void sumGTRCATPROT_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+			       unsigned char *tipX1, unsigned char *tipX2, int n, 
+			       double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+  int 
+    i, 
+    l;
+  
+  double 
+    *sum, 
+    *left, 
+    *right,
+    *left_ptr = x1,
+    *right_ptr = x2;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+	{
+	  left  = &(tipVector[20 * tipX1[i]]);
+	  right = &(tipVector[20 * tipX2[i]]);
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+
+	}
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  left = &(tipVector[20 * tipX1[i]]);       
+	   
+	  if(isGap(x2_gap, i))
+	    right = x2_gapColumn;
+	  else
+	    {
+	      right = right_ptr;
+	      right_ptr += 20;
+	    }
+	  
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{	 
+	  if(isGap(x1_gap, i))
+	    left = x1_gapColumn;
+	  else
+	    {
+	      left = left_ptr;
+	      left_ptr += 20;
+	    }
+
+	  if(isGap(x2_gap, i))
+	    right = x2_gapColumn;
+	  else
+	    {
+	      right = right_ptr;
+	      right_ptr += 20;
+	    }
+
+	  sum = &sumtable[20 * i];
+
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+	      
+	      _mm_store_pd(&sum[l], sumv);		 
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+}
+
+static void coreGTRGAMMA(const int upper, double *sumtable,
+			 volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+  double 
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0,
+    ki, 
+    kisqr,  
+    inv_Li, 
+    dlnLidlz, 
+    d2lnLidlz2,  
+    *sum, 
+    diagptable0[16] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable1[16] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable2[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));    
+    
+  int     
+    i, 
+    j, 
+    l;
+  
+  for(i = 0; i < 4; i++)
+    {
+      ki = gammaRates[i];
+      kisqr = ki * ki;
+      
+      diagptable0[i * 4] = 1.0;
+      diagptable1[i * 4] = 0.0;
+      diagptable2[i * 4] = 0.0;
+
+      for(l = 1; l < 4; l++)
+	{
+	  diagptable0[i * 4 + l] = EXP(EIGN[l] * ki * lz);
+	  diagptable1[i * 4 + l] = EIGN[l] * ki;
+	  diagptable2[i * 4 + l] = EIGN[l] * EIGN[l] * kisqr;
+	}
+    }
+
+  for (i = 0; i < upper; i++)
+    { 
+      __m128d a0 = _mm_setzero_pd();
+      __m128d a1 = _mm_setzero_pd();
+      __m128d a2 = _mm_setzero_pd();
+      
+      
+      
+      sum = &sumtable[i * 16];         
+
+      for(j = 0; j < 4; j++)
+	{	 	  	
+	  double 	   
+	    *d0 = &diagptable0[j * 4],
+	    *d1 = &diagptable1[j * 4],
+	    *d2 = &diagptable2[j * 4];
+  	 	 
+	  for(l = 0; l < 4; l+=2)
+	    {
+	      __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 4 + l]));
+	      a0 = _mm_add_pd(a0, tmpv);
+	      a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+	      a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+	    }	 	  
+	}
+
+      a0 = _mm_hadd_pd(a0, a0);
+      a1 = _mm_hadd_pd(a1, a1);
+      a2 = _mm_hadd_pd(a2, a2);
+
+      _mm_storel_pd(&inv_Li, a0);     
+      _mm_storel_pd(&dlnLidlz, a1);
+      _mm_storel_pd(&d2lnLidlz2, a2); 
+
+      inv_Li = 1.0 / FABS(inv_Li);
+     
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;     
+
+      dlnLdlz   += wgt[i] * dlnLidlz;
+      d2lnLdlz2 += wgt[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+ 
+  *ext_dlnLdlz   = dlnLdlz;
+  *ext_d2lnLdlz2 = d2lnLdlz2; 
+}
+
+
+
+static void coreGTRCAT(int upper, int numberOfCategories, double *sum,
+			   volatile double *d1, volatile double *d2, int *wgt,
+			   double *rptr, double *EIGN, int *cptr, double lz)
+{
+  int i;
+  double
+    *d, *d_start,
+    inv_Li, dlnLidlz, d2lnLidlz2,
+    dlnLdlz = 0.0,
+    d2lnLdlz2 = 0.0;
+  double e1[4] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double e2[4] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double dd1, dd2, dd3;
+
+  __m128d
+    e1v[2],
+    e2v[2];
+
+  e1[0] = 0.0;
+  e2[0] = 0.0;
+  e1[1] = EIGN[1];
+  e2[1] = EIGN[1] * EIGN[1];
+  e1[2] = EIGN[2];
+  e2[2] = EIGN[2] * EIGN[2];
+  e1[3] = EIGN[3];
+  e2[3] = EIGN[3] * EIGN[3];
+
+  e1v[0]= _mm_load_pd(&e1[0]);
+  e1v[1]= _mm_load_pd(&e1[2]);
+
+  e2v[0]= _mm_load_pd(&e2[0]);
+  e2v[1]= _mm_load_pd(&e2[2]);
+
+  d = d_start = (double *)malloc_aligned(numberOfCategories * 4 * sizeof(double));
+
+  dd1 = EIGN[1] * lz;
+  dd2 = EIGN[2] * lz;
+  dd3 = EIGN[3] * lz;
+
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      d[i * 4 + 0] = 1.0;
+      d[i * 4 + 1] = EXP(dd1 * rptr[i]);
+      d[i * 4 + 2] = EXP(dd2 * rptr[i]);
+      d[i * 4 + 3] = EXP(dd3 * rptr[i]);
+    }
+
+  for (i = 0; i < upper; i++)
+    {
+      double 
+	*s = &sum[4 * i];
+      
+      double 
+	r = rptr[cptr[i]],
+	wr1 = r * wgt[i],
+	wr2 = r * r * wgt[i];
+
+      d = &d_start[4 * cptr[i]];  
+      
+      __m128d tmp_0v =_mm_mul_pd(_mm_load_pd(&d[0]),_mm_load_pd(&s[0]));
+      __m128d tmp_1v =_mm_mul_pd(_mm_load_pd(&d[2]),_mm_load_pd(&s[2]));
+
+      __m128d inv_Liv    = _mm_add_pd(tmp_0v, tmp_1v);      
+            	  
+      __m128d dlnLidlzv   = _mm_add_pd(_mm_mul_pd(tmp_0v, e1v[0]), _mm_mul_pd(tmp_1v, e1v[1]));	  
+      __m128d d2lnLidlz2v = _mm_add_pd(_mm_mul_pd(tmp_0v, e2v[0]), _mm_mul_pd(tmp_1v, e2v[1]));
+
+
+      inv_Liv   = _mm_hadd_pd(inv_Liv, inv_Liv);
+      dlnLidlzv = _mm_hadd_pd(dlnLidlzv, dlnLidlzv);
+      d2lnLidlz2v = _mm_hadd_pd(d2lnLidlz2v, d2lnLidlz2v);                 
+ 
+      _mm_storel_pd(&inv_Li, inv_Liv);     
+      _mm_storel_pd(&dlnLidlz, dlnLidlzv);                 
+      _mm_storel_pd(&d2lnLidlz2, d2lnLidlz2v);      
+
+      inv_Li = 1.0/FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz   += wr1 * dlnLidlz;
+      d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *d1 = dlnLdlz;
+  *d2 = d2lnLdlz2;
+
+  free(d_start);
+}
+
+
+static void coreGTRGAMMAPROT_LG4(double *gammaRates, double *EIGN[4], double *sumtable, int upper, int *wrptr,
+				 volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double lz, double *weights)
+{
+  double  *sum, 
+    diagptable0[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable1[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable2[80] __attribute__ ((aligned (BYTE_ALIGNMENT)));    
+  int     i, j, l;
+  double  dlnLdlz = 0;
+  double d2lnLdlz2 = 0;
+  double ki, kisqr; 
+  
+
+  for(i = 0; i < 4; i++)
+    {
+      ki = gammaRates[i];
+      kisqr = ki * ki;
+      
+      diagptable0[i * 20] = 1.0;
+      diagptable1[i * 20] = 0.0;
+      diagptable2[i * 20] = 0.0;
+
+      for(l = 1; l < 20; l++)
+	{
+	  diagptable0[i * 20 + l] = EXP(EIGN[i][l] * ki * lz);
+	  diagptable1[i * 20 + l] = EIGN[i][l] * ki;
+	  diagptable2[i * 20 + l] = EIGN[i][l] * EIGN[i][l] * kisqr;
+	}
+    }
+
+  for (i = 0; i < upper; i++)
+    { 
+      double 	
+	inv_Li = 0.0, 
+	dlnLidlz = 0.0, 
+	d2lnLidlz2 = 0.0;
+      
+
+      sum = &sumtable[i * 80];         
+
+      for(j = 0; j < 4; j++)
+	{	 	  	
+	  double 
+	    l0,
+	    l1,
+	    l2,
+	    *d0 = &diagptable0[j * 20],
+	    *d1 = &diagptable1[j * 20],
+	    *d2 = &diagptable2[j * 20];
+
+	  __m128d 
+	    a0 = _mm_setzero_pd(),
+	    a1 = _mm_setzero_pd(),
+	    a2 = _mm_setzero_pd();
+	  
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 20 +l]));
+	      a0 = _mm_add_pd(a0, tmpv);
+	      a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+	      a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+	    }
+
+	  a0 = _mm_hadd_pd(a0, a0);
+	  a1 = _mm_hadd_pd(a1, a1);
+	  a2 = _mm_hadd_pd(a2, a2);
+
+	  _mm_storel_pd(&l0, a0);
+	  _mm_storel_pd(&l1, a1);
+	  _mm_storel_pd(&l2, a2);
+	  
+	  inv_Li     += weights[j] * l0;
+	  dlnLidlz   += weights[j] * l1;
+	  d2lnLidlz2 += weights[j] * l2;
+	}
+
+     
+
+      inv_Li = 1.0 / FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz   += wrptr[i] * dlnLidlz;
+      d2lnLdlz2 += wrptr[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *ext_dlnLdlz   = dlnLdlz;
+  *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+static void coreGTRGAMMAPROT(double *gammaRates, double *EIGN, double *sumtable, int upper, int *wgt,
+			      volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double lz)
+{
+  double  *sum, 
+    diagptable0[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable1[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+    diagptable2[80] __attribute__ ((aligned (BYTE_ALIGNMENT)));    
+  int     i, j, l;
+  double  dlnLdlz = 0;
+  double d2lnLdlz2 = 0;
+  double ki, kisqr; 
+  double inv_Li, dlnLidlz, d2lnLidlz2;
+
+  for(i = 0; i < 4; i++)
+    {
+      ki = gammaRates[i];
+      kisqr = ki * ki;
+      
+      diagptable0[i * 20] = 1.0;
+      diagptable1[i * 20] = 0.0;
+      diagptable2[i * 20] = 0.0;
+
+      for(l = 1; l < 20; l++)
+	{
+	  diagptable0[i * 20 + l] = EXP(EIGN[l] * ki * lz);
+	  diagptable1[i * 20 + l] = EIGN[l] * ki;
+	  diagptable2[i * 20 + l] = EIGN[l] * EIGN[l] * kisqr;
+	}
+    }
+
+  for (i = 0; i < upper; i++)
+    { 
+      __m128d a0 = _mm_setzero_pd();
+      __m128d a1 = _mm_setzero_pd();
+      __m128d a2 = _mm_setzero_pd();
+
+     
+      sum = &sumtable[i * 80];         
+
+      for(j = 0; j < 4; j++)
+	{	 	  	
+	  double 	   
+	    *d0 = &diagptable0[j * 20],
+	    *d1 = &diagptable1[j * 20],
+	    *d2 = &diagptable2[j * 20];
+  	 	 
+	  for(l = 0; l < 20; l+=2)
+	    {
+	      __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 20 +l]));
+	      a0 = _mm_add_pd(a0, tmpv);
+	      a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+	      a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+	    }	 	  
+	}
+
+      a0 = _mm_hadd_pd(a0, a0);
+      a1 = _mm_hadd_pd(a1, a1);
+      a2 = _mm_hadd_pd(a2, a2);
+
+      _mm_storel_pd(&inv_Li, a0);
+      _mm_storel_pd(&dlnLidlz, a1);
+      _mm_storel_pd(&d2lnLidlz2, a2);
+
+      inv_Li = 1.0 / FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz   += wgt[i] * dlnLidlz;
+      d2lnLdlz2 += wgt[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *ext_dlnLdlz   = dlnLdlz;
+  *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+
+static void coreGTRCATPROT(double *EIGN, double lz, int numberOfCategories, double *rptr, int *cptr, int upper,
+			   int *wgt,  volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *sumtable)
+{
+  int i, l;
+  double *d1, *d_start, *sum;
+  double 
+    e[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+    s[20] __attribute__ ((aligned (BYTE_ALIGNMENT))), 
+    dd[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double inv_Li, dlnLidlz, d2lnLidlz2;
+  double  dlnLdlz = 0.0;
+  double  d2lnLdlz2 = 0.0;
+
+  d1 = d_start = (double *)malloc_aligned(numberOfCategories * 20 * sizeof(double));
+
+  e[0] = 0.0;
+  s[0] = 0.0; 
+
+  for(l = 1; l < 20; l++)
+    {
+      e[l]  = EIGN[l] * EIGN[l];
+      s[l]  = EIGN[l];
+      dd[l] = s[l] * lz;
+    }
+
+  for(i = 0; i < numberOfCategories; i++)
+    {      
+      d1[20 * i] = 1.0;
+      for(l = 1; l < 20; l++)
+	d1[20 * i + l] = EXP(dd[l] * rptr[i]);
+    }
+
+  for (i = 0; i < upper; i++)
+    {
+      __m128d a0 = _mm_setzero_pd();
+      __m128d a1 = _mm_setzero_pd();
+      __m128d a2 = _mm_setzero_pd();
+
+       double 
+	r = rptr[cptr[i]],
+	wr1 = r * wgt[i],
+	wr2 = r * r * wgt[i];
+
+      d1 = &d_start[20 * cptr[i]];
+      sum = &sumtable[20 * i];
+          
+      for(l = 0; l < 20; l+=2)
+	{	  
+	  __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d1[l]), _mm_load_pd(&sum[l]));
+	  
+	  a0 = _mm_add_pd(a0, tmpv);
+	  __m128d sv = _mm_load_pd(&s[l]);	  
+	  
+	  a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, sv));
+	  __m128d ev = _mm_load_pd(&e[l]);	  
+
+	  a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, ev));
+	}
+
+      a0 = _mm_hadd_pd(a0, a0);
+      a1 = _mm_hadd_pd(a1, a1);
+      a2 = _mm_hadd_pd(a2, a2);
+
+      _mm_storel_pd(&inv_Li, a0);     
+      _mm_storel_pd(&dlnLidlz, a1);                 
+      _mm_storel_pd(&d2lnLidlz2, a2);
+      
+      inv_Li = 1.0/FABS(inv_Li);
+
+      dlnLidlz   *= inv_Li;
+      d2lnLidlz2 *= inv_Li;
+
+      dlnLdlz  += wr1 * dlnLidlz;
+      d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+    }
+
+  *ext_dlnLdlz   = dlnLdlz;
+  *ext_d2lnLdlz2 = d2lnLdlz2;
+
+  free(d_start);
+}
+
+
+
+
+#endif
+
+
+
diff --git a/examl/mic_native.h b/examl/mic_native.h
new file mode 100644
index 0000000..05b4775
--- /dev/null
+++ b/examl/mic_native.h
@@ -0,0 +1,96 @@
+#ifndef MIC_NATIVE_H_
+#define MIC_NATIVE_H_
+
+//#define VECTOR_PADDING 8
+//#define GET_PADDED_WIDTH(w) w % VECTOR_PADDING == 0 ? w : w + (VECTOR_PADDING - (w % VECTOR_PADDING))
+
+// general functions
+void updateModel_MIC(pInfo* part);
+
+// DNA data
+
+void makeP_DNA_MIC(double z1, double z2, double *rptr, double *EI,  double *EIGN, int numberOfCategories,
+               double *left, double *right, boolean saveMem, int maxCat);
+
+void precomputeTips_DNA_MIC(int tipCase, double *tipVector, double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories);
+
+void newviewGTRGAMMA_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight);
+
+double evaluateGAMMA_MIC(int *wptr,
+                 double *x1_start, double *x2_start,
+                 double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable);
+
+void sumGAMMA_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMA_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+
+void sumcoreGTRGAMMA_MIC(int tipCase, double *x1_start, double *x2_start, double *tipVector,
+        unsigned char *tipX1, unsigned char *tipX2, const int n,
+        volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt);
+
+// protein data - single matrix
+
+void makeP_PROT_MIC(double z1, double z2, double *rptr, double *EI,  double *EIGN, int numberOfCategories,
+               double *left, double *right, boolean saveMem, int maxCat);
+
+void precomputeTips_PROT_MIC(int tipCase, double *tipVector, double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories);
+
+void newviewGTRGAMMAPROT_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight);
+
+double evaluateGAMMAPROT_MIC(int *wptr,
+                 double *x1_start, double *x2_start,
+                 double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable);
+
+void sumGAMMAPROT_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMAPROT_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+
+
+// protein data - LG4
+
+void updateModel_LG4_MIC(pInfo* part);
+
+void makeP_PROT_LG4_MIC(double z1, double z2, double *rptr, double *EI[4],  double *EIGN[4], int numberOfCategories, double *left, double *right);
+
+void precomputeTips_PROT_LG4_MIC(int tipCase, double *tipVector[4], double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories);
+
+void newviewGTRGAMMAPROT_LG4_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight);
+
+double evaluateGAMMAPROT_LG4_MIC(int *wptr,
+                 double *x1_start, double *x2_start,
+                 double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable, double* weights);
+
+void sumGAMMAPROT_LG4_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMAPROT_LG4_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN[4], double *gammaRates, double lz, int *wrptr, double* weights);
+
+
+
+#endif /* MIC_NATIVE_H_ */
diff --git a/examl/mic_native_aa.c b/examl/mic_native_aa.c
new file mode 100644
index 0000000..6e64105
--- /dev/null
+++ b/examl/mic_native_aa.c
@@ -0,0 +1,1323 @@
+#include <immintrin.h>
+#include <string.h>
+#include <math.h>
+
+#include "axml.h"
+#include "mic_native.h"
+
+static const int states = 20;
+static const int statesSquare = 20 * 20;
+static const int span = 20 * 4;
+static const int maxStateValue = 23;
+
+void makeP_PROT_MIC(double z1, double z2, double *rptr, double *EI,  double *EIGN, int numberOfCategories, double *left, double *right,
+               boolean saveMem, int maxCat)
+{
+  int
+    i,
+    j,
+    k,
+    span = states * numberOfCategories;
+
+  /* scratch space for pre-computed values that are re-used below */
+
+  double lz1[20] __attribute__((align(BYTE_ALIGNMENT)));
+  double lz2[20] __attribute__((align(BYTE_ALIGNMENT)));
+  double d1[20] __attribute__((align(BYTE_ALIGNMENT)));
+  double d2[20] __attribute__((align(BYTE_ALIGNMENT)));
+
+
+  /* multiply branch lengths with eigenvalues */
+  for(i = 1; i < states; i++)
+    {
+      lz1[i] = EIGN[i] * z1;
+      lz2[i] = EIGN[i] * z2;
+    }
+
+
+  /* loop over the rate categories: there are 4 for the GAMMA model and a
+     variable number for the CAT model */
+
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      /* exponentiate the rate multiplied by the branch */
+
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP(rptr[i] * lz1[j]);
+	  d2[j] = EXP(rptr[i] * lz2[j]);
+	}
+
+      /* now fill the P matrices for the two branch length values */
+
+      for(j = 0; j < states; j++)
+	{
+	  /* left and right are pre-allocated arrays */
+
+	  left[i * states + j] = 1.0;
+	  right[i * states + j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[k * span + i * states + j]  = d1[k] * EI[states * j + k];
+	      right[k * span + i * states + j] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+
+
+  /* if memory saving is enabled and we are using CAT, we need one additional P matrix
+     calculation for a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+  if(saveMem)
+    {
+      i = maxCat;
+
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP (lz1[j]);
+	  d2[j] = EXP (lz2[j]);
+	}
+
+      for(j = 0; j < states; j++)
+	{
+	  left[statesSquare * i  + states * j] = 1.0;
+	  right[statesSquare * i + states * j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[statesSquare * i + states * j + k]  = d1[k] * EI[states * j + k];
+	      right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+}
+
+void precomputeTips_PROT_MIC(int tipCase, double *tipVector, double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories)
+{
+  /* no precomputation needed if both children are inner nodes */
+  if (tipCase == INNER_INNER)
+    return;
+
+  const int
+    span 	= states * numberOfCategories,
+    umpSize 	= span * maxStateValue;
+
+  for(int k = 0; k < umpSize; ++k)
+  {
+    umpLeft[k] = 0.0;
+    umpRight[k] = 0.0;
+  }
+
+  for(int i = 0; i < maxStateValue; ++i)
+  {
+    for(int l = 0; l < states; ++l)
+    {
+	#pragma ivdep
+	#pragma vector aligned
+	for(int k = 0; k < span; ++k)
+	{
+	    umpLeft[span * i + k] +=  tipVector[i * states + l] * left[l * span + k];
+	    if (tipCase == TIP_TIP)
+	      umpRight[span * i + k] +=  tipVector[i * states + l] * right[l * span + k];
+	}
+    }
+  }
+}
+
+static inline void mic_fma4x80(const double* inv, double* outv, double* mulv)
+{
+    __mmask8 k1 = _mm512_int2mask(0x0F);
+    __mmask8 k2 = _mm512_int2mask(0xF0);
+    for(int l = 0; l < 80; l += 40)
+    {
+        __m512d t = _mm512_setzero_pd();
+
+        t = _mm512_extload_pd(&inv[l], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+        __m512d m = _mm512_load_pd(&mulv[l]);
+        __m512d acc = _mm512_load_pd(&outv[l]);
+        __m512d r = _mm512_fmadd_pd(t, m, acc);
+        _mm512_store_pd(&outv[l], r);
+
+        m = _mm512_load_pd(&mulv[l + 8]);
+        acc = _mm512_load_pd(&outv[l + 8]);
+        r = _mm512_fmadd_pd(t, m, acc);
+        _mm512_store_pd(&outv[l + 8], r);
+
+        t = _mm512_mask_extload_pd(t, k1, &inv[l], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+        t = _mm512_mask_extload_pd(t, k2, &inv[l+20], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+        m = _mm512_load_pd(&mulv[l + 16]);
+        acc = _mm512_load_pd(&outv[l + 16]);
+        r = _mm512_fmadd_pd(t, m, acc);
+        _mm512_store_pd(&outv[l + 16], r);
+
+        t = _mm512_extload_pd(&inv[l+20], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+        m = _mm512_load_pd(&mulv[l + 24]);
+        acc = _mm512_load_pd(&outv[l + 24]);
+        r = _mm512_fmadd_pd(t, m, acc);
+        _mm512_store_pd(&outv[l + 24], r);
+
+        m = _mm512_load_pd(&mulv[l + 32]);
+        acc = _mm512_load_pd(&outv[l + 32]);
+        r = _mm512_fmadd_pd(t, m, acc);
+        _mm512_store_pd(&outv[l + 32], r);
+    }
+}
+
+
+void newviewGTRGAMMAPROT_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight)
+{
+  __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+  __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+  __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+  int addScale = 0;
+
+  double
+    *aEV = extEV,
+    *aRight = right,
+    *aLeft = left,
+    *umpX1 = umpLeft,
+    *umpX2 = umpRight;
+
+  switch(tipCase)
+  {
+    case TIP_TIP:
+      {
+        /* multiply all possible tip state vectors with the respective P-matrices
+        */
+
+        for (int i = 0; i < n; i++)
+        {
+            const double *uX1 = &umpX1[span * tipX1[i]];
+            const double *uX2 = &umpX2[span * tipX2[i]];
+
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double* v3 = &x3[i * span];
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+                for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+        } // sites loop
+      }
+      break;
+    case TIP_INNER:
+      {
+        for (int i = 0; i < n; i++)
+        {
+            #pragma unroll(10)
+            for (int j = 0; j < span; j += 8)
+            {
+                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+//                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+            }
+
+            /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+            double* uX1 = &umpX1[span * tipX1[i]];
+            double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+            double* v3 = &(x3[span * i]);
+
+            const double* v2 = &(x2[span * i]);
+
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX2[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+		#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+            }
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+		#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+            t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax = _mm512_reduce_gmax_pd(t1);
+            for (int l = 8; l < span; l += 8)
+            {
+                __m512d t = _mm512_load_pd(&v3[l]);
+                t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+                double vmax2 = _mm512_reduce_gmax_pd(t);
+                vmax = MAX(vmax, vmax2);
+            }
+
+            if (vmax < minlikelihood)
+            {
+                #pragma vector aligned nontemporal
+                for(int l = 0; l < span; l++)
+                  v3[l] *= twotothe256;
+
+                addScale += wgt[i];
+            }
+
+        } // site loop
+
+      }
+      break;
+    case INNER_INNER:
+    {
+      /* same as above, without pre-computations */
+
+        for (int i = 0; i < n; i++)
+        {
+
+            #pragma unroll(10)
+            for (int j = 0; j < span; j += 8)
+            {
+                _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T1);
+                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+//                _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T0);
+//                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+            }
+
+
+            double uX1[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+            double* v3 = &(x3[span * i]);
+
+            const double* v1 = &(x1[span * i]);
+            const double* v2 = &(x2[span * i]);
+
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX1[l] = 0.;
+                uX2[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+		#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+                    _mm_prefetch((const char *)&aLeft[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&v1[k], uX1, &aLeft[k * span]);
+                mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+            }
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+		#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+            t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax = _mm512_reduce_gmax_pd(t1);
+            for (int l = 8; l < span; l += 8)
+            {
+                __m512d t = _mm512_load_pd(&v3[l]);
+                t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+                double vmax2 = _mm512_reduce_gmax_pd(t);
+                vmax = MAX(vmax, vmax2);
+            }
+
+            if (vmax < minlikelihood)
+            {
+                #pragma vector aligned nontemporal
+                for(int l = 0; l < span; l++)
+                  v3[l] *= twotothe256;
+
+                addScale += wgt[i];
+            }
+        }
+    } break;
+    default:
+//      assert(0);
+      break;
+  }
+
+  *scalerIncrement = addScale;
+
+}
+
+
+
+double evaluateGAMMAPROT_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable)
+{
+    double sum = 0.0;
+
+    /* the left node is a tip */
+    if(tipX1)
+      {
+	double
+	  *aTipVec = tipVector;
+
+	/* loop over the sites of this partition */
+	for (int i = 0; i < n; i++)
+	  {
+	    /* access pre-computed tip vector values via a lookup table */
+	    const double *x1 = &(aTipVec[span * tipX1[i]]);
+	    /* access the other (inner) node at the other end of the branch */
+	    const double *x2 = &(x2_start[span * i]);
+
+	    #pragma unroll(10)
+	    for (int k = 0; k < span; k += 8)
+	    {
+		    _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+		    _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+	    double term = 0.;
+
+	    #pragma ivdep
+	    #pragma vector aligned
+	    #pragma noprefetch x2
+	    for(int j = 0; j < span; j++) {
+	      term += x1[j] * x2[j] * diagptable[j];
+	    }
+
+	    term = log(0.25 * fabs(term));
+
+	    sum += wgt[i] * term;
+	  }
+      }
+    else
+      {
+	for (int i = 0; i < n; i++)
+	  {
+	    #pragma unroll(10)
+	    for (int k = 0; k < span; k += 8)
+	    {
+	      _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+	      _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+	    const double *x1 = &(x1_start[span * i]);
+	    const double *x2 = &(x2_start[span * i]);
+
+	    double term = 0.;
+
+	    #pragma ivdep
+	    #pragma vector aligned
+	    #pragma noprefetch x1 x2
+	    for(int j = 0; j < span; j++)
+	      term += x1[j] * x2[j] * diagptable[j];
+
+	    term = log(0.25 * fabs(term));
+
+	    sum += wgt[i] * term;
+	  }
+      }
+
+    return sum;
+}
+
+void sumGAMMAPROT_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+    double
+      *aTipVec = tipVector;
+
+    switch(tipCase)
+    {
+      case TIP_TIP:
+      {
+        for(int i = 0; i < n; i++)
+	  {
+	    const double *left  = &(aTipVec[span * tipX1[i]]);
+	    const double *right = &(aTipVec[span * tipX2[i]]);
+
+	    #pragma ivdep
+	    #pragma vector aligned nontemporal
+	    for(int l = 0; l < span; l++)
+	      {
+		sumtable[i * span + l] = left[l] * right[l];
+	      }
+	  }
+      } break;
+      case TIP_INNER:
+      {
+        for(int i = 0; i < n; i++)
+        {
+	  #pragma unroll(10)
+	  for (int k = 0; k < span; k += 8)
+	    {
+	      _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+          const double *left = &(aTipVec[span * tipX1[i]]);
+          const double *right = &(x2_start[span * i]);
+
+          #pragma ivdep
+          #pragma vector aligned nontemporal
+		  #pragma noprefetch right
+          for(int l = 0; l < span; l++)
+          {
+              sumtable[i * span + l] = left[l] * right[l];
+          }
+        }
+      } break;
+      case INNER_INNER:
+      {
+        for(int i = 0; i < n; i++)
+        {
+	    #pragma unroll(10)
+	    for (int k = 0; k < span; k += 8)
+	      {
+		_mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+		_mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+		_mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+		_mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	      }
+
+            const double *left  = &(x1_start[span * i]);
+            const double *right = &(x2_start[span * i]);
+
+            #pragma ivdep
+            #pragma vector aligned nontemporal
+	    #pragma noprefetch left right
+            for(int l = 0; l < span; l++)
+	      {
+		sumtable[i * span + l] = left[l] * right[l];
+	      }
+        }
+      } break;
+  //    default:
+  //      assert(0);
+    }
+}
+
+void coreGTRGAMMAPROT_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+    static const int states = 20;
+    static const int span = 20 * 4;
+
+    double diagptable0[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable1[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable2[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable01[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable02[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+    /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+    for(int i = 0; i < 4; i++)
+    {
+        const double ki = gammaRates[i];
+        const double kisqr = ki * ki;
+
+        diagptable0[i*states] = 1.;
+        diagptable1[i*states] = 0.;
+        diagptable2[i*states] = 0.;
+
+        for(int l = 1; l < states; l++)
+        {
+          diagptable0[i * states + l]  = exp(EIGN[l] * ki * lz);
+          diagptable1[i * states + l] = EIGN[l] * ki;
+          diagptable2[i * states + l] = EIGN[l] * EIGN[l] * kisqr;
+        }
+    }
+
+    #pragma ivdep
+    for(int i = 0; i < span; i++)
+    {
+        diagptable01[i] = diagptable0[i] * diagptable1[i];
+        diagptable02[i] = diagptable0[i] * diagptable2[i];
+    }
+
+    /* loop over sites in this partition */
+
+    const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+    double dlnLBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+    double d2lnLBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+    for (int j = 0; j < 8; ++j)
+    {
+        dlnLBuf[j] = 0.;
+        d2lnLBuf[j] = 0.;
+    }
+
+    __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+    for (int i = 0; i < aligned_width; i++)
+    {
+        /* access the array with pre-computed values */
+        const double *sum = &sumtable[i * span * 8];
+
+        /* initial per-site likelihood and 1st and 2nd derivatives */
+
+        double invBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+        double d1Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+        double d2Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+
+        __m512d invVec;
+        __m512d d1Vec;
+        __m512d d2Vec;
+        int mask = 0x01;
+
+        #pragma noprefetch sum
+        #pragma unroll(8)
+        for(int j = 0; j < 8; j++)
+        {
+	    #pragma unroll(10)
+	    for (int k = 0; k < span; k += 8)
+	      {
+		_mm_prefetch((const char *) &sum[span*(j+2) + k], _MM_HINT_T1);
+		_mm_prefetch((const char *) &sum[span*(j+1) + k], _MM_HINT_T0);
+	      }
+
+            __m512d inv_1 = _mm512_setzero_pd();
+            __m512d d1_1 = _mm512_setzero_pd();
+            __m512d d2_1 = _mm512_setzero_pd();
+
+            for (int offset = 0; offset < span; offset += 8)
+	      {
+		__m512d d0_1 = _mm512_load_pd(&diagptable0[offset]);
+		__m512d d01_1 = _mm512_load_pd(&diagptable01[offset]);
+		__m512d d02_1 = _mm512_load_pd(&diagptable02[offset]);
+		__m512d s_1 = _mm512_load_pd(&sum[j*span + offset]);
+
+		inv_1 = _mm512_fmadd_pd(d0_1, s_1, inv_1);
+		d1_1 = _mm512_fmadd_pd(d01_1, s_1, d1_1);
+		d2_1 = _mm512_fmadd_pd(d02_1, s_1, d2_1);
+	      }
+
+            __mmask8 k1 = _mm512_int2mask(mask);
+            mask <<= 1;
+
+            // reduce
+            inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_CDAB));
+            inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_BADC));
+            inv_1 = _mm512_add_pd (inv_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_1), _MM_PERM_BADC)));
+            invVec = _mm512_mask_mov_pd(invVec, k1, inv_1);
+
+            d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_CDAB));
+            d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_BADC));
+            d1_1 = _mm512_add_pd (d1_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_1), _MM_PERM_BADC)));
+            d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_1);
+
+            d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_CDAB));
+            d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_BADC));
+            d2_1 = _mm512_add_pd (d2_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_1), _MM_PERM_BADC)));
+            d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_1);
+        }
+
+        _mm512_store_pd(&invBuf[0], invVec);
+        _mm512_store_pd(&d1Buf[0], d1Vec);
+        _mm512_store_pd(&d2Buf[0], d2Vec);
+
+        #pragma ivdep
+        #pragma vector aligned
+        for (int j = 0; j < 8; ++j)
+	  {
+	    const double inv_Li = 1.0 / invBuf[j];
+
+	    const double d1 = d1Buf[j] * inv_Li;
+	    const double d2 = d2Buf[j] * inv_Li;
+
+	    dlnLBuf[j] += wgt[i * 8 + j] * d1;
+	    d2lnLBuf[j] += wgt[i * 8 + j] * (d2 - d1 * d1);
+	  }
+    } // site loop
+
+    double dlnLdlz = 0.;
+    double d2lnLdlz2 = 0.;
+    for (int j = 0; j < 8; ++j)
+      {
+	dlnLdlz += dlnLBuf[j];
+	d2lnLdlz2 += d2lnLBuf[j];
+      }
+
+    *ext_dlnLdlz   = dlnLdlz;
+    *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+/****
+ *       PROTEIN - LG4
+ */
+void updateModel_LG4_MIC(pInfo* part)
+{
+  double
+    **EV              = part->EV_LG4,
+    **tipVector       = part->tipVector_LG4,
+    *aEV              = part->mic_EV,
+    *aTipVector       = part->mic_tipVector;
+
+  const int
+    states = part->states,
+    span = 4 * states,
+    maxState = getUndetermined(part->dataType) + 1;
+
+  int
+    k, l;
+
+  #pragma ivdep
+  for (l = 0; l < 4 * states * states; ++l)
+    {
+      aEV[l] = EV[(l % span) / states][(l / span) * states + (l % states)];
+    }
+
+  for(int k = 0; k < maxState; k++)
+    {
+      for(int j = 0; j < 4; j++)
+      {
+	for(int l = 0; l < states; l++)
+	  {
+	    aTipVector[k*span + j*states + l] = tipVector[j][k*states + l];
+	  }
+      }
+    }
+}
+
+void makeP_PROT_LG4_MIC(double z1, double z2, double *rptr, double *EI[4],  double *EIGN[4], int numberOfCategories, double *left, double *right)
+{
+  int
+    i,
+    j,
+    k;
+
+  double
+    d1[64],
+    d2[64];
+
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP (rptr[i] * EIGN[i][j] * z1);
+	  d2[j] = EXP (rptr[i] * EIGN[i][j] * z2);
+	}
+
+      for(j = 0; j < states; j++)
+	{
+	  left[i * states + j] = 1.0;
+	  right[i * states + j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[k * span + i * states + j]  = d1[k] * EI[i][states * j + k];
+	      right[k * span + i * states + j] = d2[k] * EI[i][states * j + k];
+	    }
+	}
+    }
+}
+
+void precomputeTips_PROT_LG4_MIC(int tipCase, double *tipVector[4], double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories)
+{
+  /* no precomputation needed if both children are inner nodes */
+  if (tipCase == INNER_INNER)
+    return;
+
+  const int
+    span 	= states * numberOfCategories,
+    umpSize 	= span * maxStateValue;
+
+  for(int k = 0; k < umpSize; ++k)
+  {
+    umpLeft[k] = 0.0;
+    umpRight[k] = 0.0;
+  }
+
+  for(int i = 0; i < maxStateValue; ++i)
+  {
+    for(int l = 0; l < states; ++l)
+    {
+	#pragma ivdep
+	#pragma vector aligned
+	for(int k = 0; k < span; ++k)
+	{
+	    umpLeft[span * i + k] +=  tipVector[k/20][i * states + l] * left[l * span + k];
+	    if (tipCase == TIP_TIP)
+	      umpRight[span * i + k] +=  tipVector[k/20][i * states + l] * right[l * span + k];
+	}
+    }
+  }
+}
+
+void newviewGTRGAMMAPROT_LG4_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight)
+{
+  __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+  __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+  __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+  int addScale = 0;
+
+  /* we assume that P-matrix and eigenvectors are in correct layout already */
+  double
+    *aEV = extEV,
+    *aRight = right,
+    *aLeft = left,
+    *umpX1 = umpLeft,
+    *umpX2 = umpRight;
+
+  switch(tipCase)
+  {
+    case TIP_TIP:
+      {
+        for (int i = 0; i < n; i++)
+        {
+            const double *uX1 = &umpX1[span * tipX1[i]];
+            const double *uX2 = &umpX2[span * tipX2[i]];
+
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double* v3 = &x3[i * span];
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+                for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+        } // sites loop
+      }
+      break;
+    case TIP_INNER:
+      {
+        for (int i = 0; i < n; i++)
+        {
+            #pragma unroll(10)
+            for (int j = 0; j < span; j += 8)
+            {
+                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+//                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+            }
+
+            /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+            double* uX1 = &umpX1[span * tipX1[i]];
+            double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+            double* v3 = &(x3[span * i]);
+
+            const double* v2 = &(x2[span * i]);
+
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX2[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+				#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+            }
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+		#pragma unroll(10)
+            	for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+            t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax = _mm512_reduce_gmax_pd(t1);
+            for (int l = 8; l < span; l += 8)
+            {
+                __m512d t = _mm512_load_pd(&v3[l]);
+                t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+                double vmax2 = _mm512_reduce_gmax_pd(t);
+                vmax = MAX(vmax, vmax2);
+            }
+
+            if (vmax < minlikelihood)
+            {
+                #pragma vector aligned nontemporal
+                for(int l = 0; l < span; l++)
+                  v3[l] *= twotothe256;
+
+                addScale += wgt[i];
+            }
+
+        } // site loop
+
+      }
+      break;
+    case INNER_INNER:
+    {
+        for (int i = 0; i < n; i++)
+        {
+
+            #pragma unroll(10)
+            for (int j = 0; j < span; j += 8)
+            {
+                _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T1);
+                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+//                _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T0);
+//                _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+            }
+
+
+            double uX1[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+            double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+            double* v3 = &(x3[span * i]);
+
+            const double* v1 = &(x1[span * i]);
+            const double* v2 = &(x2[span * i]);
+
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX1[l] = 0.;
+                uX2[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+                #pragma unroll(10)
+                for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+                    _mm_prefetch((const char *)&aLeft[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&v1[k], uX1, &aLeft[k * span]);
+                mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+            }
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < span; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+                v3[l] = 0.;
+            }
+
+            for(int k = 0; k < states; ++k)
+            {
+                #pragma unroll(10)
+                for (int j = 0; j < span; j += 8)
+                {
+                    _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+                }
+
+                mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+            }
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+            t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax = _mm512_reduce_gmax_pd(t1);
+            for (int l = 8; l < span; l += 8)
+            {
+                __m512d t = _mm512_load_pd(&v3[l]);
+                t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+                double vmax2 = _mm512_reduce_gmax_pd(t);
+                vmax = MAX(vmax, vmax2);
+            }
+
+            if (vmax < minlikelihood)
+            {
+                #pragma vector aligned nontemporal
+                for(int l = 0; l < span; l++)
+                  v3[l] *= twotothe256;
+
+                addScale += wgt[i];
+            }
+        }
+    } break;
+    default:
+//      assert(0);
+      break;
+  }
+
+  *scalerIncrement = addScale;
+
+}
+
+
+
+double evaluateGAMMAPROT_LG4_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable, double *weights)
+{
+    double wtable[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+    /* pre-multiply diagptable entries with the corresponding weights */
+    for(int j = 0; j < 4; j++)
+      for(int k = 0; k < states; k++)
+	{
+	  wtable[j * states + k] = diagptable[j * states + k] * weights[j];
+	}
+
+    double sum = 0.0;
+
+    /* the left node is a tip */
+    if(tipX1)
+    {
+        /* loop over the sites of this partition */
+        for (int i = 0; i < n; i++)
+        {
+          const double
+    	    *aTipVec = tipVector;
+
+	  /* access pre-computed tip vector values via a lookup table */
+	  const double *x1 = &(aTipVec[span * tipX1[i]]);
+	  /* access the other (inner) node at the other end of the branch */
+	  const double *x2 = &(x2_start[span * i]);
+
+	  #pragma unroll(10)
+	  for (int k = 0; k < span; k += 8)
+	    {
+	      _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+	  double term = 0.;
+
+	  #pragma ivdep
+	  #pragma vector aligned
+	  #pragma noprefetch x2
+	  for(int j = 0; j < span; j++)
+	    {
+	      term += x1[j] * x2[j] * wtable[j];
+	    }
+
+	  term = log(fabs(term));
+
+	  sum += wgt[i] * term;
+        }
+    }
+    else
+    {
+      for (int i = 0; i < n; i++)
+	{
+	  #pragma unroll(10)
+	  for (int k = 0; k < span; k += 8)
+	    {
+	      _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+	      _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+	  const double *x1 = &(x1_start[span * i]);
+	  const double *x2 = &(x2_start[span * i]);
+
+	  double term = 0.;
+
+	  #pragma ivdep
+	  #pragma vector aligned
+	  #pragma noprefetch x1 x2
+	  for(int j = 0; j < span; j++)
+	    term += x1[j] * x2[j] * wtable[j];
+
+	  term = log(fabs(term));
+
+	  sum += wgt[i] * term;
+	}
+    }
+
+    return sum;
+}
+
+void sumGAMMAPROT_LG4_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+    const double
+      *aTipVec = tipVector;
+
+    switch(tipCase)
+    {
+      case TIP_TIP:
+      {
+        for(int i = 0; i < n; i++)
+        {
+            const double *left  = &(aTipVec[span * tipX1[i]]);
+            const double *right = &(aTipVec[span * tipX2[i]]);
+
+            #pragma ivdep
+            #pragma vector aligned nontemporal
+            for(int l = 0; l < span; l++)
+            {
+                sumtable[i * span + l] = left[l] * right[l];
+            }
+        }
+      } break;
+      case TIP_INNER:
+      {
+        for(int i = 0; i < n; i++)
+        {
+	  #pragma unroll(10)
+	  for (int k = 0; k < span; k += 8)
+	    {
+	      _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+	      _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	    }
+
+          const double *left = &(aTipVec[span * tipX1[i]]);
+          const double *right = &(x2_start[span * i]);
+
+          #pragma ivdep
+          #pragma vector aligned nontemporal
+	  #pragma noprefetch right
+          for(int l = 0; l < span; l++)
+          {
+              sumtable[i * span + l] = left[l] * right[l];
+          }
+        }
+      } break;
+      case INNER_INNER:
+      {
+        for(int i = 0; i < n; i++)
+	  {
+	      #pragma unroll(10)
+	      for (int k = 0; k < span; k += 8)
+	      {
+		_mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+		_mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+		_mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+		_mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+	      }
+
+	      const double *left  = &(x1_start[span * i]);
+	      const double *right = &(x2_start[span * i]);
+
+	      #pragma ivdep
+	      #pragma vector aligned nontemporal
+	      #pragma noprefetch left right
+	      for(int l = 0; l < span; l++)
+	      {
+		  sumtable[i * span + l] = left[l] * right[l];
+	      }
+	  }
+      } break;
+  //    default:
+  //      assert(0);
+    }
+}
+
+void coreGTRGAMMAPROT_LG4_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN[4], double *gammaRates,
+    double lz, int *wgt, double *weights)
+{
+    double diagptable0[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable1[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable2[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable01[span] __attribute__((align(BYTE_ALIGNMENT)));
+    double diagptable02[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+    /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+    for(int i = 0; i < 4; i++)
+    {
+        const double ki = gammaRates[i];
+        const double kisqr = ki * ki;
+
+        diagptable0[i*states] = 1. * weights[i];
+        diagptable1[i*states] = 0.;
+        diagptable2[i*states] = 0.;
+
+        for(int l = 1; l < states; l++)
+        {
+          diagptable0[i * states + l]  = exp(EIGN[i][l] * ki * lz) * weights[i];
+          diagptable1[i * states + l] = EIGN[i][l] * ki;
+          diagptable2[i * states + l] = EIGN[i][l] * EIGN[i][l] * kisqr;
+        }
+    }
+
+    #pragma ivdep
+    for(int i = 0; i < span; i++)
+    {
+        diagptable01[i] = diagptable0[i] * diagptable1[i];
+        diagptable02[i] = diagptable0[i] * diagptable2[i];
+    }
+
+    /* loop over sites in this partition */
+
+    const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+    double dlnLdlz = 0.;
+    double d2lnLdlz2 = 0.;
+
+    __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+    for (int i = 0; i < aligned_width; i++)
+    {
+        /* access the array with pre-computed values */
+        const double *sum = &sumtable[i * span * 8];
+
+        /* initial per-site likelihood and 1st and 2nd derivatives */
+
+        double invBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+        double d1Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+        double d2Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+
+        __m512d invVec;
+        __m512d d1Vec;
+        __m512d d2Vec;
+        int mask = 0x01;
+
+        #pragma noprefetch sum
+        #pragma unroll(8)
+        for(int j = 0; j < 8; j++)
+        {
+	    #pragma unroll(10)
+	    for (int k = 0; k < span; k += 8)
+	    {
+		    _mm_prefetch((const char *) &sum[span*(j+2) + k], _MM_HINT_T1);
+		    _mm_prefetch((const char *) &sum[span*(j+1) + k], _MM_HINT_T0);
+	    }
+
+            __m512d inv_1 = _mm512_setzero_pd();
+            __m512d d1_1 = _mm512_setzero_pd();
+            __m512d d2_1 = _mm512_setzero_pd();
+
+            for (int offset = 0; offset < span; offset += 8)
+            {
+                __m512d d0_1 = _mm512_load_pd(&diagptable0[offset]);
+                __m512d d01_1 = _mm512_load_pd(&diagptable01[offset]);
+                __m512d d02_1 = _mm512_load_pd(&diagptable02[offset]);
+                __m512d s_1 = _mm512_load_pd(&sum[j*span + offset]);
+
+                inv_1 = _mm512_fmadd_pd(d0_1, s_1, inv_1);
+                d1_1 = _mm512_fmadd_pd(d01_1, s_1, d1_1);
+                d2_1 = _mm512_fmadd_pd(d02_1, s_1, d2_1);
+            }
+
+            __mmask8 k1 = _mm512_int2mask(mask);
+            mask <<= 1;
+
+            // reduce
+            inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_CDAB));
+            inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_BADC));
+            inv_1 = _mm512_add_pd (inv_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_1), _MM_PERM_BADC)));
+            invVec = _mm512_mask_mov_pd(invVec, k1, inv_1);
+
+            d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_CDAB));
+            d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_BADC));
+            d1_1 = _mm512_add_pd (d1_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_1), _MM_PERM_BADC)));
+            d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_1);
+
+            d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_CDAB));
+            d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_BADC));
+            d2_1 = _mm512_add_pd (d2_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_1), _MM_PERM_BADC)));
+            d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_1);
+        }
+
+        _mm512_store_pd(&invBuf[0], invVec);
+        _mm512_store_pd(&d1Buf[0], d1Vec);
+        _mm512_store_pd(&d2Buf[0], d2Vec);
+
+        #pragma ivdep
+        #pragma vector aligned
+        for (int j = 0; j < 8; ++j)
+        {
+            const double inv_Li = 1.0 / invBuf[j];
+
+            const double d1 = d1Buf[j] * inv_Li;
+            const double d2 = d2Buf[j] * inv_Li;
+
+            dlnLdlz += wgt[i * 8 + j] * d1;
+            d2lnLdlz2 += wgt[i * 8 + j] * (d2 - d1 * d1);
+        }
+    } // site loop
+
+    *ext_dlnLdlz   = dlnLdlz;
+    *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
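
The per-site arithmetic of coreGTRGAMMAPROT_LG4_MIC above can be sketched in
portable scalar C. This is a hypothetical reference, not part of the ExaML
sources: the name core_derivatives_scalar and its flat d0/d01/d02 parameters
(standing in for diagptable0, diagptable01 and diagptable02) are assumptions
made for illustration, and it walks one site at a time instead of the padded
blocks of 8 sites used by the MIC kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference (not in ExaML): accumulate the first and
   second derivative of the log likelihood over n sites, each site owning
   span consecutive entries of sumtable. */
static void core_derivatives_scalar(size_t n, size_t span, const double *sumtable,
                                    const double *d0, const double *d01, const double *d02,
                                    const int *wgt, double *dlnLdlz, double *d2lnLdlz2)
{
    double first = 0.0, second = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        const double *sum = &sumtable[i * span];
        double inv = 0.0, d1 = 0.0, d2 = 0.0;

        /* three dot products against the same per-site vector */
        for (size_t l = 0; l < span; l++)
        {
            inv += d0[l]  * sum[l];
            d1  += d01[l] * sum[l];
            d2  += d02[l] * sum[l];
        }

        /* the per-site likelihood must be non-zero before dividing by it */
        assert(inv != 0.0);

        d1 /= inv;
        d2 /= inv;

        first  += wgt[i] * d1;
        second += wgt[i] * (d2 - d1 * d1);
    }

    *dlnLdlz   = first;
    *d2lnLdlz2 = second;
}
```

The MIC kernel computes the same three dot products with _mm512_fmadd_pd and
reduces each 8-lane accumulator with swizzle/permute additions; a scalar
version along these lines may be useful for checking its results on a host
without the coprocessor.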
diff --git a/examl/mic_native_dna.c b/examl/mic_native_dna.c
new file mode 100644
index 0000000..9bbc47a
--- /dev/null
+++ b/examl/mic_native_dna.c
@@ -0,0 +1,661 @@
+#include <immintrin.h>
+#include <string.h>
+#include <math.h>
+
+#include "axml.h"
+#include "mic_native.h"
+
+static const int states = 4;
+static const int statesSquare = 16;
+static const int span = 4 * 4;
+static const int maxStateValue = 16;
+
+/* Common functions */
+
+void updateModel_MIC(pInfo* part)
+{
+  double
+    *EV               = part->EV,
+    *tipVector        = part->tipVector,
+    *aEV              = part->mic_EV,
+    *aTipVector       = part->mic_tipVector;
+
+  const int
+    states = part->states,
+    span = 4 * states,
+    maxState = getUndetermined(part->dataType) + 1;
+
+  int
+    k, l;
+  #pragma ivdep
+  for (l = 0; l < 4 * states * states; ++l)
+  {
+    aEV[l] = EV[(l / span) * states + (l % states)];
+  }
+
+  for(int k = 0; k < maxState; k++)
+  {
+    #pragma ivdep
+    for(int l = 0; l < states; l++)
+    {
+	aTipVector[k*span + l] = aTipVector[k*span + states + l] = aTipVector[k*span + 2*states + l] = aTipVector[k*span + 3*states + l] = tipVector[k*states + l];
+    }
+  }
+}
+
+/* DNA */
+
+void makeP_DNA_MIC(double z1, double z2, double *rptr, double *EI,  double *EIGN, int numberOfCategories, double *left, double *right,
+               boolean saveMem, int maxCat)
+{
+  int
+    i,
+    j,
+    k,
+    span = states * numberOfCategories;
+
+  /* reserve aligned scratch space for values that are pre-computed once and re-used below */
+
+  double lz1[4] __attribute__((align(BYTE_ALIGNMENT)));
+  double lz2[4] __attribute__((align(BYTE_ALIGNMENT)));
+  double d1[4] __attribute__((align(BYTE_ALIGNMENT)));
+  double d2[4] __attribute__((align(BYTE_ALIGNMENT)));
+
+
+  /* multiply branch lengths with eigenvalues */
+  for(i = 1; i < states; i++)
+    {
+      lz1[i] = EIGN[i] * z1;
+      lz2[i] = EIGN[i] * z2;
+    }
+
+
+  /* loop over the rate categories: there are 4 for the GAMMA model and
+     a variable number for the CAT model */
+
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      /* exponentiate the rate multiplied by the branch */
+
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP(rptr[i] * lz1[j]);
+	  d2[j] = EXP(rptr[i] * lz2[j]);
+	}
+
+      /* now fill the P matrices for the two branch length values */
+
+      for(j = 0; j < states; j++)
+	{
+	  /* left and right are pre-allocated arrays */
+
+	  left[i * states + j] = 1.0;
+	  right[i * states + j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[k * span + i * states + j]  = d1[k] * EI[states * j + k];
+	      right[k * span + i * states + j] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+
+
+  /* if memory saving is enabled and we are using CAT we need to do one additional P matrix
+     calculation for a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+  if(saveMem)
+    {
+      i = maxCat;
+
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP (lz1[j]);
+	  d2[j] = EXP (lz2[j]);
+	}
+
+      for(j = 0; j < states; j++)
+	{
+	  left[statesSquare * i  + states * j] = 1.0;
+	  right[statesSquare * i + states * j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[statesSquare * i + states * j + k]  = d1[k] * EI[states * j + k];
+	      right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+}
+
+void precomputeTips_DNA_MIC(int tipCase, double *tipVector, double *left, double *right,
+                  double *umpLeft, double *umpRight,
+                  int numberOfCategories)
+{
+  /* no precomputation needed if both children are inner nodes */
+  if (tipCase == INNER_INNER)
+    return;
+
+  const int
+    span 	= states * 4,
+    umpSize 	= span * 16;
+
+  for(int k = 0; k < umpSize; ++k)
+  {
+      umpLeft[k] = 0.0;
+      umpRight[k] = 0.0;
+  }
+
+  for(int i = 0; i < maxStateValue; ++i)
+  {
+    for(int l = 0; l < states; ++l)
+    {
+	#pragma ivdep
+	#pragma vector aligned
+	for(int k = 0; k < span; ++k)
+	{
+	    umpLeft[span * i + k] +=  tipVector[i * states + l] * left[l * span + k];
+	    if (tipCase == TIP_TIP)
+	      umpRight[span * i + k] +=  tipVector[i * states + l] * right[l * span + k];
+	}
+    }
+  }
+}
+
+inline void mic_fma4x16(const double* inv, double* outv, double* mulv)
+{
+    __mmask8 k1 = _mm512_int2mask(0x0F);
+    __mmask8 k2 = _mm512_int2mask(0xF0);
+
+    __m512d acc1 = _mm512_setzero_pd();
+    __m512d acc2 = _mm512_setzero_pd();
+
+    __m512d t;
+
+    for(int k = 0; k < 4; k++)
+    {
+        t = _mm512_mask_extload_pd(t, k1, &inv[0 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+        t = _mm512_mask_extload_pd(t, k2, &inv[4 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+        __m512d m = _mm512_load_pd(&mulv[k * 16]);
+        acc1 = _mm512_fmadd_pd(t, m, acc1);
+
+        t = _mm512_mask_extload_pd(t, k1, &inv[8 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+        t = _mm512_mask_extload_pd(t, k2, &inv[12 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+        m = _mm512_load_pd(&mulv[k * 16 + 8]);
+        acc2 = _mm512_fmadd_pd(t, m, acc2);
+    }
+
+    _mm512_store_pd(&outv[0], acc1);
+    _mm512_store_pd(&outv[8], acc2);
+}
+
+void newviewGTRGAMMA_MIC(int tipCase,
+                  double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+                  unsigned char *tipX1, unsigned char *tipX2,
+                  int n, double *left, double *right, int *wgt, int *scalerIncrement,
+                  double *umpLeft, double *umpRight)
+{
+    __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+    __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+    __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+    int addScale = 0;
+
+  /* we assume that the P-matrices and eigenvectors are already in the correct layout */
+  double
+    *aEV = extEV,
+    *aRight = right,
+    *aLeft = left,
+    *umpX1 = umpLeft,
+    *umpX2 = umpRight;
+  
+  switch(tipCase)
+  {
+    case TIP_TIP:
+      {
+	#pragma noprefetch umpX1,umpX2
+	for (int i = 0; i < n; i++)
+        {
+            _mm_prefetch((const char *)&x3[span*(i+8)], _MM_HINT_ET1);
+            _mm_prefetch((const char *)&x3[span*(i+8) + 8], _MM_HINT_ET1);
+
+            _mm_prefetch((const char *)&x3[span*(i+1)], _MM_HINT_ET0);
+            _mm_prefetch((const char *)&x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+            const double *uX1 = &umpX1[16 * tipX1[i]];
+            const double *uX2 = &umpX2[16 * tipX2[i]];
+
+            double uX[16] __attribute__((align(64)));
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < 16; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+            }
+
+            double* v3 = &x3[i * 16];
+
+            mic_fma4x16(uX, v3, aEV);
+        } // sites loop
+      }
+      break;
+    case TIP_INNER:
+      {
+        #pragma noprefetch umpX1
+	for (int i = 0; i < n; i++)
+        {
+            _mm_prefetch((const char *)&x2[span*(i+16)], _MM_HINT_T1);
+            _mm_prefetch((const char *)&x2[span*(i+16) + 8], _MM_HINT_T1);
+            _mm_prefetch((const char *)&x3[span*(i+16)], _MM_HINT_ET1);
+            _mm_prefetch((const char *)&x3[span*(i+16) + 8], _MM_HINT_ET1);
+
+            _mm_prefetch((const char *)&x2[span*(i+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *)&x2[span*(i+1) + 8], _MM_HINT_T0);
+            _mm_prefetch((const char *)&x3[span*(i+1)], _MM_HINT_ET0);
+            _mm_prefetch((const char *)&x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+            /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+            double* uX1 = &umpX1[span * tipX1[i]];
+            double uX2[16] __attribute__((align(64)));
+            double uX[16] __attribute__((align(64)));
+
+            const double* v2 = &(x2[16 * i]);
+
+            mic_fma4x16(v2, uX2, aRight);
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < 16; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+            }
+
+            double* v3 = &(x3[span * i]);
+
+            mic_fma4x16(uX, v3, aEV);
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+	    t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax1 = _mm512_reduce_gmax_pd(t1);
+            __m512d t2 = _mm512_load_pd(&v3[8]);
+	    t2 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t2), absMask_MIC));
+            double vmax2 = _mm512_reduce_gmax_pd(t2);
+
+            if(vmax1 < minlikelihood && vmax2 < minlikelihood)
+            {
+	      /*	t1 = _mm512_mul_pd(t1, twotothe256_MIC);
+        	_mm512_store_pd(&v3[0], t1);
+        	t2 = _mm512_mul_pd(t2, twotothe256_MIC);
+        	_mm512_store_pd(&v3[8], t2);*/
+	     
+#pragma vector aligned nontemporal
+	      for(int l = 0; l < span; l++)
+		v3[l] *= twotothe256;
+
+                addScale += wgt[i];
+            }
+        } // site loop
+      }
+      break;
+    case INNER_INNER:
+    {
+        for (int i = 0; i < n; i++)
+        {
+            _mm_prefetch((const char *) &x1[span*(i+8)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x1[span*(i+8) + 8], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2[span*(i+8)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2[span*(i+8) + 8], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x3[span*(i+8)], _MM_HINT_ET1);
+            _mm_prefetch((const char *) &x3[span*(i+8) + 8], _MM_HINT_ET1);
+
+            _mm_prefetch((const char *) &x1[span*(i+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x1[span*(i+1) + 8], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2[span*(i+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2[span*(i+1) + 8], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x3[span*(i+1)], _MM_HINT_ET0);
+            _mm_prefetch((const char *) &x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+            double uX1[16] __attribute__((align(64)));
+            double uX2[16] __attribute__((align(64)));
+            double uX[16] __attribute__((align(64)));
+
+            const double* v1 = &(x1[span * i]);
+            const double* v2 = &(x2[span * i]);
+
+            mic_fma4x16(v1, uX1, aLeft);
+            mic_fma4x16(v2, uX2, aRight);
+
+            #pragma ivdep
+            #pragma vector aligned
+            for(int l = 0; l < 16; ++l)
+            {
+                uX[l] = uX1[l] * uX2[l];
+            }
+
+            double* v3 =  &(x3[span * i]);
+
+            mic_fma4x16(uX, v3, aEV);
+
+            __m512d t1 = _mm512_load_pd(&v3[0]);
+	    t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+            double vmax1 = _mm512_reduce_gmax_pd(t1);
+            __m512d t2 = _mm512_load_pd(&v3[8]);
+	    t2 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t2), absMask_MIC));
+            double vmax2 = _mm512_reduce_gmax_pd(t2);
+
+            if(vmax1 < minlikelihood && vmax2 < minlikelihood)
+            {
+	      /* t1 = _mm512_mul_pd(t1, twotothe256_MIC);
+        	_mm512_store_pd(&v3[0], t1);
+        	t2 = _mm512_mul_pd(t2, twotothe256_MIC);
+        	_mm512_store_pd(&v3[8], t2);
+	      */
+	      
+#pragma vector aligned nontemporal
+	      for(int l = 0; l < span; l++)
+		v3[l] *= twotothe256;
+
+	      addScale += wgt[i];
+            }
+        }
+    } break;
+    default:
+//      assert(0);
+      break;
+  }
+
+  *scalerIncrement = addScale;
+
+}
+
+double evaluateGAMMA_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+                 unsigned char *tipX1, const int n, double *diagptable)
+{
+    double sum = 0.0;
+
+    /* the left node is a tip */
+    if(tipX1)
+    {
+	double
+	  *aTipVec = tipVector;
+
+        /* loop over the sites of this partition */
+        for (int i = 0; i < n; i++)
+        {
+          /* access pre-computed tip vector values via a lookup table */
+          const double *x1 = &(aTipVec[16 * tipX1[i]]);
+          /* access the other (inner) node at the other end of the branch */
+          const double *x2 = &(x2_start[span * i]);
+
+          double term = 0.;
+
+          #pragma ivdep
+          #pragma vector aligned
+          for(int j = 0; j < 16; j++)
+              term += x1[j] * x2[j] * diagptable[j];
+
+          term = log(0.25 * fabs(term));
+
+          sum +=  wgt[i] * term;
+        }
+    }
+    else
+    {
+        for (int i = 0; i < n; i++)
+        {
+            _mm_prefetch((const char *) &x1_start[span*(i+8)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x1_start[span*(i+8) + 8], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2_start[span*(i+8)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2_start[span*(i+8) + 8], _MM_HINT_T1);
+
+            _mm_prefetch((const char *) &x1_start[span*(i+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x1_start[span*(i+1) + 8], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2_start[span*(i+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2_start[span*(i+1) + 8], _MM_HINT_T0);
+
+          const double *x1 = &(x1_start[span * i]);
+          const double *x2 = &(x2_start[span * i]);
+
+          double term = 0.;
+
+          #pragma ivdep
+          #pragma vector aligned
+          for(int j = 0; j < 16; j++)
+              term += x1[j] * x2[j]  * diagptable[j];
+
+          term = log(0.25 * fabs(term));
+
+          sum +=  wgt[i] * term;
+        }
+    }
+
+    return sum;
+}
+
+void sumGAMMA_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+    unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+    const double
+      *aTipVec = tipVector;
+
+    switch(tipCase)
+    {
+      case TIP_TIP:
+      {
+        #pragma unroll(8)
+        for(int i = 0; i < n; i++)
+        {
+            const double *left  = &(aTipVec[16 * tipX1[i]]);
+            const double *right = &(aTipVec[16 * tipX2[i]]);
+
+            #pragma ivdep
+            #pragma vector aligned nontemporal
+            for(int l = 0; l < 16; l++)
+            {
+                sumtable[i * span + l] = left[l] * right[l];
+            }
+        }
+      } break;
+      case TIP_INNER:
+      {
+	#pragma unroll(8)
+	for(int i = 0; i < n; i++)
+        {
+          _mm_prefetch((const char *) &x2_start[span*(i+32)], _MM_HINT_T1);
+          _mm_prefetch((const char *) &x2_start[span*(i+32) + 8], _MM_HINT_T1);
+
+          _mm_prefetch((const char *) &x2_start[span*(i+8)], _MM_HINT_T0);
+          _mm_prefetch((const char *) &x2_start[span*(i+8) + 8], _MM_HINT_T0);
+
+          const double *left = &(aTipVec[16 * tipX1[i]]);
+          const double *right = &(x2_start[span * i]);
+
+          #pragma ivdep
+          #pragma vector aligned nontemporal
+          for(int l = 0; l < 16; l++)
+          {
+              sumtable[i * span + l] = left[l] * right[l];
+          }
+        }
+      } break;
+      case INNER_INNER:
+      {
+	#pragma unroll(8)
+        for(int i = 0; i < n; i++)
+        {
+            _mm_prefetch((const char *) &x1_start[span*(i+16)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x1_start[span*(i+16) + 8], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2_start[span*(i+16)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &x2_start[span*(i+16) + 8], _MM_HINT_T1);
+
+            _mm_prefetch((const char *) &x1_start[span*(i+4)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x1_start[span*(i+4) + 8], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2_start[span*(i+4)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &x2_start[span*(i+4) + 8], _MM_HINT_T0);
+
+            const double *left  = &(x1_start[span * i]);
+            const double *right = &(x2_start[span * i]);
+
+            #pragma ivdep
+            #pragma vector aligned nontemporal
+            for(int l = 0; l < 16; l++)
+            {
+                sumtable[i * span + l] = left[l] * right[l];
+            }
+        }
+      } break;
+  //    default:
+  //      assert(0);
+    }
+}
+
+void coreGTRGAMMA_MIC(const int upper, double *sumtable,
+    volatile double *ext_dlnLdlz,  volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+    double diagptable0[16] __attribute__((align(64)));
+    double diagptable1[16] __attribute__((align(64)));
+    double diagptable2[16] __attribute__((align(64)));
+    double diagptable01[16] __attribute__((align(64)));
+    double diagptable02[16] __attribute__((align(64)));
+
+    /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+    for(int i = 0; i < 4; i++)
+    {
+        const double ki = gammaRates[i];
+        const double kisqr = ki * ki;
+
+        diagptable0[i*4] = 1.;
+        diagptable1[i*4] = 0.;
+        diagptable2[i*4] = 0.;
+
+        for(int l = 1; l < 4; l++)
+        {
+          diagptable0[i * 4 + l]  = exp(EIGN[l] * ki * lz);
+          diagptable1[i * 4 + l] = EIGN[l] * ki;
+          diagptable2[i * 4 + l] = EIGN[l] * EIGN[l] * kisqr;
+        }
+    }
+
+    #pragma ivdep
+    for(int i = 0; i < 16; i++)
+    {
+        diagptable01[i] = diagptable0[i] * diagptable1[i];
+        diagptable02[i] = diagptable0[i] * diagptable2[i];
+    }
+
+    /* loop over sites in this partition */
+
+    const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+    double dlnLBuf[8] __attribute__((align(64)));
+    double d2lnLBuf[8] __attribute__((align(64)));
+    for (int j = 0; j < 8; ++j)
+    {
+        dlnLBuf[j] = 0.;
+        d2lnLBuf[j] = 0.;
+    }
+
+    __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+    for (int i = 0; i < aligned_width; i++)
+    {
+        _mm_prefetch((const char *) &sumtable[i * span * 8], _MM_HINT_T0);
+        _mm_prefetch((const char *) &sumtable[i * span * 8 + 8], _MM_HINT_T0);
+
+        /* access the array with pre-computed values */
+        const double *sum = &sumtable[i * span * 8];
+
+        /* initial per-site likelihood and 1st and 2nd derivatives */
+
+        double invBuf[8] __attribute__((align(64)));
+        double d1Buf[8] __attribute__((align(64)));
+        double d2Buf[8] __attribute__((align(64)));
+
+        __m512d invVec;
+        __m512d d1Vec;
+        __m512d d2Vec;
+        int mask = 0x01;
+
+        #pragma noprefetch sum
+        #pragma unroll(8)
+        for(int j = 0; j < 8; j++)
+        {
+            _mm_prefetch((const char *) &sum[span*(j+8)], _MM_HINT_T1);
+            _mm_prefetch((const char *) &sum[span*(j+8) + 8], _MM_HINT_T1);
+
+            _mm_prefetch((const char *) &sum[span*(j+1)], _MM_HINT_T0);
+            _mm_prefetch((const char *) &sum[span*(j+1) + 8], _MM_HINT_T0);
+
+            __m512d d0_1 = _mm512_load_pd(&diagptable0[0]);
+            __m512d d0_2 = _mm512_load_pd(&diagptable0[8]);
+
+            __m512d d01_1 = _mm512_load_pd(&diagptable01[0]);
+            __m512d d01_2 = _mm512_load_pd(&diagptable01[8]);
+
+            __m512d d02_1 = _mm512_load_pd(&diagptable02[0]);
+            __m512d d02_2 = _mm512_load_pd(&diagptable02[8]);
+
+            __m512d s_1 = _mm512_load_pd(&sum[j*16]);
+            __m512d s_2 = _mm512_load_pd(&sum[j*16 + 8]);
+            __m512d inv_1 = _mm512_mul_pd(d0_1, s_1);
+            __m512d d1_1 = _mm512_mul_pd(d01_1, s_1);
+            __m512d d2_1 = _mm512_mul_pd(d02_1, s_1);
+
+            __m512d inv_2 = _mm512_fmadd_pd(d0_2, s_2, inv_1);
+            __m512d d1_2 = _mm512_fmadd_pd(d01_2, s_2, d1_1);
+            __m512d d2_2 = _mm512_fmadd_pd(d02_2, s_2, d2_1);
+
+            __mmask8 k1 = _mm512_int2mask(mask);
+            mask <<= 1;
+
+            // reduce
+            inv_2 = _mm512_add_pd (inv_2, _mm512_swizzle_pd(inv_2, _MM_SWIZ_REG_CDAB));
+            inv_2 = _mm512_add_pd (inv_2, _mm512_swizzle_pd(inv_2, _MM_SWIZ_REG_BADC));
+            inv_2 = _mm512_add_pd (inv_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_2), _MM_PERM_BADC)));
+            invVec = _mm512_mask_mov_pd(invVec, k1, inv_2);
+
+            d1_2 = _mm512_add_pd (d1_2, _mm512_swizzle_pd(d1_2, _MM_SWIZ_REG_CDAB));
+            d1_2 = _mm512_add_pd (d1_2, _mm512_swizzle_pd(d1_2, _MM_SWIZ_REG_BADC));
+            d1_2 = _mm512_add_pd (d1_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_2), _MM_PERM_BADC)));
+            d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_2);
+
+            d2_2 = _mm512_add_pd (d2_2, _mm512_swizzle_pd(d2_2, _MM_SWIZ_REG_CDAB));
+            d2_2 = _mm512_add_pd (d2_2, _mm512_swizzle_pd(d2_2, _MM_SWIZ_REG_BADC));
+            d2_2 = _mm512_add_pd (d2_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_2), _MM_PERM_BADC)));
+            d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_2);
+        }
+
+        _mm512_store_pd(&invBuf[0], invVec);
+        _mm512_store_pd(&d1Buf[0], d1Vec);
+        _mm512_store_pd(&d2Buf[0], d2Vec);
+
+        #pragma ivdep
+        #pragma vector aligned
+        for (int j = 0; j < 8; ++j)
+        {
+            const double inv_Li = 1.0 / invBuf[j];
+
+            const double d1 = d1Buf[j] * inv_Li;
+            const double d2 = d2Buf[j] * inv_Li;
+
+            dlnLBuf[j] += wgt[i * 8 + j] * d1;
+            d2lnLBuf[j] += wgt[i * 8 + j] * (d2 - d1 * d1);
+        }
+    } // site loop
+
+    double dlnLdlz = 0.;
+    double d2lnLdlz2 = 0.;
+    for (int j = 0; j < 8; ++j)
+    {
+        dlnLdlz += dlnLBuf[j];
+        d2lnLdlz2 += d2lnLBuf[j];
+    }
+
+    *ext_dlnLdlz   = dlnLdlz;
+    *ext_d2lnLdlz2 = d2lnLdlz2;
+}
diff --git a/examl/models.c b/examl/models.c
new file mode 100644
index 0000000..02cd746
--- /dev/null
+++ b/examl/models.c
@@ -0,0 +1,4243 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands 
+ *  of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+extern int optimizeRatesInvocations;
+extern int optimizeRateCategoryInvocations;
+extern int optimizeAlphaInvocations;
+extern int optimizeTTRatioInvocations;
+extern int optimizeInvarInvocations;
+
+extern const unsigned int bitVectorSecondary[256];
+extern const unsigned int bitVector32[33];
+extern const unsigned int bitVectorAA[23];
+extern const unsigned int bitVectorIdentity[256];
+
+extern const partitionLengths pLengths[MAX_MODEL];
+
+
+
+
+
+//extern FILE *byteFile;
+
+
+
+
+
+
+
+
+
+
+
+void putWAG(double *ext_initialRates)
+{ 
+  double
+    scaler,
+    q[20][20],
+    daa[400];
+
+  int 
+    i,
+    j,
+    r;
+
+  daa[ 1*20+ 0] =  55.15710; daa[ 2*20+ 0] =  50.98480; daa[ 2*20+ 1] =  63.53460; 
+  daa[ 3*20+ 0] =  73.89980; daa[ 3*20+ 1] =  14.73040; daa[ 3*20+ 2] = 542.94200; 
+  daa[ 4*20+ 0] = 102.70400; daa[ 4*20+ 1] =  52.81910; daa[ 4*20+ 2] =  26.52560; 
+  daa[ 4*20+ 3] =   3.02949; daa[ 5*20+ 0] =  90.85980; daa[ 5*20+ 1] = 303.55000; 
+  daa[ 5*20+ 2] = 154.36400; daa[ 5*20+ 3] =  61.67830; daa[ 5*20+ 4] =   9.88179; 
+  daa[ 6*20+ 0] = 158.28500; daa[ 6*20+ 1] =  43.91570; daa[ 6*20+ 2] =  94.71980; 
+  daa[ 6*20+ 3] = 617.41600; daa[ 6*20+ 4] =   2.13520; daa[ 6*20+ 5] = 546.94700; 
+  daa[ 7*20+ 0] = 141.67200; daa[ 7*20+ 1] =  58.46650; daa[ 7*20+ 2] = 112.55600; 
+  daa[ 7*20+ 3] =  86.55840; daa[ 7*20+ 4] =  30.66740; daa[ 7*20+ 5] =  33.00520; 
+  daa[ 7*20+ 6] =  56.77170; daa[ 8*20+ 0] =  31.69540; daa[ 8*20+ 1] = 213.71500; 
+  daa[ 8*20+ 2] = 395.62900; daa[ 8*20+ 3] =  93.06760; daa[ 8*20+ 4] =  24.89720; 
+  daa[ 8*20+ 5] = 429.41100; daa[ 8*20+ 6] =  57.00250; daa[ 8*20+ 7] =  24.94100; 
+  daa[ 9*20+ 0] =  19.33350; daa[ 9*20+ 1] =  18.69790; daa[ 9*20+ 2] =  55.42360; 
+  daa[ 9*20+ 3] =   3.94370; daa[ 9*20+ 4] =  17.01350; daa[ 9*20+ 5] =  11.39170; 
+  daa[ 9*20+ 6] =  12.73950; daa[ 9*20+ 7] =   3.04501; daa[ 9*20+ 8] =  13.81900; 
+  daa[10*20+ 0] =  39.79150; daa[10*20+ 1] =  49.76710; daa[10*20+ 2] =  13.15280; 
+  daa[10*20+ 3] =   8.48047; daa[10*20+ 4] =  38.42870; daa[10*20+ 5] =  86.94890; 
+  daa[10*20+ 6] =  15.42630; daa[10*20+ 7] =   6.13037; daa[10*20+ 8] =  49.94620; 
+  daa[10*20+ 9] = 317.09700; daa[11*20+ 0] =  90.62650; daa[11*20+ 1] = 535.14200; 
+  daa[11*20+ 2] = 301.20100; daa[11*20+ 3] =  47.98550; daa[11*20+ 4] =   7.40339; 
+  daa[11*20+ 5] = 389.49000; daa[11*20+ 6] = 258.44300; daa[11*20+ 7] =  37.35580; 
+  daa[11*20+ 8] =  89.04320; daa[11*20+ 9] =  32.38320; daa[11*20+10] =  25.75550; 
+  daa[12*20+ 0] =  89.34960; daa[12*20+ 1] =  68.31620; daa[12*20+ 2] =  19.82210; 
+  daa[12*20+ 3] =  10.37540; daa[12*20+ 4] =  39.04820; daa[12*20+ 5] = 154.52600; 
+  daa[12*20+ 6] =  31.51240; daa[12*20+ 7] =  17.41000; daa[12*20+ 8] =  40.41410; 
+  daa[12*20+ 9] = 425.74600; daa[12*20+10] = 485.40200; daa[12*20+11] =  93.42760; 
+  daa[13*20+ 0] =  21.04940; daa[13*20+ 1] =  10.27110; daa[13*20+ 2] =   9.61621; 
+  daa[13*20+ 3] =   4.67304; daa[13*20+ 4] =  39.80200; daa[13*20+ 5] =   9.99208; 
+  daa[13*20+ 6] =   8.11339; daa[13*20+ 7] =   4.99310; daa[13*20+ 8] =  67.93710; 
+  daa[13*20+ 9] = 105.94700; daa[13*20+10] = 211.51700; daa[13*20+11] =   8.88360; 
+  daa[13*20+12] = 119.06300; daa[14*20+ 0] = 143.85500; daa[14*20+ 1] =  67.94890; 
+  daa[14*20+ 2] =  19.50810; daa[14*20+ 3] =  42.39840; daa[14*20+ 4] =  10.94040; 
+  daa[14*20+ 5] =  93.33720; daa[14*20+ 6] =  68.23550; daa[14*20+ 7] =  24.35700; 
+  daa[14*20+ 8] =  69.61980; daa[14*20+ 9] =   9.99288; daa[14*20+10] =  41.58440; 
+  daa[14*20+11] =  55.68960; daa[14*20+12] =  17.13290; daa[14*20+13] =  16.14440; 
+  daa[15*20+ 0] = 337.07900; daa[15*20+ 1] = 122.41900; daa[15*20+ 2] = 397.42300; 
+  daa[15*20+ 3] = 107.17600; daa[15*20+ 4] = 140.76600; daa[15*20+ 5] = 102.88700; 
+  daa[15*20+ 6] =  70.49390; daa[15*20+ 7] = 134.18200; daa[15*20+ 8] =  74.01690; 
+  daa[15*20+ 9] =  31.94400; daa[15*20+10] =  34.47390; daa[15*20+11] =  96.71300; 
+  daa[15*20+12] =  49.39050; daa[15*20+13] =  54.59310; daa[15*20+14] = 161.32800; 
+  daa[16*20+ 0] = 212.11100; daa[16*20+ 1] =  55.44130; daa[16*20+ 2] = 203.00600; 
+  daa[16*20+ 3] =  37.48660; daa[16*20+ 4] =  51.29840; daa[16*20+ 5] =  85.79280; 
+  daa[16*20+ 6] =  82.27650; daa[16*20+ 7] =  22.58330; daa[16*20+ 8] =  47.33070; 
+  daa[16*20+ 9] = 145.81600; daa[16*20+10] =  32.66220; daa[16*20+11] = 138.69800; 
+  daa[16*20+12] = 151.61200; daa[16*20+13] =  17.19030; daa[16*20+14] =  79.53840; 
+  daa[16*20+15] = 437.80200; daa[17*20+ 0] =  11.31330; daa[17*20+ 1] = 116.39200; 
+  daa[17*20+ 2] =   7.19167; daa[17*20+ 3] =  12.97670; daa[17*20+ 4] =  71.70700; 
+  daa[17*20+ 5] =  21.57370; daa[17*20+ 6] =  15.65570; daa[17*20+ 7] =  33.69830; 
+  daa[17*20+ 8] =  26.25690; daa[17*20+ 9] =  21.24830; daa[17*20+10] =  66.53090; 
+  daa[17*20+11] =  13.75050; daa[17*20+12] =  51.57060; daa[17*20+13] = 152.96400; 
+  daa[17*20+14] =  13.94050; daa[17*20+15] =  52.37420; daa[17*20+16] =  11.08640; 
+  daa[18*20+ 0] =  24.07350; daa[18*20+ 1] =  38.15330; daa[18*20+ 2] = 108.60000; 
+  daa[18*20+ 3] =  32.57110; daa[18*20+ 4] =  54.38330; daa[18*20+ 5] =  22.77100; 
+  daa[18*20+ 6] =  19.63030; daa[18*20+ 7] =  10.36040; daa[18*20+ 8] = 387.34400; 
+  daa[18*20+ 9] =  42.01700; daa[18*20+10] =  39.86180; daa[18*20+11] =  13.32640; 
+  daa[18*20+12] =  42.84370; daa[18*20+13] = 645.42800; daa[18*20+14] =  21.60460; 
+  daa[18*20+15] =  78.69930; daa[18*20+16] =  29.11480; daa[18*20+17] = 248.53900; 
+  daa[19*20+ 0] = 200.60100; daa[19*20+ 1] =  25.18490; daa[19*20+ 2] =  19.62460; 
+  daa[19*20+ 3] =  15.23350; daa[19*20+ 4] = 100.21400; daa[19*20+ 5] =  30.12810; 
+  daa[19*20+ 6] =  58.87310; daa[19*20+ 7] =  18.72470; daa[19*20+ 8] =  11.83580; 
+  daa[19*20+ 9] = 782.13000; daa[19*20+10] = 180.03400; daa[19*20+11] =  30.54340; 
+  daa[19*20+12] = 205.84500; daa[19*20+13] =  64.98920; daa[19*20+14] =  31.48870; 
+  daa[19*20+15] =  23.27390; daa[19*20+16] = 138.82300; daa[19*20+17] =  36.53690; 
+  daa[19*20+18] =  31.47300; 
+
+  for(i = 0; i < 20; i++)
+    for(j = 0; j < 20; j++)
+      q[i][j] = 0.0;
+
+  for (i=0; i<20; i++)  
+    for (j=0; j<i; j++)               
+      daa[j*20+i] = daa[i*20+j];
+
+  for(i = 0; i < 19; i++)
+    for(j = i + 1; j < 20; j++)      
+      q[i][j] = daa[i * 20 + j];
+
+  
+  /*
+    for (i=0; i<20; i++) 
+    {
+      for (j=0; j<20; j++)
+	printf("%1.2f ", q[i][j]);
+      printf("\n");
+    }
+    printf("\n");
+
+    printf("%f\n", q[18][19]);
+  */
+
+  scaler = 1.0 / q[18][19];
+
+  
+
+  for(i = 0; i < 19; i++)
+    for(j = i + 1; j < 20; j++)      
+      q[i][j] *= scaler;
+
+  for(i = 0, r = 0; i < 19; i++)          
+    for(j = i + 1; j < 20; j++)      
+      ext_initialRates[r++] = q[i][j];           
+      
+  /*
+    for (i=0; i<20; i++) 
+    {
+      for (j=0; j<20; j++)
+	printf("%1.2f ", q[i][j]);
+      printf("\n");
+    }
+    printf("\n");
+  */
+
+}
+
+static void makeAASubstMat(double *daa, double *f, double *rates, double *freqs)
+{
+  int 
+    i, j, r = 0;
+
+  for(i = 1; i < 20; i++)
+    for(j = 0; j < i; j++)
+      {
+	daa[i * 20 + j] = rates[r];
+	r++;
+      }
+  
+  assert(r == 190);
+  
+  for(i = 0; i < 20; i++)
+    f[i] = freqs[i];
+}
+
+static void initProtMat(double f[20], int proteinMatrix, double *ext_initialRates, int lg4_index)
+{ 
+  double q[20][20];
+  double daa[400], max, temp;
+  int i, j, r;
+  double *initialRates = ext_initialRates;
+  double scaler;
+
+  {
+      switch(proteinMatrix)
+	{
+	case DAYHOFF:
+	  {	
+	    daa[ 1*20+ 0] =   27.00; daa[ 2*20+ 0] =   98.00; daa[ 2*20+ 1] =   32.00; daa[ 3*20+ 0] =  120.00;
+	    daa[ 3*20+ 1] =    0.00; daa[ 3*20+ 2] =  905.00; daa[ 4*20+ 0] =   36.00; daa[ 4*20+ 1] =   23.00;
+	    daa[ 4*20+ 2] =    0.00; daa[ 4*20+ 3] =    0.00; daa[ 5*20+ 0] =   89.00; daa[ 5*20+ 1] =  246.00;
+	    daa[ 5*20+ 2] =  103.00; daa[ 5*20+ 3] =  134.00; daa[ 5*20+ 4] =    0.00; daa[ 6*20+ 0] =  198.00;
+	    daa[ 6*20+ 1] =    1.00; daa[ 6*20+ 2] =  148.00; daa[ 6*20+ 3] = 1153.00; daa[ 6*20+ 4] =    0.00;
+	    daa[ 6*20+ 5] =  716.00; daa[ 7*20+ 0] =  240.00; daa[ 7*20+ 1] =    9.00; daa[ 7*20+ 2] =  139.00;
+	    daa[ 7*20+ 3] =  125.00; daa[ 7*20+ 4] =   11.00; daa[ 7*20+ 5] =   28.00; daa[ 7*20+ 6] =   81.00;
+	    daa[ 8*20+ 0] =   23.00; daa[ 8*20+ 1] =  240.00; daa[ 8*20+ 2] =  535.00; daa[ 8*20+ 3] =   86.00;
+	    daa[ 8*20+ 4] =   28.00; daa[ 8*20+ 5] =  606.00; daa[ 8*20+ 6] =   43.00; daa[ 8*20+ 7] =   10.00;
+	    daa[ 9*20+ 0] =   65.00; daa[ 9*20+ 1] =   64.00; daa[ 9*20+ 2] =   77.00; daa[ 9*20+ 3] =   24.00;
+	    daa[ 9*20+ 4] =   44.00; daa[ 9*20+ 5] =   18.00; daa[ 9*20+ 6] =   61.00; daa[ 9*20+ 7] =    0.00;
+	    daa[ 9*20+ 8] =    7.00; daa[10*20+ 0] =   41.00; daa[10*20+ 1] =   15.00; daa[10*20+ 2] =   34.00;
+	    daa[10*20+ 3] =    0.00; daa[10*20+ 4] =    0.00; daa[10*20+ 5] =   73.00; daa[10*20+ 6] =   11.00;
+	    daa[10*20+ 7] =    7.00; daa[10*20+ 8] =   44.00; daa[10*20+ 9] =  257.00; daa[11*20+ 0] =   26.00;
+	    daa[11*20+ 1] =  464.00; daa[11*20+ 2] =  318.00; daa[11*20+ 3] =   71.00; daa[11*20+ 4] =    0.00;
+	    daa[11*20+ 5] =  153.00; daa[11*20+ 6] =   83.00; daa[11*20+ 7] =   27.00; daa[11*20+ 8] =   26.00;
+	    daa[11*20+ 9] =   46.00; daa[11*20+10] =   18.00; daa[12*20+ 0] =   72.00; daa[12*20+ 1] =   90.00;
+	    daa[12*20+ 2] =    1.00; daa[12*20+ 3] =    0.00; daa[12*20+ 4] =    0.00; daa[12*20+ 5] =  114.00;
+	    daa[12*20+ 6] =   30.00; daa[12*20+ 7] =   17.00; daa[12*20+ 8] =    0.00; daa[12*20+ 9] =  336.00;
+	    daa[12*20+10] =  527.00; daa[12*20+11] =  243.00; daa[13*20+ 0] =   18.00; daa[13*20+ 1] =   14.00;
+	    daa[13*20+ 2] =   14.00; daa[13*20+ 3] =    0.00; daa[13*20+ 4] =    0.00; daa[13*20+ 5] =    0.00;
+	    daa[13*20+ 6] =    0.00; daa[13*20+ 7] =   15.00; daa[13*20+ 8] =   48.00; daa[13*20+ 9] =  196.00;
+	    daa[13*20+10] =  157.00; daa[13*20+11] =    0.00; daa[13*20+12] =   92.00; daa[14*20+ 0] =  250.00;
+	    daa[14*20+ 1] =  103.00; daa[14*20+ 2] =   42.00; daa[14*20+ 3] =   13.00; daa[14*20+ 4] =   19.00;
+	    daa[14*20+ 5] =  153.00; daa[14*20+ 6] =   51.00; daa[14*20+ 7] =   34.00; daa[14*20+ 8] =   94.00;
+	    daa[14*20+ 9] =   12.00; daa[14*20+10] =   32.00; daa[14*20+11] =   33.00; daa[14*20+12] =   17.00;
+	    daa[14*20+13] =   11.00; daa[15*20+ 0] =  409.00; daa[15*20+ 1] =  154.00; daa[15*20+ 2] =  495.00;
+	    daa[15*20+ 3] =   95.00; daa[15*20+ 4] =  161.00; daa[15*20+ 5] =   56.00; daa[15*20+ 6] =   79.00;
+	    daa[15*20+ 7] =  234.00; daa[15*20+ 8] =   35.00; daa[15*20+ 9] =   24.00; daa[15*20+10] =   17.00;
+	    daa[15*20+11] =   96.00; daa[15*20+12] =   62.00; daa[15*20+13] =   46.00; daa[15*20+14] =  245.00;
+	    daa[16*20+ 0] =  371.00; daa[16*20+ 1] =   26.00; daa[16*20+ 2] =  229.00; daa[16*20+ 3] =   66.00;
+	    daa[16*20+ 4] =   16.00; daa[16*20+ 5] =   53.00; daa[16*20+ 6] =   34.00; daa[16*20+ 7] =   30.00;
+	    daa[16*20+ 8] =   22.00; daa[16*20+ 9] =  192.00; daa[16*20+10] =   33.00; daa[16*20+11] =  136.00;
+	    daa[16*20+12] =  104.00; daa[16*20+13] =   13.00; daa[16*20+14] =   78.00; daa[16*20+15] =  550.00;
+	    daa[17*20+ 0] =    0.00; daa[17*20+ 1] =  201.00; daa[17*20+ 2] =   23.00; daa[17*20+ 3] =    0.00;
+	    daa[17*20+ 4] =    0.00; daa[17*20+ 5] =    0.00; daa[17*20+ 6] =    0.00; daa[17*20+ 7] =    0.00;
+	    daa[17*20+ 8] =   27.00; daa[17*20+ 9] =    0.00; daa[17*20+10] =   46.00; daa[17*20+11] =    0.00;
+	    daa[17*20+12] =    0.00; daa[17*20+13] =   76.00; daa[17*20+14] =    0.00; daa[17*20+15] =   75.00;
+	    daa[17*20+16] =    0.00; daa[18*20+ 0] =   24.00; daa[18*20+ 1] =    8.00; daa[18*20+ 2] =   95.00;
+	    daa[18*20+ 3] =    0.00; daa[18*20+ 4] =   96.00; daa[18*20+ 5] =    0.00; daa[18*20+ 6] =   22.00;
+	    daa[18*20+ 7] =    0.00; daa[18*20+ 8] =  127.00; daa[18*20+ 9] =   37.00; daa[18*20+10] =   28.00;
+	    daa[18*20+11] =   13.00; daa[18*20+12] =    0.00; daa[18*20+13] =  698.00; daa[18*20+14] =    0.00;
+	    daa[18*20+15] =   34.00; daa[18*20+16] =   42.00; daa[18*20+17] =   61.00; daa[19*20+ 0] =  208.00;
+	    daa[19*20+ 1] =   24.00; daa[19*20+ 2] =   15.00; daa[19*20+ 3] =   18.00; daa[19*20+ 4] =   49.00;
+	    daa[19*20+ 5] =   35.00; daa[19*20+ 6] =   37.00; daa[19*20+ 7] =   54.00; daa[19*20+ 8] =   44.00;
+	    daa[19*20+ 9] =  889.00; daa[19*20+10] =  175.00; daa[19*20+11] =   10.00; daa[19*20+12] =  258.00;
+	    daa[19*20+13] =   12.00; daa[19*20+14] =   48.00; daa[19*20+15] =   30.00; daa[19*20+16] =  157.00;
+	    daa[19*20+17] =    0.00; daa[19*20+18] =   28.00;	    	    
+
+
+	    /*f[ 0] = 0.087000; f[ 1] = 0.041000; f[ 2] = 0.040000; f[ 3] = 0.047000;
+	    f[ 4] = 0.034000; f[ 5] = 0.038000; f[ 6] = 0.050000; f[ 7] = 0.089000;
+	    f[ 8] = 0.034000; f[ 9] = 0.037000; f[10] = 0.085000; f[11] = 0.080000;
+	    f[12] = 0.014000; f[13] = 0.040000; f[14] = 0.051000; f[15] = 0.070000;
+	    f[16] = 0.058000; f[17] = 0.011000; f[18] = 0.030000; f[19] = 0.064000;*/
+
+	    f[ 0] = 0.087127; f[ 1] = 0.040904; f[ 2] = 0.040432; f[ 3] = 0.046872;
+	    f[ 4] = 0.033474; f[ 5] = 0.038255; f[ 6] = 0.049530; f[ 7] = 0.088612;
+	    f[ 8] = 0.033618; f[ 9] = 0.036886; f[10] = 0.085357; f[11] = 0.080482;
+	    f[12] = 0.014753; f[13] = 0.039772; f[14] = 0.050680; f[15] = 0.069577;
+	    f[16] = 0.058542; f[17] = 0.010494; f[18] = 0.029916; f[19] = 0.064717;
+	  }
+	  break;
+	case DCMUT:
+	  {	
+	    daa[ 1*20+ 0] =   26.78280; daa[ 2*20+ 0] =   98.44740; daa[ 2*20+ 1] =   32.70590; daa[ 3*20+ 0] =  119.98050; 
+	    daa[ 3*20+ 1] =    0.00000; daa[ 3*20+ 2] =  893.15150; daa[ 4*20+ 0] =   36.00160; daa[ 4*20+ 1] =   23.23740; 
+	    daa[ 4*20+ 2] =    0.00000; daa[ 4*20+ 3] =    0.00000; daa[ 5*20+ 0] =   88.77530; daa[ 5*20+ 1] =  243.99390; 
+	    daa[ 5*20+ 2] =  102.85090; daa[ 5*20+ 3] =  134.85510; daa[ 5*20+ 4] =    0.00000; daa[ 6*20+ 0] =  196.11670; 
+	    daa[ 6*20+ 1] =    0.00000; daa[ 6*20+ 2] =  149.34090; daa[ 6*20+ 3] = 1138.86590; daa[ 6*20+ 4] =    0.00000; 
+	    daa[ 6*20+ 5] =  708.60220; daa[ 7*20+ 0] =  238.61110; daa[ 7*20+ 1] =    8.77910; daa[ 7*20+ 2] =  138.53520; 
+	    daa[ 7*20+ 3] =  124.09810; daa[ 7*20+ 4] =   10.72780; daa[ 7*20+ 5] =   28.15810; daa[ 7*20+ 6] =   81.19070; 
+	    daa[ 8*20+ 0] =   22.81160; daa[ 8*20+ 1] =  238.31480; daa[ 8*20+ 2] =  529.00240; daa[ 8*20+ 3] =   86.82410; 
+	    daa[ 8*20+ 4] =   28.27290; daa[ 8*20+ 5] =  601.16130; daa[ 8*20+ 6] =   43.94690; daa[ 8*20+ 7] =   10.68020; 
+	    daa[ 9*20+ 0] =   65.34160; daa[ 9*20+ 1] =   63.26290; daa[ 9*20+ 2] =   76.80240; daa[ 9*20+ 3] =   23.92480; 
+	    daa[ 9*20+ 4] =   43.80740; daa[ 9*20+ 5] =   18.03930; daa[ 9*20+ 6] =   60.95260; daa[ 9*20+ 7] =    0.00000; 
+	    daa[ 9*20+ 8] =    7.69810; daa[10*20+ 0] =   40.64310; daa[10*20+ 1] =   15.49240; daa[10*20+ 2] =   34.11130; 
+	    daa[10*20+ 3] =    0.00000; daa[10*20+ 4] =    0.00000; daa[10*20+ 5] =   73.07720; daa[10*20+ 6] =   11.28800; 
+	    daa[10*20+ 7] =    7.15140; daa[10*20+ 8] =   44.35040; daa[10*20+ 9] =  255.66850; daa[11*20+ 0] =   25.86350; 
+	    daa[11*20+ 1] =  461.01240; daa[11*20+ 2] =  314.83710; daa[11*20+ 3] =   71.69130; daa[11*20+ 4] =    0.00000; 
+	    daa[11*20+ 5] =  151.90780; daa[11*20+ 6] =   83.00780; daa[11*20+ 7] =   26.76830; daa[11*20+ 8] =   27.04750; 
+	    daa[11*20+ 9] =   46.08570; daa[11*20+10] =   18.06290; daa[12*20+ 0] =   71.78400; daa[12*20+ 1] =   89.63210; 
+	    daa[12*20+ 2] =    0.00000; daa[12*20+ 3] =    0.00000; daa[12*20+ 4] =    0.00000; daa[12*20+ 5] =  112.74990; 
+	    daa[12*20+ 6] =   30.48030; daa[12*20+ 7] =   17.03720; daa[12*20+ 8] =    0.00000; daa[12*20+ 9] =  333.27320; 
+	    daa[12*20+10] =  523.01150; daa[12*20+11] =  241.17390; daa[13*20+ 0] =   18.36410; daa[13*20+ 1] =   13.69060; 
+	    daa[13*20+ 2] =   13.85030; daa[13*20+ 3] =    0.00000; daa[13*20+ 4] =    0.00000; daa[13*20+ 5] =    0.00000; 
+	    daa[13*20+ 6] =    0.00000; daa[13*20+ 7] =   15.34780; daa[13*20+ 8] =   47.59270; daa[13*20+ 9] =  195.19510; 
+	    daa[13*20+10] =  156.51600; daa[13*20+11] =    0.00000; daa[13*20+12] =   92.18600; daa[14*20+ 0] =  248.59200; 
+	    daa[14*20+ 1] =  102.83130; daa[14*20+ 2] =   41.92440; daa[14*20+ 3] =   13.39400; daa[14*20+ 4] =   18.75500; 
+	    daa[14*20+ 5] =  152.61880; daa[14*20+ 6] =   50.70030; daa[14*20+ 7] =   34.71530; daa[14*20+ 8] =   93.37090; 
+	    daa[14*20+ 9] =   11.91520; daa[14*20+10] =   31.62580; daa[14*20+11] =   33.54190; daa[14*20+12] =   17.02050; 
+	    daa[14*20+13] =   11.05060; daa[15*20+ 0] =  405.18700; daa[15*20+ 1] =  153.15900; daa[15*20+ 2] =  488.58920; 
+	    daa[15*20+ 3] =   95.60970; daa[15*20+ 4] =  159.83560; daa[15*20+ 5] =   56.18280; daa[15*20+ 6] =   79.39990; 
+	    daa[15*20+ 7] =  232.22430; daa[15*20+ 8] =   35.36430; daa[15*20+ 9] =   24.79550; daa[15*20+10] =   17.14320; 
+	    daa[15*20+11] =   95.45570; daa[15*20+12] =   61.99510; daa[15*20+13] =   45.99010; daa[15*20+14] =  242.72020; 
+	    daa[16*20+ 0] =  368.03650; daa[16*20+ 1] =   26.57450; daa[16*20+ 2] =  227.16970; daa[16*20+ 3] =   66.09300; 
+	    daa[16*20+ 4] =   16.23660; daa[16*20+ 5] =   52.56510; daa[16*20+ 6] =   34.01560; daa[16*20+ 7] =   30.66620; 
+	    daa[16*20+ 8] =   22.63330; daa[16*20+ 9] =  190.07390; daa[16*20+10] =   33.10900; daa[16*20+11] =  135.05990; 
+	    daa[16*20+12] =  103.15340; daa[16*20+13] =   13.66550; daa[16*20+14] =   78.28570; daa[16*20+15] =  543.66740; 
+	    daa[17*20+ 0] =    0.00000; daa[17*20+ 1] =  200.13750; daa[17*20+ 2] =   22.49680; daa[17*20+ 3] =    0.00000; 
+	    daa[17*20+ 4] =    0.00000; daa[17*20+ 5] =    0.00000; daa[17*20+ 6] =    0.00000; daa[17*20+ 7] =    0.00000; 
+	    daa[17*20+ 8] =   27.05640; daa[17*20+ 9] =    0.00000; daa[17*20+10] =   46.17760; daa[17*20+11] =    0.00000; 
+	    daa[17*20+12] =    0.00000; daa[17*20+13] =   76.23540; daa[17*20+14] =    0.00000; daa[17*20+15] =   74.08190; 
+	    daa[17*20+16] =    0.00000; daa[18*20+ 0] =   24.41390; daa[18*20+ 1] =    7.80120; daa[18*20+ 2] =   94.69400; 
+	    daa[18*20+ 3] =    0.00000; daa[18*20+ 4] =   95.31640; daa[18*20+ 5] =    0.00000; daa[18*20+ 6] =   21.47170; 
+	    daa[18*20+ 7] =    0.00000; daa[18*20+ 8] =  126.54000; daa[18*20+ 9] =   37.48340; daa[18*20+10] =   28.65720; 
+	    daa[18*20+11] =   13.21420; daa[18*20+12] =    0.00000; daa[18*20+13] =  695.26290; daa[18*20+14] =    0.00000; 
+	    daa[18*20+15] =   33.62890; daa[18*20+16] =   41.78390; daa[18*20+17] =   60.80700; daa[19*20+ 0] =  205.95640; 
+	    daa[19*20+ 1] =   24.03680; daa[19*20+ 2] =   15.80670; daa[19*20+ 3] =   17.83160; daa[19*20+ 4] =   48.46780; 
+	    daa[19*20+ 5] =   34.69830; daa[19*20+ 6] =   36.72500; daa[19*20+ 7] =   53.81650; daa[19*20+ 8] =   43.87150; 
+	    daa[19*20+ 9] =  881.00380; daa[19*20+10] =  174.51560; daa[19*20+11] =   10.38500; daa[19*20+12] =  256.59550; 
+	    daa[19*20+13] =   12.36060; daa[19*20+14] =   48.50260; daa[19*20+15] =   30.38360; daa[19*20+16] =  156.19970; 
+	    daa[19*20+17] =    0.00000; daa[19*20+18] =   27.93790;   	    	   
+
+	    /* f[ 0] = 0.08700; f[ 1] = 0.04100; f[ 2] = 0.04000; f[ 3] = 0.04700;
+	    f[ 4] = 0.03300; f[ 5] = 0.03800; f[ 6] = 0.04900; f[ 7] = 0.08900;
+	    f[ 8] = 0.03400; f[ 9] = 0.03700; f[10] = 0.08500; f[11] = 0.08000;
+	    f[12] = 0.01500; f[13] = 0.04000; f[14] = 0.05200; f[15] = 0.06900;
+	    f[16] = 0.05900; f[17] = 0.01000; f[18] = 0.03000; f[19] = 0.06500;*/
+
+	    f[ 0] = 0.087127; f[ 1] = 0.040904; f[ 2] = 0.040432; f[ 3] = 0.046872;
+	    f[ 4] = 0.033474; f[ 5] = 0.038255; f[ 6] = 0.049530; f[ 7] = 0.088612;
+	    f[ 8] = 0.033619; f[ 9] = 0.036886; f[10] = 0.085357; f[11] = 0.080481;
+	    f[12] = 0.014753; f[13] = 0.039772; f[14] = 0.050680; f[15] = 0.069577;
+	    f[16] = 0.058542; f[17] = 0.010494; f[18] = 0.029916; f[19] = 0.064717;
+
+	  }
+	  break;
+	case JTT:
+	  {
+	    daa[ 1*20+ 0] =   58.00; daa[ 2*20+ 0] =   54.00; daa[ 2*20+ 1] =   45.00; daa[ 3*20+ 0] =   81.00;
+	    daa[ 3*20+ 1] =   16.00; daa[ 3*20+ 2] =  528.00; daa[ 4*20+ 0] =   56.00; daa[ 4*20+ 1] =  113.00;
+	    daa[ 4*20+ 2] =   34.00; daa[ 4*20+ 3] =   10.00; daa[ 5*20+ 0] =   57.00; daa[ 5*20+ 1] =  310.00;
+	    daa[ 5*20+ 2] =   86.00; daa[ 5*20+ 3] =   49.00; daa[ 5*20+ 4] =    9.00; daa[ 6*20+ 0] =  105.00;
+	    daa[ 6*20+ 1] =   29.00; daa[ 6*20+ 2] =   58.00; daa[ 6*20+ 3] =  767.00; daa[ 6*20+ 4] =    5.00;
+	    daa[ 6*20+ 5] =  323.00; daa[ 7*20+ 0] =  179.00; daa[ 7*20+ 1] =  137.00; daa[ 7*20+ 2] =   81.00;
+	    daa[ 7*20+ 3] =  130.00; daa[ 7*20+ 4] =   59.00; daa[ 7*20+ 5] =   26.00; daa[ 7*20+ 6] =  119.00;
+	    daa[ 8*20+ 0] =   27.00; daa[ 8*20+ 1] =  328.00; daa[ 8*20+ 2] =  391.00; daa[ 8*20+ 3] =  112.00;
+	    daa[ 8*20+ 4] =   69.00; daa[ 8*20+ 5] =  597.00; daa[ 8*20+ 6] =   26.00; daa[ 8*20+ 7] =   23.00;
+	    daa[ 9*20+ 0] =   36.00; daa[ 9*20+ 1] =   22.00; daa[ 9*20+ 2] =   47.00; daa[ 9*20+ 3] =   11.00;
+	    daa[ 9*20+ 4] =   17.00; daa[ 9*20+ 5] =    9.00; daa[ 9*20+ 6] =   12.00; daa[ 9*20+ 7] =    6.00;
+	    daa[ 9*20+ 8] =   16.00; daa[10*20+ 0] =   30.00; daa[10*20+ 1] =   38.00; daa[10*20+ 2] =   12.00;
+	    daa[10*20+ 3] =    7.00; daa[10*20+ 4] =   23.00; daa[10*20+ 5] =   72.00; daa[10*20+ 6] =    9.00;
+	    daa[10*20+ 7] =    6.00; daa[10*20+ 8] =   56.00; daa[10*20+ 9] =  229.00; daa[11*20+ 0] =   35.00;
+	    daa[11*20+ 1] =  646.00; daa[11*20+ 2] =  263.00; daa[11*20+ 3] =   26.00; daa[11*20+ 4] =    7.00;
+	    daa[11*20+ 5] =  292.00; daa[11*20+ 6] =  181.00; daa[11*20+ 7] =   27.00; daa[11*20+ 8] =   45.00;
+	    daa[11*20+ 9] =   21.00; daa[11*20+10] =   14.00; daa[12*20+ 0] =   54.00; daa[12*20+ 1] =   44.00;
+	    daa[12*20+ 2] =   30.00; daa[12*20+ 3] =   15.00; daa[12*20+ 4] =   31.00; daa[12*20+ 5] =   43.00;
+	    daa[12*20+ 6] =   18.00; daa[12*20+ 7] =   14.00; daa[12*20+ 8] =   33.00; daa[12*20+ 9] =  479.00;
+	    daa[12*20+10] =  388.00; daa[12*20+11] =   65.00; daa[13*20+ 0] =   15.00; daa[13*20+ 1] =    5.00;
+	    daa[13*20+ 2] =   10.00; daa[13*20+ 3] =    4.00; daa[13*20+ 4] =   78.00; daa[13*20+ 5] =    4.00;
+	    daa[13*20+ 6] =    5.00; daa[13*20+ 7] =    5.00; daa[13*20+ 8] =   40.00; daa[13*20+ 9] =   89.00;
+	    daa[13*20+10] =  248.00; daa[13*20+11] =    4.00; daa[13*20+12] =   43.00; daa[14*20+ 0] =  194.00;
+	    daa[14*20+ 1] =   74.00; daa[14*20+ 2] =   15.00; daa[14*20+ 3] =   15.00; daa[14*20+ 4] =   14.00;
+	    daa[14*20+ 5] =  164.00; daa[14*20+ 6] =   18.00; daa[14*20+ 7] =   24.00; daa[14*20+ 8] =  115.00;
+	    daa[14*20+ 9] =   10.00; daa[14*20+10] =  102.00; daa[14*20+11] =   21.00; daa[14*20+12] =   16.00;
+	    daa[14*20+13] =   17.00; daa[15*20+ 0] =  378.00; daa[15*20+ 1] =  101.00; daa[15*20+ 2] =  503.00;
+	    daa[15*20+ 3] =   59.00; daa[15*20+ 4] =  223.00; daa[15*20+ 5] =   53.00; daa[15*20+ 6] =   30.00;
+	    daa[15*20+ 7] =  201.00; daa[15*20+ 8] =   73.00; daa[15*20+ 9] =   40.00; daa[15*20+10] =   59.00;
+	    daa[15*20+11] =   47.00; daa[15*20+12] =   29.00; daa[15*20+13] =   92.00; daa[15*20+14] =  285.00;
+	    daa[16*20+ 0] =  475.00; daa[16*20+ 1] =   64.00; daa[16*20+ 2] =  232.00; daa[16*20+ 3] =   38.00;
+	    daa[16*20+ 4] =   42.00; daa[16*20+ 5] =   51.00; daa[16*20+ 6] =   32.00; daa[16*20+ 7] =   33.00;
+	    daa[16*20+ 8] =   46.00; daa[16*20+ 9] =  245.00; daa[16*20+10] =   25.00; daa[16*20+11] =  103.00;
+	    daa[16*20+12] =  226.00; daa[16*20+13] =   12.00; daa[16*20+14] =  118.00; daa[16*20+15] =  477.00;
+	    daa[17*20+ 0] =    9.00; daa[17*20+ 1] =  126.00; daa[17*20+ 2] =    8.00; daa[17*20+ 3] =    4.00;
+	    daa[17*20+ 4] =  115.00; daa[17*20+ 5] =   18.00; daa[17*20+ 6] =   10.00; daa[17*20+ 7] =   55.00;
+	    daa[17*20+ 8] =    8.00; daa[17*20+ 9] =    9.00; daa[17*20+10] =   52.00; daa[17*20+11] =   10.00;
+	    daa[17*20+12] =   24.00; daa[17*20+13] =   53.00; daa[17*20+14] =    6.00; daa[17*20+15] =   35.00;
+	    daa[17*20+16] =   12.00; daa[18*20+ 0] =   11.00; daa[18*20+ 1] =   20.00; daa[18*20+ 2] =   70.00;
+	    daa[18*20+ 3] =   46.00; daa[18*20+ 4] =  209.00; daa[18*20+ 5] =   24.00; daa[18*20+ 6] =    7.00;
+	    daa[18*20+ 7] =    8.00; daa[18*20+ 8] =  573.00; daa[18*20+ 9] =   32.00; daa[18*20+10] =   24.00;
+	    daa[18*20+11] =    8.00; daa[18*20+12] =   18.00; daa[18*20+13] =  536.00; daa[18*20+14] =   10.00;
+	    daa[18*20+15] =   63.00; daa[18*20+16] =   21.00; daa[18*20+17] =   71.00; daa[19*20+ 0] =  298.00;
+	    daa[19*20+ 1] =   17.00; daa[19*20+ 2] =   16.00; daa[19*20+ 3] =   31.00; daa[19*20+ 4] =   62.00;
+	    daa[19*20+ 5] =   20.00; daa[19*20+ 6] =   45.00; daa[19*20+ 7] =   47.00; daa[19*20+ 8] =   11.00;
+	    daa[19*20+ 9] =  961.00; daa[19*20+10] =  180.00; daa[19*20+11] =   14.00; daa[19*20+12] =  323.00;
+	    daa[19*20+13] =   62.00; daa[19*20+14] =   23.00; daa[19*20+15] =   38.00; daa[19*20+16] =  112.00;
+	    daa[19*20+17] =   25.00; daa[19*20+18] =   16.00;
+	    	    
+	    /*f[ 0] = 0.07700; f[ 1] = 0.05200; f[ 2] = 0.04200; f[ 3] = 0.05100;
+	    f[ 4] = 0.02000; f[ 5] = 0.04100; f[ 6] = 0.06200; f[ 7] = 0.07300;
+	    f[ 8] = 0.02300; f[ 9] = 0.05400; f[10] = 0.09200; f[11] = 0.05900;
+	    f[12] = 0.02400; f[13] = 0.04000; f[14] = 0.05100; f[15] = 0.06900;
+	    f[16] = 0.05800; f[17] = 0.01400; f[18] = 0.03200; f[19] = 0.06600;*/
+
+	    f[ 0] = 0.076748; f[ 1] = 0.051691; f[ 2] = 0.042645; f[ 3] = 0.051544;
+	    f[ 4] = 0.019803; f[ 5] = 0.040752; f[ 6] = 0.061830; f[ 7] = 0.073152;
+	    f[ 8] = 0.022944; f[ 9] = 0.053761; f[10] = 0.091904; f[11] = 0.058676;
+	    f[12] = 0.023826; f[13] = 0.040126; f[14] = 0.050901; f[15] = 0.068765;
+	    f[16] = 0.058565; f[17] = 0.014261; f[18] = 0.032102; f[19] = 0.066004;
+	  }
+	  break;
+	case  MTREV:
+	  {
+	    daa[ 1*20+ 0] =   23.18; daa[ 2*20+ 0] =   26.95; daa[ 2*20+ 1] =   13.24; daa[ 3*20+ 0] =   17.67;
+	    daa[ 3*20+ 1] =    1.90; daa[ 3*20+ 2] =  794.38; daa[ 4*20+ 0] =   59.93; daa[ 4*20+ 1] =  103.33;
+	    daa[ 4*20+ 2] =   58.94; daa[ 4*20+ 3] =    1.90; daa[ 5*20+ 0] =    1.90; daa[ 5*20+ 1] =  220.99;
+	    daa[ 5*20+ 2] =  173.56; daa[ 5*20+ 3] =   55.28; daa[ 5*20+ 4] =   75.24; daa[ 6*20+ 0] =    9.77;
+	    daa[ 6*20+ 1] =    1.90; daa[ 6*20+ 2] =   63.05; daa[ 6*20+ 3] =  583.55; daa[ 6*20+ 4] =    1.90;
+	    daa[ 6*20+ 5] =  313.56; daa[ 7*20+ 0] =  120.71; daa[ 7*20+ 1] =   23.03; daa[ 7*20+ 2] =   53.30;
+	    daa[ 7*20+ 3] =   56.77; daa[ 7*20+ 4] =   30.71; daa[ 7*20+ 5] =    6.75; daa[ 7*20+ 6] =   28.28;
+	    daa[ 8*20+ 0] =   13.90; daa[ 8*20+ 1] =  165.23; daa[ 8*20+ 2] =  496.13; daa[ 8*20+ 3] =  113.99;
+	    daa[ 8*20+ 4] =  141.49; daa[ 8*20+ 5] =  582.40; daa[ 8*20+ 6] =   49.12; daa[ 8*20+ 7] =    1.90;
+	    daa[ 9*20+ 0] =   96.49; daa[ 9*20+ 1] =    1.90; daa[ 9*20+ 2] =   27.10; daa[ 9*20+ 3] =    4.34;
+	    daa[ 9*20+ 4] =   62.73; daa[ 9*20+ 5] =    8.34; daa[ 9*20+ 6] =    3.31; daa[ 9*20+ 7] =    5.98;
+	    daa[ 9*20+ 8] =   12.26; daa[10*20+ 0] =   25.46; daa[10*20+ 1] =   15.58; daa[10*20+ 2] =   15.16;
+	    daa[10*20+ 3] =    1.90; daa[10*20+ 4] =   25.65; daa[10*20+ 5] =   39.70; daa[10*20+ 6] =    1.90;
+	    daa[10*20+ 7] =    2.41; daa[10*20+ 8] =   11.49; daa[10*20+ 9] =  329.09; daa[11*20+ 0] =    8.36;
+	    daa[11*20+ 1] =  141.40; daa[11*20+ 2] =  608.70; daa[11*20+ 3] =    2.31; daa[11*20+ 4] =    1.90;
+	    daa[11*20+ 5] =  465.58; daa[11*20+ 6] =  313.86; daa[11*20+ 7] =   22.73; daa[11*20+ 8] =  127.67;
+	    daa[11*20+ 9] =   19.57; daa[11*20+10] =   14.88; daa[12*20+ 0] =  141.88; daa[12*20+ 1] =    1.90;
+	    daa[12*20+ 2] =   65.41; daa[12*20+ 3] =    1.90; daa[12*20+ 4] =    6.18; daa[12*20+ 5] =   47.37;
+	    daa[12*20+ 6] =    1.90; daa[12*20+ 7] =    1.90; daa[12*20+ 8] =   11.97; daa[12*20+ 9] =  517.98;
+	    daa[12*20+10] =  537.53; daa[12*20+11] =   91.37; daa[13*20+ 0] =    6.37; daa[13*20+ 1] =    4.69;
+	    daa[13*20+ 2] =   15.20; daa[13*20+ 3] =    4.98; daa[13*20+ 4] =   70.80; daa[13*20+ 5] =   19.11;
+	    daa[13*20+ 6] =    2.67; daa[13*20+ 7] =    1.90; daa[13*20+ 8] =   48.16; daa[13*20+ 9] =   84.67;
+	    daa[13*20+10] =  216.06; daa[13*20+11] =    6.44; daa[13*20+12] =   90.82; daa[14*20+ 0] =   54.31;
+	    daa[14*20+ 1] =   23.64; daa[14*20+ 2] =   73.31; daa[14*20+ 3] =   13.43; daa[14*20+ 4] =   31.26;
+	    daa[14*20+ 5] =  137.29; daa[14*20+ 6] =   12.83; daa[14*20+ 7] =    1.90; daa[14*20+ 8] =   60.97;
+	    daa[14*20+ 9] =   20.63; daa[14*20+10] =   40.10; daa[14*20+11] =   50.10; daa[14*20+12] =   18.84;
+	    daa[14*20+13] =   17.31; daa[15*20+ 0] =  387.86; daa[15*20+ 1] =    6.04; daa[15*20+ 2] =  494.39;
+	    daa[15*20+ 3] =   69.02; daa[15*20+ 4] =  277.05; daa[15*20+ 5] =   54.11; daa[15*20+ 6] =   54.71;
+	    daa[15*20+ 7] =  125.93; daa[15*20+ 8] =   77.46; daa[15*20+ 9] =   47.70; daa[15*20+10] =   73.61;
+	    daa[15*20+11] =  105.79; daa[15*20+12] =  111.16; daa[15*20+13] =   64.29; daa[15*20+14] =  169.90;
+	    daa[16*20+ 0] =  480.72; daa[16*20+ 1] =    2.08; daa[16*20+ 2] =  238.46; daa[16*20+ 3] =   28.01;
+	    daa[16*20+ 4] =  179.97; daa[16*20+ 5] =   94.93; daa[16*20+ 6] =   14.82; daa[16*20+ 7] =   11.17;
+	    daa[16*20+ 8] =   44.78; daa[16*20+ 9] =  368.43; daa[16*20+10] =  126.40; daa[16*20+11] =  136.33;
+	    daa[16*20+12] =  528.17; daa[16*20+13] =   33.85; daa[16*20+14] =  128.22; daa[16*20+15] =  597.21;
+	    daa[17*20+ 0] =    1.90; daa[17*20+ 1] =   21.95; daa[17*20+ 2] =   10.68; daa[17*20+ 3] =   19.86;
+	    daa[17*20+ 4] =   33.60; daa[17*20+ 5] =    1.90; daa[17*20+ 6] =    1.90; daa[17*20+ 7] =   10.92;
+	    daa[17*20+ 8] =    7.08; daa[17*20+ 9] =    1.90; daa[17*20+10] =   32.44; daa[17*20+11] =   24.00;
+	    daa[17*20+12] =   21.71; daa[17*20+13] =    7.84; daa[17*20+14] =    4.21; daa[17*20+15] =   38.58;
+	    daa[17*20+16] =    9.99; daa[18*20+ 0] =    6.48; daa[18*20+ 1] =    1.90; daa[18*20+ 2] =  191.36;
+	    daa[18*20+ 3] =   21.21; daa[18*20+ 4] =  254.77; daa[18*20+ 5] =   38.82; daa[18*20+ 6] =   13.12;
+	    daa[18*20+ 7] =    3.21; daa[18*20+ 8] =  670.14; daa[18*20+ 9] =   25.01; daa[18*20+10] =   44.15;
+	    daa[18*20+11] =   51.17; daa[18*20+12] =   39.96; daa[18*20+13] =  465.58; daa[18*20+14] =   16.21;
+	    daa[18*20+15] =   64.92; daa[18*20+16] =   38.73; daa[18*20+17] =   26.25; daa[19*20+ 0] =  195.06;
+	    daa[19*20+ 1] =    7.64; daa[19*20+ 2] =    1.90; daa[19*20+ 3] =    1.90; daa[19*20+ 4] =    1.90;
+	    daa[19*20+ 5] =   19.00; daa[19*20+ 6] =   21.14; daa[19*20+ 7] =    2.53; daa[19*20+ 8] =    1.90;
+	    daa[19*20+ 9] = 1222.94; daa[19*20+10] =   91.67; daa[19*20+11] =    1.90; daa[19*20+12] =  387.54;
+	    daa[19*20+13] =    6.35; daa[19*20+14] =    8.23; daa[19*20+15] =    1.90; daa[19*20+16] =  204.54;
+	    daa[19*20+17] =    5.37; daa[19*20+18] =    1.90;
+	    
+	    
+	    f[ 0] = 0.072000; f[ 1] = 0.019000; f[ 2] = 0.039000; f[ 3] = 0.019000;
+	    f[ 4] = 0.006000; f[ 5] = 0.025000; f[ 6] = 0.024000; f[ 7] = 0.056000;
+	    f[ 8] = 0.028000; f[ 9] = 0.088000; f[10] = 0.169000; f[11] = 0.023000;
+	    f[12] = 0.054000; f[13] = 0.061000; f[14] = 0.054000; f[15] = 0.072000;
+	    f[16] = 0.086000; f[17] = 0.029000; f[18] = 0.033000; f[19] = 0.043000;
+	  }
+	  break;
+	case WAG:
+	  {
+	    daa[ 1*20+ 0] =  55.15710; daa[ 2*20+ 0] =  50.98480; daa[ 2*20+ 1] =  63.53460; 
+	    daa[ 3*20+ 0] =  73.89980; daa[ 3*20+ 1] =  14.73040; daa[ 3*20+ 2] = 542.94200; 
+	    daa[ 4*20+ 0] = 102.70400; daa[ 4*20+ 1] =  52.81910; daa[ 4*20+ 2] =  26.52560; 
+	    daa[ 4*20+ 3] =   3.02949; daa[ 5*20+ 0] =  90.85980; daa[ 5*20+ 1] = 303.55000; 
+	    daa[ 5*20+ 2] = 154.36400; daa[ 5*20+ 3] =  61.67830; daa[ 5*20+ 4] =   9.88179; 
+	    daa[ 6*20+ 0] = 158.28500; daa[ 6*20+ 1] =  43.91570; daa[ 6*20+ 2] =  94.71980; 
+	    daa[ 6*20+ 3] = 617.41600; daa[ 6*20+ 4] =   2.13520; daa[ 6*20+ 5] = 546.94700; 
+	    daa[ 7*20+ 0] = 141.67200; daa[ 7*20+ 1] =  58.46650; daa[ 7*20+ 2] = 112.55600; 
+	    daa[ 7*20+ 3] =  86.55840; daa[ 7*20+ 4] =  30.66740; daa[ 7*20+ 5] =  33.00520; 
+	    daa[ 7*20+ 6] =  56.77170; daa[ 8*20+ 0] =  31.69540; daa[ 8*20+ 1] = 213.71500; 
+	    daa[ 8*20+ 2] = 395.62900; daa[ 8*20+ 3] =  93.06760; daa[ 8*20+ 4] =  24.89720; 
+	    daa[ 8*20+ 5] = 429.41100; daa[ 8*20+ 6] =  57.00250; daa[ 8*20+ 7] =  24.94100; 
+	    daa[ 9*20+ 0] =  19.33350; daa[ 9*20+ 1] =  18.69790; daa[ 9*20+ 2] =  55.42360; 
+	    daa[ 9*20+ 3] =   3.94370; daa[ 9*20+ 4] =  17.01350; daa[ 9*20+ 5] =  11.39170; 
+	    daa[ 9*20+ 6] =  12.73950; daa[ 9*20+ 7] =   3.04501; daa[ 9*20+ 8] =  13.81900; 
+	    daa[10*20+ 0] =  39.79150; daa[10*20+ 1] =  49.76710; daa[10*20+ 2] =  13.15280; 
+	    daa[10*20+ 3] =   8.48047; daa[10*20+ 4] =  38.42870; daa[10*20+ 5] =  86.94890; 
+	    daa[10*20+ 6] =  15.42630; daa[10*20+ 7] =   6.13037; daa[10*20+ 8] =  49.94620; 
+	    daa[10*20+ 9] = 317.09700; daa[11*20+ 0] =  90.62650; daa[11*20+ 1] = 535.14200; 
+	    daa[11*20+ 2] = 301.20100; daa[11*20+ 3] =  47.98550; daa[11*20+ 4] =   7.40339; 
+	    daa[11*20+ 5] = 389.49000; daa[11*20+ 6] = 258.44300; daa[11*20+ 7] =  37.35580; 
+	    daa[11*20+ 8] =  89.04320; daa[11*20+ 9] =  32.38320; daa[11*20+10] =  25.75550; 
+	    daa[12*20+ 0] =  89.34960; daa[12*20+ 1] =  68.31620; daa[12*20+ 2] =  19.82210; 
+	    daa[12*20+ 3] =  10.37540; daa[12*20+ 4] =  39.04820; daa[12*20+ 5] = 154.52600; 
+	    daa[12*20+ 6] =  31.51240; daa[12*20+ 7] =  17.41000; daa[12*20+ 8] =  40.41410; 
+	    daa[12*20+ 9] = 425.74600; daa[12*20+10] = 485.40200; daa[12*20+11] =  93.42760; 
+	    daa[13*20+ 0] =  21.04940; daa[13*20+ 1] =  10.27110; daa[13*20+ 2] =   9.61621; 
+	    daa[13*20+ 3] =   4.67304; daa[13*20+ 4] =  39.80200; daa[13*20+ 5] =   9.99208; 
+	    daa[13*20+ 6] =   8.11339; daa[13*20+ 7] =   4.99310; daa[13*20+ 8] =  67.93710; 
+	    daa[13*20+ 9] = 105.94700; daa[13*20+10] = 211.51700; daa[13*20+11] =   8.88360; 
+	    daa[13*20+12] = 119.06300; daa[14*20+ 0] = 143.85500; daa[14*20+ 1] =  67.94890; 
+	    daa[14*20+ 2] =  19.50810; daa[14*20+ 3] =  42.39840; daa[14*20+ 4] =  10.94040; 
+	    daa[14*20+ 5] =  93.33720; daa[14*20+ 6] =  68.23550; daa[14*20+ 7] =  24.35700; 
+	    daa[14*20+ 8] =  69.61980; daa[14*20+ 9] =   9.99288; daa[14*20+10] =  41.58440; 
+	    daa[14*20+11] =  55.68960; daa[14*20+12] =  17.13290; daa[14*20+13] =  16.14440; 
+	    daa[15*20+ 0] = 337.07900; daa[15*20+ 1] = 122.41900; daa[15*20+ 2] = 397.42300; 
+	    daa[15*20+ 3] = 107.17600; daa[15*20+ 4] = 140.76600; daa[15*20+ 5] = 102.88700; 
+	    daa[15*20+ 6] =  70.49390; daa[15*20+ 7] = 134.18200; daa[15*20+ 8] =  74.01690; 
+	    daa[15*20+ 9] =  31.94400; daa[15*20+10] =  34.47390; daa[15*20+11] =  96.71300; 
+	    daa[15*20+12] =  49.39050; daa[15*20+13] =  54.59310; daa[15*20+14] = 161.32800; 
+	    daa[16*20+ 0] = 212.11100; daa[16*20+ 1] =  55.44130; daa[16*20+ 2] = 203.00600; 
+	    daa[16*20+ 3] =  37.48660; daa[16*20+ 4] =  51.29840; daa[16*20+ 5] =  85.79280; 
+	    daa[16*20+ 6] =  82.27650; daa[16*20+ 7] =  22.58330; daa[16*20+ 8] =  47.33070; 
+	    daa[16*20+ 9] = 145.81600; daa[16*20+10] =  32.66220; daa[16*20+11] = 138.69800; 
+	    daa[16*20+12] = 151.61200; daa[16*20+13] =  17.19030; daa[16*20+14] =  79.53840; 
+	    daa[16*20+15] = 437.80200; daa[17*20+ 0] =  11.31330; daa[17*20+ 1] = 116.39200; 
+	    daa[17*20+ 2] =   7.19167; daa[17*20+ 3] =  12.97670; daa[17*20+ 4] =  71.70700; 
+	    daa[17*20+ 5] =  21.57370; daa[17*20+ 6] =  15.65570; daa[17*20+ 7] =  33.69830; 
+	    daa[17*20+ 8] =  26.25690; daa[17*20+ 9] =  21.24830; daa[17*20+10] =  66.53090; 
+	    daa[17*20+11] =  13.75050; daa[17*20+12] =  51.57060; daa[17*20+13] = 152.96400; 
+	    daa[17*20+14] =  13.94050; daa[17*20+15] =  52.37420; daa[17*20+16] =  11.08640; 
+	    daa[18*20+ 0] =  24.07350; daa[18*20+ 1] =  38.15330; daa[18*20+ 2] = 108.60000; 
+	    daa[18*20+ 3] =  32.57110; daa[18*20+ 4] =  54.38330; daa[18*20+ 5] =  22.77100; 
+	    daa[18*20+ 6] =  19.63030; daa[18*20+ 7] =  10.36040; daa[18*20+ 8] = 387.34400; 
+	    daa[18*20+ 9] =  42.01700; daa[18*20+10] =  39.86180; daa[18*20+11] =  13.32640; 
+	    daa[18*20+12] =  42.84370; daa[18*20+13] = 645.42800; daa[18*20+14] =  21.60460; 
+	    daa[18*20+15] =  78.69930; daa[18*20+16] =  29.11480; daa[18*20+17] = 248.53900; 
+	    daa[19*20+ 0] = 200.60100; daa[19*20+ 1] =  25.18490; daa[19*20+ 2] =  19.62460; 
+	    daa[19*20+ 3] =  15.23350; daa[19*20+ 4] = 100.21400; daa[19*20+ 5] =  30.12810; 
+	    daa[19*20+ 6] =  58.87310; daa[19*20+ 7] =  18.72470; daa[19*20+ 8] =  11.83580; 
+	    daa[19*20+ 9] = 782.13000; daa[19*20+10] = 180.03400; daa[19*20+11] =  30.54340; 
+	    daa[19*20+12] = 205.84500; daa[19*20+13] =  64.98920; daa[19*20+14] =  31.48870; 
+	    daa[19*20+15] =  23.27390; daa[19*20+16] = 138.82300; daa[19*20+17] =  36.53690; 
+	    daa[19*20+18] =  31.47300; 
+	    	   
+	    /*f[0]  = 0.08700; f[1]  = 0.04400; f[2]  = 0.03900; f[3]  = 0.05700;
+	    f[4]  = 0.01900; f[5]  = 0.03700; f[6]  = 0.05800; f[7]  = 0.08300;
+	    f[8]  = 0.02400; f[9]  = 0.04900; f[10] = 0.08600; f[11] = 0.06200;
+	    f[12] = 0.02000; f[13] = 0.03800; f[14] = 0.04600; f[15] = 0.07000;
+	    f[16] = 0.06100; f[17] = 0.01400; f[18] = 0.03500; f[19] = 0.07100;   
+	    */
+
+	    f[ 0] = 0.0866279; f[ 1] = 0.043972; f[ 2] = 0.0390894; f[ 3] = 0.0570451;
+	    f[ 4] = 0.0193078; f[ 5] = 0.0367281; f[ 6] = 0.0580589; f[ 7] = 0.0832518;
+	    f[ 8] = 0.0244313; f[ 9] = 0.048466; f[10] = 0.086209; f[11] = 0.0620286;
+	    f[12] = 0.0195027; f[13] = 0.0384319; f[14] = 0.0457631; f[15] = 0.0695179;
+	    f[16] = 0.0610127; f[17] = 0.0143859; f[18] = 0.0352742; f[19] = 0.0708957;
+	  }
+	  break;
+	case RTREV:
+	  {
+	    daa[1*20+0]= 34;         daa[2*20+0]= 51;         daa[2*20+1]= 35;         daa[3*20+0]= 10;         
+	    daa[3*20+1]= 30;         daa[3*20+2]= 384;        daa[4*20+0]= 439;        daa[4*20+1]= 92;         
+	    daa[4*20+2]= 128;        daa[4*20+3]= 1;          daa[5*20+0]= 32;         daa[5*20+1]= 221;        
+	    daa[5*20+2]= 236;        daa[5*20+3]= 78;         daa[5*20+4]= 70;         daa[6*20+0]= 81;         
+	    daa[6*20+1]= 10;         daa[6*20+2]= 79;         daa[6*20+3]= 542;        daa[6*20+4]= 1;          
+	    daa[6*20+5]= 372;        daa[7*20+0]= 135;        daa[7*20+1]= 41;         daa[7*20+2]= 94;         
+	    daa[7*20+3]= 61;         daa[7*20+4]= 48;         daa[7*20+5]= 18;         daa[7*20+6]= 70;         
+	    daa[8*20+0]= 30;         daa[8*20+1]= 90;         daa[8*20+2]= 320;        daa[8*20+3]= 91;         
+	    daa[8*20+4]= 124;        daa[8*20+5]= 387;        daa[8*20+6]= 34;         daa[8*20+7]= 68;         
+	    daa[9*20+0]= 1;          daa[9*20+1]= 24;         daa[9*20+2]= 35;         daa[9*20+3]= 1;          
+	    daa[9*20+4]= 104;        daa[9*20+5]= 33;         daa[9*20+6]= 1;          daa[9*20+7]= 1;          
+	    daa[9*20+8]= 34;         daa[10*20+0]= 45;        daa[10*20+1]= 18;        daa[10*20+2]= 15;        
+	    daa[10*20+3]= 5;         daa[10*20+4]= 110;       daa[10*20+5]= 54;        daa[10*20+6]= 21;        
+	    daa[10*20+7]= 3;         daa[10*20+8]= 51;        daa[10*20+9]= 385;       daa[11*20+0]= 38;        
+	    daa[11*20+1]= 593;       daa[11*20+2]= 123;       daa[11*20+3]= 20;        daa[11*20+4]= 16;        
+	    daa[11*20+5]= 309;       daa[11*20+6]= 141;       daa[11*20+7]= 30;        daa[11*20+8]= 76;        
+	    daa[11*20+9]= 34;        daa[11*20+10]= 23;       daa[12*20+0]= 235;       daa[12*20+1]= 57;        
+	    daa[12*20+2]= 1;         daa[12*20+3]= 1;         daa[12*20+4]= 156;       daa[12*20+5]= 158;       
+	    daa[12*20+6]= 1;         daa[12*20+7]= 37;        daa[12*20+8]= 116;       daa[12*20+9]= 375;       
+	    daa[12*20+10]= 581;      daa[12*20+11]= 134;      daa[13*20+0]= 1;         daa[13*20+1]= 7;         
+	    daa[13*20+2]= 49;        daa[13*20+3]= 1;         daa[13*20+4]= 70;        daa[13*20+5]= 1;         
+	    daa[13*20+6]= 1;         daa[13*20+7]= 7;         daa[13*20+8]= 141;       daa[13*20+9]= 64;        
+	    daa[13*20+10]= 179;      daa[13*20+11]= 14;       daa[13*20+12]= 247;      daa[14*20+0]= 97;        
+	    daa[14*20+1]= 24;        daa[14*20+2]= 33;        daa[14*20+3]= 55;        daa[14*20+4]= 1;         
+	    daa[14*20+5]= 68;        daa[14*20+6]= 52;        daa[14*20+7]= 17;        daa[14*20+8]= 44;        
+	    daa[14*20+9]= 10;        daa[14*20+10]= 22;       daa[14*20+11]= 43;       daa[14*20+12]= 1;        
+	    daa[14*20+13]= 11;       daa[15*20+0]= 460;       daa[15*20+1]= 102;       daa[15*20+2]= 294;       
+	    daa[15*20+3]= 136;       daa[15*20+4]= 75;        daa[15*20+5]= 225;       daa[15*20+6]= 95;        
+	    daa[15*20+7]= 152;       daa[15*20+8]= 183;       daa[15*20+9]= 4;         daa[15*20+10]= 24;       
+	    daa[15*20+11]= 77;       daa[15*20+12]= 1;        daa[15*20+13]= 20;       daa[15*20+14]= 134;      
+	    daa[16*20+0]= 258;       daa[16*20+1]= 64;        daa[16*20+2]= 148;       daa[16*20+3]= 55;        
+	    daa[16*20+4]= 117;       daa[16*20+5]= 146;       daa[16*20+6]= 82;        daa[16*20+7]= 7;         
+	    daa[16*20+8]= 49;        daa[16*20+9]= 72;        daa[16*20+10]= 25;       daa[16*20+11]= 110;      
+	    daa[16*20+12]= 131;      daa[16*20+13]= 69;       daa[16*20+14]= 62;       daa[16*20+15]= 671;      
+	    daa[17*20+0]= 5;         daa[17*20+1]= 13;        daa[17*20+2]= 16;        daa[17*20+3]= 1;         
+	    daa[17*20+4]= 55;        daa[17*20+5]= 10;        daa[17*20+6]= 17;        daa[17*20+7]= 23;        
+	    daa[17*20+8]= 48;        daa[17*20+9]= 39;        daa[17*20+10]= 47;       daa[17*20+11]= 6;        
+	    daa[17*20+12]= 111;      daa[17*20+13]= 182;      daa[17*20+14]= 9;        daa[17*20+15]= 14;       
+	    daa[17*20+16]= 1;        daa[18*20+0]= 55;        daa[18*20+1]= 47;        daa[18*20+2]= 28;        
+	    daa[18*20+3]= 1;         daa[18*20+4]= 131;       daa[18*20+5]= 45;        daa[18*20+6]= 1;         
+	    daa[18*20+7]= 21;        daa[18*20+8]= 307;       daa[18*20+9]= 26;        daa[18*20+10]= 64;       
+	    daa[18*20+11]= 1;        daa[18*20+12]= 74;       daa[18*20+13]= 1017;     daa[18*20+14]= 14;       
+	    daa[18*20+15]= 31;       daa[18*20+16]= 34;       daa[18*20+17]= 176;      daa[19*20+0]= 197;       
+	    daa[19*20+1]= 29;        daa[19*20+2]= 21;        daa[19*20+3]= 6;         daa[19*20+4]= 295;       
+	    daa[19*20+5]= 36;        daa[19*20+6]= 35;        daa[19*20+7]= 3;         daa[19*20+8]= 1;         
+	    daa[19*20+9]= 1048;      daa[19*20+10]= 112;      daa[19*20+11]= 19;       daa[19*20+12]= 236;      
+	    daa[19*20+13]= 92;       daa[19*20+14]= 25;       daa[19*20+15]= 39;       daa[19*20+16]= 196;      
+	    daa[19*20+17]= 26;       daa[19*20+18]= 59;       
+	    
+	    f[0]= 0.0646;           f[1]= 0.0453;           f[2]= 0.0376;           f[3]= 0.0422;           
+	    f[4]= 0.0114;           f[5]= 0.0606;           f[6]= 0.0607;           f[7]= 0.0639;           
+	    f[8]= 0.0273;           f[9]= 0.0679;           f[10]= 0.1018;          f[11]= 0.0751;          
+	    f[12]= 0.015;           f[13]= 0.0287;          f[14]= 0.0681;          f[15]= 0.0488;          
+	    f[16]= 0.0622;          f[17]= 0.0251;          f[18]= 0.0318;          f[19]= 0.0619;	    	    
+	  }
+	  break;
+	case CPREV:
+	  {
+	    daa[1*20+0]= 105;        daa[2*20+0]= 227;        daa[2*20+1]= 357;        daa[3*20+0]= 175;        
+	    daa[3*20+1]= 43;         daa[3*20+2]= 4435;       daa[4*20+0]= 669;        daa[4*20+1]= 823;        
+	    daa[4*20+2]= 538;        daa[4*20+3]= 10;         daa[5*20+0]= 157;        daa[5*20+1]= 1745;       
+	    daa[5*20+2]= 768;        daa[5*20+3]= 400;        daa[5*20+4]= 10;         daa[6*20+0]= 499;        
+	    daa[6*20+1]= 152;        daa[6*20+2]= 1055;       daa[6*20+3]= 3691;       daa[6*20+4]= 10;         
+	    daa[6*20+5]= 3122;       daa[7*20+0]= 665;        daa[7*20+1]= 243;        daa[7*20+2]= 653;        
+	    daa[7*20+3]= 431;        daa[7*20+4]= 303;        daa[7*20+5]= 133;        daa[7*20+6]= 379;        
+	    daa[8*20+0]= 66;         daa[8*20+1]= 715;        daa[8*20+2]= 1405;       daa[8*20+3]= 331;        
+	    daa[8*20+4]= 441;        daa[8*20+5]= 1269;       daa[8*20+6]= 162;        daa[8*20+7]= 19;         
+	    daa[9*20+0]= 145;        daa[9*20+1]= 136;        daa[9*20+2]= 168;        daa[9*20+3]= 10;         
+	    daa[9*20+4]= 280;        daa[9*20+5]= 92;         daa[9*20+6]= 148;        daa[9*20+7]= 40;         
+	    daa[9*20+8]= 29;         daa[10*20+0]= 197;       daa[10*20+1]= 203;       daa[10*20+2]= 113;       
+	    daa[10*20+3]= 10;        daa[10*20+4]= 396;       daa[10*20+5]= 286;       daa[10*20+6]= 82;        
+	    daa[10*20+7]= 20;        daa[10*20+8]= 66;        daa[10*20+9]= 1745;      daa[11*20+0]= 236;       
+	    daa[11*20+1]= 4482;      daa[11*20+2]= 2430;      daa[11*20+3]= 412;       daa[11*20+4]= 48;        
+	    daa[11*20+5]= 3313;      daa[11*20+6]= 2629;      daa[11*20+7]= 263;       daa[11*20+8]= 305;       
+	    daa[11*20+9]= 345;       daa[11*20+10]= 218;      daa[12*20+0]= 185;       daa[12*20+1]= 125;       
+	    daa[12*20+2]= 61;        daa[12*20+3]= 47;        daa[12*20+4]= 159;       daa[12*20+5]= 202;       
+	    daa[12*20+6]= 113;       daa[12*20+7]= 21;        daa[12*20+8]= 10;        daa[12*20+9]= 1772;      
+	    daa[12*20+10]= 1351;     daa[12*20+11]= 193;      daa[13*20+0]= 68;        daa[13*20+1]= 53;        
+	    daa[13*20+2]= 97;        daa[13*20+3]= 22;        daa[13*20+4]= 726;       daa[13*20+5]= 10;        
+	    daa[13*20+6]= 145;       daa[13*20+7]= 25;        daa[13*20+8]= 127;       daa[13*20+9]= 454;       
+	    daa[13*20+10]= 1268;     daa[13*20+11]= 72;       daa[13*20+12]= 327;      daa[14*20+0]= 490;       
+	    daa[14*20+1]= 87;        daa[14*20+2]= 173;       daa[14*20+3]= 170;       daa[14*20+4]= 285;       
+	    daa[14*20+5]= 323;       daa[14*20+6]= 185;       daa[14*20+7]= 28;        daa[14*20+8]= 152;       
+	    daa[14*20+9]= 117;       daa[14*20+10]= 219;      daa[14*20+11]= 302;      daa[14*20+12]= 100;      
+	    daa[14*20+13]= 43;       daa[15*20+0]= 2440;      daa[15*20+1]= 385;       daa[15*20+2]= 2085;      
+	    daa[15*20+3]= 590;       daa[15*20+4]= 2331;      daa[15*20+5]= 396;       daa[15*20+6]= 568;       
+	    daa[15*20+7]= 691;       daa[15*20+8]= 303;       daa[15*20+9]= 216;       daa[15*20+10]= 516;      
+	    daa[15*20+11]= 868;      daa[15*20+12]= 93;       daa[15*20+13]= 487;      daa[15*20+14]= 1202;     
+	    daa[16*20+0]= 1340;      daa[16*20+1]= 314;       daa[16*20+2]= 1393;      daa[16*20+3]= 266;       
+	    daa[16*20+4]= 576;       daa[16*20+5]= 241;       daa[16*20+6]= 369;       daa[16*20+7]= 92;        
+	    daa[16*20+8]= 32;        daa[16*20+9]= 1040;      daa[16*20+10]= 156;      daa[16*20+11]= 918;      
+	    daa[16*20+12]= 645;      daa[16*20+13]= 148;      daa[16*20+14]= 260;      daa[16*20+15]= 2151;     
+	    daa[17*20+0]= 14;        daa[17*20+1]= 230;       daa[17*20+2]= 40;        daa[17*20+3]= 18;        
+	    daa[17*20+4]= 435;       daa[17*20+5]= 53;        daa[17*20+6]= 63;        daa[17*20+7]= 82;        
+	    daa[17*20+8]= 69;        daa[17*20+9]= 42;        daa[17*20+10]= 159;      daa[17*20+11]= 10;       
+	    daa[17*20+12]= 86;       daa[17*20+13]= 468;      daa[17*20+14]= 49;       daa[17*20+15]= 73;       
+	    daa[17*20+16]= 29;       daa[18*20+0]= 56;        daa[18*20+1]= 323;       daa[18*20+2]= 754;       
+	    daa[18*20+3]= 281;       daa[18*20+4]= 1466;      daa[18*20+5]= 391;       daa[18*20+6]= 142;       
+	    daa[18*20+7]= 10;        daa[18*20+8]= 1971;      daa[18*20+9]= 89;        daa[18*20+10]= 189;      
+	    daa[18*20+11]= 247;      daa[18*20+12]= 215;      daa[18*20+13]= 2370;     daa[18*20+14]= 97;       
+	    daa[18*20+15]= 522;      daa[18*20+16]= 71;       daa[18*20+17]= 346;      daa[19*20+0]= 968;       
+	    daa[19*20+1]= 92;        daa[19*20+2]= 83;        daa[19*20+3]= 75;        daa[19*20+4]= 592;       
+	    daa[19*20+5]= 54;        daa[19*20+6]= 200;       daa[19*20+7]= 91;        daa[19*20+8]= 25;        
+	    daa[19*20+9]= 4797;      daa[19*20+10]= 865;      daa[19*20+11]= 249;      daa[19*20+12]= 475;      
+	    daa[19*20+13]= 317;      daa[19*20+14]= 122;      daa[19*20+15]= 167;      daa[19*20+16]= 760;      
+	    daa[19*20+17]= 10;       daa[19*20+18]= 119;      
+	    
+	    f[0]= 0.076;            f[1]= 0.062;            f[2]= 0.041;            f[3]= 0.037;            
+	    f[4]= 0.009;            f[5]= 0.038;            f[6]= 0.049;            f[7]= 0.084;            
+	    f[8]= 0.025;            f[9]= 0.081;            f[10]= 0.101;           f[11]= 0.05;            
+	    f[12]= 0.022;           f[13]= 0.051;           f[14]= 0.043;           f[15]= 0.062;           
+	    f[16]= 0.054;           f[17]= 0.018;           f[18]= 0.031;           f[19]= 0.066; 
+	  }
+	  break;
+	case VT:
+	  {
+	    /*
+	      daa[1*20+0]= 0.233108;   daa[2*20+0]= 0.199097;   daa[2*20+1]= 0.210797;   daa[3*20+0]= 0.265145;   
+	      daa[3*20+1]= 0.105191;   daa[3*20+2]= 0.883422;   daa[4*20+0]= 0.227333;   daa[4*20+1]= 0.031726;   
+	      daa[4*20+2]= 0.027495;   daa[4*20+3]= 0.010313;   daa[5*20+0]= 0.310084;   daa[5*20+1]= 0.493763;   
+	      daa[5*20+2]= 0.2757;     daa[5*20+3]= 0.205842;   daa[5*20+4]= 0.004315;   daa[6*20+0]= 0.567957;   
+	      daa[6*20+1]= 0.25524;    daa[6*20+2]= 0.270417;   daa[6*20+3]= 1.599461;   daa[6*20+4]= 0.005321;   
+	      daa[6*20+5]= 0.960976;   daa[7*20+0]= 0.876213;   daa[7*20+1]= 0.156945;   daa[7*20+2]= 0.362028;   
+	      daa[7*20+3]= 0.311718;   daa[7*20+4]= 0.050876;   daa[7*20+5]= 0.12866;    daa[7*20+6]= 0.250447;   
+	      daa[8*20+0]= 0.078692;   daa[8*20+1]= 0.213164;   daa[8*20+2]= 0.290006;   daa[8*20+3]= 0.134252;   
+	      daa[8*20+4]= 0.016695;   daa[8*20+5]= 0.315521;   daa[8*20+6]= 0.104458;   daa[8*20+7]= 0.058131;   
+	      daa[9*20+0]= 0.222972;   daa[9*20+1]= 0.08151;    daa[9*20+2]= 0.087225;   daa[9*20+3]= 0.01172;    
+	      daa[9*20+4]= 0.046398;   daa[9*20+5]= 0.054602;   daa[9*20+6]= 0.046589;   daa[9*20+7]= 0.051089;   
+	      daa[9*20+8]= 0.020039;   daa[10*20+0]= 0.42463;   daa[10*20+1]= 0.192364;  daa[10*20+2]= 0.069245;  
+	      daa[10*20+3]= 0.060863;  daa[10*20+4]= 0.091709;  daa[10*20+5]= 0.24353;   daa[10*20+6]= 0.151924;  
+	      daa[10*20+7]= 0.087056;  daa[10*20+8]= 0.103552;  daa[10*20+9]= 2.08989;   daa[11*20+0]= 0.393245;  
+	      daa[11*20+1]= 1.755838;  daa[11*20+2]= 0.50306;   daa[11*20+3]= 0.261101;  daa[11*20+4]= 0.004067;  
+	      daa[11*20+5]= 0.738208;  daa[11*20+6]= 0.88863;   daa[11*20+7]= 0.193243;  daa[11*20+8]= 0.153323;  
+	      daa[11*20+9]= 0.093181;  daa[11*20+10]= 0.201204; daa[12*20+0]= 0.21155;   daa[12*20+1]= 0.08793;   
+	      daa[12*20+2]= 0.05742;   daa[12*20+3]= 0.012182;  daa[12*20+4]= 0.02369;   daa[12*20+5]= 0.120801;  
+	      daa[12*20+6]= 0.058643;  daa[12*20+7]= 0.04656;   daa[12*20+8]= 0.021157;  daa[12*20+9]= 0.493845;  
+	      daa[12*20+10]= 1.105667; daa[12*20+11]= 0.096474; daa[13*20+0]= 0.116646;  daa[13*20+1]= 0.042569;  
+	      daa[13*20+2]= 0.039769;  daa[13*20+3]= 0.016577;  daa[13*20+4]= 0.051127;  daa[13*20+5]= 0.026235;  
+	      daa[13*20+6]= 0.028168;  daa[13*20+7]= 0.050143;  daa[13*20+8]= 0.079807;  daa[13*20+9]= 0.32102;   
+	      daa[13*20+10]= 0.946499; daa[13*20+11]= 0.038261; daa[13*20+12]= 0.173052; daa[14*20+0]= 0.399143;  
+	      daa[14*20+1]= 0.12848;   daa[14*20+2]= 0.083956;  daa[14*20+3]= 0.160063;  daa[14*20+4]= 0.011137;  
+	      daa[14*20+5]= 0.15657;   daa[14*20+6]= 0.205134;  daa[14*20+7]= 0.124492;  daa[14*20+8]= 0.078892;  
+	      daa[14*20+9]= 0.054797;  daa[14*20+10]= 0.169784; daa[14*20+11]= 0.212302; daa[14*20+12]= 0.010363; 
+	      daa[14*20+13]= 0.042564; daa[15*20+0]= 1.817198;  daa[15*20+1]= 0.292327;  daa[15*20+2]= 0.847049;  
+	      daa[15*20+3]= 0.461519;  daa[15*20+4]= 0.17527;   daa[15*20+5]= 0.358017;  daa[15*20+6]= 0.406035;  
+	      daa[15*20+7]= 0.612843;  daa[15*20+8]= 0.167406;  daa[15*20+9]= 0.081567;  daa[15*20+10]= 0.214977; 
+	      daa[15*20+11]= 0.400072; daa[15*20+12]= 0.090515; daa[15*20+13]= 0.138119; daa[15*20+14]= 0.430431; 
+	      daa[16*20+0]= 0.877877;  daa[16*20+1]= 0.204109;  daa[16*20+2]= 0.471268;  daa[16*20+3]= 0.178197;  
+	      daa[16*20+4]= 0.079511;  daa[16*20+5]= 0.248992;  daa[16*20+6]= 0.321028;  daa[16*20+7]= 0.136266;  
+	      daa[16*20+8]= 0.101117;  daa[16*20+9]= 0.376588;  daa[16*20+10]= 0.243227; daa[16*20+11]= 0.446646; 
+	      daa[16*20+12]= 0.184609; daa[16*20+13]= 0.08587;  daa[16*20+14]= 0.207143; daa[16*20+15]= 1.767766; 
+	      daa[17*20+0]= 0.030309;  daa[17*20+1]= 0.046417;  daa[17*20+2]= 0.010459;  daa[17*20+3]= 0.011393;  
+	      daa[17*20+4]= 0.007732;  daa[17*20+5]= 0.021248;  daa[17*20+6]= 0.018844;  daa[17*20+7]= 0.02399;   
+	      daa[17*20+8]= 0.020009;  daa[17*20+9]= 0.034954;  daa[17*20+10]= 0.083439; daa[17*20+11]= 0.023321; 
+	      daa[17*20+12]= 0.022019; daa[17*20+13]= 0.12805;  daa[17*20+14]= 0.014584; daa[17*20+15]= 0.035933; 
+	      daa[17*20+16]= 0.020437; daa[18*20+0]= 0.087061;  daa[18*20+1]= 0.09701;   daa[18*20+2]= 0.093268;  
+	      daa[18*20+3]= 0.051664;  daa[18*20+4]= 0.042823;  daa[18*20+5]= 0.062544;  daa[18*20+6]= 0.0552;    
+	      daa[18*20+7]= 0.037568;  daa[18*20+8]= 0.286027;  daa[18*20+9]= 0.086237;  daa[18*20+10]= 0.189842; 
+	      daa[18*20+11]= 0.068689; daa[18*20+12]= 0.073223; daa[18*20+13]= 0.898663; daa[18*20+14]= 0.032043; 
+	      daa[18*20+15]= 0.121979; daa[18*20+16]= 0.094617; daa[18*20+17]= 0.124746; daa[19*20+0]= 1.230985;  
+	      daa[19*20+1]= 0.113146;  daa[19*20+2]= 0.049824;  daa[19*20+3]= 0.048769;  daa[19*20+4]= 0.163831;  
+	      daa[19*20+5]= 0.112027;  daa[19*20+6]= 0.205868;  daa[19*20+7]= 0.082579;  daa[19*20+8]= 0.068575;  
+	      daa[19*20+9]= 3.65443;   daa[19*20+10]= 1.337571; daa[19*20+11]= 0.144587; daa[19*20+12]= 0.307309; 
+	      daa[19*20+13]= 0.247329; daa[19*20+14]= 0.129315; daa[19*20+15]= 0.1277;   daa[19*20+16]= 0.740372; 
+	      daa[19*20+17]= 0.022134; daa[19*20+18]= 0.125733; 	    	    
+	      
+	      f[0]  = 0.07900;         f[1]= 0.05100;        f[2]  = 0.04200;         f[3]= 0.05300;         
+	      f[4]  = 0.01500;         f[5]= 0.03700;        f[6]  = 0.06200;         f[7]= 0.07100;         
+	      f[8]  = 0.02300;         f[9]= 0.06200;        f[10] = 0.09600;        f[11]= 0.05700;        
+	      f[12] = 0.02400;        f[13]= 0.04300;        f[14] = 0.04400;        f[15]= 0.06400;        
+	      f[16] = 0.05600;        f[17]= 0.01300;        f[18] = 0.03500;        f[19]= 0.07300; 
+	    */
+
+	    daa[1*20+0]=   1.2412691067876198;
+	    daa[2*20+0]=   1.2184237953498958;
+	    daa[2*20+1]=   1.5720770753326880;
+	    daa[3*20+0]=   1.3759368509441177;
+	    daa[3*20+1]=   0.7550654439001206;
+	    daa[3*20+2]=   7.8584219153689405;
+	    daa[4*20+0]=   2.4731223087544874;
+	    daa[4*20+1]=   1.4414262567428417;
+	    daa[4*20+2]=   0.9784679122774127;
+	    daa[4*20+3]=   0.2272488448121475;
+	    daa[5*20+0]=   2.2155167805137470;
+	    daa[5*20+1]=   5.5120819705248678;
+	    daa[5*20+2]=   3.0143201670924822;
+	    daa[5*20+3]=   1.6562495638176040;
+	    daa[5*20+4]=   0.4587469126746136;
+	    daa[6*20+0]=   2.3379911207495061;
+	    daa[6*20+1]=   1.3542404860613146;
+	    daa[6*20+2]=   2.0093434778398112;
+	    daa[6*20+3]=   9.6883451875685065;
+	    daa[6*20+4]=   0.4519167943192672;
+	    daa[6*20+5]=   6.8124601839937675;
+	    daa[7*20+0]=   3.3386555146457697;
+	    daa[7*20+1]=   1.3121700301622004;
+	    daa[7*20+2]=   2.4117632898861809;
+	    daa[7*20+3]=   1.9142079025990228;
+	    daa[7*20+4]=   1.1034605684472507;
+	    daa[7*20+5]=   0.8776110594765502;
+	    daa[7*20+6]=   1.3860121390169038;
+	    daa[8*20+0]=   0.9615841926910841;
+	    daa[8*20+1]=   4.9238668283945266;
+	    daa[8*20+2]=   6.1974384977884114;
+	    daa[8*20+3]=   2.1459640610133781;
+	    daa[8*20+4]=   1.5196756759380692;
+	    daa[8*20+5]=   7.9943228564946525;
+	    daa[8*20+6]=   1.6360079688522375;
+	    daa[8*20+7]=   0.8561248973045037;
+	    daa[9*20+0]=   0.8908203061925510;
+	    daa[9*20+1]=   0.4323005487925516;
+	    daa[9*20+2]=   0.9179291175331520;
+	    daa[9*20+3]=   0.2161660372725585;
+	    daa[9*20+4]=   0.9126668032539315;
+	    daa[9*20+5]=   0.4882733432879921;
+	    daa[9*20+6]=   0.4035497929633328;
+	    daa[9*20+7]=   0.2888075033037488;
+	    daa[9*20+8]=   0.5787937115407940;
+	    daa[10*20+0]=  1.0778497408764076;
+	    daa[10*20+1]=  0.8386701149158265;
+	    daa[10*20+2]=  0.4098311270816011;
+	    daa[10*20+3]=  0.3574207468998517;
+	    daa[10*20+4]=  1.4081315998413697;
+	    daa[10*20+5]=  1.3318097154194044;
+	    daa[10*20+6]=  0.5610717242294755;
+	    daa[10*20+7]=  0.3578662395745526;
+	    daa[10*20+8]=  1.0765007949562073;
+	    daa[10*20+9]=  6.0019110258426362;
+	    daa[11*20+0]=  1.4932055816372476;
+	    daa[11*20+1]=  10.017330817366002;
+	    daa[11*20+2]=  4.4034547578962568;
+	    daa[11*20+3]=  1.4521790561663968;
+	    daa[11*20+4]=  0.3371091785647479;
+	    daa[11*20+5]=  6.0519085243118811;
+	    daa[11*20+6]=  4.3290086529582830;
+	    daa[11*20+7]=  0.8945563662345198;
+	    daa[11*20+8]=  1.8085136096039203;
+	    daa[11*20+9]=  0.6244297525127139;
+	    daa[11*20+10]= 0.5642322882556321;
+	    daa[12*20+0]=  1.9006455961717605;
+	    daa[12*20+1]=  1.2488638689609959;
+	    daa[12*20+2]=  0.9378803706165143;
+	    daa[12*20+3]=  0.4075239926000898;
+	    daa[12*20+4]=  1.2213054800811556;
+	    daa[12*20+5]=  1.9106190827629084;
+	    daa[12*20+6]=  0.7471936218068498;
+	    daa[12*20+7]=  0.5954812791740037;
+	    daa[12*20+8]=  1.3808291710019667;
+	    daa[12*20+9]=  6.7597899772045418;
+	    daa[12*20+10]= 8.0327792947421148;
+	    daa[12*20+11]= 1.7129670976916258;
+	    daa[13*20+0]=  0.6883439026872615;
+	    daa[13*20+1]=  0.4224945197276290;
+	    daa[13*20+2]=  0.5044944273324311;
+	    daa[13*20+3]=  0.1675129724559251;
+	    daa[13*20+4]=  1.6953951980808002;
+	    daa[13*20+5]=  0.3573432522499545;
+	    daa[13*20+6]=  0.2317194387691585;
+	    daa[13*20+7]=  0.3693722640980460;
+	    daa[13*20+8]=  1.3629765501081097;
+	    daa[13*20+9]=  2.2864286949316077;
+	    daa[13*20+10]= 4.3611548063555778;
+	    daa[13*20+11]= 0.3910559903834828;
+	    daa[13*20+12]= 2.3201373546296349;
+	    daa[14*20+0]=  2.7355620089953550;
+	    daa[14*20+1]=  1.3091837782420783;
+	    daa[14*20+2]=  0.7103720531974738;
+	    daa[14*20+3]=  1.0714605979577547;
+	    daa[14*20+4]=  0.4326227078645523;
+	    daa[14*20+5]=  2.3019177728300728;
+	    daa[14*20+6]=  1.5132807416252063;
+	    daa[14*20+7]=  0.7744933618134962;
+	    daa[14*20+8]=  1.8370555852070649;
+	    daa[14*20+9]=  0.4811402387911145;
+	    daa[14*20+10]= 1.0084320519837335;
+	    daa[14*20+11]= 1.3918935593582853;
+	    daa[14*20+12]= 0.4953193808676289;
+	    daa[14*20+13]= 0.3746821107962129;
+	    daa[15*20+0]=  6.4208961859142883;
+	    daa[15*20+1]=  1.9202994262316166;
+	    daa[15*20+2]=  6.1234512396801764;
+	    daa[15*20+3]=  2.2161944596741829;
+	    daa[15*20+4]=  3.6366815408744255;
+	    daa[15*20+5]=  2.3193703643237220;
+	    daa[15*20+6]=  1.8273535587773553;
+	    daa[15*20+7]=  3.0637776193717610;
+	    daa[15*20+8]=  1.9699895187387506;
+	    daa[15*20+9]=  0.6047491507504744;
+	    daa[15*20+10]= 0.8953754669269811;
+	    daa[15*20+11]= 1.9776630140912268;
+	    daa[15*20+12]= 1.0657482318076852;
+	    daa[15*20+13]= 1.1079144700606407;
+	    daa[15*20+14]= 3.5465914843628927;
+	    daa[16*20+0]=  5.2892514169776437;
+	    daa[16*20+1]=  1.3363401740560601;
+	    daa[16*20+2]=  3.8852506105922231;
+	    daa[16*20+3]=  1.5066839872944762;
+	    daa[16*20+4]=  1.7557065205837685;
+	    daa[16*20+5]=  2.1576510103471440;
+	    daa[16*20+6]=  1.5839981708584689;
+	    daa[16*20+7]=  0.7147489676267383;
+	    daa[16*20+8]=  1.6136654573285647;
+	    daa[16*20+9]=  2.6344778384442731;
+	    daa[16*20+10]= 1.0192004372506540;
+	    daa[16*20+11]= 2.5513781312660280;
+	    daa[16*20+12]= 3.3628488360462363;
+	    daa[16*20+13]= 0.6882725908872254;
+	    daa[16*20+14]= 1.9485376673137556;
+	    daa[16*20+15]= 8.8479984061248178;
+	    daa[17*20+0]=  0.5488578478106930;
+	    daa[17*20+1]=  1.5170142153962840;
+	    daa[17*20+2]=  0.1808525752605976;
+	    daa[17*20+3]=  0.2496584188151770;
+	    daa[17*20+4]=  1.6275179891253113;
+	    daa[17*20+5]=  0.8959082681546182;
+	    daa[17*20+6]=  0.4198391148111098;
+	    daa[17*20+7]=  0.9349753595598769;
+	    daa[17*20+8]=  0.6301954684360302;
+	    daa[17*20+9]=  0.5604648274060783;
+	    daa[17*20+10]= 1.5183114434679339;
+	    daa[17*20+11]= 0.5851920879490173;
+	    daa[17*20+12]= 1.4680478689711018;
+	    daa[17*20+13]= 3.3448437239772266;
+	    daa[17*20+14]= 0.4326058001438786;
+	    daa[17*20+15]= 0.6791126595939816;
+	    daa[17*20+16]= 0.4514203099376473;
+	    daa[18*20+0]=  0.5411769916657778;
+	    daa[18*20+1]=  0.8912614404565405;
+	    daa[18*20+2]=  1.0894926581511342;
+	    daa[18*20+3]=  0.7447620891784513;
+	    daa[18*20+4]=  2.1579775140421025;
+	    daa[18*20+5]=  0.9183596801412757;
+	    daa[18*20+6]=  0.5818111331782764;
+	    daa[18*20+7]=  0.3374467649724478;
+	    daa[18*20+8]=  7.7587442309146040;
+	    daa[18*20+9]=  0.8626796044156272;
+	    daa[18*20+10]= 1.2452243224541324;
+	    daa[18*20+11]= 0.7835447533710449;
+	    daa[18*20+12]= 1.0899165770956820;
+	    daa[18*20+13]= 10.384852333133459;
+	    daa[18*20+14]= 0.4819109019647465;
+	    daa[18*20+15]= 0.9547229305958682;
+	    daa[18*20+16]= 0.8564314184691215;
+	    daa[18*20+17]= 4.5377235790405388;
+	    daa[19*20+0]=  4.6501894691803214;
+	    daa[19*20+1]=  0.7807017855806767;
+	    daa[19*20+2]=  0.4586061981719967;
+	    daa[19*20+3]=  0.4594535241660911;
+	    daa[19*20+4]=  2.2627456996290891;
+	    daa[19*20+5]=  0.6366932501396869;
+	    daa[19*20+6]=  0.8940572875547330;
+	    daa[19*20+7]=  0.6193321034173915;
+	    daa[19*20+8]=  0.5333220944030346;
+	    daa[19*20+9]=  14.872933461519061;
+	    daa[19*20+10]= 3.5458093276667237;
+	    daa[19*20+11]= 0.7801080335991272;
+	    daa[19*20+12]= 4.0584577156753401;
+	    daa[19*20+13]= 1.7039730522675411;
+	    daa[19*20+14]= 0.5985498912985666;
+	    daa[19*20+15]= 0.9305232113028208;
+	    daa[19*20+16]= 3.4242218450865543;
+	    daa[19*20+17]= 0.5658969249032649;
+	    daa[19*20+18]= 1.0000000000000000;
+	    
+	    f[0]=  0.0770764620135024;
+	    f[1]=  0.0500819370772208;
+	    f[2]=  0.0462377395993731;
+	    f[3]=  0.0537929860758246;
+	    f[4]=  0.0144533387583345;
+	    f[5]=  0.0408923608974345;
+	    f[6]=  0.0633579339160905;
+	    f[7]=  0.0655672355884439;
+	    f[8]=  0.0218802687005936;
+	    f[9]=  0.0591969699027449;
+	    f[10]= 0.0976461276528445;
+	    f[11]= 0.0592079410822730;
+	    f[12]= 0.0220695876653368;
+	    f[13]= 0.0413508521834260;
+	    f[14]= 0.0476871596856874;
+	    f[15]= 0.0707295165111524;
+	    f[16]= 0.0567759161524817;
+	    f[17]= 0.0127019797647213;
+	    f[18]= 0.0323746050281867;
+	    f[19]= 0.0669190817443274;
+	  }
+	  break;
+	case BLOSUM62:
+	  {
+	    daa[1*20+0]= 0.735790389698;  daa[2*20+0]= 0.485391055466;  daa[2*20+1]= 1.297446705134;  
+	    daa[3*20+0]= 0.543161820899;  
+	    daa[3*20+1]= 0.500964408555;  daa[3*20+2]= 3.180100048216;  daa[4*20+0]= 1.45999531047;   
+	    daa[4*20+1]= 0.227826574209;  
+	    daa[4*20+2]= 0.397358949897;  daa[4*20+3]= 0.240836614802;  daa[5*20+0]= 1.199705704602;  
+	    daa[5*20+1]= 3.020833610064;  
+	    daa[5*20+2]= 1.839216146992;  daa[5*20+3]= 1.190945703396;  daa[5*20+4]= 0.32980150463;   
+	    daa[6*20+0]= 1.1709490428;    
+	    daa[6*20+1]= 1.36057419042;   daa[6*20+2]= 1.24048850864;   daa[6*20+3]= 3.761625208368;  
+	    daa[6*20+4]= 0.140748891814;  
+	    daa[6*20+5]= 5.528919177928;  daa[7*20+0]= 1.95588357496;   daa[7*20+1]= 0.418763308518;  
+	    daa[7*20+2]= 1.355872344485;  
+	    daa[7*20+3]= 0.798473248968;  daa[7*20+4]= 0.418203192284;  daa[7*20+5]= 0.609846305383;  
+	    daa[7*20+6]= 0.423579992176;  
+	    daa[8*20+0]= 0.716241444998;  daa[8*20+1]= 1.456141166336;  daa[8*20+2]= 2.414501434208;  
+	    daa[8*20+3]= 0.778142664022;  
+	    daa[8*20+4]= 0.354058109831;  daa[8*20+5]= 2.43534113114;   daa[8*20+6]= 1.626891056982;  
+	    daa[8*20+7]= 0.539859124954;  
+	    daa[9*20+0]= 0.605899003687;  daa[9*20+1]= 0.232036445142;  daa[9*20+2]= 0.283017326278;  
+	    daa[9*20+3]= 0.418555732462;  
+	    daa[9*20+4]= 0.774894022794;  daa[9*20+5]= 0.236202451204;  daa[9*20+6]= 0.186848046932;  
+	    daa[9*20+7]= 0.189296292376;  
+	    daa[9*20+8]= 0.252718447885;  daa[10*20+0]= 0.800016530518; daa[10*20+1]= 0.622711669692; 
+	    daa[10*20+2]= 0.211888159615; 
+	    daa[10*20+3]= 0.218131577594; daa[10*20+4]= 0.831842640142; daa[10*20+5]= 0.580737093181; 
+	    daa[10*20+6]= 0.372625175087; 
+	    daa[10*20+7]= 0.217721159236; daa[10*20+8]= 0.348072209797; daa[10*20+9]= 3.890963773304; 
+	    daa[11*20+0]= 1.295201266783; 
+	    daa[11*20+1]= 5.411115141489; daa[11*20+2]= 1.593137043457; daa[11*20+3]= 1.032447924952; 
+	    daa[11*20+4]= 0.285078800906; 
+	    daa[11*20+5]= 3.945277674515; daa[11*20+6]= 2.802427151679; daa[11*20+7]= 0.752042440303; 
+	    daa[11*20+8]= 1.022507035889; 
+	    daa[11*20+9]= 0.406193586642; daa[11*20+10]= 0.445570274261;daa[12*20+0]= 1.253758266664; 
+	    daa[12*20+1]= 0.983692987457; 
+	    daa[12*20+2]= 0.648441278787; daa[12*20+3]= 0.222621897958; daa[12*20+4]= 0.76768882348;  
+	    daa[12*20+5]= 2.494896077113; 
+	    daa[12*20+6]= 0.55541539747;  daa[12*20+7]= 0.459436173579; daa[12*20+8]= 0.984311525359; 
+	    daa[12*20+9]= 3.364797763104; 
+	    daa[12*20+10]= 6.030559379572;daa[12*20+11]= 1.073061184332;daa[13*20+0]= 0.492964679748; 
+	    daa[13*20+1]= 0.371644693209; 
+	    daa[13*20+2]= 0.354861249223; daa[13*20+3]= 0.281730694207; daa[13*20+4]= 0.441337471187; 
+	    daa[13*20+5]= 0.14435695975;  
+	    daa[13*20+6]= 0.291409084165; daa[13*20+7]= 0.368166464453; daa[13*20+8]= 0.714533703928; 
+	    daa[13*20+9]= 1.517359325954; 
+	    daa[13*20+10]= 2.064839703237;daa[13*20+11]= 0.266924750511;daa[13*20+12]= 1.77385516883; 
+	    daa[14*20+0]= 1.173275900924; 
+	    daa[14*20+1]= 0.448133661718; daa[14*20+2]= 0.494887043702; daa[14*20+3]= 0.730628272998; 
+	    daa[14*20+4]= 0.356008498769; 
+	    daa[14*20+5]= 0.858570575674; daa[14*20+6]= 0.926563934846; daa[14*20+7]= 0.504086599527; daa[14*20+8]= 0.527007339151; 
+	    daa[14*20+9]= 0.388355409206; daa[14*20+10]= 0.374555687471;daa[14*20+11]= 1.047383450722;daa[14*20+12]= 0.454123625103;
+	    daa[14*20+13]= 0.233597909629;daa[15*20+0]= 4.325092687057; daa[15*20+1]= 1.12278310421;  daa[15*20+2]= 2.904101656456; 
+	    daa[15*20+3]= 1.582754142065; daa[15*20+4]= 1.197188415094; daa[15*20+5]= 1.934870924596; daa[15*20+6]= 1.769893238937; 
+	    daa[15*20+7]= 1.509326253224; daa[15*20+8]= 1.11702976291;  daa[15*20+9]= 0.35754441246;  daa[15*20+10]= 0.352969184527;
+	    daa[15*20+11]= 1.752165917819;daa[15*20+12]= 0.918723415746;daa[15*20+13]= 0.540027644824;daa[15*20+14]= 1.169129577716;
+	    daa[16*20+0]= 1.729178019485; daa[16*20+1]= 0.914665954563; daa[16*20+2]= 1.898173634533; daa[16*20+3]= 0.934187509431; 
+	    daa[16*20+4]= 1.119831358516; daa[16*20+5]= 1.277480294596; daa[16*20+6]= 1.071097236007; daa[16*20+7]= 0.641436011405; 
+	    daa[16*20+8]= 0.585407090225; daa[16*20+9]= 1.17909119726;  daa[16*20+10]= 0.915259857694;daa[16*20+11]= 1.303875200799;
+	    daa[16*20+12]= 1.488548053722;daa[16*20+13]= 0.488206118793;daa[16*20+14]= 1.005451683149;daa[16*20+15]= 5.15155629227; 
+	    daa[17*20+0]= 0.465839367725; daa[17*20+1]= 0.426382310122; daa[17*20+2]= 0.191482046247; daa[17*20+3]= 0.145345046279; 
+	    daa[17*20+4]= 0.527664418872; daa[17*20+5]= 0.758653808642; daa[17*20+6]= 0.407635648938; daa[17*20+7]= 0.508358924638; 
+	    daa[17*20+8]= 0.30124860078;  daa[17*20+9]= 0.34198578754;  daa[17*20+10]= 0.6914746346;  daa[17*20+11]= 0.332243040634;
+	    daa[17*20+12]= 0.888101098152;daa[17*20+13]= 2.074324893497;daa[17*20+14]= 0.252214830027;daa[17*20+15]= 0.387925622098;
+	    daa[17*20+16]= 0.513128126891;daa[18*20+0]= 0.718206697586; daa[18*20+1]= 0.720517441216; daa[18*20+2]= 0.538222519037; 
+	    daa[18*20+3]= 0.261422208965; daa[18*20+4]= 0.470237733696; daa[18*20+5]= 0.95898974285;  daa[18*20+6]= 0.596719300346; 
+	    daa[18*20+7]= 0.308055737035; daa[18*20+8]= 4.218953969389; daa[18*20+9]= 0.674617093228; daa[18*20+10]= 0.811245856323;
+	    daa[18*20+11]= 0.7179934869;  daa[18*20+12]= 0.951682162246;daa[18*20+13]= 6.747260430801;daa[18*20+14]= 0.369405319355;
+	    daa[18*20+15]= 0.796751520761;daa[18*20+16]= 0.801010243199;daa[18*20+17]= 4.054419006558;daa[19*20+0]= 2.187774522005; 
+	    daa[19*20+1]= 0.438388343772; daa[19*20+2]= 0.312858797993; daa[19*20+3]= 0.258129289418; daa[19*20+4]= 1.116352478606; 
+	    daa[19*20+5]= 0.530785790125; daa[19*20+6]= 0.524253846338; daa[19*20+7]= 0.25334079019;  daa[19*20+8]= 0.20155597175;  
+	    daa[19*20+9]= 8.311839405458; daa[19*20+10]= 2.231405688913;daa[19*20+11]= 0.498138475304;daa[19*20+12]= 2.575850755315;
+	    daa[19*20+13]= 0.838119610178;daa[19*20+14]= 0.496908410676;daa[19*20+15]= 0.561925457442;daa[19*20+16]= 2.253074051176;
+	    daa[19*20+17]= 0.266508731426;daa[19*20+18]= 1;             
+	    
+	    f[0]= 0.074;                 f[1]= 0.052;                 f[2]= 0.045;                 f[3]= 0.054;                 
+	    f[4]= 0.025;                 f[5]= 0.034;                 f[6]= 0.054;                 f[7]= 0.074;                 
+	    f[8]= 0.026;                 f[9]= 0.068;                 f[10]= 0.099;                f[11]= 0.058;                
+	    f[12]= 0.025;                f[13]= 0.047;                f[14]= 0.039;                f[15]= 0.057;                
+	    f[16]= 0.051;                f[17]= 0.013;                f[18]= 0.032;                f[19]= 0.073;
+	  }
+	  break;
+	case MTMAM:
+	  {
+	    daa[1*20+0]= 32;              daa[2*20+0]= 2;    daa[2*20+1]= 4;               daa[3*20+0]= 11;
+	    daa[3*20+1]= 0;               daa[3*20+2]= 864;  daa[4*20+0]= 0;               daa[4*20+1]= 186;
+	    daa[4*20+2]= 0;               daa[4*20+3]= 0;    daa[5*20+0]= 0;               daa[5*20+1]= 246;
+	    daa[5*20+2]= 8;               daa[5*20+3]= 49;   daa[5*20+4]= 0;               daa[6*20+0]= 0;
+	    daa[6*20+1]= 0;               daa[6*20+2]= 0;    daa[6*20+3]= 569;             daa[6*20+4]= 0;
+	    daa[6*20+5]= 274;             daa[7*20+0]= 78;   daa[7*20+1]= 18;              daa[7*20+2]= 47;
+	    daa[7*20+3]= 79;              daa[7*20+4]= 0;    daa[7*20+5]= 0;               daa[7*20+6]= 22;
+	    daa[8*20+0]= 8;               daa[8*20+1]= 232;  daa[8*20+2]= 458;             daa[8*20+3]= 11;
+	    daa[8*20+4]= 305;             daa[8*20+5]= 550;  daa[8*20+6]= 22;              daa[8*20+7]= 0;
+	    daa[9*20+0]= 75;              daa[9*20+1]= 0;    daa[9*20+2]= 19;              daa[9*20+3]= 0;
+	    daa[9*20+4]= 41;              daa[9*20+5]= 0;    daa[9*20+6]= 0;               daa[9*20+7]= 0;
+	    daa[9*20+8]= 0;               daa[10*20+0]= 21;  daa[10*20+1]= 6;              daa[10*20+2]= 0;
+	    daa[10*20+3]= 0;              daa[10*20+4]= 27;  daa[10*20+5]= 20;             daa[10*20+6]= 0;
+	    daa[10*20+7]= 0;              daa[10*20+8]= 26;  daa[10*20+9]= 232;            daa[11*20+0]= 0;
+	    daa[11*20+1]= 50;             daa[11*20+2]= 408; daa[11*20+3]= 0;              daa[11*20+4]= 0;
+	    daa[11*20+5]= 242;            daa[11*20+6]= 215; daa[11*20+7]= 0;              daa[11*20+8]= 0;
+	    daa[11*20+9]= 6;              daa[11*20+10]= 4;  daa[12*20+0]= 76;             daa[12*20+1]= 0;
+	    daa[12*20+2]= 21;             daa[12*20+3]= 0;   daa[12*20+4]= 0;              daa[12*20+5]= 22;
+	    daa[12*20+6]= 0;              daa[12*20+7]= 0;   daa[12*20+8]= 0;              daa[12*20+9]= 378;
+	    daa[12*20+10]= 609;           daa[12*20+11]= 59; daa[13*20+0]= 0;              daa[13*20+1]= 0;
+	    daa[13*20+2]= 6;              daa[13*20+3]= 5;   daa[13*20+4]= 7;              daa[13*20+5]= 0;
+	    daa[13*20+6]= 0;              daa[13*20+7]= 0;   daa[13*20+8]= 0;              daa[13*20+9]= 57;
+	    daa[13*20+10]= 246;           daa[13*20+11]= 0;  daa[13*20+12]= 11;            daa[14*20+0]= 53;
+	    daa[14*20+1]= 9;              daa[14*20+2]= 33;  daa[14*20+3]= 2;              daa[14*20+4]= 0;
+	    daa[14*20+5]= 51;             daa[14*20+6]= 0;   daa[14*20+7]= 0;              daa[14*20+8]= 53;
+	    daa[14*20+9]= 5;              daa[14*20+10]= 43; daa[14*20+11]= 18;            daa[14*20+12]= 0;
+	    daa[14*20+13]= 17;            daa[15*20+0]= 342; daa[15*20+1]= 3;              daa[15*20+2]= 446;
+	    daa[15*20+3]= 16;             daa[15*20+4]= 347; daa[15*20+5]= 30;             daa[15*20+6]= 21;
+	    daa[15*20+7]= 112;            daa[15*20+8]= 20;  daa[15*20+9]= 0;              daa[15*20+10]= 74;
+	    daa[15*20+11]= 65;            daa[15*20+12]= 47; daa[15*20+13]= 90;            daa[15*20+14]= 202;
+	    daa[16*20+0]= 681;            daa[16*20+1]= 0;   daa[16*20+2]= 110;            daa[16*20+3]= 0;
+	    daa[16*20+4]= 114;            daa[16*20+5]= 0;   daa[16*20+6]= 4;              daa[16*20+7]= 0;
+	    daa[16*20+8]= 1;              daa[16*20+9]= 360; daa[16*20+10]= 34;            daa[16*20+11]= 50;
+	    daa[16*20+12]= 691;           daa[16*20+13]= 8;  daa[16*20+14]= 78;            daa[16*20+15]= 614;
+	    daa[17*20+0]= 5;              daa[17*20+1]= 16;  daa[17*20+2]= 6;              daa[17*20+3]= 0;
+	    daa[17*20+4]= 65;             daa[17*20+5]= 0;   daa[17*20+6]= 0;              daa[17*20+7]= 0;
+	    daa[17*20+8]= 0;              daa[17*20+9]= 0;   daa[17*20+10]= 12;            daa[17*20+11]= 0;
+	    daa[17*20+12]= 13;            daa[17*20+13]= 0;  daa[17*20+14]= 7;             daa[17*20+15]= 17;
+	    daa[17*20+16]= 0;             daa[18*20+0]= 0;   daa[18*20+1]= 0;              daa[18*20+2]= 156;
+	    daa[18*20+3]= 0;              daa[18*20+4]= 530; daa[18*20+5]= 54;             daa[18*20+6]= 0;
+	    daa[18*20+7]= 1;              daa[18*20+8]= 1525;daa[18*20+9]= 16;             daa[18*20+10]= 25;
+	    daa[18*20+11]= 67;            daa[18*20+12]= 0;  daa[18*20+13]= 682;           daa[18*20+14]= 8;
+	    daa[18*20+15]= 107;           daa[18*20+16]= 0;  daa[18*20+17]= 14;            daa[19*20+0]= 398;
+	    daa[19*20+1]= 0;              daa[19*20+2]= 0;   daa[19*20+3]= 10;             daa[19*20+4]= 0;
+	    daa[19*20+5]= 33;             daa[19*20+6]= 20;  daa[19*20+7]= 5;              daa[19*20+8]= 0;
+	    daa[19*20+9]= 2220;           daa[19*20+10]= 100;daa[19*20+11]= 0;             daa[19*20+12]= 832;
+	    daa[19*20+13]= 6;             daa[19*20+14]= 0;  daa[19*20+15]= 0;             daa[19*20+16]= 237;
+	    daa[19*20+17]= 0;             daa[19*20+18]= 0;       
+	    
+	    f[0]= 0.06920;  f[1]=  0.01840;  f[2]= 0.04000;  f[3]= 0.018600;
+	    f[4]= 0.00650;  f[5]=  0.02380;  f[6]= 0.02360;  f[7]= 0.055700;
+	    f[8]= 0.02770;  f[9]=  0.09050;  f[10]=0.16750;  f[11]= 0.02210;
+	    f[12]=0.05610;  f[13]= 0.06110;  f[14]=0.05360;  f[15]= 0.07250;
+	    f[16]=0.08700;  f[17]= 0.02930;  f[18]=0.03400;  f[19]= 0.04280;
+	  }
+	  break;
+	case LG:
+	  {
+	    daa[1*20+0] = 0.425093;
+
+	    daa[2*20+0] = 0.276818; daa[2*20+1] = 0.751878;
+
+	    daa[3*20+0] = 0.395144; daa[3*20+1] = 0.123954; daa[3*20+2] = 5.076149;
+	    
+	    daa[4*20+0] = 2.489084; daa[4*20+1] = 0.534551; daa[4*20+2] = 0.528768; daa[4*20+3] = 0.062556;
+								 
+	    daa[5*20+0] = 0.969894; daa[5*20+1] = 2.807908; daa[5*20+2] = 1.695752; daa[5*20+3] = 0.523386; daa[5*20+4] = 0.084808;
+
+	    daa[6*20+0] = 1.038545; daa[6*20+1] = 0.363970; daa[6*20+2] = 0.541712; daa[6*20+3] = 5.243870; daa[6*20+4] = 0.003499; daa[6*20+5] = 4.128591;
+
+	    daa[7*20+0] = 2.066040; daa[7*20+1] = 0.390192; daa[7*20+2] = 1.437645; daa[7*20+3] = 0.844926; daa[7*20+4] = 0.569265; daa[7*20+5] = 0.267959; daa[7*20+6] = 0.348847;
+ 
+	    daa[8*20+0] = 0.358858; daa[8*20+1] = 2.426601; daa[8*20+2] = 4.509238; daa[8*20+3] = 0.927114; daa[8*20+4] = 0.640543; daa[8*20+5] = 4.813505; daa[8*20+6] = 0.423881; 
+	    daa[8*20+7] = 0.311484;
+
+	    daa[9*20+0] = 0.149830; daa[9*20+1] = 0.126991; daa[9*20+2] = 0.191503; daa[9*20+3] = 0.010690; daa[9*20+4] = 0.320627; daa[9*20+5] = 0.072854; daa[9*20+6] = 0.044265; 
+	    daa[9*20+7] = 0.008705; daa[9*20+8] = 0.108882; 
+
+	    daa[10*20+0] = 0.395337; daa[10*20+1] = 0.301848; daa[10*20+2] = 0.068427; daa[10*20+3] = 0.015076; daa[10*20+4] = 0.594007; daa[10*20+5] = 0.582457; daa[10*20+6] = 0.069673; 
+	    daa[10*20+7] = 0.044261; daa[10*20+8] = 0.366317; daa[10*20+9] = 4.145067 ;
+
+	    daa[11*20+0] = 0.536518; daa[11*20+1] = 6.326067; daa[11*20+2] = 2.145078; daa[11*20+3] = 0.282959; daa[11*20+4] = 0.013266; daa[11*20+5] = 3.234294; daa[11*20+6] = 1.807177; 
+	    daa[11*20+7] = 0.296636; daa[11*20+8] = 0.697264; daa[11*20+9] = 0.159069; daa[11*20+10] = 0.137500;
+
+
+	    daa[12*20+0] = 1.124035; daa[12*20+1] = 0.484133; daa[12*20+2] = 0.371004; daa[12*20+3] = 0.025548; daa[12*20+4] = 0.893680; daa[12*20+5] = 1.672569; daa[12*20+6] = 0.173735; 
+	    daa[12*20+7] = 0.139538; daa[12*20+8] = 0.442472; daa[12*20+9] = 4.273607; daa[12*20+10] = 6.312358; daa[12*20+11] = 0.656604;
+
+	    daa[13*20+0] = 0.253701; daa[13*20+1] = 0.052722;daa[13*20+2] = 0.089525; daa[13*20+3] = 0.017416; daa[13*20+4] = 1.105251; daa[13*20+5] = 0.035855; daa[13*20+6] = 0.018811; 
+	    daa[13*20+7] = 0.089586; daa[13*20+8] = 0.682139; daa[13*20+9] = 1.112727; daa[13*20+10] = 2.592692; daa[13*20+11] = 0.023918; daa[13*20+12] = 1.798853;
+
+	    daa[14*20+0] = 1.177651; daa[14*20+1] = 0.332533;daa[14*20+2] = 0.161787; daa[14*20+3] = 0.394456; daa[14*20+4] = 0.075382; daa[14*20+5] = 0.624294; daa[14*20+6] = 0.419409; 
+	    daa[14*20+7] = 0.196961; daa[14*20+8] = 0.508851; daa[14*20+9] = 0.078281; daa[14*20+10] = 0.249060; daa[14*20+11] = 0.390322; daa[14*20+12] = 0.099849; 
+	    daa[14*20+13] = 0.094464;
+ 
+	    daa[15*20+0] = 4.727182; daa[15*20+1] = 0.858151;daa[15*20+2] = 4.008358; daa[15*20+3] = 1.240275; daa[15*20+4] = 2.784478; daa[15*20+5] = 1.223828; daa[15*20+6] = 0.611973; 
+	    daa[15*20+7] = 1.739990; daa[15*20+8] = 0.990012; daa[15*20+9] = 0.064105; daa[15*20+10] = 0.182287; daa[15*20+11] = 0.748683; daa[15*20+12] = 0.346960; 
+	    daa[15*20+13] = 0.361819; daa[15*20+14] = 1.338132;
+ 
+	    daa[16*20+0] = 2.139501; daa[16*20+1] = 0.578987;daa[16*20+2] = 2.000679; daa[16*20+3] = 0.425860; daa[16*20+4] = 1.143480; daa[16*20+5] = 1.080136; daa[16*20+6] = 0.604545; 
+	    daa[16*20+7] = 0.129836; daa[16*20+8] = 0.584262; daa[16*20+9] = 1.033739; daa[16*20+10] = 0.302936; daa[16*20+11] = 1.136863; daa[16*20+12] = 2.020366; 
+	    daa[16*20+13] = 0.165001; daa[16*20+14] = 0.571468; daa[16*20+15] = 6.472279;
+
+	    daa[17*20+0] = 0.180717; daa[17*20+1] = 0.593607;daa[17*20+2] = 0.045376; daa[17*20+3] = 0.029890; daa[17*20+4] = 0.670128; daa[17*20+5] = 0.236199; daa[17*20+6] = 0.077852; 
+	    daa[17*20+7] = 0.268491; daa[17*20+8] = 0.597054; daa[17*20+9] = 0.111660; daa[17*20+10] = 0.619632; daa[17*20+11] = 0.049906; daa[17*20+12] = 0.696175; 
+	    daa[17*20+13] = 2.457121; daa[17*20+14] = 0.095131; daa[17*20+15] = 0.248862; daa[17*20+16] = 0.140825;
+
+	    daa[18*20+0] = 0.218959; daa[18*20+1] = 0.314440;daa[18*20+2] = 0.612025; daa[18*20+3] = 0.135107; daa[18*20+4] = 1.165532; daa[18*20+5] = 0.257336; daa[18*20+6] = 0.120037; 
+	    daa[18*20+7] = 0.054679; daa[18*20+8] = 5.306834; daa[18*20+9] = 0.232523; daa[18*20+10] = 0.299648; daa[18*20+11] = 0.131932; daa[18*20+12] = 0.481306; 
+	    daa[18*20+13] = 7.803902; daa[18*20+14] = 0.089613; daa[18*20+15] = 0.400547; daa[18*20+16] = 0.245841; daa[18*20+17] = 3.151815;
+
+	    daa[19*20+0] = 2.547870; daa[19*20+1] = 0.170887;daa[19*20+2] = 0.083688; daa[19*20+3] = 0.037967; daa[19*20+4] = 1.959291; daa[19*20+5] = 0.210332; daa[19*20+6] = 0.245034; 
+	    daa[19*20+7] = 0.076701; daa[19*20+8] = 0.119013; daa[19*20+9] = 10.649107; daa[19*20+10] = 1.702745; daa[19*20+11] = 0.185202; daa[19*20+12] = 1.898718; 
+	    daa[19*20+13] = 0.654683; daa[19*20+14] = 0.296501; daa[19*20+15] = 0.098369; daa[19*20+16] = 2.188158; daa[19*20+17] = 0.189510; daa[19*20+18] = 0.249313;
+	    
+	    /*f[0] = 0.07906;
+	    f[1] = 0.05594; 
+	    f[2] = 0.04198; 
+	    f[3] = 0.05305; 
+	    f[4] = 0.01294; 
+	    f[5] = 0.04077; 
+	    f[6] = 0.07158; 
+	    f[7] = 0.05734; 
+	    f[8] = 0.02235; 
+	    f[9] = 0.06216; 
+	    f[10] = 0.09908; 
+	    f[11] = 0.06460; 
+	    f[12] = 0.02295; 
+	    f[13] = 0.04230; 
+	    f[14] = 0.04404; 
+	    f[15] = 0.06120; 
+	    f[16] = 0.05329; 
+	    f[17] = 0.01207; 
+	    f[18] = 0.03415; 
+	    f[19] = 0.06915; 	   */
+
+	    f[0] = 0.079066; f[1] = 0.055941; f[2] = 0.041977; f[3] = 0.053052;
+	    f[4] = 0.012937; f[5] = 0.040767; f[6] = 0.071586; f[7] = 0.057337;
+	    f[8] = 0.022355; f[9] = 0.062157; f[10] = 0.099081; f[11] = 0.064600;
+	    f[12] = 0.022951; f[13] = 0.042302; f[14] = 0.044040; f[15] = 0.061197;
+	    f[16] = 0.053287; f[17] = 0.012066; f[18] = 0.034155; f[19] = 0.069146;
+	  }	  
+	  break;
+	case LG4M:
+	  {
+	    double 
+	      rates[4][190] = 
+	      {
+		{
+		  0.269343
+		  , 0.254612, 0.150988
+		  , 0.236821, 0.031863, 0.659648
+		  , 2.506547, 0.938594, 0.975736, 0.175533
+		  , 0.359080, 0.348288, 0.697708, 0.086573, 0.095967
+		  , 0.304674, 0.156000, 0.377704, 0.449140, 0.064706, 4.342595
+		  , 1.692015, 0.286638, 0.565095, 0.380358, 0.617945, 0.202058, 0.264342
+		  , 0.251974, 0.921633, 1.267609, 0.309692, 0.390429, 2.344059, 0.217750, 0.104842
+		  , 1.085220, 0.325624, 0.818658, 0.037814, 1.144150, 0.534567, 0.222793, 0.062682, 0.567431
+		  , 0.676353, 0.602366, 0.217027, 0.007533, 1.595775, 0.671143, 0.158424, 0.070463, 0.764255, 8.226528
+		  , 0.179155, 0.971338, 1.343718, 0.133744, 0.122468, 0.983857, 0.994128, 0.220916, 0.410581, 0.387487, 0.181110
+		  , 1.636817, 0.515217, 0.670461, 0.071252, 1.534848, 5.288642, 0.255628, 0.094198, 0.257229, 25.667158, 6.819689, 1.591212
+		  , 0.235498, 0.123932, 0.099793, 0.030425, 0.897279, 0.112229, 0.022529, 0.047488, 0.762914, 1.344259, 0.865691, 0.038921, 2.030833
+		  , 1.265605, 0.040163, 0.173354, 0.027579, 0.259961, 0.580374, 0.088041, 0.145595, 0.143676, 0.298859, 1.020117, 0.000714, 0.190019, 0.093964
+		  , 5.368405, 0.470952, 5.267140, 0.780505, 4.986071, 0.890554, 0.377949, 1.755515, 0.786352, 0.527246, 0.667783, 0.659948, 0.731921, 0.837669, 1.355630
+		  , 1.539394, 0.326789, 1.688169, 0.283738, 1.389282, 0.329821, 0.231770, 0.117017, 0.449977, 3.531600, 0.721586, 0.497588, 2.691697, 0.152088, 0.698040, 16.321298
+		  , 0.140944, 0.375611, 0.025163, 0.002757, 0.801456, 0.257253, 0.103678, 0.132995, 0.345834, 0.377156, 0.839647, 0.176970, 0.505682, 1.670170, 0.091298, 0.210096, 0.013165
+		  , 0.199836, 0.146857, 0.806275, 0.234246, 1.436970, 0.319669, 0.010076, 0.036859, 3.503317, 0.598632, 0.738969, 0.154436, 0.579000, 4.245524, 0.074524, 0.454195, 0.232913, 1.178490
+		  , 9.435529, 0.285934, 0.395670, 0.130890, 6.097263, 0.516259, 0.503665, 0.222960, 0.149143, 13.666175, 2.988174, 0.162725, 5.973826, 0.843416, 0.597394, 0.701149, 4.680002, 0.300085, 0.416262
+		},
+		{
+		  0.133720
+		  , 0.337212, 0.749052
+		  , 0.110918, 0.105087, 4.773487
+		  , 3.993460, 0.188305, 1.590332, 0.304942
+		  , 0.412075, 2.585774, 1.906884, 0.438367, 0.242076
+		  , 0.435295, 0.198278, 0.296366, 7.470333, 0.008443, 3.295515
+		  , 7.837540, 0.164607, 0.431724, 0.153850, 1.799716, 0.269744, 0.242866
+		  , 0.203872, 2.130334, 9.374479, 1.080878, 0.152458, 12.299133, 0.279589, 0.089714
+		  , 0.039718, 0.024553, 0.135254, 0.014979, 0.147498, 0.033964, 0.005585, 0.007248, 0.022746
+		  , 0.075784, 0.080091, 0.084971, 0.014128, 0.308347, 0.500836, 0.022833, 0.022999, 0.161270, 1.511682
+		  , 0.177662, 10.373708, 1.036721, 0.038303, 0.043030, 2.181033, 0.321165, 0.103050, 0.459502, 0.021215, 0.078395
+		  , 0.420784, 0.192765, 0.329545, 0.008331, 0.883142, 1.403324, 0.168673, 0.160728, 0.612573, 1.520889, 7.763266, 0.307903
+		  , 0.071268, 0.019652, 0.088753, 0.013547, 0.566609, 0.071878, 0.020050, 0.041022, 0.625361, 0.382806, 1.763059, 0.044644, 1.551911
+		  , 0.959127, 1.496585, 0.377794, 0.332010, 0.318192, 1.386970, 0.915904, 0.224255, 2.611479, 0.029351, 0.068250, 1.542356, 0.047525, 0.182715
+		  , 11.721512, 0.359408, 2.399158, 0.219464, 9.104192, 0.767563, 0.235229, 3.621219, 0.971955, 0.033780, 0.043035, 0.236929, 0.319964, 0.124977, 0.840651
+		  , 2.847068, 0.218463, 1.855386, 0.109808, 4.347048, 0.765848, 0.164569, 0.312024, 0.231569, 0.356327, 0.159597, 0.403210, 1.135162, 0.106903, 0.269190, 9.816481
+		  , 0.030203, 0.387292, 0.118878, 0.067287, 0.190240, 0.122113, 0.007023, 0.137411, 0.585141, 0.020634, 0.228824, 0.000122, 0.474862, 3.135128, 0.030313, 0.093830, 0.119152
+		  , 0.067183, 0.130101, 0.348730, 0.061798, 0.301198, 0.095382, 0.095764, 0.044628, 2.107384, 0.046105, 0.100117, 0.017073, 0.192383, 8.367641, 0.000937, 0.137416, 0.044722, 4.179782
+		  , 0.679398, 0.041567, 0.092408, 0.023701, 1.271187, 0.115566, 0.055277, 0.086988, 0.060779, 8.235167, 0.609420, 0.061764, 0.581962, 0.184187, 0.080246, 0.098033, 1.438350, 0.023439, 0.039124
+		},	    
+		{
+		  0.421017
+		  , 0.316236, 0.693340
+		  , 0.285984, 0.059926, 6.158219
+		  , 4.034031, 1.357707, 0.708088, 0.063669
+		  , 0.886972, 2.791622, 1.701830, 0.484347, 0.414286
+		  , 0.760525, 0.233051, 0.378723, 4.032667, 0.081977, 4.940411
+		  , 0.754103, 0.402894, 2.227443, 1.102689, 0.416576, 0.459376, 0.508409
+		  , 0.571422, 2.319453, 5.579973, 0.885376, 1.439275, 4.101979, 0.576745, 0.428799
+		  , 0.162152, 0.085229, 0.095692, 0.006129, 0.490937, 0.104843, 0.045514, 0.004705, 0.098934
+		  , 0.308006, 0.287051, 0.056994, 0.007102, 0.958988, 0.578990, 0.067119, 0.024403, 0.342983, 3.805528
+		  , 0.390161, 7.663209, 1.663641, 0.105129, 0.135029, 3.364474, 0.652618, 0.457702, 0.823674, 0.129858, 0.145630
+		  , 1.042298, 0.364551, 0.293222, 0.037983, 1.486520, 1.681752, 0.192414, 0.070498, 0.222626, 4.529623, 4.781730, 0.665308
+		  , 0.362476, 0.073439, 0.129245, 0.020078, 1.992483, 0.114549, 0.023272, 0.064490, 1.491794, 1.113437, 2.132006, 0.041677, 1.928654
+		  , 1.755491, 0.087050, 0.099325, 0.163817, 0.242851, 0.322939, 0.062943, 0.198698, 0.192904, 0.062948, 0.180283, 0.059655, 0.129323, 0.065778
+		  , 3.975060, 0.893398, 5.496314, 1.397313, 3.575120, 1.385297, 0.576191, 1.733288, 1.021255, 0.065131, 0.129115, 0.600308, 0.387276, 0.446001, 1.298493
+		  , 2.565079, 0.534056, 2.143993, 0.411388, 2.279084, 0.893006, 0.528209, 0.135731, 0.518741, 0.972662, 0.280700, 0.890086, 1.828755, 0.189028, 0.563778, 7.788147
+		  , 0.283631, 0.497926, 0.075454, 0.043794, 1.335322, 0.308605, 0.140137, 0.150797, 1.409726, 0.119868, 0.818331, 0.080591, 1.066017, 3.754687, 0.073415, 0.435046, 0.197272
+		  , 0.242513, 0.199157, 0.472207, 0.085937, 2.039787, 0.262751, 0.084578, 0.032247, 7.762326, 0.153966, 0.299828, 0.117255, 0.438215, 14.506235, 0.089180, 0.352766, 0.215417, 5.054245
+		  , 2.795818, 0.107130, 0.060909, 0.029724, 2.986426, 0.197267, 0.196977, 0.044327, 0.116751, 7.144311, 1.848622, 0.118020, 1.999696, 0.705747, 0.272763, 0.096935, 1.820982, 0.217007, 0.172975
+		},
+		{
+		  0.576160
+		  , 0.567606, 0.498643
+		  , 0.824359, 0.050698, 3.301401
+		  , 0.822724, 4.529235, 1.291808, 0.101930
+		  , 1.254238, 2.169809, 1.427980, 0.449474, 0.868679
+		  , 1.218615, 0.154502, 0.411471, 3.172277, 0.050239, 2.138661
+		  , 1.803443, 0.604673, 2.125496, 1.276384, 1.598679, 0.502653, 0.479490
+		  , 0.516862, 2.874265, 4.845769, 0.719673, 3.825677, 4.040275, 0.292773, 0.596643
+		  , 0.180898, 0.444586, 0.550969, 0.023542, 2.349573, 0.370160, 0.142187, 0.016618, 0.500788
+		  , 0.452099, 0.866322, 0.201033, 0.026731, 2.813990, 1.645178, 0.135556, 0.072152, 1.168817, 5.696116
+		  , 0.664186, 2.902886, 2.101971, 0.127988, 0.200218, 2.505933, 0.759509, 0.333569, 0.623100, 0.547454, 0.363656
+		  , 0.864415, 0.835049, 0.632649, 0.079201, 2.105931, 1.633544, 0.216462, 0.252419, 0.665406, 7.994105, 11.751178, 1.096842
+		  , 0.324478, 0.208947, 0.280339, 0.041683, 4.788477, 0.107022, 0.067711, 0.171320, 3.324779, 2.965328, 5.133843, 0.084856, 4.042591
+		  , 1.073043, 0.173826, 0.041985, 0.270336, 0.121299, 0.351384, 0.228565, 0.225318, 0.376089, 0.058027, 0.390354, 0.214230, 0.058954, 0.126299
+		  , 3.837562, 0.884342, 4.571911, 0.942751, 6.592827, 1.080063, 0.465397, 3.137614, 1.119667, 0.362516, 0.602355, 0.716940, 0.506796, 1.444484, 1.432558
+		  , 2.106026, 0.750016, 2.323325, 0.335915, 1.654673, 1.194017, 0.617231, 0.318671, 0.801030, 4.455842, 0.580191, 1.384210, 3.522468, 0.473128, 0.432718, 5.716300
+		  , 0.163720, 0.818102, 0.072322, 0.068275, 3.305436, 0.373790, 0.054323, 0.476587, 1.100360, 0.392946, 1.703323, 0.085720, 1.725516, 5.436253, 0.053108, 0.498594, 0.231832
+		  , 0.241167, 0.302440, 1.055095, 0.246940, 9.741942, 0.249895, 0.129973, 0.052363, 11.542498, 1.047449, 1.319667, 0.139770, 1.330225, 26.562270, 0.046986, 0.737653, 0.313460, 5.165098
+		  , 1.824586, 0.435795, 0.179086, 0.091739, 3.609570, 0.649507, 0.656681, 0.225234, 0.473437, 19.897252, 3.001995, 0.452926, 3.929598, 1.692159, 0.370204, 0.373501, 3.329822, 0.326593, 0.860743
+		}
+	      };
+	    
+	    double
+	      freqs[4][20] = 
+	      {{0.082276,0.055172,0.043853,0.053484,0.018957,0.028152,0.046679,0.157817,0.033297,0.028284,0.054284,0.025275,0.023665,0.041874,0.063071,0.066501,0.065424,0.023837,0.038633,0.049465},
+	       {0.120900,0.036460,0.026510,0.040410,0.015980,0.021132,0.025191,0.036369,0.015884,0.111029,0.162852,0.024820,0.028023,0.074058,0.012065,0.041963,0.039072,0.012666,0.040478,0.114137},
+	       {0.072639,0.051691,0.038642,0.055580,0.009829,0.031374,0.048731,0.065283,0.023791,0.086640,0.120847,0.052177,0.026728,0.032589,0.039238,0.046748,0.053361,0.008024,0.037426,0.098662},
+	       {0.104843,0.078835,0.043513,0.090498,0.002924,0.066163,0.151640,0.038843,0.022556,0.018383,0.038687,0.104462,0.010166,0.009089,0.066950,0.053667,0.049486,0.004409,0.012924,0.031963}};
+	    
+
+	    makeAASubstMat(daa, f, rates[lg4_index], freqs[lg4_index]);
+
+	    /*int 
+	      i, 
+	      j, 
+	      r = 0;
+	    
+	    for(i = 1; i < 20; i++)
+	      for(j = 0; j < i; j++)
+		{
+		  daa[i * 20 + j] = rates[lg4_index][r];
+		  r++;
+		}
+	    
+	    assert(r == 190);
+	    
+	    for(i = 0; i < 20; i++)
+	    f[i] = freqs[lg4_index][i];	  */
+	    
+	  }
+	  break;
+	case LG4X:
+	  {
+	    double 
+	      rates[4][190] = 
+	      {
+		{
+		  0.295719,
+		  0.067388, 0.448317,
+		  0.253712, 0.457483, 2.358429,
+		  1.029289, 0.576016, 0.251987, 0.189008,
+		  0.107964, 1.741924, 0.216561, 0.599450, 0.029955,
+		  0.514644, 0.736017, 0.503084, 109.901504, 0.084794, 4.117654,
+		  10.868848, 0.704334, 0.435271, 1.070052, 1.862626, 0.246260, 1.202023,
+		  0.380498, 5.658311, 4.873453, 5.229858, 0.553477, 6.508329, 1.634845, 0.404968,
+		  0.084223, 0.123387, 0.090748, 0.052764, 0.151733, 0.054187, 0.060194, 0.048984, 0.204296,
+		  0.086976, 0.221777, 0.033310, 0.021407, 0.230320, 0.195703, 0.069359, 0.069963, 0.504221, 1.495537,
+		  0.188789, 93.433377, 0.746537, 0.621146, 0.096955, 1.669092, 2.448827, 0.256662, 1.991533, 0.091940, 0.122332,
+		  0.286389, 0.382175, 0.128905, 0.081091, 0.352526, 0.810168, 0.232297, 0.228519, 0.655465, 1.994320, 3.256485, 0.457430,
+		  0.155567, 0.235965, 0.127321, 0.205164, 0.590018, 0.066081, 0.064822, 0.241077, 6.799829, 0.754940, 2.261319, 0.163849, 1.559944,
+		  1.671061, 6.535048, 0.904011, 5.164456, 0.386853, 2.437439, 3.537387, 4.320442, 11.291065, 0.170343, 0.848067, 5.260446, 0.426508, 0.438856,
+		  2.132922, 0.525521, 0.939733, 0.747330, 1.559564, 0.165666, 0.435384, 3.656545, 0.961142, 0.050315, 0.064441, 0.360946, 0.132547, 0.306683, 4.586081,
+		  0.529591, 0.303537, 0.435450, 0.308078, 0.606648, 0.106333, 0.290413, 0.290216, 0.448965, 0.372166, 0.102493, 0.389413, 0.498634, 0.109129, 2.099355, 3.634276,
+		  0.115551, 0.641259, 0.046646, 0.260889, 0.587531, 0.093417, 0.280695, 0.307466, 6.227274, 0.206332, 0.459041, 0.033291, 0.559069, 18.392863, 0.411347, 0.101797, 0.034710,
+		  0.102453, 0.289466, 0.262076, 0.185083, 0.592318, 0.035149, 0.105999, 0.096556, 20.304886, 0.097050, 0.133091, 0.115301, 0.264728, 66.647302, 0.476350, 0.148995, 0.063603, 20.561407,
+		  0.916683, 0.102065, 0.043986, 0.080708, 0.885230, 0.072549, 0.206603, 0.306067, 0.205944, 5.381403, 0.561215, 0.112593, 0.693307, 0.400021, 0.584622, 0.089177, 0.755865, 0.133790, 0.154902
+		},
+		{
+		  0.066142,
+		  0.590377, 0.468325,
+		  0.069930, 0.013688, 2.851667,
+		  9.850951, 0.302287, 3.932151, 0.146882,
+		  1.101363, 1.353957, 8.159169, 0.249672, 0.582670,
+		  0.150375, 0.028386, 0.219934, 0.560142, 0.005035, 3.054085,
+		  0.568586, 0.037750, 0.421974, 0.046719, 0.275844, 0.129551, 0.037250,
+		  0.051668, 0.262130, 2.468752, 0.106259, 0.098208, 4.210126, 0.029788, 0.013513,
+		  0.127170, 0.016923, 0.344765, 0.003656, 0.445038, 0.165753, 0.008541, 0.002533, 0.031779,
+		  0.292429, 0.064289, 0.210724, 0.004200, 1.217010, 1.088704, 0.014768, 0.005848, 0.064558, 7.278994,
+		  0.071458, 0.855973, 1.172204, 0.014189, 0.033969, 1.889645, 0.125869, 0.031390, 0.065585, 0.029917, 0.042762,
+		  1.218562, 0.079621, 0.763553, 0.009876, 1.988516, 3.344809, 0.056702, 0.021612, 0.079927, 7.918203, 14.799537, 0.259400,
+		  0.075144, 0.011169, 0.082464, 0.002656, 0.681161, 0.111063, 0.004186, 0.004854, 0.095591, 0.450964, 1.506485, 0.009457, 1.375871,
+		  7.169085, 0.161937, 0.726566, 0.040244, 0.825960, 2.067758, 0.110993, 0.129497, 0.196886, 0.169797, 0.637893, 0.090576, 0.457399, 0.143327,
+		  30.139501, 0.276530, 11.149790, 0.267322, 18.762977, 3.547017, 0.201148, 0.976631, 0.408834, 0.104288, 0.123793, 0.292108, 0.598048, 0.328689, 3.478333,
+		  13.461692, 0.161053, 4.782635, 0.053740, 11.949233, 2.466507, 0.139705, 0.053397, 0.126088, 1.578530, 0.641351, 0.297913, 4.418398, 0.125011, 2.984862, 13.974326,
+		  0.021372, 0.081472, 0.058046, 0.006597, 0.286794, 0.188236, 0.009201, 0.019475, 0.037226, 0.015909, 0.154810, 0.017172, 0.239749, 0.562720, 0.061299, 0.154326, 0.060703,
+		  0.045779, 0.036742, 0.498072, 0.027639, 0.534219, 0.203493, 0.012095, 0.004964, 0.452302, 0.094365, 0.140750, 0.021976, 0.168432, 1.414883, 0.077470, 0.224675, 0.123480, 0.447011,
+		  4.270235, 0.030342, 0.258487, 0.012745, 4.336817, 0.281953, 0.043812, 0.015539, 0.016212, 16.179952, 3.416059, 0.032578, 2.950318, 0.227807, 1.050562, 0.112000, 5.294490, 0.033381, 0.045528
+		},	    
+		{
+		  0.733336,
+		  0.558955, 0.597671,
+		  0.503360, 0.058964, 5.581680,
+		  4.149599, 2.863355, 1.279881, 0.225860,
+		  1.415369, 2.872594, 1.335650, 0.434096, 1.043232,
+		  1.367574, 0.258365, 0.397108, 2.292917, 0.209978, 4.534772,
+		  1.263002, 0.366868, 1.840061, 1.024707, 0.823594, 0.377181, 0.496780,
+		  0.994098, 2.578946, 5.739035, 0.821921, 3.039380, 4.877840, 0.532488, 0.398817,
+		  0.517204, 0.358350, 0.284730, 0.027824, 1.463390, 0.370939, 0.232460, 0.008940, 0.349195,
+		  0.775054, 0.672023, 0.109781, 0.021443, 1.983693, 1.298542, 0.169219, 0.043707, 0.838324, 5.102837,
+		  0.763094, 5.349861, 1.612642, 0.088850, 0.397640, 3.509873, 0.755219, 0.436013, 0.888693, 0.561690, 0.401070,
+		  1.890137, 0.691594, 0.466979, 0.060820, 2.831098, 2.646440, 0.379926, 0.087640, 0.488389, 7.010411, 8.929538, 1.357738,
+		  0.540460, 0.063347, 0.141582, 0.018288, 4.102068, 0.087872, 0.020447, 0.064863, 1.385133, 3.054968, 5.525874, 0.043394, 3.135353,
+		  0.200122, 0.032875, 0.019509, 0.042687, 0.059723, 0.072299, 0.023282, 0.036426, 0.050226, 0.039318, 0.067505, 0.023126, 0.012695, 0.015631,
+		  4.972745, 0.821562, 4.670980, 1.199607, 5.901348, 1.139018, 0.503875, 1.673207, 0.962470, 0.204155, 0.273372, 0.567639, 0.570771, 0.458799, 0.233109,
+		  1.825593, 0.580847, 1.967383, 0.420710, 2.034980, 0.864479, 0.577513, 0.124068, 0.502294, 2.653232, 0.437116, 1.048288, 2.319555, 0.151684, 0.077004, 8.113282,
+		  0.450842, 0.661866, 0.088064, 0.037642, 2.600668, 0.390688, 0.109318, 0.218118, 1.065585, 0.564368, 1.927515, 0.120994, 1.856122, 4.154750, 0.011074, 0.377578, 0.222293,
+		  0.526135, 0.265730, 0.581928, 0.141233, 5.413080, 0.322761, 0.153776, 0.039217, 8.351808, 0.854294, 0.940458, 0.180650, 0.975427, 11.429924, 0.026268, 0.429221, 0.273138, 4.731579,
+		  3.839269, 0.395134, 0.145401, 0.090101, 4.193725, 0.625409, 0.696533, 0.104335, 0.377304, 15.559906, 2.508169, 0.449074, 3.404087, 1.457957, 0.052132, 0.260296, 2.903836, 0.564762, 0.681215
+		},
+		{
+		  0.658412,
+		  0.566269, 0.540749,
+		  0.854111, 0.058015, 3.060574,
+		  0.884454, 5.851132, 1.279257, 0.160296,
+		  1.309554, 2.294145, 1.438430, 0.482619, 0.992259,
+		  1.272639, 0.182966, 0.431464, 2.992763, 0.086318, 2.130054,
+		  1.874713, 0.684164, 2.075952, 1.296206, 2.149634, 0.571406, 0.507160,
+		  0.552007, 3.192521, 4.840271, 0.841829, 5.103188, 4.137385, 0.351381, 0.679853,
+		  0.227683, 0.528161, 0.644656, 0.031467, 3.775817, 0.437589, 0.189152, 0.025780, 0.665865,
+		  0.581512, 1.128882, 0.266076, 0.048542, 3.954021, 2.071689, 0.217780, 0.082005, 1.266791, 8.904999,
+		  0.695190, 3.010922, 2.084975, 0.132774, 0.190734, 2.498630, 0.767361, 0.326441, 0.680174, 0.652629, 0.440178,
+		  0.967985, 1.012866, 0.720060, 0.133055, 1.776095, 1.763546, 0.278392, 0.343977, 0.717301, 10.091413, 14.013035, 1.082703,
+		  0.344015, 0.227296, 0.291854, 0.056045, 4.495841, 0.116381, 0.092075, 0.195877, 4.001286, 2.671718, 5.069337, 0.091278, 4.643214,
+		  0.978992, 0.156635, 0.028961, 0.209188, 0.264277, 0.296578, 0.177263, 0.217424, 0.362942, 0.086367, 0.539010, 0.172734, 0.121821, 0.161015,
+		  3.427163, 0.878405, 4.071574, 0.925172, 7.063879, 1.033710, 0.451893, 3.057583, 1.189259, 0.359932, 0.742569, 0.693405, 0.584083, 1.531223, 1.287474,
+		  2.333253, 0.802754, 2.258357, 0.360522, 2.221150, 1.283423, 0.653836, 0.377558, 0.964545, 4.797423, 0.780580, 1.422571, 4.216178, 0.599244, 0.444362, 5.231362,
+		  0.154701, 0.830884, 0.073037, 0.094591, 3.017954, 0.312579, 0.074620, 0.401252, 1.350568, 0.336801, 1.331875, 0.068958, 1.677263, 5.832025, 0.076328, 0.548763, 0.208791,
+		  0.221089, 0.431617, 1.238426, 0.313945, 8.558815, 0.305772, 0.181992, 0.072258, 12.869737, 1.021885, 1.531589, 0.163829, 1.575754, 33.873091, 0.079916, 0.831890, 0.307846, 5.910440,
+		  2.088785, 0.456530, 0.199728, 0.118104, 4.310199, 0.681277, 0.752277, 0.241015, 0.531100, 23.029406, 4.414850, 0.481711, 5.046403, 1.914768, 0.466823, 0.382271, 3.717971, 0.282540, 0.964421
+		}
+	      };
+
+	    double
+	      freqs[4][20] = 
+	      {{0.147383 , 0.017579 , 0.058208 , 0.017707 , 0.026331 , 0.041582 , 0.017494 , 0.027859 , 0.011849 , 0.076971 , 
+		0.147823 , 0.019535 , 0.037132 , 0.029940 , 0.008059 , 0.088179 , 0.089653 , 0.006477 , 0.032308 , 0.097931},
+	       {0.063139 , 0.066357 , 0.011586 , 0.066571 , 0.010800 , 0.009276 , 0.053984 , 0.146986 , 0.034214 , 0.088822 , 
+		0.098196 , 0.032390 , 0.021263 , 0.072697 , 0.016761 , 0.020711 , 0.020797 , 0.025463 , 0.045615 , 0.094372},
+	       {0.062457 , 0.066826 , 0.049332 , 0.065270 , 0.006513 , 0.041231 , 0.058965 , 0.080852 , 0.028024 , 0.037024 , 
+		0.075925 , 0.064131 , 0.019620 , 0.028710 , 0.104579 , 0.056388 , 0.062027 , 0.008241 , 0.033124 , 0.050760},
+	       {0.106471 , 0.074171 , 0.044513 , 0.096390 , 0.002148 , 0.066733 , 0.158908 , 0.037625 , 0.020691 , 0.014608 , 
+		0.028797 , 0.105352 , 0.007864 , 0.007477 , 0.083595 , 0.055726 , 0.047711 , 0.003975 , 0.010088 , 0.027159}};
+	    
+
+	    makeAASubstMat(daa, f, rates[lg4_index], freqs[lg4_index]);
+	    
+	    /* Disabled: presumably superseded by the makeAASubstMat() call above.
+	      int 
+	      i, 
+	      j, 
+	      r = 0;
+
+	    for(i = 1; i < 20; i++)
+	      for(j = 0; j < i; j++)
+		{
+		  daa[i * 20 + j] = rates[lg4_index][r];
+		  r++;
+		}
+	  
+	    assert(r == 190);
+	    
+	    for(i = 0; i < 20; i++)
+	    f[i] = freqs[lg4_index][i];	  */
+	    
+	  }
+	  break;
+	case STMTREV:
+	  {
+	    double rates[190] =
+	      {
+		0.1159435373,  
+		0.2458816714, 0.1355713516, 
+		0.9578712472, 0.0775041665, 8.4408676914, 
+		0.2327281954, 9.1379470330, 0.1137687264, 0.0582110367, 
+		0.3309250853, 5.2854173238, 0.1727184754, 0.8191776581, 0.0009722083, 
+		0.6946680829, 0.0966719296, 0.2990806606, 7.3729791633, 0.0005604799, 3.5773486727, 
+		2.8076062202, 3.0815651393, 0.5575702616, 2.2627839242, 1.1721237455, 0.0482085663, 3.3184632572,  
+		0.2275494971, 2.8251848421, 9.5228608030, 2.3191131858, 0.0483235836, 4.4138715270, 0.0343694246, 0.0948383460, 
+		0.0627691644, 0.5712158076, 0.2238609194, 0.0205779319, 0.1527276944, 0.0206129952, 0.0328079744, 0.1239000315, 0.0802374651, 
+		0.0305818840, 0.1930408758, 0.0540967250, 0.0018843293, 0.2406073246, 0.3299454620, 0.0373753435, 0.0005918940, 0.1192904610, 1.3184058362, 
+		0.2231434272, 6.0541970908, 4.3977466558, 0.1347413792, 0.0001480536, 5.2864094506, 6.8883522181, 0.5345755286, 0.3991624551, 0.2107928508, 0.1055933141, 
+		0.1874527991, 0.2427875732, 0.0433577842, 0.0000022173, 0.0927357503, 0.0109238300, 0.0663619185, 0.0128777966, 0.0722334577, 4.3016010974, 1.1493262595, 0.4773694701, 
+		0.0458112245, 0.0310030750, 0.0233493970, 0.0000080023, 0.8419347601, 0.0027817812, 0.0361207581, 0.0490593583, 0.0197089530, 0.3634155844, 2.1032860162, 0.0861057517, 0.1735660361, 
+		1.5133910481, 0.7858555362, 0.3000131148, 0.3337627573, 0.0036260499, 1.5386413234, 0.5196922389, 0.0221252552, 1.0171151697, 0.0534088166, 6.0377879080, 0.4350064365, 0.1634497017, 0.3545179411, 
+		2.3008246523, 0.7625702322, 1.9431704326, 0.6961369276, 2.3726544756, 0.1837198343, 0.9087013201, 2.5477016916, 0.3081949928, 0.1713464632, 2.7297706102, 0.3416923226, 0.0730798705, 4.0107845583, 8.4630191575, 
+		4.3546170435, 1.0655012755, 1.6534489471, 0.0985354973, 0.1940108923, 0.3415280861, 0.2794040892, 0.1657005971, 0.2704552047, 2.3418182855, 0.0426297282, 1.2152488582, 4.6553742047, 0.0068797851, 1.1613183519, 2.2213527952, 
+		0.0565037747, 6.7852754661, 0.0000010442, 0.0000002842, 0.9529353202, 0.0009844045, 0.0002705734, 0.5068170211, 0.0000932799, 0.0050518699, 0.3163744815, 0.0000023280, 0.1010587493, 0.2890102379, 0.0041564377, 0.0495269526, 0.0002026765, 
+		0.0358664532, 0.0714121777, 0.3036789915, 1.3220740967, 1.7972997876, 0.0066458178, 0.3052655031, 0.0174305437, 21.9842817264, 0.1070890246, 0.0770894218, 0.1929529483, 0.0561599188, 1.6748429971, 0.0021338646, 1.8890678523, 0.2834320440, 0.3134203648, 
+		3.2116908598, 0.0108028571, 0.0860833645, 0.0426724431, 0.3652373073, 0.0287789552, 0.1484349765, 0.5158740953, 0.0059791370, 3.3648305163, 0.8763855707, 0.0776875418, 0.9145670668, 0.3963331926, 0.1080226203, 0.0640951379, 0.2278998021, 0.0388755869, 0.1836950254};
+	    
+	    double
+	      freqs[20] = {0.0461811000, 0.0534080000, 0.0361971000, 0.0233326000, 0.0234170000, 0.0390397000, 0.0341284001, 0.0389164000, 0.0164640000, 0.0891534000, 
+			   0.1617310001, 0.0551341000, 0.0233262000, 0.0911252000, 0.0344713001, 0.0771077000, 0.0418603001, 0.0200784000, 0.0305429000, 0.0643851996};  
+	    
+	    makeAASubstMat(daa, f, rates, freqs);	    
+	  }
+	  break;
+	case MTART:
+	  {	   
+	    daa[1*20+0]=   0.2;
+	    daa[2*20+0]=   0.2;
+           daa[2*20+1]=   0.2;
+           daa[3*20+0]=   1;
+           daa[3*20+1]=   4;
+           daa[3*20+2]=   500;
+           daa[4*20+0]=   254;
+           daa[4*20+1]=   36;
+           daa[4*20+2]=   98;
+           daa[4*20+3]=   11;
+           daa[5*20+0]=   0.2;
+           daa[5*20+1]=   154;
+           daa[5*20+2]=   262;
+           daa[5*20+3]=   0.2;
+           daa[5*20+4]=   0.2;
+           daa[6*20+0]=   0.2;
+           daa[6*20+1]=   0.2;
+           daa[6*20+2]=   183;
+           daa[6*20+3]=   862;
+           daa[6*20+4]=   0.2;
+           daa[6*20+5]=   262;
+           daa[7*20+0]=   200;
+           daa[7*20+1]=   0.2;
+           daa[7*20+2]=   121;
+           daa[7*20+3]=   12;
+           daa[7*20+4]=   81;
+           daa[7*20+5]=   3;
+           daa[7*20+6]=   44;
+           daa[8*20+0]=   0.2;
+           daa[8*20+1]=   41;
+           daa[8*20+2]=   180;
+           daa[8*20+3]=   0.2;
+           daa[8*20+4]=   12;
+           daa[8*20+5]=   314;
+           daa[8*20+6]=   15;
+           daa[8*20+7]=   0.2;
+           daa[9*20+0]=   26;
+           daa[9*20+1]=   2;
+           daa[9*20+2]=   21;
+           daa[9*20+3]=   7;
+           daa[9*20+4]=   63;
+           daa[9*20+5]=   11;
+           daa[9*20+6]=   7;
+           daa[9*20+7]=   3;
+           daa[9*20+8]=   0.2;
+           daa[10*20+0]=  4;
+           daa[10*20+1]=  2;
+           daa[10*20+2]=  13;
+           daa[10*20+3]=  1;
+           daa[10*20+4]=  79;
+           daa[10*20+5]=  16;
+           daa[10*20+6]=  2;
+           daa[10*20+7]=  1;
+           daa[10*20+8]=  6;
+           daa[10*20+9]=  515;
+           daa[11*20+0]=  0.2;
+           daa[11*20+1]=  209;
+           daa[11*20+2]=  467;
+           daa[11*20+3]=  2;
+           daa[11*20+4]=  0.2;
+           daa[11*20+5]=  349;
+           daa[11*20+6]=  106;
+           daa[11*20+7]=  0.2;
+           daa[11*20+8]=  0.2;
+           daa[11*20+9]=  3;
+           daa[11*20+10]= 4;
+           daa[12*20+0]=  121;
+           daa[12*20+1]=  5;
+           daa[12*20+2]=  79;
+           daa[12*20+3]=  0.2;
+           daa[12*20+4]=  312;
+           daa[12*20+5]=  67;
+           daa[12*20+6]=  0.2;
+           daa[12*20+7]=  56;
+           daa[12*20+8]=  0.2;
+           daa[12*20+9]=  515;
+           daa[12*20+10]= 885;
+           daa[12*20+11]= 106;
+           daa[13*20+0]=  13;
+           daa[13*20+1]=  5;
+           daa[13*20+2]=  20;
+           daa[13*20+3]=  0.2;
+           daa[13*20+4]=  184;
+           daa[13*20+5]=  0.2;
+           daa[13*20+6]=  0.2;
+           daa[13*20+7]=  1;
+           daa[13*20+8]=  14;
+           daa[13*20+9]=  118;
+           daa[13*20+10]= 263;
+           daa[13*20+11]= 11;
+           daa[13*20+12]= 322;
+           daa[14*20+0]=  49;
+           daa[14*20+1]=  0.2;
+           daa[14*20+2]=  17;
+           daa[14*20+3]=  0.2;
+           daa[14*20+4]=  0.2;
+           daa[14*20+5]=  39;
+           daa[14*20+6]=  8;
+           daa[14*20+7]=  0.2;
+           daa[14*20+8]=  1;
+           daa[14*20+9]=  0.2;
+           daa[14*20+10]= 12;
+           daa[14*20+11]= 17;
+           daa[14*20+12]= 5;
+           daa[14*20+13]= 15;
+           daa[15*20+0]=  673;
+           daa[15*20+1]=  3;
+           daa[15*20+2]=  398;
+           daa[15*20+3]=  44;
+           daa[15*20+4]=  664;
+           daa[15*20+5]=  52;
+           daa[15*20+6]=  31;
+           daa[15*20+7]=  226;
+           daa[15*20+8]=  11;
+           daa[15*20+9]=  7;
+           daa[15*20+10]= 8;
+           daa[15*20+11]= 144;
+           daa[15*20+12]= 112;
+           daa[15*20+13]= 36;
+           daa[15*20+14]= 87;
+           daa[16*20+0]=  244;
+           daa[16*20+1]=  0.2;
+           daa[16*20+2]=  166;
+           daa[16*20+3]=  0.2;
+           daa[16*20+4]=  183;
+           daa[16*20+5]=  44;
+           daa[16*20+6]=  43;
+           daa[16*20+7]=  0.2;
+           daa[16*20+8]=  19;
+           daa[16*20+9]=  204;
+           daa[16*20+10]= 48;
+           daa[16*20+11]= 70;
+           daa[16*20+12]= 289;
+           daa[16*20+13]= 14;
+           daa[16*20+14]= 47;
+           daa[16*20+15]= 660;
+           daa[17*20+0]=  0.2;
+           daa[17*20+1]=  0.2;
+           daa[17*20+2]=  8;
+           daa[17*20+3]=  0.2;
+           daa[17*20+4]=  22;
+           daa[17*20+5]=  7;
+           daa[17*20+6]=  11;
+           daa[17*20+7]=  2;
+           daa[17*20+8]=  0.2;
+           daa[17*20+9]=  0.2;
+           daa[17*20+10]= 21;
+           daa[17*20+11]= 16;
+           daa[17*20+12]= 71;
+           daa[17*20+13]= 54;
+           daa[17*20+14]= 0.2;
+           daa[17*20+15]= 2;
+           daa[17*20+16]= 0.2;
+           daa[18*20+0]=  1;
+           daa[18*20+1]=  4;
+           daa[18*20+2]=  251;
+           daa[18*20+3]=  0.2;
+           daa[18*20+4]=  72;
+           daa[18*20+5]=  87;
+           daa[18*20+6]=  8;
+           daa[18*20+7]=  9;
+           daa[18*20+8]=  191;
+           daa[18*20+9]=  12;
+           daa[18*20+10]= 20;
+           daa[18*20+11]= 117;
+           daa[18*20+12]= 71;
+           daa[18*20+13]= 792;
+           daa[18*20+14]= 18;
+           daa[18*20+15]= 30;
+           daa[18*20+16]= 46;
+           daa[18*20+17]= 38;
+           daa[19*20+0]=  340;
+           daa[19*20+1]=  0.2;
+           daa[19*20+2]=  23;
+           daa[19*20+3]=  0.2;
+           daa[19*20+4]=  350;
+           daa[19*20+5]=  0.2;
+           daa[19*20+6]=  14;
+           daa[19*20+7]=  3;
+           daa[19*20+8]=  0.2;
+           daa[19*20+9]=  1855;
+           daa[19*20+10]= 85;
+           daa[19*20+11]= 26;
+           daa[19*20+12]= 281;
+           daa[19*20+13]= 52;
+           daa[19*20+14]= 32;
+           daa[19*20+15]= 61;
+           daa[19*20+16]= 544;
+           daa[19*20+17]= 0.2;
+           daa[19*20+18]= 2;
+           
+           f[0]=  0.054116;
+           f[1]=  0.018227;
+           f[2]=  0.039903;
+           f[3]=  0.020160;
+           f[4]=  0.009709;
+           f[5]=  0.018781;
+           f[6]=  0.024289;
+           f[7]=  0.068183;
+           f[8]=  0.024518;
+           f[9]=  0.092638;
+           f[10]= 0.148658;
+           f[11]= 0.021718;
+           f[12]= 0.061453;
+           f[13]= 0.088668;
+           f[14]= 0.041826;
+           f[15]= 0.091030;
+           f[16]= 0.049194;
+           f[17]= 0.029786;
+           f[18]= 0.039443;
+           f[19]= 0.057700;
+	  }
+	  break;
+	case MTZOA:
+	  {
+           daa[1*20+0]=   3.3;
+           daa[2*20+0]=   1.7;
+           daa[2*20+1]=   33.6;
+           daa[3*20+0]=   16.1;
+           daa[3*20+1]=   3.2;
+           daa[3*20+2]=   617.0;
+           daa[4*20+0]=   272.5;
+           daa[4*20+1]=   61.1;
+           daa[4*20+2]=   94.6;
+           daa[4*20+3]=   9.5;
+           daa[5*20+0]=   7.3;
+           daa[5*20+1]=   231.0;
+           daa[5*20+2]=   190.3;
+           daa[5*20+3]=   19.3;
+           daa[5*20+4]=   49.1;
+           daa[6*20+0]=   17.1;
+           daa[6*20+1]=   6.4;
+           daa[6*20+2]=   174.0;
+           daa[6*20+3]=   883.6;
+           daa[6*20+4]=   3.4;
+           daa[6*20+5]=   349.4;
+           daa[7*20+0]=   289.3;
+           daa[7*20+1]=   7.2;
+           daa[7*20+2]=   99.3;
+           daa[7*20+3]=   26.0;
+           daa[7*20+4]=   82.4;
+           daa[7*20+5]=   8.9;
+           daa[7*20+6]=   43.1;
+           daa[8*20+0]=   2.3;
+           daa[8*20+1]=   61.7;
+           daa[8*20+2]=   228.9;
+           daa[8*20+3]=   55.6;
+           daa[8*20+4]=   37.5;
+           daa[8*20+5]=   421.8;
+           daa[8*20+6]=   14.9;
+           daa[8*20+7]=   7.4;
+           daa[9*20+0]=   33.2;
+           daa[9*20+1]=   0.2;
+           daa[9*20+2]=   24.3;
+           daa[9*20+3]=   1.5;
+           daa[9*20+4]=   48.8;
+           daa[9*20+5]=   0.2;
+           daa[9*20+6]=   7.3;
+           daa[9*20+7]=   3.4;
+           daa[9*20+8]=   1.6;
+           daa[10*20+0]=  15.6;
+           daa[10*20+1]=  4.1;
+           daa[10*20+2]=  7.9;
+           daa[10*20+3]=  0.5;
+           daa[10*20+4]=  59.7;
+           daa[10*20+5]=  23.0;
+           daa[10*20+6]=  1.0;
+           daa[10*20+7]=  3.5;
+           daa[10*20+8]=  6.6;
+           daa[10*20+9]=  425.2;
+           daa[11*20+0]=  0.2;
+           daa[11*20+1]=  292.3;
+           daa[11*20+2]=  413.4;
+           daa[11*20+3]=  0.2;
+           daa[11*20+4]=  0.2;
+           daa[11*20+5]=  334.0;
+           daa[11*20+6]=  163.2;
+           daa[11*20+7]=  10.1;
+           daa[11*20+8]=  23.9;
+           daa[11*20+9]=  8.4;
+           daa[11*20+10]= 6.7;
+           daa[12*20+0]=  136.5;
+           daa[12*20+1]=  3.8;
+           daa[12*20+2]=  73.7;
+           daa[12*20+3]=  0.2;
+           daa[12*20+4]=  264.8;
+           daa[12*20+5]=  83.9;
+           daa[12*20+6]=  0.2;
+           daa[12*20+7]=  52.2;
+           daa[12*20+8]=  7.1;
+           daa[12*20+9]=  449.7;
+           daa[12*20+10]= 636.3;
+           daa[12*20+11]= 83.0;
+           daa[13*20+0]=  26.5;
+           daa[13*20+1]=  0.2;
+           daa[13*20+2]=  12.9;
+           daa[13*20+3]=  2.0;
+           daa[13*20+4]=  167.8;
+           daa[13*20+5]=  9.5;
+           daa[13*20+6]=  0.2;
+           daa[13*20+7]=  5.8;
+           daa[13*20+8]=  13.1;
+           daa[13*20+9]=  90.3;
+           daa[13*20+10]= 234.2;
+           daa[13*20+11]= 16.3;
+           daa[13*20+12]= 215.6;
+           daa[14*20+0]=  61.8;
+           daa[14*20+1]=  7.5;
+           daa[14*20+2]=  22.6;
+           daa[14*20+3]=  0.2;
+           daa[14*20+4]=  8.1;
+           daa[14*20+5]=  52.2;
+           daa[14*20+6]=  20.6;
+           daa[14*20+7]=  1.3;
+           daa[14*20+8]=  15.6;
+           daa[14*20+9]=  2.6;
+           daa[14*20+10]= 11.4;
+           daa[14*20+11]= 24.3;
+           daa[14*20+12]= 5.4;
+           daa[14*20+13]= 10.5;
+           daa[15*20+0]=  644.9;
+           daa[15*20+1]=  11.8;
+           daa[15*20+2]=  420.2;
+           daa[15*20+3]=  51.4;
+           daa[15*20+4]=  656.3;
+           daa[15*20+5]=  96.4;
+           daa[15*20+6]=  38.4;
+           daa[15*20+7]=  257.1;
+           daa[15*20+8]=  23.1;
+           daa[15*20+9]=  7.2;
+           daa[15*20+10]= 15.2;
+           daa[15*20+11]= 144.9;
+           daa[15*20+12]= 95.3;
+           daa[15*20+13]= 32.2;
+           daa[15*20+14]= 79.7;
+           daa[16*20+0]=  378.1;
+           daa[16*20+1]=  3.2;
+           daa[16*20+2]=  184.6;
+           daa[16*20+3]=  2.3;
+           daa[16*20+4]=  199.0;
+           daa[16*20+5]=  39.4;
+           daa[16*20+6]=  34.5;
+           daa[16*20+7]=  5.2;
+           daa[16*20+8]=  19.4;
+           daa[16*20+9]=  222.3;
+           daa[16*20+10]= 50.0;
+           daa[16*20+11]= 75.5;
+           daa[16*20+12]= 305.1;
+           daa[16*20+13]= 19.3;
+           daa[16*20+14]= 56.9;
+           daa[16*20+15]= 666.3;
+           daa[17*20+0]=  3.1;
+           daa[17*20+1]=  16.9;
+           daa[17*20+2]=  6.4;
+           daa[17*20+3]=  0.2;
+           daa[17*20+4]=  36.1;
+           daa[17*20+5]=  6.1;
+           daa[17*20+6]=  3.5;
+           daa[17*20+7]=  12.3;
+           daa[17*20+8]=  4.5;
+           daa[17*20+9]=  9.7;
+           daa[17*20+10]= 27.2;
+           daa[17*20+11]= 6.6;
+           daa[17*20+12]= 48.7;
+           daa[17*20+13]= 58.2;
+           daa[17*20+14]= 1.3;
+           daa[17*20+15]= 10.3;
+           daa[17*20+16]= 3.6;
+           daa[18*20+0]=  2.1;
+           daa[18*20+1]=  13.8;
+           daa[18*20+2]=  141.6;
+           daa[18*20+3]=  13.9;
+           daa[18*20+4]=  76.7;
+           daa[18*20+5]=  52.3;
+           daa[18*20+6]=  10.0;
+           daa[18*20+7]=  4.3;
+           daa[18*20+8]=  266.5;
+           daa[18*20+9]=  13.1;
+           daa[18*20+10]= 5.7;
+           daa[18*20+11]= 45.0;
+           daa[18*20+12]= 41.4;
+           daa[18*20+13]= 590.5;
+           daa[18*20+14]= 4.2;
+           daa[18*20+15]= 29.7;
+           daa[18*20+16]= 29.0;
+           daa[18*20+17]= 79.8;
+           daa[19*20+0]=  321.9;
+           daa[19*20+1]=  5.1;
+           daa[19*20+2]=  7.1;
+           daa[19*20+3]=  3.7;
+           daa[19*20+4]=  243.8;
+           daa[19*20+5]=  9.0;
+           daa[19*20+6]=  16.3;
+           daa[19*20+7]=  23.7;
+           daa[19*20+8]=  0.3;
+           daa[19*20+9]=  1710.6;
+           daa[19*20+10]= 126.1;
+           daa[19*20+11]= 11.1;
+           daa[19*20+12]= 279.6;
+           daa[19*20+13]= 59.6;
+           daa[19*20+14]= 17.9;
+           daa[19*20+15]= 49.5;
+           daa[19*20+16]= 396.4;
+           daa[19*20+17]= 13.7;
+           daa[19*20+18]= 15.6;
+           
+           f[0]=  0.069;
+           f[1]=  0.021;
+           f[2]=  0.030;
+           f[3]=  0.020;
+           f[4]=  0.010;
+           f[5]=  0.019;
+           f[6]=  0.025;
+           f[7]=  0.072;
+           f[8]=  0.027;
+           f[9]=  0.085;
+           f[10]= 0.157;
+           f[11]= 0.019;
+           f[12]= 0.051;
+           f[13]= 0.082;
+           f[14]= 0.045;
+           f[15]= 0.081;
+           f[16]= 0.056;
+           f[17]= 0.028;
+           f[18]= 0.037;
+           f[19]= 0.066;
+	  }
+	  break;
+	case PMB:
+	  {
+           daa[1*20+0]=   0.674995699;
+           daa[2*20+0]=   0.589645178;
+           daa[2*20+1]=   1.189067034;
+           daa[3*20+0]=   0.462499504;
+           daa[3*20+1]=   0.605460903;
+           daa[3*20+2]=   3.573373315;
+           daa[4*20+0]=   1.065445546;
+           daa[4*20+1]=   0.31444833;
+           daa[4*20+2]=   0.589852457;
+           daa[4*20+3]=   0.246951424;
+           daa[5*20+0]=   1.111766964;
+           daa[5*20+1]=   2.967840934;
+           daa[5*20+2]=   2.299755865;
+           daa[5*20+3]=   1.686058219;
+           daa[5*20+4]=   0.245163782;
+           daa[6*20+0]=   1.046334652;
+           daa[6*20+1]=   1.201770702;
+           daa[6*20+2]=   1.277836748;
+           daa[6*20+3]=   4.399995525;
+           daa[6*20+4]=   0.091071867;
+           daa[6*20+5]=   4.15967899;
+           daa[7*20+0]=   1.587964372;
+           daa[7*20+1]=   0.523770553;
+           daa[7*20+2]=   1.374854049;
+           daa[7*20+3]=   0.734992057;
+           daa[7*20+4]=   0.31706632;
+           daa[7*20+5]=   0.596789898;
+           daa[7*20+6]=   0.463812837;
+           daa[8*20+0]=   0.580830874;
+           daa[8*20+1]=   1.457127446;
+           daa[8*20+2]=   2.283037894;
+           daa[8*20+3]=   0.839348444;
+           daa[8*20+4]=   0.411543728;
+           daa[8*20+5]=   1.812173605;
+           daa[8*20+6]=   0.877842609;
+           daa[8*20+7]=   0.476331437;
+           daa[9*20+0]=   0.464590585;
+           daa[9*20+1]=   0.35964586;
+           daa[9*20+2]=   0.426069419;
+           daa[9*20+3]=   0.266775558;
+           daa[9*20+4]=   0.417547309;
+           daa[9*20+5]=   0.315256838;
+           daa[9*20+6]=   0.30421529;
+           daa[9*20+7]=   0.180198883;
+           daa[9*20+8]=   0.285186418;
+           daa[10*20+0]=  0.804404505;
+           daa[10*20+1]=  0.520701585;
+           daa[10*20+2]=  0.41009447;
+           daa[10*20+3]=  0.269124919;
+           daa[10*20+4]=  0.450795211;
+           daa[10*20+5]=  0.625792937;
+           daa[10*20+6]=  0.32078471;
+           daa[10*20+7]=  0.259854426;
+           daa[10*20+8]=  0.363981358;
+           daa[10*20+9]=  4.162454693;
+           daa[11*20+0]=  0.831998835;
+           daa[11*20+1]=  4.956476453;
+           daa[11*20+2]=  2.037575629;
+           daa[11*20+3]=  1.114178954;
+           daa[11*20+4]=  0.274163536;
+           daa[11*20+5]=  3.521346591;
+           daa[11*20+6]=  2.415974716;
+           daa[11*20+7]=  0.581001076;
+           daa[11*20+8]=  0.985885486;
+           daa[11*20+9]=  0.374784947;
+           daa[11*20+10]= 0.498011337;
+           daa[12*20+0]=  1.546725076;
+           daa[12*20+1]=  0.81346254;
+           daa[12*20+2]=  0.737846301;
+           daa[12*20+3]=  0.341932741;
+           daa[12*20+4]=  0.618614612;
+           daa[12*20+5]=  2.067388546;
+           daa[12*20+6]=  0.531773639;
+           daa[12*20+7]=  0.465349326;
+           daa[12*20+8]=  0.380925433;
+           daa[12*20+9]=  3.65807012;
+           daa[12*20+10]= 5.002338375;
+           daa[12*20+11]= 0.661095832;
+           daa[13*20+0]=  0.546169219;
+           daa[13*20+1]=  0.303437244;
+           daa[13*20+2]=  0.425193716;
+           daa[13*20+3]=  0.219005213;
+           daa[13*20+4]=  0.669206193;
+           daa[13*20+5]=  0.406042546;
+           daa[13*20+6]=  0.224154698;
+           daa[13*20+7]=  0.35402891;
+           daa[13*20+8]=  0.576231691;
+           daa[13*20+9]=  1.495264661;
+           daa[13*20+10]= 2.392638293;
+           daa[13*20+11]= 0.269496317;
+           daa[13*20+12]= 2.306919847;
+           daa[14*20+0]=  1.241586045;
+           daa[14*20+1]=  0.65577338;
+           daa[14*20+2]=  0.711495595;
+           daa[14*20+3]=  0.775624818;
+           daa[14*20+4]=  0.198679914;
+           daa[14*20+5]=  0.850116543;
+           daa[14*20+6]=  0.794584081;
+           daa[14*20+7]=  0.588254139;
+           daa[14*20+8]=  0.456058589;
+           daa[14*20+9]=  0.366232942;
+           daa[14*20+10]= 0.430073179;
+           daa[14*20+11]= 1.036079005;
+           daa[14*20+12]= 0.337502282;
+           daa[14*20+13]= 0.481144863;
+           daa[15*20+0]=  3.452308792;
+           daa[15*20+1]=  0.910144334;
+           daa[15*20+2]=  2.572577221;
+           daa[15*20+3]=  1.440896785;
+           daa[15*20+4]=  0.99870098;
+           daa[15*20+5]=  1.348272505;
+           daa[15*20+6]=  1.205509425;
+           daa[15*20+7]=  1.402122097;
+           daa[15*20+8]=  0.799966711;
+           daa[15*20+9]=  0.530641901;
+           daa[15*20+10]= 0.402471997;
+           daa[15*20+11]= 1.234648153;
+           daa[15*20+12]= 0.945453716;
+           daa[15*20+13]= 0.613230817;
+           daa[15*20+14]= 1.217683028;
+           daa[16*20+0]=  1.751412803;
+           daa[16*20+1]=  0.89517149;
+           daa[16*20+2]=  1.823161023;
+           daa[16*20+3]=  0.994227284;
+           daa[16*20+4]=  0.847312432;
+           daa[16*20+5]=  1.320626678;
+           daa[16*20+6]=  0.949599791;
+           daa[16*20+7]=  0.542185658;
+           daa[16*20+8]=  0.83039281;
+           daa[16*20+9]=  1.114132523;
+           daa[16*20+10]= 0.779827336;
+           daa[16*20+11]= 1.290709079;
+           daa[16*20+12]= 1.551488041;
+           daa[16*20+13]= 0.718895136;
+           daa[16*20+14]= 0.780913179;
+           daa[16*20+15]= 4.448982584;
+           daa[17*20+0]=  0.35011051;
+           daa[17*20+1]=  0.618778365;
+           daa[17*20+2]=  0.422407388;
+           daa[17*20+3]=  0.362495245;
+           daa[17*20+4]=  0.445669347;
+           daa[17*20+5]=  0.72038474;
+           daa[17*20+6]=  0.261258229;
+           daa[17*20+7]=  0.37874827;
+           daa[17*20+8]=  0.72436751;
+           daa[17*20+9]=  0.516260502;
+           daa[17*20+10]= 0.794797115;
+           daa[17*20+11]= 0.43340962;
+           daa[17*20+12]= 0.768395107;
+           daa[17*20+13]= 3.29519344;
+           daa[17*20+14]= 0.499869138;
+           daa[17*20+15]= 0.496334956;
+           daa[17*20+16]= 0.38372361;
+           daa[18*20+0]=  0.573154753;
+           daa[18*20+1]=  0.628599063;
+           daa[18*20+2]=  0.720013799;
+           daa[18*20+3]=  0.436220437;
+           daa[18*20+4]=  0.55626163;
+           daa[18*20+5]=  0.728970584;
+           daa[18*20+6]=  0.50720003;
+           daa[18*20+7]=  0.284727562;
+           daa[18*20+8]=  2.210952064;
+           daa[18*20+9]=  0.570562395;
+           daa[18*20+10]= 0.811019594;
+           daa[18*20+11]= 0.664884513;
+           daa[18*20+12]= 0.93253606;
+           daa[18*20+13]= 5.894735673;
+           daa[18*20+14]= 0.433748126;
+           daa[18*20+15]= 0.593795813;
+           daa[18*20+16]= 0.523549536;
+           daa[18*20+17]= 2.996248013;
+           daa[19*20+0]=  2.063050067;
+           daa[19*20+1]=  0.388680158;
+           daa[19*20+2]=  0.474418852;
+           daa[19*20+3]=  0.275658381;
+           daa[19*20+4]=  0.998911631;
+           daa[19*20+5]=  0.634408285;
+           daa[19*20+6]=  0.527640634;
+           daa[19*20+7]=  0.314700907;
+           daa[19*20+8]=  0.305792277;
+           daa[19*20+9]=  8.002789424;
+           daa[19*20+10]= 2.113077156;
+           daa[19*20+11]= 0.526184203;
+           daa[19*20+12]= 1.737356217;
+           daa[19*20+13]= 0.983844803;
+           daa[19*20+14]= 0.551333603;
+           daa[19*20+15]= 0.507506011;
+           daa[19*20+16]= 1.89965079;
+           daa[19*20+17]= 0.429570747;
+           daa[19*20+18]= 0.716795463;
+           
+           f[0]=  0.076;
+           f[1]=  0.054;
+           f[2]=  0.038;
+           f[3]=  0.045;
+           f[4]=  0.028;
+           f[5]=  0.034;
+           f[6]=  0.053;
+           f[7]=  0.078;
+           f[8]=  0.030;
+           f[9]=  0.060;
+           f[10]= 0.096;
+           f[11]= 0.052;
+           f[12]= 0.022;
+           f[13]= 0.045;
+           f[14]= 0.042;
+           f[15]= 0.068;
+           f[16]= 0.056;
+           f[17]= 0.016;
+           f[18]= 0.036;
+           f[19]= 0.071;
+	  }
+	  break;
+	case HIVB:
+	  {
+           daa[1*20+0]=   0.30750700;
+           daa[2*20+0]=   0.00500000;
+           daa[2*20+1]=   0.29554300;
+           daa[3*20+0]=   1.45504000;
+           daa[3*20+1]=   0.00500000;
+           daa[3*20+2]=   17.66120000;
+           daa[4*20+0]=   0.12375800;
+           daa[4*20+1]=   0.35172100;
+           daa[4*20+2]=   0.08606420;
+           daa[4*20+3]=   0.00500000;
+           daa[5*20+0]=   0.05511280;
+           daa[5*20+1]=   3.42150000;
+           daa[5*20+2]=   0.67205200;
+           daa[5*20+3]=   0.00500000;
+           daa[5*20+4]=   0.00500000;
+           daa[6*20+0]=   1.48135000;
+           daa[6*20+1]=   0.07492180;
+           daa[6*20+2]=   0.07926330;
+           daa[6*20+3]=   10.58720000;
+           daa[6*20+4]=   0.00500000;
+           daa[6*20+5]=   2.56020000;
+           daa[7*20+0]=   2.13536000;
+           daa[7*20+1]=   3.65345000;
+           daa[7*20+2]=   0.32340100;
+           daa[7*20+3]=   2.83806000;
+           daa[7*20+4]=   0.89787100;
+           daa[7*20+5]=   0.06191370;
+           daa[7*20+6]=   3.92775000;
+           daa[8*20+0]=   0.08476130;
+           daa[8*20+1]=   9.04044000;
+           daa[8*20+2]=   7.64585000;
+           daa[8*20+3]=   1.91690000;
+           daa[8*20+4]=   0.24007300;
+           daa[8*20+5]=   7.05545000;
+           daa[8*20+6]=   0.11974000;
+           daa[8*20+7]=   0.00500000;
+           daa[9*20+0]=   0.00500000;
+           daa[9*20+1]=   0.67728900;
+           daa[9*20+2]=   0.68056500;
+           daa[9*20+3]=   0.01767920;
+           daa[9*20+4]=   0.00500000;
+           daa[9*20+5]=   0.00500000;
+           daa[9*20+6]=   0.00609079;
+           daa[9*20+7]=   0.00500000;
+           daa[9*20+8]=   0.10311100;
+           daa[10*20+0]=  0.21525600;
+           daa[10*20+1]=  0.70142700;
+           daa[10*20+2]=  0.00500000;
+           daa[10*20+3]=  0.00876048;
+           daa[10*20+4]=  0.12977700;
+           daa[10*20+5]=  1.49456000;
+           daa[10*20+6]=  0.00500000;
+           daa[10*20+7]=  0.00500000;
+           daa[10*20+8]=  1.74171000;
+           daa[10*20+9]=  5.95879000;
+           daa[11*20+0]=  0.00500000;
+           daa[11*20+1]=  20.45000000;
+           daa[11*20+2]=  7.90443000;
+           daa[11*20+3]=  0.00500000;
+           daa[11*20+4]=  0.00500000;
+           daa[11*20+5]=  6.54737000;
+           daa[11*20+6]=  4.61482000;
+           daa[11*20+7]=  0.52170500;
+           daa[11*20+8]=  0.00500000;
+           daa[11*20+9]=  0.32231900;
+           daa[11*20+10]= 0.08149950;
+           daa[12*20+0]=  0.01866430;
+           daa[12*20+1]=  2.51394000;
+           daa[12*20+2]=  0.00500000;
+           daa[12*20+3]=  0.00500000;
+           daa[12*20+4]=  0.00500000;
+           daa[12*20+5]=  0.30367600;
+           daa[12*20+6]=  0.17578900;
+           daa[12*20+7]=  0.00500000;
+           daa[12*20+8]=  0.00500000;
+           daa[12*20+9]=  11.20650000;
+           daa[12*20+10]= 5.31961000;
+           daa[12*20+11]= 1.28246000;
+           daa[13*20+0]=  0.01412690;
+           daa[13*20+1]=  0.00500000;
+           daa[13*20+2]=  0.00500000;
+           daa[13*20+3]=  0.00500000;
+           daa[13*20+4]=  9.29815000;
+           daa[13*20+5]=  0.00500000;
+           daa[13*20+6]=  0.00500000;
+           daa[13*20+7]=  0.29156100;
+           daa[13*20+8]=  0.14555800;
+           daa[13*20+9]=  3.39836000;
+           daa[13*20+10]= 8.52484000;
+           daa[13*20+11]= 0.03426580;
+           daa[13*20+12]= 0.18802500;
+           daa[14*20+0]=  2.12217000;
+           daa[14*20+1]=  1.28355000;
+           daa[14*20+2]=  0.00739578;
+           daa[14*20+3]=  0.03426580;
+           daa[14*20+4]=  0.00500000;
+           daa[14*20+5]=  4.47211000;
+           daa[14*20+6]=  0.01202260;
+           daa[14*20+7]=  0.00500000;
+           daa[14*20+8]=  2.45318000;
+           daa[14*20+9]=  0.04105930;
+           daa[14*20+10]= 2.07757000;
+           daa[14*20+11]= 0.03138620;
+           daa[14*20+12]= 0.00500000;
+           daa[14*20+13]= 0.00500000;
+           daa[15*20+0]=  2.46633000;
+           daa[15*20+1]=  3.47910000;
+           daa[15*20+2]=  13.14470000;
+           daa[15*20+3]=  0.52823000;
+           daa[15*20+4]=  4.69314000;
+           daa[15*20+5]=  0.11631100;
+           daa[15*20+6]=  0.00500000;
+           daa[15*20+7]=  4.38041000;
+           daa[15*20+8]=  0.38274700;
+           daa[15*20+9]=  1.21803000;
+           daa[15*20+10]= 0.92765600;
+           daa[15*20+11]= 0.50411100;
+           daa[15*20+12]= 0.00500000;
+           daa[15*20+13]= 0.95647200;
+           daa[15*20+14]= 5.37762000;
+           daa[16*20+0]=  15.91830000;
+           daa[16*20+1]=  2.86868000;
+           daa[16*20+2]=  6.88667000;
+           daa[16*20+3]=  0.27472400;
+           daa[16*20+4]=  0.73996900;
+           daa[16*20+5]=  0.24358900;
+           daa[16*20+6]=  0.28977400;
+           daa[16*20+7]=  0.36961500;
+           daa[16*20+8]=  0.71159400;
+           daa[16*20+9]=  8.61217000;
+           daa[16*20+10]= 0.04376730;
+           daa[16*20+11]= 4.67142000;
+           daa[16*20+12]= 4.94026000;
+           daa[16*20+13]= 0.01412690;
+           daa[16*20+14]= 2.01417000;
+           daa[16*20+15]= 8.93107000;
+           daa[17*20+0]=  0.00500000;
+           daa[17*20+1]=  0.99133800;
+           daa[17*20+2]=  0.00500000;
+           daa[17*20+3]=  0.00500000;
+           daa[17*20+4]=  2.63277000;
+           daa[17*20+5]=  0.02665600;
+           daa[17*20+6]=  0.00500000;
+           daa[17*20+7]=  1.21674000;
+           daa[17*20+8]=  0.06951790;
+           daa[17*20+9]=  0.00500000;
+           daa[17*20+10]= 0.74884300;
+           daa[17*20+11]= 0.00500000;
+           daa[17*20+12]= 0.08907800;
+           daa[17*20+13]= 0.82934300;
+           daa[17*20+14]= 0.04445060;
+           daa[17*20+15]= 0.02487280;
+           daa[17*20+16]= 0.00500000;
+           daa[18*20+0]=  0.00500000;
+           daa[18*20+1]=  0.00991826;
+           daa[18*20+2]=  1.76417000;
+           daa[18*20+3]=  0.67465300;
+           daa[18*20+4]=  7.57932000;
+           daa[18*20+5]=  0.11303300;
+           daa[18*20+6]=  0.07926330;
+           daa[18*20+7]=  0.00500000;
+           daa[18*20+8]=  18.69430000;
+           daa[18*20+9]=  0.14816800;
+           daa[18*20+10]= 0.11198600;
+           daa[18*20+11]= 0.00500000;
+           daa[18*20+12]= 0.00500000;
+           daa[18*20+13]= 15.34000000;
+           daa[18*20+14]= 0.03043810;
+           daa[18*20+15]= 0.64802400;
+           daa[18*20+16]= 0.10565200;
+           daa[18*20+17]= 1.28022000;
+           daa[19*20+0]=  7.61428000;
+           daa[19*20+1]=  0.08124540;
+           daa[19*20+2]=  0.02665600;
+           daa[19*20+3]=  1.04793000;
+           daa[19*20+4]=  0.42002700;
+           daa[19*20+5]=  0.02091530;
+           daa[19*20+6]=  1.02847000;
+           daa[19*20+7]=  0.95315500;
+           daa[19*20+8]=  0.00500000;
+           daa[19*20+9]=  17.73890000;
+           daa[19*20+10]= 1.41036000;
+           daa[19*20+11]= 0.26582900;
+           daa[19*20+12]= 6.85320000;
+           daa[19*20+13]= 0.72327400;
+           daa[19*20+14]= 0.00500000;
+           daa[19*20+15]= 0.07492180;
+           daa[19*20+16]= 0.70922600;
+           daa[19*20+17]= 0.00500000;
+           daa[19*20+18]= 0.04105930;
+           
+           /*f[0]=  0.060;
+           f[1]=  0.066;
+           f[2]=  0.044;
+           f[3]=  0.042;
+           f[4]=  0.020;
+           f[5]=  0.054;
+           f[6]=  0.071;
+           f[7]=  0.072;
+           f[8]=  0.022;
+           f[9]=  0.070;
+           f[10]= 0.099;
+           f[11]= 0.057;
+           f[12]= 0.020;
+           f[13]= 0.029;
+           f[14]= 0.046;
+           f[15]= 0.051;
+           f[16]= 0.054;
+           f[17]= 0.033;
+           f[18]= 0.028;
+           f[19]= 0.062;*/
+
+           f[0]= 0.060490222;           f[1]= 0.066039665;           f[2]= 0.044127815;           f[3]= 0.042109048;
+           f[4]= 0.020075899;           f[5]= 0.053606488;           f[6]= 0.071567447;           f[7]= 0.072308239;
+           f[8]= 0.022293943;           f[9]= 0.069730629;           f[10]= 0.098851122;          f[11]= 0.056968211;
+           f[12]= 0.019768318;          f[13]= 0.028809447;          f[14]= 0.046025282;          f[15]= 0.05060433;
+           f[16]= 0.053636813;          f[17]= 0.033011601;          f[18]= 0.028350243;          f[19]= 0.061625237;
+	  }
+	  break;
+	case HIVW:
+	  {
+           daa[1*20+0]=   0.0744808;
+           daa[2*20+0]=   0.6175090;
+           daa[2*20+1]=   0.1602400;
+           daa[3*20+0]=   4.4352100;
+           daa[3*20+1]=   0.0674539;
+           daa[3*20+2]=   29.4087000;
+           daa[4*20+0]=   0.1676530;
+           daa[4*20+1]=   2.8636400;
+           daa[4*20+2]=   0.0604932;
+           daa[4*20+3]=   0.0050000;
+           daa[5*20+0]=   0.0050000;
+           daa[5*20+1]=   10.6746000;
+           daa[5*20+2]=   0.3420680;
+           daa[5*20+3]=   0.0050000;
+           daa[5*20+4]=   0.0050000;
+           daa[6*20+0]=   5.5632500;
+           daa[6*20+1]=   0.0251632;
+           daa[6*20+2]=   0.2015260;
+           daa[6*20+3]=   12.1233000;
+           daa[6*20+4]=   0.0050000;
+           daa[6*20+5]=   3.2065600;
+           daa[7*20+0]=   1.8685000;
+           daa[7*20+1]=   13.4379000;
+           daa[7*20+2]=   0.0604932;
+           daa[7*20+3]=   10.3969000;
+           daa[7*20+4]=   0.0489798;
+           daa[7*20+5]=   0.0604932;
+           daa[7*20+6]=   14.7801000;
+           daa[8*20+0]=   0.0050000;
+           daa[8*20+1]=   6.8440500;
+           daa[8*20+2]=   8.5987600;
+           daa[8*20+3]=   2.3177900;
+           daa[8*20+4]=   0.0050000;
+           daa[8*20+5]=   18.5465000;
+           daa[8*20+6]=   0.0050000;
+           daa[8*20+7]=   0.0050000;
+           daa[9*20+0]=   0.0050000;
+           daa[9*20+1]=   1.3406900;
+           daa[9*20+2]=   0.9870280;
+           daa[9*20+3]=   0.1451240;
+           daa[9*20+4]=   0.0050000;
+           daa[9*20+5]=   0.0342252;
+           daa[9*20+6]=   0.0390512;
+           daa[9*20+7]=   0.0050000;
+           daa[9*20+8]=   0.0050000;
+           daa[10*20+0]=  0.1602400;
+           daa[10*20+1]=  0.5867570;
+           daa[10*20+2]=  0.0050000;
+           daa[10*20+3]=  0.0050000;
+           daa[10*20+4]=  0.0050000;
+           daa[10*20+5]=  2.8904800;
+           daa[10*20+6]=  0.1298390;
+           daa[10*20+7]=  0.0489798;
+           daa[10*20+8]=  1.7638200;
+           daa[10*20+9]=  9.1024600;
+           daa[11*20+0]=  0.5927840;
+           daa[11*20+1]=  39.8897000;
+           daa[11*20+2]=  10.6655000;
+           daa[11*20+3]=  0.8943130;
+           daa[11*20+4]=  0.0050000;
+           daa[11*20+5]=  13.0705000;
+           daa[11*20+6]=  23.9626000;
+           daa[11*20+7]=  0.2794250;
+           daa[11*20+8]=  0.2240600;
+           daa[11*20+9]=  0.8174810;
+           daa[11*20+10]= 0.0050000;
+           daa[12*20+0]=  0.0050000;
+           daa[12*20+1]=  3.2865200;
+           daa[12*20+2]=  0.2015260;
+           daa[12*20+3]=  0.0050000;
+           daa[12*20+4]=  0.0050000;
+           daa[12*20+5]=  0.0050000;
+           daa[12*20+6]=  0.0050000;
+           daa[12*20+7]=  0.0489798;
+           daa[12*20+8]=  0.0050000;
+           daa[12*20+9]=  17.3064000;
+           daa[12*20+10]= 11.3839000;
+           daa[12*20+11]= 4.0956400;
+           daa[13*20+0]=  0.5979230;
+           daa[13*20+1]=  0.0050000;
+           daa[13*20+2]=  0.0050000;
+           daa[13*20+3]=  0.0050000;
+           daa[13*20+4]=  0.3629590;
+           daa[13*20+5]=  0.0050000;
+           daa[13*20+6]=  0.0050000;
+           daa[13*20+7]=  0.0050000;
+           daa[13*20+8]=  0.0050000;
+           daa[13*20+9]=  1.4828800;
+           daa[13*20+10]= 7.4878100;
+           daa[13*20+11]= 0.0050000;
+           daa[13*20+12]= 0.0050000;
+           daa[14*20+0]=  1.0098100;
+           daa[14*20+1]=  0.4047230;
+           daa[14*20+2]=  0.3448480;
+           daa[14*20+3]=  0.0050000;
+           daa[14*20+4]=  0.0050000;
+           daa[14*20+5]=  3.0450200;
+           daa[14*20+6]=  0.0050000;
+           daa[14*20+7]=  0.0050000;
+           daa[14*20+8]=  13.9444000;
+           daa[14*20+9]=  0.0050000;
+           daa[14*20+10]= 9.8309500;
+           daa[14*20+11]= 0.1119280;
+           daa[14*20+12]= 0.0050000;
+           daa[14*20+13]= 0.0342252;
+           daa[15*20+0]=  8.5942000;
+           daa[15*20+1]=  8.3502400;
+           daa[15*20+2]=  14.5699000;
+           daa[15*20+3]=  0.4278810;
+           daa[15*20+4]=  1.1219500;
+           daa[15*20+5]=  0.1602400;
+           daa[15*20+6]=  0.0050000;
+           daa[15*20+7]=  6.2796600;
+           daa[15*20+8]=  0.7251570;
+           daa[15*20+9]=  0.7400910;
+           daa[15*20+10]= 6.1439600;
+           daa[15*20+11]= 0.0050000;
+           daa[15*20+12]= 0.3925750;
+           daa[15*20+13]= 4.2793900;
+           daa[15*20+14]= 14.2490000;
+           daa[16*20+0]=  24.1422000;
+           daa[16*20+1]=  0.9282030;
+           daa[16*20+2]=  4.5420600;
+           daa[16*20+3]=  0.6303950;
+           daa[16*20+4]=  0.0050000;
+           daa[16*20+5]=  0.2030910;
+           daa[16*20+6]=  0.4587430;
+           daa[16*20+7]=  0.0489798;
+           daa[16*20+8]=  0.9595600;
+           daa[16*20+9]=  9.3634500;
+           daa[16*20+10]= 0.0050000;
+           daa[16*20+11]= 4.0480200;
+           daa[16*20+12]= 7.4131300;
+           daa[16*20+13]= 0.1145120;
+           daa[16*20+14]= 4.3370100;
+           daa[16*20+15]= 6.3407900;
+           daa[17*20+0]=  0.0050000;
+           daa[17*20+1]=  5.9656400;
+           daa[17*20+2]=  0.0050000;
+           daa[17*20+3]=  0.0050000;
+           daa[17*20+4]=  5.4989400;
+           daa[17*20+5]=  0.0443298;
+           daa[17*20+6]=  0.0050000;
+           daa[17*20+7]=  2.8258000;
+           daa[17*20+8]=  0.0050000;
+           daa[17*20+9]=  0.0050000;
+           daa[17*20+10]= 1.3703100;
+           daa[17*20+11]= 0.0050000;
+           daa[17*20+12]= 0.0050000;
+           daa[17*20+13]= 0.0050000;
+           daa[17*20+14]= 0.0050000;
+           daa[17*20+15]= 1.1015600;
+           daa[17*20+16]= 0.0050000;
+           daa[18*20+0]=  0.0050000;
+           daa[18*20+1]=  0.0050000;
+           daa[18*20+2]=  5.0647500;
+           daa[18*20+3]=  2.2815400;
+           daa[18*20+4]=  8.3483500;
+           daa[18*20+5]=  0.0050000;
+           daa[18*20+6]=  0.0050000;
+           daa[18*20+7]=  0.0050000;
+           daa[18*20+8]=  47.4889000;
+           daa[18*20+9]=  0.1145120;
+           daa[18*20+10]= 0.0050000;
+           daa[18*20+11]= 0.0050000;
+           daa[18*20+12]= 0.5791980;
+           daa[18*20+13]= 4.1272800;
+           daa[18*20+14]= 0.0050000;
+           daa[18*20+15]= 0.9331420;
+           daa[18*20+16]= 0.4906080;
+           daa[18*20+17]= 0.0050000;
+           daa[19*20+0]=  24.8094000;
+           daa[19*20+1]=  0.2794250;
+           daa[19*20+2]=  0.0744808;
+           daa[19*20+3]=  2.9178600;
+           daa[19*20+4]=  0.0050000;
+           daa[19*20+5]=  0.0050000;
+           daa[19*20+6]=  2.1995200;
+           daa[19*20+7]=  2.7962200;
+           daa[19*20+8]=  0.8274790;
+           daa[19*20+9]=  24.8231000;
+           daa[19*20+10]= 2.9534400;
+           daa[19*20+11]= 0.1280650;
+           daa[19*20+12]= 14.7683000;
+           daa[19*20+13]= 2.2800000;
+           daa[19*20+14]= 0.0050000;
+           daa[19*20+15]= 0.8626370;
+           daa[19*20+16]= 0.0050000;
+           daa[19*20+17]= 0.0050000;
+           daa[19*20+18]= 1.3548200;
+           
+           /*f[0]=  0.038;
+           f[1]=  0.057;
+           f[2]=  0.089;
+           f[3]=  0.034;
+           f[4]=  0.024;
+           f[5]=  0.044;
+           f[6]=  0.062;
+           f[7]=  0.084;
+           f[8]=  0.016;
+           f[9]=  0.098;
+           f[10]= 0.058;
+           f[11]= 0.064;
+           f[12]= 0.016;
+           f[13]= 0.042;
+           f[14]= 0.046;
+           f[15]= 0.055;
+           f[16]= 0.081;
+           f[17]= 0.020;
+           f[18]= 0.021;
+           f[19]= 0.051;*/
+	   
+           f[0]= 0.0377494;             f[1]= 0.057321;              f[2]= 0.0891129;             f[3]= 0.0342034;
+           f[4]= 0.0240105;             f[5]= 0.0437824;             f[6]= 0.0618606;             f[7]= 0.0838496;
+           f[8]= 0.0156076;             f[9]= 0.0983641;             f[10]= 0.0577867;            f[11]= 0.0641682;
+           f[12]= 0.0158419;            f[13]= 0.0422741;            f[14]= 0.0458601;            f[15]= 0.0550846;
+           f[16]= 0.0813774;            f[17]= 0.019597;             f[18]= 0.0205847;            f[19]= 0.0515638;
+	  }
+	  break;
+	case JTTDCMUT:
+	  {
+           daa[1*20+0]=   0.531678;
+           daa[2*20+0]=   0.557967;
+           daa[2*20+1]=   0.451095;
+           daa[3*20+0]=   0.827445;
+           daa[3*20+1]=   0.154899;
+           daa[3*20+2]=   5.549530;
+           daa[4*20+0]=   0.574478;
+           daa[4*20+1]=   1.019843;
+           daa[4*20+2]=   0.313311;
+           daa[4*20+3]=   0.105625;
+           daa[5*20+0]=   0.556725;
+           daa[5*20+1]=   3.021995;
+           daa[5*20+2]=   0.768834;
+           daa[5*20+3]=   0.521646;
+           daa[5*20+4]=   0.091304;
+           daa[6*20+0]=   1.066681;
+           daa[6*20+1]=   0.318483;
+           daa[6*20+2]=   0.578115;
+           daa[6*20+3]=   7.766557;
+           daa[6*20+4]=   0.053907;
+           daa[6*20+5]=   3.417706;
+           daa[7*20+0]=   1.740159;
+           daa[7*20+1]=   1.359652;
+           daa[7*20+2]=   0.773313;
+           daa[7*20+3]=   1.272434;
+           daa[7*20+4]=   0.546389;
+           daa[7*20+5]=   0.231294;
+           daa[7*20+6]=   1.115632;
+           daa[8*20+0]=   0.219970;
+           daa[8*20+1]=   3.210671;
+           daa[8*20+2]=   4.025778;
+           daa[8*20+3]=   1.032342;
+           daa[8*20+4]=   0.724998;
+           daa[8*20+5]=   5.684080;
+           daa[8*20+6]=   0.243768;
+           daa[8*20+7]=   0.201696;
+           daa[9*20+0]=   0.361684;
+           daa[9*20+1]=   0.239195;
+           daa[9*20+2]=   0.491003;
+           daa[9*20+3]=   0.115968;
+           daa[9*20+4]=   0.150559;
+           daa[9*20+5]=   0.078270;
+           daa[9*20+6]=   0.111773;
+           daa[9*20+7]=   0.053769;
+           daa[9*20+8]=   0.181788;
+           daa[10*20+0]=  0.310007;
+           daa[10*20+1]=  0.372261;
+           daa[10*20+2]=  0.137289;
+           daa[10*20+3]=  0.061486;
+           daa[10*20+4]=  0.164593;
+           daa[10*20+5]=  0.709004;
+           daa[10*20+6]=  0.097485;
+           daa[10*20+7]=  0.069492;
+           daa[10*20+8]=  0.540571;
+           daa[10*20+9]=  2.335139;
+           daa[11*20+0]=  0.369437;
+           daa[11*20+1]=  6.529255;
+           daa[11*20+2]=  2.529517;
+           daa[11*20+3]=  0.282466;
+           daa[11*20+4]=  0.049009;
+           daa[11*20+5]=  2.966732;
+           daa[11*20+6]=  1.731684;
+           daa[11*20+7]=  0.269840;
+           daa[11*20+8]=  0.525096;
+           daa[11*20+9]=  0.202562;
+           daa[11*20+10]= 0.146481;
+           daa[12*20+0]=  0.469395;
+           daa[12*20+1]=  0.431045;
+           daa[12*20+2]=  0.330720;
+           daa[12*20+3]=  0.190001;
+           daa[12*20+4]=  0.409202;
+           daa[12*20+5]=  0.456901;
+           daa[12*20+6]=  0.175084;
+           daa[12*20+7]=  0.130379;
+           daa[12*20+8]=  0.329660;
+           daa[12*20+9]=  4.831666;
+           daa[12*20+10]= 3.856906;
+           daa[12*20+11]= 0.624581;
+           daa[13*20+0]=  0.138293;
+           daa[13*20+1]=  0.065314;
+           daa[13*20+2]=  0.073481;
+           daa[13*20+3]=  0.032522;
+           daa[13*20+4]=  0.678335;
+           daa[13*20+5]=  0.045683;
+           daa[13*20+6]=  0.043829;
+           daa[13*20+7]=  0.050212;
+           daa[13*20+8]=  0.453428;
+           daa[13*20+9]=  0.777090;
+           daa[13*20+10]= 2.500294;
+           daa[13*20+11]= 0.024521;
+           daa[13*20+12]= 0.436181;
+           daa[14*20+0]=  1.959599;
+           daa[14*20+1]=  0.710489;
+           daa[14*20+2]=  0.121804;
+           daa[14*20+3]=  0.127164;
+           daa[14*20+4]=  0.123653;
+           daa[14*20+5]=  1.608126;
+           daa[14*20+6]=  0.191994;
+           daa[14*20+7]=  0.208081;
+           daa[14*20+8]=  1.141961;
+           daa[14*20+9]=  0.098580;
+           daa[14*20+10]= 1.060504;
+           daa[14*20+11]= 0.216345;
+           daa[14*20+12]= 0.164215;
+           daa[14*20+13]= 0.148483;
+           daa[15*20+0]=  3.887095;
+           daa[15*20+1]=  1.001551;
+           daa[15*20+2]=  5.057964;
+           daa[15*20+3]=  0.589268;
+           daa[15*20+4]=  2.155331;
+           daa[15*20+5]=  0.548807;
+           daa[15*20+6]=  0.312449;
+           daa[15*20+7]=  1.874296;
+           daa[15*20+8]=  0.743458;
+           daa[15*20+9]=  0.405119;
+           daa[15*20+10]= 0.592511;
+           daa[15*20+11]= 0.474478;
+           daa[15*20+12]= 0.285564;
+           daa[15*20+13]= 0.943971;
+           daa[15*20+14]= 2.788406;
+           daa[16*20+0]=  4.582565;
+           daa[16*20+1]=  0.650282;
+           daa[16*20+2]=  2.351311;
+           daa[16*20+3]=  0.425159;
+           daa[16*20+4]=  0.469823;
+           daa[16*20+5]=  0.523825;
+           daa[16*20+6]=  0.331584;
+           daa[16*20+7]=  0.316862;
+           daa[16*20+8]=  0.477355;
+           daa[16*20+9]=  2.553806;
+           daa[16*20+10]= 0.272514;
+           daa[16*20+11]= 0.965641;
+           daa[16*20+12]= 2.114728;
+           daa[16*20+13]= 0.138904;
+           daa[16*20+14]= 1.176961;
+           daa[16*20+15]= 4.777647;
+           daa[17*20+0]=  0.084329;
+           daa[17*20+1]=  1.257961;
+           daa[17*20+2]=  0.027700;
+           daa[17*20+3]=  0.057466;
+           daa[17*20+4]=  1.104181;
+           daa[17*20+5]=  0.172206;
+           daa[17*20+6]=  0.114381;
+           daa[17*20+7]=  0.544180;
+           daa[17*20+8]=  0.128193;
+           daa[17*20+9]=  0.134510;
+           daa[17*20+10]= 0.530324;
+           daa[17*20+11]= 0.089134;
+           daa[17*20+12]= 0.201334;
+           daa[17*20+13]= 0.537922;
+           daa[17*20+14]= 0.069965;
+           daa[17*20+15]= 0.310927;
+           daa[17*20+16]= 0.080556;
+           daa[18*20+0]=  0.139492;
+           daa[18*20+1]=  0.235601;
+           daa[18*20+2]=  0.700693;
+           daa[18*20+3]=  0.453952;
+           daa[18*20+4]=  2.114852;
+           daa[18*20+5]=  0.254745;
+           daa[18*20+6]=  0.063452;
+           daa[18*20+7]=  0.052500;
+           daa[18*20+8]=  5.848400;
+           daa[18*20+9]=  0.303445;
+           daa[18*20+10]= 0.241094;
+           daa[18*20+11]= 0.087904;
+           daa[18*20+12]= 0.189870;
+           daa[18*20+13]= 5.484236;
+           daa[18*20+14]= 0.113850;
+           daa[18*20+15]= 0.628608;
+           daa[18*20+16]= 0.201094;
+           daa[18*20+17]= 0.747889;
+           daa[19*20+0]=  2.924161;
+           daa[19*20+1]=  0.171995;
+           daa[19*20+2]=  0.164525;
+           daa[19*20+3]=  0.315261;
+           daa[19*20+4]=  0.621323;
+           daa[19*20+5]=  0.179771;
+           daa[19*20+6]=  0.465271;
+           daa[19*20+7]=  0.470140;
+           daa[19*20+8]=  0.121827;
+           daa[19*20+9]=  9.533943;
+           daa[19*20+10]= 1.761439;
+           daa[19*20+11]= 0.124066;
+           daa[19*20+12]= 3.038533;
+           daa[19*20+13]= 0.593478;
+           daa[19*20+14]= 0.211561;
+           daa[19*20+15]= 0.408532;
+           daa[19*20+16]= 1.143980;
+           daa[19*20+17]= 0.239697;
+           daa[19*20+18]= 0.165473;
+           
+           f[0]=  0.077;
+           f[1]=  0.051;
+           f[2]=  0.043;
+           f[3]=  0.051;
+           f[4]=  0.020;
+           f[5]=  0.041;
+           f[6]=  0.062;
+           f[7]=  0.075;
+           f[8]=  0.023;
+           f[9]=  0.053;
+           f[10]= 0.091;
+           f[11]= 0.059;
+           f[12]= 0.024;
+           f[13]= 0.040;
+           f[14]= 0.051;
+           f[15]= 0.068;
+           f[16]= 0.059;
+           f[17]= 0.014;
+           f[18]= 0.032;
+           f[19]= 0.066;
+	  }
+	  break;
+	case FLU:
+	  {
+	    daa[ 1*20+ 0] 	=	0.138658765	;
+	    daa[ 2*20+ 0] 	=	0.053366579	;
+	    daa[ 2*20+ 1] 	=	0.161000889	;
+	    daa[ 3*20+ 0] 	=	0.584852306	;
+	    daa[ 3*20+ 1] 	=	0.006771843	;
+	    daa[ 3*20+ 2] 	=	7.737392871	;
+	    daa[ 4*20+ 0] 	=	0.026447095	;
+	    daa[ 4*20+ 1] 	=	0.167207008	;
+	    daa[ 4*20+ 2] 	=	1.30E-05	;
+	    daa[ 4*20+ 3] 	=	1.41E-02	;
+	    daa[ 5*20+ 0] 	=	0.353753982	;
+	    daa[ 5*20+ 1] 	=	3.292716942	;
+	    daa[ 5*20+ 2] 	=	0.530642655	;
+	    daa[ 5*20+ 3] 	=	0.145469388	;
+	    daa[ 5*20+ 4] 	=	0.002547334	;
+	    daa[ 6*20+ 0] 	=	1.484234503	;
+	    daa[ 6*20+ 1] 	=	0.124897617	;
+	    daa[ 6*20+ 2] 	=	0.061652192	;
+	    daa[ 6*20+ 3] 	=	5.370511279	;
+	    daa[ 6*20+ 4] 	=	3.91E-11	;
+	    daa[ 6*20+ 5] 	=	1.195629122	;
+	    daa[ 7*20+ 0] 	=	1.132313122	;
+	    daa[ 7*20+ 1] 	=	1.190624465	;
+	    daa[ 7*20+ 2] 	=	0.322524648	;
+	    daa[ 7*20+ 3] 	=	1.934832784	;
+	    daa[ 7*20+ 4] 	=	0.116941459	;
+	    daa[ 7*20+ 5] 	=	0.108051341	;
+	    daa[ 7*20+ 6] 	=	1.593098825	;
+	    daa[ 8*20+ 0] 	=	0.214757862	;
+	    daa[ 8*20+ 1] 	=	1.879569938	;
+	    daa[ 8*20+ 2] 	=	1.387096032	;
+	    daa[ 8*20+ 3] 	=	0.887570549	;
+	    daa[ 8*20+ 4] 	=	2.18E-02	;
+	    daa[ 8*20+ 5] 	=	5.330313412	;
+	    daa[ 8*20+ 6] 	=	0.256491863	;
+	    daa[ 8*20+ 7] 	=	0.058774527	;
+	    daa[ 9*20+ 0] 	=	0.149926734	;
+	    daa[ 9*20+ 1] 	=	0.246117172	;
+	    daa[ 9*20+ 2] 	=	0.218571975	;
+	    daa[ 9*20+ 3] 	=	0.014085917	;
+	    daa[ 9*20+ 4] 	=	0.001112158	;
+	    daa[ 9*20+ 5] 	=	0.02883995	;
+	    daa[ 9*20+ 6] 	=	1.42E-02	;
+	    daa[ 9*20+ 7] 	=	1.63E-05	;
+	    daa[ 9*20+ 8] 	=	0.243190142	;
+	    daa[10*20+ 0] 	=	0.023116952	;
+	    daa[10*20+ 1] 	=	0.296045557	;
+	    daa[10*20+ 2] 	=	8.36E-04	;
+	    daa[10*20+ 3] 	=	0.005730682	;
+	    daa[10*20+ 4] 	=	0.005613627	;
+	    daa[10*20+ 5] 	=	1.020366955	;
+	    daa[10*20+ 6] 	=	0.016499536	;
+	    daa[10*20+ 7] 	=	0.006516229	;
+	    daa[10*20+ 8] 	=	0.321611694	;
+	    daa[10*20+ 9] 	=	3.512072282	;
+	    daa[11*20+ 0] 	=	0.47433361	;
+	    daa[11*20+ 1] 	=	15.30009662	;
+	    daa[11*20+ 2] 	=	2.646847965	;
+	    daa[11*20+ 3] 	=	0.29004298	;
+	    daa[11*20+ 4] 	=	3.83E-06	;
+	    daa[11*20+ 5] 	=	2.559587177	;
+	    daa[11*20+ 6] 	=	3.881488809	;
+	    daa[11*20+ 7] 	=	0.264148929	;
+	    daa[11*20+ 8] 	=	0.347302791	;
+	    daa[11*20+ 9] 	=	0.227707997	;
+	    daa[11*20+10] 	=	0.129223639	;
+	    daa[12*20+ 0] 	=	0.058745423	;
+	    daa[12*20+ 1] 	=	0.890162346	;
+	    daa[12*20+ 2] 	=	0.005251688	;
+	    daa[12*20+ 3] 	=	0.041762964	;
+	    daa[12*20+ 4] 	=	0.11145731	;
+	    daa[12*20+ 5] 	=	0.190259181	;
+	    daa[12*20+ 6] 	=	0.313974351	;
+	    daa[12*20+ 7] 	=	0.001500467	;
+	    daa[12*20+ 8] 	=	0.001273509	;
+	    daa[12*20+ 9] 	=	9.017954203	;
+	    daa[12*20+10] 	=	6.746936485	;
+	    daa[12*20+11] 	=	1.331291619	;
+	    daa[13*20+ 0] 	=	0.080490909	;
+	    daa[13*20+ 1] 	=	1.61E-02	;
+	    daa[13*20+ 2] 	=	8.36E-04	;
+	    daa[13*20+ 3] 	=	1.06E-06	;
+	    daa[13*20+ 4] 	=	0.104053666	;
+	    daa[13*20+ 5] 	=	0.032680657	;
+	    daa[13*20+ 6] 	=	0.001003501	;
+	    daa[13*20+ 7] 	=	0.001236645	;
+	    daa[13*20+ 8] 	=	0.119028506	;
+	    daa[13*20+ 9] 	=	1.463357278	;
+	    daa[13*20+10] 	=	2.986800036	;
+	    daa[13*20+11] 	=	3.20E-01	;
+	    daa[13*20+12] 	=	0.279910509	;
+	    daa[14*20+ 0] 	=	0.659311478	;
+	    daa[14*20+ 1] 	=	0.15402718	;
+	    daa[14*20+ 2] 	=	3.64E-02	;
+	    daa[14*20+ 3] 	=	0.188539456	;
+	    daa[14*20+ 4] 	=	1.59E-13	;
+	    daa[14*20+ 5] 	=	0.712769599	;
+	    daa[14*20+ 6] 	=	0.319558828	;
+	    daa[14*20+ 7] 	=	0.038631761	;
+	    daa[14*20+ 8] 	=	0.924466914	;
+	    daa[14*20+ 9] 	=	0.080543327	;
+	    daa[14*20+10] 	=	0.634308521	;
+	    daa[14*20+11] 	=	0.195750632	;
+	    daa[14*20+12] 	=	5.69E-02	;
+	    daa[14*20+13] 	=	0.00713243	;
+	    daa[15*20+ 0] 	=	3.011344519	;
+	    daa[15*20+ 1] 	=	0.95013841	;
+	    daa[15*20+ 2] 	=	3.881310531	;
+	    daa[15*20+ 3] 	=	0.338372183	;
+	    daa[15*20+ 4] 	=	0.336263345	;
+	    daa[15*20+ 5] 	=	0.487822499	;
+	    daa[15*20+ 6] 	=	0.307140298	;
+	    daa[15*20+ 7] 	=	1.585646577	;
+	    daa[15*20+ 8] 	=	0.58070425	;
+	    daa[15*20+ 9] 	=	0.290381075	;
+	    daa[15*20+10] 	=	0.570766693	;
+	    daa[15*20+11] 	=	0.283807672	;
+	    daa[15*20+12] 	=	0.007026588	;
+	    daa[15*20+13] 	=	0.99668567	;
+	    daa[15*20+14] 	=	2.087385344	;
+	    daa[16*20+ 0] 	=	5.418298175	;
+	    daa[16*20+ 1] 	=	0.183076905	;
+	    daa[16*20+ 2] 	=	2.140332316	;
+	    daa[16*20+ 3] 	=	0.135481233	;
+	    daa[16*20+ 4] 	=	0.011975266	;
+	    daa[16*20+ 5] 	=	0.602340963	;
+	    daa[16*20+ 6] 	=	0.280124895	;
+	    daa[16*20+ 7] 	=	0.01880803	;
+	    daa[16*20+ 8] 	=	0.368713573	;
+	    daa[16*20+ 9] 	=	2.904052286	;
+	    daa[16*20+10] 	=	0.044926357	;
+	    daa[16*20+11] 	=	1.5269642	;
+	    daa[16*20+12] 	=	2.031511321	;
+	    daa[16*20+13] 	=	0.000134906	;
+	    daa[16*20+14] 	=	0.542251094	;
+	    daa[16*20+15] 	=	2.206859934	;
+	    daa[17*20+ 0] 	=	1.96E-01	;
+	    daa[17*20+ 1] 	=	1.369429408	;
+	    daa[17*20+ 2] 	=	5.36E-04	;
+	    daa[17*20+ 3] 	=	1.49E-05	;
+	    daa[17*20+ 4] 	=	0.09410668	;
+	    daa[17*20+ 5] 	=	4.40E-02	;
+	    daa[17*20+ 6] 	=	0.155245492	;
+	    daa[17*20+ 7] 	=	0.196486447	;
+	    daa[17*20+ 8] 	=	2.24E-02	;
+	    daa[17*20+ 9] 	=	0.03213215	;
+	    daa[17*20+10] 	=	0.431277663	;
+	    daa[17*20+11] 	=	4.98E-05	;
+	    daa[17*20+12] 	=	0.070460039	;
+	    daa[17*20+13] 	=	0.814753094	;
+	    daa[17*20+14] 	=	0.000431021	;
+	    daa[17*20+15] 	=	0.099835753	;
+	    daa[17*20+16] 	=	0.207066206	;
+	    daa[18*20+ 0] 	=	0.018289288	;
+	    daa[18*20+ 1] 	=	0.099855497	;
+	    daa[18*20+ 2] 	=	0.373101927	;
+	    daa[18*20+ 3] 	=	0.525398543	;
+	    daa[18*20+ 4] 	=	0.601692431	;
+	    daa[18*20+ 5] 	=	0.072205935	;
+	    daa[18*20+ 6] 	=	0.10409287	;
+	    daa[18*20+ 7] 	=	0.074814997	;
+	    daa[18*20+ 8] 	=	6.448954446	;
+	    daa[18*20+ 9] 	=	0.273934263	;
+	    daa[18*20+10] 	=	0.340058468	;
+	    daa[18*20+11] 	=	0.012416222	;
+	    daa[18*20+12] 	=	0.874272175	;
+	    daa[18*20+13] 	=	5.393924245	;
+	    daa[18*20+14] 	=	1.82E-04	;
+	    daa[18*20+15] 	=	0.39255224	;
+	    daa[18*20+16] 	=	0.12489802	;
+	    daa[18*20+17] 	=	0.42775543	;
+	    daa[19*20+ 0] 	=	3.53200527	;
+	    daa[19*20+ 1] 	=	0.103964386	;
+	    daa[19*20+ 2] 	=	0.010257517	;
+	    daa[19*20+ 3] 	=	0.297123975	;
+	    daa[19*20+ 4] 	=	0.054904564	;
+	    daa[19*20+ 5] 	=	0.406697814	;
+	    daa[19*20+ 6] 	=	0.285047948	;
+	    daa[19*20+ 7] 	=	0.337229619	;
+	    daa[19*20+ 8] 	=	0.098631355	;
+	    daa[19*20+ 9] 	=	14.39405219	;
+	    daa[19*20+10] 	=	0.890598579	;
+	    daa[19*20+11] 	=	0.07312793	;
+	    daa[19*20+12] 	=	4.904842235	;
+	    daa[19*20+13] 	=	0.592587985	;
+	    daa[19*20+14] 	=	0.058971975	;
+	    daa[19*20+15] 	=	0.088256423	;
+	    daa[19*20+16] 	=	0.654109108	;
+	    daa[19*20+17] 	=	0.256900461	;
+	    daa[19*20+18] 	=	0.167581647	;
+	    
+ 
+  
+	    f[0]	=	0.0471	;
+	    f[1]	=	0.0509	;
+	    f[2]	=	0.0742	;
+	    f[3]	=	0.0479	;
+	    f[4]	=	0.0250	;
+	    f[5]	=	0.0333	;
+	    f[6]	=	0.0546	;
+	    f[7]	=	0.0764	;
+	    f[8]	=	0.0200	;
+	    f[9]	=	0.0671	;
+	    f[10]	=	0.0715	;
+	    f[11]	=	0.0568	;
+	    f[12]	=	0.0181	;
+	    f[13]	=	0.0305	;
+	    f[14]	=	0.0507	;
+	    f[15]	=	0.0884	;
+	    f[16]	=	0.0743	;
+	    f[17]	=	0.0185	;
+	    f[18]	=	0.0315	;
+	    f[19]	=	0.0632	;
+	  }
+	  break;     
+	default: 
+	  assert(0);
+	}
+    }
+
+
+  /*
+    
+  TODO review frequency sums for fixed as well as empirical base frequencies !
+  
+  Numerical bug fix: the AA frequencies of some models were rounded such that 
+  they actually sum to 1.0 +/- epsilon 
+  
+  {
+    double acc = 0.0;
+  
+    for(i = 0; i < 20; i++)
+      acc += f[i];
+    
+    printf("%1.80f\n", acc);
+    assert(acc == 1.0);  
+  }
+  */
+ 
+
+
+  for (i=0; i<20; i++)  
+    for (j=0; j<i; j++)               
+      daa[j*20+i] = daa[i*20+j];
+
+  
+  /*
+    for (i=0; i<20; i++)  
+    {
+    for (j=0; j<20; j++)
+    {
+    if(i == j)
+    printf("0.0 ");
+    else
+    printf("%f ", daa[i * 20 + j]);
+    }
+    printf("\n");
+    }
+    
+    for (i=0; i<20; i++) 
+    printf("%f ", f[i]);
+    printf("\n");
+  */
+  
+
+  max = 0;
+  
+  for(i = 0; i < 19; i++)
+    for(j = i + 1; j < 20; j++)
+      {
+	q[i][j] = temp = daa[i * 20 + j];
+	if(temp > max) 
+	  max = temp;
+      }
+ 
+  scaler = AA_SCALE / max;
+   
+  /* SCALING HAS BEEN RE-INTRODUCED TO RESOLVE NUMERICAL  PROBLEMS */   
+
+  r = 0;
+  for(i = 0; i < 19; i++)
+    {      
+      for(j = i + 1; j < 20; j++)
+	{  
+	
+	  q[i][j] *= scaler;
+	  
+	  
+	  assert(q[i][j] <= AA_SCALE_PLUS_EPSILON);
+	  
+	  initialRates[r++] = q[i][j];
+	}
+    }             
+}
+
+          
+
+
+static void mytred2(double **a, const int n, double *d, double *e)
+{
+  int     l, k, j, i;
+  double  scale, hh, h, g, f; 
+ 
+  for (i = n; i > 1; i--)
+    {
+      l = i - 1;
+      h = 0.0;
+      scale = 0.0;
+      
+      if (l > 1)
+        {
+	  for (k = 1; k <= l; k++)
+	    scale += fabs(a[k - 1][i - 1]);
+	  if (scale == 0.0)
+	    e[i - 1] = a[l - 1][i - 1];
+	  else
+            {
+	      for (k = 1; k <= l; k++)
+                {
+		  a[k - 1][i - 1] /= scale;
+		  h += a[k - 1][i - 1] * a[k - 1][i - 1];
+                }
+	      f = a[l - 1][i - 1];
+	      g = ((f > 0) ? -sqrt(h) : sqrt(h)); /* diff */
+	      e[i - 1] = scale * g;
+	      h -= f * g;
+	      a[l - 1][i - 1] = f - g;
+	      f = 0.0;
+	      for (j = 1; j <= l; j++)
+		{
+		  a[i - 1][j - 1] = a[j - 1][i - 1] / h;
+		  g = 0.0;
+		  for (k = 1; k <= j; k++)
+		    g += a[k - 1][j - 1] * a[k - 1][i - 1];
+		  for (k = j + 1; k <= l; k++)
+		    g += a[j - 1][k - 1] * a[k - 1][i - 1];
+		  e[j - 1] = g / h;
+		  f += e[j - 1] * a[j - 1][i - 1];
+		}
+	      hh = f / (h + h);
+	      for (j = 1; j <= l; j++)
+		{
+		  f = a[j - 1][i - 1];
+		  g = e[j - 1] - hh * f;
+		  e[j - 1] = g;
+		  for (k = 1; k <= j; k++)
+		    a[k - 1][j - 1] -= (f * e[k - 1] + g * a[k - 1][i - 1]);
+                }
+	    }
+	} 
+      else
+	e[i - 1] = a[l - 1][i - 1];
+      d[i - 1] = h;
+    }
+  d[0] = 0.0;
+  e[0] = 0.0;
+  
+  for (i = 1; i <= n; i++)
+    {
+      l = i - 1;
+      if (d[i - 1] != 0.0)
+	{
+	  for (j = 1; j <= l; j++)
+            {
+                g = 0.0;
+                for (k = 1; k <= l; k++)
+		  g += a[k - 1][i - 1] * a[j - 1][k - 1];
+                for(k = 1; k <= l; k++)
+		  a[j - 1][k - 1] -= g * a[i - 1][k - 1];
+            }
+        }
+      d[i - 1] = a[i - 1][i - 1];
+      a[i - 1][i - 1] = 1.0;
+      for (j = 1; j <= l; j++)
+	a[i - 1][j - 1] = a[j - 1][i - 1] = 0.0;
+    }
+ 
+ 
+}
+/*#define MYSIGN(a,b) ((b)<0 ? -fabs(a) : fabs(a))*/
+
+static int mytqli(double *d, double *e, const int n, double **z)
+{
+  int     m, l, iter, i, k;
+  double  s, r, p, g, f, dd, c, b;
+   
+  for (i = 2; i <= n; i++)
+    e[i - 2] = e[i - 1];
+
+  e[n - 1] = 0.0;
+
+  for (l = 1; l <= n; l++)
+    {
+      iter = 0;
+      do
+	{
+	  for (m = l; m <= n - 1; m++)
+            {
+	      dd = fabs(d[m - 1]) + fabs(d[m]);
+	      if (fabs(e[m - 1]) + dd == dd)
+		break;
+	    }
+
+	  if (m != l)
+           {
+	     assert(iter < 30);
+	     
+	     g = (d[l] - d[l - 1]) / (2.0 * e[l - 1]);
+	     r = sqrt((g * g) + 1.0);
+	     g = d[m - 1] - d[l - 1] + e[l - 1] / (g + ((g < 0)?-fabs(r):fabs(r)));/*MYSIGN(r, g));*/
+	     s = c = 1.0;
+	     p = 0.0;
+
+	     for (i = m - 1; i >= l; i--)
+               {
+		 f = s * e[i - 1];
+		 b = c * e[i - 1];
+		 if (fabs(f) >= fabs(g))
+		   {
+		     c = g / f;
+		     r = sqrt((c * c) + 1.0);
+		     e[i] = f * r;
+		     c *= (s = 1.0 / r);
+		   } 
+		 else
+		   {
+		     s = f / g;
+		     r = sqrt((s * s) + 1.0);
+		     e[i] = g * r;
+		     s *= (c = 1.0 / r);
+		   }
+		 g = d[i] - p;
+		 r = (d[i - 1] - g) * s + 2.0 * c * b;
+		 p = s * r;
+		 d[i] = g + p;
+		 g = c * r - b;
+		 for (k = 1; k <= n; k++)
+		   {
+		     f = z[i][k-1];
+		     z[i][k-1] = s * z[i - 1][k - 1] + c * f;
+		     z[i - 1][k - 1] = c * z[i - 1][k - 1] - s * f;
+		   }
+	       }
+
+	     d[l - 1] = d[l - 1] - p;
+	     e[l - 1] = g;
+	     e[m - 1] = 0.0;
+	   }
+	} 
+      while (m != l);
+    }
+
+    
+ 
+    return (1);
+ }
+
+
+static void makeEigen(double **_a, const int n, double *d, double *e)
+{
+  mytred2(_a, n, d, e);
+  mytqli(d, e, n, _a);
+}
+
+static void initGeneric(const int n, const unsigned int *valueVector, int valueVectorLength,		      
+			double *ext_EIGN,
+			double *EV,
+			double *EI,
+			double *frequencies,
+			double *ext_initialRates,
+			double *tipVector,
+			int model)
+{
+  double 
+    fracchange = 0.0,
+    **r, 
+    **a, 
+    **EIGV,
+    *initialRates = ext_initialRates, 
+    *f, 
+    *e, 
+    *d, 
+    *invfreq, 
+    *EIGN,
+    *eptr; 
+  
+  int 
+    i, 
+    j, 
+    k, 
+    m, 
+    l;  
+
+  r    = (double **)malloc(n * sizeof(double *));
+  EIGV = (double **)malloc(n * sizeof(double *));  
+  a    = (double **)malloc(n * sizeof(double *));	  
+  
+  for(i = 0; i < n; i++)
+    {
+      a[i]    = (double*)malloc(n * sizeof(double));
+      EIGV[i] = (double*)malloc(n * sizeof(double));
+      r[i]    = (double*)malloc(n * sizeof(double));
+    }
+
+  f       = (double*)malloc(n * sizeof(double));
+  e       = (double*)malloc(n * sizeof(double));
+  d       = (double*)malloc(n * sizeof(double));
+  invfreq = (double*)malloc(n * sizeof(double));
+  EIGN    = (double*)malloc(n * sizeof(double));
+  
+  for(l = 0; l < n; l++)		 
+    f[l] = frequencies[l];	
+    
+  
+  i = 0;
+  
+  for(j = 0; j < n; j++)	 
+    for(k = 0; k < n; k++)
+      r[j][k] = 0.0;
+  
+  for(j = 0; j < n - 1; j++)
+    for (k = j+1; k < n; k++)      	  
+      r[j][k] = initialRates[i++];         
+  
+  for (j = 0; j < n; j++) 
+    {
+      r[j][j] = 0.0;
+      for (k = 0; k < j; k++)
+	r[j][k] = r[k][j];
+    }                            
+  
+  for (j = 0; j< n; j++)
+    for (k = 0; k< n; k++)
+      fracchange += f[j] * r[j][k] * f[k];             
+  
+  m = 0;
+  
+  for(i=0; i< n; i++) 
+    a[i][i] = 0;
+  
+  /*  assert(r[n - 2][n - 1] == 1.0);*/
+  
+  for(i=0; i < n; i++) 
+    {
+      for(j=i+1;  j < n; j++) 
+	{
+	  double factor =  initialRates[m++];
+	  a[i][j] = a[j][i] = factor * sqrt( f[i] * f[j]);
+	  a[i][i] -= factor * f[j];
+	  a[j][j] -= factor * f[i];
+	}
+    }             	        
+
+  makeEigen(a, n, d, e);
+  
+ 
+  
+  for(i=0; i<n; i++)     
+    for(j=0; j<n; j++)       
+      a[i][j] *= sqrt(f[j]);
+   
+  
+  
+  for (i=0; i<n; i++)
+    {	  
+      if (d[i] > -1e-8) 
+	{	      
+	  if (i != 0) 
+	    {		    
+	      double tmp = d[i], sum=0;
+	      d[i] = d[0];
+	      d[0] = tmp;
+	      for (j=0; j < n; j++) 
+		{
+		  tmp = a[i][j];
+		  a[i][j] = a[0][j];
+		  sum += (a[0][j] = tmp);
+		}
+	      for (j=0; j < n; j++) 
+		a[0][j] /= sum;
+	    }
+	  break;
+	}
+    }
+  
+  for (i=0; i< n; i++) 
+    {
+      EIGN[i] = -d[i];
+      
+      for (j=0; j<n; j++)
+	EIGV[i][j] = a[j][i];
+      invfreq[i] = 1 / EIGV[i][0]; 
+    }                                    
+  
+  ext_EIGN[0] = 0.0;
+
+  for(l = 1; l < n; l++)
+    {
+      ext_EIGN[l] = EIGN[l] * (1.0 / fracchange); 
+      assert(ext_EIGN[l] > 0.0);
+    }
+  
+  eptr = EV;
+  
+  for(i = 0; i < n; i++)		  
+    for(j = 0; j < n; j++)
+      {
+	*eptr++ = EIGV[i][j];	             	    	     
+	
+      }
+  
+  for(i = 0; i < n; i++)
+    for(j = 0; j < n; j++)
+      {
+	if(j == 0)
+	  EI[i * n + j] = 1.0;
+	else
+	  EI[i * n + j] = EV[i * n + j] * invfreq[i];  
+      }
+  
+  /* 
+     printf("EIGN\n");
+
+     for(i = 0; i < n; i++)
+     printf("%f ", ext_EIGN[i]);
+     printf("\n");
+
+     printf("EI\n");
+     for(i = 0; i < n; i++)
+     {
+     for(j = 0; j < n; j++)
+     {
+     printf("%f ", EI[i * n + j]);             
+     }
+     printf("\n");
+    }
+  */
+  
+
+
+  for(i=0; i < valueVectorLength; i++)
+    {
+      unsigned int value = valueVector[i];
+      
+      for(j = 0; j < n; j++)
+	tipVector[i * n + j]     = 0;	            
+
+      if(value > 0)
+	{		      
+	  for (j = 0; j < n; j++) 
+	    {	    
+	      if ((value >> j) & 1) 
+		{
+		  int l;
+		  for(l = 0; l < n; l++)
+		    tipVector[i * n + l] += EIGV[j][l];		      		      		     		      
+		}	     		  
+	    }	    
+	}     
+    }
+
+  for(i = 0; i < valueVectorLength; i++)
+    {
+       for(j = 0; j < n; j++)
+	 if(tipVector[i * n + j] > MAX_TIP_EV)
+	   tipVector[i * n + j] = MAX_TIP_EV;
+    }
+
+
+  
+
+  for(i = 0; i < n; i++)
+    {
+      free(EIGV[i]);
+      free(a[i]);
+      free(r[i]);
+    }
+
+  free(r);
+  free(a);
+  free(EIGV);
+
+  free(f);
+  free(e);
+  free(d);
+  free(invfreq);
+  free(EIGN);
+}
+
+
+
+
+void initReversibleGTR(tree *tr, int model)
+{ 
+  double      
+    *ext_EIGN         = tr->partitionData[model].EIGN,
+    *EV               = tr->partitionData[model].EV,
+    *EI               = tr->partitionData[model].EI,
+    *frequencies      = tr->partitionData[model].frequencies,
+    *ext_initialRates = tr->partitionData[model].substRates,
+    *tipVector        = tr->partitionData[model].tipVector;
+  
+  int 
+    states = tr->partitionData[model].states;
+  
+  switch(tr->partitionData[model].dataType)
+    { 
+    case GENERIC_32:
+    case GENERIC_64:
+    case SECONDARY_DATA_6:
+    case SECONDARY_DATA_7: 
+    case SECONDARY_DATA:
+    case DNA_DATA:
+    case BINARY_DATA:     
+      initGeneric(states, 
+		  getBitVector(tr->partitionData[model].dataType), 
+		  getUndetermined(tr->partitionData[model].dataType) + 1, 	       
+		  ext_EIGN, 
+		  EV, 
+		  EI, 
+		  frequencies, 
+		  ext_initialRates,
+		  tipVector, 
+		  model);
+     break;   
+   case AA_DATA:
+     if(tr->partitionData[model].protModels != GTR)           
+       {
+	 double 
+	   f[20];
+	 
+	 if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+	   {
+	     int 
+	       i;
+	     
+	     for(i = 0; i < 4; i++)
+	       {		 
+		 initProtMat(f, tr->partitionData[model].protModels, &(tr->partitionData[model].substRates_LG4[i][0]), i);
+		 
+		 if(!tr->partitionData[model].protFreqs && !tr->partitionData[model].optimizeBaseFrequencies)
+		   memcpy(tr->partitionData[model].frequencies_LG4[i], f, 20 * sizeof(double));
+		 //for(l = 0; l < 20; l++)		
+		 //   tr->partitionData[model].frequencies_LG4[i][l] = f[l];
+		 else
+		   memcpy(tr->partitionData[model].frequencies_LG4[i], frequencies, 20 * sizeof(double));
+	       }
+	   }
+	 else
+	   {
+	     if(tr->partitionData[model].protModels == AUTO)
+	       initProtMat(f, tr->partitionData[model].autoProtModels, ext_initialRates, 0);
+	     else	  
+	       initProtMat(f, tr->partitionData[model].protModels, ext_initialRates, 0); 		   
+	     
+	     /*if(adef->protEmpiricalFreqs && tr->NumberOfModels == 1)
+	       assert(tr->partitionData[model].protFreqs);*/
+	     
+	     if(tr->partitionData[model].protModels == AUTO)
+	       {
+		 if(tr->partitionData[model].protFreqs)
+		   memcpy(frequencies, f, 20 * sizeof(double));
+		 else
+		   memcpy(frequencies, tr->partitionData[model].empiricalFrequencies, 20 * sizeof(double));		 
+	       }
+	     else
+	       {
+		 if(!tr->partitionData[model].optimizeBaseFrequencies)
+		   {
+		     if(!tr->partitionData[model].protFreqs)	       	  
+		       {
+			 memcpy(frequencies, f, 20 * sizeof(double));
+			 /*for(l = 0; l < 20; l++)		
+			   frequencies[l] = f[l];			 */
+		       } 
+		     else
+		       {			
+			 memcpy(frequencies, tr->partitionData[model].empiricalFrequencies, 20 * sizeof(double));
+			 /*for(l = 0; l < 20; l++)		
+			   frequencies[l] = tr->partitionData[model].empiricalFrequencies[l];			 */
+		       }
+		   }
+	       }
+	   }
+       }  
+               
+     if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+       {
+	 int 
+	   i;	 
+
+	 for(i = 0; i < 4; i++)	   	     
+	   initGeneric(states, bitVectorAA, 23, 
+		       tr->partitionData[model].rawEIGN_LG4[i],  tr->partitionData[model].EV_LG4[i],  
+		       tr->partitionData[model].EI_LG4[i], tr->partitionData[model].frequencies_LG4[i], 
+		       tr->partitionData[model].substRates_LG4[i],
+		       tr->partitionData[model].tipVector_LG4[i], 
+		       model);   	 	 
+
+	 scaleLG4X_EIGN(tr, model);
+       }
+     else
+       initGeneric(states, bitVectorAA, 23,
+		   ext_EIGN, EV, EI, frequencies, ext_initialRates,
+		   tipVector, 
+		   model);                   
+     break;  
+   default:
+     assert(0);
+   } 
+
+#ifdef __MIC_NATIVE
+  if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+    updateModel_LG4_MIC(&tr->partitionData[model]);
+  else
+    updateModel_MIC(&tr->partitionData[model]);
+#endif
+}
+
+
+double LnGamma (double alpha)
+{
+/* returns ln(gamma(alpha)) for alpha>0, accurate to 10 decimal places.  
+   Stirling's formula is used for the central polynomial part of the procedure.
+   Pike MC & Hill ID (1966) Algorithm 291: Logarithm of the gamma function.
+   Communications of the Association for Computing Machinery, 9:684
+*/
+  double x, f, z, result;
+
+  x = alpha;
+  f = 0.0;
+  
+  if ( x < 7.0) 
+     {
+       f = 1.0;  
+       z = alpha - 1.0;
+      
+       while ((z = z + 1.0) < 7.0)  
+	 {	  
+	   f *= z;
+	 }
+       x = z;   
+     
+       assert(f != 0.0);
+	
+       f=-log(f);
+     }
+   
+   z = 1/(x*x);
+   
+   result = f + (x-0.5)*log(x) - x + .918938533204673 
+	  + (((-.000595238095238*z+.000793650793651)*z-.002777777777778)*z
+	       +.083333333333333)/x;  
+
+   return result;
+}
+
+
+
+double IncompleteGamma (double x, double alpha, double ln_gamma_alpha)
+{
+/* returns the incomplete gamma ratio I(x,alpha) where x is the upper 
+	   limit of the integration and alpha is the shape parameter.
+   returns (-1) if in error
+   ln_gamma_alpha = ln(Gamma(alpha)), is almost redundant.
+   (1) series expansion     if (alpha>x || x<=1)
+   (2) continued fraction   otherwise
+   RATNEST FORTRAN by
+   Bhattacharjee GP (1970) The incomplete gamma integral.  Applied Statistics,
+   19: 285-287 (AS32)
+*/
+   int i;
+   double p=alpha, g=ln_gamma_alpha;
+   double accurate=1e-8, overflow=1e30;
+   double factor, gin=0, rn=0, a=0,b=0,an=0,dif=0, term=0, pn[6];
+
+
+   if (x==0) return (0);
+   if (x<0 || p<=0) return (-1);
+
+   
+   factor=exp(p*log(x)-x-g);   
+   if (x>1 && x>=p) goto l30;
+   /* (1) series expansion */
+   gin=1;  term=1;  rn=p;
+ l20:
+   rn++;
+   term*=x/rn;   gin+=term;
+
+   if (term > accurate) goto l20;
+   gin*=factor/p;
+   goto l50;
+ l30:  
+   /* (2) continued fraction */
+   a=1-p;   b=a+x+1;  term=0;
+   pn[0]=1;  pn[1]=x;  pn[2]=x+1;  pn[3]=x*b;
+   gin=pn[2]/pn[3];   
+ l32:  
+   a++;  
+   b+=2;  
+   term++;   
+   an=a*term;
+   for (i=0; i<2; i++) 
+     pn[i+4]=b*pn[i+2]-an*pn[i];
+   if (pn[5] == 0) goto l35;
+   rn=pn[4]/pn[5];   
+   dif=fabs(gin-rn);  
+   if (dif>accurate) goto l34;
+   if (dif<=accurate*rn) goto l42;
+ l34:   
+   gin=rn;
+ l35: 
+   for (i=0; i<4; i++) 
+     pn[i]=pn[i+2];
+   if (fabs(pn[4]) < overflow)            
+     goto l32;        
+   
+   for (i=0; i<4; i++) 
+     pn[i]/=overflow;
+
+   
+   goto l32;
+ l42:  
+   gin=1-factor*gin;
+
+ l50: 
+   return (gin);
+}
+
+
+
+
+double PointNormal (double prob)
+{
+/* returns z so that Prob{x<z}=prob where x ~ N(0,1) and (1e-12)<prob<1-(1e-12)
+   returns (-9999) if in error
+   Odeh RE & Evans JO (1974) The percentage points of the normal distribution.
+   Applied Statistics 23: 96-97 (AS70)
+
+   Newer methods:
+     Wichura MJ (1988) Algorithm AS 241: the percentage points of the
+       normal distribution.  37: 477-484.
+     Beasley JD & Springer SG  (1977).  Algorithm AS 111: the percentage 
+       points of the normal distribution.  26: 118-121.
+
+*/
+   double a0=-.322232431088, a1=-1, a2=-.342242088547, a3=-.0204231210245;
+   double a4=-.453642210148e-4, b0=.0993484626060, b1=.588581570495;
+   double b2=.531103462366, b3=.103537752850, b4=.0038560700634;
+   double y, z=0, p=prob, p1;
+
+   p1 = (p<0.5 ? p : 1-p);
+   if (p1<1e-20) return (-9999);
+
+   y = sqrt (log(1/(p1*p1)));   
+   z = y + ((((y*a4+a3)*y+a2)*y+a1)*y+a0) / ((((y*b4+b3)*y+b2)*y+b1)*y+b0);
+   return (p<0.5 ? -z : z);
+}
+
+
+double PointChi2 (double prob, double v)
+{
+/* returns z so that Prob{x<z}=prob where x is Chi2 distributed with df=v
+   returns -1 if in error.   0.000002<prob<0.999998
+   RATNEST FORTRAN by
+       Best DJ & Roberts DE (1975) The percentage points of the 
+       Chi2 distribution.  Applied Statistics 24: 385-388.  (AS91)
+   Converted into C by Ziheng Yang, Oct. 1993.
+*/
+   double e=.5e-6, aa=.6931471805, p=prob, g;
+   double xx, c, ch, a=0,q=0,p1=0,p2=0,t=0,x=0,b=0,s1,s2,s3,s4,s5,s6;
+  
+   if (p<.000002 || p>.999998 || v<=0) return (-1);
+  
+   g = LnGamma(v/2);
+   
+   xx=v/2;   c=xx-1;
+   if (v >= -1.24*log(p)) goto l1;
+
+   ch=pow((p*xx*exp(g+xx*aa)), 1/xx);
+   if (ch-e<0) return (ch);
+   goto l4;
+l1:
+   if (v>.32) goto l3;
+   ch=0.4;   a=log(1-p);
+l2:
+   q=ch;  p1=1+ch*(4.67+ch);  p2=ch*(6.73+ch*(6.66+ch));
+   t=-0.5+(4.67+2*ch)/p1 - (6.73+ch*(13.32+3*ch))/p2;
+   ch-=(1-exp(a+g+.5*ch+c*aa)*p2/p1)/t;
+   if (fabs(q/ch-1)-.01 <= 0) goto l4;
+   else                       goto l2;
+  
+l3:    
+   x=PointNormal (p);
+   p1=0.222222/v;   ch=v*pow((x*sqrt(p1)+1-p1), 3.0);
+   if (ch>2.2*v+6)  ch=-2*(log(1-p)-c*log(.5*ch)+g);
+l4:
+   q=ch;   p1=.5*ch;   
+   if ((t=IncompleteGamma (p1, xx, g))< 0.0) 
+     {
+       printf ("IncompleteGamma \n");      
+       return (-1);
+     }
+  
+   p2=p-t;
+   t=p2*exp(xx*aa+g+p1-c*log(ch));   
+   b=t/ch;  a=0.5*t-b*c;
+
+   s1=(210+a*(140+a*(105+a*(84+a*(70+60*a))))) / 420;
+   s2=(420+a*(735+a*(966+a*(1141+1278*a))))/2520;
+   s3=(210+a*(462+a*(707+932*a)))/2520;
+   s4=(252+a*(672+1182*a)+c*(294+a*(889+1740*a)))/5040;
+   s5=(84+264*a+c*(175+606*a))/2520;
+   s6=(120+c*(346+127*c))/5040;
+   ch+=t*(1+0.5*t*s1-b*c*(s1-b*(s2-b*(s3-b*(s4-b*(s5-b*s6))))));
+   if (fabs(q/ch-1) > e) goto l4;
+
+   return (ch);
+}
+
+
+
+
+
+
+void makeGammaCats(double alpha, double *gammaRates, int K, boolean useMedian)
+{
+  int 
+    i;
+
+  double 
+    factor = alpha / alpha * K, /* i.e. alfa / beta * K, which equals K since beta == alpha */
+    lnga1, 
+    alfa = alpha, 
+    beta = alpha,
+    *gammaProbs = (double *)malloc(K * sizeof(double));
+
+  /* Note that ALPHA_MIN setting is somewhat critical due to   */
+  /* numerical instability caused by very small rate[0] values */
+  /* induced by low alpha values around 0.01 */
+
+  assert(alfa >= ALPHA_MIN); 
+
+  if(useMedian)
+    {
+      double  
+	middle = 1.0 / (2.0*K),
+	t = 0.0; 
+      
+      for(i = 0; i < K; i++)     
+	gammaRates[i] = PointGamma((double)(i * 2 + 1) * middle, alfa, beta);
+      
+      for (i = 0; i < K; i++) 
+	t += gammaRates[i];
+       for( i = 0; i < K; i++)     
+	 gammaRates[i] *= factor / t;
+    }
+  else
+    {
+      lnga1 = LnGamma(alfa + 1);
+
+      for (i = 0; i < K - 1; i++)
+	gammaProbs[i] = PointGamma((i + 1.0) / K, alfa, beta);
+
+      for (i = 0; i < K - 1; i++)
+	gammaProbs[i] = IncompleteGamma(gammaProbs[i] * beta, alfa + 1, lnga1);   
+
+      gammaRates[0] = gammaProbs[0] * factor;
+      
+      gammaRates[K - 1] = (1 - gammaProbs[K - 2]) * factor;
+
+      for (i= 1; i < K - 1; i++)  
+	gammaRates[i] = (gammaProbs[i] - gammaProbs[i - 1]) * factor;      
+    }
+  /* assert(gammaRates[0] >= 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875); */
+
+  free(gammaProbs);
+
+  return;  
+}
+
+
+static void setRates(double *r, int rates)
+{
+  int i;
+
+  //changed to 1.0 instead of 0.5 for making the 
+  //implementation of an interface function to set other models 
+  //than GTR easier 
+  
+  for(i = 0; i < rates - 1; i++)
+    r[i] = 1.0;
+  
+  r[rates - 1] = 1.0;
+}
+
+void initRateMatrix(tree *tr)
+{
+  int model;
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {	
+      int 	
+	i,
+	states = tr->partitionData[model].states,
+	rates  = (states * states - states) / 2;
+      
+      switch(tr->partitionData[model].dataType)
+	{
+	case BINARY_DATA:
+	case DNA_DATA:
+	case SECONDARY_DATA:
+	case SECONDARY_DATA_6:
+	case SECONDARY_DATA_7:
+	  setRates(tr->partitionData[model].substRates, rates);
+	  break;	  
+	case GENERIC_32:
+	case GENERIC_64:	  
+	  switch(tr->multiStateModel)
+	    {
+	    case ORDERED_MULTI_STATE:
+	      {
+		int 
+		  j, 
+		  k, 
+		  i = 0;
+		
+		for(j = 0; j < states; j++)
+		  for(k = j + 1; k < states; k++)
+		    tr->partitionData[model].substRates[i++] = (double)(k - j);			
+		assert(i == rates);		
+	      }
+	      break;
+	    case MK_MULTI_STATE:
+	      for(i = 0; i < rates; i++)
+		tr->partitionData[model].substRates[i] = 1.0;
+	      
+	      break;
+	    case GTR_MULTI_STATE:
+	      setRates(tr->partitionData[model].substRates, rates);
+	      break;
+	    default:
+	      assert(0);
+	    }
+	  break;
+	case AA_DATA:
+	  if(tr->partitionData[model].protModels == GTR)	      
+	    putWAG(tr->partitionData[model].substRates);
+	  break;
+	default:
+	  assert(0);
+	}           
+      
+      if(tr->partitionData[model].nonGTR)
+	{
+	  assert(tr->partitionData[model].dataType == SECONDARY_DATA || 
+		 tr->partitionData[model].dataType == SECONDARY_DATA_6 ||
+		 tr->partitionData[model].dataType == SECONDARY_DATA_7);
+	  	  
+	  for(i = 0; i < rates; i++)
+	    {
+	      if(tr->partitionData[model].symmetryVector[i] == -1)
+		tr->partitionData[model].substRates[i] = 0.0;
+	      else
+		{
+		  if(tr->partitionData[model].symmetryVector[i] == tr->partitionData[model].symmetryVector[rates - 1])
+		    tr->partitionData[model].substRates[i] = 1.0;
+		}
+	    }
+	}
+    }  
+}
+
+static void setSymmetry(int *s, int *sDest, const int sCount, int *f, int *fDest, const int fCount)
+{
+  int i;
+
+  for(i = 0; i < sCount; i++)
+    sDest[i] = s[i];
+
+  for(i = 0; i < fCount; i++)
+    fDest[i] = f[i];
+}
+
+static void setupSecondaryStructureSymmetries(tree *tr)
+{
+  int model;
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+      if(tr->partitionData[model].dataType == SECONDARY_DATA || 
+	 tr->partitionData[model].dataType == SECONDARY_DATA_6 || 
+	 tr->partitionData[model].dataType == SECONDARY_DATA_7)
+	{	
+	  switch(tr->secondaryStructureModel)
+	    {
+	    case SEC_6_A:
+	      tr->partitionData[model].nonGTR = FALSE;
+	      break;
+	    case SEC_6_B:
+	      {
+		int f[6]  = {0, 1, 2, 3, 4, 5};
+		int s[15] = {2, 0, 1, 2, 2, 2, 2, 0, 1, 1, 2, 2, 2, 2, 1};
+
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+		  
+		tr->partitionData[model].nonGTR = TRUE;
+	      }
+	      break;
+	    case SEC_6_C:
+	      {
+		int f[6]  = {0, 2, 2, 1, 0, 1};
+		int s[15] = {2, 0, 1, 2, 2, 2, 2, 0, 1, 1, 2, 2, 2, 2, 1};
+
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+		
+		tr->partitionData[model].nonGTR = TRUE;
+	      }
+	      break;
+	    case SEC_6_D:
+	      {
+		int f[6]  = {0, 2, 2, 1, 0, 1};
+		int s[15] = {2, -1, 1, 2, 2, 2, 2, -1, 1, 1, 2, 2, 2, 2, 1};
+
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+		tr->partitionData[model].nonGTR = TRUE;
+	      }
+	      break;
+	    case SEC_6_E:
+	      {
+		int f[6]  = {0, 1, 2, 3, 4, 5};
+		int s[15] = {2, -1, 1, 2, 2, 2, 2, -1, 1, 1, 2, 2, 2, 2, 1};
+
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+		tr->partitionData[model].nonGTR = TRUE;
+	      }
+	      break;
+	    case SEC_7_A:
+	      tr->partitionData[model].nonGTR = FALSE;
+	      break;
+	    case SEC_7_B:
+	      {
+	      	int f[7]  = {0, 2, 2, 1, 0, 1, 3};
+		int s[21] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+		tr->partitionData[model].nonGTR = TRUE;
+
+	      }
+	      break;
+	    case SEC_7_C:
+	      {
+	      	int f[7]  = {0, 1, 2, 3, 4, 5, 6};
+		int s[21] = {-1, -1, 0, -1, -1, 4, -1, -1, -1, 3, 5, 1, -1, -1, 6, -1, -1, 7, 2, 8, 9};
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+		tr->partitionData[model].nonGTR = TRUE;
+
+	      }
+	      break;
+	    case SEC_7_D:
+	      {
+	      	int f[7]  = {0, 1, 2, 3, 4, 5, 6};
+		int s[21] = {2, 0, 1, 2, 2, 3, 2, 2, 0, 1, 3, 1, 2, 2, 3, 2, 2, 3, 1, 3, 3};
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+		tr->partitionData[model].nonGTR = TRUE;
+
+	      }
+	      break;
+	    case SEC_7_E:
+	      {
+	      	int f[7]  = {0, 1, 2, 3, 4, 5, 6};
+		int s[21] = {-1, -1, 0, -1, -1, 1, -1, -1, -1, 0, 1, 0, -1, -1, 1, -1, -1, 1, 0, 1, 1};
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+		tr->partitionData[model].nonGTR = TRUE;
+
+	      }
+	      break;
+	    case SEC_7_F:
+	      {
+	      	int f[7]  = {0, 2, 2, 1, 0, 1, 3};
+		int s[21] = {2, 0, 1, 2, 2, 3, 2, 2, 0, 1, 3, 1, 2, 2, 3, 2, 2, 3, 1, 3, 3};		
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+		tr->partitionData[model].nonGTR = TRUE;
+
+	      }
+	      break;
+	      
+	    case SEC_16:
+	      tr->partitionData[model].nonGTR = FALSE;
+	      break;
+	    case SEC_16_A:
+	      {
+	      	int f[16]  = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+		int s[120] = {/* AA */  4,  4,  3,  4, -1, -1, -1,  4, -1, -1, -1,  3, -1, -1, -1,
+			      /* AC */  4,  3, -1,  4, -1, -1, -1,  3, -1, -1, -1,  4, -1, -1,
+			      /* AG */  3, -1, -1,  3, -1, -1, -1,  4, -1, -1, -1,  3, -1,
+			      /* AU */ -1, -1,  2,  3, -1,  0, -1,  1,  2, -1,  2,  3,
+			      /* CA */  4,  3,  4,  4, -1, -1, -1,  3, -1, -1, -1,
+			      /* CC */  3,  4, -1,  3, -1, -1, -1,  4, -1, -1,
+			      /* CG */  3, -1,  2,  3,  2,  0, -1,  1, -1,
+			      /* CU */ -1, -1, -1,  3, -1, -1, -1,  4,
+			      /* GA */  3,  4,  3,  3, -1, -1, -1,
+			      /* GC */  3,  1,  2,  3,  2, -1,
+			      /* GG */  3, -1, -1,  3, -1,
+			      /* GU */  2, -1,  2,  3,
+			      /* UA */  3,  1,  3,
+			      /* UC */  3,  4,
+			      /* UG */  3};
+			      
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 120, f, tr->partitionData[model].frequencyGrouping, 16);
+			      
+		tr->partitionData[model].nonGTR = TRUE;
+
+		}
+	      break;
+	    case SEC_16_B:
+	      {
+		int f[16]  = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+		int s[120] = {/* AA */  0,  0,  0,  0, -1, -1, -1,  0, -1, -1, -1,  0, -1, -1, -1,
+			      /* AC */  0,  0, -1,  0, -1, -1, -1,  0, -1, -1, -1,  0, -1, -1,
+			      /* AG */  0, -1, -1,  0, -1, -1, -1,  0, -1, -1, -1,  0, -1,
+			      /* AU */ -1, -1,  0,  0, -1,  0, -1,  0,  0, -1,  0,  0,
+			      /* CA */  0,  0,  0,  0, -1, -1, -1,  0, -1, -1, -1,
+			      /* CC */  0,  0, -1,  0, -1, -1, -1,  0, -1, -1,
+			      /* CG */  0, -1,  0,  0,  0,  0, -1,  0, -1,
+			      /* CU */ -1, -1, -1,  0, -1, -1, -1,  0,
+			      /* GA */  0,  0,  0,  0, -1, -1, -1,
+			      /* GC */  0,  0,  0,  0,  0, -1,
+			      /* GG */  0, -1, -1,  0, -1,
+			      /* GU */  0, -1,  0,  0,
+			      /* UA */  0,  0,  0,
+			      /* UC */  0,  0,
+			      /* UG */  0};
+			      
+		
+		setSymmetry(s, tr->partitionData[model].symmetryVector, 120, f, tr->partitionData[model].frequencyGrouping, 16);
+			      
+		tr->partitionData[model].nonGTR = TRUE;
+	      }
+	      break;
+	    case SEC_16_C:	      
+	    case SEC_16_D:
+	    case SEC_16_E:
+	    case SEC_16_F:
+	    case SEC_16_I:
+	    case SEC_16_J:
+	    case SEC_16_K:
+	      assert(0);
+	    default:
+	      assert(0);
+	    }
+	}
+
+    }
+
+}
+
+
+/* this function is only called once at program start-up ! */
+
+static void initializeBaseFreqs(tree *tr)
+{
+  size_t 
+    model;
+
+  for(model = 0; model < (size_t)tr->NumberOfModels; model++)
+    {      
+      if(tr->partitionData[model].optimizeBaseFrequencies)
+	{
+	  //set all base frequencies to identical starting values 1.0 / numberOfDataStates
+	  //if we want to optimize base frequencies for the current partition
+	  
+	  int 
+	    l,
+	    numFreqs = tr->partitionData[model].states;
+
+	  double 
+	    f = 1.0 / ((double)numFreqs);
+
+	  for(l = 0; l < numFreqs; l++)
+	    {		
+	      tr->partitionData[model].frequencies[l] = f;
+	      tr->partitionData[model].empiricalFrequencies[l] = f;
+	    }
+	}
+      else
+	{	 
+	  //otherwise, at startup examl reads and stores the empirical base frequencies as determined by the
+	  //parser code in .frequencies, now we just store them in .empiricalFrequencies such that we can 
+	  //overwrite .frequencies without losing the empirical base freqs
+	 
+	  memcpy(tr->partitionData[model].empiricalFrequencies, tr->partitionData[model].frequencies, sizeof(double) * tr->partitionData[model].states);	  
+	}
+    }
+}
+
+/* this function is only called once at program start-up ! */
+
+void initModel(tree *tr)
+{  
+  int 
+    model;
+
+
+  optimizeRateCategoryInvocations = 1;      
+  tr->numberOfInvariableColumns = 0;
+  tr->weightOfInvariableColumns = 0;	       
+
+  if(tr->rateHetModel == CAT)
+    {
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{            
+	  tr->partitionData[model].numberOfCategories = 1;           
+	  tr->partitionData[model].perSiteRates[0] = 1.0; 
+
+	  size_t i; 
+	  for(i = 0; i < tr->partitionData[model].width; ++i)
+	    {
+	      tr->partitionData[model].rateCategory[i] = 0; 
+	      tr->partitionData[model].patrat[i] = 1.; 
+	    }
+	}
+
+      checkPerSiteRates(tr); 
+    }
+
+  setupSecondaryStructureSymmetries(tr);
+  
+  initRateMatrix(tr); 
+
+  initializeBaseFreqs(tr);
+  
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+      int 
+	k;
+
+      tr->partitionData[model].alpha = 1.0;    
+      
+      if(tr->partitionData[model].protModels == AUTO)
+	tr->partitionData[model].autoProtModels = WAG; /* initialize by WAG per default when AUTO is used */
+                              
+      makeGammaCats(tr->partitionData[model].alpha, tr->partitionData[model].gammaRates, 4, tr->useMedian);     
+
+      for(k = 0; k < tr->partitionData[model].states; k++)
+	tr->partitionData[model].freqExponents[k] = 0.0;	
+
+      //LG4X inits 
+
+      for(k = 0; k < 4; k++)
+	{
+	  tr->partitionData[model].weights[k] = 0.25;
+	  tr->partitionData[model].weightExponents[k] = 0.0;
+	}
+
+      initReversibleGTR(tr, model);
+    }                   		           
+}
+
+
+
+
diff --git a/examl/newviewGenericSpecial.c b/examl/newviewGenericSpecial.c
new file mode 100644
index 0000000..b8e6daf
--- /dev/null
+++ b/examl/newviewGenericSpecial.c
@@ -0,0 +1,6218 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+
+#include <stdint.h>
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+
+/* required to compute the absolute values of double precision numbers with SSE3 */
+
+const union __attribute__ ((aligned (BYTE_ALIGNMENT)))
+{
+       uint64_t i[2];
+       __m128d m;
+} absMask = {{0x7fffffffffffffffULL , 0x7fffffffffffffffULL }};
+
+
+
+#endif
+
+/* includes MIC-optimized functions */
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+extern int processID;
+
+/* bit mask */
+
+extern const unsigned int mask32[32];
+
+
+/* generic function for computing the P matrices, for computing the conditional likelihood at a node p, given child nodes q and r 
+   we compute P(z1) and P(z2) here */
+
+static void makeP(double z1, double z2, double *rptr, double *EI,  double *EIGN, int numberOfCategories, double *left, double *right, boolean saveMem, int maxCat, const int states)
+{
+  int 
+    i, 
+    j, 
+    k,
+    /* square of the number of states = P-matrix size */
+    statesSquare = states * states;
+  
+  /* allocate scratch space for pre-computed values that are re-used below */
+
+  double 
+    *lz1 = (double*)malloc(sizeof(double) * states),
+    *lz2 = (double*)malloc(sizeof(double) * states),
+    *d1 = (double*)malloc(sizeof(double) * states),
+    *d2 = (double*)malloc(sizeof(double) * states);
+
+  /* multiply branch lengths with eigenvalues */
+
+  for(i = 1; i < states; i++)
+    {
+      lz1[i] = EIGN[i] * z1;
+      lz2[i] = EIGN[i] * z2;
+    }
+
+
+  /* loop over the number of rate categories, this will be 4 for the GAMMA model and 
+     variable for the CAT model */
+
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      /* exponentiate the rate multiplied by the branch */
+
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP(rptr[i] * lz1[j]);
+	  d2[j] = EXP(rptr[i] * lz2[j]);
+	}
+
+      /* now fill the P matrices for the two branch length values */
+
+      for(j = 0; j < states; j++)
+	{
+	  /* left and right are pre-allocated arrays */
+
+	  left[statesSquare * i  + states * j] = 1.0;
+	  right[statesSquare * i + states * j] = 1.0;	  
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[statesSquare * i + states * j + k]  = d1[k] * EI[states * j + k];
+	      right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+
+
+  /* if memory saving is enabled and we are using CAT we need to do one additional P matrix 
+     calculation for a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+  if(saveMem)
+    {
+      i = maxCat;
+      
+      for(j = 1; j < states; j++)
+	{
+	  d1[j] = EXP (lz1[j]);
+	  d2[j] = EXP (lz2[j]);
+	}
+
+      for(j = 0; j < states; j++)
+	{
+	  left[statesSquare * i  + states * j] = 1.0;
+	  right[statesSquare * i + states * j] = 1.0;
+
+	  for(k = 1; k < states; k++)
+	    {
+	      left[statesSquare * i + states * j + k]  = d1[k] * EI[states * j + k];
+	      right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+	    }
+	}
+    }
+  
+  /* free the temporary buffers */
+
+  free(lz1);
+  free(lz2);
+  free(d1);
+  free(d2);
+}
+
+static void makeP_FlexLG4(double z1, double z2, double *rptr, double *EI[4],  double *EIGN[4], int numberOfCategories, double *left, double *right, const int numStates)
+{
+  int 
+    i,
+    j,
+    k;
+  
+  const int
+    statesSquare = numStates * numStates;
+
+  double    
+    d1[64],  
+    d2[64];
+
+  assert(numStates <= 64);
+       
+  for(i = 0; i < numberOfCategories; i++)
+    {
+      for(j = 1; j < numStates; j++)
+	{
+	  d1[j] = EXP (rptr[i] * EIGN[i][j] * z1);
+	  d2[j] = EXP (rptr[i] * EIGN[i][j] * z2);
+	}
+
+      for(j = 0; j < numStates; j++)
+	{
+	  left[statesSquare * i  + numStates * j] = 1.0;
+	  right[statesSquare * i + numStates * j] = 1.0;
+
+	  for(k = 1; k < numStates; k++)
+	    {
+	      left[statesSquare * i + numStates * j + k]  = d1[k] * EI[i][numStates * j + k];
+	      right[statesSquare * i + numStates * j + k] = d2[k] * EI[i][numStates * j + k];
+	    }
+	}
+    }  
+}
+
+
+
+/* The functions here are organized in a similar way as in evaluateGenericSpecial.c 
+   I provide generic, slow but readable function implementations for computing the 
+   conditional likelihood arrays at p, given child nodes q and r. Once again we need 
+   two generic function implementations, one for CAT and one for GAMMA */
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+static void newviewCAT_FLEX(int tipCase, double *extEV,
+			    int *cptr,
+			    double *x1, double *x2, double *x3, double *tipVector,
+			    unsigned char *tipX1, unsigned char *tipX2,
+			    int n, double *left, double *right, int *wgt, int *scalerIncrement, const int states)
+{
+  double
+    *le, 
+    *ri, 
+    *v, 
+    *vl, 
+    *vr,
+    ump_x1, 
+    ump_x2, 
+    x1px2;
+
+  int 
+    i, 
+    l, 
+    j, 
+    scale, 
+    addScale = 0;
+
+  const int 
+    statesSquare = states * states;
+
+
+  /* here we switch over the different cases for efficiency, but also because 
+     each case accesses different data types.
+
+     We consider three cases: q and r are both tips, exactly one of q and r is a tip, or q and r 
+     are both inner nodes.
+  */
+     
+
+  switch(tipCase)
+    {
+      
+      /* both child nodes of p, where we want to update the conditional likelihood, are tips */
+    case TIP_TIP:     
+      /* loop over sites */
+      for (i = 0; i < n; i++)
+	{
+	  /* set a pointer to the P-Matrices for the rate category of this site */
+	  le = &left[cptr[i] * statesSquare];
+	  ri = &right[cptr[i] * statesSquare];
+	  
+	  /* pointers to the likelihood entries of the tips q (vl) and r (vr) 
+	     We will do reading accesses to these values only.
+	   */
+	  vl = &(tipVector[states * tipX1[i]]);
+	  vr = &(tipVector[states * tipX2[i]]);
+	  
+	  /* address of the conditional likelihood array entries at site i. This is 
+	     a writing access to v */
+	  v  = &x3[states * i];
+	  
+	  /* initialize v */
+	  for(l = 0; l < states; l++)
+	    v[l] = 0.0;
+	  	  
+	  /* loop over states to compute the cond likelihoods at p (v) */
+
+	  for(l = 0; l < states; l++)
+	    {	      
+	      ump_x1 = 0.0;
+	      ump_x2 = 0.0;
+	      
+	      /* le and ri are the P-matrices */
+
+	      for(j = 0; j < states; j++)
+		{
+		  ump_x1 += vl[j] * le[l * states + j];
+		  ump_x2 += vr[j] * ri[l * states + j];
+		}
+	      
+	      x1px2 = ump_x1 * ump_x2;
+	      
+	      /* multiply with matrix of eigenvectors extEV */
+
+	      for(j = 0; j < states; j++)
+		v[j] += x1px2 * extEV[l * states + j];
+	    }	   
+	}    
+      break;
+    case TIP_INNER:      
+
+      /* same as above, only that now vl is a tip and vr is the conditional probability vector 
+	 at an inner node. Note that, if we have the case that either q or r is a tip, the 
+	 nodes will be flipped to ensure that tipX1 always points to the sequence at the tip.
+      */
+
+      for (i = 0; i < n; i++)
+	{
+	  le = &left[cptr[i] * statesSquare];
+	  ri = &right[cptr[i] * statesSquare];
+	  
+	  /* access tip vector lookup table */
+	  vl = &(tipVector[states * tipX1[i]]);
+
+	  /* access conditional likelihood arrays */
+	  /* again, vl and vr are reading accesses, while v is a writing access */
+	  vr = &x2[states * i];
+	  v  = &x3[states * i];
+	  
+	  /* same as in the loop above */
+
+	  for(l = 0; l < states; l++)
+	    v[l] = 0.0;
+	  
+	  for(l = 0; l < states; l++)
+	    {
+	      ump_x1 = 0.0;
+	      ump_x2 = 0.0;
+	      
+	      for(j = 0; j < states; j++)
+		{
+		  ump_x1 += vl[j] * le[l * states + j];
+		  ump_x2 += vr[j] * ri[l * states + j];
+		}
+	      
+	      x1px2 = ump_x1 * ump_x2;
+	      
+	      for(j = 0; j < states; j++)
+		v[j] += x1px2 * extEV[l * states + j];
+	    }
+	  
+	  /* now let's check for numerical scaling. 
+	     The maths in RAxML are a bit non-standard to avoid/economize on arithmetic operations 
+	     at the virtual root and for branch length optimization and hence values stored 
+	     in the conditional likelihood vectors can become negative.
+	     Below we check if all absolute values stored at position i of v are smaller 
+	     than a pre-defined value in axml.h. If they are all smaller we can then safely 
+	     multiply them by a large, constant number twotothe256 (without numerical overflow) 
+	     that is also specified in axml.h */
+
+	  scale = 1;
+	  for(l = 0; scale && (l < states); l++)
+	    scale = ((v[l] < minlikelihood) && (v[l] > minusminlikelihood));	   
+	  
+	  if(scale)
+	    {
+	      for(l = 0; l < states; l++)
+		v[l] *= twotothe256;
+	      
+	      /* if we have scaled the entries to prevent underflow, we need to keep track of how many scaling 
+		 multiplications we did per node such as to undo them at the virtual root, e.g., in 
+		 evaluateGeneric() 
+		 Note here, that, if we scaled the site we need to increment the scaling counter by the weight, i.e., 
+		 the number of sites this potentially compressed pattern represents ! */ 
+
+	      addScale += wgt[i];	  
+	    }
+	}   
+      break;
+    case INNER_INNER:
+      
+      /* same as above, only that the two child nodes q and r are now inner nodes */
+
+      for(i = 0; i < n; i++)
+	{
+	  le = &left[cptr[i] * statesSquare];
+	  ri = &right[cptr[i] * statesSquare];
+
+	  /* index conditional likelihood vectors of inner nodes */
+
+	  vl = &x1[states * i];
+	  vr = &x2[states * i];
+	  v = &x3[states * i];
+
+	  for(l = 0; l < states; l++)
+	    v[l] = 0.0;
+	 
+	  for(l = 0; l < states; l++)
+	    {
+	      ump_x1 = 0.0;
+	      ump_x2 = 0.0;
+
+	      for(j = 0; j < states; j++)
+		{
+		  ump_x1 += vl[j] * le[l * states + j];
+		  ump_x2 += vr[j] * ri[l * states + j];
+		}
+
+	      x1px2 =  ump_x1 * ump_x2;
+
+	      for(j = 0; j < states; j++)
+		v[j] += x1px2 * extEV[l * states + j];	      
+	    }
+
+	   scale = 1;
+	   for(l = 0; scale && (l < states); l++)
+	     scale = ((v[l] < minlikelihood) && (v[l] > minusminlikelihood));
+  
+	   if(scale)
+	     {
+	       for(l = 0; l < states; l++)
+		 v[l] *= twotothe256;
+
+	       addScale += wgt[i];	   
+	     }
+	}
+      break;
+    default:
+      assert(0);
+    }
+   
+  /* increment the scaling counter by the additional scalings done at node p */
+
+  *scalerIncrement = addScale;
+}
+
+
+static void newviewGAMMA_FLEX(int tipCase,
+			      double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+			      unsigned char *tipX1, unsigned char *tipX2,
+			      int n, double *left, double *right, int *wgt, int *scalerIncrement, const int states, const int maxStateValue)
+{
+  double  
+    *uX1, 
+    *uX2, 
+    *v, 
+    x1px2, 
+    *vl, 
+    *vr, 
+    al, 
+    ar;
+  
+  int  
+    i, 
+    j, 
+    l, 
+    k, 
+    scale, 
+    addScale = 0;
+
+  const int     
+    statesSquare = states * states,
+    span = states * 4,
+    /* this is required for doing some pre-computations that help to save 
+       numerical operations. What we are actually computing here are additional lookup tables 
+       for each possible state a certain data-type can assume.
+       for DNA with ambiguity coding this is 15, for proteins this is 22 or 23, since there 
+       also exist one or two ambiguity codes for protein data.
+       Essentially this is very similar to the tip vectors which we also use as lookup tables */
+    precomputeLength = maxStateValue * span;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	/* allocate pre-compute memory space */
+
+	double 
+	  *umpX1 = (double*)malloc(sizeof(double) * precomputeLength),
+	  *umpX2 = (double*)malloc(sizeof(double) * precomputeLength);
+
+	/* multiply all possible tip state vectors with the respective P-matrices 
+	 */
+
+	for(i = 0; i < maxStateValue; i++)
+	  {
+	    v = &(tipVector[states * i]);
+
+	    for(k = 0; k < span; k++)
+	      {
+
+		umpX1[span * i + k] = 0.0;
+		umpX2[span * i + k] = 0.0;
+
+		for(l = 0; l < states; l++)
+		  {
+		    umpX1[span * i + k] +=  v[l] *  left[k * states + l];
+		    umpX2[span * i + k] +=  v[l] * right[k * states + l];
+		  }
+
+	      }
+	  }
+
+	for(i = 0; i < n; i++)
+	  {
+	    /* access the precomputed arrays (pre-computed multiplication of conditional with the tip state) 
+	     */
+
+	    uX1 = &umpX1[span * tipX1[i]];
+	    uX2 = &umpX2[span * tipX2[i]];
+
+	    /* loop over discrete GAMMA rates */
+
+	    for(j = 0; j < 4; j++)
+	      {
+		/* the rest is the same as for CAT */
+		v = &x3[i * span + j * states];
+
+		for(k = 0; k < states; k++)
+		  v[k] = 0.0;
+
+		for(k = 0; k < states; k++)
+		  {		   
+		    x1px2 = uX1[j * states + k] * uX2[j * states + k];
+		   
+		    for(l = 0; l < states; l++)		      					
+		      v[l] += x1px2 * extEV[states * k + l];		     
+		  }
+
+	      }	   
+	  }
+	
+	/* free precomputed vectors */
+
+	free(umpX1);
+	free(umpX2);
+      }
+      break;
+    case TIP_INNER:
+      {
+	/* we do analogous pre-computations as above, with the only difference that we now do them 
+	   only for one tip vector */
+
+	double 
+	  *umpX1 = (double*)malloc(sizeof(double) * precomputeLength),
+	  *ump_x2 = (double*)malloc(sizeof(double) * states);
+
+	/* precompute P and left tip vector product */
+
+	for(i = 0; i < maxStateValue; i++)
+	  {
+	    v = &(tipVector[states * i]);
+
+	    for(k = 0; k < span; k++)
+	      {
+  
+		umpX1[span * i + k] = 0.0;
+
+		for(l = 0; l < states; l++)
+		  umpX1[span * i + k] +=  v[l] * left[k * states + l];
+
+
+	      }
+	  }
+
+	for (i = 0; i < n; i++)
+	  {
+	    /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+
+	    uX1 = &umpX1[span * tipX1[i]];
+
+	    /* loop over discrete GAMMA rates */
+
+	    for(k = 0; k < 4; k++)
+	      {
+		v = &(x2[span * i + k * states]);
+
+		for(l = 0; l < states; l++)
+		  {
+		    ump_x2[l] = 0.0;
+
+		    for(j = 0; j < states; j++)
+		      ump_x2[l] += v[j] * right[k * statesSquare + l * states + j];
+		  }
+
+		v = &(x3[span * i + states * k]);
+
+		for(l = 0; l < states; l++)
+		  v[l] = 0;
+
+		for(l = 0; l < states; l++)
+		  {
+		    x1px2 = uX1[k * states + l]  * ump_x2[l];
+		    for(j = 0; j < states; j++)
+		      v[j] += x1px2 * extEV[l * states  + j];
+		  }
+	      }
+	   
+	    /* also do numerical scaling as above. Note that here we need to scale 
+	       4 * 4 values for DNA or 4 * 20 values for protein data.
+	       If they are ALL smaller than our threshold, we scale. Note that,
+	       this can cause numerical problems with GAMMA, if the values generated 
+	       by the four discrete GAMMA rates are too different.
+
+	       For details, see: 
+	       
+	       F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees"
+
+	    */
+	    
+
+	    v = &x3[span * i];
+	    scale = 1;
+	    for(l = 0; scale && (l < span); l++)
+	      scale = (ABS(v[l]) <  minlikelihood);
+
+
+	    if (scale)
+	      {
+		for(l = 0; l < span; l++)
+		  v[l] *= twotothe256;
+	
+		addScale += wgt[i];		    
+	      }
+	  }
+
+	free(umpX1);
+	free(ump_x2);
+      }
+      break;
+    case INNER_INNER:
+
+      /* same as above, without pre-computations */
+
+      for (i = 0; i < n; i++)
+       {
+	 for(k = 0; k < 4; k++)
+	   {
+	     vl = &(x1[span * i + states * k]);
+	     vr = &(x2[span * i + states * k]);
+	     v =  &(x3[span * i + states * k]);
+
+
+	     for(l = 0; l < states; l++)
+	       v[l] = 0;
+
+
+	     for(l = 0; l < states; l++)
+	       {		 
+
+		 al = 0.0;
+		 ar = 0.0;
+
+		 for(j = 0; j < states; j++)
+		   {
+		     al += vl[j] * left[k * statesSquare + l * states + j];
+		     ar += vr[j] * right[k * statesSquare + l * states + j];
+		   }
+
+		 x1px2 = al * ar;
+
+		 for(j = 0; j < states; j++)
+		   v[j] += x1px2 * extEV[states * l + j];
+
+	       }
+	   }
+	 
+	 v = &(x3[span * i]);
+	 scale = 1;
+	 for(l = 0; scale && (l < span); l++)
+	   scale = ((ABS(v[l]) <  minlikelihood));
+
+	 if(scale)
+	   {  
+	     for(l = 0; l < span; l++)
+	       v[l] *= twotothe256;
+	     
+	     addScale += wgt[i];	    	  
+	   }
+       }
+      break;
+    default:
+      assert(0);
+    }
+
+  /* as above, increment the global counter that counts scaling multiplications by the scaling multiplications 
+     carried out for computing the likelihood array at node p */
+
+  *scalerIncrement = addScale;
+}
+
+#endif
+
+
+    
+/* The function below computes partial traversals only down to the point/node in the tree where the 
+   conditional likelihood vector summarizing a subtree is already oriented in the correct direction */
+
+void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal)
+{
+  /* if it's a tip we don't do anything */
+
+  if(isTip(p->number, maxTips))
+    return;
+
+  {
+    int 
+      i;
+    
+    /* get the left and right descendants */
+
+    nodeptr 
+      q = p->next->back,
+      r = p->next->next->back;   
+
+    /* if the left and right children are tips there is not that much to do */
+
+    if(isTip(r->number, maxTips) && isTip(q->number, maxTips))
+      {
+	/* fix the orientation of p->x */
+	
+	if (! p->x)
+	  getxnode(p);	
+	assert(p->x);
+	  
+	/* add the current node triplet p,q,r to the traversal descriptor */
+
+	ti[*counter].tipCase = TIP_TIP;
+	ti[*counter].pNumber = p->number;
+	ti[*counter].qNumber = q->number;
+	ti[*counter].rNumber = r->number;
+
+	/* copy branches to traversal descriptor */
+
+	for(i = 0; i < numBranches; i++)
+	  {	    
+	    ti[*counter].qz[i] = q->z[i];
+	    ti[*counter].rz[i] = r->z[i];
+	  }
+
+	/* increment length counter */
+
+	*counter = *counter + 1;
+      }
+    else
+      {
+	/* if either r or q are tips, flip them to make sure that the tip data is stored 
+	   for q */
+
+	if(isTip(r->number, maxTips) || isTip(q->number, maxTips))
+	  {	    
+	    if(isTip(r->number, maxTips))
+	      {
+		nodeptr 
+		  tmp = r;
+		r = q;
+		q = tmp;
+	      }
+	   
+	    /* if the orientation of the likelihood vector at r is not correct we need to re-compute it 
+	       and descend into its subtree to figure out if there are more vectors in there to re-compute and 
+	       re-orient */
+
+	    if(!r->x || !partialTraversal)
+	      computeTraversalInfo(r, ti, counter, maxTips, numBranches, partialTraversal);
+	    if(! p->x)
+	      getxnode(p);	 
+	    
+	    /* make sure that everything is consistent now */
+
+	    assert(p->x && r->x);
+
+	    /* store data for p, q, r in the traversal descriptor */
+
+	    ti[*counter].tipCase = TIP_INNER;
+	    ti[*counter].pNumber = p->number;
+	    ti[*counter].qNumber = q->number;
+	    ti[*counter].rNumber = r->number;
+
+	    for(i = 0; i < numBranches; i++)
+	      {	
+		ti[*counter].qz[i] = q->z[i];
+		ti[*counter].rz[i] = r->z[i];
+	      }
+
+	    *counter = *counter + 1;
+	  }
+	else
+	  {
+	    /* same as above, only now q and r are inner nodes. Hence if they are not 
+	       oriented correctly they will need to be recomputed and we need to descend into the 
+	       respective subtrees to check if everything is consistent in there, potentially expanding 
+	       the traversal descriptor */
+	   
+	    if(! q->x || !partialTraversal)
+	      computeTraversalInfo(q, ti, counter, maxTips, numBranches, partialTraversal);
+	    if(! r->x || !partialTraversal)
+	      computeTraversalInfo(r, ti, counter, maxTips, numBranches, partialTraversal);
+	    if(! p->x)
+	      getxnode(p);
+	     
+	    /* check that the vector orientations are consistent now */
+
+	    assert(p->x && r->x && q->x);
+
+	    ti[*counter].tipCase = INNER_INNER;
+	    ti[*counter].pNumber = p->number;
+	    ti[*counter].qNumber = q->number;
+	    ti[*counter].rNumber = r->number;
+
+	    for(i = 0; i < numBranches; i++)
+	      {	
+		ti[*counter].qz[i] = q->z[i];
+		ti[*counter].rz[i] = r->z[i];
+	      }
+
+	    *counter = *counter + 1;
+	  }
+      }
+  }
+}
+
+/* below are the optimized, unrolled, and vectorized versions of the above generic functions 
+   for computing the conditional likelihood at p given child nodes q and r. The actual implementation is located at the end/bottom of this 
+   file.
+*/
+
+#if (defined(_OPTIMIZED_FUNCTIONS) && !defined(__AVX))
+
+static void newviewGTRGAMMAPROT_LG4(int tipCase,
+				    double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+				    int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				    int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+static void newviewGTRGAMMA_GAPPED_SAVE(int tipCase,
+					double *x1_start, double *x2_start, double *x3_start,
+					double *EV, double *tipVector,
+					unsigned char *tipX1, unsigned char *tipX2,
+					const int n, double *left, double *right, int *wgt, int *scalerIncrement, 
+					unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+					double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn);
+
+static void newviewGTRGAMMA(int tipCase,
+			    double *x1_start, double *x2_start, double *x3_start,
+			    double *EV, double *tipVector,
+			    unsigned char *tipX1, unsigned char *tipX2,
+			    const int n, double *left, double *right, int *wgt, int *scalerIncrement
+			    );
+
+static void newviewGTRCAT( int tipCase,  double *EV,  int *cptr,
+			   double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+			   unsigned char *tipX1, unsigned char *tipX2,
+			   int n,  double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+static void newviewGTRCAT_SAVE( int tipCase,  double *EV,  int *cptr,
+				double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+				unsigned char *tipX1, unsigned char *tipX2,
+				int n,  double *left, double *right, int *wgt, int *scalerIncrement,
+				unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+static void newviewGTRGAMMAPROT_GAPPED_SAVE(int tipCase,
+					    double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+					    unsigned char *tipX1, unsigned char *tipX2,
+					    int n, double *left, double *right, int *wgt, int *scalerIncrement, 
+					    unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,  
+					    double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+					    );
+
+static void newviewGTRGAMMAPROT(int tipCase,
+				double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+				unsigned char *tipX1, unsigned char *tipX2,
+				int n, double *left, double *right, int *wgt, int *scalerIncrement);
+static void newviewGTRCATPROT(int tipCase, double *extEV,
+			      int *cptr,
+			      double *x1, double *x2, double *x3, double *tipVector,
+			      unsigned char *tipX1, unsigned char *tipX2,
+			      int n, double *left, double *right, int *wgt, int *scalerIncrement );
+
+static void newviewGTRCATPROT_SAVE(int tipCase, double *extEV,
+				   int *cptr,
+				   double *x1, double *x2, double *x3, double *tipVector,
+				   unsigned char *tipX1, unsigned char *tipX2,
+				   int n, double *left, double *right, int *wgt, int *scalerIncrement,
+				   unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				   double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+#endif
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+static void newviewGTRCAT_BINARY( int tipCase,  double *EV,  int *cptr,
+                                  double *x1_start,  double *x2_start,  double *x3_start,  double *tipVector,
+                                  int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+                                  int n,  double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+static void newviewGTRGAMMA_BINARY(int tipCase,
+				   double *x1_start, double *x2_start, double *x3_start,
+				   double *EV, double *tipVector,
+				   int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				   const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling
+				   );
+
+#endif
+
+boolean isGap(unsigned int *x, int pos)
+{
+  return (x[pos / 32] & mask32[pos % 32]);
+}
+
+boolean noGap(unsigned int *x, int pos)
+{
+  return (!(x[pos / 32] & mask32[pos % 32]));
+}
+
+/* now this is the function that just iterates over the length of the traversal descriptor and 
+   just computes the conditional likelihood arrays in the order given by the descriptor.
+   So in a sense, this function has no clue that there is any tree-like structure 
+   in the traversal descriptor, it just operates on an array of structs of given length */ 
+
+
+extern const char inverseMeaningDNA[16]; 
+
+void newviewIterative (tree *tr, int startIndex)
+{
+  traversalInfo 
+    *ti   = tr->td[0].ti;
+
+  int 
+    i;
+
+    /* loop over traversal descriptor length. Note that on average we only re-compute the conditionals on 3-4
+       nodes in RAxML */
+
+  for(i = startIndex; i < tr->td[0].count; i++)
+    {
+      traversalInfo 
+	*tInfo = &ti[i];
+      
+      int 
+	model;
+      
+#ifdef _USE_OMP
+#pragma omp parallel for
+#endif
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  /* check if this partition has to be processed now - otherwise no need to compute P matrix */
+	  if(!tr->td[0].executeModel[model] || tr->partitionData[model].width == 0)
+	    continue;
+
+	  int
+	    categories,
+	    states = tr->partitionData[model].states;
+	  
+	  double
+	    qz,
+	    rz,
+	    *rateCategories,
+	    *left = tr->partitionData[model].left,
+	    *right = tr->partitionData[model].right;
+	  
+	  /* figure out what kind of rate heterogeneity approach we are using */
+	  if(tr->rateHetModel == CAT)
+	    {
+	      rateCategories = tr->partitionData[model].perSiteRates;
+	      categories = tr->partitionData[model].numberOfCategories;
+	    }
+	  else
+	    {
+	      rateCategories = tr->partitionData[model].gammaRates;
+	      categories = 4;
+	    }
+	  
+	  /* if we use per-partition branch length optimization,
+	     get the branch length of this partition (model); otherwise
+	     use the joint branch length among all partitions, which is always
+	     stored at index [0]. The log is taken below. */
+	  if(tr->numBranches > 1)
+	    {
+	      qz = tInfo->qz[model];
+	      rz = tInfo->rz[model];
+	    }
+	  else
+	    {
+	      qz = tInfo->qz[0];
+	      rz = tInfo->rz[0];
+	    }
+	  
+	  qz = (qz > zmin) ? log(qz) : log(zmin);
+	  rz = (rz > zmin) ? log(rz) : log(zmin);
+
+	  /* compute the left and right P matrices */
+#ifdef __MIC_NATIVE
+	  switch (tr->partitionData[model].states)
+	    {
+	    case 2: /* BINARY data */
+	      assert(0 && "Binary data model is not implemented on Intel MIC");
+	      break;
+	    case 4: /* DNA data */
+	      {
+		makeP_DNA_MIC(qz, rz, rateCategories,   tr->partitionData[model].EI,
+			      tr->partitionData[model].EIGN, categories,
+			      left, right, tr->saveMemory, tr->maxCategories);
+		
+		precomputeTips_DNA_MIC(tInfo->tipCase, tr->partitionData[model].tipVector,
+				       left, right,
+				       tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+				       categories);
+	      } 
+	      break;
+	    case 20: /* AA data */
+	      {
+		if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+		  {
+		    makeP_PROT_LG4_MIC(qz, rz, tr->partitionData[model].gammaRates,
+				       tr->partitionData[model].EI_LG4, tr->partitionData[model].EIGN_LG4,
+				       4, left, right);
+		    
+		    precomputeTips_PROT_LG4_MIC(tInfo->tipCase, tr->partitionData[model].tipVector_LG4,
+						left, right,
+						tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+						categories);
+		  }
+		else
+		  {
+		    makeP_PROT_MIC(qz, rz, rateCategories, tr->partitionData[model].EI,
+				   tr->partitionData[model].EIGN, categories,
+				   left, right, tr->saveMemory, tr->maxCategories);
+		    
+		    precomputeTips_PROT_MIC(tInfo->tipCase, tr->partitionData[model].tipVector,
+					    left, right,
+					    tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+					    categories);
+		  }
+	      } 
+	      break;
+	    default:
+	      assert(0);
+	    }
+#else
+	  if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+	    makeP_FlexLG4(qz, rz, tr->partitionData[model].gammaRates,
+			  tr->partitionData[model].EI_LG4,
+			  tr->partitionData[model].EIGN_LG4,
+			  4, left, right, 20);
+	  else
+	    makeP(qz, rz, rateCategories,   tr->partitionData[model].EI,
+		  tr->partitionData[model].EIGN, categories,
+		  left, right, tr->saveMemory, tr->maxCategories, states);
+#endif
+      } // for model
+
+
+      /* now loop over all partitions for nodes p, q, and r of the current traversal vector entry */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+      {
+	int
+	  m,
+	  model,
+	  maxModel;
+	
+#ifdef _USE_OMP
+	maxModel = tr->maxModelsPerThread;
+#else
+	maxModel = tr->NumberOfModels;
+#endif
+
+	for(m = 0; m < maxModel; m++)
+	  {
+	    size_t
+	      width  = 0,
+	      offset = 0;
+	    
+	    double
+	      *left     = (double*)NULL,
+	      *right    = (double*)NULL;
+	    
+	    unsigned int
+	      *globalScaler = (unsigned int*)NULL;
+
+#ifdef _USE_OMP
+	    int
+	      tid = omp_get_thread_num();
+
+	    /* check if this thread should process this partition */
+	    Assign* 
+	      pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+	    if(pAss)
+	      {
+		assert(tid == pAss->procId);
+		
+		model  = pAss->partitionId;
+		width  = pAss->width;
+		offset = pAss->offset;
+		
+		left  = tr->partitionData[model].left;
+		right = tr->partitionData[model].right;
+		globalScaler = tr->partitionData[model].threadGlobalScaler[tid];
+	      }
+	    else
+	      break;
+#else
+	    model = m;	    
+
+	    /* number of sites in this partition */
+	    width  = (size_t)tr->partitionData[model].width;
+	    offset = 0;
+
+	    /* set the pointers to the left and right P matrices to the pre-allocated memory space for storing them */
+	    
+	    left  = tr->partitionData[model].left;
+	    right = tr->partitionData[model].right;
+	    globalScaler = tr->partitionData[model].globalScaler;
+#endif
+
+	    /* this conditional statement is exactly identical to what we do in evaluateIterative */
+	    if(tr->td[0].executeModel[model] && width > 0)
+	      {	      
+		double
+		  *x1_start = (double*)NULL,
+		  *x2_start = (double*)NULL,
+		  *x3_start = (double*)NULL, //tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1],
+		  *x1_gapColumn = (double*)NULL,
+		  *x2_gapColumn = (double*)NULL,
+		  *x3_gapColumn = (double*)NULL;
+
+		int
+		  scalerIncrement = 0,
+		
+		  /* integer weight vector with pattern compression weights */
+		  *wgt = tr->partitionData[model].wgt + offset,
+
+		  /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+		  *rateCategory = tr->partitionData[model].rateCategory + offset;
+
+		unsigned int
+		  *x1_gap = (unsigned int*)NULL,
+		  *x2_gap = (unsigned int*)NULL,
+		  *x3_gap = (unsigned int*)NULL;
+
+		unsigned char
+		  *tipX1 = (unsigned char *)NULL,
+		  *tipX2 = (unsigned char *)NULL;	
+	      
+		size_t
+		  gapOffset = 0,
+		  rateHet = discreteRateCategories(tr->rateHetModel),
+		  
+		  /* get the number of states in the data stored in partition model */
+		  
+		  states = (size_t)tr->partitionData[model].states,	
+		  
+		  /* span for single alignment site (in doubles!) */
+		  span = rateHet * states,
+		  x_offset = offset * (size_t)span,
+		  
+		  
+		  /* get the length of the current likelihood array stored at node p. This is 
+		     important mainly for the SEV-based memory saving option described in here:
+		     
+		     F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees".
+		     
+		     So tr->partitionData[model].xSpaceVector[i] provides the length of the allocated conditional array of partition model 
+		     and node i 
+		  */
+		
+		  availableLength = tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)],
+		  requiredLength = 0;	     
+
+		x3_start = tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] + x_offset;
+
+		/* memory saving stuff, not important right now, but if you are interested ask Fernando */
+		if(tr->saveMemory)
+		  {
+		    size_t
+		      j,
+		      setBits = 0;		  
+		    
+		    gapOffset = states * (size_t)getUndetermined(tr->partitionData[model].dataType);
+		    
+		    x1_gap = &(tr->partitionData[model].gapVector[tInfo->qNumber * tr->partitionData[model].gapVectorLength]);
+		    x2_gap = &(tr->partitionData[model].gapVector[tInfo->rNumber * tr->partitionData[model].gapVectorLength]);
+		    x3_gap = &(tr->partitionData[model].gapVector[tInfo->pNumber * tr->partitionData[model].gapVectorLength]);		      		  
+		    
+		    for(j = 0; j < (size_t)tr->partitionData[model].gapVectorLength; j++)
+		      {		     
+			x3_gap[j] = x1_gap[j] & x2_gap[j];
+			setBits += (size_t)(precomputed16_bitcount(x3_gap[j], tr->bits_in_16bits));		      
+		      }
+		    
+		    requiredLength = (width - setBits)  * rateHet * states * sizeof(double);		
+		  }
+		else
+		  /* if we are not trying to save memory the space required to store an inner likelihood array 
+		     is the number of sites in the partition times the number of states of the data type in the partition 
+		     times the number of discrete GAMMA rates (1 for CAT essentially) times 8 bytes */
+		  requiredLength  =  width * rateHet * states * sizeof(double);
+		
+		/* Initially, even when not using memory saving, no space is allocated for inner likelihood arrays; hence
+		   availableLength will be zero the very first time we traverse the tree,
+		   and we need to allocate something here */
+#ifndef _USE_OMP
+		if(requiredLength != availableLength)
+		  {
+		    /* if there is a vector of incorrect length assigned here i.e., x3 != NULL we must free
+		       it first */
+		    if(x3_start)
+		      free(x3_start);
+		    
+		    /* allocate memory: note that here we use a byte-boundary aligned malloc, because we need the vectors
+		       to be aligned at 16 BYTE (SSE3) or 32 BYTE (AVX) boundaries! */
+		    
+		    x3_start = (double*)malloc_aligned(requiredLength);
+		    
+		    /* update the data structures for consistent bookkeeping */
+		    tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] = x3_start;
+		    tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)] = requiredLength;
+		  }
+#endif
+
+		/* now just set the pointers for data accesses in the newview() implementations above to the corresponding values 
+		   according to the tip case */
+		
+		switch(tInfo->tipCase)
+		  {
+		  case TIP_TIP:		  
+		    tipX1    = tr->partitionData[model].yVector[tInfo->qNumber] + offset;
+		    tipX2    = tr->partitionData[model].yVector[tInfo->rNumber] + offset;
+		    
+		    if(tr->saveMemory)
+		      {
+			x1_gapColumn   = &(tr->partitionData[model].tipVector[gapOffset]);
+			x2_gapColumn   = &(tr->partitionData[model].tipVector[gapOffset]);		    
+			x3_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];		    
+		      }
+		    
+		    break;
+		  case TIP_INNER:		 
+		    tipX1    =  tr->partitionData[model].yVector[tInfo->qNumber] + offset;
+		    x2_start = tr->partitionData[model].xVector[tInfo->rNumber - tr->mxtips - 1] + x_offset;
+		    
+		    if(tr->saveMemory)
+		      {	
+			x1_gapColumn   = &(tr->partitionData[model].tipVector[gapOffset]);	     
+			x2_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->rNumber - tr->mxtips - 1) * states * rateHet];
+			x3_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];
+		      }
+		    
+		    break;
+		  case INNER_INNER:		 		 
+		    x1_start       = tr->partitionData[model].xVector[tInfo->qNumber - tr->mxtips - 1] + x_offset;
+		    x2_start       = tr->partitionData[model].xVector[tInfo->rNumber - tr->mxtips - 1] + x_offset;
+		    
+		    if(tr->saveMemory)
+		      {
+			x1_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->qNumber - tr->mxtips - 1) * states * rateHet];
+			x2_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->rNumber - tr->mxtips - 1) * states * rateHet];
+			x3_gapColumn   = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];
+		      }
+		    
+		    break;
+		  default:
+		    assert(0);
+		  }
+		
+#ifndef _OPTIMIZED_FUNCTIONS
+
+	      /* memory saving is not implemented for the generic (non-optimized) functions */
+
+	      assert(!tr->saveMemory);
+
+	      /* figure out if we need to compute the CAT or GAMMA model of rate heterogeneity */
+
+	      if(tr->rateHetModel == CAT)
+		newviewCAT_FLEX(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+				x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+				tipX1, tipX2,
+				width, left, right, wgt, &scalerIncrement, states);
+	      else
+		newviewGAMMA_FLEX(tInfo->tipCase,
+				  x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+				  tipX1, tipX2,
+				  width, left, right, wgt, &scalerIncrement, states, getUndetermined(tr->partitionData[model].dataType) + 1);
+
+#else
+	      /* dedicated highly optimized functions. Analogously to the functions in evaluateGeneric()
+		 we also switch over the state number */
+
+	      switch(states)
+		{		
+		case 2:
+#ifdef __MIC_NATIVE
+ 	      assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+		  assert(!tr->saveMemory);
+		  if(tr->rateHetModel == CAT)
+		    newviewGTRCAT_BINARY(tInfo->tipCase,  tr->partitionData[model].EV,  rateCategory,
+					 x1_start,  x2_start,  x3_start, tr->partitionData[model].tipVector,
+					 (int*)NULL, tipX1, tipX2,
+					 width, left, right, wgt, &scalerIncrement, TRUE);
+		  else
+		    newviewGTRGAMMA_BINARY(tInfo->tipCase,
+					   x1_start, x2_start, x3_start,
+					   tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+					   (int *)NULL, tipX1, tipX2,
+					   width, left, right, wgt, &scalerIncrement, TRUE);		 
+#endif
+		  break;
+		case 4:	/* DNA */
+		  if(tr->rateHetModel == CAT)
+		    {		    		     
+		      if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#elif __AVX
+			newviewGTRCAT_AVX_GAPPED_SAVE(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+						      x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+						      (int*)NULL, tipX1, tipX2,
+						      width, left, right, wgt, &scalerIncrement, TRUE, x1_gap, x2_gap, x3_gap,
+						      x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#else
+			newviewGTRCAT_SAVE(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+					   x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+					   tipX1, tipX2,
+					   width, left, right, wgt, &scalerIncrement, x1_gap, x2_gap, x3_gap,
+					   x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#endif
+		      else
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#elif __AVX
+			newviewGTRCAT_AVX(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+					  x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+					  tipX1, tipX2,
+					  width, left, right, wgt, &scalerIncrement);
+#else
+			newviewGTRCAT(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+				      x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+				      tipX1, tipX2,
+				      width, left, right, wgt, &scalerIncrement);
+#endif
+		    }
+		  else
+		    {
+		      
+		       
+		       if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+		     assert(0 && "Memory saving is not implemented on Intel MIC");
+#elif __AVX
+			 newviewGTRGAMMA_AVX_GAPPED_SAVE(tInfo->tipCase,
+							 x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector, (int*)NULL,
+							 tipX1, tipX2,
+							 width, left, right, wgt, &scalerIncrement, TRUE,
+							 x1_gap, x2_gap, x3_gap, 
+							 x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#else
+		       newviewGTRGAMMA_GAPPED_SAVE(tInfo->tipCase,
+						   x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+						   tipX1, tipX2,
+						   width, left, right, wgt, &scalerIncrement, 
+						   x1_gap, x2_gap, x3_gap, 
+						   x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#endif
+		       else
+#ifdef __MIC_NATIVE
+			 newviewGTRGAMMA_MIC(tInfo->tipCase,
+				  x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].tipVector,
+				  tipX1, tipX2,
+				  width, left, right, wgt, &scalerIncrement,
+				  tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+			 newviewGTRGAMMA_AVX(tInfo->tipCase,
+					     x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+					     tipX1, tipX2,
+					     width, left, right, wgt, &scalerIncrement);
+#else
+		       newviewGTRGAMMA(tInfo->tipCase,
+					 x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+					 tipX1, tipX2,
+					 width, left, right, wgt, &scalerIncrement);
+#endif
+		    }
+		
+		  break;		    
+		case 20: /* proteins */
+
+		  if(tr->rateHetModel == CAT)
+		    {		     
+		      if(tr->saveMemory)
+			{
+#ifdef __MIC_NATIVE
+		     assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#elif __AVX
+			  newviewGTRCATPROT_AVX_GAPPED_SAVE(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+							    x1_start, x2_start, x3_start, tr->partitionData[model].tipVector, (int*)NULL,
+							    tipX1, tipX2, width, left, right, wgt, &scalerIncrement, TRUE, x1_gap, x2_gap, x3_gap,
+							    x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#else
+			  newviewGTRCATPROT_SAVE(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+						 x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+						 tipX1, tipX2, width, left, right, wgt, &scalerIncrement, x1_gap, x2_gap, x3_gap,
+						 x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#endif
+			}
+		      else
+			{			 			
+#ifdef __MIC_NATIVE
+		     assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#elif __AVX
+			  newviewGTRCATPROT_AVX(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+						x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+						tipX1, tipX2, width, left, right, wgt, &scalerIncrement);
+#else
+			  newviewGTRCATPROT(tInfo->tipCase,  tr->partitionData[model].EV, rateCategory,
+					    x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+					    tipX1, tipX2, width, left, right, wgt, &scalerIncrement);			
+#endif
+			}
+		    }
+		  else
+		    {		    			 			  
+		      if(tr->saveMemory)
+			{
+#ifdef __MIC_NATIVE
+		     assert(0 && "Memory saving is not implemented on Intel MIC");
+#elif __AVX
+			  newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(tInfo->tipCase,
+							      x1_start, x2_start, x3_start,
+							      tr->partitionData[model].EV,
+							      tr->partitionData[model].tipVector, (int*)NULL,
+							      tipX1, tipX2,
+							      width, left, right, wgt, &scalerIncrement, TRUE,
+							      x1_gap, x2_gap, x3_gap,
+							      x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#else
+			  newviewGTRGAMMAPROT_GAPPED_SAVE(tInfo->tipCase,
+							  x1_start, x2_start, x3_start,
+							  tr->partitionData[model].EV,
+							  tr->partitionData[model].tipVector,
+							  tipX1, tipX2,
+							  width, left, right, wgt, &scalerIncrement,
+							  x1_gap, x2_gap, x3_gap,
+							  x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#endif
+			}
+		      else
+			{
+			  if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+			    {
+#ifdef __MIC_NATIVE
+			      newviewGTRGAMMAPROT_LG4_MIC(tInfo->tipCase,
+							x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].mic_tipVector,
+							tipX1, tipX2,
+							width, left, right, wgt, &scalerIncrement,
+							tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+			      newviewGTRGAMMAPROT_AVX_LG4(tInfo->tipCase,
+							  x1_start, x2_start, x3_start,
+							  tr->partitionData[model].EV_LG4,
+							  tr->partitionData[model].tipVector_LG4,
+							  (int*)NULL, tipX1, tipX2,
+							  width, left, right, wgt, &scalerIncrement, TRUE);
+#else
+			      newviewGTRGAMMAPROT_LG4(tInfo->tipCase,
+						      x1_start, x2_start, x3_start,
+						      tr->partitionData[model].EV_LG4,
+						      tr->partitionData[model].tipVector_LG4,
+						      (int*)NULL, tipX1, tipX2,
+						      width, left, right, 
+						      wgt, &scalerIncrement, TRUE);
+#endif			    
+			    }
+			  else
+			    {
+#ifdef __MIC_NATIVE
+			      newviewGTRGAMMAPROT_MIC(tInfo->tipCase,
+							x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].mic_tipVector,
+							tipX1, tipX2,
+							width, left, right, wgt, &scalerIncrement,
+							tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+			      newviewGTRGAMMAPROT_AVX(tInfo->tipCase,
+						      x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+						      tipX1, tipX2,
+						      width, left, right, wgt, &scalerIncrement);
+#else
+			      newviewGTRGAMMAPROT(tInfo->tipCase,
+						  x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+						  tipX1, tipX2,
+						  width, left, right, wgt, &scalerIncrement);
+#endif
+			    }
+			}
+		    }	
+		  break;	
+		default:
+		  assert(0);
+		}
+#endif
+
+	      /* important step: here we essentially recursively compute the number of scaling multiplications
+		 at node p: it's the sum of the number of scaling multiplications already conducted
+		 for computing nodes q and r plus the scaling multiplications done at node p */
+
+	      globalScaler[tInfo->pNumber] =
+		globalScaler[tInfo->qNumber] +
+		globalScaler[tInfo->rNumber] +
+		(unsigned int)scalerIncrement;
+
+	      /* check that we are not getting an integer overflow ! */
+
+	      assert(globalScaler[tInfo->pNumber] < INT_MAX);
+	    }	
+	} // for model
+    }  // omp parallel block
+  }  // for traversal
+}
+
+
+/* here is the generic function that could be called from the user program:
+   it re-computes the vector at node p (regardless of whether its orientation is
+   correct) and then also recursively re-computes the likelihood arrays
+   in the subtrees of p as needed and if needed */
+
+void newviewGeneric (tree *tr, nodeptr p, boolean masked)
+{  
+  /* if it's a tip there is nothing to do */
+
+  if(isTip(p->number, tr->mxtips))
+    return;
+  
+  /* the first entry of the traversal descriptor is always reserved for evaluate or branch length optimization calls,
+     hence we start filling the array at the second entry with index one. This is not very nice and should be fixed 
+     at some point */
+
+  tr->td[0].count = 0;
+
+  /* compute the traversal descriptor */
+  computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+
+  /* the traversal descriptor has been recomputed -> not sure if it really always changes, something to 
+     optimize in the future */
+  tr->td[0].traversalHasChanged = TRUE;
+  
+  /* We do a masked newview, i.e., do not execute newviews for each partition, when for example
+     doing a branch length optimization on the entire tree when branches are estimated on a per-partition basis.
+
+     you may imagine that for partition 5 the branch length optimization has already converged whereas 
+     for partition 6 we still need to go over the tree again.
+
+     This is explained in more detail in:
+
+     A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009
+
+     The external boolean array tr->partitionConverged[] contains exactly that information and is copied 
+     to executeModel and subsequently to the executeMask of the traversal descriptor 
+
+  */
+
+
+  if(masked)
+    {
+      int model;
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  if(tr->partitionConverged[model])
+	    tr->executeModel[model] = FALSE;
+	  else
+	    tr->executeModel[model] = TRUE;
+	}
+    }
+
+  /* if there is something to re-compute */
+
+  if(tr->td[0].count > 0)
+    {
+      /* store execute mask in traversal descriptor */
+
+      storeExecuteMaskInTraversalDescriptor(tr);           
+      newviewIterative(tr, 0);
+    }
+
+  /* clean up */
+
+  if(masked)
+    {
+      int model;
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	tr->executeModel[model] = TRUE;
+    }
+
+  tr->td[0].traversalHasChanged = FALSE;
+}
+
+
+/* optimized function implementations */
+
+#if (defined(_OPTIMIZED_FUNCTIONS) && !defined(__AVX))
+
+static void newviewGTRGAMMA_GAPPED_SAVE(int tipCase,
+					double *x1_start, double *x2_start, double *x3_start,
+					double *EV, double *tipVector,
+					unsigned char *tipX1, unsigned char *tipX2,
+					const int n, double *left, double *right, int *wgt, int *scalerIncrement, 
+					unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap, 
+					double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn)
+{
+  int     
+    i, 
+    j, 
+    k, 
+    l,
+    addScale = 0, 
+    scaleGap = 0;
+  
+  double
+    *x1,
+    *x2,
+    *x3,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start,       
+    max,
+    maxima[2] __attribute__ ((aligned (BYTE_ALIGNMENT))),        
+    EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));      
+    
+  __m128d 
+    values[8],
+    EVV[8];  
+
+  for(k = 0; k < 4; k++)
+    for (l=0; l < 4; l++)
+      EV_t[4 * l + k] = EV[4 * k + l];
+
+  for(k = 0; k < 8; k++)
+    EVV[k] = _mm_load_pd(&EV_t[k * 2]);      
+ 
+  
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT))), *uX2, umpX2[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+	for (i = 1; i < 16; i++)
+	  {	    
+	    __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+	    __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));	   
+
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{			 	 
+		  __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+		  __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+		}
+	  
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{
+		  __m128d left1 = _mm_load_pd(&right[j*16 + k*4]);
+		  __m128d left2 = _mm_load_pd(&right[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX2[i*16 + j*4 + k], acc);
+		 
+		}
+	  }   		  
+	
+	uX1 = &umpX1[240];
+	uX2 = &umpX2[240];	   	    	    
+	
+	for (j = 0; j < 4; j++)
+	  {				 		  		  		   
+	    __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+	    __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+	    	    
+	    __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+	    __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+	    
+	    __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+	    __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );		    		    		   
+	    
+	    __m128d EV_t_l0_k0 = EVV[0];
+	    __m128d EV_t_l0_k2 = EVV[1];
+	    __m128d EV_t_l1_k0 = EVV[2];
+	    __m128d EV_t_l1_k2 = EVV[3];
+	    __m128d EV_t_l2_k0 = EVV[4];
+	    __m128d EV_t_l2_k2 = EVV[5];
+	    __m128d EV_t_l3_k0 = EVV[6]; 
+	    __m128d EV_t_l3_k2 = EVV[7];
+	    
+	    EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	    EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	    
+	    EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	    EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	    
+	    EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	    
+	    EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	    EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	    
+	    EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	    EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	    EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	    
+	    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+	    	  
+	    _mm_store_pd( &x3_gapColumn[j * 4 + 0], EV_t_l0_k0 );
+	    _mm_store_pd( &x3_gapColumn[j * 4 + 2], EV_t_l2_k0 );	   
+	  }  
+	
+       
+	x3 = x3_start;
+	
+	for (i = 0; i < n; i++)
+	  {	    
+	    if(!(x3_gap[i / 32] & mask32[i % 32]))	     
+	      {
+		uX1 = &umpX1[16 * tipX1[i]];
+		uX2 = &umpX2[16 * tipX2[i]];	   	    	    		
+		
+		for (j = 0; j < 4; j++)
+		  {				 		  		  		   
+		    __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+		    __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+		    
+		    
+		    __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+		    __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+		    
+		    
+		    //
+		    // multiply left * right
+		    //
+		    
+		    __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+		    __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );
+		    
+		    
+		    //
+		    // multiply with EV matrix (!?)
+		    //
+		    
+		    __m128d EV_t_l0_k0 = EVV[0];
+		    __m128d EV_t_l0_k2 = EVV[1];
+		    __m128d EV_t_l1_k0 = EVV[2];
+		    __m128d EV_t_l1_k2 = EVV[3];
+		    __m128d EV_t_l2_k0 = EVV[4];
+		    __m128d EV_t_l2_k2 = EVV[5];
+		    __m128d EV_t_l3_k0 = EVV[6]; 
+		    __m128d EV_t_l3_k2 = EVV[7];
+		    
+		    EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+		    EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+		    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+		    
+		    EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+		    EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+		    
+		    EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+		    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+		    
+		    EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+		    EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+		    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+		    
+		    EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+		    EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+		    EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+		    
+		    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+		    
+		    _mm_store_pd( &x3[j * 4 + 0], EV_t_l0_k0 );
+		    _mm_store_pd( &x3[j * 4 + 2], EV_t_l2_k0 );
+		  }
+		
+		x3 += 16;
+	      }
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {	
+	double 
+	  *uX1, 
+	  umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));		 
+
+	for (i = 1; i < 16; i++)
+	  {
+	    __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+	    __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));	   
+
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+		  __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);		 
+		}
+	  }
+
+	{
+	  __m128d maxv =_mm_setzero_pd();
+	  
+	  scaleGap = 0;
+	  
+	  x2 = x2_gapColumn;			 
+	  x3 = x3_gapColumn;
+	  
+	  uX1 = &umpX1[240];	     
+	  
+	  for (j = 0; j < 4; j++)
+	    {		     		   
+	      double *x2_p = &x2[j*4];
+	      double *right_k0_p = &right[j*16];
+	      double *right_k1_p = &right[j*16 + 1*4];
+	      double *right_k2_p = &right[j*16 + 2*4];
+	      double *right_k3_p = &right[j*16 + 3*4];
+	      __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+	      __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+	      
+	      __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+	      __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+	      __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+	      __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+	      __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+	      __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+	      __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+	      __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+	      	      
+	      right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	      right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	      
+	      right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	      right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	      
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	      right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	      	       
+	      right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	      right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	      	       
+	      right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	      right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	      
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	      right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+	      
+	      __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+	      __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+	      
+	      __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+	      __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+	      
+	      __m128d EV_t_l0_k0 = EVV[0];
+	      __m128d EV_t_l0_k2 = EVV[1];
+	      __m128d EV_t_l1_k0 = EVV[2];
+	      __m128d EV_t_l1_k2 = EVV[3];
+	      __m128d EV_t_l2_k0 = EVV[4];
+	      __m128d EV_t_l2_k2 = EVV[5];
+	      __m128d EV_t_l3_k0 = EVV[6]; 
+	      __m128d EV_t_l3_k2 = EVV[7];
+	      
+	      EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	      EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	      
+	      EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	      EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	      
+	      EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	      
+	      EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	      EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	      
+	      EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	      EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	      EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	      
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+	      
+	      values[j * 2]     = EV_t_l0_k0;
+	      values[j * 2 + 1] = EV_t_l2_k0;		   		   
+	      
+	      maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+	      maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));		   	     		   
+	    }
+
+	  
+	  _mm_store_pd(maxima, maxv);
+		 
+	  max = MAX(maxima[0], maxima[1]);
+	  
+	  if(max < minlikelihood)
+	    {
+	      scaleGap = 1;
+	      
+	      __m128d sv = _mm_set1_pd(twotothe256);
+	      
+	      _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+	      _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+	      _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+	      _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+	      _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+	      _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+	      _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+	      _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     	      	     
+	    }
+	  else
+	    {
+	      _mm_store_pd(&x3[0], values[0]);	   
+	      _mm_store_pd(&x3[2], values[1]);
+	      _mm_store_pd(&x3[4], values[2]);
+	      _mm_store_pd(&x3[6], values[3]);
+	      _mm_store_pd(&x3[8], values[4]);	   
+	      _mm_store_pd(&x3[10], values[5]);
+	      _mm_store_pd(&x3[12], values[6]);
+	      _mm_store_pd(&x3[14], values[7]);
+	    }
+	}		       	
+      	
+	x3 = x3_start;
+
+	for (i = 0; i < n; i++)
+	   {
+	     if((x3_gap[i / 32] & mask32[i % 32]))
+	       {	       
+		 if(scaleGap)
+		   {		    
+		       addScale += wgt[i];		     
+		   }
+	       }
+	     else
+	       {				 
+		 __m128d maxv =_mm_setzero_pd();		 
+		 
+		 if(x2_gap[i / 32] & mask32[i % 32])
+		   x2 = x2_gapColumn;
+		 else
+		   {
+		     x2 = x2_ptr;
+		     x2_ptr += 16;
+		   }
+		 		 		 
+		 uX1 = &umpX1[16 * tipX1[i]];	     
+		 
+		 
+		 for (j = 0; j < 4; j++)
+		   {		     		   
+		     double *x2_p = &x2[j*4];
+		     double *right_k0_p = &right[j*16];
+		     double *right_k1_p = &right[j*16 + 1*4];
+		     double *right_k2_p = &right[j*16 + 2*4];
+		     double *right_k3_p = &right[j*16 + 3*4];
+		     __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+		     __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+		     
+		     __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+		     __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+		     __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+		     __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+		     __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+		     __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+		     __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+		     __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+		     
+		     		     
+		     right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+		     right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+		     
+		     right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+		     right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+		     
+		     right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+		     right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+		     right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+		     
+		     
+		     right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+		     right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+		     
+		     
+		     right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+		     right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+		     
+		     right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+		     right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+		     right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+		     
+		     {
+		       //
+		       // load left side from tip vector
+		       //
+		       
+		       __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+		       __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+		       
+		       
+		       //
+		       // multiply left * right
+		       //
+		       
+		       __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+		       __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+		       
+		       
+		       //
+		       // multiply with the EV matrix (preloaded into EVV)
+		       //		   		  
+		       
+		       __m128d EV_t_l0_k0 = EVV[0];
+		       __m128d EV_t_l0_k2 = EVV[1];
+		       __m128d EV_t_l1_k0 = EVV[2];
+		       __m128d EV_t_l1_k2 = EVV[3];
+		       __m128d EV_t_l2_k0 = EVV[4];
+		       __m128d EV_t_l2_k2 = EVV[5];
+		       __m128d EV_t_l3_k0 = EVV[6]; 
+		       __m128d EV_t_l3_k2 = EVV[7];
+		       
+		       
+		       EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+		       EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+		       EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+		       
+		       EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+		       EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+		       
+		       EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+		       EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+		       
+		       EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+		       EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+		       EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+		       
+		       EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+		       EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+		       EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+		       
+		       EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+		       
+		       values[j * 2]     = EV_t_l0_k0;
+		       values[j * 2 + 1] = EV_t_l2_k0;		   		   
+			   
+		       maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+		       maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));		   
+		     }		   
+		   }
+
+	     
+		 _mm_store_pd(maxima, maxv);
+		 
+		 max = MAX(maxima[0], maxima[1]);
+		 
+		 if(max < minlikelihood)
+		   {
+		     __m128d sv = _mm_set1_pd(twotothe256);
+		     
+		     _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+		     _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+		     _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+		     _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+		     _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+		     _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+		     _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+		     _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     
+		     
+		     
+		     addScale += wgt[i];
+		    
+		   }
+		 else
+		   {
+		     _mm_store_pd(&x3[0], values[0]);	   
+		     _mm_store_pd(&x3[2], values[1]);
+		     _mm_store_pd(&x3[4], values[2]);
+		     _mm_store_pd(&x3[6], values[3]);
+		     _mm_store_pd(&x3[8], values[4]);	   
+		     _mm_store_pd(&x3[10], values[5]);
+		     _mm_store_pd(&x3[12], values[6]);
+		     _mm_store_pd(&x3[14], values[7]);
+		   }		 
+		 
+		 x3 += 16;
+	       }
+	   }
+      }
+      break;
+    case INNER_INNER:         
+      {
+	__m128d maxv =_mm_setzero_pd();
+	/* compute the shared all-gap column once; x1, x2 and x3 point at the gap columns here */
+	scaleGap = 0;
+	
+	x1 = x1_gapColumn;	     	    
+	x2 = x2_gapColumn;	    
+	x3 = x3_gapColumn;
+	
+	for (j = 0; j < 4; j++)
+	  {
+	    
+	    double *x1_p = &x1[j*4];
+	    double *left_k0_p = &left[j*16];
+	    double *left_k1_p = &left[j*16 + 1*4];
+	    double *left_k2_p = &left[j*16 + 2*4];
+	    double *left_k3_p = &left[j*16 + 3*4];
+	    
+	    __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+	    __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+	    
+	    __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+	    __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+	    __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+	    __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+	    __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+	    __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+	    __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+	    __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+	    
+	    left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	    left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	    
+	    left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	    left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	    
+	    left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	    left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	    left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	    
+	    left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	    left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	    
+	    left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	    left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	    
+	    left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	    left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	    left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	    
+	    
+	    double *x2_p = &x2[j*4];
+	    double *right_k0_p = &right[j*16];
+	    double *right_k1_p = &right[j*16 + 1*4];
+	    double *right_k2_p = &right[j*16 + 2*4];
+	    double *right_k3_p = &right[j*16 + 3*4];
+	    __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+	    __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+	    
+	    __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+	    __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+	    __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+	    __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+	    __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+	    __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+	    __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+	    __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+	    
+	    right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	    right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	    
+	    right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	    right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	    
+	    right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	    right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	    right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	    
+	    right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	    right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	    	    
+	    right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	    right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	    
+	    right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	    right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	    right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   		 		
+	    
+	    __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	    __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );		 		 	   
+	    
+	    __m128d EV_t_l0_k0 = EVV[0];
+	    __m128d EV_t_l0_k2 = EVV[1];
+	    __m128d EV_t_l1_k0 = EVV[2];
+	    __m128d EV_t_l1_k2 = EVV[3];
+	    __m128d EV_t_l2_k0 = EVV[4];
+	    __m128d EV_t_l2_k2 = EVV[5];
+	    __m128d EV_t_l3_k0 = EVV[6]; 
+	    __m128d EV_t_l3_k2 = EVV[7];
+	    
+	    EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	    EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	    
+	    EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	    EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	    
+	    EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	    
+	    EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	    EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	    
+	    EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	    EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	    EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	    
+	    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+	    
+	    
+	    values[j * 2] = EV_t_l0_k0;
+	    values[j * 2 + 1] = EV_t_l2_k0;            	   	    
+	    
+	    maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+	    maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+	  }
+		     
+	_mm_store_pd(maxima, maxv);
+	
+	max = MAX(maxima[0], maxima[1]);
+	
+	if(max < minlikelihood)
+	  {
+	    __m128d sv = _mm_set1_pd(twotothe256);
+	    
+	    scaleGap = 1;
+	    
+	    _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+	    _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+	    _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+	    _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+	    _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+	    _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+	    _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+	    _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     	    	 
+	  }
+	else
+	  {
+	    _mm_store_pd(&x3[0], values[0]);	   
+	    _mm_store_pd(&x3[2], values[1]);
+	    _mm_store_pd(&x3[4], values[2]);
+	    _mm_store_pd(&x3[6], values[3]);
+	    _mm_store_pd(&x3[8], values[4]);	   
+	    _mm_store_pd(&x3[10], values[5]);
+	    _mm_store_pd(&x3[12], values[6]);
+	    _mm_store_pd(&x3[14], values[7]);
+	  }
+      }
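+
+      /* per-site pass: sites flagged in x3_gap only update the scaling count (if the gap column was rescaled) */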
+      x3 = x3_start;
+
+     for (i = 0; i < n; i++)
+       { 
+	 if(x3_gap[i / 32] & mask32[i % 32])
+	   {	     
+	     if(scaleGap)
+	       {		 
+		 addScale += wgt[i];		 	       
+	       }
+	   }
+	 else
+	   {
+	     __m128d maxv =_mm_setzero_pd();	     	    
+	     
+	     if(x1_gap[i / 32] & mask32[i % 32])
+	       x1 = x1_gapColumn;
+	     else
+	       {
+		 x1 = x1_ptr;
+		 x1_ptr += 16;
+	       }
+	     
+	     if(x2_gap[i / 32] & mask32[i % 32])
+	       x2 = x2_gapColumn;
+	     else
+	       {
+		 x2 = x2_ptr;
+		 x2_ptr += 16;
+	       }
+	     
+	     
+	     for (j = 0; j < 4; j++)
+	       {
+		 
+		 double *x1_p = &x1[j*4];
+		 double *left_k0_p = &left[j*16];
+		 double *left_k1_p = &left[j*16 + 1*4];
+		 double *left_k2_p = &left[j*16 + 2*4];
+		 double *left_k3_p = &left[j*16 + 3*4];
+		 
+		 __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+		 __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+		 
+		 __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+		 __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+		 __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+		 __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+		 __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+		 __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+		 __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+		 __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+		 
+		 left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+		 left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+		 
+		 left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+		 left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+		 
+		 left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+		 left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+		 left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+		 
+		 left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+		 left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+		 
+		 left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+		 left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+		 
+		 left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+		 left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+		 left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+		 
+		 
+		 //
+		 // multiply/add right side
+		 //
+		 double *x2_p = &x2[j*4];
+		 double *right_k0_p = &right[j*16];
+		 double *right_k1_p = &right[j*16 + 1*4];
+		 double *right_k2_p = &right[j*16 + 2*4];
+		 double *right_k3_p = &right[j*16 + 3*4];
+		 __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+		 __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+		 
+		 __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+		 __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+		 __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+		 __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+		 __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+		 __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+		 __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+		 __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+		 
+		 right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+		 right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+		 
+		 right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+		 right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+		 
+		 right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+		 right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+		 right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+		 
+		 right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+		 right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+		 
+		 
+		 right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+		 right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+		 
+		 right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+		 right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+		 right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+		 
+		 //
+		 // multiply left * right
+		 //
+		 
+		 __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+		 __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+		 
+		 
+		 //
+		 // multiply with the EV matrix (preloaded into EVV)
+		 //	     
+		 
+		 __m128d EV_t_l0_k0 = EVV[0];
+		 __m128d EV_t_l0_k2 = EVV[1];
+		 __m128d EV_t_l1_k0 = EVV[2];
+		 __m128d EV_t_l1_k2 = EVV[3];
+		 __m128d EV_t_l2_k0 = EVV[4];
+		 __m128d EV_t_l2_k2 = EVV[5];
+		 __m128d EV_t_l3_k0 = EVV[6]; 
+		 __m128d EV_t_l3_k2 = EVV[7];
+		 
+		 
+		 EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+		 EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+		 EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+		 
+		 EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+		 EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+		 
+		 EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+		 EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+		 
+		 EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+		 EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+		 EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+		 
+		 
+		 EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+		 EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+		 EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+		 
+		 EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+		 
+		 
+		 values[j * 2] = EV_t_l0_k0;
+		 values[j * 2 + 1] = EV_t_l2_k0;            	   	    
+		 
+		 maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+		 maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+	       }
+	     
+	     
+	     _mm_store_pd(maxima, maxv);
+	     
+	     max = MAX(maxima[0], maxima[1]);
+	     
+	     if(max < minlikelihood)
+	       {
+		 __m128d sv = _mm_set1_pd(twotothe256);
+		 
+		 _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+		 _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+		 _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+		 _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+		 _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+		 _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+		 _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+		 _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     
+		 
+		 
+		 addScale += wgt[i];
+		
+	       }
+	     else
+	       {
+		 _mm_store_pd(&x3[0], values[0]);	   
+		 _mm_store_pd(&x3[2], values[1]);
+		 _mm_store_pd(&x3[4], values[2]);
+		 _mm_store_pd(&x3[6], values[3]);
+		 _mm_store_pd(&x3[8], values[4]);	   
+		 _mm_store_pd(&x3[10], values[5]);
+		 _mm_store_pd(&x3[12], values[6]);
+		 _mm_store_pd(&x3[14], values[7]);
+	       }	 
+
+	    
+		 
+	     x3 += 16;
+
+	   }
+       }
+     break;
+    default:
+      assert(0);
+    }
+  
+ 
+  *scalerIncrement = addScale;
+}
+
+
+static void newviewGTRGAMMA(int tipCase,
+			    double *x1_start, double *x2_start, double *x3_start,
+			    double *EV, double *tipVector,
+			    unsigned char *tipX1, unsigned char *tipX2,
+			    const int n, double *left, double *right, int *wgt, int *scalerIncrement
+			    )
+{
+  int 
+    i, 
+    j, 
+    k, 
+    l,
+    addScale = 0;
+  
+  double
+    *x1,
+    *x2,
+    *x3,
+    max,
+    maxima[2] __attribute__ ((aligned (BYTE_ALIGNMENT))),       
+    EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));      
+    
+  __m128d 
+    values[8],
+    EVV[8];  
+
+  for(k = 0; k < 4; k++)
+    for (l=0; l < 4; l++)
+      EV_t[4 * l + k] = EV[4 * k + l];
+
+  for(k = 0; k < 8; k++)
+    EVV[k] = _mm_load_pd(&EV_t[k * 2]);
+   
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT))), *uX2, umpX2[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
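+	/* precompute, for tip state codes 1..15, the products of the tip vector
+	   with the left and right P matrices (4 rate categories x 4 states) */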
+	for (i = 1; i < 16; i++)
+	  {
+	    __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+	    __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));	   
+
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+		  __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+		}
+	  
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{
+		  __m128d right1 = _mm_load_pd(&right[j*16 + k*4]);
+		  __m128d right2 = _mm_load_pd(&right[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(right1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(right2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX2[i*16 + j*4 + k], acc);
+		 
+		}
+	  }   	
+	  
+	for (i = 0; i < n; i++)
+	  {
+	    x3 = &x3_start[i * 16];
+
+	    
+	    uX1 = &umpX1[16 * tipX1[i]];
+	    uX2 = &umpX2[16 * tipX2[i]];	   	    	    
+	    
+	    for (j = 0; j < 4; j++)
+	       {				 		  		  		   
+		 __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+		 __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+		 				  
+		   
+		 __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+		 __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+ 		 
+
+		 //
+		 // multiply left * right
+		 //
+		 
+		 __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+		 __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );
+		 
+		 
+		 //
+		 // multiply with EV (held transposed in EVV): entry l is sum_k x1px2[k] * EV[4 * k + l]
+		 //
+		 
+		 __m128d EV_t_l0_k0 = EVV[0];
+		 __m128d EV_t_l0_k2 = EVV[1];
+		 __m128d EV_t_l1_k0 = EVV[2];
+		 __m128d EV_t_l1_k2 = EVV[3];
+		 __m128d EV_t_l2_k0 = EVV[4];
+		 __m128d EV_t_l2_k2 = EVV[5];
+		 __m128d EV_t_l3_k0 = EVV[6]; 
+		 __m128d EV_t_l3_k2 = EVV[7];
+		 
+		 EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+		 EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+		 EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+		 
+		 EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+		 EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+		 
+		 EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+		 EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+		 
+		 EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+		 EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+		 EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+		 
+		 EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+		 EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+		 EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+		 
+		 EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+		 
+		 _mm_store_pd( &x3[j * 4 + 0], EV_t_l0_k0 );
+		 _mm_store_pd( &x3[j * 4 + 2], EV_t_l2_k0 );
+	       }
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {	
+	double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
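+	/* precompute, for tip state codes 1..15, the products of the tip vector
+	   with the left P matrix (4 rate categories x 4 states) */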
+	for (i = 1; i < 16; i++)
+	  {
+	    __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+	    __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));	   
+
+	    for (j = 0; j < 4; j++)
+	      for (k = 0; k < 4; k++)
+		{		 
+		  __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+		  __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+		  
+		  __m128d acc = _mm_setzero_pd();
+
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+		  acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+		  		  
+		  acc = _mm_hadd_pd(acc, acc);
+		  _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);		 
+		}
+	  }
+
+	 for (i = 0; i < n; i++)
+	   {
+	     __m128d maxv =_mm_setzero_pd();
+	     
+	     x2 = &x2_start[i * 16];
+	     x3 = &x3_start[i * 16];
+
+	     uX1 = &umpX1[16 * tipX1[i]];	     
+
+	     for (j = 0; j < 4; j++)
+	       {
+
+		 //
+		 // multiply/add right side
+		 //
+		 double *x2_p = &x2[j*4];
+		 double *right_k0_p = &right[j*16];
+		 double *right_k1_p = &right[j*16 + 1*4];
+		 double *right_k2_p = &right[j*16 + 2*4];
+		 double *right_k3_p = &right[j*16 + 3*4];
+		 __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+		 __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+		 __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+		 __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+		 __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+		 __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+		 __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+		 __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+		 __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+		 __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+
+
+		 right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+		 right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+		 right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+		 right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+		 right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+		 right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+		 right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+
+		 right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+		 right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+
+		 right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+		 right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+		 right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+		 right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+		 right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+		 {
+		   //
+		   // load left side from tip vector
+		   //
+		   
+		   __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+		   __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+		 
+		 
+		   //
+		   // multiply left * right
+		   //
+		   
+		   __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+		   __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+		   
+		   
+		   //
+		   // multiply with EV (held transposed in EVV): entry l is sum_k x1px2[k] * EV[4 * k + l]
+		   //		   		  
+
+		   __m128d EV_t_l0_k0 = EVV[0];
+		   __m128d EV_t_l0_k2 = EVV[1];
+		   __m128d EV_t_l1_k0 = EVV[2];
+		   __m128d EV_t_l1_k2 = EVV[3];
+		   __m128d EV_t_l2_k0 = EVV[4];
+		   __m128d EV_t_l2_k2 = EVV[5];
+		   __m128d EV_t_l3_k0 = EVV[6]; 
+		   __m128d EV_t_l3_k2 = EVV[7];
+
+		   
+		   EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+		   EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+		   EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+		   
+		   EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+		   EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+		   
+		   EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+		   EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+		   
+		   EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+		   EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+		   EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+		   		   
+		   EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+		   EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+		   EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+		   
+		   EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+		   
+		   values[j * 2]     = EV_t_l0_k0;
+		   values[j * 2 + 1] = EV_t_l2_k0;		   		   
+		   
+		   maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+		   maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));		   
+		 }
+	       }
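+	     /* rescale this site by 2^256 if its largest absolute entry dropped
+	        below minlikelihood; scaling events are counted with weight wgt[i] */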
+	     _mm_store_pd(maxima, maxv);
+
+	     max = MAX(maxima[0], maxima[1]);
+
+	     if(max < minlikelihood)
+	       {
+		 __m128d sv = _mm_set1_pd(twotothe256);
+	       		       	   	 	     
+		 _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+		 _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+		 _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+		 _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+		 _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+		 _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+		 _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+		 _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     
+		 
+		 
+		 addScale += wgt[i];
+		 
+	       }
+	     else
+	       {
+		 _mm_store_pd(&x3[0], values[0]);	   
+		 _mm_store_pd(&x3[2], values[1]);
+		 _mm_store_pd(&x3[4], values[2]);
+		 _mm_store_pd(&x3[6], values[3]);
+		 _mm_store_pd(&x3[8], values[4]);	   
+		 _mm_store_pd(&x3[10], values[5]);
+		 _mm_store_pd(&x3[12], values[6]);
+		 _mm_store_pd(&x3[14], values[7]);
+	       }
+	   }
+      }
+      break;
+    case INNER_INNER:     
+     for (i = 0; i < n; i++)
+       {
+	 __m128d maxv =_mm_setzero_pd();
+	 
+
+	 x1 = &x1_start[i * 16];
+	 x2 = &x2_start[i * 16];
+	 x3 = &x3_start[i * 16];
+	 
+	 for (j = 0; j < 4; j++)
+	   {
+	     
+	     double *x1_p = &x1[j*4];
+	     double *left_k0_p = &left[j*16];
+	     double *left_k1_p = &left[j*16 + 1*4];
+	     double *left_k2_p = &left[j*16 + 2*4];
+	     double *left_k3_p = &left[j*16 + 3*4];
+	     
+	     __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+	     __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+	     
+	     __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+	     __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+	     __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+	     __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+	     __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+	     __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+	     __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+	     __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+	     
+	     left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	     left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	     
+	     left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	     left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	     
+	     left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	     left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	     left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	     
+	     left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	     left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	     
+	     left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	     left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	     
+	     left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	     left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	     left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	     
+	     
+	     //
+	     // multiply/add right side
+	     //
+	     double *x2_p = &x2[j*4];
+	     double *right_k0_p = &right[j*16];
+	     double *right_k1_p = &right[j*16 + 1*4];
+	     double *right_k2_p = &right[j*16 + 2*4];
+	     double *right_k3_p = &right[j*16 + 3*4];
+	     __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+	     __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+	     
+	     __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+	     __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+	     __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+	     __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+	     __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+	     __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+	     __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+	     __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+	     
+	     right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	     right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	     
+	     right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	     right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	     
+	     right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	     right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	     right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	     
+	     right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	     right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	     
+	     
+	     right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	     right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	     
+	     right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	     right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	     right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+
+             //
+             // multiply left * right
+             //
+
+	     __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	     __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+
+	     //
+	     // multiply with EV (held transposed in EVV): entry l is sum_k x1px2[k] * EV[4 * k + l]
+	     //
+
+	     __m128d EV_t_l0_k0 = EVV[0];
+	     __m128d EV_t_l0_k2 = EVV[1];
+	     __m128d EV_t_l1_k0 = EVV[2];
+	     __m128d EV_t_l1_k2 = EVV[3];
+	     __m128d EV_t_l2_k0 = EVV[4];
+	     __m128d EV_t_l2_k2 = EVV[5];
+	     __m128d EV_t_l3_k0 = EVV[6]; 
+	     __m128d EV_t_l3_k2 = EVV[7];
+
+
+	     EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	     EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	     EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+	     EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	     EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+	     EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	     EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+	     EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	     EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	     EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+
+	     EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	     EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	     EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+	     EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+
+	     values[j * 2] = EV_t_l0_k0;
+	     values[j * 2 + 1] = EV_t_l2_k0;
+
+	     maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+	     maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+           }
+	 	 
+	 
+	 _mm_store_pd(maxima, maxv);
+	 
+	 max = MAX(maxima[0], maxima[1]);
+	 
+	 if(max < minlikelihood)
+	   {
+	     __m128d sv = _mm_set1_pd(twotothe256);
+	       		       	   	 	     
+	     _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));	   
+	     _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+	     _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+	     _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+	     _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));	   
+	     _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+	     _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+	     _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));	     
+	     
+	    
+	     addScale += wgt[i];
+	    
+	   }
+	 else
+	   {
+	     _mm_store_pd(&x3[0], values[0]);	   
+	     _mm_store_pd(&x3[2], values[1]);
+	     _mm_store_pd(&x3[4], values[2]);
+	     _mm_store_pd(&x3[6], values[3]);
+	     _mm_store_pd(&x3[8], values[4]);	   
+	     _mm_store_pd(&x3[10], values[5]);
+	     _mm_store_pd(&x3[12], values[6]);
+	     _mm_store_pd(&x3[14], values[7]);
+	   }	 
+       }
+   
+     break;
+    default:
+      assert(0);
+    }
+  
+ 
+  *scalerIncrement = addScale;
+
+}
+static void newviewGTRCAT( int tipCase,  double *EV,  int *cptr,
+			   double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+			   unsigned char *tipX1, unsigned char *tipX2,
+			   int n,  double *left, double *right, int *wgt, int *scalerIncrement)
+{
+  double
+    *le,
+    *ri,
+    *x1,
+    *x2, 
+    *x3, 
+    EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+    
+  int 
+    i, 
+    j, 
+    scale, 
+    addScale = 0;
+   
+  __m128d
+    minlikelihood_sse = _mm_set1_pd( minlikelihood ),
+    sc = _mm_set1_pd(twotothe256),
+    EVV[8];  
+  /* transpose EV into EV_t, then load EV_t as eight __m128d (two-double) registers */
+  for(i = 0; i < 4; i++)
+    for (j=0; j < 4; j++)
+      EV_t[4 * j + i] = EV[4 * i + j];
+  
+  for(i = 0; i < 8; i++)
+    EVV[i] = _mm_load_pd(&EV_t[i * 2]);
+  
+  switch(tipCase)
+    {
+    case TIP_TIP:      
+      for (i = 0; i < n; i++)
+	{	 
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &(tipVector[4 * tipX2[i]]);
+	  
+	  x3 = &x3_start[i * 4];
+	  
+	  le =  &left[cptr[i] * 16];
+	  ri =  &right[cptr[i] * 16];
+	  
+	  __m128d x1_0 = _mm_load_pd( &x1[0] );
+	  __m128d x1_2 = _mm_load_pd( &x1[2] );
+	  
+	  __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	  __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	  __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	  __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	  __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	  __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	  __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	  __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	  
+	  left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	  left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	  
+	  left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	  left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	  
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	  left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	  
+	  left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	  left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	  
+	  left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	  left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	  
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	  left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	  
+	  __m128d x2_0 = _mm_load_pd( &x2[0] );
+	  __m128d x2_2 = _mm_load_pd( &x2[2] );
+	  
+	  __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	  __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	  __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	  __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	  __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	  __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	  __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	  __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	  
+	  right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	  right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	  
+	  right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	  right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	  
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	  right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	  
+	  right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	  right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	  
+	  right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	  right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	  
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	  right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	  
+	  __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	  __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );	  	  
+
+	  __m128d EV_t_l0_k0 = EVV[0];
+	  __m128d EV_t_l0_k2 = EVV[1];
+	  __m128d EV_t_l1_k0 = EVV[2];
+	  __m128d EV_t_l1_k2 = EVV[3];
+	  __m128d EV_t_l2_k0 = EVV[4];
+	  __m128d EV_t_l2_k2 = EVV[5];
+	  __m128d EV_t_l3_k0 = EVV[6];
+	  __m128d EV_t_l3_k2 = EVV[7];
+	  
+	  EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	  EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	  
+	  EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	  EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	  
+	  EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	  
+	  EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	  EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	  	  
+	  EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	  EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	  EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	  
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	 
+	  	  
+	  _mm_store_pd(x3, EV_t_l0_k0);
+	  _mm_store_pd(&x3[2], EV_t_l2_k0);	  	 	   	    
+	}
+      break;
+    case TIP_INNER:      
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &(tipVector[4 * tipX1[i]]);
+	  x2 = &x2_start[4 * i];
+	  x3 = &x3_start[4 * i];
+	  
+	  le =  &left[cptr[i] * 16];
+	  ri =  &right[cptr[i] * 16];
+
+	  __m128d x1_0 = _mm_load_pd( &x1[0] );
+	  __m128d x1_2 = _mm_load_pd( &x1[2] );
+	  
+	  __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	  __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	  __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	  __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	  __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	  __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	  __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	  __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	  
+	  left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	  left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	  
+	  left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	  left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	  
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	  left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	  
+	  left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	  left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	  
+	  left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	  left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	  
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	  left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	  
+	  __m128d x2_0 = _mm_load_pd( &x2[0] );
+	  __m128d x2_2 = _mm_load_pd( &x2[2] );
+	  
+	  __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	  __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	  __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	  __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	  __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	  __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	  __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	  __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	  
+	  right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	  right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	  
+	  right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	  right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	  
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	  right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	  
+	  right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	  right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	  
+	  right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	  right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	  
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	  right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	  
+	  __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	  __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+	  
+	  __m128d EV_t_l0_k0 = EVV[0];
+	  __m128d EV_t_l0_k2 = EVV[1];
+	  __m128d EV_t_l1_k0 = EVV[2];
+	  __m128d EV_t_l1_k2 = EVV[3];
+	  __m128d EV_t_l2_k0 = EVV[4];
+	  __m128d EV_t_l2_k2 = EVV[5];
+	  __m128d EV_t_l3_k0 = EVV[6];
+	  __m128d EV_t_l3_k2 = EVV[7];
+	 
+	  
+	  EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	  EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	  
+	  EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	  EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	  
+	  EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	  
+	  EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	  EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	  	  
+	  EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	  EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	  EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	  
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	  	 	    		  
+	  /* multiply by twotothe256 only if all four |values| are below minlikelihood */
+	  scale = 1;
+	  	  	  	    
+	  __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+	  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	  if(_mm_movemask_pd( v1 ) != 3)
+	    scale = 0;
+	  else
+	    {
+	      v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+	      v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	      if(_mm_movemask_pd( v1 ) != 3)
+		scale = 0;
+	    }
+	  	  
+	  if(scale)
+	    {		      
+	      _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+	      _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));	      	      
+	      
+	      
+	      addScale += wgt[i];	  
+	    }	
+	  else
+	    {
+	      _mm_store_pd(x3, EV_t_l0_k0);
+	      _mm_store_pd(&x3[2], EV_t_l2_k0);
+	    }
+	 
+	  	  
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{
+	  x1 = &x1_start[4 * i];
+	  x2 = &x2_start[4 * i];
+	  x3 = &x3_start[4 * i];
+	  
+	  le =  &left[cptr[i] * 16];
+	  ri =  &right[cptr[i] * 16];
+
+	  __m128d x1_0 = _mm_load_pd( &x1[0] );
+	  __m128d x1_2 = _mm_load_pd( &x1[2] );
+	  
+	  __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	  __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	  __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	  __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	  __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	  __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	  __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	  __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	  
+	  left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	  left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	  
+	  left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	  left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	  
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	  left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	  left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	  
+	  left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	  left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	  
+	  left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	  left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	  
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	  left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	  left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	  
+	  __m128d x2_0 = _mm_load_pd( &x2[0] );
+	  __m128d x2_2 = _mm_load_pd( &x2[2] );
+	  
+	  __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	  __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	  __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	  __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	  __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	  __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	  __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	  __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	  
+	  right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	  right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	  
+	  right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	  right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	  
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	  right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	  right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	  
+	  right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	  right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	  
+	  right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	  right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	  
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	  right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	  right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	  
+	  __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	  __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+	  
+	  __m128d EV_t_l0_k0 = EVV[0];
+	  __m128d EV_t_l0_k2 = EVV[1];
+	  __m128d EV_t_l1_k0 = EVV[2];
+	  __m128d EV_t_l1_k2 = EVV[3];
+	  __m128d EV_t_l2_k0 = EVV[4];
+	  __m128d EV_t_l2_k2 = EVV[5];
+	  __m128d EV_t_l3_k0 = EVV[6];
+	  __m128d EV_t_l3_k2 = EVV[7];
+	 
+	  
+	  EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	  EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	  
+	  EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	  EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	  
+	  EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	  EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	  
+	  EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	  EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	  	  
+	  EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	  EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	  EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	  
+	  EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	  	 	    		  	 
+
+	  scale = 1;
+	  	  
+	  __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+	  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	  if(_mm_movemask_pd( v1 ) != 3)
+	    scale = 0;
+	  else
+	    {
+	      v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+	      v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	      if(_mm_movemask_pd( v1 ) != 3)
+		scale = 0;
+	    }
+	  	  
+	  if(scale)
+	    {		      
+	      _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+	      _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));	      	      
+	      
+	      
+	      addScale += wgt[i];	  
+	    }	
+	  else
+	    {
+	      _mm_store_pd(x3, EV_t_l0_k0);
+	      _mm_store_pd(&x3[2], EV_t_l2_k0);
+	    }
+	  	  
+	}
+      break;
+    default:
+      assert(0);
+    }
+
+  
+  *scalerIncrement = addScale;
+}
+
+
+
+static void newviewGTRCAT_SAVE( int tipCase,  double *EV,  int *cptr,
+				double *x1_start, double *x2_start,  double *x3_start, double *tipVector,
+				unsigned char *tipX1, unsigned char *tipX2,
+				int n,  double *left, double *right, int *wgt, int *scalerIncrement,
+				unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+  double
+    *le,
+    *ri,
+    *x1,
+    *x2,
+    *x3,
+    *x1_ptr = x1_start,
+    *x2_ptr = x2_start, 
+    *x3_ptr = x3_start, 
+    EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+    
+  int 
+    i, 
+    j, 
+    scale, 
+    scaleGap = 0,
+    addScale = 0;
+   
+  __m128d
+    minlikelihood_sse = _mm_set1_pd( minlikelihood ),
+    sc = _mm_set1_pd(twotothe256),
+    EVV[8];  
+  
+  for(i = 0; i < 4; i++)
+    for (j=0; j < 4; j++)
+      EV_t[4 * j + i] = EV[4 * i + j];
+  
+  for(i = 0; i < 8; i++)
+    EVV[i] = _mm_load_pd(&EV_t[i * 2]);
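+  /* compute the x3 gap column once, from the x1/x2 gap columns and the left/right entries at index maxCats */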
+  {
+    x1 = x1_gapColumn;	      
+    x2 = x2_gapColumn;
+    x3 = x3_gapColumn;
+    
+    le =  &left[maxCats * 16];	     	 
+    ri =  &right[maxCats * 16];		   	  	  	  	         
+
+    __m128d x1_0 = _mm_load_pd( &x1[0] );
+    __m128d x1_2 = _mm_load_pd( &x1[2] );
+    
+    __m128d left_k0_0 = _mm_load_pd( &le[0] );
+    __m128d left_k0_2 = _mm_load_pd( &le[2] );
+    __m128d left_k1_0 = _mm_load_pd( &le[4] );
+    __m128d left_k1_2 = _mm_load_pd( &le[6] );
+    __m128d left_k2_0 = _mm_load_pd( &le[8] );
+    __m128d left_k2_2 = _mm_load_pd( &le[10] );
+    __m128d left_k3_0 = _mm_load_pd( &le[12] );
+    __m128d left_k3_2 = _mm_load_pd( &le[14] );
+    
+    left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+    left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+    
+    left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+    left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+    
+    left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+    left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+    left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+    
+    left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+    left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+    
+    left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+    left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+    
+    left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+    left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+    left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+    
+    __m128d x2_0 = _mm_load_pd( &x2[0] );
+    __m128d x2_2 = _mm_load_pd( &x2[2] );
+    
+    __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+    __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+    __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+    __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+    __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+    __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+    __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+    __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+    
+    right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+    right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+    
+    right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+    right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+    
+    right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+    right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+    right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+    
+    right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+    right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+    
+    right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+    right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+    
+    right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+    right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+    right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+    
+    __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+    __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+    
+    __m128d EV_t_l0_k0 = EVV[0];
+    __m128d EV_t_l0_k2 = EVV[1];
+    __m128d EV_t_l1_k0 = EVV[2];
+    __m128d EV_t_l1_k2 = EVV[3];
+    __m128d EV_t_l2_k0 = EVV[4];
+    __m128d EV_t_l2_k2 = EVV[5];
+    __m128d EV_t_l3_k0 = EVV[6];
+    __m128d EV_t_l3_k2 = EVV[7];
+        
+    EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+    EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+    
+    EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+    EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+    
+    EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+    EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+    
+    EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+    EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+    
+    EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+    EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+    EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+    
+    EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	  	 	    		  
+	
+    if(tipCase != TIP_TIP)
+      {    
+	scale = 1;
+	      
+	__m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+	v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	if(_mm_movemask_pd( v1 ) != 3)
+	  scale = 0;
+	else
+	  {
+	    v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+	    v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	    if(_mm_movemask_pd( v1 ) != 3)
+	      scale = 0;
+	  }
+	
+	if(scale)
+	  {		      
+	    _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+	    _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));	      	      
+	    
+	    scaleGap = TRUE;	   
+	  }	
+	else
+	  {
+	    _mm_store_pd(x3, EV_t_l0_k0);
+	    _mm_store_pd(&x3[2], EV_t_l2_k0);
+	  }
+      }
+    else
+      {
+	_mm_store_pd(x3, EV_t_l0_k0);
+	_mm_store_pd(&x3[2], EV_t_l2_k0);
+      }
+  }
+  
+
+  switch(tipCase)
+    {
+    case TIP_TIP:      
+      for (i = 0; i < n; i++)
+	{
+	  if(noGap(x3_gap, i))
+	    {
+	      x1 = &(tipVector[4 * tipX1[i]]);
+	      x2 = &(tipVector[4 * tipX2[i]]);
+	  
+	      x3 = x3_ptr;
+	  
+	      if(isGap(x1_gap, i))
+		le =  &left[maxCats * 16];
+	      else	  	  
+		le =  &left[cptr[i] * 16];	  
+	  
+	      if(isGap(x2_gap, i))
+		ri =  &right[maxCats * 16];
+	      else	 	  
+		ri =  &right[cptr[i] * 16];
+	  
+	      __m128d x1_0 = _mm_load_pd( &x1[0] );
+	      __m128d x1_2 = _mm_load_pd( &x1[2] );
+	      
+	      __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	      __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	      __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	      __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	      __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	      __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	      __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	      __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	  
+	      left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	      left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	      
+	      left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	      left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	      
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	      left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	      
+	      left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	      left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	      
+	      left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	      left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	      
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	      left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	      
+	      __m128d x2_0 = _mm_load_pd( &x2[0] );
+	      __m128d x2_2 = _mm_load_pd( &x2[2] );
+	      
+	      __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	      __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	      __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	      __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	      __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	      __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	      __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	      __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	      
+	      right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	      right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	      
+	      right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	      right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	      
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	      right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	      
+	      right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	      right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	      
+	      right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	      right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	      
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	      right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	      
+	      __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	      __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );	  	  
+	      
+	      __m128d EV_t_l0_k0 = EVV[0];
+	      __m128d EV_t_l0_k2 = EVV[1];
+	      __m128d EV_t_l1_k0 = EVV[2];
+	      __m128d EV_t_l1_k2 = EVV[3];
+	      __m128d EV_t_l2_k0 = EVV[4];
+	      __m128d EV_t_l2_k2 = EVV[5];
+	      __m128d EV_t_l3_k0 = EVV[6];
+	      __m128d EV_t_l3_k2 = EVV[7];
+	      
+	      EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	      EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	      
+	      EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	      EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	      
+	      EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	      
+	      EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	      EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	      
+	      EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	      EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	      EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	      
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	 
+	  	  
+	      _mm_store_pd(x3, EV_t_l0_k0);
+	      _mm_store_pd(&x3[2], EV_t_l2_k0);	  	 	   	    
+
+	      x3_ptr += 4;
+	    }
+	}
+      break;
+    case TIP_INNER:      
+      for (i = 0; i < n; i++)
+	{ 
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)		   		    
+		addScale += wgt[i];
+	    }
+	  else
+	    {	      
+	      x1 = &(tipVector[4 * tipX1[i]]);
+	      
+	      x2 = x2_ptr;
+	      x3 = x3_ptr;
+
+	      if(isGap(x1_gap, i))
+		le =  &left[maxCats * 16];
+	      else
+		le =  &left[cptr[i] * 16];
+
+	      if(isGap(x2_gap, i))
+		{		 
+		  ri =  &right[maxCats * 16];
+		  x2 = x2_gapColumn;
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 16];
+		  x2 = x2_ptr;
+		  x2_ptr += 4;
+		}	  	  	  	  
+
+	      __m128d x1_0 = _mm_load_pd( &x1[0] );
+	      __m128d x1_2 = _mm_load_pd( &x1[2] );
+	      
+	      __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	      __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	      __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	      __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	      __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	      __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	      __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	      __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	      
+	      left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	      left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	      
+	      left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	      left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	      
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	      left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	      
+	      left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	      left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	      
+	      left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	      left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	      
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	      left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	      
+	      __m128d x2_0 = _mm_load_pd( &x2[0] );
+	      __m128d x2_2 = _mm_load_pd( &x2[2] );
+	      
+	      __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	      __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	      __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	      __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	      __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	      __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	      __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	      __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	      
+	      right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	      right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	  
+	      right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	      right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	      
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	      right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	      
+	      right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	      right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	      
+	      right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	      right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	      
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	      right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	      
+	      __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	      __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+	      
+	      __m128d EV_t_l0_k0 = EVV[0];
+	      __m128d EV_t_l0_k2 = EVV[1];
+	      __m128d EV_t_l1_k0 = EVV[2];
+	      __m128d EV_t_l1_k2 = EVV[3];
+	      __m128d EV_t_l2_k0 = EVV[4];
+	      __m128d EV_t_l2_k2 = EVV[5];
+	      __m128d EV_t_l3_k0 = EVV[6];
+	      __m128d EV_t_l3_k2 = EVV[7];
+	      
+	      
+	      EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	      EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	      
+	      EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	      EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	      
+	      EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	      
+	      EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	      EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	      
+	      EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	      EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	      EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	      
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	  	 	    		  
+	      
+	      scale = 1;
+	      
+	      __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+	      v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	      if(_mm_movemask_pd( v1 ) != 3)
+		scale = 0;
+	      else
+		{
+		  v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}
+	  	  
+	      if(scale)
+		{		      
+		  _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+		  _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));	      	      
+		  		  
+		  addScale += wgt[i];	  
+		}	
+	      else
+		{
+		  _mm_store_pd(x3, EV_t_l0_k0);
+		  _mm_store_pd(&x3[2], EV_t_l2_k0);
+		}
+
+	      x3_ptr += 4;
+	    }
+	  	  
+	}
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+	{ 
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)		   		    
+		addScale += wgt[i];
+	    }
+	  else
+	    {	     
+	      x3 = x3_ptr;
+	  	  
+	      if(isGap(x1_gap, i))
+		{
+		  x1 = x1_gapColumn;
+		  le =  &left[maxCats * 16];
+		}
+	      else
+		{
+		  le =  &left[cptr[i] * 16];
+		  x1 = x1_ptr;
+		  x1_ptr += 4;
+		}
+
+	      if(isGap(x2_gap, i))	
+		{
+		  x2 = x2_gapColumn;
+		  ri =  &right[maxCats * 16];	    
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 16];
+		  x2 = x2_ptr;
+		  x2_ptr += 4;
+		}	 	  	  	  
+
+	      __m128d x1_0 = _mm_load_pd( &x1[0] );
+	      __m128d x1_2 = _mm_load_pd( &x1[2] );
+	      
+	      __m128d left_k0_0 = _mm_load_pd( &le[0] );
+	      __m128d left_k0_2 = _mm_load_pd( &le[2] );
+	      __m128d left_k1_0 = _mm_load_pd( &le[4] );
+	      __m128d left_k1_2 = _mm_load_pd( &le[6] );
+	      __m128d left_k2_0 = _mm_load_pd( &le[8] );
+	      __m128d left_k2_2 = _mm_load_pd( &le[10] );
+	      __m128d left_k3_0 = _mm_load_pd( &le[12] );
+	      __m128d left_k3_2 = _mm_load_pd( &le[14] );
+	      
+	      left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+	      left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+	      
+	      left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+	      left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+	      
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+	      left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+	      left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+	      
+	      left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+	      left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+	      
+	      left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+	      left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+	      
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+	      left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+	      left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+	      
+	      __m128d x2_0 = _mm_load_pd( &x2[0] );
+	      __m128d x2_2 = _mm_load_pd( &x2[2] );
+	      
+	      __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+	      __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+	      __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+	      __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+	      __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+	      __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+	      __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+	      __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+	      
+	      right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+	      right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+	      
+	      right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+	      right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+	      
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+	      right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+	      right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+	      
+	      right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+	      right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+	      
+	      right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+	      right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+	      
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+	      right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+	      right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);	   
+	      
+	      __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+	      __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+	      
+	      __m128d EV_t_l0_k0 = EVV[0];
+	      __m128d EV_t_l0_k2 = EVV[1];
+	      __m128d EV_t_l1_k0 = EVV[2];
+	      __m128d EV_t_l1_k2 = EVV[3];
+	      __m128d EV_t_l2_k0 = EVV[4];
+	      __m128d EV_t_l2_k2 = EVV[5];
+	      __m128d EV_t_l3_k0 = EVV[6];
+	      __m128d EV_t_l3_k2 = EVV[7];
+	      
+	      
+	      EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+	      EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+	      
+	      EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+	      EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+	      
+	      EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+	      EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+	      
+	      EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+	      EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+	      
+	      EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+	      EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+	      EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+	      
+	      EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );	  	 	    		  	 
+	      /* scale stays 1 only if all four entries are below minlikelihood in absolute value */
+	      scale = 1;
+	      
+	      __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+	      v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	      if(_mm_movemask_pd( v1 ) != 3)
+		scale = 0;
+	      else
+		{
+		  v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}
+	  	  
+	      if(scale)
+		{		      
+		  _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+		  _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));	      	      
+		  	      
+		  addScale += wgt[i];	  
+		}	
+	      else
+		{
+		  _mm_store_pd(x3, EV_t_l0_k0);
+		  _mm_store_pd(&x3[2], EV_t_l2_k0);
+		}
+	     
+	      x3_ptr += 4;
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+
+  
+  *scalerIncrement = addScale;
+}
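+/* GTR+GAMMA protein kernel (4 rate categories x 20 states = 80 doubles per site) that shares one gap column per node instead of storing gap sites */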
+static void newviewGTRGAMMAPROT_GAPPED_SAVE(int tipCase,
+					    double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+					    unsigned char *tipX1, unsigned char *tipX2,
+					    int n, double *left, double *right, int *wgt, int *scalerIncrement, 
+					    unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,  
+					    double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+					    )
+{
+  double  *uX1, *uX2, *v;
+  double x1px2;
+  int  i, j, l, k, scale, addScale = 0,   
+    gapScaling = 0;
+  double 
+    *vl, *vr, *x1v, *x2v,
+    *x1_ptr = x1,
+    *x2_ptr = x2,
+    *x3_ptr = x3;
+
+  
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double umpX1[1840], umpX2[1840];
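+	/* 1840 = 23 tip codes x 80 (4 rate categories x 20 states): dot products of each tip vector with the rows of left/right */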
+	for(i = 0; i < 23; i++)
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++)
+	      {
+		double *ll =  &left[k * 20];
+		double *rr =  &right[k * 20];
+		
+		__m128d umpX1v = _mm_setzero_pd();
+		__m128d umpX2v = _mm_setzero_pd();
+
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+		    umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+		umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+		
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+		_mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+	      }
+	  }
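+	/* offset 1760 = 80 * 22 selects the last tip code (presumably the undetermined/gap state); it is used to fill the shared x3 gap column */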
+	{
+	  uX1 = &umpX1[1760];
+	  uX2 = &umpX2[1760];
+
+	  for(j = 0; j < 4; j++)
+	    {
+	      v = &x3_gapColumn[j * 20];
+
+	      __m128d zero =  _mm_setzero_pd();
+	      for(k = 0; k < 20; k+=2)		  		    
+		_mm_store_pd(&v[k], zero);
+
+	      for(k = 0; k < 20; k++)
+		{ 
+		  double *eev = &extEV[k * 20];
+		  x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+		  __m128d x1px2v = _mm_set1_pd(x1px2);
+		  
+		  for(l = 0; l < 20; l+=2)
+		    {
+		      __m128d vv = _mm_load_pd(&v[l]);
+		      __m128d ee = _mm_load_pd(&eev[l]);
+		      
+		      vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+		      
+		      _mm_store_pd(&v[l], vv);
+		    }
+		}
+	    }	   
+	}	
+
+	for(i = 0; i < n; i++)
+	  {
+	    if(!(x3_gap[i / 32] & mask32[i % 32]))
+	      {
+		uX1 = &umpX1[80 * tipX1[i]];
+		uX2 = &umpX2[80 * tipX2[i]];
+		
+		for(j = 0; j < 4; j++)
+		  {
+		    v = &x3_ptr[j * 20];
+		    
+		    
+		    __m128d zero =  _mm_setzero_pd();
+		    for(k = 0; k < 20; k+=2)		  		    
+		      _mm_store_pd(&v[k], zero);
+		    
+		    for(k = 0; k < 20; k++)
+		      { 
+			double *eev = &extEV[k * 20];
+			x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+			__m128d x1px2v = _mm_set1_pd(x1px2);
+			
+			for(l = 0; l < 20; l+=2)
+			  {
+			    __m128d vv = _mm_load_pd(&v[l]);
+			    __m128d ee = _mm_load_pd(&eev[l]);
+			    
+			    vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			    
+			    _mm_store_pd(&v[l], vv);
+			  }
+		      }
+		  }	   
+		x3_ptr += 80;
+	      }
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	double umpX1[1840], ump_x2[20];
+
+
+	for(i = 0; i < 23; i++)
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++)
+	      {
+		double *ll =  &left[k * 20];
+				
+		__m128d umpX1v = _mm_setzero_pd();
+		
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));		    					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);				
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);		
+
+	      }
+	  }
+
+	{
+	  uX1 = &umpX1[1760];
+
+	  for(k = 0; k < 4; k++)
+	    {
+	      v = &(x2_gapColumn[k * 20]);
+	       
+	      for(l = 0; l < 20; l++)
+		{		   
+		  double *r =  &right[k * 400 + l * 20];
+		  __m128d ump_x2v = _mm_setzero_pd();	    
+		  
+		  for(j = 0; j < 20; j+= 2)
+		    {
+		      __m128d vv = _mm_load_pd(&v[j]);
+		      __m128d rr = _mm_load_pd(&r[j]);
+		      ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+		    }
+		  
+		  ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+		  
+		  _mm_storel_pd(&ump_x2[l], ump_x2v);		   		     
+		}
+
+	      v = &(x3_gapColumn[20 * k]);
+
+	      __m128d zero =  _mm_setzero_pd();
+	      for(l = 0; l < 20; l+=2)		  		    
+		_mm_store_pd(&v[l], zero);
+		  
+	      for(l = 0; l < 20; l++)
+		{
+		  double *eev = &extEV[l * 20];
+		  x1px2 = uX1[k * 20 + l]  * ump_x2[l];
+		  __m128d x1px2v = _mm_set1_pd(x1px2);
+		  
+		  for(j = 0; j < 20; j+=2)
+		    {
+		      __m128d vv = _mm_load_pd(&v[j]);
+		      __m128d ee = _mm_load_pd(&eev[j]);
+		      
+		      vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+		      
+		      _mm_store_pd(&v[j], vv);
+		    }		     		    
+		}			
+	      
+	    }
+	  
+	  { 
+	    v = x3_gapColumn;
+	    __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
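+	    /* scale only if every |entry| < minlikelihood; a movemask of 3 means both lanes passed the comparison */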
+	    scale = 1;
+	    for(l = 0; scale && (l < 80); l += 2)
+	      {
+		__m128d vv = _mm_load_pd(&v[l]);
+		__m128d v1 = _mm_and_pd(vv, absMask.m);
+		v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		if(_mm_movemask_pd( v1 ) != 3)
+		  scale = 0;
+	      }	    	  
+	  }
+
+
+	  if (scale)
+	    {
+	      gapScaling = 1;
+	      __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	      
+	      for(l = 0; l < 80; l+=2)
+		{
+		  __m128d ex3v = _mm_load_pd(&v[l]);		  
+		  _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		}		   		  	      	    	       
+	    }
+	}
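+	/* per-site pass: gap sites in x3 only accumulate the scaler; other sites read x2 from the gap column or from the compact x2_ptr array */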
+	for (i = 0; i < n; i++)
+	  {	    
+	    if((x3_gap[i / 32] & mask32[i % 32]))
+	       {	       
+		 if(gapScaling)
+		   {		     
+		     addScale += wgt[i];		     
+		   }
+	       }
+	     else
+	       {
+		 uX1 = &umpX1[80 * tipX1[i]];
+
+		  if(x2_gap[i / 32] & mask32[i % 32])
+		   x2v = x2_gapColumn;
+		  else
+		    {
+		      x2v = x2_ptr;
+		      x2_ptr += 80;
+		    }
+		 
+		 for(k = 0; k < 4; k++)
+		   {
+		     v = &(x2v[k * 20]);
+		     
+		     for(l = 0; l < 20; l++)
+		       {		   
+			 double *r =  &right[k * 400 + l * 20];
+			 __m128d ump_x2v = _mm_setzero_pd();	    
+			 
+			 for(j = 0; j < 20; j+= 2)
+			   {
+			     __m128d vv = _mm_load_pd(&v[j]);
+			     __m128d rr = _mm_load_pd(&r[j]);
+			     ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+			   }
+			 
+			 ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+			 
+			 _mm_storel_pd(&ump_x2[l], ump_x2v);		   		     
+		       }
+		     
+		     v = &x3_ptr[20 * k];
+		     
+		     __m128d zero =  _mm_setzero_pd();
+		     for(l = 0; l < 20; l+=2)		  		    
+		       _mm_store_pd(&v[l], zero);
+		     
+		     for(l = 0; l < 20; l++)
+		       {
+			 double *eev = &extEV[l * 20];
+			 x1px2 = uX1[k * 20 + l]  * ump_x2[l];
+			 __m128d x1px2v = _mm_set1_pd(x1px2);
+			 
+			 for(j = 0; j < 20; j+=2)
+			   {
+			     __m128d vv = _mm_load_pd(&v[j]);
+			     __m128d ee = _mm_load_pd(&eev[j]);
+			     
+			     vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			     
+			     _mm_store_pd(&v[j], vv);
+			   }		     		    
+		       }			
+		     
+		   }
+		 
+		 
+		 { 
+		   v = x3_ptr;
+		   __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+		   
+		   scale = 1;
+		   for(l = 0; scale && (l < 80); l += 2)
+		     {
+		       __m128d vv = _mm_load_pd(&v[l]);
+		       __m128d v1 = _mm_and_pd(vv, absMask.m);
+		       v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		       if(_mm_movemask_pd( v1 ) != 3)
+			 scale = 0;
+		     }	    	  
+		 }
+		 
+		 
+		 if (scale)
+		   {
+		     __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+		     
+		     for(l = 0; l < 80; l+=2)
+		       {
+			 __m128d ex3v = _mm_load_pd(&v[l]);		  
+			 _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		       }		   		  
+		     		    
+		     addScale += wgt[i];		      
+		   }
+		 
+		 x3_ptr += 80;
+	       }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      {
+	for(k = 0; k < 4; k++)
+	   {
+	     vl = &(x1_gapColumn[20 * k]);
+	     vr = &(x2_gapColumn[20 * k]);
+	     v =  &(x3_gapColumn[20 * k]);
+
+	     __m128d zero =  _mm_setzero_pd();
+	     for(l = 0; l < 20; l+=2)		  		    
+	       _mm_store_pd(&v[l], zero);
+	     
+	     for(l = 0; l < 20; l++)
+	       {		 
+		 {
+		   __m128d al = _mm_setzero_pd();
+		   __m128d ar = _mm_setzero_pd();
+
+		   double *ll   = &left[k * 400 + l * 20];
+		   double *rr   = &right[k * 400 + l * 20];
+		   double *EVEV = &extEV[20 * l];
+		   
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d lv  = _mm_load_pd(&ll[j]);
+		       __m128d rv  = _mm_load_pd(&rr[j]);
+		       __m128d vll = _mm_load_pd(&vl[j]);
+		       __m128d vrr = _mm_load_pd(&vr[j]);
+		       
+		       al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+		       ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+		     }  		 
+		       
+		   al = _mm_hadd_pd(al, al);
+		   ar = _mm_hadd_pd(ar, ar);
+		   
+		   al = _mm_mul_pd(al, ar);
+
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d vv  = _mm_load_pd(&v[j]);
+		       __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+		       vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+		       _mm_store_pd(&v[j], vv);
+		     }		  		   		  
+		 }		 
+
+	       }
+	   }
+	 
+
+	{ 
+	   v = x3_gapColumn;
+	   __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	   
+	   scale = 1;
+	   for(l = 0; scale && (l < 80); l += 2)
+	     {
+	       __m128d vv = _mm_load_pd(&v[l]);
+	       __m128d v1 = _mm_and_pd(vv, absMask.m);
+	       v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	       if(_mm_movemask_pd( v1 ) != 3)
+		 scale = 0;
+	     }	    	  
+	 }
+
+	 if (scale)
+	   {
+	     gapScaling = 1;
+	     __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	     
+	     for(l = 0; l < 80; l+=2)
+	       {
+		 __m128d ex3v = _mm_load_pd(&v[l]);		  
+		 _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+	       }		   		  
+	     
+	    	  
+	   }
+      }
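+      /* per-site pass: gap sites in x3 only accumulate the scaler; other sites read x1/x2 from the gap columns or from the compact arrays */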
+      for (i = 0; i < n; i++)
+       {
+	  if(x3_gap[i / 32] & mask32[i % 32])
+	   {	     
+	     if(gapScaling)
+	       {		
+		 addScale += wgt[i];			       
+	       }
+	   }
+	 else
+	   {
+	     if(x1_gap[i / 32] & mask32[i % 32])
+	       x1v = x1_gapColumn;
+	     else
+	       {
+		 x1v = x1_ptr;
+		 x1_ptr += 80;
+	       }
+
+	     if(x2_gap[i / 32] & mask32[i % 32])
+	       x2v = x2_gapColumn;
+	     else
+	       {
+		 x2v = x2_ptr;
+		 x2_ptr += 80;
+	       }
+
+	     for(k = 0; k < 4; k++)
+	       {
+		 vl = &(x1v[20 * k]);
+		 vr = &(x2v[20 * k]);
+		 v =  &x3_ptr[20 * k];
+		 		 
+		 __m128d zero =  _mm_setzero_pd();
+		 for(l = 0; l < 20; l+=2)		  		    
+		   _mm_store_pd(&v[l], zero);
+		 		 
+		 for(l = 0; l < 20; l++)
+		   {		 
+		     {
+		       __m128d al = _mm_setzero_pd();
+		       __m128d ar = _mm_setzero_pd();
+		       
+		       double *ll   = &left[k * 400 + l * 20];
+		       double *rr   = &right[k * 400 + l * 20];
+		       double *EVEV = &extEV[20 * l];
+		       
+		       for(j = 0; j < 20; j+=2)
+			 {
+			   __m128d lv  = _mm_load_pd(&ll[j]);
+			   __m128d rv  = _mm_load_pd(&rr[j]);
+			   __m128d vll = _mm_load_pd(&vl[j]);
+			   __m128d vrr = _mm_load_pd(&vr[j]);
+			   
+			   al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+			   ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+			 }  		 
+		       
+		       al = _mm_hadd_pd(al, al);
+		       ar = _mm_hadd_pd(ar, ar);
+		       
+		       al = _mm_mul_pd(al, ar);
+		       
+		       for(j = 0; j < 20; j+=2)
+			 {
+			   __m128d vv  = _mm_load_pd(&v[j]);
+			   __m128d EVV = _mm_load_pd(&EVEV[j]);
+			   
+			   vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+			   
+			   _mm_store_pd(&v[j], vv);
+			 }		  		   		  
+		     }		 
+		     
+		   }
+	       }
+	     
+
+	     
+	     { 
+	       v = x3_ptr;
+	       __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	       
+	       scale = 1;
+	       for(l = 0; scale && (l < 80); l += 2)
+		 {
+		   __m128d vv = _mm_load_pd(&v[l]);
+		   __m128d v1 = _mm_and_pd(vv, absMask.m);
+		   v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		   if(_mm_movemask_pd( v1 ) != 3)
+		     scale = 0;
+		 }	    	  
+	     }
+	     
+	     
+	     if (scale)
+	       {
+		 __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+		 
+		 for(l = 0; l < 80; l+=2)
+		   {
+		     __m128d ex3v = _mm_load_pd(&v[l]);		  
+		     _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		   }		   		  
+		 		
+		 addScale += wgt[i];		 	  
+	       }
+	     x3_ptr += 80;
+	   }
+       }
+      break;
+    default:
+      assert(0);
+    }
+
+ 
+  *scalerIncrement = addScale;  
+}
+
+
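+/* GTR+GAMMA protein kernel without gap saving: x1, x2 and x3 are indexed directly with 80 doubles per site */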
+static void newviewGTRGAMMAPROT(int tipCase,
+				double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+				unsigned char *tipX1, unsigned char *tipX2,
+				int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+  double  *uX1, *uX2, *v;
+  double x1px2;
+  int  i, j, l, k, scale, addScale = 0;
+  double *vl, *vr;
+
+
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double umpX1[1840], umpX2[1840];
+
+	for(i = 0; i < 23; i++)
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++)
+	      {
+		double *ll =  &left[k * 20];
+		double *rr =  &right[k * 20];
+		
+		__m128d umpX1v = _mm_setzero_pd();
+		__m128d umpX2v = _mm_setzero_pd();
+
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+		    umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+		umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+		
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+		_mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+
+	      }
+	  }
+
+	for(i = 0; i < n; i++)
+	  {
+	    uX1 = &umpX1[80 * tipX1[i]];
+	    uX2 = &umpX2[80 * tipX2[i]];
+
+	    for(j = 0; j < 4; j++)
+	      {
+		v = &x3[i * 80 + j * 20];
+
+
+		__m128d zero =  _mm_setzero_pd();
+		for(k = 0; k < 20; k+=2)		  		    
+		  _mm_store_pd(&v[k], zero);
+
+		for(k = 0; k < 20; k++)
+		  { 
+		    double *eev = &extEV[k * 20];
+		    x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+		    __m128d x1px2v = _mm_set1_pd(x1px2);
+
+		    for(l = 0; l < 20; l+=2)
+		      {
+		      	__m128d vv = _mm_load_pd(&v[l]);
+			__m128d ee = _mm_load_pd(&eev[l]);
+
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			
+			_mm_store_pd(&v[l], vv);
+		      }
+		  }
+
+
+	      }	   
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	double umpX1[1840], ump_x2[20];
+
+
+	for(i = 0; i < 23; i++)
+	  {
+	    v = &(tipVector[20 * i]);
+
+	    for(k = 0; k < 80; k++)
+	      {
+		double *ll =  &left[k * 20];
+				
+		__m128d umpX1v = _mm_setzero_pd();
+		
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));		    					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);				
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);		
+
+
+	      }
+	  }
+
+	for (i = 0; i < n; i++)
+	  {
+	    uX1 = &umpX1[80 * tipX1[i]];
+
+	    for(k = 0; k < 4; k++)
+	      {
+		v = &(x2[80 * i + k * 20]);
+	       
+		for(l = 0; l < 20; l++)
+		  {		   
+		    double *r =  &right[k * 400 + l * 20];
+		    __m128d ump_x2v = _mm_setzero_pd();	    
+		    
+		    for(j = 0; j < 20; j+= 2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			__m128d rr = _mm_load_pd(&r[j]);
+			ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+		      }
+		     
+		    ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+		    
+		    _mm_storel_pd(&ump_x2[l], ump_x2v);		   		     
+		  }
+
+		v = &(x3[80 * i + 20 * k]);
+
+		__m128d zero =  _mm_setzero_pd();
+		for(l = 0; l < 20; l+=2)		  		    
+		  _mm_store_pd(&v[l], zero);
+		  
+		for(l = 0; l < 20; l++)
+		  {
+		    double *eev = &extEV[l * 20];
+		    x1px2 = uX1[k * 20 + l]  * ump_x2[l];
+		    __m128d x1px2v = _mm_set1_pd(x1px2);
+		  
+		    for(j = 0; j < 20; j+=2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			__m128d ee = _mm_load_pd(&eev[j]);
+			
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			
+			_mm_store_pd(&v[j], vv);
+		      }		     		    
+		  }			
+
+	      }
+	   
+
+	    { 
+	      v = &(x3[80 * i]);
+	      __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	      
+	      scale = 1;
+	      for(l = 0; scale && (l < 80); l += 2)
+		{
+		  __m128d vv = _mm_load_pd(&v[l]);
+		  __m128d v1 = _mm_and_pd(vv, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}	    	  
+	    }
+
+
+	    if (scale)
+	      {
+
+	       __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	       
+	       for(l = 0; l < 80; l+=2)
+		 {
+		   __m128d ex3v = _mm_load_pd(&v[l]);		  
+		   _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		 }		   		  
+
+
+	
+		addScale += wgt[i];
+		       
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+       {
+	 for(k = 0; k < 4; k++)
+	   {
+	     vl = &(x1[80 * i + 20 * k]);
+	     vr = &(x2[80 * i + 20 * k]);
+	     v =  &(x3[80 * i + 20 * k]);
+
+
+	     __m128d zero =  _mm_setzero_pd();
+	     for(l = 0; l < 20; l+=2)		  		    
+	       _mm_store_pd(&v[l], zero);
+
+
+	     for(l = 0; l < 20; l++)
+	       {		 
+
+		 {
+		   __m128d al = _mm_setzero_pd();
+		   __m128d ar = _mm_setzero_pd();
+
+		   double *ll   = &left[k * 400 + l * 20];
+		   double *rr   = &right[k * 400 + l * 20];
+		   double *EVEV = &extEV[20 * l];
+		   
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d lv  = _mm_load_pd(&ll[j]);
+		       __m128d rv  = _mm_load_pd(&rr[j]);
+		       __m128d vll = _mm_load_pd(&vl[j]);
+		       __m128d vrr = _mm_load_pd(&vr[j]);
+		       
+		       al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+		       ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+		     }  		 
+		       
+		   al = _mm_hadd_pd(al, al);
+		   ar = _mm_hadd_pd(ar, ar);
+		   
+		   al = _mm_mul_pd(al, ar);
+
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d vv  = _mm_load_pd(&v[j]);
+		       __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+		       vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+		       _mm_store_pd(&v[j], vv);
+		     }		  		   		  
+		 }		 
+
+	       }
+	   }
+	 
+
+
+	 { 
+	   v = &(x3[80 * i]);
+	   __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	   
+	   scale = 1;
+	   for(l = 0; scale && (l < 80); l += 2)
+	     {
+	       __m128d vv = _mm_load_pd(&v[l]);
+	       __m128d v1 = _mm_and_pd(vv, absMask.m);
+	       v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	       if(_mm_movemask_pd( v1 ) != 3)
+		 scale = 0;
+	     }	    	  
+	 }
+
+
+	 if (scale)
+	   {
+
+	       __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	       
+	       for(l = 0; l < 80; l+=2)
+		 {
+		   __m128d ex3v = _mm_load_pd(&v[l]);		  
+		   _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		 }		   		  
+
+
+	    
+	     addScale += wgt[i];
+	      
+	   }
+       }
+      break;
+    default:
+      assert(0);
+    }
+
+  
+  *scalerIncrement = addScale;
+
+}
+
+
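+/* GTR+CAT protein kernel: one rate category per site (cptr[i]), 20 doubles per site, one 20x20 left/right matrix per category */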
+static void newviewGTRCATPROT(int tipCase, double *extEV,
+			      int *cptr,
+			      double *x1, double *x2, double *x3, double *tipVector,
+			      unsigned char *tipX1, unsigned char *tipX2,
+			      int n, double *left, double *right, int *wgt, int *scalerIncrement )
+{
+  double
+    *le, *ri, *v, *vl, *vr;
+
+  int i, l, j, scale, addScale = 0;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	for (i = 0; i < n; i++)
+	  {
+	    le = &left[cptr[i] * 400];
+	    ri = &right[cptr[i] * 400];
+
+	    vl = &(tipVector[20 * tipX1[i]]);
+	    vr = &(tipVector[20 * tipX2[i]]);
+	    v  = &x3[20 * i];
+
+	    for(l = 0; l < 20; l+=2)
+	      _mm_store_pd(&v[l], _mm_setzero_pd());	      		
+
+
+	    for(l = 0; l < 20; l++)
+	      {
+		__m128d x1v = _mm_setzero_pd();
+		__m128d x2v = _mm_setzero_pd();	 
+		double 
+		  *ev = &extEV[l * 20],
+		  *lv = &le[l * 20],
+		  *rv = &ri[l * 20];
+
+		for(j = 0; j < 20; j+=2)
+		  {
+		    x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+		    x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		  }
+
+		x1v = _mm_hadd_pd(x1v, x1v);
+		x2v = _mm_hadd_pd(x2v, x2v);
+
+		x1v = _mm_mul_pd(x1v, x2v);
+		
+		for(j = 0; j < 20; j+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[j]);
+		    vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+		    _mm_store_pd(&v[j], vv);
+		  }		    
+
+	      }	   
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	for (i = 0; i < n; i++)
+	  {
+	    le = &left[cptr[i] * 400];
+	    ri = &right[cptr[i] * 400];
+
+	    vl = &(tipVector[20 * tipX1[i]]);
+	    vr = &x2[20 * i];
+	    v  = &x3[20 * i];
+
+	    for(l = 0; l < 20; l+=2)
+	      _mm_store_pd(&v[l], _mm_setzero_pd());	      		
+
+	   
+
+	    for(l = 0; l < 20; l++)
+	      {
+
+		__m128d x1v = _mm_setzero_pd();
+		__m128d x2v = _mm_setzero_pd();	
+		double 
+		  *ev = &extEV[l * 20],
+		  *lv = &le[l * 20],
+		  *rv = &ri[l * 20];
+
+		for(j = 0; j < 20; j+=2)
+		  {
+		    x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+		    x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		  }
+
+		x1v = _mm_hadd_pd(x1v, x1v);
+		x2v = _mm_hadd_pd(x2v, x2v);
+
+		x1v = _mm_mul_pd(x1v, x2v);
+		
+		for(j = 0; j < 20; j+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[j]);
+		    vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+		    _mm_store_pd(&v[j], vv);
+		  }		    
+
+	      }
+
+	    { 	    
+	      __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	      
+	      scale = 1;
+	      for(l = 0; scale && (l < 20); l += 2)
+		{
+		  __m128d vv = _mm_load_pd(&v[l]);
+		  __m128d v1 = _mm_and_pd(vv, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}	    	  
+	    }
+
+
+	    if(scale)
+	      {
+
+		__m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d ex3v = _mm_load_pd(&v[l]);
+		    _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));		    
+		  }
+	
+		addScale += wgt[i];	  
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{
+	  le = &left[cptr[i] * 400];
+	  ri = &right[cptr[i] * 400];
+
+	  vl = &x1[20 * i];
+	  vr = &x2[20 * i];
+	  v = &x3[20 * i];
+
+
+	    for(l = 0; l < 20; l+=2)
+	      _mm_store_pd(&v[l], _mm_setzero_pd());	      		
+
+	 
+	  for(l = 0; l < 20; l++)
+	    {
+
+		__m128d x1v = _mm_setzero_pd();
+		__m128d x2v = _mm_setzero_pd();
+		double 
+		  *ev = &extEV[l * 20],
+		  *lv = &le[l * 20],
+		  *rv = &ri[l * 20];
+
+
+		for(j = 0; j < 20; j+=2)
+		  {
+		    x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+		    x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		  }
+
+		x1v = _mm_hadd_pd(x1v, x1v);
+		x2v = _mm_hadd_pd(x2v, x2v);
+
+		x1v = _mm_mul_pd(x1v, x2v);
+		
+		for(j = 0; j < 20; j+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[j]);
+		    vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+		    _mm_store_pd(&v[j], vv);
+		  }		    
+
+	    }
+
+	    { 	    
+	      __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	      
+	      scale = 1;
+	      for(l = 0; scale && (l < 20); l += 2)
+		{
+		  __m128d vv = _mm_load_pd(&v[l]);
+		  __m128d v1 = _mm_and_pd(vv, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}	    	  
+	    }
+   
+
+	   if(scale)
+	     {
+
+	       __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	       
+	       for(l = 0; l < 20; l+=2)
+		 {
+		   __m128d ex3v = _mm_load_pd(&v[l]);		  
+		   _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		 }		   		  
+
+
+	       
+	       addScale += wgt[i];	   
+	     }
+	}
+      break;
+    default:
+      assert(0);
+    }
+  
+ 
+  *scalerIncrement = addScale;
+
+}
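+/* GTR+CAT protein kernel with gap saving: gap sites share x?_gapColumn and use the extra matrix slot at index maxCats */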
+static void newviewGTRCATPROT_SAVE(int tipCase, double *extEV,
+				   int *cptr,
+				   double *x1, double *x2, double *x3, double *tipVector,
+				   unsigned char *tipX1, unsigned char *tipX2,
+				   int n, double *left, double *right, int *wgt, int *scalerIncrement,
+				   unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+				   double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+  double
+    *le, 
+    *ri, 
+    *v, 
+    *vl, 
+    *vr,
+    *x1_ptr = x1,
+    *x2_ptr = x2, 
+    *x3_ptr = x3;
+
+  int 
+    i, 
+    l, 
+    j, 
+    scale, 
+    scaleGap = 0,
+    addScale = 0;
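+  /* precompute the x3 gap column once from the x1/x2 gap columns and the matrices stored at index maxCats */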
+  {
+    vl = x1_gapColumn;	      
+    vr = x2_gapColumn;
+    v = x3_gapColumn;
+
+    le = &left[maxCats * 400];
+    ri = &right[maxCats * 400];	  
+
+    for(l = 0; l < 20; l+=2)
+      _mm_store_pd(&v[l], _mm_setzero_pd());	      		
+	 
+    for(l = 0; l < 20; l++)
+      {
+	__m128d x1v = _mm_setzero_pd();
+	__m128d x2v = _mm_setzero_pd();
+	double 
+	  *ev = &extEV[l * 20],
+	  *lv = &le[l * 20],
+	  *rv = &ri[l * 20];
+
+
+	for(j = 0; j < 20; j+=2)
+	  {
+	    x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+	    x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+	  }
+	
+	x1v = _mm_hadd_pd(x1v, x1v);
+	x2v = _mm_hadd_pd(x2v, x2v);
+	
+	x1v = _mm_mul_pd(x1v, x2v);
+	
+	for(j = 0; j < 20; j+=2)
+	  {
+	    __m128d vv = _mm_load_pd(&v[j]);
+	    vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+	    _mm_store_pd(&v[j], vv);
+	  }		    	
+      }
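+    /* as in the per-site TIP_TIP loop below, the gap column is not rescaled when both children are tips */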
+    if(tipCase != TIP_TIP)
+      { 	    
+	__m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	      
+	scale = 1;
+	for(l = 0; scale && (l < 20); l += 2)
+	  {
+	    __m128d vv = _mm_load_pd(&v[l]);
+	    __m128d v1 = _mm_and_pd(vv, absMask.m);
+	    v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	    if(_mm_movemask_pd( v1 ) != 3)
+	      scale = 0;
+	  }	    	        
+  
+	if(scale)
+	  {
+	    __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	    
+	    for(l = 0; l < 20; l+=2)
+	      {
+		__m128d ex3v = _mm_load_pd(&v[l]);		  
+		_mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+	      }		   		  
+	       
+	    scaleGap = TRUE;	   
+	  }
+      }
+  }
+  
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	for (i = 0; i < n; i++)
+	  {
+	    if(noGap(x3_gap, i))
+	      {		
+		vl = &(tipVector[20 * tipX1[i]]);
+		vr = &(tipVector[20 * tipX2[i]]);
+		v  = x3_ptr;
+
+		if(isGap(x1_gap, i))
+		  le =  &left[maxCats * 400];
+		else	  	  
+		  le =  &left[cptr[i] * 400];	  
+	  
+		if(isGap(x2_gap, i))
+		  ri =  &right[maxCats * 400];
+		else	 	  
+		  ri =  &right[cptr[i] * 400];
+
+		for(l = 0; l < 20; l+=2)
+		  _mm_store_pd(&v[l], _mm_setzero_pd());	      		
+		
+		for(l = 0; l < 20; l++)
+		  {
+		    __m128d x1v = _mm_setzero_pd();
+		    __m128d x2v = _mm_setzero_pd();	 
+		    double 
+		      *ev = &extEV[l * 20],
+		      *lv = &le[l * 20],
+		      *rv = &ri[l * 20];
+		    
+		    for(j = 0; j < 20; j+=2)
+		      {
+			x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+			x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		      }
+		    
+		    x1v = _mm_hadd_pd(x1v, x1v);
+		    x2v = _mm_hadd_pd(x2v, x2v);
+		    
+		    x1v = _mm_mul_pd(x1v, x2v);
+		    
+		    for(j = 0; j < 20; j+=2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+			_mm_store_pd(&v[j], vv);
+		      }		   
+		  }
+
+		x3_ptr += 20;
+
+	      }   
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	for (i = 0; i < n; i++)
+	  {
+	    if(isGap(x3_gap, i))
+	      {
+		if(scaleGap)		   		    
+		  addScale += wgt[i];
+	      }
+	    else
+	      {	 
+		vl = &(tipVector[20 * tipX1[i]]);
+		/* the assignment to vr below is only a default: both branches of the x2 gap test overwrite it */
+		vr = x2_ptr;
+		v = x3_ptr;
+
+		if(isGap(x1_gap, i))
+		  le =  &left[maxCats * 400];
+		else
+		  le =  &left[cptr[i] * 400];
+
+		if(isGap(x2_gap, i))
+		  {		 
+		    ri =  &right[maxCats * 400];
+		    vr = x2_gapColumn;
+		  }
+		else
+		  {
+		    ri =  &right[cptr[i] * 400];
+		    vr = x2_ptr;
+		    x2_ptr += 20;
+		  }	  	  	  	  		  
+
+		for(l = 0; l < 20; l+=2)
+		  _mm_store_pd(&v[l], _mm_setzero_pd());	      			   
+
+		for(l = 0; l < 20; l++)
+		  {
+		    __m128d x1v = _mm_setzero_pd();
+		    __m128d x2v = _mm_setzero_pd();	
+		    double 
+		      *ev = &extEV[l * 20],
+		      *lv = &le[l * 20],
+		      *rv = &ri[l * 20];
+		    
+		    for(j = 0; j < 20; j+=2)
+		      {
+			x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+			x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		      }
+		    
+		    x1v = _mm_hadd_pd(x1v, x1v);
+		    x2v = _mm_hadd_pd(x2v, x2v);
+		    
+		    x1v = _mm_mul_pd(x1v, x2v);
+		    
+		    for(j = 0; j < 20; j+=2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+			_mm_store_pd(&v[j], vv);
+		      }		    
+		  }
+		
+		{ 	    
+		  __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+		  
+		  scale = 1;
+		  for(l = 0; scale && (l < 20); l += 2)
+		    {
+		      __m128d vv = _mm_load_pd(&v[l]);
+		      __m128d v1 = _mm_and_pd(vv, absMask.m);
+		      v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		      if(_mm_movemask_pd( v1 ) != 3)
+			scale = 0;
+		    }	    	  
+		}
+		
+		
+		if(scale)
+		  {
+		    __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+		    
+		    for(l = 0; l < 20; l+=2)
+		      {
+			__m128d ex3v = _mm_load_pd(&v[l]);
+			_mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));		    
+		      }
+		    
+		    addScale += wgt[i];	  
+		  }
+		x3_ptr += 20;
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      for(i = 0; i < n; i++)
+	{ 
+	  if(isGap(x3_gap, i))
+	    {
+	      if(scaleGap)		   		    
+		addScale += wgt[i];
+	    }
+	  else
+	    {	  	     
+	      v = x3_ptr;
+	  	  
+	      if(isGap(x1_gap, i))
+		{
+		  vl = x1_gapColumn;
+		  le =  &left[maxCats * 400];
+		}
+	      else
+		{
+		  le =  &left[cptr[i] * 400];
+		  vl = x1_ptr;
+		  x1_ptr += 20;
+		}
+
+	      if(isGap(x2_gap, i))	
+		{
+		  vr = x2_gapColumn;
+		  ri =  &right[maxCats * 400];	    
+		}
+	      else
+		{
+		  ri =  &right[cptr[i] * 400];
+		  vr = x2_ptr;
+		  x2_ptr += 20;
+		}	 	  	  	  
+
+	      for(l = 0; l < 20; l+=2)
+		_mm_store_pd(&v[l], _mm_setzero_pd());	      		
+	 
+	      for(l = 0; l < 20; l++)
+		{
+		  __m128d x1v = _mm_setzero_pd();
+		  __m128d x2v = _mm_setzero_pd();
+		  double 
+		    *ev = &extEV[l * 20],
+		    *lv = &le[l * 20],
+		    *rv = &ri[l * 20];
+		  		  
+		  for(j = 0; j < 20; j+=2)
+		    {
+		      x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));		    
+		      x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+		    }
+		  
+		  x1v = _mm_hadd_pd(x1v, x1v);
+		  x2v = _mm_hadd_pd(x2v, x2v);
+		  
+		  x1v = _mm_mul_pd(x1v, x2v);
+		  
+		  for(j = 0; j < 20; j+=2)
+		    {
+		      __m128d vv = _mm_load_pd(&v[j]);
+		      vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+		      _mm_store_pd(&v[j], vv);
+		    }		    
+		  
+		}
+	      
+	      { 	    
+		__m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+		
+		scale = 1;
+		for(l = 0; scale && (l < 20); l += 2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    __m128d v1 = _mm_and_pd(vv, absMask.m);
+		    v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		    if(_mm_movemask_pd( v1 ) != 3)
+		      scale = 0;
+		  }	    	  
+	      }
+  
+	      if(scale)
+		{
+		  __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+		  
+		  for(l = 0; l < 20; l+=2)
+		    {
+		      __m128d ex3v = _mm_load_pd(&v[l]);		  
+		      _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		    }		   		  
+		  
+		  addScale += wgt[i];	   
+		}
+	      x3_ptr += 20;
+	    }
+	}
+      break;
+    default:
+      assert(0);
+    }
+  
+ 
+  *scalerIncrement = addScale;
+
+}
+
+static void newviewGTRGAMMAPROT_LG4(int tipCase,
+				    double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+				    int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				    int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling)
+{
+  double  *uX1, *uX2, *v;
+  double x1px2;
+  int  i, j, l, k, scale, addScale = 0;
+  double *vl, *vr;
+#ifndef __SIM_SSE3
+  double al, ar;
+#endif
+
+
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+	double umpX1[1840], umpX2[1840];
+
+	for(i = 0; i < 23; i++)
+	  {
+	   
+
+	    for(k = 0; k < 80; k++)
+	      {
+		
+		v = &(tipVector[k / 20][20 * i]);
+#ifdef __SIM_SSE3
+		double *ll =  &left[k * 20];
+		double *rr =  &right[k * 20];
+		
+		__m128d umpX1v = _mm_setzero_pd();
+		__m128d umpX2v = _mm_setzero_pd();
+
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+		    umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+		umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+		
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+		_mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+#else
+		umpX1[80 * i + k] = 0.0;
+		umpX2[80 * i + k] = 0.0;
+
+		for(l = 0; l < 20; l++)
+		  {
+		    umpX1[80 * i + k] +=  v[l] *  left[k * 20 + l];
+		    umpX2[80 * i + k] +=  v[l] * right[k * 20 + l];
+		  }
+#endif
+	      }
+	  }
+
+	for(i = 0; i < n; i++)
+	  {
+	    uX1 = &umpX1[80 * tipX1[i]];
+	    uX2 = &umpX2[80 * tipX2[i]];
+
+	    for(j = 0; j < 4; j++)
+	      {
+		v = &x3[i * 80 + j * 20];
+
+#ifdef __SIM_SSE3
+		__m128d zero =  _mm_setzero_pd();
+		for(k = 0; k < 20; k+=2)		  		    
+		  _mm_store_pd(&v[k], zero);
+
+		for(k = 0; k < 20; k++)
+		  { 
+		    double *eev = &extEV[j][k * 20];
+		    x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+		    __m128d x1px2v = _mm_set1_pd(x1px2);
+
+		    for(l = 0; l < 20; l+=2)
+		      {
+		      	__m128d vv = _mm_load_pd(&v[l]);
+			__m128d ee = _mm_load_pd(&eev[l]);
+
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			
+			_mm_store_pd(&v[l], vv);
+		      }
+		  }
+
+#else
+
+		for(k = 0; k < 20; k++)
+		  v[k] = 0.0;
+
+		for(k = 0; k < 20; k++)
+		  {		   
+		    x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+		   
+		    for(l = 0; l < 20; l++)		      					
+		      v[l] += x1px2 * extEV[j][20 * k + l];		     
+		  }
+#endif
+	      }	   
+	  }
+      }
+      break;
+    case TIP_INNER:
+      {
+	double umpX1[1840], ump_x2[20];
+
+
+	for(i = 0; i < 23; i++)
+	  {
+	   
+
+	    for(k = 0; k < 80; k++)
+	      { 
+		v = &(tipVector[k / 20][20 * i]);
+#ifdef __SIM_SSE3
+		double *ll =  &left[k * 20];
+				
+		__m128d umpX1v = _mm_setzero_pd();
+		
+		for(l = 0; l < 20; l+=2)
+		  {
+		    __m128d vv = _mm_load_pd(&v[l]);
+		    umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));		    					
+		  }
+		
+		umpX1v = _mm_hadd_pd(umpX1v, umpX1v);				
+		_mm_storel_pd(&umpX1[80 * i + k], umpX1v);		
+#else	    
+		umpX1[80 * i + k] = 0.0;
+
+		for(l = 0; l < 20; l++)
+		  umpX1[80 * i + k] +=  v[l] * left[k * 20 + l];
+#endif
+
+	      }
+	  }
+
+	for (i = 0; i < n; i++)
+	  {
+	    uX1 = &umpX1[80 * tipX1[i]];
+
+	    for(k = 0; k < 4; k++)
+	      {
+		v = &(x2[80 * i + k * 20]);
+#ifdef __SIM_SSE3	       
+		for(l = 0; l < 20; l++)
+		  {		   
+		    double *r =  &right[k * 400 + l * 20];
+		    __m128d ump_x2v = _mm_setzero_pd();	    
+		    
+		    for(j = 0; j < 20; j+= 2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			__m128d rr = _mm_load_pd(&r[j]);
+			ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+		      }
+		     
+		    ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+		    
+		    _mm_storel_pd(&ump_x2[l], ump_x2v);		   		     
+		  }
+
+		v = &(x3[80 * i + 20 * k]);
+
+		__m128d zero =  _mm_setzero_pd();
+		for(l = 0; l < 20; l+=2)		  		    
+		  _mm_store_pd(&v[l], zero);
+		  
+		for(l = 0; l < 20; l++)
+		  {
+		    double *eev = &extEV[k][l * 20];
+		    x1px2 = uX1[k * 20 + l]  * ump_x2[l];
+		    __m128d x1px2v = _mm_set1_pd(x1px2);
+		  
+		    for(j = 0; j < 20; j+=2)
+		      {
+			__m128d vv = _mm_load_pd(&v[j]);
+			__m128d ee = _mm_load_pd(&eev[j]);
+			
+			vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+			
+			_mm_store_pd(&v[j], vv);
+		      }		     		    
+		  }			
+#else
+		for(l = 0; l < 20; l++)
+		  {
+		    ump_x2[l] = 0.0;
+
+		    for(j = 0; j < 20; j++)
+		      ump_x2[l] += v[j] * right[k * 400 + l * 20 + j];
+		  }
+
+		v = &(x3[80 * i + 20 * k]);
+
+		for(l = 0; l < 20; l++)
+		  v[l] = 0;
+
+		for(l = 0; l < 20; l++)
+		  {
+		    x1px2 = uX1[k * 20 + l]  * ump_x2[l];
+		    for(j = 0; j < 20; j++)
+		      v[j] += x1px2 * extEV[k][l * 20  + j];
+		  }
+#endif
+	      }
+	   
+#ifdef __SIM_SSE3
+	    { 
+	      v = &(x3[80 * i]);
+	      __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	      
+	      scale = 1;
+	      for(l = 0; scale && (l < 80); l += 2)
+		{
+		  __m128d vv = _mm_load_pd(&v[l]);
+		  __m128d v1 = _mm_and_pd(vv, absMask.m);
+		  v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+		  if(_mm_movemask_pd( v1 ) != 3)
+		    scale = 0;
+		}	    	  
+	    }
+#else
+	    v = &x3[80 * i];
+	    scale = 1;
+	    for(l = 0; scale && (l < 80); l++)
+	      scale = (ABS(v[l]) <  minlikelihood);
+#endif
+
+	    if (scale)
+	      {
+#ifdef __SIM_SSE3
+	       __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	       
+	       for(l = 0; l < 80; l+=2)
+		 {
+		   __m128d ex3v = _mm_load_pd(&v[l]);		  
+		   _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		 }		   		  
+#else
+		for(l = 0; l < 80; l++)
+		  v[l] *= twotothe256;
+#endif
+
+		if(useFastScaling)
+		  addScale += wgt[i];
+		else
+		  ex3[i]  += 1;	       
+	      }
+	  }
+      }
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+       {
+	 for(k = 0; k < 4; k++)
+	   {
+	     vl = &(x1[80 * i + 20 * k]);
+	     vr = &(x2[80 * i + 20 * k]);
+	     v =  &(x3[80 * i + 20 * k]);
+
+#ifdef __SIM_SSE3
+	     __m128d zero =  _mm_setzero_pd();
+	     for(l = 0; l < 20; l+=2)		  		    
+	       _mm_store_pd(&v[l], zero);
+#else
+	     for(l = 0; l < 20; l++)
+	       v[l] = 0;
+#endif
+
+	     for(l = 0; l < 20; l++)
+	       {		 
+#ifdef __SIM_SSE3
+		 {
+		   __m128d al = _mm_setzero_pd();
+		   __m128d ar = _mm_setzero_pd();
+
+		   double *ll   = &left[k * 400 + l * 20];
+		   double *rr   = &right[k * 400 + l * 20];
+		   double *EVEV = &extEV[k][20 * l];
+		   
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d lv  = _mm_load_pd(&ll[j]);
+		       __m128d rv  = _mm_load_pd(&rr[j]);
+		       __m128d vll = _mm_load_pd(&vl[j]);
+		       __m128d vrr = _mm_load_pd(&vr[j]);
+		       
+		       al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+		       ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+		     }  		 
+		       
+		   al = _mm_hadd_pd(al, al);
+		   ar = _mm_hadd_pd(ar, ar);
+		   
+		   al = _mm_mul_pd(al, ar);
+
+		   for(j = 0; j < 20; j+=2)
+		     {
+		       __m128d vv  = _mm_load_pd(&v[j]);
+		       __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+		       vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+		       _mm_store_pd(&v[j], vv);
+		     }		  		   		  
+		 }		 
+#else
+		 al = 0.0;
+		 ar = 0.0;
+
+		 for(j = 0; j < 20; j++)
+		   {
+		     al += vl[j] * left[k * 400 + l * 20 + j];
+		     ar += vr[j] * right[k * 400 + l * 20 + j];
+		   }
+
+		 x1px2 = al * ar;
+
+		 for(j = 0; j < 20; j++)
+		   v[j] += x1px2 * extEV[k][20 * l + j];
+#endif
+	       }
+	   }
+	 
+
+#ifdef __SIM_SSE3
+	 { 
+	   v = &(x3[80 * i]);
+	   __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	   
+	   scale = 1;
+	   for(l = 0; scale && (l < 80); l += 2)
+	     {
+	       __m128d vv = _mm_load_pd(&v[l]);
+	       __m128d v1 = _mm_and_pd(vv, absMask.m);
+	       v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	       if(_mm_movemask_pd( v1 ) != 3)
+		 scale = 0;
+	     }	    	  
+	 }
+#else
+	 v = &(x3[80 * i]);
+	 scale = 1;
+	 for(l = 0; scale && (l < 80); l++)
+	   scale = ((ABS(v[l]) <  minlikelihood));
+#endif
+
+	 if (scale)
+	   {
+#ifdef __SIM_SSE3
+	       __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	       
+	       for(l = 0; l < 80; l+=2)
+		 {
+		   __m128d ex3v = _mm_load_pd(&v[l]);		  
+		   _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));	
+		 }		   		  
+#else	     
+	     for(l = 0; l < 80; l++)
+	       v[l] *= twotothe256;
+#endif
+
+	     if(useFastScaling)
+	       addScale += wgt[i];
+	     else
+	       ex3[i]  += 1;	  
+	   }
+       }
+      break;
+    default:
+      assert(0);
+    }
+
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+
+}
+
+#endif
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/*** BINARY DATA functions *****/
+
+static void newviewGTRCAT_BINARY( int tipCase,  double *EV,  int *cptr,
+                                  double *x1_start,  double *x2_start,  double *x3_start,  double *tipVector,
+                                  int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+                                  int n,  double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling)
+{
+  double
+    *le,
+    *ri,
+    *x1, *x2, *x3;
+  int i, l, scale, addScale = 0;
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      {
+        for(i = 0; i < n; i++)
+          {
+            x1 = &(tipVector[2 * tipX1[i]]);
+            x2 = &(tipVector[2 * tipX2[i]]);
+            x3 = &x3_start[2 * i];         
+
+            le =  &left[cptr[i] * 4];
+            ri =  &right[cptr[i] * 4];
+
+            _mm_store_pd(x3, _mm_setzero_pd());     
+                     
+            for(l = 0; l < 2; l++)
+              {                                                                                                                          
+                __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+                __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+                
+                al = _mm_hadd_pd(al, al);
+                ar = _mm_hadd_pd(ar, ar);
+                
+                al = _mm_mul_pd(al, ar);
+                
+                __m128d vv  = _mm_load_pd(x3);
+                __m128d EVV = _mm_load_pd(&EV[2 * l]);
+                
+                vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+                
+                _mm_store_pd(x3, vv);                                                     
+              }            
+          }
+      }
+      break;
+    case TIP_INNER:
+      {
+        for (i = 0; i < n; i++)
+          {
+            x1 = &(tipVector[2 * tipX1[i]]);
+            x2 = &x2_start[2 * i];
+            x3 = &x3_start[2 * i];
+            
+            le =  &left[cptr[i] * 4];
+            ri =  &right[cptr[i] * 4];
+
+            _mm_store_pd(x3, _mm_setzero_pd());     
+                     
+            for(l = 0; l < 2; l++)
+              {                                                                                                                          
+                __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+                __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+                
+                al = _mm_hadd_pd(al, al);
+                ar = _mm_hadd_pd(ar, ar);
+                
+                al = _mm_mul_pd(al, ar);
+                
+                __m128d vv  = _mm_load_pd(x3);
+                __m128d EVV = _mm_load_pd(&EV[2 * l]);
+                
+                vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+                
+                _mm_store_pd(x3, vv);                                                     
+              }  
+            
+            __m128d minlikelihood_sse = _mm_set1_pd(minlikelihood);
+         
+            scale = 1;
+            
+            __m128d v1 = _mm_and_pd(_mm_load_pd(x3), absMask.m);
+            v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+            if(_mm_movemask_pd( v1 ) != 3)
+              scale = 0;                         
+            
+            if(scale)
+              {
+                __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+                
+                __m128d ex3v = _mm_load_pd(x3);           
+                _mm_store_pd(x3, _mm_mul_pd(ex3v,twoto));                                                 
+                
+                if(useFastScaling)
+                  addScale += wgt[i];
+                else
+                  ex3[i]  += 1;   
+              }                    
+          }
+      }
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+        {
+          x1 = &x1_start[2 * i];
+          x2 = &x2_start[2 * i];
+          x3 = &x3_start[2 * i];
+
+          le = &left[cptr[i] * 4];
+          ri = &right[cptr[i] * 4];
+
+          _mm_store_pd(x3, _mm_setzero_pd());       
+          
+          for(l = 0; l < 2; l++)
+            {                                                                                                                            
+              __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+              __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+              
+              al = _mm_hadd_pd(al, al);
+              ar = _mm_hadd_pd(ar, ar);
+              
+              al = _mm_mul_pd(al, ar);
+              
+              __m128d vv  = _mm_load_pd(x3);
+              __m128d EVV = _mm_load_pd(&EV[2 * l]);
+              
+              vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+              
+              _mm_store_pd(x3, vv);                                                       
+            }                             
+
+          __m128d minlikelihood_sse = _mm_set1_pd(minlikelihood);
+         
+          scale = 1;
+                  
+          __m128d v1 = _mm_and_pd(_mm_load_pd(x3), absMask.m);
+          v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+          if(_mm_movemask_pd( v1 ) != 3)
+            scale = 0;                   
+         
+          if(scale)
+            {
+              __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+                    
+              __m128d ex3v = _mm_load_pd(x3);             
+              _mm_store_pd(x3, _mm_mul_pd(ex3v,twoto));                                           
+             
+              if(useFastScaling)
+                addScale += wgt[i];
+              else
+                ex3[i]  += 1;     
+           }             
+        }
+      break;
+    default:
+      assert(0);
+    }
+
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+
+}
+
+static void newviewGTRGAMMA_BINARY(int tipCase,
+				   double *x1_start, double *x2_start, double *x3_start,
+				   double *EV, double *tipVector,
+				   int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+				   const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling
+				   )
+{
+  double
+    *x1, *x2, *x3;
+ 
+  int i, k, l, scale, addScale = 0; 
+
+  switch(tipCase)
+    {
+    case TIP_TIP:
+      for (i = 0; i < n; i++)
+       {
+	 x1  = &(tipVector[2 * tipX1[i]]);
+	 x2  = &(tipVector[2 * tipX2[i]]);
+	 
+	 for(k = 0; k < 4; k++)
+	   {	     	     	    
+	     x3 = &(x3_start[8 * i + 2 * k]);	     
+	    	         
+	     _mm_store_pd(x3, _mm_setzero_pd());	    
+	    	     
+	     for(l = 0; l < 2; l++)
+	       {		 		 						   		  		 		 
+		 __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+		 __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+		 		       
+		 al = _mm_hadd_pd(al, al);
+		 ar = _mm_hadd_pd(ar, ar);
+		   
+		 al = _mm_mul_pd(al, ar);
+		   
+		 __m128d vv  = _mm_load_pd(x3);
+		 __m128d EVV = _mm_load_pd(&EV[2 * l]);
+		 
+		 vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+		 
+		 _mm_store_pd(x3, vv);		     	  		   		  
+	       }	     	    
+	   }
+       }
+      break;
+    case TIP_INNER:
+      for (i = 0; i < n; i++)
+       {
+	 x1  = &(tipVector[2 * tipX1[i]]);
+	 
+	 for(k = 0; k < 4; k++)
+	   {	     	     
+	     x2 = &(x2_start[8 * i + 2 * k]);
+	     x3 = &(x3_start[8 * i + 2 * k]);	     
+	    	         
+	     _mm_store_pd(x3, _mm_setzero_pd());	    
+	    	     
+	     for(l = 0; l < 2; l++)
+	       {		 		 						   		  		 		 
+		 __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+		 __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+		 		       
+		 al = _mm_hadd_pd(al, al);
+		 ar = _mm_hadd_pd(ar, ar);
+		   
+		 al = _mm_mul_pd(al, ar);
+		   
+		 __m128d vv  = _mm_load_pd(x3);
+		 __m128d EVV = _mm_load_pd(&EV[2 * l]);
+		 
+		 vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+		 
+		 _mm_store_pd(x3, vv);		     	  		   		  
+	       }	     	    
+	   }
+	
+	 x3 = &(x3_start[8 * i]);
+	 __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	 
+	 scale = 1;
+	 for(l = 0; scale && (l < 8); l += 2)
+	   {
+	     __m128d vv = _mm_load_pd(&x3[l]);
+	     __m128d v1 = _mm_and_pd(vv, absMask.m);
+	     v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	     if(_mm_movemask_pd( v1 ) != 3)
+	       scale = 0;
+	   }	    	         
+	 
+	 if(scale)
+	   {
+	     __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	     
+	     for(l = 0; l < 8; l+=2)
+	       {
+		 __m128d ex3v = _mm_load_pd(&x3[l]);		  
+		 _mm_store_pd(&x3[l], _mm_mul_pd(ex3v,twoto));	
+	       }		   		  
+	     
+	     if(useFastScaling)
+	       addScale += wgt[i];
+	     else
+	       ex3[i]  += 1;	  
+	   }	 
+       }      
+      break;
+    case INNER_INNER:
+      for (i = 0; i < n; i++)
+       {	 
+	 for(k = 0; k < 4; k++)
+	   {	     
+	     x1 = &(x1_start[8 * i + 2 * k]);
+	     x2 = &(x2_start[8 * i + 2 * k]);
+	     x3 = &(x3_start[8 * i + 2 * k]);	     
+	    	         
+	     _mm_store_pd(x3, _mm_setzero_pd());	    
+	    	     
+	     for(l = 0; l < 2; l++)
+	       {		 		 						   		  		 		 
+		 __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+		 __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+		 		       
+		 al = _mm_hadd_pd(al, al);
+		 ar = _mm_hadd_pd(ar, ar);
+		   
+		 al = _mm_mul_pd(al, ar);
+		   
+		 __m128d vv  = _mm_load_pd(x3);
+		 __m128d EVV = _mm_load_pd(&EV[2 * l]);
+		 
+		 vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+		 
+		 _mm_store_pd(x3, vv);		     	  		   		  
+	       }	     	    
+	   }
+	
+	 x3 = &(x3_start[8 * i]);
+	 __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+	 
+	 scale = 1;
+	 for(l = 0; scale && (l < 8); l += 2)
+	   {
+	     __m128d vv = _mm_load_pd(&x3[l]);
+	     __m128d v1 = _mm_and_pd(vv, absMask.m);
+	     v1 = _mm_cmplt_pd(v1,  minlikelihood_sse);
+	     if(_mm_movemask_pd( v1 ) != 3)
+	       scale = 0;
+	   }	    	         
+	 
+	 if(scale)
+	   {
+	     __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+	     
+	     for(l = 0; l < 8; l+=2)
+	       {
+		 __m128d ex3v = _mm_load_pd(&x3[l]);		  
+		 _mm_store_pd(&x3[l], _mm_mul_pd(ex3v,twoto));	
+	       }		   		  
+	     
+	     if(useFastScaling)
+	       addScale += wgt[i];
+	     else
+	       ex3[i]  += 1;	  
+	   }	 
+       }
+      break;
+
+    default:
+      assert(0);
+    }
+
+  if(useFastScaling)
+    *scalerIncrement = addScale;
+
+}
+
+
+/**** BINARY DATA functions end ****/
+
+
+
+#endif
+
+
diff --git a/examl/optimizeModel.c b/examl/optimizeModel.c
new file mode 100644
index 0000000..a9b5ebb
--- /dev/null
+++ b/examl/optimizeModel.c
@@ -0,0 +1,3134 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands 
+ *  of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+
+static const double MNBRAK_GOLD =    1.618034;
+static const double MNBRAK_TINY =      1.e-20;
+static const double MNBRAK_GLIMIT =     100.0;
+static const double BRENT_ZEPS  =      1.e-5;
+static const double BRENT_CGOLD =   0.3819660;
+
+extern int optimizeRatesInvocations;
+extern int optimizeRateCategoryInvocations;
+extern int optimizeAlphaInvocations;
+extern int optimizeInvarInvocations;
+extern double masterTime;
+extern char ratesFileName[1024];
+extern char workdir[1024];
+extern char run_id[128];
+extern char lengthFileName[1024];
+extern char lengthFileNameModel[1024];
+extern char *protModels[NUM_PROT_MODELS];
+
+extern checkPointState ckp;
+
+extern int processes;
+extern int processID;
+
+static void optParamGeneric(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int rateNumber, double lim_inf, double lim_sup, int whichParameterType);
+
+// FLAG for easier debugging of model parameter optimization routines 
+
+//#define _DEBUG_MOD_OPT
+
+
+/*********************FUNCTIONS FOR EXACT MODEL OPTIMIZATION UNDER GTRGAMMA ***************************************/
+
+
+static void setRateModel(tree *tr, int model, double rate, int position)
+{
+  int
+    states   = tr->partitionData[model].states,
+    numRates = (states * states - states) / 2;
+
+  if(tr->partitionData[model].dataType == DNA_DATA)
+    assert(position >= 0 && position < (numRates - 1));
+  else
+    assert(position >= 0 && position < numRates);
+
+  assert(tr->partitionData[model].dataType != BINARY_DATA); 
+
+  assert(rate >= RATE_MIN && rate <= RATE_MAX);
+
+  if(tr->partitionData[model].nonGTR)
+    {    
+      int 
+	i, 
+	index = tr->partitionData[model].symmetryVector[position],
+	lastRate = tr->partitionData[model].symmetryVector[numRates - 1];
+
+      
+           
+      for(i = 0; i < numRates; i++)
+	{	
+	  if(tr->partitionData[model].symmetryVector[i] == index)
+	    {
+	      if(index == lastRate)
+		tr->partitionData[model].substRates[i] = 1.0;
+	      else
+		tr->partitionData[model].substRates[i] = rate;      
+	    }
+	  
+	  //printf("%f ", tr->partitionData[model].substRates[i]);
+	}
+      //printf("\n");
+    }
+  else
+    tr->partitionData[model].substRates[position] = rate;
+}
+
+
+//LIBRARY: the only thing that we will need to do here is to 
+//replace linkList by a string and also add some error correction 
+//code
+
+
+static linkageList* initLinkageList(int *linkList, tree *tr)
+{
+  int 
+    k,
+    partitions,
+    numberOfModels = 0,
+    i,
+    pos;
+  
+  linkageList
+    *ll = (linkageList*)malloc(sizeof(linkageList));
+      
+  for(i = 0; i < tr->NumberOfModels; i++)    
+    {
+      assert(linkList[i] >= 0 && linkList[i] < tr->NumberOfModels);
+
+      if(linkList[i] > numberOfModels)
+	numberOfModels = linkList[i];
+    }
+
+  numberOfModels++;
+  
+  ll->entries = numberOfModels;
+  ll->ld      = (linkageData*)malloc(sizeof(linkageData) * numberOfModels);
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      ll->ld[i].valid = TRUE;
+      
+      partitions = 0;
+
+      for(k = 0; k < tr->NumberOfModels; k++)	
+	if(linkList[k] == i)
+	  partitions++;	    
+
+      ll->ld[i].partitions = partitions;
+      ll->ld[i].partitionList = (int*)malloc(sizeof(int) * partitions);
+      
+      for(k = 0, pos = 0; k < tr->NumberOfModels; k++)	
+	if(linkList[k] == i)
+	  ll->ld[i].partitionList[pos++] = k;
+    }
+
+  return ll;
+}
+
+static linkageList* initLinkageListString(char *linkageString, tree *tr)
+{
+  int 
+    *list = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+    j;
+
+  linkageList 
+    *l;
+
+  char
+    *str1,
+    *saveptr,
+    *ch = (char *)calloc(strlen(linkageString) + 1, sizeof(char)), /* +1: zeroed byte keeps the copy NUL-terminated for strtok_r */
+    *token;
+  strncpy(ch, linkageString, strlen(linkageString));
+
+  for(j = 0, str1 = ch; ;j++, str1 = (char *)NULL) 
+    {
+      token = strtok_r(str1, ",", &saveptr);
+      if(token == (char *)NULL)
+	break;
+      assert(j < tr->NumberOfModels);
+      list[j] = atoi(token);
+      //printf("%d: %s\n", j, token);
+    }
+  
+  free(ch);
+
+  l = initLinkageList(list, tr);
+  
+  free(list);
+
+  return l;
+}
+
+static void init_Q_MatrixSymmetries(char *linkageString, tree *tr, int model)
+{
+  int 
+    states = tr->partitionData[model].states,
+    numberOfRates = ((states * states - states) / 2), 
+    *list = (int *)malloc(sizeof(int) * numberOfRates),
+    j,
+    max = -1;
+
+  char
+    *str1,
+    *saveptr,
+    *ch = (char*)calloc(strlen(linkageString) + 1, sizeof(char)), /* +1: zeroed byte keeps the copy NUL-terminated for strtok_r */
+    *token;
+  
+  strncpy(ch, linkageString, strlen(linkageString)); 
+
+  for(j = 0, str1 = ch; ;j++, str1 = (char *)NULL) 
+    {
+      token = strtok_r(str1, ",", &saveptr);
+      if(token == (char *)NULL)
+	break;
+      assert(j < numberOfRates);
+      list[j] = atoi(token);     
+    }
+  
+  free(ch);
+
+  for(j = 0; j < numberOfRates; j++)
+    {
+      assert(list[j] <= j);
+      assert(list[j] <= max + 1);
+      
+      if(list[j] > max)
+	max = list[j];
+    }  
+
+  assert(numberOfRates == 6);
+  
+  for(j = 0; j < numberOfRates; j++)  
+    tr->partitionData[model].symmetryVector[j] = list[j];    
+
+  //less than the maximum possible number of rate parameters
+
+  if(max < numberOfRates - 1)    
+    tr->partitionData[model].nonGTR = TRUE;
+
+  free(list);
+}
+
+
+
+static linkageList* initLinkageListGTR(tree *tr)
+{
+  int
+    i,
+    *links = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+    firstAA = tr->NumberOfModels + 2,
+    countGTR = 0,
+    countOtherModel = 0;
+  linkageList* ll;
+
+  for(i = 0; i < tr->NumberOfModels; i++)
+    {     
+      if(tr->partitionData[i].dataType == AA_DATA)
+	{
+	  if(tr->partitionData[i].protModels == GTR)
+	    {
+	      if(i < firstAA)
+		firstAA = i;
+	      countGTR++;
+	    }
+	  else
+	    countOtherModel++;
+	}
+    }
+  
+  assert((countGTR > 0 && countOtherModel == 0) || (countGTR == 0 && countOtherModel > 0) ||  (countGTR == 0 && countOtherModel == 0));
+
+  if(countGTR == 0)
+    {
+      for(i = 0; i < tr->NumberOfModels; i++)
+	links[i] = i;
+    }
+  else
+    {
+      for(i = 0; i < tr->NumberOfModels; i++)
+	{
+	  switch(tr->partitionData[i].dataType)
+	    {	   
+	    case DNA_DATA:
+	    case BINARY_DATA:
+	    case GENERIC_32:
+	    case GENERIC_64:
+	    case SECONDARY_DATA:
+	    case SECONDARY_DATA_6:
+	    case SECONDARY_DATA_7: 
+	      links[i] = i;
+	      break;
+	    case AA_DATA:	  
+	      links[i] = firstAA;
+	      break;
+	    default:
+	      assert(0);
+	    }
+	}
+    }
+  
+
+  ll = initLinkageList(links, tr);
+
+  free(links);
+  
+  return ll;
+}
+
+
+
+static void freeLinkageList( linkageList* ll)
+{
+  int i;    
+
+  for(i = 0; i < ll->entries; i++)    
+    free(ll->ld[i].partitionList);         
+
+  free(ll->ld);
+  free(ll);   
+}
+
+#define ALPHA_F    0
+#define RATE_F     1
+#define FREQ_F     2
+#define LXRATE_F   3
+#define LXWEIGHT_F 4
+
+void scaleLG4X_EIGN(tree *tr, int model)
+{
+  double 
+    acc = 0.0;
+
+  int 
+    i, 
+    l;
+          
+  for(i = 0; i < 4; i++)	     
+    acc += tr->partitionData[model].weights[i] *  tr->partitionData[model].gammaRates[i];
+
+  acc = 1.0 / acc;
+
+  /*
+    printf("update %f %f %f %f %f\n", acc, tr->partitionData[model].gammaRates[0], tr->partitionData[model].gammaRates[1], tr->partitionData[model].gammaRates[2], 
+    tr->partitionData[model].gammaRates[3]);
+
+    printf("weights: %f %f %f %f\n", tr->partitionData[model].weights[0], tr->partitionData[model].weights[1], tr->partitionData[model].weights[2], 
+    tr->partitionData[model].weights[3]);
+  */
+
+  for(i = 0; i < 4; i++)
+    for(l = 0; l < 20; l++)
+	tr->partitionData[model].EIGN_LG4[i][l] = tr->partitionData[model].rawEIGN_LG4[i][l] * acc;
+}
+
+
+static void updateWeights(tree *tr, int model, int rate, double value)
+{
+  int 
+    j;
+
+  double 
+    w = 0.0;
+
+  assert(rate >= 0 && rate < 4);
+
+  tr->partitionData[model].weightExponents[rate] = value;
+
+  for(j = 0; j < 4; j++)
+    w += exp(tr->partitionData[model].weightExponents[j]);
+
+  for(j = 0; j < 4; j++)	    	    
+    tr->partitionData[model].weights[j] = exp(tr->partitionData[model].weightExponents[j]) / w;
+}
+
+static void optimizeWeights(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels)
+{
+  int 
+    i;
+  
+  double 
+    initialLH = 0.0,
+    finalLH   = 0.0;
+
+  evaluateGeneric(tr, tr->start, TRUE);
+ 
+  initialLH = tr->likelihood;
+  //printf("W: %f %f [%f] ->", tr->perPartitionLH[0], tr->perPartitionLH[1], initialLH);
+
+  for(i = 0; i < 4; i++)   
+    optParamGeneric(tr, modelEpsilon, ll, numberOfModels, i, -1000000.0, 200.0, LXWEIGHT_F);
+    //optLG4X_Weights(tr, ll, numberOfModels, i, modelEpsilon);
+
+  evaluateGeneric(tr, tr->start, TRUE); 
+
+  finalLH = tr->likelihood;
+
+  if(finalLH < initialLH)
+    printf("Final: %f initial: %f\n", finalLH, initialLH);
+  assert(finalLH >= initialLH);
+
+  //printf("%f %f [%f]\n",  tr->perPartitionLH[0], tr->perPartitionLH[1], finalLH);
+}
+
+
+static void changeModelParameters(int index, int rateNumber, double value, int whichParameterType, tree *tr)
+{
+  switch(whichParameterType)
+    {
+    case RATE_F:
+      setRateModel(tr, index, value, rateNumber);  
+      initReversibleGTR(tr, index);		 
+      break;
+    case ALPHA_F:
+      tr->partitionData[index].alpha = value;
+      makeGammaCats(tr->partitionData[index].alpha, tr->partitionData[index].gammaRates, 4, tr->useMedian);
+      break;
+    case FREQ_F:
+      {
+	int
+	  states = tr->partitionData[index].states,
+	  j;
+
+	double 
+	  w = 0.0;
+
+	tr->partitionData[index].freqExponents[rateNumber] = value;
+
+	for(j = 0; j < states; j++)
+	  w += exp(tr->partitionData[index].freqExponents[j]);
+
+	for(j = 0; j < states; j++)	    	    
+	  tr->partitionData[index].frequencies[j] = exp(tr->partitionData[index].freqExponents[j]) / w;
+	
+	initReversibleGTR(tr, index);
+      }
+      break;
+    case LXRATE_F:
+      tr->partitionData[index].gammaRates[rateNumber] = value;
+      scaleLG4X_EIGN(tr, index);
+      break;
+    case LXWEIGHT_F:
+      updateWeights(tr, index, rateNumber, value);
+      scaleLG4X_EIGN(tr, index);
+      break;
+    default:
+      assert(0);
+    }
+}
+
+static void evaluateChange(tree *tr, int rateNumber, double *value, double *result, boolean* converged, int whichFunction, int numberOfModels, linkageList *ll, double modelEpsilon)
+{ 
+  int 
+    i, 
+    k, 
+    pos;
+   
+  for(i = 0, pos = 0; i < ll->entries; i++)
+    {
+      if(ll->ld[i].valid)
+	{
+	  if(converged[pos])
+	    {
+	      //if parameter optimizations for this specific model have converged 
+	      //set executeModel to FALSE 
+
+	      for(k = 0; k < ll->ld[i].partitions; k++)
+		tr->executeModel[ll->ld[i].partitionList[k]] = FALSE;
+	    }
+	  else
+	    {	      
+	      for(k = 0; k < ll->ld[i].partitions; k++)
+		{
+		  int 
+		    index = ll->ld[i].partitionList[k];
+
+		  changeModelParameters(index, rateNumber, value[pos], whichFunction, tr);		    		  
+		}
+	    }
+	  pos++;
+	}
+      else
+	{
+	  // if this partition is not being optimized anyway (e.g., we may be optimizing GTR rates for all DNA partitions,
+	  // but there are also a couple of Protein partitions with fixed models like WAG, JTT, etc.) set executeModel to FALSE
+	  
+	  for(k = 0; k < ll->ld[i].partitions; k++)
+	    tr->executeModel[ll->ld[i].partitionList[k]] = FALSE;	     
+	}      
+    }
+
+  assert(pos == numberOfModels);
+
+  //some error checks for individual model parameters
+
+  switch(whichFunction)
+    {      
+    case RATE_F:
+      assert(rateNumber != -1);       
+      break;
+    case ALPHA_F:	     
+      break;    
+    case LXRATE_F:
+      assert(rateNumber != -1); /* fall through - LXWEIGHT_F performs the same check */
+    case LXWEIGHT_F:
+      assert(rateNumber != -1);
+      break;
+    case FREQ_F:
+      break;
+    default:
+      assert(0);      
+    }
+
+  switch(whichFunction)
+    {
+    case RATE_F:
+    case ALPHA_F:  
+    case LXRATE_F: 
+    case FREQ_F: 
+    case LXWEIGHT_F: 
+      evaluateGeneric(tr, tr->start, TRUE);          
+      break;
+    default:
+      assert(0);
+    }   
+    
+
+  //LIBRARY: need to switch over parallel regions here either call 
+  //the one for the rates or for alpha!
+  
+  //commented out evaluate below in the course of the LG4X integration
+  //evaluateGeneric(tr, tr->start, TRUE);  
+               
+  for(i = 0, pos = 0; i < ll->entries; i++)	
+    {
+      if(ll->ld[i].valid)
+	{
+	  result[pos] = 0.0;
+	  
+	  for(k = 0; k < ll->ld[i].partitions; k++)
+	    {
+	      int 
+		index = ll->ld[i].partitionList[k];
+
+	      assert(tr->perPartitionLH[index] <= 0.0);
+	      
+	      result[pos] -= tr->perPartitionLH[index];
+	      
+	    }
+	  pos++;
+	}
+
+      //set execute model for ALL partitions to true again 
+      //for consistency 
+
+      for(k = 0; k < ll->ld[i].partitions; k++)
+	{
+	  int 
+	    index = ll->ld[i].partitionList[k];	  
+	  tr->executeModel[index] = TRUE;
+	}	  
+    }
+  
+  assert(pos == numberOfModels);   
+}
+
+
+
+static void brentGeneric(double *ax, double *bx, double *cx, double *fb, double tol, double *xmin, double *result, int numberOfModels, 
+			 int whichFunction, int rateNumber, tree *tr, linkageList *ll, double *lim_inf, double *lim_sup)
+{
+  int iter, i;
+  double 
+    *a     = (double *)malloc(sizeof(double) * numberOfModels),
+    *b     = (double *)malloc(sizeof(double) * numberOfModels),
+    *d     = (double *)malloc(sizeof(double) * numberOfModels),
+    *etemp = (double *)malloc(sizeof(double) * numberOfModels),
+    *fu    = (double *)malloc(sizeof(double) * numberOfModels),
+    *fv    = (double *)malloc(sizeof(double) * numberOfModels),
+    *fw    = (double *)malloc(sizeof(double) * numberOfModels),
+    *fx    = (double *)malloc(sizeof(double) * numberOfModels),
+    *p     = (double *)malloc(sizeof(double) * numberOfModels),
+    *q     = (double *)malloc(sizeof(double) * numberOfModels),
+    *r     = (double *)malloc(sizeof(double) * numberOfModels),
+    *tol1  = (double *)malloc(sizeof(double) * numberOfModels),
+    *tol2  = (double *)malloc(sizeof(double) * numberOfModels),
+    *u     = (double *)malloc(sizeof(double) * numberOfModels),
+    *v     = (double *)malloc(sizeof(double) * numberOfModels),
+    *w     = (double *)malloc(sizeof(double) * numberOfModels),
+    *x     = (double *)malloc(sizeof(double) * numberOfModels),
+    *xm    = (double *)malloc(sizeof(double) * numberOfModels),
+    *e     = (double *)malloc(sizeof(double) * numberOfModels);
+  boolean *converged = (boolean *)malloc(sizeof(boolean) * numberOfModels);
+  boolean allConverged;
+  
+  for(i = 0; i < numberOfModels; i++)    
+    converged[i] = FALSE;
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      e[i] = 0.0;
+      d[i] = 0.0;
+    }
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      a[i]=((ax[i] < cx[i]) ? ax[i] : cx[i]);
+      b[i]=((ax[i] > cx[i]) ? ax[i] : cx[i]);
+      x[i] = w[i] = v[i] = bx[i];
+      fw[i] = fv[i] = fx[i] = fb[i];
+    }
+
+  for(i = 0; i < numberOfModels; i++)
+    {      
+      assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+      assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+      assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+      assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+      assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+    }
+  
+  
+
+  for(iter = 1; iter <= ITMAX; iter++)
+    {
+      allConverged = TRUE;
+
+      for(i = 0; i < numberOfModels && allConverged; i++)
+	allConverged = allConverged && converged[i];
+
+      if(allConverged)
+	{
+	  free(converged);
+	  free(a);
+	  free(b);
+	  free(d);
+	  free(etemp);
+	  free(fu);
+	  free(fv);
+	  free(fw);
+	  free(fx);
+	  free(p);
+	  free(q);
+	  free(r);
+	  free(tol1);
+	  free(tol2);
+	  free(u);
+	  free(v);
+	  free(w);
+	  free(x);
+	  free(xm);
+	  free(e);
+	  return;
+	}     
+
+      for(i = 0; i < numberOfModels; i++)
+	{
+	  if(!converged[i])
+	    {	     	      
+	      assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+	      assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+	      assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+	      assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+	      assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+  
+	      xm[i] = 0.5 * (a[i] + b[i]);
+	      tol2[i] = 2.0 * (tol1[i] = tol * fabs(x[i]) + BRENT_ZEPS);
+	  
+	      if(fabs(x[i] - xm[i]) <= (tol2[i] - 0.5 * (b[i] - a[i])))
+		{		 
+		  result[i] =  -fx[i];
+		  xmin[i]   = x[i];
+		  converged[i] = TRUE;		  
+		}
+	      else
+		{
+		  if(fabs(e[i]) > tol1[i])
+		    {		     
+		      r[i] = (x[i] - w[i]) * (fx[i] - fv[i]);
+		      q[i] = (x[i] - v[i]) * (fx[i] - fw[i]);
+		      p[i] = (x[i] - v[i]) * q[i] - (x[i] - w[i]) * r[i];
+		      q[i] = 2.0 * (q[i] - r[i]);
+		      if(q[i] > 0.0)
+			p[i] = -p[i];
+		      q[i] = fabs(q[i]);
+		      etemp[i] = e[i];
+		      e[i] = d[i];
+		      if((fabs(p[i]) >= fabs(0.5 * q[i] * etemp[i])) || (p[i] <= q[i] * (a[i]-x[i])) || (p[i] >= q[i] * (b[i] - x[i])))
+			d[i] = BRENT_CGOLD * (e[i] = (x[i] >= xm[i] ? a[i] - x[i] : b[i] - x[i]));
+		      else
+			{
+			  d[i] = p[i] / q[i];
+			  u[i] = x[i] + d[i];
+			  if( u[i] - a[i] < tol2[i] || b[i] - u[i] < tol2[i])
+			    d[i] = SIGN(tol1[i], xm[i] - x[i]);
+			}
+		    }
+		  else
+		    {		     
+		      d[i] = BRENT_CGOLD * (e[i] = (x[i] >= xm[i] ? a[i] - x[i]: b[i] - x[i]));
+		    }
+		  u[i] = ((fabs(d[i]) >= tol1[i]) ? (x[i] + d[i]): (x[i] +SIGN(tol1[i], d[i])));
+		}
+
+	      if(!converged[i])
+		assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+	    }
+	}
+                 
+      evaluateChange(tr, rateNumber, u, fu, converged, whichFunction, numberOfModels, ll, tol);
+
+      for(i = 0; i < numberOfModels; i++)
+	{
+	  if(!converged[i])
+	    {
+	      if(fu[i] <= fx[i])
+		{
+		  if(u[i] >= x[i])
+		    a[i] = x[i];
+		  else
+		    b[i] = x[i];
+		  
+		  SHFT(v[i],w[i],x[i],u[i]);
+		  SHFT(fv[i],fw[i],fx[i],fu[i]);
+		}
+	      else
+		{
+		  if(u[i] < x[i])
+		    a[i] = u[i];
+		  else
+		    b[i] = u[i];
+		  
+		  if(fu[i] <= fw[i] || w[i] == x[i])
+		    {
+		      v[i] = w[i];
+		      w[i] = u[i];
+		      fv[i] = fw[i];
+		      fw[i] = fu[i];
+		    }
+		  else
+		    {
+		      if(fu[i] <= fv[i] || v[i] == x[i] || v[i] == w[i])
+			{
+			  v[i] = u[i];
+			  fv[i] = fu[i];
+			}
+		    }	    
+		}
+	      
+	      assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+	      assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+	      assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+	      assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+	      assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+	      assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+	    }
+	}
+    }
+
+  free(converged);
+  free(a);
+  free(b);
+  free(d);
+  free(etemp);
+  free(fu);
+  free(fv);
+  free(fw);
+  free(fx);
+  free(p);
+  free(q);
+  free(r);
+  free(tol1);
+  free(tol2);
+  free(u);
+  free(v);
+  free(w);
+  free(x);
+  free(xm);
+  free(e);
+
+  printf("\n. Too many iterations in BRENT !");
+  assert(0);
+}
+
+
+
+static int brakGeneric(double *param, double *ax, double *bx, double *cx, double *fa, double *fb, 
+		       double *fc, double *lim_inf, double *lim_sup, 
+		       int numberOfModels, int rateNumber, int whichFunction, tree *tr, linkageList *ll, double modelEpsilon)
+{
+  double 
+    *ulim = (double *)malloc(sizeof(double) * numberOfModels),
+    *u    = (double *)malloc(sizeof(double) * numberOfModels),
+    *r    = (double *)malloc(sizeof(double) * numberOfModels),
+    *q    = (double *)malloc(sizeof(double) * numberOfModels),
+    *fu   = (double *)malloc(sizeof(double) * numberOfModels),
+    *dum  = (double *)malloc(sizeof(double) * numberOfModels), 
+    *temp = (double *)malloc(sizeof(double) * numberOfModels);
+  
+  int 
+    i,
+    *state    = (int *)malloc(sizeof(int) * numberOfModels),
+    *endState = (int *)malloc(sizeof(int) * numberOfModels);
+
+  boolean *converged = (boolean *)malloc(sizeof(boolean) * numberOfModels);
+  boolean allConverged;
+
+  for(i = 0; i < numberOfModels; i++)
+    converged[i] = FALSE;
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      state[i] = 0;
+      endState[i] = 0;
+
+      u[i] = 0.0;
+
+      param[i] = ax[i];
+
+      if(param[i] > lim_sup[i]) 	
+	param[i] = ax[i] = lim_sup[i];
+      
+      if(param[i] < lim_inf[i]) 
+	param[i] = ax[i] = lim_inf[i];
+
+      assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+    }
+   
+  
+  evaluateChange(tr, rateNumber, param, fa, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      param[i] = bx[i];
+      if(param[i] > lim_sup[i]) 
+	param[i] = bx[i] = lim_sup[i];
+      if(param[i] < lim_inf[i]) 
+	param[i] = bx[i] = lim_inf[i];
+
+      assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+    }
+  
+  evaluateChange(tr, rateNumber, param, fb, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+  for(i = 0; i < numberOfModels; i++)  
+    {
+      if (fb[i] > fa[i]) 
+	{	  
+	  SHFT(dum[i],ax[i],bx[i],dum[i]);
+	  SHFT(dum[i],fa[i],fb[i],dum[i]);
+	}
+      
+      cx[i] = bx[i] + MNBRAK_GOLD * (bx[i] - ax[i]);
+      
+      param[i] = cx[i];
+      
+      if(param[i] > lim_sup[i]) 
+	param[i] = cx[i] = lim_sup[i];
+      if(param[i] < lim_inf[i]) 
+	param[i] = cx[i] = lim_inf[i];
+
+      assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+    }
+  
+ 
+  evaluateChange(tr, rateNumber, param, fc, converged, whichFunction, numberOfModels,  ll, modelEpsilon);
+
+  while(1) 
+    {       
+      allConverged = TRUE;
+
+      for(i = 0; i < numberOfModels && allConverged; i++)
+	allConverged = allConverged && converged[i];
+
+      if(allConverged)
+	{
+	  for(i = 0; i < numberOfModels; i++)
+	    {	       
+	      if(ax[i] > lim_sup[i]) 
+		ax[i] = lim_sup[i];
+	      if(ax[i] < lim_inf[i]) 
+		ax[i] = lim_inf[i];
+
+	      if(bx[i] > lim_sup[i]) 
+		bx[i] = lim_sup[i];
+	      if(bx[i] < lim_inf[i]) 
+		bx[i] = lim_inf[i];
+	       
+	      if(cx[i] > lim_sup[i]) 
+		cx[i] = lim_sup[i];
+	      if(cx[i] < lim_inf[i]) 
+		cx[i] = lim_inf[i];
+	    }
+
+	  free(converged);
+	  free(ulim);
+	  free(u);
+	  free(r);
+	  free(q);
+	  free(fu);
+	  free(dum); 
+	  free(temp);
+	  free(state);   
+	  free(endState);
+	  return 0;
+	   
+	}
+
+      for(i = 0; i < numberOfModels; i++)
+	{
+	  if(!converged[i])
+	    {
+	      switch(state[i])
+		{
+		case 0:
+		  endState[i] = 0;
+		  if(!(fb[i] > fc[i]))		         
+		    converged[i] = TRUE;		       		     
+		  else
+		    {
+		   
+		      if(ax[i] > lim_sup[i]) 
+			ax[i] = lim_sup[i];
+		      if(ax[i] < lim_inf[i]) 
+			ax[i] = lim_inf[i];
+		      if(bx[i] > lim_sup[i]) 
+			bx[i] = lim_sup[i];
+		      if(bx[i] < lim_inf[i]) 
+			bx[i] = lim_inf[i];
+		      if(cx[i] > lim_sup[i]) 
+			cx[i] = lim_sup[i];
+		      if(cx[i] < lim_inf[i]) 
+			cx[i] = lim_inf[i];
+		       
+		      r[i]=(bx[i]-ax[i])*(fb[i]-fc[i]);
+		      q[i]=(bx[i]-cx[i])*(fb[i]-fa[i]);
+		      u[i]=(bx[i])-((bx[i]-cx[i])*q[i]-(bx[i]-ax[i])*r[i])/
+			(2.0*SIGN(MAX(fabs(q[i]-r[i]),MNBRAK_TINY),q[i]-r[i]));
+		       
+		      ulim[i]=(bx[i])+MNBRAK_GLIMIT*(cx[i]-bx[i]);
+		       
+		      if(u[i] > lim_sup[i]) 
+			u[i] = lim_sup[i];
+		      if(u[i] < lim_inf[i]) 
+			u[i] = lim_inf[i];
+		      if(ulim[i] > lim_sup[i]) 
+			ulim[i] = lim_sup[i];
+		      if(ulim[i] < lim_inf[i]) 
+			ulim[i] = lim_inf[i];
+		       
+		      if ((bx[i]-u[i])*(u[i]-cx[i]) > 0.0)
+			{
+			  param[i] = u[i];
+			  if(param[i] > lim_sup[i]) 			     
+			    param[i] = u[i] = lim_sup[i];
+			  if(param[i] < lim_inf[i])
+			    param[i] = u[i] = lim_inf[i];
+			  endState[i] = 1;
+			}
+		      else 
+			{
+			  if ((cx[i]-u[i])*(u[i]-ulim[i]) > 0.0) 
+			    {
+			      param[i] = u[i];
+			      if(param[i] > lim_sup[i]) 
+				param[i] = u[i] = lim_sup[i];
+			      if(param[i] < lim_inf[i]) 
+				param[i] = u[i] = lim_inf[i];
+			      endState[i] = 2;
+			    }		  	       
+			  else
+			    {
+			      if ((u[i]-ulim[i])*(ulim[i]-cx[i]) >= 0.0) 
+				{
+				  u[i] = ulim[i];
+				  param[i] = u[i];	
+				  if(param[i] > lim_sup[i]) 
+				    param[i] = u[i] = ulim[i] = lim_sup[i];
+				  if(param[i] < lim_inf[i]) 
+				    param[i] = u[i] = ulim[i] = lim_inf[i];
+				  endState[i] = 0;
+				}		  		
+			      else 
+				{		  
+				  u[i]=(cx[i])+MNBRAK_GOLD*(cx[i]-bx[i]);
+				  param[i] = u[i];
+				  endState[i] = 0;
+				  if(param[i] > lim_sup[i]) 
+				    param[i] = u[i] = lim_sup[i];
+				  if(param[i] < lim_inf[i]) 
+				    param[i] = u[i] = lim_inf[i];
+				}
+			    }	  
+			}
+		    }
+		  break;
+		case 1:
+		  endState[i] = 0;
+		  break;
+		case 2:
+		  endState[i] = 3;
+		  break;
+		default:
+		  assert(0);
+		}
+	      assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+	    }
+	}
+             
+      evaluateChange(tr, rateNumber, param, temp, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+      for(i = 0; i < numberOfModels; i++)
+	{
+	  if(!converged[i])
+	    {	       
+	      switch(endState[i])
+		{
+		case 0:
+		  fu[i] = temp[i];
+		  SHFT(ax[i],bx[i],cx[i],u[i]);
+		  SHFT(fa[i],fb[i],fc[i],fu[i]);
+		  state[i] = 0;
+		  break;
+		case 1:
+		  fu[i] = temp[i];
+		  if (fu[i] < fc[i]) 
+		    {
+		      ax[i]=(bx[i]);
+		      bx[i]=u[i];
+		      fa[i]=(fb[i]);
+		      fb[i]=fu[i]; 
+		      converged[i] = TRUE;		      
+		    } 
+		  else 
+		    {
+		      if (fu[i] > fb[i]) 
+			{
+			  assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+			  cx[i]=u[i];
+			  fc[i]=fu[i];
+			  converged[i] = TRUE;			  
+			}
+		      else
+			{		   
+			  u[i]=(cx[i])+MNBRAK_GOLD*(cx[i]-bx[i]);
+			  param[i] = u[i];
+			  if(param[i] > lim_sup[i]) {param[i] = u[i] = lim_sup[i];}
+			  if(param[i] < lim_inf[i]) {param[i] = u[i] = lim_inf[i];}	  
+			  state[i] = 1;		 
+			}		  
+		    }
+		  break;
+		case 2: 
+		  fu[i] = temp[i];
+		  if (fu[i] < fc[i]) 
+		    {		     
+		      SHFT(bx[i],cx[i],u[i], cx[i]+MNBRAK_GOLD*(cx[i]-bx[i]));
+		      state[i] = 2;
+		    }	   
+		  else
+		    {
+		      state[i] = 0;
+		      SHFT(ax[i],bx[i],cx[i],u[i]);
+		      SHFT(fa[i],fb[i],fc[i],fu[i]);
+		    }
+		  break;	   
+		case 3:		  
+		  SHFT(fb[i],fc[i],fu[i], temp[i]);
+		  SHFT(ax[i],bx[i],cx[i],u[i]);
+		  SHFT(fa[i],fb[i],fc[i],fu[i]);
+		  state[i] = 0;
+		  break;
+		default:
+		  assert(0);
+		}
+	    }
+	}
+    }
+   
+
+  assert(0);
+  free(converged);
+  free(ulim);
+  free(u);
+  free(r);
+  free(q);
+  free(fu);
+  free(dum); 
+  free(temp);
+  free(state);   
+  free(endState);
+
+  
+
+  return(0);
+}
+
+
+/*******************************************************************************************************/
+/******** LG4X ***************************************************************************************/
+
+static void optLG4X(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels)
+{
+  int 
+    i;
+
+  for(i = 0; i < 4; i++)
+    {
+      optParamGeneric(tr, modelEpsilon, ll, numberOfModels, i, LG4X_RATE_MIN, LG4X_RATE_MAX, LXRATE_F); 
+      optimizeWeights(tr, modelEpsilon, ll, numberOfModels); 
+    }
+}
+
+
+/**********************************************************************************************************/
+/* ALPHA PARAM ********************************************************************************************/
+
+
+
+//this function is required for implementing the LG4X model later-on 
+
+static void optAlphasGeneric(tree *tr, double modelEpsilon, linkageList *ll)
+{
+  int 
+    i,
+    non_LG4X_Partitions = 0,
+    LG4X_Partitions  = 0;
+
+  /* assumes homogeneous super-partitions, that either contain DNA or AA partitions !*/
+  /* does not check whether AA are all linked */
+
+  /* first do non-LG4X partitions */
+
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{
+	case DNA_DATA:			  	
+	case BINARY_DATA:
+	case SECONDARY_DATA:
+	case SECONDARY_DATA_6:
+	case SECONDARY_DATA_7:
+	case GENERIC_32:
+	case GENERIC_64:
+	  ll->ld[i].valid = TRUE;
+	  non_LG4X_Partitions++;
+	  break;
+	case AA_DATA:	  
+	  //to be implemented later-on 
+	  if(tr->partitionData[ll->ld[i].partitionList[0]].protModels == LG4X)
+	    {
+	      LG4X_Partitions++;	      
+	      ll->ld[i].valid = FALSE;
+	    }
+	  else
+	    {
+	      ll->ld[i].valid = TRUE;
+	      non_LG4X_Partitions++;
+	    }
+	  break;
+	default:
+	  assert(0);
+	}      
+    }   
+
+ 
+
+  if(non_LG4X_Partitions > 0)    
+    optParamGeneric(tr, modelEpsilon, ll, non_LG4X_Partitions, -1, ALPHA_MIN, ALPHA_MAX, ALPHA_F);
+  
+  
+  
+
+  /* then LG4x partitions */
+
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{
+	case DNA_DATA:			  	
+	case BINARY_DATA:
+	case SECONDARY_DATA:
+	case SECONDARY_DATA_6:
+	case SECONDARY_DATA_7:
+	case GENERIC_32:
+	case GENERIC_64:
+	  ll->ld[i].valid = FALSE;	  
+	  break;
+	case AA_DATA:	  	  
+	  if(tr->partitionData[ll->ld[i].partitionList[0]].protModels == LG4X)	      
+	    ll->ld[i].valid = TRUE;	   
+	  else
+	    ll->ld[i].valid = FALSE;	   	    
+	  break;
+	default:
+	  assert(0);
+	}      
+    }   
+  
+  if(LG4X_Partitions > 0)
+    optLG4X(tr, modelEpsilon, ll, LG4X_Partitions);
+
+  for(i = 0; i < ll->entries; i++)
+    ll->ld[i].valid = TRUE;
+}
+
+
+static double minFreq(int index, int whichFreq, tree *tr, double absoluteMin)
+{
+  double 
+    min = 0.0,
+    *w = tr->partitionData[index].freqExponents,
+    c = 0.0;
+
+  int
+    states = tr->partitionData[index].states,
+    i;
+
+  for(i = 0; i < states; i++)
+    if(i != whichFreq)
+      c += exp(w[i]);
+
+  min = log(FREQ_MIN) + log(c) - log (1.0 - FREQ_MIN);
+
+  if(0)
+    {
+      double
+	check = exp(min) / (exp(min) + c);
+      
+      printf("check %f\n", check);    
+
+      printf("min: %f \n", min);
+    }
+  
+  return MAX(min, absoluteMin);
+}
+
+static double maxFreq(int index, int whichFreq, tree *tr, double absoluteMax)
+{
+  double 
+    max = 0.0,
+    *w = tr->partitionData[index].freqExponents,
+    c = 0.0;
+
+  int
+    states = tr->partitionData[index].states,
+    i;
+
+  for(i = 0; i < states; i++)
+    if(i != whichFreq)
+      c += exp(w[i]);
+
+  max = log(1.0 - ((double)(states - 1) * FREQ_MIN)) + log(c) - log ((double)(states - 1) * FREQ_MIN);
+
+  if(0)
+    {
+      double
+	check = exp(max) / (exp(max) + c);
+      
+      printf("check max %f\n", check);    
+      
+      printf("max: %f \n", max);
+    }
+  
+  return MIN(max, absoluteMax);
+}
+
+
+static void optParamGeneric(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int rateNumber, double _lim_inf, double _lim_sup, int whichParameterType)
+{
+  int
+    l,
+    k, 
+    j, 
+    pos;
+    
+  double  
+    *startRates   = (double *)malloc(sizeof(double) * numberOfModels * 4),
+    *startWeights = (double *)malloc(sizeof(double) * numberOfModels * 4),
+    *startExponents = (double *)malloc(sizeof(double) * numberOfModels * 4),
+    *startValues = (double *)malloc(sizeof(double) * numberOfModels),
+    *startLH    = (double *)malloc(sizeof(double) * numberOfModels),
+    *endLH      = (double *)malloc(sizeof(double) * numberOfModels),
+    *_a         = (double *)malloc(sizeof(double) * numberOfModels),
+    *_b         = (double *)malloc(sizeof(double) * numberOfModels),
+    *_c         = (double *)malloc(sizeof(double) * numberOfModels),
+    *_fa        = (double *)malloc(sizeof(double) * numberOfModels),
+    *_fb        = (double *)malloc(sizeof(double) * numberOfModels),
+    *_fc        = (double *)malloc(sizeof(double) * numberOfModels),
+    *_param     = (double *)malloc(sizeof(double) * numberOfModels),    
+    *_x         = (double *)malloc(sizeof(double) * numberOfModels),
+    *lim_inf    = (double *)malloc(sizeof(double) * numberOfModels),
+    *lim_sup    = (double *)malloc(sizeof(double) * numberOfModels);
+ 
+  
+  evaluateGeneric(tr, tr->start, TRUE);
+
+  
+  
+#ifdef  _DEBUG_MOD_OPT
+  double
+    initialLH = tr->likelihood;
+#endif
+
+  /* 
+     at this point here every worker has the traversal data it needs for the 
+     search 
+  */
+
+  for(l = 0, pos = 0; l < ll->entries; l++)
+    {
+      if(ll->ld[l].valid)
+	{
+	  endLH[pos] = unlikely;
+	  startLH[pos] = 0.0;
+
+	  for(j = 0; j < ll->ld[l].partitions; j++)
+	    {
+	      int 
+		index = ll->ld[l].partitionList[j];
+	      
+	      startLH[pos] += tr->perPartitionLH[index];
+	      
+	      switch(whichParameterType)
+		{
+		case ALPHA_F:
+		  lim_inf[pos] = _lim_inf;
+		  lim_sup[pos] = _lim_sup;
+		  startValues[pos] = tr->partitionData[index].alpha;
+		  break;
+		case RATE_F:
+		  lim_inf[pos] = _lim_inf;
+		  lim_sup[pos] = _lim_sup;
+		  startValues[pos] = tr->partitionData[index].substRates[rateNumber];      
+		  break;
+		case FREQ_F:
+		  lim_inf[pos] = minFreq(index, rateNumber, tr, _lim_inf);
+		  lim_sup[pos] = maxFreq(index, rateNumber, tr, _lim_sup);
+		  startValues[pos] = tr->partitionData[index].freqExponents[rateNumber];
+		  break;
+		case LXRATE_F:		 
+		  lim_inf[pos] = _lim_inf;
+		  lim_sup[pos] = _lim_sup;
+		  assert(rateNumber >= 0 && rateNumber < 4);
+		  startValues[pos] = tr->partitionData[index].gammaRates[rateNumber];
+		  memcpy(&startRates[pos * 4],   tr->partitionData[index].gammaRates, 4 * sizeof(double)); 
+		  memcpy(&startExponents[pos * 4], tr->partitionData[index].weightExponents, 4 * sizeof(double));
+		  memcpy(&startWeights[pos * 4], tr->partitionData[index].weights,    4 * sizeof(double));
+		  break;
+		case LXWEIGHT_F: 		  
+		  lim_inf[pos] = _lim_inf;
+		  lim_sup[pos] = _lim_sup;
+		  assert(rateNumber >= 0 && rateNumber < 4);
+		  startValues[pos] = tr->partitionData[index].weightExponents[rateNumber];		  
+		  break;
+		default:
+		  assert(0);
+		}
+		  
+	    }
+	  pos++;
+	}
+    }  
+
+  assert(pos == numberOfModels);
+   
+  for(k = 0, pos = 0; k < ll->entries; k++)
+    {
+      if(ll->ld[k].valid)
+	{	 	 	  
+	  _a[pos] = startValues[pos] + 0.1;
+	  _b[pos] = startValues[pos] - 0.1;
+	      
+	  if(_a[pos] < lim_inf[pos]) 
+	    _a[pos] = lim_inf[pos];
+	  
+	  if(_a[pos] > lim_sup[pos]) 
+	    _a[pos] = lim_sup[pos];
+	      
+	  if(_b[pos] < lim_inf[pos]) 
+	    _b[pos] = lim_inf[pos];
+	  
+	  if(_b[pos] > lim_sup[pos]) 
+	    _b[pos] = lim_sup[pos];    
+
+	  pos++;
+	}
+    }                    	     
+
+  assert(pos == numberOfModels);
+
+  brakGeneric(_param, _a, _b, _c, _fa, _fb, _fc, lim_inf, lim_sup, numberOfModels, rateNumber, whichParameterType, tr, ll, modelEpsilon);
+      
+  for(k = 0; k < numberOfModels; k++)
+    {
+      assert(_a[k] >= lim_inf[k] && _a[k] <= lim_sup[k]);
+      assert(_b[k] >= lim_inf[k] && _b[k] <= lim_sup[k]);	  
+      assert(_c[k] >= lim_inf[k] && _c[k] <= lim_sup[k]);	    
+    }      
+
+  brentGeneric(_a, _b, _c, _fb, modelEpsilon, _x, endLH, numberOfModels, whichParameterType, rateNumber, tr,  ll, lim_inf, lim_sup);
+		      
+  for(k = 0, pos = 0; k < ll->entries; k++)
+    {
+      if(ll->ld[k].valid)
+	{ 
+	  if(startLH[pos] > endLH[pos])
+	    {
+	      //if the initial likelihood was better than the likelihood after optimization,
+	      //we reset the parameters to their original values
+
+	      for(j = 0; j < ll->ld[k].partitions; j++)
+		{
+		  int 
+		    index = ll->ld[k].partitionList[j];		  		 
+		  
+		  changeModelParameters(index, rateNumber, startValues[pos], whichParameterType, tr);		 
+		}
+	    }
+	  else
+	    {
+	      //otherwise we set the parameter to the optimized value _x[pos];
+	      //standard RAxML used to have a bug here, since fixed:
+	      //_x[pos] was not used as the value to be set
+
+	      for(j = 0; j < ll->ld[k].partitions; j++)
+		{
+		  int 
+		    index = ll->ld[k].partitionList[j];
+
+		  changeModelParameters(index, rateNumber, _x[pos], whichParameterType, tr);		  
+		}
+	    }
+	  pos++;
+	}
+    }
+
+
+  //LIBRARY call the barrier here in the LIBRARY to update model params at all threads/processes !
+    
+  assert(pos == numberOfModels);
+
+  free(startLH);
+  free(endLH);
+  free(_a);
+  free(_b);
+  free(_c);
+  free(_fa);
+  free(_fb);
+  free(_fc);
+  free(_param);
+  free(_x);  
+  free(startValues);
+  free(startRates);
+  free(startWeights);
+  free(startExponents);
+  free(lim_inf);
+  free(lim_sup);
+
+#ifdef _DEBUG_MOD_OPT
+  evaluateGeneric(tr, tr->start, TRUE);
+
+  if(tr->likelihood < initialLH)
+    printf("%f %f\n", tr->likelihood, initialLH);
+  assert(tr->likelihood >= initialLH);
+#endif
+
+}
+
+
+
+//******************** rate optimization functions ***************************************************/
+
+static void optFreqs(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int states)
+{ 
+  int 
+    rateNumber;
+
+  double
+    freqMin = -1000000.0,
+    freqMax = 200.0;
+  
+  for(rateNumber = 0; rateNumber < states; rateNumber++)
+    optParamGeneric(tr, modelEpsilon, ll, numberOfModels, rateNumber, freqMin, freqMax, FREQ_F);   
+}
+
+static void optBaseFreqs(tree *tr, double modelEpsilon, linkageList *ll)
+{
+  int 
+    i,
+    states,
+    dnaPartitions = 0,
+    aaPartitions  = 0,
+    binaryPartitions = 0;
+
+  /* first do DNA */
+
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{       	  	  
+	case DNA_DATA:	
+	  states = tr->partitionData[ll->ld[i].partitionList[0]].states;	 
+	  if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+	    {
+	      ll->ld[i].valid = TRUE;
+	      dnaPartitions++;  	    
+	    }
+	  else
+	    ll->ld[i].valid = FALSE;
+	  break;       
+	case AA_DATA:
+	case BINARY_DATA:
+	  ll->ld[i].valid = FALSE;
+	  break;
+	default:
+	  assert(0);
+	}      
+    }   
+
+  if(dnaPartitions > 0)
+    optFreqs(tr, modelEpsilon, ll, dnaPartitions, states);
+  
+  /* then AA */
+
+  
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{
+	case AA_DATA:	  
+	  states = tr->partitionData[ll->ld[i].partitionList[0]].states; 	      
+	  if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+	    {
+	      ll->ld[i].valid = TRUE;
+	      aaPartitions++;		
+	    }
+	  else
+	    ll->ld[i].valid = FALSE; 
+	  break;
+	case DNA_DATA:	 
+	case BINARY_DATA:
+	  ll->ld[i].valid = FALSE;
+	  break;
+	default:
+	  assert(0);
+	}	 
+    }
+  
+  if(aaPartitions > 0)      
+    optFreqs(tr, modelEpsilon, ll, aaPartitions, states);
+
+
+  //then binary 
+
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{
+	case BINARY_DATA:	  
+	  states = tr->partitionData[ll->ld[i].partitionList[0]].states; 	      
+	  if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+	    {
+	      ll->ld[i].valid = TRUE;
+	      binaryPartitions++;		
+	    }
+	  else
+	    ll->ld[i].valid = FALSE; 
+	  break;
+	case DNA_DATA:	    
+	case AA_DATA:
+	  ll->ld[i].valid = FALSE;
+	  break;	 
+	default:
+	  assert(0);
+	}	 
+    }
+  
+  if(binaryPartitions > 0)      
+    optFreqs(tr, modelEpsilon, ll, binaryPartitions, states);
+
+  for(i = 0; i < ll->entries; i++)
+    ll->ld[i].valid = TRUE;
+}
+
+
+//new version for optimizing rates, an external loop that iterates over the rates 
+
+static void optRates(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int states)
+{
+  int
+    rateNumber,
+    numberOfRates = ((states * states - states) / 2) - 1;
+
+  for(rateNumber = 0; rateNumber < numberOfRates; rateNumber++)
+    optParamGeneric(tr, modelEpsilon, ll, numberOfModels, rateNumber, RATE_MIN, RATE_MAX, RATE_F);   
+}
+
+
+static boolean AAisGTR(tree *tr)
+{
+  int i, count = 0;
+
+  for(i = 0; i < tr->NumberOfModels; i++)   
+    {
+      if(tr->partitionData[i].dataType == AA_DATA)
+	{
+	  count++;
+	  if(tr->partitionData[i].protModels != GTR)
+	    return FALSE;
+	}
+    }
+
+  if(count == 0)
+    return FALSE;
+
+  return TRUE;
+}
+
+static void optRatesGeneric(tree *tr, double modelEpsilon, linkageList *ll)
+{
+  int 
+    i,
+    dnaPartitions = 0,
+    aaPartitions  = 0,
+    states = -1;
+
+  /* assumes homogeneous super-partitions, that either contain DNA or AA partitions !*/
+  /* does not check whether AA are all linked */
+
+  /* first do DNA */
+
+  for(i = 0; i < ll->entries; i++)
+    {
+      switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	{
+	case DNA_DATA:	
+	  states = tr->partitionData[ll->ld[i].partitionList[0]].states;	 
+	  ll->ld[i].valid = TRUE;
+	  dnaPartitions++;  	    
+	  break;
+	case BINARY_DATA:
+	case AA_DATA:
+	case SECONDARY_DATA:
+	case SECONDARY_DATA_6:
+	case SECONDARY_DATA_7:
+	case GENERIC_32:
+	case GENERIC_64:
+	  ll->ld[i].valid = FALSE;
+	  break;
+	default:
+	  assert(0);
+	}      
+    }   
+
+  if(dnaPartitions > 0)
+    optRates(tr, modelEpsilon, ll, dnaPartitions, states);
+  
+  /* then AA for GTR */
+
+  if(AAisGTR(tr))
+    {
+      for(i = 0; i < ll->entries; i++)
+	{
+	  switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+	    {
+	    case AA_DATA:
+	      states = tr->partitionData[ll->ld[i].partitionList[0]].states; 	      
+	      ll->ld[i].valid = TRUE;
+	      aaPartitions++;		
+	      break;
+	    case DNA_DATA:	    
+	    case BINARY_DATA:
+	    case SECONDARY_DATA:	
+	    case SECONDARY_DATA_6:
+	    case SECONDARY_DATA_7:
+	      ll->ld[i].valid = FALSE;
+	      break;
+	    default:
+	      assert(0);
+	    }	 
+	}
+
+      assert(aaPartitions == 1);     
+      
+      optRates(tr, modelEpsilon, ll, aaPartitions, states);
+    }  
+
+  for(i = 0; i < ll->entries; i++)
+    ll->ld[i].valid = TRUE;
+}
+
+
+
+
+
+/*********************FUNCTIONS FOR APPROXIMATE MODEL OPTIMIZATION ***************************************/
+
+
+
+
+
+
+static int catCompare(const void *p1, const void *p2)
+{
+  rateCategorize *rc1 = (rateCategorize *)p1;
+  rateCategorize *rc2 = (rateCategorize *)p2;
+
+  double i = rc1->accumulatedSiteLikelihood;
+  double j = rc2->accumulatedSiteLikelihood;
+  
+  if (i > j)
+    return (1);
+  if (i < j)
+    return (-1);
+  return (0);
+}
+
+
+static void categorizePartition(tree *tr, rateCategorize *rc, int model, int lower, int upper, double *patrat, 
+				int *rateCategory /* temporary; used to be tr->rateCategory */ 
+				)
+{
+  
+
+  int
+    zeroCounter,
+    i, 
+    k;
+  
+  double 
+    diff, 
+    min;
+
+  for (i = lower, zeroCounter = 0; i < upper; i++, zeroCounter++) 
+    {
+      double
+	temp = patrat[i];
+
+      int
+	found = 0;
+	
+      for(k = 0; k < tr->partitionData[model].numberOfCategories; k++)
+	{
+	  if(temp == rc[k].rate || (fabs(temp - rc[k].rate) < 0.001))
+	    {
+	      found = 1;
+	      rateCategory[i] = k;
+	      break;
+	    }
+	}
+	
+      if(!found)
+	{
+	  min = fabs(temp - rc[0].rate);
+	  rateCategory[i] = 0;
+
+	  for(k = 1; k < tr->partitionData[model].numberOfCategories; k++)
+	    {
+	      diff = fabs(temp - rc[k].rate);
+
+	      if(diff < min)
+		{
+		  min = diff;
+		  rateCategory[i] = k;
+		}
+	    }
+	}
+    }
+
+  for(k = 0; k < tr->partitionData[model].numberOfCategories; k++)
+    tr->partitionData[model].perSiteRates[k] = rc[k].rate; 
+}
+
+
+
+
+static void optRateCatPthreads(tree *tr, double lower_spacing, double upper_spacing)
+{
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+  {
+    int
+      m,
+      model,
+      maxModel;
+
+#ifdef _USE_OMP
+    maxModel = tr->maxModelsPerThread;
+#else
+    maxModel = tr->NumberOfModels;
+#endif
+
+  for(m = 0; m < maxModel; m++)
+    {
+      /* just defaults -> if partition wasn't assigned to this thread, it will be ignored later on */
+      size_t
+	width = 0,
+	offset = 0;
+
+#ifdef _USE_OMP
+    	  int
+    	    tid = omp_get_thread_num();
+
+    	  /* check if this thread should process this partition */
+    	  Assign*
+	    pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+    	  if(pAss)
+	    {
+	      model  = pAss->partitionId;
+	      width  = pAss->width;
+	      offset = pAss->offset;
+
+	      assert(model < tr->NumberOfModels);
+	    }
+    	  else
+    	    break;
+
+#else
+    	  model = m;
+
+    	  /* number of sites in this partition */
+	  width  = (size_t)tr->partitionData[model].width;
+	  offset = 0;
+#endif
+
+      size_t
+	i;
+
+      pInfo 
+	*partition = &(tr->partitionData[model]); 
+      
+      for( i = offset; i < offset + width; ++i)
+	{
+	  double 
+	    initialRate, 
+	    initialLikelihood, 
+	    leftLH, 
+	    rightLH, 
+	    leftRate, 
+	    rightRate, 
+	    v;
+	      
+	  const double 
+	    epsilon = 0.00001;
+	      
+	  int 
+	    k;	      
+
+	  initialRate = partition->patrat[i]; 
+	      
+	  initialLikelihood = evaluatePartialGeneric(tr, i, initialRate, model); /* i is real i ??? */	      	      	      	      
+
+	  leftLH = rightLH = initialLikelihood;
+	  leftRate = rightRate = initialRate;
+	      
+	  k = 1;
+	      
+	  while((initialRate - k * lower_spacing > 0.0001) && 
+		((v = evaluatePartialGeneric(tr, i, initialRate - k * lower_spacing, model)) 
+		 > leftLH) && 
+		(fabs(leftLH - v) > epsilon))  
+	    {	  
+#ifndef WIN32
+	      if(isnan(v))
+		assert(0);
+#endif
+		  
+	      leftLH = v;
+	      leftRate = initialRate - k * lower_spacing;
+	      k++;	  
+	    }      
+	      
+	  k = 1;
+	      
+	  while(((v = evaluatePartialGeneric(tr, i, initialRate + k * upper_spacing, model)) > rightLH) &&
+		(fabs(rightLH - v) > epsilon))    	
+	    {
+#ifndef WIN32
+	      if(isnan(v))
+		assert(0);
+#endif     
+	      rightLH = v;
+	      rightRate = initialRate + k * upper_spacing;	 
+	      k++;
+	    }           
+	      
+	  if(rightLH > initialLikelihood || leftLH > initialLikelihood)
+	    {
+	      if(rightLH > leftLH)	    
+		{	     
+		  partition->patrat[i] = rightRate;
+		  partition->lhs[i]  = rightLH; 
+		}
+	      else
+		{	      
+		  partition->patrat[i] = leftRate; 
+		  partition->lhs[i] = leftLH;
+		}
+	    }
+	  else	    
+	    partition->lhs[i] = initialLikelihood;	    
+	}
+    }
+  }
+}
+
+
+
+
+
+/** 
+    determines the weighted rates for each partition. Intended for use
+    with normalization of the CAT model rates.
+    
+    Since information about rates and weights is distributed (each
+    process only has the respective info for the data assigned to it),
+    we have to communicate with peer processes. Note that
+    weightPerPart could actually be stored in a variable, since the
+    result does not change.
+
+    output:
+    weightPerPart_result  -- the sum of weights per partition
+    weightedRates_result  -- sum of rates per partition weighted by site weight
+
+*/ 
+static void getWeightsAndWeightedRates(const tree * const tr, int **weightPerPart_result, double **weightedRates_result )
+{
+  int 
+    i,
+    *weightPerPart = (int *)NULL;
+  
+  double 
+    *weightedRates = (double *)NULL;
+   
+  *weightPerPart_result = (int*)calloc((size_t)tr->NumberOfModels, sizeof(int)); 
+  *weightedRates_result = (double*) calloc((size_t)tr->NumberOfModels, sizeof(double)); 
+  
+  
+  weightedRates = *weightedRates_result; 
+  weightPerPart = *weightPerPart_result; 
+  
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    {
+      size_t 
+	j; 
+
+      pInfo 
+	*partition = &(tr->partitionData[i]); 
+      
+      for(j = 0; j < partition->width; ++j)
+	{ 
+	  int 
+	    c = partition->rateCategory[j];
+	  
+	  weightPerPart[i] += partition->wgt[j];	  
+	  assert(0 <= c && c < tr->maxCategories); 
+	  weightedRates[i] += ((double)partition->wgt[j]) * partition->perSiteRates[c]; 
+	}
+    }
+  
+  MPI_Allreduce(MPI_IN_PLACE, weightPerPart, tr->NumberOfModels, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
+  MPI_Allreduce(MPI_IN_PLACE, weightedRates, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); 
+
+  for( i = 0; i < tr->NumberOfModels; ++i)
+    {
+      assert(weightPerPart[i] > 0 ); 
+      assert(weightedRates[i] > 0.0 );
+    }
+}
+
+
+/* 
+   this used to be updatePerSiteRates without scaling. Previously,
+   updatePerSiteRates without scaling only conducted a check about
+   whether sites are scaled correctly.
+ */
+
+//Andre but isn't this checking that the rates have been scaled correctly?
+//shouldn't the assertions fail in this case, i.e., without scaling ?
+void checkPerSiteRates(const tree *const tr )
+{
+  int 
+    i,
+    *weightPerPart =  (int *)NULL; 
+  
+  double 
+    *weightedRates = (double *)NULL; 
+  
+  /*
+    determine the sum of weights (weightPerPart) and the sum of all
+    rates of a partition weighted by site weights
+   */ 
+  getWeightsAndWeightedRates(tr, &weightPerPart, &weightedRates); 
+
+  if(tr->numBranches > 1 )
+    {
+      /* check if the mean of rates of each partition is 1 */
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	{
+	  double accRat = weightedRates[i] / (double)weightPerPart[i]; 
+	  assert(fabs(accRat - 1.0) < 1e-5); 
+	}
+    }
+  else 
+    {
+      /* check, if the overall mean of rates is 1 */
+
+      double 
+	accRat = 0.0,
+	accWgt = 0.0; 
+      
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	{
+	  accRat += weightedRates[i]; 
+	  accWgt += weightPerPart[i]; 
+	}
+      accRat  /= (double)accWgt; 
+      
+      assert(fabs(accRat - 1.0) < 1e-5); 
+    }
+
+  free(weightedRates); 
+  free(weightPerPart); 
+}
+
+
+/** 
+    updatePerSiteRates is called after the master has categorized
+    rates into several categories and every process has obtained the
+    categorization for only the data assigned to it. Now, we still
+    have to scale the rates, s.t. they are 1 on average.
+
+    Thus, some communication is still needed to determine the total
+    weight and the weighted rates (because this information is
+    distributed).
+
+    Notice that this function previously had two modes (scaleRates =
+    {TRUE,FALSE}). Previously, scaleRates = FALSE, only performed a
+    check on whether rates are scaled correctly such that the average
+    rate is 1. For clarity, this functionality is now in a separate
+    function called checkPerSiteRates.
+*/ 
+static void updatePerSiteRates(tree *tr)
+{
+  int 
+    i, 
+    *weightPerPart =  (int *)NULL; 
+  
+  double 
+    *weightedRates = (double *)NULL; 
+
+  getWeightsAndWeightedRates(tr, &weightPerPart, &weightedRates); 
+
+  if(tr->numBranches > 1  )	
+    {
+      /* scale each partition, s.t. average rate within the partition is 1 */
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	{
+	  int j; 
+	  double scaler = weightedRates[i] / (double)weightPerPart[i]; 
+	  scaler = 1.0 / scaler; 
+	  for(j = 0; j < tr->partitionData[i].numberOfCategories; ++j)
+	    tr->partitionData[i].perSiteRates[j] *= scaler; 
+	}
+    }
+  else
+    {
+      /* scale, s.t. average rate is 1 */ 
+
+      double 
+	scaler = 0.0,
+	accWgt = 0.0; 
+      
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	{
+	  scaler += weightedRates[i]; 
+	  accWgt += weightPerPart[i]; 
+	}
+      scaler /= (double)accWgt; 
+      scaler = 1.0 / scaler; 
+
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	{
+	  pInfo 
+	    *partition = &(tr->partitionData[i]); 
+	  
+	  int 
+	    j; 
+	  
+	  for(j = 0; j < partition->numberOfCategories; ++j)
+	    partition->perSiteRates[j] *= scaler; 
+	}
+    }
+
+  free(weightedRates); 
+  free(weightPerPart); 
+  
+  /* 
+     finally check, whether the rates are scaled correctly, s.t. their
+     mean is 1
+   */ 
+  checkPerSiteRates(tr);
+}
+
+
+
+/*
+  gathers optimized rates and the associated per-site lnls from all
+  processes at the master.
+
+  Notice that, for instance, tr->patrat_basePtr already contains all rate
+  data of a single process.
+
+  Output: 
+  optRates_result  -- (only at master) a pointer to an array of optimized rates (corresponds to what used to be tr->patratStored)
+  lnls_result -- (only at master) a pointer to an array of per-site lnls that correspond to the newly proposed rates (used to be tr->lhs)
+ */ 
+static void gatherOptimizedRates(tree *tr, double **optRates_result, double **lnls_result)
+{
+  /* determine counts and displacement for data for each processor  */
+  int 
+    *numPerProc = (int *)NULL, 
+    *displPerProc = (int *)NULL; 
+  
+  calculateLengthAndDisplPerProcess(tr, &numPerProc, &displPerProc); 
+  
+  gatherDistributedArray( tr, (void**) optRates_result, tr->patrat_basePtr, MPI_DOUBLE, numPerProc, displPerProc); 
+  gatherDistributedArray(tr , (void**) lnls_result, tr->lhs_basePtr, MPI_DOUBLE, numPerProc, displPerProc); 
+
+  free(numPerProc);
+  free(displPerProc); 
+}
+
+
+/* 
+   The master creates rate categories and assigns the rate categories
+   to processes. Only executed by the master to ensure consistent
+   categorization.
+
+   This code used to be the first part of optimizeRateCategories()
+   and has only slightly been modified.
+
+   Input: 
+   patrat  -- a global array of  optimized rates (used to be tr->patratStored)  
+   lnls -- a global array of per-site lnls (used to be tr->lhs)
+
+   Output: 
+   rateCategory_result  -- a pointer to a global array of rate categories (used to be tr->rateCategory) 
+   
+   side effect:
+   tr->partitionData[i].perSiteRates gets computed in categorizePartition.
+
+ */ 
+static void categorizeTheRates(tree *tr, double *patrat, double *lnls, int maxCategories, int **rateCategory_result)
+{
+  int 
+    model, i; 
+
+  *rateCategory_result = (int*) calloc((size_t)tr->originalCrunchedLength, sizeof(int)); 
+  
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {    
+      double 
+	temp = 0.0; 
+      
+      int 
+	where = 1,
+	found = 0,
+	width = tr->partitionData[model].upper -  tr->partitionData[model].lower,
+	upper = tr->partitionData[model].upper,
+	lower = tr->partitionData[model].lower;
+	      
+      rateCategorize 
+	*rc = (rateCategorize *)malloc(sizeof(rateCategorize) * width);		 
+	      
+      for (i = 0; i < width; i++)
+	{
+	  rc[i].accumulatedSiteLikelihood = 0.0;
+	  rc[i].rate = 0.0;
+	}  
+	      
+      rc[0].accumulatedSiteLikelihood = lnls[lower];
+      rc[0].rate = patrat[lower];
+
+      for (i = lower + 1; i < upper; i++) 
+	{
+	  int k; 
+
+	  temp = patrat[i];
+	  found = 0;
+		  
+	  for(k = 0; k < where; k++)
+	    {
+	      if(temp == rc[k].rate || (fabs(temp - rc[k].rate) < 0.001))
+		{
+		  found = 1;						
+		  rc[k].accumulatedSiteLikelihood += lnls[i];	
+		  break;
+		}
+	    }
+		  
+	  if(!found)
+	    {	    
+	      rc[where].rate = temp;	    
+	      rc[where].accumulatedSiteLikelihood += lnls[i];	    
+	      where++;
+	    }
+	}
+	      
+      qsort(rc, where, sizeof(rateCategorize), catCompare);
+	      
+      if(where < maxCategories)
+	{
+	  tr->partitionData[model].numberOfCategories = where;
+	  categorizePartition(tr, rc, model, lower, upper, patrat, *rateCategory_result);
+	}
+      else
+	{
+	  tr->partitionData[model].numberOfCategories = maxCategories;	
+	  categorizePartition(tr, rc, model, lower, upper, patrat, *rateCategory_result);
+	}
+	      
+      free(rc);
+    } 
+}
+
+
+/* #define PRINT_RAT_CAT */
+
+/** 
+    informs all peer processes about 
+    * rateCategory
+    * numberOfCategories
+    * perSiteRates
+    of their data
+*/
+static void scatterProcessedRates(tree *tr, int *rateCategory)
+{
+  int 
+    i,
+    *countPerProc = (int *)NULL, 
+    *displPerProc = (int *)NULL,
+    *numCatPerPart = (int*) calloc((size_t)tr->NumberOfModels, sizeof(int)); 
+  
+  if(processID == 0)
+    {
+      for(i = 0; i < tr->NumberOfModels; ++i)
+	numCatPerPart[i] = tr->partitionData[i].numberOfCategories; 
+    }
+  MPI_Bcast(numCatPerPart, tr->NumberOfModels,  MPI_INT, 0,MPI_COMM_WORLD); 
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    tr->partitionData[i].numberOfCategories = numCatPerPart[i]; 
+  free(numCatPerPart); 
+    
+  /* for simplicity, broadcast all perSiteRates */
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    MPI_Bcast(tr->partitionData[i].perSiteRates, tr->maxCategories, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+
+  /* prepare for scattering */
+  calculateLengthAndDisplPerProcess(tr,  &countPerProc, &displPerProc); 
+
+
+#ifdef PRINT_RAT_CAT  
+  if(processID == 0)
+    {
+      printf("rates BEFORE: "); 
+      for(i = 0; i < tr->originalCrunchedLength; ++i)
+	printf("%d,", rateCategory[i]); 
+      printf("\n"); 
+    }
+#endif
+
+  scatterDistrbutedArray(tr, rateCategory, tr->rateCategory_basePtr, MPI_INT, countPerProc, displPerProc); 
+
+#ifdef PRINT_RAT_CAT
+  int len = getMyCharacterLength(tr); 
+  printf("basepointer AFTER: "); 
+  for(i = 0; i < len ; ++i)
+    printf("%d,", tr->rateCategory_basePtr[i]);
+  printf("\n"); 
+#endif
+  
+  free(countPerProc); 
+  free(displPerProc); 
+}
+
+
+
+/* backup for one partition */
+typedef struct 
+{
+  double *patrat; 
+  int *rateCategory; 
+  double *perSiteRates; 
+  int numberOfCategories; 
+} RateBackup; 
+
+
+/** 
+    This function creates a backup of all data relevant for CAT-rate
+    assignment.
+    
+    Previously, the backup info was stored in patratStored. Or
+    maybe it is the other way around and patrat was the backup, while
+    patratStored contained the actual optimized rates.
+
+    Output: 
+    resultPtr -- contains the backup  
+ */ 
+static void backupRates(tree *tr, RateBackup** resultPtr)
+{
+  int 
+    i,
+    numCat = tr->maxCategories;
+  
+  RateBackup
+    *backup;
+
+  *resultPtr = (RateBackup* ) calloc((size_t)tr->NumberOfModels, sizeof(RateBackup));
+  
+  backup = *resultPtr; 
+ 
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    {
+      pInfo
+	*partition = &(tr->partitionData[i]); 
+      RateBackup 
+	*bk = backup + i; 
+
+      bk->patrat = (double*)calloc((size_t)partition->width, sizeof(double)); 
+      bk->perSiteRates = (double*) calloc((size_t)numCat, sizeof(double)); 
+      bk->rateCategory = (int*) calloc((size_t)partition->width, sizeof(int)) ;
+      bk->numberOfCategories = partition->numberOfCategories; 
+
+      memcpy(bk->patrat, partition->patrat, sizeof(double) * (size_t)partition->width); 
+      memcpy(bk->perSiteRates, partition->perSiteRates, sizeof(double) * (size_t)numCat); 
+      memcpy(bk->rateCategory, partition->rateCategory, sizeof(int) * (size_t)partition->width); 
+    } 
+}
+
+
+static void restoreBackupRates(tree *tr , RateBackup *rb)
+{
+  int 
+    numCat = tr->maxCategories,
+    i; 
+  
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    {
+      pInfo 
+	*partition = &(tr->partitionData[i]); 
+      
+      RateBackup 
+	*bk = rb + i; 
+
+      partition->numberOfCategories = bk->numberOfCategories; 
+
+      memcpy(partition->patrat, bk->patrat, sizeof(double) * (size_t)partition->width); 
+      memcpy(partition->perSiteRates, bk->perSiteRates, sizeof(double) * (size_t)numCat); 
+      memcpy(partition->rateCategory, bk->rateCategory, sizeof(int) * (size_t)partition->width); 
+    }
+}
+
+
+
+static void deleteBackupRates(tree *tr, RateBackup** rbPtr)
+{
+  int i ;
+
+  for(i = 0; i< tr->NumberOfModels; ++i)
+    {
+      RateBackup 
+	*rb =  &((*rbPtr)[i]);
+      
+      free(rb->patrat);
+      free(rb->perSiteRates);
+      free(rb->rateCategory);
+    }
+
+  free(*rbPtr);
+  *rbPtr = (RateBackup *)NULL;  /* reset the caller's pointer so it cannot dangle */
+}
+
+
+static void optimizeRateCategories(tree *tr, int _maxCategories)
+{
+  assert(_maxCategories > 0);  
+
+  if(_maxCategories == 1)
+    return; 
+
+
+  double  
+    lower_spacing, 
+    upper_spacing,
+    initialLH = tr->likelihood,
+    *optRates = (double*)NULL,
+    *lnls = (double*)NULL; 
+
+  int
+    *rateCategory = (int *)NULL,
+    maxCategories = _maxCategories ; 
+
+  RateBackup 
+    *rateBackup = (RateBackup *)NULL; 
+
+  assert(isTip(tr->start->number, tr->mxtips));         
+
+  evaluateGeneric(tr, tr->start, TRUE);     
+
+  if(optimizeRateCategoryInvocations == 1)
+    {
+      lower_spacing = 0.5 / ((double)optimizeRateCategoryInvocations);
+      upper_spacing = 1.0 / ((double)optimizeRateCategoryInvocations);
+    }
+  else
+    {
+      lower_spacing = 0.05 / ((double)optimizeRateCategoryInvocations);
+      upper_spacing = 0.1 / ((double)optimizeRateCategoryInvocations);
+    }
+
+  if(lower_spacing < 0.001)
+    lower_spacing = 0.001;
+
+  if(upper_spacing < 0.001)
+    upper_spacing = 0.001;
+
+  optimizeRateCategoryInvocations++;
+
+  //store old rate category assignment 
+  backupRates(tr, &rateBackup); 
+
+  /* process specific: each process optimizes rates for data
+     assigned to it */
+  optRateCatPthreads(tr, lower_spacing, upper_spacing);
+  
+  /* gather rates and lnls at the master */
+  gatherOptimizedRates(tr, &optRates, &lnls); 
+  
+  /* master has all necessary info now and can categorize the rates */
+  if(processID == 0)
+    {
+      categorizeTheRates(tr, optRates, lnls, maxCategories, &rateCategory ); 
+  
+      /* only allocated at master  */
+      free(optRates); 
+      free(lnls); 
+    }
+
+  scatterProcessedRates(tr, rateCategory );
+  if(processID == 0)
+    free(rateCategory); 
+
+  /* every process now has new rates and a new category
+     assignment. However, we still have to scale the rates, such that
+     their weighted mean rate is 1.  */
+  updatePerSiteRates(tr); 
+
+  evaluateGeneric(tr, tr->start, TRUE);
+
+  if(tr->likelihood < initialLH)
+    {	 		  
+      restoreBackupRates(tr, rateBackup); 
+
+      //Andre I don't understand the comment below ... 
+      //can per-site rate scaling still be dis-abled in this version of the code?
+
+      /* 
+	 => Andre: I am afraid neither do I. Comparing it to the
+	 original code, I think everything should be fine: we restore
+	 the previous state and check, whether rates are scaled
+	 correctly.
+      */
+      
+      /* cannot do that any more here  */
+      checkPerSiteRates(tr); 
+      
+      evaluateGeneric(tr, tr->start, TRUE);	 
+
+      assert(initialLH == tr->likelihood);
+    }
+
+  deleteBackupRates(tr,&rateBackup); 
+}
+
+
+
+
+/*****************************************************************************************************/
+
+void resetBranches(tree *tr)
+{
+  nodeptr  p, q;
+  int  nodes, i;
+
+  nodes = tr->mxtips  +  3 * (tr->mxtips - 2);
+  p = tr->nodep[1];
+  while (nodes-- > 0) 
+    {   
+      for(i = 0; i < tr->numBranches; i++)
+	p->z[i] = defaultz;
+	
+      q = p->next;
+      while(q != p)
+	{	
+	  for(i = 0; i < tr->numBranches; i++)
+	    q->z[i] = defaultz;	    
+	  q = q->next;
+	}
+      p++;
+    }
+}
+
+
+static void printAAmatrix(tree *tr, double epsilon)
+{
+  if(AAisGTR(tr))
+    {
+      int model;
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  if(tr->partitionData[model].dataType == AA_DATA) 
+	    {
+	      char gtrFileName[1024];
+	      char epsilonStr[1024];
+	      FILE *gtrFile;
+	      double *rates = tr->partitionData[model].substRates;
+	      double *f     = tr->partitionData[model].frequencies;
+	      double q[20][20];
+	      int    r = 0;
+	      int i, j;
+
+	      assert(tr->partitionData[model].protModels == GTR);
+
+	      sprintf(epsilonStr, "%f", epsilon);
+
+	      strcpy(gtrFileName, workdir);
+	      strcat(gtrFileName, "RAxML_proteinGTRmodel.");
+	      strcat(gtrFileName, run_id);
+	      strcat(gtrFileName, "_");
+	      strcat(gtrFileName, epsilonStr);
+
+	      gtrFile = myfopen(gtrFileName, "wb");
+
+	      for(i = 0; i < 20; i++)
+		for(j = 0; j < 20; j++)
+		  q[i][j] = 0.0;
+
+	      for(i = 0; i < 19; i++)
+		for(j = i + 1; j < 20; j++)
+		  q[i][j] = rates[r++];
+
+	      for(i = 0; i < 20; i++)
+		for(j = 0; j <= i; j++)
+		  {
+		    if(i == j)
+		      q[i][j] = 0.0;
+		    else
+		      q[i][j] = q[j][i];
+		  }
+	   
+	      for(i = 0; i < 20; i++)
+		{
+		  for(j = 0; j < 20; j++)		
+		    fprintf(gtrFile, "%1.80f ", q[i][j]);
+		
+		  fprintf(gtrFile, "\n");
+		}
+	      for(i = 0; i < 20; i++)
+		fprintf(gtrFile, "%1.80f ", f[i]);
+	      fprintf(gtrFile, "\n");
+
+	      fclose(gtrFile);
+
+	      printBothOpen("\nPrinted intermediate AA substitution matrix to file %s\n\n", gtrFileName);
+	      
+	      break;
+	    }
+
+	}	  
+    }
+}
+
+
+
+
+static void optModel(tree *tr, int numProteinModels, int *bestIndex, double *bestScores, boolean empiricalFreqs)
+{
+  int
+    i,
+    model;
+    
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {      
+      bestIndex[model] = -1;
+      bestScores[model] = unlikely;
+    }
+      
+  for(i = 0; i < numProteinModels; i++)
+    {
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{	   
+	  if(tr->partitionData[model].protModels == AUTO)
+	    { 
+	      if(empiricalFreqs)
+		tr->partitionData[model].protFreqs = 0;
+	      else
+		tr->partitionData[model].protFreqs = 1;
+
+	      assert(!tr->partitionData[model].optimizeBaseFrequencies);
+
+	      tr->partitionData[model].autoProtModels = i;
+	      initReversibleGTR(tr, model);  
+	    }
+	}
+      
+      resetBranches(tr);
+      evaluateGeneric(tr, tr->start, TRUE);  
+      treeEvaluate(tr, 0.5);      
+      
+      //if(processID == 0)   
+      //printf("Subst Model %d Freqs: %s like %f %f\n", i, (empiricalFreqs == TRUE)?"empirical":"fixed", tr->likelihood, tr->perPartitionLH[0]);
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  if(tr->partitionData[model].protModels == AUTO)
+	    {	
+	      /*
+		if(processID == 0)
+		{
+		  
+		  int k;
+		  
+		  for(k = 0; k < 20; k++)
+		    printf("%f ", tr->partitionData[model].frequencies[k]);
+		  printf("\n");
+		}
+	      */
+	  
+	      if(tr->perPartitionLH[model] > bestScores[model])
+		{
+		  bestScores[model] = tr->perPartitionLH[model];
+		  bestIndex[model] = i;		      
+		}
+	    }	      
+	}       
+    }             
+}
+
+static void autoProtein(tree *tr, analdef *adef)
+{
+  int 
+    countAutos = 0,   
+    model;  
+
+  for(model = 0; model < tr->NumberOfModels; model++)	      
+    if(tr->partitionData[model].protModels == AUTO)
+      countAutos++;
+
+  if(countAutos > 0)
+    {
+      int        
+	numProteinModels = AUTO,
+	*bestIndex = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+	*oldIndex  = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+	*bestIndexEmpFreqs = (int*)malloc(sizeof(int) * tr->NumberOfModels);
+
+      boolean
+	*oldFreqs =  (boolean*)malloc(sizeof(boolean) * tr->NumberOfModels);
+
+      double
+	startLH,
+	*bestScores         = (double*)malloc(sizeof(double) * tr->NumberOfModels),
+	*bestScoresEmpFreqs = (double*)malloc(sizeof(double) * tr->NumberOfModels);
+
+      topolRELL_LIST 
+	*rl = (topolRELL_LIST *)malloc(sizeof(topolRELL_LIST));
+
+      char
+	*autoModels[4] = {"ML", "BIC", "AIC", "AICc"};
+
+      initTL(rl, tr, 1);
+      saveTL(rl, tr, 0);
+
+      evaluateGeneric(tr, tr->start, TRUE); 
+
+      startLH = tr->likelihood;
+      
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{
+	  oldIndex[model] = tr->partitionData[model].autoProtModels;
+	  oldFreqs[model] = tr->partitionData[model].protFreqs;
+	}
+            
+      optModel(tr, numProteinModels, bestIndex, bestScores, FALSE);
+      
+      optModel(tr, numProteinModels, bestIndexEmpFreqs, bestScoresEmpFreqs, TRUE);      
+     
+      printBothOpen("Automatic protein model assignment algorithm using %s criterion:\n\n", autoModels[tr->autoProteinSelectionType]);
+
+      for(model = 0; model < tr->NumberOfModels; model++)
+	{	   
+	  if(tr->partitionData[model].protModels == AUTO)
+	    {	     	      	       
+	      int 
+		bestIndexFixed = bestIndex[model],
+		bestIndexEmp = bestIndexEmpFreqs[model];
+	      
+	      double
+		bestLhFixed = bestScores[model],
+		bestLhEmp   = bestScoresEmpFreqs[model],
+		samples = 0.0,		
+		freeParamsFixed = 0.0,
+		freeParamsEmp = 0.0;	      	  	      
+	      
+	      samples = tr->partitionWeights[model]; 
+	      //printf("Sample size %f\n", samples);
+	      assert(samples != -1.0 && samples > 0.0);
+	     
+	     
+
+	      //we always deal with comprehensive trees in ExaML 
+	      assert(tr->ntips == tr->mxtips);
+	      freeParamsFixed = freeParamsEmp = (2 * tr->ntips - 3);
+	      freeParamsEmp += 19.0;
+
+	      switch(tr->rateHetModel)
+		{
+		case CAT:
+		  freeParamsFixed += (double)tr->partitionData[model].numberOfCategories;
+		  freeParamsEmp += (double)tr->partitionData[model].numberOfCategories;
+		  break;
+		case GAMMA: 
+		  freeParamsFixed += 1.0;
+		  freeParamsEmp += 1.0;
+		  break;
+		case GAMMA_I:
+		  freeParamsFixed += 2.0;
+		  freeParamsEmp += 2.0;
+		  break;
+		default:
+		  assert(0);
+		}
+		    
+	      switch(tr->autoProteinSelectionType)
+		{
+		case AUTO_ML:	
+		  if(bestLhFixed > bestLhEmp)
+		    {
+		      tr->partitionData[model].autoProtModels = bestIndexFixed;
+		      tr->partitionData[model].protFreqs = 1;
+		    }
+		  else
+		    {
+		      tr->partitionData[model].autoProtModels = bestIndexEmp;
+		      tr->partitionData[model].protFreqs = 0;
+		    }
+		  break;
+		case AUTO_BIC:
+		  { 
+		    //BIC: -2 * lnL + k * ln(n)
+		    double
+		      bicFixed = -2.0 * bestLhFixed + freeParamsFixed * log(samples),
+		      bicEmp   = -2.0 * bestLhEmp   + freeParamsEmp   * log(samples);
+
+		    if(bicFixed < bicEmp)
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexFixed;
+			tr->partitionData[model].protFreqs = 1;
+		      }
+		    else
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexEmp;
+			tr->partitionData[model].protFreqs = 0;
+		      }		   
+		  }
+		  break;
+		case AUTO_AIC:
+		  {
+		    //AIC: 2 * (k - lnL)
+		    double
+		      aicFixed = 2.0 * (freeParamsFixed - bestLhFixed),
+		      aicEmp   = 2.0 * (freeParamsEmp   - bestLhEmp);
+		    
+		    if(aicFixed < aicEmp)
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexFixed;
+			tr->partitionData[model].protFreqs = 1;
+		      }
+		    else
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexEmp;
+			tr->partitionData[model].protFreqs = 0;
+		      }	
+		  }
+		  break;
+		case AUTO_AICC:
+		  { 
+		    //AICc: AIC + (2 * k * (k + 1))/(n - k - 1)
+		    double
+		      aiccFixed, 
+		      aiccEmp;   
+
+		    /* 
+		     * Even though samples and freeParamsFixed are fp variables, they are actually integers.
+		     * That's why we are comparing with a 0.5 threshold.
+		     */
+		    
+		    if(fabs(samples - freeParamsFixed - 1.0) < 0.5) 		      
+		      aiccFixed = 0.0;
+		    else 
+		      aiccFixed = (2.0 * (freeParamsFixed - bestLhFixed)) + ((2.0 * freeParamsFixed * (freeParamsFixed + 1.0)) / (samples - freeParamsFixed - 1.0));
+
+		    if(fabs(samples - freeParamsEmp - 1.0) < 0.5)
+		      aiccEmp = 0.0;
+		    else 
+		      aiccEmp   = (2.0 * (freeParamsEmp   - bestLhEmp))   + ((2.0 * freeParamsEmp   * (freeParamsEmp + 1.0))   / (samples - freeParamsEmp   - 1.0));
+
+		    if(aiccFixed < aiccEmp)
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexFixed;
+			tr->partitionData[model].protFreqs = 1;
+		      }
+		    else
+		      {
+			tr->partitionData[model].autoProtModels = bestIndexEmp;
+			tr->partitionData[model].protFreqs = 0;
+		      }	
+		  }
+		  break;
+		default:
+		  assert(0);
+		}
+
+	      initReversibleGTR(tr, model);  
+	      printBothOpen("\tPartition: %d best-scoring AA model: %s likelihood %f with %s base frequencies\n", 
+			    model, protModels[tr->partitionData[model].autoProtModels],  
+			    (tr->partitionData[model].protFreqs == 1)?bestLhFixed:bestLhEmp, 
+			    (tr->partitionData[model].protFreqs == 1)?"fixed":"empirical");
+		  
+	    }	 
+	}
+
+      printBothOpen("\n\n");
+          
+      resetBranches(tr);
+      evaluateGeneric(tr, tr->start, TRUE); 
+      treeEvaluate(tr, 2.0);    
+
+      //printf("exit %f\n", tr->likelihood);
+      
+      if(tr->likelihood < startLH)
+	{	
+	  for(model = 0; model < tr->NumberOfModels; model++)
+	    {
+	      if(tr->partitionData[model].protModels == AUTO)
+		{
+		  tr->partitionData[model].autoProtModels = oldIndex[model];
+		  tr->partitionData[model].protFreqs = oldFreqs[model] ;
+		  initReversibleGTR(tr, model);
+		}
+	    }
+	  
+
+	  restoreTL(rl, tr, 0);	
+	  evaluateGeneric(tr, tr->start, TRUE);              
+	}
+      
+      assert(tr->likelihood >= startLH);
+      
+      freeTL(rl);   
+      free(rl); 
+      
+      free(oldIndex);
+      free(bestIndex);
+      free(bestScores);
+      free(bestIndexEmpFreqs);
+      free(bestScoresEmpFreqs);
+      free(oldFreqs);
+    }
+}
+
+
+static void checkMatrixSymnmetriesAndLinkage(tree *tr, linkageList *ll)
+{
+  int 
+    i;
+  
+  for(i = 0; i < ll->entries; i++)
+    {
+      int
+	partitions = ll->ld[i].partitions;
+
+      if(partitions > 1)
+	{
+	  int
+	    k, 
+	    reference = ll->ld[i].partitionList[0];
+
+	  for(k = 1; k < partitions; k++)
+	    {
+	      int 
+		index = ll->ld[i].partitionList[k];
+
+	      int
+		states = tr->partitionData[index].states,
+		rates = ((states * states - states) / 2);
+	      
+	      if(tr->partitionData[reference].nonGTR != tr->partitionData[index].nonGTR)
+		assert(0);
+	      
+	      if(tr->partitionData[reference].nonGTR)
+		{
+		  int 
+		    j;
+		  
+		  for(j = 0; j < rates; j++)
+		    {
+		      if(tr->partitionData[reference].symmetryVector[j] != tr->partitionData[index].symmetryVector[j])
+			assert(0);
+		    }
+		}
+	    }	    
+	}
+    }
+}
+
+
+static void checkTolerance(double l1, double l2)
+{
+  if(l1 < l2)
+    {   
+      double 
+	tolerance = fabs(MAX(l1, l2) * 0.000000000001);
+
+      if(fabs(l1 - l2) > MIN(0.1, tolerance))
+	{
+	  printf("Likelihood problem in model optimization l1: %1.40f l2: %1.40f tolerance: %1.40f\n", l1, l2, tolerance);
+	  assert(0);	
+	}
+    }
+}
+
+void modOpt(tree *tr, double likelihoodEpsilon, analdef *adef, int treeIteration)
+{ 
+  int 
+    i, 
+    catOpt = 0,
+    *unlinked = (int *)malloc(sizeof(int) * tr->NumberOfModels);
+  
+  double 
+    inputLikelihood,
+    currentLikelihood,
+    modelEpsilon = 0.0001;
+  
+  linkageList 
+    *alphaList,
+    *rateList,
+    *freqList;      
+
+  for(i = 0; i < tr->NumberOfModels; i++)
+    unlinked[i] = i;
+
+  //test code for library 
+  if(0)
+    {
+      //assuming that we have three partitions for testing here 
+
+      alphaList = initLinkageListString("0,1,2", tr);
+      rateList  = initLinkageListString("0,1,1", tr);
+    
+      init_Q_MatrixSymmetries("0,1,2,3,4,5", tr, 0);
+      init_Q_MatrixSymmetries("0,1,2,3,4,4", tr, 1);
+      init_Q_MatrixSymmetries("0,1,1,2,3,4", tr, 2);
+      
+      //function that checks that partitions that have linked Q matrices as in our example above
+      //will not have different configurations of the Q matrix as set by the init_Q_MatrixSymmetries() function
+      //e.g., one would have HKY and one would have GTR, while the user claims that they are linked
+      //in our example, the Q matrices of partitions 1 and 2 are linked 
+      //but we set different matrix symmetries via 
+      // init_Q_MatrixSymmetries("0,1,2,3,4,4", tr, 1);
+      // and
+      // init_Q_MatrixSymmetries("0,1,1,2,3,4", tr, 2);
+      //
+      //the function just lets assertions fail for the time being .....
+
+      checkMatrixSymnmetriesAndLinkage(tr, rateList);
+    }
+  else
+    {
+      alphaList = initLinkageList(unlinked, tr);
+      freqList  = initLinkageList(unlinked, tr);
+      rateList  = initLinkageListGTR(tr);
+    }
+   
+  tr->start = tr->nodep[1];
+                 
+  if(adef->useCheckpoint && adef->mode == TREE_EVALUATION)
+    {
+      assert(ckp.state == MOD_OPT);
+          	
+      catOpt = ckp.catOpt;             
+    }
+
+  inputLikelihood = tr->likelihood;
+
+  evaluateGeneric(tr, tr->start, TRUE); 
+
+ 
+
+  assert(inputLikelihood == tr->likelihood);
+
+  do
+    {    
+      if(adef->mode == TREE_EVALUATION)
+	{
+	  ckp.state = MOD_OPT;
+	  
+	  ckp.catOpt = catOpt;
+
+	  ckp.treeIteration = treeIteration;
+	  
+	  writeCheckpoint(tr, adef);
+	}   
+      
+      currentLikelihood = tr->likelihood;     
+          
+#ifdef _DEBUG_MOD_OPT
+      printf("start: %f\n", currentLikelihood);
+#endif
+
+      optRatesGeneric(tr, modelEpsilon, rateList);
+                        
+      evaluateGeneric(tr, tr->start, TRUE);    
+
+#ifdef _DEBUG_MOD_OPT
+      printf("after rates %f\n", tr->likelihood);
+#endif                                                  
+
+      autoProtein(tr, adef);
+
+      treeEvaluate(tr, 0.0625);    
+      
+#ifdef _DEBUG_MOD_OPT
+      evaluateGeneric(tr, tr->start, TRUE); 
+      printf("after br-len 1 %f\n", tr->likelihood);
+#endif     
+
+      evaluateGeneric(tr, tr->start, TRUE);
+
+      optBaseFreqs(tr, modelEpsilon, freqList);
+      
+      evaluateGeneric(tr, tr->start, TRUE);
+      
+      treeEvaluate(tr, 0.0625);
+
+#ifdef _DEBUG_MOD_OPT
+      evaluateGeneric(tr, tr->start, TRUE); 
+      printf("after optBaseFreqs 1 %f\n", tr->likelihood);
+#endif 
+
+      switch(tr->rateHetModel)
+	{
+	case GAMMA:      	  
+	  optAlphasGeneric(tr, modelEpsilon, alphaList); 
+	  
+	  evaluateGeneric(tr, tr->start, TRUE); 
+	 	 
+#ifdef _DEBUG_MOD_OPT	 
+	  printf("after alphas %f\n", tr->likelihood);
+#endif	  
+	  treeEvaluate(tr, 0.1);	  	 
+
+#ifdef _DEBUG_MOD_OPT
+	  evaluateGeneric(tr, tr->start, TRUE); 
+	  printf("after br-len 2 %f\n", tr->likelihood);
+#endif
+	 
+	  break;
+	case CAT:
+	  if(catOpt < 3)
+	    {	      	     	     	     
+	      evaluateGeneric(tr, tr->start, TRUE);
+	      optimizeRateCategories(tr, tr->categories);	      	     	      	      	     
+
+#ifdef _DEBUG_MOD_OPT
+	      evaluateGeneric(tr, tr->start, TRUE); 
+	      printf("after cat-opt %f\n", tr->likelihood);
+#endif
+
+	      catOpt++;
+	    }
+	  break;	  
+	default:
+	  assert(0);
+	}                                
+
+      checkTolerance(tr->likelihood, currentLikelihood);
+
+      /*
+	if(tr->likelihood < currentLikelihood)
+	printf("%f %f\n", tr->likelihood, currentLikelihood);
+	assert(tr->likelihood >= currentLikelihood);
+      */
+      
+      printAAmatrix(tr, fabs(currentLikelihood - tr->likelihood));            
+    }
+  while(fabs(currentLikelihood - tr->likelihood) > likelihoodEpsilon);  
+  
+  free(unlinked);
+  freeLinkageList(freqList);
+  freeLinkageList(alphaList);
+  freeLinkageList(rateList);
+}
+
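Note on the model selection in autoProtein() (optimizeModel.c, above): the AUTO_AIC and AUTO_AICC branches score the best fixed-frequency and the best empirical-frequency protein model and keep whichever has the lower score. The following is a minimal standalone sketch of that arithmetic, not code from this commit. The helper names aic, aicc and preferFixed are made up for illustration. It mirrors the convention in the diff of using 0.0 as the AICc score when n - k - 1 is zero.

```c
#include <assert.h>
#include <math.h>

/* AIC = 2 * (k - lnL), with k free parameters and log likelihood lnL. */
static double aic(double k, double lnL)
{
  return 2.0 * (k - lnL);
}

/* AICc = AIC + (2 * k * (k + 1)) / (n - k - 1) for n samples.
   As in the diff, a zero denominator yields a score of 0.0 instead of a division by zero. */
static double aicc(double k, double lnL, double n)
{
  double
    denom = n - k - 1.0;

  /* k and n hold integer values, so a 0.5 threshold reliably detects denom == 0 */
  if(fabs(denom) < 0.5)
    return 0.0;

  return aic(k, lnL) + (2.0 * k * (k + 1.0)) / denom;
}

/* Returns 1 if the fixed-frequency model wins (lower score is better), 0 otherwise. */
static int preferFixed(double scoreFixed, double scoreEmp)
{
  assert(!isnan(scoreFixed) && !isnan(scoreEmp));
  return scoreFixed < scoreEmp;
}
```

With k = 3, lnL = -100 and n = 10 the correction term is 2 * 3 * 4 / 6 = 4, so the AICc is 4 above the AIC of 206. Note that the 0.0 fallback makes a degenerate model look better than any model with a positive score, so callers may want to treat that case separately.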
diff --git a/examl/partitionAssignment.c b/examl/partitionAssignment.c
new file mode 100644
index 0000000..3432ae2
--- /dev/null
+++ b/examl/partitionAssignment.c
@@ -0,0 +1,693 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+
+#include "partitionAssignment.h" 
+
+extern int processID; 
+
+
+void initializePartitionAssignment( PartitionAssignment **pAssPtr, pInfo **partitions, int numPart, int numProc)
+{
+  int 
+    i; 
+  
+  PartitionAssignment 
+    *pAss;
+
+  *pAssPtr = (PartitionAssignment*)calloc(1, sizeof(PartitionAssignment)); 
+  
+  pAss = *pAssPtr; 
+
+  pAss->numProc = numProc; 
+  pAss->numPartitions = numPart; 
+
+  pAss->partitions = (Partition *)calloc((size_t)pAss->numPartitions, sizeof(Partition)); 
+  
+  for(i = 0; i < numPart; ++i)
+    {
+      Partition 
+	*p = pAss->partitions + i; 
+      p->id = i; 
+      p->width = partitions[i]->upper - partitions[i]->lower;
+      p->type = partitions[i]->states;
+    }
+  
+  pAss->assignPerProc = (Assignment **)calloc((size_t)pAss->numProc , sizeof(Assignment*)); 
+  pAss->numAssignPerProc = (int *)calloc((size_t)pAss->numProc, sizeof(int)); 
+}
+
+
+void deletePartitionAssignment(PartitionAssignment *pAss)
+{
+  int 
+    i; 
+
+  free(pAss->partitions); 
+  for(i = 0; i < pAss->numProc; ++i)
+    free(pAss->assignPerProc[i]);
+  free(pAss->assignPerProc); 
+  free(pAss);
+}
+
+
+static int partSort(const void *a, const void *b ) 
+{
+  return  ((const Partition*)a)->width - ((const Partition*) b)->width ; 
+}
+
+
+/** 
+    helper function that assigns part of a partition to a process (only numElem characters are assigned)
+ */ 
+static void assignPartitionPartial(PartitionAssignment *pa, Partition* p, int procId, int *numAssigned, size_t *sizeAssigned, size_t offset, size_t numElem)
+{
+  Assignment 
+    *a;
+  
+  int 
+    newArrayLen; 
+ 
+  ++numAssigned[procId]; 
+  ++pa->numAssignPerProc[procId]; 
+
+  newArrayLen = pa->numAssignPerProc[procId]; 
+
+  pa->assignPerProc[procId] = (Assignment*)realloc(pa->assignPerProc[procId], newArrayLen * sizeof(Assignment)); 
+  
+  a = pa->assignPerProc[procId] + (newArrayLen-1); 
+
+  a->offset = offset; 
+  a->partId = p->id; 
+  a->width = numElem; 
+  sizeAssigned[procId] += numElem; 
+}
+
+
+/** 
+    helper function that executes the assignment of a full partition 
+ */ 
+static void assignPartitionFull(PartitionAssignment* pa, Partition* p, int procId, int *numAssigned, size_t *sizeAssigned)
+{
+  assignPartitionPartial(pa, p, procId, numAssigned, sizeAssigned, 0, p->width);
+} 
+
+
+/** 
+    Request a process to which a part of the partition should
+    be assigned. 
+
+    At this stage there are processes that have one more partition
+    assignment than others. These have been categorized into two
+    stacks (high and low).  For each stack, we have an iter variable
+    (of type int**) that gets decremented if an element is removed
+    from the stack, and the start of the array, which allows us to
+    determine when the stack is empty.
+
+    popAndYield tries to satisfy the request for a process with more
+    or fewer assignments (see wantLow), but will return any process if
+    the request cannot be fulfilled.
+
+    If both stacks are empty, it will return -1.
+ */ 
+static int popAndYield(int **procsHighIter, int *procsHighStart, int **procsLowIter, int *procsLowStart, boolean wantLow)
+{
+  boolean 
+    fromHigh = FALSE, 
+    fromLow = FALSE;
+
+  int 
+    result = -1; 
+  
+  if(wantLow)
+    {
+      if(*procsLowIter - procsLowStart > 0) 
+	fromLow = TRUE; 
+      else 
+	if(*procsHighIter - procsHighStart > 0)
+	  fromHigh = TRUE; 
+    }
+  else 
+    {
+      if(*procsHighIter - procsHighStart > 0 )
+	fromHigh = TRUE; 
+      else 
+	if(*procsLowIter - procsLowStart > 0)
+	  fromLow = TRUE; 
+    }
+
+  if(fromHigh) 
+    {
+      result = **procsHighIter;
+      --(*procsHighIter); 
+    }
+  else 
+    if(fromLow)
+      {
+	result = **procsLowIter; 
+	--(*procsLowIter); 
+      }
+
+  return result; 
+}
+
+
+
+static void assignThesePartitions(PartitionAssignment* pa, Partition *partitions, int numCur)
+{
+  int
+    proc, 
+    remainder,			/* number of processes that receive 1 character less than others */
+    i,
+    numFull = 0,		/* number of processes that cannot take any more   */
+    numLow = 0,			/* assignment count of the first process that could not take a full partition */
+    *numAssigned = (int *)NULL,	/* number of assignments (full or partial partitions) per process */
+    *procsHighIter = (int *)NULL, /* stack of processes that have one assignment more than others  */
+    *procsLowIter = (int *)NULL,  /* stack of processes that have one assignment less than others  */
+    *procsLowStart = (int *)NULL, /* start of stack */
+    *procsHighStart = (int *)NULL, /* start of stack */
+    highProc,			   /* id of a process that potentially has more partitions assigned than others   */
+    lowProc;			   /* id of a process that potentially has less partitions assigned than others   */
+  
+  size_t
+    totalElems = 0,
+    *sizeAssigned = (size_t *)NULL,
+    toAdd,
+    cap; 			/* defines a cap: once we have
+				   assigned this many characters, the
+				   remaining processes will get one
+				   character less */
+  
+  boolean 
+    iterate = TRUE; 
+  
+  Partition 
+    *partIter = partitions,
+    *partEnd = partitions +numCur; 
+
+  /* The following implements Kassian's algorithm. Originally, his
+     algorithm consists of 5 phases */
+
+  /*
+    Sorts partitions according to their size. According to Kassian's
+    algorithm, this step is NOT obligatory. However, if we do it, then
+    phase 3 (called "top-up" phase) is not necessary (this has been
+    clarified with Kassian).
+  */ 
+  qsort(partitions, numCur, sizeof(Partition), partSort);
+
+  for(i = 0; i < numCur; ++i)
+    totalElems += partitions[i].width; 
+
+  cap = ceil( (double)totalElems / (double)pa->numProc ); 
+  
+  remainder = cap * pa->numProc - totalElems; 
+  
+  assert(remainder >= 0 ); 
+  
+  numAssigned = (int *)calloc((size_t)pa->numProc, sizeof(int));
+  
+  sizeAssigned = (size_t *)calloc((size_t)pa->numProc, sizeof(size_t)); 
+  
+  /* phase 2: initial distribution of full partitions to processes. We
+     distribute full partitions until for the first time, we cannot
+     assign an entire partition any more, because this would exceed
+     the number of characters we want to assign to this process */
+  while(iterate)
+    {           
+      for(proc = 0; proc < pa->numProc;++proc)
+	{
+	  if(partIter < partEnd && sizeAssigned[proc] + partIter->width <= cap)
+	    {
+	      assignPartitionFull(pa, partIter, proc, numAssigned, sizeAssigned); 
+	      
+	      if(sizeAssigned[proc] == cap)
+		{
+		  ++numFull;
+		  if(numFull == pa->numProc - remainder)
+		    --cap; 
+		}
+	      
+	      ++partIter; 
+	    }
+	  else 
+	    {
+	      numLow = numAssigned[proc]; 
+	      iterate = FALSE; 
+	      break; 
+	    }
+	}
+    }
+
+  /* phase 3: top-up => not necessary because of previous sorting */
+
+  
+  /* 
+     phase 4: stick breaking
+
+     Here we partially assign the remaining partitions to processes
+     until every process has as many characters as it can take.
+  */
+
+  
+  /* first categorize processes into two stacks, depending on whether
+     they have gotten one more partition than others */
+  procsHighIter =  (int*)calloc((size_t)pa->numProc + 1, sizeof(int)); 
+  procsLowIter = (int *)calloc((size_t)pa->numProc + 1, sizeof(int)); 
+  procsLowStart = procsLowIter; 
+  procsHighStart = procsHighIter;
+    
+
+  numFull = 0; 
+  
+  for(proc = 0; proc < pa->numProc; ++proc)
+    {
+      if(sizeAssigned[proc] < cap)
+ 	{
+ 	  if(numAssigned[proc] == numLow)
+ 	    {
+ 	      ++procsLowIter; 
+ 	      *procsLowIter = proc; 
+ 	    }
+ 	  else  
+ 	    {
+ 	      ++procsHighIter;
+ 	      *procsHighIter = proc; 
+ 	    }
+ 	}
+      else 	
+	++numFull; 
+    }
+  
+  assert((procsHighIter - procsHighStart) + (procsLowIter - procsLowStart) + numFull == pa->numProc); 
+
+  toAdd = (partIter < partEnd) ? partIter->width : 0  ; 
+  highProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE); 
+  lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, TRUE); 
+
+
+  /* 
+     now assign as long as there is something to assign. Once both
+     stacks are empty, popAndYield yields -1. This then breaks the
+     loop condition here.
+   */ 
+  while(  ! (highProc == -1 && lowProc == -1 
+	     && (procsHighIter - procsHighStart <= 0 ) && (procsLowIter - procsLowStart <= 0 )) )
+    {
+      /* try to finish the assignments for a process that has many partitions  */
+      if(highProc != -1 && sizeAssigned[highProc] + toAdd >= cap)
+	{ 
+	  size_t
+	    toTransfer = cap - sizeAssigned[highProc],
+	    offset = partIter->width - toAdd; 
+	  
+	  assignPartitionPartial( pa, partIter, highProc, numAssigned, sizeAssigned, offset, toTransfer);
+	  
+	  toAdd -= toTransfer; 
+
+	  if(toAdd == 0 && partIter < partEnd ) 
+	    {
+	      ++partIter; 
+	      toAdd = partIter < partEnd ?  partIter->width : 0; 
+	    }
+	  ++numFull; 
+	  
+	  if(numFull == pa->numProc - remainder)
+	    --cap; 
+
+	  highProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE); 
+	}
+      else 
+	if(lowProc != -1)
+	  {
+	    /* assign the entire remaining portion to a process that
+	       still has fewer partitions */
+	    if(sizeAssigned[lowProc] + toAdd < cap)
+	      {
+		size_t
+		  offset = partIter->width - toAdd; 
+		
+		assignPartitionPartial(pa, partIter, lowProc, numAssigned, sizeAssigned, offset, toAdd); 
+		
+		if(highProc != -1 )
+		  {
+		    ++procsHighIter; 
+		    *procsHighIter = highProc; 
+		  }
+		
+		highProc = lowProc; 
+		
+		toAdd = 0; 
+		
+		if( partIter != partEnd )
+		  {
+		    ++partIter;
+		    toAdd = partIter < partEnd ? partIter->width : 0  ;
+		  }
+		
+		lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, TRUE); 
+	      }
+	    else 
+	      { 
+		/* assign as much as possible to a process with fewer
+		   partitions (the rest probably needs to be assigned
+		   to the next process) */
+
+		size_t
+		  toTransfer = cap - sizeAssigned[lowProc],
+		  offset = partIter->width - toAdd; 
+		
+		assignPartitionPartial(pa, partIter, lowProc, numAssigned, sizeAssigned, offset, toTransfer); 
+		
+		toAdd -= toTransfer;
+		
+		if(toAdd == 0 && partIter < partEnd)
+		  {
+		    ++partIter; 
+		    toAdd = partIter < partEnd ? partIter->width : 0; 
+		  }
+		
+		++numFull;
+		if(numFull == pa->numProc - remainder)
+		  --cap ;
+		
+		lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE); 
+	      }
+	  }
+	else 
+	  {
+	    /* should not occur, but I am not entirely happy with
+	       this assert. */
+	    assert(0);
+	  }
+    }
+
+  assert(toAdd == 0 ); 
+  assert(partIter == partEnd); 
+  
+  free(numAssigned); 
+  free(sizeAssigned); 
+}
+
+/** 
+    Assigns all partitions. Notice that for each data type (currently
+    BIN, DNA and AA), we execute the algorithm separately. Thus, in
+    the worst case imbalances for different data types could hit the same
+    processes (but this is probably not worth bothering about). 
+ */ 
+void assign(PartitionAssignment *pa)
+{
+  int 
+    partitionsHandled = 0,
+    curType = -1,
+    j, 
+    i; 
+
+  /* 
+     only handling 3 types (BIN, DNA, AA) at the moment. Please adapt,
+     when the number of types increases. 
+   */ 
+  int
+    types[3] = { 2, 4, 20 };
+
+  for(j = 0; j < 3 ; ++j)
+    {
+      size_t 
+	cnt; 
+      
+      Partition 
+	*curPartitions = (Partition *)NULL; 
+
+      curType = types[j]; 
+      
+      /* count number of type */
+      cnt = 0; 
+      for(i = 0; i < pa->numPartitions; ++i)      
+	{
+	  if(pa->partitions[i].type == curType)
+	    ++cnt; 
+	}
+
+      if(cnt == 0 )
+	continue; 
+      
+      curPartitions = (Partition*)calloc((size_t)cnt, sizeof(Partition)); 
+      
+      cnt = 0; 
+      for(i = 0; i< pa->numPartitions; ++i)
+	{
+	  if(pa->partitions[i].type == curType)
+	    curPartitions[cnt++] = pa->partitions[i]; 
+	}
+
+      assignThesePartitions(pa, curPartitions, cnt);
+      free(curPartitions); 
+      
+      partitionsHandled += cnt; 
+    }
+
+  assert(partitionsHandled ==  pa->numPartitions); 
+}
+
+
+
+
+void printAssignment(Assignment a, int procid)
+{
+  printf("p: %d\t(%lu,%lu) -> proc %d\n", a.partId, a.offset, a.width , procid); 
+} 
+
+
+void printAssignments(PartitionAssignment *pa)
+{
+  int i,j; 
+  printf("proc\toffset\tlength\tpart\n"); 
+  for(i = 0; i < pa->numProc; ++i)
+    {
+      for(j = 0; j < pa->numAssignPerProc[i] ; ++j)
+	{
+	  Assignment a = pa->assignPerProc[i][j]; 
+	  printf("%d\t%lu\t%lu\t%d\n", i,a.offset, a.width, a.partId); 
+	}
+    }
+}
+
+
+void printLoad(PartitionAssignment *pa)
+{
+  int 
+    i,
+    j, 
+    *numsPerProc = (int *)calloc((size_t)pa->numProc, sizeof(int)); 
+  
+  size_t
+    *sitesPerProc = (size_t *)calloc((size_t)pa->numProc, sizeof(size_t)); 
+
+  for(i = 0; i< pa->numProc; ++i)
+    {
+      for(j = 0; j < pa->numAssignPerProc[i]; ++j)
+	{
+	  Assignment a = pa->assignPerProc[i][j]; 
+	  sitesPerProc[i] += a.width; 
+	  ++numsPerProc[i]; 
+	}
+    }
+
+  printf("#proc\t#part\t#sites\n"); 
+  for( i = 0; i < pa->numProc ; ++i)
+    printf("%d\t%d\t%lu\n", i, numsPerProc[i], sitesPerProc[i]); 
+
+  free(numsPerProc);
+  free(sitesPerProc);
+}
+
+
+
+/**
+   allocates global arrays for CAT and sets the pointers in each pInfo
+   instance.
+ */ 
+static void setupBasePointersInTree(tree *tr)
+{
+  size_t 
+    len = 0;
+  int
+    i; 
+  
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    len += tr->partitionData[i].width ; 
+
+  tr->patrat_basePtr = (double*) calloc((size_t)len, sizeof(double));
+  tr->rateCategory_basePtr = (int*) calloc((size_t)len, sizeof(int)); 
+  tr->lhs_basePtr = (double*) calloc((size_t)len, sizeof(double)); 
+
+  len = 0; 
+  for(i = 0; i < tr->NumberOfModels; ++i)
+    {
+      if(tr->partitionData[i].width > 0)
+	{
+	  tr->partitionData[i].rateCategory = tr->rateCategory_basePtr + len; 
+	  tr->partitionData[i].patrat = tr->patrat_basePtr + len; 
+	  tr->partitionData[i].lhs = tr->lhs_basePtr + len ; 
+	}
+      else 
+	{
+	  tr->partitionData[i].rateCategory = (int *)NULL; 
+	  tr->partitionData[i].patrat = (double*)NULL; 
+	  tr->partitionData[i].lhs = (double*)NULL;
+	}
+
+      len += tr->partitionData[i].width; 
+    }
+}
+
+
+
+static int sortById(const void *a, const void *b)
+{
+  return ((Assign*) a)->partitionId -  ((Assign*) b)->partitionId ;
+}
+
+
+
+void copyAssignmentInfoToTree(PartitionAssignment *pa, tree *tr)
+{
+  int 
+    i,
+    numAssign = 0; 
+  
+  Assign 
+    *assIter;
+  
+  for(i = 0; i < pa->numProc; ++i)
+    numAssign += pa->numAssignPerProc[i]; 
+  
+  /* copy the partition assignment to the tree structure */
+
+  tr->numAssignments  = numAssign; 
+  tr->partAssigns = (Assign *)calloc((size_t)numAssign, sizeof(Assign)); 
+
+  assIter = tr->partAssigns; 
+  
+  for(i = 0; i < pa->numProc; ++i)
+    {
+      int j; 
+      for( j = 0 ; j < pa->numAssignPerProc[i]; ++j)
+	{
+	  Assignment *ass = pa->assignPerProc[i] + j; 
+	  assIter->procId = i; 
+	  assIter->offset = ass->offset; 
+	  assIter->width = ass->width; 
+	  assIter->partitionId = ass->partId; 
+	  ++assIter; 
+	}
+    }
+
+  /* 
+     the sorting makes it easier to deal with the assignments later in
+     case of a gather/scatter at the master. Thus, we do not need to
+     jump around in the array that is obtained from or sent to a process,
+     because we are sure that the data is ordered the same way as we
+     obtained it.
+   */ 
+  qsort(tr->partAssigns, tr->numAssignments, sizeof(Assign), sortById);
+  
+  if(tr->rateHetModel == CAT)
+    setupBasePointersInTree( tr);
+}
+
+#ifdef _USE_OMP
+void copyThreadAssignmentInfoToTree(PartitionAssignment *pa, tree *tr)
+{
+  int
+    i, j;
+
+  /* we want to know max number of partitions assigned to a single thread -> mainly for memory allocation */
+
+  int
+    *numsPerProc = (int *)calloc((size_t)pa->numProc, sizeof(int)),
+    *numsPerPart = (int *)calloc((size_t)pa->numPartitions, sizeof(int));
+
+  for(i = 0; i< pa->numProc; ++i)
+    {
+      for(j = 0; j < pa->numAssignPerProc[i]; ++j)
+	{
+	  Assignment *a = &pa->assignPerProc[i][j];
+	  ++numsPerProc[i];
+	  ++numsPerPart[a->partId];
+	}
+    }
+
+  int
+    pmax = 0;
+
+  for(i = 1; i< pa->numProc; ++i)
+    {
+      if (numsPerProc[i] > numsPerProc[pmax])
+	pmax = i;
+    }
+
+  /* save max partition count to the tree structure */
+  tr->maxModelsPerThread = numsPerProc[pmax];
+
+  assert(tr->maxModelsPerThread > 0 && tr->maxModelsPerThread <= pa->numPartitions);
+
+  pmax = 0;
+  for(i = 1; i< pa->numPartitions; ++i)
+    {
+      if (numsPerPart[i] > numsPerPart[pmax])
+	pmax = i;
+    }
+
+  /* save max threads count to the tree structure */
+  tr->maxThreadsPerModel = numsPerPart[pmax];
+
+  assert(tr->maxThreadsPerModel > 0 && tr->maxThreadsPerModel <= pa->numProc);
+
+  free(numsPerProc);
+  free(numsPerPart);
+
+  printf("\n maxModelsPerThread: %d,   maxThreadsPerModel: %d\n", tr->maxModelsPerThread, tr->maxThreadsPerModel);
+
+  /* copy the partition assignment to the tree structure */
+
+  int
+    threadPartSize = pa->numProc * tr->maxModelsPerThread,
+    partThreadSize = pa->numPartitions * tr->maxThreadsPerModel;
+
+  tr->threadPartAssigns = (Assign **)calloc((size_t)threadPartSize, sizeof(Assign*));
+  tr->partThreadAssigns = (Assign **)calloc((size_t)partThreadSize, sizeof(Assign*));
+
+  for(i = 0; i < pa->numProc; ++i)
+    {
+      int
+	partCount = 0;
+
+      for( j = 0 ; j < pa->numAssignPerProc[i]; ++j)
+        {
+	  Assignment *ass = pa->assignPerProc[i] + j;
+	  Assign* pTreeAss = (Assign *)calloc(1, sizeof(Assign));
+	  pTreeAss->procId = i;
+	  pTreeAss->offset = ass->offset;
+	  pTreeAss->width = ass->width;
+	  pTreeAss->partitionId = ass->partId;
+
+	  size_t
+	    ind = i * tr->maxModelsPerThread + partCount;
+
+	  assert(ind < (i+1) * tr->maxModelsPerThread);
+
+	  tr->threadPartAssigns[ind] = pTreeAss;
+	  ++partCount;
+
+	  ind = ass->partId * tr->maxThreadsPerModel;
+	  while (tr->partThreadAssigns[ind])
+	    ++ind;
+
+	  assert( ind < (ass->partId+1) * tr->maxThreadsPerModel);
+
+	  tr->partThreadAssigns[ind] = pTreeAss;
+        }
+    }
+}
+#endif
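Note on the load targets in assignThesePartitions() (partitionAssignment.c, above): the function computes cap as the ceiling of totalElems / numProc and remainder as cap * numProc - totalElems, then lowers cap by one once numProc - remainder processes are full. The sketch below illustrates only that arithmetic and is not code from this commit. The helper name targetLoads is hypothetical, it uses integer ceiling division where the diff calls ceil() on doubles, and it hands the larger share to the lowest process ids, whereas in the diff the larger share goes to whichever processes fill up first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes into target[p] the number of characters
   process p should hold once totalElems characters are spread over
   numProc processes as evenly as possible. */
static void targetLoads(size_t totalElems, int numProc, size_t *target)
{
  size_t
    cap,        /* largest number of characters any process receives */
    remainder;  /* number of processes that receive cap - 1 characters */

  int
    p;

  assert(numProc > 0);

  /* integer version of ceil(totalElems / numProc) */
  cap = (totalElems + (size_t)numProc - 1) / (size_t)numProc;
  remainder = cap * (size_t)numProc - totalElems;

  /* remainder is always smaller than numProc because cap is a ceiling */
  assert(remainder < (size_t)numProc);

  for(p = 0; p < numProc; ++p)
    target[p] = (p < numProc - (int)remainder) ? cap : cap - 1;
}
```

For 10 characters on 4 processes this gives cap = 3 and remainder = 2, so two processes hold 3 characters and two hold 2, which sums to 10 again.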
diff --git a/examl/partitionAssignment.h b/examl/partitionAssignment.h
new file mode 100644
index 0000000..22f74d4
--- /dev/null
+++ b/examl/partitionAssignment.h
@@ -0,0 +1,64 @@
+#ifndef _PARTITIION_ASSIGNMENT
+#define _PARTITIION_ASSIGNMENT
+
+#include "axml.h"
+
+#define not !  
+
+
+typedef struct 
+{
+  int partId;
+  size_t width; 
+  size_t offset; 
+} Assignment; 
+
+
+typedef struct
+{
+  int id; 
+  size_t width; 
+  int type;
+}  Partition; 
+
+
+typedef struct
+{
+  int numProc; 
+  int numPartitions; 
+  Partition *partitions;
+  Assignment **assignPerProc;  	/* procid -> array of assignments  */
+  int *numAssignPerProc; 	/* procid -> size of above array */
+} PartitionAssignment; 
+
+/*
+  constructor
+*/ 
+void initializePartitionAssignment( PartitionAssignment **pAssPtr, pInfo **partitions, int numPart, int numProc);
+/* 
+   destructor 
+ */ 
+void deletePartitionAssignment(PartitionAssignment *pAss); 
+/*
+  assign partitions to all processes  
+ */ 
+void assign(PartitionAssignment *pa); 
+/* 
+   prints a single assignment 
+ */ 
+void printAssignment(Assignment a, int procid); 
+/* 
+   calculates and prints the load (number of partitions and number of
+   sites) for each process
+ */ 
+void printLoad(PartitionAssignment *pa); 
+
+void copyAssignmentInfoToTree(PartitionAssignment *pa, tree *tr); 
+
+#ifdef _USE_OMP
+void copyThreadAssignmentInfoToTree(PartitionAssignment *pa, tree *tr);
+#endif
+
+void printAssignments(PartitionAssignment *pa); 
+
+#endif
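Note on the Assignment struct (partitionAssignment.h, above): each assignment describes a slice of one partition by its offset and width, and a partition that is split across processes is covered by several such slices. The sketch below shows one way to check that the slices of a single partition cover it exactly once. It is not part of this commit. The Slice type is a local copy of the Assignment layout, and sliceByOffset and slicesTilePartition are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local mirror of the Assignment struct declared in partitionAssignment.h. */
typedef struct
{
  int partId;
  size_t width;
  size_t offset;
} Slice;

/* qsort comparator on offset; compares instead of subtracting, so that
   unsigned wrap-around cannot produce a wrong sign. */
static int sliceByOffset(const void *a, const void *b)
{
  size_t
    x = ((const Slice *)a)->offset,
    y = ((const Slice *)b)->offset;

  return (x > y) - (x < y);
}

/* Returns 1 if the n slices, all belonging to the same partition, cover
   [0, partWidth) without gaps or overlaps, and 0 otherwise.
   The slices array is sorted in place. */
static int slicesTilePartition(Slice *slices, size_t n, size_t partWidth)
{
  size_t
    i,
    next = 0;  /* offset at which the next slice has to start */

  assert(slices != NULL);

  qsort(slices, n, sizeof(Slice), sliceByOffset);

  for(i = 0; i < n; ++i)
    {
      if(slices[i].offset != next)
        return 0;

      next += slices[i].width;
    }

  return next == partWidth;
}
```

A check of this kind could be run per partition on the result of assign() in a debug build, after collecting the slices with matching partId from all processes.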
diff --git a/examl/quartets.c b/examl/quartets.c
new file mode 100644
index 0000000..cabea69
--- /dev/null
+++ b/examl/quartets.c
@@ -0,0 +1,615 @@
+/*  RAxML-HPC, a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright March 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  stamatak at ics.forth.gr
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *  
+ *  Alexandros Stamatakis: "An Efficient Program for phylogenetic Inference Using Simulated Annealing". 
+ *  Proceedings of IPDPS2005,  Denver, Colorado, April 2005.
+ *  
+ *  AND
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#include <limits.h>
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "axml.h"
+
+extern double masterTime;
+extern char workdir[1024];
+extern char run_id[128];
+extern char quartetGroupingFileName[1024];
+extern char quartetFileName[1024];
+extern checkPointState ckp;
+extern int processID;
+
+/* a parser error function */
+
+static void parseError(int c)
+{
+  printBothOpen("Quartet grouping parser expecting symbol: %c\n", c);
+  assert(0);
+}
+
+/* parser for the taxon grouping format, one has to specify 4 groups in a newick-like 
+   format from which quartets (a substantially smaller number compared to ungrouped quartets) 
+   will be drawn */
+
+static void groupingParser(char *quartetGroupFileName, int *groups[4], int groupSize[4], tree *tr)
+{
+  FILE 
+    *f = myfopen(quartetGroupFileName, "r");
+  
+  int 
+    taxonCounter = 0,
+    n,
+    state = 0,
+    groupCounter = 0,
+    ch,
+    i;
+
+  printBothOpen("%s\n", quartetGroupFileName);
+
+  for(i = 0; i < 4; i++)
+    {
+      groups[i] = (int*)malloc(sizeof(int) * (tr->mxtips + 1));
+      groupSize[i] = 0;
+    }
+  
+  while((ch = getc(f)) != EOF)
+    {
+      if(!whitechar(ch))
+	{
+	  switch(state)
+	    {
+	    case 0:
+	      if(ch != '(')
+		parseError('(');
+	      state = 1;
+	      break;
+	    case 1:
+	      ungetc(ch, f);
+	      n = treeFindTipName(f, tr, FALSE);  
+	      if(n <= 0 || n > tr->mxtips)		
+		printBothOpen("parsing error, raxml is expecting to read a taxon name, found \"%c\" instead\n", ch);		
+	      assert(n > 0 && n <= tr->mxtips);	     
+	      taxonCounter++;
+	      groups[groupCounter][groupSize[groupCounter]] = n;
+	      groupSize[groupCounter] = groupSize[groupCounter] + 1;	    
+	      state = 2;
+	      break;
+	    case 2:
+	      if(ch == ',')
+		state = 1;
+	      else
+		{
+		  if(ch == ')')
+		    {
+		      groupCounter++;
+		      state = 3;
+		    }
+		  else
+		    parseError('?');
+		}
+	      break;
+	    case 3:
+	      if(groupCounter == 4)
+		{
+		  if(ch == ';')
+		    state = 4;
+		  else
+		    parseError(';');
+		}
+	      else
+		{
+		  if(ch != ',')
+		    parseError(',');
+		  state = 0;
+		}
+	      break; 
+	    case 4:
+	      printBothOpen("Error: extra char after ; %c\n", ch);
+	      assert(0);
+	    default:
+	      assert(0);
+	    }
+	}
+    }
+
+  assert(state == 4);
+  assert(groupCounter == 4); 
+  assert(taxonCounter == tr->mxtips);
+
+  printBothOpen("Successfully parsed quartet groups\n\n");
+
+  /* print out the taxa that have been assigned to the 4 groups */
+
+  for(i = 0; i < 4; i++)
+    {
+      int 
+	j;
+      
+      printBothOpen("group %d has %d members\n", i, groupSize[i]);
+
+      for(j = 0; j < groupSize[i]; j++)
+	printBothOpen("%s\n", tr->nameList[groups[i][j]]);
+
+      printBothOpen("\n");
+    }
+
+  fclose(f);
+}
+
+/*****************************/
+
+static void nniSmooth(tree *tr, nodeptr p, int maxtimes)
+{
+  int
+    i;
+
+  for(i = 0; i < tr->numBranches; i++)	
+    tr->partitionConverged[i] = FALSE;	
+ 
+  while (--maxtimes >= 0) 
+    {           
+      for(i = 0; i < tr->numBranches; i++)	
+	tr->partitionSmoothed[i] = TRUE;
+            
+      assert(!isTip(p->number, tr->mxtips)); 	
+      assert(!isTip(p->back->number, tr->mxtips));  
+      
+      update(tr, p);
+     
+      update(tr, p->next);
+     
+      update(tr, p->next->next);
+      
+      update(tr, p->back->next);
+      
+      update(tr, p->back->next->next);           
+     
+      if (allSmoothed(tr)) 
+	break;      
+    }
+
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      tr->partitionSmoothed[i] = FALSE; 
+      tr->partitionConverged[i] = FALSE;
+    }  
+}
+
+
+
+
+
+static double quartetLikelihood(tree *tr, nodeptr p1, nodeptr p2, nodeptr p3, nodeptr p4, nodeptr q1, nodeptr q2, analdef *adef, boolean firstQuartet)
+{
+  /* 
+     build a quartet tree, where q1 and q2 are the inner nodes and p1, p2, p3, p4
+     are the tips of the quartet where the sequence data is located.
+
+     initially set all branch lengths to the default value.
+  */
+
+  /* 
+     for the tree and node data structure used, please see one of the last chapters of Joe 
+     Felsenstein's book. 
+  */
+
+  hookupDefault(q1, q2, tr->numBranches);
+  
+  hookupDefault(q1->next,       p1, tr->numBranches);
+  hookupDefault(q1->next->next, p2, tr->numBranches);
+  
+  hookupDefault(q2->next,       p3, tr->numBranches);
+  hookupDefault(q2->next->next, p4, tr->numBranches);
+  
+  /* now compute the likelihood vectors at the two inner nodes of the tree,
+     here the virtual root is located between the two inner nodes q1 and q2.
+  */
+
+  newviewGeneric(tr, q1, FALSE);
+  newviewGeneric(tr, q2, FALSE);
+  
+  /* call a function that is also used for NNIs that iteratively optimizes all 
+     5 branch lengths in the tree.
+
+     Note that 16 is an important tuning parameter: this integer value determines 
+     how many times we visit all branches before we give up on further optimizing the branch length 
+     configuration.
+  */
+
+  nniSmooth(tr, q1, 16);
+
+  /* now compute the log likelihood of the tree for the virtual root located between inner nodes q1 and q2 */
+  
+  /* debugging code 
+     {
+    double l;
+  */
+  
+  evaluateGeneric(tr, q1->back->next->next, FALSE);
+  
+  /* debugging code 
+     
+     l = tr->likelihood;
+
+     newviewGeneric(tr, q1);
+     newviewGeneric(tr, q2);
+     evaluateGeneric(tr, q1);
+     
+   
+     assert(ABS(l - tr->likelihood) < 0.00001);
+     }
+  */
+
+  return (tr->likelihood);
+}
+
+
+
+static void computeAllThreeQuartets(tree *tr, nodeptr q1, nodeptr q2, int t1, int t2, int t3, int t4, FILE *f, analdef *adef)
+{
+  /* set the tip nodes to different sequences 
+     with the tip indices t1, t2, t3, t4 */
+	       
+  nodeptr 
+    p1 = tr->nodep[t1],
+    p2 = tr->nodep[t2],
+    p3 = tr->nodep[t3], 
+    p4 = tr->nodep[t4];
+  
+  double 
+    l;
+  
+  /* first quartet */	    
+  
+  /* compute the likelihood of tree ((p1, p2), (p3, p4)) */
+  
+  l = quartetLikelihood(tr, p1, p2, p3, p4, q1, q2, adef, TRUE);
+ 
+  if(processID == 0)
+    fprintf(f, "%d %d | %d %d: %f\n", p1->number, p2->number, p3->number, p4->number, l);
+
+  /* second quartet */	    
+  
+  /* compute the likelihood of tree ((p1, p3), (p2, p4)) */
+  
+  l = quartetLikelihood(tr, p1, p3, p2, p4, q1, q2, adef, FALSE);
+
+  if(processID == 0)
+    fprintf(f, "%d %d | %d %d: %f\n", p1->number, p3->number, p2->number, p4->number, l);
+
+  /* third quartet */	    
+  
+  /* compute the likelihood of tree ((p1, p4), (p2, p3)) */
+  
+  l = quartetLikelihood(tr, p1, p4, p2, p3, q1, q2, adef, FALSE);
+
+  if(processID == 0)
+    fprintf(f, "%d %d | %d %d: %f\n", p1->number, p4->number, p2->number, p3->number, l);	    	   
+}
+
+/* the three quartet options: evaluate all quartets, randomly sub-sample a certain number n of quartets, 
+   or evaluate all quartets that can be built from 4 pre-defined groups of taxa */
+
+
+static void writeQuartetCheckpoint(uint64_t quartetCounter, FILE *f, tree *tr, analdef *adef)
+{
+  if(quartetCounter % adef->quartetCkpInterval == 0)
+    {     
+      ckp.quartetCounter = quartetCounter;
+      if(processID == 0)
+        { 
+          fflush(f);
+          ckp.filePosition = ftell(f);
+        }
+      printBothOpen("\nPrinting checkpoint after %f seconds of run-time\n", gettime() - masterTime);      
+      writeCheckpoint(tr, adef);      
+    }
+}
+
+
+#define ALL_QUARTETS 0
+#define RANDOM_QUARTETS 1
+#define GROUPED_QUARTETS 2
+
+void computeQuartets(tree *tr, analdef *adef)
+{
+  /* some indices for generating quartets in an arbitrary way */
+
+  int
+    flavor = ALL_QUARTETS, //type of quartet calculation 
+    i, 
+    t1, 
+    t2, 
+    t3, 
+    t4, 
+    *groups[4],
+    groupSize[4];    
+
+  double
+    fraction = 0.0;
+
+  uint64_t
+    randomQuartets = (uint64_t)(adef->numberRandomQuartets), //number of random quartets to compute 
+    quartetCounter = 0, 
+    //total number of possible quartets, note that we count the following ((A,B),(C,D)), ((A,C),(B,D)), ((A,D),(B,C)) as one quartet here 
+    numberOfQuartets = ((uint64_t)tr->mxtips * ((uint64_t)tr->mxtips - 1) * ((uint64_t)tr->mxtips - 2) * ((uint64_t)tr->mxtips - 3)) / 24; 
+  
+  /* use two inner tree nodes for building quartet trees */
+
+  nodeptr 	
+    q1 = tr->nodep[tr->mxtips + 1],
+    q2 = tr->nodep[tr->mxtips + 2];
+
+  FILE 
+    *f;
+
+  long
+    seed = (long)(tr->randomSeed);
+       
+  /***********************************/  
+
+  /* get a starting tree on which we optimize the likelihood model parameters: either reads in a tree or computes a randomized stepwise addition parsimony tree */
+  if(adef->useCheckpoint)
+    { 
+      /* read checkpoint file */
+      restart(tr, adef);
+
+      strcpy(quartetFileName, workdir);
+      strcat(quartetFileName, basename(ckp.quartetFileName));
+      printBothOpen("Time for reading checkpoint file: %f\n\n", gettime() - masterTime); 
+
+      seed = ckp.seed;
+   
+      if(processID == 0)
+        {   
+           f = myfopen(quartetFileName, "r+");
+        
+           fseek(f, ckp.filePosition, SEEK_SET);  
+           if(ftruncate(fileno(f),  ckp.filePosition) != 0)
+  	    assert(0);
+        }
+    }
+  else
+    {
+      getStartingTree(tr);
+      evaluateGeneric(tr, tr->start, TRUE);
+      treeEvaluate(tr, 1);
+
+      /* optimize model parameters on that comprehensive tree that can subsequently be used for evaluation of quartet likelihoods */
+
+      modOpt(tr, adef->likelihoodEpsilon, adef, 0);
+
+      printBothOpen("Time for parsing input tree or building parsimony tree and optimizing model parameters: %f\n\n", gettime() - masterTime); 
+      printBothOpen("Tree likelihood: %f\n\n", tr->likelihood);  
+
+      if(processID == 0)
+        f = myfopen(quartetFileName, "w");
+    }
+
+  /* figure out which flavor of quartets we want to compute */
+
+  if(adef->useQuartetGrouping)
+    {
+      //quartet grouping evaluates all possible quartets from four disjoint 
+      //sets of user-specified taxon names 
+
+      flavor = GROUPED_QUARTETS;
+      
+      //parse the four disjoint sets of taxon names specified by the user from file      
+      groupingParser(quartetGroupingFileName, groups, groupSize, tr);
+    }
+  else
+    {
+      //if the user specified more random quartets to sample than there actually 
+      //exist for the number of taxa, then fix this.
+
+      if(randomQuartets == 0 || randomQuartets >= numberOfQuartets)
+	//TODO add user warning if the second case above is true? 
+	flavor = ALL_QUARTETS;
+      else
+	{     	  
+	  //compute the fraction of random quartets to sample 
+	  //there may be an issue here with the uint64_t <-> double cast
+	  fraction = (double)randomQuartets / (double)numberOfQuartets;      
+	  assert(fraction < 1.0);
+	  flavor = RANDOM_QUARTETS;
+	}
+    }
+
+  ckp.state = QUARTETS;
+  ckp.seed = seed; 
+  strncpy(ckp.quartetFileName, quartetFileName, 1024);
+
+  /* print some output on what we are doing*/
+
+  switch(flavor)
+    {
+    case ALL_QUARTETS:
+      printBothOpen("There are %" PRIu64 " quartet sets for which RAxML will evaluate all %" PRIu64 " quartet trees\n", numberOfQuartets, numberOfQuartets * 3);
+      break;
+    case RANDOM_QUARTETS:
+      printBothOpen("There are %" PRIu64 " quartet sets for which RAxML will randomly sub-sample %" PRIu64 " sets (%f per cent), i.e., compute %" PRIu64 " quartet trees\n", 
+		    numberOfQuartets, randomQuartets, 100 * fraction, randomQuartets * 3);
+      break;
+    case GROUPED_QUARTETS:  
+      printBothOpen("There are 4 quartet groups from which RAxML will evaluate all %u quartet trees\n", 
+		    (unsigned int)groupSize[0] * (unsigned int)groupSize[1] * (unsigned int)groupSize[2] * (unsigned int)groupSize[3] * 3);
+      break;
+    default:
+      assert(0);
+    }
+
+  /* print taxon name to taxon number correspondence table to output file */
+
+  if(!adef->useCheckpoint)
+    {
+      if(processID == 0)
+	fprintf(f, "Taxon names and indices:\n\n");
+      
+      for(i = 1; i <= tr->mxtips; i++)
+	{
+	  if(processID == 0)
+	    fprintf(f, "%s %d\n", tr->nameList[i], i);
+	  assert(tr->nodep[i]->number == i);
+	}
+      
+      if(processID == 0)
+        {
+	  fprintf(f, "\n\n");
+          fflush(f);
+        }
+    }
+  
+  
+  /* do a loop to generate some quartets to test.
+     note that tip nodes/sequences in RAxML are indexed from 1,...,n
+     and not from 0,...,n-1 as one might expect 
+     
+     tr->mxtips is the maximum number of tips in the alignment/tree
+  */
+
+
+  //now do the respective quartet evaluations by switching over the three distinct flavors 
+     
+  switch(flavor)
+    {
+    case ALL_QUARTETS:
+      {		    
+	/* compute all possible quartets */	   	   
+	
+	for(t1 = 1; t1 <= tr->mxtips; t1++)
+	  for(t2 = t1 + 1; t2 <= tr->mxtips; t2++)
+	    for(t3 = t2 + 1; t3 <= tr->mxtips; t3++)
+	      for(t4 = t3 + 1; t4 <= tr->mxtips; t4++)
+		{		      
+		  if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+		    {
+		      writeQuartetCheckpoint(quartetCounter, f, tr, adef);		      
+		      
+		      computeAllThreeQuartets(tr, q1, q2, t1, t2, t3, t4, f, adef);
+		    }
+		  quartetCounter++;
+		}
+	
+	assert(quartetCounter == numberOfQuartets);
+      }
+      break;
+    case RANDOM_QUARTETS:
+      {	 
+	
+	//endless loop to make sure we randomly sub-sample exactly as many quartets as the user specified
+	
+	//This is not very elegant, but it works. Note, however, that especially when the number of 
+	//random quartets to be sampled is large, that is, close to the total number of quartets, 
+	//some quartets may be sampled twice by pure chance. To randomly sample unique quartets, 
+	//using hashes or bitmaps to store which quartets have already been sampled is not memory efficient.
+	//Instead, we would need to use a random number generator that can generate a unique series of random numbers 
+	//and then have a function f() that maps those random numbers to the corresponding index quartet (t1, t2, t3, t4).
+	
+	do
+	  {	      
+	    //loop over all quartets 
+	    for(t1 = 1; t1 <= tr->mxtips; t1++)
+	      for(t2 = t1 + 1; t2 <= tr->mxtips; t2++)
+		for(t3 = t2 + 1; t3 <= tr->mxtips; t3++)
+		  for(t4 = t3 + 1; t4 <= tr->mxtips; t4++)
+		    {
+		      //choose a random number
+		      double
+			r = randum(&seed);
+			  			  
+		      //if the random number is smaller than the fraction of quartets to subsample
+		      //evaluate the likelihood of the current quartet
+		      if(r < fraction)
+			{
+			  if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+			    {
+			      writeQuartetCheckpoint(quartetCounter, f, tr, adef);			     			      
+			      
+			      //function that computes the likelihood for all three possible unrooted trees 
+			      //defined by the given quartet of taxa 
+			      computeAllThreeQuartets(tr, q1, q2, t1, t2, t3, t4, f, adef);
+			    }
+			  //increment quartet counter that counts how many quartets we have evaluated
+			  quartetCounter++;
+			}
+		      
+		      //exit endless loop if we have randomly sub-sampled as many quartets as the user specified
+		      if(quartetCounter == randomQuartets)
+			goto DONE;
+		    }
+	  }
+	while(1);
+	
+      DONE:
+	assert(quartetCounter == randomQuartets);	  
+      }
+      break;
+    case GROUPED_QUARTETS:
+      {
+	/* compute all quartets that can be built out of the four pre-defined groups */
+	
+	for(t1 = 0; t1 < groupSize[0]; t1++)
+	  for(t2 = 0; t2 < groupSize[1]; t2++)
+	    for(t3 = 0; t3 < groupSize[2]; t3++)
+	      for(t4 = 0; t4 < groupSize[3]; t4++)
+		{
+		  int
+		    i1 = groups[0][t1],
+		    i2 = groups[1][t2],
+		    i3 = groups[2][t3],
+		    i4 = groups[3][t4];
+		  
+		  if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+		    {
+		      writeQuartetCheckpoint(quartetCounter, f, tr, adef);
+		      		     
+		      computeAllThreeQuartets(tr, q1, q2, i1, i2, i3, i4, f, adef);
+		    }
+		  quartetCounter++;
+		}
+	
+	printBothOpen("\nComputed all %" PRIu64 " possible grouped quartets\n", quartetCounter); 	    
+      }
+      break;
+    default:
+      assert(0);
+    }
+  
+  if(processID == 0) fclose(f); /* f is only opened on process 0 */
+}
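
Reviewer sketch (not part of the patch): the comment in the RANDOM_QUARTETS branch above asks for a function f() that maps a number to the index quartet (t1, t2, t3, t4). One possible way to do this, assuming the quartets are ranked in the same order in which the four nested loops enumerate them, is lexicographic unranking of 4-element subsets. The names choose_u64 and unrankQuartet are hypothetical and do not exist in ExaML; the sketch also assumes that C(n, 4) and its intermediate products fit into 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* binomial coefficient C(n, k) for small k; returns 0 when k > n */
static uint64_t choose_u64(uint64_t n, unsigned k)
{
  uint64_t result = 1;
  unsigned i;

  if(k > n)
    return 0;

  /* after step i, result equals C(n - k + i, i), so every division is exact */
  for(i = 1; i <= k; i++)
    result = result * (n - k + i) / i;

  return result;
}

/* map rank (0 <= rank < C(n, 4)) to tips t[0] < t[1] < t[2] < t[3] taken from 1..n,
   using the enumeration order of the nested t1/t2/t3/t4 loops */
static void unrankQuartet(uint64_t rank, int n, int t[4])
{
  int i, c = 1;

  assert(n >= 4 && rank < choose_u64((uint64_t)n, 4));

  for(i = 0; i < 4; i++)
    {
      for(;; c++)
        {
          /* number of quartets that carry value c at position i,
             given the values already fixed at the earlier positions */
          uint64_t count = choose_u64((uint64_t)(n - c), (unsigned)(3 - i));

          if(rank < count)
            break;

          rank -= count;
        }

      t[i] = c++;
    }
}
```

With n = 5 the ranks 0 to 4 map to (1,2,3,4), (1,2,3,5), (1,2,4,5), (1,3,4,5) and (2,3,4,5). Drawing distinct ranks and unranking them would avoid the repeated passes over all quartets, but generating distinct random ranks is a separate problem that this sketch does not address.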
diff --git a/examl/restartHashTable.c b/examl/restartHashTable.c
new file mode 100644
index 0000000..6df780a
--- /dev/null
+++ b/examl/restartHashTable.c
@@ -0,0 +1,357 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+static boolean treeNeedString(const char *fp, char c1, int *position)
+{
+  char 
+    c2 = fp[(*position)++];
+  
+  if(c2 == c1)  
+    return TRUE;
+  else  
+    {   
+      int 
+	lower = MAX(0, *position - 20),
+	upper = *position + 20;
+      
+      printf("Tree Parsing ERROR: Expecting '%c', found: '%c'\n", c1, c2); 
+      printf("Context: \n");
+      
+      while(lower < upper && fp[lower])
+	printf("%c", fp[lower++]);
+      
+      printf("\n");
+
+      return FALSE;
+  }
+} 
+
+
+
+static boolean treeLabelEndString (char ch)
+{
+  switch(ch) 
+    {   
+    case '\0':  
+    case '\t':  
+    case '\n':  
+    case '\r': 
+    case ' ':
+    case ':':  
+    case ',':   
+    case '(':   
+    case ')':  
+    case ';':
+      return TRUE;
+    default:
+      break;
+    }
+  
+  return FALSE;
+} 
+
+static boolean  treeGetLabelString (const char *fp, char *lblPtr, int maxlen, int *position)
+{
+  char 
+    ch;
+  
+  boolean  
+    done, 
+    lblfound;
+
+  if (--maxlen < 0) 
+    lblPtr = (char *)NULL; 
+  else 
+    if(lblPtr == NULL) 
+      maxlen = 0;
+
+  ch = fp[(*position)++];
+  
+  done = treeLabelEndString(ch);
+
+  lblfound = !done;  
+
+  while(!done) 
+    {      
+      if(treeLabelEndString(ch)) 
+	break;     
+
+      if(--maxlen >= 0) 
+	*lblPtr++ = ch;
+      
+      ch = fp[(*position)++];      
+    }
+  
+  (*position)--; 
+
+  if (lblPtr != NULL) 
+    *lblPtr = '\0';
+
+  return lblfound;
+}
+
+static boolean  treeFlushLabelString(const char *fp, int *position)
+{ 
+  return  treeGetLabelString(fp, (char *) NULL, (int) 0, position);
+} 
+
+
+static boolean treeProcessLengthString (const char *fp, double *dptr, int *position)
+{ 
+  (*position)++;
+  
+  if(sscanf(&fp[*position], "%lf", dptr) != 1) 
+    {
+      printf("ERROR: treeProcessLength: Problem reading branch length\n");     
+      assert(0);
+    }
+
+  while(fp[*position] != '\0' && fp[*position] != ',' && fp[*position] != ')' && fp[*position] != ';')
+    *position = *position + 1;
+  
+  return  TRUE;
+}
+
+static int treeFlushLenString (const char *fp, int *position)
+{
+  double  
+    dummy;  
+  
+  char     
+    ch;
+
+  ch = fp[(*position)++];
+ 
+  if(ch == ':') 
+    {     
+      if(!treeProcessLengthString(fp, &dummy, position)) 
+	return 0;
+      return 1;	  
+    }
+    
+  (*position)--;
+
+  return 1;
+} 
+
+static int treeFindTipByLabelString(char  *str, tree *tr)                    
+{
+  int lookup = lookupWord(str, tr->nameHash);
+
+  if(lookup > 0)
+    {
+      assert(! tr->nodep[lookup]->back);
+      return lookup;
+    }
+  else
+    { 
+      printf("ERROR: Cannot find tree species: %s\n", str);
+      return  0;
+    }
+}
+
+static int treeFindTipNameString (const char *fp, tree *tr, int *position)
+{
+  char    str[nmlngth+2];
+  int      n;
+
+  if(treeGetLabelString(fp, str, nmlngth+2, position))
+    n = treeFindTipByLabelString(str, tr);
+  else
+    n = 0;
+   
+  return  n;
+} 
+
+static boolean addElementLenString(const char *fp, tree *tr, nodeptr p, int *position)
+{
+  nodeptr  
+    q;
+  
+  int      
+    n, 
+    fres;
+
+  char 
+    ch;
+  
+  if ((ch = fp[(*position)++]) == '(') 
+    { 
+      n = (tr->nextnode)++;
+      if (n > 2*(tr->mxtips) - 2) 
+	{
+	  if (tr->rooted || n > 2*(tr->mxtips) - 1) 
+	    {
+	      printf("ERROR: Too many internal nodes.  Is tree rooted?\n");
+	      printf("       Deepest splitting should be a trifurcation.\n");
+	      return FALSE;
+	    }
+	  else 
+	    {	   
+	      tr->rooted = TRUE;
+	    }
+	}
+      
+      q = tr->nodep[n];
+
+      if (!addElementLenString(fp, tr, q->next, position))        
+	return FALSE;
+      if (!treeNeedString(fp, ',', position))             
+	return FALSE;
+      if (!addElementLenString(fp, tr, q->next->next, position))  
+	return FALSE;
+      if (!treeNeedString(fp, ')', position))             
+	return FALSE;
+      
+     
+      treeFlushLabelString(fp, position);
+    }
+  else 
+    {   
+      (*position)--;
+     
+      if ((n = treeFindTipNameString(fp, tr, position)) <= 0)          
+	return FALSE;
+      q = tr->nodep[n];
+      
+      if (tr->start->number > n)  
+	tr->start = q;
+      (tr->ntips)++;
+    }
+  
+     
+  fres = treeFlushLenString(fp, position);
+  if(!fres) 
+    return FALSE;
+  
+  hookupDefault(p, q, tr->numBranches);
+
+  return TRUE;          
+}
+
+
+
+
+void treeReadTopologyString(char *treeString, tree *tr)
+{ 
+  char 
+    *fp = treeString;
+
+  nodeptr  
+    p;
+  
+  int
+    position = 0, 
+    i;
+  
+  char 
+    ch;   
+    
+
+  for(i = 1; i <= tr->mxtips; i++)    
+    tr->nodep[i]->back = (node *)NULL;      
+  
+  for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+    {
+      tr->nodep[i]->back = (nodeptr)NULL;
+      tr->nodep[i]->next->back = (nodeptr)NULL;
+      tr->nodep[i]->next->next->back = (nodeptr)NULL;
+      tr->nodep[i]->number = i;
+      tr->nodep[i]->next->number = i;
+      tr->nodep[i]->next->next->number = i;           
+    }
+      
+  tr->start       = tr->nodep[1];
+  tr->ntips       = 0;
+  tr->nextnode    = tr->mxtips + 1;    
+  tr->rooted      = FALSE;      
+  
+  p = tr->nodep[(tr->nextnode)++]; 
+   
+  assert(fp[position++] == '(');  
+    
+  if (! addElementLenString(fp, tr, p, &position))                 
+    assert(0);
+  
+  if (! treeNeedString(fp, ',', &position))                
+    assert(0);
+   
+  if (! addElementLenString(fp, tr, p->next, &position))           
+    assert(0);
+
+  if(!tr->rooted) 
+    {
+      if ((ch = fp[position++]) == ',') 
+	{ 
+	  if (! addElementLenString(fp, tr, p->next->next, &position)) 
+	    assert(0);	 
+	}
+      else 
+	assert(0);     
+    }
+  else
+    assert(0);
+        
+  if (! treeNeedString(fp, ')', &position))                
+    assert(0);
+
+  treeFlushLabelString(fp, &position);
+  
+  if (!treeFlushLenString(fp, &position))                         
+    assert(0);
+  
+  if (!treeNeedString(fp, ';', &position))       
+    assert(0);
+    
+  if(tr->rooted)     
+    assert(0);           
+  else           
+    tr->start = tr->nodep[1];   
+
+  printf("Tree parsed\n");
+
+} 
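
Reviewer sketch (not part of the patch): the string-based parser above walks the tree string with an integer cursor and ends a taxon label at the first delimiter character. The standalone example below illustrates that idea in isolation. The names isLabelEnd and readLabel are hypothetical, and the behavior is a simplified assumption rather than a copy of treeGetLabelString: it returns the full label length and truncates the copy to the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* characters that terminate a taxon label in a Newick-style string */
static int isLabelEnd(char ch)
{
  return ch == '\0' || strchr(" \t\n\r:,();", ch) != NULL;
}

/* copy the label starting at s[*pos] into buf (at most buflen - 1 characters plus '\0'),
   leave *pos on the delimiter that ended the label, and return the full label length */
static size_t readLabel(const char *s, size_t *pos, char *buf, size_t buflen)
{
  size_t len = 0;

  assert(s != NULL && pos != NULL);

  while(!isLabelEnd(s[*pos]))
    {
      /* store the character only while there is room left for the terminator */
      if(buf != NULL && len + 1 < buflen)
        buf[len] = s[*pos];

      len++;
      (*pos)++;
    }

  if(buf != NULL && buflen > 0)
    buf[len < buflen ? len : buflen - 1] = '\0';

  return len;
}
```

For the input "(taxonA:0.1,B);" with the cursor at index 1, readLabel copies "taxonA", returns 6 and leaves the cursor on the ':' at index 7. Passing NULL as buf only skips the label, comparable to how treeFlushLabelString discards labels of inner nodes.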
diff --git a/examl/searchAlgo.c b/examl/searchAlgo.c
new file mode 100644
index 0000000..f02399b
--- /dev/null
+++ b/examl/searchAlgo.c
@@ -0,0 +1,2651 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+extern int processes; 
+
+extern int Thorough;
+extern int optimizeRateCategoryInvocations;
+extern infoList iList;
+extern char seq_file[1024];
+extern char resultFileName[1024];
+extern char tree_file[1024];
+extern char workdir[1024];
+extern char run_id[128];
+extern double masterTime;
+extern double accumulatedTime;
+
+extern checkPointState ckp;
+extern partitionLengths pLengths[MAX_MODEL];
+extern char binaryCheckpointName[1024];
+extern char binaryCheckpointInputName[1024];
+
+extern int processID;
+
+
+
+static int checker(tree *tr, nodeptr p)
+{
+  int group = tr->constraintVector[p->number];
+
+  if(isTip(p->number, tr->mxtips))
+    {
+      group = tr->constraintVector[p->number];
+      return group;
+    }
+  else
+    {
+      if(group != -9) 
+	return group;
+
+      group = checker(tr, p->next->back);
+      if(group != -9) 
+	return group;
+
+      group = checker(tr, p->next->next->back);
+      if(group != -9) 
+	return group;
+
+      return -9;
+    }
+}
+
+boolean initrav (tree *tr, nodeptr p)
+{ 
+  nodeptr  q;
+  
+  if (!isTip(p->number, tr->mxtips)) 
+    {      
+      q = p->next;
+      
+      do 
+	{	   
+	  if (! initrav(tr, q->back))  return FALSE;		   
+	  q = q->next;	
+	} 
+      while (q != p);  
+      
+      newviewGeneric(tr, p, FALSE);
+    }
+  
+  return TRUE;
+} 
+
+
+
+
+
+
+
+
+
+
+/* #define _DEBUG_UPDATE */ 
+
+boolean update(tree *tr, nodeptr p)
+{       
+  nodeptr  q; 
+  boolean smoothedPartitions[NUM_BRANCHES];
+  int i;
+  double   z[NUM_BRANCHES], z0[NUM_BRANCHES];
+  double _deltaz;
+
+#ifdef _DEBUG_UPDATE
+  double 
+    startLH;
+
+  evaluateGeneric(tr, p, FALSE);
+
+  startLH = tr->likelihood;
+#endif
+
+  q = p->back;   
+
+  for(i = 0; i < tr->numBranches; i++)
+    z0[i] = q->z[i];    
+
+  if(tr->numBranches > 1)
+    makenewzGeneric(tr, p, q, z0, newzpercycle, z, TRUE);  
+  else
+    makenewzGeneric(tr, p, q, z0, newzpercycle, z, FALSE);
+  
+  for(i = 0; i < tr->numBranches; i++)    
+    smoothedPartitions[i]  = tr->partitionSmoothed[i];
+      
+  for(i = 0; i < tr->numBranches; i++)
+    {         
+      if(!tr->partitionConverged[i])
+	{	  
+	  _deltaz = deltaz;
+	    
+	  if(ABS(z[i] - z0[i]) > _deltaz)  
+	    {	      
+	      smoothedPartitions[i] = FALSE;       
+	    }	 
+
+	  
+	  
+	  p->z[i] = q->z[i] = z[i];	 
+	}
+    }
+
+#ifdef _DEBUG_UPDATE
+  evaluateGeneric(tr, p, FALSE);
+
+  if(tr->likelihood <= startLH)
+    {
+      if(fabs(tr->likelihood - startLH) > 0.01)
+	{
+	  printf("%f %f\n", startLH, tr->likelihood);
+	  assert(0);      
+	}
+    }
+#endif
+
+  for(i = 0; i < tr->numBranches; i++)    
+    tr->partitionSmoothed[i]  = smoothedPartitions[i];
+  
+  return TRUE;
+}
+
+
+
+
+boolean smooth (tree *tr, nodeptr p)
+{
+  nodeptr  q;
+  
+  if (! update(tr, p))               return FALSE; /*  Adjust branch */
+  if (! isTip(p->number, tr->mxtips)) 
+    {                                  /*  Adjust descendants */
+      q = p->next;
+      while (q != p) 
+	{
+	  if (! smooth(tr, q->back))   return FALSE;
+	  q = q->next;
+	}
+      
+      if(tr->numBranches > 1)
+	newviewGeneric(tr, p, TRUE);     
+      else
+	newviewGeneric(tr, p, FALSE);
+    }
+  
+  return TRUE;
+} 
+
+boolean allSmoothed(tree *tr)
+{
+  int i;
+  boolean result = TRUE;
+  
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      if(tr->partitionSmoothed[i] == FALSE)
+	result = FALSE;
+      else
+	tr->partitionConverged[i] = TRUE;
+    }
+
+  return result;
+}
+
+
+
+boolean smoothTree (tree *tr, int maxtimes)
+{
+  nodeptr  p, q;   
+  int i, count = 0;
+   
+  p = tr->start;
+  for(i = 0; i < tr->numBranches; i++)
+    tr->partitionConverged[i] = FALSE;
+
+  while (--maxtimes >= 0) 
+    {    
+      for(i = 0; i < tr->numBranches; i++)	
+	tr->partitionSmoothed[i] = TRUE;		
+
+      if (! smooth(tr, p->back))       return FALSE;
+      if (!isTip(p->number, tr->mxtips)) 
+	{
+	  q = p->next;
+	  while (q != p) 
+	    {
+	      if (! smooth(tr, q->back))   return FALSE;
+	      q = q->next;
+	    }
+	}
+         
+      count++;
+
+      if (allSmoothed(tr)) 
+	break;      
+    }
+
+  for(i = 0; i < tr->numBranches; i++)
+    tr->partitionConverged[i] = FALSE;
+
+
+
+  return TRUE;
+} 
+
+
+
+boolean localSmooth (tree *tr, nodeptr p, int maxtimes)
+{ 
+  nodeptr  q;
+  int i;
+  
+  if (isTip(p->number, tr->mxtips)) return FALSE;
+  
+   for(i = 0; i < tr->numBranches; i++)	
+     tr->partitionConverged[i] = FALSE;	
+
+  while (--maxtimes >= 0) 
+    {     
+      for(i = 0; i < tr->numBranches; i++)	
+	tr->partitionSmoothed[i] = TRUE;
+	 	
+      q = p;
+      do 
+	{
+	  if (! update(tr, q)) return FALSE;
+	  q = q->next;
+        } 
+      while (q != p);
+      
+      if (allSmoothed(tr)) 
+	break;
+    }
+
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      tr->partitionSmoothed[i] = FALSE; 
+      tr->partitionConverged[i] = FALSE;
+    }
+
+  return TRUE;
+}
+
+
+
+
+
+static void resetInfoList(void)
+{
+  int i;
+
+  iList.valid = 0;
+
+  for(i = 0; i < iList.n; i++)    
+    {
+      iList.list[i].node = (nodeptr)NULL;
+      iList.list[i].likelihood = unlikely;
+    }    
+}
+
+void initInfoList(int n)
+{
+  int i;
+
+  iList.n = n;
+  iList.valid = 0;
+  iList.list = (bestInfo *)malloc(sizeof(bestInfo) * n);
+
+  for(i = 0; i < n; i++)
+    {
+      iList.list[i].node = (nodeptr)NULL;
+      iList.list[i].likelihood = unlikely;
+    }
+}
+
+void freeInfoList(void)
+{ 
+  free(iList.list);   
+}
+
+
+void insertInfoList(nodeptr node, double likelihood)
+{
+  int i;
+  int min = 0;
+  double min_l =  iList.list[0].likelihood;
+
+  for(i = 1; i < iList.n; i++)
+    {
+      if(iList.list[i].likelihood < min_l)
+	{
+	  min = i;
+	  min_l = iList.list[i].likelihood;
+	}
+    }
+
+  if(likelihood > min_l)
+    {
+      iList.list[min].likelihood = likelihood;
+      iList.list[min].node = node;
+      iList.valid += 1;
+    }
+
+  if(iList.valid > iList.n)
+    iList.valid = iList.n;
+}
+
+
+boolean smoothRegion (tree *tr, nodeptr p, int region)
+{ 
+  nodeptr  q;
+  
+  if (! update(tr, p))               return FALSE; /*  Adjust branch */
+
+  if(region > 0)
+    {
+      if (!isTip(p->number, tr->mxtips)) 
+	{                                 
+	  q = p->next;
+	  while (q != p) 
+	    {
+	      if (! smoothRegion(tr, q->back, --region))   return FALSE;
+	      q = q->next;
+	    }	
+	  
+	  newviewGeneric(tr, p, FALSE);
+	}
+    }
+  
+  return TRUE;
+}
+
+boolean regionalSmooth (tree *tr, nodeptr p, int maxtimes, int region)
+  {
+    nodeptr  q;
+    int i;
+
+    if (isTip(p->number, tr->mxtips)) return FALSE;            /* Should be an error */
+
+    for(i = 0; i < tr->numBranches; i++)
+      tr->partitionConverged[i] = FALSE;
+
+    while (--maxtimes >= 0) 
+      {	
+	for(i = 0; i < tr->numBranches; i++)	  
+	  tr->partitionSmoothed[i] = TRUE;
+	  
+	q = p;
+	do 
+	  {
+	    if (! smoothRegion(tr, q, region)) return FALSE;
+	    q = q->next;
+	  } 
+	while (q != p);
+	
+	if (allSmoothed(tr)) 
+	  break;
+      }
+
+    for(i = 0; i < tr->numBranches; i++)
+      tr->partitionSmoothed[i] = FALSE;
+    for(i = 0; i < tr->numBranches; i++)
+      tr->partitionConverged[i] = FALSE;
+   
+    return TRUE;
+  } /* regionalSmooth */
+
+
+
+
+
+nodeptr  removeNodeBIG (tree *tr, nodeptr p, int numBranches)
+{  
+  double   zqr[NUM_BRANCHES], result[NUM_BRANCHES];
+  nodeptr  q, r;
+  int i;
+        
+  q = p->next->back;
+  r = p->next->next->back;
+  
+  for(i = 0; i < numBranches; i++)
+    zqr[i] = q->z[i] * r->z[i];        
+   
+  makenewzGeneric(tr, q, r, zqr, iterations, result, FALSE);   
+
+  for(i = 0; i < numBranches; i++)        
+    tr->zqr[i] = result[i];
+
+  hookup(q, r, result, numBranches); 
+      
+  p->next->next->back = p->next->back = (node *) NULL;
+
+  return  q; 
+}
+
+nodeptr  removeNodeRestoreBIG (tree *tr, nodeptr p)
+{
+  nodeptr  q, r;
+        
+  q = p->next->back;
+  r = p->next->next->back;  
+
+  newviewGeneric(tr, q, FALSE);
+  newviewGeneric(tr, r, FALSE);
+  
+  hookup(q, r, tr->currentZQR, tr->numBranches);
+
+  p->next->next->back = p->next->back = (node *) NULL;
+     
+  return  q;
+}
+
+
+boolean insertBIG (tree *tr, nodeptr p, nodeptr q, int numBranches)
+{
+  nodeptr  r, s;
+  int i;
+  
+  r = q->back;
+  s = p->back;
+      
+  for(i = 0; i < numBranches; i++)
+    tr->lzi[i] = q->z[i];
+  
+  if(Thorough)
+    { 
+      double  zqr[NUM_BRANCHES], zqs[NUM_BRANCHES], zrs[NUM_BRANCHES], lzqr, lzqs, lzrs, lzsum, lzq, lzr, lzs, lzmax;      
+      double defaultArray[NUM_BRANCHES];	
+      double e1[NUM_BRANCHES], e2[NUM_BRANCHES], e3[NUM_BRANCHES];
+      double *qz;
+      
+      qz = q->z;
+      
+      for(i = 0; i < numBranches; i++)
+	defaultArray[i] = defaultz;
+      
+      makenewzGeneric(tr, q, r, qz, iterations, zqr, FALSE);           
+      makenewzGeneric(tr, q, s, defaultArray, iterations, zqs, FALSE);                  
+      makenewzGeneric(tr, r, s, defaultArray, iterations, zrs, FALSE);
+      
+      
+      for(i = 0; i < numBranches; i++)
+	{
+	  lzqr = (zqr[i] > zmin) ? log(zqr[i]) : log(zmin); 
+	  lzqs = (zqs[i] > zmin) ? log(zqs[i]) : log(zmin);
+	  lzrs = (zrs[i] > zmin) ? log(zrs[i]) : log(zmin);
+	  lzsum = 0.5 * (lzqr + lzqs + lzrs);
+	  
+	  lzq = lzsum - lzrs;
+	  lzr = lzsum - lzqs;
+	  lzs = lzsum - lzqr;
+	  lzmax = log(zmax);
+	  
+	  if      (lzq > lzmax) {lzq = lzmax; lzr = lzqr; lzs = lzqs;} 
+	  else if (lzr > lzmax) {lzr = lzmax; lzq = lzqr; lzs = lzrs;}
+	  else if (lzs > lzmax) {lzs = lzmax; lzq = lzqs; lzr = lzrs;}          
+	  
+	  e1[i] = exp(lzq);
+	  e2[i] = exp(lzr);
+	  e3[i] = exp(lzs);
+	}
+      hookup(p->next,       q, e1, numBranches);
+      hookup(p->next->next, r, e2, numBranches);
+      hookup(p,             s, e3, numBranches);      		  
+    }
+  else
+    {       
+      double  z[NUM_BRANCHES]; 
+      
+      for(i = 0; i < numBranches; i++)
+	{
+	  z[i] = sqrt(q->z[i]);      
+	  
+	  if(z[i] < zmin) 
+	    z[i] = zmin;
+	  if(z[i] > zmax)
+	    z[i] = zmax;
+	}
+      
+      hookup(p->next,       q, z, tr->numBranches);
+      hookup(p->next->next, r, z, tr->numBranches);	                         
+    }
+  
+  newviewGeneric(tr, p, FALSE);
+  
+  if(Thorough)
+    {     
+      localSmooth(tr, p, smoothings);  
+      
+      for(i = 0; i < numBranches; i++)
+	{
+	  tr->lzq[i] = p->next->z[i];
+	  tr->lzr[i] = p->next->next->z[i];
+	  tr->lzs[i] = p->z[i];            
+	}
+    }           
+  
+  return  TRUE;
+}
+
+boolean insertRestoreBIG (tree *tr, nodeptr p, nodeptr q)
+{
+  nodeptr  r, s;
+  
+  r = q->back;
+  s = p->back;
+
+  if(Thorough)
+    {                        
+      hookup(p->next,       q, tr->currentLZQ, tr->numBranches);
+      hookup(p->next->next, r, tr->currentLZR, tr->numBranches);
+      hookup(p,             s, tr->currentLZS, tr->numBranches);      		  
+    }
+  else
+    {       
+      double  z[NUM_BRANCHES];
+      int i;
+      
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  double zz;
+	  zz = sqrt(q->z[i]);     
+	  if(zz < zmin) 
+	    zz = zmin;
+	  if(zz > zmax)
+	    zz = zmax;
+  	  z[i] = zz;
+	}
+
+      hookup(p->next,       q, z, tr->numBranches);
+      hookup(p->next->next, r, z, tr->numBranches);
+    }   
+    
+  newviewGeneric(tr, p, FALSE);
+       
+  return  TRUE;
+}
+
+
+static void restoreTopologyOnly(tree *tr, bestlist *bt, bestlist *bestML)
+{ 
+  nodeptr p = tr->removeNode;
+  nodeptr q = tr->insertNode;
+  double qz[NUM_BRANCHES], pz[NUM_BRANCHES], p1z[NUM_BRANCHES], p2z[NUM_BRANCHES];
+  nodeptr p1, p2, r, s;
+  double currentLH = tr->likelihood;
+  int i;
+      
+  p1 = p->next->back;
+  p2 = p->next->next->back;
+  
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      p1z[i] = p1->z[i];
+      p2z[i] = p2->z[i];
+    }
+  
+  hookup(p1, p2, tr->currentZQR, tr->numBranches);
+  
+  p->next->next->back = p->next->back = (node *) NULL;             
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      qz[i] = q->z[i];
+      pz[i] = p->z[i];           
+    }
+  
+  r = q->back;
+  s = p->back;
+  
+  if(Thorough)
+    {                        
+      hookup(p->next,       q, tr->currentLZQ, tr->numBranches);
+      hookup(p->next->next, r, tr->currentLZR, tr->numBranches);
+      hookup(p,             s, tr->currentLZS, tr->numBranches);      		  
+    }
+  else
+    { 	
+      double  z[NUM_BRANCHES];	
+      for(i = 0; i < tr->numBranches; i++)
+	{
+	  z[i] = sqrt(q->z[i]);      
+	  if(z[i] < zmin)
+	    z[i] = zmin;
+	  if(z[i] > zmax)
+	    z[i] = zmax;
+	}
+      hookup(p->next,       q, z, tr->numBranches);
+      hookup(p->next->next, r, z, tr->numBranches);
+    }     
+  
+  tr->likelihood = tr->bestOfNode;
+    
+  saveBestTree(bt, tr, TRUE);
+  if(tr->saveBestTrees)
+    saveBestTree(bestML, tr, FALSE);
+  
+  tr->likelihood = currentLH;
+  
+  hookup(q, r, qz, tr->numBranches);
+  
+  p->next->next->back = p->next->back = (nodeptr) NULL;
+  
+  if(Thorough)    
+    hookup(p, s, pz, tr->numBranches);          
+      
+  hookup(p->next,       p1, p1z, tr->numBranches); 
+  hookup(p->next->next, p2, p2z, tr->numBranches);      
+}
+
+
+
+boolean testInsertBIG (tree *tr, nodeptr p, nodeptr q)
+{
+  double  qz[NUM_BRANCHES], pz[NUM_BRANCHES];
+  nodeptr  r;
+  boolean doIt = TRUE;
+  double startLH = tr->endLH;
+  int i;
+  
+  r = q->back; 
+  for(i = 0; i < tr->numBranches; i++)
+    {
+      qz[i] = q->z[i];
+      pz[i] = p->z[i];
+    }
+  
+  if(tr->constraintTree)
+    {
+      int rNumber, qNumber, pNumber;
+      
+      doIt = FALSE;
+      
+      rNumber = tr->constraintVector[r->number];
+      qNumber = tr->constraintVector[q->number];
+      pNumber = tr->constraintVector[p->number];
+      
+      if(pNumber == -9)
+	pNumber = checker(tr, p->back);
+      if(pNumber == -9)
+	doIt = TRUE;
+      else
+	{
+	  if(qNumber == -9)
+	    qNumber = checker(tr, q);
+	  
+	  if(rNumber == -9)
+	    rNumber = checker(tr, r);
+	  
+	  if(pNumber == rNumber || pNumber == qNumber)
+	    doIt = TRUE;    	  
+	}
+    }
+  
+  if(doIt)
+    {     
+      if (! insertBIG(tr, p, q, tr->numBranches))       return FALSE;         
+      
+      evaluateGeneric(tr, p->next->next, FALSE);   
+       
+      if(tr->likelihood > tr->bestOfNode)
+	{
+	  tr->bestOfNode = tr->likelihood;
+	  tr->insertNode = q;
+	  tr->removeNode = p;   
+	  for(i = 0; i < tr->numBranches; i++)
+	    {
+	      tr->currentZQR[i] = tr->zqr[i];           
+	      tr->currentLZR[i] = tr->lzr[i];
+	      tr->currentLZQ[i] = tr->lzq[i];
+	      tr->currentLZS[i] = tr->lzs[i];      
+	    }
+	}
+      
+      if(tr->likelihood > tr->endLH)
+	{			  
+	  tr->insertNode = q;
+	  tr->removeNode = p;   
+	  for(i = 0; i < tr->numBranches; i++)
+	    tr->currentZQR[i] = tr->zqr[i];      
+	  tr->endLH = tr->likelihood;                      
+	}        
+      
+      hookup(q, r, qz, tr->numBranches);
+      
+      p->next->next->back = p->next->back = (nodeptr) NULL;
+      
+      if(Thorough)
+	{
+	  nodeptr s = p->back;
+	  hookup(p, s, pz, tr->numBranches);      
+	} 
+      
+      if((tr->doCutoff) && (tr->likelihood < startLH))
+	{
+	  tr->lhAVG += (startLH - tr->likelihood);
+	  tr->lhDEC++;
+	  if((startLH - tr->likelihood) >= tr->lhCutoff)
+	    return FALSE;	    
+	  else
+	    return TRUE;
+	}
+      else
+	return TRUE;
+    }
+  else
+    return TRUE;  
+}
+
+
+
+
+
+
+ 
+void addTraverseBIG(tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav)
+{  
+  if (--mintrav <= 0) 
+    {              
+      if (! testInsertBIG(tr, p, q))  return;
+
+    }
+  
+  if ((!isTip(q->number, tr->mxtips)) && (--maxtrav > 0)) 
+    {    
+      addTraverseBIG(tr, p, q->next->back, mintrav, maxtrav);
+      addTraverseBIG(tr, p, q->next->next->back, mintrav, maxtrav);    
+    }
+} 
+
+
+
+
+
+int rearrangeBIG(tree *tr, nodeptr p, int mintrav, int maxtrav)   
+{  
+  double   p1z[NUM_BRANCHES], p2z[NUM_BRANCHES], q1z[NUM_BRANCHES], q2z[NUM_BRANCHES];
+  nodeptr  p1, p2, q, q1, q2;
+  int      mintrav2, i;  
+  boolean doP = TRUE, doQ = TRUE;
+  
+  if (maxtrav < 1 || mintrav > maxtrav)  return 0;
+  q = p->back;
+  
+ 
+  
+  if (!isTip(p->number, tr->mxtips) && doP) 
+    {     
+      p1 = p->next->back;
+      p2 = p->next->next->back;
+      
+     
+      if(!isTip(p1->number, tr->mxtips) || !isTip(p2->number, tr->mxtips))
+	{
+	  for(i = 0; i < tr->numBranches; i++)
+	    {
+	      p1z[i] = p1->z[i];
+	      p2z[i] = p2->z[i];	   	   
+	    }
+	  
+	  if (! removeNodeBIG(tr, p,  tr->numBranches)) return badRear;
+	  
+	  if (!isTip(p1->number, tr->mxtips)) 
+	    {
+	      addTraverseBIG(tr, p, p1->next->back,
+			     mintrav, maxtrav);         
+	      addTraverseBIG(tr, p, p1->next->next->back,
+			     mintrav, maxtrav);          
+	    }
+	  
+	  if (!isTip(p2->number, tr->mxtips)) 
+	    {
+	      addTraverseBIG(tr, p, p2->next->back,
+			     mintrav, maxtrav);
+	      addTraverseBIG(tr, p, p2->next->next->back,
+			     mintrav, maxtrav);          
+	    }
+	  	  
+	  hookup(p->next,       p1, p1z, tr->numBranches); 
+	  hookup(p->next->next, p2, p2z, tr->numBranches);	   	    	    
+	  newviewGeneric(tr, p, FALSE);	   	    
+	}
+    }  
+  
+  if (!isTip(q->number, tr->mxtips) && maxtrav > 0 && doQ) 
+    {
+      q1 = q->next->back;
+      q2 = q->next->next->back;
+      
+      /*if (((!q1->tip) && (!q1->next->back->tip || !q1->next->next->back->tip)) ||
+	((!q2->tip) && (!q2->next->back->tip || !q2->next->next->back->tip))) */
+      if (
+	  (
+	   ! isTip(q1->number, tr->mxtips) && 
+	   (! isTip(q1->next->back->number, tr->mxtips) || ! isTip(q1->next->next->back->number, tr->mxtips))
+	   )
+	  ||
+	  (
+	   ! isTip(q2->number, tr->mxtips) && 
+	   (! isTip(q2->next->back->number, tr->mxtips) || ! isTip(q2->next->next->back->number, tr->mxtips))
+	   )
+	  )
+	{
+	  
+	  for(i = 0; i < tr->numBranches; i++)
+	    {
+	      q1z[i] = q1->z[i];
+	      q2z[i] = q2->z[i];
+	    }
+	  
+	  if (! removeNodeBIG(tr, q, tr->numBranches)) return badRear;
+	  
+	  mintrav2 = mintrav > 2 ? mintrav : 2;
+	  
+	  if (/*! q1->tip*/ !isTip(q1->number, tr->mxtips)) 
+	    {
+	      addTraverseBIG(tr, q, q1->next->back,
+			     mintrav2 , maxtrav);
+	      addTraverseBIG(tr, q, q1->next->next->back,
+			     mintrav2 , maxtrav);         
+	    }
+	  
+	  if (/*! q2->tip*/ ! isTip(q2->number, tr->mxtips)) 
+	    {
+	      addTraverseBIG(tr, q, q2->next->back,
+			     mintrav2 , maxtrav);
+	      addTraverseBIG(tr, q, q2->next->next->back,
+			     mintrav2 , maxtrav);          
+	    }	   
+	  
+	  hookup(q->next,       q1, q1z, tr->numBranches); 
+	  hookup(q->next->next, q2, q2z, tr->numBranches);
+	  
+	  newviewGeneric(tr, q, FALSE); 	   
+	}
+    } 
+  
+  return  1;
+} 
+
+
+
+
+
+double treeOptimizeRapid(tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt, bestlist *bestML)
+{
+  int 
+    i, 
+    index,
+    *perm = (int*)NULL;   
+
+  nodeRectifier(tr);
+
+  if (maxtrav > tr->mxtips - 3)  
+    maxtrav = tr->mxtips - 3;  
+    
+  resetInfoList();
+  
+  resetBestTree(bt);
+ 
+  tr->startLH = tr->endLH = tr->likelihood;
+ 
+  if(tr->doCutoff)
+    {
+      if(tr->bigCutoff)
+	{	  
+	  if(tr->itCount == 0)    
+	    tr->lhCutoff = 0.5 * (tr->likelihood / -1000.0);    
+	  else    		 
+	    tr->lhCutoff = 0.5 * ((tr->lhAVG) / ((double)(tr->lhDEC))); 	  
+	}
+      else
+	{
+	  if(tr->itCount == 0)    
+	    tr->lhCutoff = tr->likelihood / -1000.0;    
+	  else    		 
+	    tr->lhCutoff = (tr->lhAVG) / ((double)(tr->lhDEC));   
+	}    
+
+      tr->itCount = tr->itCount + 1;
+      tr->lhAVG = 0;
+      tr->lhDEC = 0;
+    }
+  
+  /*
+    printf("DoCutoff: %d\n", tr->doCutoff);
+    printf("%d %f %f %f\n", tr->itCount, tr->lhAVG, tr->lhDEC, tr->lhCutoff);
+
+    printf("%d %d\n", mintrav, maxtrav);
+  */
+
+  for(i = 1; i <= tr->mxtips + tr->mxtips - 2; i++)
+    {           
+      tr->bestOfNode = unlikely;          
+
+      if(adef->permuteTreeoptimize)
+	index = perm[i];
+      else
+	index = i;     
+
+      if(rearrangeBIG(tr, tr->nodep[index], mintrav, maxtrav))
+	{    
+	  if(Thorough)
+	    {
+	      if(tr->endLH > tr->startLH)                 	
+		{			   	     
+		  restoreTreeFast(tr);	 	 
+		  tr->startLH = tr->endLH = tr->likelihood;	 
+		  saveBestTree(bt, tr, TRUE); 
+		  if(tr->saveBestTrees)
+		    saveBestTree(bestML, tr, FALSE);
+		}
+	      else
+		{ 		  
+		  if(tr->bestOfNode != unlikely)		    	     
+		    restoreTopologyOnly(tr, bt, bestML);		    
+		}	   
+	    }
+	  else
+	    {
+	      insertInfoList(tr->nodep[index], tr->bestOfNode);	    
+	      if(tr->endLH > tr->startLH)                 	
+		{		      
+		  restoreTreeFast(tr);	  	      
+		  tr->startLH = tr->endLH = tr->likelihood;	  	 	  	  	  	  	  	  
+		}	    	  
+	    }
+	}     
+    }     
+
+  if(!Thorough)
+    {           
+      Thorough = 1;  
+      
+      for(i = 0; i < iList.valid; i++)
+	{ 	  
+	  tr->bestOfNode = unlikely;
+	  
+	  if(rearrangeBIG(tr, iList.list[i].node, mintrav, maxtrav))
+	    {	  
+	      if(tr->endLH > tr->startLH)                 	
+		{	 	     
+		  restoreTreeFast(tr);	 	 
+		  tr->startLH = tr->endLH = tr->likelihood;	 
+		  saveBestTree(bt, tr, TRUE);
+		  if(tr->saveBestTrees)
+		    saveBestTree(bestML, tr, FALSE);
+		}
+	      else
+		{ 
+	      
+		  if(tr->bestOfNode != unlikely)
+		    {	     
+		      restoreTopologyOnly(tr, bt, bestML);
+		    }	
+		}      
+	    }
+	}       
+          
+      Thorough = 0;
+    }
+
+  if(adef->permuteTreeoptimize)
+    free(perm);
+
+  return tr->startLH;     
+}
+
+
+
+
+boolean testInsertRestoreBIG (tree *tr, nodeptr p, nodeptr q)
+{    
+  if(Thorough)
+    {
+      if (! insertBIG(tr, p, q, tr->numBranches))       return FALSE;    
+      
+      evaluateGeneric(tr, p->next->next, FALSE);               
+    }
+  else
+    {
+      if (! insertRestoreBIG(tr, p, q))       return FALSE;
+      
+      {
+	nodeptr x, y;
+	x = p->next->next;
+	y = p->back;
+			
+	if(! isTip(x->number, tr->mxtips) && isTip(y->number, tr->mxtips))
+	  {
+	    while ((! x->x)) 
+	      {
+		if (! (x->x))
+		  newviewGeneric(tr, x, FALSE);		     
+	      }
+	  }
+	
+	if(isTip(x->number, tr->mxtips) && !isTip(y->number, tr->mxtips))
+	  {
+	    while ((! y->x)) 
+	      {		  
+		if (! (y->x))
+		  newviewGeneric(tr, y, FALSE);
+	      }
+	  }
+	
+	if(!isTip(x->number, tr->mxtips) && !isTip(y->number, tr->mxtips))
+	  {
+	    while ((! x->x) || (! y->x)) 
+	      {
+		if (! (x->x))
+		  newviewGeneric(tr, x, FALSE);
+		if (! (y->x))
+		  newviewGeneric(tr, y, FALSE);
+	      }
+	  }				      	
+	
+      }
+	
+      tr->likelihood = tr->endLH;
+    }
+     
+  return TRUE;
+} 
+
+void restoreTreeFast(tree *tr)
+{
+  removeNodeRestoreBIG(tr, tr->removeNode);    
+  testInsertRestoreBIG(tr, tr->removeNode, tr->insertNode);
+}
+
+
+static void writeTree(tree *tr, FILE *f)
+{
+  int 
+    x = tr->mxtips + 3 * (tr->mxtips - 1);
+
+  nodeptr
+    base = tr->nodeBaseAddress;
+
+  myBinFwrite(&(tr->start->number), sizeof(int), 1, f);
+  myBinFwrite(&base, sizeof(nodeptr), 1, f);
+  myBinFwrite(tr->nodeBaseAddress, sizeof(node), x, f);
+
+}
+
+int ckpCount = 0;
+
+
+/**
+    gathers the distributed patrat and rateCategory arrays on the master (processID 0)
+ */
+static void gatherDistributedCatInfos(tree *tr, int **rateCategory_result, double **patrat_result)
+{
+  /*
+    countPerProc and displPerProc must be int, since the MPI function
+    signatures demand so
+   */ 
+  
+  int 
+    *countPerProc = (int*)NULL, 
+    *displPerProc = (int*)NULL;
+
+  calculateLengthAndDisplPerProcess(tr,  &countPerProc, &displPerProc);
+  
+  if(processID == 0)
+    {
+      *rateCategory_result = (int*)calloc((size_t)tr->originalCrunchedLength , sizeof(int));
+      *patrat_result       = (double*)calloc((size_t)tr->originalCrunchedLength, sizeof(double));
+    }
+  
+  gatherDistributedArray(tr, (void**) patrat_result,  tr->patrat_basePtr, MPI_DOUBLE , countPerProc, displPerProc); 
+  gatherDistributedArray(tr, (void**) rateCategory_result, tr->rateCategory_basePtr, MPI_INT, countPerProc, displPerProc ); 
+
+  free(countPerProc);
+  free(displPerProc);
+}
+
+
+/** 
+    added parameters patrat and rateCategory. The checkpoint writer
+    has to gather this distributed information first. 
+ */ 
+static void writeCheckpointInner(tree *tr, int *rateCategory, double *patrat, analdef *adef)
+{
+  int   
+    model; 
+  
+  char 
+    extendedName[2048],
+    buf[64];
+
+  FILE 
+    *f;
+
+  /* only master should write the checkpoint */
+  assert(processID == 0); 
+
+  strcpy(extendedName,  binaryCheckpointName);
+  strcat(extendedName, "_");
+  sprintf(buf, "%d", ckpCount);
+  strcat(extendedName, buf);  
+
+  ckpCount++;
+
+  f = myfopen(extendedName, "wb"); 
+  
+
+  ckp.cmd.useMedian = tr->useMedian;
+  ckp.cmd.saveBestTrees = tr->saveBestTrees;
+  ckp.cmd.saveMemory = tr->saveMemory;
+  ckp.cmd.searchConvergenceCriterion = tr->searchConvergenceCriterion;
+  ckp.cmd.perGeneBranchLengths = adef->perGeneBranchLengths; //adef
+  ckp.cmd.likelihoodEpsilon = adef->likelihoodEpsilon; //adef
+  ckp.cmd.categories =  tr->categories;
+  ckp.cmd.mode = adef->mode; //adef
+  ckp.cmd.fastTreeEvaluation =  tr->fastTreeEvaluation;
+  ckp.cmd.initialSet = adef->initialSet;//adef
+  ckp.cmd.initial = adef->initial;//adef
+  ckp.cmd.rateHetModel = tr->rateHetModel;
+  ckp.cmd.autoProteinSelectionType = tr->autoProteinSelectionType;
+
+  ckp.cmd.useQuartetGrouping = adef->useQuartetGrouping;
+  ckp.cmd.numberRandomQuartets = adef->numberRandomQuartets;
+  
+  /* cdta */   
+  
+  ckp.accumulatedTime = accumulatedTime + (gettime() - masterTime);
+  ckp.constraintTree = tr->constraintTree;
+
+  /* printf("Acc time: %f\n", ckp.accumulatedTime); */
+
+  myBinFwrite(&ckp, sizeof(checkPointState), 1, f);
+  
+  if(tr->constraintTree)
+    myBinFwrite(tr->constraintVector, sizeof(int), 2 * tr->mxtips, f);  
+
+  myBinFwrite(tr->tree0, sizeof(char), tr->treeStringLength, f);
+  myBinFwrite(tr->tree1, sizeof(char), tr->treeStringLength, f);
+
+
+  if(tr->rateHetModel == CAT)
+    {
+      myBinFwrite(rateCategory, sizeof(int), tr->originalCrunchedLength, f);
+      myBinFwrite(patrat, sizeof(double), tr->originalCrunchedLength, f);
+    }  
+
+  //end
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+      int 
+	dataType = tr->partitionData[model].dataType;
+            
+      myBinFwrite(&(tr->partitionData[model].numberOfCategories), sizeof(int), 1, f);
+      myBinFwrite(tr->partitionData[model].perSiteRates, sizeof(double), tr->maxCategories, f);
+      myBinFwrite(tr->partitionData[model].EIGN, sizeof(double), pLengths[dataType].eignLength, f);
+      myBinFwrite(tr->partitionData[model].EV, sizeof(double),  pLengths[dataType].evLength, f);
+      myBinFwrite(tr->partitionData[model].EI, sizeof(double),  pLengths[dataType].eiLength, f);  
+
+      myBinFwrite(tr->partitionData[model].freqExponents, sizeof(double),  pLengths[dataType].frequenciesLength, f);
+      myBinFwrite(tr->partitionData[model].frequencies,   sizeof(double),  pLengths[dataType].frequenciesLength, f);
+      myBinFwrite(tr->partitionData[model].tipVector,     sizeof(double),  pLengths[dataType].tipVectorLength, f);       
+      myBinFwrite(tr->partitionData[model].substRates, sizeof(double),  pLengths[dataType].substRatesLength, f);
+
+      //LG4X related variables 
+
+      myBinFwrite(tr->partitionData[model].weights , sizeof(double), 4, f);
+      myBinFwrite(tr->partitionData[model].weightExponents , sizeof(double), 4, f);
+      //myBinFwrite(tr->partitionData[model].weightsBuffer , sizeof(double), 4, f);
+      //myBinFwrite(tr->partitionData[model].weightExponentsBuffer , sizeof(double), 4, f);
+
+      //LG4X end 
+
+      if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+	{
+	  int 
+	    k;
+	  
+	  for(k = 0; k < 4; k++)
+	    {
+	      myBinFwrite(tr->partitionData[model].rawEIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+	      myBinFwrite(tr->partitionData[model].EIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+	      myBinFwrite(tr->partitionData[model].EV_LG4[k], sizeof(double),  pLengths[dataType].evLength, f);
+	      myBinFwrite(tr->partitionData[model].EI_LG4[k], sizeof(double),  pLengths[dataType].eiLength, f);    
+	      myBinFwrite(tr->partitionData[model].frequencies_LG4[k], sizeof(double),  pLengths[dataType].frequenciesLength, f);
+	      myBinFwrite(tr->partitionData[model].tipVector_LG4[k], sizeof(double),  pLengths[dataType].tipVectorLength, f);  
+	      myBinFwrite(tr->partitionData[model].substRates_LG4[k], sizeof(double),  pLengths[dataType].substRatesLength, f);    
+	    }
+	}
+    
+      myBinFwrite(&(tr->partitionData[model].alpha), sizeof(double), 1, f);
+      myBinFwrite(&(tr->partitionData[model].gammaRates), sizeof(double), 4, f);
+      
+      myBinFwrite(&(tr->partitionData[model].protModels), sizeof(int), 1, f);
+      myBinFwrite(&(tr->partitionData[model].autoProtModels), sizeof(int), 1, f);
+    }
+    
+  if(ckp.state == MOD_OPT)
+    {
+      myBinFwrite(tr->likelihoods, sizeof(double), tr->numberOfTrees, f);
+      myBinFwrite(tr->treeStrings, sizeof(char), (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees, f);
+    }
+
+  writeTree(tr, f);
+
+  fclose(f); 
+
+  /* printBothOpen("\nCheckpoint written to: %s likelihood: %f\n", extendedName, tr->likelihood); */
+}
+
+
+void writeCheckpoint(tree *tr, analdef *adef)
+{
+  int 
+    *rateCategory = (int *)NULL; 
+  
+  double 
+    *patrat = (double *)NULL; 
+
+  if(tr->rateHetModel == CAT)
+    gatherDistributedCatInfos(tr, &rateCategory, &patrat); 
+
+  if(processID == 0)
+    {
+      writeCheckpointInner(tr, rateCategory, patrat, adef); 
+
+      if(tr->rateHetModel == CAT)
+	{
+	  free(rateCategory); 
+	  free(patrat); 
+	}
+    }
+}
+
+
+
+
+static void readTree(tree *tr, FILE *f)
+{
+  int 
+    nodeNumber,   
+    x = tr->mxtips + 3 * (tr->mxtips - 1);
+
+ 
+
+  
+  
+  nodeptr
+    startAddress;
+
+  myBinFread(&nodeNumber, sizeof(int), 1, f);
+
+  tr->start = tr->nodep[nodeNumber];
+
+  /*printf("Start: %d %d\n", tr->start->number, nodeNumber);*/
+
+  myBinFread(&startAddress, sizeof(nodeptr), 1, f);
+
+  /*printf("%u %u\n", (size_t)startAddress, (size_t)tr->nodeBaseAddress);*/
+
+
+
+  myBinFread(tr->nodeBaseAddress, sizeof(node), x, f);
+    
+  {
+    int i;    
+
+    size_t         
+      offset;
+
+    boolean 
+      addIt;
+
+    if(startAddress > tr->nodeBaseAddress)
+      {
+	addIt = FALSE;
+	offset = (size_t)startAddress - (size_t)tr->nodeBaseAddress;
+      }
+    else
+      {
+	addIt = TRUE;
+	offset = (size_t)tr->nodeBaseAddress - (size_t)startAddress;
+      }       
+
+    for(i = 0; i < x; i++)
+      {      	
+	if(addIt)
+	  {	    
+	    tr->nodeBaseAddress[i].next = (nodeptr)((size_t)tr->nodeBaseAddress[i].next + offset);	
+	    tr->nodeBaseAddress[i].back = (nodeptr)((size_t)tr->nodeBaseAddress[i].back + offset);
+	  }
+	else
+	 {
+	  
+	   tr->nodeBaseAddress[i].next = (nodeptr)((size_t)tr->nodeBaseAddress[i].next - offset);	
+	   tr->nodeBaseAddress[i].back = (nodeptr)((size_t)tr->nodeBaseAddress[i].back - offset);	   
+	 } 
+      }
+
+  }
+  
+  evaluateGeneric(tr, tr->start, TRUE);  
+  
+  if(ckp.state != QUARTETS)
+    printBothOpen("ExaML Restart with likelihood: %1.50f\n", tr->likelihood);
+}
+
+static void genericError(void)
+{
+  printBothOpen("\nError: command lines used in initial run and re-start from checkpoint do not match!\n");
+}
+
+static void checkCommandLineArguments(tree *tr, analdef *adef)
+{
+  boolean
+    match = TRUE;
+
+  if(ckp.cmd.useMedian != tr->useMedian)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in median for gamma option: -a\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.saveBestTrees != tr->saveBestTrees)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in tree saving option: -B\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.saveMemory != tr->saveMemory)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in memory saving option: -S\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.searchConvergenceCriterion != tr->searchConvergenceCriterion)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in search convergence criterion: -D\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.perGeneBranchLengths != adef->perGeneBranchLengths)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in using per-partition branch lengths: -M\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.likelihoodEpsilon != adef->likelihoodEpsilon)
+     {
+      genericError();
+      printBothOpen("\nDisagreement in likelihood epsilon value: -e\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.categories !=  tr->categories)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in number of PSR rate categories: -c\n");
+      match = FALSE;
+    }
+
+  if(ckp.cmd.mode != adef->mode)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in tree search or evaluation mode\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.fastTreeEvaluation !=  tr->fastTreeEvaluation)
+    {
+      genericError();
+      printBothOpen("\nDisagreement in fast tree evaluation: -e|-E\n");
+      match = FALSE;
+    }
+  
+  
+
+  if(ckp.cmd.initialSet != adef->initialSet)
+     {
+      genericError();
+      printBothOpen("\nDisagreement in rearrangement radius limitation setting: -i\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.initial != adef->initial)
+     {
+      genericError();
+      printBothOpen("\nDisagreement in rearrangement radius value: -i\n");
+      match = FALSE;
+    }
+  
+  if(ckp.cmd.rateHetModel != tr->rateHetModel)
+     {
+      genericError();
+      printBothOpen("\nDisagreement in rate heterogeneity model: -m\n");
+      match = FALSE;
+    }
+
+   if(ckp.cmd.autoProteinSelectionType != tr->autoProteinSelectionType)
+     {
+      genericError();
+      printBothOpen("\nDisagreement in protein model selection criterion: --auto-prot\n");
+      match = FALSE;
+    }
+
+   if(ckp.cmd.useQuartetGrouping != adef->useQuartetGrouping)
+     {
+       genericError();
+       printBothOpen("\nDisagreement in quartet grouping option: -Y\n");
+       match = FALSE;
+     }
+
+   if(ckp.cmd.numberRandomQuartets != adef->numberRandomQuartets)
+     { 
+       genericError();
+       printBothOpen("\nDisagreement in number of random quartet subsamples: -r\n");
+       match = FALSE;
+     }
+
+  if(!match)
+    {
+      printBothOpen("\nExaML will exit now ...\n\n");
+      errorExit(-1);
+    }
+}
+
+static void readCheckpoint(tree *tr, analdef *adef)
+{
+  int   
+    model; 
+
+  FILE 
+    *f = myfopen(binaryCheckpointInputName, "rb");
+
+  /* cdta */   
+
+  myBinFread(&ckp, sizeof(checkPointState), 1, f);
+
+  checkCommandLineArguments(tr, adef);
+
+  tr->constraintTree = ckp.constraintTree;
+
+  if(tr->constraintTree)
+    myBinFread(tr->constraintVector, sizeof(int), 2 * tr->mxtips, f);  
+
+  tr->ntips = tr->mxtips;
+
+  
+
+  tr->startLH    = ckp.tr_startLH;
+  tr->endLH      = ckp.tr_endLH;
+  tr->likelihood = ckp.tr_likelihood;
+  tr->bestOfNode = ckp.tr_bestOfNode;
+  
+  tr->lhCutoff   = ckp.tr_lhCutoff;
+  tr->lhAVG      = ckp.tr_lhAVG;
+  tr->lhDEC      = ckp.tr_lhDEC;
+  tr->itCount    = ckp.tr_itCount;
+  Thorough       = ckp.Thorough;
+  
+  accumulatedTime = ckp.accumulatedTime;
+
+  /* printf("Accumulated time so far: %f\n", accumulatedTime); */
+
+  optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;
+
+
+  myBinFread(tr->tree0, sizeof(char), tr->treeStringLength, f);
+  myBinFread(tr->tree1, sizeof(char), tr->treeStringLength, f);
+
+  if(tr->searchConvergenceCriterion && processID == 0)
+    {
+      int bCounter = 0;
+      
+      if((ckp.state == FAST_SPRS && ckp.fastIterations > 0) ||
+	 (ckp.state == SLOW_SPRS && ckp.thoroughIterations > 0))
+	{ 
+
+#ifdef _DEBUG_CHECKPOINTING    
+	  printf("parsing Tree 0\n");
+#endif
+
+	  treeReadTopologyString(tr->tree0, tr);   
+	  
+	  bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, 0, BIPARTITIONS_RF, (branchInfo *)NULL,
+				  &bCounter, 1, FALSE, FALSE);
+	  
+	  assert(bCounter == tr->mxtips - 3);
+	}
+      
+      bCounter = 0;
+      
+      if((ckp.state == FAST_SPRS && ckp.fastIterations > 1) ||
+	 (ckp.state == SLOW_SPRS && ckp.thoroughIterations > 1))
+	{
+
+#ifdef _DEBUG_CHECKPOINTING
+	  printf("parsing Tree 1\n");
+#endif
+
+	  treeReadTopologyString(tr->tree1, tr); 
+	  
+	  bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, 1, BIPARTITIONS_RF, (branchInfo *)NULL,
+				  &bCounter, 1, FALSE, FALSE);
+	  
+	  assert(bCounter == tr->mxtips - 3);
+	}
+    }
+
+  
+  if(tr->rateHetModel == CAT )
+    {
+      /* every process reads its data */
+
+      /* Andre will this also work if we re-start with a different
+	 number of processors? have you tested? 
+	 
+	 => Andre: yes that works: before writing the checkpoint, we
+	 gather all lhs/patrat with gatherDistributedCatInfos. This
+	 function calls gatherDistributedArray in
+	 communication.c. gatherDistributedArray takes care of
+	 reordering the data it obtained from the various processes,
+	 such that the correct global array (i.e., indexing consistent with
+	 character position) is obtained.  Thus, the indexing below
+	 (for reading in the patrat/lhs again) works correctly.  */
+      
+
+      /* Andre I think tr->originalCrunchedLength is of type size_t???
+	 -> casting required ?  
+	 
+	 Andre: in the very worst case, pPos overflows. There is not
+	 much one can do here. See explanation about fseek/fseeko at
+	 other location. But I have added an assert, in case something
+	 goes wrong */
+      exa_off_t
+	rPos = exa_ftell(f),      
+	pPos  = rPos + sizeof(int) * tr->originalCrunchedLength; 
+
+      /* fails, in case reading failed (ftello returns -1) or an overflow happened */
+      assert( ! ( rPos < 0 || pPos < 0 ) && rPos <= pPos ) ;
+      
+      /* read patrat first, then rateCategory (on disk, rateCategory is stored first) */
+      
+      Assign *aIter =  tr->partAssigns,
+	*aEnd = &(tr->partAssigns [ tr->numAssignments ]) ; 
+      
+      /* Andre could you maybe add a drawing (scanned drawn by hand if you like) documenting this layout ? => Andre: TODO  */
+
+      while(aIter != aEnd)
+	{
+	  if(aIter->procId == processID)
+	    {
+	      pInfo
+		*partition = &(tr->partitionData[aIter->partitionId]); 
+	      exa_off_t
+		theOffset = pPos + (partition->lower + aIter->offset)  * sizeof(double); 
+	      assert(pPos <= theOffset); 
+
+	      exa_fseek(f,  theOffset, SEEK_SET); 
+	      
+	      myBinFread(partition->patrat, sizeof(double), aIter->width, f);  
+
+	      theOffset = rPos + (partition->lower + aIter->offset) * sizeof(int); 
+	      assert(rPos <= theOffset); 
+	      exa_fseek(f, theOffset, SEEK_SET); 
+	      myBinFread(partition->rateCategory, sizeof(int), aIter->width, f); 
+	    } 
+	  ++aIter; 
+	}
+
+      /* Set file pointer to the end of both of the arrays    */
+      exa_fseek(f, pPos + tr->originalCrunchedLength * sizeof(double) , SEEK_SET); 
+    }
+
+
+  
+  
+ 
+
+  //end
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {
+      int 
+	dataType = tr->partitionData[model].dataType;
+            
+      myBinFread(&(tr->partitionData[model].numberOfCategories), sizeof(int), 1, f);
+      myBinFread(tr->partitionData[model].perSiteRates, sizeof(double), tr->maxCategories, f);
+      myBinFread(tr->partitionData[model].EIGN, sizeof(double), pLengths[dataType].eignLength, f);
+      myBinFread(tr->partitionData[model].EV, sizeof(double),  pLengths[dataType].evLength, f);
+      myBinFread(tr->partitionData[model].EI, sizeof(double),  pLengths[dataType].eiLength, f);  
+
+      myBinFread(tr->partitionData[model].freqExponents, sizeof(double),  pLengths[dataType].frequenciesLength, f);
+      myBinFread(tr->partitionData[model].frequencies, sizeof(double),  pLengths[dataType].frequenciesLength, f);
+      myBinFread(tr->partitionData[model].tipVector, sizeof(double),  pLengths[dataType].tipVectorLength, f);  
+      myBinFread(tr->partitionData[model].substRates, sizeof(double),  pLengths[dataType].substRatesLength, f);  
+
+      //LG4X related variables 
+
+      myBinFread(tr->partitionData[model].weights , sizeof(double), 4, f);
+      myBinFread(tr->partitionData[model].weightExponents , sizeof(double), 4, f);
+      //myBinFread(tr->partitionData[model].weightsBuffer , sizeof(double), 4, f);
+      //myBinFread(tr->partitionData[model].weightExponentsBuffer , sizeof(double), 4, f);
+
+      //LG4X end 
+
+      if(tr->partitionData[model].protModels == LG4X || tr->partitionData[model].protModels == LG4M)
+	{
+	  int 
+	    k;
+	  
+	  for(k = 0; k < 4; k++)
+	    {
+	       myBinFread(tr->partitionData[model].rawEIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+	      myBinFread(tr->partitionData[model].EIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+	      myBinFread(tr->partitionData[model].EV_LG4[k], sizeof(double),  pLengths[dataType].evLength, f);
+	      myBinFread(tr->partitionData[model].EI_LG4[k], sizeof(double),  pLengths[dataType].eiLength, f);    
+	      myBinFread(tr->partitionData[model].frequencies_LG4[k], sizeof(double),  pLengths[dataType].frequenciesLength, f);
+	      myBinFread(tr->partitionData[model].tipVector_LG4[k], sizeof(double),  pLengths[dataType].tipVectorLength, f);  
+	      myBinFread(tr->partitionData[model].substRates_LG4[k], sizeof(double),  pLengths[dataType].substRatesLength, f);    
+	    }
+	}
+
+
+      myBinFread(&(tr->partitionData[model].alpha), sizeof(double), 1, f);      
+      myBinFread(&(tr->partitionData[model].gammaRates), sizeof(double), 4, f);
+      //conditional added by Andre modified by me
+      //only overwrite values of discrete gamma cats by calling makeGammaCats if not using 
+      //LG4X!
+      if(tr->rateHetModel != CAT && !(tr->partitionData[model].protModels == LG4X))
+	makeGammaCats(tr->partitionData[model].alpha, tr->partitionData[model].gammaRates, 4, tr->useMedian); 
+
+      myBinFread(&(tr->partitionData[model].protModels), sizeof(int), 1, f);
+      myBinFread(&(tr->partitionData[model].autoProtModels), sizeof(int), 1, f);
+    }
+    
+  if(ckp.state == MOD_OPT)
+    {
+      myBinFread(tr->likelihoods, sizeof(double), tr->numberOfTrees, f);
+      myBinFread(tr->treeStrings, sizeof(char), (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees, f);
+    }
+
+  if(tr->rateHetModel == CAT)
+    checkPerSiteRates(tr); 
+
+  readTree(tr, f);
+  fclose(f); 
+}
+
+
+void restart(tree *tr, analdef *adef)
+{  
+  readCheckpoint(tr, adef);
+
+  switch(ckp.state)
+    {
+    case REARR_SETTING:      
+      assert(adef->mode == BIG_RAPID_MODE);
+      break;
+    case FAST_SPRS:
+      assert(adef->mode == BIG_RAPID_MODE);
+      break;
+    case SLOW_SPRS:
+      assert(adef->mode == BIG_RAPID_MODE);
+      break;
+    case MOD_OPT:
+      assert(adef->mode == TREE_EVALUATION);
+      break;
+    case QUARTETS:
+      assert(adef->mode == QUARTET_CALCULATION);
+      break;
+    default:
+      assert(0);
+    }
+}
+
+int determineRearrangementSetting(tree *tr,  analdef *adef, bestlist *bestT, bestlist *bt, bestlist *bestML)
+{
+  const 
+    int MaxFast = 26;
+  
+  int 
+    i,   
+    maxtrav = 5, 
+    bestTrav = 5;
+
+  double 
+    startLH = tr->likelihood; 
+  
+  boolean 
+    impr   = TRUE,
+    cutoff = tr->doCutoff;
+   
+  if(adef->useCheckpoint)
+    {
+      assert(ckp.state == REARR_SETTING);
+         
+      maxtrav = ckp.maxtrav;
+      bestTrav = ckp.bestTrav;
+      startLH  = ckp.startLH;
+      impr     = ckp.impr;
+      
+      cutoff = ckp.cutoff;
+
+      adef->useCheckpoint = FALSE;
+    }
+  
+  tr->doCutoff = FALSE;      
+
+  resetBestTree(bt);    
+ 
+#ifdef _DEBUG_CHECKPOINTING
+  printBothOpen("MAXTRAV: %d\n", maxtrav);
+#endif
+
+  assert(Thorough == 0);
+
+  while(impr && maxtrav < MaxFast)
+    {	
+      recallBestTree(bestT, 1, tr);     
+      nodeRectifier(tr);            
+      
+      /* Andre I believe that the code below, except for
+	 writeCheckpoint can still only be executed by process 0 =>
+	 Andre: all other processes need to enter writeCheckpoint,
+	 because of the gather that happens there. But the assignments
+	 to the checkpoint state are not necessary for all processes;
+	 does it matter? */
+      {
+	ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;
+	  
+	ckp.cutoff = cutoff;
+	ckp.state = REARR_SETTING;     
+	ckp.maxtrav = maxtrav;
+	ckp.bestTrav = bestTrav;
+	ckp.startLH  = startLH;
+	ckp.impr = impr;
+	  
+	ckp.tr_startLH  = tr->startLH;
+	ckp.tr_endLH    = tr->endLH;
+	ckp.tr_likelihood = tr->likelihood;
+	ckp.tr_bestOfNode = tr->bestOfNode;
+	  
+	ckp.tr_lhCutoff = tr->lhCutoff;
+	ckp.tr_lhAVG    = tr->lhAVG;
+	ckp.tr_lhDEC    = tr->lhDEC;      
+	ckp.tr_itCount  = tr->itCount;
+	  
+	  
+	writeCheckpoint(tr, adef);    
+      }
+
+      if (maxtrav > tr->mxtips - 3)  
+	maxtrav = tr->mxtips - 3;    
+ 
+      tr->startLH = tr->endLH = tr->likelihood;
+      
+      /* printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc); */
+
+      {
+	int changes = 0;
+	
+	for(i = 1; i <= tr->mxtips + tr->mxtips - 2; i++)
+	  {                	         
+	    tr->bestOfNode = unlikely;
+	    
+	    if(rearrangeBIG(tr, tr->nodep[i], 1, maxtrav))
+	      {	     
+		if(tr->endLH > tr->startLH)                 	
+		  {		 	 	      
+		    restoreTreeFast(tr);	        	  	 	  	      
+		    tr->startLH = tr->endLH = tr->likelihood;			  
+		    changes++;
+		  }	         	       	
+	      }
+	  }
+	
+      
+	/*
+	  evaluateGeneric(tr, tr->start, TRUE);	
+	  
+	  printBothOpen("Changes: %d TRAV: %d lh %f MNZC %d\n", changes, maxtrav, tr->likelihood, mnzc);
+	*/      
+      }
+      
+      treeEvaluate(tr, 0.25);
+
+      /* printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc); */
+
+      saveBestTree(bt, tr, TRUE); 
+      if(tr->saveBestTrees)
+	saveBestTree(bestML, tr, FALSE);           
+                                         
+#ifdef _DEBUG_CHECKPOINTING
+      printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc);
+#endif
+
+      if(tr->likelihood > startLH)
+	{	 
+	  startLH = tr->likelihood; 	  	  	  
+	  printLog(tr);	  
+	  bestTrav = maxtrav;	 
+	  impr = TRUE;
+	}
+      else	
+	impr = FALSE;	
+      
+      
+      
+      if(tr->doCutoff)
+	{
+	  tr->lhCutoff = (tr->lhAVG) / ((double)(tr->lhDEC));       
+  
+	  tr->itCount =  tr->itCount + 1;
+	  tr->lhAVG = 0;
+	  tr->lhDEC = 0;
+	}
+      
+      maxtrav += 5;
+      
+             
+    }
+
+  recallBestTree(bt, 1, tr);
+  
+  tr->doCutoff = cutoff; 
+  
+#ifdef _DEBUG_CHECKPOINTING
+  printBothOpen("BestTrav %d\n", bestTrav);
+#endif
+
+  return bestTrav;     
+}
+
+
+
+
+
+void computeBIGRAPID (tree *tr, analdef *adef, boolean estimateModel) 
+{   
+  int
+    i,
+    impr, 
+    bestTrav = 0,
+    treeVectorLength = 0,
+    rearrangementsMax = 0, 
+    rearrangementsMin = 0,    
+    thoroughIterations = 0,
+    fastIterations = 0;
+   
+  double 
+    lh = unlikely, 
+    previousLh = unlikely, 
+    difference, 
+    epsilon;              
+  
+  bestlist 
+    *bestML,
+    *bestT, 
+    *bt;        
+ 
+  /* now here is the RAxML hill climbing search algorithm */
+  
+  tr->lhAVG = 0.0;
+  tr->lhDEC = 0.0;
+
+  /* initialization for the hash table to compute RF distances */
+
+  if(tr->searchConvergenceCriterion && processID == 0)   
+    treeVectorLength = 1;
+     
+  /* initialize two lists of size 1 and size 20 that will keep track of the best 
+     and 20 best tree topologies respectively */
+
+  bestT = (bestlist *) malloc(sizeof(bestlist));
+  bestT->ninit = 0;
+  initBestTree(bestT, 1, tr->mxtips);
+      
+  bt = (bestlist *) malloc(sizeof(bestlist));      
+  bt->ninit = 0;
+  initBestTree(bt, 20, tr->mxtips);    
+
+
+
+  if(tr->saveBestTrees > 0)
+    { 
+      bestML = (bestlist *) malloc(sizeof(bestlist));      
+      bestML->ninit = 0;
+      initBestTree(bestML, tr->saveBestTrees, tr->mxtips);  
+    }
+  else
+    bestML = (bestlist *)NULL;
+  
+  
+  /* initialize an additional data structure used by the search algo, all of this is pretty 
+     RAxML-specific and should probably not be in the library */
+
+  initInfoList(50);
+ 
+  /* some pretty arbitrary thresholds */
+
+  difference = 10.0;
+  epsilon = 0.01;    
+    
+  /* Thorough = 0 means that we will do fast SPR insertions without optimizing the 
+     three branches adjacent to the subtree insertion position via Newton-Raphson 
+  */
+
+  Thorough = 0;     
+  
+  /* if we are not using a checkpoint and estimateModel is set to TRUE we call the function 
+     that optimizes model parameters, such as the CAT model assignment, the alpha parameter
+     or the rates in the GTR matrix. Otherwise we just optimize the branch lengths. Note that 
+     the second parameter of treeEvaluate() scales the maximum number of passes over all branches 
+     of the tree before we give up, provided that the branch length optimization has not converged earlier.
+  */
+
+  if(!adef->useCheckpoint)
+    {
+      if(estimateModel)
+	modOpt(tr, 10.0, adef, 0);
+      else
+	treeEvaluate(tr, 2);  
+    }
+
+  /* print some stuff to the RAxML_log file */
+
+  printLog(tr); 
+
+  /* save the current tree (which is the input tree parsed via -t) in the bestT list */
+
+  saveBestTree(bestT, tr, TRUE);
+  
+  /* if the rearrangement radius has been set by the user, i.e., adef->initialSet == TRUE, 
+     then just set the appropriate parameter.
+     Otherwise, call the function determineRearrangementSetting(), which searches 
+     for the best radius by executing SPR moves on the initial tree with different radii
+     and returns the smallest radius that yields the best log likelihood score after 
+     applying one cycle of SPR moves to the tree 
+  */
+
+  if(!adef->initialSet)   
+    {
+      if((!adef->useCheckpoint) || (adef->useCheckpoint && ckp.state == REARR_SETTING))
+	{
+	  bestTrav = adef->bestTrav = determineRearrangementSetting(tr, adef, bestT, bt, bestML);     	  
+	  printBothOpen("\nBest rearrangement radius: %d\n", bestTrav);
+	}
+    }
+  else
+    {
+      bestTrav = adef->bestTrav = adef->initial;       
+      printBothOpen("\nUser-defined rearrangement radius: %d\n", bestTrav);
+    }
+
+  
+  /* some checkpointing noise */
+  if(!(adef->useCheckpoint && (ckp.state == FAST_SPRS || ckp.state == SLOW_SPRS)))
+    {      
+
+      /* optimize model params more thoroughly or just optimize branch lengths */
+      if(estimateModel)
+	modOpt(tr, 5.0, adef, 0);
+      else
+	treeEvaluate(tr, 1);   
+    }
+  
+  /* save the current tree again, while the topology has not changed, the branch lengths have changed in the meantime, hence
+     we need to store them again */
+
+  saveBestTree(bestT, tr, TRUE); 
+
+  /* set the loop variable to TRUE */
+
+  impr = 1;
+
+  /* this is for the additional RAxML heuristics described in this paper here:
+
+     A. Stamatakis,  F. Blagojevic, C.D. Antonopoulos, D.S. Nikolopoulos: "Exploring new Search Algorithms and Hardware for Phylogenetics: RAxML meets the IBM Cell". 
+     In Journal of VLSI Signal Processing Systems, 48(3):271-286, 2007.
+
+     This is turned on by default 
+  */
+     
+  
+  if(tr->doCutoff)
+    tr->itCount = 0;
+
+  /* figure out where to continue computations if we restarted from a checkpoint */
+
+  if(adef->useCheckpoint && ckp.state == FAST_SPRS)
+    goto START_FAST_SPRS;
+
+  if(adef->useCheckpoint && ckp.state == SLOW_SPRS)
+    goto START_SLOW_SPRS;
+  
+  while(impr)
+    {              
+    START_FAST_SPRS:
+      /* if re-starting from checkpoint set the required variable values to the 
+	 values that they had when the checkpoint was written */
+
+      if(adef->useCheckpoint && ckp.state == FAST_SPRS)
+	{
+	  optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;   	
+
+  
+	  impr = ckp.impr;
+	  Thorough = ckp.Thorough;
+	  bestTrav = ckp.bestTrav;
+	  treeVectorLength = ckp.treeVectorLength;
+	  rearrangementsMax = ckp.rearrangementsMax;
+	  rearrangementsMin = ckp.rearrangementsMin;
+	  thoroughIterations = ckp.thoroughIterations;
+	  fastIterations = ckp.fastIterations;
+   
+  
+	  lh = ckp.lh;
+	  previousLh = ckp.previousLh;
+	  difference = ckp.difference;
+	  epsilon    = ckp.epsilon;                    
+           
+	
+	  tr->likelihood = ckp.tr_likelihood;
+              
+	  tr->lhCutoff = ckp.tr_lhCutoff;
+	  tr->lhAVG    = ckp.tr_lhAVG;
+	  tr->lhDEC    = ckp.tr_lhDEC;   	 
+	  tr->itCount = ckp.tr_itCount;	  
+
+	  adef->useCheckpoint = FALSE;
+	}
+      else
+	/* otherwise, restore the currently best tree */
+	recallBestTree(bestT, 1, tr); 
+
+      /* save states of algorithmic/heuristic variables for printing the next checkpoint */
+
+      /* 
+	 Andre I believe that the code below, except for
+	 writeCheckpoint can still only be executed by process 0 =>
+	 Andre: see above
+       */ 
+      {              
+	ckp.state = FAST_SPRS;  
+	ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;              
+	  
+	  
+	ckp.impr = impr;
+	ckp.Thorough = Thorough;
+	ckp.bestTrav = bestTrav;
+	ckp.treeVectorLength = treeVectorLength;
+	ckp.rearrangementsMax = rearrangementsMax;
+	ckp.rearrangementsMin = rearrangementsMin;
+	ckp.thoroughIterations = thoroughIterations;
+	ckp.fastIterations = fastIterations;
+	  
+	  
+	ckp.lh = lh;
+	ckp.previousLh = previousLh;
+	ckp.difference = difference;
+	ckp.epsilon    = epsilon; 
+	  
+	  
+	ckp.bestTrav = bestTrav;       
+	ckp.impr = impr;
+	  
+	ckp.tr_startLH  = tr->startLH;
+	ckp.tr_endLH    = tr->endLH;
+	ckp.tr_likelihood = tr->likelihood;
+	ckp.tr_bestOfNode = tr->bestOfNode;
+	  
+	ckp.tr_lhCutoff = tr->lhCutoff;
+	ckp.tr_lhAVG    = tr->lhAVG;
+	ckp.tr_lhDEC    = tr->lhDEC;       
+	ckp.tr_itCount  = tr->itCount;       
+	  
+	/* write a binary checkpoint */
+	writeCheckpoint(tr, adef); 
+      }	
+
+      /* this is the aforementioned convergence criterion that requires computing the RF,
+	 let's not worry about this right now */
+
+      if(tr->searchConvergenceCriterion && processID == 0)
+	{
+	  int 
+	    bCounter = 0; 
+
+	  char 
+	    *buffer = (char*)calloc(tr->treeStringLength, sizeof(char));	  	      	 	  	  	
+
+	  if(fastIterations > 1)	    	      
+	    cleanupHashTable(tr->h, (fastIterations % 2));		
+	    	  	 
+
+	  bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, fastIterations % 2, BIPARTITIONS_RF, (branchInfo *)NULL,
+				  &bCounter, 1, FALSE, FALSE);	    
+	  	  
+	   
+#ifdef _DEBUG_CHECKPOINTING
+	  printf("Storing tree in slot %d\n", fastIterations % 2);
+#endif
+
+	  Tree2String(buffer, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+
+	  if(fastIterations % 2 == 0)	      
+	    memcpy(tr->tree0, buffer, tr->treeStringLength * sizeof(char));
+	  else
+	    memcpy(tr->tree1, buffer, tr->treeStringLength * sizeof(char));	    
+	  
+	  free(buffer);	  
+
+	  assert(bCounter == tr->mxtips - 3);	    	   	  	 
+
+	  if(fastIterations > 0)
+	    {
+	      double 
+		rrf = convergenceCriterion(tr->h, tr->mxtips);
+	      
+	      MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+	      
+	      if(rrf <= 0.01) /* 1% cutoff */
+		{
+		  printBothOpen("ML fast search converged at fast SPR cycle %d with stopping criterion\n", fastIterations);
+		  printBothOpen("Relative Robinson-Foulds (RF) distance between respective best trees after one successful SPR cycle: %f%s\n", rrf, "%");
+		  cleanupHashTable(tr->h, 0);
+		  cleanupHashTable(tr->h, 1);
+		  goto cleanup_fast;
+		}
+	      else		    
+		printBothOpen("ML search convergence criterion fast cycle %d->%d Relative Robinson-Foulds %f\n", fastIterations - 1, fastIterations, rrf);
+	    }
+	}
+
+      if(tr->searchConvergenceCriterion && processID != 0 && fastIterations > 0)
+	{
+	  double 
+	    rrf;
+	  
+	  MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+	 
+	  if(rrf <= 0.01) /* 1% cutoff */		   
+	    goto cleanup_fast;	      
+	}
+
+      
+      /* count how many fast iterations with so-called fast SPR moves we have executed */
+
+      fastIterations++;	
+
+      /* optimize branch lengths */
+     
+     
+      treeEvaluate(tr, 1.0);    
+     
+      /* save the tree with those branch lengths again */
+      
+      saveBestTree(bestT, tr, TRUE);           
+
+      /* print the log likelihood */
+
+      printLog(tr);    
+
+      /* print this intermediate tree to file */
+     
+      printResult(tr, adef, FALSE);    
+
+      /* update the current best likelihood */
+
+      lh = previousLh = tr->likelihood;
+            
+      /* in here we actually do a cycle of SPR moves */
+
+      treeOptimizeRapid(tr, 1, bestTrav, adef, bt, bestML);   
+          
+      /* set impr to 0 since in the immediately following for loop we check if the SPR moves above have generated 
+	 a better tree */
+
+      impr = 0;
+	  
+      /* loop over the 20 best trees generated by the fast SPR moves, and check if they improve the likelihood after all of their branch lengths
+	 have been optimized */
+
+      for(i = 1; i <= bt->nvalid; i++)
+	{	    	
+	  /* restore tree i from list generated by treeOptimizeRapid */
+	  	   
+	  recallBestTree(bt, i, tr);
+	  
+	  /* optimize branch lengths of this tree */
+
+	  treeEvaluate(tr, 0.25);
+
+	  /* calc. the likelihood improvement */
+
+	  difference = ((tr->likelihood > previousLh)? 
+			tr->likelihood - previousLh: 
+			previousLh - tr->likelihood); 	    
+
+	  /* if the likelihood has improved save the current tree as best tree and continue */
+	  /* note that we always compare this tree to the likelihood of the previous best tree */
+	  
+	  if(tr->likelihood > lh && difference > epsilon)
+	    {
+	      impr = 1;	       
+	      lh = tr->likelihood;	       	     
+	      saveBestTree(bestT, tr, TRUE);
+	      
+	    }	   	   
+	}
+#ifdef _DEBUG_CHECKPOINTING
+      printBothOpen("FAST LH: %f\n", lh);
+#endif
+
+	
+    }
+  
+  /* needed for this RF-based convergence criterion that I actually describe in here:
+
+     A. Stamatakis: "Phylogenetic Search Algorithms for Maximum Likelihood". In M. Elloumi, A.Y. Zomaya, editors. 
+     Algorithms in Computational Biology: techniques, Approaches and Applications, John Wiley and Sons
+
+     a copy of this book is in my office */
+
+  if(tr->searchConvergenceCriterion && processID == 0)
+    {
+      cleanupHashTable(tr->h, 0);
+      cleanupHashTable(tr->h, 1);
+    }
+  
+ cleanup_fast:  
+  /*
+     now we have jumped out of the loop that executes 
+     fast SPRs, and next we will execute a loop that executes thorough SPR cycles (with SPR moves 
+     that optimize via Newton-Raphson all branches adjacent to the insertion point) 
+     until no thorough SPR move can be found that improves the likelihood further. A classic 
+     hill climbing algorithm.
+  */
+
+  Thorough = 1;
+  impr = 1;
+  
+  /* restore the currently best tree. this is actually required, because we do not know which tree
+     is actually stored in the tree data structure when the above loop exits */
+
+  recallBestTree(bestT, 1, tr); 
+  
+  /* RE-TRAVERSE THE ENTIRE TREE */
+  
+  evaluateGeneric(tr, tr->start, TRUE);
+#ifdef _DEBUG_CHECKPOINTING
+  printBothOpen("After Fast SPRs Final %f\n", tr->likelihood);   
+#endif
+    
+  /* optimize model params (including branch lengths) or just 
+     optimize branch lengths and leave the other model parameters (GTR rates, alpha) 
+     alone */
+
+  if(estimateModel)
+    modOpt(tr, 1.0, adef, 0);
+  else
+    treeEvaluate(tr, 1.0);
+
+  /* start loop that executes thorough SPR cycles */
+
+  while(1)
+    {	 
+      /* once again if we want to restart from a checkpoint that was written during this loop we need
+	 to restore the values of the variables appropriately */
+    START_SLOW_SPRS:
+      if(adef->useCheckpoint && ckp.state == SLOW_SPRS)
+	{
+	  optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;   
+      
+	
+
+  
+	  impr = ckp.impr;
+	  Thorough = ckp.Thorough;
+	  bestTrav = ckp.bestTrav;
+	  treeVectorLength = ckp.treeVectorLength;
+	  rearrangementsMax = ckp.rearrangementsMax;
+	  rearrangementsMin = ckp.rearrangementsMin;
+	  thoroughIterations = ckp.thoroughIterations;
+	  fastIterations = ckp.fastIterations;
+   
+  
+	  lh = ckp.lh;
+	  previousLh = ckp.previousLh;
+	  difference = ckp.difference;
+	  epsilon    = ckp.epsilon;                    
+           
+	
+	  tr->likelihood = ckp.tr_likelihood;
+              
+	  tr->lhCutoff = ckp.tr_lhCutoff;
+	  tr->lhAVG    = ckp.tr_lhAVG;
+	  tr->lhDEC    = ckp.tr_lhDEC;   	 
+	  tr->itCount = ckp.tr_itCount;	 
+
+	  adef->useCheckpoint = FALSE;
+	}
+      else
+	/* otherwise we restore the currently best tree and load it from bestT into our tree data 
+	   structure tr */
+	recallBestTree(bestT, 1, tr);
+
+      /* now, we write a checkpoint */
+      /* Andre I believe that the code below, except for
+	 writeCheckpoint can still only be executed by process 0
+	 => Andre: see above */
+	{              
+	  ckp.state = SLOW_SPRS;  
+	  ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;              
+	  
+	  
+	  ckp.impr = impr;
+	  ckp.Thorough = Thorough;
+	  ckp.bestTrav = bestTrav;
+	  ckp.treeVectorLength = treeVectorLength;
+	  ckp.rearrangementsMax = rearrangementsMax;
+	  ckp.rearrangementsMin = rearrangementsMin;
+	  ckp.thoroughIterations = thoroughIterations;
+	  ckp.fastIterations = fastIterations;
+	  
+	  
+	  ckp.lh = lh;
+	  ckp.previousLh = previousLh;
+	  ckp.difference = difference;
+	  ckp.epsilon    = epsilon; 
+	  
+	  
+	  ckp.bestTrav = bestTrav;       
+	  ckp.impr = impr;
+	  
+	  ckp.tr_startLH  = tr->startLH;
+	  ckp.tr_endLH    = tr->endLH;
+	  ckp.tr_likelihood = tr->likelihood;
+	  ckp.tr_bestOfNode = tr->bestOfNode;
+	  
+	  ckp.tr_lhCutoff = tr->lhCutoff;
+	  ckp.tr_lhAVG    = tr->lhAVG;
+	  ckp.tr_lhDEC    = tr->lhDEC;     
+	  ckp.tr_itCount  = tr->itCount;	
+	  
+	  /* write binary checkpoint to file */
+	  
+	  writeCheckpoint(tr, adef); 
+	}
+    
+      if(impr)
+	{	    
+	  /* if the logl has improved write out some stuff and adapt the rearrangement radii */
+	  printResult(tr, adef, FALSE);
+	  /* minimum rearrangement radius */
+	  rearrangementsMin = 1;
+	  /* max radius, this is probably something I need to explain at the whiteboard */
+	  rearrangementsMax = adef->stepwidth;	
+	 
+	  /* once again the convergence criterion */
+
+	  if(tr->searchConvergenceCriterion && processID == 0)
+	    {
+	      int 
+		bCounter = 0;	   	
+	      
+	      char 
+		*buffer = (char*)calloc(tr->treeStringLength, sizeof(char));   
+
+	      if(thoroughIterations > 1)
+		cleanupHashTable(tr->h, (thoroughIterations % 2));		
+		
+	      bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, thoroughIterations % 2, BIPARTITIONS_RF, (branchInfo *)NULL,
+				      &bCounter, 1, FALSE, FALSE);	    
+	      	     	    	   
+	      	      	
+#ifdef _DEBUG_CHECKPOINTING		
+	      printf("Storing tree in slot %d\n", thoroughIterations % 2);
+#endif
+	      
+	      Tree2String(buffer, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+	      
+	      if(thoroughIterations % 2 == 0)	      
+		memcpy(tr->tree0, buffer, tr->treeStringLength * sizeof(char));
+	      else
+		memcpy(tr->tree1, buffer, tr->treeStringLength * sizeof(char));	    
+	      
+	      free(buffer);	      
+
+	      assert(bCounter == tr->mxtips - 3);
+
+	      if(thoroughIterations > 0)
+		{
+		  double 
+		    rrf = convergenceCriterion(tr->h, tr->mxtips);
+		  
+		  MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+		  
+		  if(rrf <= 0.01) /* 1% cutoff */
+		    {
+		      printBothOpen("ML search converged at thorough SPR cycle %d with stopping criterion\n", thoroughIterations);
+		      printBothOpen("Relative Robinson-Foulds (RF) distance between respective best trees after one successful SPR cycle: %f%s\n", rrf, "%");
+		      goto cleanup;
+		    }
+		  else		    
+		    printBothOpen("ML search convergence criterion thorough cycle %d->%d Relative Robinson-Foulds %f\n", thoroughIterations - 1, thoroughIterations, rrf);
+		}
+	    }
+	  
+	  if(tr->searchConvergenceCriterion && processID != 0 && thoroughIterations > 0)
+	    {
+	      double 
+		rrf;
+	      
+	      MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+	      
+	      if(rrf <= 0.01) /* 1% cutoff */		   
+		goto cleanup;	      
+	    }
+
+	 
+	   	  
+	  thoroughIterations++;	  
+	}			  			
+      else
+	{
+	  /* if the lnl has not improved by the current SPR cycle adapt the min and max rearrangement radii and try again */
+		       	   
+	  rearrangementsMax += adef->stepwidth;
+	  rearrangementsMin += adef->stepwidth; 	        	      
+
+	  /* if we have already tried them then abandon this loop, the search has converged */
+	  if(rearrangementsMax > adef->max_rearrange)	     	     	 
+	    goto cleanup; 	   
+	}
+      
+      /* optimize branch lengths of best tree */
+
+      treeEvaluate(tr, 1.0);
+     
+      /* do some bookkeeping and printouts again */
+      previousLh = lh = tr->likelihood;	      
+      saveBestTree(bestT, tr, TRUE);     
+      printLog(tr);
+
+      /* do a cycle of thorough SPR moves with the minimum and maximum rearrangement radii */
+
+      treeOptimizeRapid(tr, rearrangementsMin, rearrangementsMax, adef, bt, bestML);
+	
+      impr = 0;			      		            
+
+      /* once again get the best 20 trees produced by the SPR cycle, load them from the bt tree list into tr
+	 optimize their branch lengths and figure out if the LnL of the tree has improved */
+
+      for(i = 1; i <= bt->nvalid; i++)
+	{		 
+	  recallBestTree(bt, i, tr);	 	    	    	
+	  
+	  treeEvaluate(tr, 0.25);	    	 
+	    
+	  difference = ((tr->likelihood > previousLh)? 
+			tr->likelihood - previousLh: 
+			previousLh - tr->likelihood); 	    
+	  if(tr->likelihood > lh && difference > epsilon)
+	    {
+	      impr = 1;	       
+	      lh = tr->likelihood;	  	     
+	      saveBestTree(bestT, tr, TRUE);
+	    }	   	   
+	}  
+
+#ifdef _DEBUG_CHECKPOINTING
+      printBothOpen("SLOW LH: %f\n", lh);              
+#endif
+    }
+
+ cleanup: 
+  
+  /* do a final full tree traversal, not sure if this is required here */
+  
+  evaluateGeneric(tr, tr->start, TRUE);
+    
+#ifdef _DEBUG_CHECKPOINTING
+  printBothOpen("After SLOW SPRs Final %f\n", tr->likelihood);   
+#endif
+   
+  printBothOpen("\nLikelihood of best tree: %f\n", tr->likelihood);
+  /* print the absolute best tree */
+
+  printLog(tr);
+  printResult(tr, adef, TRUE);
+
+  /* print other good trees encountered during the search */
+
+  if(tr->saveBestTrees > 0)
+    { 
+      char 
+	fileName[2048] = "",
+	buf[64] = "";
+     
+      printBothOpen("\n\nEvaluating %d other good ML trees\n\n", bestML->nvalid);
+      
+      for(i = 1; i <= bestML->nvalid; i++)
+	{		 
+	  recallBestTree(bestML, i, tr);	 	    	    		  
+	  /*treeEvaluate(tr, 0.25);*/
+	  printBothOpen("tree %d likelihood %1.80f\n", i, tr->likelihood);
+
+	  if(processID == 0)
+	    { 	      		
+	      FILE 
+		*treeFile;
+
+	      strcpy(fileName,       workdir);
+	      strcat(fileName, "RAxML_");
+	      sprintf(buf, "%d", bestML->nvalid);
+	      strcat(fileName, buf);
+	      strcat(fileName, "_goodTrees.");
+	      strcat(fileName, run_id);
+
+	      treeFile = myfopen(fileName, "a");
+	     
+	      Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, TRUE, SUMMARIZE_LH, FALSE, FALSE);
+	       
+	      fprintf(treeFile, "%s", tr->tree_string);
+	      fclose(treeFile);	      
+	    }
+
+	  
+	}      
+	
+      printBothOpen("\n\nOther good trees written to file %s\n", fileName);			
+    }
+
+
+  /* free data structures */
+
+  if(tr->searchConvergenceCriterion && processID == 0)
+    {
+      freeBitVectors(tr->bitVectors, 2 * tr->mxtips);
+      free(tr->bitVectors);
+      freeHashTable(tr->h);
+      free(tr->h);
+    }
+  
+  freeBestTree(bestT);
+  free(bestT);
+  freeBestTree(bt);
+  free(bt);
+  freeInfoList();  
+  
+
+  /* and we are done, return to main() in axml.c  */
+
+}
+
+
+
+boolean treeEvaluate (tree *tr, double smoothFactor)       /* Evaluate a user tree */
+{
+  boolean result;
+
+ 
+  result = smoothTree(tr, (int)((double)smoothings * smoothFactor));
+  
+  assert(result); 
+
+  //make sure that all vectors are oriented correctly !
+
+  evaluateGeneric(tr, tr->start, TRUE);   
+    
+
+  return TRUE;
+}
+
diff --git a/examl/topologies.c b/examl/topologies.c
new file mode 100644
index 0000000..3e30bf8
--- /dev/null
+++ b/examl/topologies.c
@@ -0,0 +1,653 @@
+
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+
+
+
+
+static void saveTopolRELLRec(tree *tr, nodeptr p, topolRELL *tpl, int *i, int numsp, int numBranches)
+{
+  int k;
+  if(isTip(p->number, numsp))
+    return;
+  else
+    {
+      nodeptr q = p->next;      
+      while(q != p)
+	{	  
+	  tpl->connect[*i].p = q;
+	  tpl->connect[*i].q = q->back; 
+	  
+	  if(tr->constraintTree)
+	    {
+	      tpl->connect[*i].cp = tr->constraintVector[q->number];
+	      tpl->connect[*i].cq = tr->constraintVector[q->back->number]; 
+	    }
+	  
+	  for(k = 0; k < numBranches; k++)
+	    tpl->connect[*i].z[k] = q->z[k];
+	  *i = *i + 1;
+	  
+	  saveTopolRELLRec(tr, q->back, tpl, i, numsp, numBranches);
+	  q = q->next;
+	}
+    }
+}
+
+static void saveTopolRELL(tree *tr, topolRELL *tpl)
+{
+  nodeptr p = tr->start;
+  int k, i = 0;
+      
+  tpl->likelihood = tr->likelihood;
+  tpl->start      = 1;
+      
+  tpl->connect[i].p = p;
+  tpl->connect[i].q = p->back;
+  
+  if(tr->constraintTree)
+    {
+      tpl->connect[i].cp = tr->constraintVector[p->number];
+      tpl->connect[i].cq = tr->constraintVector[p->back->number]; 
+    }
+
+  for(k = 0; k < tr->numBranches; k++)
+    tpl->connect[i].z[k] = p->z[k];
+  i++;
+      
+  saveTopolRELLRec(tr, p->back, tpl, &i, tr->mxtips, tr->numBranches);   
+
+  assert(i == 2 * tr->mxtips - 3);
+}
+
+
+static void restoreTopolRELL(tree *tr, topolRELL *tpl)
+{
+  int i;
+  
+  for (i = 0; i < 2 * tr->mxtips - 3; i++) 
+    {
+      hookup(tpl->connect[i].p, tpl->connect[i].q, tpl->connect[i].z,  tr->numBranches);    
+      tr->constraintVector[tpl->connect[i].p->number] = tpl->connect[i].cp;
+      tr->constraintVector[tpl->connect[i].q->number] = tpl->connect[i].cq;
+    }
+  
+
+  tr->likelihood = tpl->likelihood;
+  tr->start      = tr->nodep[tpl->start];
+  /* TODO */
+}
+
+
+
+
+void initTL(topolRELL_LIST *rl, tree *tr, int n)
+{
+  int i;
+
+  rl->max = n; 
+  rl->t = (topolRELL **)malloc(sizeof(topolRELL *) * n);
+
+  for(i = 0; i < n; i++)
+    {
+      rl->t[i] = (topolRELL *)malloc(sizeof(topolRELL));
+      rl->t[i]->connect = (connectRELL *)malloc((2 * tr->mxtips - 3) * sizeof(connectRELL));
+      rl->t[i]->likelihood = unlikely;     
+    }
+}
+
+
+void freeTL(topolRELL_LIST *rl)
+{
+  int i;
+  for(i = 0; i < rl->max; i++)    
+    {
+      free(rl->t[i]->connect);          
+      free(rl->t[i]);
+    }
+  free(rl->t);
+}
+
+
+void restoreTL(topolRELL_LIST *rl, tree *tr, int n)
+{
+  assert(n >= 0 && n < rl->max);    
+
+  restoreTopolRELL(tr, rl->t[n]);  
+}
+
+
+
+
+void resetTL(topolRELL_LIST *rl)
+{
+  int i;
+
+  for(i = 0; i < rl->max; i++)    
+    rl->t[i]->likelihood = unlikely;          
+}
+
+
+
+
+void saveTL(topolRELL_LIST *rl, tree *tr, int index)
+{ 
+  assert(index >= 0 && index < rl->max);    
+    
+  if(tr->likelihood > rl->t[index]->likelihood)        
+    saveTopolRELL(tr, rl->t[index]); 
+}
+
+
+static void  *tipValPtr (nodeptr p)
+{ 
+  return  (void *) & p->number;
+}
+
+
+static int  cmpTipVal (void *v1, void *v2)
+{
+  int  i1, i2;
+  
+  i1 = *((int *) v1);
+  i2 = *((int *) v2);
+  return  (i1 < i2) ? -1 : ((i1 == i2) ? 0 : 1);
+}
+
+
+/*  These are the only routines that need to UNDERSTAND topologies */
+
+static topol  *setupTopol (int maxtips)
+{
+  topol   *tpl;
+
+  if (! (tpl = (topol *) malloc(sizeof(topol))) || 
+      ! (tpl->links = (connptr) malloc((2*maxtips-3) * sizeof(connect))))
+    {
+      printf("ERROR: Unable to get topology memory\n");
+      free(tpl); tpl = (topol *) NULL;   /* free(NULL) is a no-op; avoids leaking tpl if only links failed */
+    }
+  else 
+    {
+      tpl->likelihood  = unlikely;
+      tpl->start       = (node *) NULL;
+      tpl->nextlink    = 0;
+      tpl->ntips       = 0;
+      tpl->nextnode    = 0;    
+      tpl->scrNum      = 0;     /* position in sorted list of scores */
+      tpl->tplNum      = 0;     /* position in sorted list of trees */	      
+    }
+  
+  return  tpl;
+} 
+
+
+static void  freeTopol (topol *tpl)
+{
+  free(tpl->links);
+  free(tpl);
+} 
+
+
+static int saveSubtree (nodeptr p, topol *tpl, int numsp, int numBranches)  
+{
+  connptr  r, r0;
+  nodeptr  q, s;
+  int      t, t0, t1, k;
+
+  r0 = tpl->links;
+  r = r0 + (tpl->nextlink)++;
+  r->p = p;
+  r->q = q = p->back;
+
+  for(k = 0; k < numBranches; k++)
+    r->z[k] = p->z[k];
+
+  r->descend = 0;                     /* No children (yet) */
+
+  if (isTip(q->number, numsp)) 
+    {
+      r->valptr = tipValPtr(q);         /* Assign value */
+    }
+  else 
+    {                              /* Internal node, look at children */
+      s = q->next;                      /* First child */
+      do 
+	{
+	  t = saveSubtree(s, tpl, numsp, numBranches);        /* Generate child's subtree */
+
+	  t0 = 0;                         /* Merge child into list */
+	  t1 = r->descend;
+	  while (t1 && (cmpTipVal(r0[t1].valptr, r0[t].valptr) < 0)) {
+	    t0 = t1;
+	    t1 = r0[t1].sibling;
+          }
+	  if (t0) r0[t0].sibling = t;  else  r->descend = t;
+	  r0[t].sibling = t1;
+
+	  s = s->next;                    /* Next child */
+        } while (s != q);
+
+      r->valptr = r0[r->descend].valptr;   /* Inherit first child's value */
+      }                                 /* End of internal node processing */
+
+  return  (r - r0);
+}
+
+
+static nodeptr minSubtreeTip (nodeptr  p0, int numsp)
+{ 
+  nodeptr  minTip, p, testTip;
+
+  if (isTip(p0->number, numsp)) 
+    return p0;
+
+  p = p0->next;
+
+  minTip = minSubtreeTip(p->back, numsp);
+
+  while ((p = p->next) != p0) 
+    {
+      testTip = minSubtreeTip(p->back, numsp);
+      if (cmpTipVal(tipValPtr(testTip), tipValPtr(minTip)) < 0)
+        minTip = testTip;
+    }
+  return minTip;
+} 
+
+
+static nodeptr  minTreeTip (nodeptr  p, int numsp)
+{
+  nodeptr  minp, minpb;
+
+  minp  = minSubtreeTip(p, numsp);
+  minpb = minSubtreeTip(p->back, numsp);
+  return (cmpTipVal(tipValPtr(minp), tipValPtr(minpb)) < 0 ? minp : minpb);
+}
+
+
+static void saveTree (tree *tr, topol *tpl)
+/*  Save a tree topology in a standard order so that first branches
+ *  from a node contain lower value tips than do second branches from
+ *  the node.  The root tip should have the lowest value of all.
+ */
+{
+  connptr  r;  
+  
+  tpl->nextlink = 0;                             /* Reset link pointer */
+  r = tpl->links + saveSubtree(minTreeTip(tr->start, tr->mxtips), tpl, tr->mxtips, tr->numBranches);  /* Save tree */
+  r->sibling = 0;
+  
+  tpl->likelihood = tr->likelihood;
+  tpl->start      = tr->start;
+  tpl->ntips      = tr->ntips;
+  tpl->nextnode   = tr->nextnode;    
+  
+} /* saveTree */
+
+
+static boolean restoreTree (topol *tpl, tree *tr)
+{ 
+  connptr  r;
+  nodeptr  p, p0;    
+  int  i;
+
+  for (i = 1; i <= 2*(tr->mxtips) - 2; i++) 
+    {  
+      /* Uses p = p->next at tip */
+      p0 = p = tr->nodep[i];
+      do 
+	{
+	  p->back = (nodeptr) NULL;
+	  p = p->next;
+	} 
+      while (p != p0);
+    }
+
+  /*  Copy connections from topology */
+
+  for (r = tpl->links, i = 0; i < tpl->nextlink; r++, i++)     
+    hookup(r->p, r->q, r->z, tr->numBranches);      
+
+  tr->likelihood = tpl->likelihood;
+  tr->start      = tpl->start;
+  tr->ntips      = tpl->ntips;
+  
+  tr->nextnode   = tpl->nextnode;    
+
+  evaluateGeneric(tr, tr->start, TRUE);
+  return TRUE;
+}
+
+
+
+
+int initBestTree (bestlist *bt, int newkeep, int numsp)
+{ /* initBestTree */
+  int  i;
+
+  bt->nkeep = 0;
+
+  if (bt->ninit <= 0) 
+    {
+      if (! (bt->start = setupTopol(numsp)))  return  0;
+      bt->ninit = -1;
+      bt->nvalid = 0;
+      bt->numtrees = 0;
+      bt->best = unlikely;
+      bt->improved = FALSE;
+      bt->byScore = (topol **) malloc((newkeep+1) * sizeof(topol *));
+      bt->byTopol = (topol **) malloc((newkeep+1) * sizeof(topol *));
+      if (! bt->byScore || ! bt->byTopol) {
+        printf( "initBestTree: malloc failure\n");
+        return 0;
+      }
+    }
+  else if (ABS(newkeep) > bt->ninit) {
+    if (newkeep <  0) newkeep = -(bt->ninit);
+    else newkeep = bt->ninit;
+  }
+
+  if (newkeep < 1) {    /*  Use negative newkeep to clear list  */
+    newkeep = -newkeep;
+    if (newkeep < 1) newkeep = 1;
+    bt->nvalid = 0;
+    bt->best = unlikely;
+  }
+  
+  if (bt->nvalid >= newkeep) {
+    bt->nvalid = newkeep;
+    bt->worst = bt->byScore[newkeep]->likelihood;
+  }
+  else 
+    {
+      bt->worst = unlikely;
+    }
+  
+  for (i = bt->ninit + 1; i <= newkeep; i++) 
+    {    
+      if (! (bt->byScore[i] = setupTopol(numsp)))  break;
+      bt->byTopol[i] = bt->byScore[i];
+      bt->ninit = i;
+    }
+  
+  return  (bt->nkeep = MIN(newkeep, bt->ninit));
+} /* initBestTree */
+
+
+
+void resetBestTree (bestlist *bt)
+{ /* resetBestTree */
+  bt->best     = unlikely;
+  bt->worst    = unlikely;
+  bt->nvalid   = 0;
+  bt->improved = FALSE;
+} /* resetBestTree */
+
+
+boolean  freeBestTree(bestlist *bt)
+{ /* freeBestTree */
+  while (bt->ninit >= 0)  freeTopol(bt->byScore[(bt->ninit)--]);
+    
+  /* VALGRIND */
+
+  free(bt->byScore);
+  free(bt->byTopol);
+
+  /* VALGRIND END */
+
+  freeTopol(bt->start);
+  return TRUE;
+} /* freeBestTree */
+
+
+/*  Compare two trees, assuming that each is in standard order.  Return
+ *  -1 if first precedes second, 0 if they are identical, or +1 if first
+ *  follows second in standard order.  Lower number tips precede higher
+ *  number tips.  A tip precedes a corresponding internal node.  Internal
+ *  nodes are ranked by their lowest number tip.
+ */
+
+static int  cmpSubtopol (connptr p10, connptr p1, connptr p20, connptr p2)
+{
+  connptr  p1d, p2d;
+  int  cmp;
+  
+  if (! p1->descend && ! p2->descend)          /* Two tips */
+    return cmpTipVal(p1->valptr, p2->valptr);
+  
+  if (! p1->descend) return -1;                /* p1 = tip, p2 = node */
+  if (! p2->descend) return  1;                /* p2 = tip, p1 = node */
+  
+  p1d = p10 + p1->descend;
+  p2d = p20 + p2->descend;
+  while (1) {                                  /* Two nodes */
+    if ((cmp = cmpSubtopol(p10, p1d, p20, p2d)))  return cmp; /* Subtrees */
+    if (! p1d->sibling && ! p2d->sibling)  return  0; /* Lists done */
+    if (! p1d->sibling) return -1;             /* One done, other not */
+    if (! p2d->sibling) return  1;             /* One done, other not */
+    p1d = p10 + p1d->sibling;                  /* Neither done */
+    p2d = p20 + p2d->sibling;
+  }
+}
+
+
+
+static int  cmpTopol (void *tpl1, void *tpl2)
+{ 
+  connptr  r1, r2;
+  int      cmp;    
+  
+  r1 = ((topol *) tpl1)->links;
+  r2 = ((topol *) tpl2)->links;
+  cmp = cmpTipVal(tipValPtr(r1->p), tipValPtr(r2->p));
+  if (cmp)      	
+    return cmp;     
+  return  cmpSubtopol(r1, r1, r2, r2);
+} 
+
+
+
+static int  cmpTplScore (void *tpl1, void *tpl2)
+{ 
+  double  l1, l2;
+  
+  l1 = ((topol *) tpl1)->likelihood;
+  l2 = ((topol *) tpl2)->likelihood;
+  return  (l1 > l2) ? -1 : ((l1 == l2) ? 0 : 1);
+}
+
+
+
+/*  Find an item in a sorted list of n items.  Positions are 1-based.  If
+ *  the item is in the list, return its position.  If it is not, return the
+ *  negative of the position into which it should be inserted.
+ */
+
+static int  findInList (void *item, void *list[], int n, int (* cmpFunc)(void *, void *))
+{
+  int  mid, hi, lo, cmp = 0;
+  
+  if (n < 1) return  -1;                    /*  No match; first index  */
+  
+  lo = 1;
+  mid = 0;
+  hi = n;
+  while (lo < hi) {
+    mid = (lo + hi) >> 1;
+    cmp = (* cmpFunc)(item, list[mid-1]);
+    if (cmp) {
+      if (cmp < 0) hi = mid;
+      else lo = mid + 1;
+    }
+    else  return  mid;                        /*  Exact match  */
+  }
+  
+  if (lo != mid) {
+    cmp = (* cmpFunc)(item, list[lo-1]);
+    if (cmp == 0) return lo;
+  }
+  if (cmp > 0) lo++;                         /*  Item sorts after list[lo-1]  */
+  return  -lo;
+} 
+
+
+
+static int  findTreeInList (bestlist *bt, tree *tr)
+{
+  topol  *tpl;
+  
+  tpl = bt->byScore[0];
+  saveTree(tr, tpl);
+  return  findInList((void *) tpl, (void **) (& (bt->byTopol[1])),
+		     bt->nvalid, cmpTopol);
+} 
+
+
+int  saveBestTree (bestlist *bt, tree *tr, boolean keepIdenticalTrees)
+{    
+  topol  
+    *tpl, 
+    *reuse;
+  
+  int  
+    tplNum, 
+    scrNum, 
+    reuseScrNum, 
+    reuseTplNum, 
+    i, 
+    oldValid, 
+    newValid;
+  
+  tplNum = findTreeInList(bt, tr);
+  tpl = bt->byScore[0];
+  oldValid = newValid = bt->nvalid;
+  
+  if(tplNum > 0) 
+    {                      
+      /* Topology is in list  */
+
+      if(!keepIdenticalTrees)
+	return 0;
+
+      reuse = bt->byTopol[tplNum];         /* Matching topol  */
+      reuseScrNum = reuse->scrNum;
+      reuseTplNum = reuse->tplNum;
+    }
+  /* Good enough to keep? */
+  else
+    {
+      if(tr->likelihood < bt->worst)  
+	return 0;  
+      else 
+	{                                 /* Topology is not in list */
+	  tplNum = -tplNum;                    /* Add to list (not replace) */
+	  if (newValid < bt->nkeep) bt->nvalid = ++newValid;
+	  reuseScrNum = newValid;              /* Take worst tree */
+	  reuse = bt->byScore[reuseScrNum];
+	  reuseTplNum = (newValid > oldValid) ? newValid : reuse->tplNum;
+	  if (tr->likelihood > bt->start->likelihood) 
+	    bt->improved = TRUE;
+	}
+    }
+  
+  scrNum = findInList((void *) tpl, (void **) (& (bt->byScore[1])),
+		      oldValid, cmpTplScore);
+  scrNum = ABS(scrNum);
+  
+  if (scrNum < reuseScrNum)
+    {
+      for (i = reuseScrNum; i > scrNum; i--)
+	(bt->byScore[i] = bt->byScore[i-1])->scrNum = i;
+    }
+  else
+    {
+      if (scrNum > reuseScrNum) 
+	{
+	  scrNum--;
+	  for (i = reuseScrNum; i < scrNum; i++)
+	    (bt->byScore[i] = bt->byScore[i+1])->scrNum = i;
+	}
+    }
+  
+  if(tplNum < reuseTplNum)
+    for (i = reuseTplNum; i > tplNum; i--)
+      (bt->byTopol[i] = bt->byTopol[i-1])->tplNum = i;  
+  else 
+    {
+      if (tplNum > reuseTplNum) 
+	{
+	  tplNum--;
+	  for (i = reuseTplNum; i < tplNum; i++)
+	    (bt->byTopol[i] = bt->byTopol[i+1])->tplNum = i;
+	}
+    }
+      
+  tpl->scrNum = scrNum;
+  tpl->tplNum = tplNum;
+  bt->byTopol[tplNum] = bt->byScore[scrNum] = tpl;
+  bt->byScore[0] = reuse;
+  
+  if (scrNum == 1)  bt->best = tr->likelihood;
+  if (newValid == bt->nkeep) bt->worst = bt->byScore[newValid]->likelihood;
+  
+  return  scrNum;
+} 
+
+
+int  recallBestTree (bestlist *bt, int rank, tree *tr)
+{ 
+  if (rank < 1)  rank = 1;
+  if (rank > bt->nvalid)  rank = bt->nvalid;
+  if (rank > 0)  if (! restoreTree(bt->byScore[rank], tr)) return FALSE;
+  return  rank;
+}
+
+
+
+
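The `findInList` routine in topologies.c above returns a 1-based position on a hit and the negated 1-based insertion position on a miss; `saveBestTree` then recovers a usable slot with `ABS` or a sign flip. A minimal standalone sketch of that return convention follows. It assumes a plain sorted `int` array instead of the `void *` list and comparator callback used in the patch, and the name `find_in_sorted` is hypothetical, not part of ExaML.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the findInList return convention:
 *   hit  -> 1-based position of the item
 *   miss -> negative of the 1-based position where it would be inserted
 * The list is assumed to be sorted in ascending order. */
int find_in_sorted(const int *list, int n, int item)
{
  int lo = 0;    /* 0-based search window is [lo, hi) */
  int hi = n;

  assert(list != NULL || n == 0);

  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;

    if (list[mid] < item)
      lo = mid + 1;          /* item sorts after list[mid] */
    else
      hi = mid;              /* item sorts at or before list[mid] */
  }

  /* lo is now the first 0-based slot whose value is >= item. */
  if (lo < n && list[lo] == item)
    return lo + 1;           /* exact match, reported 1-based */

  return -(lo + 1);          /* miss: negated 1-based insertion position */
}
```

For the list {10, 20, 30}, looking up 20 yields 2, while looking up 25 yields -3, meaning the item would be inserted at position 3. An empty list yields -1, which matches the early return in `findInList`.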
diff --git a/examl/trash.c b/examl/trash.c
new file mode 100644
index 0000000..e681a85
--- /dev/null
+++ b/examl/trash.c
@@ -0,0 +1,78 @@
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>  
+#endif
+
+#include <limits.h>
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+
+ 
+  
+static void reorderNodes(tree *tr, nodeptr *np, nodeptr p, int *count)
+{
+  int i, found = 0;
+
+  if(isTip(p->number, tr->mxtips))    
+    return;
+  else
+    {              
+      for(i = tr->mxtips + 1; (i <= (tr->mxtips + tr->mxtips - 1)) && (found == 0); i++)
+	{
+	  if (p == np[i] || p == np[i]->next || p == np[i]->next->next)
+	    {
+	      if(p == np[i])			       
+		tr->nodep[*count + tr->mxtips + 1] = np[i];		 		
+	      else
+		{
+		  if(p == np[i]->next)		  
+		    tr->nodep[*count + tr->mxtips + 1] = np[i]->next;		     	   
+		  else		   
+		    tr->nodep[*count + tr->mxtips + 1] = np[i]->next->next;		    		    
+		}
+
+	      found = 1;	      	     
+	      *count = *count + 1;
+	    }
+	} 
+      
+      assert(found != 0);
+     
+      reorderNodes(tr, np, p->next->back, count);     
+      reorderNodes(tr, np, p->next->next->back, count);                
+    }
+}
+
+void nodeRectifier(tree *tr)
+{
+  nodeptr *np = (nodeptr *)malloc(2 * tr->mxtips * sizeof(nodeptr));
+  int i;
+  int count = 0;
+  
+  tr->start       = tr->nodep[1];
+  tr->rooted      = FALSE;
+
+  /* TODO: why is tr->rooted set to FALSE here? */
+  
+  for(i = tr->mxtips + 1; i <= (tr->mxtips + tr->mxtips - 1); i++)
+    np[i] = tr->nodep[i];           
+  
+  reorderNodes(tr, np, tr->start->back, &count); 
+
+ 
+  free(np);
+}
+
+nodeptr findAnyTip(nodeptr p, int numsp)
+{ 
+  return  isTip(p->number, numsp) ? p : findAnyTip(p->next->back, numsp);
+} 
+
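`findAnyTip` in trash.c above relies on the node layout used throughout this code: an inner node is a ring of three records linked by `next`, and `back` crosses a branch to the neighbouring node. A small standalone sketch of that walk follows. It assumes that `isTip` means `number <= numsp` (its definition is not part of this hunk) and that a tip's `next` points to itself, as the loop in `restoreTree` suggests. The names `sketch_node`, `is_tip`, `find_any_tip` and `build_star` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the node record in axml.h. */
typedef struct sketch_node {
  struct sketch_node *next;   /* next record in the same node's ring */
  struct sketch_node *back;   /* record at the other end of the branch */
  int number;                 /* 1..numsp for tips, larger for inner nodes */
} sketch_node;

/* Assumption: tips carry the numbers 1..numsp. */
int is_tip(int number, int numsp)
{
  return number <= numsp;
}

/* Iterative form of the recursive walk: follow next->back until a tip. */
sketch_node *find_any_tip(sketch_node *p, int numsp)
{
  assert(p != NULL);

  while (!is_tip(p->number, numsp))
    p = p->next->back;

  return p;
}

/* Build the smallest unrooted tree: tips 1..3 joined at inner node 4. */
sketch_node *build_star(sketch_node tips[3], sketch_node inner[3])
{
  int i;

  for (i = 0; i < 3; i++) {
    tips[i].number = i + 1;
    tips[i].next = &tips[i];              /* a tip is a ring of one record */
    inner[i].number = 4;
    inner[i].next = &inner[(i + 1) % 3];  /* close the ring of three */
    tips[i].back = &inner[i];             /* connect both ends of the branch */
    inner[i].back = &tips[i];
  }

  return &inner[0];
}
```

Starting from `inner[0]`, the walk steps to `inner[1]` and crosses its branch, so it lands on tip 2. Starting from a tip returns that tip unchanged.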
diff --git a/examl/treeIO.c b/examl/treeIO.c
new file mode 100644
index 0000000..ac5fe71
--- /dev/null
+++ b/examl/treeIO.c
@@ -0,0 +1,1184 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+extern char infoFileName[1024];
+extern char tree_file[1024];
+extern char *likelihood_key;
+extern char *ntaxa_key;
+extern char *smoothed_key;
+extern double masterTime;
+
+
+
+
+
+stringHashtable *initStringHashTable(hashNumberType n)
+{
+  /* 
+     init with primes 
+  */
+    
+  static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+					     196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+					     50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+ 
+
+  /* init with powers of two
+
+  static const  hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+					      32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+					      4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+					      268435456, 536870912, 1073741824, 2147483648U};
+  */
+  
+  stringHashtable *h = (stringHashtable*)malloc(sizeof(stringHashtable));
+  
+  hashNumberType
+    tableSize,
+    i,
+    primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+    maxSize = (hashNumberType)-1;    
+
+  assert(n <= maxSize);
+
+  i = 0;
+
+  while(i < primeTableLength && initTable[i] < n)
+    i++;
+
+  assert(i < primeTableLength);
+
+  tableSize = initTable[i];  
+
+  h->table = (stringEntry**)calloc(tableSize, sizeof(stringEntry*));
+  h->tableSize = tableSize;    
+
+  return h;
+}
+
+
+static hashNumberType  hashString(char *p, hashNumberType tableSize)
+{
+  hashNumberType h = 0;
+  
+  for(; *p; p++)
+    h = 31 * h + *p;
+  
+  return (h % tableSize);
+}
+
+ 
+
+void addword(char *s, stringHashtable *h, int nodeNumber)
+{
+  hashNumberType position = hashString(s, h->tableSize);
+  stringEntry *p = h->table[position];
+  
+  for(; p!= NULL; p = p->next)
+    {
+      if(strcmp(s, p->word) == 0)		 
+	return;	  	
+    }
+
+  p = (stringEntry *)malloc(sizeof(stringEntry));
+
+  assert(p);
+  
+  p->nodeNumber = nodeNumber;
+
+  p->word = (char*)calloc(strlen(s) + 1, sizeof(char)); 
+  strcpy(p->word, s); 
+  
+  p->next =  h->table[position];
+  
+  h->table[position] = p;
+}
+
+int lookupWord(char *s, stringHashtable *h)
+{
+  hashNumberType position = hashString(s, h->tableSize);
+  stringEntry *p = h->table[position];
+  
+  for(; p!= NULL; p = p->next)
+    {
+      if(strcmp(s, p->word) == 0)		 
+	return p->nodeNumber;	  	
+    }
+
+  return -1;
+}
+
+
+int countTips(nodeptr p, int numsp)
+{
+  if(isTip(p->number, numsp))  
+    return 1;    
+  {
+    nodeptr q;
+    int tips = 0;
+
+    q = p->next;
+    while(q != p)
+      { 
+	tips += countTips(q->back, numsp);
+	q = q->next;
+      } 
+    
+    return tips;
+  }
+}
+
+
+static double getBranchLength(tree *tr, int perGene, nodeptr p)
+{
+  double 
+    z = 0.0,
+    x = 0.0;
+
+  assert(perGene != NO_BRANCHES);
+	      
+  if(tr->numBranches == 1)
+    {          
+      z = p->z[0];
+      if (z < zmin) 
+	z = zmin;      	 
+      
+      x = -log(z);    
+    }
+  else
+    {
+      if(perGene == SUMMARIZE_LH)
+	{
+	  int 
+	    i;
+	  
+	  double 
+	    avgX = 0.0;
+		      
+	  for(i = 0; i < tr->numBranches; i++)
+	    {	  
+	      z = p->z[i];
+	      if(z < zmin) 
+		z = zmin;      	 
+	      x = -log(z);
+	      avgX += x * tr->partitionContributions[i];
+	    }
+
+	  x = avgX;
+	}
+      else
+	{		  
+	  assert(perGene >= 0 && perGene < tr->numBranches);
+	  
+	  z = p->z[perGene];
+	  
+	  if(z < zmin) 
+	    z = zmin;      	 
+	  
+	  x = -log(z);	  
+	}
+    }
+
+  return x;
+}
+
+
+  
+
+
+static char *Tree2StringREC(char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, 
+			    boolean printLikelihood, boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport)
+{
+  char  *nameptr;            
+      
+  if(isTip(p->number, tr->mxtips)) 
+    {	       	  
+      if(printNames)
+	{
+	  nameptr = tr->nameList[p->number];     
+	  sprintf(treestr, "%s", nameptr);
+	}
+      else
+	sprintf(treestr, "%d", p->number);    
+	
+      while (*treestr) treestr++;
+    }
+  else 
+    {                 	 
+      *treestr++ = '(';
+      treestr = Tree2StringREC(treestr, tr, p->next->back, printBranchLengths, printNames, printLikelihood, rellTree, 
+			       finalPrint, perGene, branchLabelSupport, printSHSupport);
+      *treestr++ = ',';
+      treestr = Tree2StringREC(treestr, tr, p->next->next->back, printBranchLengths, printNames, printLikelihood, rellTree, 
+			       finalPrint, perGene, branchLabelSupport, printSHSupport);
+      if(p == tr->start->back) 
+	{
+	  *treestr++ = ',';
+	  treestr = Tree2StringREC(treestr, tr, p->back, printBranchLengths, printNames, printLikelihood, rellTree, 
+				   finalPrint, perGene, branchLabelSupport, printSHSupport);
+	}
+      *treestr++ = ')';                    
+    }
+
+  if(p == tr->start->back) 
+    {	      	 
+      if(printBranchLengths && !rellTree)
+	sprintf(treestr, ":0.0;\n");
+      else
+	sprintf(treestr, ";\n");	 	  	
+    }
+  else 
+    {                   
+      if(rellTree || branchLabelSupport || printSHSupport)
+	{	 	 
+	  if(( !isTip(p->number, tr->mxtips)) && 
+	     ( !isTip(p->back->number, tr->mxtips)))
+	    {	
+	      assert(0);
+		      
+	      /*assert(p->bInf != (branchInfo *)NULL);*/
+	      
+	      /*if(rellTree)
+		sprintf(treestr, "%d:%8.20f", p->bInf->support, p->z[0]);
+	      if(branchLabelSupport)
+		sprintf(treestr, ":%8.20f[%d]", p->z[0], p->bInf->support);
+	      if(printSHSupport)
+		sprintf(treestr, ":%8.20f[%d]", getBranchLength(tr, perGene, p), p->bInf->support);
+	      */
+	      
+	    }
+	  else		
+	    {
+	      if(rellTree || branchLabelSupport)
+		sprintf(treestr, ":%8.20f", p->z[0]);	
+	      if(printSHSupport)
+		sprintf(treestr, ":%8.20f", getBranchLength(tr, perGene, p));
+	    }
+	}
+      else
+	{
+	  if(printBranchLengths)	    
+	    sprintf(treestr, ":%8.20f", getBranchLength(tr, perGene, p));	      	   
+	  else	    
+	    sprintf(treestr, "%s", "\0");	    
+	}      
+    }
+  
+  while (*treestr) treestr++;
+  return  treestr;
+}
+
+
+
+
+
+    
+
+
+
+
+char *Tree2String(char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood, 
+		  boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport)
+{ 
+
+  if(rellTree)
+    assert(!branchLabelSupport && !printSHSupport);
+
+  if(branchLabelSupport)
+    assert(!rellTree && !printSHSupport);
+
+  if(printSHSupport)
+    assert(!branchLabelSupport && !rellTree);
+
+ 
+  Tree2StringREC(treestr, tr, p, printBranchLengths, printNames, printLikelihood, rellTree, 
+		 finalPrint, perGene, branchLabelSupport, printSHSupport);  
+    
+  
+  while (*treestr) treestr++;
+  
+  return treestr;
+}
+
+
+void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission)
+{  
+  FILE *treeFile;
+  char extendedTreeFileName[1024];
+  char buf[16];
+  int i;
+
+  assert(adef->perGeneBranchLengths);
+     
+  for(i = 0; i < tr->numBranches; i++)	
+    {
+      strcpy(extendedTreeFileName, fileName);
+      sprintf(buf,"%d", i);
+      strcat(extendedTreeFileName, ".PARTITION.");
+      strcat(extendedTreeFileName, buf);
+      /*printf("Partition %d file %s\n", i, extendedTreeFileName);*/
+      Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, TRUE, i, FALSE, FALSE);
+      treeFile = myfopen(extendedTreeFileName, permission);
+      fprintf(treeFile, "%s", tr->tree_string);
+      fclose(treeFile);
+    }  
+    
+}
+
+
+
+/*=======================================================================*/
+/*                         Read a tree from a file                       */
+/*=======================================================================*/
+
+
+/*  1.0.A  Processing of quotation marks in comment removed
+ */
+
+static int treeFinishCom (FILE *fp, char **strp)
+{
+  int  ch;
+  
+  while ((ch = getc(fp)) != EOF && ch != ']') {
+    if (strp != NULL) *(*strp)++ = ch;    /* save character  */
+    if (ch == '[') {                      /* nested comment; find its end */
+      if ((ch = treeFinishCom(fp, strp)) == EOF)  break;
+      if (strp != NULL) *(*strp)++ = ch;  /* save closing ]  */
+    }
+  }
+  
+  if (strp != NULL) **strp = '\0';        /* terminate string  */
+  return  ch;
+} /* treeFinishCom */
+
+
+static int treeGetCh (FILE *fp)         /* get next nonblank, noncomment character */
+{ /* treeGetCh */
+  int  ch;
+
+  while ((ch = getc(fp)) != EOF) {
+    if (whitechar(ch)) ;
+    else if (ch == '[') {                   /* comment; find its end */
+      if ((ch = treeFinishCom(fp, (char **) NULL)) == EOF)  break;
+    }
+    else  break;
+  }
+  
+  return  ch;
+} /* treeGetCh */
+
+
+static boolean treeLabelEnd (int ch)
+{
+  switch (ch) 
+    {
+    case EOF:  
+    case '\0':  
+    case '\t':  
+    case '\n':  
+    case '\r': 
+    case ' ':
+    case ':':  
+    case ',':   
+    case '(':   
+    case ')':  
+    case ';':
+      return TRUE;
+    default:
+      break;
+    }
+  return FALSE;
+} 
+
+
+static boolean  treeGetLabel (FILE *fp, char *lblPtr, int maxlen)
+{
+  int      ch;
+  boolean  done, quoted, lblfound;
+
+  if (--maxlen < 0) 
+    lblPtr = (char *) NULL; 
+  else 
+    if (lblPtr == NULL) 
+      maxlen = 0;
+
+  ch = getc(fp);
+  done = treeLabelEnd(ch);
+
+  lblfound = ! done;
+  quoted = (ch == '\'');
+  if (quoted && ! done) 
+    {
+      ch = getc(fp); 
+      done = (ch == EOF);
+    }
+
+  while (! done) 
+    {
+      if (quoted) 
+	{
+	  if (ch == '\'') 
+	    {
+	      ch = getc(fp); 
+	      if (ch != '\'') 
+		break;
+	    }
+        }
+      else 
+	if (treeLabelEnd(ch)) break;     
+
+      if (--maxlen >= 0) *lblPtr++ = ch;
+      ch = getc(fp);
+      if (ch == EOF) break;
+    }
+
+  if (ch != EOF)  (void) ungetc(ch, fp);
+
+  if (lblPtr != NULL) *lblPtr = '\0';
+
+  return lblfound;
+}
+
+
+static boolean  treeFlushLabel (FILE *fp)
+{ 
+  return  treeGetLabel(fp, (char *) NULL, (int) 0);
+} 
+
+
+
+
+static int treeFindTipByLabelString(char  *str, tree *tr, boolean check)                    
+{
+  int lookup = lookupWord(str, tr->nameHash);
+
+  if(lookup > 0)
+    {
+      if(check)
+	assert(! tr->nodep[lookup]->back);
+      return lookup;
+    }
+  else
+    { 
+      printf("ERROR: Cannot find tree species: %s\n", str);
+      return  0;
+    }
+}
+
+
+int treeFindTipName(FILE *fp, tree *tr, boolean check)
+{
+  char    str[nmlngth+2];
+  int      n;
+
+  if(treeGetLabel(fp, str, nmlngth+2))
+    n = treeFindTipByLabelString(str, tr, check);
+  else
+    n = 0;
+   
+
+  return  n;
+} 
+
+
+
+static void  treeEchoContext (FILE *fp1, FILE *fp2, int n)
+{ /* treeEchoContext */
+  int      ch;
+  boolean  waswhite;
+  
+  waswhite = TRUE;
+  
+  while (n > 0 && ((ch = getc(fp1)) != EOF)) {
+    if (whitechar(ch)) {
+      ch = waswhite ? '\0' : ' ';
+      waswhite = TRUE;
+    }
+    else {
+      waswhite = FALSE;
+    }
+    
+    if (ch > '\0') {putc(ch, fp2); n--;}
+  }
+} /* treeEchoContext */
+
+
+static boolean treeProcessLength (FILE *fp, double *dptr)
+{
+  int  ch;
+  
+  if ((ch = treeGetCh(fp)) == EOF)  return FALSE;    /*  Skip comments */
+  (void) ungetc(ch, fp);
+  
+  if (fscanf(fp, "%lf", dptr) != 1) {
+    printf("ERROR: treeProcessLength: Problem reading branch length\n");
+    treeEchoContext(fp, stdout, 40);
+    printf("\n");
+    return  FALSE;
+  }
+  
+  return  TRUE;
+}
+
+
+static int treeFlushLen (FILE  *fp)
+{
+  double  dummy;  
+  int     ch;
+  
+  ch = treeGetCh(fp);
+  
+  if (ch == ':') 
+    {
+      ch = treeGetCh(fp);
+      
+      ungetc(ch, fp);
+      if(!treeProcessLength(fp, & dummy)) return 0;
+      return 1;	  
+    }
+  
+  
+  
+  if (ch != EOF) (void) ungetc(ch, fp);
+  return 1;
+} 
+
+
+
+
+
+static boolean treeNeedCh (FILE *fp, int c1, char *where)
+{
+  int  c2;
+  
+  if ((c2 = treeGetCh(fp)) == c1)  return TRUE;
+  
+  printf("ERROR: Expecting '%c' %s tree; found:", c1, where);
+  if (c2 == EOF) 
+    {
+      printf("End-of-File");
+    }
+  else 
+    {      	
+      ungetc(c2, fp);
+      treeEchoContext(fp, stdout, 40);
+    }
+  putchar('\n');
+
+  if(c1 == ':')    
+    printf("RAxML may be expecting to read a tree that contains branch lengths\n");
+
+  return FALSE;
+} 
+
+
+
+static boolean addElementLen (FILE *fp, tree *tr, nodeptr p, boolean readBranchLengths, boolean readNodeLabels, int *lcount)
+{   
+  nodeptr  q;
+  int      n, ch, fres;
+  
+  if ((ch = treeGetCh(fp)) == '(') 
+    { 
+      n = (tr->nextnode)++;
+      if (n > 2*(tr->mxtips) - 2) 
+	{
+	  if (tr->rooted || n > 2*(tr->mxtips) - 1) 
+	    {
+	      printf("ERROR: Too many internal nodes.  Is tree rooted?\n");
+	      printf("       Deepest splitting should be a trifurcation.\n");
+	      return FALSE;
+	    }
+	  else 
+	    {
+	      assert(!readNodeLabels);
+	      tr->rooted = TRUE;
+	    }
+	}
+      
+      q = tr->nodep[n];
+
+      if (! addElementLen(fp, tr, q->next, readBranchLengths, readNodeLabels, lcount))        return FALSE;
+      if (! treeNeedCh(fp, ',', "in"))             return FALSE;
+      if (! addElementLen(fp, tr, q->next->next, readBranchLengths, readNodeLabels, lcount))  return FALSE;
+      if (! treeNeedCh(fp, ')', "in"))             return FALSE;
+      
+      if(readNodeLabels)
+	{
+	  char label[64];
+	  int support;
+
+	  if(treeGetLabel (fp, label, 10))
+	    {	
+	      int val = sscanf(label, "%d", &support);
+      
+	      assert(val == 1);
+
+	      /*printf("LABEL %s Number %d\n", label, support);*/
+	      /*p->support = q->support = support;*/
+	      /*printf("%d %d %d %d\n", p->support, q->support, p->number, q->number);*/
+	      assert(p->number > tr->mxtips && q->number > tr->mxtips);
+	      *lcount = *lcount + 1;
+	    }
+	}
+      else	
+	(void) treeFlushLabel(fp);
+    }
+  else 
+    {   
+      ungetc(ch, fp);
+      if ((n = treeFindTipName(fp, tr, TRUE)) <= 0)          return FALSE;
+      q = tr->nodep[n];
+      if (tr->start->number > n)  tr->start = q;
+      (tr->ntips)++;
+    }
+  
+  if(readBranchLengths)
+    {
+      double branch;
+      if (! treeNeedCh(fp, ':', "in"))                 return FALSE;
+      if (! treeProcessLength(fp, &branch))            return FALSE;
+      
+      /*printf("Branch %8.20f %d\n", branch, tr->numBranches);*/
+      hookup(p, q, &branch, tr->numBranches);
+    }
+  else
+    {
+      fres = treeFlushLen(fp);
+      if(!fres) return FALSE;
+      
+      hookupDefault(p, q, tr->numBranches);
+    }
+  return TRUE;          
+} 
+
+
+
+
+
+
+
+
+
+
+
+
+static nodeptr uprootTree (tree *tr, nodeptr p, boolean readBranchLengths)
+{
+  nodeptr  q, r, s, start;
+  int      n, i;              
+
+  for(i = tr->mxtips + 1; i < 2 * tr->mxtips - 1; i++)
+    assert(i == tr->nodep[i]->number);
+  
+  if(isTip(p->number, tr->mxtips) || p->back) 
+    {
+      printf("ERROR: Unable to uproot tree.\n");
+      printf("       Inappropriate node marked for removal.\n");
+      assert(0);
+    }
+  
+  assert(p->back == (nodeptr)NULL);
+  
+  tr->nextnode = tr->nextnode - 1;
+
+  assert(tr->nextnode < 2 * tr->mxtips);
+  
+  n = tr->nextnode;               
+  
+  assert(tr->nodep[tr->nextnode]);
+
+  if (n != tr->mxtips + tr->ntips - 1) 
+    {
+      printf("ERROR: Unable to uproot tree.  Inconsistent\n");
+      printf("       number of tips and nodes for rooted tree.\n");
+      assert(0);
+    }
+
+  q = p->next->back;                  /* remove p from tree */
+  r = p->next->next->back;
+  assert(p->back == (nodeptr)NULL);
+    
+  if(readBranchLengths)
+    {
+      double b[NUM_BRANCHES];
+      int i;
+      for(i = 0; i < tr->numBranches; i++)
+	b[i] = (r->z[i] + q->z[i]);
+      hookup (q, r, b, tr->numBranches);
+    }
+  else    
+    hookupDefault(q, r, tr->numBranches);    
+
+  if(tr->constraintTree)
+    {    
+      if(tr->constraintVector[p->number] != 0)
+	{
+	  printf("Root node to remove should have top-level grouping of 0\n");
+	  assert(0);
+	}
+    }  
+ 
+  assert(!(isTip(r->number, tr->mxtips) && isTip(q->number, tr->mxtips))); 
+
+  assert(p->number > tr->mxtips);
+
+  if(tr->ntips > 2 && p->number != n) 
+    {    	
+      q = tr->nodep[n];            /* transfer last node's connections to p */
+      r = q->next;
+      s = q->next->next;
+      
+      if(tr->constraintTree)	
+	tr->constraintVector[p->number] = tr->constraintVector[q->number];       
+      
+      hookup(p,             q->back, q->z, tr->numBranches);   /* move connections to p */
+      hookup(p->next,       r->back, r->z, tr->numBranches);
+      hookup(p->next->next, s->back, s->z, tr->numBranches);           
+      
+      q->back = q->next->back = q->next->next->back = (nodeptr) NULL;
+    }
+  else    
+    p->back = p->next->back = p->next->next->back = (nodeptr) NULL;
+  
+  assert(tr->ntips > 2);
+  
+  start = findAnyTip(tr->nodep[tr->mxtips + 1], tr->mxtips);
+  
+  assert(isTip(start->number, tr->mxtips));
+  tr->rooted = FALSE;
+  return  start;
+}
+
+
+int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly)
+{
+  nodeptr  
+    p;
+  
+  int      
+    i, 
+    ch, 
+    lcount = 0; 
+
+  for (i = 1; i <= tr->mxtips; i++) 
+    {
+      tr->nodep[i]->back = (node *) NULL; 
+      /*if(topologyOnly)
+	tr->nodep[i]->support = -1;*/
+    }
+
+  for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+    {
+      tr->nodep[i]->back = (nodeptr)NULL;
+      tr->nodep[i]->next->back = (nodeptr)NULL;
+      tr->nodep[i]->next->next->back = (nodeptr)NULL;
+      tr->nodep[i]->number = i;
+      tr->nodep[i]->next->number = i;
+      tr->nodep[i]->next->next->number = i;
+
+      /*if(topologyOnly)
+	{
+	  tr->nodep[i]->support = -2;
+	  tr->nodep[i]->next->support = -2;
+	  tr->nodep[i]->next->next->support = -2;
+	  }*/
+    }
+
+  if(topologyOnly)
+    tr->start       = tr->nodep[tr->mxtips];
+  else
+    tr->start       = tr->nodep[1];
+
+  tr->ntips       = 0;
+  tr->nextnode    = tr->mxtips + 1;      
+ 
+  for(i = 0; i < tr->numBranches; i++)
+    tr->partitionSmoothed[i] = FALSE;
+  
+  tr->rooted      = FALSE;     
+
+  p = tr->nodep[(tr->nextnode)++]; 
+  
+  while((ch = treeGetCh(fp)) != '(' && ch != EOF);   /* skip to first '(' without spinning forever on EOF */
+      
+  if(!topologyOnly)
+    assert(readBranches == FALSE && readNodeLabels == FALSE);
+  
+       
+  if (! addElementLen(fp, tr, p, readBranches, readNodeLabels, &lcount))                 
+    assert(0);
+  if (! treeNeedCh(fp, ',', "in"))                
+    assert(0);
+  if (! addElementLen(fp, tr, p->next, readBranches, readNodeLabels, &lcount))
+    assert(0);
+  if (! tr->rooted) 
+    {
+      if ((ch = treeGetCh(fp)) == ',') 
+	{ 
+	  if (! addElementLen(fp, tr, p->next->next, readBranches, readNodeLabels, &lcount))
+	    assert(0);	    
+	}
+      else 
+	{                                    /*  A rooted format */
+	  tr->rooted = TRUE;
+	  if (ch != EOF)  (void) ungetc(ch, fp);
+	}	
+    }
+  else 
+    {      
+      p->next->next->back = (nodeptr) NULL;
+    }
+  if (! treeNeedCh(fp, ')', "in"))                
+    assert(0);
+
+  if(topologyOnly)
+    assert(!(tr->rooted && readNodeLabels));
+
+  (void) treeFlushLabel(fp);
+  
+  if (! treeFlushLen(fp))                         
+    assert(0);
+ 
+  if (! treeNeedCh(fp, ';', "at end of"))       
+    assert(0);
+  
+  if (tr->rooted) 
+    {     
+      assert(!readNodeLabels);
+
+      p->next->next->back = (nodeptr) NULL;      
+      tr->start = uprootTree(tr, p->next->next, FALSE);      
+      if (! tr->start)                              
+	{
+	  printf("FATAL ERROR UPROOTING TREE\n");
+	  assert(0);
+	}    
+    }
+  else    
+    tr->start = findAnyTip(p, tr->mxtips);    
+  
+  
+ 
+  assert(tr->ntips == tr->mxtips);
+  
+ 
+   
+  
+  return lcount;
+}
+
+
+static int randomInt(int n)
+{
+  return rand() % n;
+}
+
+static boolean  addElementLenMULT (FILE *fp, tree *tr, nodeptr p, int partitionCounter, int *partCount)
+{ 
+  nodeptr  q, r, s;
+  int      n, ch, fres, rn;
+  double randomResolution;
+  int old;
+    
+  tr->constraintVector[p->number] = partitionCounter; 
+
+  if ((ch = treeGetCh(fp)) == '(') 
+    {
+      *partCount = *partCount + 1;
+      old = *partCount;       
+      
+      n = (tr->nextnode)++;
+      if (n > 2*(tr->mxtips) - 2) 
+	{
+	  if (tr->rooted || n > 2*(tr->mxtips) - 1) 
+	    {
+	      printf("ERROR: Too many internal nodes.  Is tree rooted?\n");
+	      printf("       Deepest splitting should be a trifurcation.\n");
+	      return FALSE;
+	    }
+	  else 
+	    {
+	      tr->rooted = TRUE;	    
+	    }
+	}
+      q = tr->nodep[n];
+      tr->constraintVector[q->number] = *partCount;
+      if (! addElementLenMULT(fp, tr, q->next, old, partCount))        return FALSE;
+      if (! treeNeedCh(fp, ',', "in"))             return FALSE;
+      if (! addElementLenMULT(fp, tr, q->next->next, old, partCount))  return FALSE;
+                 
+      hookupDefault(p, q, tr->numBranches);
+
+      while((ch = treeGetCh(fp)) == ',')
+	{ 
+	  n = (tr->nextnode)++;
+	  if (n > 2*(tr->mxtips) - 2) 
+	    {
+	      if (tr->rooted || n > 2*(tr->mxtips) - 1) 
+		{
+		  printf("ERROR: Too many internal nodes.  Is tree rooted?\n");
+		  printf("       Deepest splitting should be a trifurcation.\n");
+		  return FALSE;
+		}
+	      else 
+		{
+		  tr->rooted = TRUE;
+		}
+	    }
+	  r = tr->nodep[n];
+	  tr->constraintVector[r->number] = *partCount;	  
+
+	  rn = randomInt(10000);
+	  if(rn == 0) 
+	    randomResolution = 0;
+	  else 
+	    randomResolution = ((double)rn)/10000.0;
+	   	  
+	   if(randomResolution < 0.5)
+	    {	    
+	      s = q->next->back;	      
+	      r->back = q->next;
+	      q->next->back = r;	      
+	      r->next->back = s;
+	      s->back = r->next;	      
+	      if (! addElementLenMULT(fp, tr, r->next->next, old, partCount)) return FALSE;
+	    }
+	  else
+	    {	  
+	      s = q->next->next->back;	      
+	      r->back = q->next->next;
+	      q->next->next->back = r;	      
+	      r->next->back = s;
+	      s->back = r->next;	      
+	      if (! addElementLenMULT(fp, tr, r->next->next, old, partCount)) return FALSE;
+	    }	    	  	  
+	}       
+
+      if(ch != ')')
+	{
+	  printf("Missing ')' in addElementLenMULT\n");
+	  exit(-1);	        
+	}
+	
+
+
+      (void) treeFlushLabel(fp);
+    }
+  else 
+    {                             
+      ungetc(ch, fp);
+      if ((n = treeFindTipName(fp, tr, TRUE)) <= 0)          return FALSE;
+      q = tr->nodep[n];      
+      tr->constraintVector[q->number] = partitionCounter;
+
+      if (tr->start->number > n)  tr->start = q;
+      (tr->ntips)++;
+      hookupDefault(p, q, tr->numBranches);
+    }
+  
+  fres = treeFlushLen(fp);
+  if(!fres) return FALSE;
+    
+  return TRUE;          
+} 
+
+
+
+
+boolean treeReadLenMULT (FILE *fp, tree *tr, int *partCount)
+{
+  nodeptr  p, r, s;
+  int      i, ch, n, rn;
+  int partitionCounter = 0;
+  double randomResolution;
+
+  srand(tr->randomSeed);
+  
+  for(i = 0; i < 2 * tr->mxtips; i++)
+    tr->constraintVector[i] = -1;
+
+  for (i = 1; i <= tr->mxtips; i++) 
+    tr->nodep[i]->back = (node *) NULL;
+
+  for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+    {
+      tr->nodep[i]->back = (nodeptr)NULL;
+      tr->nodep[i]->next->back = (nodeptr)NULL;
+      tr->nodep[i]->next->next->back = (nodeptr)NULL;
+      tr->nodep[i]->number = i;
+      tr->nodep[i]->next->number = i;
+      tr->nodep[i]->next->next->number = i;
+    }
+
+
+  tr->start       = tr->nodep[tr->mxtips];
+  tr->ntips       = 0;
+  tr->nextnode    = tr->mxtips + 1;
+ 
+  for(i = 0; i < tr->numBranches; i++)
+    tr->partitionSmoothed[i] = FALSE;
+
+  tr->rooted      = FALSE;
+ 
+  p = tr->nodep[(tr->nextnode)++]; 
+  while((ch = treeGetCh(fp)) != '(' && ch != EOF);   /* skip to first '(' without spinning forever on EOF */
+      
+  if (! addElementLenMULT(fp, tr, p, partitionCounter, partCount))                 return FALSE;
+  if (! treeNeedCh(fp, ',', "in"))                return FALSE;
+  if (! addElementLenMULT(fp, tr, p->next, partitionCounter, partCount))           return FALSE;
+  if (! tr->rooted) 
+    {
+      if ((ch = treeGetCh(fp)) == ',') 
+	{       
+	  if (! addElementLenMULT(fp, tr, p->next->next, partitionCounter, partCount)) return FALSE;
+
+	  while((ch = treeGetCh(fp)) == ',')
+	    { 
+	      n = (tr->nextnode)++;
+	      assert(n <= 2*(tr->mxtips) - 2);
+	
+	      r = tr->nodep[n];	
+	      tr->constraintVector[r->number] = partitionCounter;	   
+	      
+	      rn = randomInt(10000);
+	      if(rn == 0) 
+		randomResolution = 0;
+	      else 
+		randomResolution = ((double)rn)/10000.0;
+
+
+	      if(randomResolution < 0.5)
+		{	
+		  s = p->next->next->back;		  
+		  r->back = p->next->next;
+		  p->next->next->back = r;		  
+		  r->next->back = s;
+		  s->back = r->next;		  
+		  if (! addElementLenMULT(fp, tr, r->next->next, partitionCounter, partCount)) return FALSE;
+		}
+	      else
+		{
+		  s = p->next->back;		  
+		  r->back = p->next;
+		  p->next->back = r;		  
+		  r->next->back = s;
+		  s->back = r->next;		  
+		  if (! addElementLenMULT(fp, tr, r->next->next, partitionCounter, partCount)) return FALSE;
+		}
+	    }	  	  	      	  
+
+	  if(ch != ')')
+	    {
+	      printf("Missing ')' in treeReadLenMULT\n");
+	      exit(-1);	        	      	      
+	    }
+	  else
+	    ungetc(ch, fp);
+	}
+      else 
+	{ 
+	  tr->rooted = TRUE;
+	  if (ch != EOF)  (void) ungetc(ch, fp);
+	}       
+    }
+  else 
+    {
+      p->next->next->back = (nodeptr) NULL;
+    }
+    
+  if (! treeNeedCh(fp, ')', "in"))                return FALSE;
+  (void) treeFlushLabel(fp);
+  if (! treeFlushLen(fp))                         return FALSE;
+   
+  if (! treeNeedCh(fp, ';', "at end of"))       return FALSE;
+  
+
+  if (tr->rooted) 
+    {        
+      p->next->next->back = (nodeptr) NULL;
+      tr->start = uprootTree(tr, p->next->next, FALSE);
+      if (! tr->start)                              return FALSE;
+    }
+  else 
+    {     
+      tr->start = findAnyTip(p, tr->mxtips);
+    }
+
+  
+  
+  
+
+  assert(tr->ntips == tr->mxtips);
+  
+  return TRUE; 
+}
+
+
+void getStartingTree(tree *tr)
+{
+  FILE *treeFile = myfopen(tree_file, "rb");
+
+  tr->likelihood = unlikely;
+   
+  if(tr->constraintTree)
+    {
+      int 
+	partCount = 0;
+      if (! treeReadLenMULT(treeFile, tr, &partCount))
+	exit(-1);
+    }
+  else
+    treeReadLen(treeFile, tr, FALSE, FALSE, FALSE);
+               
+  fclose(treeFile);
+ 
+  tr->start = tr->nodep[1];
+}
+
+
+
diff --git a/gpl-3.0.txt b/gpl-3.0.txt
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/gpl-3.0.txt
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/manual/ExaML.backup.odt b/manual/ExaML.backup.odt
new file mode 100644
index 0000000..3eb065a
Binary files /dev/null and b/manual/ExaML.backup.odt differ
diff --git a/manual/ExaML.odt b/manual/ExaML.odt
new file mode 100644
index 0000000..ec5fcb1
Binary files /dev/null and b/manual/ExaML.odt differ
diff --git a/manual/ExaML.pdf b/manual/ExaML.pdf
new file mode 100644
index 0000000..424f2a9
Binary files /dev/null and b/manual/ExaML.pdf differ
diff --git a/parser/Makefile.SSE3.gcc b/parser/Makefile.SSE3.gcc
new file mode 100644
index 0000000..45b6fa2
--- /dev/null
+++ b/parser/Makefile.SSE3.gcc
@@ -0,0 +1,29 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = gcc 
+CFLAGS = -fomit-frame-pointer -O2 -D_GNU_SOURCE -msse -funroll-loops  #-Wall -Wunused-parameter -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototypes   -Wpointer-sign -Wextra -Wredundant-decls -Wunused -Wunused-fun [...]
+
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs    = axml.o parsePartitions.o
+
+all : clean parse-examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h 
+
+parse-examl : $(objs)
+	$(CC) -o parse-examl $(objs) $(LIBRARIES) 
+
+
+axml.o : axml.c $(GLOBAL_DEPS)
+parsePartitions.o : parsePartitions.c $(GLOBAL_DEPS)
+
+clean : 
+	$(RM) *.o parse-examl
+
+
+dev : parse-examl
\ No newline at end of file
diff --git a/parser/Makefile.check.warnings b/parser/Makefile.check.warnings
new file mode 100644
index 0000000..d3fe76d
--- /dev/null
+++ b/parser/Makefile.check.warnings
@@ -0,0 +1,26 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = clang
+CFLAGS = -fomit-frame-pointer -O2 -D_GNU_SOURCE -msse -funroll-loops -Weverything -Wno-padded  #-Wall -Wunused-parameter -Wredundant-decls  -Wreturn-type  -Wswitch-default -Wunused-value -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int -Wimport  -Wunused  -Wunused-function  -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast  -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs  -Wold-style-definition -Wstrict-prototypes   -Wpointer-sign -Wextra -Wredundant-de [...]
+
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs    = axml.o parsePartitions.o
+
+all : parse-examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h 
+
+parse-examl : $(objs)
+	$(CC) -o parse-examl $(objs) $(LIBRARIES) 
+
+
+axml.o : axml.c $(GLOBAL_DEPS)
+parsePartitions.o : parsePartitions.c $(GLOBAL_DEPS)
+
+clean : 
+	$(RM) *.o parse-examl
diff --git a/parser/USAGE b/parser/USAGE
new file mode 100644
index 0000000..308016d
--- /dev/null
+++ b/parser/USAGE
@@ -0,0 +1 @@
+./parse-examl -m DNA -s ../testdata/49 -q ../testdata/49.model -n 49
diff --git a/parser/axml.c b/parser/axml.c
new file mode 100644
index 0000000..2ff2c49
--- /dev/null
+++ b/parser/axml.c
@@ -0,0 +1,2895 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify its
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifdef WIN32
+#include <direct.h>
+#endif
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdarg.h>
+#include <limits.h>
+
+
+#ifdef  _FINE_GRAIN_MPI
+#include <mpi.h>
+#endif
+
+
+
+#ifdef _USE_PTHREADS
+#include <pthread.h>
+
+#endif
+
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+#include <xmmintrin.h>
+/*
+  Special bug fix: forces denormalized numbers to be flushed to zero.
+  Without it the program is a tiny bit faster, though.
+  #include <emmintrin.h> 
+  #define MM_DAZ_MASK    0x0040
+  #define MM_DAZ_ON    0x0040
+  #define MM_DAZ_OFF    0x0000
+*/
+#endif
+
+#include "axml.h"
+#include "globalVariables.h"
+
+
+#define _PORTABLE_PTHREADS
+
+
+
+
+/***************** UTILITY FUNCTIONS **************************/
+
+
+void myBinFwrite(const void *ptr, size_t size, size_t nmemb)
+{ 
+  size_t  
+    bytes_written = fwrite(ptr, size, nmemb, byteFile);
+  
+  assert(bytes_written == nmemb);
+}
+
+
+
+
+
+void *malloc_aligned(size_t size) 
+{
+  void 
+    *ptr = (void *)NULL;
+ 
+  int 
+    res;
+  
+
+#if defined (__APPLE__)
+  /* 
+     presumably malloc on Macs always returns 
+     a 16-byte aligned pointer
+  */
+
+  ptr = malloc(size);
+  
+  if(ptr == (void*)NULL) 
+   assert(0);
+  
+#ifdef __AVX
+  assert(0);
+#endif
+
+
+#else
+  res = posix_memalign( &ptr, BYTE_ALIGNMENT, size );
+
+  if(res != 0) 
+    assert(0);
+#endif 
+   
+  return ptr;
+}
+
+
+
+
+
+
+
+
+void printBothOpen(const char* format, ... )
+{
+  FILE *f = myfopen(infoFileName, "ab");
+
+  va_list args;
+  va_start(args, format);
+  vfprintf(f, format, args );
+  va_end(args);
+
+  va_start(args, format);
+  vprintf(format, args );
+  va_end(args);
+
+  fclose(f);
+}
+
+void printBothOpenMPI(const char* format, ... )
+{
+#ifdef _WAYNE_MPI
+  if(processID == 0)
+#endif
+    {
+      FILE *f = myfopen(infoFileName, "ab");
+
+      va_list args;
+      va_start(args, format);
+      vfprintf(f, format, args );
+      va_end(args);
+      
+      va_start(args, format);
+      vprintf(format, args );
+      va_end(args);
+      
+      fclose(f);
+    }
+}
+
+
+boolean getSmoothFreqs(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].smoothFrequencies;
+}
+
+const unsigned int *getBitVector(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].bitVector;
+}
+
+
+int getStates(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].states;
+}
+
+unsigned char getUndetermined(int dataType)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return pLengths[dataType].undetermined;
+}
+
+
+
+char getInverseMeaning(int dataType, unsigned char state)
+{
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  return  pLengths[dataType].inverseMeaning[state];
+}
+
+partitionLengths *getPartitionLengths(pInfo *p)
+{
+  int 
+    dataType  = p->dataType,
+    states    = p->states,
+    tipLength = p->maxTipStates;
+
+  assert(states != -1 && tipLength != -1);
+
+  assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+  pLength.leftLength = pLength.rightLength = states * states;
+  pLength.eignLength = states;
+  pLength.evLength   = states * states;
+  pLength.eiLength   = states * states;
+  pLength.substRatesLength = (states * states - states) / 2;
+  pLength.frequenciesLength = states;
+  pLength.tipVectorLength   = tipLength * states;
+  pLength.symmetryVectorLength = (states * states - states) / 2;
+  pLength.frequencyGroupingLength = states;  
+  pLength.nonGTR = FALSE;  
+  pLength.optimizeBaseFrequencies = FALSE;
+
+  return (&pLengths[dataType]); 
+}
+
+
+
+
+
+
+
+double gettime(void)
+{
+#ifdef WIN32
+  time_t tp;
+  struct tm localtm;
+  tp = time(NULL);
+  localtm = *localtime(&tp);
+  return 60.0*localtm.tm_min + localtm.tm_sec;
+#else
+  struct timeval ttime;
+  gettimeofday(&ttime , NULL);
+  return ttime.tv_sec + ttime.tv_usec * 0.000001;
+#endif
+}
+
+
+
+double randum (long  *seed)
+{
+  long  sum, mult0, mult1, seed0, seed1, seed2, newseed0, newseed1, newseed2;
+  double res;
+
+  mult0 = 1549;
+  seed0 = *seed & 4095;
+  sum  = mult0 * seed0;
+  newseed0 = sum & 4095;
+  sum >>= 12;
+  seed1 = (*seed >> 12) & 4095;
+  mult1 =  406;
+  sum += mult0 * seed1 + mult1 * seed0;
+  newseed1 = sum & 4095;
+  sum >>= 12;
+  seed2 = (*seed >> 24) & 255;
+  sum += mult0 * seed2 + mult1 * seed1;
+  newseed2 = sum & 255;
+
+  *seed = newseed2 << 24 | newseed1 << 12 | newseed0;
+  res = 0.00390625 * (newseed2 + 0.000244140625 * (newseed1 + 0.000244140625 * newseed0));
+
+  return res;
+}
+
+static int filexists(char *filename)
+{
+  FILE *fp;
+  int res;
+  fp = fopen(filename,"rb");
+
+  if(fp)
+    {
+      res = 1;
+      fclose(fp);
+    }
+  else
+    res = 0;
+
+  return res;
+}
+
+
+FILE *myfopen(const char *path, const char *mode)
+{
+  FILE *fp = fopen(path, mode);
+
+  if(strcmp(mode,"r") == 0 || strcmp(mode,"rb") == 0)
+    {
+      if(fp)
+	return fp;
+      else
+	{
+	  if(processID == 0)
+	    printf("\n Error: the file %s you want to open for reading does not exist, exiting ...\n\n", path);
+	  errorExit(-1);
+	  return (FILE *)NULL;
+	}
+    }
+  else
+    {
+      if(fp)
+	return fp;
+      else
+	{
+	  if(processID == 0)
+	    printf("\n Error: the file %s you want to open for writing or appending can not be opened [mode: %s], exiting ...\n\n",
+		   path, mode);
+	  errorExit(-1);
+	  return (FILE *)NULL;
+	}
+    }
+
+
+}
+
+
+
+
+
+/********************* END UTILITY FUNCTIONS ********************/
+
+
+/******************************some functions for the likelihood computation ****************************/
+
+
+
+
+
+
+
+
+
+
+/***********************reading and initializing input ******************/
+
+static void getnums (rawdata *rdta)
+{
+  if (fscanf(INFILE, "%d %d", & rdta->numsp, & rdta->sites) != 2)
+    {
+      if(processID == 0)
+	printf("\n Error: problem reading number of species and sites\n\n");
+      errorExit(-1);
+    }
+
+  if (rdta->numsp < 4)
+    {
+      if(processID == 0)
+	printf("\n Error: too few species\n\n");
+      errorExit(-1);
+    }
+
+  if (rdta->sites < 1)
+    {
+      if(processID == 0)
+	printf("\n Error: too few sites\n\n");
+      errorExit(-1);
+    }
+
+  return;
+}
+
+
+
+
+
+boolean whitechar (int ch)
+{
+  return (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r');
+}
+
+
+static void uppercase (int *chptr)
+{
+  int  ch;
+
+  ch = *chptr;
+  if ((ch >= 'a' && ch <= 'i') || (ch >= 'j' && ch <= 'r')
+      || (ch >= 's' && ch <= 'z'))
+    *chptr = ch + 'A' - 'a';
+}
+
+
+
+
+static void getyspace (rawdata *rdta)
+{
+  size_t size = 4 * ((size_t)(rdta->sites / 4 + 1));
+  
+ 
+
+  int    i;
+  unsigned char *y0;
+
+  rdta->y = (unsigned char **) malloc(((size_t)rdta->numsp + 1) * sizeof(unsigned char *));
+  assert(rdta->y);   
+
+  y0 = (unsigned char *)calloc(((size_t)(rdta->numsp + 1)) * size, sizeof(unsigned char));
+
+  /*
+    printf("Raw alignment data Assigning %Zu bytes\n", ((size_t)(rdta->numsp + 1)) * size * sizeof(unsigned char));
+
+  */
+
+  assert(y0);   
+
+  rdta->y0 = y0;
+
+  for (i = 0; i <= rdta->numsp; i++)
+    {
+      rdta->y[i] = y0;
+      y0 += size;
+    }
+
+  return;
+}
+
+
+
+
+static boolean setupTree (tree *tr, analdef *adef)
+{
+  nodeptr  
+    p0;
+  
+  int
+    tips,
+    inter; 
+
+  if(!adef->readTaxaOnly)
+    {
+      /*tr->bigCutoff = FALSE;*/
+
+      tr->patternPosition = (int*)NULL;
+      tr->columnPosition = (int*)NULL;
+
+      /*tr->maxCategories = MAX(4, adef->categories);*/
+
+      /*tr->partitionContributions = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+      for(i = 0; i < tr->NumberOfModels; i++)
+	tr->partitionContributions[i] = -1.0;
+
+      tr->perPartitionLH = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+      
+
+      for(i = 0; i < tr->NumberOfModels; i++)
+	{
+	  tr->perPartitionLH[i] = 0.0;	 
+	}
+
+      if(adef->grouping)
+	tr->grouped = TRUE;
+      else
+	tr->grouped = FALSE;
+
+      if(adef->constraint)
+	tr->constrained = TRUE;
+      else
+	tr->constrained = FALSE;
+
+	tr->treeID = 0;*/
+    }
+
+  tips  = tr->mxtips;
+  inter = tr->mxtips - 1;
+
+  if(!adef->readTaxaOnly)
+    {
+      tr->yVector      = (unsigned char **)  malloc(((size_t)tr->mxtips + 1) * sizeof(unsigned char *));
+
+      /*      tr->fracchanges  = (double *)malloc(tr->NumberOfModels * sizeof(double));
+	      tr->likelihoods  = (double *)malloc(adef->multipleRuns * sizeof(double));*/
+    }
+
+  /*tr->numberOfTrees = -1;
+
+ 
+
+  tr->treeStringLength = tr->mxtips * (nmlngth+128) + 256 + tr->mxtips * 2;
+
+  tr->tree_string  = (char*)calloc(tr->treeStringLength, sizeof(char)); 
+  tr->tree0 = (char*)calloc(tr->treeStringLength, sizeof(char));
+  tr->tree1 = (char*)calloc(tr->treeStringLength, sizeof(char));*/
+
+
+  /*TODO, must that be so long ?*/
+
+  if(!adef->readTaxaOnly)
+    {
+            
+      /*tr->td[0].count = 0;
+      tr->td[0].ti    = (traversalInfo *)malloc(sizeof(traversalInfo) * tr->mxtips);
+      tr->td[0].executeModel = (boolean *)malloc(sizeof(boolean) * tr->NumberOfModels);
+      tr->td[0].parameterValues = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+       
+      for(i = 0; i < tr->NumberOfModels; i++)
+	tr->fracchanges[i] = -1.0;
+      tr->fracchange = -1.0;
+
+      tr->constraintVector = (int *)malloc((2 * tr->mxtips) * sizeof(int));*/
+
+      tr->nameList = (char **)malloc(sizeof(char *) * ((size_t)tips + 1));
+    }
+
+  if (!(p0 = (nodeptr) malloc(((size_t)tips + 3 * (size_t)inter) * sizeof(node))))
+    {
+      printf("\n Error: unable to obtain sufficient tree memory\n\n");
+      return  FALSE;
+    }
+  
+ 
+
+  
+
+  tr->vLength = 0;
+
+  tr->h = (hashtable*)NULL;
+
+
+  return TRUE;
+}
+
+
+static void checkTaxonName(char *buffer, int len)
+{
+  int i;
+
+  for(i = 0; i < len - 1; i++)
+    {
+      boolean valid;
+
+      switch(buffer[i])
+	{
+	case '\0':
+	case '\t':
+	case '\n':
+	case '\r':
+	case ' ':
+	case ':':
+	case ',':
+	case '(':
+	case ')':
+	case ';':
+	case '[':
+	case ']':
+	  valid = FALSE;
+	  break;
+	default:
+	  valid = TRUE;
+	}
+
+      if(!valid)
+	{
+	  printf("\n Error: Taxon Name \"%s\" is invalid at position %d, it contains illegal character %c\n\n", buffer, i, buffer[i]);
+	  printf(" Illegal characters in taxon-names are: tabs, newlines, carriage returns, spaces, \":\", \",\", \")\", \"(\", \";\", \"]\", \"[\"\n");
+	  printf(" Exiting\n");
+	  exit(-1);
+	}
+
+    }
+  assert(buffer[len - 1] == '\0');
+}
+
+static boolean getdata(analdef *adef, rawdata *rdta, tree *tr)
+{
+  int   
+    i, 
+    j, 
+    basesread, 
+    basesnew, 
+    ch, my_i, meaning,
+    len,
+    meaningAA[256], 
+    meaningDNA[256], 
+    meaningBINARY[256],
+    meaningGeneric32[256],
+    meaningGeneric64[256];
+  
+  boolean  
+    allread, 
+    firstpass;
+  
+  char 
+    buffer[nmlngth + 2];
+  
+  unsigned char
+    genericChars32[32] = {'0', '1', '2', '3', '4', '5', '6', '7', 
+			  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+			  'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+			  'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'};  
+  unsigned long 
+    total = 0,
+    gaps  = 0;
+
+  for (i = 0; i < 256; i++)
+    {      
+      meaningAA[i]          = -1;
+      meaningDNA[i]         = -1;
+      meaningBINARY[i]      = -1;
+      meaningGeneric32[i]   = -1;
+      meaningGeneric64[i]   = -1;
+    }
+
+  /* generic 32 data */
+
+  for(i = 0; i < 32; i++)
+    meaningGeneric32[genericChars32[i]] = i;
+  meaningGeneric32['-'] = getUndetermined(GENERIC_32);
+  meaningGeneric32['?'] = getUndetermined(GENERIC_32);
+
+  /* AA data */
+
+  meaningAA['A'] =  0;  /* alanine */
+  meaningAA['R'] =  1;  /* arginine */
+  meaningAA['N'] =  2;  /* asparagine */
+  meaningAA['D'] =  3;  /* aspartic */
+  meaningAA['C'] =  4;  /* cysteine */
+  meaningAA['Q'] =  5;  /* glutamine */
+  meaningAA['E'] =  6;  /* glutamic */
+  meaningAA['G'] =  7;  /* glycine */
+  meaningAA['H'] =  8;  /* histidine */
+  meaningAA['I'] =  9;  /* isoleucine */
+  meaningAA['L'] =  10; /* leucine */
+  meaningAA['K'] =  11; /* lysine */
+  meaningAA['M'] =  12; /* methionine */
+  meaningAA['F'] =  13; /* phenylalanine */
+  meaningAA['P'] =  14; /* proline */
+  meaningAA['S'] =  15; /* serine */
+  meaningAA['T'] =  16; /* threonine */
+  meaningAA['W'] =  17; /* tryptophan */
+  meaningAA['Y'] =  18; /* tyrosine */
+  meaningAA['V'] =  19; /* valine */
+  meaningAA['B'] =  20; /* asparagine or aspartic (states 2 and 3) */
+  meaningAA['Z'] =  21; /* glutamine or glutamic (states 5 and 6) */
+
+  meaningAA['X'] = 
+    meaningAA['?'] = 
+    meaningAA['*'] = 
+    meaningAA['-'] = 
+    getUndetermined(AA_DATA);
+
+  /* DNA data */
+
+  meaningDNA['A'] =  1;
+  meaningDNA['B'] = 14;
+  meaningDNA['C'] =  2;
+  meaningDNA['D'] = 13;
+  meaningDNA['G'] =  4;
+  meaningDNA['H'] = 11;
+  meaningDNA['K'] = 12;
+  meaningDNA['M'] =  3;  
+  meaningDNA['R'] =  5;
+  meaningDNA['S'] =  6;
+  meaningDNA['T'] =  8;
+  meaningDNA['U'] =  8;
+  meaningDNA['V'] =  7;
+  meaningDNA['W'] =  9; 
+  meaningDNA['Y'] = 10;
+
+  meaningDNA['N'] = 
+    meaningDNA['O'] = 
+    meaningDNA['X'] = 
+    meaningDNA['-'] = 
+    meaningDNA['?'] = 
+    getUndetermined(DNA_DATA);
+
+  /* BINARY DATA */
+
+  meaningBINARY['0'] = 1;
+  meaningBINARY['1'] = 2;
+  
+  meaningBINARY['-'] = 
+    meaningBINARY['?'] = 
+    getUndetermined(BINARY_DATA);
+
+
+  /*******************************************************************/
+
+  basesread = basesnew = 0;
+
+  allread = FALSE;
+  firstpass = TRUE;
+  ch = ' ';
+
+  while (! allread)
+    {
+      for (i = 1; i <= tr->mxtips; i++)
+	{
+	  if (firstpass)
+	    {
+	      ch = getc(INFILE);
+	      while(ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r')
+		ch = getc(INFILE);
+
+	      my_i = 0;
+
+	      do
+		{
+		  buffer[my_i] = (char)ch;
+		  ch = getc(INFILE);
+		  my_i++;
+		  if(my_i >= nmlngth)
+		    {
+		      if(processID == 0)
+			{
+			  printf("Taxon Name too long at taxon %d, adapt constant nmlngth in\n", i);
+			  printf("axml.h, current setting %d\n", nmlngth);
+			}
+		      errorExit(-1);
+		    }
+		}
+	      while(ch !=  ' ' && ch != '\n' && ch != '\t' && ch != '\r');
+
+	      while(ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r')
+		ch = getc(INFILE);
+	      
+	      ungetc(ch, INFILE);
+
+	      buffer[my_i] = '\0';
+	      len = (int)strlen(buffer) + 1;
+	      checkTaxonName(buffer, len);
+	      tr->nameList[i] = (char *)malloc(sizeof(char) * (size_t)len);
+	      strcpy(tr->nameList[i], buffer);
+	    }
+
+	  j = basesread;
+
+	  while ((j < rdta->sites) && ((ch = getc(INFILE)) != EOF) && (ch != '\n') && (ch != '\r'))
+	    {
+	      uppercase(& ch);
+
+	      assert(tr->dataVector[j + 1] != -1);
+
+	      switch(tr->dataVector[j + 1])
+		{
+		case BINARY_DATA:
+		  meaning = meaningBINARY[ch];
+		  break;
+		case DNA_DATA:
+		case SECONDARY_DATA:
+		case SECONDARY_DATA_6:
+		case SECONDARY_DATA_7:
+		  /*
+		     still dealing with DNA/RNA here, hence just act as if they were DNA characters
+		     corresponding column merging for sec struct models will take place later
+		  */
+		  meaning = meaningDNA[ch];
+		  break;
+		case AA_DATA:
+		  meaning = meaningAA[ch];
+		  break;
+		case GENERIC_32:
+		  meaning = meaningGeneric32[ch];
+		  break;
+		case GENERIC_64:
+		  meaning = meaningGeneric64[ch];
+		  break;
+		default:
+		  assert(0);
+		}
+
+	      if (meaning != -1)
+		{
+		  j++;
+		  rdta->y[i][j] = (unsigned char)ch;		 
+		}
+	      else
+		{
+		  if(!whitechar(ch))
+		    {
+		      printf("\n Error: bad base (%c) at site %d of sequence %d\n\n",
+			     ch, j + 1, i);
+		      return FALSE;
+		    }
+		}
+	    }
+
+	  if (ch == EOF)
+	    {
+	      printf("\n Error: end-of-file at site %d of sequence %d\n\n", j + 1, i);
+	      return  FALSE;
+	    }
+
+	  if (! firstpass && (j == basesread))
+	    i--;
+	  else
+	    {
+	      if (i == 1)
+		basesnew = j;
+	      else
+		if (j != basesnew)
+		  {
+		    printf("\n Error: sequences out of alignment\n");
+		    printf("%d (instead of %d) residues read in sequence %d %s\n",
+			   j - basesread, basesnew - basesread, i, tr->nameList[i]);
+		    return  FALSE;
+		  }
+	    }
+	  while (ch != '\n' && ch != EOF && ch != '\r') ch = getc(INFILE);  /* flush line *//* PC-LINEBREAK*/
+	}
+
+      firstpass = FALSE;
+      basesread = basesnew;
+      allread = (basesread >= rdta->sites);
+    }
+
+  for(j = 1; j <= tr->mxtips; j++)
+    for(i = 1; i <= rdta->sites; i++)
+      {
+	assert(tr->dataVector[i] != -1);
+
+	switch(tr->dataVector[i])
+	  {
+	  case BINARY_DATA:
+	    meaning = meaningBINARY[rdta->y[j][i]];
+	    if(meaning == getUndetermined(BINARY_DATA))
+	      gaps++;
+	    break;
+
+	  case SECONDARY_DATA:
+	  case SECONDARY_DATA_6:
+	  case SECONDARY_DATA_7:
+	    assert(tr->secondaryStructurePairs[i - 1] != -1);
+	    assert(i - 1 == tr->secondaryStructurePairs[tr->secondaryStructurePairs[i - 1]]);
+	    /*
+	       don't worry too much about undetermined column count here for sec-struct, just count
+	       DNA/RNA gaps here and worry about the rest later-on, falling through to DNA again :-)
+	    */
+	  case DNA_DATA:
+	    meaning = meaningDNA[rdta->y[j][i]];
+	    if(meaning == getUndetermined(DNA_DATA))
+	      gaps++;
+	    break;
+
+	  case AA_DATA:
+	    meaning = meaningAA[rdta->y[j][i]];
+	    if(meaning == getUndetermined(AA_DATA))
+	      gaps++;
+	    break;
+
+	  case GENERIC_32:
+	    meaning = meaningGeneric32[rdta->y[j][i]];
+	    if(meaning == getUndetermined(GENERIC_32))
+	      gaps++;
+	    break;
+
+	  case GENERIC_64:
+	    meaning = meaningGeneric64[rdta->y[j][i]];
+	    if(meaning == getUndetermined(GENERIC_64))
+	      gaps++;
+	    break;
+	  default:
+	    assert(0);
+	  }
+
+	total++;
+	rdta->y[j][i] = (unsigned char)meaning;
+      }
+
+  adef->gapyness = (double)gaps / (double)total;
+    
+  /*myBinFwrite(&(adef->gapyness), sizeof(double), 1);*/
+
+  printf("gappyness: %f\n", adef->gapyness);
+  
+  /*for(i = 1; i <= tr->mxtips; i++)
+    {
+      int 
+	len = strlen(tr->nameList[i]) + 1;
+      
+      myBinFwrite(&len, sizeof(int), 1);
+      myBinFwrite(tr->nameList[i], sizeof(char), len);
+      
+      printf("%d %s\n", len, tr->nameList[i]);
+      }     */
+  
+  return  TRUE;
+}
+
+
+
+static void inputweights (rawdata *rdta)
+{
+  int i, w, fres;
+  FILE *weightFile;
+  int *wv = (int *)malloc(sizeof(int) *  (size_t)rdta->sites);
+
+  weightFile = myfopen(weightFileName, "rb");
+
+  i = 0;
+
+  while((fres = fscanf(weightFile,"%d", &w)) != EOF)
+    {
+      if(!fres)
+	{
+	  if(processID == 0)
+	    printf("error reading weight file, probably encountered a non-integer weight value\n");
+	  errorExit(-1);
+	}
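+      if(i < rdta->sites)  /* do not write past wv[] if the file holds too many weights; the count check below reports it */
+	wv[i] = w;
+      i++;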
+    }
+
+  if(i != rdta->sites)
+    {
+      if(processID == 0)
+	printf("number %d of weights not equal to number %d of alignment columns\n", i, rdta->sites);
+      errorExit(-1);
+    }
+
+  for(i = 1; i <= rdta->sites; i++)
+    rdta->wgt[i] = wv[i - 1];
+
+  fclose(weightFile);
+  free(wv);
+}
+
+/* polynomial string hash (multiplier 31), reduced modulo the table size */
+static hashNumberType hashString(char *p, hashNumberType tableSize)
+{
+  hashNumberType 
+    h = 0;
+  
+  for(; *p; p++)
+    h = 31 * h + (hashNumberType)*p;
+  
+  return (h % tableSize);
+}
+
+static void addword(char *s, stringHashtable *h, int nodeNumber)
+{
+  hashNumberType position = hashString(s, h->tableSize);
+  stringEntry *p = h->table[position];
+  
+  for(; p!= NULL; p = p->next)
+    {
+      if(strcmp(s, p->word) == 0)		 
+	return;	  	
+    }
+
+  p = (stringEntry *)malloc(sizeof(stringEntry));
+
+  assert(p);
+  
+  p->nodeNumber = nodeNumber;
+  p->word = (char *)malloc((strlen(s) + 1) * sizeof(char));
+
+  strcpy(p->word, s);
+  
+  p->next =  h->table[position];
+  
+  h->table[position] = p;
+}
+
+
+static stringHashtable *initStringHashTable(hashNumberType n)
+{
+  /* 
+     init with primes 
+  */
+    
+  static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+					     196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+					     50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+ 
+
+  /* init with powers of two
+
+  static const  hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+					      32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+					      4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+					      268435456, 536870912, 1073741824, 2147483648U};
+  */
+  
+  stringHashtable *h = (stringHashtable*)malloc(sizeof(stringHashtable));
+  
+  hashNumberType
+    tableSize,
+    i,
+    primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+    maxSize = (hashNumberType)-1;    
+
+  assert(n <= maxSize);
+
+  i = 0;
+
+  while(i < primeTableLength && initTable[i] < n)  /* check the index before reading initTable[i] */
+    i++;
+
+  assert(i < primeTableLength);
+
+  tableSize = initTable[i];  
+
+  h->table = (stringEntry**)calloc(tableSize, sizeof(stringEntry*));
+  h->tableSize = tableSize;    
+
+  return h;
+}
+
+
+
+static void getinput(analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr)
+{
+  int i;
+
+  INFILE = myfopen(seq_file, "rb");
+  
+  getnums(rdta);
+  
+     
+  /*myBinFwrite(&(rdta->sites), sizeof(int), 1);
+  myBinFwrite(&(rdta->numsp), sizeof(int), 1);  
+
+  printf("%d %d\n", rdta->sites, rdta->numsp);*/
+    
+
+  tr->mxtips            = rdta->numsp;
+  
+  
+  rdta->wgt             = (int *)    malloc(((size_t)rdta->sites + 1) * sizeof(int));
+  cdta->alias           = (int *)    malloc(((size_t)rdta->sites + 1) * sizeof(int));
+  cdta->aliaswgt        = (int *)    malloc(((size_t)rdta->sites + 1) * sizeof(int)); 
+  tr->model             = (int *)    calloc(((size_t)rdta->sites + 1), sizeof(int));
+  tr->initialDataVector  = (int *)    malloc(((size_t)rdta->sites + 1) * sizeof(int));
+  tr->extendedDataVector = (int *)    malloc(((size_t)rdta->sites + 1) * sizeof(int));         
+  
+  if(!adef->useWeightFile)
+    {
+      for (i = 1; i <= rdta->sites; i++)
+	rdta->wgt[i] = 1;
+    }
+  else
+    {
+      assert(!adef->useSecondaryStructure);
+      inputweights(rdta);
+    }
+
+  
+  if(adef->useMultipleModel)
+    {
+      int ref;
+      
+      parsePartitions(adef, rdta, tr);
+      
+      for(i = 1; i <= rdta->sites; i++)
+	{
+	  ref = tr->model[i];
+	  tr->initialDataVector[i] = tr->initialPartitionData[ref].dataType;
+	}
+    }
+  else
+    {
+      int dataType = -1;
+	              
+      tr->initialPartitionData  = (pInfo*)malloc(sizeof(pInfo));
+      tr->initialPartitionData->optimizeBaseFrequencies = FALSE;
+      
+      
+      tr->initialPartitionData[0].partitionName = (char*)malloc(128 * sizeof(char));
+      strcpy(tr->initialPartitionData[0].partitionName, "No Name Provided");
+      
+      tr->initialPartitionData[0].protModels = adef->proteinMatrix;
+      tr->initialPartitionData[0].protFreqs  = adef->protEmpiricalFreqs;
+      
+      
+      tr->NumberOfModels = 1;
+           
+      
+      if(adef->model == M_PROTCAT || adef->model == M_PROTGAMMA)
+	dataType = AA_DATA;
+      if(adef->model == M_GTRCAT || adef->model == M_GTRGAMMA)
+	dataType = DNA_DATA;
+      if(adef->model == M_BINCAT || adef->model == M_BINGAMMA)
+	dataType = BINARY_DATA;
+      if(adef->model == M_32CAT || adef->model == M_32GAMMA)
+	dataType = GENERIC_32;
+      if(adef->model == M_64CAT || adef->model == M_64GAMMA)
+	dataType = GENERIC_64;
+      
+      
+      
+      assert(dataType == BINARY_DATA || dataType == DNA_DATA || dataType == AA_DATA || 
+	     dataType == GENERIC_32  || dataType == GENERIC_64);
+      
+      tr->initialPartitionData[0].dataType = dataType;
+      
+      for(i = 0; i <= rdta->sites; i++)
+	{
+	  tr->initialDataVector[i] = dataType;
+	  tr->model[i]      = 0;
+	}
+    }
+  
+  if(adef->useSecondaryStructure)
+    {
+      memcpy(tr->extendedDataVector, tr->initialDataVector, ((size_t)rdta->sites + 1) * sizeof(int));
+      
+      tr->extendedPartitionData =(pInfo*)malloc(sizeof(pInfo) * (size_t)tr->NumberOfModels);
+      
+      for(i = 0; i < tr->NumberOfModels; i++)
+	{
+	  tr->extendedPartitionData[i].partitionName = (char*)malloc((strlen(tr->initialPartitionData[i].partitionName) + 1) * sizeof(char));
+	  strcpy(tr->extendedPartitionData[i].partitionName, tr->initialPartitionData[i].partitionName);
+	  tr->extendedPartitionData[i].dataType   = tr->initialPartitionData[i].dataType;
+	  
+	  tr->extendedPartitionData[i].protModels = tr->initialPartitionData[i].protModels;
+	  tr->extendedPartitionData[i].protFreqs  = tr->initialPartitionData[i].protFreqs;
+	}
+      
+      parseSecondaryStructure(tr, adef, rdta->sites);
+      
+      tr->dataVector    = tr->extendedDataVector;
+      tr->partitionData = tr->extendedPartitionData;
+    }
+  else
+    {
+      tr->dataVector    = tr->initialDataVector;
+      tr->partitionData = tr->initialPartitionData;
+    }
+  
+ 
+  
+  getyspace(rdta);
+
+  setupTree(tr, adef);
+
+      
+	
+  if(!getdata(adef, rdta, tr))
+    {
+      printf("Problem reading alignment file \n");
+      errorExit(1);
+    }
+      
+  tr->nameHash = initStringHashTable(10 * (size_t)tr->mxtips);
+  for(i = 1; i <= tr->mxtips; i++)
+    addword(tr->nameList[i], tr->nameHash, i);
+      
+  fclose(INFILE);
+}
+
+
+
+/* merge the DNA state codes v1 and v2 of two paired columns into one secondary structure state */
+static unsigned char buildStates(int secModel, unsigned char v1, unsigned char v2)
+{
+  unsigned char 
+    new = 0;
+
+  switch(secModel)
+    {
+    case SECONDARY_DATA:
+      new = v1;
+      new = new << 4;
+      new = new | v2;
+      break;
+    case SECONDARY_DATA_6:
+      {
+	int
+	  meaningDNA[256],
+	  i;
+
+	const unsigned char
+	  allowedStates[6][2] = {{'A','T'}, {'C', 'G'}, {'G', 'C'}, {'G','T'}, {'T', 'A'}, {'T', 'G'}};
+
+	const unsigned char
+	  finalBinaryStates[6] = {1, 2, 4, 8, 16, 32};
+
+	unsigned char
+	  intermediateBinaryStates[6];
+
+	int length = 6;
+
+	for(i = 0; i < 256; i++)
+	  meaningDNA[i] = -1;
+
+	meaningDNA['A'] =  1;
+	meaningDNA['B'] = 14;
+	meaningDNA['C'] =  2;
+	meaningDNA['D'] = 13;
+	meaningDNA['G'] =  4;
+	meaningDNA['H'] = 11;
+	meaningDNA['K'] = 12;
+	meaningDNA['M'] =  3;
+	meaningDNA['N'] = 15;
+	meaningDNA['O'] = 15;
+	meaningDNA['R'] =  5;
+	meaningDNA['S'] =  6;
+	meaningDNA['T'] =  8;
+	meaningDNA['U'] =  8;
+	meaningDNA['V'] =  7;
+	meaningDNA['W'] =  9;
+	meaningDNA['X'] = 15;
+	meaningDNA['Y'] = 10;
+	meaningDNA['-'] = 15;
+	meaningDNA['?'] = 15;
+
+	for(i = 0; i < length; i++)
+	  {
+	    unsigned char n1 = meaningDNA[allowedStates[i][0]];
+	    unsigned char n2 = meaningDNA[allowedStates[i][1]];
+
+	    new = n1;
+	    new = new << 4;
+	    new = new | n2;
+
+	    intermediateBinaryStates[i] = new;
+	  }
+
+	new = v1;
+	new = new << 4;
+	new = new | v2;
+
+	for(i = 0; i < length; i++)
+	  {
+	    if(new == intermediateBinaryStates[i])
+	      break;
+	  }
+	if(i < length)
+	  new = finalBinaryStates[i];
+	else
+	  {
+	    new = 0;
+	    for(i = 0; i < length; i++)
+	      {
+		if(v1 & meaningDNA[allowedStates[i][0]])
+		  {
+		    /*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+		    new |= finalBinaryStates[i];
+		  }
+		if(v2 & meaningDNA[allowedStates[i][1]])
+		  {
+		    /*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+		    new |= finalBinaryStates[i];
+		  }
+	      }
+	  }	
+      }
+      break;
+    case SECONDARY_DATA_7:
+      {
+	int
+	  meaningDNA[256],
+	  i;
+
+	const unsigned char
+	  allowedStates[6][2] = {{'A','T'}, {'C', 'G'}, {'G', 'C'}, {'G','T'}, {'T', 'A'}, {'T', 'G'}};
+
+	const unsigned char
+	  finalBinaryStates[7] = {1, 2, 4, 8, 16, 32, 64};
+
+	unsigned char
+	  intermediateBinaryStates[7];
+
+	for(i = 0; i < 256; i++)
+	  meaningDNA[i] = -1;
+
+	meaningDNA['A'] =  1;
+	meaningDNA['B'] = 14;
+	meaningDNA['C'] =  2;
+	meaningDNA['D'] = 13;
+	meaningDNA['G'] =  4;
+	meaningDNA['H'] = 11;
+	meaningDNA['K'] = 12;
+	meaningDNA['M'] =  3;
+	meaningDNA['N'] = 15;
+	meaningDNA['O'] = 15;
+	meaningDNA['R'] =  5;
+	meaningDNA['S'] =  6;
+	meaningDNA['T'] =  8;
+	meaningDNA['U'] =  8;
+	meaningDNA['V'] =  7;
+	meaningDNA['W'] =  9;
+	meaningDNA['X'] = 15;
+	meaningDNA['Y'] = 10;
+	meaningDNA['-'] = 15;
+	meaningDNA['?'] = 15;
+	
+
+	for(i = 0; i < 6; i++)
+	  {
+	    unsigned char n1 = meaningDNA[allowedStates[i][0]];
+	    unsigned char n2 = meaningDNA[allowedStates[i][1]];
+
+	    new = n1;
+	    new = new << 4;
+	    new = new | n2;
+
+	    intermediateBinaryStates[i] = new;
+	  }
+
+	new = v1;
+	new = new << 4;
+	new = new | v2;
+
+	for(i = 0; i < 6; i++)
+	  {
+	    /* exact match */
+	    if(new == intermediateBinaryStates[i])
+	      break;
+	  }
+	if(i < 6)
+	  new = finalBinaryStates[i];
+	else
+	  {
+	    /* distinguish between exact mismatches and partial mismatches */
+
+	    for(i = 0; i < 6; i++)
+	      if((v1 & meaningDNA[allowedStates[i][0]]) && (v2 & meaningDNA[allowedStates[i][1]]))
+		break;
+	    if(i < 6)
+	      {
+		/* printf("partial mismatch\n"); */
+
+		new = 0;
+		for(i = 0; i < 6; i++)
+		  {
+		    if((v1 & meaningDNA[allowedStates[i][0]]) && (v2 & meaningDNA[allowedStates[i][1]]))
+		      {
+			/*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+			new |= finalBinaryStates[i];
+		      }
+		    else
+		      new |=  finalBinaryStates[6];
+		  }
+	      }
+	    else
+	      new = finalBinaryStates[6];
+	  }	
+      }
+      break;
+    default:
+      assert(0);
+    }
+
+  return new;
+
+}
+
+
+
+static void adaptRdataToSecondary(tree *tr, rawdata *rdta)
+{
+  int *alias = (int*)calloc((size_t)rdta->sites, sizeof(int));
+  int i, j, realPosition;  
+
+  for(i = 0; i < rdta->sites; i++)
+    alias[i] = -1;
+
+  for(i = 0, realPosition = 0; i < rdta->sites; i++)
+    {
+      int partner = tr->secondaryStructurePairs[i];
+      if(partner != -1)
+	{
+	  assert(tr->dataVector[i+1] == SECONDARY_DATA || tr->dataVector[i+1] == SECONDARY_DATA_6 || tr->dataVector[i+1] == SECONDARY_DATA_7);
+
+	  if(i < partner)
+	    {
+	      for(j = 1; j <= rdta->numsp; j++)
+		{
+		  unsigned char v1 = rdta->y[j][i+1];
+		  unsigned char v2 = rdta->y[j][partner+1];
+
+		  assert(i+1 < partner+1);
+
+		  rdta->y[j][i+1] = buildStates(tr->dataVector[i+1], v1, v2);
+		}
+	      alias[realPosition] = i;
+	      realPosition++;
+	    }
+	}
+      else
+	{
+	  alias[realPosition] = i;
+	  realPosition++;
+	}
+    }
+
+  assert(rdta->sites - realPosition == tr->numberOfSecondaryColumns / 2);
+
+  rdta->sites = realPosition;
+
+  for(i = 0; i < rdta->sites; i++)
+    {
+      assert(alias[i] != -1);
+      tr->model[i+1]    = tr->model[alias[i]+1];
+      tr->dataVector[i+1] = tr->dataVector[alias[i]+1];
+      rdta->wgt[i+1] =  rdta->wgt[alias[i]+1];
+
+      for(j = 1; j <= rdta->numsp; j++)
+	rdta->y[j][i+1] = rdta->y[j][alias[i]+1];
+    }
+
+  free(alias);
+}
+
+/* Shell sort of the column index array: by partition first (if partitioned), then by column pattern */
+static void sitesort(rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+  int  gap, i, j, jj, jg, k, n, nsp;
+  int  
+    *index, 
+    *category = (int*)NULL;
+
+  boolean  flip, tied;
+  unsigned char  **data;
+
+  if(adef->useSecondaryStructure)
+    {
+      assert(tr->NumberOfModels > 1 && adef->useMultipleModel);
+
+      adaptRdataToSecondary(tr, rdta);
+    }
+
+  if(adef->useMultipleModel)    
+    category      = tr->model;
+  
+
+  index    = cdta->alias;
+  data     = rdta->y;
+  n        = rdta->sites;
+  nsp      = rdta->numsp;
+  index[0] = -1;
+
+
+  if(adef->compressPatterns)
+    {
+      for (gap = n / 2; gap > 0; gap /= 2)
+	{
+	  for (i = gap + 1; i <= n; i++)
+	    {
+	      j = i - gap;
+
+	      do
+		{
+		  jj = index[j];
+		  jg = index[j+gap];
+		  if(adef->useMultipleModel)
+		    {		     		      
+		      assert(category[jj] != -1 &&
+			     category[jg] != -1);
+		     
+		      flip = (category[jj] > category[jg]);
+		      tied = (category[jj] == category[jg]);		     
+
+		    }
+		  else
+		    {
+		      flip = 0;
+		      tied = 1;
+		    }
+
+		  for (k = 1; (k <= nsp) && tied; k++)
+		    {
+		      flip = (data[k][jj] >  data[k][jg]);
+		      tied = (data[k][jj] == data[k][jg]);
+		    }
+
+		  if (flip)
+		    {
+		      index[j]     = jg;
+		      index[j+gap] = jj;
+		      j -= gap;
+		    }
+		}
+	      while (flip && (j > 0));
+	    }
+	}
+    }
+}
+
+
+static void sitecombcrunch (rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+  
+  boolean  
+    tied;
+  
+  int
+    i,
+    sitei, 
+    j, 
+    sitej, 
+    k;
+
+  int 
+    *aliasModel = (int*)NULL,
+    *aliasSuperModel = (int*)NULL,        
+    undeterminedSites = 0;
+
+  if(adef->useMultipleModel)
+    {
+      aliasSuperModel = (int*)malloc(sizeof(int) * ((size_t)rdta->sites + 1));
+      aliasModel      = (int*)malloc(sizeof(int) * ((size_t)rdta->sites + 1));
+    } 
+
+  i = 0;
+  cdta->alias[0]    = cdta->alias[1];
+  cdta->aliaswgt[0] = 0;
+
+  if(adef->mode == PER_SITE_LL)
+    {      
+      assert(0);
+
+      /*
+      tr->patternPosition = (int*)malloc(sizeof(int) * rdta->sites);
+      tr->columnPosition  = (int*)malloc(sizeof(int) * rdta->sites);
+
+      for(i = 0; i < rdta->sites; i++)
+	{
+	  tr->patternPosition[i] = -1;
+	  tr->columnPosition[i]  = -1;
+	}
+      */
+    }
+
+  
+
+  i = 0;
+  for (j = 1; j <= rdta->sites; j++)
+    {
+      int 
+	allGap = TRUE;
+
+      unsigned char 
+	undetermined;
+
+      sitei = cdta->alias[i];
+      sitej = cdta->alias[j];      
+
+      undetermined = getUndetermined(tr->dataVector[sitej]);
+      
+      for(k = 1; k <= rdta->numsp; k++)
+	{	 
+	  if(rdta->y[k][sitej] != undetermined)
+	    {
+	      allGap = FALSE;
+	      break;
+	    }
+	}
+      
+      if(allGap)      
+      	undeterminedSites++;	 
+
+#ifdef _DEBUG_UNDET_REMOVAL
+      if(allGap)
+	printf("Skipping gap site %d\n", sitej);
+#endif
+          
+      if(!adef->compressPatterns)
+	tied = 0;
+      else
+	{
+	  if(adef->useMultipleModel)
+	    {	     
+	      tied = (tr->model[sitei] == tr->model[sitej]);
+	      if(tied)
+		assert(tr->dataVector[sitei] == tr->dataVector[sitej]);
+	    }
+	  else
+	    tied = 1;
+	}
+      
+      for (k = 1; tied && (k <= rdta->numsp); k++)
+	tied = (rdta->y[k][sitei] == rdta->y[k][sitej]);
+	      
+      assert(!(tied && allGap));
+
+      if(tied && !allGap)
+	{
+	  if(adef->mode == PER_SITE_LL)
+	    {
+	      tr->patternPosition[j - 1] = i;
+	      tr->columnPosition[j - 1] = sitej;
+	      /*printf("Pattern %d from column %d also at site %d\n", i, sitei, sitej);*/
+	    }
+
+
+	  cdta->aliaswgt[i] += rdta->wgt[sitej];
+	  if(adef->useMultipleModel)
+	    {
+	      aliasModel[i]      = tr->model[sitej];
+	      aliasSuperModel[i] = tr->dataVector[sitej];
+	    }
+	}
+      else
+	{
+	  if(!allGap)
+	    {
+	      if(cdta->aliaswgt[i] > 0) 
+		i++;
+	      
+	      if(adef->mode == PER_SITE_LL)
+		{
+		  tr->patternPosition[j - 1] = i;
+		  tr->columnPosition[j - 1] = sitej;
+		  /*printf("Pattern %d is from column %d\n", i, sitej);*/
+		}
+	      
+	      cdta->aliaswgt[i] = rdta->wgt[sitej];
+	      cdta->alias[i] = sitej;
+	      if(adef->useMultipleModel)
+		{
+		  aliasModel[i]      = tr->model[sitej];
+		  aliasSuperModel[i] = tr->dataVector[sitej];
+		}
+	    }	  
+	}
+    }
+
+  cdta->endsite = (size_t)i;
+  if (cdta->aliaswgt[i] > 0) 
+    cdta->endsite++;
+
+#ifdef _DEBUG_UNDET_REMOVAL
+  printf("included sites: %d\n", cdta->endsite);
+#endif
+
+  if(adef->mode == PER_SITE_LL)
+    {
+      assert(0);
+
+      for(i = 0; i < rdta->sites; i++)
+	{
+	  int p  = tr->patternPosition[i];
+	  int c  = tr->columnPosition[i];
+
+	  assert(p >= 0 && (size_t) p < cdta->endsite);
+	  assert(c >= 1 && c <= rdta->sites);
+	}
+    }
+
+
+  if(adef->useMultipleModel)
+    {
+      for(i = 0; i <= rdta->sites; i++)
+	{
+	  tr->model[i]      = aliasModel[i];
+	  tr->dataVector[i] = aliasSuperModel[i];	  
+	}
+    }
+
+  if(adef->useMultipleModel)
+    {
+      free(aliasModel);
+      free(aliasSuperModel);
+    }     
+
+  if(undeterminedSites > 0)    
+    printBothOpen("\nAlignment has %d completely undetermined sites that will be automatically removed from the binary alignment file\n\n", undeterminedSites);
+}
+
+
+static boolean makeweights (analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr)
+{
+  int  i;
+
+ 
+    
+  for (i = 1; i <= rdta->sites; i++)
+    cdta->alias[i] = i;
+
+  sitesort(rdta, cdta, tr, adef);
+  sitecombcrunch(rdta, cdta, tr, adef);  
+
+  return TRUE;
+}
+
+
+
+static boolean makevalues(rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+  int  
+    i, 
+    model, 
+    modelCounter;
+  
+  size_t 
+    j;
+
+  unsigned char
+    *y    = (unsigned char *)malloc(((size_t)rdta->numsp) * ((size_t)cdta->endsite) * sizeof(unsigned char));
+  
+
+  /*
+
+  printf("compressed data Assigning %Zu bytes\n", ((size_t)rdta->numsp) * ((size_t)cdta->endsite) * sizeof(unsigned char));
+
+  */
+  
+  
+    {
+      for (i = 1; i <= rdta->numsp; i++)
+	for (j = 0; j < cdta->endsite; j++)   
+	  y[(((size_t)(i - 1)) * ((size_t)cdta->endsite)) + j] = rdta->y[i][cdta->alias[j]];
+      
+      /*
+	printf("Free on raw data\n");
+      */
+
+      free(rdta->y0);
+      free(rdta->y);
+      
+    }
+
+  rdta->y0 = y;
+ 
+  if(!adef->useMultipleModel)
+    tr->NumberOfModels = 1;
+
+#ifdef _DEBUG_UNDET_REMOVAL
+  for(i = 0; i < cdta->endsite; i++)
+    printf("%d ", tr->model[i]);
+
+  printf("\n");
+#endif
+
+  if(adef->useMultipleModel)
+    {
+      tr->partitionData[0].lower = 0;
+
+      model        = tr->model[0];
+      modelCounter = 0;
+     
+      i            = 1;
+
+      while((size_t) i <  cdta->endsite)
+	{
+	  if(tr->model[i] != model)
+	    {
+	      tr->partitionData[modelCounter].upper     = (size_t)i;
+	      tr->partitionData[modelCounter + 1].lower = (size_t)i;
+
+	      model = tr->model[i];	     
+	      modelCounter++;
+	    }
+	  i++;
+	}
+
+      if(modelCounter <  tr->NumberOfModels - 1)
+	{
+	  printf("\nYou specified %d partitions, but after parsing and pre-processing ExaML only found %d partitions\n", tr->NumberOfModels, modelCounter + 1);
+	  printf("Presumably one or more partitions vanished because they consisted entirely of undetermined characters.\n");
+	  printf("Please fix your data!\n\n");
+	  exit(-1);
+	}
+
+
+      tr->partitionData[tr->NumberOfModels - 1].upper = (size_t)cdta->endsite;      
+    
+      for(i = 0; i < tr->NumberOfModels; i++)		  
+	tr->partitionData[i].width      = tr->partitionData[i].upper -  tr->partitionData[i].lower;
+	 
+      model        = tr->model[0];
+      modelCounter = 0;
+      tr->model[0] = modelCounter;
+      i            = 1;
+	
+      while((size_t) i < cdta->endsite)
+	{	 
+	  if(tr->model[i] != model)
+	    {
+	      model = tr->model[i];
+	      modelCounter++;
+	      tr->model[i] = modelCounter;
+	    }
+	  else
+	    tr->model[i] = modelCounter;
+	  i++;
+	}      
+    }
+  else
+    {
+      tr->partitionData[0].lower = 0;
+      tr->partitionData[0].upper = (size_t)cdta->endsite;
+      tr->partitionData[0].width =  tr->partitionData[0].upper -  tr->partitionData[0].lower;
+    }
+
+  tr->rdta       = rdta;
+  tr->cdta       = cdta; 
+
+  tr->originalCrunchedLength = tr->cdta->endsite;
+    
+  for(i = 0; i < rdta->numsp; i++)
+    tr->yVector[i + 1] = &(rdta->y0[(tr->originalCrunchedLength) * ((size_t)i)]);
+
+  return TRUE;
+}
+
+
+
+static void initAdef(analdef *adef)
+{  
+  adef->useSecondaryStructure  = FALSE;
+  adef->bootstrapBranchLengths = FALSE;
+  adef->model                  = M_GTRCAT;
+  adef->max_rearrange          = 21;
+  adef->stepwidth              = 5;
+  adef->initial                = adef->bestTrav = 10;
+  adef->initialSet             = FALSE;
+  adef->restart                = FALSE;
+  adef->mode                   = BIG_RAPID_MODE;
+  adef->categories             = 25;
+  adef->boot                   = 0;
+  adef->rapidBoot              = 0;
+  adef->useWeightFile          = FALSE;
+  adef->checkpoints            = 0;
+  adef->startingTreeOnly       = 0;
+  adef->multipleRuns           = 1;
+  adef->useMultipleModel       = FALSE;
+  adef->likelihoodEpsilon      = 0.1;
+  adef->constraint             = FALSE;
+  adef->grouping               = FALSE;
+  adef->randomStartingTree     = FALSE;
+  adef->parsimonySeed          = 0;
+  adef->proteinMatrix          = JTT;
+  adef->protEmpiricalFreqs     = 0;  
+  adef->useInvariant           = FALSE;
+  adef->permuteTreeoptimize    = FALSE;
+  adef->allInOne               = FALSE;
+  adef->likelihoodTest         = FALSE;
+  adef->perGeneBranchLengths   = FALSE;
+  adef->generateBS             = FALSE;
+  adef->bootStopping           = FALSE;
+  adef->gapyness               = 0.0;
+  adef->similarityFilterMode   = 0;
+  adef->useExcludeFile         = FALSE;
+  adef->userProteinModel       = FALSE;
+  adef->externalAAMatrix       = (double*)NULL;
+  adef->computeELW             = FALSE;
+  adef->computeDistance        = FALSE;
+  adef->thoroughInsertion      = FALSE;
+  adef->compressPatterns       = TRUE; 
+  adef->readTaxaOnly           = FALSE;
+  adef->meshSearch             = 0;
+  adef->useCheckpoint          = FALSE;
+  adef->leaveDropMode          = FALSE;
+  adef->slidingWindowSize      = 100;
+#ifdef _BAYESIAN 
+  adef->bayesian               = FALSE;
+#endif
+
+}
+
+
+
+
+static int dataExists(char *model, analdef *adef)
+{
+  /********** BINARY ********************/
+
+   if(strcmp(model, "BIN\0") == 0)
+    {
+      adef->model = M_BINGAMMA;      
+      return 1;
+    }  
+
+  /*********** DNA **********************/
+
+  if(strcmp(model, "DNA\0") == 0)
+    {
+      adef->model = M_GTRGAMMA;     
+      return 1;
+    }
+
+  /*************** AA GTR ********************/
+
+  if(strcmp(model, "PROT\0") == 0)
+    {
+      adef->model = M_PROTGAMMA;     
+      return 1;
+    } 
+
+  return 0;
+}
+
+/*********************************************************************************************/
+
+static void printVersionInfo(void)
+{
+  printf("\n\nThis is the parse-examl version %s released by Alexandros Stamatakis, Andre J. Aberer, and Alexey Kozlov in %s.\n\n",  programVersion, programDate); 
+}
+
+static void printREADME(void)
+{
+  printVersionInfo();
+  printf("\n");  
+  printf("\nTo report bugs, use the RAxML Google group\n");
+  printf("Please send us all input files, the exact invocation, details of the HW and operating system,\n");
+  printf("as well as all error messages printed to screen.\n\n\n");
+
+  printf("parse-examl\n");
+  printf("      -s sequenceFileName\n");
+  printf("      -n outputFileName\n");
+  printf("      -m substitutionModel\n");
+  printf("      [-c]\n");
+  printf("      [-q]\n");
+  printf("      [-h]\n");
+  printf("\n"); 
+  printf("      -m Model of Nucleotide or Amino Acid Substitution:\n");
+  printf("\n"); 
+  printf("              For Binary data use: BIN\n");
+  printf("              For DNA data use:    DNA\n");	
+  printf("              For AA data use:     PROT\n");			   
+  printf("\n"); 
+  printf("      -c      disable site pattern compression\n");
+  printf("\n");
+  printf("      -q      Specify the file name which contains the assignment of models to alignment\n");
+  printf("              partitions for multiple models of substitution. For the syntax of this file\n");
+  printf("              please consult the manual.\n");  
+  printf("\n");
+  printf("      -h      Display this help message.\n");
+  printf("\n");
+  printf("\n\n\n\n");
+
+}
+
+static int mygetopt(int argc, char **argv, char *opts, int *optind, char **optarg)
+{
+  static int sp = 1;
+  register int c;
+  register char *cp;
+
+  if(sp == 1)
+    {
+      if(*optind >= argc || argv[*optind][0] != '-' || argv[*optind][1] == '\0')
+        {
+	  return -1;
+        }
+    }
+  else
+    {
+      if(strcmp(argv[*optind], "--") == 0)
+	{
+	  *optind =  *optind + 1;
+	  return -1;
+	}
+    }
+
+  c = argv[*optind][sp];
+  if(c == ':' || (cp=strchr(opts, c)) == 0)
+    {
+      if(argv[*optind][++sp] == '\0')
+	{
+	  *optind =  *optind + 1;
+	  sp = 1;
+	}
+      printf("\n Error: illegal option -- %c\n\n", c);
+      return('?');
+    }
+  if(*++cp == ':')
+    {
+      if(argv[*optind][sp+1] != '\0')
+	{
+	  *optarg = &argv[*optind][sp+1];
+	  *optind =  *optind + 1;
+	}
+      else
+	{
+	  *optind =  *optind + 1;
+	  if(*optind >= argc)
+	    {
+	      if ( c != 'h')	
+                {
+	          sp = 1;
+                  printf("\n Error: option -- %c requires an argument\n\n", c);
+	          return('?');
+                }
+               else
+                  return ( c );
+	    }
+	  else
+	    {
+	      *optarg = argv[*optind];
+	      *optind =  *optind + 1;
+	    }
+	}
+      sp = 1;
+    }
+  else
+    {
+      if(argv[*optind][++sp] == '\0')
+	{
+	  sp = 1;
+	  *optind =  *optind + 1;
+	}
+      *optarg = 0;
+    }
+  return(c);
+  }
+
+
+/*********************************************************************************************/
+
+
+
+
+
+
+
+
+
+static void analyzeRunId(char id[128])
+{
+  int i = 0;
+
+  while(id[i] != '\0')
+    {    
+      if(i >= 128)
+	{
+	  printf("\n Error: run id after \"-n\" is too long (%d characters), please use a shorter one\n\n", i);
+	  assert(0);
+	}
+      
+      if(id[i] == '/')
+	{
+	  printf("\n Error: character %c not allowed in run ID\n\n", id[i]);
+	  assert(0);
+	}
+
+
+      i++;
+    }
+
+  if(i == 0)
+    {
+      printf("\n Error: please provide a string for the run id after \"-n\" \n\n");
+      assert(0);
+    }
+
+}
+
+
+static void get_args(int argc, char *argv[], analdef *adef, tree *tr)
+{
+  boolean
+    bad_opt    =FALSE;
+
+  char
+    *optarg = (char*)NULL,
+    model[2048] = "";
+  
+  int  
+    optind = 1,        
+    c,
+    nameSet = 0,
+    alignmentSet = 0,     
+    modelSet = 0;
+
+
+  run_id[0] = 0; 
+  seq_file[0] = 0;
+  model[0] = 0;
+  weightFileName[0] = 0;
+  modelFileName[0] = 0;
+
+  /*********** tr inits **************/
+
+#ifdef _USE_PTHREADS
+  NumberOfThreads = 0;
+#endif
+  
+ 
+  tr->bootStopCriterion = -1;
+  tr->wcThreshold = 0.03;
+  tr->doCutoff = TRUE;
+  tr->secondaryStructureModel = SEC_16; /* default setting */
+  tr->searchConvergenceCriterion = FALSE;
+  tr->catOnly = FALSE;
+ 
+  tr->multiStateModel  = GTR_MULTI_STATE;
+  tr->useGappedImplementation = FALSE;
+  tr->saveMemory = FALSE;
+  
+
+
+  
+  /********* tr inits end*************/
+
+
+    while( !bad_opt && ( ( c = mygetopt(argc,argv,"q:s:n:m:hc", &optind, &optarg ) ) != -1 ) )
+    {
+    switch(c)
+      {                
+      case 'c':
+	adef->compressPatterns = FALSE;
+	break;
+      case 'h':
+        printREADME();
+	errorExit(-1);
+        break;                 
+      case 'q':
+	strcpy(modelFileName,optarg);
+	adef->useMultipleModel = TRUE;
+        break;                 
+      case 'n':
+        strcpy(run_id,optarg);
+	analyzeRunId(run_id);
+	nameSet = 1;
+        break;     
+      case 's':
+	strcpy(seq_file, optarg);
+	alignmentSet = 1;
+	break;
+      case 'm':
+	strcpy(model,optarg);
+	if(dataExists(model, adef) == 0)
+	  {
+	    printf("\n Error: model %s does not exist\n\n", model);               
+	    errorExit(-1);
+	  }
+	else
+	  modelSet = 1;
+	break; 
+      default:
+	errorExit(-1);
+    }
+  }  
+
+  if(!adef->useMultipleModel && !modelSet)
+    {
+      if(processID == 0)
+        {
+          printREADME();	    
+	  printf("\n Error: you must specify a data type for an unpartitioned alignment with the \"-m\" option\n\n");
+        }
+      errorExit(-1);
+    }
+
+  if(!nameSet)
+    {
+      if(processID == 0)
+        {
+          printREADME();	    
+	  printf("\n Error: please specify a name for this run with -n\n\n");
+        }
+      errorExit(-1);
+    }
+
+
+  if(!alignmentSet)
+    {
+      if(processID == 0)
+        {
+          printREADME();	    
+	  printf("\n Error: please specify an alignment for this run with -s\n\n");
+        } 
+      errorExit(-1);
+    }
+  
+  
+   strcat(infoFileName,         "RAxML_info."); 
+   strcat(infoFileName,         run_id);
+  
+   if(processID == 0)
+     {
+       int infoFileExists = 0;
+       
+       infoFileExists = filexists(infoFileName);
+       
+       if(infoFileExists)
+	 {
+	   printf("\n Error: output files with the run ID <%s> already exist... exiting\n\n", run_id);
+	   exit(-1);
+	 }
+     }
+
+  strcat(byteFileName, run_id);
+  strcat(byteFileName, ".binary");
+  
+  if(filexists(byteFileName))
+    {
+      printf("\n Error: binary compressed file %s you want to generate already exists... exiting\n\n", byteFileName);
+      exit(-1);
+    }
+
+  byteFile = fopen(byteFileName, "wb");
+
+  if ( !byteFile )  
+    { printf("\n Error: cannot open binary output file %s for writing ... exiting\n\n", byteFileName); exit(-1); }
+
+  return;
+}
+
+
+
+
+void errorExit(int e)
+{
+
+#ifdef _WAYNE_MPI
+  MPI_Finalize();
+#endif
+
+  exit(e);
+
+}
+
+
+
+
+
+ 
+
+
+
+
+/***********************reading and initializing input ******************/
+
+
+/********************PRINTING various INFO **************************************/
+
+
+
+
+
+void getDataTypeString(tree *tr, int model, char typeOfData[1024])
+{
+  switch(tr->partitionData[model].dataType)
+    {
+    case AA_DATA:
+      strcpy(typeOfData,"AA");
+      break;
+    case DNA_DATA:
+      strcpy(typeOfData,"DNA");
+      break;
+    case BINARY_DATA:
+      strcpy(typeOfData,"BINARY/MORPHOLOGICAL");
+      break;
+    case SECONDARY_DATA:
+      strcpy(typeOfData,"SECONDARY 16 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case SECONDARY_DATA_6:
+      strcpy(typeOfData,"SECONDARY 6 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case SECONDARY_DATA_7:
+      strcpy(typeOfData,"SECONDARY 7 STATE MODEL USING ");
+      strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+      break;
+    case GENERIC_32:
+      strcpy(typeOfData,"Multi-State");
+      break;
+    case GENERIC_64:
+      strcpy(typeOfData,"Codon"); 
+      break;
+    default:
+      assert(0);
+    }
+}
+
+
+
+
+
+/************************************************************************************/
+
+
+
+
+
+  
+
+
+
+static int iterated_bitcount(unsigned int n)
+{
+    int 
+      count=0;    
+    
+    while(n)
+      {
+        count += n & 0x1u ;    
+        n >>= 1 ;
+      }
+    
+    return count;
+}
+
+static char bits_in_16bits [0x1u << 16];
+
+static void compute_bits_in_16bits(void)
+{
+    unsigned int i;    
+    
+    for (i = 0; i < (0x1u<<16); i++)
+        bits_in_16bits[i] = iterated_bitcount(i);
+    
+    return ;
+}
+
+unsigned int precomputed16_bitcount (unsigned int n)
+{
+  /* works only for 32-bit int*/
+    
+    return bits_in_16bits [n         & 0xffffu]
+        +  bits_in_16bits [(n >> 16) & 0xffffu] ;
+}
+
+
+
+
+
+
+
+static void smoothFreqs(const int n, double *pfreqs, double *dst, pInfo *partitionData)
+{
+  int 
+    countScale = 0, 
+    l,
+    loopCounter = 0;  
+  
+
+ 
+  for(l = 0; l < n; l++)
+    if(pfreqs[l] < FREQ_MIN)
+      countScale++;
+ 
+
+  /* for(l = 0; l < n; l++)
+    if(pfreqs[l] == 0.0)
+    countScale++;*/
+
+  if(countScale > 0)
+    {	     
+      while(countScale > 0)
+	{
+	  double correction = 0.0;
+	  double factor = 1.0;
+	  
+	  for(l = 0; l < n; l++)
+	    {
+	      if(pfreqs[l] == 0.0)		  
+		correction += FREQ_MIN;		   		  
+	      else
+		if(pfreqs[l] < FREQ_MIN)		    
+		  {
+		    correction += (FREQ_MIN - pfreqs[l]);
+		    factor -= (FREQ_MIN - pfreqs[l]);
+		  }
+	    }		      	    	    
+	  
+	  countScale = 0;
+	  
+	  for(l = 0; l < n; l++)
+	    {		    
+	      if(pfreqs[l] >= FREQ_MIN)		      
+		pfreqs[l] = pfreqs[l] - (pfreqs[l] * correction * factor);	
+	      else
+		pfreqs[l] = FREQ_MIN;
+	      
+	      if(pfreqs[l] < FREQ_MIN)
+		countScale++;
+	    }
+	  assert(loopCounter < 100);
+	  loopCounter++;
+	}		    
+    }
+
+  for(l = 0; l < n; l++)
+    dst[l] = pfreqs[l];
+
+  
+  if(partitionData->nonGTR)
+    {
+      int k;
+
+      assert(partitionData->dataType == SECONDARY_DATA_7 || partitionData->dataType == SECONDARY_DATA_6 || partitionData->dataType == SECONDARY_DATA);
+       
+      for(l = 0; l < n; l++)
+	{
+	  int count = 1;	
+	  
+	  for(k = 0; k < n; k++)
+	    {
+	      if(k != l && partitionData->frequencyGrouping[l] == partitionData->frequencyGrouping[k])
+		{
+		  count++;
+		  dst[l] += pfreqs[k];
+		}
+	    }
+	  dst[l] /= ((double)count);
+	}            
+     }  
+}
+	    
+
+static void genericBaseFrequencies(tree *tr, const int numFreqs, rawdata *rdta, cruncheddata *cdta, int lower, int upper, int model, boolean smoothFrequencies,
+				   const unsigned int *bitMask)
+{
+  double 
+    wj, 
+    acc,
+    pfreqs[64], 
+    sumf[64],   
+    temp[64];
+ 
+  int     
+    countStatesPresent = 0,
+    statesPresent[64],
+    i, 
+    j, 
+    k, 
+    l;
+
+  unsigned char  
+    *yptr;  
+	  
+  for(l = 0; l < numFreqs; l++)	    
+    {
+      pfreqs[l] = 1.0 / ((double)numFreqs);
+      statesPresent[l] = 0;
+    }
+
+#ifdef _DEBUG_UNDET_REMOVAL	  
+  printf("bounds %d %d\n", lower, upper);
+
+  for(j = lower; j < upper; j++) 
+    {
+      for(i = 0; i < rdta->numsp; i++)
+	{
+	  unsigned int 
+	    code;
+
+	  yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+	  
+	  code = yptr[j];
+
+	  printf("%c",  inverseMeaningDNA[code]);
+	}
+      printf("\n");
+    }
+  
+  printf("\n\n");
+#endif
+
+  for(i = 0; i < rdta->numsp; i++) 
+    {
+      yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+      
+      for(j = lower; j < upper; j++) 
+	{
+	  unsigned int 	      
+	    code = bitMask[yptr[j]];
+	  
+	  switch(numFreqs)
+	    {
+	    case 2:
+	      switch(code)
+		{
+		case 1:
+		  statesPresent[0] = 1;
+		  break;
+		case 2:
+		  statesPresent[1] = 1;
+		  break;
+		default:
+		  ;
+		}
+	      break;
+	    case 4:
+	      switch(code)
+		{
+		case 1:
+		  statesPresent[0] = 1;
+		  break;
+		case 2:
+		  statesPresent[1] = 1;
+		  break;
+		case 4:
+		  statesPresent[2] = 1;
+		  break;
+		case 8:
+		  statesPresent[3] = 1;
+		  break;
+		default:
+		  ;
+		}
+	      break;	       
+	    case 20:
+	      if(yptr[j] >= 0 && yptr[j] < 20)
+		statesPresent[yptr[j]] = 1;
+	      break;
+	    default:
+	      assert(0);
+	    }
+	}
+    }
+	      
+  for(i = 0, countStatesPresent = 0; i < numFreqs; i++)
+    if(statesPresent[i] == 1)
+      countStatesPresent++;
+
+  for (k = 1; k <= 8; k++) 
+    {	     	   	    	      			    
+      for(l = 0; l < numFreqs; l++)
+	sumf[l] = 0.0;
+	      
+      for(i = 0; i < rdta->numsp; i++) 
+	{		 
+	  yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+	  
+	  for(j = lower; j < upper; j++) 
+	    {
+	      unsigned int 
+		code = bitMask[yptr[j]];
+	      
+	      assert(code >= 1);
+	      
+	      for(l = 0; l < numFreqs; l++)
+		{
+		  if((code >> l) & 1)
+		    temp[l] = pfreqs[l];
+		  else
+		    temp[l] = 0.0;
+		}		      	      
+	      
+	      for(l = 0, acc = 0.0; l < numFreqs; l++)
+		{
+		  if(temp[l] != 0.0)
+		    acc += temp[l];
+		}
+	      
+	      wj = ((double)cdta->aliaswgt[j]) / acc;
+	      
+	      for(l = 0; l < numFreqs; l++)
+		{
+		  if(temp[l] != 0.0)		    
+		    sumf[l] += wj * temp[l];			     				   			     		   
+		}
+	    }
+	}	    	      
+      
+      for(l = 0, acc = 0.0; l < numFreqs; l++)
+	{
+	  if(sumf[l] != 0.0)
+	    acc += sumf[l];
+	}
+	      
+      for(l = 0; l < numFreqs; l++)
+	pfreqs[l] = sumf[l] / acc;	     
+    }
+  
+  if(countStatesPresent < numFreqs)
+    {
+      printf("Partition %s number %d has a problem: the number of expected states is %d, but the number of states that are present is %d.\n", 
+	     tr->partitionData[model].partitionName, model, numFreqs, countStatesPresent);
+      printf("Please go and fix your data!\n\n");
+    }
+
+  if(smoothFrequencies)         
+    {          
+      smoothFreqs(numFreqs, pfreqs,  tr->partitionData[model].frequencies, &(tr->partitionData[model]));	   
+    }
+  else    
+    {
+      boolean 
+	zeroFreq = FALSE;
+
+      char 
+	typeOfData[1024];
+
+      getDataTypeString(tr, model, typeOfData);  
+
+      for(l = 0; l < numFreqs; l++)
+	{
+	  if(pfreqs[l] == 0.0)
+	    {
+	      printBothOpen("Empirical base frequency for state number %d is equal to zero in %s data partition %s\n", l, typeOfData, tr->partitionData[model].partitionName);
+	      printBothOpen("Since this is probably not what you want to do, RAxML will soon exit.\n\n");
+	      zeroFreq = TRUE;
+	    }
+	}
+      
+      if(zeroFreq)
+      	exit(-1);
+
+      for(l = 0; l < numFreqs; l++)
+	{
+	  assert(pfreqs[l] > 0.0);
+	  tr->partitionData[model].frequencies[l] = pfreqs[l];
+	}   
+    }  
+}
+
+
+
+
+
+
+
+static void baseFrequenciesGTR(rawdata *rdta, cruncheddata *cdta, tree *tr)
+{  
+  int
+    model;
+
+  size_t
+    lower,
+    upper;
+  
+  int
+    states;
+
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {      
+      lower = tr->partitionData[model].lower;
+      upper = tr->partitionData[model].upper;	  	 
+      states = tr->partitionData[model].states;
+	
+      switch(tr->partitionData[model].dataType)
+	{
+	case GENERIC_32:
+	  switch(tr->multiStateModel)
+	    {
+	    case ORDERED_MULTI_STATE:
+	    case MK_MULTI_STATE:	   
+	      {	       
+		int 
+		  i;
+		double 
+		  freq = 1.0 / (double)states,
+		  acc = 0.0;
+
+		for(i = 0; i < states; i++)
+		  {
+		    acc += freq;
+		    tr->partitionData[model].frequencies[i] = freq;
+		    /*printf("%f \n", freq);*/
+		  }
+		/*printf("Frequency Deviation: %1.60f\n", acc);*/
+	      }
+	      break;
+	     case GTR_MULTI_STATE:
+	      genericBaseFrequencies(tr, states, rdta, cdta, lower, upper, model, TRUE,
+				     bitVector32);
+	      break;
+	    default:
+	      assert(0);
+	    }
+	  break;
+	case GENERIC_64:	 
+	  assert(0);
+	  break;
+	case SECONDARY_DATA_6:
+	case SECONDARY_DATA_7:
+	case SECONDARY_DATA:
+	case AA_DATA:
+	case DNA_DATA:
+	case BINARY_DATA:	  
+	  genericBaseFrequencies(tr, states, rdta, cdta, lower, upper, model, 
+				 getSmoothFreqs(tr->partitionData[model].dataType),
+				 getBitVector(tr->partitionData[model].dataType));	  	 
+	  break;	
+	default:
+	  assert(0);     
+	}      
+    }
+  
+  return;
+}
+
+ // #define OLD_LAYOUT 
+
+int main (int argc, char *argv[])
+{
+  int model;
+
+  rawdata      *rdta;
+  cruncheddata *cdta;
+  tree         *tr;
+  analdef      *adef;
+  
+  /* get the start time */
+
+  masterTime = gettime();
+
+  /* get some memory for the basic data structures */
+
+  adef = (analdef *)malloc(sizeof(analdef));
+  rdta = (rawdata *)malloc(sizeof(rawdata));
+  cdta = (cruncheddata *)malloc(sizeof(cruncheddata));
+  tr   = (tree *)malloc(sizeof(tree));
+
+
+  /* precompute the 16-bit bit-count lookup table that is used by precomputed16_bitcount() */
+
+  compute_bits_in_16bits();
+
+  /* initialize the analysis parameters in struct adef to default values */
+
+  initAdef(adef);
+
+  /* parse command line arguments: this has a side effect on tr struct and adef struct variables */
+
+  get_args(argc,argv, adef, tr); 
+            
+  /* parse the phylip file: this should probably be re-done, perhaps using the relatively flexible parser 
+     written in C++ by Marc Holder */
+  
+  getinput(adef, rdta, cdta, tr);  
+
+  printBothOpen("Pattern compression: %s\n", (adef->compressPatterns)?"ON":"OFF");
+
+  makeweights(adef, rdta, cdta, tr);         
+      
+  makevalues(rdta, cdta, tr, adef);                 
+                   
+                  
+  for(model = 0; model < tr->NumberOfModels; model++)
+    {	      
+      tr->partitionData[model].states = getStates(tr->partitionData[model].dataType);
+      tr->partitionData[model].maxTipStates = getUndetermined(tr->partitionData[model].dataType) + 1;  	      
+      tr->partitionData[model].nonGTR = FALSE;
+      
+      partitionLengths 
+	*pl = getPartitionLengths(&(tr->partitionData[model]));
+      
+      tr->partitionData[model].frequencies       = (double*)malloc(pl->frequenciesLength * sizeof(double));      
+    }   
+  
+  baseFrequenciesGTR(tr->rdta, tr->cdta, tr); 
+  
+ 
+
+ 
+
+  {
+    int 
+      sizeOfSizeT = sizeof(size_t),
+      version = (int)programVersionInt,
+      magicNumber = 6517718;
+    
+    size_t 
+      i,
+      model;   
+    
+    /* NEW: first, we write how many bytes size_t comprises */
+    
+    myBinFwrite(&(sizeOfSizeT),                sizeof(sizeOfSizeT), 1); 
+
+    //error checking for parser!
+    myBinFwrite(&version,     sizeof(int), 1);
+    myBinFwrite(&magicNumber, sizeof(int), 1);
+    //error checking for correct parser end
+
+    myBinFwrite(&(tr->mxtips),                 sizeof(int), 1);
+    myBinFwrite(&(tr->originalCrunchedLength), sizeof(size_t), 1);
+    myBinFwrite(&(tr->NumberOfModels),         sizeof(int), 1);
+    myBinFwrite(&(adef->gapyness),             sizeof(double), 1);
+    
+    myBinFwrite(tr->cdta->aliaswgt,               sizeof(int), tr->originalCrunchedLength);	  	  	       	
+	
+    for(i = 1; i <= (size_t)tr->mxtips; i++)
+      {
+	int len = strlen(tr->nameList[i]) + 1;
+	myBinFwrite(&len, sizeof(int), 1);
+	myBinFwrite(tr->nameList[i], sizeof(char), len);	
+      }  
+    
+    for(model = 0; model < (size_t)tr->NumberOfModels; model++)
+      {
+	int 
+	  len;
+	
+	pInfo 
+	  *p = &(tr->partitionData[model]);
+	
+	
+	myBinFwrite(&(p->states),             sizeof(int), 1);
+	myBinFwrite(&(p->maxTipStates),       sizeof(int), 1);
+	myBinFwrite(&(p->lower),              sizeof(size_t), 1);
+	myBinFwrite(&(p->upper),              sizeof(size_t), 1);
+	myBinFwrite(&(p->width),              sizeof(size_t), 1);
+	myBinFwrite(&(p->dataType),           sizeof(int), 1);
+	myBinFwrite(&(p->protModels),         sizeof(int), 1);	
+	myBinFwrite(&(p->protFreqs),          sizeof(int), 1);	
+	myBinFwrite(&(p->nonGTR),                      sizeof(boolean), 1); 	
+	myBinFwrite(&(p->optimizeBaseFrequencies),     sizeof(boolean), 1);
+	
+	
+	
+	/* later on if adding secondary structure data
+	   
+	   int    *symmetryVector;
+	   int    *frequencyGrouping;
+	*/
+	
+	len = strlen(p->partitionName) + 1;
+	myBinFwrite(&len, sizeof(int), 1);
+	myBinFwrite(p->partitionName, sizeof(char), len);	    
+	myBinFwrite(tr->partitionData[model].frequencies, sizeof(double), tr->partitionData[model].states);
+
+
+	
+
+      }	            
+
+#ifdef OLD_LAYOUT
+    myBinFwrite(rdta->y0, sizeof(unsigned char), (tr->originalCrunchedLength) * ((size_t)tr->mxtips)); 
+#else 
+    /* 
+       Write each partition, taxon by taxon. Thus, if unpartitioned,
+       nothing changes.
+    */   
+
+    size_t
+      mem_reqs_cat = 0,
+      mem_reqs_gamma = 0,
+      unique_patterns = 0;
+
+    for(model = 0; model < (size_t) tr->NumberOfModels; ++model )
+      {
+        pInfo
+          *p  = &(tr->partitionData[model]); 
+	
+        size_t 
+          width = p->upper - p->lower; 
+
+	unique_patterns += width;
+
+	//multiply partition width by the number of states we need to store in each CLV entry
+
+	mem_reqs_cat += (size_t)tr->partitionData[model].states * width;	
+
+        for(i = 0; i < (size_t)tr->mxtips; ++i)
+          {
+            myBinFwrite(rdta->y0
+                        + sizeof(unsigned char) * (  (i *  tr->originalCrunchedLength)  + p->lower   ) 
+                        , sizeof(unsigned char), width); 
+          }
+      }
+
+    printBothOpen("\n\nYour alignment has %zu %s\n", unique_patterns, (adef->compressPatterns == TRUE)?"unique patterns":"sites");
+
+    //multiply CLV vector length by the number of tips and by 8, since 8 bytes (sizeof(double)) are needed to store each entry of an inner conditional probability vector
+    mem_reqs_cat *= (size_t)tr->mxtips * sizeof(double);
+
+    //mem reqs for gamma are 4 times higher than for CAT
+    mem_reqs_gamma = mem_reqs_cat * 4;
+
+    //now add the space for storing the tips:
+
+    mem_reqs_cat   += (size_t)tr->mxtips * unique_patterns * sizeof(unsigned char);
+    mem_reqs_gamma += (size_t)tr->mxtips * unique_patterns * sizeof(unsigned char);
+     
+    printBothOpen("\n\nUnder CAT the memory required by ExaML for storing CLVs and tip vectors will be\n%zu bytes\n%zu kiloBytes\n%zu MegaBytes\n%zu GigaBytes\n", 
+		  mem_reqs_cat, 
+		  mem_reqs_cat / 1024 , 
+		  mem_reqs_cat / (1024 * 1024),
+		  mem_reqs_cat / (1024 * 1024 * 1024));
+    
+    printBothOpen("\n\nUnder GAMMA the memory required by ExaML for storing CLVs and tip vectors will be\n%zu bytes\n%zu kiloBytes\n%zu MegaBytes\n%zu GigaBytes\n", 
+		  mem_reqs_gamma, 
+		  mem_reqs_gamma / 1024 , 
+		  mem_reqs_gamma / (1024 * 1024),
+		  mem_reqs_gamma / (1024 * 1024 * 1024));
+
+    printBothOpen("\nPlease note that these are just the memory requirements for doing likelihood calculations!\n");
+    printBothOpen("To be on the safe side, we recommend that you execute ExaML on a system with twice that memory.\n");
+
+#endif
+  }
+
+  fclose(byteFile);  
+
+  printBothOpen("\n\nBinary and compressed alignment file written to file %s\n\n", byteFileName);
+  printBothOpen("Parsing completed, exiting now ... \n\n");
+
+  return 0;
+}
diff --git a/parser/axml.h b/parser/axml.h
new file mode 100644
index 0000000..7a97136
--- /dev/null
+++ b/parser/axml.h
@@ -0,0 +1,1295 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ *  and
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ *
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
+ *  with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include "../versionHeader/version.h"
+
+
+#ifdef __AVX
+#define BYTE_ALIGNMENT 32
+#else
+#define BYTE_ALIGNMENT 16
+#endif
+
+
+
+
+
+#define MAX_TIP_EV     0.999999999 /* max tip vector value, sum of EVs needs to be smaller than 1.0, otherwise the numerics break down */
+#define smoothings     32          /* maximum smoothing passes through tree */
+#define iterations     10          /* maximum number of iterations per insert */
+#define newzpercycle   1           /* iterations of makenewz per tree traversal */
+#define nmlngth        256         /* number of characters in species name */
+#define deltaz         0.00001     /* test of net branch length change in update */
+#define defaultz       0.9         /* value of z assigned as starting point */
+#define unlikely       -1.0E300    /* low likelihood for initialization */
+
+
+#define SUMMARIZE_LENGTH -3
+#define SUMMARIZE_LH     -2
+#define NO_BRANCHES      -1
+
+#define MASK_LENGTH 32
+#define GET_BITVECTOR_LENGTH(x) ((x % MASK_LENGTH) ? (x / MASK_LENGTH + 1) : (x / MASK_LENGTH))
+
+#define zmin       1.0E-15  /* max branch prop. to -log(zmin) (= 34) */
+#define zmax (1.0 - 1.0E-6) /* min branch prop. to 1.0-zmax (= 1.0E-6) */
+
+#define twotothe256  \
+  115792089237316195423570985008687907853269984665640564039457584007913129639936.0  
+                                                     /*  2**256 (exactly)  */
+
+#define minlikelihood  (1.0/twotothe256)
+#define minusminlikelihood -minlikelihood
+
+
+
+
+/* 18446744073709551616.0 */
+
+/*4294967296.0*/
+
+/* 18446744073709551616.0 */
+
+/*  2**64 (exactly)  */
+/* 4294967296 2**32 */
+
+#define badRear         -1
+
+//#define NUM_BRANCHES     1
+
+#define TRUE             1
+#define FALSE            0
+
+
+
+#define LIKELIHOOD_EPSILON 0.0000001
+
+#define AA_SCALE 10.0
+#define AA_SCALE_PLUS_EPSILON 10.001
+
+/* ALPHA_MIN is critical -> numerical instability, e.g. for 4 discrete rate cats                  */
+/* and alpha = 0.01 the lowest rate r_0 is                                                        */
+/* 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487             */
+/* which leads to numerical problems. Table for alpha settings below:                             */
+/*                                                                                                */
+/* 0.010000 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487    */
+/* 0.010000 yielded nasty numerical bugs in at least one case !                                   */
+/* 0.020000 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875    */
+/* 0.030000 0.00000000000000000000476844846859006690412039180149775802624789852441798419292220    */
+/* 0.040000 0.00000000000000049522423236954066431210260930029681736928018820007024736185030633    */
+/* 0.050000 0.00000000000050625351310359203371872643495343928538368616365517027588794007897377    */
+/* 0.060000 0.00000000005134625283884191118711474021861409372524676086868566926568746566772461    */
+/* 0.070000 0.00000000139080650074206434685544624965062437960128249869740102440118789672851562    */
+/* 0.080000 0.00000001650681201563587066858709818343436959153791576682124286890029907226562500    */
+/* 0.090000 0.00000011301977332931251259273962858978301859735893231118097901344299316406250000    */
+/* 0.100000 0.00000052651925834844387815526344648331402709118265192955732345581054687500000000    */
+
+
+#define ALPHA_MIN    0.02
+#define ALPHA_MAX    1000.0
+
+#define RATE_MIN     0.0000001
+#define RATE_MAX     1000000.0
+
+#define INVAR_MIN    0.0001
+#define INVAR_MAX    0.9999
+
+#define TT_MIN       0.0000001
+#define TT_MAX       1000000.0
+
+#define FREQ_MIN     0.001
+
+/* 
+   previous values between 0.001 and 0.000001
+
+   TO AVOID NUMERICAL PROBLEMS WHEN FREQ == 0 IN PARTITIONED MODELS, ESPECIALLY WITH AA 
+   previous value of FREQ_MIN was: 0.000001, but this seemed to cause problems with some 
+   of the 7-state secondary structure models with some rather exotic small toy test datasets,
+   on the other hand 0.001 caused problems with some of the 16-state secondary structure models
+
+   For some reason the frequency settings seem to be repeatedly causing numerical problems
+   
+*/
+
+#define ITMAX 100
+
+
+
+#define SHFT(a,b,c,d)                (a)=(b);(b)=(c);(c)=(d);
+#define SIGN(a,b)                    ((b) > 0.0 ? fabs(a) : -fabs(a))
+
+#define ABS(x)    (((x)<0)   ?  (-(x)) : (x))
+#define MIN(x,y)  (((x)<(y)) ?    (x)  : (y))
+#define MAX(x,y)  (((x)>(y)) ?    (x)  : (y))
+#define NINT(x)   ((int) ((x)>0 ? ((x)+0.5) : ((x)-0.5)))
+
+#ifdef _USE_FPGA_LOG
+extern double log_approx (double input);
+#define LOG(x)  log_approx(x)
+#else
+#define LOG(x)  log(x)
+#endif
+
+
+#ifdef _USE_FPGA_EXP
+extern double exp_approx (double x);
+#define EXP(x)  exp_approx(x)
+#else
+#define EXP(x)  exp(x)
+#endif
+
+
+#define LOGF(x) logf(x)
+
+
+#define PointGamma(prob,alpha,beta)  PointChi2(prob,2.0*(alpha))/(2.0*(beta))
+
+//#define programName        "the phylip file parser for ExaML"
+//#define programVersion     "2.0.1"
+//#define programDate        "June 3 2014"
+
+
+#define  TREE_EVALUATION            0
+#define  BIG_RAPID_MODE             1
+#define  CALC_BIPARTITIONS          3
+#define  SPLIT_MULTI_GENE           4
+#define  CHECK_ALIGNMENT            5
+#define  PER_SITE_LL                6
+#define  PARSIMONY_ADDITION         7
+#define  CLASSIFY_ML                9
+#define  DISTANCE_MODE              11
+#define  GENERATE_BS                12
+#define  COMPUTE_ELW                13
+#define  BOOTSTOP_ONLY              14
+#define  COMPUTE_LHS                17
+#define  COMPUTE_BIPARTITION_CORRELATION 18
+#define  THOROUGH_PARSIMONY         19
+#define  COMPUTE_RF_DISTANCE        20
+#define  MORPH_CALIBRATOR           21
+#define  CONSENSUS_ONLY             22
+#define  MESH_TREE_SEARCH           23
+#define  FAST_SEARCH                24
+#define  MORPH_CALIBRATOR_PARSIMONY 25
+#define  SH_LIKE_SUPPORTS           28
+
+#define M_GTRCAT         1
+#define M_GTRGAMMA       2
+#define M_BINCAT         3
+#define M_BINGAMMA       4
+#define M_PROTCAT        5
+#define M_PROTGAMMA      6
+#define M_32CAT          7
+#define M_32GAMMA        8
+#define M_64CAT          9
+#define M_64GAMMA        10
+
+
+#define DAYHOFF    0
+#define DCMUT      1
+#define JTT        2
+#define MTREV      3
+#define WAG        4
+#define RTREV      5
+#define CPREV      6
+#define VT         7
+#define BLOSUM62   8
+#define MTMAM      9
+#define LG         10
+#define MTART      11
+#define MTZOA      12
+#define PMB        13
+#define HIVB       14
+#define HIVW       15
+#define JTTDCMUT   16
+#define FLU        17 
+#define STMTREV    18
+#define AUTO       19
+#define LG4M       20
+#define LG4X       21
+#define GTR        22  /* GTR always needs to be the last one */
+
+#define NUM_PROT_MODELS 23
+
+/* bipartition stuff */
+
+#define BIPARTITIONS_ALL       0
+#define GET_BIPARTITIONS_BEST  1
+#define DRAW_BIPARTITIONS_BEST 2
+#define BIPARTITIONS_BOOTSTOP  3
+#define BIPARTITIONS_RF  4
+
+
+
+/* bootstopping stuff */
+
+#define BOOTSTOP_PERMUTATIONS 100
+#define START_BSTOP_TEST      10
+
+#define FC_THRESHOLD          99
+#define FC_SPACING            50
+#define FC_LOWER              0.99
+#define FC_INIT               20
+
+#define FREQUENCY_STOP 0
+#define MR_STOP        1
+#define MRE_STOP       2
+#define MRE_IGN_STOP   3
+
+#define MR_CONSENSUS 0
+#define MRE_CONSENSUS 1
+#define STRICT_CONSENSUS 2
+
+
+
+/* bootstopping stuff end */
+
+
+#define TIP_TIP     0
+#define TIP_INNER   1
+#define INNER_INNER 2
+
+#define MIN_MODEL        -1
+#define BINARY_DATA      0
+#define DNA_DATA         1
+#define AA_DATA          2
+#define SECONDARY_DATA   3
+#define SECONDARY_DATA_6 4
+#define SECONDARY_DATA_7 5
+#define GENERIC_32       6
+#define GENERIC_64       7
+#define MAX_MODEL        8
+
+#define SEC_6_A 0
+#define SEC_6_B 1
+#define SEC_6_C 2
+#define SEC_6_D 3
+#define SEC_6_E 4
+
+#define SEC_7_A 5
+#define SEC_7_B 6
+#define SEC_7_C 7
+#define SEC_7_D 8
+#define SEC_7_E 9
+#define SEC_7_F 10
+
+#define SEC_16   11
+#define SEC_16_A 12
+#define SEC_16_B 13
+#define SEC_16_C 14
+#define SEC_16_D 15
+#define SEC_16_E 16
+#define SEC_16_F 17
+#define SEC_16_I 18
+#define SEC_16_J 19
+#define SEC_16_K 20
+
+#define ORDERED_MULTI_STATE 0
+#define MK_MULTI_STATE      1
+#define GTR_MULTI_STATE     2
+
+
+
+
+
+#define CAT         0
+#define GAMMA       1
+#define GAMMA_I     2
+
+
+
+typedef  int boolean;
+
+
+typedef struct {
+  double lh;
+  int tree;
+  double weight;
+} elw;
+
+struct ent
+{
+  unsigned int *bitVector;
+  unsigned int *treeVector;
+  unsigned int amountTips;
+  int *supportVector;
+  unsigned int bipNumber;
+  unsigned int bipNumber2;
+  unsigned int supportFromTreeset[2]; 
+  struct ent *next;
+};
+
+typedef struct ent entry;
+
+typedef unsigned int hashNumberType;
+
+typedef unsigned int parsimonyNumber;
+
+/*typedef uint_fast32_t parsimonyNumber;*/
+
+#define PCF 32
+
+/*
+  typedef uint64_t parsimonyNumber;
+
+  #define PCF 16
+
+
+typedef unsigned char parsimonyNumber;
+
+#define PCF 2
+*/
+
+typedef struct
+{
+  hashNumberType tableSize;
+  entry **table;
+  hashNumberType entryCount;
+}
+  hashtable;
+
+
+struct stringEnt
+{
+  int nodeNumber;
+  char *word;
+  struct stringEnt *next;
+};
+
+typedef struct stringEnt stringEntry;
+ 
+typedef struct
+{
+  hashNumberType tableSize;
+  stringEntry **table;
+}
+  stringHashtable;
+
+
+typedef struct
+{
+  unsigned int  parsimonyScore;
+  unsigned int  parsimonyState;
+}
+  parsimonyVector;
+
+
+typedef struct ratec
+{
+  double accumulatedSiteLikelihood;
+  double rate;
+}
+  rateCategorize;
+
+
+typedef struct
+{
+  int tipCase;
+  int pNumber;
+  int qNumber;
+  int rNumber;
+  //double qz[NUM_BRANCHES];
+  //double rz[NUM_BRANCHES];
+} traversalInfo;
+
+typedef struct
+{
+  traversalInfo *ti;
+  int count;
+  int functionType;
+  boolean traversalHasChanged;
+  boolean *executeModel;
+  double  *parameterValues;
+} traversalData;
+
+
+struct noderec;
+
+typedef struct epBrData
+{
+  int    *countThem;
+  int    *executeThem;
+  unsigned int *parsimonyScores;
+  double *branches;
+  double *bootstrapBranches;
+  double *likelihoods;
+  double originalBranchLength;
+  char branchLabel[64];
+  int leftNodeNumber;
+  int rightNodeNumber;
+  int *leftScaling;
+  int *rightScaling;
+  parsimonyVector *leftParsimony;
+  parsimonyVector *rightParsimony;
+  //double branchLengths[NUM_BRANCHES];
+  double *left;
+  double *right;
+  int branchNumber; 
+} epaBranchData;
+
+typedef struct
+{
+  epaBranchData *epa;
+
+  unsigned int *vector; 
+  int support;   
+  struct noderec *oP;
+  struct noderec *oQ;
+} branchInfo;
+
+
+
+
+
+
+
+
+typedef struct
+{
+  boolean valid;
+  int partitions;
+  int *partitionList;
+}
+  linkageData;
+
+typedef struct
+{
+  int entries;
+  linkageData* ld;
+}
+  linkageList;
+
+
+typedef  struct noderec
+{
+ 
+  branchInfo      *bInf;
+  //  double           z[NUM_BRANCHES];
+#ifdef _BAYESIAN 
+  //double           z_tmp[NUM_BRANCHES];
+#endif 
+  struct noderec  *next;
+  struct noderec  *back;
+  hashNumberType   hash;
+  int              support;
+  int              number;
+  char             x;
+}
+  node, *nodeptr;
+
+typedef struct
+  {
+    double lh;
+    int number;
+  }
+  info;
+
+typedef struct bInf {
+  double likelihood;
+  nodeptr node;
+} bestInfo;
+
+typedef struct iL {
+  bestInfo *list;
+  int n;
+  int valid;
+} infoList;
+
+
+
+
+typedef  struct
+{
+  int              numsp;
+  int              sites;
+  unsigned char             **y;
+  unsigned char             *y0; 
+  int              *wgt;
+} rawdata;
+
+typedef  struct {
+  int             *alias;       /* site representing a pattern */
+  int             *aliaswgt;    /* weight by pattern */
+  int             *rateCategory;
+  size_t              endsite;     /* # of sequence patterns */
+  double          *patrat;      /* rates per pattern */
+  double          *patratStored; 
+} cruncheddata;
+
+
+
+
+typedef struct {
+  int     states;
+  int     maxTipStates;
+  size_t     lower;
+  size_t     upper;
+  size_t     width;
+  int     dataType;
+  int     protModels;
+  int     autoProtModels;
+  int     protFreqs;
+  int             **expVector;
+  double          **xVector;
+  size_t           *xSpaceVector;
+ 
+  unsigned char            **yVector;
+  char   *partitionName;
+  double *sumBuffer;
+ 
+  double *gammaRates;
+
+  double *EIGN;
+  double *EV;
+
+
+
+  double *EI;
+
+
+
+  
+
+  double *left;
+  double *right;
+
+
+
+
+  double *frequencies;
+  double *tipVector; 
+  double *substRates;
+  
+  
+  double *perSiteRates;
+
+  double *wr;
+  double *wr2;
+
+  
+
+  unsigned int    *globalScaler;
+  double          *globalScalerDouble; 
+  int    *wgt;
+ 
+  int    *rateCategory;
+  int    *symmetryVector;
+  int    *frequencyGrouping;
+  boolean nonGTR;
+  boolean optimizeBaseFrequencies;
+  double alpha;
+  
+
+  int gapVectorLength;
+  unsigned int *gapVector;
+  double *gapColumn;
+
+  int    numberOfCategories;
+} pInfo;
+
+
+
+typedef struct 
+{
+  int left;
+  int right;
+  double likelihood;
+} lhEntry;
+
+
+typedef struct 
+{
+  int count;
+  int size;
+  lhEntry *entries;
+} lhList;
+
+
+typedef struct List_{
+  void *value; 			
+  struct List_ *next; 
+} List;
+
+
+#define REARR_SETTING 1
+#define FAST_SPRS     2
+#define SLOW_SPRS     3
+
+typedef struct {
+ 
+  int state;
+
+  unsigned int vLength;
+  
+  int rearrangementsMax;
+  int rearrangementsMin;
+  int thoroughIterations;
+  int fastIterations;
+  int treeVectorLength;  
+  int mintrav;
+  int maxtrav;
+  int bestTrav;
+  int    Thorough;
+  int    optimizeRateCategoryInvocations;
+  
+  double accumulatedTime;
+
+  double startLH; 
+  double lh;
+  double previousLh;
+  double difference;
+  double epsilon;
+  
+  boolean impr;
+  boolean cutoff;  
+       
+  double tr_startLH;
+  double tr_endLH;
+  double tr_likelihood;
+  double tr_bestOfNode;
+  
+  double tr_lhCutoff;
+  double tr_lhAVG;
+  double tr_lhDEC;
+  int    tr_NumberOfCategories;
+  int    tr_itCount;  
+  int    tr_doCutoff;
+
+                                                                    
+} checkPointState;
+
+
+typedef struct {
+  double EIGN[19] __attribute__ ((aligned (BYTE_ALIGNMENT)));             
+  double EV[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));                
+  double EI[380] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double substRates[190];        
+  double frequencies[20] ;      
+  double tipVector[460] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double fracchange[1];
+  double left[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+  double right[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+} siteAAModels;
+
+typedef  struct  {
+  boolean useGappedImplementation;
+  boolean saveMemory;
+  
+  siteAAModels siteProtModel[2 * (NUM_PROT_MODELS - 2)];
+
+  boolean estimatePerSiteAA;
+
+  int    *resample;
+
+  int numberOfBranches;
+  int    numberOfTipsForInsertion;
+  int    *inserts;
+  int    branchCounter;
+
+ 
+
+
+  
+
+
+  parsimonyNumber **parsimonyState_A;
+  parsimonyNumber **parsimonyState_C;
+  parsimonyNumber **parsimonyState_G;
+  parsimonyNumber **parsimonyState_T;
+  unsigned int *parsimonyScore; 
+  int *ti;
+  unsigned int compressedWidth;
+  
+  int numberOfTrees; 
+
+  stringHashtable  *nameHash;
+
+  pInfo            *partitionData;
+  pInfo            *initialPartitionData;
+  pInfo            *extendedPartitionData;
+
+  int              *dataVector;
+  int              *initialDataVector;
+  int              *extendedDataVector;
+
+  int              *patternPosition;
+  int              *columnPosition;
+
+  char             *secondaryStructureInput;
+
+  boolean          *executeModel;
+
+  double           *perPartitionLH;
+
+  traversalData    td[1];
+
+  int              maxCategories;
+
+  double           *wr;
+  double           *wr2;
+  
+  //  double           coreLZ[NUM_BRANCHES];
+  int              modelNumber;
+  int              numBranches;
+  int              bootStopCriterion;
+  int              consensusType;
+  double           wcThreshold;
+
+
+ 
+ 
+ 
+ 
+  
+ 
+  branchInfo	   *bInf;
+
+  int              multiStateModel;
+
+
+  //  boolean curvatOK[NUM_BRANCHES];
+  /* the stuff below is shared among DNA and AA, span does
+     not change depending on datatype */
+
+  
+  double           *fracchanges;
+
+  /* model stuff end */
+
+  unsigned char             **yVector;
+  int              secondaryStructureModel;
+  size_t              originalCrunchedLength;
+  int              fullSites;
+  int              *originalModel;
+  int              *originalDataVector;
+  int              *originalWeights;
+  int              *secondaryStructurePairs;
+
+
+  double            *partitionContributions;
+  double            fracchange;
+  double            lhCutoff;
+  double            lhAVG;
+  unsigned long     lhDEC;
+  unsigned long     itCount;
+  int               numberOfInvariableColumns;
+  int               weightOfInvariableColumns;
+  int               rateHetModel;
+
+  double           startLH;
+  double           endLH;
+  double           likelihood;
+  double          *likelihoods;
+ 
+  node           **nodep;
+  nodeptr          nodeBaseAddress;
+  node            *start;
+  int              mxtips;  
+  int              *model;
+
+  int              *constraintVector;
+  int              numberOfSecondaryColumns;
+  boolean          searchConvergenceCriterion;
+  int              ntips;
+  int              nextnode;  
+  int              NumberOfModels;
+  int              parsimonyLength;
+  
+  int              checkPointCounter;
+  int              treeID;  
+  boolean          bigCutoff;
+  //  boolean          partitionSmoothed[NUM_BRANCHES];
+  //  boolean          partitionConverged[NUM_BRANCHES];
+  boolean          rooted;
+  boolean          grouped;
+  boolean          constrained;
+  boolean          doCutoff;
+  boolean          catOnly;
+  rawdata         *rdta;
+  cruncheddata    *cdta;
+
+  char **nameList;
+  char *tree_string;
+  char *tree0;
+  char *tree1;
+  int treeStringLength;
+  unsigned int bestParsimony;
+  double bestOfNode;
+  nodeptr removeNode;
+  nodeptr insertNode;
+
+  /*
+  double zqr[NUM_BRANCHES];
+  double currentZQR[NUM_BRANCHES];
+
+  double currentLZR[NUM_BRANCHES];
+  double currentLZQ[NUM_BRANCHES];
+  double currentLZS[NUM_BRANCHES];
+  double currentLZI[NUM_BRANCHES];
+  double lzs[NUM_BRANCHES];
+  double lzq[NUM_BRANCHES];
+  double lzr[NUM_BRANCHES];
+  double lzi[NUM_BRANCHES];
+  */
+ 
+  int mr_thresh;
+
+
+  unsigned int **bitVectors;
+
+  unsigned int vLength;
+
+  hashtable *h;
+  
+
+} tree;
+
+
+/***************************************************************/
+
+typedef struct {
+  int partitionNumber;
+  int partitionLength;
+} partitionType;
+
+typedef struct
+{
+  //  double z[NUM_BRANCHES];
+  nodeptr p, q;
+  int cp, cq;
+}
+  connectRELL, *connptrRELL;
+
+typedef  struct
+{
+  connectRELL     *connect; 
+  int             start;
+  double          likelihood;
+}
+  topolRELL;
+
+
+typedef  struct
+{
+  int max;
+  topolRELL **t;
+}
+  topolRELL_LIST;
+
+
+/**************************************************************/
+
+
+
+typedef struct conntyp {
+  //    double           z[NUM_BRANCHES];           /* branch length */
+    node            *p, *q;       /* parent and child sectors */
+    void            *valptr;      /* pointer to value of subtree */
+    int              descend;     /* index of first connect of child */
+    int              sibling;     /* next connect from same parent */
+    } connect, *connptr;
+
+typedef  struct {
+    double           likelihood;
+  int              initialTreeNumber;
+    connect         *links;       /* pointer to first connect (start) */
+    node            *start;
+    int              nextlink;    /* index of next available connect */
+                                  /* tr->start = tpl->links->p */
+    int              ntips;
+    int              nextnode;
+    int              scrNum;      /* position in sorted list of scores */
+    int              tplNum;      /* position in sorted list of trees */
+
+    } topol;
+
+typedef struct {
+    double           best;        /* highest score saved */
+    double           worst;       /* lowest score saved */
+    topol           *start;       /* starting tree for optimization */
+    topol          **byScore;
+    topol          **byTopol;
+    int              nkeep;       /* maximum topologies to save */
+    int              nvalid;      /* number of topologies saved */
+    int              ninit;       /* number of topologies initialized */
+    int              numtrees;    /* number of alternatives tested */
+    boolean          improved;
+    } bestlist;
+
+typedef  struct {
+  int              categories;
+  int              model;
+  int              bestTrav;
+  int              max_rearrange;
+  int              stepwidth;
+  int              initial;
+  boolean          initialSet;
+  int              mode;
+  long             boot;
+  long             rapidBoot;
+  boolean          bootstrapBranchLengths;
+  boolean          restart;
+  boolean          useWeightFile;
+  boolean          useMultipleModel;
+  boolean          constraint;
+  boolean          grouping;
+  boolean          randomStartingTree;
+  boolean          useInvariant;
+  int            protEmpiricalFreqs;
+  int            proteinMatrix;
+  int            checkpoints;
+  int            startingTreeOnly;
+  int            multipleRuns;
+  long           parsimonySeed;
+  boolean        perGeneBranchLengths;
+  boolean        likelihoodTest;
+  boolean        permuteTreeoptimize;
+  boolean        allInOne;
+  boolean        generateBS;
+  boolean        bootStopping;
+  boolean        useExcludeFile;
+  boolean        userProteinModel;
+  boolean        computeELW;
+  boolean        computeDistance;
+  boolean        thoroughInsertion;
+  boolean        compressPatterns;
+  boolean        useSecondaryStructure; 
+  double         likelihoodEpsilon;
+  double         gapyness;
+  int            similarityFilterMode;
+  double        *externalAAMatrix; 
+  boolean       readTaxaOnly;
+  int           meshSearch;  
+  boolean       veryFast;
+  boolean       useCheckpoint;
+  boolean       leaveDropMode;
+  int           slidingWindowSize;
+  boolean       writeBinaryFile;
+  boolean       readBinaryFile;
+#ifdef _BAYESIAN 
+  boolean       bayesian;
+#endif
+} analdef;
+
+typedef struct 
+{
+  int leftLength;
+  int rightLength;
+  int eignLength;
+  int evLength;
+  int eiLength;
+  int substRatesLength;
+  int frequenciesLength;
+  int tipVectorLength;
+  int symmetryVectorLength;
+  int frequencyGroupingLength;
+
+  boolean nonGTR;
+  boolean optimizeBaseFrequencies;
+
+  unsigned char undetermined;
+
+  const char *inverseMeaning;
+
+  int states;
+
+  boolean smoothFrequencies;
+
+  const unsigned  int *bitVector;
+
+} partitionLengths;
+
+/****************************** FUNCTIONS ****************************************************/
+
+
+
+extern void computePlacementBias(tree *tr, analdef *adef);
+
+extern int lookupWord(char *s, stringHashtable *h);
+
+extern void getDataTypeString(tree *tr, int model, char typeOfData[1024]);
+
+extern unsigned int genericBitCount(unsigned int* bitVector, unsigned int bitVectorLength);
+extern int countTips(nodeptr p, int numsp);
+extern entry *initEntry(void);
+extern void computeRogueTaxa(tree *tr, char* treeSetFileName, analdef *adef);
+extern unsigned int precomputed16_bitcount(unsigned int n);
+
+
+
+
+
+extern size_t discreteRateCategories(int rateHetModel);
+
+extern partitionLengths * getPartitionLengths(pInfo *p);
+extern boolean getSmoothFreqs(int dataType);
+extern const unsigned int *getBitVector(int dataType);
+extern unsigned char getUndetermined(int dataType);
+extern int getStates(int dataType);
+extern char getInverseMeaning(int dataType, unsigned char state);
+extern double gettime ( void );
+extern int gettimeSrand ( void );
+extern double randum ( long *seed );
+
+extern void getxnode ( nodeptr p );
+extern void hookup ( nodeptr p, nodeptr q, double *z, int numBranches);
+extern void hookupDefault ( nodeptr p, nodeptr q, int numBranches);
+extern boolean whitechar ( int ch );
+extern void errorExit ( int e );
+extern void printResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBootstrapResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBipartitionResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printLog ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printStartingTree ( tree *tr, analdef *adef, boolean finalPrint );
+extern void writeInfoFile ( analdef *adef, tree *tr, double t );
+extern int main ( int argc, char *argv[] );
+extern void calcBipartitions ( tree *tr, analdef *adef, char *bestTreeFileName, char *bootStrapFileName );
+extern void initReversibleGTR (tree *tr, int model);
+extern double LnGamma ( double alpha );
+extern double IncompleteGamma ( double x, double alpha, double ln_gamma_alpha );
+extern double PointNormal ( double prob );
+extern double PointChi2 ( double prob, double v );
+extern void makeGammaCats (double alpha, double *gammaRates, int K);
+extern void initModel ( tree *tr, rawdata *rdta, cruncheddata *cdta, analdef *adef );
+extern void doAllInOne ( tree *tr, analdef *adef );
+
+extern void classifyML(tree *tr, analdef *adef);
+extern void doBootstrap ( tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta );
+extern void doInference ( tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta );
+extern void resetBranches ( tree *tr );
+extern void modOpt ( tree *tr, analdef *adef , double likelihoodEpsilon);
+
+
+extern void parsePartitions ( analdef *adef, rawdata *rdta, tree *tr);
+extern void computeBOOTRAPID (tree *tr, analdef *adef, long *radiusSeed);
+extern void optimizeRAPID ( tree *tr, analdef *adef );
+extern void thoroughOptimization ( tree *tr, analdef *adef, topolRELL_LIST *rl, int index );
+extern int treeOptimizeThorough ( tree *tr, int mintrav, int maxtrav);
+
+extern int checker ( tree *tr, nodeptr p );
+extern int randomInt ( int n );
+extern void makePermutation ( int *perm, int n, analdef *adef );
+extern boolean tipHomogeneityChecker ( tree *tr, nodeptr p, int grouping );
+extern void makeRandomTree ( tree *tr, analdef *adef );
+extern void nodeRectifier ( tree *tr );
+extern void makeParsimonyTreeThorough(tree *tr, analdef *adef);
+extern void makeParsimonyTree ( tree *tr, analdef *adef );
+extern void makeParsimonyTreeFastDNA(tree *tr, analdef *adef);
+extern void makeParsimonyTreeIncomplete ( tree *tr, analdef *adef );
+extern void makeParsimonyInsertions(tree *tr, nodeptr startNodeQ, nodeptr startNodeR);
+
+
+
+extern FILE *myfopen(const char *path, const char *mode);
+
+
+extern boolean initrav ( tree *tr, nodeptr p );
+extern void initravPartition ( tree *tr, nodeptr p, int model );
+extern boolean update ( tree *tr, nodeptr p );
+extern boolean smooth ( tree *tr, nodeptr p );
+extern boolean smoothTree ( tree *tr, int maxtimes );
+extern boolean localSmooth ( tree *tr, nodeptr p, int maxtimes );
+extern boolean localSmoothMulti(tree *tr, nodeptr p, int maxtimes, int model);
+extern void initInfoList ( int n );
+extern void freeInfoList ( void );
+extern void insertInfoList ( nodeptr node, double likelihood );
+extern boolean smoothRegion ( tree *tr, nodeptr p, int region );
+extern boolean regionalSmooth ( tree *tr, nodeptr p, int maxtimes, int region );
+extern nodeptr removeNodeBIG ( tree *tr, nodeptr p, int numBranches);
+extern nodeptr removeNodeRestoreBIG ( tree *tr, nodeptr p );
+extern boolean insertBIG ( tree *tr, nodeptr p, nodeptr q, int numBranches);
+extern boolean insertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern boolean testInsertBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void addTraverseBIG ( tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav );
+extern int rearrangeBIG ( tree *tr, nodeptr p, int mintrav, int maxtrav );
+extern void traversalOrder ( nodeptr p, int *count, nodeptr *nodeArray );
+extern double treeOptimizeRapid ( tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt);
+extern boolean testInsertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void restoreTreeFast ( tree *tr );
+extern int determineRearrangementSetting ( tree *tr, analdef *adef, bestlist *bestT, bestlist *bt );
+extern void computeBIGRAPID ( tree *tr, analdef *adef, boolean estimateModel);
+extern boolean treeEvaluate ( tree *tr, double smoothFactor );
+extern boolean treeEvaluatePartition ( tree *tr, double smoothFactor, int model );
+
+extern void meshTreeSearch(tree *tr, analdef *adef, int thorough);
+
+extern void initTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void freeTL ( topolRELL_LIST *rl);
+extern void restoreTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void resetTL ( topolRELL_LIST *rl );
+extern void saveTL ( topolRELL_LIST *rl, tree *tr, int index );
+
+extern int  saveBestTree (bestlist *bt, tree *tr);
+extern int  recallBestTree (bestlist *bt, int rank, tree *tr);
+extern int initBestTree ( bestlist *bt, int newkeep, int numsp );
+extern void resetBestTree ( bestlist *bt );
+extern boolean freeBestTree ( bestlist *bt );
+
+
+extern char *Tree2String ( char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood, 
+			   boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport);
+extern void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission);
+
+
+
+extern int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly);
+extern void treeReadTopologyString(char *treeString, tree *tr);
+extern boolean treeReadLenMULT ( FILE *fp, tree *tr, analdef *adef );
+
+extern void getStartingTree ( tree *tr);
+extern double treeLength(tree *tr, int model);
+
+extern void computeBootStopOnly(tree *tr, char *bootStrapFileName, analdef *adef);
+extern boolean bootStop(tree *tr, hashtable *h, int numberOfTrees, double *pearsonAverage, unsigned int **bitVectors, int treeVectorLength, unsigned int vectorLength);
+extern void computeConsensusOnly(tree *tr, char* treeSetFileName, analdef *adef);
+extern double evaluatePartialGeneric (tree *, int i, double ki, int _model);
+extern void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal);
+extern void newviewGeneric (tree *tr, nodeptr p, boolean masked);
+extern void newviewGenericMulti (tree *tr, nodeptr p, int model);
+extern void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask);
+extern void makenewzGenericDistance(tree *tr, int maxiter, double *z0, double *result, int taxon1, int taxon2);
+extern double evaluatePartitionGeneric (tree *tr, nodeptr p, int model);
+extern void newviewPartitionGeneric (tree *tr, nodeptr p, int model);
+extern double evaluateGenericVector (tree *tr, nodeptr p);
+extern void categorizeGeneric (tree *tr, nodeptr p);
+extern double makenewzPartitionGeneric(tree *tr, nodeptr p, nodeptr q, double z0, int maxiter, int model);
+extern boolean isTip(int number, int maxTips);
+extern void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal);
+
+
+
+extern void   newviewIterative(tree *tr, int startIndex);
+
+extern void evaluateIterative(tree *);
+
+extern void *malloc_aligned( size_t size);
+
+extern void storeExecuteMaskInTraversalDescriptor(tree *tr);
+extern void storeValuesInTraversalDescriptor(tree *tr, double *value);
+extern void myBinFwrite(const void *ptr, size_t size, size_t nmemb);
+extern void myBinFread(void *ptr, size_t size, size_t nmemb);
+
+
+
+extern void makenewzIterative(tree *);
+extern void execCore(tree *, volatile double *dlnLdlz, volatile double *d2lnLdlz2);
+
+
+
+extern void determineFullTraversal(nodeptr p, tree *tr);
+/*extern void optRateCat(tree *, int i, double lower_spacing, double upper_spacing, double *lhs);*/
+
+extern unsigned int evaluateParsimonyIterative(tree *);
+extern void newviewParsimonyIterative(tree *);
+
+extern unsigned int evaluateParsimonyIterativeFast(tree *);
+extern void newviewParsimonyIterativeFast(tree *);
+
+extern unsigned int evaluatePerSiteParsimony(tree *tr, nodeptr p, unsigned int *siteParsimony);
+extern void initravParsimonyNormal(tree *tr, nodeptr p);
+
+extern double evaluateGenericInitravPartition(tree *tr, nodeptr p, int model);
+extern void evaluateGenericVectorIterative(tree *, int startIndex, int endIndex);
+extern void categorizeIterative(tree *, int startIndex, int endIndex);
+
+extern void fixModelIndices(tree *tr, int endsite, boolean fixRates);
+extern void calculateModelOffsets(tree *tr);
+extern void gammaToCat(tree *tr);
+extern void catToGamma(tree *tr, analdef *adef);
+extern void handleExcludeFile(tree *tr, analdef *adef, rawdata *rdta);
+
+extern nodeptr findAnyTip(nodeptr p, int numsp);
+
+extern void parseProteinModel(analdef *adef);
+
+
+
+extern void computeNextReplicate(tree *tr, long *seed, int *originalRateCategories, int *originalInvariant, boolean isRapid, boolean fixRates);
+/*extern void computeNextReplicate(tree *tr, analdef *adef, int *originalRateCategories, int *originalInvariant);*/
+
+extern void putWAG(double *ext_initialRates);
+
+extern void reductionCleanup(tree *tr, int *originalRateCategories, int *originalInvariant);
+extern void parseSecondaryStructure(tree *tr, analdef *adef, int sites);
+extern void printPartitions(tree *tr);
+extern void compareBips(tree *tr, char *bootStrapFileName, analdef *adef);
+extern void computeRF(tree *tr, char *bootStrapFileName, analdef *adef);
+
+
+extern  unsigned int **initBitVector(tree *tr, unsigned int *vectorLength);
+extern hashtable *copyHashTable(hashtable *src, unsigned int vectorLength);
+extern hashtable *initHashTable(unsigned int n);
+extern void cleanupHashTable(hashtable *h, int state);
+extern double convergenceCriterion(hashtable *h, int mxtips);
+extern void freeBitVectors(unsigned int **v, int n);
+extern void freeHashTable(hashtable *h);
+
+
+
+extern void printBothOpen(const char* format, ... );
+extern void printBothOpenMPI(const char* format, ... );
+extern void initRateMatrix(tree *tr);
+
+extern void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf,
+				    int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF);
+
+extern int getIncrement(tree *tr, int model);
+
+extern void fastSearch(tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta);
+extern void shSupports(tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta);
+
+extern FILE *getNumberOfTrees(tree *tr, char *fileName, analdef *adef);
+
+extern void writeBinaryModel(tree *tr);
+extern void readBinaryModel(tree *tr);
+extern void treeEvaluateRandom (tree *tr, double smoothFactor);
+extern void treeEvaluateProgressive(tree *tr);
+
+extern void testGapped(tree *tr);
+
+extern boolean issubset(unsigned int* bipA, unsigned int* bipB, unsigned int vectorLen);
+extern boolean compatible(entry* e1, entry* e2, unsigned int bvlen);
+
+
+
+extern int *permutationSH(tree *tr, int nBootstrap, long _randomSeed);
+
+extern void updatePerSiteRates(tree *tr, boolean scaleRates);
+
+extern void restart(tree *tr);
+
+
+
+
+
+
+
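Note on the stringHashtable / stringEntry declarations in axml.h above: the layout (a table of bucket heads plus a `next` pointer per entry) suggests a chained hash table that maps a taxon name to a node number, which is what the lookupWord() prototype appears to serve. The following is only a minimal sketch under that assumption, not the upstream implementation. The names sketchHash, sketchInit, sketchAdd and sketchLookup, the hash function itself, and the typedef of hashNumberType as unsigned int are all hypothetical and chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: the real hashNumberType typedef lives elsewhere in axml.h. */
typedef unsigned int hashNumberType;

/* Same field layout as the declarations in the patch. */
struct stringEnt
{
  int nodeNumber;
  char *word;
  struct stringEnt *next;
};
typedef struct stringEnt stringEntry;

typedef struct
{
  hashNumberType tableSize;
  stringEntry **table;
} stringHashtable;

/* Hypothetical string hash (multiply-by-33 style); upstream may differ. */
static hashNumberType sketchHash(const char *s, hashNumberType tableSize)
{
  hashNumberType h = 5381u;

  while (*s)
    h = h * 33u + (unsigned char)*s++;
  return h % tableSize;
}

/* Allocate a table with n empty buckets; returns NULL on allocation failure. */
static stringHashtable *sketchInit(hashNumberType n)
{
  stringHashtable *h = malloc(sizeof *h);

  if (!h)
    return NULL;
  h->table = calloc(n, sizeof *h->table);
  if (!h->table)
    {
      free(h);
      return NULL;
    }
  h->tableSize = n;
  return h;
}

/* Prepend a copy of s to its bucket chain; returns 1 on success, 0 otherwise. */
static int sketchAdd(stringHashtable *h, const char *s, int nodeNumber)
{
  hashNumberType pos = sketchHash(s, h->tableSize);
  stringEntry *e = malloc(sizeof *e);

  if (!e)
    return 0;
  e->word = malloc(strlen(s) + 1);
  if (!e->word)
    {
      free(e);
      return 0;
    }
  strcpy(e->word, s);
  e->nodeNumber = nodeNumber;
  e->next = h->table[pos];
  h->table[pos] = e;
  return 1;
}

/* Walk the bucket chain; returns the stored node number, or -1 if absent. */
static int sketchLookup(const char *s, const stringHashtable *h)
{
  const stringEntry *e;

  for (e = h->table[sketchHash(s, h->tableSize)]; e; e = e->next)
    if (strcmp(e->word, s) == 0)
      return e->nodeNumber;
  return -1;
}
```

With a table size of 1 every name lands in the same bucket, which is a quick way to check that the chain walk resolves collisions by comparing the stored word rather than trusting the bucket index.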
diff --git a/parser/globalVariables.h b/parser/globalVariables.h
new file mode 100644
index 0000000..2e407a6
--- /dev/null
+++ b/parser/globalVariables.h
@@ -0,0 +1,195 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis: "RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+
+
+#ifdef _USE_ZLIB
+
+#include <zlib.h>
+
+#endif
+
+
+
+#ifdef _FINE_GRAIN_MPI
+int processes;
+double *globalResult;
+#endif
+
+int processID;
+infoList iList;
+FILE   *INFILE;
+
+#ifdef _USE_ZLIB
+gzFile byteFile;
+#else
+FILE *byteFile;
+#endif
+
+
+char run_id[128] = "", 
+  seq_file[1024] = "",
+  weightFileName[1024] = "",
+  modelFileName[1024] = "", 
+  byteFileName[1024] = "",
+  infoFileName[1024] = "",
+  secondaryStructureFileName[1024] = "",
+  excludeFileName[1024],
+  proteinModelFileName[1024];
+
+char *protModels[NUM_PROT_MODELS] = {"DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV", "VT", "BLOSUM62", "MTMAM", "LG", "MTART", "MTZOA", "PMB", 
+				     "HIVB", "HIVW", "JTTDCMUT", "FLU", "STMTREV", "AUTO", "LG4M", "LG4X", "GTR"};
+
+const char inverseMeaningBINARY[4] = {'_', '0', '1', '-'};
+const char inverseMeaningDNA[16]   = {'_', 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', '-'};
+const char inverseMeaningPROT[23]  = {'A','R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 
+			       'T', 'W', 'Y', 'V', 'B', 'Z', '-'};
+const char inverseMeaningGeneric32[33] = {'0', '1', '2', '3', '4', '5', '6', '7', 
+				    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+				    'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+				    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+				    '-'};
+const char inverseMeaningGeneric64[33] = {'0', '1', '2', '3', '4', '5', '6', '7', 
+				    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+				    'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+				    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+				    '-'};
+
+const unsigned int bitVectorIdentity[256] = {0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,25 ,26 ,
+					     27 ,28 ,29 ,30 ,31 ,32 ,33 ,34 ,35 ,36 ,37 ,38 ,39 ,40 ,41 ,42 ,43 ,44 ,45 ,46 ,47 ,48 ,49 ,50 ,51 ,
+					     52 ,53 ,54 ,55 ,56 ,57 ,58 ,59 ,60 ,61 ,62 ,63 ,64 ,65 ,66 ,67 ,68 ,69 ,70 ,71 ,72 ,73 ,74 ,75 ,76 ,
+					     77 ,78 ,79 ,80 ,81 ,82 ,83 ,84 ,85 ,86 ,87 ,88 ,89 ,90 ,91 ,92 ,93 ,94 ,95 ,96 ,97 ,98 ,99 ,100 ,101 ,
+					     102 ,103 ,104 ,105 ,106 ,107 ,108 ,109 ,110 ,111 ,112 ,113 ,114 ,115 ,116 ,117 ,118 ,119 ,120 ,121 ,122 ,
+					     123 ,124 ,125 ,126 ,127 ,128 ,129 ,130 ,131 ,132 ,133 ,134 ,135 ,136 ,137 ,138 ,139 ,140 ,141 ,142 ,143 ,
+					     144 ,145 ,146 ,147 ,148 ,149 ,150 ,151 ,152 ,153 ,154 ,155 ,156 ,157 ,158 ,159 ,160 ,161 ,162 ,163 ,164 ,
+					     165 ,166 ,167 ,168 ,169 ,170 ,171 ,172 ,173 ,174 ,175 ,176 ,177 ,178 ,179 ,180 ,181 ,182 ,183 ,184 ,185 ,
+					     186 ,187 ,188 ,189 ,190 ,191 ,192 ,193 ,194 ,195 ,196 ,197 ,198 ,199 ,200 ,201 ,202 ,203 ,204 ,205 ,206 ,
+					     207 ,208 ,209 ,210 ,211 ,212 ,213 ,214 ,215 ,216 ,217 ,218 ,219 ,220 ,221 ,222 ,223 ,224 ,225 ,226 ,227 ,
+					     228 ,229 ,230 ,231 ,232 ,233 ,234 ,235 ,236 ,237 ,238 ,239 ,240 ,241 ,242 ,243 ,244 ,245 ,246 ,247 ,248 ,
+					     249 ,250 ,251 ,252 ,253 ,254 ,255};
+
+
+
+const unsigned int bitVectorAA[23] = {1, 2, 4, 8, 16, 32, 64, 128, 
+				      256, 512, 1024, 2048, 4096, 
+				      8192, 16384, 32768, 65536, 131072, 262144, 
+				      524288, 12 /* N | D */, 96 /*Q | E*/, 1048575 /* - */};
+
+const unsigned int bitVectorSecondary[256] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
+					      10, 11, 12, 13, 14, 15, 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 
+					      208, 224, 240, 0, 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238, 
+					      255, 0, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328, 
+					      3584, 3840, 0, 257, 514, 771, 1028, 1285, 1542, 1799, 2056, 2313, 2570, 2827, 3084, 
+					      3341, 3598, 3855, 0, 272, 544, 816, 1088, 1360, 1632, 1904, 2176, 2448, 2720, 2992, 
+					      3264, 3536, 3808, 4080, 0, 273, 546, 819, 1092, 1365, 1638, 1911, 2184, 2457, 2730, 
+					      3003, 3276, 3549, 3822, 4095, 0, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768, 
+					      36864, 40960, 45056, 49152, 53248, 57344, 61440, 0, 4097, 8194, 12291, 16388, 20485, 24582, 
+					      28679, 32776, 36873, 40970, 45067, 49164, 53261, 57358, 61455, 0, 4112, 8224, 12336, 16448, 
+					      20560, 24672, 28784, 32896, 37008, 41120, 45232, 49344, 53456, 57568, 61680, 0, 4113, 8226, 
+					      12339, 16452, 20565, 24678, 28791, 32904, 37017, 41130, 45243, 49356, 53469, 57582, 61695, 
+					      0, 4352, 8704, 13056, 17408, 21760, 26112, 30464, 34816, 39168, 43520, 47872, 52224, 56576, 
+					      60928, 65280, 0, 4353, 8706, 13059, 17412, 21765, 26118, 30471, 34824, 39177, 43530, 47883, 
+					      52236, 56589, 60942, 65295, 0, 4368, 8736, 13104, 17472, 21840, 26208, 30576, 34944, 39312, 
+					      43680, 48048, 52416, 56784, 61152, 65520, 0, 4369, 8738, 13107, 17476, 21845, 26214, 30583, 
+					      34952, 39321, 43690, 48059, 52428, 56797, 61166, 65535};
+
+const unsigned int bitVector32[33] = {1,     2,    4,    8,   16,   32,    64,   128,
+                                      256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
+                                      65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
+                                      16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824, 2147483648u, 
+				      4294967295u};
+
+/*const unsigned int bitVector64[65] = {};*/
+
+const unsigned int mask32[32] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 
+					262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 
+					268435456, 536870912, 1073741824, 2147483648U};
+
+const char *secondaryModelList[21] = { "S6A (GTR)", "S6B", "S6C", "S6D", "S6E", "S7A (GTR)", "S7B", "S7C", "S7D", "S7E", "S7F", "S16 (GTR)", "S16A", "S16B", "S16C", 
+				       "S16D", "S16E", "S16F", "S16I", "S16J", "S16K"};
+
+double masterTime;
+double accumulatedTime;
+int partCount = 0;
+int optimizeRateCategoryInvocations = 1;
+
+
+
+
+
+partitionLengths pLengths[MAX_MODEL] = {
+  
+  /* BINARY */
+
+  // {4,   4,   2,  4,  2, 1, 2,  8, 2, 2, FALSE, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+  //eiLength changed from 2 -> 4
+  {4,   4,   2,  4,  4, 1, 2,  8, 2, 2, FALSE, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+  /* DNA */
+  {16,  16,  4, 16, 16, 6, 4, 64, 6, 4, FALSE, FALSE, 15, inverseMeaningDNA, 4, FALSE, bitVectorIdentity},
+        
+  /* AA */
+  {400, 400, 20, 400, 400, 190, 20, 460, 190, 20, FALSE, FALSE, 22, inverseMeaningPROT, 20, TRUE, bitVectorAA},
+  
+  /* SECONDARY_DATA */
+
+  {256, 256, 16, 256, 256, 120, 16, 4096, 120, 16, FALSE, FALSE, 255, (char*)NULL, 16, TRUE, bitVectorSecondary},
+
+  
+  /* SECONDARY_DATA_6 */
+  {36, 36,  6, 36, 36, 15, 6, 384, 15, 6, FALSE, FALSE, 63, (char*)NULL, 6, TRUE, bitVectorIdentity},
+
+  
+  /* SECONDARY_DATA_7 */
+  {49,   49,    7,   49, 49,  21, 7, 896, 21, 7, FALSE, FALSE, 127, (char*)NULL, 7, TRUE, bitVectorIdentity},
+
+  /* 32 states */
+  {1024, 1024, 32, 1024, 1024, 496, 32, 1056, 496, 32, FALSE, FALSE, 32, inverseMeaningGeneric32, 32, TRUE, bitVector32},
+  
+  /* 64 states */
+  {4096, 4096, 64, 4096, 4096, 2016, 64, 4160, 2016, 64, FALSE, FALSE, 64, (char*)NULL, 64, TRUE, (unsigned int*)NULL}
+};
+
+partitionLengths pLength;
+
+     
+
+
+
+
+#ifdef _USE_PTHREADS
+volatile int             NumberOfJobs;
+volatile int             jobCycle = 0;
+volatile int             threadJob = 0;
+volatile int             NumberOfThreads;
+volatile double          *reductionBuffer;
+volatile double          *reductionBufferTwo;
+volatile char             *barrierBuffer;
+#endif
diff --git a/parser/parsePartitions.c b/parser/parsePartitions.c
new file mode 100644
index 0000000..23cef3e
--- /dev/null
+++ b/parser/parsePartitions.c
@@ -0,0 +1,1427 @@
+/*  RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees 
+ *  Copyright August 2006 by Alexandros Stamatakis
+ *
+ *  Partially derived from
+ *  fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *  
+ *  and 
+ *
+ *  Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ *  This program is free software; you may redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation; either version 2 of the License, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ *  for more details.
+ * 
+ *
+ *  For any other enquiries send an Email to Alexandros Stamatakis
+ *  Alexandros.Stamatakis at epfl.ch
+ *
+ *  When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ *  Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". 
+ *  Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h> 
+#endif
+
+#include <math.h>
+#include <time.h> 
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <strings.h>
+
+
+
+
+#include "axml.h"
+
+/*****************************FUNCTIONS FOR READING MULTIPLE MODEL SPECIFICATIONS************************************************/
+
+
+extern char modelFileName[1024];
+extern char excludeFileName[1024];
+extern char proteinModelFileName[1024];
+extern char secondaryStructureFileName[1024];
+
+
+extern char seq_file[1024];
+
+extern char *protModels[NUM_PROT_MODELS];
+
+static boolean lineContainsOnlyWhiteChars(char *line)
+{
+  int i, n = strlen(line);
+
+  if(n == 0)
+    return TRUE;
+
+  for(i = 0; i < n; i++)
+    {
+      if(!whitechar(line[i]))
+	return FALSE;
+    }
+  return TRUE;
+}
+
+
+static int isNum(char c)
+{
+  
+  return (c == '0' || c == '1' || c == '2' || c == '3' || c == '4' ||
+	  c == '5' || c == '6' || c == '7' || c == '8' || c == '9');
+}
+
+
+static void skipWhites(char **ch)
+{
+  while(**ch == ' ' || **ch == '\t')
+    *ch = *ch + 1;
+}
+
+static void analyzeIdentifier(char **ch, int modelNumber, tree *tr)
+{
+  char
+    *start = *ch,
+    ident[2048] = "";
+  char model[128] = "";  
+  char thisModel[1024];
+  int i = 0, n, j;
+  int containsComma = 0;
+
+  while(**ch != '=')
+    {
+      if(**ch == '\n' || **ch == '\r')
+	{
+	  printf("\nPartition file parsing error!\n");
+	  printf("Each line must contain a \"=\" character\n");
+	  printf("Offending line: %s\n", start);
+	  printf("ExaML will exit now.\n\n");
+	  exit(-1);
+	}
+
+      if(**ch != ' ' && **ch != '\t')
+	{
+	  ident[i] = **ch;      
+	  i++;
+	}
+      *ch = *ch + 1;
+    }
+  
+  n = i;
+  i = 0;
+  
+  for(i = 0; i < n; i++)
+    if(ident[i] == ',') 
+      containsComma = 1;
+
+  if(!containsComma)
+    {
+      printf("Error, model file must have format: DNA or AA model, then a comma, and then the partition name\n");
+      exit(-1);
+    }
+  else
+    {
+      boolean found = FALSE;
+      i = 0;
+      while(ident[i] != ',')
+	{
+	  model[i] = ident[i];
+	  i++;
+	}      
+      
+      /* AA */
+
+      for(i = 0; i < NUM_PROT_MODELS && !found; i++)
+	{	
+	  strcpy(thisModel, protModels[i]);
+	  
+	  if(strcasecmp(model, thisModel) == 0)
+	    {	      	      
+	      tr->initialPartitionData[modelNumber].protModels = i;		  
+	      tr->initialPartitionData[modelNumber].protFreqs  = 0;
+	      tr->initialPartitionData[modelNumber].dataType   = AA_DATA;
+	      found = TRUE;
+	    }
+	  	  
+	  strcpy(thisModel, protModels[i]);
+	  strcat(thisModel, "F");
+	  
+	  if(strcasecmp(model, thisModel) == 0)
+	    {	      
+	      tr->initialPartitionData[modelNumber].protModels = i;		  
+	      tr->initialPartitionData[modelNumber].protFreqs  = 1;
+	      tr->initialPartitionData[modelNumber].dataType   = AA_DATA;
+	      found = TRUE;
+
+	      if(tr->initialPartitionData[modelNumber].protModels == AUTO)
+		{
+		  printf("\nError: Option AUTOF has been deprecated, exiting\n\n");
+		  errorExit(-1);
+		}
+	      
+	      if(tr->initialPartitionData[modelNumber].protModels == LG4M || tr->initialPartitionData[modelNumber].protModels == LG4X)
+		{
+		  printf("\nError: Options LG4MF and LG4XF have been deprecated.\n");
+		  printf("They shall only be used with the given base frequencies of the model, exiting\n\n");
+		  errorExit(-1);
+		}
+	    }	
+
+	  strcpy(thisModel, protModels[i]);
+	  strcat(thisModel, "X");
+
+	  if(strcasecmp(model, thisModel) == 0)
+	    {	      
+	      tr->initialPartitionData[modelNumber].protModels = i;		  
+	      tr->initialPartitionData[modelNumber].protFreqs  = 0;
+	      tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+	      tr->initialPartitionData[modelNumber].dataType   = AA_DATA;
+	      found = TRUE;
+
+	      if(tr->initialPartitionData[modelNumber].protModels == AUTO)
+		{
+		  printf("\nError: Option AUTOX has been deprecated, exiting\n\n");
+		  errorExit(-1);
+		}
+	      
+	      if(tr->initialPartitionData[modelNumber].protModels == LG4M || tr->initialPartitionData[modelNumber].protModels == LG4X)
+		{
+		  printf("\nError: Options LG4MX and LG4XX have been deprecated.\n");
+		  printf("They shall only be used with the given base frequencies of the model, exiting\n\n");
+		  errorExit(-1);
+		}
+
+	    }	
+
+	  /*if(found)
+	    printf("%s %d\n", model, i);*/
+	}
+      
+      if(!found)
+	{		  	  
+	  if(strcasecmp(model, "DNA") == 0)
+	    {	     	      
+	      tr->initialPartitionData[modelNumber].protModels = -1;		  
+	      tr->initialPartitionData[modelNumber].protFreqs  = -1;
+	      tr->initialPartitionData[modelNumber].dataType   = DNA_DATA;
+	      tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = FALSE;
+	      found = TRUE;
+	    }
+	  else
+	    {
+	      if(strcasecmp(model, "DNAX") == 0)
+		{	     	      
+		  tr->initialPartitionData[modelNumber].protModels = -1;		  
+		  tr->initialPartitionData[modelNumber].protFreqs  = -1;
+		  tr->initialPartitionData[modelNumber].dataType   = DNA_DATA;
+		  tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+		  found = TRUE;
+		}	      	    
+	      else
+		{	    	  
+		  if(strcasecmp(model, "BIN") == 0)
+		    {	     	      
+		      tr->initialPartitionData[modelNumber].protModels = -1;		  
+		      tr->initialPartitionData[modelNumber].protFreqs  = -1;
+		      tr->initialPartitionData[modelNumber].dataType   = BINARY_DATA;
+		      tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = FALSE;
+		      found = TRUE;
+		    }
+		  else
+		    {
+		      if(strcasecmp(model, "BINX") == 0)
+			{
+			  tr->initialPartitionData[modelNumber].protModels = -1;		  
+			  tr->initialPartitionData[modelNumber].protFreqs  = -1;
+			  tr->initialPartitionData[modelNumber].dataType   = BINARY_DATA;
+			  tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+			  found = TRUE;
+			}
+		      else
+			{
+			  if(strcasecmp(model, "MULTI") == 0)
+			    {	     	      
+			      tr->initialPartitionData[modelNumber].protModels = -1;		  
+			      tr->initialPartitionData[modelNumber].protFreqs  = -1;
+			      tr->initialPartitionData[modelNumber].dataType   = GENERIC_32;
+			      
+			      found = TRUE;
+			    }
+			  else
+			    {
+			      if(strcasecmp(model, "CODON") == 0)
+				{	     	      
+				  tr->initialPartitionData[modelNumber].protModels = -1;		  
+				  tr->initialPartitionData[modelNumber].protFreqs  = -1;
+				  tr->initialPartitionData[modelNumber].dataType   = GENERIC_64;
+				  
+				  found = TRUE;
+				}
+			    }
+			}
+		    }
+		}
+	    }
+	}
+
+      if(!found)
+	{
+	  printf("ERROR: you specified the unknown model %s for partition %d\n", model, modelNumber);
+	  exit(-1);
+	}
+           
+
+      i = 0;
+      while(ident[i++] != ',');      
+
+      tr->initialPartitionData[modelNumber].partitionName = (char*)malloc((n - i + 1) * sizeof(char));          
+
+      j = 0;
+      while(i < n)	
+	tr->initialPartitionData[modelNumber].partitionName[j++] =  ident[i++];
+
+      tr->initialPartitionData[modelNumber].partitionName[j] = '\0';                      
+    }
+}
+
+
+
+static void setModel(int model, int position, int *a)
+{
+  if(a[position] == -1)
+    a[position] = model;
+  else
+    {
+      printf("ERROR trying to assign model %d to position %d \n", model, position);
+      printf("while already model %d has been assigned to this position\n", a[position]);
+      exit(-1);
+    }      
+}
+
+
+static int myGetline(char **lineptr, int *n, FILE *stream)
+{
+  char *line, *p;
+  int size, copy, len;
+  int chunkSize = 256 * sizeof(char);
+
+   if (*lineptr == NULL || *n < 2) 
+    {
+      line = (char *)realloc(*lineptr, chunkSize);
+      if (line == NULL)
+	return -1;
+      *lineptr = line;
+      *n = chunkSize;
+    }
+
+   line = *lineptr;
+   size = *n;
+  
+   copy = size;
+   p = line;
+   
+   while(1)
+     {
+       while (--copy > 0)
+	 {
+	   register int c = getc(stream);
+	   if (c == EOF)
+	     goto lose;
+	   else
+	     {
+	       *p++ = c;
+	       if(c == '\n' || c == '\r')	
+		 goto win;
+	     }
+	 }
+
+       /* Need to enlarge the line buffer.  */
+       len = p - line;
+       size *= 2;
+       line = realloc (line, size);
+       if (line == NULL)
+	 goto lose;
+       *lineptr = line;
+       *n = size;
+       p = line + len;
+       copy = size - len;
+     }
+   
+ lose:
+  if (p == *lineptr)
+    return -1;
+  /* Return a partial line since we got an error in the middle.  */
+ win:
+  *p = '\0';
+  return p - *lineptr;
+}
+
+
+
+void parsePartitions(analdef *adef, rawdata *rdta, tree *tr)
+{
+  FILE *f; 
+  int numberOfModels = 0; 
+  int nbytes = 0;
+  char *ch;
+  char *cc = (char *)NULL;
+  char **p_names;
+  int n, i, l;
+  int lower, upper, modulo;
+  char buf[256];
+  int **partitions;
+  int pairsCount;
+  int as, j;
+  int k; 
+
+  f = myfopen(modelFileName, "rb");   
+
+ 
+  while(myGetline(&cc, &nbytes, f) > -1)
+    {     
+      if(!lineContainsOnlyWhiteChars(cc))
+	{
+	  numberOfModels++;
+	}
+      if(cc)
+	free(cc);
+      cc = (char *)NULL;
+    }     
+  
+  rewind(f);
+      
+  p_names = (char **)malloc(sizeof(char *) * numberOfModels);
+  partitions = (int **)malloc(sizeof(int *) * numberOfModels);
+      
+ 
+  
+  tr->initialPartitionData = (pInfo*)malloc(sizeof(pInfo) * numberOfModels);
+
+      
+  for(i = 0; i < numberOfModels; i++) 
+    {     
+      tr->initialPartitionData[i].protModels = adef->proteinMatrix;
+      tr->initialPartitionData[i].protFreqs  = adef->protEmpiricalFreqs;
+      tr->initialPartitionData[i].dataType   = -1;
+    }
+
+  for(i = 0; i < numberOfModels; i++)    
+    partitions[i] = (int *)NULL;
+    
+  i = 0;
+  while(myGetline(&cc, &nbytes, f) > -1)
+    {          
+      if(!lineContainsOnlyWhiteChars(cc))
+	{
+	  n = strlen(cc);	 
+	  p_names[i] = (char *)malloc(sizeof(char) * (n + 1));
+	  strcpy(&(p_names[i][0]), cc);
+	  i++;
+	}
+      if(cc)
+	free(cc);
+      cc = (char *)NULL;
+    }         
+
+  for(i = 0; i < numberOfModels; i++)
+    {           
+      ch = p_names[i];     
+      pairsCount = 0;
+      skipWhites(&ch);
+      
+      if(*ch == '=')
+	{
+	  printf("Identifier missing prior to '=' in %s\n", p_names[i]);
+	  exit(-1);
+	}
+      
+      analyzeIdentifier(&ch, i, tr);
+      ch++;
+            
+    numberPairs:
+      pairsCount++;
+      partitions[i] = (int *)realloc((void *)partitions[i], (1 + 3 * pairsCount) * sizeof(int));
+      partitions[i][0] = pairsCount;
+      partitions[i][3 + 3 * (pairsCount - 1)] = -1; 	
+      
+      skipWhites(&ch);
+      
+      if(!isNum(*ch))
+	{
+	  printf("%c Number expected in %s\n", *ch, p_names[i]);
+	  exit(-1);
+	}   
+      
+      l = 0;
+      while(isNum(*ch))		 
+	{
+	  /*printf("%c", *ch);*/
+	  buf[l] = *ch;
+	  ch++;	
+	  l++;
+	}
+      buf[l] = '\0';
+      lower = atoi(buf);
+      partitions[i][1 + 3 * (pairsCount - 1)] = lower;   
+      
+      skipWhites(&ch);
+      
+      /* NEW */
+      
+      if((*ch != '-') && (*ch != ','))
+	{
+	  if(*ch == '\0' || *ch == '\n' || *ch == '\r')
+	    {
+	      upper = lower;
+	      goto SINGLE_NUMBER;
+	    }
+	  else
+	    {
+	      printf("'-' or ',' expected in %s\n", p_names[i]);
+	      exit(-1);
+	    }
+	}	 
+      
+      if(*ch == ',')
+	{	     
+	  upper = lower;
+	  goto SINGLE_NUMBER;
+	}
+      
+      /* END NEW */
+      
+      ch++;   
+      
+      skipWhites(&ch);
+      
+      if(!isNum(*ch))
+	{
+	  printf("%c Number expected in %s\n", *ch, p_names[i]);
+	  exit(-1);
+	}    
+      
+      l = 0;
+      while(isNum(*ch))
+	{    
+	  buf[l] = *ch;
+	  ch++;	
+	  l++;
+	}
+      buf[l] = '\0';
+      upper = atoi(buf);     
+    SINGLE_NUMBER:
+      partitions[i][2 + 3 * (pairsCount - 1)] = upper;        	  
+      
+      if(upper < lower)
+	{
+	  printf("Upper bound %d smaller than lower bound %d for this partition: %s\n", upper, lower,  p_names[i]);
+	  exit(-1);
+	}
+      
+      skipWhites(&ch);
+      
+      if(*ch == '\0' || *ch == '\n' || *ch == '\r') /* PC-LINEBREAK*/
+	{    
+	  goto parsed;
+	}
+      
+      if(*ch == ',')
+	{	 
+	  ch++;
+	  goto numberPairs;
+	}
+      
+      if(*ch == '\\')
+	{
+	  ch++;
+	  skipWhites(&ch);	 	
+
+	  if(!isNum(*ch))
+	    {
+	      printf("%c Number expected in %s\n", *ch, p_names[i]);
+	      exit(-1);
+	    }   
+
+	  if(adef->compressPatterns == FALSE)
+	    {
+	      printf("\nError: You are not allowed to use interleaved partitions, that is, assign non-contiguous sites\n");
+	      printf("to the same partition model, when pattern compression is disabled via the -c flag!\n\n");
+	      exit(-1);
+	    }
+	  
+	  l = 0;
+	  while(isNum(*ch))
+	    {
+	      buf[l] = *ch;
+	      ch++;	
+	      l++;
+	    }
+	  buf[l] = '\0';
+	  modulo = atoi(buf);      
+	  partitions[i][3 + 3 * (pairsCount - 1)] = modulo; 	
+	  
+	  skipWhites(&ch);
+	  if(*ch == '\0' || *ch == '\n' || *ch == '\r')
+	    {	     
+	      goto parsed;
+	    }
+	  if(*ch == ',')
+	    {	       
+	      ch++;
+	      goto numberPairs;
+	    }
+	}  
+      
+      
+      printf("\nError: You may be using \"/\" for specifying interleaved partitions in the model file, while it should be \"\\\" !\n\n");
+      assert(0);
+       
+    parsed:
+      i = i;
+    }
+  
+  fclose(f);
+ 
+  /*********************************************************************************************************************/ 
+
+  for(i = 0; i <= rdta->sites; i++)
+    tr->model[i] = -1;
+  
+  for(i = 0; i < numberOfModels; i++)
+    {   
+      as = partitions[i][0];     
+      
+      for(j = 0; j < as; j++)
+	{
+	  lower = partitions[i][1 + j * 3];
+	  upper = partitions[i][2 + j * 3]; 
+	  modulo = partitions[i][3 + j * 3];	
+	 
+	  if(modulo == -1)
+	    {
+	      for(k = lower; k <= upper && k <= rdta->sites; k++)
+		setModel(i, k, tr->model);
+	    }
+	  else
+	    {
+	      for(k = lower; k <= upper; k += modulo)
+		{
+		  if(k <= rdta->sites)
+		    setModel(i, k, tr->model);	      
+		}
+	    }
+	}        
+    }
+
+
+  for(i = 1; i < rdta->sites + 1; i++)
+    {
+      
+      if(tr->model[i] == -1)
+	{
+	  printf("ERROR: Alignment Position %d has not been assigned any model\n", i);
+	  exit(-1);
+	}      
+    }  
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      free(partitions[i]);
+      free(p_names[i]);
+    }
+  
+  free(partitions);
+  free(p_names);    
+    
+  tr->NumberOfModels = numberOfModels;     
+  
+  
+}
+
+/*******************************************************************************************************************************/
+
+void handleExcludeFile(tree *tr, analdef *adef, rawdata *rdta)
+{
+  FILE *f;  
+  char buf[256];
+  int
+    ch,
+    j, value, i,
+    state = 0,
+    numberOfModels = 0,
+    l = -1,
+    excludeRegion   = 0,
+    excludedColumns = 0,
+    modelCounter    = 1;
+  int
+    *excludeArray, *countArray, *modelList;
+  int
+    **partitions;
+
+  printf("\n\n");
+
+  f = myfopen(excludeFileName, "rb");    
+
+  while((ch = getc(f)) != EOF)
+    {
+      if(ch == '-')
+	numberOfModels++;
+    } 
+
+  excludeArray = (int*)malloc(sizeof(int) * (rdta->sites + 1));
+  countArray   = (int*)malloc(sizeof(int) * (rdta->sites + 1));
+  modelList    = (int *)malloc((rdta->sites + 1)* sizeof(int));
+
+  partitions = (int **)malloc(sizeof(int *) * numberOfModels);  
+  for(i = 0; i < numberOfModels; i++)
+    partitions[i] = (int *)malloc(sizeof(int) * 2);
+
+  rewind(f);
+  
+  while((ch = getc(f)) != EOF)
+    {     
+      switch(state)
+	{
+	case 0: /* get first number */
+	  if(!whitechar(ch))
+	    {
+	      if(!isNum(ch))
+		{
+		  printf("exclude file must have format: number-number [number-number]*\n");
+		  exit(-1);
+		}
+	      l = 0;
+	      buf[l++] = ch;
+	      state = 1;
+	    }
+	  break;
+	case 1: /*get the number or detect - */
+	  if(!isNum(ch) && ch != '-')
+	    {
+	      printf("exclude file must have format: number-number [number-number]*\n");
+	      exit(-1);
+	    }
+	  if(isNum(ch))
+	    {
+	      buf[l++] = ch;
+	    }
+	  else
+	    {
+	      buf[l++] = '\0';	     
+	      value = atoi(buf);
+	      partitions[excludeRegion][0] = value;
+	      state = 2;
+	    }
+	  break;
+	case 2: /*get second number */
+	  if(!isNum(ch))
+	    {
+	      printf("exclude file must have format: number-number [number-number]*\n");
+	      exit(-1);
+	    }
+	  l = 0;
+	  buf[l++] = ch;
+	  state = 3;
+	  break;
+	case 3: /* continue second number or find end */	 
+	  if(!isNum(ch) && !whitechar(ch))
+	    {
+	      printf("exclude file must have format: number-number [number-number]*\n");
+	      exit(-1);
+	    }
+	  if(isNum(ch))
+	    {
+	      buf[l++] = ch;
+	    }
+	  else
+	    {	      
+	      buf[l++] = '\0';	     
+	      value = atoi(buf);
+	      partitions[excludeRegion][1] = value;
+	      excludeRegion++;
+	      state = 0;
+	    }
+	  break;
+	default:
+	  assert(0);
+	}
+    }
+     
+  if(state == 3)
+    {
+      buf[l++] = '\0';     
+      value = atoi(buf);
+      partitions[excludeRegion][1] = value;
+      excludeRegion++;
+    }
+  
+  assert(excludeRegion == numberOfModels);
+
+  for(i = 0; i <= rdta->sites; i++)
+    {
+      excludeArray[i] = -1;
+      countArray[i] = 0;      
+      modelList[i] = -1;
+    }  
+
+  for(i = 0; i < numberOfModels; i++)
+    {
+      int lower = partitions[i][0];
+      int upper = partitions[i][1];
+
+      if(lower > upper)
+	{
+	  printf("Misspecified exclude region %d\n", i);
+	  printf("lower bound %d is greater than upper bound %d\n", lower, upper);
+	  exit(-1);
+	}
+
+      if(lower == 0)
+	{
+	  printf("Misspecified exclude region %d\n", i);
+	  printf("lower bound must be greater than 0\n");
+	  exit(-1);
+	}
+
+      if(upper > rdta->sites)
+	{
+	  printf("Misspecified exclude region %d\n", i);
+	  printf("upper bound %d must be smaller than %d\n", upper, (rdta->sites + 1));
+	  exit(-1);
+	}	
+      for(j = lower; j <= upper; j++)
+	{
+	  if(excludeArray[j] != -1)
+	    {
+	      printf("WARNING: Exclude regions %d and %d overlap at position %d (already excluded %d times)\n", 
+		     excludeArray[j], i, j, countArray[j]);
+	    }
+	  excludeArray[j] = i;
+	  countArray[j]   =  countArray[j] + 1;	 
+	}
+    }
+
+  for(i = 1; i <= rdta->sites; i++)
+    {
+      if(excludeArray[i] != -1)
+	excludedColumns++;
+      else
+	{
+	  modelList[modelCounter] = tr->model[i];
+	  modelCounter++;
+	}
+    }
+
+  printf("You have excluded %d out of %d columns\n", excludedColumns, rdta->sites);
+
+  if(excludedColumns == rdta->sites)
+    {
+      printf("Error: You have excluded all sites\n");
+      exit(-1);
+    }
+
+  if(adef->useSecondaryStructure && (excludedColumns > 0))
+    {
+      char mfn[2048];
+      int countColumns;
+      FILE *newFile;
+
+      assert(adef->useMultipleModel);
+
+      strcpy(mfn, secondaryStructureFileName);
+      strcat(mfn, ".");
+      strcat(mfn, excludeFileName);
+
+      newFile = myfopen(mfn, "wb");
+
+      printBothOpen("\nA secondary structure file with analogous structure assignments for non-excluded columns is printed to file %s\n", mfn);        	     	    
+		  
+      for(i = 1, countColumns = 0; i <= rdta->sites; i++)
+	{		  
+	  if(excludeArray[i] == -1)
+	    fprintf(newFile, "%c", tr->secondaryStructureInput[i - 1]);
+	  else
+	    countColumns++;
+	}
+		  
+      assert(countColumns == excludedColumns);
+		  
+      fprintf(newFile,"\n");
+		  
+      fclose(newFile);
+    }
+
+
+  if(adef->useMultipleModel && (excludedColumns > 0))
+    {      
+      char mfn[2048];
+      FILE *newFile;
+
+      strcpy(mfn, modelFileName);
+      strcat(mfn, ".");
+      strcat(mfn, excludeFileName);
+
+      newFile = myfopen(mfn, "wb");
+
+      printf("\nA partition file with analogous model assignments for non-excluded columns is printed to file %s\n", mfn);     
+	      
+      for(i = 0; i < tr->NumberOfModels; i++)
+	{
+	  boolean modelStillExists = FALSE;
+		  
+	  for(j = 1; (j <= rdta->sites) && (!modelStillExists); j++)
+	    {
+	      if(modelList[j] == i)
+		modelStillExists = TRUE;
+	    }
+
+	  if(modelStillExists)
+	    {	  	      
+	      int k = 1;
+	      int lower, upper;
+	      int parts = 0;
+
+	      switch(tr->partitionData[i].dataType)
+		{
+		case AA_DATA:		      		     
+		  {
+		    char AAmodel[1024];
+		    
+		    strcpy(AAmodel, protModels[tr->partitionData[i].protModels]);
+		    if(tr->partitionData[i].protFreqs)
+		      strcat(AAmodel, "F");		  
+		    
+		    fprintf(newFile, "%s, ", AAmodel);
+		  }
+		  break;
+		case DNA_DATA:
+		  fprintf(newFile, "DNA, ");
+		  break;
+		case BINARY_DATA:
+		  fprintf(newFile, "BIN, ");
+		  break;
+		case GENERIC_32:
+		  fprintf(newFile, "MULTI, ");
+		  break;
+		case GENERIC_64:
+		  fprintf(newFile, "CODON, ");
+		  break;
+		default:
+		  assert(0);
+		}
+
+	      fprintf(newFile, "%s = ", tr->partitionData[i].partitionName);
+	      
+	      while(k <= rdta->sites)
+		{
+		  if(modelList[k] == i)
+		    {
+		      lower = k;
+		      while((k < rdta->sites) && (modelList[k + 1] == i))
+			k++;
+		      upper = k;
+		      
+		      if(lower == upper)		  
+			{
+			  if(parts == 0)
+			    fprintf(newFile, "%d", lower);
+			  else
+			    fprintf(newFile, ",%d", lower);
+			}
+		      else
+			{
+			  if(parts == 0)
+			    fprintf(newFile, "%d-%d", lower, upper);
+			  else
+			    fprintf(newFile, ",%d-%d", lower, upper);
+			}		  
+		      parts++;
+		    }
+		  k++;
+		}
+	      fprintf(newFile, "\n");
+	    }		  
+	}	
+      fclose(newFile);
+    }
+
+  
+  {
+    FILE *newFile;
+    char mfn[2048];
+   
+
+    strcpy(mfn, seq_file);
+    strcat(mfn, ".");
+    strcat(mfn, excludeFileName);
+    
+    newFile = myfopen(mfn, "wb");
+    
+    printf("\nAn alignment file with excluded columns is printed to file %s\n\n\n", mfn);
+    
+    fprintf(newFile, "%d %d\n", tr->mxtips, rdta->sites - excludedColumns);
+    
+    for(i = 1; i <= tr->mxtips; i++)
+      {   
+	unsigned char *tipI =  &(rdta->y[i][1]);
+	fprintf(newFile, "%s ", tr->nameList[i]);
+	
+	for(j = 0; j < rdta->sites; j++)
+	  {
+	    if(excludeArray[j + 1] == -1)	      
+	      fprintf(newFile, "%c", getInverseMeaning(tr->dataVector[j + 1], tipI[j]));	       	  
+	  }
+	
+	fprintf(newFile, "\n");
+      }
+    
+    fclose(newFile);
+  }
+
+  
+  fclose(f);
+  for(i = 0; i < numberOfModels; i++)
+    free(partitions[i]);
+  free(partitions);  
+  free(excludeArray);
+  free(countArray);
+  free(modelList);
+}
+
+
+void parseProteinModel(analdef *adef)
+{
+  FILE *f; 
+  int doublesRead = 0;
+  int result = 0;
+  int i, j;
+  double acc = 0.0;
+
+  assert(adef->userProteinModel);
+  printf("User-defined prot mod %s\n", proteinModelFileName);
+
+  adef->externalAAMatrix = (double*)malloc(420 * sizeof(double));
+
+  f = myfopen(proteinModelFileName, "rb");
+  
+ 
+
+  while(doublesRead < 420)
+    {     
+      result = fscanf(f, "%lf", &(adef->externalAAMatrix[doublesRead++]));           
+
+      if(result == EOF)
+	{
+	  printf("Error: protein model file must consist of exactly 420 entries\n");
+	  printf("The first 400 entries are for the rates of the AA matrix, while the\n");
+	  printf("last 20 should contain the empirical base frequencies\n");
+	  printf("Reached End of File after %d entries\n", (doublesRead - 1));
+	  exit(-1);
+	}    
+    }
+       
+  fclose(f);
+
+  /* CHECKS */
+  for(i = 0; i < 20; i++)
+    for(j = 0; j < 20; j++)
+      {
+	if(i != j)
+	  {
+	    if(adef->externalAAMatrix[i * 20 + j] != adef->externalAAMatrix[j * 20 + i])
+	      {
+		printf("Error user-defined Protein model matrix must be symmetric\n");
+		printf("Entry P[%d][%d]=%f at position %d is not equal to P[%d][%d]=%f at position %d\n", 
+		       i, j,  adef->externalAAMatrix[i * 20 + j], (i * 20 + j),
+		       j, i,  adef->externalAAMatrix[j * 20 + i], (j * 20 + i));
+		exit(-1);
+	      }
+	  }
+      }
+
+  acc = 0.0;
+
+  for(i = 400; i < 420; i++)    
+    acc += adef->externalAAMatrix[i];         
+
+  if((acc > 1.0 + 1.0E-6) || (acc <  1.0 - 1.0E-6))
+    {
+      printf("Base frequencies in user-defined AA substitution matrix do not sum to 1.0\n");
+      printf("the sum is %1.80f\n", acc);
+      exit(-1);
+    }
+
+}
+
+
+
+
+void parseSecondaryStructure(tree *tr, analdef *adef, int sites)
+{
+  if(adef->useSecondaryStructure)
+    {
+      FILE *f = myfopen(secondaryStructureFileName, "rb");
+
+      int
+	i,
+	k,
+	countCharacters = 0,
+	ch,
+	*characters,
+	**brackets,
+	opening,
+	closing,
+	depth,
+	numberOfSymbols,
+	numSecondaryColumns;      
+
+      unsigned char bracketTypes[4][2] = {{'(', ')'}, {'<', '>'},  {'[', ']'},  {'{', '}'}};          
+
+      numberOfSymbols = 4;     
+
+      tr->secondaryStructureInput = (char*)malloc(sizeof(char) * sites);
+
+      while((ch = fgetc(f)) != EOF)
+	{
+	  if(ch == '(' || ch == ')' || ch == '<' || ch == '>' || ch == '[' || ch == ']' || ch == '{' || ch == '}' || ch == '.')
+	    countCharacters++;
+	  else
+	    {
+	      if(!whitechar(ch))
+		{
+		  printf("Secondary Structure file %s contains character %c at position %d\n", secondaryStructureFileName, ch, countCharacters + 1);
+		  printf("Allowed Characters are \"( ) < > [ ] { } \" and \".\" \n");
+		  errorExit(-1);
+		}
+	    }
+	}
+      
+      if(countCharacters != sites)
+	{
+	  printf("Error: Alignment length is: %d, secondary structure file has length %d\n", sites, countCharacters);
+	  errorExit(-1);
+	}
+    
+      characters = (int*)malloc(sizeof(int) * countCharacters); 
+
+      brackets = (int **)malloc(sizeof(int*) * numberOfSymbols);
+      
+      for(k = 0; k < numberOfSymbols; k++)	  
+	brackets[k]   = (int*)calloc(countCharacters, sizeof(int));
+
+      rewind(f);
+
+      countCharacters = 0;
+      while((ch = fgetc(f)) != EOF)
+	{
+	  if(!whitechar(ch)) 
+	    {
+	      tr->secondaryStructureInput[countCharacters] = ch;
+	      characters[countCharacters++] = ch;
+	    }
+	}
+      
+      assert(countCharacters == sites);
+
+      for(k = 0; k < numberOfSymbols; k++)
+	{
+	  for(i = 0, opening = 0, closing = 0, depth = 0; i < countCharacters; i++)
+	    {
+	      if((characters[i] == bracketTypes[k][0] || characters[i] == bracketTypes[k][1]) && 
+		 (tr->extendedDataVector[i+1] == AA_DATA || tr->extendedDataVector[i+1] == BINARY_DATA ||
+		  tr->extendedDataVector[i+1] == GENERIC_32 || tr->extendedDataVector[i+1] == GENERIC_64))
+		{
+		  printf("Secondary Structure only for DNA character positions \n");
+		  printf("I am at position %d of the secondary structure file and this is not part of a DNA partition\n", i+1);
+		  errorExit(-1);
+		}
+	      
+	      if(characters[i] == bracketTypes[k][0])
+		{	      
+		  depth++;
+		  /*printf("%d %d\n", depth, i);*/
+		  brackets[k][i] = depth;
+		  opening++;
+		}
+	      if(characters[i] == bracketTypes[k][1])
+		{	  
+		  brackets[k][i] = depth; 
+		  /*printf("%d %d\n", depth, i);  */
+		  depth--;	
+		  
+		  closing++;
+		}	  	  	  
+	      
+	      if(closing > opening)
+		{
+		  printf("at position %d there is one closing bracket too many\n", i+1);
+		  errorExit(-1);
+		}
+	    }	
+
+	  assert(depth == 0 && countCharacters == sites);
+	
+      
+	  if(closing != opening)
+	    {
+	      printf("Number of opening brackets %d should be equal to number of closing brackets %d\n", opening, closing);
+	      errorExit(-1);
+	    }
+	}
+      
+      for(i = 0, numSecondaryColumns = 0; i < countCharacters; i++)
+	{
+	  int checkSum = 0;
+
+	  for(k = 0; k < numberOfSymbols; k++)
+	    {
+	      if(brackets[k][i] > 0)
+		{
+		  checkSum++;
+		  
+		  switch(tr->secondaryStructureModel)
+		    {
+		    case SEC_16:
+		    case SEC_16_A:
+		    case SEC_16_B:
+		    case SEC_16_C:
+		    case SEC_16_D:
+		    case SEC_16_E:
+		    case SEC_16_F:
+		    case SEC_16_I:
+		    case SEC_16_J:
+		    case SEC_16_K:
+		      tr->extendedDataVector[i+1] = SECONDARY_DATA;
+		      break;
+		    case SEC_6_A:
+		    case SEC_6_B:
+		    case SEC_6_C:
+		    case SEC_6_D:
+		    case SEC_6_E:
+		      tr->extendedDataVector[i+1] = SECONDARY_DATA_6;
+		      break;
+		    case SEC_7_A:
+		    case SEC_7_B:
+		    case SEC_7_C:
+		    case SEC_7_D:
+		    case SEC_7_E:
+		    case SEC_7_F:
+		      tr->extendedDataVector[i+1] = SECONDARY_DATA_7;
+		      break;
+		    default:
+		      assert(0);
+		    }
+		 
+		  numSecondaryColumns++;
+		}
+	    }
+	  assert(checkSum <= 1);
+	}
+      
+      assert(numSecondaryColumns % 2 == 0);
+      
+      /*printf("Number of secondary columns: %d merged columns: %d\n", numSecondaryColumns, numSecondaryColumns / 2);*/
+
+      tr->numberOfSecondaryColumns = numSecondaryColumns;
+      if(numSecondaryColumns > 0)
+	{
+	  int model = tr->NumberOfModels;
+	  int countPairs;
+	  pInfo *partBuffer = (pInfo*)malloc(sizeof(pInfo) * tr->NumberOfModels);
+
+	  for(i = 1; i <= sites; i++)
+	    {
+	      for(k = 0; k < numberOfSymbols; k++)
+		{
+		  if(brackets[k][i-1] > 0)
+		    tr->model[i] = model;
+		}
+
+	    }
+
+	  /* now make a copy of partition data */
+
+	 
+	  for(i = 0; i < tr->NumberOfModels; i++)
+	    {
+	      partBuffer[i].partitionName = (char*)malloc((strlen(tr->extendedPartitionData[i].partitionName) + 1) * sizeof(char));
+	      strcpy(partBuffer[i].partitionName, tr->extendedPartitionData[i].partitionName);
+	      partBuffer[i].dataType =  tr->extendedPartitionData[i].dataType;
+	      partBuffer[i].protModels=  tr->extendedPartitionData[i].protModels;
+	      partBuffer[i].protFreqs=  tr->extendedPartitionData[i].protFreqs;	      
+	    }
+
+	  for(i = 0; i < tr->NumberOfModels; i++)
+	    free(tr->extendedPartitionData[i].partitionName);
+	  free(tr->extendedPartitionData);
+	 
+	  tr->extendedPartitionData = (pInfo*)malloc(sizeof(pInfo) * (tr->NumberOfModels + 1));
+	  
+	  for(i = 0; i < tr->NumberOfModels; i++)
+	    {
+	      tr->extendedPartitionData[i].partitionName = (char*)malloc((strlen(partBuffer[i].partitionName) + 1) * sizeof(char));
+	      strcpy(tr->extendedPartitionData[i].partitionName, partBuffer[i].partitionName);
+	      tr->extendedPartitionData[i].dataType =  partBuffer[i].dataType;
+	      tr->extendedPartitionData[i].protModels= partBuffer[i].protModels;
+	      tr->extendedPartitionData[i].protFreqs=  partBuffer[i].protFreqs;	      
+	      free(partBuffer[i].partitionName);
+	    }
+	  free(partBuffer);
+
+	  tr->extendedPartitionData[i].partitionName = (char*)malloc(64 * sizeof(char));
+
+	  switch(tr->secondaryStructureModel)
+	    {
+	    case SEC_16:
+	    case SEC_16_A:
+	    case SEC_16_B:
+	    case SEC_16_C:
+	    case SEC_16_D:
+	    case SEC_16_E:
+	    case SEC_16_F:
+	    case SEC_16_I:
+	    case SEC_16_J:
+	    case SEC_16_K:
+	      strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 16 STATE MODEL");
+	      tr->extendedPartitionData[i].dataType = SECONDARY_DATA;
+	      break;
+	    case SEC_6_A:
+	    case SEC_6_B:
+	    case SEC_6_C:
+	    case SEC_6_D:
+	    case SEC_6_E:
+	      strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 6 STATE MODEL");
+	      tr->extendedPartitionData[i].dataType = SECONDARY_DATA_6;
+	      break;
+	    case SEC_7_A:
+	    case SEC_7_B:
+	    case SEC_7_C:
+	    case SEC_7_D:
+	    case SEC_7_E:
+	    case SEC_7_F:
+	      strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 7 STATE MODEL");
+	      tr->extendedPartitionData[i].dataType = SECONDARY_DATA_7;
+	      break;
+	    default:
+	      assert(0);
+	    }
+
+	  tr->extendedPartitionData[i].protModels= -1;
+	  tr->extendedPartitionData[i].protFreqs=  -1;	 
+
+	  tr->NumberOfModels++;	 
+	  
+	  if(adef->perGeneBranchLengths)
+	    {
+	      /*if(tr->NumberOfModels > NUM_BRANCHES)
+		{
+		  printf("You are trying to use %d partitioned models for an individual per-gene branch length estimate.\n", tr->NumberOfModels);
+		  printf("Currently only %d are allowed to improve efficiency.\n", NUM_BRANCHES);
+		  printf("Note that the number of partitions has automatically been incremented by one to accommodate secondary structure models\n");
+		  printf("\n");
+		  printf("In order to change this please replace the line \"#define NUM_BRANCHES   %d\" in file \"axml.h\" \n", NUM_BRANCHES);
+		  printf("by \"#define NUM_BRANCHES   %d\" and then re-compile RAxML.\n", tr->NumberOfModels);
+		  exit(-1);
+		}
+		else*/
+		{		  
+		  tr->numBranches = tr->NumberOfModels;
+		}
+	    }
+	  
+	  assert(countCharacters == sites);
+
+	  tr->secondaryStructurePairs = (int*)malloc(sizeof(int) * countCharacters);
+	  for(i = 0; i < countCharacters; i++)
+	    tr->secondaryStructurePairs[i] = -1;
+	  /*
+	    for(i = 0; i < countCharacters; i++)
+	    printf("%d", brackets[i]);
+	    printf("\n");
+	  */
+	  countPairs = 0;
+
+	  for(k = 0; k < numberOfSymbols; k++)
+	    {
+	      i = 0;
+	      
+	  
+	      while(i < countCharacters)
+		{
+		  int 
+		    j = i,
+		    bracket = 0,
+		    openBracket,
+		    closeBracket;
+		  
+		  while(j < countCharacters && ((bracket = brackets[k][j]) == 0))
+		    {
+		      i++;
+		      j++;
+		    }
+		  
+		  if(j == countCharacters)
+		    {
+		      assert(bracket == 0);
+		      break;
+		    }
+	    
+		  openBracket = j;
+		  j++;
+		  
+		  while(j < countCharacters && bracket != brackets[k][j])
+		    j++;
+		  assert(j < countCharacters);
+		  closeBracket = j;
+		  
+		  assert(closeBracket < countCharacters && openBracket < countCharacters);
+
+		  assert(brackets[k][closeBracket] > 0 && brackets[k][openBracket] > 0);
+		  
+		  /*printf("%d %d %d\n", openBracket, closeBracket, bracket);*/
+		  brackets[k][closeBracket] = 0;
+		  brackets[k][openBracket]  = 0;	      	      
+		  countPairs++;
+		  
+		  tr->secondaryStructurePairs[closeBracket] = openBracket;
+		  tr->secondaryStructurePairs[openBracket] = closeBracket;
+		}
+
+	      assert(i == countCharacters);
+	    }
+	     
+	  assert(countPairs == numSecondaryColumns / 2);
+
+	  
+	  /*for(i = 0; i < countCharacters; i++)
+	    printf("%d ", tr->secondaryStructurePairs[i]);
+	    printf("\n");*/
+	  
+
+	  adef->useMultipleModel = TRUE;
+
+	}
+
+      
+      for(k = 0; k < numberOfSymbols; k++)	  
+	free(brackets[k]);
+      free(brackets);
+      free(characters);
+
+      fclose(f);
+    }
+}
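Reviewer note on the pairing logic above: the function first labels every bracket column with its nesting depth, then pairs each opening column with the next column carrying the same depth label. The standalone sketch below illustrates that two-pass idea for a single bracket type. It is not part of the patch; pairBrackets and its signature are hypothetical names chosen for this illustration, and it returns an error code where the patch calls errorExit().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of ExaML.
   Pairs '(' with ')' in s. On success returns 0 and sets pairs[i] to the
   partner column of column i, or to -1 for unpaired columns.
   Returns -1 if the brackets are unbalanced or memory runs out.
   pairs must have room for strlen(s) entries. */
static int pairBrackets(const char *s, int *pairs)
{
  int n = (int)strlen(s), i, j, depth = 0;
  int *level = (int *)calloc((size_t)n + 1, sizeof(int));

  if(!level)
    return -1;

  /* pass 1: label every bracket column with its nesting depth */
  for(i = 0; i < n; i++)
    {
      pairs[i] = -1;

      if(s[i] == '(')
        level[i] = ++depth;
      else if(s[i] == ')')
        {
          if(depth == 0)          /* closing bracket without an opener */
            {
              free(level);
              return -1;
            }
          level[i] = depth--;
        }
    }

  if(depth != 0)                  /* at least one opener was never closed */
    {
      free(level);
      return -1;
    }

  /* pass 2: the partner of an opening bracket is the next column that
     carries the same depth label; the bound is tested before the read */
  for(i = 0; i < n; i++)
    {
      if(level[i] == 0)
        continue;

      for(j = i + 1; j < n && level[j] != level[i]; j++)
        ;

      assert(j < n);              /* holds because the input is balanced */

      pairs[i] = j;
      pairs[j] = i;
      level[i] = level[j] = 0;    /* mark both columns as consumed */
    }

  free(level);
  return 0;
}
```

For the input "((..))" this fills pairs with 5, 4, -1, -1, 1, 0, in the same spirit as tr->secondaryStructurePairs in the patch. A single stack of open positions would do the same job in one pass; the two-pass form is kept here only because it follows the structure of the code under review.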
diff --git a/testData/140 b/testData/140
new file mode 100644
index 0000000..3b370ce
--- /dev/null
+++ b/testData/140
@@ -0,0 +1,142 @@
+ 140 1104
+Seq1 MDFIDTESSSESDSELHRQLLL-----QQRDETDTVLRELVQGGCNRKRRASSTCTVPQTVPKAMRLYSSNFFCLTSKNRKLSALAAFKKMYNASYYEVAREYKSDKTQSYEWVLGCSPERVSALNCLKGVTEFILYD-YSALFYLEFYCSKNREGVRRLLNVDTDSILLLNPPNKRSVLAALFYQKLVLAHGD--FPDWCR-D-ILSNFELSQMIQWALDNKHHDEGSIAYHYAIHAEQDNNAKLWLQSNQQAKYVRDAATMVRHFVKGRLHSITMSEHIAAQDGWKKILVFLTFQHINFKEFISILCMWLKGPKKSCITIAGVPDSGKSMFAYSLIKFLNGSVLSFANSKSHFWLQPLTECKAALIDDVTLPCWDYVDTFLRNALDGNAICDCKHRAPVQTKCPPLLLTSNYDPRLHYLNSRIQFLLFNRVIPLYG-TQPRFYIEPADWRSFFQKYSEDLQLYDGEGELLQKLLQLQERESQLLE [...]
+Seq2 MDFITIESSQCEEEESHVQLL-------QDFEIIPKR--VAGHYTRKRRRTRSPGDPKKTVPKTIRQYNRPHKCLVSKNKTLTALAVFKELYTASFTEVTRTFKSDKTQSYEWVLGCSHIALEAVKVLIHNTEHVILD-HLGVYYVGFTVSKSREGLLRFLNIFTENVVLSNPPNKRSVLSALFFDKLVQVSGD--KPQWMI-D-IITSFELSKMIQWALDNNMYDEGAIAYNYALLADTDLNAQLWLKHNSQAKYVRDAATMCRHYRRGQMQAIGVMEHLATRG-WKRIIVFLRYQHVDHHTFINDLKYWIVNPKRSTIAIVGIPDSGKSMFGMSLIQFLDGRVLSFSNHKSHFWLQPLSETRYALVDDVTWPAWDYMDVYMRNALDGNPICDCKHRAPIQTKCPPLLLTSNYDPRERYLLSRITFMSFNRSIPCIG-GQPRFLISPADWRSFMLKFRKELDI---TGELRESLERLQRREAEILE [...]
+Seq3 MTDPNSKGFGDWCLLDISDLLDQGN-SLELFHQQECEQSEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNKKATLMAKFKESFGVGFNELTRQFKSHKTCCKDWVYAVHDDLFESSKLLQQHCDYIWVR--MSLYLLCFKAGKNRGTVHKLILNVHEQQILSEPPKLRNTAAALFWYKGCMGSGAGPYPDWIAQQTILGHFDFSAMVQWAFHNHLLDEADIAYQYARLAPEDANAVAWLAHNNQAKFVRECAYMVRFYKKGQMRDMSISEWIYTKGHWSDIVKFIRYQNINFIVFLTALKEFLHSPKKNCILIYGPPNSGKSSFAMSLIRVLKGRVLSFVNSKSQFWLQPLSECKIALLDDVTDPCWIYMDTYLRNGLDGHYVSDCKYRAPTQMKFPPLLLTSNINVHGEYLHTTIKGFEFPNPFPMKADNTPQFELTDQSWKSFFTRLWTQLDLEEEFQCLSERFNALQDQLMNIYE [...]
+Seq4 MADPKGSGFGDWCILDISDLIDQGN-SLELFHQQECKQSEEQLQKLKRKC-LSPSPRLQSIKSKRRLFDSGVELLRSSNKKATLMAKFKAAFGVGFNELTRQFKSHKTCCNHWVYAVHDDLFESSKLLQQHCDYLWVR--MSLYLLCFKAGKNRGTVHKLMLNVHEQQILSEPPKLRNTAAALFWYKGCMGSGVGPYPDWIAQQTILGHFDFSQMVQWAFDNQLVDEGDIAYRYARLAPEDANAVAWLAHNSQAKFVRECAAMVRFYKKGQMRDMSMSEWIYTKGHWSDIVKFLRYQEVNFIMFLAAFKDFLHSPKKNCILIHGPPNSGKSSFAMSLIRVLKGRVLSFVNSKSQFWLQPLSECKIALIDDVTDPCWLYMDNYLRNGLDGHYVSDCKYKAPMQTKFPPLLLTSNINVHEEYLHSRIKGFAFPNPFPMKSDDTPQFELTDQSWKSFFERLWTQLELEDEFQCLSERFNALQDLLMNIYE [...]
+Seq5 MADSKGSGFGDWCILDISDLLDQGN-SRELFHQQECKQSEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNQKATLLAKFKQAFGVGFNELTRQFKSYKTCCNHWVYAVHDDLFESSKLLQQHCDYIWVR--MSLYLLCFKAGKNRGTVHKLILNVHEQQILSEPPKLRNTAAALFWYKGCMGPGVGPYPEWIAQLTILGHFDLSVMVQWAFDNNLFEEADIAYGYARLAPEDSNAVAWLAHNNQAKYVRECAMMVRYYKKGQMRDMSMSEWIYTRGQWSSIVKFLRYQEINFISFLAALKDLLHSPKRNCILFHGPPNTGKSSFGMSLIKVLRGRVLSFVNSKSQFWLQPLGECKIALLDDVTDPCWVYMDQYLRNGLDGHFVSDCKYRAPMQTKFPPLILTSNINVHAEYLHSRIKGFEFKNPFPMKADNTPQFELTDQSWKSFFTRLWTHLDLEDEFQCLSERFNALQEQLMNIYE [...]
+Seq6 MADHKGSGLSEWCILDISDLLDQGN-SLELFHQQECEQSEEQLQKLKRKY-LSPSPRLQSIKSKRRLFDSGVELLRSSNTKATLMAKFKEAFGDGFNELTRQFKSYKTCCNYWVYAVHD-VYESSKLLQQHCDYIWVR--ITLYLLSFKAGKNRGTVHKLMLNVQEQQILSEPPKLRHTAAALFWYKGGMGTGTGSYPDWIAHQTILGHFDFSVMVQWAFDNNHFEEADIAYGYAKLAPEDANAVAWLAHNSQAKFVRECAAMVRFYKRGQMREMTMSEWIYTRGHWSSIVKFVRYQGINFITFLAALKDFLHSPKRNCLLIYGPPNTGKSTFAMSLIQVLKGRVLSYVNSKSQFWLQPLGDCKIALLDDVTDPCWLYMDTFLRNGLDGHVVSDCKYKAPMQIKFPPLLLTSNINLHEEYLHSRVRGFEFPNPFPMKPDNTPEFELTDQSWKSFFARLWTQLELEDEFQCLSERFNVLQDQLMNIYE [...]
+Seq7 MADSKGSGLSDWCILDVSDLLDQGN-SLELFHQQECEQSEEQLQILKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNIKATLMAKFKESFGVGFNELTRQFKSYKTCCNDWVYAVHDDLFESSKLLQQHCDYIWVR--MTLYLLCFKAGKNRGTVHKLMLNVQEQQILSEPPKLRNTAAALFWYKGGMGSGAGTYPDWIAHQTILGHFDFSAMVQWAFDNNYLEEPDIAYQYAKLAPEDSNAVAWLAHNQQAKFVRECAAMVRFYKKGQMKEMSMSEWIHTKGHWSDIVKFLRYQDVNFITFLAAFKNFLHAPKHNCILIYGPPNSGKSSFAMSLIKVLKGRVLSFVNSKSQFWLQPLGESKIALLDDVTDPCWVYIDTYLRNGLDGHFVSDCKYKAPVQIKFPPLLLTSNINVHGEYLHSRIKGFEFPHPFPMKPDNTPQFQLTDQSWKSFFERLWTQLDLEEEFQCLSERFNVLQDQLMNIYE [...]
+Seq8 MADPKGSGLGDWCILDVSDLLDQGN-SLELFHQQECKQSEEQLQILKRKY-LSPSPRLELMKSKRRLFDSGLELLRSSNRKATLMAKFKDAFGVGFNELTRQFKSYKTCCNHRVYAVHDDLFESSKLLQQHCDYIWVR--MTLYLLCFKAGKNRGTVHKLLLNVQEQQILSEPPKLRNTAAALFWYKGGMGSGAGKYPDWIAQQTVLGHFDFSVMVQWAFDNNHVDEADIAYQYARLAPEDSNAVAWLAHNSQAKFVRDCAAMVRFYKNLQMREMSMSEWIYTRGHWSSIVKFLGYQGVNFIMFLAALKNFLHAPKQNCILIHGPPNSGKSSFAMSLIKVLKGRVLSFVNSRSQFWLQPLGECKIALIDDVTDPCWLYMDTYLRNGLDGHFVSDCKYKAPVQTKFLPLLLTSNINVHEEYLHSRIKGFEFPNPFPMKSDNTPQFELTDQSWKSFFERLWTQLELEEEFQCLSERFNVLQDQLMNIYE [...]
+Seq9 MADPKGSGLDDWCIVDISELLDQGN-SRELFHQQESKESEEHLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRASNNKAILMAKFKEAFGVGFNDLTRQFKSYKTCCNHWVYAVHDDLLESSKLLQQHCDYVWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGAMGSGTGPYPDWMAHQTIVGHFDMSVMVQWAFDNNYLDEADIAYQYAKLAPEDSNAVAWLAHNNQARFVRECASMVRFYKKGQMKEMSMSEWIHTRGHWSTIAKFLRYQQVNFIMFLAALKDMLHSPKRNCILIYGPPNTGKSAFTMSLIRVLRGRVLSFVNSKSQFWLQPMSECKIALIDDVTDPCWLYMDTYLRNGLDGHYVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPEFELTDQSWKSFFTRLWNQLELEDEFQCLSDRFNALQDQLMNIYE [...]
+Seq10 MADPKGSGLEDWCIVDISELLDQGN-SRELFHQQESKESEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRASNNKAILMAKFKEFFGVGFNDLTRQFKSYKTCCNAWVYAVHDDLLESSKLLQQHCDYIWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQIISEPPKLRNVAAALFWYKGAMGSGAGPYPDWIAQQTIVGHFDMSAMVQWAFDNNYLDEADIAYQYAKLAPEDSNAVAWLAHNNQARYVREVASMVRFYKKGQMKEMSMSEWIHTRGHWSTIAKFLRYQQVNFIMFLAALKDMLHSPKRNCILIYGPPNTGKSAFTMSLIHVLRGRVLSFVNSKSQFWLQPMSECKIALIDDVTDPCWIYMDTYLRNGLDGHVVSDCKHKAPMQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPEFELTDQSWKSFFTRLWNQLELEDEFQCLSDRFNVLQDQLMNIY [...]
+Seq11 MADPKGSGLDDWCIVDISELLDQGN-SRELFHQQECKDSEEQLQKLKRKY-ISPSPRLESIKSKRRLFDSGLELLRASNHKAILLAKFKEAFGIGFNDLTRQFKSYKTCCNDWVYAVHEDLLESSKLLQQHCDYIWIR--MSLFLLCFKAGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGAMGSGAGPYPNWMAQQTIVGHFDLSEMIQWAFDHNYLDEADIAFQYAKLAPENSNAVAWLAHNNQARFVRECASMVRFYKKGQMKEMSMSEWIYARGHWSSIAKFLRYQQVNVIMFLAALKDMLHSPKHNCILIHGPPNTGKSAFTMSLIHVLKGRVLSFVNSKSQFWLQPMSETKIALIDDVTDPCWVYMDTYLRNGLDGHYVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKPDNTPEFELTDQSWKSFFTRLWKQLELEDEFQCLSKRFNALQDQLMNIY [...]
+Seq12 MAESKGSGFGDWCILDISDLLDQGN-SRELFHQQECQESEEHLQKLKRKY-LSPSPRFESIKSKRRLFDSGLELLRANNNRAILMAKFKEAFGVGFYDLTRQFKSYKTCCNAWVYAVHDDLLESSKLLQQHCDYVWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQILSEPPKLRNTAAALFWYKGCMGSGGGPYPDWIAQQTILGHFDLSEMIQWAFDNNHMDESDIAYQYAKLAPENSNAVAWLAHNNQARFVRECAAMVRFYKKGQMKEMSMSEWIYARGHWSTIAKFLRYQQVNFIMFLAALKDLLHAPKRNCILIYGPPNTGKSAFTMSLIRVLKGRVISFVNSKSQFWLQPLSECKIALLDDVTDPCWIYMDTYLRNGLDGHVVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIQGFEFPNPFPMKADNTPQFELTDQSWKSFFTRLWQQLELEEEFQCLSERFNVLQDQLMNIY [...]
+Seq13 MADPKGSGFNDWCILDISDLLDQGN-SRELFHLQECQESEEQLQKLKRKY-LSPSPRFESIKSKRRLFDSGLELLRASNNKAILMAKFKEAFGVGFNDLTRQFKSYKTCCNAWVYAVHDDLIESSKLLQQHCDYVWIR--MSLFLVCFKAGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGSMGSGVGSYPDWIAHQTILGHFDLSDMVQWAFDNNYLDEADIAYQYAKLAPDNSNAVAWLAHNNQAKFVRECASMVRFYKKGQMKEMSMSEWIYTKGQWSTIVQFLRYQQVNFIMFLAALKDLLHSPKRNCILFYGPPNTGKSAFTMSLIKVLKGRVLSFCNSKSQFWLQPLSECKIALLDDVTDPCWVYMDTYLRNGLDGHYVSDCKHKAPMQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPQFDLTDQSWKSFFTRLWHQLDLEDEFQCLSERFNVLQDQLMNIY [...]
+Seq14 MADNKGSGLHEWCLLDVSDLISQGN-SRELFQQQELEESNALLQSLKRKY-ISPSPQLESIKTKRKLFDSGVELMRCSNLKATLLSKFKNAFGVSFVELTRQFRSNKTCCNDWVYGVNYDLFESSKLLQQHCDYIWVT--MFLYLLCFKAGKNRQTVIRLLLYVAEEQILSEPPKLRSTVSALFWYKGSSNAATGSYPKWIIEQTLIGHFDMSTMVQWAFDNDLTEEADIAFQYAKLAPDDVNATAWLAHNNQARFVRECANMVRYYKKGQMREMSMSAWIHFKGQWSTIVKFIRYQGINFISFLSALKDFLHGPKKNCLLIYGPPNTGKSAFTMSLIKVLHGRVISFVNSKSHFWLQPMSEAKIALLDDATDPCWIYMDTYLRNGLDGHLVSDCKHKAPIQIRFPPLLITSNINAMAEYLHSRLVAFEFPNPFPMKDDDTPEFELTDQSWKSFFKRLWRQLDLEDEFRCLKKRFDVLQDLLMNIY [...]
+Seq15 MADNKGSGLSDWCLLDVSDLLNQGN-SRELFQQQELEDSETLLQSLKRKY-ISPSPQLESIKSKRKLFDSGVELMRCSNLKATLLAKFKSAFGVSFAELTRQYKSNKTCCNDWVYGVNNDLFEGSKLLQQHCDYIWLT--MYLYLLCFKAGKNRHTVIRLLLHVAEEQILSEPPKLRSTVAALFWYKGSSNSGTGSYPKWIVEQTLIGHFDMSTMVQWAFDNNLTEEADIAFQYAKLAPDDVNATAWLAHNNQARFVREVAAMVRFYKKGQMREMSMSAWIHFRGHWSSIVKFIRYQGINFISFLSALKDFLHAPKKNCLLIYGPPNTGKSAFTMSLIKVLNGRVISFVNSKSHFWLQPMSECKIALLDDATDPCWVYMDTYLRNGLDGHLVSDCKHRAPMQIKFPPLLITSNINAMAEYLHSRLVAFEFPNPFPMKDDDTPEFELTDQSWKSFFTRLWTQLELEDEFRCLRERFDVLQDQLMNIY [...]
+Seq16 MADDKGSENDNWCLLDISDLVDQGN-SRELLHQQQCNDSELQVQKLKRKY-LSPSPRLESIKSKRKLFDSGLELMRCSNVKATLLCKFKLAFGVSFSELTRQYKSNKTCCNDWVYGIRDELYEGSKLLQQHCDYIWVY--MSLFLLCFKAGKNRTTVHRLLLDVQEQQILSEPPKLRSTVAALFWYKGSFGSKAGAYPQWIVQQTMVGHFELSTMVQWAFDNNLTDEADIAYKYANMAFEDVNAAAWLAHNNQARFVRECASMVRFYKRGQMREMSISEWIHHKGHWSSIVKFIRYQEINFICFLAALKDFLHSPKRNCLLIYGPPNTGKSAFTMSLIKVLGGRVISFVNSRSQFWLQPLSECKIALLDDATDPCWTYMDTYLRNGLDGHMVSDCKHKAPMQTKFPPMLVTSNINVLEEYLHSRIVGFKFPNPFPLKPDNTPEFELTDQSWKSFFERLWSQLDLEEEFQCLSERFNALQDELMNIY [...]
+Seq17 MEDNKGTGCSDWFLVDISDLIDQGN-SRELLCQQETEESEQQVQLLKRKY-FSPSPRLQSIKSKKRLFDSGLELLKSSNVKATLMGKFKDAFGVGFNELTRQYKSNKTCCKDWVYCVQDDLLEASKLLQKHCNYIWMH--MTLYLLCFNAGKSRETVCRLLLQIDDMQALLEPPRLRSVLSALFWYKGSMNPNVGTYPDWIVAQTMISHFSLSRMVQWAFDNEHLEEADIAYNYAKLAETDSNAKAFLDSNSQANFVRQCALMVRHYKRGQMRDMSMSCWIHTRGHWSEIVKFIRYQNLNFIMFLDKFRTFLKNPKRNCMCFYGPPDSGKSMFTMSLINVLKGRVLSFANSRSQFWLQPLSETKLALLDDATQECWNYIDTFLRNGVDGNYVSDIKHRAPLQIKFPPLMITTNMNILKEYLHTRIEFFEFPNKFPFDNNNKPQFHLTDQSWKSFFERLWTQLELEDEFHCLSERFTALQDKLMDIY [...]
+Seq18 MSDNKGTCCSAWLSLDISDLIDQGN-SRELFCQQESEESEQQTQLLKRKY-ISPSPQLESIKPKRRLFDSGLELLKHSNVKAVLMAKFKEAFGVGFAELTRQYKSNKTCCRDWVYAVNDDLIESSKLLLQHCAYIWLH--MCLYLLCFNVGKSRETVCRLLLQVSEVQLLSEPPKLRSVCAALFWYKGSMNPNVGAYPEWILTQTLINHFDLSTMIQFAYDHEYFDEATIAYQYAKLAETDANARAFLQSNSQARLVKECATMVRHYMRGEMKEMSMSTWIHRKGQWSDIVRFIRYQDINFIEFLTVFKAFLQNPKQNCLLFHGPPDTGKSMFTMSLISVLKGKVLSFANCKSTFWLQPIADTKLALIDDVTHVCWEYIDQYLRNGLDGNYVCDMKHRAPCQMKFPPLMLTSNIDITKDYLHSRVKSFAFNNKFPLDANHKPQFELTDQSWKSFFKRLWTQLDLEDEFQCLSARFNALQETLMDLY [...]
+Seq19 MSDDKGTGCSDWFVLDISDLIDQGN-SRELLCQQESEESEQQIHWLKRKY-ISSSPRLQCIKSKRRLFDSGLELLKCNNVKAMLLAKFKEAFGVGFMELTRQYKSSKTCCRDWVYAVQDELLESSKLLIQHCAYIWLH--MCLYLLCFNVGKSRETVLRLLLQVSEIQIIAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMINHFDLSTMVQYAYDNELSEEAEIAWHYAKLADTDANARAFLQHNSQARLVKDCAIMVRHYRRGEMKEMSMSSWIHKKGHWSDIVKFVRYQDINFIQFLDSFKSFLHNPKKSCMLIYGPPDTGKSMFTMSLIKVLKGKVLSFANYKSTFWLQPVADTKIALIDDVTYVCWDYIDQYLRNALDGGVVCDMKHRAPCQIRFPPLMLTSNIDIMKEYLRSRVQAFAFPHKFPFDSDNNPQFKLTDQSWKSFFERLWRQLELEDEFQCLSERFNALQENLMDIY [...]
+Seq20 MSDEKGTGCSEWFDLDISDLIDQGN-SRELLCQQESEESEQQIHWLKRKY-ISSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGYMELTRQYRSSKTCCRDWVYAVQDELLESSKLLIQHCAYIWLH--MCLYLLCFNVGKSRETVLRLLLQVSEVQIIAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMISHFDLSTMVQYAYDNELTDEAEIAYHYAKLADTDANARAFLQHNSQARLVKDCAIMVRHYRRGEMKEMSMSAWIHKKGHWSDIVKFIRYQEINFIQFLNAFKLFLHNPKKSCLLFYGPPDTGKSMFTMSLIKLLKGKVLSFANYKSTFWLQPVADTKVALIDDVTYVCWDYIEQYLRNALDGNTVCDMKHRAPCQIRFPPLMLTSNIDIMKEYLYSRIQAFAFPHKFPFDSDNKPQFKLTDQSWKSFFERLWRQLDLEDEFQCLSERFNALQENLMDIY [...]
+Seq21 MTDDNKGGCSQWCILEISDLIDQGN-SRELLCQQESEESEQQIQLLKRKY-LSSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGYMDLTRQYKSSKTCCRDWVYAVQDELIESSKLLLQHCAYIWLQ--MCLYLLCFNVGKSRETVSRLLLQVAEVQMLAEPPKLRSMLSALFWYKGSMNPNVGEYPEWILTQTMINHFDLSTMIQFAYDNEYLQEDEIAYHYAKLADTDANARAFLQHNSQARFVKECAIMVRHYKRGEMKEMSISTWVHRKGHWSDIVKFIRYQDINFIRFLDIFKSFLHNPKKNCILIHGPPDTGKSMFTMSLIKVLKGKVLSFANCRSNFWLQPLADTKLALIDDVTFVCWDYIDQYLRNGLDGNVVCDLKHRAPCQIKFPPLLLTSNIDVMKEYLHSRIQSFAFPNKFPFDNNNMPQFRLTDQSWKSFFERLWHQLDLEEEFQCLSERFNVLQENLMDIY [...]
+Seq22 MTDDTKGGCSDWFVLDISDLIDQGN-SRELLCQQQSEESEQQIHLLKRKY-FSSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGFMELTRQYKSCKTCCRDWVYAVQDELIESSKLLLQHCAYIWLQ--MCLYLLCFNVGKSRETVFRLLLQVAEVQILAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMINHFDLSTMIQYAYDNDLINEDEIAYNYAKLADTDANARAFLQHNSQARFVRECALMVRYYKRGEMKDMSISAWIHNKGHWSDIVKFVRFQDINFIRFLDVFKSFLHNPKKNCLLFYGPPDTGKSMFTMSLIKVLKGKVLSFANYKSNFWLQPLADTKIALIDDVTHVCWDYIDQYLRNGLDGNFVCDLKHRAPCQIKFPPLLLTSNMDIMKEYLHSRVHAFAFPNKFPFDSNNKPQFRLTDQSWKSFFERLWKQLDLEDEFQCLSDRFNALQENLMDIY [...]
+Seq23 MDDDKGTGCSGWFMLDVSDLINQGN-SRELLCQQQSEECEQQIQYLKRKY-FSPSPRLQSMKSKRRLFDSGLELLRCNNVKAVLLGKFKDAFGVSYNELTRQFRSNKTCCKHWVYAAKDELIDASKLLQQHCTYLWLQ--MSLYLCCFNVGKSRETVMRLLLQVNENHILSEPPKIRSMIAALFWYKGSMNPNVGEYPEWIMTQTMIHHFDLSEMIQWAYDQDYVDECTIAYQYARLADSNSNARAFLAHNSQAKYVRECAQMVRYYKRGEMRDMSISAWIHHCGHWQDIVKFLRYQGLNFIVFLDKFRTFLKNPKKNCLLICGPPDTGKSMFSMSLMKALRGQVVSFANSKSHFWLQPLADAKLALLDDATEVCWQYIDAFLRNGLDGNMVSDMKHRAPCQMKFPPLIITSNISLKKEYLHSRIYEFEFPNKFPFDANDTPLFKLTDQSWASFFKRLWTQLELEEEFQCLSERFSALQEKLMDLY [...]
+Seq24 MDDDKGTGCSTWCLLDVSDLINQGN-SRELLCQQESEECEQQIQYLKRKYNISPSPRLQSLKSKRRLFDSGLELLRCKNAKAVLLHKFKEGFGISYNELTRQFKSNKTCCKHWVYGAKEELIDASKLLQQHCSYIWLQ--MSLYLCCFNVAKSRETVVKLLLQIHENHILSEPPKNRSVPVALFWYKGSMNPNVGEYPEWIVTQTMIQHFDLSRMIQWAYDNDHLDECSIAYNYAKLADTDSNARAFLAQNSQAKHVRDCAQMVKHYKRGEMREMTISAWVHHCGQWQDIVKFLRYQGLNFIVFLDKFRTFLQNPKKNCLLIYGPPDTGKSMFTMSLMKALRGQVISFANSKSQFWLQPLADAKIALLDDATEVCWQYIDMFLRNGLDGNVVSDMKHRAPCQMKFPPLIITSNISLKKEYLHSRIYEFEFPNRFPFDSDDKPLFKLTDQSWASFFKRLWIQLGLEDEFQCLSERFSALQDKLMDLY [...]
+Seq25 MADDKGTGCSDFIY-DISDLINQGN-SRELLCQQEREESELQVQYLKRKC-FSPSPRLQSMKSKRRLFDSGLELLRSSNSRATLLSKFKDSFGVSFTELTRQYKSNKTCCHHWVYAAKDDLIDASKLLQQHCFYIWLQ--MSLYLCCFNVGKSRDTVVRLILQVHENHILSEPPKNRSIPAALFWYKGSLNSNVGEAPDWILSQTMIQHFDLSRMIQWAYDNDHIDESIIAYQYAKLADIDSNAKAFLAHNSQVKYVKECALMVRYYKRGEMKEMSISAWIHHCGNWQHIVRFIRYQNLNFIMFLDKFRTFLKNPKKNCLLIYGPPDTGKSMFAMSLIKLLKGSVVSFANSKSQFWLQPLADGKIGLLDDATDVCWQYIDSFLRNGLDGNLVSDIKHKAPCQMKFPPLIITSNINLLKEFLHSRVTQIDFPNKFPFDSDNKPLFELTDQSWASFFKRLWTQLELEDEFHCLSARFTVLQEKLMDIY [...]
+Seq26 MADDKGTGCSEWFIDNISNLLNQGN-SRDLLRQQEFEESAEQVQKLKRKY-FSPSPRLQSMKSKRRLFDSGLELMRCNNSRAKLLSKVKEYFGVGFYELARQYKSDKTCCKDWVYGVREELVESAKLLLNHCSYVWIN--MTLYLLCFNHAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGAYPDWIVNQTMISHFDLSEMIQWAYDSDLTDEADIAYLYAKMANSDSNARAWLAHNNQARYLRECAQMVRHYRRGEMRDMSMSEWIHHRGHWSEIVKFIRFQEINFIIFLDAFKQFIHGPKKSCLLIHGPPDCGKSMFAMSLLKVLKGKVISFVNAKSQFWLSPLSECKIGLLDDATDPCWQYIDTYLRNGLDGNVVSDCKHKTPMQIRFPPLLITSNYNIKANFLYSRIAIFEFKHKFPFKEDGTPVFQLTDQSWKSFFERLWTQLELEDEFQCLNARFNVLQEMLMDIY [...]
+Seq27 MADDKGTGCSEWFLDNISELIDQGN-SRDLFRLQEFEESAEQVQMLKRKY-FSPSPRLQSLKSKRRLFDSGLELMRCSNSRARLLSKVKEYFGVGFYELARQYKSNKTCCRDWVYGVREELLEGAKLLLNHCSYVWIN--MSLFLLCFNNAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGTYPDWIVNQTMLTHFDLSQMIQWAYDTDLTDEADIAYGYAKMAESDSNARAWLAHNSQAKFVRECAQMVRHYRRGEMRDMSISEWIHYRGHWSEIVKFIRFQEINFILFLDAFKQFLHGPKKSCLLIYGPPDCGKSMFAMSLIRVLKGRVISFVNAKSQFWLSPLAECKIGLLDDATDPCWQYIDAYLRNGIDGNIVSDCKHKTPLQIRFPPLLITSNYNIKDNYLYSRIVIFEFKHKFPFKEDGSPEFLLTDQSWKSFFKRLWSQLELEDEFQCLNDRFNALQDKLMDIY [...]
+Seq28 MADDKGTGCSEWFLDNISELIDQGN-SRDLLRQQEFEESAEQVQKLKRKY-FSPSPRLQSLKSKRRLFDSGLELMRCSNSRARLLSKVKEYFGVGFYELARQYKSDKTCCRDWVYGVREELLEGAKLLINHCSYVWIN--MSLFLLCFNNAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGTYPDWIVNQTMLTHFDLSEMIQWAYDTDLTDEADIAYGYAKMAESDSNARAWLAHNSQAKFVRECAQMVRHYRRGEMRDMSISEWIHYRGHWSEIVKFIRYQGINFILFLDAFKQFLHGPKKSCLLIYGPPDCGKSMFAMSLIRVLRGRVISFVNAKSQFWLSPLAECKIGLLDDATDPCWQYIDTYLRNGIDGNIVSDCKHKTPLQIRFPPLLITSNYNIKDNYLYSRIVIFEFKHKFPFKEDGSPEFLLTDQSWKSFFKRLWNQLDLEDEFQCLNDRFNALQDKLMDIY [...]
+Seq29 MAD-KGIGCSTWCLIDISDLLDELGNPQELLCLQEREESDLQLQQLKRKY-FSPSPQLESIKSKRRLFDSGLELLQCSNARATLLSKFKAAFGVSFTELTRRYKSDNTCCRDWAYGL-QDIIEGSKLFQQHCEYIWLH--ISLYLLCFKTGKSRNTVKNLLLNVGDAQLIADPPQIRSVVAALFWYKESMNKNVGEYPEWIANQTLLSHFDLSRMIQWAYDNEYTEDSDIAYHYAKLADEDSNARAFLAHNSQAKFVRECGQMVRHYKRGEMKNMSMSAWIYTRGHWSDIVKFIRFQQINFIMFLDVFKQFLASPKRNCLLIYGAPDCGKSMFCMSLIKALKGKVISFVNARSQFWLSPLVESKIALLDDATECCWNYIDNYLRNGIDGNMVSDCKHKNPVQIRFPPLLITSNNNIMSDYLHSRIKAFEFVNKFPFKDDGSPLFELTDQSWKSFFQRLWRQLDLEDEFQCLSDRFNALQDKLMTIY [...]
+Seq30 MADNRGIGCSNWF-SDLSDLIDEQGISRDLFRQQGSEEFEQQIQDLKRKY-FSPSPRLEAIKCKKRLFDSGLELLKCSNLRATMLSKFKNSFGVGFMELCRKFNSNKTCCRDWVYGVKEELLEGCKLLQEHCGYIWLH--MSLFLLCFKTGKSRDTVVRLLLSIHKEQLLTEPPKLRSVMAVLYWYKGSMNPNIGEYPDWIVQQTMISHFELSPMVQWAYDNDYIEDSDIAYNYAKLADEDINARAFLAHNNQAKIVRDCAWMVRHYKRGEMRYMSISKWIWYKGHWSNIVKFVRFQGINFIMFLDAFKHFLLSTKKNCILFYGPSDCGKTMFCMSLIKALGGRVISFANAKSQFWLQPLTESKIAMLDDATEACWNYLDTYLRNGLDGNWVSDCKHKAPIQIRFPPLLITSNYDILKNFLVSRIKIFEFKNKFPFNEDGTPMFELTDQSWKSFFQRLWKQLDLEDEFQCLNQRFNALEDQLMDIY [...]
+Seq31 M-EGK-KSFTSPFIIDIS-FIDEGN-TAQLFAQHQALDAAQEISAVKRKL-PLTGS-------KKGKLDSGYAWVNASSEKGAKLAIFKQTYGVTFASLTRVFKSDKTCCHNWVFSASEEVIEGSKQLRQYCDFYYASGHCVLYLLDFKASKNRETVIRLFLAVPDHCILSDPPKLKHVPAALFWLKTSNQPHVGQLPNWICQQTMLNYFELRKMVQWALDHNLTDDSMIAYNYAQLAEEDENANAWLNSNSQARYLKECALMTRHFLRAQRLEMTMAKWLTRCGDWKAIIKFLKYQNVNIVNFLSMFRDFMNSPKKNCLVICGAPNTGKSIFAMSLMQFLQGKTISFANHKSHFWLQPLADCKFAVLDDATLPCWSYLDIYCRNALDGNYVCDSKHKNPVQIKIPPLLITTNYNILQEYLHSRLLFLEFNNAFPLDEEGNPQFDLNDQNWKSFFIKLRRQLDLEEEV*RLEARFEEVQEKLLELL [...]
+Seq32 M-DPNLK-GQ-SFLDDLSDLIDQGN-SAELFAQQEAFAFQEHIRTTKRKLKLSFTSQ--SNAPKRRVLDSGYNLLAASSHRAVQLAIFKEKFGISLNSLTRIFKNDKTCCSNWVFGAREELLAASQILQRVCDSIMLLGFMGLYLLEFKNAKSRDTVRHLFLQVENNDMLLEPPKIKSLPAATFWWKLRHSSAAGNLPDWIARQTSITHFDLSAMVQWAYDHNFVDEAQIAYYYARLASEDSNAAAFLRCNNQVKHVKECAQMTRYYKTAEMREMSMSKWIKKCGDWKQIINFIKYQNINFLSFLACFRDLLHSPKRNCLVIVGPPNTGKSMFVMSLMRTLKGRVLSFVNSKSHFWLQPLNAAKIAILDDATRPTWSYIDTYLRNGLDGTPVSDMKHRAPMQICFPPLIITTNVDVAKDYLHSRLMSFEFANAFPLDENGKPALILNELSWKSFFERLWNQLDLEDEFRCLQARFDAVQEQLLEIY [...]
+Seq33 M-DPNEK--VLSFIDDLSDLIDQGN-SAELFAQQEALAVQEHIRASKRKLKLSFTSH--SNAPKRRKLDSGYNFLRAGSRRATQLAIFKDKFNISFNSLTRPFKNDKTCCNNWVFGARDELLEASKLLQRHCDYLMLLGFMALYLLEFKHAKSRETIRHLFLQIEKEEMFLEPPKLKSLPAATFWWKISHSASAGELPDWIARQTSLSHFDLSQMVQWAYDHNYTDEPTIAYNYARMASEDSNAAAFLRCNSQVKFVKECAQMTKYYKTAEMREMSMGKWIKRAGDWKDIINFLKYQGINFLSFLASFKDLLHGPKRNCLVIVGPPNTGKSMFVMSLMKALKGRVLSFVNSKSHFWLQPLNAAKIAVLDDATKATWSYIDTYLRNGLDGTPVSDMKHRAPIQICFPPLLITTNVQVMKDYLHSRLMCFEFPNPFPLNEAGQPALILNELSWKSFFARLWRQLDL-EDFRCLQARFDAVQDQLIDIY [...]
+Seq34 M-DP--K-TVLDFIEDISDLIDQGN-SAELFAQQQAFDFHKDICTTKRNLKRSLTSQ--SNAPKRRLLDSGYNLLRAGSRRAAYLGVFKEKFTISFTALTRIFKNDKTCCRNWVYRAREELLEASKILQKCCDFILLLGFLALFLLEFKTAKSRETVQRLFLQVEKEDMLLEPPKLKSLPAATFWWKIQHSNNSGTLPDWIARQTMISHFSLSVMVQWAYDHNYTEESTIAYHYAKLASEDSNAAAFLKCNNQVKHVKECAQMTRYYKTAEMTEMSMGQWIKKCGDWKQICKFLKFQNVNFLSFMSALKDLLHRPKRNCMVICGPPNTGKSMFVMSFMKALQGKVLSFVNSKSHFWLQPLRGAKVAVLDDATRATWTYFDTYLRNGLDGTPVSDMKHRAPLQICFPPLVITTNVNVMQDYLHSRIVCFEFPNTFPLDEAGNPLLLIDELSWKSFFERLWTQLDLEEDFRCLEARFDAVQDQLLQVY [...]
+Seq35 MAEDKGTVSGSWYLDDDPEFISEGNSSELLHNNHMLAKDGEQIQLLKRKY-MSPSPRLALVSSKRRLFETKDKILKSKNQKATALAQFKEAFGVSFTDLTRSFISNKTCTQHWVFGPNSDILDGTGLLEPHCTFLLKCGPIILLLIEFKASKCRDTVQNLLMRVEHHQMLLEPPKIRSQLTAFFFYKKTMAGGCGKLPDWLTRLTVLSHFELSRMVQWAYDNDMLEDSEIAYYYAQHADVDSNAAAWLKTNNQAKYVRDCGNMVRLYKQQEMKNLTMSEYIYKRGDWKHIFKLLRYQDVNMIQFLTSFRDLLSCPKRQCLVIYGPPDTGKSYFLYSLISFLKGKVISFTNSKSHFWLQPLLNAKVALLDDATKACWNYMDCYMRTALDGNAVSDSKFKAPVQVRLPPLLISTNVELPLLYLHSRTMCYCFAKPCLYDDEGNPLFNLTDRHWKGFFLHLEQQLGL-SEFRCLAEHLDACQEQMLELI [...]
+Seq36 MEDKDNKYNAIDFIDDISDLIDEGN-SLALLNKQQLEDDTQQLKILKRKYFSPSSPRLQQLK-RRLVFDSGLGLLHSSNREATAYAKFKATFDVGYKELTRPFISNKSCCCSWIFGVVAEILEAAKLLQPHCEYLQIIGVTVMMLFQFYAAKCRDTVINLLLHVREWQIITNPPKHRSVAVALYFYKTSMSNVSGAMPEWIKKQTLVNHFEFATMVQWAYDNNIRDEAEVAYGYASLADDDTNAAAWLKCNNQFKYVKDCVQMVAMYKRYEMRNMTIGQWLVKCGNWKNIINFLKYQEISIVAFLTTLRYFLQGPKKNCLAIWGPPDTGKSMFCYSLIKYTQGKVVSFVNSRSQFWLQPLVDGKIGLIDDATFACYQYMDVYMRNGLDGNAVSDVKHKAPIQLKLPPLLLTSNIEVHAEYLHSRIQEYKFPNKLRLDANGNPIITITDADWKSFFSKLWKQLDL-DDFRCLVDRFDAVQDRLLGIY [...]
+Seq37 MEDKDTKANACEFIEDISELIERGN-PQALLNRQQLEEDSQLLTVLKRKYVSPSSPRLEAL-SKRRLFDSGLGILHSSNRQATALTKFKNVFGVGYKEITRPFQSNKSCCHSWVFGVVAEMLEAAKLFKVHCDYLQIIGVTVLCLFEFSSSKCRDTVQKLILNVQEHQIITDPPRHRSVPVALFFYKQSMSNTSGTMPDWLKRQTMLNHFELSHMVQWAYDNNIWDEAELAYQYACLADVEPNAAAWLKSNQQYKYVSDCAKMVRMYKKYEMQQMSMAQWIKKCGDWKKIINLLKYQEISVIAFLTSFRMFLKGPKKNCIALWGPPDTGKSMFCYSLIRYVKGKVVSYANSKSQFWLQPLTDAKLGLIDDATFPCFQFMDVYMRSALDGNEVSDCKHRVPVQIKLPPLLVTSNIDMHSEYLQSRITSFKFPHKLPLDTNGNPIFIITDTDWKCFFSKLEQQLDL-EDFRCLVERFDAVQEKLLGLY [...]
+Seq38 MAEGGERLDAGWFVVNVSDLIDN-EGHAGVLNQQLLEESEQQTAYLKRKYCTPSSPRLQAVHSKRRLFDSGI-LQGSNQEATIL-AKFKGCFGVSLKELTRPYKSSKTCCNEWVFGIREELLTASKLLQPHCDFFLADGYVCLYLITFKAAKNRETITKLFLNCYDYQLRADPPKNRSVAVALFFYKLGLSGGCGDFPPWLAKQLLVSHFELSKMVQWAYDNDHTDESEIAFHYACLADEDSNAAAWLKSNAQAKYVADCSKMVRHYKKQEMRNMSMSQWIYRCGDWTVVAKYLKYQGVSFLGFLTALRHLFEGPKKQCLLIYGPPDTGKSWFCFSLLNFLRGKVVSYQNSRSHFWLSPLADCKVGMLDDATHACFQFIDVNMRSAFDGNYVSDCKHKAPIQIKLPPMLVTSNVNLPGEYLHSRVTGFEFPRKFPIDQDGSPVFSLTHSVWKAFFKRLHHQLGLEDEFRCLARRFDVLQEVMLHHY [...]
+Seq39 M-DAVNK-G-WCFIE----FLDDRGNHLALFTQQLFSEDDQHIAALKRKYAATPSPRLHSCQSRRRLFDSGIGLLRMSNRVAASLARFKDAFSVSFSDLTRSFKSDKTCSVNWVFGAREPLLEALLVLKPQCDYFQTVRRVDIILFEFKVGKSRNTLRKQMLGLDEKLIMADPPNHRSTLAALFFYKKVLFGAAGQTPAWIAQQTILEHFDFSKMVQWAYDNQLIEESEIAYRYACEAETDANAQAWLKCTNQVKHVRDCCAMVRLYKRQEMRDMTMAQWVRKCGDWKTIAGFLRYQEVNMVLFLTALRHMFKGPKKHCLVISGPPDTGKSYFCNSLNTFLHGRVISFMNSKSQFWLQPLVDAKMGFLDDATNACWTFMDIYMRNALDGNPMQDMKHRAPLQLKLPPLLITTNVDVMHNYLHSRLQCFAFEKPMPLNNDGHPQFPLHAANWKSFFTRLAKQLGIEEEFRCLEARFDAVQEQILSLY [...]
+Seq40 M-D-NDK-YRWAFLD----FLDQGN-SLALLTSQLFEQDEQHITALKRKYVTTPSPRLNAVTSRRRLFDSGVGLLRANNVYNACLARFKEAFGVGFTDLTRSFKSNKTCSQHWVFGAPETLIEAAKQLGEQSLFLQHQKRVDLFLFQFKAEKCRLTLTKQVLGVAERLVLAEPPNCRSNLAAFYFYKKTLGKEPGSSPEWIVKQVLIEHFDFSKMVQWAYDNNYVEESEIAYNYALEAETDSNAESWLKTTSQVKYVKDCAQMVRMYKRQQMREMSNTQWIRKCGDWKVIAAFLRYQEVNLVMFLSALRNMLKGPKRHCLVITGPPDTGKSYFCTTLVSFLKGRVISFMNSKSQFWLQPLADAKIGFLDDATHTCWTYMDTYLRNALDGNPVQDMKHRASIQLKLPPLLITSNIDVMNMYLHSRLQSFEFTKRMPLDSKGQPEFVLSAANWASFFTRLAKQLGLEEEFRCLQDRFDALQEQILNLY [...]
+Seq41 MAD-KGTGCSGWYIVETNSFIDQGN-SLSLFHEQLFLSSEEQIACLKRKYAATPSPRLESVSSRRKLFDSGIGLLKSSNIYATCLSKFKTAYGCSFAELTRQFKSDKTCSPHWVFGAPEQLVEASKLLPQHCEYAQLSSKVLLFLFEFKASKNRETVRKLLLGVQECLIIAEPPKERSVLSALFFYKKVMFQGSGQLPEWVAKQTLVEHFDLSRMIQWAYDNDYAEESAIAYNYALYAEADANAEAFLKSNCQAKYVKDCATMVRLYKRQEMRDMSMSQWVKKCGDWKVIAAFLRYQEVNVVLFLAALRHLFLGPKKHCLVIYGPPDTGKSYFCTTLVGFLKGKMISFMNSKSQFWLQPLVDSKIGFLDDATTACWQYMDVFMRNALDGNPISDMKHRAPTQIKLPPLLITTNVNVQANFLHSRLQFFAFNKPMLFDDSGNPQYPLSKANWRSFFTRLGKQLGIDDEFRCLTERFDAVQDQILNLI [...]
+Seq42 MADKGTD-GNNWYIVNISNLIDQGN-SLALYNAQINEDCDNALAHLKRKYNKSPSPQLQAVHSKRRLFDSGIFLLQSSNRRATMLAKFKEWYGVSYNEITRIYKSDKSCSDNWVFRAAVEVLESSKVLKQHCTYIQVK--SALYLVQFKSAKSRETVQKLMLNIQEYQMLCDPPKLRSVPTALYFYKHAMLTESGQTPDWIAKQTLVSHFELSRMVQWAYDNNYVDECDIAYHYAMYAEEDANAAAYLKSNNQVKHVRDCSTMVRMYKRYEMRDMSMSEWIYKCGDWKPISQFLKYQGVNILSFLIVLKSFLKGPKKNCIVIHGPPDTGKSLFCYSFIKFLKGKVVSYVNRSSHFWLQPLMDCKVGFMDDATYVCWTYIDQNLRNALDGNPMCDAKHRAPQQLKLPPMLITSNIDIKQEYLHSRIQCFNFPNKMPILDDGSPMYTFTDGTWKSFFQKLGRQLELEEEFRCLVARFDALQEAILTHI [...]
+Seq43 MADKGTE-GSSWYIVNVSNLIDQGN-SLALYNAKITDDCDNAIAHLKRKYNKSPSPQLQAVNSKRRLFDSGIFLLQTSNRRATMLAKFKDWYGVSYNEITRVYKSDKSCSDNWVFRAAVEVLESSKVLQQHCTYIQVK--SALYLLQFKSAKSRETVQKLMLNIQEFQILTDPPKLRSVPTALYFYKQAMLTESGQTPDWIAKQTLVSHFELSKMVQWAYDNNLLEECDIAYHYAMYADEDANAAAYLKSNNQVKHVRDCSTMVRMYKRYEMRDMSMSEWIYKCGDWKPISQFLKYQGVNILSFLIVLKSFLKGPKKNCIVIHGPPDTGKSLFCYSLVKFLKGKVVSYVNRSSHFWLQPLMDCKVGFMDDATYVCWTYIDQNLRNALDGNPMCDAKHRAPQQLKLPPMLITSNIDVKQEYLHSRVQCFSFPNKMPFLDDGSPMYTFTDATWKSFFQKLGRQLELEEEFRCLVARFDALQEAILTHI [...]
+Seq44 MADKGTE-GSSWYFVNISNLIDQGN-SLALYNTQITDACENAIAALKRKYTKSPSPQLQAVKSKRRLFDSGICLLQDNNRRATMLAKFKDWYGVSYTEITRLYKSDKSCSDNWVFKAPVEVLESSKVLQQHCQYIQVK--SALYLLQFKSSKSRETVYKLLLNIQEFQILADPPKLRSVPAALYFYKHALLTECGQTPDWIAKQTIVSHFELSRMVQWAYDNNHLEECDIAYHYALYADEDANAAAYLKSNNQVKHVKDCSTMVRMYKRYEMREMSMSEWIHKCGDWKPISHFLKYQGVNILSFLIVLKSFLKGPKKNCILIHGPPDTGKSLFCYSLIKFLRGKVVSYVNRTSQFWLQPLMDGKIGFLDDATYVCWTYIDQNLRNALDGNPMSDAKHRAPQQLKLPPMLITSNINVKQEYLHSRVQSFEFPNKMPFLDDGSPLYTFTDATWKSFFEKLGRQLDLEEEFRCLVARFDALQEEILTHI [...]
+Seq45 MADPNKGH-SEWYVVIVSNLIDEGN-SLALYNEQLTEDCNRAILALKRKLTKTPSPRLEAVQSKRRLFDSGLGIFRSTNRKATLLAKFKEYFGVAYGDLTRPFKSDRSCCENWVCAAAEEVIEASKVMQQHCDFLQVI--YALYLVKFKTAKSRDTIMKLFLNVQEQQLMCDPPKSRSTPTALYFYRRSFGNASGPFPDWLAKLTMLDHFELAQMIQFAYDNNLTTESEIAYKYALLADSDANAAAFLKSNQQVKYVRDCYAMLRYYKRQEMKDMSISEWIWKCGNWKLIAQFLRYQEVNFISFLCALKTLFKGPKRNCLVFWGPPDTGKSYICSSLTRFMQGKVVSFMNRHSQFWLQPLQDCKLGFLDDATFQCWQYMDVNMRNALDGNHISDLKHKAPLQIKLPPLLITTNVDVENEYLKSRLVFFKFPNKLPLKENDEVLYEITDASWKCFFIKFASHLELGDEFRCERS-DALQEQI-LNLY [...]
+Seq46 MAELKGTI-NELFDNTISNLIDQGN-SHALLNAQLSEEYDKDLVTVKRKFYATPSPRLSAVQSKRRLFDSGI-LLHSNNRRAALLCKFKEKYGIPFNEITRTFKSNKSCTQNWIFACAEDLIEASKTMQNHVSYLQMI--SALYIICFKAAKSRETVVKLILNTKEEQVLCDPPKIKSMAAALYCYKKVIADTCGDFPDWIATHTVINHFKFSDMVQWAYDNDMLDEAAIAYNYACYASENENAAAFLQTNSQLKYVKECCAMVRLYKKQEMRNMTMPEWIKSCDDWKVIVRYLKYQNINFLEFLLALKLLLKGPKKMCLVIYGPPDTGKSYFCYQFIQFMRGKVVSFMNKNSHFWLMPLLDSKIGFLDDATQCCWMYLDTHMRNAFDGNAVSDVKHKNLQQIVLPPMLITTNCDVCRDYLRSRLTCFNFPNKLPLYENGEPKFKFTDNCWTSFFSKFWKHLDLDPDFSCLSARFLAQQDIQLNLI [...]
+Seq47 MADHKGTLDGSWCLIVVSNLLD------SIIQGN-SEESDRCIQELKRKLNVTPSPRLSAVASKRRLFDSGVVILRSNNIRATVLCKFKDKFGVSFNELTRSFKSDKTCTPNWVIGIREDLRDACKLLQQHVEFLEMI--SVLLLVEFKVTKNRETVLKLMLNAKEEQILCEPPKLKSTAAALYFYKKIITDTCGTLPSWVSRLTIVEHFSLSEMVQWAYDNDFTEEASVAYNYACYATENTNAAAFLASNMQVKYVKDCVAMVRMYKRQEMKSMTMSEWISKCEEWKEIVQFLKYQGVNFLEFLIALKQFFKCPKKMCIVIYGPPDTGKSMFCFKLVQFLKGQVVSYINKSSQFWLMPLQDAKIGLLDDATHNCWIYLDTYLRNAFDGNTFCDIKHKNLQQTKLPPMIITTNVNVTTDYLRSRLTCFNFPNKLPMSDKDEPLFTISDKSWTCFFRKFWNQLELDA-FCCLSTRFAAQQEIQLTLI [...]
+Seq48 MTD-RGT-NDDWYIVDISDLLDQGN-SLELFHLQEHLQNEQDLNTLKRKYLNSPSPRLESIKARKQLFDSGIEILKCSNTRSALLAKFKDTVGVSFTDLTRAYKNNKTCCSYWVWGVTSTSVDVVKVFQVQCNYMHVENKFLIVLAGFKAQKSRETVLNLVLNVQSNYIMAEPPKNRSMAAALYWYRRSMSPAVGEMPDWMAQQTLLNHFELSQMVQWAYDNGYTDESDIAYYYAILAEEDENAKAFLASNAQAKYVKDCARMVSHYKRAEMSSMSMSAWIYKRGDWKHIVKFLRFQEVEFISFMIAFKELLSGPKKNCLVIYGPPNTGKSMFCMSLLRVLKGKVISYVNSKSQFWLQPLASTKIALLDDATKPAWDYIDLFLRNALDGNPICDLKHKAPQQIKCPPLMITSNINVKADYLHSRITCFEFKQPFPFDENGQPAFSLTDINWKSFFERFWSQLDLEDELRLLNNRLDWLQEQLLTLY [...]
+Seq49 MADNKGT--NDWFLVDLSDLLDQGN-SLELFHKQESLESEQELNALKRKLLYSPSPRLETIRYRRQLFDSGLEILKASNIRAALLSRFKDTAGVSFTDLTRSYKSNKTCCGDWVWGVRENLIDSVKLLQTHCVYIQLENRFLFLLVRFKAQKSRETVIKLILPVDASYILSEPPKSRSVAAALFWYKRSMSSTVGTTLEWIAQQTLINHFELCKMVQWAYDNGHTEECKIAYYYAVLADEDENARAFLSSNSQAKYVKDCAQMVRHYLRAEMAQMSMSEWIFRKGNWKEIVRFLRFQEVEFISFMIAFKDLLCGPKKNCLLIFGPPNTGKSMFCTSLLKLLGGKVISYCNSKSQFWLQPLADAKIGLLDDATKPCWDYMDIYMRNALDGNTICDLKHRAPQQIKCPPLLITSNIDVKSDYLHSRISAFKFAHEFPFKDNGDPGFSLTDENWKSFFERFWQQLELEDELRLLSSRLDLLQEQLMNLY [...]
+Seq50 MAE--GTGSSGWFLVTIGDLIDQGN-SLELFHQQETAEVLAEIAQLKRKYCDSPSPRMQSVSVKKRLFNEAVSVFTQSSSRIAQLAIFKEAHTVSFAELTRPFKSDKTVCGDWVSGVHCALGDSLKSLRSHCMFFLYDSTSILLLLRFKSQKSRDTVTSLLLGVDHIQVMLDPPKTRSVPAALFWYKRAMVTAVGPFPEWITQQTQVNHFELSTMIQWAYDNHITEESKIAYQYALLADSDENAKAFLASNAQAKYVKDCAAMVRLYFRAEMQEMSISAWIHYRGDWKEIVRFLRFQGIEYIPFMISMKKFLKGPKKNCIVIYGPPNSGKSYFCMSLLRLMGGKVISFANSKSHFWLQPLADAKIGLLDDATKPCWDFIDTYLRNALDGNPISDCKHRAPTELKCPPLLVTTNVDVMGDYLHSRIVFLRFMNKMPLKSDGTPGYNLDDKNWKSFFTRFWETLELEEELRLLSQRLDSVQEQLLNLY [...]
+Seq51 MAE--GT-DCGGFLDTVSSLLDQGN-SLEPFQHHEATETLKSIEHLKRKYVDSPSPRLQAFAVKKRLFDEAASANTARVKH-LLL--FRQAHSVSFSELTRTFQSDKTMSWDWVADIHVSVLESLQSLRSHCVYVQYDASSLLLLLRFKAQKCRDGVKALLLGVQDLKVLLEPPKTRSVAVALFWYKRAMVSGVGPMPEWITQQTNVNHFQLSVMVQWAYDNHLQDESSIAYKYAMLAETDENARAFLASNSQAKYVRDCCNMVRLYLRAEMRQMTMSAWINYRGDWKVVVHFLRHQRVEFIPFMVKLKAFLRGPKKNCMVFYGPPNSGKSYFCMSLIRLLAGRVLSFANSRSHFWLQPLADAKLALVDDATSACWDFIDTYLRNALDGNPISDLKHKAPIEIKCPPLLITTNVDVKSDYLFSRICVFNFLQELPIR-NGTPVYELNDANWKSFFKRFWSTLELEDELRLLSQRLDSIQEELLSLY [...]
+Seq52 MAARKGTTEDGGWVLNVSDLVDQG-LSLQLFQQQELTECEEQLQQLKRKFVQSPSPQLASIKVKKQLFDSGIQLFKVRDKRAFLYSKFKSSFGISFTDLTRVYNSDKTCSSDWVYHVSDDRREAGKLLQDHCEYFFLH--CTLLLLCLFVPKCRNTLFKLCFHISNVQMLADPPKTRSPAVALYWYKKGFASGTGELPSWIAQQTLITHFDLSEMVQWAYDNDLKDESEIAYKYAALAETDENALAFLKSNNQPKHVKDCATMCRYYKKAEMKRLSMSQWIDERGDWKEVVKFLRHQGIEFILFLADFKRFLRGPKKNCLVFWGPPNTGKSMFCMSLLSFLHGVVISYVNSKSHFWLQPLTEGKMGLLDDATRPCWLYIDTYLRNALDGNTFSDCKHKAPLQLKCPPLLITTNVNVCGDYLRSRCSFFHFPQEFPLDDNGNPGFQLNDQSWASFFKRFWKHLDLED-LRLLSEALDLLQEELLSLY [...]
+Seq53 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQKESDEQLQQLKRKYIASPSPQLEAVKAKKQLFDSGIDLFRAKNSRVFALGKFKETYGLSFMDLVRVFQSDKTCSLDWVLYMNPERSEAAKLLQDHCAYIFFT--AALMLLCFKYQKSRETVMKLLFDCSAQQILAEPPKTRSTAAALYWYKKSLIAGAGAFPEWIAKQTLINHFDLSAMVQWAYDNDLYEECEIAYQYASLADTEENAAAFLKSNSQAKHVRDCATMCRYYKRAEMQRMTMSEWISRQGDWKDIVKFLRYQDLEFTSFLSAFTKFLKGPKKNCLVFWGPPNTGKSMFCMSLMQFLKGKVLSFVNSKSQFWLQPLADAKVALLDDATAPCWTYFDTFLRNALDGNPICDAKHKAPYQVKCPPLMVTTNVDVIGDYLRSRLSAFCFATEFPFKEDGSPGFCLSDQSWASFFTRFWSRLELEDELRLISKALDSIQEQLLTLY [...]
+Seq54 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDEQLQQLKRKYFHSPSPRLQAVKAKKQLLDSGIDLFRAKNRRLFSLGKFKETYGLSFMDLVRVFQSDKTCSLDWVLYLHEERSEAAKLLQDHCSYVFCN--TTLMLLSFKSQKSRETVLKLLFDCKGEQFLAEPPKTRSTAAALYWYKKSVVSGTGILPEWIARQTLINHFDLSAMVQWAYDNDVYEECEIAYRYACLGETEENAAAFLKSNNQAKHVRDCATMCRYYKRAEMQRMSISEWIHRQGDWKDVVRFLRYQGLEFMEFLGAFTKFLKGPKRNCLVFWGPPNTGKSMFCMSLLRFLRGKVISYVNSKSQFWLQPLADAKVALLDDATVPCWNYFDVYLRNALDGNPVCDAKHKAPYQIKCPPLMVTTNVDVLADYLRSRLSAFCFATEFPFKEDGSPGFLLNDQSWASFFTRFWLRLELEDELRVISKALESVQEQLLTLY [...]
+Seq55 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDEQLQQLKRKYLASPSPRLQSVKAKKKLWDSGIELLRSKNRRLFSLGKFKETYGLSFLDLVRVFQSDKTCSMDWVLYLNEERAEAAKLLQDHCSYVFLT--VSLMLLSFKSQKSRETVSKLLFDCRGEQFLAEPPKTRSTAAALYWYKKSTVSGAGMLPDWIAKQTLINHFDLSAMVQWAYDNDLYEECEIAYQYACCAETDENAAAFLKSNSQAKHVRDCATMCRYYKRAEMQRMSISEWIHRQGDWKEVVKFLRHQGLEFIEFLSAFTKFLKGPKKNCIVFWGPPNTGKSMFCMSLLNFLKGKVISYANSKSHFWLQPLADAKLALLDDATAPCWNYIDTFMRNALDGNPVCDAKHKAPFQIKCPPLLVTTNVDVLGDYLRSRLSAFCFAAEFPFNEDGSPGFHLNDQSWASFFERFWPRLELEDELRVISLALESVQEQLLTLY [...]
+Seq56 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDDQLQQLKRKYIASPSPRLQAVKAKKQLVDSGVDLFKSKNRRLFALGKFKENFGISYMDLVRVFQSHKTCSMDWVLYLHDERSEAAKLLQDHCAYIFFT--VTLMLLAFTSQKSRETVFKLLFDCKEEQFLAEPPKTRSTAAALYWYKKSLLAGAGIFPEWIAKQTLINHFDLSVMIQWAYDNNITEESEIAYQYAMFADTDENAAAFLNSNNQAKHVRDCATMCRYYRRAEMQRMTMSEWIHKQGDWKEVVKFLRYQGLEFVEFLSAFTKFLKGPKRNCLVFWGPPNTGKSLFCMTLLKFLRGKVISYVNSRSQFWLQPLADAKVALLDDATVPCWNYFDTFLRNALDGNPVCDCKHKAPFQIKCPPLLVTTNLNVKGDYLHSRLSAFCFANEFPFKEDGSPGFNLNDQSWASFFKRFWLRLELEDELRVVSKALECVQEQLLHLY [...]
+Seq57 MVD-KGT-EESDWVLSLADFVDEG-LSLELFRQQEADREEEHLLQLKRKYIRSPSPRLESIKAKKQLKDSGLGLFKSKNQRAVLFAKFKECFGLSFTDLTRNFKSDKTCTADWVIYISEARAEAGKLLQDHCEYVFVS--CALCLLSFKAQKNRETVLKLLFGVRDCQLMAEPPRTRSAAAALYWYKRGMSNCAGQLPEWIAKQTLLGHFDLSQMVQWAYDNDLVEESEIAYQYALLGEEDENAAAFLNSNNQTKHVKDCAVMCRYYKKAERESMSMSEWIHRSELWREIVKFLRHQTVSFVSFIAAFKRFLRGPKSNCIVIWGPPNTGKSLFCMTLLKFLKGRVISFVNSKSQFWLQPLADAKIGLLDDATRPCWDYFDAYMRNALDGNPICDCKHKAPSQIKCPPLLITTNLNVMGDYLRSRLSSFCFPTEFPFHDDGSPGFILNDESWASFFARFWTHLELEDELRLLRQALDSVQEELLNLY [...]
+Seq58 ?ETKIKTLG-CSYIV-TEDFVDAGE-HLSLLQTQMRASDAQQIASLKRKYVKSPSPKLEQCKARKQLFDSGISDERSRVLYMYRR--FNDMYGVKYTDLIRAFKSDKTMSANWVYVPLLEDGKAAATLQQQCSWYFME--IQLFNVEFNAQKCRATVIKLFFNFSSKRLMADPPKLRSAPACLFWYQKVLKKVGGELPDYIHTQCALGSFELTKMVQWALDNNLTEESSIALKYAMLAEEDENAQAFLKSNNQPKLVKDCCTMVRMFQTALMRDMSISQYVDHR--WRSIVHFLRYQGVQFLSFMIDLKNLLHHPKKCTIVVCGPPNTGKSYFVLSFVKFMNGCVISFVNYGSHFWLTPLRTARIGMIDDATNSFWKYCDTYMRTLLDGNDVSDCKHRNPIQLRCPPLVITTNEDIKNDYLQTRLRFLYFNKPFPLHDRGDPVYKIESLQWASFFRKFWRHLDLLEELRLLQHRLDYTQEKILTLY [...]
+Seq59 MAMKTKREARCSYIL-VEDLVDQGN-SLSLFHAQTVEEYEGEIQSLKRKFILSPSPRLAGVKARKSLFDSGIDLFQSRQRCTHMYSKFKAVYGVSFTDITRPFKSDKTTSQHWVYYLAFDSEISAMLLRQQCQFLYID--IILFFLEYNVQKSRTTVYNWFFHYNENRMLANPPRTRNMPAALFFYHRFMGTGGGAMPEIIVNQCVVSNFELSRMVQWALDNDLQDEHMLALEYALLAESDGNARAFLKQNNQPMIVKNCSIMVRHYKTALVAKMSISQYVNKRNSWRGIVHFLRYQGQEFLPFMCKMHNFLHHPKKSTLVLCGPSDTGKSYFANGLNKFLDGHVLSFVSNGSHFWLSPLRGARCCLIDDATLTFWRYADQNMRALLDGYEISDAKHRNPMQTRAPPLIITTNEDIMRLYLQTRTMYVYFNKPFPLKGNGQPLYYIDGYTWNSFFRKFWRHLNL-EDIRLLLERLDYIQEQILTLY [...]
+Seq60 MEDLEEGGCSGWF-DSIADMFDQGN-SLELFHTQEKEETRTQIQALKRKYI-PSSPRLRAIPSRRLFEDSG-NLLQSHNRVARLLAVFKEAYGVSYKELTREYKSDKTCNPDWVYSLSEPILNAARTLQGICEYVFMQATVALLTVRFKCSKSRETVRKQMFHSDPLLCLCDPPKVQSVPAALYWYKSSMYSGTGEAPEWIKRQTMITCFDLSEMVQWAYDNNYEDESQIAFEYARTATESPNANAWLASNAQAKHVRDCATMVRHYKRAEMKAMSMSQWVWKCGTWTPISLYLASEGVEVIRFLSAMKSWLRGPKKNCLVFYGPPNTGKSLFTMSLIKFLRGRVISFANSKSHFWMQPLAEAKVVLLDDATRATWDYVDTYMRNAMDGNPLSDCKYRTPVQVKCPPMLVTTNEDVHLNYLHSRIQVFHLKEPMPIDTAGNPEYSFSNRHWKAFFEKLQKPLDLEGDFSCIHSRLAAVQEELMCMY [...]
+Seq61 MASQRGTLGGIDFIDSVSNLFDPGN-PLQLLQQQEAAEDERLVALIKRKHLTTPSPKLDAM-AKKKLFDSGVGILRAANRRVCMLARFKEVYGVSVTDLTRQFKSDKTCCKNWVFGLCEPYYITLTVLPDHCCYSHMQGGIALMLCDFKAMKNRDTVIKLIVPVSDDLIMVQPPNVRSPAAALYWVQRAQSNASGEYPSWITKQTMLSHFDFSNMVQWAYDQGYTEESKIAYHYAQLAEEDKNAMAWLSSAAQAQHVKHCAQMVRYYMQAQTAEMTMAQYIHERGDWKHIIAFLRHQDIEIIPWLRTTRDWLKGPKRNCVCYHGPPDTGKSMFGMSLMRFMRGAIISYVNSRSHFWLQPLVSAKVAMLDDATDACWQYIDTNLRNLLDGNPLSDLKHRAPIQATCPPLLITSNIDITQDYLRSRVKCFAFHCPLPVGEDGMPTLVLTEASWKCFFRKFASTLEV-EEFRCLRDRLDVLQDQILGHY [...]
+Seq62 MTDKSG---E-YFLLNGVDFIDG-NTLAEYNRKEADRHKRDLEQLKRRHVR-RPVGGSPSSSSKRRCL-GL-ELLRSANRQATFLGKFKDTYGISFTELTRPFKSDKTCCEDWVYGISGPLYEGAKLLEGHVIYMQLTGLLLLMLLRLKHAKSRATLRRLLFNISEMQLLAEPPKTRSVPVALFWYKGTLSSLSGTCPEWIHRQTLINHFDLSSMIQWAYDYDYDDECTIAYQYARLAETDANANAWLNSPAQARYVKDTATMVKYYKRAQMREMTMGEWIKHRGDWKKIVQFIRFQGIEFPLFFGALKKFLHGPKHNCIVIWGPPDTGKSMFCMNLIKLLGGKIISFANSKSQFWLQPLADAKVGLLDDATGVCWDYIDQYLRNALDGNPISDLKHRAPTQMKCPPLLITTNLDITANYLVSRVACFKFSEPFPFTDRDTPTYPLTECNWKALLERLWKQLDFQEEFKCLLDRLNAVQSKILDLY [...]
+Seq63 MEGDIDSSRGGGFILDLINDSLQGN-SQALLHQQIMREDNRQVQDLKRKYVSPKSPRLRAISAKRRLFDSGLELMRASNQRATQLALFKKGYGISLTELTRVFKSNKTCNPDWVFGVHHNTYSDLVRLEKHCEYVQCSGYIVLMLLRFTAHKNRNTLIKLMLSVSDIQILADPPKIRSVPAALYWYRNSMSTAVGPLPDWVARQTLVQHFVLSTMVQWAYDNDHTEESDIAYHYALLADEDTNAAAWLGTNSQAKHVRDCAVMVKHYRRAIMSAMSMSEWI-NRGDWKNIGNFLRYQGIEVITFIGALRDMLKGPKRTCMCIVGPPDTGKSAFCLSLLDFFGGRVLSFTNYKSQFWLQPLADTRIALIDDATKSTWDYIDEYMRNALDGNAICDLKHKNPLQIRCPPLLITSNINIKHNYLYSRIHIVEFKHAFPFNEEGEPVYQLTKGNWKSFFKRLWLRLDLEDEFRCLKKRLDAIQDELLTIY [...]
+Seq64 MDAEEAGEGSSWFLQE--DLIDQGN-SLLLFQQQEAQADEQHLSVFKRKYC-SPSPRLGAIQVKRRLFDSGLDILRSANRKATMLGLFKDAFGVPYGELTRQFRSDKTGCFDWVYAVREPFFESGKQLRQHCRYTHVTGTVLLMLVSFNNQKCRDTVNKLIFNVHELLLMLEPPKIRSVAAAMYWYKQSLTNATGELPEWIKKLILINHFDFSQFVQWAYDNEYQEEHEIAYNYASIADEDSNAAAWLGLTGQAKVVKDVATMVRYYRRAEMNRMSMSNWIHNRGQWQPIVNFLKYQGVAMVTFINALKSFLKGPKKNCLVIWGPPNTGKSWFCMSLMHFLGGRVLSHVNSNSHFWLQPLGDAKVALLDDATTVVWDYFDRYMRNACDGNPISDMKHKAPVQIKCPPLLITSNIDVKADYLHSRLVTFHFPNLFPFEDDGSPVYQFNDENWNSLFTRLWRALDLEDEFRCLRQRLDAVQEKLMNLL [...]
+Seq65 MDEKPGS-GGTSFILSDEDLVDRGN-HLELFQTQEKEAGEKQISILKRKFCLSPPGLAGIRVVRRRLFTGGPDVVQDFNMAATIQKLFKTLYVATFGEITRIFQSNKTNNQQWVYGVPELLYTASFLLNNHCNYLLANGSLSLYLAVFNVGKSRDTVCKLVLNTTNQNLLLQPPKVRGLCSALFWYKLSLSPATGSTPDWIQQQTNVANFDFGTMVQWAYDHHLTEECKIAYQYAKCAGSDVNAKAFLASTNQARLVKDCCTMVRHYLRAEEQALSISAYIKKRGSWLSIMNLLKFHGIEPIHFVNALNPWLKGPKHNCIAIVGPPNSGKSLLCNSLITFLGGKVLTFANHSSHFWLAPLSDCRVALIDDATHACWRYFDTYLRNVLDGYPVCDRKHKSAVQMKAPPLLLTSNIDVHADYLQSRVKSFYFTEPCCASDNGEPLFVITDADWRNFFERLWERLDLEDELTCASEHLLAAQETQMTLI [...]
+Seq66 MAETAGS-GGGAYICSDEDLVDPGN-HLELFQTQEKEAGERQISLLKRKFCLSPPGLAGIRVVRRRLFDAGGRPASDGNMAAVMHKLFKTLYIAGFGEITRVFQSDKTNNNQWVHGASEVLYAASFILSKHCSYLQASGSMSLFLAVFNVGKSRETVRKLILNTPCSRLLLQPPKIRGLCPALFWFKLGLSPATGTTPDWIKQQTNVAYFDFGTMVQWAYDHRLTEECKIAYQYAKCAGTDLNAKAFLASTNQARLVKDCCTMVKHYLRAEEQSLTISAFIKRRGSWLSIMNLLKFQGIEPINFVNALKPWLKGPKHNCIAIVGPPNSGKSLLCNTLMSFLGGKVLTFANHSSHFWLAPLTDCRVALIDDATHACWRYFDTYLRNVLDGYPVCDRKHKSAVQLKAPPLLLTSNIDVHADYLQSRVKTFYFKEPCPASDTGEPLFFITDADWKNFFERLWERLDLEDEFTCASDHLLAAQETQMQCI [...]
+Seq67 MDKENAG-GGDSFILD-EDLLDPGN-HLELFQTQEKEAGERQISILKRKLCLSPSWACCHKVVRRRLFDPGGASSAEPNMAACIQKLFKTLYIASHGEITRVFQSNKTVNHQWVYGVSEVLYSASFLFGKQCNCLQTSGSISVYRCMFNVAKSRDTVQKLMLNVTAGNLLLQPPKIRGLGPALFWFKLTLSPATGTTPEWIQQATNVASFDLGTMVQWAYDHGFTEESKIAYEYALCAGSDCNAKAFLASTSQARLVKDCCTMVRHYLRAEVQALTMSGYIKRRGSWLSIMNLLKYHGIEHIQFVNALKPWLKGPKYNCITIVGPPNSGKSLLCNSLIAFLGGKVLTFANHHSHFWLAPLADCRVALIDDATTACWRYFDTHLRNVLDGYPFGDRKHNTAVQMKAPPLLVTSNIDVHAEYSHSRVKPFYFKEPCPASDNGEPMFSITDADWKHFFERLWGRLDLEDEVTCAKEQLLAAQETQMTLI [...]
+Seq68 MSDEPGSGKGSEFILDLEDFVDQGN-HRELFQTQEKEAGEKAIQKLKRKLALSPPGLAAISLVKRRLFIGPKAILKSKNSAACKLKLFKTIFACSFCDLTRVFQSNKTTNLQWVYGPSETMYEASFLLKKACSYVLAVGTIALILACFNNAKSRDTVQKLFLNVHHEQLLMQPPKIRGVCAALFWFRLTFSPATGTLPQWIRTQTIAAEFDFGTMVQWAYDNSYCEESKIAYEYAMLANCDTNAKAFLASNNQAKMVKDCATMVRHYKRAEVQAMTMSEYIKRRGSWLPIMNLFKFQGIEPIRFVNSMRQWLRGPKKNCICIVGPPNSGKSLLCNSLISFLGGRVLTFAMHKSHFWLAPLSEARVALIDDATYACWKYFDTYLRNALDGYPICDRKHKTAVQMKAPPLLVTSNIDVHADYLHSRIVSFYFKETCT-TANGEPMFSITNADWKIFFERLWGRLELEEEFACASERLRAAQEQQMLLI [...]
+Seq69 MSDEPGSGKGSEFILDLEDFVDQGN-HRELFQIQEKEAGDKAIQKLKRKLALSPPGLAAITLVKRRLFIGPKAILKSKNSAACKLKLFKTIFACSYSDLTRVFQSNKTTNLQWVYGPSETMFEASFLLKKACSYLLSVGTVALFLACFNNAKSRDTVRKLFLNVHPEQLLMQPPKIRGVCAALFWFRLTFSPATGTLPQWIRTQTIAAEFDFGTMVQWAYDNSYCEESKIAYEYAMLANCDSNAKAFLASNNQAKMVKDCATMVRHYKRAEVQAMSISEYIKRRGSWLPIMNLFKFQGIEPIRFVNSMRQWLRGPKKNCICIVGPPNSGKSLLCNSLISFLGGRVLTFAMHKSHFWLAPLSEARVALIDDATYACWKYFDTYLRNALDGYPICDRKHKTAVQMKAPPLLVTSNIDVHADYLHSRIVSFYFKETCT-TANGEPMFSITNADWKIFFERLWGRLELEEEFTCASDRLRAAQEQQMLLI [...]
+Seq70 MANDKGSALGCSYLLQDEDFLDQGN-HLEVFQALEKKAGEEQLLNLKRKV-LGSSEASETPGAKRRLFENEANLVKSKNATVFKLGLFKSLFLCSFHDLTRLFKNDKTTNQQWVFGIAEVFFEASLLLKKQCSFVQMQGTCAVYLLCFNTAKSRETVRNLMLNVREECLLMQPPKIRGLSAALFWFKSSLSPATGALPEWIRAQTTL-HFDFGTMVQWAYDHKYAEESKIAYEYALAAGSDSNARAFLATNSQAKHVKDCATMVRHYLRAETQALSMPAYIKTRGSWKSILTFFNYQNIELITFINALKLWLNGPKKNCLAFIGPPKTGKSMLCNSLIHFLGGSVLSFANHKSHFWLASLADARAALVDDATHACWRYFDTYLRNALDGYPVSDRKHKAAVQIKAPPLLVTSNIDVQAEYLHSRVQTFRFEQPCT-DESGEQPFTITDADWKSFFVRLWGRLDLEEDFTCACERLHVAQETQMQLI [...]
+Seq71 MANDKGSGLGCSYLLQDEDFVDQGN-HLEVFQALEKKAGEEQILNLKRKV-LGSSEASETPGAKRRLFE?EANLVKSKNATVFKLGLFKSLFLCSFHDITRLFKNDKTTNQQWVFGLAEVFFEASFLLKKQCSFLQMQGTCAVYLICFNTAKSRETVRNLMLNVREECLMLQPAKIRGLSAALFWFKSSLSPATGALPEWIRAQTTL-NFDFGTMVQWAYDHKYAEESKIAYEYALAAGSDSNARAFLATNSQAKHVKDCATMVRHYLRAETQALSMPAYIKARGSWKSILTFFNYQNIELITFINALKLWLKGPKKNCLAFIGPPNTGKSMLCNSLIHFLGGSVLSFANHKSHFWLASLADTRAALVDDATHACWRYFDTYLRNALDGYPVSDRKHKAAVQIKAPPLLVTSNIDVQAEYLHSRVQTFRFEQPCT-DESGEQPFNITDADWKSFFVRLWGRLDLEEDFTCACERLHVAQETQMQLI [...]
+Seq72 MADKSGRGGCSFVLDFDAEFIDQGN-TLALFQSQVAQAGKQKVNYLKRKLHLESRAVLQPVAAKRRLFCSSSEILKSKNSAACKLAVFKFVYAASFCDLTRPFKNDKTTNYQWVFGVSEELFEASKLLGRSCTYLHATGSVALLLLSFHVAKSRETVTNLLLNLRAEHMMLQPPKLRGVTSAMFWYKMTLSPNTGQLPRWIEQQILITEFDFSHMVQWALDNEMMDESSIAFHYAQMADHDSNARAWLGLSNQAKIVKDVCTMVHHYQRAIMRSMTMSAYVHKMGSWLVIMQFLKFHGIEPIRFVNALRPWLQGPKKNCLAFIGPPDTGKSLFTNSLMSFLKGKVLNFANSASHFWLAPLTEAKVALIDDATHACLKYCDTYLRNFFDGYSVCDRKHKNAVQIKAPPMLLTSNIDIQAEYLKSRVTCFYFNDKCPLNEDGKPLFQITDPDWKSFFERLWQRLELEEEFICAAERLSAAQETQMTLL [...]
+Seq73 MDNTPGTGSSDWVLLDLVDFIDDSDFYRRLQVEQQREDDQRAAHVLKRKFLDSPSPRLEAIRARRKLYDSGHGLMQAGKPRNVLLALCKDAYGCSFSDLTRSYKSDKTVCGDWVAGVPCSLEEAITLLKPHSDYTHVNGLLLLLLVRWKTAKCRETVQKLLMSVEKHQMVLEPPKIRHPATAMFWYKRTLANASGETPEWILKQVSLQEFSLSAMVQWAYDNGLEGESEIAYGYAQLAEEDTNAEAFLRSNAQAKHVKDCAIMVRHYRRAEMCKMNIAQWIKLRGDWRPIMKFLKFQKVEILAFLTFMRHFLRGPKRNCMVLLGPPNTGKSLFGMSLMHFLGGKIISHVNSGSHFWLQPLLECKVAMLDDATTSTWDYMDIYLRNMLDGNTVCDAKHKAPMQLKCPPLIVTTNVDVTANYLHSRLKVFTFPNLCPLNCRGDPEFQLTPENWKAFLEKCWTSLGLDLLLRCLCSRLDVLQEQQMELI [...]
+Seq74 MASQDSTGSGG-FILEYMDFIDDDGGRHLNALLAEDDARAVQAVMSKIGH-SREYSSGGKESSHRKRTRSPDSLIRSGKARAAMLGIFKDSFGVRFTDITRHFKSDKTVCRDWVVGVACSVSDAVPLVRPHTVYSHTTGNMALGLVRWKTAKCRDTCCKLLLTVENKQLLLEPPQTQNAGAALFWYKKSISRGSGETLEWIARQVSLSSFCLSRMVQWAYDCGYTNESTIAYEYAKLADDDSNAEAFLKCNNQARYVRDCCKMVTLYARAEMAKMSMNEWIGRRKEWRVIVQFLKVQKVEFIPFLMQFRKFLKGPKNNCLCFYGPSNTGKSMFCMSLLEFLKGRTISYVNSKSQFWLQPLGDSKIGLLDDATLPVWDYMDVYLRNLLDGNVFCDAKHKAPSQIKAPPLLVTSNYNIKEYYLVNRVHVITFPTVCQTDYKGDVSVKLESHHWKSFFRRWWPLLDSNDGFRCLTNHLDVLQETQMEIF [...]
+Seq75 MAHAEGTGAGGWFVVDLVDFI-Q-EVPLELFVQQTANDDAAAVQALKRKFVGSPSPRLDAIKARRRLFDSGYGLFKGSNVRAAILSKFKDLFGLSFYDLVRQFKSDKSICGDWVFGVYYAVAEAVKLIQPQCIYAHIQGMVVLLLVRFKCGKSRETVAKYMLNVPEKHMLIEPPKIRSGPCALYWYRTAMGNACGETPEWIVRQTVVGHFSLSVLVQWAYDNDIQDESDLAYEYAKLGNEDANAAAFLASNCQAKYIKDAMTMCRLYRRAEQARMSMAQWIVHRGDWKHIVKFLRYQRVEFITFISAFKLFLKGPKKSCLVFYGPSDTGKSLFCMSLLQYLGGAVISFVNSSSHFWLSPLADAKIGLLDDATGQCWTYIDVYLRSILDGNPISDRKHRTLTQLKCPPLMITTNVDPLADYLRSRITVFKFMNKCPVTASGEPVYTLNNETWKSFFQRSWARLELEEEFRCLADRLDACQEMLIDLY [...]
+Seq76 MADTEGTGAGGWFMVDLVDFI-Q-EVPLELFVQQTAEDDAAIVQAVKRKFVCSPSPRLDAIKARRRLFDSGYGLFKCSNVRAAILSKFKDLFGLSFYDLVRQFKSDKSICGDWVFGVYYAVAEAVKLLQPQCLYAHIQGMVVLMLLRFKCGKSRETVAKYMLNVPEKHMLIEPPKIRSGPCALYWYRTAMGNASGETPEWIVRQTVVGHFSLSMLVQWAYDNDIQEESDLAYGYAQLGNTDPNAAAFLASNCQAKYIKDAMTMCRLYRRAEQSRMSMAQWIAHRGDWKHIVKFLKYQNVEFISFISAFKLFLKGPKKSCLVFYGPSDTGKSLFCMSLLQYLGGAVISYVNSSSHFWLSPLADAKIGLLDDATAQCWTYIDVYLRSILDGNPTSDRKHRTLTQLKCPPLMITTNVDPLADYQRSRITVFKFLNKCPVTNSGELVYTLNNETWKSFFQRSWARLELEEEFRCLADRLDACQETLIDLY [...]
+Seq77 MADHEGTRAGGWFLVDLVDFIVQ-EVPLALYVHQNAQDDAAAVQALKRKFTYSPSPRLDAIKARRRLFDSGYGLLKASNIRATILSKFKELFGLSYYDLVRQFKSDKSTCGDWVFGVYHAVAEAVKLLQPHCVYAHIQGMVVLALVRFKCGKNRESVAHCMLNIPDRHMLIEPPKIRSGPCALYWYRTAMGNASGETPEWIVRQTVIGEFSLSTLVQWAYDNDITDESQLAYEYALLGNEDPNAAAFLASNCQAKYIKDAITMCKHYKRAEQARMTMAQWIKYRGDWRHIVKFLRYQNVEFITFMSAFKHFLKGPKKSCMVFYGPSDTGKSLFCMSLLHYLGGAVISFVNSSSHFWLSPLVDAKVGLLDDATMQCWTYIDVYLRSILDGNAISDRKHRNLTQLKCPPLMITTNVDPLADYLKSRIVVFRFLNKCPMNANGEPVYTLNNETWKSFFQRSWARLDLEEEFRCLADRLDACQERLIDLY [...]
+Seq78 MADAEGTGAGGWFMVDLVDFIDQ-EVPLELFVQQNARDDAAAVQALKRKYTYSPSPRLNAIRARRRLFDSGYGLLKTSNLRATLLSKFKELYGLAFGELVRQFKSDKSVCGDWVFGVYHAVAEAIKLIQPVCLYAHIQGMVILMLIRFKCSKSRETVAKCILNVPDKQMLIEPPKIRSAPCALYWFRTAMGNASGETPEWITRQTVVGHFSLSVLVQWAYDNDIVDESDLAYQYALLGNDDPNAAAFLASNCQAKYIKDAITMCKYYKRAEQKRMSMAQWIAHRGDWRPIVRFLRYQKIEFVTFMSALKMFLRNPKKSCIVIYGPSDTGKSLFCMSLLKFLGGAVISYVNSTSHFWLSPLTDAKVGLLDDATYPCWVYIDTHLRSVLDGNQISDRKHKNLTQIQCPPLFITTNINPLEDYLHSRIAVFHFMYKCPLDDKGDPVYQFNNENWKSFFQRSWAQIEGEEEFRCLADRLDACQEKLIDFY [...]
+Seq79 MADVEGTRAGGWFMVDLVDFI-Q-EVPLELFVQQNAQDDAAAVHALKRKYIHSPSPRLDAIRARRRLFDSGYGLLKARNLRATLLSKFKELYGLAFGELVRQFKSDKSTCTDWVFGVYYAVAEALKLIQPLCHYAHIQGMVQLMLIRFKCGKSRDTVAHCILNVSEKQMLIEPPKIKSTPCALYWYKTAMGNASGETPEWIVRQTVVGHFSLSVMVQWAYDHDITDESQLAYEYALLGHEDPNAAAFLASNCQARYIKDAITMCRHYKRAEQARMSMAQWIAHRGDWKPIVRFLRFQKIEFMTFMGAFKMWLKGPKRSCIVIHGPSDTGKSLFCMSLVQFLGGAVISYVNASSHFWLSPLADAKVGLLDDATHPCWVYIDTHLRSVVDGNLISDRKHRNLAQLKCPPLLITTNINPLEDYLHSRMAVFSFMYKCPLDDNGDPVYKFNNENWKSFFQRSWARLEVEEEFRCLADRLDACQETLIDLY [...]
+Seq80 MADSEGTRAGGWFLVDLVDFI-Q-EVPLALFVQQNAQDDAATVQALKRKYTCSPSPRLDAIRARRRLFDSGYGVFKVSNLKAKLLYKFKDLFGLAFGELVRNFKSDKSICGDWVFGVYHAVAEAVKLIQPICVYAHIQGMVILMLVRYKCGKSRETVAHSMLNIPERQMLIEPPKIRSAPCALYWYRTAMGNASGETPEWIVRQTVVGHFSLSMLVQWAYDNDITDESVLAYEYALLGNEDPNAAAFLASNCQAKYIKDAITMCKHYRRAEQAKMTMAQWITHRGDWKAIVKYLRYQQVEFVPFISALKLFLKGPKKSCMVFYGPSDTGKSLFCMSLLNFLGGAVISYVNSSSHFWLSPLADTKVGLLDDATYQCWQYIDTYLRTVLDGNAISDRKHRNLTQLKCPPLMITTNINPLEDYLHSRIVVFQFLHKCPLNSNGDPVYTLNNENWKSFFRRSWARIEGEEEFRCLADRLDACQEKLLDLY [...]
+Seq81 MANCEGTRAGGWFLVDLVDFI-Q-EVPLDLFVQQNARDDAATVQALKRKYTCSPSPRLDAIRARRRLFDSGYGIFKVSNLRVTLLHKFKELFGLAYGDLVRQFKSDKSICGDWVFGVYHAVAEAVKLIQPICLYAHIQGMVILMLVRYKCGKSRETVAHSMLNIPEKQMLIEPPKIRSGPCALYWYRTAMGNGSGETPEWIVRQTVVGHFSLSTLVQWAYDNDITDESELAYDYAMLGNEDPNAAAFLASNCQAKYIKDAITMCKHYKRAEQARMSMTQWIAHRGD*E?IVKYLRYQRVEFVTFMGALKLFLKGPKKSCMVFYGPSDTGKSLFCMSLLKYLGGAVISYVNSGSHFWLSPLVDAKVGLLDDATYQCWQYIDTYLRTVLDGNAISDRKHRNLTQLKCPPLMITTNINPLEDYLHSRIVLFKFMHKCPLKSNGDPVYTLNNENWKSFFQRSWARIEGQEEFRCLADRLDACQEKLLDLY [...]
+Seq82 MAESPEGGAGGWFVVDMVDFVDQ-EVPLGLYVQQTMQDDAATVQALKRKFMGSPSPRLDAIKARRRLFDSGYGLLQVSNLRVQLFTKFKELFGLSFKDLVRQFKSDRSTCAEWVFGVYYAVAEAAKLLQPVCEYAHIQGMVMLLLLRFKCNKSRETVAKCLLNIPEKRMLIEPPKQRSAPCALYWYKTAMGNASGDTPDWIVRQTVIGHFSLSVLVQWAYDNEITDDSELAYEYAKLGNEDPNAAAFLASNCQARYIKDAITMVRHYRRAEQARMSMSQWIAHRGDWRHIVKLLRFQGIEFISFMEALKQFLKGPKKSCLVFYGPSDTGKSLFCMSLLRYLGGAVISFVNSTSHFWLSPLVDAKIGLLDDATQQCWVYIDTYLRTVLDGNTMSDRKHKNLQQLKCPPLMITTNVNIAADYLRSRMVVFPFLQKCPLDSNGEPVYKLNNENWKSFFQRSWARLDLEEEFRCLADRLDACQDKLLDLY [...]
+Seq83 MAESPEGGQG?WFVVDMVDFIDQ-EVPLDLYVQQTMQDDAATVHALKRKYIGSPSPRLDAIKARRRLFDSGYGLLQVSNLRVKLLAKFKELFGLSFMDLVRQFKSNKSTCGDWVFGVYYAVAEAAKLLQPVCDYAHIQGMVMLLLLRFKCNKSRETVAHCILNIPEKRMLIEPPKQRSGPCALYWYKTAMSNASGETPDWIVRQTVIGHFSLSVLVQWAYDNEITEESELAYEYAQLGNEDANAAAFLASNCQARYIKDAITMVRHYRRAEQARMTMSQWIAYRGDWRHIVKLLRYQGIEFISFMEALKHFLKGPKKSCLVFYGPSDTGKSMFCMSLLKYLGGAVISFVNSTSHFWLSPLVDTKIGLLDDATQQCWVYMDTYLRTVLDGNTMSDRKHKTLQQLKCPPLMITTNVNIEADYLRSRMVVFPFLHKCPLDSNGDPVYKLNNENWKSFFQRSWARLDLEEEFRCLAARLDACQDKLIDLY [...]
+Seq84 MAESPEGGAGGWFVVDLVDFIDQ-EVPLDLYVQQTIQDDAATVQALKRKFMGSPSPRLNAIKARRRLFDSGYGLLQVSNLRVKLLGKFKELFGLSFMDLVRQFKSNKSTCGDWVFGVYHAVAEAAKLLQPVCEYAHIQGMVMLLLLRFKCNKSRETVAHCILNVPEKRMLIEPPKQRSGPCALYWYRTAMGNACGETPDWIVRQTVIGHFSLSKLVQWAYDNDITDESELAYEYAQLGTEEPNAAAFLASNCQARYIKDAMIMCRHYRRAEQTRMSMSQWITYRGDWRHIVKLLRYQGIEFISFMTALKQFLKGPKKGCLVFYGPSDTGKSLFCMSLINYLGGTVISFVNSTSHFWLSPLADAKIGLLDDATYQCWIYMDTYLRSVLDGNVISDRKHKNLVQLKCPPLLITTNINPETDYLRSRMVIFPFLNKCPLDANGDPVYQLNNENWKSFFRRSWARLDLEEEFRCLADRLDACQEKLIDLY [...]
+Seq85 MADGEGTSACGWFWVDMVDFIDQ-EVALELYRQQEAQDDEAFVQALKRKYLASPSPRLDAIKAKRRLFDSGYGLLKTSNLRATLLGKFKDIYGLSFMELARQFKSNRTTCLDWVFGVYCTVAEGVKLIQQHCQYAHIQGMVVLMLVRYNCAKNRDTVAKCMLNIPEQHMLIEPPKIRHPAAALYWYKAGMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYAKLADVDGNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMTMSEWIRFRGDWRPIVRFLRHHDIEFITFVISLKNFLKGPKKCCIVIYGPADTGKSYFCMSLLRFLGGVVISYANSTSHFWLQPLCDAKIGLIDDVTPQCWSYIDTYLRNALDGNQVCDRKHRPLLQLKCPPLLMTTNTNPLEEFLRSRLQLFTFPNAFPVNQKGDPLYTLNDANWKCFFQRLWARLDLDDQFKCLASRLNACREKLLELY [...]
+Seq86 MADCEGTGACGWFWVDMVDFIDQ-EGAPELYRQQEVQDDEAIVQALKRKYIASPSPRLDAIRAKRRLFDSGYGLLKTSNLRATLLGKFKDLFGLSYMELVRQFKSNKTTCIDWVFGVYCTVAEGVKLIQQHCQYAHIQGMVVLMLVRYNCAKNRDTVAKCMLNIPEHHMLIEPPKIRYPPAALYWYKAGMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYARLADVDGNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMTMSQWITFRGDWRPIVRYLRHQDIEFISFVIALKNLLKGPKKCCIVIYGPADTGKSYFCMSLLRFLGGVVISYANSSSHFWLQPLCDAKIGLIDDVTPQCWSYIDTYLRNALDGNQVCDRKHRPLLQLKCPPLLMTTNTNPLEEFLRSRLQMFTFKNAFPVNQKGDPLYILNDANWKCFFQRLWARLDLQDDFRCLASRLDVCQEKLLELY [...]
+Seq87 MADCEGTGAGGWFFVDMVDFINQ-EVAAEVYRQQEALDDEAIVQPLKRKFLASPSPRLDAIKAKRRLFDSGYGLLNSSNRRATLLGKFKDLYGLSYMELVRQFKSNKTTCLDWVFGVYCTVAEGVKLIQQHCEYAHIQGVVILMLLRYKCAKNRDTVAKGLLNIPETNMLIEPPKIRSTPAALYWFRASMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYARLADVDSNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMSMSQWISFRGDWRTIVKYLRHQDIEFITFIIALKNFLKGPKKSCLVFYGPADTGKSYFCMSLLRFLGGVVISYANSSSHFWLQPLADAKLGLIDDVTPNCWSYIDVYLRNALDGNQICDRKHRPLLQLKCPPLLITTNTNPLEEFLRSRLQLFTFKNAFPLNSKGDPMYPLNDANWKCFFQRLWARLDLDEQFRCLASRLDACQETLLELY [...]
+Seq88 MEDSEGTRAGGWFHVEDLDFIDQ-EVPLQLYAQQIAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLKSSNLKATLLSKFKELYGVGYYELVRQFKSSRTACADWVFRVYYAVAEGIKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDTVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTSMGNGSGETPEWIVRQTLIGHFKLSVMVQYAYDHDITDESALAFEYAQLADVDANAAAFLNSNCQAKYLKDAVTMCRHYKRAEREQMSMSQWITFRGDWKPIVKFLRHQGVEFVSFLAAFKSFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLADSKIGLLDDATAQCWTYIDTYLRNLLDGNPFSDRKHKTLLQIKCPPLMITTNINPLEEYLRSRVTLFKFTNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEDQFRCLASRLDACQETLLELY [...]
+Seq89 MEDSEGTRAGGWFHVEDVDFIDQ-EIPLQLYTQQIAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLRSSNLKATLLSKFKELFGVGYYELVRQFKSSKTACADWVFGVYYAVAEGIKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDTVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTAMGNGSGETPEWIVRQTLVGHFRLSVMVQFAYDHDIVEESVLAFEYAQLADVDANAAAFLNSNCQAKYVKDAVTMCRHYKRAERAQMSMSQWITFRGDWKPIVKFLRHQGVEFVSFLAAFKLFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLSDSKIGLLDDATPQCWSYIDTYLRNLLDGNPVSDRKHKTLLQLKCPPLMITTNINPLEEYLRSRLTLFTFNNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEEQFRCLANRLDACQETLLELY [...]
+Seq90 MEDSEGTRAGGWFHVEDLDFIDQ-EIPLQLYAQQTAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLRSSNLKATLLSKFKDLFGVGFYELVRQFKSSKTACADWVYGVYYAVAEGLKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDSVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTAMGNGSGETPEWIVRQTLVGHFRLSVMVQYAYDHDIVEESVLAFEYAQLADVDANAAAFLNSNCQAKYVKDAVTMCRHYKRAEREQMSMSQWITFRGDWKPIVRFLRHQGVEFVSFLAAFKLFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLSDSKIGLLDDATPQCWSYIDIYLRNLLDGHPVSDRKHKTLLQLKCPPLMITTNTNPLEEYLRSRLTVFTFKNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEEQFRCLANRLDACQETLLELY [...]
+Seq91 MADNSGTRAGGWFMVDMVDFIDQ-EVAQELLLQQAAADDDEAVHTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNQKATLLGKFKEQFGLGYNELVRHFKSSRTACVDWVFGVYCTVAEGIKLIQPLCEYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSSPAALYWYKTSMSNISGETPEWIVRQTMVGHFSLSEMVQWAYDHDITDEGTLAYEYALIADVDSNAAAFLASNCQAKYVKDACTMCRHYKRGEQARMSMSEWIRFRGDWKPIVHFLRYQNVEFIPFLCAFKLFLQGPKKSCLVFYGPADTGKSYFCMSLLKFMGGVVISYANSHSHFWLQPLSEAKMGLLDDATSQCWSYVDTYLRNALDGNVMCDRKHRSLLQLKCPPLLITTNVNPLEDYLRSRLQVFTFSNPCPLTSKGEPVYTLNDQNWKSFFQRLWARLSLDDEFRCLANRLDACQDKILELY [...]
+Seq92 MADNSGTRAGGWFIVDMVDFIDQ-EVAQELLLQQAAADDDVAVQAVKRKFTHSPSPRLDAIKAKRRLFDSGYGILKASNQRATLLGKFKEQFGLGYNELVRHFKSNRTACADWVFGVYCTVAEGIKLIQPLCDYAHIQGMTVLMLLRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSGPAALYWYKTGMSNISGDTPDWIVRQTIVGHFRLSDMVQWAYDHDITDEGTLAYEYALIAEFDANAAAFLASNCQAKYVRDACTMCRHYKRGEQARMTMSEWIKFRGNWKPIVQYLRYQDVEFVPFLCALKSFLQGPKKSCLVFYGPADTGKSYFCMSLLRFMGGAVISYANSTSHFWLQPLSEAKMGLLDDATSQCWNYIDTYLRNALDGNVICDRKHRSLLQLKCPPLLITTNVNPLEDYLRSRLQVFTFKNKFPVTSSGDPLYTLNDQNWKSFFQRLWARLRLDDEFRCLASRLDACQDKMLELY [...]
+Seq93 MDDTSGTRAGGWFMVDLVDFIDQ-EVAQELLLQQAAADDDVEVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNHKATLLGKFKEQFGLGFNELIRHFKSNKTVCSDWVFGVYCTLAESFKLIQPQCEYAHIQGMTVLTLVRFKRAKNRETVAKGFLNVPENHMLIEPPKLRSAPAALYWFKTSLSNCSGETPEWIVRQTVVGHFSLSEMVQYAYDHDITDESTLAYEYALQADTDANAAAFLASNCQAKYVKDACTMCRHYKRGEQARMNMSEWIKFRGDWKPIVQYLRYQDVEFIPFLCALKSFLQGPKKSCIVFYGPADTGKSYFCMSLLKFLGGVVISYANSSSHFWLQPLAEAKIGLLDDATSQCWCYIDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNINPLGDYLRSRLQVFTFNNKFPLTTQGEPLYTLNDQNWKSFFQRLWARLNLEDEFRCLANRLDVCQDKILELY [...]
+Seq94 MDDTSGTRAGGWFMVDFVDFIDQ-EVAQELLLQQAAADDDVAVQAVKRKFAPSPSPRLDAITAKRRLFDSGYGILRASNQKATLLGKFKEQFGLGFNELIRHFKSSKTVCLDWVFGVYCTLAEGIKLIQPQCDYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSGPAALYWYKTAMSNCSGETPEWIVRQTMVGHFSLSEMVQYAYDHDITDESMLAFEYALLADTDANAAAFLSSNCQAKYVKDACTMCRHYKRGEQARMNMSEWIWFRGDWKPIVQFLRYHDVEFIPFLCAFKTFLQGPKKSCLVFYGPADTGKSYFCMSLLRFLGGVVISYANSNSHFWLQPLADAKIGLLDDATSQCWCYIDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNINPLEDYLRSRVQLFTFKNKFPLTTQGEPLYTLNDQNWKCFFRRLWARLSLDDEFRCLANRLDVCQDKMLELY [...]
+Seq95 MDDNTGTRAGGWFIVDFVDFIDQ-EVAQELFQQQTAADDDVAVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILRASNKKATLLGKFKEQFGLGYNELIRHFKSDRTSCADWVFGVFCTVAEGIKLIQPLCDYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESQMLIEPPKLRSGPAALYWYKTSMSSCSGETPEWIVRQTMVGHFSLSEMVQWAYDHDITDESTLAYEYALIADTDSNAAAFLSSNCQAKYLKDACTMCRHYKRGEQARMSMSEWIWFRGDWKPIVQFLRYQDVEFIPFLCAFKTFLQGPKKSCLVFYGPADTGKSYFCMSLLRFLGGAVISYANSSSHFWLQPLSEAKIGLLDDATSQCWNYIDTYLRNALDGNQICDRKHRALLQLKCPPLLITTNINPLTDFLRSRLQLFTFKNPFPVTTQGEPMYTLNDQNWKCFFRRLWARLSLEDEFRCLANRLDACQDKMLELY [...]
+Seq96 MDDNTGTRAGGWFIVDCVDFIDQ-EVARELFLQQAAADDDIAVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNHKATLLGKFKEQFGLGYNELIRHFKSDRTACVDWVFGVYCTVAEGIKLIQPLCDYAHIQGMTVFMLVRYKRAKNRETVAKGLLNVPESQMLIEPPKLRSGPAALYWYKTSMSSCSGETPEWIVRQTMVGHFTLSEMIQWAYDHDITDESTLAYEYALIADTDANAAAFLASNCQAKYLKDACTMCRHYKRGEQARMSMSEWIRFRGDWKPIVQFLRYQDVEFIPFLCAFKTFLQGPKKSCIVFYGPADTGKSYFCMSLLRFLGGAVISYANSSSHFWLQPLSEAKIGLLDDATTQCWNYVDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNVNPLADYLRSRLQLFTFKNPFPVTAQGEPLYTLNDQNWKCFFRRLWARLSLEDEFRCLANRLDACQDKMLELY [...]
+Seq97 MADPEGTGCNGWFYVDMVDFIDELETAQALFHAQEVHNDAQVLHVLKRKFAGGSSPRLQEIKAKRRLFDSGYGLLKVNNKQGAMLAVFKDTYGLSFTDLVRNFKSDKTTCTDWVFGVNPTIAEGFKLIQPFILYAHIQGVLILALLRYKCGKSRLTVAKGLLHVPETCMLIQPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSEMVQWAFDNELTDESDMAFEYALLADSNSNAAAFLKSNCQAKYLKDCATMCKHYRRAQKRQMNMSQWIRFRGDWRPIVQFLRYQQIEFITFLGALKSFLKGPKKNCLVFCGPANTGKSYFGMSFIHFIQGAVISFVNSTSHFWLEPLTDTKVAMLDDATTTCWTYFDTYMRNALDGNPISDRKHKPLIQLKCPPILLTTNIHPAKDYLESRITVFEFPNAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLEEDFKLLSERLSCVQDKIIDHY [...]
+Seq98 MADPEGTGCNGWFFVDMVDFIDEQETAQALFHAQEVQNDAQVLHLLKRKFAGGSSPRLQEIKAKRRLFDSGYGLLQASNKKAAMLAVFKDIYGLSFTDLVRNFKSDKTTCTDWVFGVNPTVAEGFKLIKPATLYAHIQGVLILALLRYKCGKNRLTVAKGLLHVPETCMLIEPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSDMVQWAFDNDLTDESDMAFQYAQLADCNSNAAAFLKSNCQAKYLKDCAVMCRHYKRAQKRQMNMSQWIKYRGDWRPIVQFLRYQGVEFISFLRALKEFLKGPKKNCILLYGPANTGKSYFGMSFIHFLQGAIISFVNSNSHFWLEPLADTKVAMLDDATHTCWTYFDNYMRNALDGNPISDRKHKPLLQLKCPPILLTSNIDPAKDYLESRVTVFTFPHAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLDEDFKCLSERLSALQDKILDHY [...]
+Seq99 MEDSQGTGCNGWFYVDMVDFIDELETAQALFHAQEVDNDAQVLHVLKRKYGTESSPRLQEIKAKRRLFDSGYGLLKANNKKAAMLAVFKETYGLSFADLVRTFKSDKTTCTDWVFGVNPTIAEGFKLIQPCTLYAHIQGVLILALLRYKCGKNRLTVAKGLLHVPETCMLIEPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSEMIQWAFDNDFTDESDIAYEYAQLADCNSNAAAFLKSNCQAKYLRDCAVMCRHYKRAQKRQMNMSQWIKYRGDWRPIVQFLRFQGIEFITFLGALKAFLKGPKKNCIVIHGPANTGKSYFGMSFIHFIQGAIISFVNSNSHFWLEPLADAKVAMLDDATNTCWTYFDNYMRNALDGNPISDRKHKPLLQLKCPPILLTSNINPAIDYLESRVTVFTFPNAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLDDDFKCLSERLSVLQDKILDHY [...]
+Seq100 MADSEGTGCNGWFFVDLVDFIDERETAQALFNVQEAQRDAREMHVLKRKFGCS-SP-LQEIKVKRRLIDSGYGLLHSKNKKAAMYAKFKELYGLSFQDLVRTFKSDRTTCSDWVFGVNPTVAEGFKLIQPYVLYAHIQGVVILALLRYKCGKNRITVAKGLLHVPDTCMLIEPPKLRSGVAALYWYRTGMSNISGETPEWIQRLTIIQHFDLSEMIQWAFDNDLTDESDIAYEYALIADSNSNAAAFLKSNCQAKYLKDCAVMCRHYKRAQKRQMSMSQWIKWRGDWKPIVQFLRYQGVEFITFLCALKDFLKGPKRNCIVLCGPANTGKSYFGMSLLHFLQGTVISHVNSNSHFWLEPLTDRKLAMLDDATDSCWTYFDTYMRNALDGNPISDRKHRHLVQIKCPPMLITSNTNPVTDYLNSRLMVFKFPNKLPFDKNRNPVYTINDRNWKCFFERTWCRLDLEEDFKCLSQRLSVLQDQILEH [...]
+Seq101 MANCEGTGCNGWFFVDMVDFIDERETAQVLLNMQEAQRDAQRVRALKRKYTDS-SP-LQELQARQPAYDSGYGLLQCNNKKAAMLTEFKKVYGLSFNDLVRTFKSDKTTCTDWVFGVNPTIAEGFKLIKQYALYTHIQGILILMLIRYKCGKNRITVGKGLLHVPDSCMLLQPPKLRSPVAALYWYRTGISNISGDTPEWIKRLTIIQHFDLSDMVQWAFDNELTDDSDIAFQYAMLADCNSNAAAFLKSNCQAKYVKDCATMCRHYKRAQKRQMTMPQWIKFRGDWRPIVQFLRYQGLEFITFLCALKDFLKGPKRNCIVIHGPPNTGKSYFCMSLIHFLQGTIISYVNSASHFWLEPLADAKIAMLDDATGTCWSYFDNYMRNALDGNPISDRKHRHLIQIKCPPMLITSNTNPVEDYLHSRLTVFKFPNAFPFDQNRNPVYTINDKNWKCFFEKTWCRLDLEDEFKCLSQRLNVLQEKILEH [...]
+Seq102 MANCEGTGCNGWFLVDLADFIDERETAQVLYNMQEAQRDAQSVRALKRKYGGSNRVTLQELQARTNVYDSGYGVLQANNQKAILLSQFKHTYGLAFNDLVRTFKSDKTICTDWVCGVNPTIAEGFKLIQPYALYTHIQGVYILLLIRYKCGKNRITVGKGLLHVPESCMLIEPPKLRSPVAALYWYRTGMSNISGTTPEWIQRLTVIQHFDLSDMVQWAFDNDVTEDSDIAYGYALLADSNSNAAAFLKSNCQAKYVRDCATMCRHYKRAQKKQMTMAQWIRFRGDWRPIVQFLRYQGVEFITFLCAFKEFLKGPKKNCIVIQGPPNTGKSYFCMSLMHFLQGTVISYVNSTSHFWLEPLADAKVAMLDDATGTCWSYFDTYMRNALDGNPISDRKHRHLIQIKCPPILITSNTNPVEEYLTSRLTVFTFPNAFPFDQNRNPVYTINNKNWKSFFQKTWCKLDLEDEFKCLSQRLNALQEKILEH [...]
+Seq103 MANREGTGCNGWFLVDLADFIDERETAQVLLHMQEAQRDAQAVRALKRKYTDSSRGTLQEIQATQTVYDSGYGLLQSNNKKAAMLTQFKETYGLSFTDLVRTFKSDKTTCTDWVFGVHPTIAEGFKLINKYALYTHIQGVLILMLIRYTCGKNRVTVGKGLLHVPESCMLLEPPKLRSPVAALYWYRTGISNISGDTPEWIQRLTVIQHFDLSDMVQWAFDNEYTDESDIAFNYAMLADCNSNAAAFLKSNCQAKYVKDCATMCKHYKRAQKRQMSMSQWIKFRGDWRPIVQFLRYQGIEFISFLCALKEFLKGPKKNCIVIYGPANTGKSHFCMSLMHFLQGTVISYVNSTSHFWLEPLADAKLAMLDDATGTCWSYFDNYMRNALDGYAISDRKYKSLLQMKCPPLLITSNTNPVEDYLRSRLTVFKFPNAFPFDQNRNPVYTINDKNWKCFFEKTWCRLDLEDEFKCLSQRLNVLQDKILEY [...]
+Seq104 MADPEGTGCNGWFFVDLVDFIDDREAAQALLHAQEVETDTKLLHALKRKYGAHSSSPLQEITAKRRLCDSGYGLLKVNNKKAAILAKFKETYGLSFTDLVRTFKSDKTTCTDWVCGVNPNIAEGFKLIQPYVLYAHIQGVFILALLRYKCGKNRLTVAKGLLHVPDTHMLIEPPKLRSSCAALYWYRTGISNISGDTPEWIQRQTIIQHFDLSEMIQWAFDNDYIDESDIAYEYAQLADCNSNAAAFLKSNCQAKYLRDCAVMCRHYKRAQRKQMNMSQWISYRGDWKPIVQFLRFQGIEFITFLRAFKDFLKGPKKNCIVIYGPANTGKSYFCMSLIQFLHGTVLSFVNSNSHFWLEPLTDTKIAMVDDATPTCWSYFDNYMRNALDGNPISDRKHKHLIQMKCPPMLITSNTNPATDYLRSRVTVFTFPHTFPFDSNGNPVYDINDKNWKCFFKRTWSRLDLEEDFKCLSQRLNVLQEKILEH [...]
+Seq105 M-DCEGTGCTGWFSVDLIGFIDQ-EVTQALFQAQQKQANTKAVRNLKRKLLGS-SDSQQNT-AKRRAVDSGYGLLKCSNVKAALLSKFKTVYGVSFAELVRVFKSDKTCCSDWVFGVAGSVAESIKLIQQYCLYYHIQGVIVLMLVRFTCAKNRTTIKNCLLNVPETQLLIEPPKLRSTAVALYFYKTGLSNISGDTPEWIVRQTQLEHFDLSKMVQWAFDHDITDDSEIAFKYAQLADIDSNAAAFLKSNCQAKYVKDCATMTRHYKRAQKRSMCMSQWLQYRGSWKEIAKFLRFQHVNFIYFLQVLKQFLKGPKHNCIVIYGPPNTGKSQFAMSFIKFMQGSVISYVNSNSHFWLQPLEDAKVAVLDDATYSCWLYIDKYLRNFLDGNPCCDRKHRSLLQVTCPPLIITSNINPQEDYLHSRVTVIPFPNTFPFDSNGNPVYALTDVNWKSFFSTTWSRLDLDADFKCLCQRLNACQEKILDY [...]
+Seq106 M-DCEGTGCTGWFSVDLIGFIDQ-QVAQALFQAQETQANKKAVRALKRKLLGS-SNSQQST-AKRRAVDSGYGLLKCSNVKAALLSKFKTVYGVSYTELVRVFKSDKTCCSDWVFGVAGSVAESLKLIQPYCLYYHIQGVLPLMLIRFTCAKNRATIKKCLLNVPDTQLLIEPPKLRSTAVALYFYKTGLSNISGDTPEWIVRQTQLEHFDLSKMVQWAFDHDITDDSEIAFKYAQLADIESNAAAFLKSNCQAKYVKDCATMTRHYKRAQKRSMGMSQWLQHRGTWKDIARFLRYQNVNFIYFLQVLKQFLKGPKHNCIVIYGPPNTGKSQFAMSFIKFVQGSVISYVNSNSHFWLQPLEDAKVALLDDATYGCWLYIDKYLRNFLDGNPCCDRKHRSLIQVRCPPLIITSNINPQDDYLHSRVTVIPFPNTFPFDSNGNPVYELTDVNWKSFFSTTWSRLDLDADFKCLCQRLNACQEKILDY [...]
+Seq107 M-DCEGTGCNGWFFVDLINFIDEQETARALFQAQELQANKEAVHQLKRKFLVSPNNTH-SH-VKRRLLDSGYGVLKSSNAKATLMAKFKELYGISYNELVRVFKSDKTCCIDWVFGVSPMVAENLKLIKPFCMYYHIQGTIVLMLIRFSCAKNRTTIAKCLVNIPQSQMFIEPPKLRSTPVALYFYRTGISNISGETPEWITRQTQLQHFELSQMVQWAFDHEVLDDSEIAFHYAQLADIDSNAAAFLKSNCQAKYVKDCGTMARHYKRAQRKSLSMSAWIRYRGNWREIAKFLRYQGVNFMSFIQMFKQFLKGPKHNCIVIYGPPNTGKSLFAMSLMKFMQGSIISYVNSGSHFWLQPLEDAKIALLDDATYGCWTYIDQYLRNFLDGNPCSDRKHRSLIQLVCPPLLITSNINPQEDYLHTRVTVLKFLNTFPFDNNGNAVYTLNDENWKNFFSTTWSRLDLEEDFKCLCHRLNVCQEKILDC [...]
+Seq108 M-DSEGTGCTGWFYVDIIDFIDERETAQALLQVQETQAHKEAVQHLKRKFLGSPSNSQQQP-GKRRLLDSGYGVLKCSNAKAMFMAKFKELYGVSYNELVRVFKSDKTCCTDWVFGVSPMVAENLKLIQPFCMYYHIQGTIVLLLARFTCAKNRLTIAKCLVNIPQSQMFIEPPKLRSTAVALYFYRTGISNISGETPEWITRQTQLQHFELSQMVQWAFDHDVVDDSEIAFYYAQLADTDSNAAAFLKSNCQAKYVKDCGTMTRHYKRAQRKSLTMSAWIRYRGNWREIAKFLRYQGINFMYFIQTFKLFLKGPKHNCIVIQGPPNTGKSQFAMSLIRFLQGCVISYVNSGSHFWLQPLEDAKVALLDDATYGCWTYIDQYLRNFLNGNPCSDRKHRSLLQIVCPPLLITSNINPKEDYLHSRVTVFQFLNAFPFDPHGNPVYALNDVNWKNFFSTTWSRLDLEEDFKCLCHRLNVCQEKILDC [...]
+Seq109 MASPEGTGCRGWFHVDLDGFIDERETAQQLLH?QNTHADTQTLQKLKRKYLGSP------SAVKRRLIDSGYGILKCSNVQAKLYCKFKDIFGIPFSELVRTFKSDSTCCHDWIFGVN?TLAEALKIIKTQCIYYHMQGVVILLLIRYTCGKNRKTIVKSLLNVPTEQMLVQPPKIRSPAVALYFYKTSISNISGSTPEWIERQTQLQHFELSKMVQWAFDNEVTDDSQIAFHYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMKQWIKHVGDWKPIVQFLRYQGVEFISFLSYFKLFLQGPKHNCLVIYGPPNTGKSCFAMSLINFFHGSVISYVNSHSHFWLQPLDNTKLGMLDDATEACWKYIDEYLRNLLDGNPVSDRKHKQLVQIKCPPVLITTNINPMQDYLHSRIHVLQFLNPFPIDVNGNPVYQLNNANWKCFFERTWSRLDLDEDFRCLCQRLDACQEKILDC [...]
+Seq110 MASPEGTGCTGWFHVDLDGFLDDRETAQQLLHAQNTYADTQTLHNLKRKYLGSP------SGVKRRIIDSGYGLLKSSNVQAKLCYKFKELFGIPFSELVRTFKSDSTCCHDWIFGVNETLAEALKIIKSQCMYYHIQGVVILMLIRYTCGKNRKTIIKSLVNVPSEQMLVQPPKIRSPAVALYFYKTAMSNISGETPEWIQRQTQIQHFELSKMVQWAFDNDVTDDSDIAFYYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMKQWITHIGDWRPIVQFLRYQGVDFISFLSYFKLFLRGPKHNCLVLYGPPNTGKSCFAMSLIQFFQGSVISYVNSHSHFWLQPLDNAKLGMLDDATDACWRYIDEYMRNLLDGNPVSDRKHKQLVQIKCPPVIITTNINPLHDYLHSRIHVVPFLNPFPIDTNGNPVYQLNNVNWKCFFERTWSRLDLDEDFRCLCQRLDACQEKILDC [...]
+Seq111 MASPEGTGCCGWFQVDVDGFIDDRETAQQLLQVQTAHADAQTLQKLKRKYIGSP------SEVKRRLIDSGYGLFKSSNVQGRLHFKFKEVYGVPYTELVRTFKSDSTCCNDWIFGVNETLAEALKILKPQCVYYHMQGVIVMMLIRYICGKNRKTITKSLLNVPQEQMLIQPPKLRSPAVALYFYKTAMSNISGETPEWIQRQTQLQHFELSKMVQWAFDNEVTDDSQIAFLYAQLADIDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMCQWIKHIGDWKPIVQFLRYQGVDFISFLSYFKLFLQGPKHNCLVLCGPPNTGKSCFAMSLINFFQGSVISFVNSQSHFWLQPLDNAKLGLLDDATDTCWRYIDDYLRNLLDGNPISDRKHKQLVQIKCPPVIITTNVNPMQDYLHSRISVFKFENPFPLDNNGNPVYELSNVNWKCFFERTWSRLNLDEDFRCLSQRLDACQNKILDC [...]
+Seq112 MASPEGTGCCGWFEVDLDGFIDDAET?QQLLQVQTAHADKQTLQKLKRKYIASP------SGVKRRLIDSGYGLFKSSNLQGKLYYKFKEVYGIPFSELVRTFKSDSTCCNDWIFGVNETLAEALKIIKPHCMYYHMQGVIVMMLIRYTCGKNRKTIAKALLNVPQEQMLIQPPKIRSPAVALYFYKTAMSNISGDTPEWIQRQTQLQHFELSKMVQWAFDNEVTDDSQIAFQYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMCQWIKHIGDWKPIVQFLRYQGVDFISFLSYFKLFLQGPKHNCLVLCGPPNTGKSCFAMSLIKFFQGSVISFVNSQSHFWLQPLDNAKLGLLDDATEICWKYIDDYLRNLVDGNPISDRKHKQLVQIKCPPLLITTNINPMLDYLHSRMLVFQFQNPFPLDNNGNPVYELSNVNWKCFFTRTWSRLNLDEDFKCLSQRLNACQNKILDC [...]
+Seq113 MDPEGTPGCTGWFNVDLVDFIDD-EAPGALLHAQETQAHAEAVQVLKRKFVGSPSPRLNEIQAKRRLFDSGYGLLRCSNLKATLLSKFKSVYGVSFSELVRSFKSDRTTCADWVAGVHHSVAEGLKLIQPFCSYAHIQGVYLLLLARFKCGKNRLTVSKCMLNVQETHMLIEPPKLRSAAAALYWYRTGISNVSGETPEWITRQTMFQHFDLSEMVQWAYDHDFTDDSVIAYEYAQLAGIDSNAAAFLKSNAQAKYVKDCATMCRHYKRAERQQMTMSQWIKQRGDWRPIVQFLRYQGVEFIAFLAALKLFLKGPKKNCIVLFGPPNTGKSYFGMSLIHFLQGSIISYVNSNSHFWLQPLADAKVAMLDDATPQCWSYIDNYLRNALDGNPISDRKHKNLVQMKCPPLLITSNTNAGQDYLHSRMVVFTFEQPFPFDQNGNPVYELNDKNWKSFFSRTWSRLDLEEEFKCLAERLSALQDRILEL [...]
+Seq114 MADPEGTGCTGWFEVDLLEFIDDTEAARALFNIQEGEDDLNAVCALKRKFAACSANPCRTSYRKRKIDDSGYGVLHSSNTKANILYKFKEAYGISFMELVRPFKSDKTSCTDWCYGISPSVAESLKLIKQHSLYTHLQGIIILLLIRFRCSKNRLTVAKLMLSIPETCMVIEPPKLRSQTCALYWFRTAMSNISGTTPEWIDRLTVLQHFDLSEMVQWAYDNELTDDSDIAYYYAQLADSNSNAAAFLKSNSQAKIVKDCGIMCRHYKKAEKRKMSIGQWIQSRGNWRPIVQLLRYQNIEFTAFLGAFKKFLKGPKKSCMLICGPANTGKSYFGMSLIQFLKGCVISCVNSKSHFWLQPLSDAKIGMIDDVTPISWTYIDDYMRNALDGNEISDVKHRALVQLKCPPLLLTSNTNAGTDYLHSRLTVFEFKNPFPFDENGNPVYAINDENWKSFFSRTWCKLDLEEDFKCISARLNAVQEKILDL [...]
+Seq115 MDDPEGTGCTGWFEVDLIEFIDEAEAARALFNVQEGVDDINAVCALKRKFAACSANVCVSWHRKRKIIDSGYGILHNSNTKATLLYKFKEAYGVSFMELVRPFKSDKTSCTDWCYGISPSVAESLKLIKQHSIYTHLQGIILLLLIRFKCSKNRLTVAKLMLSIPETCMIIEPPKLRSQACALYWFRTAMSNISGTTPEWIDRLTVLQHFDLSEMIQWAYDNDITDDSDIAYKYAQLADVNSNAAAFLRSNAQAKIVKDCGVMCRHYKRAEKRGMTMGQWIQSRGNWRPIVQFLRYQNIEFTAFLVAFKQFLQGPKKSCMLLCGPANTGKSYFGMSLIHFLKGCIISYVNSKSHFWLQPLSDAKLGMIDDVTAISWTYIDDYMRNALDGNDISDVKHRALVQLKCPPLIITSNTNAGKDYLHSRLTVFEFNNPFPFDANGNPVYKINDENWKSFFSRTWCKLGLEEDFKCISARLSAVQDKILDI [...]
+Seq116 MEDPEGTGCTGWFEVDLIDFIDEHEAARALFNAQEGEDDLHAVSAVKRKFTSSPSPRA-KHLPKRKPCDSGYGIMCENSIKTTVLFKFKETYGVSFMELVRPFKSNRSSCTDWCMGVTPSVAEGLKLIQPYSIYAHLQGVLILLLIRFKCGKNRLTVSKLMLNIPETHMVIEPPKLRSATCALYWYRTGLSNISGTTPEWIEQQTVLQHFDFGEMVQWAYDHDITDDSDIAYKYAQLADVNSNAAAFLKSNSQAKIVKDCATMCRHYKRAERKHMNIGQWIQYRGDWRPIVRFLRYQDIEFTAFLDAFKKFLKGPKKNCLVLYGPANTGKSYFGMSLIRFLSGCVISYVNSKSHFWLQPLTDAKVGMIDDVTPICWTYIDDYMRNALDGNDISDVKHRALVQIKCPPLILTTNTNAGTDYLHSRLVVFHFKNPFPFDENGNPIYEINNENWKSFFSRTWCKLDLEEDFKCIPARLNAVQEKILDL [...]
+Seq117 MEDPEGTGCTGWFSVDLIGFIDDTQAARALFNLQEEEDDLNAVSALKRKF--TGGGNSN--AAKRRAYDSGYGIMHVNNIKATLMHKFKEAYGVTFTQLIRPFKSDRTSCTDWCFGITPSVAESLKLIKPQTLYTHLQGIIILLLVRFKCAKNRLTVSKLMLSIPETHMIIEPPKIRSTTCALYWFRTGMSNISGQTPEWIERLTVLQHFDLGEMVQWAYDNDITDDSEIAYQYAMLADVNSNAAAFLKSNSQAKIVKDCGTMCRHYKRAEKRKMTIGQWIQARGDWRTIVKLLRYQNVEFTQFLATFKKFLKGPKKSCMVICGPPNTGKTYFAMSLIHFLQGCVISYVNAKSHFWLQPLSDAKIGMIDDVTAICWTYIDDYLRNALDGNDISDVKHKALVQLKCPPLLLTSNIDVATDFLHSRVVVFRFNNPFPFDENGNPVYNLNDENWKSFFSRTWCQLDLEEDFKCIATRLNAVQEKILDV [...]
+Seq118 MADPAGTGCNGWFYVDMVDFIDEAETAQALFHAQEAEEHAEAVQVLKRKYVGSPSPRLKAITAKRRLFDSGYGVLKTSNGKAAMLGKFKELYGVSFMELIRPFQSNKSTCTDWCFGVTGTVAEGFKLLQPYCLYCHLQGMVMLMLVRFKCAKNRITIEKLLLCISTNCMLIQPPKLRSTAAALYWYRTGMSNISGETPEWIERQTVLQHFDLSQMVQWAYDNDVMDDSEIAYKYAQLADSDSNACAFLKSNSQAKIVKDCGTMCRHYKRAEKRQMSMGQWIKSRGDWRDIVKFLRYQQIEFVSFLSALKLFLKGPKKNCILIHGAPNTGKSYFGMSLISFLQGCIISYANSKSHFWLQPLADAKIGMLDDATTPCWHYIDNYLRNALDGNPVSDVKHKALMQLKCPPLLITSNINAGKDYLHSRLVVFTFPNPFPFDKNGNPVYELSDKNWKSFFSRTWCRLNLEEDFKCLSQRLNVCQDKILEH [...]
+Seq119 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKQHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSRMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVEFMPFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLIKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq120 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKQHRDAVQVLKRKY?GSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGISNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVEFMSFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq121 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLMLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLSALKKFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq122 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFTELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLMLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLSALKKFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDKNGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq123 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq124 MADPAGTGCNGWFFVDMVDFINETETAQALFHAQEEQTHKEAVQVLKRKYASSPSPRLKAIAAKRRLFDSGYGILKCSNANAAMLAKFKELFGISFTELIRPFKSDKSTCTDWCFGIAPSVANFKH-----?ICIHIQAMVILALLRFKV?KTRTTIENY*LCISAASMLIQPPKLRSTPAALYWFKTAMSNISGETPEWIQRQTVLQHFDLSEMVQWAYDNDFIDDSDIAYKYAQLAETNSNACAFLKSNSQAKIVKDCATMCRHYKRAEKREMTMSQWIKRRGDWRDIVRFLRYQQVDFVAFLSALKNFLHGPKKNCILIYGAPNTGKSLFGMSLMHFLQGAIISYVNSKSHFWLQPLYDAKIAMLDDATSPC??YIDQYLRNALDGNPIFDVKH*A?VH??CPPLLIT?NINAGKDYLHSRVVVFTFHNEFPFDKNGNPEYGLNDKNWKSFFSRTWCRLNLEEVFKCLSQRLSVCQDKILEH [...]
+Seq125 MADSGNWRCSGWFNVEMGDFIDQQEIAQALYHSQQVNADNEAIRVLKRKFAGSASPH--ILTSTHLLCDSGYGILKSSNVKATLLAKFKEVYGLSYMELVRPYKSDKTQCQDWVFGVAPSLAESLKLLTQYCLYIHLQGIIVLLLARFKCNKNRLTVQKLLLNVTQEYMLIEPPRLRSTPCALYWYRTSLSNISGEVPEWIKRQTVVQHFDLSQMVQWAFDNDITNDCEIAYKYALLASEDSNAAAFLKSNAQAKYVKDCGTMCRHYKAAERKQMTMSQWITHRGNWKHIVQFLRYQQVEFVPFLIALKQFLKGPKQNCIVIYGPPDTGKSHFGMSLMQFMQGVVISYVNSNSHFWLSPLADAKMALLDDATPACWTYIDRYLRNALDGNPMCDRKHKHLLQIKCPPLLITSNTNPKADYLHSRMKVFTFSNPFPFDSNGNPLYQLTNENWKAFFTKTWSKLDLDDDFKCLCKRLSACQDAILEL [...]
+Seq126 MADSGNWRCTGWFNVEMGDFIDQQEIAQALYQSQQANADNEAIRVLKRKFTGSPSPQINVLTSKRRLFDSGYGLLQRNNAKAALLAKFKEVYGLSYMELVRPYKSDKTHCQDWVFGVIPSLAESLKLLTQYCMYIHLQGIIVLVLVRFKCNKNRLTVQKLLLNVTQERMLIEPPRLRSTPCALYWYRTSLSNISGDTPEWIKRQTLVQHFDLSQMIQWAFDNDITDDCEIAYKYALLGNVDSNAAAFLKSNAQAKYVKDCGTMCRHYKAAERKQMSMAQWIQHRGNWKDIVLFLRYQNVEFMPFLITLKQFLKGPKQNCIVLYGPPDTGKSHFGMSLIKFIQGVVISYVNSTSHFWLSPLADAKMALLDDATPGCWTYIDKYLRNALDGNPICDRKHKNLLQVKCPPLLITSNTNPKADYLHSRIKVFTFLNPFPFDSNGNPLYQLTNENWKAFFTKTWSKLDLDDDFKCLCKRLSACQDAILEL [...]
+Seq127 MADDTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVFGVHHSVSEAFQLIQPLSTYSHIQGMVLLVLLRFKVNKNRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTSISNASGDTPEWIARQTIVEHFKLTEMVQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKMSIKQWIKYRGNWKPIVQFLRHQGIEFISFLSKLKLWLHGPKKNCIAIVGPPDTGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNIDITTEYLYSRVTLFKFPNPFPFDSNGNAVYELCDANWKCFFARLSASLDIE-DFRCLAKHLDACQEQLLEL [...]
+Seq128 MADNTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLNAIKVKRRLFDSGYGLLKCKDIRATLHGKFKQCYGLSFTDLIRQFKSNKTTCEDWVFGVHHSVSEAFELIQPLTIYRHIQGMLLLVLLRFKVNKNRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTSLSNASGETPEWIARQTIVEHFKLTEMVQWAYDNDYCDECDIAFEYAKRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKMTMNQWIKHRGNWKPIVQFLRHQNIEFISFLSKLKLWLQGPKKNCIAIVGPPDTGKSMFCMSLIKFLGGTVISYVNSSSHFWLQPLCNTKVALLDDATHSCWGYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNIDITTEYLYSRVTVFKFPNPFPFDRNGNAVYELCDANWKCFFARLSASLDIE-DFRCLAKHLDACQEQLLEL [...]
+Seq129 MAEDTGTGCSGWFLVDMVDFID-VE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLYGKFKDCYGLSFTDLIRPFKSDKTTCGDWVFGIHHSVSEAFELMQPLTTYMHIQGMVLLVLIRFKVNKSRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTGISNASGETPEWIKRQTIVEHFKLTEMVQWAYDNDFCDESEIAFEYAQRGDFDSNARAFLNSNCQAKYVKDCATMCKHYKNAEMKKMSMKQWITYRGNWKPIVQFLRHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWVYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNVDITKDYLYSRVTTLTFPNPFPFDRNGNAVYELSDANWKCFFTRLSASLDIE-DFRCIAKHLDACQEQLLEL [...]
+Seq130 MADNTGTGCSGWFLVDMVDFID-ME-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKNIRATLLGKFKDCYGLSYTDLIRQFKSDKTTCGDWVFGVHHSVSEAFQLIQPVTTYSHIQGMVLLALVRFKVNKNRCTVARMMLNIPEDHMLIEPPKIQSGVAALYWFRSGISNASGETPEWITRQTIVEHFKLADMVQWAYDNDFCEESEIAFEYAQRADIDANARAFLNSNCQAKYVKDCATMCKHYKTAEMKKMNMKQWIKFRGNWKPIVQFLRHQNIEFIPFLTKLKMWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDVTQSCWVYMDTYMRNLLDGNPMTDRKHKSLALIKCPPLIVTSNIDITKEYLCSRVTLFTFPNPFPFDRNGNALYDLCETNWKCFFARLSSSLDIE-DFRCIAKHLDVCQEQLLEL [...]
+Seq131 MAENTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLLGKFKDCYGLSYTDLIRQFKSNKSTCGHWVFGVHHSVADAFQLIQPVTTYSHIQGMVLLALLTFKVNKNRCTVARMLLNIPEDHMLIEPPKIQSTVAALYWFRSSLSNASGDTPDWITRQTIVEHFKLADMVQWAYDNDLCDESEIAFDYAQRADIDANARAFLNSNCQAKYVKDCATMCKHYKNAEMKKMNMKQWIHYRGNWKPIVQFLKHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWVYIDTYMRNLLDGNPMSDRKHKSLALIKCPPLLITSNIDITKDYLFSRVSVFTFPNPFPFDRNGNAVYDLCESNWKCFFTRLSASLDIE-DFRCIAKHLDVCQEQLLEL [...]
+Seq132 MADDSGTGCTGWFMVDMVDFID-LE-AQALFNRQEADTHYATVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDLRAALLGKFKECFGLSFIDLIRPFKSDKTTCLDWVFGIHHSISEAFQLIEPLSLYAHIQGMVLLVLLRFKVNKSRSTVARTLLNIPENQMLIEPPKIQSGVAALYWFRTGISNASGEAPEWITRQTVIEHFKLTEMVQWAYDNDICEESEIAFEYAQRGDFDSNARAFLNSNMQAKYVKDCATMCRHYKHAEMRKMSIKQWIKHRGNWKPIVQFLRHQNIEFIPFLTKFKLWLHGPKKNCIAIVGPPDTGKSYFCMSLISFLGGTVISHVNSSSHFWLQPLVDAKVALLDDATQPCWIYMDTYMRNLLDGNPMSDRKHKALTLIKCPPLLVTSNIDITKEYLHTRVTTFTFPNPFPFDRNGNAVYELSNTNWKCFFERLSSSLDIE-DFRCIAKRLDACQEQLLEL [...]
+Seq133 MADDSGTGCTGWFMVDMVDFID-VE-AQALFNRQEADAHYATVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDIRSTLHGKFKDCFGLSFVDLIRPFKSDRTTCADWVFGIHHSIADAFQLIEPLSLYAHIQGMVLLVLIRFKVNKSRCTVARTLLNIPENHMLIEPPKIQSGVRALYWFRTGISNASGEAPEWITRQTVIEHFKLTEMVQWAYDNDICEESEIAFEYAQRGDFDSNARAFLNSNMQAKYVKDCAIMCRHYKHAEMKKMSIKQWIKYRGNWKPIVQFLRHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSCSHFWLQPLTDAKVALLDDATQPCWTYMDTYMRNLLDGNPMSDRKHRALTLIKCPPLLVTSNIDISKEYLHSRVTTFTFPNPFPFDRNGNAVYELSDANWKCFFERLSSSLDIE-DFRCIAKRLDACQDQLLEL [...]
+Seq134 MADDSGTGCSGWFLVDMVDFINL-SNAQALLHAQQTCADAVELCELKRKYISP-SPRLHAIKAKRRLFDSGYGLFKCKDLNAKLCGKFKELFGVGFHDLVRQFKSDKSTCTDWVFGVNPTIAEGFHLLKGQALYLHTQGMVLLALCRYKVAKNRETVVRQLLNVPDNQLMVQPPKLQSSAAALFWFRSGMGNGSGTTPEWIAKQTMLEHFSLTQMVQWAYDNGHTDECEIAYYYAQIADIDANAAAFLKSNNQAKYVRDCAAMCKHYRLAEMRRMSMADWIKHRGDWKPIVKLLRYQHIDIIVFLAALKKWLHGPKKNCICIVGPPDTGKSCFGMSLMHFLQGTIISFVNSCSHFWLQSLVDAKVAMLDDVTSACWAYMDTHMRNLLDGNPTSDRKHKSLAVIKCPPLLLTSNINIKHDYLQSRVTVFEFPNPFPFDSNGNAVYELSDANWNSFFKRLASSLELE-DF--LARRLDLCQEQLLEL [...]
+Seq135 MADSPGTGCSGWFVVDMIDFIDELSNAQALLHVQQTCADAADLCELKRKYISP-SPRLHAIKAKRRLFDSGYGLFKCKDLNAKLYGKFKELYGVGFGDLVRQFKSDKSTCTDWVFGVNPTIAEGFHLLKRQALYLHTQGMVLLALCRYKVGKNRETVVRQLLNVPDNQILVQPPKLQSPPAALFWFRAGMGNGSGTTPEWIAKQTMLEHFSLTDMVQWAYDNGHTDECEIAYYYAQRADVDANAAAFLKSNNQAKYVRDCASMCKHYRLAEMRRMSMAEWIKHRGDWKPIVKLLRYQHIDIIVFLAALKKWLQGPKKNCICIVGPPDTGKSCFGMSLMHFMQGTIISYVNSCSHFWLQSLADAKVAMLDDVTAACWGYMDTHMRNLLDGNPTSDRKHKPLAVIKCPPLLLTSNINITQDYLQSRVQVFEFPNPFPFDSNGNAVYELNDANWNSFFKRLASSLELG-DF--LARRLDLCQEQLLEL [...]
+Seq136 MADKQGTGCSGWFIVDMVDFINDHSSAQALLNAQQADADAAIVQELKRKYMSP-SPRLHAIKAKRRLFDSTNGLFKDKDVTVKLLGKFKELFGVGFNDLVRQFRSDKSTCTDWVFGVNPSISEGFHLLKEHTLYLHTQGMVLLALCRYKVAKNRSTIVRQLLNVPVQQILIQPPKLQSAPAALFWFRSSMGNGSGTTPEWISRQTMLEHFSLTDMVQWAYDNGYTEEYDIAYYYAQRGDIDANAAAFLKSNMQARYVRDCACMCKHYKLAEMKKMSMAEWIKHRGDWKPIVKFLKYQHIDIIAFLGALKKWLHGPKKNCICIIGPPDTGKSCFGMSLMKFLGGTILSYVNASSHFWLQPLVDAKVAMLDDVTAGCWTYMDMHMRNLLDGNPTSDRKHRALTVIKCPPLLLTSNLDISTEYLRSRITTFTFPNTFPFDTNGNAIYELNDENWNSFFKRLASSLELE-DY--LARRLDMCQEQLLEL [...]
+Seq137 MADKQGTGCSGWFIVDMVDFINEQSCAQALLNKQQADADAAIVQELKRKYISP-SPRLHAIKAKRRLFHSNYGLFKDKDVTVKLLGKFKDLFGVGFNDLVRQFKSDKSTCTDWVFGVNPSIAEGFHLLKEQTLYLHTQGMVLLALCRYKVAKNRSTVGRQLLNVPVQQILIQPPKLQSAPAALFWFRAGMGNGSGTTPEWISRQTVLEHFSLTNMVQWAYDNGYTEECDIAYYYAQLGDTDANAAAFLKSNMQARYVRDCACMCKHYKLAEMKKMSMAEWIKHRGDWKPIIRFLRYQHIDIITFLAALKKWLHGPKKNCICIIGPPDTGKSSFGMSLMKFLGGTMLSYVNSSSHFWLQSLVDAKAAMLDDVTAACWNYMDMHMRNLLDGNLTSDRKHKALAVIKCPPLLLTSNMDINTDYLKSRITTFTFPNAFPFDTNGNAIYEFNDENWNPFFKRLASSLELE-DY--LARRLDMCQEQLLEL [...]
+Seq138 MADDTGTGCSGWFSVDLVDFVDNQLKAQALLNRQQAHADKEAVQALKRKLLGSPSPRLGGLGAKRRLFDSGYGLLKCKNLQATLLGKFKELFGLSFGDLVRQFKSDKSSCTDWVFGVHHSIAEGFNLIKAEALYTHIQGMVLLMLIRFKCGKNRTTVSKGMLNIPANQLLIEPPRLQSVAAAIYWFRAGISNASGETPEWIQRQTIVEHFNLTEMVQWAYDNDLTEDSDIAYEYAQRADTDSNAAAFLKSNCQAKYVKDCGIMCRHYKKAQMKRMSMPQWIKHRGDWRPIVKFIRYQGIDFLTFMSAFKKFLHNPKKSCLVLIGPPNTGKSQFGMSLVKFLAGTVISFVNSHSHFWLQPLDSAKIAMLDDATPPCWTYLDTYLRNLLDGNPCSDRKHKALTVVKCPPLIITSNTDIRTEYLYSRISLFEFPNPFPLDKNGNPVYVLNDENWKSFFQRLWSSLEFEDEFRCLAKRLDACQEQLLEL [...]
+Seq139 MADDTGTGCSGWFCVDLVDFVDQ-VHAQALLNKQQAHADQEAVQALKRKLLGSPSPRLGGLGAKRRLFDSGYGLLKCKNLHATLLGKFKELFGVSFGDLVRQFKSDKSSCTDWVFGVNHSIAEGFNLIKADSLYTHIQGMVLLMLIRFKCGKNRTTVSKGLLNIPTNQLLIEPPRLQSVAAAIYWFRSGISNASGDTPEWIQRQTILEHFNLTEMVQWAYDNDITEDSDIAYEYAQRADRDSNAAAFLKSNCQAKYVKDCGVMCRHYKKAQMRRMSMGAWIKHRGDWKPIVKFIRYQQIDFLAFMSAFKKFLHNPKKSCLVLIGPPNTGKSQFGMSLINFLAGTVISFVNSHSHFWLQPLDSAKIAMLDDATPPCWTYLDIYLRNLLDGNPCSDRKHKALTVVKCPPLLITSNTDIRTNYLYSRVSLFEFPNPFPLDTNGNPVYELNDKNWKSFFQRLWSSLEFEDEFRCLAKRLDACQEQLLEL [...]
+Seq140 MADNQGTGCNGWFFVDMVDFIDQ-ENPQALLHAQQLQADVEAVQQLKRKYIGSPSPRLGAIKAKRRLFPPPNGLIHNTNIRVALFGMFKDLYGLSFMDLARPFKSDKTVCTDWVFGIYHGITDGFKLLEPHCLYGHIQGMVLLLLTRFKCGKNRLTVSKCLLNIPETQMLIDPPKLRTPAAALYWYRQGLSNASGTPPEWLARQTVIEYFDLSKMVQWAYDHNYIDDSIIALEYAKLADIDENAAAFLGSNCQAKYVKDCGTMCRHYIRAQKMQMTMSQWIKHRGEWKEIVRFLRYQHVDFISFMIALKQFLQGPKHNCILLYGPPDTGKSNFAMSLISFLGGVVLSYVNSSSHFWLEPLADAKIAMLDDATTQCWNYMDIYMRNALDGNPMCDRKHRAMVQTKCPPLIVTSNINASTDYLHSRVKCFCFPNRFPFDSNGNPVYDLSNKNWKSFFKRSWSRLALDNEFRCLATRLDVCQERLLDL [...]
+Seq141 Seq142 
\ No newline at end of file
diff --git a/testData/140.model b/testData/140.model
new file mode 100644
index 0000000..e2676f0
--- /dev/null
+++ b/testData/140.model
@@ -0,0 +1,3 @@
+WAG, p0 = 1-399
+AUTO, p1 = 400-699
+AUTO, p2 = 700-1104
diff --git a/testData/140.tree b/testData/140.tree
new file mode 100644
index 0000000..76fe704
--- /dev/null
+++ b/testData/140.tree
@@ -0,0 +1 @@
+((((((Seq67,(Seq66,Seq65)),(Seq69,Seq68)),(Seq70,Seq71)),Seq72),(Seq63,((((((Seq38,(Seq37,Seq36)),(((Seq45,(Seq47,Seq46)),(Seq44,(Seq43,Seq42))),(Seq35,(Seq41,(Seq39,Seq40))))),((Seq31,(Seq34,(Seq32,Seq33))),(((Seq14,Seq15),(Seq16,((((Seq8,Seq7),Seq6),(Seq5,(Seq4,Seq3))),((Seq11,(Seq9,Seq10)),(Seq13,Seq12))))),((Seq17,(((Seq24,Seq23),Seq25),(Seq18,((Seq22,Seq21),(Seq19,Seq20))))),((Seq26,(Seq28,Seq27)),(Seq30,Seq29)))))),(((Seq49,Seq48),(Seq51,Seq50)),((Seq52,(((Seq55,(Seq53,Seq54)),Seq5 [...]
diff --git a/testData/354.tree b/testData/354.tree
new file mode 100644
index 0000000..29ac922
--- /dev/null
+++ b/testData/354.tree
@@ -0,0 +1 @@
+((((bn_001BGTue,((ac002MorArb,bn_002BGTue),((bf_005BGTue,(bf_002BGTue,st_001BGTue)),st_002BGTue))),((er101AA26384W,(er002MorArb,(er003MorArb,(er108AA26384W,er005MorArb)))),(ol037MorArb,(((si_006MorArb,ol111PRChina),si_003MorArb),(am111TS17259W,(((ja157TS17319W,ja117TS17319W),ja144TS17319W),((((ps209TS16075W,(ps202TS16075W,(ps117TS16020W,(ps120TS16020W,(ps211TS16075W,ps115TS16020W))))),ps121TS16020W),(pa005BGTue,pa006BGTue)),(((fl017MorArb,fl010MorArb),((wa125For29653E,(wa224Kin20879E,(wa [...]
diff --git a/testData/49 b/testData/49
new file mode 100644
index 0000000..d9d2e72
--- /dev/null
+++ b/testData/49
@@ -0,0 +1,50 @@
+49 1200
+Seq1 ATGACCAACATTCGAAAATCACACCCCCTTATCAAAATCGTTAATCACTCATTCATCGATTTACCCACCCCACCTAACATTTCAGCATGATGAAACTTCGGCTCCCTACTAGGAGTCTGCCTAGTCCTACAGATCCTAACCGGCCTTTTCCTAGCCATACACTACACATCAGACACAATAACCGCCTTTTCATCAGTTACTCACATCTGCCGCGACGTCAACTACGGCTGAATTATTCGATACATGCACGCCAACGGAGCCTCTATATTCTTTATCTGCCTATACATGCATGTAGGACGAGGAATATACTACGGCTCCTACACCTTCTCAGAAACATGAAATATTGGAATCATACTACTACTCACAGTCATAGCCACAGCCTTCATAGGATATGTCTTACCATGAGGTCAAATATCTTTCTGAGGAGCAACTGTAATTACCAACCTCCTATCAGCAATTCCTTACATCGGCACTAATCTAGTAGAGT [...]
+Seq2 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAACCATGCATTCGTCGACCTCCCTGCACCCTCCAACATCTCATCATGATGAAACTTCGGCTCTCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTGTTTCTAGCAATACACTACACCTCAGACACTATAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAATTACGGTTGACTGATTCGATACCTCCATGCAAACGGAGCCTCCATGTTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATCTACTATGGGTCCTACACCTTTATAGAGACATGAAACCTTGGTATTATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTCCCATGGGGGCAAATATCCTTCTGAGGGGCCACAGTCATCACAAACCTACTTTCAGCCATCCCCTACATCGGTACTAACCTAGTAGAAT [...]
+Seq3 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAACATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTATTTCTAGCAATACACTACACCTCCGACACTACAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGGTCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq4 ATGAAAATTATACGAAAAACACACCCACTCATAAAAATTATCAACCACGCATTCATCGATCTCCCTGCACCCTCCAACATCTCATCATGATGAAACTTTGGTTCTCTATTAGGAGTATGCCTAATAGTCCAAATCCTCACAGGCCTATTCTTAGCAATACACTACACCTCCGACACTATAACAGCATTCACATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATATCTCCATGCAAACGGAGCCTCCATATTCTTCGTATGCTTGTTTATACACGTAGGACGAGGAATCTACTATGGATCTTACACCTTTACAGAAACATGAAATCTTGGTGTTATTCTACTATTTGCCGTAATAGCAACTGCATTTATAGGATATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACAGTCATTACAAACCTTCTCTCAGCTATTCCCTACATCGGTACTAACCTAGTAGAAT [...]
+Seq5 ATGAAAAACATACGAAAAACGCAACCACTCCTAAAAATTATTAACCACGAATTCATTGA-TTTCCTGAAACATCCAA-ATCTCATCATGATGAAACTTTGGCTCTCTACTAGGCATCTGCCTAGTAATCCAGATCCTAACAGGCTTATTCCTAGCAATACACTATACCTCCGACACCACCACAGCATTTTCATCTGTAACCCACATTTGCCGAGACGTAAACTACGGCTGACTGATTCGTTACCTCCATGCAAATGGAGCCTCCATATTCTTCATGTGCCTGTTCATACATGTAGGACGGGGAATCTACTACGGATCTTATACCTTCATAGAAACCTGAAATCTCGGCATTATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTACTCCCATGAGGACAGATATCCTTCTGAGGGGCCACAGTCATTACAAATCTACTCTCAGCTATCCCCTACATCGGAACTAATCTAGTAGAGT [...]
+Seq6 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAACATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTATTTCTAGCAATACACTACACCTCCGACACTACAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGGTCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq7 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAATATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAGTAATCCAAATCCTCACAGGATTATTTCTAGCAATACACTACACCTCCGACACTATAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCGTGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGATCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGGTATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq8 ATGAAAAACATACGAAAATCACACCCACTACTAAAAATCATTAATCTCGCATTTATTGACCTACCCGCACCATCCAATATCTCATCATGATGAAACTTTGGGTCCCTTCTAGGAGTCTGTCTAGTAGTACAAATTATCACAGGACTATTTCTAGCAATACACTACACCTCTGATACCACAACAGCATTTTCATCTGTAACTCATATCTGCCGAGATGTAAACTACGGTTGATTGATTCGATATCTTCATGCAAACGGAGCCTCAATATTCTTCATGTGCCTATTTATACACGTAGGACGAGGAATCTACTACGGATCCTACACCTTTACAGAAACCTGAAATATTGGCATTATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTGCTTCCCTGAGGACAAATATCCTTCTGAGGAGCCACAGTTATTACTAACCTTCTCTCAGCAGTTCCCTACATCGGTACTAATTTAGTAGAAT [...]
+Seq9 ATGACAAACATCCGAAAAACACACCCCCTACTTAAAATTATTAATAACGCATTCATTGACCTACCAGCCCCATCCAACATTTCATCATGATGAAACTTCGGGTCTTTACTAGGAATCTGCTTAATCATCCAAATCATCACAGGACTTTTCCTAGCCATACATTACACCTCAGACACCTCAACAGCATTCTCATCTGTTACCCATATTTGCCGAGATGTAAACTACGGTTGACTTATTCGCTATCTTCATGCAAACGGAGCCTCCATATTCTTTGTCTGCTTATTTATACATGTTGGACGAGGAATCTATTACGGATCTTATACCTACATAGAAACATGAAATATCGGCATCATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTACTTCCATGAGGACAAATATCTTTTTGAGGGGCCACAGTCATTACCAACCTACTATCAGCTATCCCTTACATTGGCACTAACTTAGTAGAAT [...]
+Seq10 ATGACTAACATTCGAAAAACTCACCCACTGATAAAAATTGTAAACAACGCATTTATCGACCTCCCAGCCCCATCAAACATTTCATCATGATGAAACTTTGGCTCCCTACTAGGCATCTGCCTAATCCTGCAAATCTTAACAGGCCTATTCCTAGCGATACACTACACATCCGACACAACAACAGCATTCTCCTCTGTCGCCCACATTTGCCGAGACGTCAATTATGGCTGAATCATCCGATACATACACGCAAACGGAGCATCAATATTTTTTATCTGCCTATTCATACACGTAGGACGAGGCCTCTACTATGGGTCATATACCTTCCTAGAAACATGAAACGTCGGAGTAATCCTCCTATTTACAACAATAGCCACAGCATTTATAGGCTATGTCCTGCCATGAGGACAAATATCATTCTGAGGAGCAACAGTCATCACCAACCTTCTCTCAGCAATCCCATATATCGGCACAGACCTGGTCGAA [...]
+Seq11 ATGACAAACATCCGAAAAATTCATCCCCTAATAAAAACCATTAACCACTCCTTCATTGATCTCCCCGCACCATCCAACATCTCATCATGATGAAACTTCGGCTCTCTACTAGGAATTTGCTTAATAGTACAAATCATCACAGGTCTATTCTTAGCCATACATTATACATCAGACACAACAACAGCATTTTCATCAGTAACCCATATCTGCCGAGACGTAAATTATGGATGACTAATCCGATATATACATGCAAACGGAGCCTCAATGTTCTTCATCTGCTTATTCCTTCATGTAGGACGAGGAATATACTATGGATCTTATACATTCCTAGAAACATGAAACATCGGAGTGATTTTATTATTTACAGTCATAGCCACTGCATTCATAGGATATGTCCTTCCATGAGGACAAATATCATTCTGAGGGGCCACAGTAATTACAAACTTACTTTCAGCCATCCCATATATTGGCACAATCCTAGTAGAA [...]
+Seq12 ATGACAAACATCCGAAAAACTCATCCCTTAATAAAAATTATTAATCATTCATTCATTGATCTTCCCGCACCATCCAACATCTCATCATGATGAAATTTCGGCTCCCTATTAGGAATCTGCCTAACAGTACAAATTGCCACAGGCCTATTTCTAGCCATACATTATACATCAGATACAACAACAGCATTCTCATCAGTAGCCCACATTTGCCGAGACGTAAATTACGGATGATTTATCCGATATATACATGCAAACGGAGCTTCCATATTTTTCATATGCCTATTCCTCCACGTAGGACGAGGAATATATTACGGATCCTACACATTTCTAGAAACATGAAATATCGGAGTAATTCTTCTATTTGCAGTTATAGCCACTGCATTCATAGGATATGTCCTTCCATGAGGACAAATATCATTTTGAGGAGCTACAGTAATTACAAATCTACTCTCAGCCATCCCATACATTGGCAGTACTTTAGTAGAG [...]
+Seq13 ATGACAAACATCCGAAAAACCCATCCCCTATTCAAAATTATTAATCACTCATTCATTGACCTTCCAGCCCCATCCAACATCTCATCATGATGAAACTTCGGTTCACTTCTTGGAATCTGCTTAATAGTCCAAATTTTAACTGGCTTATTCTTAGCTATACACTACACCTCCGACACACTAACAGCATTCTCATCAGTAACCCACATCTGTCGAGACGTTAATTACGGCTGATTGGTACGATATATACACGCAAATGGAGCCTCAATATTCTTCATCTGCCTATTTATACACGTAGGACGAGGTATATACTACGGATCATACACATTTTTAGAAACATGAAACATCGGGGTAATTCTTTTATTTACAGTAATAGCTACCGCATTTATAGGTTATGTACTCCCATGAGGACAAATATCATTTTGAGGGGCAACAGTAATTACAAACTTACTATCTGCCATTCCATACATTGGAACTACTTTAGTAGAA [...]
+Seq14 ATGATCAACATCCGAAAAACTCATCCATTAGTTAAAATTATCAACAACTCATTCATTGACCTTCCAACACCATCAAACATTTCAACATGATGGAACTTTGGGTCCCTGTTAGGAGTGTGTCTGATCTTGCAAATCTTAACAGGCTTATTTCTAGCCATACACTATACATCAGATACAGCTACAGCCTTTTCATCAGTCGCACACATTTGTCGAGACGTCAACTATGGGTGATTTATCCGATATATACATGCCAATGGGGCCTCTATATTTTTTATCTGCCTATTTATACACGTAGGGCGAGGCTTATACTATGGATCATACCTATTTCCAGAGACATGGAATATCGGAATTATTCTCCTACTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTCTGAGGAGCGACTGTCATCACCAACCTACTATCGGCCATTCCCTACATCGGAACGAACCTAGTAGAA [...]
+Seq15 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTTAATCATTCATTTATTGACCTCCCAACCCCATCTAACATCTCATCTTGATGGAATTTCGGGTCACTACTCGGTGTGTGCCTAGTAATCCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTTGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATTCGCTACCTTCACGCTAACGGAGCCTCCATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGTATCATCCTGTTCCTTACAACTATAGCAACAGCATTCGTAGGGTATGTTCTACCGTGGGGACAAATATCCTTCTGAGGAGCTACCGTAATCACAAACCTCCTCTCAGCAATCCCATACATCGGAAGCACCCTTGTTGAA [...]
+Seq16 ATGAAAATTTTACGGAAAAACCACCCGCTACTTAAAATTGTTAATCACTCATTTATTGACCTCCCAACCCCATCCAACATCTCATCTTGATGAAATTTTGGGTCACTACTCGGTGTATGCCTAATAATTCAAATTCTGACTGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTTGCCCACATTTGCCGAGACGTGAACTACGGATGATTAATCCGCTACCTCCACGCCAACGGAGCCTCCATATTCTTTATCTGCCTTTTTATCCACGTAGGTCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCCTATTCCTTACAACTATGGCAACAGCATTCGTAGGGTATGTACTACCATGAGGACAAATATCTTTCTGAGGGGCTACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTTGTTGAA [...]
+Seq17 ATGAAAATTTTACGGAAAAATCACCCACTACTTAAAATTGTTAATCACTCATTCATTGACCTACCAACCCCATCTAGCATCTCGTCTTGATGGAATTTTGGGTCACTACTTGGTGTGTGTCTGATAATTCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCTCACATCTGTCGAGATGTAAACTACGGATGATTAATCCGCTACCTACATGCTAACGGGGCTTCCATATTCTTTATCTGTCTCTTCATCCACGTAGGCCGAGGGATCTATTACGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGTATCATTCTATTTCTTACAACTATAGCAACAGCATTCGTAGGCTATGTGTTACCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTTGTTGAA [...]
+Seq18 ATGAAAATTTTACGTAAAAATCACCCACTACTCAAAATTATAAATCACTCATTCATTGATCTGCCAGCTCCATCTAACATCTCATCCTGATGGAACTTTGGATCCCTACTTGGTACATGCCTAGTAATCCAAATCCTAACAGGCCTATTCTTAGCTATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCTCATATCTGCCGAGACGTAAACTACGGATGATTAATCCGCTACTTACACGCTAATGGAGCCTCCATATTCTTCATCTGCCTCTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTACGTCCTTTCAAAAACTTGAAATATCGGCATTATCTTATTCCTCACAACTATAGCAACAGCATTTGTGGGGTACGTACTTCCATGAGGACAAATATCCTTCTGAGGGGCCACTGTAATTACAAACCTCCTCTCAGCCATCCCCTACATCGGAAGCACCCTAGTTGAA [...]
+Seq19 ATGAAAATTTTACGGAAAAATCACCCGCTACTCAAAATTGTTAATCACTCATTCATTGACCTACCAACTCCATCTAACATCTCATCCTGATGAAATTTTGGATCCCTACTAGGCATATGCCTAATAATCCAAATTTTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCACATATCTGCCGAGATGTAAACTACGGATGATTAATCCGCTACTTGCACGCTAATGGAGCCTCCATATTCTTTATCTGCCTCTTCATCCACGTAGGCCGAGGTATTTACTATGGTTCCTATACCCTCTCAGAAACCTGAAACATTGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTTGTAGGATATGTACTCCCATGAGGACAAATATCCTTCTGGGGTGCCACCGTAATCACAAACCTCCTCTCAGCTATTCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq20 ATGAAAATCTTACGGAAAAATCATCCACTGCTTAAAATTGTTAATCACTCATTCATTGATCTACCAACTCCATCCAACATCTCATCCTGATGGAATTTTGGATCTCTTCTAGGAATATGCTTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAACTACGGATGGCTGATCCGCTACTTACACGCCAACGGGGCCTCCATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGGATTTACTACGGCTCCTACGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTACTTCTCACAACCATAGCAACAGCATTTGTGGGATACGTACTCCCATGAGGACAAATATCTTTTTGAGGGGCTACCGTAATCACAAACCTTCTTTCAGCCATCCCATACATCGGAAACACCCTAGTTGAA [...]
+Seq21 ATGAAAATTTTACGAAAAAATCACCCATTATTCAAAATTATTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATGTGCTTAATAATCCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGACGTGAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATTGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGGGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACATTAGTTGAA [...]
+Seq22 ATGAAAATCTTACGAAAAAATCATCCACTACTCAAAATTATTAATCATTCATTTATTGACCTACCAGCCCCATCTAACATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTGTATGCCTAATAATCCAAATTCTAACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCATATCTGTCGAGACGTAAATTACGGGTGATTAATCCGTTACCTACACGCTAATGGTGCCTCTATATTCTTCATCTGCCTTTTCATTCATGTAGGTCGAGGGATCTACTATGGCTCTTATGTACTCTCAGAAACTTGAAATATCGGCATTATCCTATTCCTCACAACTATAGCCACAGCATTCGTAGGGTATGTTCTTCCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq23 ATGAAAGTCTTACGAAAAAATCACCCACTACTCAAAATTGTTAATCACTCATTTATCGATCTACCAACCCCATCTAACATCTCATCCTGATGGAATTTCGGGTCCCTACTAGGCACATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACGTCAGATACAACCACAGCATTCTCCTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGATTAATCCGCTACTTACACGCTAACGGAGCCTCTATATTCTTTATCTGCCTCTTCATCCATGTAGGCCGAGGGATTTACTACGGCTCCTACATCCTCTCAGAAACCTGAAACATTGGCATCATCTTGTTTCTCACAACTATAGCAACAGCATTTGTAGGGTATGTACTTCCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTTTCAGCTATCCCCTACATTGGAAACACCTTAGTTGAA [...]
+Seq24 ATGAAAATTTTACGTAAAACTCACCCACTACTTAAAATTGTTAACCACTCATTCATTGACCTACCCACCCCATCTAACATCTCATCCTGATGAAACTTTGGATCCCTACTAGGCATGTGCCTAGTAATTCAAATTCTAACAGGCCTATTCCTAGCTATACACTACACATCAGACACAGCCACAGCATTTTCTTCAGTTGCCCACATCTGTCGAGATGTAAATTACGGATGATTAATCCGTTATCTACACGCCAACGGAGCTTCCATATTCTTCATCTGCCTTTTCATTCATGTAGGACGAGGAATCTACTATGGCTCCTATGTCCTTTCAGAAACCTGAAACATTGGAATTATCCTACTGCTAACTACTATAGCAACAGCATTTGTAGGATATGTTCTACCATGGGGACAAATATCATTCTGAGGCGCTACCGTAATCACAAACCTTCTCTCAGCAATTCCTTACATCGGAAATACCCTAGTTGAA [...]
+Seq25 ATGAAAATCTTACGAAAAAATCACCCACTACTCAAAATTATTAATCACTCATTCATTGATCTTCCAACTCCATCTAACATCTCATCCTGATGGAATTTCGGATCCCTACTAGGCATATGCCTAATGATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCCTCCATATTCTTTATTTGTCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGCCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTTCTCATAACTATAGCAACAGCATTTGTAGGATATGTACTCCCATGAGGACAAATATCCTTCTGAGGGGCTACTGTAATCACAAATCTCCTTTCAGCTATCCCCTACATCGGAAGCACCTTAGTTGAA [...]
+Seq26 ATGAAAATCTTACGGAAAAATCACCCACTACTCAAAATTGTTAATCACTCATTTATTGATCTACCAACTCCATCTAACATCTCATCCTGATGAAATTTTGGGTCTTTGCTAGGTATATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATGCACTACACATCAGACACAGCCACAGCATTCTCTTCAGTAGCCCATATCTGCCGAGACGTAAACTATGGTTGACTAATCCGCTACCTACACGCTAATGGAGCCTCTATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATCATCTTACTTCTCACAACCATAGCAACAGCATTCGTAGGATACGTACTTCCATGAGGACAAATATCCTTTTGAGGGGCTACTGTAATCACAAATCTCCTTTCAGCCATCCCCTACATTGGAAGCACCCTAGTCGAA [...]
+Seq27 ATGACAATTATACGAAAAACCCACCCGCTACTTAAAATTATTAACCACTCATTTATTGATCTCCCTACCCCCTCCAACATTTCATCTTGATGGAACTTTGGCTCACTTTTAGGTATTTGCCTAATCATTCAAATTTTAACTGGCCTCTTCCTGGCCATACACTACACATCCGACACAGCCACAGCATTCTCCTCCGTCACCCACATCTGCCGAGACGTAAACTATGGCTGGCTCATCCGTTATATACACGCCAACGGAGCATCCATATTTTTTATTTGCCTATTCATTCACGTAGGACGAGGAATCTACTACGGCTCCTACATGCTCTCAGAAACCTGAAACATTGGCATCATCCTACTCCTAACCACAATAGCCACAGCATTCGTAGGCTATGTTCTCCCATGAGGGCAAATATCCTTCTGAGGCGCCACAGTAATCACAAATTTACTATCAGCAATCCCCTATATCGGAACAACTCTAGTTGAA [...]
+Seq28 ATGAAAAT-TTACGAAAAAATCACCCATTATTCAAAATTATTAACCACTCATTCATTGACCTGCCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGATCTCTACTCGGTATGTGCTTAATAATCCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCATTCGTAGGGTATGTACTTCCATGGGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAACCTCCTCTCAGCAATCCCCTACATCGGAAGCACATTAGTTGAA [...]
+Seq29 ATGAAAATCTTACGGAAAAATCACCCGCTACTTAAAATTGTTAATCACTCATTTATTGATCTACCAACTCCATCCAACATCTCATCTTGATGGAACTTTGGGTCACTACTTGGTTTATGCCTAATAATCCAAATTCTGACCGGCCTATTCCTAGCTATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCTTCTATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGTATCATTCTATTCCTTACAACCATAGCAACAGCATTCGTAGGATATGTACTACCATGAGGACAAATATCTTTCTGAGGGGCTACTGTAATTACAAACCTCCTTTCAGCAATCCCCTACATCGGAAACACCCTTGTGGAA [...]
+Seq30 ATGAAAATTATACGAAAGAATCACCCCCTACTTAAAATTATTAACCACTCATTCATCGACCTACCAACCCCGTCCAACATCTCATCATGATGAAACTTTGGGTCCCTACTAGGTGCCTGCCTAATTATCCAAATCTTAACGGGCCTCTTTCTAGCCATACACTACACTTCAGATACAACCACAGCATTCTCCTCAGTAGCCCACATTTGCCGAGACGTAAATTACGGGTGATTAATTCGCTATCTACACGCCAACGGAGCCTCCATATTCTTCATCTGCCTATCCATCCACGCCGGCCGAGGAATTTACTACGGCTCCTACGTCCTTTCAGAAACCTGAAACATCGGTATCATCTTATTCCTTACAACCATAGCAACAGCATTTGTAGGTTATGTGCTTCCATGAGGACAAATATCCTTCTGAGGCGCTACCGTAATCACTAACCTTCTCTCAGCAATCCCCTACATCGGAAGCACTCTATTTGAA [...]
+Seq31 ATGACAATCATACGAAAAAACCACCCTTTACTTAAAATCATTAATCACTCGTTTATTGACCTGCCCACCCCTTCCAACATTTCATCCTGATGGAACTTCGGCTCACTCCTTGGCATTTGCTTAATAATTCAAATTTTAACTGGCCTCTTCCTAGCCATACATTATACGTCCGATACAGCTACAGCATTTTCCTCCGTCACCCATATCTGCCGAGACGTAAATTACGGATGACTTATCCGCTACTTACATGCCAATGGGGCATCTATATTTTTTATCTGCCTATTTATTCATGTAGGACGAGGTATCTACTACGGCTCCTACATACTTTCAGAAACATGAAACATCGGAATTATCCTATTCCTAACCACAATAGCCACAGCATTTGTAGGCTATGTTCTTCCATGGGGACAGATATCTTTCTGAGGGGCCACAGTAATTACAAACTTACTCTCAGCAATTCCCTATATTGGAACCTCTCTAGTTGAA [...]
+Seq32 ATGAAAATTCTACGGAAAAATCACCCACTACTTAAAATTGTTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAATTTTGGATCGCTACTCGGCGTATGCCTGATAATCCAAATTCTAACCGGTCTATTTCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAATGGAGCCTCCATATTCTTCATCTGTCTTTTTATTCATGTAGGTCGAGGAATCTACTACGGCTCTTATGTCCTCTCAGAAACCTGAAACATCGGCATCATTTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTATTACCATGAGGACAAATATCTTTCTGAGGAGCTACTGTAATCACAAATCTCCTTTCAGCGATTCCCTACATCGGAAGCACCCTTGTCGAA [...]
+Seq33 ATGAAAATTTTACGGAAAAACCACCCACTACTCAAAATTATTAATCACTCATTTATTGACCTACCAACTCCATCTAACATCTCATCCTGGTGAAATTTTGGATCCCTACTAGGCATATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATCTGCCGAGATGTAAATTACGGATGATTAATCCGCTATCTACACGCCAATGGAGCTTCTATATTCTTTATCTGCCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTACTACCATGAGGACAAATGTCTTTCTGAGGAGCCACTGTAATTACAAATCTCCTTTCAGCCATTCCCTACATCGGAAGCACCCTAGTTGAA [...]
+Seq34 ATGAAAATTTTACGAAAAAATCACCCCCTACTCAAAATTATTAATCACTCGTTCATCGACTTACCAACCCCATCCAACATCTCATCCTGATGAAATTTTGGATCCCTACTTGCCCTATGCCTAGCCATCCAAATCCTCACAGGCCTATTTCTAGCCATACATTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATCTGTCGAGATGTAAATTACGGATGATTAATCCGCTATCTACATGCTAACGGAGCCTCCATATTCTTCATCTGCCTTTTCATCCACGTGGGCCGAGGGATTTATTACGGCTCATATATCCTCTCAGAAACCTGAAACATCGGTATCATTCTATTCCTTACAACTATAGCAACTGCATTCGTAGGATATGTCCTCCCATGGGGACAAATATCTTTCTGAGGAGCCACTGTAATTACTAATCTCCTCTCAGCTATTCCTTACATCGGAAATACCCTAGTAGAA [...]
+Seq35 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTAAACCACTCATTTATTGACCTACCAACCCCATCTAATATTTCATCCTGATGAAATTTTGGGTCCCTACTCGGCGTATGCTTAATTATTCAAATCCTAACCGGTTTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCCTCCATATTCTTCATCTGTCTTTTCATTCACGTGGGTCGAGGAATCTATTATGGTTCCTATATCCTCTCAGAAACCTGAAACATCGGCATCATTCTATTCCTTACAACTATAGCAACAGCATTTGTAGGATATGTACTACCATGAGGACAGATATCCTTTTGAGGAGCTACCGTAATCACGAATCTTCTATCAGCAATTCCCTACATCGGAAACACCCTTGTTGAA [...]
+Seq36 ATGAAAATTTTACGGAAGAATCACCCGCTACTCAAAATTGTTAATCATTCATTTATCGACCTTCCAACTCCATCGAACATCTCATCCTGATGAAATTTTGGATCCCTACTAGGCATATGCCTAATAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAATTACGGATGATTAATCCGCTACTTGCACGCTAATGGAGCCTCCATATTCTTTATCTGCCTCTTCATCCACGTAGGCCGAGGTATTTACTATGGTTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTGGGGTATGTACTCCCATGAGGACAAATATCCTTCTGAGGTGCCACCGTAATCACAAACCTCCTCTCAGCCATCCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq37 ATGAAAATTTTACGGAAAAACCACCCACTACTCAAAATTATTAATCACTCATTCATTGACTTACCAACTCCATCTAACATCTCATCCTGATGAAATTTCGGATCCCTACTAGGCATATGCTTAGTGATCCAAATTCTAACAGGCCTGTTCCTAGCCATACACTATACATCCGACACAACTACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAACGGAGCCTCTATATTCTTCATCTGCCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTATTACCATGAGGACAAATGTCCTTCTGAGGGGCCACTGTAATCACAAACCTCCTTTCAGCCATCCCATACATCGGAACCACCCTAGTTGAA [...]
+Seq38 ATGAAAATTTTACGAAAAAATCACCCATTACTCAAAATTATTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATATGCTTAATAATTCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACACTAGTCGAA [...]
+Seq39 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTAAATCACTCATTCATTGACTTACCAACCCCATCCAACATCTCATCTTGATGAAACTTTGGGTCACTACTCGGTGTATGCCTAATAATCCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCTTCCATATTCTTTATCTGCCTTTTCATCCATGTAGGCCGAGGAATCTATTACGGCTCCTATGTCCTCTCAGAAACCTGAAACGTCGGTATCATCCTATTCCTCACAACTATAGCAACAGCATTCGTAGGGTACGTGTTACCATGAGGACAAATATCTTTCTGAGGAGCTACCGTAATTACAAACCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTCGTCGAA [...]
+Seq40 ATGAAAATTTTACGAAAAAATCACCCATTACTCAAAATTATTAATCACTCATTCATTGACCTGCCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATGTGCTTAATAATTCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACACTAGTCGAA [...]
+Seq41 ATGACAATCATACGAAAAAACCACCCTTTACTTAAAATCATTAATCACTCGTTTATTGACCTGCCCACCCCTTCCAACATTTCATCCTGATGGAACTTCGGCTCACTCCTTGGCATTTGCTTAATAATTCAAATTTTAACTGGCCTCTTCCTAGCCATACATTATACGTCCGATACAGCTACAGCATTCTCCTCCGTCACCCATATCTGCCGAGACGTAAATTACGGATGACTTATCCGCTACTTACATGCCAATGGGGCATCTATATTTTTTATCTGCCTATTTATTCATGTAGGACGAGGTATCTACTACGGCTCCTACATACTCTCAGAAACCTGAAACATCGGAATTATCCTATTCCTAACCACAATAGCCACAGCATTCGTAGGCTATGTTCTTCCATGAGGACAGATATCTTTCTGAGGAGCCACAGTAATTACAAACTTGCTCTCAGCAATTCCTTATATTGGAACCTCTCTAGTTGAA [...]
+Seq42 ATGACCAACATCCGAAAGACCCACCCACTAATAAAAATTATTAACAATGCATTCATTGACCTCCCTGCCCCATCAAACATCTCATCGTGATGAAATTTTGGCTCCCTTCTAGGCATCTGCCTAATCCTACAGATCCTAACAGGACTATTTCTAGCAATACACTACACATCTGATACAACAACAGCATTCTCCTCTGTCACCCACATTTGCCGAGACGTCAACTATGGCTGAATCATCCGATATATACACGCAAACGGAGCCTCAATATTTTTTATCTGCCTATTCCTACATGTAGGACGAGGCCTATATTACGGATCCTACGCCTTCCTAGAAACATGAAACGTCGGAGTAATCCTTTTATTCGCAACAATGGCCACAGCATTTATGGGCTACGTTCTGCCATGAGGACAAATATCATTCTGAGGGGCAACAGTCATCACTAATCTCCTCTCAGCAATCCCATATATTGGCACAGACCTAGTAGAA [...]
+Seq43 ATGACCAACATACGAAAAACTCATCCTTTAATAAAAATCATTAACGAGTCTTTCATTGACCTTCCTACCCCATCTAACATCTCTGCATGATGGAACTTCGGCTCTCTTTTAGGATTATGCCTTGTAATTCAGATTCTCACAGGACTTTTCCTAGCCATACATTACACCTCCGACACCACAACAGCCTTCTCATCAGTCACCCATATCTGCCGAGACGTAAACTACGGATGATTAATCCGATACATACATGCTAACGGAGCTTCAATATTCTTCATCTGCCTCTTCCTCCACGTAGGACGAGGCCTGTACTATGGGTCATACACTTTCATTGAAACCTGAAACATTGGAGTACTATTATTATTCACTGTAATAGCAACAGCCTTTATAGGCTACGTCTTACCATGAGGACAAATATCCTTCTGAGGAGCCACAGTAATTACAAACCTCCTATCCGCTATCCCCTATATCGGCACAACCCTGGTAGAA [...]
+Seq44 ATGACAACCCCCCGCAAAACACATCCACTAGCAAAAATCATTAACAACTCATTCATTGATCTCCCCACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTGATTATCCAAATTACTACAGGTCTATTCTTAGCCATACACTACACACCAGACACCTCAACTGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTCATCTGCCTCTTCCTCCACATTGGCCGAGGCCTATACTATGGATCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGGGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTATCAGCCATCCCATACATCGGATCTGACCTCGTACAA [...]
+Seq45 ATGACTACCCCCCGCAAGACACATCCACTAACAAAAATCATTAACAACTCATTCATTGATCTCCCCACACCATCCAACATTTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATCCAAATCACTACAGGTCTATTCCTAGCCATACATTATACACCAGACACTTCAACTGCCTTCTCCTCGGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAACGGCGCTTCCATATTCTTCATCTGCCTATTCCTCCACATTGGCCGAGGCTTATATTACGGGTCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTACGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTATCAGCCATCCCATACATCGGATCTGACCTCGTACAA [...]
+Seq46 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGGTGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACCGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTACCTACATGCCAACGGCGCCTCCATATTCTTTATCTGCCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTGTCAGCCATTCCATATATCGGGTCTGACCTCGTACAA [...]
+Seq47 ATGACTACTCCCCGCAAAACACATCCACTAGCAAAAATCATCAACAACTCATTCATTGATCTCCCTACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACTGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTCATCTGCCTCTTCCTTCACATTGGCCGAGGCCTATATTATGGATCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTTCTACTCACAACTATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTCCTATCAGCCATCCCATACATCGGACCTGACCTCGTACAA [...]
+Seq48 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGGTGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACCGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTATCTACATGCCAACGGCGCCTCCATATTCTTTATCTGCCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGGAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTGTCAGCCATTCCATATATCGGGTCTGACCTCGTACAA [...]
+Seq49 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGCATTTGCCTCATTATTCAAATTACTACAGGCCTATTCTTAGCCATACACTATACACCAGATACTTCAACCGCCTTCTCTTCAGTCGCTCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTTATCTGTCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGCGCCACAGTAATTACAAACCTTCTGTCAGCCATCCCATATATCGGATCTGACCTTGTACAA [...]
diff --git a/testData/49.model b/testData/49.model
new file mode 100644
index 0000000..c586c71
--- /dev/null
+++ b/testData/49.model
@@ -0,0 +1,4 @@
+DNA, gene1 = 1-300
+DNA, gene2 = 301-900
+DNA, gene3 = 901-1100
+DNA, gene4 = 1101-1200
diff --git a/testData/49.tree b/testData/49.tree
new file mode 100644
index 0000000..ce35154
--- /dev/null
+++ b/testData/49.tree
@@ -0,0 +1 @@
+(Seq14,(((Seq10,Seq42),((Seq47,(Seq44,Seq45)),(Seq49,(Seq48,Seq46)))),(((((((Seq2,((Seq6,Seq3),Seq7)),Seq4),Seq5),(Seq9,Seq8)),(Seq13,(Seq11,Seq12))),((Seq27,(Seq31,Seq41)),((((Seq22,((Seq38,Seq40),(Seq28,Seq21))),(((Seq34,Seq30),((Seq20,Seq26),(((Seq19,Seq36),(Seq37,Seq33)),(Seq25,Seq23)))),(Seq17,(Seq29,(((Seq16,Seq15),Seq39),(Seq35,Seq32)))))),Seq18),Seq24))),Seq43)),Seq1);
diff --git a/versionHeader/version.h b/versionHeader/version.h
new file mode 100644
index 0000000..b5973b3
--- /dev/null
+++ b/versionHeader/version.h
@@ -0,0 +1,4 @@
+#define programName        "ExaML"
+#define programVersion     "3.0.18"
+#define programVersionInt  3018
+#define programDate        "February 14 2017"

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/examl.git


