[med-svn] [Git][med-team/metaphlan2][upstream] New upstream version 2.7.5

Andreas Tille gitlab at salsa.debian.org
Fri Feb 16 10:19:04 UTC 2018


Andreas Tille pushed to branch upstream at Debian Med / metaphlan2


Commits:
4f4e0b55 by Andreas Tille at 2018-02-16T10:37:33+01:00
New upstream version 2.7.5
- - - - -


21 changed files:

- .hg_archival.txt
- + .hgignore
- .hgsubstate
- .hgtags
- README.md
- + __init__.py
- + _metaphlan2.py
- − db_v20/mpa_v20_m200.pkl
- metaphlan2.py
- + plugin_setup.py
- strainphlan.py
- strainphlan_src/add_metadata_tree.py
- strainphlan_src/compute_distance.py
- + strainphlan_src/compute_distance_all.py
- + strainphlan_src/plot_tree_ete2.py
- strainphlan_src/sample2markers.py
- strainphlan_tutorial/step2_fastq2sam.sh
- utils/extract_markers.py
- − utils/markers_info.txt.bz2
- utils/metaphlan_hclust_heatmap.py
- + utils/read_fastx.py


Changes:

=====================================
.hg_archival.txt
=====================================
--- a/.hg_archival.txt
+++ b/.hg_archival.txt
@@ -1,4 +1,4 @@
-repo: b4e7c5505112b08d33dd30f4788429ba023e67f0
-node: c43e40a443edbd3c4cac7349d2679540578096f5
+repo: 092c2fe2278cb7f0b18d81faeb4aab98b89dc096
+node: b2f9b3286d4be376805e3b5c26cf141ed375c605
 branch: default
-tag: 2.6.0
+tag: 2.7.5


=====================================
.hgignore
=====================================
--- /dev/null
+++ b/.hgignore
@@ -0,0 +1,7 @@
+syntax: glob
+
+databases/
+*.pyc
+build/
+dist/
+*.egg-info/


=====================================
.hgsubstate
=====================================
--- a/.hgsubstate
+++ b/.hgsubstate
@@ -1,2 +1,2 @@
-f8823b8162ddea6533866afd27d5ed1ce6ff22e0 utils/export2graphlan
-0d8cb18ce9996e7ce4043a00294aeb2ed9bfa5f2 utils/hclust2
+c168a100f37e23e2c110849a8d91fac8da49f5bd utils/export2graphlan
+35dfd725e7f024fc6d0edef0cc191c7963108787 utils/hclust2


=====================================
.hgtags
=====================================
--- a/.hgtags
+++ b/.hgtags
@@ -1,9 +1,13 @@
-b4e7c5505112b08d33dd30f4788429ba023e67f0 2.0_alpha1
-60d254d499e2dd1a8b1cfe344236efa47f823ec6 2.0_beta1
-1b6df65b5a3e9feed0179f855c11fd197fe9a64f 2.0_beta2
-12cceaad3493085c4497898aaeff691913ddb633 2.0_beta3
-616a7debe7937672940130e6c5b26a9ef9e76fcd 2.0.0
-3959b668bbed6150698b594cbbc30a924e5d30e1 2.1.0
-0ef29ae841f52b53176ca264fb9f52f98713eb3c 2.2.0
-5424bb911dfcdb7212ea0949d4faeb6e69cfa61f 2.3.0
-6f2a1673af8565e93fb8e69238141889b7c87361 2.5.0
+092c2fe2278cb7f0b18d81faeb4aab98b89dc096 2.0_alpha1
+c5c90e145ff40fb2fc3827651d572aa9a724ba31 2.0_beta1
+7168beb9750d223736888cefa387252d019f6a10 2.0_beta2
+46a5e65865233da6d05ebded2700f1854bab9878 2.0_beta3
+9943159669e10c4943c1d3440866e93ce536617a 2.0.0
+56fbf765ffaa2b851c0bbd17f26083a6710a404e 2.1.0
+97d782790746b6a1124fac7cadeca7bfa9f797cf 2.2.0
+f3325ec17035523cf9f2ea7736afcc119bd94a89 2.3.0
+e424931b4d94d50cf62381c79c335935415b33b9 2.5.0
+6d6433aa0f6856bff2e84a757b4084736bc3738f 2.6.0
+8963e486f79043c79a299f7a684e4550b0115c32 2.7.0
+d8ab9ca4244c09a7a4995042a99fbba1e3598ac0 2.7.1
+a1fe0d15320c04f69d56f1b7dd31cff972a7b8df 2.7.2


=====================================
README.md
=====================================
--- a/README.md
+++ b/README.md
@@ -1,13 +1,12 @@
 [TOC]
 
-#**MetaPhlAn 2: Metagenomic Phylogenetic Analysis**#
+# MetaPhlAn 2: Metagenomic Phylogenetic Analysis
 
-AUTHORS: Duy Tin Truong (duytin.truong at unitn.it), Nicola Segata (nicola.segata at unitn.it)
 
-##**Description**##
-MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution. From version 2.0, MetaPhlAn is also able to identify specific strains (in the not-so-frequent cases in which the sample contains a previously sequenced strains) and to track strains across samples for all species.
+## Description
+MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
 
-MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file can be found at src/utils/markers_info.txt.bz2 or here](https://bitbucket.org/biobakery/metaphlan2/src/473a41eba501df5f750da032d4f04b38db98dde1/utils/markers_info.txt.bz2?at=default)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
+MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file mpa_v20_m200_marker_info.txt.bz2 can be found in the Download page here](https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v20_m200_marker_info.txt.bz2)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
 
 * unambiguous taxonomic assignments;
 * accurate estimation of organismal relative abundance;
@@ -16,15 +15,38 @@ MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker inform
 * orders of magnitude speedups compared to existing methods.
 * metagenomic strain-level population genomics
 
-If you use this software, please cite :
+If you use MetaPhlAn version 1, please cite:
 
-[**MetaPhlAn2 for enhanced metagenomic taxonomic profiling.**](http://www.nature.com/nmeth/journal/v12/n10/pdf/nmeth.3589.pdf)
- *Duy Tin Truong, Eric A Franzosa, Timothy L Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower & Nicola Segata*. 
-Nature Methods 12, 902–903 (2015)
+[**Metagenomic microbial community profiling using unique clade-specific marker genes.**](https://www.nature.com/articles/nmeth.2066) *Nicola Segata, Levi Waldron, Annalisa Ballarini, Vagheesh Narasimhan, Olivier Jousson & Curtis Huttenhower*. Nature Methods 9, 811-814 (2012)
+
+If you use MetaPhlAn2, please cite:
+
+[**MetaPhlAn2 for enhanced metagenomic taxonomic profiling.**](http://www.nature.com/nmeth/journal/v12/n10/pdf/nmeth.3589.pdf) *Duy Tin Truong, Eric A Franzosa, Timothy L Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower & Nicola Segata*. Nature Methods 12, 902-903 (2015)
+
+If you use StrainPhlAn, please cite the MetaPhlAn2 paper and the following StrainPhlAn paper:
+
+[**Microbial strain-level population structure and genetic diversity from metagenomes.**](http://genome.cshlp.org/content/27/4/626.full.pdf) *Duy Tin Truong, Adrian Tett, Edoardo Pasolli, Curtis Huttenhower, & Nicola Segata*. Genome Research 27:626-638 (2017)
 
 -------------
 
-##**Pre-requisites**##
+## MetaPhlAn and StrainPhlAn resources
+
+In addition to the information on this page, you can refer to the following additional resources.
+
+* The [MetaPhlAn2 tutorial on bioBakery](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
+
+* The [StrainPhlAn tutorial on bioBakery](https://bitbucket.org/biobakery/biobakery/wiki/strainphlan).
+
+* The MetaPhlAn2 and StrainPhlAn [Google Group](http://groups.google.com/forum/#!forum/metaphlan-users) ([metaphlan-users at googlegroups.com](mailto:metaphlan-users at googlegroups.com))
+
+* Related tools including [PanPhlAn](https://bitbucket.org/CibioCM/panphlan/src) (and its [tutorial](https://bitbucket.org/CibioCM/panphlan/wiki/Home)), [GraPhlAn](https://bitbucket.org/nsegata/graphlan/wiki/Home) (and its [tutorial](https://bitbucket.org/biobakery/biobakery/wiki/graphlan)), [PhyloPhlAn2](https://bitbucket.org/nsegata/phylophlan/wiki/Home) (and its [tutorial](https://bitbucket.org/biobakery/biobakery/wiki/phylophlan)), [HUMAnN2](https://bitbucket.org/biobakery/humann2/wiki/Home) (and its [tutorial](https://bitbucket.org/biobakery/biobakery/wiki/humann2)).
+
+* The related [bioBakery workflows](https://bitbucket.org/biobakery/biobakery/wiki/biobakery_workflows)
+
+
+-------------
+
+## Pre-requisites
 
 MetaPhlAn requires *python 2.7* or higher with argparse, tempfile and [*numpy*](http://www.numpy.org/) libraries installed 
   (apart for numpy they are usually installed together with the python distribution). 
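+
+As a quick optional check (a minimal sketch, not part of the tutorial itself), the core Python dependencies can be verified from a Python console:
+
+```
+#!python
+# raises ImportError if a required library is missing
+import argparse
+import tempfile
+import numpy
+print(numpy.__version__)
+```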
@@ -42,11 +64,11 @@ MetaPhlAn requires *python 2.7* or higher with argparse, tempfile and [*numpy*](
 
 ----------------------
 
-##**Installation**##
+## Installation
 
 MetaPhlAn 2.0 can be obtained by either
 
-[Downloading MetaPhlAn v2.0](https://bitbucket.org/biobakery/metaphlan2/get/default.zip)  
+Downloading MetaPhlAn v2.0 [here](https://bitbucket.org/biobakery/metaphlan2/get/default.zip) or [here](https://www.dropbox.com/s/ztqr8qgbo727zpn/metaphlan2.zip?dl=0)
 
 **OR**
 
@@ -56,13 +78,14 @@ Cloning the repository via the following commands
 --------------------------
 
 
-##**Basic Usage**##
+## Basic Usage
 
 This section presents some basic usages of MetaPhlAn2; for more advanced usages, please see [its wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
 
 We assume here that ``metaphlan2.py`` is in the system path and that the ``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set these two variables by moving to your MetaPhlAn2 local folder and typing:
+
 ```
-#!cmd
+#!bash
 $ export PATH=`pwd`:$PATH
 $ export mpa_dir=`pwd`
 ```
@@ -70,28 +93,28 @@ $ export mpa_dir=`pwd`
 Here is the basic example to profile a metagenome from raw reads (requires BowTie2 in the system path with execution and read permissions, Perl installed). 
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py metagenome.fastq --input_type fastq > profiled_metagenome.txt
 ```
 
 It is highly recommended to save the intermediate BowTie2 output for re-running MetaPhlAn extremely quickly (--bowtie2out), and use multiple CPUs (--nproc) if available:
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
 ```
 
 If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you can obtain the results in a few seconds by using the previously saved --bowtie2out file and specifying the input (--input_type bowtie2out):
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > profiled_metagenome.txt
 ```
 
 You can also provide an externally BowTie2-mapped SAM if you specify this format with --input_type. Two steps here: first map your metagenome with BowTie2 and then feed MetaPhlAn2 with the obtained sam:
 
 ```
-#!cmd
+#!bash
 $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq
 $ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
 ```
@@ -99,40 +122,40 @@ $ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
 In order to make MetaPhlAn 2 easily compatible with complex metagenomic pipelines, there are now multiple alternative ways to pass the input:
 
 ```
-#!cmd
+#!bash
 $ cat metagenome.fastq | metaphlan2.py --input_type fastq > profiled_metagenome.txt
 ```
 
 ```
-#!cmd
+#!bash
 $ tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2.py --input_type fastq --bowtie2db ${mpa_dir}/db_v20/mpa_v20_m200 > profiled_metagenome.txt
 ```
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py --input_type fastq < metagenome.fastq > profiled_metagenome.txt
 ```
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py --input_type fastq <(bzcat metagenome.fastq.bz2) > profiled_metagenome.txt
 ```
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py --input_type fastq <(zcat metagenome_1.fastq.gz metagenome_2.fastq.gz) > profiled_metagenome.txt
 ```
 
 MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not use the paired-end information), and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
 
 ```
-#!cmd
+#!bash
 $ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
 ```
 
 For advanced options and other analysis types (such as strain tracking) please refer to the full command-line options.
 
-##**Full command-line options**##
+## Full command-line options
 
 
 ```
@@ -357,35 +380,36 @@ Other arguments:
 
 ```
 
-##**Utility Scripts**##
+## Utility Scripts
 
 MetaPhlAn's repository features a few utility scripts to aid in manipulation of sample output and its visualization. These scripts can be found under the ``utils`` folder in the metaphlan2 directory.
 
-###**Merging Tables**###
+### Merging Tables
 
-The script **merge_metaphlan_tables.py** allows to combine MetaPhlAn output from several samples to be merged into one table Bugs (rows) vs Samples (columns) with the table enlisting the relative normalized abundances per sample per bug.
+The script **merge\_metaphlan\_tables.py** combines MetaPhlAn output from several samples into one table of Bugs (rows) vs. Samples (columns), listing the relative normalized abundance of each bug in each sample.
 
 To merge multiple output files, run the script as below
 
 ```
-#!cmd
+#!bash
 $ python utils/merge_metaphlan_tables.py metaphlan_output1.txt metaphlan_output2.txt metaphlan_output3.txt > output/merged_abundance_table.txt
 ```
 
 Wildcards can be used as needed:
+
 ```
-#!cmd
+#!bash
 $ python utils/merge_metaphlan_tables.py metaphlan_output*.txt > output/merged_abundance_table.txt
 ```
 
 **There is no limit to how many files you can merge.**
 
-##**Heatmap Visualization**##
+## Heatmap Visualization
 
-The script **metaphlan_hclust_heatmap.py** allows to visualize the MetaPhlAn results in the form of a hierarchically-clustered heatmap. To generate the heatmap for a merged MetaPhlAn output table (as described above), please run the script as below.
+The script **metaphlan\_hclust\_heatmap.py** visualizes the MetaPhlAn results in the form of a hierarchically-clustered heatmap. To generate the heatmap for a merged MetaPhlAn output table (as described above), please run the script as below.
 
 ```
-#!cmd
+#!bash
 $ python utils/metaphlan_hclust_heatmap.py -c bbcry --top 25 --minv 0.1 -s log --in output/merged_abundance_table.txt --out output_images/abundance_heatmap.png
 ```
 
@@ -393,8 +417,6 @@ For detailed command-line instructions, please refer to below:
 
 
 ```
-#!
-
 $ utils/metaphlan_hclust_heatmap.py -h
 usage: metaphlan_hclust_heatmap.py [-h] --in INPUT_FILE --out OUTPUT_FILE
                                    [-m {single,complete,average,weighted,centroid,median,ward}]
@@ -467,41 +489,34 @@ optional arguments:
                         Set the colormap. Default is "jet".
 ```
 
-###**GraPhlAn Visualization**###
+### GraPhlAn Visualization
 
 The tutorial of using GraPhlAn can be found from [the MetaPhlAn2 wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
 
 
-##**Customizing the database**##
+## Customizing the database
 In order to add a marker to the database, the user needs to follow these steps:
 
 * Reconstruct the marker sequences (in fasta format) from the MetaPhlAn2 bowtie2 database by:
 
 ```
 #!bash
-
 bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
-
 ```
 
-
 * Add the marker sequence stored in a file new_marker.fasta to the marker set:
 
 ```
 #!bash
-
 cat new_marker.fasta >> metaphlan2/markers.fasta
-
 ```
 
 * Rebuild the bowtie2 database:
 
 ```
 #!bash
-
 mkdir metaphlan2/db_v21/mpa_v21_m200
 bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
-
 ```
 
 * Assume that the new marker was extracted from genome1, genome2. Update the taxonomy file from python console as follows:
@@ -536,15 +551,15 @@ pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)
 ofile.close()
 ```
 
-* To use the new database, switch to metaphlan2/db_v21 instead of metaphlan2/db_v20 when running metaphlan2.py with option "--mpa_pkl".
+* To use the new database, switch to metaphlan2/db_v21 instead of metaphlan2/db\_v20 when running metaphlan2.py with option "--mpa\_pkl".
 
 
-##**Metagenomic strain-level population genomics**##
+## Metagenomic strain-level population genomics
 
 StrainPhlAn is a computational tool for tracking individual strains across large set of samples. **The input** of StrainPhlAn is a set of metagenomic samples and for each species, **the output** is a multiple sequence alignment (MSA) file of all species strains reconstructed directly from the samples. From this MSA, StrainPhlAn calls RAxML (or other phylogenetic tree builders) to build the phylogenetic tree showing the strain evolution of the sample strains. 
 For each sample, StrainPhlAn extracts the strain of a specific species by merging and concatenating all reads mapped against that species markers in the MetaPhlAn2 database.
 
-In detail, let us start from a toy example with 6 HMP gut metagenomic samples (SRS055982-subjectID_638754422, SRS022137-subjectID_638754422, SRS019161-subjectID_763496533, SRS013951-subjectID_763496533, SRS014613-subjectID_763840445, SRS064276-subjectID_763840445) from 3 three subjects (each was sampled at two time points) and one *Bacteroides caccae* genome G000273725. 
+In detail, let us start from a toy example with 6 HMP gut metagenomic samples (SRS055982-subjectID\_638754422, SRS022137-subjectID\_638754422, SRS019161-subjectID\_763496533, SRS013951-subjectID\_763496533, SRS014613-subjectID\_763840445, SRS064276-subjectID\_763840445) from three subjects (each was sampled at two time points) and one *Bacteroides caccae* genome G000273725. 
 **We would like to**:
 
 * extract the *Bacteroides caccae* strains from these samples and compare them with the reference genome in a phylogenetic tree.
@@ -555,33 +570,27 @@ Running StrainPhlAn on these samples, we will obtain the *Bacteroides caccae* ph
 ![tree_alignment.png](https://bitbucket.org/repo/rM969K/images/476974413-tree_alignment.png)
 
 We can see that the strains from the same subject are grouped together. The tree also highlights that the strains from subject "763840445" (red color) do not change between two sampling time points whereas the strains from the other subjects have slightly evolved. From the tree, we also know that the strains of subject "763496533" are closer to the reference genome than those of the others. 
-In addition, the table below shows the number of snps between the sample strains and the reference genome based on the strain alignment returned by MetaPhlAN_Strainer.
+In addition, the table below shows the number of snps between the sample strains and the reference genome based on the strain alignment returned by MetaPhlAn\_Strainer.
 
 ![snp_distance.png](https://bitbucket.org/repo/rM969K/images/1771497600-snp_distance.png)
 
-In the next sections, we will illustrate step by step how to run MetaPhlAn_Strainer on this toy example to reproduce the above figures.
+In the next sections, we will illustrate step by step how to run MetaPhlAn\_Strainer on this toy example to reproduce the above figures.
 
-### Pre-requisites ###
+### Pre-requisites
 StrainPhlAn requires *python 2.7* and the libraries [pysam](http://pysam.readthedocs.org/en/latest/) (tested on **version 0.8.3**), [biopython](http://biopython.org/wiki/Main_Page), [msgpack](https://pypi.python.org/pypi/msgpack-python) and [numpy](http://www.numpy.org/), [dendropy](https://pythonhosted.org/DendroPy/) (tested on version **3.12.0**). Besides, StrainPhlAn also needs the following programs in the executable path:
 
 * [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) for mapping reads against the marker database.
-
 * [MUSCLE](http://www.drive5.com/muscle/) for the alignment step.
-
 * [samtools, bcftools and vcfutils.pl](http://samtools.sourceforge.net/) which can be downloaded from [here](https://github.com/samtools) for building consensus markers. Note that vcfutils.pl is included in bcftools and **StrainPhlAn only works with samtools version 0.1.19** as samtools has changed the output format after this version.
-
 * [blastn](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) for adding reference genomes to the phylogenetic tree.
-
 * [raxmlHPC and raxmlHPC-PTHREADS-SSE3](http://sco.h-its.org/exelixis/web/software/raxml/index.html) for building the phylogenetic trees.
 
 All dependence binaries on Linux 64 bit can be downloaded in the folder "bin" from [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
 
 The script files in folder "strainphlan_src" should be changed to executable mode by:
 
-
 ```
 #!python
-
 chmod +x strainphlan_src/*.py
 ```
 
@@ -589,11 +598,10 @@ and add to the executable path:
 
 ```
 #!python
-
 export PATH=$PATH:$(pwd -P)/strainphlan_src
 ```
 
-### Usage ###
+### Usage
 
 Let's reproduce the toy example result in the introduction section. Note that all the commands to run the below steps are in the "strainer_tutorial/step?*.sh" files (? corresponding to the step number). All the below steps are executed under the "strainer_tutorial" folder.
 The steps are as follows:
@@ -608,7 +616,6 @@ The commands to run are:
 
 ```
 #!python
-
 mkdir -p sams
 for f in $(ls fastqs/*.bz2)
 do
@@ -629,7 +636,6 @@ The commands to run are:
 
 ```
 #!python
-
 mkdir -p consensus_markers
 cwd=$(pwd -P)
 export PATH=${cwd}/../strainphlan_src:${PATH}
@@ -640,27 +646,26 @@ The result is the same if you want run several sample2markers.py scripts in para
 After this step, you will have a folder "consensus_markers" containing all sample-marker files (\*.markers).
 This step will take around 44 minutes.  If you want to skip this step, you can download the consensus marker files from the folder "consensus_markers" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
 
-Step 4. Extract the markers of *Bacteroides_caccae* from MetaPhlAn2 database (to add its reference genome later):
+Step 4. Extract the markers of *Bacteroides\_caccae* from MetaPhlAn2 database (to add its reference genome later):
 
 This step will extract the markers of *Bacteroides_caccae* from the database and then StrainPhlAn will identify the sequences in the reference genomes that are closest to them (in the next step, by using blast). Those will be concatenated and referred to as *reference-genome-reconstructed strains*. 
 The commands to run are:
 
 ```
 #!python
-
 mkdir -p db_markers
 bowtie2-inspect ../db_v20/mpa_v20_m200 > db_markers/all_markers.fasta
 python ../strainphlan_src/extract_markers.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_markers db_markers/all_markers.fasta --clade s__Bacteroides_caccae --ofn_markers db_markers/s__Bacteroides_caccae.markers.fasta
 ```
 
-Note that the "all_markers.fasta" file consists can be reused for extracting other reference genomes. 
-After this step, you should have two files in folder "db_markers": "all_markers.fasta" containing all marker sequences, and "s__Bacteroides_caccae.markers.fasta" containing the markers of *Bacteroides caccae*.
-This step will take around 1 minute and can skipped if you do not need to add the reference genomes to the phylogenetic tree. Those markers can be found in the folder "db_markers" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
+Note that the "all\_markers.fasta" file can be reused for extracting the markers of other reference genomes. 
+After this step, you should have two files in the folder "db\_markers": "all\_markers.fasta" containing all marker sequences, and "s\_\_Bacteroides\_caccae.markers.fasta" containing the markers of *Bacteroides caccae*.
+This step will take around 1 minute and can be skipped if you do not need to add the reference genomes to the phylogenetic tree. Those markers can be found in the folder "db\_markers" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
 
 Before building the trees, we should get the list of all clades detected from the samples and save them in the "output/clades.txt" file by the following command:
+
 ```
 #!python
-
 python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --output_dir output --nprocs_main 10 --print_clades_only > output/clades.txt
 ```
 
@@ -675,31 +680,28 @@ The commands to run are:
 
 ```
 #!python
-
 mkdir -p output
 python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --ifn_markers db_markers/s__Bacteroides_caccae.markers.fasta --ifn_ref_genomes reference_genomes/G000273725.fna.bz2 --output_dir output --nprocs_main 10 --clades s__Bacteroides_caccae &> output/log_full.txt
 ```
 
-This step will take around 2 minutes. After this step, you will find the tree "output/RAxML_bestTree.s__Bacteroides_caccae.tree". All the output files can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
+This step will take around 2 minutes. After this step, you will find the tree "output/RAxML\_bestTree.s\_\_Bacteroides\_caccae.tree". All the output files can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
 You can view it by [Archaeopteryx](https://sites.google.com/site/cmzmasek/home/software/archaeopteryx) or any other viewers.
 
-By default, if you do not specify reference genomes (by --ifn_ref_genomes) and any specific clade (by --clades), strainphlan.py will build the phylogenetic trees for all species that it can detect.
+By default, if you do not specify reference genomes (by --ifn\_ref\_genomes) and any specific clade (by --clades), strainphlan.py will build the phylogenetic trees for all species that it can detect.
 
-In order to add the metadata, we also provide a script called "add_metadata_tree.py" which can be used as follows:
+In order to add the metadata, we also provide a script called "add\_metadata\_tree.py" which can be used as follows:
 
 ```
 #!python
-
 python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
 ```
 
-The script "add_metadata_tree.py" can accept multiple metadata files (space separated, wild card can also be used) and multiple trees. A metadata file is a tab separated file where the first row is the meta-headers, and the following rows contain the metadata for each sample. Multiple metadata files are used in the case where your samples come from more than one dataset and you do not want to merge the metadata files.
-For more details of using "add_metadata_tree.py", please see its help (with option "-h").
+The script "add\_metadata\_tree.py" can accept multiple metadata files (space separated; wildcards can also be used) and multiple trees. A metadata file is a tab-separated file where the first row is the meta-headers, and the following rows contain the metadata for each sample. Multiple metadata files are used in the case where your samples come from more than one dataset and you do not want to merge the metadata files.
+For more details of using "add\_metadata\_tree.py", please see its help (with option "-h").
 An example of a metadata file is the "fastqs/metadata.txt" file with the below content:
 
 ```
 #!python
-
 sampleID        subjectID
 SRS055982       638754422
 SRS022137       638754422
@@ -719,11 +721,10 @@ If you have installed [graphlan](https://bitbucket.org/nsegata/graphlan/wiki/Hom
 
 ```
 #!python
-
 python ../strainphlan_src/plot_tree_graphlan.py --ifn_tree output/RAxML_bestTree.s__Bacteroides_caccae.tree.metadata --colorized_metadata subjectID
 ```
 
-and obtain the following figure (output/RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png):
+and obtain the following figure (output/RAxML\_bestTree.s\_\_Bacteroides\_caccae.tree.metadata.png):
 
 ![RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png](https://bitbucket.org/repo/rM969K/images/1574126761-RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png)
 
@@ -731,34 +732,33 @@ Step 6. If you want to remove the samples with high-probability of containing mu
 
 ```
 #!python
-
 python ../strainphlan_src/build_tree_single_strain.py --ifn_alignments output/s__Bacteroides_caccae.fasta --nprocs 10 --log_ofn output/build_tree_single_strain.log
 python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.remove_multiple_strains.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
 ```
 
-You will obtain the refined tree "output/RAxML_bestTree.s__Bacteroides_caccae.remove_multiple_strains.tree.metadata". This tree can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
+You will obtain the refined tree "output/RAxML\_bestTree.s\_\_Bacteroides\_caccae.remove\_multiple\_strains.tree.metadata". This tree can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
 
-### Some useful options ###
+### Some useful options
 All option details can be viewed by strainphlan.py help:
+
 ```
 #!python
-
 python ../strainphlan.py -h
 ```
 
 The default setting can be stringent for some cases where you have very few samples left in the phylogenetic tree. You can relax some parameters to add more samples back:
 
-1. *marker_in_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
-2. *sample_in_marker*: If the percentage of samples that a marker present in is less than this threhold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
-3. *N_in_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
-4. *gap_in_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
-5. *relaxed_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to set: marker_in_clade=0.5, sample_in_marker=0.5,                        N_in_marker=0.5, gap_in_sample=0.5. Default "False".
-6. *relaxed_parameters2*: use this option to add more samples by accepting some noise. This is equivalent to set marker_in_clade=0.2, sample_in_marker=0.2, N_in_marker=0.8, gap_in_sample=0.8. Default "False".
+1. *marker\_in\_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+2. *sample\_in\_marker*: If the percentage of samples in which a marker is present is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+3. *N\_in\_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
+4. *gap\_in\_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
+5. *relaxed\_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to setting: marker\_in\_clade=0.5, sample\_in\_marker=0.5, N\_in\_marker=0.5, gap\_in\_sample=0.5. Default "False".
+6. *relaxed\_parameters2*: use this option to add more samples by accepting some noise. This is equivalent to setting marker\_in\_clade=0.2, sample\_in\_marker=0.2, N\_in\_marker=0.8, gap\_in\_sample=0.8. Default "False".
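+
+For example, following the toy example above, the tree-building command from Step 5 can be re-run with all four thresholds relaxed at once (a command sketch; adjust the paths to your own setup):
+
+```
+#!bash
+python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --output_dir output --nprocs_main 10 --clades s__Bacteroides_caccae --relaxed_parameters
+```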
 
-### Some other useful output files ###
+### Some other useful output files
 In the output folder, you can find the following files:
 
 1. clade_name.fasta: the alignment file of all metagenomic strains.
 2. *.marker_pos: this file shows the starting position of each marker in the strains.
 3. *.info: this file shows the general information like the total length of the concatenated markers (full sequence length), number of used markers, etc.
-4. *.polymorphic: this file shows the statistics on the polymorphic site, where "sample" is the sample name, "percentage_of_polymorphic_sites" is the percentage of sites that are suspected to be polymorphic, "avg_freq" is the average frequency of the dominant alleles on all polymorphic sites, "avg_coverage" is the average coverage at all polymorphic sites.
\ No newline at end of file
+4. *.polymorphic: this file shows the statistics on the polymorphic sites, where "sample" is the sample name, "percentage\_of\_polymorphic\_sites" is the percentage of sites that are suspected to be polymorphic, "avg\_freq" is the average frequency of the dominant alleles on all polymorphic sites, and "avg\_coverage" is the average coverage at all polymorphic sites.
\ No newline at end of file
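
The *sample\_in\_marker* and *gap\_in\_sample* thresholds described above can be pictured with a short sketch. This is an illustrative reimplementation, not StrainPhlAn's actual code: the function names and data shapes are invented for the example, and only the two filtering rules from the text are modeled.

```python
# Illustrative sketch of two of the StrainPhlAn filtering thresholds
# described above (NOT the actual StrainPhlAn implementation).

def filter_markers(presence, sample_in_marker=0.8):
    """Keep markers present in at least `sample_in_marker` of the samples.

    `presence` maps marker name -> set of sample names carrying it.
    """
    n_samples = len({s for samples in presence.values() for s in samples})
    return [marker for marker, samples in presence.items()
            if len(samples) / n_samples >= sample_in_marker]


def filter_samples(sequences, gap_in_sample=0.2):
    """Drop samples whose concatenated sequence exceeds the gap fraction.

    `sequences` maps sample name -> full sequence concatenated from all
    markers, with '-' denoting a gap.
    """
    return {sample: seq for sample, seq in sequences.items()
            if seq.count('-') / len(seq) <= gap_in_sample}
```

Relaxing the thresholds (e.g. `sample_in_marker=0.5`) keeps more markers and samples at the cost of accepting more gaps, which is exactly what the *relaxed\_parameters* options do.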

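The three statistics reported in the *.polymorphic file can be sketched on toy data. This is an illustration only (not StrainPhlAn's implementation): the per-site base counts and the 0.8 dominant-allele threshold used to call a site polymorphic are assumptions made for the example.

```python
# Toy illustration of the *.polymorphic statistics described above
# (NOT StrainPhlAn's actual code). Each site is a dict of base -> count;
# a site counts as polymorphic here when the dominant allele's frequency
# falls below `dominant_threshold` (an assumed value).

def polymorphic_stats(sites, dominant_threshold=0.8):
    poly = []
    for counts in sites:
        coverage = sum(counts.values())
        dom_freq = max(counts.values()) / coverage
        if dom_freq < dominant_threshold:
            poly.append((dom_freq, coverage))

    percentage_of_polymorphic_sites = 100.0 * len(poly) / len(sites)
    avg_freq = sum(f for f, _ in poly) / len(poly) if poly else 0.0
    avg_coverage = sum(c for _, c in poly) / len(poly) if poly else 0.0
    return percentage_of_polymorphic_sites, avg_freq, avg_coverage
```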

=====================================
__init__.py
=====================================
--- /dev/null
+++ b/__init__.py
@@ -0,0 +1,10 @@
+from metaphlan2 import metaphlan2
+from ._metaphlan2 import profile_single_fastq
+from ._metaphlan2 import profile_paired_fastq
+
+
+__author__ = metaphlan2.__author__
+__version__ = metaphlan2.__version__
+__date__ = metaphlan2.__date__
+
+__all__ = ['profile_single_fastq', 'profile_paired_fastq']
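The `__all__` list added above controls exactly which names a star-import exposes at the package level. A self-contained toy demonstrating that behavior (the package and function names here are invented, not the metaphlan2 package itself):

```python
import os
import sys
import tempfile
import textwrap

# Build a throwaway package whose __init__.py mirrors the structure above:
# two public profile functions, one private helper, and an __all__ list.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "toypkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write(textwrap.dedent("""\
        def profile_single_fastq():
            return "single"

        def profile_paired_fastq():
            return "paired"

        def _internal_helper():
            return "hidden"

        # Only the names listed here are exported by `from toypkg import *`
        __all__ = ['profile_single_fastq', 'profile_paired_fastq']
    """))

sys.path.insert(0, tmp)
ns = {}
exec("from toypkg import *", ns)  # star-import honors __all__
exported = sorted(n for n in ns if not n.startswith("__"))
```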


=====================================
_metaphlan2.py
=====================================
--- /dev/null
+++ b/_metaphlan2.py
@@ -0,0 +1,48 @@
+# Run MetaPhlAn2
+# Author: Francesco Asnicar
+# This module defines the functions which run MetaPhlAn2 on
+# single and paired fastq data.
+
+
+import subprocess as sb
+from q2_types.per_sample_sequences import SingleLanePerSampleSingleEndFastqDirFmt
+from q2_types.per_sample_sequences import SingleLanePerSamplePairedEndFastqDirFmt
+import tempfile
+import biom
+import os
+
+
+def metaphlan2_helper(raw_data, nproc, input_type, output_file, verbose=True):
+    cmd = ['metaphlan2.py', str(raw_data), '--input_type', str(input_type),
+           '--biom', str(output_file), '--nproc', str(nproc)]
+
+    if verbose:
+        print("\nRunning external command line application. This may print "
+              "messages to stdout and/or stderr.")
+        print("Command: {}".format(' '.join(cmd)), end='\n\n')
+
+    sb.run(cmd, check=True)
+
+
+def profile_single_fastq(raw_data: SingleLanePerSampleSingleEndFastqDirFmt,
+                         nproc: int=1) -> biom.Table:
+    output_biom = None
+
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        tmp_output_biom = os.path.join(tmp_dir, 'mp2_tmp_output.biom')
+        metaphlan2_helper(raw_data, nproc, 'multifastq', tmp_output_biom)
+        output_biom = biom.load_table(tmp_output_biom)
+
+    return output_biom
+
+
+def profile_paired_fastq(raw_data: SingleLanePerSamplePairedEndFastqDirFmt,
+                         nproc: int=1) -> biom.Table:
+    output_biom = None
+
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        tmp_output_biom = os.path.join(tmp_dir, 'mp2_tmp_output.biom')
+        metaphlan2_helper(raw_data, nproc, 'multifastq', tmp_output_biom)
+        output_biom = biom.load_table(tmp_output_biom)
+
+    return output_biom
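
Both profile functions above follow the same pattern: run an external command that writes into a `TemporaryDirectory`, then load the result before the directory is cleaned up. A minimal sketch of that pattern, using a stand-in command since `metaphlan2.py` itself may not be installed (`run_and_load` and its test command are invented for the example):

```python
import os
import subprocess
import tempfile


def run_and_load(cmd_prefix):
    """Run an external command that writes its result to a path appended
    as the last argument, and read the result back before the temporary
    directory is removed (the same shape as profile_single_fastq above).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = os.path.join(tmp_dir, 'result.txt')
        # check=True raises CalledProcessError on a non-zero exit status,
        # like the sb.run(cmd, check=True) call in metaphlan2_helper.
        subprocess.run(cmd_prefix + [out_path], check=True)
        with open(out_path) as fh:
            data = fh.read()
    # tmp_dir no longer exists here, but `data` was loaded while it did.
    return data
```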


=====================================
db_v20/mpa_v20_m200.pkl deleted
=====================================
Binary files a/db_v20/mpa_v20_m200.pkl and /dev/null differ


=====================================
metaphlan2.py
=====================================
--- a/metaphlan2.py
+++ b/metaphlan2.py
@@ -1,46 +1,79 @@
 #!/usr/bin/env python
-
-from __future__ import with_statement 
+from __future__ import with_statement
 
 # ==============================================================================
 # MetaPhlAn v2.x: METAgenomic PHyLogenetic ANalysis for taxonomic classification
 #                 of metagenomic data
 #
-# Authors: Nicola Segata (nicola.segata at unitn.it), 
-#          Duy Tin Truong (duytin.truong at unitn.it)
+# Authors: Nicola Segata (nicola.segata at unitn.it),
+#          Duy Tin Truong,
+#          Francesco Asnicar (f.asnicar at unitn.it)
 #
 # Please type "./metaphlan2.py -h" for usage help
 #
 # ==============================================================================
 
-__author__ = 'Nicola Segata (nicola.segata at unitn.it), Duy Tin Truong (duytin.truong at unitn.it)'
-__version__ = '2.6.0'
-__date__ = '19 August 2016'
+__author__ = ('Nicola Segata (nicola.segata at unitn.it), '
+              'Duy Tin Truong, '
+              'Francesco Asnicar (f.asnicar at unitn.it)')
+__version__ = '2.7.5'
+__date__ = '6 February 2018'
 
 
 import sys
 import os
 import stat
 import re
-from binascii import b2a_uu 
-
+import time
+import tarfile
+# from binascii import b2a_uu
 try:
-    import numpy as np 
+    import numpy as np
 except ImportError:
     sys.stderr.write("Error! numpy python library not detected!!\n")
     sys.exit(1)
 import tempfile as tf
 import argparse as ap
 import subprocess as subp
-import multiprocessing as mp
+try:
+    from subprocess import DEVNULL  # py3k
+except ImportError:
+    DEVNULL = open(os.devnull, 'wb')
+# import multiprocessing as mp
 from collections import defaultdict as defdict
-import bz2 
+import bz2
 import itertools
 from distutils.version import LooseVersion
 try:
     import cPickle as pickle
-except:
+except ImportError:
     import pickle
+# try to import urllib.request.urlretrieve for python3
+try:
+    from urllib.request import urlretrieve
+except ImportError:
+    from urllib import urlretrieve
+from glob import glob
+import hashlib
+
+
+# set the location of the database download url
+DATABASE_DOWNLOAD = "https://bitbucket.org/biobakery/metaphlan2/downloads/"
+# get the directory that contains this script
+metaphlan2_script_install_folder = os.path.dirname(os.path.abspath(__file__))
+# get the default database folder
+DEFAULT_DB_FOLDER = os.path.join(metaphlan2_script_install_folder, "databases")
+
+
+#**********************************************************************************************
+#  Modification of Code :                                                                     *
+#  Modified the code so instead of using the current clade IDs, which are numbers, we will    *
+#      use the clade_names                                                                    *
+#      Users reported the biom output is invalid and also the IDs were changing from run to   *
+#      run.                                                                                   *
+#  George Weingart    05/22/2017   george.weingart at mail.com                                   *
+#**********************************************************************************************
+
 
 
 #*************************************************************
@@ -49,7 +82,7 @@ except:
 try:
     import biom
     import biom.table
-    import numpy as np
+    # import numpy as np  # numpy already imported above
 except ImportError:
     sys.stderr.write("Warning! Biom python library not detected!"
                      "\n Exporting to biom format will not work!\n")
@@ -59,11 +92,10 @@ except ImportError:
     sys.stderr.write("Warning! json python library not detected!"
                      "\n Exporting to biom format will not work!\n")
 
+
 # This set contains the markers that after careful validation are found to have low precision or recall
 # We exclude the markers here to avoid generating a new marker DB when changing just a few markers
-markers_to_exclude = \
-    set([
-        'NC_001782.1','GeneID:17099689','gi|419819595|ref|NZ_AJRE01000517.1|:1-118',
+markers_to_exclude = set(['NC_001782.1', 'GeneID:17099689', 'gi|419819595|ref|NZ_AJRE01000517.1|:1-118',
         'GeneID:10498696', 'GeneID:10498710', 'GeneID:10498726', 'GeneID:10498735',
         'GeneID:10498757', 'GeneID:10498760', 'GeneID:10498761', 'GeneID:10498763',
         'GeneID:11294465', 'GeneID:14181982', 'GeneID:14182132', 'GeneID:14182146',
@@ -320,43 +352,43 @@ markers_to_exclude = \
         'gi|564938696|gb|AWYH01000018.1|:c75674-75039', 'gi|67993724|ref|XM_664440.1|',
         'gi|68059117|ref|XM_666447.1|', 'gi|68062389|ref|XM_668109.1|',
         'gi|71730848|gb|AAAM03000019.1|:c14289-12877', 'gi|82753723|ref|XM_722699.1|',
-        'gi|82775382|ref|NC_007606.1|:2249487-2250014', 'gi|82793634|ref|XM_723027.1|'
-        ])
+        'gi|82775382|ref|NC_007606.1|:2249487-2250014', 'gi|82793634|ref|XM_723027.1|',
+        'GeneID:1489527'])
 
 tax_units = "kpcofgst"
 
 if float(sys.version_info[0]) < 3.0:
-    def read_and_split( ofn  ):
+    def read_and_split(ofn):
         return (l.strip().split('\t') for l in ofn)
-    def read_and_split_line( line ):
+
+    def read_and_split_line(line):
         return line.strip().split('\t')
 else:
-    def read_and_split( ofn ):
-        return (str(l,encoding='utf-8').strip().split('\t') for l in ofn)
-    def read_and_split_line( line ):
-        return str(line,encoding='utf-8').strip().split('\t')
+    def read_and_split(ofn):
+        return (l.decode('utf-8').strip().split('\t') for l in ofn)
+
+    def read_and_split_line(line):
+        return line.decode('utf-8').strip().split('\t')
 
 
-def plain_read_and_split( ofn ):
+def plain_read_and_split(ofn):
     return (l.strip().split('\t') for l in ofn)
 
-def plain_read_and_split_line( l ):
-    return l.strip().split('\t')
 
+def plain_read_and_split_line(l):
+    return l.strip().split('\t')
 
 
 if float(sys.version_info[0]) < 3.0:
-    def mybytes( val ):
+    def mybytes(val):
         return val
 else:
-    def mybytes( val ):
-        return bytes(val,encoding='utf-8')
-    
-# get the directory that contains this script
-metaphlan2_script_install_folder=os.path.dirname(os.path.abspath(__file__))
+    def mybytes(val):
+        return bytes(val, encoding='utf-8')
+
 
 def read_params(args):
-    p = ap.ArgumentParser( description= 
+    p = ap.ArgumentParser( description=
             "DESCRIPTION\n"
             " MetaPhlAn version "+__version__+" ("+__date__+"): \n"
             " METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling.\n\n"
@@ -365,7 +397,7 @@ def read_params(args):
             " We assume here that metaphlan2.py is in the system path and that mpa_dir bash variable contains the\n"
             " main MetaPhlAn folder. Also BowTie2 should be in the system path with execution and read\n"
             " permissions, and Perl should be installed)\n\n"
-           
+
             "\n========== MetaPhlAn 2 clade-abundance estimation ================= \n\n"
             "The basic usage of MetaPhlAn 2 consists in the identification of the clades (from phyla to species and \n"
             "strains in particular cases) present in the metagenome obtained from a microbiome sample and their \n"
@@ -373,21 +405,21 @@ def read_params(args):
 
             "*  Profiling a metagenome from raw reads:\n"
             "$ metaphlan2.py metagenome.fastq --input_type fastq\n\n"
-            
+
             "*  You can take advantage of multiple CPUs and save the intermediate BowTie2 output for re-running\n"
             "   MetaPhlAn extremely quickly:\n"
             "$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq\n\n"
-            
+
             "*  If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you\n"
             "   can obtain the results in few seconds by using the previously saved --bowtie2out file and \n"
             "   specifying the input (--input_type bowtie2out):\n"
             "$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out\n\n"
-            
+
             "*  You can also provide an externally BowTie2-mapped SAM if you specify this format with \n"
             "   --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:\n"
             "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq\n"
             "$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt\n\n"
-            
+
             "*  Multiple alternative ways to pass the input are also available:\n"
             "$ cat metagenome.fastq | metaphlan2.py --input_type fastq \n"
             "$ tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2.py --input_type fastq \n"
@@ -399,8 +431,8 @@ def read_params(args):
             "  multiple files (but you need to specify the --bowtie2out parameter):\n"
             "$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq\n\n"
             "\n------------------------------------------------------------------- \n \n\n"
-        
-            
+
+
             "\n========== MetaPhlAn 2 strain tracking ============================ \n\n"
             "MetaPhlAn 2 introduces the capability of charachterizing organisms at the strain level using non\n"
             "aggregated marker information. Such capability comes with several slightly different flavours and \n"
@@ -409,7 +441,7 @@ def read_params(args):
             "the community, and then a strain-level profiling can be performed to zoom-in into specific species\n"
             "of interest. This operation can be performed quickly as it exploits the --bowtie2out intermediate \n"
             "file saved during the execution of the default analysis type.\n\n"
-           
+
             "*  The following command will output the abundance of each marker with a RPK (reads per kil-base) \n"
             "   higher 0.0. (we are assuming that metagenome_outfmt.bz2 has been generated before as \n"
             "   shown above).\n"
@@ -421,24 +453,24 @@ def read_params(args):
             "*  The list of markers present in the sample can be obtained with '-t marker_pres_table'\n"
             "$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n"
             "   The --pres_th argument (default 1.0) set the minimum RPK value to consider a marker present\n\n"
-            
+
             "*  The list '-t clade_profiles' analysis type reports the same information of '-t marker_ab_table'\n"
             "   but the markers are reported on a clade-by-clade basis.\n"
             "$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n\n"
-            
+
             "*  Finally, to obtain all markers present for a specific clade and all its subclades, the \n"
             "   '-t clade_specific_strain_tracker' should be used. For example, the following command\n"
             "   is reporting the presence/absence of the markers for the B. fragulis species and its strains\n"
             "   the optional argument --min_ab specifies the minimum clade abundance for reporting the markers\n\n"
             "$ metaphlan2.py -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n"
-            
+
             "\n------------------------------------------------------------------- \n\n"
             "",
             formatter_class=ap.RawTextHelpFormatter,
             add_help=False )
     arg = p.add_argument
 
-    arg( 'inp', metavar='INPUT_FILE', type=str, nargs='?', default=None, help= 
+    arg( 'inp', metavar='INPUT_FILE', type=str, nargs='?', default=None, help=
          "the input file can be:\n"
          "* a fastq file containing metagenomic reads\n"
          "OR\n"
@@ -447,8 +479,8 @@ def read_params(args):
          "* an intermediary mapping file of the metagenome generated by a previous MetaPhlAn run \n"
          "If the input file is missing, the script assumes that the input is provided using the standard \n"
          "input, or named pipes.\n"
-         "IMPORTANT: the type of input needs to be specified with --input_type" )   
-    
+         "IMPORTANT: the type of input needs to be specified with --input_type" )
+
     arg( 'output', metavar='OUTPUT_FILE', type=str, nargs='?', default=None,
          help= "the tab-separated output file of the predicted taxon relative abundances \n"
                "[stdout if not present]")
@@ -456,48 +488,58 @@ def read_params(args):
 
     g = p.add_argument_group('Required arguments')
     arg = g.add_argument
-    input_type_choices = ['fastq','fasta','multifasta','multifastq','bowtie2out','sam'] # !!!!
-    arg( '--input_type', choices=input_type_choices, required = 'True', help =  
+    input_type_choices = ['fastq','fasta','multifasta','multifastq','bowtie2out','sam']
+    arg( '--input_type', choices=input_type_choices, required = '--install' not in args, help =
          "set whether the input is the multifasta file of metagenomic reads or \n"
          "the SAM file of the mapping of the reads against the MetaPhlAn db.\n"
          "[default 'automatic', i.e. the script will try to guess the input format]\n" )
-   
+
     g = p.add_argument_group('Mapping arguments')
     arg = g.add_argument
-    arg( '--mpa_pkl', type=str,
-         default=os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200.pkl"), 
-         help = "the metadata pickled MetaPhlAn file")
-    arg( '--bowtie2db', metavar="METAPHLAN_BOWTIE2_DB", type=str,
-         default = os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200"),
-         help = "The BowTie2 database file of the MetaPhlAn database. \n"
-                "Used if --input_type is fastq, fasta, multifasta, or multifastq")
-    bt2ps = ['sensitive','very-sensitive','sensitive-local','very-sensitive-local']
-    arg( '--bt2_ps', metavar="BowTie2 presets", default='very-sensitive', choices=bt2ps,
-         help = "presets options for BowTie2 (applied only when a multifasta file is provided)\n"
-                "The choices enabled in MetaPhlAn are:\n"
-                " * sensitive\n"
-                " * very-sensitive\n"
-                " * sensitive-local\n"
-                " * very-sensitive-local\n"
-                "[default very-sensitive]\n"   )
-    arg( '--bowtie2_exe', type=str, default = None, help =
-         'Full path and name of the BowTie2 executable. This option allows \n'
-         'MetaPhlAn to reach the executable even when it is not in the system \n'
-         'PATH or the system PATH is unreachable\n' )
-    arg( '--bowtie2out', metavar="FILE_NAME", type=str, default = None, help = 
-         "The file for saving the output of BowTie2\n" )
-    arg( '--no_map', action='store_true', help=
-         "Avoid storing the --bowtie2out map file\n" )
-    arg( '--tmp_dir', metavar="", default=None, type=str, help = 
-         "the folder used to store temporary files \n"
-         "[default is the OS dependent tmp dir]\n"   )
-    
-    
+    arg('--mpa_pkl', type=str, default=None,
+        help="The metadata pickled MetaPhlAn file [deprecated]")
+
+    arg('--bowtie2db', metavar="METAPHLAN_BOWTIE2_DB", type=str, default=None,
+        help=("The BowTie2 database file of the MetaPhlAn database. Used if "
+              "--input_type is fastq, fasta, multifasta, or multifastq "
+              "[deprecated]"))
+
+    arg('-x', '--index', type=str, default='v20_m200',
+        help=("Specify the id of the database version to use. If the database "
+              "files are not found on the local MetaPhlAn2 installation they "
+              "will be automatically downloaded"))
+
+    bt2ps = ['sensitive', 'very-sensitive', 'sensitive-local',
+             'very-sensitive-local']
+    arg('--bt2_ps', metavar="BowTie2 presets", default='very-sensitive',
+        choices=bt2ps, help="Presets options for BowTie2 (applied only when a "
+                            "multifasta file is provided)\n"
+                            "The choices enabled in MetaPhlAn are:\n"
+                            " * sensitive\n"
+                            " * very-sensitive\n"
+                            " * sensitive-local\n"
+                            " * very-sensitive-local\n"
+                            "[default very-sensitive]\n")
+    arg('--bowtie2_exe', type=str, default=None,
+        help='Full path and name of the BowTie2 executable. This option allows '
+             'MetaPhlAn to reach the executable even when it is not in the '
+             'system PATH or the system PATH is unreachable')
+    arg('--bowtie2_build', type=str, default='bowtie2-build',
+        help="Full path to the bowtie2-build command to use, deafult assumes "
+             "that 'bowtie2-build is present in the system path")
+    arg('--bowtie2out', metavar="FILE_NAME", type=str, default=None,
+        help="The file for saving the output of BowTie2")
+    arg('--no_map', action='store_true',
+        help="Avoid storing the --bowtie2out map file")
+    arg('--tmp_dir', metavar="", default=None, type=str,
+        help="The folder used to store temporary files [default is the OS "
+             "dependent tmp dir]")
+
     g = p.add_argument_group('Post-mapping arguments')
     arg = g.add_argument
     stat_choices = ['avg_g','avg_l','tavg_g','tavg_l','wavg_g','wavg_l','med']
-    arg( '--tax_lev', metavar='TAXONOMIC_LEVEL', type=str, 
-         choices='a'+tax_units, default='a', help = 
+    arg( '--tax_lev', metavar='TAXONOMIC_LEVEL', type=str,
+         choices='a'+tax_units, default='a', help =
          "The taxonomic level for the relative abundance output:\n"
          "'a' : all taxonomic levels\n"
          "'k' : kingdoms\n"
@@ -524,16 +566,16 @@ def read_params(args):
          "Do not profile bacterial organisms" )
     arg( '--ignore_archaea', action='store_true', help=
          "Do not profile archeal organisms" )
-    arg( '--stat_q', metavar="", type = float, default=0.1, help = 
+    arg( '--stat_q', metavar="", type = float, default=0.1, help =
          "Quantile value for the robust average\n"
          "[default 0.1]"   )
-    arg( '--ignore_markers', type=str, default = None, help = 
+    arg( '--ignore_markers', type=str, default = None, help =
          "File containing a list of markers to ignore. \n")
-    arg( '--avoid_disqm', action="store_true", help = 
+    arg( '--avoid_disqm', action="store_true", help =
          "Deactivate the procedure of disambiguating the quasi-markers based on the \n"
          "marker abundance pattern found in the sample. It is generally recommended \n"
          "too keep the disambiguation procedure in order to minimize false positives\n")
-    arg( '--stat', metavar="", choices=stat_choices, default="tavg_g", type=str, help = 
+    arg( '--stat', metavar="", choices=stat_choices, default="tavg_g", type=str, help =
          "EXPERIMENTAL! Statistical approach for converting marker abundances into clade abundances\n"
          "'avg_g'  : clade global (i.e. normalizing all markers together) average\n"
          "'avg_l'  : average of length-normalized marker counts\n"
@@ -542,17 +584,17 @@ def read_params(args):
          "'wavg_g' : winsorized clade global average (at --stat_q)\n"
          "'wavg_l' : winsorized average of length-normalized marker counts (at --stat_q)\n"
          "'med'    : median of length-normalized marker counts\n"
-         "[default tavg_g]"   ) 
-    
+         "[default tavg_g]"   )
+
     arg = p.add_argument
 
 
-    
+
     g = p.add_argument_group('Additional analysis types and arguments')
     arg = g.add_argument
     analysis_types = ['rel_ab', 'rel_ab_w_read_stats', 'reads_map', 'clade_profiles', 'marker_ab_table', 'marker_counts', 'marker_pres_table', 'clade_specific_strain_tracker']
-    arg( '-t', metavar='ANALYSIS TYPE', type=str, choices = analysis_types, 
-         default='rel_ab', help = 
+    arg( '-t', metavar='ANALYSIS TYPE', type=str, choices = analysis_types,
+         default='rel_ab', help =
          "Type of analysis to perform: \n"
          " * rel_ab: profiling a metagenomes in terms of relative abundances\n"
          " * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads comming from each clade.\n"
@@ -569,20 +611,19 @@ def read_params(args):
          "specified" )
     arg( '--pres_th', metavar="PRESENCE_THRESHOLD", type=int, default = 1.0, help =
          'Threshold for calling a marker present by the -t marker_pres_table option' )
-    arg( '--clade', metavar="", default=None, type=str, help = 
+    arg( '--clade', metavar="", default=None, type=str, help =
          "The clade for clade_specific_strain_tracker analysis\n"  )
-    arg( '--min_ab', metavar="", default=0.1, type=float, help = 
+    arg( '--min_ab', metavar="", default=0.1, type=float, help =
          "The minimum percentage abundace for the clade in the clade_specific_strain_tracker analysis\n"  )
-    arg( "-h", "--help", action="help", help="show this help message and exit")
 
     g = p.add_argument_group('Output arguments')
     arg = g.add_argument
-    arg( '-o', '--output_file',  metavar="output file", type=str, default=None, help = 
+    arg( '-o', '--output_file',  metavar="output file", type=str, default=None, help =
          "The output file (if not specified as positional argument)\n")
-    arg('--sample_id_key',  metavar="name", type=str, default="#SampleID", 
+    arg('--sample_id_key',  metavar="name", type=str, default="#SampleID",
         help =("Specify the sample ID key for this analysis."
                " Defaults to '#SampleID'."))
-    arg('--sample_id',  metavar="value", type=str, 
+    arg('--sample_id',  metavar="value", type=str,
         default="Metaphlan2_Analysis",
         help =("Specify the sample ID for this analysis."
                " Defaults to 'Metaphlan2_Analysis'."))
@@ -590,92 +631,299 @@ def read_params(args):
         type=str, default=None, help="The sam output file\n")
     #*************************************************************
     #* Parameters related to biom file generation                *
-    #*************************************************************         
-    arg( '--biom', '--biom_output_file',  metavar="biom_output", type=str, default=None, help = 
+    #*************************************************************
+    arg( '--biom', '--biom_output_file',  metavar="biom_output", type=str, default=None, help =
          "If requesting biom file output: The name of the output file in biom format \n")
 
-    arg( '--mdelim', '--metadata_delimiter_char',  metavar="mdelim", type=str, default="|", help = 
+    arg( '--mdelim', '--metadata_delimiter_char',  metavar="mdelim", type=str, default="|", help =
          "Delimiter for bug metadata: - defaults to pipe. e.g. the pipe in k__Bacteria|p__Proteobacteria \n")
     #*************************************************************
     #* End parameters related to biom file generation            *
-    #*************************************************************    
-    
+    #*************************************************************
+
     g = p.add_argument_group('Other arguments')
     arg = g.add_argument
-    arg( '--nproc', metavar="N", type=int, default=1, help = 
-         "The number of CPUs to use for parallelizing the mapping\n"
-         "[default 1, i.e. no parallelism]\n" ) 
-    arg( '-v','--version', action='version', version="MetaPhlAn version "+__version__+"\t("+__date__+")",
-         help="Prints the current MetaPhlAn version and exit\n" )
-    
-
-    return vars(p.parse_args()) 
-
-def run_bowtie2(  fna_in, outfmt6_out, bowtie2_db, preset, nproc, 
-                  file_format = "multifasta", exe = None, 
-                  samout = None,
-                  min_alignment_len = None,
-                  ):
+    arg('--nproc', metavar="N", type=int, default=4,
+        help="The number of CPUs to use for parallelizing the mapping [default 4]")
+    arg('--install', action='store_true',
+        help="Only checks if the MetaPhlAn2 DB is installed and installs it if not. All other parameters are ignored.")
+    arg('--read_min_len', type=int, default=70,
+        help="Specify the minimum length of the reads to be considered when parsing the input file with "
+             "the 'read_fastx.py' script [default 70]")
+    arg('-v', '--version', action='version',
+        version="MetaPhlAn version {}\t({})".format(__version__, __date__),
+        help="Prints the current MetaPhlAn version and exit")
+    arg("-h", "--help", action="help", help="show this help message and exit")
+
+    return vars(p.parse_args())
+
+
+def byte_to_megabyte(byte):
+    """
+    Convert byte value to megabyte
+    """
+
+    return byte / (1024.0**2)
+
+
+class ReportHook():
+    def __init__(self):
+        self.start_time = time.time()
+
+    def report(self, blocknum, block_size, total_size):
+        """
+        Print download progress message
+        """
+
+        if blocknum == 0:
+            self.start_time = time.time()
+            if total_size > 0:
+                sys.stderr.write("Downloading file of size: {:.2f} MB\n"
+                                 .format(byte_to_megabyte(total_size)))
+        else:
+            total_downloaded = blocknum * block_size
+            status = "{:3.2f} MB ".format(byte_to_megabyte(total_downloaded))
+
+            if total_size > 0:
+                percent_downloaded = total_downloaded * 100.0 / total_size
+                # use carriage return plus sys.stderr to overwrite stderr
+                download_rate = total_downloaded / (time.time() - self.start_time)
+                estimated_time = (total_size - total_downloaded) / download_rate
+                estimated_minutes = int(estimated_time / 60.0)
+                estimated_seconds = estimated_time - estimated_minutes * 60.0
+                status += ("{:3.2f} %  {:5.2f} MB/sec {:2.0f} min {:2.0f} sec "
+                           .format(percent_downloaded,
+                                   byte_to_megabyte(download_rate),
+                                   estimated_minutes, estimated_seconds))
+
+            status += "        \r"
+            sys.stderr.write(status)
+
+
+def download(url, download_file):
+    """
+    Download a file from a url
+    """
+
+    if not os.path.isfile(download_file):
+        try:
+            sys.stderr.write("\nDownloading " + url + "\n")
+            file, headers = urlretrieve(url, download_file,
+                                        reporthook=ReportHook().report)
+        except EnvironmentError:
+            sys.stderr.write("\nWarning: Unable to download " + url + "\n")
+    else:
+        sys.stderr.write("\nFile {} already present!\n".format(download_file))
+
+
+def download_unpack_tar(url, download_file_name, folder, bowtie2_build, nproc):
+    """
+    Download the url to the file and decompress into the folder
+    """
+
+    # Create the folder if it does not already exist
+    if not os.path.isdir(folder):
+        try:
+            os.makedirs(folder)
+        except EnvironmentError:
+            sys.exit("ERROR: Unable to create folder for database install: " + folder)
+
+    # Check the directory permissions
+    if not os.access(folder, os.W_OK):
+        sys.exit("ERROR: The directory is not writeable: " + folder + ". "
+                 "Please modify the permissions.")
+
+    tar_file = os.path.join(folder, "mpa_" + download_file_name + ".tar")
+    url_tar_file = os.path.join(url, "mpa_" + download_file_name + ".tar")
+    download(url_tar_file, tar_file)
+
+    # download MD5 checksum
+    md5_file = os.path.join(folder, "mpa_" + download_file_name + ".md5")
+    url_md5_file = os.path.join(url, "mpa_" + download_file_name + ".md5")
+    download(url_md5_file, md5_file)
+
+    md5_md5 = None
+    md5_tar = None
+
+    if os.path.isfile(md5_file):
+        with open(md5_file) as f:
+            for row in f:
+                md5_md5 = row.strip().split(' ')[0]
+    else:
+        sys.stderr.write('File "{}" not found!\n'.format(md5_file))
+
+    # compute MD5 of .tar.bz2
+    if os.path.isfile(tar_file):
+        hash_md5 = hashlib.md5()
+
+        with open(tar_file, "rb") as f:
+            for chunk in iter(lambda: f.read(4096), b""):
+                hash_md5.update(chunk)
+
+        md5_tar = hash_md5.hexdigest()[:32]
+    else:
+        sys.stderr.write('File "{}" not found!\n'.format(tar_file))
+
+    if (md5_tar is None) or (md5_md5 is None):
+        sys.exit("MD5 checksums not found, something went wrong!")
+
+    # compare checksums
+    if md5_tar != md5_md5:
+        sys.exit("MD5 checksums do not match! If this happens again, remove the database files and "
+                 "rerun MetaPhlAn2 so they are re-downloaded.")
+
+    # untar
+    try:
+        tarfile_handle = tarfile.open(tar_file)
+        tarfile_handle.extractall(path=folder)
+        tarfile_handle.close()
+    except EnvironmentError:
+        sys.stderr.write("Warning: Unable to extract {}.\n".format(tar_file))
+
+    # uncompress sequences
+    bz2_file = os.path.join(folder, "mpa_" + download_file_name + ".fna.bz2")
+    fna_file = os.path.join(folder, "mpa_" + download_file_name + ".fna")
+
+    if not os.path.isfile(fna_file):
+        sys.stderr.write('\n\nDecompressing {} into {}\n'.format(bz2_file, fna_file))
+
+        with open(fna_file, 'wb') as fna_h, bz2.BZ2File(bz2_file, 'rb') as bz2_h:
+            for data in iter(lambda: bz2_h.read(100 * 1024), b''):
+                fna_h.write(data)
+
+    # build bowtie2 indexes
+    if not glob(os.path.join(folder, "mpa_" + download_file_name + "*.bt2")):
+        bt2_base = os.path.join(folder, "mpa_" + download_file_name)
+        bt2_cmd = [bowtie2_build, '--quiet']
+
+        if nproc > 1:
+            bt2_build_output = subp.check_output([bowtie2_build, '--usage'], stderr=subp.STDOUT)
+
+            if 'threads' in str(bt2_build_output):
+                bt2_cmd += ['--threads', str(nproc)]
+
+        bt2_cmd += ['-f', fna_file, bt2_base]
+
+        sys.stderr.write('\nBuilding Bowtie2 indexes\n')
+
+        try:
+            subp.check_call(bt2_cmd)
+        except Exception as e:
+            sys.stderr.write("Fatal error running '{}'\nError message: '{}'\n\n".format(' '.join(bt2_cmd), e))
+            sys.exit(1)
+
+
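The `download_unpack_tar` function above hashes the archive in fixed-size chunks before comparing checksums. A minimal standalone sketch of that pattern (helper name hypothetical, not part of the upstream code):

```python
import hashlib

def md5sum(path, chunk_size=4096):
    """MD5 hex digest of a file, read in fixed-size chunks so that large
    database archives never have to fit in memory at once."""
    hash_md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()
```

The returned digest is compared with the first whitespace-separated field of the downloaded `.md5` file.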
+def check_and_install_database(index, bowtie2_build, nproc):
+    """ Check if the database is installed, if not download and install """
+
+    if len(glob(os.path.join(DEFAULT_DB_FOLDER, "mpa_{}*".format(index)))) >= 7:
+        return
+
+    # download the tar archive and decompress
+    sys.stderr.write("\nDownloading MetaPhlAn2 database\nPlease note that, "
+                     "due to its size, this may take a few minutes\n")
+    download_unpack_tar(DATABASE_DOWNLOAD, index, DEFAULT_DB_FOLDER, bowtie2_build, nproc)
+    sys.stderr.write("\nDownload complete\n")
+
+
+def set_mapping_arguments(index):
+    mpa_pkl = 'mpa_pkl'
+    bowtie2db = 'bowtie2db'
+
+    if os.path.isfile(os.path.join(DEFAULT_DB_FOLDER, "mpa_{}.pkl".format(index))):
+        mpa_pkl = os.path.join(DEFAULT_DB_FOLDER, "mpa_{}.pkl".format(index))
+
+    if glob(os.path.join(DEFAULT_DB_FOLDER, "mpa_{}*.bt2".format(index))):
+        bowtie2db = os.path.join(DEFAULT_DB_FOLDER, "mpa_{}".format(index))
+
+    return (mpa_pkl, bowtie2db)
+
+
+def run_bowtie2(fna_in, outfmt6_out, bowtie2_db, preset, nproc, file_format="multifasta",
+                exe=None, samout=None, min_alignment_len=None, read_min_len=0):
+    # checking read_fastx.py
+    read_fastx = "read_fastx.py"
+
+    try:
+        subp.check_call([read_fastx, "-h"], stdout=DEVNULL)
+    except Exception as e:
+        try:
+            read_fastx = os.path.join(os.path.join(os.path.dirname(__file__), "utils"), read_fastx)
+            subp.check_call([read_fastx, "-h"], stdout=DEVNULL)
+        except Exception as e:
+            sys.stderr.write("OSError: fatal error running '{}'. Is it in the system path?\n".format(read_fastx))
+            sys.exit(1)
+
+    # checking bowtie2
     try:
-        if not fna_in: # or stat.S_ISFIFO(os.stat(fna_in).st_mode):
-            fna_in = "-"
-        bowtie2_cmd = [ exe if exe else 'bowtie2', 
-                        "--quiet", "--no-unal", 
-                        "--"+preset,
-                        "-S","-",
-                        "-x", bowtie2_db,
-                         ] + ([] if int(nproc) < 2 else ["-p",str(nproc)])
-        bowtie2_cmd += ["-U", fna_in] # if not stat.S_ISFIFO(os.stat(fna_in).st_mode) else []
-        bowtie2_cmd += (["-f"] if file_format == "multifasta" else []) 
-        p = subp.Popen( bowtie2_cmd, stdout=subp.PIPE ) 
-        lmybytes, outf = (mybytes,bz2.BZ2File(outfmt6_out, "w")) if outfmt6_out.endswith(".bz2") else (str,open( outfmt6_out, "w" ))
-        
+        subp.check_call([exe if exe else 'bowtie2', "-h"], stdout=DEVNULL)
+    except Exception as e:
+        sys.stderr.write('OSError: "{}"\nFatal error running BowTie2. Is BowTie2 in the system path?\n'.format(e))
+        sys.exit(1)
+
+    try:
+        if fna_in:
+            readin = subp.Popen([read_fastx, '-l', str(read_min_len), fna_in], stdout=subp.PIPE)
+        else:
+            readin = subp.Popen([read_fastx, '-l', str(read_min_len)], stdin=sys.stdin, stdout=subp.PIPE)
+
+        bowtie2_cmd = [exe if exe else 'bowtie2', "--quiet", "--no-unal", "--{}".format(preset),
+                       "-S", "-", "-x", bowtie2_db]
+
+        if int(nproc) > 1:
+            bowtie2_cmd += ["-p", str(nproc)]
+
+        bowtie2_cmd += ["-U", "-"]  # if not stat.S_ISFIFO(os.stat(fna_in).st_mode) else []
+
+        if file_format == "multifasta":
+            bowtie2_cmd += ["-f"]
+
+        p = subp.Popen(bowtie2_cmd, stdout=subp.PIPE, stdin=readin.stdout)
+        readin.stdout.close()
+        lmybytes, outf = (mybytes, bz2.BZ2File(outfmt6_out, "w")) if outfmt6_out.endswith(".bz2") else (str, open(outfmt6_out, "w"))
+
         try:
             if samout:
                 if samout[-4:] == '.bz2':
                     sam_file = bz2.BZ2File(samout, 'w')
                 else:
-                    sam_file = open(samout, 'w')
-        except IOError:
-            sys.stderr.write( "IOError: Unable to open sam output file.\n" )
+                    sam_file = open(samout, 'wb')
+        except IOError as e:
+            sys.stderr.write('IOError: "{}"\nUnable to open sam output file.\n'.format(e))
             sys.exit(1)
 
         for line in p.stdout:
             if samout:
                 sam_file.write(line)
-            if line[0] != '@':
-                o = read_and_split_line(line)
-                if o[2][-1] != '*':
-                    if min_alignment_len == None\
-                        or max([int(x.strip('M')) for x in\
-                                re.findall(r'(\d*M)', o[5])]) >= min_alignment_len:
-                        outf.write( lmybytes("\t".join([o[0],o[2]]) +"\n") )
-        #if  float(sys.version_info[0]) >= 3: 
-        #    for o in read_and_split(p.stdout):
-        #        if o[2][-1] != '*':
-        #            outf.write( bytes("\t".join([o[0],o[2]]) +"\n",encoding='utf-8') )
-        #else:
-        #    for o in read_and_split(p.stdout):
-        #        if o[2][-1] != '*':
-        #            outf.write( "\t".join([o[0],o[2]]) +"\n" )
+
+            o = read_and_split_line(line)
+
+            if not o[0].startswith('@'):
+                if not o[2].endswith('*'):
+                    if ((min_alignment_len is None) or
+                            (max([int(x.strip('M')) for x in re.findall(r'(\d*M)', o[5]) if x]) >= min_alignment_len)):
+                        outf.write(lmybytes("\t".join([o[0], o[2]]) + "\n"))
+
         outf.close()
+
         if samout:
             sam_file.close()
-        p.wait()
 
+        p.communicate()
 
-    except OSError:
-        sys.stderr.write( "OSError: fatal error running BowTie2. Is BowTie2 in the system path?\n" )
+    except OSError as e:
+        sys.stderr.write('OSError: "{}"\nFatal error running BowTie2.\n'.format(e))
         sys.exit(1)
-    except ValueError:
-        sys.stderr.write( "ValueError: fatal error running BowTie2.\n" )
+    except ValueError as e:
+        sys.stderr.write('ValueError: "{}"\nFatal error running BowTie2.\n'.format(e))
         sys.exit(1)
-    except IOError:
-        sys.stderr.write( "IOError: fatal error running BowTie2.\n" )
+    except IOError as e:
+        sys.stderr.write('IOError: "{}"\nFatal error running BowTie2.\n'.format(e))
         sys.exit(1)
+
     if p.returncode == 13:
-        sys.stderr.write( "Permission Denied Error: fatal error running BowTie2." 
-          "Is the BowTie2 file in the path with execution and read permissions?\n" )
+        sys.stderr.write("Permission Denied Error: fatal error running BowTie2. "
+                         "Is the BowTie2 file in the path with execution and read permissions?\n")
         sys.exit(1)
     elif p.returncode != 0:
         sys.stderr.write("Error while running bowtie2.\n")
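The rewritten `run_bowtie2` above streams reads through `read_fastx.py` into bowtie2 instead of handing bowtie2 a file. A minimal sketch of that two-process pipeline, with a tiny Python producer and `sort` standing in for the real tools:

```python
import subprocess as subp
import sys

# Two-stage pipeline: the producer's stdout becomes the consumer's stdin,
# mirroring how read_fastx.py feeds bowtie2 through "-U -".
producer = subp.Popen([sys.executable, "-c", "print('b'); print('a')"],
                      stdout=subp.PIPE)
consumer = subp.Popen(["sort"], stdin=producer.stdout, stdout=subp.PIPE)
producer.stdout.close()  # let the producer receive SIGPIPE if the consumer exits early
out, _ = consumer.communicate()
producer.wait()
```

Closing the producer's stdout handle in the parent, as the patch does with `readin.stdout.close()`, is what allows a broken pipe to propagate back to the first process.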
@@ -685,7 +933,7 @@ def run_bowtie2(  fna_in, outfmt6_out, bowtie2_db, preset, nproc,
 #    if "," in inp_file:
 #        sys.stderr.write( "Sorry, I cannot guess the format of the input, when "
 #                          "more than one file is specified. Please set the --input_type parameter \n" )
-#        sys.exit(1) 
+#        sys.exit(1)
 #
 #    with open( inp_file ) as inpf:
 #        for i,l in enumerate(inpf):
@@ -708,7 +956,7 @@ class TaxClade:
         self.children, self.markers2nreads = {}, {}
         self.name, self.father = name, None
         self.uncl, self.subcl_uncl = uncl, False
-        self.abundance, self.uncl_abundance = None, 0 
+        self.abundance, self.uncl_abundance = None, 0
         self.id = id_int
 
     def add_child( self, name, id_int ):
@@ -717,7 +965,7 @@ class TaxClade:
         new_clade.father = self
         return new_clade
 
-    
+
     def get_terminals( self ):
         terms = []
         if not self.children:
@@ -736,18 +984,18 @@ class TaxClade:
         return "|".join(fullname[1:])
 
     def get_normalized_counts( self ):
-        return [(m,float(n)*1000.0/self.markers2lens[m]) 
+        return [(m,float(n)*1000.0/self.markers2lens[m])
                     for m,n in self.markers2nreads.items()]
 
     def compute_abundance( self ):
         if self.abundance is not None: return self.abundance
-        sum_ab = sum([c.compute_abundance() for c in self.children.values()]) 
-        rat_nreads = sorted([(self.markers2lens[m],n) 
+        sum_ab = sum([c.compute_abundance() for c in self.children.values()])
+        rat_nreads = sorted([(self.markers2lens[m],n)
                                     for m,n in self.markers2nreads.items()],
                                             key = lambda x: x[1])
 
         rat_nreads, removed = [], []
-        for m,n in self.markers2nreads.items():
+        for m,n in sorted(self.markers2nreads.items(),key=lambda pars:pars[0]):
             misidentified = False
 
             if not self.avoid_disqm:
@@ -758,7 +1006,7 @@ class TaxClade:
                     while len(tocladetmp.children) == 1:
                         tocladetmp = list(tocladetmp.children.values())[0]
                         m2nr = tocladetmp.markers2nreads
-    
+
                     nonzeros = sum([v>0 for v in m2nr.values()])
                     if len(m2nr):
                         if float(nonzeros) / len(m2nr) > 0.33:
@@ -766,14 +1014,14 @@ class TaxClade:
                             removed.append( (self.markers2lens[m],n) )
                             break
             if not misidentified:
-                rat_nreads.append( (self.markers2lens[m],n) ) 
-       
+                rat_nreads.append( (self.markers2lens[m],n) )
+
         if not self.avoid_disqm and len(removed):
             n_rat_nreads = float(len(rat_nreads))
             n_removed = float(len(removed))
             n_tot = n_rat_nreads + n_removed
             n_ripr = 10
-            
+
             if len(self.get_terminals()) < 2:
                 n_ripr = 0
 
@@ -783,16 +1031,16 @@ class TaxClade:
             if n_rat_nreads < n_ripr and n_tot > n_rat_nreads:
                 rat_nreads += removed[:n_ripr-int(n_rat_nreads)]
 
-        
+
         rat_nreads = sorted(rat_nreads, key = lambda x: x[1])
 
         rat_v,nreads_v = zip(*rat_nreads) if rat_nreads else ([],[])
         rat, nrawreads, loc_ab = float(sum(rat_v)) or -1.0, sum(nreads_v), 0.0
         quant = int(self.quantile*len(rat_nreads))
         ql,qr,qn = (quant,-quant,quant) if quant else (None,None,0)
-     
+
         if self.name[0] == 't' and (len(self.father.children) > 1 or "_sp" in self.father.name or "k__Viruses" in self.get_full_name()):
-            non_zeros = float(len([n for r,n in rat_nreads if n > 0])) 
+            non_zeros = float(len([n for r,n in rat_nreads if n > 0]))
             nreads = float(len(rat_nreads))
             if nreads == 0.0 or non_zeros / nreads < 0.7:
                 self.abundance = 0.0
@@ -803,7 +1051,7 @@ class TaxClade:
         elif self.stat == 'avg_g' or (not qn and self.stat in ['wavg_g','tavg_g']):
             loc_ab = nrawreads / rat if rat >= 0 else 0.0
         elif self.stat == 'avg_l' or (not qn and self.stat in ['wavg_l','tavg_l']):
-            loc_ab = np.mean([float(n)/r for r,n in rat_nreads]) 
+            loc_ab = np.mean([float(n)/r for r,n in rat_nreads])
         elif self.stat == 'tavg_g':
             wnreads = sorted([(float(n)/r,r,n) for r,n in rat_nreads], key=lambda x:x[0])
             den,num = zip(*[v[1:] for v in wnreads[ql:qr]])
@@ -813,15 +1061,15 @@ class TaxClade:
         elif self.stat == 'wavg_g':
             vmin, vmax = nreads_v[ql], nreads_v[qr]
             wnreads = [vmin]*qn+list(nreads_v[ql:qr])+[vmax]*qn
-            loc_ab = float(sum(wnreads)) / rat  
+            loc_ab = float(sum(wnreads)) / rat
         elif self.stat == 'wavg_l':
             wnreads = sorted([float(n)/r for r,n in rat_nreads])
             vmin, vmax = wnreads[ql], wnreads[qr]
             wnreads = [vmin]*qn+list(wnreads[ql:qr])+[vmax]*qn
-            loc_ab = np.mean(wnreads) 
+            loc_ab = np.mean(wnreads)
         elif self.stat == 'med':
-            loc_ab = np.median(sorted([float(n)/r for r,n in rat_nreads])[ql:qr]) 
-        
+            loc_ab = np.median(sorted([float(n)/r for r,n in rat_nreads])[ql:qr])
+
         self.abundance = loc_ab
         if rat < self.min_cu_len and self.children:
             self.abundance = sum_ab
@@ -830,7 +1078,7 @@ class TaxClade:
 
         if self.abundance > sum_ab and self.children: # *1.1??
             self.uncl_abundance = self.abundance - sum_ab
-        self.subcl_uncl = not self.children and self.name[0] not in tax_units[-2:] 
+        self.subcl_uncl = not self.children and self.name[0] not in tax_units[-2:]
 
         return self.abundance
 
@@ -857,7 +1105,9 @@ class TaxTree:
         TaxClade.taxa2clades = self.taxa2clades
         self.id_gen = itertools.count(1)
 
-        clades_txt = ((l.strip().split("|"),n) for l,n in mpa_pkl['taxonomy'].items())        
+        # clades_txt = ((l.strip().split("|"),n) for l,n in mpa_pkl['taxonomy'].items())
+        clades_txt = ((l.strip().split("|"), n) for l, n in mpa['taxonomy'].items())
+
         for clade,lenc in clades_txt:
             father = self.root
             for clade_lev in clade: # !!!!! [:-1]:
@@ -874,21 +1124,24 @@ class TaxTree:
         def add_lens( node ):
             if not node.children:
                 return node.glen
-            lens = [] 
+            lens = []
             for c in node.children.values():
                 lens.append( add_lens( c ) )
             node.glen = sum(lens) / len(lens)
             return node.glen
         add_lens( self.root )
-        
-        for k,p in mpa_pkl['markers'].items():
+
+        # for k,p in mpa_pkl['markers'].items():
+        for k, p in mpa['markers'].items():
             if k in markers_to_exclude:
                 continue
+
             if k in markers_to_ignore:
                 continue
+
             self.markers2lens[k] = p['len']
             self.markers2clades[k] = p['clade']
-            self.add_reads( k, 0  )
+            self.add_reads(k, 0)
             self.markers2exts[k] = p['ext']
 
     def set_min_cu_len( self, min_cu_len ):
@@ -899,8 +1152,8 @@ class TaxTree:
         TaxClade.quantile = quantile
         TaxClade.avoid_disqm = avoid_disqm
 
-    def add_reads(  self, marker, n, 
-                    ignore_viruses = False, ignore_eukaryotes = False, 
+    def add_reads(  self, marker, n,
+                    ignore_viruses = False, ignore_eukaryotes = False,
                     ignore_bacteria = False, ignore_archaea = False  ):
         clade = self.markers2clades[marker]
         cl = self.all_clades[clade]
@@ -918,7 +1171,7 @@ class TaxTree:
             cl = list(cl.children.values())[0]
         cl.markers2nreads[marker] = n
         return cl.get_full_name()
-   
+
 
     def markers2counts( self ):
         m2c = {}
@@ -930,24 +1183,24 @@ class TaxTree:
     def clade_profiles( self, tax_lev, get_all = False  ):
         cl2pr = {}
         for k,v in self.all_clades.items():
-            if tax_lev and not k.startswith(tax_lev): 
+            if tax_lev and not k.startswith(tax_lev):
                 continue
             prof = v.get_normalized_counts()
             if not get_all and ( len(prof) < 1 or not sum([p[1] for p in prof]) > 0.0 ):
                 continue
             cl2pr[v.get_full_name()] = prof
         return cl2pr
-            
+
     def relative_abundances( self, tax_lev  ):
-        cl2ab_n = dict([(k,v) for k,v in self.all_clades.items() 
+        cl2ab_n = dict([(k,v) for k,v in self.all_clades.items()
                     if k.startswith("k__") and not v.uncl])
-     
-        cl2ab, cl2glen, tot_ab = {}, {}, 0.0 
+
+        cl2ab, cl2glen, tot_ab = {}, {}, 0.0
         for k,v in cl2ab_n.items():
             tot_ab += v.compute_abundance()
 
         for k,v in cl2ab_n.items():
-            for cl,ab in v.get_all_abundances():
+            for cl,ab in sorted(v.get_all_abundances(),key=lambda pars:pars[0]):
                 if not tax_lev:
                     if cl not in self.all_clades:
                         to = tax_units.index(cl[0])
@@ -959,7 +1212,7 @@ class TaxTree:
                         glen = self.all_clades[spl[-1]].glen
                     else:
                         glen = self.all_clades[cl].glen
-                        cl = self.all_clades[cl].get_full_name() 
+                        cl = self.all_clades[cl].get_full_name()
                 elif not cl.startswith(tax_lev):
                     if cl in self.all_clades:
                         glen = self.all_clades[cl].glen
@@ -967,7 +1220,7 @@ class TaxTree:
                         glen = 1.0
                     continue
                 cl2ab[cl] = ab
-                cl2glen[cl] = glen 
+                cl2glen[cl] = glen
 
         ret_d = dict([( k, float(v) / tot_ab if tot_ab else 0.0) for k,v in cl2ab.items()])
         ret_r = dict([( k, (v,cl2glen[k],float(v)*cl2glen[k])) for k,v in cl2ab.items()])
@@ -976,45 +1229,55 @@ class TaxTree:
             ret_d[tax_lev+"unclassified"] = 1.0 - sum(ret_d.values())
         return ret_d, ret_r
 
-def map2bbh( mapping_f, input_type = 'bowtie2out', min_alignment_len = None):
+
+def map2bbh(mapping_f, input_type='bowtie2out', min_alignment_len=None):
     if not mapping_f:
         ras, ras_line, inpf = plain_read_and_split, plain_read_and_split_line, sys.stdin
     else:
         if mapping_f.endswith(".bz2"):
-            ras, ras_line, inpf = read_and_split, read_and_split_line, bz2.BZ2File( mapping_f, "r" )
+            ras, ras_line, inpf = read_and_split, read_and_split_line, bz2.BZ2File(mapping_f, "r")
         else:
-            ras, ras_line, inpf = plain_read_and_split,\
-                                  plain_read_and_split_line,\
-                                  open( mapping_f )
+            ras, ras_line, inpf = plain_read_and_split, plain_read_and_split_line, open(mapping_f)
+
+    reads2markers = {}
 
-    reads2markers, reads2maxb = {}, {}
     if input_type == 'bowtie2out':
-        for r,c in ras(inpf):
+        for r, c in ras(inpf):
             reads2markers[r] = c
     elif input_type == 'sam':
         for line in inpf:
             o = ras_line(line)
-            if o[0][0] != '@' and o[2][-1] != '*':
-                if min_alignment_len == None\
-                    or max([int(x.strip('M')) for x in\
-                            re.findall(r'(\d*M)', o[5])]) >= min_alignment_len:
+
+            if ((o[0][0] != '@') and
+                (o[2][-1] != '*') and
+                ((min_alignment_len is None) or
+                 (max([int(x.strip('M')) for x in re.findall(r'(\d*M)', o[5]) if x]) >= min_alignment_len))):
                     reads2markers[o[0]] = o[2]
+
     inpf.close()
+    markers2reads = defdict(set)
 
-    markers2reads = defdict( set )
-    for r,m in reads2markers.items():
-        markers2reads[m].add( r )
+    for r, m in reads2markers.items():
+        markers2reads[m].add(r)
 
     return markers2reads
-    
-    
+
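Both `run_bowtie2` and `map2bbh` filter local-mode alignments with the same CIGAR scan before accepting a hit. A standalone sketch of that length check (function name hypothetical):

```python
import re

def longest_match_block(cigar):
    """Length in bp of the longest aligned 'M' block in a SAM CIGAR string,
    using the same regular-expression scan the --min_alignment_len filter applies."""
    return max(int(x.strip('M')) for x in re.findall(r'(\d*M)', cigar) if x)
```

An alignment is kept when this value is at least `min_alignment_len` (100 by default for the local bowtie2 presets).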
+
 def maybe_generate_biom_file(pars, abundance_predictions):
+    json_key = "MetaPhlAn2"
+
     if not pars['biom']:
         return None
     if not abundance_predictions:
-        return open(pars['biom'], 'w').close()
+        biom_table = biom.Table([], [], [])  # create empty BIOM table
+
+        with open(pars['biom'], 'w') as outfile:
+            biom_table.to_json(json_key, direct_io=outfile)
+
+        return True
 
     delimiter = "|" if len(pars['mdelim']) > 1 else pars['mdelim']
+
     def istip(clade_name):
         end_name = clade_name.split(delimiter)[-1]
         return end_name.startswith("t__") or end_name.endswith("_unclassified")
@@ -1029,7 +1292,7 @@ def maybe_generate_biom_file(pars, abundance_predictions):
     def to_biomformat(clade_name):
         return { 'taxonomy': clade_name.split(delimiter) }
 
-    clades = iter( (abundance, findclade(name)) 
+    clades = iter( (abundance, findclade(name))
                    for (name, abundance) in abundance_predictions
                    if istip(name) )
     packed = iter( ([abundance], clade.get_full_name(), clade.id)
@@ -1042,11 +1305,24 @@ def maybe_generate_biom_file(pars, abundance_predictions):
     data = np.array(data)
     sample_ids = [pars['sample_id']]
     table_id='MetaPhlAn2_Analysis'
-    json_key = "MetaPhlAn2"
 
+
+    # Modified by George Weingart (05/22/2017, george.weingart at mail.com):
+    # use clade names instead of numeric clade IDs, since users reported that
+    # the BIOM output was invalid and the IDs changed from run to run.
     if LooseVersion(biom.__version__) < LooseVersion("2.0.0"):
         biom_table = biom.table.table_factory(
-            data, sample_ids, clade_ids,
+            data,
+            sample_ids,
+            # clade_ids replaced with clade_names (George Weingart, 05/22/2017)
+            clade_names,
             sample_metadata      = None,
             observation_metadata = map(to_biomformat, clade_names),
             table_id             = table_id,
@@ -1057,40 +1333,48 @@ def maybe_generate_biom_file(pars, abundance_predictions):
                            outfile )
     else:  # Below is the biom2 compatible code
         biom_table = biom.table.Table(
-            data, clade_ids, sample_ids,
+            data,
+            # clade_ids replaced with clade_names (George Weingart, 05/22/2017)
+            clade_names,
+            sample_ids,
             sample_metadata      = None,
             observation_metadata = map(to_biomformat, clade_names),
             table_id             = table_id,
             input_is_dense       = True
         )
-        
-        with open(pars['biom'], 'w') as outfile:  
+
+        with open(pars['biom'], 'w') as outfile:
             biom_table.to_json( json_key,
                                 direct_io = outfile )
 
     return True
 
 
-if __name__ == '__main__':
-    pars = read_params( sys.argv )    
-    #if pars['inp'] is None and ( pars['input_type'] is None or  pars['input_type'] == 'automatic'): 
+def metaphlan2():
+    pars = read_params(sys.argv)
+
+    # check if the database is installed, if not then install
+    check_and_install_database(pars['index'], pars['bowtie2_build'],
+                               pars['nproc'])
+
+    if pars['install']:
+        sys.stderr.write('The database is installed\n')
+        return
+
+    # set correct map_pkl and bowtie2db variables
+    pars['mpa_pkl'], pars['bowtie2db'] = set_mapping_arguments(pars['index'])
+
+    #if pars['inp'] is None and ( pars['input_type'] is None or  pars['input_type'] == 'automatic'):
     #    sys.stderr.write( "The --input_type parameter need top be specified when the "
     #                      "input is provided from the standard input.\n"
     #                      "Type metaphlan.py -h for more info\n")
     #    sys.exit(0)
 
-    if pars['bt2_ps'] in [
-                          "sensitive-local",
-                          "very-sensitive-local"
-                          ]\
-        and pars['min_alignment_len'] == None:
+    if (pars['bt2_ps'] in ["sensitive-local", "very-sensitive-local"]) and (pars['min_alignment_len'] is None):
             pars['min_alignment_len'] = 100
-            sys.stderr.write('Warning! bt2_ps is set to local mode, '\
-                             'and min_alignment_len is None, '
-                             'I automatically set min_alignment_len to 100! '\
-                             'If you do not like, rerun the command and set '\
-                             'min_alignment_len to a specific value.\n'
-                            )
+            sys.stderr.write('Warning! bt2_ps is set to local mode and min_alignment_len is None: automatically '
+                             'setting min_alignment_len to 100. If this is not the desired behavior, rerun the '
+                             'command and set min_alignment_len to a specific value.\n')
 
     if pars['input_type'] == 'fastq':
         pars['input_type'] = 'multifastq'
@@ -1102,7 +1386,7 @@ if __name__ == '__main__':
     #    if not pars['input_type']:
     #        sys.stderr.write( "Sorry, I cannot guess the format of the input file, please "
     #                          "specify the --input_type parameter \n" )
-    #        sys.exit(1) 
+    #        sys.exit(1)
 
     # check for the mpa_pkl file
     if not os.path.isfile(pars['mpa_pkl']):
@@ -1110,7 +1394,7 @@ if __name__ == '__main__':
                          "\nExpecting location ${mpa_dir}/db_v20/map_v20_m200.pkl "
                          "\nSelect the file location with the option --mpa_pkl.\n"
                          "Exiting...\n\n")
-        sys.exit(1)           
+        sys.exit(1)
 
     if pars['ignore_markers']:
         with open(pars['ignore_markers']) as ignv:
@@ -1121,19 +1405,21 @@ if __name__ == '__main__':
     no_map = False
     if pars['input_type'] == 'multifasta' or pars['input_type'] == 'multifastq':
         bow = pars['bowtie2db'] is not None
+
         if not bow:
             sys.stderr.write( "No MetaPhlAn BowTie2 database provided\n "
                               "[--bowtie2db options]!\n"
                               "Exiting...\n\n" )
             sys.exit(1)
+
         if pars['no_map']:
             pars['bowtie2out'] = tf.NamedTemporaryFile(dir=pars['tmp_dir']).name
             no_map = True
         else:
             if bow and not pars['bowtie2out']:
                 if pars['inp'] and "," in  pars['inp']:
-                    sys.stderr.write( "Error! --bowtie2out needs to be specified when multiple "
-                                      "fastq or fasta files (comma separated) are provided"  )
+                    sys.stderr.write("Error! --bowtie2out needs to be specified when multiple "
+                                     "fastq or fasta files (comma separated) are provided\n")
                     sys.exit(1)
                 fname = pars['inp']
                 if fname is None:
@@ -1143,30 +1429,26 @@ if __name__ == '__main__':
                 pars['bowtie2out'] = fname + ".bowtie2out.txt"
 
             if os.path.exists( pars['bowtie2out'] ):
-                sys.stderr.write(   
+                sys.stderr.write(
                     "BowTie2 output file detected: " + pars['bowtie2out'] + "\n"
                     "Please use it as input or remove it if you want to "
                     "re-perform the BowTie2 run.\n"
                     "Exiting...\n\n" )
                 sys.exit(1)
 
-        if bow and not all([os.path.exists(".".join([str(pars['bowtie2db']),p]))
-                        for p in ["1.bt2", "2.bt2", "3.bt2","4.bt2","1.bt2","2.bt2"]]):
-            sys.stderr.write( "No MetaPhlAn BowTie2 database found "
-                              "[--bowtie2db option]! "
-                              "(or wrong path provided)."
-                              "\nExpecting location ${mpa_dir}/db_v20/map_v20_m200 "
-                              "\nExiting... " )
+        if bow and not all([os.path.exists(".".join([str(pars['bowtie2db']), p]))
+                            for p in ["1.bt2", "2.bt2", "3.bt2", "4.bt2", "rev.1.bt2", "rev.2.bt2"]]):
+            sys.stderr.write("No MetaPhlAn BowTie2 database found (--index "
+                             "option)!\nExpecting location {}\nExiting..."
+                             .format(DEFAULT_DB_FOLDER))
             sys.exit(1)
 
         if bow:
-            run_bowtie2( pars['inp'], pars['bowtie2out'], pars['bowtie2db'], 
-                         pars['bt2_ps'], pars['nproc'], file_format = pars['input_type'],
-                         exe = pars['bowtie2_exe'],
-                         samout = pars['samout'],
-                         min_alignment_len = pars['min_alignment_len'])
+            run_bowtie2(pars['inp'], pars['bowtie2out'], pars['bowtie2db'],
+                        pars['bt2_ps'], pars['nproc'], file_format=pars['input_type'],
+                        exe=pars['bowtie2_exe'], samout=pars['samout'],
+                        min_alignment_len=pars['min_alignment_len'], read_min_len=pars['read_min_len'])
             pars['input_type'] = 'bowtie2out'
-        
         pars['inp'] = pars['bowtie2out'] # !!!
 
     with open( pars['mpa_pkl'], 'rb' ) as a:
@@ -1176,27 +1458,24 @@ if __name__ == '__main__':
     tree.set_min_cu_len( pars['min_cu_len'] )
     tree.set_stat( pars['stat'], pars['stat_q'], pars['avoid_disqm']  )
 
-    markers2reads = map2bbh( 
-                            pars['inp'], 
-                            pars['input_type'],
-                            pars['min_alignment_len']
-                            )
+    markers2reads = map2bbh(pars['inp'], pars['input_type'],
+                            pars['min_alignment_len'])
     if no_map:
-        os.remove( pars['inp'] )         
+        os.remove( pars['inp'] )
 
     map_out = []
-    for marker,reads in markers2reads.items():
+    for marker,reads in sorted(markers2reads.items(), key=lambda pars: pars[0]):
         if marker not in tree.markers2lens:
             continue
-        tax_seq = tree.add_reads( marker, len(reads), 
+        tax_seq = tree.add_reads( marker, len(reads),
                                   ignore_viruses = pars['ignore_viruses'],
                                   ignore_eukaryotes = pars['ignore_eukaryotes'],
                                   ignore_bacteria = pars['ignore_bacteria'],
                                   ignore_archaea = pars['ignore_archaea'],
                                   )
         if tax_seq:
-            map_out +=["\t".join([r,tax_seq]) for r in reads]
-    
+            map_out +=["\t".join([r,tax_seq]) for r in sorted(reads)]
+
     if pars['output'] is None and pars['output_file'] is not None:
         pars['output'] = pars['output_file']
 
@@ -1205,18 +1484,18 @@ if __name__ == '__main__':
         if pars['t'] == 'reads_map':
             outf.write( "\n".join( map_out ) + "\n" )
         elif pars['t'] == 'rel_ab':
-            cl2ab, _ = tree.relative_abundances( 
+            cl2ab, _ = tree.relative_abundances(
                         pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None )
             outpred = [(k,round(v*100.0,5)) for k,v in cl2ab.items() if v > 0.0]
             if outpred:
                 for k,v in sorted(  outpred, reverse=True,
-                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ): 
-                    outf.write( "\t".join( [k,str(v)] ) + "\n" )   
+                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ):
+                    outf.write( "\t".join( [k,str(v)] ) + "\n" )
             else:
                 outf.write( "unclassified\t100.0\n" )
             maybe_generate_biom_file(pars, outpred)
         elif pars['t'] == 'rel_ab_w_read_stats':
-            cl2ab, rr = tree.relative_abundances( 
+            cl2ab, rr = tree.relative_abundances(
                         pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None )
             outpred = [(k,round(v*100.0,5)) for k,v in cl2ab.items() if v > 0.0]
             totl = 0
@@ -1228,13 +1507,13 @@ if __name__ == '__main__':
                                             "estimated_number_of_reads_from_the_clade" ]) +"\n" )
 
                 for k,v in sorted(  outpred, reverse=True,
-                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ): 
+                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ):
                     outf.write( "\t".join( [    k,
                                                 str(v),
                                                 str(rr[k][0]) if k in rr else "-",
                                                 str(rr[k][1]) if k in rr else "-",
-                                                str(int(round(rr[k][2],0)) if k in rr else "-")   
-                                                ] ) + "\n" )   
+                                                str(int(round(rr[k][2],0)) if k in rr else "-")
+                                                ] ) + "\n" )
                     if "|" not in k:
                         totl += (int(round(rr[k][2],0)) if k in rr else 0)
 
@@ -1252,7 +1531,7 @@ if __name__ == '__main__':
         elif pars['t'] == 'marker_ab_table':
             cl2pr = tree.clade_profiles( pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None  )
             for v in cl2pr.values():
-                outf.write( "\n".join(["\t".join([str(a),str(b/float(pars['nreads'])) if pars['nreads'] else str(b)]) 
+                outf.write( "\n".join(["\t".join([str(a),str(b/float(pars['nreads'])) if pars['nreads'] else str(b)])
                                 for a,b in v if b > 0.0]) + "\n" )
         elif pars['t'] == 'marker_pres_table':
             cl2pr = tree.clade_profiles( pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None  )
@@ -1280,3 +1559,7 @@ if __name__ == '__main__':
             else:
                 sys.stderr.write("Clade "+pars['clade']+" not present at an abundance >"+str(round(pars['min_ab'],2))+"%, "
                                  "so no clade specific markers are reported\n")
+
+
+if __name__ == '__main__':
+    metaphlan2()
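[Editor's note] The index check fixed at the top of this hunk previously listed `1.bt2`/`2.bt2` twice and never tested the reverse-index files; the corrected version covers all six files of a small Bowtie2 index. A minimal standalone sketch of that check (helper name is hypothetical, not part of the diff):

```python
import os

# Suffixes of a complete (small) Bowtie2 index. The pre-2.7.5 code listed
# "1.bt2" and "2.bt2" twice instead of the "rev.*" pair, so a half-built
# index could pass the check.
BT2_SUFFIXES = ["1.bt2", "2.bt2", "3.bt2", "4.bt2", "rev.1.bt2", "rev.2.bt2"]


def bowtie2_index_complete(prefix):
    """Return True only if all six .bt2 files for `prefix` exist."""
    return all(os.path.exists(".".join([prefix, s])) for s in BT2_SUFFIXES)
```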


=====================================
plugin_setup.py
=====================================
--- /dev/null
+++ b/plugin_setup.py
@@ -0,0 +1,76 @@
+# MetaPhlAn2 Plugin
+# Author: Francesco Asnicar
+# This module creates the QIIME2 plugin instance for MetaPhlAn2 and
+# registers functions for profiling single and paired fastq files.
+
+
+from qiime2.plugin import Plugin, Int
+from q2_types.sample_data import SampleData
+from q2_types.per_sample_sequences import SequencesWithQuality
+from q2_types.per_sample_sequences import PairedEndSequencesWithQuality
+from q2_types.feature_table import FeatureTable
+from q2_types.feature_table import Frequency
+import metaphlan2
+
+
+plugin = Plugin(
+    name='metaphlan2',
+    version='2.7.5',
+    website='http://segatalab.cibio.unitn.it/tools/metaphlan2/',
+    user_support_text='metaphlan-users at googlegroups.com',
+    package='metaphlan2',
+    citation_text=('Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, '
+                   'Pasolli E, Tett A, Huttenhower C, Segata N. MetaPhlAn2 '
+                   'for enhanced metagenomic taxonomic profiling. Nature '
+                   'Methods. 2015 Oct 1;12(10):902-3'),
+    description=('MetaPhlAn is a computational tool for profiling the '
+                 'composition of microbial communities (Bacteria, Archaea, '
+                 'Eukaryotes, and Viruses) from metagenomic shotgun '
+                 'sequencing data with species level resolution'),
+    short_description='MetaPhlAn2 for enhanced metagenomic taxonomic profiling'
+)
+
+plugin.methods.register_function(
+    function=metaphlan2._metaphlan2.profile_single_fastq,
+
+    inputs={'raw_data': SampleData[SequencesWithQuality]},
+    input_descriptions={'raw_data': ('metagenomic shotgun sequencing data')},
+
+    parameters={'nproc': Int},
+    parameter_descriptions={'nproc': ('The number of CPUs to use for '
+                                      'parallelizing the mapping, default 1 '
+                                      '(no parallelization)')},
+
+    outputs=[('biom_table', FeatureTable[Frequency])],
+    output_descriptions={'biom_table': ('Table relative abundances of the '
+                                        'species found in the input')},
+
+    name='MetaPhlAn2 taxonomic profiling',
+    description=(('MetaPhlAn is a computational tool for profiling the '
+                  'composition of microbial communities (Bacteria, Archaea, '
+                  'Eukaryotes, and Viruses) from metagenomic shotgun '
+                  'sequencing data with species level resolution'))
+)
+
+plugin.methods.register_function(
+    function=metaphlan2._metaphlan2.profile_paired_fastq,
+
+    inputs={'raw_data': SampleData[PairedEndSequencesWithQuality]},
+    input_descriptions={'raw_data': ('metagenomic shotgun sequencing data')},
+
+    parameters={'nproc': Int},
+    parameter_descriptions={'nproc': 'The number of CPUs to use for '
+                                     'parallelizing the mapping, default 1 '
+                                     '(no parallelization)'},
+
+    outputs=[('biom_table', FeatureTable[Frequency])],
+    output_descriptions={'biom_table': ('TAB-separated text file containing '
+                                        'relative abundances of the species '
+                                        'found in the input')},
+
+    name='MetaPhlAn2 taxonomic profiling',
+    description=('MetaPhlAn is a computational tool for profiling the '
+                 'composition of microbial communities (Bacteria, Archaea, '
+                 'Eukaryotes, and Viruses) from metagenomic shotgun '
+                 'sequencing data with species level resolution')
+)


=====================================
strainphlan.py
=====================================
--- a/strainphlan.py
+++ b/strainphlan.py
@@ -87,62 +87,62 @@ def read_params():
              'extracted from this sample will be used for all other samples.')
 
     p.add_argument(
-        '--mpa_pkl', 
-        required=False, 
-        default=os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200.pkl"), 
-        type=str, 
+        '--mpa_pkl',
+        required=False,
+        default=os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200.pkl"),
+        type=str,
         help='The database of metaphlan3.py.')
 
     p.add_argument(
-        '--output_dir', 
-        required=True, 
-        default='strainer_output', 
+        '--output_dir',
+        required=True,
+        default='strainer_output',
         type=str,
         help='The output directory.')
 
     p.add_argument(
-        '--ifn_markers', 
-        required=False, 
-        default=None, 
+        '--ifn_markers',
+        required=False,
+        default=None,
         type=str,
         help='The marker file in fasta format.')
 
     p.add_argument(
-        '--nprocs_main', 
-        required=False, 
-        default=1, 
+        '--nprocs_main',
+        required=False,
+        default=1,
         type=int,
         help='The number of processors are used for the main threads. '\
              'Default 1.')
 
     p.add_argument(
-        '--nprocs_load_samples', 
-        required=False, 
-        default=None, 
+        '--nprocs_load_samples',
+        required=False,
+        default=None,
         type=int,
         help='The number of processors are used for loading samples. '\
              'Default nprocs_main.')
 
     p.add_argument(
-        '--nprocs_align_clean', 
-        required=False, 
-        default=None, 
+        '--nprocs_align_clean',
+        required=False,
+        default=None,
         type=int,
         help='The number of processors are used for aligning and cleaning markers. '\
              'Default nprocs_main.')
 
     p.add_argument(
-        '--nprocs_raxml', 
-        required=False, 
-        default=None, 
+        '--nprocs_raxml',
+        required=False,
+        default=None,
         type=int,
         help='The number of processors are used for running raxml. '\
              'Default nprocs_main.')
 
     p.add_argument(
-        '--bootstrap_raxml', 
-        required=False, 
-        default=0, 
+        '--bootstrap_raxml',
+        required=False,
+        default=0,
         type=int,
         help='The number of runs for bootstraping when building the tree. '\
              'Default 0.')
@@ -156,8 +156,8 @@ def read_params():
         help='The reference genome file names. They are separated by spaces.')
 
     p.add_argument(
-        '--add_reference_genomes_as_second_samples', 
-        required=False, 
+        '--add_reference_genomes_as_second_samples',
+        required=False,
         dest='add_reference_genomes_as_second_samples',
         action='store_true',
         help='Add reference genomes as second samples. '\
@@ -323,10 +323,10 @@ def read_params():
              'Default 0.05.')
 
     p.add_argument(
-        '--clades', 
+        '--clades',
         nargs='+',
-        required=False, 
-        default=['all'], 
+        required=False,
+        default=['all'],
         type=str,
         help='The clades (space seperated) for which the script will compute '\
                 'the marker alignments in fasta format and the phylogenetic '\
@@ -335,16 +335,16 @@ def read_params():
                 'Default "automatically identify all clades".')
 
     p.add_argument(
-        '--marker_list_fn', 
-        required=False, 
-        default=None, 
+        '--marker_list_fn',
+        required=False,
+        default=None,
         type=str,
         help='The file name containing the list of considered markers. '\
                 'The other markers will be discarded. '\
                 'Default "None".')
     p.add_argument(
-        '--print_clades_only', 
-        required=False, 
+        '--print_clades_only',
+        required=False,
         dest='print_clades_only',
         action='store_true',
         help='Only print the potential clades and stop without building any '\
@@ -354,16 +354,16 @@ def read_params():
     p.set_defaults(print_clades_only=False)
 
     p.add_argument(
-        '--alignment_program', 
-        required=False, 
-        default='muscle', 
+        '--alignment_program',
+        required=False,
+        default='muscle',
         choices=['muscle', 'mafft'],
         type=str,
         help='The alignment program. Default "muscle".')
 
     p.add_argument(
-        '--relaxed_parameters', 
-        required=False, 
+        '--relaxed_parameters',
+        required=False,
         dest='relaxed_parameters',
         action='store_true',
         help='Set marker_in_clade=0.5, sample_in_marker=0.5, '\
@@ -372,8 +372,8 @@ def read_params():
     p.set_defaults(relaxed_parameters=False)
 
     p.add_argument(
-        '--relaxed_parameters2', 
-        required=False, 
+        '--relaxed_parameters2',
+        required=False,
         dest='relaxed_parameters2',
         action='store_true',
         help='Set marker_in_clade=0.2, sample_in_marker=0.2, '\
@@ -382,8 +382,8 @@ def read_params():
     p.set_defaults(relaxed_parameters2=False)
 
     p.add_argument(
-        '--relaxed_parameters3', 
-        required=False, 
+        '--relaxed_parameters3',
+        required=False,
         dest='relaxed_parameters3',
         action='store_true',
         help='Set gap_in_trailing_col=0.9, gap_in_internal_col=0.9, '\
@@ -394,16 +394,16 @@ def read_params():
     p.set_defaults(relaxed_parameters3=False)
 
     p.add_argument(
-        '--keep_alignment_files', 
-        required=False, 
+        '--keep_alignment_files',
+        required=False,
         dest='keep_alignment_files',
         action='store_true',
         help='Keep the alignment files of all markers before cleaning step.')
     p.set_defaults(keep_alignment_files=False)
 
     p.add_argument(
-        '--keep_full_alignment_files', 
-        required=False, 
+        '--keep_full_alignment_files',
+        required=False,
         dest='keep_full_alignment_files',
         action='store_true',
         help='Keep the alignment files of all markers before '\
@@ -413,16 +413,16 @@ def read_params():
     p.set_defaults(keep_full_alignment_files=False)
 
     p.add_argument(
-        '--save_sample2fullfreq', 
-        required=False, 
+        '--save_sample2fullfreq',
+        required=False,
         dest='save_sample2fullfreq',
         action='store_true',
         help='Save sample2fullfreq to a msgpack file sample2fullfreq.msgpack.')
     p.set_defaults(save_sample2fullfreq=False)
 
     p.add_argument(
-        '--use_threads', 
-        required=False, 
+        '--use_threads',
+        required=False,
         action='store_true',
         dest='use_threads',
         help='Use multithreading. Default "Use multiprocessing".')
@@ -581,10 +581,10 @@ def clean_alignment(
                 del_cols.append(df_seq.columns[i])
         if float(len(del_cols)) / len(df_seq.columns) < N_col:
             remove_N_col = True
-            df_seq.drop(del_cols, axis=1, inplace=True)   
-            df_freq.drop(del_cols, axis=1, inplace=True)   
+            df_seq.drop(del_cols, axis=1, inplace=True)
+            df_freq.drop(del_cols, axis=1, inplace=True)
             logger.debug('length after N_col: %d', len(df_seq.columns))
-            
+
         if N_count > 0 or not remove_N_col:
             logger.debug('replace Ns by gaps for all samples')
             for sample in samples:
@@ -606,7 +606,8 @@ def clean_alignment(
 
 
 
-def add_ref_genomes(genome2marker, marker_records, ifn_ref_genomes, tmp_dir):
+def add_ref_genomes(genome2marker, marker_records, ifn_ref_genomes, tmp_dir,
+                    nprocs_main=1):
     ifn_ref_genomes = sorted(list(set(ifn_ref_genomes)))
     logger.debug('add %d reference genomes'%len(ifn_ref_genomes))
     logger.debug('Number of samples: %d'%len(genome2marker))
@@ -636,8 +637,8 @@ def add_ref_genomes(genome2marker, marker_records, ifn_ref_genomes, tmp_dir):
         elif ifn_genome[-4:] == '.fna':
             ifile_genome = open(ifn_genome, 'r')
         else:
-            logger.error('Unknown file type of %s. '%ifn_genome +\
-                            'It should be .fna.bz2, .fna.gz, or .fna!')
+            logger.error('Unknown file type of %s. '%ifn_genome + \
+                         'It should be .fna.bz2, .fna.gz, or .fna!')
             exit(1)
 
         # extract genome contigs
@@ -655,41 +656,37 @@ def add_ref_genomes(genome2marker, marker_records, ifn_ref_genomes, tmp_dir):
 
         ifile_genome.close()
     p1.seek(0)
-                        
+
     # build blastdb
     logger.debug('build blastdb')
     blastdb_prefix = oosp.ftmp('genome_blastn_db_%s'%(random.random()))
     if len(glob.glob('%s*'%blastdb_prefix)):
         logger.error('blastdb exists! Please remove it or rerun!')
         exit(1)
-    oosp.ex('makeblastdb', 
-                args=[
-                        '-dbtype', 'nucl', 
-                        '-title', 'genome_db',
-                        '-out', blastdb_prefix],
+    oosp.ex('makeblastdb',
+                args=['-dbtype', 'nucl',
+                      '-title', 'genome_db',
+                      '-out', blastdb_prefix],
                 in_pipe=p1,
                 verbose=True)
 
     # blast markers against contigs
     logger.debug('blast markers against contigs')
     p1 = SpooledTemporaryFile(dir=tmp_dir)
+
     for marker in unique_markers:
         SeqIO.write(marker_records[marker], p1, 'fasta')
+
     p1.seek(0)
-    blastn_args = [
-                    '-db', blastdb_prefix,
-                    '-outfmt', '6',
-                    '-evalue', '1e-10',
-                    '-max_target_seqs', '1000000000']
-    if args['nprocs_main'] > 1:
-        blastn_args += ['-num_threads', str(args['nprocs_main'])]
-    output = oosp.ex(
-                        'blastn', 
-                        args=blastn_args,
-                        in_pipe=p1,
-                        get_out_pipe=True,
-                        verbose=True)
-    
+    blastn_args = ['-db', blastdb_prefix, '-outfmt', '6', '-evalue', '1e-10',
+                   '-max_target_seqs', '1000000000']
+
+    if nprocs_main > 1:
+        blastn_args += ['-num_threads', str(nprocs_main)]
+
+    output = oosp.ex('blastn', args=blastn_args, in_pipe=p1, get_out_pipe=True,
+                     verbose=True)
+
     #output = output.split('\n')
     for line in output:
         if line.strip() == '':
@@ -789,7 +786,7 @@ def align_clean(args):
 
     sample2seq, sample2freq = clean_alignment(
                                     sample2marker.keys(),
-                                    sample2seq, 
+                                    sample2seq,
                                     sample2freq,
                                     gap_in_trailing_col,
                                     gap_trailing_col_limit,
@@ -805,7 +802,7 @@ def align_clean(args):
 
 def build_tree(
         clade,
-        sample2marker, 
+        sample2marker,
         sample2order,
         clade2num_markers,
         sample_in_clade,
@@ -851,7 +848,7 @@ def build_tree(
                           %clade2num_markers[clade])
 
     # align sequences in each marker
-    markers = set([]) 
+    markers = set([])
     for sample in sample2marker:
         if sample2order[sample] == 'first':
             for marker in sample2marker[sample]:
@@ -890,11 +887,15 @@ def build_tree(
 
     logger.debug('start to align_clean for all markers')
     results = ooSubprocess.parallelize(
-                                       align_clean, 
-                                       args_list, 
+                                       align_clean,
+                                       args_list,
                                        nprocs_align_clean,
                                        use_threads=use_threads)
 
+    if len(results) == 0:
+        logger.debug('No markers found!')
+        return
+
     sample2seqs, sample2freqs = zip(*results)
     sample2fullseq = defaultdict(list)
     sample2fullfreq = defaultdict(list)
@@ -931,7 +932,7 @@ def build_tree(
         logger.debug('all markers were removed, skip this clade!')
         ofile_cladeinfo.write('all markers were removed, skip this clade!\n')
         return
-    
+
     # remove long gaps
     logger.debug('full sequence length before long_gap_length: %d'\
                     %(len(sample2fullseq[sample2fullseq.keys()[0]])))
@@ -978,7 +979,7 @@ def build_tree(
                         %(len(sample2fullseq[sample2fullseq.keys()[0]])))
 
         for i in range(len(marker_pos)):
-            num_del = 0 
+            num_del = 0
             for p in del_pos:
                 if marker_pos[i][1] > p:
                     num_del += 1
@@ -1004,7 +1005,7 @@ def build_tree(
             'number of samples before gap_in_sample: %d\n'\
             %len(sample2fullseq))
     for sample in sample2marker:
-        ratio = float(sample2fullseq[sample].count('-')) / len(sample2fullseq[sample]) 
+        ratio = float(sample2fullseq[sample].count('-')) / len(sample2fullseq[sample])
         gap_ratio = gap_in_sample if (sample2order[sample] == 'first') else second_gap_in_sample
         if ratio > gap_ratio:
             del sample2fullseq[sample]
@@ -1035,7 +1036,7 @@ def build_tree(
                 sequential_gaps.append(sgap)
                 sgap = 0
         all_gaps.append(agap)
-        
+
     ofile_cladeinfo.write('all_gaps:\n' + statistics(all_gaps)[1])
     if sequential_gaps == []:
         sequential_gaps = [0]
@@ -1082,11 +1083,11 @@ def build_tree(
                     id=sample,
                     description='',
                     seq=Seq.Seq(''.join(sample2fullseq[sample]))),
-                ofile, 
+                ofile,
                 'fasta')
 
     # produce tree
-    oosp = ooSubprocess.ooSubprocess() 
+    oosp = ooSubprocess.ooSubprocess()
     #ofn_tree = os.path.join(output_dir, '%s.tree'%clade)
     #oosp.ex('FastTree', args=['-quiet', '-nt', ofn_align], out_fn=ofn_tree)
     ofn_tree = clade + '.tree'
@@ -1096,7 +1097,7 @@ def build_tree(
                     %(os.path.abspath(output_dir), ofn_tree)):
             os.remove(fn)
         raxml_args = [
-                            '-s', os.path.abspath(ofn_align), 
+                            '-s', os.path.abspath(ofn_align),
                             '-w', os.path.abspath(output_dir),
                             '-n', ofn_tree,
                             '-p', '1234'
@@ -1130,7 +1131,7 @@ def build_tree(
 def load_sample(args):
     ifn_sample = args['ifn_sample']
     logger.debug('load %s'%ifn_sample)
-    output_dir = args['output_dir'] 
+    output_dir = args['output_dir']
     ifn_markers = args['ifn_markers']
     clades = args['clades']
     kept_clade = args['kept_clade']
@@ -1192,7 +1193,7 @@ def load_sample(args):
                          remove_clade]
         for marker in remove_marker:
             del marker2seq[marker]
-     
+
         sample_clades = set([])
         for marker in marker2seq:
             clade = db['markers'][marker]['taxon'].split('|')[-1]
@@ -1203,7 +1204,7 @@ def load_sample(args):
 
 
 def load_all_samples(args, sample2order, kept_clade, kept_markers):
-    ifn_samples = args['ifn_samples'] + args['ifn_second_samples'] 
+    ifn_samples = args['ifn_samples'] + args['ifn_second_samples']
     if args['ifn_representative_sample']:
         ifn_samples.append(args['ifn_representative_sample'])
     ifn_samples = sorted(list(set(ifn_samples)))
@@ -1216,9 +1217,9 @@ def load_all_samples(args, sample2order, kept_clade, kept_markers):
             func_args['ifn_sample'] = ifn_sample
             func_args['kept_clade'] = kept_clade
             func_args['kept_markers'] = kept_markers
-            for k in [ 
+            for k in [
                       'output_dir',
-                      'ifn_markers', 'nprocs_load_samples', 
+                      'ifn_markers', 'nprocs_load_samples',
                       'clades',
                       'mpa_pkl',
                       ]:
@@ -1240,7 +1241,7 @@ def load_all_samples(args, sample2order, kept_clade, kept_markers):
             for i in range(len(ifn_samples)):
                 sample = ooSubprocess.splitext(ifn_samples[i])[0]
                 if len(results[i]): # skip samples with no markers
-                    sample2marker[sample] = results[i] 
+                    sample2marker[sample] = results[i]
             return sample2marker
         else:
             all_clades = set([])
@@ -1275,12 +1276,12 @@ def strainer(args):
     if args['keep_full_alignment_files']:
         args['keep_alignment_files'] = True
         args['marker_strip_length'] = 0
-        
+
     if os.path.isfile(args['clades'][0]):
         with open(args['clades'][0], 'r') as ifile:
             args['clades'] = [line.strip() for line in ifile]
 
- 
+
     # check conditions
     ooSubprocess.mkdir(args['output_dir'])
     with open(os.path.join(args['output_dir'], 'arguments.txt'), 'w') as ofile:
@@ -1340,7 +1341,7 @@ def strainer(args):
             del db['markers'][m]['score']
         gc.collect()
         #logger.debug('converted db')
-        
+
         # get clades from db
         logger.info('Get clades from db')
         sing_clades, clade2num_markers, clade2subclades = get_db_clades(db)
@@ -1361,7 +1362,7 @@ def strainer(args):
         sample = ooSubprocess.splitext(ifn)[0]
         if sample not in sample2order:
             sample2order[sample] = 'second'
-    
+
     kept_markers = set([])
     if args['marker_list_fn']:
         with open(args['marker_list_fn'], 'r') as ifile:
@@ -1383,11 +1384,11 @@ def strainer(args):
                      'sample: %d'%len(kept_markers))
         if not kept_markers:
             raise Exception('Number of markers in the representative sample is 0!')
-    
+
     # get clades from samples
     if args['clades'] == ['all']:
         logger.info('Get clades from samples')
-        args['clades'] = load_all_samples(args, 
+        args['clades'] = load_all_samples(args,
                                           sample2order,
                                           kept_clade=None,
                                           kept_markers=kept_markers)
@@ -1395,24 +1396,24 @@ def strainer(args):
     if args['print_clades_only']:
         for c in args['clades']:
             if c.startswith('s__'):
-                print c
+                print(c)
             else:
-                print c, '(%s)'%(','.join(list(clade2subclades[c])))
+                print(c, '(%s)'%(','.join(list(clade2subclades[c]))))
         return
 
     # add reference genomes
     ref2marker = defaultdict(dict)
-    if args['ifn_markers'] != None and args['ifn_ref_genomes'] != None:
+
+    if (args['ifn_markers'] is not None) and (args['ifn_ref_genomes'] is not None):
         logger.info('Add reference genomes')
         marker_records = {}
+
         for rec in SeqIO.parse(open(args['ifn_markers'], 'r'), 'fasta'):
             if rec.id in kept_markers or (not kept_markers):
                 marker_records[rec.id] = rec
-        add_ref_genomes(
-                        ref2marker, 
-                        marker_records, 
-                        args['ifn_ref_genomes'],
-                        args['output_dir'])
+
+        add_ref_genomes(ref2marker, marker_records, args['ifn_ref_genomes'],
+                        args['output_dir'], args['nprocs_main'])
 
         # remove bad reference genomes
         if not kept_markers:
@@ -1438,7 +1439,7 @@ def strainer(args):
         logger.info('Build the tree for %s'%clade)
 
         # load samples and reference genomes
-        sample2marker = load_all_samples(args, 
+        sample2marker = load_all_samples(args,
                                          sample2order,
                                          kept_clade=clade,
                                          kept_markers=kept_markers)
@@ -1481,7 +1482,7 @@ def strainer(args):
         shared_variables.sample2marker = sample2marker
         build_tree(
             clade=clade,
-            sample2marker=sample2marker, 
+            sample2marker=sample2marker,
             sample2order=sample2order,
             clade2num_markers=clade2num_markers,
             sample_in_clade=args['sample_in_clade'],
@@ -1530,9 +1531,11 @@ def check_dependencies(args):
             exit(1)
 
 
-
-
-if __name__ == "__main__":
+def strainphlan():
     args = read_params()
     check_dependencies(args)
     strainer(args)
+
+
+if __name__ == "__main__":
+    strainphlan()
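[Editor's note] A key refactor in this file: `add_ref_genomes()` now receives `nprocs_main` as an explicit parameter instead of reading the module-level `args` dict, which makes the helper safe to call from worker processes that do not share that global. The optional `-num_threads` handling reduces to something like this (helper name is hypothetical):

```python
def build_blastn_args(db_prefix, nprocs=1):
    # Mirrors the blastn invocation in add_ref_genomes(): the thread count
    # is threaded through as an argument, and -num_threads is only added
    # when more than one process is requested.
    argv = ['-db', db_prefix, '-outfmt', '6', '-evalue', '1e-10',
            '-max_target_seqs', '1000000000']
    if nprocs > 1:
        argv += ['-num_threads', str(nprocs)]
    return argv
```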


=====================================
strainphlan_src/add_metadata_tree.py
=====================================
--- a/strainphlan_src/add_metadata_tree.py
+++ b/strainphlan_src/add_metadata_tree.py
@@ -11,6 +11,7 @@ import copy
 import ConfigParser
 import dendropy
 import numpy
+import ipdb
 
 
 def read_params():
@@ -70,6 +71,7 @@ def main(args):
         tree = dendropy.Tree(stream=open(ifn_tree, 'r'), schema='newick')
         for node in tree.leaf_nodes():
             sample = node.get_node_str().strip("'")
+            sample = sample.replace(' ', '_')
             sample = sample.replace(args['string_to_remove'], '')
             prefixes = [prefix for prefix in 
                             ['k__', 'p__', 'c__', 'o__', 
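[Editor's note] The one-line addition above maps spaces back to underscores in tree leaf labels before metadata matching (newick readers such as dendropy may quote labels and render underscores as spaces). The per-leaf cleanup sequence amounts to the following sketch (helper name and sample label are hypothetical):

```python
def normalize_leaf_name(label, string_to_remove=''):
    # Mirrors the per-leaf cleanup in add_metadata_tree.py: strip the
    # quoting added by the newick parser, map spaces back to underscores,
    # then drop the user-supplied suffix before the metadata lookup.
    sample = label.strip("'")
    sample = sample.replace(' ', '_')
    return sample.replace(string_to_remove, '')
```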


=====================================
strainphlan_src/compute_distance.py
=====================================
--- a/strainphlan_src/compute_distance.py
+++ b/strainphlan_src/compute_distance.py
@@ -44,7 +44,7 @@ def read_params():
                    required=False,
                    dest='overwrite', 
                    action='store_true')
-    p.set_defaults(overwrite=True)
+    p.set_defaults(overwrite=False)
 
     return vars(p.parse_args())
 
@@ -56,7 +56,10 @@ def get_dist(seq1, seq2, ignore_gaps):
         exit(1)
 
     abs_dist = 0.0
+    abs_snp = 0
     for i in range(len(seq1)):
+        if seq1[i] != '-' and seq2[i] != '-':
+            abs_snp += 1
         if seq1[i] != seq2[i]:
             if ignore_gaps:
                 if seq1[i] != '-' and seq2[i] != '-':
@@ -67,12 +70,13 @@ def get_dist(seq1, seq2, ignore_gaps):
     abs_sim = len(seq1) - abs_dist
     rel_dist = abs_dist / float(len(seq1))
     rel_sim = 1.0 - rel_dist
-    return abs_dist, rel_dist, abs_sim, rel_sim
+    rel_snp = abs_snp / float(len(seq1))
+    return abs_dist, rel_dist, abs_sim, rel_sim, abs_snp, rel_snp
     
 
-def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps):
+def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps, overwrite):
     ofn_abs_dist = ofn_prefix + '.abs_dist'
-    if os.path.isfile(ofn_abs_dist):
+    if (not overwrite) and os.path.isfile(ofn_abs_dist.replace('.abs_dist', '.rel_dist')):
         print 'File %s exists, skip!'%ofn_abs_dist
         return
     else:
@@ -84,16 +88,22 @@ def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps):
     abs_dist_flat = []
     rel_dist = numpy.zeros((len(recs), len(recs)))
     rel_dist_flat = []
+
     abs_sim = numpy.zeros((len(recs), len(recs)))
     abs_sim_flat = []
     rel_sim = numpy.zeros((len(recs), len(recs)))
     rel_sim_flat = []
 
+    abs_snp = numpy.zeros((len(recs), len(recs)))
+    abs_snp_flat = []
+    rel_snp = numpy.zeros((len(recs), len(recs)))
+    rel_snp_flat = []
+
     for i in range(len(recs)):
-        for j in range(i+1, len(recs)):
-            abs_d, rel_d, abs_s, rel_s = get_dist(recs[i].seq, 
-                                                  recs[j].seq,
-                                                  ignore_gaps)
+        for j in range(i, len(recs)):
+            abs_d, rel_d, abs_s, rel_s, abs_sp, rel_sp = get_dist(recs[i].seq, 
+                                                                recs[j].seq,
+                                                                ignore_gaps)
 
             abs_dist[i][j] = abs_d
             abs_dist[j][i] = abs_d
@@ -111,6 +121,13 @@ def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps):
             rel_sim[j][i] = rel_s
             rel_sim_flat.append(rel_s)
 
+            abs_snp[i][j] = abs_sp
+            abs_snp[j][i] = abs_sp
+            abs_snp_flat.append(abs_sp)
+            
+            rel_snp[i][j] = rel_sp
+            rel_snp[j][i] = rel_sp
+            rel_snp_flat.append(rel_sp)
 
     labels = [rec.name for rec in recs]
     oosp = ooSubprocess()
@@ -183,12 +200,24 @@ def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps):
               '--max_flabel_len', '200'])
     '''       
 
+    ofn_abs_snp = ofn_prefix + '.abs_snp'
+    dist2file(abs_snp, labels, ofn_abs_snp)
+    with open(ofn_abs_snp + '.info', 'w') as ofile:
+        ofile.write(statistics(abs_snp_flat)[1])
+    ofn_rel_snp = ofn_prefix + '.rel_snp'
+    dist2file(rel_snp, labels, ofn_rel_snp)
+    with open(ofn_rel_snp + '.info', 'w') as ofile:
+        ofile.write(statistics(rel_snp_flat)[1])
+ 
+
+
 
 def main(args):
     compute_dist_matrix(
                         args['ifn_alignment'], 
                         args['ofn_prefix'],
-                        args['ignore_gaps']) 
+                        args['ignore_gaps'],
+                        args['overwrite']) 
     
 if __name__ == "__main__":
     args = read_params()


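The hunks above extend get_dist to return two extra values, abs_snp and rel_snp, and widen the pair loop to include the diagonal (j now starts at i). Note that, as written in the hunk, abs_snp counts aligned columns where neither sequence has a gap, not mismatching columns. A self-contained Python 3 sketch of the updated per-pair computation (illustrative only, not the upstream code):

```python
def get_dist(seq1, seq2, ignore_gaps):
    """Pairwise distance over two equal-length aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError('sequences must be aligned to the same length')
    abs_dist = 0.0
    abs_snp = 0  # new in 2.7.5: columns where neither sequence has a gap
    for a, b in zip(seq1, seq2):
        if a != '-' and b != '-':
            abs_snp += 1
        if a != b:
            if ignore_gaps:
                if a != '-' and b != '-':
                    abs_dist += 1
            else:
                abs_dist += 1
    abs_sim = len(seq1) - abs_dist
    rel_dist = abs_dist / float(len(seq1))
    rel_sim = 1.0 - rel_dist
    rel_snp = abs_snp / float(len(seq1))
    return abs_dist, rel_dist, abs_sim, rel_sim, abs_snp, rel_snp
```

For example, get_dist('ACGT-A', 'ACTT-A', True) yields abs_dist 1.0 (one ungapped mismatch) and abs_snp 5 (five ungapped columns out of six).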
=====================================
strainphlan_src/compute_distance_all.py
=====================================
--- /dev/null
+++ b/strainphlan_src/compute_distance_all.py
@@ -0,0 +1,63 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '9 Feb 2015'
+
+import sys
+import os
+ABS_PATH = os.path.abspath(sys.argv[0])
+MAIN_DIR = os.path.dirname(ABS_PATH)
+os.environ['PATH'] += ':%s'%MAIN_DIR
+sys.path.append(MAIN_DIR)
+import argparse as ap
+from Bio import SeqIO, Seq, SeqRecord
+from collections import defaultdict
+import numpy
+from compute_distance import compute_dist_matrix
+from ooSubprocess import parallelize
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--ifn_alignments', nargs='+', required=True, default=None, type=str)
+    p.add_argument('--nprocs', required=True, default=None, type=int)
+    p.add_argument('--count_gaps', 
+                   required=False,
+                   dest='ignore_gaps', 
+                   action='store_false')
+    p.set_defaults(ignore_gaps=True)
+
+
+    return vars(p.parse_args())
+
+
+
+def compute_dist_matrix_wrapper(args):
+    compute_dist_matrix(
+                        args['ifn_alignment'], 
+                        args['ofn_prefix'],
+                        args['ignore_gaps'],
+                        overwrite=True) 
+    
+
+
+def main(args):
+    args_list = []
+    for i in range(len(args['ifn_alignments'])):
+        args_list.append({})
+        args_list[i]['ifn_alignment'] = args['ifn_alignments'][i]
+        args_list[i]['ofn_prefix'] = args['ifn_alignments'][i] 
+        if not args['ignore_gaps']:
+            args_list[i]['ofn_prefix'] += '.count_gaps'
+        args_list[i]['ignore_gaps'] = args['ignore_gaps']
+
+    parallelize(compute_dist_matrix_wrapper, args_list, args['nprocs'])
+
+
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)

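The new compute_distance_all.py driver builds one argument dict per alignment file and hands the list to parallelize from ooSubprocess. A minimal stand-in with the same call shape (worker function, list of per-task dicts, process count), assuming parallelize simply maps the worker over the task list; threads replace processes here only to keep the sketch portable, and the wrapper body is a placeholder:

```python
from multiprocessing.pool import ThreadPool

def parallelize(func, args_list, nprocs):
    # Stand-in for ooSubprocess.parallelize: map func over the task dicts
    # with nprocs workers, preserving input order in the result.
    with ThreadPool(processes=nprocs) as pool:
        return pool.map(func, args_list)

def compute_dist_matrix_wrapper(task):
    # Placeholder for the real wrapper, which calls compute_dist_matrix
    # on one alignment with overwrite=True.
    return task['ifn_alignment'] + '.rel_dist'

tasks = [{'ifn_alignment': fn} for fn in ['a.fasta', 'b.fasta']]
print(parallelize(compute_dist_matrix_wrapper, tasks, 2))
```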

=====================================
strainphlan_src/plot_tree_ete2.py
=====================================
--- /dev/null
+++ b/strainphlan_src/plot_tree_ete2.py
@@ -0,0 +1,54 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '7 Sep 2016'
+
+import sys
+import os
+import argparse 
+from ete2 import Tree, TreeStyle, NodeStyle
+
+def read_params():
+    p = argparse.ArgumentParser()
+    p.add_argument(
+        '--ifn', 
+        required=True, 
+        default=None, 
+        type=str,
+        help='The input tree file.')
+    p.add_argument(
+        '--ofn', 
+        required=False, 
+        default=None, 
+        type=str,
+        help='The output image file (default: the input file plus .png).')
+
+
+    return p.parse_args()
+
+
+def main(args):
+    if args.ofn == None:
+        args.ofn = args.ifn + '.png'
+    ts = TreeStyle()
+    ts.show_leaf_name = True
+    nstyle = NodeStyle()
+    #nstyle["shape"] = "sphere"
+    nstyle["size"] = 5
+    #nstyle["fgcolor"] = "darkred"
+    nstyle["vt_line_width"] = 2
+    nstyle["hz_line_width"] = 2
+
+    #ts.mode = "c"
+    tree = Tree(args.ifn)
+    for n in tree.traverse():
+        n.set_style(nstyle)
+    tree.render(args.ofn, tree_style=ts, dpi=300, units='in') #, h=20, w=20)
+
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)


=====================================
strainphlan_src/sample2markers.py
=====================================
--- a/strainphlan_src/sample2markers.py
+++ b/strainphlan_src/sample2markers.py
@@ -50,9 +50,12 @@ def read_params():
     p.add_argument('--min_read_len', required=False, default=90, type=int)
     p.add_argument('--min_align_score', required=False, default=None, type=int)
     p.add_argument('--min_base_quality', required=False, default=30, type=float)
+    p.add_argument('--min_read_depth', required=False, default=0, type=int)
     p.add_argument('--error_rate', required=False, default=0.01, type=float)
     p.add_argument('--marker2file_ext', required=False, default='.markers', type=str)
     p.add_argument('--sam2file_ext', required=False, default='.sam.bz2', type=str)
+    p.add_argument('--samtools_exe', required=False, default='samtools', type=str)
+    p.add_argument('--bcftools_exe', required=False, default='bcftools', type=str)
     p.add_argument(
         '--verbose', 
         required=False, 
@@ -114,6 +117,7 @@ def sample2markers(
         min_read_len,
         min_align_score,
         min_base_quality,
+        min_read_depth,
         error_rate,
         ifn_markers, 
         index_path,
@@ -121,7 +125,9 @@ def sample2markers(
         sam2file=None,
         marker2file=None, 
         tmp_dir='tmp',
-        quiet=False):
+        quiet=False,
+        samtools_exe='samtools',
+        bcftools_exe='bcftools'):
 
     '''
     Compute the consensus markers in a sample file ifn_sample.
@@ -181,13 +187,18 @@ def sample2markers(
                                 os.path.basename(ifn_sample) + '.bam.sorted')
 
     return sam2markers(
-                       sam_file=sam_pipe, 
-                       ofn_bam_sorted_prefix=ofn_bam_sorted_prefix,
-                       marker2file=marker2file, 
-                       oosp=oosp, 
-                       tmp_dir=tmp_dir,
-                       quiet=quiet)
-
+               sam_file=sam_pipe, 
+               ofn_bam_sorted_prefix=ofn_bam_sorted_prefix,
+               min_align_score=min_align_score,
+               min_base_quality=min_base_quality,
+               min_read_depth=min_read_depth,
+               error_rate=error_rate,
+               marker2file=marker2file, 
+               oosp=oosp, 
+               tmp_dir=tmp_dir,
+               quiet=quiet,
+               samtools_exe=samtools_exe,
+               bcftools_exe=bcftools_exe)
 
 
 def save2file(tmp_file, ofn):
@@ -204,11 +215,14 @@ def sam2markers(
                 ofn_bam_sorted_prefix,
                 min_align_score=None,
                 min_base_quality=30,
+                min_read_depth=0,
                 error_rate=0.01,
                 marker2file=None, 
                 oosp=None, 
                 tmp_dir='tmp',
-                quiet=False):
+                quiet=False,
+                samtools_exe='samtools',
+                bcftools_exe='bcftools'):
     '''
     Compute the consensus markers in a sample from a sam content.
 
@@ -250,7 +264,7 @@ def sam2markers(
                                  stderr=error_pipe)
     # sam to bam
     p2 = oosp.chain(
-                    'samtools', 
+                    samtools_exe, 
                     args=['view', '-bS', '-'], 
                     in_pipe=p1_filtered,
                     stderr=error_pipe)
@@ -260,8 +274,8 @@ def sam2markers(
     for tmp_fn in tmp_fns:
         os.remove(tmp_fn)
     p3 = oosp.chain(
-                    'samtools', 
-                    args=['sort', '-o', '-', ofn_bam_sorted_prefix], 
+                    samtools_exe, 
+                    args=['sort', '-', '-o',  ofn_bam_sorted_prefix], 
                     in_pipe=p2,
                     stderr=error_pipe)
 
@@ -278,7 +292,7 @@ def sam2markers(
                 q = pileupread.alignment.query_qualities[pileupread.query_position]
                 if q >= min_base_quality:
                     pileup[b] += 1
-        if len(pileup):
+        if len(pileup) and sum(pileup.values()) > min_read_depth:
             f = float(max(pileup.values())) / sum(pileup.values())
             p = stats.binom.cdf(max(pileup.values()), sum(pileup.values()), 1.0 - error_rate)
             freq = (f, sum(pileup.values()), p)
@@ -293,14 +307,14 @@ def sam2markers(
     # bam to mpileup
     p3.seek(0)
     p4 = oosp.chain(
-                    'samtools', 
+                    samtools_exe, 
                     args=['mpileup', '-u', '-'], 
                     in_pipe=p3,
                     stderr=error_pipe)
 
     # mpileup to vcf
     p5 = oosp.chain(
-                    'bcftools', 
+                    bcftools_exe, 
                     args=['view', '-c', '-g', '-p', '1.1', '-'], 
                     in_pipe=p4,
                     stderr=error_pipe)
@@ -359,6 +373,7 @@ def run_sample(args_list):
                     min_read_len=args['min_read_len'],
                     min_align_score=args['min_align_score'],
                     min_base_quality=args['min_base_quality'],
+                    min_read_depth=args['min_read_depth'],
                     error_rate=args['error_rate'],
                     ifn_markers=args['ifn_markers'], 
                     index_path=args['index_path'],
@@ -366,7 +381,9 @@ def run_sample(args_list):
                     sam2file=sam2file,
                     marker2file=marker2file, 
                     tmp_dir=args['output_dir'],
-                    quiet=args['quiet'])
+                    quiet=args['quiet'],
+                    samtools_exe=args['samtools_exe'],
+                    bcftools_exe=args['bcftools_exe'])
     else:
         ofn_bam_sorted_prefix = os.path.join(
                             args['output_dir'],
@@ -376,9 +393,12 @@ def run_sample(args_list):
                     ofn_bam_sorted_prefix=ofn_bam_sorted_prefix,
                     min_align_score=args['min_align_score'],
                     min_base_quality=args['min_base_quality'],
+                    min_read_depth=args['min_read_depth'],
                     error_rate=args['error_rate'],
                     marker2file=marker2file,
-                    quiet=args['quiet'])
+                    quiet=args['quiet'],
+                    samtools_exe=args['samtools_exe'],
+                    bcftools_exe=args['bcftools_exe'])
     return 0
 
 

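Functionally, the new --min_read_depth option tightens the per-position pileup filter in sam2markers: the old test `len(pileup)` becomes `len(pileup) and sum(pileup.values()) > min_read_depth` (a strict inequality, so the default 0 still requires at least one qualifying base). A stdlib-only sketch of that consensus filter; consensus_base is a hypothetical helper name, and the binomial p-value the upstream code computes with scipy.stats is omitted:

```python
from collections import Counter

def consensus_base(bases, quals, min_base_quality=30, min_read_depth=0):
    """Return (base, freq, depth) for one pileup column, or None if the
    column fails the base-quality or read-depth filters."""
    # Keep only bases whose quality meets the threshold, as in the diff.
    pileup = Counter(b for b, q in zip(bases, quals) if q >= min_base_quality)
    depth = sum(pileup.values())
    if pileup and depth > min_read_depth:  # strict >, mirroring the hunk
        base, count = pileup.most_common(1)[0]
        return base, count / float(depth), depth
    return None
```

For instance, a column 'AAAT' at quality 40 passes with default settings (consensus 'A', frequency 0.75, depth 4) but is discarded with min_read_depth=4.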

=====================================
strainphlan_tutorial/step2_fastq2sam.sh
=====================================
--- a/strainphlan_tutorial/step2_fastq2sam.sh
+++ b/strainphlan_tutorial/step2_fastq2sam.sh
@@ -4,5 +4,5 @@ for f in $(ls fastqs/*.bz2)
 do
     echo "Running metaphlan2 on ${f}"
     bn=$(basename ${f} | cut -d . -f 1)
-    tar xjfO ${f} | ../metaphlan2.py --bowtie2db ../db_v20/mpa_v20_m200 --mpa_pkl ../db_v20/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
+    tar xjfO ${f} | ../metaphlan2.py --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
 done


=====================================
utils/extract_markers.py
=====================================
--- a/utils/extract_markers.py
+++ b/utils/extract_markers.py
@@ -29,7 +29,7 @@ def extract_markers(mpa_pkl, ifn_markers, clade, ofn_markers):
     for marker in db['markers']:
         if clade == db['markers'][marker]['taxon'].split('|')[-1]:
             markers.add(marker)
-    print 'number of markers', len(markers)
+    print('number of markers {}'.format(len(markers)))
     with open(ofn_markers, 'w') as ofile:
         if ifn_markers:
             for rec in SeqIO.parse(open(ifn_markers, 'r'), 'fasta'):

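Besides the Python 3 print conversion, the marker-selection logic shown in context is worth spelling out: a marker is kept when the last field of its pipe-separated taxon string equals the requested clade. A self-contained sketch with a toy db dict (upstream loads this structure from the mpa pickle; select_markers is a hypothetical helper name):

```python
def select_markers(db, clade):
    # Keep markers whose taxon's most specific level matches the clade.
    return {m for m, info in db['markers'].items()
            if info['taxon'].split('|')[-1] == clade}

db = {'markers': {
    'gi|1234': {'taxon': 'k__Bacteria|s__Escherichia_coli'},
    'gi|5678': {'taxon': 'k__Bacteria|s__Bacteroides_fragilis'},
}}
print('number of markers {}'.format(len(select_markers(db, 's__Escherichia_coli'))))
```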

=====================================
utils/markers_info.txt.bz2 deleted
=====================================
Binary files a/utils/markers_info.txt.bz2 and /dev/null differ


=====================================
utils/metaphlan_hclust_heatmap.py
=====================================
--- a/utils/metaphlan_hclust_heatmap.py
+++ b/utils/metaphlan_hclust_heatmap.py
@@ -209,7 +209,7 @@ def features_dend_panel( fig, Z, Z2, width, lw ):
     ax1 = fig.add_axes([-width,0.0,width,1.0], frameon=False)
     Z2['color_list'] = [c.replace('b','k').replace('x','b') for c in Z2['color_list']]
     mh = max(Z[:,2])
-    sch._plot_dendrogram(Z2['icoord'], Z2['dcoord'], Z2['ivl'], Z.shape[0] + 1, Z.shape[0] + 1, mh, 'right', no_labels=True, color_list=Z2['color_list'])
+    sch._plot_dendrogram(Z2['icoord'], Z2['dcoord'], Z2['ivl'], Z.shape[0] + 1, Z.shape[0] + 1, mh, 'left', no_labels=True, color_list=Z2['color_list'])
     for coll in ax1.collections:
         coll._linewidths = (lw,)
     ax1.set_xticks([])


=====================================
utils/read_fastx.py
=====================================
--- /dev/null
+++ b/utils/read_fastx.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python
+
+
+import sys
+import os
+import bz2
+import gzip
+import glob
+from Bio import SeqIO
+try:
+    import StringIO as uio
+except ImportError:
+    import io as uio
+
+
+p2 = float(sys.version_info[0]) < 3.0
+
+
+def ignore_spaces(l, forced=False):
+    if (l[0] == '@') or (l[0] == '>') or forced:
+        return l.replace(' ', '_')
+
+    return l
+
+
+def fastx(l):
+    if l:
+        if l[0] == '@':
+            return 'fastq'
+
+        if l[0] == '>':
+            return 'fasta'
+
+    sys.stderr.write("\nError, input data has to be in fastq or fasta format\n\n")
+    sys.exit(-1)
+
+
+def fopen(fn):
+    fileName, fileExtension = os.path.splitext(fn)
+
+    if fileExtension == '.bz2':
+        return (bz2.open(fn, "rt") if not p2 else bz2.BZ2File(fn, "r"))
+
+    if fileExtension == '.gz':
+        return gzip.open(fn, "rt")
+
+    return open(fn)
+
+
+def read_and_write_raw_int(fd, min_len=None):
+    fmt = None
+
+    if min_len:
+        r = []
+
+        while True:
+            l = fd.readline()
+
+            if not fmt:
+                fmt = fastx(l)
+                readn = 4 if fmt == 'fastq' else 2
+
+            r.append(l)
+
+            if len(r) == readn:
+                break
+
+        for record in SeqIO.parse(uio.StringIO("".join(r)), fmt):
+            if len(record) >= min_len:
+                record.id = ignore_spaces(record.description, forced=True)
+                record.description = ""
+                SeqIO.write(record, sys.stdout, fmt)
+
+        for record in SeqIO.parse(fd, fmt):
+            if len(record) >= min_len:
+                record.id = ignore_spaces(record.description, forced=True)
+                record.description = ""
+                SeqIO.write(record, sys.stdout, fmt)
+    else:
+        for l in fd:
+            sys.stdout.write(ignore_spaces(l))
+
+
+def read_and_write_raw(fd, opened=False, min_len=None):
+    if opened:
+        read_and_write_raw_int(fd, min_len=min_len)
+    else:
+        with fopen(fd) as inf:
+            read_and_write_raw_int(inf, min_len=min_len)
+
+
+if __name__ == '__main__':
+    min_len = None
+    args = []
+
+    if len(sys.argv) > 1:
+        for l in sys.argv[1:]:
+            if l in ['-h', '--help', '-v', '--version']:
+                sys.stderr.write("Help message for " +
+                                 os.path.basename(sys.argv[0]) + "\n")
+                sys.exit(0)
+
+            if min_len == 'next':
+                min_len = int(l)
+            elif l in ['-l', '--min_len']:
+                min_len = 'next'
+            else:
+                args.append(l)
+
+    if len(args) == 0:
+        read_and_write_raw(sys.stdin, opened=True, min_len=min_len)
+    else:
+        files = []
+
+        for a in args:
+            for f in a.split(','):
+                if os.path.isdir(f):
+                    files += list(glob.iglob(os.path.join(f, "*fastq*")))
+                else:
+                    files += [f]
+
+        for f in files:
+            read_and_write_raw(f, opened=False, min_len=min_len)

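The new utils/read_fastx.py sniffs FASTA vs FASTQ from the first record character ('>' vs '@'), drops reads shorter than --min_len, and replaces spaces in headers with underscores. A stdlib-only sketch of the same filtering, assuming single-line sequences (upstream parses records with Bio.SeqIO, which also handles wrapped FASTA):

```python
import io

def filter_fastx(handle, min_len):
    """Yield records whose sequence is at least min_len long, with spaces
    in the header replaced by underscores."""
    first = handle.readline()
    if first.startswith('@'):
        readn = 4   # FASTQ: 4 lines per record
    elif first.startswith('>'):
        readn = 2   # FASTA: 2 lines per record (single-line sequences)
    else:
        raise ValueError('input must be in fastq or fasta format')
    lines = [first] + handle.read().splitlines(True)
    for i in range(0, len(lines), readn):
        rec = lines[i:i + readn]
        if len(rec[1].rstrip('\n')) >= min_len:
            rec[0] = rec[0].rstrip('\n').replace(' ', '_') + '\n'
            yield ''.join(rec)

fq = io.StringIO('@r1 lane1\nACGTACGT\n+\nFFFFFFFF\n@r2\nACG\n+\nFFF\n')
print(''.join(filter_fastx(fq, 5)))  # r2 (3 bp) is dropped
```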


View it on GitLab: https://salsa.debian.org/med-team/metaphlan2/commit/4f4e0b555a70b29aec2d13ebaed49bc099183e4c
