[med-svn] r19049 - in trunk/packages/mummer/trunk/debian: . patches

Andreas Tille tille at moszumanska.debian.org
Tue Apr 14 12:39:04 UTC 2015


Author: tille
Date: 2015-04-14 12:39:02 +0000 (Tue, 14 Apr 2015)
New Revision: 19049

Added:
   trunk/packages/mummer/trunk/debian/delta-filter.1
   trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch
Modified:
   trunk/packages/mummer/trunk/debian/README.Debian
   trunk/packages/mummer/trunk/debian/changelog
   trunk/packages/mummer/trunk/debian/delta2blocks.1
   trunk/packages/mummer/trunk/debian/mummer.1
   trunk/packages/mummer/trunk/debian/mummer.links
   trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch
   trunk/packages/mummer/trunk/debian/patches/series
Log:
Add some patches from mugsy to include delta-filter -b for reporting duplications


Modified: trunk/packages/mummer/trunk/debian/README.Debian
===================================================================
--- trunk/packages/mummer/trunk/debian/README.Debian	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/README.Debian	2015-04-14 12:39:02 UTC (rev 19049)
@@ -8,7 +8,15 @@
    delta2blocks
    delta2maf
 
-These tools are taken over to Debian's mummer code base to enable
-droping the extra code copy in mugsy.
+Moreover mugsy provides modifications to include delta-filter -b for
+reporting duplications.
 
+These tools and the additional option are taken over to Debian's mummer
+code base to enable droping the extra code copy in mugsy.
+
+Please test and report problems via
+
+    reportbug mummer
+
+
  -- Andreas Tille <tille at debian.org>  Mon, 13 Apr 2015 22:29:27 +0200

Modified: trunk/packages/mummer/trunk/debian/changelog
===================================================================
--- trunk/packages/mummer/trunk/debian/changelog	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/changelog	2015-04-14 12:39:02 UTC (rev 19049)
@@ -2,6 +2,8 @@
 
   * Moved debian/upstream to debian/upstream/metadata
   * Add some patches from mugsy enhancing functionality by adding two tools
+  * Add some patches from mugsy to include delta-filter -b for reporting
+    duplications
   * cme fix dpkg-control
   * Remove SF privacy breach script from docs
   * Fix manpage syntax and add missing manpage

Added: trunk/packages/mummer/trunk/debian/delta-filter.1
===================================================================
--- trunk/packages/mummer/trunk/debian/delta-filter.1	                        (rev 0)
+++ trunk/packages/mummer/trunk/debian/delta-filter.1	2015-04-14 12:39:02 UTC (rev 19049)
@@ -0,0 +1,74 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.46.4.
+.TH DELTA-FILTER "1" "April 2015" "mummer 3.23" "User Commands"
+.SH NAME
+delta-filter \- read a delta alignment file from either nucmer or promer and filters the alignments
+.SH SYNOPSIS
+.B delta\-filter 
+.RI [options]  <deltafile>
+.SH DESCRIPTION
+.TP
+.BR \fB\-1\fR
+1\-to\-1 alignment allowing for rearrangements (intersection of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-g\fR
+1\-to\-1 global alignment not allowing rearrangements
+.TP
+\fB\-h\fR
+Display help information
+.TP
+\fB\-i\fR float
+Set the minimum alignment identity [0, 100], default 0
+.TP
+\fB\-l\fR int
+Set the minimum alignment length, default 0
+.TP
+\fB\-m\fR
+Many\-to\-many alignment allowing for rearrangements (union of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-q\fR
+Maps each position of each query to its best hit in
+the reference, allowing for reference overlaps
+.TP
+\fB\-r\fR
+Maps each position of each reference to its best hit
+in the query, allowing for query overlaps
+.TP
+\fB\-u\fR float      Set the minimum alignment uniqueness, i.e. percent of
+.IP
+the alignment matching to unique reference AND query
+sequence [0, 100], default 0
+.TP
+\fB\-o\fR float
+Set the maximum alignment overlap for \fB\-r\fR and \fB\-q\fR options
+as a percent of the alignment length [0, 100], default 100
+.TP
+\fB\-v\fR
+Print the discarded alignments instead of those that pass filters
+.TP
+\fB\-b\fR
+Maps duplications
+(XOR of \fB\-r\fR and \fB\-q\fR alignments, one or the other but not both)
+.PP
+Reads a delta alignment file from either nucmer or promer and
+filters the alignments based on the command\-line switches, leaving
+only the desired alignments which are output to stdout in the same
+delta format as the input. For multiple switches, order of operations
+is as follows: \fB\-i\fR \fB\-l\fR \fB\-u\fR \fB\-q\fR \fB\-r\fR \fB\-g\fR \fB\-m\fR \fB\-1\fR \fB\-b\fR. If an alignment is excluded
+by a preceding operation, it will be ignored by the succeeding
+operations.
+.PP
+An important distinction between the \fB\-g\fR option and the \fB\-1\fR and \fB\-m\fR
+options is that \fB\-g\fR requires the alignments to be mutually consistent
+in their order, while the \fB\-1\fR and \fB\-m\fR options are not required to be
+mutually consistent and therefore tolerate translocations,
+inversions, etc. In general cases, the \fB\-m\fR option is the best choice,
+however \fB\-1\fR can be handy for applications such as SNP finding which
+require a 1\-to\-1 mapping. Finally, for mapping query contigs, or
+sequencing reads, to a reference genome, use \fB\-q\fR. The duplications
+printed with the \fB\-b\fR option are \fB\-r\fR and \fB\-q\fR alignments that are not
+present in the 1\-to\-1 alignment. These alignments are also the
+difference between the \fB\-1\fR and \fB\-m\fR alignments
+.SH SEE ALSO
+The \fB\-b\fR option originates from mugsy that provides a code copy of mummer
+with additional patches.  The source can be found in SVN
+svn://svn.code.sf.net/p/mugsy/code/trunk

Modified: trunk/packages/mummer/trunk/debian/delta2blocks.1
===================================================================
--- trunk/packages/mummer/trunk/debian/delta2blocks.1	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/delta2blocks.1	2015-04-14 12:39:02 UTC (rev 19049)
@@ -12,17 +12,22 @@
 .RI [options]  <deltafile>  <ref ID>  <qry ID>
 .SH DESCRIPTION
 .TP
-\fB\-h\fR            Display help information
+\fB\-h\fR
+Display help information
 .TP
-\fB\-q\fR            Sort alignments by the query start coordinate
+\fB\-q\fR
+Sort alignments by the query start coordinate
 .TP
-\fB\-r\fR            Sort alignments by the reference start coordinate
+\fB\-r\fR
+Sort alignments by the reference start coordinate
 .TP
-\fB\-w\fR int        Set the screen width \- default is 60
+\fB\-w\fR int
+Set the screen width \- default is 60
 .TP
-\fB\-x\fR int        Set the matrix type \- default is 2 (BLOSUM 62),
-.PP
+\fB\-x\fR int
+Set the matrix type \- default is 2 (BLOSUM 62),
 other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
+.br
 note: only has effect on amino acid alignments
 .PP
 Input is the .delta output of either the "nucmer" or the

Modified: trunk/packages/mummer/trunk/debian/mummer.1
===================================================================
--- trunk/packages/mummer/trunk/debian/mummer.1	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/mummer.1	2015-04-14 12:39:02 UTC (rev 19049)
@@ -24,9 +24,6 @@
 .B combineMUMs
 .RI <RefSequence> <MatchSequences> <GapsFile>
 .br
-.B delta-filter
-.RI [options]  <deltafile>
-.br
 .B dnadiff
 .RI [options]  <reference>  <query>
 or
@@ -147,42 +144,6 @@
     \-\-version       Display the version information and exit
 
 .br
-.B delta-filter
-  \-e float    For switches \-g \-r \-q, keep repeats within e percent
-              of the best LIS score [0, 100], no repeats by default
-  \-g          Global alignment using length*identity weighted LIS.
-              For every reference-query pair, leave only the aligns
-              which form the longest mutually consistent set
-  \-h          Display help information
-  \-i float    Set the minimum alignment identity [0, 100], default 0
-  \-l int      Set the minimum alignment length, default 0
-  \-q          Query alignment using length*identity weighted LIS.
-              For each query, leave only the aligns which form the
-              longest consistent set for the query
-  \-r          Reference alignment using length*identity weighted LIS.
-              For each reference, leave only the aligns which form
-              the longest consistent set for the reference
-  \-u float    Set the minimum alignment uniqueness, i.e. percent of
-              the alignment matching to unique reference AND query
-              sequence [0, 100], default 0
-  \-o float    Set the maximum alignment overlap for \-r and \-q options
-              as a percent of the alignment length [0, 100], default 100
-.PP 
-  Reads a delta alignment file from either nucmer or promer and
-filters the alignments based on the command-line switches, leaving
-only the desired alignments which are output to stdout in the same
-delta format as the input. For multiple switches, order of operations
-is as follows: \-i \-l \-u \-q \-r \-g. If an alignment is excluded by a
-preceding operation, it will be ignored by the succeeding operations
-.PP
-  An important distinction between the \-g option and the \-r \-q
-options is that \-g requires the alignments to be mutually consistent
-in their order, while the \-r \-q options are not required to be
-mutually consistent and therefore tolerate translocations,
-inversions, etc. Thus, \-r provides a one-to-many, \-q a many-to-one,
-\-r \-q a one-to-one local mapping, and \-g a one-to-one global mapping
-of reference and query bases respectively.
-.br
 .B mapview
 .br
   \-h

Modified: trunk/packages/mummer/trunk/debian/mummer.links
===================================================================
--- trunk/packages/mummer/trunk/debian/mummer.links	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/mummer.links	2015-04-14 12:39:02 UTC (rev 19049)
@@ -1,7 +1,6 @@
 usr/share/man/man1/mummer.1 usr/share/man/man1/mummer-annotate.1
 usr/share/man/man1/mummer.1 usr/share/man/man1/combineMUMs.1
 usr/share/man/man1/mummer.1 usr/share/man/man1/dnadiff.1
-usr/share/man/man1/mummer.1 usr/share/man/man1/delta-filter.1
 usr/share/man/man1/mummer.1 usr/share/man/man1/exact-tandems.1
 usr/share/man/man1/mummer.1 usr/share/man/man1/gaps.1
 usr/share/man/man1/mummer.1 usr/share/man/man1/mapview.1

Modified: trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch	2015-04-14 12:39:02 UTC (rev 19049)
@@ -1,9 +1,9 @@
 Author: Andreas Tille <tille at debian.org>
 Last-Update: Mon, 13 Apr 2015 21:50:34 +0200
-Description: The tool mugsy provides a mummer code copy with an additional
- tool delta2maf.  Since the mummer copy in mummy does not feature all the
- mummer patches we rather inject the additional tool right into the Debian
- package.
+Description: The tool mugsy provides an old mummer code copy (version 3.20)
+ with a additional tools delta2maf and delta2blocks.
+ Since the mummer copy in mugsy does not feature all the mummer patches we
+ rather inject the additional tool right into the Debian package.
  .
  The source can be found in
     svn://svn.code.sf.net/p/mugsy/code/trunk

Added: trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch	                        (rev 0)
+++ trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch	2015-04-14 12:39:02 UTC (rev 19049)
@@ -0,0 +1,756 @@
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Mon, 13 Apr 2015 21:50:34 +0200
+Description: The tool mugsy provides an old mummer code copy (version 3.20)
+ with modifications to include delta-filter -b for reporting duplications.
+ Since the mummer copy in mugsy does not feature all the mummer patches we
+ rather inject the additional tool right into the Debian package.
+ .
+ The source can be found in
+    svn://svn.code.sf.net/p/mugsy/code/trunk
+
+--- a/src/tigr/delta.cc
++++ b/src/tigr/delta.cc
+@@ -34,48 +34,6 @@ struct LIS_t
+ };
+ 
+ 
+-struct EdgeletQCmp_t
+-//!< Compares query lo coord
+-{
+-  bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
+-  {
+-    //-- Sorting by score in the event of a tie ensures that when building
+-    //   LIS chains, the highest scoring ones get seen first, thus avoiding
+-    //   overlap problems
+-
+-    if ( i->loQ < j->loQ )
+-      return true;
+-    else if ( i->loQ > j->loQ )
+-      return false;
+-    else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) >
+-              ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) )
+-      return true;
+-    else
+-      return false;
+-  }
+-};
+-
+-
+-struct EdgeletRCmp_t
+-//!< Compares reference lo coord
+-{
+-  bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
+-  {
+-    //-- Sorting by score in the event of a tie ensures that when building
+-    //   LIS chains, the highest scoring ones get seen first, thus avoiding
+-    //   overlap problems
+-
+-    if ( i->loR < j->loR )
+-      return true;
+-    else if ( i->loR > j->loR )
+-      return false;
+-    else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) >
+-              ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) )
+-      return true;
+-    else
+-      return false;
+-  }
+-};
+ 
+ 
+ struct NULLPred_t
+@@ -122,18 +80,6 @@ inline long RevC (const long & coord,
+ }
+ 
+ 
+-//------------------------------------------------------------ ScoreLocal ------
+-inline long ScoreLocal
+-(long scorej, long leni, long lenj,
+- long olap, float idyi, float maxolap)
+-{
+-  if ( olap > 0  &&
+-       ((float)olap / (float)leni * 100.0 > maxolap  ||
+-	(float)olap / (float)lenj * 100.0 > maxolap) )
+-    return -1;
+-  else
+-    return (scorej + (long)((leni - olap) * pow (idyi, 2)));
+-}
+ 
+ 
+ //----------------------------------------------------------- ScoreGlobal ------
+@@ -450,7 +396,7 @@ void DeltaGraph_t::clean()
+   map<string, DeltaNode_t>::iterator i;
+   map<string, DeltaNode_t>::iterator ii;
+   vector<DeltaEdge_t *>::iterator j;
+-  vector<DeltaEdgelet_t *>::iterator k;
++  EdgeletType::iterator k;
+ 
+   //-- For all ref nodes
+   for ( i = refnodes.begin(); i != refnodes.end(); )
+@@ -601,7 +547,7 @@ void DeltaGraph_t::flagGOOD()
+ {
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+   //-- All references
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -625,12 +571,13 @@ void DeltaGraph_t::flagGOOD()
+ //! \brief Intersection of flagQLIS and flagRLIS
+ void DeltaGraph_t::flag1to1(float epsilon, float maxolap)
+ {
++  //std::cerr << "Starting LIS" << std::endl;
+   flagRLIS(epsilon, maxolap, false);
+   flagQLIS(epsilon, maxolap, false);
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+   //-- All references
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -640,16 +587,56 @@ void DeltaGraph_t::flag1to1(float epsilo
+             ei != (mi->second).edges.end(); ++ ei )
+         {
+           //-- All alignments between reference and query
++	  //SVA debugging
++	  //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl;
+           for ( eli  = (*ei)->edgelets.begin();
+                 eli != (*ei)->edgelets.end(); ++ eli )
+             {
+               if ( !(*eli)->isRLIS || !(*eli)->isQLIS )
+                 (*eli)->isGOOD = false;
+             }
++	  //std::cerr << "Done" << std::endl;
+         }
+     }
+ }
+ 
++//---------------------------------------------------------------- flagDup ----
++//! \brief Exclusive OR of flagQLIS and flagRLIS
++void DeltaGraph_t::flagDup(float epsilon, float maxolap)
++{
++  //std::cerr << "Starting LIS" << std::endl;
++  flagRLIS(epsilon, maxolap, false);
++  flagQLIS(epsilon, maxolap, false);
++
++  map<string, DeltaNode_t>::const_iterator mi;
++  vector<DeltaEdge_t *>::const_iterator ei;
++  EdgeletType::iterator eli;
++
++  //-- All references
++  for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
++    {
++      //-- All queries matching this reference
++      for ( ei  = (mi->second).edges.begin();
++            ei != (mi->second).edges.end(); ++ ei )
++        {
++          //-- All alignments between reference and query
++	  //SVA debugging
++	  //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl;
++          for ( eli  = (*ei)->edgelets.begin();
++                eli != (*ei)->edgelets.end(); ++ eli )
++            {
++              if ( (!(*eli)->isRLIS || !(*eli)->isQLIS)
++		   && ((*eli)->isRLIS || (*eli)->isQLIS)){
++		//(*eli)->isGOOD = true;
++	      }
++	      else{
++                (*eli)->isGOOD = false;
++	      }
++            }
++	  //std::cerr << "Done" << std::endl;
++        }
++    }
++}
+ 
+ //---------------------------------------------------------------- flagMtoM ----
+ //! \brief Union of flagQLIS and flagRLIS
+@@ -660,7 +647,7 @@ void DeltaGraph_t::flagMtoM(float epsilo
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+   //-- All references
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -700,11 +687,11 @@ void DeltaGraph_t::flagGLIS (float epsil
+   long i, j, n;
+   long olap, olapQ, olapR, len, lenQ, lenR, score, diff;
+ 
+-  vector<DeltaEdgelet_t *> edgelets;
++  EdgeletType edgelets;
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+ 
+   //-- For each reference sequence
+@@ -861,25 +848,29 @@ void DeltaGraph_t::flagQLIS (float epsil
+   long i, j, n;
+   long olap, leni, lenj, score, diff;
+ 
+-  vector<DeltaEdgelet_t *> edgelets;
++  EdgeletType edgelets;
++  edgelets.reserve(30000);
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
+-
++  EdgeletType::iterator eli;
++  EdgeletType::iterator  eli_end;
+ 
+   //-- For each query sequence
+   for ( mi = qrynodes.begin(); mi != qrynodes.end(); ++ mi )
+     {
+-      //-- Collect all the good edgelets
+-      edgelets.clear();
+       for ( ei  = (mi->second).edges.begin();
+             ei != (mi->second).edges.end(); ++ ei )
++	{
++	  //-- Collect all the good edgelets
++	  edgelets.clear();
++	  edgelets.reserve(30000);
++	  lis_size=0;
++	  eli_end = (*ei)->edgelets.end();
+         for ( eli  = (*ei)->edgelets.begin();
+-              eli != (*ei)->edgelets.end(); ++ eli )
++		eli != eli_end; ++ eli )
+           if ( (*eli)->isGOOD )
+             edgelets.push_back (*eli);
+-
+       //-- Resize and initialize
+       n = edgelets.size();
+       if ( n > lis_size )
+@@ -895,6 +886,7 @@ void DeltaGraph_t::flagQLIS (float epsil
+ 
+       //-- Continue until all equivalent repeats are extracted
+       vector<long> allbest;
++	  allbest.reserve(30000);
+       do
+         {
+           //-- Dynamic
+@@ -956,8 +948,9 @@ void DeltaGraph_t::flagQLIS (float epsil
+           if ( ! (*eli)->isQLIS )
+             (*eli)->isGOOD = false;
+     }
+-
++    }
+   free (lis);
++  
+ }
+ 
+ 
+@@ -982,25 +975,30 @@ void DeltaGraph_t::flagRLIS (float epsil
+   long i, j, n;
+   long olap, leni, lenj, score, diff;
+ 
+-  vector<DeltaEdgelet_t *> edgelets;
++  EdgeletType edgelets;
++  edgelets.reserve(30000);
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
+-
++  EdgeletType::iterator eli;
++  EdgeletType::iterator  eli_end;
+ 
+   //-- For each reference sequence
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+     {
+-      //-- Collect all the good edgelets
+-      edgelets.clear();
++      //-- For each query matching this reference
+       for ( ei  = (mi->second).edges.begin();
+             ei != (mi->second).edges.end(); ++ ei )
++	{
++	  //-- Collect all the good edgelets
++	  edgelets.clear();
++	  edgelets.reserve(30000);
++	  lis_size=0;
++	  eli_end = (*ei)->edgelets.end();
+         for ( eli  = (*ei)->edgelets.begin();
+-              eli != (*ei)->edgelets.end(); ++ eli )
++		eli != eli_end; ++ eli )
+           if ( (*eli)->isGOOD )
+             edgelets.push_back (*eli);
+-
+       //-- Resize
+       n = edgelets.size();
+       if ( n > lis_size )
+@@ -1016,6 +1014,7 @@ void DeltaGraph_t::flagRLIS (float epsil
+ 
+       //-- Continue until all equivalent repeats are extracted
+       vector<long> allbest;
++	  allbest.reserve(30000);
+       do
+         {
+           //-- Dynamic
+@@ -1072,11 +1071,14 @@ void DeltaGraph_t::flagRLIS (float epsil
+         for ( i = allbest[beg]; i >= 0  &&  i < n; i = lis[i].from )
+           lis[i].a->isRLIS = true;
+ 
+-      if ( flagbad )
+-        for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli )
++	  if ( flagbad ){
++	    eli_end = edgelets.end();
++	    for ( eli = edgelets.begin(); eli != eli_end; ++ eli )
+           if ( ! (*eli)->isRLIS )
+             (*eli)->isGOOD = false;
+     }
++	}
++    }
+ 
+   free (lis);
+ }
+@@ -1095,7 +1097,7 @@ void DeltaGraph_t::flagScore (long minle
+ {
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+     for ( ei  = (mi->second).edges.begin();
+@@ -1128,11 +1130,11 @@ void DeltaGraph_t::flagUNIQ (float minun
+ {
+   long i, uniq, len;
+ 
+-  vector<DeltaEdgelet_t *> edgelets;
++  EdgeletType edgelets;
+ 
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::iterator eli;
++  EdgeletType::iterator eli;
+ 
+ 
+   //-- For each reference sequence
+@@ -1141,6 +1143,10 @@ void DeltaGraph_t::flagUNIQ (float minun
+   unsigned char * ref_cov = NULL;
+   for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+     {
++      /*
++	//SVA treat each query, ref pair as distinct
++	//TODO make optional so can use this way for complete genomes
++	//and other way for draft
+       //-- Reset the reference coverage array
+       ref_len = (mi->second).len;
+       if ( ref_len > ref_size )
+@@ -1150,11 +1156,20 @@ void DeltaGraph_t::flagUNIQ (float minun
+         }
+       for ( i = 1; i <= ref_len; ++ i )
+         ref_cov[i] = 0;
+-
++      */
+       //-- Collect all the good edgelets
+       edgelets.clear();
+       for ( ei  = (mi->second).edges.begin();
+-            ei != (mi->second).edges.end(); ++ ei )
++            ei != (mi->second).edges.end(); ++ ei ){
++	ref_len = (mi->second).len;
++	if ( ref_len > ref_size )
++	  {
++	    ref_cov = (unsigned char *) Safe_realloc (ref_cov, ref_len + 1);
++	    ref_size = ref_len;
++	  }
++	for ( i = 1; i <= ref_len; ++ i )
++	  ref_cov[i] = 0;
++	
+         for ( eli  = (*ei)->edgelets.begin();
+               eli != (*ei)->edgelets.end(); ++ eli )
+           if ( (*eli)->isGOOD )
+@@ -1166,7 +1181,7 @@ void DeltaGraph_t::flagUNIQ (float minun
+                 if ( ref_cov[i] < UCHAR_MAX )
+                   ref_cov[i] ++;
+             }
+-
++      }
+       //-- Calculate the uniqueness of each edgelet
+       for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli )
+         {
+@@ -1312,16 +1327,17 @@ void DeltaGraph_t::loadSequences ()
+ //! \brief Outputs the contents of the graph as a deltafile
+ //!
+ //! \param out The output stream to write to
++//! \param inverse Print the discarded alignments, !isGood
+ //! \return The output stream
+ //!
+-ostream & DeltaGraph_t::outputDelta (ostream & out)
++ostream & DeltaGraph_t::outputDelta (ostream & out, bool inverse)
+ {
+   bool header;
+   long s1, e1, s2, e2;
+  
+   map<string, DeltaNode_t>::const_iterator mi;
+   vector<DeltaEdge_t *>::const_iterator ei;
+-  vector<DeltaEdgelet_t *>::const_iterator eli;
++  EdgeletType::const_iterator eli;
+  
+   //-- Print the file header
+   cout
+@@ -1338,9 +1354,40 @@ ostream & DeltaGraph_t::outputDelta (ost
+           for ( eli  = (*ei)->edgelets.begin();
+                 eli != (*ei)->edgelets.end(); ++ eli )
+             {
+-              if ( ! (*eli)->isGOOD )
+-                continue;
++              if ( ! (*eli)->isGOOD){
++		//Check if printing discard alns
++		if(inverse){
++		  //-- Print the sequence header
++		  if ( ! header )
++		    {
++		      cout
++			<< '>'
++			<< *((*ei)->refnode->id) << ' '
++			<< *((*ei)->qrynode->id) << ' '
++			<< (*ei)->refnode->len << ' '
++			<< (*ei)->qrynode->len << '\n';
++		      header = true;
++		    }
++		  //-- Print the alignment
++		  s1 = (*eli)->loR;
++		  e1 = (*eli)->hiR;
++		  s2 = (*eli)->loQ;
++		  e2 = (*eli)->hiQ;
++		  if ( (*eli)->dirR == REVERSE_DIR )
++		    Swap (s1, e1);
++		  if ( (*eli)->dirQ == REVERSE_DIR )
++		    Swap (s2, e2);
+  
++		  cout
++		    << s1 << ' ' << e1 << ' ' << s2 << ' ' << e2 << ' '
++		    << (*eli)->idyc << ' '
++		    << (*eli)->simc << ' '
++		    << (*eli)->stpc << '\n'
++		    << (*eli)->delta;
++		}
++	      }
++	      else{
++		if(!inverse){
+               //-- Print the sequence header
+               if ( ! header )
+                 {
+@@ -1371,5 +1418,7 @@ ostream & DeltaGraph_t::outputDelta (ost
+             }
+         }
+     }
++	}
++    }
+   return out;
+ }
+--- a/src/tigr/delta-filter.cc
++++ b/src/tigr/delta-filter.cc
+@@ -38,6 +38,8 @@ float          OPT_MinIdentity  = 0.0;
+ float          OPT_MinUnique    = 0.0;       // minimum %unique
+ float          OPT_MaxOverlap   = 100.0;     // maximum olap as % of align len
+ float          OPT_Epsilon      = -1.0;      // negligible alignment score
++float          OPT_Inverse      = false;      // output the discarded alignments, !isGood
++float          OPT_Dup          = false;      // output apparent dups
+ 
+ 
+ //========================================================== Fuction Decs ====//
+@@ -94,8 +96,12 @@ int main(int argc, char ** argv)
+   if ( OPT_1to1 )
+     graph.flag1to1(OPT_Epsilon, OPT_MaxOverlap);
+ 
++  //-- Dups
++  if ( OPT_Dup )
++    graph.flagDup(OPT_Epsilon, OPT_MaxOverlap);
++
+   //-- Output the filtered delta file
+-  graph.outputDelta(cout);
++  graph.outputDelta(cout,OPT_Inverse);
+ 
+   return EXIT_SUCCESS;
+ }
+@@ -110,7 +116,7 @@ void ParseArgs(int argc, char ** argv)
+   optarg = NULL;
+   
+   while ( !errflg  &&
+-         ((ch = getopt(argc, argv, "e:ghi:l:o:qru:m1")) != EOF) )
++         ((ch = getopt(argc, argv, "bve:ghi:l:o:qru:m1")) != EOF) )
+     switch (ch)
+       {
+       case 'e':
+@@ -158,6 +164,14 @@ void ParseArgs(int argc, char ** argv)
+         OPT_1to1 = true;
+         break;
+ 
++      case 'v':
++	OPT_Inverse = true;
++	break;
++
++      case 'b':
++	OPT_Dup = true;
++	break;
++
+       default:
+         errflg ++;
+       }
+@@ -225,6 +239,9 @@ void PrintHelp(const char * s)
+     << "-o float      Set the maximum alignment overlap for -r and -q options\n"
+     << "              as a percent of the alignment length [0, 100], default "
+     << OPT_MaxOverlap << endl
++    << "-v            Print the discarded alignments instead of those that pass filters\n"
++    << "-b            Maps duplications\n"
++    << "              (XOR of -r and -q alignments, one or the other but not both)\n"
+     << endl;
+ 
+   cerr
+@@ -232,7 +249,7 @@ void PrintHelp(const char * s)
+     << "filters the alignments based on the command-line switches, leaving\n"
+     << "only the desired alignments which are output to stdout in the same\n"
+     << "delta format as the input. For multiple switches, order of operations\n"
+-    << "is as follows: -i -l -u -q -r -g -m -1. If an alignment is excluded\n"
++    << "is as follows: -i -l -u -q -r -g -m -1 -b. If an alignment is excluded\n"
+     << "by a preceding operation, it will be ignored by the succeeding\n"
+     << "operations.\n"
+     << "  An important distinction between the -g option and the -1 and -m\n"
+@@ -242,7 +259,10 @@ void PrintHelp(const char * s)
+     << "inversions, etc. In general cases, the -m option is the best choice,\n"
+     << "however -1 can be handy for applications such as SNP finding which\n"
+     << "require a 1-to-1 mapping. Finally, for mapping query contigs, or\n"
+-    << "sequencing reads, to a reference genome, use -q.\n"
++    << "sequencing reads, to a reference genome, use -q. The duplications\n"
++    << "printed with the -b option are -r and -q alignments that are not\n"
++    << "present in the 1-to-1 alignment. These alignments are also the\n"
++    << "difference between the -1 and -m alignments\n"
+     << endl;
+ 
+   return;
+--- a/src/tigr/delta.hh
++++ b/src/tigr/delta.hh
+@@ -350,6 +350,11 @@ struct DeltaEdgelet_t
+   long loQ, hiQ, loR, hiR;    //!< alignment bounds
+   int frmQ, frmR;             //!< reading frame
+ 
++  unsigned int v1;
++  unsigned int v2;
++  void * SEQAN_idx;
++  // SEQAN_idx; //mapping to edge/list of vertices in segment graph in Seqan library. Testing by SVA
++
+   std::string delta;          //!< delta information
+   std::vector<SNP_t *> snps;  //!< snps for this edgelet
+ 
+@@ -395,6 +400,7 @@ struct DeltaEdgelet_t
+ };
+ 
+ 
++typedef std::vector<DeltaEdgelet_t *> EdgeletType;
+ 
+ //===================================================== DeltaEdge_t ============
+ struct DeltaEdge_t
+@@ -402,14 +408,14 @@ struct DeltaEdge_t
+ {
+   DeltaNode_t * refnode;      //!< the adjacent reference node
+   DeltaNode_t * qrynode;      //!< the adjacent query node
+-  std::vector<DeltaEdgelet_t *> edgelets;  //!< the set of individual alignments
++  EdgeletType edgelets;  //!< the set of individual alignments
+ 
+   DeltaEdge_t ( )
+   { refnode = qrynode = NULL; }
+ 
+   ~DeltaEdge_t ( )
+   {
+-    std::vector<DeltaEdgelet_t *>::iterator i;
++    EdgeletType::iterator i;
+     for ( i = edgelets . begin( ); i != edgelets . end( ); ++ i )
+       delete (*i);
+   }
+@@ -477,6 +483,7 @@ public:
+ 
+   void flagGOOD();
+   void flag1to1(float epsilon = -1, float maxolap = 100.0);
++  void flagDup(float epsilon = -1, float maxolap = 100.0);
+   void flagMtoM(float epsilon = -1, float maxolap = 100.0);
+   void flagGLIS(float epsilon = -1);
+   void flagQLIS(float epsilon = -1,
+@@ -489,7 +496,64 @@ public:
+   void flagUNIQ(float minuniq);
+ 
+   void loadSequences();
+-  std::ostream & outputDelta(std::ostream & out);
++  std::ostream & outputDelta(std::ostream & out, bool inverse);
++};
++
++//------------------------------------------------------------ ScoreLocal ------
++inline long ScoreLocal
++(long scorej, long leni, long lenj,
++ long olap, float idyi, float maxolap)
++{
++  if ( olap > 0  &&
++       ((float)olap / (float)leni * 100.0 > maxolap  ||
++	(float)olap / (float)lenj * 100.0 > maxolap) )
++    return -1;
++  else
++    //return (scorej + (long)((leni - olap) * pow (idyi, 2)));
++    return (scorej + (long)((leni - olap) * (idyi*idyi)));
++}
++
++struct EdgeletQCmp_t
++//!< Compares query lo coord
++{
++  bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
++  {
++    //-- Sorting by score in the event of a tie ensures that when building
++    //   LIS chains, the highest scoring ones get seen first, thus avoiding
++    //   overlap problems
++
++    if ( i->loQ < j->loQ )
++      return true;
++    else if ( i->loQ > j->loQ )
++      return false;
++    else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) >
++              ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) )
++      return true;
++    else
++      return false;
++  }
++};
++
++
++struct EdgeletRCmp_t
++//!< Compares reference lo coord
++{
++  bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
++  {
++    //-- Sorting by score in the event of a tie ensures that when building
++    //   LIS chains, the highest scoring ones get seen first, thus avoiding
++    //   overlap problems
++
++    if ( i->loR < j->loR )
++      return true;
++    else if ( i->loR > j->loR )
++      return false;
++    else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) >
++              ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) )
++      return true;
++    else
++      return false;
++  }
+ };
+ 
+ #endif // #ifndef __DELTA_HH
+--- a/src/tigr/postnuc.cc
++++ b/src/tigr/postnuc.cc
+@@ -148,7 +148,7 @@ void extendClusters
+ void flushAlignments
+      (vector<Alignment> & Alignments,
+       const FastaRecord * Af, const FastaRecord * Bf,
+-      FILE * DeltaFile);
++      FILE * DeltaFile, int filter);
+ 
+ void flushSyntenys
+      (vector<Synteny> & Syntenys, FILE * ClusterFile);
+@@ -174,7 +174,7 @@ void parseDelta
+ void processSyntenys
+      (vector<Synteny> & Syntenys,
+       FastaRecord * Af, long int As,
+-      FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile);
++      FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter);
+ 
+ inline long int revC
+      (long int Coord, long int Len);
+@@ -232,6 +232,8 @@ int main
+   setBreakLen ( 200 );
+   setBanding ( 0 );
+ 
++  int FILTER=0; //perform automatic delta-filter. SVA. trying to save time/space on large alignments
++
+   //-- Parse the command line arguments
+   {
+     optarg = NULL;
+@@ -268,6 +270,9 @@ int main
+ 	  TO_SEQEND = true;
+ 	  break;
+ 
++	case 'f' :
++	  FILTER = 1;
++
+ 	default :
+ 	  errflg ++;
+ 	}
+@@ -420,7 +425,7 @@ int main
+ 		{
+ 		  //-- New B sequence header, process all the old synteny's
+ 		  processSyntenys (Syntenys, Af, As,
+-				   QryFile, ClusterFile, DeltaFile);
++				   QryFile, ClusterFile, DeltaFile, FILTER);
+ 		}
+ 	      
+ 	      strcpy (IdA, Af[Seqi].Id);
+@@ -476,7 +481,7 @@ int main
+     if ( CurrSp->clusters.rbegin( )->matches.empty( ) )
+       CurrSp->clusters.pop_back( );
+   
+-  processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile);
++  processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile, FILTER);
+   fclose (QryFile);
+ 
+   //-- Free the reference sequences
+@@ -703,7 +708,7 @@ bool extendForward
+ 
+ void extendClusters
+      (vector<Cluster> & Clusters,
+-      const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile)
++      const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile, int filter)
+ 
+      //  Connect all the matches in every cluster between sequences A and B.
+      //  Also, extend alignments off of the front and back of each cluster to
+@@ -864,7 +869,7 @@ void extendClusters
+ #endif
+ 
+   //-- Output the alignment data to the delta file
+-  flushAlignments (Alignments, Af, Bf, DeltaFile);
++  flushAlignments (Alignments, Af, Bf, DeltaFile, filter);
+ 
+   if ( Brev != NULL )
+     free (Brev);
+@@ -878,7 +883,7 @@ void extendClusters
+ void flushAlignments
+      (vector<Alignment> & Alignments,
+       const FastaRecord * Af, const FastaRecord * Bf,
+-      FILE * DeltaFile)
++      FILE * DeltaFile, int filter)
+ 
+      //  Simply output the delta information stored in Alignments to the
+      //  given delta file. Free the memory used by Alignments once the
+@@ -887,6 +892,11 @@ void flushAlignments
+ {
+   vector<Alignment>::iterator Ap;       // alignment pointer
+   vector<long int>::iterator Dp;             // delta pointer
++  //filter graph here
++
++  //DeltaGraph_t graph;
++  //srand(1);
++  //graph.flagMtoM(OPT_Epsilon, OPT_MaxOverlap);
+ 
+   fprintf (DeltaFile, ">%s %s %ld %ld\n", Af->Id, Bf->Id, Af->len, Bf->len);
+ 
+@@ -1348,7 +1358,7 @@ void parseDelta
+ 
+ void processSyntenys
+      (vector<Synteny> & Syntenys, FastaRecord * Af, long int As,
+-      FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile)
++      FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter)
+ 
+      //  For each syntenic region with clusters, read in the B sequence and
+      //  extend the clusters to expand total alignment coverage. Only should
+@@ -1393,7 +1403,7 @@ void processSyntenys
+ 
+       //-- Extend clusters and create the alignment information
+       CurrSp->Bf.len = Bf.len;
+-      extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile);
++      extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile,filter);
+     }
+ 
+   //-- Create the cluster information

Modified: trunk/packages/mummer/trunk/debian/patches/series
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/series	2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/patches/series	2015-04-14 12:39:02 UTC (rev 19049)
@@ -5,3 +5,4 @@
 hardening.patch
 spelling.patch
 addition_from_mugsy.patch
+addition_from_report_duplicates.patch




More information about the debian-med-commit mailing list