[med-svn] [Git][med-team/e-mem][upstream] New upstream version 1.0.1

Andreas Tille gitlab at salsa.debian.org
Thu Feb 15 15:48:38 UTC 2018


Andreas Tille pushed to branch upstream at Debian Med / e-mem


Commits:
56f3e8c3 by Andreas Tille at 2018-02-15T16:41:55+01:00
New upstream version 1.0.1
- - - - -


4 changed files:

- Makefile
- README.md
- e-mem.cpp
- file.h


Changes:

=====================================
Makefile
=====================================
--- a/Makefile
+++ b/Makefile
@@ -5,8 +5,8 @@ endif
 CC        = g++
 EXEC      = e-mem
 CFLAGS    = -Wall -Wextra -Wunused -mpopcnt -std=gnu++0x -fopenmp 
-CDEBUG    = -g -ggdb -DDEBUG 
-CPROF    = -g -ggdb -DDEBUG -pg 
+CDEBUG    = -g -ggdb -gdwarf-3 -DDEBUG 
+CPROF    = -g -ggdb -gdwarf-3 -DDEBUG -pg 
 COPTIMIZE = -Wuninitialized -O3 -fomit-frame-pointer
 CLIBS     = -lm
 


=====================================
README.md
=====================================
--- a/README.md
+++ b/README.md
@@ -15,8 +15,30 @@ E-MEM is an efficient MEM computation program for large genomes which can be use
 
    OUTPUT:
    
-                 stdout  a list of exact matches
-
+   The E-MEM program prints maximal exact matches on standard output. The output format varies depending on the command line options used. In the default mode 3-column output is printed. The option -b allows printing of matches in both forward and reverse complement directions. Below is the description of the output format with respect to the example included with e-mem program. For each query sequence, the sequence name or ID is reported on the first line followed by a '>' character. The sequence name will be reported even if there are no MEMs found for this sequence. For example, the query sequence in 'Reverse' reports sequence name with no matching MEMs. Note that, for each query sequence, the reverse complemented MEMs immediately follow the forward MEMs. For each match, the 3-columns list the start position in the reference sequence, the start position in the query sequence, and the length of the match respectively. For reverse matches, the positions are reported relative to the reverse query sequence. The option -c is used to report reverse match positions with respect to forward query sequence.
+   
+             > gi|5524211|gb|AAD44166.1|
+               4              1             51
+               8              1             51
+              12              1             51
+              16              1             51
+              20              1             51
+              24              1             51
+              28              1             51
+               1              2             50
+          > gi|5524211|gb|AAD44166.1| Reverse
+                    
+The 3-column output is sufficient for reporting matches between one reference sequence and one/more query sequences. For multiple reference sequences, 4-column output is desired. E-MEM provides an option -F to print 4-column output which also prints the reference sequence name/ID for each match. Below is a dummy example of 4-column output. The first column now indicates the name/ID of the reference sequence followed by the 3-column output format as described above.
+
+          > gi|5524211|gb|AAD44166.1|
+             ref1                      4              1              51
+             ref1                      8              1              51
+             ref2                      12             1              51
+             ref2                      16             1              51
+             ref3                      20             1              51
+             ref3                      24             1              51
+                          
+             
 ##-- INSTALLATION --
 
 After extracting the files into the desired installation directory,
@@ -69,7 +91,7 @@ For example, in order to use NUCMER (all-vs-all comparison of nucleotide sequenc
 
 The other important script in MUmmer3 is run-mummer3 (the alignment program). To use this script with E-MEM, simply replace "$bindir/mummer" with "< path >/e-mem" where < path > is e-mem installation directory.
 
-###CITE
+## CITE
 If you find E-MEM program useful, please cite the E-MEM paper:
 
 N. Khiste, L. Ilie [E-MEM: efficient computation of maximal exact matches for very large genomes](http://bioinformatics.oxfordjournals.org/content/31/4/509.short) Bioinformatics, 2015
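
The default 3-column output described in the README.md hunk above can be read back with a few lines of C++. The following is only a sketch under stated assumptions: the names Mem, QueryBlock and parseThreeColumn are hypothetical and not part of E-MEM, headers are assumed to look like "> name" with an optional trailing " Reverse", and neither the 4-column (-F) layout nor "Len = N" headers are handled. A sequence name that itself ends in " Reverse" would be misclassified.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One match line: reference start, query start, match length.
struct Mem {
    std::uint64_t refStart;
    std::uint64_t queryStart;
    std::uint64_t length;
};

// All matches reported under one "> name" header.
struct QueryBlock {
    std::string name;      // header text after '>', trimmed
    bool reverse = false;  // true when the header ends in " Reverse"
    std::vector<Mem> mems; // may stay empty: headers are printed even without MEMs
};

// Hypothetical helper (not part of E-MEM): parse default 3-column output.
std::vector<QueryBlock> parseThreeColumn(std::istream &in) {
    std::vector<QueryBlock> blocks;
    const std::string suffix = " Reverse";
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos) continue;  // skip blank lines
        if (line[first] == '>') {
            QueryBlock block;
            const std::size_t nameStart = line.find_first_not_of(" \t", first + 1);
            const std::size_t nameEnd = line.find_last_not_of(" \t\r");
            if (nameStart != std::string::npos)
                block.name = line.substr(nameStart, nameEnd - nameStart + 1);
            if (block.name.size() >= suffix.size() &&
                block.name.compare(block.name.size() - suffix.size(),
                                   suffix.size(), suffix) == 0) {
                block.reverse = true;
                block.name.erase(block.name.size() - suffix.size());
            }
            blocks.push_back(block);
            continue;
        }
        // Match line: three whitespace-separated unsigned integers.
        Mem mem{};
        std::istringstream fields(line);
        if (!blocks.empty() &&
            (fields >> mem.refStart >> mem.queryStart >> mem.length))
            blocks.back().mems.push_back(mem);
    }
    return blocks;
}
```

Passing the e-mem standard output (for example through a std::ifstream or std::cin) yields one QueryBlock per header, with forward and reverse blocks for the same query appearing one after the other.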


=====================================
e-mem.cpp
=====================================
--- a/e-mem.cpp
+++ b/e-mem.cpp
@@ -234,6 +234,14 @@ void helperReportMem(uint64_t &currRPos, uint64_t &currQPos, uint64_t totalRBits
                 rQue-=2;
                 break;
             }
+            /* if current rRef/rQue plus matchSize smaller than minMEMLength, then simply return.
+             * Note that one less character is compared due to a mismatch 
+             */
+            if (rRef+matchSize-lRef < static_cast<uint64_t>(commonData::minMemLen))
+            {
+                return;
+            }
+
             mismatch=1;
             matchSize/=2;
             if (matchSize%2)
@@ -488,7 +496,7 @@ void checkCommandLineOptions(uint32_t &options)
 void print_help_msg()
 {
     cout <<  endl;
-    cout << "E-MEM Version 1.0.0, Sep. 25, 2014" << endl;
+    cout << "E-MEM Version 1.0.1, Dec. 12, 2017" << endl;
     cout << "© 2014 Nilesh Khiste, Lucian Ilie" << endl;
     cout <<  endl;
     cout << "E-MEM finds and outputs the position and length of all maximal" << endl;
@@ -739,6 +747,7 @@ int main (int argc, char *argv[])
     QueryFile.closeFile();
 
     arrayTmpFile.removeDuplicates(refSeqInfo, querySeqInfo, revComplement);
+    fflush(0);
     return 0;
 }
 


=====================================
file.h
=====================================
--- a/file.h
+++ b/file.h
@@ -278,7 +278,7 @@ class seqFileReadInfo {
       {
           size = size/commonData::d;
           binReadSize = floor((size+numSequences*RANDOM_SEQ_SIZE+commonData::d)/32+4);
-          binReads = new uint64_t[binReadSize];
+          binReads = new uint64_t[binReadSize+1];
           return size;
       }
   
@@ -495,7 +495,7 @@ class seqFileReadInfo {
 
       void writeReverseComplementString(string &name, string &content, fstream &file)
       {
-          file << ">" << name << endl;
+          file << ">" << name << "\n";
           flipNswap(content);
           file << content ;
       }
@@ -747,15 +747,15 @@ class tmpFilesInfo {
     {
         if (revComplement & 0x1){
             if (commonData::lenInHeader) {
-                cout << "> " << (*itQ).seq << " Reverse" << " Len = " << ((*itQ).end-(*itQ).start+2)/2 << endl;
+                cout << "> " << (*itQ).seq << " Reverse" << " Len = " << ((*itQ).end-(*itQ).start+2)/2 << "\n";
             }else{
-                cout << "> " << (*itQ).seq << " Reverse" << endl;
+                cout << "> " << (*itQ).seq << " Reverse" << "\n";
             }
         }else{
             if (commonData::lenInHeader){
-                cout << "> " << (*itQ).seq << " Len = " << ((*itQ).end-(*itQ).start+2)/2 << endl;
+                cout << "> " << (*itQ).seq << " Len = " << ((*itQ).end-(*itQ).start+2)/2 << "\n";
             }else{
-                cout << "> " << (*itQ).seq << endl;
+                cout << "> " << (*itQ).seq << "\n";
             }
         }
     }
@@ -873,14 +873,14 @@ class tmpFilesInfo {
         if (rRef-lRef+2 >= static_cast<uint64_t>(commonData::minMemLen)){
            if (refSeqInfo.size() == 1 && !commonData::fourColOutput) {
                if ((revComplement & 0x1) && commonData::relQueryPos)
-                   cout << " " << setw(15) << ((lRef+2)/2) <<  setw(15) << ((*itQ).end-(*itQ).start-lQue+2)/2 << setw(15) << ((rRef-lRef+2)/2) << endl;
+                   cout << " " << setw(15) << ((lRef+2)/2) <<  setw(15) << ((*itQ).end-(*itQ).start-lQue+2)/2 << setw(15) << ((rRef-lRef+2)/2) << "\n";
                else
-                   cout << " " << setw(15) << ((lRef+2)/2) <<  setw(15) << ((lQue+2)/2) << setw(15) << ((rRef-lRef+2)/2) << endl;
+                   cout << " " << setw(15) << ((lRef+2)/2) <<  setw(15) << ((lQue+2)/2) << setw(15) << ((rRef-lRef+2)/2) << "\n";
            }else{
                if ((revComplement & 0x1) && commonData::relQueryPos) {
-                   cout << " " << setw(30) << std::left <<(*itR).seq << setw(15) << ((lRef+2)/2) <<  setw(15) << ((*itQ).end-(*itQ).start-lQue+2)/2 << setw(15) << ((rRef-lRef+2)/2) << endl;
+                   cout << " " << setw(30) << std::left <<(*itR).seq << setw(15) << ((lRef+2)/2) <<  setw(15) << ((*itQ).end-(*itQ).start-lQue+2)/2 << setw(15) << ((rRef-lRef+2)/2) << "\n";
                }else{
-                   cout << " " << setw(30) << std::left <<(*itR).seq << setw(15) << ((lRef+2)/2) <<  setw(15) << ((lQue+2)/2) << setw(15) << ((rRef-lRef+2)/2) << endl;
+                   cout << " " << setw(30) << std::left <<(*itR).seq << setw(15) << ((lRef+2)/2) <<  setw(15) << ((lQue+2)/2) << setw(15) << ((rRef-lRef+2)/2) << "\n";
                }
            }
         }
@@ -959,31 +959,31 @@ class tmpFilesInfo {
 
         filePtr = forFile;
         if(getline((*filePtr), line).good()) 
-            cout << line << endl;
+            cout << line << "\n";
 
         while(getline((*filePtr), line).good()) {
             if(line[0] == '>'){
                 if (last_line.size())
-                    cout << last_line << endl;
+                    cout << last_line << "\n";
                 last_line = line;
                 if (filePtr == forFile) {
                     filePtr = revFile;
                     if (first) {
                         if(getline((*filePtr), line).good()) 
-                            cout << line << endl;
+                            cout << line << "\n";
                         first=0;
                     }
                 }else
                     filePtr = forFile;
                 continue;
             }
-            cout << line << endl;
+            cout << line << "\n";
         }
 
-        cout << last_line << endl;
+        cout << last_line << "\n";
         filePtr = revFile;
         while(getline((*filePtr), line).good()) 
-            cout << line << endl;
+            cout << line << "\n";
    
         (*revFile).close();
         (*forFile).close();



View it on GitLab: https://salsa.debian.org/med-team/e-mem/commit/56f3e8c320137d7b5f4dc422a99679024fbaed7d
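
The output statements in the file.h hunks above convert internal offsets to printed columns with expressions such as (lRef+2)/2 and (rRef-lRef+2)/2. A minimal sketch of that arithmetic follows, assuming that the offsets are even bit offsets with two bits per base and that rRef points at the last matching base. Both assumptions are inferred from the expressions, not confirmed by the commit, and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: base k (0-based) starts at bit offset 2*k.
// Mirrors (lRef+2)/2 and (lQue+2)/2: bit offset to 1-based position.
std::uint64_t bitOffsetToPosition(std::uint64_t bitOffset) {
    return (bitOffset + 2) / 2;
}

// Mirrors (rRef-lRef+2)/2: number of bases from lRef to rRef inclusive.
std::uint64_t matchLength(std::uint64_t lRef, std::uint64_t rRef) {
    return (rRef - lRef + 2) / 2;
}

// Mirrors ((*itQ).end-(*itQ).start-lQue+2)/2, the expression used when
// both the reverse-complement flag and commonData::relQueryPos are set.
std::uint64_t reverseQueryPosition(std::uint64_t seqStart,
                                   std::uint64_t seqEnd,
                                   std::uint64_t lQue) {
    return (seqEnd - seqStart - lQue + 2) / 2;
}
```

Under these assumptions, lRef = 6 and rRef = 106 give position 4 and length 51, which has the same shape as the first row of the README example.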
