[med-svn] r1573 - in trunk/packages: . meme meme/trunk meme/trunk/debian meme/trunk/debian/patches

tille at alioth.debian.org tille at alioth.debian.org
Thu Mar 13 10:30:36 UTC 2008


Author: tille
Date: 2008-03-13 10:30:35 +0000 (Thu, 13 Mar 2008)
New Revision: 1573

Added:
   trunk/packages/meme/
   trunk/packages/meme/trunk/
   trunk/packages/meme/trunk/README
   trunk/packages/meme/trunk/debian/
   trunk/packages/meme/trunk/debian/README.Debian
   trunk/packages/meme/trunk/debian/changelog
   trunk/packages/meme/trunk/debian/compat
   trunk/packages/meme/trunk/debian/control
   trunk/packages/meme/trunk/debian/copyright
   trunk/packages/meme/trunk/debian/dirs
   trunk/packages/meme/trunk/debian/docs
   trunk/packages/meme/trunk/debian/mast_manual.txt
   trunk/packages/meme/trunk/debian/meme_manual.txt
   trunk/packages/meme/trunk/debian/patches/
   trunk/packages/meme/trunk/debian/patches/xyz.diff
   trunk/packages/meme/trunk/debian/rules
   trunk/packages/meme/trunk/debian/watch
Log:
Moved Steffen Möller's work, which was stalled, to svn to keep track of previous work.


Added: trunk/packages/meme/trunk/README
===================================================================
--- trunk/packages/meme/trunk/README	                        (rev 0)
+++ trunk/packages/meme/trunk/README	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,3 @@
+Work on this project was started by Steffen Möller but has stalled.
+Feel free to take it over.
+

Added: trunk/packages/meme/trunk/debian/README.Debian
===================================================================
--- trunk/packages/meme/trunk/debian/README.Debian	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/README.Debian	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,8 @@
+meme for Debian
+---------------
+
+This package lacks man pages.
+
+In its current version only the binaries of meme and mast are created. The offered parallelism is not utilised and the Makefile was crippled to facilitate the packaging of this software. I'd appreciate help with improving on all these issues (and others I am likely to have neglected).
+
+ -- Steffen Moeller <moeller at pzr.uni-rostock.de>, Thu, 20 Jan 2005 02:36:49 +0100

Added: trunk/packages/meme/trunk/debian/changelog
===================================================================
--- trunk/packages/meme/trunk/debian/changelog	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/changelog	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,6 @@
+meme (3.0.13-1) unstable; urgency=low
+
+  * Initial Release.
+
+ -- Steffen Moeller <moeller at pzr.uni-rostock.de>  Thu, 20 Jan 2005 02:36:49 +0100
+

Added: trunk/packages/meme/trunk/debian/compat
===================================================================
--- trunk/packages/meme/trunk/debian/compat	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/compat	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1 @@
+4

Added: trunk/packages/meme/trunk/debian/control
===================================================================
--- trunk/packages/meme/trunk/debian/control	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/control	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,24 @@
+Source: meme
+Section: non-free/science
+Priority: optional
+Maintainer: Steffen Moeller <moeller at pzr.uni-rostock.de>
+Build-Depends: debhelper (>= 4.0.0)
+Standards-Version: 3.6.1.1
+
+Package: meme
+Architecture: any
+Depends: ${shlibs:Depends}, ${misc:Depends}, csh
+Description: [Biology] search for common motifs in DNA or protein sequences
+ MEME (Multiple EM for Motif Elicitation) discovers motifs in sequences.
+ .
+ A motif is a sequence pattern that occurs repeatedly in a group
+ of related protein or DNA sequences. Motifs are represented as
+ position-dependent scoring matrices that describe the score of each
+ possible letter at each position in the pattern. Individual motifs may
+ not contain gaps but combinations of patterns may be elucidated.
+ .
+ The output of MEME may be forwarded to the program MAST of this package
+ for the search in sequence databases.
+ .
+  Homepage: http://meme.sdsc.edu
+

Added: trunk/packages/meme/trunk/debian/copyright
===================================================================
--- trunk/packages/meme/trunk/debian/copyright	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/copyright	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,42 @@
+This package was debianized by Steffen Moeller <moeller at pzr.uni-rostock.de> on
+Thu, 20 Jan 2005 02:36:49 +0100.
+
+It was downloaded from ftp://ftp.sdsc.edu/pub/sdsc/biology/meme/
+
+Copyright:
+
+Upstream Author: Tim Bailey <tbailey at sdsc.edu>
+
+License:
+
+*	Copyright							*
+*	(1994-2000) The Regents of the University of California.	*
+*	All Rights Reserved.						*
+*									*
+*	Permission to use, copy, modify, and distribute any part of 	*
+*	this software for educational, research and non-profit purposes,*
+*	without fee, and without a written agreement is hereby granted, *
+*	provided that the above copyright notice, this paragraph and 	*
+*	the following three paragraphs appear in all copies.		*
+*									*
+*	Those desiring to incorporate this software into commercial 	*
+*	products or use for commercial purposes should contact the 	*
+*	Technology Transfer Office, University of California, San Diego,*
+*	9500 Gilman Drive, La Jolla, California, 92093-0910, 		*
+*	Ph: (858) 534 5815.						*
+*									*
+*	IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO 	*
+*	ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR 	*
+*	CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF 	*
+*	THE USE OF THIS SOFTWARE, EVEN IF THE UNIVERSITY OF CALIFORNIA 	*
+*	HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 		*
+*									*
+*	THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE *
+*	UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE 		*
+*	MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.  *
+*	THE UNIVERSITY OF CALIFORNIA MAKES NO REPRESENTATIONS AND 	*
+*	EXTENDS NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, *
+*	INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 	*
+*	MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, OR THAT 	*
+*	THE USE OF THE MATERIAL WILL NOT INFRINGE ANY PATENT, 		*
+*	TRADEMARK OR OTHER RIGHTS.  					*

Added: trunk/packages/meme/trunk/debian/dirs
===================================================================
--- trunk/packages/meme/trunk/debian/dirs	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/dirs	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,2 @@
+usr/bin
+usr/share/meme/bin

Added: trunk/packages/meme/trunk/debian/docs
===================================================================
--- trunk/packages/meme/trunk/debian/docs	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/docs	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,4 @@
+README
+WEBSITE
+debian/mast_manual.txt
+debian/meme_manual.txt

Added: trunk/packages/meme/trunk/debian/mast_manual.txt
===================================================================
--- trunk/packages/meme/trunk/debian/mast_manual.txt	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/mast_manual.txt	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,413 @@
+USAGE:
+	mast <mfile> [optional arguments ...]
+
+	<mfile>		file containing motifs to use; may be a MEME output
+			file or a file with the format given below 
+	[<database>] 	or 
+	[-d <database>] database to search with motifs or
+	[-stdin]	read database from standard input; 
+			Default: reads database specified inside <mfile>
+	[-c <count>]	only use the first <count> motifs
+	[-a <alphabet>]	<mfile> is assumed to contain motifs in the
+			format output by bin/make_logodds
+			and <alphabet> is their alphabet; -d <database>
+			or -stdin must be specified when this option is used
+	[-stdout]	print output to standard output instead of file
+	[-text]		output in text (ASCII) format;
+			(default: hypertext (HTML) format)
+	
+	[-sep]		score reverse complement DNA strand as a separate 
+			sequence
+	[-norc]		do not score reverse complement DNA strand
+	[-dna]		translate DNA sequences to protein
+	[-comp]		adjust p-values and E-values for sequence composition
+	[-rank <rank>]	print results starting with <rank> best (default: 1)
+	[-smax <smax>]	print results for no more than <smax> sequences
+			(default: all)
+	[-ev <ev>]	print results for sequences with E-value < <ev>
+			(default: 10)
+	[-mt <mt>]	show motif matches with p-value < mt (default: 0.0001)
+	[-w]		show weak matches (mt<p-value<mt*10) in angle brackets
+	[-bfile <bfile>]	read background frequencies from <bfile>
+	[-seqp]		use SEQUENCE p-values for motif thresholds
+			(default: use POSITION p-values)
+	[-mf <mf>]	print <mf> as motif file name
+	[-df <df>]	print <df> as database name
+	[-minseqs <minseqs>]	lower bound on number of sequences in db
+	[-mev <mev>]+	use only motifs with E-values less than <mev>
+	[-m <m>]+	use only motif(s) number <m> (overrides -mev)
+	[-diag <diag>]	nominal order and spacing of motifs
+	[-best]		include only the best motif in diagrams
+	[-remcorr]	remove highly correlated motifs from query
+	[-brief]	brief output--do not print documentation
+	[-b]		print only sections I and II
+	[-nostatus]	do not print progress report
+	[-hit_list]	print hit_list instead of diagram; implies -text
+
+  
+  MAST: Motif Alignment and Search Tool
+  
+  MAST is a tool for searching biological sequence databases for sequences
+  that contain one or more of a group of known motifs. 
+  
+  A motif is a sequence pattern that occurs repeatedly in a group of related
+  protein or DNA sequences. Motifs are represented as position-dependent
+  scoring matrices that describe the score of each possible letter at each
+  position in the pattern. Individual motifs may not contain gaps. Patterns with
+  variable-length gaps must be split into two or more separate motifs before
+  being submitted as input to MAST. 
+  
+  MAST takes as input a file containing the descriptions of one or more motifs
+  and searches a sequence database that you select for sequences that match
+  the motifs. The motif file can be the output of the MEME motif discovery tool 
+  or any file in the appropriate format. 
+  
+  MAST outputs three things: 
+  
+    1. The names of the high-scoring sequences sorted by the strength of the
+       combined match of the sequence to all of the motifs in the group. 
+    2. Motif diagrams showing the order and spacing of the motifs within each
+       matching sequence. 
+    3. Detailed annotation of each matching sequence showing the sequence
+       and the locations and strengths of matches to the motifs. 
+  
+  MAST works by calculating match scores for each sequence in the database
+  compared with each of the motifs in the group of motifs you provide. For each
+  sequence, the match scores are converted into various types of p-values and
+  these are used to determine the overall match of the sequence to the group of
+  motifs and the probable order and spacing of occurrences of the motifs in the
+  sequence. 
+  
+  MAST outputs a file containing:
+  
+      * the version of MAST and the date it was built, 
+      * the reference to cite if you use MAST in your research, 
+      * a description of the database and motifs used in the search, 
+      * an explanation of the results,
+      * high-scoring sequences--sequences matching the group of motifs
+        above a stated level of statistical significance, 
+      * motif diagrams showing the order and spacing of occurrences of the
+        motifs in the high-scoring sequences and 
+      * annotated sequences showing the positions and p-values of all motif
+        occurrences in each of the high-scoring sequences. 
+  
+  Each section of the results file contains an explanation of how to interpret
+  them. 
+  
+    Match Scores
+  
+  The match score of a motif to a position in a sequence is the sum of the
+  score from each column of the position-dependent scoring matrix
+  corresponding to the letter at that position in the sequence. For example, if
+  the sequence is 
+  
+  TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
+     ========
+  
+  and the motif is represented by the position-dependent scoring matrix (where
+  each row of the matrix corresponds to a position in the motif) 
+  
+  =========|=================================
+  POSITION |   A        C        G        T
+  =========|=================================
+    1      | 1.447    0.188   -4.025   -4.095 
+    2      | 0.739    1.339   -3.945   -2.325 
+    3      | 1.764   -3.562   -4.197   -3.895 
+    4      | 1.574   -3.784   -1.594   -1.994 
+    5      | 1.602   -3.935   -4.054   -1.370 
+    6      | 0.797   -3.647   -0.814    0.215 
+    7      |-1.280    1.873   -0.607   -1.933 
+    8      |-3.076    1.035    1.414   -3.913 
+  =========|=================================
+  
+  then the match score of the fourth position in the sequence (underlined)
+  would be found by summing the score for T in position 1, G in position 2 and
+  so on until G in position 8. So the match score would be 
+  
+    score = -4.095 + -3.945 + -3.895 + -1.994
+            + -4.054 + -0.814 + -1.933 + 1.414 
+          = -19.316
+  
+  The match scores for other positions in the sequence are calculated in the
+  same way. Match scores are only calculated if the match completely fits within
+  the sequence. Match scores are not calculated if the motif would overhang
+  either end of the sequence. 
+  
+    P-values
+  
+  MAST reports all matches of a sequence to a motif or group of motifs in terms
+  of the p-value of the match. MAST considers the p-values of four types of
+  events: 
+  
+      position p-value: the match of a single position within a sequence to
+      	a given motif, 
+      sequence p-value: the best match of any position within a sequence
+      	to a given motif, 
+      combined p-value: the combined best matches of a sequence to a
+      	group of motifs, and 
+      E-value: observing a combined p-value at least as small in a random
+      	database of the same size. 
+  
+  All p-values are based on a random sequence model that assumes each
+  position in a random sequence is generated according to the average letter
+  frequencies of all sequences in the appropriate (peptide or nucleotide)
+  non-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/) on September 22,
+  1996.  This can be overridden in two ways:
+  
+  	1) -bfile <bfile>
+  	The random model uses the letter frequencies given in <bfile> 
+  	instead of the non-redundant database frequencies.
+  	The format of <bfile> is the same as that for the MEME -bfile option; 
+  	see the MEME documentation for details.  (Sample files are given in 
+  	directory tests: tests/nt.freq and tests/na.freq.) 
+  	
+  	2) -comp
+  	The random model uses the letter frequencies in the current target
+  	sequence instead of the non-redundant database frequencies.  This
+  	causes p-values and E-values to be compensated individually for the 
+  	actual composition of each sequence in the database.  This option
+  	can increase search time substantially due to the need to compute
+  	a different score distribution for each high-scoring sequence.
+  
+  
+      Position p-value
+  
+      The p-value of a match of a given position within a sequence to a
+      motif is defined as the probability of a randomly selected position in a
+      randomly generated sequence having a match score at least as large
+      as that of the given position. 
+  
+      Sequence p-value
+  
+      The p-value of a match of a sequence to a motif is defined as the
+      probability of a randomly generated sequence of the same length
+      having a match score at least as large as the largest match score of
+      any position in the sequence. 
+  
+      Combined p-value
+  
+      The p-value of a match of a sequence to a group of motifs is defined
+      as the probability of a randomly generated sequence of the same
+      length having sequence p-values whose product is at least as small
+      as the product of the sequence p-values of the matches of the motifs
+      to the given sequence. 
+  
+      E-value
+  
+      The E-value of the match of a sequence in a database to a group
+      of motifs is defined as the expected number of sequences in a random
+      database of the same size that would match the motifs as well as the
+      sequence does and is equal to the combined p-value of the sequence
+      times the number of sequences in the database. 
+  
+    High-scoring Sequences
+  
+  MAST lists the names and part of the descriptive text of all sequences
+  whose E-value is less than E. Sequences shorter than one or more of the
+  motifs are skipped. The sequences are sorted by increasing E-value. The
+  value of E is set to 10 for the WEB server but is user-selectable in the
+  down-loadable version of MAST. 
+  
+    Motif Diagrams
+  
+  Motif diagrams show the order and spacing of non-overlapping matches to
+  the motifs in each high-scoring sequence. Motif occurrences are determined
+  based on the position p-value of matches to the motif. Strong matches
+  (p-value < M) are shown in square brackets (`[ ]'), weak matches (M <
+  p-value < M × 10) are shown in angle brackets (`< >') and the length of
+  non-motif sequence ("spacer") is shown between dashes (`-'). For example, 
+  
+          27-[3]-44-<4>-99-[1]-7
+  
+  shows an initial spacer of length 27, followed by a strong match to motif 3, a
+  spacer of length 44, a weak match to motif 4, a spacer of length 99, a strong
+  match to motif 1 and a final non-motif sequence of length 7. The value of M is
+  0.0001 for the WEB server but is user-selectable in the down-loadable
+  version of MAST. 
+  
+  Note: If you specify the -hit_list switch to MAST, the motif "diagram" takes the form
+  of a comma separated list of motif occurrences ("hits").  Each "hit" has the format:
+  	<strand><motif> <start> <end> <p-value>
+  where 
+          <strand>        is the strand (+ or - for DNA, blank for protein),
+          <motif>         is the motif number,
+          <start>         is the starting position of the hit,
+          <end>           is the ending position of the hit, and
+          <p-value>       is the position p-value of the hit.
+  
+    Annotated Sequences
+  
+  MAST annotates each high-scoring sequence by printing the sequence
+  along with the position and strength of all the non-overlapping motif
+  occurrences. The four lines above each motif occurrence contain,
+  respectively, 
+  
+      the motif number of the occurrence, 
+      the position p-value of the occurrence, 
+      the best possible match to the motif, and 
+      a plus sign (`+') above each letter in the occurrence that has a positive
+      match score to the motif. 
+  
+  The best possible match to a motif is the sequence of letters which would
+  achieve the highest match score. 
+  
+  
+  MOTIF FORMAT 
+  
+  MAST can search using (multiple) motifs contained in 
+  
+      a MEME output file, 
+      a GCG profile file, 
+      two or more GCG profile files concatenated together, or 
+      a file with the following format. 
+  
+                    Motif file format
+  
+       ALPHABET= alphabet
+       log-odds matrix: alength= alength w= w
+       row_1
+       row_2
+       ...
+       row_w  
+  
+  
+  
+      A motif is represented by a position-dependent scoring matrix. 
+      A scoring matrix is preceded by a line starting with the words
+      log-odds matrix: and specifying alength, the length of
+      the alphabet (number of columns in the scoring matrix), and the w, the
+      width of the motif (number of rows in the scoring matrix). 
+      The following w lines (no blank lines allowed) contain the rows of the
+      scoring matrix. Row i, column j of the matrix gives the score for the j-th
+      letter in alphabet appearing at position i in an occurrence of the
+      motif. 
+      The spaces after the equals signs and the colon are required. 
+      The number of letters in alphabet must equal alength. 
+      Any number of additional motifs may follow the first one. 
+      The motif file must contain a line starting with 
+  
+              ALPHABET= 
+  
+      followed by alphabet, a list containing the letters used in the motifs. 
+      The order of the letters in alphabet must be the same as the order of the
+      columns of scores in the motifs. The order need not be alphabetical
+      and case does not matter, but there should be no spaces in alphabet.
+      The letters in alphabet must be a subset of either the IUB/IUPAC DNA
+      (ABCDGHKMNRSTUVWY) or protein
+      (ABCDEFGHIKLMNPQRSTUVWXYZ) alphabets. DNA alphabets
+      must contain at least the letters ACGT. Protein alphabets must contain
+      at least the letters ACDEFGHIKLMNPQRSTVWY. All other letters in
+      the alphabets are optional. If any of the optional letters are missing 
+      from alphabet, MAST automatically generates scores for them by taking the
+      weighted average of the scores for the letters which the missing letter
+      could match. (The weights are the frequencies of the replaced letters in
+      the appropriate non-redundant database.) Replacements for the
+      optional letters are given in the following table. 
+  
+             LETTERS MATCHED BY OPTIONAL LETTERS
+      =================================================
+      optional          matches 
+      letter      DNA             protein 
+      =================================================
+       B          CGT             DN 
+       D          AGT
+       H          ACT
+       K          GT
+       M          AC
+       N          ACGT
+       R          AG
+       S          CG
+       U          T               ACDEFGHIKLMNPQRSTVWY 
+       V          CAG
+       W          AT
+       X                          ACDEFGHIKLMNPQRSTVWY 
+       Y          CT
+       Z                          EQ 
+       *          ACGT            ACDEFGHIKLMNPQRSTVWY
+       -          ACGT            ACDEFGHIKLMNPQRSTVWY
+      =================================================
+  
+  
+  EXAMPLE 
+  
+  Here is an example of a DNA motif file that contains two motifs. 
+  
+                    Sample motif file 
+  
+          ALPHABET= ACGT
+          log-odds matrix: alength= 4 w= 9
+           -4.275  -0.182  -4.195   1.408
+           -4.296  -1.487   1.880  -0.816
+           -2.160  -1.492  -4.171   1.474
+           -0.810  -4.076   1.872  -2.164
+            1.537  -1.487  -4.195  -4.205
+            0.113   0.340  -0.237  -0.209
+           -0.454   0.923   0.390  -0.834
+           -1.336  -0.082   0.905   0.100
+            0.674  -4.183   0.130  -0.201
+          log-odds matrix: alength= 4 w= 6
+           -2.032   0.324   1.371  -0.781
+           -0.409   0.560  -0.250   0.119
+           -4.274  -0.519  -0.260   1.167
+           -2.188   2.300  -4.191  -2.465
+            1.265  -4.111  -0.267  -2.180
+           -1.977   2.158  -1.661  -2.071 
+  
+  
+  
+  In the example above, because the order of the letters in alphabet is
+  ACGT, the first column of each motif gives the scores for the letter A at each
+  position in the motif, the second column gives the scores for C and so forth.
+  
+  Note: If -d <database> is not given, MAST looks for database
+  	specified inside of <mfile>
+  
+  Creates file (unless [-stdout] given) after stripping ".html" from the end of
+  <mfile>:
+  	mast.<mfile>[.<database>][.c<count>][.m<motif>]+[.rank<rank>][.ev<ev>][.mt<mt>][.b]
+  
+  EXAMPLES:
+  
+  The following examples assume that file "meme.results" is the
+  output of a MEME run containing at least 3 motifs and file
+  SwissProt is a copy of the Swiss-Prot database on your local disk.
+  DNA_DB is a copy of a DNA database on your local disk.
+   
+  1) Annotate the training set:
+   
+  	mast meme.results
+   
+  2) Find sequences matching the motif and annotate them in
+  the SwissProt database:
+   
+  	mast meme.results -d SwissProt
+   
+  3) Show sequences with weaker combined matches to motifs.
+   
+  	mast meme.results -d SwissProt -ev 200
+   
+  4) Indicate weaker matches to single motifs in the annotation so
+  that sequences with weak matches to the motifs (but perhaps with
+  the "correct" order and spacing) can be seen:
+  
+  	mast meme.results -d SwissProt -w
+   
+  5) Include a nominal order and spacing of the first three motifs
+  in the calculation of the sequence p-values to increase the
+  sensitivity of the search for matching sequences:
+   
+  	mast meme.results -d SwissProt -diag "9-[2]-61-[1]-62-[3]-91"
+   
+  6) Use only the first and third motifs in the search:
+   
+  	mast meme.results -d SwissProt -m 1 -m 3
+   
+  7) Use only the first two motifs in the search:
+   
+  	mast meme.results -d SwissProt -c 2
+  
+  8) Search DNA sequences using protein motifs, adjusting p-values and E-values 
+  for each sequence by that sequence's composition:
+  
+  	mast meme.results -d DNA_DB -dna -comp
+  
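
The match-score arithmetic described in the "Match Scores" section of mast_manual.txt can be checked with a short Python sketch. This is only an illustration and not part of the committed package: the names ALPHABET, MATRIX, SEQUENCE, match_score and all_match_scores are assumptions introduced here, while the matrix values and the sequence are copied from the manual's example.

```python
# Scoring matrix copied from the "Match Scores" example in mast_manual.txt.
# Rows are motif positions 1..8; columns follow the alphabet order ACGT.
ALPHABET = "ACGT"
MATRIX = [
    [1.447, 0.188, -4.025, -4.095],
    [0.739, 1.339, -3.945, -2.325],
    [1.764, -3.562, -4.197, -3.895],
    [1.574, -3.784, -1.594, -1.994],
    [1.602, -3.935, -4.054, -1.370],
    [0.797, -3.647, -0.814, 0.215],
    [-1.280, 1.873, -0.607, -1.933],
    [-3.076, 1.035, 1.414, -3.913],
]
SEQUENCE = "TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC"


def match_score(sequence, start, matrix=MATRIX, alphabet=ALPHABET):
    """Return the match score of the motif placed at 1-based position `start`."""
    width = len(matrix)
    window = sequence[start - 1:start - 1 + width]
    # The manual only defines scores where the motif fits completely.
    if start < 1 or len(window) < width:
        raise ValueError("motif would overhang the end of the sequence")
    # Row i of the matrix scores the letter found at motif position i.
    return sum(row[alphabet.index(letter)] for row, letter in zip(matrix, window))


def all_match_scores(sequence, matrix=MATRIX, alphabet=ALPHABET):
    """Score every 1-based start position where the motif fits completely."""
    last_start = len(sequence) - len(matrix) + 1
    return {s: match_score(sequence, s, matrix, alphabet)
            for s in range(1, last_start + 1)}


print(round(match_score(SEQUENCE, 4), 3))  # -19.316
```

Position 4 reproduces the manual's worked value of -19.316. Start positions beyond 36 are rejected because the 8-wide motif would overhang the 43-letter sequence, matching the manual's rule that such scores are not calculated.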

Added: trunk/packages/meme/trunk/debian/meme_manual.txt
===================================================================
--- trunk/packages/meme/trunk/debian/meme_manual.txt	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/meme_manual.txt	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,650 @@
+USAGE:
+	meme	<dataset> [optional arguments]
+
+	<dataset> 		file containing sequences in FASTA format
+	[-h]			print this message
+	[-dna]			sequences use DNA alphabet
+	[-protein]		sequences use protein alphabet
+	[-mod oops|zoops|anr]	distribution of motifs
+	[-nmotifs <nmotifs>]	maximum number of motifs to find
+	[-evt <evt>]		stop if motif E-value greater than <evt>
+	[-nsites <sites>]	number of sites for each motif
+	[-minsites <minsites>]	minimum number of sites for each motif
+	[-maxsites <maxsites>]	maximum number of sites for each motif
+	[-wnsites <wnsites>]	weight on expected number of sites
+	[-w <w>]		motif width
+	[-minw <minw>]		minimum motif width
+	[-maxw <maxw>]		maximum motif width
+	[-nomatrim]		do not adjust motif width using multiple
+				alignment
+	[-wg <wg>]		gap opening cost for multiple alignments
+	[-ws <ws>]		gap extension cost for multiple alignments
+	[-noendgaps]		do not count end gaps in multiple alignments
+	[-bfile <bfile>]	name of background Markov model file
+	[-revcomp]		allow sites on + or - DNA strands
+	[-pal]			force palindromes (requires -dna)
+	[-maxiter <maxiter>]	maximum EM iterations to run
+	[-distance <distance>]	EM convergence criterion
+	[-prior dirichlet|dmix|mega|megap|addone]
+				type of prior to use
+	[-b <b>]		strength of the prior
+	[-plib <plib>]		name of Dirichlet prior file
+	[-spfuzz <spfuzz>]	fuzziness of sequence to theta mapping
+	[-spmap uni|pam]	starting point seq to theta mapping type
+	[-cons <cons>]		consensus sequence to start EM from
+	[-text]			output in text format (default is HTML)
+	[-maxsize <maxsize>]	maximum dataset size in characters
+	[-nostatus]		do not print progress reports to terminal
+	[-p <np>]		use parallel version with <np> processors
+	[-time <t>]		quit before <t> CPU seconds consumed
+	[-sf <sf>]		print <sf> as name of sequence file
+
+  MEME -- Multiple EM for Motif Elicitation
+   
+  MEME is a tool for discovering motifs in a group of related DNA or protein
+  sequences.
+   
+  A motif is a sequence pattern that occurs repeatedly in a group of related
+  protein or DNA sequences. MEME represents motifs as position-dependent
+  letter-probability matrices which describe the probability of each possible
+  letter at each position in the pattern. Individual MEME motifs do not 
+  contain gaps. Patterns with variable-length gaps are split by MEME into two 
+  or more separate motifs.
+   
+  MEME takes as input a group of DNA or protein sequences (the training set)
+  and outputs as many motifs as requested. MEME uses statistical modeling
+  techniques to automatically choose the best width, number of occurrences,
+  and description for each motif.
+   
+  MEME outputs its results as a hypertext (HTML) document.
+  
+  The MEME results consist of:
+  
+         The version of MEME and the date it was released. 
+  
+         The reference to cite if you use MEME in your research. 
+  
+         A description of the sequences you submitted (the "training set")
+         showing the name, "weight" and length of each sequence. 
+  
+         The command line summary detailing the parameters with which you
+         ran MEME. 
+  
+         Information on each of the motifs MEME discovered, including: 
+             1. A summary line showing the width, number of occurrences, log
+                likelihood ratio and statistical significance of the motif. 
+             2. A simplified position-specific probability matrix. 
+             3. A diagram showing the degree of conservation at each motif
+                position. 
+             4. A multilevel consensus sequence showing the most conserved
+                letter(s) at each motif position. 
+             5. The occurrences of the motif sorted by p-value and aligned with
+                each other. 
+             6. Block diagrams of the occurrences of the motif within each
+                sequence in the training set. 
+             7. The motif in BLOCKS format. 
+             8. A position-specific scoring matrix (PSSM) for use by the
+                MAST database search program. 
+             9. The position specific probability matrix (PSPM) describing the
+                motif. 
+  
+         A summary of motifs showing an optimized (non-overlapping) tiling of
+         all of the motifs onto each of the sequences in the training set. 
+  
+         The reason why MEME stopped and the name of the CPU on which it
+         ran. 
+  
+         This explanation of how to interpret MEME results.  
+  
+  REQUIRED ARGUMENTS:
+  	&lt;dataset&gt;       The name of the file containing the training set 
+  			sequences.  If &lt;dataset&gt; is the word "stdin", MEME
+  			reads from standard input.  
+  
+  			The sequences in the dataset should be in 
+  			Pearson/FASTA format.  For example:
+  
+  			&gt;ICYA_MANSE INSECTICYANIN A FORM (BLUE BILIPROTEIN)
+  			GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
+  			LPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDA
+  			&gt;LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG) 
+  			MKCLLLALALTCGAQALIVTQTMKGLDI
+  			QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
+  				
+  			Sequences start with a header line followed by
+  			sequence lines.  A header line has
+  			the character "&gt;" in position one, followed by
+  			an unique name without any spaces, followed by
+  			(optional) descriptive text.  After the header line 
+  			come the actual sequence lines.  Spaces and blank 
+  			lines are ignored.  Sequences may be in capital or 
+  			lowercase or both.  
+  
+  			MEME uses the first word in the header line of each 
+  			sequence, truncated to 24 characters if necessary,
+  			as the name of the sequence. This name must be unique. 
+  			Sequences with duplicate names will be ignored. 
+  			(The first word in the title line is 
+  			everything following the ">" up to the first blank.)
+  
+  			Sequence weights may be specified in the dataset
+  			file by special header lines where the unique name
+  			is "WEIGHTS" (all caps) and the descriptive 
+  			text is a list of sequence weights. 
+  			Sequence weights are numbers in the range 0 < w <= 1.
+  			All weights are assigned in order to the
+  			sequences in the file. If there are more sequences
+  			than weights, the remainder are given weight one.
+  			Weights must be greater than zero and less than
+  			or equal to one.  Weights may be specified by
+  			more than one "WEIGHTS" entry which may appear
+  			anywhere in the file.  When weights are used, 
+  			sequences will contribute to motifs in proportion
+  			to their weights.  Here is an example for a file
+  			of three sequences where the first two sequences are 
+  			very similar and it is desired to down-weight them:
+  
+  			>WEIGHTS 0.5 .5 1.0
+  			>seq1
+  			GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
+  			>seq2
+  			GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
+  			>seq3
+  			QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
+  
+  
+  OPTIONAL ARGUMENTS:
+   
+  MEME has a large number of optional inputs that can be used
+  to fine-tune its behavior.  To make these easier to understand
+  they are divided into the following categories:
+   
+  		ALPHABET	- control the alphabet for the motifs
+  				  (patterns) that MEME will search for
+   
+  		DISTRIBUTION	- control how MEME assumes the occurrences
+  				  of the motifs are distributed throughout
+  				  the training set sequences
+   
+  		SEARCH		- control how MEME searches for motifs
+   
+                  SYSTEM          - the -p <np> argument causes a version of MEME
+                                    compiled for a parallel CPU architecture
+                                    to be run.  (By placing <np> in quotes you
+                                    may pass installation specific switches to
+  				  the 'mpirun' command.  The number of 
+                                    processors to run on must be the first 
+  				  argument following -p).
+  
+   
+  In what follows, <n> is an integer, <a> is a decimal number, and <string>
+  is a string of characters.
+   
+  ALPHABET
+  --------
+  MEME accepts either DNA or protein sequences, but not both in the same run.
+  By default, sequences are assumed to be protein.  The sequences must be in 
+  FASTA format.
+  
+  DNA sequences must contain only the letters "ACGT", plus the ambiguous
+  letters "BDHKMNRSUVWY*-". 
+  Protein sequences must contain only the letters "ACDEFGHIKLMNPQRSTVWY",
+  plus the ambiguous letters "BUXZ*-".
+  
+  MEME converts all ambiguous letters to "X", which is treated as "unknown".
+   
+  	-dna		Assume sequences are DNA; default: protein sequences
+  	-protein	Assume sequences are protein
+  
+   
+  DISTRIBUTION
+  ------------
+  If you know how occurrences of motifs are distributed in the training set 
+  sequences, you can specify it with the following optional switches.  The 
+  default distribution of motif occurrences is assumed to be zero or one 
+  occurrence per sequence.
+   
+  	-mod <string>   The type of distribution to assume.
+  			oops    One Occurrence Per Sequence
+  				MEME assumes that each sequence in the dataset
+  				contains exactly one occurrence of each motif.
+  				This option is the fastest and most sensitive
+  				but the motifs returned by MEME may be 
+  				"blurry" if any of the sequences is missing
+  				them. 	
+   
+  			zoops   Zero or One Occurrence Per Sequence
+  				MEME assumes that each sequence may contain at
+  				most one occurrence of each motif. This option
+  				is useful when you suspect that some motifs
+  				may be missing from some of the sequences. In
+  				that case, the motifs found will be more
+  				accurate than using the first option. This
+  				option takes more computer time than the
+  				first option (about twice as much) and is
+  				slightly less sensitive to weak motifs present
+  				in all of the sequences.
+   
+  			anr 	Any Number of Repetitions
+  				MEME assumes each sequence may contain any
+  				number of non-overlapping occurrences of each
+  				motif. This option is useful when you suspect
+  				that motifs repeat multiple times within a
+  				single sequence. In that case, the motifs 
+  				found will be much more accurate than using 
+  				one of the other options. This option can also
+  				be used to discover repeats within a single
+  				sequence. This option takes much more
+  				computer time than the first option (about ten
+  				times as much) and is somewhat less sensitive
+  				to weak motifs which do not repeat within a
+  				single sequence than the other two options.
+   
+   
+  SEARCH
+  ------
+  
+  A) OBJECTIVE FUNCTION
+  
+  MEME uses an objective function on motifs to select the "best" motif.
+  The objective function is based on the statistical significance of the 
+  log likelihood ratio (LLR) of the occurrences of the motif.  
+  The E-value of the motif is an estimate of the number of motifs (with the 
+  same width and number of occurrences) that would have equal or higher log 
+  likelihood ratio if the training set sequences had been generated randomly 
+  according to the (0-order portion of the) background model. 
+  
+  MEME searches for the motif with the smallest E-value.
+  It searches over different motif widths, numbers of occurrences, and
+  positions in the training set for the motif occurrences.
+  The user may limit the range of motif widths and number of occurrences
+  that MEME tries using the switches described below.  In addition,
+  MEME trims the motif (using a dynamic programming multiple alignment) to 
+  eliminate any positions where there is a gap in any of the occurrences.  
+  
+  The log likelihood ratio of a motif is
+  	llr = log (Pr(sites | motif) / Pr(sites | back))
+  and is a measure of how different the sites are from the background model.
+  Pr(sites | motif) is the probability of the occurrences given a model
+  consisting of the position-specific probability matrix (PSPM) of the motif.
+  (The PSPM is output by MEME).
+  Pr(sites | back) is the  probability of the occurrences given the background
+  model.  The background model is an n-order Markov model.  By default,
+  it is a 0-order model consisting of the frequencies of the letters in
+  the training set.  A different 0-order Markov model or higher order Markov 
+  models can be specified to MEME using the -bfile option described below.
+  
+  The E-value reported by MEME is actually an approximation of the E-value
+  of the log likelihood ratio.  (An approximation is used because it is far
+  more efficient to compute.)  The approximation is based on the fact that
+  the log likelihood ratio of a motif is the sum of the log 
+  likelihood ratios of each column of the motif.  Instead of computing the 
+  statistical significance of this sum (its p-value), MEME computes the 
+  p-value of each column and then computes the significance of their product.  
+  Although not identical to the significance of the log likelihood ratio, this 
+  easier-to-compute objective function works very similarly in practice.
+  
+  The motif significance is reported as the E-value of the motif.  
+  The statistical significance of a motif is computed based on:
+  	1) the log likelihood ratio,
+  	2) the width of the motif,
+  	3) the number of occurrences,
+  	4) the 0-order portion of the background model,
+  	5) the size of the training set, and
+  	6) the type of model (oops, zoops, or anr, which determines the
+  	   number of possible different motifs of the given width and
+  	   number of occurrences).
+  
+  MEME searches for motifs by performing Expectation Maximization (EM) on a 
+  motif model of a fixed width and using an initial estimate of the number of 
+  sites.  It then sorts the possible sites according to their probability 
+  according to EM.  MEME then calculates the E-values of the first n sites
+  in the sorted list for different values of n.  This procedure (first EM, 
+  followed by computing E-values for different numbers of sites) is repeated 
+  with different widths and different initial estimates of the number of 
+  sites.  MEME outputs the motif with the lowest E-value.
+  
+   
+  B) NUMBER OF MOTIFS
+   
+  	-nmotifs <n>    The number of *different* motifs to search
+  			for.  MEME will search for and output <n> motifs.
+  			Default: 1
+   
+  	-evt <p>	Quit looking for motifs if E-value exceeds <p>.
+  			Default: infinite (so by default MEME never quits
+  			before -nmotifs <n> have been found.)
+   
+   
+  C) NUMBER OF MOTIF OCCURRENCES
+   
+  	-nsites <n>
+  	-minsites <n>
+  	-maxsites <n>
+  			The (expected) number of occurrences of each motif.
+  			If -nsites is given, only that number of occurrences
+  			is tried.  Otherwise, numbers of occurrences between
+  			-minsites and -maxsites are tried as initial guesses
+  			for the number of motif occurrences.  These
+  			switches are ignored if mod = oops.
+   
+  			Default: -minsites sqrt(number sequences)
+  				 -maxsites Default:
+  					zoops 	# of sequences
+  					anr	MIN(5*#sequences, 50)
+  
+  	-wnsites <n>	The weight on the prior on nsites.  This controls
+  			how strong the bias towards motifs with exactly
+  			nsites sites (or between minsites and maxsites sites)
+  			is.  It is a number in the range [0..1).  The
+  			larger it is, the stronger the bias towards 
+  			motifs with exactly nsites occurrences is.
+  			Default: 0.8
+   
+  D) MOTIF WIDTH
+   
+  	-w <n>
+  	-minw <n>
+  	-maxw <n>
+  
+  			The width of the motif(s) to search for.
+  			If -w is given, only that width is tried.
+  			Otherwise, widths between -minw and -maxw are tried.
+  			Default: -minw  8, -maxw 50 (defined in user.h)
+  
+  			Note: If <n> is greater than the length of the shortest
+  			sequence in the dataset, <n> is reset by MEME to
+  			that value.
+  
+  	-nomatrim
+  	-wg <a>
+  	-ws <a>
+  	-noendgaps
+  			These switches control trimming (shortening) of
+  			motifs using the multiple alignment method.
+  			Specifying -nomatrim causes MEME to skip this and
+  			causes the other switches to be ignored.
+  			MEME finds the best motif
+  			and then trims (shortens) it using the multiple
+  			alignment method (described below). The number of
+  			occurrences is then adjusted to minimize the motif
+  			E-value, and then the motif width is further
+  			shortened to optimize the E-value.
+  
+  			The multiple alignment method performs a separate 
+  			pairwise alignment of the site with the highest
+  			probability and each other possible site.
+  			(The alignment includes width/2 positions on either 
+  			side of the sites.) The pairwise alignment
+  			is controlled by the switches:
+  				-wg <a> (gap cost; default: 11),
+  				-ws <a> (space cost; default: 1), and,
+  				-noendgaps (do not penalize endgaps; default: 
+  					penalize endgaps).  
+  			The pairwise alignments are then combined and the 
+  			method determines the widest section of the motif with 
+  			no insertions or deletions.  If this alignment
+  			is shorter than <minw>, it tries to find an alignment
+  			allowing up to one insertion/deletion per motif
+  			column.  This continues (allowing up to 2, 3 ...
+  			insertions/deletions per motif column) until an
+  			alignment of width at least <minw> is found.
+  
+  
+  E) BACKGROUND MODEL
+  	-bfile <bfile>	The name of the file containing the background model
+  			for sequences.  The background model is the model
+  			of random sequences used by MEME.  The background 
+  			model is used by MEME 
+  				1) during EM as the "null model",
+  				2) for calculating the log likelihood ratio
+  				   of a motif,
+  				3) for calculating the significance (E-value) 
+  				   of a motif, and, 
+  				4) for creating the position-specific scoring
+  				   matrix (log-odds matrix).
+  
+  			By default, the background model is a 0-order Markov 
+  			model based on the letter frequencies in the training 
+  			set.  
+  
+  			Markov models of any order can be specified in <bfile>
+  			by listing frequencies of all possible tuples of 
+  			length up to order+1.  
+  
+  			Note that MEME uses only the 0-order portion (single
+  			letter frequencies) of the background model for
+  			purposes 3) and 4), but uses the full-order model
+  			for purposes 1) and 2), above.
+  
+  			Example: To specify a first-order Markov background model
+  				 for DNA, <bfile> might contain the following
+  				 lines.  Note that optional comment lines are
+  				 marked by "#" and are ignored by MEME.
+  
+  				# tuple   frequency_non_coding
+  				a       0.324
+  				c       0.176
+  				g       0.176
+  				t       0.324
+  				# tuple   frequency_non_coding
+  				aa      0.119
+  				ac      0.052
+  				ag      0.056
+  				at      0.097
+  				ca      0.058
+  				cc      0.033
+  				cg      0.028
+  				ct      0.056
+  				ga      0.056
+  				gc      0.035
+  				gg      0.033
+  				gt      0.052
+  				ta      0.091
+  				tc      0.056
+  				tg      0.058
+  				tt      0.119
+  
+  Sample -bfile files are given in directory tests: 
+  	tests/nt.freq (DNA), and 
+  	tests/na.freq (amino acid).
+  
+  F) DNA PALINDROMES AND STRANDS
+   
+  	-revcomp	motif occurrences may be on the given DNA strand
+  			or on its reverse complement.
+  			Default: look for DNA motifs only on the strand given 
+  			in the training set.
+   
+  	-pal		
+  			Choosing -pal causes MEME to look for palindromes in 
+  			DNA datasets.  
+  
+  			MEME averages the letter frequencies in corresponding 
+  			columns of the motif (PSPM) together. For instance, 
+  			if the width of the motif is 10, columns 1 and 10, 2 
+  			and 9, 3 and 8, etc., are averaged together.  The 
+  			averaging combines the frequency of A in one column 
+  			with T in the other, and the frequency of C in one 
+  			column with G in the other.  
+  			If this option is not chosen, MEME does not
+  			search for DNA palindromes.
+  
+  
+  G) EM ALGORITHM
+   
+  	-maxiter <n>    The number of iterations of EM to run from
+  			any starting point.
+  			EM is run for <n> iterations or until convergence
+  			(see -distance, below) from each starting point.
+  			Default: 50
+   
+  	-distance <a>   The convergence criterion.  MEME stops
+  			iterating EM when the change in the
+  			motif frequency matrix is less than <a>.
+  			(Change is the Euclidean distance between
+  			two successive frequency matrices.)
+  			Default: 0.001
+   
+  	-prior <string> The prior distribution on the model parameters:
+  			dirichlet       simple Dirichlet prior
+  					This is the default for -dna and 
+  					-alph.  It is based on the 
+  					non-redundant database letter
+  					frequencies.
+  			dmix		mixture of Dirichlets prior
+  					This is the default for -protein. 
+  			mega		extremely low variance dmix;
+  					variance is scaled inversely with
+  					the size of the dataset.
+  			megap		mega for all but last iteration
+  					of EM; dmix on last iteration.
+  			addone		add +1 to each observed count
+   
+  	-b <a>	  The strength of the prior on model parameters:
+  				<a> = 0 means use intrinsic strength of prior
+  					for prior = dmix.
+  			Defaults:
+  				0.01 if prior = dirichlet
+  				0 if prior = dmix
+   
+  	-plib <string>  The name of the file containing the Dirichlet prior
+  			in the format of file prior30.plib.
+   
+   
+  H) SELECTING STARTS FOR EM
+   
+  The default is for MEME to search the dataset for good starts for EM.  How 
+  the starting points are derived from the dataset is specified by the 
+  following switches.
+   
+  The default type of mapping MEME uses is:
+  		-spmap uni for -dna and -alph <string>
+  		-spmap pam for -protein
+   
+  	-spfuzz <a>     The fuzziness of the mapping.
+  			Possible values are greater than 0.  Meaning
+  			depends on -spmap, see below.
+   
+  	-spmap <string> The type of mapping function to use.
+  			uni     Use add-<a> prior when converting a substring
+  				to an estimate of theta.
+  				Default -spfuzz <a>: 0.5
+  			pam     Use columns of PAM <a> matrix when converting
+  				a substring to an estimate of theta.
+  				Default -spfuzz <a>: 120 (PAM 120)
+   
+  			Other types of starting points
+  			can be specified using the following switches.
+   
+  	-cons <string>  Override the sampling of starting points
+  			and just use a starting point derived from
+  			<string>.
+  			This is useful when an actual occurrence of
+  			a motif is known and can be used as the
+  			starting point for finding the motif.
+  
+  EXAMPLES:
+  
+  The following examples use data files provided in this release of MEME.  
+  MEME writes its output to standard output, so you will want to redirect it 
+  to a file for use with MAST.
+   
+  1) A simple DNA example:
+   
+  	 meme crp0.s -dna -mod oops -pal > ex1.html
+   
+  MEME looks for a single motif in the file crp0.s which contains DNA 
+  sequences in FASTA format.  The OOPS model is used so MEME assumes that 
+  every sequence contains exactly one occurrence of the motif.  The 
+  palindrome switch is given so the motif model (PSPM) is converted into a 
+  palindrome by combining corresponding frequency columns.  MEME automatically 
+  chooses the best width for the motif in this example since no width was 
+  specified.
+   
+  2) Searching for motifs on both DNA strands:
+  
+           meme crp0.s -dna -mod oops -revcomp > ex2.html
+  
+  This is like the previous example except that the -revcomp switch tells
+  MEME to consider both DNA strands, and the -pal switch is absent so the
+  palindrome conversion is omitted.  When MEME uses both DNA strands, motif
+  occurrences on the two strands may not overlap.  That is, any position
+  in the sequence given in the training set may be contained in an occurrence
+  of a motif on the positive strand or the negative strand, but not both.
+  
+  3) A fast DNA example:
+   
+  	meme crp0.s -dna -mod oops -revcomp -w 20 > ex3.html
+   
+  This example differs from example 2) in that MEME is told to only
+  consider motifs of width 20.  This causes MEME to execute about 10
+  times faster.  The -w switch can also be used with protein datasets if
+  the width of the motifs is known in advance.
+  
+  4) Using a higher-order background model:
+  
+  	meme INO_up800.s -dna -mod anr -revcomp -bfile yeast.nc.6.freq > ex4.html
+  
+  In this example we use -mod anr and -bfile yeast.nc.6.freq.  This specifies 
+  that
+  	a) the motif may have any number of occurrences in each sequence, and,
+  	b) the Markov model specified in yeast.nc.6.freq is used as the 
+  	   background model.  This file contains a fifth-order Markov model 
+             for the non-coding regions in the yeast genome.
+  Using a higher order background model can often result in more sensitive
+  detection of motifs.  This is because the background model more accurately
+  models non-motif sequence, allowing MEME to discriminate against it and find 
+  the true motifs.
+  
+  5) A simple protein example:
+   
+  	meme lipocalin.s -mod oops -maxw 20 -nmotifs 2 > ex5.html
+   
+  The -dna switch is absent, so MEME assumes the file lipocalin.s contains 
+  protein sequences.  MEME searches for two motifs each of width less than or 
+  equal to 20.
+  (Specifying -maxw 20 makes MEME run faster since it does not have to 
+  consider motifs longer than 20.) Each motif is assumed to occur in each 
+  of the sequences because the OOPS model is specified.
+   
+  6) Another simple protein example:
+   
+  	meme farntrans5.s -mod anr -maxw 40 -maxsites 50 > ex6.html
+   
+  MEME searches for a motif of width up to 40 with up to 50 occurrences in
+  the entire training set.  The ANR sequence model is specified,
+  which allows each motif to have any number of occurrences in each sequence.  
+  This dataset contains motifs that repeat multiple times in each
+  sequence.  This example is fairly time consuming due to the fact that the
+  time required to initialize the motif probability tables is proportional
+  to <maxw> times <maxsites>.  By default, MEME only looks for motifs up to
+  29 letters wide with a maximum total number of occurrences equal to twice
+  the number of sequences or 30, whichever is less.
+  
+  7) A much faster protein example:
+  
+  	meme farntrans5.s -mod anr -w 10 -maxsites 30 -nmotifs 3 > ex7.html
+  
+  This time MEME is constrained to search for three motifs of width exactly 
+  ten.  The effect is to break up the long motif found in the previous 
+  example.  The -w switch forces motifs to be *exactly* ten letters wide.
+  This example is much faster because, since only one width is considered, the
+  time to build the motif probability tables is only proportional to 
+  <maxsites>.
+  
+  8) Splitting the sites into three:
+  
+  	meme farntrans5.s -mod anr -maxw 12 -nsites 24 -nmotifs 3 > ex8.html
+  
+  This forces each motif to have 24 occurrences, exactly, and be up to 12 
+  letters wide.
+  
+  9) A larger protein example with E-value cutoff:
+  
+  	meme adh.s -mod zoops -nmotifs 20 -evt 0.01 > ex9.html
+  
+  In this example, MEME looks for up to 20 motifs, but stops when a motif is
+  found with E-value greater than 0.01.  Motifs with large E-values are likely
+  to be statistical artifacts rather than biologically significant.
+
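
The dataset rules in the REQUIRED ARGUMENTS part of the manual above can be sketched in Python: the first word of each header is the name, names are truncated to 24 characters, sequences with duplicate names are ignored, WEIGHTS header lines are assigned to the sequences in order, and missing weights default to one. This is only an illustration of the documented behaviour, not MEME's own C reader. The function name read_weighted_fasta is made up for this example, and it assumes that a skipped duplicate does not consume a weight, which the manual does not say.

```python
def read_weighted_fasta(text):
    """Parse FASTA text into {name: (sequence, weight)} per the manual's rules.

    Hypothetical helper for illustration only; not part of MEME.
    """
    weights = []      # all weights seen so far, in file order
    order = []        # accepted sequence names, in file order
    parts = {}        # name -> list of sequence line fragments
    current = None    # name being filled, or None while skipping lines

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # blank lines are ignored
        if line.startswith(">"):
            words = line[1:].split()
            first = words[0] if words else ""
            if first == "WEIGHTS":
                for w in map(float, words[1:]):
                    if not 0.0 < w <= 1.0:
                        raise ValueError("weight outside (0, 1]: %r" % w)
                    weights.append(w)
                current = None
                continue
            name = first[:24]                 # truncate to 24 characters
            if name in parts:
                current = None                # duplicate name: skip sequence
            else:
                parts[name] = []
                order.append(name)
                current = name
        elif current is not None:
            parts[current].append(line.replace(" ", ""))  # spaces are ignored

    result = {}
    for i, name in enumerate(order):
        weight = weights[i] if i < len(weights) else 1.0  # default weight one
        result[name] = ("".join(parts[name]), weight)
    return result


example = """>WEIGHTS 0.5 .5 1.0
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>seq3
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
"""
dataset = read_weighted_fasta(example)
print({name: weight for name, (seq, weight) in dataset.items()})
```

The input is the three-sequence example from the manual, so the two similar sequences come out down-weighted relative to the third.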

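The log likelihood ratio from the OBJECTIVE FUNCTION part of the manual above, llr = log (Pr(sites | motif) / Pr(sites | back)), can be written out for a 0-order background model as a sum of per-column terms. The sketch below is an assumption-laden illustration rather than MEME's implementation: the function names, the toy PSPM and the use of the natural logarithm are all chosen for this example, and it covers only the ratio itself, not the E-value approximation.

```python
import math


def column_llr(sites, column_index, pspm, background):
    """Log likelihood ratio contributed by one motif column over all sites."""
    column = pspm[column_index]
    return sum(
        math.log(column[site[column_index]] / background[site[column_index]])
        for site in sites
    )


def motif_llr(sites, pspm, background):
    """llr = log(Pr(sites | motif) / Pr(sites | back)), summed over columns."""
    return sum(column_llr(sites, j, pspm, background) for j in range(len(pspm)))


# Toy data (hypothetical): a uniform 0-order background and a width-2 PSPM.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pspm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
]
sites = ["AC", "AG"]

llr = motif_llr(sites, pspm, background)
print(round(llr, 4))
```

Because the total is the sum of the column values, the per-column function is the quantity whose p-values the manual says MEME combines when it approximates the motif E-value.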
Added: trunk/packages/meme/trunk/debian/patches/xyz.diff
===================================================================
--- trunk/packages/meme/trunk/debian/patches/xyz.diff	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/patches/xyz.diff	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,239 @@
+--- meme-3.0.13.orig/src/makefile
++++ meme-3.0.13/src/makefile
+@@ -23,94 +23,25 @@
+ #
+ # Compiler and linker flags.
+ #
+-CC		= cc -O
+-GCC		= gcc -Wall -O3 #-g #-DEXP -pg -m64
+-SUN_CC		= cc -O -Xa
++CC		= gcc
++CFLAGS		= -O3
+ MPICC		= mpicc -DPARALLEL -O
+ T3E_MPICC	= cc -DPARALLEL -O3 -h msglevel_3
+ RS6000_MPICC	= mpcc -DPARALLEL -O3 -qstrict -qarch=pwr3 -qtune=pwr3
+ IRIX_MPICC	= gcc -DPARALLEL -O3
+-INCLUDES	= -I${SRC} -I${SRC}/INCLUDE
+-XTRAFLAGS	= -D${M} -DUNIX -D__USE_FIXED_PROTOTYPES__ #-DEXP
+-CFLAGS		= ${INCLUDES} ${XTRAFLAGS}
+-
+-#
+-# Current directory relative to bin directory.
+-#
+-SRC		= ../../src
+-
+-#
+-# Create symbolic links from bin directory to src directory .c files
+-# if they don't already exist.
+-#
+-links:	
+-	@ if [ "${mode}" = p ] ; then p="_p"; else p=""; fi; \
+-        bin=../bin/`../bin/machid`$$p; \
+-	if [ ! -d $$bin ] ; then mkdir $$bin; chmod 755 $$bin; fi; \
+-	for f in *.c; do \
+-	  if [ ! -f $$bin/$$f ] ; then ln -s ${SRC}/$$f $$bin/$$f; fi \
+-	done
+-
+-#
+-# Make the objects and executable.
+-#
+-objexe:
+-	@ OS=`uname -s | tr -d '[:space:'] |  tr -c '[:alnum:]' '_'`; \
+-	machid=`../bin/machid`; \
+-        plib=""; \
+-	if [ "${mode}" = p ] ; then \
+-	  p="_p"; \
+-	  if [ $$OS = "crayt3e" ] ; then \
+-	    cc="${T3E_MPICC}"; \
+-	  else \
+-	    if [ $$OS = "AIX" ] ; then \
+-	      cc="${RS6000_MPICC}"; \
+-	    else \
+-	      if [ $$OS = "IRIX64" ] ; then \
+-		cc="${IRIX_MPICC}"; \
+-		plib="-lmpi"; \
+-	      else \
+-		cc="${MPICC}"; \
+-	      fi; \
+-	    fi; \
+-	  fi; \
+-	else \
+-	  if [ -f `which gcc` ] ; then \
+-	    cc="${GCC}"; \
+-	  else \
+-	    if [ $$OS = "SunOS" ] ; then \
+-	      cc="${SUN_CC}"; \
+-	    else \
+-	      cc="${CC}"; \
+-	    fi; \
+-	  fi; \
+-	fi; \
+-	cd ../bin/$$machid$$p; \
+-	make -f ${SRC}/makefile exec M=$$OS CC="$$cc ${SO}" PLIB=$$plib;
+- 
+-#
+-# Make the executable.
+-#
+-exec:	${OBJ}
+-	@ if [ ${TAR} = readseq ] || [ ${TAR} = meme-client ] || \
+-	  [ ${TAR} = meme-server ] || [ ${TAR} = mast-server ] || \
+-          [ ${TAR} = alphtype ]; then \
+-	  lib=""; \
+-	else \
+-  	  lib="-lm"; \
+-	fi; \
+-	if [ ${M} = SunOS ] ; then \
+-	  lib="$$lib -lnsl -lsocket -ldl ${PLIB}"; \
+-	else \
+-  	  lib="$$lib ${PLIB}"; \
+-        fi; \
+-	echo ${CC} ${OBJ} -o ${TAR} $$lib; \
+-	${CC} ${OBJ} -o ${TAR} $$lib; \
+-	if [ -f ${TAR} ] ; then chmod 755 ${TAR}; fi
++INCLUDES	= -I. -IINCLUDE
++#XTRAFLAGS	= -D${M} -DUNIX -D__USE_FIXED_PROTOTYPES__ #-DEXP
++XTRAFLAGS	= -DUNIX -D__USE_FIXED_PROTOTYPES__ #-DEXP
++
++.SUFFIXES: .c .o
++
++%.o: %.c
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) -c $<
+ 
+ #
+ # Lists of objects.
+ #
++#
+ SEQ_OBJECTS =  hash.o hash_alph.o read_seq_file.o background.o
+ MOTIF_OBJECTS = logodds.o motifs.o regress.o
+ MEME_OBJECTS = clock.o display.o dpalign.o em.o \
+@@ -135,35 +66,93 @@
+ DPALIGN_OBJECTS = dpalign.o meme_util.o hash_alph.o display.o
+ 
+ #
++# Make the executable.
++#
++#exec:	${OBJ}
++#	@ if [ ${TAR} = readseq ] || [ ${TAR} = meme-client ] || \
++#	  [ ${TAR} = meme-server ] || [ ${TAR} = mast-server ] || \
++#          [ ${TAR} = alphtype ]; then \
++#	  lib=""; \
++#	else \
++#  	  lib="-lm"; \
++#	fi; \
++#	if [ ${M} = SunOS ] ; then \
++#	  lib="$$lib -lnsl -lsocket -ldl ${PLIB}"; \
++#	else \
++#  	  lib="$$lib ${PLIB}"; \
++#        fi; \
++#	echo $(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) ${OBJ} -o ${TAR} $$lib; \
++#	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) ${OBJ} -o ${TAR} $$lib; \
++#	if [ -f ${TAR} ] ; then chmod 755 ${TAR}; fi
++#
++
++#
+ # Targets.
+ #
+-meme:		links
+-	@ make -f makefile objexe OBJ="${MEME_OBJECTS}" TAR=$@
+-dpalign: 	links
+-	@ make -f makefile objexe OBJ="${DPALIGN_OBJECTS}" TAR=$@
+-llr: 		links
+-	@ make -f makefile objexe SO="-DSO" OBJ="${LLR_OBJECTS}" TAR=$@
+-star:		links
+-	@ make -f makefile objexe SO="-DSO" OBJ="${STAR_OBJECTS}" TAR=$@
+-ic: 		links
+-	@ make -f makefile objexe SO="-DSO" OBJ="${IC_OBJECTS}" TAR=$@
+-mast:		links
+-	@ make -f makefile objexe OBJ="${MAST_OBJECTS}" TAR=$@
+-siteroc:	links
+-	@ make -f makefile objexe OBJ="${SITEROC_OBJECTS}" TAR=$@
+-seqroc:	links
+-	@ make -f makefile objexe OBJ="${SEQROC_OBJECTS}" TAR=$@
+-getsize:	links
+-	@ make -f makefile objexe OBJ="${GETSIZE_OBJECTS}" TAR=$@
+-meme-client:	links
+-	@ make -f makefile objexe OBJ="${MEME-CLIENT_OBJECTS}" TAR=$@
+-meme-server:	links
+-	@ make -f makefile objexe OBJ="${MEME-SERVER_OBJECTS}" TAR=$@
+-mast-server:	links
+-	@ make -f makefile objexe OBJ="${MAST-SERVER_OBJECTS}" TAR=$@
+-readseq:	links
+-	@ make -f makefile objexe OBJ="${READSEQ_OBJECTS}" TAR=$@
+-alphtype:	links
+-	@ make -f makefile objexe OBJ="${ALPHTYPE_OBJECTS}" TAR=$@
++
++#TARGETS=meme dpalign llr star ic mast siteroc seqroc getsize meme-client meme-server mast-server \
++	readseq alphtype
++TARGETS=meme mast
++
++all:	$(TARGETS)
++
++meme: $(MEME_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(MEME_OBJECTS) -o $@ -lm
++
++mast: $(MAST_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(MAST_OBJECTS) -o $@ -lm
++
++dpalign: $(DPALIGN_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(DPALIGN_OBJECTS) -o $@ -lm
++
++llr: 	$(LLR_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(LLR_OBJECTS) -o $@ -lm
++
++star:	$(STAR_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(STAR_OBJECTS) -o $@ -lm
++
++ic: 	$(IC_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(IC_OBJECTS) -o $@ -lm
++
++siteroc:	$(SITEROC_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(SITEROC_OBJECTS) -o $@ -lm
++
++seqroc: $(SEQROC_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(SEQROC_OBJECTS) -o $@ -lm
++
++getsize: $(GETSIZE_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(GETSIZE_OBJECTS) -o $@ -lm
++
++meme-client: $(MEME-CLIENT_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(MEME-CLIENT_OBJECTS) -o $@ -lm
++
++meme-server: $(MEME-SERVER_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(MEME-SERVER_OBJECTS) -o $@ -lm
++
++mast-server: $(MAST-SERVER_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(MAST-SERVER_OBJECTS) -o $@ -lm
++
++readseq: $(READSEQ_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(READSEQ_OBJECTS) -o $@ -lm
++
++alphtype: $(ALPHTYPE_OBJECTS)
++	$(CC) $(CFLAGS) $(INCLUDES) $(XTRAFLAGS) $(ALPHTYPE_OBJECTS) -o $@ -lm
++
+ test:
+-	cd ..; bin/runtests
++	(cd .. &&  bin/runtests)
++
++DESTDIR=/
++
++install:	all
++	cp -r ../bin/* $(DESTDIR)/usr/share/meme/bin/
++	cp $(TARGETS) $(DESTDIR)/usr/share/meme/bin/
++	(cd $(DESTDIR)/usr/bin/ && ln -s /usr/share/meme/bin/mast /usr/share/meme/bin/meme .)
++
++clean:
++	rm -f *.o
++
++distclean: clean
++	rm -f $(TARGETS)
++
++
++.PHONY:	all test install clean distclean
+--- meme-3.0.13.orig/website/cgi-bin/process_request.cgi
++++ meme-3.0.13/website/cgi-bin/process_request.cgi
+@@ -1,4 +1,4 @@
+-#!/usr/local/bin/perl
++#!/usr/bin/perl
+ # process_request.cgi
+ 
+ #

Added: trunk/packages/meme/trunk/debian/rules
===================================================================
--- trunk/packages/meme/trunk/debian/rules	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/rules	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,79 @@
+#!/usr/bin/make -f
+# -*- makefile -*-
+# Sample debian/rules that uses debhelper.
+# This file was originally written by Joey Hess and Craig Small.
+# As a special exception, when this file is copied by dh-make into a
+# dh-make output file, you may use that output file without restriction.
+# This special exception was added by Craig Small in version 0.37 of dh-make.
+
+# Uncomment this to turn on verbose mode.
+#export DH_VERBOSE=1
+
+
+
+
+CFLAGS = -Wall -g
+
+ifneq (,$(findstring noopt,$(DEB_BUILD_OPTIONS)))
+	CFLAGS += -O0
+else
+	CFLAGS += -O2
+endif
+
+configure: 
+
+build: build-stamp
+build-stamp: 
+	dh_testdir
+
+	$(MAKE) -C src
+	#docbook-to-man debian/meme.sgml > meme.1
+
+	touch build-stamp
+
+clean:
+	dh_testdir
+	dh_testroot
+	rm -f build-stamp configure-stamp
+
+	-$(MAKE) -C src distclean
+
+	dh_clean 
+
+install: build
+	dh_testdir
+	dh_testroot
+	dh_clean -k 
+	dh_installdirs
+
+	# Add here commands to install the package into debian/meme.
+	$(MAKE) -C src install DESTDIR=$(CURDIR)/debian/meme
+	cp -r website $(CURDIR)/debian/meme/usr/share/meme/
+
+
+# Build architecture-independent files here.
+binary-indep: build install
+# We have nothing to do by default.
+
+# Build architecture-dependent files here.
+binary-arch: build install
+	dh_testdir
+	dh_testroot
+	dh_installchangelogs 
+	dh_installdocs
+	dh_installexamples
+#	dh_installmenu
+	dh_installman
+	dh_link
+	dh_strip
+	dh_compress
+	dh_fixperms
+	dh_perl
+	dh_installdeb
+	dh_shlibdeps
+	dh_gencontrol
+	dh_md5sums
+	dh_builddeb
+
+binary: binary-indep binary-arch
+.PHONY: build clean binary-indep binary-arch binary install configure

Added: trunk/packages/meme/trunk/debian/watch
===================================================================
--- trunk/packages/meme/trunk/debian/watch	                        (rev 0)
+++ trunk/packages/meme/trunk/debian/watch	2008-03-13 10:30:35 UTC (rev 1573)
@@ -0,0 +1,6 @@
+# Example watch control file for uscan
+# Rename this file to "watch" and then you can run the "uscan" command
+# to check for upstream updates and more.
+# Site		Directory		Pattern			Version	Script
+version=2
+ftp.sdsc.edu pub/sdsc/biology/meme	meme.(.*)\.tar.Z	debian	uupdate

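The file-name pattern in the watch file above can be tried out in Python, whose re module treats this particular expression the same way Perl does. This is a rough sketch under two assumptions: that the pattern has to match the whole file name, and that upstream tarballs are named along the lines of the hypothetical examples below (the names are not taken from the FTP directory). The helper upstream_version is made up for this illustration and is not part of uscan.

```python
import re

# Pattern copied from the last line of debian/watch.
PATTERN = re.compile(r"meme.(.*)\.tar.Z")


def upstream_version(filename):
    """Return the captured version string, or None if the name does not match."""
    match = PATTERN.fullmatch(filename)
    return match.group(1) if match else None


# Hypothetical file names, used only to exercise the pattern.
for name in ("meme_3.0.13.tar.Z", "meme-3.0.13.tar.Z",
             "mast_3.0.13.tar.Z", "meme_3.0.13.tarxZ"):
    print(name, "->", upstream_version(name))
```

The unescaped dots after "meme" and before "Z" match any single character, so the pattern accepts either separator after the package name but would also accept a name such as "meme_3.0.13.tarxZ"; escaping the second one as "\.tar\.Z" would tighten it.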


