[Debian-med-packaging] getData functional for symbiosis of EMBOSS and UniProt

Steffen Moeller steffen_moeller at gmx.de
Thu Jul 3 23:07:52 UTC 2008


Hello,

some of you may recall that I planned for a while to provide something
to update databases at least semi-automatically. Well, the protein
sequence databases SWISS-PROT and TrEMBL are now

 * downloaded/updated with wget
 * unpacked
 * indexed with the EMBOSS tools dbiflat and dbxflat
 * printed as a ready-made entry to add to the EMBOSS configuration.
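
As a rough illustration of the four steps above, here is a minimal Python
sketch that only plans the work and prints it. It is not the getData code.
The function name plan_update, the mirror URL and the "compressed"
directory are assumptions made for the example; the "uncompressed"
directory follows the path that getData prints for trembl.dat. Indexer
arguments are left out on purpose.

```python
import posixpath


def plan_update(name, url, mirrordir):
    """Return the ordered steps for one database as (label, command) pairs."""
    base = posixpath.join(posixpath.normpath(mirrordir), name)
    # "compressed" is an assumed directory name; "uncompressed" appears in
    # the path that getData prints for trembl.dat.
    archive = posixpath.join(base, "compressed", url.rsplit("/", 1)[-1])
    flatfile = posixpath.join(base, "uncompressed", name)
    return [
        # wget -N re-downloads only when the remote file is newer.
        ("download", f"wget -N -P {posixpath.dirname(archive)} {url}"),
        ("unpack", f"gunzip -c {archive} > {flatfile}"),
        # Indexer arguments are deliberately left out of this sketch.
        ("index", f"dbiflat / dbxflat on {flatfile}"),
        ("config", f"print the EMBOSS DB entry for {name}"),
    ]


# Hypothetical mirror URL, used only to show the shape of the plan.
for label, command in plan_update(
    "swiss.dat", "ftp://mirror.example.org/swiss.dat.gz", "/local/databases/mirrored/"
):
    print(f"{label:9s}{command}")
```

Keeping the plan separate from its execution makes a dry run cheap, which
matters when a single download is larger than 2 GB.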

SWISS-PROT is indexed for the IDs, accession numbers and the description
line. TrEMBL is indexed only for the first two; indexing its descriptions
takes considerable disk space and did not seem worth it to me. Such bits
should all become neatly configurable, which they are not at the moment.

This is how it works. To download, unpack and index:

$ alioth/debian-med/trunk/community/infrastructure/getData/getData trembl.dat

You may not want to try this at home, since the gzipped download alone
already exceeds the 2 GB barrier. Use the far lighter swiss.dat instead.

getData --help produces typical pod2man-style output that may be worth a
look.

To print the configuration:

$ alioth/debian-med/trunk/community/infrastructure/getData/getData --config emboss trembl.dat --mirrordir=/local/databases/mirrored/



########### trembl.dat ##############

DB trembllocal [
  type: P
  format: swiss
  method: emboss
  directory: /local/databases/mirrored//trembl.dat/uncompressed/trembl.dat
]

####################################
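
For anyone adapting the script, here is a minimal Python sketch of how
such an entry could be generated. It is an illustration, not the getData
code. The function name emboss_stanza and the rule "file name stem plus
'local'" for the database name are assumptions inferred from the single
example above. Unlike the output shown, the sketch normalises the mirror
directory, so a trailing slash in --mirrordir does not produce a double
slash in the path.

```python
import posixpath


def emboss_stanza(name, mirrordir):
    """Render an EMBOSS DB entry like the one printed for trembl.dat."""
    # Assumed naming rule: "trembl.dat" -> "trembllocal".
    dbname = name.split(".")[0] + "local"
    # normpath drops the trailing slash of the mirror directory.
    path = posixpath.join(posixpath.normpath(mirrordir), name, "uncompressed", name)
    return "\n".join(
        [
            f"DB {dbname} [",
            "  type: P",
            "  format: swiss",
            "  method: emboss",
            f"  directory: {path}",
            "]",
        ]
    )


print(emboss_stanza("trembl.dat", "/local/databases/mirrored/"))
```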

The indexing is performed whenever the indexing program is found in
/usr/bin. This should probably become an optional step that is performed
after the unpacking.
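
One way to make that step optional is sketched below in Python. The
function name should_index and the enabled switch are hypothetical and not
part of getData; only the check for an executable in /usr/bin mirrors the
current behaviour.

```python
import os


def should_index(program, bindir="/usr/bin", enabled=True):
    """Decide whether to run an indexer such as dbiflat or dbxflat."""
    if not enabled:
        # Hypothetical switch: the user asked to skip indexing.
        return False
    candidate = os.path.join(bindir, program)
    # Mirrors the current rule: index only if the program is installed.
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


for tool in ("dbiflat", "dbxflat"):
    print(tool, "will run" if should_index(tool) else "will be skipped")
```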

I hope that one of you would like to adapt the script for other
databases. Would anyone take on adding indexing for BLAST? The download
of PDB? SCOP? Pathway data? Well, I'll definitely add those I need.

I'd be happy to receive suggestions about what the config file of getData
should look like. The mirrordir should be specified there, and for those
who already maintain a mirror by some other means, the indexing directory
should be configurable separately. Well, I don't need such features at
the moment, but I would really like to hear from anyone who makes this
script more flexible.

Cheers,

Steffen



