[Python-apps-team] Bug#505397: New version of harvestman 2.0 beta is out

Lucas Szybalski webmaster at lucasmanual.com
Wed Nov 12 01:38:59 UTC 2008


Package: harvestman
Severity: normal


How are you crawling and downloading websites, files, and images?
Do you need something better?
It's time for a change!
Download the beta version of the HarvestMan crawler today!

HarvestMan is a modular, extensible and flexible web crawler program
written in pure Python. HarvestMan can be used to download files from
websites according to a number of customized rules and constraints. It
can be used to find information on websites matching keywords or
regular expressions. The latest version of HarvestMan supports more
than 60 customization options.

Download the files here:
http://harvestman-crawler.googlecode.com/files/Harvestman-2.0.4beta.tar.gz

Unzip and install:
tar -xzvf Harvestman-2.0.4beta.tar.gz
cd Harvestman-2.0.4beta
python setup.py install
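
For anyone scripting the unpack step instead of calling tar by hand, here
is a minimal sketch using only the Python 3 standard library. It assumes
the archive has already been downloaded to the current directory. The
function name extract_archive is hypothetical and is not part of
HarvestMan.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir (rough equivalent of `tar -xzf`).

    Hypothetical helper for illustration; not part of HarvestMan.
    Returns the sorted top-level names found in the archive.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Basic safety check: refuse entries that would land outside dest_dir.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Collect the first path component of every entry.
    return sorted({Path(m.name).parts[0] for m in members if Path(m.name).parts})
```

Calling extract_archive("Harvestman-2.0.4beta.tar.gz") should leave the
same directory that the tar command above creates; the setup.py step is
then run from inside it as shown.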

Run the self-test, then create a config file:
harvestman --selftest
harvestman --genconfig
(use the simple web GUI to add the site you want to crawl and the other
details, then save the config XML file)

Run harvestman with the saved config file:
harvestman -C mycrawl.xml
or drive harvestman from the command line (-h lists the options):
harvestman -h
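
If you want to start a crawl from a script rather than a shell, the
sketch below wraps the same command line with Python's subprocess module.
It uses only the -C and -h flags shown above. The helper names
harvestman_command and run_crawl are made up for illustration, and the
sketch assumes the harvestman executable is on your PATH after
installation.

```python
import shutil
import subprocess


def harvestman_command(config_file=None):
    """Build the argument list for the harvestman CLI.

    Hypothetical helper: with a config file it mirrors `harvestman -C <file>`,
    without one it falls back to `harvestman -h`.
    """
    if config_file is None:
        return ["harvestman", "-h"]
    return ["harvestman", "-C", config_file]


def run_crawl(config_file):
    """Run a crawl with the given config file and return the exit code."""
    # Assumption: the installer placed `harvestman` somewhere on PATH.
    if shutil.which("harvestman") is None:
        raise FileNotFoundError("harvestman executable not found on PATH")
    return subprocess.run(harvestman_command(config_file), check=False).returncode
```

For example, run_crawl("mycrawl.xml") runs the same command as the
"harvestman -C mycrawl.xml" line above and hands back its exit status.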

Project website:
http://code.google.com/p/harvestman-crawler/


Please forward this to anybody who might be interested!

Thank you,
Harvestman Team

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-486
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)




