r769 - in python-clientform/branches/upstream/current: . ClientForm.egg-info
Jérémy Bobbio
lunar at alioth.debian.org
Mon Apr 9 21:18:25 UTC 2007
Author: lunar
Date: 2007-04-09 21:18:24 +0000 (Mon, 09 Apr 2007)
New Revision: 769
Added:
python-clientform/branches/upstream/current/ClientForm.egg-info/dependency_links.txt
python-clientform/branches/upstream/current/ez_setup.py
python-clientform/branches/upstream/current/setup.cfg
Removed:
python-clientform/branches/upstream/current/ez_setup/
Modified:
python-clientform/branches/upstream/current/ChangeLog.txt
python-clientform/branches/upstream/current/ClientForm.egg-info/PKG-INFO
python-clientform/branches/upstream/current/ClientForm.egg-info/SOURCES.txt
python-clientform/branches/upstream/current/ClientForm.egg-info/zip-safe
python-clientform/branches/upstream/current/ClientForm.py
python-clientform/branches/upstream/current/GeneralFAQ.html
python-clientform/branches/upstream/current/INSTALL.txt
python-clientform/branches/upstream/current/MANIFEST.in
python-clientform/branches/upstream/current/PKG-INFO
python-clientform/branches/upstream/current/README.html
python-clientform/branches/upstream/current/README.html.in
python-clientform/branches/upstream/current/README.txt
python-clientform/branches/upstream/current/setup.py
python-clientform/branches/upstream/current/test.py
Log:
[svn-upgrade] Integrating new upstream version, python-clientform (0.2.6)
Modified: python-clientform/branches/upstream/current/ChangeLog.txt
===================================================================
--- python-clientform/branches/upstream/current/ChangeLog.txt 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ChangeLog.txt 2007-04-09 21:18:24 UTC (rev 769)
@@ -1,6 +1,49 @@
This isn't really in proper GNU ChangeLog format, it just happens to
look that way.
+2007-01-07 John J Lee <jjl at pobox.com>
+ * 0.2.6 release:
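+ * Don't allow underlying parser errors (e.g. SGMLParseError)
+ through -- always raise ClientForm.ParseError . To preserve
+ backwards compatibility, ParseError derives from the exceptions
+ that earlier versions let through (SGMLParseError and, where
+ available, HTMLParser.HTMLParseError).
+ However, any code that distinguishes between
+ ClientForm.ParseError and these other exceptions will break.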
+ * Allow controls to appear outside of forms (the HTML spec allows
+ this). This involved adding new functions ParseFileEx and
+ ParseResponseEx, which return a list that's always one longer
+ than the return value of their counterparts ParseFile and
+ ParseResponse. The new first element in the list of forms is an
+ HTMLForm representing the collection of all forms that lie
+ outside of any FORM element.
+
+2006-10-24 John J Lee <jjl at pobox.com>
+ * 0.2.5 release:
+ * Fix fragment bug introduced in 0.2.4 (thanks Dave Marble). This
+ only caused a bug when used with mechanize: it does not affect
+ users using ClientForm without mechanize.
+
+2006-10-14 John J Lee <jjl at pobox.com>
+ * 0.2.4 release:
+ * Support for mechanize 0.1.4b.
+
+2006-10-12 John J Lee <jjl at pobox.com>
+ * 0.2.3 release:
+ * Fix entity reference / character reference handling for
+ Python 2.5 .
+ * Nameless list controls are now never successful.
+ * List controls used to get inappropriately .merge_control()ed
+ with other controls, or parsing would raise AmbiguityError.
+ That's fixed now.
+ * Handle line endings in element content the same way browsers do
+ (strip exactly one leading linebreak, if any leading linebreaks
+ are present) (patch from Benji York).
+ * Convert TEXTAREA content to DOS line ending convention, again
+ following the major browsers.
+ * Allow mechanize to supply URL join / parse / unparse functions,
+ to allow mechanize to follow RFC 3986, thus fixing some URL
+ processing bugs. ClientForm should do the same; probably I
+ should merge the two projects after final mechanize release.
+ * Doc fixes.
+
2006-03-22 John J Lee <jjl at pobox.com>
* 0.2.2 release:
* Stop trying to record precise dates in changelog, since that's
Modified: python-clientform/branches/upstream/current/ClientForm.egg-info/PKG-INFO
===================================================================
--- python-clientform/branches/upstream/current/ClientForm.egg-info/PKG-INFO 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ClientForm.egg-info/PKG-INFO 2007-04-09 21:18:24 UTC (rev 769)
@@ -1,12 +1,12 @@
Metadata-Version: 1.0
Name: ClientForm
-Version: 0.2.2
+Version: 0.2.6
Summary: Client-side HTML form handling.
Home-page: http://wwwsearch.sourceforge.net/ClientForm/
Author: John J. Lee
Author-email: jjl at pobox.com
License: BSD
-Download-URL: http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.2.tar.gz
+Download-URL: http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.6.tar.gz
Description: ClientForm is a Python module for handling HTML forms on the client
side, useful for parsing HTML forms, filling them in and returning the
completed forms to the server. It developed from a port of Gisle Aas'
Modified: python-clientform/branches/upstream/current/ClientForm.egg-info/SOURCES.txt
===================================================================
--- python-clientform/branches/upstream/current/ClientForm.egg-info/SOURCES.txt 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ClientForm.egg-info/SOURCES.txt 2007-04-09 21:18:24 UTC (rev 769)
@@ -8,9 +8,12 @@
README.html
README.html.in
README.txt
+ez_setup.py
setup.py
test.py
ClientForm.egg-info/PKG-INFO
+ClientForm.egg-info/SOURCES.txt
+ClientForm.egg-info/dependency_links.txt
ClientForm.egg-info/top_level.txt
ClientForm.egg-info/zip-safe
examples/data.dat
@@ -19,8 +22,6 @@
examples/example.html
examples/example.py
examples/simple.py
-ez_setup/README.txt
-ez_setup/__init__.py
testdata/Auth.html
testdata/FullSearch.html
testdata/GeneralSearch.html
Added: python-clientform/branches/upstream/current/ClientForm.egg-info/dependency_links.txt
===================================================================
--- python-clientform/branches/upstream/current/ClientForm.egg-info/dependency_links.txt 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ClientForm.egg-info/dependency_links.txt 2007-04-09 21:18:24 UTC (rev 769)
@@ -0,0 +1 @@
+
Modified: python-clientform/branches/upstream/current/ClientForm.egg-info/zip-safe
===================================================================
--- python-clientform/branches/upstream/current/ClientForm.egg-info/zip-safe 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ClientForm.egg-info/zip-safe 2007-04-09 21:18:24 UTC (rev 769)
@@ -0,0 +1 @@
+
Modified: python-clientform/branches/upstream/current/ClientForm.py
===================================================================
--- python-clientform/branches/upstream/current/ClientForm.py 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ClientForm.py 2007-04-09 21:18:24 UTC (rev 769)
@@ -21,18 +21,15 @@
Copyright 1998-2000 Gisle Aas.
This code is free software; you can redistribute it and/or modify it
-under the terms of the BSD License (see the file COPYING included with
-the distribution).
+under the terms of the BSD or ZPL 2.1 licenses (see the file
+COPYING.txt included with the distribution).
"""
# XXX
-# Remove unescape_attr method
+# add an __all__
# Remove parser testing hack
# safeUrl()-ize action
-# Really should to merge CC, CF, pp and mechanize as soon as mechanize
-# goes to beta...
-# Add url attribute to ParseError
# Switch to unicode throughout (would be 0.3.x)
# See Wichert Akkerman's 2004-01-22 message to c.l.py.
# Add charset parameter to Content-type headers? How to find value??
@@ -41,11 +38,6 @@
# Does file upload work when name is missing? Sourceforge tracker form
# doesn't like it. Check standards, and test with Apache. Test
# binary upload with Apache.
-# Controls can have name=None (e.g. forms constructed partly with
-# JavaScript), but find_control can't be told to find a control
-# with that name, because None there means 'unspecified'. Can still
-# get at by nr, but would be nice to be able to specify something
-# equivalent to name=None, too.
# mailto submission & enctype text/plain
# I'm not going to fix this unless somebody tells me what real servers
# that want this encoding actually expect: If enctype is
@@ -109,10 +101,22 @@
import sys, urllib, urllib2, types, mimetools, copy, urlparse, \
htmlentitydefs, re, random
-from urlparse import urljoin
from cStringIO import StringIO
+import sgmllib
+# monkeypatch to fix http://www.python.org/sf/803422 :-(
+sgmllib.charref = re.compile("&#(x?[0-9a-fA-F]+)[^0-9a-fA-F]")
+
+# HTMLParser.HTMLParser is recent, so live without it if it's not available
+# (also, sgmllib.SGMLParser is much more tolerant of bad HTML)
try:
+ import HTMLParser
+except ImportError:
+ HAVE_MODULE_HTMLPARSER = False
+else:
+ HAVE_MODULE_HTMLPARSER = True
+
+try:
import warnings
except ImportError:
def deprecation(message):
@@ -121,15 +125,21 @@
def deprecation(message):
warnings.warn(message, DeprecationWarning, stacklevel=2)
-VERSION = "0.2.2"
+VERSION = "0.2.6"
CHUNK = 1024 # size of chunks fed to parser, in bytes
DEFAULT_ENCODING = "latin-1"
+class Missing: pass
+
_compress_re = re.compile(r"\s+")
def compress_text(text): return _compress_re.sub(" ", text.strip())
+def normalize_line_endings(text):
+ return re.sub(r"(?:(?<!\r)\n)|(?:\r(?!\n))", "\r\n", text)
+
+
# This version of urlencode is from my Python 1.5.2 back-port of the
# Python 2.1 CVS maintenance branch of urllib. It will accept a sequence
# of pairs instead of a mapping -- the 2.0 version only accepts a mapping.
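The `normalize_line_endings` helper added in this hunk rewrites every bare LF and every bare CR as CRLF and leaves existing CRLF pairs alone. ClientForm applies it to TEXTAREA content. Below is a standalone Python 3 check of the same regular expression. The sample strings are only illustrative.

```python
import re

# Same pattern as the helper in the hunk above: match an LF that is not
# preceded by CR, or a CR that is not followed by LF.
_bare_eol = re.compile(r"(?:(?<!\r)\n)|(?:\r(?!\n))")


def normalize_line_endings(text):
    """Convert Unix (LF) and old Mac (CR) line endings to DOS (CRLF)."""
    return _bare_eol.sub("\r\n", text)


if __name__ == "__main__":
    print(repr(normalize_line_endings("a\nb\rc\r\nd")))  # 'a\r\nb\r\nc\r\nd'
```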
@@ -429,10 +439,25 @@
class ItemCountError(ValueError): pass
+# for backwards compatibility, ParseError derives from exceptions that were
+# raised by versions of ClientForm <= 0.2.5
+if HAVE_MODULE_HTMLPARSER:
+ SGMLLIB_PARSEERROR = sgmllib.SGMLParseError
+ class ParseError(sgmllib.SGMLParseError,
+ HTMLParser.HTMLParseError,
+ ):
+ pass
+else:
+ if hasattr(sgmllib, "SGMLParseError"):
+ SGMLLIB_PARSEERROR = sgmllib.SGMLParseError
+ class ParseError(sgmllib.SGMLParseError):
+ pass
+ else:
+ SGMLLIB_PARSEERROR = RuntimeError
+ class ParseError(RuntimeError):
+ pass
-class ParseError(Exception): pass
-
class _AbstractFormParser:
"""forms attribute contains HTMLForm instances on completion."""
# thanks to Moshe Zadka for an example of sgmllib/htmllib usage
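This hunk works together with the `feed` overrides later in the diff. Every parser error then surfaces as `ParseError`, yet old `except sgmllib.SGMLParseError:` clauses still catch it, because `ParseError` subclasses the exceptions that earlier releases let through. `sgmllib` and `HTMLParser.HTMLParseError` do not exist in Python 3. This sketch therefore substitutes a hypothetical `UnderlyingParseError` and a toy `_LowLevelParser` for them.

```python
class UnderlyingParseError(Exception):
    """Hypothetical stand-in for sgmllib.SGMLParseError."""


class ParseError(UnderlyingParseError):
    """Library-level error; subclassing keeps old except-clauses working."""


class _LowLevelParser:
    """Toy whitespace-token parser that raises its own error on nested FORMs."""

    def __init__(self):
        self.open_forms = 0

    def feed(self, data):
        for token in data.split():
            if token == "<form>":
                if self.open_forms:
                    raise UnderlyingParseError("nested FORMs")
                self.open_forms += 1
            elif token == "</form>":
                self.open_forms -= 1


class FormParser(_LowLevelParser):
    def feed(self, data):
        # Translate the underlying parser's exception into ParseError.
        try:
            _LowLevelParser.feed(self, data)
        except UnderlyingParseError as exc:
            raise ParseError(exc)


def caught_by_old_handler(html):
    """Return True if an old-style handler receives a ParseError."""
    try:
        FormParser().feed(html)
    except UnderlyingParseError as exc:
        return isinstance(exc, ParseError)
    return False
```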
@@ -452,6 +477,13 @@
self._option = None
self._textarea = None
+ # forms[0] will contain all controls that are outside of any form
+ # self._global_form is an alias for self.forms[0]
+ self._global_form = None
+ self.start_form([])
+ self.end_form()
+ self._current_form = self._global_form = self.forms[0]
+
def do_base(self, attrs):
debug("%s", attrs)
for key, value in attrs:
@@ -462,12 +494,12 @@
debug("")
if self._current_label is not None:
self.end_label()
- if self._current_form is not None:
+ if self._current_form is not self._global_form:
self.end_form()
def start_form(self, attrs):
debug("%s", attrs)
- if self._current_form is not None:
+ if self._current_form is not self._global_form:
raise ParseError("nested FORMs")
name = None
action = None
@@ -491,15 +523,13 @@
debug("")
if self._current_label is not None:
self.end_label()
- if self._current_form is None:
+ if self._current_form is self._global_form:
raise ParseError("end of FORM before start")
self.forms.append(self._current_form)
- self._current_form = None
+ self._current_form = self._global_form
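`self._global_form` now takes the place of `None`. As a result, controls that appear outside any FORM element are collected into `forms[0]`. `ParseFile` and `ParseResponse` drop that first entry (`[1:]`), while `ParseFileEx` and `ParseResponseEx` keep it. The Python 3 sketch below applies the same bookkeeping on top of the standard library's `html.parser.HTMLParser`. The class and function names are hypothetical, each "form" is reduced to a list of control names, and `ValueError` stands in for `ParseError`.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect control names per FORM; forms[0] is the 'global form'."""

    CONTROL_TAGS = ("input", "button", "select", "textarea", "isindex")

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = [[]]              # forms[0]: controls outside any FORM
        self._global_form = self.forms[0]
        self._current_form = self._global_form

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self._current_form is not self._global_form:
                raise ValueError("nested FORMs")
            self._current_form = []
        elif tag in self.CONTROL_TAGS:
            self._current_form.append(dict(attrs).get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            if self._current_form is self._global_form:
                raise ValueError("end of FORM before start")
            self.forms.append(self._current_form)
            self._current_form = self._global_form


def parse_ex(html):
    """Like ParseFileEx: the first entry holds the global form."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


def parse(html):
    """Like ParseFile: hide the global form."""
    return parse_ex(html)[1:]
```

For `'<input name="q"><form><input name="a"></form>'`, `parse_ex` yields `[['q'], ['a']]` and `parse` yields `[['a']]`.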
def start_select(self, attrs):
debug("%s", attrs)
- if self._current_form is None:
- raise ParseError("start of SELECT before start of FORM")
if self._select is not None:
raise ParseError("nested SELECTs")
if self._textarea is not None:
@@ -515,8 +545,8 @@
def end_select(self):
debug("")
- if self._current_form is None:
- raise ParseError("end of SELECT before start of FORM")
+ if self._current_form is self._global_form:
+ return
if self._select is None:
raise ParseError("end of SELECT before start")
@@ -583,8 +613,6 @@
def start_textarea(self, attrs):
debug("%s", attrs)
- if self._current_form is None:
- raise ParseError("start of TEXTAREA before start of FORM")
if self._textarea is not None:
raise ParseError("nested TEXTAREAs")
if self._select is not None:
@@ -598,8 +626,8 @@
def end_textarea(self):
debug("")
- if self._current_form is None:
- raise ParseError("end of TEXTAREA before start of FORM")
+ if self._current_form is self._global_form:
+ return
if self._textarea is None:
raise ParseError("end of TEXTAREA before start")
controls = self._current_form[2]
@@ -643,6 +671,16 @@
def handle_data(self, data):
debug("%s", data)
+
+ # according to http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1
+ # line break immediately after start tags or immediately before end
+ # tags must be ignored, but real browsers only ignore a line break
+ # after a start tag, so we'll do that.
+ if data[0:2] == "\r\n":
+ data = data[2:]
+ if data[0:1] in ["\n", "\r"]:
+ data = data[1:]
+
if self._option is not None:
# self._option is a dictionary of the OPTION element's HTML
# attributes, but it has two special keys, one of which is the
@@ -653,6 +691,7 @@
elif self._textarea is not None:
map = self._textarea
key = "value"
+ data = normalize_line_endings(data)
# not if within option or textarea
elif self._current_label is not None:
map = self._current_label
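The `handle_data` change follows browsers in ignoring a line break that comes directly after a start tag, for example just after `<textarea>`. The ChangeLog describes this as stripping exactly one leading line break. The sketch below implements that rule, assuming the break is CRLF, LF, or CR. The hunk above uses two independent `if` tests, so text that begins with CRLF followed by LF loses both breaks there. The `elif` below removes only one.

```python
def strip_leading_linebreak(data):
    """Remove at most one leading line break (CRLF, LF or CR)."""
    if data[0:2] == "\r\n":
        return data[2:]
    elif data[0:1] in ("\n", "\r"):
        return data[1:]
    return data
```

For example, `strip_leading_linebreak("\r\n\nfoo")` returns `"\nfoo"`.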
@@ -667,8 +706,6 @@
def do_button(self, attrs):
debug("%s", attrs)
- if self._current_form is None:
- raise ParseError("start of BUTTON before start of FORM")
d = {}
d["type"] = "submit" # default
for key, val in attrs:
@@ -687,8 +724,6 @@
def do_input(self, attrs):
debug("%s", attrs)
- if self._current_form is None:
- raise ParseError("start of INPUT before start of FORM")
d = {}
d["type"] = "text" # default
for key, val in attrs:
@@ -702,8 +737,6 @@
def do_isindex(self, attrs):
debug("%s", attrs)
- if self._current_form is None:
- raise ParseError("start of ISINDEX before start of FORM")
d = {}
for key, val in attrs:
d[key] = val
@@ -743,11 +776,7 @@
def unknown_charref(self, ref): self.handle_data("&#%s;" % ref)
-# HTMLParser.HTMLParser is recent, so live without it if it's not available
-# (also, htmllib.HTMLParser is much more tolerant of bad HTML)
-try:
- import HTMLParser
-except ImportError:
+if not HAVE_MODULE_HTMLPARSER:
class XHTMLCompatibleFormParser:
def __init__(self, entitydefs=None, encoding=DEFAULT_ENCODING):
raise ValueError("HTMLParser could not be imported")
@@ -759,6 +788,12 @@
HTMLParser.HTMLParser.__init__(self)
_AbstractFormParser.__init__(self, entitydefs, encoding)
+ def feed(self, data):
+ try:
+ HTMLParser.HTMLParser.feed(self, data)
+ except HTMLParser.HTMLParseError, exc:
+ raise ParseError(exc)
+
def start_option(self, attrs):
_AbstractFormParser._start_option(self, attrs)
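On Python 2.5, the sgmllib-based parser below overrides `entity_or_charref`. This lets it decode hexadecimal character references such as `&#x42;` in addition to named entities and decimal references. The Python 3 sketch below decodes text with that same pattern. The standard library's `html.entities.name2codepoint` table stands in for ClientForm's own `unescape` and `unescape_charref` helpers, which this diff does not show. Unknown entities are left untouched.

```python
import re
from html.entities import name2codepoint

# Pattern copied from the Python 2.5 branch in the hunk below.
entity_or_charref = re.compile(
    '&(?:([a-zA-Z][-.a-zA-Z0-9]*)|#(x?[0-9a-fA-F]+))(;?)')


def _replace(match):
    name, ref = match.group(1), match.group(2)
    if ref is not None:
        if ref.startswith("x"):
            code = int(ref[1:], 16)        # hex reference, e.g. &#x42;
        elif ref.isdigit():
            code = int(ref)                # decimal reference, e.g. &#66;
        else:
            return match.group(0)          # e.g. &#6a; is not a valid reference
        if code <= 0x10FFFF:
            return chr(code)
        return match.group(0)              # out of Unicode range: keep as-is
    if name in name2codepoint:
        return chr(name2codepoint[name])
    return match.group(0)                  # unknown entity: keep as-is


def unescape(text):
    """Decode named entities and decimal/hex character references."""
    return entity_or_charref.sub(_replace, text)
```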
@@ -796,31 +831,52 @@
def unescape_attrs_if_required(self, attrs):
return attrs # ditto
-import sgmllib
-# monkeypatch to fix http://www.python.org/sf/803422 :-(
-sgmllib.charref = re.compile("&#(x?[0-9a-fA-F]+)[^0-9a-fA-F]")
+
class _AbstractSgmllibParser(_AbstractFormParser):
+
def do_option(self, attrs):
_AbstractFormParser._start_option(self, attrs)
- def unescape_attr_if_required(self, name):
- return self.unescape_attr(name)
- def unescape_attrs_if_required(self, attrs):
- return self.unescape_attrs(attrs)
+ if sys.version_info[:2] >= (2,5):
+ # we override this attr to decode hex charrefs
+ entity_or_charref = re.compile(
+ '&(?:([a-zA-Z][-.a-zA-Z0-9]*)|#(x?[0-9a-fA-F]+))(;?)')
+ def convert_entityref(self, name):
+ return unescape("&%s;" % name, self._entitydefs, self._encoding)
+ def convert_charref(self, name):
+ return unescape_charref("%s" % name, self._encoding)
+ def unescape_attr_if_required(self, name):
+ return name # sgmllib already did it
+ def unescape_attrs_if_required(self, attrs):
+ return attrs # ditto
+ else:
+ def unescape_attr_if_required(self, name):
+ return self.unescape_attr(name)
+ def unescape_attrs_if_required(self, attrs):
+ return self.unescape_attrs(attrs)
+
class FormParser(_AbstractSgmllibParser, sgmllib.SGMLParser):
"""Good for tolerance of incorrect HTML, bad for XHTML."""
def __init__(self, entitydefs=None, encoding=DEFAULT_ENCODING):
sgmllib.SGMLParser.__init__(self)
_AbstractFormParser.__init__(self, entitydefs, encoding)
-try:
- if sys.version_info[:2] < (2, 2):
- raise ImportError # BeautifulSoup uses generators
- import BeautifulSoup
-except ImportError:
- pass
-else:
+ def feed(self, data):
+ try:
+ sgmllib.SGMLParser.feed(self, data)
+ except SGMLLIB_PARSEERROR, exc:
+ raise ParseError(exc)
+
+
+
+# sigh, must support mechanize by allowing dynamic creation of classes based on
+# its bundled copy of BeautifulSoup (which was necessary because of dependency
+# problems)
+
+def _create_bs_classes(bs,
+ icbinbs,
+ ):
class _AbstractBSFormParser(_AbstractSgmllibParser):
bs_base_class = None
def __init__(self, entitydefs=None, encoding=DEFAULT_ENCODING):
@@ -829,31 +885,114 @@
def handle_data(self, data):
_AbstractFormParser.handle_data(self, data)
self.bs_base_class.handle_data(self, data)
+ def feed(self, data):
+ try:
+ self.bs_base_class.feed(self, data)
+ except SGMLLIB_PARSEERROR, exc:
+ raise ParseError(exc)
- class RobustFormParser(_AbstractBSFormParser, BeautifulSoup.BeautifulSoup):
+
+ class RobustFormParser(_AbstractBSFormParser, bs):
"""Tries to be highly tolerant of incorrect HTML."""
- bs_base_class = BeautifulSoup.BeautifulSoup
- class NestingRobustFormParser(_AbstractBSFormParser,
- BeautifulSoup.ICantBelieveItsBeautifulSoup):
+ pass
+ RobustFormParser.bs_base_class = bs
+ class NestingRobustFormParser(_AbstractBSFormParser, icbinbs):
"""Tries to be highly tolerant of incorrect HTML.
Different from RobustFormParser in that it more often guesses nesting
above missing end tags (see BeautifulSoup docs).
"""
- bs_base_class = BeautifulSoup.ICantBelieveItsBeautifulSoup
+ pass
+ NestingRobustFormParser.bs_base_class = icbinbs
+ return RobustFormParser, NestingRobustFormParser
+
+try:
+ if sys.version_info[:2] < (2, 2):
+ raise ImportError # BeautifulSoup uses generators
+ import BeautifulSoup
+except ImportError:
+ pass
+else:
+ RobustFormParser, NestingRobustFormParser = _create_bs_classes(
+ BeautifulSoup.BeautifulSoup, BeautifulSoup.ICantBelieveItsBeautifulSoup
+ )
+
+
#FormParser = XHTMLCompatibleFormParser # testing hack
#FormParser = RobustFormParser # testing hack
-def ParseResponse(response, select_default=False,
- ignore_errors=False, # ignored!
- form_parser_class=FormParser,
- request_class=urllib2.Request,
- entitydefs=None,
- backwards_compat=True,
- encoding=DEFAULT_ENCODING,
- ):
+
+def ParseResponseEx(response,
+ select_default=False,
+ form_parser_class=FormParser,
+ request_class=urllib2.Request,
+ entitydefs=None,
+ encoding=DEFAULT_ENCODING,
+
+ # private
+ _urljoin=urlparse.urljoin,
+ _urlparse=urlparse.urlparse,
+ _urlunparse=urlparse.urlunparse,
+ ):
+ """Identical to ParseResponse, except that:
+
+ 1. The returned list contains an extra item. The first form in the list
+ contains all controls not contained in any FORM element.
+
+ 2. The arguments ignore_errors and backwards_compat have been removed.
+
+ 3. Backwards-compatibility mode (backwards_compat=True) is not available.
+ """
+ return _ParseFileEx(response, response.geturl(),
+ select_default,
+ False,
+ form_parser_class,
+ request_class,
+ entitydefs,
+ False,
+ encoding,
+ _urljoin=_urljoin,
+ _urlparse=_urlparse,
+ _urlunparse=_urlunparse,
+ )
+
+def ParseFileEx(file, base_uri,
+ select_default=False,
+ form_parser_class=FormParser,
+ request_class=urllib2.Request,
+ entitydefs=None,
+ encoding=DEFAULT_ENCODING,
+
+ # private
+ _urljoin=urlparse.urljoin,
+ _urlparse=urlparse.urlparse,
+ _urlunparse=urlparse.urlunparse,
+ ):
+ """Identical to ParseFile, except that:
+
+ 1. The returned list contains an extra item. The first form in the list
+ contains all controls not contained in any FORM element.
+
+ 2. The arguments ignore_errors and backwards_compat have been removed.
+
+ 3. Backwards-compatibility mode (backwards_compat=True) is not available.
+ """
+ return _ParseFileEx(file, base_uri,
+ select_default,
+ False,
+ form_parser_class,
+ request_class,
+ entitydefs,
+ False,
+ encoding,
+ _urljoin=_urljoin,
+ _urlparse=_urlparse,
+ _urlunparse=_urlunparse,
+ )
+
+def ParseResponse(response, *args, **kwds):
"""Parse HTTP response and return a list of HTMLForm instances.
The return value of urllib2.urlopen can be conveniently passed to this
@@ -913,23 +1052,9 @@
own risk: there is no well-defined interface.
"""
- return ParseFile(response, response.geturl(), select_default,
- False,
- form_parser_class,
- request_class,
- entitydefs,
- backwards_compat,
- encoding,
- )
+ return _ParseFileEx(response, response.geturl(), *args, **kwds)[1:]
-def ParseFile(file, base_uri, select_default=False,
- ignore_errors=False, # ignored!
- form_parser_class=FormParser,
- request_class=urllib2.Request,
- entitydefs=None,
- backwards_compat=True,
- encoding=DEFAULT_ENCODING,
- ):
+def ParseFile(file, base_uri, *args, **kwds):
"""Parse HTML and return a list of HTMLForm instances.
ClientForm.ParseError is raised on parse errors.
@@ -943,6 +1068,20 @@
For the other arguments and further details, see ParseResponse.__doc__.
"""
+ return _ParseFileEx(file, base_uri, *args, **kwds)[1:]
+
+def _ParseFileEx(file, base_uri,
+ select_default=False,
+ ignore_errors=False,
+ form_parser_class=FormParser,
+ request_class=urllib2.Request,
+ entitydefs=None,
+ backwards_compat=True,
+ encoding=DEFAULT_ENCODING,
+ _urljoin=urlparse.urljoin,
+ _urlparse=urlparse.urlparse,
+ _urlunparse=urlparse.urlunparse,
+ ):
if backwards_compat:
deprecation("operating in backwards-compatibility mode")
fp = form_parser_class(entitydefs, encoding)
@@ -973,7 +1112,7 @@
if action is None:
action = base_uri
else:
- action = urljoin(base_uri, action)
+ action = _urljoin(base_uri, action)
action = fp.unescape_attr_if_required(action)
name = fp.unescape_attr_if_required(name)
attrs = fp.unescape_attrs_if_required(attrs)
@@ -981,6 +1120,8 @@
form = HTMLForm(
action, method, enctype, name, attrs, request_class,
forms, labels, id_to_labels, backwards_compat)
+ form._urlparse = _urlparse
+ form._urlunparse = _urlunparse
for ii in range(len(controls)):
type, name, attrs = controls[ii]
attrs = fp.unescape_attrs_if_required(attrs)
@@ -1174,6 +1315,9 @@
self._clicked = False
+ self._urlparse = urlparse.urlparse
+ self._urlunparse = urlparse.urlunparse
+
def __getattr__(self, name):
if name == "value":
return self.__dict__["_value"]
@@ -1382,10 +1526,10 @@
# This doesn't seem to be specified in HTML 4.01 spec. (ISINDEX is
# deprecated in 4.01, but it should still say how to submit it).
# Submission of ISINDEX is explained in the HTML 3.2 spec, though.
- parts = urlparse.urlparse(form.action)
+ parts = self._urlparse(form.action)
rest, (query, frag) = parts[:-2], parts[-2:]
- parts = rest + (urllib.quote_plus(self.value), "")
- url = urlparse.urlunparse(parts)
+ parts = rest + (urllib.quote_plus(self.value), None)
+ url = self._urlunparse(parts)
req_data = url, None, []
if return_type == "pairs":
@@ -1853,12 +1997,16 @@
assert self._form is None or form == self._form, (
"can't add control to more than one form")
self._form = form
- try:
- control = form.find_control(self.name, self.type)
- except ControlNotFoundError:
+ if self.name is None:
+ # always count nameless elements as separate controls
Control.add_to_form(self, form)
else:
- control.merge_control(self)
+ try:
+ control = form.find_control(self.name, self.type)
+ except (ControlNotFoundError, AmbiguityError):
+ Control.add_to_form(self, form)
+ else:
+ control.merge_control(self)
def merge_control(self, control):
assert bool(control.multiple) == bool(self.multiple)
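This hunk stops nameless list controls from being merged with one another. It also adds a separate control whenever `find_control` raises `AmbiguityError`. The hunks that follow give a nameless list control an empty `value` and exclude it from the submitted pairs. The Python 3 sketch below is a simplified model of those rules. It uses a hypothetical `ListControl` whose items are `(value, selected)` pairs and ignores control types.

```python
class ListControl:
    """Hypothetical, simplified list control: items are (value, selected)."""

    def __init__(self, name, items):
        self.name = name
        self.items = list(items)

    def pairs(self):
        # Nameless list controls are never successful.
        if self.name is None:
            return []
        return [(self.name, value) for value, selected in self.items if selected]


def add_to_form(controls, new):
    """Merge `new` into a single same-named control, else append it."""
    if new.name is None:
        controls.append(new)           # nameless: always a separate control
        return
    matches = [c for c in controls if c.name == new.name]
    if len(matches) == 1:
        matches[0].items.extend(new.items)
    else:
        # No match, or several (ambiguous): keep it as a separate control.
        controls.append(new)


def form_pairs(controls):
    return [pair for control in controls for pair in control.pairs()]
```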
@@ -1904,6 +2052,8 @@
def __getattr__(self, name):
if name == "value":
compat = self._form.backwards_compat
+ if self.name is None:
+ return []
return [o.name for o in self.items if o.selected and
(not o.disabled or compat)]
else:
@@ -2073,7 +2223,7 @@
return [o.name for o in self.items]
def _totally_ordered_pairs(self):
- if self.disabled:
+ if self.disabled or self.name is None:
return []
else:
return [(o._index, self.name, o.name) for o in self.items
@@ -2615,6 +2765,9 @@
self.backwards_compat = backwards_compat # note __setattr__
+ self._urlunparse = urlparse.urlunparse
+ self._urlparse = urlparse.urlparse
+
def __getattr__(self, name):
if name == "backwards_compat":
return self._backwards_compat
@@ -2673,6 +2826,8 @@
else:
control = klass(type, name, a, index)
control.add_to_form(self)
+ control._urlparse = self._urlparse
+ control._urlunparse = self._urlunparse
def fixup(self):
"""Normalise form after all controls have been added.
@@ -3050,7 +3205,8 @@
is_listcontrol, nr)
def _find_control(self, name, type, kind, id, label, predicate, nr):
- if (name is not None) and not isstringlike(name):
+ if ((name is not None) and (name is not Missing) and
+ not isstringlike(name)):
raise TypeError("control name must be string-like")
if (type is not None) and not isstringlike(type):
raise TypeError("control type must be string-like")
@@ -3072,7 +3228,8 @@
nr = 0
for control in self.controls:
- if name is not None and name != control.name:
+ if ((name is not None and name != control.name) and
+ (name is not Missing or control.name is not None)):
continue
if type is not None and type != control.type:
continue
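`_find_control` treats `name=None` as "any name". The new `Missing` class therefore acts as a sentinel meaning "the control has no name", which is what the XXX note removed near the top of this diff asked for. The sketch below reproduces the matching rule from this hunk, rewritten as an equivalent `if`/`elif`. Controls are reduced to hypothetical dicts.

```python
class Missing:
    """Sentinel: 'the control has no name' (None means 'any name')."""


def find_controls(controls, name=None):
    """Return the controls whose name matches `name`."""
    found = []
    for control in controls:
        if name is Missing:
            if control["name"] is not None:
                continue               # only nameless controls match Missing
        elif name is not None and name != control["name"]:
            continue
        found.append(control)
    return found
```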
@@ -3102,7 +3259,7 @@
return found
description = []
- if name is not None: description.append("name '%s'" % name)
+ if name is not None: description.append("name %s" % repr(name))
if type is not None: description.append("type '%s'" % type)
if kind is not None: description.append("kind '%s'" % kind)
if id is not None: description.append("id '%s'" % id)
@@ -3159,19 +3316,19 @@
"""Return a tuple (url, data, headers)."""
method = self.method.upper()
#scheme, netloc, path, parameters, query, frag = urlparse.urlparse(self.action)
- parts = urlparse.urlparse(self.action)
+ parts = self._urlparse(self.action)
rest, (query, frag) = parts[:-2], parts[-2:]
if method == "GET":
if self.enctype != "application/x-www-form-urlencoded":
raise ValueError(
"unknown GET form encoding type '%s'" % self.enctype)
- parts = rest + (urlencode(self._pairs()), "")
- uri = urlparse.urlunparse(parts)
+ parts = rest + (urlencode(self._pairs()), None)
+ uri = self._urlunparse(parts)
return uri, None, []
elif method == "POST":
- parts = rest + (query, "")
- uri = urlparse.urlunparse(parts)
+ parts = rest + (query, None)
+ uri = self._urlunparse(parts)
if self.enctype == "application/x-www-form-urlencoded":
return (uri, urlencode(self._pairs()),
[("Content-type", self.enctype)])
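The `_switch_click` hunk builds the request URL in four steps. It splits the form action, replaces the query (GET) or keeps it (POST), drops the fragment, and reassembles the URL with `self._urlparse` / `self._urlunparse`. Mechanize can now substitute its own RFC 3986 functions for those two callables. The Python 3 sketch below follows the same steps with injectable parse/unparse functions. `urllib.parse` stands in for the Python 2 `urlparse` and `urllib` modules, and `form_request_url` is a hypothetical name. The sketch passes an empty string for the fragment, as the code did before this change.

```python
from urllib.parse import urlencode, urlparse, urlunparse


def form_request_url(action, method, pairs,
                     _urlparse=urlparse, _urlunparse=urlunparse):
    """Return the URL a form submission would request (fragment dropped)."""
    parts = _urlparse(action)
    rest, query = tuple(parts[:-2]), parts[-2]
    if method.upper() == "GET":
        # GET: the encoded form data replaces any existing query string.
        return _urlunparse(rest + (urlencode(pairs), ""))
    # POST: keep the action's query; the form data goes in the request body.
    return _urlunparse(rest + (query, ""))
```

With the action `http://example.com/search?old=1#frag` and the pairs `[("q", "a b")]`, GET requests `http://example.com/search?q=a+b` and POST requests `http://example.com/search?old=1`.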
Modified: python-clientform/branches/upstream/current/GeneralFAQ.html
===================================================================
--- python-clientform/branches/upstream/current/GeneralFAQ.html 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/GeneralFAQ.html 2007-04-09 21:18:24 UTC (rev 769)
@@ -8,7 +8,7 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="author" content="John J. Lee <jjl at pobox.com>">
- <meta name="date" content="2006-01-05">
+ <meta name="date" content="2006-05-06">
<meta name="keywords" content="FAQ,cookie,HTTP,HTML,form,table,Python,web,client,client-side,testing,sniffer,https,script,embedded">
<title>Python web-client programming general FAQs</title>
<style type="text/css" media="screen">@import "../styles/style.css";</style>
@@ -26,10 +26,10 @@
<div id="Content">
<ul>
<li>Is there any example code?
- <p>There's (still!) a bit of a shortage of example code for ClientCookie
- and ClientForm &co., because the stuff I've written tends to either
- require access to restricted-access sites, or is proprietary code (and the
- same goes for other people's code).
+ <p>Look in the examples directory of <a href="../mechanize">mechanize</a>.
+ Note that the examples on the <a href="../ClientForm">ClientForm page</a>
+ are executable as-is. Contributions of example code would be very
+ welcome!
<li>HTTPS on Windows?
<p>Use this <a href="http://pypgsql.sourceforge.net/misc/python22-win32-ssl.zip">
_socket.pyd</a>, or use Python 2.3.
@@ -67,7 +67,7 @@
<code>CookieJar</code> instance, calling methods on
<code>HTMLForm</code>s, calling <code>urlopen</code>, etc.
- <li>Dump ClientCookie and ClientForm and automate a browser instead.
+ <li>Dump mechanize and ClientForm and automate a browser instead.
For example use MS Internet Explorer via its COM automation interfaces, using
the <a href="http://starship.python.net/crew/mhammond/">Python for
Windows extensions</a>, aka pywin32, aka win32all (eg.
@@ -127,7 +127,7 @@
<li>Will any of this code make its way into the Python standard library?
<p>The request / response processing extensions to <code>urllib2</code> from
- ClientCookie have been merged into <code>urllib2</code> for Python 2.4.
+ mechanize have been merged into <code>urllib2</code> for Python 2.4.
The cookie processing has been added, as module <code>cookielib</code>.
Eventually, I'll submit patches to get the http-equiv, refresh, and
robots.txt code in there too, and maybe <code>mechanize.UserAgent</code>
@@ -141,7 +141,7 @@
mailing list</a> rather than direct to me.
<p><a href="mailto:jjl at pobox.com">John J. Lee</a>,
-January 2006.
+May 2006.
</div> <!--id="Content"-->
@@ -152,11 +152,12 @@
<span class="thispage">General FAQs</span><br>
<br>
<a href="../mechanize/">mechanize</a><br>
-<a href="../pullparser/">pullparser</a><br>
+<a href="../mechanize/doc.html"><span class="subpage">mechanize docs</span></a><br>
+<a href="../ClientForm/">ClientForm</a><br>
+<br>
<a href="../ClientCookie/">ClientCookie</a><br>
<a href="../ClientCookie/doc.html"><span class="subpage">ClientCookie docs</span></a><br>
-<a href="../ClientForm/">ClientForm</a><br>
-<br>
+<a href="../pullparser/">pullparser</a><br>
<a href="../DOMForm/">DOMForm</a><br>
<a href="../python-spidermonkey/">python-spidermonkey</a><br>
<a href="../ClientTable/">ClientTable</a><br>
Modified: python-clientform/branches/upstream/current/INSTALL.txt
===================================================================
--- python-clientform/branches/upstream/current/INSTALL.txt 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/INSTALL.txt 2007-04-09 21:18:24 UTC (rev 769)
@@ -54,8 +54,8 @@
See the file COPYRIGHT.txt for copyright information.
The code in this package is free software; you can redistribute it
-and/or modify it under the terms of the BSD or ZPL licenses (see the
-file COPYING.txt).
+and/or modify it under the terms of the BSD or ZPL 2.1 licenses (see
+the file COPYING.txt).
John J. Lee <jjl at pobox.com>
March 2006
Modified: python-clientform/branches/upstream/current/MANIFEST.in
===================================================================
--- python-clientform/branches/upstream/current/MANIFEST.in 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/MANIFEST.in 2007-04-09 21:18:24 UTC (rev 769)
@@ -10,4 +10,3 @@
include *.py
recursive-include testdata *.html
recursive-include examples *.dat *.txt *.html *.cgi *.py
-recursive-include ez_setup *.py
Modified: python-clientform/branches/upstream/current/PKG-INFO
===================================================================
--- python-clientform/branches/upstream/current/PKG-INFO 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/PKG-INFO 2007-04-09 21:18:24 UTC (rev 769)
@@ -1,12 +1,12 @@
Metadata-Version: 1.0
Name: ClientForm
-Version: 0.2.2
+Version: 0.2.6
Summary: Client-side HTML form handling.
Home-page: http://wwwsearch.sourceforge.net/ClientForm/
Author: John J. Lee
Author-email: jjl at pobox.com
License: BSD
-Download-URL: http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.2.tar.gz
+Download-URL: http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.6.tar.gz
Description: ClientForm is a Python module for handling HTML forms on the client
side, useful for parsing HTML forms, filling them in and returning the
completed forms to the server. It developed from a port of Gisle Aas'
Modified: python-clientform/branches/upstream/current/README.html
===================================================================
--- python-clientform/branches/upstream/current/README.html 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/README.html 2007-04-09 21:18:24 UTC (rev 769)
@@ -2,18 +2,15 @@
"http://www.w3.org/TR/html4/strict.dtd">
<!--This file was generated by EmPy from README.html.in : do not edit-->
-
-
-
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="author" content="John J. Lee <jjl at pobox.com>">
- <meta name="date" content="2006-03-22">
+ <meta name="date" content="2006-10-25">
<meta name="keywords" content="form,HTML,Python,web,client,client-side">
<title>ClientForm</title>
<style type="text/css" media="screen">@import "../styles/style.css";</style>
- <base href="http://wwwsearch.sourceforge.net/ClientForm/">
+
</head>
<body>
@@ -267,9 +264,9 @@
<p>For full documentation, see the docstrings in ClientForm.py.
-<p><em><strong>Note: this page describes the 0.2 (development release)
-interface. See <a href="./src/README_0_1_18.html">here</a> for the stable
-0.1 interface.</strong> </em>
+<p><em><strong>Note: this page describes the 0.2 (stable release)
+interface. See <a href="./src/README-0_1_17.html">here</a> for the
+old 0.1 interface.</strong> </em>
<a name="parsers"></a>
@@ -338,7 +335,7 @@
deselected: AttributeError is raised in 0.2, whereas deselection was allowed in
0.1. The bug in 0.1 and in 0.2's backwards-compatibility mode will not be
fixed, to preserve compatibility and to encourage people to upgrade to the new
-0.2 <code>backwards_compat=True</code> behaviour. </ul>
+0.2 <code>backwards_compat=False</code> behaviour. </ul>
<a name="credits"></a>
<h2>Credits</h2>
@@ -369,8 +366,8 @@
<ul>
-<li><a href="./src/ClientForm-0.2.2.tar.gz">ClientForm-0.2.2.tar.gz</a>
-<li><a href="./src/ClientForm-0.2.2.zip">ClientForm-0.2.2.zip</a>
+<li><a href="./src/ClientForm-0.2.6.tar.gz">ClientForm-0.2.6.tar.gz</a>
+<li><a href="./src/ClientForm-0.2.6.zip">ClientForm-0.2.6.zip</a>
<li><a href="./src/ChangeLog.txt">Change Log</a> (included in distribution)
<li><a href="./src/">Older releases.</a>
</ul>
@@ -543,7 +540,7 @@
mailing list</a> rather than direct to me.
<p><a href="mailto:jjl at pobox.com">John J. Lee</a>,
-March 2006.
+October 2006.
</div>
@@ -554,11 +551,12 @@
<a href="../bits/GeneralFAQ.html">General FAQs</a><br>
<br>
<a href="../mechanize/">mechanize</a><br>
-<a href="../pullparser/">pullparser</a><br>
-<a href="../ClientCookie/">ClientCookie</a><br>
-<a href="../ClientCookie/doc.html"><span class="subpage">ClientCookie docs</span></a><br>
+<a href="../mechanize/doc.html"><span class="subpage">mechanize docs</span></a><br>
<span class="thispage">ClientForm</span><br>
<br>
+<a href="../ClientCookie/">ClientCookie</a><br>
+<a href="../ClientCookie/doc.html"><span class="subpage">ClientCookie docs</span></a><br>
+<a href="../pullparser/">pullparser</a><br>
<a href="../DOMForm/">DOMForm</a><br>
<a href="../python-spidermonkey/">python-spidermonkey</a><br>
<a href="../ClientTable/">ClientTable</a><br>
Modified: python-clientform/branches/upstream/current/README.html.in
===================================================================
--- python-clientform/branches/upstream/current/README.html.in 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/README.html.in 2007-04-09 21:18:24 UTC (rev 769)
@@ -3,10 +3,16 @@
@# This file is processed by EmPy
<!--This file was generated by EmPy from README.html.in : do not edit-->
@# http://wwwsearch.sf.net/bits/colorize.py
-@{from colorize import colorize}
-@{import time}
-@{import release}
-@{last_modified = release.svn_id_to_time("$Id: README.html.in 24825 2006-03-22 21:41:49Z jjlee $")}
+@{
+from colorize import colorize
+import time
+import release
+last_modified = release.svn_id_to_time("$Id: README.html.in 33738 2006-10-25 19:54:30Z jjlee $")
+try:
+ base
+except NameError:
+ base = False
+}
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
@@ -15,7 +21,7 @@
<meta name="keywords" content="form,HTML,Python,web,client,client-side">
<title>ClientForm</title>
<style type="text/css" media="screen">@@import "../styles/style.css";</style>
- <base href="http://wwwsearch.sourceforge.net/ClientForm/">
+ @[if base]<base href="http://wwwsearch.sourceforge.net/ClientForm/">@[end if]
</head>
<body>
@@ -75,9 +81,9 @@
<p>For full documentation, see the docstrings in ClientForm.py.
-<p><em><strong>Note: this page describes the 0.2 (development release)
-interface. See <a href="./src/README_0_1_18.html">here</a> for the stable
-0.1 interface.</strong> </em>
+<p><em><strong>Note: this page describes the 0.2 (stable release)
+interface. See <a href="./src/README-0_1_17.html">here</a> for the
+old 0.1 interface.</strong> </em>
<a name="parsers"></a>
@@ -146,7 +152,7 @@
deselected: AttributeError is raised in 0.2, whereas deselection was allowed in
0.1. The bug in 0.1 and in 0.2's backwards-compatibility mode will not be
fixed, to preserve compatibility and to encourage people to upgrade to the new
-0.2 <code>backwards_compat=True</code> behaviour. </ul>
+0.2 <code>backwards_compat=False</code> behaviour. </ul>
<a name="credits"></a>
<h2>Credits</h2>
@@ -176,7 +182,7 @@
methods are still there, but some have been deprecated and a few added).
<ul>
-@{version = "0.2.2"}
+@{version = "0.2.6"}
<li><a href="./src/ClientForm-@(version).tar.gz">ClientForm-@(version).tar.gz</a>
<li><a href="./src/ClientForm-@(version).zip">ClientForm-@(version).zip</a>
<li><a href="./src/ChangeLog.txt">Change Log</a> (included in distribution)
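The README.html.in preamble above relies on two mechanisms that this diff does not show in full. First, `release.svn_id_to_time()` turns the expanded `$Id$` keyword into the page's last-modified time, which is consistent with the 2006-10-25 date in the regenerated README.html. Second, the `try: base` / `except NameError` block makes the `<base href>` tag opt-in. What follows is a minimal Python 3 sketch of both, assuming the standard expanded `$Id: file rev date time-Z author $` keyword form. `svn_id_to_time` and `render_base_tag` are hypothetical stand-ins rather than the author's release tooling, and `exec()` only approximates how EmPy evaluates the preamble.

```python
import calendar
import re
import time

# Hypothetical stand-in for release.svn_id_to_time(); the real helper
# lives in the author's release tooling and is not part of this diff.
_SVN_ID = re.compile(
    r"\$Id: \S+ \d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z \S+ \$")


def svn_id_to_time(svn_id):
    """Return the UTC commit time in an expanded $Id$ keyword, as epoch seconds."""
    match = _SVN_ID.search(svn_id)
    if match is None:
        raise ValueError("not an expanded $Id$ keyword: %r" % (svn_id,))
    parsed = time.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    # The trailing "Z" marks UTC, so convert with timegm(), not mktime().
    return calendar.timegm(parsed)


# Mimics the template preamble: if the caller did not define `base`,
# fall back to False so that no <base href> tag is emitted.
_PREAMBLE = """
try:
    base
except NameError:
    base = False
"""


def render_base_tag(template_globals):
    """Hypothetical helper: emit the <base> tag only if `base` is truthy."""
    namespace = dict(template_globals)  # copy so the caller's dict is untouched
    exec(_PREAMBLE, namespace)          # defines base=False when it is missing
    if namespace["base"]:
        return '<base href="http://wwwsearch.sourceforge.net/ClientForm/">'
    return ""


last_modified = svn_id_to_time(
    "$Id: README.html.in 33738 2006-10-25 19:54:30Z jjlee $")
print(time.strftime("%Y-%m-%d", time.gmtime(last_modified)))  # 2006-10-25
print(repr(render_base_tag({})))  # ''
print(render_base_tag({"base": True}))
```

With no `base` defined, the sketch emits no tag. That matches the regenerated README.html above, where the `<base href=...>` line became blank.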
Modified: python-clientform/branches/upstream/current/README.txt
===================================================================
--- python-clientform/branches/upstream/current/README.txt 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/README.txt 2007-04-09 21:18:24 UTC (rev 769)
@@ -1,526 +1,583 @@
- [1]SourceForge.net Logo
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
+ "http://www.w3.org/TR/html4/strict.dtd">
+<!--This file was generated by EmPy from README.html.in : do not edit-->
- ClientForm
+<html>
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+ <meta name="author" content="John J. Lee <jjl at pobox.com>">
+ <meta name="date" content="2006-10-25">
+ <meta name="keywords" content="form,HTML,Python,web,client,client-side">
+ <title>ClientForm</title>
+ <style type="text/css" media="screen">@import "../styles/style.css";</style>
+
+</head>
+<body>
- ClientForm is a Python module for handling HTML forms on the client
- side, useful for parsing HTML forms, filling them in and returning the
- completed forms to the server. It developed from a port of Gisle Aas'
- Perl module HTML::Form, from the [2]libwww-perl library, but the
- interface is not the same.
+<div id="sf"><a href="http://sourceforge.net">
+<img src="http://sourceforge.net/sflogo.php?group_id=48205&type=2"
+ width="125" height="37" alt="SourceForge.net Logo"></a></div>
- Simple working example:
-from urllib2 import urlopen
-from ClientForm import ParseResponse
+<h1>ClientForm</h1>
-response = urlopen("http://wwwsearch.sourceforge.net/ClientForm/example.html")
+<div id="Content">
+
+<p>ClientForm is a Python module for handling HTML forms on the client
+side, useful for parsing HTML forms, filling them in and returning the
+completed forms to the server. It developed from a port of Gisle Aas'
+Perl module <code>HTML::Form</code>, from the <a
+href="http://www.linpro.no/lwp/">libwww-perl</a> library, but the
+interface is not the same.
+
+<p>Simple working example:
+
+<pre><span class="pykw">from</span> urllib2 <span class="pykw">import</span> urlopen
+<span class="pykw">from</span> ClientForm <span class="pykw">import</span> ParseResponse
+
+response = urlopen(<span class="pystr">"http://wwwsearch.sourceforge.net/ClientForm/example.html"</span>)
forms = ParseResponse(response, backwards_compat=False)
form = forms[0]
-print form
-form["comments"] = "Thanks, Gisle"
+<span class="pykw">print</span> form
+form[<span class="pystr">"comments"</span>] = <span class="pystr">"Thanks, Gisle"</span>
-# form.click() returns a urllib2.Request object
-# (see HTMLForm.click.__doc__ if you don't have urllib2)
-print urlopen(form.click()).read()
+<span class="pycmt"># form.click() returns a urllib2.Request object
+</span><span class="pycmt"># (see HTMLForm.click.__doc__ if you don't have urllib2)
+</span><span class="pykw">print</span> urlopen(form.click()).read()</pre>
- A more complicated working example (Note: this example makes use of
- the ClientForm 0.2 API; refer to the README.html file in the latest
- 0.1 release for the corresponding code for that version.):
-import ClientForm
-import urllib2
+
+<p>A more complicated working example (<em><strong>Note</strong>: this
+example makes use of the ClientForm 0.2 API; refer to the README.html
+file in the latest 0.1 release for the corresponding code for that
+version.</em>):
+
+<a name="example"></a>
+<pre><span class="pykw">import</span> ClientForm
+<span class="pykw">import</span> urllib2
request = urllib2.Request(
- "http://wwwsearch.sourceforge.net/ClientForm/example.html")
+ <span class="pystr">"http://wwwsearch.sourceforge.net/ClientForm/example.html"</span>)
response = urllib2.urlopen(request)
forms = ClientForm.ParseResponse(response, backwards_compat=False)
response.close()
-## f = open("example.html")
-## forms = ClientForm.ParseFile(f, "http://example.com/example.html",
-## backwards_compat=False)
-## f.close()
-form = forms[0]
-print form # very useful!
+<span class="pycmt">## f = open("example.html")
+</span><span class="pycmt">## forms = ClientForm.ParseFile(f, "http://example.com/example.html",
+</span><span class="pycmt">## backwards_compat=False)
+</span><span class="pycmt">## f.close()
+</span>form = forms[0]
+<span class="pykw">print</span> form <span class="pycmt"># very useful!</span>
-# A 'control' is a graphical HTML form widget: a text entry box, a
-# dropdown 'select' list, a checkbox, etc.
+<span class="pycmt"># A 'control' is a graphical HTML form widget: a text entry box, a
+</span><span class="pycmt"># dropdown 'select' list, a checkbox, etc.
+</span>
+<span class="pycmt"># Indexing allows setting and retrieval of control values
+</span>original_text = form[<span class="pystr">"comments"</span>] <span class="pycmt"># a string, NOT a Control instance</span>
+form[<span class="pystr">"comments"</span>] = <span class="pystr">"Blah."</span>
-# Indexing allows setting and retrieval of control values
-original_text = form["comments"] # a string, NOT a Control instance
-form["comments"] = "Blah."
+<span class="pycmt"># Controls that represent lists (checkbox, select and radio lists) are
+</span><span class="pycmt"># ListControl instances. Their values are sequences of list item names.
+</span><span class="pycmt"># They come in two flavours: single- and multiple-selection:
+</span>form[<span class="pystr">"favorite_cheese"</span>] = [<span class="pystr">"brie"</span>] <span class="pycmt"># single</span>
+form[<span class="pystr">"cheeses"</span>] = [<span class="pystr">"parmesan"</span>, <span class="pystr">"leicester"</span>, <span class="pystr">"cheddar"</span>] <span class="pycmt"># multi</span>
+<span class="pycmt"># equivalent, but more flexible:
+</span>form.set_value([<span class="pystr">"parmesan"</span>, <span class="pystr">"leicester"</span>, <span class="pystr">"cheddar"</span>], name=<span class="pystr">"cheeses"</span>)
-# Controls that represent lists (checkbox, select and radio lists) are
-# ListControl instances. Their values are sequences of list item names.
-# They come in two flavours: single- and multiple-selection:
-form["favorite_cheese"] = ["brie"] # single
-form["cheeses"] = ["parmesan", "leicester", "cheddar"] # multi
-# equivalent, but more flexible:
-form.set_value(["parmesan", "leicester", "cheddar"], name="cheeses")
+<span class="pycmt"># Add files to FILE controls with .add_file(). Only call this multiple
+</span><span class="pycmt"># times if the server is expecting multiple files.
+</span><span class="pycmt"># add a file, default value for MIME type, no filename sent to server
+</span>form.add_file(open(<span class="pystr">"data.dat"</span>))
+<span class="pycmt"># add a second file, explicitly giving MIME type, and telling the server
+</span><span class="pycmt"># what the filename is
+</span>form.add_file(open(<span class="pystr">"data.txt"</span>), <span class="pystr">"text/plain"</span>, <span class="pystr">"data.txt"</span>)
-# Add files to FILE controls with .add_file(). Only call this multiple
-# times if the server is expecting multiple files.
-# add a file, default value for MIME type, no filename sent to server
-form.add_file(open("data.dat"))
-# add a second file, explicitly giving MIME type, and telling the server
-# what the filename is
-form.add_file(open("data.txt"), "text/plain", "data.txt")
+<span class="pycmt"># All Controls may be disabled (equivalent of greyed-out in browser)...
+</span>control = form.find_control(<span class="pystr">"comments"</span>)
+<span class="pykw">print</span> control.disabled
+<span class="pycmt"># ...or readonly
+</span><span class="pykw">print</span> control.readonly
+<span class="pycmt"># readonly and disabled attributes can be assigned to
+</span>control.disabled = False
+<span class="pycmt"># convenience method, used here to make all controls writable (unless
+</span><span class="pycmt"># they're disabled):
+</span>form.set_all_readonly(False)
-# All Controls may be disabled (equivalent of greyed-out in browser)...
-control = form.find_control("comments")
-print control.disabled
-# ...or readonly
-print control.readonly
-# readonly and disabled attributes can be assigned to
-control.disabled = False
-# convenience method, used here to make all controls writable (unless
-# they're disabled):
-form.set_all_readonly(False)
+<span class="pycmt"># A couple of notes about list controls and HTML:
+</span>
+<span class="pycmt"># 1. List controls correspond to either a single SELECT element, or
+</span><span class="pycmt"># multiple INPUT elements. Items correspond to either OPTION or INPUT
+</span><span class="pycmt"># elements. For example, this is a SELECT control, named "control1":
+</span>
+<span class="pycmt"># <select name="control1">
+</span><span class="pycmt"># <option>foo</option>
+</span><span class="pycmt"># <option value="1">bar</option>
+</span><span class="pycmt"># </select>
+</span>
+<span class="pycmt"># and this is a CHECKBOX control, named "control2":
+</span>
+<span class="pycmt"># <input type="checkbox" name="control2" value="foo" id="cbe1">
+</span><span class="pycmt"># <input type="checkbox" name="control2" value="bar" id="cbe2">
+</span>
+<span class="pycmt"># You know the latter is a single control because all the name attributes
+</span><span class="pycmt"># are the same.
+</span>
+<span class="pycmt"># 2. Item names are the strings that go to make up the value that should
+</span><span class="pycmt"># be returned to the server. These strings come from various different
+</span><span class="pycmt"># pieces of text in the HTML. The HTML standard and the ClientForm
+</span><span class="pycmt"># docstrings explain in detail, but playing around with an HTML file,
+</span><span class="pycmt"># ParseFile() and 'print form' is very useful to understand this!
+</span>
+<span class="pycmt"># You can get the Control instances from inside the form...
+</span>control = form.find_control(<span class="pystr">"cheeses"</span>, type=<span class="pystr">"select"</span>)
+<span class="pykw">print</span> control.name, control.value, control.type
+control.value = [<span class="pystr">"mascarpone"</span>, <span class="pystr">"curd"</span>]
+<span class="pycmt"># ...and the Item instances from inside the Control
+</span>item = control.get(<span class="pystr">"curd"</span>)
+<span class="pykw">print</span> item.name, item.selected, item.id, item.attrs
+item.selected = False
-# A couple of notes about list controls and HTML:
+<span class="pycmt"># Controls may be referred to by label:
+</span><span class="pycmt"># find control with label that has a *substring* "Cheeses"
+</span><span class="pycmt"># (eg., a label "Please select a cheese" would match).
+</span>control = form.find_control(label=<span class="pystr">"select a cheese"</span>)
-# 1. List controls correspond to either a single SELECT element, or
-# multiple INPUT elements. Items correspond to either OPTION or INPUT
-# elements. For example, this is a SELECT control, named "control1":
+<span class="pycmt"># You can explicitly say that you're referring to a ListControl:
+</span><span class="pycmt"># set value of "cheeses" ListControl
+</span>form.set_value([<span class="pystr">"gouda"</span>], name=<span class="pystr">"cheeses"</span>, kind=<span class="pystr">"list"</span>)
+<span class="pycmt"># equivalent:
+</span>form.find_control(name=<span class="pystr">"cheeses"</span>, kind=<span class="pystr">"list"</span>).value = [<span class="pystr">"gouda"</span>]
+<span class="pycmt"># the first example is also almost equivalent to the following (but
+</span><span class="pycmt"># insists that the control be a ListControl -- so it will skip any
+</span><span class="pycmt"># non-list controls that come before the control we want)
+</span>form[<span class="pystr">"cheeses"</span>] = [<span class="pystr">"gouda"</span>]
+<span class="pycmt"># The kind argument can also take values "multilist", "singlelist", "text",
+</span><span class="pycmt"># "clickable" and "file":
+</span><span class="pycmt"># find first control that will accept text, and scribble in it
+</span>form.set_value(<span class="pystr">"rhubarb rhubarb"</span>, kind=<span class="pystr">"text"</span>, nr=0)
+<span class="pycmt"># find, and set the value of, the first single-selection list control
+</span>form.set_value([<span class="pystr">"spam"</span>], kind=<span class="pystr">"singlelist"</span>, nr=0)
-# <select name="control1">
-# <option>foo</option>
-# <option value="1">bar</option>
-# </select>
+<span class="pycmt"># You can find controls with a general predicate function:
+</span><span class="pykw">def</span> control_has_caerphilly(control):
+ <span class="pykw">for</span> item <span class="pykw">in</span> control.items:
+ <span class="pykw">if</span> item.name == <span class="pystr">"caerphilly"</span>: <span class="pykw">return</span> True
+form.find_control(kind=<span class="pystr">"list"</span>, predicate=control_has_caerphilly)
-# and this is a CHECKBOX control, named "control2":
+<span class="pycmt"># HTMLForm.controls is a list of all controls in the form
+</span><span class="pykw">for</span> control <span class="pykw">in</span> form.controls:
+ <span class="pykw">if</span> control.value == <span class="pystr">"inquisition"</span>: sys.exit()
-# <input type="checkbox" name="control2" value="foo" id="cbe1">
-# <input type="checkbox" name="control2" value="bar" id="cbe2">
+<span class="pycmt"># Control.items is a list of all Item instances in the control
+</span><span class="pykw">for</span> item <span class="pykw">in</span> form.find_control(<span class="pystr">"cheeses"</span>).items:
+ <span class="pykw">print</span> item.name
-# You know the latter is a single control because all the name attributes
-# are the same.
+<span class="pycmt"># To remove items from a list control, remove it from .items:
+</span>cheeses = form.find_control(<span class="pystr">"cheeses"</span>)
+curd = cheeses.get(<span class="pystr">"curd"</span>)
+<span class="pykw">del</span> cheeses.items[cheeses.items.index(curd)]
+<span class="pycmt"># To add items to a list container, instantiate an Item with its control
+</span><span class="pycmt"># and attributes:
+</span><span class="pycmt"># Note that you are responsible for getting the attributes correct here,
+</span><span class="pycmt"># and these are not quite identical to the original HTML, due to
+</span><span class="pycmt"># defaulting rules and a few special attributes (e.g. Items that represent
+</span><span class="pycmt"># OPTIONs have a special "contents" key in their .attrs dict). In future
+</span><span class="pycmt"># there will be an explicitly supported way of using the parsing logic to
+</span><span class="pycmt"># add items and controls from HTML strings without knowing these details.
+</span>ClientForm.Item(cheeses, {<span class="pystr">"contents"</span>: <span class="pystr">"mascarpone"</span>,
+ <span class="pystr">"value"</span>: <span class="pystr">"mascarpone"</span>})
-# 2. Item names are the strings that go to make up the value that should
-# be returned to the server. These strings come from various different
-# pieces of text in the HTML. The HTML standard and the ClientForm
-# docstrings explain in detail, but playing around with an HTML file,
-# ParseFile() and 'print form' is very useful to understand this!
+<span class="pycmt"># You can specify list items by label using set/get_value_by_label() and
+</span><span class="pycmt"># the label argument of the .get() method. Sometimes labels are easier to
+</span><span class="pycmt"># maintain than names, sometimes the other way around.
+</span>form.set_value_by_label([<span class="pystr">"Mozzarella"</span>, <span class="pystr">"Caerphilly"</span>], <span class="pystr">"cheeses"</span>)
-# You can get the Control instances from inside the form...
-control = form.find_control("cheeses", type="select")
-print control.name, control.value, control.type
-control.value = ["mascarpone", "curd"]
-# ...and the Item instances from inside the Control
-item = control.get("curd")
-print item.name, item.selected, item.id, item.attrs
-item.selected = False
+<span class="pycmt"># Which items are present, selected, and successful?
+</span><span class="pycmt"># is the "parmesan" item of the "cheeses" control successful (selected
+</span><span class="pycmt"># and not disabled)?
+</span><span class="pykw">print</span> <span class="pystr">"parmesan"</span> <span class="pykw">in</span> form[<span class="pystr">"cheeses"</span>]
+<span class="pycmt"># is the "parmesan" item of the "cheeses" control selected?
+</span><span class="pykw">print</span> <span class="pystr">"parmesan"</span> <span class="pykw">in</span> [
+ item.name <span class="pykw">for</span> item <span class="pykw">in</span> form.find_control(<span class="pystr">"cheeses"</span>).items <span class="pykw">if</span> item.selected]
+<span class="pycmt"># does cheeses control have a "caerphilly" item?
+</span><span class="pykw">print</span> <span class="pystr">"caerphilly"</span> <span class="pykw">in</span> [item.name <span class="pykw">for</span> item <span class="pykw">in</span> form.find_control(<span class="pystr">"cheeses"</span>).items]
-# Controls may be referred to by label:
-# find control with label that has a *substring* "Cheeses"
-# (eg., a label "Please select a cheese" would match).
-control = form.find_control(label="select a cheese")
+<span class="pycmt"># Sometimes one wants to set or clear individual items in a list, rather
+</span><span class="pycmt"># than setting the whole .value:
+</span><span class="pycmt"># select the item named "gorgonzola" in the first control named "cheeses"
+</span>form.find_control(<span class="pystr">"cheeses"</span>).get(<span class="pystr">"gorgonzola"</span>).selected = True
+<span class="pycmt"># You can be more specific:
+</span><span class="pycmt"># deselect "edam" in third CHECKBOX control
+</span>form.find_control(type=<span class="pystr">"checkbox"</span>, nr=2).get(<span class="pystr">"edam"</span>).selected = False
+<span class="pycmt"># deselect item labelled "Mozzarella" in control with id "chz"
+</span>form.find_control(id=<span class="pystr">"chz"</span>).get(label=<span class="pystr">"Mozzarella"</span>).selected = False
-# You can explicitly say that you're referring to a ListControl:
-# set value of "cheeses" ListControl
-form.set_value(["gouda"], name="cheeses", kind="list")
-# equivalent:
-form.find_control(name="cheeses", kind="list").value = ["gouda"]
-# the first example is also almost equivalent to the following (but
-# insists that the control be a ListControl -- so it will skip any
-# non-list controls that come before the control we want)
-form["cheeses"] = ["gouda"]
-# The kind argument can also take values "multilist", "singlelist", "text",
-# "clickable" and "file":
-# find first control that will accept text, and scribble in it
-form.set_value("rhubarb rhubarb", kind="text", nr=0)
-# find, and set the value of, the first single-selection list control
-form.set_value(["spam"], kind="singlelist", nr=0)
+<span class="pycmt"># Often, a single checkbox (a CHECKBOX control with a single item) is
+</span><span class="pycmt"># present. In that case, the name of the single item isn't of much
+</span><span class="pycmt"># interest, so it's a good idea to check and uncheck the box without
+</span><span class="pycmt"># using the item name:
+</span>form.find_control(<span class="pystr">"smelly"</span>).items[0].selected = True <span class="pycmt"># check</span>
+form.find_control(<span class="pystr">"smelly"</span>).items[0].selected = False <span class="pycmt"># uncheck</span>
-# You can find controls with a general predicate function:
-def control_has_caerphilly(control):
- for item in control.items:
- if item.name == "caerphilly": return True
-form.find_control(kind="list", predicate=control_has_caerphilly)
+<span class="pycmt"># Items may be disabled (selecting or de-selecting a disabled item is
+</span><span class="pycmt"># not allowed):
+</span>control = form.find_control(<span class="pystr">"cheeses"</span>)
+<span class="pykw">print</span> control.get(<span class="pystr">"emmenthal"</span>).disabled
+control.get(<span class="pystr">"emmenthal"</span>).disabled = True
+<span class="pycmt"># enable all items in control
+</span>control.set_all_items_disabled(False)
-# HTMLForm.controls is a list of all controls in the form
-for control in form.controls:
- if control.value == "inquisition": sys.exit()
+request2 = form.click() <span class="pycmt"># urllib2.Request object</span>
+<span class="pykw">try</span>:
+ response2 = urllib2.urlopen(request2)
+<span class="pykw">except</span> urllib2.HTTPError, response2:
+ <span class="pykw">pass</span>
-# Control.items is a list of all Item instances in the control
-for item in form.find_control("cheeses").items:
- print item.name
+<span class="pykw">print</span> response2.geturl()
+<span class="pykw">print</span> response2.info() <span class="pycmt"># headers</span>
+<span class="pykw">print</span> response2.read() <span class="pycmt"># body</span>
+response2.close()</pre>
-# To remove items from a list control, remove it from .items:
-cheeses = form.find_control("cheeses")
-curd = cheeses.get("curd")
-del cheeses.items[cheeses.items.index(curd)]
-# To add items to a list container, instantiate an Item with its control
-# and attributes:
-# Note that you are responsible for getting the attributes correct here,
-# and these are not quite identical to the original HTML, due to
-# defaulting rules and a few special attributes (e.g. Items that represent
-# OPTIONs have a special "contents" key in their .attrs dict). In future
-# there will be an explicitly supported way of using the parsing logic to
-# add items and controls from HTML strings without knowing these details.
-ClientForm.Item(cheeses, {"contents": "mascarpone",
- "value": "mascarpone"})
-# You can specify list items by label using set/get_value_by_label() and
-# the label argument of the .get() method. Sometimes labels are easier to
-# maintain than names, sometimes the other way around.
-form.set_value_by_label(["Mozzarella", "Caerphilly"], "cheeses")
+<a name="notes"></a>
-# Which items are present, selected, and successful?
-# is the "parmesan" item of the "cheeses" control successful (selected
-# and not disabled)?
-print "parmesan" in form["cheeses"]
-# is the "parmesan" item of the "cheeses" control selected?
-print "parmesan" in [
- item.name for item in form.find_control("cheeses").items if item.selected]
-# does cheeses control have a "caerphilly" item?
-print "caerphilly" in [item.name for item in form.find_control("cheeses").items
-]
+<p>All of the standard control types are supported: <code>TEXT</code>,
+<code>PASSWORD</code>, <code>HIDDEN</code>, <code>TEXTAREA</code>,
+<code>ISINDEX</code>, <code>RESET</code>, <code>BUTTON</code> (<code>INPUT
+TYPE=BUTTON</code> and the various <code>BUTTON</code> types),
+<code>SUBMIT</code>, <code>IMAGE</code>, <code>RADIO</code>,
+<code>CHECKBOX</code>, <code>SELECT</code>/<code>OPTION</code> and
+<code>FILE</code> (for file upload). Both standard form encodings
+(<code>application/x-www-form-urlencoded</code> and
+<code>multipart/form-data</code>) are supported.
-# Sometimes one wants to set or clear individual items in a list, rather
-# than setting the whole .value:
-# select the item named "gorgonzola" in the first control named "cheeses"
-form.find_control("cheeses").get("gorgonzola").selected = True
-# You can be more specific:
-# deselect "edam" in third CHECKBOX control
-form.find_control(type="checkbox", nr=2).get("edam").selected = False
-# deselect item labelled "Mozzarella" in control with id "chz"
-form.find_control(id="chz").get(label="Mozzarella").selected = False
+<p>The module is designed for testing and automation of web
+interfaces, not for implementing interactive user agents.
-# Often, a single checkbox (a CHECKBOX control with a single item) is
-# present. In that case, the name of the single item isn't of much
-# interest, so it's a good idea to check and uncheck the box without
-# using the item name:
-form.find_control("smelly").items[0].selected = True # check
-form.find_control("smelly").items[0].selected = False # uncheck
+<p><strong><em>Security note</em>: Remember that any passwords you store in
+<code>HTMLForm</code> instances will be saved to disk in the clear if you
+pickle them (directly or indirectly). The simplest solution to this is to
+avoid pickling <code>HTMLForm</code> objects. You could also pickle before
+filling in any password, or just set the password to <code>""</code> before
+pickling.</strong>
-# Items may be disabled (selecting or de-selecting a disabled item is
-# not allowed):
-control = form.find_control("cheeses")
-print control.get("emmenthal").disabled
-control.get("emmenthal").disabled = True
-# enable all items in control
-control.set_all_items_disabled(False)
+<p>Python 2.0 or above is required. To run the tests, you need the
+<code>unittest</code> module (from <a
+href="http://pyunit.sourceforge.net/">PyUnit</a>). <code>unittest</code> is a
+standard library module with Python 2.1 and above.
-request2 = form.click() # urllib2.Request object
-try:
- response2 = urllib2.urlopen(request2)
-except urllib2.HTTPError, response2:
- pass
+<p>For full documentation, see the docstrings in ClientForm.py.
-print response2.geturl()
-print response2.info() # headers
-print response2.read() # body
-response2.close()
+<p><em><strong>Note: this page describes the 0.2 (stable release)
+interface. See <a href="./src/README-0_1_17.html">here</a> for the
+old 0.1 interface.</strong> </em>
- All of the standard control types are supported: TEXT, PASSWORD,
- HIDDEN, TEXTAREA, ISINDEX, RESET, BUTTON (INPUT TYPE=BUTTON and the
- various BUTTON types), SUBMIT, IMAGE, RADIO, CHECKBOX, SELECT/OPTION
- and FILE (for file upload). Both standard form encodings
- (application/x-www-form-urlencoded and multipart/form-data) are
- supported.
- The module is designed for testing and automation of web interfaces,
- not for implementing interactive user agents.
+<a name="parsers"></a>
+<h2>Parsers</h2>
- Security note: Remember that any passwords you store in HTMLForm
- instances will be saved to disk in the clear if you pickle them
- (directly or indirectly). The simplest solution to this is to avoid
- pickling HTMLForm objects. You could also pickle before filling in any
- password, or just set the password to "" before pickling.
+<p>ClientForm contains two parsers. See <a href="./#faq">the FAQ entry on
+XHTML</a> for details.
- Python 2.0 or above is required. To run the tests, you need the
- unittest module (from [3]PyUnit). unittest is a standard library
- module with Python 2.1 and above.
+<p><a href="http://www.egenix.com/files/python/mxTidy.html">mxTidy</a> or <a
+href="http://utidylib.berlios.de/">µTidylib</a> can be useful for dealing with
+bad HTML.
- For full documentation, see the docstrings in ClientForm.py.
+<p>I think it would be nice to have an implementation of ClientForm based on <a
+href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>
+(i.e. all methods and attributes implemented using the BeautifulSoup API),
+since that module does tolerant HTML parsing with a nice API for doing
+non-forms stuff. (I'm not about to do this, though. For anybody interested in
+doing this, note that the ClientForm tests would need making
+constructor-independent first.)
- Note: this page describes the 0.2 (development release) interface. See
- [4]here for the stable 0.1 interface.
-Parsers
+<a name="compat"></a>
+<h2>Backwards-compatibility mode</h2>
- ClientForm contains two parsers. See [5]the FAQ entry on XHTML for
- details.
+<p>ClientForm 0.2 includes three minor backwards-incompatible interface
+changes from version 0.1.
- [6]mxTidy or [7]µTidylib can be useful for dealing with bad HTML.
+<p>To make upgrading from 0.1 easier, and to allow me to stop supporting
+version 0.1 sooner, version 0.2 contains support for operating in a
+backwards-compatible mode, under which code written for 0.1 should work without
+modification. This is done on a per-<code>HTMLForm</code> basis via the
+<code>.backwards_compat</code> attribute, but for convenience the
+ParseResponse() and ParseFile() factory functions accept
+<code>backwards_compat</code> arguments. These backwards-compatibility
+features will be removed in version 0.3. The default is to operate in
+backwards-compatible mode. To run with backwards-compatibility mode turned
+<em><strong>OFF</strong></em> (<strong>strongly recommended</strong>):
- I think it would be nice to have an implementation of ClientForm based
- on [8]BeautifulSoup (i.e. all methods and attributes implemented using
- the BeautifulSoup API), since that module does tolerant HTML parsing
- with a nice API for doing non-forms stuff. (I'm not about to do this,
- though. For anybody interested in doing this, note that the ClientForm
- tests would need making constructor-independent first.)
+<pre>
+<span class="pykw">from</span> urllib2 <span class="pykw">import</span> urlopen
+<span class="pykw">from</span> ClientForm <span class="pykw">import</span> ParseResponse
+forms = ParseResponse(urlopen(<span class="pystr">"http://example.com/"</span>), backwards_compat=False)
+<span class="pycmt"># ...</span></pre>
-Backwards-compatibility mode
- ClientForm 0.2 includes three minor backwards-incompatible interface
- changes from version 0.1.
+<p>The backwards-incompatible changes are:
- To make upgrading from 0.1 easier, and to allow me to stop supporting
- version 0.1 sooner, version 0.2 contains support for operating in a
- backwards-compatible mode, under which code written for 0.1 should
- work without modification. This is done on a per-HTMLForm basis via
- the .backwards_compat attribute, but for convenience the
- ParseResponse() and ParseFile() factory functions accept
- backwards_compat arguments. These backwards-compatibility features
- will be removed in version 0.3. The default is to operate in
- backwards-compatible mode. To run with backwards compatible mode
- turned OFF (strongly recommended):
-from urllib2 import urlopen
-from ClientForm import ParseResponse
-forms = ParseResponse(urlopen("http://example.com/"), backwards_compat=False)
-# ...
+<ul>
+<li><p>Ambiguous specification of controls or items now results in
+AmbiguityError. If you want the old behaviour, explicitly pass
+<code>nr=0</code> to indicate you want the first matching control or item.
- The backwards-incompatible changes are:
- * Ambiguous specification of controls or items now results in
- AmbiguityError. If you want the old behaviour, explicitly pass
- nr=0 to indicate you want the first matching control or item.
- * Item label matching is now done by substring, not by strict
- string-equality (but note leading and trailing space is always
- stripped). (Control label matching is always done by substring.)
- * Handling of disabled list items has changed. First, note that
- handling of disabled list items in 0.1 (and in 0.2's
- backwards-compatibility mode!) is buggy: disabled items are
- successful (ie. disabled item names are sent back to the server).
- As a result, there was no distinction to be made between
- successful items and selected items. In 0.2, the bug is fixed, so
- this is no longer the case, and it is important to note that list
- controls' .value attribute contains only the successful item
- names; items that are selected but not successful (because
- disabled) are not included in .value. Second, disabled list items
- may no longer be deselected: AttributeError is raised in 0.2,
- whereas deselection was allowed in 0.1. The bug in 0.1 and in
- 0.2's backwards-compatibility mode will not be fixed, to preserve
- compatibility and to encourage people to upgrade to the new 0.2
- backwards_compat=True behaviour.
+<li><p>Item label matching is now done by substring, not by strict
+string-equality (but note leading and trailing space is always stripped).
+(Control label matching is always done by substring.)
-Credits
+<li><p>Handling of disabled list items has changed. First, note that handling
+of disabled list items in 0.1 (and in 0.2's backwards-compatibility mode!) is
+buggy: disabled items are successful (i.e. disabled item names are sent back to
+the server). As a result, there was no distinction to be made between
+successful items and selected items. In 0.2, the bug is fixed, so this is no
+longer the case, and it is important to note that list controls'
+<code>.value</code> attribute contains only the <em>successful</em> item names;
+items that are <em>selected</em> but not successful (because disabled) are not
+included in <code>.value</code>. Second, disabled list items may no longer be
+deselected: AttributeError is raised in 0.2, whereas deselection was allowed in
+0.1. The bug in 0.1 and in 0.2's backwards-compatibility mode will not be
+fixed, to preserve compatibility and to encourage people to upgrade to the new
+0.2 <code>backwards_compat=False</code> behaviour. </ul>
- Apart from Gisle Aas for allowing the original port from libwww-perl,
- particular credit is due to Gary Poster and Benji York, and their
- employer, Zope Corporation, for their contributions which led to
- ClientForm 0.2 being released. Thanks also to the many people who have
- contributed bug reports.
+<a name="credits"></a>
+<h2>Credits</h2>
-Download
+<p>Apart from Gisle Aas for allowing the original port from
+libwww-perl, particular credit is due to Gary Poster and Benji York,
+and their employer, Zope Corporation, for their contributions which
+led to ClientForm 0.2 being released. Thanks also to the many people
+who have contributed bug reports.
- For installation instructions, see the INSTALL.txt file included in
- the distribution.
- Stable release There have been three fairly minor
- backwards-incompatible interface changes since version 0.1 (see
- [9]above), but by default the code operates in a backwards-compatible
- mode so that code written for 0.1 should work without changes.
- 0.2 includes better support for labels, and a simpler interface (all
- the old methods are still there, but some have been deprecated and a
- few added).
- * [10]ClientForm-0.2.2.tar.gz
- * [11]ClientForm-0.2.2.zip
- * [12]Change Log (included in distribution)
- * [13]Older releases.
+<a name="download"></a>
- Old release No longer maintained. I recommend upgrading from 0.1 to
- 0.2.
+<h2>Download</h2>
- There were many interface changes between 0.0 and 0.1, so you should
- take care if upgrading old code from 0.0.
+<p>For installation instructions, see the INSTALL.txt file included in the
+distribution.
- 0.1 includes FILE control support for file upload, handling of
- disabled list items, and a redesigned interface.
- * [14]ClientForm-0.1.17.tar.gz
- * [15]ClientForm-0_1_17.zip
- * [16]Change Log (included in distribution)
- * [17]Older releases.
+<p><span class="spanhdr">Stable release</span> There have been three fairly
+minor backwards-incompatible interface changes since version 0.1 (see <a
+href="./#compat">above</a>), but by default the code operates in a
+backwards-compatible mode so that code written for 0.1 should work without
+changes.
- Ancient release No longer maintained. You don't want this.
- * [18]ClientForm-0.0.16.tar.gz
- * [19]ClientForm-0_0_16.zip
- * [20]Change Log (included in distribution)
- * [21]Older releases.
+<p>0.2 includes better support for labels, and a simpler interface (all the old
+methods are still there, but some have been deprecated and a few added).
-Subversion
+<ul>
- The [22]Subversion (SVN) trunk is
- [23]http://codespeak.net/svn/wwwsearch/ClientForm/trunk, so to check
- out the source:
+<li><a href="./src/ClientForm-0.2.6.tar.gz">ClientForm-0.2.6.tar.gz</a>
+<li><a href="./src/ClientForm-0.2.6.zip">ClientForm-0.2.6.zip</a>
+<li><a href="./src/ChangeLog.txt">Change Log</a> (included in distribution)
+<li><a href="./src/">Older releases.</a>
+</ul>
+
+<br>
+
+<p><span class="spanhdr">Old release</span> No longer maintained. I recommend
+upgrading from 0.1 to 0.2.
+
+<p>There were many interface changes between 0.0 and 0.1, so you should take
+care if upgrading old code from 0.0.
+
+<p>0.1 includes <code>FILE</code> control support for file upload, handling
+of disabled list items, and a redesigned interface.
+<ul>
+
+
+<li><a href="./src/ClientForm-0.1.17.tar.gz">ClientForm-0.1.17.tar.gz</a>
+<li><a href="./src/ClientForm-0_1_17.zip">ClientForm-0_1_17.zip</a>
+<li><a href="./src/ChangeLog.txt">Change Log</a> (included in distribution)
+<li><a href="./src/">Older releases.</a>
+</ul>
+
+<br>
+
+<p><span class="spanhdr">Ancient release</span> No longer maintained. You
+don't want this.
+
+<ul>
+
+
+<li><a href="./src/ClientForm-0.0.16.tar.gz">ClientForm-0.0.16.tar.gz</a>
+<li><a href="./src/ClientForm-0_0_16.zip">ClientForm-0_0_16.zip</a>
+<li><a href="./src/ChangeLog.txt">Change Log</a> (included in distribution)
+<li><a href="./src/">Older releases.</a>
+</ul>
+
+
+<a name="svn"></a>
+<h2>Subversion</h2>
+
+<p>The <a href="http://subversion.tigris.org/">Subversion (SVN)</a> trunk is <a href="http://codespeak.net/svn/wwwsearch/ClientForm/trunk#egg=ClientForm-dev">http://codespeak.net/svn/wwwsearch/ClientForm/trunk</a>, so to check out the source:
+
+<pre>
svn co http://codespeak.net/svn/wwwsearch/ClientForm/trunk ClientForm
+</pre>
-FAQs
- * Doesn't the standard Python library module, cgi, do this?
- No: the cgi module does the server end of the job. It doesn't know
- how to parse or fill in a form or how to send it back to the
- server.
- * Which version of Python do I need?
- 2.0 or above (ClientForm 0.2; version 0.1 requires Python 1.5.2 or
- above).
- * Is urllib2 required?
- No.
- * How do I use it without urllib2?
- Use .click_request_data() instead of .click().
- * Which urllib2 do I need?
- You don't. It's convenient, though. If you have Python 2.0, you
- need to upgrade to the version from Python 2.1 (available from
- [24]www.python.org). Otherwise, you're OK.
- * Which license?
- ClientForm is dual-licensed: you may pick either the [25]BSD
- license, or the [26]ZPL 2.1 (both are included in the
- distribution).
- * Is XHTML supported?
- Yes. You must pass
- form_parser_class=ClientForm.XHTMLCompatibleFormParser to
- ParseResponse() / ParseFile(). Note this parser is less tolerant
- of bad HTML than the default, ClientForm.FormParser
- * How do I figure out what control names and values to use?
- print form is usually all you need. In your code, things like the
- HTMLForm.items attribute of HTMLForm instances can be useful to
- inspect forms at runtime. Note that it's possible to use item
- labels instead of item names, which can be useful -- use the
- by_label arguments to the various methods, and the
- .get_value_by_label() / .set_value_by_label() methods on
- ListControl.
- * What do those '*' characters mean in the string representations of
- list controls?
- A * next to an item means that item is selected.
- * What do those parentheses (round brackets) mean in the string
- representations of list controls?
- Parentheses (foo) around an item mean that item is disabled.
- * Why doesn't <some control> turn up in the data returned by
- .click*() when that control has non-None value?
- Either the control is disabled, or it is not successful for some
- other reason. 'Successful' (see HTML 4 specification) means that
- the control will cause data to get sent to the server.
- * Why does ClientForm not follow the HTML 4.0 / RFC 1866 standards
- for RADIO and multiple-selection SELECT controls?
- Because by default, it follows browser behaviour when setting the
- initially-selected items in list controls that have no items
- explicitly selected in the HTML. Use the select_default argument
- to ParseResponse if you want to follow the RFC 1866 rules instead.
- Note that browser behaviour violates the HTML 4.01 specification
- in the case of RADIO controls.
- * Why does .click()ing on a button not work for me?
- + Clicking on a RESET button doesn't do anything, by design -
- this is a library for web automation, not an interactive
- browser. Even in an interactive browser, clicking on RESET
- sends nothing to the server, so there is little point in
- having .click() do anything special here.
- + Clicking on a BUTTON TYPE=BUTTON doesn't do anything either,
- also by design. This time, the reason is that that BUTTON is
- only in the HTML standard so that one can attach callbacks to
- its events. The callbacks are functions in SCRIPT elements
- (such as Javascript) embedded in the HTML, and their
- execution may result in information getting sent back to the
- server. ClientForm, however, knows nothing about these
- callbacks, so it can't do anything useful with a click on a
- BUTTON whose type is BUTTON.
- + Generally, embedded script may be messing things up in all
- kinds of ways. See the answer to the next question.
- * Embedded script is messing up my form filling. What do I do?
- See the [27]General FAQs page and the next FAQ entry for what to
- do about this.
- * How do I change INPUT TYPE=HIDDEN field values (for example, to
- emulate the effect of JavaScript code)?
- As with any control, set the control's readonly attribute false.
-form.find_control("foo").readonly = False # allow changing .value of control f
-oo
-form.set_all_readonly(False) # allow changing the .value of all controls
- * I'm having trouble debugging my code.
- The [28]ClientCookie package makes it easy to get .seek()able
- response objects, which is convenient for debugging. See also
- [29]here for few relevant tips. Also see [30]General FAQs.
- * I have a control containing a list of integers. How do I select
- the one whose value is nearest to the one I want?
-import bisect
-def closest_int_value(form, ctrl_name, value):
- values = map(int, [item.name for item in form.find_control(ctrl_name).items
-])
- return str(values[bisect.bisect(values, value) - 1])
+<a name="faq"></a>
+<h2>FAQs</h2>
+<ul>
+ <li>Doesn't the standard Python library module, <code>cgi</code>, do this?
+ <p>No: the <code>cgi</code> module does the server end of the job. It
+ doesn't know how to parse or fill in a form or how to send it back to the
+ server.
+ <li>Which version of Python do I need?
+ <p>2.0 or above (ClientForm 0.2; version 0.1 requires Python 1.5.2 or above).
+ <li>Is <code>urllib2</code> required?
+ <p>No.
+ <li>How do I use it without <code>urllib2</code>?
+ <p>Use <code>.click_request_data()</code> instead of <code>.click()</code>.
+ <li>Which <code>urllib2</code> do I need?
+ <p>You don't. It's convenient, though. If you have Python 2.0, you need to
+ upgrade to the version from Python 2.1 (available from <a
+ href="http://www.python.org/">www.python.org</a>). Otherwise, you're OK.
+ <li>Which license?
+ <p>ClientForm is dual-licensed: you may pick either the
+ <a href="http://www.opensource.org/licenses/bsd-license.php">BSD license</a>,
+ or the <a href="http://www.zope.org/Resources/ZPL">ZPL 2.1</a> (both are
+ included in the distribution).
+ <a name="xhtml"></a>
+ <li>Is XHTML supported?
+ <p>Yes. You must pass
+ <code>form_parser_class=ClientForm.XHTMLCompatibleFormParser</code> to
+ <code>ParseResponse()</code> / <code>ParseFile()</code>. Note this parser
+ is less tolerant of bad HTML than the default,
+ <code>ClientForm.FormParser</code>.
+ <li>How do I figure out what control names and values to use?
+ <p><code>print form</code> is usually all you need.
+ In your code, things like the <code>HTMLForm.items</code> attribute of
+ <code>HTMLForm</code> instances can be useful to inspect forms at
+ runtime. Note that it's possible to use item labels instead of item
+ names, which can be useful — use the <code>by_label</code>
+ arguments to the various methods, and the <code>.get_value_by_label()</code> /
+ <code>.set_value_by_label()</code> methods on <code>ListControl</code>.
+ <li>What do those <code>'*'</code> characters mean in the string
+ representations of list controls?
+ <p>A <code>*</code> next to an item means that item is selected.
+ <li>What do those parentheses (round brackets) mean in the string
+ representations of list controls?
+ <p>Parentheses <code>(foo)</code> around an item mean that item is disabled.
+ <li>Why doesn't &lt;some control&gt; turn up in the data returned by
+ <code>.click*()</code> when that control has non-<code>None</code> value?
+ <p>Either the control is disabled, or it is not successful for some other
+ reason. 'Successful' (see HTML 4 specification) means that the control
+ will cause data to get sent to the server.
+ <li>Why does ClientForm not follow the HTML 4.0 / RFC 1866 standards for
+ <code>RADIO</code> and multiple-selection <code>SELECT</code> controls?
+ <p>Because by default, it follows browser behaviour when setting the
+ initially-selected items in list controls that have no items explicitly
+ selected in the HTML. Use the <code>select_default</code> argument to
+ <code>ParseResponse</code> if you want to follow the RFC 1866 rules
+ instead. Note that browser behaviour violates the HTML 4.01 specification
+ in the case of <code>RADIO</code> controls.
+ <li>Why does <code>.click()</code>ing on a button not work for me?
+ <ul>
+ <li>Clicking on a <code>RESET</code> button doesn't do anything, by design
+ - this is a library for web automation, not an interactive browser.
+ Even in an interactive browser, clicking on <code>RESET</code> sends
+ nothing to the server, so there is little point in having
+ <code>.click()</code> do anything special here.
+ <li>Clicking on a <code>BUTTON TYPE=BUTTON</code> doesn't do anything
+ either, also by design. This time, the reason is that that
+ <code>BUTTON</code> is only in the HTML standard so that one can attach
+ callbacks to its events. The callbacks are functions in
+ <code>SCRIPT</code> elements (such as JavaScript) embedded in the HTML,
+ and their execution may result in information getting sent back to the
+ server. ClientForm, however, knows nothing about these callbacks, so
+ it can't do anything useful with a click on a <code>BUTTON</code> whose
+ type is <code>BUTTON</code>.
+ <li>Generally, embedded script may be messing things up in all kinds of
+ ways. See the answer to the next question.
+ </ul>
+ <li>Embedded script is messing up my form filling. What do I do?
+ <p>See the <a href="../bits/GeneralFAQ.html">General FAQs</a> page and the
+ next FAQ entry for what to do about this.
+<!-- XXX example here -->
+ <li>How do I change <code>INPUT TYPE=HIDDEN</code> field values (for example,
+ to emulate the effect of JavaScript code)?
+ <p>As with any control, set the control's <code>readonly</code> attribute
+ false.
+<p><pre>
+form.find_control(<span class="pystr">"foo"</span>).readonly = False <span class="pycmt"># allow changing .value of control foo</span>
+form.set_all_readonly(False) <span class="pycmt"># allow changing the .value of all controls</span></pre>
-form["distance"] = [closest_int_value(form, "distance", 23)]
- * Where can I find out more about the HTML and HTTP standards?
- + W3C [31]HTML 4.01 Specification.
- + [32]RFC 1866 - the HTML 2.0 standard.
- + [33]RFC 1867 - Form-based file upload.
- + [34]RFC 2616 - HTTP 1.1 Specification.
+ </li>
+ <li>I'm having trouble debugging my code.
+ <p>The <a href="../ClientCookie/">ClientCookie</a> package makes it
+ easy to get <code>.seek()</code>able response objects, which is
+ convenient for debugging. See also <a
+ href="../ClientCookie/doc.html#debugging">here</a> for a few
+ relevant tips. Also see <a href="../bits/GeneralFAQ.html"> General
+ FAQs</a>.
+ <li>I have a control containing a list of integers. How do I select the one
+ whose value is nearest to the one I want?
+<p><pre>
+<span class="pykw">import</span> bisect
+<span class="pykw">def</span> closest_int_value(form, ctrl_name, value):
+ values = map(int, [item.name <span class="pykw">for</span> item <span class="pykw">in</span> form.find_control(ctrl_name).items])
+ <span class="pykw">return</span> str(values[bisect.bisect(values, value) - 1])
- I prefer questions and comments to be sent to the [35]mailing list
- rather than direct to me.
+form[<span class="pystr">"distance"</span>] = [closest_int_value(form, <span class="pystr">"distance"</span>, 23)]</pre>
- [36]John J. Lee, March 2006.
+ </li>
+ <li>Where can I find out more about the HTML and HTTP standards?
+ <ul>
+ <li>W3C <a href="http://www.w3.org/TR/html401/">HTML 4.01
+ Specification</a>.
+ <li><a href="http://www.ietf.org/rfc/rfc1866.txt">RFC 1866</a> -
+ the HTML 2.0 standard.
+ <li><a href="http://www.ietf.org/rfc/rfc1867.txt">RFC 1867</a> -
+ Form-based file upload.
+ <li><a href="http://www.ietf.org/rfc/rfc2616.txt">RFC 2616</a> -
+ HTTP 1.1 Specification.
+ </ul>
+</ul>
- [37]Home
- [38]General FAQs
- [39]mechanize
- [40]pullparser
- [41]ClientCookie
- [42]ClientCookie docs
- ClientForm
- [43]DOMForm
- [44]python-spidermonkey
- [45]ClientTable
- [46]1.5.2 urllib2.py
- [47]1.5.2 urllib.py
- [48]Other stuff
- [49]Example
- [50]Notes
- [51]Parsers
- [52]Compatibility
- [53]Credits
- [54]Download
- [55]FAQs
+<p>I prefer questions and comments to be sent to the <a
+href="http://lists.sourceforge.net/lists/listinfo/wwwsearch-general">
+mailing list</a> rather than direct to me.
-References
+<p><a href="mailto:jjl at pobox.com">John J. Lee</a>,
+October 2006.
- 1. http://sourceforge.net/
- 2. http://www.linpro.no/lwp/
- 3. http://pyunit.sourceforge.net/
- 4. http://wwwsearch.sourceforge.net/ClientForm/src/README_0_1_18.html
- 5. http://wwwsearch.sourceforge.net/ClientForm/#faq
- 6. http://www.egenix.com/files/python/mxTidy.html
- 7. http://utidylib.berlios.de/
- 8. http://www.crummy.com/software/BeautifulSoup/
- 9. http://wwwsearch.sourceforge.net/ClientForm/#compat
- 10. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.2.tar.gz
- 11. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.2.2.zip
- 12. http://wwwsearch.sourceforge.net/ClientForm/src/ChangeLog.txt
- 13. http://wwwsearch.sourceforge.net/ClientForm/src/
- 14. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.1.17.tar.gz
- 15. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0_1_17.zip
- 16. http://wwwsearch.sourceforge.net/ClientForm/src/ChangeLog.txt
- 17. http://wwwsearch.sourceforge.net/ClientForm/src/
- 18. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.0.16.tar.gz
- 19. http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0_0_16.zip
- 20. http://wwwsearch.sourceforge.net/ClientForm/src/ChangeLog.txt
- 21. http://wwwsearch.sourceforge.net/ClientForm/src/
- 22. http://subversion.tigris.org/
- 23. http://codespeak.net/svn/wwwsearch/ClientForm/trunk#egg=ClientForm-dev
- 24. http://www.python.org/
- 25. http://www.opensource.org/licenses/bsd-license.php
- 26. http://www.zope.org/Resources/ZPL
- 27. http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
- 28. http://wwwsearch.sourceforge.net/ClientCookie/
- 29. http://wwwsearch.sourceforge.net/ClientCookie/doc.html#debugging
- 30. http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
- 31. http://www.w3.org/TR/html401/
- 32. http://www.ietf.org/rfc/rfc1866.txt
- 33. http://www.ietf.org/rfc/rfc1867.txt
- 34. http://www.ietf.org/rfc/rfc2616.txt
- 35. http://lists.sourceforge.net/lists/listinfo/wwwsearch-general
- 36. mailto:jjl at pobox.com
- 37. http://wwwsearch.sourceforge.net/
- 38. http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
- 39. http://wwwsearch.sourceforge.net/mechanize/
- 40. http://wwwsearch.sourceforge.net/pullparser/
- 41. http://wwwsearch.sourceforge.net/ClientCookie/
- 42. http://wwwsearch.sourceforge.net/ClientCookie/doc.html
- 43. http://wwwsearch.sourceforge.net/DOMForm/
- 44. http://wwwsearch.sourceforge.net/python-spidermonkey/
- 45. http://wwwsearch.sourceforge.net/ClientTable/
- 46. http://wwwsearch.sourceforge.net/bits/urllib2_152.py
- 47. http://wwwsearch.sourceforge.net/bits/urllib_152.py
- 48. http://wwwsearch.sourceforge.net/#other
- 49. http://wwwsearch.sourceforge.net/ClientForm/#example
- 50. http://wwwsearch.sourceforge.net/ClientForm/#notes
- 51. http://wwwsearch.sourceforge.net/ClientForm/#parsers
- 52. http://wwwsearch.sourceforge.net/ClientForm/#compat
- 53. http://wwwsearch.sourceforge.net/ClientForm/#credits
- 54. http://wwwsearch.sourceforge.net/ClientForm/#download
- 55. http://wwwsearch.sourceforge.net/ClientForm/#faq
+</div>
+
+<div id="Menu">
+
+<a href="..">Home</a><br>
+<br>
+<a href="../bits/GeneralFAQ.html">General FAQs</a><br>
+<br>
+<a href="../mechanize/">mechanize</a><br>
+<a href="../mechanize/doc.html"><span class="subpage">mechanize docs</span></a><br>
+<span class="thispage">ClientForm</span><br>
+<br>
+<a href="../ClientCookie/">ClientCookie</a><br>
+<a href="../ClientCookie/doc.html"><span class="subpage">ClientCookie docs</span></a><br>
+<a href="../pullparser/">pullparser</a><br>
+<a href="../DOMForm/">DOMForm</a><br>
+<a href="../python-spidermonkey/">python-spidermonkey</a><br>
+<a href="../ClientTable/">ClientTable</a><br>
+<a href="../bits/urllib2_152.py">1.5.2 urllib2.py</a><br>
+<a href="../bits/urllib_152.py">1.5.2 urllib.py</a><br>
+
+<br>
+
+<a href="../#other">Other stuff</a><br>
+
+<br>
+
+<a href="./#example">Example</a><br>
+<a href="./#notes">Notes</a><br>
+<a href="./#parsers">Parsers</a><br>
+<a href="./#compat">Compatibility</a><br>
+<a href="./#credits">Credits</a><br>
+<a href="./#download">Download</a><br>
+<a href="./#faq">FAQs</a><br>
+
+</div>
+
+</body>
+</html>
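The FAQ recipe `closest_int_value` in the README hunk above returns the largest value not greater than the target, and it assumes the items are already in ascending order. Strictly speaking, it does not return the nearest value. A minimal sketch of a nearest-value variant follows. It assumes the option names are plain integer strings. To run standalone, it takes a list of names rather than a ClientForm form, and it does not use ClientForm's API. The function name and the tie-breaking rule (the smaller value wins) are illustrative choices.

```python
import bisect

def closest_int_value(names, value):
    """Return the name in `names` whose integer value is nearest to `value`.

    Hypothetical helper: `names` stands in for
    [item.name for item in form.find_control(ctrl_name).items].
    Ties go to the smaller value. The original string is returned unchanged,
    so names such as "05" keep their leading zeros.
    """
    if not names:
        raise ValueError("no items to choose from")
    # Sort (int, original_name) pairs so bisect works on unsorted input too.
    pairs = sorted((int(name), name) for name in names)
    values = [v for v, _ in pairs]
    # Index of the first value >= target; neighbours at i - 1 and i bracket it.
    i = bisect.bisect_left(values, value)
    candidates = pairs[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda pair: (abs(pair[0] - value), pair[0]))
    return best[1]
```

With a ClientForm form, the call from the FAQ would become something like `form["distance"] = [closest_int_value([item.name for item in form.find_control("distance").items], 23)]`.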
Added: python-clientform/branches/upstream/current/ez_setup.py
===================================================================
--- python-clientform/branches/upstream/current/ez_setup.py 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/ez_setup.py 2007-04-09 21:18:24 UTC (rev 769)
@@ -0,0 +1,222 @@
+#!python
+"""Bootstrap setuptools installation
+
+If you want to use setuptools in your package's setup.py, just include this
+file in the same directory with it, and add this to the top of your setup.py::
+
+ from ez_setup import use_setuptools
+ use_setuptools()
+
+If you want to require a specific version of setuptools, set a download
+mirror, or use an alternate download directory, you can do so by supplying
+the appropriate options to ``use_setuptools()``.
+
+This file can also be run as a script to install or upgrade setuptools.
+"""
+import sys
+DEFAULT_VERSION = "0.6c3"
+DEFAULT_URL = "http://cheeseshop.python.org/packages/%s/s/setuptools/" % sys.version[:3]
+
+md5_data = {
+ 'setuptools-0.6b1-py2.3.egg': '8822caf901250d848b996b7f25c6e6ca',
+ 'setuptools-0.6b1-py2.4.egg': 'b79a8a403e4502fbb85ee3f1941735cb',
+ 'setuptools-0.6b2-py2.3.egg': '5657759d8a6d8fc44070a9d07272d99b',
+ 'setuptools-0.6b2-py2.4.egg': '4996a8d169d2be661fa32a6e52e4f82a',
+ 'setuptools-0.6b3-py2.3.egg': 'bb31c0fc7399a63579975cad9f5a0618',
+ 'setuptools-0.6b3-py2.4.egg': '38a8c6b3d6ecd22247f179f7da669fac',
+ 'setuptools-0.6b4-py2.3.egg': '62045a24ed4e1ebc77fe039aa4e6f7e5',
+ 'setuptools-0.6b4-py2.4.egg': '4cb2a185d228dacffb2d17f103b3b1c4',
+ 'setuptools-0.6c1-py2.3.egg': 'b3f2b5539d65cb7f74ad79127f1a908c',
+ 'setuptools-0.6c1-py2.4.egg': 'b45adeda0667d2d2ffe14009364f2a4b',
+ 'setuptools-0.6c2-py2.3.egg': 'f0064bf6aa2b7d0f3ba0b43f20817c27',
+ 'setuptools-0.6c2-py2.4.egg': '616192eec35f47e8ea16cd6a122b7277',
+ 'setuptools-0.6c3-py2.3.egg': 'f181fa125dfe85a259c9cd6f1d7b78fa',
+ 'setuptools-0.6c3-py2.4.egg': 'e0ed74682c998bfb73bf803a50e7b71e',
+ 'setuptools-0.6c3-py2.5.egg': 'abef16fdd61955514841c7c6bd98965e',
+}
+
+import sys, os
+
+def _validate_md5(egg_name, data):
+ if egg_name in md5_data:
+ from md5 import md5
+ digest = md5(data).hexdigest()
+ if digest != md5_data[egg_name]:
+ print >>sys.stderr, (
+ "md5 validation of %s failed! (Possible download problem?)"
+ % egg_name
+ )
+ sys.exit(2)
+ return data
+
+
+def use_setuptools(
+ version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir,
+ download_delay=15
+):
+ """Automatically find/download setuptools and make it available on sys.path
+
+ `version` should be a valid setuptools version number that is available
+ as an egg for download under the `download_base` URL (which should end with
+ a '/'). `to_dir` is the directory where setuptools will be downloaded, if
+ it is not already available. If `download_delay` is specified, it should
+ be the number of seconds that will be paused before initiating a download,
+ should one be required. If an older version of setuptools is installed,
+ this routine will print a message to ``sys.stderr`` and raise SystemExit in
+ an attempt to abort the calling script.
+ """
+ try:
+ import setuptools
+ if setuptools.__version__ == '0.0.1':
+ print >>sys.stderr, (
+ "You have an obsolete version of setuptools installed. Please\n"
+ "remove it from your system entirely before rerunning this script."
+ )
+ sys.exit(2)
+ except ImportError:
+ egg = download_setuptools(version, download_base, to_dir, download_delay)
+ sys.path.insert(0, egg)
+ import setuptools; setuptools.bootstrap_install_from = egg
+
+ import pkg_resources
+ try:
+ pkg_resources.require("setuptools>="+version)
+
+ except pkg_resources.VersionConflict, e:
+ # XXX could we install in a subprocess here?
+ print >>sys.stderr, (
+ "The required version of setuptools (>=%s) is not available, and\n"
+ "can't be installed while this script is running. Please install\n"
+ " a more recent version first.\n\n(Currently using %r)"
+ ) % (version, e.args[0])
+ sys.exit(2)
+
+def download_setuptools(
+ version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir,
+ delay = 15
+):
+ """Download setuptools from a specified location and return its filename
+
+ `version` should be a valid setuptools version number that is available
+ as an egg for download under the `download_base` URL (which should end
+ with a '/'). `to_dir` is the directory where the egg will be downloaded.
+ `delay` is the number of seconds to pause before an actual download attempt.
+ """
+ import urllib2, shutil
+ egg_name = "setuptools-%s-py%s.egg" % (version,sys.version[:3])
+ url = download_base + egg_name
+ saveto = os.path.join(to_dir, egg_name)
+ src = dst = None
+ if not os.path.exists(saveto): # Avoid repeated downloads
+ try:
+ from distutils import log
+ if delay:
+ log.warn("""
+---------------------------------------------------------------------------
+This script requires setuptools version %s to run (even to display
+help). I will attempt to download it for you (from
+%s), but
+you may need to enable firewall access for this script first.
+I will start the download in %d seconds.
+
+(Note: if this machine does not have network access, please obtain the file
+
+ %s
+
+and place it in this directory before rerunning this script.)
+---------------------------------------------------------------------------""",
+ version, download_base, delay, url
+ ); from time import sleep; sleep(delay)
+ log.warn("Downloading %s", url)
+ src = urllib2.urlopen(url)
+ # Read/write all in one block, so we don't create a corrupt file
+ # if the download is interrupted.
+ data = _validate_md5(egg_name, src.read())
+ dst = open(saveto,"wb"); dst.write(data)
+ finally:
+ if src: src.close()
+ if dst: dst.close()
+ return os.path.realpath(saveto)
+
+def main(argv, version=DEFAULT_VERSION):
+ """Install or upgrade setuptools and EasyInstall"""
+
+ try:
+ import setuptools
+ except ImportError:
+ egg = None
+ try:
+ egg = download_setuptools(version, delay=0)
+ sys.path.insert(0,egg)
+ from setuptools.command.easy_install import main
+ return main(list(argv)+[egg]) # we're done here
+ finally:
+ if egg and os.path.exists(egg):
+ os.unlink(egg)
+ else:
+ if setuptools.__version__ == '0.0.1':
+ # tell the user to uninstall obsolete version
+ use_setuptools(version)
+
+ req = "setuptools>="+version
+ import pkg_resources
+ try:
+ pkg_resources.require(req)
+ except pkg_resources.VersionConflict:
+ try:
+ from setuptools.command.easy_install import main
+ except ImportError:
+ from easy_install import main
+ main(list(argv)+[download_setuptools(delay=0)])
+ sys.exit(0) # try to force an exit
+ else:
+ if argv:
+ from setuptools.command.easy_install import main
+ main(argv)
+ else:
+ print "Setuptools version",version,"or greater has been installed."
+ print '(Run "ez_setup.py -U setuptools" to reinstall or upgrade.)'
+
+
+
+def update_md5(filenames):
+ """Update our built-in md5 registry"""
+
+ import re
+ from md5 import md5
+
+ for name in filenames:
+ base = os.path.basename(name)
+ f = open(name,'rb')
+ md5_data[base] = md5(f.read()).hexdigest()
+ f.close()
+
+ data = [" %r: %r,\n" % it for it in md5_data.items()]
+ data.sort()
+ repl = "".join(data)
+
+ import inspect
+ srcfile = inspect.getsourcefile(sys.modules[__name__])
+ f = open(srcfile, 'rb'); src = f.read(); f.close()
+
+ match = re.search("\nmd5_data = {\n([^}]+)}", src)
+ if not match:
+ print >>sys.stderr, "Internal error!"
+ sys.exit(2)
+
+ src = src[:match.start(1)] + repl + src[match.end(1):]
+ f = open(srcfile,'w')
+ f.write(src)
+ f.close()
+
+
+if __name__=='__main__':
+ if len(sys.argv)>2 and sys.argv[1]=='--md5update':
+ update_md5(sys.argv[2:])
+ else:
+ main(sys.argv[1:])
+
+
+
+
+
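`download_setuptools` above passes the downloaded bytes through `_validate_md5`, which is defined earlier in ez_setup.py and not shown in this hunk. `update_md5` rewrites the module's built-in `md5_data` registry, which maps egg filenames to hex digests. The following is a minimal sketch of that check. It assumes the registry is a plain dict keyed by egg filename. It uses `hashlib` instead of the Python 2 `md5` module the script imports. On a mismatch it raises `ValueError` instead of calling `sys.exit(2)`. The names `validate_md5` and `registry_entry` and the sample registry entry are hypothetical.

```python
import hashlib
import os

# Hypothetical registry in the same shape as ez_setup.py's md5_data:
# egg filename -> expected MD5 hex digest.
md5_data = {
    "example-0.1-py2.4.egg": hashlib.md5(b"trusted egg bytes").hexdigest(),
}


def validate_md5(egg_name, data, registry=md5_data):
    """Return data unchanged if its MD5 matches the registry entry.

    Names missing from the registry are passed through unchecked
    (assumed behaviour). A mismatch raises ValueError in this sketch.
    """
    expected = registry.get(egg_name)
    if expected is not None and hashlib.md5(data).hexdigest() != expected:
        raise ValueError(
            "md5 validation of %s failed! (possible download problem?)" % egg_name)
    return data


def registry_entry(path):
    """Compute a (basename, digest) pair, as update_md5 does for each file."""
    with open(path, "rb") as f:
        return os.path.basename(path), hashlib.md5(f.read()).hexdigest()
```

The download code reads the whole egg into memory before validating and writing it. Because of this, a corrupt or tampered download is rejected before anything is saved to `to_dir`.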
Added: python-clientform/branches/upstream/current/setup.cfg
===================================================================
--- python-clientform/branches/upstream/current/setup.cfg 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/setup.cfg 2007-04-09 21:18:24 UTC (rev 769)
@@ -0,0 +1,5 @@
+[egg_info]
+tag_build =
+tag_date = 0
+tag_svn_revision = 0
+
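This `[egg_info]` section turns off setuptools' automatic version tags. As a result, the built egg's version is exactly `VERSION` from setup.py (`0.2.6` in this release), with no build suffix, Subversion revision, or date stamp appended. The following rough sketch shows how the three options would combine. It assumes the suffix order is the build tag, then `-r<revision>`, then `-<YYYYMMDD>`. `tagged_version` and its `svn_revision` argument are hypothetical and are not part of the setuptools API.

```python
import configparser
import time

SETUP_CFG = """\
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
"""


def tagged_version(version, cfg_text, svn_revision=None):
    """Approximate the version string egg_info would record (assumption)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    opts = parser["egg_info"]
    # An empty "tag_build =" line yields the empty string: no build suffix.
    suffix = opts.get("tag_build", "")
    if opts.getboolean("tag_svn_revision", fallback=False) and svn_revision:
        suffix += "-r%s" % svn_revision
    if opts.getboolean("tag_date", fallback=False):
        suffix += time.strftime("-%Y%m%d")
    return version + suffix
```

With the values committed here, `tagged_version("0.2.6", SETUP_CFG)` returns `"0.2.6"` unchanged.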
Modified: python-clientform/branches/upstream/current/setup.py
===================================================================
--- python-clientform/branches/upstream/current/setup.py 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/setup.py 2007-04-09 21:18:24 UTC (rev 769)
@@ -15,11 +15,11 @@
import re
#VERSION_MATCH = re.search(r'VERSION = "(.*)"', open("ClientForm.py").read())
#VERSION = VERSION_MATCH.group(1)
-VERSION = '0.2.2'
+VERSION = '0.2.6'
INSTALL_REQUIRES = []
NAME = "ClientForm"
PACKAGE = False
-LICENSE = "BSD"
+LICENSE = "BSD" # or ZPL 2.1
PLATFORMS = ["any"]
ZIP_SAFE = True
CLASSIFIERS = """\
Modified: python-clientform/branches/upstream/current/test.py
===================================================================
--- python-clientform/branches/upstream/current/test.py 2007-04-09 21:12:03 UTC (rev 768)
+++ python-clientform/branches/upstream/current/test.py 2007-04-09 21:18:24 UTC (rev 769)
@@ -212,7 +212,33 @@
except AttributeError:
return req.headers.items()
+class MockResponse:
+ def __init__(self, f, url):
+ self._file = f
+ self._url = url
+ def geturl(self):
+ return self._url
+ def __getattr__(self, name):
+ return getattr(self._file, name)
+
class ParseTests(TestCase):
+
+ def test_failing_parse(self):
+ # XXX couldn't provoke an error from BeautifulSoup (!), so this has not
+ # been tested with RobustFormParser
+ import sgmllib
+ # Python 2.0 sgmllib raises RuntimeError rather than SGMLParseError,
+ # but seems never to even raise that except as an assertion, from
+ # reading the code...
+ if hasattr(sgmllib, "SGMLParseError"):
+ f = StringIO("<!!!!>")
+ base_uri = "http://localhost/"
+ self.assertRaises(
+ ClientForm.ParseError,
+ ClientForm.ParseFile, f, base_uri, backwards_compat=False,
+ )
+ self.assert_(issubclass(ClientForm.ParseError, sgmllib.SGMLParseError))
+
def test_unknown_control(self):
f = StringIO(
"""<form action="abc">
@@ -226,6 +252,84 @@
for ctl in form.controls:
self.assert_(isinstance(ctl, ClientForm.TextControl))
+ def test_ParseFileEx(self):
+ # empty "outer form" (where the "outer form" is the form consisting of
+ # all controls outside of any form)
+ f = StringIO(
+"""<form action="abc">
+<input type="text"></input>
+</form>
+""")
+ base_uri = "http://localhost/"
+ forms = ClientForm.ParseFileEx(f, base_uri)
+ outer = forms[0]
+ self.assertEqual(len(forms), 2)
+ self.assertEqual(outer.controls, [])
+ self.assertEqual(outer.name, None)
+ self.assertEqual(outer.action, base_uri)
+ self.assertEqual(outer.method, "GET")
+ self.assertEqual(outer.enctype, "application/x-www-form-urlencoded")
+ self.assertEqual(outer.attrs, {})
+
+ # non-empty outer form
+ f = StringIO(
+"""
+<input type="text" name="a"></input>
+<form action="abc">
+ <input type="text" name="b"></input>
+</form>
+<input type="text" name="c"></input>
+<form action="abc">
+ <input type="text" name="d"></input>
+</form>
+<input type="text" name="e"></input>
+""")
+ base_uri = "http://localhost/"
+ forms = ClientForm.ParseFileEx(f, base_uri)
+ outer = forms[0]
+ self.assertEqual(len(forms), 3)
+ self.assertEqual([c.name for c in outer.controls], ["a", "c", "e"])
+ self.assertEqual(outer.name, None)
+ self.assertEqual(outer.action, base_uri)
+ self.assertEqual(outer.method, "GET")
+ self.assertEqual(outer.enctype, "application/x-www-form-urlencoded")
+ self.assertEqual(outer.attrs, {})
+
+ def test_ParseResponse(self):
+ url = "http://example.com/"
+ r = MockResponse(
+ StringIO("""\
+<input type="text" name="outer"></input>
+<form action="abc"><input type="text" name="inner"></input></form>
+"""),
+ url,
+ )
+
+ forms = ClientForm.ParseResponse(r)
+ self.assertEqual(len(forms), 1)
+ form = forms[0]
+ self.assertEqual(form.action, url+"abc")
+ self.assertEqual(form.controls[0].name, "inner")
+
+ def test_ParseResponseEx(self):
+ url = "http://example.com/"
+ r = MockResponse(
+ StringIO("""\
+<input type="text" name="outer"></input>
+<form action="abc"><input type="text" name="inner"></input></form>
+"""),
+ url,
+ )
+
+ forms = ClientForm.ParseResponseEx(r)
+ self.assertEqual(len(forms), 2)
+ outer = forms[0]
+ inner = forms[1]
+ self.assertEqual(inner.action, url+"abc")
+ self.assertEqual(outer.action, url)
+ self.assertEqual(outer.controls[0].name, "outer")
+ self.assertEqual(inner.controls[0].name, "inner")
+
def test_parse_error(self):
f = StringIO(
"""<form action="abc">
@@ -285,10 +389,12 @@
self.assert_(len(forms) == 1)
form = forms[0]
self.assert_(form.name is None)
- self.assertEqual(form.action, "http://localhost/abc&"+u"\u2014".encode('utf8')+"d")
+ self.assertEqual(
+ form.action,
+ "http://localhost/abc&"+u"\u2014".encode('utf8')+"d")
control = form.find_control(type="textarea", nr=0)
self.assert_(control.name is None)
- self.assert_(control.value == "blah, blah,\nRhubarb.\n\n")
+ self.assert_(control.value == "blah, blah,\r\nRhubarb.\r\n\r\n")
empty_control = form.find_control(type="textarea", nr=1)
self.assert_(str(empty_control) == "<TextareaControl(<None>=)>")
@@ -499,7 +605,7 @@
form = forms[0]
self.assert_(form.controls[0].name is None)
- def testNamelessListControls(self):
+ def testNamelessListItems(self):
# XXX SELECT
# these controls have no item names
file = StringIO("""<form action="./weird.html">
@@ -621,7 +727,18 @@
single_control = form.find_control(type="select", nr=1)
self.assert_(single_control.value == ["1"])
+ def test_close_base_tag(self):
+ # Benji York: a single newline immediately after a start tag is
+ # stripped by browsers, but not one immediately before an end tag.
+ # TEXTAREA content is converted to the DOS newline convention.
+ forms = ClientForm.ParseFile(
+ StringIO("<form><textarea>\n\nblah\n</textarea></form>"),
+ "http://example.com/",
+ )
+ ctl = forms[0].find_control(type="textarea")
+ self.assertEqual(ctl.value, "\r\nblah\r\n")
+
class DisabledTests(TestCase):
def testOptgroup(self):
for compat in [False, True]:
@@ -2012,6 +2129,22 @@
else:
self.assertRaises(AmbiguityError, fc, label="Book")
+ def test_find_nameless_control(self):
+ data = """\
+<form>
+ <input type="checkbox"/>
+ <input type="checkbox" id="a" onclick="blah()"/>
+</form>
+"""
+ f = StringIO(data)
+ form = ClientForm.ParseFile(f, "http://example.com/",
+ backwards_compat=False)[0]
+ self.assertRaises(
+ AmbiguityError,
+ form.find_control, type="checkbox", name=ClientForm.Missing)
+ ctl = form.find_control(type="checkbox", name=ClientForm.Missing, nr=1)
+ self.assertEqual(ctl.id, "a")
+
def test_deselect_disabled(self):
def get_new_form(f, compat):
f.seek(0)
@@ -2974,7 +3107,62 @@
if compat:
reset_deprecations()
+ def test_nameless_list_control(self):
+ # ListControls are built up from elements that match by name and type
+ # attributes. Nameless controls cause some tricky cases. We should
+ # get a new control for nameless controls.
+ for data in [
+ """\
+<form>
+ <input type="checkbox" name="foo"/>
+ <input type="checkbox" name="bar"/>
+ <input type="checkbox" id="a" onclick="bar()" checked />
+</form>
+""",
+"""\
+<form>
+ <input type="checkbox" name="foo"/>
+ <input type="checkbox" id="a" onclick="bar()" checked />
+</form>
+""",
+"""\
+<form>
+ <input type="checkbox"/>
+ <input type="checkbox"/>
+ <input type="checkbox" id="a" onclick="bar()" checked />
+</form>
+""",
+ ]:
+ f = StringIO(data)
+ form = ClientForm.ParseFile(f, "http://example.com/",
+ backwards_compat=False)[0]
+ bar = form.find_control(type="checkbox", id="a")
+ # should have value "on", but not be successful
+ self.assertEqual([item.name for item in bar.items], ["on"])
+ self.assertEqual(bar.value, [])
+ self.assertEqual(form.click_pairs(), [])
+ def test_action_with_fragment(self):
+ for method in ["GET", "POST"]:
+ data = ('<form action="" method="%s">'
+ '<input type="submit" name="s"/></form>' % method
+ )
+ f = StringIO(data)
+ form = ClientForm.ParseFile(f, "http://example.com/",
+ backwards_compat=False)[0]
+ self.assertEqual(
+ form.click().get_full_url(),
+ "http://example.com/"+(method=="GET" and "?s=" or ""),
+ )
+ data = '<form action=""><isindex /></form>'
+ f = StringIO(data)
+ form = ClientForm.ParseFile(f, "http://example.com/",
+ backwards_compat=False)[0]
+ form.find_control(type="isindex").value = "blah"
+ self.assertEqual(form.click(type="isindex").get_full_url(),
+ "http://example.com/?blah")
+
+
class ContentTypeTests(TestCase):
def test_content_type(self):
import ClientForm
@@ -3003,6 +3191,32 @@
self.assertEqual(req.ah, not auh)
+class FunctionTests(TestCase):
+
+ def test_normalize_line_endings(self):
+ def check(text, expected, self=self):
+ got = ClientForm.normalize_line_endings(text)
+ self.assertEqual(got, expected)
+
+ # unix
+ check("foo\nbar", "foo\r\nbar")
+ check("foo\nbar\n", "foo\r\nbar\r\n")
+ # mac
+ check("foo\rbar", "foo\r\nbar")
+ check("foo\rbar\r", "foo\r\nbar\r\n")
+ # dos
+ check("foo\r\nbar", "foo\r\nbar")
+ check("foo\r\nbar\r\n", "foo\r\nbar\r\n")
+
+ # inconsistent -- we just blithely convert anything that looks like a
+ # line ending to the DOS convention, following Firefox's behaviour when
+ # normalizing textarea content
+ check("foo\r\nbar\nbaz\rblah\r\n", "foo\r\nbar\r\nbaz\r\nblah\r\n")
+
+ # pathological ;-O
+ check("\r\n\n\r\r\r\n", "\r\n"*5)
+
+
def startswith(string, initial):
if len(initial) > len(string): return False
return string[:len(initial)] == initial
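The new `FunctionTests` fully specify `normalize_line_endings`: every `\r\n`, lone `\n`, and lone `\r` becomes `\r\n`. `test_close_base_tag` adds one more rule for TEXTAREA content: a single newline directly after the start tag is dropped before normalizing. The following is a minimal regex-based sketch that satisfies both tests. It is not necessarily how ClientForm.py implements them, and `textarea_value` is a hypothetical helper.

```python
import re

# Try the two-character DOS ending first so "\r\n" is not split into
# a Mac "\r" followed by a Unix "\n".
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def normalize_line_endings(text):
    """Convert every Unix, Mac or DOS line ending to DOS (\\r\\n)."""
    return _LINE_ENDING.sub("\r\n", text)


def textarea_value(raw):
    """Hypothetical helper: the value of a TEXTAREA given its raw content.

    Browsers strip one newline immediately after the start tag, but not
    one immediately before the end tag.
    """
    if raw.startswith("\r\n"):
        raw = raw[2:]
    elif raw[:1] in ("\r", "\n"):
        raw = raw[1:]
    return normalize_line_endings(raw)
```

Because the alternation is ordered, the pathological input `"\r\n\n\r\r\r\n"` tokenizes as `\r\n`, `\n`, `\r`, `\r`, `\r\n`. This gives the five DOS line endings the test expects.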
More information about the pkg-zope-commits mailing list