[SCM] WebKit Debian packaging branch, debian/unstable, updated. debian/1.1.15-1-40151-g37bb677

sullivan sullivan at 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Sat Sep 26 07:49:21 UTC 2009


The following commit has been merged in the debian/unstable branch:
commit da45c146ce8c3e31c84ac67621ae4fd07d6f024f
Author: sullivan <sullivan at 268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Date:   Wed Jul 30 19:04:07 2003 +0000

    JavaScriptCore:
    
    	- JavaScriptCore part of fix for 3284525 -- AutoFill fills in
    	only e-mail address field of New Account form on Apple Store Japan
    
            Reviewed by Darin
    
            * JavaScriptCore.pbproj/project.pbxproj:
    	Mark pcre.h as a Private header
    
    WebCore:
    
    	- WebCore part of fix for 3284525 -- AutoFill fills in
    	only e-mail address field of New Account form on Apple Store Japan
    
    	There were two problems: the regex library being used by
    	KWQRegExp.mm didn't handle unicode at all, and the way we
    	were using word boundaries in our regular expressions didn't
    	work with Japanese.
    
            Reviewed by Darin
    
            * kwq/KWQKHTMLPart.mm:
            (regExpForLabels):
    	Redid the way word boundaries are used; the old way didn't
    	work with PCRE, and also didn't work with Japanese.
    
            * kwq/KWQRegExp.h:
    	removed treatStartAsStartOfInput parameter to match() that Trey had added;
    	it was being used incorrectly and was not necessary.
    
            * kwq/KWQRegExp.mm:
            (compareStringOffsets), (createSortedOffsetsArray),
            (convertCharacterOffsetsToUTF8ByteOffsets),
            (convertUTF8ByteOffsetsToCharacterOffsets):
    	Code copied from JavaScriptCore/regexp.cpp to convert between
    	byte and character offsets. Darin preferred that I copy these
    	methods rather than make them public in JavaScriptCore/regexp.h.
            (QRegExp::KWQRegExpPrivate::compile):
    	converted from regex.h style to pcre.h style
            (QRegExp::KWQRegExpPrivate::~KWQRegExpPrivate):
    	ditto
            (QRegExp::match):
    	ditto
            (QRegExp::search):
    	removed parameter to match()
            (QRegExp::searchRev):
    	ditto
    
            * kwq/KWQString.mm:
            (QString::replace):
    	removed parameter to match()
    
    WebBrowser:
    
    	- WebBrowser part of fix for 3284525 -- AutoFill fills in
    	only e-mail address field of New Account form on Apple Store Japan
    
            Reviewed by Darin
    
            * FormCompletionController.m:
            (-[FormToABBinder _indexMapping:]):
    	Replace ASSERT with ERROR when label tables have duplicates.
    	The Japanese localized file that I was using to test with had
    	such duplicates, and was crashing on me.
    
    
    git-svn-id: http://svn.webkit.org/repository/webkit/trunk@4731 268f45cc-cd09-0410-ab3c-d52691b4dbfc
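
The word-boundary change to regExpForLabels described in the commit message above can be sketched in portable C++. This is an illustrative sketch only, not the committed code: the names isAsciiWordChar and buildLabelPattern are hypothetical, std::string stands in for QString, and an ASCII-only test replaces the commit's "\w" regular expression. As in the patch, labels are appended without escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ASCII-only notion of a "word character" ([A-Za-z0-9_]).
// Bytes of a multi-byte UTF-8 sequence (e.g. Japanese text) never qualify.
static bool isAsciiWordChar(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

// Builds "(STR1|STR2|STRN)", wrapping a label side in \b only when that
// side of the label starts or ends with a word character.
std::string buildLabelPattern(const std::vector<std::string>& labels)
{
    std::string pattern = "(";
    for (std::size_t i = 0; i < labels.size(); ++i) {
        const std::string& label = labels[i];
        const bool startsWithWordChar = !label.empty() && isAsciiWordChar(label.front());
        const bool endsWithWordChar = !label.empty() && isAsciiWordChar(label.back());

        if (i != 0) {
            pattern += "|";
        }
        if (startsWithWordChar) {
            pattern += "\\b";
        }
        pattern += label;
        if (endsWithWordChar) {
            pattern += "\\b";
        }
    }
    pattern += ")";
    return pattern;
}
```

With this sketch, an English label such as "name" is emitted as \bname\b, while a label made only of Japanese characters is emitted bare, so a match does not depend on a word boundary that the engine may never report for that script.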

diff --git a/JavaScriptCore/ChangeLog b/JavaScriptCore/ChangeLog
index 001f7e2..4eaea43 100644
--- a/JavaScriptCore/ChangeLog
+++ b/JavaScriptCore/ChangeLog
@@ -1,3 +1,13 @@
+2003-07-30  John Sullivan  <sullivan at apple.com>
+
+	- JavaScriptCore part of fix for 3284525 -- AutoFill fills in 
+	only e-mail address field of New Account form on Apple Store Japan
+
+        Reviewed by Darin
+
+        * JavaScriptCore.pbproj/project.pbxproj:
+	Mark pcre.h as a Private header
+
 2003-07-28  Maciej Stachowiak  <mjs at apple.com>
 
         Reviewed by Richard.
diff --git a/JavaScriptCore/ChangeLog-2003-10-25 b/JavaScriptCore/ChangeLog-2003-10-25
index 001f7e2..4eaea43 100644
--- a/JavaScriptCore/ChangeLog-2003-10-25
+++ b/JavaScriptCore/ChangeLog-2003-10-25
@@ -1,3 +1,13 @@
+2003-07-30  John Sullivan  <sullivan at apple.com>
+
+	- JavaScriptCore part of fix for 3284525 -- AutoFill fills in 
+	only e-mail address field of New Account form on Apple Store Japan
+
+        Reviewed by Darin
+
+        * JavaScriptCore.pbproj/project.pbxproj:
+	Mark pcre.h as a Private header
+
 2003-07-28  Maciej Stachowiak  <mjs at apple.com>
 
         Reviewed by Richard.
diff --git a/JavaScriptCore/JavaScriptCore.pbproj/project.pbxproj b/JavaScriptCore/JavaScriptCore.pbproj/project.pbxproj
index ec3975e..5f571e5 100644
--- a/JavaScriptCore/JavaScriptCore.pbproj/project.pbxproj
+++ b/JavaScriptCore/JavaScriptCore.pbproj/project.pbxproj
@@ -564,6 +564,9 @@
 			fileRef = 6541720F039E08B90058BFEB;
 			isa = PBXBuildFile;
 			settings = {
+				ATTRIBUTES = (
+					Private,
+				);
 			};
 		};
 		65417217039E0B280058BFEB = {
diff --git a/WebCore/ChangeLog-2003-10-25 b/WebCore/ChangeLog-2003-10-25
index e0ca4ce..5ae529e 100644
--- a/WebCore/ChangeLog-2003-10-25
+++ b/WebCore/ChangeLog-2003-10-25
@@ -1,3 +1,46 @@
+2003-07-30  John Sullivan  <sullivan at apple.com>
+
+	- WebCore part of fix for 3284525 -- AutoFill fills in 
+	only e-mail address field of New Account form on Apple Store Japan
+
+	There were two problems: the regex library being used by
+	KWQRegExp.mm didn't handle unicode at all, and the way we
+	were using word boundaries in our regular expressions didn't
+	work with Japanese.
+
+        Reviewed by Darin
+
+        * kwq/KWQKHTMLPart.mm:
+        (regExpForLabels):
+	Redid the way word boundaries are used; the old way didn't
+	work with PCRE, and also didn't work with Japanese. 
+
+        * kwq/KWQRegExp.h:
+	removed treatStartAsStartOfInput parameter to match() that Trey had added; 
+	it was being used incorrectly and was not necessary.
+
+        * kwq/KWQRegExp.mm:
+        (compareStringOffsets), (createSortedOffsetsArray),
+        (convertCharacterOffsetsToUTF8ByteOffsets),
+        (convertUTF8ByteOffsetsToCharacterOffsets):
+	Code copied from JavaScriptCore/regexp.cpp to convert between
+	byte and character offsets. Darin preferred that I copy these
+	methods rather than make them public in JavaScriptCore/regexp.h.
+        (QRegExp::KWQRegExpPrivate::compile):
+	converted from regex.h style to pcre.h style 
+        (QRegExp::KWQRegExpPrivate::~KWQRegExpPrivate):
+	ditto
+        (QRegExp::match):
+	ditto
+        (QRegExp::search):
+	removed parameter to match()
+        (QRegExp::searchRev):
+	ditto
+
+        * kwq/KWQString.mm:
+        (QString::replace):
+	removed parameter to match()
+
 2003-07-30  Richard Williamson   <rjw at apple.com>
 
 	Fixed 3349598.  Deal gracefully with <li> items that
diff --git a/WebCore/ChangeLog-2005-08-23 b/WebCore/ChangeLog-2005-08-23
index e0ca4ce..5ae529e 100644
--- a/WebCore/ChangeLog-2005-08-23
+++ b/WebCore/ChangeLog-2005-08-23
@@ -1,3 +1,46 @@
+2003-07-30  John Sullivan  <sullivan at apple.com>
+
+	- WebCore part of fix for 3284525 -- AutoFill fills in 
+	only e-mail address field of New Account form on Apple Store Japan
+
+	There were two problems: the regex library being used by
+	KWQRegExp.mm didn't handle unicode at all, and the way we
+	were using word boundaries in our regular expressions didn't
+	work with Japanese.
+
+        Reviewed by Darin
+
+        * kwq/KWQKHTMLPart.mm:
+        (regExpForLabels):
+	Redid the way word boundaries are used; the old way didn't
+	work with PCRE, and also didn't work with Japanese. 
+
+        * kwq/KWQRegExp.h:
+	removed treatStartAsStartOfInput parameter to match() that Trey had added; 
+	it was being used incorrectly and was not necessary.
+
+        * kwq/KWQRegExp.mm:
+        (compareStringOffsets), (createSortedOffsetsArray),
+        (convertCharacterOffsetsToUTF8ByteOffsets),
+        (convertUTF8ByteOffsetsToCharacterOffsets):
+	Code copied from JavaScriptCore/regexp.cpp to convert between
+	byte and character offsets. Darin preferred that I copy these
+	methods rather than make them public in JavaScriptCore/regexp.h.
+        (QRegExp::KWQRegExpPrivate::compile):
+	converted from regex.h style to pcre.h style 
+        (QRegExp::KWQRegExpPrivate::~KWQRegExpPrivate):
+	ditto
+        (QRegExp::match):
+	ditto
+        (QRegExp::search):
+	removed parameter to match()
+        (QRegExp::searchRev):
+	ditto
+
+        * kwq/KWQString.mm:
+        (QString::replace):
+	removed parameter to match()
+
 2003-07-30  Richard Williamson   <rjw at apple.com>
 
 	Fixed 3349598.  Deal gracefully with <li> items that
diff --git a/WebCore/kwq/KWQKHTMLPart.mm b/WebCore/kwq/KWQKHTMLPart.mm
index c4f6e49..eb57a48 100644
--- a/WebCore/kwq/KWQKHTMLPart.mm
+++ b/WebCore/kwq/KWQKHTMLPart.mm
@@ -260,7 +260,7 @@ HTMLFormElementImpl *KWQKHTMLPart::currentForm() const
 }
 
 // Either get cached regexp or build one that matches any of the labels.
-// The regexp we build is of the form:   [[:<:]](STR1|STR2|STRN)[[:>:]]
+// The regexp we build is of the form:  (STR1|STR2|STRN)
 QRegExp *regExpForLabels(NSArray *labels)
 {
     // Parallel arrays that we use to cache regExps.  In practice the number of expressions
@@ -268,6 +268,7 @@ QRegExp *regExpForLabels(NSArray *labels)
     static const unsigned int regExpCacheSize = 4;
     static NSMutableArray *regExpLabels = nil;
     static QPtrList <QRegExp> regExps;
+    static QRegExp wordRegExp = QRegExp("\\w");
 
     QRegExp *result;
     if (!regExpLabels) {
@@ -277,17 +278,34 @@ QRegExp *regExpForLabels(NSArray *labels)
     if (cacheHit != NSNotFound) {
         result = regExps.at(cacheHit);
     } else {
-        QString pattern("[[:<:]](");
+        QString pattern("(");
         unsigned int numLabels = [labels count];
         unsigned int i;
         for (i = 0; i < numLabels; i++) {
             QString label = QString::fromNSString([labels objectAtIndex:i]);
+
+            bool startsWithWordChar = false;
+            bool endsWithWordChar = false;
+            if (label.length() != 0) {
+                startsWithWordChar = wordRegExp.search(label.at(0)) >= 0;
+                endsWithWordChar = wordRegExp.search(label.at(label.length() - 1)) >= 0;
+            }
+            
             if (i != 0) {
                 pattern.append("|");
             }
+            // Search for word boundaries only if label starts/ends with "word characters".
+            // If we always searched for word boundaries, this wouldn't work for languages
+            // such as Japanese.
+            if (startsWithWordChar) {
+                pattern.append("\\b");
+            }
             pattern.append(label);
+            if (endsWithWordChar) {
+                pattern.append("\\b");
+            }
         }
-        pattern.append(")[[:>:]]");
+        pattern.append(")");
         result = new QRegExp(pattern, false);
     }
 
diff --git a/WebCore/kwq/KWQRegExp.h b/WebCore/kwq/KWQRegExp.h
index 8a157ac..60af0c5 100644
--- a/WebCore/kwq/KWQRegExp.h
+++ b/WebCore/kwq/KWQRegExp.h
@@ -41,7 +41,7 @@ public:
     QRegExp &operator=(const QRegExp &);
 
     QString pattern() const;
-    int match(const QString &, int startFrom = 0, int *matchLength = 0, bool treatStartAsStartOfInput = true) const;
+    int match(const QString &, int startFrom = 0, int *matchLength = 0) const;
 
     int search(const QString &, int startFrom = 0) const;
     int searchRev(const QString &) const;
diff --git a/WebCore/kwq/KWQRegExp.mm b/WebCore/kwq/KWQRegExp.mm
index d0f5237..9532584 100644
--- a/WebCore/kwq/KWQRegExp.mm
+++ b/WebCore/kwq/KWQRegExp.mm
@@ -23,12 +23,110 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
  */
 
-
 #import "KWQRegExp.h"
 #import "KWQLogging.h"
 
 #import <sys/types.h>
-#import <regex.h>
+#import <JavaScriptCore/pcre.h>
+
+
+// Functions to convert between byte offsets and character offsets were
+// lifted from JavaScriptCore/regexp.cpp. It would be nice to share this code.
+struct StringOffset {
+    int offset;
+    int locationInOffsetsArray;
+};
+
+static int compareStringOffsets(const void *a, const void *b)
+{
+    const StringOffset *oa = static_cast<const StringOffset *>(a);
+    const StringOffset *ob = static_cast<const StringOffset *>(b);
+    
+    if (oa->offset < ob->offset) {
+        return -1;
+    }
+    if (oa->offset > ob->offset) {
+        return +1;
+    }
+    return 0;
+}
+
+const int sortedOffsetsFixedBufferSize = 128;
+
+static StringOffset *createSortedOffsetsArray(const int offsets[], int numOffsets,
+                                              StringOffset sortedOffsetsFixedBuffer[sortedOffsetsFixedBufferSize])
+{
+    // Allocate the sorted offsets.
+    StringOffset *sortedOffsets;
+    if (numOffsets <= sortedOffsetsFixedBufferSize) {
+        sortedOffsets = sortedOffsetsFixedBuffer;
+    } else {
+        sortedOffsets = new StringOffset [numOffsets];
+    }
+    
+    // Copy offsets.
+    for (int i = 0; i != numOffsets; ++i) {
+        sortedOffsets[i].offset = offsets[i];
+        sortedOffsets[i].locationInOffsetsArray = i;
+    }
+    
+    // Sort them.
+    qsort(sortedOffsets, numOffsets, sizeof(StringOffset), compareStringOffsets);
+    
+    return sortedOffsets;
+}
+
+static void convertCharacterOffsetsToUTF8ByteOffsets(const char *s, int *offsets, int numOffsets)
+{
+    // Allocate buffer.
+    StringOffset fixedBuffer[sortedOffsetsFixedBufferSize];
+    StringOffset *sortedOffsets = createSortedOffsetsArray(offsets, numOffsets, fixedBuffer);
+    
+    // Walk through sorted offsets and string, adjusting all the offsets.
+    // Offsets that are off the ends of the string map to the edges of the string.
+    int characterOffset = 0;
+    const char *p = s;
+    for (int oi = 0; oi != numOffsets; ++oi) {
+        const int nextOffset = sortedOffsets[oi].offset;
+        while (*p && characterOffset < nextOffset) {
+            // Skip to the next character.
+            ++characterOffset;
+            do ++p; while ((*p & 0xC0) == 0x80); // skip UTF-8 continuation bytes (10xxxxxx); they never start a character
+        }
+        offsets[sortedOffsets[oi].locationInOffsetsArray] = p - s;
+    }
+    
+    // Free buffer.
+    if (sortedOffsets != fixedBuffer) {
+        delete [] sortedOffsets;
+    }
+}
+
+static void convertUTF8ByteOffsetsToCharacterOffsets(const char *s, int *offsets, int numOffsets)
+{
+    // Allocate buffer.
+    StringOffset fixedBuffer[sortedOffsetsFixedBufferSize];
+    StringOffset *sortedOffsets = createSortedOffsetsArray(offsets, numOffsets, fixedBuffer);
+    
+    // Walk through sorted offsets and string, adjusting all the offsets.
+    // Offsets that are off the end of the string map to the edges of the string.
+    int characterOffset = 0;
+    const char *p = s;
+    for (int oi = 0; oi != numOffsets; ++oi) {
+        const int nextOffset = sortedOffsets[oi].offset;
+        while (*p && (p - s) < nextOffset) {
+            // Skip to the next character.
+            ++characterOffset;
+            do ++p; while ((*p & 0xC0) == 0x80); // skip UTF-8 continuation bytes (10xxxxxx); they never start a character
+        }
+        offsets[sortedOffsets[oi].locationInOffsetsArray] = characterOffset;
+    }
+    
+    // Free buffer.
+    if (sortedOffsets != fixedBuffer) {
+        delete [] sortedOffsets;
+    }
+}
 
 
 class QRegExp::KWQRegExpPrivate
@@ -41,8 +139,8 @@ public:
     void compile(bool caseSensitive, bool glob);
 
     QString pattern;
-    regex_t regex;
-
+    pcre *regex;
+    
     uint refCount;
 
     int lastMatchPos;
@@ -94,18 +192,28 @@ void QRegExp::KWQRegExpPrivate::compile(bool caseSensitive, bool glob)
     // Note we don't honor the Qt syntax for various character classes.  If we convert
     // to a different underlying engine, we may need to change client code that relies
     // on the regex syntax (see KWQKHTMLPart.mm for a couple examples).
-
-    const char *cpattern = p.latin1();
-
-    int err = regcomp(&regex, cpattern, REG_EXTENDED | (caseSensitive ? 0 : REG_ICASE));
-    if (err) {
-        ERROR("regcomp failed with error=%d", err);
+    
+    QCString asUTF8;
+    const char *cpattern;
+    
+    if (p.isAllASCII()) {
+        cpattern = p.ascii();
+    } else {
+        asUTF8 = p.utf8();
+        cpattern = asUTF8;
+    }
+        
+    const char *errorMessage;
+    int errorOffset;
+    regex = pcre_compile(cpattern, PCRE_UTF8 | (caseSensitive ? 0 : PCRE_CASELESS), &errorMessage, &errorOffset, NULL);
+    if (regex == NULL) {
+        ERROR("KWQRegExp: pcre_compile failed with '%s'", errorMessage);
     }
 }
 
 QRegExp::KWQRegExpPrivate::~KWQRegExpPrivate()
 {
-    regfree(&regex);
+    pcre_free(regex);
 }
 
 
@@ -146,31 +254,42 @@ QString QRegExp::pattern() const
     return d->pattern;
 }
 
-int QRegExp::match(const QString &str, int startFrom, int *matchLength, bool treatStartAsStartOfInput) const
+int QRegExp::match(const QString &str, int startFrom, int *matchLength) const
 {    
-    const char *cstring = str.latin1() + startFrom;
-
-    int flags = 0;
-
-    if (startFrom != 0 && !treatStartAsStartOfInput) {
-	flags |= REG_NOTBOL;
+    QCString asUTF8;
+    const char *cstring;
+    
+    if (str.isAllASCII()) {
+        cstring = str.ascii();
+    } else {
+        asUTF8 = str.utf8();
+        cstring = asUTF8;
     }
-
-    regmatch_t match[1];
-    int result = regexec(&d->regex, cstring, 1, match, flags);
-
-    if (result != 0) {
+        
+    // first 2 offsets are start and end offsets; 3rd entry is used internally by pcre
+    int offsets[3];
+    convertCharacterOffsetsToUTF8ByteOffsets(cstring, &startFrom, 1);
+    int result = pcre_exec(d->regex, NULL, cstring, strlen(cstring), startFrom, 
+                           startFrom == 0 ? 0 : PCRE_NOTBOL, offsets, 3);
+    
+    if (result < 0) {
+        if (result != PCRE_ERROR_NOMATCH) {
+            ERROR("KWQRegExp: pcre_exec() failed with result %d", result);
+        }
         d->lastMatchPos = -1;
         d->lastMatchLength = -1;
-	return -1;
-    } else {
-        d->lastMatchPos = startFrom + match[0].rm_so;
-        d->lastMatchLength = match[0].rm_eo - match[0].rm_so;
-	if (matchLength != NULL) {
-            *matchLength = d->lastMatchLength;
-	}
-        return d->lastMatchPos;
+        return -1;
     }
+    
+    ASSERT(result < 2);
+    // 1 means only the overall match was captured; 0 means offsets was too small to hold every captured substring, but the overall match is still recorded in offsets[0..1]
+    convertUTF8ByteOffsetsToCharacterOffsets(cstring, offsets, 2);
+    d->lastMatchPos = offsets[0];
+    d->lastMatchLength = offsets[1] - offsets[0];
+    if (matchLength != NULL) {
+        *matchLength = d->lastMatchLength;
+    }
+    return d->lastMatchPos;
 }
 
 int QRegExp::search(const QString &str, int startFrom) const
@@ -178,7 +297,7 @@ int QRegExp::search(const QString &str, int startFrom) const
     if (startFrom < 0) {
         startFrom = str.length() - startFrom;
     }
-    return match(str, startFrom, NULL, false);
+    return match(str, startFrom, NULL);
 }
 
 int QRegExp::searchRev(const QString &str) const
@@ -190,7 +309,7 @@ int QRegExp::searchRev(const QString &str) const
     int lastMatchLength = -1;
     do {
         int matchLength;
-        pos = match(str, start, &matchLength, start == 0);
+        pos = match(str, start, &matchLength);
         if (pos >= 0) {
             if ((pos+matchLength) > (lastPos+lastMatchLength)) {
                 // replace last match if this one is later and not a subset of the last match
diff --git a/WebCore/kwq/KWQString.mm b/WebCore/kwq/KWQString.mm
index 24f5b55..a208440 100644
--- a/WebCore/kwq/KWQString.mm
+++ b/WebCore/kwq/KWQString.mm
@@ -2306,7 +2306,7 @@ QString &QString::replace(const QRegExp &qre, const QString &str)
     int slen  = str.dataHandle[0]->_length;
     int len;
     while ( index < (int)dataHandle[0]->_length ) {
-	index = qre.match( *this, index, &len, FALSE );
+	index = qre.match( *this, index, &len);
 	if ( index >= 0 ) {
 	    replace( index, len, str );
 	    index += slen;
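
The offset conversion that the KWQRegExp.mm hunk performs around pcre_exec() can be illustrated with a simplified single-offset sketch. It is not the committed code: the patch converts a whole array of offsets in one pass by sorting them first, whereas the hypothetical functions charOffsetToByteOffset and byteOffsetToCharOffset below each convert one offset, use std::string rather than a C string, and count one character per UTF-8 sequence.

```cpp
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, which continue a multi-byte
// UTF-8 sequence and therefore never start a character.
static bool isContinuationByte(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

// Character offset -> byte offset. Offsets past the end of the string
// map to the end of the string.
std::size_t charOffsetToByteOffset(const std::string& utf8, std::size_t charOffset)
{
    std::size_t bytePos = 0;
    std::size_t charPos = 0;
    while (bytePos < utf8.size() && charPos < charOffset) {
        ++charPos;
        // Advance past the lead byte and any continuation bytes.
        do {
            ++bytePos;
        } while (bytePos < utf8.size() && isContinuationByte(utf8[bytePos]));
    }
    return bytePos;
}

// Byte offset -> character offset. Offsets past the end of the string
// map to the total number of characters.
std::size_t byteOffsetToCharOffset(const std::string& utf8, std::size_t byteOffset)
{
    std::size_t bytePos = 0;
    std::size_t charPos = 0;
    while (bytePos < utf8.size() && bytePos < byteOffset) {
        ++charPos;
        do {
            ++bytePos;
        } while (bytePos < utf8.size() && isContinuationByte(utf8[bytePos]));
    }
    return charPos;
}
```

The conversion is needed in both directions because callers of match() pass and receive character offsets, while pcre_exec() in PCRE_UTF8 mode takes its start offset and reports its match offsets in bytes.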

-- 
WebKit Debian packaging


