[Webkit-unassigned] [Bug 16122] When posting to these boards, all Safari users have "webkitformboundary-gibberish" appended to their name and message

bugzilla-daemon at webkit.org
Tue Mar 24 16:34:46 PDT 2009


https://bugs.webkit.org/show_bug.cgi?id=16122





------- Comment #11 from billmonk2 at gmail.com  2009-03-24 16:34 PDT -------
The issue is not really WebKit's, though WebKit has worked around similar
issues before. Below is a repeatable explanation of why this issue occurs. A
patch will be submitted shortly.

When WebBBS 5.12 parses a POST, it uses a perl regex to locate form boundaries,
then uses the found boundary text itself as a regex to decide whether to skip
boundary text or accept it as user text. I'm sure you can guess where this is
going...

In /webkit/WebCore/platform/network/FormDataBuilder.cpp,
::generateUniqueBoundaryString() creates random strings from the set of
alphanumeric characters plus the '+' character. A comment there mentions that a
few other legal characters (according to RFC 2046) have been omitted because
Gmail and possibly other sites have problems with them; see
<http://bugs.webkit.org/show_bug.cgi?id=13352> and <rdar://problem/5252577>.

Thus WebKit generates boundary strings which legally, though randomly, may
contain '+' characters.
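A minimal C++ sketch of how such a generator ends up emitting a '+'. It is not WebKit's code: it assumes a "----WebKitFormBoundary" prefix followed by 16 characters drawn from the 64-entry table quoted later in this comment, and it uses std::mt19937 as a stand-in for whatever random source WebKit actually uses. The function name makeBoundary is hypothetical.

```cpp
#include <cassert>
#include <random>
#include <string>

// The 64-entry table discussed below: A-Z, a-z, 0-9, '+', and a duplicate
// 'A' in the last slot (the string literal adds a trailing NUL).
static const char kEncodingMap[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+A";

// Hypothetical generator: prefix plus 16 random table entries. Each entry is
// picked with the low 6 bits (0..63) of a 32-bit Mersenne Twister output.
std::string makeBoundary(std::mt19937& rng) {
    std::string boundary = "----WebKitFormBoundary";
    for (int i = 0; i < 16; ++i)
        boundary += kEncodingMap[rng() & 0x3F];
    return boundary;
}
```

Seeding the generator with a fixed value makes runs repeatable; under these assumptions a noticeable fraction of the generated boundaries contain at least one '+'.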

Next, consider the latest WebBBS Perl files (last updated 2002), available at
http://awsd.com/download/webbbs/webbbs_files.zip

In the file webbbs_post.pl, the following code occurs at line 28:

    sub Parse_Post {
        if ($ENV{'CONTENT_TYPE'} =~ /boundary=(\"?([^\";,]+)\"?)*/) { $boundary = $1; }
        binmode STDIN;
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
        @buffer = split(/\r\n/,$buffer);
        foreach $line (@buffer) {
            if ($line =~ /$boundary/) { $Current = ""; next; }

    ...

    }

The first regex tries to identify the random portions of form boundaries as
"one or more characters which are not a quote symbol, a semicolon, or a comma."
This is not the same thing as the set of legal characters identified by RFC
2046. Still, it seems to work more or less reliably as-is. 
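To make the first step concrete, here is a simplified C++ analogue of that extraction using std::regex. It is an approximation, not a port: the outer group's trailing '*' from the Perl pattern is dropped and the capture excludes any surrounding quotes. The name extractBoundary is hypothetical. Note that '+' is accepted by the character class, so a boundary containing '+' is found intact.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Simplified analogue of WebBBS's first regex: capture the boundary parameter
// from a Content-Type header value, stopping at a quote, ';' or ','.
std::optional<std::string> extractBoundary(const std::string& contentType) {
    static const std::regex re(R"re(boundary="?([^";,]+)"?)re");
    std::smatch m;
    if (std::regex_search(contentType, m, re))
        return m[1].str();
    return std::nullopt;
}
```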

The second regex, in this code

        foreach $line (@buffer) {
                if ($line =~ /$boundary/) { $Current = ""; next; }

then uses the boundary itself as a regex, to decide whether to ignore the
boundary. I'm certain you see where this is going...

Given that WebKit can randomly emit boundaries containing '+' characters,
suppose a given boundary is "a+b". The first regex finds it. The second regex
then asks "does the line contain one or more 'a' characters immediately
followed by a 'b'?" The boundary line "--a+b" contains no such sequence, so the
test fails. Thus the boundary text is processed as if it were user input,
causing "------WebKitFormBoundary" (followed by the random boundary characters,
always containing a '+') to randomly appear in WebBBS user input.
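The same failure can be reproduced outside Perl. This sketch mirrors `if ($line =~ /$boundary/)` with std::regex_search, which, like Perl's `=~`, searches anywhere in the line rather than requiring a whole-line match. ECMAScript regex syntax differs from Perl's in places, but '+' is a quantifier in both. The helper name lineMatchesBoundary is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Mirrors `$line =~ /$boundary/`: the boundary text is compiled as a regex
// pattern (not a literal string) and searched for within the line.
bool lineMatchesBoundary(const std::string& line, const std::string& boundary) {
    return std::regex_search(line, std::regex(boundary));
}
```

With a boundary of "a+b", the line "--a+b" is not recognized, while an unrelated line such as "aab" is.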

This can be tested by going to the sites mentioned in previous comments:
http://wwwboard.modelcarkits.com/
http://www.misterguitar.us/cgi-bin/chetboard.pl

Scroll to bottom of page, click "Post New Message", enter anything for name,
subject, and message body, then click the "Preview Message" button.
If "------WebKitFormBoundary...." does not appear in various fields, WebKit's
randomly-generated form boundary text did not contain a '+' character and so
did not trigger the WebBBS regex escaping issue. Continue clicking the "Preview
Message" button until the boundary text appears - it may take a few tries. When
it does, note that the "gibberish" or "garbage characters" reported by users
following the "WebKitFormBoundary" contains one or more '+' characters.
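How many tries to expect can be estimated with a little arithmetic. Assuming 16 random characters per boundary (as in WebKit boundaries commonly seen on the wire), each drawn uniformly from the 64-entry table in which '+' occupies exactly one slot, the chance that a boundary contains at least one '+' is 1 - (63/64)^16, about 0.22. The number of previews until the first failure is then geometric, with mean 1/p, roughly 4.5. The function names below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Probability that a boundary of `length` random characters contains at
// least one '+', when '+' is 1 of 64 equally likely table entries.
double plusProbability(int length) {
    return 1.0 - std::pow(63.0 / 64.0, length);
}

// Mean number of independent attempts until the first success
// (expected value of a geometric distribution with success probability p).
double expectedTries(double p) { return 1.0 / p; }
```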

This bug can also trigger WebBBS' rudimentary "bad language filter" - a file of
"undesirable phrases" which can optionally be supplied by the board's admin. If
such a file is in use, it is not uncommon for the random boundary text to
contain character sequences which are close enough to the target words to
trigger the filter, causing the board to reject the post and ask the user to
rephrase their submission. When this occurs, no amount of editing will clear
the restriction since the "problem word" is in the boundary text (most likely
interspersed with other characters and not immediately recognizable) and not in
the user's submission at all. In the course of investigating this issue, I
noticed some quite surprising "near-misses" being emitted.

A WebBBS fix (which works here, running it under OS X Web Sharing) is to escape
any '+' characters in the found boundary text before using the text as a regex:

        if ($ENV{'CONTENT_TYPE'} =~ /boundary=(\"?([^\";,]+)\"?)*/)
                { $boundary = $1; }
        # Escape '+' so it is matched literally when $boundary is used as a regex.
        $boundary =~ s/\+/\\+/g;

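The same escaping step, sketched in C++ for comparison. escapePlus does exactly what the Perl substitution does; escapeAllMeta is a broader variant that escapes every ECMAScript regex metacharacter. In Perl itself, the broader form is available as `quotemeta($boundary)` or by interpolating as `/\Q$boundary\E/`. Both function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Counterpart of `$boundary =~ s/\+/\\+/g;`: backslash before every '+'.
std::string escapePlus(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '+') out += '\\';
        out += c;
    }
    return out;
}

// Broader variant in the spirit of Perl's quotemeta: escape every character
// that is special in an ECMAScript regex outside a character class.
std::string escapeAllMeta(const std::string& s) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```

After escaping, the boundary line is recognized again, so it is skipped instead of being treated as user text.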

However, while the WebBBS site appears to be active, WebBBS itself has not seen
an update since 2002; it may effectively be abandonware. I see no obvious place
to submit patches there.

A WebKit workaround is to do what's been done before: in
generateUniqueBoundaryString(), omit the legal '+' from the array of characters
used to create the random boundary strings. 

In the array below, the next-to-last value (0x2B) would be replaced by some
other character in the array, as was done with the last value (which now
duplicates the first, due to http://bugs.webkit.org/show_bug.cgi?id=13352
and/or <rdar://problem/5252577>):

    // 0x2B ('+') is the next-to-last entry; the last entry, 0x41 ('A'),
    // duplicates the first.
    static const char alphaNumericEncodingMap[64] = {
        0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
        0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50,
        0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58,
        0x59, 0x5A, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66,
        0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E,
        0x6F, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76,
        0x77, 0x78, 0x79, 0x7A, 0x30, 0x31, 0x32, 0x33,
        0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x2B, 0x41
    };

Perhaps 'a' is a decent choice, for no reason other than that it's near the
middle of the array, halfway between the other two repeated characters.
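A sketch of what the table would look like with that substitution, plus a quick check that only letters and digits remain. The name kPatchedMap and the helper allAlphanumeric are hypothetical; this is the proposal described above, not committed WebKit code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// The table with the next-to-last entry (0x2B, '+') replaced by 0x61 ('a').
// The last entry still duplicates the first ('A').
static const char kPatchedMap[64] = {
    0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
    0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50,
    0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58,
    0x59, 0x5A, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66,
    0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E,
    0x6F, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76,
    0x77, 0x78, 0x79, 0x7A, 0x30, 0x31, 0x32, 0x33,
    0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x61, 0x41
};

// True if every entry is an ASCII letter or digit (no regex metacharacters).
bool allAlphanumeric(const char* map, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (!std::isalnum(static_cast<unsigned char>(map[i]))) return false;
    return true;
}
```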

In general, the legal boundary characters which are also regex metacharacters
seem likely to have unexpected effects in the wild.
The + character is the last such character remaining in the array.
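Which legal boundary characters are risky can be checked mechanically. RFC 2046 permits, besides letters and digits, the punctuation ' ( ) + _ , - . / : = ? (and space, though not as the last character). This sketch intersects that punctuation with the characters that are special in an ECMAScript or Perl regex outside a character class; the names are illustrative.

```cpp
#include <cassert>
#include <string>

// Punctuation allowed in boundaries by RFC 2046 (bcharsnospace), excluding
// letters and digits.
const std::string kRfc2046Punct = "'()+_,-./:=?";

// Characters with special meaning in a regex outside a character class.
const std::string kRegexMeta = R"(\^$.|?*+()[]{})";

// Returns the RFC 2046 punctuation that is also a regex metacharacter,
// in the order it appears in kRfc2046Punct.
std::string punctThatIsRegexMeta() {
    std::string out;
    for (char c : kRfc2046Punct)
        if (kRegexMeta.find(c) != std::string::npos) out += c;
    return out;
}
```

Of these five characters, only '+' appears in the current WebKit table.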

Note that even patching both WebBBS and WebKit will not immediately eliminate
all reports of "------WebKitFormBoundary...." appearing on these sites. Safari
users, for instance, who have experienced the issue now likely have form
boundary text (always containing a '+') in their AutoFill form data, which gets
auto-re-entered into WebBBS forms even when WebKit's current boundary text does
not contain a '+' and doesn't actually trigger the WebBBS bug! This makes it
rather annoying and difficult to get rid of on active boards.

Once either WebBBS is fixed or WebKit has a workaround in place, WebBBS admins
may wish to have their users remove their BBS URLs from
Safari->Preferences->AutoFill->other forms. But until one patch or the other is
in place, this won't help - WebKit's legal, randomly-generated '+' characters
will continue to trigger the issue in WebBBS.

The patch above has been built and tested on all reported problem sites. It
appears to successfully work around the issue, and will be submitted shortly. 

