[Webkit-unassigned] [Bug 38572] [WTFURL] Add core URL parser

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Wed May 5 17:34:38 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=38572





--- Comment #11 from Alexey Proskuryakov <ap at webkit.org>  2010-05-05 17:34:37 PST ---
> +        This patch adds the core of the URL parser.  The URL parser uses a
> +        templated notion of a character to support different character types.

Ideally, we should match Unicode nomenclature, and call these "code units"
<http://unicode.org/glossary/#code_unit>. It's wrong to call a UTF-16 code unit
a character, even though that's already established in WebKit code and some
Web-related specifications - but using this word for bytes in UTF-8 strings is
just weird.

>        'sources': [
>          'src/URLComponent.h',
> +        'src/URLParser.h',

Is this intentional? My understanding was that no port (including chromium) was
using this right away, and it's strange that only a header was added.

> + * The contents of this file are subject to the Mozilla Public License Version

Aren't we supposed to only leave LGPL when importing Mozilla code? I don't know
for sure, but I never saw any official word permitting tri-license in WebKit
repository.

> +    // This handles everything that may be an authority terminator, including
> +    // backslash. For special backslash handling see parseAfterScheme.
> +    static bool isAuthorityTerminator(CHAR ch)

Why not call it "isPossibleAuthorityTerminator" then? BTW, I didn't know that
semicolon could do that.

Also, I'd make this function just take UChar32. We only need CHAR in arrays.

> +    // Given an already-identified auth section, breaks it into its consituent

Typo: "consituent"

> +    static void parseAuthority(const CHAR* spec, const URLComponent& auth, URLComponent* username, URLComponent* password, URLComponent* host, URLComponent* port)

All URLComponent arguments are required here, so they should all be passed in
the same way, either by pointer or by reference. Or is it some calling
convention I do not know about?

> +        // FIXME: add ASSERT(auth.isValid()); // We should always get an authority.
> +        if (auth.length() == 0) {

Should be !auth.length() (or can we even say auth.isEmpty()?)

> +        // Search backwards for @, which is the separator between the user info and
> +        // the server info.
> +        int i = auth.begin() + auth.length() - 1;
> +        while (i > auth.begin() && spec[i] != '@')
> +            --i;

This will find '@' as the last character, but not as the first one. If that's
correct, this should ideally have a comment explaining why we don't care.

I only read a small portion of the patch, but it changed while I was looking at
it; will start over later or tomorrow.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list