[Webkit-unassigned] [Bug 38572] [WTFURL] Add core URL parser
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Wed May 5 17:34:38 PDT 2010
https://bugs.webkit.org/show_bug.cgi?id=38572
--- Comment #11 from Alexey Proskuryakov <ap at webkit.org> 2010-05-05 17:34:37 PST ---
> + This patch adds the core of the URL parser. The URL parser uses a
> + templated notion of a character to support different character types.
Ideally, we should match Unicode nomenclature, and call these "code units"
<http://unicode.org/glossary/#code_unit>. It's wrong to call a UTF-16 code unit
a character, even though that's already established in WebKit code and some
Web-related specifications - but using this word for bytes in UTF-8 strings is
just weird.
> 'sources': [
> 'src/URLComponent.h',
> + 'src/URLParser.h',
Is this intentional? My understanding was that no port (including chromium) was
using this right away, and it's strange that only a header was added.
> + * The contents of this file are subject to the Mozilla Public License Version
Aren't we supposed to only leave LGPL when importing Mozilla code? I don't know
for sure, but I never saw any official word permitting tri-license in WebKit
repository.
> + // This handles everything that may be an authority terminator, including
> + // backslash. For special backslash handling see parseAfterScheme.
> + static bool isAuthorityTerminator(CHAR ch)
Why not call it "isPossibleAuthorityTerminator" then? BTW, I didn't know that
semicolon could do that.
Also, I'd make this function just take UChar32. We only need CHAR in arrays.
> + // Given an already-identified auth section, breaks it into its consituent
Typo: "consituent"
> + static void parseAuthority(const CHAR* spec, const URLComponent& auth, URLComponent* username, URLComponent* password, URLComponent* host, URLComponent* port)
All URLComponent arguments are required here, so they should all be passed in
the same way, either by pointer or by reference. Or is it some calling
convention I do not know about?
> + // FIXME: add ASSERT(auth.isValid()); // We should always get an authority.
> + if (auth.length() == 0) {
Should be !auth.length() (or can we even say auth.isEmpty()?)
> + // Search backwards for @, which is the separator between the user info and
> + // the server info.
> + int i = auth.begin() + auth.length() - 1;
> + while (i > auth.begin() && spec[i] != '@')
> + --i;
This will find '@' as the last character, but not as the first one. If that's
correct, this should ideally have a comment explaining why we don't care.
I only read a small portion of the patch, but it changed while I was looking at
it; will start over later or tomorrow.
--
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
More information about the webkit-unassigned
mailing list