[Webkit-unassigned] [Bug 29972] Implement URL decomposition IDL attributes for HTMLAnchorElement

Mon Oct 12 09:57:34 PDT 2009

https://bugs.webkit.org/show_bug.cgi?id=29972

Darin Adler <darin at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #40586|review?                     |review-
               Flag|                            |

--- Comment #5 from Darin Adler <darin at apple.com>  2009-10-12 09:57:34 PDT ---
(From update of attachment 40586)
Change looks good. Still some things to fix.

> -    // Setting the port to 0 will clear any port from the URL.
> +    void clearPort();

Given that the call to check if a port is present is named hasPort(), I would
call this removePort() rather than clearPort(). This is also consistent with
the name of the removeFragmentIdentifier function.

>  #ifndef NDEBUG
>      void print() const;
>  #endif
> +    bool isHierarchical() const;

This seems like a strange place to tuck this function -- easy to miss it here
in the same paragraph with a debug-only function. I'm also not sure we want a
public function for this, and if we do we should document it I think.

> +    for (int i = 1; i < protocol.length(); i++) {
> +        if (!isSchemeChar(protocol[i]))
> +            return false;
> +    }

Since String::length returns an unsigned, I suspect this won't compile on some
platforms due to unsigned/signed comparison warnings. Please make "i" an
unsigned rather than an int.

But also, I would normally put the length into a local variable in a case like
this, since it's used once outside the loop and then again each time through
the loop. Not sure everyone would agree with me on that.

> +void KURL::clearPort()
> +{
> +    parse(m_string.left(m_hostEnd) + m_string.substring(m_portEnd));
> +}

If m_isValid is false, I'm not sure this will do the right thing. Does one of
your test cases exercise that code path?

If m_hostEnd == m_portEnd, then I think this does quite a bit of extra work,
and I think you should optimize that case with an early return.

>      // FIXME: Non-ASCII characters must be encoded and escaped to match parse() expectations.
> -    parse(m_string.left(m_queryEnd) + "#" + s);
> +    if (s.isEmpty())
> +        parse(m_string.left(m_queryEnd));
> +    else
> +        parse(m_string.left(m_queryEnd) + "#" + s);

I think it's strange that this refuses to put "#" on for an empty fragment
identifier. I suggest coding this quirk at the call site inside setHash rather
than building it into the KURL function.

> +    // Setter condition:
> +    // The new value is not the empty string and input is hierarchical and uses
> +    // a server-based naming authority 
> +    if (value.isEmpty())
> +        return;
> +    KURL url = href();
> +    if (!url.isHierarchical() || url.isLocalFile())
> +        return;

I don't think !isHierarchical || isLocalFile is the right way to check if the
URL uses a server-based naming authority. You should ask some people about that
on webkit-dev -- we need to find some experts. I think we might need to ask
SecurityOrigin this question.

And if this is the reason you exposed isHierarchical, then I think that's a
mistake. Instead you should expose a function that does the entire
(!isHierarchical || isLocalFile) test with an appropriate name. And keep
isHierarchical private.

> +        // FireFox rejects the port if it is not numerical value.

The name Firefox doesn't have a second capital "F" in it. Normally in comments
like this it's useful to be more specific, mentioning the version number you
tested. Also, is Firefox what we want to cite here? Does the HTML 5
specification say anything about this? What about Internet Explorer's behavior?

> +        // Accept the port portion only if it is numerical,
> +        // and not default for the scheme.
> +        int i = separator + 1;
> +        for (; i < value.length() && isASCIIDigit(value[i]); i++);

I believe this loop structure will give a warning on some compilers. Instead I
suggest making a helper function that returns true only if all characters are
ASCII digits.

Did you test cases where there is leading or trailing whitespace? Does Firefox
actually ignore the numbers in strings that have that whitespace?

The toUInt and toUIntStrict functions already have an out parameter "ok" that
tells whether the passed in thing is a number. There's a chance you could use
that instead of writing you own loop to check for ASCII digits.

> +    // Setter preprocessor:
> +    //Remove all leading U+002F SOLIDUS ("/") characters 

I don't understand the term "Setter preprocessor" term here and "Setter
condition" -- are those terms from the HTML 5 standard? If so, then perhaps
there's a way to make that clearer. If not, maybe you could just leave those
comment lines out. They don't seem to add much.

Missing space after "//" on the second line. And it should have a period too.

> +    int i = 0;
> +    for (; i < value.length() && value[i] == '/'; i++);

Again, the semicolon on the same line as the loop will not compile with some
compilers. And searching for a single character should be done by calling
String::find rather than with a custom loop.

> +    //If it has no leading U+002F SOLIDUS ("/") character, prepend a U+002F SOLIDUS ("/") character to the new value 

Missing space after "//" on the second line. And it should have a period too.

> +    if (value.startsWith("/"))

A significantly more efficient way to do this check is value[0] == '/'.
String's subscript operator returns 0 for characters past the end of the string
so you don't have to worry about the empty string. Doing it the way you wrote
it actually creates an unneeded temporary String because we haven't overloaded
the function for "const char*" (which we could do to fix that problem) and also
calls strlen. All unneeded.

> +    // We accept the port if it is numerical and not default for the scheme,
> +    // else remove the port.

The "remove the port" comment here makes it clear that removePort is a better
name.

> +    int separator = value.find(":");

The find function will run more efficiently if you pass it ':' instead of ":".

> +    KURL url = href();
> +    if (separator > 0)  {
> +        String subvalue = value.substring(0, separator);
> +        if (!KURL::isProtocolValid(subvalue))
> +            return;
> +        url.setProtocol(subvalue);
> +    }
> +    else {
> +        if (!KURL::isProtocolValid(value))
> +            return;
> +        url.setProtocol(value);
> +    }

You could save duplicated code if you set subvalue to value when separator is
-1. And in fact, that's exactly what would happen if you passed the -1 to the
substring function, so you could just remove the if entirely and keep the code
inside the if clause.

Also, the brace before else is supposed to be on the same line with the else.

> +    if (value.startsWith("?"))

Again, value[0] == '?' is more efficient for such cases.

Try to structure the code so that it doesn't call toUInt two times on the same
string. I noticed that in a couple of cases.

The tests look good. They would be even better if they were written using the
script-tests machinery, because then they would make it clear what the expected
results are and you could tell if they were passing or failing just by viewing
them in a web browser. An example of one of these is
fast/dom/HTMLFormElement/script-tests/elements-not-in-document.js. The HTML
wrapper for such tests is made by running the make-script-tests-wrappers
script. To start a new directory like HTMLAnchorElement you just need to create
the script-tests directory and put a TEMPLATE.html file in it. Best to svn copy
the TEMPLATE.html file from another script-tests directory at the same level of
the directory hierarchy, such as
fast/dom/HTMLFormElement/script-tests/TEMPLATE.html in your case.

Similarly, it's helpful to write out what's being tested, not just a list of
the test output. You would use the poorly-named "debug()" function in your
script-tests test to write out a line of text telling what the script has done
so that then you can get a sense of why the results are expected. A pretty good
example of that is fast/dom/Node/script-tests/initial-values.js.

It would be good to include test cases that document what happens when the
original URL is not well formed. Your test cases are good, but they are staying
too close to "normal good" cases and not exercising the edge cases enough.
There should be a test case that takes every single branch in each of the
functions used. For example, you have to at least try setting each of the
properties to the empty string and other types of illegal values.

To make the tests more economical, I suggest you write out the e entire URL
after the tests rather than writing out the separate protocol, host, pathname,
href, and search each time. It's OK to validate that the engine properly parses
the URL after changes like this, but writing them out on separate lines is
making the test seem much bigger than it is and indirectly causing us to test
fewer cases. The goal here is to test the setter behavior, not to create
additional tests of the parsing for the getters, so I think that's a good
tradeoff.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.