[Webkit-unassigned] [Bug 172748] Consider blocking requests to HTTP(S) URLs that contain both `\n` and `<` characters.

Tue Oct 8 11:50:54 PDT 2019

https://bugs.webkit.org/show_bug.cgi?id=172748

--- Comment #4 from Mike West <mkwst at chromium.org> ---
> URLs are used in a lot of places that aren't vulnerable to dangling markup attacks,
> so it definitely shouldn't go in the URL parser or specification.  HTML is a more
> appropriate place because you're trying to avoid URLs that look like HTML, and URLs
> should not need to know anything about HTML.

It's totally possible to implement this outside the URL parser. In Chromium, it's implemented as a flag that the URL parser sets during parsing (https://cs.chromium.org/chromium/src/url/url_canon_etc.cc?rcl=2bd9bea1c6b9ace95707a0e8715f40793c9dc909&l=26). We're scanning the URL anyway at that point to remove whitespace, and adding a separate scan of the string before canonicalizing it turned out to show up in benchmarks. There is likely a clever way to avoid that performance impact, but it's what Chromium is doing today.
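
To illustrate the "flag set during the whitespace pass" idea, here is a minimal sketch in Python. It is not Chromium's actual C++ code: the function name `strip_whitespace_and_flag`, the set of removable characters (tab, LF, CR), and the choice to flag only on `\n` plus `<` (following the bug title) are all assumptions made for the example.

```python
from typing import Tuple

# Assumption: the whitespace stripped before canonicalization is tab, LF, and CR.
REMOVABLE_WHITESPACE = frozenset("\t\n\r")


def strip_whitespace_and_flag(raw_url: str) -> Tuple[str, bool]:
    """Hypothetical helper: strip removable whitespace in a single pass and
    report whether the input contained both a newline and a '<'."""
    kept = []
    saw_newline = False
    saw_less_than = False
    for ch in raw_url:
        if ch in REMOVABLE_WHITESPACE:
            if ch == "\n":
                saw_newline = True
            continue  # removable whitespace is dropped from the output
        if ch == "<":
            saw_less_than = True
        kept.append(ch)
    # The flag piggybacks on the pass that was already needed for stripping,
    # so no second scan of the string is required.
    return "".join(kept), saw_newline and saw_less_than
```

The caller, not this pass, would decide what to do with the flag, for example failing the request only when the scheme is HTTP(S).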

From a spec perspective, I'd be fine with this all living in HTML, with the caveat that it seems like a large amount of work to go through that spec to find all the places where URLs could be parsed and wire them up to some parsing proxy. I don't have time right now to do that work. :(
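
For a rough picture of what such a parsing proxy could look like, here is a sketch in Python that wraps the standard library's `urlsplit` and leaves the parser itself untouched. The name `parse_url_for_fetch` and the "return `None` to mean blocked" convention are hypothetical; the check is done on the raw string because the parser may strip newlines itself.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def parse_url_for_fetch(raw_url: str) -> Optional[SplitResult]:
    """Hypothetical proxy: parse raw_url, but return None for HTTP(S) URLs
    whose raw text contains both a newline and a '<'."""
    parts = urlsplit(raw_url)
    # Inspect the raw input, not the parsed result, since parsing may
    # already have removed the newline.
    looks_like_markup = "\n" in raw_url and "<" in raw_url
    if parts.scheme in ("http", "https") and looks_like_markup:
        return None
    return parts
```

Every call site that parses a URL on behalf of HTML would have to go through a wrapper like this, which is where the spec work comes from. This sketch only handles absolute URLs; a relative URL would need to be resolved against its base before the scheme check means anything.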

> That said, I'm worried about compatibility.  I'm under the impression that hand
> written URLs sometimes contain tabs, newlines, < and > for good reasons, but I
> have no data to back that up.

FWIW, Chrome has been shipping this behavior since 2017.
