[webkit-dev] PageGroup and visited link coloring

Maciej Stachowiak mjs at apple.com
Mon Nov 10 12:56:27 PST 2008


On Nov 10, 2008, at 11:12 AM, Brett Wilson wrote:

> I was recently looking at the PageGroup and visited link coloring.
> Chromium has some interesting requirements. Our design goal is to
> store hundreds of thousands to a million URLs in the database with no
> problems (basically all your history forever). We have multiple
> processes so we can't just have a local list of visited pages in each
> renderer process.
>
> Our solution to the first problem is to have 64-bit hashes (with 1
> million visited links, you would get too many collisions using 32-bit
> hashes like WebKit currently uses). Our solution to the second problem
> is to have a dedicated multiprocess hash table. This dedicated system
> manages its own hashing because we also have salting which must be in
> sync through all processes.
>
> WebKit recently changed how visited link coloring works. It used to
> call a global function, historyContains(), which was easy to
> integrate into our system. The new system passes 32-bit hashes
> around and maintains a global list of visited pages in the PageGroup.
> Neither of these will work with our system.
>
> My current idea is to create a new file LinkHash which has a typedef
> for the hash type (rather than using unsigned everywhere) so we can
> define it to be 64 bits in PLATFORM(CHROMIUM) and it can remain
> 32 bits for other platforms (or they can change it if they like).

I think it would be better to just always use a 64-bit hash with  
salting for all ports (assuming that is not a significant performance  
hit - I would expect it isn't). I say this because:

1) WebKit in general supports keeping unlimited history, and Safari in  
particular has a non-default option to keep history forever. I don't  
think it is Chromium-specific to support such a requirement.

2) The visited link color spoofing you mention seems like a fairly  
serious bug which would apply to any port regardless of history size.
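
To put rough numbers on the 32-bit versus 64-bit question, here is a
back-of-the-envelope sketch. It assumes hashes are uniformly
distributed, which a real hash only approximates, and the function
names are made up for illustration.

```cpp
#include <cmath>

// Expected number of colliding pairs among n stored hashes of the given
// width, assuming uniformly distributed hashes (an idealization):
// n(n-1)/2 pairs, each colliding with probability 2^-bits.
double expectedCollidingPairs(double n, int bits)
{
    return n * (n - 1.0) / 2.0 / std::ldexp(1.0, bits);
}

// Probability that a single unvisited URL hashes to one of the n stored
// values and is therefore wrongly colored as visited:
// 1 - (1 - 2^-bits)^n, computed in a numerically stable way.
double falseVisitedProbability(double n, int bits)
{
    return -std::expm1(n * std::log1p(-std::ldexp(1.0, -bits)));
}
```

Under that assumption, one million stored URLs give about 116 expected
colliding pairs with 32-bit hashes and a false "visited" chance of
roughly 1 in 4,300 per unvisited link checked. With 64-bit hashes both
figures are negligible.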


> It also defines a visitedLinkHash function which is moved from Document.
> I have a patch for this, and it's very clean. I think it improves
> things even without our porting constraint since almost 200 lines got
> moved out of Document. This is described in
> https://bugs.webkit.org/show_bug.cgi?id=22131

Moving the function out of Document seems reasonable.
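
For concreteness, a standalone hash header along these lines might look
roughly like the sketch below. This is not the code from the patch: the
signature and the salt parameter are assumptions, and FNV-1a is only a
stand-in for whatever 64-bit hash would actually be chosen.

```cpp
#include <cstdint>
#include <string>

// One typedef for the hash type, so callers stop spelling it "unsigned".
typedef uint64_t LinkHash;

// Salted 64-bit hash of a URL string (hypothetical signature).
// FNV-1a is used purely as a placeholder; it is not collision resistant
// against a determined attacker, so a real implementation would want a
// stronger function.
LinkHash visitedLinkHash(const std::string& url, uint64_t salt)
{
    const uint64_t fnvOffsetBasis = 0xcbf29ce484222325ULL;
    const uint64_t fnvPrime = 0x100000001b3ULL;

    LinkHash hash = fnvOffsetBasis;

    // Mix in the eight salt bytes first, so the same URL hashes
    // differently under different salts.
    for (int i = 0; i < 8; ++i) {
        hash ^= (salt >> (8 * i)) & 0xFF;
        hash *= fnvPrime;
    }

    // Then mix in the URL bytes.
    for (unsigned char c : url) {
        hash ^= c;
        hash *= fnvPrime;
    }
    return hash;
}
```

Every process that computes or compares hashes would need to use the
same salt, which is the synchronization requirement mentioned above.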

> The more complicated part is in PageGroup, which seems to basically be
> the visited link database.

It's more accurate to say it *has* a visited link database rather than  
that it *is* one.

> I'm thinking of just providing a new
> PageGroupChromium.cpp which contains a different implementation that
> proxies these calls to our glue layer to be sent to our multiprocess
> database.
>
> However, I'm not sure what exactly the intent of PageGroup is. It's
> clearly not intended that this be port-specific. Is there a cleaner
> way to integrate our link database with the rest of WebKit?

PageGroup is supposed to represent a set of Page objects (essentially  
top level Web content holders) which should be considered as together  
forming a "browser". Since WebKit is designed to be a public API  
framework and to be used for purposes other than a browser, it is  
possible for a browser to show some Web content views that are not  
part of the user's browsing. So it would put those in a separate  
PageGroup (however that is reflected in that platform's API).  
PageGroup takes care of those things that we judged to belong at this  
level of granularity, rather than global or per-Page.
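
As a rough illustration of that granularity, here is a deliberately
simplified sketch. These are not the real WebCore declarations: the
member names and the use of std::unordered_set are assumptions made to
show that visited-link state lives per group, not globally or per Page.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

typedef uint64_t LinkHash; // hypothetical hash type

// Stand-in for a top-level Web content holder.
class Page { };

// A group of Pages that together form one "browser". The group *has* a
// visited-link set; it does not own the Pages in this sketch.
class PageGroup {
public:
    void addPage(Page* page) { m_pages.insert(page); }
    void removePage(Page* page) { m_pages.erase(page); }
    std::size_t pageCount() const { return m_pages.size(); }

    void addVisitedLink(LinkHash hash) { m_visitedLinkHashes.insert(hash); }
    bool isLinkVisited(LinkHash hash) const
    {
        return m_visitedLinkHashes.count(hash) != 0;
    }

private:
    std::unordered_set<Page*> m_pages;
    std::unordered_set<LinkHash> m_visitedLinkHashes;
};
```

With this shape, a link recorded as visited in the browsing group is
not reported as visited by a separate group holding, say, a help viewer.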

I don't know what the right way to integrate Chromium's visited link  
checking would be. Do you incur IPC for every link checked? Does it  
cache on the client side? Does it use shared memory?

Regards,
Maciej