[Webkit-unassigned] [Bug 30322] WebKit level persistent caching

Mon Nov 10 03:24:03 PST 2014

https://bugs.webkit.org/show_bug.cgi?id=30322

--- Comment #33 from Antti Koivisto <koivisto at iki.fi> ---
> * having 1 file per resource is a source of disk fragmentation, many
> resources are smaller than disk block size so the cache ends up being bigger
> than expected (at least this is an issue we have in lib soup)

Note that the serialization combines metadata, headers and body data into a single file, so even empty resources are typically a few kB. It is still true that about (entry count)*(block size)/2 of disk space is wasted. With my current ~97MB cache with 3157 items this is ~7%. But no other storage and metadata scheme is free of overhead either.
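
As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes 4096-byte filesystem blocks and decimal megabytes, neither of which is stated above, and the function name is made up for illustration.

```python
def expected_slack_bytes(entry_count, block_size):
    """Expected wasted space when every entry is stored in its own file.

    On average the last block of each file is half empty, so the expected
    slack is entry_count * block_size / 2.
    """
    return entry_count * block_size / 2


# Figures quoted above: ~97MB cache, 3157 entries.
# ASSUMPTION: 4096-byte blocks and 1 MB = 10**6 bytes.
cache_bytes = 97 * 10**6
slack = expected_slack_bytes(3157, 4096)
slack_fraction = slack / cache_bytes
print(f"slack: {slack / 10**6:.1f} MB ({slack_fraction:.1%} of the cache)")
```

With a different block size the fraction scales linearly, so the result is only as good as the block-size assumption.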

> * missing entry eviction based on cache size. That will likely require some
> other missing pieces, like an LRU for evicting old resources, code to make
> room for new resources should the cache be full...

Yeah, the size limit is still unimplemented. It will probably be random eviction, at least initially. Naive LRU is generally a bad strategy for caches.
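
To illustrate what random eviction could look like with one file per entry, here is a minimal sketch. It is not the planned implementation: the function name, the flat directory layout and the size accounting by file length are all assumptions made for the example.

```python
import os
import random
from typing import List, Optional


def evict_randomly(cache_dir: str, max_bytes: int,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Delete randomly chosen entry files until the total size fits max_bytes.

    Returns the names of the files that were removed.
    """
    if rng is None:
        rng = random.Random()

    # Collect (name, size) for regular files only.
    entries = []
    with os.scandir(cache_dir) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((entry.name, size))

    total = sum(size for _, size in entries)
    rng.shuffle(entries)  # random victim order, no access-time bookkeeping

    removed = []
    while total > max_bytes and entries:
        name, size = entries.pop()
        os.remove(os.path.join(cache_dir, name))
        total -= size
        removed.append(name)
    return removed
```

The point of the sketch is that no per-entry usage data has to be kept or persisted, which fits the goal of avoiding global metadata.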

> * key filter is built by traversing the directory, are you planning to
> have some kind of index file (perhaps with precomputed data) instead of
> doing that? (the index file brings many other issues BTW, especially related
> to consistency after crashes and so on).

Hopefully not. Avoiding global metadata is one of the things I'm trying out here. As you note, if this works out it will eliminate a lot of complexity and whole classes of robustness issues.

Note that the initialization really just needs the directory, not the actual cache files or their inodes. This should be quick even with very large caches.
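
A minimal sketch of that kind of initialization follows. It assumes each entry lives in a file named after the 8-digit lowercase hex form of its 32-bit key hash, and it uses a plain set as the filter; both are assumptions for illustration, not a description of the actual on-disk layout or filter type.

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def build_key_filter(cache_dir):
    """Build an in-memory set of key hashes from file names alone.

    ASSUMPTION: entry files are named with the 8-digit lowercase hex form
    of a 32-bit key hash. os.listdir() only reads directory entries; it
    does not open or stat the cache files themselves.
    """
    hashes = set()
    for name in os.listdir(cache_dir):
        if len(name) == 8 and set(name) <= HEX_DIGITS:
            hashes.add(int(name, 16))
        # Anything else is not a cache entry and is ignored.
    return hashes


def may_contain(key_filter, key_hash):
    """False means the entry is definitely absent; True means 'go look'."""
    return key_hash in key_filter
```

A lookup that misses the filter never touches the disk; a hit still has to be confirmed by reading the entry.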

> * shouldn't we have some extra check to avoid issues with clashes in the key
> filter? (in chromium they have [key,linkedlist] pairs to avoid this IIRC)

Clashes are dealt with purely as a correctness issue, by verifying the key against the retrieved cache entry. Since the clash probability is very low, doing anything more complex doesn't seem justified.

Hashes could also be much longer than the current 32 bits without any real design changes.
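
The clash handling described above can be sketched roughly as follows. The class and method names are hypothetical, zlib.crc32 merely stands in for whatever 32-bit hash is really used, and a dict stands in for the files on disk.

```python
import zlib


class HashKeyedStore:
    """Toy store with one slot per key hash; the full key is kept in the entry."""

    def __init__(self, hash_fn=None):
        # ASSUMPTION: CRC-32 as a placeholder 32-bit hash of the key.
        self._hash = hash_fn or (lambda key: zlib.crc32(key.encode("utf-8")))
        self._slots = {}  # hash -> (full key, body)

    def store(self, key, body):
        # A clashing key simply overwrites the previous occupant of the slot.
        self._slots[self._hash(key)] = (key, body)

    def retrieve(self, key):
        slot = self._slots.get(self._hash(key))
        if slot is None:
            return None
        stored_key, body = slot
        if stored_key != key:
            return None  # hash clash: treat it as a cache miss
        return body
```

A clash therefore costs at most a miss or an overwritten entry, never a wrong response. Widening the hash only changes `hash_fn` and the slot naming, not the structure.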

> If I understood this correctly this is a first (very nice!) step, so I guess
> the most important thing is to have a good API for the cache. Later on, it
> could be evolved to improve its efficiency like having block-files storage
> for small resources à la Firefox, mmap'ed writable indexes to reduce the
> size of the key hashes in memory, etc...

Yeah, the idea is to separate the storage backend from the logic and have a simple internal interface for it. People might want to try out completely different designs.
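
One way such a separation could look is sketched below. None of these names come from the patch; the interface, the in-memory backend and the Cache class are all hypothetical and only show the shape of the idea.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageBackend(ABC):
    """Hypothetical minimal interface that the cache logic talks to."""

    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def remove(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Trivial backend, useful for testing the logic without a disk."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def retrieve(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def remove(self, key: str) -> None:
        self._data.pop(key, None)


class Cache:
    """Cache logic that depends only on the StorageBackend interface."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def put(self, key: str, body: bytes) -> None:
        self._backend.store(key, body)

    def get(self, key: str) -> Optional[bytes]:
        return self._backend.retrieve(key)

    def drop(self, key: str) -> None:
        self._backend.remove(key)
```

A file-per-entry backend, a block-file backend or something else entirely could then be swapped in without touching the Cache class.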

With the current implementation I'm going for the "minimal viable" backend. And perhaps pushing the boundary of that a bit too.
