[Webkit-unassigned] [Bug 110805] New: webdatabase: Need more robust OriginUsageRecord::diskUsage().

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Feb 25 14:34:32 PST 2013


https://bugs.webkit.org/show_bug.cgi?id=110805

           Summary: webdatabase: Need more robust
                    OriginUsageRecord::diskUsage().
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: Unspecified
        OS/Version: Unspecified
            Status: ASSIGNED
          Severity: Normal
          Priority: P2
         Component: WebCore Misc.
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: mark.lam at apple.com


Created an attachment (id=190129)
 --> (https://bugs.webkit.org/attachment.cgi?id=190129&action=review)
disk stat speed test

OriginUsageRecord::diskUsage() provides the total disk usage size of all databases in a specified origin.  The current implementation relies on a cache of database sizes that the OriginUsageRecord tracks.  The OriginUsageRecord is notified whenever any database activity occurs (within the same process) that may change the size of a database.  OriginUsageRecord::diskUsage() then checks which databases have changed and updates their sizes by fetching the actual size from the file system.  Thereafter, OriginUsageRecord::diskUsage() sums up the sizes of the databases for the specified origin and returns the sum as its result.

This strategy works as long as only one process creates and tracks databases.  In a multi-process environment, the in-process cached sizes can easily become outdated when other processes create or add to databases in the same origin, because the activity of those other processes will not trigger an update of the cached sizes in this process.

One solution to this issue is simply to do without the cached sizes and fetch the actual size from the file system every time.  Intuitively, the file sizes should already be cached in data structures in the OS's memory, so fetching a file size should not incur much overhead.  Here are some test results based on running the following test:

The test
=====
1. Create 10 databases (differently named, of course) from the same origin, and initialize them with some minimal data.
2. Measure the time it takes to do 1000 repetitions of adding a small record to only 1 of those databases.
Note: each measured interval spans one transaction cycle: it starts just before the disk usage is computed (which slightly precedes the start of the transaction) and ends right after the transaction is committed.

The intent here is that the disk usage should be computed by summing up the sizes of the 10 databases, while only 1 of those 10 databases is being mutated with a small write.  In the scenario where we use cached sizes, we only fetch the size of 1 database from the file system, while the other 9 (being unchanged) come from the cache.  In the scenario where we fetch the sizes from the file system, we do 10 file size fetches per iteration.

The results
=======
These measurements were done on a MacBook Pro (2.4 GHz Core i7) with an SSD.  Times are in microseconds.

                                Run 1      Run 2      Run 3    |   Average
                                =====      =====      =====    |   =======
    Baseline (using a cache): 1491.97    1559.09    1483.12    |   1511.39
    No cache:                 1688.97    1828.95    1656.79    |   1724.90

The difference: the no-cache approach is 1.14x slower (a difference of about 214 microseconds).  That is for 10 file size fetches vs. 1.  The overhead per additional database (when there is more than 1) is 1/9th of that, i.e. about 0.016x slower, or roughly 24 microseconds.

These results show that the cost of not using a cache is insignificant, while we get the benefit of having a more reliable read on the disk usage of databases for a given origin.

The test file disk-stat-speed-test.html is attached.  To run the test,
1. Do a debug build of WebKit and add instrumentation to the transaction code to measure the time from before the call to DatabaseTracker::getMaxSizeForDatabase() until after m_sqliteTransaction->commit().  Print the delta time (and other stats: total, average) after measuring the end time.
2. Delete all databases from origin localhost.
3. Serve up the page on localhost.
4. Run the debug version of webkit and load the test page.
5. When the test starts, it will present you with a "Start" button in an alert box.  Click Start and watch your console for the time values.  When the test ends, you will get an alert box with an "End" button.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
