No subject


Mon Sep 28 12:00:37 PDT 2015


Web applications benefit from being built on top of a thin layer.

WebKit calls get() with header names found in HTTPHeaderNames.in.
We can take that into account and optimize get() for some or all of those header names.

This can be achieved using dedicated member fields, an additional hashmap<HTTPHeaderName, index>, or even a vector, given the small number of common header names (< 100).
For those "indexed" header names, a header entry could also store the index of the next header with the same name.

> > For instance, keeping-first-index mechanism might be triggered when header
> > set size is above a given threshold. 
> This would make it hard to find bugs that only appear in large lists or near
> the threshold.

I agree in general. At the same time, it does not sound too difficult to add sufficient test coverage for it.

> > Or it can be done lazily when a header is actually get.
> This would make get slow.

get would be slow only the first time, and only if the number of headers N is very large.
If we look at a typical case when processing a message (request or response), there will be N append operations and M get operations.
But M is much smaller than N when N becomes large.
Having a fast append is appealing; having an amortized fast get sounds reasonable to me.

-- 
You are receiving this mail because:
You are the assignee for the bug.
Date: Mon, 11 Jan 2016 02:33:32 -0800

<html>
    <head>
      <base href="https://bugs.webkit.org/" />
    </head>
    <body>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Replace HTTPHeaderMap by HTTPHeaderList"
   href="https://bugs.webkit.org/show_bug.cgi?id=152828#c4">Comment # 4</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Replace HTTPHeaderMap by HTTPHeaderList"
   href="https://bugs.webkit.org/show_bug.cgi?id=152828">bug 152828</a>
              from <span class="vcard"><a class="email" href="mailto:youennf&#64;gmail.com" title="youenn fablet &lt;youennf&#64;gmail.com&gt;"> <span class="fn">youenn fablet</span></a>
</span></b>
        <pre><span class="quote">&gt; &gt; I would first like to have a simple functional HTTPHeaderList with a clean
&gt; &gt; API. Then it could be optimized internally to handle large sets.
&gt; This would be a performance regression until the optimizations are done.  I
&gt; don't think we should do this.</span>

I don't fully agree here.
Practically speaking, it would be a small performance improvement, except for one hypothetical case.
I am not against supporting this hypothetical case.
But there is a tradeoff here: the question is how valuable it is to support it compared to the complexity/maintenance cost.

In libsoup, the headers are stored as arrays.
Some headers (Content-Type, Content-Length) are special-cased so they can be retrieved quickly.
IIRC, lazy optimization is also done to quickly retrieve combined header values.

Libcurl is lower-level, so it does not provide a direct get/remove API, but the principle is similar.

A list of headers is good in the sense that it is fast to build, and layers above can rearrange it for their own needs.


More information about the webkit-unassigned mailing list