[webkit-reviews] review denied: [Bug 27165] Connections-per-host should be tracked closer to the ResourceHandle level, not the Cache : [Attachment 70314] Rename to ResourceLoadScheduler, etc.

Tue Oct 12 11:08:22 PDT 2010

Alexey Proskuryakov <ap at webkit.org> has denied	review:
Bug 27165: Connections-per-host should be tracked closer to the ResourceHandle
level, not the Cache
https://bugs.webkit.org/show_bug.cgi?id=27165

Attachment 70314: Rename to ResourceLoadScheduler, etc.
https://bugs.webkit.org/attachment.cgi?id=70314&action=review

------- Additional Comments from Alexey Proskuryakov <ap at webkit.org>
+// Having a limit might still help getting more important resources first

This is just copy/pasted, but this comment is nearly incomprehensible. Might?
Still? When can we remove the limit?

+static const unsigned maxRequestsInFlightForNonHTTPProtocols = 20;

I found REQUEST_MANAGEMENT_ENABLED useful before.

+static unsigned maxRequestsInFlightPerHost;

This has lost an important comment that used to be present in original code
(match the parallel connection count used by the networking layer).

Ideally, we should be clearer about whether this is about parallel connections
or about inflight requests (this is not the same if request pipelining is
enabled).

+ResourceLoadScheduler::Priority ResourceLoadScheduler::determinePriority(const
ResourceRequest& request) const
+{
+    switch (request.targetType()) {
+    case ResourceRequest::TargetIsMainFrame:

Ugh, this is horrible. ResourceRequest::m_targetType is misguided code - this
member variable isn't guaranteed to be accurate. The reason is that Mac WebKit
API sends every request to embedder as an NSURLRequest, and the embedder is
free to do anything with it. After the call returns, ResourceRequest is
re-created from NSURLRequest, so anything
ResourceRequest::doUpdateResourceRequest() doesn't know about is lost.

It's not good design to have cross-platform code with such strong Mac ties, but
that's what we have now (plus misguided code from developers who chose to
ignore this limitation after having been told about it). We should at least
have comments in ResourceRequestBase.h - or better, come up with a strategy for
dealing with this. See also: bug 44722 comment 49.

In any case, request.targetType() shouldn't be used in ResourceLoadScheduler
now.

+ResourceLoadScheduler* resourceLoadScheduler()
+{
+    DEFINE_STATIC_LOCAL(ResourceLoadScheduler, resourceLoadScheduler, ());
+    return &resourceLoadScheduler;
+}

I'd add ASSERT(isMainThread()) here - we need a lot more of those asserts with
additional functionality being available in workers.

+ResourceLoadScheduler::ResourceLoadScheduler()
+    : m_requestTimer(this, &ResourceLoadScheduler::requestTimerFired)
+    , m_isSuspendingPendingRequests(false)
+{
+    m_nonHTTPProtocolHost = HostInformation::create(String(),
maxRequestsInFlightForNonHTTPProtocols);

Why does m_nonHTTPProtocolHost need to be assigned, and not initialized
together with m_requestTimer and m_isSuspendingPendingRequests?

+    HostInformation* oldHost = hostForURL(resourceLoader->request().url());
+    ASSERT(oldHost);
+    HostInformation* newHost = hostForURL(request.url(), true);
+
+    if (oldHost == newHost)
+	 return;

Comparing by pointer is slightly surprising - besides AtomicString, this never
works. Perhaps a function like urlsAreOnSameHost() would be easier to the eyes?

+    class HostInformation : public RefCounted<HostInformation> {

Why is HostInformation reference counted? It seems to have a clear ownership.

+	 // Try to request important resources immediately

Please add periods at the end of sentences, even in moved code.

+    HostMap::iterator i = m_hosts.begin();
+    HostMap::iterator end = m_hosts.end();
+    for (;i != end; ++i)

There is no reason to define i outside for. Also, I don't see why end is
precomputed, but hostsToServe.size() just below the quoted code isn't.

-    m_handle = ResourceHandle::create(m_frame->loader()->networkingContext(),
clientRequest, this, m_defersLoading, m_shouldContentSniff);

+    resourceLoadScheduler()->add(this);

So, ResourceLoader has a circular relationship with ResourceLoadScheduler - you
create a ResourceLoader, it registers with ResourceLoadScheduler, and then the
scheduler tells the loader to start. And other times, you just start a load and
register it with the scheduler after the fact . I didn't expect ResourceLoader
to know about scheduling, but perhaps I didn't think about it long enough. Can
this be streamlined?

+    if (!redirectResponse.isNull())
+	 resourceLoadScheduler()->changeHost(this, request);

Since changeHost is only called as a response to a redirect, it should probably
have "redirect" in its name somehow. I've been puzzled about what this function
did until I saw where it was being called.

I'm wondering how changing a host affects scheduling. Clearly, we don't get a
chance to re-prioritize the request in WebCore, networking library does
everything internally.

It may be worth making an experiment - create multiple long-standing
connections to host 1, then make a request to host 2 that gets redirected to
host 1. Will it load immediately in Safari, Chrome or other browsers, even
though the concurrent connection limit is already met? This is probably not
going to affect this patch, but is something that's nice to check while we're
at it.

+    enum SchedulePolicy {
+	 Schedule,
+	 AlreadyLoading
+    };

SchedulePolicy::Schedule looks a little mysterious, maybe ScheduleLoad? I know
it's hard to come up with good names for these enums.

+ResourceLoadScheduler::HostInformation*
ResourceLoadScheduler::hostForURL(const KURL& url, bool createIfNotFound)
+{
+    if (!url.protocolInHTTPFamily())
+	 return m_nonHTTPProtocolHost.get();

Should these all be the same (ftp: and data: have little in common)? Is that
how existing code works?

+	 const ResourceRequest& request() const { return m_request; }

It's a little dangerous to have request() as a public method of ResourceLoader,
it's easy to misuse. There are initial request, request as changed by embedder
delegate call, and redirected request. There's also m_deferredRequest, which I
have no idea about.

The fewer "request()" functions a developer has to randomly choose from at any
given call site, the better. And if this one has to be made public, it needs a
comment explaining which request this is.

+	 * http/tests/xmlhttprequest/logout.html: It is no longer safe
+	 to assume that an async xhr will have any effects synchronously.

Did this test pass in Firefox and IE in its original form? If it did, we should
probably find a way to keep it working.

This patch needs more work (possibly several more iterations), but I really
like the goal and the scope of changes that you chose to make.