[webkit-help] How WebKit builds HTTP GET Request headers, and how WebKit resolves relative paths in <link>s, <img>s, etc.

Lew Hollerbach lew at customerconversations.com
Sun Feb 15 14:44:00 PST 2015


Hello,
 
In the most general terms, my question is about relative paths in resource
references (e.g., <link>, <img>, <script>), the HTTP GET requests they
trigger, and the corresponding request/response headers: how WebKit builds
the headers, and how WebKit resolves relative paths.
 
What happens if the URL to a resource is not an absolute URL but a relative
one? My own testing suggests that the "Host" request-header field is what's
used to resolve relative paths. Is that correct? (If it isn't correct, then
what is used to resolve relative URLs?)
 
But, if it is, then how does the browser/WebKit know what the value of this
field should be? Often - but certainly not always - the value is just the
basic domain name, like "www.somesite.com". But it can also have different
sub-domains, like "assets.somesite.com". So what does the browser use to
determine this "host" value?
 
And, is there anything in a response-header - from the original request -
that the browser uses to set this value? So, for a real example, if you go
to "www.lordandtaylor.com", when the page is loaded and parsed, the very
first <script>'s "src" is "/wcsstore/dojoHBC/dojo/dojo.js". So how does the
browser know that the "host" value should be "www.lordandtaylor.com" here?
 
Then, a few <script>s later, a request goes to
"http://1.shrd.lordandtaylor.com" (absolute URL), and the immediately
following request again has a relative path of
"/wcsstore/HBCStorefrontAssetStore/javascript/jquery.min.js"; again, how
does the browser know to - again - use "www.lordandtaylor.com" for the
"host" value for this second request?
 
This area is confusing for me and I don't have the necessary knowledge to
understand the inner workings here.
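
To make the example above concrete, here is a minimal sketch of the behavior
I am observing, using Python's urllib.parse.urljoin as a stand-in for the
browser. It assumes that the base for resolution is the URL of the page that
was loaded and that the page was fetched over plain http; both are my
assumptions, not something I have confirmed in WebKit. The names page_url,
srcs and resolve are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: the base is the URL of the page that was loaded
# (scheme and trailing slash are guesses for illustration).
page_url = "http://www.lordandtaylor.com/"

# The three "src" values from the example, in the order they are requested.
srcs = [
    "/wcsstore/dojoHBC/dojo/dojo.js",
    "http://1.shrd.lordandtaylor.com",
    "/wcsstore/HBCStorefrontAssetStore/javascript/jquery.min.js",
]


def resolve(base, ref):
    """Return (absolute URL, host part) for a reference resolved against base."""
    absolute = urljoin(base, ref)
    return absolute, urlsplit(absolute).netloc


resolved = [resolve(page_url, s) for s in srcs]
for absolute, host in resolved:
    print(host, absolute)
```

In this sketch each reference is resolved against page_url independently, so
the absolute URL in the middle has no effect on the relative path that
follows it. That matches what I see in the request log, but I don't know
whether it is how WebKit actually does it.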
 
If we do end up knowing how this "host" value is arrived at, how can we -
from within JavaScript - set it so that, as the page is parsed and rendered,
the rendering engine knows what this "host" should be? Is there a global
variable, a property on window, or some other global setting that can be set
via an API from JavaScript? Obviously the rendering engine (or some other part of
the browser) must know what it is so that the Request-header is correctly
set when the various HTTP GET requests - to fetch images, CSS files,
JavaScript files, etc. - are invoked. Would you know how we can set this
"host" value?
 
And, as I mentioned earlier, if it's not "host" that determines how relative
paths are resolved, then what is?
 
I'm trying to load a Website through a proxy, to bypass the same-origin
restrictions, and have it fully rendered inside an <iframe>. This is not for
any nefarious purpose such as clickjacking; we're building a
consumer-facing app that features a way to view and browse any Website from
the app, with some other features. And the app (a hybrid mobile app) runs
inside embedded WebKit.
 
I know this can be done: A few companies have successfully implemented such
a feature, for use cases such as customer service, co-browsing, etc. Every
company that I know of that has successfully done this has been acquired -
presumably for exactly this feature - and so their IP is clearly a trade
secret and can't easily (if at all) be obtained.
 
If what we're trying to do can't be done using proxies, is there another
approach that you would suggest, one that would result in the same ultimate
experience - being able to load any given Website into some container and
have it be fully functional, with all links traversable and all assets loaded?
 
Thanks for any and all insights, suggestions, and general help!
 
Lew