[webkit-dev] libsoup

Tue Feb 24 13:19:30 PST 2009

On Tue, Feb 24, 2009 at 6:34 PM, Christian Dywan <christian at twotoasts.de> wrote:

> Hey,
>
> note that some time ago the WebKitGtk hackers decided to drop support
> for libcurl and always work with libsoup.

 ohh.  ... same difference? :)

> If you check out the source
> from the repository by now, you have to build with libsoup. The idea is
> that you want to be able to access the network implementation and by
> far the most efficient approach is to expose the SoupSession* and
> SoupMessage* objects where appropriate.
>
> So you don't have to fork or customize anything. Support for custom URI
> schemes is planned in libSoup for the future.

 okaaay.  excellent.  run-time loadable?  so it would be possible to
e.g. do a python plugin?

> Furthermore GLib is
> going to support networking functionality.

 intriguing.  if that goes through to gobject so you could pass in an
object into webkit (or libsoup) which conforms to a standard
[networking] interface that would be _very_ interesting.

> And you don't depend on
> WebKit supporting a particular feature. If your use case is missing,
> file a feature request, the libSoup maintainer is a nice person :)

 :)

 run-time access to anything-conceivable, in any language.
standardised interface of course.

 so, imagine an application which ohh, i dunno - redefines what
http:// actually _is_.  or it performs virus-checking or
spam-filtering, or does rewrites, stripping out certain kinds of
javascript before letting it get to the webkit application.

or, it takes care of this SSL security bug that was announced last
week, by pre-analysing the URL and forcing certain web sites to go
through https instead of the silly mixture of http + https as is done
at the moment.
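
a rough sketch of that https-forcing rewrite in python.  the host list and the force_https() name are made up for illustration, and nothing here is an existing libsoup or webkit hook:

```python
from urllib.parse import urlsplit, urlunsplit

# hypothetical list of sites that must always be fetched over https
FORCE_HTTPS_HOSTS = {"bank.example.com", "mail.example.com"}


def force_https(url):
    """Return url rewritten to https:// if its host is on the list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in FORCE_HTTPS_HOSTS:
        netloc = parts.netloc
        # an explicit :80 would be wrong for https, so drop it
        if parts.port == 80:
            netloc = netloc.rsplit(":", 1)[0]
        parts = parts._replace(scheme="https", netloc=netloc)
    # anything else (other hosts, ftp://, already https) passes through
    return urlunsplit(parts)
```

the point is only that the decision is made per-URL, before any request goes out, which a proxy sitting behind the browser cannot do reliably.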

that sort of thing is neither libsoup's job nor is it webkit's job,
but it's still essential and/or conceivable, and the design of neither
libsoup nor webkit should restrict or impose on anyone who wants to do
that kind of complex data analysis.

at the moment, the only way to do that kind of thing is to have a
"proxy" - but then you have to point the web browser at the proxy, and
also, if you get any URLs that the proxy isn't designed to cope with,
or if libsoup tries to load URLs directly because they are e.g.
ftp:// and the web browser was only configured to point at an http://
proxy, you're screwed.

so "putting in a proxy" doesn't cut the mustard.

so the feature i'd like to see in libsoup is: _everything_ goes
through to a dynamically-loadable module and i _mean_ everything.  if
there isn't a dynamic module loaded (by libsoup) then the default
behaviour is the "current" behaviour.  otherwise, the dynamic loadable
module gets the chance to call all the shots.
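
as a sketch of the dispatch rule (in python rather than C, and with every name invented, since libsoup has no such hook at the moment): if no module is loaded, fall through to the default; otherwise the module sees every URL and decides whether to handle it or hand it back.

```python
_handler = None  # hypothetical: the module the user chose to load, if any


def default_fetch(url):
    """Stand-in for the current built-in behaviour."""
    return ("default", url)


def load_handler(handler):
    """Install (or, with None, remove) the user's module at runtime."""
    global _handler
    _handler = handler


def fetch(url):
    """Every request comes through here, whatever the scheme."""
    if _handler is None:
        return default_fetch(url)
    # the module calls the shots; it is given default_fetch so that it
    # can delegate anything it does not want to deal with
    return _handler(url, default_fetch)
```

passing default_fetch to the handler is one possible design choice: it lets a module override only the URLs it cares about without reimplementing everything else.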

in that way, a dynamic loadable module - chosen by the _user_, even at
runtime - can do whatever it likes to the URLs. to the data.

personally i'd use that to create a python module which merges file://
in with http:// in order to load the "startup" html file from the
user's desktop with the AJAX locations i.e. make
file:///home/user/myapplication/ a legal location.
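
a minimal sketch of the "check the filesystem first" part, assuming a hypothetical local_path_for() helper and using the directory above as the root.  it only maps an http:// URL onto a file under that root and refuses anything that escapes it:

```python
import os
from urllib.parse import urlsplit, unquote

# assumed application root, taken from the example location above
APP_ROOT = "/home/user/myapplication"


def local_path_for(url, root=APP_ROOT):
    """Return the local file an http:// URL maps to, or None."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    rel = unquote(parts.path).lstrip("/")
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, rel))
    # refuse paths such as /../../etc/passwd that leave the root
    if os.path.commonpath([candidate, root_real]) != root_real:
        return None
    # only regular files count; None means "not ours, use the network"
    return candidate if os.path.isfile(candidate) else None
```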

i'd also make it possible for python-based CGI scripts to be
executable "as if" they were http.

in other words (and this only just occurred to me!) i'd cut out the
middle-man - the web server - entirely from the loop.

the situation is this (bear with me....)

* i've got pywebkitgtk doing DOM manipulation, now, so it's
possible to (like Adobe AIR / Flex) directly manipulate the DOM
using languages other than javascript.  including, from python, asking
for Web Pages using XMLHttpRequest.  actually,
gdom_xml_http_request_open() because it's gobject bindings.

* via the python bindings, executing gdom_xml_http_request_open()
results in calls through to libsoup, which .... ah no, you see it
_doesn't_ result in a call to libsoup, because i just loaded the
application from file:///home/lkcl/apps/testajaxapp.html and so WebKit
goes "HAH! GOTCHA!  THAT'S A SECURITY RISK!  LOADING AJAX FROM file://
IS BAAAAAAD".

* so, first off, i have to create a "fake" version of http:// using my
very own special (non-existent at the moment) libsoup
dynamically-loadable module - written in python - which goes "http://
? no no nooo, i don't think so - let's check the filesystem
/home/lkcl/apps/ first".

* then, when gdom_xml_http_request_open() goes "http://apps/server",
that is now a _completely_ non-existent URL, but that's ok _because_:

* again, the python module that's loaded by libsoup goes "http://apps?
that's for meee!" and it then looks in /home/lkcl/apps/server and
guess what?  it finds that there's a cgi-bin script (or a django app,
or a WSGI python app) which the python module DIRECTLY RUNS in order
to obtain the http content!
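
the last two steps can be sketched in plain python: a handler that claims http://apps/... and runs a WSGI app in-process, with no socket and no web server.  hello_app, LOCAL_APPS and handle() are all hypothetical names, and how such a handler would be plugged into libsoup is exactly the part that doesn't exist yet:

```python
import io
import sys
from urllib.parse import urlsplit


def hello_app(environ, start_response):
    """Hypothetical WSGI app standing in for /home/lkcl/apps/server."""
    body = ("hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# hypothetical table: first path component under http://apps/ -> WSGI app
LOCAL_APPS = {"server": hello_app}


def handle(url, method="GET", body=b""):
    """Return (status, headers, body) for http://apps/..., else None."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname != "apps":
        return None  # not ours: leave it to the default behaviour
    name, _, rest = parts.path.lstrip("/").partition("/")
    app = LOCAL_APPS.get(name)
    if app is None:
        return ("404 Not Found", [], b"")
    environ = {
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "/" + name,
        "PATH_INFO": "/" + rest,
        "QUERY_STRING": parts.query,
        "SERVER_NAME": "apps",
        "SERVER_PORT": "80",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    response = {}
    written = []  # data sent through the legacy write() callable

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers
        return written.append

    chunks = app(environ, start_response)
    try:
        data = b"".join(chunks)
    finally:
        if hasattr(chunks, "close"):
            chunks.close()
    return (response["status"], response["headers"], b"".join(written) + data)
```

so handle("http://apps/server/data") calls hello_app directly and hands back its status, headers and body, while any URL outside http://apps/ returns None and would go out over the network as usual.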

in other words, you're _cutting out_ the web server!

the python module (dynamically loaded by libsoup) _is_ the web server.

this is the final piece in the puzzle that i've been racking my brains
over.  it's all very well to say "oh yes, pyjamas-desktop (or any
other Webkit-based application framework) is _wonderful_, except...
err.. you still need to install a web server in order to get round
webkit enforcing security restrictions on file:// URLs."

that doesn't go down too well :)

so there is a _hell_ of a lot that can be done - a heck of a lot of
options open up - if you make the URIs go through to dynamic languages
such as python, perl, you name it.

l.