[webkit-help] Building a simple web crawler with webkit?

n179911 n179911 at gmail.com
Tue Sep 1 09:02:30 PDT 2009

I also care about HTTP redirects, JavaScript execution, and CSS lookup
(e.g. removing hidden elements from the DOM).
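
To make the "remove the hidden element" requirement concrete, here is a minimal Python sketch of how far a fetch-only crawler gets without a rendering engine. It is not WebKit code; the names `is_hidden`, `VisibleTextExtractor` and `visible_text` are made up for illustration. It only sees inline `hidden` attributes and inline `style` declarations, and it assumes well-formed markup with explicit closing tags. Rules from stylesheets and anything changed by JavaScript are invisible to it, which is the part that needs a real engine.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_hidden(attrs):
    """Return True if inline attributes alone mark the element as hidden."""
    attr_map = {name: (value or "") for name, value in attrs}
    if "hidden" in attr_map:
        return True
    style = attr_map.get("style", "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class VisibleTextExtractor(HTMLParser):
    """Collect text that is not inside an inline-hidden element."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Script and style bodies are never visible text, so skip them too.
        if self.hidden_depth or tag in ("script", "style") or is_hidden(attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text of `html` with inline-hidden subtrees dropped."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<p>shown</p><div style="display: none"><span>secret</span></div>'
            '<p hidden>gone</p>')
    print(visible_text(page))
```

An element hidden by a class rule in an external stylesheet, or by a script after load, would still be reported as visible here.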

On Tue, Sep 1, 2009 at 5:15 AM, tonikitoo (Antonio
Gomes)<tonikitoo at gmail.com> wrote:
> Are you interested in fetching web page content only, or does how WebKit
> would lay it out also matter for your "crawler"?
> I am asking because, even with no UI, WebKit would not just get
> the page source (and its associated resources) but also parse,
> decode, render, and perform all the other steps involved. These could be a
> potential performance bottleneck if you only care about
> fetching web page source/content (which is usually what a crawler cares
> about).
> Please be more specific about your needs.
> On Fri, Aug 28, 2009 at 9:02 PM, n179911<n179911 at gmail.com> wrote:
>> On Tue, Aug 25, 2009 at 10:53 PM, Nevo<sakur.deagod at gmail.com> wrote:
>>> 2009/8/26 Dan <dan at dancryer.com>
>>>> Hi list,
>>>> Just posted this to webkit-dev, and was advised that this is a better list
>>>> for the question. Sorry if this is a little vague... but, does anyone have
>>>> any general guidance as to where I'd start with webkit if I wanted to build
>>>> a headless web client, along the lines of a crawler / bot, on top of it?
>>>> Would I be best to use individual parts of the code, or implement a browser
>>>> and hide the UI side of it?
>>>> I'm not much of a C++/ObjC developer, so I can't begin to expect to be
>>>> able to do this immediately, but any tips you can give would be greatly
>>>> appreciated.
>>>  You might take a look at WebKit's Web Inspector, which lets you view the DOM
>>> hierarchy as a tree, so you can get a good sense of how WebCore
>>> manipulates/traverses a web page.
>> I have a related question about this kind of WebKit usage as well. How
>> can we run WebKit without any display (e.g. an X server on Linux)? For
>> web crawler purposes, it does not need to display anything on screen.
>> Is there a WebKit configuration for this kind of thing?
>> Thank you.
>>> Nevo
>>> _______________________________________________
>>> webkit-help mailing list
>>> webkit-help at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-help
> --
> --Antonio Gomes
