[webkit-help] Building a simple web crawler with webkit?

tonikitoo (Antonio Gomes) tonikitoo at gmail.com
Tue Sep 1 05:15:39 PDT 2009


Are you interested in fetching web page content only, or does how WebKit
would lay it out also matter for your "crawler"?

I am asking because, even if you have no UI, WebKit would not just fetch
the page source (and its associated resources) but also parse,
decode, render, and run all the other steps involved. These could be a
performance bottleneck for you if you only care about
fetching the page source/content (which is usually what a crawler cares
about).
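
To illustrate the fetch-only case: if the page source and its links are all
you need, a plain HTTP client plus an HTML parser can do the job with no
layout or rendering at all. Below is a rough sketch in Python (my choice of
language, not something from this thread). The names fetch, extract_links
and crawl, and the User-Agent string, are made up for illustration. It
ignores robots.txt and rate limiting, and it will not see content that is
generated by JavaScript, which is the kind of case where a real engine such
as WebKit starts to matter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the links found in html, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10):
    """Download one page and return its source as text (no rendering)."""
    # The User-Agent value is a placeholder, not a real crawler identity.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, max_pages=10, fetcher=fetch):
    """Breadth-first crawl; returns a dict mapping URL to page source."""
    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetcher(url)
        except OSError:
            # urllib's URLError/HTTPError are OSError subclasses; skip the page.
            continue
        pages[url] = html
        queue.extend(extract_links(html, url))
    return pages
```

Calling crawl("http://example.com/") would then return the source of up to
max_pages pages reachable from the start URL. The fetcher argument is only
there so the download step can be swapped out, for instance when testing
without network access.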

Please be more specific about your needs.

On Fri, Aug 28, 2009 at 9:02 PM, n179911<n179911 at gmail.com> wrote:
> On Tue, Aug 25, 2009 at 10:53 PM, Nevo<sakur.deagod at gmail.com> wrote:
>>
>>
>> 2009/8/26 Dan <dan at dancryer.com>
>>>
>>> Hi list,
>>> Just posted this to webkit-dev, and was advised that this is a better list
>>> for the question. Sorry if this is a little vague... but, does anyone have
>>> any general guidance as to where I'd start with webkit if I wanted to build
>>> a headless web client, along the lines of a crawler / bot, on top of it?
>>> Would I be best to use individual parts of the code, or implement a browser
>>> and hide the UI side of it?
>>> I'm not much of a C++/ObjC developer, so I can't begin to expect to be
>>> able to do this immediately, but any tips you can give would be greatly
>>> appreciated.
>>
>>  You might take a look at WebKit's Web Inspector, which lets you view the DOM
>> hierarchy as a tree, so you can get a good sense of how WebCore
>> manipulates/traverses a web page.
>>
>
> I have a related question on this kind of WebKit usage as well. How
> can we run WebKit without any display (e.g. an X server on Linux)? For
> web crawling purposes, it does not need to display anything on screen.
> Is there a WebKit configuration for this kind of thing?
>
> Thank you.
>
>
>
>> Nevo
>>
>>
>> _______________________________________________
>> webkit-help mailing list
>> webkit-help at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-help
>>
>>



-- 
--Antonio Gomes

