[webkit-gtk] Getting text for the region

Niranjan Rao nhrdls at gmail.com
Sun Mar 31 11:14:02 PDT 2013

Hi there,

Looking for hints/guidance about programmatically getting the text for a
given region. I know ranges can be used, but I am not sure how to implement
that in my scenario. This is on Ubuntu 12.10 using Python GTK.

I am working on web page scraping and trying to infer the text contents of
a web page. In the normal scenario, a piece of logic determines an
"interesting" input text box on the page. My goal is to copy the 3-4 lines
of text that appear just before the input box.

In mouse/keyboard terms, this is equivalent to clicking just before the
input box, pressing shift-up arrow 3-4 times, and then pressing shift-home
and ctrl-c to copy the region. The up arrow works nicely because line-based
selection works regardless of text size, so I get exactly the 3-4 lines
that I am interested in.
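
To make the keyboard sequence above concrete, here is a rough, untested
sketch of how it might be expressed in script rather than key events. It
assumes the page's JavaScript engine supports Selection.modify() (a
non-standard call that WebKit implements) and that the input box can be
found with a CSS selector. The helper name build_selection_script, the
selector, and the trick of passing the result back through document.title
are all assumptions made for illustration, not an established recipe.

```python
import json


def build_selection_script(css_selector, lines_up=3):
    """Return JavaScript that selects the rendered lines before an element.

    Hypothetical helper: it only builds a script string; running it is up
    to the embedding application.
    """
    return """
(function () {
    var target = document.querySelector(%(selector)s);
    if (!target) { return ''; }
    var sel = window.getSelection();
    var range = document.createRange();
    // Place a collapsed caret just before the input box ("click before it").
    range.setStartBefore(target);
    range.collapse(true);
    sel.removeAllRanges();
    sel.addRange(range);
    // Equivalent of pressing shift-up arrow N times.
    for (var i = 0; i < %(lines)d; i++) {
        sel.modify('extend', 'backward', 'line');
    }
    // Equivalent of shift-home: extend to the start of that line.
    sel.modify('extend', 'backward', 'lineboundary');
    var text = sel.toString();
    // Assumption: the host reads the result back through the page title.
    document.title = text;
    return text;
})();
""" % {"selector": json.dumps(css_selector), "lines": int(lines_up)}


if __name__ == "__main__":
    # 'input[name="q"]' is a made-up selector for the "interesting" box.
    print(build_selection_script('input[name="q"]', lines_up=4))
```

With the WebKit1 Python bindings, the string could presumably be passed to
view.execute_script(...) and the text read back with
view.get_main_frame().get_title(). I have not verified this, and the title
round trip may collapse the line breaks.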

In the HTML markup, the text may not be close to the input box because of
the markup structure, CSS, and JavaScript-built content. So I have to
depend on the text that is actually displayed by WebKit. My attempts at
defining a range and calling its to_string method returned too much
information, including JavaScript/CSS code. I want to get what a user
would get by selecting the region manually.
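
A likely reason for the extra JavaScript/CSS is that the string form of a
DOM range concatenates every text node inside it, including the text
children of <script> and <style> elements, which are never rendered. The
sketch below is not WebKit code. It uses only the Python 3 standard
library and a made-up HTML snippet (SAMPLE) to mimic the difference
between "all text nodes" and "text nodes outside script/style". The class
and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, optionally skipping <script>/<style> contents."""

    def __init__(self, skip_unrendered):
        super().__init__()
        self.skip_unrendered = skip_unrendered
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self.skip_unrendered and self._skip_depth:
            return  # text that a browser would never display
        if data.strip():
            self.chunks.append(data.strip())


def collect_text(html, skip_unrendered):
    """Join the collected text nodes with single spaces."""
    parser = TextCollector(skip_unrendered)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up markup: a label, inline script and style, then the input box.
SAMPLE = (
    "<p>Enter your name:</p>"
    "<script>var x = 1;</script>"
    "<style>p { color: red; }</style>"
    '<input name="who">'
)

if __name__ == "__main__":
    print(collect_text(SAMPLE, skip_unrendered=False))
    print(collect_text(SAMPLE, skip_unrendered=True))
```

This only filters by tag name. It does not account for CSS such as
display:none or for content positioned elsewhere on the page, which is why
I still need what WebKit actually renders.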

Is there any way I can do this at the WebKit level rather than injecting
mouse/keyboard events? After all, WebKit knows the element positions and
exposes some of that information through the DOM/CSS APIs. The biggest
problem I see is defining the correct range, since copy/paste otherwise
works perfectly when done manually. And to define the range, I need to
pick the right elements.


