[webkit-gtk] Where to start on writing an offline browser?

Fri Apr 6 12:56:20 PDT 2012

I have this idea, and it may already have been done before, of a browser
that prefers offline browsing to online: one that lets you surf the web
and automatically saves every site you see to the hard drive. If you've
already visited a site, it should load the hard drive version and check
online for updates.

My current idea for how to implement this uses WebKit/GTK. When saving, it
has to convert the links to file URIs in a standardized way, such that if
you type in "google.com" in two different instances, the hard drive
version can be found both times using the URL alone. (It would save to
~/web/google.com/, for example. Gmail might be ~/web/gmail.google.com/.)
It also has to store the sites such that paths within a site are just as
easy to find. (So if reddit.com is saved to ~/web/reddit.com/,
reddit.com/r/worldnews would be saved to ~/web/reddit.com/r/worldnews/.)
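
The mapping described above could be sketched roughly as follows, in plain
C with no GLib dependency. The function name url_to_save_path, the
"index.html" file name, and the decision to drop the query string and
fragment are all assumptions made for illustration, not something WebKit
provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a URL to a path under `base`, for example
 *   "http://reddit.com/r/worldnews?sort=new"
 *     -> "<base>/reddit.com/r/worldnews/index.html"
 * The scheme, query string, and fragment are dropped.
 * Returns 0 on success, -1 if `out` is too small or nothing is left
 * of the URL after stripping.
 */
static int url_to_save_path(const char *base, const char *url,
                            char *out, size_t outlen)
{
    assert(base != NULL && url != NULL && out != NULL);

    /* Treat "://" as a scheme separator only if it comes before any
     * '/', '?' or '#', so a URL embedded in a query is not mistaken
     * for the scheme. */
    const char *p = url;
    const char *sep = strstr(url, "://");
    if (sep != NULL && strcspn(url, "/?#") == (size_t)(sep - url) + 1)
        p = sep + 3;

    /* Host plus path ends at the first '?' or '#'. */
    size_t len = strcspn(p, "?#");

    /* Drop trailing slashes so "google.com/" and "google.com" agree. */
    while (len > 0 && p[len - 1] == '/')
        len--;
    if (len == 0)
        return -1;

    int n = snprintf(out, outlen, "%s/%.*s/index.html", base, (int)len, p);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with a base of "/home/u/web", both "google.com" and
"https://google.com/" produce /home/u/web/google.com/index.html. The sketch
does not sanitize ".." segments or unusual characters, which a real
implementation would need to handle before touching the file system.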

I've been able to modify the simple example given here
<http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&sqi=2&ved=0CEoQFjAE&url=http%3A%2F%2Fwebkitgtk.org%2FCookbook%25200.1b.pdf&ei=2kV_T4fDNeWbiAKswJmwAw&usg=AFQjCNEfNCSb1r3n7tIIBoiRiAcR_KnMeA&sig2=myxbB7JBpTDguwyigU5nqQ>
to save the downloaded page where I want it on the hard drive (basically
by just calling g_file_set_contents with the GString that I got from the
web data source of the web view's main frame), but I need to be able to
catch it (maybe through a signal?) when the user clicks a link. I need to
check whether the link, once converted to a file URI, refers to an
existing file to load, or whether the site needs to be downloaded first.
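
The check itself might be sketched like this, again in plain C. The helper
name choose_load_target is hypothetical, and the local path is assumed to
have been computed from the URL already. In the WebKitGTK 1.x API, the
"navigation-policy-decision-requested" signal on WebKitWebView looks like a
candidate place to call something like this; that wiring is an assumption
and is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decide what to load for a clicked link.
 * If `local_path` can be opened for reading, write "file://<local_path>"
 * to `out` and return 1 (a saved copy exists). Otherwise copy `url` to
 * `out` and return 0 (the page still has to be downloaded).
 * Returns -1 if `out` is too small.
 * Assumption: `local_path` is absolute and contains no characters that
 * would need percent-encoding in a URI.
 */
static int choose_load_target(const char *local_path, const char *url,
                              char *out, size_t outlen)
{
    assert(local_path != NULL && url != NULL && out != NULL);

    /* Opening the file is a simple, portable existence check. */
    FILE *f = fopen(local_path, "rb");
    int cached = (f != NULL);
    if (f != NULL)
        fclose(f);

    int n = cached ? snprintf(out, outlen, "file://%s", local_path)
                   : snprintf(out, outlen, "%s", url);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return cached;
}
```

A return value of 0 would be the cue to fetch the page from the network
and save it; a return value of 1 means the file URI in `out` can be loaded
directly, with the online update check done afterwards.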

My code so far (two days' work) only modifies a small portion of the
example in the above link, so I'll just post it here. Other than the
problems I mentioned above, I'm not sure how to save the images on
websites. I tried a few things, but since nothing worked, I deleted them
from this code. Even if I had the images saved, I might need to correct
WebKit on where they're stored after reading their links!

static void loadStatusCb(WebKitWebView *web_view, GParamSpec *pspec,
                         void *context)
{
    if( webkit_web_view_get_load_status( web_view ) != WEBKIT_LOAD_FINISHED )
        return;

    WebKitWebFrame*      frame   = webkit_web_view_get_main_frame( web_view );
    WebKitWebDataSource* source  = webkit_web_frame_get_data_source( frame );
    GString*             content = webkit_web_data_source_get_data( source );
    const gchar*         title   = webkit_web_view_get_title( web_view );

    /* Guard against missing data or a missing title; with a NULL title,
       g_build_filename() would stop at "web". */
    if( content == NULL || title == NULL )
        return;

    /* Owned by GLib; must not be freed. */
    const gchar* home = g_get_home_dir();

    /* g_build_filename() returns newly allocated strings that we own. */
    gchar* saveBase = g_build_filename( home, "web", title, NULL );
    gchar* savePath = g_build_filename( saveBase, "index.html", NULL );

    g_mkdir_with_parents( saveBase, 0777 );
    /* Pass the real length so embedded NUL bytes do not truncate the page. */
    g_file_set_contents( savePath, content->str, content->len, NULL );

    g_free( saveBase );
    g_free( savePath );
}