[webkit-dev] TimerWindowWndProc and dangling page instances

Van Den Berghe, Vincent Vincent.VanDenBerghe at bvdinfo.com
Fri Sep 13 01:34:01 PDT 2013


Hello Mark,

Thank you for your tips. I am painfully aware of #1 and have compensated for it in much the same way you have. Tip #2 doesn't really apply to my scenario.
I'm embedding the Wincairo build of WebKit to generate PDF files from web pages in an ASP.NET application. Webkit isn't really designed for such a scenario but with some careful patching, a solution can be constructed which is "reasonably reliable".   However, these patches touch on WebKit's fundamental design decisions and I'm reluctant to submit them without first some validation of someone who knows the code better.   Some are more hacks than patches.
So I became a member of the webkit-dev list and submitted the result of my last frustrating bug hunt to see what happens. And it looks like it's not totally worthless.  As an added bonus, I can show it to my manager to explain why I was doing these last few days and why I was in such a foul disposition. Yay!

If people are interested, I'll post the rest of my experiences here, with the solutions I came up with. My employer's needs have priority, however.


Vincent

From: Salisbury, Mark [mailto:mark.salisbury at hp.com]
Sent: Friday, September 13, 2013 10:07 AM
To: Van Den Berghe, Vincent; webkit-dev at lists.webkit.org
Subject: RE: TimerWindowWndProc and dangling page instances

Vincent,

I've also embedded WebKit in a .NET application, and I've seen similar issues - not exactly what you're seeing though.

Maybe some tips I've learned could be of help to you or others:

*         Calling DestroyWindow will cause WebView::close() to be called (when WM_DESTROY is received), but it does not cause WebView's destructor to be called, that doesn't happen until the RCW holding the last COM reference is destroyed and the COM ref count goes to 0.  This happens on the GC thread (.NET internals), and deleting core webkit objects from a thread besides the main thread can lead to undefined behavior.  I'd see crashes from concurrent javascript GCs.  I fixed this by marshaling the delete call back to the main thread (in every single COM exposed class).

*         WebView's destructor calls DestroyWindow() on a tool tip window which may or may not have been destroyed (at least on WinCE).  When embedding in a .NET application, there can be a decent gap between the time the WebView's window is destroyed and the destructor is called.  It's possible that in the instances when Windows destroyed the tool tip window (because it is a child of the webview window) that the tool tip handle is assigned to another window by the time WebView's destructor is called.  You can guess what happens then when WebView's destructor deletes the tool tip window.  Windows in the system disappear - sometimes causing a crash.  The only fix I could figure out what to check to see if the window is still a tool tip window before deleting it.

I don't know if anyone wants to see a patch land for either of these problems, they're confined to embedded WebKit on Windows under .NET...

Mark

From: webkit-dev-bounces at lists.webkit.org<mailto:webkit-dev-bounces at lists.webkit.org> [mailto:webkit-dev-bounces at lists.webkit.org] On Behalf Of Van Den Berghe, Vincent
Sent: Friday, September 13, 2013 1:43 AM
To: webkit-dev at lists.webkit.org<mailto:webkit-dev at lists.webkit.org>
Subject: [webkit-dev] TimerWindowWndProc and dangling page instances

Hello,

Background
---------------
I'm currently embedding WebKit in a .NET application, which required me to solve a whole set of issues that are not relevant here. However, I've stumbled upon a scenario which I suspect is *really* problematic (but I can be wrong). Before patching anything, I'd like to submit this to greater minds for validation. The version of WebKit to which this applies is irrelevant: any recent version will do.

Problem
----------
When I'm finish with a WebView instance and before I'm releasing the COM object, I close it by doing a

DestroyWindow(m_viewWindow)

...where m_viewWindow is WebView's view window. Note that calling WebView::close() would work just as well.
This triggers a sequence of events that ultimately causes the destructor of the page to be called:

Page::~Page()

This works well, but when stress testing this will crash the WebKit.DLL with the following stack trace:

                WebKit.dll!WebCore::setImageLoadingSettings(WebCore::Page * page)  Line 57 + 0x3 bytes      C++
               WebKit.dll!WebCore::Settings::imageLoadingSettingsTimerFired(WebCore::Timer<WebCore::Settings> * __formal)  Line 363 + 0x8 bytes C++
               WebKit.dll!WebCore::Timer<WebCore::XMLHttpRequestProgressEventThrottle>::fired()  Line 114 + 0xb bytes C++
               WebKit.dll!WebCore::ThreadTimers::sharedTimerFiredInternal()  Line 132           C++
               WebKit.dll!WebCore::TimerWindowWndProc(HWND__ * hWnd, unsigned int message, unsigned int wParam, long lParam)  Line 111              C++

Depending on the managed environment (details are irrelevant for this discussion), the crash happens either immediately or after a while. Because of the way I embed WebKit, this usually happens when the second (or third, forth...) instance is being created.
The visible cause of the crash is the call to

WebKit.dll!WebCore::setImageLoadingSettings(WebCore::Page * page)

... with a page instance for which its destructor has already been executed.

Primary Cause
-----------------
I suspect the primary cause of a crash is the pending timer whose message is still in the message queue, that is executed just after the destruction of the page, but before any pending timers are killed. This causes a delayed execution of a function on a dangling page instance.
The problem doesn't occur when I don't call DestroyWindow or WebView::Close(), but this leads to unacceptable memory leaks.

Workaround
---------------
The minimal (but far from elegant) workaround is to add a method to WebCore::Settings:

        void clearPage()
        {
          m_page = 0;
        }


And call this method as the first line of Page's destructor:

Page::~Page()
{
    m_settings->clearPage();
    m_mainFrame->setView(0);
...


...and finally add a if(page) guard in setImageLoadingSettings like so:

static void setImageLoadingSettings(Page* page)
{
    if(page)
    for (Frame* frame = &page->mainFrame(); frame; frame = frame->tree().traverseNext()) {
        frame->document()->cachedResourceLoader()->setImagesEnabled(page->settings().areImagesEnabled());
        frame->document()->cachedResourceLoader()->setAutoLoadImages(page->settings().loadsImagesAutomatically());
    }
}

This doesn't remedy the delayed execution of the timer, but solves the problem by setting the deleted instance to NULL and making sure it's not used in the delayed call.

The questions of course are:

-          Is this behavior by design?

-          If it is, is there a way to clean up WebKit instances without causing this crashing behavior, and if so which one?

-          If it isn't, is there a more elegant workaround than the one outlined here?

Thanks for any feedback,

Vincent


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20130913/b6b2ca0e/attachment-0001.html>


More information about the webkit-dev mailing list