Determine whether all resources have been successfully loaded
Hi WebKit developers,

I posted this question a couple of weeks ago on the webkit-sdk mailing list, but did not get an answer. I assume that subscribers of this list are more familiar with the implementation of WebKit and might know the answer.

I'm working on a command-line application that uses WebKit for creating PDF documents from webpages (http://wkpdf.plesslweb.ch). While I found WebKit intuitive to use, I'm stuck with a seemingly simple problem: how can I check whether all resources referenced by a webpage have been loaded successfully?

To get information about the loading of frames and resources, I'm implementing the following delegate methods.

WebResourceLoadDelegate methods:
- webView:identifierForInitialRequest:fromDataSource:
- webView:resource:didFinishLoadingFromDataSource:
- webView:resource:didFailLoadingWithError:fromDataSource:

WebFrameLoadDelegate methods:
- webView:didStartProvisionalLoadForFrame:
- webView:didFinishLoadForFrame:
- webView:didCommitLoadForFrame:
- webView:didFailLoadWithError:forFrame:
- webView:didFailProvisionalLoadWithError:forFrame:

The problem is that didFinishLoadingFromDataSource is called _even_ if not all resources could be loaded successfully. I have created a minimal test case to show you what I mean. The main function creates a WebView, adds delegates, and loads the URL request like this:

    WebView * webView = [[WebView alloc] initWithFrame:NSMakeRect(0,0,800,600)
                                             frameName:@"myFrame"
                                             groupName:@"myGroup"];
    [webView setFrameLoadDelegate: controller];
    [webView setResourceLoadDelegate: controller];
    NSURLRequest * request = ... @"test_missing_frame.html"
    [[webView mainFrame] loadRequest:request];

The HTML file test_missing_frame.html is a frameset referencing two frames (frame1.html, frame2.html), where frame2 references a missing resource. The code for these HTML files is shown below.
I have instrumented all delegate methods to generate a trace listing the call sequence and the most important arguments to the delegate calls. identifierForInitialRequest returns an identifier that contains a unique number and the name of the resource:

       didStartProvisionalLoadForFrame myFrame
       identifierForInitialRequest -> (id:0 (http://xxx/test_missing_frame.html))
       didCommitLoadForFrame myFrame
       didStartProvisionalLoadForFrame f1
       identifierForInitialRequest -> (id:1 (http://xxx/frame1.html))
       didStartProvisionalLoadForFrame f2
       identifierForInitialRequest -> (id:2 (http://xxx/frame2.html))
       didFinishLoadingFromDataSource id: id:0 (http://xxx/test_missing_frame.html)
       didCommitLoadForFrame f1
       didFinishLoadForFrame f1
       didFinishLoadingFromDataSource id: id:1 (http://xxx/frame1.html)
       didCommitLoadForFrame f2
       identifierForInitialRequest -> (id:3 (http://xxx/missing.jpg))
       didFinishLoadingFromDataSource id: id:2 (http://xxx/frame2.html)
    -> didFinishLoadForFrame f2
    -> didFinishLoadForFrame myFrame
    -> didFinishLoadingFromDataSource id: id:3 (http://xxx/missing.jpg)

This call sequence shows 3 surprises (or bugs):

1) didFinishLoadingFromDataSource is also called for the _missing_ resource. I would have expected that didFailLoadingWithError is called.

2) didFinishLoadForFrame myFrame is called before all resources of frame f2 have been loaded.

3) didFinishLoadForFrame is called although the load was not successful. This contradicts Apple's documentation that says: "This method is invoked when a location request for frame has successfully; that is, when all the resources are done loading." (Although the first part of this sentence is unclear and incomplete.)

What is the recommended way to find out whether all resources have been loaded successfully? Did I misunderstand something or is this a bug in WebKit? Should I file a bug report with WebKit?

I would appreciate any help.
Cheers,
Christian

-- test_missing_frame.html ---------------

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>missing resources test</title>
</head>
<frameset cols="200,*">
<frame src="frame1.html" name="f1">
<frame src="frame2.html" name="f2">
</frameset>
</html>

-- frame1.html ---------------------------

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
</head>
<body>
<h1>This is frame1</h1>
<p>No resource is missing for this frame!</p>
</body>
</html>

-- frame2.html ---------------------------

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
</head>
<body>
<h1>This is frame2</h1>
<p>There is 1 resource missing for this frame.<img src="missing.jpg"/></p>
</body>
</html>

--
Christian Plessl
christian@plesslweb.ch
http://plesslweb.ch
On Aug 3, 2007, at 1:50 AM, Plessl Christian wrote:
1) didFinishLoadingFromDataSource is also called for the _missing_ resource. I would have expected that didFailLoadingWithError is called
I don't think this is a bug. When confronted with a bad URL, I presume the web server passes back an error page. From the point of view of the WebKit API, loading that error page is a successful load; many clients want to handle error pages just as they would normal content. You can detect that it's an error from your point of view by looking at the NSURLResponse object. If it's an NSHTTPURLResponse, you can read its statusCode.
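A minimal sketch of that status-code check, assuming a hypothetical delegate class named LoadChecker with an illustrative failures array (neither is part of WebKit). It relies on the WebResourceLoadDelegate method webView:resource:didReceiveResponse:fromDataSource:, which is not among the methods listed in the original question, and it treats any HTTP status of 400 or above as a failure, which is an assumption about what "failed" should mean for a given application.

```objc
#import <Foundation/Foundation.h>
#import <WebKit/WebKit.h>

// Hypothetical resource-load delegate; the class name and the "failures"
// property are illustrative and not part of WebKit.
@interface LoadChecker : NSObject
@property (nonatomic, strong) NSMutableArray *failures;
@end

@implementation LoadChecker

- (instancetype)init {
    if ((self = [super init])) {
        _failures = [NSMutableArray array];
    }
    return self;
}

// Called when the response for a resource arrives. An HTTP error page
// still counts as a successful load for WebKit, so the status code is
// inspected here.
- (void)webView:(WebView *)sender
          resource:(id)identifier
didReceiveResponse:(NSURLResponse *)response
    fromDataSource:(WebDataSource *)dataSource
{
    if ([response isKindOfClass:[NSHTTPURLResponse class]]) {
        NSInteger status = [(NSHTTPURLResponse *)response statusCode];
        // Assumption: 4xx and 5xx responses are treated as failed resources.
        if (status >= 400) {
            [self.failures addObject:
                [NSString stringWithFormat:@"HTTP %ld: %@",
                                           (long)status, [response URL]]];
        }
    }
}

// Called for failures where no usable response was received at all,
// for example an unreachable host.
- (void)webView:(WebView *)sender
               resource:(id)identifier
didFailLoadingWithError:(NSError *)error
         fromDataSource:(WebDataSource *)dataSource
{
    [self.failures addObject:
        [NSString stringWithFormat:@"Error: %@", [error localizedDescription]]];
}

@end
```

An instance would be installed with [webView setResourceLoadDelegate:checker]; an empty failures array after loading then suggests, under the assumptions above, that no resource returned an HTTP error.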
2) didFinishLoadForFrame myFrame is called, before all resources of frame f2 have been loaded
This might be a bug. I believe the design is that each frame is independent and parent frames don't necessarily wait for subframes. But I could be wrong. It might be a bug.
3) didFinishLoadForFrame is called although the load was not successful. This contradicts Apple's documentation that says: "This method is invoked when a location request for frame has successfully; that is, when all the resources are done loading." (Although the first part of this sentence is unclear and incomplete)
That documentation seems pretty unclear, and quite possibly wrong. You can report bugs in the Apple documentation at <http://bugreport.apple.com>.
What is the recommended way to find out, whether all resources have been loaded successfully?
Besides didFinishLoadForFrame, WebKit doesn't offer another built-in way to detect that all resources have been loaded.

"When have all the resources been loaded successfully?" is not a specific-enough question. At any point a timer can fire and JavaScript can add, say, a new <img> element to a page, creating a new unloaded resource. So there's always a potential for additional loading in the future. In the general case, there's no single "this page is entirely done" point in time. And there is also considerable ambiguity about what constitutes failure.

The concept of "completely done loading" that's used for Safari's status bar is the progress computation that's done by WebView. It sends out a WebViewProgressFinishedNotification when "the load has finished"; I think this is the same timing as didFinishLoadForFrame in the frame load delegate.

-- Darin
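As a rough illustration of listening for that progress notification, here is a sketch built around a hypothetical helper named ObserveProgressFinished (the function name and the handler parameter are assumptions for illustration, not WebKit API). It only registers a block with NSNotificationCenter; as noted above, scripts may still start further loads after the notification has been posted.

```objc
#import <Foundation/Foundation.h>
#import <WebKit/WebKit.h>

// Hypothetical helper: runs "handler" each time the given WebView posts
// WebViewProgressFinishedNotification. Returns the observer token, which
// the caller passes to -removeObserver: when it is no longer needed.
static id ObserveProgressFinished(WebView *webView, void (^handler)(void)) {
    // With queue:nil the block runs synchronously on the posting thread.
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:WebViewProgressFinishedNotification
                    object:webView
                     queue:nil
                usingBlock:^(NSNotification *note) {
                    if (handler) {
                        handler();
                    }
                }];
}
```

A PDF converter could start rendering from inside the handler, possibly after an extra delay chosen by the application, and unregister afterwards with [[NSNotificationCenter defaultCenter] removeObserver:token].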
On Aug 3, 2007, at 12:54 PM, Darin Adler wrote:
On Aug 3, 2007, at 1:50 AM, Plessl Christian wrote:
2) didFinishLoadForFrame myFrame is called, before all resources of frame f2 have been loaded
This might be a bug.
I believe the design is that each frame is independent and parent frames don't necessarily wait for subframes. But I could be wrong. It might be a bug.
I think this is a bug (I believe design intent is to wait for subframe resources), and it looks like didFinishLoadForFrame for frame f2 itself did not wait for f2's resources.

Regards,
Maciej
Hi Darin,

Thanks a lot for your insightful reply. You are perfectly right: in general there is no specific point in time when a web page can be considered fully loaded, since new loads can be triggered by JavaScript etc. For my specific application, which converts HTML pages to PDF, I will have to find a good heuristic to determine the point where I start the PDF generation.
1) didFinishLoadingFromDataSource is also called for the _missing_ resource. I would have expected that didFailLoadingWithError is called
I don't think this is a bug. When confronted with a bad URL, I presume the web server passes back an error page. From the point of view of the WebKit API, loading that error page is a successful load
Good point, I did not think about missing resources like that. Your suggestion to evaluate the NSHTTPURLResponse seems like a good solution to the problem.
2) didFinishLoadForFrame myFrame is called, before all resources of frame f2 have been loaded
This might be a bug.
I believe the design is that each frame is independent and parent frames don't necessarily wait for subframes. But I could be wrong. It might be a bug.
I will file a bug report after I have checked that the same error is still present in the latest version of WebKit. What is the simplest way to check this? Will downloading the nightly WebKit build and setting DYLD_FRAMEWORK_PATH to /Applications/WebKit.app/Contents/Resources/ work?
3) didFinishLoadForFrame is called although the load was not successful. This contradicts Apple's documentation that says: "This method is invoked when a location request for frame has successfully; that is, when all the resources are done loading." (Although the first part of this sentence is unclear and incomplete)
That documentation seems pretty unclear, and quite possibly wrong. You can report bugs in the Apple documentation at <http://bugreport.apple.com>.
I will report a bug with Apple.

Thanks a lot for your help,
Christian

--
Christian Plessl
christian@plesslweb.ch
http://plesslweb.ch
On Aug 4, 2007, at 4:36 AM, Plessl Christian wrote:
What is the simplest way to check this? Will downloading the nightly webkit build and setting the DYLD_FRAMEWORK_PATH to /Applications/WebKit.app/Contents/Resources/ work?
Yes, that should work.

-- Darin
Participants (3): Darin Adler, Maciej Stachowiak, Plessl Christian