DOM tree surgery and DOM tree destruction
Given that Node methods have parameters and return values that are variously specified to be raw pointers and PassRefPtr's, what's the best practice for DOM tree surgery? Also, what's the best way to destroy DOM trees and constituents of DOM trees? I've read the first draft of Darin Adler's "RefPtr and PassRefPtr Basics," and I've searched in WebKit source for further guidance, but I still need more help.

For example, suppose that a browser based on WebKit has loaded a Web page and parsed it, producing a DOM tree T. As an exercise, I want to perform surgery on T and then destroy both T and the constituents that the surgery removed from T, without leaking. For purposes of this exercise, the browser session itself is of no interest.

Suppose a method M takes a PassRefPtr<Node> A as a parameter, where A is the root of a subtree of T. The method will find a node X meeting certain criteria within this subtree. The method will then create a new node N and interpose it between X and its children, so that [X [Y0 ... Yn]] becomes [X [N [Y0 ... Yn]]]. N isn't to be used as either a parameter or a return value. Further, M will find a node W within T that meets certain criteria and remove it, along with all the nodes that it dominates, from T.

Suppose that A (in the method that calls M), X, Y0, ..., Yn and W are initially supplied as raw pointers via calls to firstChild() and nextSibling(). N can be declared as a RefPtr and assigned the result of createElement, with casting as required.

Logically, removeChild will apply to X with Y0 as a parameter, and so on for Y1, ..., Yn, and to the parent of W with W as a parameter. appendChild will apply to N with Y0 as a parameter, and so on for Y1, ..., Yn, and to X with N as a parameter. (Of course there are other ways to handle this simple case, but I want to do it with removeChild and appendChild for purposes of the exercise.) removeChild takes a raw pointer as a parameter. appendChild takes a PassRefPtr as a parameter.

What's the best practice here with respect to raw pointers, PassRefPtr's, and RefPtr's? How do I destroy the subtree dominated by W after I remove it? How do I destroy T when I'm all finished? If any part of this question is ill-formed, I'll be grateful if the response reformulates it appropriately.

Thanks for any help.

Pitaga
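The interposition step of the exercise can be sketched against a toy model. This is only an illustration under stated assumptions: ToyNode, interposeNewNode and interposeKeepsAllChildren are hypothetical names, not WebKit classes or functions, and std::shared_ptr stands in for RefPtr. The point it tries to show is that each child is held by an owning pointer while it moves from X to N, because a child held only by a raw pointer could be destroyed inside removeChild.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy node, not a WebKit class. std::shared_ptr stands in for RefPtr.
struct ToyNode {
    static int liveCount;                            // live instances, to check for leaks
    int id;
    ToyNode* parent = nullptr;                       // non-owning back pointer
    std::vector<std::shared_ptr<ToyNode>> children;  // a parent owns its children

    explicit ToyNode(int nodeId) : id(nodeId) { ++liveCount; }
    ~ToyNode() { --liveCount; }

    void appendChild(std::shared_ptr<ToyNode> child)
    {
        assert(child);
        child->parent = this;
        children.push_back(std::move(child));
    }

    // Takes a raw pointer, like the removeChild described in the exercise.
    void removeChild(ToyNode* child)
    {
        auto it = std::find_if(children.begin(), children.end(),
            [child](const std::shared_ptr<ToyNode>& c) { return c.get() == child; });
        if (it == children.end())
            return;
        child->parent = nullptr;
        children.erase(it); // drops the parent's reference
    }
};
int ToyNode::liveCount = 0;

// Turns [X [Y0 ... Yn]] into [X [N [Y0 ... Yn]]] with removeChild and appendChild.
void interposeNewNode(ToyNode& x, int newId)
{
    auto n = std::make_shared<ToyNode>(newId);
    while (!x.children.empty()) {
        // Take an owning reference first, so the child survives removeChild.
        std::shared_ptr<ToyNode> y = x.children.front();
        x.removeChild(y.get());
        n->appendChild(y);
    }
    x.appendChild(n);
}

// Builds X with three children, interposes N, and checks the resulting shape.
bool interposeKeepsAllChildren()
{
    auto x = std::make_shared<ToyNode>(0);
    for (int id = 1; id <= 3; ++id)
        x->appendChild(std::make_shared<ToyNode>(id));
    interposeNewNode(*x, 99);

    if (x->children.size() != 1 || ToyNode::liveCount != 5)
        return false;
    const ToyNode& n = *x->children[0];
    if (n.id != 99 || n.parent != x.get() || n.children.size() != 3)
        return false;
    for (int i = 0; i < 3; ++i) {
        if (n.children[i]->id != i + 1 || n.children[i]->parent != &n)
            return false;
    }
    return true;
}
```

Once the owning pointer to X goes out of scope in this toy model, X, N and all the Y nodes are released with no explicit destruction code.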
On Jun 25, 2008, at 10:06 AM, Pitaga wrote:
For example, suppose that a browser based on WebKit has loaded a Web page and parsed it, producing a DOM tree T. As an exercise, I want to perform surgery on T and then destroy both T and the constituents that the surgery removed from T, without leaking. For purposes of this exercise, the browser session itself is of no interest.
You don't explicitly destroy anything. DOM objects will be destroyed when the last owner goes away; that's what reference counting is used for. So "How do I destroy the tree?" is the wrong question. If there's a test case where something's not getting destroyed, you could ask "Why isn't it being destroyed?" but there's no need to write code to explicitly destroy anything.
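A minimal sketch of "destroyed when the last owner goes away", assuming std::shared_ptr as a stand-in for RefPtr. ToyNode and liveNodesAfterLastOwnerGoesAway are hypothetical names for illustration only, not WebKit code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a DOM node. It counts live instances so the
// point of destruction can be observed.
struct ToyNode {
    static int liveCount;
    ToyNode() { ++liveCount; }
    ~ToyNode() { --liveCount; }
};
int ToyNode::liveCount = 0;

// Returns the number of live nodes after every owner has gone away.
int liveNodesAfterLastOwnerGoesAway()
{
    std::shared_ptr<ToyNode> first = std::make_shared<ToyNode>();
    {
        std::shared_ptr<ToyNode> second = first; // two owners now
        first.reset();                           // one owner left, node still alive
        assert(ToyNode::liveCount == 1);
    } // `second` goes out of scope here: last owner gone, node destroyed
    return ToyNode::liveCount;
}
```

Nothing in the sketch calls delete; the node goes away as a side effect of the last owning pointer being released.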
What's the best practice here with respect to raw pointers, PassRefPtr's, and RefPtr's?
That question is too vague and broad for me to answer. Maybe you could ask a more specific question?
How do I destroy the subtree dominated by W after I remove it?
I had a lot of trouble following the letters in your example, so I'm not sure about "W" or even what "dominated" means. Generally if you remove a child with removeChild, then that child and all its descendants will be destroyed when the last reference to it goes away. Typically you are holding a RefPtr to the child you are removing; in that case the most likely time it will be destroyed is when that RefPtr goes out of scope. Or you might only have a raw pointer. In that case, it's likely the child will be destroyed within the removeChild function.
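The two cases can be sketched with a toy tree, assuming std::shared_ptr as a stand-in for RefPtr. ToyNode and both demo functions are hypothetical and only model the ownership behaviour described here; they are not WebKit code.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy node, not a WebKit class.
struct ToyNode {
    static int liveCount;                            // live instances
    ToyNode* parent = nullptr;                       // non-owning back pointer
    std::vector<std::shared_ptr<ToyNode>> children;  // a parent owns its children

    ToyNode() { ++liveCount; }
    ~ToyNode() { --liveCount; }

    void appendChild(std::shared_ptr<ToyNode> child)
    {
        assert(child);
        child->parent = this;
        children.push_back(std::move(child));
    }

    void removeChild(ToyNode* child)
    {
        auto it = std::find_if(children.begin(), children.end(),
            [child](const std::shared_ptr<ToyNode>& c) { return c.get() == child; });
        if (it == children.end())
            return;
        child->parent = nullptr;
        children.erase(it); // drops the parent's reference
    }
};
int ToyNode::liveCount = 0;

// Case 1: only a raw pointer is held, so the removed subtree is destroyed
// inside removeChild. Returns the live count while the root still exists.
int liveAfterRemoveWithRawPointer()
{
    auto root = std::make_shared<ToyNode>();
    root->appendChild(std::make_shared<ToyNode>());
    ToyNode* child = root->children[0].get();
    child->appendChild(std::make_shared<ToyNode>()); // a grandchild
    root->removeChild(child); // child and grandchild are destroyed here
    return ToyNode::liveCount; // only the root remains
}

// Case 2: an owning pointer is held, so the removed child outlives
// removeChild and is destroyed when that pointer goes out of scope.
int liveAfterRemoveWithOwningPointer()
{
    auto root = std::make_shared<ToyNode>();
    root->appendChild(std::make_shared<ToyNode>());
    std::shared_ptr<ToyNode> child = root->children[0]; // extra owner
    root->removeChild(child.get());
    return ToyNode::liveCount; // root and child are both still alive
}
```

In the first case the raw pointer dangles after removeChild returns, so it must not be used again; that is the practical reason to hold an owning pointer when the removed node is still needed.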
How do I destroy T when I'm all finished?
You say that "WebKit has loaded a web page and parsed it [to produce T]". Given that, T is a document and it's owned by the Frame that loaded it. As long as it's the current document in that Frame it will be kept alive. When the Frame either goes away or loads a new document, then it will be destroyed, unless someone else is holding a reference to it. When the last reference to it goes away, it will be destroyed.

-- Darin
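A rough model of that ownership rule, assuming std::shared_ptr as a stand-in for RefPtr. ToyFrame, ToyDocument, load and liveDocumentsAfterReload are hypothetical names and do not reflect the real Frame or Document interfaces.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a document. It counts live instances.
struct ToyDocument {
    static int liveCount;
    ToyDocument() { ++liveCount; }
    ~ToyDocument() { --liveCount; }
};
int ToyDocument::liveCount = 0;

// Hypothetical stand-in for a frame that owns its current document.
struct ToyFrame {
    std::shared_ptr<ToyDocument> document;

    // Installing a new document drops the frame's reference to the old one.
    void load(std::shared_ptr<ToyDocument> next)
    {
        assert(next);
        document = std::move(next);
    }
};

// Loads one document, then a second one, and returns how many documents
// are alive while the frame still exists.
int liveDocumentsAfterReload(bool keepExtraReference)
{
    ToyFrame frame;
    frame.load(std::make_shared<ToyDocument>());

    std::shared_ptr<ToyDocument> extra;
    if (keepExtraReference)
        extra = frame.document; // someone else holds a reference

    frame.load(std::make_shared<ToyDocument>());
    return ToyDocument::liveCount;
}
```

Without the extra reference, only the new document is alive after the second load. With it, the old document stays alive until that reference is released, and everything is gone once the frame goes away too.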
Thanks very much for this response.

We (my co-workers and I) want to use WebKit modules selectively, without running anything like full browser sessions. Over time, we'll do this as cleanly as we can, taking full advantage of smart pointers. For now, we're focused on implementing our own (non-trivial) algorithms. We're breaking into browser sessions, running our algorithms on DOM trees, and worrying as little as possible about API issues. We're more than willing to write code to destroy objects. We're coping with smart pointers, rather than taking advantage of them. If this seems like the wrong attitude, please excuse us on the grounds that it's appropriate for us to focus first on algorithm implementation.

Given that we're interfering with the mechanisms for automatic destruction, and need to write code to destroy trees, how do we do this?

Given that we're interfering with mechanisms for automatic destruction, my question on best practice for using raw pointers, RefPtr's, and PassRefPtr's in the surgery example may be ill-conceived. But when we're ready to use smart pointers intelligently, we'd like our algorithm methods to need as little recoding as possible. So let me ask some more specific questions anyway.

A is a node pointer parameter of method M. A corresponds to a subtree of T (where T is a DOM tree corresponding to a parsed Web page). Inside M, N is a node pointer to which the output of createElement (suitably cast) will be assigned. X, Y, and W are nodes underneath A (at whatever level of descent). I'm going to find X, Y, and W by traversing the subtree dominated by A with firstChild and nextSibling calls, assigning to a local node pointer variable L and applying tests as I go. (By assumption, there's guaranteed to be one each of X, Y, and W, and Y will be the only child of X.) I'm going to interpose N between X and Y, as the child of X and the parent of Y, using removeChild and appendChild. I'm going to remove W and all its descendants from T using removeChild, and destroy them. After I call M, I'm going to manually destroy T.

How should A be declared in the method that calls M? How should A be supplied to M? How should N be declared inside M? How should L be declared inside M?

When I find X, it will be assigned to L. How do I call removeChild on L with a parameter corresponding to firstChild of L? How do I call appendChild on L with a parameter corresponding to N? How do I call appendChild on N with a parameter corresponding to firstChild of L?

When I find W, its parent will be assigned to L. How do I call removeChild on L with a parameter corresponding to the appropriate child of L? How do I make sure that the removed child and its descendants are destroyed? How do I destroy T?

These are questions about smart pointers, and about destruction, not about tree traversal and tree surgery. I'll be grateful for any further help.

Pitaga

----- Original Message -----
From: "Darin Adler" <darin@apple.com>
To: "Pitaga" <achats@avvanta.com>
Cc: <webkit-dev@lists.webkit.org>
Sent: Wednesday, June 25, 2008 10:15 AM
Subject: Re: [webkit-dev] DOM tree surgery and DOM tree destruction
On Jun 25, 2008, at 12:16 PM, Pitaga wrote:
Thanks very much for this response.
We (my co-workers and I) want to use WebKit modules selectively, without running anything like full browser sessions. Over time, we'll do this as cleanly as we can, taking full advantage of smart pointers. For now, we're focused on implementing our own (non-trivial) algorithms. We're breaking into browser sessions, running our algorithms on DOM trees, and worrying as little as possible about API issues. We're more than willing to write code to destroy objects. We're coping with smart pointers, rather than taking advantage of them. If this seems like the wrong attitude, please excuse us on the grounds that it's appropriate for us to focus first on algorithm implementation.
Given that we're interfering with the mechanisms for automatic destruction, and need to write code to destroy trees, how do we do this?
If you want to use WebKit DOM classes in a way that violates their API and memory management model, I think you are on your own as to figuring out how to make it work.

Regards, Maciej
Fair enough.

Obviously we're not out to violate the WebKit DOM API and memory management model. We're having trouble mastering their unique features, and don't want to get bogged down. My most recent email makes it clear that we'd like our code to conform to the API, but don't know how to do it, or at least we're not sure we know how to do it. Responses to the questions on the toy example in the email would be a big help.

As far as manual destruction goes, we're facing not only the WebKit DOM API and memory management model, but also the architecture of a browser session. We're not letting WebKit's default treatment of Web pages proceed to its natural conclusion, because that would involve a huge amount of processing that's extraneous to our purposes. We'd really like to defer a careful adaptation of the architecture. Until we get to this, knowing how to manually destroy trees will be very useful.

Can you point to a document that supplements "RefPtr and PassRefPtr Basics," or is an update of that document?

Thanks.

Pitaga

----- Original Message -----
From: "Maciej Stachowiak" <mjs@apple.com>
To: "Pitaga" <achats@avvanta.com>
Cc: "Darin Adler" <darin@apple.com>; <webkit-dev@lists.webkit.org>
Sent: Wednesday, June 25, 2008 1:12 PM
Subject: Re: [webkit-dev] DOM tree surgery and DOM tree destruction