[webkit-dev] Is the New XMLParser dead?

Mon Aug 27 17:03:42 PDT 2012

On Aug 27, 2012, at 4:28 PM, Adam Barth <abarth at webkit.org> wrote:

> On Mon, Aug 27, 2012 at 4:02 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>> On Aug 27, 2012, at 3:48 PM, Adam Barth <abarth at webkit.org> wrote:
>>> On Mon, Aug 27, 2012 at 3:06 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>> On Aug 27, 2012, at 2:45 PM, Eric Seidel <eric at webkit.org> wrote:
>>>>> Checking back in:
>>>>> 
>>>>> Curious if this effort is still underway.  Adam and I would like to
>>>>> delete the New XML Parser if it's not needed in order to simplify the
>>>>> HTML 5 Parser again. :)
>>>> 
>>>> We do tentatively plan to get back to it (the original implementor is currently working full-time at Apple on the WebKit team).
>>> 
>>> As far as I can tell, no one has worked on the NEWXML code in over a
>>> year, the implementation doesn't work, and the code is disabled by all
>>> ports.  It seems like we should remove it from trunk.  We can restore
>>> it if/when someone is interested in working on it again.
>> 
>> What you describe as the current status is (afaik) correct. The data point I provided (since Eric asked) is that we do in fact plan to get back to it.
>> 
>>>> As far as simplifying the HTML5 parser - isn't most of the foundational work that touches the HTML5 parser also required for WebVTT, as mentioned by me in the email you quote below? Is there a big simplicity win to be had without breaking WebVTT? If so, we can think about whether removing the scaffolding and reconstructing it when needed is worthwhile.
>>> 
>>> This is a separate issue.
>> 
>> If there's a reason to remove it other than "simplify the HTML5 parser again", then certainly we can consider it. But that was the only reason Eric cited, so I wanted to check if it's actually the case, in light of WebVTT. I am still curious about the answer. But I'd be happy to discuss other reasons instead.
> 
> My understanding is that we don't typically leave broken, unused code
> in trunk unless someone is actively working on it.  Having this code
> around has costs and little benefit:
> 
> 1) The code needs to be maintained.
> 2) The code confuses contributors who don't know that it's dead.
> 
> By contrast, if someone wants to work on this code again, they can
> just revert the patch that removed it.  They might need to do some
> maintenance work at that point, but that's work that otherwise would
> have to have been done by someone else.

Do you have examples of actual confusion or undue additional maintenance? I think if those things were really happening, then that would make a decent argument for removing the code for the time being. From svn history, it doesn't look like the new xml parser code has been touched in quite some time, so I'm not seeing evidence for maintenance cost. But I could be overlooking other costs.

> 
> As for VTT, I suspect that the VTT parser doesn't need all the
> complexity that the HTML parser needs.  It's doing a much simpler task
> and likely can be made much simpler by not trying to share code with the
> HTML parser at all.

I think the main sharing in all three cases is the MarkupTokenBase and MarkupTokenizerBase base classes. So the XML tokenizer adds (as far as I know) no abstraction or complexity beyond what WebVTT already requires.

I am skeptical that either duplicating that particular code instead of sharing it, or rewriting the WebVTT tokenizer in some dramatically different way would be beneficial. Let's set the simplification argument aside unless there is some compelling concrete evidence for it (such as WebVTT hackers expressing interest in rewriting the WebVTT parser to not share code with HTML.)

Regards,
Maciej