[Webkit-unassigned] [Bug 45271] HTML parser should provide script column position within HTML document to JavaScript engine

Wed Sep 22 10:32:07 PDT 2010

https://bugs.webkit.org/show_bug.cgi?id=45271

--- Comment #17 from Peter Rybin <peter.rybin at gmail.com>  2010-09-22 10:32:07 PST ---
(In reply to comment #15)
> (From update of attachment 68076 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=68076&action=review
> 
> I like the approach with the specific types, but there are a lot of errors in this patch.  Also please post html-parser benchmark scores with your patch.  This code is highly performance sensitive.

Hi Adam
Thanks a lot for the review. I sorry for posting a code with so many errors.

I guess there are 2 issues here:

1. HTMLPosition

a. Name. It might be a wrong name actually.
It's a simple structure of line/column, but I also had to add a third field 'generation'. I need to reflect the fact that some 'script' (and other) elements may be generated in 'document.write' call. The positions go to JavaScript debugger and it is important that it treats positions of generated script differently; possibly it doesn't need positions for the generated script at all. I am not sure I chose the best solution possible.

b. Scope. 'HTMLPosition' goes together with the script data everywhere, so it seems okay for ScriptableDocumentParser to depend on it.

2. Column in SegmentedString.
Unfortunately I couldn't read your hint about view source parser. I haven't found anything related to columns or something I could use. What do I miss there?

I am running a test that Eric Seidel (thanks to him!) gave me a reference to. In the test_shell from Chromium, release mode, nice -15.
The numbers in both cases are very divergent, so I'd like to make some more runs now and post result here later.

Peter

> 
> > JavaScriptCore/JavaScriptCore.gypi:388
> > +            'wtf/HTMLPosition.h',
> > +            'wtf/HTMLPosition.cpp',
> 
> This isn't the right place for these files.  WTF doesn't know anything about HTML.  They should probably go in WebCore/html/parser.
> 
> > JavaScriptCore/wtf/HTMLPosition.h:86
> > +/* An int wrapper that always reminds you that the number should be 0-based */
> 
> We generally use C++ style comments.
> 
> > WebCore/bindings/js/ScriptSourceCode.h:46
> > +    ScriptSourceCode(const String& source, const KURL& url = KURL(),
> > +        const HTMLPosition1& startPosition = HTMLPosition1::minimumPosition())
> 
> Please put this on one line.
> 
> > WebCore/bindings/v8/ScriptController.cpp:263
> > +HTMLPosition0 ScriptController::eventHandlerPosition() const
> 
> What is HTMLPosition0 ?
> 
> > WebCore/dom/XMLDocumentParserLibxml2.cpp:1372
> > +    // TODO(peter.rybin): the implementation probably return 1-based int, but method should return 0-based.
> 
> WebKit uses FIXME instead of TODO(name).  Also, this is not a complete sentence.
> 
> > WebCore/dom/XMLDocumentParserLibxml2.cpp:1374
> > +    int res = (context() ? context()->input->line : 1) - 1;
> > +    printf("line number %d\n", res);
> 
> We don't usually call printf.  Also, "res" is a pretty unintelligible variable name.
> 
> > WebCore/dom/XMLDocumentParserLibxml2.cpp:1391
> > +    if (context)
> > +        return HTMLPosition0(WTF::ZeroBasedNumber::fromZeroBasedInt(context->input->line - 1),
> > +            WTF::ZeroBasedNumber::fromZeroBasedInt(context->input->col - 1), 0);
> 
> This multi-line if should use curly braces.
> 
> This code doem't make much sense.  It looks like context->input is one-based.  Shouldn't we create a one-based number from that and then covert that to a zero-based number instead of subtracting manually?
> 
> > WebCore/html/parser/HTMLDocumentParser.cpp:84
> > -    , m_treeBuilder(HTMLTreeBuilder::create(m_tokenizer.get(), document, reportErrors))
> > +    , m_treeBuilder(HTMLTreeBuilder::create(m_tokenizer.get(), this, document, reportErrors))
> 
> This change is not correct.  The HTMLTreeBuilder isn't allowed to know about the HTMLDocumentParser.  There are going to be situations in the future where the HTMLTreeBuilder is used without an HTMLDocumentParser.
> 
> > WebCore/html/parser/HTMLDocumentParser.cpp:421
> > +    int line = m_input.current().getCurrentLine();
> > +    int column = m_input.current().getCurrentColumn();
> > +    int generation = m_input.getInsertionLevel();
> 
> Generally, we don't use a "get" prefix for getters.  We just call the methods by the thing they return.  Why isn't m_input.current() returning a HTMLPosition?
> 
> > WebCore/html/parser/HTMLInputStream.h:53
> > -        : m_last(&m_first)
> > +        : m_last(&m_first), m_insertionLevel(0)
> 
> The HTMLInputStream should not have an insertionLevel.  That's a concept of the HTMLDocumentParser.  (Also, it's called a nesting level.)
> 
> > WebCore/html/parser/HTMLInputStream.h:145
> > +        m_insertionLevel = m_inputStream->insertionLevel();
> > +        m_inputStream->setInsertionLevel(m_insertionLevel + 1);
> 
> This is redundantly tracking the nestingLevel, which is already tracked carefully in the HTMLDocumentParser.
> 
> > WebCore/html/parser/HTMLTreeBuilder.cpp:2748
> > +    HTMLPosition0 position = m_scriptableDocumentParser->htmlPosition();
> 
> How can a ScriptableDocumentParser have an HTMLPosition?  ScriptableDocumentParser does not necessary have anything to do with HTML.  This makes no sense.
> 
> > WebCore/platform/text/SegmentedString.cpp:239
> > +        if (*m_currentString.m_current++ == '\n' && m_currentString.doNotExcludeLineNumbers()) {
> > +          ++lineNumber;
> > +          ++m_currentLine;
> > +          m_lineBeginCharactersConsumedValue = numberOfCharactersConsumed();
> > +        }
> 
> Wrong indent.
> 
> > WebCore/platform/text/SegmentedString.cpp:254
> > +int SegmentedString::currentLine() const
> > +{
> > +    return m_currentLine;
> > +}
> > +
> > +int SegmentedString::currentColumn() const
> > +{
> > +    return numberOfCharactersConsumed() - m_lineBeginCharactersConsumedValue;
> > +}
> 
> Why not just return a position?
> 
> > WebCore/platform/text/SegmentedString.h:166
> > +            if (m_currentString.doNotExcludeLineNumbers()) {
> > +                ++lineNumber;
> > +                ++m_currentLine;
> > +                m_lineBeginCharactersConsumedValue = numberOfCharactersConsumed();
> > +            }
> 
> I'm confused.  I though I did all this work already for the view source parser.  Also, this code explicitly uses += to avoid a branch here.  Did you run the parsing benchmark???

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.