[webkit-reviews] review denied: [Bug 62882] WebVTT parser : [Attachment 100074] Correcting some typos

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Jul 8 12:25:01 PDT 2011


Eric Carlson <eric.carlson at apple.com> has denied Anna Cavender
<annacc at chromium.org>'s request for review:
Bug 62882: WebVTT parser
https://bugs.webkit.org/show_bug.cgi?id=62882

Attachment 100074: Correcting some typos
https://bugs.webkit.org/attachment.cgi?id=100074&action=review

------- Additional Comments from Eric Carlson <eric.carlson at apple.com>
I agree with Eric, tests for this parser are absolutely essential! 

Even though none of the ports builds with VIDEO_TRACK, you obviously build with
it on so you can run the tests yourself to make sure the parser behaves
correctly, and to make sure you don't accidentally break it as you add more
functionality. 


View in context: https://bugs.webkit.org/attachment.cgi?id=100074&action=review


> Source/WebCore/html/TextTrackCue.cpp:160
> +	       while (position < input.length() && input[position] != ' ' &&
input[position] != '\t')

Why break on a tab, the spec says "Collect a sequence of characters that are
not space characters"?

> Source/WebCore/html/TextTrackCue.cpp:165
> +	       if (equalIgnoringCase(writingDirection, "vertical"))
> +		   m_writingDirection = VerticalGrowingLeft;
> +	       if (equalIgnoringCase(writingDirection, "vertical-lr"))
> +		   m_writingDirection = VerticalGrowingRight;

The spec says the direction strings are case sensitive: "If value is a
case-sensitive match for the string "vertical", then let cue's text track
cue..."

> Source/WebCore/html/TextTrackCue.cpp:174
> +	       if (position <= input.length() && input[position] != ' ' &&
input[position] != '\t')
> +		   break;

Ditto about stopping on a tab. 

In the case where there is a non-space character immediately after an 'L', the
spec says to skip everything *up to* the next space: "Collect a sequence of
characters that are not space characters and discard them". Just breaking out
of the case will skip up to the first non-space. This is different from the
other terminating conditions for this stage.

> Source/WebCore/html/TextTrackCue.cpp:176
> +	       if (linePosition.substring(1,
linePosition.length()-1).contains('-') || linePosition.substring(0,
linePosition.length()-1).contains('%'))
> +		   break;

Doesn't "linePosition.substring(1, linePosition.length()-1" return a new string
identical to "linePosition"? Is it necessary to copy the string to look for a
character?

> Source/WebCore/html/TextTrackCue.cpp:177
> +	       if (linePosition[0] == '-' &&
linePosition[linePosition.length()-1] == '%')

Nit here and elsewhere: "xxx-1" should have a space before and after the minus
sign.

> Source/WebCore/html/TextTrackCue.cpp:187
> +	       if (linePosition[linePosition.length()-1] == '%') {
> +		   number = linePosition.substring(0,
linePosition.length()-1).toInt64();
> +		   if (number < 0 || number > 100)
> +		       break;
> +		   m_snapToLines = false;
> +	       } else
> +		   number = linePosition.substring(0,
linePosition.length()).toInt64();
> +	       m_linePosition = number;

1) Here again, it isn't necessary to create a substring from 0..length. 

2) You skipped step 3: "If value does not contain at least one character in the
range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then jump back to the
step labeled settings".

3) I believe that toInt64() will convert everything up to the first non-digit
character to a number so you can simplify this:

bool validNumber;
number = linePosition.substring(0,
linePosition.length()).toInt64(&validNumber);
if (!validNumber)
    break;
if (linePosition[linePosition.length()-1] == '%') {
    if (number < 0 || number > 100)
	break;
    m_snapToLines = false;
}
m_linePosition = number;

> Source/WebCore/html/TextTrackCue.cpp:198
> +	       if (input[position++] != '%')
> +		   break;

If the character is not a '%' you should discard everything *up to* the next
space.

> Source/WebCore/html/TextTrackCue.cpp:200
> +	       if (position < input.length() && input[position] != ' ' &&
input[position] != '\t')
> +		   break;

Break on tab? In this case you should also skip everything  up to the next
space.

> Source/WebCore/html/TextTrackCue.cpp:213
> +	       while (position < input.length() && '0' <= input[position] &&
input[position] <= '9')
> +		   cueSize.append(input[position++]);

The "Collect a sequence of characters that are in the range U+0030 DIGIT ZERO
(0) to U+0039 DIGIT NINE (9)" pattern is used often enough that it calls out
for a function instead of repeating the code each time.

> Source/WebCore/html/TextTrackCue.cpp:219
> +	       if (position < input.length() && input[position] != ' ' &&
input[position] != '\t')
> +		   break;

Break on tab? In this case you should also skip everything  up to the next
space.

> Source/WebCore/html/TextTrackCue.cpp:232
> +	       while (position < input.length() && input[position] != ' ' &&
input[position] != '\t')
> +		   cueAlignment.append(input[position++]);

Break on tab?

> Source/WebCore/html/TextTrackCue.cpp:238
> +	       if (equalIgnoringCase(cueAlignment, "start"))
> +		   m_cueAlignment = Start;
> +	       if (equalIgnoringCase(cueAlignment, "middle"))
> +		   m_cueAlignment = Middle;
> +	       if (equalIgnoringCase(cueAlignment, "end"))
> +		   m_cueAlignment = End;

These comparisons should all be case sensitive.

> Source/WebCore/html/TextTrackCue.cpp:244
> +	   while (position < input.length() && (input[position] == ' ' ||
input[position] == '\t'))
> +	       position++;

Break on tab?

> Source/WebCore/html/TextTrackCue.h:84
>      String m_settings;

Is it useful to retain the settings as a string in a non-debug build?

> Source/WebCore/platform/track/WebVTTParser.cpp:65
> +    // 5-8 - Collect the first line and check that it is "WEBVTT" (we also
check for "WEBVTT FILE" for legacy reasons).
> +    String line;
> +    while (position < input.length() && input[position] != '\r' &&
input[position] != '\n')
> +	   line.append(input[position++]);

A question that the other Eric may know the answer to: does it make sense to
use a SegmentedString in at least this outer processing loop because WebVTT
files are line based?

> Source/WebCore/platform/track/WebVTTParser.cpp:93
> +Header:
> +    // 13-18 - Allow a header (comment area) under the WEBVTT line.
> +    line = "";
> +    while (position < input.length() && input[position] != '\r' &&
input[position] != '\n')
> +	   line.append(input[position++]);
> +    position = skipLineTerminator(input, position);
> +    if (!line.isEmpty())
> +	   goto Header;
> +
> +CueLoop:
> +    // 19-29 - Allow any number of line terminators, then initialize new cue
values.
> +    while (position < input.length() && (input[position] == '\r' ||
input[position] == '\n'))
> +	   position++;
> +    resetCueValues();
> +

Instead of using inline code and jumping between labels, I think the code will
be easier to follow (and debug later) if you are able to break it up into
functions and structure it as a state machine. Something like:

enum ParserState {
	Initial,
	CueLoop,
	... etc ...
    };

    ParserState state = Initial;

    while (position < input.length()) {
	switch (state) {
	case Initial: {
	    if (!requireFileIdentifier(input, position))
		return;
	    skipHeader(input, position);
	    state = CueLoop;
	    break;
	}

	case CueLoop: {
	    state = parseCue(input, position, line);
	    break;
	}

... etc ...

> Source/WebCore/platform/track/WebVTTParser.cpp:173
> +    if (input.substring(position, 3) != "-->")
> +	   return false;

You could use find() instead of allocating a new string.

> Source/WebCore/platform/track/WebVTTParser.cpp:206
> +    String digits1;
> +    while (position < input.length() && '0' <= input[position] &&
input[position] <= '9')
> +	   digits1.append(input[position++]);
> +    int value1 = digits1.toInt();

This should be able to use the same function you will create to parse numbers
for parseSettings().

> Source/WebCore/platform/track/WebVTTParser.cpp:220
> +    // 8-12 - Collect the next sequence of 0-9 after ':' (must be 2 chars).
> +    if (position >= input.length() || input[position++] != ':')
> +	   return malformedTime;
> +    if (position >= input.length() || !('0' <= input[position] &&
input[position] <= '9'))
> +	   return malformedTime;
> +    String digits2;
> +    while (position < input.length() && '0' <= input[position] &&
input[position] <= '9')
> +	   digits2.append(input[position++]);
> +    int value2 = digits2.toInt();

Here too.

> Source/WebCore/platform/track/WebVTTParser.cpp:237
> +	   while (position < input.length() && '0' <= input[position] &&
input[position] <= '9')
> +	       digits3.append(input[position++]);
> +	   if (digits3.length() != 2)
> +	       return malformedTime;
> +	   value3 = digits3.toInt();

And here.

> Source/WebCore/platform/track/WebVTTParser.cpp:256
> +    // 14-19 - Collect next sequence of 0-9 after '.' (must be 3 chars).
> +    if (position >= input.length() || input[position++] != '.')
> +	   return malformedTime;
> +    if (position >= input.length() || !('0' <= input[position] &&
input[position] <= '9'))
> +	   return malformedTime;
> +    String digits4;
> +    while (position < input.length() && '0' <= input[position] &&
input[position] <= '9')
> +	   digits4.append(input[position++]);
> +    if (digits4.length() != 3)
> +	   return malformedTime;
> +    int value4 = digits4.toInt();
> +    if (value2 > 59 || value3 > 59)
> +	   return malformedTime;

And here.

> Source/WebCore/platform/track/WebVTTParser.cpp:259
> +    return value1 * secondsInHour + value2 * secondsInMinute + value3 +
(double)value4 / 1000;

Parentheses around the terms would make this much easier to parse visually.


More information about the webkit-reviews mailing list