[webkit-reviews] review granted: [Bug 23051] web page searching should use ICU's search so it can ignore diacritical differences : [Attachment 26332] patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Jan 1 12:49:16 PST 2009


mitz at webkit.org has granted Darin Adler <darin at apple.com>'s request for review:
Bug 23051: web page searching should use ICU's search so it can ignore
diacritical differences
https://bugs.webkit.org/show_bug.cgi?id=23051

Attachment 26332: patch
https://bugs.webkit.org/attachment.cgi?id=26332&action=review

------- Additional Comments from mitz at webkit.org
(In reply to comment #4)
> > Everything else looks good, but I can't r+ this patch because I don't
> > understand the logic here:
> > 
> > > +    // Matches that are in the overlap area are only tentative.
> > > +    // The same match may appear later, matching more characters,
> > > +    // possibly including a combining character that's not yet in the
buffer.
> > > +    if (!m_atBreak && matchStart + matchLength + m_overlap >= size) {
> > > +        memcpy(m_buffer.data(), m_buffer.data() + m_buffer.size() -
m_overlap, m_overlap);
> > > +        m_buffer.shrink(m_overlap);
> > > +        return 0;
> > > +    }
> 
> I believe the idea is that if you have the following text:
> 
>     abcde<combing-umlaut>
> 
> Then you can match abcde, but you can also match abcde<combining-umlaut>. And

> if the very last characters in the buffer are "abcde", then you should not
> accept the match until you know if the subsequent character is
> <combining-umlaut>. So a match where the match touches the end of the buffer
is
> only tentative if you might be receiving additional characters in the future,

> and thus needs to be ignored.

It looks like the code checks if the match ends in the overlap area, but then
it assumes that in that case, it also starts in the overlap area. I think
that's what confused me, but it makes sense because the overlap area is
guaranteed to be fairly large.


More information about the webkit-reviews mailing list