[Webkit-unassigned] [Bug 16217] New: PCRE offset vector handling need to be re-written

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Nov 30 18:32:34 PST 2007


http://bugs.webkit.org/show_bug.cgi?id=16217

           Summary: PCRE offset vector handling need to be re-written
           Product: WebKit
           Version: 523.x (Safari 3)
          Platform: Macintosh
        OS/Version: Mac OS X 10.4
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: JavaScriptCore
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: eric at webkit.org
                CC: darin at apple.com, ggaren at apple.com


PCRE offset vector handling needs to be rewritten

PCRE has this *awful* concept of an "offset_vector", which is set on the
match_data and passed into jsRegExpExec.  We've already partially disabled the
code in jsRegExpExec that second-guesses callers who pass in a too-small
offsets vector.  Now we need to change how the offsets vector is constructed
and used.

First, how it's used:

offsets_vector is an int array of length 3n, where n is the expected number of
substring matches.

The first 2n ints store the start/end offsets into the string for any matched
substrings.  The last n ints are "private"; I'm not entirely sure what they're
always used for, but I believe they mostly store offset lengths during exec.
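For reference, a minimal C++ sketch of the layout described above. It only encodes the 3n split as stated here (pair i at slots 2i and 2i + 1, private tail starting at slot 2n); the helper names are hypothetical and are not existing JavaScriptCore functions.

```cpp
#include <cassert>

// Hypothetical helpers over the legacy offsets vector of length 3n.
// Slots [0, 2n) hold (start, end) pairs; slots [2n, 3n) are the private tail.

// Number of substrings n that a legacy vector of this size can describe.
inline int legacySubstringCapacity(int offsetsSize)
{
    return offsetsSize / 3;
}

// Start offset of substring i, stored at slot 2i.
inline int legacySubstringStart(const int* offsets, int offsetsSize, int i)
{
    assert(i >= 0 && i < legacySubstringCapacity(offsetsSize));
    return offsets[2 * i];
}

// End offset of substring i, stored at slot 2i + 1.
inline int legacySubstringEnd(const int* offsets, int offsetsSize, int i)
{
    assert(i >= 0 && i < legacySubstringCapacity(offsetsSize));
    return offsets[2 * i + 1];
}

// Beginning of the private tail that callers must allocate but never read.
inline int* legacyScratchArea(int* offsets, int offsetsSize)
{
    return offsets + 2 * legacySubstringCapacity(offsetsSize);
}
```

Every caller has to know this arithmetic just to size and read the buffer, which is the complexity the rest of this report wants to remove.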

This should be replaced by an array of SubstringMatch { int startOffset; int
endOffset; } structs and some auxiliary data store for the extra temporary
offset data.  I assume the current solution came into being to get the best
data locality and to reduce the number of mallocs, but saving one malloc per
call is totally not worth this complexity.  If we decide it is, we can hide
this hack behind some sort of OffsetsStorage structure.
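A rough sketch of the proposed replacement, assuming plain C++ (std::vector) rather than WTF types. SubstringMatch uses the field names suggested above; MatchScratch and copyLegacyOffsets are hypothetical names for illustration only. The copy helper writes through a raw SubstringMatch* plus a capacity, so a caller can hand in a stack array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One struct per matched substring, replacing the first 2n ints.
struct SubstringMatch {
    int startOffset;
    int endOffset;
};

// Hypothetical auxiliary store replacing the "private" last n ints.
// It lives in its own allocation, costing one extra malloc per call.
class MatchScratch {
public:
    explicit MatchScratch(int substringCount)
        : m_values(static_cast<std::size_t>(substringCount), 0)
    {
    }

    int& operator[](int i)
    {
        assert(i >= 0 && i < static_cast<int>(m_values.size()));
        return m_values[i];
    }

private:
    std::vector<int> m_values;
};

// Copies the (start, end) pairs out of a legacy 3n-int offsets vector into a
// caller-owned SubstringMatch buffer. Taking a pointer plus a capacity lets the
// caller use a stack array or an inline-capacity vector without copying.
// Returns the number of matches written.
inline int copyLegacyOffsets(const int* offsets, int offsetsSize,
                             SubstringMatch* matches, int matchCapacity)
{
    int n = offsetsSize / 3;
    if (n > matchCapacity)
        n = matchCapacity;
    for (int i = 0; i < n; ++i) {
        matches[i].startOffset = offsets[2 * i];
        matches[i].endOffset = offsets[2 * i + 1];
    }
    return n;
}
```

With this split, neither the caller nor the matcher needs to know where the temporary data sits relative to the results.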

I suggest that we continue to pass in the SubstringMatch array by pointer
(SubstringMatch* matches) even if the callers use a vector, because that lets
callers use inline-capacity vectors without needing to copy to a generic
Vector<T> for the call, or having the SPI carry some fixed-size vector in its
argument list (ick).  The offset support information can be allocated
internally.

Alternatively, all this allocation could be abstracted into a single class
which the caller is required to create and pass in (or delete when it's
handed back).  Example: OffsetsStorage* offsets = new
OffsetsStorage(number); would create the right allocation, and then
internally we can access things using methods like
setSubstringStart(number, offset).
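A minimal sketch of the OffsetsStorage alternative, assuming std::vector in place of WTF::Vector. Only the class name and setSubstringStart(number, offset) come from the text above; the other methods and the -1 "unset" marker are hypothetical. It deliberately keeps the old single-allocation 3n layout internally, so the data-locality argument still holds, while the raw index arithmetic is confined to one class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OffsetsStorage: one allocation of 3n ints, laid out as
// 2n ints of (start, end) pairs followed by n scratch ints. Nothing outside
// this class indexes the raw array.
class OffsetsStorage {
public:
    // Every slot starts at -1 to mark it unset (an arbitrary choice here).
    explicit OffsetsStorage(int substringCount)
        : m_substringCount(substringCount)
        , m_data(static_cast<std::size_t>(3 * substringCount), -1)
    {
    }

    int substringCount() const { return m_substringCount; }

    void setSubstringStart(int number, int offset) { m_data[pairIndex(number)] = offset; }
    void setSubstringEnd(int number, int offset) { m_data[pairIndex(number) + 1] = offset; }
    int substringStart(int number) const { return m_data[pairIndex(number)]; }
    int substringEnd(int number) const { return m_data[pairIndex(number) + 1]; }

    // Per-substring temporary slot, used only while matching.
    int& scratch(int number)
    {
        checkNumber(number);
        return m_data[2 * m_substringCount + number];
    }

private:
    void checkNumber(int number) const
    {
        assert(number >= 0 && number < m_substringCount);
        (void)number; // Silence unused-parameter warnings when NDEBUG is set.
    }

    // Index of the start slot for substring `number`; the end slot follows it.
    int pairIndex(int number) const
    {
        checkNumber(number);
        return 2 * number;
    }

    int m_substringCount;
    std::vector<int> m_data;
};
```

A caller could also create it on the stack, e.g. `OffsetsStorage offsets(number);`, and pass `&offsets`, which avoids the separate new/delete.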

I'd be curious whether other PCRE hackers have other suggestions.  I think
either of the options above should work and would be a huge cleanup from what
we currently have.

