[Webkit-unassigned] [Bug 16217] New: PCRE offset vector handling needs to be re-written
bugzilla-daemon at webkit.org
Fri Nov 30 18:32:34 PST 2007
http://bugs.webkit.org/show_bug.cgi?id=16217
Summary: PCRE offset vector handling needs to be re-written
Product: WebKit
Version: 523.x (Safari 3)
Platform: Macintosh
OS/Version: Mac OS X 10.4
Status: NEW
Severity: Normal
Priority: P2
Component: JavaScriptCore
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: eric at webkit.org
CC: darin at apple.com, ggaren at apple.com
PCRE offset vector handling needs to be re-written
PCRE has this *awful* concept of an "offset_vector" which is set on the
match_data and passed in to jsRegExpExec. We've already partially disabled the
code in jsRegExpExec that second-guesses a caller who might pass in a too-small
offsets vector. Now we need to change how the offsets vector is constructed
and used.
First, how it's used:
offsets_vector is an int array of length 3n, where n is the expected number of
sub-string matches.
The first 2n ints store start/stop offsets into the string for any matched
sub-strings. The last n ints are "private"; I'm not entirely sure what they're
used for in every case, but I believe they mostly store offset lengths during
exec.
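To make the current layout concrete, here is a rough sketch of how the 3n
array is carved up. The wrapper type and its accessor names are made up for
illustration and do not exist in the tree; it also assumes start/stop pairs
are interleaved (start of match i at index 2i, stop at 2i + 1), as in stock
PCRE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper, for illustration only: it just names the regions
// of the 3n-int array described above.
struct LegacyOffsetVector {
    explicit LegacyOffsetVector(int expectedMatches)
        : n(expectedMatches)
        , data(static_cast<std::size_t>(3 * expectedMatches), -1)
    {
    }

    // First 2n ints: start/stop pairs (assumed interleaved, as in stock PCRE).
    int start(int i) const { return data[2 * i]; }
    int end(int i) const { return data[2 * i + 1]; }

    // Last n ints: the "private" scratch region used during exec.
    int* scratch() { return data.data() + 2 * n; }
    int scratchLength() const { return n; }

    int n;
    std::vector<int> data;
};
```

The point is only that callers and the matcher share one flat buffer and must
agree on the index arithmetic, which is the part that seems worth replacing.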
This should be replaced by an array of SubstringMatch { int startOffset; int
endOffset; } structs and some auxiliary data store for the extra temporary
offset data. I assume that the current solution came into being to try to
get the best data locality and to reduce the number of mallocs, but
having one fewer malloc per call is totally not worth this complexity. If we
decide it is, we can hide this hack behind some sort of OffsetsStorage
structure.
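A minimal sketch of the proposed struct, plus a conversion from the legacy
layout to show that the two carry the same information. The helper name
toSubstringMatches is hypothetical, and the interleaved start/stop pairing of
the legacy array is an assumption carried over from stock PCRE.

```cpp
#include <cstddef>
#include <vector>

// The struct proposed above.
struct SubstringMatch {
    int startOffset;
    int endOffset;
};

// Hypothetical helper: reads the first 2 * matchCount ints of a legacy
// offsets vector (assumed interleaved start/stop pairs) and returns them
// as SubstringMatch values. The trailing scratch ints are ignored.
std::vector<SubstringMatch> toSubstringMatches(const int* offsets, int matchCount)
{
    std::vector<SubstringMatch> matches;
    matches.reserve(static_cast<std::size_t>(matchCount));
    for (int i = 0; i < matchCount; ++i)
        matches.push_back(SubstringMatch{ offsets[2 * i], offsets[2 * i + 1] });
    return matches;
}
```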
I suggest that we continue to pass in the SubstringMatch array by pointer
(SubstringMatch* matches) even if the callers use a vector. The reason is
that this allows the callers to use inline capacity vectors without needing to
copy to a generic Vector<T> for the call, or have the SPI take some fixed-size
vector in its argument list, ick. The offset support information can be
allocated internally. Alternatively, all this allocation could be abstracted
into a single class which the caller is required to create and pass in (or
delete when it's passed out). Example: OffsetsStorage* offsets =
new OffsetsStorage(number); would create the right allocation, and then
internally we can access things using methods like
setSubstringStart(number, offset).
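One possible shape for such a class, as a sketch only. Apart from the names
OffsetsStorage, SubstringMatch and setSubstringStart, everything here
(setSubstringEnd, match, scratch, the -1 "unset" value, the use of
std::vector) is an assumption made for illustration, not an existing
interface.

```cpp
#include <cstddef>
#include <vector>

struct SubstringMatch {
    int startOffset;
    int endOffset;
};

// Hypothetical sketch: owns both the caller-visible matches and the
// temporary per-match data the matcher needs during exec.
class OffsetsStorage {
public:
    explicit OffsetsStorage(std::size_t matchCount)
        : m_matches(matchCount, SubstringMatch{ -1, -1 }) // -1 marks "unset" (assumed)
        , m_scratch(matchCount, 0)
    {
    }

    std::size_t size() const { return m_matches.size(); }

    // Bounds-checked setters replace raw index arithmetic on an int array.
    void setSubstringStart(std::size_t i, int offset) { m_matches.at(i).startOffset = offset; }
    void setSubstringEnd(std::size_t i, int offset) { m_matches.at(i).endOffset = offset; }

    const SubstringMatch& match(std::size_t i) const { return m_matches.at(i); }

    // Private working storage for the matcher; callers should not need it.
    int& scratch(std::size_t i) { return m_scratch.at(i); }

private:
    std::vector<SubstringMatch> m_matches;
    std::vector<int> m_scratch;
};
```

This version does two allocations per storage object. If that turns out to
matter, the two vectors could be folded back into one buffer behind the same
methods without touching callers.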
I'd be curious if other PCRE hackers have other suggestions. I think one of
the above mentioned options should work and would be a huge cleanup from what
we currently have.