[Webkit-unassigned] [Bug 42264] Prepare YARR JIT for matching regexps with iterative parentheses

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Jul 15 00:54:15 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=42264





--- Comment #2 from Gavin Barraclough <barraclough at apple.com>  2010-07-15 00:54:15 PST ---
Hi Peter, this is really interesting, there are a couple of quite major problems here but hopefully they are fixable.

(1) You are recursing on the machine stack without stack guards.  This is unsafe, and may prove to be a security vulnerability - particularly in the case of JS execution in worker threads, where the stack may be small, and pages for the stack may lie within address ranges containing pages of the heap.

The trivial fix to this problem would be to set a recursion limit, but I don't think this will be sufficient in this case.  We have seen important use cases with regular expressions that contain quantified parentheses being run over very large strings, and have run into problems with recursion limits being too low in the past.  I'm doubtful that we can safely rely on there always being sufficient space available on the machine stack at the point a match starts to run all reasonable regexs (again, particularly considering worker threads).  Setting too low a threshold would be a functional regression with respect to the current implementation (which allocates memory on the heap, and as such is not bounded in this fashion).

I'd suggest a better approach to fixing this would be to implement a fast stack-like allocator that does bump-pointer memory allocation from pages allocated on the heap, rather than using the machine stack (possibly making an exception for regexes that only require a small, fixed stack space, in order to avoid the overhead of a memory allocation).  A class implementing the allocator could expose a simple data oriented interface to JIT code, which could describe the bounds of the allocation space current available, and could be used from JIT code to perform bounded bump-pointer allocations.  Upon performing an allocation a limit check could be performed, and if this failed we could call out to C code to attempt to allocate further pages.  Such an allocator could also be used from the YARR interpreter (which currently inefficiently mallocs/frees frames, despite having stack-like behaviour), and we would likely want to share a common implementation of an allocator.


(2) Performance.  Applying this patch I see the runtime of v8-regex increase from 227ms to 470ms - is this in line with your expectations?  Given that some of the time in this benchmark is currently spent in PCRE, this is a >2x regression in the YARR JIT code.  Since it is not strictly necessary for the code generation for regular expressions currently supported by the JIT to be changed in adding support for those containing quantified parentheses, then I do not think any regression in the performance of currently supported regexes can be accepted.  I'm assuming that the problem here is related to the overhead you introduce by removing the use of compile-time determined stack frames, and replacing this with continuous dynamic allocation from the stack throughout.  I don't really expect you to be able to make this change without introducing a significant regression, and would suggest that to overcome this you're likely going to have to return to using compile-timer determined 
 stack frames for disjunctions (excluding the variable amount of space required for nested disjunctions with a quantifier >=2).  This way a stack(-like) allocation need only be performed upon each iteration of a quantified set of parentheses, rather than needing to do so for every single quantified atom encountered (or other atom that needs to record information).


One last thought: fixing problem (1) seems only likely to aggravate problem (2) - resolving the problem of stack overruns seems to inevitably involve introducing some form of bounds checking, which is likely to be further overhead upon every stack allocation, which would only makes any move away from using a compile-time determined stack frames more expensive.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.



More information about the webkit-unassigned mailing list