[webkit-dev] What regular expressions do we spend time evaluating?

Darin Adler darin at apple.com
Sat Jun 28 11:39:51 PDT 2008


Since SunSpider time now is about 10% inside the regular expression  
matcher, it's time to reconsider regular expression optimizations. In  
particular, we should see if we can find a subset of regular  
expressions where we can implement a more efficient matching algorithm.

I wrote code to dump a histogram of time spent matching regular  
expressions: <https://bugs.webkit.org/show_bug.cgi?id=19801>.

Here's the result from running SunSpider on my computer. The column on  
the left is the number of seconds spend matching an expression, and  
the column on the right is the expression:

     0.080214 - \b\w+\b
     0.079685 - agggtaa[cgt]|[acg]ttaccct (ignore case)
     0.077910 - [cgt]gggtaaa|tttaccc[acg] (ignore case)
     0.073445 - agggta[cgt]a|t[acg]taccct (ignore case)
     0.073439 - a[act]ggtaaa|tttacc[agt]t (ignore case)
     0.073387 - agggt[cgt]aa|tt[acg]accct (ignore case)
     0.072898 - ag[act]gtaaa|tttac[agt]ct (ignore case)
     0.072552 - aggg[acg]aaa|ttt[cgt]ccct (ignore case)
     0.072440 - agg[act]taaa|ttta[agt]cct (ignore case)
     0.072167 - agggtaaa|tttaccct (ignore case)
     0.057528 - >.*\n|\n
     0.045544 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(:?[eE][+ 
\-]?\d+)?
     0.024901 - (?:^|:|,)(?:\s*\[)+
     0.008831 - ^[a-zA-Z0-9\-\._]+@[a-zA-Z0-9\-_]+(\.?[a-zA-Z0-9\-_]*) 
\.[a-zA-Z]{2,3}$
     0.001060 - ^[\],:{}\s]*$
     0.000517 - \\.
     0.000264 - [\0\t\n\v\f\r\xa0'"!-]
     0.000158 - !\d\d?\d?!
     0.000062 - -.*
     0.000026 - ^
     0.000024 - ('|\\)
     0.000010 - ^[^-]*-

I wish we had a test with some better regular expression coverage!

Here's one other case with some significant regexp time in it, I think  
perhaps from JSON validtion? Loading 280slides.com and looking at the  
test presentation:

     0.028335 - \/\/.*(\r|\n)?|\/\*(?:.|\n|\r)*?\*\/|\w+\b|[+-]?\d+ 
(([.]\d+)*([eE][+-]?\d+))?|"[^"\\]*(\\.[^"\\]*)*"|'[^'\\]*(\\.[^'\ 
\]*)*'|\s+|.
     0.006968 - \S
     0.001098 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+ 
\-]?\d+)?
     0.000422 - \\.
     0.000263 - (?:^|:|,)(?:\s*\[)+
     0.000259 - [\:\+\-\*\/\=\<\>\&\|\!\.\%]
     0.000245 - [^\s]
     0.000193 - [\+\-\*\/\=\<\>\&\|\!\.\[\^\(]
     0.000119 - \s
     0.000109 - ^[0-9a-zA-Z_\-+~!$\^{}|.%'`#&*]+/[0-9a-zA-Z_\-+~!$ 
\^{}|.%'`#&*]+\+xml$ (ignore case)
     0.000042 - ^\w
     0.000023 - ^[\],:{}\s]*$

I had a hard time finding pages and tests where regular expressions  
took enough time to be worth looking at.

     -- Darin



More information about the webkit-dev mailing list