[webkit-dev] What regular expressions do we spend time evaluating?
Darin Adler
darin at apple.com
Sat Jun 28 11:39:51 PDT 2008
Since SunSpider time now is about 10% inside the regular expression
matcher, it's time to reconsider regular expression optimizations. In
particular, we should see if we can find a subset of regular
expressions where we can implement a more efficient matching algorithm.
I wrote code to dump a histogram of time spent matching regular
expressions: <https://bugs.webkit.org/show_bug.cgi?id=19801>.
Here's the result from running SunSpider on my computer. The column on
the left is the number of seconds spend matching an expression, and
the column on the right is the expression:
0.080214 - \b\w+\b
0.079685 - agggtaa[cgt]|[acg]ttaccct (ignore case)
0.077910 - [cgt]gggtaaa|tttaccc[acg] (ignore case)
0.073445 - agggta[cgt]a|t[acg]taccct (ignore case)
0.073439 - a[act]ggtaaa|tttacc[agt]t (ignore case)
0.073387 - agggt[cgt]aa|tt[acg]accct (ignore case)
0.072898 - ag[act]gtaaa|tttac[agt]ct (ignore case)
0.072552 - aggg[acg]aaa|ttt[cgt]ccct (ignore case)
0.072440 - agg[act]taaa|ttta[agt]cct (ignore case)
0.072167 - agggtaaa|tttaccct (ignore case)
0.057528 - >.*\n|\n
0.045544 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(:?[eE][+
\-]?\d+)?
0.024901 - (?:^|:|,)(?:\s*\[)+
0.008831 - ^[a-zA-Z0-9\-\._]+@[a-zA-Z0-9\-_]+(\.?[a-zA-Z0-9\-_]*)
\.[a-zA-Z]{2,3}$
0.001060 - ^[\],:{}\s]*$
0.000517 - \\.
0.000264 - [\0\t\n\v\f\r\xa0'"!-]
0.000158 - !\d\d?\d?!
0.000062 - -.*
0.000026 - ^
0.000024 - ('|\\)
0.000010 - ^[^-]*-
I wish we had a test with some better regular expression coverage!
Here's one other case with some significant regexp time in it, I think
perhaps from JSON validtion? Loading 280slides.com and looking at the
test presentation:
0.028335 - \/\/.*(\r|\n)?|\/\*(?:.|\n|\r)*?\*\/|\w+\b|[+-]?\d+
(([.]\d+)*([eE][+-]?\d+))?|"[^"\\]*(\\.[^"\\]*)*"|'[^'\\]*(\\.[^'\
\]*)*'|\s+|.
0.006968 - \S
0.001098 - "[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+
\-]?\d+)?
0.000422 - \\.
0.000263 - (?:^|:|,)(?:\s*\[)+
0.000259 - [\:\+\-\*\/\=\<\>\&\|\!\.\%]
0.000245 - [^\s]
0.000193 - [\+\-\*\/\=\<\>\&\|\!\.\[\^\(]
0.000119 - \s
0.000109 - ^[0-9a-zA-Z_\-+~!$\^{}|.%'`#&*]+/[0-9a-zA-Z_\-+~!$
\^{}|.%'`#&*]+\+xml$ (ignore case)
0.000042 - ^\w
0.000023 - ^[\],:{}\s]*$
I had a hard time finding pages and tests where regular expressions
took enough time to be worth looking at.
-- Darin
More information about the webkit-dev
mailing list