[webkit-dev] HTML5 & MathML3 entities

Mike Marchywka marchywka at hotmail.com
Mon Jul 12 04:52:37 PDT 2010
















----------------------------------------
> From: abarth at webkit.org
> Date: Sun, 11 Jul 2010 16:33:57 -0700
> To: mjs at apple.com
> CC: webkit-dev at lists.webkit.org
> Subject: Re: [webkit-dev] HTML5 & MathML3 entities
>
> On Sat, Jul 10, 2010 at 6:28 PM, Maciej Stachowiak  wrote:
>> On Jul 10, 2010, at 11:10 AM, Sausset François wrote:
>>> I just saw that when looking at the code by myself.
>>> What do you exactly mean by a prefix tree?
>>
>> The data structure commonly called a "Trie" is a prefix tree:
>> http://en.wikipedia.org/wiki/Trie

Never missing a chance to reveal my ignorance, I didn't know these had a name.
However, I am using a prefix tree to store URL's in our browser cache.
I did keep vaciliating over redundant hashes and linked lists as well as explicit
copies of complete keys (uggggh, I don't want to explain now ). As pointed
out, this helps eliminated redundnant prefixes ( http:// may come uip a lot ).
Also of course memory coherence options proliferate if you start thinking
about things like this. 

>>
>> This data structure not only lets you tell if a particular key is present, but it also lets you check if a string you have could possibly be the prefix of any valid key.
>>
>> I think it is challenging, though, to make a trie structure that can be a compile-time constant, and building one dynamically will cost runtime memory per-process (whereas constant data would be in the data segment and shared).
>>
>> Another possibility is to make an array of all the entity names in sorted order. Then lookup can use a binary search, and on a failed lookup, looking to either side of the last key checked should determine whether it is a valid prefix.
>>
>> I expect binary search would be slower than Trie lookup, though I don't know by how much.
>
> Binary search will certainly be easier to implement. Let's start with
> that and experiment with prefix trees as a possible performance
> optimization. I'll give it a try now.

When I did this, I wrote the code in java but made heavy use of conditioanl compilation
to get it to work on j2me or j2se. This proved to be invaluable since lots of subtle low
probability errors can occur and debugging in target setting ( a phone) would have taken forever.
Certainly with java simple things can be surprisingly slow ( like recreating a key from pieces if
you need to do it a lot) but with things that translate to native code this may be
easier to optimize. 

Also, I'm not sure about the wiki speed analysis. If you do simple string compares on highly
redundant keys, you spend a lot of time comparing "http://www." to "http:///www." etc/
Fail fast equality compares as well as memory compaction could offer many benefits.  
Its too early for me to think about Orders and logs LOL. 

>
> Adam
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
 		 	   		  
_________________________________________________________________
The New Busy think 9 to 5 is a cute idea. Combine multiple calendars with Hotmail. 
http://www.windowslive.com/campaign/thenewbusy?tile=multicalendar&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_5


More information about the webkit-dev mailing list