... <input type="checkbox" name="box" checked="checked" />Test <input ...> ...
When I draw this page, I see a box at the end of "Test". "Test" is coming into Font::drawText() as a 5 character string, with a CR (or LF, don't remember which) at the end. In my font, that draws as a box.
Is it correct that the parser didn't strip that, or convert it into a space? If so, is my port expected to strip these sorts of characters each time I measure or draw (hurting performance)? If I had complete control over all my fonts, I could whack their cmap tables to ensure that all control characters mapped to zero-width spaces, but I don't have the luxury.
If I am required to handle these control characters, is there a list of exactly which the parser will pass through?
thanks, mike
On Tuesday 13 June 2006 17:59, Mike Reed wrote:
... <input type="checkbox" name="box" checked="checked" />Test <input ...> ...
When I draw this page, I see a box at the end of "Test". "Test" is coming into Font::drawText() as a 5 character string, with a CR (or LF, don't remember which) at the end. In my font, that draws as a box.
Is it correct that the parser didn't strip that, or convert it into a space? If so, is my port expected to strip these sorts of characters each time I measure or draw (hurting performance)? If I had complete control over all my fonts, I could whack their cmap tables to ensure that all control characters mapped to zero-width spaces, but I don't have the luxury.
If I am required to handle these control characters, is there a list of exactly which the parser will pass through?
No, it will let a lot of weird Unicode things through, but mostly NBSP, LF, CR and TAB are the issues. The rest you can just ignore, since supporting them would require changes elsewhere as well. You can instead make a list of what you want to support, or use a generic isSpace function. (Don't be too afraid of speed; drawing is much slower than replacement anyway.)
I had the same issue when implementing the white-space parsing in KDE's KHTML. The quick and dirty solution was to do a one-time replacement (keeping an extra white-space-cleaned string), but this ruins editing and costs a little memory.
The best solution would be to fix bidi.cpp so it doesn't include control characters in the strings to draw. This would mean splitting strings that were separated by newlines or tabs and placing them one space apart. The current method in WebCore pretty much depends on being in control of the font rendering layer.
Allan
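The "quick and dirty" one-time replacement described above could look roughly like the following. This is only a sketch under assumptions: `isProblemWhitespace` and `cleanedForDrawing` are hypothetical names, not WebCore or KHTML functions, and the character list is just the four characters named in this message.

```cpp
#include <string>

// Hypothetical helper (not a WebCore or KHTML function): true for the
// characters singled out above as the common trouble makers.
inline bool isProblemWhitespace(char16_t c)
{
    return c == u'\t' || c == u'\n' || c == u'\r' || c == u'\u00A0';
}

// Returns a copy of `text` in which TAB, LF, CR and NBSP are each replaced
// by an ordinary space. The caller keeps the original string untouched,
// which is the extra memory cost mentioned above.
inline std::u16string cleanedForDrawing(const std::u16string& text)
{
    std::u16string cleaned(text);
    for (char16_t& c : cleaned) {
        if (isProblemWhitespace(c))
            c = u' ';
    }
    return cleaned;
}
```

A port would compute the cleaned string once per text run and draw from it, while editing code keeps working on the original.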
On Jun 13, 2006, at 8:59 AM, Mike Reed wrote:
... <input type="checkbox" name="box" checked="checked" />Test <input ...> ...
When I draw this page, I see a box at the end of "Test". "Test" is coming into Font::drawText() as a 5 character string, with a CR (or LF, don't remember which) at the end. In my font, that draws as a box.
Is it correct that the parser didn't strip that, or convert it into a space?
Yes. The parser must not convert it to a space; the DOM must contain a space.
If so, is my port expected to strip these sorts of characters each time I measure or draw (hurting performance)?
Yes. Having the text rendering machinery handle these characters specially makes things faster on platforms where we can do that efficiently (which now includes both Macintosh and Windows on TOT, since there's shared high speed text rendering code). Allan outlined a way we could change bidi.cpp to implement this rule at a higher level. If we can do that without hurting performance on Macintosh and Windows, we could take the code out. Hyatt's the one who's been working on this recently.
If I had complete control over all my fonts, I could whack their cmap tables to ensure that all control characters mapped to zero-width spaces, but I don't have the luxury.
There may be other ways to do that quickly in the text rendering layer, for example it's probably quite quick to scan a string and check if any characters are in this range. In the case where they are, then you have to allocate a buffer and copy the string, but I think that's relatively rare. I'd also be comfortable taking a patch that changes it so that the bidi.cpp level takes care of this and the code from the platform directory doesn't have to handle it any more. Since this is a highly-performance-sensitive part of the code, and the way we do this now is very fast, we have to make sure we do performance measurements if we change how this works.
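The scan-first idea in the paragraph above might be sketched as follows. Everything here is an assumption for illustration: the function names are hypothetical, and the range tested (C0 controls, DEL, C1 controls) is one reading of "this range", not code taken from WebCore.

```cpp
#include <algorithm>
#include <string>

// Assumed range for this sketch: C0 controls, DEL and C1 controls.
inline bool isControlForSketch(char16_t c)
{
    return c < 0x0020 || (c >= 0x007F && c < 0x00A0);
}

// Cheap pre-check: one pass over the string, no allocation.
inline bool needsCleaning(const std::u16string& text)
{
    return std::any_of(text.begin(), text.end(), isControlForSketch);
}

// Returns the string that should be measured or drawn. In the common case
// this is `text` itself; only when a control character is present is
// `scratch` filled with a filtered copy and returned instead.
inline const std::u16string& textToDraw(const std::u16string& text,
                                        std::u16string& scratch)
{
    if (!needsCleaning(text))
        return text;
    scratch.clear();
    for (char16_t c : text) {
        if (!isControlForSketch(c))
            scratch.push_back(c);
    }
    return scratch;
}
```

The design choice is that clean strings, which should be the vast majority, pay only for the scan; the copy is limited to the rare strings that contain a control character.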
If I am required to handle these control characters, is there a list of exactly which the parser will pass through?
Here's the rule, taken from the code in GlyphMap.cpp (now cross-platform on TOT, formerly Macintosh-specific code) that implements the rule for the fast code path:
Control characters (U+0000 - U+0020, U+007F - U+00A0) must not render at all. \n (U+000A), \t (U+0009), and non-breaking space (U+00A0) must render as a space.
-- Darin
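For a port that has to apply this rule itself, the classification could be written along these lines. This is a sketch, not the GlyphMap.cpp code: `RenderAs` and `classifyForRendering` are hypothetical names, non-breaking space is taken to be U+00A0, and the ordinary space U+0020 is assumed to be drawn normally rather than suppressed.

```cpp
// Hypothetical classification of a character for the simple text path.
enum class RenderAs { Normal, Space, Nothing };

inline RenderAs classifyForRendering(char16_t c)
{
    // The exceptions come first: newline, tab and non-breaking space
    // are drawn as an ordinary space.
    if (c == u'\n' || c == u'\t' || c == u'\u00A0')
        return RenderAs::Space;
    // Remaining C0 controls, DEL and C1 controls draw nothing at all.
    // Assumption: U+0020 itself is left alone and handled as a normal glyph.
    if (c < 0x0020 || (c >= 0x007F && c < 0x00A0))
        return RenderAs::Nothing;
    return RenderAs::Normal;
}
```

With this rule the trailing CR in the "Test" example would fall into the `Nothing` case, so no box would be drawn.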
On Jun 13, 2006, at 11:31 AM, Darin Adler wrote:
On Jun 13, 2006, at 8:59 AM, Mike Reed wrote:
... <input type="checkbox" name="box" checked="checked" />Test <input ...> ...
When I draw this page, I see a box at the end of "Test". "Test" is coming into Font::drawText() as a 5 character string, with a CR (or LF, don't remember which) at the end. In my font, that draws as a box.
Is it correct that the parser didn't strip that, or convert it into a space?
Yes. The parser must not convert it to a space; the DOM must contain a space.
Darin meant "the DOM must contain a newline, not a space". To be more specific, because whitespace mode can be controlled with CSS and indeed can even be changed on the fly, the DOM must always preserve the original characters with no whitespace processing.
Here's the rule, taken from the code in GlyphMap.cpp (now cross-platform on TOT, formerly Macintosh-specific code) that implements the rule for the fast code path:
Control characters (U+0000 - U+0020, U+007F - U+00A0) must not render at all. \n (U+000A), \t (U+0009), and non-breaking space (U+00A0) must render as a space.
That sounds wrong for tabs... - Maciej
On Jun 13, 2006, at 11:40 AM, Maciej Stachowiak wrote:
Here's the rule, taken from the code in GlyphMap.cpp (now cross-platform on TOT, formerly Macintosh-specific code) that implements the rule for the fast code path:
Control characters (U+0000 - U+0020, U+007F - U+00A0) must not render at all. \n (U+000A), \t (U+0009), and non-breaking space (U+00A0) must render as a space.
That sounds wrong for tabs...
It's possible that tabs don't make it down to the text rendering machinery given the code in higher levels, so it's probably irrelevant what is done for tabs. We'd have to test to be sure. -- Darin
On Jun 13, 2006, at 8:59 AM, Mike Reed wrote:
Is it correct that the parser didn't strip that, or convert it into a space?
Correct. The original string must be preserved, since this is relevant both for HTML editing (to not mangle the source) and for CSS white-space, which can be dynamically set at any time to values like "pre" (thus indicating that the whitespace should be preserved).
If so, is my port expected to strip these sorts of characters each time I measure or draw (hurting performance)?
No. Your port should be hooking in at a lower level than drawText. Font.cpp is completely cross-platform, and you should be using all of it if you can. Check out the files in the win and mac directories under platform to see what methods you have to implement (all the Font*** and Glyph*** classes). Much of the font layer is cross-platform, and there's really only a handful of methods you have to implement. For simple text, we handle tabs, newlines, etc. for you. You only have to deal with it for advanced scripts (the ***ComplexText methods).
If I had complete control over all my fonts, I could whack their cmap tables to ensure that all control characters mapped to zero-width spaces, but I don't have the luxury.
That's basically what we do down the fast code path (which is cross-platform). We have a cached glyph map that is hacked to map newlines and tabs to spaces. (The map is only used for those characters when the white-space mode indicates they should not be preserved. Otherwise the white-space handling kicks in at a higher level and avoids the glyph map when processing those characters.)
dave (hyatt@apple.com)
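The cached glyph map idea can be illustrated with a small sketch. It is not the real GlyphMap class: `GlyphMapSketch`, the `Glyph` type and the `fontLookup` callback are all hypothetical, standing in for whatever the platform font code provides.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using Glyph = std::uint16_t; // hypothetical glyph index type

// Caches character-to-glyph lookups. Newline and tab are deliberately
// given the glyph of an ordinary space, so drawing code never sees the
// font's "missing glyph" box for them.
class GlyphMapSketch {
public:
    // `fontLookup` stands in for the platform's cmap lookup (assumed).
    explicit GlyphMapSketch(std::function<Glyph(char16_t)> fontLookup)
        : m_fontLookup(std::move(fontLookup))
    {
    }

    Glyph glyphFor(char16_t c)
    {
        auto it = m_cache.find(c);
        if (it != m_cache.end())
            return it->second;
        // The "hack": look up the space glyph for newline and tab.
        const char16_t source = (c == u'\n' || c == u'\t') ? u' ' : c;
        const Glyph glyph = m_fontLookup(source);
        m_cache.emplace(c, glyph);
        return glyph;
    }

private:
    std::function<Glyph(char16_t)> m_fontLookup;
    std::unordered_map<char16_t, Glyph> m_cache;
};
```

Because the substitution happens when the cache entry is filled, the per-character cost at draw time is a single map lookup, with no string scanning or copying.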
Got it. I just peeked at TOT, and I see the new (for me at least) GlyphBuffer and GlyphMap files. Can't wait to sync to the tip...
I'm waiting for the review of the actual gdk code :(
On 6/14/06, Mike Reed <mikerreed@gmail.com> wrote:
Got it. I just peeked at TOT, and I see the new (for me at least) GlyphBuffer and GlyphMap files. Can't wait to sync to the tip...
_______________________________________________ webkit-dev mailing list webkit-dev@opendarwin.org http://www.opendarwin.org/mailman/listinfo/webkit-dev
participants (6): Allan Sandfeld Jensen, Darin Adler, David Hyatt, Maciej Stachowiak, Mike Emmel, Mike Reed