Newline translation issue with textareas
I'm trying to solve <http://bugzilla.opendarwin.org/show_bug.cgi? id=3401>, which involves writing accessors for the selection on QTextEdit and KWQTextArea. After poking around, I discovered that KWQTextArea translates \r\n sequences (and just plain \r sequences) to \n when returning its text value. This makes a certain amount of sense, because standardizing newlines can be helpful at times, and Firefox also exhibits this behaviour when tested. However, WebKit has a problem with this. If I set the value of a textarea in ecmascript to "A\r\ntext\r\nfield" and fetch the value back out, the \r\n sequences are still there. Only if I change the value of the textarea by hand do the sequences get translated. After examining the code, I'm guessing this is because HTMLTextAreaElementImpl caches the value of the textarea and simply returns that cached value if the NSTextView widget hasn't been touched. And since the translation happens down in KWQTextArea, the returned value still contains the \r\n. As you can imagine, this causes quite a few problems when trying to handle the selection, since the selection can stay the same and the text change underneath it, since the selection isn't cached in HTMLTextAreaElementImpl and will always be fetched from the KWQTextArea (at least, as of this writing - I see no reason to try to cache it), so the selection will always be given assuming the translation is in place (which is another issue - I have to modify the real selection to return the perceived selection, since the NSTextView's text still contains the \r\n sequences, KWQTextArea simply translates when returning the text value) Another issue related to this is KWQTextArea has accessors for the cursor position (but not the selection - odd), which do not take into account the translation. I'm not entirely sure how to test this, since they're only used when updating the widget, and I haven't looked at this enough to figure out how to actually trigger that, but I imagine it could cause the cursor position to change based on the \r \n translation. But this is actually fixable once the above issue is dealt with. Anyway, I'm looking for a solution to the problem outlined above. Should we translate the text when setting it from Javascript? But that makes HTMLTextAreaElementImpl acquire behaviour that was hidden down in KWQTextArea, behaviour that I don't if it should belong in HTMLTextAreaElementImpl. Should we stop caching the text value in HTMLTextAreaElementImpl? I'm not sure why it is cached, AFAIK it's not something that's being accessed over and over, the only real benefit to caching is to avoid translating the NSString to a QString each time it's accessed. Should we invalidate the cached value when setting a new value in code? I'm not sure how to do that, because I'm not sure where the set value is pushed down to the KWQTextArea. Does anybody who has more experience with this code than a single day have any comments or suggestions for this problem? -- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
For a quick test of the newline translation problem I was talking about, try this code. Press the first button to get a count of the characters in the field, and the second to replace the contents of the field with the same text but using \r\n instead of \n as newline. So render this in WebKit, press the first button to get a count of 12, press the second button to modify the field, press the first to get a count of 14, then manually modify the field (i.e. type a character and delete it) and press the first button and you'll get a count of 12 again. The correct behaviour, I believe, is to always return a count of 12, and if you test this in Firefox you'll see that's what happens there. <html> <head> <title>foo</title> <script type="text/javascript"> function foobar() { var textarea = document.getElementById("foo"); alert(textarea.value.length); } function bar() { var textarea = document.getElementById("foo"); //textarea.setSelectionRange(3,7); textarea.value = "A\r\ntext\r\nfield"; textarea.focus(); } </script> </head> <body> <p>Test</p> <form> <textarea id="foo">A text field</textarea> <br /> <input type="submit" onclick="foobar(); return false;" value="Length" /> <br /> <input type="submit" onclick="bar(); return false;" value="Modify" /> </form> </body> </html> -- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
On Jun 19, 2005, at 12:17 PM, Kevin Ballard wrote:
I discovered that KWQTextArea translates \r\n sequences (and just plain \r sequences) to \n when returning its text value.
I think that ideally this translation should occur when text gets into a text area; then we would have no need to handle these odd cases later. This would be a bit challenging because NSTextView has various code paths for inserting text, and we'd have to cover all of those.
Another issue related to this is KWQTextArea has accessors for the cursor position (but not the selection - odd), which do not take into account the translation. I'm not entirely sure how to test this, since they're only used when updating the widget, and I haven't looked at this enough to figure out how to actually trigger that, but I imagine it could cause the cursor position to change based on the \r\n translation. But this is actually fixable once the above issue is dealt with.
Good point. You should keep at it and come up with the test case that shows this is broken.
Should we translate the text when setting it from Javascript? But that makes HTMLTextAreaElementImpl acquire behaviour that was hidden down in KWQTextArea, behaviour that I don't if it should belong in HTMLTextAreaElementImpl.
I think that would be an acceptable solution, and probably the easiest thing to get right. The heart of the matter here is that NSTextView can handle all three types of line endings, and makes no attempt to unify them as text is inserted. Yet we want to have the line ending always be a single standard one.
Should we stop caching the text value in HTMLTextAreaElementImpl? I'm not sure why it is cached, AFAIK it's not something that's being accessed over and over, the only real benefit to caching is to avoid translating the NSString to a QString each time it's accessed.
I think one of the reasons it's cached might be that the value needs to survive the process of destroying and re-creating the renderer.
Should we invalidate the cached value when setting a new value in code? I'm not sure how to do that, because I'm not sure where the set value is pushed down to the KWQTextArea.
RenderTextArea::updateFromElement pushes the value from the DOM into the QTextEdit (which in turn uses KWQTextArea). In the case of a setValue call, HTMLTextAreaElementImpl::setValue calls setChanged (true), which results in an updateFromElement call later. -- Darin
On Jun 20, 2005, at 3:03 PM, Darin Adler wrote:
I think that ideally this translation should occur when text gets into a text area; then we would have no need to handle these odd cases later. This would be a bit challenging because NSTextView has various code paths for inserting text, and we'd have to cover all of those.
I agree, that would be best, but as you said, there's multiple code paths for inserting text.
Another issue related to this is KWQTextArea has accessors for the cursor position (but not the selection - odd), which do not take into account the translation. I'm not entirely sure how to test this, since they're only used when updating the widget, and I haven't looked at this enough to figure out how to actually trigger that, but I imagine it could cause the cursor position to change based on the \r\n translation. But this is actually fixable once the above issue is dealt with.
Good point. You should keep at it and come up with the test case that shows this is broken.
It's not broken if we translate the text on insertion, only in the current situation where we translate on retrieval. But I will try to figure out how this is being used and how to reproduce the issue.
Should we translate the text when setting it from Javascript? But that makes HTMLTextAreaElementImpl acquire behaviour that was hidden down in KWQTextArea, behaviour that I don't if it should belong in HTMLTextAreaElementImpl.
I think that would be an acceptable solution, and probably the easiest thing to get right.
The heart of the matter here is that NSTextView can handle all three types of line endings, and makes no attempt to unify them as text is inserted. Yet we want to have the line ending always be a single standard one.
This would probably be the easiest thing to get right, but then we'd have to think about the other ways of setting text, such as pasting text in with multiple line endings. Sure, we'd still return the "correct" value when accessed, but the internal storage would still contain \r\n sequences and the accessors for the cursor position and selected text would have to be aware of that. It's just something to think about.
I think one of the reasons it's cached might be that the value needs to survive the process of destroying and re-creating the renderer.
Ah, makes sense
RenderTextArea::updateFromElement pushes the value from the DOM into the QTextEdit (which in turn uses KWQTextArea). In the case of a setValue call, HTMLTextAreaElementImpl::setValue calls setChanged (true), which results in an updateFromElement call later.
It's the "results in an updateFromElement call later" that's the problem. If we're not pushing it down to the QTextEdit at the time it's set, we can't invalidate the cached value there, and invalidating it in updateFromElement doesn't make sense, plus it would leave open a window in which the "incorrect" value (i.e. with \r \n sequences) could still be retrieved. I guess it comes down to the following options: 1) We translate the text whenever it gets set from anywhere so that all line endings are \n. This has the benefit of making the cursor position and selection accessors much easier to write, since the internal text storage is the same as what's exposed to the outside. However, this has the problem of requiring translation on all the different ways text can get into the NSTextView 2) We translate text set up in HTMLTextAreaElementImpl whenever it's set from there (or we force a updateFromElement and then invalidate the cache so that the next access will get it from KWQTextArea). This has the benefit of being pretty easy to do, with the negative of requiring the cursor position and selection accessors to be aware of \r\n sequences and treat them as a single char 3) We stop translating the text. This is probably not desired, because at least Firefox always translates \r\n sequences to \n (I don't have any way to test WinIE) and compatibility with other browsers is a good thing. Given that, I'm not actually aware of any spec that says line endings should be coerced to \n. I personally vote for option #2. -- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
On Jun 20, 2005, at 1:00 PM, Kevin Ballard wrote:
1) We translate the text whenever it gets set from anywhere so that all line endings are \n. This has the benefit of making the cursor position and selection accessors much easier to write, since the internal text storage is the same as what's exposed to the outside. However, this has the problem of requiring translation on all the different ways text can get into the NSTextView
2) We translate text set up in HTMLTextAreaElementImpl whenever it's set from there (or we force a updateFromElement and then invalidate the cache so that the next access will get it from KWQTextArea). This has the benefit of being pretty easy to do, with the negative of requiring the cursor position and selection accessors to be aware of \r\n sequences and treat them as a single char
3) We stop translating the text. This is probably not desired, because at least Firefox always translates \r\n sequences to \n (I don't have any way to test WinIE) and compatibility with other browsers is a good thing. Given that, I'm not actually aware of any spec that says line endings should be coerced to \n.
I like (1) best long term. But I think it's fine to do (2) first. It's likely we'll be reimplementing <textarea> to use HTML editing rather than QTextEdit some time soon anyway, so we can take care of (1) at that point. -- Darin
On Jun 20, 2005, at 4:07 PM, Darin Adler wrote:
2) We translate text set up in HTMLTextAreaElementImpl whenever it's set from there (or we force a updateFromElement and then invalidate the cache so that the next access will get it from KWQTextArea). This has the benefit of being pretty easy to do, with the negative of requiring the cursor position and selection accessors to be aware of \r\n sequences and treat them as a single char
I like (1) best long term. But I think it's fine to do (2) first. It's likely we'll be reimplementing <textarea> to use HTML editing rather than QTextEdit some time soon anyway, so we can take care of (1) at that point.
Ok, sounds good. In terms of actually doing #2, I would suggest that forcing an updateFromElement and then invalidating the cache is the best way to do it, so we don't have the translation happening in 2 different places. However, I'm not sure about actually implementing this, since I'm not really familiar with how it gets called in the first place. When I looked at it, setting text in HTMLTextAreaElementImpl calls setChanged(true), which basically just walks up the tree and tells each element that one of its children has changed, yes? And then how does that translate into an updateFromElement call later? Or, more importantly, would there be any problem with simply calling updateFromElement directly after the setChanged call, and then setting m_valueIsValid to false so the next access will get the translated value from KWQTextArea? -- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
On Jun 20, 2005, at 1:25 PM, Kevin Ballard wrote:
In terms of actually doing #2, I would suggest that forcing an updateFromElement and then invalidating the cache is the best way to do it, so we don't have the translation happening in 2 different places.
Nah, I think it's OK to have the translation in two different places for now. If this was a QString, we'd be talking two lines of code here: xxx.replace("\r\n", '\n'); xxx.replace('\r', '\n'); Lets not bend over too far backwards for this. -- Darin
Alright, I'll do the translation in HTMLTextAreaElementImpl::setValue. And yes, it is a QString, although the QString is taken from a const DOMString, and those replace calls look like they replace-in-place. I'm not familiar with QString (or DOMString) yet, so is there any issue with calling these 2 lines on the QString returned by value.string() (where value is a const DOMString &)? Is that returning a value in the DOMString, or does the DOMString::string call copy its contents into a new QString? I could probably figure this out by looking at the code, but I just thought it might be faster to ask. On Jun 20, 2005, at 4:55 PM, Darin Adler wrote:
Nah, I think it's OK to have the translation in two different places for now. If this was a QString, we'd be talking two lines of code here:
xxx.replace("\r\n", '\n'); xxx.replace('\r', '\n');
Lets not bend over too far backwards for this.
-- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
On Jun 20, 2005, at 2:03 PM, Kevin Ballard wrote:
Alright, I'll do the translation in HTMLTextAreaElementImpl::setValue. And yes, it is a QString, although the QString is taken from a const DOMString, and those replace calls look like they replace-in-place. I'm not familiar with QString (or DOMString) yet, so is there any issue with calling these 2 lines on the QString returned by value.string() (where value is a const DOMString &)? Is that returning a value in the DOMString, or does the DOMString::string call copy its contents into a new QString?
No issue. Modifying the QString is fine. It's a copy. -- Darin
Well, I just ran into another problem with doing it this way. If I have an empty textarea, and I run the following code: var elt = document.getElementById("foo"); // foo is the textarea elt.value = "This is a test"; elt.setSelectionRange(3,5); alert(elt.selectionStart); I end up getting 0 back, because the value hasn't been pushed down to the KWQTextArea yet, but that's where the selection is, which means I'm setting the selection while KWQTextArea still has an empty value. So there's 2 solutions to this: 1) Cache the selection up in the HTMLTextAreaElementImpl as well 2) Push the text value down to KWQTextArea on HTMLTextAreaElementImpl::setValue(). I don't think option #1 is really necessary, it would just cause more syncing issues, so I'd suggest going with #2. Is there any problem with calling updateFromElement() after setting the text value? I also suspect that HTMLInputElementImpl will have the same problem, but I haven't actually tested yet. On Jun 20, 2005, at 4:55 PM, Darin Adler wrote:
On Jun 20, 2005, at 1:25 PM, Kevin Ballard wrote:
In terms of actually doing #2, I would suggest that forcing an updateFromElement and then invalidating the cache is the best way to do it, so we don't have the translation happening in 2 different places.
Nah, I think it's OK to have the translation in two different places for now. If this was a QString, we'd be talking two lines of code here:
xxx.replace("\r\n", '\n'); xxx.replace('\r', '\n');
Lets not bend over too far backwards for this.
-- Kevin Ballard kevin@sb.org http://www.tildesoft.com http://kevin.sb.org
On Jun 21, 2005, at 12:00 PM, Kevin Ballard wrote:
Well, I just ran into another problem with doing it this way.
If I have an empty textarea, and I run the following code:
var elt = document.getElementById("foo"); // foo is the textarea elt.value = "This is a test"; elt.setSelectionRange(3,5); alert(elt.selectionStart);
I end up getting 0 back, because the value hasn't been pushed down to the KWQTextArea yet, but that's where the selection is, which means I'm setting the selection while KWQTextArea still has an empty value.
So there's 2 solutions to this:
1) Cache the selection up in the HTMLTextAreaElementImpl as well 2) Push the text value down to KWQTextArea on HTMLTextAreaElementImpl::setValue().
I don't think option #1 is really necessary, it would just cause more syncing issues, so I'd suggest going with #2. Is there any problem with calling updateFromElement() after setting the text value?
I also suspect that HTMLInputElementImpl will have the same problem, but I haven't actually tested yet.
Sounds fine. Lets see how well it works. -- Darin
participants (2)
-
Darin Adler
-
Kevin Ballard