[Webkit-unassigned] [Bug 24906] 0x5C of EUC-JP is not Yen Sign but U+005C

Thu May 13 05:26:22 PDT 2010

https://bugs.webkit.org/show_bug.cgi?id=24906

--- Comment #49 from Shinichiro Hamaji <hamaji at chromium.org>  2010-05-13 05:26:21 PST ---
(In reply to comment #48)
> Created an attachment (id=55964)
 --> (https://bugs.webkit.org/attachment.cgi?id=55964) [details]
> Patch v2

Sorry for the latency... but finally :) I think I've addressed all comments.

> Does searching work correctly with this patch, so that a yen sign as visible on a web page can always be found by searching for a yen sign? I think it should work, but nothing beats actual testing.

Thanks for this comment. I've added a test case for searching.

> > -    static const char* const a = atomicCanonicalTextEncodingName("Shift_JIS_X0213-2000");
> > -    static const char* const b = atomicCanonicalTextEncodingName("EUC-JP");
> > -    return (m_name == a || m_name == b) ? 0x00A5 : '\\';
> > +    DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ());
> > +    if (set.isEmpty()) {
> > +        addEncodingName(set, "Shift_JIS");
> > +        addEncodingName(set, "Shift_JIS_X0213-2000");
> > +        addEncodingName(set, "EUC-JP");
> > +        addEncodingName(set, "ISO-2022-JP");
> > +        addEncodingName(set, "ISO-2022-JP-2");
> > +    }
> > +    return m_name && set.contains(m_name) ? 0x00A5 : '\\';
> 
> This fix seems separate from the rest of the patch and should be landed separately if possible. And needs a test case.

Yeah, I'll do in a separated patch.

> +    // FIXME: We should not do this when a user explicitly specifies a
> +    //        non-hacked font.
> 
> No need to wrap the FIXME. But even in cases where comments are long enough to wrap, we don't try to indent.
> 
> How bad is this issue in practice? I'm not sure if I can make a test case just from this description.

I updated the comment. I think it's a kind of rare case, but if an author of a web page explicitly uses english fonts to show backslashes, this change can break the author's intention. I'm not sure if it's easy to resolve this issue. The ideal behavior would be

1. If a document explicitly specifies a japanese font, do the backslash => yen sign transcoding
2. If a document explicitly specifies a non-japanese font, don't the transcoding
3. If IE's default font for the document encoding is japanese, do the transcoding
4. Otherwise, don't the transcoding

I didn't implement the step 2 because I wasn't sure if it's easy to implement. I was not sure if it's easy to check a Font is specified explicitly by CSS or not.

> +    switch (converterType(fontFamily, encoding)) {
> +    case BackslashToYenSign: {
> <...>
> +    case NoConvertion:
> +    default:
> +        ASSERT_NOT_REACHED();
> +    }
> 
> Using a switch for two cases, on of which can not be taken, is somewhat unusual. Is there a reason you did that, and not just ASSERT+if()?

I guessed we might be able to re-use this FontTranscoder class in Bug 22339 (EOT font issue) by adding enum values. That's also the reason why I added an enum which has only two values. If you want to keep code simple for now, please let me know and I'm happy to change to ASSERT+if() code.

> Is there harm in transcoding when running on Windows and the font being used is actually one of the fonts that have backslash mapped to the Yen sign?

I think such font has yen sign glyphs for both U+005C and U+0x00A5, so there should be no particular issues for such fonts.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.