[Webkit-unassigned] [Bug 22339] Support Indian web sites with EOT by on-the-fly transcoding to Unicode

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Feb 23 16:37:02 PST 2009


https://bugs.webkit.org/show_bug.cgi?id=22339


darin at apple.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26892|review?                     |review-
               Flag|                            |




------- Comment #13 from darin at apple.com  2009-02-23 16:37 PDT -------
(From update of attachment 26892)
> + virtual Vector<UChar> transcode(const UChar* characters, int length) = 0;

Returning a Vector<UChar> is inefficient, because it requires copying the
entire vector; it's not reference counted or anything like that. Instead the
transcoder should take a Vector<UChar>& as an "out" parameter and put the
result into the vector.

> +#if ENABLE(TEXT_TRANSCODER)
> +    , m_isTranscodableFont(transcoder()->isSupportedFont(family().family().string()))
> +#endif

How many different Font objects are created when browsing? Will this additional
function call and hash table overhead have measurable performance impact? If
not, why not?

> +#include "StringImpl.h"

This include should not be needed. Please double check there are no extra
includes.

> +namespace WebCore
> +{

The brace should go on the same line.

> +DynamicParser::DynamicParser(const String& input, FontEncoder* encoder)
> +    : Parser(input, encoder)

Ownership is unclear to me. What guarantees the lifetime of the FontEncoder?

> +    if (encoder->isPreprocessRequired())

Name should be isPreprocessingRequired.

> +    m_length = m_text.length();

Why store a separate m_length if it's always the same as m_text.length()?

> +    m_hasSuffixes = m_encoder->hasSuffixes();

Why store m_hasSuffixes rather than calling m_encoder->hasSuffixes() each time?

> +String DynamicParser::removeRedundantSymbols()
> +{
> +    Vector<UChar> response;
> +    for (int i = 0; i < m_text.length(); i++) {
> +        UChar c = m_text[i];
> +        if (!m_encoder->isRedundant(c))
> +            response.append(c);
> +    }
> +    return String::adopt(response);
> +}

For better performance, it would be best to just return m_text as-is if there
are no redundant symbols rather than constructing a new String object. You
should do a reserveCapacity on the vector since it typically will be about the
same length as m_text. The index should be size_t, not int.

> +                String lookupKey = m_text.substring(m_index, i);

The substring operation requires memory allocation so it is expensive. Is there
a way to do this without allocating new strings all the time?

> +    String gunintam = syllable.gunintam();
> +    if (gunintam.length() == 2)
> +        syllable.setGunintam(m_encoder->handleTwoPartVowelSigns(gunintam[0], gunintam[1]));

If guintam is always two characters, then I suggest returning two characters
rather than allocating a string.

> +#include "FontEncoder.h"
> +#include "Parser.h"
> +#include "PlatformString.h"
> +#include "Syllable.h"

You need to include "Parser.h" here, but all of the other includes seem
unnecssary.

> +    DynamicParser(const String& input, FontEncoder* encoder);

In WebKit code we leave out argument names that are clear based on only the
type. For example "encoder" would be omitted here.

> +    virtual bool handleConsonantTermination(Syllable& syllable);

And "syllable" here.

> +    virtual bool isCurrentSyllableComplete(SyllableType type);

And "type" here.

> +#include "PlatformString.h"

This file doesn't use String, it uses Vector and UChar, so it should include
the headers for those, not "PlatformString.h".

> +namespace WebCore
> +{

Brace goes up on previous line.

> +    virtual String preprocessMessage(const String&);
> +    virtual bool isPreprocessRequired();
> +    virtual bool hasSuffixes();
> +    virtual bool isOverloaded(const String&);
> +    virtual bool isSuffixSymbol(const String&);
> +    virtual bool isPrefixSymbol(const String&);
> +
> +    virtual String lookup(const String& key) = 0;
> +    virtual String handleTwoPartVowelSigns(UChar a, UChar b) = 0;

The names "a" and "b" should be omitted here.

The interface here is heavy on use of String; this will cause the text to have
to be put into lots of separate String objects. Could these functions instead
work some other way so they won't require so much memory allocation?

> Property changes on: WebCore/platform/text/transcoder/FontEncoder.h
> ___________________________________________________________________
> Name: svn:executable
>    + *

Please do not submit patches with executable source files.

> +const SyllableType FontTranscoder::type[] = {TypeAccuMod, TypeAccuMod, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeHalluMod, TypeUnknown, TypeUnknown, TypeDigit, TypeDigit, 
> +    TypeDigit, TypeDigit, TypeDigit, TypeDigit, TypeDigit, TypeDigit, TypeDigit, 
> +    TypeDigit, TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, 
> +    TypeUnknown, TypeUnknown, TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, 
> +    TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, 
> +    TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, TypeAccu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, 
> +    TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeHallu, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeDigit, TypeDigit, TypeDigit, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeHalluMod, TypeHalluMod, TypeAccuMod, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, 
> +    TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, 
> +    TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, TypeGunintam, 
> +    TypeGunintam, TypeGunintam, TypeVattu, TypeHallu, TypeVattu, TypeHallu, 
> +    TypeVattu, TypeVattu, TypeHallu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeHallu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeHallu, 
> +    TypeVattu, TypeHallu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeVattu, TypeVattu, TypeVattu, TypeHallu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeVattu, TypeVattu, TypeHallu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, TypeVattu, 
> +    TypeVattu, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, 
> +    TypeUnknown, TypeUnknown, TypeUnknown, TypeUnknown, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeHalfForm, 
> +    TypeHalfForm, TypeHalfForm, TypeHalfForm, TypeUnknown };

Formatting here doesn't put braces where we normally would.

> +    // Assign index to fonts
> +    m_encoderIndex.set("InaiMathiNewTSC",0);
> +
> +    m_converterIndex.set("InaiMathiNewTSC",0);

Why not put the actual pointers into these maps? I don't see a reason to have
them go indirectly through array indices. And why not have a single map with a
struct that holds both the encoder and the converter?

> +    m_transcodableSites.add("www.kalkiweekly.com");

Requiring a particular host name seems like a bad idea. I think the font name
should be enough. Did we already discuss this?

> +bool FontTranscoder::isTranscodableSite(const String& origin) const
> +{
> +    return m_transcodableSites.contains(origin);
> +}

Why no check for empty string here?

> +String FontTranscoder::convert(const String& text, const String& font)
> +{
> +    if (!m_encoderIndex.contains(font) || !m_converterIndex.contains(font))
> +        return text;
> +    int encoderIndex = m_encoderIndex.get(font);
> +    int converterIndex = m_converterIndex.get(font);

This is a poor idiom because it does extra hash table lookups. With a find call
you can both check if something is in the map and get at the value that's in
there. That's twice as fast as doing a contains followed by a get. By making
this a single map you can make this four times faster.

> +    Parser* parser = new DynamicParser(text, m_encoders[encoderIndex]);

You should use OwnPtr here rather than an explicit call to delete.

> +class FontTranscoder {
> +public:
> +    static const int baseIndex = 0xEC00;
> +    static const int baseStart = 0xEC22;
> +    static const int baseEnd   = 0xEC68;
> +    static const int dependentStart = 0xECA2;
> +    static const int dependentEnd   = 0xECE8;
> +    static const int halfStart = 0xED33;
> +    static const int halfEnd   = 0xED68;
> +    static const int vattuStart = 0xECB3;
> +    static const int vattuEnd   = 0xECE8;
> +
> +    // Constants definitions
> +    static const UChar anusvara = 0xEC00;
> +    static const UChar visarga = 0xEC01;
> +    static const UChar pollu = 0xEC02;
> +    static const UChar chillu = 0xEC0D;
> +    static const UChar syllbreak = 0xEC7B;
> +    static const UChar nukta = 0xEC7C;
> +    static const UChar ardhavisarga = 0xEC7D;
> +    static const UChar tippi = 0xEC7E;
> +    static const UChar addak = 0xEC7F;
> +    static const UChar ekonkar = 0xEC80;
> +    static const UChar virama = 0xEC02;
> +    static const UChar chandrakkala = 0xEC02;
> +    static const UChar pulli = 0xEC02;
> +    static const UChar halant = 0xEC02;
> +    static const UChar hasant = 0xEC02;
> +    static const UChar candrabindu = 0xEC03;
> +    static const UChar avagraha = 0xEC04;
> +    static const UChar yati = 0xEC05;
> +    static const UChar udAtta = 0xEC06;
> +    static const UChar anudAtta = 0xEC07;
> +    static const UChar svarita = 0xEC08;
> +    static const UChar guru = 0xEC09;
> +    static const UChar laghu = 0xEC0A;
> +    static const UChar danda = 0xEC0B;
> +    static const UChar ddanda = 0xEC0C;
> +    static const UChar abbrev = 0xEC0E;
> +    static const UChar om = 0xEC0F;
> +    static const UChar digitZERO = 0xEC10;
> +    static const UChar digitONE = 0xEC11;
> +    static const UChar digitTWO = 0xEC12;
> +    static const UChar digitTHREE = 0xEC13;
> +    static const UChar digitFOUR = 0xEC14;
> +    static const UChar digitFIVE = 0xEC15;
> +    static const UChar digitSIX = 0xEC16;
> +    static const UChar digitSEVEN = 0xEC17;
> +    static const UChar digitEIGHT = 0xEC18;
> +    static const UChar digitNINE = 0xEC19;
> +    static const UChar vowelA = 0xEC20;
> +    static const UChar vowelSHTA = 0xEC21;
> +    static const UChar vowelAA = 0xEC22;
> +    static const UChar vowelI = 0xEC23;
> +    static const UChar vowelII = 0xEC24;
> +    static const UChar vowelU = 0xEC25;
> +    static const UChar vowelUU = 0xEC26;
> +    static const UChar vowelR = 0xEC27;
> +    static const UChar vowelRR = 0xEC28;
> +    static const UChar vowelL = 0xEC29;
> +    static const UChar vowelLL = 0xEC2A;
> +    static const UChar vowelCDRE = 0xEC2B;
> +    static const UChar vowelE = 0xEC2C;
> +    static const UChar vowelEE = 0xEC2D;
> +    static const UChar vowelAI = 0xEC2E;
> +    static const UChar vowelCDRO = 0xEC2F;
> +    static const UChar vowelO = 0xEC30;
> +    static const UChar vowelOO = 0xEC31;
> +    static const UChar vowelAU = 0xEC32;
> +    static const UChar consntKA = 0xEC33;
> +    static const UChar consntQA = 0xEC34;
> +    static const UChar consntKHA = 0xEC35;
> +    static const UChar consntKHHA = 0xEC36;
> +    static const UChar consntGA = 0xEC37;
> +    static const UChar consntGHA = 0xEC38;
> +    static const UChar consntGHHA = 0xEC39;
> +    static const UChar consntNGA = 0xEC3A;
> +    static const UChar consntCA = 0xEC3B;
> +    static const UChar consntCHA = 0xEC3C;
> +    static const UChar consntJA = 0xEC3D;
> +    static const UChar consntZA = 0xEC3E;
> +    static const UChar consntJHA = 0xEC3F;
> +    static const UChar consntNYA = 0xEC40;
> +    static const UChar consntTTA = 0xEC41;
> +    static const UChar consntTTHA = 0xEC42;
> +    static const UChar consntDDA = 0xEC43;
> +    static const UChar consntDDDHA = 0xEC44;
> +    static const UChar consntDDHA = 0xEC45;
> +    static const UChar consntRHA = 0xEC46;
> +    static const UChar consntNNA = 0xEC47;
> +    static const UChar consntTA = 0xEC48;
> +    static const UChar consntTHA = 0xEC49;
> +    static const UChar consntDA = 0xEC4A;
> +    static const UChar consntDHA = 0xEC4B;
> +    static const UChar consntNA = 0xEC4C;
> +    static const UChar consntNNNA = 0xEC4D;
> +    static const UChar consntPA = 0xEC4E;
> +    static const UChar consntFA = 0xEC4F;
> +    static const UChar consntPHA = 0xEC50;
> +    static const UChar consntBA = 0xEC51;
> +    static const UChar consntBHA = 0xEC52;
> +    static const UChar consntMA = 0xEC53;
> +    static const UChar consntYA = 0xEC54;
> +    static const UChar consntYYA = 0xEC55;
> +    static const UChar consntRA = 0xEC56;
> +    static const UChar consntRRA = 0xEC57;
> +    static const UChar consntLA = 0xEC58;
> +    static const UChar consntLLA = 0xEC59;
> +    static const UChar consntZHA = 0xEC5A;
> +    static const UChar consntVA = 0xEC5B;
> +    static const UChar consntSHA = 0xEC5C;
> +    static const UChar consntSSA = 0xEC5D;
> +    static const UChar consntSA = 0xEC5E;
> +    static const UChar consntHA = 0xEC5F;
> +    static const UChar consntTSA = 0xEC60;
> +    static const UChar consntDJA = 0xEC61;
> +    static const UChar consntGGA = 0xEC62;
> +    static const UChar consntJJA = 0xEC63;
> +    static const UChar consntDDDA = 0xEC64;
> +    static const UChar consntBBA = 0xEC65;
> +    static const UChar consntRAMD = 0xEC66;
> +    static const UChar consntRALD = 0xEC67;
> +    static const UChar consntKHANDATA = 0xEC68;
> +    static const UChar vowelsnAA = 0xECA2;
> +    static const UChar vowelsnI = 0xECA3;
> +    static const UChar vowelsnII = 0xECA4;
> +    static const UChar vowelsnU = 0xECA5;
> +    static const UChar vowelsnUU = 0xECA6;
> +    static const UChar vowelsnR = 0xECA7;
> +    static const UChar vowelsnRR = 0xECA8;
> +    static const UChar vowelsnL = 0xECA9;
> +    static const UChar vowelsnLL = 0xECAA;
> +    static const UChar vowelsnCDRE = 0xECAB;
> +    static const UChar vowelsnE = 0xECAC;
> +    static const UChar vowelsnEE = 0xECAD;
> +    static const UChar vowelsnAI = 0xECAE;
> +    static const UChar vowelsnCDRO = 0xECAF;
> +    static const UChar vowelsnO = 0xECB0;
> +    static const UChar vowelsnOO = 0xECB1;
> +    static const UChar vowelsnAU = 0xECB2;
> +    static const UChar vowelsnEELEN = 0xEC1A;
> +    static const UChar vowelsnAILEN = 0xEC1B;
> +    static const UChar vowelsnAULEN = 0xEC1C;
> +    static const UChar vowelsnIILEN = 0xEC1D;
> +    static const UChar vattuKA = 0xECB3;
> +    static const UChar vattuQA = 0xECB4;
> +    static const UChar vattuKHA = 0xECB5;
> +    static const UChar vattuKHHA = 0xECB6;
> +    static const UChar vattuGA = 0xECB7;
> +    static const UChar vattuGHA = 0xECB8;
> +    static const UChar vattuGHHA = 0xECB9;
> +    static const UChar vattuNGA = 0xECBA;
> +    static const UChar vattuCA = 0xECBB;
> +    static const UChar vattuCHA = 0xECBC;
> +    static const UChar vattuJA = 0xECBD;
> +    static const UChar vattuZA = 0xECBE;
> +    static const UChar vattuJHA = 0xECBF;
> +    static const UChar vattuNYA = 0xECC0;
> +    static const UChar vattuTTA = 0xECC1;
> +    static const UChar vattuTTHA = 0xECC2;
> +    static const UChar vattuDDA = 0xECC3;
> +    static const UChar vattuDDDHA = 0xECC4;
> +    static const UChar vattuDDHA = 0xECC5;
> +    static const UChar vattuRHA = 0xECC6;
> +    static const UChar vattuNNA = 0xECC7;
> +    static const UChar vattuTA = 0xECC8;
> +    static const UChar vattuTHA = 0xECC9;
> +    static const UChar vattuDA = 0xECCA;
> +    static const UChar vattuDHA = 0xECCB;
> +    static const UChar vattuNA = 0xECCC;
> +    static const UChar vattuNNNA = 0xECCD;
> +    static const UChar vattuPA = 0xECCE;
> +    static const UChar vattuFA = 0xECCF;
> +    static const UChar vattuPHA = 0xECD0;
> +    static const UChar vattuBA = 0xECD1;
> +    static const UChar vattuBHA = 0xECD2;
> +    static const UChar vattuMA = 0xECD3;
> +    static const UChar vattuYA = 0xECD4;
> +    static const UChar vattuYYA = 0xECD5;
> +    static const UChar vattuRA = 0xECD6;
> +    static const UChar vattuRRA = 0xECD7;
> +    static const UChar vattuLA = 0xECD8;
> +    static const UChar vattuLLA = 0xECD9;
> +    static const UChar vattuZHA = 0xECDA;
> +    static const UChar vattuVA = 0xECDB;
> +    static const UChar vattuSHA = 0xECDC;
> +    static const UChar vattuSSA = 0xECDD;
> +    static const UChar vattuSA = 0xECDE;
> +    static const UChar vattuHA = 0xECDF;
> +    static const UChar vattuTSA = 0xECE0;
> +    static const UChar vattuDJA = 0xECE1;
> +    static const UChar vattuGGA = 0xECE2;
> +    static const UChar vattuJJA = 0xECE3;
> +    static const UChar vattuDDDA = 0xECE4;
> +    static const UChar vattuBBA = 0xECE5;
> +    static const UChar vattuRAMD = 0xECE6;
> +    static const UChar vattuRALD = 0xECE7;
> +    static const UChar vattuKHANDATA = 0xECE8;
> +    static const UChar halffmKA = 0xED33;
> +    static const UChar halffmQA = 0xED34;
> +    static const UChar halffmKHA = 0xED35;
> +    static const UChar halffmKHHA = 0xED36;
> +    static const UChar halffmGA = 0xED37;
> +    static const UChar halffmGHA = 0xED38;
> +    static const UChar halffmGHHA = 0xED39;
> +    static const UChar halffmNGA = 0xED3A;
> +    static const UChar halffmCA = 0xED3B;
> +    static const UChar halffmCHA = 0xED3C;
> +    static const UChar halffmJA = 0xED3D;
> +    static const UChar halffmZA = 0xED3E;
> +    static const UChar halffmJHA = 0xED3F;
> +    static const UChar halffmNYA = 0xED40;
> +    static const UChar halffmTTA = 0xED41;
> +    static const UChar halffmTTHA = 0xED42;
> +    static const UChar halffmDDA = 0xED43;
> +    static const UChar halffmDDDHA = 0xED44;
> +    static const UChar halffmDDHA = 0xED45;
> +    static const UChar halffmRHA = 0xED46;
> +    static const UChar halffmNNA = 0xED47;
> +    static const UChar halffmTA = 0xED48;
> +    static const UChar halffmTHA = 0xED49;
> +    static const UChar halffmDA = 0xED4A;
> +    static const UChar halffmDHA = 0xED4B;
> +    static const UChar halffmNA = 0xED4C;
> +    static const UChar halffmNNNA = 0xED4D;
> +    static const UChar halffmPA = 0xED4E;
> +    static const UChar halffmFA = 0xED4F;
> +    static const UChar halffmPHA = 0xED50;
> +    static const UChar halffmBA = 0xED51;
> +    static const UChar halffmBHA = 0xED52;
> +    static const UChar halffmMA = 0xED53;
> +    static const UChar halffmYA = 0xED54;
> +    static const UChar halffmYYA = 0xED55;
> +    static const UChar halffmRA = 0xED56;
> +    static const UChar halffmRRA = 0xED57;
> +    static const UChar halffmLA = 0xED58;
> +    static const UChar halffmLLA = 0xED59;
> +    static const UChar halffmZHA = 0xED5A;
> +    static const UChar halffmVA = 0xED5B;
> +    static const UChar halffmSHA = 0xED5C;
> +    static const UChar halffmSSA = 0xED5D;
> +    static const UChar halffmSA = 0xED5E;
> +    static const UChar halffmHA = 0xED5F;
> +    static const UChar halffmTSA = 0xED60;
> +    static const UChar halffmDJA = 0xED61;
> +    static const UChar halffmGGA = 0xED62;
> +    static const UChar halffmJJA = 0xED63;
> +    static const UChar halffmDDDA = 0xED64;
> +    static const UChar halffmBBA = 0xED65;
> +    static const UChar halffmRAMD = 0xED66;
> +    static const UChar halffmRALD = 0xED67;
> +    static const UChar halffmKHANDATA = 0xED68;
> +    static const UChar digitTEN = 0xEC70;
> +    static const UChar digitHUNDRED = 0xEC71;
> +    static const UChar digitTHOUSAND = 0xEC72;
> +    static const UChar signDAY = 0xEC73;
> +    static const UChar signMONTH = 0xEC74;
> +    static const UChar signYEAR = 0xEC75;
> +    static const UChar signDEBIT = 0xEC76;
> +    static const UChar signCREDIT = 0xEC77;
> +    static const UChar signASABOVE = 0xEC78;
> +    static const UChar signRUPEE = 0xEC79;
> +    static const UChar signNUMBER = 0xEC7A;
> +
> +    static const UChar UniCodeSharedNUKTA    = 0x093C;
> +    static const UChar UniCodeSharedAVAGRAHA = 0x093D;
> +    static const UChar UniCodeSharedOM       = 0x0950;
> +    static const UChar UniCodeSharedUDATTA   = 0x0951;
> +    static const UChar UniCodeSharedANUDATTA = 0x0952;
> +    static const UChar UniCodeSharedSVARITA  = 0x0951;
> +    static const UChar UniCodeSharedDANDA    = 0x0964;
> +    static const UChar UniCodeSharedDDANDA   = 0x0965;
> +    static const UChar UniCodeSharedabbrev   = 0x0970;
> +    static const UChar UniCodeSharedZWNJ     = 0x200C;
> +    static const UChar UniCodeSharedZWJ      = 0x200D;

Unicode doesn't have a capital "C".

I don't understand why these constants are part of the public interface to
FontTranscoder. Can we put them somewhere else?

> +    static String fastVattu(UChar code)
> +    {
> +        String response = "";
> +        response.append(static_cast<UChar>(code + 0x80));
> +        return response;
> +    }

To construct a single-character string, you should initialize a character and
then:

    return String(&character, 1);

But don't do it. Don't create interfaces that require allocating a string (two
calls to the memory allocator, and reference counting overhead) to hold a
single character!

> +    static String baseForm(const String& input) 
> +    {
> +        Vector<UChar> response;
> +        for (int i = 0; i < input.length(); i++) {
> +            UChar code = input[i];
> +            if (code >= dependentStart && code <= dependentEnd)
> +                response.append(static_cast<UChar>(code - 0x80));
> +            if (code >= halfStart && code <= halfEnd)
> +                response.append(static_cast<UChar>(code - 0x100));
> +        }
> +        return String::adopt(response);
> +    }

I don't understand why these helper functions are public members of the
FontTranscoder class. There's no need for clients using this class to see
these. Lets find somewhere else to put these. These functions use strings way
too much -- it's going to make the code slow!

> +                if(!value.isNull())
> +                    break;

Need a space after the "if" here.

> +void Parser::process(const String& value, const String& key, Syllable& current
> +                     ,bool prefix, bool suffix)

Comma goes at the end of the previous line, not the start of the next line.

Also, we frown on the use of bools for things like "prefix" and "suffix". Makes
the code hard to understand at the call site.

> +typedef enum 
> +{
> +    TransformerUnicode,
> +    TransformerISCII,
> +    TransformerITrans,
> +    TransformerTSCII,
> +    TransformerTAB,
> +    TransformerTAM,
> +    TransformerDynamicFonts
> +} TransformerType;

This is the C language way to do an enum. In C++ you can just do:

    enum TransformerType {
    };

> +    void process(const String& value
> +            ,const String& key
> +            ,Syllable& syllable
> +            ,bool prefix = false
> +            ,bool suffix = false);

Commas in the wrong place here.

> +    for (int i = 0; i < input.length(); i++) {
> +        if (i != 0)
> +            type = FontTranscoder::getType(input[i]);
> +        if (type == TypeAccu || type == TypeDigit || type == TypeHallu 
> +            || type == TypeUnknown)
> +            m_body.append(input[i]);
> +        else if (type == TypeHalfForm) {
> +            if (suffix)
> +                m_body.insert(input.characters() + i,1,0);
> +            else
> +                m_body.append(input[i]);
> +        } else if (type == TypeGunintam) {
> +            if (prefix)
> +                m_prefixGunintam.append(input[i]);
> +            else
> +                m_gunintam.append(input[i]);
> +        } else if (type == TypeVattu) {
> +            if (prefix)
> +                m_prefixVattu.append(input[i]);
> +            else
> +                m_body.append(input[i]);
> +        } else if (type == TypeAccuMod)
> +            m_vowelModifier.append(input[i]);
> +        else if (type == TypeHalluMod)
> +            m_consonantModifier.append(input[i]);
> +    }

Building a String object a character at a time is going to be excruciatingly
slow. You need a better idiom for this, perhaps using Vector<UChar> instead.

> +public:
> +    static const UChar visarga = 0x00B7;
> +    static const UChar vowelA = 0x00AB;
> +    static const UChar vowelAA = 0x00AC;
> +    static const UChar vowelI1 = 0x00AD;
> +    static const UChar vowelI2 = 0x00FE;
> +    static const UChar vowelII = 0x00AE;
> +    static const UChar vowelU = 0x00AF;
> +    static const UChar vowelUU = 0x00B0;
> +    static const UChar vowelE = 0x00B1;
> +    static const UChar vowelEE = 0x00B2;
> +    static const UChar vowelAI = 0x00B3;
> +    static const UChar vowelO = 0x00B4;
> +    static const UChar vowelOO = 0x00B5;
> +    static const UChar vowelAU = 0x00B6;
> +    static const UChar consntKA = 0x00B8;
> +    static const UChar consntNGA = 0x00B9;
> +    static const UChar consntCA = 0x00BA;
> +    static const UChar consntJA = 0x0192;
> +    static const UChar consntNYA = 0x00BB;
> +    static const UChar consntTTA = 0x00BC;
> +    static const UChar consntNNA = 0x00BD;
> +    static const UChar consntTA = 0x00BE;
> +    static const UChar consntNA = 0x00BF;
> +    static const UChar consntNNNA = 0x00C9;
> +    static const UChar consntPA = 0x00C0;
> +    static const UChar consntMA = 0x00C1;
> +    static const UChar consntYA = 0x00C2;
> +    static const UChar consntRA = 0x00C3;
> +    static const UChar consntLA = 0x00C4;
> +    static const UChar consntVA = 0x00C5;
> +    static const UChar consntZHA = 0x00C6;
> +    static const UChar consntLLA = 0x00C7;
> +    static const UChar consntRRA = 0x00C8;
> +    static const UChar consntSSA = 0x201E;
> +    static const UChar consntSA = 0x2026;
> +    static const UChar consntHA = 0x2020;
> +    static const UChar conjctKSH = 0x2021;
> +    static const UChar conjctSRII = 0x201A;
> +    static const UChar vowelsnAA = 0x00A1;
> +    static const UChar vowelsnI = 0x00A2;
> +    static const UChar vowelsnII = 0x00A3;
> +    static const UChar vowelsnU = 0x00A4;
> +    static const UChar vowelsnUU = 0x00A5;
> +    static const UChar vowelsnE = 0x00A6;
> +    static const UChar vowelsnEE = 0x00A7;
> +    static const UChar vowelsnAI = 0x00A8;
> +    static const UChar vowelsnAULEN = 0x00AA;
> +    static const UChar comboKPULLI = 0x00EC;
> +    static const UChar comboNGPULLI = 0x00ED;
> +    static const UChar comboCPULLI = 0x00EE;
> +    static const UChar comboJPULLI = 0x02C6;
> +    static const UChar comboNYPULLI = 0x00EF;
> +    static const UChar comboTTPULLI = 0x00F0;
> +    static const UChar comboNNPULLI = 0x00F1;
> +    static const UChar comboTPULLI = 0x00F2;
> +    static const UChar comboNPULLI = 0x00F3;
> +    static const UChar comboNNNPULLI = 0x00FD;
> +    static const UChar comboPPULLI = 0x00F4;
> +    static const UChar comboMPULLI = 0x00F5;
> +    static const UChar comboYPULLI = 0x00F6;
> +    static const UChar comboRPULLI = 0x00F7;
> +    static const UChar comboLPULLI = 0x00F8;
> +    static const UChar comboVPULLI = 0x00F9;
> +    static const UChar comboZHPULLI = 0x00FA;
> +    static const UChar comboLLPULLI = 0x00FB;
> +    static const UChar comboSSPULLI = 0x2030;
> +    static const UChar comboSPULLI = 0x0160;
> +    static const UChar comboHPULLI = 0x2039;
> +    static const UChar comboRRPULLI = 0x00FC;
> +    static const UChar comboKSHPULLI = 0x0152;
> +    static const UChar comboTTI = 0x00CA;
> +    static const UChar comboTTII = 0x00CB;
> +    static const UChar comboKU = 0x00CC;
> +    static const UChar comboKUU = 0x00DC;
> +    static const UChar comboNGU = 0x2122;
> +    static const UChar comboNGUU = 0x203A;
> +    static const UChar comboCU = 0x00CD;
> +    static const UChar comboCUU = 0x00DD;
> +    static const UChar comboNYU = 0x0161;
> +    static const UChar comboNYUU = 0x0153;
> +    static const UChar comboTTU = 0x00CE;
> +    static const UChar comboTTUU = 0x00DE;
> +    static const UChar comboNNU = 0x00CF;
> +    static const UChar comboNNUU = 0x00DF;
> +    static const UChar comboTU = 0x00D0;
> +    static const UChar comboTUU = 0x00E0;
> +    static const UChar comboNU = 0x00D1;
> +    static const UChar comboNUU = 0x00E1;
> +    static const UChar comboNNNU = 0x00DB;
> +    static const UChar comboNNNUU = 0x00EB;
> +    static const UChar comboPU = 0x00D2;
> +    static const UChar comboPUU = 0x00E2;
> +    static const UChar comboMU = 0x00D3;
> +    static const UChar comboMUU = 0x00E3;
> +    static const UChar comboYU = 0x00D4;
> +    static const UChar comboYUU = 0x00E4;
> +    static const UChar comboRU = 0x00D5;
> +    static const UChar comboRUU = 0x00E5;
> +    static const UChar comboLU = 0x00D6;
> +    static const UChar comboLUU = 0x00E6;
> +    static const UChar comboVU = 0x00D7;
> +    static const UChar comboVUU = 0x00E7;
> +    static const UChar comboZHU = 0x00D8;
> +    static const UChar comboZHUU = 0x00E8;
> +    static const UChar comboLLU = 0x00D9;
> +    static const UChar comboLLUU = 0x00E9;
> +    static const UChar comboRRU = 0x00DA;
> +    static const UChar comboRRUU = 0x00EA;
> +    static const UChar digitZERO = 0x20AC;
> +    static const UChar digitONE = 0x0081;
> +    static const UChar digitTWO = 0x008D;
> +    static const UChar digitTHREE = 0x017D;
> +    static const UChar digitFOUR = 0x008F;
> +    static const UChar digitFIVE = 0x0090;
> +    static const UChar digitSIX = 0x2022;
> +    static const UChar digitSEVEN = 0x2013;
> +    static const UChar digitEIGHT = 0x2014;
> +    static const UChar digitNINE = 0x02DC;
> +    static const UChar digitTEN = 0x009D;
> +    static const UChar digitHUNDRED = 0x017E;
> +    static const UChar digitTHOUSAND = 0x0178;
> +    static const UChar LQTSINGLE = 0x2018;
> +    static const UChar RQTSINGLE = 0x2019;
> +    static const UChar LQTDOUBLE = 0x201C;
> +    static const UChar RQTDOUBLE = 0x201D;
> +    static const UChar COPYRIGGHT = 0x00A9;

These should be private unless they are part of the public interface to this
class. Ideally they shouldn't be class members at all. They could just go
inside the .cpp file.

In general, names for Unicode characters should go into the CharacterNames.h
header and should be the names from the Unicode specification, without any runs
of all capital letters. For example: copyrightSign.

> -    if (oldTransform != style()->textTransform() || oldSecurity != style()->textSecurity()) {
> +    if (oldTransform != style()->textTransform() || oldSecurity != style()->textSecurity()
> +#if ENABLE(TEXT_TRANSCODER)
> +        || style()->font().isTranscodableFont()
> +#endif

This check seems wrong. It doesn't even check if the font changed. If it's the
same font, we have no reason to re-transcode.

> +#if ENABLE(TEXT_TRANSCODER)
> +    if (style()->font().isTranscodableFont())
> +        m_text = transcode(String(m_text->characters(), m_text->length()));
> +#endif

Should just be String(m_text) here -- this code unnecessarily makes a deep
copy.

>  #include "RenderObject.h"
> +#include <wtf/HashSet.h>
> +#include <wtf/HashMap.h>

Why did you add these includes?

>  
>  namespace WebCore {
>  
> @@ -147,6 +151,9 @@ private:
>      bool containsOnlyWhitespace(unsigned from, unsigned len) const;
>      int widthFromCache(const Font&, int start, int len, int xPos) const;
>      bool isAllASCII() const { return m_isAllASCII; }
> +#if ENABLE(TEXT_TRANSCODER)
> +    PassRefPtr<StringImpl> transcode(String text);
> +#endif

The argument type should be "const String&", not "String". The return type
should probably just be String rather than PassRefPtr<StringImpl>.


-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.



More information about the webkit-unassigned mailing list