[webkit-dev] Bit fields cause Purify to report UMRs

Wed May 2 17:31:36 PDT 2007

I'm Purifying WebKit on Windows. I'm getting tons of UMR (uninitialized
memory read) warnings. Most of these seem to be in constructors of
classes that contain bit fields. These UMRs are benign, but strictly
speaking Purify is correct to report them.  So my question is: would
you give up the space savings of the bit fields to eliminate the
Purify noise?

Here is an example.

[W] UMR: Uninitialized memory read in WebCore::DeprecatedStringData::DeprecatedStringData(DeprecatedStringData::WebCore&) {8 occurrences}
        Reading 4 bytes from 0x05a02f50 (4 bytes at 0x05a02f50 uninitialized)
        Address 0x05a02f50 is 16 bytes into a 52 byte block at 0x05a02f40
        Address 0x05a02f50 points to a C++ new block in heap 0x044d0000
        Thread ID: 0x7d8
        Error location
            WebCore::DeprecatedStringData::DeprecatedStringData(DeprecatedStringData::WebCore&) [deprecatedstring.cpp:332]
                    , _isHeapAllocated(0)
                    , _maxAscii(o._maxAscii)
                    , _isAsciiValid(o._isAsciiValid)
             => {
                    // Handle the case where either the Unicode or 8-bit pointer was
                    // pointing to the internal buffer. We need to point at the
                    // internal buffer in the new object, and copy the characters.

The DeprecatedStringData class is defined as follows:

// Keep this struct to <= 46 bytes, that's what the system will allocate.
// Will be rounded up to a multiple of 4, so we're stuck at 44.

#define WEBCORE_DS_INTERNAL_BUFFER_SIZE 20
[...]

struct DeprecatedStringData
{
    [...]
    unsigned refCount;
    unsigned _length;
    mutable DeprecatedChar *_unicode;
    mutable char *_ascii;

    unsigned _maxUnicode : 30;
    bool _isUnicodeValid : 1;
    bool _isHeapAllocated : 1; // Fragile, but the only way we can be sure the instance was created with 'new'.
    unsigned _maxAscii : 31;
    bool _isAsciiValid : 1;

    // _internalBuffer must be at the end - otherwise it breaks on archs that
    // don't pack structs on byte boundary, like some versions of gcc on ARM
    char _internalBuffer[WEBCORE_DS_INTERNAL_BUFFER_SIZE]; // Pad out to a (((size + 1) & ~15) + 14) size

    [...]
};

On Windows its size is 52 bytes rather than the intended 46 bytes,
because the 'bool' bit fields are packed separately from the 'unsigned'
bit fields.  So "16 bytes into a 52 byte block" is the word that holds
the '_maxUnicode' field.
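
If you want to check the packing on your own compiler, here is a minimal
sketch.  The names MixedBits, UniformBits and extraBytesFromMixedTypes are
made up for illustration and are not WebKit code.  Bit field layout is
implementation-defined, so the numbers will differ between compilers; I
would expect a nonzero difference with the Windows compiler and zero with
gcc, but I have not verified every configuration.

```cpp
#include <cassert>
#include <cstddef>

// Same five bit fields as DeprecatedStringData, with mixed declared types.
struct MixedBits {
    unsigned maxUnicode : 30;
    bool isUnicodeValid : 1;
    bool isHeapAllocated : 1;
    unsigned maxAscii : 31;
    bool isAsciiValid : 1;
};

// Hypothetical variant: same widths, but every field declared 'unsigned',
// so the fields can share two 32-bit words.
struct UniformBits {
    unsigned maxUnicode : 30;
    unsigned isUnicodeValid : 1;
    unsigned isHeapAllocated : 1;
    unsigned maxAscii : 31;
    unsigned isAsciiValid : 1;
};

// Bytes lost to mixing 'bool' and 'unsigned' bit fields on this compiler.
std::size_t extraBytesFromMixedTypes()
{
    // Guard the unsigned subtraction; I assume mixing never shrinks the struct.
    assert(sizeof(MixedBits) >= sizeof(UniformBits));
    return sizeof(MixedBits) - sizeof(UniformBits);
}
```

Declaring all five fields with the same type would be one way to get the
struct back under its size limit without giving up bit fields, although it
does nothing for the Purify warnings.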

This UMR is followed by three related UMRs at other offsets:

    Reading 1 byte from 0x05a02f54 (1 byte at 0x05a02f54 uninitialized)
    Address 0x05a02f54 is 20 bytes into a 52 byte block at 0x05a02f40
    Address 0x05a02f54 points to a C++ new block in heap 0x044d0000
    Thread ID: 0x7d8

    Reading 4 bytes from 0x05a02f58 (4 bytes at 0x05a02f58 uninitialized)
    Address 0x05a02f58 is 24 bytes into a 52 byte block at 0x05a02f40
    Address 0x05a02f58 points to a C++ new block in heap 0x044d0000
    Thread ID: 0x7d8

    Reading 1 byte from 0x05a02f5c (1 byte at 0x05a02f5c uninitialized)
    Address 0x05a02f5c is 28 bytes into a 52 byte block at 0x05a02f40
    Address 0x05a02f5c points to a C++ new block in heap 0x044d0000
    Thread ID: 0x7d8

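We can conclude that these UMRs are for the other bit fields: presumably
the two 'bool' flags at offset 20, '_maxAscii' at offset 24, and
'_isAsciiValid' at offset 28.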

These are technically real UMRs, and reporting them is a limitation
of Purify: my web search confirms that UMRs caused by bit fields are
a known problem.  I didn't inspect the assembly code to confirm this,
but you can easily see why Purify reports these UMRs if you think
about how the compiler would generate code to initialize the bit
fields:

DeprecatedStringData::DeprecatedStringData(DeprecatedStringData &o)
    : refCount(1)
    , _length(o._length)
    , _unicode(o._unicode)
    , _ascii(o._ascii)
    , _maxUnicode(o._maxUnicode)
    , _isUnicodeValid(o._isUnicodeValid)
    , _isHeapAllocated(0)
    , _maxAscii(o._maxAscii)
    , _isAsciiValid(o._isAsciiValid)
{
[...]
}

Suppose bit field x is the 0th bit (least significant bit) of a word w
and bit field y is the 1st bit.  To initialize x to a and y to b, the
compiler would generate this pseudocode:

    unsigned w;  // uninitialized

    w = (w & ~0x1) | (a & 0x1);         // x(a): reads w, then sets bit 0
    w = (w & ~0x2) | ((b & 0x1) << 1);  // y(b): reads w, then sets bit 1

Since the other bits in the word w are unused, the compiler
doesn't need to initialize w to 0.  Unfortunately, Purify instruments
the binaries directly and operates at byte level, so it correctly
reports that there are UMRs of w.
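
To make this concrete, here is a small C++ sketch of the read-modify-write.
The function names are hypothetical, and this is only my guess at the shape
of the generated code, not actual compiler output.  The parameter w stands
in for the uninitialized word, so passing different garbage values shows
that the two fields come out the same either way.

```cpp
#include <cassert>

// Hypothetical stand-in for the code a compiler might emit for x(a), y(b).
// 'w' plays the role of the uninitialized word holding both bit fields.
unsigned initBitFields(unsigned w, bool a, bool b)
{
    w = (w & ~0x1u) | (a ? 0x1u : 0x0u);         // x(a): reads w, writes bit 0
    w = (w & ~0x2u) | ((b ? 0x1u : 0x0u) << 1);  // y(b): reads w, writes bit 1
    return w;
}

// Extract the two fields; the remaining bits are never looked at.
bool fieldX(unsigned w) { return (w & 0x1u) != 0; }
bool fieldY(unsigned w) { return (w & 0x2u) != 0; }

// Whatever garbage w starts with, the fields come out the same.
void checkGarbageIsHarmless()
{
    const unsigned fromZero = initBitFields(0x00000000u, true, false);
    const unsigned fromGarbage = initBitFields(0xDEADBEEFu, true, false);
    assert(fieldX(fromZero) == fieldX(fromGarbage));
    assert(fieldY(fromZero) == fieldY(fromGarbage));
}
```

The read of w on the first line is the one a byte-level tool has to flag,
even though the bits it reads are either overwritten or never used.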

There are so many UMRs that I am forced to ignore all UMRs.
This is bad.  I am wondering if you'd consider not using bit fields
so that these UMRs can be eliminated.  You can either just make
them regular fields, which take more space, or implement the bit
fields by hand.
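
As a rough sketch of the second option, the first group of bit fields
could be packed by hand into one word like this.  The class name
PackedUnicodeInfo and the constant names are made up for illustration;
this is not a patch against DeprecatedStringData.  The point is that the
constructor writes the whole word in a single store, so the uninitialized
word is never read.

```cpp
#include <cassert>

namespace {
// Hypothetical layout: 30-bit count in the low bits, two flags on top.
const unsigned kMaxUnicodeMask = (1u << 30) - 1;
const unsigned kUnicodeValidBit = 1u << 30;
const unsigned kHeapAllocatedBit = 1u << 31;
}

class PackedUnicodeInfo {
public:
    PackedUnicodeInfo(unsigned maxUnicode, bool isUnicodeValid, bool isHeapAllocated)
        // One full-word store; no read of the uninitialized word is needed.
        : m_bits((maxUnicode & kMaxUnicodeMask)
                 | (isUnicodeValid ? kUnicodeValidBit : 0u)
                 | (isHeapAllocated ? kHeapAllocatedBit : 0u))
    {
        assert(maxUnicode <= kMaxUnicodeMask);
    }

    unsigned maxUnicode() const { return m_bits & kMaxUnicodeMask; }
    bool isUnicodeValid() const { return (m_bits & kUnicodeValidBit) != 0; }
    bool isHeapAllocated() const { return (m_bits & kHeapAllocatedBit) != 0; }

    void setUnicodeValid(bool valid)
    {
        // Read-modify-write, but of a word that is already initialized.
        m_bits = valid ? (m_bits | kUnicodeValidBit) : (m_bits & ~kUnicodeValidBit);
    }

private:
    unsigned m_bits;
};
```

This keeps the space savings (the class is one 'unsigned' wide) at the
cost of accessor functions in place of plain member access.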
-- 
Anyang Ren
Open source developer