[webkit-help] dylib strangeness
Tim Prepscius
timprepscius at gmail.com
Mon May 31 08:24:46 PDT 2010
Never mind. There was, in fact, one stray pointer, which I introduced
in the last two weeks while making these plugins. Thank the tao for
libgmalloc, I guess.
On 5/31/10, Tim Prepscius <timprepscius at gmail.com> wrote:
> Well I wrote this e-mail to a friend, but perhaps one of you may read
> it and see the solution,
> any hints would be interesting:
>
>
>
> I'm having a debugging problem like I've never had before with this
> safari plugin.
>
>
> Maybe explaining to you will help me gain some sort of insight.
>
> Imagine this:
> You have a program. It is made of about 20 libraries. Some are other
> people's, some yours.
> The difference between the safari plugin, and an executable, is about
> 100 lines of startup code. Maybe 0.01% of the entire program.
>
> On both windows and osx, the executable functions flawlessly. 80% of
> the program has been around for at least 5 years, 18% is from the last 2
> years, 1-2% is from the last year. So in other words, things have been
> working for a long time.
>
>
> As a safari plugin, there is a point A and a point B which crash.
> Only in release build.
>
> A crashes only 10% of the time.
> I can increase the likelihood that A crashes by delaying the event by
> about 10 seconds, at which point the likelihood is maybe 30%. (but I'm
> weirded out by this and don't trust this observation) I can create
> this delay by just pausing the debugger, or pausing the server it is
> talking to.
> A crashes during a dynamic cast of an object.
> The same dynamic cast occurred a few moments before.
> If it makes it past point A, that same code/dynamic_cast will work
> perpetually.
> This same code is called millions of times. The object it is casting
> is allocated in the very beginning, and deallocated at the very end.
>
>
> When I turn on logging, the crash does not occur.
> When I turn off optimization, the crash does not occur. However, if I
> turn off optimization of only the module in which the crash occurs (or
> the call to dynamic cast), it still occurs.
>
> The code in which A crashes looks like this:
> void Dynamic::event (const Object::Event::Base *event)
> {
>     LogDebug (SnowCrash::Object::Dynamic::event, "object receiving event " << this);
>
>     std::list<Common::Object::Component *>::iterator i;
>
>     for (i=orderedComponents.begin(); i!=orderedComponents.end(); ++i)
>     {
>         Common::Object::Component *_component = *i;
>         LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component);
>         LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component->getComponentID());
>
>         Object::Component *component = CheckCastPtr(Object::Component, _component);
>         if (component)
>         {
>             component->event (event);
>         }
>     }
> }
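
For anyone reading this in the archive, the cast at point A can be pictured with the minimal sketch below. It assumes CheckCastPtr is a thin macro over dynamic_cast (its real definition is not shown in this thread), and BaseComponent, EventComponent, OtherComponent and dispatch are hypothetical stand-ins. dynamic_cast reads the object's vtable pointer to find its type information, so a healthy object yields either the cast pointer or null, while an object whose leading bytes have been overwritten can crash inside the cast itself.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the component classes in the thread.
struct BaseComponent {
    virtual ~BaseComponent() {}
};

struct EventComponent : BaseComponent {
    int eventsSeen;
    EventComponent() : eventsSeen(0) {}
    void event() { ++eventsSeen; }
};

struct OtherComponent : BaseComponent {};

// Assumed shape of the macro; the real CheckCastPtr is not shown in the thread.
#define CheckCastPtr(Type, ptr) dynamic_cast<Type *>(ptr)

// Delivers one event to every component that is an EventComponent
// and returns how many components received it.
inline int dispatch(BaseComponent *const *components, std::size_t count) {
    int delivered = 0;
    for (std::size_t i = 0; i < count; ++i) {
        EventComponent *component = CheckCastPtr(EventComponent, components[i]);
        if (component) {
            component->event();
            ++delivered;
        }
    }
    return delivered;
}
```

With valid objects the cast can only succeed or return null, which is why a crash at this line points at the object's memory rather than at the cast logic.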
>
>
>
> B has no pattern.
> It occurs when a piece of memory is deallocated twice, when a
> smart pointer decrements its count. But this is impossible. Unless
> either a copy constructor or a copy assignment operator is not being
> called. It could be a copy construction of the object which contains the
> smart pointer or of the smart pointer itself. Either seems very unlikely.
> Unfortunately this bug occurs so rarely it is hard to catch.
>
>
>
> --
>
> So at first my theory was, well, let's see what is happening.
> But after stepping through over and over, I can't see anything wrong
> with the object it is trying to cast. Obviously something is.
>
>
> So then I thought, well, perhaps this is just a messed up build. So I
> rebuilt everything. This sometimes happens to me on win32 if I link
> to a class whose virtual methods I've changed without recompiling
> the modules depending on it.
>
>
> So then I thought.. Well given that the executables operate fine.
> Maybe there is some sort of bug in static initializations.
> But they *seem* to be occurring. At least some of them are.
>
>
> So then I thought, maybe there is some sort of discord between
> objective-c and c++, with memory management. And I investigated that for
> a while. However that would not explain the fact that it always crashes
> in the same place, if it crashes at all. It seems to me that enough
> people are mixing objective-c and c++ that this should not be a
> problem.
>
>
> So then I thought.. Ok, I think that that memory is being modified,
> either by safari or by my own threads (which function fine as an
> executable). And it is suspicious that this problem seems linked to
> time. So I wrote a memory watcher. I overrode new and delete, kept
> a set of memory, and did continuous CRCs on that memory, looking for
> when bits changed. [which it turns out is pretty interesting to watch
> anyway]
>
>
> However this new/delete overriding changed the timing of the program.
> And it stopped crashing.
> I tried to move the area which is watched only to a specific section,
> however it continues to not crash.
> But when I turn that memory watching off, it crashes again.
>
> Also, perhaps that memory watching causes more allocations, and
> perhaps that changes the overall structure of the allocations.
> Because a *single time*, this memory watcher/debugger crashed. Saying
> that it was watching NULL memory. Which was impossible.
> Cause basically I have this:
>
> new:
> lock memory-mutex
> make memory, make memory tag
> if either is NULL, return NULL
> else add it to the set of memory to watch.
> unlock memory-mutex
>
> delete:
> lock memory-mutex
> if the memory is tagged
> remove it from set and delete it
> else just delete it
> unlock memory-mutex
>
> test:
> lock memory-mutex
> evaluate crc's of memory compare with tags, has anything changed,
> print out a message
> unlock memory-mutex
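
The watcher outlined above can be sketched roughly as follows. This is not the code from the project: MemoryWatcher and its members are hypothetical names, blocks are registered explicitly instead of through overridden new and delete, std::mutex stands in for the boost+pthreads locking mentioned below, and an FNV-1a hash is used in place of a CRC to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical watcher; names and layout are illustrative only.
class MemoryWatcher {
public:
    // Start watching a block; null or empty blocks are ignored.
    void watch(const void *ptr, std::size_t size) {
        if (!ptr || size == 0) return;
        std::lock_guard<std::mutex> lock(mutex_);
        Tag tag = { size, checksum(ptr, size) };
        blocks_[ptr] = tag;
    }

    // Stop watching a block (call before freeing it).
    void unwatch(const void *ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        blocks_.erase(ptr);
    }

    // Returns the blocks whose contents changed since the last check.
    std::vector<const void *> changed() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<const void *> result;
        for (std::map<const void *, Tag>::iterator it = blocks_.begin();
             it != blocks_.end(); ++it) {
            std::uint32_t now = checksum(it->first, it->second.size);
            if (now != it->second.sum) {
                result.push_back(it->first);
                it->second.sum = now;  // re-arm so each change is reported once
            }
        }
        return result;
    }

private:
    struct Tag { std::size_t size; std::uint32_t sum; };

    // FNV-1a hash, used here in place of a CRC to keep the sketch short.
    static std::uint32_t checksum(const void *ptr, std::size_t size) {
        const unsigned char *bytes = static_cast<const unsigned char *>(ptr);
        std::uint32_t hash = 2166136261u;
        for (std::size_t i = 0; i < size; ++i) {
            hash ^= bytes[i];
            hash *= 16777619u;
        }
        return hash;
    }

    std::mutex mutex_;
    std::map<const void *, Tag> blocks_;
};
```

The scoped lock releases the mutex on every return path. Read literally, the "return NULL" step in the outline above sits between the lock and the unlock, so a hand-written lock/unlock version would need to unlock before that early return. Like the original watcher, this adds allocations and locking of its own, so it can change timing and heap layout.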
>
>
> This crash of the memory watcher really weirded me out, because it is
> nearly impossible, unless boost+pthreads has problems on osx, so it
> seems to me that some external process zeroed a segment of my memory.
>
> Which would explain both the crash on the smart pointer decrement and
> the dynamic_cast failure.
>
>
> So my current working theory is:
> 1. a pointer somewhere, is initialized incorrectly, but always the same
> way.
> 2. writing to it is zeroing out my memory.
> 3. this pointer may or may not be within my dylib/process space
>
>
> So my question to you is:
>
> What would your approach to solving this be? Cause my usual isn't
> working. Any magic bullets?
> I'm up to maybe 50 hours on this bug.
>
>
> -tim
>
>
>
> On 5/28/10, Tim Prepscius <timprepscius at gmail.com> wrote:
>> Greetings again,
>>
>> So I've been able to (perhaps) solve my opengl issues, by basically
>> switching to cocoa. I'm still using agl via the window ref of the cocoa
>> window. Seems to function, I wonder if it will fail with some update
>> of safari. On a side note, if anyone sees this post while
>> investigating opengl problems, don't bother with xulrunner on mac! It
>> will just be a waste of time. It took me a while to figure out that
>> npapi was in webkit as well.
>>
>>
>> But now I'm seeing some extreme strangeness in other areas.
>>
>>
>> So I have a Client application.
>> It is made up of about 20 libraries and a bit of connecting code.
>>
>> One version links as a windowed executable.
>> One version links as a plugin.
>> (depending on which bit of connecting code you use)
>> However the rest of the code for the application in both cases is
>> exactly the same. 99.999% of it.
>>
>>
>> The strangeness I'm seeing is this:
>> The application version functions without problem both debug and
>> release. (as it has done for quite a while).
>> The plugin version crashes. But only the optimized non debug build.
>>
>> And it crashes in weird ways that are reminiscent of out of sync
>> linking problems. For instance "dynamic_cast" is failing and causing
>> a crash in an area where that is nearly impossible. And that area of
>> code has existed without problem for 9 years.
>>
>> There seem to be initialization problems of variables. Or perhaps a
>> copy operator/constructor is not being called correctly.
>>
>>
>>
>> I've spent the last two days investigating what could be causing this.
>> It is a mystery, cause the normal application just hums along fine,
>> while the plugin crashes, not immediately, however in the first 5
>> seconds or so, as significant events occur.
>>
>> My leaning is to think there is a problem with gcc and optimized code
>> in dylibs, perhaps their static initializations are not being
>> completely performed? But I must think that the chances of this are
>> fairly small, as apple uses dylibs everywhere, so they would make sure
>> that these function correctly.
>>
>>
>> Has anyone else seen a situation where optimized code doesn't work
>> as a dylib, while as an executable it does? What was the workaround?
>>
>> Or, does anyone know of problems with mixing objective-c and c++ in a
>> dylib?
>>
>>
>>
>> As of now, I'm trying to isolate the module which causes the problem
>> in release build, and see if I can isolate the code segment, but it is
>> slow going, and I'm not sure whether this error will manifest
>> somewhere else.
>>
>> -tim
>>
>