[webkit-help] dylib strangeness

Tim Prepscius timprepscius at gmail.com
Mon May 31 08:24:46 PDT 2010


Never mind.  There was, in fact, one stray pointer, which I introduced
in the last two weeks while making these plugins.  Thank the tao for
libgmalloc, I guess.

On 5/31/10, Tim Prepscius <timprepscius at gmail.com> wrote:
> Well I wrote this e-mail to a friend, but perhaps one of you may read
> it and see the solution,
> any hints would be interesting:
>
>
>
> I'm having a debugging problem like I've never had before with this
> safari plugin.
>
>
> Maybe explaining to you will help me gain some sort of insight.
>
> Imagine this:
> You have a program.  It is made of about 20 libraries.  Some other
> people's, some yours.
> The difference between the safari plugin, and an executable, is about
> 100 lines of startup code.  Maybe 0.01% of the entire program.
>
> On both windows and osx, the executable functions flawlessly.  80% of
> the program has been around for 5 years at least,  18% is in the last 2
> years, 1-2% is in the last year.  So in other words, things have been
> working for a long time.
>
>
> As a safari plugin, there is a point A and a point B which crash.
> Only in release build.
>
> A crashes only 10% of the time.
> I can increase the likelihood that A crashes by delaying the event by
> about 10 seconds, at which point the likelihood is maybe 30%.  (But I'm
> weirded out by this and don't trust this observation.)  I can create
> this delay by just pausing the debugger, or pausing the server
> it is talking to.
> A crashes during a dynamic cast of an object.
> The same dynamic cast occurred a few moments before.
> If it makes it past point A, that same code/dynamic_cast will work
> perpetually.
> This same code is called millions of times.  The object it is casting
> is allocated in the very beginning, and deallocated at the very end.
>
>
> When I turn on logging, the crash does not occur.
> When I turn off optimization, the crash does not occur.  However if I
> turn off optimization of only the module in which the crash occurs (or
> the call to dynamic cast), it still occurs.
>
> The code in which A crashes looks like this:
> void Dynamic::event (const Object::Event::Base *event)
> {
> 	LogDebug (SnowCrash::Object::Dynamic::event, "object receiving event " << this);
>
> 	std::list<Common::Object::Component *>::iterator i;
>
> 	for (i=orderedComponents.begin(); i!=orderedComponents.end(); ++i)
> 	{
> 		Common::Object::Component *_component = *i;
> 		LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component);
> 		LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component->getComponentID());
>
> 		Object::Component *component = CheckCastPtr(Object::Component, _component);
> 		if (component)
> 		{
> 			component->event (event);
> 		}
> 	}
> }
>
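For readers following along, here is a minimal sketch of the same dispatch pattern.  CheckCastPtr is the author's own macro and its definition is not shown, so checkedCast, BaseComponent, EventComponent and PassiveComponent below are hypothetical stand-ins, not the real classes.  On typical implementations dynamic_cast reads type information through the object's vtable pointer, so a cast like this can crash when the object's memory has been overwritten, even though the cast code itself is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for Common::Object::Component.
struct BaseComponent {
    virtual ~BaseComponent() {}
};

// Hypothetical stand-in for Object::Component (the type that accepts events).
struct EventComponent : BaseComponent {
    int eventsSeen = 0;
    void event() { ++eventsSeen; }
};

// A component that is not interested in events.
struct PassiveComponent : BaseComponent {};

// Assumed behaviour of CheckCastPtr: yields a null pointer when the input is
// null or when the object's dynamic type does not match.
template <typename To, typename From>
To *checkedCast(From *from)
{
    return dynamic_cast<To *>(from);
}

// Delivers one event to every EventComponent in the list and returns how
// many components received it.
inline std::size_t dispatchEvent(const std::list<BaseComponent *> &components)
{
    std::size_t delivered = 0;
    std::list<BaseComponent *>::const_iterator i;
    for (i = components.begin(); i != components.end(); ++i)
    {
        assert(*i != nullptr);  // the list is expected to hold live objects
        if (EventComponent *component = checkedCast<EventComponent>(*i))
        {
            component->event();
            ++delivered;
        }
    }
    return delivered;
}
```

With one EventComponent and one PassiveComponent in the list, dispatchEvent delivers to exactly one of them and skips the other without error.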
>
>
> B has no pattern.
> It occurs when a piece of memory is deallocated twice, when a
> smart pointer decrements its count.  But this is impossible, unless
> either a copy constructor or a copy assignment operator is not being
> called.  It could be the copy constructor of the object which contains
> the smart pointer or of the smart pointer itself.  Either seems very
> unlikely.  Unfortunately this bug occurs so rarely it is hard to catch.
>
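To make the reasoning above concrete, here is a minimal sketch of a reference-counted pointer.  CountedPtr and Tracked are hypothetical names, not the author's smart pointer, and the sketch is not thread-safe.  It shows why a double delete should not happen as long as every copy goes through the copy constructor or copy assignment operator: each copy bumps the shared count, and the payload is deleted only when the count reaches zero.  A copy that bypasses those two functions, or a stray write that corrupts the count, would break that guarantee.

```cpp
#include <cassert>

// Hypothetical payload that records how many times it has been destroyed.
struct Tracked {
    explicit Tracked(int *destroyed) : destroyed_(destroyed) {}
    ~Tracked() { ++*destroyed_; }
    int *destroyed_;
};

// Hypothetical minimal reference-counted pointer (single-threaded only).
template <typename T>
class CountedPtr {
public:
    explicit CountedPtr(T *p) : ptr_(p), count_(new long(1)) {}

    // Copying shares the payload and increments the shared count.
    CountedPtr(const CountedPtr &other) : ptr_(other.ptr_), count_(other.count_)
    {
        ++*count_;
    }

    CountedPtr &operator=(const CountedPtr &other)
    {
        if (this != &other)
        {
            ++*other.count_;  // take the new reference before dropping the old one
            release();
            ptr_ = other.ptr_;
            count_ = other.count_;
        }
        return *this;
    }

    ~CountedPtr() { release(); }

    long useCount() const { return *count_; }
    T *get() const { return ptr_; }

private:
    // Drops one reference; deletes the payload when the last one goes away.
    void release()
    {
        assert(*count_ > 0);
        if (--*count_ == 0)
        {
            delete ptr_;
            delete count_;
        }
    }

    T *ptr_;
    long *count_;
};
```

Copying a CountedPtr twice and letting all three handles go out of scope destroys the payload exactly once.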
>
>
> --
>
> So at first my theory was, well, let's see what is happening.
> But after stepping through over and over, I can't see anything wrong
> with the object it is trying to cast.  Obviously there is.
>
>
> So then I thought, well, perhaps this is just a messed up build.  So I
> rebuilt everything.  This occurs sometimes on win32 with me if I link
> to a class of which I've changed the virtual methods, but not
> recompiled modules depending on it.
>
>
> So then I thought..  Well given that the executables operate fine.
> Maybe there is some sort of bug in static initializations.
> But they *seem* to be occurring.  At least some of them are.
>
>
> So then I thought, maybe there is some sort of discord between
> Objective-C and C++, with memory management.  And I investigated that for
> a while.  However that would not explain the fact it always crashes in
> the same place.  If it crashes at all.  It seems to me, that enough
> people are mixing objective-c and c++ so that this should not be a
> problem.
>
>
> So then I thought..  Ok, I think that that memory is being modified,
> either by safari.  Or by my own threads (which function fine as an
> executable).  And it is suspicious that this problem seems linked to
> time.  So I wrote a memory watcher.  I overrode new and delete, kept
> a set of memory, and did continuous CRCs on that memory, looking for
> when bits changed.  [which it turns out is pretty interesting to watch
> anyway]
>
>
> However this new/delete overriding changed the timing of the program.
> And it stopped crashing.
> I tried to move the area which is watched only to a specific section,
> however it continues to not crash.
> But when I turn that memory watching off, it crashes again.
>
> Also, perhaps that memory watching causes more allocations, and
> perhaps that changes the overall structure of the allocations.
> Because a *single time*, this memory watcher/debugger crashed.  Saying
> that it was watching NULL memory.  Which was impossible.
> Cause basically I have this:
>
> new:
> lock memory-mutex
>   make memory, make memory tag
>   if either is NULL, return NULL
>   else add it to the set of memory to watch.
> unlock memory-mutex
>
> delete:
> lock memory-mutex
>   if the memory is tagged
>   remove it from set and delete it
>   else just delete it
> unlock memory-mutex
>
> test:
> lock memory-mutex
>   evaluate crc's of memory compare with tags, has anything changed,
> print out a message
> unlock memory-mutex
>
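The new/delete/test outline above can be sketched roughly as follows.  This is not the author's watcher: MemoryWatcher and its method names are hypothetical, regions are registered explicitly instead of through replaced operator new and operator delete, std::mutex stands in for the boost+pthreads lock, and an FNV-1a hash stands in for the CRC.  As the email observes, any watcher like this adds work and allocations of its own, so it can change the timing and heap layout enough to hide the bug it is looking for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical watcher: remembers a checksum ("tag") for each registered
// region and reports regions whose bytes changed since the last test.
class MemoryWatcher {
public:
    // Start watching [p, p + size) and record its current checksum.
    void watch(const void *p, std::size_t size)
    {
        assert(p != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        Region region = { size, checksum(p, size) };
        regions_[p] = region;
    }

    // Stop watching a region; returns false if it was not being watched.
    bool unwatch(const void *p)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return regions_.erase(p) == 1;
    }

    // Re-checksums every watched region, refreshes the tags of regions that
    // changed, and returns how many changed.
    std::size_t test()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t changed = 0;
        for (auto &entry : regions_)
        {
            std::uint32_t now = checksum(entry.first, entry.second.size);
            if (now != entry.second.tag)
            {
                ++changed;
                entry.second.tag = now;
            }
        }
        return changed;
    }

private:
    struct Region {
        std::size_t size;
        std::uint32_t tag;
    };

    // 32-bit FNV-1a hash, used here in place of a CRC.
    static std::uint32_t checksum(const void *p, std::size_t size)
    {
        const unsigned char *bytes = static_cast<const unsigned char *>(p);
        std::uint32_t hash = 2166136261u;
        for (std::size_t i = 0; i < size; ++i)
        {
            hash ^= bytes[i];
            hash *= 16777619u;
        }
        return hash;
    }

    std::mutex mutex_;
    std::map<const void *, Region> regions_;
};
```

Calling test() periodically from a background thread would approximate the continuous checking described in the email; the sketch leaves that scheduling out.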
>
> This crash of the memory watcher really weirded me out, cause it is
> nearly impossible, unless boost+pthreads has problems on osx, so it
> seems to me that some external process zeroed a segment of my memory.
>
> Which would explain the crash on the smart pointer decrement, and also
> the dynamic_cast failure.
>
>
> So my current working theory is:
> 1.  a pointer somewhere, is initialized incorrectly, but always the same
> way.
> 2.  writing to it is zeroing out my memory.
> 3.  this pointer may or may not be within my dylib/process space
>
>
> So my question to you is:
>
> What would your approach to solving this be?  Cause my usual isn't
> working.  Any magic bullets?
> I'm up to maybe 50 hours on this bug.
>
>
> -tim
>
>
>
> On 5/28/10, Tim Prepscius <timprepscius at gmail.com> wrote:
>> Greetings again,
>>
>> So I've been able to (perhaps) solve my opengl issues, by switching
>> to cocoa basically.  I'm still using agl via the window ref of the cocoa
>> window.  Seems to function, I wonder if it will fail with some update
>> of safari.  On a side note, if anyone sees this post while
>> investigating opengl problems, don't bother with xulrunner on mac!  It
>> will just be a waste of time.   It took me a while to figure out that
>> npapi was in webkit as well.
>>
>>
>> But now I'm seeing some extreme strangeness in other areas.
>>
>>
>> So I have a Client application.
>> It is made up of about 20 libraries and a bit of connecting code.
>>
>> One version links as a windowed executable.
>> One version links as a plugin.
>> (depending on which bit of connecting code you use)
>> However the rest of the code for the application in both cases is
>> exactly the same.  99.999% of it.
>>
>>
>> The strangeness I'm seeing is this:
>> The application version functions without problem both debug and
>> release.  (as it has done for quite a while).
>> The plugin version crashes.  But only the optimized non debug build.
>>
>> And it crashes in weird ways that are reminiscent of out of sync
>> linking problems.  For instance "dynamic_cast" is failing and causing
>> a crash in an area where that should be nearly impossible.  And that area of code has
>> existed without problem for 9 years.
>>
>> There seem to be initialization problems of variables.  Or perhaps a
>> copy operator/constructor is not being called correctly.
>>
>>
>>
>> I've spent the last two days investigating what could be causing this.
>>  It is a mystery, cause the normal application just hums along fine,
>> while the plugin crashes, not immediately, however in the first 5
>> seconds or so, as significant events occur.
>>
>> My leaning is to think there is a problem with gcc and optimized code
>> in dylibs, perhaps their static initializations are not being
>> completely performed?  But I must think that the chances of this are
>> fairly small, as apple uses dylibs everywhere, so they would make sure
>> that these function correctly.
>>
>>
>> Has anyone else seen a situation where optimized code doesn't perform
>> as a dylib, while as an executable it does?  What was the work around?
>>
>> Or, does anyone know of problems with mixing objective-c and c++ in a
>> dylib?
>>
>>
>>
>> As of now, I'm trying to isolate the module which causes the problem
>> in release build, and see if I can isolate the code segment, but it is
>> slow going, and I'm not sure whether this error will manifest
>> somewhere else.
>>
>> -tim
>>
>

