threading crash with std::vector


adamjmac
June 18th, 2009, 01:23 AM
Hi,

I'm using a function to create, fill, and return a vector which runs in multiple threads (with pthreads) simultaneously. When I run the program in a single thread, it works. In multiple threads on one processor, it works. Only when I use multiple processors does it crash, consistently on each thread. After spending the day troubleshooting I'm about to blame the vector template and make my own dynamic array, but hopefully someone can pick up on something I missed...

AgentBase and SystemBase are classes, and xy is a typedef for a two-float array (typedef float xy[2];). Summarized, the code is:

typedef vector<AgentBase *> AgentList; // dynamic list of agents

AgentBase **agent; // within SystemBase

------ in the thread -------

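AgentList SystemBase::GetAgentsInRange(xy center, float range) {
    // Assumed definition: rangesq is not shown in the summary.
    // Squared distances are compared so no sqrt is needed.
    float rangesq = range * range;

    AgentList list;

    for (int i = 0; i < n_agent; i++) {
        AgentBase *a = agent[i];
        if (xy_distsq(center, a->s) <= rangesq) {
            list.push_back(a);
        }
    }
    return list;
}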

Called with (also inside thread):

AgentList list = GetAgentsInRange(...);


When I surround the function call with a mutex, it runs, but with a mutex around the code inside the function it still crashes. The crash happens in push_back, and each thread segfaults on its first call to the function. Also, if I put any printf calls in the function, it works. Yeah, one of those bugs.

I also tried replacing vector with a basic array, and it works fine. This is my fallback, but I'd really like to use vector so I don't have to worry about returning the size as well as the array. But this is why I'm starting to think vector is not actually thread-safe, even though it should be - at least for what I'm doing.
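For reference, the pattern described here (shared data that is only read, plus a vector that is local to each thread and returned by value) is expected to be safe without a mutex on a correctly built standard library. Below is a minimal sketch of it, assuming C++11 std::thread in place of pthreads; Agent, dist_sq, agents_in_range and query_from_threads are hypothetical stand-ins, not the poster's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for AgentBase: only a 2D position.
struct Agent {
    float s[2];
};

typedef std::vector<const Agent*> AgentList;

// Squared distance between two 2D points.
inline float dist_sq(const float a[2], const float b[2]) {
    const float dx = a[0] - b[0];
    const float dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Builds a vector that is local to the calling thread.
// The shared agent container is only read, never modified.
AgentList agents_in_range(const std::vector<Agent>& agents,
                          const float center[2], float range) {
    assert(range >= 0.0f);
    const float range_sq = range * range;
    AgentList found;
    for (std::size_t i = 0; i < agents.size(); ++i) {
        if (dist_sq(center, agents[i].s) <= range_sq) {
            found.push_back(&agents[i]);
        }
    }
    return found;
}

// Runs the same query from n_threads threads at once and
// returns the number of hits each thread saw.
std::vector<std::size_t> query_from_threads(const std::vector<Agent>& agents,
                                            const float center[2], float range,
                                            unsigned n_threads) {
    // One result slot per thread, sized before any thread starts,
    // so no two threads ever write to the same element.
    std::vector<std::size_t> counts(n_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&agents, &counts, center, range, t] {
            counts[t] = agents_in_range(agents, center, range).size();
        });
    }
    for (std::size_t t = 0; t < workers.size(); ++t) {
        workers[t].join();
    }
    return counts;
}
```

Each thread only touches its own vector and its own slot in counts, so the only shared state left is the allocator inside the library, which is the part a broken toolchain build could get wrong.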

The trace from gdb is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 2132.0x338]
0x75f3a09e in ?? ()
(gdb) backtrace
#0 0x75f3a09e in ?? ()
#1 0x00419e0c in AgentBase** std::__copy_trivial<AgentBase*>(AgentBase* const*, AgentBase* const*, AgentBase**) (__first=0xa133a8, __last=0xa133ac, __result=0x40f100) at C:/Dev-Cpp/include/c++/3.3.1/bits/stl_algobase.h:252
#2 0x00419d5f in AgentBase** std::__copy_aux2<AgentBase*>(AgentBase**, AgentBase**, AgentBase**, __true_type) (__first=0xa133a8, __last=0xa133ac, __result=0x40f100) at C:/Dev-Cpp/include/c++/3.3.1/bits/stl_algobase.h:272
#3 0x00419d26 in __gnu_cxx::__normal_iterator<AgentBase**, std::vector<AgentBase*, std::allocator<AgentBase*> > > std::__copy_ni2<AgentBase**, __gnu_cxx::__normal_iterator<AgentBase**, std::vector<AgentBase*, std::allocator<AgentBase*> > >
...
#9 0x00419965 in std::vector<AgentBase*, std::allocator<AgentBase*> >::push_back(AgentBase* const&) (this=0xeffeb0, __x=@0xeffdd0) at C:/Dev-Cpp/include/c++/3.3.1/bits/stl_vector.h:603
#10 0x00404375 in SystemBase::GetAgentsInRange(float*, float) (this=0x3511d0, center=0x35f398, range=2) at abmsim.cpp:840
#11 0x00404e12 in SystemBase::Physics_Repulsion_Thread(int) (this=0x3511d0, id=1) at abmsim.cpp:1163
#12 0x00404946 in SystemBase::Physics1_Thread(int) (this=0x3511d0, id=1) at abmsim.cpp:1103
#13 0x004015aa in cpu_func_physics1(void*) (arg=0x35135c) at abmsim.cpp:158
#14 0x69ec12fa in ptw32_threadStart@4 ()
...

It also sometimes crashes in __default_alloc_template<true, 0> or something like that. And if I somehow get it to run past this point, it crashes on a different push_back on a different vector in the same thread.

Please help. I recently switched over from C to C++ so I'm not used to templates. I can provide more info if it's vague at this point.

Codeplug
June 18th, 2009, 09:41 AM
Are you protecting all access to shared memory/variables by multiple threads with a mutex? I see a read from "agent[]", as well as accesses to the contained objects. Are those read-only objects?

It's possible that you have an STL implementation or build issue - but such suspicions are rarely true.

gg

adamjmac
June 18th, 2009, 10:23 AM
I'm not using mutexes to access globals that are not modified by another thread. I've stripped the code down to this single loop in two threads and it still crashes every time. It works on Linux, so it may be a build issue... I'm using the compiler and libraries that come with Dev-C++ 5.

Is there something funny about vectors messing with the objects that the pointers they contain point to? I found out it is crashing on all threads on the FIRST iteration of this loop, so it wouldn't be a resizing issue. Why would the vector go through 10 function calls of allocator stuff if it's the first element being added? Unless it starts empty... I'll try an initial capacity.
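The "unless it starts empty" guess can be checked directly. A default-constructed vector typically owns no storage, so the very first push_back has to allocate, which would explain allocator frames appearing on the first iteration. The sketch below is only an illustration (count_capacity_changes is a hypothetical helper): it counts how often capacity() changes while elements are pushed, with and without a reserve call up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often capacity() changes while pushing n elements.
// If reserve_first is true, room for all n elements is requested
// before the first push_back.
std::size_t count_capacity_changes(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t changes = 0;
    std::size_t last = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last) {
            // The vector had to grow, which means a new allocation.
            ++changes;
            last = v.capacity();
        }
    }
    return changes;
}
```

Without reserve the count is at least one for any non-trivial n. With reserve(n) it is zero, because push_back may not reallocate while size() stays within capacity(); the single allocation simply moves into the reserve call.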

adamjmac
June 18th, 2009, 10:54 AM
Okay, please add some insight on this discovery. On Windows, a new vector has 0 capacity. If I call .reserve(32), it crashes. If I call .reserve(33), it works. Any number >= 33 works; anything less crashes.
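One possible explanation for the 32/33 boundary, offered as an assumption and not confirmed anywhere in the thread: the default allocator in GCC 3.3 descends from the SGI pool allocator, which is understood to serve requests of up to 128 bytes from shared free lists and to pass anything larger straight to operator new. The addresses in the backtrace suggest a 32-bit build, so a pointer would be 4 bytes, and 32 pointers would be exactly 128 bytes while 33 would be 132. The constants and function names below are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values, not taken from the thread:
const std::size_t kPointerBytes = 4;    // pointer size on a 32-bit build
const std::size_t kMaxPoolBytes = 128;  // small-object limit of an SGI-style pool allocator

// Bytes a vector of pointers would request for n_pointers elements.
std::size_t request_bytes(std::size_t n_pointers) {
    return n_pointers * kPointerBytes;
}

// True if a request of this size would go through the shared small-object pool.
bool served_from_pool(std::size_t bytes) {
    return bytes <= kMaxPoolBytes;
}
```

Under these assumptions reserve(32) asks for 128 bytes and still lands in the pool, while reserve(33) asks for 132 bytes and bypasses it. If that is what happened, the larger reserve only sidestepped the pool's shared free lists without fixing them.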

adamjmac
June 18th, 2009, 11:05 AM
Fortunately, updating mingw fixed the crash. Thanks anyway...

Codeplug
June 18th, 2009, 11:18 AM
Cool.

Dev-C++ is fairly old. Code::Blocks (http://www.codeblocks.org/) is a good alternative. You can also get the latest (unofficial) builds of MinGW here: http://www.tdragon.net/recentgcc/

gg