Click to See Complete Forum and Search --> : [RESOLVED] cstdatomic (c++0x std::atomic) / g++ 4.4


lundyp
October 5th, 2009, 08:59 AM
Hello,

I'm trying to use cstdatomic (std::atomic in the upcoming c++0x standard) in g++ 4.4, but it doesn't seems to work as expected and I would like to known if I missed something...

My goal in the code below is just to test differents ways of concurrency (mutex vs atomic) : 4 threads increments counters in 3 ways.

1 - with std::mutex : works perfectly but slightly slow (lock / unlock)
2 - with g++ atomics "builtins" : works perfectly
3 - with std::atomic : doesn't works.


here is the code :

#include <thread>
#include <mutex>
#include <cstdatomic>

int cnt_mutex = 0;
int cnt_builtins = 0;
std::atomic<int> cnt_stdatomic = {0};
std::mutex smutex;

/* thread code */
void thrcode(int thrid)
{
/* counter local to thread */
int cnt_local;
for (cnt_local = 0; cnt_local < 10000000; cnt_local++)
{
/* c++0x fetchadd -> FAIL */
cnt_stdatomic.fetch_add(1);

/* g++ builtin fetchadd -> OK */
__sync_fetch_and_add(&cnt_builtins,1);

/* scoped std::mutex : OK */
{
std::unique_lock<std::mutex> lockit(smutex);
cnt_mutex++;
};
};
printf("thread(%d) : cnt_local %d\n",thrid,cnt_local);
};


int main() {
/* thread init */
std::thread m1(thrcode,1),m2(thrcode,2),m3(thrcode,3),m4(thrcode,4);

/* wait all thread */
m1.join(); m2.join(); m3.join(); m4.join();

/* display counters */
printf("end : cnt_mutex %d, cnt_builtins %d, cnt_stdatomic %d\n",
cnt_mutex,cnt_builtins,(int)cnt_stdatomic);

};


compile with : g++ -std=c++0x mt.cpp -O3 -pthread && date && ./a.out && date

it give :
dimanche 4 octobre 2009, 19:10:28 (UTC+0200) (french time...)
thread(1) : cnt_local 10000000
thread(3) : cnt_local 10000000
thread(4) : cnt_local 10000000
thread(2) : cnt_local 10000000
end : cnt_mutex 40000000, cnt_builtins 40000000, cnt_stdatomic 39990400
dimanche 4 octobre 2009, 19:10:46 (UTC+0200)

cnt_stdatomic counter is incorrect... Someone could tell me where i'm wrong, did i miss something in code ?

(I'm using g++ version 4.4.1 Ubuntu 4.4.1-4ubuntu6)

Thanks !

Codeplug
October 5th, 2009, 11:55 AM
Please use code tags when posting code: http://www.codeguru.com/forum/misc.php?do=bbcode#code

Do you get the same result if you replace "-O3" with "-g"?

If so, I'd like to see the generated assembly with:

g++ -std=c++0x -g -c mt.cpp
objdump -d -S mt.o
Keep in mind that "-std=c++0x" is "bleeding edge" stuff. So don't be surprised if you see blood :)

gg

Codeplug
October 5th, 2009, 12:36 PM
Wait a sec....

>> std::atomic<int> cnt_stdatomic = {0};
That should not compile since std::atomic<> is not an aggregate.

That should be: std::atomic<int> cnt_stdatomic(0);

See if that helps.

gg

Ajay Vijay
October 5th, 2009, 01:25 PM
/* c++0x fetchadd -> FAIL */
cnt_stdatomic.fetch_add(1);Have you checked the value just after you call it first time?

lundyp
October 5th, 2009, 03:01 PM
thanks Codeplug & Ajay Vijay (sorry for code tags i'm a newbie on forums, and i'm frenchie so english is not natural for me :-))

Codeplug

I've compiled with options -g -c only, but the result is the same. (even if it failed, I have joined assembly code -> I can see lock(f0) assembly instruction into expanded _sync_fetch_and_add, but my assembly is too ... older to follow assembly's stream).

for std::atomic<int> cnt_stdatomic = {0} it compile fine. My understanding is it's part of the new initializer list/uniform initialization enhancement in c++0x (now, not only restricted to aggregate/POD type). I've tested with cnt_stdatomic(0), no change.


Ajay Vijay

I have not checked it. My first attempt was to do a "atomic_increment()" but g++ doesn't not define it actually, so I rely on atomic_fetch_add which do the trick (i think).
By checking value, do you mean the one before increment in fetch_add or the one before the call to fetch_add, because in this test checking accuracy of the return value will be hard : which value should be expected on return of fetch_add since preempt may occur just before it in one thread by another thread


// pseudo-code
// in thread_1
oldval = (int)cnt_stdatomic;
// --> thread_2 preempt just here and do increment, thread_3 do the same, etc...
// --> now thread_1 get back
newval = (int)cnt_stdatomic.fetch_add(1);
// which value contain newval ?! oldval +1 ? +2 ?


and adding a scoped mutex around this codeblock would have influence on cnt_stdatomic.fetch_add (in fact, here it will implicitly works...), i'm right ?

Codeplug
October 5th, 2009, 06:03 PM
>> My understanding is it's part of the new initializer list/uniform initialization enhancement in c++0x
Ah, yes. That explains why it compiled :)
http://en.wikipedia.org/wiki/C%2B%2B0x#Initializer_lists

So I downloaded the "Ubuntu 9.10 alpha 6" VMWare image from here: http://bagside.com/bagvapp/
This came with "g++ (Ubuntu 4.4.1-4ubuntu1) 4.4.1".

It compiled, and generated identical assembly to what you posted. I ran it - and it worked. The virtual machine was setup for only one processor. So I reconfigured the VM for 2 processors and ran it again:
end : cnt_mutex 40000000, cnt_builtins 40000000, cnt_stdatomic 39714790Oops.

I got a better assembly listing using the following: g++ -c -g -Wa,-alh,-L -std=c++0x test.cpp > test.lstThis is a little more readable. You can see what fetch_add() is doing in /usr/include/c++/4.4/bits/atomic_0.h:371

It just seems to be broken - for i686, SMP at least...

gg

lundyp
October 6th, 2009, 04:00 AM
(you shown me I have to remove dust on my databooks :-))

Effectively, i'm using Core2duo with kernel 2.6.31 SMP, and I did not think to do test on single cpu.

I've take a look on fetch_add() you pointed out :
macro _ATOMIC_MODIFY_ is expanded with a call to __atomic_flag_wait_explicit(), which internally calls __atomic_flag_test_and_set_explicit which use a volatile atomic_flag.

According to me and as far i can read on various articles / c++0x drafts : atomic_flag "have to be" lock-free, and maybe it's the point of failure within actually SMP code implementation (at least, in my architecture)

You said it before : since c++0x in not fully standardized, i can see blood... here it is, i've just have to wait

Thanks for all Codeplug !

PS : How do i close this post ? (i'm really newbie...)

lundyp
October 6th, 2009, 07:00 AM
forget my PS, i've find close....

Codeplug
October 6th, 2009, 02:25 PM
I just downloaded a Fedora 11 VM and the code worked fine - it was running gcc 4.4.1 with libstdc++ 20090725. (Ubuntu VM is running libstdc++ 20091003.)

In fact, stepping into fetch_add() showed __sync_fetch_and_add() being called directly, from atomic_2.h.

So I did some digging on how that stuff works - using atomic_0.h vs atomic_2.h etc... It's based on some macro's from bits\c++config.h. To see what they were, I added this code:

#ifdef _GLIBCXX_ATOMIC_BUILTINS_1
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_1 defined")
#else
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_1 NOT defined")
#endif

#ifdef _GLIBCXX_ATOMIC_BUILTINS_2
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_2 defined")
#else
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_2 NOT defined")
#endif

#ifdef _GLIBCXX_ATOMIC_BUILTINS_4
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_4 defined")
#else
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_4 NOT defined")
#endif

#ifdef _GLIBCXX_ATOMIC_BUILTINS_8
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_8 defined")
#else
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_8 NOT defined")
#endif
On Ubuntu I get:
_GLIBCXX_ATOMIC_BUILTINS_1 defined
_GLIBCXX_ATOMIC_BUILTINS_2 defined
_GLIBCXX_ATOMIC_BUILTINS_4 defined
_GLIBCXX_ATOMIC_BUILTINS_8 NOT defined
On Fedora _8 IS defined! So I built on Ubuntu with -D_GLIBCXX_ATOMIC_BUILTINS_8 and it too started using atomic_2.h (where fetch_add() does a simple __sync_fetch_and_add()).

I don't know if this is a bug with libstdc++ or with Ubuntu's gcc/libstdc++ packaging. Either way, there's definitely a bug with using atomic_0.h.

gg

lundyp
October 7th, 2009, 03:48 PM
Hello codeplug,

_GLIBCXX_ATOMIC_BUILTINS_8 seems to be defined (in c++config.h) only if 64bits native operations are supported (for further __sync* operations, and in particular atomic_2.h. Did you run Fedora VM image x86_64 ?), but Ubuntu installed on my desktop is :

Linux xxx 2.6.31-10-generic #35-Ubuntu SMP Tue Sep 22 17:33:42 UTC 2009 i686 GNU/Linux

so all compiled code is 32bits. Since _GLIBCXX_ATOMIC_BUILTINS_8 is not defined, I'm "defaulted" to compile with atomic_0.h because there no implementation of __sync_fetch_and_add for 64bits operations (see #ifdef... in c++config.h), and here is the result :


compiling this code works fine (32bits int):

int cnt_builtins = 0;

int main () {
__sync_fetch_and_add(&cnt_builtins,1);
};


but compiling this one (64bits long long int) :

long long cnt_builtins = 0;

int main () {
__sync_fetch_and_add(&cnt_builtins,1);
};

give error at link-time :

/tmp/ccUs0NaZ.o: In function `main':
mt2.cpp:(.text+0x21): undefined reference to `__sync_fetch_and_add_8'
collect2: ld returned 1 exit status


so I checked how 64bits are handled in assembly with this :

long long cnt_builtins = 0;

int main () {
cnt_builtins++;
};


and the resulting assembly is :

4:mt2.cpp **** cnt_builtins++;
33 .loc 1 5 0
34 0003 A1000000 movl cnt_builtins, %eax
34 00
35 0008 8B150400 movl cnt_builtins+4, %edx
35 0000
36 000e 83C001 addl $1, %eax
37 0011 83D200 adcl $0, %edx
38 0014 A3000000 movl %eax, cnt_builtins
38 00
39 0019 89150400 movl %edx, cnt_builtins+4
39 0000
40 001f B8000000 movl $0, %eax
40 00


64bits integer operations are done in 2 steps, so I think that __sync_fetch_and_add_8 (64bits) cannot be implemented since it would required 2 assembly locks and 2 consecutives locks may be affected "in the middle" by preemption so 64bits operation would be corrupted -> unsecured (this is my diagnosis... I really don't known if it's works that way, especially concerning kernel preemption !!)

But the above tests (case of *_8/64bits) doesn't explain why atomic operations seems to have bug with atomic_0.h. The only point which is confusing me is the heavy use of volatile in atomic_0.h (force compiler to not to cache and assume possible "external" change)

article on multhreading / volatile / mutex : http://www.ddj.com/cpp/184403766

but for me, volatile means "force re-read before any operation on content" and not "atomically lock content for operation", I have to read more about to well understand and check output of assembly to see if atomic_0.h really generate assembly locks...

see you later

Codeplug
October 7th, 2009, 06:00 PM
I think you are confusing/combining "architecture" with "ELF bit-depth". Both VM's are 32bit kernels, however, the default architecture for gcc on each is different.

Here's some info on each VM:

$ uname -a
Linux Fedora11 2.6.30.8-64.fc11.i586 #1 SMP Fri Sep 25 04:30:19 EDT 2009 i686 i686 i386 GNU/Linux
$ gcc -v
Using built-in specs.
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.4.1 20090925 (Red Hat 4.4.1-17) (GCC)
$ file -L /usr/bin/gcc
/usr/bin/gcc: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped


$ uname -a
Linux Ubuntu910a6 2.6.31-10-generic #34-Ubuntu SMP Wed Sep 16 00:23:19 UTC 2009 i686 GNU/Linux
$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.1-4ubuntu6' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu6)


#include <cstdatomic>

#ifdef _GLIBCXX_ATOMIC_BUILTINS_8
# pragma message("defined")
#else
# pragma message("NOT defined")
#endif

int main ()
{
long long cnt_builtins = 0;
__sync_fetch_and_add(&cnt_builtins,1);
return 0;
}To make this compile on Ubuntu, you need to add "-march=i686" on the command line (or i586). This is because gcc on Ubuntu targets 486 by default. However, I still get "NOT defined" when targeting i686.

So it seems to me that _GLIBCXX_ATOMIC_BUILTINS_8 should be determined at compile time (on every compile). So I tested that on the Fedora VM:
compile with -march=i486, got "NOT defined", (and doesn't link)
compile with -march=i686, "defined", (does link)

So on Fedora, bits/c++config.h determines the correct value of _GLIBCXX_ATOMIC_BUILTINS_8 on each compile (as it should). But in the bits\c++config.h on Ubuntu, _GLIBCXX_ATOMIC_BUILTINS_8 is hard coded to 0, no matter what -march you specify.

>> ... doesn't explain why atomic operations seems to have bug with atomic_0.h.
Agreed.

>> The only point which is confusing me is the heavy use of volatile in atomic_0.h
Don't let that confuse you. It's either internal, compiler-specific usage - or it's to agree with cv-qualification given by the latest draft standard. volatile semantics have not changed that much in the latest C++0x draft. See 1.9-6 and 7 in N2857 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2857.pdf).

>> article ... http://www.ddj.com/cpp/184403766

The main intent of this article is to "hack" the C++ typing system. He's not
presuming any semantics for "volatile" -- he's using it purely as a
syntactic tag that's universally supported by C++ and not commonly used by
applications. The idea is that one declares variables that are shared in a
way that the compiler will complain if you reference them in a context
where you don't hold a lock. That is, you lock in a way that removes the
"volatile tag" from the data and allows you to reference it normally. While
this idea is "sound" (in a bizarre sort of way) it's also a misuse of the
compiler typing mechanism.

This sort of thing isn't a bad idea, and could be used as an argument for an
extensible "attributes" mechanism in C++, as in Lisp or some other
languages.

It also, however, suggests that having "volatile" for unlocked references
helps the program -- and in general, for all the reasons David Schwartz has
been trying to explain, that's silly, pointless, and dangerously wrong
unless you're targeting a SPECIFIC implementation that you happen to KNOW
provides a NONSTANDARD meaning to "volatile" which is relevant to threads. From: http://groups.google.com/group/comp.programming.threads/browse_frm/thread/1fa4a82dda916b18/fd6be8f0b18bd62d?#fd6be8f0b18bd62d

A more recent article on the subject: http://www.ddj.com/architect/212701484

gg