I'm trying to use cstdatomic (std::atomic in the upcoming c++0x standard) in g++ 4.4, but it doesn't seems to work as expected and I would like to known if I missed something...
My goal in the code below is just to test differents ways of concurrency (mutex vs atomic) : 4 threads increments counters in 3 ways.
1 - with std::mutex : works perfectly but slightly slow (lock / unlock)
2 - with g++ atomics "builtins" : works perfectly
3 - with std::atomic : doesn't works.
compile with : g++ -std=c++0x mt.cpp -O3 -pthread && date && ./a.out && date
it give :
dimanche 4 octobre 2009, 19:10:28 (UTC+0200) (french time...)
thread(1) : cnt_local 10000000
thread(3) : cnt_local 10000000
thread(4) : cnt_local 10000000
thread(2) : cnt_local 10000000
end : cnt_mutex 40000000, cnt_builtins 40000000, cnt_stdatomic 39990400
dimanche 4 octobre 2009, 19:10:46 (UTC+0200)
cnt_stdatomic counter is incorrect... Someone could tell me where i'm wrong, did i miss something in code ?
(I'm using g++ version 4.4.1 Ubuntu 4.4.1-4ubuntu6)
Thanks !
Codeplug
October 5th, 2009, 11:55 AM
Please use code tags when posting code: http://www.codeguru.com/forum/misc.php?do=bbcode#code
Do you get the same result if you replace "-O3" with "-g"?
If so, I'd like to see the generated assembly with:
g++ -std=c++0x -g -c mt.cpp
objdump -d -S mt.o
Keep in mind that "-std=c++0x" is "bleeding edge" stuff. So don't be surprised if you see blood :)
gg
Codeplug
October 5th, 2009, 12:36 PM
Wait a sec....
>> std::atomic<int> cnt_stdatomic = {0};
That should not compile since std::atomic<> is not an aggregate.
That should be: std::atomic<int> cnt_stdatomic(0);
See if that helps.
gg
Ajay Vijay
October 5th, 2009, 01:25 PM
/* c++0x fetchadd -> FAIL */
cnt_stdatomic.fetch_add(1);Have you checked the value just after you call it first time?
lundyp
October 5th, 2009, 03:01 PM
thanks Codeplug & Ajay Vijay (sorry for code tags i'm a newbie on forums, and i'm frenchie so english is not natural for me :-))
Codeplug
I've compiled with options -g -c only, but the result is the same. (even if it failed, I have joined assembly code -> I can see lock(f0) assembly instruction into expanded _sync_fetch_and_add, but my assembly is too ... older to follow assembly's stream).
for std::atomic<int> cnt_stdatomic = {0} it compile fine. My understanding is it's part of the new initializer list/uniform initialization enhancement in c++0x (now, not only restricted to aggregate/POD type). I've tested with cnt_stdatomic(0), no change.
Ajay Vijay
I have not checked it. My first attempt was to do a "atomic_increment()" but g++ doesn't not define it actually, so I rely on atomic_fetch_add which do the trick (i think).
By checking value, do you mean the one before increment in fetch_add or the one before the call to fetch_add, because in this test checking accuracy of the return value will be hard : which value should be expected on return of fetch_add since preempt may occur just before it in one thread by another thread
// pseudo-code
// in thread_1
oldval = (int)cnt_stdatomic;
// --> thread_2 preempt just here and do increment, thread_3 do the same, etc...
// --> now thread_1 get back
newval = (int)cnt_stdatomic.fetch_add(1);
// which value contain newval ?! oldval +1 ? +2 ?
and adding a scoped mutex around this codeblock would have influence on cnt_stdatomic.fetch_add (in fact, here it will implicitly works...), i'm right ?
Codeplug
October 5th, 2009, 06:03 PM
>> My understanding is it's part of the new initializer list/uniform initialization enhancement in c++0x
Ah, yes. That explains why it compiled :)
http://en.wikipedia.org/wiki/C%2B%2B0x#Initializer_lists
So I downloaded the "Ubuntu 9.10 alpha 6" VMWare image from here: http://bagside.com/bagvapp/
This came with "g++ (Ubuntu 4.4.1-4ubuntu1) 4.4.1".
It compiled, and generated identical assembly to what you posted. I ran it - and it worked. The virtual machine was setup for only one processor. So I reconfigured the VM for 2 processors and ran it again:
end : cnt_mutex 40000000, cnt_builtins 40000000, cnt_stdatomic 39714790Oops.
I got a better assembly listing using the following: g++ -c -g -Wa,-alh,-L -std=c++0x test.cpp > test.lstThis is a little more readable. You can see what fetch_add() is doing in /usr/include/c++/4.4/bits/atomic_0.h:371
It just seems to be broken - for i686, SMP at least...
gg
lundyp
October 6th, 2009, 04:00 AM
(you shown me I have to remove dust on my databooks :-))
Effectively, i'm using Core2duo with kernel 2.6.31 SMP, and I did not think to do test on single cpu.
I've take a look on fetch_add() you pointed out :
macro _ATOMIC_MODIFY_ is expanded with a call to __atomic_flag_wait_explicit(), which internally calls __atomic_flag_test_and_set_explicit which use a volatile atomic_flag.
According to me and as far i can read on various articles / c++0x drafts : atomic_flag "have to be" lock-free, and maybe it's the point of failure within actually SMP code implementation (at least, in my architecture)
You said it before : since c++0x in not fully standardized, i can see blood... here it is, i've just have to wait
Thanks for all Codeplug !
PS : How do i close this post ? (i'm really newbie...)
lundyp
October 6th, 2009, 07:00 AM
forget my PS, i've find close....
Codeplug
October 6th, 2009, 02:25 PM
I just downloaded a Fedora 11 VM and the code worked fine - it was running gcc 4.4.1 with libstdc++ 20090725. (Ubuntu VM is running libstdc++ 20091003.)
In fact, stepping into fetch_add() showed __sync_fetch_and_add() being called directly, from atomic_2.h.
So I did some digging on how that stuff works - using atomic_0.h vs atomic_2.h etc... It's based on some macro's from bits\c++config.h. To see what they were, I added this code:
#ifdef _GLIBCXX_ATOMIC_BUILTINS_8
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_8 defined")
#else
#pragma message("_GLIBCXX_ATOMIC_BUILTINS_8 NOT defined")
#endif
On Ubuntu I get:
_GLIBCXX_ATOMIC_BUILTINS_1 defined
_GLIBCXX_ATOMIC_BUILTINS_2 defined
_GLIBCXX_ATOMIC_BUILTINS_4 defined
_GLIBCXX_ATOMIC_BUILTINS_8 NOT defined
On Fedora _8 IS defined! So I built on Ubuntu with -D_GLIBCXX_ATOMIC_BUILTINS_8 and it too started using atomic_2.h (where fetch_add() does a simple __sync_fetch_and_add()).
I don't know if this is a bug with libstdc++ or with Ubuntu's gcc/libstdc++ packaging. Either way, there's definitely a bug with using atomic_0.h.
gg
lundyp
October 7th, 2009, 03:48 PM
Hello codeplug,
_GLIBCXX_ATOMIC_BUILTINS_8 seems to be defined (in c++config.h) only if 64bits native operations are supported (for further __sync* operations, and in particular atomic_2.h. Did you run Fedora VM image x86_64 ?), but Ubuntu installed on my desktop is :
Linux xxx 2.6.31-10-generic #35-Ubuntu SMP Tue Sep 22 17:33:42 UTC 2009 i686 GNU/Linux
so all compiled code is 32bits. Since _GLIBCXX_ATOMIC_BUILTINS_8 is not defined, I'm "defaulted" to compile with atomic_0.h because there no implementation of __sync_fetch_and_add for 64bits operations (see #ifdef... in c++config.h), and here is the result :
compiling this code works fine (32bits int):
int cnt_builtins = 0;
int main () {
__sync_fetch_and_add(&cnt_builtins,1);
};
but compiling this one (64bits long long int) :
long long cnt_builtins = 0;
int main () {
__sync_fetch_and_add(&cnt_builtins,1);
};
give error at link-time :
/tmp/ccUs0NaZ.o: In function `main':
mt2.cpp:(.text+0x21): undefined reference to `__sync_fetch_and_add_8'
collect2: ld returned 1 exit status
so I checked how 64bits are handled in assembly with this :
64bits integer operations are done in 2 steps, so I think that __sync_fetch_and_add_8 (64bits) cannot be implemented since it would required 2 assembly locks and 2 consecutives locks may be affected "in the middle" by preemption so 64bits operation would be corrupted -> unsecured (this is my diagnosis... I really don't known if it's works that way, especially concerning kernel preemption !!)
But the above tests (case of *_8/64bits) doesn't explain why atomic operations seems to have bug with atomic_0.h. The only point which is confusing me is the heavy use of volatile in atomic_0.h (force compiler to not to cache and assume possible "external" change)
article on multhreading / volatile / mutex : http://www.ddj.com/cpp/184403766
but for me, volatile means "force re-read before any operation on content" and not "atomically lock content for operation", I have to read more about to well understand and check output of assembly to see if atomic_0.h really generate assembly locks...
see you later
Codeplug
October 7th, 2009, 06:00 PM
I think you are confusing/combining "architecture" with "ELF bit-depth". Both VM's are 32bit kernels, however, the default architecture for gcc on each is different.
int main ()
{
long long cnt_builtins = 0;
__sync_fetch_and_add(&cnt_builtins,1);
return 0;
}To make this compile on Ubuntu, you need to add "-march=i686" on the command line (or i586). This is because gcc on Ubuntu targets 486 by default. However, I still get "NOT defined" when targeting i686.
So it seems to me that _GLIBCXX_ATOMIC_BUILTINS_8 should be determined at compile time (on every compile). So I tested that on the Fedora VM:
compile with -march=i486, got "NOT defined", (and doesn't link)
compile with -march=i686, "defined", (does link)
So on Fedora, bits/c++config.h determines the correct value of _GLIBCXX_ATOMIC_BUILTINS_8 on each compile (as it should). But in the bits\c++config.h on Ubuntu, _GLIBCXX_ATOMIC_BUILTINS_8 is hard coded to 0, no matter what -march you specify.
>> ... doesn't explain why atomic operations seems to have bug with atomic_0.h.
Agreed.
>> The only point which is confusing me is the heavy use of volatile in atomic_0.h
Don't let that confuse you. It's either internal, compiler-specific usage - or it's to agree with cv-qualification given by the latest draft standard. volatile semantics have not changed that much in the latest C++0x draft. See 1.9-6 and 7 in N2857 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2857.pdf).
>> article ... http://www.ddj.com/cpp/184403766
The main intent of this article is to "hack" the C++ typing system. He's not
presuming any semantics for "volatile" -- he's using it purely as a
syntactic tag that's universally supported by C++ and not commonly used by
applications. The idea is that one declares variables that are shared in a
way that the compiler will complain if you reference them in a context
where you don't hold a lock. That is, you lock in a way that removes the
"volatile tag" from the data and allows you to reference it normally. While
this idea is "sound" (in a bizarre sort of way) it's also a misuse of the
compiler typing mechanism.
This sort of thing isn't a bad idea, and could be used as an argument for an
extensible "attributes" mechanism in C++, as in Lisp or some other
languages.
It also, however, suggests that having "volatile" for unlocked references
helps the program -- and in general, for all the reasons David Schwartz has
been trying to explain, that's silly, pointless, and dangerously wrong
unless you're targeting a SPECIFIC implementation that you happen to KNOW
provides a NONSTANDARD meaning to "volatile" which is relevant to threads. From: http://groups.google.com/group/comp.programming.threads/browse_frm/thread/1fa4a82dda916b18/fd6be8f0b18bd62d?#fd6be8f0b18bd62d
A more recent article on the subject: http://www.ddj.com/architect/212701484
gg
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.