An Efficient Pointer Wrapper in C++ for Scientific Computation
Motivation
For academic researchers (such as me, a physicist) who deal with heavy scientific computation and maintain simulation packages on a regular basis, efficiency is usually the ultimate goal. Resource management matters too, for example preventing memory leaks. Some of my fellow physicists prefer smart pointers such as std::tr1::shared_ptr or boost::shared_ptr. These handle the resource management issue, but efficiency can suffer considerably, because in most cases boost::shared_ptr forces scientific programmers to pay for features they never need.
In this article, I will share a piece of code I use often: a pointer wrapper with reference counting. It is small, portable, and straightforward, and it manages memory automatically. For most of the scientific computations I deal with, this homemade smart pointer achieves what is needed with excellent efficiency compared with boost::shared_ptr. You can find the source code and the testing code in the attachment to this article.
Benchmark
To save the reader's time, the benchmark plot and the testing code are shown first. The x-axis of the plot is the variable "count" in the main function below, which is the number of loop iterations in the test. The y-axis is the elapsed time in seconds. My pointer wrapper is called BT::wrapper_ptr. In this benchmark, wrapper_ptr is about 3 times faster than boost::shared_ptr. If you are satisfied with the benchmark result and would like to try this pointer wrapper, please continue to the next section.
#include <cstdio>   // printf
#include <cstdlib>  // atoi
#include <ctime>    // clock, clock_t, CLOCKS_PER_SEC
#include <boost/shared_ptr.hpp>
#include <tr1/memory>
#include "BT_wrapper_ptr.h"
using namespace std;

typedef int type;
typedef type* raw_ptr_type;
typedef BT::wrapper_ptr<type> wrapper_ptr_type;
typedef boost::shared_ptr<type> boost_shared_ptr_type;
typedef tr1::shared_ptr<type> tr1_shared_ptr_type;

// Takes the pointer by value and copies it three more times,
// so each call exercises the copy constructor and destructor.
template<typename T>
void benchmark(T ptr){
    T p1 = ptr;
    T p2 = ptr;
    T p3 = ptr;
}

int main(int argc, char* argv[]){
    size_t count = 1000000;  // default number of iterations
    if(argc > 1)
        count = atoi(argv[1]);
    clock_t start, end;
    double diff;

    //raw pointer
    {
        start=clock();
        for (size_t i = 0; i < count; ++i){
            raw_ptr_type ptr(new type);
            benchmark(ptr);
            delete ptr;
        }
        end=clock();
    }
    diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
    printf("%.2f seconds elapsed for raw pointer.\n", diff);

    //BT::wrapper_ptr
    {
        start=clock();
        for (size_t i = 0; i < count; ++i){
            wrapper_ptr_type ptr(new type);
            benchmark(ptr);
        }
        end=clock();
    }
    diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
    printf("%.2f seconds elapsed for BT::wrapper_ptr.\n", diff);

    //tr1::shared_ptr
    {
        start=clock();
        for (size_t i = 0; i < count; ++i){
            tr1_shared_ptr_type ptr(new type);
            benchmark(ptr);
        }
        end=clock();
    }
    diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
    printf("%.2f seconds elapsed for tr1::shared_ptr.\n", diff);

    //boost::shared_ptr
    {
        start=clock();
        for (size_t i = 0; i < count; ++i){
            boost_shared_ptr_type ptr(new type);
            benchmark(ptr);
        }
        end=clock();
    }
    diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
    printf("%.2f seconds elapsed for boost::shared_ptr.\n", diff);
    return 0;
}
Declaration and Conclusion
The source code can be found in the zip file below. The wrapper works with standard containers, keeps a reference count, overloads the most commonly used operators, and provides template member functions for type conversion.
namespace BT{
    template<class T>
    class wrapper_ptr{
    public:
        typedef T& reference;
        typedef unsigned long ref_type;

        explicit wrapper_ptr(T* p=0);
        wrapper_ptr& operator= (T* p);
        wrapper_ptr(const wrapper_ptr& r) throw();
        wrapper_ptr& operator= (const wrapper_ptr& r) throw();

        // lets wrapper_ptr<Y> access the private members of wrapper_ptr<T>
        template<typename Y> friend class wrapper_ptr;

        //template member functions
        template<typename Y> wrapper_ptr(const wrapper_ptr<Y>& r);
        template<typename Y> wrapper_ptr& operator= (const wrapper_ptr<Y>& r);
        template<typename Y> wrapper_ptr(Y* py);
        template<typename Y> wrapper_ptr& operator= (Y* py);

        void reset(T* p=0);
        reference operator*() const throw();
        T* operator->() const throw();
        T* get() const throw();
        ref_type use_count() const throw();
        bool unique() const throw();
        ~wrapper_ptr();

    private:
        void dispose() throw();
        T* pointer;      // the managed object
        ref_type* pref;  // the shared reference counter
    }; //wrapper ends here

    //overloaded operators
    template<typename T1,typename T2>
    inline bool operator==(wrapper_ptr<T1> const & l, wrapper_ptr<T2> const & r);
    template<typename T1,typename T2>
    inline bool operator!=(wrapper_ptr<T1> const & l, wrapper_ptr<T2> const & r);
}
The wrapper can be used just like a regular smart pointer; examples can be found in sample.cpp in the zip file.
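For readers without the zip file at hand, here is a minimal sketch of how a single-threaded reference-counted pointer along the lines of the declaration above might work. It is not the actual BT::wrapper_ptr source: the names sketch::counted_ptr and demo_peak_use_count are hypothetical, and the type conversion members, reset(), and the comparison operators are left out. The point it illustrates is that copying costs only a plain integer increment, with no atomic operations and no separate deleter object.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical, simplified stand-in for BT::wrapper_ptr (an assumption,
// not the author's code). Not thread-safe: the counter is a plain integer.
template <class T>
class counted_ptr {
public:
    typedef unsigned long ref_type;

    explicit counted_ptr(T* p = 0) : pointer(p), pref(0) {
        if (p) {
            // If allocating the counter fails, free p so it does not leak.
            try { pref = new ref_type(1); }
            catch (...) { delete p; throw; }
        }
    }

    counted_ptr(const counted_ptr& r) : pointer(r.pointer), pref(r.pref) {
        if (pref) ++*pref;  // plain increment, no atomics
    }

    counted_ptr& operator=(const counted_ptr& r) {
        if (this != &r) {
            if (r.pref) ++*r.pref;  // take the new reference first
            dispose();              // then release the old one
            pointer = r.pointer;
            pref = r.pref;
        }
        return *this;
    }

    ~counted_ptr() { dispose(); }

    T& operator*() const { return *pointer; }
    T* operator->() const { return pointer; }
    T* get() const { return pointer; }
    ref_type use_count() const { return pref ? *pref : 0; }
    bool unique() const { return use_count() == 1; }

private:
    // Drop one reference; the last owner frees the object and the counter.
    void dispose() {
        if (pref && --*pref == 0) {
            delete pointer;
            delete pref;
        }
        pointer = 0;
        pref = 0;
    }

    T* pointer;
    ref_type* pref;
};

// Usage: copies stored in a container share one counter.
inline unsigned long demo_peak_use_count() {
    counted_ptr<int> a(new int(7));
    std::vector<counted_ptr<int> > v(3, a);  // three more owners
    assert(*v[0] == 7);
    unsigned long peak = a.use_count();      // a plus the three copies
    v.clear();                               // copies release their references
    assert(a.unique());
    return peak;
}

}  // namespace sketch
```

The trade-off in this sketch is the one discussed in this article: because the counter is not atomic, copies of the same pointer must not be created or destroyed from different threads at the same time.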
In conclusion, if you are like me and always have to balance efficiency against robustness when developing scientific computation programs, I'd like to share with you this snippet of code I use often. The wrapper does not have the advanced features of boost::shared_ptr; for example, it does not take multi-threaded programming into account. But it is straightforward and handy. I hope the wrapper saves you time and proves helpful.

Comments
Things to improve test
Posted by Philippecp on 07/22/2010 01:02pm
Interesting post; however, I see one potential bias in the test: your heap could become more and more fragmented as the tests go on. You should try running each test on its own. Second, have you looked into boost::make_shared<>()? It will significantly increase the performance of creating the shared pointer, since it hits the heap once instead of twice.
I read your post and will try some this weekend.
Posted by everid on 07/09/2010 01:55am
Formatting still not right!
Posted by Alok Govil on 07/08/2010 04:03pm
I hate this formatting issue with Codeguru. Can we talk via email Batao? I can be reached at alokgovil at hot mail com
test result
Posted by everid on 07/10/2010 04:49pm
I read your post and will try some this weekend
Posted by everid on 07/09/2010 01:56am
Faster than what I expected
Posted by Alok Govil on 07/08/2010 04:00pm
== OK. Third try. Codeguru.com gotta fix this issue ==
The comparison of different pointer types and the relative speeds broadly makes sense to me. There's something that is not. Your code, even for the raw pointer, is going about 3 to 6 times faster than what I would expect. My quick estimate is that for count = 100 million, raw pointer should take about 20 to 40 seconds compared to 6 that it took! Things would make immediate sense to me if the last point on the x-axis of your plot were 10 million instead of 100 million. I have also not taken the second CPU on your machine into account. Could you include the following code into your test and tell me how fast this goes: //no memory allocation { start=clock(); for (size_t i = 0; i