An Efficient Pointer Wrapper in C++ for Scientific Computation

Motivation
Benchmark
Declaration and Conclusion

Motivation

For academic researchers (such as me, a physicist) who deal with heavy scientific computation and maintain simulation packages on a regular basis, efficiency is usually the ultimate goal. Resource management, such as preventing memory leaks, is also important. Some of my fellow physicists prefer smart pointers such as std::tr1::shared_ptr or boost::shared_ptr. These handle the resource management issue, but efficiency can suffer noticeably, because in most cases boost::shared_ptr makes scientific researchers pay for features, such as thread-safe reference counting, that they never need.

In this article, I share a piece of code I use often: a pointer wrapper with reference counting. It is small, portable, and straightforward, and it manages memory automatically. For most of the scientific computations I deal with, this homemade smart pointer does what is needed and is considerably more efficient than boost::shared_ptr. The source code and testing code are in the attachment to this article.

Benchmark

To save the reader's time, the benchmark plot and testing code are shown first. The x-axis of the plot is the variable "count" in the main function below, which is the number of loop iterations for the test. The y-axis is the elapsed time in seconds. My pointer wrapper is called BT::wrapper_ptr. In this test, wrapper_ptr is about 3 times faster than boost::shared_ptr. If you are satisfied with the benchmark result and would like to try this pointer wrapper, please continue to the next section.

#include <time.h>    // clock(), CLOCKS_PER_SEC
#include <stdio.h>   // printf()
#include <stdlib.h>  // atoi()
#include <boost/shared_ptr.hpp>
#include <tr1/memory>
#include "BT_wrapper_ptr.h"

using namespace std;

typedef int type;
typedef type* raw_ptr_type;
typedef BT::wrapper_ptr<type> wrapper_ptr_type;
typedef boost::shared_ptr<type> boost_shared_ptr_type;
typedef tr1::shared_ptr<type> tr1_shared_ptr_type;

template<typename T>
void benchmark(T ptr){
  T p1 = ptr;
  T p2 = ptr;
  T p3 = ptr;
}

int main(int argc, char* argv[]){
  size_t count = 1E6;
  
  if(argc > 1)
    count = atoi(argv[1]);

  clock_t start, end;
  double diff;
  
  //raw pointer
  {
    start=clock();
    for (size_t i = 0; i <count ; ++i){
	raw_ptr_type ptr(new type);
	benchmark(ptr);
	delete ptr;
      }
    end=clock();
  }
  diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
  printf ("%.2lf seconds elapsed for raw pointer.\n", diff );
  
  //BT::wrapper_ptr
  {
    start=clock();
    for (size_t i = 0; i < count ; ++i){
	wrapper_ptr_type ptr(new type);
	benchmark(ptr);
      }
    end=clock();
  }
  diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
  printf ("%.2lf seconds elapsed for BT::wrapper_ptr.\n", diff );

  //tr1::shared_ptr
  {
    start=clock();
    for (size_t i = 0; i < count ; ++i)
      {
	tr1_shared_ptr_type ptr(new type);
	benchmark(ptr);
      }
    end=clock();
  }
  diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
  printf ("%.2lf seconds elapsed for tr1::shared_ptr.\n", diff );

  //boost::shared_ptr
  {
    start=clock();
    for (size_t i = 0; i < count ; ++i)
      {
	boost_shared_ptr_type ptr(new type);
	benchmark(ptr);
      }
    end=clock();
  }
  diff=static_cast<double>(end-start)/CLOCKS_PER_SEC;
  printf ("%.2lf seconds elapsed for boost::shared_ptr.\n", diff );
  
  return 0;
}
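
It may help to spell out what one call to benchmark() costs. The function takes its argument by value and then makes three more copies, so a reference-counted pointer performs four counter increments and four decrements per iteration, on top of the allocation. The sketch below makes this visible with std::shared_ptr, the C++11 descendant of boost::shared_ptr, chosen here only because it needs no extra headers; BT::wrapper_ptr is assumed to report the same numbers through its use_count(). The names owners_inside_benchmark, owner_counts and count_owners are hypothetical helpers and are not part of the original test.

```cpp
#include <cassert>
#include <memory>

// Same shape as benchmark() in the test above, but it reports how many
// owners exist once all copies have been made.
// NOTE: hypothetical helper for illustration, not part of the original test.
template <typename T>
long owners_inside_benchmark(T ptr) {  // by-value parameter: copy #1
  T p1 = ptr;                          // copy #2
  T p2 = ptr;                          // copy #3
  T p3 = ptr;                          // copy #4
  return ptr.use_count();              // caller's pointer + 4 copies
}

// Owner count observed inside the call and again after it has returned.
struct owner_counts {
  long inside;
  long after;
};

inline owner_counts count_owners() {
  std::shared_ptr<int> ptr(new int(0));
  assert(ptr.use_count() == 1);  // a fresh pointer has exactly one owner

  owner_counts c;
  c.inside = owners_inside_benchmark(ptr);
  c.after = ptr.use_count();  // the four copies have been destroyed again
  return c;
}
```

Calling count_owners() from main and printing both fields shows the counter rising while the copies are alive and falling back once benchmark-style code returns. For the raw pointer case the same copies are plain pointer assignments, which is why that loop mostly measures new and delete.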

Declaration and Conclusion

The source code can be found in the zip file below. The wrapper works with standard containers, keeps a reference counter, overloads the most commonly used operators, and provides template member functions for type conversion.
namespace BT{
 
  template<class T>
  class wrapper_ptr{
  public:
    typedef T& reference;
    typedef unsigned long ref_type;
    explicit wrapper_ptr(T* p=0);    
    wrapper_ptr& operator= (T* p);
    wrapper_ptr(const wrapper_ptr& r) throw();
    wrapper_ptr& operator= (const wrapper_ptr& r) throw();
    template<typename Y> friend class wrapper_ptr;

    //template member functions
    template<typename Y> wrapper_ptr(const wrapper_ptr<Y>& r);
    template<typename Y> wrapper_ptr& operator= (const wrapper_ptr<Y>& r);
    template<typename Y> wrapper_ptr(Y* py);
    template<typename Y> wrapper_ptr& operator= (Y* py);
    void reset(T* p=0);
    reference operator*() const throw();
    T* operator->() const throw();
    T* get() const throw();
    ref_type use_count() const throw();
    bool unique() const throw();
    ~wrapper_ptr();

  private:
    
    void dispose() throw();
    T* pointer; 
    ref_type* pref; 
  }; //wrapper ends here
  
  //overloaded operators
  template<typename T1,typename T2>
    inline bool operator==(wrapper_ptr<T1> const & l, wrapper_ptr<T2> const & r);
  
  template<typename T1,typename T2>
    inline bool operator!=(wrapper_ptr<T1> const & l, wrapper_ptr<T2> const & r);
}
The wrapper can be used just like a regular smart pointer; examples can be found in sample.cpp in the zip file. In conclusion, if you, like me, always have to balance efficiency against robustness when developing scientific computation programs, I would like to share with you this snippet of code that I use often. The wrapper lacks the advanced features of boost::shared_ptr (for example, it does not take multi-threaded programming into consideration), but it is straightforward and handy. I hope it saves you time and proves helpful.
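
For readers who want a feel for how little machinery such a wrapper needs, here is a minimal sketch of a single-threaded reference-counted pointer. It is an assumption-laden illustration, not the code from the zip file: it mirrors only the two data members visible in the declaration above (a raw pointer and a heap-allocated counter), omits the converting template members, and lives in a hypothetical namespace sketch under the hypothetical name counted_ptr.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace; not the author's BT namespace

// Minimal single-threaded reference-counted pointer.
// ASSUMPTION: modelled on the members shown in the declaration (T* pointer,
// ref_type* pref); the real BT::wrapper_ptr may be implemented differently.
// Not exception-safe if allocating the counter itself fails.
template <class T>
class counted_ptr {
 public:
  typedef unsigned long ref_type;

  explicit counted_ptr(T* p = 0) : pointer(p), pref(new ref_type(1)) {}

  // Copying shares the object and bumps the plain (non-atomic) counter.
  counted_ptr(const counted_ptr& r) : pointer(r.pointer), pref(r.pref) {
    ++*pref;
  }

  counted_ptr& operator=(const counted_ptr& r) {
    if (this != &r) {
      ++*r.pref;  // take the new reference before releasing the old one
      dispose();
      pointer = r.pointer;
      pref = r.pref;
    }
    return *this;
  }

  ~counted_ptr() { dispose(); }

  T& operator*() const {
    assert(pointer != 0);  // debug check: dereferencing an empty pointer
    return *pointer;
  }
  T* operator->() const { return pointer; }
  T* get() const { return pointer; }
  ref_type use_count() const { return *pref; }
  bool unique() const { return *pref == 1; }

 private:
  // Drop one reference; the last owner frees both the object and the counter.
  void dispose() {
    if (--*pref == 0) {
      delete pointer;
      delete pref;
    }
  }

  T* pointer;
  ref_type* pref;
};

}  // namespace sketch
```

Because ++*pref and --*pref are plain rather than atomic operations, copies of such a pointer must not be created or destroyed concurrently from different threads. That is the trade-off described above: the counter is cheap precisely because it does nothing for multi-threaded code.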


About the Author

Botao Jia

Botao Jia is currently a graduate student in the PhD program of the physics department at Duke University (USA). One of his research projects was to develop a simulation package for the Duke SRFEL. He holds bachelor's degrees in physics and in computer science from the University of Science and Technology of China (USTC) and a Master of Science degree from the statistical science department at Duke University. He expects to finish his PhD in 2011. His physics research publications can be found at: http://prst-ab.aps.org/abstract/PRSTAB/v13/i6/e060701 and at: http://prst-ab.aps.org/abstract/PRSTAB/v13/i8/e080702

Downloads

Comments

  • Things to improve test

    Posted by Philippecp on 07/22/2010 01:02pm

    Interesting post; however, I see one potential bias in the test: your heap could become more and more fragmented as the tests go on. You should try running each test on its own. Second, have you looked into boost::make_shared<>()? It will significantly improve the performance of creating the shared pointer, since it hits the heap once instead of twice.

  • I read your post and will try some this weekend.

    Posted by everid on 07/09/2010 01:55am

    Hi Alok,
    I read your post and will run some trials this weekend; today it is a little too late.
    Your post arrived complete in my email account, so no worries about the formatting.
    I will send you emails via the address you gave me.

  • Formatting still not right!

    Posted by Alok Govil on 07/08/2010 04:03pm

    I hate this formatting issue with Codeguru. Can we talk via email, Botao? I can be reached at alokgovil at hot mail com

    • test result

      Posted by everid on 07/10/2010 04:49pm

      Hi Alok,
      Thanks for your comments and interest in the article.
      I followed your suggestion and ran some tests; the results are below.
      The source code is attached to this email. The platform is the same one I described in my article (in the text box on the figure).
      
      **************************************
      
      A)  count = 1E8; g++ -O0 smartptr_test.cpp
      output:
      0.77 seconds elapsed for raw pointer (no memory allocation).
      7.41 seconds elapsed for raw pointer.
      24.71 seconds elapsed for BT::wrapper_ptr.
      65.06 seconds elapsed for tr1::shared_ptr.
      65.05 seconds elapsed for boost::shared_ptr.
      
      
      B)  count = 1E8; g++ -O1 smartptr_test.cpp
      output:
      0.36 seconds elapsed for raw pointer (no memory allocation).
      6.77 seconds elapsed for raw pointer.
      15.79 seconds elapsed for BT::wrapper_ptr.
      20.07 seconds elapsed for tr1::shared_ptr.
      35.65 seconds elapsed for boost::shared_ptr.
      
      C)  count = 1E8; g++ -O2 smartptr_test.cpp
      output:
      0.36 seconds elapsed for raw pointer (no memory allocation).
      6.73 seconds elapsed for raw pointer.
      15.25 seconds elapsed for BT::wrapper_ptr.
      17.56 seconds elapsed for tr1::shared_ptr.
      32.96 seconds elapsed for boost::shared_ptr.
      
      D)  count = 1E8; g++ -O3 smartptr_test.cpp
      output:
      0.00 seconds elapsed for raw pointer (no memory allocation).
      6.65 seconds elapsed for raw pointer.
      13.12 seconds elapsed for BT::wrapper_ptr.
      19.22 seconds elapsed for tr1::shared_ptr.
      32.44 seconds elapsed for boost::shared_ptr.
      
      **************************************
      
      It seems to me that -O3 is the best option for this particular case; its executable file is also the smallest among all the optimization options.

  • Faster than what I expected

    Posted by Alok Govil on 07/08/2010 03:55pm

    The comparison of different pointer types and the relative speeds
    broadly makes sense to me.
    
    There's something that is not.  Your code, even for the raw pointer,
    is going about 3 to 6 times faster than what I would expect.  My quick
    estimate is that for count = 100 million, raw pointer should take about
    20 to 40 seconds compared to 6 that it took!  Things would make
    immediate sense to me if the last point on the x-axis of your plot were
    10 million instead of 100 million.  I have also not taken the second
    CPU on your machine into account.
    
    Could you include the following code into your test and tell me how
    fast this goes:
    
    //no memory allocation
      {
        start=clock();
        for (size_t i = 0; i 
