Polymorphism and derived classes in CUDA / CUDA Thrust

I am not going to attempt to answer everything in this question; it is just too large. Having said that, here are some observations about the code you posted which might help:

  • The GPU-side new operator allocates memory from a private runtime heap. As of CUDA 6, that memory cannot be accessed by the host-side CUDA APIs (cudaMemcpy, for example). You can read and write it from within kernels and device functions, but not from the host. So using new inside a Thrust device functor is a broken design that can never work. That is why your “vector of pointers” model fails.
  • Thrust is fundamentally intended to allow data-parallel versions of typical STL algorithms to be applied to POD types. Building a codebase around complex polymorphic objects and trying to cram those through Thrust containers and algorithms might be made to work, but it isn’t what Thrust was designed for, and I wouldn’t recommend it. If you do, don’t be surprised if you break Thrust in unexpected ways.
  • CUDA supports a lot of C++ features, but the compilation and object models are much simpler than even the C++98 standard upon which they are based. CUDA lacks several key features (RTTI, for example) which make complex polymorphic object designs workable in C++. My suggestion is to use C++ features sparingly. Just because you can do something in CUDA doesn’t mean you should. The GPU is a simple architecture, and simple data structures and code are almost always more performant than functionally similar complex objects.
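To make the last two points a little more concrete, here is a minimal sketch of what a "simple data structure" replacement for a virtual-function hierarchy could look like. Every name in it (Shape, ShapeKind, AreaOf, areas, the HD macro) is hypothetical and not taken from your code, and std::transform is only standing in for a data-parallel algorithm so that the sketch compiles as plain C++. The idea is to store a type tag in a plain struct and dispatch on it with a switch, so no pointers, vtables or device-side new are involved.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc and to
// nothing otherwise, so the same source also builds with a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Tag saying which "derived class" a Shape value stands for.
enum ShapeKind { CIRCLE, RECTANGLE };

// Plain struct: no virtual functions, no pointers, trivially copyable.
struct Shape {
    ShapeKind kind;
    float a;  // radius for CIRCLE, width for RECTANGLE
    float b;  // unused for CIRCLE, height for RECTANGLE
};

// Functor that dispatches on the tag instead of through a vtable.
struct AreaOf {
    HD float operator()(const Shape& s) const {
        switch (s.kind) {
            case CIRCLE:    return 3.14159265f * s.a * s.a;
            case RECTANGLE: return s.a * s.b;
        }
        return 0.0f;  // unknown tag
    }
};

// Host-side helper that applies the functor to every element.
std::vector<float> areas(const std::vector<Shape>& shapes) {
    std::vector<float> out(shapes.size());
    std::transform(shapes.begin(), shapes.end(), out.begin(), AreaOf());
    return out;
}
```

Because Shape holds only values, the elements can be copied between host and device as raw bytes. I would expect the same functor to be usable with a thrust::device_vector<Shape> and thrust::transform when compiled with nvcc, but treat that as an assumption to verify rather than something I have tested against your code.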

Having skim-read the code you posted, my overall recommendation is to go back to the drawing board. If you want to look at some very elegant CUDA/C++ designs, spend some time reading the code bases of CUB and CUSP. They are very different from each other, but there is a lot to learn from both (and CUSP is built on top of Thrust, which makes it even more relevant to your use case, I suspect).
