« Archives in April, 2017

Reinventing the Wheel

*This* is something that I wish I had found affirmation for before (though I suppose I’ve been stubborn enough on that front not to have needed it). I’ve always sort of believed this, as one of my ‘personal heresies’ – things that you know are true that society doesn’t let you acknowledge. If you really want to understand something – to know fundamentally how it works, not just to stack someone else’s black box into the assembly of whatever you are working on – you pretty much *have* to reinvent the wheel. If you reinvent the wheel, it’s *your* wheel. In my case, I can reconstitute how something works and remember it far more effectively if I’ve reinvented it at some point in the past than if it were just some random magic trick I never took the time to understand.


From Longuski, James, Advice to Rocket Scientists

Big Iron

I recently purchased a wonderful device: an NVIDIA Tesla M2070 GPU.


Over the past 8-10 years, a new field in high-performance computing has been developing: using the computational power of graphics processing units (GPUs) to perform numerical computation.


The requirements for rendering detailed and complex images in real time led to the development of special-purpose processors for graphics cards. Specific applications then drove the development of GPUs intended for general-purpose computing. These processors, while limited in some respects relative to supercomputing clusters with large numbers of independent cores, can nevertheless perform floating-point operations at a comparable pace.


The NVIDIA Tesla M2070 is rated at 515 Gflops (billion floating-point operations per second) and has 6 GB of memory. I bought mine on eBay for $100. The bitcoin bubble (for which a lot of these GPUs were repurposed) has burst, and a lot of powerful hardware is being resold right now for peanuts!

Eventually I want to be able to do this: https://www.youtube.com/watch?v=vYA0f6R5KAI

That wasn’t done on a supercomputer. That was done on a single GPU. That is how much power is within easy reach of anyone operating an (appropriately powered, appropriately cooled) desktop PC these days.

So far, I’ve been following the tutorials for writing SIMT/SIMD programs using the CUDA API. Another API that is used to program these GPUs is OpenCL, which I’ll get to after I get through the CUDA tutorials.

So far, the Tesla and my other graphics card (the one that does actual graphics) have both been able to accept kernels and to allocate, deallocate, and transfer memory. The Tesla performed 4.1943E16 integer operations in about ten seconds – it took far longer just to randomize the integer vectors on the host side than it did to run the operations on the device side. (Edit: on reexamination, it couldn’t possibly be running this calculation to completion. I think I’m overwriting the execution in a host loop – will have to check…)
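A quick back-of-the-envelope check supports that suspicion. The sketch below uses only the figures quoted in this post (4.1943E16 operations, about ten seconds, a 515 Gflops rating). It treats the floating-point rating as a rough ceiling for integer work as well, which is an assumption, and the function names are made up for illustration.

```python
# Sanity check of the reported throughput against the card's rated peak.
# Figures are taken from the post; using the floating-point rating as a
# ceiling for integer operations is an assumption, not a measured limit.

REPORTED_OPS = 4.1943e16        # integer operations the run claimed to perform
REPORTED_SECONDS = 10.0         # "about ten seconds"
RATED_OPS_PER_SECOND = 515e9    # 515 Gflops rating


def implied_rate(ops, seconds):
    """Operations per second implied by a measured run."""
    return ops / seconds


def time_at_rated_peak(ops, peak_rate):
    """Seconds the run would need if the card sustained its rated peak."""
    return ops / peak_rate


if __name__ == "__main__":
    rate = implied_rate(REPORTED_OPS, REPORTED_SECONDS)
    seconds = time_at_rated_peak(REPORTED_OPS, RATED_OPS_PER_SECOND)
    print(f"implied rate: {rate:.3e} ops/s")
    print(f"ratio to rated peak: {rate / RATED_OPS_PER_SECOND:.0f}x")
    print(f"time at rated peak: {seconds / 3600:.1f} hours")
```

The implied rate comes out thousands of times higher than the rating, and at the rated peak the same workload would take the better part of a day rather than ten seconds. So most of those operations were almost certainly never executed.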

One problem that I will have to address, though, is the heat generated by this GPU. Even idling, the card gets worrisomely hot. When I start stress-testing it, it begins to act weird about a second or two into the computation and gets as hot as an oven. (Solder starts to melt at 230-ish C, so I’d better watch out!) Conservation of energy says that I’m spending electricity as if I had an always-on toaster. There are heat-sink ribs on the bottom of the device, oriented as if they expect a fan to blow air along the length of the card. I’ll have to figure out how to design a bracket that will hold a fan to the back of the case so that I don’t burn the card up in future use.

Anyway, with luck (lack of migraines) and frantic typing, I should have a CFD engine for structured grids banged together. Then I want to investigate the octree-based grid that SpaceX is using to such amazing effect in their modeling efforts.