A large problem can usually be divided into smaller tasks that work together to form a solution. Painting a house is a good example. Say you need to buy 5 liters of paint and 5 brushes before you can paint the whole house. You can either go out, buy everything and paint the whole house yourself, or you can get help from friends or hire painters.
You probably want the latter: get help. To save time, you go out and buy the paint while another person gets the brushes. Then 4 people each paint one wall of the house. This saves you time because many people are working on the same solution in parallel.
This applies to computing as well. Say you want to add two vectors v(x,y,z) and u(x,y,z), where v=(1,2,3) and u=(4,5,6). Their sum is w = v + u = (1,2,3)+(4,5,6) = (1+4, 2+5, 3+6) = (5,7,9). You can do this yourself, one calculation at a time, but as you can probably see, this problem can be divided into smaller problems. You can have one “person” add the x components together, another add the y components together and a third add the z components together:
Person 1 (x): 1 + 4 = 5
Person 2 (y): 2 + 5 = 7
Person 3 (z): 3 + 6 = 9
Each person above follows exactly the same procedure, a + b = c, but with different numbers and results.
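To make the idea concrete, here is a minimal sketch in plain C of splitting the vector addition into one small task per component. The names add_component and add_vectors are hypothetical, chosen only for this illustration. Here the “persons” simply take turns in a loop; on a GPU, each index would roughly correspond to its own thread, which is the kind of thing later tutorials in this series build toward.

```c
#include <stddef.h>

/* One "person": applies the same procedure c = a + b, but only to the
   component it was assigned (index i). */
static void add_component(const int *a, const int *b, int *c, size_t i)
{
    c[i] = a[i] + b[i];
}

/* Hands out one component to each "person". In this sketch the persons
   work one after another; in a data-parallel version each index would
   be handled by its own worker at the same time. */
void add_vectors(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        add_component(a, b, c, i);
    }
}
```

Calling add_vectors with v = {1, 2, 3}, u = {4, 5, 6} and n = 3 fills the result with {5, 7, 9}, matching the calculation above.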
This isn’t new. Parallel computing (wikipedia) has existed for many years, and PCs have had multiple CPUs (or CPU cores) to handle tasks in parallel, increasing the execution speed of applications that split their work into parallel processes. In the example above, you can think of a person as a process or a thread, but don’t think too much about these words just yet, as they will be covered later. The computer can then send each of these processes to a different processor, each executing its task (calculation) in parallel.
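As a rough CPU-side sketch of “one person per task”, the C code below starts one POSIX thread per vector component, and the operating system is free to schedule those threads on different processor cores. It assumes a system with pthreads (for example Linux or macOS, compiled with -pthread); on Windows you would need a pthreads port or the Win32 thread API instead. The struct and function names are hypothetical, made up for this example.

```c
#include <pthread.h>
#include <stddef.h>

/* Work description for one thread ("person"): which component to add. */
struct add_task {
    const int *a;
    const int *b;
    int *c;
    int index;
};

/* Thread entry point: every thread runs the same procedure c = a + b,
   but on its own component. */
static void *add_worker(void *arg)
{
    struct add_task *task = arg;
    task->c[task->index] = task->a[task->index] + task->b[task->index];
    return NULL;
}

/* Adds two 3-component vectors using one thread per component.
   Returns 0 on success, non-zero if a thread could not be created
   or joined. */
int add_vec3_threaded(const int a[3], const int b[3], int c[3])
{
    pthread_t threads[3];
    struct add_task tasks[3];
    int created = 0;
    int err = 0;

    for (int i = 0; i < 3; ++i) {
        tasks[i] = (struct add_task){ a, b, c, i };
        if (pthread_create(&threads[i], NULL, add_worker, &tasks[i]) != 0) {
            err = 1;
            break;
        }
        ++created;
    }

    /* Wait for every started thread to finish before c is read. */
    for (int i = 0; i < created; ++i) {
        if (pthread_join(threads[i], NULL) != 0) {
            err = 1;
        }
    }
    return err;
}
```

Creating a thread costs far more than a single addition, so this sketch shows the structure of the work split rather than a real speedup.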
Nowadays, most computers have multiple processors that can handle multitasking, and heavy applications can run with great performance using the resources available on the computer. But what if your application needs even more power? Should you get another processor, or upgrade your system in some other way? It all depends on your application and its requirements, but one solution could be the use of a GPU (wikipedia).
The GPU what? It’s the graphics processing unit, which handles all the graphics on your desktop and in many games, relieving your CPU of the heavy processing that graphical applications require. The CPU has enough to do calculating artificial intelligence and collision detection in games, so any help is welcome. The GPU has a highly parallel architecture, which makes it really effective at arithmetic operations and calculations, and a great friend of the CPU.
(Image taken from nVidia)
The purpose of this tutorial is to help you get started with parallel computing on the GPU using a language named CUDA C. CUDA C, created by nVidia, is a C-like programming language designed specifically for writing applications that use the GPU for parallel computing. A few alternatives also exist, such as OpenCL and DirectCompute (DirectX 11), but as CUDA C is the only one I know, it’s the natural choice for this tutorial. They are all based on the same principles, so it doesn’t really matter which one you learn.
But before diving into the programming, let’s get your computer up and running with CUDA! First of all, you will need a fairly recent CUDA-enabled GPU (one from 2007 or later with more than 256MB of memory will probably work, but check www.nvidia.com/cuda if unsure). I have an nVidia GeForce GTX 480, but the newest 500 series looks amazing.
Important: Make sure you also install the latest driver!
Then, you will need the tools! This is where the CUDA Development Toolkit comes into the picture. You can download it from here: http://developer.nvidia.com/object/gpucomputing
(Direct link to the download page for “CUDA Toolkit 3.2”: http://developer.nvidia.com/object/cuda_3_2_downloads.html)
On the downloads page, find the “CUDA Toolkit” and download the 32-bit or 64-bit version, depending on your system. Once the download completes, install the software.
The GPU Computing SDK comes with many handy code samples and documents that will kickstart your GPU computing skills.
Once the CUDA Toolkit is installed, you can write CUDA C applications in your favorite text editor. I use Notepad. To compile an application, open the Visual Studio 2008 command prompt (which sets up the paths to the Visual Studio compiler and linker that nvcc needs) and compile with nvcc.exe.
Test if the installation is a success
Let’s try this out. A really simple, working CUDA application looks just like any other C program:
#include <stdio.h>
int main( void ) {
    printf( "Hello, World!\n" );
}
This source might come as a surprise to you. Actually, you can write ordinary C code in a CUDA source file. The real magic happens when we start deciding which functions we want to execute on the CPU and which we want to execute on the GPU.
OK, let’s try it. Write the code above in your favorite text editor, and save it as “TestCUDA.cu”.
Next, let’s compile and build the application. In the Visual Studio command prompt, change to the folder where you saved “TestCUDA.cu” and type the following command:
nvcc -o test.exe TestCUDA.cu
and hit [ENTER]. This will build the application and create an EXE file named “test.exe”.
Now, if you type “test.exe”, your first CUDA C application will run and print “Hello, World!” on the screen. Pretty neat, huh?
If you run into problems compiling, copy the error message and search for it online. Most common mistakes and errors have known solutions. Good luck! (If you downloaded the 64-bit version of the CUDA Toolkit, try uninstalling it and testing the 32-bit version.)
That’s it for now. See you in Tutorial 2 of this series.
Inspiration for learning CUDA