CUDA implementation of numerical algorithms

Abstract:

To benefit from recent GPGPU technologies, which allow significant speedups in numerical computation, this proposal aims to port the main SciLab algorithms (LU decomposition, FFT, eigenvalue computation...) to the CUDA architecture, so that the software can use the GPU for its computations.

Various details :

- A new type, "GPU Matrix", will be introduced for optimised interaction with the GPU.

Goals :

- Since there will be a new type, the "elementary" operations will be implemented for it :

gpuAdd
gpuMult
exp
ln
cos
sin
^
toGPU (transfer matrix from host memory to GPU memory)
fromGPU (transfer matrix from GPU memory to host memory)

The user will thus be able to build matrices from these operations. This should take about a month.
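To illustrate how the new type and its transfer functions might fit together, here is a minimal host-only sketch in C. The names GpuMatrix, to_gpu, from_gpu, gpu_add and gpu_free are hypothetical and are not part of the proposal; plain malloc and memcpy stand in for the device allocation and the host/device copies that a real CUDA implementation would perform.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle for a "GPU Matrix": dimensions plus a pointer to the
   buffer that would live in GPU memory (here: ordinary heap memory). */
typedef struct {
    int rows;
    int cols;
    double *dev; /* stand-in for a device pointer */
} GpuMatrix;

/* toGPU: copy a column-major host matrix into a newly allocated "device"
   buffer. On allocation failure the returned handle has dev == NULL. */
GpuMatrix to_gpu(const double *host, int rows, int cols)
{
    GpuMatrix m = { rows, cols, NULL };
    size_t bytes = (size_t)rows * (size_t)cols * sizeof(double);
    m.dev = (double *)malloc(bytes);
    if (m.dev != NULL)
        memcpy(m.dev, host, bytes);
    return m;
}

/* fromGPU: copy the "device" buffer back into caller-provided host memory. */
void from_gpu(const GpuMatrix *m, double *host)
{
    size_t bytes = (size_t)m->rows * (size_t)m->cols * sizeof(double);
    memcpy(host, m->dev, bytes);
}

/* gpuAdd: element-wise sum; the result stays on the "device".
   Returns a handle with dev == NULL on size mismatch or allocation failure. */
GpuMatrix gpu_add(const GpuMatrix *a, const GpuMatrix *b)
{
    GpuMatrix c = { a->rows, a->cols, NULL };
    size_t n = (size_t)a->rows * (size_t)a->cols;
    size_t i;
    if (a->rows != b->rows || a->cols != b->cols)
        return c;
    c.dev = (double *)malloc(n * sizeof(double));
    if (c.dev == NULL)
        return c;
    /* A CUDA kernel would typically handle one element per thread here. */
    for (i = 0; i < n; ++i)
        c.dev[i] = a->dev[i] + b->dev[i];
    return c;
}

/* Release the "device" buffer owned by a handle. */
void gpu_free(GpuMatrix *m)
{
    free(m->dev);
    m->dev = NULL;
}
```

The point of keeping intermediate results behind a handle is that a chain of operations only pays for a transfer at the start (to_gpu) and at the end (from_gpu).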

- The following SciLab functions will be "ported", with names prefixed by gpu :

lu
inv
det
qr
chol
hess
lsq

This means a new toolbox will ship with the functions gpulu, gpuinv, gpudet, gpuqr...
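Several of the listed functions can share work. As one illustration, the determinant is the product of the diagonal of U in an LU factorisation with partial pivoting, with one sign change per row swap. The CPU-only C sketch below shows that relationship on a column-major matrix; the function name det_via_lu is hypothetical, and whether gpudet would actually be built on top of gpulu is an assumption, not something stated in the proposal.

```c
/* Absolute value helper, to keep the sketch free of libm dependencies. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Determinant of an n x n column-major matrix via in-place LU factorisation
   with partial pivoting. The contents of a are overwritten.
   Element (i, j) is stored at a[i + j * n]. */
double det_via_lu(double *a, int n)
{
    double det = 1.0;
    int i, j, k;

    for (k = 0; k < n; ++k) {
        /* Pick the row with the largest entry in column k as pivot. */
        int p = k;
        for (i = k + 1; i < n; ++i)
            if (abs_d(a[i + k * n]) > abs_d(a[p + k * n]))
                p = i;

        if (a[p + k * n] == 0.0)
            return 0.0; /* exactly singular */

        if (p != k) {
            /* Swap rows k and p; each swap flips the sign of the determinant. */
            for (j = 0; j < n; ++j) {
                double t = a[k + j * n];
                a[k + j * n] = a[p + j * n];
                a[p + j * n] = t;
            }
            det = -det;
        }

        /* The pivot is the k-th diagonal entry of U. */
        det *= a[k + k * n];

        /* Eliminate below the pivot, storing the multipliers of L in place. */
        for (i = k + 1; i < n; ++i) {
            double l = a[i + k * n] / a[k + k * n];
            a[i + k * n] = l;
            for (j = k + 1; j < n; ++j)
                a[i + j * n] -= l * a[k + j * n];
        }
    }
    return det;
}
```

For example, the matrix with rows (4, 3) and (6, 3) is passed as the column-major array {4, 6, 3, 3}, and its determinant is 4*3 - 3*6 = -6.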

The implementation will make extensive use of the CUBLAS library (shown to work with SciLab at the time of writing) and will adapt the existing Fortran code, with workarounds where necessary (for instance, values stored in GPU memory cannot be read directly from host code). Since CUBLAS, Fortran and SciLab all store matrices in column-major order, the task will be a little easier.
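To make the column-major convention concrete: the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) is laid out in memory as 1, 4, 2, 5, 3, 6, so element (i, j), counted from zero, sits at offset i + j * ld, where ld is the number of rows allocated per column. The C sketch below is a plain CPU reference with hypothetical names (cm_index, cm_matmul); it only illustrates the layout that a matrix-multiply routine working on such buffers has to assume.

```c
#include <stddef.h>

/* Offset of element (i, j), 0-based, in a column-major matrix whose
   leading dimension (rows allocated per column) is ld. */
static size_t cm_index(int i, int j, int ld)
{
    return (size_t)i + (size_t)j * (size_t)ld;
}

/* Reference product C = A * B for column-major matrices:
   A is m x k, B is k x n, C is m x n, all tightly packed. */
void cm_matmul(int m, int n, int k,
               const double *a, const double *b, double *c)
{
    int i, j, p;
    for (j = 0; j < n; ++j) {
        for (i = 0; i < m; ++i) {
            double sum = 0.0;
            for (p = 0; p < k; ++p)
                sum += a[cm_index(i, p, m)] * b[cm_index(p, j, k)];
            c[cm_index(i, j, m)] = sum;
        }
    }
}
```

Because all three sides agree on this layout, a buffer can be handed from one to the other without transposing or repacking it.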

- Some functions from the FFTW library will be ported; at the time of writing, some investigation is still needed before a schedule can be drawn up.

Limitations (at least for the first release) :

It will only support one GPU per SciLab process* (i.e. if you have two GPUs, only one will be used if you launch a single instance of SciLab, and both will be used if you launch SciLab twice). As double is the "core" precision of SciLab, every function will work in double precision. The minimum targeted CUDA compute capability is 1.1 (G92, i.e. from the GeForce 8400 onwards, with the notable exceptions of the 8800 GTS and 8800 GTX). There will be no special optimisation for integrated chipsets (i.e. no ZeroCopy feature).

Student timeline

See here for the Google Summer of Code planning.

*If everything works as planned...