1. Creating the Mandelbrot set with a vectorized Scilab algorithm
1.1. Abstract
In this page, we present how to efficiently create a plot of the Mandelbrot set in Scilab. We emphasize the performance difference between a naive algorithm and a vectorized algorithm. We show how Scilab's complex datatype can be used to simplify the algorithm and increase the speed.
These examples must be used under the terms of the CeCILL.
Contents
1.2. Introduction
We discussed the other day of the Mandelbrot set and we wonder if it was possible to get good performances within Scilab to create this set.
Perhaps the simplest algorithm is presented on Wikipedia's http://en.wikipedia.org/wiki/Mandelbrot_set#Computer_drawings. After some research, Allan Cornet found Jonathan Blanchard's blog Writing external functions for Scilab , where Jonathan explained in 2010 how to connect Scilab to a C code using Open MP statements.
The Mandelbrot set makes use of the recurrence equation:
Z = Z^2 + C
where Z and C are complex. For each point C of the complex plane, we initialize the point Z=0+i*0, and use the reccurence equation. The point C is in the Mandelbrot set if all the points in the sequence have an absolute value lower or equal to 2.
Here, we will present first the naive algorithm, which uses 3 nested loops. Then we will compare with a vectorized function and analyze the performance difference.
1.3. The data structure
For each initial point C in the complex plane, we apply the recurrence equation nmax times. If the point C(i,j) comes out of the sequence at the iteration k, we set R(i,j)=k. Otherwise, we set R(i,j)=-1.
R(i,j) = k if the point (i,j) has left the Mandelbrot set at iteration k, R(i,j) = -1 if the point is in the set.
1.4. Plotting the set
The following function plots the fractal associated with the data in R. We consider a colormap with cmaxcolors, which requires to scale the colors. The scaled colors assiociated with the point (i,j) are stored in D(i,j). The points (i,j) which are out of the set have R(i,j)=-1 and are plotted with the color D(i,j)=cmax.
function plotFractal(R,cmax) f=scf(); f.color_map = jetcolormap(cmax); D = R; k = find(R<>-1); D(k) = floor(R(k)/max(R(k))*cmax); k = find(R==-1); D(k) = cmax; Matplot(D); f.children.isoview="on"; f.children.axes_visible=["off" "off" "off"]; endfunction
1.5. Mandelbrot, the naive way
Then the table R is used as an argument of the Matplot function, which creates a 2D plot, where each point (i,j) has a color depending on R(i,j). Notice that our data is not plotted with increasing x and y coordinates, but with increasing indices i and j. The indices in R are from 1 to nmax for points in the Mandelbrot set.
In the following plotMandelbrotNaive function, we compute the Mandelbrot set and plot the associated graphics with the Matplot function.
function R = computeMandelbrotNaive(xsize,ysize,nmax,xmin,xmax,ymin,ymax) xvect = linspace( xmin, xmax, xsize ); yvect = linspace( ymin, ymax, ysize ); R = zeros(xsize,ysize); for i = 1:xsize for j = 1:ysize x = xvect(i); y = yvect(j); x0 = x; y0 = y; k = 0; while( x*x + y*y < 4 & k < nmax ) xtemp = x*x - y*y + x0; y = 2*x*y + y0; x = xtemp; k=k+1; end if k<nmax then R(i,j) = k; else R(i,j) = -1; end end end endfunction
The following script creates a 50-by-50 plot of the Mandelbrot set and measures the time required to perform the computation and the graphics creation. In order to measure the performance, we compute the number of Pixel Per Seconds (PPS), which is equal the number of points divided by the time required to produce it.
xsize = 50; ysize = 50; nmax = 1000; xmin = 0.2675; xmax = 0.2685; ymin = 0.591; ymax = 0.592; tic(); R = computeMandelbrotNaive(xsize,ysize,nmax,xmin,xmax,ymin,ymax); t = toc(); mprintf("Time = %f (s)\n",t); PPS = floor(xsize*ysize/t); mprintf("PPS = %d\n",PPS); plotFractal(R,1000);
The previous script produces the following output.
Time = 2.954000 (s) PPS = 846
Scilab also produces the following plot.
This is a deception, since it requires almost 3 seconds to produce only a 50-by-50 plot.
Fortunately, vectorizing the algorithm is possible, as we are going to see in the next section.
1.6. Mandelbrot, the vectorized way
The following function is a vectorized version of the previous function. In the row matrix J, we store the indices of the points in the Mandelbrot set. Initially, the matrix J contains the indices from 1 to xsize*ysize, meaning that all the points are in the set. During the computation, we are going to remove points from the set, by setting the corresponding entry of J to zero. In the current iteration, the set of indices of points which are in the set is therefore computed with the L = J(J>0) statement. Hence, when the algorithm progresses, there are less and less points in the set so that the matrix L has a decreasing number of entries. The operations are done only on the entries defined in L. This makes the algorithm faster and faster.
function R = computeMandelbrotVect(xsize,ysize,nmax,xmin,xmax,ymin,ymax) xvect = linspace( xmin, xmax, xsize ); yvect = linspace( ymin, ymax, ysize ); [X,Y]=meshgrid(xvect,yvect); Z = zeros(xsize,ysize); R = -ones(xsize,ysize); W = zeros(xsize,ysize); C=X+%i*Y; J = 1:xsize*ysize; for k=0:nmax L = J(J>0); Z(L) = Z(L).^2+C(L); W(L) = abs(Z(L)); M = find(W(L)>2); R(L(M)) = k; J(L(M)) = 0; end R = R'; // The maximum number of colors CMAX = 1000; f=gcf(); f.color_map = jetcolormap(CMAX); D = R; k = find(R<>-1); D(k) = floor(R(k)/max(R(k))*CMAX); k = find(R==-1); D(k) = CMAX; Matplot(D); f.children.isoview="on"; f.children.axes_visible=["off" "off" "off"]; endfunction
The main trick is to use directly the recurrence relation.
The following script produces a 200-by-200 plot of the Mandelbrot set.
stacksize("max"); xsize = 200; ysize = 200; nmax = 1000; xmin = 0.2675; xmax = 0.2685; ymin = 0.591; ymax = 0.592; tic(); R = computeMandelbrotVect(xsize,ysize,nmax,xmin,xmax,ymin,ymax); t = toc(); mprintf("Time = %f (s)\n",t); PPS = floor(xsize*ysize/t); mprintf("PPS = %d\n",PPS); plotFractal(R,1000);
The previous script produces the following output.
Time = 2.602000 (s) PPS = 15372
The previous script produces the following output:
This is much faster since, in terms of pixels per seconds, the factor is 15372/846 = 18.
1.7. Conclusion
We have seen how to vectorize the creation of the Mandelbrot set. We have seen that we may be 20 times faster with a vectorized Scilab algorithm, which makes use of the complex numbers in Scilab.
However, Scilab can be much faster since it makes use of optimized linear algebra libraries such as the Intel MKL or ATLAS. Unfortunately, the elementwise matrix power for complex matrices is not making use of the 4 cores I have on this test machine. To see this, we can use the following script.
stacksize('max'); n = 2000; for k = 1 : 100 A = ones(n,n)+%i*ones(n,n); B = A.^2; end
Only one core makes all the job. This is why Jonathan Blanchard's C source code based on OpenMP is probably faster on a multicore machine. Another possibility would be to connect Scilab to the part of the Intel MKL which provides vector operations. The Intel Vector Math Library (VML) is designed to compute elementary functions on vector arguments:
http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm
In this library, the vzpow function could be used to compute all the elements of the power of a complex matrix, potentially using the several cores:
The Intel VML is not currently available in Scilab.
This specific fractal is a very good example of how to apply vectorization techniques. The equation Z=Z^2+C seems to be designed for the matrix operations of Scilab's complex numbers.
The process by which we can vectorize an algorithm seems non obvious, but documents such as [3] may help to understand the basic tricks, such as combining find statements with matrix extractions. The section 5, "Performances", presents the main vectorization principles, present common vectorization tricks and show how to make a more efficient use of the optimized linear algebra libraries (Intel MKL, ATLAS) which can be used with Scilab.
On the other hand, there are algorithms which are much more difficult to vectorize.
- We can solve sudoku puzzles with techniques which are very close to what humans do in practice. This is the technique used in [4], where we analyze the entries of a sudoku entry-by-entry with a Scilab-based algorithm. This is a process which is fundamentally sequential, since determining one entry allows to eliminate candidates for all the cells in the same row, column or block. Hence, it is not possible to operate on several entries at the same time. This seems very hard to vectorize. We could think of processing several sudokus at once, but this would lead to nowhere, since two sudoku puzzles may have nothing in common in terms of difficulties: the first sudoku may require only a dozen of iterations or so, while the second may require to use recursive backtracking, a technique which usually mean that the sudoku puzzle is moderately or extremelly hard. All in all, vectorizing the resolution of sudoku puzzles seems impossible. By chance, regular sudoku puzzles have only 9-by-9 entries, so that most sudoku puzzles can be solved in 1 or 2 seconds at worst. Only extremelly hard puzzles need more than 10 seconds while the hardest sudokus require up to 1 or 2 minutes.
- We can produce low discrepancy sequences such as the Halton, Sobol or Faure sequence with Scilab [5]. These sequences generate sub-random numbers, which can be used in Monte-Carlo experiments, for example. The algorithm is based on number theory and requires to decompose an integer into its b-ary decomposition, where b is an integer which may depend, for example, on the index of the dimension of the vector to compute. The decomposition of an integer into its b-ary components is a process which is sequential at its heart: there is no way to compute all the digits at the same time, since the value of the digit at index i must be computed prior to the computation of the digit i+1. Hence, vectorizing the generation of a low discrepancy sequence is not possible. This is a major problem, because we generally need from 1000 to 10 000 points (or more) in a space of dimension 5 to 10 (or more). In this case, the performance is a major issue. This is why the toolbox [5] provides the sequences both in script form and in compiled form: the functions based on compiled C source codes are from 100 to 1000 times faster.
There are many more examples which cannot be vectorized. In this case, converting the script to a compiled language is the only solution. This can be done "by hand", that is, by a programmer or automatically, with a tool such as Scilab2C [6].
1.8. Script
The script is provided in attachement:
(Deprecated plotMandelbrot.sce)
1.9. Bibliography
[1] "Mandelbrot set", http://en.wikipedia.org/wiki/Mandelbrot_set#Computer_drawings
[2] "Writing external functions for Scilab", Jonathan Blanchard's blog, http://jblopen.com/node/18
[3] "Programming in Scilab", Consortium Scilab - Digitéo, Michael Baudin, 2011, http://forge.scilab.org/index.php/p/docprogscilab/downloads/
[4] "Sudoku Toolbox", Michael Baudin, 2010, http://forge.scilab.org/index.php/p/sudoku/
[5] "Low Discrepancy Sequences", Consortium Scilab - Digitéo, Michael Baudin, 2012, http://forge.scilab.org/index.php/p/lowdisc/, http://atoms.scilab.org/toolboxes/lowdisc
[6] "Scilab 2 C, Translate Scilab code into C code", Bruno JOFRET, Allan SIMON, Raffaele NUTRICATO, Alberto MOREA, Maria Teresa CHIARADA, 2010, http://atoms.scilab.org/toolboxes/scilab2c