Weekly Report (28th May - 3rd June)
This is the report of Prateek Papriwal for GSoC 2012 on the project "Distribution functions", covering the period 28th May - 3rd June 2012.
Implementation of Geometric random generator
The distfun_geornd.sci macro has been committed to the repository. This completes the set of macros for the geometric distribution: the others (distfun_geostat, distfun_geopdf, distfun_geocdf, distfun_geoinv) were already implemented.
Problems faced: After I had completed distfun_geornd.sci, an issue came up regarding the definition of the geometric distribution used. There are two definitions: in the first, Xn is the number of failed Bernoulli trials before the first success occurs (Xn = 0, 1, 2, ...); in the second, Xn is the number of Bernoulli trials needed to get the first success (Xn = 1, 2, 3, ...). In the earlier macros I had used the second definition, while MATLAB and R use the first. To maintain compatibility, I changed the definition to the first one.
Things learnt: The functions igngeom.c and distfun_grandgeom.c also used the same definition that I had used, so I committed the required modifications to the code as well as the documentation. I used the grandgeom function in distfun_geornd.sci to generate the random numbers. This helped me learn how .c functions (which are often f2c-converted ones) can be used in a .sci macro.
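To make the difference between the two definitions concrete, here is a minimal sketch in Python (not the Scilab code of the distfun module). The function names geopdf_failures and geopdf_trials are hypothetical and exist only for this illustration; Pr is assumed to satisfy 0 < Pr <= 1.

```python
def geopdf_failures(x, pr):
    """P(X = x) where X counts the failures before the first success.

    Support: x = 0, 1, 2, ...  (the convention used by MATLAB and R).
    """
    if x < 0:
        return 0.0
    return pr * (1.0 - pr) ** x


def geopdf_trials(n, pr):
    """P(N = n) where N counts the trials needed to get the first success.

    Support: n = 1, 2, 3, ...
    """
    if n < 1:
        return 0.0
    return pr * (1.0 - pr) ** (n - 1)


def mean_failures(pr):
    """Mean of X (failures convention): (1 - pr) / pr."""
    return (1.0 - pr) / pr


def mean_trials(pr):
    """Mean of N (trials convention): 1 / pr."""
    return 1.0 / pr
```

The two variables differ only by a shift, N = X + 1, so geopdf_trials(x + 1, pr) equals geopdf_failures(x, pr) and the means differ by exactly one. Switching from one definition to the other therefore changes the support and the mean, but not the variance.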
Implementation of Unit Tests
Implemented the unit tests for the macros. Studied the papers by Yalta and McCullough in order to develop good accuracy tests. Created geopdf.csv, which contains the input values (Xn, Pr) and the output values (PDF-P, CDF-P, CDF-Q). I used those values in the tests to check the accuracy of the macros. All the values in the .csv file have 17 significant digits. I also tried to include critical points in the .csv file: I tested the extreme upper and lower tails, i.e. probabilities very close to zero and one, and tried to identify some of the values which lead to numerical distortions. A check for the consistency of the PDF and the CDF was also implemented. The dia_refs for the tests were created as well.
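The idea behind these accuracy and consistency checks can be sketched as follows. This is a Python illustration under stated assumptions, not the Scilab test code: the helper names (accuracy_digits, geopdf, geocdf) are hypothetical, the "failures before the first success" definition is assumed, and 0 < pr < 1 is required.

```python
import math


def geopdf(x, pr):
    """P(X = x) for x = 0, 1, 2, ..., computed via logs for small pr."""
    return pr * math.exp(x * math.log1p(-pr))


def geocdf(x, pr, lowertail=True):
    """P(X <= x) if lowertail is True, otherwise Q = P(X > x).

    Q = (1 - pr)**(x + 1) is computed directly, and P = 1 - Q is obtained
    with expm1 so that neither tail loses digits through cancellation.
    """
    log_q = (x + 1) * math.log1p(-pr)
    if lowertail:
        return -math.expm1(log_q)
    return math.exp(log_q)


def accuracy_digits(computed, expected):
    """Approximate number of significant decimal digits on which the
    computed and expected values agree (capped at 17, floored at 0)."""
    if computed == expected:
        return 17.0
    if expected == 0.0:
        return 0.0
    relerr = abs(computed - expected) / abs(expected)
    return max(0.0, min(17.0, -math.log10(relerr)))


def pdf_cdf_gap(x, pr):
    """Consistency check: |sum of the PDF over 0..x minus CDF(x)|."""
    total = sum(geopdf(k, pr) for k in range(x + 1))
    return abs(total - geocdf(x, pr))
```

In a test, each row of the reference table would supply (x, pr) and the expected values, and the test would require accuracy_digits(...) to exceed some threshold. Keeping the upper tail Q as a separate column matters because 1 - P loses all its digits once P is within rounding distance of one.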
Problems faced: Creating the critical input values was a difficult task, but looking for them was an important step in building the accuracy tests. Initially I was not able to create the dia_ref files and run the tests with the command test_run(arguments), because the test_run() function is meant for running the tests of internal, loaded modules. So I loaded the external module 'distfun' into Scilab by writing 'cd Scilab' and 'exec loader.sce' in the etc/scilab.start file. After that the tests ran, and I made the necessary modifications and committed them.
Problems to deal with: The accuracy of the PDF/CDF/INV functions and of the RND function can be improved. geornd.tst tests only the mean and the variance, so it needs to be improved.
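For illustration, a mean-and-variance check of this kind, together with one possible tightening, can be sketched in Python. Everything here is an assumption made for the example: the sampler uses simple inversion (not necessarily the algorithm of grandgeom), the "failures before the first success" definition is used, 0 < pr < 1 is required, and the function names are hypothetical.

```python
import math
import random


def geornd(pr, n, rng):
    """Draw n geometric variates (failures before the first success)
    by inversion: X = floor(log(U) / log(1 - pr)), with U in (0, 1]."""
    log_q = math.log1p(-pr)
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] to avoid log(0)
        samples.append(int(math.log(u) / log_q))
    return samples


def sample_mean_var(samples):
    """Sample mean and (population) variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def max_frequency_error(samples, pr, kmax):
    """Largest gap between the empirical frequency of k and the
    theoretical probability pr * (1 - pr)**k, for k = 0..kmax."""
    n = len(samples)
    counts = [0] * (kmax + 1)
    for s in samples:
        if s <= kmax:
            counts[s] += 1
    return max(abs(counts[k] / n - pr * (1.0 - pr) ** k)
               for k in range(kmax + 1))


rng = random.Random(12345)        # fixed seed so the check is repeatable
pr = 0.2
samples = geornd(pr, 100000, rng)
mean, var = sample_mean_var(samples)
# Theoretical values: mean = (1 - pr) / pr, variance = (1 - pr) / pr**2
mean_ok = abs(mean - (1 - pr) / pr) < 0.05 * (1 - pr) / pr
var_ok = abs(var - (1 - pr) / pr ** 2) < 0.05 * (1 - pr) / pr ** 2
freq_ok = max_frequency_error(samples, pr, 10) < 0.01
```

Matching the mean and the variance alone does not show that the individual probabilities are right, which is why comparing the empirical frequencies with the PDF is one way such a test could be strengthened.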
Things I learnt: I learnt to create tests that check the accuracy of the computed result against the expected result. I learnt how to create a .csv file and then use it in the tests. The paper by Yalta helped me understand how professional developers could end up producing such inaccurate distribution functions.
The documentation for the macros and tests was written so as to give a good enough idea of the module.
Things to be added: LaTeX and some more examples (to be implemented in the week of 4th-10th June), and also the creation of the .xml files for the online help pages.
Added a few commits to solve a bug: the builder.sce script of the 'distfun' module did not run on Linux (while it ran on Windows). After these commits the builder.sce script ran, but then loader.sce showed an error. The details of the bug are at http://bugzilla.scilab.org/show_bug.cgi?id=11127 .
Things learnt: I learnt about some old functions present in Scilab which needed to be removed.