
# Overview of Probabilities and Statistics in Scilab

This page presents a list of resources for probabilities and statistics in Scilab: documents, tutorials and software tools in this field.

## Key features in Scilab

Scilab provides the following features:

• 6 Uniform Random Number generators in grand, including the Mersenne Twister by Matsumoto and Nishimura. This generator has a very large period (2^19937 - 1), so that any simulation is guaranteed to use only a small fraction of the overall period. Example:

`u = grand(1000,5,"def")`
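
The active generator can also be selected and seeded through the action form of grand; a minimal sketch:

```scilab
// Select the Mersenne Twister generator and seed it, then draw
// a 1000-by-5 matrix of uniform numbers in [0,1).
grand("setgen", "mt");    // other generator names include "kiss", "clcg2", "clcg4", "urand"
grand("setsd", 123456);   // seed the active generator
u = grand(1000, 5, "def");
```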
• 16 Non-uniform Random Number Generators: Normal, Exponential, Poisson, Geometric, etc. Example:

`Y=grand(m,n,"nor",av,sd)`
• 11 Cumulative Distribution Functions and their inverses: Normal (cdfnor), Poisson (cdfpoi), Exponential, Beta, etc. These functions return both P and Q = 1 - P, for increased accuracy when probabilities are small. Example:

`[P,Q]=cdfnor("PQ",X,Mean,Std)`
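
The point of returning both P and Q shows up in the far tail, where computing 1 - P in floating point cancels to zero; a sketch:

```scilab
// Upper tail probability of the standard normal at x = 10.
// Computing 1 - P in floating point returns exactly 0,
// while Q keeps full relative accuracy.
[P, Q] = cdfnor("PQ", 10, 0, 1);
disp(Q);      // a tiny but nonzero probability, about 7.6e-24
disp(1 - P);  // 0, by catastrophic cancellation
```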
• Central tendency: mean, trimmean, etc.
• Data with missing values: nanmean, nanmin, nanmax, etc.
• Descriptive statistics: center, covar, median, variance, st_deviation, etc.
• Measures of dispersion: iqr (interquartile range), etc.
• Measures of shape: moment, perctl, etc.
• One-factor ANOVA: ftest (Fisher ratio), etc.
• Principal component analysis: princomp, pca (standardized), show_pca.
• Sampling: sample, samwr, etc.
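
As a minimal sketch, the descriptive statistics functions listed above can be applied to a simulated sample:

```scilab
// Basic descriptive statistics of a simulated sample.
x = grand(1000, 1, "nor", 0, 1);  // 1000 standard normal numbers
disp(mean(x));          // close to 0
disp(median(x));
disp(variance(x));      // close to 1
disp(st_deviation(x));
```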

Graphics:

• histplot : plot a histogram
• bar : bar histogram
• barh : horizontal display of bar histogram
• pie : draw a pie
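
For example, histplot draws the histogram of a simulated sample:

```scilab
// Histogram of 10000 standard normal numbers, with 20 classes.
x = grand(1, 10000, "nor", 0, 1);
histplot(20, x);
xtitle("Histogram of normal random numbers", "x", "frequency");
```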

Data fitting and parameter identification:

• reglin : linear regression
• datafit : parameter identification based on measured data
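
As a sketch, reglin recovers the slope and intercept of a linear model from noisy data:

```scilab
// Linear regression y = a*x + b on noisy data.
x = linspace(0, 1, 100);
y = 2*x + 1 + grand(1, 100, "nor", 0, 0.1);  // true a = 2, b = 1
[a, b] = reglin(x, y);
disp([a b]);  // estimates close to [2 1]
```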

The following plot comes from the scidemo module (http://forge.scilab.org/index.php/p/scidemo). It compares the normal distribution function with normal random numbers generated by the grand function. Moreover, confidence intervals are computed based on numerical integration of the normal distribution function. Some of these functions are based on the work by Carlos Kliman on Labostat.

Scilab makes use of the open source library Dcdflib, by Barry W. Brown, James Lovato and Kathy Russell. Scilab uses the Fortran version of Dcdflib, a library known for its accuracy.

## Toolboxes

In this section, we describe toolboxes which provide statistics features for Scilab.

We review the following toolboxes:

• Stixbox, a statistics toolbox,
• Low Discrepancy Toolbox, a toolbox providing low discrepancy sequences,
• NIST Dataset: a toolbox providing datasets from NIST,
• Regression Tools : a toolbox for linear and non linear regression analysis,
• NaN-Toolbox : a toolbox for classification and statistics,
• NISP toolbox : a toolbox for sensitivity analysis,
• libsvm toolbox : a toolbox for Support Vector Machines.

### Stixbox

Stixbox is a statistics toolbox which provides distribution functions, datasets, statistical tests and plotting facilities.

Stixbox is developed on Scilab's Forge and is available on ATOMS.

Features

• Cumulative Distribution Functions: pbeta, pbinom, pnorm, etc.
• Datasets: 23 datasets (Cost-of-Living, Scottish Hill Race, Salary Survey, Unemployment, etc.)
• Graphics : histo, pairs, plotdens, plotsym, qqnorm, qqplot
• Inverse Cumulative Distribution Functions: qbeta, qbinom, qchisq, etc.
• Logistic Regression: lodds, loddsinv, logitfit, etc...
• Miscellaneous: linreg, betainc, bincoef, polyfit, stdboot, etc...
• Probability Density Functions: dbeta, dbinom, dchisq, etc.
• Random Numbers: rbeta, rbinom, rchisq, etc...
• Rejection Methods: rjbinom, rjgamma, rjpoiss.
• Tests, confidence intervals and model estimation: Nonparametric confidence interval for quantile
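
Assuming Stixbox keeps the R-style naming convention of the original Matlab toolbox (p* for the CDF, q* for its inverse, r* for random numbers), a typical session might look like the following sketch:

```scilab
// Hypothetical sketch assuming Stixbox's p/q/r naming convention:
// pnorm: normal CDF, qnorm: its inverse, rnorm: random numbers.
p = pnorm(1.96, 0, 1);   // about 0.975
x = qnorm(0.975, 0, 1);  // about 1.96
r = rnorm(100, 0, 1);    // 100 standard normal numbers
```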

### Low Discrepancy Toolbox

The goal of this toolbox is to provide a collection of low discrepancy sequences. These quasi-random sequences are designed to be used in Monte-Carlo simulations: in numerical integration, for example, low discrepancy sequences provide a faster convergence rate than pseudo-random numbers. The toolbox takes the dimension of the problem into account, i.e. it generates vectors of arbitrary size.

The low discrepancy toolbox is developed on Scilab's Forge and is available on ATOMS.

Overview of sequences:

• The Halton sequence,
• The Sobol sequence,
• The Faure sequence,
• The Reverse Halton sequence of Vandewoestyne and Cools,
• The Niederreiter base 2 and arbitrary base sequence.

Main features:

• manages an arbitrary number of dimensions,
• skips a given number of elements in the sequence,
• leaps over (i.e. ignores) a given number of elements from call to call,
• provides fast sequences based on compiled source code,
• suggests optimal settings to get the best of the sequences.

This module currently provides the following functions:

• lowdisc_cget : Returns the value associated with the given key for the given object.
• lowdisc_configure : Update one option of the current object and returns an updated object.
• lowdisc_destroy : Destroy the current object and returns an updated object.
• lowdisc_display : Prints the current sequence.
• lowdisc_new : Creates and returns a new sequence.
• lowdisc_next : Returns the next vector in the sequence.
• lowdisc_reset : Reset the random number generator.
• lowdisc_startup : Startup a random number object.
• lowdisc_terms : Returns several terms of the sequence.

The module provides the following functions to extend the maximum dimension of the Halton and Faure sequences:

• lowdisc_primes100 : Returns a matrix containing the first 100 primes.
• lowdisc_primes1000 : Returns a matrix containing the first 1000 primes.
• lowdisc_primes10000 : Returns a matrix containing the first 10000 primes.

It provides the following functions to suggest expert settings for the sequences:

• lowdisc_fauresuggest : Returns favorable parameters for Faure sequences.
• lowdisc_haltonsuggest : Returns favorable parameters for Halton sequence.
• lowdisc_niederbase : Returns optimal base for Niederreiter sequence.
• lowdisc_niedersuggest : Returns favorable parameters for Niederreiter sequence.
• lowdisc_sobolsuggest : Returns favorable parameters for Sobol sequences.
• lowdisc_soboltau : Returns favorable starting seeds for Sobol sequences.

This component currently provides the following sequences:

• "slow" sequences based on macros : Halton, Sobol, Faure, Reverse Halton, Niederreiter base 2,
• "fast" sequences based on C source code : Halton, Sobol, Faure, Reverse Halton, Niederreiter in arbitrary base.

To install it, type:

`atomsInstall('lowdisc')`

The following example plots the 2D Faure sequence.

```lds = lowdisc_new("fauref");
lds = lowdisc_configure(lds,"-dimension",2);
lds = lowdisc_startup (lds);
[lds,computed] = lowdisc_next (lds,100);
lds = lowdisc_destroy(lds);
plot(computed(:,1),computed(:,2),"bo");
xtitle("Faure sequence","X1","X2");```

This produces the following figure.
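
The same call sequence can be used for quasi-Monte-Carlo integration. As a sketch, the following estimates the integral of f(x1,x2) = x1*x2 over the unit square (exact value 1/4):

```scilab
// Quasi-Monte-Carlo estimate of the integral of x1*x2
// over [0,1]^2 (exact value: 0.25), using the fast Faure sequence.
lds = lowdisc_new("fauref");
lds = lowdisc_configure(lds, "-dimension", 2);
lds = lowdisc_startup(lds);
[lds, u] = lowdisc_next(lds, 1000);   // 1000 points in [0,1]^2
lds = lowdisc_destroy(lds);
approx = mean(u(:,1) .* u(:,2));      // should be close to 0.25
```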

This module was first reviewed on 17 June 2010: Low Discrepancy Sequences

### NIST Dataset

The goal of this toolbox is to provide a collection of datasets distributed by NIST.

The NIST Standard Reference Datasets project aims to improve the accuracy of statistical software by providing reference datasets with certified computational results, which enable the objective evaluation of statistical software.

The following is a list of functions in this toolbox.

• nistdataset_getpath — Returns the path to the current module.

Moreover, the module provides 34 datasets from NIST in the following categories:

• Univariate Summary Statistics (9 datasets)
• Non Linear Least Squares (25 datasets)

Datasets from other categories are provided on the NIST website but cannot be read by the current toolbox. However, it should be straightforward to extend the toolbox to read the other categories of files.

The reference website for this project is the Statistical Reference Datasets. The nistdataset toolbox is developed on Scilab's Forge and is available on ATOMS.

To install it, type:

`atomsInstall('nistdataset')`

The nistdataset_read function reads a dataset from NIST. In the following example, we read the Gauss2 dataset in Scilab.

```path = nistdataset_getpath();
filename = fullfile(path,"datasets","nls","lower","Gauss2.dat");
data = nistdataset_read(filename)```

The previous script produces the following output:

```-->data = nistdataset_read(filename)
data  =

NISTDTST Object:
===========

name: Gauss2
category: Nonlinear Least Squares Regression
description:
The data are two slightly-blended Gaussians on a
decaying exponential baseline plus normally
distributed zero-mean noise with variance = 6.25.
reference:
Rust, B., NIST (1996).
datastring:
1 Response  (y)
1 Predictor (x)
250 Observations
Lower Level of Difficulty
Generated Data
model:
Exponential Class
8 Parameters (b1 to b8)

residualSumOfSquares: 1247.5282
residualStandardDev: 2.270479
degreeFreedom: 242
numberOfObservations: 250
x: 250-by-1 constant matrix
y: 250-by-1 constant matrix
start1: 8-by-1 constant matrix
start2: 8-by-1 constant matrix
parameter: 8-by-1 constant matrix
standarddeviation: 8-by-1 constant matrix
sampleMean: []
sampleSTD: []
sampleAutocorr: []```

From there, it is easy to access the x and y fields of the data structure:

```-->size(data.x)
ans  =
250.    1.
-->data.x
ans  =
1.
2.
3.
4.
5.
[...]
-->size(data.y)
ans  =
250.    1.
-->data.y
ans  =
97.587761
97.763443
96.567047
92.52037
91.15097
95.217278
[...]```

For example, the following script:

```scf();
plot(data.x,data.y,"bo")```

The previous script plots the data.

### Regression Tools

The toolbox regtools provides functions for performing linear and non-linear regression analysis.

The regtools module provides the following functions:

• linregr() : an interactive user interface for linear regression analysis, including plot facilities and the most relevant statistical information at the solution.
• nlinregr() : an interactive user interface for performing non-linear (weighted) regression analysis. Plot facilities and statistical information are also available here. Both functions can be called in silent command-line mode.
• nlinlsq() : a more flexible non-linear (weighted) regression analysis function, called by nlinregr(). nlinlsq() uses the Scilab function optim() for solving the regression problem, and supports both analytical and numerical derivatives.
• qqplot() : quantile-quantile plots.

This module is developed by Torbjorn Pettersen and is available on ATOMS.

To install it, type:

`atomsInstall('regtools')`

The following plot is a demo of the Regression toolbox. A more complete description of this module is available online.

### NaN-Toolbox

This toolbox provides classification and statistics functions. It is especially written for data with missing values encoded as NaN. It is a Scilab port of the NaN-toolbox for Matlab/Octave.

This toolbox is developed by Holger Nahrstaedt under GPL (2.1).

The classification routines include routines to train a classifier (nan_train_sc, nan_classify, svmtrain, train), routines for testing (nan_test_sc, predict, svmpredict) and routines for visualisation (nan_confusionmat, nan_partest, nan_rocplot).

The Nan toolbox provides the following functions.

• Data Correlation and Covariance
• nan_conv — Convolve two vectors.
• nan_conv2 — performs 2D convolution of matrices a and b
• nan_conv2nan — calculates 2-dim convolution between X and Y
• nan_cor — calculates the correlation matrix
• nan_corrcoef — calculates the correlation matrix from pairwise correlations.
• nan_corrcov — Compute correlation matrix from covariance matrix.
• nan_cov — calculates covariance matrix
• nan_covm — generates covariance matrix
• nan_decovm — decomposes extended covariance matrix
• nan_ecovm — produces an extended Covariance matrix,
• nan_partcorrcoef — calculates the partial correlation between X and Y after removing the influence of Z.
• nan_rankcorr — calculated the rank correlation coefficient.
• nan_tiedrank — compute rank of samples, the mean value is used in case of ties
• nan_xcorr — Compute correlation R_xy of X and Y for various lags k:
• nan_xcorr2 — Compute the 2D correlation
• nan_xcov — Compute covariance at various lags[=correlation(x-mean(x),y-mean(y))].
• nan_xcovf — generates cross-covariance function.
• Classification
• nan_cat2bin — converts categorial into binary data
• nan_classify — classifies sample data into categories
• nan_confusionmat — Confusion matrix for classification algorithms.
• nan_fss — feature subset selection and feature ranking
• nan_kappa — estimates Cohen's kappa coefficient
• nan_mahal — return the Mahalanobis' D-square distance
• nan_partest — calculates the performance, based on Bayes theorem, of a clinical test
• nan_rocplot — plot a Receiver Operating Characteristic (ROC) curve
• nan_row_col_deletion — selects the rows and columns for removing any missing values.
• nan_svmrocplot — draws the receiver operating characteristic (ROC) curve for an SVM model
• nan_test_sc — apply statistical and SVM classifier to test data
• nan_train_lda_sparse — Linear Discriminant Analysis for the Small Sample Size Problem
• nan_train_sc — Train a (statistical) classifier
• nan_xval — is used for crossvalidation
• predict — Does prediction for a calculated svm model
• svmpredict — Does prediction for a calculated svm model
• svmtrain — trains a svm model
• train — trains a linear model
• Cluster Analysis
• nan_kmeans — K-means clustering algorithm.
• Descriptive Statistics
• nan_center — removes the mean
• nan_coef_of_variation — returns STD(X)/MEAN(X)
• nan_detrend — removes the trend from data, NaN's are considered as missing values
• nan_ecdf — empirical cumulative function
• nan_geomean — calculates the geometric mean of data elements.
• nan_grpstats — Summary statistics by group.
• nan_harmmean — calculates the harmonic mean of data elements.
• nan_hist2res — Evaluates Histogram data
• nan_histc — Produce histogram counts.
• nan_histo — calculates histogram for each column
• nan_histo2 — calculates histogram of each column
• nan_histo3 — calculates histogram and performs data compression
• nan_histo4 — calculates histogram for rows and supports data compression
• nan_iqr — calculates the interquartile range
• nan_kurtosis — estimates the kurtosis
• nan_mad — estimates the Mean Absolute deviation
• nan_mean — calculates the mean of data elements.
• nan_meanAbsDev — estimates the Mean Absolute deviation
• nan_meandev — estimates the Mean deviation
• nan_meansq — calculates the mean of the squares
• nan_medAbsDev — calculates the median absolute deviation
• nan_median — median data elements,
• nan_moment — estimates the p-th moment
• nan_percentile — calculates the percentiles of histograms and sample arrays.
• nan_prctile — calculates the percentiles of histograms and sample arrays.
• nan_quantile — calculates the quantiles of histograms and sample arrays.
• nan_range — Range of values
• nan_ranks — gives the rank of each element in a vector.
• nan_rms — calculates the root mean square
• nan_sem — calculates the standard error of the mean
• nan_skewness — estimates the skewness
• nan_spearman — Spearman's rank correlation coefficient.
• nan_statistic — estimates various statistics at once.
• nan_std — calculates the standard deviation.
• nan_sumsq — calculates the sum of squares.
• nan_trimean — evaluates basic statistics of a data series
• nan_trimmean — calculates the trimmed mean by removing the upper and lower tails of the data
• nan_var — calculates the variance.
• nan_y2res — evaluates basic statistics of a data series
• nan_zScoreMedian — removes the median and standardizes by the 1.483*median absolute deviation
• nan_zscore — removes the mean and normalizes the data to a variance of 1.
• File I/O
• writesparse — writes sparse matrix to a file in LIBSVM format
• xptopen — Read and write in stata fileformat
• Hypothesis Tests
• nan_ttest — (paired) t-test
• nan_ttest2 — (unpaired) t-test
• Utility functions
• flag_accuracy_level — sets and gets accuracy level
• flag_impl_significance — sets and gets default alpha (level) of any significance test
• flag_impl_skip_nan — sets and gets the default mode for handling NaNs
• flag_nans_occured — checks whether the last call(s) to sumskipnan or covm encountered any NaN
• nan_accumarray — Create an array by accumulating the elements of a vector into the positions defined by their subscripts.
• nan_fft — matlab compatible fft
• nan_flix — floating point index - interpolates data in case of non-integer indices
• nan_grp2idx — Create index vector from a grouping variable.
• nan_ifft — matlab compatible ifft
• nan_ismember — Checks which elements of one matrix are member of an other matrix
• nan_mgrp2idx — Convert multiple grouping variables to index vector
• nan_postpad — append the scalar
• nan_prepad — prepend the scalar
• nan_unique — Return the unique elements of x, sorted in ascending order.
• str2array — C-MEX implementation of STR2ARRAY - this function is part of the NaN-toolbox.
• sumskipnan — adds all non-NaN values.
• Statistical Visualization
• nan_andrewsplot — Andrews plot for multivariate data.
• nan_boxplot — Draw a box-and-whiskers plot for data provided as column vectors.
• nan_cdfplot — plots the empirical cumulative distribution function
• nan_ecdfhist — Create histogram from ecdf output.
• nan_errorb — plots error bars
• nan_errorbar — puts an error bar range onto a plot
• nan_fscatter3 — Plots point cloud data
• nan_gplotmatrix — Scatter plot matrix with grouping variable.
• nan_gscatter — scatter plot of groups
• nan_hist — Histogram.
• nan_nhist — Histogram
• nan_normplot — Produce a normal probability plot for each column of X.
• nan_parallelcoords — Parallel coordinates plot for multivariate data.
• nan_plotmatrix — scatter plot matrix, h = nan_plotmatrix(x,y)

The Nan Toolbox is available on ATOMS:

To install it, type:

`atomsInstall('nan')`
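
As a minimal sketch (assuming nan_mean follows the usual column-wise convention), the toolbox functions simply skip NaN entries where the standard Scilab functions propagate them:

```scilab
// Mean of a sample with a missing value encoded as NaN:
// the standard mean propagates the NaN, nan_mean skips it.
x = [1; 2; %nan; 4];
disp(mean(x));      // %nan
disp(nan_mean(x));  // 7/3, the mean of the observed values
```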

### The NISP toolbox

This module performs sensitivity analysis, i.e. the analysis of the uncertainty in the output of a given model as a function of the uncertainty in its inputs.

The analysis is based on polynomial chaos: orthogonal polynomials are used as an approximation of the original model. Once the coefficients of the polynomial chaos expansion are computed, the associated sensitivity indices are straightforward to obtain.

The module provides the following components:

• "nisp" provides functions to configure the global behaviour of the toolbox. This allows one to start up and shut down the library, configure and query the verbose level, or initialize the seed of the random number generator.
• "randvar" is the class which manages a random variable. Various types of random variables are available, including uniform, normal, exponential, etc.
• "setrandvar" is the class which manages a set of random variables. Several methods are available to build a sampling from a set of random variables: for example, a Monte-Carlo sampling or a Sobol low discrepancy sequence. This feature allows the class to be used as a Design of Experiments (DOE) tool.
• "polychaos" is the class which manages a polynomial chaos expansion. The coefficients of the expansion are computed from given numerical experiments which associate the inputs with the outputs. Once computed, the expansion can be used as a regular function, and the mean, standard deviation or quantiles can be retrieved directly.

The current toolbox provides an object-oriented interface to the C++ NISP library.

The NISP toolbox provides the following features:

• manage various types of random variables: uniform, normal, exponential, log-normal,
• generate random numbers from a given random variable,
• transform an outcome from a given random variable into another,
• manage various sampling methods for sets of random variables: Monte-Carlo, Sobol, Latin Hypercube Sampling, various samplings based on Smolyak,
• manage polynomial chaos expansions and get specific outputs, including mean, variance, quantiles, correlation, etc.,
• generate the C source code which computes the output of the polynomial chaos expansion.

Additionally, the nisp_sobolsa function implements the Sobol method for sensitivity analysis. It computes the first order sensitivity indices, the total sensitivity indices and all the sensitivity indices.
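
As a hypothetical sketch of this (the exact nisp_sobolsa calling sequence and the distribution of its inputs are assumptions to be checked against the module's help), the Sobol indices of the Ishigami benchmark might be computed along these lines:

```scilab
// Hypothetical sketch: the nisp_sobolsa signature is an assumption.
function y = myishigami(u)
    // Ishigami benchmark; u is assumed uniform in [0,1]^3,
    // mapped here to the usual [-%pi,%pi]^3 domain.
    x = -%pi + 2*%pi*u;
    y = sin(x(:,1)) + 7*sin(x(:,2)).^2 + 0.1*x(:,3).^4 .* sin(x(:,1));
endfunction
// First-order and total sensitivity indices of the 3 inputs:
[s, st] = nisp_sobolsa(myishigami, 3);  // assumed signature
```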

This module is developed by Michael Baudin (Digiteo) and Jean-Marc Martinez (CEA). The module is provided under the LGPL licence.

The NISP toolbox is developed on Scilab's Forge and is provided on ATOMS.

The following figure is the histogram of the output of the Ishigami function, a classical benchmark in sensitivity analysis.

More details on this module are provided on the wiki:

### The libsvm toolbox

The libsvm toolbox provides a simple interface to LIBSVM, a library for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). It is very easy to use as the usage and the way of specifying parameters are the same as that of LIBSVM.

This tool also provides a simple interface to LIBLINEAR, a library for large-scale regularized linear classification (http://www.csie.ntu.edu.tw/~cjlin/liblinear). It is very easy to use, as the usage and the way of specifying parameters are the same as those of LIBLINEAR.

The libsvm v1.2.2 toolbox is an update of the libsvm v1.0 toolbox, first distributed on 7 November 2011.

This interface was initially written by Jun-Cheng Chen, Kuan-Jen Peng, Chih-Yuan Yang and Chih-Huai Cheng from the Department of Computer Science, National Taiwan University.

It was converted to Scilab 5.3 by Holger Nahrstaedt from TU Berlin.

This Toolbox is compatible with the NaN-toolbox.

The libsvm toolbox provides the following functions :

• libsvmwrite — writes sparse matrix to a file in LIBSVM format
• predict — Does prediction for a calculated svm model
• svmconfmat — Confusion matrix for classification algorithms.
• svmgrid — parameter selection tool for C-SVM classification using the RBF (radial basis function) kernel
• svmgridlinear — parameter selection tool for linear classification
• svmnormalize — scale the input data for correct learning
• svmpartest — calculates the performance, based on Bayes theorem, of a clinical test
• svmpredict — Does prediction for a calculated svm model
• svmrocplot — draws the receiver operating characteristic (ROC) curve for an SVM model
• svmscale — scale the input data for correct learning
• svmtoy — shows the two-class classification boundary of the 2-D data
• svmtrain — trains a svm model
• train — trains a linear model

The libsvm toolbox is provided under the BSD license.

The toolbox provides the following demos :

• scaling_demo : train_demo.sce
• linear_demo : linear_demo.sce
• rbf_demo 1 : rbf_demo1.sce
• rbf_demo 2 : rbf_demo2.sce
• rbf_demo 3 : rbf_demo3.sce
• three class demo : three_class_demo.sce
• performance_demo : performance_demo.sce
• liblinear_performance_demo : liblinear_performance_demo.sce
• svm fitting demo : svm_fitting_demo.sce
• linear weight demo : linear_weight_demo.sce
• svmtoy demo : svmtoy_demo.sce
• outlier_detection : outlier_detection.sce

The libsvm toolbox is provided on ATOMS and is developed on Scilab's Forge.

To install this toolbox, type:

`atomsInstall('libsvm')`

and restart Scilab.
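
Since the interface mirrors LIBSVM's Matlab interface, a training session might look like the following sketch (the option string follows LIBSVM's command-line flags; the data here are made up):

```scilab
// Hypothetical sketch of the LIBSVM-style calling sequence:
// two Gaussian clouds, labeled +1 and -1.
x = [grand(50, 2, "nor", 0, 1); grand(50, 2, "nor", 3, 1)];
label = [ones(50, 1); -ones(50, 1)];
model = svmtrain(label, x, "-c 1 -g 0.5");  // C-SVM with RBF kernel
[pred] = svmpredict(label, x, model);       // predicted labels
disp(sum(pred == label) / 100);             // fraction correctly classified
```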

The "linear_demo" produces the following graphics. This review was first published at :

### Other Toolboxes

There are other significant toolboxes for Scilab which are relevant to this field. There are two toolboxes for Neural Networks:

More toolboxes are available in the Statistics category of ATOMS :

Other modules are available in the former Toolbox Center:

For example, a boxplot toolbox is available at:

## Documents and tutorials

In this section, we present documents, tutorials and books which present practical uses of Scilab on probabilities and statistics.

### In English

• Introduction to Discrete Probabilities with Scilab (91 pages) (PDF) (LaTeX Sources), Available under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License, Michaël Baudin, Consortium Scilab - DIGITEO, 2010

• Scientific Methods in Mobile Robotics, Ulrich Nehmzow, Springer, 2006. See especially the chapter "Statistical Tools for Describing Experimental Data", with examples in Scilab.

### In French

• Processus stochastiques et modélisation, (Cours et exercices corrigés), L3 MIAGE, Université de Nice-Sophia Antipolis, 2011-2012, Chapitres 1,2,3 (PDF)

• Probabilités et statistiques avec Scilab (PDF), Jean-Marc Decauwert, 2011

• Probabilités-statistiques pour l'agrégation de mathématiques à l'université de Lille 1 (HTML+PDF+scripts), Charles Suquet, 2005

• Démarrer en Scilab et statistiques en Scilab (HTML), B. Ycart, 2001

• MAP 311 - Mathématiques Appliquées, Introduction aux probabilités, Sylvie Méléard, (HTML)

• Introduction à Scilab pour les probabilités, Jean-Marc Steyaert, http://www.lix.polytechnique.fr/Labo/Jean-Marc.Steyaert/Scilab/Scilab.html

• Ingénierie Stochastique [M2 MASS]
• Fonctions itérées stochastiques et images fractales, Programmes Scilab :
• Chaînes en auto-interaction. Chaînes renforcées, Programmes Scilab :
• Algorithme de Robbins Monro et recherche de médianes, Programmes Scilab :
• Application du recuit simulé au problème du voyageur de commerce, Programmes Scilab :
• Filtre de Kalman Bucy et estimation de signaux linéaires et gaussiens, ou à espace fini. Algorithmes génétiques, systèmes de particules en interaction, arbres généalogiques, Programmes Scilab :
• Analyse et estimation d'événements rares. Méthodes de branchement par niveaux, arbres généalogiques, Programmes Scilab :
• Analyse et simulation de macro-polymères. Marches sans intersection et constantes de connectivité, Programmes Scilab :
• Université Paris Diderot (Paris VII) et Pierre et Marie Curie (Paris VI), Année 2011/2012, Préparation à l'Agrégation externe de Mathématiques (Option Probabilités), Laurent Mazliak
• Feuille de TP n°1 : Introduction Probabiliste à Scilab (Correction) : (PDF)

• Script Scilab: (SCE)

2022-09-08 09:27