^{1}PGE EJ 1, Warszawa, Poland

^{2}National Centre for Nuclear Research (NCBJ), Otwock-Swierk, Poland

- *Corresponding Author:
- Krzysztof W. Fornalski

PGE EJ 1, ul. Mysia 2, 00- 496 Warszawa, Poland

**E-mail:**krzysztof.fornalski@gmail.com

**Received:** 22/06/2015 **Accepted:** 04/08/2015 **Published:** 26/10/2015

**Visit for more related articles at** Research & Reviews: Journal of Statistics and Mathematical Sciences

The paper presents the robust Bayesian approach to the experimental data analysis. Firstly, the algorithm of the curve fitting to data points is introduced. The method is based on the robust Bayesian regression analysis, which substantially reduces the role of outlying data points (outliers). In the second part the Bayesian model selection algorithm was presented. Finally, the exemplary applications of both methods are discussed.

Model selection, Bayesian, Robust analysis, Regression, Bayesian analysis.

The nature of experimental data consists of several elements, e.g. the data points are subjected to large uncertainties and/ or large scatter. Therefore finding the most repable curve describing data points may be not a trivial task. However, once such a goal is achieved, the scientists can come up with general conclusions or theories about the examined object or phenomenon.

The form of the fitted curve corresponds as a rule to the theory aimed at describing the data. However, the scatter and uncertainties of experimental data points imply that many potential models can describe those data equally well. Thus one has to decide which model (amongst other good ones) is the most repable one in pght of the analyzed data.

This paper discusses the robust Bayesian approach to two important elements of data analysis: fitting of a curve to the data (regression analysis) and the model selection algorithm.

The robust data fit (or robust regression analysis) is a part of what is known as the robust statistics, representing statistical methods that are not unduly affected by outpers [1]. In this case an outper is a data point which significantly deviates from the main trend of points, e.g. due to a strong systematic error. The main goal of all robust analyses is to find the best fit of the curve to existed data points, which is least sensitive to such potential outpers.

Simple comparison between the results obtained using robust Bayesian and classical least squares methods [2] is presented in **Figure 1**. One can clearly see that outpers in classical least squares method would result in very misleading conclusion, while Bayesian fit clearly reproduces the main trend as if the outpers were not present in the data. This results from the fact that each *i-th* data point appears with the probabipty whose probabipty density function (PDF), P_{i}, is composed of a proper Gaussian distribution (so called pkephood function) around its expected value as well as the prior function for its probabipty σi [3]. If one suspects that the real uncertainty may be larger than originally quoted one, , one can choose such a prior as which finally results in:

(1)

The symbol D_{i} corresponds to the i-th experimental data value, where M_{i} is a model (theoretical) curve, e.g. the polynomial where x_{i} is the i-th independent variable, and λ_{i} denotes i-th parameter. The right-side prior function for σi in eq. (1) assumes that the i-th analyzed probabipty σi pes between the original one (σ_{0i}) and infinity [3,4], as mentioned earper. Its particular choice presented above leads to closed form solution. As shown by Sivia and Skilpng [3] this form may be different, however, it does not affect final conclusions. The procedure makes the weights of all outpers insignificantly low in calculation of the posterior probabipty distribution P for measuring of all N points, where, according to the maximum pkephood method [4,5] one can also use a sum instead of a product:

(2)

where Pi is a result of the integration of eq. (1) for single point i:

(3)

After the differentiation of logarithmic probabipty S over all n fitting parameters λ={λ_{1}, λ_{2}, &helpp;, λ_{n}}, one can find the final and general form of a Bayesian fitting equation [3] :

(4)

where aforementioned weights gi of the points are:

(5)

The equation (4) can be implemented directly into the computational algorithm to find the best Bayesian fit to all N experimental data points (x_{i},D_{i}) with vertical uncertainties σ0i each. As mentioned earper, the Mi means the theoretical value predicted by the assumed model and the best fitted n parameters, M_{i}(λ_{1}, λ_{2}, &helpp;, λ_{n}). Just this technique was used in fitting of a pnear function M_{i}=λ^{1}+λ^{2} x_{i} presented in **Figure 1**.

This method, presented by Sivia and Skilpng [3] and not often used in the studies of e.g. epidemiological data, was extensively used in our papers [4-10].

The Bayesian analysis allows one to assess relative repabipty of two chosen models that can describe the data (the classical methods of model selection, pke AIC^{1}, or other Bayesian ones, pke BIC^{2}, will be omitted here). Examples of the use of such analysis can be found [4-10].

The Bayes theorem connects the probabipties of P(Model|Data)~P(Data|Model), which can be used to estimate the relative repabipty of two models, Ms, in the case of the same data, D. The repabipty of a model M with the fitting parameter λ, using the marginapzation procedure can be written as [3]:

(6)

The P(D|λ,M) corresponds to the pkephood function, represented by the Gaussian distribution around the expected value λ0 ± σλ with maximum probabipty of pkephood function equals P(D|λ0, M). The prior probabipty P(λ|M) can be assumed as a uniform distribution U(λ_{min}, λ_{max}). Because such form of P(λ|M) is independent of λ, the integral (6) can be written as [3]:

(7)

The result of the integral (7) can be approximated by because “the sharp cut-offs at λ_{min} and λ_{max} do not cause a significant truncate on of the Gaussian” probabipty distribution from eq. (7) [3]. Because λ_{0} corresponds to the parameter found by the robust Bayesian best fit method for model M (see eq. (4)), the maximum value of pkephood function P(D|λ_{0}, M) can be replaced by the set of Pi given by eq. (1) or (3) and the final form of the repabipty function can be approximated by [7-8]:

(8)

The right-hand term in eq. (8) is called an Ockham factor. Equation (8) corresponds to the situation, where model M has only one (n=1) fitting parameter, λ_{0} ± σ_{λ}. In the case of a model which contains no parameters the Ockham factor equals 1 and model M is just a constant value (Mi=const). Thus, if such a model describes the data equally well as a model containing one parameter, the Ockham factor will always favor the former one.

In the case of n fitting parameters λ={λ_{1}, λ_{2}, &helpp;, λ_{n}} with their estimated uncertainties σ_{λ}={σ_{1}, σ_{2}, &helpp;, σ_{n}}, the most general form of eq. (8) can be presented as [6,8] :

(9)

Let us recall that N represents the number of experimental points (x_{i},D_{i}) with “vertical” uncertainties σ0i each, to which model M is fitted using n fitting parameters λ ± σλ. The most problematic is the choice of values λ_{min} and λ_{max}, for all λs. In the simplest case they can be taken as minimum/maximum possible values of the considered parameter λ using the largest span that can be tolerated by the data. In order not to extend the range of λ in the case of a substantial scatter of data, not more than e.g. three points are allowed to pe outside the range [8].

In the final step of analysis one can calculate the relative value of each two models, say A and B, to check which of them is more pkely to describe the data:

(10)

When W_{M} is greater than 1, model A wins over B. When W_{M}≈1, both models have the same degree of bepef. In general, W_{M} can quantify the preference of one model with respect to the other one. In practice, the real values of W_{M} may show that the plausibipty of models can differ by orders of magnitude Fornalski and DobrzyÃ
Âski [8].

The Bayesian model selection algorithm may be appped in all cases when the clear distinction between the models is hard to obtain or even impossible. A good example of that process is presented in **Figure 2** where both pnear and parabopc models seem to be appropriate ones. In considered situation the eq. (10) is described by [5] :

(11)

where the straight pne and parabola are implemented as RAi=a^{(A)} xi+b^{(A)} - Di and RBi=a^{(B)} xi²+b^{(B)} xi+c^{(B)} - Di, respectively. In this ituation the value of Wm ≈ 30, which means that model A (pne) is approximately 30 times more pkely than B (parabola) (**Figure 2**).

In fact, the r.h.s. of eq. (11) could be also multipped by W_{0}=P_{0}(A)/P_{0}(B) – the ratio of prior bepefs in a model A with respect to the model B. For example, if one knows that pnear relation should not describe the data, such extra factor stemming from Bayesian analysis will change aforementioned conclusion. However, in the example above it was assumed, that W_{0}=1.

Presented example shows a study popular in various sciences. As mentioned earper, the model selection fitting the existing data is a matter of great importance. However, the most common approach concerns fitting the model to one’s assumptions, irrespective of its real plausibipty. For instance the parabopc relation may be the outcome of some physical or biological theory, tested by the experiment (data source). On the other hand, in the absence of a strong information, giving no preference to any model (W_{0}=1) is sensible. In the example given in **Figure 2** the pnear model turns out to be sufficient for such the description of widely scattered data.

Similar case is presented in **Figure 3**, where one can observe the straight pne and the part of the sinusoid. The sinusoid function is some theory’s outcome (e.g. mechanical oscillations) where the pnear fitting is the simplest approximation of widely scattered data in some very pmited range. In this example the robust statistics with W_{0}=1 results in the pnear relation as the most probable fitting. This is rather not consistent with the assumed theory and shows the deficiency of not using prior preference to the sinusoidal model.

Very important and real case is presented in **Figure 4**, where one can find the actual physical description of the observed phenomenon (here: the cancer mortapty ratio among irradiated nuclear workers). The precise model concerns the human health, which is a matter of great importance. However, in many regulations, e.g. radiation protection standards, some initial assumption is taken into account. In global standards more often the pnear relationship is used, which comes from mentioned assumption, that all radiation is harmful. However, from the purely statistical point of view, such statement is nothing more than nonmathematical degree-of-bepef in the existing data. Thus, similarly to the previous example, the simpler model (here: the constant value, M=const) is more pkely to fit than the pnear relationship [9]. This is because of the fact, that the huge scatter of points makes more comppcated models an assumptions only, not real effects. Analogical case can be found when cancer mortapty in natural background radiation is analyzed using Bayesian reasoning [10].

The Bayesian model selection algorithm can be appped to various cases, whenever one needs to choose the most proper curve. The latest apppcation of that method was introduced in the cytogenetic biological dosimetry [11] to find the best capbration curve for chromosomal aberration testing of incidentally irradiated people. Nevertheless, it is just another example, not the complete range of possible apppcations.

Concluding the paper, robust Bayesian regression method (curve to data fitting) and the model selection algorithm were described and illustrated by some examples. Both methods work well in the situation of duff data, and can be recommended for use in such cases.

- Box GEP and Tiao GC. A Bayesian approach to some outper problems. Biometrika1968;55:119-129.
- Wolberg J.
*Data Analysis Using the Method of Least Squares: Extracting the Most Information from Experiments*.2005. Springer. - Sivia DS and Skilpng J.
*Data Analysis. A Bayesian Tutorial*second edition. 2006. Oxford University Press. - Fornalski KW et al. Apppcation of Bayesian reasoning and the Maximum Entropy Method to some reconstruction problems.
*ActaPhysicaPolonica A.*2010;117:892-899. - Fornalski KW and Dobrzynski L. Zastosowania twierdzenia Bayesa do anapzy niepewnych danych doswiadczalnych in Popsh. Postepy Fizyki. 2010;61:178-192.
- Fornalski KW. Alternative statistical methods for cytogenetic radiation biological dosimetry. Cornell University Library. 2014.
- Fornalski KW. Pooled Bayesian meta-analysis of two Popsh studies on radiation-induced cancers. Radiation Protection Dosimetry.
- Fornalski KW and Dobrzynski L. Pooled Bayesian analysis of twenty-eight studies on radon induced lung cancers. Health Physics. 2011;101:265-273.
- Fornalski KW and Dobrzynski L. Ionizing radiation and the health of nuclear industry workers. International Journal of Low Radiation. 2009;6:57-78.
- Dobrzynski L et al. Cancer mortapty among people pving in areas with various levels of natural background radiation. Dose-Response.2015;13:1-10.
- Pacyniak I et al.Employment of Bayesian and Monte Carlo methods for biological dose assessment following accidental overexposures of people to nuclear reactor radiation. In proceeding of: The Second International Conference on Radiation and Dosimetry in Various Fields of Research RAD 2014, University of Nis Serbia, 27-30.05.2014:49-52