ISSN: 2322-0066

On New Generalized Logistic Distributions and Applications

Barreto FHS, Mota JMA and Rathie PN*

Department of Statistics and Applied Mathematics, Federal University of Ceara, Fortaleza, Brazil

*Corresponding Author:
Rathie PN, Department of Statistics and Applied Mathematics, Federal University of Ceara, Fortaleza, Brazil
E-mail: pushpanrathie@yahoo.com

Received date: 06/03/2017; Accepted date: 18/04/2017; Published date: 07/04/2017

Visit for more related articles at Research & Reviews: Research Journal of Biology

Abstract

In 2006, Rathie and Swamee proposed a generalization of the logistic distribution that is more flexible and can be multimodal. This work adds a new parameter to increase the flexibility of that distribution and also presents an asymmetric distribution obtained by the Azzalini method, which adds a further parameter of asymmetry. Five data sets (Human Body Fat Index, HIV, Precipitation, pH Concentration, Relative Humidity) are analysed by applying the new distributions. The parameters of the new distributions and of the mixture of normals were estimated by the maximum likelihood method. Because calculating the estimates of the new distributions is mathematically complex, we use iterative numerical methods such as L-BFGS-B, BFGS and SANN, together with an adaptive barrier algorithm that enforces the parameter constraints and a function that searches for the global maximum of a very complex non-linear objective function to supply initial values for the estimation algorithm. All computational work was implemented in the software R. In most cases, we use Hartigan's test to assess unimodality. We use the Kolmogorov-Smirnov test at a significance level of 5% and apply various criteria, such as Mean Square Error, Mean Absolute Deviation and Maximum Deviation, to indicate the best fit. The classical and general method for multimodal adjustment is a mixture of distributions, in particular the mixture of normal distributions, because the normal distribution has good mathematical properties. For the mixture of normals, we use the EM algorithm to calculate the estimates. We also use the Akaike Information Criterion and the Bayesian Information Criterion as selection criteria to highlight the best distribution, in both cases comparing with the mixture of normal distributions to illustrate the applicability of the results derived in this paper.

Keywords

Rathie-Swamee distribution, Azzalini method, multimodal data set analysis, Akaike information criterion, Bayesian information criterion, maximum likelihood method, Kolmogorov-Smirnov test

Introduction

There are several classical models, such as the normal, exponential, binomial, Poisson and logistic, for analyzing different data sets. As there is no single unified model, new models suited to the data sets under consideration have to be constructed. The logistic model is very useful in many areas of statistics and physics. This article is organized as follows: Section 2 deals with the symmetric generalized logistic distribution, whereas Section 3 studies its skew form. Section 4 applies the results of the earlier sections to five real data sets and compares them with the mixture of two normal distributions where possible. The article ends with a short conclusion and a list of references.

Rathie et al. [1] defined a multimodal symmetric distribution function G(x) for a random variable X ∼ RS(a, b, p) as

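$$G(x) = \left[1 + \exp\{-x\,(a + b|x|^{p})\}\right]^{-1}, \quad x \in \mathbb{R}, \qquad (1)$$

with a and b not both zero. For b=0 or p=0, (1) reduces to the logistic distribution

$$G(x) = \left[1 + \exp(-cx)\right]^{-1}, \qquad (2)$$

where c=a or c=a+b, respectively. The density function corresponding to (1) is

$$g(x) = \frac{\left[a + b(1+p)|x|^{p}\right]\exp\{-x\,(a + b|x|^{p})\}}{\left[1 + \exp\{-x\,(a + b|x|^{p})\}\right]^{2}}. \qquad (3)$$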
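As a quick numerical illustration of (1) to (3), the following Python sketch assumes the Rathie-Swamee form G(x) = 1/(1 + exp[−x(a + b|x|^p)]). The function names `rs_cdf` and `rs_pdf` are illustrative and are not taken from the paper, whose computations were done in R.

```python
import math

def rs_cdf(x, a, b, p):
    """Assumed RS(a, b, p) distribution function: 1 / (1 + exp(-x (a + b|x|^p)))."""
    z = x * (a + b * abs(x) ** p)
    # Evaluate the logistic function of z without overflowing exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def rs_pdf(x, a, b, p):
    """Derivative of rs_cdf with respect to x."""
    G = rs_cdf(x, a, b, p)
    dz = a + b * (1.0 + p) * abs(x) ** p   # d/dx of x (a + b|x|^p)
    return dz * G * (1.0 - G)

# With p = 0 the model collapses to a logistic distribution with c = a + b.
logistic_value = rs_cdf(1.3, a=0.7, b=0.4, p=0.0)
```

The density is written as dz · G · (1 − G), which is algebraically the same as the quotient form of (3) but avoids computing large exponentials twice.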

Generalized Symmetric Logistic Distribution

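A symmetric distribution can be generated by the method proposed by Jones in 2004 [2]. Let U ∼ Beta(α, α) and X = G^{-1}(U), where G(x) is a distribution function with density g(x). Then the distribution function H(x) of X is given by

$$H(x) = \frac{1}{B(\alpha,\alpha)} \int_{0}^{G(x)} t^{\alpha-1}(1-t)^{\alpha-1}\,dt. \qquad (4)$$

Differentiating H(x) yields the corresponding density function

$$h(x) = \frac{g(x)\,[G(x)]^{\alpha-1}\,[1-G(x)]^{\alpha-1}}{B(\alpha,\alpha)}. \qquad (5)$$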
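The construction X = G^{-1}(U) with U ∼ Beta(α, α) also gives a direct way to simulate from the new model. The sketch below is a minimal Python illustration under the assumption that G is the Rathie-Swamee distribution function G(x) = 1/(1 + exp[−x(a + b|x|^p)]). Since no closed-form inverse is used here, G is inverted by bisection. The helper names are hypothetical.

```python
import math
import random

def rs_cdf(x, a, b, p):
    """Assumed RS(a, b, p) distribution function G(x)."""
    z = x * (a + b * abs(x) ** p)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def rs_quantile(u, a, b, p, steps=200):
    """Invert G by bisection; G is increasing when a, b >= 0 are not both zero."""
    lo, hi = -1.0, 1.0
    while rs_cdf(lo, a, b, p) > u:   # widen the bracket to the left
        lo *= 2.0
    while rs_cdf(hi, a, b, p) < u:   # widen the bracket to the right
        hi *= 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if rs_cdf(mid, a, b, p) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_rsg(n, a, b, p, alpha, rng):
    """Draw n values of X = G^{-1}(U) with U ~ Beta(alpha, alpha)."""
    return [rs_quantile(rng.betavariate(alpha, alpha), a, b, p) for _ in range(n)]

rng = random.Random(0)   # fixed seed so that the run is repeatable
draws = sample_rsg(1000, a=1.0, b=0.5, p=2.0, alpha=1.5, rng=rng)
```

Because Beta(α, α) is symmetric about 1/2 and G is symmetric about zero, roughly half of the simulated values should be positive.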

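Using (1) and (3) in (5), the generalized symmetric logistic density function for X ∼ RSG(a, b, p, α) is given by

$$h(x) = \frac{\left[a + b(1+p)|x|^{p}\right]\exp\{-\alpha x\,(a + b|x|^{p})\}}{B(\alpha,\alpha)\left[1 + \exp\{-x\,(a + b|x|^{p})\}\right]^{2\alpha}}, \quad x \in \mathbb{R}, \qquad (6)$$

where a and b are not both zero and B(., .) is the beta function. For α=1, (6) reduces to (3). We may introduce a location parameter μ in the model (6). There is no need to introduce a scale parameter; otherwise the density function would become non-identifiable. On introducing the location parameter μ ∈ R, the density function (6) takes the form

$$h(x-\mu) = \frac{\left[a + b(1+p)|x-\mu|^{p}\right]\exp\{-\alpha (x-\mu)\,(a + b|x-\mu|^{p})\}}{B(\alpha,\alpha)\left[1 + \exp\{-(x-\mu)\,(a + b|x-\mu|^{p})\}\right]^{2\alpha}}. \qquad (7)$$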
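The density with location parameter can be checked numerically. The Python sketch below assumes the beta-generated form h(x) = g(x)[G(x)(1 − G(x))]^{α−1}/B(α, α) with the Rathie-Swamee G, evaluates it on the log scale for stability, and verifies by Simpson's rule that it integrates to one. The function names and the parameter values are illustrative only.

```python
import math

def rsg_pdf(x, a, b, p, alpha, mu=0.0):
    """Assumed RSG density with location mu (beta-generated from the RS model)."""
    y = x - mu
    z = y * (a + b * abs(y) ** p)
    dz = a + b * (1.0 + p) * abs(y) ** p          # derivative of z with respect to y
    # exp(-alpha z) / (1 + exp(-z))^(2 alpha) is even in z, so |z| is used to avoid overflow.
    log_kernel = -alpha * abs(z) - 2.0 * alpha * math.log1p(math.exp(-abs(z)))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)   # log B(alpha, alpha)
    return dz * math.exp(log_kernel - log_beta)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

# Total probability for one arbitrary parameter choice; it should be close to 1.
area = simpson(lambda x: rsg_pdf(x, a=1.0, b=0.5, p=2.0, alpha=1.5, mu=2.0), -8.0, 12.0)
```

Working with log B(α, α) through `math.lgamma` keeps the normalizing constant usable for small α, such as the values below 0.1 that appear in the fitted models.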

Figures 1 to 4 show graphs of (6) and (7) for various values of the parameters μ, a, b, p and α.

Figure 1: Graphs of (6) and (7) for fixed a.

Figure 2: Graphs of (6) and (7) for fixed b.

Figure 3: Graphs of (6) and (7) for fixed p.

Figure 4: Graphs of (6) and (7) for fixed α.

Distribution function

In this subsection, we prove that the distribution function corresponding to (6) is given by

images 8

Proof. For x > 0, we have

images 9

Substituting images we get

images

we have

images 10

By symmetry, we easily write the result for x < 0.

Moments

In this subsection, we obtain the n-th moments about the origin. By definition,

when n is an even integer

images 11

Then, by expanding the denominator by binomial theorem, we have

images 12

when n is an even integer.

The variance of X ~ RSG (a, b, p, α) is given by

images 13
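The moments can also be approximated by direct numerical integration, which gives an independent check on any closed-form expression. The Python sketch below assumes the RSG density in the beta-generated form h(x) = g(x)[G(x)(1 − G(x))]^{α−1}/B(α, α) with the Rathie-Swamee G and no location parameter. The helper names and the integration limits are illustrative choices.

```python
import math

def rsg_pdf(x, a, b, p, alpha):
    """Assumed RSG(a, b, p, alpha) density, evaluated on the log scale."""
    z = x * (a + b * abs(x) ** p)
    dz = a + b * (1.0 + p) * abs(x) ** p
    log_kernel = -alpha * abs(z) - 2.0 * alpha * math.log1p(math.exp(-abs(z)))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)
    return dz * math.exp(log_kernel - log_beta)

def simpson(f, lo, hi, n):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def rsg_raw_moment(order, a, b, p, alpha, half_width=60.0, n=20000):
    """Approximate E[X^order] by integrating over [-half_width, half_width]."""
    return simpson(lambda x: x ** order * rsg_pdf(x, a, b, p, alpha),
                   -half_width, half_width, n)

# With b = 0, alpha = 1 and a = 1 the model is the standard logistic distribution,
# whose variance is pi^2 / 3.
var_logistic = rsg_raw_moment(2, a=1.0, b=0.0, p=1.0, alpha=1.0)
```

Because the density is symmetric about zero, odd moments vanish and the variance equals the second raw moment.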

Generalized Skew Logistic Distribution

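In the Azzalini density [3]

$$s(x) = 2\,v(x)\,V[w(x)], \quad x \in \mathbb{R}, \qquad (14)$$

with w(x) = kx, k ∈ R, take v(x) as the density function of X ∼ RSG(a, b, p, α) and V(x) as the distribution function of X ∼ RS(a, b, p). Then the density function of the generalized skew logistic model X ∼ RSGA(a, b, p, α, k) is given by

$$s(x) = 2\,h(x)\,G(kx), \quad x \in \mathbb{R}, \qquad (15)$$

where h(x) is the RSG density (6) and G(x) is the RS distribution function (1).

Introducing the location parameter μ ∈ R, the density function of X ∼ RSGA(a, b, p, α, k) is given by

$$s_1(x) = 2\,h(x-\mu)\,G[k(x-\mu)]. \qquad (16)$$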
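The skewing step can be sketched in a few lines of Python. The code assumes the Rathie-Swamee distribution function G(x) = 1/(1 + exp[−x(a + b|x|^p)]) for V and the beta-generated RSG density for v, so that the skew density is 2 h(x − μ) G[k(x − μ)]. All function names are illustrative.

```python
import math

def rs_cdf(x, a, b, p):
    """Assumed RS(a, b, p) distribution function."""
    z = x * (a + b * abs(x) ** p)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def rsg_pdf(x, a, b, p, alpha):
    """Assumed symmetric RSG(a, b, p, alpha) density."""
    z = x * (a + b * abs(x) ** p)
    dz = a + b * (1.0 + p) * abs(x) ** p
    log_kernel = -alpha * abs(z) - 2.0 * alpha * math.log1p(math.exp(-abs(z)))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)
    return dz * math.exp(log_kernel - log_beta)

def rsga_pdf(x, a, b, p, alpha, k, mu=0.0):
    """Azzalini-type skew density: 2 * h(x - mu) * G(k (x - mu))."""
    y = x - mu
    return 2.0 * rsg_pdf(y, a, b, p, alpha) * rs_cdf(k * y, a, b, p)

# k = 0 gives back the symmetric density because G(0) = 1/2.
symmetric_value = rsga_pdf(0.6, a=1.0, b=0.5, p=2.0, alpha=1.5, k=0.0)
```

Changing the sign of k mirrors the density about μ, which matches the use of k = ±0.7 to show the two directions of skewness.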

For certain values of the parameters, s(x) and s1(x) are plotted in Figure 5 for k = ±0.7 and in Figures 6 and 7 for a=0 and b=0, respectively.

Figure 5: Graphs of (15) and (16) for certain values of the parameters.

Figure 6: Graphs of (15) and (16) for certain values of the parameters with a=0.

Figure 7: Graphs of (15) and (16) for certain values of the parameters with b=0.

Applications Involving Real Data

In this section, five data sets are analyzed using the distributions defined in the earlier sections as well as the mixture of two normals for bimodal data. The parameters are estimated by the method of maximum likelihood. The Akaike Information Criterion [4], the Bayesian Information Criterion, the Mean Square Error, the Mean Absolute Deviation and the Maximum Absolute Deviation are calculated to judge the fit of the RSG, RSGA and mixture-of-two-normals models. The Kolmogorov-Smirnov goodness-of-fit test is used at a significance level of 5%. Several packages of the software R are used. The GenSA package [5] is used to obtain initial values for the iterative algorithm. For the iterative algorithm, we use bbmle::mle2 [6], in most cases with the BFGS method and the optimizer constrOptim, to guarantee that the estimated parameters stay within their parameter space. For more details on the adaptive barrier algorithm, see stats::constrOptim in R. We obtain the estimates of the parameters, approximate standard errors of the estimates based on a quadratic approximation to the curvature at the maximum likelihood estimate, and a test (z test) of each parameter's difference from zero based on this standard error and on the assumption that the sampling distribution of the estimated parameters is normal.
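To make the estimation step concrete, the Python sketch below writes down the negative log-likelihood that an optimizer would minimize. It is not the authors' R code: the RSG log-density is assumed to be the beta-generated form built on G(x) = 1/(1 + exp[−x(a + b|x|^p)]), the data are synthetic, and a crude grid over μ stands in for the GenSA and BFGS steps described above.

```python
import math
import random

def rsg_logpdf(x, mu, a, b, p, alpha):
    """Assumed log-density of the RSG model with location mu (requires a > 0 here)."""
    y = x - mu
    z = y * (a + b * abs(y) ** p)
    dz = a + b * (1.0 + p) * abs(y) ** p       # strictly positive whenever a > 0
    log_kernel = -alpha * abs(z) - 2.0 * alpha * math.log1p(math.exp(-abs(z)))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)
    return math.log(dz) + log_kernel - log_beta

def rsg_nll(data, mu, a, b, p, alpha):
    """Negative log-likelihood: the objective passed to a numerical minimizer."""
    return -sum(rsg_logpdf(x, mu, a, b, p, alpha) for x in data)

# Synthetic data (not from the paper): logistic with location 19 and a = 1,
# generated by inverting the logistic distribution function.
rng = random.Random(1)
data = [19.0 + math.log(u / (1.0 - u))
        for u in (rng.uniform(1e-9, 1.0 - 1e-9) for _ in range(500))]

# Crude one-dimensional search over mu with the other parameters held fixed.
grid = [17.0 + 0.05 * i for i in range(81)]
mu_hat = min(grid, key=lambda m: rsg_nll(data, m, a=1.0, b=0.0, p=1.0, alpha=1.0))
```

In practice all parameters are optimized jointly under their constraints, which is why the text relies on a global search for starting values followed by a constrained quasi-Newton method.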

The AIC and BIC are used to rank the models fitted to the data sets in the various applications. They are defined as

$$\mathrm{AIC} = -2\,l(\hat{\theta}; x) + 2\,n_{par}, \qquad (17)$$

where n_par is the number of parameters to be estimated and l(.;.) is the logarithm of the estimated likelihood function, and

$$\mathrm{BIC} = -2\,l(\hat{\theta}; x) + n_{par}\log \eta, \qquad (18)$$

where η is the number of observations. Mean Square Error (MSE), Mean Absolute Deviation (MAD) and Maximum Absolute Deviation (MD) are defined below:

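$$\mathrm{MSE} = \frac{1}{\eta}\sum_{i=1}^{\eta}\left[F_{\eta}(x_i) - \hat{F}(x_i)\right]^{2}$$

$$\mathrm{MAD} = \frac{1}{\eta}\sum_{i=1}^{\eta}\left|F_{\eta}(x_i) - \hat{F}(x_i)\right|$$

$$\mathrm{MD} = \max_{1 \le i \le \eta}\left|F_{\eta}(x_i) - \hat{F}(x_i)\right|$$

where $F_{\eta}$ is the empirical cumulative distribution and $\hat{F}$ is the fitted cumulative distribution of the data. The smaller the value, the better the fit.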
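The three deviation measures can be computed as in the following Python sketch. It assumes that the empirical distribution is evaluated at the sorted observations as i/n and that the deviations are taken at the data points. The function name `fit_errors` and the toy data are illustrative.

```python
def fit_errors(data, fitted_cdf):
    """Return (MSE, MAD, MD) between the empirical and a fitted distribution function."""
    xs = sorted(data)
    n = len(xs)
    # Empirical distribution at the i-th order statistic is taken as i / n.
    diffs = [(i + 1) / n - fitted_cdf(x) for i, x in enumerate(xs)]
    mse = sum(d * d for d in diffs) / n
    mad = sum(abs(d) for d in diffs) / n
    md = max(abs(d) for d in diffs)
    return mse, mad, md

# Toy example: four points compared with the uniform distribution on [0, 1].
mse, mad, md = fit_errors([0.2, 0.4, 0.6, 0.8], lambda x: x)
```

Any fitted distribution function can be passed as `fitted_cdf`, so the same routine serves to compare several candidate models on one data set.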

Human body fat index

The data consist of 252 observations on 17 variables about human body fat. For details, see Johnson [7], Penrose et al. [8], and Ambler et al. [9]. Figure 8 shows that the data are unimodal, which is confirmed by Hartigan's test [10,11] with statistic D=0.014114 and p-value near 1. The estimates of the parameters for the RSG and RSGA models are given in Table 1.

| Model | Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- | --- |
| RSG | µ | 19.26 | 2.1087 × 10^−5 | 9.1336 × 10^5 | <0.0001 |
| RSG | a | 0.15401 | 1.127 × 10^−2 | 13.662 | <0.0001 |
| RSG | b | 10^−4 | 3.3937 × 10^−5 | 2.9467 | <0.004 |
| RSG | p | 2.1986 | 1.0742 × 10^−4 | 2.0468 × 10^4 | <0.0001 |
| RSG | α | 1.2338 | 8.1766 × 10^−4 | 1.5089 × 10^3 | <0.0001 |
| RSG | log L | −890.9885 | | | |
| RSGA | µ | 7.8768 | 1.0392 × 10^−2 | 757.9289 | <0.0001 |
| RSGA | a | 0.18403 | 2.8006 × 10^−2 | 6.5712 | <0.0001 |
| RSGA | b | 10^−4 | 3.0035 × 10^−5 | 3.3294 | <0.0001 |
| RSGA | p | 2.2996 | 1.9703 × 10^−3 | 1167.086 | <0.0001 |
| RSGA | α | 0.35062 | 7.2149 × 10^−2 | 4.8597 | <0.0001 |
| RSGA | k | 1.7177 | 2.8455 × 10^−2 | 60.3678 | <0.0001 |
| RSGA | log L | −889.786 | | | |

Table 1: Estimates associated with RSG and RSGA models.
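The z-value and P(z) columns follow from the estimate and its standard error. The Python sketch below shows the calculation as a two-sided test under the normality assumption stated earlier. The function name is illustrative, and the inputs are the rounded values printed in Table 1, so the results agree with the table only up to rounding.

```python
import math

def wald_z_test(estimate, std_error):
    """Return the z statistic and the two-sided normal p-value for H0: parameter = 0."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Rows a and b of the RSG fit in Table 1.
z_a, p_a = wald_z_test(0.15401, 1.127e-2)
z_b, p_b = wald_z_test(1e-4, 3.3937e-5)
```

A very small standard error inflates the z-value, which is why several rows of the tables report z-values of order 10^4 or more.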

Figure 8: Fits of the two new distributions to the Body Fat Index data.

Table 2 compares the models used. Figure 8 presents the histogram with the fitted models. The empirical and theoretical distributions are shown in Figure 9.

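| Model | K-S | p-value | MSE (10^−4) | MAD | MD | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RSG | 0.047619 | 0.9375 | 1.315639 | 0.009163 | 0.033421 | 1791.977 | 1809.624 |
| RSGA | 0.06746 | 0.615 | 1.189378 | 0.008951 | 0.030355 | 1791.572 | 1812.749 |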

Table 2: The comparison of adjusted models used.
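The AIC and BIC columns can be recomputed from the log-likelihoods in Table 1, taking AIC = −2 log L + 2 n_par and BIC = −2 log L + n_par log n with n = 252 observations. The short Python sketch below does this. The function names are illustrative.

```python
import math

def aic(loglik, n_par):
    """Akaike Information Criterion."""
    return -2.0 * loglik + 2.0 * n_par

def bic(loglik, n_par, n_obs):
    """Bayesian Information Criterion."""
    return -2.0 * loglik + n_par * math.log(n_obs)

# Body fat data: RSG has 5 parameters, RSGA has 6, and there are 252 observations.
rsg = (aic(-890.9885, 5), bic(-890.9885, 5, 252))
rsga = (aic(-889.786, 6), bic(-889.786, 6, 252))
```

With these inputs the criteria agree with Table 2 to the printed precision. The extra parameter of RSGA costs 2 units in AIC but log(252) ≈ 5.53 units in BIC, which explains why the two criteria rank the models differently here.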

Figure 9: Graphs of empirical and theoretical distributions.

By AIC, the RSGA fit is better than the RSG fit for this data set, whereas the Bayesian criterion indicates a better fit for the RSG distribution.

Precipitation

The data consist of 121 observations of annual precipitation (rain) between 1878 and 1998 at the center of the city of Los Angeles. These data were obtained from the site [12]. Figure 10 shows that the data are unimodal, which is confirmed by Hartigan's test with statistic D=0.027273 and p-value equal to 0.7971. The estimates of the parameters, using the RSGA distribution, are given in Table 3.

| Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- |
| µ | 4.0393 | 4.4968 × 10^−2 | 89.825 | <0.0001 |
| a | 49.999 | 2.6007 × 10^−4 | 1.9225 × 10^5 | <0.0001 |
| b | 34.113 | 3.9072 × 10^−4 | 8.7308 × 10^4 | <0.0001 |
| p | 0.7582 | 0.1095 | 6.9239 | <0.0001 |
| α | 2.9333 × 10^−4 | 8.0064 × 10^−5 | 3.6638 | <0.0003 |
| k | 3.838 | 2.6556 × 10^−4 | 1.4452 × 10^4 | <0.0001 |
| log L | −393.2849 | | | |

Table 3: Estimates associated with RSGA model.

Figure 10: Graphs of empirical and theoretical distributions.

Applying the non-parametric Kolmogorov-Smirnov test, the K-S value obtained is 0.07438 with p-value 0.8914, so we do not reject the hypothesis that the data follow the RSGA distribution. In 2014, Eirado et al. [13] proposed an asymmetric model and applied it to this data set. For that model, the MSE is 0.001058396, the mean absolute deviation (MAD) is 0.02785116 and the maximum absolute deviation (MD) is 0.06496284. For the RSGA model, we obtained an MSE of 0.0002414233, a MAD of 0.01185483 and an MD of 0.04669135.

The AIC and BIC of the fits of the two models are given in Tables 3 and 4. The empirical and theoretical distributions are shown in Figure 11. Clearly, the RSGA distribution gives the better fit to the precipitation data.

| Model | log-likelihood | AIC | BIC |
| --- | --- | --- | --- |
| RSGA | −393.2849 | 798.5697 | 815.3444 |
| Eirado-Rathie | −551.6425 | 1113.285 | 1127.264 |

Table 4: The comparison of the models.

Figure 11: Empirical and theoretical distributions of precipitation.

HIV Data

The HIV data with 2843 observations is available in fitdistrplus: Aids2 package of software R, giving the age when a patient is diagnosed with AIDS in Australia in 1991. Table 5 presents the estimates of the parameters of RSG and RSGA models.

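| Model | Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- | --- |
| RSG | µ | 36.931 | 0.18698 | 197.51 | <0.0001 |
| RSG | a | 0.16731 | 0.017989 | 9.3006 | <0.0001 |
| RSG | p | 8.9282 | 5.2278 × 10^−17 | 1.7078 × 10^17 | <0.0001 |
| RSG | α | 1.1148 | 0.017463 | 6.3838 | <0.0001 |
| RSG | log L | −10552.23 | | | |
| RSGA | µ | 27.477 | 0.031826 | 86.336 | <0.0001 |
| RSGA | a | 0.05717 | 0.001222 | 46.779 | <0.0001 |
| RSGA | p | 9.7371 | 1.0564 × 10^−15 | 9.2174 × 10^15 | <0.0001 |
| RSGA | α | 3.5391 | 0.20708 | 17.091 | <0.0001 |
| RSGA | k | 4.5317 | 0.17192 | 26.359 | <0.0001 |
| RSGA | log L | −10508.95 | | | |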

Table 5: Estimates associated with RSGA and RSG models.

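| Model | Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- | --- |
| NORSKEW | µ | 37.5304 | 0.187355 | 200.317 | <0.0001 |
| NORSKEW | σ | 10.01696 | 0.13529 | 74.041 | <0.0001 |
| NORSKEW | ξ | 1.273675 | 0.031561 | 40.355 | <0.0001 |
| NORSKEW | log L | −10549.26 | | | |
| NORMAL | µ | 37.40907 | 0.1887 | 198.245 | <0.0001 |
| NORMAL | σ | 10.06149 | 0.13343 | 75.406 | <0.0001 |
| NORMAL | log L | −10597.72 | | | |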

Table 6: Estimates associated with asymmetric normal and normal distributions.

The histogram and the fitted RSGA distribution for the HIV data are shown in Figure 12, and the empirical and RSGA distribution functions are shown in Figure 13. In Table 7, the Kolmogorov-Smirnov test rejects all the fitted distributions except the RSGA distribution.

Figure 12: Fit of the RSGA distribution to the HIV data.

| Model | K-S | p-value | MSE (10^−4) | MAD | MD | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RSG | 0.289524 | 0.0014 | 4.376593 | 0.017052 | 0.04955 | 21112.47 | 21136.28 |
| RSGA | 0.063492 | 0.69 | 1.450326 | 0.009691 | 0.032249 | 21027.9 | 21057.66 |
| NORSKEW | 0.041857 | 0.01373 | 3.033533 | 0.014451 | 0.040824 | 21104.53 | 21122.39 |
| NORMAL | 0.059796 | 7.696 × 10^−5 | 8.539093 | 0.025396 | 0.058367 | 21199.44 | 21211.35 |

Table 7: Comparison of the models used.

Figure 13: Empirical and theoretical distributions to HIV data.

pH Concentration data

The pH concentration data [14], with 252 observations, show bimodality, which is confirmed by Hartigan's test with test statistic 0.046498 and p-value 0.00045. The estimates of the parameters are given in Table 8.

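| Model | Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- | --- |
| RSGA | µ | 3.094726 | 0.071289 | 43.4109 | <0.0001 |
| RSGA | a | 8.242063 | 2.241954 | 3.6763 | <0.0003 |
| RSGA | b | 0.003 | 0.001066 | 2.8153 | 0.004874 |
| RSGA | p | 6.244648 | 0.344886 | 18.1064 | <0.0001 |
| RSGA | α | 0.045077 | 0.011673 | 3.8616 | <0.0002 |
| RSGA | k | 0.86603 | 0.335523 | 2.5811 | 0.009848 |
| RSGA | log L | −364.2 | | | |
| RSG | µ | 4.918676 | 0.042907 | 114.6364 | <0.0001 |
| RSG | a | 6.027683 | 0.616692 | 9.7742 | <0.0001 |
| RSG | b | 2.906972 | 1.071397 | 2.7133 | <0.007 |
| RSG | p | 2.711035 | 0.459798 | 5.8961 | <0.0001 |
| RSG | α | 0.068114 | 0.006893 | 9.8812 | <0.0001 |
| RSG | log L | −363.7172 | | | |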

Table 8: Estimates associated with RSGA and RSG models.

Silva et al. [15] proposed two new asymmetric models, h1(x) and h2(x), by Azzalini's method and fitted them to the pH concentration data. Table 10 shows the performance of the fitted distributions.

Using the package of Benaglia et al. [16], the estimates for the mixture of normals are given in Table 9, with a parametric bootstrap performed to approximate the standard errors.

| Parameter | Component 1 | Component 2 | Error of Component 1 | Error of Component 2 |
| --- | --- | --- | --- | --- |
| λ | 0.50439 | 0.49561 | 0.041677 | 0.0416768 |
| µ | 3.892103 | 5.961384 | 0.076694 | 0.07539492 |
| σ | 0.575443 | 0.568638 | 0.056243 | 0.05409495 |
| log L | −366.8661 | | | |

Table 9: Estimates of the mixture of two normals.
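For readers unfamiliar with how mixture estimates of this kind arise, the Python sketch below runs a plain EM algorithm for a two-component normal mixture. It is a generic textbook version, not the R package used in the paper. The data are synthetic, with component means and standard deviations chosen only to resemble the scale of Table 9, and all names are illustrative.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(data, n_iter=200):
    """EM for lambda * N(mu1, s1^2) + (1 - lambda) * N(mu2, s2^2)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Crude start: quartiles as means, pooled spread, equal weights.
    lam, mu1, mu2, s1, s2 = 0.5, xs[n // 4], xs[(3 * n) // 4], sd, sd
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p1 = lam * normal_pdf(x, mu1, s1)
            p2 = (1.0 - lam) * normal_pdf(x, mu2, s2)
            resp.append(p1 / (p1 + p2))
        # M-step: weighted means, standard deviations and mixing proportion.
        w1 = sum(resp)
        w2 = n - w1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / w1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / w2)
        lam = w1 / n
    loglik = sum(math.log(lam * normal_pdf(x, mu1, s1) + (1.0 - lam) * normal_pdf(x, mu2, s2))
                 for x in xs)
    return {"lambda": lam, "mu": (mu1, mu2), "sigma": (s1, s2), "loglik": loglik}

# Synthetic bimodal sample (not the pH data).
rng = random.Random(42)
data = [rng.gauss(3.9, 0.57) for _ in range(300)] + [rng.gauss(6.0, 0.57) for _ in range(300)]
fit = em_two_normals(data)
```

The sketch has no safeguards against a component collapsing onto a single point or against extreme outliers, so it is suitable only for well-separated samples such as this one.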

The histogram of pH values and the fitted distributions are shown in Figures 14 and 15.

Figure 14: pH histogram and the fitted models.

Figure 15: Graphs of empirical and theoretical distributions.

Table 10 gives the values of AIC, BIC, MSE, etc., for the various models. The RSG model fits the bimodal data well.

| Model | K-S | p-value | MSE (10^−4) | MAD | MD | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RSG | 0.06746 | 0.61 | 1.814886 | 0.01067501 | 0.039083 | 737.4343 | 755.0871 |
| RSGA | 0.075397 | 0.4709 | 2.546568 | 0.01283771 | 0.038684 | 740.4067 | 761.5833 |
| NORMIX | 0.083333 | 0.3457 | 7.407901 | 0.02202145 | 0.064505 | 743.7322 | 761.3793 |
| h1(x) | | 0.8316 | 3 | 0.0152 | 0.0373 | 744.6913 | 776.4561 |
| h2(x) | | 0.09438 | 96 | 0.0912 | 0.1454 | 857.387 | 889.1519 |

Table 10: Comparison of the models used.

Relative Humidity (RH)

The RH observations data are taken from Nychka et al. [17]. The estimates of the parameters for RH data using the RSGA model are given in Table 11.

| Parameter | Estimate | Error | z-value | P(z) |
| --- | --- | --- | --- | --- |
| µ | 59.72236 | 0.008989 | 6643.879 | <0.0001 |
| a | 0.034228 | 0.016025 | 2.1359 | <0.04 |
| b | 0.002588 | 0.001281 | 2.0199 | <0.05 |
| p | 1.227392 | 0.151744 | 8.0886 | <0.0001 |
| α | 0.266291 | 0.115667 | 2.3022 | <0.03 |
| k | −0.4621166 | 0.095596 | −4.8341 | <0.0001 |

Table 11: Estimation of the parameters of the RSGA model.

The estimates for a mixture of two normals are given in Table 12. The values of AIC, BIC, etc., measuring the quality of fit are given in Table 13.

| NORMIX | Component 1 | Component 2 | Error of Component 1 | Error of Component 2 |
| --- | --- | --- | --- | --- |
| λ | 0.6975 | 0.3025 | 0.025634 | 0.02563423 |
| µ | 36.8122 | 77.08626 | 0.865337 | 1.139471 |
| σ | 11.835 | 9.28641 | 0.648855 | 0.8474422 |
| log L | −1958.626 | | | |

Table 12: Estimation of the parameters of the mixture of two normals.

| Model | K-S | p-value | MSE (10^−4) | MAD | MD | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RSGA | 0.080178 | 0.1115 | 8.497387 | 0.0217099 | 0.076005 | 3926.544 | 3951.182 |
| NORMIX | 0.073497 | 0.1768 | 9.316102 | 0.02176488 | 0.066207 | 3927.252 | 3947.787 |

Table 13: Comparison of the models used.

Figure 16 shows the histogram together with the empirical, RSGA and mixture-of-two-normals fits. Figure 17 shows the empirical and theoretical distributions.

Figure 16: Relative Humidity and adjusted model.

Figure 17: The empirical and theoretical distributions.

Conclusion

The Rathie-Swamee generalized distribution (RSG) and its skew form (RSGA) proved useful for the five data sets analyzed, demonstrating their applicability over the mixture of two normals in the case of the bimodal data sets (pH concentration and relative humidity).

Acknowledgement

P. N. Rathie thanks the Coordination for the Improvement of Higher Level Personnel (CAPES) for supporting his Senior National Visiting Professorship.

References