On New Generalized Logistic Distributions and Applications
Barreto FHS, Mota JMA and Rathie PN*


Department of Statistics and Applied Mathematics, Federal University of Ceara Fortaleza, Brazil

*Corresponding Author: Rathie PN
Rathie PN, Department of Statistics and Applied
Mathematics, Federal University of Ceara
Fortaleza, Brazil
E-mail: pushpanrathie@yahoo.com

Received date: 06/03/2017; Accepted date: 18/04/2017; Published date: 07/04/2017

Visit for more related articles at Research & Reviews: Research Journal of Biology

Abstract

In 2006, Rathie and Swamee proposed a generalization of the logistic distribution that is more flexible and can be multimodal. This work adds a new parameter to increase the flexibility of that distribution, and also derives an asymmetric version by the Azzalini method, which introduces a further parameter of asymmetry. Five data sets (Human Body Fat Index, HIV, Precipitation, pH Concentration, Relative Humidity) are analysed by applying the new distributions. The parameters of the new distributions and of the mixture of normals were estimated by the maximum likelihood method. Because the estimates of the new distributions cannot be obtained in closed form, we use iterative numerical methods such as L-BFGS-B, BFGS and SANN, together with an adaptive barrier algorithm to enforce the parameter constraints and a global search of the highly non-linear objective function to supply initial values for the estimation algorithm. All computational work was implemented in the software R. In most cases, we use Hartigan's test to assess unimodality. We apply the Kolmogorov-Smirnov test at a significance level of 5% and use several criteria, namely Mean Square Error, Mean Absolute Deviation and Maximum Deviation, to indicate the best fit. The classical and general method for fitting multimodal data is a mixture of distributions, in particular a mixture of normal distributions, because the normal distribution has good mathematical properties. For the mixture of normals, we use the EM algorithm to calculate the estimates. We also use the Akaike Information Criterion and the Bayesian Information Criterion as selection criteria to identify the best distribution, in both cases comparing the new distributions with the mixture of normal distributions to illustrate the applicability of the results derived in this paper.

Keywords

Rathie-Swamee distribution, Azzalini method, multimodal data set analysis, Akaike information criterion, Bayesian information criterion, maximum likelihood method, Kolmogorov-Smirnov test

Introduction

Several classical models, such as the normal, exponential, binomial, Poisson and logistic distributions, are available to analyze different data sets. As there is no single unified model, new models have to be constructed to suit the data sets under consideration. The logistic model is very useful in many areas of statistics and physics. This article is organized as follows: Section 2 deals with the symmetric generalized logistic distribution, whereas Section 3 studies its skew form. Section 4 applies the results of the earlier sections to five real data sets and compares the fits with a mixture of two normal distributions where possible. The article ends with a short conclusion and a list of references. Rathie et al. [1] defined a multimodal symmetric distribution function G(x) for a random variable X ∼ RS(a, b, p) as

$$G(x) = \frac{1}{1 + \exp[-x(a + b|x|^{p})]}, \quad x \in \mathbb{R}, \tag{1}$$

where a and b are not both zero. When b=0 or p=0, (1) reduces to the logistic distribution

$$G(x) = \frac{1}{1 + \exp(-cx)}, \quad x \in \mathbb{R}, \tag{2}$$

where c=a (when b=0) or c=a + b (when p=0). The density function corresponding to (1) is

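$$g(x) = \frac{[a + b(1+p)|x|^{p}]\exp[-x(a + b|x|^{p})]}{\{1 + \exp[-x(a + b|x|^{p})]\}^{2}}, \quad x \in \mathbb{R}. \tag{3}$$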
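As a numerical illustration of (1) and (3), the following Python sketch evaluates the distribution and density functions. It assumes the Rathie-Swamee form G(x) = 1/(1 + exp[-x(a + b|x|^p)]) with a, b ≥ 0; the names rs_cdf and rs_pdf are illustrative, and Python is used only for demonstration (the computations reported in this paper were done in R).

```python
import math


def rs_cdf(x, a, b, p):
    """Assumed Rathie-Swamee distribution function 1 / (1 + exp(-x (a + b |x|^p)))."""
    u = x * (a + b * abs(x) ** p)
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # rewritten form avoids overflow for large negative u
    return e / (1.0 + e)


def rs_pdf(x, a, b, p):
    """Derivative of rs_cdf with respect to x."""
    u = x * (a + b * abs(x) ** p)
    du = a + b * (1.0 + p) * abs(x) ** p  # d/dx of x (a + b |x|^p)
    e = math.exp(-abs(u))  # e^{-u} / (1 + e^{-u})^2 is an even function of u
    return du * e / (1.0 + e) ** 2
```

Under this assumed form, setting a=0 with p>0 makes the density vanish at the origin while staying positive on both sides, which is how the bimodal shapes arise.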

Generalized Symmetric Logistic Distribution

A symmetric distribution can be generated by using the method proposed by Jones in 2004 [2]. Let U ~ Beta(α, α) and X = G⁻¹(U), where G(x) is the distribution function with density g(x). Then the distribution function H(x) of X is given by

$$H(x) = \frac{1}{B(\alpha,\alpha)} \int_{0}^{G(x)} t^{\alpha-1}(1-t)^{\alpha-1}\,dt. \tag{4}$$

Differentiating H(x) yields the corresponding density function as

$$h(x) = \frac{g(x)\,[G(x)]^{\alpha-1}\,[1-G(x)]^{\alpha-1}}{B(\alpha,\alpha)}. \tag{5}$$
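The construction X = G⁻¹(U) can be turned directly into a sampler. The Python sketch below is only an illustration under stated assumptions: it takes G to be the Rathie-Swamee form 1/(1 + exp[-x(a + b|x|^p)]) with a, b ≥ 0, inverts it by bisection (a choice made here, not taken from the paper), and uses the illustrative names rs_quantile and sample_rsg.

```python
import math
import random


def rs_cdf(x, a, b, p):
    """Assumed Rathie-Swamee distribution function."""
    u = x * (a + b * abs(x) ** p)
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def rs_quantile(u, a, b, p, tol=1e-12):
    """Invert rs_cdf by bisection; G is continuous and increasing for a, b >= 0."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    lo, hi = -1.0, 1.0
    while rs_cdf(lo, a, b, p) > u:  # widen the bracket to the left
        lo *= 2.0
    while rs_cdf(hi, a, b, p) < u:  # widen the bracket to the right
        hi *= 2.0
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if rs_cdf(mid, a, b, p) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_rsg(n, a, b, p, alpha, rng=random):
    """Draw n values X = G^{-1}(U) with U ~ Beta(alpha, alpha)."""
    return [rs_quantile(rng.betavariate(alpha, alpha), a, b, p) for _ in range(n)]
```

For example, `sample_rsg(1000, 1.0, 0.5, 2.0, 2.0, random.Random(0))` draws a reproducible sample whose histogram can be compared with the density h(x).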

Using (1) and (3) in (5), the generalized symmetric logistic density function for X ~ RSG (a, b, p, α) is given by

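$$h(x) = \frac{[a + b(1+p)|x|^{p}]\exp[-\alpha x(a + b|x|^{p})]}{B(\alpha,\alpha)\,\{1 + \exp[-x(a + b|x|^{p})]\}^{2\alpha}}, \quad x \in \mathbb{R}, \tag{6}$$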

where a and b are not both zero and B(., .) is the beta function. For α=1, (6) reduces to (3). We may introduce a location parameter μ in the model (6). There is no need to introduce a scale parameter, since the density function would then become non-identifiable. On introducing the location parameter μ ∈ ℝ, the density function (6) takes the following form:

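$$h(x;\mu) = \frac{[a + b(1+p)|x-\mu|^{p}]\exp[-\alpha (x-\mu)(a + b|x-\mu|^{p})]}{B(\alpha,\alpha)\,\{1 + \exp[-(x-\mu)(a + b|x-\mu|^{p})]\}^{2\alpha}}, \quad x \in \mathbb{R}. \tag{7}$$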
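A short Python sketch can be used to check the RSG density numerically. It assumes the density has the form g(x)[G(x)(1 − G(x))]^(α−1)/B(α, α) with the Rathie-Swamee G, shifted by μ; the names rsg_pdf and trapezoid are illustrative and not taken from the paper.

```python
import math


def rsg_pdf(x, a, b, p, alpha, mu=0.0):
    """Assumed RSG density g(z) [G(z)(1 - G(z))]^(alpha - 1) / B(alpha, alpha), z = x - mu."""
    z = x - mu
    u = z * (a + b * abs(z) ** p)
    du = a + b * (1.0 + p) * abs(z) ** p
    e = math.exp(-abs(u))
    core = e / (1.0 + e) ** 2  # equals G(z) (1 - G(z))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)
    return du * core ** alpha / math.exp(log_beta)


def trapezoid(f, lo, hi, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# The density should integrate to about 1 for any alpha > 0.
area = trapezoid(lambda x: rsg_pdf(x, 1.0, 0.5, 2.0, 0.7), -10.0, 10.0, 20000)
```

With α=1 the beta function equals 1 and the function returns the density of the original three-parameter model, which gives a quick consistency check.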

Figures 1 to 4 show graphs of (6) and (7) for various values of the parameters μ, a, b, p and α.


Figure 1: Graphs of (6) and (7) for fixed a.

Figure 2: Graphs of (6) and (7) for fixed b.

Figure 3: Graphs of (6) and (7) for fixed p.

Figure 4: Graphs of (6) and (7) for fixed α.

Distribution function

In this subsection, we prove that the distribution function corresponding to (6) is given by

images 8

Proof. For x > 0, we have

images 9

Substituting images we get

images

we have

images 10

By symmetry, we easily write the result for x < 0.

Moments

In this subsection, we obtain the n-th moments about the origin. By definition,

when n is an even integer

images 11

Then, by expanding the denominator by binomial theorem, we have

images 12

when n is an even integer.

The variance of X ~ RSG (a, b, p, α) is given by

images 13

Generalized Skew Logistic Distribution

In the Azzalini density [3]

$$s(x) = 2\,v(x)\,V[w(x)], \quad x \in \mathbb{R}, \tag{14}$$

with w(x)=kx, k ∈ ℝ, take v(x) as the density function of X ~ RSG(a, b, p, α) and V(x) as the distribution function of X ~ RS(a, b, p). Then the density function of the generalized skew logistic model X ~ RSGA(a, b, p, α, k) is given by

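$$s(x) = \frac{2\,[a + b(1+p)|x|^{p}]\exp[-\alpha x(a + b|x|^{p})]}{B(\alpha,\alpha)\,\{1 + \exp[-x(a + b|x|^{p})]\}^{2\alpha}\,\{1 + \exp[-kx(a + b|kx|^{p})]\}}, \quad x \in \mathbb{R}. \tag{15}$$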

Introducing the location parameter μ ∈ ℝ, the density function of X ∼ RSGA(a, b, p, α, k) is given by

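$$s_1(x) = \frac{2\,[a + b(1+p)|z|^{p}]\exp[-\alpha z(a + b|z|^{p})]}{B(\alpha,\alpha)\,\{1 + \exp[-z(a + b|z|^{p})]\}^{2\alpha}\,\{1 + \exp[-kz(a + b|kz|^{p})]\}}, \quad z = x - \mu, \quad x \in \mathbb{R}. \tag{16}$$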
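The Azzalini construction s(x) = 2 v(x) V(kx) is easy to evaluate once v and V are available. The Python sketch below is illustrative only: it assumes the Rathie-Swamee form G(x) = 1/(1 + exp[-x(a + b|x|^p)]) for V and the corresponding RSG density for v, and the function names are not taken from the paper.

```python
import math


def rs_cdf(x, a, b, p):
    """Assumed Rathie-Swamee distribution function (plays the role of V)."""
    u = x * (a + b * abs(x) ** p)
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def rsg_pdf(x, a, b, p, alpha):
    """Assumed symmetric RSG density (plays the role of v)."""
    u = x * (a + b * abs(x) ** p)
    du = a + b * (1.0 + p) * abs(x) ** p
    e = math.exp(-abs(u))
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)
    return du * (e / (1.0 + e) ** 2) ** alpha / math.exp(log_beta)


def rsga_pdf(x, a, b, p, alpha, k, mu=0.0):
    """Skew density 2 v(z) V(k z) with z = x - mu."""
    z = x - mu
    return 2.0 * rsg_pdf(z, a, b, p, alpha) * rs_cdf(k * z, a, b, p)
```

Setting k=0 gives V(0)=1/2 and recovers the symmetric density, while changing the sign of k mirrors the density about μ.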

For certain values of the parameters, s(x) and s1(x) are plotted in Figure 5 for k =±0.7 and in Figures 6 and 7 for a=0 and b=0 respectively.


Figure 5: Graphs of (15) and (16) for certain values of the parameters.

Figure 6: Graphs of (15) and (16) for certain values of the parameters with a=0.

Figure 7: Graphs of (15) and (16) for certain values of the parameters with b=0.

Applications Involving Real Data

In the present section, five data sets are analyzed by using the distributions defined in the earlier sections as well as the mixture of two normals for bimodal data. The parameters are estimated by the method of maximum likelihood. The Akaike Information Criterion [4], Bayesian Information Criterion, Mean Square Error, Mean Absolute Deviation and Maximum Absolute Deviation are calculated to judge the fit of the RSG, RSGA and mixture-of-two-normals models. The Kolmogorov-Smirnov goodness-of-fit test is used at a significance level of 5%. Several packages of the software R are used. The GenSA package [5] is used to obtain initial values for the iterative algorithm. For the iterative algorithm, we use the mle2 function of the bbmle package [6], in most cases with the BFGS method and the optimizer constrOptim, to guarantee that the estimated parameters stay within their parameter space. For more details on the adaptive barrier algorithm, see stats::constrOptim in R. We obtain the estimates of the parameters, approximate standard errors of the estimates based on a quadratic approximation to the curvature at the maximum likelihood estimate, and a test (z test) of the difference of each parameter from zero, based on this standard error and on the assumption that the sampling distribution of the estimated parameters is normal.
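The z test described above reduces to a ratio and a normal tail probability. The following Python sketch shows the calculation under the stated normality assumption; the function name wald_z_test is illustrative, and the two-sided p-value convention is an assumption about how P(z) is reported.

```python
import math


def wald_z_test(estimate, std_error):
    """Return (z, two-sided p-value) for H0: parameter = 0 under a normal approximation."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Example with an estimate of 0.86603 and a standard error of 0.335523
z, p_value = wald_z_test(0.86603, 0.335523)
```

With these inputs the result is close to z = 2.58 and a p-value just under 0.01.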

AIC and BIC are used to rank the fitted models in the applications. They are defined as

$$\mathrm{AIC} = 2\,n_{par} - 2\,l(\hat{\theta}; x), \tag{17}$$

where n_par is the number of parameters to be estimated and l(.;.) is the logarithm of the estimated likelihood function, and

$$\mathrm{BIC} = n_{par}\ln(\eta) - 2\,l(\hat{\theta}; x), \tag{18}$$

where η is the number of observations. Mean Square Error (MSE), Mean Absolute Deviation (MAD) and Maximum Absolute Deviation (MD) are defined below:

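$$\mathrm{MSE} = \frac{1}{\eta}\sum_{i=1}^{\eta}\left[F_{\eta}(x_i) - \hat{F}(x_i)\right]^{2}, \quad \mathrm{MAD} = \frac{1}{\eta}\sum_{i=1}^{\eta}\left|F_{\eta}(x_i) - \hat{F}(x_i)\right|, \quad \mathrm{MD} = \max_{1 \le i \le \eta}\left|F_{\eta}(x_i) - \hat{F}(x_i)\right|,$$

where F_η is the empirical cumulative distribution and F̂ is the fitted cumulative distribution of the data. For each measure, the model with the smallest value is taken as the best fit.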
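The selection criteria can be computed in a few lines. The Python sketch below uses the standard AIC and BIC formulas; for MSE, MAD and MD it assumes that the empirical and fitted distribution functions are compared at the observed data points, which may differ in detail from the evaluation grid used for the reported values. All function names are illustrative.

```python
import math
from bisect import bisect_right


def aic(loglik, n_par):
    """Akaike Information Criterion: 2 n_par - 2 log L."""
    return 2.0 * n_par - 2.0 * loglik


def bic(loglik, n_par, n_obs):
    """Bayesian Information Criterion: n_par ln(n_obs) - 2 log L."""
    return n_par * math.log(n_obs) - 2.0 * loglik


def fit_errors(data, fitted_cdf):
    """Return (MSE, MAD, MD) between the empirical and the fitted CDF at the data points."""
    xs = sorted(data)
    n = len(xs)
    # bisect_right counts observations <= x, so ties are handled correctly
    diffs = [abs(bisect_right(xs, x) / n - fitted_cdf(x)) for x in xs]
    mse = sum(d * d for d in diffs) / n
    mad = sum(diffs) / n
    md = max(diffs)
    return mse, mad, md
```

As a check against the body fat application, `aic(-890.9885, 5)` and `bic(-890.9885, 5, 252)` give approximately 1791.977 and 1809.624, the values reported for the RSG model.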

Human body fat index

The data consist of 252 observations on 17 variables about human body fat. For details, see Johnson [7], Penrose et al. [8], and Ambler et al. [9]. Figure 8 shows that the data are unimodal, which is also confirmed by Hartigan's dip test [10,11] with statistic D=0.014114 and p-value near 1. The estimates of the parameters using the RSG and RSGA models are given in Table 1.

RSG Parameter	Estimate	Error	z-value	P (z)
µ	19.26	2.1087 × 10⁻⁵	9.1336 × 10⁵	<0.0001
a	0.15401	1.127 × 10⁻²	13.662	<0.0001
b	10⁻⁴	3.3937 × 10⁻⁵	2.9467	<0.004
p	2.1986	1.0742 × 10⁻⁴	2.0468 × 10⁴	<0.0001
α	1.2338	8.1766 × 10⁻⁴	1.5089 × 10³	<0.0001
log L	−890.9885
RSGA Parameter	Estimate	Error	z-value	P (z)
µ	7.8768	1.0392 × 10⁻²	757.9289	<0.0001
a	0.18403	2.8006 × 10⁻²	6.5712	<0.0001
b	10⁻⁴	3.0035 × 10⁻⁵	3.3294	<0.0001
p	2.2996	1.9703 × 10⁻³	1167.086	<0.0001
α	0.35062	7.2149 × 10⁻²	4.8597	<0.0001
k	1.7177	2.8455 × 10⁻²	60.3678	<0.0001
log L	−889.786

Table 1: Estimates associated with RSG and RSGA models.

Figure 8: Adjustments of two new distributions to Body Fat Index.

Table 2 shows the comparison of the models used. Figure 8 presents the histogram with the fitted models. The empirical and theoretical distributions are shown in Figure 9.

Model	K-S	p-value	MSE (10⁻⁴)	MAD	MD	AIC	BIC
RSG	0.047619	0.9375	1.315639	0.009163	0.033421	1791.977	1809.624
RSGA	0.06746	0.615	1.189378	0.008951	0.030355	1791.572	1812.749

Table 2: The comparison of adjusted models used.

Figure 9: Graphs of empirical and theoretical distributions.

For AIC, it may be observed that the RSGA fit is better than RSG fit for this data set. The Bayesian criterion indicates a better fit for RSG distribution.

Precipitation

The data consist of 121 observations of annual precipitation (rain) between 1978 and 1998 at the center of the city of Los Angeles. These data were obtained from the site [12]. Figure 10 shows that the data are unimodal, which is also confirmed by Hartigan's test with statistic D=0.027273 and p-value equal to 0.7971. The estimates of the parameters, using the RSGA distribution, are given in Table 3.

Parameter	Estimate	Error	z-value	P (z)
µ	4.0393	4.4968 × 10⁻²	89.825	<0.0001
a	49.999	2.6007 × 10⁻⁴	1.9225 × 10⁵	<0.0001
b	34.113	3.9072 × 10⁻⁴	8.7308 × 10⁴	<0.0001
p	0.7582	0.1095	6.9239	<0.0001
α	2.9333 × 10⁻⁴	8.0064 × 10⁻⁵	3.6638	<0.0003
k	3.838	2.6556 × 10⁻⁴	1.4452 × 10⁴	<0.0001
log L	−393.2849

Table 3: Estimates associated with RSGA model.


Figure 10: Graphs of empirical and theoretical distributions.

Applying the non-parametric Kolmogorov-Smirnov test, the K-S value obtained is 0.07438 with p-value 0.8914, so we do not reject the hypothesis that the data follow the RSGA distribution. In 2014, Eirado et al. [13] proposed an asymmetric model and applied it to this data set. For that model, the MSE is 0.001058396, the mean absolute deviation (MAD) is 0.02785116 and the maximum absolute deviation (MD) is 0.06496284. For the RSGA model, we obtained an MSE of 0.0002414233, a MAD of 0.01185483 and an MD of 0.04669135.

The AIC and BIC of the fits of the two models are given in Table 4. The empirical and theoretical distributions are shown in Figure 11. Clearly, the RSGA distribution gives the better fit to the precipitation data.

Model	log-likelihood	AIC	BIC
RSGA	−393.2849	798.5697	815.3444
Eirado-Rathie	−551.6425	1113.285	1127.264

Table 4: The comparison of the models.

Figure 11: Empirical and theoretical distributions of precipitation.

HIV Data

The HIV data with 2843 observations is available in fitdistrplus: Aids2 package of software R, giving the age when a patient is diagnosed with AIDS in Australia in 1991. Table 5 presents the estimates of the parameters of RSG and RSGA models.

RSG	Estimate	Error	z-value	P (z)
µ	36.931	0.18698	197.51	<0.0001
a	0.16731	0.017989	9.3006	<0.0001
p	8.9282	5.2278 × 10⁻¹⁷	1.7078 × 10¹⁷	<0.0001
α	1.1148	0.017463	6.3838	<0.0001
log L	−10552.23
RSGA	Estimate	Error	z-value	P (z)
µ	27.477	0.031826	86.336	<0.0001
a	0.05717	0.001222	46.779	<0.0001
p	9.7371	1.0564 × 10⁻¹⁵	9.2174 × 10¹⁵	<0.0001
α	3.5391	0.20708	17.091	<0.0001
k	4.5317	0.17192	26.359	<0.0001
log L	−10508.95

Table 5: Estimates associated with RSGA and RSG models.

NORSKEW	Estimate	Error	z-value	P (z)
µ	37.5304	0.187355	200.317	<0.0001
σ	10.01696	0.13529	74.041	<0.0001
ξ	1.273675	0.031561	40.355	<0.0001
log L	−10549.26
NORMAL	Estimate	Error	z-value	P (z)
µ	37.40907	0.1887	198.245	<0.0001
σ	10.06149	0.13343	75.406	<0.0001
log L	−10597.72

Table 6: Estimates associated with asymmetric normal and normal distributions.

The histogram and the fitted RSGA distribution for the HIV data are shown in Figure 12, and the empirical and RSGA distribution functions in Figure 13. In Table 7, the Kolmogorov-Smirnov test rejects all fitted distributions except the RSGA distribution.


Figure 12: Adjustments of RSGA distribution to HIV data.

Model	K-S	p-value	MSE (10⁻⁴)	MAD	MD	AIC	BIC
RSG	0.289524	0.0014	4.376593	0.017052	0.04955	21112.47	21136.28
RSGA	0.063492	0.69	1.450326	0.009691	0.032249	21027.9	21057.66
NORSKEW	0.041857	0.01373	3.033533	0.014451	0.040824	21104.53	21122.39
NORMAL	0.059796	7.696 × 10⁻⁵	8.539093	0.025396	0.058367	21199.44	21211.35

Table 7: Comparison of the models used.

Figure 13: Empirical and theoretical distributions to HIV data.

pH Concentration data

The pH concentration data [14], with 252 observations, show bimodality, which is also supported by Hartigan's test with test statistic equal to 0.046498 and p-value of 0.00045. The estimates of the parameters are given in Table 8.

RSGA	Estimate	Error	z-value	P (z)
µ	3.094726	0.071289	43.4109	<0.0001
a	8.242063	2.241954	3.6763	<0.0003
b	0.003	0.001066	2.8153	0.004874
p	6.244648	0.344886	18.1064	<0.0001
α	0.045077	0.011673	3.8616	<0.0002
k	0.86603	0.335523	2.5811	0.009848
log L	−364.2
RSG	Estimate	Error	z-value	P (z)
µ	4.918676	0.042907	114.6364	<0.0001
a	6.027683	0.616692	9.7742	<0.0001
b	2.906972	1.071397	2.7133	<0.007
p	2.711035	0.459798	5.8961	<0.0001
α	0.068114	0.006893	9.8812	<0.0001
log L	−363.7172

Table 8: Estimates associated with RSGA and RSG models.

Silva et al. [15] proposed two new asymmetric models, h1(x) and h2(x), by Azzalini's method and fitted them to the pH concentration data. Table 10 shows the performance of the fitted distributions.

Using the mixtools package of Benaglia et al. [16], the estimates of the mixture of normals are given in Table 9, with a parametric bootstrap performed to approximate the standard errors.

Parameters	Component 1	Component 2	Error of Component 1	Error of Component 2
λ	0.50439	0.49561	0.041677	0.0416768
µ	3.892103	5.961384	0.076694	0.07539492
σ	0.575443	0.568638	0.056243	0.05409495
log L	−366.8661

Table 9: Estimates of the mixture of two normals.
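The EM iteration behind such mixture estimates can be sketched compactly. The Python code below is a minimal stand-in, not the mixtools implementation used for Table 9: the starting values (a split of the sorted sample at the median), the stopping rule and the function names are all assumptions made for illustration, and the example data are simulated rather than the pH measurements.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, max_iter=500, tol=1e-8):
    """EM for the mixture lam * N(mu1, s1^2) + (1 - lam) * N(mu2, s2^2)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Crude starting values (an assumption): split the sorted sample at the median.
    lam = 0.5
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    mean = sum(xs) / n
    s1 = s2 = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        # E-step: posterior probability that each point belongs to component 1
        resp = []
        loglik = 0.0
        for x in xs:
            d1 = lam * normal_pdf(x, mu1, s1)
            d2 = (1.0 - lam) * normal_pdf(x, mu2, s2)
            resp.append(d1 / (d1 + d2))
            loglik += math.log(d1 + d2)
        if loglik - prev < tol:  # log-likelihood has stopped improving
            break
        prev = loglik
        # M-step: weighted proportion, means and standard deviations
        n1 = sum(resp)
        n2 = n - n1
        lam = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2)
    return lam, mu1, s1, mu2, s2, loglik


# Simulated bimodal sample (not the pH data)
rng = random.Random(1)
sample = [rng.gauss(3.9, 0.58) for _ in range(500)] + [rng.gauss(5.96, 0.57) for _ in range(500)]
lam, mu1, s1, mu2, s2, loglik = em_two_normals(sample)
```

The returned log-likelihood is evaluated at the returned parameter values, so it can be passed directly to an AIC or BIC calculation with five free parameters.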

The histogram of pH values and the fitted distributions are shown in Figures 14 and 15.

Figure 14: pH histogram and the fitted models.

Figure 15: Graphs of empirical and theoretical distributions.

Table 10 gives the values of AIC, BIC, MSE, etc., for the various models. The RSG model fits the bimodal data well.

Model	K-S	p-value	MSE (10⁻⁴)	MAD	MD	AIC	BIC
RSG	0.06746	0.61	1.814886	0.01067501	0.039083	737.4343	755.0871
RSGA	0.075397	0.4709	2.546568	0.01283771	0.038684	740.4067	761.5833
NORMIX	0.083333	0.3457	7.407901	0.02202145	0.064505	743.7322	761.3793
h1(x)	–	0.8316	3	0.0152	0.0373	744.6913	776.4561
h2(x)	–	0.09438	96	0.0912	0.1454	857.387	889.1519

Table 10: Comparison of the models used.

Relative Humidity (RH)

The RH observations data are taken from Nychka et al. [17]. The estimates of the parameters for RH data using the RSGA model are given in Table 11.

Parameter	Estimate	Error	z-value	P (z)
µ	59.72236	0.008989	6643.879	<0.0001
a	0.034228	0.016025	2.1359	<0.04
b	0.002588	0.001281	2.0199	<0.05
p	1.227392	0.151744	8.0886	<0.0001
α	0.266291	0.115667	2.3022	<0.03
k	−0.4621166	0.095596	−4.8341	<0.0001

Table 11: Estimation of the parameters of the RSGA model.

The estimates for a mixture of two normals are given in Table 12. The values of AIC, BIC, etc., measuring the quality of fit are given in Table 13.

NORMIX	Component 1	Component 2	Error Component 1	Error Component 2
λ	0.6975	0.3025	0.025634	0.02563423
µ	36.8122	77.08626	0.865337	1.139471
σ	11.835	9.28641	0.648855	0.8474422
log L	−1958.626

Table 12: Estimation of the parameters of the mixture of two normals.

Model	K-S	p-value	MSE (10⁻⁴)	MAD	MD	AIC	BIC
RSGA	0.080178	0.1115	8.497387	0.0217099	0.076005	3926.544	3951.182
NORMIX	0.073497	0.1768	9.316102	0.02176488	0.066207	3927.252	3947.787

Table 13: Comparison of the models used.

In Figure 16, the histogram is shown together with the empirical, RSGA and mixture-of-two-normals fits. In Figure 17, the empirical and theoretical distributions are shown.

Figure 16: Relative Humidity and adjusted model.

Figure 17: The empirical and theoretical distributions.

Conclusion

The Rathie-Swamee generalized distribution (RSG) and its skew form (RSGA) proved useful for the five data sets analyzed, demonstrating their applicability in comparison with the mixture of two normals in the case of bimodal data sets (pH concentration and relative humidity).

Acknowledgement

P. N. Rathie thanks the Coordination for the Improvement of Higher Level Personnel (CAPES) for supporting his Senior National Visiting Professorship.

References

[1] Rathie PN, et al. On a new invertible generalized logistic distribution approximation to normal distribution, technical research report in statistics. Applisciephy. 2006;1:120-125.
[2] Jones M, et al. Families of distributions arising from distributions of order statistics. Test. 2004;13:1-43.
[3] Azzalini A. A class of distributions which includes the normal ones. Scand J Stat Theory. 1985;12:171-178.
[4] Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19:716-723.
[5] Xiang Y, et al. Generalized simulated annealing for efficient global optimization: the GenSA package. R J. 2013;5:13-28.
[6] https://rdrr.io/cran/bbmle/
[7] Johnson RW and Carleton C. Fitting percentage of body fat to simple body measurements. J Stat Educ. 1996;4:265-266.
[8] Penrose KW, et al. Generalized body composition prediction equation for men using simple measurement techniques. Med ScieSpo Exe. 1985;17:1-8.
[9] https://cran.r-project.org/web/packages/mfp/index.html
[10] Hartigan JA and Hartigan PM. The dip test of unimodality. Ann Stat. 1985;13:70-84.
[11] https://cran.r-project.org/web/packages/diptest/index.html
[12] http://www.noaa.gov/
[13] Eirado CRDR and Rathie PN. On new invertible skew and symmetric distributions. Stat Rese Lett. 2014;3:17-22.
[14] Growitz DJ, et al. Reconnaissance of mine drainage in the coal fields of eastern Pennsylvania. Tech Rep US Geological Survey. 1985.
[15] Silva RO, et al. On the new skew distributions using Azzalini's formula. Mathematica Aeterna. 2016;6:649-673.
[16] Benaglia T, et al. Mixtools: An R package for analyzing finite mixture models. J Stat Softw. 2009;32:1-29.
[17] Nychka D, et al. Tools for plotting skew-T diagrams and wind profiles, R package version. 2014;1.4.