A Note On The Asymptotic Convergence of Bernoulli Distribution

Adeniran Adefemi T; Ojo J. F.; Olilima J. O

A Note On The Asymptotic Convergence of Bernoulli Distribution

Adeniran Adefemi T^1*, Ojo J. F.² and Olilima J. O¹

¹Department of Mathematical Sciences, Augustine University Ilara-Epe, Nigeria

²Department of Statistics, University of Ibadan, Ibadan, Nigeria

*Corresponding Author:: Adeniran Adefemi T
Department of Mathematical Sciences, Augustine University Ilara-Epe, Nigeria.
E-mail: adefemi.adeniran@augustineuniversity.edu.ng

Received Date:20/10/2018; Accepted Date: 05/11/2018; Published Date: 12/11/2018

Visit for more related articles at Research & Reviews: Journal of Statistics and Mathematical Sciences

Abstract

This paper presents concepts of Bernoulli distribution, and how it can be used as an approximation of Binomial, Poisson and Gaussian distributions with different approach from earlier existing literatures. Due to discrete nature of the random variable X, a more appropriate method of Principle of Mathematical Induction (PMI) is used as an alternative approach to limiting behavior of binomial random variable. The study proved de'Moivres Laplace theorem (convergence of binomial distribution to Gaussian distribution) to all values of p such that p p ≠ 0 and p ≠ 1 using a direct approach which opposes the popular and most widely used indirect method of moment generating function.

Keywords

Bernoulli distribution, binomial distribution, Poisson distribution, Gaussian distribution, Principle of Mathematical Induction, convergence, moment generating function. AMS subject classication: 60B10, 60E05, 60F25

Introduction

If X is a random variable, the function f(x) whose value is P[X=x] for each value x in the range of X is called probability distribution [1]. There are various Probability distributions used in Sciences, Engineering and Social Sciences. They are numerous to the extent that there is no single literature that can give their comprehensive list. Bernoulli distribution is one of the simplest probability distributions in literature. An experiment consisting of only two mutually exclusive possible outcomes, success or failure, male or female, life or death, defective or non-defective, present or absent is called a Bernoulli trial [1-3]. These Independent trials having a common success of probability p were first studied by the Swiss mathematician, Jacques Bernoulli in his book Ars Conjectandi, published by his nephew Nicholas eight years after his death in 1713 [2,4-8].

Definition 1.1 (The Bernoulli Distribution)

If X is a random variable that takes on the value 1 when the Bernoulli trial is a success (with probability of success p є (0; 1)) and the value 0 when the Bernoulli trial is a failure (with probability of failure q=1-p), then X is called a Bernoulli Random Variable with probability function.

Equation (1)

The function (1), where 0 < p < 1 and p+q=1, is called the Bernoulli probability function.

The distribution is rarely applied in real life situation because of its simplicity and because it has no strength of modeling a metric variable as it is restricted to whether an event occur or not with probabilities p and 1-p, respectively [9]. Despite this limitation, Bernoulli distribution turns (converge) to many compound and widely used robust distributions; n-Bernoulli distribution is binomial, limiting-value of binomial distribution gives Poisson distribution, binomial distribution under certain conditions yields Gaussian distribution and so on [1].

Convergence of a particular probability distribution to another and other related concepts such as limiting value of probability function or proofing of central limit theorem and law of large number had been thoroughly dealt with in literature by many authors [10-13]. All these studies adopted moment generating function (mgf) approach, perhaps because it is easy to tract and the mathematics involved is less rigorous. Based on their approach (mgf approach), they were able to establish the convergence with a valid evidence because of uniqueness property of the mgf which asserts that.

Definition 1.2 (Uniqueness Theorem)

Suppose F_X and F_Y are two cumulative distribution functions (cdf 's) whose moments exist, if the mgf's exist for the r.v.'s X and Y and M_X(t)=M_Y(t) for all t in -h<t<h, h>0, then, FX(u)=Y(u) for all u. That is, f(x)=f(y) for all u.

The limitation of this approach is that mgf does not exist for all distributions, yet the particular distribution approach normal or converge to other distribution under some specified condition(s). For example, the log normal distribution doesn't have a mgf, still it converges to normal distribution [14,15]. A standard/direct proof of this more general theorem uses the characteristic function which is defined for any distribution [10,11,13,16-19]. Opined that direct proving of convergence of a particular probability density function (pdf) to another pdf as n increasing indefinitely is rigorous, since it is based on the use of characteristic functions theory which involves complex analysis, the study of which primarily only advanced mathematics majors students and professors in colleges and universities understand. In this paper, we employed the direct approach to proof convergence of binomial distribution to Poisson and Gaussian distributions with a lucid explanation and supports our theorems with copious lemma to facilitate and enhance students understanding regardless of their mathematical background vis-a-vis their level.

The structure of the paper is as follows; we provide some useful preliminary results in section 2, these results will be used in section 3 where we give details convergence of Bernoulli to Binomial, Poisson and Gaussian distributions. Section 4 contains some concluding remarks.

Preliminaries

In this section, we state some results that will be used in various proofs presented in section 3.

Equation

Proof 2.1 The proof of this lemma is as follows: using the binomial series expansion, Equation can be expressed as follows;

Equation

Taking limit of the preceding equation as n → ∞ gives

Equation

The RHS of the preceding equation is the same with Maclaurin series expansion of e^-λ. Hence,

Equation (2)

Lemma 2.2 Prove that Equation

Proof 2.2

Equation

Equation (3)

Lemma 2.3 (Stirling Approximation Principle) Given an integer α; α>0, the factorial of a large number α can be replaced with the approximation Equation

Proof 2.3 This lemma can be derived using the integral definition of the factorial,

Equation (4)

Note that the derivative of the logarithm of the integrand can be written

Equation

The integrand is sharply peaked with the contribution important only near x=α. Therefore, let x=α+δ where δ ≤ α, and write

Equation

Equation (5)

Taking the exponential of each side of (5) gives

Equation (6)

Plugging (6) into the integral expression for α! that is, (4) gives

Equation (7)

From (7), Suppose

Equation

Transforming I2 from algebra to polar coordinates yields δ=ρ cos θ, k=ρ sin θ and δ2+k2=ρ2. Jacobian (J) of the transformation is

Equation

Hence,

Equation

Substituting for I in (7) gives

Equation (8)

Convergence of Bernoulli Distribution to Other Probability Distribution

Bernoulli Probability Function to Binomial Probability Distribution

Theorem 3.1 (Convergence of bernoulli to binomial)

Suppose that the outcomes are labeled success and failure in n Bernoulli trials, and that their respective probabilities are p and q=1-p. If Sn=X is the total number of success in the n Bernoulli trials, what is the probability function of x? In other word, what is P[Sn=x] or P[X=x] for x=0,1,2……n?

Proof 3.1 To solve this problem, let us first consider the event of obtaining a success on each of the first x Bernoulli trials followed by failures on each of the remaining n-x Bernoulli trials. To compute the probability of this event, let si denote the event of success and fi the event of failure on the i^th Bernoulli trial. Then, the probability we seek is P(s1s2….sxfx+1fx+2…fn). Now, because we are dealing with repeated performances of identical experiments (Bernoulli experiments), the events s1s2….sxfx+1fx+2…fn are independent. Therefore, the probability of getting x success and n-x failures in the one specific order designated below is

Equation (9)

In fact, the probability of getting x success and n-x failures in any one specific order is px(1-p)n-x. Thus, the probability of getting exactly x successes in n Bernoulli trials is the number of ways in which x successes can occur among the n trials multiplied by the quantity px(1-p)n-x. Now, the number of ways in which x successes can occur among the n trials is the same as the number of distinct arrangements of n items of which x are of one kind (success) and n-x are of another kind (failure). In counting techniques, this number is the binomial coefficient

Equation (10)

In conclusion, the probability function of Sn or X, the number of successes in n Bernoulli trials, is the product of (9) and (10) which gives

Equation (11)

This probability function is called the binomial probability function or the binomial probability distribution, or simply binomial distribution because of the role of the binomial coefficient in deriving the exact expression of the function. In view of its importance in probability and statistics there is a special notation for this function, as we shall see in the definition given below.

Definition 3.1 (The binomial probability function)

Let nєz+ i.e, be a positive integer, and let p and q be probabilities, with p+q=1. The function (11)

Equation

is called the binomial probability function. A random variable is said to have a binomial distribution, and is referred to as a binomial random variable, if its probability function is the binomial probability function.

Binomial Probability Function To Poisson Probability Function

In this section, we prefer to take different approach (Principle of Mathematical Induction) to show that as n number of Bernoulli trials tends to infinity, the binomial probability binom[X=x;n,p] converges to the Poisson probability POI[X=x;λ] provided that the value of p is allowed to vary with n so that np remains equal to λ for all values of n. This is formally stated in the following theorem, which is known as Poisson's Limit Law/Theorem.

Theorem 3.2 (Poisson's limit theorem)

If p varies with n so that np=λ, where λ is a positive constant, then

Equation

Proof 3.2 Required to show that as n → ∞, p↓0 and np → λ (a constant), one has X(n,p) →D X where X is Poisson with mean λ. Using (11), if x = 0 then

Equation

From Theorem (3.2), as n → ∞, p ↓ 0 and np → λ (a constant) implies Equation

Equation

Similarly; when x=1

Equation

Again, when x=2,

Equation

For x=k,

Equation

Therefore,

Equation (12)

Now for x=k+1

Equation

Multiplying numerator and the denominator of the preceding equation by (n-k) gives

Equation

Note: As 5 X 4! = 5!, so (n-k)(n-k-1)!=(n-k)! and (k+1)!=(k+1)k!

Equation

Equation and from (12), the limit inside the square bracket is Hence,

Equation (13)

Since this is true for x=0, 1, 2,…, k and k+1, then it is true Equation . That is,

Equation (14)

This is the required Poisson probability distribution function introduced by a French mathematician Simeon Denis Poisson (1781-1840).

Definition 3.2 (Poisson random variable) A random variable X, taking on one of the values 0, 1, 2,…, (a count of events occurring at random in regions of time or space) is said to be a Poisson random variable with parameter λ if for some λ > 0,

Equation (14) defines a probability mass function, since

Equation (15)

Remarks 3.1 Thus, when n is large and p is small, we have shown using Principle of Mathematical Induction approach that Poisson with parameter λ is a very good approximation to the distribution of the number of success in "n" independent Bernoulli trials.

Binomial Distribution to Gaussian Distribution

In probability theory, de Moivre Laplace theorem asserts that under certain conditions, the probability mass function of the random number of "successes" observed in a series of n independent Bernoulli trials, each having probability p of success, converges to the probability density function of the normal distribution with mean np and standard deviation Equation as n grows large, assuming p is not 0 or 1. The theorem appeared in the second edition of The Doctrine of Chances by Abraham de Moivre, published in 1738 [6,17]. Although de Moivre did not use the term "Bernoulli trials", he wrote about the probability distribution of the number of times "heads" appears when a coin is tossed 3600 times. He proved the result for p=1/2 [7,20,21]. In this paper, we support the prove to all values of p such that p ≠ 0 and p ≠ 1.

Theorem 3.3 (de Moivre's Laplace Limit Theorem) As n grows large, for x in the neighborhood of np we can approximate

Equation (16)

in the sense that the ratio of the left-hand side to the right-hand side converges to 1 as N → ∞, x → ∞ and p not too small and not too big.

Proof 3.3 From equation (11), the binomial probability function is

Equation (17)

Hence by lemma (2.3) we have

Equation

Multiplying both numerator and denominator by Equation , we have;

Equation

Equation (18)

Change variables x=Np+ε, ε measures the distance from the mean of the Binomial Np, and the measured quantity r. The variance of a Binomial is Np(1-p), so the typical deviation of x from Np is given by Equation . Terms of the form will therefore be of order and will be small.

Re-write (18) in terms of ε.

Equation

Equation (19)

Rewrite (19) in exponential form to have

Equation

Equation (20)

Suppose f(x)=ln(1+x), using Maclaurin series Equation and similarly Therefore

Equation (21)

Equation (22)

Putting (21) and (22) in (20) and simplify, we have

Equation

Recall that x=Np+ε which implies that ε2=(x-Np)2. From Binomial distribution Np=μ and Np(1-p)=σ2 which also implies that Making appropriate substitution of these in the equation (23) above provides;

Equation (24)

We have easily derived equation (24) that is popularly and generally known anywhere in the whole world as Normal or Gaussian distribution.

Definition 3.3 (Normal Distribution) A continuous random variable X has a normal distribution, is said to be normally distributed, and is called a normal random variable if its probability density function can be defined as follows: Let μ and σ be constants with -∞ < μ < ∞ and σ > 0. The function (24)

Equation

is called the normal probability density function with parameters μ and σ.

The normal probability density function is without question the most important and most widely used distribution in statistics [20]. It is also called the "Gaussian curve" named after the Mathematician Karl Friedrich Gauss.

To verify that equation (24) is a proper probability density function with parameters μ and σ2 is to show that the integral

Equation

is equal to 1.

Change variables of integration by letting Equation which implies that dx=σdz. Then

Equation

Now

Equation

or equivalently

Equation

Here x; y are dummy variables. Switching to polar coordinate by making the substitutions x=r cos θ, y=r sin θ produces the Jacobian of the transformation as

Equation (25)

Equation

Put Equation Therefore,

Equation

Thus I=1, indicating that (24) is a proper p.d.f.

Concluding Remarks

It is now well-known that a Binomial r.v. is the sum of i.i.d. Bernoulli r.v.s, Poison r.v. arises from Binomial (n Bernoulli trial) with n increasing indefinitely, p reducing to 0 so that np=λ (a constant), Also, when n (number of trials in a Bernoulli experiment) increases without bound and p ≠ 0 and p ≠1 (that is, p is moderate), the resulting limiting distribution is Gaussian with mean (μ)=np and variance (σ²)=np(1-p). This our alternative technique provides a direct approach of convergence, this material should be of pedagogical interest and can serve as an excellent teaching reference in probability and statistics classes where only basic calculus and skills to deal with algebraic expressions are the only background requirements. The proofs are straightforward and require only an additional knowledge of Maclaurin series expansion, gamma function and basic limit concepts which were thoroughly dealt with in preliminaries section.

References

Hogg RV, et al. Probability and Statistical Inference, Fifth edition, Prentice-Hall,Inc. Simon and Schuster/ A Viacom Company, Upper Saddle River, New Jersey 07458, United State of America,. 1997.
Peggy T S. A First Course in Probability and Statistics with Applications, second ed., Harcourt Brace Jovanovich (HBJ) Publishers, Washington, D.C. 1989.
Walck C. Hand-book on Statistical Distributions for Experimentalists, Particle Physics Group, Fysikum University of Stockholm. 2007.
Baclawski K, et al. An Introduction to Probability and Random Processes. 1979.
Sering RJ. Approximation Theorems of Mathematical Statistics, Wiley. 1980.
Stigler SM. The History of Statistics: The Measurement of Uncertainty before 1900, Cambridge, MA: The Belknap Press of Harvard University Press. 1986.
Soong TT. Fundamentals of Probability and Statistics for Engineers, John Wiley and Sons. 2004.
Reinhard, V. Probability and Statistics: Probability Theory, Stochastic Processes and Random Fields. Eolss Publishers. 2009.
Billingsley P. Probability and Measure, 3rd edition, Wiley. 1995.
Inlow, M. A moment generating function proof of the Lindeberg-Levy central limit theorem. Am Statistician. 2010;64:228-230.
Bagui S C, et al. A few counter examples useful in teaching central limit theorem, The Am Statistician. 2013a;67:49-56.
Bagui SC, et al. Non-rigorous proof's Stirling's formula, Mathematics and Computer Education, 2013b;47:115-125.
Bagui SC, et al. Convergence of Binomial, Poisson, Negative-Binomial, and Gamma to Normal Distribution: Moment Generating Functions Technique. Am J Math Stat. 2016;6:115-121.
Reed WJ, et al. The double Pareto-lognormal distribution-a new parametric model for size distributions. Communications in Statistics-Theory and Methods. 2004;33:1733-1753.
Lesigne E. Heads or tails: An introduction to limit theorems in probability. Am Math Soc. 2005.
Feller, W. An Introduction to Probability Theory and Its Applications. John Wiley and Sons 1973.
Proschan MA. The Normal approximation to the binomial. The American Statistician, 2008;62:62-63.
Bain LJ. Et al. Introduction to Probability and Mathematical Statistics. Belmont: Duxbury Press. 1992.
Jeffrey DB, et al. Illustrating the Law of Large Numbers. Am Statistical Assoc. 2003;57.
Richards JI, et al. Theory of Distributions: A Non-technical Introduction. Cambridge University Press. 1990.
Ramana BV. Higher Engineering Mathematics, Tata McGraw Hill Publishing Company Limited. 2008.