DS9 Project

Learning Shiny, Slidify, etc...

CChevalier
Data Science Learner

Remember Course DS6 - Statistical Inference?

In the first project of the DS6 course we investigated the exponential distribution and the Central Limit Theorem.

lambda <- 0.2; n <- 40; nosim <- 1000
demo_exp  <- rexp(n, lambda)

The exponential distribution has mean \(\mu = 1/\lambda\) and standard deviation \(\sigma = 1/\lambda\).

# Mean of the demo distribution: Theory vs Computed demo 
c(1/lambda, mean(demo_exp))
## [1] 5.000000 4.353505
# Standard deviation of the demo distribution: Theory vs Computed demo
c(1/lambda, sd(demo_exp))
## [1] 5.00000 5.27591
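
The gap between theory and the demo values above comes from using a single sample of only 40 draws. A minimal sketch, assuming a fixed seed and a few illustrative sample sizes (neither is part of the original project), shows the estimates moving toward \(1/\lambda\) as the sample grows:

```r
set.seed(1)                      # assumption: fixed seed, only for reproducibility
lambda <- 0.2
sizes <- c(40, 1000, 100000)     # illustrative sample sizes

# One row per sample size: sample mean and sd of rexp() draws
estimates <- t(sapply(sizes, function(n) {
  x <- rexp(n, lambda)
  c(n = n, mean = mean(x), sd = sd(x))
}))
estimates                        # both columns should approach 1/lambda = 5
```

The exact numbers depend on the seed; only the trend toward 5 is the point.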

The Central Limit Theorem

We ran 1000 simulations, each drawing 40 observations from the exponential distribution with rate = lambda, and computed the mean of each simulation.

simus <- matrix(rexp(nosim * n, lambda), ncol = n, byrow = TRUE)
simus_mean <- rowMeans(simus)

According to the Central Limit Theorem (CLT), the distribution of the sample means, \(\bar X\), is approximately normal with mean \(\mu\) and variance \(\sigma^2/n\).

# Overall mean: CLT Theory vs Simulated
c(1/lambda, mean(simus_mean))
## [1] 5.000000 5.055629
# Variance: CLT Theory vs Simulated
c((1/lambda)^2 /n, var(simus_mean))
## [1] 0.6250000 0.6281013
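
Matching the mean and variance does not by itself show that the means look normal. One possible numerical check, sketched here under the assumption of a fixed seed (the names `z` and `coverage` are illustrative, not from the original project), is to standardize the simulated means and count how many fall inside the central 95% interval of a standard normal:

```r
set.seed(123)                    # assumption: fixed seed, only for reproducibility
lambda <- 0.2; n <- 40; nosim <- 1000
simus <- matrix(rexp(nosim * n, lambda), ncol = n, byrow = TRUE)
simus_mean <- rowMeans(simus)

mu <- 1 / lambda                 # theoretical mean
se <- (1 / lambda) / sqrt(n)     # theoretical standard error, sigma / sqrt(n)
z  <- (simus_mean - mu) / se     # standardized sample means

# Share of standardized means within the central 95% of N(0, 1)
coverage <- mean(abs(z) < qnorm(0.975))
c(mean_z = mean(z), sd_z = sd(z), coverage = coverage)
```

If the normal approximation is reasonable, `mean_z` should be near 0, `sd_z` near 1, and `coverage` near 0.95.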

Now there is a ShinyApp for that!

What's in there?

This ShinyApp lets the user experiment with the exponential distribution and assess the validity of the Central Limit Theorem (CLT), as in the first project of Course DS6 - Statistical Inference.

The app first generates 1000 simulations of 50 exponentials each, using a user-specified rate for the exponential distribution (the lambda parameter).

From this pool, the user can select a subset of the pre-computed simulations (n exponentials per simulation, nosim simulations) to assess how these parameters affect the statistical analysis of the simulation means. The app presents the following results in numbered panels:

  1. Basic statistical analysis of a given simulation
  2. Plot of the mean value of each simulation
  3. CLT: Distribution of mean values
  4. CLT: Q-Q plot of mean values
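
The pool-then-subset idea described above can be sketched in plain R, without any Shiny code. This is not the app's actual source: the helper `subset_means`, the fixed seed, and the choice to take the first rows and columns of the pool are all assumptions made for illustration.

```r
# Hypothetical helper: use the first `nosim` simulations and the first `n`
# draws of each simulation from a pre-computed pool, then average each row.
subset_means <- function(pool, n, nosim) {
  stopifnot(n <= ncol(pool), nosim <= nrow(pool))
  rowMeans(pool[seq_len(nosim), seq_len(n), drop = FALSE])
}

set.seed(42)                     # assumption: fixed seed, only for reproducibility
lambda <- 0.2                    # stands in for the user-specified rate
pool <- matrix(rexp(1000 * 50, lambda), nrow = 1000, ncol = 50)

means_small <- subset_means(pool, n = 10, nosim = 200)
means_full  <- subset_means(pool, n = 50, nosim = 1000)

# The theoretical variance of the mean is (1/lambda)^2 / n,
# so the spread of the means should shrink as n grows.
c(n10 = var(means_small), n50 = var(means_full))
```

Calling `hist()` on the returned means, or `qqnorm()` followed by `qqline()`, would give plots along the lines of panels 3 and 4.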

Thank you!