## A simple test with Nucleus data

OK, let's start simple and test whether the mean SAP ROE is zero.

The hypotheses are $\begin{aligned} \mathcal{H}_0 &: \mu = 0 \\ \mathcal{H}_1 &: \mu \ne 0. \end{aligned}$

The $$t$$-statistic is $\frac{\bar{y} - 0}{s/\sqrt{n}} = \frac{\bar{y}}{s/\sqrt{n}}.$

nucleus <- read.csv("../data/nucleus.csv")
sap.ybar <- mean(nucleus$ROE)
n <- nrow(nucleus)
sap.se <- sd(nucleus$ROE)/sqrt(n)
tstat <- sap.ybar/sap.se
tstat
## [1] 4.43258

We can use the pt function to calculate the $$p$$-value.

• By default pt returns the lower-tail probability, so we evaluate it at $$-|t|$$ and double the result to get a two-sided $$p$$-value.
phi <- 2*pt(-abs(tstat), n-1)
phi
## [1] 2.931474e-05
• This value is tiny, giving strong evidence against the null.
• If we choose $$\alpha = 0.05$$ we clearly reject the null.
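
As a quick check on the tail logic, the sketch below computes the two-sided $$p$$-value in two equivalent ways: by doubling the lower tail at $$-|t|$$, and by doubling the upper tail at $$|t|$$. It hard-codes the t-statistic printed above, and it assumes n = 81, which is inferred from the df = 80 that t.test reports for these data.

```r
# Values taken from the output in this section; n = 81 is inferred from df = 80.
tstat <- 4.43258
n <- 81

# Lower tail at -|t|, doubled (the approach used above).
p.lower <- 2 * pt(-abs(tstat), df = n - 1)

# Upper tail at |t|, doubled; the t distribution is symmetric, so this agrees.
p.upper <- 2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE)

c(p.lower, p.upper)
```

Both entries should agree with the value of phi shown above, up to the rounding of the t-statistic.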

R provides a simple command automating all of this.

t.test(nucleus$ROE)
##
##  One Sample t-test
##
## data:  nucleus$ROE
## t = 4.4326, df = 80, p-value = 2.931e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   6.963478 18.310596
## sample estimates:
## mean of x
##  12.63704
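
The confidence interval in this output can be rebuilt by hand as $$\bar{y} \pm t_{0.975,\,n-1}\, s/\sqrt{n}$$. The sketch below does so from the printed numbers only: the standard error is backed out as the sample mean divided by the t-statistic, and n = 81 is inferred from df = 80. Because the inputs are rounded, the result matches the printed interval only to a few decimal places.

```r
# Values copied from the output above (rounded as printed).
ybar  <- 12.63704   # sample mean
tstat <- 4.43258    # t-statistic for H0: mu = 0
n     <- 81         # inferred from df = 80

# Since tstat = ybar / se, the standard error is ybar / tstat.
se <- ybar / tstat

# Half-width of the 95 percent interval uses the 0.975 quantile of t with n - 1 df.
half.width <- qt(0.975, df = n - 1) * se
ci <- c(ybar - half.width, ybar + half.width)
ci
```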

## A less flimsy straw man

What about testing whether mean SAP ROE agrees with mean industry ROE?

• Let $$\bar{y}^I$$ be the industry average ROE calculated from the data.
• This isn't quite right, again, but it makes a nice illustration.

The hypotheses are $\begin{aligned} \mathcal{H}_0 &: \mu = \bar{y}^I \\ \mathcal{H}_1 &: \mu \ne \bar{y}^I \end{aligned}$ and the $$t$$-stat is

tstat2 <- (sap.ybar-mean(nucleus$IndustryROE))/sap.se
tstat2
## [1] -1.073935

Now the $$p$$-value:

phi2 <- 2*pt(-abs(tstat2), n-1)
phi2
## [1] 0.2860804

t.test(nucleus$ROE, mu=mean(nucleus$IndustryROE))
##
##  One Sample t-test
##
## data:  nucleus$ROE
## t = -1.0739, df = 80, p-value = 0.2861
## alternative hypothesis: true mean is not equal to 15.69877
## 95 percent confidence interval:
##   6.963478 18.310596
## sample estimates:
## mean of x
##  12.63704
• Not enough evidence to reject the null at $$\alpha = 0.05$$.
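
Both one-sample outputs report the same 95 percent confidence interval, because the interval does not depend on the hypothesised mean. A two-sided test at $$\alpha = 0.05$$ rejects exactly when the hypothesised mean falls outside that interval: 0 lies outside it, while 15.69877 lies inside. The sketch below illustrates this on simulated data, since the Nucleus file is not assumed to be available; the sample size, mean, standard deviation, and grid of null values are all made up for illustration.

```r
# Simulated stand-in for the ROE data (hypothetical values, not the Nucleus data).
set.seed(42)
y <- rnorm(81, mean = 12, sd = 25)

# The 95 percent interval is the same whatever null value we test.
ci <- t.test(y)$conf.int

# Test a grid of hypothesised means at alpha = 0.05.
mu0.grid <- c(0, 5, 10, 15, 20)
rejects <- sapply(mu0.grid, function(m) t.test(y, mu = m)$p.value < 0.05)

# A null value is rejected exactly when it lies outside the interval.
outside <- mu0.grid < ci[1] | mu0.grid > ci[2]
data.frame(mu0 = mu0.grid, rejects = rejects, outside = outside)
```

The rejects and outside columns should be identical row by row.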

## Doing the SAP/Industry comparison right

The two-sample test for a difference in means:

n <- nrow(nucleus)
ind.ybar <- mean(nucleus$IndustryROE)
delta <- sap.ybar - ind.ybar
se.delta <- sqrt(var(nucleus$ROE)/n + var(nucleus$IndustryROE)/n)
phi <- 2*pnorm(-abs(delta/se.delta))
phi
## [1] 0.3103706
• Not enough evidence to reject the null hypothesis!
• The data are consistent with these populations having the same mean.

The shortcut in R performs a similar, so-called Welch test.

t.test(nucleus$ROE, nucleus$IndustryROE)
##
##  Welch Two Sample t-test
##
## data:  nucleus$ROE and nucleus$IndustryROE
## t = -1.0144, df = 99.039, p-value = 0.3128
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.050329  2.926872
## sample estimates:
## mean of x mean of y
##  12.63704  15.69877

The explanation is all in a comparison of histograms, which perhaps we should have looked at a long time ago!

hist(nucleus$ROE, main="Atoms and averages")
legend("topright", c("SAP", "Industry"), col=1:2, pch=c(21,19))
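
To see how the hand calculation in this section relates to the Welch test, note that both use the same statistic, $$\delta / \mathrm{se}(\delta)$$. The hand version refers it to a normal distribution, while Welch refers it to a $$t$$ distribution whose degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below reproduces t.test by hand on simulated data; the two samples are hypothetical stand-ins, not the Nucleus data.

```r
# Two simulated samples with unequal variances (hypothetical values).
set.seed(1)
x <- rnorm(81, mean = 12, sd = 25)
y <- rnorm(81, mean = 15, sd = 10)

# Squared standard errors of the two sample means.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Same statistic as delta / se.delta above.
tstat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximate degrees of freedom.
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

p.welch <- 2 * pt(-abs(tstat), df)   # t reference distribution (Welch)
p.norm  <- 2 * pnorm(-abs(tstat))    # normal reference distribution (hand version)

fit <- t.test(x, y)                  # Welch is the default (var.equal = FALSE)
c(by.hand = p.welch, t.test = fit$p.value, normal = p.norm)
```

The normal-based $$p$$-value is never larger than the Welch one, since the $$t$$ distribution has heavier tails, which matches the small gap between 0.3104 and 0.3128 in the output above.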