A simple test with Nucleus data

OK, let's start simple and test whether mean SAP ROE is zero.

The hypotheses are \[ \begin{aligned} \mathcal{H}_0 &: \mu = 0 \\ \mathcal{H}_1 &: \mu \ne 0. \end{aligned} \]

The \(t\)-statistic is \[ \frac{\bar{y} - 0}{s/\sqrt{n}} = \frac{\bar{y}}{s/\sqrt{n}}. \]

nucleus <- read.csv("../data/nucleus.csv")
sap.ybar <- mean(nucleus$ROE)
n <- nrow(nucleus)
sap.se <- sd(nucleus$ROE)/sqrt(n)
tstat <- sap.ybar/sap.se
tstat
## [1] 4.43258

We can use the pt function to calculate the two-sided \(p\)-value:

phi <- 2*pt(-abs(tstat), n-1)
phi
## [1] 2.931474e-05

R provides a simple command automating all of this.

t.test(nucleus$ROE)
## 
##  One Sample t-test
## 
## data:  nucleus$ROE
## t = 4.4326, df = 80, p-value = 2.931e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   6.963478 18.310596
## sample estimates:
## mean of x 
##  12.63704
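
The confidence interval that t.test reports can also be rebuilt by hand as \(\bar{y} \pm t_{0.975,\,n-1}\, s/\sqrt{n}\). The sketch below is only illustrative: it plugs in the rounded summary values printed above instead of reading the data file, takes \(n = 81\) from the reported df = 80, and uses made-up names (ybar, se, ci), so the last digits may differ slightly from the t.test output.

```r
# Rounded summaries copied from the output above (not recomputed from the data).
ybar  <- 12.63704        # sample mean of ROE
tstat <- 4.43258         # t-statistic for H0: mu = 0
n     <- 81              # inferred from df = 80

se <- ybar / tstat       # back out the standard error, since tstat = ybar / se
ci <- ybar + c(-1, 1) * qt(0.975, df = n - 1) * se   # two-sided 95% interval
ci
```

The interval excludes zero, which agrees with the small \(p\)-value.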

A less flimsy straw man

What about testing whether mean SAP ROE agrees with mean industry ROE?

The hypotheses are \[ \begin{aligned} \mathcal{H}_0 &: \mu = \bar{y}^I \\ \mathcal{H}_1 &: \mu \ne \bar{y}^I, \end{aligned} \] where \(\bar{y}^I\) is the sample mean of industry ROE, treated here as a fixed number, and the \(t\)-statistic is

tstat2 <- (sap.ybar - mean(nucleus$IndustryROE))/sap.se
tstat2
## [1] -1.073935

Now the \(p\)-value:

phi2 <- 2*pt(-abs(tstat2), n-1)
phi2
## [1] 0.2860804

The t.test shortcut takes the null value through its mu argument:

t.test(nucleus$ROE, mu=mean(nucleus$IndustryROE))
## 
##  One Sample t-test
## 
## data:  nucleus$ROE
## t = -1.0739, df = 80, p-value = 0.2861
## alternative hypothesis: true mean is not equal to 15.69877
## 95 percent confidence interval:
##   6.963478 18.310596
## sample estimates:
## mean of x 
##  12.63704

Doing the SAP/Industry comparison right

The two-sample test for a difference in means, using a normal approximation for the \(p\)-value:

n <- nrow(nucleus)
ind.ybar <- mean(nucleus$IndustryROE)
delta <- sap.ybar - ind.ybar 
se.delta <- sqrt(var(nucleus$ROE)/n + var(nucleus$IndustryROE)/n)
phi <- 2*pnorm(-abs(delta/se.delta))  
phi
## [1] 0.3103706
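
Written out, the calculation above is \[ z = \frac{\bar{y} - \bar{y}^I}{\sqrt{s^2/n + s_I^2/n}}, \qquad p = 2\,\Phi(-|z|), \] where \(s^2\) and \(s_I^2\) are the sample variances of SAP and industry ROE and \(\Phi\) is the standard normal distribution function. The symbols \(z\), \(s_I^2\) and \(\Phi\) are notation introduced here for convenience; they do not appear in the code.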

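The shortcut in R performs a similar test, the so-called Welch test, which refers the same statistic to a \(t\) distribution with approximate degrees of freedom rather than to the normal.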

t.test(nucleus$ROE, nucleus$IndustryROE)
## 
##  Welch Two Sample t-test
## 
## data:  nucleus$ROE and nucleus$IndustryROE
## t = -1.0144, df = 99.039, p-value = 0.3128
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.050329  2.926872
## sample estimates:
## mean of x mean of y 
##  12.63704  15.69877
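
The small gap between the manual \(p\)-value and the Welch \(p\)-value comes from the reference distribution only. As a rough check, the sketch below reuses the rounded statistic and degrees of freedom printed in the Welch output (the names t.welch, df.welch, p.norm and p.t are made up for illustration) and evaluates both tail probabilities.

```r
# Values copied (rounded) from the Welch output above.
t.welch  <- -1.0144
df.welch <- 99.039

p.norm <- 2 * pnorm(-abs(t.welch))               # normal reference, as in the manual calculation
p.t    <- 2 * pt(-abs(t.welch), df = df.welch)   # t reference, as in the Welch test
c(normal = p.norm, welch = p.t)
```

Because the \(t\) distribution has heavier tails than the normal, the Welch \(p\)-value is slightly larger; with about 99 degrees of freedom the difference is small.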

The explanation is all in a comparison of histograms, which perhaps we should have looked at a long time ago!

hist(nucleus$ROE, main="Atoms and averages")
hist(nucleus$IndustryROE, col=2, add=TRUE)
legend("topright", c("SAP", "Industry"), col=1:2, pch=c(21,19))