Chapter 16 Sample size for multiple regression

When you conduct a multiple regression study there are two ways to think about a sample size analysis:

  • Focus on having sufficient power to detect the variance accounted for by the set of predictors (i.e., \(R^2\))

  • Focus on having sufficient power to detect the unique variance accounted for by a single predictor (i.e., \(sr^2\))

16.1 Sample size using \(R^2\)

You want to run a multiple regression study in an existing research area using 3 predictors. A past study, \(N\) = 100, found an \(R^2=.20\). How many people do you need in your study to have 90% power for the overall regression equation (i.e., \(R^2\))?

We need to determine two pieces of information to proceed: 1) the effect size expressed as \(f^2\) instead of \(R^2\), and 2) the degrees of freedom for the predictors. We use these two pieces of information in the pwr.f2.test() command from the pwr package.

16.1.1 Determining \(f^2\)

We need to represent \(R^2\) as \(f^2\). We do this with the formula:

\[ f^2=\frac{R^2}{1-R^2} \]

So in R we type:

# f^2 = R^2 / (1 - R^2)
my_f2 <- .20 / (1 - .20)
print(my_f2)
## [1] 0.25

Thus, \(f^2=.25\).
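If you find yourself doing this conversion often, it can be wrapped in a small helper function. The function below, r2_to_f2(), is just an illustrative sketch, not part of the pwr package:

# Convert R^2 to Cohen's f^2 using f^2 = R^2 / (1 - R^2)
r2_to_f2 <- function(r2) {
  r2 / (1 - r2)
}

r2_to_f2(.20)
## [1] 0.25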

16.1.2 Determine degrees of freedom

There are degrees of freedom for the predictors (\(u\)) and for error (\(v\)). The degrees of freedom for the predictors are easy to compute:

Degrees of Freedom Predictors: \(u =\) number of predictors = 3.

16.1.3 Calculate sample size

With multiple regression we give pwr.f2.test() the effect size (\(f^2\) = .25), the degrees of freedom for the predictors (\(u\) = 3), and the desired power (.90). The output returns the degrees of freedom for error (i.e., the denominator degrees of freedom), which requires a bit of work to convert to a sample size.

We begin with the R code below:

library(pwr)
pwr.f2.test(u = 3,
            f2 = .25,
            power = .90)
## 
##      Multiple regression power calculation 
## 
##               u = 3
##               v = 56.75
##              f2 = 0.25
##       sig.level = 0.05
##           power = 0.9

What does this output mean? It provides the degrees of freedom for error (i.e., \(v\)) for your planned study (not the original study). We use this value of \(v\) to calculate the \(N\) needed.

Degrees of Freedom Error: \(v = N - u - 1\); therefore, \(N = u + v + 1\).

We know \(u = 3\) because there were three predictors. Our power analysis revealed we need \(v = 57\) (56.75 rounded up to the next whole number) in our study.

Therefore \(N = u + v + 1 = 3 + 57 + 1\).

N <- 3 + 57 + 1  # N = u + v + 1
print(N)
## [1] 61

Thus, we want an \(N\) of 61 in our study.
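If you prefer to let R handle the rounding and arithmetic, you can store the result of pwr.f2.test() and work with its v element directly. The sketch below is one way to do this; the object names pwr_result and N_needed are just illustrative choices. ceiling() rounds v up so the sample size is not underestimated.

library(pwr)

# Store the power analysis and convert v to a sample size
pwr_result <- pwr.f2.test(u = 3,
                          f2 = .25,
                          power = .90)

# N = u + v + 1, with v rounded up to the next whole number
N_needed <- 3 + ceiling(pwr_result$v) + 1
print(N_needed)
## [1] 61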

16.2 Sample size using \(sr^2\)

You want to run a multiple regression study in an existing research area using 3 predictors (A, B, and C). A past study, \(N\) = 100, found an \(R^2=.20\). You are particularly interested in one predictor (predictor C) that you think will account for 2% of the variance in the criterion above and beyond the other two predictors. Said another way, you believe .02 of the .20 will be a result of the unique contribution of this one predictor. What sample size do you need to ensure you have power of .90 for detecting this increment in variance?

Let’s think about this scenario in terms of blocks to make it a bit clearer. You are describing a scenario where, if you put predictors A and B in Block 1, you would obtain an overall \(R^2 = .18\). Then, when you put predictors A, B, and C into Block 2, you would obtain an overall \(R^2 = .20\), an increment of .02, so \(sr^2\) for predictor C is .02.

We want to conduct a power analysis to determine how many people you need to ensure power of .90 for this incremental prediction effect.

16.2.1 Determine degrees of freedom

We are interested in the incremental prediction of one variable so \(u = 1\).

16.2.2 Determine \(f^2\)

We know \(sr^2=.02\) and \(R^2 = .20\). We can use these to compute the needed \(f^2\).

\[ f^2=\frac{sr^2}{1-R^2} \]

So we type:

# f^2 = sr^2 / (1 - R^2)
my_f2 <- .02 / (1 - .20)

print(my_f2)
## [1] 0.025

16.2.3 Calculate power

We can calculate the required degrees of freedom with the command below:

library(pwr)
pwr.f2.test(u = 1,
            f2 = 0.025,
            power = .90)
## 
##      Multiple regression power calculation 
## 
##               u = 1
##               v = 420.2
##              f2 = 0.025
##       sig.level = 0.05
##           power = 0.9

Thus, the degrees of freedom for error (i.e., \(v\)) is 421 (420.2 rounded up to the next whole number).

We then calculate \(N = u + v + 1\).

In R we type:

N <- 1 + 421 + 1  # N = u + v + 1
print(N)
## [1] 423

Thus, you need an \(N\) of 423.
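If you plan to run this kind of calculation more than once, the steps above can be bundled into a small convenience function. The helper below, n_for_sr2(), is only an illustrative sketch and not part of the pwr package: it converts \(sr^2\) and \(R^2\) to \(f^2\), runs pwr.f2.test(), and returns the required sample size.

library(pwr)

# Illustrative helper (not part of pwr): sample size for an sr^2 increment
n_for_sr2 <- function(sr2, r2, u = 1, power = .90) {
  f2 <- sr2 / (1 - r2)                   # f^2 = sr^2 / (1 - R^2)
  result <- pwr.f2.test(u = u, f2 = f2, power = power)
  u + ceiling(result$v) + 1              # N = u + v + 1, v rounded up
}

n_for_sr2(sr2 = .02, r2 = .20)
## [1] 423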

16.3 Obtained power for \(sr^2\)

Imagine your power analysis tells you that you need an \(N\) of 423 but you can only obtain an \(N\) of 75. What is your power? Simply calculate the degrees of freedom for the predictors and for error and run the command again, this time leaving out power; it will be calculated for you. You can do this before you collect your data if you know in advance you will have a low/restricted \(N\), as is the case for many theses.

In this example, \(R^2=.20\) and \(sr^2=.02\) are the expected effect sizes.

Degrees of Freedom Predictors: \(u = 1\) as above (incremental prediction of one variable)

In terms of error:

Degrees of Freedom Error: \(v = N - u - 1 = 75 - 1 - 1 = 73\).

In terms of effect size:

\(f^2=\frac{sr^2}{1-R^2} = \frac{.02}{1-.20}=.025\).

pwr.f2.test(u = 1,
            v = 73,
            f2 = 0.025)
## 
##      Multiple regression power calculation 
## 
##               u = 1
##               v = 73
##              f2 = 0.025
##       sig.level = 0.05
##           power = 0.2718

In this case, power is .27. Thus, there is only a 27% chance of finding the effect if it exists. Given this, would it make sense to spend the time and energy to run the study?
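To see how power changes with sample size for this same effect, you can repeat the calculation over a range of values of \(N\). The sketch below is one way to do it (the object names Ns and powers are arbitrary); it applies the same conversion \(v = N - u - 1\) at each sample size.

library(pwr)

# Power for the sr^2 effect (u = 1, f^2 = .025) at several possible sample sizes
Ns <- c(75, 150, 250, 423)
powers <- sapply(Ns, function(N) {
  pwr.f2.test(u = 1, v = N - 1 - 1, f2 = 0.025)$power
})
round(powers, 2)

Inspecting (or plotting) powers against Ns gives a quick sense of how much larger the sample would need to be before the study is worth running.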