Chapter 9 Effect sizes from publications
9.2 Objective
In this chapter we focus on obtaining effect size estimates (e.g., correlation or standardized mean difference) from published articles. Often studies do not report effect sizes (e.g., \(d\)-values) or if they do, they don’t report the confidence interval for that effect. We will learn more about confidence intervals in a future chapter, but for now, think of a confidence interval for a study/sample effect (e.g., \(d\)-value) as providing a plausible range of values for the population parameter (e.g., \(\delta\)). Below we learn the R commands needed to obtain effect sizes with confidence intervals from published articles.
The ability to obtain effect size estimates from articles will be come extraordinarily important later in the course. Specifically, when it is time to engage in sample size planning (e.g., for your thesis) you will need to obtain estimates of the effects you are interested in from past research. The commands in this chapter will help you to do so.
9.3 Estimating \(\delta\)
9.3.1 Independent Groups
When conducting an independent-groups t-test there are a few different ways to calculate \(d\)-values. We focus on two approaches in this chapter. First, researchers may assume that any intervention (e.g., control group vs. experimental group) will only affect group means (and not group standard deviations / variances). This is the most common scenario. When researchers have this belief they use a pooled variance approach to calculating the \(d\)-value. That is, the variances of the two groups are pooled (i.e., averaged) to create the denominator for the \(d\)-value. Second, researchers may assume that the intervention (e.g., control group vs. experimental group) will affect both group means and standard deviations/variances. This is a less common situation. When researchers have this belief they use one group (e.g., a control group) as the frame of reference for the comparison. That is, the standard deviation for a single group (e.g., the control group) is used as the denominator for the \(d\)-value.
9.3.1.1 \(d\) denominator: Pooled variance
9.3.1.1.1 Reported \(d\)-value no confidence interval
Scenario: A researcher reports a \(d\)-value of 1.67 in their article and that each condition has 50 people. The researcher believed that the intervention would only influence means and not standard deviations and used a pooled-variance \(d\)-value. They did not report a confidence interval. We can calculate a confidence interval for the independent groups \(d\)-value (pooled variance) with the command below:
## $Lower.Conf.Limit.smd
## [1] 1.211
##
## $smd
## [1] 1.67
##
## $Upper.Conf.Limit.smd
## [1] 2.123
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 1.67, 95% CI [1.21, 2.12]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 1.21 to 2.12.
9.3.1.1.2 Using cell means only
Scenario: A researcher does not report a \(d\)-value but indicates the descriptive statistics for the groups. For the first group \(M\) = 5.00, \(SD\) = 1.20, and \(n\) = 50. For the second group, \(M\) = 3.00, \(SD\) = 1.00, and \(n\) = 50. We can calculate the standardized mean difference (i.e., independent groups \(d\)-value) using the command below:
## [1] 1.811
Our sample estimate of \(\delta\) is \(d\) = 1.81. The 95% confidence interval is obtained below:
## $Lower.Conf.Limit.smd
## [1] 1.34
##
## $smd
## [1] 1.81
##
## $Upper.Conf.Limit.smd
## [1] 2.273
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 1.81, 95% CI [1.34, 2.27]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 1.34 to 2.27.
9.3.1.1.3 t: Equal group sizes
Scenario: A researcher does not report a \(d\)-value, descriptive statistics, or n per group. Fortunately, the researcher did report the \(t\)-value, \(t\)(98) = 8.30, and the fact that the groups were the same size. After reading the article you believe the intervention is only likely to affect the means of the groups and not the variances - so a pooled variance term for the denominator of the \(d\)-value is appropriate. We can use the information provided to calculate a \(d\)-value and confidence interval.
To calculate the \(d\)-value we need to know the number of people per group. We can obtain that information from the degrees of freedom (98) using the formula below.
\[ \begin{aligned} df &= n_1 + n_2 - 2 \end{aligned} \]
Because the two groups were the same size we just use \(n\) instead of \(n_1\) and \(n_2\). We know \(df\) = 98:
\[ \begin{aligned} 98 &= n + n - 2\\ 98 &= 2n - 2\\ 98 + 2 &= 2n \\ 100 &= 2n \\ \frac{100}{2} &= n \\ 50 &= n \\ \end{aligned} \]
## $Lower.Conf.Limit.smd
## [1] 1.201
##
## $smd
## [1] 1.66
##
## $Upper.Conf.Limit.smd
## [1] 2.112
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 1.66, 95% CI [1.20, 2.11]. This \(d\)-value value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 1.20 to 2.11.
9.3.1.1.4 t: Unequal group sizes
Scenario: A researcher does not report a \(d\)-value or descriptive statistics. Fortunately, the researcher did report the \(t\)-value, \(t\)(98) = 9.36, and the fact that the groups had 65 and 35 people. After reading the article you believe the intervention is only likely to affect the means of the groups and not the variances - so a pooled variance term for the denominator of the \(d\)-value is appropriate. We can use the information provided to calculate a \(d\)-value and confidence interval.
## $Lower.Conf.Limit.smd
## [1] 1.465
##
## $smd
## [1] 1.962
##
## $Upper.Conf.Limit.smd
## [1] 2.453
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 1.96. This value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval, 95% CI [1.47, 2.45] indicates that a plausible range for \(\delta\) is 1.47 to 2.45.
9.3.1.2 \(d\) denominator: Control group variance
In this section we focus on calculating the \(d\)-value when the researcher of the published article has used the standard deviation of a single group as the denominator for the \(d\)-value. This is usually done when the researcher believes an experimental intervention will influence both group means and group standard deviations. Researchers who use this approach will hopefully have made it clear what they have done. If you’re not sure what a researcher has done I suggest using the pooled variance approach for the denominator above.
9.3.1.2.1 Reported \(d\)-value no confidence interval
Scenario: A researcher reports a \(d\)-value of 0.85 in their article and that each condition has 50 people. The researcher believed that the intervention would influence both means and standard deviations. Therefore, when they calculated the \(d\)-value they used the standard deviation of a single group (the control group). We can calculate a confidence interval for the independent groups \(d\)-value (control group variance) with the command below:
## $Lower.Conf.Limit.smd.c
## [1] 0.4199
##
## $smd.c
## [1] 0.85
##
## $Upper.Conf.Limit.smd.c
## [1] 1.273
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 0.85, 95% CI [0.42, 1.27]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.42 to 1.27.
9.3.1.2.2 Using cell means
Scenario: A researcher does not report a \(d\)-value but indicates the descriptive statistics for the groups. For the experimental group \(M\) = 5.00, \(SD\) = 1.20, and \(n\) = 50. For the control group, \(M\) = 3.00, \(SD\) = 1.40, and \(n\) = 50. After reading the article you believe that intervention might well influence both means and standard deviations. Therefore, you think a control group standard deviation should be the frame of reference for the \(d\)-value. You can calculate the standardized mean difference (i.e., independent groups \(d\)-value) using the command below:
## [1] 1.429
The confidence interval can be obtained with the command:
## $Lower.Conf.Limit.smd.c
## [1] 0.942
##
## $smd.c
## [1] 1.43
##
## $Upper.Conf.Limit.smd.c
## [1] 1.908
The population effect (\(\delta\)) is unknown but our sample estimate is \(d\) = 1.43, 95% CI [0.94, 1.91]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.94 to 1.91.
9.3.2 Repeated Measures
For a repeated measures design, different \(d\)-values can be reported. We focus on the formula below for repeated measures \(d\)-values. In this formula the symbol, \(\bar{x}_{diff}\), indicates the mean of the column of differences between the two times. Likewise, \(s_{diff}\), is used to indicate the standard deviation of the column of differences.
\[ \begin{aligned} d & = \frac{\bar{x}_{diff}}{s_{diff}} \end{aligned} \]
9.3.2.1 Reported repeated \(d\)-value, no confidence interval
Scenario: A researcher reports a repeated-measures \(d\) = .50 and \(n\) = 150. The 95% confidence interval for this \(d\)-value (i.e., standardized mean difference) can be obtained with the command below.
## $Lower.Conf.Limit.Standardized.Mean
## [1] 0.3295
##
## $Standardized.Mean
## [1] 0.5
##
## $Upper.Conf.Limit.Standardized.Mean
## [1] 0.669
In this repeated measures design the population effect (\(\delta\)) is unknown but our sample estimate of weight loss is \(d = 0.50\), 95% CI [0.33, 0.67]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.33 to 0.67.
9.3.2.2 Using repeated measures mean difference
Scenario: A researcher reports a repeated-measures design on weight loss but does not report the \(d\)-value or confidence interval. He does report, however, that weights decreased on average by 3.00 lbs, SD = 6.00.
In order to understand what the M = 3.00 lbs, SD = 6.00 refers to, consider the following situation. The participants “weigh in” at time 1. They diet for three months. Then they “weigh out” at time 2. There are now two columns (time1, time2) with weight information - one participant per row. We can create a third column, called diff, by subtracting time 1 weights from time 2 weights. That is, diff = time 2 weight - time 1 weight. The new diff column indicates how the weights for each person have changed over the diet. The mean of the diff column is \(\bar{x}_{diff}\) = 3.00 and the standard deviation of the diff column is \(s_{diff}\) = 6.00.
This information can be used in the formula below:
\[ \begin{aligned} d & = \frac{\bar{x}_{diff}}{s_{diff}}\\ &= \frac{-3.00}{6.00} \\ &= -.50 \end{aligned} \] We tend to always report \(d\)-values as positive values so \(d\) = .50. We can get a confidence interval around this value with the code below.
## $Lower.Conf.Limit.Standardized.Mean
## [1] 0.3295
##
## $Standardized.Mean
## [1] 0.5
##
## $Upper.Conf.Limit.Standardized.Mean
## [1] 0.669
In this repeated measures design the population effect (\(\delta\)) is unknown but our sample estimate of weight loss is \(d = 0.50\), 95% CI [0.33, 0.67]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.33 to 0.67.
9.3.2.3 Using repeated measures cell information
Scenario: A researcher reports a repeated-measures design on weight loss but does not report the \(d\)-value, mean difference, or standard deviation for the differences. He does, however, report mean and standard deviation for before and after the weight loss program. As well, he reports the correlation between before diet weights and after diet weights. The mean weight before the diet was M = 140 lbs, SD = 5 whereas after the diet the mean weight was 132 lbs, SD = 6. The correlation between time 1 and time 2 weights was \(r\) =.80.
In order to understand what the M = 140 lbs, SD = 5 refers to, consider the following situation. The participants “weigh in” at time 1. They diet for three months. Then they “weigh out” at time 2. There are now two columns (time1, time2) with weight information. We can create a third column, called diff, by subtracting time 1 weight from time 2 weights. That is, diff = time 2 weight - time 1 weight. The new diff column indicates how the weights for each person have changed over the diet.
The researcher reports descriptive statistics (M, SD) for the time 1 and time 2 columns but nothing about the diff column. We can, however, figure the mean of the diff column, \(bar{x}_{diff}\), and the standard deviation of the diff column, \(s_{diff}\), from the information provided.
Recall the formula for repeated measures \(d\)-value:
\[ \begin{aligned} d & = \frac{\bar{x}_{diff}}{s_{diff}} \end{aligned} \]
We can obtain the numerator easily:
\[ \begin{aligned} \bar{x}_{diff} & = \bar{x}_1 - \bar{x}_2 \\ &= 140 - 132 \\ &= 8 \\ \end{aligned} \] Obtaining the values for the denominator is a bit more complicated. We can calculate \(s_{diff}\) but it requires the standard deviations of the before weights (\(s_1\) = 5) and the after weights (\(s_1\) = 6) - as well as the correlation between the two times (\(r\) = .80).
We begin by calculating \(s_{diff}^2\) which is variance of the column of differences differences:
\[ \begin{aligned} s_{diff}^2 & = s_1^2 + s_2^2 + 2(s_1)(s_2)r_{12} \\ & = 5^2 + 6^2 + 2(5)(6)(.80)\\ & = 25 + 36 + 48\\ & = 109 \end{aligned} \]
We obtain \(s_{diff}\) by taking the square root of \(s_{diff}^2\):
\[ \begin{aligned} s_{diff} & = \sqrt{s_{diff}^2} \\ & = \sqrt{109}\\ & = 10.44031 \end{aligned} \]
Then we calculate the \(d\)-value using the numerator and denominator we calculated:
\[ \begin{aligned} d & = \frac{\bar{x}_{diff}}{s_{diff}}\\ & = \frac{8}{10.44031} \\ & = 0.7662608\\ & = 0.77\\ \end{aligned} \] The \(d\)-value is 0.77 but we need a confidence interval:
## $Lower.Conf.Limit.Standardized.Mean
## [1] 0.5451
##
## $Standardized.Mean
## [1] 0.77
##
## $Upper.Conf.Limit.Standardized.Mean
## [1] 0.9918
In this repeated measures design the population effect (\(\delta\)) is unknown but our sample estimate of weight loss is \(d\) = 0.77, 95% CI [0.55, 0.99]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.55 to 0.99
9.3.2.4 Using repeated measures \(t\)-value
Scenario: A researcher reports a repeated-measures but does not report a d-value, mean difference with SD, or even the sample size. He does, however, report that t(40) = 6.23. The 95% confidence interval for this \(d\)-value (i.e., standardized mean difference) can be obtained with the command below.
\[ \begin{aligned} df &= N - 1 \\ 40 &= N - 1\\ 40 + 1 &= N\\ 41 &= N \end{aligned} \]
We can use the N and \(t\)-value to calculate the \(d\)-value and CI using the command:
## $Lower.Conf.Limit.Standardized.Mean
## [1] 0.5962
##
## $Standardized.Mean
## [1] 0.973
##
## $Upper.Conf.Limit.Standardized.Mean
## [1] 1.341
In this repeated measures design the population effect (\(\delta\)) is unknown but our sample estimate of weight loss is \(d\) = 0.97, 95% CI [0.60, 1.34]. This \(d\)-value will differ from the population effect (\(\delta\)) due to sampling error. The confidence interval indicates that a plausible range for \(\delta\) is 0.60 to 1.34.
9.4 Estimating \(\rho\)
9.4.1 A single correlation
Correlations are frequently reported in the literature. Less common, however, is the reporting of confidence intervals for those correlations. A confidence interval for a reported correlations (e.g., \(r\) = .40) can be obtained if you also have the sample size (\(n\) = 200) with the command below:
## $Lower.Limit
## [1] 0.2766
##
## $Estimated.Correlation
## [1] 0.4
##
## $Upper.Limit
## [1] 0.5104
From this output you see that \(r\) = .40, 95% CI [.28, .51]. The population effect (\(\rho\)) is unknown but our sample estimate of the population correlation is \(r = .30\), 95% CI [.28, .51]. This sample correlation will differ from the population correlation (\(\rho\)) due to sampling error. The confidence interval indicates that a plausible range for \(\rho\) is .28 to .51.
9.4.2 Difference between two correlations
Sometimes a researcher will conduct two studies in the same article and compare the correlations.
Scenario: A researcher reports that in Study 1 he found that \(r\) = .32, N = 300. But in Study 2 he found a correlation of \(r\) = .51, N = 400. You want to know the difference between the correlations and a confidence interval for that difference.
\[ \begin{aligned} \Delta r &= r_2 - r_1 \\ &= .51 - .32 \\ &= .19 \end{aligned} \] You can obtain a confidence interval for this difference between correlations using the code below. Note, however, that this code applies only when the two correlations are from different samples. If the correlations are from the same sample the code is more complex but possible - see the cocor package documentation for details. The code for obtaining the confidence interval for the difference between these two correlations from different samples is below. Examine the last part of the output labeled zou2007.
##
## Results of a comparison of two correlations based on independent groups
##
## Comparison between r1.jk = 0.51 and r2.hm = 0.32
## Difference: r1.jk - r2.hm = 0.19
## Group sizes: n1 = 300, n2 = 400
## Null hypothesis: r1.jk is equal to r2.hm
## Alternative hypothesis: r1.jk is not equal to r2.hm (two-sided)
## Alpha: 0.05
##
## fisher1925: Fisher's z (1925)
## z = 3.0120, p-value = 0.0026
## Null hypothesis rejected
##
## zou2007: Zou's (2007) confidence interval
## 95% confidence interval for r1.jk - r2.hm: 0.0668 0.3105
## Null hypothesis rejected (Interval does not include 0)
From this output you see that in Study 1 \(r\) = .32 and in Study 2 \(r\) = .51. The difference between the correlations \(\Delta r\) = .19, 95% CI [.07, .31]. Both population correlations are unknown as is the difference between the population correlations. We have a sample estimate of the difference between the two population correlations, \(\Delta r\) = .19, and the confidence interval indicates that difference between the population correlations may plausibly be as small as .07 or as large as .31.