TY - JOUR
T1 - Many Labs 5: testing pre-data-collection peer review as an intervention to increase replicability
AU - Ebersole, Charles R.
AU - Mathur, Maya B.
AU - Baranski, Erica
AU - Bart-Plange, Diane Jo
AU - Buttrick, Nicholas R.
AU - Chartier, Christopher R.
AU - Corker, Katherine S.
AU - Corley, Martin
AU - Hartshorne, Joshua K.
AU - IJzerman, Hans
AU - Lazarević, Ljiljana B.
AU - Rabagliati, Hugh
AU - Ropovik, Ivan
AU - Aczel, Balazs
AU - Aeschbach, Lena F.
AU - Andrighetto, Luca
AU - Arnal, Jack D.
AU - Arrow, Holly
AU - Babincak, Peter
AU - Bakos, Bence E.
AU - Baník, Gabriel
AU - Baskin, Ernest
AU - Belopavlović, Radomir
AU - Bernstein, Michael H.
AU - Białek, Michał
AU - Bloxsom, Nicholas G.
AU - Bodroža, Bojana
AU - Bonfiglio, Diane B.V.
AU - Boucher, Leanne
AU - Brühlmann, Florian
AU - Brumbaugh, Claudia C.
AU - Casini, Erica
AU - Chen, Yiling
AU - Chiorri, Carlo
AU - Chopik, William J.
AU - Christ, Oliver
AU - Ciunci, Antonia M.
AU - Claypool, Heather M.
AU - Coary, Sean
AU - Čolić, Marija V.
AU - Collins, W. Matthew
AU - Curran, Paul G.
AU - Day, Chris R.
AU - Dering, Benjamin
AU - Dreber, Anna
AU - Edlund, John E.
AU - Falcão, Filipe
AU - Fedor, Anna
AU - Feinberg, Lily
AU - Ferguson, Ian R.
AU - Schultze, Thomas
AU - Many Labs 5
PY - 2020/9
Y1 - 2020/9
AB - Replication studies in psychological science sometimes fail to reproduce prior findings. If these studies use methods that are unfaithful to the original study or ineffective in eliciting the phenomenon of interest, then a failure to replicate may be a failure of the protocol rather than a challenge to the original finding. Formal pre-data-collection peer review by experts may address shortcomings and increase replicability rates. We selected 10 replication studies from the Reproducibility Project: Psychology (RP:P; Open Science Collaboration, 2015) for which the original authors had expressed concerns about the replication designs before data collection; only one of these studies had yielded a statistically significant effect (p < .05). Commenters suggested that lack of adherence to expert review and low-powered tests were the reasons that most of these RP:P studies failed to replicate the original effects. We revised the replication protocols and received formal peer review prior to conducting new replication studies. We administered the RP:P and revised protocols in multiple laboratories (median number of laboratories per original study = 6.5, range = 3–9; median total sample = 1,279.5, range = 276–3,512) for high-powered tests of each original finding with both protocols. Overall, following the preregistered analysis plan, we found that the revised protocols produced effect sizes similar to those of the RP:P protocols (Δr = .002 or .014, depending on analytic approach). The median effect size for the revised protocols (r = .05) was similar to that of the RP:P protocols (r = .04) and the original RP:P replications (r = .11), and smaller than that of the original studies (r = .37). Analysis of the cumulative evidence across the original studies and the corresponding three replication attempts provided very precise estimates of the 10 tested effects and indicated that their effect sizes (median r = .07, range = .00–.15) were 78% smaller, on average, than the original effect sizes (median r = .37, range = .19–.50).
KW - metascience
KW - open data
KW - peer review
KW - preregistered
KW - Registered Reports
KW - replication
KW - reproducibility
U2 - 10.1177/2515245920958687
DO - 10.1177/2515245920958687
M3 - Article
AN - SCOPUS:85096661669
SN - 2515-2459
VL - 3
SP - 309
EP - 331
JO - Advances in Methods and Practices in Psychological Science
JF - Advances in Methods and Practices in Psychological Science
IS - 3
ER -