
Justify your alpha

D. Lakens, F.G. Adolfi, C.J. Albers, F. Anvari, M.A.J. Apps, S.E. Argamon, T. Baguley, R.B. Becker, S.D. Benning, D.E. Bradford, E.M. Buchanan, A.R. Caldwell, B. Van Calster, R. Carlsson, S.-C. Chen, B. Chung, L.J. Colling, G.S. Collins, Z. Crook, E.S. Cross, S. Daniels, Henrik Danielsson, L. Debruine, D.J. Dunleavy, B.D. Earp, M.I. Feist, J.D. Ferrell, J.G. Field, N.W. Fox, A. Friesen, C. Gomes, M. Gonzalez-Marquez, J.A. Grange, A.P. Grieve, R. Guggenberger, J. Grist, A.-L. Van Harmelen, F. Hasselman, K.D. Hochard, M.R. Hoffarth, N.P. Holmes, M. Ingre, Peder Isager, H.K. Isotalus, C. Johansson, K. Juszczyk, D.A. Kenny, A.A. Khalil, B. Konat, J. Lao, E.G. Larsen, G.M.A. Lodder, J. Lukavský, C.R. Madan, D. Manheim, S.R. Martin, A.E. Martin, D.G. Mayo, R.J. McCarthy, K. McConway, C. McFarland, A.Q.X. Nio, G. Nilsonne, C.L. De Oliveira, J.-J.O. De Xivry, S. Parsons, G. Pfuhl, K.A. Quinn, J.J. Sakon, S.A. Saribay, I.K. Schneider, M. Selvaraju, Z. Sjoerds, S.G. Smith, T. Smits, J.R. Spies, V. Sreekumar, C.N. Steltenpohl, N. Stenhouse, W. Swiatkowski, M.A. Vadillo, M.A.L.M. Van Assen, M.N. Williams, S.E. Williams, D.R. Williams, T. Yarkoni, I. Ziano and R.A. Zwaan

The self-archived postprint version of this journal article is available at Linköping University Institutional Repository (DiVA): http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-146130

N.B.: When citing this work, cite the original publication:
Lakens, D., Adolfi, F., Albers, C., Anvari, F., Apps, M., Argamon, S., Baguley, T., Becker, R., Benning, S., Bradford, D., Buchanan, E., Caldwell, A., Van Calster, B., Carlsson, R., Chen, S., Chung, B., Colling, L., Collins, G., Crook, Z., Cross, E., Daniels, S., Danielsson, H., Debruine, L., Dunleavy, D., Earp, B., Feist, M., Ferrell, J., Field, J., Fox, N., Friesen, A., Gomes, C., Gonzalez-Marquez, M., Grange, J., Grieve, A., Guggenberger, R., Grist, J., Van Harmelen, A., Hasselman, F., Hochard, K., Hoffarth, M., Holmes, N., Ingre, M., Isager, P., Isotalus, H., Johansson, C., Juszczyk, K., Kenny, D., Khalil, A., Konat, B., Lao, J., Larsen, E., Lodder, G., Lukavský, J., Madan, C., Manheim, D., Martin, S., Martin, A., Mayo, D., McCarthy, R., McConway, K., McFarland, C., Nio, A., Nilsonne, G., De Oliveira, C., De Xivry, J., Parsons, S., Pfuhl, G., Quinn, K., Sakon, J., Saribay, S., Schneider, I., Selvaraju, M., Sjoerds, Z., Smith, S., Smits, T., Spies, J., Sreekumar, V., Steltenpohl, C., Stenhouse, N., Swiatkowski, W., Vadillo, M., Van Assen, M., Williams, M., Williams, S., Williams, D., Yarkoni, T., Ziano, I., Zwaan, R. (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168-171. https://doi.org/10.1038/s41562-018-0311-x

Copyright: Nature Publishing Group
http://www.nature.com/

Justify Your Alpha

In Press, Nature Human Behavior

Daniel Lakens*1, Federico G. Adolfi2, Casper J. Albers3, Farid Anvari4, Matthew A. J. Apps5, Shlomo E. Argamon6, Thom Baguley7, Raymond B. Becker8, Stephen D. Benning9, Daniel E. Bradford10, Erin M. Buchanan11, Aaron R. Caldwell12, Ben van Calster13, Rickard Carlsson14, Sau-Chin Chen15, Bryan Chung16, Lincoln J Colling17, Gary S. Collins18, Zander Crook19, Emily S. Cross20, Sameera Daniels21, Henrik Danielsson22, Lisa DeBruine23, Daniel J. Dunleavy24, Brian D. Earp25, Michele I. Feist26, Jason D. Ferrell27, James G. Field28, Nicholas W. Fox29, Amanda Friesen30, Caio Gomes31, Monica Gonzalez-Marquez32, James A. Grange33, Andrew P. Grieve34, Robert Guggenberger35, James Grist36, Anne-Laura van Harmelen37, Fred Hasselman38, Kevin D. Hochard39, Mark R. Hoffarth40, Nicholas P. Holmes41, Michael Ingre42, Peder M. Isager43, Hanna K. Isotalus44, Christer Johansson45, Konrad Juszczyk46, David A. Kenny47, Ahmed A. Khalil48, Barbara Konat49, Junpeng Lao50, Erik Gahner Larsen51, Gerine M. A. Lodder52, Jiří Lukavský53, Christopher R. Madan54, David Manheim55, Stephen R. Martin56, Andrea E. Martin57, Deborah G. Mayo58, Randy J. McCarthy59, Kevin McConway60, Colin McFarland61, Amanda Q. X. Nio62, Gustav Nilsonne63, Cilene Lino de Oliveira64, Jean-Jacques Orban de Xivry65, Sam Parsons66, Gerit Pfuhl67, Kimberly A. Quinn68, John J. Sakon69, S. Adil Saribay70, Iris K. Schneider71, Manojkumar Selvaraju72, Zsuzsika Sjoerds73, Samuel G. Smith74, Tim Smits75, Jeffrey R. Spies76, Vishnu Sreekumar77, Crystal N. Steltenpohl78, Neil Stenhouse79, Wojciech Świątkowski80, Miguel A. Vadillo81, Marcel A. L. M. Van Assen82, Matt N. Williams83, Samantha E. Williams84, Donald R. Williams85, Tal Yarkoni86, Ignazio Ziano87, Rolf A. Zwaan88

Affiliations

*1 Human-Technology Interaction, Eindhoven University of Technology, Den Dolech, 5600MB, Eindhoven, The Netherlands

2 Laboratory of Experimental Psychology and Neuroscience (LPEN), Institute of Cognitive and Translational Neuroscience (INCYT), INECO Foundation, Favaloro University, Pacheco de Melo 1860, Buenos Aires, Argentina
2 National Scientific and Technical Research Council (CONICET), Godoy Cruz 2290, Buenos Aires, Argentina
3 Heymans Institute for Psychological Research, University of Groningen, Grote Kruisstraat 2/1, 9712TS Groningen, The Netherlands
4 College of Education, Psychology & Social Work, Flinders University, Adelaide, GPO Box 2100, Adelaide, SA, 5001, Australia
5 Department of Experimental Psychology, University of Oxford, New Radcliffe House, Oxford, OX2 6GG, UK
6 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 10 W. 31st Street, Chicago, IL 60645, USA
7 Department of Psychology, Nottingham Trent University, Nottingham, 50 Shakespeare Street, Nottingham, NG1 4FQ, UK
8 Faculty of Linguistics and Literature, Bielefeld University, Bielefeld, Universitätsstraße 25, 33615 Bielefeld, Germany
9 Psychology, University of Nevada, Las Vegas, Las Vegas, 4505 S. Maryland Pkwy., Box 455030, Las Vegas, NV 89154-5030, USA
10 Psychology, University of Wisconsin-Madison, Madison, 1202 West Johnson St. Madison WI. 53706, USA
11 Psychology, Missouri State University, 901 S. National Ave, Springfield, MO, 65897, USA
12 Health, Human Performance, and Recreation, University of Arkansas, Fayetteville, 155 Stadium Drive, HPER 321, Fayetteville, AR, 72701, USA
13 Department of Development and Regeneration, KU Leuven, Leuven, Herestraat 49 box 805, 3000 Leuven, Belgium
13 Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Postbus 9600, 2300 RC, Leiden, The Netherlands

14 Department of Psychology, Linnaeus University, Kalmar, Stagneliusgatan 14, 392 34, Kalmar, Sweden
15 Department of Human Development and Psychology, Tzu-Chi University, No. 67, Jieren St., Hualien City, Hualien County, 97074, Taiwan
16 Department of Surgery, University of British Columbia, Victoria, #301 - 1625 Oak Bay Ave, Victoria BC Canada, V8R 1B1, Canada
17 Department of Psychology, University of Cambridge, Cambridge CB2 3EB, UK
18 Centre for Statistics in Medicine, University of Oxford, Windmill Road, Oxford, OX3 7LD, UK
19 Department of Psychology, The University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ, UK
20 School of Psychology, Bangor University, Bangor, Adeilad Brigantia, Bangor, Gwynedd, LL57 2AS, UK
21 Ramsey Decision Theoretics, 4849 Connecticut Ave. NW #132, Washington, DC 20008, USA
22 Department of Behavioural Sciences and Learning, Linköping University, SE-581 83, Linköping, Sweden
23 Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, 58 Hillhead Street, UK
24 College of Social Work, Florida State University, 296 Champions Way, University Center C, Tallahassee, FL, 32304, USA
25 Departments of Psychology and Philosophy, Yale University, 2 Hillhouse Ave, New Haven CT 06511, USA
26 Department of English, University of Louisiana at Lafayette, P. O. Box 43719, Lafayette LA 70504, USA
27 Department of Psychology, St. Edward's University, 3001 S. Congress, Austin, TX 78704, USA

27 Department of Psychology, University of Texas at Austin, 108 E. Dean Keeton Stop A8000, Austin, TX 78712-1043, USA
28 Department of Management, West Virginia University, 1602 University Avenue, Morgantown, WV 26506, USA
29 Department of Psychology, Rutgers University, New Brunswick, 53 Avenue E, Piscataway NJ 08854, USA
30 Department of Political Science, Indiana University Purdue University, Indianapolis, Indianapolis, 425 University Blvd CA417, Indianapolis, IN 46202, USA
31 Booking.com, Herengracht 597, 1017 CE Amsterdam, The Netherlands
32 Department of English, American and Romance Studies, RWTH - Aachen University, Aachen, Kármánstraße 17/19, 52062 Aachen, Germany
33 School of Psychology, Keele University, Keele, Staffordshire, ST5 5BG, UK
34 Centre of Excellence for Statistical Innovation, UCB Celltech, 208 Bath Road, Slough, Berkshire SL1 3WE, UK
35 Neurosurgery, Eberhard Karls University Tübingen, Tübingen, Germany
35 Tübingen, International Centre for Ethics in Sciences and Humanities, Germany
36 Department of Radiology, University of Cambridge, Box 218, Cambridge Biomedical Campus, CB2 0QQ, UK
37 Department of Psychiatry, University of Cambridge, Cambridge, 18b Trumpington Road, CB2 8AH, UK
38 Behavioural Science Institute, Radboud University Nijmegen, Montessorilaan 3, 6525 HR, Nijmegen, The Netherlands
39 Department of Psychology, University of Chester, Chester, Department of Psychology, University of Chester, Chester, CH1 4BJ, UK
40 Department of Psychology, New York University, 4 Washington Place, New York, NY 10003, USA
41 School of Psychology, University of Nottingham, Nottingham, University Park, NG7 2RD, UK

42 Independent, Stockholm, Skåpvägen 5, 12245 ENSKEDE, Sweden
43 Department of Clinical and Experimental Medicine, University of Linköping, 581 83 Linköping, Sweden
44 School of Clinical Sciences, University of Bristol, Bristol, Level 2 academic offices, L&R Building, Southmead Hospital, BS10 5NB, UK
45 Occupational Orthopaedics and Research, Sahlgrenska University Hospital, 413 45 Gothenburg, Sweden
46 The Faculty of Modern Languages and Literatures, Institute of Linguistics, Psycholinguistics Department, Adam Mickiewicz University, Al. Niepodległości 4, 61-874, Poznań, Poland
47 Department of Psychological Sciences, University of Connecticut, Storrs, CT, Department of Psychological Sciences, U-1020, Storrs, CT 06269-1020, USA
48 Center for Stroke Research Berlin, Charité - Universitätsmedizin Berlin, Hindenburgdamm 30, 12200 Berlin, Germany
48 Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany
48 Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Luisenstraße 56, 10115 Berlin, Germany
49 Social Sciences, Adam Mickiewicz University, Poznań, Szamarzewskiego 89, 60-568 Poznan, Poland
50 Department of Psychology, University of Fribourg, Faucigny 2, 1700 Fribourg, Switzerland
51 School of Politics and International Relations, University of Kent, Canterbury CT2 7NX, UK
52 Department of Sociology / ICS, University of Groningen, Grote Rozenstraat 31, 9712 TG Groningen, The Netherlands
53 Institute of Psychology, Czech Academy of Sciences, Hybernská 8, 11000 Prague, Czech Republic
54 School of Psychology, University of Nottingham, Nottingham, NG7 2RD, UK
55 Pardee RAND Graduate School, RAND Corporation, 1200 S Hayes St, Arlington, VA 22202, USA

56 Psychology and Neuroscience, Baylor University, Waco, One Bear Place 97310, Waco TX, USA
57 Department of Psychology, School of Philosophy, Psychology, and Language Sciences, University of Edinburgh, 7 George Square, EH8 9JZ Edinburgh, UK
57 Psychology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, Wundtlaan 1, 6525XD, The Netherlands
58 Dept of Philosophy, Major Williams Hall, Virginia Tech, Blacksburg, VA, US
59 Center for the Study of Family Violence and Sexual Assault, Northern Illinois University, DeKalb, IL, 125 President's BLVD., DeKalb, IL 60115, USA
60 School of Mathematics and Statistics, The Open University, Milton Keynes, Walton Hall, Milton Keynes MK7 6AA, UK
61 Skyscanner, 15 Laurison Place, Edinburgh, EH3 9EN, UK
62 School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
63 Stress Research Institute, Stockholm University, Stockholm, Frescati Hagväg 16A, SE-10691 Stockholm, Sweden
63 Department of Clinical Neuroscience, Karolinska Institutet, Nobels väg 9, SE-17177 Stockholm, Sweden
63 Department of Psychology, Stanford University, 450 Serra Mall, Stanford, CA 94305, USA
64 Laboratory of Behavioral Neurobiology, Department of Physiological Sciences, Federal University of Santa Catarina, Florianópolis, Campus Universitário Trindade, 88040900, Brazil
65 Department of Kinesiology, KU Leuven, Leuven, Tervuursevest 101 box 1501, B-3001 Leuven, Belgium
66 Department of Experimental Psychology, University of Oxford, Oxford, UK
67 Department of Psychology, UiT The Arctic University of Norway, Tromsø, Norway
68 Department of Psychology, DePaul University, Chicago, 2219 N Kenmore Ave, Chicago, IL 60657, USA

69 Center for Neural Science, New York University, 4 Washington Pl Room 809 New York, NY 10003, USA
70 Department of Psychology, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
71 Psychology, University of Cologne, Cologne, Herbert-Lewin-St. 2, 50931, Cologne, Germany
72 Saudi Human Genome Program, King Abdulaziz City for Science and Technology (KACST); Integrated Gulf Biosystems, Riyadh, Saudi Arabia
73 Cognitive Psychology Unit, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands
73 Leiden Institute for Brain and Cognition, Leiden University, Leiden, The Netherlands
74 Leeds Institute of Health Sciences, University of Leeds, Leeds, LS2 9NL, UK
75 Institute for Media Studies, KU Leuven, Leuven, Belgium
76 Center for Open Science, 210 Ridge McIntire Rd Suite 500, Charlottesville, VA 22903, USA
76 Department of Engineering and Society, University of Virginia, Thornton Hall, P.O. Box 400259, Charlottesville, VA 22904, USA
77 Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
78 Department of Psychology, University of Southern Indiana, 8600 University Boulevard, Evansville, Indiana, USA
79 Life Sciences Communication, University of Wisconsin-Madison, Madison, Wisconsin, 1545 Observatory Drive, Madison, WI 53706, USA
80 Department of Social Psychology, Institute of Psychology, University of Lausanne, Quartier UNIL-Mouline, Bâtiment Géopolis, CH-1015 Lausanne, Switzerland
81 Departamento de Psicología Básica, Universidad Autónoma de Madrid, c/ Ivan Pavlov 6, 28049 Madrid, Spain
82 Department of Methodology and Statistics, Tilburg University, Warandelaan 2, 5000 LE Tilburg, The Netherlands

82 Department of Sociology, Utrecht University, Padualaan 14, 3584 CH, Utrecht, The Netherlands
83 School of Psychology, Massey University, Auckland, Private Bag 102904, North Shore, Auckland, 0745, New Zealand
84 Psychology, Saint Louis University, St. Louis, MO, 3700 Lindell Blvd, St. Louis, MO 63108, USA
85 Psychology, University of California, Davis, Davis, One Shields Ave, Davis, CA 95616, USA
86 Department of Psychology, University of Texas at Austin, 108 E. Dean Keeton Stop A8000, Austin, TX 78712-1043, USA
87 Marketing Department, Ghent University, Tweekerkenstraat 2, 9000 Ghent, Belgium
88 Department of Psychology, Education, and Child Studies, Erasmus University Rotterdam, Rotterdam, Burgemeester Oudlaan 50, 3000 DR, Rotterdam, The Netherlands

Author Contributions. Daniel Lakens, Nicholas W. Fox, Monica Gonzalez-Marquez, James A. Grange, Nicholas P. Holmes, Ahmed A. Khalil, Stephen R. Martin, Vishnu Sreekumar, and Crystal N. Steltenpohl participated in brainstorming, drafting the commentary, and data-analysis. Casper J. Albers, Shlomo E. Argamon, Thom Baguley, Erin M. Buchanan, Ben van Calster, Zander Crook, Sameera Daniels, Daniel J. Dunleavy, Brian D. Earp, Jason D. Ferrell, James G. Field, Anne-Laura van Harmelen, Michael Ingre, Peder M. Isager, Hanna K. Isotalus, Junpeng Lao, Gerine M. A. Lodder, David Manheim, Andrea E. Martin, Kevin McConway, Amanda Q. X. Nio, Gustav Nilsonne, Cilene Lino de Oliveira, Jean-Jacques Orban de Xivry, Gerit Pfuhl, Kimberly A. Quinn, Iris K. Schneider, Zsuzsika Sjoerds, Samuel G. Smith, Jeffrey R. Spies, Marcel A. L. M. Van Assen, Matt N. Williams, Donald R. Williams, Tal Yarkoni, and Rolf A. Zwaan participated in brainstorming and drafting the commentary. Federico G. Adolfi, Raymond B. Becker, Michele I. Feist, and Sam Parsons participated in drafting the commentary, and data-analysis. Matthew A. J. Apps, Stephen D. Benning, Daniel E. Bradford, Sau-Chin Chen, Bryan Chung, Lincoln J Colling, Henrik Danielsson, Lisa DeBruine, Mark R. Hoffarth, Erik Gahner Larsen, Randy J. McCarthy, John J. Sakon, S. Adil Saribay, Tim Smits, Neil Stenhouse, Wojciech Świątkowski, and Miguel A. Vadillo participated in brainstorming. Farid Anvari, Aaron R. Caldwell, Rickard Carlsson, Emily S. Cross, Amanda Friesen, Caio Gomes, Andrew P. Grieve, Robert Guggenberger, James Grist, Kevin D. Hochard, Christer Johansson, Konrad Juszczyk, David A. Kenny, Barbara Konat, Jiří Lukavský, Christopher R. Madan, Deborah G. Mayo, Colin McFarland, Manojkumar Selvaraju, Samantha E. Williams, and Ignazio Ziano did not participate in drafting the commentary because the points that they would have raised had already been incorporated into the commentary, or endorse a sufficiently large part of the contents as if participation had occurred. Except for the first author, authorship order is alphabetical.

Acknowledgements: We’d like to thank Dale Barr, Felix Cheung, David Colquhoun, Hans IJzerman, Harvey Motulsky, and Richard Morey for helpful discussions while drafting this commentary. Daniel Lakens was supported by NWO VIDI 452-17-013. Federico G. Adolfi was supported by CONICET. Matthew Apps was funded by a Biotechnology and Biological Sciences Research Council AFL Fellowship (BB/M013596/1). Gary Collins was supported by the NIHR Biomedical Research Centre, Oxford. Zander Crook was supported by the Economic and Social Research Council [grant number C106891X]. Emily S. Cross was supported by the European Research Council (ERC-2015-StG-677270). Lisa DeBruine is supported by the European Research Council (ERC-2014-CoG-647910 KINSHIP). Anne-Laura van Harmelen is funded by a Royal Society Dorothy Hodgkin Fellowship (DH150176). Mark R. Hoffarth was supported by the National Science Foundation under grant SBE SPRF-FR 1714446. Junpeng Lao was supported by the SNSF grant 100014 156490/1. Cilene Lino de Oliveira was supported by AvH, Capes, CNPq. Andrea E. Martin was supported by the Economic and Social Research Council of the United Kingdom [grant number ES/K009095/1].
Jean-Jacques Orban de Xivry is supported by an internal grant from the KU Leuven (STG/14/054) and by the Fonds voor Wetenschappelijk Onderzoek (1519916N). Sam Parsons was supported by the European Research Council (FP7/2007–2013; ERC grant agreement no. 324176). Gerine Lodder was funded by NWO VICI 453-14-016. Samuel Smith is supported by a Cancer Research UK Fellowship (C42785/A17965). Vishnu Sreekumar was supported by the NINDS Intramural Research Program (IRP). Miguel A. Vadillo was supported by Grant 2016-T1/SOC-1395 from Comunidad de Madrid. Tal Yarkoni was supported by NIH award R01MH109682.

Competing Interests: The authors declare no competing interests.

Abstract: In response to recommendations to redefine statistical significance to p ≤ .005, we propose that researchers should transparently report and justify all choices they make when designing a study, including the alpha level.

Justify Your Alpha

Benjamin et al.1 proposed changing the conventional “statistical significance” threshold (i.e., the alpha level) from p ≤ .05 to p ≤ .005 for all novel claims with relatively low prior odds. They provided two arguments for why lowering the significance threshold would “immediately improve the reproducibility of scientific research.” First, a p-value near .05 provides weak evidence for the alternative hypothesis. Second, under certain assumptions, an alpha of .05 leads to high false positive report probabilities (FPRP2; the probability that a significant finding is a false positive).

We share their concerns regarding the apparent non-replicability of many scientific studies, and agree that a universal alpha of .05 is undesirable. However, redefining “statistical significance” to a lower, but equally arbitrary threshold, is inadvisable for three reasons: (1) there is insufficient evidence that the current standard is a “leading cause of non-reproducibility”1; (2) the arguments in favor of a blanket default of p ≤ .005 do not warrant the immediate and widespread implementation of such a policy; and (3) a lower significance threshold will likely have negative consequences not discussed by Benjamin and colleagues. We conclude that the term “statistically significant” should no longer be used and suggest that researchers employing null hypothesis significance testing justify their choice for an alpha level before collecting the data, instead of adopting a new uniform standard.

Lack of evidence that p ≤ .005 improves replicability

Benjamin et al.1 claimed that the expected proportion of replicable studies should be considerably higher for studies observing p ≤ .005 than for studies observing .005 < p ≤ .05, due to a lower FPRP. Theoretically, replicability is related to the FPRP, and lower alpha levels will reduce false positive results in the literature.
However, in practice, the impact of lowering alpha levels depends on several unknowns, such as the prior odds that the examined hypotheses are true, the statistical power of studies, and the (change in) behavior of researchers in response to any modified standards.

An analysis of the results of the Reproducibility Project: Psychology3 showed that 49% (23/47) of the original findings with p-values below .005 yielded p ≤ .05 in the replication study, whereas only 24% (11/45) of the original studies with .005 < p ≤ .05 yielded p ≤ .05 (χ²(1) = 5.92, p = .015, BF10 = 6.84). Benjamin and colleagues presented this as evidence of “potential gains in reproducibility that would accrue from the new threshold.” According to their own proposal, however, this evidence is only “suggestive” of such a conclusion, and there is considerable variation in replication rates across p-values (see Figure 1). Importantly, lower replication rates for p-values just below .05 are likely confounded by p-hacking (the practice of flexibly analyzing data until the p-value passes the “significance” threshold). Thus, the differences in replication rates between studies with .005 < p ≤ .05 compared to those with p ≤ .005 may not be entirely due to the level of evidence. Further analyses are needed to explain the low (49%) replication rate of studies with p ≤ .005, before this alpha level is recommended as a new significance threshold for novel discoveries across scientific disciplines.

Weak justifications for the α = .005 threshold

We agree with Benjamin et al. that single p-values close to .05 never provide strong “evidence” against the null hypothesis. Nonetheless, the argument that p-values provide weak evidence based on Bayes factors has been questioned4. Given that the marginal likelihood is sensitive to different choices for the models being compared, redefining alpha levels as a function of the Bayes factor is undesirable. For instance, Benjamin and colleagues stated that p-values of .005 imply Bayes factors between 14 and 26.
However, these upper bounds only hold for a Bayes factor based on a point null model and when the p-value is calculated for a two-sided test, whereas one-sided tests or Bayes factors for non-point null models would imply different alpha thresholds. When a test yields BF = 25 the data are interpreted as strong relative evidence for a specific alternative (e.g., μ = 2.81), while a p ≤ .005 only warrants the more modest rejection of a null effect without allowing one to reject even small positive effects with a reasonable error rate5. Benjamin et al. provided no rationale for why the new p-value threshold should align with equally arbitrary Bayes factor thresholds. We question the idea that the alpha level at which an error rate is controlled should be based on the amount of relative evidence indicated by Bayes factors.

The second argument for α = .005 is that the FPRP can be high with α = .05. Calculating the FPRP requires a definition of the alpha level, the power of the tests examining true effects, and the ratio of true to false hypotheses tested (the prior odds). Figure 2 in Benjamin et al. displays FPRPs for scenarios where most hypotheses are false, with prior odds of 1:5, 1:10, and 1:40. The recommended p ≤ .005 threshold reduces the minimum FPRP to less than 5%, assuming 1:10 prior odds (the true FPRP might still be substantially higher in studies with very low power). This prior odds estimate is based on data from the Reproducibility Project: Psychology3 using an analysis modelling publication bias for 73 studies6. Without stating the reference class for the “base-rate of true nulls” (e.g., does this refer to all hypotheses in science, in a discipline, or by a single researcher?), the concept of “prior odds that H1 is true” has little meaning. Furthermore, there is insufficient representative data to accurately estimate the prior odds that researchers examine a true hypothesis, and thus, there is currently no strong argument based on FPRP to redefine statistical significance.

How a threshold of p ≤ .005 might harm scientific practice

Benjamin et al. acknowledged that their proposal has strengths as well as weaknesses, but believe that its “efficacy gains would far outweigh losses.” We are not convinced and see at least three likely negative consequences of adopting a lowered threshold.
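
The dependence of the FPRP on alpha, power, and prior odds discussed in the second argument above can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming the standard definition FPRP = α / (α + power × prior odds), where the prior odds are the ratio of true to false hypotheses tested. The function name `fprp` and the power values in the loop are illustrative choices, not taken from Benjamin et al. or from this commentary.

```python
def fprp(alpha: float, power: float, prior_odds: float) -> float:
    """False positive report probability.

    alpha      -- significance threshold (Type I error rate)
    power      -- probability of detecting a true effect
    prior_odds -- ratio of true to false hypotheses tested,
                  e.g. 1 / 10 for prior odds of 1:10
    """
    if not (0 < alpha < 1 and 0 < power <= 1 and prior_odds > 0):
        raise ValueError("alpha in (0, 1), power in (0, 1], prior_odds > 0")
    false_positives = alpha              # expected per false hypothesis tested
    true_positives = power * prior_odds  # expected per false hypothesis tested
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Prior odds taken from the scenarios mentioned in the text;
    # the power values are assumptions chosen for illustration.
    for label, odds in (("1:5", 1 / 5), ("1:10", 1 / 10), ("1:40", 1 / 40)):
        for alpha in (0.05, 0.005):
            for power in (1.0, 0.8, 0.2):
                value = fprp(alpha, power, odds)
                print(f"odds {label:>4}  alpha {alpha:<5}  power {power:.1f}  FPRP {value:.3f}")
```

With perfect power and 1:10 prior odds, α = .005 gives an FPRP of about 4.8%, consistent with the "less than 5%" minimum quoted above, whereas at 20% power the same threshold gives an FPRP of 20%.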

Risk of fewer replication studies. All else being equal, lowering the alpha level requires larger sample sizes and creates an even greater strain on already limited resources. Achieving 80% power with α = .005, compared to α = .05, requires a 70% larger sample size for between-subjects designs with two-sided tests (88% for one-sided tests). While Benjamin et al. propose α = .005 exclusively for “new effects” (and not replications), designing larger original studies would leave fewer resources (i.e., time, money, participants) for replication studies, assuming fixed resources overall. At a time when replications are already relatively rare and unrewarded, lowering alpha to .005 might therefore reduce resources spent on replicating the work of others. More generally, recommendations for evidence thresholds need to carefully balance statistical and non-statistical considerations (e.g., the value of evidence for a novel claim vs. the value of independent replications).

Risk of reduced generalisability and breadth. Requiring larger sample sizes across scientific disciplines may exacerbate over-reliance on convenience samples (e.g., undergraduate students, online samples). Specifically, without (1) increased funding, (2) a reward system that values large-scale collaboration, and (3) clear recommendations for how to evaluate research with sample size constraints, lowering the significance threshold could adversely affect the breadth of research questions examined. Compared to studies that use convenience samples, studies with unique populations (e.g., people with rare genetic variants, patients with post-traumatic stress disorder) or with time- or resource-intensive data collection (e.g., longitudinal studies) require considerably more research funds and effort to increase the sample size. Thus, researchers may become less motivated to study unique populations or collect difficult-to-obtain data, reducing the generalisability and breadth of findings.

Risk of exaggerating the focus on single p-values. Benjamin et al.’s proposal risks (1) reinforcing the idea that relying on p-values is a sufficient, if imperfect, way to evaluate findings, and (2) discouraging opportunities for more fruitful changes in scientific practice and education. Even though Benjamin et al. do not propose p ≤ .005 as a publication threshold, some bias in favor of significant results will remain, in which case redefining p ≤ .005 as “statistically significant” would result in greater upward bias in effect size estimates. Furthermore, it diverts attention from the cumulative evaluation of findings, such as converging results of multiple (replication) studies.

No one alpha to rule them all

We have two key recommendations. First, we recommend that the label “statistically significant” should no longer be used. Instead, researchers should provide more meaningful interpretations of the theoretical or practical relevance of their results. Second, authors should transparently specify, and justify, their design choices. Depending on their choice of statistical approach, these may include the alpha level, the null and alternative models, assumed prior odds, statistical power for a specified effect size of interest, the sample size, and/or the desired accuracy of estimation. We do not endorse a single value for any design parameter, but instead propose that authors justify their choices before data are collected. Fellow researchers can then evaluate these decisions, ideally also prior to data collection, for example, by reviewing a Registered Report submission7. Providing researchers (and reviewers) with accessible information about ways to justify (and evaluate) design choices, tailored to specific research areas, will improve current research practices.

Benjamin et al. noted that some fields, such as genomics a
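
The sample-size cost described under "Risk of fewer replication studies" can be approximated with a short calculation. The sketch below assumes a normal approximation for comparing two group means, in which the required sample size is proportional to (z for the alpha level + z for the power)². The function name `relative_sample_size` is hypothetical, and the commentary does not state how its 70% and 88% figures were computed, so small differences from those figures are to be expected.

```python
from statistics import NormalDist


def relative_sample_size(alpha_new: float, alpha_old: float,
                         power: float = 0.80, two_sided: bool = True) -> float:
    """Ratio of required sample sizes when moving from alpha_old to alpha_new.

    Normal approximation: n is proportional to (z_alpha + z_power) ** 2,
    so the effect size cancels out of the ratio.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function

    def required(alpha: float) -> float:
        tail = alpha / 2 if two_sided else alpha
        return (z(1 - tail) + z(power)) ** 2

    return required(alpha_new) / required(alpha_old)


if __name__ == "__main__":
    two = relative_sample_size(0.005, 0.05, two_sided=True)
    one = relative_sample_size(0.005, 0.05, two_sided=False)
    print(f"two-sided: {100 * (two - 1):.0f}% larger sample")
    print(f"one-sided: {100 * (one - 1):.0f}% larger sample")
```

Under this approximation the increase is close to 70% for two-sided tests and a little under 90% for one-sided tests, in line with the figures quoted in the text.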
