### Transcription

Psychological Review, 2011, Vol. 118, No. 1, 164–173. © 2011 American Psychological Association. 0033-295X/11/$12.00. DOI: 10.1037/a0020698

COMMENT

Assessing the Belief Bias Effect With ROCs: Reply to Dube, Rotello, and Heit (2010)

Karl Christoph Klauer and David Kellen
Albert-Ludwigs-Universität Freiburg

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Dube, Rotello, and Heit (2010) argued (a) that the so-called receiver operating characteristic is nonlinear for data on belief bias in syllogistic reasoning; (b) that their data are inconsistent with Klauer, Musch, and Naumer's (2000) model of belief bias; (c) that their data are inconsistent with any of the existing accounts of belief bias and only consistent with a theory provided by signal detection theory; and (d) that in fact, belief bias is a response bias effect. In this reply, we present reanalyses of Dube et al.'s data and of old data suggesting (a) that the receiver operating characteristic is linear for binary "valid" versus "invalid" responses, as employed by the bulk of research in this field; (b) that Klauer et al.'s model describes the old data significantly better than does Dube et al.'s model and that it describes Dube et al.'s data somewhat better than does Dube et al.'s model; (c) that Dube et al.'s data are consistent with the account of belief bias by misinterpreted necessity, whereas Dube et al.'s signal detection model does not fit their data; and (d) that belief bias is more than a response bias effect.

Keywords: reasoning, belief bias, multinomial models, signal detection models

Supplemental material: http://dx.doi.org/10.1037/a0020698.supp

Karl Christoph Klauer and David Kellen, Institut für Psychologie, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany. The research reported in this article was supported by Grant Kl 614/31-1 from the Deutsche Forschungsgemeinschaft to Karl Christoph Klauer. Correspondence concerning this article should be addressed to Karl Christoph Klauer, Institut für Psychologie, Albert-Ludwigs-Universität Freiburg, D-79085 Freiburg, Germany. E-mail: [email protected]

Dube, Rotello, and Heit (2010), henceforth referred to as DRH, presented a signal detection theory (SDT) analysis and a series of model comparisons for data from three experiments on syllogistic reasoning, manipulating the perceived (Experiment 1) and actual (Experiment 3) base rate of valid versus invalid syllogisms and conclusion believability (Experiments 2 and 3). A number of strong claims were based on these analyses, and if true, these claims would have important implications for modeling data on syllogistic reasoning and for accounts of belief bias. The purpose of this reply is to examine these claims on the basis of both DRH's data and old data. Before we do so, it is necessary to describe central features of DRH's procedure.

Responses were collected via confidence ratings. More precisely, for each syllogism, participants were asked to first decide whether it was valid or invalid and then give a rating of confidence in their response on a 3-point rating scale. Responses were subsequently recoded using the numbers 1 to 6, where 1 reflects a high-confidence "valid" judgment, 3 a low-confidence "valid" judgment, 4 a low-confidence "invalid" judgment, and 6 a high-confidence "invalid" judgment.

DRH based their argument on the so-called receiver operating characteristic (ROC). An ROC is a two-dimensional plot plotting two aspects of performance across different levels of response bias. In the present context, it plots the proportion of correct "valid" responses for valid syllogisms (hit rate) against the proportion of false "valid" responses for invalid syllogisms (false alarm rate). Different levels of response bias can be obtained by varying the perceived base rate of valid relative to invalid syllogisms or by varying payoff schedules (Macmillan & Creelman, 2005; McNicol, 1972; Wickens, 2002). Confidence ratings have been used to emulate ROCs in a less expensive manner. For this purpose, the differences between response levels from 1 to 6 are construed as differences in response bias. For example, to obtain the point of the ROC corresponding to the most liberal response bias condition, only Response 6 (high-confidence "invalid" judgment) is considered an "invalid" response, whereas all other responses are treated as though the participant had responded "valid," including "invalid" Responses 4 and 5, and the hit rate and false alarm rate are computed by aggregating over Responses 1–5. For the point of the ROC corresponding to the strictest response bias, only Response 1 (high-confidence "valid" judgment) is considered a "valid" response; all other responses, including the "valid" Responses 2 and 3, are treated as though the participant had responded "invalid." Moving across the response scale in this fashion, an ROC with 5 points is emulated. In what follows, we will refer to the emulated ROC as a confidence-based ROC, whereas ROCs based on a binary response format will be referred to as binary ROCs.

DRH fitted an SDT model, implying nonlinear ROCs, to data from three experiments and judged that it provided reasonable fits to their data. In contrast, a number of multinomial models (Batchelder & Riefer, 1999) with linear ROCs did not fit. They concluded that ROCs are nonlinear for data on belief bias in syllogistic reasoning. As DRH pointed out, nonlinear ROCs would invalidate traditional analyses of the data in terms of linear models such as analyses of variance.¹

DRH argued that the particular multinomial models they fitted represent appropriate extensions of Klauer, Musch, and Naumer's (2000) multinomial model of belief bias. That model was developed to account for data collected in a binary (valid vs. invalid) format and cannot be applied to confidence-rating data without modification. DRH concluded that their data were not consistent with Klauer et al.'s model.

On the basis of two null findings for effects of conclusion believability on parameters quantifying reasoning performance (Experiments 2 and 3), DRH furthermore concluded that belief bias in syllogistic reasoning is just a response-bias effect and that there are no effects of conclusion believability on reasoning. They also concluded that their data were inconsistent with accounts of belief bias in terms of selective scrutiny, misinterpreted necessity, mental models, metacognitive uncertainty, verbal reasoning theory, modified verbal reasoning theory, and the selective processing theory,² whereas the only theory of belief bias consistent with their data was claimed to be the one provided by SDT.

In what follows, we consider each of these conclusions in turn, beginning with the issue of nonlinearity of ROCs.

The Shape of the ROC in Reasoning Data

The bulk of data collected on belief bias in syllogistic reasoning has employed a binary (valid vs. invalid) response format (for a review, see Klauer et al., 2000). Unfortunately, it is an open question how confidence-based ROCs relate to binary ROCs.
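The cumulative thresholding procedure just described can be made concrete with a short sketch. The function name and the counts are ours, invented for illustration only:

```python
# Emulate a 5-point confidence-based ROC from 1-6 confidence ratings,
# as described above: for cutoff k, responses 1..k count as "valid".
def confidence_roc(valid_counts, invalid_counts):
    """valid_counts[i] / invalid_counts[i]: number of responses i+1 (on the
    1-6 scale) to valid / invalid syllogisms. Returns 5 (false alarm, hit)
    points, from strictest (only Response 1 = "valid") to most liberal
    (Responses 1-5 = "valid")."""
    n_valid = sum(valid_counts)
    n_invalid = sum(invalid_counts)
    points = []
    for cutoff in range(1, 6):  # 5 cutoffs -> 5 ROC points
        hit = sum(valid_counts[:cutoff]) / n_valid
        fa = sum(invalid_counts[:cutoff]) / n_invalid
        points.append((fa, hit))
    return points

# Hypothetical response counts for illustration only
valid = [40, 20, 10, 10, 10, 10]
invalid = [10, 10, 10, 10, 20, 40]
for fa, hit in confidence_roc(valid, invalid):
    print(f"FA = {fa:.2f}, hit = {hit:.2f}")
```

Each successive point relaxes the criterion by one response category, so the emulated ROC is necessarily monotone from the strictest to the most liberal point.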
DRH drew heavily on the literature on recognition memory, in which the shape of ROCs has been examined for a couple of decades and in which nonlinear confidence-based ROCs have frequently been observed. There is, however, a recent article by Bröder and Schütz (2009) in that literature that has received surprisingly little attention.³ Bröder and Schütz argued, as others have (Erdfelder & Buchner, 1998; Klauer & Kellen, in press; Malmberg, 2002), that confidence ratings may create rather than reveal nonlinear ROCs due to variations, across and within participants, in response style. A meta-analysis of data from 59 studies and three new experiments conducted by Bröder and Schütz suggests that binary ROCs may be linear in the field of recognition memory, underlining empirically the important theoretical argument that nonlinear confidence-based ROCs do not imply nonlinear binary ROCs. We elaborate on this point later.

Considering binary ROCs, there is evidence for nonlinear shapes in some domains (e.g., in perception; Egan, Schulman, & Greenberg, 1959) and evidence for a linear shape in others (e.g., in working memory; Rouder et al., 2008). The issue is moot in recognition memory (Bröder & Schütz, 2009).

Most studies on belief bias in syllogistic reasoning collected binary (valid vs. invalid) responses. Taking the previously mentioned arguments into account, there may therefore not be a nonlinearity problem to begin with in this literature. The nonlinearity problem postulated by DRH may instead be created by their use of confidence ratings. It would therefore be good to have more positive reassurance for the possibility that nonlinearity of ROCs generalizes beyond the use of confidence ratings in the field of syllogistic reasoning.

Fortunately, there are published data sets, some of them of considerable size, that can be used to address the issue. We fitted Klauer et al.'s (2000) multinomial model and the SDT model to the 10 data sets reported by Klauer et al. that employed a manipulation of response bias via base rate and a binary response format. For the binary response format, the multinomial model implies linear ROCs, whereas the SDT model implies nonlinear ROCs.⁴

Table 1 shows the results for these 10 data sets (Study 8 in Klauer et al., 2000, did not employ a base rate manipulation, so the models cannot be fitted to the results of that study). The table reports the goodness-of-fit index G², the associated p values (small values indicating misfit), and the model-selection indices Akaike's information criterion (AIC) and Bayesian information criterion (BIC) as used by DRH for comparing models (models with smaller values are preferred).⁵ Because DRH did not employ syllogisms with neutral conclusions, syllogisms with neutral conclusions as used in Klauer et al.'s Studies 1, 3, 5, and 7 were excluded for the fits reported in Table 1. Including them leads to the same conclusions; in fact, the results become even stronger in the direction summarized next.

As can be seen in Table 1, the multinomial model outperforms the SDT model in 8 of 10 cases in terms of AIC and in 10 of 10 cases in terms of BIC. The differences in AIC values, but not those in BIC values, are numerically small for each individual study, but it is possible to enhance the information-to-noise ratio via the aggregation principle.

One way to do this is to test whether the differences in AIC values and BIC values are significant across the 10 data sets. The difference in AIC values between the two models is significant in a Wilcoxon test (Z = –2.29, p = .02), as is that in BIC values (Z = –2.80, p < .01). Another way to do this is to consider the 10 data sets as one big data set and to compute AIC and BIC for the joint

¹ As pointed out by Klauer et al. (2000), even linear ROCs would question such analyses unless the slope of the linear ROC is 1. Klauer et al. proposed a model-based approach to remedy this problem.

² DRH argued, however, that their data and model were broadly consistent with broader theories of reasoning such as Chater and Oaksford's (1999) probability heuristics model of syllogistic reasoning.

³ DRH also did not deal with current criticisms suggesting that the interpretation of confidence-based ROCs entertained in their article is inadequate (Benjamin, Diaz, & Wee, 2009; Ratcliff & Starns, 2009; Rosner & Kochanski, 2009).

⁴ Details on how these analyses were done, including R scripts (R Development Core Team, 2009), HMMTree equation files (Stahl & Klauer, 2007), and data files, can be found through the supplemental materials link at the beginning of the article or at uer/r-scripts.zip/

⁵ In fitting the SDT model to the data from Study 7, we had to put an upper bound on the model parameters, because maximum likelihood estimation led to unrealistically large values for these parameters. The upper bound was 3, an unrealistically large value for any of the model parameters. The problem arises because in this study invalid believable syllogisms were accepted as frequently as valid believable syllogisms, as predicted by Klauer et al. (2000). This occasionally occurs (see, e.g., the data set presented as an introductory example by DRH in their Table 1) but should not happen, according to the SDT model. This pattern of belief bias causes problems in estimating the SDT variance parameter. Excluding Study 7 from the analyses altogether does not change the results, including the outcome of the significance tests reported next.
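The contrasting binary-ROC predictions can be illustrated with a small sketch. Under the detection-plus-guessing structure of Klauer et al.'s (2000) model, a hit occurs through detection or through guessing, hit = r_v + (1 − r_v)·a, while a false alarm is a guess from an uncertainty state, fa = (1 − r_i)·a; sweeping the guessing bias a traces a straight line. An equal-variance SDT model (used here for simplicity; DRH's model allows unequal variances) traces a curve instead. The parameter values below are arbitrary illustrations, not estimates from either article:

```python
from math import erf, sqrt

def Phi(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

# Multinomial (detection + guessing) predictions -> straight-line ROC.
def tht_point(r_valid, r_invalid, a):
    hit = r_valid + (1 - r_valid) * a   # detect, or guess "valid"
    fa = (1 - r_invalid) * a            # fail to detect, guess "valid"
    return fa, hit

# Equal-variance SDT predictions -> curved ROC.
def sdt_point(d_prime, c):
    return Phi(-c), Phi(d_prime - c)

# Illustrative parameter values (arbitrary, not fitted).
tht = [tht_point(0.4, 0.3, a) for a in (0.1, 0.3, 0.5, 0.7, 0.9)]
slopes = [(h2 - h1) / (f2 - f1)
          for (f1, h1), (f2, h2) in zip(tht, tht[1:])]
print(slopes)  # all segment slopes equal -> linear ROC
```

Running the same collinearity check on `sdt_point` with varying criterion c yields changing slopes, which is the curvature the SDT model predicts.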

Table 1
Fit Indices and Model-Selection Indices for Data Sets by Klauer et al. (2000)

[The table's numerical entries are not legible in this transcription. For each of the 10 data sets (Study 1; Study 2, naive and expert; Study 3; Study 4, naive and expert; Study 5; Study 6, naive and expert; Study 7), the table reports G²,ᵇ p, AIC, and BIC for the multinomial model and G²,ᶜ p, AIC, and BIC for the signal detection model.]

Note. G² = goodness-of-fit index; AIC = Akaike's information criterion; BIC = Bayesian information criterion.
ᵃ Naive and expert refer to data from participants who reported no prior experience with formal logic and who did report such experience, respectively. ᵇ df = 4. ᶜ df = 2.

data (with different parameters estimated for each individual study). This yields a difference in AIC of 20.24 in favor of the multinomial model and a difference in BIC of 198.91. According to the rules of thumb stated by Burnham and Anderson (2005, Chapter 2), a difference in AIC values larger than 10 observed in a large data set means that the model with the larger AIC (i.e., the SDT model) has essentially no empirical support (Burnham & Anderson, 2005, p. 70) relative to the model with the smaller AIC (i.e., the multinomial model).

In sum, there is surprisingly strong evidence for the multinomial model.⁶ These findings thereby parallel those obtained by Bröder and Schütz (2009) in the field of recognition memory. The conclusion, by DRH's standards, is that as far as we can tell on the basis of the available data, ROCs are linear for binary response formats. This suggests that the use of confidence ratings creates rather than reveals problems of nonlinearity (for reasons elaborated on later). Because most studies on belief bias are based on binary responses, the nonlinearity problem postulated by DRH may be largely nonexistent.
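The model-selection indices used here trade goodness of fit against parameter count: AIC = −2 ln L + 2k and BIC = −2 ln L + k ln N, where L is the maximized likelihood, k the number of free parameters, and N the sample size. The aggregation step amounts to summing per-study differences. A minimal sketch with invented numbers (these are not values from either article):

```python
from math import log

def aic(log_lik, k):
    # Akaike's information criterion: -2 ln L + 2k
    return -2 * log_lik + 2 * k

def bic(log_lik, k, n):
    # Bayesian information criterion: -2 ln L + k ln N
    return -2 * log_lik + k * log(n)

# Invented per-study log-likelihoods for two models (illustration only):
# (log L model A, k_A, log L model B, k_B, N)
studies = [
    (-120.3, 6, -121.9, 6, 400),
    (-98.7, 6, -99.2, 6, 350),
]
delta_aic = sum(aic(la, ka) - aic(lb, kb) for la, ka, lb, kb, _ in studies)
print(round(delta_aic, 2))  # negative -> model A preferred on aggregate AIC
```

With equal parameter counts the comparison reduces to the likelihoods; BIC differs from AIC only in penalizing each parameter by ln N instead of 2, which is why the BIC differences reported above are so much larger.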
Another conclusion is that the multinomial model describes the old data significantly better than does the SDT model.

DRH presented one analysis involving Klauer et al.'s (2000) original model for DRH's Experiment 3 with data dichotomized (see DRH's Table 9). Unfortunately, they used 3 as the degrees of freedom for that model, but the degrees of freedom of the model equal 4.⁷ This changes the values of p, AIC, and BIC for the multinomial model. Less importantly, the likelihood terms in AIC and BIC are computed wrongly. According to our reanalysis of the multinomial and the SDT models, respectively, p values are .23 and .85, AIC values are 93.01 and 91.72, and BIC values are 143.14 and 154.37. Considering model fit, there is no indication in the p values of significant model violations for either model. Considering the model-comparison indices, the two models are more or less tied on AIC, whereas the multinomial model performs considerably better in terms of BIC. This suggests that Klauer et al.'s model provides, if anything, a better description of DRH's data than does the SDT model.

Note that Klauer et al.'s (2000) model also shows the null effect of believability on reasoning parameters that the SDT model analyses exhibit. The reasoning parameters of the multinomial model—rvb, rvu, rib, riu—measure the participants' ability to determine the validity or invalidity of syllogisms, separately, for valid (v) and invalid (i) syllogisms with believable (b) and unbelievable (u) conclusions. For the dichotomized data of DRH's Experiment 3, the H0 of no effect of belief on reasoning, rvb = rvu and rib = riu, can be maintained (ΔG² = 0.96, df = 2, p = .62). DRH chose not to report this, although they presented the analogous information for the SDT model applied to the dichotomized data (i.e., the H0: μvb = μvu and σvb = σvu can be maintained). Instead, they reported that rvb = riu = rvu can be maintained, whereas it is not possible to set all four r parameters equal to each other, suggesting that rib differs from the other three once

⁶ One statement found in the SDT literature is that empirical ROCs with more than 3 points are more diagnostic for discriminating between models with differently shaped predicted ROCs than empirical ROCs based on 3 points, as in the Klauer et al. (2000) data (e.g., Bröder & Schütz, 2009). If so, this would make it difficult to obtain a clear decision in favor of one of the two models on the basis of the Klauer et al. data, rendering the current outcome the more impressive.

⁷ Only the ratio of parameters u and b, but not their absolute values, is identified in Klauer et al.'s (2000) model. To fix the scale, b is set equal to 1 a priori and is therefore not a free parameter. This does not imply that the "true" value of b, if it could be identified, is 1.

⁸ Consider an analogously focused test strategy for the SDT model. The parameters governing reasoning performance in the SDT model are the means μxy and standard deviations σxy of the distributions of valid (x = v) and invalid (x = i) syllogisms with believable (y = b) and unbelievable (y = u) conclusions, with σiu = σib = 1 and μiu = μib = 0 imposed a priori. The four μ parameters and the four σ parameters cannot simultaneously be set equal; that is, the μ and/or σ parameters differ as a function of validity and/or belief (ΔG² = 345.66, p < .01, with df = 4 due to the a priori constraints). In a second step, it is seen, however, that the four σ parameters can be set equal (to one): ΔG² = 2.86, df = 2, p = .24. Once these have been set equal, effects of belief on the two μ parameters not constrained a priori are "revealed"; that is, the H0: μvb = μvu must be rejected (ΔG² = 5.28, df = 1, p = .02).

these have been set equal. DRH asserted that the multinomial model concludes that there are effects of belief on the reasoning stage. There is, however, little justification for this focused test strategy, and none such is pursued for the SDT model.⁸

Klauer et al.'s (2000) Model for Confidence Ratings

Klauer et al.'s (2000) model was designed for binary data. Its purpose was to provide a measurement tool to measure reasoning accuracy for the four kinds of syllogisms typically investigated in studies of belief bias (i.e., valid and invalid syllogisms crossed with believable and unbelievable conclusions), correcting for response bias and possible effects of belief on it. One issue in Klauer et al. was that response bias should be controlled for in evaluating reasoning performance for each kind of syllogism.

To extend the model to confidence ratings, we found it convenient to present the model in two parts, a stimulus-state mapping and a state-response mapping (Klauer & Kellen, in press).

Stimulus-State Mapping

The stimulus-state mapping specifies how each kind of syllogism is mapped on a number of unobservable mental states. In Klauer et al.'s (2000) most basic model, there are two detection states: M1, in which a valid syllogism is correctly detected as valid, and M2, in which an invalid syllogism is correctly detected as invalid. There are also two states of uncertainty—M3b and M3u—in which participants are uncertain about the syllogism's logical status and in which responses are based on an informed guessing process that draws on extralogical information such as conclusion believability. For M3b, the logical status (valid vs. invalid) of a given believable syllogism is not detected, and for M3u, the logical status of a given unbelievable syllogism is not detected.

The stimulus-state mapping is depicted in Figure 1. The parameters r of the stimulus-state mapping provide the probabilities with which detection states M1 and M2 rather than the uncertainty states M3b and M3u are reached. For example, given a valid (v), believable (b) syllogism, the probability of reaching state M1 is rvb and that of reaching state M3b is 1 – rvb. The stimulus-state mapping is independent of response format and should not be changed in adapting the model to deal with different response formats if the resulting model is to be consistent with Klauer et al.'s (2000) model for binary data.

Figure 1. Stimulus-state mapping of Klauer et al.'s (2000) basic model.

State-Response Mapping

The state-response mapping specifies how states are mapped on responses. For binary responses, detection states M1 and M2 lead to "valid" and "invalid" responses, respectively, deterministically. In uncertainty states, response guessing occurs that may be biased by conclusion believability (and other extralogical cues such as base rate). Thus, in state M3b (M3u), the "valid" response is guessed with probability ab (au), and the "invalid" response with probability 1 – ab (1 – au).

In modeling confidence ratings, only the state-response mapping needs to be adapted; the stimulus-state mapping is independent of response format. Adapting the state-response mapping is straightforward. Table 2 shows a plausible state-response mapping following the one used by Klauer and Kellen (in press) for modeling confidence ratings. As for the case of binary responses, and in line with the definition of detection states, detection states M1 and M2 are mapped on "valid" and "invalid" responses in a deterministic fashion. There are, however, three "valid" and three "invalid" responses that can occur, and the probabilities of using these are modeled, following Klauer and Kellen, by three parameters: sl, sm, and sh for the ratings expressing lowest, medium, and highest confidence, respectively. The three s parameters have to sum to 1, so there are only two free parameters to be estimated. Because people differ in their propensity to use extreme ratings (known as extreme response style; Hamilton, 1968), and because there is intraindividual variation in scale usage (e.g., Haubensak, 1992), it is not reasonable to assume that detection states such as M1 and M2 are invariably mapped on highest confidence responses (see Onyper, Zhang, & Howard, 2010, for a similar assumption in the SDT framework). The scale-usage parameters sl, sm, and sh capture interindividual and intraindividual variation in scale usage. They are not a function of the syllogism's validity or believability. Note that the ROC implied by this model is nonlinear if sh is smaller than 1. That is, interindividual and intraindividual variations in scale usage, leading to some less than highest confidence responses in detection states, cause nonlinear ROCs according to this model.
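The extended model's predicted rating distribution can be sketched in a few lines. For a valid believable syllogism, detection state M1 is reached with probability rvb and spreads Responses 1–3 as sh, sm, sl; otherwise the uncertainty state M3b distributes guesses ab(1), . . . , ab(6). The parameter values below are invented for illustration:

```python
# Predicted probabilities of ratings 1-6 for a valid believable syllogism
# under the confidence-rating extension sketched above.
def predict_valid_believable(r_vb, s, a_b):
    """r_vb: detection probability; s = (sh, sm, sl): scale-usage parameters
    (sum to 1); a_b: guessing probabilities ab(1)..ab(6) (sum to 1).
    Detection state M1 yields only 'valid' ratings 1-3 (1 = highest conf.)."""
    sh, sm, sl = s
    detect = [sh, sm, sl, 0.0, 0.0, 0.0]  # deterministic "valid" responses
    return [r_vb * d + (1 - r_vb) * g for d, g in zip(detect, a_b)]

# Illustrative (not fitted) parameter values:
p = predict_valid_believable(0.6, (0.5, 0.3, 0.2),
                             (0.3, 0.2, 0.1, 0.1, 0.1, 0.2))
print([round(x, 3) for x in p])
```

Because detection contributes nothing to Responses 4–6 here, any "invalid" responses to valid believable syllogisms are attributed entirely to the guessing state, and the mixture with sh < 1 is what bends the implied confidence-based ROC.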

Table 2
State-Response Mapping for Klauer et al.'s (2000) Model Extended to Confidence Ratings

            "Valid" responses           "Invalid" responses
State       1       2       3           4       5       6
M1          sh      sm      sl          0       0       0
M2          0       0       0           sl      sm      sh
M3b         ab(1)   ab(2)   ab(3)       ab(4)   ab(5)   ab(6)
M3u         au(1)   au(2)   au(3)       au(4)   au(5)   au(6)

Note. M1 = detection state in which a valid syllogism is correctly detected as valid; M2 = detection state in which an invalid syllogism is correctly detected as invalid; M3b = state of uncertainty in which the logical status of a given believable syllogism is not detected; M3u = state of uncertainty in which the logical status of a given unbelievable syllogism is not detected; sh = highest confidence parameter; sm = medium confidence parameter; sl = lowest confidence parameter; ab(1) to ab(6) = guessing parameters for believable syllogisms; au(1) to au(6) = guessing parameters for unbelievable syllogisms.

The guessing parameters ab(1), . . . , ab(6) and au(1), . . . , au(6) correspond conceptually to the parameters ab and au, respectively, for the binary response format. Because ab(1), . . . , ab(6) and au(1), . . . , au(6) each have to sum to 1, there are only five parameters to be estimated per mental state M3b and M3u. The state-response mapping thus comprises 12 parameters (i.e., 10 nonredundant guessing parameters and two nonredundant scale-usage parameters). Taken together with the four r parameters governing the stimulus-state mapping, the model requires 16 parameters and thus two more than the SDT model for confidence ratings.⁹

Model Fits and Model Comparisons for DRH's Data

Table 3 presents the results of a reanalysis of DRH's data with Klauer et al.'s (2000) model for confidence ratings and DRH's SDT model. Parameter estimates for the multinomial model are shown in Table 4.
In their own analyses, DRH presented the fit and model-selection indices per condition for each data set (e.g., separately for the data from believable and unbelievable syllogisms; see DRH's Tables 4, 6, and 12). However, when models such as the ones fitted by DRH and the present model have parameters (e.g., the present s parameters) that are shared by the different conditions, the condition-wise indices G², p, AIC, and BIC do not have the intended statistical interpretations. Table 3 therefore presents the results per experiment.¹⁰

In terms of goodness of fit, the SDT model is inconsistent with three of the four data sets; that is, the goodness-of-fit statistic G² indicates significant violations of the model assumptions with p smaller than .001 for three of four data sets. This is surprising given DRH's strong reliance on the SDT model (DRH did not report p values). Goodness of fit approaches more acceptable levels for the multinomial model, although there is certainly room for improvement (G² indicates significant model violations with p < .01 in one case and with p < .05 in a second case).

Note, however, that for large data sets, there is high power to detect even tiny model violations. To take this into account, several approaches have been considered in the literature. One possibility is to compute compromise power analyses with effect size w = .10 (a small effect according to Cohen, 1988) and a β/α ratio of 1 for each data set (Bröder & Schütz, 2009), adjusting the level of significance according to sample size. Another possibility is to use relaxed criteria for p and G², as found in the literature on structural equation modeling (Schermelleh-Engel, Moosbrugger, & Müller, 2003), according to which .05 < p ≤ 1.0 corresponds to a good fit and .01 ≤ p ≤ .05 to an acceptable fit, whereas 0 ≤ G² ≤ 2 df corresponds to a good fit and 2 df < G² ≤ 3 df to an acceptable fit. Whichever of these criteria is used, the SDT model is rejected in three of four cases, whereas the multinomial model is rejected in one or two cases, depending upon the criterion used.

In terms of the model-selection indices AIC and BIC, the multinomial model outperforms the SDT model in three of four cases for AIC and in two of four cases for BIC, although one of these latter cases is more or less a tie. Taken together with the goodness-of-fit results, the multinomial model would probably be preferred on the basis of the results.

Given the poor goodness of fit of the SDT model, it would probably be prudent not to interpret its parameter estimates further (see Footnote 10). Nevertheless, DRH reported that manipulations of base rate (Experiments 1 and 3) as well as manipulations of conclusion believability (Experiments 2 and 3) map on the response criteria, whereas there are mostly no significant effects on the parameters governing reasoning performance. The same pattern of results is obtained for the multinomial model: Manipulations of base rate (Experiments 1 and 3) as well as manipulations of conclusion believability (Experiments 2 and 3) map on the guessing parameters capturing response bias (smallest ΔG² = 21.48, largest p < .01), whereas there are no effects on the r parameters governing reasoning performance (largest ΔG² = 4.39, smallest p = .13).

Taken together, DRH's data are inconsistent with the SDT model and somewhat better described by Klauer et al.'s (2000) model for confidence ratings. This implies that what is wrong with

⁹ In Experiment 1, DRH manipulated perceived base rate in two steps. Base rate here takes the role of conclusion believability in the multinomial model, as it does in the SDT model. In Experiment 3, they manipulated perceived base rate in three steps. In analyzing the data as a function of base rate, base rate again takes the role of conclusion believability, but there are now three levels for this factor. Here, the multinomial model has 23 parameters and the SDT model 21 parameters. The parameters for the multinomial models are six r parameters (2 [valid vs. invalid] × 3 [3 base rates]), 15 nonredundant guessing parameters (five per base rate condition), and two nonredundant s parameters for scale usage.

¹⁰ Presenting the results per experiment also supports DRH's intention to compare parameters across conditions. For example, DRH wished to determine whether response criterion parameters differ signif
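The nested-model tests reported throughout (the ΔG² statistics) compare a restricted model to a full model: G² = 2 Σ O ln(O/E) for observed counts O and model-expected counts E, and the difference between the restricted and full models' G² values is referred to a χ² distribution with df equal to the number of restrictions. A self-contained sketch with invented counts and expected frequencies (not values from either article):

```python
from math import log

def g_squared(observed, expected):
    # Likelihood-ratio goodness-of-fit statistic: G^2 = 2 * sum O * ln(O / E)
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

# Invented counts and two sets of model-predicted frequencies:
obs = [42, 18, 25, 15]
full = [41.0, 19.0, 24.5, 15.5]        # full model (more parameters)
restricted = [38.0, 22.0, 26.0, 14.0]  # restricted model (e.g., r_vb = r_vu)
delta_g2 = g_squared(obs, restricted) - g_squared(obs, full)
print(round(delta_g2, 3))  # compare to chi-square, df = no. of restrictions
```

For maximum-likelihood fits the restricted model can never fit better than the full model, so ΔG² is nonnegative, and a large value (relative to the χ² reference distribution) rejects the restriction.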
