### Transcription

Paper SAS400-2014

An Introduction to Bayesian Analysis with SAS/STAT Software

Maura Stokes, Fang Chen, and Funda Gunes
SAS Institute Inc.

#### Abstract

The use of Bayesian methods has become increasingly popular in modern statistical analysis, with applications in numerous scientific fields. In recent releases, SAS has provided a wealth of tools for Bayesian analysis, with convenient access through several popular procedures as well as the MCMC procedure, which is designed for general Bayesian modeling. This paper introduces the principles of Bayesian inference and reviews the steps in a Bayesian analysis. It then describes the built-in Bayesian capabilities provided in SAS/STAT, which became available for all platforms with SAS/STAT 9.3, with examples from the GENMOD and PHREG procedures. How to specify prior distributions, evaluate convergence diagnostics, and interpret the posterior summary statistics is discussed.

#### Foundations

Bayesian methods have become a staple for the practicing statistician. SAS provides convenient tools for applying these methods, including built-in capabilities in the GENMOD, FMM, LIFEREG, and PHREG procedures (called the built-in Bayesian procedures), and a general Bayesian modeling tool in the MCMC procedure. In addition, SAS/STAT 13.1 introduced the BCHOICE procedure, which performs Bayesian choice modeling. With such convenient access, more statisticians are digging in to learn more about these methods.

The essence of Bayesian analysis is using probabilities that are conditional on data to express beliefs about unknown quantities. The Bayesian approach also incorporates past knowledge into the analysis, and so it can be viewed as the updating of prior beliefs with current data.
Bayesian methods are derived from the application of Bayes’ theorem, which was developed by Thomas Bayes in the 1700s as an outgrowth of his interest in inverse probabilities.

For events A and B, Bayes’ theorem is expressed as

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}$$

It can also be written as

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B \mid A)\,\Pr(A) + \Pr(B \mid \bar{A})\,\Pr(\bar{A})}$$

where $\bar{A}$ means not A. If you think of A as a parameter $\theta$ and B as data y, then you have

$$\Pr(\theta \mid y) = \frac{\Pr(y \mid \theta)\,\Pr(\theta)}{\Pr(y)} = \frac{\Pr(y \mid \theta)\,\Pr(\theta)}{\Pr(y \mid \theta)\,\Pr(\theta) + \Pr(y \mid \bar{\theta})\,\Pr(\bar{\theta})}$$

The quantity $\Pr(y)$ is the marginal probability, and it serves as a normalizing constant to ensure that the probabilities add up to unity. Because $\Pr(y)$ is a constant, you can ignore it and write

$$\Pr(\theta \mid y) \propto \Pr(y \mid \theta)\,\Pr(\theta)$$

Thus, the likelihood $\Pr(y \mid \theta)$ is being updated with the prior $\Pr(\theta)$ to form the posterior distribution $\Pr(\theta \mid y)$.

For a basic example of how you might update a set of beliefs with new data, consider a situation where researchers screen for vision problems in children in an after-school program. A study of 14 students chosen at random produces two students with vision issues. The likelihood is obtained from the binomial distribution:

$$L = \binom{14}{2}\, p^{2} (1-p)^{12}$$

Suppose that the parameter p only takes values {0.10, 0.12, 0.14, 0.16, 0.18, 0.20}. Researchers have prior beliefs about the probabilities of these values, and they assign them prior weights. Columns 1 and 2 in Table 1 contain the possible values for p and the prior probability weights, respectively. You can then compute the likelihoods for each of the values for p based on the study results, and then you can weight them with the corresponding prior weight. Column 5 contains the posterior values, which are the computed values displayed in column 4 divided by the normalizing constant 0.2501. Thus, the prior beliefs have been updated to a posterior distribution by accounting for the data obtained by the study. The posterior values are similar to, but different from, the likelihood.

Table 1 Empirical Posterior Distribution

| p | Prior Weight | Likelihood | Prior x Likelihood | Posterior |
|---|---|---|---|---|
| 0.10 | 0.10 | 0.257 | 0.0257 | 0.103 |
| 0.12 | 0.15 | 0.2827 | 0.0424 | 0.170 |
| 0.14 | 0.20 | 0.2920 | 0.0584 | 0.233 |
| 0.16 | | | | 0.230 |
| 0.18 | | | | 0.164 |
| 0.20 | | | | 0.100 |

In a nutshell, this is what any Bayesian analysis does: it updates your beliefs about the parameters by accounting for additional data. You weight the likelihood for the data with the prior distribution to produce the posterior distribution. If you want to estimate a parameter $\theta$ from data $y = \{y_1, \ldots, y_n\}$ by using a statistical model described by density $p(y \mid \theta)$, Bayesian philosophy says that you can’t determine $\theta$ exactly but you can describe the uncertainty by using probability statements and distributions. You formulate a prior distribution $\pi(\theta)$ to express your beliefs about $\theta$.
You then update those beliefs by combining the information from the prior distribution and the data, described with the statistical model $p(y \mid \theta)$, to generate the posterior distribution $p(\theta \mid y)$.

$$p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(y \mid \theta)\,\pi(\theta)}{p(y)} = \frac{p(y \mid \theta)\,\pi(\theta)}{\int p(y \mid \theta)\,\pi(\theta)\,d\theta}$$

The quantity $p(y)$ is the normalizing constant of the posterior distribution. It is also called the marginal distribution, and it is often ignored, as long as it is finite. Hence $p(\theta \mid y)$ is often written as

$$p(\theta \mid y) \propto p(y \mid \theta)\,\pi(\theta) = L(\theta)\,\pi(\theta)$$

where L is the likelihood and is defined as any function that is proportional to $p(y \mid \theta)$. This expression makes it clear that you are effectively weighting your likelihood with the prior distribution. Depending on the influence of the prior, your previous beliefs can impact the generated posterior distribution either strongly (subjective prior) or minimally (objective or noninformative prior).

Consider the vision example again. Say you want to perform a Bayesian analysis where you assume a flat prior for p, or one that effectively will have no influence. A typical flat prior is the uniform,

$$\pi(p) = 1$$

and because the likelihood for the binomial distribution is written as

$$L(p) = \binom{n}{y}\, p^{y} (1-p)^{n-y}$$

you can write the posterior distribution as

$$\pi(p \mid y) \propto p^{2} (1-p)^{12}$$

which is also a beta(3,13) distribution. The flat prior weights equally on the likelihood, making the posterior distribution have the same functional form as the likelihood function. The difference is that, in the likelihood function, the random variable is y; in the posterior, the random variable is p. Figure 1 displays how the posterior distribution and the likelihood have the same form for a flat prior.

Figure 1 Beta(3,13) Posterior with Flat Uniform Prior (curves: prior, likelihood/posterior; horizontal axis: p, from 0.0 to 1.0)

You can compute some summary measures of the posterior distribution directly, such as an estimate of the mean of p and its variance, but you might want to compute other measures that aren’t so straightforward, such as the probability that p is greater than a certain value such as 0.4. You can always simulate data from the beta distribution and address such questions by working directly with the simulated samples.

The following SAS statements create such a simulated data set for the beta(3,13) distribution:

    %let N = 10000;

    data seebeta;
       call streaminit(1234);        /* fix the seed so the draws are reproducible */
       a = 3; b = 13;                /* shape parameters of the beta(3,13) posterior */
       do i = 1 to &N;
          y = rand("beta", a, b);    /* one draw from the posterior */
          output;
       end;
    run;

The histogram in Figure 2, generated with the SGPLOT procedure, displays the results.
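The question raised above, the probability that p exceeds 0.4, can be answered from the simulated draws. The following is a minimal sketch that assumes the SEEBETA data set has been created as described; the data set name TAIL and the variable GT04 are hypothetical names introduced only for illustration. It draws the histogram, estimates the tail probability as the mean of a 0/1 indicator, and compares that estimate with the exact value from the beta distribution function.

```sas
/* Histogram of the simulated beta(3,13) draws with a kernel density overlay */
proc sgplot data=seebeta;
   histogram y;
   density y / type=kernel;
run;

/* Monte Carlo estimate of Pr(p > 0.4): the mean of a 0/1 indicator */
data tail;
   set seebeta;
   gt04 = (y > 0.4);    /* 1 if the draw exceeds 0.4, otherwise 0 */
run;

proc means data=tail mean;
   var gt04;
run;

/* Exact tail probability from the beta(3,13) CDF, for comparison */
data _null_;
   exact = 1 - cdf('BETA', 0.4, 3, 13);
   put exact=;
run;
```

The simulated proportion should approach the exact value as the number of draws grows, which is the same logic that MCMC-based summaries rely on when no closed form is available.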

Figure 2 Simulation for beta(3,13)

The mass of the distribution lies between 0.0 and 0.4, with the heaviest concentration between 0.1 and 0.2. Very little of the distribution lies beyond p = 0.5. If you want to determine the probability of p > 0.4, you would total the area under the curve for p > 0.4.

More often, closed forms for the posterior distribution like the beta distribution discussed above are not available, and you have to use simulation-based methods to estimate the posterior distribution itself, not just draw samples from it for convenience. Thus, the widespread use of Bayesian methods had to wait for the computing advances of the late 20th century. These methods involve repeatedly drawing samples from a target distribution and using the resulting samples to empirically approximate the posterior distribution. Markov chain Monte Carlo (MCMC) methods are used extensively. A Markov chain is a stochastic process that generates conditionally independent samples according to a target distribution; Monte Carlo is a numerical integration technique that estimates an expectation, which is an integral, by averaging over samples. Put together, MCMC methods generate a sequence of dependent samples from the target posterior distribution and compute posterior quantities of interest by using Monte Carlo. Popular and flexible MCMC simulation tools are the Metropolis, Metropolis-Hastings, and Gibbs sampling algorithms as well as numerous variations.

This paper does not discuss the details of these computational methods, but you can find a summary in the “Introduction to Bayesian Analysis” chapter in the SAS/STAT User’s Guide as well as many references. However, understanding the need to check for the convergence of the Markov chains is essential in performing Bayesian analysis, and this is discussed later.

#### The Bayesian Method

Bayesian analysis is all about the posterior distribution. Parameters are random quantities that have distributions, as opposed to the fixed model parameters of classical statistics.
All of the statistical inferences of a Bayesian analysis come from summary measures of the posterior distribution, such as point and interval estimates. For example, the mean or median of a posterior distribution provides point estimates for $\theta$, whereas its quantiles provide credible intervals.

These credible intervals, also known as credible sets, are analogous to the confidence intervals in frequentist analysis. There are two types of credible intervals: the equal-tail credible interval describes the region between the cut-points for the equal tails that has $100(1-\alpha)\%$ mass, while the highest posterior density (HPD) interval is the region where the posterior probability of the region is $100(1-\alpha)\%$ and the minimum density of any point in that region is equal to or larger than the density of any point outside that region. Some statisticians prefer the equal-tail interval because it is invariant under transformations. Other statisticians prefer the HPD interval because it is the smallest interval, and it is more frequently used.

The prior distribution is a mechanism that enables the statistician to incorporate known information into the analysis and to combine that information with that provided by the observed data. For example, you might have expert opinion or historical information from previous studies. You might know the range of values for a particular parameter for biological reasons. Clearly, the chosen prior distribution can have a tremendous impact on the results of an analysis, and it must be chosen wisely. The necessity of choosing priors, and its inherent subjectivity, is the basis for some criticism of Bayesian methods.

The Bayesian approach, with its emphasis on probabilities, does provide a more intuitive framework for explaining the results of an analysis. For example, you can make direct probability statements about parameters, such as that a particular credible interval contains a parameter with measurable probability. Compare this to the confidence interval and its interpretation that, in the long run, a certain percentage of the realized confidence intervals will cover the true parameter. Many non-statisticians wrongly apply the Bayesian credible interval interpretation to a confidence interval.

The Bayesian approach also provides a way to build models and perform estimation and inference for complicated problems where using frequentist methods is cumbersome and sometimes not obvious.
Hierarchical models and missing data problems are two cases that lend themselves to Bayesian solutions nicely. Although this paper is concerned with less sophisticated analyses in which the driving force is the desire for the Bayesian framework, it’s important to note that the consummate value of the Bayesian method might be to provide statistical inference for problems that couldn’t be handled without it.

#### Prior Distributions

Some practitioners want to benefit from the Bayesian framework with as limited an influence from the prior distribution as possible: this can be accomplished by choosing priors that have a minimal impact on the posterior distribution. Such priors are called noninformative priors, and they are popular for some applications, although they are not always easy to construct. An informative prior dominates the likelihood, and thus it has a discernible impact on the posterior distribution.

A prior distribution is noninformative if it is “flat” relative to the posterior distribution, as demonstrated in Figure 1. However, while a noninformative prior can appear to be more objective, it’s important to realize that there is some degree of subjectivity in any prior chosen; it does not represent complete ignorance about the parameter in question. Also, using noninformative priors can lead to what is known as improper posteriors (nonintegrable posterior density), with which you cannot make inferences. Noninformative priors might also be noninvariant, which means that they could be noninformative in one parameterization but not noninformative if a transformation is applied.

On the other hand, an improper prior distribution, such as the uniform prior distribution on the number line, can be appropriate. Improper prior distributions are frequently used in Bayesian analysis because they yield noninformative priors and proper posterior distributions.
To form a proper posterior distribution, the normalizing constant has to be finite for all y.

Some of the priors available with the built-in Bayesian procedures are improper, but they all produce proper posterior distributions. However, the MCMC procedure enables you to construct whatever prior distribution you can program, and so you yourself have to ensure that the resulting posterior distribution is proper.

#### More about Priors

Jeffreys’ prior (Jeffreys 1961) is a useful prior because it doesn’t change much over the region in which the likelihood is significant and doesn’t have large values outside that range (the local uniformity property). It is based on the observed Fisher information matrix. Because it is locally uniform, it is a noninformative prior. Thus, it provides an automated way of finding a noninformative prior for any parametric model; it is also invariant with respect to one-to-one transformations. The GENMOD procedure computes Jeffreys’ prior for any generalized linear model, and you can use it for your prior density for any of the coefficient parameters. Jeffreys’ prior can lead to improper posteriors, but not in the case of the PROC GENMOD usage.
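As a worked sketch of how Jeffreys’ prior arises in a one-parameter case, consider a binomial likelihood with n trials, y successes, and success probability p, which is the setting of the vision example. The derivation uses the expected Fisher information, which is an assumption of this sketch, and the symbol $I(p)$ is notation introduced here only for illustration.

$$
\begin{aligned}
\log L(p) &= y \log p + (n-y)\log(1-p) + \text{const} \\
\frac{\partial^2 \log L(p)}{\partial p^2} &= -\frac{y}{p^2} - \frac{n-y}{(1-p)^2} \\
I(p) &= -E\!\left[\frac{\partial^2 \log L(p)}{\partial p^2}\right] = \frac{n}{p} + \frac{n}{1-p} = \frac{n}{p(1-p)} \\
\pi(p) &\propto \sqrt{I(p)} \propto p^{-1/2}(1-p)^{-1/2}
\end{aligned}
$$

The third line uses $E(y) = np$. The result is the kernel of a beta(1/2, 1/2) density, which places somewhat more weight near 0 and 1 than the uniform prior does.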

You can show that Jeffreys’ prior is

$$\pi(p) \propto p^{-1/2}(1-p)^{-1/2}$$

for the binomial distribution, and the posterior distribution for the vision example with Jeffreys’ prior is

$$L(p)\,\pi(p) \propto p^{\,y-1/2}(1-p)^{\,n-y-1/2} \sim \text{beta}(2.5,\, 12.5)$$

Figure 3 displays how the Jeffreys’ prior for the vision study example is relatively uninfluential in the areas where the posterior has the most mass.

Figure 3 Beta(2.5,12.5) Posterior with Jeffreys’ Prior

A prior is a conjugate prior for a family of distributions if the prior and posterior distributions are from the same family. Conjugate priors result in closed-form solutions for the posterior distribution, enabling either direct inference or the construction of efficient Markov chain Monte Carlo sampling algorithms. Thus, the development of these priors was driven by the early need to minimize computational requirements. Although the computational barrier is no longer an issue, conjugate priors can still have performance benefits, and they are frequently used in Markov chain simulation because they allow direct sampling from the target conditional distribution. For example, the GENMOD procedure uses conjugacy sampling wherever it is possible.

The beta is a conjugate prior for the binomial distribution. If the likelihood is based on the binomial(n, p):

$$L(p) \propto p^{y}(1-p)^{n-y}$$

and the prior is a beta($\alpha$, $\beta$),

$$\pi(p \mid \alpha, \beta) \propto p^{\alpha-1}(1-p)^{\beta-1}$$

then the posterior distribution is written as

$$\pi(p \mid \alpha, \beta, y, n) \propto p^{\,y+\alpha-1}(1-p)^{\,n-y+\beta-1} = \text{beta}(y+\alpha,\; n-y+\beta)$$
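To make the conjugate update concrete, the following sketch applies it to the vision example (y = 2 of n = 14) under two beta priors: the uniform prior, which is beta(1,1), and Jeffreys’ prior, which is beta(0.5,0.5). The data set name CONJ and all variable names are hypothetical and chosen only for illustration.

```sas
/* Conjugate beta-binomial update for the vision example (y = 2 of n = 14).
   Data set and variable names are hypothetical. */
data conj;
   input prior :$10. a0 b0;             /* prior label and beta shape parameters */
   y = 2; n = 14;
   post_a = y + a0;                      /* posterior first shape:  y + alpha     */
   post_b = n - y + b0;                  /* posterior second shape: n - y + beta  */
   post_mean = post_a / (post_a + post_b);
   lower = quantile('BETA', 0.025, post_a, post_b);   /* 95% equal-tail limits */
   upper = quantile('BETA', 0.975, post_a, post_b);
   datalines;
Uniform  1   1
Jeffreys 0.5 0.5
;
run;

proc print data=conj noobs;
run;
```

The posterior shapes are (3, 13) for the uniform prior and (2.5, 12.5) for Jeffreys’ prior, so the posterior means are 3/16 = 0.1875 and 2.5/15, approximately 0.1667. The two priors lead to similar conclusions here because both carry little information relative to the data.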

This posterior is easily calculated, and you can rely on simulations from it to produce the measures of interest, as demonstrated above.

#### Assessing Convergence

Although this paper does not describe the underlying MCMC computations, and you can perform Bayesian analysis without knowing the specifics of those computations, it is important to understand that a Markov chain is being generated (its stationary distribution is the desired posterior distribution) and that you must check its convergence before you can work with the resulting posterior statistics. An unconverged Markov chain does not explore the parameter space sufficiently, and the samples cannot approximate the target distribution well. Inference should not be based on unconverged Markov chains, or misleading results can occur. And you need to check the convergence of all the parameters, not just the ones of interest.

There is no definitive way of determining that you have convergence, but there are a number of diagnostic tools that tell you if the chain hasn’t converged. The built-in Bayesian procedures provide a number of convergence diagnostic tests and tools, such as Gelman-Rubin, Geweke, Heidelberger-Welch, and Raftery-Lewis tests. Autocorrelation measures the dependency among the Markov chain samples, and high correlations can indicate poor mixing. The Geweke statistic compares means from early and late parts of the Markov chain to see whether they have converged. The effective sample size (ESS) is particularly useful as it provides a numerical indication of mixing status. The closer ESS is to n, the Markov chain sample size, the better the mixing in the Markov chain. In general, an ESS of approximately 1,000 is adequate for estimating the posterior density. You might want it larger if you are estimating tail percentiles.

One of the ways that you can assess convergence is with visual examination of the trace plot, which is a plot of the sampled values of a parameter versus the sample number.
Figure 4 displays some types of trace plots that can result:

Figure 4 Types of Trace Plots (panels: Good Mixing, Burn-In, Nonconvergence, Thinning?)

By default, the built-in Bayesian procedures discard the first 2,000 samples as burn-in and keep the next 10,000. You want to discard the early samples to reduce the potential bias they might have on the estimates. (You can increase the number of samples when needed.) The first plot shows good mixing. The samples stay close to the high-density region of the target distribution; they move to the tail areas but quickly return to the high-density region. The second plot shows evidence that a longer burn-in period is required. The third plot sets off warning signals. You could try increasing the number of samples, but sometimes a chain is simply not going to converge. Additional adjustment might be required, such as model reparameterization or using a different sampling algorithm. Some practitioners might see thinning, or the practice of keeping every kth iteration to reduce autocorrelation, as indicated. However, current practice tends to downplay the usefulness of thinning in favor of keeping all the samples. The Bayesian procedures produce trace plots and also autocorrelation plots and density plots to aid in convergence assessment.

For further information about these measures, see the “Introduction to Bayesian Analysis” chapter in the SAS/STAT User’s Guide.

#### Summary of the Steps in a Bayesian Analysis

You perform the following steps in a Bayesian analysis. Choosing a prior, checking convergence, and evaluating the sensitivity of your results to your prior might be new steps for many data analysts, but they are important ones.

1. Select a model (likelihood) and corresponding priors. If you have information about the parameters, use it to construct the priors.
2. Obtain estimates of the posterior distribution. You might want to start with a short Markov chain.
3. Carry out convergence assessment by using the trace plots and convergence tests. You usually iterate between this step and step 2 until you have convergence.
4. Check for the fit of the model and evaluate the sensitivity of your results due to the priors used.
5. Interpret the results: Do the posterior mean estimates make sense? How about the credible intervals?
6. Carry out further analysis: compare different models, or estimate various quantities of interest, such as functions of the parameters.

#### Bayesian Capabilities in SAS/STAT Software

SAS provides two avenues for Bayesian analysis: built-in Bayesian analysis in certain modeling procedures and the MCMC procedure for general-purpose modeling. The built-in Bayesian procedures are ideal for data analysts beginning to use Bayesian methods, and they suffice for many analysis objectives. Simply adding the BAYES statement generates Bayesian analyses without the need to program priors and likelihoods for the GENMOD, PHREG, LIFEREG, and FMM procedures. Thus, you can obtain Bayesian results for the following:

- linear regression
- Poisson regression
- logistic regression
- loglinear models
- accelerated failure time models
- Cox proportional hazards models
- piecewise exponential models
- frailty models
- finite mixture models
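As an illustration of what adding the BAYES statement looks like in practice, the following sketch shows a Cox regression in PROC PHREG and a Poisson regression in PROC GENMOD. The data sets MYSURV and MYCOUNTS and all variable names are hypothetical; only the structure of the statements is the point.

```sas
/* Hypothetical data set MYSURV: survival time TIME, censoring indicator
   STATUS (0 = censored), and covariates TRT and AGE */
proc phreg data=mysurv;
   model time*status(0) = trt age;
   bayes seed=1;          /* this statement requests the Bayesian analysis */
run;

/* Hypothetical data set MYCOUNTS: count response COUNT and covariate X */
proc genmod data=mycounts;
   model count = x / dist=poisson link=log;
   bayes seed=1;          /* default priors and sampler are used */
run;
```

In both cases the MODEL statement is the same one you would write for a frequentist fit; the BAYES statement alone switches the procedure to posterior sampling with its default priors.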

The built-in Bayesian procedures apply the appropriate Markov chain Monte Carlo sampling technique. The Gamerman algorithm is the default sampling method for generalized linear models fit with the GENMOD procedure, and Gibbs sampling with adaptive rejection sampling (ARS) is generally the default otherwise. However, conjugate sampling is available for a few cases, and the independent Metropolis algorithm and the random walk Metropolis algorithm are also available when appropriate.

The built-in Bayesian procedures provide default prior distributions depending on what models are specified. You can choose from other available priors by using the CPRIOR option (for coefficient parameters) and SCALEPRIOR option (for scale parameters). Other options allow you to choose the number of burn-in samples, the number of iterations, and so on. The following posterior statistics are produced:

- point estimates: mean, standard deviation, percentiles
- interval estimates: equal-tail and highest posterior density (HPD) intervals
- posterior correlation matrix
- deviance information criteria (DIC)

All these procedures produce convergence diagnostic plots and statistics, and they are the same diagnostics that the MCMC procedure produces. You can also output posterior samples to a data set for further analysis.

The following sections describe how to use the built-in Bayesian procedures to perform Bayesian analyses.

#### Linear Regression

Consider a study of 54 patients who undergo a certain type of liver operation in a surgical unit (Neter et al. 1996). Researchers are interested in whether blood clotting score has a positive effect on survival.

The following statements create SAS data set SURGERY.
The variable Y is the survival time, and LOGX1 is the natural logarithm of the blood clotting score.

    data surgery;
       input x1 logy;
       y = 10**logy;                        /* survival time from its base-10 log */
       label x1 = 'Blood Clotting Score';
       label y = 'Survival Time';
       logx1 = log(x1);                     /* natural log of the clotting score */
       datalines;
    6.7 2.3010
    5.1 2.0043
    ...
    ;
    run;

Suppose you want to perform a Bayesian analysis for the following regression model for the survival times, where $\epsilon$ is a $N(0, \sigma^2)$ error term:

$$Y = \beta_0 + \beta_1 \log X_1 + \epsilon$$

If you wanted a frequentist analysis, you could fit this model by using the REG procedure. But this model is also a generalized linear model (GLM) with a normal distribution and the identity link function, so it can be fit with the GENMOD procedure, which offers Bayesian analysis. To review, a GLM relates a mean response to a vector of explanatory variables through a monotone link function where the likelihood function belongs to the exponential family. The link function g describes how the expected value of the response variable is related to the linear predictor,

$$g(E(y_i)) = g(\mu_i) = x_i^{t}\beta$$

where $y_i$ is a response variable $(i = 1, \ldots, n)$, g is a link function, $\mu_i = E(y_i)$, $x_i$ is a vector of independent variables, and $\beta$ is a vector of regression coefficients to be estimated. For example, when you assume a normal distribution for the response variable, you specify an identity link function $g(\mu) = \mu$. For Poisson regression you specify a log link function $g(\mu) = \log(\mu)$, and for a logistic regression you specify a logit link function $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$.

The BAYES statement produces Bayesian analyses with the GENMOD procedure for most of the models it fits; currently this does not include models for which the multinomial distribution is used. The first step of a Bayesian analysis is specifying the prior distribution, and Table 2 describes priors that are supported by the GENMOD procedure.

Table 2 Prior Distributions Provided by the GENMOD Procedure

| Parameter | Prior |
|---|---|
| Regression coefficients | Jeffreys’, normal, uniform |
| Dispersion | Gamma, inverse gamma, improper |
| Scale and precision | Gamma, improper |

You would specify a prior for one of the dispersion, scale, or precision parameters, in models that have such parameters.

The following statements request a Bayesian analysis for the linear regression model with PROC GENMOD:

    proc genmod data=surgery;
       model y = logx1 / dist=normal;
       bayes seed=1234 outpost=Post;
    run;

The MODEL statement with the DIST=NORMAL option describes the simple linear regression model (the default is the identity link function). The BAYES statement requests the Bayesian analysis.
The SEED= option in the BAYES statement sets the random number seed so you can reproduce the analysis in the future. The OUTPOST= option saves the generated posterior samples to the POST data set for further analysis.

By default, PROC GENMOD produces the maximum likelihood estimates of the model parameters, as displayed in Figure 5.

Figure 5 Maximum Likelihood Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limit (Lower) | Wald 95% Confidence Limit (Upper) |
|---|---|---|---|---|---|
| Intercept | 1 | -94.9822 | 114.5279 | -319.453 | 129.4884 |
| logx1 | 1 | 170.1749 | 65.8373 | 41.1361 | 299.2137 |
| Scale | 1 | 135.7963 | 13.0670 | 112.4556 | 163.9815 |

Note: The scale parameter was estimated by maximum likelihood.

Subsequent tables are produced by the Bayesian analysis. The “Model Information” table in Figure 6 summarizes information about the model that you fit. The 2,000 burn-in samples are followed by 10,000 samples. Because the normal distribution was specified in the MODEL statement, PROC GENMOD exploited the conjugacy and sampled directly from the target distribution.
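The burn-in size and sample size described above are defaults, and the prior for the regression coefficients can also be changed. The following is a minimal sketch, assuming the NBI=, NMC=, and COEFFPRIOR= options of the BAYES statement in PROC GENMOD; the specific values and the output data set name POST2 are arbitrary choices made for illustration, not recommendations from this paper.

```sas
proc genmod data=surgery;
   model y = logx1 / dist=normal;
   bayes seed=1234
         nbi=5000             /* number of burn-in samples to discard          */
         nmc=20000            /* number of samples kept after burn-in          */
         coeffprior=normal    /* normal prior on the regression coefficients   */
         outpost=Post2;       /* hypothetical name for the posterior samples   */
run;
```

Changing the prior can also change which sampling algorithm the procedure selects, so it is worth rereading the “Model Information” table after any such change.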

Figure 6 Model Information

Bayesian Analysis: Model Information

| Item | Value |
|---|---|
| Data Set | WORK.SURGERY |
| Burn-In Size | 2000 |
| MC Sample Size | 10000 |
| Thinning | 1 |
| Sampling Algorithm | Conjugate |
| Distribution | Normal |
| Link Function | Identity |
| Dependent Variable | y (Survival Time) |

The “Prior Distributions” table in Figure 7 identifies the prior distributions. The default uniform prior distribution is assumed for the regression coefficients $\beta_0$ and $\beta_1$, and the default improper prior is used for the dispersion parameter. An improper prior is defined as

$$\pi(\theta) \propto \frac{1}{\theta}$$

Both of these priors are noninformative.

Figure 7 Prior Distributions

Bayesian Analysis: Uniform Prior for Regression Coefficients

| Parameter | Prior |
|---|---|
| Intercept | Constant |
| logx1 | Constant |

Independent Prior Distributions for Model Parameters

| Parameter | Prior Distribution |
|---|---|
| Dispersion | Improper |

Figure 8 displays the convergence diagnostics: the “Posterior Autocorrelations” table reports that the autocorrelations at the selected lags (1, 5, 10, and 50, by default) drop off quickly, indicating reasonable mixing of the Markov chain. The p-values in the “Geweke Diagnostics” table show that the mean estimate of the Markov chain is stable over time. The “Effective Sample Sizes” table reports that the effective sample sizes of the parameters are equal to the Markov chain sample size, which is as good as that measure gets.

Figure 8 Convergence Diagnostics Table

Bayesian Analysis: Posterior Autocorrelations

| Parameter | Lag 1 | Lag 5 | Lag 10 | Lag 50 |
|---|---|---|---|---|
| Intercept | 0.0059 | 0.0145 | -0.0059 | 0.0106 |
| logx1 | 0.0027 | 0.0124 | -0.0046 | 0.0091 |
| Dispersion | 0.0002 | -0.0031 | 0.0074 | 0.0014 |

Geweke Diagnostics

| Parameter | z | Pr > \|z\| |
|---|---|---|
| Intercept | -0.2959 | 0.7673 |
| logx1 | 0.2873 | 0.7739 |
| Dispersion | 0.9446 | 0.3449 |

Effective Sample Sizes

| Parameter | ESS | Autocorrelation Time | Efficiency |
|---|---|---|---|
| Intercept | 10000.0 | 1.0000 | 1.0000 |
| logx1 | 10000.0 | 1.0000 | 1.0000 |
| Dispersion | 10000.0 | 1.0000 | 1.0000 |

The built-in Bayesian procedures produce three types of plots that help you visualize the posterior samples of each parameter. These diagnostic plots for the slope coefficient $\beta_1$ are shown in Figure 9. The trace plot indicates that the Markov chain has stabilized with good mixing. The autocorrelation plot confirms the tabular information, and the kernel density plot estimates the posterior marginal distribution. The diagnostic plots for the other parameters (not shown here) have similar outcomes.

Figure 9 Bayesian Model Diagnostic Plot for $\beta_1$
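The diagnostics tables and plots discussed here are produced by default, but they can also be requested explicitly. The following is a sketch that assumes the DIAGNOSTICS= and PLOTS= options of the BAYES statement in PROC GENMOD and the keyword names shown; treat the exact keyword list as something to confirm in the SAS/STAT documentation for your release.

```sas
ods graphics on;    /* ODS Graphics must be enabled for the diagnostic plots */

proc genmod data=surgery;
   model y = logx1 / dist=normal;
   bayes seed=1234
         diagnostics=(autocorr ess geweke)   /* tables like those in Figure 8 */
         plots=(trace autocorr density);     /* panels like those in Figure 9 */
run;
```

Requesting only the diagnostics you intend to read keeps the output short when a model has many parameters, since every parameter needs to be checked.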

Because convergence doesn’t seem to be an issue, you can review the posterior statistics displayed in Figure 10 and Figure 11.

Figure 10 Posterior Summary Statistics

Bayesian Analysis: Posterior Summaries

| Parameter | N | Mean | Standard Deviation | 25% | 50% | 75% |
|---|---|---|---|---|---|---|
| Intercept | 10000 | -95.9018 | 119.4 | -176.1 | -96.0173 | -16.2076 |
| logx1 | 10000 | 170.7 | 68.6094 | 124.2 | 171.1 | 216.7 |
| Dispersion | 10000 | 19919.9 | 4072.4 | 17006.0 | 19421.7 | 22253.7 |

Figure 11 Posterior Interval Statistics

Posterior Intervals

| Parameter | Alpha | Equal-Tail Interval (Lower) | Equal-Tail Interval (Upper) | HPD Interval (Lower) | HPD Interval (Upper) |
|---|---|---|---|---|---|
| Intercept | 0.050 | | 135.5 | -324.0 | 139.2 |
| logx1 | 0.050 | 37.4137 | 304.3 | 36.8799 | 303.0 |
| Dispersion | 0.050 | 13475.4 | 29325.9 | 12598.8 | 28090.3 |

The posterior summaries displayed in Figure 10 are similar to the maximum likelihood estimates shown in Figure 5. This is because noninformative priors were used and the posterior is effectively the likelihood. Figure 11 displays the HPD interval and the equal-tail interval.

You might be interested in whether LOGX1 has a positive effect on survival time. You can address this question by using the posterior samples that are saved to the POST data set. This means that you can determine the conditional probability $\Pr(\beta_1 > 0 \mid y)$ directly from the posterior sample. All you have to do is to determine the proportion of samples where $\beta_1 > 0$,

$$\Pr(\beta_1 > 0 \mid y) = \frac{1}{N}\sum_{t=1}^{N} I(\beta_1^{t} > 0)$$

where N = 10000 is the number of samples after burn-in and I is an indicator function, where $I(\beta_1^{t} > 0) = 1$ if $\beta_1^{t} > 0$
