
MINITAB ASSISTANT WHITE PAPER

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab Statistical Software.

One-Way ANOVA

Overview

One-way ANOVA is used to compare the means of three or more groups to determine whether they differ significantly from one another. Another important function is to estimate the differences between specific groups.

The most common method to detect differences among groups in one-way ANOVA is the F-test, which is based on the assumption that the populations for all samples share a common, but unknown, standard deviation. We recognized that, in practice, samples often have different standard deviations. Therefore, we wanted to investigate the Welch method, an alternative to the F-test that can handle unequal standard deviations. We also wanted to develop a method to calculate multiple comparisons that accounts for samples with unequal standard deviations. With this method, we can graph the individual intervals, which provide an easy way to identify groups that differ from one another.

In this paper, we describe how we developed the methods used in the Minitab Assistant One-Way ANOVA procedure for:

- Welch test
- Multiple comparison intervals

Additionally, we examine conditions that can affect the validity of the one-way ANOVA results, including the presence of unusual data, the sample size and power of the test, and the normality of the data. Based on these conditions, the Assistant automatically performs the following checks on your data and reports the findings in the Report Card:

- Unusual data
- Sample size
- Normality of data

In this paper, we investigate how these conditions relate to one-way ANOVA in practice and we describe how we established the guidelines to check for these conditions in the Assistant.

WWW.MINITAB.COM

One-way ANOVA methods

The F-test versus the Welch test

The F-test commonly used in one-way ANOVA is based on the assumption that all of the groups share a common, but unknown, standard deviation (σ). In practice, this assumption rarely holds true, which leads to problems controlling the Type I error rate. Type I error is the probability of incorrectly rejecting the null hypothesis (concluding the samples are significantly different when they are not). When the samples have different standard deviations, there is a greater likelihood that the test will reach an incorrect conclusion. To address this problem, the Welch test was developed as an alternative to the F-test (Welch, 1951).

Objective

We wanted to determine whether to use the F-test or the Welch test for the One-Way ANOVA procedure in the Assistant. To do this, we needed to evaluate how closely the actual test results for the F-test and the Welch test matched the target level of significance (alpha, or Type I error rate) for the test; that is, whether the test incorrectly rejected the null hypothesis more often or less often than intended given different sample sizes and sample standard deviations.

Method

To compare the F-test and the Welch test, we performed multiple simulations, varying the number of samples, the sample size, and the sample standard deviation. For each condition, we performed 10,000 ANOVA tests using both the F-test and the Welch method. We generated random data so that the means of the samples were the same; thus, for each test, the null hypothesis was true. Then, we performed the tests using target significance levels of 0.05 and 0.01. We counted the number of times out of 10,000 tests that the F-test and the Welch test actually rejected the null hypothesis, and compared this proportion to the target significance level. If the test performs well, the estimated Type I error should be very close to the target significance level.

Results

We found that the Welch method performed as well as or better than the F-test under all of the conditions we tested. For example, when comparing 5 samples using the Welch test, the Type I error rates were between 0.0460 and 0.0540, very close to the target significance level of 0.05. This indicates that the Type I error rate for the Welch method matches the target value even when the sample size and standard deviation vary across samples.
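The simulation described above can be sketched in a few lines of Python. This is not the original simulation code: it uses scipy's `f_oneway` for the classical F-test and computes the Welch statistic from the standard Welch (1951) formulas (the same formulas given in Appendix A), for one of the conditions reported in Table 1.

```python
import numpy as np
from scipy import stats

def welch_anova_pvalue(samples):
    """Welch's one-way ANOVA p-value for a list of 1-D samples."""
    k = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    xbar = np.array([s.mean() for s in samples])
    var = np.array([s.var(ddof=1) for s in samples])
    w = n / var                       # weights w_j = n_j / s_j^2
    W = w.sum()
    mu_hat = (w * xbar).sum() / W     # variance-weighted grand mean
    h = (1 - w / W) ** 2 / (n - 1)
    num = (w * (xbar - mu_hat) ** 2).sum() / (k - 1)
    den = 1 + 2 * (k - 2) / (k**2 - 1) * h.sum()
    f2 = (k**2 - 1) / (3 * h.sum())   # denominator degrees of freedom
    return stats.f.sf(num / den, k - 1, f2)

# One condition from Table 1: the smallest sample has the largest sigma.
rng = np.random.default_rng(7)
sigmas, sizes, alpha, reps = [1, 1, 1, 1, 4], [20, 20, 20, 20, 10], 0.05, 2000
rej_f = rej_w = 0
for _ in range(reps):
    samples = [rng.normal(0.0, s, n) for s, n in zip(sigmas, sizes)]
    rej_f += stats.f_oneway(*samples).pvalue < alpha
    rej_w += welch_anova_pvalue(samples) < alpha
print(f"F-test Type I rate: {rej_f / reps:.3f}")  # well above the 0.05 target
print(f"Welch Type I rate:  {rej_w / reps:.3f}")  # close to the 0.05 target
```

With the null hypothesis true by construction (all population means equal zero), the rejection rate estimates the Type I error directly, and the pattern matches the paper's findings: the F-test rejects far too often under this condition while the Welch rate stays near 0.05.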

On the other hand, the Type I error rates for the F-test were between 0.0273 and 0.2277. In particular, the F-test did poorly under the following conditions:

- The Type I error rates fell below 0.05 when the largest sample also had the largest standard deviation. This condition results in a more conservative test and demonstrates that simply increasing the sample size is not a viable solution when the standard deviations for the samples are not equal.
- The Type I error rates were above 0.05 when the sample sizes were equal but the standard deviations were different. The rates were also greater than 0.05 when the sample with a larger standard deviation was smaller than the other samples. In particular, when smaller samples have larger standard deviations, there is a substantial increase in the risk that this test incorrectly rejects the null hypothesis.

For more information on the simulation methodology and results, see Appendix A.

Because the Welch method performed well when the standard deviations and sizes of the samples were unequal, we use the Welch method for the One-Way ANOVA procedure in the Assistant.

Comparison intervals

When an ANOVA test is statistically significant, indicating that at least one of the sample means is different from the others, the next step in the analysis is to determine which samples are statistically different. An intuitive way to make this comparison is to graph the confidence intervals and identify the samples whose intervals do not overlap. However, the conclusions drawn from the graph may not match the test results because the individual confidence intervals are not designed for comparisons. Although a published method for multiple comparisons exists for samples with equal standard deviations, we needed to extend this method to account for samples with unequal standard deviations.

Objective

We wanted to develop a method to calculate individual comparison intervals that can be used to make comparisons across samples and that also match the test results as closely as possible. We also wanted to provide a visual method for determining which samples are statistically different from the others.

Method

Standard multiple comparison methods (Hsu, 1996) provide an interval for the difference between each pair of means while controlling for the increased error that occurs when making multiple comparisons. In the special case of equal sample sizes and under the assumption of equal standard deviations, it is possible to display individual intervals for each mean in a way that corresponds exactly to the intervals for the differences of all the pairs. For the case of unequal sample sizes, with the assumption of equal standard deviations, Hochberg, Weiss, and Hart (1982) developed individual intervals that are approximately equivalent to the intervals for differences among pairs, based on the Tukey-Kramer method of multiple comparisons. In the Assistant, we apply the same approach to the Games-Howell method of multiple comparisons, which does not assume equal standard deviations. The approach used in the Assistant in Release 16 of Minitab was similar in concept, but was not based directly on the Games-Howell approach. For more details, see Appendix B.

Results

The Assistant displays the comparison intervals in the Means Comparison Chart in the One-Way ANOVA Summary Report. When the ANOVA test is statistically significant, any comparison interval that does not overlap with at least one other interval is marked in red. It is possible for the test and the comparison intervals to disagree, although this outcome is rare because both methods have the same probability of rejecting the null hypothesis when it is true. If the ANOVA test is significant yet all of the intervals overlap, then the pair with the smallest amount of overlap is marked in red. If the ANOVA test is not statistically significant, then none of the intervals are marked in red, even if some of the intervals do not overlap.
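The pairwise intervals that underlie this approach can be sketched as follows. This is not Minitab's implementation: it computes plain Games-Howell simultaneous confidence intervals for each pair of means, using scipy's `studentized_range` distribution for the critical value and the usual Welch-Satterthwaite approximation for the pairwise degrees of freedom.

```python
import numpy as np
from scipy import stats

def games_howell_intervals(samples, alpha=0.05):
    """Games-Howell simultaneous CIs for all pairwise mean differences."""
    k = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    m = np.array([s.mean() for s in samples])
    se2 = np.array([s.var(ddof=1) for s in samples]) / n  # s_j^2 / n_j
    out = []
    for i in range(k):
        for j in range(i + 1, k):
            se = np.sqrt(se2[i] + se2[j])
            # Welch-Satterthwaite degrees of freedom for this pair
            df = (se2[i] + se2[j]) ** 2 / (
                se2[i] ** 2 / (n[i] - 1) + se2[j] ** 2 / (n[j] - 1))
            q = stats.studentized_range.ppf(1 - alpha, k, df)
            half = q / np.sqrt(2) * se
            out.append((i, j, m[i] - m[j] - half, m[i] - m[j] + half))
    return out

rng = np.random.default_rng(3)
groups = [rng.normal(mu, sd, 25) for mu, sd in [(0, 1), (0.2, 1), (3, 4)]]
for i, j, lo, hi in games_howell_intervals(groups):
    flag = "differs" if lo > 0 or hi < 0 else "no difference detected"
    print(f"mean{i+1} - mean{j+1}: [{lo:6.2f}, {hi:6.2f}]  {flag}")
```

A pair of means is declared different when its interval excludes zero. The Assistant's contribution, described above and in Appendix B, is to convert these pairwise intervals into approximately equivalent individual intervals, one per mean, so that non-overlap of individual intervals can be read off a chart.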

Data checks

Unusual data

Unusual data are extremely large or small data values, also known as outliers. Unusual data can have a strong influence on the results of the analysis and can affect the chances of finding statistically significant results, especially when the sample is small. Unusual data can indicate problems with data collection, or may be due to unusual behavior of the process you are studying. Therefore, these data points are often worth investigating and should be corrected when possible.

Objective

We wanted to develop a method to check for data values that are very large or very small relative to the overall sample, which may affect the results of the analysis.

Method

We developed a method to check for unusual data based on the method described by Hoaglin, Iglewicz, and Tukey (1986) to identify outliers in boxplots.

Results

The Assistant identifies a data point as unusual if it is more than 1.5 times the interquartile range beyond the lower or upper quartile of the distribution. The lower and upper quartiles are the 25th and 75th percentiles of the data. The interquartile range is the difference between the two quartiles. This method works well even when there are multiple outliers because it makes it possible to detect each specific outlier.

When checking for unusual data, the Assistant displays the following status indicators in the Report Card:

Status   Condition
[icon]   There are no unusual data points.
[icon]   At least one data point is unusual and may have a strong influence on the results.
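The boxplot-style rule described above is straightforward to express in code. This is a sketch rather than Minitab's implementation; note that different quartile interpolation conventions can flag borderline points differently.

```python
import numpy as np

def unusual_points(x, multiplier=1.5):
    """Flag values beyond multiplier * IQR outside the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])   # lower and upper quartiles
    iqr = q3 - q1                         # interquartile range
    lo, hi = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in x if v < lo or v > hi]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.1, 14.5]  # one clear outlier
print(unusual_points(data))  # -> [14.5]
```

Because each point is compared against the quartiles of the whole sample, several outliers can be flagged at once, which is the property the paper highlights.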

Sample size

Power is an important property of any hypothesis test because it indicates the likelihood that you will find a significant effect or difference when one truly exists. Power is the probability that you will reject the null hypothesis in favor of the alternative hypothesis. Often, the easiest way to increase the power of a test is to increase the sample size. In the Assistant, for tests with low power, we indicate how large your sample needs to be to find the difference you specified. If no difference is specified, we report the difference you could detect with adequate power. To provide this information, we needed to develop a method for calculating power because the Assistant uses the Welch method, which does not have an exact formula for power.

Objective

To develop a methodology for calculating power, we needed to address two questions. First, the Assistant does not require that users enter a full set of means; it only requires that they enter a difference between means that has practical implications. For any given difference, there are an infinite number of possible configurations of means that could produce that difference. Therefore, we needed to develop a reasonable approach to determine which means to use when calculating power, given that we could not calculate power for all possible configurations of means. Second, we needed to develop a method to calculate power because the Assistant uses the Welch method, which does not require equal sample sizes or standard deviations.

Method

To address the infinite number of possible configurations of means, we developed a method based on the approach used in the standard one-way ANOVA procedure in Minitab (Stat > ANOVA > One-Way). We focused on the cases where only two of the means differ by the stated amount and the other means are equal (set to the weighted average of the means). Because we assume that only two means differ from the overall mean (and not more than two), the approach provides a conservative estimate of power. However, because the samples may have different sizes or standard deviations, the power calculation still depends on which two means are assumed to differ.

To solve this problem, we identify the two pairs of means that represent the best and worst cases. The worst case occurs when the sample size is small relative to the sample variance, and power is minimized; the best case occurs when the sample size is large relative to the sample variance, and power is maximized. All of the power calculations consider these two extreme cases, which minimize and maximize the power under the assumption that exactly two means differ from the overall weighted average of means.

To develop the power calculation, we used a method shown in Kulinskaya et al. (2003). We compared the power calculations from our simulation, the method we developed to address the configuration of means, and the method shown in Kulinskaya et al. (2003). We also examined another power approximation that shows more clearly how power depends on the configuration of means. For more information on the power calculation, see Appendix C.
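The best-case/worst-case idea can be illustrated with a small sketch. This is one plausible reading of the rule stated above, not the Assistant's actual computation (which uses the Kulinskaya et al. approximation): power is lowest when the stated difference falls on the two groups whose variance is largest relative to their sample size, and highest when it falls on the two groups whose variance is smallest relative to their sample size.

```python
import numpy as np

def extreme_pairs(sds, sizes):
    """Return (worst_pair, best_pair) of group indices.
    Worst case: the two largest variance-to-size ratios (power minimized).
    Best case: the two smallest ratios (power maximized).
    This ranking rule is an assumption based on the paper's description."""
    ratio = np.asarray(sds, dtype=float) ** 2 / np.asarray(sizes, dtype=float)
    order = np.argsort(ratio)
    return tuple(order[-2:]), tuple(order[:2])

sds   = [1.0, 1.0, 1.0, 4.0, 2.0]   # hypothetical group standard deviations
sizes = [20,  20,  5,   10,  40]    # hypothetical group sample sizes
worst, best = extreme_pairs(sds, sizes)
print("worst-case pair:", worst)
print("best-case pair: ", best)
```

Placing the practical difference on the worst-case pair and on the best-case pair, then computing power for each, brackets the power for every other configuration in which exactly two means differ.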

Results

Our comparison of these methods showed that the Kulinskaya method provides a good approximation of power and that our method for handling the configuration of means is appropriate.

When the data does not provide enough evidence against the null hypothesis, the Assistant calculates the practical differences that can be detected with an 80% and a 90% probability for the given sample sizes. In addition, if you specify a practical difference, the Assistant calculates the minimum and maximum power values for this difference. When the power values are below 90%, the Assistant calculates a sample size based on the specified difference and the observed sample standard deviations. To ensure that the sample size results in both the minimum and maximum power values being 90% or greater, we assume that the specified difference is between the two means with the greatest variability.

If the user does not specify a difference, the Assistant finds the largest difference at which the maximum of the range of power values is 60%. This value is labeled at the boundary between the red and yellow bars on the Power Report, corresponding to 60% power. We also find the smallest difference at which the minimum of the range of power values is 90%. This value is labeled at the boundary between the yellow and green bars on the Power Report, corresponding to 90% power.

When checking for power and sample size, the Assistant displays the following status indicators in the Report Card:

Status   Condition
[icon]   The data does not provide sufficient evidence to conclude that there are differences among the means. No difference was specified.
[icon]   The test finds a difference between the means, so power is not an issue.
         OR
         Power is sufficient. The test did not find a difference between the means, but the sample is large enough to provide at least a 90% chance of detecting the given difference.
[icon]   Power may be sufficient. The test did not find a difference between the means, but the sample is large enough to provide an 80% to 90% chance of detecting the given difference. The sample size required to achieve 90% power is reported.
[icon]   Power might not be sufficient. The test did not find a difference between the means, and the sample is large enough to provide a 60% to 80% chance of detecting the given difference. The sample sizes required to achieve 80% power and 90% power are reported.
[icon]   Power is not sufficient. The test did not find a difference between the means, and the sample is not large enough to provide at least a 60% chance of detecting the given difference. The sample sizes required to achieve 80% power and 90% power are reported.

Normality

A common assumption in many statistical methods is that the data are normally distributed. Fortunately, even when data are not normally distributed, methods based on the normality assumption can work well. This is explained in part by the central limit theorem, which says that the distribution of a sample mean is approximately normal and that the approximation improves as the sample size gets larger.

Objective

Our objective was to determine how large the sample needs to be to give a reasonably good approximation of the normal distribution. We wanted to examine the Welch test and the comparison intervals with samples of small to moderate size from various nonnormal distributions. We wanted to determine how closely the actual test results for the Welch method and the comparison intervals matched the chosen level of significance (alpha, or Type I error rate) for the test; that is, whether the test incorrectly rejected the null hypothesis more often or less often than expected given different sample sizes, numbers of levels, and nonnormal distributions.

Method

To estimate the Type I error, we performed multiple simulations, varying the number of samples, the sample size, and the distribution of the data. The simulations included skewed and heavy-tailed distributions that depart substantially from the normal distribution. The size and standard deviation were constant across samples within each test.

For each condition, we performed 10,000 ANOVA tests using the Welch method and the comparison intervals. We generated random data so that the means of the samples were the same; thus, for each test, the null hypothesis was true. Then, we performed the tests using a target significance level of 0.05. We counted the number of times out of 10,000 when the tests actually rejected the null hypothesis, and compared this proportion to the target significance level. For the comparison intervals, we counted the number of times out of 10,000 when the intervals indicated one or more differences. If the test performs well, the Type I error should be very close to the target significance level.

Results

Overall, the tests and the comparison intervals perform very well across all conditions with sample sizes as small as 10 or 15. For tests with 9 or fewer levels, in almost every case, the results are all within 3 percentage points of the target significance level for a sample size of 10 and within 2 percentage points for a sample size of 15. For tests that have 10 or more levels, in most cases the results are within 3 percentage points with a sample size of 15 and within 2 percentage points with a sample size of 20. For more information, see Appendix D.
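The central limit theorem effect behind these results can be seen directly with a short experiment: draw repeated samples from a strongly skewed distribution (the exponential, skewness 2) and watch the skewness of the sampling distribution of the mean shrink as the sample size grows. This is an illustration of the principle, not the paper's simulation.

```python
import numpy as np

rng = np.random.default_rng(11)

def skew_of_sample_means(n, reps=4000):
    """Skewness of the sampling distribution of the mean for exponential data."""
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    c = means - means.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5

for n in (5, 15, 60):
    print(f"n = {n:3d}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

In theory the skewness of the mean of n exponential observations is 2/sqrt(n), so it falls from about 0.9 at n = 5 to about 0.26 at n = 60; sample means become increasingly symmetric, which is why normal-theory procedures such as the Welch test hold their Type I error rate at the moderate sample sizes reported above.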

Because the tests perform well with relatively small samples, the Assistant does not test the data for normality. Instead, the Assistant checks the size of the samples and indicates when the samples are less than 15 for 2-9 levels and less than 20 for 10-12 levels. Based on these results, the Assistant displays the following status indicators in the Report Card:

Status   Condition
[icon]   The sample sizes are at least 15 or 20, so normality is not an issue.
[icon]   Because some sample sizes are less than 15 or 20, normality may be an issue.
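The report-card rule above reduces to a few lines of code. This sketch assumes a direct reading of the stated thresholds (at least 15 observations per sample for 2-9 levels, at least 20 for 10-12 levels); it is not Minitab's implementation.

```python
def normality_check_ok(sample_sizes):
    """Sketch of the Report Card rule: True when every sample meets the
    size threshold implied by the number of levels (groups)."""
    k = len(sample_sizes)
    threshold = 15 if k <= 9 else 20   # 2-9 levels -> 15; 10-12 levels -> 20
    return all(n >= threshold for n in sample_sizes)

print(normality_check_ok([16, 18, 20]))        # -> True  (3 levels, all >= 15)
print(normality_check_ok([20, 19] + [20] * 9)) # -> False (11 levels, one n = 19)
```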

References

Dunnett, C. W. (1980). Pairwise multiple comparisons in the unequal variance case. Journal of the American Statistical Association, 75, 796-800.

Hoaglin, D. C., Iglewicz, B., and Tukey, J. W. (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association, 81, 991-999.

Hochberg, Y., Weiss, G., and Hart, S. (1982). On graphical procedures for multiple comparisons. Journal of the American Statistical Association, 77, 767-772.

Hsu, J. (1996). Multiple comparisons: Theory and methods. Boca Raton, FL: Chapman & Hall.

Kulinskaya, E., Staudte, R. G., and Gao, H. (2003). Power approximations in testing for unequal means in a one-way ANOVA weighted for unequal variances. Communications in Statistics, 32 (12), 2353-2371.

Welch, B. L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34, 28-35.

Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38, 330-336.

Appendix A: The F-test versus the Welch test

The F-test can result in an increase of the Type I error rate when the assumption of equal standard deviations is violated; the Welch test is designed to avoid these problems.

Welch test

Random samples of sizes $n_1, \ldots, n_k$ from $k$ populations are observed. Let $\mu_1, \ldots, \mu_k$ denote the population means and let $\sigma_1^2, \ldots, \sigma_k^2$ denote the population variances. Let $\bar{x}_1, \ldots, \bar{x}_k$ denote the sample means and let $s_1^2, \ldots, s_k^2$ denote the sample variances. We are interested in testing the hypotheses:

H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
H1: $\mu_i \neq \mu_j$ for some $i, j$.

The Welch test for testing the equality of the $k$ means compares the statistic

$$W^* = \frac{\sum_{j=1}^{k} w_j (\bar{x}_j - \hat{\mu})^2 / (k - 1)}{1 + \left[2(k - 2)/(k^2 - 1)\right] \sum_{j=1}^{k} h_j}$$

to the $F(k - 1, f)$ distribution, where

$$w_j = \frac{n_j}{s_j^2}, \qquad W = \sum_{j=1}^{k} w_j, \qquad \hat{\mu} = \frac{\sum_{j=1}^{k} w_j \bar{x}_j}{W},$$

$$h_j = \frac{(1 - w_j/W)^2}{n_j - 1}, \qquad \text{and} \qquad f = \frac{k^2 - 1}{3 \sum_{j=1}^{k} h_j}.$$

The Welch test rejects the null hypothesis if $W^* > F_{k-1,\, f,\, 1-\alpha}$, the percentile of the $F$ distribution that is exceeded with probability $\alpha$.

Unequal standard deviations

In this section, we demonstrate the sensitivity of the F-test to violations of the assumption of equal standard deviations and compare it to the Welch test.

The results below are for one-way ANOVA tests using 5 samples of N(0, σ²). Each row is based on 10,000 simulations using the F-test and the Welch test. We tested two conditions for the standard deviation by increasing the standard deviation of the fifth sample, doubling it and quadrupling it compared to the other samples. We tested three different conditions for the sample size: the sample sizes are equal, the fifth sample is greater than the others, and the fifth sample is less than the others.

Table 1  Type I error rates for simulated F-tests and Welch tests with 5 samples, target significance level 0.05

Standard deviations (σ1, ..., σ5)   Sample sizes (n1, ..., n5)   F-test   Welch test
1, 1, 1, 1, 2                       10, 10, 10, 10, 20           0.0273   0.0524
1, 1, 1, 1, 2                       20, 20, 20, 20, 20           0.0678   0.0462
1, 1, 1, 1, 2                       20, 20, 20, 20, 10           0.1258   0.0540
1, 1, 1, 1, 4                       10, 10, 10, 10, 20           0.0312   0.0460
1, 1, 1, 1, 4                       20, 20, 20, 20, 20           0.1065   0.0533
1, 1, 1, 1, 4                       20, 20, 20, 20, 10           0.2277   0.0503

When the sample sizes are equal (rows 2 and 5), the probability that the F-test incorrectly rejects the null hypothesis is greater than the target 0.05, and the probability increases when the inequality among standard deviations is greater. The problem is made even worse by decreasing the size of the sample with the largest standard deviation. On the other hand, increasing the size of the sample with the largest standard deviation reduces the probability of rejection. However, increasing the sample size by too much makes the probability of rejection too small, which not only makes the test more conservative than necessary under the null hypothesis, but also adversely affects the power of the test under the alternative hypothesis. Compare these results with the Welch test, which agrees well with the target significance level of 0.05 in every case.

Next, we conducted a simulation for cases with k = 7 samples. Each row of the table summarizes 10,000 simulated F-tests. We varied the standard deviations and sizes of the samples. The target significance levels are α = 0.05 and α = 0.01. As above, we see deviations from the target values that can be quite severe. Using a smaller sample size when variability is higher leads to very large Type I error probabilities, while using a larger sample can lead to an extremely conservative test.
The results are shown in Table 2 below.

Table 2  Type I error rates for simulated F-tests with 7 samples. The first six samples share a common standard deviation (σ1-σ6); the seventh sample's standard deviation (σ7) is larger.

σ1-σ6     σ7    Sample sizes (n1, ..., n7)    α = 0.05   α = 0.01
1.85      2.9   21, 21, 21, 21, 22, 22, 12    0.0795     0.0233
1.85      2.9   20, 21, 21, 21, 21, 24, 12    0.0785     0.0226
1.85      2.9   20, 21, 21, 21, 21, 21, 15    0.0712     0.0199
1.85      2.9   20, 20, 20, 21, 21, 23, 15    0.0719     0.0172
1.85      2.9   20, 20, 20, 20, 21, 21, 18    0.0632     0.0166
1.85      2.9   20, 20, 20, 20, 20, 20, 20    0.0576     0.0138
1.85      2.9   18, 19, 19, 20, 20, 20, 24    0.0474     0.0133
1.85      2.9   18, 18, 18, 18, 18, 18, 32    0.0314     0.0057
1.85      2.9   15, 18, 18, 19, 20, 20, 30    0.0400     0.0085
1.85      2.9   12, 18, 18, 18, 19, 19, 36    0.0288     0.0064
1.85      2.9   15, 15, 15, 15, 15, 15, 50    0.0163     0.0025
1.85      2.9   12, 12, 12, 12, 12, 12, 68    0.0052     0.0002
1.75      3.5   21, 21, 21, 21, 22, 22, 12    0.1097     0.0436
1.75      3.5   20, 21, 21, 21, 21, 24, 12    0.1119     0.0452
1.75      3.5   20, 21, 21, 21, 21, 21, 15    0.0996     0.0376
1.75      3.5   20, 20, 20, 21, 21, 23, 15    0.0657     0.0345
1.75      3.5   20, 20, 20, 20, 21, 21, 18    0.0779     0.0283
1.75      3.5   20, 20, 20, 20, 20, 20, 20    0.0737     0.0264
1.75      3.5   18, 19, 19, 20, 20, 20, 24    0.0604     0.0204
1.75      3.5   18, 18, 18, 18, 18, 18, 32    0.0368     0.0122
1.75      3.5   15, 18, 18, 19, 20, 20, 30    0.0390     0.0117
1.75      3.5   12, 18, 18, 18, 19, 19, 36    0.0232     0.0046
1.75      3.5   15, 15, 15, 15, 15, 15, 50    0.0124     0.0026
1.75      3.5   12, 12, 12, 12, 12, 12, 68    0.0027     0.0004
1.68333   3.9   21, 21, 21, 21, 22, 22, 12    0.1340     0.0630
1.68333   3.9   20, 21, 21, 21, 21, 24, 12    0.1329     0.0654
1.68333   3.9   20, 21, 21, 21, 21, 21, 15    0.1101     0.0484
1.68333   3.9   20, 20, 20, 21, 21, 23, 15    0.1121     0.0495
1.68333   3.9   20, 20, 20, 20, 21, 21, 18    0.0876     0.0374
1.68333   3.9   20, 20, 20, 20, 20, 20, 20    0.0808     0.0317
1.68333   3.9   18, 19, 19, 20, 20, 20, 24    0.0606     0.0243
1.68333   3.9   18, 18, 18, 18, 18, 18, 32    0.0356     0.0119
1.68333   3.9   15, 18, 18, 19, 20, 20, 30    0.0412     0.0134
1.68333   3.9   12, 18, 18, 18, 19, 19, 36    0.0261     0.0068
1.68333   3.9   15, 15, 15, 15, 15, 15, 50    0.0100     0.0023
1.68333   3.9   12, 12, 12, 12, 12, 12, 68    0.0017     0.0003
1.55      4.7   21, 21, 21, 21, 22, 22, 12    0.1773     0.1006
1.55      4.7   20, 21, 21, 21, 21, 24, 12    0.1811     0.1040
1.55      4.7   20, 21, 21, 21, 21, 21, 15    0.1445     0.0760
1.55      4.7   20, 20, 20, 21, 21, 23, 15    0.1448     0.0786
1.55      4.7   20, 20, 20, 20, 21, 21, 18    0.1164     0.0572
1.55      4.7   20, 20, 20, 20, 20, 20, 20    0.1020     0.0503
1.55      4.7   18, 19, 19, 20, 20, 20, 24    0.0834     0.0369
1.55      4.7   18, 18, 18, 18, 18, 18, 32    0.0425     0.0159
1.55      4.7   15, 18, 18, 19, 20, 20, 30    0.0463     0.0168
1.55      4.7   12, 18, 18, 18, 19, 19, 36    0.0305     0.0103
1.55      4.7   15, 15, 15, 15, 15, 15, 50    0.0082     0.0021
1.55      4.7   12, 12, 12, 12, 12, 12, 68    0.0013     0.0001

Appendix B: Comparison intervals

The means comparison chart allows you to evaluate the statistical significance of differences among the population means.

Figure 1  The Means Comparison Chart in the Assistant One-Way ANOVA Summary Report

A similar set of intervals appears in the output for the standard one-way ANOVA procedure in Minitab (Stat > ANOVA > One-Way):

[Interval plot of C1, C2, C3, C4 showing individual 95% CIs for the mean; individual standard deviations were used to calculate the intervals.]

However, note that the intervals above are simply individual confidence intervals for the means. When the ANOVA test (either F or Welch) concludes that some means are different, there is a natural tendency to look for intervals that do not overlap and draw conclusions about which means differ. This informal analysis of the individual confidence intervals will often lead to reasonable conclusions, but it does not control for the probability of error the same way the ANOVA test does. Depending on the number of populations, the intervals may be substantially more or less likely than the test to conclude that there are differences. As a result, the two methods can easily reach inconsistent conclusions. The comparison chart is designed to more consistently match the Welch test results when making multiple comparisons, although it is not always possible to achieve complete consistency.

Multiple comparison methods, such as the Tukey-Kramer and Games-Howell comparisons in Minitab (Stat > ANOVA > One-Way), allow you to draw statistically valid conclusions about differences among the individual means. These two methods are pairwise comparison methods, which provide an interval for the difference between each pair of means. The probability that all intervals simultaneously contain the differences they are estimating is at least 1 − α. The Tukey-Kramer method depends on the assumption of equal variances, while the Games-Howell method does not require equal variances. If the null hypothesis of equal means is true, then all the differences are zero, and the probability that any of the Games-Howell intervals will fail to contain zero is at most α. So we can use the intervals to perform a hypothesis test with significance level α. We use Games-Howell intervals as the s
