Multiple Regression Analysis in Minitab

Suppose we are interested in how exercise and body mass index affect blood pressure. A random sample of 10 males, each 50 years of age, is selected, and their height, weight, number of hours of exercise, and blood pressure are measured. Body mass index (BMI) is calculated by the following formula:

$$\text{BMI} = \frac{\text{weight (kg)}}{\left(\text{height (m)}\right)^2}$$

Select Stat > Regression > Regression from the pull-down menu.

Place the variable we would like to predict, blood pressure, in the “Response:” box and the variables we will use for prediction, exercise and body mass index, in the “Predictors:” box. Click OK.

This generates the following Minitab output.

    The regression equation is
    BloodPressure = 74.5 - 2.84 Exercise + 2.71 BMI

    Predictor    Coef      SE Coef    T        P
    Constant     74.5      29.41      2.53     0.039
    Exercise     -2.836    1.861      -1.52    0.171
    BMI          2.7119    0.9144     2.97     0.021

    S = 11.9087    R-Sq = 80.4%    R-Sq(adj) = 74.8%

    Analysis of Variance
    Source        F        P
    Regression    ….38     0.003
    Residual

The interpretation of R² is the same as before: 80.4% of the variation in Y (blood pressure) is explained by the regression model. The fitted regression model found from the output is

(Blood Pressure) = 74.5 - 2.84 * Exercise + 2.71 * BMI.
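The T and R-Sq(adj) columns of the output can be checked by hand. The following is a minimal Python sketch, not part of Minitab, that recomputes them from the printed values; the function names are illustrative, and the constant's coefficient is taken as the rounded 74.5 from the regression equation.

```python
# Values copied from the full-model Minitab output: (Coef, SE Coef).
# Assumption: 74.5 is the rounded constant from the regression equation.
coefficients = {
    "Constant": (74.5, 29.41),
    "Exercise": (-2.836, 1.861),
    "BMI": (2.7119, 0.9144),
}


def t_ratio(coef, se_coef):
    """T column of the output: the estimate divided by its standard error."""
    return coef / se_coef


def adjusted_r_squared(r_squared, n, k):
    """R-Sq(adj) for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


t_values = {name: round(t_ratio(c, se), 2) for name, (c, se) in coefficients.items()}
r_sq_adj = adjusted_r_squared(0.804, n=10, k=2)

print(t_values)
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

The recomputed values should agree with the T column and with R-Sq(adj) = 74.8% in the output, up to rounding.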

The next part of the output is the statistical analysis (ANOVA, analysis of variance) for the regression model. The ANOVA represents a hypothesis test where the null and alternative hypotheses are

$H_0: \beta_i = 0$ for all $i$
$H_A: \beta_i \neq 0$ for at least one coefficient

(In simple regression, $i = 1$.)

In this example, the p-value for this overall test is 0.003, so we conclude that at least one of the independent variables is significantly meaningful in explaining blood pressure.

The individual t-tests can also be performed. In this example the p-values are 0.171 (exercise) and 0.021 (body mass index). Thus, $\beta_1$ is not significantly different from zero when body mass index is in the model, and $\beta_2$ is significantly different from zero when exercise is in the model.

Model assumption checking and prediction intervals can be done in a similar manner as in simple regression analysis. A normal probability plot and a residual plot can be obtained by clicking the “Graphs” button in the “Regression” window, then checking the “Normal plot of residuals” and “Residuals versus fits” boxes. Click OK to exit the graphs window, then click OK again to run the test.
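The overall F statistic is tied to R², so it can be approximated from the summary line of the output. The sketch below is an illustration under two assumptions: R-Sq is used as the rounded 80.4%, so the result only approximates the F value in the ANOVA table, and the critical value F(0.05; 2, 7) = 4.74 is taken from a standard F table rather than from the tutorial.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero."""
    explained_per_df = r_squared / k
    unexplained_per_df = (1 - r_squared) / (n - k - 1)
    return explained_per_df / unexplained_per_df


f_stat = overall_f(0.804, n=10, k=2)

# Assumption: F(0.05; 2, 7) read from a standard F table.
F_CRITICAL = 4.74
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

Rejecting the null hypothesis here agrees with the small p-value (0.003) reported for the overall test.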

Full and Reduced Models

Sometimes in multiple regression analysis it is useful to test whether subsets of coefficients are equal to zero. To do this, a partial F test is used. This answers the question, “Is the full model better than the reduced model at explaining variation in y?” The following hypotheses are considered:

$H_0: \beta_{L+1} = \beta_{L+2} = \dots = \beta_K = 0$
$H_A:$ at least one of $\beta_{L+1}, \dots, \beta_K$ is not zero

where L represents the number of variables in the reduced model and K represents the number of variables in the full model. Rejecting $H_0$ means the full model is preferred over the reduced model, whereas not rejecting $H_0$ means the reduced model is preferred.

The partial F statistic is used to test these hypotheses and is given by

$$F = \frac{(SSE_R - SSE_F)/(K - L)}{SSE_F/(n - K - 1)}$$

where $SSE_R$ and $SSE_F$ are the error (residual) sums of squares of the reduced and full models. Note that the denominator is the MSE from the full model’s output.

The decision rule for the partial F test is:

Reject $H_0$ if F > F(α; K-L, n-K-1)
Fail to reject $H_0$ if F ≤ F(α; K-L, n-K-1)

In our example above we might compare the full model, with exercise and BMI predicting blood pressure, to a reduced model with only BMI predicting blood pressure, at the α = 0.05 level. To calculate the partial F we run the regression for the reduced model by clicking Stat > Regression > Regression, putting BloodPressure in as the “Response:” variable and BMI in as the “Predictors:” variable, and clicking “OK.” The reduced model output reads:

    The regression equation is
    BloodPressure = 41.0 + 3.61 BMI

    Predictor    Coef      SE Coef    T       P
    Constant     40.98     21.07      1.94    0.088
    BMI          3.6060    0.7569     4.76    0.001

    S = 12.8545    R-Sq = 73.9%    R-Sq(adj) = 70.7%

    Analysis of Variance
    Source        F        P
    Regression    …2.70    0.001
    Residual

To calculate the partial F we will use the output for the full model found on page 1 and the reduced model output above.
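The partial F calculation can be sketched in Python. This is an illustration under stated assumptions, not the tutorial's own worksheet: the error sums of squares are recovered from the S values printed in the two outputs (S² times the error degrees of freedom) rather than read from the ANOVA tables, and the function names are made up for this example.

```python
def sse_from_s(s, n, num_predictors):
    """Recover SSE from S, since S = sqrt(SSE / (n - num_predictors - 1))."""
    return s ** 2 * (n - num_predictors - 1)


def partial_f(sse_reduced, sse_full, n, k, l):
    """Partial F statistic for a full model (k predictors) vs. a reduced model (l predictors)."""
    numerator = (sse_reduced - sse_full) / (k - l)
    mse_full = sse_full / (n - k - 1)  # denominator: MSE of the full model
    return numerator / mse_full


N, K, L = 10, 2, 1
sse_full = sse_from_s(11.9087, N, K)     # S from the full-model output
sse_reduced = sse_from_s(12.8545, N, L)  # S from the reduced-model output
f_partial = partial_f(sse_reduced, sse_full, N, K, L)

F_CRITICAL = 5.59  # F(0.05; 1, 7) from an F table
reject_h0 = f_partial > F_CRITICAL

print(f"partial F = {f_partial:.2f}, reject H0: {reject_h0}")
```

Working from S avoids retyping the sums of squares, at the cost of a small rounding error in the recovered SSE values.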

Substituting the error sums of squares of the reduced model ($SSE_R$) and the full model ($SSE_F$), with K = 2, L = 1, and n = 10:

$$F = \frac{(SSE_R - SSE_F)/(2 - 1)}{SSE_F/(10 - 2 - 1)} = 2.32$$

Then use an F-table to look up the value F(α; K-L, n-K-1) = F(0.05; 1, 7) = 5.59. According to our decision rule, 2.32 < 5.59, so we fail to reject $H_0$. At the α = 0.05 level we do not have evidence that adding exercise improves on the reduced model, so the reduced model (BMI only) is preferred for explaining blood pressure.

Calculating Confidence Intervals and Prediction Intervals

Calculating a CI and a PI for multiple regression is similar to simple linear regression. For multiple regression you create the intervals for your model based on the values of the predictor variables. Consider the full model from earlier in this tutorial. We can obtain the CI and PI for 6 hours of exercise and a BMI of 20.1 by entering these values in the Options window, reached by clicking Stat > Regression > Regression > Options.

Then press “OK” and “OK” to run the regression analysis. The output will now include:

    Predicted Values for New Observations
    New Obs    Fit       SE Fit    95% CI              95% PI
    1          111.98    6.38      (96.88, 127.08)     (80.03, 143.93)

    Values of Predictors for New Observations
    New Obs    Exercise    BMI
    1          6.00        20.1

The 95% CI for this combination is (96.88, 127.08) and the 95% PI is (80.03, 143.93). The values entered are listed at the bottom of the output so you can confirm that each variable was entered correctly and not accidentally switched or mistyped.
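The two intervals differ only in their standard errors: the CI uses SE Fit, while the PI adds the residual variance S². A minimal sketch of that relationship follows, assuming the critical value t(0.025; 7) = 2.365 from a standard t table (it is not printed in the Minitab output) and using the rounded Fit, SE Fit, and S values shown above, so the last digit can differ slightly from Minitab's.

```python
import math

FIT = 111.98      # Fit from the output
SE_FIT = 6.38     # SE Fit from the output
S = 11.9087       # S from the full-model output
T_CRITICAL = 2.365  # assumption: t(0.025; 7) from a t table, 7 = n - K - 1


def confidence_interval(fit, se_fit, t_crit):
    """Interval for the mean response at the given predictor values."""
    margin = t_crit * se_fit
    return fit - margin, fit + margin


def prediction_interval(fit, se_fit, s, t_crit):
    """Interval for a single new observation; adds residual variance s**2."""
    margin = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return fit - margin, fit + margin


ci = confidence_interval(FIT, SE_FIT, T_CRITICAL)
pi = prediction_interval(FIT, SE_FIT, S, T_CRITICAL)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The PI is always wider than the CI because it accounts for the scatter of an individual observation around the mean response.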

Transformation of Variables

It is not always obvious what to do when your model does not fit well. Transformations may be the easiest way to produce a better fit, especially when collecting more data is not feasible. Options to try include polynomial regression, inverse transformation, log transformation of the explanatory variable, log transformation of both the dependent and explanatory variables, and many more. This tutorial will look at creating an inverse transformation for the model and storing it in your Minitab sheet.

Click Calc > Calculator and enter your information in the appropriate spaces in the window that pops up.

Choose a column without data stored in it to store your inverse transformation. In the expression blank, enter the appropriate algebra and functions for your transformation. Then press “OK” and your transformation will appear in your Minitab sheet. Use this variable instead of the original variable in your regression to see if the model becomes a better fit.

A Potential Problem with Multiple Regression

When explanatory variables are correlated with one another, the problem of multicollinearity, or near-linear dependence among regression variables, is said to exist. When a high degree of multicollinearity exists, the variances of the regression coefficients are inflated. This can lead to small t-values and unstable regression coefficients. Multicollinearity does not affect the ability to obtain a good fit to the regression (R²) or the quality of forecasts or predictions from the regression.

The way to determine if this is a problem with your model is to look at the Variance Inflation Factors (VIFs). The equation for the VIF for the jth regression coefficient can be written as

$$VIF_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the coefficient of multiple determination obtained by performing the regression of $x_j$ on the remaining K-1 regressor variables. Any individual VIF larger than 10 should indicate that multicollinearity is present.

To check the VIFs in Minitab, click Stat > Regression > Regression from the drop-down menu. Next click the Options button. Then check the “Variance inflation factors” box under Display and click OK. Then click OK again to run the test.

The output will look identical to the output obtained before, except that the table of coefficients will contain an additional column of VIFs:

    Predictor    Coef      VIF
    Exercise     -2.836    1.701
    BMI          2.7119    1.701

Seeing that both of our VIFs are below 10, we assume multicollinearity is not present in our model.

Correcting Multicollinearity

In order to correct for multicollinearity you need to remove the variables that are highly correlated with others. You can also try to add more data; this might break the pattern of multicollinearity.

There are drawbacks to these solutions. If you remove a variable you will obtain no information on the removed variable, so choosing which correlated variable to remove can be difficult. If you add more data the multicollinearity won’t always disappear, and sometimes it is impossible to add more data (due to budget constraints or lack of data beyond what is known).

If a regression model is used strictly for forecasting, corrections may not be necessary.
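Returning to the VIF formula, the link between a VIF and the underlying R² can be sketched in a few lines of Python. The function names are illustrative only; the value 1.701 is the VIF reported for both predictors in the output above, and the threshold of 10 is the rule of thumb stated in this tutorial.

```python
def vif(r_squared_j):
    """VIF for predictor j, given R^2 from regressing x_j on the other predictors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R^2 must be in [0, 1) for a finite VIF")
    return 1 / (1 - r_squared_j)


def r_squared_from_vif(vif_value):
    """Invert the VIF formula to recover the implied R^2."""
    return 1 - 1 / vif_value


VIF_THRESHOLD = 10  # rule of thumb used in this tutorial

r2_implied = r_squared_from_vif(1.701)
flagged = 1.701 > VIF_THRESHOLD

print(f"implied R^2 between Exercise and BMI: {r2_implied:.3f}")
print(f"multicollinearity flagged: {flagged}")
print(f"R^2 needed to reach the threshold: {r_squared_from_vif(VIF_THRESHOLD):.2f}")
```

With only two predictors, each one is regressed on the other, which is why both VIFs in the output are equal.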
