Transcription

Partial Least Squares (PLS): Path Modeling. Method Talk, Winter Term 2015/16. Pascal Stichler

Outline
1. Introduction to PLS
2. Putting PLS in Context
3. Model Definition
4. Solution Algorithm
5. Model Evaluation
6. Wrap-Up

Today's Lecture: Objectives
1. Evaluate when to use PLS
2. Learn how PLS works and how to use it
3. Investigate how to evaluate a PLS model, interpret the results and adjust the model accordingly

Part 1: Introduction to PLS

PLS: A silver bullet?
Partial Least Squares Path Modeling is a statistical data analysis methodology that exists at the intersection of regression models, structural equation models, and multiple table analysis methods [9].
Goal: use theoretical knowledge about the structure of latent variables to predict indicators based on data
- Doing so with the least possible distribution assumptions
- PLS-PM is known under several names: PLS-PM, PLS-SEM, component-based structural equation modeling, projection to latent structures, soft modeling, etc.
- Developed by Herman Wold in the mid 1960s under the term "soft modeling" [14]
- After its initial introduction and discussion it received little attention until the late 1990s; since then, interest has risen sharply

Why use PLS?
PLS-PM is worth considering when ...
Structural model
- ... you have a theoretical model that involves latent variables
- ... the phenomenon you investigate is relatively new and measurement models need to be newly developed
- ... the structural equation model is complex, with a large number of latent variables and indicator variables [12]
Observed variables
- ... you have small sample sets (e.g. more variables than observations) [7]
- ... you have non-normally distributed data
- ... you have multicollinearity problems
- ... you have formative and reflective measures (to be discussed)
- ... you need minimum requirements regarding measurement scales (e.g. ratio and nominal variables)
- ... you need minimum requirements regarding the distribution of residuals [1]

Part 2: Putting PLS in Context

General Overview
Types of PLS:
- PLS Path Modeling: component-based modeling based on a theoretical structure model. Mainly used in: social sciences, econometrics, marketing and strategic management
- PLS Regression: regression-based approach investigating the linear relationship between multiple independent variables and dependent variable(s). Mainly used in: chemometrics, bioinformatics, sensometrics, neuroscience and anthropology
- OPLS: orthogonal projection improves interpretability
- PLS-DA: used when X_r is categorical
- CB-SEM: covariance-based structural equation modelling
Notation (from the slide's diagram): predictors X_p and responses X_r, with X_p ∪ X_r = X; exogenous latent variables LV_p and endogenous latent variables LV_r, with LV_p ∪ LV_r = LV

PLS-PM vs. CB-SEM
Both methods differ from a statistical point of view. Hence, neither of the techniques is generally superior to the other and neither of them is appropriate for all situations. In general, the strengths of PLS-SEM are CB-SEM's weaknesses, and vice versa. [3]

PLS-PM (PLS-SEM), variance-based; use when:
- The goal is prediction and theory development
- Formatively measured constructs are part of the structural model
- The structural model is complex
- The sample size is small and/or the data are non-normally distributed
- The plan is to use latent variable scores in subsequent analyses
- Available software: SmartPLS, PLSGraph, R packages (plspm), etc.

CB-SEM, covariance-based; use when:
- The goal is theory testing, theory confirmation, or the comparison of alternative theories
- Error terms require additional specification, such as the covariation
- The structural model has non-recursive relationships
- The research requires a global goodness-of-fit criterion
- Available software: LISREL, AMOS, EQS, etc.

Based on [8], [4], [11]

Comparison
[Figure: a continuum from theory testing to prediction; CB-SEM sits toward theory testing, PLS-PM toward prediction]

Part 3: Model Definition

Exemplary Model
[Figure: example path model with three latent variables; the measurement models (outer model) link each LV to its indicators, the structural model (inner model) links the LVs to each other]
Formal definition:
- X: data set with n observations and m variables
- X can be divided into J exclusive blocks of K variables each (X_{1,1}, ..., X_{J,K}, etc.)
- Each block X_j is associated with a latent variable LV_j; the estimate of LV_j (its "score") is denoted by Y_j
- LV_1 and LV_3: reflective blocks; LV_2: formative block [9]

Structural Model (Inner Model)
1. Linear relationship: all relationships are considered linear and can be written as
   LV_j = β_0j + Σ_{i→j} β_ji · LV_i + ε_j
   The coefficients β_ji represent the path coefficients.
2. Recursive model mandatory: the causality flow must be unidirectional (no loops).
3. Regression specification (predictor specification):
   E(LV_j | LV_i) = β_0j + Σ_{i→j} β_ji · LV_i
   i.e. the regression has to be linear, under the assumptions cov(LV_i, ε_j) = 0 and E(ε_j) = 0.
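As an illustration (not from the slides), such a recursive inner model can be written down in R as a lower-triangular path matrix, which is also the form the plspm package expects; the three latent variable names below are hypothetical:

```r
# Hypothetical inner model with LV1 -> LV2, LV1 -> LV3 and LV2 -> LV3.
# A 1 in row j, column i means "LV_i predicts LV_j"; keeping the matrix
# lower-triangular enforces the unidirectional causality flow (no loops).
LV1 <- c(0, 0, 0)
LV2 <- c(1, 0, 0)
LV3 <- c(1, 1, 0)
inner_path <- rbind(LV1, LV2, LV3)
colnames(inner_path) <- rownames(inner_path)
inner_path
```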

Measurement Model (Outer Model)

Reflective indicators:
- Linear relationship: X_jk = λ_0jk + λ_jk · LV_j + ε_jk (λ_jk is called the loading)
- Regression specification: E(X_jk | LV_j) = λ_0jk + λ_jk · LV_j
- Characteristics: unidimensional, correlated, X_jk "fully relevant"

Formative indicators:
- Linear relationship: LV_j = λ_0j + Σ_k λ_jk · X_jk + ε_j
- Regression specification: E(LV_j | X_jk) = λ_0j + Σ_k λ_jk · X_jk
- Characteristics: multidimensional, uncorrelated, X_jk "partly relevant"

MIMIC* indicators:
- Linear relationship and regression specification: equivalent to reflective and formative (depending on the indicator)
- Not possible in the R package plspm

*MIMIC: multiple effect indicators for multiple causes
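To make the two directions concrete, here is a minimal simulated sketch (all names, loadings and weights are invented for illustration): in a reflective block the indicators are generated from the latent variable, while in a formative block the latent variable is built from the indicators.

```r
# Reflective block: indicators X_jk = lambda_jk * LV_j + noise
set.seed(1)
n  <- 200
lv <- rnorm(n)                                    # latent variable
x_reflective <- sapply(c(0.9, 0.8, 0.7),          # illustrative loadings lambda_jk
                       function(l) l * lv + rnorm(n, sd = 0.4))

# Formative block: the indicators exist first, LV_j = sum_k lambda_jk * X_jk + noise
x_formative  <- matrix(rnorm(n * 3), ncol = 3)
lv_formative <- drop(x_formative %*% c(0.5, 0.3, 0.2)) + rnorm(n, sd = 0.1)
```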

Weight Relations (Scores)
- The latent variables are only virtual entities
- However, as all linear relations depend on the latent variables, they need a representation: the weight relations
  Score: Y_j = Σ_k w_jk · X_jk
- The score, as a representation of the latent variable, is calculated as a weighted sum of its indicators (similar to the approach in principal component analysis)
- Because of this, PLS is called a component-based approach
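A minimal sketch of a weight relation, assuming a single standardized block with three indicators and arbitrarily chosen weights:

```r
# Score Y_j = sum_k w_jk * X_jk, computed on standardized indicators and rescaled,
# analogous to building a principal component as a weighted sum of variables.
set.seed(2)
X_block <- scale(matrix(rnorm(150 * 3), ncol = 3))   # one block, 3 indicators
w       <- c(0.45, 0.35, 0.20)                       # illustrative outer weights w_jk
Y       <- drop(scale(X_block %*% w))                # the latent variable score
```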

Part 4: Solution Algorithm

PLS-PM Algorithm Overview
Stage 1: Get the weights to compute the latent variable scores (most important and most difficult stage)
Stage 2: Estimate the path coefficients (inner model), usually done via OLS
Stage 3: Obtain the loadings (outer model) by calculating correlations

Stage 1: Latent Variable Scores
Start: initial arbitrary outer weights (e.g. w_jk = 1)
Step 1: compute the external approximation of the latent variables: Y_j = Σ_k w_jk · X_jk
Step 2: obtain the inner weights e_ij (inner weighting schemes: centroid scheme, factor scheme, path scheme)
Step 3: compute the internal approximation of the latent variables: Z_j = Σ_{i↔j} e_ij · Y_i (sum over the LV_i linked to LV_j in the inner model)
Step 4: calculate new outer weights w_jk via Mode A (simple regression), Mode B (multiple regression), or (Mode C:) a combination
Repeat Steps 1-4 and check for convergence of the outer weights.
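The loop above can be sketched in a few lines of R. The following toy version (simulated data, centroid scheme, Mode A only, arbitrary tolerance) illustrates the iteration and is not meant to reproduce plspm's internals:

```r
# Toy Stage-1 iteration: centroid scheme, Mode A, two connected latent variables.
set.seed(42)
n   <- 200
eta <- rnorm(n)                                         # common source so the blocks correlate
X1  <- scale(sapply(1:3, function(k) eta + rnorm(n)))   # indicators of LV1
X2  <- scale(sapply(1:2, function(k) eta + rnorm(n)))   # indicators of LV2
blocks    <- list(X1, X2)
adjacency <- rbind(c(0, 1),                             # LV1 and LV2 are linked in the inner model
                   c(1, 0))

weights <- lapply(blocks, function(X) rep(1, ncol(X)))  # Start: arbitrary outer weights
for (iter in 1:100) {
  # Step 1: external approximation Y_j = sum_k w_jk * X_jk (standardized)
  Y <- sapply(seq_along(blocks), function(j) scale(blocks[[j]] %*% weights[[j]]))
  # Step 2: inner weights, centroid scheme: e_ij = sign(cor(Y_i, Y_j)) for linked LVs
  E <- sign(cor(Y)) * adjacency
  # Step 3: internal approximation Z_j = sum_i e_ij * Y_i (standardized)
  Z <- sapply(seq_len(ncol(Y)), function(j) scale(Y %*% E[, j]))
  # Step 4: new outer weights, Mode A = simple regression of each X_jk on Z_j
  new_weights <- lapply(seq_along(blocks), function(j) drop(cor(blocks[[j]], Z[, j])))
  # Convergence check on the change in the outer weights
  if (max(abs(unlist(new_weights) - unlist(weights))) < 1e-6) break
  weights <- new_weights
}
weights   # converged outer weights, used to compute the final scores Y_j
```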

Stages 2 & 3
Stage 2: Path coefficients
The path coefficient estimates β̂_ji are usually calculated using ordinary least squares in the multiple regression of Y_j on the Y_i's related to it:
   Y_j = Σ_{i→j} β̂_ji · Y_i
In case of high multicollinearity, PLS regression can also be applied [11]
Stage 3: Loadings
For convenience and simplicity, loadings are preferably calculated as correlations between a latent variable and its indicators:
   λ̂_jk = cor(X_jk, Y_j)
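A minimal sketch of both stages on simulated scores (LV1 and LV2 predicting LV3; all values invented for illustration):

```r
# Simulated scores: Y3 depends on Y1 and Y2; X3 is the indicator block of LV3.
set.seed(7)
n  <- 200
Y1 <- drop(scale(rnorm(n)))
Y2 <- drop(scale(rnorm(n)))
Y3 <- drop(scale(0.5 * Y1 + 0.3 * Y2 + rnorm(n, sd = 0.5)))
X3 <- sapply(c(0.9, 0.8), function(l) l * Y3 + rnorm(n, sd = 0.4))

path_coefs <- coef(lm(Y3 ~ Y1 + Y2))   # Stage 2: beta_31, beta_32 via ordinary least squares
loadings3  <- drop(cor(X3, Y3))        # Stage 3: lambda_3k = cor(X_3k, Y_3)
```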

PLS-PM usage in R (package plspm)

Parameters to define the PLS path model:
- Data: data for the model
- path_matrix: definition of the inner model
- blocks: list defining the blocks of variables of the outer model
- scaling: list defining the measurement scale of variables for non-metric data
- modes: vector defining the measurement mode of each block

Parameters related to the PLS-PM algorithm:
- scheme: inner path weighting scheme
- scaled: indicates whether the data should be standardized
- tol: tolerance threshold for checking convergence of the iterative stages
- maxiter: maximum number of iterations
- plscomp: number of PLS components when handling non-metric data

Additional parameters:
- boot.val: indicates whether bootstrap validation must be performed
- br: number of bootstrap resamples
- dataset: indicates whether the data matrix should be retrieved
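For orientation, here is a sketch of a complete call based on the customer-satisfaction example that ships with plspm (path matrix and block column indices as given in the package documentation; adapt them to your own data):

```r
library(plspm)
data(satisfaction)

# Inner model: lower-triangular path matrix (rows/columns = latent variables)
IMAG <- c(0, 0, 0, 0, 0, 0)
EXPE <- c(1, 0, 0, 0, 0, 0)
QUAL <- c(0, 1, 0, 0, 0, 0)
VAL  <- c(0, 1, 1, 0, 0, 0)
SAT  <- c(1, 1, 1, 1, 0, 0)
LOY  <- c(1, 0, 0, 0, 1, 0)
sat_path <- rbind(IMAG, EXPE, QUAL, VAL, SAT, LOY)
colnames(sat_path) <- rownames(sat_path)

# Outer model: which data columns belong to which block, and their measurement mode
sat_blocks <- list(1:5, 6:10, 11:15, 16:19, 20:23, 24:27)
sat_modes  <- rep("A", 6)            # all blocks treated as reflective here

sat_pls <- plspm(satisfaction, sat_path, sat_blocks, modes = sat_modes,
                 scheme = "centroid", boot.val = TRUE, br = 200)
summary(sat_pls)
```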

Part 5: Model Evaluation

Interpreting the Results
In PLS the real challenge is interpreting the results and making well-founded adjustments to the model ([9], p. 54)
Steps of model assessment:
1. Assessment of the measurement model (outer model)
2. Assessment of the structural model (inner model)
(It is important to keep this order due to model dependencies)

1. Measurement Model Assessment (Outer Model)
- Formative blocks: evaluation relatively straightforward
- Reflective blocks: evaluation rather complex

Formative blocks: variables are considered as causing the latent variable
- They do not necessarily measure the same underlying construct
- Not supposed to be correlated
- Compare outer weights to check which indicator contributes most efficiently
- Elimination of variables should be based on multicollinearity
→ Test the theory applied

Reflective blocks: variables are considered as measuring the same underlying construct
- Hence they need a strong mutual association
- Further, they should be strongly related to their latent variable
Checks: 1. unidimensionality of indicators, 2. indicators well explained, 3. constructs differ from each other

Deep Dive: Reflective Indicators
1. Unidimensionality of indicators: all for one and one for all
(a) Cronbach's alpha: measures the average inter-variable correlation (considered good if > 0.7)
(b) Dillon-Goldstein's rho: focuses on the variance of the sum of variables; considered a better indicator than Cronbach's alpha ([1], p. 320) (considered good if > 0.7; see [11], [13] p. 50 for a formal definition)
(c) First eigenvalue: the first eigenvalue of the indicator correlation matrix should be larger than one and the second one significantly smaller (preferably smaller than 1)
2. Loadings & communalities: indicators well explained
- Loadings are considered for each indicator (considered good if > 0.7)
- Communalities (squared loadings): amount of indicator variance explained by its corresponding LV
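Assuming a model has been fitted as in the earlier plspm sketch (object sat_pls), these diagnostics can be read off the result roughly as follows; the element and column names are taken from the plspm documentation:

```r
# Requires the sat_pls object from the earlier plspm() example.
sat_pls$unidim        # Cronbach's alpha, Dillon-Goldstein's rho, first and second eigenvalue per block
sat_pls$outer_model   # outer weights, loadings and communalities per indicator
subset(sat_pls$outer_model, loading < 0.7)   # indicators whose loadings look too weak
```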

Deep Dive: Reflective Indicators (continued)
3. Cross-loadings: constructs differ from each other
Cross-loadings = loadings of an indicator with the rest of the latent variables
Goal: ensure that the shared variance between a construct and its indicators is higher than with other constructs (no "traitor" indicators)
→ Loadings should always be highest for the respective block
[Table of loadings and cross-loadings omitted]
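Continuing the same fitted example, the cross-loadings table can be inspected directly; for each indicator, the entry for its own block should be the largest in its row (element name as documented for plspm):

```r
# Requires the sat_pls object from the earlier plspm() example.
sat_pls$crossloadings   # one row per indicator, one column per latent variable
```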

2. Structural Model Assessment (Inner Model)
Standard OLS regression output, plus three further indicators of model quality:
- R² (determination coefficient): amount of variance of an endogenous LV explained by its independent LVs (considered low below 0.3 and high above 0.6)
- Redundancy index: amount of variance in the endogenous block that is explained by its independent LVs (defined as Rd(LV_j, X_jk) = loading_jk² · R_j²)
- Goodness-of-Fit (GoF): no single criterion exists for the overall quality of a model; GoF serves as a pseudo criterion:
  GoF = √(average communality × average R²) (considered good if > 0.7) [10], [11]
- Validation: resampling (bootstrapping, jackknifing) is possible; more traditional approaches are not (as there are no assumptions made on the distribution)
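The inner-model indicators can be read from the same fitted object, and the GoF pseudo-criterion can be recomputed by hand from its definition (a sketch; the manual value may differ slightly from the one plspm reports, depending on how blocks are averaged):

```r
# Requires the sat_pls object from the earlier plspm() example.
sat_pls$inner_summary   # per-LV R^2, block communality and redundancy
sat_pls$gof             # goodness-of-fit as reported by plspm

r2   <- sat_pls$inner_summary$R2
comm <- sat_pls$outer_model$communality
sqrt(mean(comm) * mean(r2[r2 > 0]))   # GoF = sqrt(avg. communality * avg. R^2 of endogenous LVs)
```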

Part 6: Wrap-Up

Summary: PLS
Advisable for the following conditions (based on [8]):
- Focus: prediction and theory development
- Distribution: minimum assumptions made regarding indicator distribution
- Sample size: small sample sizes possible (however questioned in the literature [2], [6], [5])

Model definition:
- Indicators: define blocks of variables and respective latent variables
- Measurement model: define relations (formative/reflective)
- Structural model: define the internal model

Interpreting the results:
- Measurement model (formative): eliminate multicollinearity
- Measurement model (reflective): unidimensionality, loadings & communalities and cross-loadings
- Structural model: consider R², redundancy index and GoF
- Validation: apply resampling (bootstrapping, jackknifing)

Bibliography I
[1] W. W. Chin. "The partial least squares approach to structural equation modeling." In: Modern Methods for Business Research, Vol. 295, No. 2 (1998), pp. 295–336.
[2] D. Goodhue, W. Lewis, and R. Thompson. "PLS, small sample size, and statistical power in MIS research." In: Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06). Vol. 8. IEEE, 2006, pp. 202b–202b.
[3] J. F. Hair Jr. et al. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Sage Publications, 2013.
[4] J. F. Hair, C. M. Ringle, and M. Sarstedt. "PLS-SEM: Indeed a silver bullet." In: Journal of Marketing Theory and Practice, Vol. 19, No. 2 (2011), pp. 139–152.
[5] G. A. Marcoulides, W. W. Chin, and C. Saunders. "A critical look at partial least squares modeling." In: MIS Quarterly (2009), pp. 171–175.

Bibliography II
[6] G. A. Marcoulides and C. Saunders. "Editor's comments: PLS: a silver bullet?" In: MIS Quarterly, Vol. 30, No. 2 (2006), pp. iii–ix.
[7] B.-H. Mevik and R. Wehrens. "The pls package: principal component and partial least squares regression in R." In: Journal of Statistical Software, Vol. 18, No. 2 (2007), pp. 1–24.
[8] W. Reinartz, M. Haenlein, and J. Henseler. "An empirical comparison of the efficacy of covariance-based and variance-based SEM." In: International Journal of Research in Marketing, Vol. 26, No. 4 (2009), pp. 332–344.
[9] G. Sanchez. PLS Path Modeling with R. Online, January 2013.
[10] M. Tenenhaus, S. Amato, and V. Esposito Vinzi. "A global goodness-of-fit index for PLS structural equation modelling." In: Proceedings of the XLII SIS Scientific Meeting. Vol. 1. CLEUP Padova, 2004, pp. 739–742.

Bibliography III
[11] M. Tenenhaus et al. "PLS path modeling." In: Computational Statistics & Data Analysis, Vol. 48, No. 1 (2005), pp. 159–205.
[12] N. Urbach and F. Ahlemann. "Structural equation modeling in information systems research using partial least squares." In: Journal of Information Technology Theory and Application, Vol. 11, No. 2 (2010), pp. 5–40.
[13] V. E. Vinzi, L. Trinchera, and S. Amato. "PLS path modeling: from foundations to recent developments and open issues for model assessment and improvement." In: Handbook of Partial Least Squares. Springer, 2010, pp. 47–82.
[14] H. Wold et al. "Estimation of principal components and related models by iterative least squares." In: Multivariate Analysis, Vol. 1 (1966), pp. 391–420.
