
Getting Started in Factor Analysis (v. 1.5)
Oscar [email protected]
2008
http://www.princeton.edu/~otorres/

Factor analysis: intro

Factor analysis is used mostly for data reduction purposes:

– To get a small set of variables (preferably uncorrelated) from a large set of variables (most of which are correlated to each other).
– To create indexes with variables that measure similar things (conceptually).

Two types of factor analysis: exploratory and confirmatory.

Exploratory: It is exploratory when you do not have a pre-defined idea of the structure or how many dimensions are in a set of variables.

Confirmatory: It is confirmatory when you want to test a specific hypothesis about the structure or the number of dimensions underlying a set of variables (e.g. in your data you may think there are two dimensions and you want to verify that).

Factor analysis: step 1

To run factor analysis use the command factor (type help factor for more details).

The callouts on the command mark the variables and the factoring method (principal-components factoring). The remaining callouts annotate the output:

– Eigenvalue: total variance accounted for by each factor. The sum of all eigenvalues = total number of variables. When negative eigenvalues occur, the sum of eigenvalues = total number of factors (variables) with positive eigenvalues. The Kaiser criterion suggests retaining those factors with eigenvalues equal to or higher than 1.
– Difference: the difference between one eigenvalue and the next.
– Proportion: since the sum of eigenvalues = total number of variables, Proportion indicates the relative weight of each factor in the total variance. For example, 1.54525/5 = 0.3090. The first factor explains 30.9% of the total variance.
– Cumulative: shows the amount of variance explained by factor n together with the factors before it. For example, factor 1 and factor 2 account for 57.55% of the total variance.
– Factor loadings: the weights and correlations between each variable and the factor. The higher the loading, the more relevant the variable is in defining the factor's dimensionality. A negative value indicates an inverse impact on the factor. Here, two factors are retained because both have eigenvalues over 1. It seems that 'owner' and 'competition' define factor1, and 'equality', 'respon' and 'ideol' define factor2.
– Uniqueness: the variance that is 'unique' to the variable and not shared with other variables. It is equal to 1 - communality (communality is the variance that is shared with other variables). For example, 61.57% of the variance in 'ideol' is not shared with other variables in the overall factor model. In contrast, 'owner' has a low share of variance not accounted for by other variables (28.61%). Notice that the greater the 'uniqueness', the lower the relevance of the variable in the factor model.
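The arithmetic behind the eigenvalue table and the uniqueness column can be sketched in a few lines of Python. This is only an illustration, not Stata output. The first eigenvalue (1.54525) is quoted in the text and the second is backed out of the 57.55% cumulative figure. The last three are made-up values chosen so that the five sum to 5. The loadings passed to uniqueness() are hypothetical as well, and that formula assumes uncorrelated (orthogonal) factors.

```python
# Eigenvalues for a 5-variable model. Only the first value is quoted in the
# text; the second is derived from the 57.55% cumulative figure and the last
# three are invented for illustration so that the total is 5.
eigenvalues = [1.54525, 1.33225, 0.85, 0.70, 0.5725]
n_variables = len(eigenvalues)

# Proportion: each eigenvalue divided by the number of variables.
proportion = [ev / n_variables for ev in eigenvalues]

# Cumulative: running total of the proportions.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

# Difference: gap between one eigenvalue and the next.
difference = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

# Kaiser criterion: keep factors whose eigenvalue is at least 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev >= 1.0]


def uniqueness(loadings):
    """Return 1 - communality, with communality = sum of squared loadings.

    `loadings` holds one variable's loadings on the retained factors.
    Valid for orthogonal factors only (an assumption of this sketch).
    """
    return 1.0 - sum(x * x for x in loadings)


print(f"proportion of factor 1: {proportion[0]:.5f}")
print(f"cumulative after factor 2: {cumulative[1]:.4f}")
print(f"factors retained by the Kaiser criterion: {retained}")
# Hypothetical loadings for one variable on two factors.
print(f"uniqueness for loadings (0.6, 0.2): {uniqueness([0.6, 0.2]):.2f}")
```

Dividing by the number of variables rather than by the computed sum mirrors the 1.54525/5 example in the text. The two are the same whenever all eigenvalues are reported.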

Factor analysis: step 2 (final solution)

After running factor you need to rotate the factor loadings to get a clearer pattern. Just type rotate to get a final solution.

By default the rotation is varimax, which produces orthogonal factors. This means that factors are not correlated with each other. This setting is recommended when you want to identify variables to create indexes or new variables without inter-correlated components.

The callouts annotate the output:

– Variance table: same description as in the previous slide, with a new composition between the two factors. Both factors still explain 57.55% of the total variance observed.
– Pattern matrix: offers a clearer picture of the relevance of each variable in the factor. Factor1 is mostly defined by 'owner' and 'competition', and factor2 by 'equality', 'respon' and 'ideol'.
– Factor rotation matrix: this is a conversion matrix to estimate the rotated factor loadings (RFL):

    RFL = Factor loadings * Factor rotation

NOTE: If you want the factors to be correlated (oblique rotation) you need to use the option promax after rotate:

    rotate, promax

Type help rotate for details. See http://www.ats.ucla.edu/stat/stata/output/fa_output.htm for more info.

Thank you to Jeannie-Marie S. Leoutsakos for useful feedback.
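The relation RFL = Factor loadings * Factor rotation is an ordinary matrix product, which the following Python sketch illustrates. The loadings and the 40-degree angle are hypothetical, chosen only to show the mechanics. Varimax selects its own rotation, which this sketch does not attempt to reproduce. The check at the end shows why an orthogonal rotation leaves the total variance explained unchanged: each variable's communality is the same before and after.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def communalities(loadings):
    """Sum of squared loadings per variable (row)."""
    return [sum(x * x for x in row) for row in loadings]


# Hypothetical unrotated loadings: 5 variables (rows) x 2 factors (columns).
loadings = [
    [0.60, 0.55],
    [0.65, 0.50],
    [0.50, -0.45],
    [0.55, -0.50],
    [0.40, -0.35],
]

# An orthogonal 2 x 2 rotation matrix. The angle is arbitrary here.
angle = math.radians(40)
rotation = [
    [math.cos(angle), -math.sin(angle)],
    [math.sin(angle), math.cos(angle)],
]

# Rotated factor loadings: RFL = loadings * rotation.
rotated = matmul(loadings, rotation)

for before, after in zip(loadings, rotated):
    print(before, "->", [round(x, 3) for x in after])

# Communalities are unchanged by an orthogonal rotation.
print([round(c, 4) for c in communalities(loadings)])
print([round(c, 4) for c in communalities(rotated)])
```

After rotation each hypothetical variable loads mainly on one factor, which is the "clearer pattern" the slide refers to.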

Factor analysis: step 3 (predict)

To create the new variables, after factor and rotate, type predict:

    predict factor1 factor2 /*or whatever name you prefer to identify the factors*/

The output shows the regression coefficients used to estimate the individual scores (per case/row).

Another option (called naïve by some) is to create indexes out of each cluster of variables. For example, 'owner' and 'competition' define one factor. You could aggregate these two to create a new variable to measure 'market oriented attitudes'. On the other hand, you could aggregate 'ideol', 'equality' and 'respon' to create an index to measure 'egalitarian attitudes'. Since all variables have the same valence (liberal for small values, capitalist for larger values), we can create the two new variables as

    gen market = (owner + competition)/2
    gen egalitarian = (ideol + equality + respon)/3
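For readers who want to see the naïve index computed outside Stata, here is a minimal Python sketch of the same two averages. The variable names come from the text. The two respondents and their values are invented for illustration, and naive_indexes is a hypothetical helper, not part of any library.

```python
def naive_indexes(row):
    """Average the variables in each cluster for one respondent (one row)."""
    market = (row["owner"] + row["competition"]) / 2
    egalitarian = (row["ideol"] + row["equality"] + row["respon"]) / 3
    return {"market": market, "egalitarian": egalitarian}


# Hypothetical respondents; the values are made up for this example.
respondents = [
    {"owner": 2, "competition": 4, "ideol": 3, "equality": 6, "respon": 9},
    {"owner": 8, "competition": 7, "ideol": 5, "equality": 5, "respon": 5},
]

indexes = [naive_indexes(r) for r in respondents]
for r in indexes:
    print(r)
```

A simple average like this weights every variable in a cluster equally, whereas predict weights them by the scoring coefficients. It also relies on all variables sharing the same scale and valence, as the text notes.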

Factor analysis: sources/references

The main sources/references for this section are:

Books

– Factor Analysis in International Relations. Interpretation, Problem Areas and Application / Vincent, Jack. University of Florida Press, Gainesville, 1971.
– Factor Analysis. Statistical Methods and Practical Issues / Kim Jae-on, Charles W. Mueller, Sage publications, 1978.
– Introduction to Factor Analysis. What it is and How To Do It / Kim Jae-on, Charles W. Mueller, Sage publications, 1978.
– Statistics with STATA (updated for version 9) / Hamilton, Lawrence C. Thomson Brooks/Cole, 2006.

Online

– StatNotes: htm
– StatSoft: http://www.statsoft.com/textbook/stfacan.html
– UCLA Resources: http://www.ats.ucla.edu/stat/stata/output/fa_output.htm
