BAYESIAN STATISTICS 9
J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.)
© Oxford University Press, 2010

Dynamic Stock Selection Strategies: A Structured Factor Model Framework

Carlos M. Carvalho
The University of Chicago and The University of Texas at Austin, @mccombs.utexas.edu

Hedibert F. Lopes
The University of Chicago, [email protected]

Omar Aguilar
Financial Engines, USA, [email protected]

We propose a novel framework for estimating the time-varying covariation among stocks. Our work is inspired by asset pricing theory and associated developments in Financial Index Models. We work with a family of highly structured dynamic factor models that seek to extract the latent structure responsible for the cross-sectional covariation in a large set of financial securities. Our models incorporate stock specific information in the estimation of commonalities and deliver economically interpretable factors that are used both as a vehicle to estimate large time-varying covariance matrices and as a potential tool for stock selection in portfolio allocation problems. In an empirically oriented, high-dimensional case study, we showcase the use of our methodology and highlight the flexibility and power of the dynamic factor model framework in financial econometrics.

Keywords and Phrases: Dynamic factor models; Financial index models; Portfolio selection; Sparse factor models; Structured loadings.

Carlos M. Carvalho is Assistant Professor of Econometrics and Statistics, University of Chicago Booth School of Business. Hedibert F. Lopes is Associate Professor of Econometrics and Statistics, University of Chicago Booth School of Business. Omar Aguilar is Head of Portfolio Management at Financial Engines. The authors would like to thank Robert McCulloch for the helpful discussions throughout this project. Carvalho would like to acknowledge the support of the Donald D. Harrington Fellowship Program and the IROM department at The University of Texas at Austin.

C.M. Carvalho, H.F. Lopes and O. Aguilar

1. INTRODUCTION

The understanding of co-movements among stock returns is a central element in asset pricing research. Knowledge of this covariation is required both by academics seeking to explain the economic nature and sources of risk and by practitioners involved in the development of trading strategies and asset portfolios. This has led to a vast literature dedicated to the estimation of the covariance matrix of stock returns, a challenging problem due to complex dynamic patterns and to the rapid growth in the number of parameters as more assets are considered.

Since the proposal of the Capital Asset Pricing Model (CAPM) by Sharpe (1964) and the Arbitrage Pricing Theory (APT) of Ross (1976), Financial Index Models have become a popular tool for asset pricing. These models assume that all systematic variation in the return of financial securities can be explained linearly by a set of market indices, or risk factors, leading to a highly structured covariance matrix. In financial terms, the implication is that equity risk is multidimensional but priced efficiently through a set of indices, so that the only source of additional expected return is a higher exposure to one of these risk factors.

The appeal of index models is two-fold: (i) they lead to tractable and parsimonious estimates of the covariances and (ii) they are economically interpretable and theoretically justified. It follows that the task of estimating a large covariance matrix is simplified to the task of identifying a set of relevant risk factors. This is an empirical question usually guided by economic arguments, leading to factors that represent macro-economic conditions, industry participation, etc. A very large body of literature is dedicated to selecting and testing the indices; we refer the reader to Cochrane (2001) and Tsay (2005).

In a series of papers, Fama and French (FF) identified a significant effect of market capitalization and book-to-price ratio on expected returns. This has led to the now famous Fama-French 3 factor model where, besides the market, two indices are built as portfolios selected on the basis of firms’ size and book-to-price ratio. This is perhaps the most used asset pricing model in modern finance research and it relates to many trading strategies based on “growth” and “value” stocks. An additional index based on past performance information (momentum) was proposed by Carhart (1997) and can also be considered a “default” factor these days.

The fact that size, book-to-price and momentum are relevant to explain covariation among stocks is exploited in two common ways: (i) as individual regressors in a multivariate linear model; and (ii) as ranking variables used to construct portfolios that are used as indices. The first approach follows the ideas of Rosenberg and McKibben (1973) and is known as the BARRA strategy (after the company BARRA, Inc. founded by Barr Rosenberg). The second was initially proposed by Fama and French (1993).

Taking the view that Financial Index Models are an appropriate choice for the purpose of covariance estimation and asset allocation, we develop a dynamic factor model framework that contextualizes the current ideas behind these four aforementioned factors. Our approach encompasses both the BARRA and Fama-French strategies in a simple yet flexible modeling set up. Part of the innovation is to propose a framework where variable specific information can be used in modeling the latent structure responsible for common variation. From a methodological viewpoint, our models can be seen as a “structured” extension of current factor model ideas as developed in Aguilar and West (2000), West (2003), Lopes and West (2004),

Lopes, Salazar and Gamerman (2008) and Carvalho et al. (2008). On the applied side, our goal is to propose a model-based strategy that creates better Financial Index Models, helps deliver better estimates of time-varying covariances and leads to more effective portfolios.

We start in Section 2 by introducing the general modeling framework. In Section 3 we define the specific choices defining the different index models. Section 4 explores a case study where the different specifications are put to the test in financial terms. Finally, in Section 5 we discuss the connections of our approach with the current factor model literature and explore future uses of the ideas presented here.

2. GENERAL FRAMEWORK

The general form of an Index Model assumes that stock returns are generated following:

$$r_t = \alpha_t + B_t f_t + \epsilon_t \tag{1}$$

where $f_t$ is a vector of common factors at time $t$, $B_t$ is a matrix of factor loadings (or exposures) and $\epsilon_t$ is a vector of idiosyncratic residuals. If $\mathrm{Var}(f_t) = \Theta_t$ and $\mathrm{Var}(\epsilon_t) = \Phi_t$, the model in (1) implies that

$$\mathrm{Var}(r_t) = B_t \Theta_t B_t' + \Phi_t.$$

When the number of factors is much smaller than the number of stocks, the above form for the covariance matrix of returns is represented by a relatively small set of parameters, as the only sources of systematic variation are the chosen indices. Assuming further that the factors are observable quantities, the problem is essentially solved, as one is only left with a simple dynamic regression model; in fact, most of the literature follows a “rolling window” approach based on OLS estimates (see Tsay, 2005, chapter 9).

In our work, we take a dynamic, model-based perspective and assume that at time $t$ we observe the vector $(r_t, x_t, Z_t)$ where: $r_t$ is a $p$-dimensional vector of stock returns; $Z_t$ is a $p \times k$ matrix of firm specific information; and $x_t$ is the market return (or some equivalent measure).

We represent Index Models as defined by the dynamic factor model framework:

$$r_t = \alpha_t + \beta_t x_t + \mathcal{Z}_t f_t + \epsilon_t \tag{2}$$

where $\beta_t$ is a $p$-dimensional vector of market loadings, $\epsilon_t$ is the vector of idiosyncratic residuals, and $f_t$ is a $k$-dimensional vector of common factors. Our notation clearly separates the one factor that is observed (the market) from the rest of the factors that are latent ($f_t$). In all model specifications, we assume that each element of both $\alpha_t$ and $\beta_t$ follows a first-order dynamic linear model (West and Harrison, 1997) and that $\epsilon_t$ is defined by a set of independent stochastic volatility models (Jacquier, Polson and Rossi, 1994; Kim, Shephard and Chib, 1998). Finally, we assume that

$$f_t \sim N(0, \Theta_t)$$

where $\Theta_t$ is diagonal with dynamics driven by univariate stochastic volatility models.
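As a rough illustration of the covariance structure implied by (2) at a single time point, the sketch below simulates returns conditional on fixed values of the intercepts, market loadings and market return, and compares the sample covariance with the structured form (loadings times factor variances times loadings transposed, plus idiosyncratic variances). All function names, dimensions and numerical values are hypothetical choices made for this example; they are not taken from the paper, and the time dynamics (DLM and stochastic volatility) are deliberately left out.

```python
import numpy as np

def implied_covariance(Z, theta_diag, phi_diag):
    """Return Z Theta Z' + Phi, with diagonal Theta and Phi given as vectors."""
    return (Z * theta_diag) @ Z.T + np.diag(phi_diag)

def simulate_returns(alpha, beta, x, Z, theta_diag, phi_diag, n_draws, rng):
    """Draw r = alpha + beta * x + Z f + eps repeatedly for one time point."""
    p, k = Z.shape
    f = rng.normal(size=(n_draws, k)) * np.sqrt(theta_diag)   # f ~ N(0, Theta)
    eps = rng.normal(size=(n_draws, p)) * np.sqrt(phi_diag)   # eps ~ N(0, Phi)
    return alpha + beta * x + f @ Z.T + eps

# Hypothetical toy set up: p = 5 stocks, k = 3 latent factors.
rng = np.random.default_rng(0)
Z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
theta_diag = np.array([1.0, 0.5, 0.25])   # factor variances (assumed values)
phi_diag = np.full(5, 0.2)                # idiosyncratic variances (assumed values)
alpha, beta, x = np.zeros(5), np.ones(5), 0.01

r = simulate_returns(alpha, beta, x, Z, theta_diag, phi_diag, 200_000, rng)
sigma = implied_covariance(Z, theta_diag, phi_diag)
max_gap = np.abs(np.cov(r, rowvar=False) - sigma).max()
```

With many draws, `max_gap` should be close to zero, which is simply a numerical check that the low-dimensional parameters determine the full covariance matrix.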

Figure 1: Illustration summarizing the idea of structuring the loadings with observed variables present in our proposed framework. The red circles represent observable variables.

Defining the factor loadings. One last element remains to be defined and it is at the core of the different model specifications considered: the $(p \times k)$ matrix of factor loadings $\mathcal{Z}_t$. Through $\mathcal{Z}_t$, company specific information will be used to help uncover relevant latent structures representing the risk factors. Before getting to the specific definitions of $\mathcal{Z}_t$, it is worth noting that many previously proposed models are nested in the form of (2). For example, taking $\beta_t = 0$ and fixing the loadings through time leads to the factor stochastic volatility models of Aguilar and West (2000) and Pitt and Shephard (1999). Letting the loadings vary in time with a DLM leads to the model considered in Lopes, Aguilar and West (2000) and Lopes and Carvalho (2007).

3. MODEL SPECIFICATIONS

3.1. Dynamic CAPM

We start with the simplest alternative in the proposed framework. Let $\mathcal{Z}_t = 0$ for all $t$ and the dynamic CAPM follows:

$$
\begin{aligned}
r_t &= \alpha_t + \beta_t x_t + \epsilon_t \\
\alpha_{i,t} &\sim N(\alpha_{i,t-1}, \tau^2_{\alpha_i}) \\
\beta_{i,t} &\sim N(\beta_{i,t-1}, \tau^2_{\beta_i}) \\
\epsilon_{i,t} &\sim \text{SV Model}
\end{aligned}
$$

with independent dynamics for $\alpha_t$, $\beta_t$ and $\epsilon_t$ across $i$, for $i = 1, \ldots, p$. This model also has a very simple implementation strategy: conditional on the market, all the estimation is done in parallel for all $p$ components in the vector of returns. Due to its historical relevance, this dynamic version of the CAPM will serve as the benchmark for comparing the alternative specifications.

3.2. Dynamic BARRA

If we now set $\mathcal{Z}_t = Z_t$ we get a dynamic version of the BARRA approach where the loadings are deterministically specified by the company-specific variables $Z_t$. Following the ideas of Fama and French (1996) and Carhart (1997), $Z_t$ would have 3 columns with measures of market capitalization (size), book-to-price ratio and

momentum. The model follows:

$$
\begin{aligned}
r_t &= \alpha_t + \beta_t x_t + Z_t f_t + \epsilon_t \\
\alpha_{i,t} &\sim N(\alpha_{i,t-1}, \tau^2_{\alpha_i}) \\
\beta_{i,t} &\sim N(\beta_{i,t-1}, \tau^2_{\beta_i}) \\
f_t &\sim N(0, \Theta_t) \\
\epsilon_{i,t} &\sim \text{SV Model}
\end{aligned}
$$

This model is jointly estimated as the common factors $f_t$ are now latent. This is still a somewhat standard model as it is a version of the models in Aguilar and West (2000) and Lopes and Carvalho (2007) where some factors are given ($x_t$) and their loadings have to be estimated, and some time-varying loadings are given ($Z_t$) and their factor scores are unknown. It is important to highlight that by fixing the loadings at $Z_t$ we force the latent factors to embed the information in the firm specific characteristics, leading to a set of latent factors with a direct economic interpretation as “size”, “book-to-market” and “momentum” factors.

3.3. Sparse Dynamic BARRA

Having the different firm-specific characteristics directly define the factors might be problematic due to the potentially large amount of noise contained in these variables. The use of portfolios suggested by Fama and French (1993) was originally an attempt to filter out the relevant information contained in firm specific information about the underlying risk factors defining the covariation of equity returns. In our proposed framework this problem could be mitigated by additional structure in $\mathcal{Z}_t$. For example, we can take the view that, due to excessive noise, some elements of $Z_t$ should not play a role at a given time, so that the corresponding element in $\mathcal{Z}_t$ would be set to zero. The introduction of sparsity in the loadings matrix of a factor model, as an attempt to regularize the estimation of factors in large dimensional problems, first appears in West (2003) and was further explored in Carvalho et al. (2008) and Frühwirth-Schnatter and Lopes (2010). We extend their approach to the time-varying loadings set-up of the dynamic BARRA by modeling the loadings of factor $j$ at time $t$ as:

$$
\mathcal{Z}_{ij,t} =
\begin{cases}
Z_{ij,t} & \text{w.p. } \pi_{j,t} \\
0 & \text{w.p. } 1 - \pi_{j,t}
\end{cases}
$$

where $\pi_{j,t}$ are the inclusion probabilities associated with factor $j$ and are usually modeled with a beta prior. Again, this is a fairly straightforward model to estimate. Given $\mathcal{Z}_t$ we are back to a dynamic stochastic volatility factor model whereas, conditional on all remaining unknowns, each element of $\mathcal{Z}_{j,t}$ requires a draw from a simple discrete mixture. Although simple, the reader should be reminded that fitting such models to high-dimensional problems is computationally intensive and requires careful coding, as standard statistical packages are not up to the task. As an example, in the $p = 350$ dimensional case study presented below, each MCMC iteration requires, among other things, 703 filter-forward backward-sampling steps and sampling 1,050 elements of $\mathcal{Z}_t$. As a side note, given the conditionally Gaussian structure of the models, efficient sequential Monte Carlo algorithms are available and very attractive for the on-line sequential application of the proposed framework (see Aguilar and West, 2000 and Carvalho, Johannes, Lopes and Polson, 2010).
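The "draw from a simple discrete mixture" for one column of loadings can be sketched as follows. This is only an illustration under stated assumptions, not the authors' sampler: it assumes Gaussian residuals with known variance at time t, treats the inclusion probability as given, and takes as input a partial residual (the return minus intercept, market term and the contribution of the other factors). The function names and arguments are hypothetical.

```python
import numpy as np

def inclusion_probability(e, z, f, sigma2, pi):
    """Posterior probability that a loading equals z rather than 0.

    e      : partial residual of the stock at time t (other terms removed)
    z      : candidate loading value (firm characteristic)
    f      : current draw of the factor score at time t
    sigma2 : idiosyncratic variance at time t
    pi     : prior inclusion probability for this factor
    """
    log_in = np.log(pi) - 0.5 * (e - z * f) ** 2 / sigma2    # loading = z
    log_out = np.log1p(-pi) - 0.5 * e ** 2 / sigma2          # loading = 0
    # Numerically stable version of 1 / (1 + exp(log_out - log_in)).
    return np.exp(-np.logaddexp(0.0, log_out - log_in))

def draw_loadings(e, z, f, sigma2, pi, rng):
    """Draw a whole column of sparse loadings, one Bernoulli draw per stock."""
    prob = inclusion_probability(e, z, f, sigma2, pi)
    include = rng.random(np.shape(prob)) < prob
    return np.where(include, z, 0.0), prob
```

When the factor score is zero the data carry no information about inclusion and the function returns the prior probability; when the residual is well explained by the candidate loading, the probability moves towards one.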

3.4. Dynamic Fama-French

Fama and French (1996) and Carhart (1997) define factors as portfolios built by sorting stocks based on their individual characteristics. The implied 4 factor model (3 factors plus the market) is by far the most successful empirical asset pricing model in modern finance. More specifically, the SMB (small minus big) factor is defined by ranking the stocks according to their market capitalization and building a value weighted portfolio with the returns of the firms below the median market cap, minus the returns of the firms above the median. The idea behind this construction is motivated by the observation that small firms seem to earn larger average returns relative to the prediction of the CAPM (also known as the “growth” effect).

The HML (high minus low) factor is defined by ranking the stocks according to their book-to-price ratio and building a value weighted portfolio with the returns of the highest 30% book-to-price firms minus the returns on the lowest 30%. The intuition here is that “value” stocks have market values that are small relative to their accounting value and therefore tend to present higher than expected (by the CAPM) returns.

Finally, Carhart’s momentum factor (MOM) starts by ranking stocks according to some measure of past performance and building equal weighted portfolios with the returns of the 30% top performers minus the returns on the 30% bottom past performers. Again, the idea arises from the observation that stock prices are mean reverting and therefore past losers will present higher than expected returns (see Jegadeesh and Titman, 1993).

We borrow these ideas and adapt their construction to our dynamic factor framework. To this end we use the dynamic BARRA set up of Section 3.2 and define $Z_t$ following the directions above. This means that, at each time point, the loadings matrix takes values defined by the sorting variables size, book-to-price and momentum. In detail, the first column of $Z_t$ takes values “+ market value” for small companies and “- market value” for large companies (as defined by the median at time $t$). The second column takes values “+ market value” for companies in the top 30% of book-to-price, “- market value” for companies in the bottom 30% and 0 otherwise. The final column is defined with 1 for the top 30% past performers, -1 for the bottom 30% and 0 otherwise.

Extending the specification of Section 3.3 is immediate and would serve the similar purpose of regularization. In addition, it is a model-based alternative to sorts and ad-hoc cut-offs for inclusion in each factor. In that spirit, we could define the Sparse Dynamic Fama-French model in the same manner as in Section 3.3 but with the potential values of $\mathcal{Z}_t$ defined according to the instructions of Fama, French and Carhart.

3.5. Probit-Sparse Dynamic Factor Models

In this final specification we modify the sparse specification (either BARRA or Fama-French) so as to model the inclusion probabilities as a function of individual firm characteristics. By doing so we allow for different forms of relationship between firm characteristics and their association with a latent risk factor. Once again, let

$$
\mathcal{Z}_{ij,t} =
\begin{cases}
\theta_{ij,t} & \text{w.p. } \pi_{ij,t} \\
0 & \text{w.p. } 1 - \pi_{ij,t},
\end{cases}
$$

but now,

$$\pi_{ij,t} = \mathrm{probit}(\gamma_j + \phi_j W_{ij,t}).$$

In the above, $\theta_{ij,t}$ is the value chosen for the loading when variable $i$ is involved with factor $j$. In the BARRA set up that could be the stock specific information $Z_t$, or the simple transformations in the Fama-French context. $W_{ij,t}$ is the variable that carries information on whether or not stock $i$ and factor $j$ are related. This definition provides yet additional flexibility in using firm specific information in building systematic risk factors. Instead of using sorts or assuming that inclusion in a factor is exchangeable a priori across firms, this model is more informative and allows for more complex relationships to be uncovered. This is also a very useful context for the use of informative priors in relating variables to factors and for exploring non-linear relationships with polynomials and related transformations inside the probit link. One example, which relates directly to the Fama-French sorting, takes $W_j$ to be a measure of distance from the median size company and assumes that it is believed a priori that $\phi_j > 0$. That would imply that the larger (or smaller) a company is, the more likely it is to participate in the associated factor.

Figure 2: Case Study. Market $\beta$’s of Dow Chemical, Apple, Goldman Sachs and Bank of America for all models (panels by company, e.g. AAPL and BAC; y-axis: Market Betas). The horizontal red line represents the OLS estimate of $\beta$ in a simple linear regression.

Although very appropriate to the applied context discussed here, it is important to notice that the idea of using additional information in modeling factor loadings is much more general and widely applicable. Our ideas are inspired by the work of Lopes, Salazar and Gamerman (2008) where priors for factor loadings were informed by spatial locations. In Section 4 a simulated example showcases the potential relevance of this approach in uncovering important latent structures responsible for common variation.
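The probit inclusion probabilities of Section 3.5 can be illustrated with a small numerical sketch. It assumes that probit(.) denotes the standard normal distribution function applied to the linear predictor, and it uses the distance from the median log size as the explanatory variable, in the spirit of the example in the text. The numbers, the use of log size, and the function name are hypothetical choices for illustration only.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF, vectorized over arrays (standard library erf only).
_normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0))))

def inclusion_probabilities(W, gamma, phi):
    """Return pi[i, j] = Phi(gamma[j] + phi[j] * W[i, j])."""
    return _normal_cdf(gamma + phi * W)

# Hypothetical log market capitalizations for five firms, one factor ("size").
log_size = np.array([[8.0], [9.0], [10.0], [11.0], [14.0]])
W = np.abs(log_size - np.median(log_size))   # distance from the median firm
pi = inclusion_probabilities(W, gamma=np.array([-1.0]), phi=np.array([0.5]))
```

With a positive slope, the median firm receives the lowest inclusion probability and the firms furthest from the median, in either direction, receive the highest.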

4. EXAMPLES

4.1. Case Study: 350 stocks

Our case study focuses on a set of 350 stocks in the U.S. market (part of the Russell 1000 index). From October 2000 to December 2009 we work with weekly returns and use size, book-to-price and momentum as stock specific information. An overall value-weighted index (from CRSP) is used as market returns. Due to the preliminary nature of this work we selected our variables to avoid missing data problems. This example serves as a test ground for the models and we hope to extend this analysis to the entire population of stocks in the near future.

Figure 3: Case Study. Eigenvalues of the covariance matrix of standardized residuals from each model: Dynamic CAPM, Dynamic BARRA, Dynamic Fama-French, Sparse Dynamic BARRA and Sparse Dynamic Fama-French (panel title: Residual Structure; x-axis: Models; y-axis: First Eigenvalue). Absence of residual covariation would imply an eigenvalue of 1.

Five models were considered in the initial analysis: (i) Dynamic CAPM, (ii) Dynamic BARRA, (iii) Dynamic Fama-French, (iv) Sparse Dynamic BARRA and (v) Sparse Dynamic Fama-French. Figure 2 shows the posterior means of the market $\beta_t$’s for four companies in all models. The first thing to notice is the clear dynamic nature of $\beta$, a fact that is ignored in a variety of empirical and theoretical work where OLS estimates (like the one presented in the figure) are used. It is also interesting to notice that the path of the $\beta$’s is very similar in all models, leading to the conclusion that the market information is essentially orthogonal to the information contained in individual firm characteristics (at least in relation to the factors they create). This empirical fact has been observed in several articles in the finance literature and is discussed in detail by Cochrane (2001). In other words, our different factor models are seeking to uncover the latent structure left after the CAPM does its job.

A summary of the remaining unexplained linear “structure” in the residuals appears in Figure 3, where we compare the first eigenvalue of the standardized residual covariance matrix of each model. No residual structure would imply an eigenvalue of 1. It is important to remember that all models other than the Dynamic CAPM are of the same complexity and try to explain covariation with 4 factors. As expected, the simplest model, i.e., the Dynamic CAPM, leaves the most structure

behind while the Sparse Dynamic Fama-French picks up the most common variation among stocks. This is the first indication that our initial conjecture, that not all stocks should be playing a role in determining the underlying factor associated with firm characteristics, might be a relevant one. By simply zeroing out some elements of $\mathcal{Z}_t$ we ended up extracting factors better able to explain common variation, at least under this simple measure.

Table 1: Bayes Factors in relation to the benchmark Dynamic CAPM.

Model                          log(BF)
Dynamic BARRA                  -267.59
Dynamic Fama-French            -102.55
Sparse Dynamic BARRA            343.50
Sparse Dynamic Fama-French      473.44

A more relevant overall comparison of the performance of the models is presented in Table 1, where an approximate measure of the log Bayes Factor in relation to the Dynamic CAPM is presented (see Lopes and West, 2004). The evidence in favor of the Sparse BARRA and Sparse Fama-French specifications is overwhelming, while the simple Dynamic CAPM seems to be a better alternative than both the Dynamic BARRA and Dynamic Fama-French. Once again, this indicates that firm specific information can be helpful in uncovering relevant underlying structure but a simple ad-hoc definition of the loadings is not sufficient. The Sparse Dynamic BARRA and Sparse Dynamic Fama-French are our first attempt at improving the modeling of the loadings and their results are so far promising.

Figure 4: Case Study. Posterior means of the factor scores. The rows represent the “size”, “book-to-price” and “momentum” factors.
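The residual-structure summary reported in Figure 3 can be approximated along the following lines. This is a minimal sketch that assumes the residuals are standardized by their sample standard deviations; the paper does not spell out the standardization used, and with stochastic volatility one would more plausibly divide by the estimated time-varying volatilities. The function name and the simulated inputs are hypothetical.

```python
import numpy as np

def first_eigenvalue(residuals):
    """Largest eigenvalue of the covariance matrix of standardized residuals.

    residuals : array of shape (T, p), one column per stock.
    """
    z = (residuals - residuals.mean(axis=0)) / residuals.std(axis=0, ddof=1)
    corr = (z.T @ z) / (z.shape[0] - 1)
    return float(np.linalg.eigvalsh(corr)[-1])   # eigvalsh sorts ascending

rng = np.random.default_rng(0)
noise = rng.normal(size=(20_000, 5))
common = rng.normal(size=(20_000, 1))

ev_independent = first_eigenvalue(noise)            # close to 1
ev_with_factor = first_eigenvalue(noise + common)   # well above 1
```

Residuals that still share a common component produce a first eigenvalue well above 1, which is the sense in which a smaller value in Figure 3 indicates less structure left unexplained.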

Figure 5: Case Study. Scatter plots of factor scores from the Dynamic BARRA and Sparse Dynamic BARRA model specifications. In red, the 0-1 ...

Figure 6: Case Study. Scatter plots of factor scores from the Dynamic Fama-French and Sparse Dynamic Fama-French model specifications.

To better understand the results in the different specifications it is worth examining the factor scores a little closer. Figure 4 shows the posterior means for all 3 latent factors in all models. At first glance, it is clear that the $f_t$’s are very different, as different values of $\mathcal{Z}_t$ have a tremendous impact on the estimation of $f_t$. This is indeed the case when comparing the factors from the Dynamic BARRA and Dynamic Fama-French. A second look, however, shows that the results from the Dynamic BARRA and Dynamic Fama-French are quite related to their sparse counterparts. Figures 5 and 6 display scatter plots of the absolute value of each of the 3 factor scores in both sparse and non-sparse models. They are clearly linearly related, but the results from the Dynamic BARRA are overly shrunk towards zero due to excessive noise in the loadings. The regularization exerted by the sparse representation is able to better identify time periods where just a subset of stocks is really associated with the size, book-to-price and momentum effects, leading to risk factors that are better able to explain covariation.

Table 2: Inclusion Probabilities: “Overall” stands for the overall average of the posterior means of $\pi_{j,t}$ for each factor $j$. “Peak Dates” refers to the average for the time periods when we identify a big disparity between the factor scores obtained in the sparse versus non-sparse model specifications. In the Sparse Fama-French model, we do not observe the shrinkage effect in the Book-to-Market factor, hence the N/A values.

Rows: Size (BARRA), Book-to-Market (BARRA), Momentum (BARRA), Size (FF), Book-to-Market (FF), Momentum (FF). Columns: Overall, Peak Dates.

This point is emphasized by Table 2, where we summarize and compare the overall estimates of the inclusion probabilities $\pi_{j,t}$ relative to their values when factor scores are overly shrunk by the non-sparse models. The clear reduction in the probabilities implies that only a smaller subset of stocks share covariation through the characteristics based factors. Recall that the differences in the Bayes Factor between the Dynamic BARRA and Fama-French and their sparse versions are enormous even though the difference in their latent factor scores is [...]

Figure 7: Illustrative example. The left panel shows the relationship of the loadings in factor 2 with the explanatory variable Z. The right panel plots the estimates of the loadings with or without the information in Z (axes: Loadings (w/o Z), Loadings (with Z)).

Finally, Figure 11 shows the growth in estimation risk as a function of dimension ($p$) and the conclusion is simple: the larger the problem, the higher the importance of appropriately using the information in Z.

To explore the financial effects of the different models, we build minimum variance portfolios based on the sequence of estimates of the covariance matrices of returns. This comparison is useful as it isolates the impact of the covariance matrix in investment decisions, since the optimization solution only involves its inverse.
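For reference, the standard fully invested minimum variance portfolio has weights proportional to the inverse covariance matrix applied to a vector of ones, normalized to sum to one. The sketch below computes these weights for a given covariance estimate. It assumes no short-sale or other constraints, which the text does not specify; the function names and the toy covariance matrix are hypothetical.

```python
import numpy as np

def min_variance_weights(sigma):
    """Weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1), summing to one."""
    ones = np.ones(sigma.shape[0])
    s_inv_ones = np.linalg.solve(sigma, ones)   # avoids forming the inverse
    return s_inv_ones / (ones @ s_inv_ones)

def portfolio_variance(w, sigma):
    """Variance w' Sigma w of a portfolio with weights w."""
    return float(w @ sigma @ w)

# Toy example with three uncorrelated assets (assumed variances).
sigma = np.diag([0.04, 0.01, 0.02])
w = min_variance_weights(sigma)
equal = np.full(3, 1.0 / 3.0)
```

In this uncorrelated case the weights are inversely proportional to the variances, and `portfolio_variance(w, sigma)` is below the variance of the equally weighted portfolio. Applying the same function to each covariance estimate in the sequence gives one portfolio per time point.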

Figure 8: Case Study. The estimated risk ratio of the returns obtained from minimum variance portfolios from the different models relative to the Dynamic CAPM (panels include Fama-French, S-BARRA and S-FF; y-axis: Risk Ratio). The volatility of the returns associated with each strategy was estimated via a stochastic volatility model.

Figure 8 displays the series of risk ratios of each portfolio vis-a-vis the benchmark portfolio constructed by the Dynamic CAPM. Once again the observation is that the Sparse Dynamic BARRA and Sparse Dynamic Fama-French provide a significant improvement over the Dynamic CAPM as, for most time points, they result in a less volatile investment option.

Figure 9: Case Study. The estimated risk ratio of the returns obtained from minimum variance portfolios in the Sparse Dynamic Fama-French relative to the Sparse Dynamic BARRA (y-axis: Risk Ratio). The volatility of the returns associated with each strategy was estimated via a stochastic volatility model.

4.2. An Illustration

We close this example with an illustration of the overall improvement of the proposed models relative to what we commonly see in many asset pricing articles. Figure 12 presents boxplots of the percentage of variation explained by the models (essentially an $R^2$-like measure) for each return series. The red boxplots refer to the standard regression-based CAPM, BARRA and Fama-French while their green counterparts are obtained from our proposed models. It is clear that the time-varying framework provides potentially relevant improvements and, once again, the sparse versions appear on top.

Figure 10: Illustrative example. Errors in the estimation of factor scores over 100 simulations (panels: Factor 0 and Factor 1; groups: no Z and Z).

Figure 11: Illustrative example. Estimation risk as a function of dimension (x-axis: dim (p); y-axis: MSE reduction (%)). The y-axis represents the reduction in mean squared error of factor scores when the information about Z is used relative to a simple sparse factor model.

Figure 12: Case Study. Boxplots of the percentage of variation explained by each model for all stocks. The red plots are based on simple linear regressions whereas the green represent the proposed model-based strategy. The blue plot refers to the better performing model, i.e., the Sparse-Dynamic BARRA.

Our initial conjecture is somewhat validated by the performance
