Dynamic and Static Prototype Vectors for Semantic Composition

Siva Reddy (University of York, UK), Ioannis P. Klapaftis (University of York, UK), Diana McCarthy (Lexical Computing Ltd, UK), Suresh Manandhar (University of York, UK)

Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 705-713, Chiang Mai, Thailand, November 8-13, 2011. (c) 2011 AFNLP

Abstract

Compositional Distributional Semantic methods model the distributional behavior of a compound word by exploiting the distributional behavior of its constituent words. In this setting, a constituent word is typically represented by a feature vector conflating all the senses of that word. However, not all the senses of a constituent word are relevant when composing the semantics of the compound. In this paper, we present two different methods for selecting the relevant senses of constituent words. The first one is based on Word Sense Induction and creates static multi-prototype vectors representing the senses of a constituent word. The second creates a single dynamic prototype vector for each constituent word based on the distributional properties of the other constituents in the compound. We use these prototype vectors for composing the semantics of noun-noun compounds and evaluate on a compositionality-based similarity task. Our results show that: (1) selecting relevant senses of the constituent words leads to a better semantic composition of the compound, and (2) dynamic prototypes perform better than static prototypes.

1 Introduction

Vector Space Models of lexical semantics have become a standard framework for representing a word's meaning. Typically these methods (Schütze, 1998; Pado and Lapata, 2007; Erk and Padó, 2008) utilize a bag-of-words model or dependencies such as subject/verb and object/verb relations, so as to extract the features which serve as the dimensions of the vector space. Each word is then represented as a vector of the extracted features, where the frequency of co-occurrence of the word with each feature is used to calculate the vector component associated with that feature. Figure 1 provides an example of two nouns assuming a bag-of-words model.

    vector dimensions    house    hunting
    animal                 30        90
    buy                    60        15
    apartment              90        12
    price                  55        20
    rent                   45        33
    kill                   10        90

    Figure 1: A hypothetical vector space model.

Compositional Distributional Semantic methods formalise the meaning of a phrase by applying a vector composition function on the vectors associated with its constituent words (Mitchell and Lapata, 2008; Widdows, 2008). For example, the result of vector addition to compose the semantics of house hunting from the vectors house and hunting is the vector <120, 75, 102, 75, 78, 100>.

As can be observed, the resulting vector does not reflect the correct meaning of the compound house hunting due to the presence of irrelevant co-occurrences such as animal or kill. These co-occurrences are relevant to one sense of hunting, i.e. the activity of hunting animals, but not to the sense of hunting meant in house hunting, i.e. the activity of looking thoroughly. Given that hunting has been associated with a single prototype (vector) by conflating all of its senses, the application of a composition function is likely to include irrelevant co-occurrences in house hunting.

A potential solution to this problem would involve the following steps:

1. Build separate prototype vectors for each of the senses of house and hunting.
2. Select the relevant prototype vectors of house and hunting and then perform the semantic composition.

In this paper we present two methods (section 3) for carrying out the above steps on noun-noun compounds. The first one (section 3.1) applies Word Sense Induction (WSI) to identify different senses (also called static multi prototypes) of the constituent words of a compound noun and then applies composition by choosing the relevant senses. The second method (section 3.2) does not identify a fixed set of senses. Instead, it represents each constituent by a prototype vector which is built dynamically (also called a dynamic prototype) by activating only those contexts considered to be relevant to the constituent in the presence of the other constituent, and then performs the composition on the dynamic prototypes. For performing composition, we use vector composition functions.

Our evaluation (section 5) on a task for rating similarity between noun-noun compound pairs shows that: (1) sense disambiguation of constituents improves semantic composition, and (2) dynamic prototypes are better than static multi prototypes for semantic composition.

2 Related work

Any distributional model that aims to describe language adequately needs to address the issue of compositionality. Many distributional composition functions have been proposed in order to estimate the semantics of compound words from the semantics of the constituent words. Mitchell and Lapata (2008) discussed and evaluated various composition functions for phrases consisting of two words. Among these, the simple additive (ADD) and simple multiplicative (MULT) functions are easy to implement and competitive with respect to existing sophisticated methods (Widdows, 2008; Vecchi et al., 2011).

Let us assume a target compound noun N that consists of two nouns n and n'. Bold letters represent their corresponding distributional vectors obtained from corpora. v(N) denotes the vector of N obtained by applying the composition function on n and n'. The real number v_i denotes the i-th co-occurrence value of a vector v. The functions ADD and MULT, following Mitchell and Lapata (2008), are defined as follows:

    ADD:  v(N) = α n + β n',    i.e. v(N)_i = α n_i + β n'_i
    MULT: v(N) = n ⊙ n',        i.e. v(N)_i = n_i · n'_i          (1)

where α and β are real numbers.

Relevant to our work is that of Erk and Padó (2008), who utilize a structured vector space model. The prototype vector of a constituent word is initially built, and later refined by removing irrelevant co-occurrences with the help of the selectional preferences of other constituents. The refined vectors are then used for the semantic composition of the compound noun. The results are encouraging, showing that polysemy is a problem in vector space models. Our approach differs from theirs in the way we represent meaning: we experiment with static multi prototypes and dynamic prototypes. Our vector space model is based on a simple bag-of-words representation, which does not require selectional preferences for sense disambiguation and can be applied to resource-poor languages.

Several other researchers have addressed polysemy to improve the performance of various tasks, though not particularly the task of semantic composition: Navigli and Crisafulli (2010) for web search results clustering, Klapaftis and Manandhar (2010b) for taxonomy learning, Reisinger and Mooney (2010) for word similarity, and Korkontzelos and Manandhar (2009) for compositionality detection. In all cases, the reported results demonstrate that handling polysemy led to improved performance on the corresponding tasks. This motivates our research on handling polysemy for the task of semantic composition using the two different methods described in the next section.

3 Sense Prototype Vectors for Semantic Composition

In this section we describe two approaches for building sense-specific prototype vectors of the constituent words in a noun-noun compound. The first approach performs WSI to build static multi-prototype vectors. The other builds a single dynamic prototype vector for each constituent by activating only the relevant exemplars of the constituent with respect to the other constituent.
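To make Equation 1 concrete, here is a minimal sketch (our illustration, not the authors' code) of the ADD and MULT composition functions, applied to the hypothetical house and hunting count vectors of Figure 1. With α = β = 1, the additive result is exactly the house hunting vector quoted in the Introduction.

```python
import numpy as np

def add_compose(n, n2, alpha=1.0, beta=1.0):
    """ADD from Equation 1: v(N)_i = alpha * n_i + beta * n'_i."""
    return alpha * n + beta * n2

def mult_compose(n, n2):
    """MULT from Equation 1: v(N)_i = n_i * n'_i (component-wise)."""
    return n * n2

# Hypothetical co-occurrence counts over the dimensions
# (animal, buy, apartment, price, rent, kill) of Figure 1.
house = np.array([30, 60, 90, 55, 45, 10])
hunting = np.array([90, 15, 12, 20, 33, 90])

# ADD with alpha = beta = 1 yields the "house hunting" vector from
# the Introduction: (120, 75, 102, 75, 78, 100).
composed = add_compose(house, hunting)
```

Note how the irrelevant dimensions animal and kill keep large values in the composed vector, which is precisely the conflation problem the two sense-selection methods below are designed to address.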

An exemplar is defined as a corpus instance of a target word. These sense-specific prototype vectors are then used for semantic composition. Let N be the compound noun with constituents n and n'. Our aim is to select the relevant senses of n and n'.

3.1 Static Multi Prototypes Based Sense Selection

In the first stage (section 3.1.1), a WSI method is applied to both n and n'. The outcome of this stage is a set of clusters (senses). Each of these clusters is associated with a prototype vector by taking the centroid of the cluster. Following Reisinger and Mooney (2010), we use the terminology multi-prototype vectors for sense clusters. Let S(n) (resp. S(n')) be the set of prototypes of n, where s_i^n ∈ S(n) denotes the i-th sense of the noun n. Since these prototypes of constituents are static and do not change when the compound changes, we refer to them as static multi prototypes.

In the next stage (section 3.1.2), we calculate all the pairwise similarities between the clusters of n and n', so as to select the pair of clusters with the highest similarity. The selected clusters are then combined using a composition function, to produce a single vector representing the semantics of the target compound noun N.

3.1.1 Graph-based WSI

Word Sense Induction is the task of identifying the senses of a target word in a given text. We apply a graph-based sense induction method, which creates a graph of target word instances and then clusters that graph to induce the senses. We follow the work of Klapaftis and Manandhar (2010a) for creating the graph and apply Chinese Whispers (CW) (Biemann, 2006), a linear graph clustering method that automatically identifies the number of clusters.

Figure 2 provides a running example of the different stages of the WSI method. In the example, the target word mouse appears with the electronic-device sense in the contexts A and C, and with the animal sense in the contexts B and D.

    Figure 2: Running example of WSI.

Corpus preprocessing: Let bc denote the base corpus consisting of the contexts containing the target word tw. In our work, a context is defined by a set of words in a window of size 100 around the target. The aim of this stage is to capture words contextually related to tw. In the first step, the target word is removed from bc and part-of-speech tagging is applied to each context. Only nouns and verbs are kept and lemmatised. In the next step, the distribution of each word in the base corpus is compared to the distribution of the same noun in a reference corpus using the log-likelihood ratio (G2) (Dunning, 1993). Words that have a G2 below a pre-specified threshold (parameter p1) are removed from each context of the base corpus. The result of this stage is shown in the upper left part of Figure 2.

Graph creation & clustering: Each context ci ∈ bc is represented as a vertex in a graph G. Edges between the vertices of the graph are drawn based on their similarity, defined in Equation 2, where smcl(ci, cj) is the collocational weight of contexts ci, cj and smwd(ci, cj) is their bag-of-words weight. If the edge weight W(ci, cj) is above a prespecified threshold (parameter p3), then an edge is drawn between the corresponding vertices in the graph.

    W(ci, cj) = (1/2) (smcl(ci, cj) + smwd(ci, cj))          (2)

Collocational weight: The limited polysemy of collocations is exploited to compute the similarity between contexts ci and cj. In this setting, a collocation is a juxtaposition of two words within the same context. Given a context ci of N words, a total of (N choose 2) collocations are generated by combining each word with every other word in the context. Each collocation is weighted using the log-likelihood ratio (G2) (Dunning, 1993) and is filtered out if its G2 is below a prespecified threshold (parameter p2).
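A minimal sketch of the edge-weighting step, under our own naming: both component similarities are Jaccard coefficients, one over the contexts' collocation sets and one over their word sets, averaged as in Equation 2, and an edge is drawn only when the weight clears the threshold p3. The toy contexts are invented.

```python
def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def edge_weight(collocs_i, collocs_j, words_i, words_j):
    """Equation 2: W(ci, cj) = (sm_cl(ci, cj) + sm_wd(ci, cj)) / 2."""
    sm_cl = jaccard(collocs_i, collocs_j)  # collocational weight
    sm_wd = jaccard(words_i, words_j)      # bag-of-words weight
    return 0.5 * (sm_cl + sm_wd)

# Two toy contexts of a hypothetical "mouse" base corpus (the target
# word itself was removed during preprocessing).
p3 = 0.05  # edge similarity threshold
w = edge_weight({("click", "screen")},
                {("click", "cat"), ("click", "screen")},
                {"click", "screen"},
                {"click", "cat", "screen"})
draw_edge = w >= p3  # True: the two contexts become linked vertices
```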

At the end of this process, each context ci of tw is associated with a set of collocations (gi), as shown in the upper right part of Figure 2. Given two contexts ci and cj, the Jaccard coefficient is used to calculate the similarity between the collocational sets, i.e. smcl(ci, cj) = |gi ∩ gj| / |gi ∪ gj|.

Bag-of-words weight: Estimating context similarity using collocations may provide reliable estimates regarding the existence of an edge in the graph; however, it also suffers from data sparsity. For this reason, a bag-of-words model is also employed. Specifically, each context ci is associated with a set of words (gi) selected in the corpus preprocessing stage. The upper left part of Figure 2 shows the words associated with each context of our example. Given two contexts ci and cj, the bag-of-words weight is defined to be the Jaccard coefficient of the corresponding word sets, i.e. smwd(ci, cj) = |gi ∩ gj| / |gi ∪ gj|.

Finally, the collocational weight and bag-of-words weight are averaged to derive the edge weight between two contexts, as defined in Equation 2. The resulting graph of our running example is shown at the bottom of Figure 2. This graph is the input to the CW clustering algorithm. Initially, CW assigns all vertices to different classes. Each vertex i is processed for x iterations and inherits the strongest class in its local neighborhood LN in an update step, where LN is defined as the set of vertices which share a direct connection with vertex i. During the update step for a vertex i, each class Ck receives a score equal to the sum of the weights of the edges (i, j) for which j has been assigned class Ck. The maximum score determines the strongest class. In case of multiple strongest classes, one is chosen randomly. Classes are updated immediately, which means that a node can inherit classes from its LN that were introduced in the same iteration.

Experimental setting: The parameters of the WSI method were fine-tuned on the nouns of the SemEval-2007 word sense induction task (Agirre and Soroa, 2007) under the second evaluation setting of that task, i.e. supervised (WSD) evaluation. We tried the parameter combinations shown in Table 1 and selected the combination p1 = 15, p2 = 10, p3 = 0.05 that maximized the performance in this evaluation.

    Parameter                          Range
    G2 word threshold (p1)             15, 25, 35, 45
    G2 collocation threshold (p2)      10, 15, 20
    Edge similarity threshold (p3)     0.05, 0.09, 0.13

    Table 1: WSI parameter values.

3.1.2 Cluster selection

The application of WSI to the nouns n ∈ N and n' ∈ N results in two sets of clusters (senses) S(n) and S(n'). Each cluster s_i^n ∈ S(n) is a set of contexts of the word n. Each context is represented as an exemplar e, a vector specific to the context. Only the 10,000 most frequent words in ukWaC (along with their part-of-speech category) are treated as valid co-occurrences, i.e. the dimensionality of the vector space is 10,000. For example, the exemplar of hunting in the context "the-x purpose-n of-i autumn-n hunting-n be-v in-i part-n to-x cull-v the-x number-n of-i young-j autumn-n fox-n" is <purpose-n:1, autumn-n:2, part-n:1, cull-v:1, number-n:1, young-j:1, fox-n:1>.

For every cluster s_i^n in S(n) we construct a prototype vector by taking the centroid of all the exemplars in the cluster. Following Mitchell and Lapata (2008), the context-word components of the prototype vector are set to the ratio of the probability of the context word given the target word to the overall probability of the context word.1

The next step is to choose the relevant sense of each constituent for a given compound. We assume that the meaning of a compound noun can be approximated by identifying the most similar senses of its constituent nouns. Accordingly, all the pairwise similarities between the prototype vectors of the clusters of n and n' are calculated using cosine similarity, and the pair with maximum similarity is chosen for composition.
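The cluster-selection step just described can be sketched as follows (our own sketch, assuming plain cosine similarity between cluster centroids; the function names and the toy two-dimensional clusters are invented):

```python
import numpy as np

def centroid(exemplars):
    """Prototype vector of a sense cluster: centroid of its exemplars."""
    return np.mean(exemplars, axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_senses(clusters_n, clusters_n2):
    """Return the pair (i, j) of sense clusters of n and n' whose
    prototype vectors (centroids) have maximum cosine similarity."""
    protos_n = [centroid(c) for c in clusters_n]
    protos_n2 = [centroid(c) for c in clusters_n2]
    pairs = ((i, j) for i in range(len(protos_n))
                    for j in range(len(protos_n2)))
    return max(pairs, key=lambda ij: cosine(protos_n[ij[0]], protos_n2[ij[1]]))

# Toy exemplar clusters: sense 1 of each noun points along the second
# axis, so the selected pair is (1, 1).
clusters_n = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0]]]
clusters_n2 = [[[1.0, 0.0]], [[0.0, 1.0]]]
best = select_senses(clusters_n, clusters_n2)
```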
We use the ukWaC corpus (Ferraresi et al., 2008) to retrieve all the instances of the target words.

1 This is similar to pointwise mutual information without the logarithm.

3.2 Dynamic Prototype Based Sense Selection

Kilgarriff (1997) argues that representing a word with a fixed set of senses is not a good way of modelling word senses. Instead, word senses should be defined according to a given context. We propose a dynamic way of building word senses for the constituents of a given compound.

We use an exemplar-based approach to build the dynamic sense of a constituent with the help of the other constituent.
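As a concrete picture of the exemplar representation used in both methods, the following small sketch (ours; the helper name and the tiny vocabulary are invented) turns the hunting context shown in section 3.1.2 into a sparse exemplar vector:

```python
from collections import Counter

def exemplar_vector(context_tokens, target, vocab):
    """One exemplar = co-occurrence counts for a single corpus instance,
    restricted to an allowed vocabulary (the 10,000 most frequent ukWaC
    words in the paper) and excluding the target itself."""
    return Counter(t for t in context_tokens if t != target and t in vocab)

# The hunting context from section 3.1.2, as POS-tagged lemmas.
context = ["the-x", "purpose-n", "of-i", "autumn-n", "hunting-n", "be-v",
           "in-i", "part-n", "to-x", "cull-v", "the-x", "number-n",
           "of-i", "young-j", "autumn-n", "fox-n"]
vocab = {"purpose-n", "autumn-n", "part-n", "cull-v",
         "number-n", "young-j", "fox-n"}
e = exemplar_vector(context, "hunting-n", vocab)
# e counts autumn-n twice and every other in-vocabulary word once,
# matching the exemplar shown in the text.
```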

In exemplar-based modelling (Erk and Padó, 2010; Smith and Medin, 1981), each word is represented by all its exemplars without conflating them into a single vector. Depending upon the purpose, only relevant exemplars of the target word are activated. Exemplar-based models are more powerful than prototype-based ones because they retain specific instance information. As described in the previous section, an exemplar is a vector that represents a single instance of a given word in the corpus.

Let En be the set of exemplars of the word n. Given a compound N with constituents n and n', we remove irrelevant exemplars in En, creating a refined set En^n' ⊆ En, with the help of the other constituent word n'. The prototype vector n^n' of n is then built from the centroid of the refined exemplar set En^n'. The vector n^n' represents the relevant prototype vector (sense) of n in the presence of the other constituent word n'. Unlike the static prototypes defined in the previous section, the prototype vectors of n and n' are built dynamically based on the given compound. Therefore, we refer to them as dynamic prototype vectors.

3.2.1 Building Dynamic Prototypes

We demonstrate our method of building dynamic prototypes with an example. Let us take the compound traffic light. Let Traffic, Light and TrafficLight denote the prototype vectors of traffic, light and traffic light respectively. The word light occurs in many contexts, such as quantum theory, optics, lamps and spiritual theory. In ukWaC, light occurs with 316,126 exemplars. Figure 3 displays six random sentences of light from ukWaC. None of these exemplars is related to the target compound traffic light. When a prototype vector of light is built from all its exemplars, irrelevant exemplars add noise, increasing the semantic differences between traffic light and light and thereby increasing the semantic differences between TrafficLight and Traffic ⊙ Light. This is not desirable. The cosine similarity sim(Light, TrafficLight) is found to be 0.27.

    Figure 3: Six random sentences of light from ukWaC.

We aim to remove irrelevant exemplars of light with the help of the other constituent word traffic, and then build a prototype vector of light which is related to the compound traffic light. Our intuition and motivation for exemplar removal is that it is beneficial to choose only the exemplars of light which have context words related to traffic, since the exemplars of traffic light will have context words related to both traffic and light. For example, car, road and transport will generally be found within the contexts of all the words traffic, light and traffic light.

We rank each exemplar of light with the help of collocations of traffic. Collocations of traffic are defined as the context words which frequently occur with traffic, e.g. car, road etc. The exemplar of light representing the sentence "Cameras capture cars running red lights . . ." will be ranked higher than one which does not have context words related to traffic. We use Sketch Engine2 (Kilgarriff et al., 2004) to retrieve the collocations of traffic from ukWaC. Sketch Engine computes the collocations using the Dice metric (Dice, 1945). We build a collocation vector Traffic_colloc from the collocations of traffic.

We also rank each exemplar of light using the distributionally similar words of traffic, i.e. words which are similar to traffic, e.g. transport, flow etc. These distributionally similar words help to reduce the impact of data sparseness and help prioritize the contexts of light which are semantically related to traffic. Sketch Engine is again used to retrieve distributionally similar words of traffic from ukWaC. Sketch Engine ranks similar words using the method of Rychlý and Kilgarriff (2007). We build the vector Traffic_similar, which consists of the similar words of traffic.

Every exemplar e from the exemplar set E_light3 is finally ranked by

    sim(e, Traffic_colloc) + sim(e, Traffic_similar)

2 http://www.sketchengine.co.uk
3 In E_light, we do not include the sentences which have the compound noun traffic light occurring in them.
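The ranking-and-activation procedure of this subsection can be sketched as follows (our code, not the authors'; sim is assumed to be cosine similarity, and the toy vectors are invented, with the first two dimensions playing the role of traffic-related features such as car and road):

```python
import numpy as np

def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(np.dot(u, v) / (nu * nv)) if nu and nv else 0.0

def dynamic_prototype(exemplars, colloc_vec, similar_vec, top_percent):
    """Rank each exemplar e by sim(e, colloc) + sim(e, similar),
    keep the top n% of exemplars, and return their centroid:
    the dynamic prototype of section 3.2.1."""
    scores = [cosine(e, colloc_vec) + cosine(e, similar_vec)
              for e in exemplars]
    order = np.argsort(scores)[::-1]  # highest-ranked exemplars first
    k = max(1, int(len(exemplars) * top_percent / 100))
    return np.mean([exemplars[i] for i in order[:k]], axis=0)

# Dimensions: (car, road, photon, lamp). Exemplar 0 looks like a
# traffic-related use of "light", exemplar 1 like an optics use.
exemplars = [np.array([5.0, 3.0, 0.0, 0.0]),
             np.array([0.0, 0.0, 4.0, 6.0]),
             np.array([2.0, 2.0, 1.0, 0.0])]
traffic_colloc = np.array([1.0, 1.0, 0.0, 0.0])
traffic_similar = np.array([1.0, 0.5, 0.0, 0.0])
light_traffic = dynamic_prototype(exemplars, traffic_colloc,
                                  traffic_similar, top_percent=34)
# Only the traffic-like exemplar survives the 34% cut, so the dynamic
# prototype of "light" given "traffic" is its centroid [5, 3, 0, 0].
```

With top_percent = 100 the sketch degenerates to the static single prototype, which is why small activation percentages are the interesting regime.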

We choose the top n% of the ranked exemplars in E_light to construct a refined exemplar set E_light^traffic. A prototype vector Light^traffic is then built by taking the centroid of E_light^traffic. Light^traffic denotes the sense of light in the presence of traffic. Since the sense of light is built dynamically based on the given compound (here traffic light), we define Light^traffic as the dynamic prototype vector. The similarity sim(Light^traffic, TrafficLight) is found to be 0.47, which is higher than the initial similarity 0.27 of Light and TrafficLight. This shows that our new prototype vector of light is closer to the meaning of traffic light.

Similarly, we build the dynamic prototype vector Traffic^light of traffic with the help of light. The dynamic prototypes Traffic^light and Light^traffic are used for semantic composition to construct v(TrafficLight).

4 Composition functions

Given a compound, we perform composition using the sense-based prototypes selected in the above section. We use the composition functions ADD and MULT described in Equation 1. For the function ADD, we use equal weights for both constituent words, i.e. α = β = 1. For the function MULT there are no parameters.

5 Evaluation

Mitchell and Lapata (2010) introduced an evaluation scheme for semantic composition models. We evaluate on their dataset, describe the evaluation scheme, and present the results of various models.

5.1 Dataset

Mitchell and Lapata (2010) prepared a dataset4 which contains pairs of compound nouns and their similarity judgments. The dataset consists of 108 compound noun pairs, with each pair having 7 annotations from different annotators who judge the pair for similarity. A sample of 5 compound pairs is displayed in Table 2.

    N                   N'
    phone call          committee meeting
    phone call          committee meeting
    football club       league match
    health service      bus company
    company director    assistant manager

    Table 2: Evaluation dataset of Mitchell and Lapata (2010).

5.2 Evaluation Scheme

For each pair of compound nouns, the mean value of all its annotations is taken to be the final similarity judgment of the pair. Let N and N' be a pair. To evaluate a model, we calculate the cosine similarity between the composed vectors v(N) and v(N') obtained from the composition on the sense-based prototypes generated by the model. These similarity scores are correlated with the human mean scores to judge the performance of the model.

5.3 Models Evaluated

We evaluate all the models w.r.t. the composition functions ADD and MULT.

Static Single Prototypes: This model does not perform any sense disambiguation and is similar to the method described in Mitchell and Lapata (2008). The prototype vector of each constituent, formed by conflating all its instances, is used to compose the vector of the compound.

Static Multi Prototypes: In the method described in section 3.1, word sense induction produces a large number of clusters, i.e. static multi prototypes. We tried various parameters, such as choosing the target prototype of a constituent only from the top 5 or 10 largest clusters.

Dynamic Prototypes: In the method described in section 3.2, the dynamic prototype of a constituent is produced from the top n% of the ranked exemplar set of the constituent. We tried various percent activation (n%) values: 2%, 5%, 10%, 20%, 50%, 80%.

Compound Prototype: We directly use the corpus instances of a compound to build the prototype vector of the compound. This method does not involve any composition. Ideally, one expects this model to give the best performance.

Static Multi Prototypes with Guided Selection: This is similar to the Static Multi Prototypes model except in the way we choose the relevant prototype for each constituent. In section 3.1.2 we described an unsupervised way of prototype selection from multi prototypes. Unlike there, here we choose the constituent prototype (sense) which has the highest similarity to the prototype vector of the compound. This is a guided way of sense selection since we are using the compound prototype vector, which is built from the compound's corpus instances. The performance of this model gives us an idea of the upper boundary of multi-prototype models for semantic composition.

4 We would like to thank Jeff Mitchell and Mirella Lapata for sharing the dataset.
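The evaluation scheme of section 5.2 can be sketched as follows (our code; spearman is hand-rolled as the Pearson correlation of average ranks so that the sketch needs nothing beyond NumPy, and the toy scores are invented):

```python
import numpy as np

def rankdata(x):
    """1-based ranks; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of tie group
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    ra, rb = rankdata(a), rankdata(b)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Model score per pair = cosine similarity of the two composed vectors;
# the model is then judged by rank correlation with the human means.
model_scores = [0.9, 0.2, 0.6, 0.4]
human_means = [6.1, 1.5, 4.8, 2.9]
rho = spearman(model_scores, human_means)  # 1.0: identical rankings here
```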

5.4 Results and Discussion

All the above models are evaluated on the dataset described in section 5.1. Table 3 displays the Spearman correlations of all these models with the human annotations (mean values).

    Parameter description                              ADD      MULT
    Static Single Prototypes                          0.5173      -
    Static Multi Prototypes
      Top 5 clusters                                  0.1171      -
      Top 10 clusters                                 0.0663      -
    Dynamic Prototypes
      Top 2% exemplars                                0.6261      -
      Top 5% exemplars                                0.6326      -
      Top 10% exemplars                               0.6402      -
      Top 20% exemplars                               0.6273      -
      Top 50% exemplars                               0.5948      -
      Top 80% exemplars                                 -         -
    Static Multi Prototypes with Guided Selection
      Top 5 clusters                                  0.2290   0.4187
      Top 10 clusters                                 0.2710   0.4140
    Compound Prototype                                0.4152

    Table 3: Spearman correlations of model predictions with human predictions.

The results of the Static Single Prototypes model are consistent with the previous findings of Mitchell and Lapata (2010), in which MULT performed better than ADD.

All the parameter settings of Dynamic Prototypes outperformed Static Single Prototypes. This shows that selecting the relevant sense prototypes of the constituents improves semantic composition. We also observe that the highest correlation is achieved by including just the top 2% of exemplars for each constituent. It seems that as the sample of exemplars increases, noise increases as well, and this results in worse performance.

The comparison between Static Single Prototypes and Static Multi Prototypes shows that the former performs significantly better than the latter. This is not according to our expectation. A possible reason for the poor performance could be the sense selection process (section 3.1.2), which might have failed to choose the relevant sense of each constituent word.

However, Static Multi Prototypes with Guided Sense Selection still fail to perform better than Static Single Prototypes. Therefore, we can conclude that the lower performance of Static Multi Prototypes cannot be attributed to the sense selection process only. Besides that, the applied graph clustering method results in the generation of a very large number of clusters, some of which refer to the same word usage with subtle differences. Hence, our future work focuses on a selection process that chooses multiple relevant clusters of a constituent word. Additionally, our ongoing work suggests that the use of verbs as features in the graph creation process (section 3.1.1) causes the inclusion of noisy edges and results in worse clustering.

Our evaluation also shows that Dynamic Prototypes provide a better semantic composition than Static Multi Prototypes. The main reason for this result stems from the fact that Dynamic Prototypes explicitly identify the relevant usages of a constituent word with respect to the other constituent and vice versa, without having to deal with the issues that affect the performance of Static Multi Prototypes, such as the clustering and the sense selection process.

The performance of the Compound Prototype is lower than that of the compositional models. The reason could be data sparsity, which is known to be a major problem for modelling the meaning of compounds. In a way, the results are encouraging for compositional models.

In all these models, the composition function MULT gave better performance than ADD.

6 Conclusions

This paper presented two methods for dealing with polysemy when modelling the semantics of a noun-noun compound. The first one represents senses by creating static multi-prototype vectors, while the second represents the context-specific sense of a word by generating a dynamic prototype vector. Our experimental results show that: (1) sense disambiguation improves semantic composition, and (2) dynamic prototypes are a better representation of senses than static multi prototypes for the task of semantic composition.

In the future, we would like to explore other static multi-prototype approaches, such as those of Reisinger and Mooney (2010) and Klapaftis and Manandhar (2010a), in comparison with dynamic prototypes. Dynamic prototypes are found to be particularly encouraging since they present a mechanism for sense representation that differs from traditional methods.

References

Chris Biemann. 2006. Chinese Whispers - an efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of TextGraphs, pages 73-80, New York, USA.

Lee R. Dice. 1945. Measures of the amount of ecologic association between species. Ecology, 26(3):297-302.

Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61-74.

Katrin Erk and Sebastian Padó. 2008. A structured vector space model for word meaning in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 897-906, Stroudsburg, PA, USA. Association for Computational Linguistics.

Katrin Erk and Sebastian Padó. 2010. Exemplar-based models for word meaning in context. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort '10, pages 92-97, Stroudsburg, PA, USA. Association for Computational Linguistics.

Adriano Ferraresi, Eros Zanchetta, Marco Baroni, and Silvia Bernardini. 2008. Introducing and evaluating ukWaC, a very large web-derived corpus of English. In Proceedings of the WAC4 Workshop at LREC 2008, Marrakesh, Morocco.

Adam Kilgarriff, Pavel Rychly, Pavel Smrz, and David Tugwell. 2004. The Sketch Engine. In Proceedings of EURALEX 2004.
