Distributional Semantics applied to Flickr® Tags

Upcoming Publications

MARIANNA BOLOGNESI, International Center for Intercultural Exchange

Distributional Semantics meets Embodied Cognition: Flickr® as a database of semantic features
Selected Papers from the 4th UK Cognitive Linguistics Conference (in press)

Distributional models such as Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) generate semantic spaces based on words’ co-occurrences in linguistic contexts. The semantic representations that emerge from these models are based solely on linguistic information, leaving aside the information that we retrieve from perceptual experiences. The proposed analysis applies the methods of distributional semantics to Flickr®, a corpus of images annotated with metadata (tags) that express a wide range of concepts, including perceptual features triggered by the experiences captured in the photographs. A case study on the domain of colors shows that a distributional analysis based on Flickr® can produce semantic representations for color terms that resemble human similarity judgments more closely than those that emerge from distributional models based solely on linguistic information.
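As a rough illustration of how model similarities can be compared with human similarity judgments, the sketch below computes a Spearman rank correlation between the two. The abstract does not state which measure the study used, so rank correlation is an assumption here, and the color pairs, ratings and similarity scores are invented placeholders, not data from the paper.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical values for illustration only (not from the study).
pairs = [("red", "orange"), ("blue", "green"), ("red", "blue"), ("yellow", "black")]
human_ratings = [6.1, 5.0, 2.2, 1.5]       # invented similarity judgments
model_scores = [0.80, 0.65, 0.30, 0.35]    # invented model similarities

rho = spearman(human_ratings, model_scores)
print(f"Spearman rho = {rho:.2f}")
```

A higher correlation would indicate that the model orders the color pairs more like the human raters do; the same comparison could be repeated for a text-based model to see which of the two tracks the judgments more closely.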

Flickr® Distributional Tagspace: Evaluating the semantic spaces emerging from Flickr® tags distributions
Language and Cognition (submitted)

This study discusses a distributional method that aims to model grounded semantic representations starting from extensive analyses of concept co-occurrences across extra-linguistic contexts. The situations taken into account as extra-linguistic contexts are formalized as the sets of tags that internet users attach to the pictures they upload to Flickr®, the video/image hosting service powered by Yahoo!. The semantic structures that emerge from these open-ended, large-scale bodies of uncoordinated human annotations are investigated with Flickr® Distributional Tagspace (FDT), a distributional method that yields semantic spaces in which concepts that appear in similar situations cluster automatically (the more distributionally similar they are, the closer they appear in the semantic space). FDT is evaluated in two ways: (i) through a correlation study that measures how closely the semantic representations it creates match those that emerge from aggregations of features produced by speakers who were asked to list the properties of given concepts; (ii) through a categorization task that tests the ability of FDT to distinguish between predetermined semantic categories. The results suggest that FDT can generate semantic representations that correlate with those that emerge from aggregations of features, and can cluster homogeneous categories and subcategories of related concepts, thus competing with state-of-the-art distributional methods.
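The general idea of treating each photo's tag set as a context can be sketched as follows. This is a minimal toy example and not the FDT implementation: raw co-occurrence counts and cosine similarity are assumptions made for illustration, and the photos and tags are invented.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Invented toy data: each set holds the tags attached to one photo.
photos = [
    {"sea", "blue", "sky", "beach"},
    {"sky", "blue", "cloud"},
    {"grass", "green", "tree"},
    {"tree", "green", "forest", "leaf"},
    {"sea", "beach", "sand", "blue"},
]

def cooccurrence_vectors(tag_sets):
    """Map each tag to a Counter of the tags it shares a photo with."""
    vectors = {}
    for tags in tag_sets:
        for a, b in combinations(sorted(tags), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = cooccurrence_vectors(photos)
sim_blue_sky = cosine(vectors["blue"], vectors["sky"])
sim_blue_green = cosine(vectors["blue"], vectors["green"])
print(f"blue ~ sky:   {sim_blue_sky:.2f}")
print(f"blue ~ green: {sim_blue_green:.2f}")
```

In this toy space "blue" ends up closer to "sky" than to "green" because it shares photos with the same tags as "sky", which mirrors the claim that concepts appearing in similar situations lie closer together. A realistic setup would need far more photos and would typically weight the raw counts before comparing vectors.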
