Variation Among Blogs: A Multi-dimensional Analysis
by Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova
In: Genres on the Web: Computational Models and Empirical Studies
Edited by Alexander Mehler, Serge Sharoff and Marina Santini
Text, Speech and Language Technology
Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9
This chapter uses multi-dimensional analysis to investigate functional linguistic variation in internet blogs, with the goal of identifying text types that are distinguished linguistically. A 2 million word corpus of blogs written in American English, sampled across a wide range of topics, is analyzed for this purpose. The corpus is tagged for grammatical information and a factor analysis is carried out to identify the major linguistic patterns of co-occurrence across this corpus. The resultant factors are interpreted as underlying dimensions of functional linguistic variation. The dimensions are subsequently used as predictors in a cluster analysis, which identifies the text types that are linguistically well-defined in this domain of use. These text types are interpreted functionally by reference to the typical thematic domains and communicative purposes of the blogs grouped into each type. Two main sub-types of blogs are identified: personal blogs and thematic blogs.
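The final step described above, grouping blogs by their dimension scores, can be illustrated with a small sketch. The abstract does not say which clustering algorithm the authors used, so the plain k-means below is an assumption, and the blog names, the two dimensions (`dim1`, `dim2`) and all scores are invented for illustration; they are not data from the chapter.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (labels, centroids) for a list of score tuples."""
    # Deterministic start (an assumption of this sketch):
    # use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical dimension scores (dim1, dim2) for six invented blogs.
blog_scores = {
    "blog_a": (2.1, -1.0),
    "blog_b": (-1.8, 1.2),
    "blog_c": (1.9, -0.8),
    "blog_d": (-2.2, 0.9),
    "blog_e": (2.4, -1.3),
    "blog_f": (-1.5, 1.1),
}

labels, centroids = kmeans(list(blog_scores.values()), k=2)
clusters = dict(zip(blog_scores, labels))
```

With these invented scores, `clusters` places `blog_a`, `blog_c` and `blog_e` in one group and `blog_b`, `blog_d` and `blog_f` in the other. In an actual analysis the resulting groups would then be interpreted by inspecting the blogs that fall into each one.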
A blog, short for weblog, is a website containing an archive of regularly updated online postings. The postings are generally made by one person and presented in reverse chronological order. The archive is generally made freely available to the public. The postings tend to consist primarily of raw text, but may also contain hyperlinks and other media, including picture, video and sound files. Blogs often allow readers to post comments as well. In terms of content, blogs appear to fall into one of two major types: personal blogs, in which an author discusses their own life, and thematic blogs, in which an author discusses a topic other than themselves. Popular subjects for thematic blogs include current events, politics, arts, entertainment, sports and technology, though in principle any topic is permissible.