Abstract: Variation Among Blogs: A Multi-dimensional Analysis

by Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova

In: Genres on the Web: Computational Models and Empirical Studies
Edited by Alexander Mehler, Serge Sharoff and Marina Santini
Text, Speech and Language Technology
Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9


This chapter uses multi-dimensional analysis to investigate functional linguistic variation in internet blogs, with the goal of identifying linguistically distinct text types. For this purpose, a 2-million-word corpus of blogs written in American English and sampled across a wide range of topics is analyzed. The corpus is tagged for grammatical information, and a factor analysis is carried out to identify the major patterns of linguistic co-occurrence across the corpus. The resulting factors are interpreted as underlying dimensions of functional linguistic variation. These dimensions are then used as predictors in a cluster analysis, which identifies the text types that are linguistically well defined in this domain of use. The text types are interpreted functionally by reference to the typical thematic domains and communicative purposes of the blogs grouped into each type. Two main sub-types of blogs are identified: personal blogs and thematic blogs.
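
The pipeline summarized above (feature rates, standardization, dimension scores, then clustering on those scores) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The factor analysis step is skipped. The two dimensions, their feature groupings, the feature names (first_person_pronouns, private_verbs, nouns, past_tense, present_tense) and all counts are hypothetical. Each dimension score sums the z-scores of the features with positive loadings and subtracts those of the features with negative loadings, a recipe commonly used in multi-dimensional analysis. Because the abstract does not name the clustering algorithm, plain k-means stands in for that step.

```python
import math
from statistics import mean, pstdev

# Hypothetical raw counts for four texts; every name and number here is
# illustrative only and does not come from the chapter's corpus.
CORPUS = {
    "blog_a": {"words": 1000, "first_person_pronouns": 60, "private_verbs": 25,
               "nouns": 150, "past_tense": 40, "present_tense": 50},
    "blog_b": {"words": 2000, "first_person_pronouns": 110, "private_verbs": 44,
               "nouns": 320, "past_tense": 90, "present_tense": 95},
    "blog_c": {"words": 1500, "first_person_pronouns": 6, "private_verbs": 9,
               "nouns": 390, "past_tense": 30, "present_tense": 105},
    "blog_d": {"words": 1000, "first_person_pronouns": 5, "private_verbs": 7,
               "nouns": 270, "past_tense": 15, "present_tense": 75},
}

# Hypothetical dimensions: features with salient positive / negative loadings.
# In a real analysis these groupings would come out of the factor analysis.
DIMENSIONS = {
    "dim1": {"pos": ["first_person_pronouns", "private_verbs"], "neg": ["nouns"]},
    "dim2": {"pos": ["past_tense"], "neg": ["present_tense"]},
}
FEATURES = sorted({f for d in DIMENSIONS.values() for f in d["pos"] + d["neg"]})


def normalized_rates(corpus, per=1000):
    """Convert raw feature counts to rates per `per` words of text."""
    return {name: {f: counts[f] / counts["words"] * per for f in FEATURES}
            for name, counts in corpus.items()}


def standardize(rates):
    """Convert each feature's rates to z-scores across all texts."""
    z = {name: {} for name in rates}
    for f in FEATURES:
        values = [r[f] for r in rates.values()]
        mu, sd = mean(values), pstdev(values)
        for name, r in rates.items():
            z[name][f] = (r[f] - mu) / sd if sd > 0 else 0.0
    return z


def dimension_scores(z):
    """Per text: sum of positive-feature z-scores minus negative-feature z-scores."""
    return {name: tuple(sum(zs[f] for f in d["pos"]) - sum(zs[f] for f in d["neg"])
                        for d in DIMENSIONS.values())
            for name, zs in z.items()}


def kmeans(points, k, iterations=100):
    """Plain k-means with a deterministic farthest-point initialization."""
    names = sorted(points)
    centroids = [points[names[0]]]
    while len(centroids) < k:
        farthest = max(names, key=lambda n: min(math.dist(points[n], c) for c in centroids))
        centroids.append(points[farthest])
    assignment = {}
    for _ in range(iterations):
        # Assign each text to its nearest centroid in dimension-score space.
        new_assignment = {
            n: min(range(k), key=lambda c: math.dist(points[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [points[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return assignment


if __name__ == "__main__":
    scores = dimension_scores(standardize(normalized_rates(CORPUS)))
    for name, score in sorted(scores.items()):
        print(name, [round(s, 2) for s in score])
    print(kmeans(scores, k=2))
```

With these toy counts, the two texts with high rates of first-person pronouns and private verbs fall into one cluster and the two noun-heavy texts fall into the other. This loosely mirrors the personal/thematic split described above. A real study would use many more tagged features and would take the groupings from the factor loadings rather than fixing them by hand.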

1. Introduction

A blog (short for weblog) is a website containing an archive of regularly updated online postings. The postings are usually written by a single person and presented in reverse chronological order, and the archive is generally freely available to the public. Postings consist primarily of text but may also contain hyperlinks and other media, including picture, video and sound files. Many blogs also allow readers to post comments. In terms of content, blogs appear to fall into two major types: personal blogs, in which authors discuss their own lives, and thematic blogs, in which authors discuss a topic other than themselves. Popular subjects for thematic blogs include current events, politics, the arts, entertainment, sports and technology, though in principle any topic is possible.

