— draft in progress —
In this blog post (which I will keep updating), I would like to pin down a working definition of digital genre that is appropriate for our computational experiments. The experiments I refer to are those that will be included in the forthcoming book “Computational Theory of Digital Genre”, which I announced a while ago. Together with Michael Oakes and Georgios Paltoglou (both at the University of Wolverhampton, UK), I am setting up experiments focussing on the computational modeling of the concept of digital genre.
The concept of genre is difficult to define in a simple way, because it inherits all the idiosyncrasies and ambiguities that characterize language and human communication in general. I therefore list here what we need to account for when it comes to a computational theory of digital genre. The concept of genre can be analyzed from many different angles. For instance, we have academic genres (those related to academic writing, such as conference papers and journal articles), web genres (those that we find on the web, such as personal home pages and blogs), press genres (the different types of articles we find in a newspaper, such as editorials, interviews, and letters to the editor), and so on.
In order to frame our experiments with digital documents, we start by saying that the concept of digital genre is characterized by the following traits:
- A digital genre must have a name (e.g. “emails”, “search query logs”, “reviews”, “tweets” and “blogs” are genre names).
- A genre is recognized within a community (e.g. tweets are easily recognized by Twitter users, while this genre might be unknown to computer-illiterate people).
- A genre can be produced or retrieved during a task.
- A genre has specific conventions that are often reflected in the textual organization.
- A genre raises expectations about the document organization and the rhetorical purpose.
- A genre has a linguistic function and rhetorical purpose.
- There is a relation between the communicative situation in which a genre is used and the textual and linguistic traits of a specific genre.
- A document may be assigned to several genres.
- Genres can be classified at different levels of granularity (e.g. super genres, genres and sub genres).
- A genre can change over time. It is a cultural artifact (culture here includes society, media, technology, etc.).
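Some of the traits above map naturally onto a data structure. The following is only a minimal sketch, not the model used in our experiments: the class names `Genre` and `Document`, the `lineage` helper, and the example genre hierarchy are all assumptions made for illustration. It covers three traits: a genre has a name, genres sit at different levels of granularity, and a document may be assigned to several genres.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Genre:
    """A named genre, optionally nested under a broader genre (hypothetical model)."""
    name: str                          # a genre must have a name
    community: str = ""                # community that recognizes the genre
    parent: Optional["Genre"] = None   # broader genre, if any

    def lineage(self) -> List[str]:
        """Return genre names from the broadest level down to this genre."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


@dataclass
class Document:
    """A document that may carry several genre labels at once."""
    text: str
    genres: set = field(default_factory=set)


# Illustrative hierarchy: super genre -> genre -> sub genre (names are assumptions).
web = Genre("web genres")
blog = Genre("blogs", community="web users", parent=web)
personal_blog = Genre("personal blogs", parent=blog)
review = Genre("reviews", parent=web)

# One document assigned to two genres.
doc = Document("I tried the new phone last week ...", {personal_blog, review})
```

Calling `personal_blog.lineage()` walks up the parent links, so the same document can be examined at the super genre, genre or sub genre level. Traits such as change over time or the communicative situation are deliberately left out of this sketch.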
So far, the computational models used for Automatic Genre Identification have been based on a few types of features, such as character n-grams, part-of-speech tags, function words and content words.
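To make two of these feature types concrete, here is a small sketch using only the Python standard library. It is an illustration, not the feature extraction used in any particular study: the function names, the tokenization rule, and the tiny `FUNCTION_WORDS` list are assumptions. Part-of-speech features are omitted because they require a tagger.

```python
from collections import Counter
import re

# Assumption: a tiny illustrative function-word list, not a standard resource.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "i", "you")


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_freqs(text):
    """Relative frequency of each listed function word among all word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


sample = "The cat and the dog"
trigrams = char_ngrams(sample, n=3)
freqs = function_word_freqs(sample)
```

Each document then becomes a vector of such counts or frequencies, which a classifier can use to separate, say, blogs from reviews. Content-word features could be built the same way by counting the tokens that are not in the function-word list.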
Other Genre Definitions
Philipp Petrenz (2014) Cross-Lingual Genre Classification. PhD Thesis. University of Edinburgh (Institute for Language, Cognition and Computation, School of Informatics), UK.
Christophe Clugston (2013) Genre Analysis of Self Defense Web Advertisements. Master's Thesis. Payap University, Chiang Mai, Thailand.
Mikael Gunnarsson (2011) Classification along Genre Dimensions. Exploring a Multidisciplinary Problem. PhD Thesis. Swedish School of Library and Information Science, University of Borås, Sweden.
Jane Mason (2009) An N-Gram Based Approach to the Automatic Classification of Web Pages by Genre. PhD Thesis. Dalhousie University, Halifax, Nova Scotia, Canada.
List of genres
- The WebGenre Wiki (WGW)
- List of literary genres (Wikipedia)
- List of genres (Wikipedia)
- Media Genres
- A Brief List of Genres