
The WebGenre Blog: The power of genre applied to digital information. By Marina Santini

Working Definition of Digital Genre (II)

Last Updated: 22 June 2014, 26 June 2014, 3 July 2014

— draft in progress —
In this blog post (which I will keep updating), I would like to pin down a working definition of digital genre that is appropriate for our computational experiments. The experiments I refer to are those that will be included in the forthcoming book “Computational Theory of Digital Genre”, which I announced a while ago. Together with Michael Oakes and Georgios Paltoglou (both at the University of Wolverhampton, UK), I am setting up experiments focussing on the computational modeling of the concept of digital genre.

Since the concept of genre is difficult to define in a simple way, because it inherits all the idiosyncrasies and ambiguities that characterize language and human communication in general, I list here what we need to account for when it comes to a computational theory of digital genre. The concept of genre can be analyzed from many different angles.  For instance, we have academic genres (e.g. those related to academic writing, such as conference papers and journal articles), web genres (e.g. all those genres that we find on the web, such as personal home pages and blogs), press genres (e.g. the different types of articles we find in a newspaper, such as editorials, interviews, and letters to the editor), and so on.

In order to frame our experiments with digital documents, we start by saying that the concept of digital genre is characterized by the following traits:

  1. A digital genre must have a name (e.g. “emails”, “search query logs”, “reviews”, “tweets” and “blogs” are genre names).
  2. A genre is recognized within a community (e.g. tweets are easily recognized by Twitter’s subscribers, while this genre might be unknown to computer-illiterate people).
  3. A genre can be produced or retrieved during a task.
  4. A genre has specific conventions that are often reflected in the textual organization.
  5. A genre raises expectations about the document organization and the rhetorical purpose.
  6. A genre has a linguistic function and rhetorical purpose.
  7. There is a relation between the communicative situation in which a genre is used and the textual and linguistic traits of a specific genre.
  8. A document may be assigned to several genres.
  9. Genres can be classified at different levels of granularity (e.g. super genres, genres and sub genres).
  10. A genre can change over time. It is a cultural artifact (culture here includes society, media, technology, etc.).

 

So far, the computational models used for Automatic Genre Identification have been based on a few types of features, such as character n-grams, parts of speech, function words, and content words.
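To make two of these feature types concrete, here is a minimal sketch in Python of how character n-grams and function-word rates could be extracted from a text. It is only an illustration under stated assumptions, not the setup of our experiments: the function names and the very short `FUNCTION_WORDS` list are hypothetical, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter

# Hypothetical, very short function-word list; a real experiment would use a fuller one.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a", "is", "that")


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_rates(text):
    """Relative frequency of each function word among all whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def extract_features(text, n=3):
    """Merge both feature types into one flat dictionary of named features."""
    features = {f"char:{gram}": count for gram, count in char_ngrams(text, n).items()}
    features.update({f"fw:{word}": rate for word, rate in function_word_rates(text).items()})
    return features
```

A dictionary like the one returned by `extract_features` could then be turned into a vector and passed to any standard classifier. Part-of-speech features would need a tagger and are left out of this sketch.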

——
Other Genre Definitions
Genre (wikipedia)

Latest Research

Petrenz Philipp (2014) Cross-Lingual Genre Classification. PhD Thesis. University of Edinburgh (Institute for Language, Cognition and Computation School of Informatics), UK.

Christophe Clugston (2013) GENRE ANALYSIS OF SELF DEFENSE WEB ADVERTISEMENTS. Master Thesis. Payap University, Chiang Mai, Thailand.

Gunnarsson Mikael (2011) Classification along Genre Dimensions. Exploring a Multidisciplinary Problem. PhD Thesis. Swedish School of Library and Information Science, University of Borås, Sweden.

Mason Jane (2009) AN N-GRAM BASED APPROACH TO THE AUTOMATIC CLASSIFICATION OF WEB PAGES BY GENRE. PhD Thesis. Dalhousie University Halifax, Nova Scotia, Canada.

List of genres


12 Responses to “Working Definition of Digital Genre (II)”

  1. christophe Clugston says:

    I think it is point 8 that gives the most problems. There seems to be the need to finally decide if a text is leaning in a certain direction. I feel this is via its purpose and the purpose the reader (or searcher) is trying to accomplish.

  2. In my experience, “the reader” is not a single-opinionated unit. “The readerS” tend to disagree when they are asked to “name” and identify the genre of a document. See the results of this study: Zero, single, or multi? genre of web pages through the users’ perspective. Information Processing and Management (2008), and also Mark Rosso’s research (although Mark is more inclined to focus on the agreement rather than the disagreement).

  3. christophe Clugston says:

    Web marketers use very specific methods to get targeted traffic for their genre. Someone who is buying does have a purpose; this is why I feel advertising is very different from other genres. Text analysis of advertising has gravitated towards purpose as the unifying genre feature: to sell.

  4. marina santini says:

    Selling is one strong motivation… However, it is not the only driving force behind genre classification. E.g. just think of the huge work done by librarians and by information scientists for digital libraries… There are many genres and environments that are still profit-independent…

  5. marina santini says:

    From Applied Linguistics (LinkedIn Group)

    John House
    Founder at Meshiareni, LLC

    You might want to ask Kara McBride at Saint Louis University this question. You might also ask Rod Mitchell.
    ——————————–
    William Charlton
    Owner of etyMonda

    This is a classic (no pun intended) classification problem. We all know what a dog is and a cat is but what defines the dog genre versus the cat genre?

    One way to develop a set of classification rules is to list as many genres as seem apparent and then to apply Attributes and Values.
    So for example

    Genre   Attribute               Value
    Email   Transmission protocol   SMTP
    Email   Size limit (Mb)         About 10
    Email   Encoding                ASCII (Base 64); HTML
    SMS     Transmission protocol   MAP
    Email   Platform                Multiple PC/Phone/online
    Email   Workspace               MS Outlook; Hotmail; Gmail; Eudora
    Tweet   Workspace               Twitter
    Tweet   Size limit (Chars)      140 chars

    and so on.

    Eventually patterns will emerge and you may find that some terminology (i.e. Workspace) is wrong and needs modifying. You might also find that some (seemingly defining) characteristic (e.g. 140 chars for Twitter) has no defining value for any of the other genres and so Char length is actually not a relevant attribute. You may also find that some genres are just part of another genre, etc.
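As an illustration of this attribute-and-value exercise, the following Python sketch stores the rows listed above as (genre, attribute, value) triples and separates attributes that are filled in for several genres from attributes that occur for one genre only. The function names are hypothetical, and treating "defined for a single genre" as a sign of a possibly irrelevant attribute is an assumption about what the comment means, not a rule stated in it.

```python
from collections import defaultdict

# (genre, attribute, value) triples transcribed from the rows listed in the comment.
RECORDS = [
    ("Email", "Transmission protocol", "SMTP"),
    ("Email", "Size limit (Mb)", "About 10"),
    ("Email", "Encoding", "ASCII (Base 64); HTML"),
    ("SMS", "Transmission protocol", "MAP"),
    ("Email", "Platform", "Multiple PC/Phone/online"),
    ("Email", "Workspace", "MS Outlook; Hotmail; Gmail; Eudora"),
    ("Tweet", "Workspace", "Twitter"),
    ("Tweet", "Size limit (Chars)", "140 chars"),
]


def genres_per_attribute(records):
    """Map each attribute to the set of genres that define a value for it."""
    coverage = defaultdict(set)
    for genre, attribute, _value in records:
        coverage[attribute].add(genre)
    return dict(coverage)


def split_attributes(records):
    """Separate attributes shared by several genres from single-genre ones."""
    coverage = genres_per_attribute(records)
    shared = sorted(a for a, genres in coverage.items() if len(genres) > 1)
    isolated = sorted(a for a, genres in coverage.items() if len(genres) == 1)
    return shared, isolated


shared, isolated = split_attributes(RECORDS)
print("Shared across genres:", shared)
print("Defined for one genre only:", isolated)
```

With only three genres the split says little, but as more rows are added the shared attributes become candidates for defining dimensions, while the isolated ones are candidates for renaming or removal.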

    This is because to define a genre first you need to define the definitions.

    It is what humans do all the time on the fly, as in the dog v cat example.
    ———————
    William Charlton
    PS This is a non-trivial exercise.
    See http://www.nature.com/news/linnaeus-s-asian-elephant-was-wrong-species-1.14063
    and
    http://www.nhm.ac.uk/nature-online/science-of-natural-history/taxonomy-systematics/history-taxonomy/session1/
    ——————————————–
    William Charlton
    Owner of etyMonda

    As I have said, this is a non-trivial task.
    You will also need to consider if these “genres” are a fixed part of any sort of taxonomy and if so what is the structure of this taxonomy.

    Since the reason for this genrefication is to separate digital output in ways that distinguish the different forms so that they can be systematically analysed, this would be my starting point if I was involved in this task.
    So probably Encoding (including some indication of formatting extents), Delimiters, Audience, Usage (a bit vague but probably needed), Language (Natural), Language (Code), Message Type (again vague but I suspect something like this would be needed. Possible examples: Opinion, News, Report, Educative, Log).
    Message Type may = rhetorical purpose but I would not be sure.

    By formatting extents I mean, for example: CSV is generally ASCII but Excel allows formatting and images.
    Since CSV allows other encodings (UTF-8, UTF-16, etc.) but the two (plus any other spreadsheet formats) share a similar column – row paradigm, there may be a shared genre/ubergenre.

    It's a bit of a minefield and it is, as I’m sure is realised, useful to see how other genres are classified. I would look deeply at music genres and see how well they work for analysers of music. Jimmy Iovine (Dr Dre's partner in Beats) has studied this and it is complex but he claims some success.
    http://evolver.fm/2013/07/01/preview-whats-up-with-project-daisy-music-service/
    Whether he uses fixed genres or not is not known.
    ————–
    Marina Santini

    Interesting, William. I will append your reflections as comments to the blog post, if you do not mind, so other people outside this LinkedIn group can think about the problems you describe.
    As far as I am concerned, I would say that it is a drawback to use a single-genre classification approach that traps our mind in a tight dichotomy of the type: is it a dog or is it a cat? This approach fails when we have to decide (semantically) if a penguin is a bird or not. Does it fly like all the other birds? If not, we are compelled to say that it is not a bird…

    With digital documents and with music, I think the easy solution would be to go for a multi-label classification, so we are not compelled to take a stiff stance. For instance a personal home page can be a: home page, a piece of narrative, a short bio and a professional profile, etc, independently from audience, format and other facets. Similarly a piece of music could be classified as: jazz, blues and gospel (well… if it is the case :-) ). Just focussing on one single genre label might be misleading and inappropriate for many artifacts.
    You also propose a multilabel classification, but your labels have different facets, like audience, format etc. Which is also good, but not enough (in my experience)…
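The multi-label idea discussed here can be sketched in a few lines of Python. This is a hedged illustration only: the label inventory, the function names and the 0.5 threshold are assumptions made for the example, and the scores would have to come from some classifier that is not shown.

```python
# Hypothetical label inventory; the first four labels come from the home page example above.
GENRE_LABELS = ["home page", "narrative", "short bio", "professional profile", "blog"]


def to_indicator_vector(assigned, labels=GENRE_LABELS):
    """Encode a set of genre labels as a 0/1 vector, one slot per known label."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"unknown genre labels: {sorted(unknown)}")
    return [1 if label in assigned else 0 for label in labels]


def from_scores(scores, threshold=0.5):
    """Keep every label whose score reaches the threshold (zero, one, or many labels)."""
    return {label for label, score in scores.items() if score >= threshold}


# A personal home page that is also a short bio and a professional profile.
page_genres = {"home page", "short bio", "professional profile"}
print(to_indicator_vector(page_genres))
```

Because each label is decided independently, a document can end up with no genre, one genre, or several, so nobody is forced into the dog-or-cat dichotomy.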
    ——————————-
    Katerina Xafis
    Training

    It’s interesting to compare the traits of digital genre with Steen’s attributes for genre:
    * domain (eg religion, science etc)
    * medium (eg spoken, written etc)
    * content (eg topics, themes)
    * forms (eg text structural patterns)
    * function (eg informative, persuasive etc)
    * type/rhetorical categories (exposition, description, narrative, argumentation)
    * register (linguistic characteristics)

    Steen, G. (1999). Genres of discourse and the definition of literature. Discourse Processes, 28, 109-120.

    David Lee also pondered seriously over the definition of genre : http://llt.msu.edu/vol5num3/lee/default.html

    ———————-
    William Charlton
    Owner of etyMonda

    Marina, The penguin is a good example; which illustrates how it is easy to make the mistake of classification on one attribute. However, using the process I described above, a researcher would have also identified: mouth type = beak and forelegs = wing and possibly bone structure = hollow bones.
    Most undoubtedly multiple “tags” will be needed to usefully categorise genres.

    Katerina’s heads-up is also very useful.

    Of course you can use any of my input here, if you think it helps.

    Interestingly, I read a review of George Ezra http://www.independent.co.uk/arts-entertainment/music/the-bonus-track-billiemarten-strand-of-oaks-randall-wulff-george-ezra-9562402.html where his style was described as bright and breezy “pop” http://www.youtube.com/watch?v=VHrLPs3_1Fs
    That is a very superficial definition and completely misses the:
    Blues component
    http://www.youtube.com/watch?v=tea42XFGhPs
    Or the rock
    https://www.youtube.com/watch?v=hqn3aemN5V4
    Or Folk elements
    https://www.youtube.com/watch?v=8Mt5d5hSyv8
    On the strength of Simmy Richman’s review, who lamely accedes that he is too old for “this type” of music, I would not have given Ezra any more notice. Luckily I already knew his work; which is why I found the review so odd.

    So classification, especially facile classification can have dramatic impact. Better to be broad than narrow.

    —————–
    Katerina Xafis
    Training

    Quite right William – genre classification is serious business and the resulting audience expectation is the driving force in the world of entertainment.

    The ‘hybrid’ or ‘cross-generic’ genre is what you are referring to and it is particularly prevalent in the film industry as it attracts different kinds of groups by raising a much broader range of audience expectation. What makes this genre stand out is its potential to include new attributes, which lend themselves to a need for re-classification and, ultimately, the creation of new genres: http://media.litmuse.net/essays/genre-its-hybrid

    ————-
    Katerina Xafis
    Training

    Hi Marina,

    I am trying to come to terms with the first quality of a digital genre:

    “A digital genre must have a name (e.g, “emails”, “search query logs”, “reviews”, “tweets” and “blogs” are genre names).” from Marina

    Let me explain why. Over 40% of tweets are considered ‘babble’, while another 40% or so are conversational, and a smaller percentage is news etc. The problem with this kind of classification is that news is found in the digital press as well, while conversational postings are found in many different social media, and ‘babble’ is commonplace even among… babies. When it comes to blogs, they can be academic or informative, much like an updated encyclopaedia, they can be exploratory, or they can be totally personal like a diary, or full of ads and have a promotional quality about them. The fact is blogs evolve, much like their audience does, or die out depending on why they first came to be and whether the owner wants to/can keep updating them. Besides, books (of all kinds, and e-books for that matter) are all books but they are not a genre, so why are tweets or blogs considered genres?
    ———————–

    Marina Santini

    Good and complicated question, Katerina… I will try to answer with some hints, since I am still pondering a lot on the definition of digital genre.

    One main confusion comes from the language itself, as stressed in the article by David Lee that you also value as an important reference. Correctly, Lee points out: “Much of the confusion comes from the fact that language itself sometimes fails us, and we end up using the same words to describe both language (register or style) and category (genre). For example, “conversation” can be a register label (“he was talking in the conversational register”), a style label (“this brochure employs a very conversational style”), or a genre label (“the [super-]genre of casual/face-to-face conversations,” a category of spoken texts).”

    Another confusion comes from genre granularity (look at point 9 of the working definition). Personally, I think about genre granularity in terms of prototype theory, with 3 main levels: super genre, genre and sub genre. Now, would you say that a blog and a tweet are at the same genre level? Or would you put the tweet at the level of a “blog post”? At which level would the news genre be then? Super genre, maybe? Then you could compare, say, news vs literature. Within super genres there can be many genres. Within news, you would probably include editorials, letters to the editor, reports, reportages, feature articles, etc. Within literature, you have poems, novels, short stories, etc.
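The three levels of granularity can be sketched as a small nested structure in Python. This is only an illustration under assumptions: the variable and function names are hypothetical, the genres under news and literature are the examples given in this comment, and the sub genre lists are left empty because no sub genres are named here.

```python
# Three levels: super genre -> genre -> list of sub genres (none are named in the comment).
GENRE_TREE = {
    "news": {
        "editorial": [],
        "letter to the editor": [],
        "report": [],
        "reportage": [],
        "feature article": [],
    },
    "literature": {
        "poem": [],
        "novel": [],
        "short story": [],
    },
}

LEVEL_NAMES = ("super genre", "genre", "sub genre")


def path_to(label, tree=GENRE_TREE):
    """Return the chain of labels from the super genre down to the given label."""
    for super_genre, genres in tree.items():
        if label == super_genre:
            return [super_genre]
        for genre, sub_genres in genres.items():
            if label == genre:
                return [super_genre, genre]
            if label in sub_genres:
                return [super_genre, genre, label]
    return []  # label is not placed anywhere in the hierarchy


def level_of(label, tree=GENRE_TREE):
    """Return the granularity level of a label, or None if it is not in the tree."""
    path = path_to(label, tree)
    return LEVEL_NAMES[len(path) - 1] if path else None


print(level_of("news"), path_to("novel"), level_of("tweet"))
```

A label such as "tweet" returns `None` here on purpose: where it belongs in the hierarchy is exactly the open question raised above.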

    If you then would like to assess the quality or reliability of information, this is something that, in my opinion, is not related to the use of a particular genre, but to the person who uses the genre. Nowadays, all politicians, ministers and even the Pope send out tweets and write blog posts. Is the information they spread reliable or not? Well, we could say that they write opinionated texts, regardless of the genres they use… I am inclined to see genres as containers, each with a different shape (i.e. the genre conventions). The reliability of the “content” of these containers seems to be independent of the shape of the container itself… There is something called “genre colonization” that has been little explored, and it may be worth investigating more in the digital world.
    ————————————–
    Katerina Xafis

    It seems to me Marina that a new classification level is required between SUPER GENRE and GENRE.
    —————

    Katerina Xafis

    Do you think the ‘container’ of the digital genre should be examined in terms of how the purpose of the text type is culturally created? For example, is the purpose of the genre already in place requiring specific structural elements or is it created in an ad hoc or directed manner as a result of social interaction? In other words, is it interactive and to what extent can participants influence the shape and size of its purpose and how does this then affect the structural elements in the text type?
    ———————————

    Marina Santini

    There is a strong dependence (I think) between the technological medium and the conventions of a genre. For example, tweets’ genre conventions are shaped by the allowances of the Twitter platform (number of characters, hashtags, jargon, etc.). But some media/technologies are more flexible than others. For example, blogs give the users much more freedom to decide about the interaction or the communication style. For some other genres, it is the community that decides and not the medium. For example, the compositional structure of an academic paper is not decided by the limitations or allowances of the technology, but by the intellectual requirements of the academic community…
    ————————

    Katerina Xafis

    This seems to be the complex area: whether it is the technological medium or the community or a bit of both shaping and re-shaping the purpose(s) of the text type(s).

    Having said that, an email can contain a report, a recount, a procedure, an exposition, a discussion or any other kind of genre for that matter (as per SFG definition of ‘genre’). Indeed, the possibilities of what it could contain are endless. As you said, it is more of a container, or, as I see it, a channel of communication. Extremely interesting to see where your research leads you.
    ———————————

  6. christophe Clugston says:

    I think the norm is going to be a Hybrid genre. Using taxonomic classification requires a top-down paradigm like in Zoology. Again speaking about advertising, it is clearly a hybrid genre (embedded texts, embedded genres; this can all be found in my research) and its reason to exist (its purpose) is to sell. Offline creators of advertising have been stating that for more than 100 years. The digital genres of advertising are no different. The Sub Genre becomes the most salient feature: selling cars, houses, computers. Nonetheless, they all share progenitors that are the same–they do, however, vary at the end as much as a Chihuahua does from a wolf (although it cannot be denied that they carry many similar traits).

  7. Interesting, Chris…

  8. Dear Marina, Are you familiar with “Genres in the Internet: Issues in the theory of genre”, edited by Janet Giltrow and Dieter Stein?
    https://benjamins.com/#catalog/books/pbns.188/main
    Incidentally, I am looking for a corpus of suicide notes for rhetorical genre analysis. Could you suggest a source? Thanks. Natasha

  9. Hi Natasha,

    Yes, I read and reviewed “Genres in the Internet” and some of the chapters are really interesting (http://www.forum.santini.se/2011/12/book-review-genres-in-the-internet-2009/).

    As for a corpus of suicide notes, I suggest reading this master thesis:

    AUTHOR: Tatiana Prokofyeva; [2013]
    KEYWORDS: Language use; Suicide texts; Discourse Analysis; Linguistics
    ABSTRACT:

    Suicide texts are the traces left by their authors for the public allowing them to understand the causes of the desire to commit suicide, regardless of whether such notes preceded successful suicide attempts or not. The types of such texts can vary dramatically in emotional expressiveness, be it a suicide note handwritten by the author or a short post typed on a web forum dedicated to suicides. While one text can be evidence of a successful suicide attempt, the other may point to a deeply depressive state which may or may not lead to a suicide attempt in future. The main questions this study aims to answer are: (1) what is the difference between the two above-named types of suicide texts (‘suicide notes’ and ‘suicide posts’) and (2) how is it expressed linguistically? Previous works on suicide texts have been of significant importance and have managed to investigate the differences between suicide notes of the attempters and those who completed suicide (Joiner 2002) as well as underline the typical features of genuine suicide notes in comparison to fabricated suicide notes. However, no studies indicating the differences between the ‘suicide notes’ of successful suicides and the ‘suicide posts’ of authors exhibiting various degrees of depressive behavior have previously been conducted. In this thesis, the comparative analysis of ‘suicide notes’ left by those successful in their attempts and ‘suicide posts’ composed by authors with unknown fates has been carried out with the help of discourse analysis. Both types of texts have been examined from such linguistic levels as semantics, pragmatics and syntax. The results show several distinctive features peculiar to each type. 
While providing a clear reason for committing suicide in the one case contrasts with detailing a number of causes for depression in the other, further differences exist in regard to expressing such emotions as (1) fear of life, (2) relief, (3) lack of hope and (4) lack of doubt versus displaying such emotions as (1) fear of death, (2) preserved desire and (3) doubt. An easy to follow structure and purposeful past tense usage in suicide notes stands in contrast to the allusions to previous suicide attempts and indistinguishable pattern found in suicide posts. At the same time, specific punctuation signs were found to be peculiar mainly to the suicide post type of text. The results of the research also demonstrate the necessity for further investigation of the characteristic features of different types of suicide text as well as their classification. Moreover, the study indicates the possibilities of tracing the probable transformation from ‘suicide posts’ to ‘suicide notes’ which may well serve for purposes of suicide prevention, especially if an additional category, i.e., notes written by survivors, is added to the analysis.

    Contact Tatiana directly and see if she can tell you more about the corpora she used… Let me know if you need additional hints. Best wishes, Marina

    • Natasha Artemeva says:

      Dear Marina,

      I have only just now discovered your response to my message. Thank you very much! I greatly appreciate it.
      Another source on internet genres, which you have probably read, is Into the Blogosphere (it’s an online book in open access). It’s written from the North American perspective of Rhetorical Genre Studies, which I find very productive.

      I’ve read Tatiana’s thesis, but have not been able to locate her email. Do you by any chance have it? Is she working at the same university?
      Thanks again.
      Cheers,
      Natasha

    • Natasha Artemeva says:

      A link to “Into the Blogosphere”
      http://blog.lib.umn.edu/blogosphere/

      Natasha
