Book in Preparation: A Computational Theory of Digital Genre

Book in preparation: A Computational Theory of Digital Genre by Marina Santini

The book lists, examines and develops the key concepts needed to build a novel, intuitive and robust definition of digital genre for computational purposes. The newly proposed definition is the central tenet of the computational theory underlying computational models for automatic digital genre classification. The book is divided into six parts, each of which discusses in depth issues that have been neglected or considered too controversial for scholars and researchers to reach any theoretical or pragmatic agreement. The book provides not only theoretical foundations but also a number of use cases, corpora/datasets, and computational models that readers can reuse in their own experiments to evaluate the validity of the theoretical and practical solutions it proposes.

Preliminary Table of Contents


Chapter 1. Introduction
Chapter 2. Genre Studies: Origins and Evolution
Chapter 3. Genre: Fields of Digital Application


Chapter 4. The Concept of Digital Medium
Chapter 5. The Concepts of Community, Culture, Society
Chapter 6. The Concepts of Conventions and Expectations
Chapter 7. The Concept of Task
Chapter 8. The Concept of Genre Evolution
Chapter 9. The Concept of Predictable Linguistic Choices
Chapter 10. Definition of Digital Genre


Chapter 11. Genre and Text Types
Chapter 12. Genre and Register
Chapter 13. Genre and Sublanguage
Chapter 14. Genre and Sentiment
Chapter 15. Genre and Domain
Chapter 16. Genre and Topic


Chapter 17. Classification: A Psychological and Perceptual Point of View
Chapter 18. Classification: A Practical Point of View
Chapter 19. Automatic Text Classification
Chapter 20. Automatic Genre Classification
Chapter 21. A Computational Theory for Automatic Digital Genre Classification


Chapter 22. Genre and the Web

• Use cases
• Genres, Corpora/Datasets, Features, Computational Models
• Evaluation and Discussion
• Summary and Conclusions

Chapter 23. Genre and Social Media

• Use cases
• Genres, Corpora/Datasets, Features, Computational Models
• Evaluation and Discussion
• Summary and Conclusions

Chapter 24. Genre and Search

• Use cases
• Genres, Corpora/Datasets, Features, Computational Models
• Evaluation and Discussion
• Summary and Conclusions

Chapter 25. Genre in Organizations and Intranets

• Use cases
• Genres, Corpora/Datasets, Features, Computational Models
• Evaluation and Discussion
• Summary and Conclusions

Chapter 26. Genre and the Digitalized Work Place

• Use cases
• Genres, Corpora/Datasets, Features, Computational Models
• Evaluation and Discussion
• Summary and Conclusions


Chapter 27. Conclusion and Reflections
Chapter 28. Future Directions

12 comments for “Book in Preparation: A Computational Theory of Digital Genre”

  1. Graham Bennett
    14 January, 2013 at 13:42

    Hi Marina,
    I completed a Masters (Coursework) IR thesis at RMIT University (Melbourne, Australia) in 2006.
    My feedback about your book TOC:
    – Thorough, in-depth discussion of topics – with practical examples/datasets – perfect for a researcher.
    – List of topics may be too long (26 chapters)
    – Have you considered a publisher who can drip-feed chapters to interested readers who purchase an ‘Early Release’ or other e-book licence? This may help support the project financially.
    Best wishes, and good luck!

  2. Marina Santini
    14 January, 2013 at 13:52

    Thanx for your feedback, Graham.

  3. Marina Santini
    14 January, 2013 at 13:55

    From Corpus Linguistics LinkedIn Group:

    Horst Bogatz • Dear Ms Santini,

    I think it would be a good idea to contact Ms Gertrud Faaß. She is a lecturer at the University of Hildesheim, Department of Information Science and Natural Language Processing (IS-NLP). She does a lot of research on computational theory and, at present, is analysing my bilingual electronic dictionary of English collocations (ARCS).

    Horst Bogatz

  4. 14 January, 2013 at 14:19

    Good luck with this Marina!

    What kind of audience did you have in mind (researchers, practitioners, students etc.)?

  5. Marina Santini
    14 January, 2013 at 14:34

    Hi Tony,

    Thanks! The audience of the book includes the three groups of readers you have mentioned: students, researchers and scholars, and practitioners. Since the computational view of genre is still a niche topic, I think it is possible to cover the interests of these three groups at the same time.

    • Bazilah
      15 January, 2013 at 09:37

      Hi Marina,

      I do agree. For a beginner scholar like me, this kind of book will be helpful for understanding the computational view from both perspectives, theoretical and practical.

      Good luck with this publication.

  6. Marina Santini
    14 January, 2013 at 15:33

    From UTMA – Ubiquitous Text Mining and Analytics LinkedIn group:

    Khaled Nagati • I think we can use finite automata to simulate both the Swalesian genre model and the two-dimensional genre model proposed by Inger & Anne.

  7. Marina Santini
    14 January, 2013 at 17:34

    From Similarity search LinkedIn Group

    Adrian Walker • That’s an important topic, and an ambitious project.

    For context, you may like to look at

    and the associated online system and examples.

    Hope this helps

  8. Marina Santini
    14 January, 2013 at 19:58

    From Text Mining LinkedIn group:

    John C. Sloan • Hello Dr. Santini,

    I did follow your link to the TOC and initial comments for the book you are writing, and here are some of my remarks:
    (i) Organization: A comment to the effect that you have too many chapters depends, ironically, on the intended *genre* of your book. My impression is that this will be a handbook that rounds out the state of the art in automated genre identification and, interestingly, evolution. It is not unusual for handbooks to have 25 or 30 chapters.
    (ii) Venue: In my opinion, the closest one would be the CRC Handbook Series. The publisher will not look at anything by anyone not listed in DBLP, and then only if your refereed work corresponds to the content of the proposed handbook. Since you’ve easily met both those requirements, may I recommend that publisher? (As for myself, I would have failed the second requirement.) Hopefully, CRC Press has evolved like IEEE Press by going all electronic.
    (iii) Ambitious effort: One commenter observed that this work is ambitious. Actually, the handbook genre works in your favor. By its nature, each chapter of a handbook can be authored by someone else, with you as the editor. The effort here might be to minimize, through extensive editing on your part, the appearance that the work was written by many people. Enforcing the same mathematical notational conventions throughout, for example, is a non-trivial task. The typical chapter is written by a pair of researchers: one from academe and one from industry. Both of these folks must have relevant work listed in DBLP.
    (iv) Formalisms: As I recall, your proposed Chapter 8 seems to come closest to a formal treatment. From what little I have read *around* this topic you need to formulate a number of problems/subproblems in automatic genre identification in terms of some formal models. The obvious ones to me would be drawn from Linear Algebra. As you get further into this, you might find that some techniques will be drawn from various logics as well. Also, formal treatments of a notion of genre *evolution* might require some kind of state-transition model.
    (v) Case studies: In genre evolution I would be curious about exactly how the genre known as the Encyclopedia had evolved from paper form to electronic form, and then how that electronic form had evolved, so a case study of Wikipedia might be appropriate.

    These were just some of my initial thoughts.

    — John C. Sloan, PhD

  9. Marina Santini
    15 January, 2013 at 10:57

    From UTMA – Ubiquitous Text Mining and Analytics LinkedIn Group:

    Khaled Nagati • 1) Are you classifying genres using a DFA? We can design a DFA to classify genres.
    2) What do you mean by text classification? In other words, what are the classes you want to classify texts into?

    Marina Santini • Hi Khaled, if by DFA you mean deterministic finite automata, the answer is I do not know yet. The initial experiments are based on machine learning (supervised and unsupervised) and probabilistic models. DFA are not excluded, but I have not made up my mind yet.

    The genre classes I want to deal with are old and novel genres in digital form: from tweets to emails, from Facebook posts (if publicly available) to online newspaper articles such as editorials, letters to the editor, op-eds, etc.

  10. Marina Santini
    15 January, 2013 at 11:20

    From The WebGenre R&D Group LinkedIn Group:

    Chaker Jebari • Thank you, Marina. It is a good idea to prepare a book about digital genres. The topics covered in the book seem exhaustive, but I would like to know whether you will talk about multi-language genre classification. Thanks

    Marina Santini • Hi Chaker,

    May I ask what you mean by multi-language genre classification? Are you referring to documents written in different languages, or to genre corpora containing the same genres in different languages and the comparison of performance across those languages?

    Chaker Jebari • I mean documents written in different languages.

    Marina Santini • Can you give me an example of such a genre? Maybe Facebook posts? Or are you referring to something else?

    Chaker Jebari • Yes, for example Facebook posts or tweets.

    Marina Santini • Yes, I will discuss this kind of multi-linguality.

  11. 20 January, 2013 at 12:31

    From Digital Humanities / Humanities Computing LinkedIn Group:

    Gideon Burton • I’m fascinated by this prospective book and look forward to its completion. It appears to me to be working mainly from the point of view of literary studies and genre theory. While that is appropriate, I think that analogous concepts from computer science and from web culture need to be brought in. For example, in computer programming there are highly codified categories (frameworks, data types, variables, libraries, platforms, etc.). Among these, algorithms are perhaps the most dominant and influential genre.

    Similarly, in web culture there are standards (everything W3C identifies, including HTML, CSS, Ajax, URIs, XML, etc.). XML schema and DTD (document type definitions) are especially relevant, since they frame types of content and essentially automate production and search of content genres. These also occur on the level of principles, such as metadata, a defining genre of web content and contemporary computation.

    Something I think extremely relevant here is that web culture, unlike print culture, must deal with the everyday reality of change and the need to retool one’s tools to keep pace with the evolution of technology, data, media, etc. Indeed, a very distinct genre of writing has emerged precisely to deal with the need to get rapid consensus from interested parties on developing standards: the Request for Comments (RFC). Tim Berners-Lee’s famous proposal for the World Wide Web is a closely related example.

    Genre in the digital age, in my opinion, is not meaningful unless the dynamics of genre development are dealt with. The web both stabilizes and destabilizes simultaneously, and genres are often provisional (hence, so many things in “beta”). There are genres of activities (such as standardization, deprecation) that are so linked to content that it’s hard to understand them statically. RSS is extremely important in this regard and the way it handles subscription/syndication typifies the connection between static and dynamic.

    A kind of super-genre in the digital age (perhaps more of a principle than a specific type) is the reflective or “meta-” genre. This is most evident in the semantic web with the structure of metadata. But it also happens in less structured ways, such as the emergence of reviews and reviewing, or the widespread discourse about the web or of computation. We aren’t just using these tools; we discuss them as we do.

    A last point is that there are some very clear genres of content production that are now in place, and I’m not sure if these are present in the proposed chapters. For example, the blog post, the tweet, the comment, as well as wikis, podcasts, web series, etc. Perhaps I’m thinking of things along very different lines from your concept, however.

    Marina Santini • Hi Gideon,

    I thank you very much for your reflections and thoughts. I will include them in my list of discussion points.
    Cheers, marina

    Nyasha Mboti, Ph.D • Hi Marina,

    I like the table of contents in its entirety, with the following exceptions:

    1) Chapter 1, Introduction, should ideally introduce the notion of ‘Computational Theory’, not just be ‘Introduction’.
    2) Chapter 2 and Chapter 8 seem to be repetitive? (i.e. in relation to ‘Genre Evolution’)
    3) In Part III or Part V, a chapter on Genre and Data Types may be an option.
    4) Lastly, slightly disappointed that ‘multimedia’ genres (visuals, sound) seem not to fit the TOC. My own work is in the area of ‘visual corpora’, where visual corpora (bits and fragments of pictures from everyday life) can be mined for meaning.

    All the best

    Kind regards


    Marina Santini • Dear Nyasha, I appreciate your feedback. I will think about introducing multimedia genres, or at least I will try to explain why they are not included… big thanx
