Book Review: Multimodality and Genre (2008)

Review of Multimodality and Genre (also available in the LinguistList Reviews archive)
Reviewer: Marina Santini

Book Title: Multimodality and Genre
Subtitle: A Foundation for the Systematic Analysis of Multimodal Documents
Book Author: John Bateman
Publisher: Palgrave Macmillan
Year: 2008

The volume “Multimodality and Genre” by John Bateman is a monograph that presents a framework for the page-based analysis of multimodal documents, such as magazines, books, web pages, newspapers and similar artefacts. Multimodal documents combine text, graphics and pictures in more or less complex layouts. Readers are shown an approach that breaks multimodal documents down into configurations of basic elements in order to uncover their meaning. This approach was originally developed within the Genre and Multimodality (GeM) project. A major claim of the book is that detailed empirical analysis is needed to advance our understanding of the complex meaning-making processes involved in multimodal documents. The concept of genre is proposed as a crucial theoretical construct for exploring how documents make meaning. The book is suitable for genre analysts, discourse analysts, document designers, computational linguists and semioticians.

The book contains seven chapters, a table of contents, lists of tables and figures, and indices. Number of pages: 312.

Chapter 1 (“Introduction: Four Whys and a How”) sets out the purpose of and motivation for the volume. The aim of the book is to present a framework for the empirical and reproducible page-based analysis of multimodal documents. Multimodal documents are made up of a variety of visually based modes that together create a web of communicative goals. The proposed framework can be used to explore how multiple modes interact and combine within individual artefacts. In particular, it is designed to help analysts understand what is gained by combining different modes and what kinds of semantic relationships these modes establish. The framework is motivated by the assumption that combinations of elements signal meaningful relationships that the same elements would not reveal in isolation.

In Chapter 2 (“Multimodal Documents and their Components”) the author places different approaches to document analysis in a common context. He argues throughout the chapter that providing sound means for determining the “components” of a page or a document is a crucial prerequisite for carrying out empirical investigations of interpretation processes, for critique, and for comparing different approaches. Document components must be identified and described in a reproducible way. The main parts of Chapter 2 focus on 1) the page as object of interpretation, 2) the page as object of perception, and 3) the page as object of production. In the final, summarizing section, the author presents two complementary perspectives from which to view what happens within a multimodal page. The first sees graphic design as ‘macro-punctuation’, similar to text-based typography or formatting. The second considers pages as visual entities and document design as a process of visual decomposition. The author concludes that admitting both perspectives is a necessary prerequisite for capturing a fuller range of possibilities.

Chapter 3 (“The GeM Model: Treating the Multimodal Page as Multilayered Semiotic Artefact”) combines the perspectives described in the previous chapter in order to articulate an account of document components that is sufficiently well defined to support reproducible analyses. The overall aim is “to work towards functionally supportable hypotheses by means of sufficiently fine-grained formal details so as to allow empirical investigation, verification and refutation” (p. 107). To this end, the author proposes several layers of description, each of which tells us something different about how a page artefact is constructed. His investigations single out five layers as crucial, namely:

  1. the GeM base, including base units such as sentences, headings, titles, headlines, icons, table cells, list items and list labels (Chapter 3, p. 111);
  2. the layout base, including layout segmentation, realisation information and layout structure (Chapter 3, pp. 116-129), illustrated by a concrete layout analysis of a page from a Dorling-Kindersley guide to Paris that describes the parts of the Louvre (Chapter 3, pp. 130-143);
  3. the rhetorical base (Chapter 4);
  4. the genre base (Chapter 5);
  5. the navigation base (Chapter 6).

Each layer defines its own basic set of units as well as the relations and structures defined over these units. The relations between layers are left open to empirical investigation: the author imposes no inter-layer relationship beyond the simplest assumption that some configurations of units in one layer may be expressed in terms of configurations of units in other layers. The chapter then narrows its focus to the layout layer.

Chapter 4 (“The Rhetorical Organization of Multimodal Documents”) defines methods for exploring how configurations of elements take on significance over and above their spatial proximity, visual similarity or difference. The author notes that one common approach to describing the functions communicated by a combination of distinct elements on the page is to employ notions of rhetorical organization. He adopts one specific account of rhetorical organization, Rhetorical Structure Theory (RST), which is frequently used in linguistics to explain textual coherence. The author argues that extending this approach to multimodal rhetorical organization gives multimodal analysis the analytic hold it requires. In the same way “as segments of a text contribute to that text’s coherence in systematic and specifiable ways, so can segments of a multimodal document, involving pictures, diagrams and texts, be related in an analogous manner also” (p. 144).

The chapter contains a brief introduction to RST (pp. 144-151) and explains how the theory is used within the GeM rhetorical layer (pp. 151-163). It also includes example analyses of rhetorical relations between layout units (pp. 163-174). The author summarizes the chapter by saying that when a document starts to use the full two-dimensional spatial extent of the page to express rhetorical and other functional organizations, we move into a different semiotic mode, which he terms ‘page-flow’. Page-flow can combine elements in any of the semiotic modes appearing on a page, including ‘text-flow’ (i.e. running text), diagrams, graphs, etc. To the individual contributions of these elements it adds the possibility of a rhetorical unity supporting the communicative intentions of the document. Without this level of description, he says, “we are not in a position to explicate many of the spatial distribution decisions taken in page-based documents” (p. 176). He admits, however, that it is still an open question how much of the detail of rhetorical organization can be expressed visually. The resources actually employed in a document, and the ways in which they are distributed across the semiotic modes activated, also depend on the type of document, or genre, involved. For this reason, the author considers the concept of genre important and deals with it in the next chapter.
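To make the idea of rhetorical relations holding between layout units in different modes a little more concrete, here is a toy Python sketch. It is a simplified illustration under stated assumptions, not the GeM rhetorical layer: the unit identifiers, the `mode` labels and the `cross_modal` helper are hypothetical, and only the nucleus/satellite structure and the relation names (elaboration, background) come from general RST practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uid: str   # hypothetical identifier for a layout unit
    mode: str  # illustrative mode label, e.g. "text" or "image"


@dataclass(frozen=True)
class Relation:
    name: str        # RST relation name, e.g. "elaboration"
    nucleus: Unit    # the more central span
    satellite: Unit  # the span that supports the nucleus


def cross_modal(relations):
    """Return the relations whose nucleus and satellite are in different modes."""
    return [r for r in relations if r.nucleus.mode != r.satellite.mode]


# A hypothetical page: a heading, a photograph and a caption.
heading = Unit("u1", "text")
photo = Unit("u2", "image")
caption = Unit("u3", "text")

relations = [
    Relation("elaboration", nucleus=heading, satellite=photo),
    Relation("elaboration", nucleus=photo, satellite=caption),
    Relation("background", nucleus=heading, satellite=caption),
]

for r in cross_modal(relations):
    print(r.name, r.nucleus.uid, "->", r.satellite.uid)
```

Running it prints the two elaboration relations that cross between text and image; the background relation between two text units is filtered out. A query of this kind is one rough way of picking out the cross-modal combinations that the book associates with page-flow rather than text-flow alone.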

Chapter 5 addresses the issues of comparison and constraint. In the analysis of a multimodal document, one needs to consider the sets of documents that it resembles and the sets of documents with which it contrasts. The author argues that, in this respect, genre is a fundamental concept for the analysis of meaning in multimodal documents. Readers usually allocate documents to particular classes, and those classes bring with them certain interpretive frames and expectations that guide readers in making sense of what they read. Moreover, the decisions taken during document production rely on the conventions and practices established for the class of documents to which the document is meant to belong. Effective document use therefore requires a negotiation between the norms for the document type (i.e. the genre) and the functional requirements of the specific document. Since extending traditional notions of genre to multimodal documents is not straightforward, the author suggests that a framework for multimodal genre should be drawn from linguistically motivated accounts of genre, because, from this perspective, “genre offers a method for relating any individual document encountered to its ‘generic’ context by means of explicitly identifiable design decision” (p. 182). The chapter continues with a survey of the state of the art in views on genre (pp. 183-217). Three basic modes of genre representation are then discussed: typological (genre typology, pp. 219-223), topological (genre topology, pp. 223-225) and faceted (the facets, pp. 218-219). The typological view of genre can be represented as classification networks; topological accounts are characterized in terms of variation; the faceted approach lies midway between the two and builds on facets, which are semi-independent classification systems.

The author then explains his own representation of genre in Section 5.3. Here he stresses the importance of pursuing a notion of genre that admits of fluidity and change while still imposing enough constraint to retain predictive value when several semiotic modes are combined. His sources of inspiration are Waller (1987), who stresses the importance of genre for typographical work expressed through a set of choices, and Lemke (1999), who suggests building trajectories of similarity across superficially different genres. The author's notion of genre is then put to work on two sets of loosely related documents. First, it is shown how the documents can be assigned to similar and contrasting genres; second, it is described how tracking these kinds of documents over time starts to reveal generic trajectories of change. The analyses presented draw on all aspects of the GeM model introduced in the previous chapters.
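The contrast between the typological and topological views of genre described for Chapter 5 can be sketched in a few lines of Python. This is an illustration under loudly stated assumptions, not Bateman's representation: the classification choices, the three dimensions and all the scores are invented for the example. The typological view treats a genre as a path of discrete choices, so two genres either share a class at a given depth or they do not; the topological view places documents as points on graded dimensions, so similarity becomes a matter of distance. The faceted approach, which sits between these two, is not modelled here.

```python
import math

# Typological view (hypothetical choices): a genre is a path of discrete
# choices through a classification network, from general to specific.
TYPOLOGY = {
    "guide book": ("informative", "printed", "travel"),
    "newspaper": ("informative", "printed", "news"),
}


def same_class(a: str, b: str, depth: int) -> bool:
    """Typological comparison: do genres a and b share their first `depth` choices?"""
    return TYPOLOGY[a][:depth] == TYPOLOGY[b][:depth]


# Topological view (hypothetical dimensions, each scored 0..1):
# (visual density, narrativity, reference use). Documents are points,
# so similarity is graded rather than a matter of class membership.
TOPOLOGY = {
    "guide book A": (0.9, 0.2, 0.7),
    "guide book B": (0.8, 0.3, 0.6),
    "newspaper C": (0.5, 0.6, 0.2),
}


def distance(a: str, b: str) -> float:
    """Topological comparison: Euclidean distance between two documents."""
    return math.dist(TOPOLOGY[a], TOPOLOGY[b])


print(same_class("guide book", "newspaper", 2))  # True: both informative, printed
print(same_class("guide book", "newspaper", 3))  # False: they split at the last choice
print(distance("guide book A", "guide book B") < distance("guide book A", "newspaper C"))  # True
```

The typological comparison can only answer yes or no at each depth, whereas the distance lets a document sit closer to or further from others, which is the kind of graded variation the topological view is meant to capture.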

Chapter 6 (“Building Multimodal Document Corpora”) contains a characterization of corpus-based linguistics, a review of the state of the art in linguistic corpora, and the proposal to use the GeM model as a corpus annotation scheme. The author emphasizes that adopting the methods of corpus-based linguistics is a crucial step, because documents must be seen against the background provided by relevant co-generic documents. He treats each layer of the GeM model as a stand-off layer of annotation that decomposes the documents analysed. The layers themselves are all defined in terms of XML descriptions. This allows analysts both to store information following the GeM model and to use that information to construct complex corpus queries that freely combine information from the different layers of the model. The author sees the ability to locate patterns that hold across distinct layers of the model as an essential precondition for identifying genre characteristics.
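The stand-off idea and the cross-layer queries it enables can be illustrated with a small Python sketch using the standard library's `xml.etree.ElementTree`. The three XML fragments are hypothetical and only loosely inspired by the layered design described here; they are not the actual GeM document type definitions, and element and attribute names such as `unit`, `layout-chunk`, `xref` and `font-size` are assumptions. The point is that the layout and rhetorical layers never copy content: they point to base units by identifier, so a query can join the layers on those identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical base layer: every unit gets an id that other layers point to.
BASE = """
<base>
  <unit id="u-01">The Louvre</unit>
  <unit id="u-02">Floor plan of the museum</unit>
  <unit id="u-03">Opening hours</unit>
</base>
"""

# Hypothetical layout layer: refers to base units via xref instead of copying them.
LAYOUT = """
<layout>
  <layout-chunk id="lay-1" xref="u-01" font-size="24"/>
  <layout-chunk id="lay-2" xref="u-02" font-size="10"/>
  <layout-chunk id="lay-3" xref="u-03" font-size="10"/>
</layout>
"""

# Hypothetical rhetorical layer: RST-style relations between base units.
RHETORIC = """
<rhetoric>
  <relation name="elaboration" nucleus="u-01" satellite="u-02"/>
  <relation name="background" nucleus="u-01" satellite="u-03"/>
</rhetoric>
"""


def satellites_by_font_size(base_xml, layout_xml, rhet_xml, relation, size):
    """Text of base units that are satellites of `relation` AND are laid out at `size`."""
    # Index the base layer by id, and the layout layer by the base unit it points to.
    base = {u.get("id"): u.text for u in ET.fromstring(base_xml).iter("unit")}
    sizes = {c.get("xref"): c.get("font-size")
             for c in ET.fromstring(layout_xml).iter("layout-chunk")}
    hits = []
    for rel in ET.fromstring(rhet_xml).iter("relation"):
        sat = rel.get("satellite")
        # Join rhetorical and layout information on the shared base-unit id.
        if rel.get("name") == relation and sizes.get(sat) == size:
            hits.append(base[sat])
    return hits


print(satellites_by_font_size(BASE, LAYOUT, RHETORIC, "elaboration", "10"))
```

With these fragments the query returns only the floor-plan unit, since it is both the satellite of an elaboration relation and laid out at font size 10. Joining on identifiers, rather than nesting one layer inside another, lets each layer keep its own structure, in keeping with the book's decision to leave the relations between layers open.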

Chapter 7 (“Conclusions and Outlook: What Next?”) summarizes the proposed framework for multimodal document analysis and puts forward three directions for future work, namely 1) extension to dynamic documents, 2) a three-dimensional layering of the layout, and 3) temporal development in the image-flow mode.


The book offers a useful overview and summary of the achievements of the GeM project (run between 1999 and 2002) and of its further development. It raises many challenging issues and can be considered required reading (together with Biber et al. 2007; Bruce 2008; Heyd 2008; Martin and Rose 2008) for those who are currently trying to pin down the concept of genre for empirical or computational research.

However, the approach proposed in this book raises a few questions:

1) Although the author points out several times that “this work is still very much in its infancy” (p. 247) and that corpus-based investigation must be carried out in the future, the fact that the qualitative analyses cover only a handful of documents casts some doubt on the practical feasibility of the proposed approach: why does the GeM corpus, annotated over so many years of research, include only 10 documents? Is this approach applicable at all on a large scale, or is it wishful thinking?

2) Although I agree with many points of the author's characterization of genre (for instance, the linguistically motivated approach to genre, the predictive power of genre, and the presence of trajectories of similarity across superficially different genres), it is not clear how to create reproducible genre classes for empirical corpus-based studies and analyses. In certain fields, such as Automatic Genre Identification, researchers are engaged in finding representative genre labels and struggle to create collections of documents that instantiate these labels (see e.g. Sharoff 2010). The author only says that “staying within intuitive genre labels, such as, for example, ‘newspaper’ or ‘guide book’, is far from optimal precisely because it creates artificial boundaries that the dimensions of variation manipulated within genres do not necessarily respect” (p. 229). A question that naturally arises is: what are the ideal genre labels to work with?

3) Although the author says that we should aim “not only to have an account of genres that exist, or have existed, but also to suggest properties for genres that do not (yet) exist” (p. 225), it seems difficult to apply the diachronic approach described in Section 5.5 to emerging genres. When do we decide that a genre is coming into existence, so that we can identify its representative features?

A puzzling aspect of the book is that its notion of genre for multimodal documents is somewhat hard to extract. It is presented in Section 5.3, but it is so interspersed with citations and digressions that it takes some effort to locate and isolate. A summarizing section or subsection listing and motivating all the characteristics of genre would have helped (similar to Swales 1990: 45-58).

The book contains stimulating and provocative statements, for instance the proposed detachment of genre from culture: “we cannot simply compare texts on the basis of the assumption that they constituted a single time-extended genre: there are variables at work. Genres must be described independently of the particular use that a culture makes of them. Genres do not merely ‘reflect’ conventions: each instance of a particular genre helps create conventions and hence generic expectations” (p. 248).

In conclusion, the book is an important piece of the still unsolved genre riddle.


Biber D., Connor U. and Upton T. (eds.) (2007). Discourse on the Move. John Benjamins.
Bruce I. (2008). Academic Writing and Genre: A Systematic Analysis. Continuum.
Heyd T. (2008). Email Hoaxes: Form, Function, Genre Ecology. John Benjamins.
Lemke J. (1999). Typology, topology, topography: genre semantics. Manuscript, University of Michigan.
Martin J. and Rose D. (2008). Genre Relations: Mapping Culture. Equinox.
Sharoff S. (2010). In the garden and in the jungle: comparing genres in the BNC and Internet. In Mehler A., Sharoff S. and Santini M. (eds.) (2010). Genres on the Web: Computational Models and Empirical Studies. Springer.
Swales J. (1990). Genre Analysis: English in Academic and Research Settings. Cambridge University Press.
Waller R. (1987). The typographical contribution to language: towards a model of typographic genres and their underlying structures. PhD thesis, University of Reading, Reading, UK.