
The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Archive

Towards a Computational Theory of Digital Genre (I): Working Definition of Genres for Computational Purposes

By Marina Santini. Last updated: 29 Oct 2012

1. What is a (textual) genre?
• A genre is a class of texts with similar communicative, textual and linguistic features.

2. What characterizes a genre? A genre:
• Must have a name
• Must be recognized within a community
• Must be produced or retrieved during a task
• Must have conventions
• Must raise expectations
• Can change over time.
It is a cultural artifact (culture here includes society, media, technology, etc.)

3. What characterizes a digital genre?
• The same characteristics listed above.
• A digital genre is any kind of genre that has a digital form, such as emails, chats, online academic papers, online newspaper articles, blogs…
• A digital genre can be any paper genre … Read entire article »

Filed under: dialectic, discussions, dissemination, reflections

Impact of Sociolinguistics in Opinion Mining Systems

Signed post by Alexander Osherenko, Socioware Development, osherenko@socioware.de Full paper: Considering Impact of Sociolinguistic Findings in Believable Opinion Mining Systems. Proceedings of The Fifth International Conference On Cognitive Science. 2012. Kaliningrad, Russia (http://www.informatik.uni-augsburg.de/~osherenk/final_kalinigrad.pdf) Opinions are a frequent means of communication in human society, and automatic approaches to opinion mining in texts have therefore attracted much attention. All in all, most approaches apply data mining techniques and extract lexical features (words) as a reliable means of classification. It is noteworthy that, although the interest in opinion mining is huge, there are only a few explorations of the words extracted in opinion mining. This study addresses this drawback and elaborates on a sociolinguistic explanation. We hypothesize that an opinion mining system should be trained for classifying opinions in texts of the same language style. Hence, this contribution focuses on the following questions: 1) do sociolinguistic … Read entire article »

Filed under: collaborative blogging, computational models, dialectic, discussions, dissemination, featured, reading suggestions, signed posts

The Path Forward: From Big Unstructured Data to Contextualized Information

How can we convert massive quantities of unstructured data to structured information? What kind of “structure” do we need for a reliable interpretation of this undomesticated data? I suggest thinking of a text-analytic framework based on “context”. Search keywords, events, entities, sentiments, attitudes, polarities, opinions etc. have a different weight and require a different assessment depending on the kind of texts, the situational context, the field of discussion, and the authority of the source, as well as on the purpose of use. For example, for an official use, factual texts might have more credibility than opinionated texts. In this respect, press conferences, declarations or announcements by a White House spokesman might be more reliable than newspapers’ speculations or op-ed articles. On the contrary, if we want to test the pulse and … Read entire article »

Filed under: dialectic, discussions
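
A minimal sketch of the idea in the entry above, that the same extracted signal can carry a different weight depending on the kind of text and the purpose of use. The table name, the genre labels and every number below are hypothetical and chosen only for illustration; the post does not specify any weighting scheme.

```python
# Hypothetical weights: how much a signal (keyword, sentiment, entity...)
# counts for a given purpose of use and kind of text. Values are illustrative.
CONTEXT_WEIGHTS = {
    ("official", "press_conference"): 1.0,
    ("official", "announcement"): 1.0,
    ("official", "op_ed"): 0.3,
}
DEFAULT_WEIGHT = 0.5  # assumed fallback for (purpose, genre) pairs not listed


def weighted_score(raw_score, purpose, genre):
    """Scale a raw text-analytic score by its (purpose, genre) context."""
    return raw_score * CONTEXT_WEIGHTS.get((purpose, genre), DEFAULT_WEIGHT)


# The same raw score is trusted more from a press conference than from an op-ed.
press = weighted_score(0.8, "official", "press_conference")
op_ed = weighted_score(0.8, "official", "op_ed")
```

A lookup table keeps the assumption visible: changing the purpose of use only means adding rows, not rewriting the analysis that produced the raw score.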

AGI: Structured and Unstructured Noise

How would you handle automatic text classification in noisy conditions? This is what has been done, to my knowledge, in Automatic Web Genre Identification (AGI). By noise here I refer to two different disturbing factors*: 1) the training sample and test sample come from different sources/annotators; 2) the test set contains genre classes that are not present in the training set. These two types of noise reflect the following real-world conditions when working with genre, namely: 1) since genre is a complex notion that has been interpreted in different ways, the identification of the same genre class can vary depending on the research agenda or individual preferences; 2) we cannot possibly conceive a genre classifier that has a good performance if we include all existing genres either on the web or in … Read entire article »

Filed under: dialectic, discussions, overviews
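
One possible way to cope with the second kind of noise named in the entry above (test documents whose genre never appeared in training) is to let the classifier abstain. The sketch below is not the method used in AGI research; it is a toy nearest-centroid classifier over bag-of-words counts, and the function names, the tiny training set and the 0.3 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words counts (a toy feature set)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(labelled_texts):
    """Sum word counts per genre to build one centroid per known genre."""
    centroids = {}
    for text, genre in labelled_texts:
        centroids.setdefault(genre, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids, threshold=0.3):
    """Return the closest known genre, or "unknown" if nothing is close enough."""
    vec = vectorize(text)
    best_genre, best_score = max(
        ((genre, cosine(vec, centroid)) for genre, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_genre if best_score >= threshold else "unknown"


# Hypothetical training data covering only two genres.
TRAINING = [
    ("question answer question answer how do i", "faq"),
    ("how do i reset question answer", "faq"),
    ("review rating stars product recommend", "review"),
    ("product review stars would recommend", "review"),
]
centroids = train(TRAINING)
```

With this setup, `classify("question how do i answer", centroids)` falls in the known "faq" class, while a text sharing no vocabulary with either genre is rejected as "unknown" instead of being forced into one of the trained classes.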

Beyond Topic: Genre and Search

One of the central problems of information retrieval (IR) is the difficulty of matching a document to a query in the absence of any contextual information about the searchers and the document creators. Genre is a context carrier and genre information can be exploited by information systems to improve their matching algorithms. The web hosts many recognised genres that can potentially provide this contextual information, including FAQs, press releases, product descriptions, instructions, guides, and reviews. … Read entire article »

Filed under: dialectic, discussions, featured, reading suggestions, references

Genre, Social Action and Social Intelligence

An important dimension that has not been investigated so far is the relatedness among genre, social action and social intelligence. The interpretation of genre in terms of social action was put forward more than 25 years ago by Carolyn Miller (Miller, 1984) and backed up by recent empirical studies on web genres (e.g. Miller and Shepherd, 2004, 2009). Lately, the social implications of the concept of genre have been stretched to support the claim that teaching how to master genre from primary school onwards is a way of implementing democracy and social justice (Martin and Rose, 2008). I would suggest extending the social interpretation of genre even further by arguing that the recognition of social action is a sign of social intelligence. … Read entire article »

Filed under: dialectic, discussions, featured, reading suggestions, references

Provocation or Food for thought? Web sites as AI Mind Persons

Post signed by: Arthur T. Murray, independent scholar in artificial intelligence One difference between natural and artificial intelligence is that a human being can have a webpage, while an artificial intelligence can be a webpage. There are potentially many genres of artificial intelligence, such as robot AI embodied in a robot; Web-resident AI surviving on a server with or without robotic embodiment; and cyborg AI as a cybernetic organism sharing “meatspace” with a host substrate. Let us focus on the Web-resident AI Mind that both has and is a webpage. … Read entire article »

Filed under: collaborative blogging, dialectic, discussions, signed posts

User-Web Interaction: Gestalt in Information Retrieval

Post signed by: Maya Dimitrova, Institute of Control and System Research, Bulgarian Academy of Sciences * In this post, all references, figures and tables have been removed by the blog’s moderator. [Part II] 3 Gestalt in Information Retrieval A group of information retrieval studies is concerned with identifying new linguistic, lexical or formal features (like the special tags) that can be captured by automatically processing HTML scripts (scanning, tokenizing, clustering) and extracting meaningful information to identify the style or genre of the text inside the Web page. Web genre in the discussed group of studies is defined as a multi-dimensional structure of features of text and HTML design pointing to various linguistic and cognitive aspects of the retrieved Web document to help the user find not just the relevant topic, but … Read entire article »

Filed under: collaborative blogging, dialectic, discussions, signed posts

Gestalt Processes in User-Web Interaction: A Two-Side View

Post signed by: Maya Dimitrova, Institute of Control and System Research, Bulgarian Academy of Sciences * In this post, all references, figures and tables have been removed by the blog’s moderator. [Part I] 1. Introduction The Web is developing adaptively and displays certain natural processes that we find also in other areas, for example in the evolution and synthesis of knowledge through scientific research. This process fits very well with the implicit nature of human learning and with our natural human ability to synthesize and systematize new information, even in the face of the exponential growth on the Web. Therefore it is not the amount of new information on the Web that is troubling, but rather the lack of hints about the nature of the contents behind the lists returned by search engines and from the … Read entire article »

Filed under: collaborative blogging, dialectic, discussions, signed posts

Re-fusing form in genre study by Amy J. Devitt (2009) II

Continued. Amy J. Devitt, Re-fusing form in genre study, in Janet Giltrow and Dieter Stein (eds) Genres in the Internet, John Benjamins Publishing Company, 2009 … I am not sure that I completely agree on the importance of the “content”, or “substance” as a constitutive element of genre… … Read entire article »

Filed under: dialectic, reviews