Contextify: How to Contextualize Information

Marina Santini. Copyright © 2012

Work in progress:

Contextify is a metadata tagger that performs text and content enrichment. Contextify enriches information through text classification and content markup.

How can we capture context from a text? I would start with genre, sublanguage, and domain, i.e. three textual dimensions that say something about the communicative context in which a text has been issued:

  • A “weird” word like “spweet” is not a typo if it belongs to a Twitter micropost (genre and sublanguage: tweet spam).
  • A “normal” word like “mouse” is a specialized term if it belongs to the computer domain.


Other examples: surfing (sport, internet communication), agile (ordinary word, software), sentence (law, grammar), appeal (ordinary language: “appeal for help”; legal sublanguage: “to lodge an appeal”; genre: newspaper article, court act), etc.
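
To make this concrete, here is a toy sketch of genre-aware interpretation in Python (the genre labels and mini-lexicons are invented for illustration; a real system would learn them from classified corpora):

    # Hypothetical genre-specific lexicons: a word can be a valid term in one
    # genre/sublanguage and a likely typo in another.
    GENRE_LEXICONS = {
        "tweet": {"spweet": "spam tweet (Twitter sublanguage)"},
        "recipe": {},  # 'spweet' is not a recipe term
    }

    def interpret(word, genre):
        """Interpret a word in the light of the genre it occurs in."""
        lexicon = GENRE_LEXICONS.get(genre, {})
        if word in lexicon:
            return lexicon[word]           # in-sublanguage term, not a typo
        return "possible misspelling"      # e.g. 'spweet' in a recipe -> 'sweet'

    print(interpret("spweet", "tweet"))    # spam tweet (Twitter sublanguage)
    print(interpret("spweet", "recipe"))   # possible misspelling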

Context helps disambiguate words and assess the relevance/importance of texts to users’ information needs. Contextualized information can be exploited to improve findability and big data management. Domain, genre, sublanguage and other textual categories (such as style, register, or sentiment) tell us something about the relevance of a document to a query or to an enquiry.

In short, contextualized information:

  • helps identify the most important content and the most reliable/relevant information
  • gives hints to the users about the purpose of the document (e.g. informative, product description, timetable, pricelist, etc.), so they can sieve results quickly


Is there any existing self-contained tool (either commercial, academic, or open source) that can contextualize information in the way described above? Suggestions are welcome!

Marina Santini. Copyright © 2012

Related Post
The Path Forward: From Big Unstructured Data to Contextualized Information

Highlight
IIiX 2012 – Fourth Information Interaction in Context Symposium, Nijmegen, the Netherlands, August 21-24, 2012

Genre-Enabled Web-Search Prototype (Bauhaus-Universität Weimar)
WEGA: WEb Genre Analysis – Firefox add-on

4 comments for “Contextify: How to Contextualize Information”

  1. 28 May, 2012 at 09:31

    Hi,
    At Ford, we have an online library for all acronyms used in daily work. Upon typing in an acronym, you are provided with a list of results from all the possible origins of the search word. I doubt you have access to the following link, though:
    http://www.rlis.ford.com/main/home.html

    Search Results for GPDS

    Library Catalog (0), Ebooks (0), SAE Technical Papers (1), R&A Tech Reports (5), SME Technical Papers (0), World Automotive News (1), FordSpeak (19), Web Resources (1), E-Journals (0)

    Hope this helps somewhat.

  2. 29 May, 2012 at 09:12

    Hi Anna,

    thank you for the pointer.
    Unfortunately, I cannot display the page.
    If you have any public documentation describing the criteria or listing the acronyms, do not hesitate to share!

    Thanks again

    Have a nice day

    Marina

  3. 29 May, 2012 at 10:09

    From LinkedIn: The WebGenre R&D Group [http://lnkd.in/J5kuSW]

    Alexander Osherenko • Hi,

    1) I would first differentiate between the scopes of the context (local and global). Considering the local context of an utterance, we could talk about context, for example, in terms of name resolution. Say, the utterance “We have to consider it” expresses a strong objection given the context “It is a very serious objection.”

    2) Otherwise, the context can be global, as in “A model of textual affect sensing using real-world knowledge” by Hugo Liu, Henry Lieberman & Ted Selker (2003): the fact “Some people find ghosts to be scary.” This context is learnt from birth, and we apply it unconsciously when making a decision.

    Regarding tools for finding the context, I would first look for papers about name resolution. If you look for the global context, you could consult the three sources of real-world knowledge given in Liu’s paper (Cyc, Open Mind Common Sense (OMCS), and ThoughtTreasure).

    Cheers
    Alexander

    Alexander Osherenko • Sorry, I forgot to mention the social context (social status) in my previous message. I would place it between the global and the local context. In simple words, a person in a society or an organization is characterized by a social context that is given by a specific hierarchical position.

    Social context can be measured relatively simply (at least, its initial value) because it is defined by your hierarchical position in an organization.

    Alexander

    Marina Santini • Hi Alexander,

    thanks for your suggestions and insights. Anaphora resolution is certainly an important factor in information understanding.

    However, the context that I refer to in the post is “textual and communicative context”. In my view, language use, interpretation and understanding vary according to the communicative situation. Genre, sublanguage, domain and other textual dimensions give us hints about the communicative context in which language is used. As I tried to exemplify in the post, the word “spweet” is not a misspelled word if it belongs to a tweet (a genre). It is a special word belonging to the Twitter sublanguage or jargon. If, instead, “spweet” is found in a recipe (a different genre), it is likely to be a misspelled “sweet”.
    In this case, knowing the genre and the sublanguage (i.e. the communicative context) helps disambiguate the word and increases information understanding.

    Cheers, Marina

    Alexander Osherenko • Hi Marina,

    I see. I assume identification of misspelled words as such is not a problem: for example, I can easily identify the unintentional misspelling “stweet” if I have a dictionary containing the word “spweet”.

    More interesting is the identification of such “confusions” as the “spweet”/“sweet” case you pointed out. I would define other “dictionaries” where I specify not only a linguistic part, such as the word “spweet”, but also a typical communicative situation, with genre etc., where the word can occur. These dictionaries could also specify the frequency of a particular word in a particular genre, so that a specific word occurrence can be flagged as at least unusual. Furthermore, since a final solution can be very comprehensive, I would prefer a semiautomatic option.

    However, such a solution can be inappropriate once I think about composing such dictionaries. Do you have suggestions for some typical domains or communicative situations where consideration of confusions does make sense?

    Cheers, Alexander

    Håkan Jonsson • Hi Marina,

    Can you please specify what you mean by “genre”?

    Regards,
    Håkan

    Marina Santini • That’s a big question, Alexander. Ambiguity is pervasive. Just think of the metaphors used in the internet/computer sublanguage, such as “mouse”, “surf”, “sleep”, “hibernation”, “store”, “save”, “clean”, “dirty”, etc.

    If the aim is to identify the context and the communicative situation in which a text has been issued, my current feeling is that starting from text classification rather than word disambiguation might be more straightforward…

    Cheers, Marina

    Marina Santini • @Håkan: what is genre? that’s another big question 🙂

    I will give you a short answer. Do not hesitate to let me know if you need more documentation.

    When talking about texts, genre is a textual dimension based on acknowledged conventions that SHAPE the content in expected ways. CONVENTIONS AND EXPECTATIONS are two fundamental components of the concept of genre. There are also other components, e.g. the COMMUNITY, but let me focus on conventions and expectations here.

    A couple of simple examples:
    A dissertation is a genre. It has acknowledged conventions, like a TOC, an introduction, a number of descriptive or argumentative chapters, a conclusion. There can be optional conventions, such as acknowledgements, a dedication, a certain number of pages, etc. Those who browse a dissertation expect these conventions. These conventions can be valid for any topic, any domain, any language, any academic style/register. If you do not write an introduction in the expected way (e.g. giving a high-level overview without going into too much detail), your supervisor will ask you to rewrite it. If you use colloquial language, your supervisor will (probably) ask you to rewrite it.

    A newspaper article is a different genre that has completely different conventions and triggers different expectations. It is shorter than a dissertation. Usually there is a catchy headline that triggers curiosity or is provocative, and the most important facts/news are at the beginning of the article, with no introduction. Details are provided further down in the article; there is often reported speech, etc. Readers expect this shaping of the content and often read only the headline and the first paragraph.
    (I will not go into the details of press subgenres, such as editorials, reportages, interviews, etc.)

    Knowing the genre of a text, and not treating a text like a shapeless bag of words (BOW), can help us identify where to find the most important information, how to disambiguate the language, which words to boost rather than others, which documents are most relevant to a query, etc.

    In short, genres have different communicative purposes and are produced in different contexts for different situations. By knowing the genre of a text, we can reconstruct the communicative context.

    Feel free to ask more questions…

    Cheers, Marina

    Alexander Osherenko • I agree with you: statistical text classification can be more beneficial for identifying the context and the communicative situation. Nevertheless, I also wouldn’t completely neglect word disambiguation.

    I would probably combine these two types of approach and do something like what Jan Wiebe, in her approach to identifying subjectivity, calls “strong and weak cues of subjectivity”. Her approach, as far as I remember, considered weak cues and strong cues to identify subjective phrases. In the current case I would do a similar thing: install a statistical genre identification engine (strong cue) supported by semantic pattern recognition (weak cue). Can you follow?

    Cheers, Alexander

    Marina Santini • @Alexander: yes, why not?

    Alexander Osherenko • I even assume that a 2-level scheme is more beneficial for identifying context: the first level computes subresults of identification, for example statistical subresults after applying a Bag-of-Words model, and semantic subresults such as the occurrence of typical word patterns. The second level summarizes the subresults and computes a final result, for instance using a rule-based approach.

    Cheers, Alexander
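
    A minimal Python sketch of the two-level scheme Alexander describes (the statistical level is stubbed out; in a real system it would be a trained bag-of-words classifier, and all patterns and labels here are invented):

        import re

        # Level 1a: statistical subresult (stub for a trained BOW genre classifier).
        def statistical_genre_scores(text):
            return {"tweet": 0.7, "recipe": 0.3}

        # Level 1b: semantic subresult: occurrence of typical word patterns.
        TWEET_PATTERNS = [r"@\w+", r"#\w+", r"\bRT\b"]

        def pattern_evidence(text):
            return sum(bool(re.search(p, text)) for p in TWEET_PATTERNS)

        # Level 2: rule-based summary of the subresults.
        def classify(text):
            scores = statistical_genre_scores(text)
            best = max(scores, key=scores.get)
            if best == "tweet" and pattern_evidence(text) == 0:
                return "uncertain"   # strong cue not supported by weak cues
            return best

        print(classify("RT @user check this #deal"))  # -> tweet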

    Myungdae Cho • Interesting thread. Yes, contextuality is important. How about situationality? This should also be considered. And on top of genre, sublanguage and domain, how about the polysemy problem? Every culture has this problem. Semiotics knowledge would help ^^

    Pedro Marcal • @Alexander, I also use a two-level approach based on statistics. The first is a context-free parse, followed by a semantic parse. In the first level I achieve disambiguation by a translation into Chinese, where the written text seldom has ambiguous meanings.

    Alexander Osherenko • @Myungdae, great news: no problem for statistical approaches. They are real “desperados”: they learn in every case, bad or good. They can learn to identify situational, polysemous, pragmatic, cultural or other issues. The main problem is the composition of good learning corpora containing situational, polysemous, pragmatic, cultural or other learning information.

    Myungdae Cho • @Marina Is it possible for me to put your question in a slightly different way? Say: what are the factors involved in constituting “meaning”? What is happening ‘conceptually’ between ‘symbols’ and ‘referents’? To answer this, might we cite the things we have discussed here? Genre, sublanguage, domain, the scopes of the context (local and global), the social context, situationality, polysemy, anaphora, textual dimensions, subjectivity, or intersubjectivity…

    Marina, what’s the difference between sublanguage and domain?

    Cheers

    Myungdae Cho

    Marina Santini • Hi Myungdae,

    Thanks for your interesting questions.

    Concepts like genre, domain, sublanguage, register, style, text types, etc. have been defined in many different ways over the centuries.

    I would like to conceptualize them for a specific aim: automatic text classification/categorization. Nowadays, most automatic text classification/categorization is based on TOPIC. In my view, topic is not enough to “represent” a text/document and to meet the information needs of an information society.

    David Lee tried to disentangle these concepts for the manual classification of the British National Corpus (http://llt.msu.edu/vol5num3/pdf/lee.pdf), and he highlights some of the overlapping zones across these textual dimensions.

    My long-term goal is to find definitions/characterizations that can be implemented COMPUTATIONALLY in an intelligent information system. Therefore my view is more related to the progress of NLP and IR/IE and the like than to purely theoretical stances.
    This means that all the computational aspects of word-sense disambiguation, anaphora resolution, subjectivity, intertextuality, etc. are very important in my perspective. They are “features” that can represent my classification problem, which focuses, at the moment, on genre classification, domain classification and sublanguage classification.
    While *genre* takes care of the “organization” of the content of a text in certain predictable ways, domain is more focused on a specific field, and sublanguage accounts for the audience. Situationality (a communicative situation) can be more linked to the use of a specific “register”.
    Let me give you some examples:
    A domain like “medicine” has many different topics (heart, lungs, cancer, etc)
    A domain like “history” has infinite topics (Napoleon, the Romans, the war in Iraq etc)
    etc.
    A sublanguage like the one that “doctors” use among themselves reflects the medical domain.
    A sublanguage like the one that “nerds” use is a kind of “computerese” jargon.
    Register is more linked to the use of language in a certain situation: a formal speech for the opening of an academic year, the colloquial language used by close friends, etc.
    Genre refers to the compositional conventions we use in these different contexts: a public speech has conventions that differ from a telephone conversation, etc.

    All these dimensions together give us a multidimensional view of a text (many facets that can be coded in meta-tags), and they are all based on language use in different contexts. Ideally, the analysis and classification of these language uses can help us reconstruct the communicative context of a text.

    Oh God what a long answer 🙂
    Pls feel free to ask for additional details, if I have been unclear.

    Cheers, Marina
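
    As a sketch, the multidimensional view Marina describes could be coded as a flat set of meta-tags per text (the facet values below are invented, echoing her examples):

        text_metadata = {
            "text_id": "doc-001",
            "genre": "newspaper article",       # compositional conventions
            "domain": "medicine",               # field, with many topics inside
            "sublanguage": "doctors' jargon",   # audience-oriented language
            "register": "formal",               # situational use of language
            "topic": "cardiology",              # what most classifiers stop at
        }

    Each facet would be filled by a dedicated classifier, giving the multidimensional representation discussed above.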

    Myungdae Cho • Thank you very much, Marina.
    Yes, we need to go beyond topical relevance. It is not good enough to meet the needs of information users in this information space. I think we should consider genre, sublanguage, domain, register, anaphora, subjectivity, intertextuality, etc. when we want to enrich the text in question.

    Now the questions again are: is there a way to incorporate these factors in a coherent way? Is there any computational way to disambiguate polysemous words?

    Currently I am using SKOS (Simple Knowledge Organization System), in which some topicality can be dealt with and some semantic relations are represented. Following your precious comments, for a text (or a term, or a concept) we can add the factors we discussed here in a semantic way.

    For example, we could create skos:genre, skos:sublanguage, skos:domain, skos:register, etc. It sounds a little bit awkward, but it could enrich a term we are talking about.

    Just a thought ^^
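
    A minimal sketch of this idea with Python’s rdflib. Since SKOS itself does not define genre/sublanguage/domain properties, a small extension vocabulary is declared alongside it (the namespace and property names are hypothetical):

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF, SKOS

        CTX = Namespace("http://example.org/contextify#")  # hypothetical extension

        g = Graph()
        g.bind("skos", SKOS)
        g.bind("ctx", CTX)

        term = URIRef("http://example.org/terms/spweet")
        g.add((term, RDF.type, SKOS.Concept))
        g.add((term, SKOS.prefLabel, Literal("spweet", lang="en")))
        g.add((term, CTX.genre, Literal("tweet")))
        g.add((term, CTX.sublanguage, Literal("Twitter jargon")))
        g.add((term, CTX.domain, Literal("social media")))

        print(g.serialize(format="turtle"))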

    Marina Santini • @Myungdae: I am not familiar with SKOS yet, but I will explore it in the future.

    By the way, in last week’s blog post I introduced DaisyKB (http://www.forum.santini.se/2012/05/mining-query-logs-query-disambiguation-understanding-through-a-kb/)
    In that post I suggested using a KB to address some of the problems of query interpretation and understanding. But my vision of DaisyKB (which at the moment is still at a pre-study stage) is to be a flexible, unified, standardized, multilingual knowledge base where words and their textual characterization, specific factors and disambiguation cues are stored. If you look at the example in that post, the word “bank”, you can see that each entry is divided by sense, and each sense can have as many subfields as needed, e.g. a synset (which is linked to a specific sense), phraseology, etc.
    Such a resource must be devised with a flexible structure, so that any update (e.g. the addition of a new subfield, the addition of a new sense, etc.) can be done rapidly and consistently…

    I will publish some slides on DaisyKB when I have them ready.

    Cheers, Marina
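
    A sketch of what a flexible, sense-divided entry for “bank” might look like (all field names are invented for illustration; the actual DaisyKB structure is still at a pre-study stage):

        daisykb_entry = {
            "word": "bank",
            "senses": [
                {
                    "sense": "financial institution",
                    "synset": "bank.n.02",     # WordNet-style synset id
                    "domain": "finance",
                    "phraseology": ["open a bank account", "bank loan"],
                },
                {
                    "sense": "sloping land beside a body of water",
                    "synset": "bank.n.01",
                    "domain": "geography",
                    "phraseology": ["river bank"],
                },
            ],
        }

        # A flexible structure lets new subfields be added without breaking entries:
        daisykb_entry["senses"][0]["register"] = "neutral"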

    Alexander Osherenko • @Marina As far as I understand, your idea is to “measure” meaning multidimensionally in terms of genre, sublanguage, etc. Do you have an idea of which dimensions are beneficial in a taxonomy with such dimensions, or of how to compose it? How would you assess the taxonomy regarding exhaustiveness, for example, if you consider understanding of meaning in the context of culture or polysemy?

    Cheers, Alexander

    Marina Santini • Alexander, I have not decided on the taxonomy of subfields because this must be discussed in a large panel. However, it is not important to define an exhaustive taxonomy from the beginning, since DaisyKB must be designed to be continuously updated according to upcoming needs.

    As for multiculturalism and multilinguality, my idea is to have one DaisyKB per language. Each KB is connected to the others through an abstract representation which, I would suggest, is the English language. If we take a base language on which to hinge the relations, cultural and semantic discrepancies can be treated as they are treated in bilingual dictionaries, so there is some background experience to build upon.

    Cheers, Marina

    Myungdae Cho • I had a quick look at your DaisyKB. To me it would be better to use ontology technology rather than XML. SKOS certainly would help: it is actionable metadata. As you said, DaisyKB can be used for indexing, word-sense disambiguation, query disambiguation and analysis, multilingual queries, and lots more… I believe that web search engines, enterprise search engines, domain-specific information systems and other language-based applications could benefit from such a resource.

    Cheers, Myungdae

    Ian Fry • Great thread. My personal interest is in the KM field, and in particular in how to recognise when knowledge is being applied in an unsafe context. It may be used outside its original scope, but it may also be “neutral” when used.

    Marina Santini • Hi Ian,
    thank you for your interest! Can you give us some examples of an “unsafe context”? Unsafe contexts make me think of the dark web and terrorism, or mobbing and phishing… Are these the contexts you are working with?

    Cheers, Marina

    Ian Fry • Let us suppose we have web content describing a “forklift” and its operation. If somebody searches for “crane operation”, then I want to make sure they do not see the “forklift” content anywhere in the returned list, not even on page 120. It is dangerous to use that content. The reverse is not true: if you search for “forklift” and use the regimes for a crane, they may not be applicable, but they are not dangerous.

    So the challenge is to describe the “forklift” context in a manner that has executable rules.
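
    One way to make Ian’s example executable, as a minimal Python sketch (the rule table and document fields are invented):

        # Asymmetric safety rules: forklift content is unsafe for crane queries,
        # while crane content for forklift queries is merely inapplicable.
        UNSAFE_FOR = {
            "crane operation": {"forklift"},
        }

        def filter_results(query, results):
            """Drop documents whose context is unsafe for this query."""
            blocked = UNSAFE_FOR.get(query, set())
            return [doc for doc in results if doc["context"] not in blocked]

        docs = [{"title": "Forklift operation", "context": "forklift"},
                {"title": "Crane rigging basics", "context": "crane"}]
        print(filter_results("crane operation", docs))  # forklift doc is dropped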

    Robert Foster • This is a great thread, Thanks Marina !

    My hope is that tools will start to appear that can produce really useful sentiment analysis results from the constantly growing corpus of social media, which uses a constantly evolving English language. It sounds like Alexander’s call to ‘identify the most appropriate dimensions for a particular query’ is on the right track, but all the results I have seen so far seem to be ‘shooting in the dark’… and without a standard definition of the English language itself, it is hard to imagine how anything else can be offered.

    My particular interest is in software that offers suggestions for all possible ‘construals’ (Croft and Cruse 2004) that readers might take from an identical phrase within a particular context. My intuition is that description of the various dimensions of ‘context’ included earlier will be fundamental in this work.

    Do others have suggestions for systems that are approaching this goal? Or perhaps you think it has already been reached!

    Thanks in advance for your suggestions…

    Elliot Kulakow • I’d like to double back on the idea of topic versus context here for a moment. I think it’s a little harsh to discard the importance of topic modelling off the bat just because it is insufficient to provide a complete solution. Palantir has gotten really far and made a huge impact just through a combination of LDA, human markup, and UI. In my opinion, the key distinction between topic and context is not one of meaning but one of locality. Topic is a very global idea that applies to the entire document and is independent of word order. Any good model of context will have to move beyond this restriction, whether by statistical or linguistic approaches.

    Personally, I believe that the linguistic approaches are going to be primarily of use in pre-processing raw text in order to make the statistical approaches more sensitive to distinctions which may be very rare even in a large corpus. In the end, there are really only two things that you need context for: finding similar phrases (as opposed to just similar words or similar documents), and providing an implicit graph structure on top of which to drive inference for some supervised learning/clustering problem, à la sentiment analysis or document filtering.

  4. 29 May, 2012 at 10:34

    From LinkedIn: Text Analytics group [http://lnkd.in/zx-7VS]

    Vineet Yadav • You can look into word sense disambiguation applications. Word sense disambiguation (http://en.wikipedia.org/wiki/Word-sense_disambiguation) algorithms are useful for guessing a word’s sense on the basis of context.
    The problem is divided into two parts:
    * word sense disambiguation: disambiguate words that are commonly used in the language

    * entity disambiguation: disambiguate named entities on the basis of context.

    OpenCog (http://wiki.opencog.org/w/Word_sense_disambiguation) can be used for word sense disambiguation. It is based on Mihalcea’s algorithms (“PageRank on Semantic Networks, with Application to Word Sense Disambiguation”, “Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data Labeling”), which use the Brin-Page PageRank algorithm for WSD. You can find more information on OpenCog word sense disambiguation on the OpenCog blog (http://blog.opencog.org/2009/01/12/determining-word-senses-from-grammatical-usage/).
    Ted Pedersen’s SenseRelate (http://senserelate.sourceforge.net/) is also a useful word sense disambiguation tool. You can also look at their publications (http://www.d.umn.edu/~tpederse/senserelate-pubs.html). UKB is another graph-based word sense disambiguation tool (http://ixa2.si.ehu.es/ukb/). You can also look at WordNet (http://wordnet.princeton.edu/) and WordNet-related projects (http://wordnet.princeton.edu/wordnet/related-projects/).
    Apart from that, you can use similarity measures like Wu-Palmer similarity, LCH similarity, and path-based similarity via NLTK for word sense disambiguation. For that, check out Jacob Perkins’ book (http://www.packtpub.com/python-text-processing-nltk-20-cookbook/book), pages 19-21.
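
    For reference, the similarity measures Vineet mentions are available directly on WordNet synsets in NLTK (this assumes the WordNet corpus has been downloaded via nltk.download("wordnet")):

        from nltk.corpus import wordnet as wn

        dog = wn.synset("dog.n.01")
        cat = wn.synset("cat.n.01")

        print(dog.wup_similarity(cat))    # Wu-Palmer similarity, ~0.86
        print(dog.path_similarity(cat))   # shortest-path similarity, ~0.2
        print(dog.lch_similarity(cat))    # Leacock-Chodorow (same POS required)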

    Alchemy provides a solution for named entity disambiguation (http://www.alchemyapi.com/api/entity/disamb.html). Alchemy uses resources like DBpedia (http://dbpedia.org/About), OpenCyc (http://www.cyc.com/opencyc) and other ontological resources for named entity disambiguation. You can also use Wikipedia to resolve named entities like person names, location names, etc. You can find some relevant papers in which Wikipedia is used for named entity resolution (http://scholar.google.co.in/scholar?q=wikipedia+named+entity+disambiguation&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on). Researchers have used similarity measures like path-based similarity with Wikipedia for named entity disambiguation.

    Marina Santini • Hi Vineet,

    I really appreciated all the pointers. They are very useful for my research.

    However, I have the impression that word sense disambiguation and named entity resolution refer to the “co-text” rather than the “context”. As mentioned elsewhere, the context that I refer to in the post is “textual and communicative context”. In my view, language use, interpretation and understanding vary according to the communicative situation. Genre, sublanguage, domain and other textual dimensions give us hints about the communicative context in which language is used. As I tried to exemplify in the post, the word “spweet” is not a misspelled word if it belongs to a tweet (a genre). It is a special word belonging to the Twitter sublanguage or jargon. If, instead, “spweet” is found in a recipe (a different genre), it is likely to be a misspelled “sweet”.
    In this case, knowing the genre and the sublanguage (i.e. the communicative context) helps disambiguate the word and increases information understanding.

    In short, my concept of context is linked to pragmatics and textuality.
    If I have been unclear, pls let me know.

    Have a nice day, marina

    Elliot Kulakow • This is not an easy problem, and as far as I’m aware there is currently no existing tool for generating this information. If you are just looking for document topic, check out some of David Blei’s tools for topic modelling: http://www.cs.princeton.edu/~blei/. LDA obviously works, or if you are looking for something faster with somewhat different assumptions, his HDP code is OK. These tools don’t give you “context”, though, just the topics. Context is as yet still an unsolved problem, at least from a product perspective, as far as I’m aware. Let me know if you have any business applications for this, as it’s an area I am also working on.
    Elliot
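
    A minimal run of LDA along the lines Elliot suggests, here via the gensim library rather than Blei’s original code (the toy documents are invented; note that the output is topics only, not context):

        from gensim import corpora, models

        texts = [["surf", "wave", "board", "beach"],
                 ["surf", "web", "browser", "click"],
                 ["wave", "beach", "sand", "sun"]]

        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(t) for t in texts]

        lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
        for topic_id, words in lda.print_topics():
            print(topic_id, words)   # word distributions per topic, nothing more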

    Evgeny Fedosov • Thanks a lot, Vineet, for very interesting and helpful information!

    Vineet Yadav • Word sense disambiguation also uses context information, in that it considers neighboring words to detect the sense of a word. Word sense disambiguation may not work for multi-domain or multi-genre text, because resources for different genres, languages and domains are not available. I think you can use a word sense disambiguation approach with a slight modification.

    Step 1) Form a word sense dictionary for different domains. You can start with WordNet: collect all the words of interest and translate those words and word senses for the different genres, languages and domains. In the latter case you need to create the resource manually.
    Step 2) Train a genre/domain classifier. Use the classifier to identify sentences and the genre associated with each sentence.
    Step 3) Identify paragraphs or groups of sentences which belong to one genre. For this, you can use other features to find communicative segments:
    a) pronoun usage: pronouns refer to nouns and other entities, so you can use pronoun use to segment the text. Segment-initial sentences should not contain pronouns, as they should not refer to entities of a previous segment.
    b) change of word usage: you can look at how word usage changes and use that to find communicative segments.
    Step 4) Apply word sense disambiguation to each communicative segment (a minimal sketch follows after these steps).
    Step 5) Find lexical chains in each segment and across similar communicative segments, and use them to rank word senses.
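
    Step 4 could be sketched with NLTK’s implementation of the Lesk algorithm, applied per communicative segment (this assumes the punkt and wordnet data are downloaded; the segment detection of Steps 2-3 is stubbed by a single string here):

        from nltk.tokenize import word_tokenize
        from nltk.wsd import lesk

        def disambiguate_in_segment(segment, target_word):
            """WSD restricted to one communicative segment (pipeline Step 4)."""
            tokens = word_tokenize(segment)
            return lesk(tokens, target_word)   # best-matching WordNet synset

        seg = "I moved the mouse to click the icon on the screen."
        print(disambiguate_in_segment(seg, "mouse"))  # ideally a device sense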

    Marina Santini • @Elliot: I will keep you informed about context. I will post a summary of tools/suggestions in the next few weeks.
    @Vineet: thanks for the suggested pipeline. I will think about it.

    Elliot Kulakow • Really glad I found this group. I’m mostly thinking about this as a purely statistical problem, and it definitely helps to hear the approaches of people with a more linguistic background.

    @Marina – am I correct in understanding that LingoMotors was the Web 1.0 attempt at a context search engine, and that it hasn’t really been attempted since? Whatever happened to them?

    Marina Santini • Hi Elliot,

    I do not know what happened to LingoMotors. When I was there, the aim was the creation of a semantic + natural language search engine based on Pustejovsky’s Generative Lexicon. The idea was, and still is, fascinating…

    Cheers, Marina

    Marina Santini • @Elliot: if you are interested in more discussion on this topic, there has been a parallel thread here: http://lnkd.in/J5kuSW

    Timothy Vogel • Wonders how anyone can tackle this problem until they define for us what they mean by context. I’ve been querying “cognitive scientists” for quite a while now, challenging them to define “context”. They always dissemble into something that sounds a lot like the classic non-definition of pornography:

    “I can’t tell you what context is, but I can always recognize it when I see it.”

    To me it’s a pure matter of expectation analysis.

    “Context is an ‘a priori’ set of expected outputs from a random or even independent set of inputs.”

    That’s why context is so hard to nail, since the “a priori” expectation is all too often thought to exist in the answer as opposed to the question-and-questioner complex.

    If you can “know”, or at least semi-accurately gauge, the question-to-questioner similarity space, then, and only then, will you have a chance at getting at true context. Transactionally arrayed data-streams of online content preference afford us a glimpse into that very complex, and so context resides not so much in the query as in the pre-query phase.

    We at heur-e-ka/Readware are already extracting context in a way that enhances any application that seeks to draw meaningful inference from the data it has at hand!

    TV

    Marina Santini • @Timothy: interesting comment. My personal opinion is that situational context (i.e. the situation that triggers the production of a text) can be inferred from how the language is used in the text itself. Therefore a text can be contextualized and enriched through language-based inferences. What is your approach at heur-e-ka/Readware?

    Cheers, Marina

    Timothy Vogel • Marina,

    “…the sounds in the names we give to things are relative cognitive mappings of those things…”

    http://www.youtube.com/my_videos_edit?video_id=2CIFYJxGPeo&ns=1&feature=mhsn
    http://commonsensical.wordpress.com/adis-semantic-theory/

    video and lexical! ;^D

    best,

    TV

    Timothy Vogel • Just re-read my initial comment and apologize for the typos; it was late. UGH!!! For reference:

    onto-logica.com
    http://www.youtube.com/my_videos_edit?video_id=2CIFYJxGPeo&ns=1&feature=mhsn

    Miti/Readware
    http://commonsensical.wordpress.com/adis-semantic-theory/

    Marina Santini • Thanks, Timothy.
