Thesis Review: Resolving Power of Search Keys

Heppin, Karin Friberg (2010). Resolving Power of Search Keys in MedEval a Swedish Medical Text Collection with User Groups: Doctors and Patients. PhD thesis, Gothenburg University, Sweden

Opponent Stefan Schulz; Defence Presentation:

The thesis “Resolving Power of Search Keys in MedEval a Swedish Medical Text Collection with User Groups: Doctors and Patients” opens with crucial questions in Information Retrieval (IR). The general question is:
1. What type of search keys are effective when searching for information in a collection of documents?
Language-specific questions concern how to handle compounds, since around 10% of the words in Swedish running text are compounds.
The important questions here are:
2. What is the best way to treat compounds?
3. When is it beneficial to use individual compound constituents as search keys and when does it ruin a search?

The thesis describes how MedEval, a Swedish medical test collection, was created to answer these (and other) questions; it also presents a series of pilot studies to show what kind of studies can be performed with such a collection. The pilot studies focus on search key effectiveness. The results show that:
• a good search key is a term that expresses one of the main subjects of the topic in question at the same level of specificity, i.e. a term whose frequency lies in the middle range of all term frequencies and whose occurrences are clustered in few documents;
• a bad search key is a term that brings noise to the top of the ranked list of retrieved documents. This is common with high-frequency terms, i.e. terms with a broad, non-specific meaning, and with terms spread evenly over the documents;
• a search key that is very infrequent is neither good nor bad;
• compound constituents whose collection frequencies fall outside the middle range of the collection frequencies of all terms do not make queries more effective and should not be used. In addition, some compounds, especially occasional and compositional ones, reveal their information only after they are split, whereas for other compounds splitting destroys the information.
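
The findings above can be read as a rough decision rule. The following is a minimal Python sketch of that reading, not the method used in the thesis: the function names, the thresholds `low_cf` and `high_df_share`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter

def key_statistics(docs):
    """Return {term: (collection frequency, document frequency)} for tokenized documents."""
    cf, df = Counter(), Counter()
    for tokens in docs:
        cf.update(tokens)        # every occurrence counts
        df.update(set(tokens))   # each document counts once per term
    return {term: (cf[term], df[term]) for term in cf}

def classify_key(term, stats, n_docs, low_cf=2, high_df_share=0.5):
    """Label a candidate search key. Both thresholds are illustrative assumptions."""
    if term not in stats:
        return "unseen"
    cf, df = stats[term]
    if cf < low_cf:
        return "infrequent"   # very rare: neither good nor bad
    if df / n_docs >= high_df_share:
        return "noisy"        # spread over many documents: likely to bring noise
    return "candidate"        # frequent enough and clustered in few documents

# Toy collection (hypothetical): each document is a list of tokens.
docs = [
    ["patient", "anemi", "anemi", "behandling"],
    ["patient", "diabetes", "behandling"],
    ["patient", "anemi", "järn"],
    ["patient", "hjärta", "behandling"],
    ["patient", "feber"],
]
stats = key_statistics(docs)
for term in ("patient", "anemi", "diabetes"):
    print(term, classify_key(term, stats, len(docs)))
```

On this toy data, "patient" occurs in every document and is labelled noisy, "anemi" is concentrated in two of the five documents and is labelled a candidate, and "diabetes" occurs only once and is labelled infrequent.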

The thesis contains five parts, 15 chapters and two appendices.
In Chapter 1 (Introduction) research questions are presented with a summary of the final contributions.
Part I – Background – provides the background of the thesis in Information Retrieval (Chapter 2), Evaluation (Chapter 3) and Linguistics and IR (Chapter 4). Chapter 5 (Research in medical information retrieval and in doctor/patient language) summarizes previous work in the specific area of medical information and doctor/patient language. Chapter 6 (Resolving Power) concludes Part I with a review of the different definitions of key goodness given in the literature.

Part II – The Guide – contains Chapter 7 (Drawing the Roadmap) and Chapter 8 (Travel Instructions), which summarize the goals and help orientate the reader in the rest of the thesis.

Part III – Test Environment – includes Chapters 9 and 10. Chapter 9 (Tools and Resources) describes the tools and resources that were used: the Indri index builder and retriever of the Lemur Toolkit (p. 96); the trec_eval tool (p. 97); the Query Performance Analyser (QPA), which is used to visualize and compare the effectiveness of individual queries (p. 98); VisualVectora, “a visualization tool that allows the user to visualize retrieval results and allows user to compare runs with several queries for each topic” (p. 102); MedLex (in Swedish), a workbench for lexicographic work composed of a lexical database containing 4500 medical lemmas and a medical corpus of different genres, such as medical journals, teaching material, guidelines, patient FAQs, blogs, health care online information, etc. (p. 104); and the Swedish MeSH (the Swedish translation of the American MeSH) (p. 105). Chapter 10 (Creating the MedEval test collection) describes the construction and composition of the MedEval test collection. MedEval has a double index: one with full compounds and one in which the compounds have been split. In the MedEval collection, documents were assessed not only for relevance but also for target group, namely lay persons or medical professionals (p. 132). As a result, MedEval can return six different results for the same query, one for each of the six possible combinations of index and user group.
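
To make the double index concrete, here is a small Python sketch under stated assumptions. The decompounding lexicon `SPLITS`, the toy documents, and the choice to store only constituents in the split index are hypothetical. The third user-group setting ("none") is also an assumption: the review names two target groups, and a setting without a group restriction is inferred here only because two indexes times three settings gives the six combinations mentioned.

```python
from collections import defaultdict
from itertools import product

# Hypothetical decompounding lexicon: compound -> constituents.
SPLITS = {"blodbrist": ["blod", "brist"], "hjärtinfarkt": ["hjärt", "infarkt"]}

def build_index(docs, split_compounds):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            # Assumption: the split index keeps only the constituents.
            parts = SPLITS.get(token, [token]) if split_compounds else [token]
            for part in parts:
                index[part].add(doc_id)
    return dict(index)

docs = {"d1": ["blodbrist", "behandling"], "d2": ["blod", "prov"]}
full_index = build_index(docs, split_compounds=False)
split_index = build_index(docs, split_compounds=True)

# Assumed user-group settings; "none" is inferred from the count of six.
USER_GROUPS = ["none", "doctors", "patients"]
run_configs = list(product(["full", "split"], USER_GROUPS))

print(sorted(split_index["blod"]))  # documents reachable via the constituent
print(len(run_configs))             # number of index / user-group combinations
```

In this toy example the key "blod" finds only d2 in the full-compound index but both documents in the split index, which is the kind of difference the two indexes are meant to expose.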

Part IV – Pilot Studies – presents a range of experiments to show the potential of the MedEval test collection.
Chapter 11 (Constructing Facets) describes some of the methods used for dividing topics into facets and for choosing search keys. Baseline queries, containing all facets, were constructed for each topic. To study the impact of individual search keys within a facet, baseline queries were constructed which contained all terms within the facets (an example of a baseline query is shown in Fig. 47 on p. 147). The author also lists problems that arose in the construction of facets (pp. 148-150).
Chapter 12 (Looking at Facets and Terms) describes the initial survey of facets and search keys. First, the queries were run through the Indri search engine for the two indexes, with and without decomposed compounds. Then, additional experiments are described on the effects of compound decomposition (pp. 158-161, 172) and on how search keys and facets behave with respect to other search keys and facets, respectively (pp. 162-169).
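
As an illustration of how a faceted baseline query might be assembled, here is a hedged Python sketch that emits a query string using the Indri operators `#combine` and `#syn`. Whether the thesis's baseline queries take exactly this form is not stated in this review (Fig. 47 of the thesis shows the actual example); the helper names and the example facets are hypothetical.

```python
def build_baseline_query(facets):
    """Build an Indri-style query: one #syn group per facet, joined by #combine."""
    groups = []
    for terms in facets:
        if len(terms) == 1:
            groups.append(terms[0])  # a single key needs no synonym operator
        else:
            groups.append("#syn( " + " ".join(terms) + " )")
    return "#combine( " + " ".join(groups) + " )"

def drop_key(facets, key):
    """Return the facets with one search key removed, to study that key's impact."""
    reduced = [[term for term in terms if term != key] for terms in facets]
    return [terms for terms in reduced if terms]  # discard facets left empty

# Hypothetical topic with two facets.
facets = [["anemi", "blodbrist"], ["behandling"]]
print(build_baseline_query(facets))
print(build_baseline_query(drop_key(facets, "blodbrist")))
```

Comparing the retrieval results of the full query with those of the query with one key dropped gives a simple way to estimate the contribution of that key.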
Chapter 13 (Search Key Behaviour) describes the behaviour and impact of different types of search keys. Experiments support the conclusion that the terms in the middle range of frequencies have the highest probability of discriminating documents relevant to the topic.
Chapter 14 (Looking at Doctor and Patient Documents) focuses on differences between the language of documents assessed as having doctors as their target group and that of documents assessed as having patients as their target group. The results show that it is easier to find features specific to expert documents than to non-expert documents. The author states: “To find the expert documents it tends to be an effective strategy to use many synonyms, to use compounds and longer words, to use the names of specific drugs, and to use trigger phrases typical of the professional written language. On the other hand, there does not seem to be as many specific strategies to find non-expert documents.” (pp. 215-216)
Part V – Conclusions – contains Chapter 15 (The End of the Road) where the author summarises and lists the answers to the research questions asked in the introduction.
Appendix A contains the list of topics used for evaluation and Appendix B includes graphs showing the ideal cumulated gain for 30 topics.
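
For readers unfamiliar with the measure plotted in Appendix B, the sketch below shows how cumulated gain and its ideal counterpart are usually computed from graded relevance judgements. It is a generic illustration in Python, not code from the thesis; the function names and the 0 to 3 relevance scale in the example are assumptions.

```python
from itertools import accumulate

def cumulated_gain(gains):
    """Running sum of relevance grades down a ranked result list."""
    return list(accumulate(gains))

def ideal_cumulated_gain(all_judged_gains, depth):
    """Cumulated gain of the best possible ranking: highest grades first."""
    ideal_order = sorted(all_judged_gains, reverse=True)[:depth]
    return list(accumulate(ideal_order))

# Hypothetical grades (0 = not relevant, 3 = highly relevant).
run_gains = [3, 0, 2, 1]        # grades of the documents a system returned, in rank order
judged_gains = [1, 3, 0, 2, 3]  # grades of all judged documents for the topic

print(cumulated_gain(run_gains))
print(ideal_cumulated_gain(judged_gains, depth=4))
```

The gap between the two curves at a given rank shows how far a run is from the best ranking that the relevance judgements allow.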

The thesis “Resolving Power of Search Keys in MedEval a Swedish Medical Text Collection with User Groups: Doctors and Patients” provides a comprehensive overview of how to assess the effectiveness of the search keys specified in a query to a medical collection. The thesis is certainly a good starting point for those who want to continue exploring the same topic, since it contains a robust background presentation and a useful breakdown of problems related to medical sublanguage (which are probably shared with other specialized and domain-specific languages, such as the legal or financial sublanguages).
The full acknowledgement of the importance of how language is used in different communicative situations is, in my opinion, the strong point of this research. It is evident from the findings that in medical information retrieval, the domain-specific “literacy” of the audience plays a major role. One major problem for search and retrieval purposes seems to be the rich synonymy. Medical synonyms are often audience-dependent. If a document is written for doctors, a technical term such as Swedish anemi (Eng. anemia) is used, whereas if it is written for patients, a more familiar, colloquial synonym such as Swedish blodbrist (Eng. blood deficiency) is preferred (p. 200). More specifically, the author has investigated variation in register, style and trigger phrases and has found that the effectiveness of search keys is higher when specific, specialized terms are used. This means that the expert user searches from a privileged position, due to his/her domain-specific competence.
I think, however, that both expert and lay users could be better served if the concept of genre were considered in this medical search and retrieval scenario.
On page 104, the author states that MedEval is built on a snapshot of MedLex. MedLex is a corpus built from medical journals, teaching material, guidelines, patients’ FAQs, blogs etc. Consequently, MedEval consists of several genres belonging to the medical domain. Usually the content of genres such as patients’ FAQs and blogs is geared towards patients, while scientific articles and guidelines are written for experts. This means that knowing the genre of a document also gives us information about its target audience. There is usually a correlation among elements such as sublanguage, register, style, vocabulary richness, terminology and the genre of a document (more on this in future posts).
Taking the concept of genre into consideration, I propose the following next steps to continue the profitable investigation initiated by Karin Friberg Heppin:
• providing the genre of a document in the search results so the user can immediately filter out documents that do not match their level of expertise. Presumably a doctor searching for the latest research about a drug is not so much interested in explanatory or instructional genres, such as patients’ FAQs. Conversely, patients’ FAQs might be more suitable for non-expert users wanting to know more about opinions, experience and characteristics of a drug in simple and plain words.
• providing a query suggester where the patients’ search keys are combined with other terms or words that make the query more specific. In this way the lay user is helped to formulate his/her information need and to expand his/her own vocabulary.
• enriching the content of documents with meta-tags that store synonym expansions in the document itself, and also including in the retrieval process the semantic tagging described on page 104, which does not seem to be used in the pilot experiments.
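
The proposals above could be prototyped along the following lines. This is only a sketch in Python under my own assumptions: the genre labels, the genre-to-audience mapping, the synonym table and the result records are hypothetical and do not reflect how MedEval or MedLex actually store their data.

```python
# Assumed mapping from document genre to the audience it usually addresses.
GENRE_AUDIENCE = {
    "patient_faq": "patients",
    "blog": "patients",
    "journal_article": "doctors",
    "guideline": "doctors",
}

# Assumed lay-to-expert synonym table, seeded with the example from p. 200.
EXPERT_SYNONYMS = {"blodbrist": "anemi"}

def filter_by_audience(results, audience):
    """Keep ranked results whose genre maps to the given audience, preserving rank order."""
    return [r for r in results if GENRE_AUDIENCE.get(r["genre"]) == audience]

def suggest_terms(query_terms):
    """Suggest expert synonyms for the lay terms found in a query."""
    return [EXPERT_SYNONYMS[term] for term in query_terms if term in EXPERT_SYNONYMS]

# Hypothetical ranked result list.
results = [
    {"id": "d1", "genre": "journal_article"},
    {"id": "d2", "genre": "patient_faq"},
    {"id": "d3", "genre": "blog"},
]
print([r["id"] for r in filter_by_audience(results, "patients")])
print(suggest_terms(["blodbrist", "behandling"]))
```

A real system would need genre labels for every document, for example from automatic genre classification, which is outside the scope of this sketch.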
