How LLMs Can Improve Search Precision & Recall
For researchers utilizing information retrieval systems for scientific literature acquisition, knowing how to accurately assess the effectiveness of these tools is of paramount importance.
The two key metrics used when evaluating the performance of an information retrieval system or search engine are precision and recall. Precision measures the accuracy of the retrieved information while recall measures the completeness or comprehensiveness of the query results.
In other words:
- Precision = Number of relevant documents retrieved / Total number of documents retrieved
- Recall = Number of relevant documents retrieved / Total number of relevant documents in the corpus
High precision means the system is successful at returning mostly relevant results and minimizing false positives, while high recall means the system can find a substantial percentage of relevant documents, thereby reducing false negatives or significant documents missed.
The ultimate goal of any research endeavor is to attain an ideal equilibrium between optimal precision and recall. However, oftentimes, a sacrifice is being made in one direction or the other.
The Precision-Recall Tradeoff
When researchers utilize information retrieval systems, they may encounter a concept referred to as the "precision-recall tradeoff." This involves a delicate balance between precision and recall, which poses a significant challenge in information retrieval. Interestingly, researchers might not even be aware that they are faced with this predicament.
Increasing precision involves being more selective in the documents retrieved, which may lead to missing some relevant documents (reducing recall). Conversely, increasing recall involves retrieving a larger number of documents, including some irrelevant ones (reducing precision).
In order to strike a balance between these two metrics, the specific application of the search system must align with the researcher’s information needs.
Enhancing Recall & Precision Through AI
Integrating Large Language Models (LLMs) into an information retrieval system can help improve both recall and precision in a number of ways.
1. Natural Language Understanding, Semantic Search, & Contextual Query Expansion
Most (not all) information retrieval systems utilize a combination of TF-IDF (Term Frequency-Inverse Document Frequency1) and Boolean operators2 to execute a user’s search. And since most users do not know how TF-IDF and Boolean operators work, their search queries can result in precision and recall that are often sub-optimal. An LLM can help the user by understanding a natural language query and contextually and semantically expanding it. The initial query in the search below was “use of Botox in the treatment of migraines.”
By asking an LLM (in this example ChatGPT) to expand the search, we have dissected a natural language phrase into a set of terms and synonyms, and from those terms and synonyms have structured a Boolean query, which can be seen at the bottom of the image. The LLM “understands” the search context and creates a query that should improve precision and recall.
2. Document Summarization & Differentiation
When searching for information in large corpora of large documents such as patents or academic literature, even a highly optimized search query can return a tremendous number of results. Summaries can reduce overall reading time by allowing the researcher to quickly triage the results and then focus attention on the most pertinent documents. Prompts can also be engineered to identify similar documents and unique documents among search results and explain to the user why documents are similar or different. Here is an example of an LLM comparing summaries of patents to guide the user to the patents most relevant to their search:
Comparison: Based on the summaries provided, the documents that are most similar to each other are documents 1 and 3. Both documents discuss the use of specific compositions (antibodies in document 1 and botulinum toxin and hyaluronic acid in document 3) for treating specific disorders (migraines and foot pain diseases, respectively). They both propose using these compositions as medications to reduce symptoms and provide relief. The documents that are most unique or different from the others are documents 2, 5, 6, and 7. Document 2 discusses the use of psilocybin for treating various diseases and disorders, including anxiety disorders and eating disorders. Document 5 mentions the potential use of (+)-Azasetron or a related compound for treating ear disorders. Document 6 discusses peptides that can interfere with the formation of a complex and have potential benefits for the nervous system, muscles, and skin aging. Document 7 describes improved power transmission systems and methods involving an implantable neurostimulator. In summary, documents 1 and 3 are similar in that they propose specific compositions for treating specific disorders, while documents 2, 5, 6, and 7 are unique in their focus on different treatments and technologies.
3. Entity Recognition
LLMs can recognize and extract entities (e.g., names, dates, locations) from documents and queries, improving both precision and recall by allowing for more specific and accurate searches. For example, one could ask Google’s Bard:
PROMPT: Can you tell me what drugs are mentioned in this patent? US 11,466,090 B2
RESPONSE: US 11,466,090 B2 is a patent for a composition and method for treating inflammatory diseases using cannabinoids and a non-steroidal anti-inflammatory drug (NSAID). The patent mentions the following drugs:
• Non-steroidal anti-inflammatory drugs (NSAIDs)
Examples of cannabinoids include:
• Tetrahydrocannabinol (THC)
• Cannabidiol (CBD)
• Cannabigerol (CBG)
• Cannabichromene (CBC)
• Cannabinol (CBN)
Examples of NSAIDs include:
The patent describes a composition that includes a cannabinoid and an NSAID, as well as a method for treating an inflammatory disease using the composition. The method involves administering the composition to a patient in an amount effective to treat the inflammatory disease.
The patent also describes a number of specific embodiments of the composition and method. For example, one embodiment of the composition includes cannabidiol and ibuprofen. Another embodiment of the method involves administering the composition to a patient with arthritis.
It is important to note that the drugs mentioned in this patent should not be taken without the advice of a doctor.
4. Query (Re)formulation
If an initial search query does not yield satisfactory results, language models can suggest alternative queries or reformulate the existing one, improving recall by helping users explore different search avenues. For example:
PROMPT: What Bard prompts would you recommend to best find information on the use of Botox to treat migraines?
RESPONSE: Here are some Bard prompts that you could use to find information on the use of Botox to treat migraines:
• General Information:
o What is Botox and how does it work to treat migraines?
o What are the benefits and risks of using Botox to treat migraines?
o Who is a good candidate for Botox treatment for migraines?
o How often do I need to get Botox injections to treat migraines?
• Clinical Trials:
o What have clinical trials shown about the effectiveness of Botox for migraine treatment?
o How does Botox compare to other migraine treatments?
• Patient Experiences:
o What are the experiences of people who have used Botox to treat migraines?
o What are the side effects that people have experienced from Botox treatment for migraines?
o How has Botox treatment affected people's quality of life?
• More Specific Prompts:
o How does Botox work to prevent migraines?
o What are the different injection sites for Botox treatment for migraines?
o How long does Botox treatment for migraines last?
o What are the costs of Botox treatment for migraines?
You can also use Bard to compare different treatments for migraines, such as asking, "How does Botox compare to Aimovig for migraine treatment?" or "What are the benefits and risks of Botox versus preventive migraine medications?"
Bard can also help you find information on specific topics related to Botox treatment for migraines, such as:
• Botox treatment for migraines in pregnancy
• Botox treatment for migraines in children
• Botox treatment for migraines and other headache disorders
LLMs in Search & Discovery
LLMs can help improve precision and recall in information retrieval by leveraging its natural language understanding, semantic search, and various NLP capabilities to provide more relevant and accurate search results, while also allowing for adaptability to user needs and evolving data sources. As LLMs improve and become less general or more “vertical,” even more dramatic improvements will be made to in the precision-recall tradeoff.
Download Our Free eBook: A Guide for Researchers in the ChatGPT Era
In our most recent eBook, we explore Generative AI technologies with a spotlight on Large Language Models, specifically ChatGPT. We delve into how researchers are leveraging and innovating with these advanced models, as well as the opportunities they unlock and the strategies for managing associated risks.
Our eBook, The Risks & Rewards of Generative AI: What Researchers Need to Know in the ChatGPT Era, elucidates the transformative impact of Large Language Models such as ChatGPT on the research industry. It provides insights into maximizing these tools' potential and addressing the hurdles that industry leaders may encounter amidst this pivotal phase of tech evolution. Download your free copy today!
1 TF-IDF, which stands for "Term Frequency-Inverse Document Frequency," is a commonly used technique in search engines and information retrieval systems to evaluate the importance of a term (word or phrase) within a document relative to a collection of documents (corpus). TF-IDF is used to rank documents based on their relevance to a user's query in a search engine.
2 Boolean queries are based on Boolean algebra and involve using logical operators such as AND, OR, and NOT to combine search terms. Users specify whether certain terms must appear (AND), may appear (OR), or must not appear (NOT) in the search results. Boolean queries are precise but tend to produce binary results (documents are either included or excluded) without ranking them by relevance. Boolean queries are typically used for exact matching and are useful when users want to narrow down their search with strict criteria.