Funding year 2022 / Scholarship Call #17 / Project ID: 6335 / Project: Question answering over knowledge graphs
Having read the previous posts, you probably now know what question answering (QA) over knowledge graphs (KGs) is and why it matters!
A KG is a form of knowledge representation that models real-world entities and their relationships to one another. It consists of a set of nodes (representing entities such as objects, concepts, or events) and a set of edges (representing the relations between those entities).
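To make the node-and-edge view concrete, here is a minimal sketch in Python that stores a tiny KG as (subject, predicate, object) triples and derives its nodes and edge labels. The `Triple` type and the helper names `nodes` and `relations` are assumptions made for this illustration; real KGs are usually stored in a triple store or graph database rather than an in-memory set.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# A toy KG with two facts (illustrative only).
toy_kg = {
    Triple("Innsbruck", "country", "Austria"),
    Triple("Austria", "capital", "Vienna"),
}


def nodes(kg):
    """Every entity that appears as a subject or an object."""
    return {t.subject for t in kg} | {t.obj for t in kg}


def relations(kg):
    """Every distinct edge label used in the graph."""
    return {t.predicate for t in kg}


print(sorted(nodes(toy_kg)))
print(sorted(relations(toy_kg)))
```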
The main importance of KGs for question answering is their ability to capture the complexity and context of relationships between real-world entities. The structured representation of entities and relationships makes it possible to generate more accurate and relevant answers to complex questions.
QA over KGs has a wide range of applications across domains including healthcare, finance, tourism, and manufacturing and production. For example, in healthcare settings, patients can ask about the side effects of particular medications in natural language, and the KG-based QA system can return exact and direct answers, including the side effects, based on facts stored in the underlying KG. In manufacturing and production, KGs can be used to store and model production pipelines, so that questions such as the optimal time to perform maintenance on a specific machine or the causes of a particular defect can be asked.
Are you wondering what different types of questions can be asked? In KG-based QA systems, natural language questions can be categorized into the following groups.
- Factoid questions: These are the most common type of question. They are short and straightforward, start with WH words (e.g., what, when, where), and require specific answers. For example, ‘When was the knowledge graph introduced?’
- Yes/No questions: These questions are answered with ‘yes’ or ‘no’ and essentially confirm or deny a given statement. For example, ‘Is Omicron COVID-19 Vaccine (Vero Cell) the first COVID vaccine approved in China?’
- Causal questions: These questions usually start with ‘why’ or ‘how’. Answering them requires analyzing the cause-effect relationships in the KG and exploring the underlying causes or reasons behind particular phenomena or events. For example, ‘Why does a particular lifestyle factor increase the risk of developing a certain disorder?’
- Hypothetical questions: Unlike factual questions, which can be objectively verified, hypothetical questions relate to hypothetical situations or events that may or may not occur in the future. For example, ‘If there was a shortage of hospital beds during a pandemic, how should hospitals prioritize which patients to admit?’
Factoid questions are the most frequently asked type of question and are divided into two groups: simple questions and complex questions. Simple questions (or 1-hop questions) require one-hop reasoning over a single fact stored in the KG, while complex questions (or multi-hop questions) require reasoning over two or more facts of the KG. To get a better insight, let’s consider the following figure, which shows some facts about the University of Innsbruck based on the knowledge encoded in DBpedia.
In the above figure, the question ‘When was the University of Innsbruck founded?’ needs only one fact, (University_of_Innsbruck, foundationDate, '1669-10-15'), to be answered. However, the question ‘What is the capital of the country where the University of Innsbruck is located?’ requires reasoning over the facts (University_of_Innsbruck, city, Innsbruck), (Innsbruck, country, Austria), and (Austria, capital, Vienna).
There are several challenges in designing and developing KG-based QA systems capable of answering factoid questions, whether simple or complex. The main challenges (explored in my PhD research) are summarized below.
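The difference between the two question types can be sketched in a few lines of Python. The example below assumes that the question has already been mapped to a start entity and a path of relations, which is the hard part in practice and is not shown here. The helper names `follow` and `answer` are hypothetical, and the facts are the ones listed above.

```python
# Facts about the University of Innsbruck, as (subject, predicate, object).
KG = {
    ("University_of_Innsbruck", "foundationDate", "1669-10-15"),
    ("University_of_Innsbruck", "city", "Innsbruck"),
    ("Innsbruck", "country", "Austria"),
    ("Austria", "capital", "Vienna"),
}


def follow(entity, relation, kg=KG):
    """Return every object o such that (entity, relation, o) is a fact."""
    return {o for s, p, o in kg if s == entity and p == relation}


def answer(start, relation_path, kg=KG):
    """Walk the relation path from the start entity, one hop per relation."""
    frontier = {start}
    for relation in relation_path:
        frontier = {o for e in frontier for o in follow(e, relation, kg)}
    return frontier


# Simple (1-hop) question: when was the university founded?
one_hop = answer("University_of_Innsbruck", ["foundationDate"])

# Complex (multi-hop) question: capital of the country where it is located?
multi_hop = answer("University_of_Innsbruck", ["city", "country", "capital"])

print(one_hop, multi_hop)
```

A 1-hop question consumes a single fact, whereas the multi-hop question chains three facts, each hop starting from the result of the previous one.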
- KGs can contain millions or even billions of facts (e.g., DBpedia, Wikidata). The presence of so many entities and relationships makes it impractical to consider the entire KG for each question. To address this issue, KG-based QA systems need to efficiently filter out irrelevant parts of the KG and extract the subsets relevant to the input question in order to reduce the search space (more details are available in our published paper).
- While there are many opportunities to use machine learning methods for building KG-based QA systems, obtaining adequate training data is a significant obstacle in real-world scenarios. Training examples are often scarce, especially when these systems are developed for small and medium-sized enterprises (SMEs), which makes building KG-based QA systems for SMEs particularly challenging (more details are available in our published paper).
- The mismatch between the way users phrase questions and the vocabulary and terms of the KG is another key challenge. People may use different vocabulary, phrasing, and syntax to ask the same question, which can make it difficult for the system to interpret the question accurately and find the answer among the KG's facts. Bridging the semantic gaps and variations between questions and the vocabulary of the KG is therefore a significant requirement for KG-based QA systems.
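To illustrate the first challenge, one simple way to reduce the search space is to keep only the facts that lie within k hops of the entities mentioned in the question. The sketch below shows this generic k-hop neighborhood idea; it is not necessarily the method used in the paper mentioned above. The function name `k_hop_subgraph` and the toy facts are assumptions for illustration, and linking the question to its seed entities is assumed to have happened already.

```python
KG = {
    ("University_of_Innsbruck", "foundationDate", "1669-10-15"),
    ("University_of_Innsbruck", "city", "Innsbruck"),
    ("Innsbruck", "country", "Austria"),
    ("Austria", "capital", "Vienna"),
    # Facts unrelated to a question about the University of Innsbruck.
    ("Berlin", "country", "Germany"),
    ("Germany", "capital", "Berlin"),
}


def k_hop_subgraph(kg, seeds, k):
    """Collect every fact reachable within k hops of the seed entities.

    Edges are followed in both directions, so a fact is kept if either
    its subject or its object is on the current frontier.
    """
    kept = set()
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        next_frontier = set()
        for s, p, o in kg:
            if s in frontier or o in frontier:
                kept.add((s, p, o))
                for node in (s, o):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return kept


subgraph = k_hop_subgraph(KG, {"University_of_Innsbruck"}, k=3)
print(len(subgraph), "of", len(KG), "facts kept")
```

This version scans the whole fact set once per hop, which is fine for a toy example. A KG with millions of facts would need an index from entities to their facts, or a query against the triple store, instead of a full scan.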