Funding year 2022 / Scholarship Call #17 / Project ID: 6335 / Project: Question answering over knowledge graphs
With information exploding on the web, people prefer to get precise answers to their questions rather than searching for them in massive amounts of data. Question-answering systems therefore have a significant influence on daily life as well as on industrial settings.
The semantic web, an extension of the current web, provides a large amount of data in the form of knowledge graphs. A knowledge graph built on the RDF (Resource Description Framework) model represents and stores data in a semantically structured way, using RDF facts to describe real-world entities and the meaningful relations between them. The standard way to retrieve data from RDF knowledge graphs is to use a query language such as SPARQL (SPARQL Protocol and RDF Query Language). However, building SPARQL queries is difficult for people who are not familiar with query languages and the structure of knowledge graphs.
Question answering over knowledge graphs has therefore emerged to help end users access knowledge graphs: they express their information needs as natural language questions and get direct answers.
In recent years, the research trend has been to design solutions that semantically parse natural language questions and convert them to SPARQL queries using semantic web technologies, natural language processing, and machine learning.
This blog post explains the basic concepts and technologies in the semantic web required to develop these question-answering systems.
RDF: One of the foundational semantic web technologies, RDF is a data model that represents interconnected data as facts with three fields: subject, predicate, and object. In a graphical representation of an RDF fact, the subject and object are the head node and tail node, respectively, and the predicate (also known as property) is the edge between them.
In the RDF model, subjects and predicates are expressed as URIs (Uniform Resource Identifiers), while objects can be URIs or literals. Consider the statement "The University of Innsbruck is based in the city of Innsbruck, Austria." It is stored as the following RDF facts in DBpedia (an evolving knowledge graph on the web):
<http://dbpedia.org/resource/University_of_Innsbruck> <http://dbpedia.org/ontology/city> <http://dbpedia.org/resource/Innsbruck>.
<http://dbpedia.org/resource/Innsbruck> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Austria>.
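To make the triple structure concrete, here is a minimal Python sketch that holds the two facts above as (subject, predicate, object) tuples and follows the edges from the university to its country. The names `TRIPLES` and `objects_of` are assumptions made for illustration and do not belong to any RDF library.

```python
# Shorthand for the two DBpedia namespaces used in the facts above.
RES = "http://dbpedia.org/resource/"
ONT = "http://dbpedia.org/ontology/"

# Each RDF fact is one (subject, predicate, object) tuple.
TRIPLES = [
    (RES + "University_of_Innsbruck", ONT + "city", RES + "Innsbruck"),
    (RES + "Innsbruck", ONT + "country", RES + "Austria"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """Return the tail nodes reached from `subject` via the edge `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow two edges: university -> city -> country.
cities = objects_of(RES + "University_of_Innsbruck", ONT + "city")
countries = [c for city in cities for c in objects_of(city, ONT + "country")]
print(countries)
```

A real application would more likely load the facts with an RDF library, but a plain list of tuples is enough to show that a knowledge graph is a set of labeled edges.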
SPARQL: SPARQL is a graph-matching query language for retrieving and manipulating data modeled in RDF. Its syntax resembles SQL and uses keywords such as SELECT, WHERE, and GROUP BY, but a query is evaluated by matching triple patterns against the graph rather than by joining tables. SPARQL queries can also include PREFIX declarations to abbreviate URIs. For example, the following SPARQL query returns the country where the University of Innsbruck is located:
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?O2 WHERE {
  dbpedia:University_of_Innsbruck dbpedia-owl:city ?O1 .
  ?O1 dbpedia-owl:country ?O2 .
}
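How a query engine answers such a query can be sketched in a few lines of Python. The sketch below matches the two triple patterns of the WHERE clause against the two facts about Innsbruck, binding `?O1` and `?O2` along the way. It is a simplified illustration of graph pattern matching, not how a SPARQL engine is implemented; the helper names `is_var`, `match_pattern`, and `evaluate` are assumptions.

```python
RES = "http://dbpedia.org/resource/"
ONT = "http://dbpedia.org/ontology/"

TRIPLES = [
    (RES + "University_of_Innsbruck", ONT + "city", RES + "Innsbruck"),
    (RES + "Innsbruck", ONT + "country", RES + "Austria"),
]


def is_var(term):
    """Variables are written as in SPARQL, with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new_binding.setdefault(term, value) != value:
                return None  # variable is already bound to another value
        elif term != value:
            return None  # constant does not match
    return new_binding


def evaluate(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# The two triple patterns from the WHERE clause of the query.
QUERY = [
    (RES + "University_of_Innsbruck", ONT + "city", "?O1"),
    ("?O1", ONT + "country", "?O2"),
]

answers = [b["?O2"] for b in evaluate(QUERY, TRIPLES)]
print(answers)
```

The shared variable `?O1` is what joins the two patterns: the second pattern only matches facts whose subject is the city found by the first.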
Ontology: An ontology is a formal and explicit description of a domain in terms of classes, properties, and individuals. Ontologies play a crucial role in organizing and integrating data as well as in inferring new facts through semantic reasoning. In the example above, describing the city and country of the University of Innsbruck requires creating (or reusing) an ontology that models these relationships, such as the DBpedia ontology that defines the city and country properties.
OWL: The Web Ontology Language (OWL) is a language for defining ontologies on the web that enables automated reasoning. OWL builds on many of the strengths of Description Logic, a fragment of first-order logic that can be used to represent facts and relationships and to deduce new ones from those that already exist.
RDF Knowledge Graph: An RDF knowledge graph consists of a set of RDF facts and forms a semantic network of the things related to a specific domain or organization. RDF knowledge graphs can therefore integrate, organize, model, and store diverse and heterogeneous data. At the same time, because they have a formal specification, they can be explored via SPARQL and used for reasoning.
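The kind of deduction a reasoner performs can be illustrated with a single rule: if an individual belongs to a class, it also belongs to every superclass of that class. The sketch below applies this rule repeatedly until no new facts appear. It is a toy in plain Python, not an OWL reasoner; the class hierarchy, the shortened names, and the function `infer_types` are assumptions made for illustration.

```python
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Illustrative facts; the class hierarchy is assumed for this example.
facts = {
    ("University_of_Innsbruck", TYPE, "University"),
    ("University", SUBCLASS_OF, "EducationalInstitution"),
    ("EducationalInstitution", SUBCLASS_OF, "Organisation"),
}


def infer_types(facts):
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing changes."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for x, p1, c in list(inferred):
            if p1 != TYPE:
                continue
            for c2, p2, d in list(inferred):
                if p2 == SUBCLASS_OF and c2 == c:
                    new_fact = (x, TYPE, d)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


# Facts that were not stated explicitly but follow from the rule.
new_facts = infer_types(facts) - facts
for fact in sorted(new_facts):
    print(fact)
```

Neither of the derived facts was stored in the graph; both follow from the stated facts and the rule, which is the sense in which reasoning produces new facts from existing ones.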