Funding year 2017 / Science Call #1 / Project ID: / Project: SEPSES
We are happy that our paper titled “Virtual Knowledge Graphs for Federated Log Analysis” has been accepted at the ARES conference 2021.
The paper introduces a novel approach that dynamically constructs virtual log knowledge graphs directly from heterogeneous raw log files across multiple hosts. It then enriches the query results by contextualizing them with internal and external background knowledge.
The advantage is that log files can remain on their respective hosts, with no a priori centralized aggregation, processing, or materialization of log data. Only when a query is issued is the relevant log data processed, combined, and shipped to the analyst.
The architecture of the approach is visualized in the following figure:
Our approach comprises two main components:
1. Query Processor, a component that provides an interface to formulate SPARQL queries and distributes the queries among individual endpoints.
2. Log Parser, a component on each host, which receives and translates queries, processes log data, and sends the results back to the Query Processor.
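To make the division of labor between the two components more concrete, here is a minimal sketch of the fan-out pattern. It is not the SEPSES implementation: the host names, the in-memory `HOST_LOGS` dictionary, and the functions `log_parser` and `query_processor` are hypothetical stand-ins, and a plain keyword match replaces SPARQL to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the raw logs kept on each host.
# In a real deployment every host would expose its own Log Parser endpoint.
HOST_LOGS = {
    "host-a": ["sshd: Failed password for root", "cron: job started"],
    "host-b": ["sshd: Accepted password for alice"],
}


def log_parser(host, keyword):
    """Runs per host: scan the local logs and return only the matching lines."""
    return [(host, line) for line in HOST_LOGS[host] if keyword in line]


def query_processor(hosts, keyword):
    """Send the query to every host in parallel and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        # map() keeps the results in the same order as the hosts list.
        partials = list(pool.map(lambda h: log_parser(h, keyword), hosts))
    return [row for partial in partials for row in partial]


results = query_processor(["host-a", "host-b"], "sshd")
for host, line in results:
    print(host, line)
```

Only the lines that match the query leave a host; everything else stays where it was written, which is the point of avoiding centralized aggregation.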
The following figure visualizes the query translation mechanism. A SPARQL query is translated and mapped to the respective log properties of the specific log source, host, and time range defined in the query.
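The translation step can be illustrated with a small sketch. It assumes a hypothetical `log:` vocabulary and a hand-written mapping `VOCAB_TO_FIELD` from vocabulary terms to raw log fields; neither is taken from the paper. A regular expression stands in for a proper SPARQL parser, and the host and time range are passed as arguments rather than extracted from the query.

```python
import re

# Hypothetical mapping from vocabulary terms to raw log field names.
VOCAB_TO_FIELD = {
    "log:timestamp": "time",
    "log:user": "user",
    "log:sourceIp": "src_ip",
}


def translate(sparql, host, start, end, vocab=VOCAB_TO_FIELD):
    """Build an extraction plan: which fields to read, where, and when."""
    fields = []
    for term in re.findall(r"\blog:\w+", sparql):
        field = vocab.get(term)
        # Keep first-seen order; skip unmapped terms and duplicates.
        if field is not None and field not in fields:
            fields.append(field)
    return {"fields": fields, "host": host, "time_range": (start, end)}


QUERY = """
PREFIX log: <http://example.org/log#>
SELECT ?user ?ip WHERE {
  ?e log:user ?user ;
     log:sourceIp ?ip .
}
"""

plan = translate(QUERY, "host-a", "2021-03-01T00:00:00Z", "2021-03-02T00:00:00Z")
print(plan["fields"])
```

A plan like this lets a parser skip every log property the query never mentions, so the work on each host is limited to what the query actually needs.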
Next, as depicted in the figure below, the selected log lines and properties are parsed and mapped to RDF according to the respective log vocabulary. The resulting RDF log graphs are enriched with background knowledge and compressed into a compact binary RDF format (HDT) for further processing.
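As a rough illustration of the parsing and mapping step, the sketch below turns one log line into N-Triples using only the Python standard library. The namespace `LOG_NS`, the line pattern `LINE_RE`, and the function `line_to_ntriples` are assumptions made for this example, not the vocabulary or parser used in the paper. Enrichment with background knowledge and HDT compression are left out.

```python
import re

LOG_NS = "http://example.org/log#"  # hypothetical vocabulary namespace

# Hypothetical pattern for a single kind of sshd log line.
LINE_RE = re.compile(
    r"(?P<time>\S+) (?P<host>\S+) sshd: "
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+)"
)


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def line_to_ntriples(line, event_id):
    """Map one log line to N-Triples; return [] if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return []
    subject = f"<{LOG_NS}event/{event_id}>"
    return [
        f'{subject} <{LOG_NS}{field}> "{escape(value)}" .'
        for field, value in match.groupdict().items()
    ]


triples = line_to_ntriples(
    "2021-03-01T10:15:00Z host-a sshd: Failed password for root from 10.0.0.5", 1
)
print("\n".join(triples))
```

Each extracted property becomes one triple attached to an event node, so the output can be loaded into any RDF store or passed on to a compression step.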
Our evaluation shows that the log processing time is primarily a function of the number of extracted (relevant) log lines and queried hosts. For future work, we plan to improve the query analysis and extend the approach for streaming scenarios.