Funding year 2017 / Science Call #1 / Project ID: / Project: SEPSES
In this paper, we propose SLOGERT, a workflow for automated knowledge graph construction from unstructured, heterogeneous, and potentially fragmented log sources. SLOGERT combines extraction techniques that exploit particular characteristics of log data into a modular and extensible processing framework.
Our underlying workflow combines log parsing and event template learning, natural language annotation, keyword extraction, automatic generation of RDF graph modelling patterns, and linking and enrichment to extract and integrate the evidence-based knowledge contained in logs.
Our approach expects unstructured log files as input and consists of five phases:
- Template and Parameter Extraction: Log files typically consist of structured elements (e.g., timestamp and device ID) and an unstructured free-text message. We use LogPAI to separate the constant parts of the message (the event template) from its variable parts (parameters). At this stage, the semantic meaning of the parameters is still undefined.
- Semantic Annotation: We apply a combination of Named Entity Recognition (NER) techniques to identify semantic objects in the extracted parameters, and we generate Reasonable Ontology Templates (OTTR templates).
- RDFization: In this step, we generate an RDF knowledge graph with the help of the templates and log instances. For this purpose, we use LUTRA, the reference implementation for OTTR.
- Background Knowledge Graph (KG) Linking: We contextualize entities that appear in a log file with local background knowledge (e.g., employees, servers, installed software) and external background knowledge (e.g., publicly available cybersecurity information).
- Knowledge Graph Integration: We combine the KGs generated from previously isolated log files and sources into a single integrated KG.
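To make the first phase concrete, here is a minimal template-learning sketch. It is deliberately simple and is not one of the parsers shipped with LogPAI. It assumes a hypothetical grouping heuristic: messages are grouped by token count and first token. Within a group, any token position whose value differs across messages is treated as a parameter and replaced by the wildcard `<*>`.

```python
from collections import defaultdict

WILDCARD = "<*>"  # marker for a variable part (parameter) of a message


def learn_templates(messages):
    """Learn event templates from raw log messages.

    Messages are grouped by token count and first token (a hypothetical
    heuristic); within a group, positions that differ become parameters.
    """
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        key = (len(tokens), tokens[0] if tokens else "")
        groups[key].append(tokens)

    templates = []
    for token_lists in groups.values():
        # zip(*...) walks the group column by column (one column per position)
        template = [
            column[0] if len(set(column)) == 1 else WILDCARD
            for column in zip(*token_lists)
        ]
        templates.append(" ".join(template))
    return templates


def extract_parameters(template, message):
    """Return the parameter values of `message`, or None if it does not fit."""
    template_tokens = template.split()
    message_tokens = message.split()
    if len(template_tokens) != len(message_tokens):
        return None
    parameters = []
    for expected, actual in zip(template_tokens, message_tokens):
        if expected == WILDCARD:
            parameters.append(actual)
        elif expected != actual:
            return None  # a constant token differs: wrong template
    return parameters


if __name__ == "__main__":
    logs = [
        "Accepted password for alice from 10.0.0.5 port 22",
        "Accepted password for bob from 10.0.0.7 port 2222",
    ]
    template = learn_templates(logs)[0]
    print(template)                               # Accepted password for <*> from <*> port <*>
    print(extract_parameters(template, logs[0]))  # ['alice', '10.0.0.5', '22']
```

A group that contains only one message yields a template with no parameters. This is one reason practical log parsers rely on more robust similarity measures than exact column comparison.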
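The semantic annotation phase can be sketched as a combination of two simple recognizers. Regular expressions stand in here for the NER techniques the workflow actually uses. The first recognizer inspects the parameter value itself. The second is a fallback that looks at the constant template token directly before the parameter. All entity type names, patterns, and context rules below are hypothetical.

```python
import re

WILDCARD = "<*>"

# Hypothetical pattern recognizers, tried in order on the value itself.
PATTERNS = [
    ("IPv4Address", re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    ("FilePath", re.compile(r"/[\w./-]+")),
]

# Hypothetical context rules keyed on the constant token that directly
# precedes a parameter in the event template.
CONTEXT_RULES = {"port": "Port", "user": "UserName", "for": "UserName"}


def annotate(template, parameters):
    """Assign an entity type to each parameter of a parsed log message."""
    tokens = template.split()
    # context token for each wildcard, in the order the wildcards appear
    contexts = [
        tokens[i - 1].lower() if i > 0 else ""
        for i, token in enumerate(tokens)
        if token == WILDCARD
    ]
    annotated = []
    for value, context in zip(parameters, contexts):
        # 1) pattern-based recognition on the value itself
        label = next(
            (name for name, regex in PATTERNS if regex.fullmatch(value)), None
        )
        # 2) fall back to the surrounding template text
        if label is None:
            label = CONTEXT_RULES.get(context, "Unknown")
        annotated.append((label, value))
    return annotated


if __name__ == "__main__":
    print(annotate("Accepted password for <*> from <*> port <*>",
                   ["alice", "10.0.0.5", "22"]))
    # [('UserName', 'alice'), ('IPv4Address', '10.0.0.5'), ('Port', '22')]
```

Context rules such as "for" are crude: in a message like "waited for 5 seconds", the rule would label "5" as a user name. Annotations of this kind therefore work best when they are decided once per event template and reviewed, rather than applied blindly to every message.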
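In the workflow, LUTRA expands OTTR template instances into RDF. The Python sketch below skips OTTR entirely and illustrates only the shape of the result. It takes (entity type, value) pairs, such as those produced by the annotation phase, and writes one event as N-Triples. The namespace `http://example.org/log#` and all property and class names (`LogEvent`, `hasTemplate`, `hasPort`, ...) are hypothetical.

```python
from urllib.parse import quote

EX = "http://example.org/log#"  # hypothetical namespace for this sketch
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(local_name):
    """Build an absolute IRI in the example namespace."""
    return f"<{EX}{local_name}>"


def literal(value, datatype="string"):
    """Build a typed N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{XSD}{datatype}>'


def event_to_ntriples(event_id, timestamp, template_id, annotated):
    """Turn one parsed, annotated log event into N-Triples lines."""
    event = iri(f"event/{event_id}")
    triples = [
        (event, RDF_TYPE, iri("LogEvent")),
        (event, iri("timestamp"), literal(timestamp, "dateTime")),
        (event, iri("hasTemplate"), iri(f"template/{template_id}")),
    ]
    for label, value in annotated:
        # The entity IRI depends only on type and value, so the same port or
        # IP address yields the same node in graphs built from other files.
        entity = iri(f"{label}/{quote(value, safe='')}")
        triples.append((event, iri(f"has{label}"), entity))
        triples.append((entity, iri("value"), literal(value)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in event_to_ntriples("42", "2020-05-01T10:00:00", "T1",
                                  [("Port", "22")]):
        print(line)
```

Deriving entity IRIs from values is what lets separately generated graphs connect later. The same choice can also over-merge: for example, identical private IP addresses on different networks would collapse into a single node.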
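The last two phases can be sketched over graphs represented as plain sets of `(subject, predicate, object)` tuples. In this representation, integration is a set union: events from different files become connected wherever they share an entity IRI. Linking attaches background facts to every entity found in the graph. The `ex:` identifiers, the server name, and the layout of the background dictionary are hypothetical stand-ins for a local asset inventory and for external sources.

```python
def integrate(*graphs):
    """Merge per-file graphs (sets of triples) into one graph.

    Identical triples collapse, and events from different files become
    connected wherever they share an entity IRI.
    """
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def link(graph, background):
    """Attach background facts to every entity that occurs in `graph`.

    `background` maps an entity IRI to a list of (predicate, object) pairs,
    e.g. taken from an asset inventory or a public security data source.
    """
    entities = {s for s, _, _ in graph} | {o for _, _, o in graph}
    enriched = set(graph)
    for entity in entities:
        for predicate, obj in background.get(entity, []):
            enriched.add((entity, predicate, obj))
    return enriched


if __name__ == "__main__":
    file_a = {
        ("ex:event/1", "ex:hasPort", "ex:Port/22"),
        ("ex:event/1", "ex:hasIPv4Address", "ex:IPv4Address/10.0.0.5"),
    }
    file_b = {("ex:event/7", "ex:hasIPv4Address", "ex:IPv4Address/10.0.0.5")}
    background = {
        "ex:Port/22": [("ex:standardService", "ex:service/ssh")],
        "ex:IPv4Address/10.0.0.5": [("ex:assignedTo", "ex:server/web01")],
    }
    kg = link(integrate(file_a, file_b), background)
    print(len(kg))  # 5
```

After integration, both `ex:event/1` and `ex:event/7` point to the same address node. That node, in turn, links to the hypothetical server `ex:server/web01`.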
With such knowledge graphs, security analysts can navigate and query log data in an integrated fashion. The following SPARQL query and result visualization illustrate how a single query can combine log events with external knowledge (the standard services that the IANA registry associates with ports, made available as an ontology).
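The join that such a query performs can be illustrated without a triple store. The Python sketch below evaluates a basic graph pattern (a conjunction of triple patterns, where terms starting with `?` are variables) over a set of triples. The pattern is roughly equivalent to the SPARQL fragment `?event ex:hasPort ?port . ?port ex:standardService ?service`. The property names and the toy graph are hypothetical, and the naive nested-loop evaluation is only for illustration, since real SPARQL engines use indexes.

```python
def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None
    return extended


def evaluate_bgp(patterns, graph):
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    graph = {
        ("ex:event/1", "ex:hasPort", "ex:Port/22"),
        ("ex:event/2", "ex:hasPort", "ex:Port/8081"),
        ("ex:Port/22", "ex:standardService", "ex:service/ssh"),
    }
    query = [("?event", "ex:hasPort", "?port"),
             ("?port", "ex:standardService", "?service")]
    print(evaluate_bgp(query, graph))
    # [{'?event': 'ex:event/1', '?port': 'ex:Port/22', '?service': 'ex:service/ssh'}]
```

Because the toy graph contains a service entry only for port 22, the event on port 8081 drops out of the join. The same thing happens in SPARQL when a triple pattern is not wrapped in `OPTIONAL`.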
By making log data amenable to semantic analysis, the workflow fills an important gap and opens up a wealth of data sources for knowledge graph building.