enRichMyData Toolbox

Overview

enRichMyData delivers capabilities as a set of interoperable tools and services forming the enRichMyData Toolbox. At the center is the concept of a data enrichment pipeline that receives input data to be enriched and generates enriched data.

Functional Tools

A set of tools providing functional capabilities needed to support the design of pipelines.

Infrastructure Services

Services providing non-functional capabilities needed to support the effective deployment and execution of pipelines.

enRichMyData features loosely coupled but interoperable tools and services designed to handle complex data enrichment scenarios, where tools and services can be combined and customized as needed.

DiscoverR

DiscoverR assists users in searching datasets, ontologies, and enrichment services and provides insights into their content to support their use in enrichment pipelines. Users can search by keyword over descriptions of catalogued datasets, ontologies, and services, or browse specific descriptions through a visual interface. It offers semantic data profiling techniques that enrich basic metadata-based descriptions with ontology usage patterns and statistics, supporting the FAIR principles.

ABSTAT

TRL 6

Scalable data profiling tool for RDF data (knowledge graphs) based on: 1) pattern extraction (class - predicate - class), possibly with the support of the data ontology; 2) calculation of different statistics.
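The idea of class - predicate - class pattern extraction can be illustrated with a minimal Python sketch. This is not ABSTAT's implementation: the function name, the prefixed-name strings, and the fallback types for untyped resources are assumptions made for illustration, and the sketch ignores the ontology support mentioned above.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

def extract_patterns(triples):
    """Count (subject class, predicate, object class) patterns in a list of triples."""
    # First pass: collect the declared types of every resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Second pass: for each non-typing triple, count one pattern per type combination.
    # Assumption: untyped subjects fall back to owl:Thing, untyped objects to rdfs:Literal.
    patterns = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, {"owl:Thing"}):
            for o_type in types.get(o, {"rdfs:Literal"}):
                patterns[(s_type, p, o_type)] += 1
    return patterns

# Hypothetical toy graph.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:name", "Alice"),
]
patterns = extract_patterns(triples)
for pattern, count in patterns.items():
    print(pattern, count)
```

The pattern counts are one simple example of the statistics a profile can attach to each pattern.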

SemT-X

TRL 8

Table reconciliation and extension services: Semantic enrichment of tabular data by external services accessible via browser or Web APIs. Entity reconciliation and linking to shared Knowledge Bases and Knowledge Graphs. Table extension extracting data from external datasets and Knowledge Graphs.

WrappR

WrappR provides secure data access through a virtual semantic layer. It is delivered as a semantic graph database with efficient reasoning, cluster support, and external index synchronization. It offers a variety of APIs and access methods, as well as several forms of data federation, virtualization, and interaction with large language models. Through semantic data access and integration, WrappR provides a practical, robust, and versatile tool for improving access to data.

GraphDB

TRL 9

Ontotext GraphDB is a highly efficient and robust graph database with RDF and SPARQL support. It supports a number of plugins and connectors, such as the MongoDB connector for JSON store access, JDBC for exposing RDF as a virtual relational database, and Ontop for virtual SPARQL access. It also includes Talk to Your Graph, a chatbot interface on top of an RDF graph, and a GraphQL layer for seamless data access in application development.

Talk to Your Graph

TRL 6

Talk to Your Graph is a chatbot that allows you to converse with your data and extract factual information using natural language. The chatbot is an example of Graph Retrieval-Augmented Generation (Graph RAG), as it retrieves relevant information from a GraphDB knowledge graph in the form of triples and uses that information to generate informed responses. Talk to Your Graph provides the accuracy and depth of graph queries without requiring comprehensive understanding of SPARQL or other retrieval methods.
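The retrieve-then-generate flow of Graph RAG can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the retrieval here is a naive keyword match over in-memory triples, the function names are hypothetical, and no language model is called. It does not reflect how Talk to Your Graph queries GraphDB.

```python
def retrieve_triples(question, triples):
    """Return triples whose subject or object occurs in the question (naive keyword match)."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    return [t for t in triples if t[0].lower() in words or t[2].lower() in words]

def build_prompt(question, facts):
    """Assemble a prompt that grounds the answer in the retrieved triples."""
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    return "Answer using only these facts:\n" + "\n".join(lines) + f"\nQuestion: {question}"

# Hypothetical toy knowledge graph.
triples = [
    ("GraphDB", "supports", "SPARQL"),
    ("GraphDB", "developedBy", "Ontotext"),
    ("Wikifier", "annotatesWith", "Wikipedia"),
]
question = "Who develops GraphDB?"
facts = retrieve_triples(question, triples)
prompt = build_prompt(question, facts)
print(prompt)
```

In a real deployment the prompt would be passed to a language model; restricting it to retrieved triples is what keeps the answer tied to the graph content.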

GraphQL in GraphDB

TRL 9

The tool generates a GraphQL interface directly from OWL ontologies and SHACL shapes, enabling users to expose RDF data via GraphQL endpoints. It offers configurable schema generation options to tailor types, relationships, and query/mutation support, and to control how ontological constructs map to GraphQL. Formerly called "Semantic Objects", the tool is now fully integrated with GraphDB, so once configured, the GraphQL endpoints are immediately available for querying from applications or GraphiQL consoles.

CleanR

CleanR supports the specification of data manipulation transformations, including data cleaning operations and the generation of knowledge graphs from various data formats. Users specify transformations interactively through a user interface, and the specifications are stored in a machine-readable format so that they can be replicated and reused. CleanR provides a broad set of AI-enabled data transformations and integrates them with the generic linking and extension functionalities provided by ResourcR.

Ontotext Refine

TRL 8

Ontotext Refine (OntoRefine) is a free application for automating the conversion of messy string data into a knowledge graph.

RMLMapper

TRL 7

RMLMapper executes RDF Mapping Language (RML) rules to generate Linked Data from multiple (semi-)structured data sources.

LinkR

LinkR provides capabilities for semantic annotation of structured and semi-structured data using reference knowledge graphs and category schemes. Annotations consist of links from elements of the input data to elements of well-established knowledge bases and ontologies, or of user-defined knowledge graphs made available through ResourcR. LinkR supports annotation through ML algorithms that recommend annotations, combined with a human-in-the-loop approach.

SemT-UI

TRL 4

Table reconciliation and extension services: semantic annotation of tabular data using external services. The UI supports entity linking and schema annotation for full-fledged mapping, as well as the specification of data extension operations using external datasets and Knowledge Graphs.

selBat

TRL 5

Table interpretation service: semantic annotation of tabular data using an unsupervised, heuristic-based approach. Covers schema types and properties, entity reconciliation, and linking. Target Knowledge Graph: Wikidata.

Ontotext Refine

TRL 8

Ontotext Refine (OntoRefine) is a free application for automating the conversion of messy string data into a knowledge graph. It allows reconciliation against any endpoint supporting the reconcile API protocol.

Ontotext Reconciliation

TRL 5

Ontotext Reconciliation generates a reconciliation API endpoint on top of an RDF knowledge graph.
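A client of a reconciliation endpoint typically sends a batch of queries as a JSON object in a form-encoded `queries` parameter, following the OpenRefine-style Reconciliation Service API. The sketch below only builds such a request body; the helper name, the `ex:City` type identifier, and the sample cell values are assumptions for illustration, and the details accepted by a particular endpoint may differ.

```python
import json
from urllib.parse import urlencode, parse_qs

def build_reconcile_body(cell_values, type_id=None, limit=3):
    """Build a form-encoded body holding one reconciliation query per cell value."""
    queries = {}
    for i, value in enumerate(cell_values):
        query = {"query": value, "limit": limit}
        if type_id is not None:
            # Optional type constraint; the identifier is endpoint-specific.
            query["type"] = type_id
        queries[f"q{i}"] = query
    return urlencode({"queries": json.dumps(queries)})

# Hypothetical column of city names to reconcile.
body = build_reconcile_body(["Milan", "Oslo"], type_id="ex:City")

# Decode the body again to show the structure that the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded)
```

The body would be sent with an HTTP POST to the reconciliation endpoint, which returns ranked candidate entities for each query key.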

Crocodile

TRL 6

Crocodile is a powerful Python library designed for efficient entity linking over tabular data. Whether you're working with large datasets or need to resolve entities across multiple tables, Crocodile provides a scalable and easy-to-integrate solution to streamline your data processing pipeline.

Lion-Linker

TRL 6

Python library that uses Large Language Models (LLMs) to perform entity linking over tabular data. It efficiently links entity mentions in tables to relevant knowledge graph entities using customizable prompts and batch processing.

Koala-UI

TRL 6

User-friendly web application designed to explore and visualize entity linking results efficiently. It features a clean interface, easy navigation, and powerful data handling capabilities.

StructR

StructR is the counterpart of LinkR for unstructured data. It generates structured data from the unstructured input text through semantic annotation, linking and extension. The text is processed by linguistic and semantic tools and concept mentions are identified and disambiguated from context. Extension with custom annotation services is supported through a labeling interface for creating and editing text annotations.

Wikifier

TRL 9

The JSI Wikifier is a web service that takes a text document as input and annotates it with links to relevant Wikipedia concepts (entities).

Expert.AI Platform Document Analyser

TRL 9

With the Natural Language API's document analysis capabilities, you can perform deep linguistic analysis, keyphrase extraction, named entity recognition, relation extraction and sentiment analysis.

Event Registry Relation Classifier

TRL 9

The Event Registry (ER) Relation Classifier identifies and categorizes relations and events in text using a predefined taxonomy. As part of its core functionality, it also performs Named Entity Recognition and enriches text using Wikifier. The tool’s internal pipeline classifies hundreds of news articles per second, which enables real-time monitoring of relations and events. It is an integral part of the Event Registry service, with outputs accessible via both the web platform and the ER API.

ClassifiR

The ClassifiR tool collection provides solutions for automatic text classification into a range of taxonomies.

Categorizer

TRL 6

Categorizer is a multi-faceted research effort encompassing the semi-automatic adaptation of existing taxonomies to new use cases, automatic taxonomy alignment, and the development of corresponding text classifiers.

Expert.AI Platform Document Classification

TRL 9

Document classification determines what the document text is about by mapping it to the categories of a tree.

Expert.AI Expert Systems Generator

TRL 7

Expert.AI Answers creates expert systems from PDF documents by combining symbolic/semantic analysis with neural networks in a RAG architecture, enabling natural language queries with traceable, explainable answers ranked by confidence or generated via controlled LLMs.

Infrastructure Services

Infrastructure Services provide non-functional capabilities needed to support the effective deployment and execution of pipelines. These services enable scaling, reuse, streaming, and environmental impact monitoring of data enrichment processes.

ResourcR

TRL 5

Provides infrastructure components to support the creation of linking services for a given dataset from a data provider as well as access mechanisms such as search and query. Enables performant linking and search functionalities with limited effort.

ScalR (TAO)

TRL 7

Provides infrastructure components for executing cleaning, transformation and linking at large scale using software containers. Supports management of data enrichment pipelines on heterogeneous computing infrastructures.

ScalR (SIM-PIPE)

TRL 7

A tool based on Kubernetes and Argo Workflows that executes dry runs of data pipelines and provides detailed metrics and predictions of resource consumption for the individual pipeline steps.

StreamR (StreamStory)

TRL 7

Provides infrastructure components for streaming support in data enrichment pipelines. Pipes data streams to and from appropriate endpoints with high throughput, so that custom streams can be set up for new applications.

StreamR (Time Series Explorer)

TRL 5

This tool focuses on data cleaning, de-noising, and the application of graph methods. It implements descriptive statistics, filters, transition diagrams, visualizations, and predictions for specific time series data.

StreamR (Event Registry)

TRL 9

Event Registry is an AI-powered platform that continuously monitors and analyzes global news from over 100,000 sources in real time. It automatically detects, clusters, and categorizes news articles into structured events, enabling users to track emerging topics, trends, and developments across languages and regions.

GreenR

TRL 5

Provides infrastructure components to support monitoring of data enrichment pipelines in terms of their environmental impact. Monitors the carbon footprint of pipeline components.
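One common way to estimate the carbon footprint of a pipeline component is to multiply the energy it consumed by the carbon intensity of the electricity grid. The Python sketch below illustrates that arithmetic only; it is not GreenR's method, and the step names, power figures, runtimes, and grid intensity are hypothetical values chosen for the example.

```python
def energy_kwh(avg_power_watts, duration_seconds):
    """Energy used by a step: watts x seconds gives joules, converted here to kWh."""
    return avg_power_watts * duration_seconds / 3_600_000

def footprint_g_co2e(steps, grid_intensity_g_per_kwh):
    """Estimated emissions per pipeline step, in grams of CO2-equivalent."""
    return {
        name: energy_kwh(power, seconds) * grid_intensity_g_per_kwh
        for name, (power, seconds) in steps.items()
    }

# Hypothetical measurements: (average power in W, runtime in s) per step.
steps = {"cleaning": (200, 3600), "linking": (350, 7200)}

# Hypothetical grid carbon intensity in g CO2e per kWh.
report = footprint_g_co2e(steps, grid_intensity_g_per_kwh=400)
total = sum(report.values())
for name, grams in report.items():
    print(f"{name}: {grams:.1f} g CO2e")
print(f"total: {total:.1f} g CO2e")
```

Reporting the estimate per step, rather than only as a total, shows which components of a pipeline dominate its footprint.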