Data Mining & Data Fusion
The Data Mining and Data Fusion modules have the following components:
- The Data Model Generation (DMG-DM) component, with dedicated feature extraction for Data Mining and feature fusion for Data Fusion; it extracts the primitive features of the EO image over a spatial multi-grid partition, together with the selected EO product metadata;
- The Database Management System (DBMS), which stores and manages the DMG-DM output as actionable information, with dedicated tables for Data Fusion;
- The Image Search and Semantic Annotation (ISSA) component, based on Active ML, and the Multi-Knowledge Query (M-KQ) component, each with dedicated GUIs for Data Mining or Data Fusion; and
- The System Validation tool.
The Data Model Generation and DBMS modules are encapsulated in Docker containers and are interfaced via an internal service; all are deployed on the CANDELA platform. The ISSA and M-KQ modules form an external application that is connected to the DBMS via an external service and is operated interactively, with the user in the loop. The ISSA includes an EO image ontology. The System Validation tool is part of the ISSA.
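As an illustration of what feature extraction over a spatial multi-grid partition might look like, the following minimal sketch tiles a small single-band image into non-overlapping patches at two grid levels and computes one feature record per patch. The patch sizes, the record layout and the two features (mean and standard deviation) are assumptions chosen for the example; the primitive features actually computed by the DMG-DM are not specified here.

```python
from statistics import mean, pstdev


def partition(image, patch_size):
    """Yield (row, col, patch) for non-overlapping square patches."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patch = [image[i][c:c + patch_size] for i in range(r, r + patch_size)]
            yield r, c, patch


def patch_features(patch):
    """Placeholder features (an assumption): mean and std of the pixel values."""
    pixels = [value for row in patch for value in row]
    return (mean(pixels), pstdev(pixels))


def multigrid_features(image, patch_sizes=(4, 2)):
    """Return one record per patch for every grid level (coarse to fine)."""
    records = []
    for level, size in enumerate(patch_sizes):
        for r, c, patch in partition(image, size):
            records.append({
                "level": level,   # index of the grid level
                "row": r,         # top-left pixel row of the patch
                "col": c,         # top-left pixel column of the patch
                "size": size,     # patch edge length in pixels
                "features": patch_features(patch),
            })
    return records


# Toy 4x4 single-band image standing in for an EO product.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [5, 5, 20, 20],
    [5, 5, 20, 20],
]
records = multigrid_features(image)
```

Each record carries its grid level and patch position, so a record of this kind could be stored as one row of a database table alongside the product metadata.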
The tools are based on Active Learning, a form of supervised machine learning in which the learning algorithm interactively queries a user to label new data points with the desired outputs. The key idea behind Active Learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. The input is the training dataset obtained interactively from the GUI, that is, a list of images marked as positive or negative examples. The output is the verification of the Active Learning loop, which is sent to the GUI, and the semantic annotation, which is written into the DBMS catalogue.
Active Learning also helps users search for images of interest in a large repository. During Active Learning, two goals are pursued:
1) learn the targeted image category as accurately and as exhaustively as possible, and
2) minimize the number of iterations in the relevance feedback loop.
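The relevance feedback loop described above can be sketched as follows. This is a minimal illustration and not the classifier used in ISSA: it assumes a nearest-centroid model and uncertainty sampling (querying the patches whose score is closest to the decision boundary), and the `oracle` function is a hypothetical stand-in for the user labelling examples in the GUI.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def score(x, pos_c, neg_c):
    """Signed score: positive when x is closer to the positive centroid."""
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def fit(features, labels):
    """Compute the positive and negative centroids from the labelled patches."""
    pos_c = centroid([features[i] for i, y in labels.items() if y])
    neg_c = centroid([features[i] for i, y in labels.items() if not y])
    return pos_c, neg_c


def active_learning(features, oracle, seed_pos, seed_neg, max_iter=10, batch=1):
    """Run the feedback loop; return predictions and the labels collected."""
    labels = {seed_pos: True, seed_neg: False}
    for _ in range(max_iter):
        pos_c, neg_c = fit(features, labels)
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        # Uncertainty sampling: ask about the patches nearest the boundary.
        unlabeled.sort(key=lambda i: abs(score(features[i], pos_c, neg_c)))
        for i in unlabeled[:batch]:
            labels[i] = oracle(i)  # the user marks the patch positive or negative
    pos_c, neg_c = fit(features, labels)
    predictions = [score(f, pos_c, neg_c) > 0 for f in features]
    return predictions, labels


# Toy 2-D patch features: the first four are negatives, the last four positives.
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.3, 0.3),
            (0.9, 1.0), (1.0, 0.8), (0.8, 0.9), (0.7, 0.7)]
truth = [False] * 4 + [True] * 4

predictions, labels = active_learning(
    features, oracle=lambda i: truth[i], seed_pos=4, seed_neg=0, max_iter=2
)
```

In this toy run the loop asks for only two labels beyond the two seed examples, which reflects the second goal: keeping the number of feedback iterations, and hence the user's labelling effort, small.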
Particularly for EO image applications, Active Learning works with a very small number of training samples, which allows their detailed verification. The results are therefore trustworthy and avoid the problem of training database biases. Another important asset is the adaptability to user conjectures. The resulting annotations, that is, semantic labels indexed to image patches, image features and the related metadata, can be exported as SQL; the EO semantics and labelled images can be exported in GeoTIFF format for integration with non-EO semantics in the CANDELA platform or in other systems.
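To make the SQL export concrete, the sketch below stores a few patch-level semantic labels in an in-memory SQLite database and dumps them as SQL statements. The table name `annotation`, its columns and the product identifier are hypothetical and chosen only for illustration; the actual DBMS schema is not described here, and SQLite is used only because it ships with Python.

```python
import sqlite3

# In-memory database standing in for the annotation catalogue (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE annotation (
        patch_id   INTEGER PRIMARY KEY,
        product_id TEXT NOT NULL,     -- hypothetical EO product identifier
        row_px     INTEGER NOT NULL,  -- top-left pixel row of the patch
        col_px     INTEGER NOT NULL,  -- top-left pixel column of the patch
        label      TEXT NOT NULL      -- semantic label assigned by the user
    )
    """
)

rows = [
    (1, "S2_EXAMPLE_PRODUCT", 0, 0, "vineyard"),
    (2, "S2_EXAMPLE_PRODUCT", 0, 120, "vineyard"),
    (3, "S2_EXAMPLE_PRODUCT", 120, 0, "forest"),
]
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Export schema and data as SQL text that another system could replay.
sql_dump = "\n".join(conn.iterdump())

# Example query over the annotations: number of patches per semantic label.
counts = dict(conn.execute("SELECT label, COUNT(*) FROM annotation GROUP BY label"))
```

The dump is plain SQL text, so it can be loaded into another relational database with little or no adaptation, depending on the target dialect.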
The tools have been validated on Sentinel-1 and Sentinel-2 Big Data covering an area larger than 1 million square kilometres, and also on data from 10 other multispectral and SAR sensors.
The tools were demonstrated in the CANDELA use cases, achieving an important result: vineyard classification from single Sentinel-2 observations. At Sentinel-2 resolution, this had previously been possible only from multi-temporal observations.