The data collected by Copernicus satellites in recent years opens the way for data scientists to develop hundreds of innovative use cases, provided they can manipulate this huge amount of complex data. The Copernicus Data and Information Access Services (DIAS) aim to solve the data storage and access issues.
The Candela platform aims to provide a generic environment for data analytics on Copernicus data, with an online development environment that offers built-in scalability management and easy data access. The main advantage of this platform is that the computing resources needed by users and the storage resources (DIAS) are hosted on the same cloud. Running our platform on a DIAS gives us direct access to all the Copernicus data and thus reduces access time to Earth observation products. On top of the CREODIAS virtual machines, we have deployed a Kubernetes cluster that acts as an orchestrator for the platform. It ensures that all the components in the upper layers have access to computation resources and that these components can communicate with each other. These components can be off-the-shelf products such as GeoServer or Jupyter, or dedicated tools. The top layer of the platform corresponds to the data analytics tools developed during the project.
The development environment offered to the user on the platform is based on Jupyter notebooks. This easy-to-use online development environment allows the user to access the platform from anywhere. Dedicated libraries have been developed to ease the search for and access to Earth observation products hosted by CREODIAS, to facilitate access to the processing services developed during the project, and to allow users to display their georeferenced data on top of a map. The Jupyter environment is the user entry point to the Candela platform: it is used to access Earth observation products, to launch the processes developed in this project, and to prototype new processing services (see the image).
When a Candela user launches the online development environment, a new instance of the Candela Jupyter environment is started in a dedicated Kubernetes pod. The user benefits both from an isolated instance with dedicated resources and from the common environment configuration, which includes geoscience libraries, easy access to EO data, and processing services. Each time a user launches a new service, dedicated resources are provided for its execution. The user can launch several services at the same time. This parallelization mechanism allows the user to process a large amount of data.
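As an illustration of this parallelization pattern, the following minimal Python sketch launches several service calls concurrently from a notebook. It is not the Candela API: `run_service` is a hypothetical placeholder standing in for a call to a processing service, and `run_in_parallel` and the product identifiers are likewise assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_service(product_id: str) -> str:
    """Hypothetical stand-in for a call to a Candela processing service.

    On the real platform each call would run with its own dedicated
    resources; here we only return a label so the sketch is runnable.
    """
    return f"processed:{product_id}"


def run_in_parallel(product_ids, max_workers=4):
    """Launch one service call per product and collect the results.

    Results are returned in the same order as the input identifiers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_service, product_ids))


if __name__ == "__main__":
    # Illustrative identifiers, not real EO product names.
    print(run_in_parallel(["product-a", "product-b", "product-c"]))
```

A thread pool is used here because the notebook would mostly wait on remote services; the heavy computation is assumed to happen on the cluster, not in the notebook process.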
A monitoring component continuously checks the state of the Kubernetes cluster. If the cluster runs out of resources, it cannot launch new applications or services. Before this state is reached, the scalability component triggers the creation of a new virtual machine, which allows Kubernetes to schedule more services.
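The scale-up decision described above can be sketched as a simple threshold check. This is only an assumption about how such a component might reason, not the project's actual implementation: the `NodeUsage` structure, the use of CPU requests as the only metric, and the 80% threshold are all hypothetical choices for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical snapshot of one cluster node, in CPU cores."""
    name: str
    cpu_capacity: float   # cores available on the node
    cpu_requested: float  # cores already reserved by scheduled pods


def cluster_utilization(nodes):
    """Return the fraction of total CPU capacity that is already requested."""
    total = sum(n.cpu_capacity for n in nodes)
    if total == 0:
        # A cluster with no capacity is treated as fully used.
        return 1.0
    return sum(n.cpu_requested for n in nodes) / total


def should_scale_up(nodes, threshold=0.8):
    """Decide whether a new virtual machine should be requested.

    The threshold (an assumed value) sits below 1.0 so that the new
    machine is requested before the cluster is actually exhausted.
    """
    return cluster_utilization(nodes) >= threshold
```

For example, two 8-core nodes with 7 and 6 cores requested give a utilization of 13/16, which is above the assumed threshold, so `should_scale_up` would return `True` and a new virtual machine would be requested.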