DataCert Project: roadmap

logo UPSud and UPSaclay logo UPSaclay

Presentation

Our reseach commitment ranges from low level to high level deep specification. SQL mechanization being the cornerstone of the envisionned research plan it appears first in the list. The last research line is fully funded by ANR grant ANR -15-CE39-0009 Deep Specification of Privacy Aware Data Integration

Certified SQL query evaluation

People involved: Benzaken and Contejean

SQL compilation consists in two main phases. The first phase, parses and translates the query, when possible, into its relational algebra counterpart and the second phase generates different query execution plans and then chooses the best one in terms of performance. We first propose a Coq mechanisation of an SQL parser. As SQL is more powerful than relational algebra, any serious mechanisation of SQL needs an extended algebra to take into account queries in their full generality i.e., select-from-where-group by-having with aggregates, function symbols and nested queries.

Instead of relying on the extraction mechanism of the Coq proof assistant, we could use CertiCoq, being one of its first significant test cases. Thanks to it, (Gallina) SQL_Coq queries will be translated into Clight, and then with CompCert into assembly code. Once such a translation is obtained, it will be interesting to compare its execution times on various DB instances with respect to direct executions with existing engines.

photo

A minimal realistic SQL engine

People involved: Balabonski, Benzaken and Contejean

Real SQL engine implementations such as Oracle, DB2, PostgreSQL or Microsoft Access, are very involved and have a very long history. In order to gain performance, usually, their code is not modular at all. Hence formally verifying such existing engines is therefore not possible at the moment. This is why we aim at defining from scratch, but along the line of BerkeleyDB, DBLight, a small and nevertheless realistic SQL engine.

Specification

This new SQL engine should enjoy several properties:

A first step should be to (deeply) specify DBLight in Gallina. We believe that DBLight should at least provide a module for transaction management (concurrency control and maybe recovery) and one for memory management (disk access).

Implementation

In a second step, we would investigate whether DBLight is simple enough for being implemented on the top of an hypervisor. For that purpose, and in order to have a fully certified chain, we should chose the mCertiKOS Hypervisor. We expect that this step will require to re-examine the specification in order to make some hopefull small fixes.

Coq Deep specification of privacy aware data integration

People involved: Benzaken, Contejean, Dumbrava

The overall objective of our project is to obtain a Coq certified platform for privacy-aware data integration and exchange, this includes providing deep specification of different data models, of various security and privacy policies, of most known or to be designed by the project (privacy preserving) integration and exchange algorithms to date, formally proven correct translations into the world-wide database standard SQL, yielding a complete suite of correct-by construction programs. Obviously, this platform could be linked to the CertiCoq compiler. More on Project ANR -15-CE39-0009