Verifiable confidential computing for processing biometrics - Part II
In the first part of our exploration into verifiable confidential computing, we delved into the intricacies of this technology. In this part, we turn our attention to the reasons behind the Humanode team’s decision to embrace confidential computing, its advantages and limitations, and how we plan to deal with those limitations. Let’s dive into it.
Those who have been following our journey closely are aware that our ultimate goal is to enable computations on biometric data in an encrypted manner. There will be no need to decrypt the data at any stage and all the computations will be done in a decentralized distributed system. Let me explain.
The idea is to simply send your encrypted data to any node in the system. Nodes would only be able to compare the data with the rest of the data and verify your uniqueness.
All of this will be done through the implementation of homomorphic encryption. However, the implementation is still a work in progress and will require considerable time. For detailed information on the progress of our implementation, check this paper.
For now, we use a less optimal solution based on hardware encryption and CVMs. With our future homomorphic scheme, we won't need a CVM, as the scheme itself will provide this property. The reason for going with CVMs and hardware encryption in the meantime is that the trust models of the two are broadly similar: the biometric data may be handled in unencrypted form only on the user's local device. Once the data leaves the device, it remains encrypted in transit as well as at rest.
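To give a feel for what "computing on data without decrypting it" means, here is a toy sketch of a homomorphic property using textbook RSA, which happens to be multiplicatively homomorphic. This is purely illustrative: it is not the scheme Humanode is building, and textbook RSA with toy primes is not secure for any real use.

```python
# Toy demonstration of a homomorphic property: textbook RSA lets a server
# multiply two values it only ever sees in encrypted form.
# NOT secure and NOT Humanode's scheme -- illustration only.

# Small RSA keypair (toy primes; a real key would use ~2048-bit primes).
p, q = 61, 53
n = p * q                          # public modulus
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
ca, cb = encrypt(a), encrypt(b)

# The "server" multiplies ciphertexts without ever decrypting them:
c_product = (ca * cb) % n

# Only the key holder learns the result of the computation.
assert decrypt(c_product) == (a * b) % n  # 42
```

In a fully homomorphic scheme, arbitrary computations (not just multiplication) can be carried out this way, which is what would let nodes compare encrypted biometric data without ever seeing it.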
“Currently, we have to resort to using a CVM, which has memory encryption and blocks external access, effectively giving us the property that we want where the data is handled in the encrypted form,” MOZGIII said.
This property of CVMs solves the problem for us until we deploy homomorphic encryption. Nonetheless, there are some limitations to using CVMs, or rather, some trade-offs that we have to make.
Let us discuss these limitations before moving on to the solutions.
Limitations to the implementation of CVM
Even though our biometric verification system was based on CVMs, the solution still had shortcomings that needed fixing.
When running the system in a cloud environment, you are required to supply an initial image along with disk contents comprising your kernel and operating system. The cloud provider then boots this into the cloud system.
However, to be able to perform remote attestation and be certain about the code running in the BIOS of the VMs, you need to have control over the kernel and operating system.
And that's not all. Apart from the system and kernel, you must also control the EFI firmware, essentially the BIOS.
The control over the BIOS is crucial because it executes within what we deem to be the secure zone, i.e. within the encrypted memory. There's a possibility that the BIOS could trigger some trap instructions which could result in a data breach.
But there's an obstacle here.
The CVM provider doesn't currently permit us to specify our own firmware. This limitation is why we can't perform end-to-end remote attestation: we remain uncertain about the code operating in the BIOS.
Here's a simplified explanation of our situation. Previously, we relied on the services offered by cloud providers, specifically their proprietary solutions. They permit us to run any Virtual Machine (VM) confidentially, and while it's virtually impossible to open them, there is an inherent trust we had to place in the cloud provider not to access our data without authorization. Not just us; our users also need to trust them.
From a superficial viewpoint, these providers do seem to guarantee security. They restrict access and prevent the opening of memory or disk space. On the surface, this looks promising.
However, we wish to make our system more transparent and trustless. For example, we would like to know what’s running in the AMD SEV-SNP environment and the actual CPU architecture, including all related information and documentation. There are certain properties we must ensure to make our system truly secure and transparent.
Even though Azure claims that its attestation service can provide this transparency, to attain this, we have to trust its service. This is because, in addition to running our VM, they also operate their own BIOS, which is bundled with their attestation service.
But still, we can't know for sure whether the firmware allows us to deploy a safe attestation service, or whether they're trying to mess with the data. So how do we deal with this issue? Read on to find out.
How do we resolve these issues by implementing Verifiable Confidential Computing?
We aim to eliminate the trust dependency and start with a deployment using a Confidential VM that supports remote attestation and allows us to set up our own firmware, essentially leveraging the cloud offerings provided by these service providers.
In addition to that, there is a solution we're experimenting with on our own dedicated server, which has a CPU with AMD SEV-SNP support. We plan to use this physical system to run the virtual machine and experiment with the setup. The cloud only lets you run virtual machines, but by owning the physical system we can also control the host part of the stack, which is usually controlled by Azure.
Here's another way to think about it.
Our current focus is on the latest version of CVM deployment, which incorporates two key features. Firstly, we have the ability to include our own firmware, which subsequently authenticates the rest of the loaded code, including the kernel image and beyond. Secondly, we control the host part of the system ourselves.
In combination, these two features enable us to deploy a system that calculates cryptographic hashes of all data involved in the booting process upon launch, a process known as launch measurement.
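The idea behind a launch measurement can be sketched in a few lines: each boot component (firmware, kernel, initrd, and so on) is folded into a running digest, so the final value changes if any byte of any component changes. This is only an illustration of the concept; SEV-SNP computes its measurement inside the AMD secure processor using its own defined format, and the component names below are placeholders.

```python
# Sketch of a deterministic "launch measurement": chain-hash every boot
# component so the final digest commits to the entire boot chain.
import hashlib

def extend(measurement: bytes, component: bytes) -> bytes:
    # new = SHA-384(old || SHA-384(component)), mirroring TPM-style PCR extends
    return hashlib.sha384(measurement + hashlib.sha384(component).digest()).digest()

# Placeholder stand-ins for the real firmware/kernel/initrd binaries.
boot_chain = [b"firmware-blob", b"kernel-image", b"initrd", b"cmdline"]

measurement = bytes(48)  # start from an all-zero register
for component in boot_chain:
    measurement = extend(measurement, component)

print(measurement.hex())
```

Because the process is deterministic, anyone holding the same published artifacts can recompute the value locally and compare it against the measurement reported by the running VM's attestation.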
With this deterministic launch measurement in place, we, and indeed anyone else, yes even you, will be able to compare the launch measurement of our system with that of code running in the cloud.
We will release the official disk image and kernel, allowing individuals to manually compute the hashes. This will enable them to access and understand the contents and verify the code running on the VM.
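As a minimal sketch, verifying a released artifact comes down to hashing it locally and comparing the result against the published value. The helper below is illustrative (the file contents and names are stand-ins, not actual Humanode release artifacts):

```python
# Hash a large file in chunks and compare against a published digest.
import hashlib
import os
import tempfile

def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so arbitrarily large images fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for a downloaded kernel/disk image:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is the released kernel image")
    path = tmp.name

local_hash = file_sha256(path)
os.unlink(path)
# In practice, compare local_hash against the hash published alongside the
# official release before trusting the image.
```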
Additionally, we will provide means to review the source code and all other components we include in the image. From this source code, the same image can be built.
Essentially, we are creating a path for individuals to inspect our source code and verify that the code in question is indeed the one implemented on the VMs, meant to verify uniqueness and liveness.
Here's what this means in layman's terms: we use a CVM utilizing AMD SEV-SNP to boot servers in such a way that, after launching, the system self-configures and self-executes to process and store the data in encrypted form, so that no one, not even we, can access or tamper with the data at any stage of the data lifecycle, be it data in use, in transit, or at rest.
The isolation feature of the CVM allows us to keep the computation process private and confidential from other components.
And the verification feature allows anyone to check that the computation applied to the biometric data is only meant to ensure that the data is from a unique entity.
And with our current experiments with dedicated servers, we will be able to eliminate the potential attacks that could be orchestrated by gaining control over the hypervisor.
Summing Up
After the implementation of verifiable CVM, we will be able to provide two main features.
I. Isolation - the computation process will be private, and no one can get their hands on unencrypted data.
II. Verifiability - even though the computation will remain private, anyone with the cryptographic hash will be able to confirm that the system is running the code meant to verify the uniqueness of biometric data.
Although homomorphic encryption is the ideal solution for keeping data private and safe, using hardware encryption with verifiable CVMs provides us with a temporary solution that is both private and transparent.