Building a privacy-preserving architecture with less server trust
Published: July 03, 2017
Before ever having worked in a related field, I had always thought of privacy as a matter of having the strongest encryption method on the market and using it everywhere. Recently, however, I've had the privilege of working on the development of a privacy-preserving email client called Pixelated, which seeks not only to decentralize control of data, but also to make privacy accessible to the non-tech-savvy user through a modern email interface. It was while working on the Pixelated project that I realized that implementing encryption is, in fact, almost futile unless you have a suitable architecture.
In fact, one of the critical concepts of privacy-preserving software is: never trust the server. Though the reasoning behind this might be intuitive to some, it helps to note that relying on the server allows for:
- Misuse of private information by service providers
- Third parties (e.g., government agencies) coercing service providers into sharing data
- Server attacks that lead to theft or leaking of information
In the age of cloud computing, not trusting the server can be quite a tricky feat.
We're actually moving further and further away from locally persisted data towards centralized data. So we at Pixelated want to share a few lessons we learned about how to develop software that tries its best not to share information unnecessarily with our little server onlookers. We haven't had a lifetime of experience, but we've learned at least a thing or two along the way, many of them from our LEAP partners, who provide much of the tooling necessary for our system.
Don't persist the user's data
This seems obvious, but to drive the point home, I'd like to share an example of how Pixelated designed a feature around not saving the user's backup email address.
Pixelated faced the complex challenge of implementing account recovery in a way that guaranteed users would not lose their accounts, but also would not leave security behind. Given our target users, we decided that we would generate recovery codes on sign-up and send them to an optional backup account.
Great, so, on sign-up, we had to ask the user for a backup account. But we also needed to ensure that on every use, a new recovery code would be generated and sent to that backup account. One possible solution for this flow would be to persist the user's backup account for future use. However, that would let our service providers (and, therefore, any snoopers) know that the owner of that backup account also has an account on a Pixelated provider. It would reveal much more information than we would like.
To avoid sharing so much unwarranted information, our strategy was to send the recovery code to the backup email as soon as the address was submitted, never persisting that address. Consequently, that meant we couldn't send it again in the future if the user asked for a recovery code again. So, what do we do then? Well, why not just ask the user to submit a backup email again? Whether it's the same address submitted before or not, we don't need to know.
That not only solves the problem of privacy, but also saves us the trouble of updating the backup account if it changes.
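Here's a minimal sketch of what such a handler might look like. The `send_email` helper and the handler shape are hypothetical, purely for illustration; the point is simply that the backup address never gets persisted:

```python
import secrets

def handle_recovery_request(backup_address: str, send_email) -> None:
    """Generate a fresh recovery code and mail it to whatever backup
    address the user just typed in. The address only lives in memory
    for the duration of this request and is never written to disk."""
    recovery_code = secrets.token_urlsafe(16)  # new code on every request
    send_email(
        to=backup_address,
        subject="Your recovery code",
        body=f"Your recovery code is: {recovery_code}",
    )
    # Intentionally no persistence of backup_address: if the user needs
    # another code later, we simply ask them for an address again.
```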
I find this example quite interesting because it goes beyond asking software developers not to persist irrelevant data and actually demonstrates design principles that avoid having to persist even relevant data. It's not just a matter of trimming what's kept in the database for no specific purpose, but actually suggesting that the whole user experience can be designed around privacy as a requirement.
Client-side encryption
If we were just talking about protecting information, why can't we simply encrypt the data on the server? Why can't we just ensure the data is unreadable? Why add so much complexity and even potentially tinker with the user's experience? Well, just as this solution delegates security to the server, it also pushes it closer to the attacker. Server-side encryption would imply that the server has access to the encryption key and algorithm, which means we're abstracting the weakest link, not actually removing the threat.
In fact, the most robust way of guaranteeing privacy in most software is to ensure all data sent to the server is already encrypted before even leaving your device. Client-side encryption guarantees that only you have control over your encryption key, so that even with attacks on the server, or during transport, there is no way of finding out how to decrypt the data. Of course, this hinges on having a strong encryption protocol and key, but we'll leave the strength of the encryption for other articles to expand on.
For us at Pixelated, we learned that in some cases it's even worth having incoming data from the Internet to the server be sent to the client to be encrypted! I don't know about you, but I had never thought about that before. Just think about it: instead of pushing processing to the server, we're actually suggesting sending it to the client!
Confused? Let's see if I can explain how LEAP's syncing of encrypted documents achieves this.
The Pixelated architecture includes the Pixelated client (by which we mean a JavaScript and Python layer, not just a browser UI) and the LEAP provider, where there is a whole ecosystem for polling for new emails coming in from other mail servers. This architecture allows the client to be installed directly on your personal device, while the provider is a separate service running on someone else's machine.
Now, say someone sends you a private email. The content of this email might already be encrypted by the sender, but all the metadata (which includes information such as the sender, recipient, and subject) is not. So, what LEAP does is that when this email arrives at the provider, it is synced to the client (over encrypted transport), split into documents for easy storage, and encrypted with a LEAP-generated secret key on the user's device. In a way, in this flow, the provider's responsibility is to forward the incoming data from the mail server to the client to be processed, after which it is synced back to the server.
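To make that flow a little more concrete, here's a minimal sketch of the client-side step using Python's `cryptography` library. This is not LEAP's actual code; the function name and the way the secret is generated here are assumptions for illustration only.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustrative only: a secret generated on the user's device.
# The provider never sees this key.
local_secret = AESGCM.generate_key(bit_length=256)

def encrypt_document(document: bytes, key: bytes) -> bytes:
    """Encrypt one synced document on the client before it is stored
    back on the server as ciphertext."""
    nonce = os.urandom(12)  # unique 96-bit nonce per document
    return nonce + AESGCM(key).encrypt(nonce, document, None)

# The provider forwards the incoming mail over encrypted transport;
# the client encrypts it locally and syncs only the ciphertext back.
incoming_mail = b"From: alice@example.org\nSubject: hello\n\n..."
stored_blob = encrypt_document(incoming_mail, local_secret)
```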
The architecture looks something like this:
Figure 1: Client-side encryption
Fascinating, huh? There's some real reverse-thinking going on in this architecture: the client is where we focus any processing that requires trust in the device.
However, as expected, delegating to the client has its limits when it comes to replicating this across multiple devices, which means we didn't stop there. We now needed to be able to share the secret used to encrypt the data on the client, so that you could read your emails from other devices as well (such as a mobile phone or work computer). So, LEAP added yet another level of complexity by storing this secret in a safe way on the server. Tricky, huh? Well, that's where we learned our last strategy on how not to trust the server.
Decouple user from server-side data
Given the challenge of storing a user's secret mentioned above, we had to evolve our solution to guarantee that the server could at least never find out whose secret was whose. This was a point where we simply didn't have the option of not persisting the user's data: the user needed that secret to be able to read their emails from the device of their choice, and that required knowing the secret with which to decrypt them.
Our first step was to ensure the secret was encrypted before being sent to be persisted on the server (again, client-side encryption). Since the user should be able to retrieve this secret at any moment, we used their password for encryption (using AES in Galois/Counter Mode, or AES-GCM). But beyond that, we also wanted to ensure that the secrets saved on the server did not reveal any information about the user, such as, say, their uuid or username, so that an attacker could not cross-reference the information. But how do we persist a user's data in a shared database while ensuring only the software knows how to retrieve the right secret for the right user?
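As a rough illustration of that first step (the key-derivation parameters and salt handling here are assumptions, not LEAP's exact scheme), encrypting the secret with a key derived from the user's password might look something like this:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

def encrypt_secret(secret: bytes, password: str, salt: bytes) -> bytes:
    """Wrap the client's secret with a key derived from the user's
    password, so only ciphertext ever reaches the server."""
    key = Scrypt(salt=salt, length=32, n=2**14, r=8, p=1).derive(password.encode())
    nonce = os.urandom(12)  # unique 96-bit nonce per encryption
    return nonce + AESGCM(key).encrypt(nonce, secret, None)

salt = os.urandom(16)  # stored alongside the ciphertext; reveals nothing by itself
wrapped_secret = encrypt_secret(os.urandom(32), "correct horse battery staple", salt)
```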
Here's what we learned in this case. We were using a document-oriented NoSQL database (CouchDB) as the server-side DB. Queries on this database are usually done by document id, which meant that each encrypted secret would be a different document in this database, each with its own unique document id. So, we had to know which document id related to which user. But instead of just using the user's id as the document id for a direct relation, we learned that a viable and useful alternative is to again use the user's password (which the server will never know) to derive the document id. That way, the service provider or any attacker will never know which document belongs to which user without their password.
Sounds crazy, doesn't it? Wouldn't that allow an attacker to find out all users' passwords? Well, of course we didn't use the actual password as the id. We derived the id by hashing the user's password together with the user id, ensuring that an attacker would not be able to work back from one to the other and, therefore, guaranteeing anonymity.
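Here's a minimal sketch of that idea. The exact hash construction below is an assumption for illustration; LEAP's real derivation may differ, but the principle is the same: without the password, the id reveals nothing.

```python
import hashlib

def secret_doc_id(user_id: str, password: str) -> str:
    """Derive a CouchDB document id from the user id and password.
    The server only ever sees the resulting opaque hash, so it cannot
    tell which document belongs to which user."""
    return hashlib.sha512((user_id + password).encode()).hexdigest()

# The client computes the id locally and asks CouchDB for exactly that
# document; neither the password nor the user id is recoverable from it.
doc_id = secret_doc_id("user-1234", "correct horse battery staple")
```

The key property is that the derivation happens only on the client; the server stores documents under ids it cannot link back to any account.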
This tactic goes a long way to demonstrate how, even in cases where we do need to persist some important data, it's important to consider all the ways in which such sensitive data can be protected. Consider that even if the database leaks, there are two obstacles to deriving any significant data from it: content encryption and decoupled user data.
What we haven't figured out
Though Pixelated seeks to avoid trusting the server (as far as usability for our target users permits), we certainly can't say we've figured it all out. Unfortunately, we don't have control over all aspects of our product, as email providers rely on a network that can't all be rebuilt for us. So, there are a few things that are out of our hands.
The most pressing of these issues is the short period between receiving an email from another provider and encrypting it on arrival. Even though some of these emails might actually have been sent to you encrypted, the metadata will not be (as email protocols leave this information in plain text), which means that for a short moment, an attacker can know who the sender is and what the subject is. This, of course, happens because we are not in control of all email networks and other mail servers, and we can't change how the information reaches us.
Another concern that came up recently, pertaining to how we keep data safe from the server, is how we store secrets in a way that they can be recovered if you lose your password. At the moment, our users can only retrieve those secrets because we derive the document id from the user's password, which means that if they lose their password, they lose access to the document, without which they cannot read their emails. This presents a whole new set of challenges that we'll have to learn how to deal with as we think about account recovery as an upcoming feature.
Privacy poses so many challenges to the way we think of software, making decisions trickier and, consequently, sometimes costlier. But if you've made it this far, you're probably just as intrigued by these challenges as we are. It usually takes no more than a little creativity to find a solution that works for your users.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of ºÚÁÏÃÅ.