Pickle is not for crypto

Banner photo: Christine on flickr CC BY-SA 2.0

The Python standard library module pickle implements binary serialization of (most) Python objects. It is insecure if used incorrectly - it even says so in a big red box at the top of the documentation page.

Using pickle to serialize objects from any Python cryptographic library, such as our python-paillier, is not safe unless you fully control and trust the serializer and the transport/storage medium. A pickle carries more than data: it records which classes and callables the loader should import and invoke to rebuild each object, which makes it trivial to hide malicious behaviour in an object before pickling it.

What follows is a small example of how not to serialize a public paillier key when you are using python-paillier!

Imagine we are a malicious participant in some multiparty computation. Perhaps there is a chain of participants before some final result comes back for decryption. Assume we have to create a Paillier keypair and the method of serializing a public key is to use pickle. This is purely for illustrative purposes - the whole point of this post is to reinforce why this would be a bad idea.

We first craft a new public key class that will override the encrypt function to first run code of our design:

from phe import paillier
import pickle

public_key, private_key = paillier.generate_paillier_keypair()

class DeviousPhePublicKey(paillier.PaillierPublicKey):
    def encrypt(self, x, **kwargs):
        print("Sending the secret {} to my webserver...".format(x))
        return super().encrypt(x, **kwargs)

pk_devious = DeviousPhePublicKey(public_key.n)

Now we serialize our public key and send it to someone else, hoping to steal their secrets:

serialized_public_key = pickle.dumps(pk_devious)

On every other machine using this public key - i.e., the ones with the secret raw data:

pk_reconstructed = pickle.loads(serialized_public_key)

# Checking that they received a PaillierPublicKey doesn't catch foul play
assert isinstance(pk_reconstructed, paillier.PaillierPublicKey)

Now when someone else uses this reconstructed key:

>>> cipher_remote = pk_reconstructed.encrypt(42)
Sending the secret 42 to my webserver...

As soon as encrypt is called we have run arbitrary code that had access to the raw secret!
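To see why the isinstance check did not help, here is a minimal sketch using stand-in classes (PublicKey and DeviousPublicKey are hypothetical names for illustration, not the phe classes). An exact type comparison would reject this particular subclass trick, but it should not be read as making unpickling safe.

```python
# Stand-in classes for illustration only; these are NOT the phe classes.
class PublicKey:
    def encrypt(self, x):
        return ("ciphertext", x)


class DeviousPublicKey(PublicKey):
    def encrypt(self, x):
        leaked.append(x)  # stands in for exfiltrating the secret
        return super().encrypt(x)


leaked = []
received = DeviousPublicKey()

# isinstance accepts any subclass, so the devious key passes.
passes_isinstance = isinstance(received, PublicKey)

# An exact type comparison rejects this particular trick.
passes_exact_type = type(received) is PublicKey

print(passes_isinstance, passes_exact_type)
```

Even the stricter check only runs after the object has already been loaded, which is too late if the pickle itself misbehaves during loading.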

In this simple example we only print the secrets out, but using the Python standard library we can easily post the secret to a web server, or instead of stealing a secret we can open a subprocess to actually allow remote access. The same threat applies if the communication or storage medium is insecure - if someone in the middle can intercept and wrap a pickled object without being detected they can run arbitrary code.
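The attacker does not even need to wait for a method call. As a minimal sketch (the Payload class is a hypothetical name, and eval of a harmless expression stands in for something nastier), an object can use __reduce__ to tell pickle which callable to invoke at load time:

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled class; only the sender needs it."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling eval("6 * 7").
        # A real attacker would name a far more dangerous callable.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The receiver only calls pickle.loads; the expression is evaluated
# during loading and its result is returned in place of the object.
result = pickle.loads(blob)
print(result)
```

Note that this variant does not depend on the receiver having any attacker-defined class available: the byte stream refers only to a builtin and its argument.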

Just for fun here is another example. When the ciphertext of a DeviousEncryptedNumber is accessed we attempt to open and print the contents of a local file: "~/.aws/credentials". Of course we would do more than print secrets if we were actually malicious!

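class DeviousEncryptedNumber(paillier.EncryptedNumber):

    def ciphertext(self, be_secure=True):
        try:
            import os.path
            # Leak a local secrets file whenever the ciphertext is read
            print(open(os.path.expanduser("~/.aws/credentials"), 'r').read())
        except Exception:
            # Stay quiet if the file is missing so the victim sees nothing odd
            pass
        return super().ciphertext(be_secure)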

enc = public_key.encrypt(42)
devious_enc = DeviousEncryptedNumber(public_key, enc.ciphertext())
serialized_encrypted = pickle.dumps(devious_enc)

This poorly serialized encrypted number is sent to someone else:

encrypted_reconstructed = pickle.loads(serialized_encrypted)
assert isinstance(encrypted_reconstructed, paillier.EncryptedNumber)

Again it looks just like a valid paillier.EncryptedNumber, until they use it:

>>> print(encrypted_reconstructed + 3)
aws_secret_access_key = XXXYYY
aws_access_key_id = XXXZZZ

<phe.paillier.EncryptedNumber object at 0x7f6019c95b70>

Now their AWS account is... in a bit of a pickle.

As previously mentioned, none of these issues are specific to the objects in our library. Here is an example of someone getting shell access because a webserver deserialized a header token. I wanted to post these remarks because I've been asked about serializing EncryptedNumber instances with pickle twice in the last week!

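Our advice is to keep it simple! Serialize in a dumb format like JSON that will only deserialise to primitive data types. YAML can work too, but only when read with a safe loader (for example PyYAML's yaml.safe_load), since a full YAML loader can construct arbitrary objects. If you're using python-paillier take a look at our JSON serialisation example in the documentation.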
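As a rough sketch of what that can look like for an encrypted number (this is not the library's documented example; dump_encrypted and load_encrypted are hypothetical helper names, and plain integers stand in for the values you would take from an EncryptedNumber's ciphertext and exponent):

```python
import json
import re


def dump_encrypted(ciphertext: int, exponent: int) -> str:
    # Big integers go in as decimal strings so no JSON parser rounds them.
    return json.dumps({"ciphertext": str(ciphertext), "exponent": exponent})


def load_encrypted(text: str) -> tuple:
    obj = json.loads(text)  # yields only dicts, lists, strings, numbers...
    if not isinstance(obj, dict) or set(obj) != {"ciphertext", "exponent"}:
        raise ValueError("unexpected structure")
    ciphertext, exponent = obj["ciphertext"], obj["exponent"]
    if not isinstance(ciphertext, str) or not re.fullmatch(r"[0-9]+", ciphertext):
        raise ValueError("ciphertext must be a decimal string")
    if not isinstance(exponent, int) or isinstance(exponent, bool):
        raise ValueError("exponent must be an integer")
    return int(ciphertext), exponent


text = dump_encrypted(123456789012345678901234567890, -3)
print(load_encrypted(text))
```

The receiver would then rebuild the object itself, along the lines of paillier.EncryptedNumber(public_key, ciphertext, exponent), using a public key it already trusts. Nothing in the payload gets to choose which class is instantiated.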

If our public key had instead been serialized in a format based on JSON Web Key, deserialization would offer a much smaller attack surface, with no option for running code:

{
    "kty": "DAJ",
    "kid": "Example Paillier public key",
    "key_ops": [ "encrypt" ],
    "n": "m0lOEwDHVA_VieL2k3BKMjf_HIgagfhNIZy1YhgZF5M",
    "alg": "PAI-GN1"
}
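A document like this could be produced and consumed along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the field values are copied from the example above, and the modulus is assumed to be encoded as unpadded base64url of its big-endian bytes, as JSON Web Key does for RSA moduli.

```python
import base64
import json


def int_to_b64url(n: int) -> str:
    # Big-endian bytes, base64url encoded, with the "=" padding stripped.
    raw = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_to_int(text: str) -> int:
    padded = text + "=" * (-len(text) % 4)  # restore the stripped padding
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def public_key_to_jwk(n: int, kid: str) -> str:
    return json.dumps({
        "kty": "DAJ",
        "alg": "PAI-GN1",
        "key_ops": ["encrypt"],
        "kid": kid,
        "n": int_to_b64url(n),
    })


def public_key_from_jwk(text: str) -> int:
    jwk = json.loads(text)
    if not isinstance(jwk, dict):
        raise ValueError("expected a JSON object")
    if jwk.get("kty") != "DAJ" or jwk.get("alg") != "PAI-GN1":
        raise ValueError("not a Paillier public key")
    n = jwk.get("n")
    if not isinstance(n, str):
        raise ValueError("missing modulus")
    return b64url_to_int(n)
```

The receiver gets back a plain integer and constructs its own paillier.PaillierPublicKey(n) from it, so the only class involved is the one the receiver chose.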

For more fun with serialization formats that are too smart for their own good (looking at you XML), check out Tom Eastman's fantastic talk Serialization formats are not toys - PyCon 2015.