Keeping Passwords in Source Control (ejohn.org)
110 points by creativityhurts on Feb 6, 2013 | 53 comments



To solve this problem, I wrote git-crypt[1], which uses git's smudge/clean filters to transparently encrypt/decrypt files when you check them in/out. So it's a lot like this solution except you don't need the manual makefile steps. As an added bonus, git diff/blame still work on the encrypted file.

[1] https://github.com/AGWA/git-crypt and http://www.agwa.name/projects/git-crypt/


git-crypt is incredibly cool - great work!


Thanks!


Makes me wonder about the overhead of keeping a history which consists of n full copies of the files instead of deltas

(or the diff is done on the plaintext and then encrypted?)


git-crypt is only used on the sensitive files in the repository and only those with the key are going to be changing those files anyway.


Right. If you need to encrypt an entire repo I don't recommend using git-crypt - there are probably better ways. But for most cases where you'd use git-crypt the overhead isn't that much.


First: I completely agree. Keeping plaintext secrets in source control is a bad idea. Encrypting them is a good idea. If you have plaintext secrets, encrypt them now using this makefile or git-crypt. Then rotate them.

That said, this solution has a couple of issues:

1. It encrypts the entire file instead of individual secrets in the settings file. Encrypted files can't take advantage of many version control features. A small change in plaintext creates a huge diff in ciphertext. Git blame doesn't work anymore. Git diff gets a lot more spammy, since you'll see a diff for the entire settings file if there's the slightest change in it.

2. It uses symmetric key encryption. If a developer knows the password to encrypt a secret setting, they can decrypt all the other secret settings. This is true until someone rotates the passphrase and re-encrypts the file.

To fix both of these problems, I recommend using Keyczar (http://code.google.com/p/keyczar/). If you write the right wrappers, it allows you to encrypt individual settings with a public key. Decrypting them requires a private key that exists only on production servers.

At a past job (Cloudkick), sensitive things in our settings.py looked like this:

  from cloudkick.crypto_wrappers import kz_decrypt
  ...
  BORING_THING = "whatever"
  SECRET_THING = kz_decrypt("kz::xxxx....", "/path/to/private/key")

kz_decrypt did exactly what you'd think: given an encrypted string and a private key, return the decrypted string. The private key was only on production servers, so the risk of leaking a secret was minimal. The public key was in source control, so anyone could encrypt a secret. For debugging or testing, one could also replace the call to kz_decrypt with a plaintext string. I wish the code had been released. It was only 100 lines or so.

This set-up would require some extra work for settings files that don't allow code execution. Still, once you've set it up, it's pretty close to the most secure and convenient way to store secrets.


This is neat. I did something similar using a set of files with tight permissions deployed only on production servers. Like your solution it depended on configs being written in a scripting language. I think it was ten lines of code.

The whole reason for doing it at all was simply that MySQL doesn't support Kerberos. There's a very old ticket for that in their bug tracker.

"For everything else, there's Kerberos."


Doesn't this mean that for a sufficiently short secret, someone could run an offline attack to guess it?


Good catch. My comment was already rather long, so I didn't mention that the public key actually encrypts an AES key that encrypts the secret. A different AES key is used for each secret. Also if the secret is < 1000 bytes (I forget the exact value), it's padded with random bytes. The encrypted format is something like kz::[AES key]:[encrypted padded secret]. Both the AES key and secret bytes are base64 encoded so they don't screw up parsing or break Python string quoting/escaping.
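For illustration only, here is a rough Python sketch of that string format. The function names are made up, the exact field layout is a guess from the description above, and the public-key wrapping and AES steps are left out entirely: the inputs are assumed to be bytes that have already been encrypted elsewhere.

```python
import base64
from typing import Tuple

PREFIX = "kz::"  # prefix as described in the comment; the rest of the layout is assumed


def pack_envelope(wrapped_key: bytes, ciphertext: bytes) -> str:
    """Serialize an already-wrapped AES key and an already-encrypted secret."""
    key_b64 = base64.b64encode(wrapped_key).decode("ascii")
    data_b64 = base64.b64encode(ciphertext).decode("ascii")
    # Standard base64 never emits ':', so it is safe to use as a field separator.
    return f"{PREFIX}{key_b64}:{data_b64}"


def unpack_envelope(envelope: str) -> Tuple[bytes, bytes]:
    """Split an envelope back into (wrapped_key, ciphertext)."""
    if not envelope.startswith(PREFIX):
        raise ValueError("not a kz:: envelope")
    key_b64, sep, data_b64 = envelope[len(PREFIX):].partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    return base64.b64decode(key_b64), base64.b64decode(data_b64)
```

Because both fields are base64, the resulting string contains no quotes or backslashes and can sit inside a normal Python string literal in settings.py.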


Presumably it's padded if it's not a multiple of 16 bytes, because that's the AES blocksize, and not just some off-the-wall requirement that the data be 1000 bytes long. I'm also hoping that your encrypted format has one more field, which is an IV that changes each time the data is encrypted.


There's no IV, but the AES key changes each time you encrypt the secret. The key is random.


I don't think storing passwords in any format in source control is good. Like someone else said, it's mixing app logic with deployment.

We use a combination of Google's open source Keyczar [1] and a relatively new Python keyring [2] library which uses the Keyczar crypter to read/write keys to a keyring storage backend, where the backend interface can be implemented with local crypted files or a cloud service.

[1] http://www.keyczar.org/ [2] http://pypi.python.org/pypi/keyring

We built this Python wrapper called appauth around the concept of a Keyring service by application domain.

e.g. pseudo code:

    import appauth
    auth_service = appauth.AuthService('my-web-app')
    db_creds_cfg = auth_service.get('primary-db')

db_creds_cfg can be a free-form dictionary that provides whatever details are needed to get into a resource:

    db_creds_cfg['db_host']
    db_creds_cfg['db_port']
    db_creds_cfg['username']
    db_creds_cfg['password']

I put in some honest effort to find an open source solution to this, but failed to find anything with a simple install process AND programming interface. Is there any interest from HN if we choose to open source this?

Furthermore, we use Google Authenticator (http://code.google.com/p/google-authenticator/) on our servers to require two-factor auth, on top of disabling password auth in favor of signing in with ssh keys. All log files are then set to permission 600, just to be super paranoid.


I'd find something like that useful. Don't know if that counts as interest from HN though


That sounds like a generalization of things like:

http://www.postgresql.org/docs/9.0/static/libpq-pgservice.ht...


> Is there any interest from HN if we choose to open source this?

I personally don't understand this type of comment. Just open source it!


gotta go through the motions of putting up good readmes and documentation, not a trivial amount of effort, only worth it if I think enough people want it. open sourcing something isn't exactly free effort.


A project is worth open sourcing if it's useful enough for other people to use. I personally do not believe there's much more to it than that.


Actually, having no passwords and using a platform which supports integrated authentication (like Windows) is probably the best approach with respect to handling this. The authentication requirements are handled at an infrastructure level, meaning no credentials are kept in source control or on your production web servers.

In fact, none of our web servers carry ANY credentials at all. Our IIS processes run as a specific user and are granted access to resources (message queues, databases etc) as required.

I'm not sure stuff like this is entirely possible on Linux (I haven't tried to be honest), but I assume you can do the equivalent with OpenLDAP / pam_ldap and SELinux.


How about keys to external resources? AWS access keys, API keys etc? Can Windows manage those as well?


We use a keychain system for that and store the public key in Active Directory. That allows us to revoke all keys at a per-user level. We do a lot of integration and use 25 different external APIs including S3.

Basically, we don't use global config like that by design - the application only recovers the security context on demand.


on a unix system you could put them inside `/etc/profile.d/user.sh` as environment variables so that whenever that user is running something those variables exist. then if you're using chef (not familiar with puppet, etc.) you could keep those passwords/keys in an encrypted data bag and set them during provisioning.


Good idea...and, yes, you can do the same with puppet.


Nice solution.


In a previous job, we had a lot of components with configuration files containing credentials for databases, etc. What we did instead was put placeholder tokens (imagine %DB_ROLE1_PASSWORD%) in the configuration files, and then puppet (chef later) would be used to deploy the packages and replace the tokens. This way, no developer ever knew the production passwords, and only the system admins had access to the source control for the puppet scripts. There really shouldn't be any access to production credentials by developers if you have separate roles for developer and admin. (Some companies are too small, I know =)
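The token replacement step itself is simple enough to sketch in a few lines of Python. This is not how puppet or chef do it internally, just a minimal stand-in assuming tokens look like %NAME% and the real values arrive in a dictionary at deploy time; `render_config` is a hypothetical helper name.

```python
import re

# Assumed token syntax: %NAME% with upper-case letters, digits, and underscores.
TOKEN = re.compile(r"%([A-Z0-9_]+)%")


def render_config(template: str, secrets: dict) -> str:
    """Replace every %NAME% token with secrets[NAME]; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"no value supplied for token %{name}%")
        # A function replacement is inserted literally, so backslashes in
        # passwords are not interpreted as regex escapes.
        return secrets[name]

    return TOKEN.sub(substitute, template)
```

Failing on an unknown token is a deliberate choice here: a config file deployed with a literal %DB_ROLE1_PASSWORD% still in it is usually harder to debug than a deploy that stops early.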


This seems pretty reasonable. The obvious risk is that if the encryption keys leak, all of your credentials may be retroactively compromised...

Still, much better than keeping the passwords in VC unprotected, of course.

My preference still is using environment variables so that the secure bits can be fully decoupled from your code, however..


Storing secrets in the environment is an excellent idea. Heroku highly encourages this approach, and it makes it much easier to give different levels of access to code and secure data. It also makes it easier to avoid accidentally copying secrets across environments (i.e. using a production API key in staging or dev).

If you're not familiar with the practice, I'd encourage you to read the "twelve-factor" section on configuration: http://www.12factor.net/config . The advice applies even if you're not using Heroku for hosting.
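As a small sketch of what that looks like in application code (Python here, standard library only; `load_settings` and the variable names are just examples, not anything Heroku or twelve-factor prescribes):

```python
import os


def load_settings(required, environ=None):
    """Return {name: value} for each required variable; raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

Calling something like `load_settings(["DB_PASS", "API_KEY"])` once at startup means a misconfigured environment fails immediately instead of at the first database call.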


It did lead to an interesting exploit though - http://titanous.com/posts/vulnerabilities-in-heroku-build-sy...


that vulnerability still would have applied if config directives were stored in files


I like using env vars for everything that might change across environments. You should be able to deploy code to a properly configured environment and have it work with zero changes.


And those can easily be put into their own repo.


Yup, this is our approach. We have a 'config' repo that contains secrets together with automated scripts for applying new settings to a named environment, and checking that an environment matches what is in version control. This repo is separate from source code, and has different permissions. (Disclaimer: we do host our site on Heroku, so this approach is pretty baked in.)

To be honest, not having secrets under some kind of source control seems like a bad idea, as you just know that the reality is that they will be in an untracked spreadsheet somewhere.


I like this idea. I would add one tip though: use a group password safe rather than contacting person X. As sysadmins we have passwords all over the place (root, network, wifi, desktop, remote sites, etc.). There are five people on our team, and we use Password Safe (Windows) [1] and/or KeePassX (Linux/Mac) [2] to manage lots of passwords. You do not have to contact person X for a password; if they are away for some reason, just check the safe.

[1] http://passwordsafe.sourceforge.net/

[2] http://www.keepassx.org/


Great tip. This is what I've used in a number of organizations now, including a version of this at Khan Academy. I've amended the blog post to mention this.


I had a look at these two applications. I couldn't figure out if they do anything that Vim's ':X' doesn't do? Partial key-sharing? Access delegation?



I recommend the scrypt command line utility [1] instead of openssl. OpenSSL uses md5 as a key derivation function [2], and the cost of recovering a reasonable length, randomly generated password is surprisingly low [3]. The costs in the presentation are from 2009, and I can only imagine how they've dropped thanks to a few years of bitcoin-driven gpu/hardware developments. If you trust your code host, e.g. github or bitbucket, this isn't a concern, but then neither are plaintext passwords in version control. If you're using a very long, randomly generated password, you're safe as well.

The disadvantage is that you'll need to compile scrypt from source.

[1] http://www.tarsnap.com/scrypt.html

[2] slide 20: http://www.tarsnap.com/scrypt/scrypt-slides.pdf

[3] slide 19: http://www.tarsnap.com/scrypt/scrypt-slides.pdf


You seem to be confused. scrypt is a key derivation function. This blog is suggesting you use openssl (using cast5-cbc cipher) to encrypt/decrypt text that happens to contain passwords. The two actions (key derivation vs encryption/decryption) are orthogonal.

Replacing an encryption algorithm with a key derivation function doesn't make sense.


The scrypt command line utility uses the scrypt kdf to generate a 256 bit key for aes.

Both a kdf and a cipher are used during single file encryption, with openssl and with the scrypt command line utility alike. Openssl implicitly uses md5 as a kdf during encryption [1.1][1.2]. Cast5 requires a 128-bit key, and the kdf stretches the user's password to fit this key requirement.

I can understand the confusion, as scrypt is typically referenced in kdf discussions. It's actually somewhat difficult to extract the kdf functionality from the scrypt source code because the code is geared towards single file encryption. See this post for a confused q&a with scrypt's author [2]. Wrappers around scrypt like this python package[3] have made the "mistake" of using the entire encryption pipeline when they just wanted the kdf. Using scrypt in this manner should still be safe, but it will waste some cpu cycles on aes.

[1.1] http://www.openssl.org/docs/apps/enc.html

[1.2] http://www.openssl.org/docs/crypto/EVP_BytesToKey.html

[2] https://news.ycombinator.com/item?id=1350392

[3] http://pypi.python.org/pypi/scrypt/
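To make the difference concrete, here is a rough Python sketch of the two derivation styles. The first function follows the EVP_BytesToKey scheme documented in [1.2] with md5 and a single iteration; the second uses Python's `hashlib.scrypt`, with cost parameters picked for illustration only (they are not claimed to be what the scrypt utility uses). It needs a Python build linked against an OpenSSL that provides scrypt.

```python
import hashlib


def evp_bytes_to_key_md5(password: bytes, salt: bytes, key_len: int, iv_len: int):
    """One-iteration md5 derivation in the style of OpenSSL's EVP_BytesToKey."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each block is a single md5 over the previous block, the password, and the salt.
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


def scrypt_key(password: bytes, salt: bytes, key_len: int = 32) -> bytes:
    """Memory-hard derivation; n, r, p here are illustrative values only."""
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=key_len)
```

With a 16-byte key, the first function costs an attacker one md5 call per password guess for the key, which is why short passwords fall quickly; the scrypt call deliberately needs far more time and memory per guess.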


I see. In that case I take back everything I said.


I wonder why he chose Cast5 instead of AES.

It probably doesn't matter either way, but when you deviate from standard best practices in crypto you should at least explain why you do it. (my best guess would be that the code predates the invention of AES?)


Looks to be in recent Ubuntus, is it not the same package?

    http://packages.ubuntu.com/search?keywords=scrypt


That package is it. It's in the universe repo, which is disabled by default (I think), so I didn't want to claim scrypt was as convenient as openssl.


I'm sorry but storing sensitive information (e.g. passwords) in SCM is a terrible idea even if they are encrypted.

Why are you mixing deployment with development? They should be two different things IMHO.


Wat. SCM doesn't necessarily mean development. Many, many people version control their deployment configurations (e.g. /etc/puppet/master/*)


Since we manage our servers with Puppet, we use hiera-gpg to securely store sensitive information in encrypted form in git. Puppet then safely deploys these files to our servers and our application deployment process (Capistrano) symlinks/copies these config files in to the application as part of the deployment process. The sensitive config files themselves are excluded from our application's git repository and developers keep local copies of these files (containing local dev. credentials only) for development purposes.

More info on hiera-gpg here: http://www.craigdunn.org/2011/10/secret-variables-in-puppet-...


I find only keeping boilerplate configs in repos helps decouple application code from any one installation. Software can sometimes be used in more than one environment, it's not always a one-to-one relationship.


you're still exposing yourself by putting your settings and credentials, albeit encrypted, out there. i don't like this approach at all. i'd prefer either environment variables or a more ruby way of doing things, like using a rake command to convert an erb to a yaml file. make sure you then encrypt or at least obfuscate credentials in the config file (base64 or encryption). it's still hackable if you can read ruby, but at least you're adding another layer of indirection.


> console.error("Did you forget to run `make decrypt_conf`?");

> console.error("You need to run `make decrypt_conf` to update it.");

Couldn't you just make the decrypt_conf target depend on the encrypted configuration file and make the standard build command depend on the decrypted file? This way it would get enforced with every 'make'.

In case you don't use Makefiles for building at all (because you only use script languages for example) I don't get why you use a Makefile instead of just two shell scripts.


I prefer using environment variables, and then enforcing this in the Makefile: https://gist.github.com/ChimeraCoder/4728823

If this is the first target (or a prerequisite of the first target), then running 'make' will ensure that those variables are set to non-empty strings.


https://github.com/jimktrains/polygonus I wrote this to allow us to encrypt and search passwords. The encrypted file may be kept under version control, though I don't know what types of attacks that could aid.


and i would also like to comment that you should be very careful not to commit code in source control that hardcodes credentials because there's a history that could be exploited.


the fix for the problem is easy: use freaking environment variables! there is not one language/framework whatever that doesn't support them, so for example in node:

> var db_password = process.env.DB_PASS ;

You don't have to keep any password-sensitive file whatsoever inside an open source project. that is what environment variables are fucking made for!



