
Best practices around generating OAuth tokens?

I realize that the OAuth spec doesn't specify anything about the origin of the ConsumerKey, ConsumerSecret, AccessToken, RequestToken, TokenSecret, or Verifier code, but I'm curious if there are any best practices for creating significantly secure tokens (especially Token/Secret combinations).

As I see it, there are a few approaches to creating the tokens:

1. Just use random bytes, store in DB associated with consumer/user
2. Hash some user/consumer-specific data, store in DB associated with consumer/user
3. Encrypt user/consumer-specific data

The advantage of (1) is that the database is the only source of the information, which seems the most secure. It would be harder to run an attack against than (2) or (3).

Hashing real data (2) would allow re-generating the token from presumably already-known data. It might not really provide any advantage over (1), since the token would need to be stored and looked up anyway, and it is more CPU intensive than (1).

Encrypting real data (3) would allow the server to decrypt the token and recover the information. This would require less storage and potentially fewer lookups than (1) and (2), but is potentially less secure as well.
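For concreteness, here is a minimal Python sketch of the three approaches. The names (`server_key`, `consumer_key`, `user_id`) are placeholders of my own, and the encryption variant assumes the third-party `cryptography` package:

```python
import hmac
import hashlib
import secrets

from cryptography.fernet import Fernet  # third-party; only needed for approach (3)

# (1) Purely random token: all meaning lives in the database row it points to.
def random_token() -> str:
    return secrets.token_urlsafe(32)  # 32 bytes, roughly 256 bits of entropy

# (2) Hash of consumer/user-specific data, keyed with a server-side secret so the
#     token cannot be recomputed by someone who merely knows the input data.
def hashed_token(server_key: bytes, consumer_key: str, user_id: str, nonce: str) -> str:
    msg = f"{consumer_key}:{user_id}:{nonce}".encode()
    return hmac.new(server_key, msg, hashlib.sha256).hexdigest()

# (3) Encrypted blob of consumer/user-specific data: the server can decrypt the
#     token to recover the data without a database lookup.
def encrypted_token(fernet: Fernet, consumer_key: str, user_id: str) -> bytes:
    return fernet.encrypt(f"{consumer_key}:{user_id}".encode())

if __name__ == "__main__":
    print(random_token())
    print(hashed_token(b"server-secret", "consumer123", "user456", secrets.token_hex(8)))
    f = Fernet(Fernet.generate_key())
    blob = encrypted_token(f, "consumer123", "user456")
    print(blob, f.decrypt(blob))
```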

Are there any other approaches/advantages/disadvantages that should be considered?

EDIT: another consideration is that there MUST be some sort of random value in the tokens, because the server must be able to expire and reissue new tokens, so a token must not be composed solely of real data.

Follow On Questions:

Is there a minimum token length needed to make it significantly cryptographically secure? As I understand it, longer token secrets would create more secure signatures. Is this understanding correct?
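Not from the spec, but as a rough illustration: in OAuth 1.0 HMAC-SHA1 signing the token secret becomes part of the HMAC key, so its strength is bounded by its entropy rather than its encoded length. A sketch:

```python
import hmac
import hashlib
import secrets

# Entropy, not the encoded string length, is what matters: 16 random bytes give
# 128 bits of entropy whether they are displayed as hex or Base64.
for n_bytes in (16, 24, 32):
    secret = secrets.token_bytes(n_bytes)
    print(f"{n_bytes} bytes -> {n_bytes * 8} bits of entropy, hex length {len(secret.hex())}")

# In OAuth 1.0 HMAC-SHA1 signing, the consumer secret and token secret are joined
# with '&' to form the HMAC key, so a higher-entropy token secret makes the
# signature harder to forge (simplified here; real OAuth percent-encodes both parts).
key = b"consumer_secret&" + secrets.token_bytes(32)
signature = hmac.new(key, b"signature base string", hashlib.sha1).digest()
print(signature.hex())
```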

Are there advantages to using a particular encoding over another from a hashing perspective? For instance, I see a lot of APIs using hex encodings (e.g. GUID strings). In the OAuth signing algorithm, the token is used as a string. With a hex string, the available character set would be much smaller (more predictable) than with, say, a Base64 encoding. It seems to me that for two strings of equal length, the one with the larger character set would have a better/wider hash distribution, and that this would improve security. Is this assumption correct?
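A back-of-the-envelope way to look at it (my own arithmetic, not from the spec): encoding only changes how many characters are needed to carry a given amount of entropy. Hex carries 4 bits per character and Base64 carries 6, so for equal-length strings the Base64 one can carry more entropy, but a slightly longer hex string carries just as much:

```python
import math

# Maximum bits of entropy per character for common encodings of random bytes.
encodings = {"hex": 16, "base32": 32, "base64": 64}
token_length = 32  # characters

for name, alphabet_size in encodings.items():
    bits_per_char = math.log2(alphabet_size)
    print(f"{name:7s}: {bits_per_char:.0f} bits/char, "
          f"{token_length} chars -> {bits_per_char * token_length:.0f} bits max entropy")

# hex    : 4 bits/char, 32 chars -> 128 bits max entropy
# base64 : 6 bits/char, 32 chars -> 192 bits max entropy
```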

The OAuth spec raises this very issue in 11.10 Entropy of Secrets.

Why the encryption? Isn't hashing good enough? If just hashing is good enough for password, shouldn't it be even better for longer access tokens?
It has been 7.5 years since I asked the question. I honestly can't remember.
Reading it again, hashing and encryption were two different approaches suggested. Encryption would allow the server to get some info without a DB lookup. It was one trade-off among many.

ZZ Coder

OAuth says nothing about the token except that it has a secret associated with it. So all the schemes you mentioned would work. Our token evolved as the site got bigger. Here are the versions we have used:

1. Our first token was an encrypted BLOB containing the username, token secret, expiration, etc. The problem was that we couldn't revoke tokens without any record on the host.
2. So we changed it to store everything in the database, and the token became simply a random number used as the key into the database. It has a username index, so it's easy to list all the tokens for a user and revoke them. We saw quite a few hacking attempts. With a random number, though, we have to go to the database to know whether a token is valid.
3. So we went back to an encrypted BLOB. This time the token contains only the encrypted value of the key and the expiration, so we can detect invalid or expired tokens without going to the database (a sketch of this approach follows below).
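The answer doesn't include code, but a minimal sketch of that last design might look like the following, assuming the third-party `cryptography` package and a numeric database key (both my assumptions). The point is that expiry and tampering can be detected before any database hit:

```python
import struct
import time

from cryptography.fernet import Fernet, InvalidToken  # third-party package

fernet = Fernet(Fernet.generate_key())  # in practice a fixed, secret server-side key

def issue_token(db_key: int, ttl_seconds: int = 3600) -> bytes:
    # Pack the database key and an absolute expiration time, then encrypt the pair.
    payload = struct.pack(">QQ", db_key, int(time.time()) + ttl_seconds)
    return fernet.encrypt(payload)

def check_token(token: bytes) -> int | None:
    # Return the database key if the token decrypts and hasn't expired,
    # otherwise None, all without touching the database.
    try:
        db_key, expires = struct.unpack(">QQ", fernet.decrypt(token))
    except (InvalidToken, struct.error):
        return None
    return db_key if time.time() < expires else None

tok = issue_token(42)
print(check_token(tok))  # prints 42 while the token is still valid
```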

Some implementation details that may help you:

- Add a version to the token so you can change the token format without breaking existing ones. All our tokens have the version as the first byte.
- Use a URL-safe variant of Base64 to encode the BLOB so you don't have to deal with URL-encoding issues, which make debugging OAuth signatures harder because you may end up looking at a triple-encoded base string. (A small sketch of both tips follows below.)
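As a small illustration of those two tips together (my own sketch, not the answerer's code): prepend a version byte to the raw BLOB, then encode with URL-safe Base64 so the token survives query strings without extra percent-encoding:

```python
import base64

TOKEN_VERSION = 1  # bump when the BLOB layout changes

def encode_token(blob: bytes) -> str:
    # First byte carries the format version; the rest is the opaque BLOB.
    raw = bytes([TOKEN_VERSION]) + blob
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_token(token: str) -> tuple[int, bytes]:
    # Restore the stripped padding, then split the version byte back off.
    padded = token + "=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(padded.encode("ascii"))
    return raw[0], raw[1:]

version, blob = decode_token(encode_token(b"\x00\x01payload"))
print(version, blob)  # 1 b'\x00\x01payload'
```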


Excellent, thanks. The version idea is a good one. I've got the URL-friendly Base64 going, but I wish I had a strictly alphanumeric encoding for even easier reading.
Hadn't thought of that before, very interesting! I was planning on APC key-caching to keep unnecessary load off the DB before I read this. Still unsure whether this might be much slower than the shared-memory lookup APC does (at least on the 2nd, 3rd, etc. request within a reasonable timespan).