
Which UUID version to use?

Which version of the UUID should you use? I saw a lot of threads explaining what each version entails, but I am having trouble figuring out what's best for what applications.

What are your choices?
Anything that works with Python, so I guess this: docs.python.org/2/library/uuid.html (versions 1, 3, 4 and 5).
If you are curious about versions 3 and 5, see this question: Generating v5 UUID. What is name and namespace?

Ekevoo

There are two different ways of generating a UUID.

If you just need a unique ID, you want a version 1 or version 4.

Version 1: This generates a unique ID based on a network card MAC address and the current time. If either of these is sensitive in any way, don't use this. The advantage of this version is that, while looking at a list of UUIDs generated by machines you trust, you can easily tell whether many UUIDs were generated by the same machine, or infer some time relationship between them.
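To make the leakage concrete, here is a minimal sketch using Python's standard `uuid` module that reads the timestamp and node fields back out of a version 1 UUID. The helper name `v1_creation_time` is hypothetical. Note that Python may substitute a random node value when it cannot determine a MAC address, so the node is only *usually* the hardware address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar: the epoch RFC 4122 uses for v1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_creation_time(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit count of 100-nanosecond intervals since GREGORIAN_EPOCH.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u = uuid.uuid1()
print(u)
print("created:", v1_creation_time(u))
# 48-bit node field: usually the MAC address, sometimes a random stand-in.
print("node: %012x" % u.node)
```

Anyone who can see the UUID can run the same decoding, which is exactly why v1 is a poor fit when the time or machine identity is sensitive.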

Version 4: These are generated from random (or pseudo-random) numbers. If you just need to generate a UUID, this is probably what you want. The advantage of this version is that when you're debugging and looking at a long list of information matched with UUIDs, it's quicker to spot matches.
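As a sketch of what "generated from random numbers" means in practice: a version 4 UUID is 16 random bytes with 6 bits overwritten (4 for the version, 2 for the RFC 4122 variant), leaving 122 random bits. The function below, `random_v4` (a hypothetical name), builds one from `os.urandom`; as far as I know this mirrors what CPython's `uuid.uuid4()` does internally, but treat that as an assumption.

```python
import os
import uuid

def random_v4() -> uuid.UUID:
    """Build a version 4 UUID from 16 bytes of OS-provided randomness."""
    # Passing version=4 makes uuid.UUID overwrite the version nibble and the
    # variant bits; the remaining 122 bits stay random.
    return uuid.UUID(bytes=os.urandom(16), version=4)

a, b = random_v4(), uuid.uuid4()
print(a, a.version, a.variant == uuid.RFC_4122)
print(b, b.version, b.variant == uuid.RFC_4122)
```

The version digit is always the first character of the third group (`xxxxxxxx-xxxx-4xxx-...`), which is a quick way to recognize a v4 UUID by eye.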

If you need to generate reproducible UUIDs from given names, you want a version 3 or version 5. If you are interacting with other systems, this choice was already made, and you should check which version and namespaces they use.

Version 3: This generates a unique ID from an MD5 hash of a namespace and name. If you are dealing with very strict resource requirements (e.g. a very busy Arduino board), use this.

Version 5: This generates a unique ID from an SHA-1 hash of a namespace and name. This is the more secure and generally recommended version.
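To illustrate the two name-based versions, here is a sketch that hashes the namespace UUID's 16 bytes followed by the UTF-8 encoded name, keeps the first 16 bytes of the digest, and stamps in the version and variant bits. `name_based_uuid` is a hypothetical helper; the check at the end compares it against the standard library's `uuid.uuid3` and `uuid.uuid5`.

```python
import hashlib
import uuid

def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hand-rolled name-based UUID: version 3 uses MD5, version 5 uses SHA-1."""
    if version not in (3, 5):
        raise ValueError("name-based UUIDs are version 3 or 5")
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    # MD5 yields 16 bytes and SHA-1 yields 20; only the first 16 are kept.
    return uuid.UUID(bytes=digest[:16], version=version)

for v, stdlib in ((3, uuid.uuid3), (5, uuid.uuid5)):
    mine = name_based_uuid(uuid.NAMESPACE_DNS, "python.org", v)
    print(v, mine, mine == stdlib(uuid.NAMESPACE_DNS, "python.org"))
```

The only structural difference between the two versions is the hash function, which is why the choice mostly comes down to MD5's lower cost versus SHA-1's stronger (though no longer collision-proof) properties.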


I would add: If you need to generate a reproducible UUID from a given name, you want a version 3 or version 5. If you feed that algorithm the same input, it will generate the same output.
In a cloud computing environment (such as AWS or GAE), it would seem the weakness of Version 1 is mitigated into oblivion, since there are likely to be thousands of different MAC addresses applied to a given application's UUID generator over time, eliminating predictability and/or traceability.
@user239558 Given that the goal of a UUID is uniqueness, UUIDv5 can still be preferred.
That comment about Version 1 being "not recommended" is overly simplistic. In many situations, these are indeed fine and preferable. But if you have security concerns about leaking either of these items of information from a UUID that might be made available to untrustworthy actors: (a) the MAC address of the machine creating the UUID, or (b) the date-time when it was created, then avoid Version 1. If those two pieces of information are not sensitive, then Version 1 is an excellent way to go.
What happened to version 2?
Community

If you want a random number, use a random number library. If you want a unique identifier with effectively 0.00...many more 0s here...001% chance of collision, you should use UUIDv1. See Nick's post for UUIDv3 and v5.

UUIDv1 is NOT secure. It isn't meant to be. It is meant to be UNIQUE, not un-guessable. UUIDv1 uses the current timestamp, plus a machine identifier, plus some random-ish stuff to make a number that will never be generated by that algorithm again. This is appropriate for a transaction ID (even if everyone is doing millions of transactions/s).

To be honest, I don't understand why UUIDv4 exists... from reading RFC 4122, it looks like that version does NOT eliminate the possibility of collisions. It is just a random number generator. If that is true, then you have a very GOOD chance of two machines in the world eventually creating the same "UUID"v4 (quotes because there isn't a mechanism for guaranteeing U.niversal U.niqueness). In that situation, I don't think that algorithm belongs in an RFC describing methods for generating unique values. It would belong in an RFC about generating randomness. For a set of random numbers:

chance_of_collision = 1 - (set_size! / (set_size - tries)!) / (set_size ^ tries)
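The formula above can be evaluated without computing astronomically large factorials: the exact form is equivalent to a running product, and for very large sets the standard birthday-bound approximation 1 - exp(-k(k-1)/2N) is accurate. The sketch below assumes N = 2^122 (the random bits in a v4 UUID) and, purely for illustration, the rate of one billion UUIDs per second for 100 years mentioned in the reply to this answer. The function names are hypothetical.

```python
import math

UUID4_SPACE = 2 ** 122  # 128 bits minus the 6 fixed version/variant bits

def collision_probability_exact(set_size: int, tries: int) -> float:
    """1 - set_size! / ((set_size - tries)! * set_size**tries), as a running product."""
    if tries > set_size:
        return 1.0  # pigeonhole principle: a repeat is guaranteed
    p_all_unique = 1.0
    for i in range(tries):
        p_all_unique *= (set_size - i) / set_size
    return 1.0 - p_all_unique

def collision_probability_approx(set_size: int, tries: int) -> float:
    """Birthday-bound approximation 1 - exp(-tries * (tries - 1) / (2 * set_size))."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-tries * (tries - 1) / (2 * set_size))

# Sanity check on the classic birthday problem: 23 people, 365 days.
print(collision_probability_exact(365, 23))  # about 0.507

seconds_per_century = 100 * 365.25 * 24 * 3600
century_of_uuids = int(10 ** 9 * seconds_per_century)
print(collision_probability_approx(UUID4_SPACE, century_of_uuids))  # about 0.61
print(collision_probability_approx(UUID4_SPACE, 10 ** 12))  # a trillion UUIDs: ~1e-13
```

The century figure lands near a coin flip, while a trillion UUIDs (far more than most applications ever create) gives a probability on the order of 10^-13.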

You will not see two version 4 UUIDs collide unless you generate a billion UUIDs every second for a century and win a coin flip. Remember, set_size is 2^122, which is very big.
The v4 algorithm isn't serial, meaning there is a chance that the first two UUIDs generated by v4 could match. Having many possible values does not mean you have to run out of unique ones before you generate a repeat; a repeat could happen at any time.
You are failing to actually do the math. We (as a species) are not generating 1 billion UUIDs every second. So we have longer than 100 years until the first collision (on average).
V4 "might" collide, but the probability is so low that for most use cases it's worth the risk. Re: "two machines in the world eventually creating the same 'UUID'v4": well, sure, but this isn't a problem because most machines in the world that use UUIDs use them in different contexts. I mean, if I generate the same UUID for my own internal app as you do for your internal app, then it doesn't matter. Collisions only matter if they happen in the same context. (Remember, even within an app, many UUIDs don't have to be unique across the entire app, just within the context they're used in.)
So it sounds like: if you don't need your GUID to be secure, use version 1. If you need it to be secure and feel lucky (or really, don't feel unlucky), use version 4.
Nik Bougalis

That's a very general question. One answer is: "it depends what kind of UUID you wish to generate". But a better one is this: "Well, before I answer, can you tell us why you need to code up your own UUID generation algorithm instead of calling the UUID generation functionality that most modern operating systems provide?"

Doing that is easier and safer, and since you probably don't need to generate your own, why bother coding up an implementation? In that case, the answer becomes use whatever your O/S, programming language or framework provides. For example, in Windows, there is CoCreateGuid or UuidCreate or one of the various wrappers available from the numerous frameworks in use. In Linux there is uuid_generate.

If you, for some reason, absolutely need to generate your own, then at least have the good sense to stay away from generating v1 and v2 UUIDs. It's tricky to get those right. Stick, instead, to v3, v4 or v5 UUIDs.

Update: In a comment, you mention that you are using Python and link to this. Looking through the interface provided, the easiest option for you would be to generate a v4 UUID (that is, one created from random data) by calling uuid.uuid4().
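A short usage sketch of `uuid.uuid4()` from the standard library, showing the common representations and that parsing them back gives an equal object:

```python
import uuid

u = uuid.uuid4()
print(str(u))        # canonical 36-character form, e.g. for logs or JSON
print(u.hex)         # 32 hex digits without hyphens
print(len(u.bytes))  # 16 raw bytes, e.g. for a binary database column

# Round-trip: parsing either form yields an equal UUID object.
assert uuid.UUID(str(u)) == u
assert uuid.UUID(bytes=u.bytes) == u
```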

If you have some data that you need to (or can) hash to generate a UUID from, then you can use either v3 (which relies on MD5) or v5 (which relies on SHA1). Generating a v3 or v5 UUID is simple: first pick the UUID type you want to generate (you should probably choose v5) and then pick the appropriate namespace and call the function with the data you want to use to generate the UUID from. For example, if you are hashing a URL you would use NAMESPACE_URL:

uuid.uuid3(uuid.NAMESPACE_URL, 'https://ripple.com')

Please note that this UUID will be different from the v5 UUID for the same URL, which is generated like this:

uuid.uuid5(uuid.NAMESPACE_URL, 'https://ripple.com')

A nice property of v3 and v5 UUIDs is that they should be interoperable between implementations. In other words, if two different systems are using an implementation that complies with RFC 4122, they will (or at least should) both generate the same UUID if all other things are equal (i.e. generating the same version UUID, with the same namespace and the same data). This property can be very helpful in some situations (especially in content-addressable storage scenarios), but perhaps not in your particular case.
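A small sketch of the reproducibility property, reusing the `https://ripple.com` example from this answer: generating the same version with the same namespace and name twice gives identical UUIDs, while switching between v3 and v5 changes the result. `stable_key` is a hypothetical helper, e.g. for deriving a deterministic key from a URL.

```python
import uuid

def stable_key(url: str) -> uuid.UUID:
    """Deterministic ID for a URL; should match any RFC 4122-compliant v5 generator."""
    return uuid.uuid5(uuid.NAMESPACE_URL, url)

first = stable_key("https://ripple.com")
second = stable_key("https://ripple.com")
v3 = uuid.uuid3(uuid.NAMESPACE_URL, "https://ripple.com")

print(first == second)            # True: the result is reproducible
print(first == v3)                # False: v3 (MD5) and v5 (SHA-1) differ
print(first.version, v3.version)  # 5 3
```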


I would guess it is because OP did not ask: how do I "code up [my] own UUID generation algorithm instead of calling the UUID generation functionality that most modern operating systems provide?"
Aside from that, I think it is a good explanation of UUIDv3 and v5. See my answer below about why I think v1 can be a good choice.
What is NAMESPACE_URL? Is it a variable I can get? From where?
@stackdave NAMESPACE_URL is a UUID usually equal to 6ba7b811-9dad-11d1-80b4-00c04fd430c8, following the recommendation made on page 30 of RFC-4122.
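In Python's `uuid` module the namespaces are predefined module-level constants, so nothing needs to be looked up. A quick check of their values against the ones listed in RFC 4122 (Appendix C):

```python
import uuid

# NAMESPACE_URL is a constant in the standard library's uuid module.
print(uuid.NAMESPACE_URL)  # 6ba7b811-9dad-11d1-80b4-00c04fd430c8
# The other predefined namespaces from RFC 4122.
for ns in (uuid.NAMESPACE_DNS, uuid.NAMESPACE_OID, uuid.NAMESPACE_X500):
    print(ns)
```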
sha256.update(something.getBytes(charset)); sha256.update(somethingElse.getBytes(charset)); byte[] hash = sha256.digest(salt); return UUID.nameUUIDFromBytes(hash).toString(); Is this v3? Do they generate the same UUID? RFC 4122?
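On the question above: as far as I know, Java's `UUID.nameUUIDFromBytes` produces a version 3 layout (MD5, with version and variant bits set), but it hashes only the bytes you pass in and does not prepend a namespace UUID. The sketch below reproduces that behavior in Python under that assumption (`java_name_uuid_from_bytes` is a hypothetical name) and shows that it matches `uuid.uuid3` only when the namespace bytes are included in the input by hand.

```python
import hashlib
import uuid

def java_name_uuid_from_bytes(data: bytes) -> uuid.UUID:
    """Assumed equivalent of Java's UUID.nameUUIDFromBytes: MD5 of the raw bytes, no namespace."""
    return uuid.UUID(bytes=hashlib.md5(data).digest(), version=3)

name = "https://ripple.com"
no_ns = java_name_uuid_from_bytes(name.encode("utf-8"))
with_ns = java_name_uuid_from_bytes(uuid.NAMESPACE_URL.bytes + name.encode("utf-8"))

print(no_ns == uuid.uuid3(uuid.NAMESPACE_URL, name))    # False: no namespace was hashed
print(with_ns == uuid.uuid3(uuid.NAMESPACE_URL, name))  # True: namespace prepended by hand
```

If that assumption holds, the snippet in the question yields a v3-formatted UUID of an MD5 over a SHA-256 digest, so it would not match a `uuid3` or `uuid5` of the original strings.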
Eugen Konkov

The Postgres documentation describes the differences between the UUID versions. A couple of them:

V3:

uuid_generate_v3(namespace uuid, name text) - This function generates a version 3 UUID in the given namespace using the specified input name.

V4:

uuid_generate_v4 - This function generates a version 4 UUID, which is derived entirely from random numbers.


NotX

Since it's not mentioned yet: you can use UUIDv1 if you want to be able to sort your entities by creation time without a separate, explicit timestamp. While that's not 100% precise and in many cases not the best way to go (due to the lack of explicitness), it comes in handy in some scenarios, e.g. when you're working with a Cassandra database.
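A sketch of that idea in Python: sort version 1 UUIDs by their embedded 60-bit timestamp (`UUID.time`) rather than by their string form. Sorting the canonical strings does not give chronological order, because the low 32 bits of the timestamp come first in the v1 layout. The short sleeps are only there so the example's timestamps are clearly distinct.

```python
import random
import time
import uuid

# Generate a few v1 UUIDs a couple of milliseconds apart (illustration only).
generated = []
for _ in range(5):
    generated.append(uuid.uuid1())
    time.sleep(0.002)

shuffled = generated[:]
random.shuffle(shuffled)

# Sort by the embedded timestamp, not by str(u): time_low leads the string form.
by_creation = sorted(shuffled, key=lambda u: u.time)
print(by_creation == generated)  # True
```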