
Exposing database IDs - security risk?

I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.

Any opinions or links on why it's a risk, or why it isn't?

EDIT: of course the access is scoped, e.g. if you can't see resource foo?id=123 you'll get an error page. Otherwise the URL itself should be secret.

EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
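Here is a minimal sketch of such a token in Python, assuming an in-memory store (a real application would persist it). The class `OneTimeTokenStore` and its methods are hypothetical names. The one-hour lifetime mirrors the example above, and the token itself comes from the standard `secrets` module:

```python
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 3600.0  # assumption: "valid for 1 hour"


class OneTimeTokenStore:
    """Hypothetical in-memory store of single-use, expiring URL tokens."""

    def __init__(self, ttl: float = TOKEN_TTL_SECONDS) -> None:
        self._ttl = ttl
        # token -> (resource id it grants access to, expiry timestamp)
        self._tokens: Dict[str, Tuple[int, float]] = {}

    def issue(self, resource_id: int, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # unguessable, URL-safe
        self._tokens[token] = (resource_id, now + self._ttl)
        return token

    def redeem(self, token: str, now: Optional[float] = None) -> Optional[int]:
        now = time.time() if now is None else now
        # pop() removes the entry, so a token can be redeemed at most once
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        resource_id, expires_at = entry
        if now >= expires_at:
            return None  # expired
        return resource_id
```

The `now` parameter is only there to make the sketch easy to test. Callers would normally omit it.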

EDIT (months later): my current preferred practice for this is to use UUIDs for IDs and expose them. If I'm using sequential numbers as IDs (usually for performance on some DBs), I like generating a UUID token for each entry as an alternate key, and exposing that.
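For illustration only, here is one way the "sequential key plus UUID alternate key" setup might look with Python's built-in `sqlite3`. The table `widget` and the helper functions are hypothetical. Only `public_id` ever leaves the server:

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE widget (
        id        INTEGER PRIMARY KEY,  -- sequential, internal only
        public_id TEXT NOT NULL UNIQUE, -- random UUID, safe to expose
        name      TEXT NOT NULL
    )
    """
)


def create_widget(name: str) -> str:
    """Insert a row and return the identifier that may appear in URLs."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO widget (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def find_widget(public_id: str) -> Optional[str]:
    """Look a row up by its public UUID; the integer id is never used here."""
    row = conn.execute(
        "SELECT name FROM widget WHERE public_id = ?", (public_id,)
    ).fetchone()
    return None if row is None else row[0]
```

Foreign keys inside the database can keep pointing at the compact integer `id`.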


erickson

There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.

The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.

Here are some good rules to follow:

1. Use role-based security to control access to an operation. How this is done depends on the platform and framework you've chosen, but many support a declarative security model that will automatically redirect browsers to an authentication step when an action requires some authority.
2. Use programmatic security to control access to an object. This is harder to do at a framework level. More often, it is something you have to write into your code and is therefore more error prone. This check goes beyond role-based checking by ensuring not only that the user has authority for the operation, but also has necessary rights on the specific object being modified. In a role-based system, it's easy to check that only managers can give raises, but beyond that, you need to make sure that the employee belongs to the particular manager's department.
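A rough Python sketch of both checks applied to the raise example. `User`, `Employee`, `Forbidden` and `give_raise` are made-up names, and a real framework would usually handle the role check declaratively:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    user_id: int
    roles: FrozenSet[str]
    department: str


@dataclass
class Employee:
    employee_id: int
    department: str
    salary: int


class Forbidden(Exception):
    pass


def give_raise(actor: User, employee: Employee, amount: int) -> None:
    # Rule 1, role-based: may this user perform the operation at all?
    if "manager" not in actor.roles:
        raise Forbidden("only managers can give raises")
    # Rule 2, object-level: may this user act on *this* employee?
    if employee.department != actor.department:
        raise Forbidden("employee is not in the manager's department")
    employee.salary += amount
```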

There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.

Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:

Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier. This is the only one that should ever be exposed to the client. Using random UUIDs for these is a practical solution for assigning these surrogate keys, even though they aren't cryptographically secure. One place where cryptographically unpredictable identifiers are a necessity, however, is in session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
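In Python, for instance, the two kinds of identifier could be produced as shown below (the function names are hypothetical). CPython's `uuid.uuid4()` happens to draw from `os.urandom`, but the UUID specification does not require a cryptographic source. Session tokens therefore go through the `secrets` module instead:

```python
import secrets
import uuid


def new_surrogate_id() -> str:
    # Random (version 4) UUID for entities referenced from the web app.
    return str(uuid.uuid4())


def new_session_token() -> str:
    # Session IDs authenticate requests, so draw them from the OS CSPRNG
    # via `secrets` (32 random bytes -> 43 URL-safe characters).
    return secrets.token_urlsafe(32)
```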


IMO, adding unpredictable IDs is a "security through obscurity" approach and can lead to a false sense of security. It's better to focus on (1) and (2) and make sure your access control is solid.
Using a cryptographic RNG is definitely not "security through obscurity." The attacker is no closer to guessing object identifiers even when she knows how you generate them. Security through obscurity means that if the algorithm you are using is discovered, it can be exploited. It does not refer to keeping secrets, like keys or the internal state of an RNG.
@stucampbell Perhaps, but that doesn't mean that you shouldn't use unpredictable IDs at all. Bugs happen, so unpredictable IDs are an extra safety mechanism. Besides, access control is not the only reason to use them: predictable IDs can reveal sensitive information such as the number of new customers within a certain timeframe. You really don't want to expose such information.
@Stijn you can’t really say that I “really don’t want” to expose how many customers I have. I mean, McDonald’s has a huge sign that says they’ve served 10 billion hamburgers. It’s not a security risk at all, it’s a preference. Furthermore, in most applications where we would worry about this, you have to log in before you see any URLs anyway. Therefore we would know who was scraping data.
One thing that I didn't see mentioned in this conversation is that, from a troubleshooting and ease-of-use standpoint, it can be very handy to have IDs exposed in a URL to help direct users to a specific resource, or to let them tell you exactly which resource they're viewing. You can mostly avoid business intelligence concerns by starting the auto-increment at a higher value, just to offer one idea.
Peter

While not a data security risk, this is absolutely a business intelligence security risk, as it exposes both data size and velocity. I've seen businesses get harmed by this and have written about this anti-pattern in depth. Unless you're just building an experiment and not a business, I'd highly suggest keeping your private IDs out of the public eye. https://medium.com/lightrail/prevent-business-intelligence-leaks-by-using-uuids-instead-of-database-ids-on-urls-and-in-apis-17f15669fd2e


Finally, a sane answer that brings up aspects other than the security risk.
This should be the accepted answer. If you assume that the attacker can bypass your security (which you should always do), you don't want to just hand this kind of information over.
@Kriil They don't even need to bypass your security. They just need to create an account!
John Topley

It depends on what the IDs stand for.

Consider a site that, for competitive reasons, doesn't want to make public how many members it has, but reveals it anyway by using sequential IDs in the URL: http://some.domain.name/user?id=3933

On the other hand, if it used the user's login name instead, as in http://some.domain.name/user?id=some, it wouldn't disclose anything the user didn't already know.


If you're using sequential IDs, you're right, but if not, then that doesn't expose anything.
@orip: Like I said, it depends on what someone can discover by examining several id's. Is there a pattern? Can they use that information to gain information they are not intended to have?
@John: Thanks for the edit. English isn't my native language :)
I've done exactly this: used sequential ID numbers to determine the size of a competitor's userbase.
They are everywhere. Many shopping sites use a sequential number for the order ID. Place one order on one date and another on a later date, and you know how many orders they got during that period. Even if you don't know how much money the orders are worth, you still get an indication of how well the business is doing.
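The arithmetic is simple. Below is a sketch assuming a single global counter without gaps. Real sequences can skip values, for example after rollbacks, so treat the result as an estimate. The IDs and dates are invented:

```python
from datetime import date


def orders_per_day(first_id: int, first_day: date,
                   second_id: int, second_day: date) -> float:
    """Estimate order velocity from two of our own order IDs.

    Assumes one global, gap-free counter, which real sequences may not be.
    """
    days = (second_day - first_day).days
    if days <= 0:
        raise ValueError("second order must be placed on a later day")
    return (second_id - first_id) / days


# Invented example: order #10500 on Jan 1, order #10920 on Jan 31.
rate = orders_per_day(10_500, date(2024, 1, 1), 10_920, date(2024, 1, 31))
print(rate)  # 14.0

# Starting the counter at a large offset doesn't change the estimate.
offset = 1_000_000
print(orders_per_day(10_500 + offset, date(2024, 1, 1),
                     10_920 + offset, date(2024, 1, 31)))  # 14.0
```

Starting the counter at a high value hides the absolute total but not this rate, because the offset cancels out in the subtraction.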
Arjan Einbu

The general thought goes along these lines: "Disclose as little information about the inner workings of your app to anyone."

Exposing the database ID counts as disclosing some information.

The reason is that hackers can use any information about your app's inner workings to attack you, or a user can change the URL to get at records he/she isn't supposed to see.


Accessing resources they aren't supposed to see - only if I don't check permissions (which I do, otherwise I have a different security problem). Disclosing information about the "inner workings" - that's exactly my question. Why is it a problem?
@orip: It's all about being as secure as possible. If you are the kind of programmer who doesn't make mistakes, then it's not an issue. Otherwise, exposing fewer details makes it more difficult to exploit your code if (when) you do make mistakes. By itself, you're right, it doesn't add security.
@Adam: Superficial security can be worse than no security. With no security you're explicit; with superficial security you might think it adds something non-negligible.
Using UUIDs for public IDs is about minimizing risk. Let's imagine your devs are not perfect and manage to introduce a bug that makes it possible to bypass an access check for a particular table. If you are using sequential IDs in the API, I can most likely export the entire table with a few lines of bash. With UUIDs, I most likely get nothing.
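Here is a small Python simulation of that scenario. It assumes a lookup whose access check has been broken, so any existing key returns data. The stores and functions are hypothetical, and the point is only how many records each kind of key leaks:

```python
import uuid

N = 1000

# Hypothetical tables whose access check is broken: any existing key "works".
sequential_store = {i: f"record {i}" for i in range(1, N + 1)}
uuid_store = {str(uuid.uuid4()): f"record {i}" for i in range(1, N + 1)}


def leaked_sequential() -> int:
    # The attacker simply walks the ID space 1..N.
    return sum(1 for i in range(1, N + 1) if i in sequential_store)


def leaked_uuid(guesses: int = 100_000) -> int:
    # The attacker can only guess random UUIDs (122 random bits each).
    return sum(1 for _ in range(guesses) if str(uuid.uuid4()) in uuid_store)
```

Walking the sequence recovers every record. Random guessing against a thousand version 4 UUIDs essentially never succeeds, although the broken access check still needs fixing.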
Joshua

We use GUIDs for database ids. Leaking them is a lot less dangerous.


This is what I would suggest. It's a lot less likely that someone will guess the GUIDs in your database.
This comes at a performance penalty. See here
Interesting thing 1 about using GUIDs is that you can let clients generate database IDs. Interesting thing 2 is that they have no problem with sharded databases the way auto-incrementing IDs do.
Do you also use these GUIDs in HTML as id attributes to identify users, comments, and posts? For example, to indicate which link a user clicked or which post they want to comment on?
John Topley

If you are using integer IDs in your DB, you may make it easy for users to see data they shouldn't by changing query string variables.

For example, a user could easily change the id parameter in this query string and see or modify data they shouldn't: http://someurl?id=1
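One common way to close that hole is to include the current user in the query itself. Here is a sketch with `sqlite3`, where the `document` table, its `owner_id` column and `get_document` are hypothetical:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, body TEXT)"
)
conn.executemany(
    "INSERT INTO document (id, owner_id, body) VALUES (?, ?, ?)",
    [(1, 100, "alice's notes"), (2, 200, "bob's notes")],
)


def get_document(current_user_id: int, doc_id_param: str) -> Optional[str]:
    """Fetch a document by the ?id= value, but only if the caller owns it."""
    try:
        doc_id = int(doc_id_param)
    except ValueError:
        return None  # not an integer at all
    row = conn.execute(
        # The ownership filter is part of the query, so editing the id
        # in the URL can never return someone else's row.
        "SELECT body FROM document WHERE id = ? AND owner_id = ?",
        (doc_id, current_user_id),
    ).fetchone()
    return None if row is None else row[0]
```

Returning the same result for "doesn't exist" and "not yours" also avoids confirming which IDs are valid.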


If I don't check permissions I have a different security problem.
The question isn't if you will check permissions. It's if the new hire you don't know will check permissions.
@BrianWhite - which is why you want a code review process in place, so that you can educate them before it hits production.
Sure. Most places have a code review policy in place. And simultaneously, security issues abound on the web. SQL injection is still one of the most common vulnerabilities, despite all the education.
What we do is encrypt the integer ids when used in the URL or form variables.
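The comment doesn't say which cipher is used. One possible approach, not necessarily theirs, is a small keyed Feistel network that turns a 64-bit ID into another 64-bit value and back. This sketch uses HMAC-SHA256 as the round function. The key, round count and function names are assumptions, and the code has not been reviewed as production cryptography:

```python
import hashlib
import hmac

ROUNDS = 4                  # assumption: 4 rounds of a balanced Feistel network
MASK32 = (1 << 32) - 1


def _f(key: bytes, round_no: int, half: int) -> int:
    """Round function: HMAC-SHA256 of (round number, half), truncated to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def encrypt_id(key: bytes, n: int) -> int:
    """Map a 64-bit integer ID to another 64-bit integer (a keyed permutation)."""
    if not 0 <= n < 1 << 64:
        raise ValueError("id must fit in 64 unsigned bits")
    left, right = n >> 32, n & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _f(key, i, right)
    return (left << 32) | right


def decrypt_id(key: bytes, c: int) -> int:
    """Invert encrypt_id by running the rounds backwards."""
    if not 0 <= c < 1 << 64:
        raise ValueError("value must fit in 64 unsigned bits")
    left, right = c >> 32, c & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _f(key, i, left), left
    return (left << 32) | right


KEY = b"replace-with-a-real-secret"  # hypothetical; load from config in practice
token = format(encrypt_id(KEY, 123), "016x")  # 16 hex digits for the URL
assert decrypt_id(KEY, int(token, 16)) == 123
```

Because the mapping is a permutation, distinct IDs never collide, so the encrypted value can go straight into URLs and be decrypted on the way back in. It hides the sequence but does not replace the permission check.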
krosenvold

When you send database IDs to your client, you are forced to check security in both cases. If you keep the IDs in your web session, you can choose whether you want or need to do it, meaning potentially less processing.

You are constantly trying to delegate things to your access control ;) This may be the case in your application, but I have never seen such a consistent back-end system in my entire career. Most of them have security models that were designed for non-web usage, some have had additional roles added after the fact, and some of these have been bolted on outside of the core security model (because the role was added in a different operational context, say before the web).

So we use synthetic, session-local IDs because that hides as much as we can get away with.
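Here is a minimal sketch of such a session-local mapping in Python. `SessionIdMap` is a hypothetical class that would be stored in the server-side session, so a token minted in one session means nothing in another:

```python
import secrets
from typing import Dict, Optional


class SessionIdMap:
    """Hypothetical per-session map between opaque tokens and real DB IDs."""

    def __init__(self) -> None:
        self._to_real: Dict[str, int] = {}
        self._to_token: Dict[int, str] = {}

    def expose(self, real_id: int) -> str:
        """Return the token to render in pages; reuse it if already issued."""
        token = self._to_token.get(real_id)
        if token is None:
            token = secrets.token_urlsafe(12)  # 16 URL-safe characters
            self._to_token[real_id] = token
            self._to_real[token] = real_id
        return token

    def resolve(self, token: str) -> Optional[int]:
        """Map a token from an incoming request back to the real ID."""
        return self._to_real.get(token)
```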

There is also the issue of non-integer key fields, which may be the case for enumerated values and similar. You can try to sanitize that data, but chances are you'll end up like Little Bobby Tables.
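For string keys like enumerated codes, the usual defense is to bind values as query parameters rather than concatenating them into SQL. Here is a sketch with `sqlite3` and a hypothetical `status` lookup table:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (code TEXT PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO status VALUES ('open', 'Open'), ('closed', 'Closed')")


def label_for(code: str) -> Optional[str]:
    # The "?" placeholder passes `code` as data, never as SQL text,
    # so a hostile value is just a string that matches no row.
    row = conn.execute("SELECT label FROM status WHERE code = ?", (code,)).fetchone()
    return None if row is None else row[0]
```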


Interesting, though I would think that checking all input (including URL parameters) for every request is very appropriate for the web.
Dharman

From the perspective of code design, a database ID should be considered a private implementation detail of the persistence technology to keep track of a row. If possible, you should be designing your application with absolutely no reference to this ID in any way. Instead, you should be thinking about how entities are identified in general. Is a person identified with their social security number? Is a person identified with their email? If so, your account model should only ever have a reference to those attributes. If there is no real way to identify a user with such a field, then you should be generating a UUID before hitting the DB.

Doing so has a lot of advantages, as it allows you to divorce your domain models from persistence technologies. That means you can substitute database technologies without worrying about primary key compatibility. Leaking your primary key into your data model is not necessarily a security issue if you write the appropriate authorization code, but it's indicative of less-than-optimal code design.
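Here is a sketch of that separation, assuming an in-memory repository stands in for a real database. The hypothetical `Account` entity gets its UUID when it is constructed, and the row id never leaves `AccountRepository`:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Account:
    email: str
    # Generated in the application, before anything touches the database.
    account_id: uuid.UUID = field(default_factory=uuid.uuid4)


class AccountRepository:
    """Hypothetical repository; the row id is a private persistence detail."""

    def __init__(self) -> None:
        self._rows: Dict[int, Tuple[uuid.UUID, str]] = {}  # row_id -> columns
        self._next_row_id = 1

    def add(self, account: Account) -> None:
        self._rows[self._next_row_id] = (account.account_id, account.email)
        self._next_row_id += 1

    def get(self, account_id: uuid.UUID) -> Optional[Account]:
        for stored_id, email in self._rows.values():
            if stored_id == account_id:
                # Hydrate the entity without its row id.
                return Account(email=email, account_id=stored_id)
        return None
```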


That's a common schema design approach (even general data design, à la domain-driven design), but as a persistence approach it has downsides. The largest is mis-design: the domain doesn't behave as you thought, or has changed externally, in ways that can have large effects on your persisted data. What if a social security number can change? For example, it turns out it was mistyped; there may be many records to needlessly change. In code you'd usually represent a link as a pointer or reference, so why not in a DB too?
Reading over my answer again, I think it may have not been clear. I am suggesting that database rows have their own IDs that are used for their own technical reasons. But when the data is hydrated from the DB, I don't think we should populate that ID in our Domain entity. I'm still a little hesitant with my answer but I think it makes sense. @orip, what do you think?