3

Currently I want to create a SQL database effectively and "logically". Lately I read a lot about the issues that it is (mostly) not a good idea to define primary keys that can be seen from "outside" as autoincrement. And for my applications it would be a problem too/risk. As a solution I used UUIDs for this. However, I was told at the time that data retrieval (SELECT, ...) would take longer using this method. (Because there are no simple integer values to search for). <-- Please correct me, if this is wrong.

Then I had the idea to set an autoincrement primary key and a unique uuid value. Inside the database the primary keys were used and everything that was output in the API was passed from the id value to the uuid value or vice versa.

Example:

user_id = SELECT id FROM xxx WHERE xxx = user_input (user_uuid).
SELECT name FROM xxx WHERE user_id = Value1

Instead of:

SELECT name FROM xxx WHERE user_id = user_input (user_id)

*Theoretically some things can be compressed, simplified and secured. It is only for better illustration. No explicit programming language is used.

Since I haven't done anything with databases for a while, I wonder if the longer times are true at all, and if so, if my method makes sense at all - I don't think so. So I wanted to ask what is the most sensible and easiest solution to keep the query time minimal, but not to have incrementing keys.

Vito
  • 65
  • Unfortunately MySQL has pretty awful UUID support. No builtin way to generate v4 (random) UUIDs. Represents them as an UTF-8 string by default, not as a 128-bit number (but can convert to a varbinary(16) type). This could limit performance, if you really have that kind of scale. I wouldn't think twice about using UUIDs in more feature-rich DBs like Postgres though, modulo the typical UUID considerations like different data access patterns. – amon Feb 24 '23 at 22:27
  • When it comes to performance, it is well-known wisdom not to care about micro-optizations as long as an application does not show a measureable bottleneck. I guess you have'nt any measurements done, otherwise you would have mentioned them. If I am right, IMHO you are overthinking this. – Doc Brown Feb 24 '23 at 22:56
  • In the answer to this similar question, you'll find some elements about performance overhead as well as some arguments to nevertheless go the way you intend: https://softwareengineering.stackexchange.com/q/440579/209774 – Christophe Feb 24 '23 at 23:18

2 Answers2

1

No, you should stick to one primary key.

It's a bit of a myth that GUIDs make databases slow. First off they are just 128bit ints at the end of the day. Secondly, index fragmentation isn't the problem db admins make it out to be.

Yes obviously smaller ids will be faster. But outside of some extreme cases you will be fine with GUIDs as primary keys for everything and will never notice the difference.

Ewan
  • 75,506
  • corrected...... – Ewan Feb 24 '23 at 23:42
  • get them to watch this : https://www.youtube.com/watch?v=nc4CMo7VSPo – Ewan Feb 24 '23 at 23:55
  • 1
    @Vito: And the processing time required to generate a new GUID can be offloaded to the application, or a client of the application. The decision to use ints or GUIDs is largely based on who generates new values. GUIDs can be generated by anyone. Int primary keys require the database, and only the database, to generate new values. – Greg Burghardt Feb 25 '23 at 14:55
  • @Ewan Thank you for the clariffication. Just another question about inserting a guid in a Database. Should I check every time, if this guid is already used or should I just give an error (bc this should happen almonst never). Since I'm using mysql, as far as I know, I can't have GUIDs generated by the database. – Vito Feb 25 '23 at 17:35
  • no, you will never get a duplicate – Ewan Feb 25 '23 at 17:47
0

Frankly I consider this advice to be nonsense.

Standard implementations of UUIDs involve extremely large integers, which are human-unusable. It makes it much more difficult for developers or supervisors to trace through records or interpret logs.

The need for raw data to be as human-usable as possible, is usually a far greater concern than avoiding revealing the time-sequencing of when the key of a particular record was allocated.

In other words, it's about maintainability. Arguments about performance are a complete red herring.

If there isn't a specific reason to hide the time-sequencing, then don't complicate the situation with UUIDs - just allocate a sequential number in a standard way.

Even in those exceptional cases where concealing the time-sequencing is desirable, I would be more inclined to write a custom allocator which allocates randomly but within a relatively low range to begin (such as 101..999), and gradually enlarges the available range if necessary to accommodate more keys.

There could be statistical bias with this approach if the available range has to be enlarged repeatedly, but it would still be impossible to be certain when any single key was allocated.

Cases where true cryptographic randomness would be required, are in my opinion outside the scope of any ordinary database design - almost certainly something that touches on governmental administration or public policy concerns, where somebody might be given access to a large subset of data from a database, but mustn't be able to make the slightest inference about things not explicitly represented in the subset. It's probably somewhere like that, where this idea of using UUIDs for database keys has come from.

Steve
  • 8,715
  • 1
    the case for UUIDs isnt about being random, its about being unique. Ie you dont have to goto a database before you know what the id is, you can use a distributed database, you can have uniqueness across multiple databases etc. I agree that using them for an "id that doesnt show people how many orders Ive taken this year" isnt the best reason to use them, but they shouldbe standard for id creation rather than auto inc – Ewan Feb 25 '23 at 16:18
  • @Ewan, a sequential ID will also be unique, and a distributed system which allocates IDs from multiple places, can use a combination of a fixed system ID and a sequential ID. They are not standard because, like I say, they are not human-usable - they are unergonomic. – Steve Feb 25 '23 at 17:58
  • 1
    they are unergonomic true, however users shouldn't be typing in most database ids. If you need to generate a user friendly id like a customer number or something then auto inc is arguably just as bad – Ewan Feb 25 '23 at 18:50
  • sequential ids etc are only unique to a table on a database instance. that is very non-unique indeed – Ewan Feb 25 '23 at 18:51
  • @Ewan, my argument is about the effect on developers and supervisors. They do, occasionally, have to deal with the raw data. That raw data should not be obfuscated if there is no compelling reason for it. – Steve Feb 25 '23 at 22:09