16

Let's say that I have an articles table that have two columns: title and content, and let's say that this articles table doesn't have any relationship with any other table.

Why would this articles table have a primary key? I mean, what problems could I face if the articles table doesn't have a primary key? Having identical rows is not a problem because two or more articles can have the same title and content.

terdon
  • 105
  • 1
    Think of the db client/application is a user/consumer of those primary keys; in some sense, to the external client these are foreign keys as the client's usage of them is ~similar to how another table might reference them (namely, as foreign keys). If it weren't for the application interacting with the database we might not need primary keys (or even tables). – Erik Eidt Apr 07 '22 at 13:07
  • 11
    "two or more articles can have the same title and content" - can they, wouldn't they be the same article then? What would distinguish them? And why would you care how many articles with the same values exist - does it even matter if there's one, two, or more? – Bergi Apr 07 '22 at 13:22
  • 28
    having identical rows is not a problem because two or more articles can have the same title and content ironically, this IS actually a problem if you don't set a primary key colum. In a table without a primary key, when two or more rows have identical values in all of their fields, the DBMS has no way to diferentiate one from the other, so in the end they will be treated as the same row and, for example, trying to update one will either update all of them, or cause an error as the DBMS is unable to identify which one you're actually trying to update – Josh Part Apr 07 '22 at 15:15
  • 5
    @JoshPart Or in DBMS terms, that violates the first normal form (1NF). – iBug Apr 08 '22 at 11:16
  • 2
    Why wouldn't it have a primary key? Why wouldn't you want to be able to distinguish unique articles that might coincidentally have identical content? – shadowtalker Apr 08 '22 at 14:19

5 Answers5

68

The main benefit of a primary key has nothing to do with foreign keys. A primary key allows you to identify a single record in that table. Presumably, the system will have multiple articles. If all your application ever does is show a list of articles, then a primary key won't be much use. As soon as you want to show just a single, specific article, the primary key becomes mandatory.

When showing a single record to the end user, do not assume an index within the result set is enough. Consider a case when a user chooses to view article number 2. While viewing the list of articles, someone adds another article. Depending on how you sort the result set, showing "article 2" might end up showing article number 3.

Primary keys are also necessary for discrete, accurate updates. The primary key would be a discriminator value used in the UPDATE statement in order to ensure you don't accidentally update the wrong record (or no record at all).

update articles
set ...
where id = 5;

Same thing for DELETEs.

You need primary keys on a table if you want to reliably:

  • View a single record
  • Update a single record
  • Delete a single record

Changing data without referencing the primary key value is risky in most use cases, and I do not recommend doing it.

  • 35
    In short, the number of legitimate use cases for not putting in a primary key is small enough that you should just put a primary key into every table. – Robert Harvey Apr 06 '22 at 20:05
  • A single, specific article can also be returned with the right WHERE clause, so explaining why the PK is favorable world help answer the question. – whatsisname Apr 07 '22 at 00:44
  • 7
    This is the right answer. As an additional note for the OP, unless you know what you are doing, I would recommend that all tables have a primary key that is immutable, and if that is not possible with a natural key then a surrogate key should be introduced. – John Wu Apr 07 '22 at 01:04
  • @whatsisname: my answer also includes a use case where the system attempts to return a specific article by an index within the result set not being dependable. I also include cases for updates and deletes. I'm not sure what else to include or what to clarify. I'll have to think about this and edit my answer later. – Greg Burghardt Apr 07 '22 at 11:22
  • 2
    @RobertHarvey I once had to delete rows from a table of "bonus coupons" that only had a customer id and a single customer having three coupons was represented by them having three identical rows. Still don't know if the DB admins were just joking around or not. – JollyJoker Apr 07 '22 at 11:25
  • In concrete terms, I would generally not advise using PKs to dictate order of things; but it's a reasonable enough example to showcase the purpose of giving a specific unique identity to your table rows. – Flater Apr 07 '22 at 11:45
  • @Flater: I might need to try clarifying it then. I wasn't advocating to use the PK to determine the order. – Greg Burghardt Apr 07 '22 at 11:50
  • 1
    Don't many database engines add implicit, invisible primary keys if you don't? e.g. sqlite rowid – user253751 Apr 07 '22 at 14:37
  • 3
    @user253751: you need to be very careful about using those. They are meant to be internal identifiers for the database, not external identifiers for your queries. Many of these internal identifiers are partially generated by the physical location of data on disk. If that physical location can change, the rowid might change. The database and disk controllers manage the physical locations of data. Do not write application logic relying on this transient information. – Greg Burghardt Apr 07 '22 at 14:41
  • 1
    For one more reason that is easy to appreciate in use, in some systems, when a primary key is created it also becomes the physical index if a user defined one does not already exist, which is important for performance on WHERE queries on that table on the key fields even without any joins – Mike M Apr 07 '22 at 19:14
  • If this is so important to the database, it should be a property of the database and not some field that you have to add to a record. In an OOP language, when defining an object, you don't have to add a field which gives the object a primary identty: its address in memory (for instance) already does that. If we declare an object type with no fields at all, we can make two instances of it and ask, "are you the same object". – Kaz Apr 08 '22 at 14:42
  • @Kaz: that is not an accurate comparison. You don't need to declare a field in the class to represent its memory location, because the computer does that. A memory address is an internal identifier meant for use by the computer, not the programmer. Same thing for "rowid" in Oracle and primary keys. The "rowid" is an internal identifier, similar to a memory address for an object in a program. Internal identifiers are used by systems for their own purposes, not yours. They are free to change identifiers to suite their needs, leaving you to pick up the pieces of a broken application. – Greg Burghardt Apr 08 '22 at 14:55
  • @user253751: SQLite allows WITHOUT ROWID tables now, but these must have an explicit PRIMARY KEY, since the engine has to uniquely identify rows somehow. But otherwise, tables do have a hidden ROWID column, and if you declare an INTEGER PRIMARY KEY (with or without AUTOINCREMENT), it's used as an alias for the ROWID, thus making the ROWID column unhidden. – dan04 Apr 08 '22 at 17:15
  • @GregBurghardt In many OOP systems, we use an object's address as its identifier; e.g. to decide, "is this function parameter holding an object which is the same one as the one in this hash table". – Kaz Apr 08 '22 at 21:36
22

In SQL, rows are identified solely by their values. If two rows have exactly the same values, there is no way to distinguish them in a query. The standard SQL operators SELECT, UPDATE, DELETE always operates on sets matching a criteria, so if you have two duplicate rows there is no way to update one without updating the other.

It is easy to see how this can lead to all sorts of problems. Let's say you have a GUI which allow you to edit articles individually. During the editing of one article you bring it into a state where title and content it is exactly the same as another article. From this moment on, any update will update both articles. They now are "locked together". You can't even delete one of the duplicates, since a DELETE that match one will also match the other.

It is possible to circumvent this issue with advanced SQL. For example, SQL cursors can be used to iterate through a table and update rows individually even if there are duplicates. But this is a complex and slow solution to a problem which is easily avoided in the first place by ensuring all rows are unique.

SQL has a somewhat confusing relationship to duplicates. The relational model which underlies SQL assume rows are always unique (ie. every table has a primary key or unique constraint) but SQL does not enforce this when creating tables. So you end up with duplicates being allowed but giving all sorts of problems. So just avoid them.

JacquesB
  • 59,334
  • There are db-specific tricks that make it possible to refer to n of m identical rows like Oracle's ROWNUM. Not a good idea to design a db that way, of course. – JollyJoker Apr 07 '22 at 11:32
  • @JollyJoker But ROWNUM changes over time, as rows are inserted and deleted. So you can't easily use it to refer to a specific row unless you lock the table for the duration of the editing session. – Barmar Apr 07 '22 at 14:16
  • @Barmar "For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows" It can't change over time, it only lives for a single query. – JollyJoker Apr 07 '22 at 14:58
  • @JollyJoker That makes it even more useless for this, then. It can change with every query, so it DOES change over time. – Barmar Apr 07 '22 at 15:00
  • @Barmar The rows are identical. The only thing you can care about is doing something to the first n instead of all. – JollyJoker Apr 07 '22 at 15:06
  • 3
    @JollyJoker Suppose you fetch 3 identical rows, edit 1 of them. Meanwhile someone else does the same thing. You each save your changes. There's no way to know whether your changes will affect the same or different rows, because ROWNUM is not a unique key across the two result sets. – Barmar Apr 07 '22 at 15:37
  • 1
    @Barmar: I guess JollyJoker meant rowid which is the physical location of a row –  Apr 08 '22 at 19:06
  • @JollyJoker You would be better to use Oracle's ROWID pseudo-column; however, even that is only guaranteed not to change within a single transaction (if, among other reasons, row movement is enabled) so you still should not expect it to be consistent in the long-term. – MT0 Apr 09 '22 at 07:20
  • @a_horse_with_no_name Barmar probably means rowid, as he's talking about it changing as rows are inserted and deleted. The only thing I'm taking about is having n identical rows and deleting only some of them. I have no idea why people are talking about consistent ways of referring to the same one row in a set of identical rows - there can't possibly be any use for that – JollyJoker Apr 12 '22 at 11:41
3

You write

having identical rows is not a problem because two or more articles can have the same title and content.

But it is in fact a problem and it is exactly why I always have a surrogate key in every table:

If you can have two articles with the same title (and contents) and you don't have any unique key, then you have no way of deleting one of them without deleting the other as well.

AndreKR
  • 577
0

To add to this, when your API is consumed by e.g. a mobile application, the application need to be able to uniquely identify articles e.g when an article headline is pressed to open the article's body, that will then send a GET request full article's body by ID. When designing systems, you need think outside the box and realise that ultimately, what you are designing is not for you but for whoever consumes your API.

Additionally, IDs help ui with scrolling performance (but that is a bit off-topic here)

Adrian L
  • 109
0

If you can have “two or more articles can have the same title and content” then why allow the second or third? What does it mean to have a dozen or a million identical articles?

Either it means something, and you need a way to differentiate them, or it doesn’t and it’s a waste of resources which you should prevent. If you differentiate them, then you have a primary key (which may be a composite key).

A primary key is nothing more or less than a way to uniquely identify a row of data.

jmoreno
  • 10,853
  • 1
  • 31
  • 48