6

I'm reading a very verbose textbook on database design, but I suspect that much of the book could be condensed into a few pages if the authors were not trying to avoid mathematical language.

What is the mathematical definition of a relational database, stated in a way that would satisfy a pure mathematician?

littleO
  • Relations. Literally that's it. http://www.maths.kisogo.com/index.php?title=Relation – Alec Teal Sep 14 '15 at 00:24
  • @AlecTeal: The Wiki page is a handy way of getting the OP to some classic references like the books by Date and Codd. What's wrong with that? – Rob Arthan Sep 14 '15 at 01:20
  • @AlecTeal I think a relational database is something more than just a relation, so what do you mean? Perhaps a "table" is just a relation, but shouldn't a precise definition of "relational database" express the idea that a relational database is a collection of tables with certain properties? – littleO Sep 14 '15 at 01:35
  • @littleO no. Formally they are simply relations. Concepts like "order by" are bolted on afterwards. As are things like properties and foreign keys. – Alec Teal Sep 14 '15 at 01:37

3 Answers

1

I will take your question as a reference request. Your first stop can be https://en.wikipedia.org/wiki/Relational_model and the books by Codd and Date cited there.

Rob Arthan
1

To add to the earlier answer, here is a quote from the preface of Codd's book The Relational Model for Database Management: Version 2:

The relational model is solidly based on two parts of mathematics: first-order predicate logic and the theory of relations.

For more on first-order predicate logic and the theory of relations, I refer you to the following links:

First-order predicate logic

and

Theory of relations

Edit: Let me try, as per the OP's request, to define a relational database in a few sentences.

A relational database is a collection of tables, which mathematically are called relations. Each table consists of columns and rows, which are represented in tabular format as follows:

Say your relation is $R = \{(0,0), (0,1),(1,2),(1,3),(2,3),(3,4)\}$. In tabular format, you can have it as:

[Figure: Relational database illustration]

1

I come to this from a different angle... I am teaching relations, ostensibly from an abstract perspective, to computer science students... however I want to try and relate what I am teaching to computers in some way.

This is what I do with relations. It is sound on the relations side of the house... databases... I am not so sure.

Let us start with two variables:

  • person, $p$, which takes values in $P$, the set of people,
  • age, $a$, which takes values in $\mathbb{N}_0$.

We can form the Cartesian product $P\times\mathbb{N}_0$ of $P$ and $\mathbb{N}_0$. This is an infinite set, formed of ALL ordered pairs $(p,a)$, where $p$ is a person, and $a$ is a natural number:

$$P\times\mathbb{N}_0=\{(p,a)\,\colon\,p\in P,\,a\in\mathbb{N}_0\}.$$

A relation $R$ between $P$ and $\mathbb{N}_0$ is a subset $R\subset (P\times \mathbb{N}_0)$; that is, a relation consists of SOME of the ordered pairs. And this is a database. For example, suppose the relation is:

$$R=\{(\text{Alice},34),(\text{Bob},38),(\text{Carol},33)\}.$$

As a database this would look like:

| person | age |
|--------|-----|
| Alice  | 34  |
| Bob    | 38  |
| Carol  | 33  |
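
The idea that a relation is simply a set of tuples drawn from a Cartesian product can be sketched in Python. This is only an illustration, not a database implementation: the finite set `P` and the helper names `is_relation` and `as_table` are assumptions introduced here, since the text leaves $P$ unspecified.

```python
# Hypothetical finite set of people; the text only names Alice, Bob and Carol.
P = {"Alice", "Bob", "Carol"}

# The relation R from the text, stored as a set of ordered pairs (person, age).
R = {("Alice", 34), ("Bob", 38), ("Carol", 33)}


def is_relation(rel, people):
    """Return True if every pair (p, a) lies in people x N_0."""
    return all(p in people and isinstance(a, int) and a >= 0 for p, a in rel)


def as_table(rel, header):
    """Render a relation as text rows; sorting only makes the output stable."""
    rows = [header] + [tuple(str(x) for x in row) for row in sorted(rel)]
    return "\n".join(" ".join(row) for row in rows)


print(is_relation(R, P))
print(as_table(R, ("person", "age")))
```

Because `R` is a set, the rows carry no inherent order and no row can occur twice, which matches the mathematical definition of a relation.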

Now what we can do is extend the notion of relation to multiple sets. Let us consider two more variables:

  • nationality, which takes values in a set $N$,
  • country of residence, which takes values in a set $C$.

We can consider the full Cartesian product:

$$P\times \mathbb{N}_0\times N\times C.$$

A relation between these sets is a subset of this, e.g.

$$R=\{(\text{Alice},34,\text{American},\text{Canada}),(\text{Bob},38,\text{Irish},\text{USA}),(\text{Carol},33,\text{English},\text{England})\}.$$

This is a database:

| person | age | nationality | residence |
|--------|-----|-------------|-----------|
| Alice  | 34  | American    | Canada    |
| Bob    | 38  | Irish       | USA       |
| Carol  | 33  | English     | England   |

Perhaps many relations do not correspond to databases that arise in practice. For example, the following is also a relation on $P\times \mathbb{N}_0\times N\times C$:

$$S=\{(\text{Alice},34,\text{American},\text{Canada}),(\text{Alice},38,\text{Irish},\text{USA}),(\text{Carol},33,\text{English},\text{England})\}.$$

But is it properly a database?

I think the databases we have here are more like partial functions:

$$P\to(\mathbb{N}_0\times N\times C),$$

so no person appears twice in the database.
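
The "no person appears twice" condition can be checked mechanically. The Python sketch below treats the first component of each tuple as the argument of the would-be partial function $P\to(\mathbb{N}_0\times N\times C)$; the helper name `is_partial_function` is hypothetical, and `R` and `S` are the two relations given above.

```python
def is_partial_function(rel):
    """Return True if no first component is paired with two different rows."""
    seen = {}
    for row in rel:
        key, value = row[0], row[1:]
        if key in seen and seen[key] != value:
            return False  # the same person is mapped to two different values
        seen[key] = value
    return True


R = {
    ("Alice", 34, "American", "Canada"),
    ("Bob", 38, "Irish", "USA"),
    ("Carol", 33, "English", "England"),
}
S = {
    ("Alice", 34, "American", "Canada"),
    ("Alice", 38, "Irish", "USA"),
    ("Carol", 33, "English", "England"),
}

print(is_partial_function(R))
print(is_partial_function(S))
```

Here `R` passes the check while `S` fails it, because Alice appears with two different sets of attributes. In database terms this amounts to requiring that the person column behave as a key.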

I think for databases, there is an implied extra variable. For example, consider the following relation on $\{a,b\}\times \{a,b\}\times\{a,b\}$ (think of these as values of variables $v_1$, $v_2$, $v_3$):

$$\{(a,a,b),\,(a,a,a),\,(b,a,b)\}.$$

The database looks like:

| $v_1$ | $v_2$ | $v_3$ |
|-------|-------|-------|
| $a$   | $a$   | $b$   |
| $a$   | $a$   | $a$   |
| $b$   | $a$   | $b$   |

I would argue that the rows give a natural fourth variable, the row number, $r\in\mathbb{N}$ and so we have a partial function:

$$\mathbb{N}\to (\{a,b\}\times \{a,b\}\times \{a,b\}):$$

| $r$ | $v_1$ | $v_2$ | $v_3$ |
|-----|-------|-------|-------|
| 1   | $a$   | $a$   | $b$   |
| 2   | $a$   | $a$   | $a$   |
| 3   | $b$   | $a$   | $b$   |

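
The row-number idea can also be sketched in Python, under the assumption that rows are numbered from 1 in the order listed. The names `rows`, `numbered` and `as_relation` are introduced here purely for illustration.

```python
# The three rows from the text, kept in a list so that their order is preserved.
rows = [("a", "a", "b"), ("a", "a", "a"), ("b", "a", "b")]

# Row number r -> (v1, v2, v3): a partial function on N, defined only at 1, 2, 3.
numbered = {r: row for r, row in enumerate(rows, start=1)}

# The same data viewed as a 4-ary relation with r as the first component.
as_relation = {(r, *row) for r, row in numbered.items()}

print(numbered[2])
print(sorted(as_relation))
```

With the row number as the argument, two identical rows would still be distinct entries, which a plain set of triples cannot represent.
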
JP McCarthy