With only three basic particles making up atoms, and their interactions fairly well understood (we can even predict the chemical properties of elements we have not yet discovered), atoms seem to be like Lego bricks: they only fit together in a limited number of ways. So how come, in the age of supercomputers, we have not yet created a program that can predict every single possible chemical compound (let's start modestly: of around fifty atoms or fewer)? Would a “super version” of such a program, one that would create a “master list” of all possible compounds and their reactions, theoretically make the field of experimental chemistry obsolete? I realize the vast number of possibilities, but just as in chess, the number of positions that can actually arise in a game is far smaller than the theoretical number of ways to place all the pieces on the board. Many simple compounds could be ruled out as “impossible” dead ends, considerably shortening the list of possible compounds. Am I missing something obvious here?
Speaking figuratively, we have such a program and it works. The estimated time remaining is $10^{20}$ years. In the meantime, why not do some old-style chemistry? – Ivan Neretin Dec 18 '20 at 16:05
Possibly related: if you are fine with up to 13 non-H atoms of C, N, O, S and Cl, GDB-13 by Blum and Reymond counts 977 468 314 molecules. Have a look at the relevant question on ChemSE, https://chemistry.stackexchange.com/questions/119797/are-there-any-datasets-containing-molecules-with-more-than-38-heavy-atoms/119802#119802, as a starting point. Then teach a computer the rules for the atoms not yet considered and let it crunch the permutations in silico (because, after all, there probably are not enough atoms around to synthesize all possibilities on a gram scale). – Buttonwood Dec 18 '20 at 18:04
There's no such number and the problem is ill-posed. A small chunk of polypropylene consists of a virtually uncountable number of "different compounds" (each molecule is different), and so what? Two very big and similar molecules are hardly distinguishable in there. On the other hand, a protein molecule and its misfolded form may be considered "the same thing" and yet are very different. Such a list would make about as much sense as a list of all possible words without their meanings. – Mithoron Dec 18 '20 at 18:37
On second thought, consider adding the criterion «and can it be prepared in the lab with reasonable effort?», as in https://pubs.rsc.org/en/content/articlelanding/2021/SC/D0SC04321D#!divAbstract – Buttonwood Dec 18 '20 at 19:53
Or, why haven't all possible books been written? – Buck Thorn Dec 18 '20 at 20:45
What if such a list had a mass greater than the Earth's? – aventurin Dec 18 '20 at 21:02
Look at single-stranded RNA. The number of different molecules grows exponentially with the number of nucleotides. So perhaps sometime in the future we will be able to describe the behavior of each one, but never, ever, of every such RNA. You might be interested in computational complexity, too: some problems require exponential time. Look for keywords like "Turing machine" or "NP-hard". There is a whole universe of things to understand related to your question. – Gyro Gearloose Dec 18 '20 at 22:46
@Karl https://chess.stackexchange.com/a/8334 The rules of chess make sure that the number of possible games is not infinite: a position may not be repeated more than three times, and, if my memory is correct and they haven't changed the rules, a game is drawn if no piece is captured (and no pawn is moved) within 50 moves. – Gyro Gearloose Dec 18 '20 at 22:56
It is similar to the question: “Why don't we generate all possible text strings and just collect the good poems and books from them, so that there is no further need for writers and poets?” What I am saying is that practicality is not the only weak point of the question. – Greg Dec 19 '20 at 02:27
I think we have collected enough clever analogies and counterexamples in this comment thread already. If anybody wants to comment further, please consider writing it as an answer; or even better, tie all of these together into another answer with an underlying theme. NP-hard is a good one, @GyroGearloose. – orthocresol Dec 19 '20 at 02:50
2 Answers
Because the combinatorial complexity is far too big for practical computation
One of the simplest cases that illustrates why this is impossible is obtained by restricting the choice of atoms to just carbon and hydrogen, forbidding any double bonds and disallowing any rings. This becomes equivalent to counting mathematical trees of a particular kind (trees in which no vertex has more than four neighbors, since each carbon forms four bonds), a calculation arising in graph theory as well as chemistry.
Graphs are not exactly the same as hydrocarbons: there are spatial constraints on which molecules can exist in 3D space, and, because the molecules are three-dimensional, some variants with the same graph may be chemically different due to chirality (see this answer for more details). But some calculations for just 50 carbons put the number of possible variants at $>10^{21}$ (see this Wikipedia page), which is a ridiculously big number.
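To make the tree counting concrete, here is a minimal Python sketch (not taken from the linked pages; all function names are illustrative) that counts the constitutional isomers of the alkanes C<sub>n</sub>H<sub>2n+2</sub>, i.e. trees with n vertices in which no vertex has more than four neighbors. It uses the classical Pólya-style approach: first count alkyl radicals (rooted trees whose root carbon has at most three further branches) with the cycle index of the symmetric group S3, then count every alkane exactly once by rooting it at its centroid (a single central carbon, or, for even n, a central C-C bond). Like the graph counts discussed above, it ignores stereoisomers, so it undercounts chemically distinct molecules.

```python
def poly_mul(a, b, n):
    """Multiply two coefficient lists, keeping only terms of degree <= n."""
    out = [0] * (n + 1)
    for i, ai in enumerate(a):
        if ai == 0 or i > n:
            continue
        for j, bj in enumerate(b):
            if i + j > n:
                break
            out[i + j] += ai * bj
    return out


def stretch(a, k, n):
    """Return the coefficients of a(t**k), truncated to degree n."""
    out = [0] * (n + 1)
    for i, ai in enumerate(a):
        if i * k > n:
            break
        out[i * k] = ai
    return out


def radical_counts(n):
    """r[k] = number of alkyl radicals C_kH_(2k+1); r[0] = 1 stands for a bare H.

    A radical is a root carbon carrying an unordered multiset of 3 smaller
    radicals, so r[k] is the t**(k-1) coefficient of
    Z(S3) = (x1^3 + 3 x1 x2 + 2 x3) / 6 evaluated at the radical series.
    """
    r = [1] + [0] * n
    for k in range(1, n + 1):
        d = k - 1                    # carbons left after the root carbon
        a = r[:k]                    # only sizes 0..k-1 are needed (and known)
        a2, a3 = stretch(a, 2, d), stretch(a, 3, d)
        cube = poly_mul(poly_mul(a, a, d), a, d)
        total = cube[d] + 3 * poly_mul(a, a2, d)[d] + 2 * a3[d]
        r[k] = total // 6            # Polya: the sum is divisible by |S3| = 6
    return r


def alkane_isomers(n):
    """Number of constitutional isomers of C_nH_(2n+2) (stereoisomers ignored)."""
    if n < 1:
        raise ValueError("need at least one carbon")
    r = radical_counts(n)
    d = n - 1                        # carbons around the central carbon
    m = (n - 1) // 2                 # every branch must be smaller than n/2
    a = r[:m + 1] + [0] * (d - m)    # radicals small enough to keep the centroid unique
    a2, a3, a4 = (stretch(a, k, d) for k in (2, 3, 4))
    sq = poly_mul(a, a, d)
    # Z(S4) = (x1^4 + 6 x1^2 x2 + 3 x2^2 + 8 x1 x3 + 6 x4) / 24
    total = (poly_mul(sq, sq, d)[d]
             + 6 * poly_mul(sq, a2, d)[d]
             + 3 * poly_mul(a2, a2, d)[d]
             + 8 * poly_mul(a, a3, d)[d]
             + 6 * a4[d])
    count = total // 24              # trees with a single central carbon
    if n % 2 == 0:
        # trees whose centre is a C-C bond joining two radicals of n/2 carbons
        h = r[n // 2]
        count += h * (h + 1) // 2
    return count


print([alkane_isomers(n) for n in range(1, 11)])
# [1, 1, 1, 2, 3, 5, 9, 18, 35, 75]
```

Python's arbitrary-precision integers keep the counts exact, so `alkane_isomers(50)` can be compared directly with the 50-carbon figure on the Wikipedia page. Note that the program only *counts* the trees; actually listing them would take time proportional to that enormous count.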
And this is with just two of the hundred or so elements that could be used, and it rules out most of the interesting structural features that could add a great deal of extra complexity. All this complexity arises from the carbon skeleton alone, and it is already beyond any computational capacity we can contemplate having (just using one byte per compound would take a great deal more storage than at least one estimate of the world's total capacity to store information).
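The storage argument is simple arithmetic. Writing $N_{50}$ (a label introduced here) for the number of 50-carbon alkane skeletons, which the lower bound quoted above puts at more than $10^{21}$, and allowing just one byte per compound:

$$
N_{50} \times 1\ \text{byte} > 10^{21}\ \text{bytes} = 1\ \text{ZB} = 10^{9}\ \text{TB}
$$

That is more than a billion 1 TB drives just to assign each carbon skeleton a single byte, before storing any actual structure, let alone its properties or reactions.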
So, if you still think this sounds like an easy task, you haven't done those calculations or appreciated the gargantuan scale of the resulting numbers.

Great answer, thanks. I am not a scientist, but it makes sense to me. I never thought the process would be easy, by the way; I just wanted to know whether it was possible, as in: would this be a deterministic process limited only by the scale of computing power and time required? Much appreciated! – Luuk van Heerde Dec 19 '20 at 01:35
@LuukvanHeerde There is a big difference between what is possible in mathematics and what is possible in a finite universe. This is the sort of mathematical problem that pretty quickly exhausts all the matter in the universe just to store the answer, never mind compute it. OTOH this is the sort of complexity that makes chemistry interesting in practice and not just a branch of math or logic. – matt_black Dec 19 '20 at 02:19
Let me add another point besides the already overwhelming problem of almost innumerable combinations.
I am afraid that the Lego analogy oversimplifies the underlying problem by a large margin. A Lego brick only cares which brick it is directly connected to. Let's say we have bricks A, B, C and D, and connect them like this: A-B-C. For A, it doesn't matter that B is connected to C, and I could swap C for D without affecting the connection between A and B. This does not hold for atoms in molecules: it matters significantly what other bricks are connected to B, or even further along the chain.
It is possible to define functional groups that are fairly independent, having similar properties regardless of their neighbors, but that is always an approximation. The approximation is often good, which is why the rules one learns in basic organic chemistry work most of the time, but there are cases where the approximation is bad and the rules no longer work. This applies even to A and B themselves: the properties of A and B are not conserved when you bring them together, and the whole idea of identifying A and B within the combination A-B is an approximation.
This adds to the complexity of calculating all possible combinations. In principle, you have to recalculate the properties of every new compound. You can often make a good guess at the properties of a new compound based on its constituent parts, but that's it.
This alone is a difficult problem, but coupled with the enormous number of possible combinations, it makes creating such a table all but impossible.
