264

I just ran across this old question asking what's so evil about global state, and the top-voted, accepted answer asserts that you can't trust any code that works with global variables, because some other code somewhere else might come along and modify its value and then you don't know what the behavior of your code will be because the data is different! But when I look at that, I can't help but think that that's a really weak explanation, because how is that any different from working with data stored in a database?

When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter. You don't care what the data is; that's the entire point. All that matters is that your code deals correctly with the data that it encounters. (Obviously I'm glossing over the often-thorny issue of caching here, but let's ignore that for the moment.)

But if the data you're working with is coming from an external source that your code has no control over, such as a database (or user input, or a network socket, or a file, etc...) and there's nothing wrong with that, then how is global data within the code itself--which your program has a much greater degree of control over--somehow a bad thing when it's obviously far less bad than perfectly normal stuff that no one sees as a problem?

Mason Wheeler
  • 82,789
  • 122
    It's nice to see veteran members challenge the dogmas a little ... – svidgen May 24 '16 at 19:58
  • 11
    In an application, you usually provide a means to access the database, and this means is passed to functions which want to access the database. You don't do that with global variables, you simply know they're at hand. That's a key difference right there. – Andy May 24 '16 at 20:55
  • 46
    Global state is like having a single database with a single table with a single row with infinitely many columns accessed concurrently by an arbitrary number of applications. – BevynQ May 24 '16 at 23:35
  • 2
    @BevynQ that makes no sense at all to me, could you elaborate? – sara May 25 '16 at 06:34
  • 1
    The state of the database is part of the spec of most operations, for example when I add a new customer; the testers will check the customer record is in the database. These tests will hopefully be automated. Global variables are just there because they make life easier for the programmer. – Ian May 25 '16 at 08:36
  • 43
    Databases are also evil. – Stig Hemmer May 25 '16 at 09:21
  • 1
    Much of the pain you get from a database is exactly the same as a singleton. For example difficulty in automated testing. Singletons and globals aren't evil. But like so many concepts you need to know the pros/cons of them. Typically the singleton is the right model for the database. – ArTs May 25 '16 at 09:58
  • 4
    The trick is to move all the singletoness into a single place where it can be managed and walled off. Arguably that is the entire raison d'être for the database. – ArTs May 25 '16 at 10:02
  • 3
    Also, it is possible to make databases immutable as well. – gardenhead May 25 '16 at 18:36
  • 28
    It's entertaining to "invert" the argument you make here and go in the other direction. A struct that has a pointer to another struct is logically just a foreign key in one row of one table that keys to another row of another table. How is working with any code, including walking linked lists any different from manipulating data in a database? Answer: it isn't. Question: why then do we manipulate in-memory data structures and in-database data structures using such different tools? Answer: I really don't know! Seems like an accident of history rather than good design. – Eric Lippert May 25 '16 at 22:18
  • I take umbrage with this: "When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter." I care a great deal. Application A should never be able to see Application B's data except via application B. – BevynQ May 25 '16 at 22:31
  • 1
    @Kai It is possible to design a database badly so that all data is globally scoped. It is also possible to highly restrict who has access to what data when and how. It is also possible to enforce data integrity rules. – BevynQ May 25 '16 at 22:40
  • 3
    @EricLippert please make that a question... – trichoplax is on Codidact now May 26 '16 at 00:06
  • 3
    The MUMPS programming language is worth a mention here. In MUMPS, there really is no functional difference between global variables and databases! – Andrew Coonce May 26 '16 at 00:42
  • 1
    @ArTs: databases are not Singletons, they are usually more akin to a Borg. You can create multiple instances of the connections, or a connection pool, but they share the same state. – Lie Ryan May 26 '16 at 01:41
  • @LieRyan A database CONNECTION is not a database. I am however trying to describe the real world object rather than the data structures. Also, I called it "a database", rather than "the database". Applications sometimes have multiple databases, but there must be one and only one of each. – ArTs May 26 '16 at 01:47
  • It is the quality of the design and the code that touches global states that matters. – rwong May 26 '16 at 21:07
  • 1
    I'm voting to close this question as off-topic because the premise is fundamentally flawed as it is an equivalence fallacy. –  May 26 '16 at 23:31
  • 1
    I don't understand this question. Is your database connection stored as a global variable? If not, then how is it global? It's only accessible to the procedures that you explicitly passed it to... – user541686 May 27 '16 at 04:36
  • @EricLippert Actually that difference is a practical consideration having to do with the requirements of using data in a database because 1) it has to be persisted outside of the current program instance and 2) its dynamic state has to be shared (eventually, in some way) with other instances and programs with potentially far-flung distribution. Changing a shared datum is hard/kludgy enough when you only have to synchronize with another thread in the same program instance. When you have to synchronize thousands of changes with millions of people across the world, you need a different approach. – RBarryYoung May 27 '16 at 19:00
  • 4
    @RBarryYoung: Certainly there are many, many implementation considerations. My musing was more along the lines of why languages which fetch data by dereferencing a pointer, and languages which fetch data by querying a table feel so different, when the underlying operation is conceptually the same. It's always struck me as odd. – Eric Lippert May 27 '16 at 19:03
  • @EricLippert ... IMHO, it's really the same answer as "Why is Web development so much different (worse) than Windows development? Why can't I just develop Web apps the way I develop Windows apps?" AFAIK, the answer is: "Practical Considerations". – RBarryYoung May 27 '16 at 19:04
  • @EricLippert It has always struck me as odd too, and I've spent a lot of time pondering it. The best answer I've been able to come up with is the practical considerations of sharing, updating, protecting/persisting, and synchronizing changes transactionally. You could take the ECC design pattern and extend it to make all data seem like just items and properties in a huge Object Model, but you get hung up on things again and again, like how to leverage the DB optimizers to search for row sets, and how to explicitly control when data is fetched, updated, committed, checked for being stale etc. – RBarryYoung May 27 '16 at 19:12
  • 3
    @JarrodRoberson How does that make it off-topic? That just means the answers should be "Your premise that ... is fundamentally flawed because ... " – Ixrec May 28 '16 at 10:15
  • If your database is the source of truth of your data then you're right. However, if you use event sourcing, the source of truth is events, not your global database. – blockhead May 30 '16 at 13:05
  • I'm surprised no-one has talked much about testability yet. Global variables are bad because they represent a testing combinatorics problem. Technically speaking each global variable introduced (minimally) doubles the number of tests you must run for unit testing. A database is different because it isolates these "super-global variables" in a metaphor that allows you to reset them all to a given state (drop table;insert...insert...insert...), and relational databases even allow you to constrain these "variables" in ways that are not possible in code (referential integrity for example). – Calphool May 30 '16 at 17:12
  • 1
    @StigHemmer Everything is evil. Except - in their mind - Google. – ott-- May 30 '16 at 18:04
  • I don't think they're that comparable. There isn't a widely used, rigorous set of properties specifically designed to minimize the negative effects of global variables in the same way as the ACID principles in databases. They are much more prone to errors and unintended effects than DB operations. – xji May 30 '16 at 18:34
  • 1
    @EricLippert The situation feels even worse on the client side of a web app, wherein you have to work in a totally different mode of thought when you're hitting a local object (usually synchronously) versus something over the wire (usually asynchronously). Why do I have to care where the object is coming from, darnit!!?? I don't wanna!** – svidgen Jun 06 '16 at 21:08
  • 1
    One point to consider: what if the code were to run in parallel on multiple remote machines AND had to maintain a global shared state? A database is the answer. – S.D. Jan 06 '17 at 12:23
  • 1
    There has been a lot of discussion about globals being bad because they are mutable--which says nothing about my most common use of a global: Holding the information read from a configuration file. You either make it a global or you end up passing it around amongst all higher level routines and I consider the latter a bigger problem than the former. I would never use a global for something that is mutable and not a singleton, though. – Loren Pechtel Feb 22 '17 at 02:49

22 Answers

119

First, I'd say that the answer you link to overstates that particular issue, and that the primary evil of global state is that it introduces coupling in unpredictable ways, which can make it difficult to change the behaviour of your system in future.

But delving into this issue further, there are differences between global state in a typical object-oriented application and the state that is held in a database. Briefly, the most important of these are:

  • Object-oriented systems allow replacing an object with a different class of object, as long as it is a subtype of the original type. This allows behaviour to be changed, not just data.

  • Global state in an application does not typically provide the strong consistency guarantees that a database does -- there are no transactions during which you see a consistent state for it, no atomic updates, etc.
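Both differences can be illustrated with a minimal Python sketch. Every name in it (`RateStore`, `HolidayRateStore`, `price_with_tax`, `transfer`, the `account` table) is a hypothetical invented for illustration and does not come from the question or this answer. The first half shows a dependency that is passed in explicitly, so a subtype can change behaviour; the second half uses the standard-library `sqlite3` module, assuming its default transaction handling, to show an update that either fully happens or is rolled back.

```python
import sqlite3


class RateStore:
    """Hypothetical holder of a tax rate, passed explicitly to its users."""

    def __init__(self, rate):
        self._rate = rate

    def rate(self):
        return self._rate


class HolidayRateStore(RateStore):
    """Hypothetical subtype that changes behaviour, not just data."""

    def rate(self):
        return 0.0


def price_with_tax(net, store):
    # The dependency is visible in the signature, so a caller can
    # substitute any subtype of RateStore without touching this function.
    return round(net * (1 + store.rate()), 2)


def make_accounts():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute(
            "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)"
        )
        conn.executemany(
            "INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)]
        )
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


def transfer(conn, src, dst, amount):
    """Move money between two rows atomically, or not at all."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
```

A plain module-level variable offers neither property out of the box: code that reads it cannot be handed a substitute, and a half-finished update to several globals stays half-finished if an exception interrupts it.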

Additionally, we can see database state as a necessary evil; it is impossible to eliminate it from our systems. Global state, however, is unnecessary. We can entirely eliminate it. So even were the issues with a database just as bad, we can still eliminate some of the potential problems and a partial solution is better than no solution.

Jules
  • 17,754
  • 44
    I think the point of the consistency is actually the main reason: When global variables are used in code, there is usually no telling when they are actually initialized. The dependencies between the modules are deeply hidden inside the sequence of calls, and simple stuff like swapping two calls can produce really nasty bugs because suddenly some global variable is not correctly initialized anymore when it's first used. At least that is the problem I have with the legacy code that I need to work with, and which makes refactoring a nightmare. – cmaster - reinstate monica May 24 '16 at 20:13
  • As @Jules notes, with objects you can substitute with something equivalent, but with globals not. Using a database also provides for substitution, just point the configuration at a different database. For example, a database can be mocked, whereas globals not so much. Because multiple databases can simultaneously exist they have similarities with objects that they don't have with globals. – Erik Eidt May 24 '16 at 20:30
  • 1
    Re "global state is unnecessary. We can entirely eliminate it." Tell that to a video game developer, or the developer of a high fidelity simulation of the solar system or of a galaxy. There has to be global state because everything can interact with everything else. In "A new model for efficient dynamic simulation" by Paul Dworkin & David Zeltzer, the authors went so far as to propose the concept of a god-object. – David Hammen May 25 '16 at 07:51
  • @DavidHammen there is a difference between global variables and singleton objects. – OrangeDog May 25 '16 at 08:46
  • 24
    @DavidHammen I've actually worked on world-state simulation for an online game, which is clearly in the category of application you're talking about, and even there I would not (and did not) use global state for it. Even if some efficiency gains can be made by using global state, the issue is that global state is not scalable. It becomes difficult to use once you move from a single-threaded to multi-threaded architecture. It becomes inefficient when you move to a NUMA architecture. It becomes impossible when you move to a distributed architecture. The paper you cite dates from... – Jules May 25 '16 at 10:47
  • 24
    These problems were less of an issue then. The authors were working on a single processor system, simulating interactions of 1,000 objects. In a modern system you'd likely run a simulation of that kind on at the very least a dual-core system, but quite likely it could be at least 6 cores in a single system. For larger problems still, you'd run it on a cluster. For this kind of change, you must avoid global state because global state cannot be effectively shared. – Jules May 25 '16 at 10:56
  • 20
    I think calling database state a "necessary evil" is a bit of a stretch. I mean, since when did state become evil? State is the entire purpose of a database. State is information. Without state, all you have are operators. What good are operators without something to operate on? That state has to go somewhere. At the end of the day, functional programming is just a means to an end and without state to mutate there would be no point in doing anything at all. It's a bit like a baker calling the cake a necessary evil - it's not evil. It's the entire point of the thing. – J... May 25 '16 at 12:25
  • @Jules - True, but there's still some object that knows at least a little bit about every object in the game, solar system, galaxy, or whatever it is that you are simulating. – David Hammen May 25 '16 at 13:40
  • There's not that much difference between a system where you have carefully managed global variables and a god object that knows about everything. – David Hammen May 25 '16 at 13:51
  • 5
    @DavidHammen "there's still some object that knows at least a little bit about every object in the game" Not necessarily true. A major technique in modern distributed simulation is taking advantage of locality and making approximations such that distant objects do not need to know about everything far away, only what data is supplied to them by the owners of those distant objects. – JAB May 25 '16 at 14:16
  • Is there any relevance to the idea that the database represents something that should be persistent versus something that is not persistent? – ford prefect May 27 '16 at 15:49