85

I read about this snafu: Programming bug costs Citigroup $7m after legit transactions mistaken for test data for 15 years.

When the system was introduced in the mid-1990s, the program code filtered out any transactions with three-digit branch codes from 089 to 100; those prefixes were reserved for testing purposes.

But in 1998, the company started using alphanumeric branch codes as it expanded its business. Among them were the codes 10B, 10C and so on, which the system treated as being within the excluded range, and so their transactions were removed from any reports sent to the SEC.

(I think this illustrates that using a non-explicit data indicator is ... sub-optimal. It would have been much better to populate and use a semantically explicit Branch.IsLive property.)
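
For illustration (hypothetical names, not Citigroup's actual schema), I mean something like:

using System.Collections.Generic;
using System.Linq;

public class Branch
{
    public string Code { get; set; }

    // Set explicitly when the branch is created, independent of
    // whatever format Code happens to have.
    public bool IsLive { get; set; }
}

public static class Reporting
{
    // Report generation then filters on intent, not on a magic code range.
    public static IEnumerable<Branch> Reportable(IEnumerable<Branch> branches) =>
        branches.Where(b => b.IsLive);
}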

That aside, my first reaction was "Unit tests would have helped here"... but would they?

I recently read Why most unit testing is waste with interest, and so my question is: what would the unit tests that would have failed on the introduction of alphanumeric branch codes look like?

Matt Evans
  • 1,015
  • 15
  • 17
    It seems they also missed an integration test which checked the number of transactions exported to the SEC. If you build an export feature, that would be a reasonable check. – Luc Franken Jul 14 '16 at 07:14
  • 5
    Unit tests are meant to protect from / warn about regression. The regression did not take place. – Den Jul 14 '16 at 08:13
  • What year was the unit testing article published in? – Panzercrisis Jul 14 '16 at 12:38
  • 31
    The author of the article doesn't seem to understand unit testing. Some claims are just plain ridiculous ("unit tests are unlikely to test more than one trillionth of the functionality of any given method"), others would destroy your chances of catching regressions ("look at the tests that have never failed in a year and consider throwing them away"). Or suggestions like "turn unit tests into assertions", which would trade failing tests for runtime exceptions? – vgru Jul 14 '16 at 13:24
  • 25
    @gnat I didn't read the external link, and I still found this question meaningful – Jeutnarg Jul 14 '16 at 16:06
  • 23
    For what it's worth, I pretty much disagree with everything in "Why Most Unit Testing is Waste." I'd write a rebuttal, but this margin is too small to contain it. – Robert Harvey Jul 14 '16 at 19:24
  • 1
    @Jeutnarg Someone edited in details from the link since gnat posted that comment. – jpmc26 Jul 14 '16 at 19:42
  • 1
    Almost all testing is a waste. But when you find a way to only test the things that are broken and not 'waste' time testing things that work, well you'll be a billionaire. Literally. – corsiKa Jul 14 '16 at 19:49
  • 4
    @Groo I think you missed the point of the article. The one trillionth estimate comes from all possible inputs. Your module takes a string? How many different possible strings are there that you could feed as input? That's why the problem space is so huge. (If you don't think that's a useful perspective, refute it instead of asserting it's "ridiculous".) The author doesn't advocate throwing away tests that have business value like regression tests that trace to requirements. As for assertions, it converts a dev-time check into a fail-fast, runtime enforcement that exposes programming errors. – jpmc26 Jul 14 '16 at 19:50
  • 4
    Anyone that has even a modicum of unit testing experience knows that you don't test for every possible permutation of a program. That task is impossible for anything but the most trivial of programs. Instead, look for the branches and edge cases, and test those. There are even static code analyzers that will tell you where those edge cases are likely to be. – Robert Harvey Jul 14 '16 at 19:54
  • 1
    @RobertHarvey Of course, but the author makes the point to give perspective on the "coverage" metric, not suggest that you should attempt to achieve that. My point is that Groo seems to have fundamentally misunderstood the points the article was making, so it's hard to take the dismissal as credible. If you disagree with it even though you do understand it, that's a completely different matter. – jpmc26 Jul 14 '16 at 20:28
  • 1
    @RobertHarvey rebuttals have been published. There is also a lively thread on the programming subreddit https://www.reddit.com/r/programming/comments/1zqjk8/why_most_unit_testing_is_waste/ – Matt Evans Jul 15 '16 at 05:21
  • 2
    @jpmc26: All the quotes I gave in my comment are taken from the article, where they are emphasized to look like "axioms", and the author is pretty explicit about his opinions. Even if the author's "perspective" on the coverage metric comes from the valid idea that coverage doesn't provide an accurate or full picture of code quality, his examples are, yes, ridiculous to anyone doing unit testing regularly. 1) If your code coverage analyzer said you have a coverage of 50%, does this mean you covered half of all the reasonable edge cases, or that you made "only 500 billion tests for each method"? – vgru Jul 15 '16 at 09:21
  • 1
    @jpmc26: 2) you claim that "The author doesn't advocate throwing away tests that have business value" - first of all, it would be cool if you had a metric which would identify "test that have no business value" (I presume you can recognize those which "cannot possibly" catch regressions?), but no, he clearly wrote "remove all tests which haven't failed in a year", axiom style. 3) "As for assertions, it converts a dev-time check into a fail-fast, runtime enforcement that exposes programming errors" - that's only if "fail-fast" for you means "later than during testing". – vgru Jul 15 '16 at 09:21
  • 3
    People that say things like "slowly took the world by storm" are making my brain melt. – nvoigt Jul 15 '16 at 10:28
  • Yes, unit tests would have helped in this case, the same way that using duct tape to fix a broken pipe would help – Bradley Thomas Jul 15 '16 at 13:20
  • 1
    @RobertHarvey I am actually a big fan of 'Why Most Unit Testing is a Waste', but I had to bump your comment for the Fermat ref :) – Jared Smith Jul 15 '16 at 14:34
  • 1
    @JaredSmith: Naturally, there are probably dozens of ways to abuse unit testing (just like everything else), but enumerating them all in a paper doesn't invalidate the technique. Groo's comment is spot on; many of the assumptions the paper makes are dead wrong. – Robert Harvey Jul 15 '16 at 14:37
  • One of the most important uses of unit testing is to permit you to modify code (whether optimizations, added functionality, supporting new platforms, using new language constructs, or whatever) and have additional confidence that you haven't broken it. Removing unit tests because they haven't triggered is nuts. – David Schwartz Jul 17 '16 at 08:59
  • Comments are not for extended discussion; this conversation has been moved to chat. – yannis Jul 17 '16 at 11:17
  • @Groo When you write 'If your code coverage analyzer said you have a coverage of 50%, does this mean you covered half of all the reasonable edge cases, or that you made "only 500 billion tests for each method"?', what code coverage metric are you assuming? – sdenham Jul 17 '16 at 20:24
  • @sdenham: that's basically the question I wanted jpmc26 to comment, but let's just say no coverage metric expects a test to code ratio of one trillion to one. For example, the company I work for doesn't have any imposed code coverage metrics, and I am still writing unit tests because they let me create small, fairly tested, fairly decoupled chunks of code at a time. The alternative to this coding style, IMHO, is writing a bunch of code and hitting "Build and Run" every now and then to "see what happens" through trial and error. – vgru Jul 18 '16 at 08:02
  • @Groo: If you are considering whole-program state-transition coverage, then the figures might be in the trillions. If you are using a lesser coverage metric (which is the only feasible option), then that figure does not translate into the number of reasonable edge cases covered, for any reasonable definition of reasonable. Far from being opposed to unit testing, I use it a lot myself, but the point is you always have to make a choice about what the best use of your limited testing time is. – sdenham Jul 19 '16 at 18:52
  • @sdenham: I agree with that, this is why I don't agree with the linked article, because it seems to take this extreme case and uses it as an argument against testing, although it has nothing to do with practice. – vgru Jul 20 '16 at 07:53

11 Answers

118

Unit tests could have caught that the branch codes 10B and 10C were incorrectly classified as "testing branches", but I find it unlikely that the tests for that branch classification would have been extensive enough to catch that error.

On the other hand, spot checks of the generated reports could have revealed that branches 10B and 10C were consistently missing from the reports much sooner than the 15 years the bug was actually allowed to persist.
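
For example, a periodic reconciliation check along these lines (a minimal sketch; all the names are hypothetical) would have flagged the gap:

using System.Collections.Generic;
using System.Linq;

public static class ReportChecks
{
    // Every live branch must appear in the generated report; anything in the
    // result ("10B", "10C", ...) is being silently dropped and needs a look.
    public static IReadOnlyCollection<string> MissingFromReport(
        IEnumerable<string> liveBranchCodes,
        ISet<string> branchCodesInReport)
    {
        return liveBranchCodes
            .Where(code => !branchCodesInReport.Contains(code))
            .ToList();
    }
}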

Finally, this is a good illustration of why it is a bad idea to mix testing data with real production data in one database. If they had used a separate database containing the testing data, there would have been no need to filter it out of the official reports, and it would have been impossible to filter out too much.

  • 80
    +1 Unit tests can never compensate for poor design decisions (like mixing test and real data) – Jeutnarg Jul 14 '16 at 16:06
  • 6
    While it is best to avoid mixing test data with real data, it can be hard to validate a production system if that requires modifying real data. For example, it's a bad idea to validate a banking system by modifying bank account totals in production. Using code ranges to designate meaning is problematic. A more explicit attribute on the records would probably have been a better choice. – JimmyJames Jul 14 '16 at 16:40
  • 2
    @JimmyJames What do you mean with "validating a production system"? The goal should be to have no weird "if (unitest) ..." parts in your production code. If they had used a separate test database (and system) for their testing, they wouldn't have had the need to tag test data and could've avoided the whole thing from the beginning. – Voo Jul 14 '16 at 20:32
  • 4
    @Voo I think there is a tacit assumption that there exists a level of complexity or requirement of reliability where testing the actual, deployed production system is considered worthwhile or necessary. (Consider how much could go wrong because of a wrong configuration variable.) I could see this being the case for a large financial institution. – jpmc26 Jul 14 '16 at 20:39
  • 4
    @Voo I'm not talking about testing. I'm talking about validation of the system. In a real production system, there are many ways that it can fail that have nothing to do with code. If you are putting a new banking system into production, you might have some issue in the db or network etc. that is preventing transactions from being applied to the accounts. I've never worked at a bank but I'm pretty sure it's frowned upon to start modifying real accounts with phony transactions. So that leaves you with either setting up fake accounts or waiting and praying. – JimmyJames Jul 14 '16 at 20:41
  • 12
    @JimmyJames In healthcare it's common to periodically copy the production database into the test environment to perform testing on data that's as close to real as possible; I'd think a bank can do the same. – dj18 Jul 14 '16 at 21:54
  • 2
    @Voo - given that banking systems are actually a huge ecosystem of services that have been built over decades, you can test all you like in parallel systems, but there will still need to be some kind of validation in the full live ecosystem - using fake branch codes (or CC number or ...) – HorusKol Jul 15 '16 at 04:45
  • 2
    @Horus At work here we have an identical test environment, down to every single server. Test and live are completely identical (down to the same server names and configuration files), except that you can trash the test database without consequences. Mixing test and live systems means you're not actually testing your live code (since suddenly you need some fun "if test: do something" branches). Sure it makes things easier (and cheaper) for people, but I've yet to see an argument why you'd absolutely need it. – Voo Jul 15 '16 at 11:50
  • 4
    @dj18 That helps with testing against the same data but it tells you nothing about whether the configuration of the production system is right. It's one of those situations where the proof of the pudding is in the eating. You know it works when it is actively working. Everything else is deductive extrapolation. – JimmyJames Jul 15 '16 at 12:54
  • 1
    Unfortunately, sometimes you must mix test and real data. Woe to Mr. Test if he ever tried to buy from my previous employer--the whole factory floor engine knew that someone named "Test" was an engineering test of new product, go through the motions (print reports etc) but don't actually do anything (no commands were sent to the hardware, no materials ordered, no inventory tracking, the operators knew to simply set aside any such printouts for the guy who sent it through the system.) – Loren Pechtel Jul 15 '16 at 23:54
  • @voo - We do the same, but then we only have 4 servers in our system. How many do you have? How many servers do you think Citigroup has? – HorusKol Jul 16 '16 at 00:28
  • 2
    @Voo: We find that having an occasional test record in production is required. After a few low-level mistakes we went back and added a TEST flag to the database schema so we could flag the test records for the handful of places that have to care. Having a test range seems kind of bad to us. – Joshua Jul 16 '16 at 03:36
  • @Joshua Yeah, it's a magic flag--bad idea. (The case I mentioned above it was "Test" as a name that triggered the test mode behavior.) There's no need of a whole range of test numbers, bad idea. – Loren Pechtel Jul 16 '16 at 05:20
  • @horus We have a few hundred in total (different regions with different language settings and just generally different setups get the number up quickly). Thanks to the wonders of virtualization this is all much easier than it used to be. I'm sure Citigroup has a good deal more. I'm also sure they have a great deal more budget as well. – Voo Jul 16 '16 at 06:24
  • @voo - honestly, though, $7 million over 15 years is only $500k a year. Citigroup would probably need to spend more than that for a replicated system. Not a good investment. – HorusKol Jul 16 '16 at 07:37
  • @Horus You shouldn't forget that your test infrastructure doesn't have to be x physical real servers. What you basically need is a larger cluster that allows you to run lots of VMs on it. This has lots of advantages: you can have the same version as in live installed to try and reproduce bugs found in the wild, you can test the next version of the software at an early stage without being worried about destroying data, and it forces you to come up with reasonably standardized roll-out procedures which can then be used for live as well. – Voo Jul 16 '16 at 09:29
76

The software had to handle certain business rules. If there were unit tests, the unit tests would have checked that the software handled the business rules correctly.

The business rules changed.

Apparently nobody realised that the business rules had changed, and nobody changed the software to apply the new business rules. If there had been unit tests, those unit tests would have to be changed, but nobody would have done that because nobody realised that the business rules had changed.

So no, unit tests wouldn't have caught that.

The exception would be if the unit tests and the software had been created by independent teams, and the team doing the unit tests changed the tests to apply the new business rules. Then the unit tests would have failed, which hopefully would have resulted in a change of the software.

Of course in the same case, if only the software was changed and not the unit tests, then the unit tests would also fail. Whenever a unit test fails, it doesn't mean the software is wrong; it means either the software or the unit test (or sometimes both) is wrong.

gnasher729
  • 44,814
  • 4
  • 64
  • 126
  • 2
    Is it feasible to have different teams where one is working on the code and the other on "unit" tests? How is that even possible?... I'm refactoring my code all the time. – Sergio Jul 14 '16 at 11:50
  • 2
    @Sergio from one perspective, refactoring changes internals while preserving behavior - so if the test is written in a way that tests behavior without relying on internals, then it doesn't need updating. – Daenyth Jul 14 '16 at 12:39
  • 1
    I've seen this happen a number of times. Software is in production with no complaints, then all of a sudden users explode with complaints that it no longer works and has been gradually failing over the years. That's what happens when you decide to go and change your internal procedures without following the standard notification process... – Brian Knoblauch Jul 14 '16 at 14:50
  • 42
    "The business rules changed" is the critical observation. Unit tests validate that you implemented the logic you thought you implemented, not that your logic was correct. – Ryan Cavanaugh Jul 14 '16 at 18:14
  • 5
    If I'm correct about what happened, it's unlikely that unit tests to catch this would be written. The basic principle for selecting tests is to test some "good" cases, some "bad" cases, and cases bracketing any boundaries. In this case, you'd test "099", "100", and "101". Since "10B" was covered by the "reject non-numbers" tests under the old system, and is greater than 101 (and so is covered by testing that) under the new system, there's no reason to test it -- except that in EBCDIC, "10B" sorts between "099" and "100". – Mark Jul 14 '16 at 18:47
  • @Mark: It sounds like the problem wasn't in the code, but rather in somewhat poorly-specified business rules which some people thought meant that 10A-10Z should be available for use as real branch codes, but which programmers did not believe needed to be treated as such. – supercat Jul 15 '16 at 19:54
  • I must be really misunderstanding this answer. The business rules changed, nobody bothered to change the unit tests, sooooooo the unit tests failed. Problem solved? – djechlin Jul 15 '16 at 22:44
  • 1
    Yup. In my experience mistakes that manage to lurk in production environments come down to scenarios nobody thought of, often in terms of the interaction of a new feature with long-standing code. As such, they would be very unlikely to be caught in testing. – Loren Pechtel Jul 15 '16 at 23:46
  • 1
    @djechlin: The business rules changed, but the code was not updated to support the new rules, and the unit-tests were not updated to validate that the new rules were supported, so -- bug. (It sounds like you're interpreting "business rules" to mean "what the code does"? But this answer is using it to mean "what the code is supposed to do".) – ruakh Jul 16 '16 at 03:43
  • @djechlin: Business rules changed. Code didn't change. Unit tests didn't change (assuming there had been unit tests). Since unit tests test that the code does what the unit test is testing, not what the code should do, unit tests wouldn't have failed. And therefore wouldn't have helped finding this problem. – gnasher729 Jul 18 '16 at 12:16
  • @ruakh oic, yes if neither the code nor the tests change, then you have a bit of a problem. – djechlin Jul 18 '16 at 15:24
30

No. This is one of the big problems with unit tests: they lull you into a false sense of security.

If all your tests pass, it doesn't mean your system is working right; it means all your tests are passing. It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would, which really isn't that much of a big deal anyway: that was the stuff that you were actually paying close attention to, so it's very likely you got it right anyway! But it does nothing to catch cases you never thought of, such as this one, because you never thought to write a test for them. (And if you had, you'd have been aware that that meant code changes were necessary, and you'd have changed them.)

Mason Wheeler
  • 82,789
  • 17
    My father used to ask me that: Why didn't you think of the thing you didn't think of? (Only he used to make it confusing by saying "If you don't know, ask!") But how do I know that I don't know? –  Jul 14 '16 at 13:28
  • 1
    OP's example just shows a really poor engineering decision. If someone hardcoded an arbitrary rule into the system to solve a problem in a really convoluted way, and then wrote unit/functional/integration tests which validate this behavior as correct, then these tests will do just that - show that the code is working the way it should be working. – vgru Jul 15 '16 at 09:33
  • But the fact that automated tests will never remove all the bugs from the code doesn't mean people shouldn't write them. If your tests are passing, sure, it doesn't mean the system is working, but if you have no unit tests at all, then your system is a big black box, and all your knowledge about it comes through its external interfaces. This is even more concerning than having a "false" sense of security about its internals. Passing unit tests at least show that someone took care to check if individual functions are doing what the developer thought they were supposed to be doing. – vgru Jul 15 '16 at 09:33
  • 2
    @nocomprende Consider this your notice that you don't know. So no excuse next time – Bradley Thomas Jul 15 '16 at 13:19
  • @Groo Wait, I thought that unit tests are there mainly to validate that those black box assumptions are correct. They make your system much more black boxy, on the assumption that that's a good idea (i.e. more abstraction, less dependence on implementation details and other internals etc.). With no unit tests, you don't have the (relatively) dependable black boxes, and you must more often dig into the details, and you end up more fragile to internal changes that don't affect the documented external interface. – Luaan Jul 15 '16 at 13:23
  • 7
    "It means that the parts of your design that you consciously thought about and wrote tests for are working as you consciously thought they would." Exactly right. This information is invaluable if you are refactoring, or if something changes somewhere else in the system that breaks your assumptions. Developers who are lulled into a false sense of security simply do not understand the limitations of unit testing, but that doesn't render unit testing a useless tool. – Robert Harvey Jul 15 '16 at 15:09
  • 4
    @RobertHarvey Did you read the linked paper? Bear in mind 3 simple facts: 1) Tests are code too. 2) Anything even approaching "thoroughness" in automated testing requires significantly more code than the codebase being tested. (In SQLite, for example, there's a test-to-functional-code ratio of ~1200:1). 3) Developers are statistically likely to create 3 significant bugs per 1000 lines of code. This means that any testing suite thorough enough to be truly useful is statistically likely to introduce (and enforce, as tests are considered canonical) more bugs than it fixes! – Mason Wheeler Jul 15 '16 at 15:14
  • 2
    @MasonWheeler: I stopped reading after about the sixth invalid assumption. I scanned the rest, but there's nothing worth examining there, other than exploring the many ways people get unit testing wrong. Thoroughness is not what unit testing is about. – Robert Harvey Jul 15 '16 at 15:15
  • 3
    @RobertHarvey What "invalid assumptions" are you talking about? I read it and saw a true rarity: someone actually taking an evidence-based look at unit testing instead of just accepting the practice as an article of faith because it sounds like a good idea on the surface. – Mason Wheeler Jul 15 '16 at 15:17
  • 12
    @MasonWheeler: Like you, the author thinks that unit testing is somehow supposed to prove that your program works. It doesn't. Let me repeat that: unit testing does not prove that your program works. Unit testing proves that your methods fulfill your testing contract, and that's all it does. The rest of the paper falls down, because it rests on that single invalid premise. – Robert Harvey Jul 15 '16 at 15:18
  • 5
    Naturally, developers who have that false belief are going to be disappointed when unit testing utterly fails them, but that's the fault of the developer, not of unit testing, and it doesn't invalidate the genuine value that unit testing provides. – Robert Harvey Jul 15 '16 at 15:22
  • 2
    @RobertHarvey I don't think that unit testing "somehow proves that your program works." In fact, I agree with you about what it does actually prove. There are two problems with that, though. First, "your testing contract" is made of code which can have bugs in it, which means that what it proves is essentially useless. Second, "proving that your program works" is exactly what people are told unit testing does. Uncle Bob literally makes that promise in so many words. Full coverage plus all tests passing means all your code works! – Mason Wheeler Jul 15 '16 at 15:25
  • This is reality, there is no way to "prove" anything. Things can fail, or, they might not have failed yet. That is how Science works, and they call it Computer Science, right? "Get used to disappointment." –  Jul 15 '16 at 16:15
  • 5
    o_O @ your first sentence. Unit tests give you a false sense of security while coding like having your hands on the steering wheel gives you a false sense of security while driving. – djechlin Jul 15 '16 at 22:45
  • @Mason to name just one wrong assumption in 1.3: "That means that tests have to be at least as computationally complex as code". Unit tests are about checking whether a function returns the expected output for a given input. But you can use several different algorithms that give the same result but work differently (sorting being one obvious example). Since those algorithms can have different complexity but can still be covered with the same unit tests (whose complexity won't change) the stated assumption cannot be true. – Voo Jul 16 '16 at 06:47
  • Or on a basic sample: using my unit tests written for a bubble sort (complexity minimal) I can also test timsort or heap sort (both of them a great deal more complex) with the same tests. – Voo Jul 16 '16 at 06:50
  • The advice I give to my colleagues is to not think about unit tests as verifying the product works. They verify that a piece of code does what you expect it to do. Your expectations can easily, and likely, be incomplete, and can even be wrong. – Brandon Jul 17 '16 at 22:41
  • @Voo: but most likely you won't get line coverage of the TimSort code using the tests you wrote against bubble sort, because you won't exercise all the special cases. Some people would take this as a cue to add more test cases, making the tests in some conceptual sense more complicated. Others decide the whole concept of spot-testing is useless, and work on code-proving systems instead of sort routines ;-) But of course you're right, the test code doesn't increase in (eg) cyclomatic complexity as a result of adding more cases. Run-time complexity might be the same as the code under test. – Steve Jessop Jul 18 '16 at 11:46
21

Are you really asking, "would unit tests have helped here?", or are you asking, "could any kind of tests possibly have helped here?".

The most obvious form of testing that would have helped, is a precondition assertion in the code itself, that a branch identifier consists only of digits (supposing that this is the assumption relied on by the coder in writing the code).

This could then have failed in some kind of integration test: as soon as the new alphanumeric branch IDs were introduced, the assertion would blow up. But that's not a unit test.

Alternatively, there could be an integration test of the procedure that generates the SEC report. This test ensures that every real branch identifier reports its transactions (and therefore requires real-world input, a list of all branch identifiers in use). So that's not a unit test either.

I can't see any definition or documentation of the interfaces involved, but it may be that unit tests could not possibly have detected the error, because the unit was not faulty. If the unit is permitted to assume that branch identifiers consist only of digits, and the developers never made a decision about what the code should do when one doesn't, then they should not write a unit test to enforce particular behaviour for non-digit identifiers: such a test would reject a hypothetical valid implementation of the unit that handled alphanumeric branch identifiers correctly, and you don't usually want to write a unit test that prevents valid future implementations and extensions.

Or maybe one document written 40 years ago implicitly defined (via some lexicographical range in raw EBCDIC, instead of a more human-friendly collation rule) that 10B is a test identifier, because it does in point of fact fall between 089 and 100. Then 15 years ago someone decided to use it as a real identifier, so the "fault" doesn't lie in the unit that correctly implements the original definition: it lies in the process that failed to notice that 10B is defined to be a test identifier and therefore should not be assigned to a branch.

The same would happen in ASCII if you defined 089 - 100 as a test range and then introduced an identifier 10$ or 1.0. It just happens that in EBCDIC the digits come after the letters.
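
To make that concrete: the byte values below are standard EBCDIC (letters A-I are 0xC1-0xC9, digits 0-9 are 0xF0-0xF9); the comparison helper is just a sketch.

using System;
using System.Linq;

public static class EbcdicDemo
{
    // Standard EBCDIC: letters sort *before* digits, the reverse of ASCII.
    // (Restricted to digits and A-I, which is all this demo needs.)
    static byte ToEbcdic(char c) =>
        c >= '0' && c <= '9' ? (byte)(0xF0 + (c - '0')) :
        c >= 'A' && c <= 'I' ? (byte)(0xC1 + (c - 'A')) :
        throw new ArgumentOutOfRangeException(nameof(c));

    static int Compare(string a, string b)
    {
        byte[] ba = a.Select(ToEbcdic).ToArray();
        byte[] bb = b.Select(ToEbcdic).ToArray();
        for (int i = 0; i < Math.Min(ba.Length, bb.Length); i++)
            if (ba[i] != bb[i]) return ba[i] - bb[i];
        return ba.Length - bb.Length;
    }

    public static void Main()
    {
        // "10B" is F1 F0 C2; "100" is F1 F0 F0. Since 0xC2 < 0xF0,
        // "089" < "10B" < "100" when compared as EBCDIC bytes.
        Console.WriteLine(Compare("089", "10B") < 0); // True
        Console.WriteLine(Compare("10B", "100") < 0); // True
    }
}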

One unit test (or arguably a functional test) that conceivably might have saved the day is a test of the unit that generates or validates new branch identifiers. That test would assert that the identifiers must contain only digits, and would be written in order to allow users of the branch identifiers to assume the same. Or perhaps there's a unit somewhere that imports real branch identifiers but never sees the test ones, and that could be unit-tested to ensure it rejects all test identifiers (if identifiers are only three characters we can enumerate them all, and compare the behaviour of the validator to that of the test-filter to ensure they match, which deals with the usual objection to spot-tests). Then when somebody changed the rules, the unit test would have failed, since it contradicts the newly-required behaviour.
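
Because the identifier space is so small, such a test is cheap to make exhaustive. A sketch (MSTest syntax; the BranchCodes helpers are hypothetical stand-ins for the validator and the report's test-filter):

using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class BranchCodeContractTests
{
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static IEnumerable<string> AllThreeCharacterCodes()
    {
        foreach (var a in Alphabet)
            foreach (var b in Alphabet)
                foreach (var c in Alphabet)
                    yield return new string(new[] { a, b, c });
    }

    [TestMethod]
    public void NoAssignableBranchCodeIsFilteredAsTestData()
    {
        // 36^3 = 46,656 codes: small enough to check every single one.
        foreach (var code in AllThreeCharacterCodes())
        {
            if (BranchCodes.IsAssignableToRealBranch(code))
                Assert.IsFalse(BranchCodes.IsTestOnly(code),
                    $"{code} can be assigned to a real branch but would be filtered from reports.");
        }
    }
}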

Since the test was there for a good reason, the point where you need to remove it due to changed business requirements becomes an opportunity for somebody to be given the job, "find every place in the code that relies on the behaviour that we want to change". Of course this is difficult and hence unreliable, so would by no means guarantee saving the day. But if you capture your assumptions in tests of the units that you're assuming properties of, then you've given yourself a chance and so the effort is not wholly wasted.

I agree of course that if the unit hadn't been defined in the first place with a "funny-shaped" input, then there would be nothing to test. Fiddly namespace divisions can be hard to test properly because the difficulty doesn't lie in implementing your funny definition, it lies in making sure that everyone understands and respects your funny definition. That is not a local property of one code unit. Furthermore, changing some data type from "a string of digits" to "a string of alphanumerics" is akin to making an ASCII-based program handle Unicode: it will not be simple if your code is heavily coupled to the original definition, and when the data type is fundamental to what the program does then it often is heavily coupled.

it’s a bit disturbing to think it’s largely wasted effort

If your unit tests sometimes fail (while you're refactoring, for example), and in doing so give you useful information (your change is wrong, for example), then the effort wasn't wasted. What they don't do, is test whether your system works. So if you're writing unit tests instead of having functional and integration tests then you may be using your time sub-optimally.

Steve Jessop
  • 5,111
  • 21
  • 23
  • Assertions are good! –  Jul 15 '16 at 16:27
  • 3
    @nocomprende: as Reagan had it, "trust, but verify". – Steve Jessop Jul 15 '16 at 17:26
  • 1
    I was going to also say "Unit tests bad!" but I thought most people would miss the reference to Animal Farm and start actually criticizing me instead of thinking about what I was saying (knee-jerk responses are ineffective) but I didn't say that. Maybe a person who is smarter and has greater erudition can make that point. –  Jul 15 '16 at 17:40
  • 2
    "All the tests are passing, but some tests are MORE passing than others!" – GHP Jul 18 '16 at 12:18
  • 1
    testing is a red herring. These guys just did not know how "branch code" was defined. This would be like the US Post Office not knowing that it was changing the definition of zip code when it added 4 digits. – radarbob Jul 18 '16 at 13:33
10

No, not necessarily.

The original requirement was to use numeric branch codes, so a unit test would have been produced for a component that accepted various codes and rejected any like 10B. The system would have been signed off as working (which it was).

Then the requirement changed and the codes were updated, but this would have meant that the unit test code supplying the bad data (now good data) had to be modified as well.

Now, we could assume that the people managing the system would know this was the case and would change the unit test to handle the new codes... but if they had known that was occurring, they would also have known to change the code that handled these codes anyway, and they didn't do that. A unit test that originally rejected code 10B would have happily said "everything is fine here" when run, if nobody knew to update that test.

Unit testing is good for original development but not for system testing, especially not 15 years later, when the requirements are long forgotten.

What they need in this kind of situation is an end-to-end integration test: one where you can pass in the data you expect to work and see whether it does. Someone would have noticed that their new input data didn't produce a report and would then have investigated further.
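
Even a crude version of that test would have done the job. A sketch (every type and method here is hypothetical; the point is the shape of the check):

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class SecReportEndToEndTests
{
    [TestMethod]
    public void TransactionsForEveryBranchCodeFormatAppearInReport()
    {
        var pipeline = new ReportingPipeline();                 // hypothetical
        var codes = new[] { "042", "101", "10B" };              // old and new formats

        foreach (var code in codes)
            pipeline.Submit(SampleTransaction.ForBranch(code)); // hypothetical

        var report = pipeline.GenerateSecReport();              // hypothetical

        foreach (var code in codes)
            Assert.IsTrue(report.ContainsBranch(code),
                $"Branch {code} submitted a transaction but is missing from the report.");
        // Under the post-1998 codes, this fails for "10B" and the bug surfaces.
    }
}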

gbjbaanb
  • 48,585
  • 6
  • 103
  • 173
8

Type testing (the process of testing invariants using randomly-generated valid data, as exemplified by the Haskell testing library QuickCheck and various ports/alternatives inspired by it in other languages) may well have caught this issue; unit testing almost certainly would not have.

This is because when the rules for the validity of branch codes were updated, it is unlikely anybody would have thought to test those specific ranges to ensure they worked correctly.

However, if type testing had been in use, somebody implementing the original system should have written a pair of properties: one to check that the specific codes for test branches were treated as test data, and one to check that no other codes were. When the data type definition for the branch code was updated (which would have been required in order to test that any of the changes from digit-only to alphanumeric codes worked), these properties would have started testing values in the new range and would most likely have identified the fault.

Of course, QuickCheck was first developed in 1999, so it came too late to catch this issue anyway.
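
For what it's worth, the idea doesn't need QuickCheck itself; you can approximate it by hand. A sketch in C# (hypothetical names; a seeded random generator stands in for a real property-testing library):

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class BranchCodePropertyTests
{
    // Driven by the *data type definition* of a branch code: when the
    // definition widens from digits to alphanumerics, so do the inputs.
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static readonly Random Rng = new Random(42); // fixed seed for reproducibility

    static string RandomBranchCode()
    {
        var chars = new char[3];
        for (int i = 0; i < chars.Length; i++)
            chars[i] = Alphabet[Rng.Next(Alphabet.Length)];
        return new string(chars);
    }

    // The pair of properties collapses to one equivalence: a code is treated
    // as test data if and only if it is in the designated test set.
    // BranchCodes.IsTestOnly and TestCodes.Contains are hypothetical.
    [TestMethod]
    public void OnlyDesignatedCodesAreTreatedAsTestData()
    {
        for (int i = 0; i < 10000; i++)
        {
            var code = RandomBranchCode();
            Assert.AreEqual(TestCodes.Contains(code), BranchCodes.IsTestOnly(code),
                $"Classification mismatch for {code}");
        }
    }
}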

Jules
  • 17,754
  • 1
    I think it's more normal to call this property-based testing, and of course it's just as possible to write a property-based test that would still pass given this change (though I do think you are more likely to write a test that could find it) – jk. Jul 14 '16 at 09:15
5

I really doubt unit testing would have made a difference to this problem. It sounds like one of those tunnel-vision situations: functionality was changed to support new branch codes, but the change was not carried through all areas of the system.

We use unit testing to design a class. Re-running a unit test is only required if the design has changed. If a particular unit does not change, then the unchanged unit tests will return the same results as before. Unit tests are not going to show you the impact of changes to other units (if they do, you are not writing unit tests).

You can only reasonably detect this problem via:

  • Integration tests - but you would have to specifically add the new code formats to feed through multiple units in the system (i.e. they would only show you the problem if the original tests included the now-valid branches)
  • End-to-end testing - the business should run an end-to-end test that incorporated old and new branch code formats

Not having sufficient end-to-end testing is more worrying. You cannot rely on unit testing as your ONLY or MAIN test for system changes. It sounds like it would only have required someone to run a report on the newly supported branch code formats.

Class Skeleton
  • 231
  • 1
  • 4
  • 10
3

The takeaway from this is to Fail Fast.

We don't have the code, nor do we have many examples of prefixes that are or are not test branch prefixes according to the code. All we have is this:

  • 089 - 100 => test branch
  • 10B, 10C => test branch
  • < 088 => presumably real branches
  • > 100 => presumably real branches

The fact that the code allows numbers and strings is more than a little strange. Of course, 10B and 10C can be considered hex numbers, but if the prefixes are all treated as hex numbers, 10B and 10C fall outside of the test range and would be treated as real branches.

This likely means that the prefix is stored as a string but treated as a number in some cases. Here is the simplest code I can think of that replicates this behavior (using C# for illustrative purposes):

bool IsTest(string strPrefix) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix))
        return iPrefix >= 89 && iPrefix <= 100;
    return true; //here is the problem: any prefix that isn't a number counts as a test
}

In English, if the string is a number and is between 89 and 100, it's a test. If it's not a number, it's a test. Otherwise it's not a test.

If the code follows this pattern, no unit test would have caught this at the time the code was deployed. Here are some example unit tests:

assert.isFalse(IsTest("088"))
assert.isTrue(IsTest("089"))
assert.isTrue(IsTest("095"))
assert.isTrue(IsTest("100"))
assert.isFalse(IsTest("101"))
assert.isTrue(IsTest("10B")) // <--- business rule change

The unit test shows that "10B" should be treated as a test branch. User @gnasher729 above says that the business rules changed, and that's what the last assertion shows. At some point that assertion should have switched to Assert.IsFalse, but that didn't happen. Unit tests get run at development and build time, but then at no point afterwards.


What's the lesson here? The code needs some way to signal that it received unexpected input. Here is an alternative way to write this code that emphasizes that it expects the prefix to be a number:

// Alternative A
bool TryGetIsTest(string strPrefix, out bool isTest) {
    int iPrefix;
    if(int.TryParse(strPrefix, out iPrefix)) {
        isTest = iPrefix >= 89 && iPrefix <= 100;
        return true;
    }
    isTest = true; //this is just some value that won't be read
    return false;
}

For those who don't know C#, the return value indicates whether or not the code was able to parse a prefix from the given string. If the return value is true, the calling code can use the isTest out variable to check if the branch prefix is a test prefix. If the return value is false, the calling code should report that the given prefix is not expected, and the isTest out variable is meaningless and should be ignored.

If you are ok with exceptions, you can do this instead:

// Alternative B
bool IsTest(string strPrefix) {
    int iPrefix = int.Parse(strPrefix);
    return iPrefix >= 89 && iPrefix <= 100;
}

This alternative is more straightforward. In this case, the calling code needs to catch the exception. In either case, the code should have some way of reporting to the caller that it didn't expect a strPrefix that couldn't be converted to an integer. In this way the code fails fast, and the bank can quickly find the problem without the embarrassment of an SEC fine.

2

An assertion built into the runtime might have helped; for example (a sketch follows this list):

  1. Create a function like bool isTestOnly(string branchCode) { ... }
  2. Use this function to decide which reports to filter out
  3. Re-use that function in an assertion, in the branch-creation code, to verify or assert that a branch isn't (can't be) created using this type of branch code ‼
  4. Have this assertion enabled in the real run-time (and not "optimized away except in the debug-only developer version of the code") ‼
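
Put together, it might look something like this (a minimal sketch; every name is hypothetical):

using System;

public static class BranchRules
{
    // Single definition of "test-only", shared by the report filter
    // and by branch creation, so the two can never drift apart.
    public static bool IsTestOnly(string branchCode)
    {
        int n;
        return int.TryParse(branchCode, out n) && n >= 89 && n <= 100;
    }
}

public class BranchRegistry
{
    // The assertion stays enabled in the real runtime: creating a branch
    // whose code would be filtered as test data fails immediately.
    public void CreateBranch(string branchCode)
    {
        if (BranchRules.IsTestOnly(branchCode))
            throw new InvalidOperationException(
                "Branch code " + branchCode + " is reserved for test data.");
        // ... persist the branch ...
    }
}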


ChrisW
  • 3,417
1

So many answers and not even one Dijkstra quote:

Testing shows the presence, not the absence of bugs.

Therefore, it depends. If the code was tested properly, most likely this bug would not exist.

BЈовић
  • 14,031
  • 8
  • 62
  • 82
-1

I think a unit test here would have ensured the problem never existed in the first place.

Consider: you have written the bool IsTestData(string branchCode) function.

The first unit tests you write should be for null and the empty string, then for strings of incorrect length, then for non-integer strings.

To make all those tests pass you will have to add parameter checking to the function.

Even if you then only test for 'good' data (001 -> 999), never thinking about the possibility of 10A, the parameter checking will force you to rewrite the function when you start using alphanumerics, to avoid the exceptions it would otherwise throw.
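
For instance, a sketch of the parameter checking those edge-case tests force into existence (the exact checks are illustrative):

using System;

public static class BranchData
{
    public static bool IsTestData(string branchCode)
    {
        // Edge-case tests (null, empty, wrong length, non-integer) force these
        // checks; alphanumeric codes now throw instead of being misclassified.
        if (string.IsNullOrEmpty(branchCode))
            throw new ArgumentException("Branch code is required.", nameof(branchCode));
        if (branchCode.Length != 3)
            throw new ArgumentException("Branch code must be three characters.", nameof(branchCode));

        int code;
        if (!int.TryParse(branchCode, out code))
            throw new ArgumentException("Branch code " + branchCode + " is not numeric.", nameof(branchCode));

        return code >= 89 && code <= 100;
    }
}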

Ewan
  • 75,506
  • 1
    This would not have helped - the function was not changed, and the test would not start failing given the same test data. Someone would have had to think of changing the test in order to make it fail, but if they had thought of that, they would have probably thought of changing the function as well. – Hulk Jul 15 '16 at 07:45
  • (Or perhaps I'm missing something, as I'm not sure what you mean by "parameter checking") – Hulk Jul 15 '16 at 07:51
  • The function would be forced to throw an exception for non-integer strings in order to pass the simple edge-case unit test. Hence production code would error if you started to use alphanumeric branch codes without specifically programming for them – Ewan Jul 15 '16 at 07:58
  • But wouldn't the function have used some IsValidBranchCode-function for performing this check? And this function would probably have been changed without any need to modify the IsTestData? So if you are only testing 'good data' the test would not have helped. The edge case test would have had to include some now-valid branch code (and not simply some still invalid ones) in order to start failing. – Hulk Jul 15 '16 at 08:11
  • The unit test would be expecting an exception for "AAA", so you will definitely have to change the unit test. If you are injecting the BranchCodeValidator then it would have its own set of tests: is a test branch code valid? You might have specific tests for it – Ewan Jul 15 '16 at 08:19
  • Well, say "AAA" (or "A1" etc) is still invalid in the new system and therefore correctly failing? It is only "two decimal digits followed by one character" codes that are accepted, and if you didn't include one of them in your set of test data from the start you won't notice that something is wrong. – Hulk Jul 15 '16 at 08:26
  • I think that's a bit of a stretch. The easiest way to check for alphanumerics in my "is this number between 89 and 100" function would be to cast to an int and throw an exception. You're saying you might check whether the first character is an int, then assume the others are too and compare as strings? – Ewan Jul 15 '16 at 08:29
  • I did expect some kind of regex check covering number of digits and valid characters. – Hulk Jul 15 '16 at 08:39
  • The point is there are two sides to unit tests. 1: the simple one, where you test for exactly the problem which causes the bug and the test catches your error; and 2: edge-case tests FORCE you to code more rigorously. In this example I think it's pretty clear the original programmer just did x > 89 and didn't realise that it was doing a string comparison. A unit test of "crazy string should error" forces them to at least check for non-ints and probably compare as ints. – Ewan Jul 15 '16 at 08:39
  • 1
    If the check is in IsValidCode, so that the function passes without its own explicit check, then yes, it's possible to miss it; but then we would have an extra set of even more tests, mock Validators, etc., and even more chances for specific "these are test numbers" checks – Ewan Jul 15 '16 at 08:40
  • Side note: the IsValid function is a good argument against clean coding and for commenting!!! – Ewan Jul 15 '16 at 08:41