If you are a believer in Agile methods, but don't like Test-Driven Development (TDD), this site is for you!

Many people in the Agile community feel that TDD is a niche technique that works really well for some people but is awful for others.

Sometimes TDD fans go too far by insisting that everyone should use TDD - even saying that if you don't use TDD, you are not fully Agile, or have not given it a chance. We disagree. We explain why here.

Be Agile without TDD!

Why so many people don’t like TDD

Clearly a lot of programmers do not like TDD. I have already explained that some people work bottom-up (inductively) and others work top-down (reductively), and compared this to the two polar personalities in the sciences: the experimentalists and the theorists. And just as scientists of one type are sometimes arrogant toward the other type, that same arrogance sometimes shows up on Agile teams, with TDD adherents claiming that their way is the "best" way.

People work differently: don't arrogantly assume that your way of working is best for everyone. And in truth, we need both personality types. Consider the sciences. Some of the amazing things that experimentalists have done include the creation of the Large Hadron Collider (LHC) and the subsequent detection of the Higgs boson. And the space program. The astronomers who detected the existence of "dark energy". And Michelson and Morley, whose experiment showed that the speed of light does not depend on the Earth's motion.

But then there are the amazing things that theorists have done. Einstein. Dirac. Newton. Héctor A. Chaparro-Romo, who helped solve the 2,000-year-old mathematics of spherical aberration [1]. Or the mathematician William Rowan Hamilton, who suddenly realized how to define quaternion mathematics – later proving useful for General Relativity and Quantum Mechanics, and quite possibly a foundation of all of physics – and in his exuberance carved it into Dublin's Brougham Bridge [2].

In software, there are bottom-up programmers - who usually like TDD - and top-down programmers; they need each other.

The bottom-up ones are the TDD advocates - they are the ones who want results right away. They jump straight to code.

The top-down ones are the architects - the ones who ask, “Forget the inputs and outputs: why are we doing this? What are we trying to accomplish? And how should it all fit together?” They tend to start with diagrams and algorithms. They think for a long time before they write code.

Top-down people don’t want to hear about inputs and outputs: to them, the inputs and outputs should be an outcome of detail level design. Bottom-up people want to start with the inputs and outputs.

We need both personality types to create complex systems.

The top-down people have been out-shouted. They need a voice. Their approach is legitimate, and a necessary ingredient, and can be Agile. But we need to give them a voice without disparaging the bottom-up people. We need both, and they need to respect each other.

The choice should not be between the extreme approaches of (a) writing very detailed tests for each granular piece up front, or (b) designing a system top-down and writing entirely black-box tests after the fact. These are opposing extremes - like socialism and capitalism - and extremes rarely work well in the real world, unless you have an extreme situation or extreme requirements.

A Criticism of TDD: Change Impedance


One problem with TDD is that it assumes that one knows the required inputs and outputs of a function or method. From now on, I'll just say "method", for brevity.

However, inputs and outputs are a design feature: a programmer decides what those should be. So TDD assumes that one has designed the interaction between methods, and is now detailing the internals of each method. That is fine, but one often has to first code and run a set of methods before determining what the inputs and outputs should be: one has to "see it all run together".
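To make that concrete, here is a small illustrative sketch (the class and method names are made up) of a TDD-style unit test written before the method it tests exists. Notice that the test commits to the method's inputs and outputs - its signature - before anything has run together.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical example: the test is written first, so it must commit to the
    // method's signature - a weight in kilograms and a destination code go in,
    // a cost in cents comes out.
    class ShippingCalculatorTest {

        @Test
        void standardRateAppliesToDomesticPackages() {
            ShippingCalculator calc = new ShippingCalculator();
            // The inputs (2.5 kg, "US") and the output (1499 cents) are now
            // fixed by the test, before the surrounding code has ever run.
            assertEquals(1499, calc.costInCents(2.5, "US"));
        }
    }

    // Minimal stub so the example compiles; under TDD this is written after the
    // test, just to make it pass.
    class ShippingCalculator {
        long costInCents(double weightKg, String destination) {
            return 1499;
        }
    }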

To handle that, TDD has the concept of "refactoring", whereby one makes adjustments to many methods and their tests. But if one has written extensive tests for every method, that is a lot of adjusting.

The more traditional engineering approach is to "mock up" a set of methods, get them working, and then lock down the inputs and outputs. One would then write tests to verify that everything works. TDD, in contrast, insists on writing extensive low-level tests from the beginning.

The large amount of code that must be changed for any behavioral change - all of those tests - tends to discourage change. A programmer who is under time pressure is more likely to keep a method's inputs and outputs the same and implement workarounds and special cases, to minimize the impact and get the fix in. That is not a good thing. Having very high-coverage, low-level tests freezes the internal design of a system, so that changes are difficult. That defeats agility.
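Continuing that illustrative sketch: suppose that after seeing the pieces run together, one decides the destination should be a structured address rather than a country-code string. The production change is small, but every test that constructed the old inputs must be edited as well.

    // Revised signature after the design insight: the destination is now a type,
    // not a string. (All names here are made up.)
    record Address(String countryCode, String postalCode) {}

    class ShippingCalculator {
        long costInCents(double weightKg, Address destination) {
            return 1499;
        }
    }

    // Every existing test that called costInCents(2.5, "US") no longer compiles
    // and must be rewritten. For one test that is trivial; for hundreds of
    // granular unit tests it is a real drag on change.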

The Counter-Claim: That Lots of Unit Tests Enhance Agility


TDD fans will claim that having a lot of unit tests enhances agility, because the tests enable one to make changes at will and immediately see the impact in the form of failing tests. That is, the tests provide a "safety net". However, is such a safety net really needed?

For type-unsafe languages such as Ruby, Python (without type annotations), JavaScript (without TypeScript), and Clojure, I would say definitely: small changes can have impacts that are not detected by the language system. But for type-safe languages such as Java, Rust, Scala, Go, and many others, small inconsistencies are almost always detected by the compiler. I have personally refactored large codebases in Java and Go and introduced no new errors in the process. None. The compiler detected myriad inconsistencies, and once all of those were fixed, all of the behavioral (ATDD) tests passed. A unit test safety net was not needed.
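As an illustrative sketch of why the compiler provides much of that safety net in a type-safe language (again, the names are made up): change a method's return type during a refactoring, and every call site that still assumes the old type is rejected at compile time, before any test runs. In Ruby or untyped Python, the same change would surface only at runtime, in whichever code path happened to hit it.

    import java.util.Optional;

    // Before the refactoring, lookup() returned a Customer (possibly null).
    // After the refactoring, it returns Optional<Customer>.
    record Customer(String id, String name) {}

    class CustomerRepository {
        Optional<Customer> lookup(String id) {
            return Optional.empty(); // stub body; the signatures are the point
        }
    }

    class BillingService {
        String billingName(CustomerRepository repo, String id) {
            // The old call pattern no longer compiles:
            //   Customer c = repo.lookup(id);   // error: incompatible types
            // so the compiler forces every call site to be updated:
            return repo.lookup(id).map(Customer::name).orElse("unknown");
        }
    }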

Thus, a unit test safety net is essential for agility when one uses a type-unsafe language. That is not the only reason someone might choose to use TDD: as I said, some people simply prefer it. But the language is a strong consideration.

A Criticism of TDD: Complexity and "Test-Induced Damage"


In order for code to be unit-testable, it must be possible to call its internal functions. But internal functions often require complex inputs, which can be very difficult to construct on the fly as part of test setup. Also, internal functions are sometimes declared "private", so they are not accessible to external test programs.

To overcome these problems, unit test "mocking" tools have features that enable one to "mock" inputs and inject them into the code. The resulting test code is very intricate and confusing. In addition, the design of the application code must sometimes be modified in order to make portions of it visible to tests, and to enable things to be instantiated in abnormal ways. David Heinemeier Hansson calls this "test-induced damage". He claims that the result is code that is actually more error-prone. The added complexity might also increase the attack surface, making the code potentially less secure.
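Here is a small, made-up sketch of the kind of change being criticized: a class that could simply create its own collaborator grows an interface and an extra constructor solely so that a unit test can substitute a fake. The indirection exists for the test, not for the application.

    // The production code originally just did: new SmtpMailer().send(...).
    // To make the class unit-testable, an interface and an injection point are
    // added solely so that a test can supply a fake.
    interface Mailer {
        void send(String to, String body);
    }

    class SmtpMailer implements Mailer {
        public void send(String to, String body) { /* real SMTP call */ }
    }

    class InvoiceNotifier {
        private final Mailer mailer;

        // This constructor exists only for tests.
        InvoiceNotifier(Mailer mailer) { this.mailer = mailer; }

        InvoiceNotifier() { this(new SmtpMailer()); }

        void notifyCustomer(String email) {
            mailer.send(email, "Your invoice is ready");
        }
    }

    // The unit test injects a hand-rolled fake and asserts on what it recorded.
    class FakeMailer implements Mailer {
        String lastRecipient;
        public void send(String to, String body) { lastRecipient = to; }
    }

Whether that indirection is damage or better design is exactly what the counter-claim below is about.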

Years ago I worked on test language design for microchips. Microchips often have structures on them that exist solely to enhance testability, but such structures can introduce problems of their own. A general rule about features that exist for testability is that they should be parallel constructions that do not increase the complexity of the core design itself. No rule is absolute, however: it is always a judgment call.

Indeed, in my own experience with TDD, my tests have had more bugs than my application code, and so intertwining the two might actually decrease confidence. Then again, sometimes testability concerns are major ones, and they can legitimately impact the design.

A quote commonly attributed to Einstein comes to mind: make things as simple as possible, but no simpler.

The Counter-Claim: That Coding for Testability Makes the Code Better


TDD fans claim that the code changes required for unit testing actually result in higher code cohesion and a better design. Be your own judge: watch the "Is TDD Dead?" debates between David Heinemeier Hansson, Kent Beck, and Martin Fowler.

A Criticism: TDD Doesn’t Make Sense for UI Code


The thoroughness of unit testing is usually assessed by measuring "code coverage": the percentage of the code's branch paths that the tests exercise. A code coverage of 100% is considered to be the best, although that does not mean that the code is well tested, or that all required features are covered. Still, it is useful to have a number.
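As a small, made-up illustration of why 100% does not mean well tested: the test below exercises both branches of the method, so a coverage tool reports 100% branch coverage, yet it never checks the rounding rule that the requirements actually care about.

    class FeeCalculator {
        // Requirement (never verified below): fees are rounded up to the next cent.
        double fee(double amount, boolean premium) {
            double rate = premium ? 0.01 : 0.02;
            return amount * rate; // bug: no rounding at all
        }
    }

    class FeeCalculatorTest {
        @org.junit.jupiter.api.Test
        void coversBothBranches() {
            FeeCalculator f = new FeeCalculator();
            // Exercises both the premium and non-premium branches - 100% branch
            // coverage - but the assertions never probe the rounding rule.
            org.junit.jupiter.api.Assertions.assertTrue(f.fee(100.0, true) > 0);
            org.junit.jupiter.api.Assertions.assertTrue(f.fee(100.0, false) > 0);
        }
    }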

One size does not fit all, however. For example, user interface code should arguably have lower test coverage than, say, the core microservices that power a bank's financial transactions, which are called by the user interface code. Most organizations have different code coverage targets for different kinds of components, but I would argue that unit tests are entirely a waste of time for user interface code, and that behavioral tests are preferable.

A Criticism: The "Continuous Integration" Misnomer


TDD is intended to be performed in the context of "continuous integration", aka "CI". That is a process whereby a programmer writes tests and application code, gets the tests to pass locally, and frequently syncs and merges others' changes from the shared code repository; then, when the tests pass locally, the programmer merges his or her own changes into the shared repository, where a "CI server" such as Jenkins re-runs the unit tests to verify that they all pass.

The CI "job" - the execution under Jenkins - is supposed to be "green" - that is, all the tests should pass, assuming that each programmer verifies that they do locally before pushing their code to the shared repository.

That is a powerful process - one that I strongly advocate in the coaching that I do. But I have a problem with it: the original CI process was about unit tests, and it still is. What about integration tests? And what if one is not using TDD, and so one's unit test coverage is not high? If that is the case, then the behavioral (ATDD/BDD) tests are more important - so what is the analogous "cycle" for that?

Today one often hears the term "continuous delivery", aka "CD", but that term is highly ambiguous, especially since Jez Humble and David Farley's original book about DevOps was titled Continuous Delivery.

Another problem is that the term "continuous integration" sounds like one is performing integration tests, but in reality one is running unit tests - not integration tests. So the term is a misnomer - an especially bad and confusing one, given the importance of integration tests.

It did not always matter. When TDD and CI were born, most applications were monolithic, and so unit tests usually were, to a large degree, integration tests. Not so anymore: today systems are designed as highly distributed components, and the problems tend to be in the interactions among those components - something that unit tests do not exercise.
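Here is a small, made-up sketch of the kind of problem that unit tests miss: two components can each have passing unit tests yet disagree about the contract between them - below, the producer emits a field named "customer_id" while the consumer reads "customerId". Only a test that actually runs the two together (or an explicit contract test) will catch it.

    import java.util.Map;

    // Producer side: builds the payload with a snake_case key.
    class OrderProducer {
        Map<String, String> payload(String customerId) {
            return Map.of("customer_id", customerId);
        }
    }

    // Consumer side: expects a camelCase key.
    class OrderConsumer {
        String customerOf(Map<String, String> payload) {
            return payload.get("customerId"); // null for the real payload
        }
    }

    class ContractMismatchDemo {
        public static void main(String[] args) {
            // Each side's own unit test uses its own fixture, and passes:
            assert new OrderProducer().payload("42").containsKey("customer_id");
            assert "42".equals(new OrderConsumer()
                    .customerOf(Map.of("customerId", "42")));

            // Only when the two are actually wired together does the mismatch show:
            String result = new OrderConsumer()
                    .customerOf(new OrderProducer().payload("42"));
            System.out.println("Integrated result: " + result); // prints null
        }
    }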

So we really need a term for the continuous testing of integration level tests, at all levels of integration, right up to the outermost level, which is often called "end-to-end" testing. I will call that "continuous system integration" - "CSI" - not to be confused with the TV show!


[1] https://petapixel.com/2019/07/05/goodbye-aberration-physicist-solves-2000-year-old-optical-problem/

[2] https://en.wikipedia.org/wiki/Quaternion#History
