I've been trying to write an article about testing for the last five years or so. It's all too much. This stream is an attempt to capture some ideas individually, incrementally, instead of producing a magnum opus.
How do you decide what to test?
Test every line of code you intend to keep. By "intend to keep", I mean, check in to version control. (Or, if you are in a dysfunctional environment that runs code that isn't in version control, any code you will email to a colleague or save on a production server or otherwise keep around.)
This sounds a little glib, but actually it's an important part of how tests influence design.
If you care that something keeps working, then you should write a test for it. Of course, if you don't care whether it works or not, then you can just skip writing the test, and save yourself the time!
The rule has its "intend to keep" caveat for a reason, though. Some code exists only to figure out how something works, or as a script that you will literally run once and then throw away. Assuming you have the conviction to actually throw it away, it would often be a waste of time to verify that it really works right, when the act of running it is itself that verification.
There are several purposes that tests have, and they're frequently confused.
The purpose of unit tests is to verify that you understand what an individual unit of your code does at the moment that you write it.
A "unit" is a function or a method; some part of your program that has a discrete amount of behavior.
That is the only purpose of a "pure" unit test, and it is possibly the most important aspect of testing, although not the only one.
As a labor-saving measure, you can have unit tests serve other purposes as well.
Unit tests should be written before the code they are testing, so that you can see the test go from red (failing) to green (passing).
Unit tests should always fail first. To those without experience with test-driven development, this can often seem tedious or obvious: of course the test is useful, look, it passes! However, it is surprisingly easy to write a test that does nothing, or passes without changing the system under test.
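To make the red-to-green cycle concrete, here's a minimal sketch in Python. The `textutils` module and its `slugify` function are invented for the example; the test is written before they exist, so it starts out red, and making it pass is the act of writing the code.

```python
# test_slugify.py - written before the implementation exists, so it fails
# first (red); implementing slugify is what turns it green.
import unittest

from textutils import slugify  # hypothetical module and function under test


class SlugifyTests(unittest.TestCase):
    def test_spaces_become_dashes(self):
        # Captures my understanding of this unit's behavior at the moment
        # I write it.
        self.assertEqual(slugify("Hello World"), "hello-world")


if __name__ == "__main__":
    unittest.main()
```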
All other tests can be written later. Integrating multiple components requires multiple working components; if you're going to write tests before you write code, then you by definition don't have all your components ready yet.
A regression test is a test whose job is to stick around and make sure that, as a system evolves, it keeps working.
Most unit tests are also regression tests, since if you're writing a unit test, the property which the test verifies is generally important for some period of time.
An integration test is a test that verifies that two or more units in a system function together. Integration tests also make good regression tests, because you want to make sure that as you change individual bits of your system, the whole keeps functioning together.
Unit tests can function as integration tests as well, but it's dangerous to make them take on too much of that responsibility. A good unit test is small, clear, and to the point; its failure will quickly point you at exactly what's broken. A good integration test is almost the opposite: it's comprehensive and realistic. A good unit test should sacrifice as much realism and comprehensiveness as necessary to achieve brevity and clarity for testing the one thing that it's focused on. A good integration test should pull in fully-configured, real versions of all the components that it's putting together.
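To make that contrast concrete, here's a small sketch; the names are invented for the example. The unit test swaps in a trivial fixed-rate stand-in so that a failure points straight at the one thing it's focused on, while the integration test wires up the real, fully-configured collaborator.

```python
# Hypothetical example: an Invoice depends on a TaxPolicy collaborator.
import unittest


class TaxPolicy:
    """The real implementation: imagine it loading regional tax tables."""
    def rate_for(self, region):
        return {"CA": 0.0725, "OR": 0.0}.get(region, 0.05)


class Invoice:
    def __init__(self, policy):
        self.policy = policy

    def total(self, subtotal, region):
        return subtotal * (1 + self.policy.rate_for(region))


class InvoiceUnitTests(unittest.TestCase):
    def test_total_applies_rate(self):
        # Small, clear, to the point: a stand-in policy with one fixed rate,
        # so a failure points directly at Invoice.total.
        class FixedRate:
            def rate_for(self, region):
                return 0.10

        self.assertAlmostEqual(Invoice(FixedRate()).total(100, "CA"), 110.0)


class InvoiceIntegrationTests(unittest.TestCase):
    def test_invoice_with_real_policy(self):
        # Comprehensive and realistic: the real TaxPolicy, really configured.
        self.assertAlmostEqual(Invoice(TaxPolicy()).total(100, "CA"), 107.25)


if __name__ == "__main__":
    unittest.main()
```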
Functional tests make sure that some function of the system is satisfied. These are integration tests that integrate everything. (Ironically, "functional tests" are almost the opposite of "functional programs", since "functional programs" eschew side-effects wherever possible, but "functional tests" should have side-effects whereas unit, regression, and integration tests should not.)
The distinction between an integration test and a functional test is that an integration test just puts together some number of components of the program; a functional test exercises some behavior of the whole, assembled program.
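A sketch of what that might look like, assuming a hypothetical command-line program called wordcount.py: the test drives the whole, assembled program from the outside, the way a user would, rather than importing any one piece of it.

```python
# Hypothetical functional test: run the assembled program end to end.
import subprocess
import sys
import unittest


class WordCountFunctionalTests(unittest.TestCase):
    def test_counts_words_end_to_end(self):
        # Exercise the real entry point (wordcount.py is assumed to exist),
        # not an individual component of it.
        result = subprocess.run(
            [sys.executable, "wordcount.py"],
            input="hello world\n",
            capture_output=True,
            text=True,
        )
        self.assertEqual(result.returncode, 0)
        self.assertIn("2", result.stdout)


if __name__ == "__main__":
    unittest.main()
```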
Acceptance tests are tests that are used to determine whether a software system is fit for purpose (acceptable to its user, or to its customer).
Acceptance tests are one step beyond functional tests: rather than testing that the system works, they test that it works well enough for an actual human being to use. Acceptance tests typically look exclusively at the visible artifacts of the software – pixels on the screen, for example – and not its internal state.
Automated tests are those which are run, automatically, by computer. By definition, unit tests and regression tests are automated. Integration tests are almost always automated, functional tests should be automated to some degree, and acceptance tests generally have non-automated (and non-automatable) components to them, where an actual human being inspects the output.
There's a whole confusing taxonomy of the various kinds of things your tests can test against which aren't "real", and I'm not going to repeat it here. Many of the reasons I covered above indicate why you might want your fakes to be more or less real.
One particular term I use a lot which I don't see covered elsewhere in the literature is verified fake.
When you write a library, you provide an implementation of the thing the library does. But if your library does I/O (makes an HTTP request, generates an HTTP response, pops up a window, logs a message, whatever), you've just introduced a new barrier to testing: callers of your library might want to test their code that is talking to your thing, and how are they supposed to figure out if your thing did what they wanted it to?
A good library - and the libraries that I maintain are struggling to be "good" in this sense, for the most part they're not - will provide you with a real (i.e. not a fake, double, stub, mock, or dummy) in-memory implementation of its functionality. One of the best examples of this is SQLite. If you need to test code that uses SQLite, you just make an in-memory SQLite database and supply it; there's virtually no reason to fake out the database.
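For example, a test against an in-memory SQLite database exercises exactly the same engine that production code would use; the save_user function here is invented for illustration.

```python
# SQLite's in-memory mode is a real implementation, not a fake.
import sqlite3
import unittest


def save_user(connection, name):
    # Hypothetical code under test: writes a row using whatever connection
    # it is handed.
    connection.execute("INSERT INTO users (name) VALUES (?)", (name,))
    connection.commit()


class SaveUserTests(unittest.TestCase):
    def test_user_is_persisted(self):
        connection = sqlite3.connect(":memory:")
        connection.execute("CREATE TABLE users (name TEXT)")
        save_user(connection, "alice")
        rows = connection.execute("SELECT name FROM users").fetchall()
        self.assertEqual(rows, [("alice",)])


if __name__ == "__main__":
    unittest.main()
```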
One step removed from this is providing a verified fake - an implementation of your functionality which, unlike an in-memory SQLite database, doesn't do anything "useful", but is nevertheless verified against (a subset of) the same test suite as the real implementation, and which provides an introspection API that allows test cases to verify that it did the right thing. This allows client code to import the fake from your library, test against it, and have a reasonable level of assurance that their code is correct in terms of how it's using the API. When they upgrade your library and its interface has changed, their tests will start failing.
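Here's a sketch of that pattern; the mailer library and every name in it are hypothetical, but the shape - a real implementation, a fake shipped alongside it, and a shared contract suite run against both - is the point.

```python
# A sketch of the verified-fake pattern; all names here are invented.
import unittest


class SMTPMailer:
    """The real implementation: validates, then does actual network I/O."""

    def send(self, recipient, body):
        if "@" not in recipient:
            raise ValueError("invalid recipient")
        raise NotImplementedError("real SMTP delivery elided from this sketch")


class FakeMailer:
    """Shipped by the library for its callers' tests; does nothing 'useful',
    but records what happened so client tests can introspect it."""

    def __init__(self):
        self.sent = []  # the introspection API

    def send(self, recipient, body):
        if "@" not in recipient:
            raise ValueError("invalid recipient")
        self.sent.append((recipient, body))


class MailerContractMixin:
    """The (subset of the) test suite both implementations are verified against."""

    def test_rejects_invalid_recipient(self):
        with self.assertRaises(ValueError):
            self.mailer.send("not-an-address", "hi")


class RealMailerContractTests(MailerContractMixin, unittest.TestCase):
    def setUp(self):
        self.mailer = SMTPMailer()


class FakeMailerContractTests(MailerContractMixin, unittest.TestCase):
    def setUp(self):
        self.mailer = FakeMailer()


# Client code imports FakeMailer, tests against it, and uses .sent to check
# what its own code did; if the library's interface changes, this test fails.
class ClientCodeTests(unittest.TestCase):
    def test_welcome_mail_is_sent(self):
        mailer = FakeMailer()
        mailer.send("alice@example.com", "welcome!")
        self.assertEqual(mailer.sent, [("alice@example.com", "welcome!")])


if __name__ == "__main__":
    unittest.main()
```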
Tests which use an unverified fake have a maintenance burden: they must manually keep the fake up to date with every version bump on the real implementation.
Tests which use a real implementation end up relying on lots of unimportant details, and can be unreliable and flaky, since real external systems (even systems you might not usually think of as "external", like the filesystem, or your operating system's clock) have non-deterministic failure modes.
Tests which use a verified fake get the benefits of a unit test (reliability, speed, simplicity) with the benefits of an integration test (assurance that it "really works", notification of breakage in the event of an upgrade) because they place the responsibility for maintenance of the fake along with the responsibility for the maintenance of the interface and its implementation.