As software developers, we’ve all come across those annoying, not-so-useful error messages when using some library or framework: "Couldn’t parse config file", "Lacking permission for this operation", etc. Ok, ok, so something went wrong apparently; but what exactly? What config file? Which permissions? And what should you do about it? Error messages lacking this kind of information quickly create a feeling of frustration and helplessness.
So what makes a good error message then? To me, it boils down to three pieces of information which should be conveyed by an error message:
Context: What led to the error? What was the code trying to do when it failed?
The error itself: What exactly failed?
Mitigation: What needs to be done in order to overcome the error?
Let’s dive into these individidual aspects a bit. Before we start, let me clarify that this is about error messages created by library or framework code, for instance in form of an exception message, or in form of a message written to some log file. This means the consumers of these error messages will typically be either other software developers (encountering errors raised by 3rd party dependencies during application development), or ops folks (encountering errors while running an application).
That’s in contrast to user-facing error messages, for which other guidance and rules (in particular in regards to security concerns) should be applied. For instance, you typically should not expose any implementation details in a user-facing message, whereas that’s not that much of a concern — or on the contrary, it can even be desirable — for the kind of error messages discussed here.
In a way, an error message tells a story; and as with every good story, you need to establish some context about its general settings. For an error message, this should tell the recipient what the code in question was trying to do when it failed. In that light, the first example above, "Couldn’t parse config file", is addressing this aspect (and only this one) to some degree, but probably it’s not enough. For instance, it would be very useful to know the exact name of the file:
Couldn’t parse config file: /etc/sample-config.properties"
Using an example from Debezium, the open-source change data capture platform I am working on in my day job, the second message could read like so with some context about what happened:
Failed to create an initial snapshot of the data; lacking permission for this operation
Coming back to error messages related to the processing of some input or configuration file, it can be a good idea to print the absolute path. In case file system resources are provided as relative paths, this can help to identify wrong assumptions around the current working directory, or whatever else is used as the root for resolving relative paths. On the other hand, in particular in case of multi-tenant or SaaS scenarios, you may consider filesystem layouts as a confidential implementation detail, which you may prefer to not reveal to unknown code you run. What’s best here depends on your specific situation.