Designing a Fault Injection Framework
Over at CDOT we have another phase of the log centralization project that involves producing a fault injection framework. This framework should be:
- Modular, such that different faults and fault cases can be applied simply
- Cross-platform, such that this framework can be run from Windows or Linux
- Easy to use
- Able to support programmatic definition of tests
- Able to support non-programmatic definition of tests
Approaching this problem, we decided to borrow the ideas of test suites and test cases from the various unit testing frameworks. We took the idea of a test case and made a fault case, which would be a combination of faults applied at once. From there we expanded outward and made fault suites, which would be combinations of fault cases with different settings. This setup would allow for modularity both in the individual fault cases and in running them together as one suite.
To approach the problem of having multiple types of fault, we decided to use a plugin system. Using the template method design pattern would allow us to define how to inject a fault and how to check whether the fault was successfully injected. With this common definition of how a plugin operates, not only could we create many different types of fault plugins easily, but so could others. Then, fault cases would simply need to be combinations of plugin calls.
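To make the plugin idea concrete, here is a minimal Python sketch of a template method plugin. The names FaultPlugin, inject, verify, run and StopServiceFault are made up for illustration and are not the framework's actual API. The example plugin only records its target in memory instead of touching a real service.

```python
from abc import ABC, abstractmethod


class FaultPlugin(ABC):
    """Hypothetical plugin base class (an assumption, not the real API)."""

    def run(self, target):
        # Template method: the order of steps is fixed here,
        # while subclasses supply the individual steps.
        self.inject(target)
        return self.verify(target)

    @abstractmethod
    def inject(self, target):
        """Apply the fault to the target."""

    @abstractmethod
    def verify(self, target):
        """Return True if the fault was successfully injected."""


class StopServiceFault(FaultPlugin):
    """Illustrative plugin that only simulates stopping a service."""

    def __init__(self):
        self.stopped = set()

    def inject(self, target):
        # A real plugin would stop a service here; this one just records it.
        self.stopped.add(target)

    def verify(self, target):
        return target in self.stopped
```

Calling StopServiceFault().run("node-a") performs the injection and then the check in that fixed order, so a new fault type only has to fill in the two steps.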
We additionally had two different kinds of fault cases: faults that applied to a single node (server) and faults that applied to multiple nodes. To handle both, we decided to have two different types of plugins: single-node and multi-node. The difference between the two would be that single-node plugins take in, you guessed it, information about a single node, while multi-node plugins take in information about more than one.
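The split between the two plugin types could look something like the sketch below. Everything in it is an assumption made for illustration: the Node fields, the class names, and the choice to return a description of the fault rather than performing it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one server."""
    name: str
    address: str


class SingleNodePlugin(ABC):
    @abstractmethod
    def inject(self, node: Node) -> str:
        """Inject a fault using information about one node."""


class MultiNodePlugin(ABC):
    @abstractmethod
    def inject(self, nodes: List[Node]) -> str:
        """Inject a fault using information about several nodes."""


class KillProcessFault(SingleNodePlugin):
    def inject(self, node: Node) -> str:
        # Only describes the action; a real plugin would perform it.
        return f"kill process on {node.name}"


class PartitionFault(MultiNodePlugin):
    def inject(self, nodes: List[Node]) -> str:
        if len(nodes) < 2:
            raise ValueError("a multi-node fault needs at least two nodes")
        names = ", ".join(n.name for n in nodes)
        return f"partition network between {names}"
```

Keeping the two signatures separate means a fault case can tell from the plugin type alone whether it has to supply one node or a group of them.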
The overall design of our system would apply the template method generously to allow the user to define:
- How a plugin would inject faults
- Which plugins a fault case should use
- Which fault cases should be part of a suite
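As a rough sketch of how these three levels might fit together, the Python below builds a suite both in code and from a JSON document, which is one possible way to cover the programmatic and non-programmatic goals. The plugin registry, the class names, and the JSON layout are all assumptions made for illustration. The plugins only return descriptions of what they would do.

```python
import json


def stop_service(node):
    # Hypothetical plugin: describes the fault instead of performing it.
    return f"stop service on {node}"


def fill_disk(node):
    return f"fill disk on {node}"


# Hypothetical registry mapping plugin names to plugin callables.
PLUGINS = {"stop_service": stop_service, "fill_disk": fill_disk}


class FaultCase:
    """A combination of faults applied together."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # list of (plugin name, node name) pairs

    def run(self):
        return [PLUGINS[plugin](node) for plugin, node in self.steps]


class FaultSuite:
    """A collection of fault cases run as one unit."""

    def __init__(self, cases):
        self.cases = cases

    def run(self):
        return {case.name: case.run() for case in self.cases}

    @classmethod
    def from_json(cls, text):
        # Non-programmatic definition; this JSON layout is an assumption.
        data = json.loads(text)
        cases = [
            FaultCase(c["name"], [(s["plugin"], s["node"]) for s in c["steps"]])
            for c in data["cases"]
        ]
        return cls(cases)
```

A suite built with FaultSuite([FaultCase(...)]) and one loaded through FaultSuite.from_json behave the same way when run, so the choice between the two is only about how the definition is written down.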
This setup would also allow for the automated production of fault cases and suites from definitions of the nodes and their relationships, which I will be writing about in the next post.