TDD Regex Kata

For the past couple of years, we have used an exercise in several of the training courses we lead. Its purpose is to show that maintaining too narrow a view during TDD can lead to a very degenerate process and equally degenerate results.

The statement of the exercise is simple: produce a regex matcher that can match a single character, sequences of characters, optional characters (‘?’ in common regex notation), one or more repetitions of a character (‘+’), zero or more repetitions of a character (‘*’, the Kleene star) and, finally, groups (i.e. parenthesised patterns) that can also have the cardinalities previously listed.

Note that the exercise is about producing a regex matcher, not a regex compiler, so anyone who wants to try this shouldn’t try to build the model of a regex from the usual regex-style strings. Instead, our tests will construct that model directly in code and then execute it against strings that may or may not match.
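To make "constructing that model directly in code" concrete, here is a minimal sketch in Python of one possible shape for such a model. The class names (Pattern, Char, Seq, Opt, Plus, Star), the ends/matches methods and the choice to match the whole string are all assumptions made for illustration. None of them comes from the exercise itself, and a solution grown test by test may well look quite different.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


class Pattern:
    """Hypothetical base class: a pattern reports where a match may end."""

    def ends(self, text: str, start: int) -> Iterator[int]:
        raise NotImplementedError

    def matches(self, text: str) -> bool:
        # Assumption: a match must consume the whole string.
        return any(end == len(text) for end in self.ends(text, 0))


@dataclass(frozen=True)
class Char(Pattern):
    ch: str

    def ends(self, text, start):
        if start < len(text) and text[start] == self.ch:
            yield start + 1


@dataclass(frozen=True)
class Seq(Pattern):
    """A sequence of patterns; it also plays the role of a group."""
    parts: Tuple[Pattern, ...]

    def ends(self, text, start):
        if not self.parts:
            yield start
            return
        head, rest = self.parts[0], Seq(self.parts[1:])
        for mid in head.ends(text, start):
            yield from rest.ends(text, mid)


@dataclass(frozen=True)
class Opt(Pattern):
    inner: Pattern

    def ends(self, text, start):
        yield from self.inner.ends(text, start)  # try consuming first
        yield start                              # then try skipping


@dataclass(frozen=True)
class Plus(Pattern):
    inner: Pattern

    def ends(self, text, start):
        for mid in self.inner.ends(text, start):
            yield mid
            if mid > start:  # stop if the inner pattern consumed nothing
                yield from self.ends(text, mid)


@dataclass(frozen=True)
class Star(Pattern):
    inner: Pattern

    def ends(self, text, start):
        # Zero or more is treated as an optional one-or-more.
        yield from Opt(Plus(self.inner)).ends(text, start)


# The model for "(ab)*c?" built directly, with no regex string parsed.
model = Seq((Star(Seq((Char("a"), Char("b")))), Opt(Char("c"))))
print(model.matches("ababc"))
print(model.matches("aba"))
```

In this sketch a group is simply a Seq wrapped in Opt, Plus or Star, so the cardinalities apply to groups in the same way as to single characters.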

The order in which the capabilities of the regex are listed above is our suggestion for the implementation order. Consider them to be a sequence of stories, each of which enhances the capability developed previously.

As an additional part of the exercise, once the optional character part has been implemented, we ask the course attendees to construct instances of their matchers that apply a regex with n optional ‘a’ characters followed by n required ‘a’ characters to a string containing 2n ‘a’ characters, where n varies between 1 and 25. For example, with n = 3, the corresponding regex is “a?a?a?aaa” and the string to match is “aaaaaa”.
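The family of cases described above can be generated mechanically. The sketch below, in Python, is only an illustration: the representation of a pattern as a list of (character, is_optional) pairs and the helper names optional_then_required, render and matches are hypothetical, and the small recursive matcher stands in for whatever matcher the attendee has written.

```python
def optional_then_required(n):
    """Return (pattern, text): n optional 'a's, then n required 'a's, and 2n 'a's."""
    pattern = [("a", True)] * n + [("a", False)] * n
    text = "a" * (2 * n)
    return pattern, text


def render(pattern):
    """Show the pattern in the usual regex notation, for display only."""
    return "".join(ch + ("?" if optional else "") for ch, optional in pattern)


def matches(pattern, text, p=0, t=0):
    """Stand-in matcher: whole-string match over characters and optional characters."""
    if p == len(pattern):
        return t == len(text)
    ch, optional = pattern[p]
    # Try consuming a character first, then fall back to skipping an optional one.
    if t < len(text) and text[t] == ch and matches(pattern, text, p + 1, t + 1):
        return True
    return optional and matches(pattern, text, p + 1, t)


pattern, text = optional_then_required(3)
print(render(pattern), text)  # a?a?a?aaa aaaaaa

for n in range(1, 26):
    pattern, text = optional_then_required(n)
    assert matches(pattern, text)
```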

If you try this kata, please let me know how you get on, what difficulties you have and anything else interesting that you find.
