Semi-Function: a Missing Tool to Handle Complexity in Imperative Code
When structuring source code, there is one topic on which the opinions of software designers don’t converge. This is whether to extract the code into many smaller functions or to combine the code into a few larger functions. In fact, different respected authors advocate for exactly opposite approaches: from Robert Martin’s “Extract till you Drop” to John Ousterhout’s “Modules should be deep”.
This seemingly unresolvable tension has led me to the idea that there might be a structure missing in programming languages and tooling (editors, IDEs, version control systems, etc.) for functions that are called from exactly one place in some other function. I’ll call them “semi-functions”. A semi-function is similar to a function defined within another function and called immediately. I use the example from Robert Martin’s “Extract till you Drop” post referenced above, converted to Kotlin syntax:
fun replace(): String {
    val symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)")
    val symbolMatcher = symbolPattern.matcher(stringToReplace)
    while (symbolMatcher.find()) {
        val symbolName: String = symbolMatcher.group(1)
        run replaceAllInstances(symbolName) {
            if (getSymbol(symbolName) != null &&
                symbolName !in alreadyReplaced) {
                alreadyReplaced.add(symbolName)
                stringToReplace = stringToReplace.replace(
                    "$" + symbolName, translate(symbolName))
            }
        }
    }
    return stringToReplace
}
However, it’s not quite that simple. You may say that the code above is not much better than, or perhaps even worse than, the original version of the code (before extracting the replaceAllInstances() function) which Martin refactors in his post. To make semi-functions really useful, code browsers, IDEs, and other presentation tools should go beyond treating source code files as flat text and support semi-functions as intended. I explain what I mean in detail below.
Semantics and syntax
In many programming languages, it’s already possible to define functions within other functions and even call them without a separate statement. For example, here is what our example might look like in JavaScript:
function replace() {
    var symbolRegexp = /\$([a-zA-Z]\w*)/g;
    var result;
    while ((result = symbolRegexp.exec(stringToReplace)) !== null) {
        var symbolName = result[1];
        (function replaceAllInstances(symbolName) {
            if (getSymbol(symbolName) != null &&
                !alreadyReplaced.includes(symbolName)) {
                alreadyReplaced.push(symbolName);
                // replaceAll(): replace() with a string pattern only
                // replaces the first occurrence in JavaScript.
                stringToReplace = stringToReplace.replaceAll(
                    "$" + symbolName, translate(symbolName));
            }
        })(symbolName);
    }
    return stringToReplace;
}
Apart from the inconvenience of repeating symbolName at the opposite ends of the function declaration, there would also be semantic differences:
- A semi-function doesn’t have access to the local variables from the scope of the containing function where the semi-function is defined, apart from the explicitly propagated ones, which look like parameters to the semi-function. In this regard, a semi-function is closer to an ordinary function than to an inner function or a lambda as they exist in most modern programming languages.
- A semi-function runs exactly once. It cannot call itself recursively (unlike the function in the JavaScript example above), nor can the containing function call the semi-function multiple times after defining it.
A semi-function can be declared static so that it doesn’t have access to the fields of the containing class in object-oriented languages:
stringToReplace = run static replaceAllInstances(
        stringToReplace, alreadyReplaced, symbolName) {
    if (getSymbol(symbolName) != null &&
        symbolName !in alreadyReplaced) {
        alreadyReplaced.add(symbolName)
        return stringToReplace.replace(
            "$" + symbolName, translate(symbolName))
    }
    // Nothing to replace: hand the string back unchanged.
    return stringToReplace
}
Fields and variables can also be “renamed” when passed into a semi-function and their type can be declared more general, or just declared explicitly if the developer prefers:
stringToReplace = run static replaceAllInstances(
        stringToReplace,
        alreadyReplaced: Collection<String> = alreadyReplaced,
        symbol: String = symbolName) {
    ...
}
Semantically and syntactically, semi-functions are much like normal functions. On the other hand, since semi-functions are run exactly once at the declaration site, their runtime cost can always be zero, just like simple code blocks in C-like programming languages. In fact, code blocks are the closest approximation of semi-functions available in these languages at the moment. Here is valid Java code equivalent to our replaceAllInstances() semi-function:
{ // replaceAllInstances(symbolName)
    if (getSymbol(symbolName) != null &&
        !alreadyReplaced.contains(symbolName)) {
        alreadyReplaced.add(symbolName);
        stringToReplace = stringToReplace.replace(
            "$" + symbolName, translate(symbolName));
    }
}
In Kotlin, there are no simple code blocks like in C, C++, C#, and Java, but the same effect can be achieved with the standard run() function, which makes the syntax even closer to semi-functions as proposed in this post:
run { // replaceAllInstances(symbolName)
    ...
}
Presentation
In all code snippets above, semi-functions are presented as ordinary structures in the flat source code. This may be a reasonable fallback. However, I think the time has come for IDEs and editors to go beyond flat text to enable more efficient work with code.
By default, semi-functions should be folded and look almost exactly like ordinary function calls. Here is how this may look in Sublime Text editor:
When a developer performs a “go to symbol” action (Ctrl + click in most IDEs) or double-clicks on the semi-function indicator (three small dots on the screenshot above), the semi-function declaration unfolds into a form similar to the plain text examples above. At the same time, the surrounding text (the “main” source code layer, or the “previous” layer in case of nested semi-functions) may be greyed out to emphasize the semi-function’s body. The IDE can also visually add the unspecified parameter types to the unfolded semi-function declaration.
The difference between semi-function folding and the ordinary code folding available in editors and IDEs is that whenever a developer navigates to a different file, or scrolls the current file away from the unfolded semi-function body, the body automatically collapses to the “function call” presentation unless it is pinned unfolded. This is important because developers will never fold semi-function declarations manually. If the declarations don’t fold themselves, they will all be unfolded soon, and semi-functions will fail to hide details and provide an opportunity to browse the logic at a higher level of abstraction.
If a semi-function has a documentation comment, the comment is folded together with the function declaration and is visible only when the semi-function’s declaration is unfolded. This differs from how the function folding works in IDEs on the IntelliJ platform, where documentation comments and function bodies are folded separately.
Tooling support
- Although semi-functions can’t be called from production code other than at the declaration site, it should be possible to call them from test code to enable testing of semi-functions. Thanks to Vladimir Sitnikov and Ralf Westphal for pointing to this important aspect in comments.
- Semi-functions should appear in stack traces of exceptions, during debugging, in profiling and code coverage tools, etc.
- In IDEs, code browsers and search engines, semi-functions should be indexed just like ordinary functions.
- In IDEs and editors, semi-functions should be presented in the module’s outline whenever it appears in the interface, nested with respect to their containing functions.
- It should be possible to link to a semi-function from comments (e. g. using {@link #replaceAllInstances} in Java). IDEs and editors should support navigation via such links and update them when the linked semi-function is renamed.
- In git diff and Git-related patch browsing interfaces like GitHub’s commit and diff viewer, the hunk header should include the whole stack of semi-function declarations up to an ordinary function, for example:
@@ -27,7 +27,7 @@ fun replace(): String {
@@ run replaceAllInstances(symbolName) {
symbolName !in alreadyReplaced) {
alreadyReplaced.add(symbolName)
stringToReplace = stringToReplace.replace(
- "$" + symbolName, translate(symbolName))
+ "%$symbolName%", translate(symbolName))
}
}
}
- Code complexity measurement tools shouldn’t add the cyclomatic complexity, depth of nesting, number of lines and other complexity metrics of semi-functions to the containing function’s score. Or, perhaps, there should be different limits for these metrics before the tool issues a warning or rejects the code during CI for functions with and without nested semi-functions. On the other hand, declarations of semi-functions should contribute to the containing function’s fan-out score.
- Similarly to the complexity metrics, the source code’s column count should be restarted at semi-functions’ bodies. This way, deeply nested semi-functions don’t have to be formatted unreadably to fit into the maximum of 100 or 120 columns prescribed by the project’s code style. IDEs and editors may support this by shifting the source code pane to center the semi-function’s body horizontally within the source code window when the semi-function is unfolded.
Advantages of semi-functions over extracting ordinary functions
- Semi-functions can’t be called from multiple places or recursively, which is a feature. Semi-functions thus don’t pollute the containing module’s (internal) interface. A semi-function can’t be called by mistake instead of another function from anywhere in the module. If some function needs to be called from multiple places, an ordinary (inner) function should be declared rather than a semi-function.
- First-level, ordinary functions in the containing module can be of the same level of abstraction, rather than being a mix of functions of different levels of abstraction. At least, the variability of the levels of abstraction can be smaller.
- Even when fully unfolded, semi-functions don’t contribute to the vertical size of the module as much as ordinary functions. The latter usually have to be separated with empty lines according to the coding style.
- When all semi-functions within an ordinary function are unfolded recursively, a developer can glance at them together, with all the logic (statements) in the order of execution. This would make it easier to understand the overall logic of the containing ordinary function “across the stack” and to spot a bug than when jumping back and forth between extracted ordinary functions and “making function calls mentally”.
Pros of declaring semi-functions within large ordinary functions
- Limiting the function’s complexity and the number of variables available within the bodies of semi-functions. All variables used within a semi-function should be explicitly declared as parameters. This reduces the chance of using the wrong variable by mistake.
- Semi-functions add more structure to the containing function: they must return a single value (which may be a tuple, which looks much like a multi-value return in some languages, but still). This limits the possible “creativity” of the function’s developers: for example, it shouldn’t be possible to return from the containing function (or continue to the next loop iteration in the containing function, or do another form of goto) from within the nested semi-function’s body. This differs from Kotlin’s inline functions, which allow non-local returns, although the runtime cost of semi-functions should be zero, just like the cost of Kotlin’s inline functions.
- When folded, semi-functions allow overviewing the containing function’s logic without digging into the details. At the same time, this folded view is self-sufficient because folded semi-functions appear just like ordinary function calls. Currently, it’s possible to fold nested blocks of code within large functions in most editors and IDEs, but the result lacks the semantic information about what the nested block does and what its “input” and “output” variables are.
- Unlike anonymous code blocks, lambdas and inline functions in most languages, semi-functions must have names. Naming semi-functions essentially forces developers to add comments to blocks of code within long functions. This is one of the main reasons to follow the “extract a lot of small functions” approach in the first place. For example, here is what Kent Beck and Martin Fowler say in “Refactoring”: “A heuristic we follow is that whenever we feel the need to comment something, we write a function instead.”
- Semi-functions can appear in stack traces, be linked to and navigated to within IDEs, appear in the git diff interface, etc. See the “Tooling support” section above.
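To make the remark about non-local returns concrete, here is a minimal Kotlin sketch of the behavior that a semi-function would forbid. The function name firstNegative is made up for illustration. Because forEach is an inline function, a bare return inside its lambda exits the containing function, not just the lambda:

```kotlin
// Hypothetical helper, not from Martin's example: returns the first
// negative number in the list, or null if there is none.
fun firstNegative(values: List<Int>): Int? {
    values.forEach { value ->
        // Non-local return: forEach is inline, so this `return`
        // leaves firstNegative() itself and skips the rest of the loop.
        if (value < 0) return value
    }
    return null
}

fun main() {
    println(firstNegative(listOf(3, -2, 5, -7))) // prints -2
    println(firstNegative(listOf(1, 2)))         // prints null
}
```

A reader who sees only the call to forEach cannot tell that the block may end the containing function early. With a semi-function as proposed here, the only way out of the body would be to return its own value.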
Conclusion
In this post, I’ve presented semi-functions, which combine the features of lambdas and inner functions. Semi-functions would allow taming the internal complexity of large functions without contributing to the complexity of the containing module (e. g. a class) in imperative programming languages.
Most modern programming languages support inner functions: functions declared within other functions. They mostly solve the same problems as semi-functions but, unfortunately, their syntax is a little off for immediate use: the function’s name and the list of parameters have to be essentially repeated twice. This limits the usability of inner functions, and they are used relatively rarely, as far as I can tell.
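As a rough illustration of that repetition, here is the running example written with a Kotlin local function. This is only a sketch: the symbols map is an assumed stand-in for getSymbol() and translate(), and the outer function name replaceWithInnerFunction is made up.

```kotlin
// Assumption: `symbols` plays the role of getSymbol()/translate() in the post.
fun replaceWithInnerFunction(input: String, symbols: Map<String, String>): String {
    var text = input
    val alreadyReplaced = mutableSetOf<String>()
    for (match in Regex("\\$([a-zA-Z]\\w*)").findAll(input)) {
        val symbolName = match.groupValues[1]
        // The name and the parameter list are written once in the declaration...
        fun replaceAllInstances(symbolName: String) {
            if (symbolName in symbols && symbolName !in alreadyReplaced) {
                alreadyReplaced.add(symbolName)
                text = text.replace("$" + symbolName, symbols.getValue(symbolName))
            }
        }
        // ...and essentially once more at the only call site.
        replaceAllInstances(symbolName)
    }
    return text
}

fun main() {
    println(replaceWithInnerFunction("\$a and \$a and \$b", mapOf("a" to "1")))
}
```

Note that the local function still reads and writes text, alreadyReplaced and symbols from the enclosing scope without declaring them, so it is also less strict than a semi-function would be.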
On the other hand, anonymous functions (lambdas, closures) are not as strict as semi-functions: they usually allow automatic capturing of variables from the scope. Lambdas also don’t have names, which is actually very important: the necessity to name functions is one of the driving motivations for the “extract a lot of small functions” approach to code decomposition. The absence of names also makes it impossible to link to and quickly navigate to a specific lambda definition.
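The automatic capturing mentioned above can be seen in a small Kotlin sketch (the function countAndSum is hypothetical and unrelated to Martin's example). Nothing at the lambda's declaration reveals which outer variables it touches:

```kotlin
// Hypothetical example of implicit capture by a lambda.
fun countAndSum(values: List<Int>): Pair<Int, Int> {
    var count = 0
    var sum = 0
    // `count` and `sum` are captured from the enclosing scope and mutated
    // here, although the lambda declares only `value` as its parameter.
    values.forEach { value ->
        count++
        sum += value
    }
    return count to sum
}

fun main() {
    println(countAndSum(listOf(1, 2, 3))) // prints (3, 6)
}
```

With a semi-function, count and sum would have to be passed in explicitly and the new values returned, so the data flow would be visible at the declaration.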
To me, it appears that semi-functions would be useful even in their plain-text, fallback form. Special presentation support from IDEs and editors, namely auto-folding to a form barely distinguishable from an ordinary function call, would make semi-functions even more appealing.
Already existing thing?
I did some cursory research and didn’t find anything quite like semi-functions available in any imperative programming language. Anonymous functions and inner functions are everywhere, but, unfortunately, they are not quite what we need for complexity management, for different reasons. However, it may well be that I missed something. Please point to a language that implements semi-functions if you know one.
P. S. Semi-functions can be a presentation mode in the IDE rather than a programming language construct
After this post was published, I thought more about the problem and the feedback. I realized that semi-functions can be supported entirely in the IDE interface as a “tree presentation mode”, while the source code (when viewed as plain text) has extracted ordinary functions. See the description of this idea for IntelliJ: IDEA-211820.
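For reference, here is a sketch of what the plain-text side of such a “tree presentation mode” might contain: the running example with replaceAllInstances() extracted as an ordinary private function that has exactly one caller. The symbols map and the parameter lists are assumptions made to keep the sketch self-contained; they are not taken from Martin's post or from the IDEA issue.

```kotlin
// Assumption: `symbols` stands in for getSymbol()/translate() in the post.
fun replace(input: String, symbols: Map<String, String>): String {
    var stringToReplace = input
    val alreadyReplaced = mutableSetOf<String>()
    for (match in Regex("\\$([a-zA-Z]\\w*)").findAll(input)) {
        // In a tree presentation, the IDE could unfold the body of
        // replaceAllInstances() inline at this single call site.
        stringToReplace = replaceAllInstances(
            stringToReplace, alreadyReplaced, symbols, match.groupValues[1])
    }
    return stringToReplace
}

// Ordinary extracted function: everything it uses arrives as a parameter,
// much like the "static" semi-function variant described earlier.
private fun replaceAllInstances(
    stringToReplace: String,
    alreadyReplaced: MutableSet<String>,
    symbols: Map<String, String>,
    symbolName: String
): String {
    val translation = symbols[symbolName]
    if (translation != null && symbolName !in alreadyReplaced) {
        alreadyReplaced.add(symbolName)
        return stringToReplace.replace("$" + symbolName, translation)
    }
    return stringToReplace
}

fun main() {
    // prints "Hello Bob, bye Bob, hi $other"
    println(replace("Hello \$name, bye \$name, hi \$other", mapOf("name" to "Bob")))
}
```

The language itself does not enforce the "called from exactly one place" property here, so in this variant it would be up to the IDE or a linter to check it.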