Sunday, April 1, 2018

Code Entropy

Just like in chemistry, code and software projects experience a gradual decline into disorder. I first came across the idea of code entropy while reading The Pragmatic Programmer, but I've seen the process happen in many of the projects I've joined or worked on. Normally, when entropy is brought up, it's to encourage good engineering practices. Refactoring, sound design, and a well-structured codebase are thought to prevent decay. I think these things help, but they can't stop the disorder from growing.
So what's the real problem? Why do things start to fall apart in software projects? Unmaintainable modifications - aka "hacks" - definitely add to entropy. This is sometimes referred to as technical debt. These are modifications that make some functionality work, but make later changes more time-consuming and difficult.
For a non-code example, think of hanging a picture in your house. To do that, you need to find your tools, measure things out, do the hanging work, and then put everything back. What if you're lazy and decide to leave your tools scattered wherever you last used them? That would make your next hanging job more difficult, since you'd have to search all over for the tools you need. That's essentially what technical debt is.
Software engineering practices - refactoring as you go, following design patterns, separating concerns, etc. - all help to reduce technical debt and keep entropy out of your code. But other things creep in and bring about disorder.
For example, using third-party libraries is a common practice that I've seen cause a lot of problems. Don't get me wrong, I like third-party libraries and encourage using them, but they bring their own baggage. Many projects I've worked on have picked a stable version of a library and built on top of it, then rarely upgraded it. There are good reasons for that: the code is working, nobody sees upgrading as a pressing need, it takes time to make sure an upgraded version works as intended, and the more libraries you use, the harder it is to keep them up to date and manage their inter-dependencies.
But falling behind has real costs. The biggest is security. If you aren't taking any updates, you might be exposing yourself to known vulnerabilities in those third-party libraries. You can also get stuck in a situation where you can no longer go back and change the code. I've seen this happen many times with Visual Studio, where I've been unable to open an old project because it requires an old version of Visual Studio or an old plugin that isn't available anymore. The project is literally unworkable until someone pays the large price of updating its libraries and plugins.
On top of those worries, code eventually dies. Say your project takes a hard dependency on another project: your frontend is built on Angular 1, or your backend is built in PHP 5.6 with version 2 of the Zend Framework. Those are perfectly good choices, but at some point Angular 1 won't be supported anymore, and there isn't a clear migration path to the next version. Or PHP 5.6 reaches its end of life, and your code doesn't run on PHP 7. This kind of thing happens all the time, and it leaves your code at a dead end.
Your code may not be disorganized yet, but hacks become much more likely at this point, increasing entropy, because the code can't take advantage of the new features and functions offered by updated dependencies. Your developers are also likely to care less about this "old, legacy code".
I don't think entropy can be completely avoided in code. Sure, you can keep it at bay for a while. To me, that just means software should have a lifespan. Eventually, it will hit a point where the project needs to be shut down. I don't think that's a bad thing. Take what you learned from the project and carry it into the next one.