The perfect, exact science of refactoring



Do you know how sometimes your code has this thing written somewhere, then you need about the same sort of thing somewhere else, then in another place, and you think "huh, guess I'll turn it into a function"? Do you know what's wrong with doing that? It doesn't have a cool enough name.

A normal person would call it re-jiggering. Don't laugh - it's in the dictionary: "to organize differently". It's the kind of word you use when you're improvising, or trying something new, which seems about right for writing a program. "Cleaning up" is also a good phrase. Sometimes writing a program feels like you're leaving a mess as you go, making progress slower and slower. You occasionally need to stop doing real work and clean up the mess.

Here's how it works: you write simple code to make a button. Later, you find out each button's color needs to be settable, which forces you to go back and do a rewrite. Next you're assigned to write drop-downs. The specs don't say they change color, but you won't be fooled again, so you add that ability. Except, it turns out, drop-downs never need to change color. It seems like everything you do is wrong and time-wasting.

It's not about making mistakes. This sort of re-jiggering is the inevitable result of the program changing, and the program always changes, in ways you never would have guessed. The client changes their mind, testing reveals a problem with the specs, new hardware comes along, or you're working through a features list and aren't sure what you'll get to. It gets worse: you may know buttons will need color changing and add it, but then something else forces a total button rewrite, and you have to rip out the now-obsolete color-change code.

But suppose you're the sort of person who thinks programming is a more perfect version of reality. The sort of person who secretly believes the true purpose of programs is to have an inner beauty. It's inconceivable to judge a program by whether it merely does what it's supposed to. You need a word that implies that all of these rewrites are part of an orderly process - a reverent act that contributes to the final perfection of the code. That word is refactoring.

Refactoring is great since it sounds like prime factorization. That's the one correct way of breaking a number into parts. We want that certainty for our programs. We can't have it, but the word helps us pretend that cleaning up our code is an orderly process, following clear rules.

We should give some details about what this actually is, all the various ways we might re-jigger. Sometimes we want to group loose variables into a class so we can easily make copies of them. Suppose we're using 3 variables to describe a border, then realize we can actually have several borders. We'll throw those three vars into a quick class, declaring several copies for several borders. Of course, if we did that and never needed more than one border, all we did was create an awkward, confusing way to use those vars.
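
Here's a minimal Python sketch of that border shuffle. `Border` and its three fields are made-up names for illustration, and a dataclass is just one convenient way to make the "quick class".

```python
from dataclasses import dataclass

# Before: three loose variables describe the one border we have.
border_width = 2
border_color = "black"
border_style = "solid"

# After: the same three values grouped into a quick (hypothetical) class,
# so we can declare as many borders as we need.
@dataclass
class Border:
    width: int = 2
    color: str = "black"
    style: str = "solid"

outer = Border()  # same values the loose vars had
inner = Border(width=1, color="gray", style="dashed")
```

If `inner` never shows up, `Border()` is just a longer way of spelling three variables, which is exactly the trap described above.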

In a similar fashion, we might notice that two different sections each use three vars to make their border. We'd pull them into a class the same way. But if each section turns out to have a specific type of border that only it uses, we'll waste lots of time trying to make that general-purpose border class.

It's common to make class vars private and write interface functions to reach them. Doing that too late can force us to rewrite all the old code that touched the vars directly. But many classes will never get complex. Writing interfaces for them makes it more difficult to see what's actually happening.
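
A rough Python sketch of the two styles, using hypothetical counter classes. Python has no truly private vars; the leading underscore is only a convention. A `property` keeps old reads of `.count` working, although old code that assigned to `.count` directly would still have to change.

```python
class Counter:
    """Direct-access version: anyone can read or write `count`."""

    def __init__(self):
        self.count = 0


class GuardedCounter:
    """Interface version: `_count` is private by convention and only
    changes through methods, so checks can be added in one place."""

    def __init__(self):
        self._count = 0

    @property
    def count(self):
        # Read-only view: `c.count` still works, `c.count = 5` does not.
        return self._count

    def increment(self, step=1):
        if step < 0:
            raise ValueError("step must be non-negative")
        self._count += step
```

For a class this small, the first version is arguably easier to read, which is the point of the paragraph above.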

We might have a class with lots and lots of useful options, but in practice we only use it in three configurations. Maintaining all of those combinations of options we never use is difficult, or impossible. So we sometimes break an overly general class into more specific-use classes.
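
A small sketch of that split, with invented names. The general `Widget` has three on/off options, giving eight combinations, but suppose only three ever get used. Each of those becomes its own class.

```python
class Widget:
    """Over-general: 8 possible option combinations, only 3 ever used."""

    def __init__(self, label, clickable=False, editable=False, multiline=False):
        self.label = label
        self.clickable = clickable
        self.editable = editable
        self.multiline = multiline


# Specific-use (hypothetical) classes, one per configuration we actually use.
class Button:
    def __init__(self, label):
        self.label = label
        self.clickable = True


class TextField:
    def __init__(self, label):
        self.label = label
        self.editable = True


class TextArea(TextField):
    def __init__(self, label):
        super().__init__(label)
        self.multiline = True
```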

Or we do the opposite. Sometimes our specific-use classes turn out to have more in common than we thought. We might combine them into a general-purpose class. Or we might pull out common parts into mini-classes.
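
Here's one way the mini-class version might look, assuming (hypothetically) that buttons and drop-downs both turned out to need a position, a size, and a hit test. `Rect` and `contains` are illustrative names only.

```python
from dataclasses import dataclass


# Hypothetical mini-class holding the parts Button and DropDown share.
@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, px, py):
        # Left/top edges are inside, right/bottom edges are outside.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


class Button:
    def __init__(self, rect, label):
        self.rect = rect
        self.label = label


class DropDown:
    def __init__(self, rect, choices):
        self.rect = rect
        self.choices = list(choices)
        self.selected = 0
```

Each widget keeps its own specific parts and borrows the shared ones, rather than everything being merged into one big class.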

Sometimes a function is hard-coded to use a set of globals. That makes it easy to run. Then we suddenly realize we need to run it on different items. We'll need to convert it to take them as inputs. But the opposite also happens. Our function takes everything as inputs, but it's always the same long set of inputs every single time. We should convert it back to using globals.
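
A minimal Python sketch of both versions, with made-up names. Converting "back to using globals" just means going from the second function to the first.

```python
# Version 1: hard-coded to globals. Easy to call, but only works on these items.
scores = [70, 85, 90]
passing = 80


def count_passing():
    return sum(1 for s in scores if s >= passing)


# Version 2: everything is an input, so it runs on any list and threshold,
# at the cost of passing both in on every call.
def count_passing_in(values, threshold):
    return sum(1 for v in values if v >= threshold)
```

If every call ends up being `count_passing_in(scores, passing)`, the extra inputs are buying nothing.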

We may want to redo the communication. It's nice to have our code's set-up area install its own event handlers. We can let someone else check for clicks and whatever else, and have our handlers run by magic. But it's also nice to have a visible section of code, with IFs, where we can see exactly what's being checked when, and in which situations. The first way is nice if we rarely tweak them, the second if we often do.
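
A toy Python sketch of the two styles. The `on` decorator, `dispatch`, and `process` are hypothetical stand-ins, not any real GUI toolkit's API.

```python
# Style 1: handlers register themselves; a dispatcher runs them "by magic".
handlers = {}


def on(event_name):
    """Hypothetical decorator that records a handler for an event name."""
    def register(func):
        handlers.setdefault(event_name, []).append(func)
        return func
    return register


@on("click")
def handle_click(log):
    log.append("clicked")


def dispatch(event_name, log):
    # Run every handler registered for this event; unknown events do nothing.
    for func in handlers.get(event_name, []):
        func(log)


# Style 2: one visible function with IFs, so every check is in plain sight.
def process(event_name, log, menu_open=False):
    if event_name == "click" and not menu_open:
        log.append("clicked")
    elif event_name == "click" and menu_open:
        log.append("menu closed")
    elif event_name == "key":
        log.append("key pressed")
```

In style 1, finding out what a click does means hunting down every `@on("click")`. In style 2, the `menu_open` special case sits right next to the normal one, which is easier to tweak.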


I've talked to professional artists about something that seems similar to how program writing works. A painting starts with exciting ideas, sometimes high-concept, like smears of paint for inspiration. Then there comes a time when you hate it and want to start over the right way. But you have a deadline and are a pro, and work through your revulsion. Every finished painting is much worse than you hoped it would be, but other people don't notice, and that's the for-real artistic process at work.

I think programmers need to learn the same thing. That thing of beauty in your mind is great to get started, and can guide you as you write code, but you should never expect to actually make it. You're not factoring; you're guessing that time spent re-orging now will save more time later. And that's nothing to be ashamed of.


