|
Code refactoring is an often-discussed topic
in software development these days. The word refactoring is
almost a buzzword as a result of hype from agile programming.
Perhaps I'm using the word in a different sense. In any case, this
page describes how I refactor code.
I first drafted this
material in April 2007, when folks at work asked how I
managed several evolutionary changes I coded over
recent months. The project director wondered if I
could teach others to do whatever it was I did with such good results.
Specifically, could
I call a meeting? Could I give one or more talks on how I refactor code?
I said I wasn't actually following a system or plan. My approach was
intuitive:
I wouldn't know what to say unless I first worked out what I was doing.
So I sat down and drafted what you see below while thinking about how I
reason while refactoring.
I posted this on a wiki
page at work, but no one has read it as far as I know. It seemed a
shame for it to go to waste; I thought it had some value.
So I decided to repost it here since it contains no
information of a sensitive nature. I cut the original leading and
trailing sections with context and examples in terms of
code at work. What remains is generic code surgery
technique.
Note I'm not rewriting
this to jazz it up. Maybe it could use polish, but what do you
want for a couple of hours of writing? Of course all the HTML
and CSS markup is new.
what
What is
refactoring, exactly? And what needs
refactoring? What problem is solved by it?
Refactored
code is just re-arranged code that does the
same thing, but with a different organization, with
new fracture lines you add for enough flexibility to
replace something that needs an alternative (for one
or several reasons).
You
refactor code when you re-arrange the order in
which things happen (and re-arrange which objects
handle requirements) so you can later replace one
solution with another. The goal is usually to isolate
some irritant in one place, so you can try another way
to handle the irritant. Ugliness does not warrant
refactoring — cosmetic issues have no priority.
A current
code irritation sufficient to get priority
is usually some combination of these, when you're
stuck:
- too slow or too big (uses too many cycles or too many
bytes) and it must come in at less
- too buggy or too squirrelly, or can't be audited to
assure quality and stability
- can't tell whether one component interferes with
another component
- the proper strategy to use in a complex situation is
unclear
- one component confuses understanding another when both
are mixed together
- the cause of undesired behavior cannot be attributed
to a specific bad component
entanglement
Refactoring
solves the problem of entanglement: when
objects or processes are interleaved in a manner that
prevents you from dealing with one without the direct
or indirect interference of the other. You can
re-arrange code to break unnecessary dependencies, to
permit dealing with one thing without spending time on
something that should be unrelated.
why
Why
(and when) would you want to refactor code?
Almost never
in practice, because there's usually
something more important to do. You only want to
refactor a bit of code when that code contains your
top priority task or problem, and you can't make
progress (that you know will work) or answer a
question without refactoring. Some improvements are
very hard to assess without comparing alternatives. So
sometimes your best tactic is to try more than one
and make an empirical comparison. Refactoring is
sometimes the answer when the question was, "Should I
change the code from old X to new Y, or to new Z?" and
you don't know the answer, but it's important that you
end up using the better of the two alternatives Y and
Z.
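An empirical comparison like that can be sketched roughly as below. This is only an illustration under assumptions: the three functions (`join_old`, `join_y`, `join_z`) and the `compare` helper are made-up names standing in for old X and the candidates Y and Z, not anything from the code at work. The point is that once the alternatives sit behind the same call, you can first check they agree and then time them.

```python
import timeit


def join_old(parts):
    # Hypothetical "old X": builds the string by repeated concatenation.
    out = ""
    for p in parts:
        out += p + ","
    return out


def join_y(parts):
    # Hypothetical candidate Y: a single join over a generator.
    return "".join(p + "," for p in parts)


def join_z(parts):
    # Hypothetical candidate Z: a list buffer, joined once at the end.
    buf = []
    for p in parts:
        buf.append(p)
        buf.append(",")
    return "".join(buf)


def compare(candidates, sample, repeat=5, number=200):
    """Check every candidate matches the first one, then time each."""
    expected = candidates[0](sample)
    timings = {}
    for fn in candidates:
        # Identical results come first; speed only matters if this holds.
        assert fn(sample) == expected, fn.__name__
        runs = timeit.repeat(lambda: fn(sample), repeat=repeat, number=number)
        timings[fn.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    sample = [str(i) for i in range(1000)]
    results = compare([join_old, join_y, join_z], sample)
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f}s")
```

The timings depend on the machine and the input, so the sketch prints them rather than asserting which candidate wins.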
Some
folks feel, when using an object oriented
language, that situations should never arise where
objects are interlinked in a byzantine dependency
network. Shouldn't information hiding be enough to
prevent one object from knowing too much about
another? In practice, using information hiding
and abstraction too aggressively limits how far
you can see and how cleverly you can optimize
information that crosses object boundaries. Visibility
is both a performance enhancer and a code organization
clarifier. But you can get tied down accidentally.
Refactoring typically adds some hiding and removes
some visibility in order to make a replaceable
component truly pluggable when previously it was hard
linked.
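Here is a minimal sketch of that trade, assuming a toy example: all the class names (`InMemoryStore`, `CounterBefore`, `CounterAfter`, `LoggingStore`) are hypothetical. The "before" class builds its own store and reaches into its internals; the "after" class gives up that visibility and talks only through `get` and `put`, which is what lets a replacement be plugged in.

```python
class InMemoryStore:
    """Hypothetical storage component."""

    def __init__(self):
        self.rows = {}  # internal layout, visible to anyone holding the object

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key, default=None):
        return self.rows.get(key, default)


class CounterBefore:
    """Hard linked: constructs its own store and pokes at store.rows."""

    def __init__(self):
        self.store = InMemoryStore()

    def bump(self, name):
        self.store.rows[name] = self.store.rows.get(name, 0) + 1
        return self.store.rows[name]


class CounterAfter:
    """Pluggable: any object with get/put will do; rows is never touched."""

    def __init__(self, store=None):
        # The default keeps the old behavior for existing callers.
        self.store = store if store is not None else InMemoryStore()

    def bump(self, name):
        value = self.store.get(name, 0) + 1
        self.store.put(name, value)
        return value


class LoggingStore(InMemoryStore):
    """A hypothetical replacement that also records every write."""

    def __init__(self):
        super().__init__()
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))
        super().put(key, value)
```

With no argument, `CounterAfter()` behaves like `CounterBefore()`; passing `LoggingStore()` swaps the component without touching the counter.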
When
more than one of the things below is true, it can
be a sign refactoring might help:
- it's hard to see where one feature stops and another
begins, from method to method and object to object
- it's unclear which lines of code are for which effect,
or whether there is organic overlap in features
- nothing can be changed in one place without compile
and link ripple effects for long distances
- the actual code organization shows no sign of the
organization present in English descriptions
- a change that sounds easy when said in words is
actually quite hard when looking at the code
- policies about whether to do things are intermingled
with the mechanisms for how to do things
- pointers to physical implementation details in one
subsystem are visible far away in another subsystem
- callers overspecify what they want done by spelling
out exactly how the callee can do it in micro detail
- implementations can't redo memory management because
outside code that uses them knows too many internals
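To make one of those signs concrete, here is a small sketch of policy intermingled with mechanism, and the same code with the two pulled apart. Everything in it is assumed for illustration: the function names, the three-attempt limit, and the use of a plain list as a stand-in transport.

```python
def send_before(message, attempts_made, transport):
    # Policy (skip empty messages, give up after 3 tries) is tangled
    # with mechanism (how the bytes go out).
    if message == "" or attempts_made >= 3:
        return False
    transport.append(message.encode("utf-8"))
    return True


def should_send(message, attempts_made, max_attempts=3):
    """Policy only: decide whether to act."""
    return message != "" and attempts_made < max_attempts


def transmit(message, transport):
    """Mechanism only: perform the act."""
    transport.append(message.encode("utf-8"))


def send_after(message, attempts_made, transport, policy=should_send):
    """Same behavior as send_before by default, but the policy is replaceable."""
    if not policy(message, attempts_made):
        return False
    transmit(message, transport)
    return True
```

With the default policy the two versions return the same results and leave the same bytes in the transport; a caller who wants a different rule passes another `policy` without touching `transmit`.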
how
How do you go about refactoring code?
Now
there's the rub. There must be some trick to it,
or otherwise it would be easier and it would happen
more often. But the problem is refactoring can be
confusing, if only because the original version of
some code contributes a lot of confusion itself. Even
so, no matter how confusing it gets, you can still
follow some rules of thumb to help avoid totally
wasting your time. With some care, you can reduce the
odds your refactoring will bomb and be thrown away.
The most important rules of thumb are the first two:
1. keep the system in a working state at all times
As soon
as you make a change that requires any
debugging, the odds you'll throw away your work under
time pressure go through the roof. Try to make changes
you know have exactly the same behavior at runtime,
and don't change too much at one time. Assume you'll
be interrupted at any time, and that you'll need to go
with whatever you have checked in last. You'll never
come back to anything that's only half done. If you
do, the world will have moved on, anyway.
2. make small incremental changes you can clearly check still work
During
initial stages of refactoring, you want
identical results at runtime as much as possible. As a
result, you should aim to change as little code as
possible, no matter how tempting it is to make
cosmetic changes. All you want to do is move the code
around until the new arrangement lines things up and a new
aisle appears, allowing you to move more chairs around
later. At first all you want is breathing room, by
making it possible to do something else, but without
actually doing something else.
Really
what you're doing is removing inconsistencies
with a future approach, without altering a past
approach beyond mere re-organization. Let's say the
old code insists on the existence of an appendix and
kidneys. But it doesn't need them entangled — they
are entangled only accidentally. So you can
disentangle them without breaking the old world, while
preparing for a future world that wants them
completely separate.
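A small step of that kind might look like the sketch below. It is a toy under stated assumptions: the function names are invented, prices are assumed to be a list of (name, cents) pairs with non-negative cents, and the two "organs" are totaling and formatting, which the old code happens to do in one loop. The old entry point stays in place so the result can be checked against it before anything else changes.

```python
def summary_before(prices):
    # Totaling and formatting are interleaved in one loop.
    total = 0
    lines = []
    for name, cents in prices:
        total += cents
        lines.append("%s: %d.%02d" % (name, cents // 100, cents % 100))
    lines.append("total: %d.%02d" % (total // 100, total % 100))
    return "\n".join(lines)


def total_cents(prices):
    """Totaling, on its own."""
    return sum(cents for _, cents in prices)


def format_money(cents):
    """Formatting of a single amount, on its own."""
    return "%d.%02d" % (cents // 100, cents % 100)


def format_lines(prices):
    """Formatting of the item lines, with no totaling mixed in."""
    return ["%s: %s" % (name, format_money(cents)) for name, cents in prices]


def summary_after(prices):
    # Same output as summary_before, assembled from separate pieces.
    lines = format_lines(prices)
    lines.append("total: %s" % format_money(total_cents(prices)))
    return "\n".join(lines)


def same_behavior(old, new, samples):
    """Cheap check that a re-arrangement changed nothing observable."""
    return all(old(s) == new(s) for s in samples)


if __name__ == "__main__":
    samples = [[], [("tea", 250)], [("tea", 250), ("scone", 1005)]]
    print(same_behavior(summary_before, summary_after, samples))
```

Nothing new happens yet; the only gain is that totaling and formatting can now be replaced one at a time.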
|