10aug09
«
archive
work « Perhaps you thought this page would be about coding practice in typical workplaces. Sorry about that. Actually this page is a discarded early draft (the third) of the top-level code page, emphasizing context often present in day jobs. But this page isn't about jobs per se. Why would I tell you about jobs? I almost named this page context instead, but then you might have assumed this page was about execution contexts in programming languages. So jobs seemed slightly less evil in terms of misdirection. What I really wanted to talk about was the context motivating my other material about code, especially job contexts. After I wrote this much, the result didn't work for me. However, this content is still interesting and might inform your view of other pages. But nothing on this page is strictly necessary or helpful. Instead I decided a top-level code page can start from a blank slate, asking "What if you wanted to do xyz?" questions, without any overview. An overview suggests a teleological goal, and I'm suggesting the absence of such a goal: just that you want some effect xyz, and don't care how you get it, as long as it works and you can tell it works. 09aug09
«
day jobs
context « Successful coding often involves many tradeoffs. When you do too much of anything, you can short-change another dimension. Conflicts are hard to resolve, especially when you adopt a religion that gives one thing priority over everything else. You might be tempted by a plan like this one: "If I just do xyz everywhere, consistently, then all my problems will be fixed." No, they won't: something will get harder to do. In real work environments, folks have a very pragmatic focus, and just want to get things done and be able to tell they are done. Of these two, the second is harder: it's hard to see whether you're done. Once a system is almost stable and robust, shipping might be around the corner or disastrously far away. Abstraction and indirection are powerful tools in every kit, but they can hide flaws from sight. Negative constraints (things you can't use) tend to have more power. You might be unable to use technology xyz in some context, so code using xyz is useless there and you must go on without it. If all your other tools depend on one, your whole chain can fail. For example, what if you can't link C++'s standard runtime library in a product? Can you still code in C++? Yes, but only if you roll your own everything. This is why coding habits focus on how to roll your own tools when you can't plug in standard items. You need some xyz effect but can't use a tool you know, because it's forbidden where needed, or because it loses when scaled to gigabytes. What now? The clock's ticking. But shooting from the hip can take off your foot, and you've always been told never to do this. Therefore, break conventions carefully. If you flout convention and it fails, you'll burn, so don't wing new tools unless you tend to be right. You need to be one of those guys with a ridiculous level of accuracy in execution. And you need to prove empirically, with conclusive evidence, that a new thing works at runtime; otherwise any odd result is pinned on you. All complex systems have odd results; you're guilty until proven innocent. 
First, do no harm. Your code habits need to lean toward: being able to abandon a piece; being able to rewrite whole sections; being able to review code at need; being able to show good evidence every critical point was working. You need some coupling for strong type checking, but not so much that you can't abandon and rewrite code. You need some abstraction and indirection for power and grace, but not so much that you obfuscate code and can't assign blame. You need brevity to ease rewrites, but not so much that names stop being unique and you can't find every place a symbol appears. You need less code for speed, but more to generate evidence. You need some early design for structure, but not so much that you dither or polish throwaway work. Almost everything you do right is both not enough and too much, and hitting a good balance is an art with fuzzy rules. My coding habits have both, not enough and too much, according to fashion and convention. 09aug09
«
speed
profiling « The slow part of code isn't where you think it is, unless you have a lot of experience, like me. It's easy to assume more lines of code are always slower: that code is faster when fewer expressions need evaluation. So you fear lacing your code with assertions and cheap validity checks, and you don't gather evidence it runs correctly, resulting in fast code doing the wrong thing. Your priority is misplaced. On modern architectures you can almost always add code in functions to say: if invalid, then report an error. When the test nearly always comes out one way (valid), branch prediction can make it cost nothing except a little more instruction cache for the extra code. If you conditionally compile out such checks, you typically see no change in speed. (But note your code can vary in speed by up to 10% from one build to the next if it changes size even slightly; branching to non-aligned addresses can be slower on some chips, and standard C++ has no way to insist on loop branch alignment in critical code.) Measure the speed of your code. In fact, measure everything you care about. (Count everything, and make it possible to see those counts on demand.) Profiling is one way to analyze code speed. Have someone run a profiler. After years you develop good intuition about speed, provided you measure one way or another. I sometimes write complex code, because it's the only way to get the right result. A typical reaction is this question: does that make the code slow? I tell them no: all the time is consumed by this loop right here, where I grovel over every byte of data passing through. When the code is profiled, I'm right. If an app or server consistently looks at data much larger than your fastest cache, speed is often limited by how few cache lines you can manage to touch, dwarfing minor code effects. Algorithmically, you want to make every operation constant time. This has a huge effect when data sets are enormous. 
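Here is a minimal C++ sketch of the cheap-check habit. Everything in it is hypothetical: the `Record` type, its `magic` and `length` fields, the limits, and the `Counters` struct are made up for illustration. The always-on check reports and counts every rejected record. `debug_check_record` is the kind of check you would compile out with `NDEBUG`, so you can time both builds and see for yourself whether the checks cost anything.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical record layout and limits, for illustration only.
struct Record {
    std::uint32_t magic;
    std::uint32_t length;
};
constexpr std::uint32_t kMagic = 0x52454344u;  // arbitrary tag value
constexpr std::uint32_t kMaxLength = 1u << 20;

// Hypothetical counters that can be printed on demand.
struct Counters {
    std::uint64_t records_ok = 0;
    std::uint64_t records_bad = 0;
};

// Always-on check: stays in release builds, reports and counts rejects.
// When nearly every record is valid, the branch is predicted well.
bool check_record(const Record& r, Counters& c) {
    if (r.magic != kMagic || r.length > kMaxLength) {
        ++c.records_bad;
        std::fprintf(stderr, "bad record: magic=%08x length=%u\n",
                     static_cast<unsigned>(r.magic),
                     static_cast<unsigned>(r.length));
        return false;
    }
    ++c.records_ok;
    return true;
}

// Debug-only variant: assert() compiles to nothing when NDEBUG is defined,
// so timing builds with and without NDEBUG shows what the check costs.
void debug_check_record(const Record& r) {
    assert(r.magic == kMagic && r.length <= kMaxLength);
}

// Makes the counts visible on demand.
void dump_counters(const Counters& c, std::FILE* out) {
    std::fprintf(out, "records_ok=%llu records_bad=%llu\n",
                 static_cast<unsigned long long>(c.records_ok),
                 static_cast<unsigned long long>(c.records_bad));
}
```

One way to make the counts visible on demand is to call `dump_counters(counters, stdout)` from a debug command or status report. C++20 also accepts `[[unlikely]]` on the error branch as a hint, though a branch that nearly always goes one way is usually predicted well without it.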
This requires cleverness when some things appear to require at least linear time. For example, you might think initialization is linear when a structure is gigabytes in size, but it isn't when lazy init works. The main point of this section is simple: code execution time is not proportional to the size of the source text. You might think getting the size of a list is constant time, then be surprised to learn that some STL implementations (GCC's among them) make std::list::size() linear by traversing the list, because otherwise splice could not be constant time, and someone gave splice priority. (C++11 later required constant-time size(), making splice of a range from another list linear instead.) In my last job I once fixed code whose time complexity was N^4 (that's N to the fourth power) on a population of one hundred thousand; you should be shocked, because I was. (Someone wrote a loop over N list members, calling size() in the loop test, and then indirectly looped again inside, with another call to size() in the test for the inner loop. They intended N^2 but got N^4; I changed it to linear.) To scale nicely and have constant-time operations everywhere, you can't easily use an existing collection library when it doesn't let objects be members of multiple collections at once efficiently. For example, suppose you implement a cache using STL in C++, with an LRU list for space reuse and a hashmap for fast lookup. How could this go wrong? Each STL container manages its members through its own hidden internal nodes, so when you find a list member, you don't know where it is located in the hashmap. The reverse is also true: when you find an item in the hashmap first, you don't know where it is in the list. What do you want instead? When you find a member in one collection, you already know as much as you need to get constant-time operations in all other sets containing that item: no linear searches, ever, unless necessary or unless length is trivial. In C++ you usually need to roll this by hand, because standard libraries don't help. As a result you write more code, but your app runs much faster. What's the moral of my story? 
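The size() trap is easy to reproduce with a hypothetical `SlowList` whose `size()` walks every node. It stands in for a linear-time `std::list::size()`, because a C++11 `std::list` won't show the effect. The `steps` member counts the node visits made by `size()`. With `size()` in both loop tests, two nested loops over N nodes cost about N^3 visits for N^2 useful work. Measuring the length once, before the loops, leaves a single linear walk. This is a simplified sketch, not the code from the story.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical list whose size() is linear, like some pre-C++11
// std::list implementations. `steps` counts node visits made by size().
struct SlowList {
    struct Node {
        int value;
        Node* next;
    };
    std::deque<Node> storage;  // owns nodes; push_back keeps old addresses valid
    Node* head = nullptr;
    mutable std::size_t steps = 0;

    void push_front(int v) {
        storage.push_back(Node{v, head});
        head = &storage.back();
    }

    std::size_t size() const {  // walks the whole list on every call
        std::size_t n = 0;
        for (const Node* p = head; p != nullptr; p = p->next) {
            ++n;
            ++steps;
        }
        return n;
    }
};

// size() in both loop tests: about N^3 node visits for N^2 useful work.
long long pair_sum_slow(const SlowList& list) {
    long long total = 0;
    const SlowList::Node* a = list.head;
    for (std::size_t i = 0; i < list.size(); ++i, a = a->next) {
        const SlowList::Node* b = list.head;
        for (std::size_t j = 0; j < list.size(); ++j, b = b->next)
            total += static_cast<long long>(a->value) * b->value;
    }
    return total;
}

// size() hoisted out of both loops: one linear walk, done once.
long long pair_sum_hoisted(const SlowList& list) {
    const std::size_t n = list.size();
    long long total = 0;
    const SlowList::Node* a = list.head;
    for (std::size_t i = 0; i < n; ++i, a = a->next) {
        assert(a != nullptr);  // n is only valid if the list hasn't shrunk
        const SlowList::Node* b = list.head;
        for (std::size_t j = 0; j < n; ++j, b = b->next) {
            assert(b != nullptr);
            total += static_cast<long long>(a->value) * b->value;
        }
    }
    return total;
}
```

Take a list holding 1 through 10. Both functions return 3025, but the slow one makes N(N+1)^2 = 1210 node visits inside `size()`, while the hoisted one makes 10. Hoisting is only valid when the loop body doesn't change the list's length. The asserts in `pair_sum_hoisted` catch the list shrinking underneath the loops.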
One-size-fits-all data structures seldom actually fit all sizes, because they run slower in practice unless minute internal details are exposed, which breaks abstraction and encapsulation. So for high performance, you must re-invent wheels. Is it fashionably correct? No: you find junior engineers everywhere who know wheel re-invention is a classic error, like prosecuting land wars in Asia. When coding for best results, you often break rules.
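The LRU-plus-hashmap cache can be rolled by hand with intrusive links. This is a sketch under loud assumptions: the `LruCache` and `Entry` names, the fixed-capacity pool, integer keys and values, and modulo hashing are all hypothetical choices for illustration, not code from any job described here. Each entry carries its own LRU links and its own hash-bucket links. Finding an entry through either collection is therefore enough to unlink it from both in constant time. Lookup still walks one bucket chain, which stays short when there are plenty of buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity LRU cache built from intrusive links.
class LruCache {
public:
    // Each entry carries the links for BOTH collections it belongs to.
    struct Entry {
        std::uint64_t key = 0;
        int value = 0;
        Entry* lru_prev = nullptr;   // LRU list, most recently used at the head
        Entry* lru_next = nullptr;
        Entry* hash_prev = nullptr;  // doubly linked hash-bucket chain
        Entry* hash_next = nullptr;
    };

    LruCache(std::size_t capacity, std::size_t bucket_count)
        : entries_(capacity), buckets_(bucket_count, nullptr) {
        assert(capacity > 0 && bucket_count > 0);
        for (Entry& e : entries_) free_.push_back(&e);
    }
    // Links point into entries_, so copying would leave dangling pointers.
    LruCache(const LruCache&) = delete;
    LruCache& operator=(const LruCache&) = delete;

    // Returns the cached value or nullptr; a hit moves the entry to the front.
    int* get(std::uint64_t key) {
        Entry* e = find(key);
        if (e == nullptr) return nullptr;
        lru_unlink(e);
        lru_push_front(e);
        return &e->value;
    }

    // Inserts or updates; when full, evicts the least recently used entry.
    void put(std::uint64_t key, int value) {
        if (int* v = get(key)) {
            *v = value;
            return;
        }
        Entry* e;
        if (!free_.empty()) {
            e = free_.back();
            free_.pop_back();
        } else {
            e = lru_tail_;   // found through the LRU list...
            lru_unlink(e);
            hash_unlink(e);  // ...and unlinked from its bucket without a search
        }
        e->key = key;
        e->value = value;
        hash_link(e);
        lru_push_front(e);
    }

private:
    std::size_t bucket_of(std::uint64_t key) const {
        return static_cast<std::size_t>(key % buckets_.size());
    }
    Entry* find(std::uint64_t key) const {
        for (Entry* e = buckets_[bucket_of(key)]; e != nullptr; e = e->hash_next)
            if (e->key == key) return e;
        return nullptr;
    }
    void lru_unlink(Entry* e) {
        if (e->lru_prev) e->lru_prev->lru_next = e->lru_next; else lru_head_ = e->lru_next;
        if (e->lru_next) e->lru_next->lru_prev = e->lru_prev; else lru_tail_ = e->lru_prev;
        e->lru_prev = e->lru_next = nullptr;
    }
    void lru_push_front(Entry* e) {
        e->lru_prev = nullptr;
        e->lru_next = lru_head_;
        if (lru_head_) lru_head_->lru_prev = e; else lru_tail_ = e;
        lru_head_ = e;
    }
    void hash_link(Entry* e) {
        Entry*& head = buckets_[bucket_of(e->key)];
        e->hash_prev = nullptr;
        e->hash_next = head;
        if (head) head->hash_prev = e;
        head = e;
    }
    void hash_unlink(Entry* e) {  // e->key must still be the key it was linked under
        if (e->hash_prev) e->hash_prev->hash_next = e->hash_next;
        else buckets_[bucket_of(e->key)] = e->hash_next;
        if (e->hash_next) e->hash_next->hash_prev = e->hash_prev;
        e->hash_prev = e->hash_next = nullptr;
    }

    std::vector<Entry> entries_;   // fixed pool, never resized after construction
    std::vector<Entry*> buckets_;  // heads of the bucket chains
    std::vector<Entry*> free_;     // entries not yet in use
    Entry* lru_head_ = nullptr;
    Entry* lru_tail_ = nullptr;
};
```

Exposing `lru_prev`, `hash_next` and the rest inside `Entry` is exactly the broken encapsulation described above. The cache is fast because each collection can reach into the other's links.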