

07sep09 « trusted base

C runtime «

     This page is mainly about writing a main() entry point for test code so you can try other code you write, in experiments. When I planned initial content on the void page, which is about writing new code from scratch, I felt it needed a starting point based on testing anything you write as soon as you can, in a test program.

     When you write code using no other code, where do you start? You must assume something already exists. Otherwise, how do you test new code? Anyway, this brings up an idea of a trusted computing base: what are you willing to trust as true, as a given? Here (and on the void page) I assume this: the C runtime is true.

     By this I mean I assume a standard C library is present, that it works, and that main() is used as an entry point with the usual arguments and semantics. In support of this idea, I'll show you a plausible way to structure test program startup from main, including the parsing of command line arguments. As a rare special case, this also means I'm willing to define C strings and semantics of argv parameters passed to main(), because this will get complex enough as an intro to string parsing.

     (Ultimately, we can write an interpreter just to get more flexibility in testing based on input text descriptions. But to start we can simply interpret command line arguments in a basic way. Or, of course, you can use your favorite existing dynamic language to load your code and test it. But that's your job, not mine.)

no static init «

     Since I use C++, you may wonder why I don't also assume a C++ standard library is "true," and that static init runs before main() is called. Fair enough. Sometimes my C++ code is called from a C app where static init time never happens, and where no C++ standard library was linked. So a test avoiding them is more faithful to actual runtime use, in these apps.

     Anyway, usually I won't assume it's okay to instantiate global C++ objects, since they might not be constructed by static init. But I will sometimes, when I feel like it. (Either I'm inconsistent, or I'm establishing precedent for scary unpredictability, to keep you on your toes. Take your pick.) I might later show you how to do lazy init on first use; this is also helpful when static init occurs, when objects use one another, and you have no guarantee of order of initialization in different compilation units.

standard out «

     We're also going to roll our own standard output stream class, assuming no standard C++ library code can be used, and assuming C standard i/o is too awkward. Now you know why this separate page is necessary just to get under way: it's going to be big. Then the void page can assume this one as given.

08sep09 « main

signature «

     How do we get command line arguments? They arrive via argv when main() is called. Let's look at two different ways of declaring argv, so you can think about declaring those input strings as const, since they should not be modified.

     Here's a typical way to declare main:

int main(int argc, char** argv) {
    // ...
    return status; // nonzero means lack of success
}

     When you run this under Unix (e.g. Linux), the status code is returned to the invoking shell, so you can write scripts taking special action on failure. Since we'll focus on test programs, maybe status should be a count of errors.
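
     One caveat if status is an error count: on Unix-like systems the invoking shell sees only the low eight bits of the exit status, so a count of exactly 256 would look like success. Here is a minimal sketch of a clamp. The name pig_exit_status() is a hypothetical helper for illustration, not something defined elsewhere on this page.

```cpp
// Hypothetical helper: squeeze an error count into the 0..255 range a
// Unix shell can observe, keeping "nonzero means lack of success".
int pig_exit_status(int errors) {
    if (errors == 0) return 0;                   // no errors: success
    if (errors < 0 || errors > 255) return 255;  // saturate, never wrap to 0
    return errors;                               // small counts pass through
}
```

     With this in place, main() could end with return pig_exit_status(status); instead of returning the raw count.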

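     If you want the compiler to enforce immutable strings in argv, you can declare main as shown below instead. (Strictly, the language standards only promise the char** form. Common compilers accept the const-qualified form as an extension, sometimes with a warning. If yours objects, keep char** in main() and put the const types on the delegate alone, since that conversion is implicit in C++.) And while we're at it, let's also have main() delegate control to another function with the same signature.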

int g_pig_errors = 0; // global count of pig test errors

extern int pig_main(int argc, const char* const* argv);

int main(int argc, const char* const* argv) {
    // ...
    return pig_main(argc, argv); // main for app pig
}

pig «

     Let's name our test app pig, and use this as a prefix on names with global scope, in a casual gesture at avoiding global name collisions later. (Use any name and prefix you like. But sample code here uses pig as a namespace prefix.)

argv «

     Now let's look at format of C strings in argv. Suppose you have a sample Linux command line that looks like the following:

% ./pig -xyz -crc,bits=64,path=data.txt

     For this command line, the format of argv might be identical to this:

const char* pig_argv[] = {
    "./pig",
    "-xyz",
    "-crc,bits=64,path=data.txt",
    (const char*) 0
};

     The last member in that array might surprise some coders: argv should always be nil terminated, so argv[argc] is zero without fail. (But sometimes folks imitating this api forget this minor detail.) When you iterate over strings in argv, you should be able to use either argc or nil termination, or both, and they must agree.
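
     To make the "they must agree" rule concrete, here is a small sketch that walks argv both ways. The names pig_count_args() and pig_show_args() are hypothetical, made up for this illustration.

```cpp
#include <assert.h>
#include <stdio.h>

// Count strings in a nil terminated argv without consulting argc.
int pig_count_args(const char* const* argv) {
    int n = 0;
    while (argv[n]) ++n;   // stop at the terminating nil pointer
    return n;
}

// Walk argv by index, first checking that argc and nil termination
// agree. Returns the number of strings printed.
int pig_show_args(int argc, const char* const* argv) {
    assert(pig_count_args(argv) == argc);  // both views must agree
    assert(argv[argc] == 0);               // argv[argc] is nil
    for (int i = 0; i < argc; ++i)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    return argc;
}
```

     If someone imitating this api forgets the nil terminator, pig_count_args() runs off the end of the array, which is exactly why the detail matters.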

     You can call your pig_main() entry point with this statically declared argv if you like. (You can hard code "command line" args whenever you want.)

return pig_main(3, pig_argv);

     Now let's look at each of those C strings in the array:

C strings «

     Every C string literal constant in source code ends with a null byte at runtime, placed there by a compiler. Also, by convention, all C strings end with a null byte; C compilers merely follow this rule. Command line args are also C strings, null terminated by someone in a chain of control before main() is called. Let's assume this is always true by definition for strings in an argv array passed to main().
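
     A tiny sketch of that terminating byte in a literal, using hypothetical names pig_word and friends: sizeof counts the null byte the compiler appended, while strlen() stops just before it.

```cpp
#include <string.h>

// The array gets four bytes: 'p' 'i' 'g' and the compiler's '\0'.
static const char pig_word[] = "pig";

// sizeof sees the whole array, terminating null byte included.
static const size_t pig_word_size = sizeof(pig_word);

// strlen() counts bytes up to, but not including, the null byte.
size_t pig_word_len() { return strlen(pig_word); }
```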

characters «

     Let's also assume each string character is an eight bit octet (a single byte), which is to say CHAR_BIT is eight. (sizeof(char) is one by definition on every architecture, so it tells you nothing about width. If you ever use an architecture where char is not an eight bit octet, you have my sympathies.)

     Note char can be either signed or unsigned—it varies from place to place. So in our sample code, we'll almost always use uint8_t (or contraction u8) instead of char, to avoid sign extension in our code, reserving char for use with standard C api, and for use to signal C string semantics:

     In my sample code, const char* always implies a C string with null termination. When I overload methods on this type, callers must always ensure null termination to satisfy the api contract. Any other pointer type, like u8*, just means pointer to a byte, without implying a C string format.
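
     Here is a short sketch of the sign extension problem that motivates u8. The typedef follows the contraction named above; the two function names are hypothetical.

```cpp
#include <stdint.h>

typedef uint8_t u8;   // the contraction used in this page's sample code

// Widening through u8 always yields 0..255.
int pig_widen_u8(const void* p) { return *(const u8*) p; }

// Widening through plain char depends on the platform: bytes above
// 0x7F come out negative where char is signed, positive where it is
// unsigned.
int pig_widen_char(const void* p) { return *(const char*) p; }
```

     For the byte 0xE9, pig_widen_u8() gives 233 everywhere, while pig_widen_char() gives either 233 or -23. Table lookups and comparisons written against the char version can silently break when code moves between compilers.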

iovecs «

     Instead of C strings, most of my code uses iovecs, in pointer plus length format. In system C headers (<sys/uio.h> on POSIX systems, which is not part of standard C), an iovec looks like this:

struct iovec {
    char *iov_base; /* Base address. */
    size_t iov_len; /* Length. */
};

     Or more usually, replacing char with void, like this:

struct iovec {
    void *iov_base; /* Base address. */
    size_t iov_len; /* Length. */
};

     In C++ my version is named either yv or Iov, and looks like this:

struct Iov { // C++ class with all public members
    u8* v_p; // Base address.
    u32 v_n; // Length.
};

     But in addition, I add a lot of methods, not emphasized here. (For an introduction to thorn iovecs, see the run demo.) Note how I prefer shorter names v_p and v_n, instead of longer iov_base and iov_len.

     For starters, let's add Iov api to include parts used in code below:

#include <string.h>   // strlen
#include <sys/uio.h>  // struct iovec

struct Iov { // C++ class with all public members
    u8* v_p; // Base address.
    u32 v_n; // Length.

    Iov() : v_p(0), v_n(0) { }
    Iov(const void* p, u32 n) : v_p((u8*) p), v_n(n) { }

    explicit Iov(const char* cstr) // null terminated C string
        : v_p((u8*) cstr), v_n((cstr)? ::strlen(cstr) : 0) { }
    Iov& operator=(const char* cstr) {
        v_p = (u8*) cstr;
        v_n = (cstr)? ::strlen(cstr) : 0;
        return *this;
    }

    explicit Iov(const iovec& v)
        : v_p((u8*) v.iov_base), v_n(v.iov_len) { }
    Iov& operator=(const iovec& v) {
        v_p = (u8*) v.iov_base;
        v_n = v.iov_len;
        return *this;
    }

    operator iovec() const { // auto convert yv to iovec
        iovec v;
        v.iov_base = v_p;
        v.iov_len = v_n;
        return v;
    }

    void vinit(const void* p, u32 n) { v_p = (u8*) p; v_n = n; }
    void vtail() { if (v_n) { ++v_p; --v_n; } } // skip 1st u8
    u8* vsplit(int c, Iov& outSecondHalf); // see below
};

     The point of constructors and conversion operators is to have the right thing happen in a typesafe way, with low verbosity when values are created.

split «

     Most string parsing on this page uses the vsplit() method below, which chops an Iov in two when a sought byte value is present. We're going to use this to parse command line args like "-crc,bits=64,path=data.txt" below, in order to find key/value pairs used to drive tests.

u8* Iov::vsplit(int c, Iov& outSecondHalf) {
    // vsplit() returns same value returned by vindex(): a pointer to
    // the first c, or nil when c is absent. But vsplit also alters
    // this and outSecondHalf, so this Iov ends just before first c,
    // and outSecondHalf starts w/ the first c. Callers use vtail()
    // to remove the leading c.
    u8* end = v_p + v_n; // one beyond last byte in the Iov
    for (u8* p = v_p; p < end; ++p) { // another byte to examine?
        if (*p == c) { // found c? need to split Iov in 2 parts?
            v_n = p - v_p; // bytes left in 1st half
            outSecondHalf.vinit(p, end - p); // u8s to 2nd half
            return p;
        }
    }
    // c was not found inside run anywhere, so 2nd half is empty:
    outSecondHalf.vinit(end, 0); // no match: empty 2nd half at end
    return (u8*) 0; // nil
}
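
     To show how vsplit() and vtail() cooperate, here is a hedged sketch that walks the fields of one arg. It carries a trimmed copy of Iov so it stands alone. The helper veq() and the walker pig_count_pairs() are hypothetical names invented for this illustration, and treating a field without '=' as a key with an empty value is my assumption, not a rule stated on this page.

```cpp
#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;
typedef uint32_t u32;

// Trimmed copy of Iov with only the members this sketch needs.
struct Iov {
    u8* v_p; // Base address.
    u32 v_n; // Length.
    Iov() : v_p(0), v_n(0) { }
    explicit Iov(const char* cstr) // null terminated C string
        : v_p((u8*) cstr), v_n((cstr)? (u32) ::strlen(cstr) : 0) { }
    void vinit(const void* p, u32 n) { v_p = (u8*) p; v_n = n; }
    void vtail() { if (v_n) { ++v_p; --v_n; } } // skip 1st u8
    u8* vsplit(int c, Iov& outSecondHalf);
    bool veq(const char* s) const { // hypothetical: equals C string s?
        return v_n == ::strlen(s) && 0 == ::memcmp(v_p, s, v_n);
    }
};

u8* Iov::vsplit(int c, Iov& outSecondHalf) {
    u8* end = v_p + v_n;
    for (u8* p = v_p; p < end; ++p) {
        if (*p == c) {                    // split at the first c
            v_n = (u32) (p - v_p);
            outSecondHalf.vinit(p, (u32) (end - p));
            return p;
        }
    }
    outSecondHalf.vinit(end, 0);          // no c: 2nd half is empty
    return (u8*) 0;
}

// Hypothetical walker: count comma separated fields in one arg,
// splitting each field at '=' into key and value. The last key and
// value seen are left in outKey and outVal.
u32 pig_count_pairs(const char* arg, Iov& outKey, Iov& outVal) {
    Iov rest(arg);
    rest.vtail();                 // drop the leading '-'
    u32 count = 0;
    while (rest.v_n) {
        Iov field = rest;         // field will end before next ','
        field.vsplit(',', rest);  // rest starts with ',' or is empty
        rest.vtail();             // remove the leading ','
        Iov val;
        field.vsplit('=', val);   // field is now the key alone
        val.vtail();              // remove the leading '='
        outKey = field;
        outVal = val;
        ++count;
    }
    return count;
}
```

     Note that nothing is copied or null terminated along the way: every key and value is just a pointer and length into the original argv string.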

printing Iovs «

     How do you print a string in Iov format using printf()? Don't we need a terminating null byte? No, we don't, when using %.*s like this:

void show_iov(Iov const& v, u32 max) {
    u32 n = v.v_n;
    if (n > max) n = max;
    printf("%.*s", (int) n, (const char*) v.v_p);
}

     In other words, string length is passed explicitly as an extra arg before the pointer, where it acts as an upper bound on bytes printed, so null termination is not needed. (But if an Iov does contain a null byte, printf() stops there and the output is truncated, so printf() is a poor fit for data that might hold embedded nulls.)
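
     When embedded null bytes matter, one alternative from C standard i/o is fwrite(), which writes a counted run of bytes and never inspects their values. Below is a minimal sketch using only the bare Iov layout. The name pig_write_iov() is hypothetical.

```cpp
#include <stdint.h>
#include <stdio.h>

typedef uint8_t  u8;
typedef uint32_t u32;

struct Iov { u8* v_p; u32 v_n; }; // bare layout only, for this sketch

// Write at most max bytes of v to out, embedded null bytes included.
// Returns the number of bytes actually written.
size_t pig_write_iov(FILE* out, Iov const& v, u32 max) {
    u32 n = v.v_n;
    if (n > max) n = max;
    return fwrite(v.v_p, 1, n, out);
}
```

     The return value lets a caller notice a short write, which printf() with %.*s cannot distinguish from truncation at a null byte.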