

label «

     On this page, label is a short word for descriptor. For example, a file descriptor is a kind of label using terms defined here. But here we generalize descriptors for use in async messaging and call them labels. The idea means something very similar to name or address. In effect, a label is a name that can be turned into an address by whoever coined that name.

     The word descriptor serves my purposes perfectly, except the word is too damn long. In a thesaurus, label means the closest thing to descriptor as used here. And then if you look in a thesaurus for label, you can find synonyms with both noun and verb meanings. You can replace label with logo or mark as a noun, or with call or name as a verb. All of these are pertinent meanings, especially call when label means continuation.

why «

     Here's the confusing part: Why do I want this generalization of descriptors for use as labels? You don't need to know an answer to this at first (because descriptors are easy concepts) but as soon as I start showing applied label practice, you might be stumped unless you know the end goal.

     My goal is to decouple callers and callees in async api. A label can be used to indirectly refer to return addresses, for example, to name the caller who should receive a reply to a request. But a label can also refer indirectly to a callee preparing a response. Async api works by message passing, so callers and callees have no easy way to refer to one another without labels used as tokens to route messages. If a client C sends a request to server S, then C must pass a label to show S where to reply, and if server S wants to allow C to cancel the request (or say anything about the request before a reply), then S should give C a label referring to the request in progress. One kind of label points forward, the other backward. Confusing? Absolutely.

descriptors «

     The normal way to represent a descriptor is by a simple integer. For example, if you open a file you might be given a file descriptor consisting of nothing but an integer.

int fd = ::open( /* ... */);

     A big problem with that kind of descriptor is inability to invalidate stale descriptors. Let's say you close descriptor fd above and then open a new file. The new file might use the same descriptor. If someone thinks the old descriptor is still valid, you might have a problem. For example, suppose someone thinks the old file still needs to be closed to clean up a resource. If they close the old descriptor, the new file gets closed. Oops. It would be nice to see the old descriptor is no longer valid. If only the descriptor was annotated with a generation number.

typedef struct cy_fid_ {
  uint16_t fid_idx; /* index in some file array */
  uint16_t fid_gen; /* current file generation */
} cy_fid;

cy_fid fd = ::open( /* ... */);

     Using the approach shown above, you can tell an old descriptor is invalid. If the old file descriptor was just an index into some fixed-size resource, that index now becomes the 16-bit fid_idx field. (This is big enough when you never have more than 64K instances of this resource alive at once. When you need more, obviously both fields must be 32-bit values instead.)

     Every time the indexed resource is re-allocated, it gets a new generation number. (I strongly recommend pseudo random generation numbers.) If someone tries to use an old descriptor, with the wrong generation number, they get an error. Errors are good: much better than random execution behavior.
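
     The generation check described above might look roughly like the following. This is a minimal sketch, not the cy library: every demo_ name is hypothetical, the table has only four slots, and a plain counter stands in for the pseudo-random generation numbers recommended above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor with the same shape as cy_fid. */
typedef struct demo_fid_ {
    uint16_t fid_idx; /* index in the slot array */
    uint16_t fid_gen; /* generation when this descriptor was issued */
} demo_fid;

enum { DEMO_SLOTS = 4 };

typedef struct demo_slot_ {
    uint16_t gen;    /* current generation of this slot */
    int      in_use; /* nonzero while the slot is allocated */
} demo_slot;

static demo_slot demo_table[DEMO_SLOTS];
static uint16_t  demo_next_gen = 1; /* assumption: counter, not pseudo-random */

/* Allocate the first free slot; 0 on success, -1 when the table is full. */
int demo_open(demo_fid* out) {
    uint16_t i;
    for (i = 0; i < DEMO_SLOTS; ++i) {
        if (!demo_table[i].in_use) {
            demo_table[i].in_use = 1;
            demo_table[i].gen = demo_next_gen++;
            out->fid_idx = i;
            out->fid_gen = demo_table[i].gen;
            return 0;
        }
    }
    return -1;
}

/* Nonzero when fd names a live slot with a matching generation. */
int demo_valid(demo_fid fd) {
    return fd.fid_idx < DEMO_SLOTS
        && demo_table[fd.fid_idx].in_use
        && demo_table[fd.fid_idx].gen == fd.fid_gen;
}

/* Close a descriptor; 0 on success, -1 when it is stale or bogus. */
int demo_close(demo_fid fd) {
    if (!demo_valid(fd)) return -1;
    demo_table[fd.fid_idx].in_use = 0;
    return 0;
}

/* Scenario from the text: close, reopen, then try a stale close.
   Returns 1 when the stale close is rejected and the new file survives. */
int demo_stale_close_is_rejected(void) {
    demo_fid old_fd, new_fd;
    int ok;
    if (demo_open(&old_fd) != 0) return 0;
    if (demo_close(old_fd) != 0) return 0;
    if (demo_open(&new_fd) != 0) return 0;
    assert(new_fd.fid_idx == old_fd.fid_idx); /* same index is reused */
    ok = (demo_close(old_fd) == -1) && demo_valid(new_fd);
    demo_close(new_fd);
    return ok;
}
```

     With a bare integer descriptor the second close would have succeeded and closed the wrong file. Here it fails with an error instead.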

     I use a lot of descriptors like the one above when I design a library with hidden addresses. Each different type of object gets a separate struct, but typically they all look the same. A four byte struct like this can be passed around as efficiently as an integer value. I use 32-bit fields only when necessary.

     However, this sort of descriptor is a pain in the ass, because the logical firewall hiding a physical address is a kind of friction. Guess who pays the cost of the friction? That's right, you do. A small pain like this, repeated zillions of times, can wear you down. Even worse, you don't want to impose this discipline on callers. You can do this with a library, but typically you want to represent client callbacks more directly.

callbacks «

     Here's a normal async callback in a C api:

typedef void (*cy_foo_cb)(void* cx, int err);

int /* nonzero: errno; zero: you will be called back */
cy_do_some_foo(ArgA* a, ArgB* b, cy_foo_cb cb, void* cx);

     This is logically a function taking just two args, a and b, but callback cb and context cx are passed in order to get a reply later, asynchronously with respect to the call.
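
     A caller of this style of api might look like the sketch below. Everything with a demo_ prefix is hypothetical: the int arguments stand in for ArgA and ArgB, the library holds at most one pending request, and demo_complete() plays the part of whatever event loop eventually delivers the reply.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_foo_cb)(void* cx, int err);

/* Caller-side state that must stay alive until the callback runs. */
typedef struct demo_waiter_ {
    int done; /* set nonzero by the callback */
    int err;  /* error value delivered by the callback */
} demo_waiter;

/* One pending request; a real library would queue many. */
static demo_foo_cb demo_pending_cb = NULL;
static void*       demo_pending_cx = NULL;

/* Start a request. Nonzero: failed at once; zero: you will be called back. */
int demo_do_some_foo(int a, int b, demo_foo_cb cb, void* cx) {
    (void)a; (void)b; /* the logical arguments are unused in this sketch */
    if (cb == NULL || demo_pending_cb != NULL) return 1;
    demo_pending_cb = cb;
    demo_pending_cx = cx;
    return 0;
}

/* Stand-in for the event loop: finish the pending request, if any. */
void demo_complete(int err) {
    demo_foo_cb cb = demo_pending_cb;
    void* cx = demo_pending_cx;
    demo_pending_cb = NULL;
    demo_pending_cx = NULL;
    if (cb != NULL) (*cb)(cx, err);
}

/* The callback: cx is the only way back to the caller's state. */
void demo_on_foo(void* cx, int err) {
    demo_waiter* w = (demo_waiter*)cx;
    w->done = 1;
    w->err = err;
}

/* Issue a request, let it complete, and return the delivered err. */
int demo_roundtrip(void) {
    demo_waiter w = { 0, -1 };
    if (demo_do_some_foo(1, 2, demo_on_foo, &w) != 0) return -1;
    assert(!w.done); /* nothing happens until the reply arrives */
    demo_complete(0);
    return w.done ? w.err : -1;
}
```

     Note that w must outlive the request. If demo_roundtrip() returned before demo_complete() ran, the callback would write through a dangling pointer.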

     In a "normal" function (with synchronous behavior) the role of cb is served by the return address in the caller. And because the caller's stack stays around until this call returns, you can keep state in the stack instead of passing a pointer to state in context pointer cx. So cb and cx manually cope with a feature normally hidden by the runtime. This async approach lacks grace: it's complex. (But it works well enough.)

     But the caller has a problem: context arg cx is actually a descriptor. It refers to a resource in the caller that must remain in place until the callback occurs. (If it's refcounted, then the call is one of the references.) But since cx has no associated generation number, it's hard for the caller to invalidate. What if the caller gets tired of waiting for the callback, and would rather attend to other business, re-using the space that cx describes? Without a generation number, how can you distinguish a callback for the earlier call (which you abandoned) from a callback for a later one?

     The next section adds a generation number to cx.

who «

     This is the label type described by this page:

typedef struct cy_who_ { /* async ID */
  void*    who_ptr; /* async context pointer */
  uint32_t who_gen; /* generation num or state */
} cy_who;

     Note: by convention, a nil pointer in who_ptr is never valid. (Nil is a synonym for zero, not any other value.)

     I changed the name of this struct several times before settling on who. You don't want to know the longer analytical names I tried first. This describes either a caller or a callee. Let's look at a new form of callback signature.

typedef void (*cy_cfn)(cy_who me, int err, cy_who you);

     This looks like the last callback, but with two differences. First, the first void* context argument is now a who which couples a generation number with the pointer. Second, a new last who argument makes it possible to identify who is replying.

     The next section explains expected usage in terms of continuations. First let's address terms me and you.

     In object oriented languages, the first parameter typically represents the object receiving a message. In C++, this is that object, and in Smalltalk self is that object. Here the use of me refers to the object receiving the callback, so me means this. Term you is the opposite of me: the label for whoever is calling back. (Weird? Other conventions are just as bad.)
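
     To make me and you concrete, here is a sketch of a cy_cfn callback that uses the generation in me to drop a reply to an abandoned request. The cy_who and cy_cfn declarations are copied from this page; the demo_call struct and the demo_ functions are hypothetical. The sketch assumes the caller reuses the same memory in place, so who_ptr still points at a demo_call when a stale reply shows up.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct cy_who_ { void* who_ptr; uint32_t who_gen; } cy_who;
typedef void (*cy_cfn)(cy_who me, int err, cy_who you);

/* Hypothetical caller-side object that me.who_ptr points at. */
typedef struct demo_call_ {
    uint32_t gen;      /* current generation; bumped when the caller gives up */
    int      replies;  /* replies accepted */
    int      dropped;  /* stale replies ignored */
    int      last_err; /* err from the last accepted reply */
} demo_call;

/* Build the label the caller hands out for its current generation. */
cy_who demo_call_who(demo_call* c) {
    cy_who w;
    w.who_ptr = c;
    w.who_gen = c->gen;
    return w;
}

/* Abandon the outstanding request: labels issued earlier become stale. */
void demo_call_abandon(demo_call* c) { c->gen++; }

/* A cy_cfn: me names the receiver, you names whoever is replying. */
void demo_on_reply(cy_who me, int err, cy_who you) {
    demo_call* c = (demo_call*)me.who_ptr;
    (void)you; /* a real callback might keep you for follow-up requests */
    if (c == NULL) return;
    if (me.who_gen != c->gen) { c->dropped++; return; }
    c->replies++;
    c->last_err = err;
}

/* Returns 1 when the stale reply is dropped and the fresh one accepted. */
int demo_stale_reply_dropped(void) {
    demo_call c = { 7, 0, 0, -1 };
    cy_who replier = { NULL, 0 }; /* placeholder; unused by this callback */
    cy_cfn fn = demo_on_reply;
    cy_who old_me, new_me;
    old_me = demo_call_who(&c);
    demo_call_abandon(&c);
    new_me = demo_call_who(&c);
    (*fn)(old_me, 0, replier); /* reply to the abandoned request */
    (*fn)(new_me, 0, replier); /* reply to the current request */
    assert(c.last_err == 0);
    return c.replies == 1 && c.dropped == 1;
}
```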

continuations «

     The term continuation is a fancy way to say return address plus state of the caller. The continuation object below is named cy_c because c is short for continuation. This is just the original callback cb and context cx args shown earlier, but packaged differently so the context has a generation number.

typedef struct cy_c_ { /* async continuation */
  cy_cfn c_fn; /* async callback */
  cy_who c_me; /* me=this context */
} cy_c;

     Now look at opening a file using async notation:

cy_c myself;
myself.c_fn = function_to_callback;
myself.c_me = my_descriptor;
cy_who you = cy_open(/* ...*/, myself);

     How do you tell whether this failed? By convention, zero in who_ptr is always invalid. So if you.who_ptr is nil, you can look in errno for the error value, and the callback will never be called. But if you.who_ptr is non-nil, this means the myself.c_fn callback will be called exactly one time with a non-negative err value. (We might use negative err in a callback for progress reports. Success is shown by zero in err when (*myself.c_fn)(myself.c_me, err, you) is called. Actual contracts may vary on a call by call basis. Always read docs.)

     The you value returned here allows a caller to ask about this request later—perhaps to attempt canceling the request. Note the callee appears burdened with a need to maintain the value for you returned by this request, because the same value must be passed later to the callback. But a cy_who value might be easy to get from request state; extra state might not be needed.
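
     The contract above can be sketched end to end. The cy_ declarations are copied from this page; everything named demo_ is hypothetical, including the single-slot request state, the DEMO_ERR_CANCELED value, and the choice that cancel delivers the one promised callback with that error. This is one way a callee might honor the contract, not how cy_open works.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

typedef struct cy_who_ { void* who_ptr; uint32_t who_gen; } cy_who;
typedef void (*cy_cfn)(cy_who me, int err, cy_who you);
typedef struct cy_c_ { cy_cfn c_fn; cy_who c_me; } cy_c;

enum { DEMO_ERR_CANCELED = 1 }; /* hypothetical positive error value */

/* Hypothetical callee state: one request at a time. */
typedef struct demo_req_ { cy_c caller; uint32_t gen; int active; } demo_req;
static demo_req demo_the_req;

/* Start a request. On failure: nil who_ptr, errno set, no callback ever. */
cy_who demo_start(cy_c caller) {
    cy_who you = { NULL, 0 };
    if (caller.c_fn == NULL) { errno = EINVAL; return you; }
    if (demo_the_req.active) { errno = EBUSY; return you; }
    demo_the_req.caller = caller;
    demo_the_req.gen++;
    demo_the_req.active = 1;
    you.who_ptr = &demo_the_req; /* you is derived from request state */
    you.who_gen = demo_the_req.gen;
    return you;
}

/* Finish the request: call back exactly once, passing the same you. */
void demo_finish(int err) {
    cy_c caller;
    cy_who you;
    if (!demo_the_req.active) return;
    caller = demo_the_req.caller;
    you.who_ptr = &demo_the_req;
    you.who_gen = demo_the_req.gen;
    demo_the_req.active = 0;
    assert(caller.c_fn != NULL);
    (*caller.c_fn)(caller.c_me, err, you);
}

/* Cancel through the backward label: 0 when cancelled, -1 when stale. */
int demo_cancel(cy_who you) {
    if (you.who_ptr != &demo_the_req || !demo_the_req.active
        || you.who_gen != demo_the_req.gen) return -1;
    demo_finish(DEMO_ERR_CANCELED);
    return 0;
}

/* Hypothetical client that records what its callback was told. */
typedef struct demo_client_ { int calls; int err; cy_who from; } demo_client;

void demo_client_fn(cy_who me, int err, cy_who you) {
    demo_client* c = (demo_client*)me.who_ptr;
    c->calls++;
    c->err = err;
    c->from = you;
}

/* Returns 1 when start, cancel, and a stale second cancel behave as described. */
int demo_start_then_cancel(void) {
    demo_client client = { 0, -1, { NULL, 0 } };
    cy_c myself;
    cy_who you;
    myself.c_fn = demo_client_fn;
    myself.c_me.who_ptr = &client;
    myself.c_me.who_gen = 1;
    you = demo_start(myself);
    if (you.who_ptr == NULL) return 0;    /* failed: look in errno */
    if (demo_cancel(you) != 0) return 0;  /* triggers the single callback */
    if (demo_cancel(you) != -1) return 0; /* request is gone: label is stale */
    return client.calls == 1 && client.err == DEMO_ERR_CANCELED
        && client.from.who_ptr == you.who_ptr
        && client.from.who_gen == you.who_gen;
}
```

     The callee never stores you separately. It rebuilds the label from the request's address and generation, which is the "easy to get from request state" case mentioned above.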

efficiency «

     Note sizeof(cy_who) is at least eight bytes, and more when pointers exceed 32 bits in size. This might seem hefty, especially when you know 16-bit descriptors and generation numbers are enough in your app. You might statically allocate all space in a server and expect tens of thousands of outstanding requests, wasting more space than necessary using this definition of cy_who. Yes, yes, this might not be ideal.

     The cy_who shown here aims to be least annoying when writing code to use it the first time. Having an actual pointer in there saves a lot of grief. Minimizing space isn't my goal in this version. Instead I want to reduce my grief in writing very complex first drafts of async systems. I can always write a new version with a smaller definition of cy_who.

encoding «

     Although cy_who is declared as containing a pointer and a 32-bit generation number, that doesn't mean that's what is really inside an instance of cy_who. By convention, each who instance is opaque, understood only by the cy_cfn continuation function called with that value passed in the me position.

     You can put anything inside as long as its size does not exceed that of cy_who. Implementations that want to protect themselves might put integer indexes in the pointer, to avoid revealing memory addresses. And systems with memory that moves might use a relative pointer instead of absolute encoding.
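
     One way to hide an integer index in the pointer field is sketched below. The demo_ functions are hypothetical. The sketch stores index plus one so that who_ptr is never nil, and it assumes an integer converted to void* and back through uintptr_t comes out unchanged, which holds on mainstream platforms but is not promised by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct cy_who_ { void* who_ptr; uint32_t who_gen; } cy_who;

/* Pack a slot index into the pointer field; idx+1 keeps who_ptr non-nil. */
cy_who demo_who_from_index(uint32_t idx, uint32_t gen) {
    cy_who w;
    assert(idx < UINT32_MAX); /* idx+1 must not wrap to zero */
    w.who_ptr = (void*)((uintptr_t)idx + 1u);
    w.who_gen = gen;
    return w;
}

/* Recover the slot index; only meaningful for labels made above. */
uint32_t demo_who_index(cy_who w) {
    return (uint32_t)((uintptr_t)w.who_ptr - 1u);
}
```

     Only the code that made the label should ever decode it. Everyone else passes the cy_who around untouched.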

     You could even use a bit inside cy_who to say how it's encoded, if you want to encode multiple ways for some reason. The only part that isn't negotiable is cy_cfn—that has to be a pointer to a function taking args as described.

C vs C++ «

     So why did I write the api above in C? Why didn't I use C++ since I prefer C++? I'm sure you can guess the answer: my callers or callees might be written in C. As long as basic interaction is defined in C, no C user is shut out.

     But I plan to use this api in C++:

external continuations «

     Note class cy::Ce means exactly the same thing as cy_c, but with convenient constructors defined. (Absence of constructors in C-based structs is irritating when code is verbose.)

     I might have named this class just cy::C, but one letter class names felt a little disturbing. So I appended e for external. This allows me to use i for internal below.

namespace cy {

class Ce : public cy_c { // continuation external
public:
  Ce() { c_fn = 0; c_me.who_ptr = 0; c_me.who_gen = 0; }
  Ce(cy_cfn cb) { c_fn = cb; c_me.who_ptr = 0; c_me.who_gen = 0; }
  Ce(cy_c const& x) { c_fn = x.c_fn; c_me = x.c_me; }
  Ce(cy_cfn cb, cy_who me) { c_fn = cb; c_me = me; }

  // same thing as cy_cfn_do():
  void fn_do(int err, cy_who you) { (*c_fn)(c_me, err, you); /* callback */ }
}; // class Ce

}; // namespace cy

internal continuations «

     Similarly, class cy::Ci means exactly the same thing as cy_who, but with convenient constructors defined.

     Why do I say continuation referring to just a who value? Because when an async request returns a Ci denoting the future you passed to a callback, it refers to a continuation of a request: the interior of a call, as opposed to external callers.

     Internal continuations don't need a cy_cfn function pointer: you already supplied it when calling a request! Only an external continuation needs a callback function pointer, because it hasn't been called yet. (In other words, once you're inside a request method, you don't need a function pointer to get there anymore. So internal continuations are just pointers.)

namespace cy {

class Ci : public cy_who { // continuation internal
public:
  Ci() { who_ptr = 0; who_gen = 0; }
  Ci(void* p, uint32_t g) { who_ptr = p; who_gen = g; }
}; // class Ci

}; // namespace cy

cross language dispatch «

mixed runtimes «

     Did you notice callers and callees do not need to use the same runtimes? Internal and external continuations can denote completely different kinds of runtime. You can call back and forth between garbage collected and non-garbage-collected runtimes, for example, or between different kinds of virtual machine.

     I wrote this page to crystalize thoughts from this afternoon (09may2009) before I go deeper into code designs for async programming language runtimes.

     Several years ago, around 2000 or so, I thought I would represent code continuations in virtual machines in a way that labels a function address with the type of runtime intended to execute the code. This would let me explore multiple VMs at the same time, as well as call between interpreted s-expressions, byte-code compiled methods, or whatever else was used.

     After I started articulating this design to myself this afternoon, I noticed it resembled that earlier idea of typing code pointers. A cy_c external continuation representing a caller's callback has the same character: passing the who state for a function allows it to interpret what runtime should process the callback.

     Of course I'm glossing over details. (When a garbage collected runtime makes a request, does it know it should encode the continuation in a way that works when calling back into the gc runtime?) But in principle they just involve work.

stacking «

     The main reason for cy_c on this page was to unify the two sorts of continuation I normally use: descriptors for internal continuations and void* contexts plus function pointers for external continuations.

     Now one format kinda looks like both. This allows me to make the async dispatch style universal inside and outside an async library, so I needn't use wildly different runtime styles inside and outside; that's a total pain when writing simulations and other test harnesses. This allows me to define a thread model crossing library boundaries.

     An async thread is a stack of continuations. Presumably all other state is allocated outside continuation stacks.
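
     Read literally, that sentence suggests a data structure like the one below. This is only a guess at the shape, with hypothetical demo_ names and a fixed depth of eight: pushing a cy_c stands for making a call, and popping one and invoking it stands for returning to the caller. The sketch uses who_gen as a frame tag, in the "generation num or state" sense.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct cy_who_ { void* who_ptr; uint32_t who_gen; } cy_who;
typedef void (*cy_cfn)(cy_who me, int err, cy_who you);
typedef struct cy_c_ { cy_cfn c_fn; cy_who c_me; } cy_c;

enum { DEMO_DEPTH = 8 };

/* Hypothetical async thread: nothing but a stack of continuations. */
typedef struct demo_thread_ {
    cy_c   frames[DEMO_DEPTH];
    size_t depth;
} demo_thread;

/* Push a continuation ("call"); 0 on success, -1 when full or invalid. */
int demo_push(demo_thread* t, cy_c c) {
    if (t->depth >= DEMO_DEPTH || c.c_fn == NULL) return -1;
    t->frames[t->depth++] = c;
    return 0;
}

/* Pop the top continuation and invoke it ("return"); -1 when empty. */
int demo_return(demo_thread* t, int err, cy_who you) {
    cy_c top;
    if (t->depth == 0) return -1;
    top = t->frames[--t->depth];
    (*top.c_fn)(top.c_me, err, you);
    return 0;
}

/* State kept outside the stack: a record of which frames resumed. */
typedef struct demo_trace_ { int order[DEMO_DEPTH]; size_t n; } demo_trace;

/* Each frame appends its tag (kept in who_gen) to the trace in who_ptr. */
void demo_frame_fn(cy_who me, int err, cy_who you) {
    demo_trace* log = (demo_trace*)me.who_ptr;
    (void)err; (void)you;
    if (log->n < DEMO_DEPTH) log->order[log->n++] = (int)me.who_gen;
}

/* Push frames 1 then 2, unwind, and report the order as two digits. */
int demo_unwind_order(void) {
    demo_thread t;
    demo_trace log = { {0}, 0 };
    cy_who replier = { NULL, 0 }; /* placeholder; unused by demo_frame_fn */
    cy_c c;
    t.depth = 0;
    c.c_fn = demo_frame_fn;
    c.c_me.who_ptr = &log;
    c.c_me.who_gen = 1; demo_push(&t, c); /* outer caller */
    c.c_me.who_gen = 2; demo_push(&t, c); /* inner caller */
    while (demo_return(&t, 0, replier) == 0) { }
    assert(log.n == 2);
    return log.order[0] * 10 + log.order[1]; /* inner frame resumes first */
}
```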

     I've been working on lightweight process api lately, and I started drafting the api on this page to represent async control flow, and ended up with a threading model instead.

     Presumably lightweight processes and async stackless threads are partly orthogonal. Process semantics are mainly about defining disjoint mutable address spaces. But for a process to do anything, it needs at least one async thread inside doing something. (Since interacting lightweight processes are all inside one heavyweight OS process, it might be possible to have async stackless threads that span lightweight processes—but this sounds disturbing somehow. Hmm.)

     When an async thread modifies no state, it might look like a process. So I might have a process spawning api for immutable activities, which just makes another async thread stack. Another kind of process spawning api would need to clone mutable state using a copy-on-write protocol.


license

     See license and copyright for code here. For more context, see the cy page.

comments

     Compared to a thorn demo, I explain cy code less: I care little whether folks use or grasp cy source. But since I aim to get ideas across, I reveal a point to code constructs so you see intentions.

     If you ask: What was this for? That's the only question I address: why a thing was done. If you want to know how code works or what loose ends remain, figure it out.

color coding

     Library source code appears in amber (orange/brown):

amber is_source(code* c);

     Source .cpp code appears in red:

void cy_logf(int, const char* f, ...) {
  char temp[ 2048 + 4 ];
  va_list args;
  va_start(args,f);
  vsnprintf(temp, 2048, f, args);
  va_end(args);
  temp[2048] = 0;
  printf("%s\n", temp);
}

     Sample test code is purple:

o << "# purple=test green=stdout" << cy_newl;

     Printed output on stdout is green:

# purple=test green=stdout

     I know these aren't the best color cues. (Amber and green might appear the same hue to color blind folks. I have excellent color discrimination myself.)