More Thoughts on Multihoming July 6, 2006 Note: This note is mostly not directly about multihoming itself, but on the routing issues which are the key thing "behind" multihoming. [In this note I use the newly-made-up jargon 'topo-name' to describe the 'names' (in a general CS sense) that the path-selection system uses. Most people call these 'addresses', but I prefer to use a different term because the term 'address' brings along too much mental baggage in too many people.] Corollaries to Multihoming Classification Observation #3: MCO #3 said: There are 3 different multihoming scenarios which are worth distinguishing when one considers mechanisms: - A site (i.e. collection of hosts, networks and interconnecting routers) which has multiple links to the rest of the world - A host (without embedded router) which is connected to multiple networks - A host (without embedded router) which has multiple names (i.e. addresses) and/or interfaces to a single network These differ enough in fundamentals to possibly need different engineering approaches to handle them. I'd like to start this note by exploring what this implies for multihomed hosts (both types), in a little more detail. First, let's focus in on that "(without embedded router)". I put that in there because if a host *does* want to be a router, then if it has (e.g.) two interfaces, to two separate networks (so its interfaces have topo-names A.p.1 and B.q.2), then it can simply participate in the routing so that packets "for" interface B.q.2 can be sent to A.p.1, and the rest of the network doesn't need to know that A.p.1 and B.q.2 are the same machine. Yes, there is a massive amount of glossing over here about whether the overhead in the routing to do this is practical, but I'm examining this issue at an architectural level; from that level, it's clear that host multihoming (either kind) must involve some other mechanism than "having the routing do it". Site multihoming is a totally different kettle of fish, because there "having the routing do it" *is* a possibility. The rest of this note will focus on site multihoming, because that's the kind of multihoming that currently seems to be the most popular. (It's worth keeping in the back of one's mind, though, that some solutions to site multihoming, such as the use of multiple topo-names, also support host multihoming.) Site Multihoming Observation #1: Site multihoming is, fundamentally, a path-selection (routing) issue. That is, there are multiple paths through the physical connectivity fabric, and we wish to make more than one of them available for use. This does not sound hard. Why is it so difficult? The issue turns out to be simply one of efficiency - i.e. how high a cost (or overhead) do we want the path selection to have/incur. If any infinite cost were acceptable, there would be no problem. Path-Selection Observation #1: The mathematical tools used to handle the underlying model (which is a graph) have something to say about this issue - especially when it comes to graphs which are both i) very large, and ii) richly connected - which is to say, in more formal terms, that the average and maximal path lengths through the graph grow slowly as the graph grows. Since the pioneering work of Kleinrock and Kamoun, it has been understood that path-selection in very large graphs (*whether richly-connected or not*) is a tradeoff between two different costs (or overheads): - on the one hand, the overhead of the path-selection mechanisms (routing) themselves (data sets, updates, computation, etc); - and on the other hand the non-optimality of paths (i.e. traffic takes more hops than the shortest possible path). We might visualize this as a knob on our path-selction mechanism: turned one way, paths get closer to the optimum, and the routing overhead rises; turned the other way, paths become less optimal, but the routing becomes less expensive. Path-Selection Observation #1a: If the knob is turned too far toward the "low-overhead" end, the path non-optimality may become infinite; i.e. the routing will fail to discover any valid path, even though one exists. Path-Selection Observation #1b: One part of K+K's analysis assumed that very large graphs would not be very richly connected, and proceeded to show that reasonable paths would still be available for a class of inexpensive path-selction mechanisms (all those using hierarchical topo-names). However, the network these days *is* fairly richly connected, and so we have been pitched back into the "cost/path-optimality" tussle - and in particular, we risk running afould of PSO #1a - i.e. failing to find a path even when one physically exists. Path-Selection Observation #2: We can distinguish between two fundamentally different ways of distributing the information needed to do path-selection, between 'push' and 'pull' models. In the 'push' model, which is the one most contemporary routing systems use, iformation is distributing (usually widely) prior to need - whether this is topology-connectivity information (as in OSPF, IS-IS, etc) or destination-table information (as in RIP, BGP, etc). In the 'pull' model, information is obtained only when it is though it will be needed. There are not really any contemporary data networks which operate in this mode, but road networks do - when one is preparing to make an extensive visit to a distant city, one often obtains a road map for that city prior to arriving. The latter model is interesting, in the context of site multihoming in a large, richly-connected graph, because it offers a way of limiting the overhead of operating the path selection. Obviously, if information is only distributed to those who need it, instead of everyone, the overhead will be reduced. There have been a number of proposals to utilize 'pull' methods to support multihoming. One of the earliest ones I can recall was from Dave Clark; he once (I don't recall exactly when, but I think it was at the time of the IAB Retreat of Routing, which later led to the creation of ROAD - but it could have been earlier) wrote a draft (I'm not sure if it ever made it all the way to completion, alas) of an idea to have "stub" source routes, from major high-level topological entities, to the destination, and include them along with the response to a name lookup. (This is similar to what one often sees on "directions" sheets for events/attractions, where it will say something like "if coming from city X, follow this set of steps; if travelling on motorway/autostrada/interstate/autobahn Y, follow this other set".) The user would select whichever made the most sense (e.g. if they knew their site was directly connected to large ISP X, and one of the possible stub source routes was from X), and use it. At about the same time (I no longer recall if it was inspired by his idea, or not), in the map-distribution system Nimrod, we had roughly the same idea - when getting ready to select a path from A.p.1 to B.q.2, one might ask for detailed maps of B and B.q - in particular, to see if there were any direct connections from B.q that didn't show up in the "default" map. Path-Selection Observation #2a: It is important to realize that approaches such as Shim6, which provide multiple addresses to hosts inside multihomed sites, *are*, in a way, 'pull' systems. The information which is provided - the multiple addresses - are routing information, of a sort. They are information which something trying to find a path can use to select one of several possible paths (which is the essence of site multihoming, to make use of more than one potential path - see SMO #1). And obviously the method of distribution is a 'pull' system. Additional Path-Selection Observation #3: In controlling the overhead of operating the path-selection, limiting scopes for the distribution of information are a key tool. We saw simple examples of this back with subnets, when routing information for subnets of a network was not distributed outside the network. This represents the simplest model, where the 'abstraction naming boundary' (i.e. the boundary - i.e. line enclosing a portion of the graph - around that portion of the graph which can be referred to as if it were a single node) is the same as the 'abstraction action boundary' (i.e. the boundary outside which detailed information about the connectivity inside the ANB is not distributed). (To put it in less abstract terms, if an entity A is composed of A.1 ... A.n, the boundary of A is the ANB of A, and the boundary outside which routing information about A.1 ... A.n is no longer visible are the AAB of A.) Some simple results about path-selection are possible with these concepts in hand. For example, if the ANB and the AAB of an entity are congruent (i.e. identical), then if the entity is partitioned, connectivity to things inside it cannot be fully maintained. The relevance of these concepts to site multihoming is to the "cost/path-optimality" tussle referred to in PSO #1b; viable site multihoming in a large, richly-connected graph may require control of the overhead, and intelligent settings on ANB's and AAB's *may* provide a tool. If, for instance, a system which generally provides/uses only *two* kinds of AAB - either ones congruent with the ANB, or global - will have a harder time controlling routing overhead than one which provides a richer range of AAB's - and therefore more opportunities to control the overhead. However, without a detailed analysis based on actual data, it is impossible to say whether the reduction in overhead would be sufficient, or whether some more aggressive approach to controlling overhead (e.g. use of 'pull' methods) would be necessary.