Nimrod Working Group                                     Ram Ramanathan
Internet Draft                                        Martha Steenstrup
March 1996                                 BBN Systems and Technologies
draft-ietf-nimrod-fun-pro-spec-00.ps
Expires 30 August 1996

Nimrod Functionality and Protocol Specifications, Version 1

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a ``working draft'' or ``work in progress''.

Please check the 1id-abstracts.txt listing contained in the ``internet-drafts'' directories on ftp.isi.edu (U.S. West Coast), ds.internic.net (U.S. East Coast), munnari.oz.au (Pacific Rim), nic.nordu.net (Europe), or ftp.is.co.za (Africa) to learn the current status of any Internet Draft.

Distribution of this Internet Draft is unlimited. Please send all comments to nimrod-wg@bbn.com.

Abstract

Nimrod is a scalable routing architecture designed to support a dynamic internetwork of arbitrary size, to provide service-specific routing in the presence of multiple constraints, and to admit incremental deployment throughout an internetwork. The key features of Nimrod include representation of internetwork connectivity and services in the form of maps at multiple levels of abstraction; source- and destination-controlled route generation and selection based on maps and traffic service requirements; and source- and destination-controlled message forwarding according to the routes selected. This document contains a description of Nimrod functionality and a specification of the protocols constituting Nimrod.
In particular, the operations pertinent to the map, locator, adjacency, route, and forwarding databases are described, and the Reliable Transaction, Update, Query-Response, Path Management, and Discovery protocols are specified.

Acknowledgments

We thank Tom Calderwood, Winston Edmond, Charlie Lynn, Trevor Mendez, Betty O'Neil, Mike Patton, Ram Ramanathan, and Tim Shepard for producing an experimental Nimrod software system that has enabled us to test the practicality of Nimrod. We are especially grateful to Charlie Lynn, chief architect of the Nimrod software, for his flexible system design, his careful and critical analysis of the Nimrod protocols, and his detailed packet formats (depicted in this document).

Contents

1 Scope and Overview
2 Introduction
2.1 Overview of the Nimrod Architecture
2.1.1 Clustering and Abstraction
2.2 Nimrod Entities
2.3 Nimrod Routing Functions and Databases
2.3.1 Nimrod Database Consistency
2.4 Nimrod Agents
3 Nimrod Operation: An Overview
3.1 Maps
3.1.1 Map Update
3.1.2 Map Request and Release
3.2 Routes
3.2.1 Route Generation
3.2.2 Route Request and Release
3.3 Locators
3.3.1 Acquiring and Releasing Node Locators
3.3.2 Acquiring and Releasing Endpoint Locators
3.4 Adjacencies
3.4.1 Acquiring, Activating, and Releasing Adjacencies
3.5 Paths
3.5.1 Path Setup
3.5.2 Path Acceptance
3.6 Control Message Integrity and Authentication
3.6.1 Timestamps
3.6.2 Authentication
4 Reliable Transaction Protocol
4.1 Services Interface
5 The Update Protocol
5.1 Service Interface
5.2 Protocol Operation
5.2.1 Update Header
5.2.2 Originating Agent Operations
5.2.3 Transit Agent Operations
5.2.4 The Update Message Action Table (UMAT)
5.3 Database Specific Updates
5.3.1 Adjacency Updates
5.3.2 Locator Updates
5.3.3 Map Updates
6 The Query-Response Protocol
6.1 Service Interface
6.2 Protocol Operation
6.3 Query/Response Header
6.4 Database Specific Request/Release
6.4.1 Adjacency Formation
6.4.2 Adjacency Release
6.4.3 Adjacency Activation
6.4.4 Locator Acquisition
6.4.5 Locator Release
6.4.6 Map Acquisition
6.4.7 Map Termination Request
6.4.8 Path Information Request
6.4.9 Route Generation Request/Reply
7 Path Management Protocol
7.1 Protocol Messages
7.1.1 Setup
7.1.2 Accept
7.1.3 Teardown
7.1.4 Status
7.1.5 Ack
7.2 Protocol Finite-State Machines
7.2.1 Initiator
7.2.2 Intermediate Forwarding Agents and Target
7.2.3 Check State Actions
8 Discovery Protocols
8.1 Neighbor Discovery
8.1.1 Neighbor Reachability
8.2 Agent Discovery
8.2.1 Flooding Agent Advertisements
8.2.2 Distribution of Advertisements to Distant Agents
8.2.3 Unreachable Agents
8.2.4 Node Partitions
8.3 Route Tracing
9 Packet Formats
9.1 Overview
9.2 IP and Security Headers
9.3 Nimrod Forwarding Information
9.4 Transaction Headers
9.5 Update, Query, and Response Protocol Headers
9.6 Update Operation Messages
9.7 Query/Response Operation Messages
9.8 Discovery Message Header
10 Appendix 1: Figures for Update and Q-R protocols
11 Appendix 2: Basic Data Formats
11.1 Element (NimElem)
11.2 Locator (NimLoc)
11.3 Endpoint Identifier (NimEID)
11.4 Endpoint Name (NimFQDN)
11.5 Node Identifier (NimNID)
11.6 Services (NimServ)
11.7 Maps (NimMap)
11.8 Routes (NimRute)
11.9 Credentials (NimCred)
11.10 Path Labels (NimPLbl)
11.11 Time (NimSecs, NimNTP)
11.12 Authenticator (NimAuth)
12 Security Considerations
13 Contact Information

1 Scope and Overview

This document contains a description of Nimrod functionality and a specification of the protocols constituting Nimrod. While it has been our intention that the document be self-contained, it would help the reader to be familiar with the Nimrod architecture and functionality as described in [1] and [2].
Nimrod does not specify support for mobility or multicast, but does specify requirements for solutions to mobility and multicast within the Nimrod context. A discussion of these issues can be found in [3] and [4].

The document has been organized so that readers may inform themselves at various levels of detail. Specifically, readers wishing to know only what Nimrod's functionality is may confine themselves to sections 2 and 3. For readers wishing to understand and evaluate the protocols comprising Nimrod, we additionally recommend sections 4, 5, 6, 7, and 8. Finally, for Nimrod implementors, sections 9 and 11 give additional details.

2 Introduction

Nimrod is a scalable routing architecture designed to support a dynamic internetwork of arbitrary size, to provide service-specific routing in the presence of multiple constraints, and to admit incremental deployment throughout an internetwork. The key features of Nimrod include representation of internetwork connectivity and services in the form of maps at multiple levels of abstraction; source- and destination-controlled route generation and selection based on maps and traffic service requirements; and source- and destination-controlled message forwarding according to the routes selected.

At the most general level, one may view any routing system as a set of basic functions which are producers and consumers of certain databases of routing information. These routing functions and their associated routing information include:

1. Assembling, distributing, and collecting information necessary for route generation and selection. This information includes internetwork connectivity and services, traffic service requirements, and locations of traffic sources and destinations.
2. Generating and selecting routes, based on the collected information.
3. Establishing in routers information necessary for forwarding messages, based on the selected routes.
4. Forwarding messages along these routes.

Routing systems may, however, differ in the details of the mechanisms that provide a particular routing function. As Nimrod has been designed for routing in large, heterogeneous, and dynamic internetworks, its basic routing functions include additional mechanisms for reducing the quantity of routing information that must be distributed, processed, and stored throughout an internetwork; for discovering and accommodating changes in routing information caused by physical changes in an internetwork; and for protecting the integrity of routing information.

2.1 Overview of the Nimrod Architecture

Before Nimrod routing can be applied within an internetwork, the internetwork must be represented in terms of the two basic Nimrod entities: nodes and endpoints. The internetwork's physical assets, such as routers, point-to-point links, and multiaccess networks, must be captured in Nimrod maps comprising interconnected nodes. Traffic sources and destinations must be cast as Nimrod endpoints. Nimrod entities possess attributes (e.g., location with respect to the maps, interconnectivity with other entities, and service information) which are important for routing.

2.1.1 Clustering and Abstraction

Ideally, Nimrod maps should be constructed so as to satisfy the following two primary, and potentially conflicting, goals:

1. Minimize the amount of routing information maintained throughout an internetwork.
2. Maintain routing information sufficient to generate routes that meet traffic service requirements.

To satisfy these goals, Nimrod employs two complementary map construction procedures, namely clustering of internetwork physical assets into nodes and abstraction of attributes of the component physical assets resulting in node attributes.

The objective of clustering is to reduce the number of entities visible to Nimrod routing at any given level of the hierarchy.
Nodes are usually formed by clustering physical assets possessing similar attributes. These attribute similarities might be in terms of, for example, qualities of service, restrictions on access to services, or ownership of these assets. Such clustering results in a reduction in the amount of information necessary to characterize these physical assets, without a reduction in information detail. However, an internetwork's physical assets may be diverse enough so that clustering according to attribute similarity produces no significant reduction in the number of entities visible to Nimrod routing. In this case, alternative clustering criteria (e.g., geographical locality of physical assets) may be employed.

Clustering may be applied repeatedly, such that physical assets are first clustered into nodes, and then nodes are themselves clustered into larger nodes, and so on. Iterative clustering further reduces the number of entities visible to Nimrod routing at a given level of the hierarchy, and results in a hierarchical organization of nodes with a single top-level universal node containing all other entities. In the clustering hierarchy, the clustering criteria applied at different levels may not necessarily be the same.

The objective of abstraction is to reduce the amount of information required to characterize an entity visible to Nimrod routing. Nodes whose component physical assets possess different attributes rely on information abstraction in order to reduce the number of attributes used to characterize them. Abstraction procedures include, for example, eliminating attributes possessed by only a small percentage of the component physical assets or expressing attributes in terms of ranges of values exhibited by these physical assets.
Multiple abstraction procedures may be applied to produce the attributes of a given node (e.g., first eliminating attributes possessed by only a few physical assets and then taking the average values of the remaining attributes for the physical assets in the node).

Nimrod does not mandate the choice of clustering and abstraction procedures to invoke in an internetwork. Rather, this choice is a local one under the control of the managers of the portions of the internetwork to be represented as Nimrod nodes, and hence network managers may develop procedures that best suit their needs.

We note that the specific clustering and abstraction procedures employed in an internetwork may have a significant effect on the quality of routes generated and on the cost of routing information maintenance. Hence, network managers should exercise care in selecting and using these procedures and may wish to experiment with several different ones during the evolution of their nodes. Although clustering and abstraction procedures may be fully automated, we recommend allowing manual intervention in order to enable network managers to make cost-benefit tradeoffs appropriate for their particular networks.

2.2 Nimrod Entities

All of the Nimrod routing functions are performed with respect to an internetwork's representation in terms of the basic Nimrod routing entities, namely nodes and endpoints. Each Nimrod entity has a set of attributes, each of which may be established through one or more of the following methods:

1. Manual configuration.
2. Automatic acquisition during initialization.
3. Active measurement.
4. Abstraction of attributes of component nodes.

A node is a set of contiguous internetwork physical assets. It may be formed by clustering physical assets directly or by clustering existing nodes.
If the given node itself comprises component nodes, the routing system employed to route traffic within or across the node is Nimrod routing. Otherwise, this routing system may use any other routing protocol(s). A node's attributes include its:

1. Node identifier (NID). An NID is a location-independent referent for a node. It is a globally unique, relatively short, fixed-length bit string used by Nimrod-capable devices to communicate node identity (primarily used before a node acquires its locator).
2. Locator. A node's locator is a globally unique label describing the node's position in the clustering hierarchy. It consists of the global locator of the node's enclosing node in the clustering hierarchy, concatenated with a local bit-string, called an element, unique among all component nodes and endpoints of the enclosing node.
3. A pool of locator elements that may be assigned to its component nodes and endpoints.
4. Constraints on forming associations with endpoints. An association is a relationship formed between a node and an endpoint, such that the endpoint acquires a locator from the node. A node may be associated with multiple endpoints, and an endpoint may be associated with multiple nodes.
5. Constraints on forming adjacencies with other nodes. An adjacency is a neighbor relationship formed between two nodes that have a direct communication capability. The neighbor relationship need not be symmetric. For example, nodes X and Y may agree to a relationship in which Y is adjacent to X, but X is not adjacent to Y.
6. Maps consisting of the node's current adjacencies and service offerings.
7. Credentials of the node's manager, used in forming node adjacencies and endpoint associations.

An endpoint is a traffic source, destination, or both that is visible to other Nimrod entities through association with one or more Nimrod nodes. Examples of endpoints include hosts and routers or even processes within hosts and routers. An endpoint's attributes include its:

1. Endpoint identifier (EID). An EID is a location-independent referent for an endpoint. It is a globally unique, relatively short, fixed-length bit string used by Nimrod-capable devices to communicate endpoint identity.
2. Names. An endpoint name is a globally unique, variable-length, structured, ASCII string used primarily by humans to refer to the endpoint. Nimrod uses Domain Name System (DNS) names for this purpose.
3. Constraints on forming associations with nodes.
4. Locators. An endpoint's locators are obtained from the nodes with which the endpoint is associated. Therefore, as an endpoint may be associated with more than one node, it may obtain more than one locator.
5. Traffic service requirements from the perspectives of the endpoint as source and as destination.
6. Credentials of the endpoint and its manager, used respectively in authentication of routing information and in forming node associations.

2.3 Nimrod Routing Functions and Databases

At the core of Nimrod lies a set of distributed databases containing routing information that is constructed, accessed, and acted upon by the routing functions. These databases and their relationships to the routing functions are as follows:

1. Node attributes. Each node has a set of attributes used in forming node adjacencies and endpoint associations, in constructing maps, and in assigning locators to component nodes and endpoints.
2. Endpoint attributes. Each endpoint has a set of attributes used in forming node associations and in selecting routes.
3. Endpoint/locator associations. Nimrod endpoint locators are used in generating routes between and in forwarding messages toward those endpoints. Endpoint/locator associations are stored and accessed through the DNS.
4. Maps.
Each Nimrod node has a set of maps describing its traffic service offerings and adjacencies to other nodes, collectively called connectivity specifications and used in generating routes.
5. Routes. Routes are generated in response to requests on behalf of traffic sessions between endpoints. Route generation works within the constraints of the service requirements specified by the endpoints and the services and adjacencies advertised in maps. Each route is expressed in terms of nodes and their corresponding connectivity specifications and is used to construct forwarding information to be installed in routers.
6. Forwarding information. Nimrod traffic forwarding is path-oriented, where a path is defined by the forwarding state stored in routers according to the route selected and is under the control of the source and destination endpoints.

Databases relevant to but not maintained by Nimrod are the pool of NIDs that may be assigned to nodes and the pools of EIDs and names that may be assigned to endpoints. EIDs and NIDs may be drawn from the same number space.

Each of the Nimrod routing functions uses portions of the contents of one or more of the Nimrod databases. In a dynamic internetwork, the procedures for updating and retrieving the contents of Nimrod databases will be performed frequently. Therefore, each Nimrod database is organized to optimize the performance of these procedures along the dimensions of delay, internetwork resource consumption, fault tolerance, and load balancing.

Nimrod databases are maintained by a combination of routers, hosts, and special-purpose physical devices. The use of special-purpose devices means that routers and hosts do not have to assume all of the processing and memory load related to routing.
For example, as route generation is a computationally intensive procedure, some network managers may elect to use dedicated devices, distinct from routers, whose sole purpose is to generate Nimrod routes.

For most Nimrod databases, we suggest distributing database contents over several physical devices throughout an internetwork. In a large internetwork, one may in fact have no other choice; the memory required for a single Nimrod database may exceed the storage capacity of any one device. Also, we suggest distributing database contents with partial redundancy, such that each database entry is stored in more than one device. Distributed organization of Nimrod database contents helps to reduce the database maintenance and query-response costs borne by any one physical device. Partial redundancy helps to increase the availability of database contents and to reduce the costs of the average database query-response; it may also increase the cost of the average database update, however.

2.3.1 Nimrod Database Consistency

Each Nimrod database contains routing information crucial for successful communication between endpoints. Inconsistencies between the actual state of the internetwork and the state as reflected by Nimrod database contents may result in impaired communication between a pair of endpoints and, in the worst case, may completely disrupt communication among all endpoints. Thus, minimizing the number as well as the consequences of such inconsistencies is a primary objective of the Nimrod database maintenance procedures.

Inconsistencies between database contents and actual internetwork state may result from delays incurred in updating database contents following internetwork state changes. Many of the Nimrod databases are volatile and hence require mechanisms for keeping the contents current, in order to prevent propagation and use of stale routing information.
Database maintenance includes rapid and reliable updating with new information as well as removal of old information. We recommend that each Nimrod database be maintained as a cache, such that each entry has a finite lifetime and may be removed from the database when it expires. Cache entry lifetimes will depend upon the expected duration of the usefulness of the cached information.

Inconsistencies between database contents and internetwork state may also result from errors introduced directly into database contents. Errors in Nimrod database contents may be injected inadvertently, through faults in the transmission media or in physical device memory, through misconfiguration, or through incorrect implementation of the database maintenance procedures. Errors may also be injected intentionally by malicious parties, through distribution of fictitious database updates and responses to queries (by capturing and corrupting existing database messages or by generating new messages) or through modification of database maintenance procedures.

Updating and retrieval of Nimrod database contents involve frequent communication of routing information over an internetwork and hence expose this routing information to numerous potential opportunities for error introduction. Therefore, the protocols that carry out these procedures attempt to protect the routing information from introduced errors and malicious parties. In particular, the protocols for communicating information to or from a Nimrod database permit the intended recipient of that information to:

1. Authenticate the information.
2. Detect corruption of the information.
3. Determine whether the information received is newer than any related information the recipient already possesses.
4. Indicate to the sender the receipt of acceptable or unacceptable information.
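
The recipient-side checks listed above, together with the recommended cache lifetimes, can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the RoutingCache class, the sign helper, the shared key, and the use of HMAC-SHA256 are assumptions made for illustration only, and are not the authentication mechanism or data formats that Nimrod itself specifies.

```python
import hashlib
import hmac


def sign(key, payload, timestamp):
    """Compute a tag over the payload and its timestamp (HMAC-SHA256 is assumed)."""
    message = payload + b"|" + str(timestamp).encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class RoutingCache:
    """Hypothetical cache of routing information with finite entry lifetimes."""

    def __init__(self, key, lifetime):
        self.key = key            # shared secret, an assumption of this sketch
        self.lifetime = lifetime  # seconds an accepted entry stays usable
        self.entries = {}         # name -> (payload, timestamp, expires_at)

    def lookup(self, name, now):
        """Return the cached payload, dropping the entry once it has expired."""
        entry = self.entries.get(name)
        if entry is None:
            return None
        payload, _, expires_at = entry
        if expires_at <= now:
            del self.entries[name]
            return None
        return payload

    def accept(self, name, payload, timestamp, tag, now):
        """Return True (acceptable) or False (unacceptable) for the sender."""
        # Items 1 and 2: a tag mismatch means forged or corrupted information.
        if not hmac.compare_digest(sign(self.key, payload, timestamp), tag):
            return False
        # Item 3: reject anything not newer than what is already held.
        self.lookup(name, now)  # purge an expired entry first
        held = self.entries.get(name)
        if held is not None and timestamp <= held[1]:
            return False
        self.entries[name] = (payload, timestamp, now + self.lifetime)
        return True
```

The boolean returned by accept stands in for item 4, the indication sent back to the sender. A real agent would carry that indication in a protocol message rather than a return value.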

These protocols also permit the sender to retransmit to the intended recipient any information that it perceives the recipient has failed to receive successfully.

We note that while Nimrod requires consistency between database contents and internetwork state, it does not require different physical devices to maintain identical views of internetwork state (e.g., two different routers might maintain different maps for the same node, both of which are consistent with the physical assets of that node). Furthermore, Nimrod does not require consistency in route selection across different physical devices (e.g., two different routers might select routes to the same destination such that each router is included in the other's route). The underlying path-oriented nature of message forwarding in Nimrod enables loop-free forwarding in the presence of such inconsistencies in route selection among routers.

2.4 Nimrod Agents

Within an internetwork, each Nimrod database is stored in a set of physical devices. Each physical device containing a portion of a Nimrod database executes a set of functions, called a Nimrod agent, for manipulating the database contents. A single device may contain portions of more than one Nimrod database and hence may contain more than one Nimrod agent. Each Nimrod agent is a Nimrod endpoint.

For each Nimrod database, certain agents are responsible for maintaining specific portions. Such an agent is designated as an authoritative source for that portion of the database. A specific portion of a database may have multiple authoritative sources. Each agent is an authoritative source for some portion of a database but may also obtain and cache information learned from authoritative sources for other portions of that same database.
In addition to receiving unsolicited database updates, a Nimrod agent may also refresh its database by querying other agents of the same type for their database contents.

Nimrod agents and databases are organized according to the clustering hierarchy, such that each node has a set of agents that act on its behalf to answer or forward database queries. The Nimrod agents and their corresponding functions are as follows:

1. Node representatives. Each Nimrod node must have one or more representatives which maintain the database of the node's attributes and act on its behalf. A node representative is responsible for forming adjacencies with other nodes; forming associations with endpoints; assigning locator elements to component nodes and endpoints; receiving maps from component nodes; and constructing its node's maps. All node representatives for a given node must construct the same map for a node, i.e., must use the same algorithms for map construction. Node representatives are authoritative sources for the maps of the nodes they represent.
2. Endpoint representatives. Each Nimrod endpoint must have one or more representatives which maintain the database of the endpoint's attributes and act on its behalf. An endpoint representative is responsible for forming associations with nodes; discovering, through the DNS, locators of the endpoints with which its endpoints wish to communicate; requesting routes to those endpoints by querying route agents, and ensuring that the routes satisfy its endpoints' service requirements; initiating path setup along the selected routes; and forwarding data along the established paths. Endpoint representatives are authoritative sources for the locators and service requirements of the endpoints they represent.
3. Route agents.
Each node may have one or more route agents responsible for collecting maps from nodes throughout the internetwork and for generating and dispensing routes based on endpoint service requirements and node connectivity specifications advertised in maps. Route agents are authoritative sources for the routes they generate.
4. Forwarding agents. Each node must have one or more forwarding agents responsible for initiating neighbor relationships with forwarding agents in other nodes; requesting routes; installing forwarding information in routers; forwarding messages along established paths; and controlling traffic flow into and out of the node according to the node's access restrictions. While each of these functions could be performed by a different type of agent, we have elected to concentrate them in the forwarding agents, in order to minimize the number of different agent types performing the Nimrod routing functions. Forwarding agents are authoritative sources for the portion of the paths that traverse them.

Agents acting on behalf of a node need not reside within that node. Nevertheless, we recommend locating all Nimrod agents (and their databases) close to the entities on whose behalf they act. Such location minimizes delay and internetwork resource consumption when updating the databases corresponding to those entities and in responding to queries from other agents in the vicinity of those entities. A Nimrod agent residing external to the node on whose behalf it acts must be configured with location information for that node, and in some cases for ancestral and descendant nodes as well, in order to communicate with other agents that act on behalf of the same node. Also, the forwarding agents within a node must be configured with the location of agents external to but acting on behalf of that node.
We recommend placing all agents within the node on whose behalf they act, and henceforth we describe agent behavior from this perspective.

3 Nimrod Operation : An Overview

This section describes key operating procedures in Nimrod from the viewpoint of how the various databases are managed. The description is organized by the pertinent database. Specifically, maps (construction, dissemination), routes (generation, acquisition), locators (acquisition, notification, release), adjacencies (formation, release), paths (setup, teardown, forwarding), and discovery (of neighbors, agents) are addressed.

This section provides only a brief summary of the operations. For a detailed exposition, the reader is referred to [2]. In the postscript version of this document, the reader may refer to Figures 1-4 (given in Appendix 1) for a quick overview of some of these operations. Throughout this section, the text contains references in parentheses to labels in these figures.

3.1 Maps

Each Nimrod node has a set of maps describing its traffic service offerings and its adjacencies. Maps are maintained and updated by node representatives. A node representative maintains two kinds of maps for its node: a basic map that depicts the child nodes, the adjacencies between child nodes, and the adjacencies between child and external nodes, plus the services provided by each of these nodes; and an abstract map that depicts the node, its adjacencies to other nodes, and the services provided by the node between any pair of such node adjacencies.

A node representative may construct its abstract map using information obtained from abstraction of the basic map, configuration, or measurement of service qualities across the node. Abstraction mechanisms are not a part of the Nimrod specifications. Rather, each node may choose to implement its own abstraction algorithm (uniform throughout a given node).
Maps are updated using map updates, and obtained using map queries, as described below.

3.1.1 Map Update

Map updates are distributed using the Update protocol described in section 5. Maps are automatically updated in response to topological changes using constrained hierarchical flooding, according to the following procedure.

The update originates from a representative (Nr in Figure 1) of the node whose abstract map has changed. This node representative sends (arrow 1 in Figure 1) the new abstract map to each boundary forwarding agent (F) which is a neighbor of a boundary forwarding agent of its parent. Note that sending to each forwarding agent is necessary in order to handle the case of a partitioned node. F in turn forwards (arrow 2) the update to each parent boundary forwarding agent (Fp) to which it is adjacent. That Fp sends (3.x) the map to all of the node representatives (Nrp) and all of the route agents (Rp) in the parent node.

The change in the abstract map of a node causes a change in the basic map of the parent node. This in turn may or may not cause a change in the abstract map of the parent node. If it does, then a designated node representative originates another update to the next higher level by sending (4) the new abstract map to a boundary forwarding agent. The procedure described above now applies again at this higher level. In the worst case, a change in a node's abstract map may force changes in all of its ancestral nodes' abstract maps. We expect such changes to be rare, especially in nodes whose descendants are multiply connected.

3.1.2 Map Request and Release

Map requests, responses, and releases are transmitted using the Query-Response protocol described in section 6. A map request is sent by a route agent, acting on behalf of an endpoint wishing to obtain or subscribe to the map of a node for route computation. A map release cancels the subscription to the map of the node for which a subscribe request was sent.
Our description below is in terms of the map request; the map release is very similar. The kinds of maps that can be requested are as follows:

1. Abstract Maps. Two kinds of abstract maps can be requested: abbreviated or full. An abstract map is full if it contains service information (i.e., connectivity specifications) and abbreviated if it does not.

2. Basic Maps. Two kinds of basic maps can be requested: complete or partial. A basic map is complete if it contains abstract maps of all component nodes and is partial if it contains maps of only a proper subset of the component nodes. The abstract maps of the component nodes contained in a basic map may be either all full or all abbreviated.

A route agent sends the map request towards the targeted node. A flag in the request indicates whether or not a subscription is requested, i.e., whether updates are to be sent automatically. When the map query reaches a boundary forwarding agent for the targeted node, this forwarding agent relays the query to a node representative for that node. The node representative responds to the route agent with the largest subset of the requested map that is consistent with the map distribution restrictions. If it does not have the maps to fulfill the query, or if its restrictions do not permit it to respond, it still sends a reply to the requestor containing a reason for failure. A node representative is not required to support the map subscription service.

3.2 Routes

Route agents use the maps obtained, either through automatic updates or in response to explicit map requests, to generate routes. Endpoint representatives and forwarding agents obtain these routes from the route agent using route requests. A route specification is expressed in terms of nodes and the connectivity specifications through those nodes, and is used to specify forwarding information to be installed in routers.
It also lists the services provided by the route.

3.2.1 Route Generation

The input to route generation includes maps of the topology, a set of session service requirements, and the source and destination node locators. Its output is a route or set of routes, or an indication that no route can be found. Each route contains a sequence of node locators and connectivity specification labels for nodes that have to be traversed in order to meet the service requirements.

We note that the topology used for route generation typically represents the network in varying levels of detail for different regions. Thus, the route constructed by the route generation algorithm will typically not contain the complete list of all routers through which a datagram should pass. The details are filled in at the node where they are required, in a recursive fashion, when setting up a path or forwarding datagrams (see section 7 for details).

Route generation algorithms are not specified by Nimrod. Rather, each route agent may choose to implement its own route generation algorithm, even within a single node.

3.2.2 Route Request and Release

Route requests, responses, and releases are transmitted using the Query-Response protocol described in section 6. An endpoint representative or a forwarding agent that wishes to obtain a route to a destination may request a route from a route agent. The route request contains the source endpoint's EID and locators, the destination endpoint's EID and locators, and the source and/or destination traffic service requirements. Note that if there are no strict traffic service requirements, in terms of quality, monetary cost, or access restrictions, a route may not need to be acquired (the source and destination endpoint locators may suffice for the route). Note also that the route agent to which a route request is sent need not be in the same node as the requestor.
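
To make the inputs and outputs of route generation (section 3.2.1) concrete, the following is a minimal sketch only. Nimrod deliberately does not specify a route generation algorithm, so everything here is an assumption made for illustration: the map is reduced to a dictionary of directed adjacencies, the only service attribute is an additive delay, the single service requirement is a hypothetical `max_delay` bound, and the search is an ordinary least-delay (Dijkstra) search. Connectivity specification labels are omitted.

```python
import heapq
from typing import Optional


def generate_route(topology: dict, source: str, destination: str,
                   max_delay: float) -> Optional[list]:
    """Return a sequence of node locators from source to destination whose
    total delay does not exceed max_delay, or None if no route is found.

    topology maps a node locator to {neighbor locator: link delay}.
    Adjacencies are directed, since they need not be symmetric.
    """
    best = {source: 0.0}
    queue = [(0.0, source, [source])]
    while queue:
        delay, node, route = heapq.heappop(queue)
        if node == destination:
            # The first time the destination is popped, delay is minimal.
            return route if delay <= max_delay else None
        if delay > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, link_delay in topology.get(node, {}).items():
            new_delay = delay + link_delay
            if new_delay < best.get(neighbor, float("inf")):
                best[neighbor] = new_delay
                heapq.heappush(queue, (new_delay, neighbor, route + [neighbor]))
    return None


# Hypothetical three-node map: A reaches B directly (delay 5) or via C (2 + 1).
example_map = {"A": {"B": 5, "C": 2}, "C": {"B": 1}, "B": {}}
print(generate_route(example_map, "A", "B", max_delay=4))
print(generate_route(example_map, "A", "B", max_delay=2))
```

The second call illustrates the "no route can be found" outcome: the requirement is tighter than any route the map can offer.
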

A route request may also be a subscription to a route, i.e., updated routes are automatically sent to the subscriber. A route release is used to unsubscribe from a specific route. Route agents are not required to support the route subscription service in this version of Nimrod.

In response to a route request, a route agent first searches its route database for a set of feasible routes and, if unsuccessful, invokes the route generation algorithm. The route agent may, in the process of attempting to generate feasible routes, obtain more maps of nodes using the map query procedure described in section 3.1.2. A route agent responds to the requestor with either a set of feasible routes or an indication that no feasible route could be found.

3.3 Locators

Nimrod nodes and endpoints require locators for routing. Each node has exactly one locator and each endpoint is associated with at least one locator. Locators are assigned during initialization following activation, reconfiguration, or movement. The representatives of a node are responsible for acquiring the node locator, and the endpoint representative is responsible for acquiring the locators for each of its endpoints.

3.3.1 Acquiring and Releasing Node Locators

Node locator requests, responses, and releases are transmitted using the Query-Response protocol described in section 6. Node locator acquisition involves two phases. First, the designated representative of a node acquires a locator from a representative of its parent node. Next, it notifies all of the representatives within its node and all descendant nodes of the existence of the new locator. The first phase is illustrated in Appendix 1, Figure 2 and the second phase in Figure 3.

The designated node representative (Nr in Figure 3) wishing to acquire a locator for a node Z first determines the node P from which its locator should be acquired.
This node representative (Nr) then sends (1) a locator request message to a boundary forwarding agent (F), which forwards (2) the locator request message to the neighboring boundary forwarding agent in the parent node, which in turn relays (3) the message to a node representative for P (Nrp). The request contains information that enables the recipient node representative (Nrp) to evaluate the request and decide whether it can or wants to honor the request. The node representative responds (4) either with a new node locator, unique among the locators of P's component nodes (if it decides to honor the request), or with a denial otherwise. Upon receiving a positive response, the originating node representative (Nr) decides whether to accept the locator.

If it (Nr) decides to accept the locator, the node representative starts the notification phase (see Figure 3). Locator notifications are distributed using the Update protocol described in section 5. The node representative notifies all agents in its node (arrows 5.x), and all agents in Z's descendant nodes, of the new locator. The latter is done by forwarding the message to each forwarding agent in Z which is a neighbor of a forwarding agent for one of Z's component nodes. These forwarding agents (Fc) in the component nodes distribute the message to all agents in the component nodes, including boundary forwarding agents (Fcc) to their children, and so on until the locator trickles down to all of the nodes that have Z as an ancestor.

A locator release may be sent by a node representative of a node wishing to release a locator. This could happen, for instance, if an organization changes its service provider, or due to mobility of a network. The node representative includes the old locator to be released, and the locator and EID of the node representative that issued the locator, in its locator release message to its parent.
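
The two phases of node locator acquisition can be sketched as follows. This is an illustration under stated assumptions, not the protocol itself: the `Node` class, the representation of a locator as a tuple of integer elements, and the counter used to pick a unique element are all hypothetical, and the relaying through forwarding agents, the decision to honor or deny a request, and the decision to accept a locator are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node together with its representative."""
    name: str
    locator: tuple = ()              # assumed form: sequence of locator elements
    element: Optional[int] = None    # element assigned to this node by its parent
    children: list = field(default_factory=list)
    _next_element: int = 1           # naive allocator, an assumption

    def grant_locator(self, child: "Node") -> tuple:
        """Phase 1 (parent side): grant a locator that is unique among
        the locators of this node's component nodes."""
        child.element = self._next_element
        self._next_element += 1
        self.children.append(child)
        return self.locator + (child.element,)

    def notify(self, new_locator: tuple) -> None:
        """Phase 2: record the new locator and let it trickle down to
        every node that has this node as an ancestor."""
        self.locator = new_locator
        for child in self.children:
            child.notify(new_locator + (child.element,))


p = Node("P", locator=(7,))
z = Node("Z")
c = Node("C")
z.grant_locator(c)             # C is a component node of Z
z.notify(p.grant_locator(z))   # Z acquires its locator from P, then notifies
print(z.locator, c.locator)
```

In this sketch a change of Z's locator rewrites the locator of every descendant, which is why the notification must reach all nodes that have Z as an ancestor.
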

3.3.2 Acquiring and Releasing Endpoint Locators

Endpoint locator requests, responses, and releases are transmitted using the Query-Response protocol described in section 6. An endpoint representative attempts to acquire a set of locators for each of its endpoints. The endpoint representative, say Er, selects a set of target nodes and, for each selected node Z, sends a locator request message identifying Z (label 5 in Figure 2) to a node representative for Z. As in the case of node locators, if this node representative decides to honor the request, it sends a response (6) containing the locator.

3.4 Adjacencies

An adjacency is a neighbor relationship formed between two nodes that are physically joined. The neighbor relationship need not be symmetric, i.e., node A may be adjacent to node B but not vice versa. Adjacencies of a node Z to an external node Y may be formed by clustering together adjacencies of component nodes of Z to Y. At the lowest level, adjacencies are the physical connections themselves.

3.4.1 Acquiring, Activating, and Releasing Adjacencies

Adjacency requests, responses, releases, and activations are transmitted using the Query-Response protocol described in section 6.

A single designated node representative is responsible for forming adjacencies between its node and neighboring nodes. When forming adjacencies by clustering existing adjacencies (or physical connections), the node representative obtains candidate external adjacencies from the node's basic map and groups these adjacencies according to which of their destination nodes are components of the same enclosing node. This information defines the target node for the adjacency formation requests.

For each candidate adjacency, the node representative initiates an adjacency formation procedure (depicted in Appendix 1, Figure 4). The node representative (Nr) begins by sending an adjacency request (1) to a node representative for the specified node (Nrs).
Using information present in this request, the recipient node representative (Nrs) determines whether or not to honor the request, and replies (2) to the requesting node representative (Nr) with its decision. If the response is positive, then the requesting node representative decides whether or not to accept the adjacency. If it decides to accept, then it notifies (3.x) all of the node representatives in its node of the newly formed adjacency. The node representative at the other end of the adjacency (Nrs) also updates (4.x) all node representatives within its node. The adjacency updates to node representatives in the nodes forming the adjacency are distributed using the Update protocol described in section 5. In addition, the adjacency is ``activated'' by having the two node representatives inform the respective boundary forwarding agents constituting the adjacency that Nimrod data traffic may now be passed.

If a node representative receives a negative reply to an adjacency request message, the reply may contain information indicating that the adjacency is not appropriate.

An adjacency is terminated by sending an adjacency release request to the node representative which granted the adjacency. Management decisions and lack of data for a specified period of time may be other reasons for terminating an adjacency.

3.5 Paths

Nimrod supports two distinct data message forwarding modes: flow and datagram. For each mode, a forwarding agent's ``next-hop'' forwarding decision is dictated by the information stored in its forwarding database and by information contained within the message to be forwarded.

Flow mode requires the establishment of session-specific forwarding state in certain forwarding agents along the routes selected for a traffic session. With flow mode, each session is assigned one or more paths, derived from the selected routes.
A path corresponds to forwarding state stored in forwarding agents along a route, and each path has a label which is unique within all of these forwarding agents (but not necessarily globally unique). Distinct traffic sessions may use the same path, and distinct paths may use the same route. The minimum forwarding state required for flow-mode forwarding includes linkages between the path label and the path's previous- and next-hop forwarding agents (and service guarantees for traffic control, if any). In flow mode, data messages carry the path label(s) that guide the message forwarding decisions at forwarding agents along the path.

Datagram mode does not require the establishment of any session-specific forwarding state. In datagram mode, data messages carry a description of the selected route, which guides the message forwarding decisions at forwarding agents along the route. Each forwarding agent at the beginning of a route segment (the portion of a route between two successive nodes listed in a route specification) makes an independent forwarding decision for that segment, and hence the session source and destination relinquish some control over message forwarding. However, datagram mode provides robust forwarding, in the sense that the intermediate forwarding agents can base their message forwarding decisions on the current state of their portion of the internetwork.

Both of the Nimrod forwarding modes rely on the existence of underlying paths to fill in route segments. A path may connect a source endpoint to one or more destination endpoints. Forwarding agents execute path management procedures to install path state in and remove path state from their forwarding databases. With these procedures, Nimrod provides support for management and use of unicast and multicast paths.
We note, however, that multicast group management and multicast route construction are not part of this initial version of Nimrod. These and other multicast issues are treated in detail in [4]. For simplicity of discussion, we focus on unicast paths in the remainder of this section.

3.5.1 Path Setup

Paths may be set up from source to destination or from destination to source. Each path has an initiator and a target. We expect that most paths will be set up from the source endpoint to the destination endpoint. Hence, the initiator usually begins the path setup procedure on behalf of the source endpoint, and the target usually accepts or rejects a path on behalf of the destination endpoint.

Nimrod paths are inherently multilevel, as follows. We begin with a single path, p^0, derived from the selected route between the source and destination endpoints for the traffic session. (The superscript indexes the level of the path, where the top level is 0.) This path comprises multiple contiguous paths, p^1_1, ..., p^1_n, one for each of the n segments of the route on which p^0 is based. (The subscript indexes the path for the corresponding route segment of the higher level path.) Each p^1_j itself comprises multiple contiguous paths corresponding to each of its segments, and so on. In general, for each p^i_j composing p^{i-1}_k = p^i, the initiator and target of p^i_j maintain linkages to the path p^{i-1}_k (p^i), which helps to guide forwarding along the successive segments of p^{i-1}_k (p^i).

Forwarding agents and endpoint representatives try to form paths by piecing together existing paths rather than by setting up new paths. This method provides the lowest-cost message forwarding in terms of the amount of route generation and forwarding state installation required. In a busy internetwork, there are likely to be many existing paths, and hence we expect this mechanism to be much less expensive than individually setting up and maintaining paths for each traffic session.
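
The multilevel structure of paths can be pictured as a tree in which each path is realized by the contiguous paths one level below it. The sketch below is illustrative only: the `Path` class, its field names, and the string labels are hypothetical, and real path labels and forwarding-database linkages are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """A path at some level; level 0 is the top-level path p^0."""
    label: str
    level: int = 0
    segments: list = field(default_factory=list)  # constituent paths, one level down

    def add_segment(self, label: str) -> "Path":
        """Attach a constituent path for the next route segment."""
        child = Path(label, self.level + 1)
        self.segments.append(child)
        return child

    def lowest_level(self) -> list:
        """Return, in order, the lowest-level paths that realize this path."""
        if not self.segments:
            return [self]
        realized = []
        for segment in self.segments:
            realized.extend(segment.lowest_level())
        return realized


p0 = Path("p^0")
p11 = p0.add_segment("p^1_1")   # first segment of the route for p^0
p12 = p0.add_segment("p^1_2")   # second segment, not subdivided further
p11.add_segment("p^2_1")        # p^1_1 is itself pieced together
p11.add_segment("p^2_2")
print([path.label for path in p0.lowest_level()])
```

Piecing together existing paths corresponds to attaching an already established `Path` object as a segment instead of creating a new one.
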

We now describe how a new traffic session uses paths at multiple levels, distinguishing the actions in flow mode and datagram mode where appropriate.

Whenever the endpoint representative receives a data transport request, it first checks whether there already exists a satisfactory path for the session. This is true whether the new session desires flow or datagram mode forwarding. If a satisfactory path exists, the endpoint representative links the session to the path and forwards session traffic along that path. If no such path exists, however, the endpoint representative attempts to obtain a feasible route for the session. Note that route generation might not be required and that a feasible route might include only the source and destination locators.

After obtaining a feasible route, the endpoint representative proceeds to determine where to install the necessary forwarding state.

Flow mode: The endpoint representative becomes the initiator of a new path, p^0, and generates a path setup message.

If the route contains more than the source and destination locators, the endpoint representative then checks whether there already exists a satisfactory path for the session from itself to the next node in the route specification. Provided such a path, p^1_1, exists, the endpoint representative proceeds as follows:

Flow mode: The initiator of p^1_1 (which is also the initiator of p^0) links p^0 and p^1_1 in its forwarding database and sends p^0's setup message to the target of p^1_1. Upon reception of the setup message, the target of p^1_1 also links p^0 and p^1_1 in its forwarding database.

Datagram mode: The initiator of p^1_1 sends the data message to the target of p^1_1.

If no satisfactory path, p^1_1, yet exists between the first two nodes in the route specification, the endpoint representative attempts to form such a path by piecing together existing paths.
The endpoint representative attempts to find an existing path whose destination locator is the longest match on the next node's locator and is within the context of the two nodes (i.e., the lowest node in the hierarchy that contains both nodes). If such a path, p^2_1, exists, the endpoint representative proceeds as it did with p^1_1.

If the endpoint representative fails to find a satisfactory path to any of the second node's ancestral nodes contained within the context, then there are no ``direct'' paths to the second node. The endpoint representative then seeks a path up to its node's enclosing node, as there are likely to be more existing paths between higher-level entities. To this end, the endpoint representative checks whether there already exists a satisfactory path whose target is an exit point of its node's enclosing node. If such a path, p^2_1, exists, the endpoint representative proceeds as it did with p^1_1. Otherwise, the endpoint representative attempts to obtain a feasible route and set up a path, p^2_1.

If there are no short-cut paths from its node's enclosing node to any of the second node's ancestral nodes in the context, the above procedure may need to be repeated for successively higher-level ancestors of the endpoint representative's node, up to but not including the context. If no short-cut paths exist at any of these levels, a route must be generated and a path set up, from the node below the context and containing the first node to the second node.

This iterative path formation procedure is performed by the target of each path thus selected, which then becomes the initiator for the path for the next segment, and so on. Note that in the above description, the words ``endpoint representative'' should be replaced by ``forwarding agent'' when referring to the actions taken by intermediate agents along a path.
The procedure terminates after attaining the last node in the route and the target endpoint representative in that node, possibly linking together paths at many different levels.

In the presence of multilevel paths, each data message carries a nested sequence of path labels, in order to enable all forwarding agents involved in the paths to forward the message correctly. Intermediate forwarding agents update the path labels in the message, according to the linkages between paths stored in their forwarding databases. Upon receipt of a data message, the target of path p^j_i finds it is linked to path p^{j-1}_k = p^j, which in turn is linked to path p^j_{i+1}, and hence is the initiator of p^j_{i+1}. This forwarding agent strips off the label for p^j_i and replaces it with the label for p^j_{i+1}, before forwarding the message along that path.

3.5.2 Path Acceptance

Each setup message in flow mode and each data message in datagram mode contains the route specification and additional service requirements, such as resource reservation requests. A boundary forwarding agent or endpoint representative receiving a setup or datagram message determines message acceptability. Acceptability is in part based on the perceived consistency between the route specification and service requirements contained in the message and the service attributes of each node traversed. When a forwarding agent refuses a setup message, it informs the other forwarding agents on the path between and including itself and the initiator.

At the target, once a setup message passes the service attribute consistency check, it must also pass an endpoint-specific consistency check. In particular, the target determines the perceived consistency between the route specification and service requirements contained in the message and the service requirements of the target endpoint. Each target that accepts a setup message informs the initiator.
If there is an inconsistency with the target endpoint's service requirements, the target takes one of two actions, depending upon whether the target is the path's destination or source:

1. If the target is at the destination endpoint, it returns to the initiator a message containing its endpoints' destination service requirements. The initiator is then responsible for obtaining a route and setting up a path that is consistent with both the source and destination service requirements.

2. If the target is at the source endpoint, it returns to the initiator a message indicating that it will generate its own route. The target is then responsible for obtaining a route and setting up a path that is consistent with the source service requirements and the destination service requirements contained in the setup message.

Any forwarding agent or endpoint representative may tear down a path by removing the corresponding forwarding state from its forwarding database. Reasons for path teardown include:

o Detection of a connectivity failure along a path.

o A change in node service attributes or traffic service requirements such that the route on which the path is based is no longer feasible.

o Path expiration if a path exceeds a maximum prescribed lifetime.

o Path preemption in favor of another path.

3.6 Control Message Integrity and Authentication

Nimrod control messages (all messages except data messages are considered to be control messages) include several pieces of information which permit recipient agents to determine whether the message has been corrupted in some way.
In addition to information on type and length of various sections of the message, each Nimrod control message contains its generation timestamp, expressed in seconds elapsed since 0 hours on 1 January 1900 (same format as the NTP timestamp [6]), as well as ``authentication'' information that simultaneously acts as a checksum and as source authentication.

3.6.1 Timestamps

Timestamps establish message recency and hence help recipients detect message replays. In order to detect whether a Nimrod control message is timely, the recipient agent compares its local time with the timestamp contained in the control message. If the timestamp is less recent than the local time by no more than delta seconds or more recent than the local time by no more than epsilon seconds, the message is considered to be timely. Otherwise, the message is considered to be out-of-date.

Nimrod agents do not require fine-grained time synchronization in order to make their message recency determinations. Time synchronization on the order of minutes is all that is required. In fact, periodic manual adjusting of local clocks should be sufficient to maintain the necessary synchronization among agents.

3.6.2 Authentication

This initial version of Nimrod does not contain any specification of security measures for Nimrod, but rather placeholders for such security measures to be introduced in a future version. Nevertheless, we do make recommendations for what these security measures might be.

Most Nimrod control messages are generated by a single agent but distributed to many different agents, and most parts of these messages remain constant as the messages are passed among agents.
To prevent communication problems caused by errors introduced into these messages, which carry routing-related information, each recipient agent should be able to determine with high confidence whether the message has indeed been generated by the stated source and whether the constant portions of the message have been modified since being generated by that source.

We recommend that each Nimrod control message carry a public-key-based digital signature covering a reduced form of the constant portions of the message (e.g., apply the MD5 hashing function followed by the RSA signing function to the constant portions of the message). The authentication information may also include the public key to be used to verify the signature, together with its certificate. While the RSA signing procedure is computationally intensive, signature verification is not. As long as control message generation at a particular agent is infrequent, that agent should be able to handle the load imposed by signing. Discovery messages are the only control messages generated frequently (i.e., inter-message period on the order of tens of seconds); an alternative mechanism may be required to protect these messages.

The authentication information field in Nimrod control messages is represented as type, length, and value and hence is able to accommodate any integrity and security information that may be desired or required in the future.

4 Reliable Transaction Protocol

Many Nimrod control messages require reliable delivery. Rather than have each agent duplicate this reliability functionality, Nimrod includes a reliable transaction service, which provides its clients the ability to reliably communicate arbitrary size blocks of information between a client and a peer and to receive an arbitrary size reply in approximately one round-trip time. The reliable transaction service is built on Transaction-TCP (T/TCP) [10], [9].
T/TCP is a backwards-compatible extension of TCP, which is optimized for request/response interactions. In particular, T/TCP may bypass the normal three-way handshake required at TCP connection setup time. This bypass is accomplished by adding a ``connection count'' option in the TCP header, and by maintaining per-host connection history at both client and server. This information allows the server to correctly distinguish a new connection open (SYN, no ACK) from a duplicate or out-of-order open, without shaking hands with the client. Using T/TCP, the client can obtain a response to a request message in one round-trip time to the server and back (plus the server's processing time). T/TCP uses the normal three-way close handshake; this does not affect transaction latency.

4.1 Services Interface

The reliable transaction service permits one or more transactions to be invoked at a peer. The service interface is:

o Flags, miscellaneous flags.

o Source locator and EID of the transaction.

o Destination locator and EID of the transaction.

o Keying info, for authentication purposes.

o Service requirements for the transactions.

o Transaction, beginning with a protocol header (e.g., Query-Response or Update).

Each transaction uses a separate TCP connection. We note that this may cause excessive overhead if the client(s) invoke many transactions within a short period of time, and is an issue to be examined more carefully in future versions of Nimrod.

The beginning of each transaction is the transaction header, containing the following fields. The packet formats are illustrated and specified values given in section 9.4.

o Length (32 bits) of the transaction, including this field.

o Version (2 bits) of Nimrod update and query-response protocols.

o Protocol (2 bits) identifier. Whether Update, Query, Response, or Discovery message.

o Operation (4 bits). Particular operation within the protocol.
o Phase (8 bits). The Update protocol uses several phases for certain operations. This denotes the current phase.
o Transaction ID (16 bits). To identify the transaction.
o Timestamp (32 bits). Seconds since 1/1/1900, 00:00.

The user may abort an initiated transaction at any time. Note that race conditions are possible, as the aborted operation may have actually been performed by the peer. There is no rollback facility provided by the reliable transaction service.

5 The Update Protocol

The Update Protocol is used to update database contents (e.g., the map database). The peers in the Update Protocol are the Nimrod agents, currently including the forwarding agents, node representatives, route agents, and endpoint representatives. These agents participate in the distribution of the updated information in the required portion of the network. The implicit flooding constituting the protocol is carefully constrained by involving only a few agents per update.

5.1 Service Interface

The Update Protocol offers a distributed database update service in a manner that renders the exact locations of the database transparent to the user. The portion of the distributed database updated is dependent on the particular database and the operation as indicated by the user. The service interface includes the following:

o Source. EID and optionally locator of the agent initiating the update.
o Destination. EID and optionally locator of a specific agent (e.g., endpoint representative whose locator has changed). May be left unspecified.
o Operation. Indicates what kind of update (i.e., for what database) it is. Current values are shown in section 9.
o Keying info for authentication purposes.
o Service requirements if any. Will be ``best effort'' if unspecified.
o Patience. A time interval within which the user wishes to hear about the success of the update.
The Update Protocol provides hop-by-hop reliability, but no effort is made to ensure end-to-end reliability. An agent that has initiated an update cannot be certain that the message has been delivered to all intended agents, or that all intended database portions have been updated. We believe that the resource and complexity overhead demanded by an end-to-end reliability mechanism is not justified by the importance of database updates (note that Nimrod does not require absolute database consistency; see section 2.3.1).

The user may abort a previously initiated update, for instance, because an update is superseded by more recent information. The protocol will discard the update if it has not already been sent out. However, no rollback facility is provided.

5.2 Protocol Operation

The Update Protocol consists of an Update Message that is generated by the agent wishing to make an update to a particular database. (The Update Message consists of a variable length database specific portion, described in section 5.3, prepended by a common update protocol header, described in section 5.2.1 below, prepended by the transaction header, described in section 4.) The database may be held redundantly or cooperatively by multiple agents in a node, and an update may involve several nodes in the hierarchy. Thus, the update protocol involves several cooperating communicating agents.

We classify the participating (peer) agents into two classes for ease of description: the originating agent and the transit agents. The originating agent forwards the message to one or more agents which further forward it to other agents and so on, until all the necessary database locations have been updated. Once an originating or transit agent has successfully forwarded a message, it does not retain any state corresponding to the message. The originating and transit agent operations are described in more detail later in this section.
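The layering described above places the transaction header of section 4 at the front of every Update Message. Its listed fields total 96 bits, or 12 bytes. As a rough sketch only, the following Python packs and unpacks those fields, assuming that they appear on the wire in the order listed in section 4, in network byte order, and that Version, Protocol, and Operation share a single byte. The authoritative layout is the one illustrated in section 9.4; the class and field names here are illustrative and not part of the specification.

```python
import struct
from dataclasses import dataclass

# Assumed layout (big-endian): Length(32) | Version(2) Protocol(2) Operation(4)
# | Phase(8) | Transaction ID(16) | Timestamp(32). Section 9.4 is authoritative.
_FMT = "!IBBHI"
HEADER_LEN = struct.calcsize(_FMT)  # 12 bytes


@dataclass
class TransactionHeader:
    length: int          # length of the whole transaction, including this field
    version: int         # 2 bits
    protocol: int        # 2 bits
    operation: int       # 4 bits
    phase: int           # 8 bits
    transaction_id: int  # 16 bits
    timestamp: int       # seconds since 1 January 1900, 00:00

    def pack(self) -> bytes:
        if not (0 <= self.version < 4 and 0 <= self.protocol < 4
                and 0 <= self.operation < 16):
            raise ValueError("version, protocol, or operation out of range")
        # Fold the three small fields into one byte.
        vpo = (self.version << 6) | (self.protocol << 4) | self.operation
        return struct.pack(_FMT, self.length, vpo, self.phase,
                           self.transaction_id, self.timestamp)

    @classmethod
    def unpack(cls, data: bytes) -> "TransactionHeader":
        length, vpo, phase, tid, ts = struct.unpack(_FMT, data[:HEADER_LEN])
        return cls(length, vpo >> 6, (vpo >> 4) & 0x3, vpo & 0xF, phase, tid, ts)
```

A receiver would read the first 12 bytes of the TCP data, call TransactionHeader.unpack on them, and then use the Length field to find the end of the transaction.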
The Update Message uses the reliable transaction service (see section 4). Since no effort is made to provide end-to-end reliability, no acknowledgements (positive or negative) are part of the Update Protocol. Exceptions are handled by making a log entry into a file.

The actions performed by an agent upon receipt of an Update Message are a function of the receiving agent type and of the user-supplied Phase of the message, which is contained in the transaction header. Examples of operation types are map update, locator update, etc. The actions include the decision of whether to cache the message or not, and whether to forward the message further, and if so to whom. We specify such actions corresponding to each agent type (columns) and operation type (rows) pair using an Update Message Action Table (UMAT) shown in section 5.2.4.

We note that the update protocol is an application level protocol between a set of peer agents indicated in the destination field of the Nimrod header (or the IP header). In transit between these peer agents, the Update Message may be forwarded through other intermediate agents, which are not peers in the protocol. For instance, an update message from agent A1 to agent A2 may go through (forwarding) agents a1, a2, ..., ak before reaching A2. However, such an agent ai is not a peer, and does not act upon the message using the UMAT.

5.2.1 Update Header

Each item in the common update header is explained below. The packet format of the header is illustrated in section 9.5.

o Originating agent type (8 bits). The type of agent originating the update. Current agent types include Forwarding Agent, Endpoint Representative, Node Representative, and Route Agent.
o Destination agents type (8 bits). The type(s) of agent(s) for whom the update is intended. For multiple agents, the field contains a bitwise-OR of respective agent types.
o Flags (16 bits). Miscellaneous operation dependent flags.
o Database Type (8 bits). The type of the database that is being updated. At most one kind of database can be updated with an update message.
o Database timestamp (24 bits). Denotes the origination time of the message with respect to the originating agent EID. That is, the timestamp and the EID together identify the packet uniquely, modulo wraparounds. The timestamp is the lower 24 bits of the current time in seconds, beginning 0:00 1 January 1900.
o Destination NID (8 bytes). The update is restricted to this node. Refer to section 11.5 for NID format.
o Originating EID (8 bytes). EID of the agent that issued the update. Refer to section 11.3 for EID format.
o Originating Locator. Locator of the agent that issued the update. Refer to section 11.2 for locator format.
o Authenticator. Authentication field. Contains authentication information for the agent originating the update.

5.2.2 Originating Agent Operations

An originating agent issuing the update constructs the database-specific information of the Update Message (the update) and fills in the transaction and common update protocol headers, including the timestamp that is incremented for every update originating from the agent. Note that the user-specified Operation is placed in the ``operation'' field of the transaction header. The UMAT is then consulted to obtain the actions, which typically involve sending the Update Message to one or more next-hop agents. This could be in terms of specific agents, all agents of a given type, or any agent of a given type. Using the destination agent's EID, locator, etc., the Update Message is enclosed within the appropriate headers (see section 9) and sent. Note that the protocol calls for one-to-one individual transmissions (no multicast) to the next-hop peer agents. If it is required that the message be sent to any one agent of a given type, each agent of that type is tried until successful.
An update failure occurs if a specified agent is unreachable or if (in case of ``any agent of given type'') no agent of a given type is reachable. Update failures should be logged for possible network management action.

5.2.3 Transit Agent Operations

A transit agent receives an Update Message as ``TCP data''. It then performs checks on the message to determine whether the message is a valid one. This may include checking the timestamp in the update header to ensure that the Update Message is not a duplicate, and verifying the authenticator in the update header. If any of the checks fails, the error should be logged for possible network management action. If the checks pass, then the UMAT is consulted for actions using the Phase field in the transaction header. This may involve caching the information (i.e., updating the relevant database using the message contents) and/or sending the message, with a changed Phase, to one or more next-hop agents. This could be in terms of specific agents, all agents of a given type, or any agent of a given type. Using the destination agent's EID, locator, etc., the Update Message is sent as TCP data. Note that the protocol calls for one-to-one individual transmissions (no multicast) to the next-hop peer agents. If it is required that the message be sent to any one agent of a given type, each agent of that type is tried until successful. An update failure occurs if a specified agent is unreachable or if (in case of ``any agent of given type'') no agent of a given type is reachable. Update failures should be logged for possible network management action.

5.2.4 The Update Message Action Table (UMAT)

The UMAT represents Update Message forwarding instructions based on agent and phase, and depends on what functionality is mapped into the protocol and how the mapping is done.
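As an informal illustration of how an implementation might hold one column of such a table, the Python sketch below transcribes the Forwarding Agent (F) column of Table 1 in this section into a dictionary keyed by received phase, and applies the listed actions through caller-supplied callbacks. The names F_COLUMN and handle_update, the tuple encoding of actions, and the use of None for an omitted Node are assumptions made for this example and are not part of the specification. The entry printed in the table as send(4,*) is read here as "all agents in the current node", which is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An action is ("cache",) or ("send", phase, node, agent_type), mirroring the
# cache and send(Phase, [Node], [AgentType][*]) entries of the table.
Action = Tuple

# Forwarding Agent (F) column, keyed by the phase of the received message.
# node=None means the Node argument is omitted (assumed: the current node).
F_COLUMN: Dict[int, List[Action]] = {
    1: [("send", 2, "P", "F-C")],                  # phase-map-forw-par
    2: [("send", 3, None, "R*"),                   # phase-map-distrib,
        ("send", 3, None, "N*")],                  #   {R*,N*} expanded
    4: [("cache",), ("send", 5, "C", "F-P")],      # phase-loc-notify
    5: [("send", 4, None, "*")],                   # phase-loc-child (see lead-in)
}


def handle_update(column: Dict[int, List[Action]],
                  phase: int,
                  cache: Callable[[], None],
                  send: Callable[[int, Optional[str], str], None]) -> int:
    """Run every action listed for `phase`; return the number of actions run."""
    actions = column.get(phase, [])
    for action in actions:
        if action[0] == "cache":
            cache()
        else:
            _, next_phase, node, agent_type = action
            send(next_phase, node, agent_type)
    return len(actions)
```

With this shape, changing the behaviour of an agent type amounts to editing dictionary entries rather than control flow.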
The Update Protocol is used for map updates (section 3.1), locator updates (section 3.3), and adjacency updates (section 3.4). Our use of the UMAT is mainly to provide a succinct and flexible protocol specification. While it is clearly not necessary that an implementation use a UMAT-equivalent, it is strongly recommended from experience, since it provides flexibility by making it easy to change the functionality and the mapping: one simply needs to add additional message types and/or alter the entries in the table.

The Update Protocol is typically used by Nimrod agents or other ``users'' in order to initiate updates. We use the term client to denote such users. For each of the four agent types (Forwarding Agent, Endpoint Representative, Node Representative, and Route Agent), we give below the actions upon receipt of an Update Message of each phase. The phases form the rows, and the agent types form the columns.

                          ||  F                  |  N                  |  R    |  E
  ------------------------++---------------------+---------------------+-------+-------
  CLIENT-MAP-UPD          ||                     |  send(1,_,F-P)(1)   |       |
  phase-map-forw-par (1)  ||  send(2,P,F-C)      |                     |       |
  phase-map-distrib (2)   ||  send(3,_,{R*,N*})  |                     |       |
  phase-map-notify (3)    ||                     |  cache              | cache |
  CLIENT-LOC-UPD          ||                     |  send(4,_,*)(2)     |       |
  phase-loc-notify (4)    ||  cache              |  cache              | cache | cache
                          ||  send(5,C,F-P)      |                     |       |
  phase-loc-child (5)     ||  send(4,*)          |                     |       |
  CLIENT-ADJ-UPD          ||                     |  send(6,_,N*)       |       |
  phase-adj-notify (6)    ||                     |  cache              |       |

Table 1: Update Message Action Table for each agent type (columns) upon receipt of message with each Operation Type (rows). The value of each phase is indicated within parentheses. Some operations are client requests, and these are denoted in upper case.

Note that to a given agent, only the column corresponding to its agent type is of interest, and thus every agent may be thought of as implementing a column of the UMAT.

The actions primarily involve the functions described, along with their parameter legends, below. We assume the existence of forwarding functionality required to realize these functions.

1. send(Phase, [Node], [AgentType][*]). This sends an Update Message with phase field denoted by Phase to an agent of AgentType in Node. The AgentType is one of N, F, R, E, F-P (boundary to/from parent), or F-C (boundary to/from child). A suffix `*' denotes all agents of the type. If AgentType is omitted, it means all agents in the specified node. For the Node field, P, C, and S denote parent, child, and sibling nodes, respectively. If it is absent, it means the current node.

2. cache. Update the relevant data structures containing the database.

In the PostScript version, the reader may refer to Figures 1 through 4 in Appendix 1 for assistance in understanding the protocol.

5.3 Database Specific Updates

As mentioned earlier, the Update messages contain database-specific information, depending on the operation being performed. In this section, we describe the database specific contents and their semantics for each operation. This information is referred to as ``additional information'' in the following. The packet formats for the additional fields are illustrated in section 9.6.

5.3.1 Adjacency Updates

After an adjacency has been formed, the node representatives of the nodes constituting the adjacency have to be informed, so that they may modify their maps accordingly.
Note that there are two adjacency updates sent for each uni-directional adjacency: one from the node representative that sent the Adjacency Request query and one from the node representative that sent the Adjacency Request response. The additional information in the adjacency updates is:

o Flags, indicating whether the adjacency is to a parent, child, or sibling.
o Neighbor node NID and locator.
o Locator of boundary forwarding agent that implements the adjacency.

5.3.2 Locator Updates

A node representative that changes a locator acquired by an endpoint representative must notify that endpoint representative if the locator changes or becomes unusable, e.g., the association between the node and endpoint is being terminated. The additional information contained in such an update is:

o Flags, indicating nature of change (e.g., deprecate use of old locator, terminate use of old locator).
o Credentials of the representative originating update.
o Old locator that is being changed/terminated.
o EID and locator (optional) of the supplier of the old locator, if different from the originator. Either both EID and locator are present or both are absent.
o New locator (optional) or reassigned locator.
o Expiration (optional, present only if new locator is present) time for the new locator.

The representative of a node that acquires a new locator must update all of its children so that they can change their locator. Also, a node representative that changes a locator acquired by a component node must notify that component node if the locator changes or becomes unusable, e.g., the parent-child relationship is being terminated. The additional information contained in such an update is:

o Flags, indicating nature of change (e.g., deprecate use of old locator, terminate use of old locator).
o Credentials of the representative originating update.
o Old locator that is being changed/terminated.
o EID and locator (optional) of the supplier of the old locator, if different from the originator. Either both EID and locator are present or both are absent.
o New locator (optional) or reassigned locator.
o Expiration (optional, present only if new locator is present) time for the new locator.

5.3.3 Map Updates

Whenever a node's topology or offered services change, it must generate a new set of maps. The new maps must be propagated to the node representatives and route agents in the node's parent. The maps are also sent to any agents that have explicitly requested to be notified of updates, if an implementation supports the subscription functionality. The additional information in a map update is:

o Flags. Qualifies the map (e.g., one or more component nodes not in map, map is of a partitioned node).
o Sequence number (24 bits) of the transaction that requested explicit (automatic) updates.
o Map. Abstract map of node.
o Maps (optional) to specific agents as requested.

6 The Query-Response Protocol

The Query-Response (Q-R) Protocol is used to obtain database information (e.g., portions of the map database) in an efficient manner. The Q-R Protocol consists of two messages: the Query Message and the Response Message. (The Query and Response Messages consist of a variable length database specific portion, described in section 6.4, prepended by a common Query/Response Protocol header, described in section 6.3 below, prepended by the transaction header, described in section 4.) The Query Message is generated by the agent wishing to make a query, contains the nature of the information required, and is sent directly to a destination agent that the originating agent believes is in possession of the information. The destination agent obtains the requisite information and sends the Response Message back to the originating agent.
Note that the destination agent may obtain the information from its own database, or may in turn send a Query to another agent in order to obtain this information.

6.1 Service Interface

The Q-R protocol offers a reliable query-response service in one round trip time. It uses the reliable transaction service. In fact, excepting headers and minor interface differences, the Q-R protocol adds very little to the service provided by the transaction service. The service interface includes the following:

o Originator. EID (and optionally locator) of the agent initiating the query.
o Destination. EID (and optionally locator) of the destination agent being queried.
o Operation. Indicates what kind of query (i.e., for what database) it is. Current values are shown in section 9.
o Keying info for authentication purposes.
o Service requirements if any. Will be ``best effort'' if unspecified.
o Patience. A time interval that the user wishes to wait for the response to the query. If there is no response within this time, the user expects to be informed.

6.2 Protocol Operation

Unlike the update protocol, the Q-R Protocol involves only two agents: the originator and the destination. The Query Message header (given below) contains the EID and locator of the querying agent. These fields are used by the destination agent for the destination of the response. The destination agent verifies the authentication information to ensure that the query can indeed be honored. If the authentication check fails or the destination agent is unable to supply the required information, it still sends a response back with the appropriate error code.

If the originator does not get a response within a certain time t, it is informed of a query failure. The value of t is given to the protocol by the application (e.g., map request). The application also has the option of requesting an abort of the query; in this case, the state is reset and the response is ignored.
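The timeout and abort behaviour described above can be sketched as a small table of outstanding queries. The following Python is illustrative only: the class PendingQueries, its method names, and the use of the transaction ID as the key are assumptions made for this example, and time is passed in explicitly so that the logic can be followed without a real clock. A zero opcode is taken to mean success, as stated for the Opcode field in section 6.3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingQueries:
    """Tracks outstanding queries by transaction ID (hypothetical helper)."""
    deadlines: Dict[int, float] = field(default_factory=dict)

    def start(self, tid: int, now: float, patience: float) -> None:
        # The application supplies the patience interval t.
        self.deadlines[tid] = now + patience

    def abort(self, tid: int) -> None:
        # State is reset; a response that arrives later is ignored.
        self.deadlines.pop(tid, None)

    def on_response(self, tid: int, opcode: int) -> Optional[str]:
        if self.deadlines.pop(tid, None) is None:
            return None  # aborted, expired, or unknown: ignore the response
        return "ok" if opcode == 0 else "error"

    def expire(self, now: float) -> List[int]:
        """Return the IDs whose patience has run out; each is a query failure."""
        late = [t for t, d in self.deadlines.items() if now >= d]
        for t in late:
            del self.deadlines[t]
        return late
```

An originator would call expire periodically and report each returned ID to the application as a query failure.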
As in the case of the Update Protocol, exceptions are handled by making a log entry into a file for possible network management action.

6.3 Query/Response Header

Each item in the common Query/Response header is given below. The packet format of the header is illustrated in section 9.5.

o Originating agent type (8 bits). The type of agent originating the query. Agent types include Forwarding Agent, Endpoint Representative, Node Representative, and Route Agent.
o Destination agent type (8 bits). The type of agent for whom the query is intended.
o Flags (16 bits). Miscellaneous operation dependent flags.
o Database Type (8 bits). The type of the database to which the information being queried pertains. At most one kind of database can be queried with a query message. Database types are defined in section 9.5.
o Count (8 bits). Operation dependent.
o Opcode (16 bits). In a query, operation specific or database specific information. In a response, zero if the query is being responded to successfully, otherwise an error code indicating the reason for failure.
o Originating EID. In a query, the EID of the agent that issued the query. Used to obtain the destination for the response. In a response, the EID of the agent issuing the response.
o Originating Locator. In a query, the locator of the agent that issued the query. Used to obtain the destination for the response. In a response, the locator of the agent issuing the response.
o Authenticator. Authentication field contains the information used to authenticate the query (in Query) or the response (in Reply).

6.4 Database Specific Request/Release

As mentioned earlier, the Query and Response messages contain database-specific information, depending on the operation being performed. In this section, we describe the database specific contents and their semantics for each operation.
This information is referred to as ``additional information'' in the following. The packet formats for all of the packets in this section are illustrated in section 9.7.

6.4.1 Adjacency Formation

The designated representative of a node forms adjacencies with a neighboring node by sending an adjacency request message to one of the neighboring node's representatives. Any resulting adjacency is one-way, from the node requesting the adjacency to that which granted it. The additional information in an adjacency request Query Message is:

o Locator of the node initiating the adjacency.
o NID of the node initiating the adjacency.
o Circuit ID. Physical circuit identifier of the link to form the adjacency, as known in the originating node.
o Neighbor NID of the intended neighbor node in the adjacency.

The additional information in an adjacency request Response Message is:

o Credentials of the granting node.
o Locator of the granting node.
o NID of the granting node.
o Circuit ID. Physical circuit identifier of the link to form the adjacency, as known in the granting node.

6.4.2 Adjacency Release

If a node representative receives a positive reply to an adjacency request message, the message may contain information that indicates that the adjacency is not appropriate. An adjacency is terminated by sending an adjacency release request to the node representative which granted the adjacency. Node mobility and management policy changes may also induce a node to release adjacencies. The additional information in an adjacency release Query Message is:

o Opcode giving a reason for terminating the adjacency (e.g., policy, lack of traffic, movement, etc.).
o Locator of the requesting node.
o NID of the requesting node.
o Circuit ID of the link forming the adjacency, as known in the requesting node.
o Neighbor NID of the granting neighbor node.
o Circuit ID of the link forming the adjacency, as known in the granting neighbor.

The adjacency release response contains an indication of success or failure (with a reason code if the latter). It does not contain any other message-specific information.

6.4.3 Adjacency Activation

When a node representative has formed an adjacency with a neighbor node, the boundary forwarding agent connected to the neighbor must be informed that Nimrod data traffic may be passed to the neighbor. Note that there are two instances of the Adjacency Activation associated with each uni-directional adjacency: one from the node representative that sent the Adjacency Request query to its boundary forwarding agent, indicating outgoing connectivity, and one from the node representative that sent the Adjacency Request reply to its boundary forwarding agent, indicating incoming connectivity. The additional information in an Adjacency Activation request is:

o Flags, indicating whether the adjacency to be activated is to a parent, child, or sibling.
o Locator of requesting node.
o NID of requesting node.
o Circuit ID of the link forming the adjacency as known in the requesting node.
o Neighbor NID of granting node.
o Circuit ID of the link forming the adjacency as known in the granting node.

6.4.4 Locator Acquisition

There are two forms of locator acquisition: one for a component node requesting its locator from a node representative of the parent node, and one for endpoint representatives requesting a locator for an endpoint from a node representative of the node. The two forms are distinguished by the originating agent type field in the common Query/Response header. The additional information in a locator acquisition Query Message from a component node is:

o Old locator (optional). Previously assigned locator; requestor wants it to be reassigned.
o EID and locator (optional) of the node representative that previously assigned the locator. Either both EID and locator are present or both are absent.
o Parent NID of the node to provide the locator.
o Child NID of the node requesting the locator.

The additional information in a locator acquisition Query Message from an endpoint representative is:

o Old locator (optional). Previously assigned locator; requestor wants it to be reassigned.
o EID and locator (optional) of the node representative that previously assigned the locator. Either both EID and locator are present or both are absent.
o Provider NID of the node to provide the locator.
o Name of the endpoint requesting a locator.
o EID of the endpoint requesting a locator.

The locator acquisition Response Message, for a request from a component node or an endpoint, contains the following additional information:

o Flags indicating nature of error if any, 0 if okay (e.g., might indicate that another agent may be able to provide it (see below)).
o New/Reassigned Locator (optional, present when successful) that the requestor and its descendants should use as prefix.
o Expiration time of the locator supplied.
o EID (optional) of node representative that might be able to (re)assign the locator.
o Locator (optional) of node representative that might (re)assign the locator.
o NID (optional) of node containing representative that might (re)assign locator.

6.4.5 Locator Release

The locator release operation is used by a component node or endpoint representative that wishes to release a locator, typically due to mobility/relocation of a network or endpoint. The additional information in a locator release Query Message is:

o Opcode indicating reason for release.
o Flags indicating whether a different agent should be contacted (see below).
o Old locator to be released.
o Issuing NR EID, that issued the locator that is being released.
o Issuing NR locator, that issued the locator that is being released.
o Different NR EID (optional, only if flag is set) that might release the locator.
o Different NR locator (optional, only if flag is set) that might release the locator.

6.4.6 Map Acquisition

Route agents request maps from node representatives. Since the requesting route agent is acting both on behalf of an endpoint, either through the endpoint representative or a forwarding agent, and possibly on behalf of other route agents that delegated the original route request, there are usually two or more sets of credentials associated with a map request. These credentials are used by the node representative's filter module, which is tasked with enforcing any distribution restrictions on the maps it dispenses. The additional information in a map Query Message is:

o Flags qualifying the maps requested. Values can indicate: authoritative response required, abbreviated map required, complete map required, basic map required, need automatic map updates.
o Locator of the node whose map is desired.
o Child elements (optional, if flag does not indicate complete). The locators of children whose map is desired.
o Credentials of requesting node.

The map request Response Message contains the following additional information:

o Flags qualifying map supplied. Values can indicate: partitioned node, map is of partitioned node, access denied/not available, automatic updates not supported.
o Requested Map.
o Child Maps (optional). Maps of children nodes as requested (partial or complete).
o Agents (e.g., in other partitions) that may be able to answer query.

Note that the reply for a basic map may contain several maps, one for the requested node (map) and an abstract map for each of the child component nodes (child maps). The signature of the basic map covers the basic map and each of the maps of the child component nodes.
6.4.7 Map Termination Request

An agent that has explicitly requested to be notified of map updates may choose to terminate that subscription request. The map termination Query Message contains the following additional information:

o Transaction number (16 bits) of the map request that requested automatic updates.

The map termination Response Message contains the following additional information:

o Flags. Can indicate: automatic updates not supported, try alternate agent.
o Opcode. Zero if ok, error code (e.g., no record of automatic update request) otherwise.
o Agent (optional) that might have the automatic update request.

6.4.8 Path Information Request

Route agents that are generating multicast routes may require access to existing path information for a multicast group so that better routes may be generated. The path information is distributed among the forwarding agents supporting the multicast group. Route agents use path database queries to obtain the necessary information. The additional information in a path database Query Message is:

o Flags. Which info to return.
o Path Labels about which info is desired.

The additional information in a path database Response Message is:

o Opcode indicating error code or zero.
o Path entries. List of path entries requested.

6.4.9 Route Generation Request/Reply

Route agents generate routes on behalf of an endpoint in response to requests received from endpoint representatives and forwarding agents. Requests from forwarding agents will contain the credentials of the forwarding agent in addition to those provided by the endpoint representative. These credentials, along with those of the route agent (and any other route agents that delegated the request), are passed to any node representatives when requesting the node's map(s). The additional information in a Route Generation Query Message is:

o Misc. Flags qualifying the nature of route required.
o Count, number of feasible routes required.
o Sources. Source locators for the route.
o Destinations. Destination locators for the route.
o Services required by the initiating endpoint.
o Services required by the target endpoint.

The route generation Response Message contains the following additional information:

o Count. Number of feasible routes returned.
o Routes (optional, only if Count is non-zero). A list of routes from source(s) to destination(s) meeting the requirements.

7 Path Management Protocol

Nimrod endpoint representatives and forwarding agents are responsible for establishing, in hosts and routers, state information necessary for forwarding Nimrod data messages. These agents are also responsible for forwarding Nimrod data messages according to this state information and the forwarding directives carried along in the messages.

For a particular traffic session, the forwarding state information installed and maintained by endpoint representatives and forwarding agents is derived from the routes selected for the session. The forwarding state corresponding to a particular session and route is called a path. Multiple traffic sessions may use the same path, and multiple paths may be established based on the same route.

A path may connect one or more source endpoints and one or more destination endpoints. In this initial version of Nimrod, each multicast path is either a source tree or a sink tree. (We note that this version of Nimrod includes procedures for installing and removing forwarding state for multicast trees and for forwarding session traffic along such trees. It does not, however, include procedures for multicast route construction and group management. For more information on these and other multicast issues as they relate to Nimrod, see [4].)
Endpoint representatives and forwarding agents use the path management protocol to install state information in, and remove it from, their forwarding databases. Each endpoint representative and forwarding agent maintains state information for those paths that originate, terminate, or pass through it.

Paths may be set up from source to destination or from destination to source. Each path has an initiator and a target. We expect that, in most cases, paths will be set up from the representative of the source endpoint to the representative of the destination endpoint. Hence, the initiator usually begins the path setup procedure on behalf of the source endpoint, and the target usually accepts or rejects a path on behalf of the destination endpoint. Forwarding state is established such that path management messages may be forwarded in both directions along the path; data messages always flow from source to destination.

Paths are identified by path labels, which are unique along the path but not necessarily globally unique throughout the internetwork. Multiple non-intersecting paths may carry the same path label. By eliminating the requirement for global uniqueness of path labels, we can allow path labels to be relatively short (24 bits), thus reducing the cost of carrying them in data messages and the cost of accessing information in forwarding databases indexed by them. The labels for each direction of a path are distinguished by a bit that indicates whether the message is flowing from source to destination or from destination to source.

Endpoint representatives and forwarding agents try to form a path for a new session by piecing together existing paths, rather than by setting up entirely new paths, provided the existing paths meet the new session's service requirements and permit its traffic to flow over them. This method of path construction incurs the least cost, in terms of the amount of route generation and forwarding state installation required per session.
In a busy internetwork, there are likely to be many existing paths. Moreover, we expect that most traffic sessions will not require specific service guarantees and most networks will not refuse to carry this ``best effort'' traffic. Therefore, we expect this method of path construction to be much less expensive than individually setting up and maintaining distinct paths for every traffic session.

Most Nimrod paths are likely to be composed of paths which in turn are composed of lower-level paths, and so on. A Nimrod flow-mode data message (refer to section 3.5 for a description of flow mode and datagram mode message transmission) travelling over one of these multilevel, multi-segment paths must carry the path labels of all component paths it is currently traversing. These path labels are stacked in the data message and manipulated, i.e., pushed and popped, by the agents handling the message. Note, however, that an endpoint representative or forwarding agent uses only one of these path labels at a time in making a forwarding decision for the message.

7.1 Protocol Messages

The path management protocol uses five types of messages: setup, accept, teardown, status, and ack. Message contents are described below; explicit formats are depicted in section 9. Each path management message is covered by the same basic types of integrity and authentication checks as other Nimrod control messages, including checks on message length, timestamp validity, content corruption, and source authentication. (Refer to section 3 for more information on Nimrod control message integrity and authentication.)

All path management messages travel along the path to which they refer. Endpoint representatives and forwarding agents respond to the receipt of a path management message in different ways, depending upon the type of agent and the type of message.
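
As an aside on the label handling described at the end of section 7 (24-bit path labels, a bit distinguishing the two directions, and a stack of labels pushed and popped by agents), the following is a minimal sketch of how an implementation might model it. The class and method names are hypothetical and are not taken from this specification or from the packet formats of section 9; in particular, keeping the direction bit as a separate field rather than packing it with the label is an assumption made only for readability.

```python
from dataclasses import dataclass, field
from typing import List

LABEL_BITS = 24                      # label width stated in the text
MAX_LABEL = (1 << LABEL_BITS) - 1


@dataclass(frozen=True)
class PathLabel:
    """A path label plus the bit that distinguishes the two directions."""
    value: int                 # 24-bit label, unique only along its own path
    toward_destination: bool   # True: source to destination; False: reverse

    def __post_init__(self) -> None:
        if not 0 <= self.value <= MAX_LABEL:
            raise ValueError("path label must fit in 24 bits")


@dataclass
class FlowModeMessage:
    """Hypothetical flow-mode data message carrying a stack of path labels."""
    payload: bytes
    labels: List[PathLabel] = field(default_factory=list)

    def push(self, label: PathLabel) -> None:
        # Entering a lower-level component path: its label goes on top.
        self.labels.append(label)

    def pop(self) -> PathLabel:
        # Leaving the current component path: expose the enclosing label.
        return self.labels.pop()

    def current(self) -> PathLabel:
        # Only the top label is consulted for a forwarding decision.
        return self.labels[-1]
```

In this sketch an agent at the entry of a lower-level path would call push(), an agent at its exit would call pop(), and every agent in between would look only at current() when choosing the next hop.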
Path management protocol messages may be used not only to set up and tear down a path, but also to collect and report performance monitoring information for a path (e.g., path delay and throughput). Path monitoring operates in two modes: collection and report. Hence, monitored information for a path may appear in a message as information being collected or reported.

7.1.1 Setup

A setup message is generated by the initiator and travels along the path toward the target. It is used to establish forwarding state in endpoint representatives and forwarding agents. In this initial version of Nimrod, all endpoint representatives and forwarding agents retain the setup message for each active path that traverses them. This copy is used for detection of duplicate setup messages and for selecting alternate lower-level paths if the one initially chosen fails. Each setup message contains:

1. The label of the path to be established.

2. The time at which the setup message was originally generated.

3. Path indications:
   (a) Source-initiated or destination-initiated.
   (b) Unicast or multicast.
   (c) Available for shared use by multiple sessions or dedicated to a single session.

4. Requested services for the session, indicated as either requirements or preferences and expressed as type, length, and value, from the perspective of the initiator. These may include but are not limited to:
   o delay;
   o variation in delay;
   o throughput;
   o variation in throughput;
   o bit error rate;
   o packet error/loss rate;
   o monetary cost (per byte, per packet, or per unit time);
   o whether packet order must be preserved.
They may also include information about the session itself, such as its expected lifetime, the type of organization to which the originator belongs (e.g., academic, government, commercial), and the characterization of session traffic (e.g., in terms of average and peak rates and burst durations).
As the work of the Integrated Services working group progresses, we plan to integrate these service specifications into future versions of Nimrod.

5. The route, in terms of the locators of the nodes through which it must pass and, for each of these nodes, the label of the relevant connectivity specification to invoke across the node.

6. The EIDs of the source and destination endpoints and their representatives.
   (a) Unicast path setup. The EID information for the initiator is included to enable the target to establish a path in the reverse direction, if desired. It is also useful for network management. The type of the target agent determines what EID information is included for the target in the setup message.
      i. The target is the representative of one of the session's endpoints. Hence, EID information about this specific target must be included in the setup message, in order for the path to connect to the intended endpoint.
      ii. The target is any boundary forwarding agent for the last node on the route. Hence, there is no specific target and thus EID information related to the target may be left unspecified in the setup message. Such a situation is likely to arise when constructing a higher-level path intended to carry traffic for multiple sessions.
   (b) Multicast path setup. Although source and destination EID information is not strictly necessary in this case, it is included for network management reasons. Note that the multicast path is either a source tree or a sink tree. Hence, EID information is included for either the source or the destination but not for both.

7. Collection-mode monitored information expressed as type, length, and value. This information can be used to determine the actual services available over the path. The initiator determines whether to collect this information during path setup.

8.
Integrity and authentication information covering all but collection-mode monitored information.

7.1.2 Accept

An accept message is generated by the target and travels along the path toward the initiator. It is used to indicate to the initiator that the path has been successfully established. Each accept message contains:

1. The label of the accepted path.

2. The EID of the agent that generated the accept message.

3. The time at which the accept message was generated.

4. Report-mode monitored information, expressed as type, length, and value. If the setup message collected monitored information, the accept message reports this information as the services available over the path.

5. Integrity and authentication information covering all of the above.

Provided the initiator is the source, it may send data over the path before it receives an accept message from the target. An initiator may nevertheless wish to wait for an accept message before sending data on a path. For example, if the source pays for all data messages sent, whether or not they are successfully received at the destination, it may want to make sure that the path is successfully established before sending data to the destination. In this case, the source should be prepared to buffer data received from the host until the accept message is received. An initiating source determines whether to wait for an accept message before transmitting data, based upon the session's requested services.

7.1.3 Teardown

A teardown message is generated by any endpoint representative or forwarding agent on the path. It usually travels outwards, in both directions, towards the initiator and the target. In some cases (e.g., incomplete path setup or teardown generation by initiator or target), the teardown message may travel in only one direction.
Teardown messages are used to remove forwarding state in endpoint representatives and forwarding agents. Each teardown message contains:

1. The label of the path to be torn down.

2. The EID of the agent that generated the teardown message.

3. The time at which the teardown message was generated.

4. The reason for the teardown, expressed as type, length, and value. A teardown message may be generated in response to any of the following events:

Type 1: A path timeout. Each path has a specified maximum lifetime, in order to ensure that forwarding state is eventually removed, no matter how a path might fail. The agent detecting the path timeout generates two teardown messages, sending one toward the initiator and one toward the target.

Type 2: Setup message is out-of-date. A setup message is out-of-date if the absolute value of the difference between its timestamp and the local time kept by the recipient agent exceeds a specified maximum value. The agent receiving the out-of-date setup message generates a teardown message and sends it toward the initiator.

Type 3: Route specification carried in the setup message is inconsistent with the node.

Subtype 1: Recipient agent's node does not appear in the route specification.

Subtype 2: Connectivity specification carried in the setup message is not a valid connectivity specification for the recipient's node.

These situations might indicate that the setup message has been corrupted in an undetected way, that a route agent used an out-of-date or corrupted map for the node when constructing the route, or that a setup message was misdelivered. In any case, the agent unable to recognize the node or connectivity specification generates a teardown message and sends it toward the initiator.

Type 4: Conflict between the path and either the services provided by the recipient's node or the session service requirements from the target's perspective.
Subtype 1: During path setup, an agent detects a conflict between the path and the services provided by its node (as reflected in the node's connectivity specifications) such that the node refuses to carry the session traffic or cannot meet the session service requirements. The agent generates a teardown message and sends it toward the initiator.

Subtype 2: During path setup, the target detects a conflict between the path and the session service requirements from its perspective such that the path fails to meet these requirements. The target generates a teardown message containing its service requirements and sends the message toward the initiator. In this case, the initiator will attempt to find a path that is consistent with the target's service requirements as well as its own.

Subtype 3: After path establishment, a forwarding agent detects a change in its node's services (as reflected in the node's connectivity specifications), which conflicts with the path such that either the node refuses to carry the session traffic or cannot meet the session service requirements. The agent generates a teardown message and sends copies toward the initiator and target.

Subtype 4: After path establishment, the initiator or target detects a change in session service requirements which conflicts with the path such that the path fails to meet the new requirements. The initiator (or target) generates a teardown message and sends it toward the target (or initiator).

Type 5: Preemption in favor of another path. The preempting agent generates a teardown message and sends copies toward the initiator and target. Endpoint representatives and forwarding agents are free to implement their own preemption criteria, but these are not part of this initial version of Nimrod. In this version, all paths are established on a first-come, first-served basis, with no preemption.

Type 6: Unresolvable path label collision during path setup.
(Refer to the discussion of status and ack messages below and to section 7.2.3 below for information on collision resolution.) The agent unable to resolve the path label collision generates a teardown message and sends it toward the initiator.

Type 7: Insufficient resources to any next-hop agent during path setup. The agent detecting a lack of resources to reach a specific next-hop agent attempts to find a suitable alternate next-hop agent. If it fails to find an alternate agent, the agent generates a teardown message and sends it to the previous-hop agent on the path.

Subtype 1: Insufficient space in the forwarding database. The agent has no room for another entry in its forwarding database.

Subtype 2: Insufficient resources for the path (e.g., unable to reserve required capacity for the session).

Type 8: Loss of connectivity in the path. An agent may detect a loss in path connectivity through neighbor discovery (see section 8.1), agent discovery (see section 8.2), or through the loss of a lower-level path forming part of the given path.

Subtype 1: An agent detecting a downstream loss to a specific next-hop agent attempts to repair the path by sending the original setup message to an alternate next-hop agent. If it fails to find an alternate agent, the agent generates a teardown message and sends it to the previous-hop agent on the path.

Subtype 2: An agent detecting an upstream loss sets a repair timer. If the timer fires before the agent detects the path is repaired, the agent generates a teardown message and sends it to the next-hop agent on the path. An agent detects a repaired path through receipt of a copy of the original setup message (described in more detail in section 7.2).

Type 9: Initiator exceeds specified maximum number of setup attempts.
The initiator generates a teardown message and sends it toward the target, in order to clear any partially established forwarding state for the path.

5. Report-mode monitored information, expressed as type, length, and value. If the setup message collected monitored information, and the teardown is generated in response to the setup message, then the teardown message reports this information as the services available over the path thus far.

6. Integrity and authentication information covering all of the above.

7.1.4 Status

Status messages may be generated by any agent along a path, in order to report path information or modify path characteristics. Each status message contains:

1. The label of the path to which the status pertains.

2. The EID of the agent that generated the status message.

3. The time at which the status message was generated.

4. The reason for the status message, expressed as type, length, and value. A status message may be generated for any of the following reasons:

Type 1: Path monitoring. The initiator (or target) generates a status message containing collection-mode monitored information and sends it toward the target (or initiator). The ultimate recipient (which is usually the target or initiator but which may be an intermediate forwarding agent along the path if the path has failed in some way) responds by generating a status message containing report-mode monitored information and sends it toward the initiator (or target).

Type 2: Path lifetime extension. The initiator generates a status message containing the amount of time by which to extend the path's life, hence preventing a path teardown prior to session cessation.

Type 3: Replacement path label.
When a path label contained in a setup message collides with a path label already in use at a forwarding agent or endpoint representative, that agent usually generates an ack message containing a replacement label for the new path that the previous-hop agent should use when sending messages along the new path toward this agent. In some cases, the agent may instead generate status messages containing replacement labels to use for an existing path, one for the previous-hop agent and one for the next-hop agent on the path. For example, if the existing path has not been used in a long time, the agent may choose to alter the forwarding information for that path rather than for the new path, in order to speed processing of data flowing along the new path.

Type 4: Unrecognized requested session service contained in a setup message. This might indicate that the setup message has been corrupted in an undetected way or that the initiator is requesting new service requirements not yet known throughout the internetwork. In any case, the agent unable to recognize the session service request generates a status message, sends it toward the initiator, and ignores the unknown service when making next-hop decisions.

Type 5: Failure to transmit a setup message successfully over a path hop. An agent detects this failure through an indication of unsuccessful transmission provided by the mechanism for hop-by-hop reliability (refer to the ack message discussion below for more information). It then generates a status message and sends it toward the initiator. In response to this type of status message, the initiator may either resend the original setup message or tear down the established portion of the path.

Type 6: Unrecognized path label carried in a data message. This might indicate that the data message has been corrupted in an undetected way or that the agent receiving the message has failed recently and has lost state concerning the paths previously established through it.
In any case, the agent unable to recognize the path label generates a status message and sends it to the agent from which it received the data message containing that path label.

5. Integrity and authentication information covering all but collection-mode monitored information.

7.1.5 Ack

The path management protocol provides reliable transmission of setup, accept, teardown, and status messages between successive sending and receiving agents along a path. After transmitting one of these messages along a path, the sending agent expects to receive, within a specified period of time, an ack message acknowledging successful receipt of the message at the next agent. If no ack message is forthcoming, the sending agent retransmits the setup, accept, teardown, or status message up to a specified maximum number of times. Furthermore, if the sending agent fails to receive an ack message after the specified number of retransmissions, it logs the event for possible network management action.

Ack messages are generated by each agent along a path in response to the receipt of a setup, accept, teardown, or status message. Each ack message contains:

1. The label of the path to which the ack pertains.

2. The EID of the agent that generated the ack message.

3. The time at which the ack message was generated.

4. In the case of a path label collision, the replacement path label to use when sending messages along that path toward this agent.

5. Integrity and authentication information covering all of the above.

Given that the setup portion of the path management protocol is already reliable end to end (i.e., the initiator expects to receive either an accept or teardown message in response to its setup message and retransmits the setup message if no response is forthcoming within a specified time period), one might consider hop-by-hop reliability overkill.
Note, however, that in lossy networks, the additional hop-by-hop reliability increases the reliability and responsiveness of the path management protocol and reduces the number of end-to-end path setup message retransmissions required for successful path establishment.

In this version of Nimrod, the path management protocol, unlike the other Nimrod protocols, uses its own reliability mechanism rather than T/TCP. A simpler and more appealing design is one that employs a single reliable transaction protocol for all Nimrod control messages, either T/TCP or perhaps a Nimrod-specific transaction protocol. The reason for treating path management messages differently from other Nimrod control messages is performance. In the prototype implementation of Nimrod, much of the path management functionality has been placed in the packet handling ``fast path'', whereas T/TCP has not. We expect that other implementations would likely be structured similarly. To improve message handling efficiency, the path management protocol uses its own separate mechanism for reliability of message transmissions. Thus, for this initial version of Nimrod, practical considerations have prevailed in the area of packet handling.

7.2 Protocol Finite-State Machines

The path setup procedure for this initial version of Nimrod operates as follows. An endpoint representative initiates path setup either under control of network management or when it receives, from an endpoint it represents, a data message that requires Nimrod routing and for which no existing path is suitable. After determining the location of the message's destination (by consulting its local locator cache or by querying the DNS) and obtaining a feasible route to that destination (by consulting its local route cache or by querying a route agent), the endpoint representative generates a setup message and sends it toward the target.
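
The two lookups that precede generation of a setup message (locator cache, then DNS; route cache, then route agent) can be illustrated with the minimal sketch below. All names here, including prepare_setup, query_dns, and query_route_agent, are hypothetical stand-ins and not part of the Nimrod protocols. The sketch also makes two simplifying assumptions that the text does not state: that answers obtained by query are stored in the local caches, and that the route cache is keyed by destination locator alone, ignoring service requirements.

```python
from typing import Callable, Dict, Optional, Tuple

Locator = str
Route = Tuple[Locator, ...]


def prepare_setup(
    destination: str,
    locator_cache: Dict[str, Locator],
    route_cache: Dict[Locator, Route],
    query_dns: Callable[[str], Optional[Locator]],
    query_route_agent: Callable[[Locator], Optional[Route]],
) -> Optional[Route]:
    """Return a route for a new setup message, or None if none is found."""
    # Step 1: locate the destination, preferring the local locator cache.
    locator = locator_cache.get(destination)
    if locator is None:
        locator = query_dns(destination)
        if locator is None:
            return None
        locator_cache[destination] = locator   # assumption: cache the answer

    # Step 2: obtain a feasible route, preferring the local route cache.
    route = route_cache.get(locator)
    if route is None:
        route = query_route_agent(locator)
        if route is None:
            return None
        route_cache[locator] = route           # assumption: cache the answer
    return route
```

Only when both steps succeed would the endpoint representative go on to build the setup message; a None result in this sketch corresponds to having no feasible route with which to start path setup.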
The endpoint representative expects to receive an accept message from the target, indicating successful path establishment, within a specified time interval. If the endpoint representative fails to receive an accept message for the path within the allotted time interval, it retransmits the setup message, provided it has not yet exhausted its permitted number of setup attempts. The endpoint representative may make a specified maximum number of path setup attempts, in an effort to establish the path successfully. Unsuccessful path establishment manifests itself as either failure to receive an accept message after the maximum number of setup attempts or receipt of a teardown message instead of an accept message for the path. In either case, the initiator removes the path's state from its forwarding database and the corresponding route from its route cache, so that it does not use that route again immediately. It also logs the event for possible network management action. Moreover, when the initiator exhausts its maximum number of setup attempts and does not receive a teardown message, it generates a teardown message of type 9 (exceeded maximum number of setup attempts) and sends it toward the tar