Specification of the Hierarchical Update and Hierarchical Query-Response Protocols for Nimrod (DRAFT). Author : Ram Ramanathan Affiliation : BBN Systems and Technologies, Cambridge, MA. Contact : ramanath@bbn.com, (617) 873-2736. 1. ABOUT THIS ARTICLE This article describes the Hierarchical Update Protocol (henceforth referred to simply as the Update Protocol) and the Hierarchical Query-Response Protocol (henceforth referred to simply as the Q-R Protocol) for Nimrod. Emphasis is on protocol engines and packet contents and not packet formats. Additionally, we also briefly discuss the functionality of Map Construction and Association Maintenance and their realization using the Update and Q-R protocols. This article uses a number of concepts and terms from the Internet-Drafts "The Nimrod Architecture" (draft-castineyra-routing-arch-00.txt) and "A Perspective on Nimrod Functionality" (draft-steenstrup-fun-perspective-00. txt). Much of the discussion assumes that the reader is familiar with at least "The Nimrod Architecture". The protocol specifications given here reflect the current state of the work in progress, and may be updated, replaced or obsoleted by other documents at any time. The main purpose of this article is to solicit comments, suggestions from the members of the nimrod working-group. Please send such comments and suggestions to nimrod-wg@bbn.com. Specific questions may also be addressed to the author at the contact points given above. 2. FUNCTIONALITY 2.1. BACKGROUND Nimrod functionality includes the following : Agent Discovery, Neighbor Discovery, Locator Acquisition, Arc Formation, Map Construction and Query, Association Database Maintenance and Query, Route construction and Query, Path Setup and Teardown. Each functionality will be described in an article on the protocol that supports it. A tentative list of such articles is given below, along with the functionality they will probably be dealing with. 1. Map Construction and Query, Association Maintenance and Query: Described in this article. 2. Locator acquisition and Query, Route construction and Query, Arc Formation: To be included in the next version of this article. 3. Agent Discovery, Neighbor discovery: To be described in another article by other member(s) of the working group. 4. Path Setup and Teardown : To be described in another article by other member(s) of the working group. The functionality mentioned above is realized using Nimrod "Agents". There are five kinds of agents: Endpoint representatives, that maintain the attributes of an endpoint and act on its behalf; Node representatives that maintain the attributes of a node and act on its behalf; Association agents responsible for answering and propagating assoication queries; Route agents, responsible for collecting maps from nodes and for generating and dispensing routes; and Forwarding agents, responsible for initiating neighbor relationships with forwarding agents in other nodes, requesting routes, installing forwarding information and forwarding messages. In this section, we briefly describe how maps are constructed and associations are maintained. The description is based on and rephrases that given in [2]. In describing the functionality, we focus on an arbitrary node N in the Nimrod hierarchy and consider the interaction between agents in N and agents in the immediate hierarchial vicinity of N (i.e., agents in the parent, sibling and child nodes of N). Describing the behavior at a single level suffices because the functionality of an agent is independent of its absolute hierarchical level, barring boundary conditions at the topmost node and the bottommost nodes (though straightforward, we shall ignore these boundary conditions in this section, in the interest of simplicity). For ease and brevity of description, we use E, O, A, R and F to denote the endpoint rep., node rep., association agent, route agent and forwarding agent, and suffixes p, s and c to denote parent, sibling and child. For example, Ap denotes an association agent in the parent node of the considered node. Our description assumes that Agent Discovery has been completed and agents within a node have some knowledge of each other,as described in [2] and summarized below: a. All forwarding agents in a node N know how to reach at least one Entity Representative, at least one Association Agent (if one exists) and at least one Route Agent (if one exists) in N. b. All agents of a given type and belonging to the same node know how to reach each other. c. All boundary forwarding agents of a node know how to reach each other. d. Each interior forwarding agent knows how to reach at least one boundary forwarding agent. 2.2. MAP CONSTRUCTION A map specifies the nature of the connectivity between and within Nimrod nodes. Maps are maintained and updated by node respresentatives. A node representative maintains two kinds of maps for its node: a "detailed map" that depicts the children nodes, the arcs interconnecting children nodes and the arcs connecting children nodes to external nodes plus the services provided by each of these nodes and arcs; an "abstract map" that depicts the node, the arcs connecting it to other nodes and the services provided by the node between any pair of incoming and outgoing arcs and by the outgoing arcs themselves. A node representative may construct its abstract map using information obtained from abstraction of detailed map, configuration or measurement of service qualities across the node. A node N that requires the map of another node M in order to construct its route sends a "map query" using the Q-R Protocol (detailed in section 3) to M. M responds to N with its abstract/detailed map. Maps are also automatically updated in response to topological changes using constrained hierarchical flooding. A change in the topology within a node N may change the abstract map of that node. This in turn changes the detailed map of N's parent. This change may cause a change in abstract map of N's parent and so on. These changes are effected using the "map update" mechanism described below. Let us consider a node N in the hierarchy and assume that its abstract map has changed. Then the node representative O of N initiates the map update using the Update Protocol (described in section 3). O sends the map to a boundary forwarding agent F for N, which is also a (interior) forwarding agent for N's parent p(N). F distributes the map to one node representative Op and one route agent Rp in p(N), if it exists. Op further redistributes the map all of the node representatives in p(N) and Rp redistributes the map to all of the route agents in p(N). Now that Op has gotten the new abstract map of N, it is in a position to update *its* detailed map and abstract map. If there is indeed a change in the abstract map of p(N), Op initiates a map update. The description in the preceding paragraph now applies recursively at this higher level. Note that only *one* entity representative in p(N), namely the one that received the update from the forwarding agent, takes the initiative in continuing the update to higher levels. In the worst case, a change in node's abstract map may force changes in all of its ancestral nodes' abstract maps. We expect such changes to be rare, especially in nodes whose descendants are multiply connected. 2.3. ASSOCIATION MAINTENANCE Nimrod maintains the associations between endpoints and their locators in an "association database". Each endpoint has at least one endpoint identifier (EID), which is a globally unique, "computer-friendly" bit-string. In addition, we expect that an endpoint will also have an endpoint name (EN) that are structure "user-friendly" (ASCII) strings designed to be used as keys in a distributed database. An agent that requires the association entry for a given endpoint queries the association database with the EN as the key and obtains the corresponding EID(s) and locator(s). Changes in association are incorporated into the database using "association updates", described below. Before discussing how association updates work, it is important to look at how the association database might be stored in the network. A possible approach and the rationale for it are given in [2]. We describe here the main features of this approach and use it as our method of choice. The association database is organized as follows. We assume that endpoint labels are structured hierarchically using a dotted notation. For instance, nodes with ENs A.B and A.C are contained within the node with EN A. We begin with the universal node with EN U. For each x such that U.x is a valid EN prefix, we find the node n(U.x) containing the management authority corresponding to U.x. A pointer is placed in all of the association agents in U to the effect that a query for label U.x.* should go to n(U.x). Then we go down one level and repeat the procedure recursively for each of the nodes that were selected above. We repeat this all the way down the hierarchy. Thus, given an EN, say U.A.B.C, there is a linked chain of nodes (LCN) realized in the corresponding association agents that ends up at the node containing the association for U.A.B.C. We call an association agent in such a node the "authoritative source" for the association. All queries that need an up-to-date answer must reach the authoritative source. All updates must eventually reach the authoritative source (caching of associations may be done for efficiency (see [2])). Thus the crux of the association update (and also the association query) functionality lies in seeking the linked chain of nodes (LCN) and upon finding it, going down the chain until the authoritative source is reached. An association update (AUPD) is initiated by an endpoint representative E of the endpoint whose locator has changed. E hands the AUPD to an association agent in the same node N if one exists, or to the boundary forwarding agent F otherwise. F also acts as an interior forwarding agent for p(N) and sends the AUPD to an association agent in p(N) if one exists or to the boundary forwarding agent Fp in p(N). In this manner, the AUPD goes up the hierarchy until it reaches an association agent A in some node M. Upon receipt of the AUPD, A checks to see if it is in the linked chain of nodes corresponding to the EN in the AUPD. If it is not, then A forwards the AUPD to the boundary forwarding agent F. F also acts as an interior forwarding agent for p(M) and sends the AUPD higher up in the hierarchy. Thus, the AUPD goes up the hierarchy until it reaches an association agent B that is within the LCN. In the worst case, this will be the top-level (universal) node's association agents. B uses the EN and the its pointers (explained earlier) to determine the next node in the linked chain and forwards the AUPD towards that node. There, it will be received by the forwarding agent and handed over to an association agent which will repeat the procedure. In this manner, the update will eventually reach the authoritative source. The non-authoritative association agents encountered by the AUPD also distribute the AUPD to all other association agents in the same node. All such agents cache the association and may use this cache to respond to a future (non-authoritative) query. 3. PROTOCOLS 3.1. The Hierarchical Update Protocol 3.1.1. Introduction The Update Protocol is used to update database contents (e.g., the association database, the map database etc.) in an efficient manner. By efficient, we mean that the implicit flooding constituting the protocol is carefully constrained by involving only a few agents per update. The Update Protocol assumes the existence of a Reliable Transport Protocol and uses it to achieve reliability in its updates. No reliability mechanism (such as retransmissions etc.) is incorporated into the Update Protocol itself. Additionally, we assume that authentication and/or encryption of the message is done elsewhere, either as a separate layer or as part of the reliable transport. This will be specified separately in another document by the Nimrod working group. We also assume that agent discovery has been completed with agents within a node knowing how to reach other agents directly or indirectly (see section 2.1). Also, we assume that path setup and data forwarding functionality is available when required. Agent Discovery and Path Setup will be specified separately in another document by the Nimrod working group. The Update Protocol consists of an Update Message that is generated by the agent wishing to make an update to a particular database. The originating agent forwards the message to one or more agents which further forward it to other agents and so on, until all the necessary database locations have been updated. Once an originating or transit agent has successfully forwarded a message, it does not retain any state corresponding to the message. No acknowledgements are sent from the recipients of the Update Message. Exceptions are handled by triggering SNMP events. Thus, the state machine constituting the protocol engine at the agents is quite simple, if not trivial. However, the decision of whether to cache the message contents or not, and the decision of who to forward the message to requires some specification. We have used an Update Message Forwarding Table (UMFT) for this purpose. There is one UMFT for each of the five types of agents. The UMFT describes the actions to be taken on the receipt of a particular message from a particular agent. The UMFT depends on what functionality we map into the protocol and how we do the mapping. The UMFT given here maps the association and map update in the manner described in section 2. We note that the UMFT-based protocol specification approach provides flexibility by making it easy to change the functionality and the mapping - you just need to alter the entries in the table. We expect to include the locator update functionality also in the next version of this document. The Update Protocol is typically used by Nimrod agents or other "users" in order to initiate updates. We use the word "Application" to denote these users. For instance, the generation of the initial association update is done by an endpoint representative, which is the application using the protocol. 3.1.2. Message Contents Apart from the "standard" fields in the IP and Nimrod headers, the Update Message should contain the following. Some of these fields may eventually be pulled out to the Nimrod header. For the protocol, it is sufficient that these fields are present somewhere in the packet. - Originating agent EID. - Originating agent locator. - Database Type (which database the message refers to). - Timestamp of when the message was generated. - Sequence number for the message (a different stream for each database). - Update data (specific to the database). The length and packet format of these fields is TBD. 3.1.3. Transit Agent Protocol Engine Given below is the algorithm executed by transit agent (ThisAgent) upon receipt of Update Message (UpdMsg) from another agent (PrevAgent). The UMFT is described in section 3.1.5. begin /* Check message contents. More checks than those given here are probably needed. */ if timestamp(UpdMsg) is invalid /* too old or too new */ then trigger SNMP event and discard UpdMsg. if seq_num(UpdMsg) is invalid /* duplicate */ then trigger SNMP event and discard UpdMsg. /* Consult the UMFT table for caching and forwarding directions */ NextAgentType = UMFT(ThisAgent, PrevAgent, UpdMsg) if NextAgentType is not a valid agent, discard message. repeat choose a (previously unchosen) agent A of NextAgentType. if none found then trigger SNMP event and discard message else forward UpdMsg to A. until successfully forwarded or message discarded. end. 3.1.4. Originating Agent Protocol Engine This is identical to the transit agent protocol engine except for the fact that the UpdMsg is received from an application and not from another agent. That is, make PrevAgent = Application and the same algorithm applies. Also, the packet is timestamped and a sequence number is put on it. The differences are small and so we dispense with the complete specification. 3.1.5. The Update Message Forwarding Table (UMFT) For each of 5 agent types (Forwarding Agent, Enpoint Rep., Node Rep., Association Agent and Route Agent), we give below the actions upon receipt of an UpdMsg. The agents are represented by F, E, O, A and R and suffixed by p, c or s to indicate the agent in the parent, child or sibling node of the node under consideration. No suffix means the agent belongs to the node under consideration. Note that a boundary forwarding agent for a given node acts also as an interior forwarding agent for its parent node. We shall use F to represent the boundary forwarding agent and Fc to represent the interior forwarding agent (boundary forwarding agent for some child of the node). A *particular* child/sibling node is indicated by paranthesizing the node. For instance the forwarding agent of child node N is Fc(N). The 1st column contains the type of the message, 2nd contains the previous agent type and the 3rd contains the actions to be taken. Combinations not given represent "Unexpected" situations. If these happen, an SNMP event is triggered and the message is discarded. The table makes use of two functions "forward" and "present", whose semantics are described following the table. whose semantics are indicated following the table. The table uses the concept of Linked Chain of Nodes (LCN) details of which were given in section 2.1. *********** UMFT for F *********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Association Ep,A, if present(Ap) Update Fc (interior FA) then forward (UpdMsg, Ap) else forward (UpdMsg, Fp) --------------------------------------------------------------------------- Association Fp,Ap if present(A) Update then forward (UpdMsg, A) else forward (UpdMsg, Fc) -------------------------------------------------------------------------- Map Update O forward (UpdMsg, {Ep, Rp}) -------------------------------------------------------------------------- *********** UMFT for O *********** -------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Map Update Fc, forward (UpdMsg, {all O}); Application if abstract map of MyNode changed put new abstract map in NewUpdMsg; forward (NewUpdMsg, F); --------------------------------------------------------------------------- Map Update O Cache i.e., Update Map Database --------------------------------------------------------------------------- ********** UMFT for E ********** -------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- New Assoc. Application make NewUpdMsg and put new assoc. Indication F = neighboring forwarding agent; forward (NewUpdMsg, F); --------------------------------------------------------------------------- *********** UMFT for A *********** -------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Association Fc forward (UpdMsg, {all A}); Update (interior FA) if (ThisAgent is not Author. Src) if (ThisAgent in LCN) N = next node in chn; forward (UpdMsg, Fc(N)); else forward (UpdMsg, F); -------------------------------------------------------------------------- Association F forward (UpdMsg, {all A}); Update if (ThisAgent is not Author. Src) N = next node in LCN; forward (UpdMsg, Fc(N)); ------------------------------------------------------------------------- Association A Cache, i.e, Update Association Databs Update ------------------------------------------------------------------------- *********** UMFT for R *********** -------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Map Update F forward (UpdMsg, {all R}) --------------------------------------------------------------------------- Map Update R Cache, i.e., Update Map Database --------------------------------------------------------------------------- Semantics of functions: forward (Msg, AgentSet) : forward Msg *reliably* to one agent of each type given in Agent Set. Includes the functionality of resending to another agent if one agent cannot be reached. The set {all X} indicates all agents of type X including itself. May involve multiple hops. If an agent A in AgentSet is not 1 hop away, Msg is handed over to a neighboring forwarding agent that knows how to reach A. present (Agent) : returns true if there is an agent of type AgentType 3.2. The Hierarchical Query-Response Protocol 3.2.1. Introduction The Q-R Protocol is used to obtain database information(e.g., the association database, the map database etc.) in an efficient manner. The Q-R Protocol assumes the existence of a Reliable Transport Protocol and uses it to achieve reliability in its updates. No reliability mechanism (such as retransmissions etc.) is incorporated into the Q-R Protocol itself. Additionally, we assume that authentication and/or encryption of the message is done elsewhere, either as a separate layer or as part of the reliable transport. This will be specified separately in another document by the Nimrod working group. We also assume that agent discovery has been completed with agents within a node knowing how to reach other agents directly or indirectly (see section 2.1). Also, we assume that path setup and data forwarding functionality is available when required. Agent Discovery and Path Setup will be specified separately in another document by the Nimrod working group. The Q-R protocol consists of a Query Message that is generated by the agent wishing to make a query. The originating agent forwards the message to one or more agents and awaits a response to the query. The response to the query is contained in a Response Message that is generated by the agent that has access to the information requested in the query. If the originator does not get a response within a certain time t, it informs the application of a query failure. The value of t may be configured or given to the protocol by the application. The application also has the option of requesting an abort of the query - in this case, the state is reset and the response is ignored. As in the case of the Update Protocol, exceptions are handled by triggering SNMP events. The state machine for the originating agent has two states - INITIAL and WAITING (for a response). A transition from INITIAL to WAITING is made when a query is sent and the state is reset to INITIAL when a response arrives, timer expires or the query is aborted. The transit agent state machine, as in the Update Protocol, is trivial. The decision of whether to cache the message contents or not, and the decision of who to forward the message is done using the Query Message Forwarding Table (QMFT). There is one QMFT for each of the five types of agents. The QMFT describes the actions to be taken on the receipt of a particular message from a particular agent. The QMFT depends in a large way on what functionality we map into the protocol and how we do the mapping. The QMFT given here maps the association and map update in the manner described in section 2. We note that the QMFT-based protocol specification approach provides flexibility by making it easy to change the functionality and the mapping - you just need to alter the entries in the table. We expect to include route query and locator query functionalities also in the next version of this document. The Q-R Protocol is typically used by Nimrod agents or other "users" in order to initiate updates. We use the word "Application" to denote these users. For instance, the generation of the initial association update is done by an endpoint representative, which is the application using the protocol. 3.2.2. Message Contents Apart from the "standard" fields in the IP and Nimrod headers, the Query Message should contain the following. Some of these fields may eventually be pulled out to the Nimrod header. For the protocol, it is sufficient that these fields are present somewhere in the packet. - Originating agent EID. - Originating agent locator. - Database Type (which database the information is requested from). - Timestamp of when the message was generated. - Query identification number (used to match responses to queries) - Query data (specific to the database). The length and packet format of these fields is TBD. Apart from the "standard" fields in the IP and Nimrod headers, the Response Message should contain the following. Some of these fields may eventually be pulled out to the Nimrod header. For the protocol, it is sufficient that these fields are present somewhere in the packet. - Originating agent EID. - Originating agent locator. - Database Type (which database the information is about) - Timestamp of when the message was generated. - Response identification number (equal to the id of Query responding to) - Response data (specific to the database). 3.2.3. Transit Agent Protocol Engine Given below is the algorithm executed by the originating agent (ThisAgent) upon receipt of Query Message (QryMsg) from another agent (PrevAgent). The QMFT is described in section 3.2.5. begin /* Check message contents. More checks than those given here are probably needed. */ if timestamp(QryMsg) is invalid /* too old or too new */ then trigger SNMP event and discard QryMsg. if seq_num(QryMsg) is invalid /* duplicate */ then trigger SNMP event and discard QryMsg. /* Consult the QMFT table for caching and forwarding directions */ NextAgentType = QMFT(ThisAgent, PrevAgent, QryMsg) if NextAgentType is not a valid agent, discard message. repeat choose a (previously unchosen) agent A of NextAgentType. if none found then trigger SNMP event and discard message else forward QryMsg to A. until successfully forwarded or message discarded. end 3.2.4. Originating Agent Protocol Engine The originating agent protocol engine has two states INITIAL and WAITING. We define three types of messages that it can receive : QryAbort, a message from the application requesting the agent to abort the query; QryFail, a message sent by the agent to the application informing of failure; a DoQuery message from the application, requesting execution of the Q-R protocol. We note that each query will initiate a new state machine. We assume that messages are demultiplexed apriori to the "right" state machine. begin receive process message state INITIAL : if it is DoQuery message from application Prepare QryMsg and put timestamp and sequence number on it; set query-timer with value from application or configuration; Execute Transit Agent Protocol Engine (see section 3.2.3); Transition to state WAITING. if it is any other message Discard the message state WAITING : if it is QryResp from peer deliver database contents to application; Transition to state INITIAL; if it is QryAbort from application Transition to state INITIAL; if it is query-timer expiration send a QryFail message to application; Transition to state INITIAL; if it is any other message Discard the message. end. 3.2.5. The Query Message Forwarding Table For each of 5 agent types (Forwarding Agent, Enpoint Rep., Node Rep., Association Agent and Route Agent), we give below the actions upon receipt of an UpdMsg. The agents are represented by F, E, O, A and R and suffixed by p, c or s to indicate the agent in the parent, child or sibling node of the node under consideration. No suffix means the agent belongs to the node under consideration. Note that a boundary forwarding agent for a given node acts also as an interior forwarding agent for its parent node. We shall use F to represent the boundary forwarding agent and Fc to represent the interior forwarding agent (boundary forwarding agent for some child of the node). A *particular* child/sibling node is indicated by paranthesizing the node. For instance the forwarding agent of child node N is Fc(N). The 1st column contains the type of the message, 2nd contains the previous agent type and the 3rd contains the actions to be taken. Combinations not given represent "Unexpected" situations. If these happen, an SNMP event is triggered and the message is discarded. The table makes use of three functions "forward", "send" and "present" whose semantics are indicated following the table. The table uses the concept of Linked Chain of Nodes (LCN) details of which were given in section 2.1. ********** QMFT for F ********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Association Ep,A if present(Ap) Query Fc (interior FA) then forward (QryMsg, Ap) else forward (QryMsg, Fp) --------------------------------------------------------------------------- Association Fp,Ap if present(A) Query then forward (QryMsg, A) else forward (QryMsg, Fc) --------------------------------------------------------------------------- Map Querying forward (QryMsg, E) Query node --------------------------------------------------------------------------- ********** QMFT for O ********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Map Query F if query is answerable send-response(QryRsp, Queryingnode) --------------------------------------------------------------------------- *********** QMFT for E *********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Request for Application make QryMsg using Request for loc; Associaton Fnbr = neighboring forwarding agent; forward (QryMsg, Fnbr); --------------------------------------------------------------------------- ********** QMFT for A ********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Association Fc if (ThisAgent is Author. Src OR Query (cached answer is okay AND assoc. found in cache)) send-response (QryRsp, Queryingnode) else if (ThisAgent in Linked Chain) N = next node in chn; forward (QryMsg, Fc(N)); else forward (QryMsg, F); ---------------------------------------------------------------------------- Association F if (ThisAgent is Author. Src OR Query (cached answer is okay AND assoc. found in cache)) send-response (QryRsp, Queryingnode) else N = next node in linked chn; forward (QryMsg, Fc(N)); ---------------------------------------------------------------------------- ********** QMFT for R ********** --------------------------------------------------------------------------- Message PrevAgent Actions --------------------------------------------------------------------------- Request for Application N = node whose map is required Map make QryMsg using Request for Map send (QryMsg, N); --------------------------------------------------------------------------- Semantics of functions: forward (Msg, AgentSet) : forward Msg *reliably* to one agent of each type given in Agent Set. Includes the functionality of resending to another agent if one agent cannot be reached. The set {all X} indicates all agents of type X including itself. May involve multiple hops. If an agent A in AgentSet is not 1 hop away, Msg is handed over to a neighboring forwarding agent that knows how to reach A. send (Msg, Node) : conveys Msg to a boundary forwarding agent in Node. May need to set up a path to do this. present (Agent) : returns true if there is an agent of type AgentType. 4. REFERENCES [1] I. Castineyra, J. N. Chiappa, M. Steenstrup, "The Nimrod Routing Architecture", Working-Draft, July 1994. [2] M. Steenstrup, "A Perspective on Nimrod Functionality", Working-Draft, January 1995.