Network Working Group                              Eric C. Rosen
Internet Draft                                     Cisco Systems, Inc.
Expiration Date: September 1998
                                                   Arun Viswanathan
                                                   Lucent Technologies

                                                   Ross Callon
                                                   IronBridge Networks, Inc.

                                                   March 1998

Multiprotocol Label Switching Architecture

draft-ietf-mpls-arch-01.txt

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This internet draft specifies the architecture for multiprotocol label switching (MPLS). The architecture is based on other label switching approaches [2-11] as well as on the MPLS Framework document [1].

Rosen, Viswanathan & Callon                                   [Page 1]

Internet Draft        draft-ietf-mpls-arch-01.txt          March 1998

Table of Contents

1 Introduction to MPLS
1.1 Overview
1.2 Terminology
1.3 Acronyms and Abbreviations
1.4 Acknowledgments
2 Outline of Approach
2.1 Labels
2.2 Upstream and Downstream LSRs
2.3 Labeled Packet
2.4 Label Assignment and Distribution; Attributes
2.5 Label Distribution Protocol (LDP)
2.6 The Label Stack
2.7 The Next Hop Label Forwarding Entry (NHLFE)
2.8 Incoming Label Map (ILM)
2.9 Stream-to-NHLFE Map (STN)
2.10 Label Swapping
2.11 Scope and Uniqueness of Labels
2.12 Label Switched Path (LSP), LSP Ingress, LSP Egress
2.13 Penultimate Hop Popping
2.14 LSP Next Hop
2.15 Route Selection
2.16 Time-to-Live (TTL)
2.17 Loop Control
2.17.1 Loop Prevention
2.17.2 Interworking of Loop Control Options
2.18 Merging and Non-Merging LSRs
2.18.1 Stream Merge
2.18.2 Non-merging LSRs
2.18.3 Labels for Merging and Non-Merging LSRs
2.18.4 Merge over ATM
2.18.4.1 Methods of Eliminating Cell Interleave
2.18.4.2 Interoperation: VC Merge, VP Merge, and Non-Merge
2.19 LSP Control: Egress versus Local
2.20 Granularity
2.21 Tunnels and Hierarchy
2.21.1 Hop-by-Hop Routed Tunnel
2.21.2 Explicitly Routed Tunnel
2.21.3 LSP Tunnels
2.21.4 Hierarchy: LSP Tunnels within LSPs
2.21.5 LDP Peering and Hierarchy
2.22 LDP Transport
2.23 Label Encodings
2.23.1 MPLS-specific Hardware and/or Software
2.23.2 ATM Switches as LSRs
2.23.3 Interoperability among Encoding Techniques
2.24 Multicast
3 Some Applications of MPLS
3.1 MPLS and Hop by Hop Routed Traffic
3.1.1 Labels for Address Prefixes
3.1.2 Distributing Labels for Address Prefixes
3.1.2.1 LDP Peers for a Particular Address Prefix
3.1.2.2 Distributing Labels
3.1.3 Using the Hop by Hop path as the LSP
3.1.4 LSP Egress and LSP Proxy Egress
3.1.5 The POP Label
3.1.6 Option: Egress-Targeted Label Assignment
3.2 MPLS and Explicitly Routed LSPs
3.2.1 Explicitly Routed LSP Tunnels: Traffic Engineering
3.3 Label Stacks and Implicit Peering
3.4 MPLS and Multi-Path Routing
3.5 LSP Trees as Multipoint-to-Point Entities
3.6 LSP Tunneling between BGP Border Routers
3.7 Other Uses of Hop-by-Hop Routed LSP Tunnels
3.8 MPLS and Multicast
4 LDP Procedures for Hop-by-Hop Routed Traffic
4.1 The Procedures for Advertising and Using labels
4.1.1 Downstream LSR: Distribution Procedure
4.1.1.1 PushUnconditional
4.1.1.2 PushConditional
4.1.1.3 PulledUnconditional
4.1.1.4 PulledConditional
4.1.2 Upstream LSR: Request Procedure
4.1.2.1 RequestNever
4.1.2.2 RequestWhenNeeded
4.1.2.3 RequestOnRequest
4.1.3 Upstream LSR: NotAvailable Procedure
4.1.3.1 RequestRetry
4.1.3.2 RequestNoRetry
4.1.4 Upstream LSR: Release Procedure
4.1.4.1 ReleaseOnChange
4.1.4.2 NoReleaseOnChange
4.1.5 Upstream LSR: labelUse Procedure
4.1.5.1 UseImmediate
4.1.5.2 UseIfLoopFree
4.1.5.3 UseIfLoopNotDetected
4.1.6 Downstream LSR: Withdraw Procedure
4.2 MPLS Schemes: Supported Combinations of Procedures
4.2.1 TTL-capable LSP Segments
4.2.2 Using ATM Switches as LSRs
4.2.2.1 Without Multipoint-to-point Capability
4.2.2.2 With Multipoint-To-Point Capability
4.2.3 Interoperability Considerations
4.2.4 How to do Loop Prevention
4.2.5 How to do Loop Detection
4.2.6 Security Considerations
5 Authors' Addresses
6 References

1. Introduction to MPLS

1.1. Overview

In connectionless network layer protocols, as a packet travels from one router hop to the next, an independent forwarding decision is made at each hop. Each router runs a network layer routing algorithm. As a packet travels through the network, each router analyzes the packet header. The choice of next hop for a packet is based on the header analysis and the result of running the routing algorithm.

Packet headers contain considerably more information than is needed simply to choose the next hop. Choosing the next hop can therefore be thought of as the composition of two functions. The first function partitions the entire set of possible packets into a set of "Forwarding Equivalence Classes (FECs)". The second maps each FEC to a next hop. Insofar as the forwarding decision is concerned, different packets which get mapped into the same FEC are indistinguishable.
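The two-function view of next-hop selection can be sketched in a few lines of Python. This is a minimal illustration, not part of the MPLS architecture: the packet representation (a dict of header fields), the toy classification rule, and the names `classify_by_dst`, `make_forwarder` and `NEXT_HOP_OF` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical packet representation: header field name -> value.
Packet = Dict[str, str]


def classify_by_dst(packet: Packet) -> str:
    """First function (hypothetical toy rule): map a packet to its FEC.

    Here the FEC is just the first two octets of the destination address;
    every other header field is ignored.
    """
    return ".".join(packet["dst"].split(".")[:2])


def make_forwarder(classify: Callable[[Packet], str],
                   next_hop_of: Dict[str, str]) -> Callable[[Packet], str]:
    """Build next-hop selection as the composition next_hop_of(classify(p))."""
    def forward(packet: Packet) -> str:
        fec = classify(packet)      # packet -> FEC
        return next_hop_of[fec]     # FEC -> next hop
    return forward


NEXT_HOP_OF = {"10.1": "R2", "10.2": "R3"}  # hypothetical FEC -> next hop table
forward = make_forwarder(classify_by_dst, NEXT_HOP_OF)

a = {"dst": "10.1.0.5", "tos": "0"}
b = {"dst": "10.1.9.9", "tos": "4"}
# a and b carry different headers but fall into the same FEC,
# so the forwarding decision cannot tell them apart.
print(forward(a), forward(b))  # R2 R2
```

Because the second function only ever sees the FEC, header fields the classifier does not consult (here, `tos`) cannot influence the chosen next hop.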
All packets which belong to a particular FEC and which travel from a particular node will follow the same path. Such a set of packets may be called a "stream".

In conventional IP forwarding, a particular router will typically consider two packets to be in the same stream if there is some address prefix X in that router's routing tables such that X is the "longest match" for each packet's destination address. As the packet traverses the network, each hop in turn reexamines the packet and assigns it to a stream.

In MPLS, the assignment of a particular packet to a particular stream is done just once, as the packet enters the network. The stream to which the packet is assigned is encoded with a short fixed length value known as a "label". When a packet is forwarded to its next hop, the label is sent along with it; that is, the packets are "labeled".

At subsequent hops, there is no further analysis of the packet's network layer header. Rather, the label is used as an index into a table which specifies the next hop, and a new label. The old label is replaced with the new label, and the packet is forwarded to its next hop. If assignment to a stream is based on a "longest match", this eliminates the need to perform a longest match computation for each packet at each hop; the computation can be performed just once.
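The contrast between per-hop analysis and one-time assignment can be sketched as follows. This is a minimal illustration under stated assumptions: the prefixes, label values, router names, and the tables `INGRESS_TABLE` and `LABEL_TABLES` are hypothetical, and a single loop stands in for lookups that real LSRs would each perform independently. Only the ingress performs a longest match (using Python's standard `ipaddress` module); every later hop does an exact-match lookup on the label alone and replaces it with a new one.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical ingress table: address prefix -> (stream, label to attach).
INGRESS_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): ("S1", 17),
    ipaddress.ip_network("10.1.0.0/16"): ("S2", 22),
}

# Hypothetical per-LSR tables: incoming label -> (next hop, new label).
# A next hop of "egress" means the packet leaves the labeled path there.
LABEL_TABLES: Dict[str, Dict[int, Tuple[str, Optional[int]]]] = {
    "R2": {17: ("R3", 40), 22: ("R4", 51)},
    "R3": {40: ("egress", None)},
    "R4": {51: ("egress", None)},
}


def assign_stream(dst: str) -> Tuple[str, int]:
    """Ingress only: longest-match the destination once and pick a label."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in INGRESS_TABLE if addr in net]
    if not matches:
        raise LookupError(f"no matching prefix for {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return INGRESS_TABLE[best]


def forward(dst: str, first_hop: str = "R2") -> Tuple[str, List[str]]:
    stream, label = assign_stream(dst)
    path, node = [], first_hop
    while node != "egress":
        path.append(node)
        # No network layer header analysis here: the label is the whole index,
        # and the old label is swapped for the new one.
        node, label = LABEL_TABLES[node][label]
    return stream, path


print(forward("10.1.2.3"))    # ('S2', ['R2', 'R4'])
print(forward("10.200.0.1"))  # ('S1', ['R2', 'R3'])
```

The longest match runs exactly once per packet, in `assign_stream`; in an actual network each LSR would hold only its own label table.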
Some routers analyze a packet's network layer header not merely to choose the packet's next hop, but also to determine a packet's "precedence" or "class of service", in order to apply different discard thresholds or scheduling disciplines to different packets. MPLS allows the precedence or class of service to be inferred from the label, so that no further header analysis is needed; in some cases MPLS provides a way to explicitly encode a class of service in the "label header".

The fact that a packet is assigned to a stream just once, rather than at every hop, allows the use of sophisticated forwarding paradigms. A packet that enters the network at a particular router can be labeled differently than the same packet entering the network at a different router, and as a result forwarding decisions that depend on the ingress point ("policy routing") can be easily made. In fact, the policy used to assign a packet to a stream need not have only the network layer header as input; it may use arbitrary information about the packet, and/or arbitrary policy information as input. Since this decouples forwarding from routing, it allows one to use MPLS to support a large variety of routing policies that are difficult or impossible to support with just conventional network layer forwarding.

Similarly, MPLS facilitates the use of explicit routing, without requiring that each IP packet carry the explicit route. Explicit routes may be useful to support policy routing and traffic engineering.

MPLS makes use of a routing approach whereby the normal mode of operation is that L3 routing (e.g., existing IP routing protocols and/or new IP routing protocols) is used by all nodes to determine the routed path.

MPLS stands for "Multiprotocol" Label Switching, multiprotocol because its techniques are applicable to ANY network layer protocol.
In this document, however, we focus on the use of IP as the network layer protocol.

A router which supports MPLS is known as a "Label Switching Router", or LSR.

A general discussion of issues related to MPLS is presented in "A Framework for Multiprotocol Label Switching" [1].

1.2. Terminology

This section gives a general conceptual overview of the terms used in this document. Some of these terms are more precisely defined in later sections of the document.

aggregate stream
    synonym of "stream"
DLCI
    a label used in Frame Relay networks to identify frame relay circuits
flow
    a single instance of an application to application flow of data (as in the RSVP and IFMP use of the term "flow")
forwarding equivalence class
    a group of IP packets which are forwarded in the same manner (e.g., over the same path, with the same forwarding treatment)
frame merge
    stream merge, when it is applied to operation over frame based media, so that the potential problem of cell interleave is not an issue.
label
    a short fixed length physically contiguous identifier which is used to identify a stream, usually of local significance.
label information base
    the database of information containing label bindings
label swap
    the basic forwarding operation consisting of looking up an incoming label to determine the outgoing label, encapsulation, port, and other data handling information.
label swapping
    a forwarding paradigm allowing streamlined forwarding of data by using labels to identify streams of data to be forwarded.
label switched hop
    the hop between two MPLS nodes, on which forwarding is done using labels.
label switched path
    the path created by the concatenation of one or more label switched hops, allowing a packet to be forwarded by swapping labels from an MPLS node to another MPLS node.
layer 2
    the protocol layer under layer 3 (which therefore offers the services used by layer 3). Forwarding, when done by the swapping of short fixed length labels, occurs at layer 2 regardless of whether the label being examined is an ATM VPI/VCI, a frame relay DLCI, or an MPLS label.
layer 3
    the protocol layer at which IP and its associated routing protocols operate
link layer
    synonymous with layer 2
loop detection
    a method of dealing with loops in which loops are allowed to be set up, and data may be transmitted over the loop, but the loop is later detected and closed
loop prevention
    a method of dealing with loops in which data is never transmitted over a loop
label stack
    an ordered set of labels
loop survival
    a method of dealing with loops in which data may be transmitted over a loop, but means are employed to limit the amount of network resources which may be consumed by the looping data
label switched path
    the path through one or more LSRs at one level of the hierarchy followed by a stream.
label switching router
    an MPLS node which is capable of forwarding native L3 packets
merge point
    the node at which multiple streams and switched paths are combined into a single stream sent over a single path.
Mlabel
    abbreviation for MPLS label
MPLS core standards
    the standards which describe the core MPLS technology
MPLS domain
    a contiguous set of nodes which operate MPLS routing and forwarding and which are also in one Routing or Administrative Domain
MPLS edge node
    an MPLS node that connects an MPLS domain with a node which is outside of the domain, either because it does not run MPLS, and/or because it is in a different domain. Note that if an LSR has a neighboring host which is not running MPLS, then that LSR is an MPLS edge node.
MPLS egress node
    an MPLS edge node in its role in handling traffic as it leaves an MPLS domain
MPLS ingress node
    an MPLS edge node in its role in handling traffic as it enters an MPLS domain
MPLS label
    a label placed in a short MPLS shim header used to identify streams
MPLS node
    a node which is running MPLS. An MPLS node will be aware of MPLS control protocols, will operate one or more L3 routing protocols, and will be capable of forwarding packets based on labels. An MPLS node may optionally also be capable of forwarding native L3 packets.
MultiProtocol Label Switching
    an IETF working group and the effort associated with the working group
network layer
    synonymous with layer 3
stack
    synonymous with label stack
stream
    an aggregate of one or more flows, treated as one aggregate for the purpose of forwarding in L2 and/or L3 nodes (e.g., may be described using a single label). In many cases a stream may be the aggregate of a very large number of flows. Synonymous with "aggregate stream".
stream merge
    the merging of several smaller streams into a larger stream, such that for some or all of the path the larger stream can be referred to using a single label.
switched path
    synonymous with label switched path
virtual circuit
    a circuit used by a connection-oriented layer 2 technology such as ATM or Frame Relay, requiring the maintenance of state information in layer 2 switches.
VC merge
    stream merge when it is applied to VCs, specifically so as to allow multiple VCs to merge into one single VC
VP merge
    stream merge when it is applied to VPs, specifically so as to allow multiple VPs to merge into one single VP. In this case the VCIs need to be unique. This allows cells from different sources to be distinguished via the VCI.
VPI/VCI
    a label used in ATM networks to identify circuits

1.3. Acronyms and Abbreviations

ATM     Asynchronous Transfer Mode
BGP     Border Gateway Protocol
DLCI    Data Link Connection Identifier
FEC     Forwarding Equivalence Class
IGP     Interior Gateway Protocol
ILM     Incoming Label Map
IP      Internet Protocol
LIB     Label Information Base
LDP     Label Distribution Protocol
L2      Layer 2
L3      Layer 3
LSP     Label Switched Path
LSR     Label Switching Router
MPLS    MultiProtocol Label Switching
MPT     Multipoint to Point Tree
NHLFE   Next Hop Label Forwarding Entry
STN     Stream to NHLFE Map
SVC     Switched Virtual Circuit
SVP     Switched Virtual Path
TTL     Time-To-Live
VC      Virtual Circuit
VCI     Virtual Circuit Identifier
VP      Virtual Path
VPI     Virtual Path Identifier

1.4. Acknowledgments

The ideas and text in this document have been collected from a number of sources and comments received. We would like to thank Rick Boivie, Paul Doolan, Nancy Feldman, Yakov Rekhter, Vijay Srinivasan, and George Swallow for their inputs and ideas.

2. Outline of Approach

In this section, we introduce some of the basic concepts of MPLS and describe the general approach to be used.

2.1. Labels

A label is a short, fixed length, locally significant identifier which is used to identify a stream. The label is based on the stream or Forwarding Equivalence Class that a packet is assigned to. The label does not directly encode the network layer address. The choice of label depends on the network layer address only to the extent that the Forwarding Equivalence Class depends on that address.

If Ru and Rd are LSRs, and Ru transmits a packet to Rd, they may agree to use label L to represent stream S for packets which are sent from Ru to Rd. That is, they can agree to a "mapping" between label L and stream S for packets moving from Ru to Rd. As a result of such an agreement, L becomes Ru's "outgoing label" corresponding to stream S for such packets; L becomes Rd's "incoming label" corresponding to stream S for such packets.

Note that L does not necessarily correspond to stream S for any packets other than those which are being sent from Ru to Rd. Also, L is not an inherently meaningful value and does not have any network-wide value; the particular value assigned to L gets its meaning solely from the agreement between Ru and Rd.

Sometimes it may be difficult or even impossible for Rd to tell, of an arriving packet carrying label L, that the label L was placed in the packet by Ru, rather than by some other LSR. (This will typically be the case when Ru and Rd are not direct neighbors.) In such cases, Rd must make sure that the mapping from label to FEC is one-to-one. That is, in such cases, Rd must not agree with Ru1 to use L for one purpose, while also agreeing with some other LSR Ru2 to use L for a different purpose.
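The rule that Rd must never let one label value stand for two different streams, when it cannot tell which upstream LSR placed the label, can be sketched as a small piece of bookkeeping. This is a hypothetical illustration: the class `LabelBindings`, its `can_identify_upstream` flag (standing in for whatever circumstances let Rd tell its upstream peers apart), and the string stream names are assumptions, not part of the architecture.

```python
from typing import Dict, Tuple


class LabelBindings:
    """Rd's record of which stream each (upstream LSR, label) pair represents."""

    def __init__(self, can_identify_upstream: bool):
        # Hypothetical flag: True if Rd can tell which upstream LSR placed a label.
        self.can_identify_upstream = can_identify_upstream
        self.bindings: Dict[Tuple[str, int], str] = {}

    def bind(self, upstream: str, label: int, stream: str) -> None:
        if not self.can_identify_upstream:
            # Rd cannot tell Ru1's packets from Ru2's, so a label value must
            # not be agreed for a different stream with any other upstream LSR.
            for (other, lbl), existing in self.bindings.items():
                if lbl == label and other != upstream and existing != stream:
                    raise ValueError(
                        f"label {label} already means {existing} (via {other})")
        self.bindings[(upstream, label)] = stream


rd = LabelBindings(can_identify_upstream=False)
rd.bind("Ru1", 5, "S")
rd.bind("Ru2", 5, "S")        # same meaning for both upstream LSRs: allowed
try:
    rd.bind("Ru2", 5, "T")    # would give label 5 two meanings: rejected
except ValueError as err:
    print(err)                # label 5 already means S (via Ru1)
```

With `can_identify_upstream=True` the same three calls all succeed, since Rd can then keep the two agreements apart.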
2.2. Upstream and Downstream LSRs

Suppose Ru and Rd have agreed to map label L to stream S, for packets sent from Ru to Rd. Then with respect to this mapping, Ru is the "upstream LSR", and Rd is the "downstream LSR".

The notion of upstream and downstream relates to agreements between nodes of the label values to be assigned for packets belonging to a particular stream that might be traveling from an upstream node to a downstream node. This is independent of whether the routing protocol actually will cause any packets to be transmitted in that particular direction. Thus, Rd is the downstream LSR for a particular mapping for label L if it recognizes L-labeled packets from Ru as being in stream S. This may be true even if routing does not actually forward packets for stream S between nodes Rd and Ru, or if routing has made Ru downstream of Rd along the path which is actually used for packets in stream S.

2.3. Labeled Packet

A "labeled packet" is a packet into which a label has been encoded. The encoding can be done by means of an encapsulation which exists specifically for this purpose, or by placing the label in an available location in either of the data link or network layer headers. Of course, the encoding technique must be agreed to by the entity which encodes the label and the entity which decodes the label.

2.4. Label Assignment and Distribution; Attributes

For unicast traffic in the MPLS architecture, the decision to bind a particular label L to a particular stream S is made by the LSR which is downstream with respect to that mapping. The downstream LSR then informs the upstream LSR of the mapping. Thus labels are "downstream-assigned", and are "distributed upstream".

A particular mapping of label L to stream S, distributed by Rd to Ru, may have associated "attributes". If Ru, acting as a downstream LSR, also distributes a mapping of a label to stream S, then under certain conditions, it may be required to also distribute the corresponding attribute that it received from Rd.

2.5. Label Distribution Protocol (LDP)

A Label Distribution Protocol (LDP) is a set of procedures by which one LSR informs another of the label/stream mappings it has made. Two LSRs which use an LDP to exchange label/stream mapping information are known as "LDP Peers" with respect to the mapping information they exchange; we will speak of there being an "LDP Adjacency" between them.

(N.B.: two LSRs may be LDP Peers with respect to some set of mappings, but not with respect to some other set of mappings.)

The LDP also encompasses any negotiations in which two LDP Peers need to engage in order to learn of each other's MPLS capabilities.

2.6. The Label Stack

So far, we have spoken as if a labeled packet carries only a single label. As we shall see, it is useful to have a more general model in which a labeled packet carries a number of labels, organized as a last-in, first-out stack. We refer to this as a "label stack".

IN MPLS, EVERY FORWARDING DECISION IS BASED EXCLUSIVELY ON THE LABEL AT THE TOP OF THE STACK.
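A label stack and the top-of-stack rule can be sketched as below. The sketch is illustrative only: the `LabelStack` class, the integer label values, and the `decide` helper with its plain dictionary of actions are hypothetical. The point it shows is that `decide` reads nothing but the top label, so two packets whose stacks differ only below the top are handled identically.

```python
from typing import Dict, List, Optional


class LabelStack:
    """Last-in, first-out stack of labels; the end of the list is the top."""

    def __init__(self, labels: Optional[List[int]] = None):
        self._labels: List[int] = list(labels or [])

    def push(self, label: int) -> None:
        self._labels.append(label)

    def pop(self) -> int:
        return self._labels.pop()

    def top(self) -> int:
        return self._labels[-1]

    def depth(self) -> int:
        return len(self._labels)   # depth 0 corresponds to an unlabeled packet


def decide(stack: LabelStack, actions: Dict[int, str]) -> str:
    """Forwarding decision that consults only the top label."""
    return actions[stack.top()]


actions = {30: "send to R7"}             # hypothetical top-label -> action table
p = LabelStack([11, 30])                 # 30 is on top
q = LabelStack([99, 12, 30])             # different labels below, same top
print(decide(p, actions) == decide(q, actions))  # True
```

Pushing a new label onto `p` changes which entry `decide` consults; popping it restores the previous top.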
Although, as we shall see, MPLS supports a hierarchy, the processing of a labeled packet is completely independent of the level of hierarchy. The processing is always based on the top label, without regard for the possibility that some number of other labels may have been "above it" in the past, or that some number of other labels may be below it at present.

An unlabeled packet can be thought of as a packet whose label stack is empty (i.e., whose label stack has depth 0).

If a packet's label stack is of depth m, we refer to the label at the bottom of the stack as the level 1 label, to the label above it (if such exists) as the level 2 label, and to the label at the top of the stack as the level m label.

The utility of the label stack will become clear when we introduce the notion of LSP Tunnel and the MPLS Hierarchy (sections 2.21.3 and 2.21.4).

2.7. The Next Hop Label Forwarding Entry (NHLFE)

The "Next Hop Label Forwarding Entry" (NHLFE) is used when forwarding a labeled packet. It contains the following information:

1. the packet's next hop

2. the data link encapsulation to use when transmitting the packet

3. the way to encode the label stack when transmitting the packet

4. the operation to perform on the packet's label stack; this is one of the following operations:

   a) replace the label at the top of the label stack with a specified new label

   b) pop the label stack

   c) replace the label at the top of the label stack with a specified new label, and then push one or more specified new labels onto the label stack.

Note that at a given LSR, the packet's "next hop" might be that LSR itself. In this case, the LSR would need to pop the top level label, and then "forward" the resulting packet to itself.
It would then make another forwarding decision, based on what remains after the label stack is popped. This may still be a labeled packet, or it may be the native IP packet.

This implies that in some cases the LSR may need to operate on the IP header in order to forward the packet.

If the packet's "next hop" is the current LSR, then the label stack operation MUST be to "pop the stack".

2.8. Incoming Label Map (ILM)

The "Incoming Label Map" (ILM) is a mapping from incoming labels to NHLFEs. It is used when forwarding packets that arrive as labeled packets.

2.9. Stream-to-NHLFE Map (STN)

The "Stream-to-NHLFE Map" (STN) is a mapping from streams to NHLFEs. It is used when forwarding packets that arrive unlabeled, but which are to be labeled before being forwarded.

2.10. Label Swapping

Label swapping is the use of the following procedures to forward a packet.

In order to forward a labeled packet, an LSR examines the label at the top of the label stack. It uses the ILM to map this label to an NHLFE. Using the information in the NHLFE, it determines where to forward the packet, and performs an operation on the packet's label stack. It then encodes the new label stack into the packet, and forwards the result.

In order to forward an unlabeled packet, an LSR analyzes the network layer header, to determine the packet's stream. It then uses the STN to map this to an NHLFE. Using the information in the NHLFE, it determines where to forward the packet, and performs an operation on the packet's label stack. (Popping the label stack would, of course, be illegal in this case.)
It then encodes the new label stack into the packet, and forwards the result.

IT IS IMPORTANT TO NOTE THAT WHEN LABEL SWAPPING IS IN USE, THE NEXT HOP IS ALWAYS TAKEN FROM THE NHLFE; THIS MAY IN SOME CASES BE DIFFERENT FROM WHAT THE NEXT HOP WOULD BE IF MPLS WERE NOT IN USE.

2.11. Scope and Uniqueness of Labels

A given LSR Rd may map label L1 to stream S, and distribute that mapping to LDP peer Ru1. Rd may also map label L2 to stream S, and distribute that mapping to LDP peer Ru2. Whether or not L1 == L2 is not determined by the architecture; this is a local matter.

A given LSR Rd may map label L to stream S1, and distribute that mapping to LDP peer Ru1. Rd may also map label L to stream S2, and distribute that mapping to LDP peer Ru2. IF (AND ONLY IF) RD CAN TELL, WHEN IT RECEIVES A PACKET WHOSE TOP LABEL IS L, WHETHER THE LABEL WAS PUT THERE BY RU1 OR BY RU2, THEN THE ARCHITECTURE DOES NOT REQUIRE THAT S1 == S2.

In general, Rd can only tell whether it was Ru1 or Ru2 that put the particular label value L at the top of the label stack if the following conditions hold:
- Ru1 and Ru2 are the only LDP peers to which Rd distributed a mapping of label value L, and

- Ru1 and Ru2 are each directly connected to Rd via a point-to-point interface.

When these conditions hold, an LSR may use labels that have "per interface" scope, i.e., which are only unique per interface. When these conditions do not hold, the labels must be unique over the LSR which has assigned them.

If a particular LSR Rd is attached to a particular LSR Ru over two point-to-point interfaces, then Rd may distribute to Ru a mapping of label L to stream S1, as well as a mapping of label L to stream S2, S1 != S2, if and only if each mapping is valid only for packets which Ru sends to Rd over a particular one of the interfaces. In all other cases, Rd MUST NOT distribute to Ru mappings of the same label value to two different streams.

This prohibition holds even if the mappings are regarded as being at different "levels of hierarchy". In MPLS, there is no notion of having a different label space for different levels of the hierarchy.

2.12. Label Switched Path (LSP), LSP Ingress, LSP Egress

A "Label Switched Path (LSP) of level m" for a particular packet P is a sequence of routers, <R1, ..., Rn>, with the following properties:

1. R1, the "LSP Ingress", is an LSR which pushes a label
onto P's label stack, resulting in a label stack of depth m;

2. For all i, 1<i<n, P has a label stack of depth m when received by LSR Ri;

3. At no time during P's transit from R1 to R[n-1] does its label stack ever have a depth of less than m;

4. For all i, 1<i<n: Ri transmits P to R[i+1] by means of MPLS, i.e., by using the label at the top of the label stack (the level m label) as an index into an ILM;

5. For all i, 1<i<n: if a system S receives and forwards P after P is transmitted by Ri but before P is received by R[i+1] (e.g., Ri and R[i+1] might be connected via a switched data link subnetwork, and S might be one of the data link switches), then S's forwarding decision is
not based on the level m label, or on the network layer header. This may be because:

a) the decision is not based on the label stack or the network layer header at all;

b) the decision is based on a label stack on which additional labels have been pushed (i.e., on a level m+k label, where k>0).

In other words, we can speak of the level m LSP for packet P as the sequence of routers:

1. which begins with an LSR (an "LSP Ingress") that pushes on a level m label,

2. all of whose intermediate LSRs make their forwarding decision by label switching on a level m label,

3. which ends (at an "LSP Egress") when a forwarding decision is made by label switching on a level m-k label, where k>0, or when a forwarding decision is made by "ordinary", non-MPLS forwarding procedures.

A consequence (or perhaps a presupposition) of
this is that whenever an LSR pushes a label onto an already labeled packet, it needs to make sure that the new label corresponds to a FEC whose LSP Egress is the LSR that assigned the label which is now second in the stack.

We will call a sequence of LSRs the "LSP for a particular stream S" if it is an LSP of level m for a particular packet P when P's level m label is a label corresponding to stream S.

Consider the set of nodes which may be LSP ingress nodes for stream S. Then there is an LSP for stream S which begins with each of those nodes. If a number of those LSPs have the same LSP egress, then one can consider the set of such LSPs to be a tree, whose root is the LSP egress. (Since data travels along this tree towards the root, this may be called a multipoint-to-point tree.) We can thus speak of the "LSP tree" for a particular stream S.

2.13. Penultimate Hop Popping

Note that according to the definitions of section 2.12, if <R1, ..., Rn> is a
level m LSP for packet P, P may be transmitted from R[n-1] to Rn with a label stack of depth m-1. That is, the label stack may be popped at the penultimate LSR of the LSP, rather than at the LSP Egress.

From an architectural perspective, this is perfectly appropriate. The purpose of the level m label is to get the packet to Rn. Once R[n-1] has decided to send the packet to Rn, the label no longer has any function, and need no longer be carried.

There is also a practical advantage to doing penultimate hop popping. If one does not do this, then when the LSP egress receives a packet, it first looks up the top label, and determines as a result of that lookup that it is indeed the LSP egress. Then it must pop the stack, and examine what remains of the packet.
If there is another label on the stack, the egress will look this up and forward the packet based on this lookup. (In this case, the egress for the packet's level m LSP is also an intermediate node for its level m-1 LSP.) If there is no other label on the stack, then the packet is forwarded according to its network layer destination address. Note that this would require the egress to do TWO lookups, either two label lookups or a label lookup followed by an address lookup.

If, on the other hand, penultimate hop popping is used, then when the penultimate hop looks up the label, it determines:

- that it is the penultimate hop, and

- who the next hop is.

The penultimate node then pops the stack, and forwards the packet based on the information gained by looking up the label that was at the top of the stack. When the LSP egress receives the packet, the label at the top of the stack will be the label which it needs to look up in order to make its own forwarding decision. Or, if the packet was only carrying a single label, the LSP egress will simply see the network layer packet, which is
just what it needs to see in order to make its forwarding decision.

This technique allows the egress to do a single lookup, and also requires only a single lookup by the penultimate node.

The creation of the forwarding fastpath in a label switching product may be greatly aided if it is known that only a single lookup is ever required:

- the code may be simplified if it can assume that only a single lookup is ever needed;

- the code can be based on a "time budget" that assumes that only a single lookup is ever needed.

In fact, when penultimate hop popping is done, the LSP Egress need not even be an LSR.

However, some hardware switching engines may not be able to pop the label stack, so this cannot be universally required. There may also be some situations in which penultimate hop popping is not desirable. Therefore the penultimate node pops the label stack only if this is specifically requested by the egress node, or if the next node in the LSP does not support MPLS. (If the next node in the
LSP does support MPLS, but does not make such a request, the penultimate node has no way of knowing that it in fact is the penultimate node.)

An LSR which is capable of popping the label stack at all MUST do penultimate hop popping when so requested by its downstream LDP peer.

Initial LDP negotiations must allow each LSR to determine whether its neighboring LSRs are capable of popping the label stack. An LSR will not request an LDP peer to pop the label stack unless it is capable of doing so.

It may be asked whether the egress node can always interpret the top label of a received packet properly if penultimate hop popping is used. As long as the uniqueness and scoping rules of section 2.11 are obeyed, it is always possible to interpret the top label of a received packet unambiguously.

2.14. LSP Next Hop

The LSP Next Hop for a particular labeled packet in a particular LSR is the LSR which is the next hop, as selected by the NHLFE entry used for forwarding that packet.

The LSP Next Hop for a particular stream is the next hop as selected by the NHLFE entry indexed by a label which corresponds to that stream.

Note that the LSP Next Hop may differ from the next hop which would be chosen by the network layer routing algorithm. We will use the term "L3 next hop" when we refer to the latter.

2.15. Route Selection

Route selection refers to the method used for selecting the LSP for a particular stream. The proposed MPLS protocol architecture supports two options for Route Selection: (1) Hop by hop routing, and (2) Explicit routing.

Hop by hop routing allows each node to independently choose the next hop for the path for a stream. This is the normal mode today with existing datagram IP networks. A hop by hop routed LSP
refers to an LSP whose route is selected using hop by hop routing.

An explicitly routed LSP is an LSP where, at a given LSR, the LSP next hop is not chosen by each local node, but rather is chosen by a single node (usually the ingress or egress node of the LSP). The sequence of LSRs followed by an explicitly routed LSP may be chosen by configuration, or may be selected dynamically by a single node (for example, the egress node may make use of the topological information learned from a link state database in order to compute the entire path for the tree ending at that egress node). Explicit routing may be useful for a number of purposes such as allowing policy routing and/or facilitating traffic engineering. With MPLS the explicit route needs to be specified at the time that labels are assigned, but the explicit route does not have
to be specified with each IP packet. This implies that explicit routing with MPLS is relatively efficient (when compared with the efficiency of explicit routing for pure datagrams).

For any one LSP (at any one level of hierarchy), there are two possible options: (i) The entire LSP may be hop by hop routed from ingress to egress; (ii) The entire LSP may be explicitly routed from ingress to egress. Intermediate cases do not make sense: In general, an LSP will be explicitly routed specifically because there is a good reason to use an alternative to the hop by hop routed path. This implies that if some of the nodes along the path follow an explicit route but some of the
nodes make use of hop by hop routing, then inconsistent routing will result and loops (or severely inefficient paths) may form.

For this reason, it is important that if an explicit route is specified for an LSP, then that route must be followed. Note that it is relatively simple to *follow* an explicit route which is specified in an LDP setup. We therefore propose that the LDP specification require that all MPLS nodes implement the ability to follow an explicit route if this is specified.

It is not necessary for a node to be able to create an explicit route. However, in order to ensure interoperability it is necessary to ensure that either (i) Every node knows how to use hop by hop routing; or (ii) Every node knows how to create and follow an explicit route. We propose that due to the
common use of hop by hop routing in networks today, it is reasonable to make hop by hop routing the default that all nodes need to be able to use.

2.16. Time-to-Live (TTL)

In conventional IP forwarding, each packet carries a "Time To Live" (TTL) value in its header. Whenever a packet passes through a router, its TTL gets decremented by 1; if the TTL reaches 0 before the packet has reached its destination, the packet gets discarded. This provides some level of protection against forwarding loops that may exist due to misconfigurations, or due to failure or slow convergence of the routing algorithm. TTL is sometimes used for other functions as well, such as multicast scoping, and supporting the "traceroute" command. This implies that
there are two TTL-related issues that MPLS needs to deal with: (i) TTL as a way to suppress loops; (ii) TTL as a way to accomplish other functions, such as limiting the scope of a packet.

When a packet travels along an LSP, it should emerge with the same TTL value that it would have had if it had traversed the same sequence of routers without having been label switched. If the packet travels along a hierarchy of LSPs, the total number of LSR-hops traversed should be reflected in its TTL value when it emerges from the hierarchy of LSPs.

The way that TTL is handled may vary depending upon whether the MPLS label values are carried in an MPLS-specific "shim" header, or if the MPLS labels are carried in an L2 header such as an ATM header or a frame relay header.

If the label values are encoded in a "shim" that sits between the
data link and network layer headers, then this shim should have a TTL field that is initially loaded from the network layer header TTL field, is decremented at each LSR-hop, and is copied into the network layer header TTL field when the packet emerges from its LSP.

If the label values are encoded in an L2 header (e.g., the VPI/VCI field in the ATM cell header), and the labeled packets are forwarded by an L2 switch (e.g., an ATM switch), then unless the data link layer itself has a TTL field (unlike ATM), it will not be possible to decrement a packet's TTL at each LSR-hop. An LSP segment which consists of a sequence of LSRs that cannot decrement a packet's TTL will be called a "non-TTL LSP segment".

When a packet emerges from a non-TTL LSP segment, it should however be given a TTL that reflects the number of LSR-hops it traversed. In the unicast case, this can be achieved by propagating a meaningful LSP length to ingress nodes, enabling the
ingress to decrement the TTL value before forwarding packets into a non-TTL LSP segment.

Sometimes it can be determined, upon ingress to a non-TTL LSP segment, that a particular packet's TTL will expire before the packet reaches the egress of that non-TTL LSP segment. In this case, the LSR at the ingress to the non-TTL LSP segment must not label switch the packet. This means that special procedures must be developed to support traceroute functionality; for example, traceroute packets may be forwarded using conventional hop by hop forwarding.

2.17. Loop Control

On a non-TTL LSP segment, by definition, TTL cannot be used to protect against forwarding loops. The importance of loop control may depend on the particular hardware being used to provide the LSR functions along the non-TTL LSP segment.
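Because LSRs inside a non-TTL LSP segment cannot decrement TTL hop by hop, the only TTL processing such a segment gets is the ingress accounting described in section 2.16. The Python sketch below shows one possible ingress decision. It makes three assumptions that are not taken from the draft. The names `ingress_non_ttl_segment` and `IngressDecision` are hypothetical. The "LSP length" is taken to be the number of TTL decrements the packet would have seen inside the segment. A packet counts as expiring inside the segment when that count would drive its TTL to 0.

```python
from dataclasses import dataclass


@dataclass
class IngressDecision:
    """What a hypothetical ingress LSR does with one packet."""
    label_switch: bool  # True: send the packet into the non-TTL LSP segment
    ttl: int            # TTL carried by the packet when it leaves the ingress


def ingress_non_ttl_segment(ip_ttl: int, segment_hops: int) -> IngressDecision:
    """Model the ingress handling of a packet entering a non-TTL LSP segment.

    Assumed model (illustrative only): segment_hops is the LSP length
    propagated to the ingress, taken to be the number of TTL decrements the
    packet would have seen inside the segment under conventional forwarding.
    The ingress LSR's own hop by hop decrement is not modeled here.
    """
    if ip_ttl < 0 or segment_hops < 1:
        raise ValueError("need ip_ttl >= 0 and segment_hops >= 1")
    if ip_ttl <= segment_hops:
        # The TTL would reach 0 somewhere inside the segment, so the packet
        # must not be label switched; conventional hop by hop forwarding lets
        # the router where the TTL really expires handle it (e.g. traceroute).
        return IngressDecision(label_switch=False, ttl=ip_ttl)
    # LSRs inside the segment cannot decrement TTL, so charge the whole
    # segment up front; under this model the packet emerges with the TTL it
    # would have had without label switching.
    return IngressDecision(label_switch=True, ttl=ip_ttl - segment_hops)


print(ingress_non_ttl_segment(64, 3))  # IngressDecision(label_switch=True, ttl=61)
print(ingress_non_ttl_segment(2, 3))   # IngressDecision(label_switch=False, ttl=2)
```

In the hop by hop case the sketch returns the TTL unchanged. The normal per-router decrement is then left to conventional forwarding instead of being applied twice.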
Suppose, for instance, that ATM switching hardware is being used to provide MPLS switching functions, with the label being carried in the VPI/VCI field. Since ATM switching hardware cannot decrement TTL, there is no protection against loops. If the ATM hardware is capable of providing fair access to the buffer pool for incoming cells carrying different VPI/VCI values, this looping may not have any deleterious effect on other traffic. If the ATM hardware cannot provide fair buffer access of this sort, however, then even transient loops may cause severe degradation of the LSR's total performance.

Even if fair buffer access can be provided, it is still worthwhile to have some means of detecting loops that last "longer than possible". In addition, even where TTL and/or per-VC fair queuing provides a means for surviving loops, it still may be desirable where practical to avoid setting up LSPs which loop.

The MPLS architecture will therefore provide a technique for ensuring that looping LSP segments can be detected, and a technique for ensuring that looping LSP segments are never created. All LSRs will be required to support a common technique for loop detection. Support for the loop prevention technique is optional, though it is recommended in ATM-LSRs that have no other way to protect themselves against the effects of
2.17.1. Loop Prevention

NOTE: The loop prevention technique described here is being reconsidered, and may be changed.

LSRs maintain an LSR id list for each of their LSPs. This list contains all the LSRs downstream from this LSR on a given LSP. The LSR id list is used to prevent the formation of switched path loops. The LSR id list is propagated upstream from a node to its neighbor nodes.

The LSR id list is used to prevent loops as follows:

When a node, R, detects a change in the next hop for a given stream, it asks its new next hop for a label and the associated LSR id list for that stream.
The new next hop responds with a label for the stream and an associated LSR id list. R looks in the LSR id list. If R determines that it, R, is in the list, then we have a route loop. In this case, we do nothing and the old LSP will continue to be used until the route protocols break the loop. The means by which the old LSP is replaced by a new LSP after the route protocols break the loop is described below.

If R is not in the LSR id list, R will start a "diffusion" computation [12]. The purpose of the diffusion computation is to prune the tree upstream of R, removing all LSRs that would be on a looping path if R were to switch over to the new LSP. After those LSRs are removed from the tree, it is safe for R to replace the old LSP with the new LSP (and the old LSP can be released).

The diffusion computation works as follows:

R adds its LSR id to the list and sends a query message to each of its "upstream" neighbors (i.e. to each of its neighbors that is not the new "downstream" next hop).
A node S that receives such a query will process the query as follows:

- If node R is not node S's next hop for the given stream, node S will respond to node R with an "OK" message meaning that, as far as node S is concerned, it is safe for node R to switch over to the new LSP.

- If node R is node S's next hop for the stream, node S will check to see if it, node S, is in the LSR id list that it received from node R.
  If it is, we have a route loop and S will respond with a "LOOP" message. R will unsplice the connection to S, pruning S from the tree. The mechanism by which S will get a new LSP for the stream after the route protocols break the loop is described below.

- If node S is not in the LSR id list, S will add its LSR id to the LSR id list and send a new query message further upstream. The diffusion computation will continue to propagate upstream along each of the paths in the tree upstream of S. It stops on a path when either a loop is detected, in which case the node is pruned as described above, or a node gets a response ("OK" or "LOOP") from each of its neighbors, perhaps because none of those neighbors considers the node in question to be its downstream next hop.
Once a node has received a response from each of its upstream neighbors, it returns an "OK" message to its downstream neighbor. When the original node, node R, gets a response from each of its neighbors, it is safe to replace the old LSP with the new one because all the paths that would loop have been pruned from the tree.
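The query processing just described can be simulated in a few lines. This is a minimal Python sketch, not a protocol implementation. Messages are modeled as synchronous function calls, and the old LSP tree is a hypothetical `next_hop` map (node to downstream next hop). Timeouts and concurrent routing updates are not modeled. All names (`process_query`, `change_next_hop`, `demo_topology`) are illustrative.

```python
from typing import Dict, List, Set, Tuple

OK, LOOP = "OK", "LOOP"


def process_query(s: str, sender: str, id_list: List[str],
                  next_hop: Dict[str, str], neighbors: Dict[str, Set[str]],
                  pruned: Set[str]) -> str:
    """Node s handles a diffusion query received from `sender`."""
    if next_hop.get(s) != sender:
        # sender is not s's next hop, so s is not upstream of it on the tree:
        # as far as s is concerned, sender may switch over.
        return OK
    if s in id_list:
        # s already appears on the new path: route loop.
        return LOOP
    extended = id_list + [s]  # s adds its own id before querying upstream
    for upstream in sorted(neighbors[s] - {sender}):
        if process_query(upstream, s, extended, next_hop, neighbors, pruned) == LOOP:
            pruned.add(upstream)  # s unsplices upstream, pruning it
    # Every upstream neighbor has answered, so s reports OK downstream.
    return OK


def change_next_hop(r: str, new_next_hop: str, new_id_list: List[str],
                    next_hop: Dict[str, str],
                    neighbors: Dict[str, Set[str]]) -> Tuple[bool, Set[str]]:
    """R's side of the procedure; returns (switched, pruned nodes).

    new_id_list is the LSR id list returned with the new next hop's label,
    assumed here to include the new next hop itself.
    """
    if r in new_id_list:
        # Route loop: keep the old LSP until the routing protocols break it.
        return False, set()
    id_list = new_id_list + [r]
    pruned: Set[str] = set()
    for upstream in sorted(neighbors[r] - {new_next_hop}):
        if process_query(upstream, r, id_list, next_hop, neighbors, pruned) == LOOP:
            pruned.add(upstream)
    next_hop[r] = new_next_hop  # looping paths are pruned: safe to switch
    return True, pruned


def demo_topology() -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    # Old tree toward egress E: Y -> X -> R -> E and Z -> R.
    next_hop = {"R": "E", "X": "R", "Y": "X", "Z": "R"}
    neighbors = {"R": {"E", "X", "Z", "N"}, "X": {"R", "Y"},
                 "Y": {"X", "N"}, "Z": {"R"}, "N": {"R", "Y"}, "E": {"R"}}
    return next_hop, neighbors


# Hypothetical id list from new next hop N in which Y appears, although Y is
# still upstream of R on the old tree: the diffusion computation prunes Y.
print(change_next_hop("R", "N", ["N", "Y", "E"], *demo_topology()))  # (True, {'Y'})
```

In the example, Y answers "LOOP" to X's query because it already appears in the id list. X therefore prunes Y, after which R can switch over to N. If R itself had been in the list returned by N, `change_next_hop` would keep the old LSP and run no diffusion at all.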
There are a couple of details to discuss:

- First, we need to do something about nodes that for one reason or another do not produce a timely response to a query message. If a node Y does not respond to a query from node X because of a failure of some kind, X will not be able to respond to its downstream neighbors (if any). If X is, like R above, the node that has detected the route change, it will also be unable to switch over to a new LSP. This problem is handled by timing out the query message. If a node doesn't receive a response within a "reasonable" period of time, it "unsplices" its VC to the upstream neighbor that is not responding and proceeds as it would if it had received the "LOOP" message.

- We also need to be concerned about multiple concurrent routing updates. What happens, for example, when a node M receives a request for an LSP from an upstream neighbor, N, while M is in the middle of a diffusion computation, i.e., it has sent a query upstream but hasn't received all the responses?
  Since a downstream node, node R, is about to change from one LSP to another, M needs to pass to N an LSR id list corresponding to the union of the old and new LSPs if it is to avoid loops both before and after the transition. This is easily accomplished since M already has the LSR id list for the old LSP and it gets the LSR id list for the new LSP in the query message. After R makes the switch from the old LSP to the new one, R sends a new establish message upstream with the LSR id list of (just) the new LSP. At this point, the nodes upstream of R know that R has switched over to the new LSP and that they can return the id list for (just) the new LSP in response to any new requests for LSPs.
  They can also grow the tree to include additional nodes that would not have been valid for the combined LSR id list.

- We also need to discuss how a node that doesn't have an LSP for a given stream at the end of a diffusion computation (because it would have been on a looping LSP) gets one after the routing protocols break the loop. If node L has been pruned from the tree and its local route protocol processing entity breaks the loop by changing L's next hop, L will request a new LSP from its new downstream neighbor. L will use that LSP once it has executed the diffusion computation as described above. If the loop is broken by a route change at another point in the loop, i.e. at a point "downstream" of L, L will get a new LSP as the new LSP tree grows upstream from the point of the route change, as discussed in the previous paragraph.
- Note that when a node is pruned from the tree, the switched path upstream of that node remains "connected". This is important since it allows the switched path to get "reconnected" to a downstream switched path after a route change with a minimal amount of unsplicing and resplicing once the appropriate diffusion computation(s) have taken place.

The LSR id list can also be used to provide a "loop detection" capability. To use it in this manner, an LSR which sees that it is already in the LSR id list for a particular stream will immediately unsplice itself from the switched path for that stream, and will NOT pass the LSR id list further upstream. The LSR can rejoin a switched path for the stream in either of two cases: when it changes its next hop for that stream, or when it receives from its current next hop a new LSR id list that does not contain it. When the LSR id list is used only for loop detection, the diffusion computation is omitted.
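The loop detection mode is simpler than loop prevention because no diffusion computation is run. The Python sketch below illustrates it. Assumptions: per-stream state is held in a hypothetical `LoopDetectingLSR` object, and the list received from the next hop already includes the next hop itself. Label distribution and the actual splicing of VCs are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopDetectingLSR:
    """Per-stream state of an LSR using the LSR id list only for loop detection."""
    lsr_id: str
    next_hop: str
    spliced: bool = True  # whether this LSR is spliced into the switched path

    def on_id_list(self, sender: str, id_list: List[str]) -> Optional[List[str]]:
        """Handle an LSR id list from `sender`; return the list to pass upstream.

        Returns None when nothing is to be passed further upstream.
        """
        if sender != self.next_hop:
            return None  # only lists from the current next hop matter here
        if self.lsr_id in id_list:
            # This LSR is already on the path downstream of itself: a loop.
            # Unsplice immediately and do NOT pass the list further upstream.
            self.spliced = False
            return None
        # A list from the current next hop that does not contain this LSR:
        # (re)join the switched path and extend the list for upstream nodes.
        self.spliced = True
        return id_list + [self.lsr_id]

    def on_next_hop_change(self, new_next_hop: str) -> None:
        # Changing next hop also allows the LSR to rejoin; the list that
        # arrives from the new next hop is then checked by on_id_list().
        self.next_hop = new_next_hop
        self.spliced = True


b = LoopDetectingLSR("B", next_hop="C")
print(b.on_id_list("C", ["C", "E"]))       # ['C', 'E', 'B']
print(b.on_id_list("C", ["C", "B", "E"]))  # None
print(b.spliced)                           # False
```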
2.17.2. Interworking of Loop Control Options

The MPLS protocol architecture allows some nodes to be using loop prevention, while some other nodes are not (i.e., the choice of whether or not to use loop prevention may be a local decision). When this mix is used, it is not possible for a loop to form which includes only nodes which do loop prevention. However, it is possible for loops to form which contain a combination of some nodes which do loop prevention, and some nodes which do not.
There are at least four identified cases in which it makes sense to combine nodes which do loop prevention with nodes which do not:

(i) For transition, in intermediate states while transitioning from all non-loop-prevention to all loop prevention, or vice versa;

(ii) For interoperability, where one vendor implements loop prevention but another vendor does not;

(iii) Where there is a mixed ATM and datagram media network, and where loop prevention is desired over the ATM portions of the network but not over the datagram portions;

(iv) Where some of the ATM switches can do fair access to the buffer pool on a per-VC basis, and some cannot, and loop prevention is desired over the ATM portions of the network which cannot.
Note that interworking is straightforward. If an LSR is not doing loop prevention, and it receives from a downstream LSR a label mapping which contains loop prevention information, it (a) accepts the label mapping, (b) does NOT pass the loop prevention information upstream, and (c) informs the downstream neighbor that the path is loop-free.

Similarly, if an LSR R which is doing loop prevention receives from a downstream LSR a label mapping which does not contain any loop prevention information, then R passes the label mapping upstream with loop prevention information included as if R were the egress for the specified stream.

Optionally, a node is permitted to implement both loop prevention and its absence as options, and is permitted to choose which to use for any one particular LSP based on the information obtained from downstream nodes. When the label mapping arrives from downstream, the node may choose whether to use loop prevention so as to continue to use the same approach as was used in the information passed to it.

Note that regardless of whether loop prevention is used, the egress node (for any particular LSP) always initiates the exchange of label mapping information without waiting for other nodes to act.
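The interworking rules above amount to a small decision table. The Python sketch below encodes them under three assumptions. The mode names "on", "off" and "adaptive" are hypothetical. "As if R were the egress" is taken to mean starting a fresh LSR id list containing only R's own id. The loop check itself (is this LSR already in the list?) belongs to the loop prevention procedure and is not repeated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MappingResult:
    upstream_id_list: Optional[List[str]]  # None: no loop prevention info sent up
    tell_downstream_loop_free: bool        # rule (c) for non-preventing LSRs


def handle_label_mapping(my_id: str, mode: str,
                         id_list: Optional[List[str]]) -> MappingResult:
    """Interworking rules for a label mapping arriving from downstream.

    mode: "on" (does loop prevention), "off" (does not), or "adaptive"
    (the optional behaviour: follow the approach used downstream).
    id_list: loop prevention information in the mapping, or None if absent.
    """
    if mode == "adaptive":
        mode = "on" if id_list is not None else "off"
    if mode == "off":
        # (a) accept the mapping, (b) do NOT pass the information upstream,
        # (c) tell the downstream neighbor the path is loop-free, but only if
        # it actually sent loop prevention information.
        return MappingResult(None, id_list is not None)
    if mode != "on":
        raise ValueError(f"unknown mode {mode!r}")
    if id_list is None:
        # No information from downstream: act as if this LSR were the egress
        # (assumed here to mean starting a fresh list with its own id).
        return MappingResult([my_id], False)
    # Normal loop prevention case: extend the list before passing it upstream.
    return MappingResult(id_list + [my_id], False)


print(handle_label_mapping("A", "off", ["B"]))  # MappingResult(upstream_id_list=None, tell_downstream_loop_free=True)
```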
2.18. Merging and Non-Merging LSRs

Merge allows multiple upstream LSPs to be merged into a single downstream LSP. When merge is implemented by multiple nodes, the traffic for one particular stream going to a particular egress node follows a multipoint-to-point tree (MPT). The MPT is rooted at the egress node and associated with the stream. This can significantly reduce the number of labels that need to be maintained by any one particular node.

If merge were not used at all, each node would have to provide its upstream neighbors with a label for each stream for each upstream node which may be forwarding traffic over the link. This implies that the number of labels needed might not in general be known a priori. However, the use of merge allows a single label to be used per stream, therefore allowing label assignment to be done in a common way without regard for the number of upstream nodes which will be using the downstream LSP.

The proposed MPLS protocol architecture supports LSP merge, while allowing nodes which do not support LSP merge.
This leads to the issue of ensuring correct interoperation between nodes which implement merge and those which do not. The issue is somewhat different in the case of datagram media versus the case of ATM. The different media types will therefore be discussed separately.

2.18.1. Stream Merge

Let us say that an LSR is capable of Stream Merge if it can receive two packets from different incoming interfaces, and/or with different labels, and send both packets out the same outgoing interface with the same label. This in effect takes two incoming streams and merges them into one. Once the packets are transmitted, the information that they arrived from different interfaces and/or with different incoming labels is lost.

Let us say that an LSR is not capable of Stream Merge if, for any two packets which arrive from different interfaces, or with different labels, the packets must either be transmitted out different interfaces, or must have different labels.

An LSR which is capable of Stream Merge (a "Merging LSR") needs to maintain only one outgoing label for each FEC. An LSR which is not capable of Stream Merge (a "Non-merging LSR") may need to maintain as many as N outgoing labels per FEC, where N is the number of LSRs in the network. Hence by supporting Stream Merge, an LSR can reduce its number of outgoing labels by a factor of O(N). Since each label in use requires the dedication of some amount of resources, this can be a significant savings.
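The O(N) figure can be checked by counting labels on a small multipoint-to-point tree. The Python sketch below does that count. Its assumptions are illustrative: every LSR originates traffic for the FEC, the LSRs are either all merging or all non-merging, and the tree is given as a hypothetical `next_hop` map toward the egress.

```python
from typing import Dict


def outgoing_labels_per_fec(next_hop: Dict[str, str], egress: str,
                            merging: bool) -> Dict[str, int]:
    """Outgoing labels each non-egress LSR needs for one FEC (sketch).

    A non-merging LSR needs one outgoing label for every source LSR whose
    traffic crosses it; a merging LSR needs a single label.
    """
    if merging:
        return {node: 1 for node in next_hop}
    counts = {node: 0 for node in next_hop}
    for source in next_hop:
        node, hops = source, 0
        while node != egress:  # walk the path from source toward the egress
            counts[node] += 1
            node = next_hop[node]
            hops += 1
            if hops > len(next_hop):
                raise ValueError("next_hop does not form a tree toward the egress")
    return counts


chain = {"L1": "L2", "L2": "L3", "L3": "E"}  # L1 -> L2 -> L3 -> egress E
print(outgoing_labels_per_fec(chain, "E", merging=False))  # {'L1': 1, 'L2': 2, 'L3': 3}
print(outgoing_labels_per_fec(chain, "E", merging=True))   # {'L1': 1, 'L2': 1, 'L3': 1}
```

On a chain of N non-merging LSRs, the LSR next to the egress carries traffic from all N sources and so needs N outgoing labels. That is the worst case the O(N) factor refers to.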
2.18.2. Non-merging LSRs

The MPLS forwarding procedures are very similar to the forwarding procedures used by such technologies as ATM and Frame Relay. That is, a unit of data arrives, a label (VPI/VCI or DLCI) is looked up in a "cross-connect table", on the basis of that lookup an output port is chosen, and the label value is rewritten. In fact, it is possible to use such technologies for MPLS forwarding; LDP can be used as the "signalling protocol" for setting up the cross-connect tables.

Unfortunately, these technologies do not necessarily support the Stream Merge capability.
In ATM, if one attempts to perform Stream Merge, the result may be the interleaving of cells from various packets. If cells from different packets get interleaved, it is impossible to reassemble the packets. Some Frame Relay switches use cell switching on their backplanes. These switches may also be incapable of supporting Stream Merge, for the same reason: cells of different packets may get interleaved, and there is then no way to reassemble the packets.

We propose to support two solutions to this problem. First, MPLS will contain procedures which allow the use of non-merging LSRs. Second, MPLS will support procedures which allow certain ATM switches to function as merging LSRs.

Since MPLS supports both merging and non-merging LSRs, MPLS also contains procedures to ensure correct interoperation between them.
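The cross-connect model of forwarding, together with the constraint that defines a non-merging LSR, can be sketched as a small table. In this Python sketch the `CrossConnect` class and the integer port and label values are hypothetical. In a real LSR the entries would be installed by a signalling protocol such as LDP rather than by direct calls.

```python
from typing import Dict, Tuple

# (interface, label): the label plays the role of a VPI/VCI or DLCI.
PortLabel = Tuple[int, int]


class CrossConnect:
    """Illustrative label-swapping cross-connect table for a non-merging LSR."""

    def __init__(self) -> None:
        self.table: Dict[PortLabel, PortLabel] = {}

    def can_connect(self, outgoing: PortLabel) -> bool:
        # A non-merging LSR must not send packets that arrived with different
        # (interface, label) pairs out with the same (interface, label) pair.
        return outgoing not in self.table.values()

    def connect(self, incoming: PortLabel, outgoing: PortLabel) -> None:
        if incoming in self.table:
            raise ValueError(f"{incoming} is already cross-connected")
        if not self.can_connect(outgoing):
            raise ValueError(f"{outgoing} is in use; reusing it would be a merge")
        self.table[incoming] = outgoing

    def forward(self, in_port: int, in_label: int) -> PortLabel:
        # One lookup yields both the output port and the rewritten label.
        return self.table[(in_port, in_label)]


# Two upstream neighbors send the same FEC; each gets its own outgoing label.
xc = CrossConnect()
xc.connect((1, 42), (3, 17))
xc.connect((2, 42), (3, 18))
print(xc.forward(2, 42))  # (3, 18)
```

A merging LSR would simply drop the `can_connect` check, letting both entries map to (3, 17). Over ATM, doing that without further precautions is what causes the cell interleaving described above.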
2.18.3. Labels for Merging and Non-Merging LSRs

An upstream LSR which supports Stream Merge needs to be sent only one label per FEC. An upstream neighbor which does not support Stream Merge needs to be sent multiple labels per FEC. However, there is no way of knowing a priori how many labels it needs. This will depend on how many LSRs are upstream of it with respect to the FEC in question.

In the MPLS architecture, if a particular upstream neighbor does not support Stream Merge, it is not sent any labels for a particular FEC unless it explicitly asks for a label for that FEC. The upstream neighbor may make multiple such requests, and is given a new label each time. When a downstream neighbor receives such a request from upstream, and the downstream neighbor does not itself support Stream Merge, then it must in turn ask its downstream neighbor for another label for the FEC in question.

Some nodes may support merge but be able to merge only a limited number of upstream streams into a single downstream stream. Suppose, for example, that due to some hardware limitation a node is capable of merging four upstream LSPs into a single downstream LSP. Suppose, however, that this particular node has six upstream LSPs arriving at it for a particular stream. In this case, this node may merge these into two downstream LSPs (corresponding to two labels that need to be obtained from the downstream neighbor). The normal operation of LDP implies that the downstream neighbor will supply this node with a single label for the stream. This node can then ask its downstream neighbor for one additional label for the stream, implying that the node will thereby obtain the required two labels.

The interaction between explicit routing and merge is FFS.
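The label arithmetic in the limited-merge example, and the way a request ripples downstream through non-merging LSRs, can be sketched as follows. Assumptions: `merge_capacity` (hypothetical) is the number of upstream LSPs the hardware can merge onto one downstream LSP, with 1 meaning no merge. Also assumed is that the first merging LSR on the path can absorb a new request without asking further downstream. That holds only while its own merge capacity is not exceeded, which this sketch does not check.

```python
import math
from typing import Dict, List


def downstream_labels_needed(upstream_lsps: int, merge_capacity: int) -> int:
    """Downstream labels an LSR needs for one stream (sketch)."""
    if upstream_lsps < 1 or merge_capacity < 1:
        raise ValueError("need at least one upstream LSP and capacity >= 1")
    return math.ceil(upstream_lsps / merge_capacity)


def extra_label_requests(upstream_lsps: int, merge_capacity: int) -> int:
    # Normal LDP operation supplies one label for the stream; every further
    # label has to be requested explicitly from the downstream neighbor.
    return downstream_labels_needed(upstream_lsps, merge_capacity) - 1


def nodes_requesting(path: List[str], merges: Dict[str, bool]) -> List[str]:
    """LSRs that must ask their own downstream neighbor for another label.

    path lists the LSRs downstream of the requesting neighbor, nearest first.
    A non-merging LSR passes the request on; the first merging LSR stops it.
    """
    requesters = []
    for lsr in path:
        if merges[lsr]:
            break
        requesters.append(lsr)
    return requesters


print(downstream_labels_needed(6, 4), extra_label_requests(6, 4))  # 2 1
print(nodes_requesting(["B", "C", "D"], {"B": False, "C": True, "D": False}))  # ['B']
```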
In the six-LSP example above, the normal operation of LDP implies that the downstream neighbor will supply the node with a single label for the stream. The node can then ask its downstream neighbor for one additional label for the stream, and will thereby obtain the required two labels.

The interaction between explicit routing and merge is FFS.

2.18.4. Merge over ATM

2.18.4.1. Methods of Eliminating Cell Interleave

There are several methods that can be used to eliminate the cell interleaving problem in ATM, thereby allowing ATM switches to support stream merge:

   1. VP merge

      When VP merge is used, multiple virtual paths are merged into a virtual path, but packets from different sources are distinguished by using different VCs within the VP.

   2. VC merge

      When VC merge is used, switches are required to buffer cells from one packet until the entire packet is received (this may be determined by looking for the AAL5 end of frame indicator).

VP merge has the advantage that it is compatible with a higher percentage of existing ATM switch implementations. This makes it more likely that VP merge can be used in existing networks. Unlike VC merge, VP merge does not incur any delays at the merge points and also does not impose any buffer requirements. However, it has the disadvantage that it requires coordination of the VCI space within each VP. There are a number of ways that this can be accomplished. Selection of one or more methods is FFS.

This tradeoff between compatibility with existing equipment versus protocol complexity and scalability implies that it is desirable for the MPLS protocol to support both VP merge and VC merge. In order to do so, each ATM switch participating in MPLS needs to know whether its immediate ATM neighbors perform VP merge, VC merge, or no merge.

2.18.4.2. Interoperation: VC Merge, VP Merge, and Non-Merge

The interoperation of the various forms of merging over ATM is most easily described by first describing the interoperation of VC merge with non-merge.

In the case where VC merge and non-merge nodes are interconnected, the forwarding of cells is based in all cases on a VC (i.e., the concatenation of the VPI and VCI). For each node, if an upstream neighbor is doing VC merge, then that upstream neighbor requires only a single VPI/VCI for a particular stream (this is analogous to the requirement for a single label in the case of operation over frame media).
If the upstream neighbor is not doing merge, then the neighbor will require a single VPI/VCI per stream for itself, plus enough VPI/VCIs to pass to its upstream neighbors. The number required will be determined by allowing the upstream nodes to request additional VPI/VCIs from their downstream neighbors (this is again analogous to the method used with frame merge).

A similar method is possible to support nodes which perform VP merge. In this case the VP merge node, rather than requesting a single VPI/VCI or a number of VPI/VCIs from its downstream neighbor, may instead request a single VP (identified by a VPI) but several VCIs within the VP. Furthermore, suppose that a non-merge node is downstream from two different VP merge nodes.
This node may need to request one VPI/VCI (for traffic originating from itself) plus two VPs (one for each upstream node), each associated with a specified set of VCIs (as requested from the upstream node).

In order to support all of VP merge, VC merge, and non-merge, it is therefore necessary to allow upstream nodes to request a combination of zero or more VC identifiers (each consisting of a VPI/VCI), plus zero or more VPs (identified by VPIs), each containing a specified number of VCs (identified by a set of VCIs which are significant within a VP). VP merge nodes would therefore request one VP, with a contained VCI for traffic that it originates (if appropriate) plus a VCI for each VC requested from above (regardless of whether or not the VC is part of a containing VP).
A VC merge node would request only a single VPI/VCI (since it can merge all upstream traffic into a single VC). Non-merge nodes would pass on any requests that they get from above, plus request a VPI/VCI for traffic that they originate (if appropriate).

2.19. LSP Control: Egress versus Local

There is a choice to be made regarding whether the initial setup of LSPs will be initiated by the egress node, or locally by each individual node.

When LSP control is done locally, each node may at any time pass label bindings to its neighbors for each FEC recognized by that node. In the normal case that the neighboring nodes recognize the same FECs, nodes may map incoming labels to outgoing labels as part of the normal label swapping forwarding method.
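Local control is shown here with a single LSR that keeps its bindings in Python dictionaries. This is a minimal sketch. The class name, the method names, and the arbitrary starting label value are all hypothetical. The sketch makes two points. Under local control a node can advertise a label upstream before it has learned a downstream label. Once both labels are known, forwarding reduces to an incoming-to-outgoing label swap.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional


@dataclass
class LocalControlLSR:
    """Hypothetical LSR using local (independent) LSP control."""

    name: str
    # Arbitrary label allocator; the values carry no special meaning.
    _labels: count = field(default_factory=lambda: count(100), init=False, repr=False)
    fec_to_in: Dict[str, int] = field(default_factory=dict)   # label advertised upstream
    in_to_fec: Dict[int, str] = field(default_factory=dict)   # reverse map for lookups
    fec_to_out: Dict[str, int] = field(default_factory=dict)  # label learned from downstream

    def bind(self, fec: str) -> int:
        """Bind (or reuse) an incoming label for fec without waiting for downstream."""
        if fec not in self.fec_to_in:
            label = next(self._labels)
            self.fec_to_in[fec] = label
            self.in_to_fec[label] = fec
        return self.fec_to_in[fec]

    def learn_downstream(self, fec: str, label: int) -> None:
        """Record the label the downstream neighbor bound to the same FEC."""
        self.fec_to_out[fec] = label

    def swap(self, in_label: int) -> Optional[int]:
        """Outgoing label for in_label, or None if no downstream binding exists yet."""
        fec = self.in_to_fec[in_label]  # KeyError for a label this LSR never bound
        return self.fec_to_out.get(fec)


r2 = LocalControlLSR("R2")
lbl = r2.bind("10.2.0.0/16")           # advertised upstream immediately
print(r2.swap(lbl))                    # None: nothing learned from downstream yet
r2.learn_downstream("10.2.0.0/16", 7)
print(r2.swap(lbl))                    # 7
```

A `None` result marks a packet for which no outgoing label is known yet. The sketch does not decide how such a packet is handled. Under egress control, by contrast, a non-egress node would call `bind` for a FEC only after `learn_downstream` had supplied a label for it.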
When LSP control is done by the egress, then initially only the egress node passes label bindings to its neighbors, corresponding to any FECs which leave the MPLS network at that egress node. Other nodes wait until they get a label from downstream for a particular FEC before passing a corresponding label for the same FEC to upstream nodes.

With local control, since each LSR is (at least initially) independently assigning labels to FECs, it is possible that different LSRs may make inconsistent decisions. For example, an upstream LSR may make a coarse decision (map multiple IP address prefixes to a single label) while its downstream neighbor makes a finer grain decision (map each individual IP address prefix to a separate label). With downstream label assignment this can be corrected by having LSRs withdraw labels that they have assigned which are inconsistent with downstream labels, and replace them with new consistent label assignments.

Even with egress control it is possible that the choice of egress node may change, or the egress may (based on a change in configuration) change its mind in terms of the granularity which is to be used. This implies the same mechanism will be necessary to allow changes in granularity to bubble up to upstream nodes. The choice of egress or local control may therefore affect the frequency with which this mechanism is used, but will not affect the need for a mechanism to achieve consistency of label granularity. Generally speaking, the choice of local versus egress control does not appear to have any effect on the LDP mechanisms which need to be defined.

Egress control and local control can interwork in a very straightforward manner (although when both methods exist in the network, the overall behavior of the network is largely that of local control). With either approach (assuming downstream label assignment), the egress node will initially assign labels for particular FECs and will pass these labels to its neighbors. With either approach these label assignments will bubble upstream, with the upstream nodes choosing labels that are consistent with the labels that they receive from downstream. The difference between the two approaches is therefore primarily an issue of what each node does prior to obtaining a label assignment for a particular FEC from downstream nodes: does it wait, or does it assign a preliminary label under the expectation that it will (probably) be correct?

Regardless of which method is used (local control or egress control), each node needs to know (possibly by configuration) what granularity to use for labels that it assigns. Where egress control is used, this requires each node to know the granularity only for streams which leave the MPLS network at that node. For local control, in order to avoid the need to withdraw inconsistent labels, each node in the network would need to be configured consistently to know the granularity for each stream. However, in many cases this may be done by using a single level of granularity which applies to all streams (such as "one label per IP prefix in the forwarding table").

This architecture allows the choice between local control and egress control to be a local matter. Since the two methods interwork, a given LSR needs to support only one or the other.

2.20. Granularity

When forwarding by label swapping, the packets of a stream arriving from upstream may be mapped into an equal or coarser grain stream.
However, a coarse grain stream (for example, containing packets destined for a short IP address prefix covering many subnets) cannot be mapped directly into a finer grain stream (for example, containing packets destined for a longer IP address prefix covering a single subnet). This implies that there needs to be some mechanism for ensuring consistency between the granularity of LSPs in an MPLS network.

The method used for ensuring compatibility of granularity may depend upon the method used for LSP control.

When LSP control is local, it is possible that a node may pass a coarse grain label to its upstream neighbor(s), and subsequently receive a finer grain label from its downstream neighbor. In this case the node has two options:

   (i) It may forward the corresponding packets using normal IP datagram forwarding (i.e., by examination of the IP header);

   (ii) It may withdraw the label mappings that it has passed to its upstream neighbors, and replace these with finer grain label mappings.

When LSP control is egress based, the label setup originates from the egress node and passes upstream. It is therefore straightforward with this approach to maintain equally-grained mappings along the route.
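When FECs are IP address prefixes, the mapping rule above can be checked mechanically. This is a minimal sketch that assumes FECs are written as prefixes. It uses the standard Python `ipaddress` module, whose `subnet_of` method requires Python 3.7 or later. The function name `can_map` and the example prefixes are illustrative. A stream may be mapped onto a downstream label only if the downstream prefix equals or contains the upstream prefix.

```python
import ipaddress


def can_map(upstream_fec: str, downstream_fec: str) -> bool:
    """True if the upstream stream may be label swapped onto the downstream stream.

    Mapping is allowed into an equal or coarser grain stream only, i.e. the
    downstream prefix must equal or contain the upstream prefix.
    """
    up = ipaddress.ip_network(upstream_fec)
    down = ipaddress.ip_network(downstream_fec)
    return up.subnet_of(down)


# A node under local control advertised a coarse label for 10.2.0.0/16 ...
advertised = "10.2.0.0/16"
# ... and later receives only finer grain labels from downstream.
downstream = ["10.2.152.0/22", "10.2.156.0/22"]

print(can_map("10.2.152.0/22", "10.2.0.0/16"))          # True: fine into coarse
print(any(can_map(advertised, d) for d in downstream))  # False
```

The last check returns False. That is exactly the local-control situation described above, and the node must fall back to option (i) or option (ii).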
2.21. Tunnels and Hierarchy

Sometimes a router Ru takes explicit action to cause a particular packet to be delivered to another router Rd, even though Ru and Rd are not consecutive routers on the Hop-by-hop path for that packet, and Rd is not the packet's ultimate destination. For example, this may be done by encapsulating the packet inside a network layer packet whose destination address is the address of Rd itself. This creates a "tunnel" from Ru to Rd. We refer to any packet so handled as a "Tunneled Packet".

2.21.1. Hop-by-Hop Routed Tunnel

If a Tunneled Packet follows the Hop-by-hop path from Ru to Rd, we say that it is in a "Hop-by-Hop Routed Tunnel" whose "transmit endpoint" is Ru and whose "receive endpoint" is Rd.

2.21.2. Explicitly Routed Tunnel

If a Tunneled Packet travels from Ru to Rd over a path other than the Hop-by-hop path, we say that it is in an "Explicitly Routed Tunnel" whose "transmit endpoint" is Ru and whose "receive endpoint" is Rd. For example, we might send a packet through an Explicitly Routed Tunnel by encapsulating it in a packet which is source routed.

2.21.3. LSP Tunnels

It is possible to implement a tunnel as an LSP, and use label switching rather than network layer encapsulation to cause the packet to travel through the tunnel. The tunnel would be an LSP <R1, ..., Rn>, where R1 is the transmit endpoint of the tunnel, and Rn is the receive endpoint of the tunnel. This is called an "LSP Tunnel".

The set of packets which are to be sent through the LSP tunnel becomes a stream, and each LSR in the tunnel must assign a label to that stream (i.e., must assign a label to the tunnel). The criteria for assigning a particular packet to an LSP tunnel are a local matter at the tunnel's transmit endpoint.
To put a packet into an LSP tunnel, the transmit endpoint pushes a label for the tunnel onto the label stack and sends the labeled packet to the next hop in the tunnel. If it is not necessary for the tunnel's receive endpoint to be able to determine which packets it receives through the tunnel, then, as discussed earlier, the label stack may be popped at the penultimate LSR in the tunnel.

A "Hop-by-Hop Routed LSP Tunnel" is a Tunnel that is implemented as a hop-by-hop routed LSP between the transmit endpoint and the receive endpoint.

An "Explicitly Routed LSP Tunnel" is an LSP Tunnel that is also an Explicitly Routed LSP.

2.21.4. Hierarchy: LSP Tunnels within LSPs

Consider an LSP <R1, R2, R3, R4>. Let us suppose that R1 receives unlabeled packet P, and pushes on its label stack the label to cause it to follow this path, and that this is in fact the Hop-by-hop path. However, let us further suppose that R2 and R3 are not directly connected, but are "neighbors" by virtue of being the endpoints of an LSP tunnel.
So the actual sequence of LSRs traversed by P is <R1, R2, R21, R22, R23, R3, R4>.

When P travels from R1 to R2, it will have a label stack of depth 1. R2, switching on the label, determines that P must enter the tunnel. R2 first replaces the incoming label with a label that is meaningful to R3. Then it pushes on a new label. This level 2 label has a value which is meaningful to R21. Switching is done on the level 2 label by R21, R22, R23. R23, which is the penultimate hop in the R2-R3 tunnel, pops the label stack before forwarding the packet to R3. When R3 sees packet P, P has only a level 1 label, having now exited the tunnel. Since R3 is the penultimate hop in P's level 1 LSP, it pops the label stack, and R4 receives P unlabeled.

The label stack mechanism allows LSP tunneling to nest to any depth.

2.21.5. LDP Peering and Hierarchy

Suppose that packet P travels along a Level 1 LSP <R1, R2, R3, R4>, and when going from R2 to R3 travels along a Level 2 LSP <R2, R21, R22, R3>. From the perspective of the Level 2 LSP, R2's LDP peer is R21. From the perspective of the Level 1 LSP, R2's LDP peers are R1 and R3. One can have LDP peers at each layer of hierarchy. We will see in sections 3.6 and 3.7 some ways to make use of this hierarchy.

Note that in this example, R2 and R21 must be IGP neighbors, but R2 and R3 need not be. When two LSRs are IGP neighbors, we will refer to them as "Local LDP Peers". When two LSRs may be LDP peers, but are not IGP neighbors, we will refer to them as "Remote LDP Peers". In the above example, R2 and R21 are local LDP peers, but R2 and R3 are remote LDP peers.
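A short Python sketch can reproduce the classification of LDP peers in this example. It assumes, hypothetically, that the IGP adjacencies are exactly the consecutive routers on the physical path <R1, R2, R21, R22, R3, R4>. The function names are also illustrative. At a given level, an LSR's LDP peers are its neighbors along the LSP at that level. A peer counts as local if it is also an IGP neighbor.

```python
from typing import FrozenSet, List, Set


def ldp_peers(lsp: List[str], node: str) -> List[str]:
    """LDP peers of node at one level: its predecessor and successor on that LSP."""
    i = lsp.index(node)
    peers = []
    if i > 0:
        peers.append(lsp[i - 1])
    if i < len(lsp) - 1:
        peers.append(lsp[i + 1])
    return peers


def classify(a: str, b: str, igp_links: Set[FrozenSet[str]]) -> str:
    """'local' if a and b are IGP neighbors, otherwise 'remote'."""
    return "local" if frozenset((a, b)) in igp_links else "remote"


level1 = ["R1", "R2", "R3", "R4"]
level2 = ["R2", "R21", "R22", "R3"]
# Assumed physical topology: IGP neighbors are consecutive routers on this path.
physical_path = ["R1", "R2", "R21", "R22", "R3", "R4"]
igp_links = {frozenset(pair) for pair in zip(physical_path, physical_path[1:])}

for level, lsp in (("Level 1", level1), ("Level 2", level2)):
    for peer in ldp_peers(lsp, "R2"):
        print(level, "R2 <->", peer, classify("R2", peer, igp_links))
# Level 1 R2 <-> R1 local
# Level 1 R2 <-> R3 remote
# Level 2 R2 <-> R21 local
```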
The MPLS architecture supports two ways to distribute labels at different layers of the hierarchy: Explicit Peering and Implicit Peering.

One performs label distribution with one's Local LDP Peers by opening LDP connections to them. One can perform label distribution with one's Remote LDP Peers in one of two ways:

   1. Explicit Peering

      In explicit peering, one sets up LDP connections between Remote LDP Peers, exactly as one would do for Local LDP Peers. This technique is most useful when the number of Remote LDP Peers is small, or the number of higher level label mappings is large, or the Remote LDP Peers are in distinct routing areas or domains. Of course, one needs to know which labels to distribute to which peers; this is addressed in section 3.1.2.

      Examples of the use of explicit peering are found in sections 3.2.1 and 3.6.

   2. Implicit Peering

      In Implicit Peering, one does not have LDP connections to one's remote LDP peers, but only to one's local LDP peers.
To distribute higher level labels to ones remote LDP peers, one encodes thecorresponding address prefix, then insteadhigher level labels as an attribute ofreplacingthevalue oflower level labels, and distributes thelabel on top oflower level label, along with this attribute, to thelabel stack, Ru pops the label stack, andlocal LDP peers. The local LDP peers thenforwards the resulting packet to Rd. LSR Rd distributes a mapping between the POP label and an address prefix X to LSR Ru if and only if: 1.propagate therules of Section 3.1.2 indicate that Rd distributesinformation toRu a label mapping for X, and 2. whentheir peers. This process continues till the information reaches remote LDPconnection between Ru and Rd was opened, Ru indicatedpeers. Note thatit could support the POP label, and 3. Rd is an LSP Egress (not proxy egress) for X. This causes the penultimate LSR on a LSP to popthelabel stack.intermediary nodes may also be remote LDP peers. This technique isquite appropriate; ifmost useful when theLSP Egressnumber of Remote LDP Peers isan MPLS Egress for X, then if the penultimate LSRlarge. Implicit peering does notpop the label stack, the LSP Egress will needrequire a n-square peering mesh tolook up the label, pop the label stack, and then look up the next label (or look up the L3 address, if no moredistribute labelsare present). By having the penultimate LSR popto thelabel stack,remote LDP peers because theLSP Egressinformation issavedpiggybacked through thework of havinglocal LDP peering. However, implicit peering requires the intermediate nodes tolook up two labelsstore information that they might not be directly interested in. An example of the use of implicit peering is found inorder to make its forwarding decision.section 3.3. Rosen, Viswanathan & Callon [Page39]35] Internet Draftdraft-ietf-mpls-arch-00.txt August 1997 However, if the penultimate LSRdraft-ietf-mpls-arch-01.txt March 1998 2.22. 
2.22. LDP Transport

LDP is used between nodes in an MPLS network to establish and maintain the label mappings. In order for LDP to operate correctly, LDP information needs to be transmitted reliably, and the LDP messages pertaining to a particular FEC need to be transmitted in sequence. Flow control is also required, as is the capability to carry multiple LDP messages in a single datagram.

These goals will be met by using TCP as the underlying transport for LDP.

(The use of multicast techniques to distribute label mappings is FFS.)

2.23. Label Encodings

In order to transmit a label stack along with the packet whose label stack it is, it is necessary to define a concrete encoding of the label stack.

The architecture supports several different encoding techniques; the choice of encoding technique depends on the particular kind of device being used to forward labeled packets.
2.23.1. MPLS-specific Hardware and/or Software

If one is using MPLS-specific hardware and/or software to forward labeled packets, the most obvious way to encode the label stack is to define a new protocol to be used as a "shim" between the data link layer and network layer headers. This shim would really be just an encapsulation of the network layer packet; it would be "protocol-independent" such that it could be used to encapsulate any network layer. Hence we will refer to it as the "generic MPLS encapsulation".

The generic MPLS encapsulation would in turn be encapsulated in a data link layer protocol.

The generic MPLS encapsulation should contain the following fields:

1. the label stack,
2. a Time-to-Live (TTL) field, and
3. a Class of Service (CoS) field.

The TTL field permits MPLS to provide a TTL function similar to what is provided by IP.

The CoS field permits LSRs to apply various scheduling packet disciplines to labeled packets, without requiring separate labels for separate disciplines.
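The draft leaves the field widths of the generic MPLS encapsulation open. As an illustration only, the Python sketch below borrows the 32-bit label stack entry that was later standardized in RFC 3032. That entry has a 20-bit label, a 3-bit field (used here for CoS), a bottom-of-stack bit, and an 8-bit TTL. The names `encode_stack` and `decode_stack` are hypothetical helpers. The sketch shows how several entries can precede the network layer packet, and how a receiver finds where that packet begins.

```python
import struct

# Assumption: the draft does not fix field widths. This sketch borrows the 32-bit
# entry later standardized in RFC 3032: label (20 bits) | CoS (3) | S (1) | TTL (8),
# where S marks the bottom of the stack.


def encode_stack(entries):
    """Encode [(label, cos, ttl), ...], top of stack first, into shim bytes."""
    out = bytearray()
    for i, (label, cos, ttl) in enumerate(entries):
        if not (0 <= label < 1 << 20 and 0 <= cos < 8 and 0 <= ttl < 256):
            raise ValueError("field out of range")
        bottom = 1 if i == len(entries) - 1 else 0
        word = (label << 12) | (cos << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", word)
    return bytes(out)


def decode_stack(data):
    """Decode entries until the bottom-of-stack bit; return (entries, payload_offset)."""
    entries, offset = [], 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entries.append((word >> 12, (word >> 9) & 0x7, word & 0xFF))
        if word & 0x100:
            return entries, offset  # the network layer packet starts here


shim = encode_stack([(17, 0, 64), (1000, 5, 255)])
print(shim.hex())                          # 00011040003e8bff
print(decode_stack(shim + b"IP packet"))   # ([(17, 0, 64), (1000, 5, 255)], 8)
```

Because the bottom-of-stack bit marks the last entry, the receiver needs no separate length field to find the start of the encapsulated packet.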
2.23.2. ATM Switches as LSRs

It will be noted that MPLS forwarding procedures are similar to those of legacy "label swapping" switches such as ATM switches. ATM switches use the input port and the incoming VPI/VCI value as the index into a "cross-connect" table, from which they obtain an output port and an outgoing VPI/VCI value. Therefore if one or more labels can be encoded directly into the fields which are accessed by these legacy switches, then the legacy switches can, with suitable software upgrades, be used as LSRs. We will refer to such devices as "ATM-LSRs".
There are three obvious ways to encode labels in the ATM cell header (presuming the use of AAL5):

1. SVC Encoding

   Use the VPI/VCI field to encode the label which is at the top of the label stack. This technique can be used in any network. With this encoding technique, each LSP is realized as an ATM SVC, and the LDP becomes the ATM "signaling" protocol. With this encoding technique, the ATM-LSRs cannot perform "push" or "pop" operations on the label stack.

2. SVP Encoding

   Use the VPI field to encode the label which is at the top of the label stack, and the VCI field to encode the second label on the stack, if one is present. This technique has some advantages over the previous one, in that it permits the use of ATM "VP-switching". That is, the LSPs are realized as ATM SVPs, with LDP serving as the ATM signaling protocol.

   However, this technique cannot always be used. If the network includes an ATM Virtual Path through a non-MPLS ATM network, then the VPI field is not necessarily available for use by MPLS.
   When this encoding technique is used, the ATM-LSR at the egress of the VP effectively does a "pop" operation.

3. SVP Multipoint Encoding

   Use the VPI field to encode the label which is at the top of the label stack, use part of the VCI field to encode the second label on the stack, if one is present, and use the remainder of the VCI field to identify the LSP ingress. If this technique is used, conventional ATM VP-switching capabilities can be used to provide multipoint-to-point VPs. Cells from different packets will then carry different VCI values, so multipoint-to-point VPs can be provided without any cell interleaving problems.

   This technique depends on the existence of a capability for assigning small unique values to each ATM switch.

If there are more labels on the stack than can be encoded in the ATM header, the ATM encodings must be combined with the generic encapsulation. This does presuppose that it be possible to tell, when reassembling the ATM cells into packets, whether the generic encapsulation is also present.
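The three ATM encodings can be compared with a small Python sketch. The field widths follow the ATM cell header: a 12-bit VPI at the NNI and a 16-bit VCI. Everything else is a hypothetical choice made for illustration and is not specified by this architecture. That includes the function name `encode_atm`, the method strings, and the 6-bit ingress sub-field used for SVP Multipoint Encoding.

```python
VPI_BITS = 12  # NNI cell header; a UNI cell header has only 8 VPI bits
VCI_BITS = 16


def encode_atm(stack, method, ingress_id=0, ingress_bits=6):
    """Place labels from `stack` (top of stack first) into an ATM (VPI, VCI).

    Returns (vpi, vci, leftover); leftover labels would have to be carried in
    the generic MPLS encapsulation. `ingress_bits`, the share of the VCI that
    identifies the LSP ingress under SVP Multipoint, is a hypothetical choice.
    """
    top = stack[0]
    second = stack[1] if len(stack) > 1 else 0  # 0 is only a placeholder
    if method == "svc":
        # SVC: the top label spans the whole VPI/VCI field.
        if top >= 1 << (VPI_BITS + VCI_BITS):
            raise ValueError("label too large for VPI/VCI")
        return top >> VCI_BITS, top & ((1 << VCI_BITS) - 1), stack[1:]
    if top >= 1 << VPI_BITS:
        raise ValueError("label too large for VPI")
    if method == "svp":
        # SVP: top label in the VPI, second label (if any) in the VCI.
        if second >= 1 << VCI_BITS:
            raise ValueError("label too large for VCI")
        return top, second, stack[2:]
    if method == "svp_multipoint":
        # SVP Multipoint: high VCI bits carry the second label, low bits the ingress.
        if second >= 1 << (VCI_BITS - ingress_bits) or ingress_id >= 1 << ingress_bits:
            raise ValueError("value too large for its VCI sub-field")
        return top, (second << ingress_bits) | ingress_id, stack[2:]
    raise ValueError(f"unknown method {method!r}")


print(encode_atm([0x1234567, 9], "svc"))                  # (291, 17767, [9])
print(encode_atm([5, 7, 9], "svp"))                        # (5, 7, [9])
print(encode_atm([5, 7], "svp_multipoint", ingress_id=3))  # (5, 451, [])
```

The non-empty `leftover` lists correspond to the case above in which the ATM encoding must be combined with the generic encapsulation. The `ingress_id` stands in for the "small unique value" assigned to each ATM switch.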
2.23.3. Interoperability among Encoding Techniques

If <R1, R2, R3> is a segment of an LSP, it is possible that R1 will use one encoding of the label stack when transmitting packet P to R2, but R2 will use a different encoding when transmitting packet P to R3. In general, the MPLS architecture supports LSPs with different label stack encodings used on different hops. Therefore, when we discuss the procedures for processing a labeled packet, we speak in abstract terms of operating on the packet's label stack. When a labeled packet is received, the LSR must decode it to determine the current value of the label stack, then must operate on the label stack to determine the new value of the stack, and then encode the new value appropriately before transmitting the labeled packet to its next hop.

Unfortunately, ATM switches have no capability for translating from one encoding technique to another. The MPLS architecture therefore requires that whenever it is possible for two ATM switches to be successive LSRs along a level m LSP for some packet, those two ATM switches use the same encoding technique.

Naturally there will be MPLS networks which contain a combination of ATM switches operating as LSRs, and other LSRs which operate using an MPLS shim header. In such networks there may be some LSRs which have ATM interfaces as well as "MPLS Shim" interfaces. This is one example of an LSR with different label stack encodings on different hops. Such an LSR may swap off an ATM encoded label stack on an incoming interface and replace it with an MPLS shim header encoded label stack on the outgoing interface.
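The decode, operate, encode sequence can be sketched for an LSR that receives an SVP-encoded cell on an ATM interface and transmits on a shim interface. This is a minimal Python model, and several of its details are assumptions made for illustration: the helper names, the incoming label map `ilm`, the fixed TTL, and the reuse of RFC 3032's 32-bit entry layout for the shim. It also assumes that the incoming cell carries two labels, one in the VPI and one in the VCI.

```python
import struct


def decode_svp(vpi, vci):
    """Decode an SVP-encoded cell header into an abstract stack, top label first."""
    return [vpi, vci]


def swap_top(stack, ilm):
    """Operate on the abstract stack: replace the top label via the incoming label map."""
    return [ilm[stack[0]]] + stack[1:]


def encode_shim(stack, ttl=64):
    """Encode the abstract stack as shim entries, assuming RFC 3032's 32-bit layout.

    CoS bits are left at zero and a fixed TTL is used, purely for brevity.
    """
    last = len(stack) - 1
    words = [(label << 12) | (int(i == last) << 8) | ttl for i, label in enumerate(stack)]
    return struct.pack(f"!{len(words)}I", *words)


def relay_atm_to_shim(vpi, vci, ilm):
    """Decode the incoming encoding, operate on the stack, re-encode for the next hop."""
    return encode_shim(swap_top(decode_svp(vpi, vci), ilm))


out = relay_atm_to_shim(42, 7, ilm={42: 300})
print([word >> 12 for word in struct.unpack("!2I", out)])  # [300, 7]
```

Only `swap_top` touches the abstract stack. The two encoding functions could be exchanged for any other pair without changing it, which is why the architecture describes label processing in abstract terms.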
Then Re iswith anLSP egress for all 10 address prefixes.MPLS shim header encoded label stack on the outgoing interface. 2.24. Multicast Thisensures that packetssection is forall 10 address prefixes get delivered to Re. However, Re would then havefurther study 3. Some Applications of MPLS 3.1. MPLS and Hop by Hop Routed Traffic One use of MPLS is tolook upsimplify thenetwork layer addressprocess ofeach suchforwarding packets using hop by hop routing. 3.1.1. Labels for Address Prefixes In general, router R determines the next hop for packet P by finding the address prefix X inorder to chooseits routing table which is theproper interface to sendlongest match for P's destination address. That is, the packets in a given stream are just those packets which match a given address prefix in R's routing table. In this case, a stream can be identified with an address prefix. If packeton. Alternatively, one could assignP must traverse adistinct label tosequence of routers, and at eachinterface. Then Re is an LSP proxy egress forrouter in the10sequence P matches the same addressprefixes. This eliminatesprefix, MPLS simplifies theneed for Reforwarding process by enabling all routers but the first to avoid executing the best match algorithm; they need only look up thenetwork layer addresses in orderlabel. 3.1.2. Distributing Labels for Address Prefixes 3.1.2.1. LDP Peers for a Particular Address Prefix LSRs R1 and R2 are considered toforwardbe LDP Peers for address prefix X if and only if one of thepackets. However,following conditions holds: 1. R1's route to X is a route which itcan result in the uselearned about via a particular instance of alarge numberparticular IGP, and R2 is a neighbor oflabels. 
3.1.2. Distributing Labels for Address Prefixes

3.1.2.1. LDP Peers for a Particular Address Prefix

LSRs R1 and R2 are considered to be LDP Peers for address prefix X if and only if one of the following conditions holds:

1. R1's route to X is a route which it learned about via a particular instance of a particular IGP, and R2 is a neighbor of R1 in that instance of that IGP
2. R1's route to X is a route which it learned about by some instance of routing algorithm A1, and that route is redistributed into an instance of routing algorithm A2, and R2 is a neighbor of R1 in that instance of A2
3. R1 is the receive endpoint of an LSP Tunnel that is within another LSP, and R2 is a transmit endpoint of that tunnel, and R1 and R2 are participants in a common instance of an IGP, and are in the same IGP area (if the IGP in question has areas), and R1's route to X was learned via that IGP instance, or is redistributed by R1 into that IGP instance
4. R1's route to X is a route which it learned about via BGP, and R2 is a BGP peer of R1

In general, these rules ensure that if the route to a particular address prefix is distributed via an IGP, the LDP peers for that address prefix are the IGP neighbors. If the route to a particular address prefix is distributed via BGP, the LDP peers for that address prefix are the BGP peers. In other cases of LSP tunneling, the tunnel endpoints are LDP peers.
3.1.2.2. Distributing Labels

In order to use MPLS for the forwarding of normally routed traffic, each LSR MUST:

1. bind one or more labels to each address prefix that appears in its routing table;
2. for each such address prefix X, use an LDP to distribute the mapping of a label to X to each of its LDP Peers for X.

There is also one circumstance in which an LSR must distribute a label mapping for an address prefix, even if it is not the LSR which bound that label to that address prefix:

3. If R1 uses BGP to distribute a route to X, naming some other LSR R2 as the BGP Next Hop to X, and if R1 knows that R2 has assigned label L to X, then R1 must distribute the mapping between L and X to any BGP peer to which it distributes that route.

These rules ensure that labels corresponding to address prefixes which correspond to BGP routes are distributed to IGP neighbors if and only if the BGP routes are distributed into the IGP. Otherwise, the labels bound to BGP routes are distributed only to the other BGP speakers.

These rules are intended to indicate which label mappings must be distributed by a given LSR to which other LSRs, NOT to indicate the conditions under which the distribution is to be made. That is discussed in section 2.19.
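Rules 1 to 3 can be modelled in Python as message generation. In this sketch, `bind_and_distribute` and `readvertise_bgp` are illustrative names. Each prefix gets exactly one label, although the rules allow one or more. The set of LDP peers for each prefix is taken as an input rather than derived from section 3.1.2.1. The sketch says nothing about when the mappings are sent, which is consistent with the last paragraph above.

```python
from itertools import count


def bind_and_distribute(routing_table, ldp_peers, first_label=16):
    """Rules 1 and 2: bind a label to every prefix, then send the mapping to
    each LDP peer for that prefix.

    routing_table: prefixes in the LSR's routing table.
    ldp_peers: prefix -> set of LDP peer names (per section 3.1.2.1).
    Returns (bindings, messages); each message is (peer, prefix, label).
    """
    labels = count(first_label)
    bindings = {prefix: next(labels) for prefix in routing_table}
    messages = [
        (peer, prefix, label)
        for prefix, label in bindings.items()
        for peer in sorted(ldp_peers.get(prefix, ()))
    ]
    return bindings, messages


def readvertise_bgp(prefix, next_hop_label, bgp_peers):
    """Rule 3: when R1 advertises a BGP route to X naming R2 as BGP Next Hop,
    it passes on R2's label for X to every BGP peer that receives the route."""
    return [(peer, prefix, next_hop_label) for peer in sorted(bgp_peers)]


bindings, messages = bind_and_distribute(
    ["10.1/16", "10.2/16"], {"10.1/16": {"R2", "R3"}, "10.2/16": {"R2"}}
)
print(bindings)   # {'10.1/16': 16, '10.2/16': 17}
print(messages)   # [('R2', '10.1/16', 16), ('R3', '10.1/16', 16), ('R2', '10.2/16', 17)]
print(readvertise_bgp("192.0.2.0/24", 33, {"B3", "B2"}))
# [('B2', '192.0.2.0/24', 33), ('B3', '192.0.2.0/24', 33)]
```

In `readvertise_bgp`, the label passed on is the one bound by the BGP Next Hop and not one bound by R1. This is the single case in which an LSR distributes a mapping it did not create.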
3.1.3. Using the Hop by Hop path as the LSP

If the hop-by-hop path that packet P needs to follow is <R1, ..., Rn>, then <R1, ..., Rn> can be an LSP as long as:

1. there is a single address prefix X, such that, for all i, 1<=i<n, X is the longest match in Ri's routing table for P's destination address;
2. for all i, 1<i<n, Ri has assigned a label to X and distributed that label to R[i-1].

Note that a packet's LSP can extend only until it encounters a router whose forwarding tables have a longer best match address prefix for the packet's destination address. At that point, the LSP must end and the best match algorithm must be performed again.

Suppose, for example, that packet P, with destination address 10.2.153.178, needs to go from R1 to R2 to R3. Suppose also that R2 advertises address prefix 10.2/16 to R1, but R3 advertises 10.2.153/22, 10.2.154/22, and 10.2/16 to R2. That is, R2 is advertising an "aggregated route" to R1. In this situation, packet P can be label switched until it reaches R2, but since R2 has performed route aggregation, it must execute the best match algorithm to find P's stream.
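The example above can be checked with Python's standard `ipaddress` module. The sketch applies condition 1 along the path and stops at the first router whose best match is longer. The table contents mirror the example, and `lsp_segment` is a hypothetical helper name. Passing `strict=False` lets Python accept the prefixes as written in the example; it normalizes each /22 to its aligned block 10.2.152.0/22.

```python
import ipaddress

# Routing tables from the example above. strict=False accepts the prefixes as
# written there; Python normalizes each /22 to its aligned block 10.2.152.0/22.
TABLES = {
    "R1": ["10.2.0.0/16"],
    "R2": ["10.2.0.0/16", "10.2.153.0/22", "10.2.154.0/22"],
    "R3": ["10.2.0.0/16", "10.2.153.0/22", "10.2.154.0/22"],
}


def longest_match(router, dst):
    addr = ipaddress.ip_address(dst)
    nets = [ipaddress.ip_network(p, strict=False) for p in TABLES[router]]
    return max((n for n in nets if addr in n), key=lambda n: n.prefixlen)


def lsp_segment(path, dst):
    """Follow `path` while the first router's best match X stays the best match.

    Returns (X, routers the LSP can span). If the loop stops early, the last
    router listed is where the LSP ends and the best match must run again.
    """
    x = longest_match(path[0], dst)
    segment = [path[0]]
    for router in path[1:]:
        segment.append(router)
        if longest_match(router, dst) != x:
            break  # a longer best match here: the LSP must end
    return x, segment


print(lsp_segment(["R1", "R2", "R3"], "10.2.153.178"))
# (IPv4Network('10.2.0.0/16'), ['R1', 'R2'])
```

For a destination that matches only 10.2/16 everywhere, such as 10.2.1.1, the same call returns the whole path. Aggregation at R2 only ends the LSP for destinations covered by the more specific routes.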
3.1.4. LSP Egress and LSP Proxy Egress

An LSR R is considered to be an "LSP Egress" LSR for address prefix X if and only if one of the following conditions holds:

1. R has an address Y, such that X is the address prefix in R's routing table which is the longest match for Y, or
2. R contains in its routing tables one or more address prefixes Y such that X is a proper initial substring of Y, but R's "LSP previous hops" for X do not contain any such address prefixes Y; that is, R is a "deaggregation point" for address prefix X.

An LSR R1 is considered to be an "LSP Proxy Egress" LSR for address prefix X if and only if:

1. R1's next hop for X is R2, and R1 and R2 are not LDP Peers with respect to X (perhaps because R2 does not support MPLS), or
2. R1 has been configured to act as an LSP Proxy Egress for X.

The definition of LSP allows for the LSP Egress to be a node which does not support MPLS; in this case the penultimate node in the LSP is the Proxy Egress.
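The two LSP Egress conditions can be sketched in Python with the standard `ipaddress` module. Here, "X is a proper initial substring of Y" becomes "Y is a strict subnet of X". The sketch reads "R's LSP previous hops for X do not contain any such Y" as "none of the previous hops' routing tables contain such a Y". That reading, the function names, and the sample tables are assumptions made for illustration.

```python
import ipaddress


def net(prefix):
    return ipaddress.ip_network(prefix, strict=False)


def longest_match(table, addr):
    matches = [n for n in table if addr in n]
    return max(matches, key=lambda n: n.prefixlen) if matches else None


def more_specifics(table, x):
    """Prefixes Y in `table` of which X is a proper initial substring."""
    return {y for y in table if y != x and y.subnet_of(x)}


def is_lsp_egress(x, own_addresses, table, prev_hop_tables):
    """Test the two LSP Egress conditions above for an LSR R and prefix X.

    own_addresses: R's own addresses; table: prefixes in R's routing tables;
    prev_hop_tables: prefix lists held by R's LSP previous hops for X.
    """
    x, table = net(x), {net(p) for p in table}
    # Condition 1: X is R's longest match for one of R's own addresses.
    if any(longest_match(table, ipaddress.ip_address(a)) == x for a in own_addresses):
        return True
    # Condition 2: R holds more specifics of X that none of its previous hops hold,
    # i.e. R is a deaggregation point for X.
    return bool(more_specifics(table, x)) and not any(
        more_specifics({net(p) for p in t}, x) for t in prev_hop_tables
    )


r2_table = ["10.2.0.0/16", "10.2.152.0/22"]
print(is_lsp_egress("10.2.0.0/16", ["192.0.2.1"], r2_table, [["10.2.0.0/16"]]))  # True
print(is_lsp_egress("10.2.0.0/16", ["192.0.2.1"], r2_table, [r2_table]))         # False
```

In the first call the LSR has deaggregated 10.2/16, so it is an egress for that prefix. In the second call its previous hop already holds the more specific route, so it is not.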
3.1.5. The POP Label

The POP label is a label with special semantics which an LSR can bind to an address prefix. If LSR Ru, by consulting its ILM, sees that labeled packet P must be forwarded next to Rd, but that Rd has distributed a mapping of the POP label to the corresponding address prefix, then instead of replacing the value of the label on top of the label stack, Ru pops the label stack, and then forwards the resulting packet to Rd.

LSR Rd distributes a mapping between the POP label and an address prefix X to LSR Ru if and only if:

1. the rules of Section 3.1.2 indicate that Rd distributes to Ru a label mapping for X, and
2. when the LDP connection between Ru and Rd was opened, Ru indicated that it could support the POP label, and
3. Rd is an LSP Egress (not proxy egress) for X.
This causes the penultimate LSR on an LSP to pop the label stack. This is quite appropriate; if the LSP Egress is an MPLS Egress for X, then if the penultimate LSR does not pop the label stack, the LSP Egress will need to look up the label, pop the label stack, and then look up the next label (or look up the L3 address, if no more labels are present). By having the penultimate LSR pop the label stack, the LSP Egress is saved the work of having to look up two labels in order to make its forwarding decision.

However, if the penultimate LSR is an ATM switch, it may not have the capability to pop the label stack. Hence a POP label mapping may be distributed only to LSRs which can support that function.

If the penultimate LSR in an LSP for address prefix X is an LSP Proxy Egress, it acts just as if the LSP Egress had distributed the POP label for X.
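Ru's choice between swapping and popping can be sketched in Python, together with the lookup it saves the LSP Egress. `POP`, `ru_forward`, `egress_lookups`, `label_rd_distributes`, and the label values are hypothetical. The sketch models only the three distribution conditions and the lookup count described above.

```python
POP = "POP"  # hypothetical stand-in for the POP label value


def ru_forward(stack, ilm):
    """Ru's step for a labeled packet; `stack` lists labels top first.

    ilm maps an incoming label to (next hop, outgoing label). An outgoing label
    of POP means the next hop distributed the POP label for this prefix.
    """
    next_hop, out_label = ilm[stack[0]]
    if out_label == POP:
        return next_hop, stack[1:]             # pop instead of swapping
    return next_hop, [out_label] + stack[1:]   # ordinary label swap


def egress_lookups(stack, own_label):
    """Lookups the LSP Egress performs: if its own label is still on top, it must
    look that up and pop it, then look up the next label or the L3 address."""
    return 2 if stack and stack[0] == own_label else 1


def label_rd_distributes(own_label, rules_say_distribute, ru_supports_pop, rd_is_egress):
    """Rd's choice under the three conditions above (rd_is_egress excludes proxy egress)."""
    if not rules_say_distribute:
        return None
    return POP if ru_supports_pop and rd_is_egress else own_label


popped = ru_forward([21], {21: ("Rd", POP)})
swapped = ru_forward([21], {21: ("Rd", 35)})
print(popped, swapped)                                                # ('Rd', []) ('Rd', [35])
print(egress_lookups(popped[1], 35), egress_lookups(swapped[1], 35))  # 1 2
```

`label_rd_distributes` falls back to an ordinary label when Ru cannot pop. This reflects the restriction on ATM switches: the POP mapping is sent only to LSRs that can honor it.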
3.1.6. Option: Egress-Targeted Label Assignment

There are situations in which an LSP Ingress, Ri, knows that packets of several different streams must all follow the same LSP, terminating at, say, LSP Egress Re. In this case, proper routing can be achieved by using a single label for all such streams; it is not necessary to have a distinct label for each stream.

If (and only if) the following conditions hold:

1. the address of LSR Re is itself in the routing table as a "host route", and
2. there is some way for Ri to determine that Re is the LSP egress for all packets in a particular set of streams

then Ri may bind a single label to all FECs in the set. This is known as "Egress-Targeted Label Assignment."

How can LSR Ri determine that an LSR Re is the LSP Egress for all packets in a particular stream?
There are a couple of possible ways:

- If the network is running a link state routing algorithm, and all nodes in the area support MPLS, then the routing algorithm provides Ri with enough information to determine the routers through which packets in that stream must leave the routing domain or area.

- It is possible to use LDP to pass information about which address prefixes are "attached" to which egress LSRs. This method has the advantage of not depending on the presence of link state routing.

If egress-targeted label assignment is used, the number of labels that need to be supported throughout the network may be greatly reduced. This may be significant if one is using legacy switching hardware to do MPLS, and the switching hardware can support only a limited number of labels.
One possible approach would be to configure the network to use egress-targeted label assignment by default, but to configure particular LSRs to NOT use egress-targeted label assignment for one or more of the address prefixes for which it is an LSP egress. We impose the following rule:

- If a particular LSR is NOT an LSP Egress for some set of address prefixes, then it should assign labels to the address prefixes in the same way as is done by its LSP next hop for those address prefixes. That is, suppose Rd is Ru's LSP next hop for address prefixes X1 and X2. If Rd assigns the same label to X1 and X2, Ru should as well. If Rd assigns different labels to X1 and X2, then Ru should as well.

For example, suppose one wants to make egress-targeted label assignment the default, but to assign distinct labels to those address prefixes for which there are multiple possible LSP egresses (i.e., for those address prefixes which are multi-homed). One can configure all LSRs to use egress-targeted label assignment, and then configure a handful of LSRs to assign distinct labels to those address prefixes which are multi-homed. For a particular multi-homed address prefix X, one would only need to configure this in LSRs which are either LSP Egresses or LSP Proxy Egresses for X.
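The default-plus-override approach and the rule for non-egress LSRs can be modelled as below. This is a Python sketch: the function names, prefix strings, egress names, and label values are illustrative. The sketch takes each prefix's egress as given and does not model how an LSR learns it (by link state routing or by LDP, as described above).

```python
from itertools import count


def assign_labels(prefix_egress, egress_targeted=True, distinct=frozenset(), first_label=16):
    """Bind labels to prefixes under egress-targeted assignment (hypothetical model).

    prefix_egress: prefix -> its LSP Egress or Proxy Egress.
    With egress-targeted assignment, prefixes that share an egress share one
    label; prefixes listed in `distinct` (e.g. multi-homed ones) get their own.
    """
    labels = count(first_label)
    per_egress, bindings = {}, {}
    for prefix, egress in prefix_egress.items():
        if egress_targeted and prefix not in distinct:
            if egress not in per_egress:
                per_egress[egress] = next(labels)
            bindings[prefix] = per_egress[egress]
        else:
            bindings[prefix] = next(labels)
    return bindings


def follow_next_hop(next_hop_bindings, first_label=100):
    """Rule for an LSR that is NOT an LSP Egress: share a label between two
    prefixes exactly when its LSP next hop does."""
    labels = count(first_label)
    mine, bindings = {}, {}
    for prefix, downstream_label in next_hop_bindings.items():
        if downstream_label not in mine:
            mine[downstream_label] = next(labels)
        bindings[prefix] = mine[downstream_label]
    return bindings


egress = {"10.1/16": "Re", "10.2/16": "Re", "10.3/16": "Re", "172.16/12": "Rf"}
print(assign_labels(egress))
# {'10.1/16': 16, '10.2/16': 16, '10.3/16': 16, '172.16/12': 17}
print(assign_labels(egress, distinct={"10.3/16"}))
# {'10.1/16': 16, '10.2/16': 16, '10.3/16': 17, '172.16/12': 18}
print(follow_next_hop({"X1": 16, "X2": 16, "X3": 17}))
# {'X1': 100, 'X2': 100, 'X3': 101}
```

With two egresses, four prefixes need only two labels. Listing `10.3/16` in `distinct` plays the role of the per-LSR configuration for a multi-homed prefix.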
It is important to note that if Ru and Rd are adjacent LSRs in an LSP for X1 and X2, forwarding will still be done correctly if Ru assigns distinct labels to X1 and X2 while Rd assigns just one label to both of them. This just means that Ru will map different incoming labels to the same outgoing label, an ordinary occurrence. Similarly, if Rd assigns distinct labels to X1 and X2, but Ru assigns to them both the label corresponding to the address of their LSP Egress or Proxy Egress, forwarding will still be done correctly. Ru will just map the incoming label to the label which Rd has assigned to the address of that LSP Egress.

3.2. MPLS and Explicitly Routed LSPs

There are a number of reasons why it may be desirable to use explicit routing instead of hop by hop routing. For example, this allows routes to be based on administrative policies, and allows the routes that LSPs take to be carefully designed to allow traffic engineering (i.e., to allow intentional management of the loading of the bandwidth through the nodes and links in the network).

3.2.1. Explicitly Routed LSP Tunnels: Traffic Engineering

In some situations, the network administrators may desire to forward certain classes of traffic along certain pre-specified paths, where these paths differ from the Hop-by-hop path that the traffic would ordinarily follow. This is known as Traffic Engineering.
3.2. MPLS and Explicitly Routed LSPs

There are a number of reasons why it may be desirable to use explicit routing instead of hop-by-hop routing. For example, this allows routes to be based on administrative policies, and allows the routes that LSPs take to be carefully designed to allow traffic engineering (i.e., to allow intentional management of the loading of the bandwidth through the nodes and links in the network).

3.2.1. Explicitly Routed LSP Tunnels: Traffic Engineering

In some situations, the network administrators may desire to forward certain classes of traffic along certain pre-specified paths, where these paths differ from the Hop-by-hop path that the traffic would ordinarily follow. This is known as Traffic Engineering.

MPLS allows this to be easily done by means of Explicitly Routed LSP Tunnels. All that is needed is:

1. A means of selecting the packets that are to be sent into the Explicitly Routed LSP Tunnel;

2. A means of setting up the Explicitly Routed LSP Tunnel;

3. A means of ensuring that packets sent into the Tunnel will not loop from the receive endpoint back to the transmit endpoint.

If the transmit endpoint of the tunnel wishes to put a labeled packet into the tunnel, it must first replace the label value at the top of the stack with a label value that was distributed to it by the tunnel's receive endpoint. Then it must push on the label which corresponds to the tunnel itself, as distributed to it by the next hop along the tunnel. To allow this, the tunnel endpoints should be explicit LDP peers. The label mappings they need to exchange are of no interest to the LSRs along the tunnel.
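The following is a minimal Python sketch of the transmit endpoint's two label operations. It assumes that a label stack is a list whose last element is the top of stack. The function `enter_tunnel` and its parameter names are hypothetical. The label that the receive endpoint distributed over the explicit LDP peering is simply passed in as an argument.

```python
from typing import List

def enter_tunnel(stack: List[int], receive_endpoint_label: int,
                 tunnel_label: int) -> List[int]:
    """Put an already labeled packet into an LSP tunnel.

    The top of the label stack is the LAST list element.
    """
    if not stack:
        raise ValueError("this sketch only handles packets that are already labeled")
    inner = stack[:-1] + [receive_endpoint_label]  # replace the top label
    return inner + [tunnel_label]                  # push the tunnel's own label

print(enter_tunnel([7, 30], 77, 900))  # [7, 77, 900]
```

Label 30 in this example is never seen by the receive endpoint. Only the value 77, which the receive endpoint itself distributed, is exposed to it once the tunnel label is popped. This is why the two endpoints must peer directly.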
3.3. Label Stacks and Implicit Peering

Suppose a particular LSR Re is an LSP proxy egress for 10 address prefixes, and it reaches each address prefix through a distinct interface.

One could assign a single label to all 10 address prefixes. Then Re is an LSP egress for all 10 address prefixes. This ensures that packets for all 10 address prefixes get delivered to Re. However, Re would then have to look up the network layer address of each such packet in order to choose the proper interface to send the packet on.

Alternatively, one could assign a distinct label to each interface. Then Re is an LSP proxy egress for the 10 address prefixes. This eliminates the need for Re to look up the network layer addresses in order to forward the packets. However, it can result in the use of a large number of labels.

An alternative would be to bind all 10 address prefixes to the same level 1 label (which is also bound to the address of the LSR itself), and then to bind each address prefix to a distinct level 2 label. The level 2 label would be treated as an attribute of the level 1 label mapping, which we call the "Stack Attribute".

We impose the following rules. Suppose that Ru's LSP next hop for address prefix X is Rd, and that Rd has distributed to Ru a mapping of label L1 to X, along with a stack attribute of L2. Then:

1. When Ru initially labels an unlabeled packet whose destination address has X as its longest match, Ru must push L2 and then L1 onto the packet's label stack, and then forward the packet to Rd;

2. When Ru distributes label mappings for X to its LDP peers, it must include L2 as the stack attribute;

3. Whenever the stack attribute changes (possibly as a result of a change in Ru's LSP next hop for X), Ru must distribute the new stack attribute.

Note that although the label value bound to X may be different at each hop along the LSP, the stack attribute value is passed unchanged, and is set by the LSP proxy egress.

Thus the LSP proxy egress for X becomes an "implicit peer" with each other LSR in the routing area or domain. In this case, explicit peering would be too unwieldy, because the number of peers would become too large.
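The three rules can be sketched in Python as follows. The `LabelMapping` class and the function names are hypothetical. Label stacks are lists whose last element is the top of stack, and the label values in the example lines are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class LabelMapping:
    label: int                 # level 1 label bound to X (may differ at each hop)
    stack_attr: Optional[int]  # level 2 label chosen by the LSP proxy egress

def label_packet(from_next_hop: LabelMapping) -> List[int]:
    """Rule 1: push L2, then L1. The top of stack is the LAST element."""
    stack: List[int] = []
    if from_next_hop.stack_attr is not None:
        stack.append(from_next_hop.stack_attr)  # push L2 first
    stack.append(from_next_hop.label)           # then L1 on top
    return stack

def mapping_to_advertise(from_next_hop: LabelMapping, own_label: int) -> LabelMapping:
    """Rule 2: advertise Ru's own label for X, stack attribute unchanged."""
    return LabelMapping(label=own_label, stack_attr=from_next_hop.stack_attr)

def must_redistribute(old: LabelMapping, new: LabelMapping) -> bool:
    """Rule 3: a changed stack attribute (e.g. after a next-hop change) is re-sent."""
    return old.stack_attr != new.stack_attr

m = LabelMapping(label=21, stack_attr=305)
print(label_packet(m))              # [305, 21]
print(mapping_to_advertise(m, 64))  # LabelMapping(label=64, stack_attr=305)
```

Only the level 1 label is rewritten hop by hop. Value 305 travels unchanged from the proxy egress to every ingress, and this is what makes the proxy egress an implicit peer of those ingresses.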
3.4. MPLS and Multi-Path Routing

If an LSR supports multiple routes for a particular stream, then it may assign multiple labels to the stream, one for each route. Thus the reception of a second label mapping from a particular neighbor for a particular address prefix should be taken as meaning that either label can be used to represent that address prefix.

If multiple label mappings for a particular address prefix are specified, they may have distinct attributes.
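One way to hold such mappings is sketched below in Python. It uses a table keyed by (neighbor, address prefix) in which a second mapping adds a label instead of overwriting the first, and each label keeps its own attributes. The class name, the method names and the attribute keys are hypothetical.

```python
from collections import defaultdict

class MultipathLabelTable:
    """Label mappings learned from neighbors, one or more per address prefix."""

    def __init__(self):
        # (neighbor, prefix) -> {label: attributes of that mapping}
        self._table = defaultdict(dict)

    def receive_mapping(self, neighbor, prefix, label, attributes=None):
        # A second mapping for the same prefix is added, not substituted.
        self._table[(neighbor, prefix)][label] = dict(attributes or {})

    def labels_for(self, neighbor, prefix):
        # Any of these labels may be used to represent the prefix.
        return sorted(self._table.get((neighbor, prefix), {}))

    def attributes(self, neighbor, prefix, label):
        return self._table[(neighbor, prefix)][label]

t = MultipathLabelTable()
t.receive_mapping("Rd", "10.0.0.0/8", 30, {"route": "via-A"})
t.receive_mapping("Rd", "10.0.0.0/8", 31, {"route": "via-B"})
print(t.labels_for("Rd", "10.0.0.0/8"))  # [30, 31]
```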
3.5. LSP Trees as Multipoint-to-Point Entities

Consider the case of packets P1 and P2, each of which has a destination address whose longest match, throughout a particular routing domain, is address prefix X. Suppose that the Hop-by-hop path for P1 is <R1, R2, R3>, and the Hop-by-hop path for P2 is <R4, R2, R3>. Let's suppose that R3 binds label L3 to X, and distributes this mapping to R2. R2 binds label L2 to X, and distributes this mapping to both R1 and R4. When R2 receives packet P1, its incoming label will be L2. R2 will overwrite L2 with L3, and send P1 to R3. When R2 receives packet P2, its incoming label will also be L2. R2 again overwrites L2 with L3, and sends P2 on to R3.

Note then that when P1 and P2 are traveling from R2 to R3, they carry the same label, and as far as MPLS is concerned, they cannot be distinguished. Thus instead of talking about two distinct LSPs, <R1, R2, R3> and <R4, R2, R3>, we might talk of a single "Multipoint-to-Point LSP Tree", which we might denote as <{R1, R4}, R2, R3>.

This creates a difficulty when we attempt to use conventional ATM switches as LSRs. Since conventional ATM switches do not support multipoint-to-point connections, there must be procedures to ensure that each LSP is realized as a point-to-point VC. However, if ATM switches which do support multipoint-to-point VCs are in use, then the LSPs can be most efficiently realized as multipoint-to-point VCs. Alternatively, if the SVP Multipoint Encoding (section 2.23) can be used, the LSPs can be realized as multipoint-to-point SVPs.
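The merge can be illustrated with a small Python sketch of R2's label swap, plus a helper that renders paths in the <{R1, R4}, R2, R3> notation. The names are hypothetical. `tree_notation` assumes, as in the example, that the paths differ only in their first hop.

```python
# R2's incoming label map for prefix X: incoming label -> (outgoing label, next hop).
ILM_R2 = {"L2": ("L3", "R3")}

def swap(ilm, packet):
    """Label swap at one LSR: returns (next hop, relabeled packet)."""
    label, payload = packet
    out_label, next_hop = ilm[label]
    return next_hop, (out_label, payload)

def common_suffix(paths):
    """Longest run of hops shared by the tails of all paths."""
    suffix = []
    for hops in zip(*(list(reversed(p)) for p in paths)):
        if len(set(hops)) != 1:
            break
        suffix.append(hops[0])
    return suffix[::-1]

def tree_notation(paths):
    """Render paths that diverge only at their first hop as one LSP tree."""
    suffix = common_suffix(paths)
    leaves = sorted({p[0] for p in paths})
    return "<{" + ", ".join(leaves) + "}, " + ", ".join(suffix) + ">"

print(swap(ILM_R2, ("L2", "P1")))  # ('R3', ('L3', 'P1'))
print(swap(ILM_R2, ("L2", "P2")))  # ('R3', ('L3', 'P2'))
print(tree_notation([["R1", "R2", "R3"], ["R4", "R2", "R3"]]))  # <{R1, R4}, R2, R3>
```

After the swap, both packets carry L3 toward R3. Only the payload differs, which is exactly why MPLS sees a single tree rather than two LSPs.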
3.6. LSP Tunneling between BGP Border Routers

Consider the case of an Autonomous System, A, which carries transit traffic between other Autonomous Systems. Autonomous System A will have a number of BGP Border Routers, and a mesh of BGP connections among them, over which BGP routes are distributed. In many such cases, it is desirable to avoid distributing the BGP routes to routers which are not BGP Border Routers. If this can be avoided, the "route distribution load" on those routers is significantly reduced. However, there must be some means of ensuring that the transit traffic will be delivered from Border Router to Border Router by the interior routers.

This can easily be done by means of LSP Tunnels. Suppose that BGP routes are distributed only to BGP Border Routers, and not to the interior routers that lie along the Hop-by-hop path from Border Router to Border Router. LSP Tunnels can then be used as follows:

1. Each BGP Border Router distributes, to every other BGP Border Router in the same Autonomous System, a label for each address prefix that it distributes to that router via BGP.

2. The IGP for the Autonomous System maintains a host route for each BGP Border Router. Each interior router distributes its labels for these host routes to each of its IGP neighbors.

3. Suppose that:

   a) BGP Border Router B1 receives an unlabeled packet P,
   b) address prefix X in B1's routing table is the longest match for the destination address of P,
   c) the route to X is a BGP route,
   d) the BGP Next Hop for X is B2,
   e) B2 has bound label L1 to X, and has distributed this mapping to B1,
   f) the IGP next hop for the address of B2 is I1,
   g) the address of B2 is in B1's and I1's IGP routing tables as a host route, and
   h) I1 has bound label L2 to the address of B2, and distributed this mapping to B1.

   Then before sending packet P to I1, B1 must create a label stack for P, then push on label L1, and then push on label L2.

4. Suppose that BGP Border Router B1 receives a labeled packet P, where the label on the top of the label stack corresponds to an address prefix, X, to which the route is a BGP route, and that conditions 3b, 3c, 3d, and 3e all hold. Then before sending packet P to I1, B1 must replace the label at the top of the label stack with L1, and then push on label L2.

With these procedures, a given packet P follows a level 1 LSP all of whose members are BGP Border Routers, and between each pair of BGP Border Routers in the level 1 LSP, it follows a level 2 LSP.

These procedures effectively create a Hop-by-Hop Routed LSP Tunnel between the BGP Border Routers.

Since the BGP Border Routers are exchanging label mappings for address prefixes that are not even known to the IGP routing, the BGP Border Routers should become explicit LDP peers with each other.
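Rules 3 and 4 can be sketched as one Python function that runs at B1. The sketch assumes the longest-match lookup (condition b) has already produced X. It models B1's BGP and IGP state as plain dictionaries and represents a label stack as a list whose last element is the top. All names and label values are hypothetical.

```python
def b1_label_stack(x, incoming_stack, bgp_next_hop, bgp_labels,
                   igp_next_hop, igp_labels):
    """Return (I1, outgoing label stack) for a packet whose route to X is BGP.

    incoming_stack is [] for an unlabeled packet (rule 3), or a stack whose
    top label (last element) corresponds to X (rule 4).
    """
    b2 = bgp_next_hop[x]        # (d) BGP Next Hop for X
    l1 = bgp_labels[(b2, x)]    # (e) label B2 bound to X and sent to B1
    i1 = igp_next_hop[b2]       # (f) IGP next hop toward B2's host route
    l2 = igp_labels[(i1, b2)]   # (h) label I1 bound to the address of B2
    if incoming_stack:
        stack = incoming_stack[:-1] + [l1]  # rule 4: replace top label with L1
    else:
        stack = [l1]                        # rule 3: new stack, push L1
    stack.append(l2)                        # both rules: then push L2
    return i1, stack

X = "203.0.113.0/24"
tables = dict(bgp_next_hop={X: "B2"}, bgp_labels={("B2", X): 1001},
              igp_next_hop={"B2": "I1"}, igp_labels={("I1", "B2"): 17})
print(b1_label_stack(X, [], **tables))       # ('I1', [1001, 17])
print(b1_label_stack(X, [5, 99], **tables))  # ('I1', [5, 1001, 17])
```

The interior router I1 only ever examines label 17, which is bound to B2's host route. Label 1001, bound to the BGP prefix, is read again only at B2, which is why interior routers need no BGP routes.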
3.7. Other Uses of Hop-by-Hop Routed LSP Tunnels

The use of Hop-by-Hop Routed LSP Tunnels is not restricted to tunnels between BGP Next Hops. Any situation in which one might otherwise have used an encapsulation tunnel is one in which it is appropriate to use a Hop-by-Hop Routed LSP Tunnel. Instead of encapsulating the packet with a new header whose destination address is the address of the tunnel's receive endpoint, the label corresponding to the address prefix which is the longest match for the address of the tunnel's receive endpoint is pushed on the packet's label stack. The packet which is sent into the tunnel may or may not already be labeled.

If the transmit endpoint of the tunnel wishes to put a labeled packet into the tunnel, it must first replace the label value at the top of the stack with a label value that was distributed to it by the tunnel's receive endpoint. Then it must push on the label which corresponds to the tunnel itself, as distributed to it by the next hop along the tunnel. To allow this, the tunnel endpoints should be explicit LDP peers. The label mappings they need to exchange are of no interest to the LSRs along the tunnel.
3.8. MPLS and Multicast

Multicast routing proceeds by constructing multicast trees. The tree along which a particular multicast packet must get forwarded depends in general on the packet's source address and its destination address. Whenever a particular LSR is a node in a particular multicast tree, it binds a label to that tree. It then distributes that mapping to its parent on the multicast tree. (If the node in question is on a LAN, and has siblings on that LAN, it must also distribute the mapping to its siblings. This allows the parent to use a single label value when multicasting to all children on the LAN.)

When a multicast labeled packet arrives, the NHLFE corresponding to the label indicates the set of output interfaces for that packet, as well as the outgoing label. If the same label encoding technique is used on all the outgoing interfaces, the very same packet can be sent to all the children.
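The following Python sketch shows the multicast forwarding step. It assumes, as the text does, that the same label encoding is used on every outgoing interface, so one NHLFE carries a single outgoing label. `MulticastNHLFE`, `replicate` and the interface names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class MulticastNHLFE:
    out_interfaces: Tuple[str, ...]  # set of output interfaces for this tree
    out_label: int                   # one outgoing label, same encoding everywhere

def replicate(ilm: Dict[int, MulticastNHLFE], in_label: int,
              payload: bytes) -> List[Tuple[str, int, bytes]]:
    """Emit one identically relabeled copy per interface listed in the NHLFE."""
    nhlfe = ilm[in_label]
    return [(iface, nhlfe.out_label, payload) for iface in nhlfe.out_interfaces]

ilm = {40: MulticastNHLFE(("if0", "if1"), 55)}
print(replicate(ilm, 40, b"data"))  # [('if0', 55, b'data'), ('if1', 55, b'data')]
```

If the outgoing interfaces used different label encodings, a single `out_label` would no longer suffice. The NHLFE would then need a separate (interface, label) pair for each interface.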
4. LDP Procedures for Hop-by-Hop Routed Traffic

4.1. The Procedures for Advertising and Using Labels

In this section, we consider only label mappings that are used for traffic to be label switched along its hop-by-hop routed path. In these cases, the label in question will correspond to an address prefix in the routing table.

There are a number of different procedures that may be used to distribute label mappings. Some are executed by the downstream LSR, and the others by the upstream LSR.

The downstream LSR must perform:

- the Distribution Procedure, and
- the Withdrawal Procedure.

The upstream LSR must perform:

- the Request Procedure,
- the NotAvailable Procedure,
- the Release Procedure, and
- the LabelUse Procedure.

The MPLS architecture supports several variants of each procedure. However, the MPLS architecture does not support all possible combinations of all possible variants. The set of supported combinations will be described in section 4.2, where the interoperability between different combinations will also be discussed.

4.1.1. Downstream LSR: Distribution Procedure

The Distribution Procedure is used by a downstream LSR to determine when it should distribute a label mapping for a particular address prefix to its LDP peers. The architecture supports four different distribution procedures.

Irrespective of the particular procedure that is used, if a label mapping for a particular address prefix has been distributed by a downstream LSR Rd to an upstream LSR Ru, and if at any time the attributes (as defined above) of that mapping change, then Rd must inform Ru of the new attributes.

If an LSR is maintaining multiple routes to a particular address prefix, it is a local matter as to whether that LSR maps multiple labels to the address prefix (one per route), and hence distributes multiple mappings.

4.1.1.1. PushUnconditional

Let Rd be an LSR. Suppose that:

1. X is an address prefix in Rd's routing table
2. Ru is an LDP peer of Rd with respect to X

Whenever these conditions hold, Rd must map a label to X and distribute that mapping to Ru. It is the responsibility of Rd to keep track of the mappings which it has distributed to Ru, and to make sure that Ru always has these mappings.

4.1.1.2. PushConditional

Let Rd be an LSR. Suppose that:

1. X is an address prefix in Rd's routing table
2. Ru is an LDP peer of Rd with respect to X
3. Rd is either an LSP Egress or an LSP Proxy Egress for X, or Rd's L3 next hop for X is Rn, where Rn is distinct from Ru, and Rn has bound a label to X and distributed that mapping to Rd.

Then as soon as these conditions all hold, Rd should map a label to X and distribute that mapping to Ru.

Whereas PushUnconditional causes the distribution of label mappings for all address prefixes in the routing table, PushConditional causes the distribution of label mappings only for those address prefixes for which one has received label mappings from one's LSP next hop, or for which one does not have an MPLS-capable L3 next hop.
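The two push procedures reduce to predicates over Rd's state, sketched here in Python. The `DownstreamState` fields are hypothetical stand-ins for Rd's routing table, its LDP peers, its egress status, its L3 next hops and the mappings it has received. Whether Rd is an LSP Egress or Proxy Egress for a prefix is taken as an input rather than computed.

```python
from dataclasses import dataclass

@dataclass
class DownstreamState:
    routing_table: set  # address prefixes in Rd's routing table
    ldp_peers: dict     # prefix -> set of LDP peers of Rd with respect to it
    egress_for: set     # prefixes for which Rd is an LSP Egress or Proxy Egress
    l3_next_hop: dict   # prefix -> Rd's L3 next hop
    received: dict      # (neighbor, prefix) -> label that neighbor sent to Rd

def push_unconditional(rd: DownstreamState, x: str, ru: str) -> bool:
    """Conditions 1 and 2: X is in the routing table and Ru is an LDP peer for X."""
    return x in rd.routing_table and ru in rd.ldp_peers.get(x, set())

def push_conditional(rd: DownstreamState, x: str, ru: str) -> bool:
    """Conditions 1 to 3 of PushConditional."""
    if not push_unconditional(rd, x, ru):
        return False
    if x in rd.egress_for:
        return True
    rn = rd.l3_next_hop.get(x)
    return rn is not None and rn != ru and (rn, x) in rd.received

rd = DownstreamState(routing_table={"X", "Y"}, ldp_peers={"X": {"Ru"}, "Y": {"Ru"}},
                     egress_for=set(), l3_next_hop={"X": "Rn", "Y": "Rn"},
                     received={("Rn", "X"): 12})
print(push_unconditional(rd, "Y", "Ru"), push_conditional(rd, "Y", "Ru"))  # True False
```

For prefix Y, PushUnconditional would already distribute a mapping. PushConditional waits until Rn supplies one, which is the difference the summary paragraph above describes.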
4.1.1.3. PulledUnconditional

Let Rd be an LSR. Suppose that:

1. X is an address prefix in Rd's routing table
2. Ru is an LDP peer of Rd with respect to X
3. Ru has explicitly requested that Rd map a label to X and distribute the mapping to Ru

Then Rd should map a label to X and distribute that mapping to Ru. Note that if X is not in Rd's routing table, or if Rd is not an LDP peer of Ru with respect to X, then Rd must inform Ru that it cannot provide a mapping at this time.

If Rd has already distributed a mapping for address prefix X to Ru, and it receives a new request from Ru for a mapping for address prefix X, it will map a second label, and distribute the new mapping to Ru. The first label mapping remains in effect.
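The following is a minimal Python sketch of PulledUnconditional at Rd. The class and field names, the "MAPPING" and "NOT_AVAILABLE" reply strings, and the starting label 1000 are hypothetical. It shows that a repeated request yields a second label while the first mapping stays in effect.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class PulledUnconditionalLSR:
    routing_table: set  # address prefixes in Rd's routing table
    ldp_peers: dict     # prefix -> set of LDP peers of Rd with respect to it
    distributed: list = field(default_factory=list)  # (peer, prefix, label)
    next_label: count = field(default_factory=lambda: count(1000))

    def handle_request(self, ru, x):
        """Answer an explicit request from Ru for a mapping for prefix X."""
        if x not in self.routing_table or ru not in self.ldp_peers.get(x, set()):
            return ("NOT_AVAILABLE", None)       # cannot provide a mapping now
        label = next(self.next_label)            # every request gets a fresh label
        self.distributed.append((ru, x, label))  # earlier mappings stay in effect
        return ("MAPPING", label)

rd = PulledUnconditionalLSR(routing_table={"10.0.0.0/8"},
                            ldp_peers={"10.0.0.0/8": {"Ru"}})
print(rd.handle_request("Ru", "10.0.0.0/8"))    # ('MAPPING', 1000)
print(rd.handle_request("Ru", "10.0.0.0/8"))    # ('MAPPING', 1001)
print(rd.handle_request("Ru", "192.0.2.0/24"))  # ('NOT_AVAILABLE', None)
```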
If Rd has already distributed a mapping for address prefix X to Ru, and it receives a new request from Ru for a mapping for address prefix X, it will map a second label to X and distribute the new mapping to Ru. The first label mapping remains in effect.

4.1.1.4. PulledConditional

Let Rd be an LSR. Suppose that:

   1. X is an address prefix in Rd's routing table
   2. Ru is a label distribution peer of Rd with respect to X
   3. Ru has explicitly requested that Rd map a label to X and distribute the mapping to Ru
   4. Rd is either an LSP Egress or an LSP Proxy Egress for X, or Rd's L3 next hop for X is Rn, where Rn is distinct from Ru, and Rn has bound a label to X and distributed that mapping to Rd.

Then as soon as these conditions all hold, Rd should map a label to X and distribute that mapping to Ru. Note that if X is not in Rd's routing table, or if Rd is not a label distribution peer of Ru with respect to X, then Rd must inform Ru that it cannot provide a mapping at this time.
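Rd's handling of a Request under PulledConditional, including the deferral rule stated in the next paragraph, might be sketched as a small state machine. This is a hypothetical model: the class, its method names, and the starting label value 100 are invented for illustration, and, as in section 4.1.1.3, every answered request receives a fresh label. The text does not cover the case in which Rn is Ru itself; the sketch simply answers NOT_AVAILABLE there, which is an assumption.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PulledConditionalRd:
    """Hypothetical model of Rd answering Requests under PulledConditional."""
    routing_table: Dict[str, Optional[str]]   # prefix -> L3 next hop Rn (None: Rd is egress)
    peers: Set[str]                           # label distribution peers of Rd
    downstream: Dict[str, int] = field(default_factory=dict)       # prefix -> label from Rn
    deferred: List[Tuple[str, str]] = field(default_factory=list)  # (ru, prefix) awaiting Rn
    labels: "itertools.count" = field(default_factory=lambda: itertools.count(100))

    def _map(self, ru: str, prefix: str) -> tuple:
        # Each answered request gets a new, distinct label.
        return ("MAP", ru, prefix, next(self.labels))

    def on_request(self, ru: str, prefix: str) -> tuple:
        if prefix not in self.routing_table or ru not in self.peers:
            return ("NOT_AVAILABLE", ru, prefix)
        rn = self.routing_table[prefix]
        if rn is None:                      # Rd is the egress for X
            return self._map(ru, prefix)
        if rn == ru:                        # not covered by the text; assumption
            return ("NOT_AVAILABLE", ru, prefix)
        if prefix in self.downstream:       # Rn has already given Rd a label
            return self._map(ru, prefix)
        self.deferred.append((ru, prefix))  # only Rn's label is missing: defer
        return ("DEFER", ru, prefix)

    def on_downstream_mapping(self, prefix: str, label: int) -> list:
        """Rn (assumed to be the current next hop) mapped a label; answer deferred requests."""
        self.downstream[prefix] = label
        ready = [(ru, p) for ru, p in self.deferred if p == prefix]
        self.deferred = [(ru, p) for ru, p in self.deferred if p != prefix]
        return [self._map(ru, p) for ru, p in ready]


if __name__ == "__main__":
    rd = PulledConditionalRd({"10.0.0.0/8": "Rn"}, {"Ru"})
    print(rd.on_request("Ru", "10.0.0.0/8"))           # ('DEFER', 'Ru', '10.0.0.0/8')
    print(rd.on_downstream_mapping("10.0.0.0/8", 42))  # [('MAP', 'Ru', '10.0.0.0/8', 100)]
```

Deferred requests are answered only when Rn's mapping arrives, so Ru never receives a mapping that is not backed by a downstream label.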
However, if the only condition that fails to hold is that Rn has not yet provided a label to Rd, then Rd must defer any response to Ru until such time as it has received a mapping from Rn.

If Rd has distributed a label mapping for address prefix X to Ru, and at some later time any attribute of the label mapping changes, then Rd must redistribute the label mapping to Ru, with the new attribute. It must do this even though Ru does not issue a new Request.

In section 4.2, we will discuss how to choose the particular procedure to be used at any given time, and how to ensure interoperability among LSRs that choose different procedures.

4.1.2. Upstream LSR: Request Procedure

The Request Procedure is used by the upstream LSR for an address prefix to determine when to explicitly request that the downstream LSR map a label to that prefix and distribute the mapping.
There are three possible procedures that can be used.

4.1.2.1. RequestNever

Never make a request. This is useful if the downstream LSR uses the PushConditional procedure or the PushUnconditional procedure, but is not useful if the downstream LSR uses the PulledUnconditional procedure or the PulledConditional procedure.

4.1.2.2. RequestWhenNeeded

Make a request whenever the L3 next hop to the address prefix changes, and one doesn't already have a label mapping from that next hop for the given address prefix.

4.1.2.3. RequestOnRequest

Issue a request whenever a request is received, in addition to issuing a request when needed (as described in section 4.1.2.2). If Rd receives such a request from Ru, for an address prefix for which Rd has already distributed Ru a label, Rd shall assign a new (distinct) label, map it to X, and distribute that mapping. (Whether Rd can distribute this mapping to Ru immediately or not depends on the Distribution Procedure being used.) This procedure is useful when the LSRs are implemented on conventional ATM switching hardware.
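A minimal sketch of the three Request procedures follows; the enum and parameter names are hypothetical. The second function applies RequestOnRequest along a chain of LSRs, assuming that every LSR originates one request of its own for X (as in section 4.2.2.1, where packets for X may enter at any LSR) and relays every request it receives. Under those assumptions it reproduces the label counts that section gives for the path <R1, R2, R3, R4>.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RequestProc(Enum):
    REQUEST_NEVER = "RequestNever"
    REQUEST_WHEN_NEEDED = "RequestWhenNeeded"
    REQUEST_ON_REQUEST = "RequestOnRequest"


def should_request(proc: RequestProc, next_hop_changed: bool,
                   have_mapping_from_next_hop: bool,
                   received_request: bool) -> bool:
    """Does Ru send a Request for X to its (new) L3 next hop?"""
    if proc is RequestProc.REQUEST_NEVER:
        return False
    needed = next_hop_changed and not have_mapping_from_next_hop
    if proc is RequestProc.REQUEST_WHEN_NEEDED:
        return needed
    # RequestOnRequest: when needed, and also whenever a request arrives.
    return needed or received_request


def request_on_request_label_counts(path: List[str]) -> Dict[Tuple[str, str], int]:
    """Labels each LSR distributes to its upstream neighbor for one prefix.

    path runs from the first LSR to the egress. Every non-egress LSR
    originates one request and relays every request it receives, so each
    originated request yields one label on every hop downstream of it.
    """
    counts: Dict[Tuple[str, str], int] = {}
    for origin in range(len(path) - 1):
        for i in range(origin, len(path) - 1):
            key = (path[i + 1], path[i])          # (downstream, upstream)
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    print(request_on_request_label_counts(["R1", "R2", "R3", "R4"]))
    # {('R2', 'R1'): 1, ('R3', 'R2'): 2, ('R4', 'R3'): 3}
```

Each extra label corresponds to one point-to-point VC, which is why RequestOnRequest suits switches that cannot merge VCs.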
4.1.3. Upstream LSR: NotAvailable Procedure

If Ru and Rd are respectively upstream and downstream label distribution peers for address prefix X, and Rd is Ru's L3 next hop for X, and Ru requests a mapping for X from Rd, but Rd replies that it cannot provide a mapping at this time, then the NotAvailable procedure determines how Ru responds. There are two possible procedures governing Ru's behavior:

4.1.3.1. RequestRetry

Ru should issue the request again at a later time. That is, the requester is responsible for trying again later to obtain the needed mapping.

4.1.3.2. RequestNoRetry

Ru should never reissue the request, instead assuming that Rd will provide the mapping automatically when it is available. This is useful if Rd uses the PushUnconditional procedure or the PushConditional procedure.

4.1.4. Upstream LSR: Release Procedure

Suppose that Rd is an LSR which has bound a label to address prefix X, and has distributed that mapping to LSR Ru. If Rd does not happen to be Ru's L3 next hop for address prefix X, or has ceased to be Ru's L3 next hop for address prefix X, then Ru will not be using the label. The Release Procedure determines how Ru acts in this case. There are two possible procedures governing Ru's behavior:

4.1.4.1. ReleaseOnChange

Ru should release the mapping, and inform Rd that it has done so.

4.1.4.2. NoReleaseOnChange

Ru should maintain the mapping, so that it can use it again immediately if Rd later becomes Ru's L3 next hop for X.

4.1.5. Upstream LSR: labelUse Procedure

Suppose Ru is an LSR which has received label mapping L for address prefix X from LSR Rd, and Ru is upstream of Rd with respect to X. Ru will make use of the mapping if Rd is Ru's L3 next hop for X. If, at the time the mapping is received by Ru, Rd is NOT Ru's L3 next hop for X, Ru does not make any use of the mapping at that time.
Ru may however start using the mapping at some later time, if Rd becomes Ru's L3 next hop for X. The labelUse Procedure determines just how Ru makes use of Rd's mapping. There are three procedures which Ru may use:

4.1.5.1. UseImmediate

Ru may put the mapping into use immediately. At any time when Ru has a mapping for X from Rd, and Rd is Ru's L3 next hop for X, Rd will also be Ru's LSP next hop for X.

4.1.5.2. UseIfLoopFree

Ru will use the mapping only if it determines that by doing so, it will not cause a forwarding loop.

If Ru has a mapping for X from Rd, and Rd is (or becomes) Ru's L3 next hop for X, but Rd is NOT Ru's current LSP next hop for X, Ru does NOT immediately make Rd its LSP next hop. Rather, it initiates a loop prevention algorithm. If, upon the completion of this algorithm, Rd is still the L3 next hop for X, Ru will make Rd the LSP next hop for X, and use L as the outgoing label.

The loop prevention algorithm to be used is still under consideration.

4.1.5.3. UseIfLoopNotDetected

This procedure is the same as UseImmediate, unless Ru has detected a loop in the LSP. If a loop has been detected, Ru will discard packets that would otherwise have been labeled with L and sent to Rd. This will continue until the next hop for X changes, or until the loop is no longer detected.

4.1.6. Downstream LSR: Withdraw Procedure

In this case, there is only a single procedure. When LSR Rd decides to break the mapping between label L and address prefix X, then this unmapping must be distributed to all LSRs to which the mapping was distributed.
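The upstream-side Release and labelUse procedures (sections 4.1.4 and 4.1.5) can be combined into one model of Ru's state for a single prefix X. This is a sketch under stated assumptions: the class and its fields are hypothetical, the loop prevention algorithm (which the text leaves under consideration) is replaced by a caller-supplied predicate that is assumed to run synchronously, and "no-lsp" only means that Ru has no usable LSP next hop for X.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class UpstreamLsr:
    """Hypothetical model of Ru's Release and labelUse procedures for one prefix X."""
    release_on_change: bool   # True: ReleaseOnChange, False: NoReleaseOnChange
    label_use: str            # "UseImmediate", "UseIfLoopFree" or "UseIfLoopNotDetected"
    # Stand-in for the (unspecified) loop prevention algorithm; assumed synchronous.
    loop_free: Callable[[str], bool] = lambda rd: True
    loop_detected: bool = False
    mappings: Dict[str, int] = field(default_factory=dict)  # downstream LSR -> label L
    l3_next_hop: Optional[str] = None
    lsp_next_hop: Optional[str] = None
    released: Set[str] = field(default_factory=set)         # LSRs told "released"

    def on_mapping(self, rd: str, label: int) -> None:
        self.mappings[rd] = label
        self._reevaluate()

    def on_next_hop_change(self, new_hop: Optional[str]) -> None:
        old, self.l3_next_hop = self.l3_next_hop, new_hop
        if (old is not None and old != new_hop
                and self.release_on_change and old in self.mappings):
            del self.mappings[old]       # ReleaseOnChange: drop the mapping...
            self.released.add(old)       # ...and inform the old Rd
        self._reevaluate()

    def _reevaluate(self) -> None:
        rd = self.l3_next_hop
        if rd is None or rd not in self.mappings:
            self.lsp_next_hop = None     # no mapping from the L3 next hop
        elif self.label_use == "UseIfLoopFree" and self.lsp_next_hop != rd:
            # Adopt Rd only once the loop prevention check succeeds.
            self.lsp_next_hop = rd if self.loop_free(rd) else None
        else:
            self.lsp_next_hop = rd

    def action(self) -> Tuple[str, Optional[int]]:
        if self.lsp_next_hop is None:
            return ("no-lsp", None)
        if self.label_use == "UseIfLoopNotDetected" and self.loop_detected:
            return ("discard", None)     # packets that would carry L are dropped
        return ("push", self.mappings[self.lsp_next_hop])
```

With NoReleaseOnChange, a mapping from a former next hop survives in `mappings`, so it can be reused at once if that LSR becomes the next hop again.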
It is desirable, though not required, that the unmapping of L from X be distributed by Rd to an LSR Ru before Rd distributes to Ru any new mapping of L to any other address prefix Y, where X != Y. If Ru learns of the new mapping of L to Y before it learns of the unmapping of L from X, and if packets matching both X and Y are forwarded by Ru to Rd, then for a period of time, Ru will label both packets matching X and packets matching Y with label L.

The distribution and withdrawal of label mappings is done via a label distribution protocol, or LDP. LDP is a two-party protocol. If LSR R1 has received label mappings from LSR R2 via an instance of an LDP, and that instance of that protocol is closed by either end (whether as a result of failure or as a matter of normal operation), then all mappings learned over that instance of the protocol must be considered to have been withdrawn.

As long as the relevant LDP connection remains open, label mappings that are withdrawn must always be withdrawn explicitly. If a second label is bound to an address prefix, the result is not to implicitly withdraw the first label, but to map both labels; this is needed to support multi-path routing. If a second address prefix is bound to a label, the result is not to implicitly withdraw the mapping of that label to the first address prefix, but to use that label for both address prefixes.

4.2. MPLS Schemes: Supported Combinations of Procedures

Consider two LSRs, Ru and Rd, which are label distribution peers with respect to some set of address prefixes, where Ru is the upstream peer and Rd is the downstream peer. The MPLS scheme which governs the interaction of Ru and Rd can be described as a quintuple of procedures:

   <Distribution Procedure, Request Procedure, NotAvailable Procedure, Release Procedure, labelUse Procedure>

(Since there is only one Withdraw Procedure, it need not be mentioned.)
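The quintuple lends itself to simple pattern matching. The sketch below encodes the supported combinations listed in sections 4.2.1 and 4.2.2 and the two non-viable families of section 4.2.3, treating "*" as a wild-card and "N/A" as "no procedure needed", as defined in the paragraph that follows. It uses the quintuple order given above, so ReleaseOnChange is checked in the fourth (Release) position. The link-type keys and function names are hypothetical labels; this is an illustration, not part of any LDP specification.

```python
from typing import Tuple

# <Distribution, Request, NotAvailable, Release, labelUse>
Scheme = Tuple[str, str, str, str, str]

# Supported patterns; "*" is a wild-card, "N/A" means no procedure is needed.
SUPPORTED = {
    "ttl-capable": [
        ("PushUnconditional", "RequestNever", "N/A", "NoReleaseOnChange", "UseImmediate"),
        ("PushConditional", "RequestWhenNeeded", "RequestNoRetry", "*", "*"),
    ],
    "atm-no-multipoint": [
        ("PulledUnconditional", "RequestOnRequest", "RequestRetry", "ReleaseOnChange", "UseImmediate"),
        ("PulledConditional", "RequestOnRequest", "RequestNoRetry", "ReleaseOnChange", "*"),
    ],
    "atm-multipoint": [
        ("PushConditional", "RequestWhenNeeded", "RequestNoRetry", "*", "*"),
        ("PushUnconditional", "RequestNever", "N/A", "NoReleaseOnChange", "UseImmediate"),
        ("PulledConditional", "RequestOnRequest", "RequestNoRetry", "ReleaseOnChange", "*"),
    ],
}


def matches(scheme: Scheme, pattern: Scheme) -> bool:
    # Position-by-position comparison; "*" in the pattern accepts anything.
    return all(p == "*" or p == s for s, p in zip(scheme, pattern))


def supported(scheme: Scheme, link_type: str) -> bool:
    return any(matches(scheme, p) for p in SUPPORTED[link_type])


def obviously_unviable(scheme: Scheme) -> bool:
    """The two non-viable families called out in section 4.2.3."""
    dist, req, _, rel, _ = scheme
    pulled_but_never_asked = dist.startswith("Pulled") and req == "RequestNever"
    releases_but_never_asks = req == "RequestNever" and rel == "ReleaseOnChange"
    return pulled_but_never_asked or releases_but_never_asks
```

For example, a PushConditional scheme with any Release and labelUse procedure is accepted on a TTL-capable link, while the same table rejects PulledUnconditional there.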
A "*" appearing in one of the positions is a wild-card, meaning that any procedure in that category may be present; an "N/A" appearing in a particular position indicates that no procedure in that category is needed.

Only the MPLS schemes which are specified below are supported by the MPLS Architecture. Other schemes may be added in the future, if a need for them is shown.

4.2.1. TTL-capable LSP Segments

If Ru and Rd are MPLS peers, and both are capable of decrementing a TTL field in the MPLS header, then the MPLS scheme in use between Ru and Rd must be one of the following:

   <PushUnconditional, RequestNever, N/A, NoReleaseOnChange, UseImmediate>

   <PushConditional, RequestWhenNeeded, RequestNoRetry, *, *>

The former, roughly speaking, is "local control with downstream label assignment". The latter is an egress control scheme.

4.2.2. Using ATM Switches as LSRs

The procedures for using ATM switches as LSRs depend on whether the ATM switches can realize LSP trees as multipoint-to-point VCs or VPs.

Most ATM switches existing today do NOT have a multipoint-to-point VC-switching capability. Their cross-connect tables could easily be programmed to move cells from multiple incoming VCs to a single outgoing VC, but the result would be that cells from different packets get interleaved.

Some ATM switches do support a multipoint-to-point VC-switching capability. These switches will queue up all the incoming cells from an incoming VC until a packet boundary is reached. Then they will transmit the entire sequence of cells on the outgoing VC, without allowing cells from any other packet to be interleaved.

Many ATM switches do support a multipoint-to-point VP-switching capability, which can be used if the Multipoint SVP label encoding is used.

4.2.2.1. Without Multipoint-to-point Capability

Suppose that R1, R2, R3, and R4 are ATM switches which do not support multipoint-to-point capability, but are being used as LSRs. Suppose further that the L3 hop-by-hop path for address prefix X is <R1, R2, R3, R4>, and that packets destined for X can enter the network at any of these LSRs. Since there is no multipoint-to-point capability, the LSPs must be realized as point-to-point VCs, which means that there need to be three such VCs for address prefix X: <R1, R2, R3, R4>, <R2, R3, R4>, and <R3, R4>.

Therefore, if R1 and R2 are MPLS peers, and either is an LSR which is implemented using conventional ATM switching hardware (i.e., no cell interleave suppression), the MPLS scheme in use between R1 and R2 must be one of the following:

   <PulledUnconditional, RequestOnRequest, RequestRetry, ReleaseOnChange, UseImmediate>

   <PulledConditional, RequestOnRequest, RequestNoRetry, ReleaseOnChange, *>

The use of the RequestOnRequest procedure will cause R4 to distribute three labels for X to R3; R3 will distribute two labels for X to R2, and R2 will distribute one label for X to R1.

The first of these procedures is the "optimistic downstream-on-demand" variant of local control. The second is the "conservative downstream-on-demand" variant of local control.

An egress control scheme which works in the absence of multipoint-to-point capability is for further study.

4.2.2.2. With Multipoint-To-Point Capability

If R1 and R2 are MPLS peers, and either of them is an LSR which is implemented using ATM switching hardware with cell interleave suppression, and neither is an LSR which is implemented using ATM switching hardware that does not have cell interleave suppression, then the MPLS scheme in use between R1 and R2 must be one of the following:

   <PushConditional, RequestWhenNeeded, RequestNoRetry, *, *>

   <PushUnconditional, RequestNever, N/A, NoReleaseOnChange, UseImmediate>

   <PulledConditional, RequestOnRequest, RequestNoRetry, ReleaseOnChange, *>

The first of these is an egress control scheme. The second is the "downstream" variant of local control. The third is the "conservative downstream-on-demand" variant of local control.

4.2.3. Interoperability Considerations

It is easy to see that certain quintuples do NOT yield viable MPLS schemes. For example:

   - <PulledUnconditional, RequestNever, *, *, *>
     <PulledConditional, RequestNever, *, *, *>

     In these MPLS schemes, the downstream LSR Rd distributes label mappings to upstream LSR Ru only upon request from Ru, but Ru never makes any such requests. Obviously, these schemes are not viable, since they will not result in the proper distribution of label mappings.

   - <*, RequestNever, *, ReleaseOnChange, *>

     In these MPLS schemes, Ru releases mappings when it isn't using them, but it never asks for them again, even if it later has a need for them. These schemes thus do not ensure that label mappings get properly distributed.

In this section, we specify rules to prevent a pair of LDP peers from adopting procedures which lead to infeasible MPLS schemes. These rules require the exchange of information between LDP peers during the initialization of the LDP connection between them.

   1. Each must state whether it is an ATM switch, and if so, whether it has cell interleave suppression.

   2. If Rd is an ATM switch without cell interleave suppression, it must state whether it intends to use the PulledUnconditional procedure or the PulledConditional procedure. If the former, Ru MUST use the RequestRetry procedure; if the latter, Ru MUST use the RequestNoRetry procedure.

   3. If Ru is an ATM switch without cell interleave suppression, it must state whether it intends to use the RequestRetry or the RequestNoRetry procedure. If Rd is an ATM switch without cell interleave suppression, Rd is not bound by this, and in fact Ru MUST adopt Rd's preferences. However, if Rd is NOT an ATM switch without cell interleave suppression, then if Ru chooses RequestRetry, Rd MUST use PulledUnconditional, and if Ru chooses RequestNoRetry, Rd MUST use PulledConditional.

   4. If Rd is an ATM switch with cell interleave suppression, it must specify whether it prefers to use PushConditional, PushUnconditional, or PulledConditional. If Ru is not an ATM switch without cell interleave suppression, it must then use RequestWhenNeeded and RequestNoRetry, or else RequestNever and NoReleaseOnChange, or else RequestOnRequest, RequestNoRetry, and ReleaseOnChange, respectively.

   5. If Ru is an ATM switch with cell interleave suppression, it must specify whether it prefers to use RequestWhenNeeded and RequestNoRetry, or else RequestNever and NoReleaseOnChange. If Rd is NOT an ATM switch with cell interleave suppression, it must then use either PushConditional or PushUnconditional, respectively.

4.2.4. How to do Loop Prevention

TBD.

4.2.5. How to do Loop Detection

TBD.

4.2.6. Security Considerations

Security considerations are not discussed in this version of this draft.

5. Authors' Addresses

Eric C. Rosen
Cisco Systems, Inc.
250 Apollo Drive
Chelmsford, MA, 01824
E-mail: erosen@cisco.com

Arun Viswanathan
Lucent Technologies
101 Crawford Corner Rd., #4D-537
Holmdel, NJ 07733
732-332-5163
E-mail: arunv@dnrc.bell-labs.com

Ross Callon
IronBridge Networks
55 Hayden Avenue,
Lexington, MA 02173
+1-781-402-8017
E-mail: rcallon@ironbridgenetworks.com

6. References

[1] "A Framework for Multiprotocol Label Switching", R. Callon, P. Doolan, N. Feldman, A. Fredette, G. Swallow, A. Viswanathan, work in progress, Internet Draft <draft-ietf-mpls-framework-02.txt>, November 1997.

[2] "ARIS: Aggregate Route-Based IP Switching", A. Viswanathan, N. Feldman, R. Boivie, R. Woundy, work in progress, Internet Draft <draft-viswanathan-aris-overview-00.txt>, March 1997.

[3] "ARIS Specification", N. Feldman, A. Viswanathan, work in progress, Internet Draft <draft-feldman-aris-spec-00.txt>, March 1997.

[4] "Tag Switching Architecture - Overview", Rekhter, Davie, Katz, Rosen, Swallow, Farinacci, work in progress, Internet Draft <draft-rekhter-tagswitch-arch-00.txt>, January 1997.

[5] "Tag Distribution Protocol", Doolan, Davie, Katz, Rekhter, Rosen, work in progress, Internet Draft <draft-doolan-tdp-spec-01.txt>, May 1997.

[6] "Use of Tag Switching with ATM", Davie, Doolan, Lawrence, McCloghrie, Rekhter, Rosen, Swallow, work in progress, Internet Draft <draft-davie-tag-switching-atm-01.txt>, January 1997.

[7] "Label Switching: Label Stack Encodings", Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, Conta, work in progress, Internet Draft <draft-ietf-mpls-label-encaps-01.txt>, February 1998.

[8] "Partitioning Tag Space among Multicast Routers on a Common Subnet", Farinacci, work in progress, Internet Draft <draft-farinacci-multicast-tag-part-00.txt>, December 1996.

[9] "Multicast Tag Binding and Distribution using PIM", Farinacci, Rekhter, work in progress, Internet Draft <draft-farinacci-multicast-tagsw-00.txt>, December 1996.

[10] "Toshiba's Router Architecture Extensions for ATM: Overview", Katsube, Nagami, Esaki, RFC 2098, February 1997.

[11] "Loop-Free Routing Using Diffusing Computations", J.J. Garcia-Luna-Aceves, IEEE/ACM Transactions on Networking, Vol. 1, No. 1, February 1993.