technical misery: A Visual Approach to Mutual Redistribution Loops

When a system becomes sufficiently complex, so as to confound our efforts to predict its behavior, we begin speaking of probabilistic, rather than deterministic, outcomes. The convergence of mutually redistributed routing domains is often discussed probabilistically, forcing engineers to rely on lab results, rules-of-thumb, and intuition for guidance during design. And while misconfiguration no doubt lies at the bottom of nearly all pathological behaviors that emerge, I would wager these more often stem from a failure of imagination than cavalier planning.

My intent is to elevate the level of discourse regarding mutual redistribution loops and introduce tools that permit objective, accurate predictions to be made easily, for all but the most unlikely topologies. To accomplish this, I need to introduce two new concepts: a modest notation by which redistributed routes and their corresponding routing domains may be classified, called D-notation, and a simple diagramming scheme, consisting of what I call Juno diagrams. These are intended only as tools to aid discourse and can by no means be regarded as rigorous models of behavior—though I feel my own predictive powers have benefited greatly. That said, I hasten to add that neither idea is beyond improvement nor reproach: both are very young and can only benefit from critical assessment.

The next two sections introduce both D-notation and Juno diagrams. Following these is a section on the construction and interpretation of Juno diagrams, along with a few illustrative examples. A final section will then look briefly at loop classification.

D-notation

Where mutual redistribution is concerned, routing information loops form whenever a redistributed route is preferred over a non-redistributed route, or—more generally—a route redistributed fewer times. If every route were accompanied by a redistribution counter, to be incremented upon each redistribution, the route with the lowest counter could be preferred, avoiding the entire problem. Of course, there is no such counter, but D-notation provides us a convenient way to speak as if there were.

The "D" in D-notation stands for "Domain," and it merely establishes a common language for quantifying how many routing domains removed a particular route is from its origin. Put another way, D-notation counts the number of times a route has been redistributed. For example, a D₀ route is a route that has not been redistributed, residing solely within its originating domain. Were this route to be redistributed, we would say that the D₀ route now supported a D₁ counterpart in a neighboring domain. Redistributing the route again would produce a D₂ route, and so on. In the general case, it is also permissible to refer to D_i and D_i+1 routes rather than speak of a particular i. One might also refer to the totality of D_i routes as D_0...n, where D_n always refers to the largest possible i, given the topology.

Fig. 1 - A D₀ route supports D₁ and D₂ counterparts

Fig. 2 - Routes and their corresponding domains may be discussed relative to one another

We might also speak of an entire routing domain as a D_i domain with respect to a particular route. In this way, we can recount a crude history of a route's life over the course of its (potentially) many redistributions, discussing preceding and succeeding domains as D₀ and D₁, D_i and D_i+1, or D_n-1 and D_n. It is also possible for a domain to assume multiple values for i, as when a routing information loop is formed. Bear in mind, D₀ must at all times refer to the domain of origin to prevent ambiguity. It may be tempting to refer to any preceding domain as D₀, starting wherever one left off, but this only promotes confusion. The general forms of D_i and D_i+1, or D_i-1 and D_i, should make adequate substitutes for discourse, if they are required at all.

Fig. 3 - Redistributing D_i routers are also D_i+1 routers

For routers that lie within a D_i domain, we naturally refer to these as D_i routers. In the case of redistributing routers, we acknowledge these as both D_i and D_i+1 routers, as they have an equal claim to both domains. When the distinction between redistributing and non-redistributing D_i routers must be made, the terms edge and internal may be used, where internal D_i routers do not perform redistribution. To give a flavor of the structure implied, consider the statement, "a D₀ router installs a D₁ route." The only way a D₀ router can learn a D₁ route is if that router lies at the edge of D₀ and D₁. Otherwise, the route would have to be redistributed back into D₀ from D₁, whereby internal D₀ routers would learn a D₂ route, not a D₁ route.

Fig. 4 - Internal routers, by definition, do not perform redistribution

Caveats

A potential caveat of D-notation is that it inherently treats routes as abstract objects, separate from the mechanisms by which they propagate. On the one hand, the reliance of D_{n, n-1, n-2, ..., 1} routes on their D₀ counterpart creates the sense that any D_i route is merely a shadow or reflection of the D₀ route that led to its creation. (I consider this analagous to the way in which feedback may only occur between two mirrors made to face one another; in doing the opposite, this feedback is immediately broken.) On the other hand, if a D₀ route were to meet its D_n counterpart, so to speak, it would fail to recognize its own offspring, suggesting the D_n route has just as much substance as the D₀ route. Certainly, in the case of self-stabilizing (or fowarding) loops, where a D₂ route usurps its D₀ progenitor, the presence or absence of the D₀ route becomes inconsequential. This is rather like a bad Twilight Zone episode, in which some sinister doppelgänger assumes the identity of its ill-fated double.

Three Stages

Let us now turn to matters more concrete. Using D-notation, we can identify, in a concise way, three key events or stages that lead to pathological behavior in a mutually redistributed internetwork. These are:

1. D_i routers install D_i+1 routes
2. D_i routers propagate D_i+2 routes
3. D_i routers install D_i+2 routes

Fig. 5 - At stage 1, edge routers install routes learned from their connecting domains

At stage 1, looped routes merely become candidates for redistribution, whereas, at stage 2, the looped (D_i+1) routes are redistributed and advertised. By the time stage 3 is reached, D_i internal routers have begun learning and installing the looped (D_i+2) routes: the damage is done. In the worst case, stage 3 sees other D_i edge routers install the looped (D_i+2) routes, creating either route oscillation or self-stabilizing routing loops that persist after advertisement of the instigating D₀ route has ceased.

Fig. 6 - At stage 2, edge routers redistribute D_i+1 routes, advertising D_i+2 routes into D_i

Fig. 7 - Stage 3 occurs when D_i routers install D_i+2 routes

Fortunately, there are corrective actions that may be taken for each stage, and intervening at an earlier stage prevents all later ones. Summarization, despite its indifference to any of the three stages, can be effective but depends strongly on the topology and can create new problems. Metric manipulation, in those topologies for which it proves effective, can prevent stage-3 behavior while permitting stage-1 and -2 events. AD (administrative distance) manipulation can be used to prevent stage-1 behavior in topologies involving two routing domains but is insufficient for topologies in which a high-AD domain lies between two low-AD domains, or vice versa. A more direct approach, route filtering, can be quite effective in stopping stage-1 or -2 behavior but can result in loss of routing in some failure scenarios. Route filtering can also permit D_i edge routers to prefer D_i+1 over D_i routes, but this behavior may be acceptable under certain conditions.

An Appeal

The reframing of familiar tools and troubles in the language of D-notation has been tedious, but necessary, work. Yet the reader may still feel D-notation's importance is not evident, or its use, not justified. Indeed, augmenting a complex network with a new system of notation may seem to defeat the purpose of establishing a convenient basis for discourse. But D-notation—rather than collapsing the network's complexity—seeks to ease discussion through calculated abstraction and the imposition of a well-defined structure. Put another way, D-notation abbreviates rather than simplifies, describing complex events in a more concise and modular form, and I believe this is a crucial first step towards eventually automating the detection and mitigation of routing loops.

For actually solving mutual redistribution problems, we must instead rely on D-notation's visual expression: Juno diagrams. It is to these that we turn next.

Introduction to Juno Diagrams

When trying to predict the converged state of a network involving mutual redistribution, the main difficulty lies in holding, in one's mind, a rather long chain of cause and effect. To make matters worse, several scenarios must be tested before one can be satisfied of the final design, and often a balance must be struck between fulfillment of the requirements and acceptable levels of extraneous behavior. Of course, when one variable is altered, its effect on the whole delicate chain must be reassessed to ensure the integrity of the design.

This can be taxing. More crucially, during an examination—or, far worse, while troubleshooting—this process can be time-consuming. Having a simple way to record the effects of mutual redistribution in a given network eliminates the hassle of remembering chains of cause and effect for several scenarios. If the representation is visual, problem assessment is made faster, saving precious minutes that could be spent elsewhere. And, when used as an aid to design, the final result may also stand as finished documentation.

Fig. 8 - A sample stage-1 diagram: RIP/OSPF mutual redistribution

Juno diagrams (so named for their resemblance to the letters j-u-n-o) visually describe what happens at each of the three stages expressed by D-notation. Each diagram consists of several arrows representing (in stage-1 and -2 diagrams) the flow of information in the control plane or (in the case of stage-3 diagrams) next-hop vectors in the forwarding plane. The arrows in stage-1 and -2 diagrams are also annotated with the corresponding routing domain's AD--and route-type or metric, where relevant.

Fig. 9 - Here, stage-2 pathology manifests as a potential metric conflict

New information is conveyed in each stage's Juno diagram. Stage-1 diagrams help identify AD conflicts and inform the construction of stage-2 diagrams. Stage-2 diagrams provide information about the converged network (after race conditions), metric conflicts, and the extent of the damage caused by loops. Stage-3 diagrams, while not essential to solving mutual redistribution problems, are useful for predicting the flow of traffic once the network converges. Moreover, stage-3 diagrams tend to give a better sense of how metrics affect the forwarding plane when loops are present.

Fig. 10 - In the worst case, a forwarding loop develops

Three Stages in Juno

For a topology with two routing domains, two stage-1 diagrams would be used to describe internally injected routes, one for each domain. With three domains, three stage-1 diagrams would be needed, and so on. When connected or static routes are redistributed, additional diagrams may be required, depending on the routing protocol in question. For example, RIP has no concept of internal and external routes: a diagram depicting an externally injected route in RIP is identical to a diagram that depicts an internally injected one. The same could not be said of EIGRP or OSPF.

Fig. 11 - AD conflicts are readily found in stage-1 diagrams

Stage-2 diagrams make explicit the effects of stage-1 events, and thus rely on stage-1 diagrams for the locations of AD conflicts. For each stage-1 diagram, one or more stage-2 diagrams may be constructed, but multiple stage-2 diagrams are only required to observe the effects of race conditions. Topologies with race conditions can converge to one of several stable states, and each state may be depicted by its own stage-2 diagram. In the case of unstable topologies, namely those exhibiting perpetual route oscillation, stage-2 diagrams make explicit the presence of improper metrics.

Fig. 12 - Stage-2 diagrams readily expose potential metric conflicts

For each stage-2 diagram, a corresponding stage-3 diagram may be constructed. Stage-3 diagrams indicate each router's possible next-hops with arrows pointing from one router to the next. The resulting directed graph can then be used to observe the possible paths packets will follow. While not very useful for solving mutual redistribution problems, stage-3 diagrams can be used to predict or confirm symptoms observed in a network troubled by routing loops. That they contain full topological detail, however, makes them unwieldy for all but the simplest networks.

Fig. 13 - In the best case, a metric conflict results in sub-optimal routing

By now, the reader should have a firm grasp of Juno diagrams and their purpose, but this is hardly adequate for making one's own. The next section contains specific instructions for constructing and interpreting Juno diagrams.

Constructing Juno Diagrams

Content forthcoming...

Loop Classification

D_i ~ D_i+2 ^ D_i > D_i+2 => sub-optimal routing
D_i ~ D_i+2 ^ D_i+2 > D_i => self-stabilizing (forwarding) loop
D_i+1 ~ D_i+3 ^ D_i+3 > D_i => route oscillation

3 comments:

Frank CuiMarch 9, 2013 at 10:46:00 AM CST
very interesting, can you continue on the constructing and interpretation of Juno Diagrams? Thanks
academic help from find writing service comMarch 14, 2018 at 10:05:00 AM CDT
you do not take into account all the forces acting on the body, why. I think you should revise the concept

11.20.2012

A Visual Approach to Mutual Redistribution Loops