Controlling Large-Scale Autonomous Systems
Internet Routing Architectures (CISCO)
Author: Bassam Halabi
Publisher: Cisco Press
This chapter covers the following key topics:
Route Reflectors
A method of managing expanding mesh requirements in large
ASs by using selected routers as focal points for internal BGP sessions
Confederations
A method of managing expanding mesh requirements in large
ASs by creating sub-ASs
Controlling IGP Expansion
Methods of managing networks in which expansion is characterized
by the use of multiple IGPs
Virtual Private Networks with Route Reflectors
A method of developing restricted network access, within an
AS, using route reflectors
Autonomous systems consisting of hundreds of routing nodes can pose
a serious routing management problem for network administrators. Service providers
and customers each have their own set of problems when dealing with large
networks. On the service provider side, the majority of routers run BGP. The
IBGP mesh will grow beyond the provider's control. On the customer side, however,
the majority of routers run IGPs, which also may grow beyond the customer's control.
This chapter discusses methods and techniques that can be used to better
control the deployment of BGP and IGPs inside large autonomous systems. There
are no absolute rules that say a provider or customer should or should not
use one of the methods discussed in this chapter, or which method to prefer.
Keep in mind that any new technique brings with it its own complexities. Imposing
complex techniques on situations that do not really need them could hurt more than help.
In some ISP networks, the internal BGP mesh becomes quite large (more
than 100 internal BGP sessions per router), which strongly suggests that some
new peering mechanism be implemented. The route reflector
concept is based on the idea of specifying a concentration
router to act as a focal point for internal BGP sessions. Multiple BGP routers
can peer with a central point (the route reflector), and then multiple route
reflectors peer together.
Route reflectors are only recommended for ASs with a large internal
BGP mesh, on the order of more than 100 sessions per router. The route reflector
concept introduces processing overhead on the concentration router and, if
configured incorrectly, can cause routing loops and routing instability. As
a result, route reflectors are not recommended for every topology. If it can
be tolerated, a full mesh is the better solution.
Without route reflectors, BGP speakers in an AS will have to be fully
meshed. We have already discussed this behavior in this book; the following
illustration is just a reminder. In figure 8-1,
RTA, RTB, and RTC form an internal BGP full mesh. Each router acts as a BGP
peer with the other two routers. RTA and RTB are physically connected, as
are RTB and RTC. No physical connection exists between RTA and RTC.
RTA gets an update from an external peer and will pass it on to its
two internal peers, RTB and RTC. Note that even though there is no physical
connectivity between RTA and RTC, RTA will manage to pass the update to RTC
via the BGP peering session. RTB and RTC, in turn, will pass on the update
to their external peers.
RTB will not pass on the update to RTC, because RTC is an internal peer
and the update received by RTB also comes from an internal peer. Without the
internal BGP session between RTA and RTC, RTC would never get the update;
hence, the full mesh is necessary.
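
The behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not router code: the function name and the session lists are assumptions, and the only rule modeled is that a route learned from an internal peer is not passed on to other internal peers.

```python
def routers_that_learn(ibgp_sessions, border_router):
    """Return the routers that learn a route first received by
    `border_router` from an external peer.

    Plain IBGP rule: the border router advertises the route to its
    internal peers, but a router that learned the route from an
    internal peer does not pass it on to other internal peers.
    """
    learned = {border_router}
    for a, b in ibgp_sessions:
        if a == border_router:
            learned.add(b)
        elif b == border_router:
            learned.add(a)
    return learned


# Full mesh among RTA, RTB, and RTC.
full_mesh = [("RTA", "RTB"), ("RTA", "RTC"), ("RTB", "RTC")]

# Hypothetical partial mesh: the RTA-RTC session is missing.
partial_mesh = [("RTA", "RTB"), ("RTB", "RTC")]

print(sorted(routers_that_learn(full_mesh, "RTA")))
print(sorted(routers_that_learn(partial_mesh, "RTA")))
```

With the full mesh, all three routers learn the route. With the RTA-RTC session removed, only RTA and RTB learn it, because RTB does not relay the update to RTC.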
The route reflector acts as a concentration point for other routers
called clients. The clients peer with the route reflector
and exchange routing information with it. In turn, the route reflector will
pass on (reflect) the information between clients.
In figure 8-2, RTA gets an update
from an external peer and passes it on to RTB. RTB is configured as a route
reflector with two clients, RTA and RTC. RTB will reflect the update from
client RTA to client RTC. In this configuration, a peering session between
RTA and RTC is not really needed because the route reflector is propagating
the BGP information to RTC.
In an AS where routers have to build BGP sessions with too many other
routers, the route reflector concept becomes very helpful and very scalable.
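
To make the scaling argument concrete, the following Python sketch compares session counts. It assumes the simplest possible design, a single route reflector with every other router as its client; the function names are illustrative, and real designs with redundant reflectors will have more sessions than this.

```python
def full_mesh_sessions(n):
    """Total IBGP sessions when n routers are fully meshed."""
    return n * (n - 1) // 2


def sessions_per_router_full_mesh(n):
    """Sessions each router maintains in a full mesh."""
    return n - 1


def single_reflector_sessions(n):
    """Total IBGP sessions when one router reflects for the other n - 1.

    Assumption: one route reflector, no redundancy, no client-to-client
    sessions.
    """
    return n - 1


for n in (3, 10, 100):
    print(n, full_mesh_sessions(n), single_reflector_sessions(n))
```

For the three routers discussed above, this gives 3 sessions in a full mesh against 2 with RTB as the reflector. At 100 routers the gap is 4950 sessions against 99, although the reflector itself still carries all 99 sessions.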
The route reflector is a router that can perform the route reflection function.
The IBGP peers of the route reflector fall under two categories, clients
and nonclients. A route reflector and its clients form a cluster.
All peers of the route reflector that are not part of the cluster are non-clients.
Figure 8-3 illustrates these concepts.
Non-clients must be fully meshed with the route reflector and each other
because they follow the basic rules of the IBGP mesh. Clients should not peer
with internal speakers outside their associated cluster. As you can see, these
conditions have been met for the clients and non-clients in figure
The route reflector function is implemented only on the route reflector;
all clients and non-clients are normal BGP peers that have no notion of the
route reflector. Clients are only considered as such because the route reflector
lists them as clients.
Any route reflector that receives multiple routes for the same destination
will pick the best path based on the usual BGP decision process. The best path
would be propagated inside the AS based on the following rules of operation
(propagation to EBGP runs as usual):
If the route is received from a non-client peer, reflect to clients only.
If the route is received from a client peer, reflect to all
non-client peers and also to client peers, except the originator of the route.
If the route is received from an EBGP peer, reflect to all
client and non-client peers.
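
The rules of operation can be written as a small decision function. The Python below is a sketch under stated assumptions: the names `Source` and `reflect_targets` and the router name RTX are hypothetical, and the non-client case is taken to mean "reflect to clients only," which is the behavior described in RFC 4456.

```python
from enum import Enum


class Source(Enum):
    EBGP = "ebgp"
    CLIENT = "client"
    NON_CLIENT = "non_client"


def reflect_targets(source, sender, clients, non_clients):
    """Return the internal peers a route reflector sends its best path to.

    source      -- what kind of peer the route was learned from
    sender      -- name of the peer that sent the route
    clients     -- names of the reflector's clients
    non_clients -- names of the reflector's non-client IBGP peers
    """
    clients = set(clients)
    non_clients = set(non_clients)
    if source is Source.NON_CLIENT:
        # From a non-client: reflect to clients only.
        return clients
    if source is Source.CLIENT:
        # From a client: everyone except the client that sent it.
        return (clients | non_clients) - {sender}
    # From an EBGP peer: all clients and non-clients.
    return clients | non_clients


clients = {"RTA", "RTC"}
non_clients = {"RTX"}  # hypothetical non-client peer
print(sorted(reflect_targets(Source.CLIENT, "RTA", clients, non_clients)))
```

Here a route learned from client RTA is reflected to RTC and RTX but not back to RTA.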
With the lack of a full BGP mesh inside the AS, redundancy and reliability
become issues. If a route reflector fails, clients will be isolated. Redundancy
requires the existence of multiple route reflectors in an AS where clients
can simultaneously peer with multiple routers. If one peer connection fails,
the other will back it up.
The importance of complementing logical redundancy with physical redundancy
cannot be overstated. It does not make sense to build route reflector redundancy
if the physical redundancy itself does not exist. The logical redundancy arrangement
on the left in figure 8-4 shows RTA
as the client of both RR1 and RR2. RTA is peering with both route reflectors
in an effort to create a redundant link. Unfortunately, if the connection
to RR1 is broken, or if RR1 itself fails, RTA is isolated. The logical connectivity
between RTA and RR2 is of no practical use and is simply more memory and processing overhead.
The physical redundancy configuration on the right in figure
8-4 illustrates how logical redundancy can be backed up with
physical redundancy. In the event of a failure in the link to RR1, RTA can still reach RR2.
Figure 8-4. Comparison of logical and physical redundancy solutions.
National networks are usually laid out in concentration points per geographical
regions. Providers have POPs (sometimes called hubs) in different
regions in the U.S. with high-speed DS3 or OC3/OC12 links connecting different
locations in a partially meshed topology. The route reflector concept can
be used to logically interconnect the routers running BGP in a pattern that
follows the physical connectivity. Figure
8-5 illustrates a complex arrangement featuring route reflectors (indicated
as RR in this figure and those that follow).
Except for the fact that the route reflector needs to keep up with more
BGP sessions than normal routers, any router could be configured as a route
reflector. Your physical topology should be the main indicator of which is
the best router to choose to be the route reflector.
In figure 8-5, AS100 is divided
into three clusters: San Francisco, Dallas, and New York. The Dallas cluster
has multiple RRs for redundancy. RTA and RTD physically connect San Francisco
to New York. It makes sense to follow the actual physical traffic flow in
selecting RRs, so RTA and RTD are the obvious choices for RRs in the Dallas cluster.
Figure 8-5. Complex multiple route reflector environment.
In San Francisco, router RTC physically connects San Francisco to Dallas,
so RTC would be the best candidate to become a RR. The same reasoning applies
for the New York cluster: RTE physically connects New York to Dallas and is
the best candidate for RR.
The route reflector concept does not change the IBGP behavior. The route
reflector is not allowed to change the attributes of the reflected IBGP routes.
The next hop attribute, for example, remains the same when exchanged between
RRs. This is necessary for avoiding loops in the AS.
Figure 8-6 illustrates why the RR should not modify the attributes of the IBGP reflected
routes. The next hop attribute is used as an example. Figure 8-6
focuses on the portion of the network from figure
8-5 where Dallas connects to San Francisco.
Assume that RTB is specified as the route reflector, rather than RTA,
and that an IBGP session is configured between RTB (220.127.116.11) and RTC (18.104.22.168).
This looks odd because physically RTA is passing the traffic, while logically
RTB is reflecting the BGP updates between RTA and RTC. RTB will receive the
prefix 22.214.171.124/24 from its IBGP neighbor RTC with a next hop of 126.96.36.199.
RTB will reflect the route to its client RTA with the next hop 188.8.131.52 also.
This is the desired behavior.
Alternatively, if RTB were to change the next hop to its IP address,
184.108.40.206, RTA would try to use RTB to reach destination 220.127.116.11/24. A
loop would occur between RTA and RTB, with RTA sending the traffic to RTB,
and RTB trying to use RTA to reach the final destination. This hypothetical
situation exemplifies why the route reflector must not change IBGP behavior.
Figure 8-6. The route reflector preserves IBGP attributes.
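
The loop can be reproduced with a toy forwarding model in Python. Everything here is an assumption made for illustration: routers are referred to by name rather than by address, the IGP table encodes a topology in which RTA is physically adjacent to both RTB and RTC while RTB reaches RTC only through RTA, and `trace` is a hypothetical helper, not a real tool.

```python
# Assumed IGP view: (router, target) -> first physical hop toward target.
IGP_FIRST_HOP = {
    ("RTA", "RTB"): "RTB",
    ("RTA", "RTC"): "RTC",
    ("RTB", "RTA"): "RTA",
    ("RTB", "RTC"): "RTA",  # RTB has no direct link to RTC
}


def trace(start, bgp_next_hop, egress, max_hops=6):
    """Follow a packet hop by hop until it reaches `egress` or the
    hop limit is exceeded. `bgp_next_hop` maps each router to the BGP
    next hop it holds for the destination prefix."""
    path = [start]
    current = start
    while current != egress and len(path) <= max_hops:
        target = bgp_next_hop[current]
        current = IGP_FIRST_HOP[(current, target)]
        path.append(current)
    return path


# Next hop preserved by the reflector: both routers point at RTC.
preserved = {"RTA": "RTC", "RTB": "RTC"}

# Next hop rewritten by the reflector: RTA now points at RTB.
rewritten = {"RTA": "RTB", "RTB": "RTC"}

print(trace("RTA", preserved, "RTC"))
print(trace("RTA", rewritten, "RTC"))
```

With the next hop preserved, the path is RTA then RTC. With the next hop rewritten, the packet alternates between RTA and RTB until the hop limit stops the trace, and RTC is never reached.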
When dealing with the possibility of routing updates making their way
back into an AS, BGP relies on the information in the AS_path for loop detection.
An update that tries to make its way back into the AS it was originated from
will be dropped by the border router.
With the introduction of route reflectors, there is a potential for
having routing loops within an AS. A routing update that leaves a cluster
might find its way back inside the cluster. Loops inside the AS cannot be
detected by the traditional AS_path approach because the routing updates have
not left the AS yet. BGP offers two extra measures for loop avoidance inside
an AS when route reflectors are configured.
The originator ID is a 4-byte, optional, nontransitive
BGP attribute (type code 9) that is created by the route reflector. This attribute
carries the router ID of the originator of the route in the local AS. If,
because of poor configuration, the update comes back to the originator, the
originator ignores it.
The cluster list is an optional, nontransitive
BGP attribute (type code 10). Each cluster is represented with a cluster ID.
A cluster list is a sequence of cluster IDs that an update has traversed.
When a route reflector sends a route from its clients to nonclients outside
the cluster, it appends the local cluster ID to the cluster list. If the route
reflector receives an update whose cluster list contains the local cluster
ID, the update is ignored. This is basically the same concept as the AS_path
list applied between the clusters inside the AS.
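
The two checks can be sketched together in Python. This is a simplified model under stated assumptions: the `Update` class, the function names, and all router and cluster IDs are hypothetical, and the cluster ID is added on every reflection rather than only when a route leaves the cluster. The ID is appended, following the wording above; the position in the list does not affect the membership test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Update:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(update, local_cluster_id, sender_router_id):
    """Return the update as a route reflector would pass it on,
    or None if the cluster list shows it has already been here."""
    if local_cluster_id in update.cluster_list:
        return None  # loop detected: ignore the update
    return Update(
        prefix=update.prefix,
        # The originator ID is set once, by the first reflector.
        originator_id=update.originator_id or sender_router_id,
        cluster_list=update.cluster_list + [local_cluster_id],
    )


def accept(update, my_router_id):
    """A router ignores an update that carries its own router ID
    as the originator ID."""
    return update.originator_id != my_router_id


# Hypothetical IDs, chosen only for the example.
u0 = Update(prefix="192.0.2.0/24")
u1 = reflect(u0, local_cluster_id="1.1.1.1", sender_router_id="10.0.0.1")
u2 = reflect(u1, local_cluster_id="2.2.2.2", sender_router_id="1.1.1.1")

print(u2)
print(reflect(u2, local_cluster_id="1.1.1.1", sender_router_id="2.2.2.2"))
print(accept(u2, my_router_id="10.0.0.1"))
```

The second reflection keeps the originator ID set by the first reflector. When the update returns to cluster 1.1.1.1, `reflect` returns None, and the originating router 10.0.0.1 rejects the update in `accept`.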
Using originator IDs and cluster lists to avoid loops in ASs using route
Recall from Chapter 5, “Tuning BGP Capabilities,”
that a peer group is a group of BGP neighbors that shares the same routing
policies. Route reflectors can be used in conjunction with peer groups only
when the clients of a route reflector are fully meshed. The reasoning is as
follows: in a normal situation, a router A that learns a prefix from a router
B will send a WITHDRAWN message back to that router to poison that route.
In other words, router A is telling B that this prefix is not reachable via
A. This is to prevent a situation where A claims that a prefix is reachable
via B, and B claims it is reachable via A. In a peer group, the same UPDATE
or WITHDRAWN message is sent to all members of the group. In a peer group/route
reflector situation, a route reflector that has learned a prefix from one
of the clients and is trying to poison that route will end up withdrawing
that prefix from all the other clients. Because the clients are not talking
to one another via BGP, that prefix will be lost. That is why an IBGP mesh
between the clients is needed for the other clients to learn that prefix directly
from the source. Even with this design, the network administrator is still
avoiding a full IBGP mesh between all IBGP routers in the AS and concentrating
the mesh between route reflectors and clients.
With the use of peer groups, the AS design would look like rings of fully
meshed BGP speakers. Route reflectors are fully meshed among each other,
and clients of each route reflector are also fully meshed. Figure
8-7 illustrates such an environment; each circled area represents a
distinct peer group.
In conclusion, the route reflector concept is growing in popularity
for large networks because it is a simple approach that enables
scalability without too much overhead. Migrating from a non-route reflector
to a route reflector design is easy because only the route reflectors need
to be modified to behave as route reflectors; all other routers would be running
as usual. Routers that do not implement the route reflector behavior could
be part of the AS without any loss of BGP routing information.