[lacnog] Analysing traffic in context of rejecting RPKI invalids using pmacct

Tue Feb 12 21:33:14 -02 2019

Dear all,

Whether to deploy RPKI Origin Validation with an "invalid == reject"
policy really is a business decision. One has to weigh the pros and
cons: What are the direct and indirect costs of accepting
misconfigurations or hijacks for my company? What is the cost of
deploying RPKI? What is the cost of honoring misconfigured RPKI ROAs?
There are a few thousand misconfigured ROAs; what does this mean for me?

To answer these questions, Paolo Lucente and I worked to extend the
pmacct traffic analysis engine (http://pmacct.net/) in such a way that
it can perform the RFC 6811 Origin Validation procedure and present
the outcome as a property in the flow aggregation process.
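
For readers unfamiliar with RFC 6811, the validation procedure can be
sketched roughly as below. This is an illustration in Python, not
pmacct's actual code (pmacct is written in C); the names VRP and
validate_origin are made up for this example, and the prefixes and AS
numbers come from the documentation ranges.

```python
from ipaddress import ip_network
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload (hypothetical container for this sketch)."""
    prefix: str       # ROA prefix, e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the ROA authorises
    asn: int          # AS authorised to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int,
                    vrps: Iterable[VRP]) -> str:
    """Return "valid", "invalid" or "unknown" for one BGP route."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ip_network(vrp.prefix)
        # A VRP "covers" the route if the route prefix sits inside it.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering VRP "matches" if the origin AS agrees and the route
        # is not more specific than max_length. AS 0 never matches.
        if (vrp.asn == origin_asn and vrp.asn != 0
                and route.prefixlen <= vrp.max_length):
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    # Not covered at all: unknown (RFC 6811 calls this NotFound).
    return "invalid" if covered else "unknown"


vrps = [VRP("192.0.2.0/24", 24, 64496)]
print(validate_origin("192.0.2.0/24", 64496, vrps))
print(validate_origin("192.0.2.0/25", 64496, vrps))
print(validate_origin("198.51.100.0/24", 64496, vrps))
```

The three calls cover a matching route, a too-specific route, and a
route without any covering ROA.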

Pmacct has the ability to ingest BGP feeds and correlate the BGP data with
the sflow/netflow/ipfix data. This allows for fantastic business
intelligence: you can see exactly how much traffic is flowing from what
customers to what endpoints for what reason!

Pmacct implemented Origin Validation in a cute way: it separates out
RPKI invalid BGP announcements into two categories:

    a) "invalid with no overlapping or alternative route"
        (aka will be blackholed if 'invalid == reject')

    b) "invalid but an overlapping unknown/valid announcement also exists"
        (end-to-end connectivity can still work).
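
The split between (a) and (b) can be sketched as follows. This is a
rough Python illustration and not pmacct's implementation: the function
names and the returned labels are invented for this example, and reading
"overlapping" as "an equal or less specific prefix in the same BGP
table that is not itself invalid" is an assumption on my part.

```python
from ipaddress import ip_network


def rov_state(prefix, origin, vrps):
    """RFC 6811 style check; vrps holds (roa_prefix, max_length, asn)."""
    net = ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in vrps:
        roa = ip_network(roa_prefix)
        if roa.version == net.version and net.subnet_of(roa):
            covered = True
            if asn == origin and asn != 0 and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "unknown"


def classify(route, table, vrps):
    """Classify one (prefix, origin) route against the full BGP table."""
    prefix, origin = route
    state = rov_state(prefix, origin, vrps)
    if state != "invalid":
        return state
    net = ip_network(prefix)
    for other_prefix, other_origin in table:
        if (other_prefix, other_origin) == route:
            continue  # skip the route we are classifying
        other = ip_network(other_prefix)
        # Case (b): a covering route survives "invalid == reject".
        if (other.version == net.version and net.subnet_of(other)
                and rov_state(other_prefix, other_origin, vrps) != "invalid"):
            return "invalid_with_alternative"
    # Case (a): nothing else covers the destination, traffic is dropped.
    return "invalid_no_alternative"


vrps = [("192.0.2.0/24", 24, 64496), ("203.0.113.0/24", 24, 64496)]
table = [("192.0.2.0/24", 64496),    # valid
         ("192.0.2.0/25", 64511),    # invalid, covered by the /24 above
         ("203.0.113.0/24", 64511)]  # invalid, nothing else covers it
for r in table:
    print(r, classify(r, table, vrps))
```

In this toy table the /25 falls into category (b) because the valid /24
still carries the traffic, while 203.0.113.0/24 falls into category (a).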

Because pmacct separates out the various kinds of (invalid) BGP
announcements, operators don't have to deploy *anything* in their
network to get a good grasp on what their connectivity to the rest of the
Internet would look like after deploying an "invalid == reject" policy.
No changes to your network configurations are required to make use of
this feature, you don't need to tag routes with communities or do other
tricks. All the analysis happens inside pmacct.

Of course we tested this first in the NTT global backbone AS 2914! At
the moment of writing, we're seeing less than a handful of gigabits per
second being sent towards BGP announcements that are RPKI Invalid and
for which no alternative route exists. In the context of NTT's backbone
that amount of traffic is just statistical noise. This is a very
encouraging sign; it may help us move towards the goal of deploying RPKI
Origin Validation in AS 2914.

Nusenu wrote a great blog post on where these RPKI ROA misconfigurations
are located. I recommend reading their posts to develop a better
understanding of the problem space:
https://medium.com/@nusenu/where-are-rpki-unreachable-networks-located-65c7a0bae0f8

Even if you don't intend to deploy RPKI Origin Validation (or are
single-homed), pmacct's RPKI capabilities can be useful in forensic
investigations. It'll be easier to analyse how much and what kind of
traffic for what period of time was sent to a possible hijack. This
will help you when writing RFOs!

If you want to test-drive this feature, fetch pmacct version 1.7.3-rc1
from https://github.com/pmacct/pmacct/releases/tag/1.7.3-rc1

Documentation on how to configure the feature:
    https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L1783-#L1833
    https://github.com/pmacct/pmacct/blob/master/CONFIG-KEYS#L2626-#L2647

Let us know what you think! And if you'd like to chat with Paolo or me
about telemetry and analysing the effects of BGP hijacks and RPKI, we'll
both be at the San Francisco NANOG meeting next week!

Kind regards,

Job