[lacnog] problems with the LACNIC RPKI TA

Job Snijders job at sobornost.net
Thu May 11 09:31:54 -03 2023


On Mon, Apr 17, 2023 at 03:49:59PM +0000, Job Snijders wrote:
> I'm concerned there might be an 'active/active' aspect in the
> high-availability setup of LACNIC without proper synchronization
> within the cluster itself. For example: if some kind of
> 'directory-to-RRDP' conversion process is executed on two (or more)
> nodes, the nodes each should use a unique RRDP session ID, and a
> load-balancer should apply active/backup distribution.
> 
> I'm happy to help investigate where exactly the issue resides to
> prevent recurrence.

The same type of issue happened again last night, twice :-(

The thing that stands out to me is that a validly signed Manifest
was somehow published that is no longer (?) visible on the RRDP server.

    File:                     ./rpki-20230511T001557Z/data/repository.lacnic.net/rpki/lacnic/48f083bb-f603-4893-9990-0284c04ceb85/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft
    Hash identifier:          NR4rQ98BxuZaIJrAiIS/7o+jT4g5mlZK7C4771nHq4M=
    Subject key identifier:   D6:24:C1:D2:66:7D:45:68:89:AC:E8:D0:3A:51:AC:14:EC:12:C3:97
    Authority key identifier: 14:70:94:B4:E4:47:E3:EE:2D:CC:3F:D5:27:3D:46:EB:9D:C4:78:07
    Certificate issuer:       /CN=production O=lacnic
    Certificate serial:       E96949
    Authority info access:    rsync://repository.lacnic.net/rpki/lacnic/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.cer
    Subject info access:      rsync://repository.lacnic.net/rpki/lacnic/48f083bb-f603-4893-9990-0284c04ceb85/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft
    Manifest number:          55E7
    Signing time:             Thu 11 May 2023 00:00:23 +0000
    Manifest this update:     Wed 10 May 2023 23:00:18 +0000
    Manifest next update:     Sun 14 May 2023 01:20:18 +0000

When inspecting RRDP session 9f374a50-5c12-4dfc-8718-b46aaabbd733 deltas
14223 (May 10th, 13:37) through delta 14332 (May 11th, 09:21) - only the
following manifestNumbers show up for ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft:

    Manifest number:          55CC
    Manifest number:          55CD
    Manifest number:          55CE
    Manifest number:          55CF
    Manifest number:          55D0
    Manifest number:          55D1
    Manifest number:          55D2
    Manifest number:          55D3
    Manifest number:          55D4
    Manifest number:          55D5
    Manifest number:          55D6
    Manifest number:          55D7
                      *gap 1*
    Manifest number:          55D9
    Manifest number:          55DA
                      *gap 2*
                      *gap 3*
    Manifest number:          55DD
                      *gap 4*
                      *gap 5*
    Manifest number:          55E0
    Manifest number:          55E1
    Manifest number:          55E2
                      *gap 6*
                      *gap 7*
    Manifest number:          55E5
                      *gap 8*
                      *gap 9*
    Manifest number:          55E8
    Manifest number:          55E9
    Manifest number:          55EA
    Manifest number:          55EB
    Manifest number:          55EC

It is perfectly normal to see gaps in the monotonically incrementing
manifestNumbers... but what's *NOT* normal is that archiving validators
have copies of Manifest files that according to the RRDP server never
existed.
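
For readers who want to reproduce the gap count, here is a minimal Python
sketch. It takes the hexadecimal manifestNumbers listed above and reports
which values in that range never showed up in the deltas. The names
SEEN_HEX and missing_numbers are illustrative, not part of any tool.

```python
# Manifest numbers (hex) observed in deltas 14223 through 14332, as listed above.
SEEN_HEX = [
    "55CC", "55CD", "55CE", "55CF", "55D0", "55D1", "55D2", "55D3",
    "55D4", "55D5", "55D6", "55D7", "55D9", "55DA", "55DD", "55E0",
    "55E1", "55E2", "55E5", "55E8", "55E9", "55EA", "55EB", "55EC",
]


def missing_numbers(seen_hex):
    """Return the hex values between the lowest and highest seen number
    that are absent from the list."""
    seen = {int(h, 16) for h in seen_hex}
    return [format(n, "X") for n in range(min(seen), max(seen) + 1)
            if n not in seen]


gaps = missing_numbers(SEEN_HEX)
print(len(gaps), "gaps:", " ".join(gaps))
print("55E7 absent from the RRDP deltas:", "55E7" in gaps)
```

The archived Manifest with number 55E7 falls inside one of these gaps,
which is what makes it suspicious.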

This means that either the RRDP XML files have retroactively been
modified (seems unlikely), or a load-balancer is causing different RPKI
validators to end up at different backends that all pretend to have the
same RRDP Session ID but in reality are not synchronized with each
other (more likely), or something else.
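
One way to look for the load-balancer scenario is to fetch notification.xml
repeatedly and compare what comes back. The sketch below assumes the
RFC 8182 notification format, and it assumes that a given session_id plus
serial should always advertise the same snapshot hash. Differing hashes
would be a hint of unsynchronized backends, not proof. The function name
conflicting_views is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def conflicting_views(notification_docs):
    """Given notification.xml documents (strings) fetched at different
    times or from different vantage points, return the (session_id, serial)
    pairs that were advertised with more than one snapshot hash."""
    hashes = defaultdict(set)
    for doc in notification_docs:
        root = ET.fromstring(doc)
        snapshot = root.find(RRDP_NS + "snapshot")
        key = (root.get("session_id"), int(root.get("serial")))
        hashes[key].add(snapshot.get("hash").lower())
    return sorted(key for key, seen in hashes.items() if len(seen) > 1)
```

An empty result does not rule the theory out: the backends may simply not
have disagreed during the sampling window.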

The RRDP deltas are a journal in which ADD/REMOVE/UPDATEs are
registered: it should not have been possible to have obtained a validly
signed copy of ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft with number
55E7 without it showing up in the LACNIC RRDP XML files - yet somehow
this happened.
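
The journal check described above can be sketched in Python as follows. It
assumes the RFC 8182 delta format (publish elements carrying base64 object
content) and assumes the "Hash identifier" shown earlier is the
base64-encoded SHA-256 digest of the file. The function names are
illustrative only.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def published_digests(delta_xml, uri):
    """Return (serial, base64 SHA-256) for every <publish> of `uri`
    found in one RRDP delta document (a string)."""
    root = ET.fromstring(delta_xml)
    serial = int(root.get("serial"))
    found = []
    for element in root.iter(RRDP_NS + "publish"):
        if element.get("uri") == uri:
            content = base64.b64decode(element.text or "")
            digest = hashlib.sha256(content).digest()
            found.append((serial, base64.b64encode(digest).decode("ascii")))
    return found


def appears_in_journal(delta_docs, uri, wanted_digest):
    """True if any delta published an object at `uri` with this digest."""
    return any(digest == wanted_digest
               for doc in delta_docs
               for _, digest in published_digests(doc, uri))
```

Running appears_in_journal over the archived deltas with the Manifest's
rsync URI and its Hash identifier should return False if the object really
never appeared in the journal.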

I've made a copy of all relevant RRDP XML files available here:
https://chloe.sobornost.net/~job/lacnic-outage-may-2023/

What is wrong with the LACNIC RRDP service? 4 incidents in less than a
month.

Kind regards,

Job

