[lacnog] problems with LACNIC's RPKI TA
Job Snijders
job at sobornost.net
Thu May 11 09:31:54 -03 2023
On Mon, Apr 17, 2023 at 03:49:59PM +0000, Job Snijders wrote:
> I'm concerned there might be an 'active/active' aspect in the
> high-availability setup of LACNIC without proper synchronization
> within the cluster itself. For example: if some kind of
> 'directory-to-RRDP' conversion process is executed on two (or more)
> nodes, the nodes each should use a unique RRDP session ID, and a
> load-balancer should apply active/backup distribution.
>
> I'm happy to help investigate where exactly the issue resides to
> prevent recurrence.
The same type of issue happened again last night, twice :-(
The thing that stands out to me is that a validly signed Manifest was
somehow published, yet it is no longer (?) visible on the RRDP server.
File: ./rpki-20230511T001557Z/data/repository.lacnic.net/rpki/lacnic/48f083bb-f603-4893-9990-0284c04ceb85/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft
Hash identifier: NR4rQ98BxuZaIJrAiIS/7o+jT4g5mlZK7C4771nHq4M=
Subject key identifier: D6:24:C1:D2:66:7D:45:68:89:AC:E8:D0:3A:51:AC:14:EC:12:C3:97
Authority key identifier: 14:70:94:B4:E4:47:E3:EE:2D:CC:3F:D5:27:3D:46:EB:9D:C4:78:07
Certificate issuer: /CN=production O=lacnic
Certificate serial: E96949
Authority info access: rsync://repository.lacnic.net/rpki/lacnic/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.cer
Subject info access: rsync://repository.lacnic.net/rpki/lacnic/48f083bb-f603-4893-9990-0284c04ceb85/ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft
Manifest number: 55E7
Signing time: Thu 11 May 2023 00:00:23 +0000
Manifest this update: Wed 10 May 2023 23:00:18 +0000
Manifest next update: Sun 14 May 2023 01:20:18 +0000
When inspecting RRDP session 9f374a50-5c12-4dfc-8718-b46aaabbd733 deltas
14223 (May 10th, 13:37) through 14332 (May 11th, 09:21), only the
following manifestNumbers show up for ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft:
Manifest number: 55CC
Manifest number: 55CD
Manifest number: 55CE
Manifest number: 55CF
Manifest number: 55D0
Manifest number: 55D1
Manifest number: 55D2
Manifest number: 55D3
Manifest number: 55D4
Manifest number: 55D5
Manifest number: 55D6
Manifest number: 55D7
*gap 1*
Manifest number: 55D9
Manifest number: 55DA
*gap 2*
*gap 3*
Manifest number: 55DD
*gap 4*
*gap 5*
Manifest number: 55E0
Manifest number: 55E1
Manifest number: 55E2
*gap 6*
*gap 7*
Manifest number: 55E5
*gap 8*
*gap 9*
Manifest number: 55E8
Manifest number: 55E9
Manifest number: 55EA
Manifest number: 55EB
Manifest number: 55EC
It is perfectly normal to see gaps in the monotonically incrementing
manifestNumbers... but what's *NOT* normal is that archiving validators
have copies of Manifest files that according to the RRDP server never
existed.
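The gaps in the listing above can be derived mechanically. A minimal
sketch in Python, assuming the manifestNumbers are transcribed by hand
from that listing (the function and variable names are illustrative, not
part of any RPKI tool):

```python
def missing_manifest_numbers(observed_hex):
    """Return the hex manifestNumbers absent between the lowest and highest observed."""
    seen = {int(h, 16) for h in observed_hex}
    return [format(n, "X") for n in range(min(seen), max(seen) + 1) if n not in seen]


# manifestNumbers seen in deltas 14223 through 14332, copied from the listing.
OBSERVED = [
    "55CC", "55CD", "55CE", "55CF", "55D0", "55D1", "55D2", "55D3",
    "55D4", "55D5", "55D6", "55D7", "55D9", "55DA", "55DD", "55E0",
    "55E1", "55E2", "55E5", "55E8", "55E9", "55EA", "55EB", "55EC",
]

gaps = missing_manifest_numbers(OBSERVED)
print(gaps)
# ['55D8', '55DB', '55DC', '55DE', '55DF', '55E3', '55E4', '55E6', '55E7']
print("55E7" in gaps)  # True: the archived Manifest falls in one of the gaps
```

The nine missing values correspond to the nine gaps marked above, and
55E7, the number of the archived Manifest, is one of them.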
This means that either the RRDP XML files have retroactively been
modified (seems unlikely), or a load-balancer is causing different RPKI
validators to end up at different backends that all pretend to have the
same RRDP Session ID but in reality are not synchronized with each
other (more likely), or something else.
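One way to test the load-balancer hypothesis is to compare
notification.xml documents fetched from different vantage points (or at
different times). A rough sketch, assuming the notification layout from
RFC 8182, where each delta element carries a serial and a hash; the
function names are made up for illustration:

```python
import xml.etree.ElementTree as ET

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def delta_hashes(notification_xml):
    """Return (session_id, {serial: hash}) for one notification.xml document."""
    root = ET.fromstring(notification_xml)
    hashes = {
        int(d.get("serial")): d.get("hash").lower()
        for d in root.iter(RRDP_NS + "delta")
    }
    return root.get("session_id"), hashes


def conflicting_serials(notification_a, notification_b):
    """List serials that both documents announce under the same session_id
    but with different delta hashes."""
    session_a, hashes_a = delta_hashes(notification_a)
    session_b, hashes_b = delta_hashes(notification_b)
    if session_a != session_b:
        return []  # distinct sessions: a client would simply resynchronize
    common = hashes_a.keys() & hashes_b.keys()
    return sorted(s for s in common if hashes_a[s] != hashes_b[s])
```

A non-empty result would mean that two backends claim the same session
and serial for different content, which is what unsynchronized nodes
behind a load-balancer would look like. An empty result proves nothing
by itself; the two fetches may simply have hit the same backend.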
The RRDP deltas are a journal in which ADD/REMOVE/UPDATEs are
registered: it should not have been possible to have obtained a validly
signed copy of ff14e9055d5afaa37fbe20f4a26bd13c8f18d79a.mft with number
55E7 without it showing up in the LACNIC RRDP XML files - yet somehow
this happened.
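Checking whether an archived object ever appeared in that journal can be
sketched as follows. This assumes the delta layout from RFC 8182
(publish elements carrying a uri attribute and base64 content), and
assumes that the "Hash identifier" shown earlier is the base64-encoded
SHA-256 digest of the file. The function names are hypothetical:

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def published_digests(delta_xml, uri):
    """Return base64 SHA-256 digests of every object published at `uri` in one delta."""
    root = ET.fromstring(delta_xml)
    digests = []
    for element in root.iter(RRDP_NS + "publish"):
        if element.get("uri") == uri:
            # b64decode ignores the line breaks commonly found in RRDP payloads.
            content = base64.b64decode(element.text or "")
            digest = hashlib.sha256(content).digest()
            digests.append(base64.b64encode(digest).decode("ascii"))
    return digests


def appears_in_journal(delta_docs, uri, wanted_digest):
    """True if any delta document published an object with this digest at `uri`."""
    return any(wanted_digest in published_digests(doc, uri) for doc in delta_docs)
```

With the delta files read as bytes, the uri set to the Manifest's rsync
location, and wanted_digest set to the Hash identifier of the archived
copy, a False result over the whole delta range would match what is
described above: the object was obtained, but the journal never
recorded it.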
I've made a copy of all relevant RRDP XML files available here:
https://chloe.sobornost.net/~job/lacnic-outage-may-2023/
What is wrong with the LACNIC RRDP service? Four incidents in less than a
month.
Kind regards,
Job