RPKI Time-of-Flight: Tracking Delays in the Management, Control, and Data Planes
What is the life cycle of Resource Public Key Infrastructure (RPKI) data used to secure Internet routing? More specifically, how long does a Route Origin Authorization (ROA) take to propagate, and how quickly does it actually affect Internet routing and reachability?
These are questions we would love to answer, given that changes on the RPKI management plane can affect how traffic flows. To answer them, we dissect the stages in the life of RPKI data.
Below is a summary of the RPKI lifecycle and our findings.
Key points:
- Creation times vary significantly across the Regional Internet Registries (RIRs), ranging from a few minutes to over an hour for new ROAs to reach the publication points.
- High publication delays were initially observed for ARIN and LACNIC due to a time zone issue. The problem has been reported and is now fixed. Observed delays are usually less than 20 minutes.
- Relying Party (RP) delay represents the most time-consuming step observed in ROA processing.
- Deleting ROAs takes longer to be reflected in BGP, as routers explore alternate routes that have not yet been invalidated.
ROV supply chain
Publishing ROAs involves a large number of players, is not instantaneous, and is often dominated by ad hoc administrative decisions.
The process starts when a resource holder asks an RIR to create or update the RPKI information for its prefixes. The ROAs and other metadata files (manifests, CRLs) are then placed in public repositories called publication points.
RPs periodically fetch and validate all the objects from the global RPKI repositories, after which they produce a list of Validated ROA Payloads (VRPs) that routers use to verify incoming BGP announcements. Operators performing Route Origin Validation (ROV-enabled ASes, green in Figure 1) fetch these VRPs and use them to update their routers. Only then do the changes appear on the data plane, as routing announcements are either accepted or dropped by ROV-enabled ASes.
Figure 1 — Data flow from the creation of a ROA by the prefix holder to the corresponding BGP updates recorded at the route collectors (RIS / RouteViews). The red labels on the left show the points at which time measurements were taken.
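The measurement points in Figure 1 split the end-to-end delay into stages. A minimal sketch of that bookkeeping, assuming one timestamp per stage: the stage names (`query`, `publication`, `rp`, `bgp`) and the `stage_delays` helper are hypothetical, and the timestamps below are arbitrary values for illustration, not measurements from the study.

```python
from datetime import datetime, timedelta

# Hypothetical names for the measurement points of Figure 1, in pipeline order.
STAGES = ["query", "publication", "rp", "bgp"]


def stage_delays(ts):
    """Per-stage delays from a dict mapping stage name -> datetime."""
    delays = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        # Delay of each stage relative to the one before it.
        delays[f"{prev}->{cur}"] = ts[cur] - ts[prev]
    # End-to-end delay from the user query to the BGP update.
    delays["total"] = ts[STAGES[-1]] - ts[STAGES[0]]
    return delays


# Arbitrary timestamps, for illustration only (not measured values).
example = {
    "query": datetime(2022, 5, 1, 12, 0),
    "publication": datetime(2022, 5, 1, 12, 4),
    "rp": datetime(2022, 5, 1, 12, 14),
    "bgp": datetime(2022, 5, 1, 12, 30),
}
for stage, delay in stage_delays(example).items():
    print(stage, delay)
```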
Delay differences between ROA creation and deletion
Each of the above steps is common to all RIRs and ROV-enabled ASes, but each may perform these steps at different time intervals and frequencies.
In our study, we found that RIRs usually publish new RPKI information within five minutes, except APNIC, which is on average ten minutes slower (Table 1, column 3). We also observed significant disparities in ISPs’ reaction time to new RPKI information, ranging from a few minutes to one hour.
Table 1 — ROA creation median delays (IPv6 in parentheses).
When deleting ROAs, we found the delay to be significantly longer (Table 2), except for ARIN and LACNIC (we explain why these differ below). For ROA revocation, we observed that the delay between ROA deletion and unreachability varies depending on the topology. Again, BGP delays are significantly higher for ROA deletion than for ROA creation. For example, the BGP delay for unreachability goes up to 51 minutes for IPv4 and 56 minutes for IPv6, and we rarely observe short BGP delays. We propose two possible causes for this:
- Multiple RP caches: ASes using multiple RP caches (for redundancy) are likely to react significantly slower to ROA deletion than to ROA creation.
- BGP path hunting (Figure 3): in some cases, we observed that the AS path between the RIPE Atlas probe and the destination changed before the destination became unreachable.
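The first cause can be illustrated under one assumption: that the router merges (takes the union of) the VRPs it receives from all of its caches. Router behaviour with several caches is implementation-specific, so treat this as a hypothetical model. Under it, a new ROA takes effect as soon as the fastest cache has it, while a deleted ROA lingers until the slowest cache drops it. The helper `router_view_change` and the cache update times are made up for illustration.

```python
def router_view_change(update_min, event, horizon=180):
    """First minute at which a router that unions the VRPs of several RP
    caches reflects a ROA creation or deletion (hypothetical model).

    update_min[i] is the minute at which cache i applies the change.
    """
    if event not in ("create", "delete"):
        raise ValueError("event must be 'create' or 'delete'")
    for t in range(horizon + 1):
        applied = [t >= u for u in update_min]
        if event == "create" and any(applied):
            # The union gains the new VRP as soon as ANY cache carries it.
            return t
        if event == "delete" and all(applied):
            # The union loses the old VRP only once EVERY cache has dropped it.
            return t
    return None  # change not visible within the horizon


caches = [3, 12, 25]  # hypothetical per-cache update times, in minutes
print(router_view_change(caches, "create"))  # 3
print(router_view_change(caches, "delete"))  # 25
```

In this model, adding caches can only shorten the creation delay and lengthen the deletion delay, which matches the direction of the asymmetry described above.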
Table 2 — ROA deletion median delays (IPv6 in parentheses).
Figure 2 — Difference between ROA creation and deletion.
ARIN and LACNIC timezone issues
Before April 2022, the publication delay for ARIN and LACNIC could last several hours due to a time zone conversion problem.
Both RIRs intended to set ROAs’ NotBefore values to midnight UTC, but instead, ARIN was setting this value to 04:00 UTC or 05:00 UTC (corresponding respectively to 00:00 Eastern Daylight Time and Eastern Standard Time) and LACNIC to 03:00 UTC (corresponding to 00:00 Uruguay Standard Time).
For example, a request at 01:00 UTC to create a ROA at LACNIC would produce a ROA with its NotBefore value set to 03:00 UTC, so the ROA would not be valid for the two hours following its creation. Our experiment revealed that the publication point wisely does not publish such ’not-yet-valid’ ROAs to the repository, which delays their availability to RPs. The same holds for ARIN.
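The arithmetic of this bug can be reproduced with fixed UTC offsets. The sketch below is a hypothetical reconstruction consistent with the example above, not the RIRs’ actual code: it assumes the intended NotBefore is midnight UTC of the request’s date, that the faulty code read that same midnight as local time, and that the publication point withholds a ROA until NotBefore has passed. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed offsets (assumption: DST is handled by choosing EDT or EST explicitly).
UYT = timezone(timedelta(hours=-3), "UYT")  # Uruguay Standard Time
EDT = timezone(timedelta(hours=-4), "EDT")  # US Eastern Daylight Time
EST = timezone(timedelta(hours=-5), "EST")  # US Eastern Standard Time


def intended_not_before(request: datetime) -> datetime:
    """Intended behaviour: midnight UTC of the request's UTC date."""
    d = request.astimezone(UTC).date()
    return datetime(d.year, d.month, d.day, tzinfo=UTC)


def buggy_not_before(request: datetime, local_tz: timezone) -> datetime:
    """Hypothetical bug: same calendar date, but midnight read as local time."""
    d = request.astimezone(UTC).date()
    return datetime(d.year, d.month, d.day, tzinfo=local_tz).astimezone(UTC)


def publication_wait(request: datetime, not_before: datetime) -> timedelta:
    """How long a not-yet-valid ROA is withheld from the repository."""
    return max(not_before - request, timedelta(0))


request = datetime(2022, 3, 1, 1, 0, tzinfo=UTC)  # arbitrary date, 01:00 UTC
print(publication_wait(request, intended_not_before(request)))  # 0:00:00
print(publication_wait(request, buggy_not_before(request, UYT)))  # 2:00:00
print(publication_wait(request, buggy_not_before(request, EST)))  # 4:00:00
```

Under these assumptions, requests made after 03:00 UTC (LACNIC) or after 04:00/05:00 UTC (ARIN) get a NotBefore that is already in the past, so only requests made shortly after midnight UTC would have been held back.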
We reported this issue to both ARIN and LACNIC, who promptly acknowledged and fixed the problem.
Data Plane Measurements
As more ROAs are created to protect prefixes from being mis-originated, one wonders how long it takes for the effect of RPKI changes to appear in the data plane.
To measure this, we used a ‘toggling ROAs’ mechanism: an “invalidating” ROA with AS666 as the origin keeps the RPKI status of our advertised prefixes “invalid”. We then toggled the RPKI status of the test prefixes between “valid” and “invalid” by either creating a “validating” ROA with the proper authorized origin AS or deleting that ROA.
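Toggling works because of how a route’s origin validation state is derived from VRPs (RFC 6811): a route covered by no VRP is NotFound; a covered route is Valid only if some covering VRP matches both its origin AS and its maximum length; otherwise it is Invalid. The sketch below applies that rule. The prefix 192.0.2.0/24 and origin AS64500 are documentation values standing in for the study’s actual test prefixes and AS, and `rov_state` is an illustrative helper, not an RP software API.

```python
import ipaddress
from typing import List, NamedTuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class VRP(NamedTuple):
    prefix: Network
    max_length: int
    asn: int


def rov_state(route_prefix: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Origin validation state of a BGP route, following RFC 6811's rules."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        if route.version != vrp.prefix.version:
            continue  # subnet_of() rejects mixed IPv4/IPv6 comparisons
        if route.subnet_of(vrp.prefix):
            covered = True  # at least one VRP covers the announced prefix
            if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
                return "valid"
    return "invalid" if covered else "not-found"


net = ipaddress.ip_network("192.0.2.0/24")
invalidating = VRP(net, 24, 666)   # the permanent AS666 ROA
validating = VRP(net, 24, 64500)   # the toggled ROA for the real origin

print(rov_state("192.0.2.0/24", 64500, [invalidating]))              # invalid
print(rov_state("192.0.2.0/24", 64500, [invalidating, validating]))  # valid
print(rov_state("198.51.100.0/24", 64500, [invalidating]))           # not-found
```

Because the AS666 ROA is always present, deleting the validating ROA sends the prefix back to “invalid” rather than “not-found”, so ROV-enabled ASes drop it instead of treating it as unknown.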
To test data plane reachability, and its delay, for prefixes with toggling ROAs, we ran traceroutes every 15 minutes from RIPE Atlas probes in six different ASes. When a “validating” ROA is created, the delay between the user query and data plane reachability is similar to the BGP delay, with medians ranging from 23 minutes (RIPE NCC) to 50 minutes (APNIC).
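Since traceroutes run only every 15 minutes, each measured delay is known only to within one sampling interval. A minimal sketch of how bounds could be derived from such samples, assuming each sample is a (timestamp, destination reached) pair; the function `delay_bounds` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (traceroute start time, whether the destination was reached)
Sample = Tuple[datetime, bool]


def delay_bounds(event: datetime, samples: List[Sample],
                 expect_reachable: bool = True
                 ) -> Optional[Tuple[timedelta, timedelta]]:
    """Bound the event-to-change delay observed by periodic traceroutes.

    expect_reachable=True waits for the first reachable sample (ROA creation);
    False waits for the first unreachable one (ROA deletion).
    Returns (lower, upper), or None if the change was never observed.
    """
    last_before_change = event
    for t, reached in sorted(samples):
        if t < event:
            continue  # ignore samples taken before the ROA change
        if reached == expect_reachable:
            # The change happened after the previous sample, at or before t.
            return last_before_change - event, t - event
        last_before_change = t
    return None


query = datetime(2022, 5, 1, 12, 0)  # hypothetical ROA creation request
samples = [(query + timedelta(minutes=m), m >= 45) for m in (15, 30, 45, 60)]
print(delay_bounds(query, samples))
```

Here the destination first answers at the 45-minute sample, so the delay lies between 30 and 45 minutes; a median computed from such samples inherits up to one interval of uncertainty.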
Figure 3 — Effects of ROA creation/deletion on the data plane. We observe BGP path hunting with AS path changes.
RP Delay: Possible Bottleneck?
RPs periodically fetch RPKI data from publication points.
We found that the RP delay represents the most time-consuming step observed in ROA processing. The delay we observed between ROA creation and the time when an RP validates the new ROA is usually less than 15 minutes for most RIRs. This is 10 minutes more than the publication delay, and these extra 10 minutes consist mainly of:
- Polling interval (5 minutes of delay on average)
- Downloading time from all Certification Authorities (4 minutes)
- ROA processing time (1 minute)
Other factors can also affect RP delays, such as varying download times due to network conditions, publication point time-outs, or lack of multi-threading support in some RP software.
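The breakdown above can be written as a small additive model. As an illustration only, it assumes that the RP polls on a fixed 10-minute period and that ROAs appear at uniformly random times between polls, so the expected polling wait is half the period (the 5 minutes listed above). The download and processing times are the figures quoted in the text; the class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RPDelayModel:
    poll_period_min: float  # assumed fixed RP polling period
    download_min: float     # fetching objects from all publication points
    processing_min: float   # validating objects and emitting VRPs

    def expected_poll_wait(self) -> float:
        # A ROA published at a uniformly random moment between two polls
        # waits, on average, half a polling period before being fetched.
        return self.poll_period_min / 2

    def expected_rp_delay(self) -> float:
        """Expected time from publication to the RP emitting the new VRP."""
        return self.expected_poll_wait() + self.download_min + self.processing_min

    def creation_to_validation(self, publication_delay_min: float) -> float:
        """Expected time from the ROA request to the RP validating it."""
        return publication_delay_min + self.expected_rp_delay()


model = RPDelayModel(poll_period_min=10, download_min=4, processing_min=1)
print(model.expected_rp_delay())        # 10.0
print(model.creation_to_validation(5))  # 15.0
```

With the roughly five-minute publication delay reported for most RIRs, the model lands on the 15-minute creation-to-validation figure; slower downloads or repository time-outs would show up as a larger `download_min` term.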
Read more
Romain Fontugne, Amreesh Phokeer, Cristel Pelsser, Kevin Vermeulen, Randy Bush (2023). RPKI Time-of-Flight: Tracking Delays in the Management, Control, and Data Planes. Passive and Active Measurement.
This study was partly funded by the MANRS Fellowship program and was a collaboration between IIJ Research Lab, Internet Society, UCLouvain, LAAS-CNRS, and Arrcus Inc.