RRSIG and TTL

RRSIG and TTL

Scott Nicholas
I was hoping someone's experience could save me as I've spent too much time down this rabbit hole.

The primary nameserver is behind a cache/proxy on an enterprise network, such that all external traffic hits the proxy. The zone went bogus. I blame policy, but on further inspection 2 of the 3 proxies had differing TTLs between the DNSKEY and its RRSIG.

I dove into the RFCs but not yet the code. I believe any security-aware system would throw out the DNSKEY together with the RRSIG.

I suspect that the signature hit the absolute time, got a fresh copy, and the DNSKEY stuck around another 2 days (1 week TTL). Now if the system wasn't security aware, I'm not sure how the TTL became unmatched but I can see that it could happen. I guess?

The questions

- Is this system broken?
- Can I work around it with creative policy / TTLs?
- Can anyone explain other cases where these can end up with mismatched TTLs?

A low TTL would minimize it, but the appliance doesn't allow direct configuration of the DNSKEY TTL.

Thanks for your input
Scott

_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users

Re: RRSIG and TTL

Tony Finch
Scott Nicholas <[hidden email]> wrote:
>
> Primary nameserver is behind a cache/proxy on enterprise network such that
> all external traffic hits this. Zone went bogus. I blame policy but on
> further inspection 2/3 proxies had differing TTL between the DNSKEY and its
> RRSIG.

Hmm, that's suspicious. In the DNS, an RRset is an atomic unit and every
record must have the same TTL. In DNSSEC the RRSIG is part of the RRset,
so if there is a difference between the DNSKEY TTL and the RRSIG(DNSKEY)
TTL there is a bug, and it might be bad enough to cause validation
failures.
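
One way to spot this symptom is to compare TTLs in the answer section of a dig +dnssec response. What follows is only a minimal sketch, assuming the usual dig answer-line layout (owner, TTL, class, type, rdata). The function name and the sample records are made up for illustration, with the key and signature data elided.

```python
def ttl_mismatches(answer_text):
    """Return {rrtype: set of TTLs} for types whose records and covering RRSIGs disagree."""
    ttls = {}
    for line in answer_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comments
        fields = line.split()
        if len(fields) < 5:
            continue
        ttl, rrtype = int(fields[1]), fields[3]
        # The first rdata field of an RRSIG names the type it covers.
        covered = fields[4] if rrtype == "RRSIG" else rrtype
        ttls.setdefault(covered, set()).add(ttl)
    return {t: s for t, s in ttls.items() if len(s) > 1}


# Hypothetical answer section: DNSKEY with 2 days left, RRSIG freshly cached.
SAMPLE = """
example.com. 172800 IN DNSKEY 257 3 13 (key-data-elided)
example.com. 604800 IN RRSIG DNSKEY 13 2 604800 (signature-elided)
example.com. 3600 IN A 192.0.2.1
example.com. 3600 IN RRSIG A 13 2 3600 (signature-elided)
"""

print(ttl_mismatches(SAMPLE))
```

Only the DNSKEY entry is reported here, because the A record and its RRSIG agree on their TTL.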

It sounds like you have a good idea of what the bug might be, and my guess
is probably the same. If we're right you will be able to provoke
validation failures as follows:

  * query a (sacrificial!) record via the proxy with DO=0 (dig +nodnssec)
    to populate its cache with an RRset maybe lacking RRSIGs
    (that's the guess / bug)

  * change the sacrificial record on the primary

  * query again via the proxy with DO=1 (dig +dnssec) before the old TTL expires

If our guess is right, you'll get the old record with the new RRSIG and
validation will fail.
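
The steps above can be sketched as a small helper that builds the two dig invocations. This is a rough sketch rather than a tested procedure: the proxy address, the record name, and the TXT record type are placeholders, and the change on the primary still has to be made by hand between the two queries.

```python
def repro_commands(proxy, name, rrtype="TXT"):
    """Build the two dig command lines for the DO=0 / DO=1 cache test."""
    base = ["dig", "@" + proxy, name, rrtype, "+noall", "+answer"]
    return {
        # Step 1: prime the proxy cache without the DO bit.
        "prime_cache_do0": base + ["+nodnssec"],
        # Step 3: ask again with the DO bit before the old TTL expires.
        "requery_do1": base + ["+dnssec"],
    }


# Placeholder proxy address and sacrificial record name.
cmds = repro_commands("192.0.2.53", "sacrificial.example.com")
for step, argv in cmds.items():
    print(step + ":", " ".join(argv))
```

If the second answer still carries the old rdata alongside a new RRSIG, that supports the guess.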

> I suspect that the signature hit the absolute time, got a fresh copy, and
> the DNSKEY stuck around another 2 days (1 week TTL). Now if the system
> wasn't security aware, I'm not sure how the TTL became unmatched but I can
> see that it could happen. I guess?

Yes.

But there's another issue that can make this bug worse: I think the 7 day
TTL on your DNSKEY records is too long.

BIND's default sig-validity-interval is 30 days, and signatures are
regenerated 1/4 of the interval before expiry, i.e. 7.5 days.

If you want to avoid serving bogus signatures, you need to add together
the zone's SOA expire interval, the propagation delay between your primary
server and your public authoritative servers, and the maximum TTL of any
record in your zone. This sum must be less than the signature regeneration
interval (7.5 days by default).
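
As a worked version of that sum, here is a minimal sketch. The function and parameter names are invented for illustration, and the example inputs are assumptions, not values from the zone under discussion.

```python
DAY = 86400  # seconds


def resign_margin(validity_days=30, soa_expire=0, propagation=0, max_ttl=0):
    """Seconds of slack before stale signatures could be served; negative is bad."""
    # Signatures are regenerated 1/4 of the validity interval before expiry.
    regen_interval = validity_days * DAY // 4  # 7.5 days for the 30-day default
    return regen_interval - (soa_expire + propagation + max_ttl)


# A 7 day maximum TTL alone leaves only half a day of slack.
print(resign_margin(max_ttl=7 * DAY))
# Adding a 7 day SOA expire and a minute of propagation goes well negative.
print(resign_margin(soa_expire=7 * DAY, propagation=60, max_ttl=7 * DAY))
```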

In practice you will never get anywhere near the expiry interval unless
things are broken, and NOTIFY means the propagation delay is negligible.
So in the real world the important number is how good you are at
monitoring zone propagation delays and fixing things if they become
non-negligible. To allow for SNAFUs, budget about a week for this, which
is roughly the traditional zone expiry time.

The logistics are a bit different if you have a reverse proxy in your
authoritative server setup, but I hope you get the idea of how to think
about making sure your DNSSEC signatures are fresh enough.

The other interesting number is the TTL. When choosing TTLs there are
roughly two kinds of records, which I call infrastructure records and,
uuuh, I don't have a word for the others - user records? application
records? Anyway, infrastructure records are the irrelevant crap a resolver
needs in order to get the answers that users actually care about, and of
course this irrelevant crap is the tricky stuff that DNS admins have to
work with: NS records, A and AAAA records of DNS servers, DNSKEY records,
DS records.

The TTL for infrastructure records should be relatively long, to minimize
the amount of irrelevant crap that resolvers have to deal with, i.e. to
reduce the tail latency experienced by end users while resolvers go off to
look at the infrastructure. You start hitting diminishing returns for
infrastructure TTLs after about 24 hours - delegation records in TLDs
typically have TTLs of 24h or 48h, and that's a reasonable length for your
in-zone infrastructure records too.

Any longer than that and you are creating pain for yourself any time you
have to do a nameserver migration or a DNSSEC rollover. With 24h TTLs
you'll need to allow a week for a significant move; for a 7 day TTL you
might be looking at a month of faff to deal with something that's often
tricky and perhaps unexpectedly urgent.

For other records, I find an hour is a reasonable balance between decent
cache performance and not-too-annoying update delays. I don't have records
with enough churn to justify shorter TTLs but your mileage may vary.

(There are scientific measurements of DNS TTL vs latency that agree
reasonably well with my suggestions, so there's a bit more to them than
convenient round numbers!)
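
The rule of thumb above (24 hours for infrastructure records, an hour for everything else) can be written down as a tiny helper. This is just a sketch of the suggestion in this message; the names are made up, and the caller has to say whether an A or AAAA record belongs to a DNS server.

```python
HOUR = 3600  # seconds

# Record types that are always infrastructure, per the list above.
INFRA_TYPES = {"NS", "DNSKEY", "DS"}


def suggested_ttl(rrtype, is_nameserver_address=False):
    """Return a suggested TTL in seconds for a record of the given type."""
    rrtype = rrtype.upper()
    if rrtype in INFRA_TYPES:
        return 24 * HOUR
    # Address records count as infrastructure only for DNS servers.
    if rrtype in {"A", "AAAA"} and is_nameserver_address:
        return 24 * HOUR
    return HOUR


print(suggested_ttl("DNSKEY"))
print(suggested_ttl("A"))
print(suggested_ttl("AAAA", is_nameserver_address=True))
```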

> A low TTL would minimize it but appliance doesn't allow direct
> configuration for DNSKEY TTL.

GOOD GRIEF :-(

Tony.
--
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
Biscay, Fitzroy, Sole: East or northeast 4 to 6, occasionally 7 later, but
cyclonic 3 to 5 in south Fitzroy and south Biscay. Moderate or rough, but
slight in southeast Biscay, becoming rough later in Sole. Thundery showers in
Biscay and Fitzroy. Good, occasionally poor in Biscay and Fitzroy.

Re: RRSIG and TTL

Scott Nicholas
I was just thinking of updating this. The auth server on our end is Infoblox, with few knobs for timing (it's not awful but could definitely be better). The caching resolver is BIND. I wasn't initially aware of the transparent cache in between. That must be the thing with the implementation bug.

It's not mine to open a case against but I plan to eventually provide my own test results. I'll add your method to the list.

Thank you.

Scott

On Thu, Sep 17, 2020, 6:26 PM Tony Finch <[hidden email]> wrote:
> [...]