Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade


Gareth Parks
Hi,

I have three CentOS 7 servers running BIND as internal resolvers. A package update upgrades them from 0:9.9.4-74.el7_6.2 to 32:9.11.4-16.P2.el7_8.2. Since applying this upgrade to one of the servers, there has been a notable increase in retry and timeout errors, as measured by data collected from the statistics channel. Where the number of retry and timeout errors was previously < 10 per 2 minutes, I now regularly see spikes of > 50 per 2 minutes, while the error levels on the other two servers have remained unchanged. When I downgrade the server back to 9.9.4, the error rate drops again.
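For context, the counters described above come from named's statistics channel. A minimal sketch of enabling it, assuming a loopback-only listener on port 8053 (the address and port are illustrative choices, not taken from this setup):

```
// named.conf fragment (illustrative; address and port are assumptions)
statistics-channels {
    // Serve statistics over HTTP on loopback only, and allow only local clients.
    inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};
```

The resolver section of the output should include counters for retries, timeouts and truncated responses. The exact counter names and output format may differ between 9.9 and 9.11, so it is worth checking that graphs for both versions are built from the same counters.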

I increased the log level for the query-errors log and observed that the number of entries on the upgraded and non-upgraded servers was about the same, so there doesn't appear to be an increase in logged errors.
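A logging setup of the kind described might look like the sketch below. The channel name, file path, rotation settings and debug level are all assumptions for illustration; only the `query-errors` category name comes from the message above.

```
// named.conf fragment (illustrative; channel name, path and level are assumptions)
logging {
    channel query_errors_log {
        file "/var/named/data/query-errors.log" versions 3 size 20m;
        severity debug 2;       // higher debug levels add more detail per failed query
        print-time yes;
        print-category yes;
        print-severity yes;
    };
    category query-errors { query_errors_log; };
};
```

Depending on the version, debug-severity output may only appear while the server is in debugging mode (for example after `rndc trace`), so an unchanged entry count is worth double-checking against the effective debug level.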

I'm not sure whether I'm simply not looking in the correct place to identify the source of the retries/timeouts. The other possibility that occurred to me is that what those retry/timeout counters represent changed between the two versions, so the increased rate is not a problem but just reflects different information.

Gareth
 
   
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users

Re: Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade

Mark Andrews
Well, BIND 9.11+ supports DNS COOKIE by default, and there are some servers that mishandle EDNS requests with a DNS COOKIE option present.  Unknown EDNS options are supposed to be ignored, but there are servers/firewalls that just drop such queries.  Others return FORMERR, others return NXDOMAIN when there is an answer without the option being present, others echo unknown options, and others still send back a DNS COOKIE response but fail to correctly copy the client cookie part to the response.

https://ednscomp.isc.org/compliance/ts/govfull.optfail.html shows how servers for the .GOV zone behave when presented with an unknown EDNS option.  Other datasets are similar.

You can use "server <prefix> { send-cookie no; };" to work around known broken servers.
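As a sketch of that workaround, the fragment below disables cookies for one address and for one prefix. Both values are placeholders from the documentation address ranges, not servers known to be broken:

```
// named.conf fragment (illustrative; the address and prefix are placeholders)

// Stop sending DNS COOKIE to a single misbehaving server.
server 192.0.2.53 {
    send-cookie no;
};

// Stop sending DNS COOKIE to every server within a prefix.
server 198.51.100.0/24 {
    send-cookie no;
};
```

Scoping the exception per server or prefix keeps cookies enabled for every other server the resolver talks to.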

Mark

> On 4 May 2020, at 11:21, Gareth Parks <[hidden email]> wrote:
> [...]

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: [hidden email]


Re: Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade

Gareth Parks

I set send-cookie no; globally to test this theory, but the pattern of retries and timeouts continued. However, I was able to determine that the retries/timeouts follow the same pattern as the resolver statistic for truncated responses received, which suggests they are related.
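For reference, a global setting of this kind would normally sit in the `options` block. This is a minimal sketch, assuming no per-server or per-view `send-cookie` overrides exist elsewhere in the configuration:

```
// named.conf fragment (illustrative)
options {
    // Do not send DNS COOKIE options to any upstream server.
    send-cookie no;
};
```

After a reload (for example with `rndc reconfig`), outgoing queries should no longer carry a COOKIE option, which makes this a quick way to rule cookies in or out as the cause.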


When I look at the same graph on one of the other servers, it doesn't show any truncated responses but instead shows a lot of NXDOMAIN errors, which the upgraded server does not have.


Gareth



From: Mark Andrews <[hidden email]>
Sent: Monday, 4 May 2020 12:13 PM
To: Gareth Parks
Cc: [hidden email]
Subject: Re: Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade
 
[...]

