Bind suddenly starts responding clients with servfail

Søren Andersen
Hello List,

I'm running a few BIND servers, but lately one of my servers suddenly starts responding to clients with SERVFAIL for every request, and BIND doesn't respond to the rndc or statistics interface anymore.

The logs for my client-channel show this:
25-Apr-2020 21:52:04.501 client @XX XX.37#2921 (google.dk): no more recursive clients (1000/900/1000): quota reached
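
The three numbers in that log line can be read as used/soft/hard recursive clients, which fits the 900 soft limit mentioned in the reply below; that reading is an assumption, not something taken from BIND documentation. A minimal Python sketch (the regex and names are hypothetical) that pulls the counters out of such a line:

```python
import re
from typing import NamedTuple, Optional

# Matches the tail of a BIND client log line such as
# "no more recursive clients (1000/900/1000): quota reached".
# Reading the three numbers as used/soft/hard is an assumption.
QUOTA_RE = re.compile(r"no more recursive clients \((\d+)/(\d+)/(\d+)\): (.+)$")


class RecursionQuota(NamedTuple):
    used: int
    soft: int
    hard: int
    reason: str

    @property
    def at_hard_limit(self) -> bool:
        """True when every recursion slot is taken."""
        return self.used >= self.hard


def parse_quota_line(line: str) -> Optional[RecursionQuota]:
    """Extract the quota counters from one log line, or None if absent."""
    m = QUOTA_RE.search(line.rstrip())
    if m is None:
        return None
    used, soft, hard = (int(g) for g in m.group(1, 2, 3))
    return RecursionQuota(used, soft, hard, m.group(4))


if __name__ == "__main__":
    line = ("25-Apr-2020 21:52:04.501 client @XX XX.37#2921 (google.dk): "
            "no more recursive clients (1000/900/1000): quota reached")
    quota = parse_quota_line(line)
    print(quota, "at hard limit:", quota.at_hard_limit if quota else None)
```

Scanning a log file with this and checking whether `used` ever drops below `hard` would show whether slots are being released at all.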

I've removed all DNS traffic from the server, yet the quota is still reached after 6+ hours.

Do you have any clue what this is about, or any suggestions on where to look for further information?

I'm running BIND 9.16.1 on CentOS 7:

named -V
BIND 9.16.1 (Stable Release) <id:d497c32>
running on Linux x86_64 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/opt/isc/isc-bind/root/usr' '--exec-prefix=/opt/isc/isc-bind/root/usr' '--bindir=/opt/isc/isc-bind/root/usr/bin' '--sbindir=/opt/isc/isc-bind/root/usr/sbin' '--sysconfdir=/etc/opt/isc/isc-bind' '--datadir=/opt/isc/isc-bind/root/usr/share' '--includedir=/opt/isc/isc-bind/root/usr/include' '--libdir=/opt/isc/isc-bind/root/usr/lib64' '--libexecdir=/opt/isc/isc-bind/root/usr/libexec' '--localstatedir=/var/opt/isc/isc-bind' '--sharedstatedir=/var/opt/isc/isc-bind/lib' '--mandir=/opt/isc/isc-bind/root/usr/share/man' '--infodir=/opt/isc/isc-bind/root/usr/share/info' '--disable-static' '--enable-dnstap' '--with-pic' '--with-gssapi' '--with-json-c' '--with-libtool' '--with-libxml2' '--without-lmdb' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--with-python' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS= -L/opt/isc/isc-bind/root/usr/lib64' 'PKG_CONFIG_PATH=:/opt/isc/isc-bind/root/usr/lib64/pkgconfig:/opt/isc/isc-bind/root/usr/share/pkgconfig'
compiled by GCC 4.8.5 20150623 (Red Hat 4.8.5-39)
compiled with OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017
linked to OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017
compiled with libxml2 version: 2.9.1
linked to libxml2 version: 20901
compiled with json-c version: 0.11
linked to json-c version: 0.11
compiled with zlib version: 1.2.7
linked to zlib version: 1.2.7
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled

/Søren

_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users

Re: Bind suddenly starts responding clients with servfail

Frey, Rick E

Recursive clients are the lookups your nameserver performs on behalf of the queries it receives. If your nameserver is running out of recursive clients after you removed “all” traffic, something is still querying it, because BIND won’t spontaneously create recursive lookups. Perhaps something local on the server is generating queries?

You can dump the existing recursive clients with “rndc recursing”. The output is normally written to “named.recursing” in your data directory.
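
Because rndc itself hangs in the situation described in this thread, it can help to wrap “rndc recursing” in a timeout so a script gets a clear answer instead of blocking. A minimal Python sketch; the function name is hypothetical and the 10-second default is an arbitrary choice:

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float = 10.0) -> str:
    """Run a control command and classify the outcome.

    Returns "ok", "failed", "timeout" or "missing". A "timeout" from
    `rndc recursing` is consistent with named not servicing its control
    channel at all.
    """
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # the child is killed if this expires
        )
    except FileNotFoundError:
        return "missing"  # command not installed or not on PATH
    except subprocess.TimeoutExpired:
        return "timeout"  # no answer within `timeout` seconds
    return "ok" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # On success named writes the dump to named.recursing in its data
    # directory; this only reports whether rndc returned.
    print(run_with_timeout(["rndc", "recursing"]))
```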

I suspect your server may be unable to make outbound connections to authoritative servers, which could cause a high number of recursive clients. Note that BIND starts dropping older outstanding recursive lookups once 90% of the recursive-clients limit is reached (900 recursive clients in your case). Thus, a high number of recursive clients does not normally, in itself, result in SERVFAIL for queries.
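
A toy Python model of the soft/hard behaviour described above, to show why a high count alone normally doesn't lead to SERVFAIL. The class and method names are hypothetical, and the detail that a dropped lookup frees its slot only once it actually finishes is an assumption of this sketch, not a statement about BIND internals:

```python
from collections import OrderedDict
from typing import List


class RecursionQuotaModel:
    """Toy recursion quota with soft and hard limits (not BIND's code).

    Above the soft limit, admitting a new lookup also asks the oldest
    outstanding lookup to be cancelled; its slot is freed only when that
    lookup finishes. New queries are refused only at the hard limit.
    """

    def __init__(self, soft: int, hard: int) -> None:
        if not 0 < soft <= hard:
            raise ValueError("need 0 < soft <= hard")
        self.soft = soft
        self.hard = hard
        # Outstanding lookups, oldest first: query id -> name being resolved.
        self.pending: "OrderedDict[int, str]" = OrderedDict()
        self.cancel_requested: List[int] = []

    def admit(self, qid: int, qname: str) -> bool:
        """Try to start a recursive lookup; False means the query is refused."""
        if len(self.pending) >= self.hard:
            return False  # the "quota reached" case
        self.pending[qid] = qname
        if len(self.pending) > self.soft:
            # Mark the oldest lookup that has not already been asked to stop.
            for old_id in self.pending:
                if old_id not in self.cancel_requested:
                    self.cancel_requested.append(old_id)
                    break
        return True

    def finish(self, qid: int) -> None:
        """A lookup completed or was cancelled: release its slot."""
        self.pending.pop(qid, None)
        if qid in self.cancel_requested:
            self.cancel_requested.remove(qid)


if __name__ == "__main__":
    q = RecursionQuotaModel(soft=900, hard=1000)
    # If no lookup ever finishes, cancellation requests pile up and the
    # hard limit is eventually hit.
    admitted = sum(q.admit(i, f"q{i}.example") for i in range(1200))
    print(admitted, len(q.pending), len(q.cancel_requested))
```

In this model the soft limit only keeps the count down if dropped lookups really release their slots; if they never finish, the count climbs to the hard limit and new queries are refused.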

I'm not sure why you’re unable to run rndc commands (locally or remotely?). Perhaps you are out of file descriptors as well?
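
A Linux-only sketch for checking the file-descriptor question: it counts the entries in /proc/&lt;pid&gt;/fd and reads the “Max open files” row of /proc/&lt;pid&gt;/limits. The function names are hypothetical, and reading another process's /proc entries typically requires root or the same user that runs named:

```python
import os
import sys
from typing import Optional, Tuple


def open_fd_count(pid: int) -> int:
    """Number of open file descriptors of `pid` (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """(soft, hard) 'Max open files' limits of `pid`; None means unlimited."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # fields: ["Max", "open", "files", soft, hard, "files"]
                fields = line.split()
                soft, hard = fields[3], fields[4]
                return (None if soft == "unlimited" else int(soft),
                        None if hard == "unlimited" else int(hard))
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used = open_fd_count(pid)
    soft, hard = nofile_limits(pid)
    print(f"pid {pid}: {used} open fds, soft limit {soft}, hard limit {hard}")
```

Run it as root with named's PID as the only argument; an open-fd count at or near the soft limit would support the file-descriptor theory.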

Re: Bind suddenly starts responding clients with servfail

Søren Andersen
The only DNS requests my server is handling now are some monitoring requests, just a few per minute, so not much.

Even the rndc command cannot get any answer from the named process 😕. It looks like named doesn't even handle the incoming rndc connections, since the Recv-Q increases every time I run rndc. (I'm running rndc locally.)

[root@ns-2d ~]# ss -lnt
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     10     128    127.0.0.1:953       *:*

And once again:

[root@ns-2d ~]# ss -lnt
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     11     128    127.0.0.1:953       *:*
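
For a listening TCP socket on Linux, ss reports in Recv-Q the connections that have completed the handshake but have not yet been accept()ed, and in Send-Q the backlog limit. A Recv-Q on 127.0.0.1:953 that grows with every rndc call therefore suggests named is not accepting control connections at all. A small Python sketch (hypothetical helper names) that compares two `ss -lnt` snapshots:

```python
from typing import Dict, List, Tuple


def listen_queues(ss_output: str) -> Dict[str, Tuple[int, int]]:
    """Map 'addr:port' -> (Recv-Q, Send-Q) for LISTEN rows of `ss -lnt`."""
    queues: Dict[str, Tuple[int, int]] = {}
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that isn't a listening socket.
        if len(fields) >= 5 and fields[0] == "LISTEN":
            queues[fields[3]] = (int(fields[1]), int(fields[2]))
    return queues


def growing_backlogs(before: str, after: str) -> List[str]:
    """Listening addresses whose Recv-Q grew between two snapshots."""
    old, new = listen_queues(before), listen_queues(after)
    return [addr for addr, (recv_q, _) in new.items()
            if addr in old and recv_q > old[addr][0]]


if __name__ == "__main__":
    first = ("State Recv-Q Send-Q Local Address:Port Peer Address:Port\n"
             "LISTEN 10 128 127.0.0.1:953 *:*\n")
    second = first.replace("LISTEN 10", "LISTEN 11")
    print(growing_backlogs(first, second))
```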

Outgoing network traffic is working just fine; I've checked several DNS servers with dig, so I don't think that is the problem.
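
A rough Python equivalent of that dig check: it sends one UDP query (message layout per RFC 1035, section 4.1) straight to a server and waits for a reply carrying the same ID. It is a sketch assuming IPv4 and an ASCII name; the function names are hypothetical. Passing an IP literal avoids depending on the local resolver, which may be the very named instance under suspicion:

```python
import socket
import struct


def build_query(qname: str, qtype: int = 1, txid: int = 0x1234,
                rd: bool = True) -> bytes:
    """Build a minimal DNS query message for class IN."""
    flags = 0x0100 if rd else 0x0000  # RD (recursion desired) bit
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label {label!r}")
        question += bytes([len(raw)]) + raw
    # Root label, then QTYPE and QCLASS (1 = IN).
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def probe(server: str, qname: str, timeout: float = 3.0) -> bool:
    """Send one UDP query to server:53; True if a reply with our ID arrives."""
    query = build_query(qname, rd=False)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return False
    return reply[:2] == query[:2]
```

For example, `probe("192.0.2.53", "google.dk")`, where 192.0.2.53 is a documentation placeholder to be replaced with the address of an authoritative server.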

Do you guys have any other suggestions for my problem?

/Søren 


Re: Bind suddenly starts responding clients with servfail

Greg Rivers
On Monday, 27 April 2020 03:59:39 CDT Søren Andersen wrote:

> I'm running a few BIND servers, but lately one of my servers suddenly starts
> responding to clients with servfail for every request from the clients, and
> BIND doesn't respond to the rndc or statistics interface anymore.
>
> My logs for client-channel show me this:
> 25-Apr-2020 21:52:04.501 client @XX XX.37#2921 (google.dk): no more
> recursive clients (1000/900/1000): quota reached
>
> I've removed all the dns traffic from the server, and the quota is still
> reached after 6+ hours?
>
> Do you guys have some clue what all this is about? - Or any suggestions
> where to look for any further information?
>
> I'm running BIND 9.16.1 on CentOS 7:
>
I've had the very same thing happen twice in the past two weeks on different production recursive servers running BIND 9.16.2 on FreeBSD. I've opened a ticket with ISC, and they are looking into it. Can you share any additional information that might aid troubleshooting?

If anyone else experiences this, please report it.

--
Greg


Re: Bind suddenly starts responding clients with servfail

Søren Andersen
Hi Greg,

I'm glad I'm not the only one having this issue. Currently I have no information beyond what's already been mentioned in this thread.

Do you have a link to the ticket you created?

/Søren


Re: Bind suddenly starts responding clients with servfail

Greg Rivers
On Friday, 8 May 2020 16:27:35 CDT Søren Andersen wrote:
> I'm glad I'm not the only one having this issue. Currently I have no
> information beyond what's already been mentioned in this thread.
>
> Do you have a link to the ticket you created?
>
<https://gitlab.isc.org/isc-projects/bind9/-/issues/1859>

--
Greg


ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.