NXDOMAIN problems

Boylan, Ross
I have been experiencing NXDOMAIN errors persistently, though not 100% of the time, for a machine I am trying to reach.  The queries worked OK before today.  I not only don't know what's causing it, but am having trouble tracing what's going on inside of bind.  I'd be grateful for help on either front, getting DNS to work or debugging.

There are a lot of complications.  In brief, the machine and name resolution for it are only available through VPN; I have a search list which should cause some failed lookups if the original doesn't work; and I'm using views.  Some details follow, and then discussion of my debugging attempts.

DETAILS

The remote machine is only accessible through VPN, and the nameserver that knows how to find it is also accessible only through VPN.  The IP of that nameserver is first on my forwarders list on my local machine.  When failures happen the replies indicate the request was addressed to the public-facing nameservers; it is good that they don't provide any info, but they shouldn't be getting the request.

I also added the target domain (ucsf.edu) to my search list.  So when I ask for mymachine.ucsf.edu, this will also generate a query for mymachine.ucsf.edu.ucsf.edu if the first query fails.  The second query is asking for a non-existent domain, and so maybe that is the proximate source of the NXDOMAIN.

The machine I'm making the query from is in my own domain, which is why I'm running BIND.  I use views, and the query is processed through my "inside" view according to the logs.  ucsf.edu is NOT a domain I manage.

DEBUGGING

I directed, either explicitly or via default, all channels to a file and I have tried rndc trace as high as 4.  But I can't tell if the values are coming from the cache or where external queries are going.  Even after flushing the cache I didn't see any info on outbound queries.  I tried using the query-errors channel first, but it didn't seem to capture anything.  I guess NXDOMAIN is not considered an error.

Occasionally I've had success, particularly after flushing the cache (though that doesn't always work).  But when I try 30 seconds later I get NXDOMAIN.

Every query I have directed explicitly (with dig) at the campus nameserver has succeeded.

The VPN connection has always been a bit touchy, and the problem first arose immediately after it went down for somewhere between 30 seconds and a couple of minutes.  My initial theory was that the outage had caused a failure to be cached, but getting failures right after successes is not consistent with that.

Thanks for any help.

Ross Boylan
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users

Re: NXDOMAIN problems

Boylan, Ross
One other detail may be important: I just added a bridge interface and virtual machines.  I presume the VPN tunnel was using the hardware interface (enp5s0) before, and is using the bridge (br0) now.  OpenConnect creates the tunnel (tun0); both the name and inspection of the code indicate the tunnel is based on the TUN interface, at the IP layer, instead of the TAP interface, at the MAC layer.  If some of the communication is not using IP then I presume it could be disappearing at the bridge.

This theory seems to imply that DNS lookup will always fail, which is not the case.  dig always works (though I have not run many tests) and general lookup rarely works.  I presume the general lookups go through bind, though maybe lwres is involved.  If dig and bind use different communication methods that have different abilities to traverse the network stack, that might explain some of the differences.

I don't think the virtual network is running any DNS servers since a) with bridging it is not an option and b) they are getting IPs from my main machine.  But if they were, that could definitely mess things up.

This is on Debian 10 (buster) with a Linux 4.19 kernel and bind 9.11.5.

Re: NXDOMAIN problems

Ondřej Surý
Ross,

I don't have an answer for what's happening, but it would help your debugging to see what happens on the wire. Wireshark is usually helpful for that.
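As a rough illustration of what to look for in a capture: each DNS response starts with a 12-byte header whose low four flag bits carry the response code, and NXDOMAIN is code 3. Comparing that code with the source address of the packet shows which server actually produced the negative answer. The sketch below decodes such a header using only the Python standard library; the function name and the sample bytes are made up for this example, and Wireshark shows the same fields without any code.

```python
import struct

# Response codes defined in RFC 1035 (only the common ones are listed).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def summarize_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12])
    rcode = flags & 0x000F
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),          # QR bit
        "authoritative": bool(flags & 0x0400),        # AA bit
        "recursion_available": bool(flags & 0x0080),  # RA bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "answers": ancount,
    }


# Hand-built header for illustration only: a response with RD and RA set,
# rcode 3, one question, no answers, one authority record.
sample = struct.pack("!6H", 0x1234, 0x8183, 1, 0, 1, 0)
print(summarize_dns_header(sample))
```

In a real capture the interesting pairing is the rcode together with the IP address the response came from, since the original report says the failures appear to come from the public-facing nameservers.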

Also, reviewing named.conf now that you have made the networking change might help, and sharing an anonymized named.conf might prompt a reply from somebody with similar experience.

Ondrej
--
Ondřej Surý — ISC (He/Him)

Re: NXDOMAIN problems

Matus UHLAR - fantomas
In reply to this post by Boylan, Ross
On 16.11.20 22:58, Boylan, Ross wrote:

>I have been experiencing NXDOMAIN errors persistently, though not 100% of
> the time, for a machine I am trying to reach.  The queries worked OK
> before today.  I not only don't know what's causing it, but am having
> trouble tracing what's going on inside of bind.  I'd be grateful for help
> on either front, getting DNS to work or debugging.
>
>There are a lot of complications.  In brief, the machine and name
> resolution for it are only available through VPN; I have a search list
> which should cause some failed lookups if the original doesn't work; and
> I'm using views.  Some details follow, and then discussion of my debugging
> attempts.
>
>DETAILS
>
>The remote machine is only accessible through VPN, and the nameserver that
> knows how to find it is also accessible only through VPN.  The IP of that
> nameserver is first on my forwarders list on my local machine.  When
> failures happen the replies indicate the request was addressed to the
> public-facing nameservers; it is good that they don't provide any info,
> but they shouldn't be getting the request.

forwarders are not used in the specified order; named measures the RTT
(round-trip time) of each and prefers the server that answers fastest.
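To illustrate why the position of a forwarder in the list does not guarantee it gets the query, here is a toy model of selection by smoothed round-trip time. It is not BIND's actual algorithm: the smoothing factor, the 800 ms penalty for a timeout, the function names, and both addresses are assumptions chosen only to show the effect of a short VPN outage.

```python
def update_srtt(srtt_ms, sample_ms, alpha=0.3):
    """Blend a new RTT sample into the smoothed estimate (toy formula)."""
    return (1 - alpha) * srtt_ms + alpha * sample_ms


def pick_forwarder(srtt_by_server):
    """Return the server with the lowest smoothed RTT, ignoring list order."""
    return min(srtt_by_server, key=srtt_by_server.get)


# Hypothetical addresses: 192.0.2.53 stands for the VPN-only nameserver,
# 198.51.100.53 for a public one. The VPN server is listed first and is faster.
srtt = {"192.0.2.53": 20.0, "198.51.100.53": 35.0}
print("before outage:", pick_forwarder(srtt))

# The VPN drops briefly; each timeout is recorded as a very slow sample.
for _ in range(3):
    srtt["192.0.2.53"] = update_srtt(srtt["192.0.2.53"], 800.0)
print("after outage:", pick_forwarder(srtt))
```

In this model the public server keeps being preferred after the outage until the estimate for the VPN server recovers, which is one way that queries could end up at the public-facing nameservers even though they are not first in the list.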

you can configure a zone for that domain with its own specified forwarders
and set it to "forward only".

>I also added the target domain (ucsf.edu) to my search list.  So when I ask
> for mymachine.ucsf.edu, this will also generate a query for
> mymachine.ucsf.edu.ucsf.edu if the first query fails.  The second query is
> asking for a non-existent domain, and so maybe that is the proximate
> source of the NXDOMAIN.

this can be controlled by the option "ndots:1" in resolv.conf, so that any
hostname with one or more dots is tried as an absolute name first, before
the search list is applied
... this is not a BIND issue but a stub resolver issue.
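A minimal sketch of how a stub resolver orders its attempts, based on the behaviour described in resolv.conf(5): a name with at least ndots dots is tried as-is first and the search list afterwards; a name with fewer dots gets the search list first; a name with a trailing dot is never expanded. The function name is made up, and a real resolver stops at the first successful answer rather than trying every candidate.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list at all.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + searched
    # Too few dots: try the search list first, the bare name last.
    return searched + [absolute]


for candidate in candidate_names("mymachine.ucsf.edu", ["ucsf.edu"]):
    print(candidate)
```

With the search list from the original post this yields mymachine.ucsf.edu. followed by mymachine.ucsf.edu.ucsf.edu., so an NXDOMAIN for the second name only shows up after the first lookup has already failed. Querying with a trailing dot (mymachine.ucsf.edu.) avoids the second attempt entirely.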

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I intend to live forever - so far so good.

Re: NXDOMAIN problems

Matus UHLAR - fantomas
In reply to this post by Boylan, Ross
On 17.11.20 05:41, Boylan, Ross wrote:
>One other detail may be important: I just added a bridge interface and
> virtual machines.  I presume the VPN tunnel was using the hardware
> interface (enp5s0) before, and is using the bridge (br0) now.  OpenConnect
> creates the tunnel (tun0); both the name and inspection of the code
> indicate the tunnel is based on the TUN interface, at the IP layer,
> instead of the TAP interface, at the MAC layer.  If some of the
> communication is not using IP then I presume it could be disappearing at
> the bridge.

I guess that your VPN uses the IP that is topologically closest to the
other side of the VPN tunnel. Usually it's the IP of the interface with the
default route set.

you can often override it in the VPN configuration.
Note this is not a bind issue.

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Eagles may soar, but weasels don't get sucked into jet engines.

Re: NXDOMAIN problems

Bind-Users forum mailing list
In reply to this post by Boylan, Ross
Hi there,

On Tue, 17 Nov 2020, Boylan, Ross wrote:

> I have been experiencing NXDOMAIN errors ...
> ... There are a lot of complications.
> ... The remote machine is only accessible through VPN
> ... the nameserver ... is also accessible only through VPN
> ... The VPN connection has always been a bit touchy ...

In my experience, complicated usually also means unreliable.

Does it _need_ to be complicated?

Could you not just put

192.0.2.3 mymachine.ucsf.edu mymachine

or similar into /etc/hosts (or whatever passes for that on the client)?

--

73,
Ged.