dns cache issue

dns cache issue

Edwardo Garcia
With the new Windows update over the last day, we have noticed something strange: our local DNS caching server times out on lookups.

For example, I look up google.com, and 1 minute later the same lookup fails with a timeout. Since it has already been looked up, the answer should come from the cache, yes? Google has a 5-minute TTL, but my cache doesn't seem to cache it for even 1 ns.

QoS on the router gives DNS (UDP and TCP) and VoIP the highest priority; everything else is default. QoS must be working, because if I run
host www.google.com $externalDNSserver
I get an answer pretty much right away. If I immediately try again against our local DNS server, it times out and can't connect to any servers.
This continues: if I drop the LAN port on the switch that the Windows-updating machine uses, google.com resolves again; bring that port back up, and it times out again.

This only happens under congestion, with our cable link maxed out.
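
The comparison described above (the same name asked of an external server and then of the local server) can be repeated with a small timing probe. What follows is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes plain UDP queries to port 53 with no EDNS or DNSSEC options, so it is not equivalent to what `host` sends.

```python
import socket
import struct
import time


def build_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def timed_lookup(server, name, timeout=3.0):
    """Return elapsed milliseconds for one UDP query, or None on timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        elapsed_ms = (time.monotonic() - start) * 1000.0
    # A reply carrying a different transaction ID is treated as no answer.
    if reply[:2] != query[:2]:
        return None
    return elapsed_ms


def compare_servers(name, servers, timeout=3.0):
    """Query each server once and map its address to the elapsed time."""
    return {server: timed_lookup(server, name, timeout) for server in servers}
```

Calling `compare_servers("google.com", [local, external])`, where `local` and `external` stand for the addresses of the local BIND box and the external resolver, returns the elapsed milliseconds per server, or `None` where the query timed out. Running it once with the link idle and once with it congested gives numbers to compare instead of impressions.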

(Never thought I'd see the day when a Windows PC would take out an entire network.)

Below is my named.conf. I have to be missing something?
 
BIND 9.11.2-P1
running on Linux i686 3.16.58 #1 SMP Sat Sep 29 11:06:24 AEST 2018
built by make with defaults

acl "trusted" { localhost; 198.162.100.0/24; };
acl "sysop" { localhost; 192.168.100.6; };
        
options {
        directory "/var/named";
        allow-query { trusted; };
        allow-query-cache { trusted; };
        allow-transfer { sysop; };
        transfer-format many-answers;
        masterfile-format text;
        interface-interval 0;
        response-policy {zone "rpz.lan"; };
        dnssec-enable yes;
        dnssec-validation auto;
        empty-zones-enable yes;
};

server fe80::/16 { bogus yes; };
        
logging {
        category lame-servers { null; };
        category edns-disabled { null; };
        category client { null; };
        category dnssec { null; };
         //channel log_queries { file "/var/named/query.log"; print-category yes; };
         //category queries { log_queries; };
        channel log-rpz { file "/var/log/rpz.log" versions 10 25m; severity info; };
        category rpz { log-rpz; };
};
  
zone "." {
        type hint;
        file "root.cache";
};

zone "rpz.lan" {
        type master;
        file "rpz.lan";
        allow-query { trusted; };
        allow-update {none;};
        notify no;
};
       
       
zone "akamai.net" {
        type forward;
        forward first;
        forwarders { xxxxxx; xxxxxx; };
};
 
 


_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users
Re: dns cache issue

Kevin Darcy
Offhand, it sounds like your LAN is saturated, so the queries might not be getting to BIND in the first place, or the replies aren't getting back. QoS is unlikely to help with this: you indicated that QoS was on your "router", which is typical, since QoS is usually applied on WAN links. (Although, on the other hand, you mentioned VoIP, and VoIP sometimes requires applying QoS at the LAN level too.)

You currently have query logging turned off. If it's not too resource-intensive, you might want to consider turning that on, to verify whether the queries are getting to BIND. Or, run a packet capture on the BIND side. Packet capture on the BIND device should also help to identify any issues talking upstream (e.g. to TLD servers or auth servers for domains like google.com). Packet capture on the *client* side would probably be necessary for definitive proof of whether replies are being dropped by the LAN (compare what the server sent side-by-side with what the client saw).
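
The side-by-side comparison of captures can be reduced to a small bookkeeping step once the replies have been extracted from each capture. The sketch below assumes each reply has already been summarised by hand or by another tool as a `(transaction_id, query_name)` pair; the function name and that pair format are illustrative assumptions, not the output of any particular capture tool.

```python
from collections import Counter


def missing_at_client(server_sent, client_seen):
    """Return the replies the server sent that never showed up at the client.

    Both arguments are lists of (transaction_id, query_name) pairs.
    Duplicates are matched one-for-one, so a reply sent twice but seen
    once is reported once.
    """
    remaining = Counter(client_seen)
    lost = []
    for reply in server_sent:
        if remaining[reply] > 0:
            remaining[reply] -= 1  # this reply was seen on the client side
        else:
            lost.append(reply)     # sent by the server, never seen by the client
    return lost
```

An empty result while the client still reports timeouts would suggest the replies are never being sent at all, which points back at BIND or its upstream path rather than the LAN.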

I was intrigued by "server fe80::/16 { bogus yes; }; " in your config. Have you had issues with IPv6 link-local addresses being associated with delegated nameservers? I haven't noticed this, but then again, I haven't been looking for that particular misconfiguration specifically...

- Kevin



(In reply to Edwardo Garcia's message of Thu, Jan 10, 2019 at 12:06 AM.)

Re: dns cache issue

Edwardo Garcia
Kevin,
I thought LAN saturation too, but I can ssh into the BIND server immediately. I also did a lookup, from my other PC, on the local authoritative zone rpz.lan, and BIND replied right away, or within 1 second, during congestion. Could DNSSEC be the problem? I did not disable it to test. It really is as if it is not caching any external results, so maybe it needs to go out and redo all the lookups to make sure the signatures are valid? I really don't know; I'm guessing now.

I will try your suggestion of logging again. As for link-local: yes, a couple of years ago we saw problems.

ed

(In reply to Kevin Darcy's message of Fri, Jan 11, 2019 at 1:17 AM.)

Re: dns cache issue

Edwardo Garcia
OK, so this happened again, with link congestion.

BIND is caching the results, as tested with no congestion: 78 ms down to 1 ms. BUT the issue with BIND remains, and the logs show nothing wrong.

Lookups over the congested link, tried in quick succession with a second or less between them:
google.com (like any other host I try): timeout, no servers can be reached
lookup of an internal zone I added to BIND: replies in 7 ms
retry google and a few other sites: all time out, no servers can be reached
(google may only have a 5-minute TTL, but the other domains I'm testing, including our mail provider, have a 1-day TTL)
ping to the DNS box is quick
ping to other boxes is quick too
disconnect the Windows-updating PC, and google et al. respond in 1 ms, so it obviously is in the bloody cache, but because BIND can't do something with the internet in a timely manner it just spits the dummy

Why does BIND do this? If it already knows the answer, it should give the answer, since it holds the record, just as it does for the internal test zone.
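
The expectation stated above, that a record cached with a given TTL should be answerable without any upstream contact until the TTL runs out, can be written down as a toy model. This is only a sketch of that expectation with made-up names; it is not how BIND's cache is implemented and says nothing about validation, RPZ processing or any other step BIND may perform before answering.

```python
import time


class TtlCache:
    """Toy TTL cache: an entry is served until its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, so expiry can be simulated
        self._entries = {}    # name -> (answer, absolute expiry time)

    def put(self, name, answer, ttl_seconds):
        self._entries[name] = (answer, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached answer, or None if it is absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: a real resolver would refetch
            return None
        return answer
```

Under this model a name stored with a 300-second TTL is still answered from memory 60 seconds later, which matches the 1 ms responses seen once the congestion is removed.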

This all causes mail to fail and web browsing to fail; boss not happy.



(In reply to Edwardo Garcia's message of Fri, Jan 11, 2019 at 9:27 AM.)