DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS

PENG, JUNAN
Hi, All

I did a recursive query capacity test. I used a traffic generator to send 15K QPS of traffic to DNS1 for FQDN1. (Note: FQDN1 can't be resolved by DNS1, so DNS1 has to forward it to DNS2, and the TTL is set to 0.)

But during the test I saw lots of failures; the success rate was not high (85%). Then I used tcpdump to capture traffic on DNS1, and I found the following:

Thing 1. Between the traffic generator and DNS1, there are more DNS queries than responses. About 15% of the traffic is dropped by DNS1.

Thing 2. The number of recursive queries from DNS1 to DNS2 is far lower than the number of queries from the traffic generator to DNS1.
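To put a number on Thing 1 from a capture, one approach is to pair each client query with its response. This is an illustrative sketch, not a tool mentioned in this thread. It assumes the UDP payloads have already been pulled out of the pcap file (with whatever pcap reader you prefer, not shown) and that every record is normalized to the client's address and port for both directions. It reads only the 12-byte DNS header from RFC 1035: the transaction ID and the QR bit. The function names and sample addresses are hypothetical.

```python
import struct
from collections import Counter


def dns_header(payload: bytes) -> tuple[int, bool]:
    """Return (transaction ID, is_response) from a raw DNS message."""
    if len(payload) < 12:
        raise ValueError("DNS header is 12 bytes long")
    txid, flags = struct.unpack("!HH", payload[:4])
    return txid, bool(flags & 0x8000)  # QR bit: 0 = query, 1 = response


def unanswered(packets):
    """packets: iterable of (client_addr, client_port, payload) seen on DNS1's
    client-facing side, using the client's address/port for BOTH directions.
    Returns {(addr, port, txid): count} for queries that never got a response."""
    pending = Counter()
    for addr, port, payload in packets:
        txid, is_response = dns_header(payload)
        key = (addr, port, txid)
        if is_response:
            if pending[key] > 0:  # Counter yields 0 for unseen keys
                pending[key] -= 1
        else:
            pending[key] += 1
    return {k: n for k, n in pending.items() if n > 0}


if __name__ == "__main__":
    # Hypothetical header-only messages from a documentation address.
    query = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)     # QR=0, RD set
    response = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)  # QR=1, RD+RA set
    sample = [
        ("192.0.2.10", 5000, query),
        ("192.0.2.10", 5000, response),
        ("192.0.2.10", 5001, query),  # no response seen for this one
    ]
    lost = unanswered(sample)
    total = sum(1 for _, _, p in sample if not dns_header(p)[1])
    print(f"{sum(lost.values())} of {total} queries unanswered")
```

Matching on (address, port, ID) rather than counting totals also shows which clients lost answers, which helps separate server drops from capture loss.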


I want to confirm DNS behavior here:

DNS1 initiates a recursive query towards DNS2 when the first query arrives. The transaction time between DNS1 and DNS2 is about 3 milliseconds. If other queries for the same FQDN arrive within those 3 milliseconds, are they all queued in DNS1, because DNS1 has already sent a request for the same FQDN to DNS2? If so, that would explain Thing 2. After DNS1 gets the response from DNS2, it sends responses to all the requests from the traffic generator queued in DNS1, but unfortunately DNS1 seems to drop some packets here: 15% of the queries get no response.
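The hypothesis can be sanity-checked with back-of-envelope arithmetic. This sketch assumes evenly spaced queries for a single name and a fixed 3 ms upstream round trip. With TTL 0, about 15,000 × 0.003 = 45 clients arrive during each upstream fetch, while DNS1 sends only about one upstream query per round trip (roughly 333 per second), which fits Thing 2. For comparison, 45 waiting clients is well above the default clients-per-query value of 10 documented in the BIND ARM. The `ttl_s` parameter is an added assumption: it treats the answer as served from cache for that long before the next fetch.

```python
def coalescing_estimate(qps: float, rtt_s: float, ttl_s: float = 0.0):
    """Rough figures for one hot name.

    Returns (clients arriving during one upstream fetch,
             upstream queries per second).
    Assumes evenly spaced arrivals, a fixed upstream RTT, and that a
    cached answer (ttl_s > 0) is reused until it expires."""
    clients_per_fetch = qps * rtt_s
    upstream_qps = 1.0 / (rtt_s + ttl_s)  # one fetch per RTT plus cache lifetime
    return clients_per_fetch, upstream_qps


if __name__ == "__main__":
    waiting, upstream = coalescing_estimate(15_000, 0.003)
    print(f"TTL 0: ~{waiting:.0f} clients per fetch, ~{upstream:.0f} upstream qps")
    _, upstream = coalescing_estimate(15_000, 0.003, ttl_s=60)
    print(f"TTL 60: ~{upstream:.3f} upstream qps")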

Besides, CPU usage on DNS1 is not high, only 30%.

Is my understanding correct? Which DNS parameters impact performance significantly? How can I troubleshoot further?


Thank you very much!!

BR
Michael


 
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/bind-users

Re: DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS

Martin Wismer
Hello Michael,

take care to increase the tcpdump buffers. Otherwise it's tcpdump that
loses traffic, not the DNS server.
Have Fun. Greetings
   Martin.Wismer.

tcpdump option   -B 131072   helped in my case
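Martin's advice translates into a command line like the one this sketch builds (it does not run anything). The interface name and output path are placeholders. `-B` sets the OS capture buffer in KiB, so 131072 is 128 MiB; `-n` disables name resolution; `-w` writes raw packets instead of decoding them live, which is usually cheaper at high packet rates. When tcpdump exits it reports how many packets were dropped by the kernel; a non-zero count means the capture itself is lossy and cannot be used to count server drops.

```python
import shlex


def tcpdump_cmd(interface: str, outfile: str, buffer_kib: int = 131072) -> list[str]:
    """Build (but do not run) a tcpdump command with a larger capture buffer.

    -B takes the OS capture buffer size in KiB: 131072 KiB = 128 MiB."""
    return [
        "tcpdump",
        "-i", interface,
        "-B", str(buffer_kib),
        "-n",           # no name resolution: tcpdump adds no DNS lookups of its own
        "-w", outfile,  # write raw packets; decode later instead of printing live
        "udp port 53",  # capture filter: DNS over UDP only
    ]


if __name__ == "__main__":
    # "eth0" and the output path are placeholders for your own system.
    print(shlex.join(tcpdump_cmd("eth0", "/tmp/dns1.pcap")))
```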

PS) this time with my other E-Mail Address
On 10.04.18 02:37, PENG, JUNAN wrote:

> Hi, All
>
> I did a recursive query capacity test. I used a traffic generator to send 15K QPS of traffic to DNS1 for FQDN1. (Note: FQDN1 can't be resolved by DNS1, so DNS1 has to forward it to DNS2, and the TTL is set to 0.)


Re: DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS

Tony Finch
In reply to this post by PENG, JUNAN
PENG, JUNAN <[hidden email]> wrote:
>

I need to start by saying that my load testing is very unscientific,
so I can only give you a few handwaving hints...

> I did a recursive query capacity test. I used a traffic generator to send
> 15K QPS of traffic to DNS1 for FQDN1. (Note: FQDN1 can't be resolved by
> DNS1, so DNS1 has to forward it to DNS2, and the TTL is set to 0.)

In my experience, 15kqps is easy to achieve with a hot cache, but if you
are forcing the resolver to make recursive queries you'll be sacrificing
a lot of the potential performance (but I can't give you numbers on how
much).

Set the TTL to a non-zero value to get a more realistic test.

> Thing 1. Between the traffic generator and DNS1, there are more DNS
> queries than responses. About 15% of the traffic is dropped by DNS1.

Are you hammering the same qname or a small number of qnames? If so I
would expect the server to drop queries while it is recursing - look at
the documentation for max-clients-per-query.
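To see why a per-name limit drops queries in this kind of test, here is a toy model; it is not named's implementation. It assumes evenly spaced identical queries, a fixed upstream round trip, TTL 0 (nothing cached), and a fixed cap on how many clients may wait for one fetch. named's self-tuning of the limit is deliberately left out, and `cap=0` means no limit.

```python
def simulate(qps: int, rtt_s: float, cap: int, duration_s: float):
    """Return (answered, dropped, upstream_queries) for one hot name with TTL 0."""
    rtt_ticks = max(1, round(rtt_s * qps))  # client arrivals per upstream round trip
    answered = dropped = upstream = 0
    fetch_done = None  # tick at which the in-flight upstream fetch completes
    waiting = 0        # clients attached to the in-flight fetch
    for tick in range(int(duration_s * qps)):  # one client query per tick
        if fetch_done is not None and tick >= fetch_done:
            answered += waiting  # answer everyone who waited; TTL 0, so no caching
            waiting, fetch_done = 0, None
        if fetch_done is None:
            upstream += 1        # first client for this name starts a new fetch
            fetch_done = tick + rtt_ticks
            waiting = 1
        elif cap and waiting >= cap:
            dropped += 1         # over the per-query limit: silently dropped
        else:
            waiting += 1         # joins the fetch already in flight
    answered += waiting          # count the fetch still in flight at the end
    return answered, dropped, upstream


if __name__ == "__main__":
    for cap in (0, 10, 100):
        a, d, u = simulate(15_000, 0.003, cap, 1.0)
        print(f"cap={cap:>3}: answered={a} dropped={d} upstream={u}")
```

With a fixed cap of 10 this model drops far more than the 15% Michael observed, because the real server raises its limit after seeing drops. Treat the output as illustrating the mechanism, not predicting the rate.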

> Thing 2. The number of recursive queries from DNS1 to DNS2 is far lower
> than the number of queries from the traffic generator to DNS1.

That's kind of the point of a cache :-)

Tony.
--
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
public services available on equal terms to all

Re: DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS

Cathy Almond
In reply to this post by PENG, JUNAN
On 10/04/2018 01:37, PENG, JUNAN wrote:
> Hi, All
>
> I did a recursive query capacity test. I used a traffic generator to send 15K QPS of traffic to DNS1 for FQDN1. (Note: FQDN1 can't be resolved by DNS1, so DNS1 has to forward it to DNS2, and the TTL is set to 0.)
>
> But during the test I saw lots of failures; the success rate was not high (85%). Then I used tcpdump to capture traffic on DNS1, and I found the following:
>
> Thing 1. Between the traffic generator and DNS1, there are more DNS queries than responses. About 15% of the traffic is dropped by DNS1.
>
> Thing 2. The number of recursive queries from DNS1 to DNS2 is far lower than the number of queries from the traffic generator to DNS1.

Tony Finch was correct earlier to point you in the direction of
max-clients-per-query.

There's also this KB article:

https://kb.isc.org/article/AA-00463/0/How-does-clients-per-query-work.html

But your test scenario is in any case flawed.  You're attempting to test
how well named can handle recursing every time, but that is not going to
happen because you're using the same FQDN.

What's happening here is that the first query received causes recursion
to commence to get the answer to the client query.  All the other
clients making the same query while this is ongoing don't cause named
to start more recursion - instead they will be queued waiting for the
answer to be available (i.e. there are multiple clients per query at
this point in time).

When the answer comes back from recursion, it will be given to all those
clients that were waiting for it.  Then, because it had TTL=0, it's not
kept to be used for newer clients asking for the same thing -
essentially the process starts all over again.
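The queueing Cathy describes can be pictured as a table keyed on (qname, qtype, qclass) that holds the clients waiting for one outstanding fetch. This is a simplified data-structure sketch, not named's code: the class and method names are hypothetical, and cache expiry is ignored (any TTL above zero simply keeps the answer).

```python
from collections import defaultdict


class CoalescingResolver:
    """Toy model: one upstream fetch per (qname, qtype, qclass); identical
    queries that arrive meanwhile wait on that fetch instead of starting another."""

    def __init__(self):
        self.waiting = defaultdict(list)  # key -> clients waiting for the fetch
        self.cache = {}                   # key -> answer (expiry not modelled)
        self.upstream_sent = 0

    def query(self, client, key):
        """Return a cached answer, or None if the client now waits on a fetch."""
        if key in self.cache:
            return self.cache[key]
        if key not in self.waiting:       # no fetch in flight for this key yet
            self.upstream_sent += 1
        self.waiting[key].append(client)
        return None

    def upstream_response(self, key, answer, ttl):
        """Deliver the answer to every waiting client; cache it only if ttl > 0."""
        clients = self.waiting.pop(key, [])
        if ttl > 0:
            self.cache[key] = answer
        return [(client, answer) for client in clients]


if __name__ == "__main__":
    r = CoalescingResolver()
    key = ("fqdn1.example.", "A", "IN")   # hypothetical name
    for client in range(45):              # about 45 arrivals during one 3 ms fetch
        r.query(client, key)
    replies = r.upstream_response(key, "192.0.2.1", ttl=0)
    r.query(45, key)                      # TTL 0: nothing cached, so a new fetch
    print(len(replies), r.upstream_sent)
```

With `ttl=0` every completed fetch leaves nothing behind, so the next arrival starts the cycle again, exactly the pattern described above.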

And the other thing that is happening (as has already been pointed out)
is that you're (very likely) tripping up over the 'clients-per-query'
self-tuning throttle (designed to protect your server from a storm of
the same query from multiple clients).  This is going to result in
dropped queries.  Have a look at your logs (make sure you're logging
everything) - if you see clients-per-query being adjusted up and down,
then you've been hitting this limit.
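The self-tuning throttle can be sketched from its description in the BIND ARM: clients-per-query (default 10) is the initial and minimum limit, max-clients-per-query (default 100) is the ceiling, a value of 0 removes that limit, the estimate is raised when a response arrives after clients were dropped, and it is lowered again if it has stayed unchanged for 20 minutes. The step sizes here (up by 5, down by 1) are assumptions for illustration, and the class is a toy model, not named's code.

```python
class ClientsPerQueryThrottle:
    """Toy model of a self-tuning per-query client limit (not named's code)."""

    STEP_UP = 5              # assumption: increment after drops
    STEP_DOWN = 1            # assumption: decrement after a quiet period
    DECAY_AFTER_S = 20 * 60  # lowered if unchanged for 20 minutes

    def __init__(self, clients_per_query: int = 10, max_clients_per_query: int = 100):
        self.floor = clients_per_query
        self.ceiling = max_clients_per_query
        self.limit = clients_per_query
        self.last_change = 0.0
        self.events = []     # (time, new_limit) each time the limit moves

    def admit(self, waiting_now: int) -> bool:
        """May one more client join a fetch that already has waiting_now clients?"""
        return self.limit == 0 or waiting_now < self.limit

    def fetch_completed(self, had_drops: bool, now: float) -> None:
        """A response arrived; raise the estimate if clients were dropped."""
        if not had_drops or self.limit == 0:
            return
        new = self.limit + self.STEP_UP
        if self.ceiling:
            new = min(new, self.ceiling)
        if new != self.limit:
            self._set(new, now)

    def tick(self, now: float) -> None:
        """Periodic check: drift back toward the floor after a quiet period."""
        if self.limit > self.floor and now - self.last_change >= self.DECAY_AFTER_S:
            self._set(max(self.floor, self.limit - self.STEP_DOWN), now)

    def _set(self, new: int, now: float) -> None:
        self.limit, self.last_change = new, now
        self.events.append((now, new))


if __name__ == "__main__":
    t = ClientsPerQueryThrottle()
    for second in range(30):             # drops on every fetch for 30 seconds
        t.fetch_completed(had_drops=True, now=float(second))
    t.tick(now=t.last_change + 20 * 60)  # quiet for 20 minutes
    print(t.limit, t.events)
```

Each entry in `events` corresponds to the kind of up/down adjustment named logs, which is what Michael later reported seeing.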

Hope this info helps you to design a test that matches better to what
you need to achieve.

Cathy


RE: DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS

PENG, JUNAN
Hi, Tony and Cathy

Yes, you are right. It is caused by queries using the same FQDN with TTL=0. I adjusted the 'clients-per-query' and 'max-clients-per-query' parameters during the test, and it made a big difference.

I also saw clients-per-query being adjusted up and down in the logs :) Anyway, I am looking into using multiple FQDNs to alleviate it.
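For a test that actually exercises recursion, each query needs a different name so that neither coalescing nor the clients-per-query limit applies. This sketch produces one name per line in the "name type" layout that dnsperf-style tools read. The tool, the `q<N>.` label scheme, and the zone name are assumptions; DNS2 must actually answer for the generated names (for example through a wildcard record), or the test will measure NXDOMAIN handling instead.

```python
def query_file_lines(zone: str, count: int, qtype: str = "A") -> list[str]:
    """Generate `count` distinct query lines under `zone` (hypothetical naming)."""
    zone = zone.rstrip(".")
    return [f"q{i}.{zone} {qtype}" for i in range(count)]


if __name__ == "__main__":
    # "fqdn1.example" is a placeholder for a zone DNS2 is authoritative for.
    # Redirect stdout to a file to build a full query list.
    print("\n".join(query_file_lines("fqdn1.example", 3)))
```

If the list is replayed in a loop, names repeat, so with a non-zero TTL later passes are answered from cache; size the list and TTL according to what you want to measure.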

Thank you very much!

BR


-----Original Message-----
From: bind-users [mailto:[hidden email]] On Behalf Of Cathy Almond
Sent: Friday, April 13, 2018 4:14 AM
To: [hidden email]
Subject: Re: DNS Capacity issue help -- Recursive Query -- it seems some packets are dropped by DNS
