Solved: DNS server ISC BIND 9.12 won't answer queries via TCP

Hi All.

My server uses dns/bind912 to provide name resolution (domain name: epopen.com, DNS server: dns.epopen.com).
I recently added a new feature (package) that updates a TXT record in the DNS zone via TCP (dns.query.tcp() in Python).
But it doesn't work.

I confirmed that BIND is listening on TCP with sockstat | grep bind.
Result:
Code:
bind     named      16858 23 tcp4   10.0.0.1:953          *:*
bind     named      16858 21 tcp6   fd00::ffff:a00:1:53   *:*
bind     named      16858 22 tcp4   10.0.0.1:53           *:*
bind     named      16858 512 udp6  fd00::ffff:a00:1:53   *:*
bind     named      16858 513 udp4  10.0.0.1:53           *:*
Note: BIND runs in a jail, so fd00::ffff:a00:1 and 10.0.0.1 are the jail addresses.

Manual query tests on the server machine:
1. TCP test with dig +tcp @dns.epopen.com www.epopen.com.
Result:
Code:
;; communications error to 10.0.0.1#53: host unreachable

2. UDP test with dig @dns.epopen.com www.epopen.com.
Result:
Code:
; <<>> DiG 9.12.1-P2 <<>> @dns.epopen.com www.epopen.com
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15052
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: a719c728b73c85fe8e41323d5b320882d1a2b560249f42c5 (good)
;; QUESTION SECTION:
;www.epopen.com.                        IN      A

;; ANSWER SECTION:
www.epopen.com.         3600    IN      A       122.117.86.253

;; Query time: 2 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Tue Jun 26 17:33:54 CST 2018
;; MSG SIZE  rcvd: 87
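
The difference between the two tests above can also be reproduced without dig. Below is a minimal sketch using only the Python standard library. It is not the dns.query.tcp() code from the package, and the helper names (build_query, frame_for_tcp, query_udp, query_tcp, rcode) are made up for illustration. It sends the same A query for www.epopen.com to 10.0.0.1 over UDP and over TCP; the only protocol difference is that DNS over TCP prefixes each message with a 2-byte length (RFC 1035, section 4.2.2).

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message (RFC 1035) for `name`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, the other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message):
    """Prefix a DNS message with the 2-byte length used over TCP."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_udp(server, name, timeout=3.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return data


def query_tcp(server, name, timeout=3.0):
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return _recv_exact(s, length)


def rcode(response):
    """Return the RCODE (low 4 bits of the header flags word)."""
    return struct.unpack("!H", response[2:4])[0] & 0x000F


if __name__ == "__main__":
    # 10.0.0.1 is the jail address from the sockstat output above.
    for label, fn in (("UDP", query_udp), ("TCP", query_tcp)):
        try:
            print(label, "rcode:", rcode(fn("10.0.0.1", "www.epopen.com")))
        except OSError as exc:
            print(label, "failed:", exc)
```

If UDP prints an rcode and TCP prints a failure, the problem is somewhere in the TCP path and not in the zone data.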

The options section of my /usr/local/etc/namedb/named.conf is shown below for reference.
Code:
options {
        directory       "/usr/local/etc/namedb/working";
        pid-file        "/var/run/named/pid";
        dump-file       "/var/dump/named_dump.db";
        statistics-file "/var/stats/named.stats";
        listen-on       { any; };
        listen-on-v6    { any; };
        disable-empty-zone "255.255.255.255.IN-ADDR.ARPA";
        disable-empty-zone "0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA";
        disable-empty-zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA";
        recursion       no;
        allow-transfer { "none"; };
};

It looks like the DNS server works via UDP only.
But I can't find any directive about TCP.

Can you help me debug this?
Thanks very much.
 
Any firewall? Your configuration looks fine, sockstat(1) shows the ports are open. So the only reason why TCP may not be passed along is a firewall.
 
How are the host and the jail configured?
Hi SirDice

My host configuration:
  1. Interface "lo1" is cloned via cloned_interfaces="lo1" in /etc/rc.conf,
    with the addresses 10.0.0.254 and fd00::ffff:ffff:fffe attached.
  2. Only one PF rule applies to lo1: set skip on lo1.
  3. /etc/hosts content.
    Code:
    ......
    ::1                     localhost localhost.my.domain
    127.0.0.1               localhost localhost.my.domain
    fd00::ffff:10.0.0.1             dns.epopen.com
    10.0.0.1                                dns.epopen.com
    fd00::ffff:10.0.0.2             www.epopen.com
    10.0.0.2                                www.epopen.com
    fd00::ffff:ffff:fffe    host.epopen.com
    10.0.0.254                              host.epopen.com
    ......
  4. ifconfig lo1 output.
    Code:
    ......
    inet 10.0.0.254 netmask 0xffffff00
    inet 10.0.0.1 netmask 0xffffffff
    inet 10.0.0.2 netmask 0xffffffff
    inet6 fd00::ffff:ffff:fffe prefixlen 96
    inet6 fd00::ffff:a00:1 prefixlen 128
    inet6 fd00::ffff:a00:2 prefixlen 128
    ......
My jail configuration:
  1. All jails are attached to interface "lo1".
  2. jls -v output.
    Code:
    JID  Hostname           Path
    Name                          State
    CPUSetID
    ......
    2  www.epopen.com                /usr/jail/httpd
        httpd                         ACTIVE
        3
        10.0.0.2
        fd00::ffff:a00:2
    17  dns.epopen.com                /usr/jail/named
       named                         ACTIVE
       6
       10.0.0.1
       fd00::ffff:a00:1
    ......
I think that is the configuration you need.
Please tell me if you need more.

Note: I can access my web server (in a jail) from the host via TCP with wget 10.0.0.2; the output is below.
Code:
--2018-06-27 09:54:35--  http://10.0.0.2/
Connecting to 10.0.0.2:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://epopen.com/ [following]
--2018-06-27 09:54:35--  https://epopen.com/
........

Thanks a lot.
 
IIRC Bind > 9.11 requires TCP FastOpen for almost everything it does over TCP. You should get some warnings/log entries about that functionality missing at service startup and rndc most likely can't connect to the name server. Issue a rndc status and it should return an error.

The kernel option you need to set is "TCP_RFC7413"
 
Hi sko
I had never used the rndc command before today; rndc complained at every BIND start/stop, but I ignored it. :p
Following your suggestion, I tested rndc status in the jail and got the error below.
Code:
rndc: recv failed: host unreachable
That matches your description.

I found a related thread: https://forums.freebsd.org/threads/tcp_fastopen.59367/
But my server never showed the error message described in that thread.

I will add TCP_RFC7413 and net.inet.tcp.fastopen.enabled and try again.
Thanks a lot.
 
"I have never use rndc command before today"

How do you manage your zones without being able to freeze/reload/flush/refresh... them?

However, check if the control daemon is enabled - not sure if this is the case on a vanilla installation of bind. There should be a "controls" section in your named.conf. The important thing with bind running in a jail is that the rndc service can't just use "localhost" or 127.0.0.1, so you have to either specify a loopback address for the jail or run and connect to rndc via an external address. This also has to be configured in rndc.conf.

You were mentioning setting/updating TXT records via some python thingy - this will most likely also need a working control daemon.
Also check the logfiles; bind can be made _very_ verbose, so use this for debugging. Tcpdump is also extremely helpful for diagnosing DNS problems of all sorts. Usually bind tells the client exactly why some request failed, so just tap in to the conversation if that python module doesn't properly forward errors...
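
Before turning on verbose logging or tcpdump, a quick first check is whether a TCP connection to the jail address can be established at all. A minimal sketch, assuming the jail address 10.0.0.1 and the ports 53 and 953 from the sockstat output earlier in the thread; tcp_connect_ok is a hypothetical helper name, not part of BIND or rndc.

```python
import socket


def tcp_connect_ok(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection can be opened, else (False, reason)."""
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Assumed values: jail address and the DNS/rndc ports seen in sockstat.
    for port in (53, 953):
        ok, reason = tcp_connect_ok("10.0.0.1", port)
        print(f"10.0.0.1:{port}", "open" if ok else f"failed: {reason}")
```

This only tests the handshake. An error such as "recv failed" suggests the connection was opened and the failure came later, so a successful connect here does not rule out a problem further along.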
 
Hi sko.

Thanks for your reply.
My site uses a static domain and zone, so both are hard-coded in the configuration file.
The issue started with a new requirement to modify a TXT record by dynamic update.
Based on your suggestion, I will try to get the control daemon working.

In fact, my dns/bind912 server and the prior versions have only ever received resolution queries over UDP, never TCP.
I am confused about why TCP never worked before.

I am very sorry.
My server is remote; it hit an unknown problem and I lost the connection after building and installing a new kernel (with option TCP_RFC7413).
So I can't try anything or answer you this week.

Thanks a lot.
 
You might also need to add
Code:
listen-on-v6 { ::1; any; };
to named.conf to add the IPv6 "localhost" and "any" addresses (change as required).
While this doesn't guarantee TCP connectivity, you're using IPv6, and unless you tell named to listen on that (your) IPv6 address, it won't know to do that. :)

HTH

--Chris
 
Hi Chris_H

Thanks a lot.
But my remote server is still offline :'-( so I can't test your suggestion.
I will test and reply once the server problem is solved.
 
Hey, I know your problem! I tried to use unbound with TCP Fast Open, and as soon as I used the kernel with option TCP_RFC7413, I got a kernel panic.

This is what I did.

You need to apply these two patches.

https://svnweb.freebsd.org/base?view=revision&revision=313168

From then on, everything is stable...
 
If you are building bind from ports, support for TCP_FASTOPEN can be disabled in the port options. Unfortunately it is enabled by default in the port but not in GENERIC kernels and afaik there is no run-time checking to test whether the currently running kernel supports it.
 
Hi Sebastian

The fix for this bug(?) was committed 17 months ago, but it is still only in HEAD; it hasn't been applied even to the latest RELEASE (11.2).
I'm confused; is it still experimental? :)

Thank you very much for finding the root cause.

Hi mickey
Yes, BIND is built from ports, so I have two options.
  1. Apply Sebastian's patch and enable TCP_FASTOPEN until 12-RELEASE is out.
  2. Disable TCP_FASTOPEN in the port options if the patch doesn't make it into the official 12-RELEASE. :)
Thank you very much.
 
Hi All.

I found the root cause at https://calomel.org/freebsd_network_tuning.html
It is because net.inet.tcp.soreceive_stream was enabled.
The page says:
NOTE: disable net.inet.tcp.soreceive_stream when using rndc to update BIND DNS records
otherwise the following error will trigger, "rndc: recv failed: host unreachable".


Finally I disabled it with net.inet.tcp.soreceive_stream="0" in /boot/loader.conf and it works fine!

Note 1: https://forums.freebsd.org/threads/...able-update-caused-by-soreceive_stream.61064/ reports the same issue.
Note 2: I have no idea why it blocks rndc's TCP connection while other TCP connections (HTTP, FTP, etc.) work fine. :-/
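
To check the setting described above, something like the following could be used. It is a rough sketch, assuming the tunable is set in /boot/loader.conf and that sysctl -n can read it on your release; loader_conf_value and running_value are hypothetical helper names, and the parser is deliberately naive (it ignores included files and quoting corner cases).

```python
import subprocess

TUNABLE = "net.inet.tcp.soreceive_stream"


def loader_conf_value(text, key):
    """Return the last value assigned to `key` in loader.conf-style text, or None."""
    value = None
    for line in text.splitlines():
        # Naive comment handling: drop everything after the first '#'.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip().strip('"')
    return value


def running_value(key):
    """Ask sysctl(8) for the value in the running kernel; None if unavailable."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", key], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


if __name__ == "__main__":
    try:
        with open("/boot/loader.conf") as fh:
            configured = loader_conf_value(fh.read(), TUNABLE)
    except OSError:
        configured = None
    print("loader.conf:", configured)
    print("running:    ", running_value(TUNABLE))
```

A running value of 0 after a reboot would indicate that the loader.conf line took effect.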

Thanks very much, all :D
 