Sysadmin Sunday: my vacuum cleaner killed my WiFi
True story: my vacuum cleaner killed my home WiFi network.
I mean… kinda. Sorta. Close enough.
If you want the full details, you'll need to wade through inordinate amounts of arcane Unix DNS configuration. Or don't; I'm a blog post, not a cop.
In true Sysadmin Sunday fashion, I wanted to write down my debugging process so that I can refer to it later (see that post to understand my weird notation below).
It's always DNS
Observation: when trying to surf the information superhighway, I am sporadically greeted by "Server not found" errors.
Knowledge: This is a DNS issue; it's always DNS. I've previously configured treebeard, my home server, to act as the DNS server for my local network, so we should look there.
Hypothesis: before checking the DNS server specifically, is something wrong with treebeard overall? Time to do my best Brendan Gregg impression…
josh@treebeard:~$ uptime
16:37:16 up 7:18, 2 users, load average: 4.12, 4.50, 4.60
Observation: Uhh, that seems way higher than I'd expect. (For the uninitiated: the "load average" in that output roughly counts how many tasks have been running or waiting to run over the past little while[1]. Given that this server is usually idle, I'd expect those numbers to all be well under one.)
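A rough way to put a load average in context is to divide it by the number of CPUs: a sustained value near or above 1.00 per CPU means work is queuing up. Here's a minimal sketch, assuming Linux (it reads /proc/loadavg) and GNU coreutils' nproc; load_per_cpu is a hypothetical helper name. On Linux the count also includes tasks stuck in uninterruptible sleep (e.g. waiting on disk I/O), so treat the result as a heuristic rather than a pure CPU measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 1-minute load average divided by online CPU count.
# Assumes Linux (/proc/loadavg) and GNU coreutils (nproc).
load_per_cpu() {
  local load1 cpus
  # First field of /proc/loadavg is the 1-minute load average;
  # the remaining fields are discarded into "_".
  read -r load1 _ < /proc/loadavg
  cpus=$(nproc)
  # The shell can't do floating-point division, so hand it to awk.
  awk -v l="$load1" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}
```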
A quick glance at top reveals that tailscaled is using 400% CPU. Let's check the logs:
josh@treebeard:~$ journalctl -u tailscaled -f
Jun 17 21:13:22 treebeard tailscaled[71934]: trying bootstrapDNS("derp24c.tailscale.com", "208.83.233.233") for "log.tailscale.com" ...
Jun 17 21:13:23 treebeard tailscaled[71934]: trying bootstrapDNS("derp7d.tailscale.com", "2403:2500:400:20::cfe") for "log.tailscale.com" ...
Jun 17 21:13:23 treebeard tailscaled[71934]: bootstrapDNS("derp7d.tailscale.com", "2403:2500:400:20::cfe") for "log.tailscale.com" error: Get "https://derp7d.tailscale.com/bootstrap-dns?q=log.tailscale.com": dial tcp [2403:2500:400:20::cfe]:443: connect: network is unreachable
Jun 17 21:14:03 treebeard tailscaled[71934]: [RATELIMIT] format("dns: resolver: forward: no upstream resolvers set, returning SERVFAIL") (19 dropped)
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolver: forward: no upstream resolvers set, returning SERVFAIL
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolution failed due to missing upstream nameservers. Recompiling DNS configuration.
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:9}
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: Resolvercfg: {Routes:{} Hosts:9 LocalDomains:[]}
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: OScfg: {}
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolver: forward: no upstream resolvers set, returning SERVFAIL
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolver: forward: no upstream resolvers set, returning SERVFAIL
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolver: forward: no upstream resolvers set, returning SERVFAIL
Jun 17 21:14:03 treebeard tailscaled[71934]: dns: resolver: forward: no upstream resolvers set, returning SERVFAIL
Jun 17 21:14:03 treebeard tailscaled[71934]: [RATELIMIT] format("dns: resolver: forward: no upstream resolvers set, returning SERVFAIL")
Hmmmmmmmm. And how about dnsmasq itself:
journalctl -u dnsmasq -f
Jun 17 15:36:37 treebeard dnsmasq[12222]: failed to send packet: Operation not permitted
Jun 17 15:36:41 treebeard dnsmasq[12222]: Maximum number of concurrent DNS queries reached (max: 150)
# (After restarting dnsmasq)
Jun 17 15:39:36 treebeard dnsmasq[71406]: reading /run/dnsmasq/resolv.conf
Jun 17 15:39:36 treebeard dnsmasq[71406]: ignoring nameserver 192.168.8.250 - local interface
Jun 17 15:39:36 treebeard dnsmasq[71406]: using nameserver 100.100.100.100#53
Hypothesis: something is misconfigured somewhere in my DNS/Tailscale setup. Also, it seems like there are way too many DNS queries in flight at once, though it's unclear whether that's related or a separate issue.
Experiment: let's just shut off Tailscale entirely for now and see if we can get dnsmasq back to a healthy state.
Tailscale's DNS server lives at 100.100.100.100, so let's remove any references to it that we find. (From the tailscaled logs, it looks like it has no upstream server to forward the DNS requests it can't answer by itself. We could probably fix that misconfiguration, but let's start by simplifying as much as possible.) I saw /run/dnsmasq/resolv.conf in the dnsmasq logs, so let's start there.
josh@treebeard:~$ cat /run/dnsmasq/resolv.conf
nameserver 192.168.8.250
nameserver 100.100.100.100
# Then, after editing /run/dnsmasq/resolv.conf:
josh@treebeard:~$ cat /run/dnsmasq/resolv.conf
nameserver 9.9.9.9
Observation: changes to that file seem to be picked up immediately by dnsmasq:
Jun 17 21:21:39 treebeard dnsmasq[72630]: reading /run/dnsmasq/resolv.conf
Jun 17 21:21:39 treebeard dnsmasq[72630]: using nameserver 9.9.9.9#53
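To sanity-check which upstream servers a resolv.conf-format file like /run/dnsmasq/resolv.conf actually lists, a one-line awk filter is enough. list_nameservers is a hypothetical helper, not a standard tool, and it deliberately doesn't replicate dnsmasq's own filtering: as the "ignoring nameserver 192.168.8.250 - local interface" log line shows, dnsmasq also skips servers whose address belongs to the machine itself. (As far as I know, the edit was picked up immediately because dnsmasq polls its resolv file for changes by default; the no-poll option turns that off.)

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nameserver entries of a resolv.conf-format
# file, one per line, in file order. Comment lines (# or ;) and other
# directives (search, domain, options, ...) are skipped because their
# first field isn't "nameserver".
list_nameservers() {
  awk '$1 == "nameserver" && NF >= 2 { print $2 }' "$1"
}

# Usage: list_nameservers /run/dnsmasq/resolv.conf
```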
Observation: browsing the web seems to work as expected again! Let's validate that with an explicit DNS query from my desktop:
[josh@galadriel ~]$ dig +short +identify cbc.ca
96.7.25.105 from server 192.168.8.250 in 63 ms.
Knowledge: 192.168.8.250 is treebeard's private IP address, so everything looks good here.
Experiment: if I bring tailscaled back up, but with MagicDNS disabled, does everything still work?
josh@treebeard:~$ sudo systemctl start tailscaled
josh@treebeard:~$ sudo tailscale set --accept-dns=false
josh@treebeard:~$ uptime
21:30:54 up 12:12, 2 users, load average: 0.25, 0.19, 0.33
Eh, I still see the same weird DNS-related logs from tailscaled, but dnsmasq isn't on fire anymore and CPU usage seems normal. Plus my usual "does my tailnet work outside my home?" test is passing: on my phone, toggle WiFi off, data on, Tailscale on, then try to connect to one of my self-hosted services. It works!
Whose DNS config is it anyway?
Whew, this is a good place for a breather: my home WiFi is functioning as expected, and treebeard is looking healthy now. But this whole ordeal has exposed that my DNS setup on treebeard has been working more by chance than anything else; it'd be nice to build a deeper understanding of how all this is configured.
How does one gain deeper insight into Unix networking configuration? In 2025, the answer is obvious: ask ChatGPT, then corroborate with Arch Wiki, StackOverflow, and related Wikipedia pages. This section summarizes my learnings therefrom.
The big picture
Say some program on your (Unix) computer wants to do a DNS lookup. What happens next? The process will likely look something like this:
- The program calls a glibc function like getaddrinfo. getaddrinfo will call into glibc's internal Name Service Switch (NSS) machinery, which reads /etc/nsswitch.conf to figure out how it should resolve the host name to an IP address.
  - If dns is listed as an option on the relevant /etc/nsswitch.conf line, the NSS resolver will read /etc/resolv.conf to find which DNS servers to query.
Of course, a program doesn't have to follow those steps: you could write a program that accepts a domain name from a user and then explicitly sends a DNS query to 9.9.9.9 to resolve that name. But doing so is (usually) a Bad Idea™ that will anger your local sysadmin: if a user has some custom DNS configuration (presumably for a good reason!), your program will completely ignore it.
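The difference between the two approaches is easy to see from a shell: getent ahosts resolves a name via getaddrinfo, and therefore through NSS, while dig @server talks straight to the server you name. A small sketch; both function names are hypothetical wrappers, getent ships with glibc, and dig comes from the dnsutils/bind-utils package.

```shell
#!/usr/bin/env bash
# Resolve the way most programs do: getaddrinfo(3) via NSS, so
# /etc/hosts, mDNS, and /etc/resolv.conf are all honoured.
resolve_via_nss() {
  # "ahosts" prints one line per address/socket type; keep unique addresses.
  getent ahosts "$1" | awk '{ print $1 }' | sort -u
}

# The Bad Idea(TM) version: bypass NSS and ask one hardcoded server,
# ignoring whatever the local sysadmin configured.
resolve_bypassing_nss() {
  dig +short "$1" @9.9.9.9
}
```

On treebeard, resolve_via_nss treebeard should include 127.0.1.1 straight from /etc/hosts, which a query to 9.9.9.9 can never know about.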
Let's walk through those steps on treebeard to ensure we can follow exactly what's going on.
/etc/nsswitch.conf
As we just learned, /etc/nsswitch.conf is the main entry point, so let's make sure it looks okay first:
josh@treebeard:~$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files systemd
group: files systemd
shadow: files
gshadow: files
hosts: files mdns4_minimal [NOTFOUND=return] dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
Yep, /etc/nsswitch.conf looks fine to me. The hosts line is the one we care about:
hosts: files mdns4_minimal [NOTFOUND=return] dns
That config line specifies three possible "sources" for the hosts database, plus one "action" that controls what happens between them; i.e., when trying to resolve a domain name, use these sources in order. In more detail:
- files: check for a match in /etc/hosts. (No network requests required!)
- mdns4_minimal: attempt to resolve the host via multicast DNS (herein mDNS).
- [NOTFOUND=return]: since the .local domain is only intended for use over mDNS, if mDNS runs successfully but responds "that host doesn't exist," don't continue. Again, .local is only intended for use in personal networks; hitting an external DNS server to resolve such a host basically never makes sense. This directive ensures we abort before attempting to do so.
  - Somewhat related: mDNS was the subject of my first Sysadmin Sunday.
- dns: attempt "traditional" DNS resolution (i.e. actually send good ol' DNS requests on port 53), reading /etc/resolv.conf to discover the DNS servers to query.
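To make the structure of that hosts line concrete, here's a small awk-based sketch that prints each entry in lookup order and labels it as a source or a [STATUS=action] directive. hosts_lookup_order is a hypothetical helper name, and it assumes the usual single hosts: line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hosts: line of an nsswitch.conf-format
# file in lookup order, one "kind<TAB>entry" pair per line.
hosts_lookup_order() {
  local file="${1:-/etc/nsswitch.conf}"
  awk '
    $1 == "hosts:" {
      for (i = 2; i <= NF; i++) {
        tok = $i
        if (tok ~ /^#/) break               # rest of line is a comment
        if (tok ~ /^\[/) {
          # An action may span words, e.g. [NOTFOUND=return UNAVAIL=continue]
          while (tok !~ /\]$/ && i < NF) { i++; tok = tok " " $i }
          print "action\t" tok
        } else {
          print "source\t" tok
        }
      }
    }' "$file"
}
```

Run against treebeard's file, this should list files, mdns4_minimal, [NOTFOUND=return], and dns in that order, with only the bracketed entry labelled as an action.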
Or if you prefer that in pictures:

A visual representation of how DNS resolution (via NSS) works on treebeard
To give an example of when each source might be used, let's consider looking up the following hosts on treebeard:
- treebeard: Since I have the line 127.0.1.1 treebeard in my /etc/hosts, this is resolved directly by the files source.
- nasgul.local: As we saw before, .local is a special domain that is only meant to be resolved via mDNS, so this will be handled by the mdns4_minimal source, after the files source fails to find a match.
- doesnotexist.local: same as above, but the host doesn't actually exist. mDNS resolution will still be triggered (because it's a .local domain), but once mDNS returns "name not found," the dns source in nsswitch.conf will not be used (because of the earlier [NOTFOUND=return] directive).
- jellyfin.simpsonian.ca: this won't be covered by files or mdns4_minimal, so dns will give it a shot. In this case, treebeard itself is running the DNS server (dnsmasq), so we want to make sure the dns source in /etc/nsswitch.conf uses treebeard's IP as the nameserver to query. We'll do that later by editing /etc/resolv.conf. (For the motivation behind this "split-horizon" DNS setup, see this post.)
  - If all goes as expected, dnsmasq will be able to answer this query itself; we'll give it an explicit configuration mapping jellyfin.simpsonian.ca to an IP address on the local network.
- cbc.ca: like before, this can only be handled by dns. However, in this case dnsmasq won't have an explicit entry for the host, so it will need to forward that DNS query to an upstream DNS server; we'll configure that next in the dnsmasq config.
The mystery of /etc/resolv.conf
So far, so good: with our newfound understanding of /etc/nsswitch.conf, we know we need to update /etc/resolv.conf to point right back to localhost, so that dnsmasq answers any of treebeard's own DNS queries. Let's see what we're dealing with there:
josh@treebeard:~$ cat /etc/resolv.conf
# resolv.conf(5) file generated by tailscale
# For more info, see https://tailscale.com/s/resolvconf-overwrite
# DO NOT EDIT THIS FILE BY HAND -- CHANGES WILL BE OVERWRITTEN
nameserver 100.100.100.100
search tailaf9b3.ts.net tailaf9b3.ts.net
If you've ever cracked open /etc/resolv.conf, or other config files, you've likely seen a similar "DON'T EDIT THIS, YOUR CHANGES WILL DISAPPEAR" admonishment. This awkwardness is inherent when configuration data lives in a single file: in a world where a single superuser is editing that file by hand, there's no issue, but when multiple services (including the user) want to manage it, how can they collaborate? There's no foolproof algorithm to apply certain changes to an arbitrary config and make everyone happy. So in practice, one service tends to commandeer the config (like we're seeing here) and adds a scary warning to give the user a heads-up.
So who exactly is responsible for /etc/resolv.conf right now? Seems like it's Tailscale, given the comment, but running some other ChatGPT-supplied diagnostics shows a couple of other possibilities:
josh@treebeard:~$ sudo which resolvconf
/usr/sbin/resolvconf
# So maybe resolvconf is in charge?
josh@treebeard:~$ systemctl is-active systemd-resolved
active
# ...but systemd also claims to be doing things?
josh@treebeard:~$ file /etc/resolv.conf
/etc/resolv.conf: ASCII text
josh@treebeard:~$ readlink -f /etc/resolv.conf
# ...but also the config isn't a symlink??
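Those diagnostics can be rolled into a rough "who owns this file?" check. It's only a heuristic, assuming the manager either symlinks the file or leaves a "generated by" comment; as treebeard shows, that comment can outlive whoever wrote it. One detail worth knowing: GNU readlink -f prints the canonical path of a regular file too, so [ -L file ] is the less ambiguous symlink test. resolv_conf_owner is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: guess who manages a resolv.conf.
resolv_conf_owner() {
  local f="${1:-/etc/resolv.conf}"
  if [ -L "$f" ]; then
    # A symlink usually points at the manager's output file,
    # e.g. something under /run/systemd/resolve/ or /run/resolvconf/.
    echo "symlink -> $(readlink -f "$f")"
  elif [ -f "$f" ]; then
    # Otherwise, fall back to a "generated by" style header comment.
    local hdr
    hdr=$(grep -m 1 -i 'generated by' "$f")
    echo "regular file${hdr:+: $hdr}"
  else
    echo "missing"
  fi
}
```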
To be honest, I'm still not fully sure what happened here; I can only imagine it was a gruesome and gory battle between daemons, the detritus of which is all that remains for us unlucky viewers.
…but given that /etc/resolv.conf is not currently a symlink, I don't think anything else is going to try to monkey with it? So let's take full control ourselves and see if we get clobbered:
# (After editing in vim)
josh@treebeard:~$ cat /etc/resolv.conf
# Note to self: I want full manual control over this config (as opposed to having it managed by systemd/etc.).
# We're running dnsmasq on this server, so all DNS queries should go there (i.e. localhost).
# We'll configure dnsmasq separately to specify its upstream DNS servers.
nameserver 127.0.0.1
Remember, right now we're telling treebeard which IP address to use for DNS resolution; since dnsmasq is running on treebeard itself, we want localhost to be the nameserver.
But of course, dnsmasq on treebeard won't be able to answer most DNS queries by itself (i.e. it won't know the IP address for cbc.ca without consulting an external source); it will need to forward those to an external DNS server. We specify those nameservers with the server directive in /etc/dnsmasq.conf. Like before, let's use Quad9:
# (After editing in vim)
josh@treebeard:~$ grep ^server /etc/dnsmasq.conf
server=9.9.9.9
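Pulling the dnsmasq side together, the relevant directives might look like the fragment below, written out by a shell function so it's easy to try against a scratch file first. Assumptions: 192.168.8.10 is a hypothetical placeholder for Jellyfin's LAN address (the real one isn't given here), and no-resolv is included so dnsmasq stops reading resolv.conf-style files altogether. Note that address=/domain/ip also matches every subdomain of that domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a dnsmasq config fragment to the given path.
# 192.168.8.10 is a placeholder address, not treebeard's real config.
write_dnsmasq_fragment() {
  cat > "$1" <<'EOF'
# Ignore resolv.conf-style files; only use the servers listed here.
no-resolv
# Forward anything we can't answer ourselves to Quad9.
server=9.9.9.9
# Answer this name locally (split-horizon). Placeholder address.
address=/jellyfin.simpsonian.ca/192.168.8.10
EOF
}

# Usage: write_dnsmasq_fragment /tmp/dnsmasq-fragment.conf
```

After copying lines like these into /etc/dnsmasq.conf, running dnsmasq --test checks the syntax before restarting the service.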
A /runner in the night
Earlier, we saw that treebeard was getting its upstream DNS servers from /run/dnsmasq/resolv.conf. By hand-editing that file to contain only the single nameserver I want, we were able to get things working. But that's not a long-term solution, because something keeps recreating /run/dnsmasq/resolv.conf every time I restart dnsmasq, which interferes with our lovingly crafted /etc/dnsmasq.conf…
josh@treebeard:~$ sudo systemctl restart dnsmasq
josh@treebeard:~$ ls -l /var/run/dnsmasq/resolv.conf
-rw-r--r-- 1 root root 52 Jun 19 12:42 /var/run/dnsmasq/resolv.conf
josh@treebeard:~$ sudo rm /var/run/dnsmasq/resolv.conf
josh@treebeard:~$ ls -l /var/run/dnsmasq/resolv.conf
ls: cannot access '/var/run/dnsmasq/resolv.conf': No such file or directory
josh@treebeard:~$ sudo systemctl restart dnsmasq
josh@treebeard:~$ ls -l /var/run/dnsmasq/resolv.conf
-rw-r--r-- 1 root root 52 Jun 19 12:42 /var/run/dnsmasq/resolv.conf
This isn't a practical problem, because adding no-resolv to /etc/dnsmasq.conf prevents dnsmasq from reading /var/run/dnsmasq/resolv.conf, but I want to get to the bottom of this: what's creating that file?
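One low-tech way to at least catch *when* the file appears, without auditd or inotify tooling, is to poll for it and then line the timestamp up with journalctl -u dnsmasq. This is a swapped-in technique, not one of ChatGPT's suggestions, and a minimal sketch: wait_for_file is a hypothetical helper that checks every 0.1 s (fractional sleep works with GNU coreutils), and it tells you when the file showed up, not which process created it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until PATH exists, checking every 0.1 s,
# for at most TIMEOUT seconds (default 30). Prints a timestamp when
# the file appears; returns 1 on timeout.
wait_for_file() {
  local path="$1" timeout="${2:-30}" waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$((timeout * 10))" ]; then
      return 1
    fi
    sleep 0.1
    waited=$((waited + 1))
  done
  echo "$(date +%T) created: $path"
}

# Usage, in one terminal:  wait_for_file /run/dnsmasq/resolv.conf 60
# and in another:          sudo systemctl restart dnsmasq
```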
I know I can use lsof to show files being held open by processes, but my guess is that whatever creates this exits immediately, so lsof won't spot it. (And if the main dnsmasq process is holding it open after reading it, that's not much help either.) How can we set up some kind of "monitor" to catch the file creation? ChatGPT suggests either auditd or inotifyctl, but I don't have either of them installed. Before trying those, let's look at how dnsmasq is specifically being invoked on treebeard by peeking at the systemd unit file:
josh@treebeard:~$ systemctl cat dnsmasq
# /lib/systemd/system/dnsmasq.service
[Unit]
Description=dnsmasq - A lightweight DHCP and caching DNS server
Requires=network.target
Wants=nss-lookup.target
Before=nss-lookup.target
After=network.target
[Service]
Type=forking
PIDFile=/run/dnsmasq/dnsmasq.pid
# Test the config file and refuse starting if it is not valid.
ExecStartPre=/etc/init.d/dnsmasq checkconfig
# We run dnsmasq via the /etc/init.d/dnsmasq script which acts as a
# wrapper picking up extra configuration files and then execs dnsmasq
# itself, when called with the "systemd-exec" function.
ExecStart=/etc/init.d/dnsmasq systemd-exec
# The systemd-*-resolvconf functions configure (and deconfigure)
# resolvconf to work with the dnsmasq DNS server. They're called like
# this to get correct error handling (ie don't start-resolvconf if the
# dnsmasq daemon fails to start).
ExecStartPost=/etc/init.d/dnsmasq systemd-start-resolvconf
ExecStop=/etc/init.d/dnsmasq systemd-stop-resolvconf
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Hmm, these lines seem pretty suspicious:
# The systemd-*-resolvconf functions configure (and deconfigure)
# resolvconf to work with the dnsmasq DNS server. They're called like
# this to get correct error handling (ie don't start-resolvconf if the
# dnsmasq daemon fails to start).
ExecStartPost=/etc/init.d/dnsmasq systemd-start-resolvconf
ExecStop=/etc/init.d/dnsmasq systemd-stop-resolvconf
Snooping around in /etc/init.d/dnsmasq gives us all the details:
josh@treebeard:~$ grep -m 1 -A 18 "RESOLV_CONF" /etc/init.d/dnsmasq
# RESOLV_CONF:
# If the resolvconf package is installed then use the resolv conf file
# that it provides as the default. Otherwise use /etc/resolv.conf as
# the default.
#
# If IGNORE_RESOLVCONF is set in /etc/default/dnsmasq or an explicit
# filename is set there then this inhibits the use of the resolvconf-provided
# information.
#
# Note that if the resolvconf package is installed it is not possible to
# override it just by configuration in /etc/dnsmasq.conf, it is necessary
# to set IGNORE_RESOLVCONF=yes in /etc/default/dnsmasq.
if [ ! "${RESOLV_CONF}" ] &&
[ "${IGNORE_RESOLVCONF}" != "yes" ] &&
[ -x /sbin/resolvconf ]
then
RESOLV_CONF=/run/dnsmasq/resolv.conf
fi
josh@treebeard:~$ grep -A 18 "start_resolvconf()" /etc/init.d/dnsmasq
start_resolvconf()
{
# If interface "lo" is explicitly disabled in /etc/default/dnsmasq
# Then dnsmasq won't be providing local DNS, so don't add it to
# the resolvconf server set.
for interface in ${DNSMASQ_EXCEPT}; do
[ ${interface} = lo ] && return
done
# Also skip this if DNS functionality is disabled in /etc/dnsmasq.conf
if grep -qs '^port=0' /etc/dnsmasq.conf; then
return
fi
if [ -x /sbin/resolvconf ] ; then
echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.${NAME}${INSTANCE:+.${INSTANCE}}
fi
return 0
}
So it looks like dnsmasq sets RESOLV_CONF=/run/dnsmasq/resolv.conf, then calls resolvconf (if available) to fill out that config file. That makes some sense to me: dnsmasq recognizes that resolvconf should be the "canonical" source for this information, and so dnsmasq intentionally defers to resolvconf.
But where does resolvconf get its configs?? ChatGPT points me to /run/resolvconf/interface, which indeed seems to be the place:
josh@treebeard:~$ for FILE in /run/resolvconf/interface/*; do echo "$FILE"; cat "$FILE"; done
/run/resolvconf/interface/eth0.dhclient
domain lan
nameserver 192.168.8.250
/run/resolvconf/interface/lo.dnsmasq
nameserver 127.0.0.1
/run/resolvconf/interface/systemd-resolved
nameserver 100.100.100.100
search tailaf9b3.ts.net
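Comparing those interface files with the /run/dnsmasq/resolv.conf we saw earlier (192.168.8.250, then 100.100.100.100) suggests a simple rule: collect the nameservers from every interface file except dnsmasq's own lo.dnsmasq entry, since dnsmasq shouldn't forward to itself. The sketch below models that rule; it's my inference, not resolvconf's actual code, which (as far as I know) orders interfaces via /etc/resolvconf/interface-order rather than alphabetically. merge_interface_nameservers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not resolvconf's real implementation):
# merge nameservers from per-interface files, skipping lo.dnsmasq and
# dropping duplicates while keeping first-seen order.
merge_interface_nameservers() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # also handles an empty directory
    case "$(basename "$f")" in
      lo.dnsmasq) continue ;;          # dnsmasq's own 127.0.0.1 entry
    esac
    awk '$1 == "nameserver" && NF >= 2 { print $2 }' "$f"
  done | awk '!seen[$0]++'
}

# Usage: merge_interface_nameservers /run/resolvconf/interface
```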
We could go even deeper here (how were those configs created?), but my curiosity is satiated for today. Let's disable those ExecStartPost and ExecStop directives we saw in the unit file earlier and lay /run/dnsmasq/resolv.conf to rest once and for all. Manually editing /lib/systemd/system/dnsmasq.service seems unhygienic to me; ChatGPT redirects me to systemctl edit dnsmasq, which instead creates an override file. Neat!
josh@treebeard:~$ systemctl cat dnsmasq | grep -E 'Exec(StartPost|Stop)'
ExecStartPost=/etc/init.d/dnsmasq systemd-start-resolvconf
ExecStop=/etc/init.d/dnsmasq systemd-stop-resolvconf
josh@treebeard:~$ sudo systemctl edit dnsmasq.service
# Add the following override in the editor:
# [Service]
# ExecStartPost=
# ExecStop=
josh@treebeard:~$ sudo systemctl daemon-reexec
josh@treebeard:~$ sudo systemctl daemon-reload
josh@treebeard:~$ sudo systemctl restart dnsmasq
# Unsure if I needed _all_ of those; copy-pasting from the AI...
josh@treebeard:~$ systemctl cat dnsmasq | grep -E 'Exec(StartPost|Stop)'
ExecStartPost=/etc/init.d/dnsmasq systemd-start-resolvconf
ExecStop=/etc/init.d/dnsmasq systemd-stop-resolvconf
ExecStartPost=
ExecStop=
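For reference, systemctl edit dnsmasq just writes a drop-in file (by default /etc/systemd/system/dnsmasq.service.d/override.conf). The sketch below writes the same override non-interactively, taking the unit directory as a parameter so it can be tried in a scratch directory; write_dnsmasq_override is a hypothetical name. Assigning an empty value to ExecStartPost= or ExecStop= resets the list inherited from the main unit file. As for the copy-pasted commands: as far as I know, systemctl edit reloads unit files itself once the editor exits, and daemon-reexec (which re-executes systemd) isn't needed for unit changes; daemon-reload is what picks up a hand-written drop-in like this one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the dnsmasq drop-in that clears the
# resolvconf hooks. Empty assignments reset the inherited command lists.
write_dnsmasq_override() {
  local unit_dir="${1:-/etc/systemd/system}"
  mkdir -p "$unit_dir/dnsmasq.service.d"
  cat > "$unit_dir/dnsmasq.service.d/override.conf" <<'EOF'
[Service]
ExecStartPost=
ExecStop=
EOF
}

# Usage (as root):
#   write_dnsmasq_override && systemctl daemon-reload && systemctl restart dnsmasq
```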
And now for the moment of truth:
josh@treebeard:~$ ls -l /run/dnsmasq/resolv.conf
-rw-r--r-- 1 root root 52 Jun 19 12:42 /run/dnsmasq/resolv.conf
josh@treebeard:~$ sudo rm /run/dnsmasq/resolv.conf
josh@treebeard:~$ ls -l /run/dnsmasq/resolv.conf
ls: cannot access '/run/dnsmasq/resolv.conf': No such file or directory
josh@treebeard:~$ sudo systemctl restart dnsmasq
josh@treebeard:~$ ls -l /run/dnsmasq/resolv.conf
ls: cannot access '/run/dnsmasq/resolv.conf': No such file or directory
# Success! File was not created after restarting the service.
🥳
Summary
Wow, that took longer than expected. But the end result is exactly what we wanted: everything on my local network uses treebeard to resolve DNS queries (including treebeard itself). dnsmasq on treebeard is configured to resolve most *.simpsonian.ca queries by itself, and anything it can't resolve gets routed upstream to 9.9.9.9. Tailscale's MagicDNS is disabled on treebeard; with everything we've learned, I'm sure I could get it to play nicely with the rest of our configurations, but I wasn't using MagicDNS anyway, so let's leave it off.
Root causes
So, what was the actual inciting incident that led to all this gnashing of teeth? Well, I live in a fairly old apartment building (without central AC), and I was hosting some visitors from out of town. Accordingly, I had a portable AC unit running and started to take care of the vacuuming. But as it turns out, a 15-amp circuit can't support a refrigerator, portable AC, vacuum, and all my home electronics. Not for long, anyways.
Specifically, here's my best guess at the causal pathway:
- I update Tailscale over the course of months without actually ever restarting it.
- One day, I run the vacuum cleaner on a busy circuit, causing a fuse to blow.
- All my home electronics violently lose power.
- As a side effect, treebeard is restarted, and there's something wrong with the resulting mess of DNS-related config files post-reboot.
  - Oh also, I think I might've partially fried my old router; that started showing double-digit packet loss on the internal network too. But that's a story for another day…
So like I said: my vacuum cleaner killed my WiFi. But at least now I know how to fix it.
Until next time, may all your DNS queries resolve successfully.
Addendum
For the brave souls that made it through that great slog, allow me to reward you with a story this whole saga reminded me of. Many moons ago, when I was still fresh-faced and full of zeal, I did an internship at Facebook. When something went bigly wrong at Facebook, the policy was to write up a postmortem detailing exactly what 'sploded, and how it came to pass. Well, one day some trees fell and severed some cables at a data centre, leading to a partial outage and this all-time great example of dry wit:
SEV-551 Postmortem
Description: a number of trees fell, severing all links to a data centre, causing 24% of all traffic to be dropped for 46 minutes.
Root cause: yes.
[1] Technically it's the exponentially weighted moving average of that count, computed over the past 1, 5, and 15 minutes.