Resolving ‘host not found in upstream’ Error in Nginx with Docker Swarm Services

Are you running an nginx reverse proxy in Docker Swarm and finding that nginx itself refuses to start whenever one of the proxied services is missing? This usually happens because the hostname of that service cannot be resolved when nginx loads its configuration. Here’s how to keep the proxy running even when some services cannot be found:

Understanding the Issue

Nginx resolves the hostnames in proxy_pass and upstream server directives once, while loading its configuration at startup or reload. If a name cannot be resolved at that moment, for example because the Docker service behind it is not deployed, the configuration is rejected and nginx does not start. The error takes the form ‘host not found in upstream “service_name” in <file>:<line>’. Here’s an example:

# Example from error log (simplified)
reverse-proxy_stack_nginx-reverse-proxy[...] nginx: [emerg] host not found in upstream "uptime-kuma" in /data/nginx/proxy_host/5.conf:54

In this situation, nginx refuses to start because the name uptime-kuma cannot be resolved, whether through misconfiguration or because that service is down in your swarm. Every other proxied host goes offline with it.
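For illustration, a proxy host of roughly the following shape is enough to trigger the error. This is a minimal sketch, not the actual contents of 5.conf: the server_name and the port are assumptions.

```nginx
# Fails at startup if "uptime-kuma" does not resolve:
# nginx looks up a literal hostname in proxy_pass while loading the config.
server {
    listen 80;
    server_name kuma.example.com;   # hypothetical hostname

    location / {
        proxy_pass http://uptime-kuma:3001;   # port is an assumption
    }
}
```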

Solution Strategy

To keep one missing service from taking down the whole proxy, have nginx resolve upstream hostnames at request time instead of at startup. This takes two pieces: a resolver directive that tells nginx which DNS server to query, and a proxy_pass whose target is held in a variable, which makes nginx defer the lookup until a request arrives. A missing service then produces a 502 for that host only, while the other hosts keep working. Here’s how you do this with jc21/nginx-proxy-manager:

Steps for Dynamic DNS Configuration:

Set up a resolver in your nginx configuration, specifying which DNS server nginx should query at runtime (Docker’s embedded DNS or, on Kubernetes, the cluster DNS service). For example:
- In Docker Swarm, containers attached to a user-defined or overlay network reach Docker’s embedded DNS server at 127.0.0.11, port 53. A local dnsmasq listening on 127.0.0.1:53 is configured the same way. The directive can be kept in a snippet file such as /etc/nginx/snippets/dynamic_resolvers.

server {
 resolver 127.0.0.11 valid=10s; # Docker's embedded DNS; cache each answer for 10 seconds
}
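One way to lay this out is sketched below. The file name comes from the step above; the timeout, the valid and ipv6 values, and where the file is included are assumptions to adapt.

```nginx
# /etc/nginx/snippets/dynamic_resolvers
# Docker's embedded DNS server, reachable from containers attached to a
# user-defined or overlay network.
resolver 127.0.0.11 valid=10s ipv6=off;
resolver_timeout 5s;   # give up on a lookup after 5 seconds

# Then, in each server block that needs runtime lookups:
# include /etc/nginx/snippets/dynamic_resolvers;
```

Keeping the directive in one file means the DNS address only has to change in one place if the proxy later moves to a different environment.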

If using Kubernetes services (kube-dns or CoreDNS), point the resolver at the cluster DNS service instead, for example 10.32.0.1:53. The actual address comes from your cluster’s internal DNS configuration. These settings could live in a file such as /etc/nginx/resolver.

server {
 resolver 10.32.0.1:53 valid=10s; # Cluster DNS (kube-dns/CoreDNS); use your cluster's address
 # The resolver ignores search domains, so use the full service name (cluster.local assumed).
 set $upstream dyno-service.default.svc.cluster.local;
 location / {
  proxy_pass http://$upstream; # Variable target: resolved per request, not at startup
 }
}

Modify your proxy definitions to refer to these variables:
- This change is essential. Hostnames listed on server lines inside an upstream block are literal names resolved at startup, and variables are not accepted there, so any service that may be absent has to move out of the upstream block and into a variable used by proxy_pass. The backup flag is not available for a variable target. If uptime-kuma then stops resolving, nginx keeps running and continues routing traffic for the other services. Here’s a simplified snippet of how your config might look after this change:

upstream stack_nginx {
 server web; # Static entry: resolved once at startup, so "web" must exist when nginx starts
}
server {
 resolver 127.0.0.11 valid=10s; # Docker's embedded DNS
 set $dyno_service1 uptime-kuma; # Underscores only: hyphens are not valid in variable names
 location / { proxy_pass http://$dyno_service1:3001; } # Resolved per request; 3001 is an assumed port
}

With these changes in place alongside jc21/nginx-proxy-manager, nginx keeps proxying requests for the services that are up even when an individual service is missing, giving you a more fault-tolerant reverse proxy in Docker Swarm.
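Putting the pieces together, a complete proxy host might look like the sketch below. The hostname, the port, and the plain-text fallback response are assumptions for illustration, not something nginx-proxy-manager generates by itself.

```nginx
server {
    listen 80;
    server_name kuma.example.com;            # hypothetical hostname

    # Docker's embedded DNS; answers are cached for 10 seconds.
    resolver 127.0.0.11 valid=10s ipv6=off;

    location / {
        # Holding the target in a variable defers the DNS lookup to request time.
        set $backend http://uptime-kuma:3001;   # port 3001 is an assumption
        proxy_pass $backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # If the name cannot be resolved or the service is down, answer
        # from the fallback location instead of a bare gateway error.
        error_page 502 504 = @service_down;
    }

    location @service_down {
        default_type text/plain;
        return 503 "Service temporarily unavailable\n";
    }
}
```

The fallback location is optional. Without it, requests for the missing service simply receive the gateway error, and the other hosts are unaffected either way.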