Kubernetes worker node unable to connect to POD network CIDR after reboot
It sounds like your Kubernetes Worker3 node's pods on one subnet range (10.244.2.x) had no connectivity, while addresses in the other Pod subnet (10.244.1.x) were reachable. After flushing iptables, resetting the kubeadm configuration, and re-joining the node as part of troubleshooting failed to help, you discovered that deleting all the existing "canal" pods fixed it, because Kubernetes automatically recreated them and they rebuilt their network configuration.
This suggests there were stale static routes or IPAM state: routing entries left over from before the reboot that no longer matched what the cluster expected, possibly because of an on-premises change (for example a third-party network bridge service) that your Kubernetes networking setup was not accounting for.
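For future reference, the canal-pod recreation you ended up doing can be done in one step. This is only a sketch: it assumes the canal DaemonSet pods live in kube-system with the usual k8s-app=canal label and that the affected node is named worker3; adjust both to your cluster.
kubectl -n kube-system get pods -l k8s-app=canal -o wide # confirm which canal pod runs on the affected node
kubectl -n kube-system delete pods -l k8s-app=canal --field-selector spec.nodeName=worker3 # the DaemonSet recreates it, and the new pod rewrites its routes and iptables rules
Because the pods belong to a DaemonSet they come straight back, but expect a brief interruption of pod networking on that node while the CNI containers restart.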
Here are steps you can take going forward:
- Check the route table manually: make sure the routes on the node correspond to what should have been assigned by looking at ip route, rather than relying only on kubeadm output, which may not always match reality (for example with Pod CIDR assignments). On iproute2-based systems you can also run ip rule show; on others use netstat -rn or whatever routing utility your system provides.
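As a minimal sketch for a canal/flannel setup with the 10.244.0.0/16 Pod network from your description (interface names such as cni0 and flannel.1 are the common defaults and may differ):
ip route show | grep 10.244 # expect one route per remote node's Pod CIDR (e.g. via flannel.1) plus the local cni0 subnet
ip -br addr | grep -E 'cni0|flannel' # the interfaces canal/flannel normally creates; a missing or stale one explains dead routes
If a Pod CIDR such as 10.244.2.0/24 has no route at all, or its route points at an interface that no longer exists, that matches the symptom you saw.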
- Verify the Kubernetes-side network configuration: check that your Services carry the expected cluster IPs and ports with kubectl describe service <service-name> or kubectl get svc --namespace <namespace> -o wide, which can often clarify node-to-node communication issues once namespace- or application-level resources have been cleaned up. Also run kubectl describe node <node-name> for a fuller picture of the Pod CIDR and other IP ranges assigned to that specific worker or master, including ExternalIPs if they exist, which can matter in multi-host networking scenarios or with cloud providers (though that seems less likely here).
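For example, you can compare the Pod CIDR the control plane allocated to each node against what pods on that node actually receive; the node name worker3 is taken from your description:
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR # per-node allocations, e.g. 10.244.2.0/24
kubectl describe node worker3 | grep -i -A1 podcidr # the same allocation plus related fields for one node
If the allocation shown here differs from the subnet that pods on the node are actually getting, the CNI state on that node is stale.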
- Review pod resources and logging: look at the logs of your pods' containers, especially for network-related errors that might not be apparent from routing checks alone; these can provide clues about why certain packets fail while others do not (for example missing required resources or misconfigurations). Use kubectl logs <pod-name> --namespace=<namespace>; depending on your cluster setup and what you want to extract, you may also need kubectl exec to run checks from inside the pod. Logs can be verbose, so limit the output with flags such as --tail or --since.
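As a concrete example for a canal-based cluster (the k8s-app=canal label and the calico-node / kube-flannel container names are the usual ones from the canal manifests and may differ in yours):
kubectl -n kube-system get pods -l k8s-app=canal -o wide # find the canal pod on the affected node
kubectl -n kube-system logs <canal-pod-name> -c calico-node --tail=100 # the policy/felix side
kubectl -n kube-system logs <canal-pod-name> -c kube-flannel --tail=100 # the flannel side, which manages the 10.244.x.x routes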
- Consult the cluster network documentation: read the documentation specific to your Kubernetes network plugin, especially Calico (or canal) in this case; it is possible that CNI configurations you have not considered yet are contributing to the problem, and they can be complex (e.g. calicoctl and similar tools).
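If you want to inspect the CNI side directly, these are common starting points; the file name below is the usual canal default, and calicoctl is a separate binary you may or may not have installed:
cat /etc/cni/net.d/10-canal.conflist # the CNI config the kubelet hands to the plugin; check the plugin types and subnet settings
calicoctl node status # peering/health overview; mostly relevant for pure Calico, since canal leaves routing to flannel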
- Check DHCP reservations: if your cluster nodes use dynamic IP assignment, make sure no conflicting reservations in the router settings interfere with Kubernetes' network assignments after a reboot or cleanup (though this typically wouldn't involve flushing iptables).
- Restart services and retest: after making changes based on these suggestions, restart the relevant services so they take effect in the cluster networking stack, then perform another round of pings or traceroutes (to trace where packets fail) before retrying kubeadm join with a fresh configuration.
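A typical restart-and-retest sequence on the affected worker might look like the following; the service name assumes a systemd-based kubeadm install, and 10.244.1.10 is just a stand-in for a pod IP on another node (pick a real one from kubectl get pods -o wide):
sudo systemctl restart kubelet # picks up CNI config changes; restart your container runtime too if you changed its configuration
ping -c 3 10.244.1.10 # a pod IP in the range that previously worked
traceroute 10.244.1.10 # if the ping fails, this shows the hop where packets stop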
- Increase observability: consider setting up Prometheus and Grafana, along with Fluentd for logging; their visualization makes them particularly helpful for diagnosing issues that are not immediately apparent from command-line tools alone, and makes it easier to spot trends or irregular behaviour in your networking setup.
If the issue persists after all these steps, consider posting again with more specific details (perhaps sharing the relevant parts of your Kubernetes and network configuration, with anything sensitive redacted) so that further guidance can be based on the exact configuration at play rather than broad guesses that might not apply to your particular scenario.