Recovering the Portainer Deployment and Its PVC After a Node Failure in a K3s Cluster
In a home-lab k3s cluster with multiple nodes (k8s-node-001 as the control plane, k8s-node-002 and others as workers), an unexpected node failure occurred. The affected worker (originally named k8s-node-003) had to be forcefully removed due to a disk issue, leaving the associated PersistentVolumeClaim (the portainer PVC) stranded.
Initial steps:
- Deleted the portainer Deployment with kubectl, which left its pods unschedulable (the target node was gone) and produced errors indicating that persistentvolumeclaim "portainer" was unavailable.
- Attempted to reinstate Portainer by scaling its Deployment down to zero replicas and back up to a single replica (see the command sketch after this list), without success: the PVC could not be found after the node removal.
- Recognized that manual intervention was required rather than waiting for automatic reconciliation, since the previous attempts neither restored the lost data nor corrected the configuration left behind by the node failure.
- Considered re-running helm with the usual flags (--create-namespace, --set) to push an updated configuration (see the hedged helm sketch after this list), but scheduling issues persisted after the PVC was deleted and reinstated, pointing to a deeper misconfiguration, possibly in affinity rules or storage-class settings within the portainer namespace.
- Resolved the problem by editing Portainer's Deployment configuration directly, the export-edit-apply equivalent of kubectl edit:
  kubectl get deploy portainer -n portainer -o yaml > tempfile.yaml
  # edit tempfile.yaml: change spec.template.spec.nodeName from the old hostname to a current healthy node (see the manifest excerpt after this list)
  kubectl apply -f tempfile.yaml
- After re-applying the edited configuration, the issue was resolved: Portainer was scheduled properly again, with no further PVC or node-failure problems affecting the deployment.
- Conclusion: Directly editing and re-applying a Deployment's configuration in its namespace (here, portainer) can be an effective troubleshooting strategy when automated approaches such as helm fall short, particularly after node failures where manual edits are needed to realign the Deployment with the cluster's current state.
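For reference, a minimal sketch of the scale-down/scale-up attempt mentioned above, assuming the portainer Deployment lives in the portainer namespace as in this cluster:

  kubectl -n portainer scale deploy portainer --replicas=0
  kubectl -n portainer scale deploy portainer --replicas=1
  kubectl -n portainer get pods,pvc   # check whether the pod schedules and the PVC exists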
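A sketch of the helm reinstall that was considered. The repository URL and the --set key are assumptions about the Portainer chart rather than values taken from this cluster; check the chart's documented values before using them:

  helm repo add portainer https://portainer.github.io/k8s/    # assumed repo URL
  helm upgrade --install portainer portainer/portainer \
    -n portainer --create-namespace \
    --set persistence.existingClaim=portainer    # hypothetical value key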
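And an illustrative excerpt of tempfile.yaml showing the field changed during the manual edit; the node names are the ones from this setup:

  # excerpt of the Deployment manifest
  spec:
    template:
      spec:
        nodeName: k8s-node-002   # was k8s-node-003; point at a currently healthy worker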
Key Takeaways: When nodes are removed unexpectedly, manual configuration tweaks can often resolve PVC-unavailability and scheduling errors, especially for self-hosted deployments on platforms like k3s where direct control over namespace and workload settings is available.