
Image Pull Failure on Private Registry: Solution for Kubernetes Workers & Master using Nutanix Engine in AWS EKS Cluster

Overview:

I encountered an issue where AWX pods were failing to pull images from a Docker-hosted private registry on my setup. I could push and pull my custom AWX Execution Environment (EE) image with the docker CLI, but the Kubernetes workers in the AWS EKS cluster managed by Nutanix Container Engine for Containers could not pull from the same repository. The pods ended up in ImagePullBackOff because certificate validation against the registry failed.

Problem Statement:

The pods failed with an image pull error whose message included “Failed to resolve reference”, even though the private repository itself appeared to be configured correctly. The relevant log lines (simplified for brevity):

failed to pull image "awx-repo.mydomain.com/wildcard:latest" with pkg type IMAGE, as failed to do request: Head "https://<AWX_REPO>/v2/manifests/latest"
Failed to resolve reference for wildcard certificate (DOCKERHUB) in the registry's manifest. This is often caused by an incorrect or missing intermediate certification authority file from your local machine, which should be present as `/etc/containerd/certificates.d/` on both master and workers nodes

The challenge was to get Kubernetes to pull images from the same private registry with the same credentials, without changing my existing Docker CLI workflow or the way AWX EE instances are deployed in the AWS EKS cluster managed by Nutanix Container Engine for Containers.
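Before changing anything on the nodes, it can help to confirm that the failure really is a trust problem. The following is a minimal Python sketch, not part of the original troubleshooting, that attempts a verified TLS handshake against the registry. The function name `check_registry_tls` and its parameters are hypothetical. With `cafile=None` it uses the system trust store of the machine it runs on, which is not necessarily the same set of certificates that containerd consults.

```python
import socket
import ssl


def check_registry_tls(host, port=443, cafile=None, timeout=5.0):
    """Attempt a verified TLS handshake and report whether the chain is trusted.

    cafile=None uses the system trust store; pass a PEM bundle to trust only
    the certificates in that file. Returns a tuple (ok, detail).
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"handshake ok, protocol {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain or the hostname cannot be validated.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, or another TLS error.
        return False, f"connection error: {exc}"
```

Run on a worker node, `check_registry_tls("awx-repo.mydomain.com")` (the hostname from the log excerpt) returning a verification failure would point at a missing intermediate or CA certificate rather than at credentials. Passing a candidate bundle through `cafile` shows whether that bundle is enough to validate the registry.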

Resolution:

After testing, I found that the Kubernetes worker and master nodes needed updated certificates so that they could validate the Docker-hosted private registry's X.509 certificate chain (intermediate + wildcard certificate). Here’s how it was resolved on Nutanix Container Engine for Containers:

  1. Generating .pem Certificate Chain - I combined the intermediate certificate and the wildcard SSL certificate into a single .pem file, forming a complete trust chain that the worker and master nodes can use when talking to the Docker registry.
  2. Converting PEM Files into Format Required By Tooling - I converted these .pem files into the certificate format that my Nutanix Container Engine tool suite requires for the EKS cluster setup on AWS.
  3. Updating Certificates in Containerd Directory: /etc/containerd/ (default) - I installed the resulting certificates into the certs.d directory for my private repository on each worker and master node, so that image pulls requested by the kubelet could validate the registry's certificate.
  4. Successful Pull Operation Post Resolution: ImagePulled Without Error (DOCKERHUB) - After the certificates were updated, the AWX pods pulled images from the private registry as expected, with no further certificate validation or image pull errors.
  5. Crucial Takeaway: Certificate Configuration for Private Registries in EKS Cluster - The root cause was incorrect or missing certificates on my Kubernetes nodes, not the Docker registry configuration itself (e.g., the wildcard certificate). Every worker and master node needs an up-to-date, valid certificate chain for image pulls from private repositories to succeed.
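Steps 1 and 3 can be sketched in Python as follows. This is an illustration under assumptions, not the exact procedure used above. The helper names (`build_chain`, `render_hosts_toml`, `install_registry_trust`) and the file name `ca.crt` are hypothetical. The `hosts.toml` layout follows containerd's per-registry host configuration, and the Nutanix-specific conversion in step 2 is not covered.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def build_chain(*pem_texts):
    """Concatenate PEM certificates into one bundle, in the order given."""
    blocks = []
    for text in pem_texts:
        text = text.strip()
        if PEM_BEGIN not in text or PEM_END not in text:
            raise ValueError("input does not look like a PEM certificate")
        blocks.append(text)
    return "\n".join(blocks) + "\n"


def render_hosts_toml(registry_host, ca_path):
    """Render a containerd hosts.toml that trusts ca_path for registry_host."""
    server = f"https://{registry_host}"
    return (
        f'server = "{server}"\n'
        "\n"
        f'[host."{server}"]\n'
        '  capabilities = ["pull", "resolve"]\n'
        f'  ca = "{ca_path}"\n'
    )


def install_registry_trust(certs_root, registry_host, chain_pem):
    """Write <certs_root>/<registry_host>/ca.crt and hosts.toml.

    certs_root would be /etc/containerd/certs.d on a node (assumption);
    returns the per-registry directory that was written.
    """
    host_dir = Path(certs_root) / registry_host
    host_dir.mkdir(parents=True, exist_ok=True)
    ca_path = host_dir / "ca.crt"
    ca_path.write_text(chain_pem)
    (host_dir / "hosts.toml").write_text(render_hosts_toml(registry_host, ca_path))
    return host_dir
```

On a node this would be called with the contents of the two certificate files, for example `install_registry_trust("/etc/containerd/certs.d", "awx-repo.mydomain.com", build_chain(wildcard_pem, intermediate_pem))`, and it needs root privileges to write there. containerd only reads this directory when the registry `config_path` setting in its configuration points at it, so that setting is worth checking if pulls still fail.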

For those facing similar issues: make sure the worker and master nodes trust the registry's certificate chain when pulling images from private Docker registries in a Kubernetes environment. The resolution described here is specific to Nutanix Container Engine for Containers within an AWS EKS cluster and may need adapting for other tooling or cloud service providers (e.g., Google Cloud Platform, Azure).

