Tanzu Kubernetes Grid+ getting started – Tips
August 4, 2020
Tip(s) #1 TKG / Photon OS 3.0 and Private Registry
- vSphere Integrated Containers / Harbor as private registry (link)
- Easy-to-deploy private registry that consumes native vSphere resources and integrates easily into an existing environment. It takes roughly 5-10 minutes to deploy a secured Harbor private registry integrated with (in my case) Active Directory. Custom certs can be provided at install time, or replaced easily after install.
- ErrImagePull: temporary failure in name resolution reg.corp.local
- Ensure your private registry is reachable on a domain other than .local. There are known issues with systemd-resolved.
- There are some workarounds that involve symlinks to /run/systemd/resolve/resolv.conf, or updating /etc/systemd/resolved.conf to manually add the desired DNS servers; however, they are not officially supported.
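A quick pre-flight check can catch a .local registry name before any node tries to pull from it. The sketch below is only an illustration: `uses_local_domain` is a hypothetical helper name, not part of TKG or Harbor, and it inspects the hostname string only (no DNS lookup is performed).

```shell
# Hypothetical helper (an assumption for illustration, not a TKG tool):
# returns 0 when the host part of a registry address ends in ".local".
uses_local_domain() {
  host=${1%%:*}   # drop an optional :port suffix
  host=${host%.}  # drop a trailing dot, if present
  case "$host" in
    *.local) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example: warn before pointing a cluster at a .local registry.
if uses_local_domain "reg.corp.local"; then
  echo "warning: reg.corp.local is in .local; nodes may fail to resolve it"
fi
```

On the node itself, `getent hosts {registry fqdn}` shows whether the name resolves through the system resolver.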
- ErrImagePull: x509: certificate signed by unknown authority
Your CA-signed cert is not trusted by the Photon OS node. In my case, I was using a wildcard certificate issued by Sectigo.
- SSH into worker node
To get node IP (The node with ‘md’ in the name is the worker):
kubectl get nodes -owide
- Update all packages:
tdnf update
Or update only the root CA package:
tdnf upgrade ca-certificates
- Place copy of CA certificate chain in /etc/ssl/certs/priv-registry-chain.crt (You can use scp or just copy the certificate chain contents into a file with vi)
cat /etc/ssl/certs/priv-registry-chain.crt >> /etc/pki/tls/certs/ca-bundle.crt
- Execute the following to rehash certs:
c_rehash
- And finally, execute:
systemctl restart containerd kubelet
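Because the append step uses `>>`, running it twice leaves two copies of the chain in the bundle. A minimal sketch of a repeat-safe version follows. `append_ca_chain` is a hypothetical helper name, and the duplicate check (matching the first line of certificate data) is a simple heuristic assumed for illustration, not a full certificate comparison.

```shell
# Hypothetical helper: append a CA chain to a trust bundle unless the
# first line of certificate data from the chain is already in the bundle.
append_ca_chain() {
  chain=$1
  bundle=$2
  if [ ! -s "$chain" ]; then
    echo "chain file missing or empty: $chain" >&2
    return 1
  fi
  # First line that is not a BEGIN/END marker (heuristic fingerprint).
  marker=$(grep -v -e '-----' "$chain" | head -n 1)
  if [ -z "$marker" ]; then
    echo "no certificate data found in: $chain" >&2
    return 1
  fi
  if grep -qF -- "$marker" "$bundle" 2>/dev/null; then
    echo "chain already present in $bundle"
  else
    cat "$chain" >> "$bundle"
    echo "chain appended to $bundle"
  fi
}

# Usage on the node, with the paths from the steps above:
#   append_ca_chain /etc/ssl/certs/priv-registry-chain.crt /etc/pki/tls/certs/ca-bundle.crt
```

After appending, the rehash and restart steps above still apply. One way to confirm the result is `openssl s_client -connect {private registry fqdn}:443 -CAfile /etc/pki/tls/certs/ca-bundle.crt </dev/null`, then check the reported verify return code.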
- ErrImagePull: Pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
- Not sure if this is a new bug or just bad input on my part; however, I found prior art regarding issues with loading a Docker config.json as a kubernetes.io/dockerconfigjson object. Using the commands in the post, I was able to determine I was missing the auth fields in the generated secret.
kubectl get secret {secret-name} --output=json | jq -r '.data[]' | base64 --decode
Correct format with auth field:
{"auths":{"a.com":{"username":"user","password":"pass","email":"a@a.com","auth":"dXNlcjpwYXNz"}}}
Incorrect format; notice the missing auth field:
{ "auths": { "privreg.corp.com": {}, "privreg.corp.com:443": {} }, "HttpHeaders": { "User-Agent": "Docker-Client/19.03.12 (windows)" }, "credsStore": "desktop", "experimental": "disabled", "stackOrchestrator": "swarm"
As I was editing this, I realized this must be a bug, as the decoded config is not even a complete JSON definition.
- My resolution was to manually create a harbor-registry secret using the following:
kubectl create secret docker-registry {secret-name} --docker-email={email} --docker-server={private registry fqdn} --docker-username={username} --docker-password={password}
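The auth field in the correct format is the base64 encoding of username:password, so a decoded secret can be checked against a value computed by hand. A minimal sketch, where `registry_auth` is a hypothetical helper name introduced for illustration:

```shell
# Hypothetical helper: build the "auth" value for a dockerconfigjson
# entry, i.e. base64("username:password") on a single line.
registry_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Reproduces the auth value from the correct example above.
registry_auth user pass; echo   # prints dXNlcjpwYXNz
```

If the decoded secret has an empty entry for the registry, as in the incorrect example, there is nothing to compare against and the secret needs to be recreated.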
- Debugging image pulls on a node
- SSH into the worker node
- Use the following to manually pull an image to the node, adjusting it to your needs:
ctr --debug image pull -u {username}:{password} privreg.corp.com:443/project/{imagename}:{version}
Tip #2 TKG utilizes the containerd runtime
- This point is mostly salient when you need to troubleshoot a node. I did not realize this initially, which led to some confusion locating logs and information. ctr can be used to interact with containerd, as demonstrated above.
Tip #3 TKG & Namespaces
- When deploying a TKG cluster to a specific namespace, be sure to update your manifests to reflect this, and possibly update your context to set the default namespace to work in:
kubectl config set-context --current --namespace=dev
This will potentially save you from having to re-deploy to the correct namespace 😂
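To confirm which namespace the current context points at before applying manifests, something like the following can help. This is a sketch: `current_namespace` and the `KUBECTL` override variable are hypothetical names introduced here. The kubectl invocation itself is the standard way to print the namespace of the active context.

```shell
# Hypothetical helper: print the namespace of the current kubectl
# context, falling back to "default" when none is set.
# KUBECTL is an assumed override variable, handy for wrappers or testing.
current_namespace() {
  ns=$("${KUBECTL:-kubectl}" config view --minify --output 'jsonpath={..namespace}')
  printf '%s\n' "${ns:-default}"
}

# Usage: refuse to continue unless the context is set to "dev".
#   [ "$(current_namespace)" = "dev" ] || echo "current namespace is not dev" >&2
```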
Versions Used
vSphere 6.7u3 | kubectl: 1.18 |
TKG (Tanzu Kubernetes Grid+): 1.1.2 | Kubernetes: 1.18.3 |
VIC (vSphere Integrated Containers): 1.5.5 | tkg-cli: 1.1.2 |