Tanzu Kubernetes Grid+ getting started – Tips

August 4, 2020 By Corey Dinkens

Tip(s) #1 TKG / Photon OS 3.0 and Private Registry

  • vSphere Integrated Containers / Harbor as private registry (link)
    • An easy-to-deploy private registry that consumes native vSphere resources and integrates easily into an existing environment. It takes roughly 5-10 minutes to deploy a secured Harbor private registry integrated with (in my case) Active Directory. Custom certs can be provided at install time or replaced after install.
  • ErrImagePull: temporary failure in name resolution reg.corp.local
    • Ensure your private registry is reachable on a domain other than .local. The .local suffix is reserved for multicast DNS, and there are known issues with how systemd-resolved handles it.
    • There are some workarounds that involve symlinks to /run/systemd/resolve/resolv.conf, or updating /etc/systemd/resolved.conf to manually add the desired DNS servers; however, they are not officially supported.
  • ErrImagePull: x509: certificate signed by unknown authority
    Your CA-signed cert is not trusted by the Photon OS node. In my case, I was using a wildcard certificate issued by Sectigo.
    1. SSH into the worker node.
      To get the node IP (the node with ‘md’ in the name is the worker):
      kubectl get nodes -o wide
    2. Update all packages:
      tdnf update

      Or, to update only the root CA package:
      tdnf upgrade ca-certificates
    3. Place a copy of the CA certificate chain in /etc/ssl/certs/priv-registry-chain.crt (you can use scp, or just paste the chain contents into a file with vi), then append it to the CA bundle:
      cat /etc/ssl/certs/priv-registry-chain.crt >> /etc/pki/tls/certs/ca-bundle.crt
    4. Execute the following to rehash certs:
    5. And finally, execute:
      systemctl restart containerd kubelet
  • ErrImagePull: Pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
    • I am not sure whether this is a new bug or just bad input on my part, but I found prior art regarding issues with loading a Docker config.json as a kubernetes.io/dockerconfigjson object. Using the commands in that post, I determined that the auth fields were missing from the generated secret.
      kubectl get secret {secret-name} --output=json | jq ".data[]" -r | base64 --decode

      Correct format with auth field:

      Incorrect format; notice the missing auth field:
       { "auths": { "privreg.corp.com": {}, "privreg.corp.com:443": {} }, "HttpHeaders": { "User-Agent": "Docker-Client/19.03.12 (windows)" }, "credsStore": "desktop", "experimental": "disabled", "stackOrchestrator": "swarm"

      As I was editing this, I realized this must be a bug, as the decoded config is not even a complete JSON definition.
    • My resolution was to manually create a harbor-registry secret using the following:
      kubectl create secret docker-registry {secret-name} --docker-email={email} --docker-server={private registry fqdn} --docker-username={username} --docker-password={password}
  • Debugging image pulls on a node
    • SSH into the worker node.
    • Use the following to manually pull an image to the node, adjusting it to your needs:
      ctr --debug image pull -u {username}:{password} privreg.corp.com:443/project/{imagename}:{version}
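
For the insufficient_scope item above, it can help to know what a usable registry entry looks like once the secret is decoded. A secret created with kubectl create secret docker-registry stores an auth value that is simply base64 of username:password. The sketch below rebuilds that value and checks a decoded config for the field. The helper names registry_auth and has_auth_field, and the user/pass credentials, are hypothetical placeholders, not anything from the original setup. Note that a config.json with credsStore set usually keeps credentials in the OS credential store, which would explain empty auths entries.

```shell
#!/bin/sh
# Hypothetical helper: compute the "auth" value, i.e. base64("username:password").
registry_auth() {
    printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical helper: succeed only if the decoded config contains an "auth" key.
# The closing quote in the pattern keeps it from matching the "auths" key.
has_auth_field() {
    printf '%s' "$1" | grep -q '"auth"[[:space:]]*:'
}

# Placeholder credentials, for illustration only.
good='{"auths":{"privreg.corp.com":{"username":"user","password":"pass","auth":"'"$(registry_auth user pass)"'"}}}'
# Same shape as the incorrect output above: registry entries with no fields.
bad='{"auths":{"privreg.corp.com":{}}}'

has_auth_field "$good" && echo "good: auth present"
has_auth_field "$bad"  || echo "bad: auth missing"
```

To check a real secret, pass the output of the kubectl get secret pipeline above to has_auth_field instead of the sample strings.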

Tip #2 TKG utilizes the containerd runtime

  • This is mostly relevant when you need to troubleshoot a node. I did not realize this at first, which led to some confusion when locating logs and information.
  • ctr can be used to interact with containerd, as demonstrated above.
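
As a starting point for node troubleshooting, the following sketch gathers a few basics from containerd. It assumes you are root on the node, that the node uses systemd (so journalctl is available), and that the kubelet keeps its images and containers in containerd's k8s.io namespace, which is the usual default. The function name node_diag is made up for this example.

```shell
#!/bin/sh
# Sketch: collect basic containerd diagnostics on a node. Run as root.
node_diag() {
    if ! command -v ctr >/dev/null 2>&1; then
        echo "ctr not found; is this a containerd node?" >&2
        return 1
    fi
    # Images pulled for pods (Kubernetes uses the "k8s.io" namespace).
    ctr --namespace k8s.io images ls
    # Containers created on behalf of the kubelet.
    ctr --namespace k8s.io containers ls
    # Recent runtime and kubelet logs from the systemd journal.
    journalctl -u containerd -u kubelet --since "15 minutes ago" --no-pager
}
```

Call node_diag after SSHing into the node; adjust the --since window to match when the problem occurred.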

Tip #3 TKG & Namespaces

  • When deploying a TKG cluster to a specific namespace, be sure to update your manifests to reflect this. You may also want to update your context so that it defaults to that namespace:
    kubectl config set-context --current --namespace=dev

    This will potentially save you from having to re-deploy to the correct namespace 😂
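
To confirm the change took effect before deploying, you can read the namespace back from the current context. This is a minimal sketch: the function name use_namespace is hypothetical, and dev is just the example namespace from above.

```shell
#!/bin/sh
# Sketch: set the current context's default namespace, then print it back.
# Usage: use_namespace [namespace]   (defaults to "dev")
use_namespace() {
    ns="${1:-dev}"
    kubectl config set-context --current --namespace="$ns" || return 1
    # Show the namespace now recorded for the current context.
    kubectl config view --minify --output 'jsonpath={..namespace}'
    echo
}
```

Running use_namespace dev should end by printing the namespace stored in your kubeconfig for the active context.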

Versions Used

vSphere: 6.7u3
kubectl: 1.18
TKG (Tanzu Kubernetes Grid+): 1.1.2
Kubernetes: 1.18.3
VIC (vSphere Integrated Containers): 1.5.5
tkg-cli: 1.1.2