Portworx: Failed to load PX filesystem dependencies for kernel

September 22, 2020 By Corey Dinkens

This is a follow-up to my previous post on Architecture considerations for stateful Kubernetes applications and is specific to VMware’s Tanzu Kubernetes Grid (TKG) implementation of Kubernetes. Instead of using an NFS pod to gain RWX (aka ReadWriteMany) access to vSphere volumes, I decided to go a different route.

Important caveat #1 for TKG users: This approach is currently recommended only for test/dev environments. Portworx confirmed that a fix for the kernel headers issue is planned for their v3 release.

Important caveat #2 for TKG/vSphere users: At the time of writing, you could not generate a spec from PX Central for the ‘Essentials’ tier that includes support for creating vSphere volumes, although Portworx confirmed this was on the way. Update: I just checked PX Central and it appears this functionality has since been added.


Portworx Install & Kernel headers

VMware’s vSAN is not currently an option, so while looking for an alternative I eventually stumbled onto Portworx, a Kubernetes storage provider that was created specifically to address multi-node accessible storage pools. This addresses the RWX (ReadWriteMany) issue that I discussed in my previous post. Their free version supports up to 5 nodes and a 5TB storage pool.

After getting Portworx installed, I started checking on the progress and noticed that none of the Portworx pods were starting correctly. What is going on?

List the Portworx pods:

kubectl get pods -l name=portworx -nkube-system

Then inspect pod(s) failing to start:

kubectl describe pod portworx-cbmsl -nkube-system
Events:
  Type     Reason                             Age                    From                                           Message
  ----     ------                             ----                   ----                                           -------
  Normal   PortworxMonitorImagePullInPrgress  56m                    portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Normal   PortworxMonitorImagePullInPrgress  40m                    portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Warning  FileSystemDependency               40m                    portworx, tkg-dev-md-0-565567f9c9-wxz7m  Failed to load PX filesystem dependencies for kernel 4.19.132-1.ph3
  Normal   PortworxMonitorImagePullInPrgress  25m                    portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Normal   PortworxMonitorImagePullInPrgress  9m54s                  portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Warning  Unhealthy                          63s (x11389 over 31h)  kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Readiness probe failed: HTTP probe failed with statuscode: 503

Looks like we found the issue:

Warning FileSystemDependency 40m portworx, tkg-dev-md-0-565567f9c9-wxz7m Failed to load PX filesystem dependencies for kernel 4.19.132-1.ph3

Verify Portworx’s status with pxctl (commands formatted for PowerShell):
$PX_POD=kubectl get pods -l name=portworx -nkube-system -o jsonpath="{.items[0].metadata.name}"
kubectl exec $PX_POD -nkube-system -- /opt/pwx/bin/pxctl status
PX stopped working 12m45.8s ago.  Last status: Failed to load PX filesystem dependencies for kernel 4.19.132-1.ph3
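
For anyone working in Bash rather than PowerShell, a rough equivalent of the two commands above might look like the sketch below. The function names px_pod_name and px_status are made up for illustration; the kubectl arguments are the same ones used above.

```shell
#!/usr/bin/env bash
# Bash sketch of the PowerShell commands above.
# px_pod_name and px_status are illustrative names, not part of Portworx or kubectl.

# Print the name of the first pod carrying the name=portworx label.
px_pod_name() {
  kubectl get pods -l name=portworx -n kube-system \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run "pxctl status" inside that pod.
px_status() {
  local pod
  pod="$(px_pod_name)" || return 1
  kubectl exec "$pod" -n kube-system -- /opt/pwx/bin/pxctl status
}
```

With both functions loaded into the shell, running px_status should print the same status text as the PowerShell version.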

After some digging and searching with Google and tdnf, I determined that the Linux kernel headers were the missing piece. In Photon OS 3 the needed package is linux-devel.

TKG is deployed using certificate auth for SSH access, under the username ‘capv’. I will need to connect to the nodes and see if I can get the dependencies installed.

SSH into the node:
ssh capv@192.168.58.28

Install the kernel headers:

sudo su -
tdnf install linux-devel

Once the kernel headers have been installed, reboot the node.
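
Kernel headers are only useful to a module build when they match the kernel that is actually running, so it can be worth comparing the two once the node is back up. The helper below is a minimal sketch: headers_match_kernel is a hypothetical name, and it assumes that rpm -q linux-devel prints the usual name-version-release.arch string and that uname -r prints the version-release form seen in the error message (for example 4.19.132-1.ph3).

```shell
# Succeeds when the installed linux-devel package matches the given kernel release.
# $1: kernel release, e.g. the output of `uname -r`
# $2: package string, e.g. the output of `rpm -q linux-devel`
# Assumes the package string looks like linux-devel-<release>.<arch>.
headers_match_kernel() {
  case "$2" in
    "linux-devel-$1" | "linux-devel-$1."*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On a node this could be called as headers_match_kernel "$(uname -r)" "$(rpm -q linux-devel)" and the exit status checked; a non-zero status suggests the headers and the running kernel are out of step.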

Confirm Portworx has started:
kubectl describe pod portworx-cbmsl -nkube-system
Events:
  Type     Reason                             Age                     From                                           Message
  ----     ------                             ----                    ----                                           -------
  Normal   PortworxMonitorImagePullInPrgress  44m                     portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Normal   PortworxMonitorImagePullInPrgress  29m                     portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Warning  FileSystemDependency               29m                     portworx, tkg-dev-md-0-565567f9c9-wxz7m  Failed to load PX filesystem dependencies for kernel 4.19.132-1.ph3
  Normal   PortworxMonitorImagePullInPrgress  13m                     portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Warning  Unhealthy                          6m8s (x11569 over 32h)  kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  FailedMount                        3m2s                    kubelet, tkg-dev-md-0-565567f9c9-wxz7m   MountVolume.SetUp failed for volume "px-account-token-z587c" : failed to sync secret cache: timed out waiting for the condition
  Normal   SandboxChanged                     3m1s                    kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                            3m1s                    kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Pulling image "portworx/oci-monitor:2.6.0"
  Normal   Pulled                             3m                      kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Successfully pulled image "portworx/oci-monitor:2.6.0"
  Normal   Pulled                             2m59s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0"
  Normal   Started                            2m59s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Started container portworx
  Normal   Pulling                            2m59s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0"
  Normal   Created                            2m59s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Created container portworx
  Normal   PortworxMonitorImagePullInPrgress  2m58s                   portworx, tkg-dev-md-0-565567f9c9-wxz7m  Portworx image docker.io/portworx/px-essentials:2.6.0 pull and extraction in progress
  Normal   Created                            2m58s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Created container csi-node-driver-registrar
  Normal   Started                            2m58s                   kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Started container csi-node-driver-registrar
  Warning  NodeStateChange                    2m12s                   portworx, tkg-dev-md-0-565567f9c9-wxz7m  Node is not in quorum. Waiting to connect to peer nodes on port 9002.
  Warning  Unhealthy                          63s (x12 over 2m53s)    kubelet, tkg-dev-md-0-565567f9c9-wxz7m   Readiness probe failed: HTTP probe failed with statuscode: 503
  Normal   NodeStartSuccess                   62s                     portworx, tkg-dev-md-0-565567f9c9-wxz7m  PX is ready on this node

Excellent, the pod has started.
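
Rather than re-running describe until NodeStartSuccess shows up, kubectl wait can block until the pods report Ready. This is a sketch only: wait_for_portworx is a hypothetical wrapper name and the 10 minute default timeout is an arbitrary choice.

```shell
# Block until every pod labelled name=portworx reports Ready, or the timeout expires.
# $1: timeout in a form kubectl understands, e.g. 5m (defaults to 10m, an arbitrary choice)
wait_for_portworx() {
  kubectl wait pod -l name=portworx -n kube-system \
    --for=condition=Ready --timeout="${1:-10m}"
}
```

For example, wait_for_portworx 15m returns a non-zero status if any Portworx pod is still not Ready after 15 minutes.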

Verify Portworx status with pxctl:
$PX_POD=kubectl get pods -l name=portworx -nkube-system -o jsonpath="{.items[0].metadata.name}"
kubectl exec $PX_POD -nkube-system -- /opt/pwx/bin/pxctl status
Status: PX is operational
License: PX-Essential (lease renewal in 23h, 49m)
Node ID: {GUID}
        IP: 192.168.58.28 
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       HIGH            raid0           57 GiB  5.0 GiB Online  default default
        Local Storage Devices: 1 device
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sdb2       STORAGE_MEDIUM_MAGNETIC 57 GiB          09 Sep 20 21:56 UTC
        total                   -                       57 GiB
        Cache Devices:
         * No cache devices
        Kvdb Device:
        Device Path     Size
        /dev/sdc        150 GiB
         * Internal kvdb on this node is using this dedicated kvdb device to store its data.
        Journal Device: 
        1       /dev/sdb1       STORAGE_MEDIUM_MAGNETIC
Cluster Summary
        Cluster ID: px-cluster-{GUID}
        Cluster UUID: {GUID}
        Scheduler: kubernetes
        Nodes: 3 node(s) with storage (3 online)
        IP              ID                                      SchedulerNodeName                       StorageNode     Used    Capacity        Status  StorageStatus   Version         Kernel                OS
        192.168.58.28    {GUID}    tkg-dev-md-0-565567f9c9-gvp2k     Yes             5.0 GiB 57 GiB          Online  Up (This node)  2.6.0.0-208389c 4.19.138-2.ph3        VMware Photon OS/Linux
        192.168.58.29    {GUID}    tkg-dev-md-0-565567f9c9-2hd5q     Yes             7.2 GiB 57 GiB          Online  Up              2.6.0.0-208389c 4.19.138-2.ph3        VMware Photon OS/Linux
        192.168.58.21    {GUID}    tkg-dev-md-0-565567f9c9-wxz7m     Yes             5.0 GiB 57 GiB          Online  Up              2.6.0.0-208389c 4.19.145-1.ph3        VMware Photon OS/Linux
        Warnings: 
                 WARNING: Persistent journald logging is not enabled on this node.
Global Storage Pool
        Total Used      :  17 GiB
        Total Capacity  :  171 GiB

Portworx has fully started and can be utilized.
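
The same check can be scripted by reading the pxctl status text. The two helpers below are a sketch that assumes the output keeps the wording shown above ("Status: PX is operational" and "Nodes: N node(s) with storage (M online)"); both function names are made up for illustration.

```shell
# Read `pxctl status` output on stdin; succeed only if PX reports itself operational.
px_is_operational() {
  grep -q '^Status: PX is operational'
}

# Read `pxctl status` output on stdin; succeed only if the summary line
# "Nodes: N node(s) with storage (M online)" shows N equal to M.
px_all_nodes_online() {
  sed -n 's/^[[:space:]]*Nodes: \([0-9][0-9]*\) node(s) with storage (\([0-9][0-9]*\) online).*/\1 \2/p' |
    { read -r total online && [ "$total" = "$online" ]; }
}
```

Each helper reads one copy of the output, so save the pxctl status text to a file (or a variable) first and feed it to each function in turn.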

Create a StorageClass and the shared PVC:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-shared-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "1"
  shared: "true"
allowVolumeExpansion: true
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: px-shared-pvc
  annotations:
    volume.beta.kubernetes.io/storage-class: px-shared-sc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 60Gi
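
To see the RWX claim in use, two pods can mount it at the same time. The sketch below prints a small test Deployment; the name rwx-test, the busybox image, and the /data mount path are all assumptions chosen for illustration, and only the claim name px-shared-pvc comes from the manifest above.

```shell
# Print a throwaway Deployment whose two replicas both mount px-shared-pvc.
# Everything except the claim name is a hypothetical choice for this example.
render_rwx_test() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rwx-test
  template:
    metadata:
      labels:
        app: rwx-test
    spec:
      containers:
        - name: writer
          image: busybox
          # Each replica appends timestamps to its own file on the shared volume.
          command: ["sh", "-c", "while true; do date >> /data/$(hostname).log; sleep 5; done"]
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: px-shared-pvc
EOF
}
```

Piping the output into kubectl apply -f - creates the Deployment. If the shared volume is working, listing /data from either pod should eventually show one log file per replica.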

Hopefully this helps if you happen to encounter this issue before Portworx rolls out version 3.