Current status of MUSES web services backup system

(This topic is primarily for the benefit of the MUSES cyberinfrastructure team documentation and administrators @andrew.manning @rhaas and @mcarras2 )

Three of the critical web services used by the MUSES collaboration are Discourse (this forum), Nextcloud (cloud file storage and calendar/groupware solution), and HedgeDoc (real-time collaborative documents). These services should be resilient against data corruption and other failure modes.

Data storage volume types

The MUSES web services are backed up in multiple ways, depending on the persistent volumes backing the service data.

  • Longhorn (longhorn) volumes are backed up using the native backup system configured via the Longhorn web UI. The target location of the backups is radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backup/backupstore, but these are not raw files; they must be restored via Longhorn.

  • NFS-mounted NCSA Condo storage (nfs-condo) volumes are backed up manually to radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backups.

Accessing the backup files

Backups stored in the NFS-mounted NCSA Condo storage under radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses are accessed by opening an SSH terminal into a worker node and mounting the directory via NFS. SSH access requires adding your SSH public key to the authorized_keys file on the cluster nodes, and the commands below assume an SSH config like so:

$ cat ~/.ssh/config.d/muses 
# Automatically created by terraform

Host muses-controlplane-0
  HostName 141.142.217.150
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null
  IdentityFile /home/manninga/.ssh/muses.pem
  User centos
...
Host muses-worker-0
  HostName 192.168.0.119
  StrictHostKeyChecking no
  ProxyJump muses-controlplane-0
  UserKnownHostsFile=/dev/null
  IdentityFile /home/manninga/.ssh/muses.pem
  User centos
...

SSH into a worker node, for example as shown below, and mount the Condo volume via NFS. The subsequent commands assume a base path of /mnt.

$ ssh muses-worker-0
$ sudo mount radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses /mnt

Backup chart

I created a backup system as a Helm chart that can be added as a dependency to the Helm charts of the deployed services. This allows us to enable backups by adding a few lines to the relevant values.yaml file, as in this example from our Discourse chart:

...
backups:
  enabled: true
  volume:
    nfs:
      basePath: "/radiant/projects/bbdr/muses/backups/discourse"
      server: "radiant-nfs.ncsa.illinois.edu"
  data:
    enabled: true
    persistence:
      claimName: "discourse-data-pvc"

Read more about how it works in the backup chart’s README, including how to enable the restore deployment for conveniently restoring backups.

Discourse

:information_source: The backup locations listed below assume a base path radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backups

Discourse has a native backup system that generates nightly .tar.gz archives of the entire instance. To restore, you install a fresh deployment and then restore from one of these archives. To ensure the backup files remain available even if the entire deployment and its associated PVCs are deleted, the backups chart takes snapshots of the Discourse data; this is redundant in the sense that they are snapshots of snapshots.

An example of a backup file location is /discourse/discourse-data-pvc/snapshot.0/snapshots/data/discourse/public/backups/default/muses-2022-04-06-033933-v20210420015635.tar.gz
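Assuming the Condo export is mounted at /mnt as described above, the newest archive can be located with a find/sort pipeline. The sketch below builds a miniature copy of the snapshot layout locally so the commands are self-contained; on a worker node, BASE would instead be the real /mnt/backups/discourse/discourse-data-pvc directory.

```shell
# Recreate a miniature snapshot.N layout locally to illustrate locating the
# newest Discourse archive.
BASE=$(mktemp -d)
mkdir -p "$BASE/snapshot.0/snapshots/data/discourse/public/backups/default"
touch "$BASE/snapshot.0/snapshots/data/discourse/public/backups/default/muses-2022-04-05-033933.tar.gz"
touch "$BASE/snapshot.0/snapshots/data/discourse/public/backups/default/muses-2022-04-06-033933.tar.gz"

# The archive names embed the date, so a lexical sort puts the newest last.
LATEST=$(find "$BASE" -name '*.tar.gz' | sort | tail -n 1)
echo "$LATEST"
```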

Nextcloud and HedgeDoc

Nextcloud and HedgeDoc both follow the common pattern of a flat-file data volume plus a SQL database volume. The backups chart handles this configuration for both MySQL and PostgreSQL databases.

Nextcloud backups are found in folders with the patterns

  • /nextcloud/nextcloud-data/snapshot.X for the flat files and
  • /nextcloud/nextcloud-db/YYYYMMDD for the database dumps.

HedgeDoc backups are in analogous locations.
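To make the patterns concrete, the expected Nextcloud backup paths can be assembled like this (base path assumed to be /mnt/backups after mounting the Condo export as described earlier; snapshot indices rotate, so snapshot.0 is only an example):

```shell
# Assemble the documented Nextcloud backup locations; /mnt/backups is the
# assumed mount point of the Condo backups directory.
FILES_DIR="/mnt/backups/nextcloud/nextcloud-data/snapshot.0"   # flat files (index rotates)
DB_DIR="/mnt/backups/nextcloud/nextcloud-db/$(date +%Y%m%d)"   # today's database dump
echo "$FILES_DIR"
echo "$DB_DIR"
```

The same commands apply to HedgeDoc by substituting the service name in both paths.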