(This topic is primarily for the benefit of the MUSES cyberinfrastructure team and administrators @andrew.manning, @rhaas, and @mcarras2.)
Three of the critical web services used by the MUSES collaboration are Discourse (this forum), Nextcloud (cloud file storage and calendar/groupware solution), and HedgeDoc (real-time collaborative documents). These services should be resilient against data corruption and other failure modes.
Data storage volume types
The MUSES web services are backed up in multiple ways, depending on the persistent volumes backing the service data.
- Longhorn (`longhorn`) volumes are backed up using the native backup system configured via the Longhorn web UI. The target location of the backups is `radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backup/backupstore`, but these are not raw files; they must be restored via Longhorn.
- NFS-mounted NCSA Condo storage (`nfs-condo`) volumes are backed up manually to `radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backups`.
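For orientation, the two volume types correspond to the `storageClassName` requested by a service's PersistentVolumeClaim. A minimal, hypothetical claim (the name and size here are placeholders) might look like:

```yaml
# Hypothetical PVC illustrating how a service selects its storage backend
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data-pvc
spec:
  storageClassName: longhorn   # or nfs-condo for Condo-backed volumes
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```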
Accessing the backup files
Backups stored in the NFS-mounted NCSA Condo storage under `radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses` are accessed by opening an SSH terminal on a worker node and mounting the directory via NFS. SSH access requires adding your SSH public key to the `authorized_keys` file on the cluster nodes, and the commands below assume an SSH config like the following:
```
$ cat ~/.ssh/config.d/muses
# Automatically created by terraform
Host muses-controlplane-0
  HostName 141.142.217.150
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null
  IdentityFile /home/manninga/.ssh/muses.pem
  User centos
...
Host muses-worker-0
  HostName 192.168.0.119
  StrictHostKeyChecking no
  ProxyJump muses-controlplane-0
  UserKnownHostsFile=/dev/null
  IdentityFile /home/manninga/.ssh/muses.pem
  User centos
...
```
SSH into a worker node, for example as shown below, and mount the Condo volume via NFS. The subsequent commands assume a base path of `/mnt`.
```
$ ssh muses-worker-0
$ mount radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses /mnt
```
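Once mounted, a quick sanity check confirms the share is attached before browsing the backups (a sketch; the `df` output format varies by system, and the mount itself may require root):

```shell
# Show the filesystem type and usage for /mnt; fall back to a message
# if the path is not a mounted filesystem.
df -hT /mnt 2>/dev/null || echo "not mounted"
```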
Backup chart
I created a backup system as a Helm chart that can be added as a dependency to the Helm charts of the deployed services, allowing us to enable backups by adding a few lines to the relevant `values.yaml` file, as in this example from our Discourse chart:
```yaml
...
backups:
  enabled: true
  volume:
    nfs:
      basePath: "/radiant/projects/bbdr/muses/backups/discourse"
      server: "radiant-nfs.ncsa.illinois.edu"
  data:
    enabled: true
    persistence:
      claimName: "discourse-data-pvc"
```
Read more about how it works in the backup chart's README, including how to enable the restore deployment for convenient restoration of backups.
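As an illustration of the dependency wiring, the stanza in a service's `Chart.yaml` might look something like this (the chart version and repository URL are placeholders, not the actual values):

```yaml
# Hypothetical dependency stanza; version and repository are assumptions
dependencies:
  - name: backups
    version: "0.1.0"
    repository: "https://example.org/muses-helm-charts"
    condition: backups.enabled
```

The `condition` field ties the dependency to the `backups.enabled` flag shown in the `values.yaml` example above, so backups can be toggled per service.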
Discourse
The backup locations listed below assume a base path of `radiant-nfs.ncsa.illinois.edu:/radiant/projects/bbdr/muses/backups`.
Discourse has a native backup system that generates nightly `.tar.gz` archives of the entire instance. To restore from one of these files, install a fresh deployment and then restore from the archive. To ensure the backup files remain available even if the entire deployment and its associated PVCs are deleted, the backups chart also takes snapshots of the Discourse data; this is redundant in the sense that it produces snapshots of snapshots.
An example backup file location is `/discourse/discourse-data-pvc/snapshot.0/snapshots/data/discourse/public/backups/default/muses-2022-04-06-033933-v20210420015635.tar.gz`.
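Because the backup filenames embed a sortable timestamp (`muses-YYYY-MM-DD-HHMMSS-...`), a plain lexicographic sort puts the newest archive last. A sketch for locating the most recent tarball under the mounted Condo volume (`BASE` is an assumed mount path; adjust to your setup):

```shell
# Find the newest Discourse backup tarball under the mounted volume.
BASE="${BASE:-/mnt/backups/discourse}"
latest=$(find "$BASE" -name 'muses-*.tar.gz' 2>/dev/null | sort | tail -n 1)
echo "latest backup: ${latest:-none found}"
```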
Nextcloud and HedgeDoc
Nextcloud and HedgeDoc both follow the common pattern of a flat-file data volume plus a SQL database volume. The backups chart handles this configuration for both MySQL and PostgreSQL databases.
Nextcloud backups are found in folders with the patterns:
- `/nextcloud/nextcloud-data/snapshot.X` for the flat files, and
- `/nextcloud/nextcloud-db/YYYYMMDD` for the database dumps.

HedgeDoc backups are in analogous locations.
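Since the database dump folders are named by date, a numeric sort identifies the most recent one. A sketch, assuming the Condo volume is mounted at `/mnt` as above (`DUMP_DIR` is an assumed path):

```shell
# Pick the most recent database dump folder (named YYYYMMDD) for Nextcloud.
DUMP_DIR="${DUMP_DIR:-/mnt/backups/nextcloud/nextcloud-db}"
newest=$(ls "$DUMP_DIR" 2>/dev/null | grep -E '^[0-9]{8}$' | sort -n | tail -n 1)
echo "newest dump: ${newest:-none found}"
```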