Continuing the discussion from CI Meeting 2023/09/27:
An update on my research about using Pegasus as our workflow management engine: my preliminary reading of the documentation is encouraging. Pegasus checks most if not all of our boxes:
- Native support for containerized jobs
- Provenance data is collected in a database, and the data can be summarized with tools such as pegasus-statistics, pegasus-plots, or directly with SQL queries.
- Workflow API: workflows can be defined in YAML, but the Python API is the recommended way to create Pegasus workflows.
- Many Execution Environments are supported:
  - Local Execution
  - Condor Pools and Glideins
  - Grids
  - Clouds
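To give a sense of what the Python API looks like, here is a minimal sketch of a two-step workflow. It assumes the Pegasus 5.x `Pegasus.api` module (from the `pegasus-wms` Python package) is installed. The job names (`preprocess`, `analyze`), file names, and command-line flags are hypothetical placeholders, not anything from our codebase, and I have not run this against our setup.

```python
def build_workflow():
    """Build a two-job Pegasus workflow (illustrative sketch only)."""
    # Imported inside the function so the sketch can be loaded even on a
    # machine where the pegasus-wms Python package is not installed.
    from Pegasus.api import File, Job, Workflow

    # Logical files; all names here are hypothetical placeholders.
    raw = File("raw.dat")
    cleaned = File("cleaned.dat")
    result = File("result.dat")

    # Each Job refers to a transformation (executable) by name.
    preprocess = (
        Job("preprocess")
        .add_args("--in", raw, "--out", cleaned)
        .add_inputs(raw)
        .add_outputs(cleaned)
    )
    analyze = (
        Job("analyze")
        .add_args("--in", cleaned, "--out", result)
        .add_inputs(cleaned)
        .add_outputs(result)
    )

    # By default Pegasus infers job dependencies from shared files, so
    # "analyze" should run after "preprocess" because it consumes cleaned.dat.
    wf = Workflow("muses-example")
    wf.add_jobs(preprocess, analyze)
    return wf
```

As I understand it, calling `build_workflow().write("workflow.yml")` would serialize this to the YAML format mentioned above. Actually planning and running it would additionally need the site, replica, and transformation catalogs, which are omitted here.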
I refuse to fall victim to the sunk cost fallacy here. Although I have already spent a considerable amount of time working on a custom workflow management solution, it is primitive compared to Pegasus, and it’s hard to imagine that the disadvantages of using Pegasus could outweigh its benefits in features and robustness.
Some people at NCSA are familiar with Pegasus, so I’m getting their input as well. Any additional insight from the MUSES community would be appreciated.