Public uploads and Calculation Engine data hosting

Recently there have been discussions about the role of the public upload system provided by the Calculation Engine. I would like us to discuss the ideas around this topic and see if we can agree on whether we should expand and advertise this capability to meet the needs of the research community.

Currently the CE allows users to upload a limited amount of data that is stored persistently and made available as input files to workflows. Users can mark their uploads as “public”, allowing other users to also use them in their workflows in a read-only mode. The idea is that for frequently used static input files, we can save bandwidth and time by maintaining a local copy of these files. The public setting also spares collaborators from downloading and re-uploading their own copy, which would waste both their time and their upload quota.

@mrpelicer and @dfriede1 were envisioning an expansion of this CE feature. Would the two of you mind describing your idea so we can get some feedback?

For me, the main advantage of the Public Uploads page is to easily fetch the UUID and checksum of the public uploads.

The main example where this is useful for everyone is Crust-DFT. It has several input files that already exist as public uploads (and are essential for a quick run of the module), but it is very hard (maybe impossible?) to find their UUIDs and checksums. Having a page that shows all the public uploads makes this search very straightforward.

Also, from the latest MUSES meeting, we know there are people interested in having a centralized place where static EOSs are uploaded (especially for heavy-ion physics, since neutron stars already have CompOSE).

I do believe it can become messy if people start making lots of tables public. We could go back to our last conversation about it and make another type of upload. We could differentiate between Public files and Shared files, where Public files are very limited and uploaded by admins, and Shared files can be used by anyone, but the UUID and checksum have to be given person-to-person.

Hi Mateus, Hi Andrew,
I agree with Mateus that it can get messy really quickly if everyone can upload their own EoS as public.

I take this opportunity to restate the original idea from my point of view (as a heavy-ion user). The idea is to have a few standard EoS available that have been checked, or even already published in peer-reviewed papers. We could then simply download them and use them (without thinking about how to calculate them through the CE).

The order of magnitude should be something like 4-5 EoS for heavy-ions in total.

A concrete example is the set of standards we have right now in the heavy-ion community. We have 2D EoS (hot QCD, Neos bqs, MUSES), 4D EoS (MUSES, 4D Neos), and 2D EoS with a critical point (BEST, MUSES). (MUSES here means that “it can be generated through the CE”.) The idea is simply to have the MUSES EoS for each case (and maybe a few other cases) directly available with standard parameters.
A hydro user could then simply write “we used 4D EoS, [name of the standard EoS], from the MUSES collaboration standards \cite{MUSES collab}”
Of course, someone studying the impact of the EoS on the hydrodynamics would still be interested in generating EoS with the CE. But if they are studying something else, then the MUSES collaboration EoS can be the standard.

So yes, this idea of very limited Public Files perfectly fits what we talked about during the last collaboration meeting.
Thanks a lot!

For the time being, we will add a Public Uploads page that is only accessible if you are logged in.

The original idea of the public tag is to share files between workflows, and we want to keep it. For this to work, anyone can make their uploads public. Over time, the number of public uploads can become large if the number of collaborative projects increases, and the “Special EOSs” would get lost in the crowd.

Instead of having a dedicated repository page on the CE, we can have a dedicated page on the MUSES website with links to a Zenodo page containing the special EOS files to publicize them.

The person who generated the files can upload them publicly in the CE and mention the upload path, UUID, and checksum on the website, so that people can find and use them in workflows. But we cannot promise that the administrators will maintain them in the CE. That would be the responsibility of the person who made the uploads.
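
For anyone who downloads one of these files from Zenodo and wants to confirm it matches the checksum listed on the website, here is a minimal sketch. It assumes the published checksum is a hex digest; the thread does not say which hash algorithm the CE uses, so the `sha256` default is only a placeholder and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The default algorithm is an assumption; pass whichever
    algorithm the published checksum was actually computed with.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large EoS tables are not
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published, algorithm="sha256"):
    """Compare a local file against a published hex digest."""
    return file_checksum(path, algorithm) == published.strip().lower()
```

A call such as `matches_published_checksum("eos_table.dat", "<checksum from the website>")` returns `True` only if the local copy is byte-for-byte identical to the file the checksum was computed from (the file name here is just an example).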

At the end of the day, the @leadership should have the final say.