Curated data files for shared workflow components

There appears to be a need for a curated list of data files that people can use in shared workflow components. The specific issue at the moment is the use of EoS tables in this list from the Crust DFT documentation. We need a way for people to efficiently reference the same file for the same purpose without having to upload it again (supported) while also enabling provenance/reproducibility of results (not yet supported).

Here’s an idea for a first iteration of such a shared data file system: a curated list of shared input data files will be uploaded (owned by the admin user so no one accidentally deletes them). The name, a description, and most importantly the checksum of each file (e.g. md5sum) will be registered in some central document, perhaps initially a wiki post on the forum. I can modify the CE so that when you specify an uploaded file as a process input you must include that file’s checksum, which the CE would validate before executing the process in the workflow, failing if the checksum does not match. The checksum is thus the immutable piece of information required for reproducibility: the file can be reuploaded (and given a new UUID) as operational needs dictate without making it impossible to verify a previous workflow result.
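
A minimal sketch of the validation step described above, assuming the CE can read the uploaded file from disk. The names `md5sum` and `validate_input` are hypothetical and are not part of the CE; only `hashlib` from the Python standard library is used.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large EoS tables are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_input(path, expected_checksum):
    """Raise ValueError if the file's MD5 differs from the expected value.

    Hypothetical helper: the actual CE check may be implemented differently.
    """
    actual = md5sum(path)
    if actual != expected_checksum.lower():
        raise ValueError(
            f"checksum mismatch for {path}: "
            f"expected {expected_checksum}, got {actual}"
        )
    return actual
```

Because the check depends only on the file contents, it gives the same answer after a reupload that assigns a new UUID.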

Clearly this system will need revising but we need something that works first and then we can worry about what works well. @jakinh I need the scientists to respond with comments and suggestions based on how you plan to use the CE.

By the way, @sroy14 , four of these links are dead (as of 2024/10/07 15:27 UTC):

| Description | UUID | Checksum (md5sum) | Path |
|---|---|---|---|
| Fiducial L450 | a4f8be0f-57b6-4fe4-b460-3ef5720ead7b | e39eb584e5587b74a90c74f9648f8e5f | /public_data/eos_tables/du21/fid_450_2_6_22.o2 |
| Large SL no leptons | 94be89d9-1a62-43de-8acc-d49c3dced5f4 | f69813f6aad2f80d597cb829667a9265 | /public_data/eos_tables/du21/large_sl_nolep_noderiv_2_5_22.o2 |
| Large R no leptons | 6f43de4d-aa80-4b88-a7ee-17ab3c5517b1 | 2aef9e3adc8381427ea08aa8192f10c5 | /public_data/eos_tables/du21/large_r_nolep_noderiv_2_5_22.o2 |
| Small SL no leptons | 6ef796a4-a2bf-4390-a1a7-0052b94dfd46 | 80e02501e476fdd141ab056b9e8d32c7 | /public_data/eos_tables/du21/small_sl_nolep_noderiv_2_5_22.o2 |
| Small R no leptons | db7c7ee5-a54d-4c1d-aa6a-885421455bbc | 7c1bf99473f60805891a20fe479d496b | /public_data/eos_tables/du21/small_r_nolep_noderiv_2_5_22.o2 |
| Smaller R no leptons | 8664a7d7-38d6-49bc-9827-16d3290f7f94 | c9882398d0636fb6660d92ebeb15790f | /public_data/eos_tables/du21/smaller_r_nolep_noderiv_2_5_22.o2 |
| Fiducial L450 no leptons | e5879703-5d8e-4489-908f-57a86c870711 | 3b36f9ba127d008b77625026193f7bef | /public_data/eos_tables/du21/fid_450_nolep_noderiv_2_5_22.o2 |
| Large SL | a4354153-84f2-4812-9c80-f501e7b057c3 | 0a4fcf8cc1e0a8e37eee5ca47c73c0c9 | /public_data/eos_tables/du21/large_sl_2_6_22.o2 |
| Electrons and photons | a4245f15-19d6-4127-b960-b1c891853450 | c70096af219f58070e2a6dcaac3593f8 | /public_data/eos_tables/du21/electron_photon.o2 |
| Large Mmax | 8acd4e96-e2ce-498a-a811-676c5276d915 | ce327473bdf9ae10e02eaf86e9c34f9f | /public_data/eos_tables/du21/large_mmax_2_6_22.o2 |
| Nuclear masses | 0aae8e14-22e6-4f1c-a228-4de643ba7cb3 | 5881b123855849abd8cd946eb031e884 | /public_data/eos_tables/du21/nuclear_masses.o2 |
| Fiducial | d1ed1c63-6192-4ac9-9cb1-a7d82cc27b72 | 164575f9d84c3ac087780e0219ee2e8a | /public_data/eos_tables/du21/fid_3_5_22.o2 |
| Large R | a5f07b17-41a8-491a-a785-a2afd08ad4fb | 19688cbba5f359efb2aa2f6673914285 | /public_data/eos_tables/du21/large_r_2_6_22.o2 |
| Small SL | 8e5f694e-6e96-4be0-8876-2c38f69524fd | 42a33681ad010b3804eb5d3ab2f68154 | /public_data/eos_tables/du21/small_sl_2_6_22.o2 |
| Small R | 31d86bc1-5677-4c5e-a4e9-e088aa350092 | c47d687394f935dcb0e18c5f0e8a75ed | /public_data/eos_tables/du21/small_r_2_6_22.o2 |
| Smaller R | 30258dc4-d142-47e0-895c-e959d907dab7 | 6d6444c2ac7fc88656d41ae846aa3fe4 | /public_data/eos_tables/du21/smaller_r_2_6_22.o2 |
| Fiducial L414 | 9db6a7a6-746d-4d13-ba96-15522798bb25 | 21cdff54b747ab6d24299e13c79b424e | /public_data/eos_tables/du21/fid_414_2_6_22.o2 |

Thank you for uploading the files, Andrew.

  1. I tried running crust-DFT + Lepton + QLIMR with all the input files that do not contain ‘no lepton’, but all the Crust DFT outputs I’m getting are identical. I don’t think this is expected. Never mind this. I ran it again and it miraculously worked.

  2. From my conversations with @sroy14, I think the ‘No Lepton’ tables are not used for MUSES.

Can you confirm this, @sroy14?

Yes, that is correct. The files with leptons contain data both with and without leptons, so for now they are sufficient when used with the inc_lep flag set to false.

@sroy14 Are you saying that the four missing files are irrelevant then?

large_max_2_6_22.o2 is good, the other 3 are not necessary right now.

So…where can we find this file? :smile: Also, this morning we discussed a need for version control for these files. On our side we can calculate the checksum for each file and include that in the workflow spec to ensure that the file is immutable for the sake of reproducibility. But on your side it would be best if you could publish these files to something like Zenodo so there is an immutable, version-controlled, and DOI-citeable copy.

I am working with Dr. Steiner to fix this. In the meantime try Tables to download — eos documentation

@sroy14 I don’t understand; those are the same URLs for the files.

I meant to say, try the large_max_2_6_22.o2 download again to see if it works.

The problem is that the URL that works has an extra m character:

https://isospin.roam.utk.edu/public_data/eos_tables/du21/large_mmax_2_6_22.o2

At least now we have all the relevant files. What are your thoughts on the version control?

@sroy14 Did I understand you correctly that these files are no longer necessary for Crust DFT to function because the module is now calculating the values instead of interpolating from these tables?

They are not strictly necessary, but I have a hunch most users will choose to use them as they help make the calculations faster.

Nice to see that the md5sums listed in the Zenodo record you shared match what we uploaded.

You may want to update that Zenodo record’s description to include more detail about those specific files, giving some immediate general information about what they are, but also referencing immutable content. As it is, the “Full documentation at https://np3m.org/code/e4mma” is evolving. Consider linking to this page for example, so it is at least pinned at a commit coinciding with the data files.

@awsteiner @sroy14

It seems that smaller_r_2_6_22, small_r_2_6_22 and small_sl_2_6_22 are not working as expected anymore.

I’m getting this error for those:

    run_module
        assert checksum == md5sum
               ^^^^^^^^^^^^^^^^^^
    AssertionError

Here’s the definition I’m using, to show there’s no apparent mistake on my part:

jobs = [
    {'file': 'fid_3_5_22',
     'input_uuid': 'd1ed1c63-6192-4ac9-9cb1-a7d82cc27b72',
     'checksum': '164575f9d84c3ac087780e0219ee2e8a'},
    {'file': 'large_sl_2_6_22',
     'input_uuid': 'a4354153-84f2-4812-9c80-f501e7b057c3',
     'checksum': '0a4fcf8cc1e0a8e37eee5ca47c73c0c9'},
    {'file': 'large_r_2_6_22',
     'input_uuid': 'a5f07b17-41a8-491a-a785-a2afd08ad4fb',
     'checksum': '19688cbba5f359efb2aa2f6673914285'},
    {'file': 'small_sl_2_6_22',
     'input_uuid': '8e5f694e-6e96-4be0-8876-2c38f69524fd',
     'checksum': 'c47d687394f935dcb0e18c5f0e8a75ed'},
    {'file': 'small_r_2_6_22',
     'input_uuid': '31d86bc1-5677-4c5e-a4e9-e088aa350092',
     'checksum': '6d6444c2ac7fc88656d41ae846aa3fe4'},
    {'file': 'smaller_r_2_6_22',
     'input_uuid': '30258dc4-d142-47e0-895c-e959d907dab7',
     'checksum': '42a33681ad010b3804eb5d3ab2f68154'},
    {'file': 'fid_414_2_6_22',
     'input_uuid': '9db6a7a6-746d-4d13-ba96-15522798bb25',
     'checksum': '21cdff54b747ab6d24299e13c79b424e'},
    {'file': 'fid_450_2_6_22',
     'input_uuid': 'a4f8be0f-57b6-4fe4-b460-3ef5720ead7b',
     'checksum': 'e39eb584e5587b74a90c74f9648f8e5f'}
]

If you look at the table above, the upload on ce.musesframework.io with that UUID has md5sum 6d6444c2ac7fc88656d41ae846aa3fe4. My admin view of the uploads confirms this.

You mean the small_r_2_6_22, right?

That’s the checksum I am using, yet I get the assertion error…

@mrpelicer There must have been a transcription error. I reconstructed the upload table above by directly copying and pasting text from the CE interface. It looks like the “small_sl, small_r, smaller_r” file checksums had been rotated but should work now.
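
A transcription slip like this can be caught before submitting anything by comparing each job definition against the registry, keyed by UUID. The sketch below is only illustrative: `REGISTRY` holds three rows copied from the upload table above, the `jobs` entries are the three from the post that failed, and `find_mismatches` is a hypothetical helper, not part of the CE.

```python
# Three rows copied from the upload table: UUID -> (file name, md5sum).
REGISTRY = {
    "8e5f694e-6e96-4be0-8876-2c38f69524fd":
        ("small_sl_2_6_22", "42a33681ad010b3804eb5d3ab2f68154"),
    "31d86bc1-5677-4c5e-a4e9-e088aa350092":
        ("small_r_2_6_22", "c47d687394f935dcb0e18c5f0e8a75ed"),
    "30258dc4-d142-47e0-895c-e959d907dab7":
        ("smaller_r_2_6_22", "6d6444c2ac7fc88656d41ae846aa3fe4"),
}


def find_mismatches(jobs, registry):
    """Return (file, job checksum, registered checksum) for each disagreement.

    The registered checksum is None when the UUID is not in the registry.
    """
    problems = []
    for job in jobs:
        entry = registry.get(job["input_uuid"])
        registered = entry[1] if entry is not None else None
        if registered != job["checksum"]:
            problems.append((job["file"], job["checksum"], registered))
    return problems


# The three job entries that raised the AssertionError.
jobs = [
    {"file": "small_sl_2_6_22",
     "input_uuid": "8e5f694e-6e96-4be0-8876-2c38f69524fd",
     "checksum": "c47d687394f935dcb0e18c5f0e8a75ed"},
    {"file": "small_r_2_6_22",
     "input_uuid": "31d86bc1-5677-4c5e-a4e9-e088aa350092",
     "checksum": "6d6444c2ac7fc88656d41ae846aa3fe4"},
    {"file": "smaller_r_2_6_22",
     "input_uuid": "30258dc4-d142-47e0-895c-e959d907dab7",
     "checksum": "42a33681ad010b3804eb5d3ab2f68154"},
]

for name, given, registered in find_mismatches(jobs, REGISTRY):
    print(f"{name}: job has {given}, registry has {registered}")
```

Keying the comparison on the UUID rather than the file name means a rotated column shows up as a mismatch on every affected row instead of passing silently.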

Pro tip: Use the table expand button to read tables like this more easily.

It works now. Thank you.