Source code licenses

A software license is an important part of a source code repository. Unfortunately, most of us are not lawyers and have little experience dealing with licenses, so we neglect to include one. There are several reasons you want to include a LICENSE file in your code repo. One reason is to require citations by anyone who uses your code in work that they publish. Another reason is to ensure that code derived from yours includes the same license so that someone cannot claim it is their work.

We should discuss as a collaboration what license makes sense for most code contributors and generate a “recommended default” license that people can include in their code repos. This recommendation must balance the goals of open science against the professional rights of collaborators to protect their research.

A good resource for learning about the many options is the Open Source Initiative. There is also a convenient comparison table from snyk.io.


@rhaas What are your thoughts?

I’d pick either MIT or GPL and call it a day. Neither one really stops much abuse, since e.g. even with the GPL, groups can still use the code internally and publish results without sharing their code (internal use is always fine).

If you want to really go crazy you can check with the office of technology management (keep in mind that our work is “work for hire” so all copyright is actually owned by our respective Universities):

Office of Technology Management OTM: http://otm.illinois.edu/

Thanks for the links. One of our concerns is about citations in scientific publications. Unlike the open source code that drives this forum deployment, for example, the equation of state calculation modules are used more or less directly to generate scientific data used in academic papers. Given that these papers and their citations are the bread and butter of academic researchers, it would be smart to determine which license is best for these purposes. In the end, MUSES will generate high quality and relevant software products, and the fact that we proposed to make these open source should not preclude proper attribution and professional recognition for contributors.

For citation (rather than code contributions) I would suggest making a Zenodo publication, which gives you a DOI. https://www.zenodo.org

We use GPL for SMASH and also Zenodo for each release version; that works very well for us. It is certainly good to decide on licenses as early as possible in a project, since otherwise one needs to gather agreement from all prior contributors, which can be a hassle if people have left academia.

@elfner: I had not even considered a contributor’s agreement (mostly b/c those do tend to involve OTM). Do you have one for SMASH that MUSES could adopt / be inspired by?

This most likely should be something that involves @leadership though.

Just searching the NCSA wiki, one finds e.g. the contributor license agreement for Ergo (from around 2014), linked from the "Contributing Code to Ergo" page: https://opensource.ncsa.illinois.edu/confluence/download/attachments/34013263/Ergo-CLA.pdf

The tricky bit is usually that a University’s OTM will quite likely balk at assigning copyright to a different University (just on the off chance that there may be money to be made by licensing the software product). They will be fine with open sourcing the code, since we spelled that out in the proposal (we do, don’t we?) that the University let us submit (so it’s promised as open source to NSF by the University and hence OTM is bound by this).

This is (naturally) less of an issue with permissive licenses (unless and until one needs to change the license). E.g. LLVM, which has a very nice set of pages on how to contribute, covers this here: LLVM Developer Policy — LLVM 13 documentation

This is the relevant paragraph from the proposal:

This collaborative framework aims at providing an open source, open-physics community toolkit that is capable of generating the equation of state in various regions of the QCD phase diagram for various physics communities (e.g. heavy ions, nuclear astrophysics, gravitational waves, numerical relativity). The toolkit follows a completely open strategy: all code development is carried out using standard version-controlled repositories that are made publicly accessible using online collaboration platforms such as GitHub. From these repositories, anyone can view released tagged versions, historical versions from any given date, and full commit messages for any source code file. Upload into the toolkit is restricted to the developer group of the collaboration, and all submitted patches are subject to code review to maintain a high coding quality standard. A public issue tracker lets everyone submit bug reports and discuss newly proposed code. All source code will be available under free and open source licenses, including but not limited to the NCSA license, that protect the freedom to use and/or modify the code.

Sorry, but I have no idea what ‘OTM’ stands for. Anyway, we only needed agreement from everyone once (not for the main code, just for the associated analysis suite), and there we just asked everyone by email, so I do not have a template. I mainly added this information because I think it is good to think about licenses early on.
Regarding copyright etc., we actually sometimes get additions to our working contracts that explicitly allow working on open source projects, but my impression is that with the ‘Open science’ paradigm becoming more important, this is something universities and research centers support.

Per the proposal, what we need to make sure of is that when a particular package is used, the user is informed which papers they need to cite. This needs to be done through the CI somehow (it may not be hard to implement, once we agree on which papers correspond to which use cases). In addition to that, there should be a LICENSE file in the main repo explaining this procedure and requesting the citations. We don’t need to go to the University for this. I agree we should go with MIT or GPL, which is what we had at some point in the proposal (inspired by ET).
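
To make the "inform the user what to cite" idea a bit more concrete, here is a minimal sketch of how a run could print a citation notice based on the modules it used. Everything in it is an assumption for illustration: the module names, the placeholder references, and the function names are hypothetical, and the real mapping would have to be agreed on by the collaboration.

```python
# Hypothetical module names and placeholder references. The actual mapping
# of modules to papers would be decided by the collaboration.
CITATIONS = {
    "module_a": ["<reference for module A>"],
    "module_b": ["<reference for module B>", "<shared framework reference>"],
    "module_c": ["<shared framework reference>"],
}


def citations_for(modules):
    """Return the de-duplicated references for the modules used, in order."""
    seen = []
    for name in modules:
        if name not in CITATIONS:
            raise KeyError(f"no citation entry for module {name!r}")
        for ref in CITATIONS[name]:
            if ref not in seen:
                seen.append(ref)
    return seen


def citation_notice(modules):
    """Build the text shown to the user at the end of a run."""
    lines = ["If you publish results from this run, please cite:"]
    lines += [f"  - {ref}" for ref in citations_for(modules)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(citation_notice(["module_b", "module_c"]))
```

Raising an error for a module without an entry is a deliberate choice in this sketch: a CI job could call `citations_for` on every module and fail if someone adds a module without saying what should be cited for it.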

In our meeting with @awsteiner we discussed his use of and preference for the GPLv3 license.

What goes into using the MIT License? Is it just filling in the correct information, adding the LICENSE file to the repository, and adding a line to all of the source files reiterating the important information?

I don’t think adding text to individual files is necessary; it should be sufficient to include one license file typically at the root of the source code repo directory structure.

Another license I have not heard discussed for MUSES software is the “3-Clause BSD License”, but I have heard this mentioned positively by people on other similar projects.

@jakinh


Regarding the citation of the codes, I have seen a very interesting talk about a new standard called CFF (Citation File Format).

It is written in a very simple YAML format (they even have a tool to generate your CITATION.cff file directly from the website), and has the advantage of being supported by GitHub, Zenodo and Zotero.
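
For anyone who has not seen one, here is a small sketch that renders a minimal CITATION.cff using only the Python standard library. The four top-level keys it always writes (`cff-version`, `message`, `title`, `authors`) are the ones I understand to be required by CFF 1.2.0; the function name `render_cff` and the example values are hypothetical, and a real file would normally carry more metadata.

```python
import json


def render_cff(title, authors, version=None, doi=None):
    """Render a minimal CITATION.cff as text.

    authors is a list of (family_names, given_names) tuples.
    """
    # JSON string quoting is also valid YAML double-quoted syntax,
    # so json.dumps is used here to escape values safely.
    q = json.dumps
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    if version is not None:
        lines.append(f"version: {q(version)}")
    if doi is not None:
        lines.append(f"doi: {q(doi)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; not a real module or author.
    print(render_cff("Example Module", [("Doe", "Jane")], version="0.1.0"))
```

The output can be saved as `CITATION.cff` at the root of a repository. Writing the text by hand like this avoids a YAML dependency, at the cost of only covering the handful of fields shown.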

A nice feature, as you can see on the Athena++ code repository for instance, is that it lets you export a BibTeX citation by clicking on "Cite this repository".

We discussed with @andrew.manning and thought it could be a good idea to use it. The question then is: should we write one per module? In this case, we would need to agree on some common structure so that all MUSES modules have a citation file which follows a given convention.
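
If we do go with one file per module, a common convention could be checked automatically. The sketch below is one rough way to do that, assuming the convention is simply "these top-level keys must be present"; the key list and the function names are hypothetical, and the line-based scan is a simplification that does not replace a real YAML parser or a proper CFF validator.

```python
# Assumed convention: every module's CITATION.cff has these top-level keys.
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def top_level_keys(cff_text):
    """Collect top-level keys by scanning unindented 'key: value' lines."""
    keys = set()
    for line in cff_text.splitlines():
        # Skip blanks, indented lines, comments, and list items or '---'.
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def missing_keys(cff_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent, in the order required."""
    present = top_level_keys(cff_text)
    return [key for key in required if key not in present]


if __name__ == "__main__":
    sample = 'cff-version: 1.2.0\ntitle: "X"\nauthors:\n  - family-names: "Doe"\n'
    print(missing_keys(sample))
```

A CI job could run `missing_keys` on each module's file and fail when the returned list is not empty.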

Beyond that, I believe this is a nice addition that you could probably use for your own code repositories too!