Semantic versioning and backwards compatibility

Today I had a conversation with one of the code module teams that is relevant to all of our module developer teams, so I am summarizing the main ideas here for reference.

Two of the concepts we discussed were semantic versioning and backwards compatibility. Imagine that you initially designed the output data structure like so:

mass:
  - 100
  - 900
radius:
  - 1
  - 3

You publish the module with version 1.0.0.

Later you realize that the units should be included in the data structure, so you revise it like this:

mass:
  units: "kg"
  values:
    - 100
    - 900
radius:
  units: "m"
  values:
    - 1
    - 3

This is not a backwards compatible change, because it will probably break client software. Under semantic versioning, a breaking change requires a new major version, so you would publish this as version 2.0.0 to indicate this fact. For example, say someone wrote a client for version 1.0.0:

import yaml

data = yaml.safe_load(output_data)
mass_values = data['mass']

If they switch to version 2.0.0, this code will no longer work as intended, because data['mass'] is no longer an array of mass values; it is an object with fields units and values.
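To make the breakage concrete, here is a minimal sketch of such a client run against both layouts. The dictionaries are written out by hand as they would look after parsing the YAML above, and total_mass and client_breaks are hypothetical helpers standing in for whatever the client actually does with the values.

```python
# Parsed form of the original output (written out by hand for illustration).
data_original = {"mass": [100, 900], "radius": [1, 3]}

# Parsed form of the revised output that nests units and values.
data_revised = {
    "mass": {"units": "kg", "values": [100, 900]},
    "radius": {"units": "m", "values": [1, 3]},
}


def total_mass(data):
    """Hypothetical client code written against the original layout."""
    mass_values = data["mass"]  # assumes a plain list of numbers
    return sum(mass_values)


def client_breaks(data):
    """Return True if the original client fails on the given data."""
    try:
        total_mass(data)
    except TypeError:
        # Summing a dict iterates over its string keys, which cannot be added.
        return True
    return False
```

With the original layout total_mass returns a number; with the revised layout it raises a TypeError. That is the kind of failure the version number is supposed to warn clients about.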

Because of this, developers are more likely to shoehorn in the units information like this in order to keep the update backwards compatible:

units:
  mass: "kg"
  radius: "m"
mass:
  - 100
  - 900
radius:
  - 1
  - 3

They could call this version 1.1.0: under semantic versioning, a minor version bump signals a backwards-compatible addition. However, this is an inferior data structure, because the units field at the root level sits parallel to the mass and radius fields (and presumably other parameters not shown here, such as moment_of_inertia) as if it were just another parameter, and each parameter's units are stored apart from its values.
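A short sketch of why this layout is compatible but awkward, again using a hand-written dictionary in place of parsed YAML. All three helper names are hypothetical.

```python
# Parsed form of the layout with a root-level units field.
data_shoehorned = {
    "units": {"mass": "kg", "radius": "m"},
    "mass": [100, 900],
    "radius": [1, 3],
}


def total_mass(data):
    """Client code written for the original layout; still works unchanged."""
    return sum(data["mass"])


def parameter_names(data):
    """List the physical parameters in the output.

    Every top-level key is a parameter except "units", which has to be
    special-cased because it sits at the same level as the parameters.
    """
    return [key for key in data if key != "units"]


def units_of(data, name):
    """Look up units in the separate root-level table."""
    return data["units"][name]
```

The old client keeps working, but any code that walks the top-level keys now has to know that "units" is not a parameter, and values and units for the same quantity have to be fetched from two different places.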

The point here is that it is worth designing a reasonably flexible data structure at the beginning that has room to grow and change in a backwards compatible way.
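As one possible illustration (an assumption about what "room to grow" could look like, not the only reasonable design), suppose the first release had already wrapped each parameter in a mapping with a values key. Adding units later would then be a purely additive change. The dictionaries and helper names below are hypothetical.

```python
# First release: each parameter is a mapping, even though it only holds values.
data_first = {
    "mass": {"values": [100, 900]},
    "radius": {"values": [1, 3]},
}

# Later release: units are added next to the values they describe.
data_later = {
    "mass": {"values": [100, 900], "units": "kg"},
    "radius": {"values": [1, 3], "units": "m"},
}


def parameter_values(data, name):
    """Client code written against the first release."""
    return data[name]["values"]


def parameter_units(data, name, default=None):
    """Newer client code; returns a default when units are absent."""
    return data[name].get("units", default)
```

A client written against the first release reads data_later without modification, and a newer client can still read older output by falling back to a default when units are missing.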

@ns-qlimr @cmf-module @cyberinfrastructure