[Freedesktop-sdk] License blacklisting [Was: license-checking script for BuildStream projects]
Tristan Van Berkom
tristan.vanberkom at codethink.co.uk
Thu Aug 27 11:18:28 UTC 2020
Hi,
I'm forking this thread because I think this needs a wider discussion
outside the scope of this license checker tool.
Also: I'm cross-posting this to the BuildStream dev list, as I think it
is quite relevant there. Here is a link to the freedesktop-sdk thread for
reference:
https://lists.freedesktop.org/archives/freedesktop-sdk/2020-August/000054.html
On Tue, 2020-08-25 at 20:22 +0100, Douglas Winship wrote:
> Following on from the previous email, I've put together a basic
> license-checker in python and tested it in a CI Pipeline. I'd be very
> interested to get feedback on the html and json output.
>
> In particular I'd be interested to get opinions about how to
> implement the blacklist: we're planning to design the license checker
> with a blacklist option, where users can supply a list of blacklisted
> licenses (possibly as regular expressions). If any blacklisted
> licenses are detected, these would be reported in the html and json
> outputs, but I'm not sure what form that ought to take.
First, I think license blacklisting should be out of scope for this
script, which essentially scans source code and gives us a summary of
the licenses it detects (and as such provides valuable input for
project maintainers at other stages).
Here is how I would envision a workflow which involves reliable checks
and blacklisting. I will describe it in two sections, since I only
recently became aware of the benefits we can gain with SPDX[0].
Traditional approach
~~~~~~~~~~~~~~~~~~~~
Traditionally, Linux distributions need to audit and consciously
understand what rights they have for every module they distribute in
binary form, and then make a conscious decision about which license
they distribute those binaries under (in cases where the upstream module
is dual-licensed and gives the distribution a choice).
Binary package based distributions (e.g. those using rpm or deb
packages) often encode this decision in the package metadata, and
custom Linux integration tools like Buildroot and Yocto do the same.
For example, Yocto has the LICENSE[1] variable, which is manually
encoded into all of the recipes in the Poky distribution. Users of Poky
(who typically /derive/ Poky to create something custom) can then set
the INCOMPATIBLE_LICENSE[2] variable for their distribution, which
causes build errors if their distribution ever inadvertently tries to
include a module whose license is on their chosen blacklist.
For a vast portion of open source / free software available in the
wild, this conscious interpretation and decision needs to be made by a
human being.
I would see this implemented in BuildStream in the following way:
* Declare a new "licenses" public data format in the bst public data
domain[3]
This is a place where BuildStream project maintainers can record
the decided license for the module being built, similar to yocto's
LICENSE variable[1].
  For compatibility across tooling, and with possible further
  automation in mind (see below), we should probably require that
  these license annotations be valid SPDX license identifiers[4].
* We would add a new Element plugin in BuildStream, and call it
something like `assertlicense`
In this element's `config`, it would allow the user to declare
a blacklist.
  This element could output a manifest of licenses in the artifact,
  or produce no output at all; the important part is that this
  element can be added to the pipeline, depend on some elements,
  and halt the build with an error if any invalid licenses are
  detected.
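To make the `assertlicense` idea a bit more concrete, here is a minimal sketch of the check such an element might run, assuming the "licenses" public data of each dependency has already been collected into a dict mapping element names to a single SPDX identifier (or None when nothing was recorded). The blacklist is a list of regular expressions, as Douglas suggested. The names `assert_licenses` and `LicenseError` are hypothetical and not existing BuildStream API.

```python
import re
from typing import Dict, List, Optional, Sequence


class LicenseError(Exception):
    """Raised to halt the build when a dependency's license is not acceptable."""


def assert_licenses(declared: Dict[str, Optional[str]], blacklist: Sequence[str]) -> None:
    """Check each element's declared SPDX identifier against a regex blacklist.

    `declared` maps element names to the identifier recorded in their
    (hypothetical) "licenses" public data, or None if nothing was recorded.
    """
    patterns = [re.compile(p) for p in blacklist]
    problems: List[str] = []
    for element, license_id in sorted(declared.items()):
        if license_id is None:
            # Treat a missing annotation as an error rather than silently passing
            problems.append(f"{element}: no license declared")
        elif any(p.fullmatch(license_id) for p in patterns):
            # fullmatch, so r"GPL-3\.0.*" does not also catch "LGPL-3.0-only"
            problems.append(f"{element}: blacklisted license {license_id}")
    if problems:
        # Report every offending element at once, then fail the build
        raise LicenseError("license check failed:\n  " + "\n  ".join(problems))
```

For example, `assert_licenses({"zlib.bst": "Zlib", "foo.bst": "GPL-3.0-only"}, [r"GPL-3\.0.*"])` raises a LicenseError naming only foo.bst. Collecting all problems before raising means a single failed build reports every offending module, rather than one per run.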
Enhanced approach
~~~~~~~~~~~~~~~~~
From my limited understanding, SPDX now provides a format for upstream
project maintainers to encode machine readable information, including
"license expressions" in an "spdx" file in their module.
This would allow for a (possibly weaker, possibly stronger) trust chain
where the distributor places trust in the upstream module maintainer to
keep the spdx file up to date, if that upstream does maintain one (I
suspect that depending on the use case, a full license audit will
still be preferred).
This allows us some room to maneuver, and provide automation in the
cases where an upstream provides an spdx file. One downside I can see
from a quick blog read[5]:
"The SPDX specification doesn't specify a file extension or file
naming convention."
If this is true, then we would *still* need project maintainers to at
least annotate their element declarations with a bit of public data
which tells us which file is the SPDX file.
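As a rough illustration of what reading such a file could involve, here is a sketch that pulls the declared license out of an SPDX 2.x document, assuming the upstream ships it in the tag-value serialization (other serializations such as JSON or RDF are ignored here) and that a hypothetical public data key tells us where the file lives. The function name `declared_licenses` is made up for this example.

```python
from typing import List

# SPDX placeholder values which carry no usable license expression
NO_INFO = {"NOASSERTION", "NONE"}


def declared_licenses(spdx_text: str) -> List[str]:
    """Return the PackageLicenseDeclared expressions from a tag-value SPDX document."""
    found: List[str] = []
    for line in spdx_text.splitlines():
        # Tag-value lines look like "Tag: value"; split on the first colon only
        tag, sep, value = line.partition(":")
        if not sep or tag.strip() != "PackageLicenseDeclared":
            continue
        value = value.strip()
        if value and value not in NO_INFO:
            found.append(value)
    return found
```

An empty result would mean the SPDX file gives us nothing to work with, in which case the element would still need a manually specified license in its public data.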
An implementation which seems suitable to me for this, building on top
of the previous "Traditional approach" would look like this:
* Block on the ability to have elements depend on the sources of
their dependencies in BuildStream, or another solution to the
same problem.
As discussed in a recent thread[6], there are already a few
use cases needing similar capability, including the Bazel
build plugin which wants to stage many dependency sources
in one sandbox.
* With the ability to depend on dependency source availability
at build time, the new `assertlicense` Element plugin could
have the ability to:
* Depend on some SPDX parsing tooling, which it could stage
in the `/` of the sandbox.
* Stage sources for any of the dependency elements which do
not already list manually specified licenses in their
public data.
* Attempt to scan the code for an spdx file.
  In this way the license assertion could be based on manually
  specified licenses (for any modules which do not export an SPDX
  file), and could be automated for modules which do provide an
  SPDX file.
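Licenses taken from SPDX files will often be license expressions rather than single identifiers, so the blacklist check would need to understand dual licensing. Below is a rough sketch using a small hand-written parser (not any particular SPDX library) that follows the SPDX operator precedence (WITH binds tighter than AND, which binds tighter than OR) and treats `A OR B` as a choice available to the distributor. It only recognizes uppercase operators and ignores exceptions when matching; `acceptable` and `is_blacklisted` are hypothetical names.

```python
import re
from typing import Callable, List

# A parenthesis, or any run of characters that is neither whitespace nor a parenthesis
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def acceptable(expression: str, is_blacklisted: Callable[[str], bool]) -> bool:
    """True if some licensing choice permitted by the SPDX expression avoids the blacklist.

    OR means the distributor may pick either side, AND means both sides apply.
    Only uppercase operators are recognized; WITH exceptions are ignored when matching.
    """
    tokens: List[str] = TOKEN.findall(expression)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or() -> bool:
        ok = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # always parse, so the whole expression is validated
            ok = ok or rhs
        return ok

    def parse_and() -> bool:
        ok = parse_with()
        while peek() == "AND":
            take()
            rhs = parse_with()
            ok = ok and rhs
        return ok

    def parse_with() -> bool:
        ok = parse_atom()
        if peek() == "WITH":
            take()
            # Consume the exception identifier; it does not affect the result
            if take() in ("", "(", ")"):
                raise ValueError(f"missing exception after WITH in {expression!r}")
        return ok

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            ok = parse_or()
            if take() != ")":
                raise ValueError(f"unbalanced parentheses in {expression!r}")
            return ok
        if tok in ("", ")", "AND", "OR", "WITH"):
            raise ValueError(f"unexpected token {tok!r} in {expression!r}")
        return not is_blacklisted(tok)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expression!r}")
    return result
```

With a blacklist that rejects anything starting with "GPL-3.0", "MIT OR GPL-3.0-only" is acceptable (the distribution can choose MIT) while "MIT AND GPL-3.0-only" is not. A regex blacklist of the kind Douglas proposed could be plugged in as `is_blacklisted`, and manually specified public data would simply be the degenerate case of an expression with a single identifier.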
Summary
~~~~~~~
I think that the license checker script has value on its own, as it
provides some automated feedback for those who need to audit the
distribution and understand what it is they are distributing, but it is
not, by itself, the right place to add blacklist assertions.
Any thoughts on the above approaches for general license metadata
checking?
Cheers,
-Tristan
PS: Please note that there is *another* problem related to licenses,
namely the actual *distribution* of the license files themselves. For
example, it can be desirable to publish the COPYING/LICENSE files found
in upstream modules somewhere in the artifact payloads, so that they can
be handed over at the distribution phase. The text above does not
address this, and I think it is yet another separate problem.
[0]: https://spdx.dev/
[1]: https://www.yoctoproject.org/docs/latest/ref-manual/ref-manual.html#var-LICENSE
[2]: https://www.yoctoproject.org/docs/latest/ref-manual/ref-manual.html#var-INCOMPATIBLE_LICENSE
[3]: https://docs.buildstream.build/master/format_public.html#builtin-public-data
[4]: https://spdx.org/licenses/
[5]: https://github.com/david-a-wheeler/spdx-tutorial
[6]: https://lists.apache.org/thread.html/r3ff35d36e085d1ca51f753707b24ac5e3111b5b53d74807085076033%40%3Cdev.buildstream.apache.org%3E