[Freedesktop-sdk] License blacklisting [Was: license-checking script for BuildStream projects]

Douglas Winship douglas.winship at codethink.co.uk
Thu Aug 27 14:49:20 UTC 2020


Hi Tristan

Thanks for your comments. Your main suggestion sounds good, but I don't
follow all of the logic.

For clarity, I'll call my existing approach to license checking an
"external" approach, since it's an external script that isn't part of
BuildStream itself, whereas your proposal would be an "internal"
approach.

The internal approach sounds like an excellent proposal, and as I
understand it you're suggesting that both approaches have valid use
cases, and both approaches should be developed. (With the external
approach presumably being developed first, since the internal approach
is currently blocked.) This makes sense, but if I understand you
correctly you're also saying:

1. The internal approach will implement blacklist processing.
2. ...and therefore the external approach shouldn't.

That's the part that confuses me. I don't understand how the one thing
implies the other.

If only one of the approaches is worth using, then we should only
develop that approach and not develop the other. But if both approaches
are worth having, then we should develop both approaches properly.
There's no reason to deliberately limit the functionality of one
approach by removing an obvious and reasonable feature.

Identifying blacklist violations does seem like an obvious feature for
any license-checking approach. So far, most of the people I've spoken to
about the external script seem to have assumed it will be used this way:
to monitor which licenses apply to the code in their BuildStream
project, and make sure that doesn't include any licenses that ought to
be avoided. That effectively means checking against a blacklist.

The external script produces two main summary outputs: a json output for
machine processing, and an html output for human reading. The main use
case I see for the human-readable output is for users to skim through
it, looking for anything out of place or surprising. This is a perfect
place for blacklist processing: violations could be highlighted at the
top of the page, where they would be visible in one glance.
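
To make that concrete, here is a minimal sketch of how the html summary
could put violations first. It is not the script's actual code: the
function names, the element-to-licenses mapping and the choice of
full-string regex matching are all assumptions made for illustration.

```python
import html
import re


def find_violations(licenses_by_element, blacklist_patterns):
    """Return sorted (element, license) pairs matching a blacklisted regex.

    Assumption: each pattern must match the whole license name.
    """
    compiled = [re.compile(p) for p in blacklist_patterns]
    return sorted(
        (element, lic)
        for element, licenses in licenses_by_element.items()
        for lic in licenses
        if any(rx.fullmatch(lic) for rx in compiled)
    )


def render_html(licenses_by_element, blacklist_patterns):
    """Build a minimal report with violations listed before everything else."""
    violations = find_violations(licenses_by_element, blacklist_patterns)
    parts = ["<html><body>"]
    if violations:
        # Violations go at the very top so they are visible at a glance.
        parts.append("<h1>Blacklist violations</h1><ul>")
        for element, lic in violations:
            parts.append(f"<li>{html.escape(element)}: {html.escape(lic)}</li>")
        parts.append("</ul>")
    parts.append("<h1>All detected licenses</h1><ul>")
    for element in sorted(licenses_by_element):
        joined = ", ".join(sorted(licenses_by_element[element]))
        parts.append(f"<li>{html.escape(element)}: {html.escape(joined)}</li>")
    parts.append("</ul></body></html>")
    return "\n".join(parts)
```

The same list of violations could be written under its own key in the
json output, so that both outputs agree.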

Likewise, we're suggesting that the script should be used in CI for
projects like freedesktop-sdk. If the script includes blacklist
processing, then it can cause CI pipelines to fail when a violation is
detected; that's a useful feature. Without blacklist processing, the
script would just create output artifacts that people probably won't
remember to look at. I don't see what value that adds to CI.
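
As a rough sketch of the CI case, assuming the json summary is a simple
mapping from element name to a list of detected licenses (the real
output format may well differ, and the names below are hypothetical), a
check that fails the pipeline could be as small as this:

```python
import json
import re
import sys


def check_summary(summary_path, blacklist_patterns):
    """Return 1 if any detected license matches the blacklist, else 0."""
    with open(summary_path, encoding="utf-8") as handle:
        # Assumed format: {"element.bst": ["License-Name", ...], ...}
        licenses_by_element = json.load(handle)
    compiled = [re.compile(p) for p in blacklist_patterns]
    failed = False
    for element, licenses in sorted(licenses_by_element.items()):
        for lic in licenses:
            if any(rx.fullmatch(lic) for rx in compiled):
                print(f"blacklisted license {lic!r} in {element}",
                      file=sys.stderr)
                failed = True
    return 1 if failed else 0


def main(argv):
    """argv[0] is the summary file; the remaining items are regex patterns."""
    return check_summary(argv[0], argv[1:])
```

A CI job would end with `sys.exit(main(sys.argv[1:]))`, so a non-zero
status stops the pipeline as soon as a blacklisted license shows up.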

Honestly, I think blacklist processing could be the most valuable
feature of the external script. If we don't include it, then I'm not
sure I understand what the external script is supposed to be for.


Douglas

On 27/08/2020 12:18, Tristan Van Berkom wrote:
> Hi,
>
> Forking this thread because I think this needs a wider discussion
> outside of the scope of this license checker tool.
>
> Also: Cross posting this to the BuildStream dev list as I think this is
> quite relevant there. Here is a link to the freedesktop-sdk thread for
> reference:
>
>      https://lists.freedesktop.org/archives/freedesktop-sdk/2020-August/000054.html
>
> On Tue, 2020-08-25 at 20:22 +0100, Douglas Winship wrote:
>> Following on from the previous email, I've put together a basic
>> license-checker in python and tested it in a CI Pipeline. I'd be very
>> interested to get feedback on the html and json output.
>>
>> In particular I'd be interested to get opinions about how to
>> implement the blacklist: we're planning to design the license checker
>> with a blacklist option, where users can supply a list of blacklisted
>> licenses (possibly as regular expressions). If any blacklisted
>> licenses are detected, these would be reported in the html and json
>> outputs, but I'm not sure what form that ought to take.
> First, I think blacklisting of the licenses should be out of scope for
> this script, which essentially will scan source code and give us
> summary feedback of detected licenses (and as such, provides valuable
> input for project maintainers in other stages).
>
>
> Here is how I would envision a workflow which involves reliable checks
> and blacklisting, I will describe this in two sections since I only
> recently became aware of the benefits we can gain with SPDX[0].
>
>
> Traditional approach
> ~~~~~~~~~~~~~~~~~~~~
> Traditionally linux distributions need to audit and consciously
> understand what rights they have for every given module they distribute
> in binary form, and then make a conscious decision under which license
> they distribute those binaries (in the cases where the upstream module
> is dual licensed and provides some choice to the distribution).
>
> Binary package based distributions like rpm or deb packages often
> encode this decision into the package metadata; custom Linux
> integration tools like buildroot and yocto do the same. E.g. yocto has
> the LICENSE[1] variable which is manually encoded into all of the
> recipes in the poky distribution. Users of the poky distribution (who
> typically /derive/ poky to create something custom) can then set the
> INCOMPATIBLE_LICENSE[2] variable for their distribution, which will
> cause build errors if their distribution ever inadvertently tries to
> include a module with a license on their decided blacklist.
>
> For a vast portion of open source / free software available in the
> wild, this conscious interpretation and decision needs to be made by a
> human being.
>
> I would see this implemented in BuildStream in the following way:
>
>    * Declare a new "licenses" public data format in the bst public data
>      domain[3]
>
>      This is a place where BuildStream project maintainers can record
>      the decided license for the module being built, similar to yocto's
>      LICENSE variable[1].
>
>      For compatibility across tooling, and consideration of possible
>      further automation (see further below), we should probably assert
>      that these license annotations be valid SPDX license
>      identifiers[4].
>
>    * We would add a new Element plugin in BuildStream, and call it
>      something like `assertlicense`
>
>      In this element's `config`, it would allow the user to declare
>      a blacklist.
>
>      This element could output a manifest of licenses in the artifact,
>      or produce no output at all, the important part is that this
>      element can be added to the pipeline, depend on some elements,
>      and halt the build with an error in the case that invalid
>      licenses are detected.
>
>
> Enhanced approach
> ~~~~~~~~~~~~~~~~~
> From my limited understanding, SPDX now provides a format for upstream
> project maintainers to encode machine readable information, including
> "license expressions" in an "spdx" file in their module.
>
> This would allow for a (possibly weaker possibly stronger) trust chain
> where the distributor places trust in the upstream module maintainer to
> have the spdx file up to date, if that upstream does maintain one (I
> suspect that depending on the use cases, a full license audit will
> still be preferred).
>
> This allows us some room to maneuver, and provide automation in the
> cases where an upstream provides an spdx file. One downside I can see
> from a quick blog read[5]:
>
>      "The SPDX specification doesn't specify a file extension or file
>       naming convention."
>
> If this is true, then we would *still* need project maintainers to at
> least annotate their element declarations with a bit of public data
> which tell us what file is the SPDX file.
>
> An implementation which seems suitable to me for this, building on top
> of the previous "Traditional approach" would look like this:
>
>    * Block on the ability to have elements depend on the sources of
>      their dependencies in BuildStream, or another solution to the
>      same problem.
>
>      As discussed in a recent thread[6], there are already a few
>      use cases needing similar capability, including the Bazel
>      build plugin which wants to stage many dependency sources
>      in one sandbox.
>
>    * With the ability to depend on dependency source availability
>      at build time, the new `assertlicense` Element plugin could
>      have the ability to:
>
>      * Depend on some SPDX parsing tooling, which it could stage
>        in the `/` of the sandbox.
>
>      * Stage sources for any of the dependency elements which do
>        not already list manually specified licenses in their
>        public data.
>
>      * Attempt to scan the code for an spdx file.
>
>      In this way the license assertion could be made based both
>      on manually specified licenses (for any modules which do not
>      export any SPDX file), and can be automated for modules which
>      provide the SPDX file.
>
>
> Summary
> ~~~~~~~
> I think that the license checker script has value on its own, as it
> provides some automated feedback for those actors who need to audit the
> distribution and understand what it is they are distributing, but by
> itself is not the ultimately suitable place to add blacklist
> assertions.
>
> Any thoughts on the above approaches for general license metadata
> checking?
>
>
> Cheers,
>      -Tristan
>
>
> PS: Please note that there is *another* problem related to licenses,
> and that is the actual *distribution* of license files themselves,
> e.g. it can be desirable to publish the COPYING/LICENSE files found in
> upstream modules in the artifact payloads somewhere so that they can be
> handed over at the distribution phase - the entire text above does not
> address this bit, and I think it is yet another separate problem.
>
>
> [0]: https://spdx.dev/
> [1]: https://www.yoctoproject.org/docs/latest/ref-manual/ref-manual.html#var-LICENSE
> [2]: https://www.yoctoproject.org/docs/latest/ref-manual/ref-manual.html#var-INCOMPATIBLE_LICENSE
> [3]: https://docs.buildstream.build/master/format_public.html#builtin-public-data
> [4]: https://spdx.org/licenses/
> [5]: https://github.com/david-a-wheeler/spdx-tutorial
> [6]: https://lists.apache.org/thread.html/r3ff35d36e085d1ca51f753707b24ac5e3111b5b53d74807085076033%40%3Cdev.buildstream.apache.org%3E