[Freedesktop-sdk] license-checking script for BuildStream projects

Tue Aug 25 18:32:34 UTC 2020

On 14/08/2020 11:42, Valentin David wrote:
>> C) What sort of format would be good for the machine-readable 
>> summary? json? YAML?
> json is so much easier and faster to parse. So for machine-readable, json.
>> D) What sort of format would be good for the human-readable summary? 
>> markdown? html?
> markdown would be nice. But if it has limitation for formatting, you 
> can go for html.

I've been working on JSON and HTML outputs (see next email).

>> E) What would be a more useful output for freedesktop-sdk: just the 
>> summaries?
>> or should we also include the raw licensecheck output?
>
> To be published? The summaries, I suppose. But I suppose we want to be 
> able to get the output in a way.
For now I've included the raw licensecheck output in the output folder, 
along with the summaries.

> We probably need to have a way to annotate the licensecheck data in 
> the elements. For example build scripts that are intermediate stage 
> within a project are not important for the result. Other cases are we 
> do not build some of the code (for example FFmpeg). We still want to 
> tell the license of the source code. But we should also say what 
> applies to the element's artifact. Optionally, specify it for each 
> split domain. There is also documentation, which usually has different 
> licensing.

The current approach is to build something that's completely external to 
the BuildStream program (although it will still hopefully be maintained 
under the BuildStream umbrella, in the BuildStream GitLab group). That 
means that the script won't have access to internal element data and 
config options. Instead, it works by invoking "bst show" to get a list 
of dependencies, and then checking out the source code from each 
dependency in order to perform the license scan.
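
As a rough sketch of that flow (not the actual script), the dependency 
list could be gathered as below. The helper names are hypothetical, and 
the "%{name}" format string and the filter on lines ending in ".bst" are 
assumptions about how the "bst show" output would be consumed. The source 
checkout step is left out because its command line varies between 
BuildStream versions.

```python
import subprocess


def build_show_command(target):
    """Build a 'bst show' invocation that prints one element name per line.

    Assumption: the '%{name}' format string is what a script would use here.
    """
    return ["bst", "show", "--deps", "all", "--format", "%{name}", target]


def parse_element_names(output):
    """Extract unique element names from 'bst show' output, keeping order.

    Assumption: element names end in '.bst'; other lines are ignored.
    """
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line.endswith(".bst") and line not in names:
            names.append(line)
    return names


def list_dependencies(target):
    """Run 'bst show' for the target and return its dependency names."""
    result = subprocess.run(
        build_show_command(target),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_element_names(result.stdout)
```

Each returned name would then be passed to a source checkout and a 
licensecheck run, one element at a time.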

With that approach, I don't think there's any way to take split domains 
into account.

For excluding certain elements (like intermediate stages), I was 
planning to introduce an 'ignore list', which users can maintain, and 
which the script will read. Any element on the ignore list wouldn't be 
scanned and wouldn't be mentioned in the output. This could also be used 
to remove stack and compose elements from the list, which aren't worth 
including in the output since they don't have any sources to scan.
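
A minimal sketch of how such an ignore list might be applied is below. 
The file format (one entry per line, '#' comments) and the use of 
glob-style patterns are assumptions made for illustration, not a 
description of the planned implementation. The function names and the 
example element names are hypothetical.

```python
import fnmatch


def load_ignore_list(text):
    """Parse ignore-list text into patterns.

    Assumption: one pattern per line; '#' starts a comment.
    """
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def filter_elements(elements, patterns):
    """Return the elements that match none of the ignore patterns."""
    return [
        element
        for element in elements
        if not any(fnmatch.fnmatchcase(element, p) for p in patterns)
    ]


ignore_text = """
# intermediate build stages
bootstrap/*
platform.bst  # stack element, no sources to scan
"""
elements = ["bootstrap/gcc-stage1.bst", "components/zlib.bst", "platform.bst"]
to_scan = filter_elements(elements, load_ignore_list(ignore_text))
```

Filtering before the scan, rather than after, means ignored elements 
cost no checkout or scan time.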

Scanning artifacts as well as sources is an interesting suggestion. I 
don't think artifacts are ever likely to contain license information 
that wasn't in the sources, so it wouldn't add anything new to the 
results. But I suppose in some cases it would be interesting to see 
which license information ends up in the actual artifact and which is 
only found in the source code.

On the other hand, I think scanning artifacts as well as sources would 
add a lot of extra time, and the process already takes a very long time 
to complete. As it is, a full scan of everything in freedesktop-sdk took 
the runners nearly 10 hours. I don't think it's worth doing something 
that would make it take even longer.
