[Mesa-dev] [RFC] Mesa 17.3.x release problems and process improvements

Mark Janes mark.a.janes at intel.com
Tue Mar 13 20:36:49 UTC 2018


Daniel Vetter <daniel at ffwll.ch> writes:

> On Tue, Mar 13, 2018 at 4:46 PM, Mark Janes <mark.a.janes at intel.com> wrote:
>> Daniel Vetter <daniel at ffwll.ch> writes:
>>
>>> On Mon, Mar 12, 2018 at 11:54:45PM -0700, Kenneth Graunke wrote:
>>>> On Friday, March 9, 2018 12:12:28 PM PDT Mark Janes wrote:
>>>> [snip]
>>>> > I've been doing this for Intel.  Developers are on the hook to fix their
>>>> > bugs, but you can't make them do it.  They have many pressures on them,
>>>> > and a maintainer can't make the call as to whether a rendering bug is
>>>> > more important than day-1 vulkan conformance, for example.
>>>> >
>>>> > We could heighten the transparency of what is blocking the build by
>>>> > publicizing the authors of bisected blocking bugs to Phoronix, which
>>>> > might get things moving.
>>>>
>>>> I hope you're being sarcastic here, or else I'm misunderstanding your
>>>> proposal.  Public shaming of developers who create bugs has absolutely
>>>> no place in the Mesa community, IMHO.  It would foster the kind of toxic
>>>> community that none of us want to be a part of.
>>>>
>>>> Sometimes, people who create bugs are the very people that work the
>>>> hardest, who the project may not even exist without.  Would you want
>>>> to chew out someone for creating a bug in a Vulkan driver when...if it
>>>> weren't for that person, you wouldn't have a Vulkan driver at all?  Or,
>>>> maybe they caused a couple bad bugs...but also fixed hundreds of them.
>>>>
>>>> Other times, they're new contributors or volunteers who do this, not as
>>>> their day job.  Frankly, those people are under no obligation to help us
>>>> at all, so we need to thank them and appreciate the time and effort they
>>>> spend - and give them a hand fixing things when they're too busy, or
>>>> don't have the relevant hardware or skill to track down a regression.
>>>>
>>>> It's easy to be pissed off when there are bugs, and things seem to not
>>>> be making progress, but let's try and keep things positive and work
>>>> together to make Mesa the best we can.
>>>
>>> I'd like to second this with my experience from the kernel community. The
>>> public shaming game for when you create a regression is very strong there,
>>> led by Linus Torvalds. In my experience this directly causes:
>>>
>>> - Maintainers to hide bug reports and regression reports at all costs,
>>>   because having Linus destroy you just ain't never worth it. The meta game
>>>   becomes "avoid getting railed" instead of "deliver quality code", and
>>>   there's lots of ways to easily achieve the former that seriously hurt the
>>>   latter.
>>>
>>> - Best practice (in my experience) is to not mention the dreaded
>>>   "REGRESSION" tag when you need another maintainer's help to fix a
>>>   regression, because it's too likely they'll just panic. That means they
>>>   start screaming at you to go away, or their brain locks up and they can't
>>>   effectively help you track down the bug (seen both cases).
>>>
>>> - Creates a culture where talking about process/tooling improvements to
>>>   prevent regressions and/or handle them quicker becomes too dangerous,
>>>   because it all turns into a personal shaming game of who maintains the
>>>   worst subsystem.
>>>
>>> Long term you end up with a culture fucked up for good :-/
>>>
>>> Imo the only way to make this better is to try analyzing why a regression
>>> happened, and fix the tooling to prevent that in the future. Maybe better
>>> test coverage (and long term efforts to fix known gaps), maybe better
>>> presentation of automated checks (stuff like github pull requests that
>>> automatically run CI and report full results, blocking the merge if
>>> anything is amiss).
>>
>> You have to have a very strong CI to use it to block commits.  i965 Mesa
>> has a big CI which identifies many regressions, but I wouldn't want to
>> checkpoint commits in an automated way.  A large pool of obsolete
>> CI hardware will have lower reliability than the mesa master branch --
>> which generates noise for developers and impedes progress.
>
> This was all in general about blaming regressions on people, not
> specifically for the stable-backporting-from-master issue here.
>
> And if parts of your CI can't autogate then you can make it more
> informal - there's definitely stuff you want to autogate, like "does
> it compile everywhere in all configs", and probably you don't want to
> autogate on gen2 dying :-)

It's a bit different for us, because multiple companies and volunteers
can push.  We have a buildtest which prevents Intel engineers and any CI
user from breaking radeon for example.  However, radeon still breaks
when AMD devs push LLVM-version-dependent patches.  We can't stop that,
and there is a set of similar situations where builds break.  Reverts
and quick fixes are fine for this IMO.

> My point was if you don't want regressions, make it as easy as
> possible for people to never push a regression (whether master or
> stable trees) instead of a pillory or other blaming exercises. Little
> things (like whether your CI results are in some mail somewhere, maybe
> for an outdated version of your patches on a different baseline, or
> right next to the "do you really want to merge" button) matter.

Agreed.  Anyone can painlessly test in our CI, and the majority of
developers verifying patches in our CI are external.  We offer it to
them after a regression is detected.  Usually, they make use of the CI,
because they care about the product, and they want their patches to be
great.

There have been a few situations where developers have skipped CI for
what they thought was a trivial patch, and they caused regressions for
everyone.  Lazy behavior can be quite disruptive, and can inflict cost
on the community that you want to participate in.

The most common case, though, is what Ken describes: an absolutely
critical, overworked developer who we depend on utterly, who can't
always get to the fixes rapidly due to multiple conflicting demands on
time.  Mesa has perhaps a half-dozen engineers in this category.  How do
we prioritize quality releases, so millions of users can take advantage
of all the work?  I don't have answers for that.  The last thing I want
is for the hard work to result in criticism instead of praise.

IMO, Mesa CI is an example of what you've asked for in your maintainer
talks, where the system is openly implemented, and provides its
benefits freely to enable wide participation.

> -Daniel
>
>>> Personally I have high hopes for gitlab.fd.o to enable us to do a lot of
>>> that automation in a much better and much more discoverable way, but it's
>>> some ways in the future still. Besides better quality that would also help
>>> us ramp up new contributors, since instead of unwritten rules they'd not
>>> just get documented merge criteria, but have a pile of bots that
>>> interactively walk them through everything (the best projects auto-insert
>>> a comment from the bot with instructions how to repro results if anything
>>> fails, with links to further docs).
>>>
>>> Assume that people try to do the best and fix the tooling/support
>>> infrastructure to allow them to, and they will deliver. Blaming them just
>>> drives them into hiding and looking for better places to have fun.
>>> -Daniel
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>
>
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

