[RFC] Using DC in amdgpu for upcoming GPU

Fri Dec 9 20:30:13 UTC 2016

> I think this is part of the reason a lot of people get fed up with working upstream in Linux.  I can respect your technical points and if you kept it to that, I'd be fine with it and we could have a technical discussion starting there.  But attacking us or our corporate culture is not cool.  I think perhaps you have been in the RH silo for too long.  Our corporate culture is not like RH's.  Like it or not, we have historically been a windows centric company.  We have a few small Linux teams that have been engaged with the community for a long time, but the rest of the company has not.  We are working to improve it, but we can only do so many things at one time.  GPU cycles are fast.  There's only so much time in the day; we'd like to make our code perfect, but we also want to get it out to customers while the hw is still relevant.  We are finally at a point where our AMD Linux drivers are almost feature complete compared to windows and we have support upstream well before hw launch and we get shit on for trying to do the right thing.  It doesn't exactly make us want to continue contributing.  That's the problem with Linux.  Unless you are a part time hacker who is part of the "in" crowd and can spend all of his days tinkering with making the code perfect, a vendor with massive resources who can just throw more people at it, or a throw it over the wall and forget it vendor (hey, my code can just live in staging), there's no room for you.

I don't think that's fair, AMD as a company has a number of
experienced Linux kernel developers, who are well aware of the
upstream kernel development process and views. I should not be put in
a position where I have to say no, that is frankly the position you
are in as a maintainer, you work for AMD but you answer to the kernel
development process out here. AMD is travelling a well travelled road
here; lots of times I've had to deal with the same problems with
Intel/Daniel, and eventually Intel learned that what Daniel says matters and
people are a lot happier. I brought up the AMD culture because either
one of two things has happened here: a) you've lost sight of what
upstream kernel code looks like, or b) people in AMD aren't listening
to you, and if it's the latter case then it is a direct result of the
AMD culture, and so far I'm not willing to believe it's the former
(except maybe CGS - still on the fence whether that was a good idea or
a floodgate warning).

From what I understood this DAL code was a rewrite from scratch, with
upstreamability as a possible goal, it isn't directly taken from
Windows or fglrx. This goal was not achieved, so why do I have to live
with the result? AMD could have done better, they have so many people
experienced in how this thing should go down.

> You love to tell the exynos story about how crappy the code was and then after it was cleaned up how glorious it was. Except the vendor didn't do that.  Another vendor paid another vendor to do it.  We don't happen to have the resources to pay someone else to do that for us.  Moreover, doing so would negate all of the advantages to bringing up the code along with the hw team in the lab when the asics come back from the fab.  Additionally, the original argument against the exynos code was that it was just thrown over the wall and largely ignored by the vendor once it was upstream.  We've been consistently involved in upstream (heck, I've been at AMD almost 10 years now maintaining our drivers).  You talk about trust.  I think there's something to cutting a trusted partner some slack as they work to further improve their support vs. taking a hard line because you got burned once by a throw it over the wall vendor who was not engaged.  Even if you want to take a hard line, let's discuss it on technical merits, not mud-slinging.

Here's the thing, what happens if a vendor pays another vendor to
clean up DAL after I merge it, how do you handle it? Being part of the
upstream kernel isn't about hiding in the corner, if you want to gain
the benefits of upstream development you need to participate in
upstream development. If you want to do what AMD seems to be only in a
position to do, and have upstream development as an afterthought, then
you are of course going to run into lots of problems.

>
> I realize you care about code quality and style, but do you care about stable functionality?  Would you really merge a bunch of huge cleanups that would potentially break tons of stuff in subtle ways because coding style is that important?  I'm done with that myself.  I've merged too many half-baked cleanups and new features in the past and ended up spending way more time fixing them than I would have otherwise for relatively little gain.  The hw is just too complicated these days.  At some point people want support for the hw they have and they want it to work.  If code trumps all, then why do we have staging?

Code doesn't trump all, I'd have merged DAL if it did. Maintainability
trumps all. The kernel will be around for a long time yet, I'd like
it to still be something we can make changes to as expectations
change.

> I understand forward progress on APIs, but frankly from my perspective, atomic has been a disaster for stability of both atomic and pre-atomic code.  Every kernel cycle manages to break several drivers.  What happened to figuring out how to do it right in a couple of drivers and then moving that to the core?  We seem to have lost that in favor of starting in the core first.  I feel like we constantly refactor the core to deal with this or that quirk or requirement of someone's hardware and then deal with tons of fallout.  Is all we care about android?  I constantly hear the argument, if we don't do all of this android will do their own thing and then that will be the end.  Right now we are all suffering and android is barely even using this yet.  If Linux will carry on without AMD contributing maybe Linux will carry on ok without bending over backwards for android.  Are you basically telling us that you'd rather we water down our driver and limit the features and capabilities and stability we can support so that others can refactor our code constantly for hazy goals to support some supposed glorious future that never seems to come?  What about right now?  Maybe we could try and support some features right now.  Maybe we'll finally see Linux on the desktop.
>

All of this comes from the development model you have ended up at. Do
you have upstream CI? Upstream keeps breaking things, how do you find
out? I've seen spstarr bisect a bunch of AMD regressions in the past 6
months (not due to atomic); where are the QA/CI teams validating that?
Why aren't they bisecting the upstream kernel, instead of people in
the community on irc? AMD has been operating in throw it over the wall mode
at upstream for a while, I've tried to help motivate changing that and
slowly we get there with things like the external mailing list, and I
realise these things take time, but if upstream isn't something that
people really care about at AMD enough to continuously validate and
get involved in defining new APIs like atomic, you are in no position
to come back when upstream refuses to participate in merging 60-90k of
vendor produced code with lots of bits of functionality that shouldn't
be in there.

I'm unloading a lot of stuff here, and really I understand it's not
your fault, but I've stated I've only got one power left when people
let code like DAL/DC get to me. I'm not going to tell you how to
rewrite it, because you already know, you've always known, now we just
need the right people to listen to you.

Dave.