[Intel-xe] [PATCH v2 0/5] drm/xe: uAPI cleanup

Thu Jun 1 16:48:18 UTC 2023

On Thu, Jun 01, 2023 at 01:21:06PM +0300, Oded Gabbay wrote:
>On Wed, May 31, 2023 at 11:29 PM Lucas De Marchi
><lucas.demarchi at intel.com> wrote:
>>
>> On Wed, May 31, 2023 at 07:02:51PM +0200, Francois Dugast wrote:
>> >On Wed, May 31, 2023 at 09:23:45AM -0700, Lucas De Marchi wrote:
>> >> +Oded
>> >>
>> >> On Wed, May 31, 2023 at 03:23:33PM +0000, Francois Dugast wrote:
>> >> > This series addresses some feedback on xe_drm.h provided here:
>> >> > https://lists.freedesktop.org/archives/intel-xe/2023-May/004704.html
>> >> >
>> >> > It is based on xe-rev1-2023-05-02, not on drm-xe-next, so it will
>> >> > likely break CI.
>> >>
>> >> As I said on the previous reply, this is not going to fly. Now we have a
>> >> patch series against xe-rev1-2023-05-02, pushed as xe-rev2 branch with
>> >> *no recorded reviews*, doing things that already conflict with other
>> >> fixes that landed in drm-xe-next. Examples: the 32b compat for uapi,
>> >> the mmio refactor, the oob WAs, the probe refactor, etc etc.
>> >>
>> >> Also, there are other fixes queued as fixups, which is still the process
>> >> being used in drm-xe-next. It's also where CI is running. Due to display
>> >> being moved up in the branch it's basically the only sane thing to do.
>> >>
>> >> I think we need we are shooting ourselves on the foot and this review
>> >> process needs to be fixed / agreed upon before we create a mess out of
>> >> it.
>> >>
>> >> Lucas De Marchi
>> >
>> >Yes I understand the concern. The intention is to address Oded's feedback on the
>> >revision that was reviewed, so that the next review iteration is manageable and
>> >not based on a moving target, which could lead to and endless effort.
>>
>> and then what are we going to do next? rebase the rest of drm-xe-next on top?
>> How do we guarantee the fixes on top of xe-rev is not breaking stuff?
>> What about the changes that are already there changing the commits that
>> are on xe-rev?
>>
>> >
>> >Maybe I created confusion by sharing on the mailing list but at this stage the
>>
>> no, that was the right part of it. Reviews on the patches happen on this
>> mailing list. The wrong part IMO was to push it as a branch before it
>> got the necessary review and before we know how we are going to
>> reconciliate things. Rebasing drm-xe-next on xe-rev doesn't
>> really work for the reasons mentioned above.
>>
>> >idea is not to get this series merged, only to get Oded's validation that it
>> >fixes the issues that he reported. Yes this means more work will be required to
>> >rebase and create fixups later for drm-xe-next but it seems acceptable for the
>> >work that has been done so far.
>> >
>> >Do you see a better way that would still satisfy the "not moving target" goal?
>>
>> 0) Ignore the "not moving target" goal
>
>Hi Lucas,
>I would like to say a few things.
>
>The first point is that the current xe-rev1 branch was something that
>Rodrigo and I set-up. If you feel this is not a good starting point
>for the review, let's talk and decide on a new starting point. Even
>though I have already invested multiple hours in reviewing it, I can
>understand the point you are making and I'm willing to re-start the
>review with a new starting point.
>
>The second point is once we make a decision regarding the code
>baseline (keep using the old or moving to the new), I want *all* of us
>to agree that newer patches that are sent to the mailing list will not
>be part of the initial Xe upstreamed code. There are mainly two
>reasons for that:
>
>1. There is a long-standing process of upstreaming a new driver to the
>Linux kernel and that process assumes a code baseline that doesn't
>change between revisions, except for addressing code review comments.
>As a person who has undergone this a couple of times from the side
>being reviewed I would like to follow this same process here as well
>:)
>
>2. The Xe driver is one big driver, with a large team behind it that
>continuously sends patches. The code keeps getting re-factored and
>acquires new features at a high rate. If every revision of the review
>the baseline code will change, the changes from the previous revision
>will be too large to review independently. I will have to review the
>entire driver again... Not to mention how will I be able to easily
>verify that my comments were addressed properly (btw, I also wrote it
>here https://lore.kernel.org/intel-xe/CAFCwf124UMR3VQ=Tr6mT_eTRQi9ssqZqEH4jys1bGggHJjboLw@mail.gmail.com/).

Let me use a few examples:

	22ad50d184c0 ("fixup! drm/xe: Introduce a new DRM driver for Intel GPUs")

Do we want to leave the commit "drm/xe: Introduce a new DRM driver for
Intel GPUs" as part of the original and then have a Fix commit leaving a
few hundreds of broken commits?  On other "first driver submissions"
I've seen that would be frowned upon and it would be expected that this
is fixed inplace instead of leaving broken commits behind.

	a00a92181ab1 ("fixup! drm/xe: Introduce a new DRM driver for Intel GPUs")

Same thing: a nasty memory corruption lurking there. Do we leave it
broken?  With the previous (current(?)) process we were actually adding
these fixups that are later moved back in the git history so things like
this are never part of the upstream code. Looking at xe-rev2 branch
being pushed without any code review, why is 83ac6e02bf5e ("drm/xe:
Remove unused define") more important that deserves being brought back
to the original submission?

Another one that comes to mind: the engine vs context name that you
raised on the other review and I pointed out to the history: would it be
something to be changed on xe-rev branch and invalidate the other things
on top?  Or would we better change this in drm-xe-next?

 From what I talked with Rodrigo before, we'd actually discuss and agree
about things that needed to be changed/fixed, particularly on the design
and interface of the driver (with drm subsystem and with userspace).
Line by line review with non-critical things would then become gitlab
issues, like when Matt Roper went through a similar line by line
review and Rodrigo created a bunch of gitlab issues that are being
solved in drm-xe-next.

>Therefore, I think we need to agree on some method that won't burn me
>out and cause me to tell Dave & Daniel I'm giving up on what they
>asked me to do. One way is to follow the rule above, but if you still
>disagree with this, I'm open to hearing other suggestions on how to
>enable a proper review in a timely manner that won't cause me to pull
>my hair out.
>
>In that context, I feel your "Ignore the not moving target goal"
>statement to Francois was inappropriate. We need to have an agreement,
>you can't decide that on your own.

What am I deciding on my own and why was that inappropriate?  I went
back to read again what I wrote to check if it was badly phrased, but I
don't see it that way. He asked my opinion on a different method, which
you are also asking when you say "I'm open to hearing other suggestions
on how to enable a proper review in a timely manner that won't cause me
to pull my hair out". If you hear and disagree, I'm totally fine with
it, but asking for it and then saying it's inappropriate to disagree
with the premise seems odd.

I went on and explained *why*, saying that the moving base is already
there and IMO a better method would be to accept the moving base, with a
different review method. And if you look at the proposal below, I'm
actually removing the "moving base" factor between what's being reviewed
and where the rest of development is happening, CI is running, other
review comments are being handled, etc.

What I'm currently seeing and I don't agree with:

- Commits are being pushed to a xe-revN branch bypassing code review

- Since CI is running on drm-xe-next, it's also bypassing CI. I was
   part of enough upstream submissions to know our ability to break
   things when handling code review comments, which is only noticed later
   when the rest is rebased on top and CI is running again

- drm-xe-next and xe-revN are not on the same baseline. If they were,
   then comparing both would be trivial and we'd probably not have
   this discussion. Question here on your process: currently xe-rev
   is on top of drm-next-2023-04-27, which is on top of v6.3-rc4.
   Is your intention that fixes (to only your review commens) are
   accumulated  on xe-rev* and then propose this to be merged to drm-next
   when that is finished, without any rebase?

- changes to "the original submission" that from my pov some are good,
   some would better be discussed and some that are not critical
   taking precedence over others that should really be there. And that
   is happening because there is this disconnect between drm-xe-next and
   xe-rev as per above.

- the disconnect will only get worse as we are on 1/6 batch of review
   comments so I don't see the process scaling well.

Lucas De Marchi

>
>Thanks,
>Oded
>
>
>
>
>>
>> 1) The review should be on very a recent rev, not something old with a
>>     lot of commits on top. That doesn't scale and there will be another 100
>>     patches before the next review round.
>>
>>         $ tag=xe-rev1-2023-05-02
>>         $ s=$(git log -1 --format=%s $tag)
>>         $ commit=$(git log --grep "$s" -n1 --format=%H)
>>         $ git rev-list $commit..HEAD | wc -l
>>         208
>>
>>     Then as outcome of the review we have more patches to land, rebase
>>     the rest, etc, etc.  This is a much worse "moving target" and IMO
>>     it's worse for both reviewers and committers.
>>
>> 2) The xe-revX should have the same baseline. That makes for a really
>>     simple way to check "what changed since I last looked". This can be
>>     checked on the overall diff with `git diff` or on the amended fixes
>>     with git range-diff.
>>
>>     Currently drm-xe-next is on drm-tip due to the display integration.
>>     Being on drm-next is important for merging the branch upstream, but
>>     not so much for these kind of reviews
>>
>> 3) Not merge things in the xe-revX branch that is not already
>>     in drm-xe-next.  Ideally if there are things that really need to
>>     be changed back in the xe-rev branch, it should be a fixup on
>>     drm-xe-next that is then brought back to the xe-rev branch.
>>
>>
>> Lucas De Marchi
>>
>> >
>> >Thanks,
>> >Francois
>> >
>> >>
>> >> >
>> >> > v2: fix ordering of defines and fields in uAPI (Lucas De Marchi)
>> >> >
>> >> > Francois Dugast (5):
>> >> >  drm/xe: Use SPDX-License-Identifier instead of license text
>> >> >  drm/xe: Group engine related structs
>> >> >  drm/xe: Move defines before relevant fields
>> >> >  drm/xe: Document usage of struct drm_xe_device_query
>> >> >  drm/xe: Fix some formatting issues in uAPI
>> >> >
>> >> > include/uapi/drm/xe_drm.h | 171 ++++++++++++++++++++------------------
>> >> > 1 file changed, 90 insertions(+), 81 deletions(-)
>> >> >
>> >> > --
>> >> > 2.34.1
>> >> >