[PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2

Tue Jun 20 09:20:32 UTC 2023

On Tue, 20 Jun 2023 11:14:51 +0200
Christian König <ckoenig.leichtzumerken at gmail.com> wrote:

> Am 20.06.23 um 11:09 schrieb Boris Brezillon:
> > On Tue, 20 Jun 2023 10:44:26 +0200
> > Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> >  
> >> Am 20.06.23 um 10:28 schrieb Boris Brezillon:  
> >>> On Tue, 20 Jun 2023 10:12:13 +0200
> >>> Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> >>>     
> >>>>> I think Boris's suggestion of having this through a common
> >>>>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.  
> >>>> No, again. The only driver which should accept duplicates is radeon, for
> >>>> all other drivers especially new ones duplicates should probably be
> >>>> rejected.
> >>>>
> >>>> We only allow this for radeon because it is already UAPI, could be that
> >>>> we need to do this for amdgpu as well but I really hope we don't need this.  
> >>> Just want to describe the use case we have: we support submission in
> >>> batch (several jobs passed to the submit ioctl) with a
> >>> submit-all-or-nothing model: if any of the job description is passed
> >>> wrong args or causes an allocation error, we fail the whole group. In
> >>> the submission path, we want to prepare GEMs for all jobs. That means
> >>> adding enough fence slots for the number job finished fences. Given not
> >>> all jobs will access the same set of BOs, I thought I could use
> >>> duplicates support to make my life easier, because otherwise I have to
> >>> collect all BOs upfront, store them in a temporary array, and keep
> >>> track of the number of fence slots needed for each of them. I guess
> >>> the other option would be to over-estimate the number of slots and make
> >>> it equal to num_jobs for all BOs.  
> >> Sounds pretty much what amdgpu is doing as well, but question is why
> >> don't you give just one list of BOs? Do you really want to add the
> >> fences that fine grained?  
> > Actually, we don't give a list of BOs at all, we pass a VM, and lock
> > all BOs attached to the VM (similar to what Xe does). And, as all other
> > drivers being submitted recently, we use explicit sync, so most of
> > those VM BOs, except for the imported/exported ones, will be given a
> > BOOKKEEP fence.
> >
> > The reason we need support for duplicates is because we also have
> > implicit BOs (like the HWRT object that's shared by the
> > geometry/fragment queues to pass data around), and those can be passed
> > to multiple jobs in a given batch and require special synchronization
> > (geometry job writes to them, fragment job reads from them, so we have
> > a reader/writer sync to express). I can of course de-duplicate upfront,
> > by parsing jobs and creating an array of BOs that need to be acquired
> > over the whole submission, but that's still one extra-step I'd prefer
> > to avoid, given the dma_resv framework allows us to figure it out at
> > lock time. I can also just deal with the EALREADY case in the driver
> > directly, it's not like it's super complicated anyway, just thought
> > other drivers would fall in the same situation, that's all.  
> 
> Well as long as you just need to ignore EALREADY, that should be trivial 
> and doable.

Oh, yeah, that's all I need really. We probably don't want to add the
GEM object a second time in the array though, hence the goto
reserve_fences in my proposal when EALREADY is returned.

> 
> What radeon needs is to keep EALREADY BOs in a separate container 
> because we need to double check their properties to not break the UAPI.
> 
> I strongly think that this shouldn't be needed by any other driver.
> 
> Going to add a flag to ignore EALREADY which can be set during exec init.

Thanks!