[PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2

Tue Jun 20 09:14:51 UTC 2023

Am 20.06.23 um 11:09 schrieb Boris Brezillon:
> On Tue, 20 Jun 2023 10:44:26 +0200
> Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
>
>> Am 20.06.23 um 10:28 schrieb Boris Brezillon:
>>> On Tue, 20 Jun 2023 10:12:13 +0200
>>> Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
>>>   
>>>>> I think Boris's suggestion of having this through a common
>>>>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.
>>>> No, again. The only driver which should accept duplicates is radeon, for
>>>> all other drivers especially new ones duplicates should probably be
>>>> rejected.
>>>>
>>>> We only allow this for radeon because it is already UAPI, could be that
>>>> we need to do this for amdgpu as well but I really hope we don't need this.
>>> Just want to describe the use case we have: we support submission in
>>> batch (several jobs passed to the submit ioctl) with a
>>> submit-all-or-nothing model: if any of the job description is passed
>>> wrong args or causes an allocation error, we fail the whole group. In
>>> the submission path, we want to prepare GEMs for all jobs. That means
>>> adding enough fence slots for the number job finished fences. Given not
>>> all jobs will access the same set of BOs, I thought I could use
>>> duplicates support to make my life easier, because otherwise I have to
>>> collect all BOs upfront, store them in a temporary array, and keep
>>> track of the number of fence slots needed for each of them. I guess
>>> the other option would be to over-estimate the number of slots and make
>>> it equal to num_jobs for all BOs.
>> Sounds pretty much what amdgpu is doing as well, but question is why
>> don't you give just one list of BOs? Do you really want to add the
>> fences that fine grained?
> Actually, we don't give a list of BOs at all, we pass a VM, and lock
> all BOs attached to the VM (similar to what Xe does). And, as all other
> drivers being submitted recently, we use explicit sync, so most of
> those VM BOs, except for the imported/exported ones, will be given a
> BOOKKEEP fence.
>
> The reason we need support for duplicates is because we also have
> implicit BOs (like the HWRT object that's shared by the
> geometry/fragment queues to pass data around), and those can be passed
> to multiple jobs in a given batch and require special synchronization
> (geometry job writes to them, fragment job reads from them, so we have
> a reader/writer sync to express). I can of course de-duplicate upfront,
> by parsing jobs and creating an array of BOs that need to be acquired
> over the whole submission, but that's still one extra-step I'd prefer
> to avoid, given the dma_resv framework allows us to figure it out at
> lock time. I can also just deal with the EALREADY case in the driver
> directly, it's not like it's super complicated anyway, just thought
> other drivers would fall in the same situation, that's all.

Well as long as you just need to ignore EALREADY, that should be trivial 
and doable.

What radeon needs is to keep EALREADY BOs in a separate container 
because we need to double check their properties to not break the UAPI.

I strongly think that this shouldn't be needed by any other driver.

Going to add a flag to ignore EALREADY which can be set during exec init.

Regards,
Christian.

>
>> For radeon it turned out that we just had stupid userspace which
>> sometimes mentioned a BO in the list twice.
> Okay, that's not the same thing, indeed.
>
>> On the other hand over estimating the number of fences needed is
>> perfectly fine as well, that is rounded up to the next kvmalloc size or
>> even next page size anyway.
> Yeah, actually over-provisioning is not the most annoying part.
> Iterating over jobs to collect 'meta'-BOs is, so if I can just rely on
> EALREADY to detect that case and fallback to reserving an extra slot in
> that situation, I'd prefer that.