[Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Jun 28 08:58:01 UTC 2022


On 27/06/2022 19:58, Zeng, Oak wrote:
> 
> 
> Thanks,
> Oak
> 
>> -----Original Message-----
>> From: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>> Sent: June 27, 2022 4:30 AM
>> To: Zeng, Oak <oak.zeng at intel.com>; Landwerlin, Lionel G
>> <lionel.g.landwerlin at intel.com>; Vishwanathapura, Niranjana
>> <niranjana.vishwanathapura at intel.com>
>> Cc: Zanoni, Paulo R <paulo.r.zanoni at intel.com>; intel-
>> gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org; Hellstrom,
>> Thomas <thomas.hellstrom at intel.com>; Wilson, Chris P
>> <chris.p.wilson at intel.com>; Vetter, Daniel <daniel.vetter at intel.com>;
>> christian.koenig at amd.com; Auld, Matthew <matthew.auld at intel.com>
>> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
>>
>>
>> On 24/06/2022 21:23, Zeng, Oak wrote:
>>> Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing
>> storage":
>>>
>>> Correctness:
>>> consider this sequence of:
>>> 1. unbind va1 from pa1,
>>> 2. then bind va1 to pa2. //user space has the freedom to do this as it
>>> manages virtual address space 3. Submit shader code using va1, 4. Then
>>> retire pa1.
>>>
>>> If you don't perform tlb invalidate at step #1, in step #3, shader will use
>> stale entries in tlb and pa1 will be used for the shader. User want to use pa2.
>> So I don't think invalidate tlb at step #4 make correctness.
>>
>> Define step 3. Is it a new execbuf? If so then there will be a TLB flush there.
>> Unless the plan is to stop doing that with eb3 but I haven't picked up on that
>> anywhere so far.
> 
> In Niranjana's latest patch series, he removed the TLB flushing from vm_unbind. He also said explicitly TLB invalidation will be performed at job submission and backing storage releasing time, which is the existing behavior of the current i915 driver.
> 
> I think if we invalidate TLB on each vm_unbind, then we don't need to invalidate at submission and backing storage releasing. It doesn't make a lot of sense to me to perform a tlb invalidation at execbuf time. Maybe it is a behavior for the old implicit binding programming model. For vm_bind and eb3, we separate the binding and job submission into two APIs. It is more natural the TLB invalidation be coupled with the vm bind/unbind, not job submission. So in my opinion we should remove tlb invalidation from submission and backing storage releasing and add it to vm unbind. This is method is cleaner to me.

You can propose this model (not flushing in eb3) but I have my doubts. 
Consider the pointlessness of flushing on N unbinds for 99% of clients 
which are not infinite compute batch. And consider how you make the 
behaviour consistent on all platforms (selective vs global tlb flush).

Also note that this discussion is orthogonal to unbind vs backing store 
release.

> Regarding performance, we don't have data. In my opinion, we should make things work in a most straight forward way as the first step. Then consider performance improvement if necessary. Consider some delayed tlb invalidation at submission and backing release time without performance data support wasn't a good decision.

It is quite straightforward though. ;) It aligns with the eb2 model and 
argument can be made backing store release is (much) less frequent than 
unbind (consider softpin where client could trigger a lot of pointless 
flushes).

Regards,

Tvrtko


More information about the Intel-gfx mailing list