[Intel-gfx] [PATCH v2 4/4] drm/i915/: Re-work clflush_write32

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Feb 1 16:32:14 UTC 2022


On 01/02/2022 15:41, Michael Cheng wrote:
> Ah, thanks for the clarification! While discussion goes on about the 
> route you suggested, could we land these patches (after addressing the 
> reviews) to unblock compiling i915 on arm?

I am 60-40 leaning towards no, since follow-up work can be hard. I'd 
prefer a little bit of discussion before merging.

Also, what will be the Arm implementation of drm_clflush_virt_range? 
Noob question - why is i915 the only driver calling it? Do other GPUs 
never need to flush CPU cache?

Regards,

Tvrtko

> On 2022-02-01 1:25 a.m., Tvrtko Ursulin wrote:
>>
>> On 31/01/2022 17:02, Michael Cheng wrote:
>>> Hey Tvrtko,
>>>
>>> Are you saying when adding drm_clflush_virt_range(addr, sizeof(addr)), 
>>> this function forces an x86 code path only? If that is the case, 
>>> drm_clflush_virt_range(addr, sizeof(addr)) currently has ifdefs that 
>>> separate out x86 and powerpc, so we can add an ifdef for arm in the 
>>> near future when needed.
>>
>> No, I was noticing that the change you are making in this patch, while 
>> it indeed fixes a build failure, touches a code path which does not get 
>> executed on Arm at all.
>>
>> So what effectively happens is a single assembly instruction gets 
>> replaced with a function call on all integrated GPUs up to and 
>> including Tigerlake.
>>
>> That was the slightly annoying part I was referring to and asking 
>> whether it was discussed before.
>>
>> Sadly I don't think there is a super nice solution apart from 
>> duplicating drm_clflush_virt_range as for example i915_clflush_range 
>> and having it static inline. That would allow the integrated GPU code 
>> path to remain of the same performance profile, while solving the Arm 
>> problem. However it would be code duplication so might be frowned upon.
>>
>> I'd be tempted to go that route but it is something which needs a bit 
>> of discussion if that hasn't happened already.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> On 2022-01-31 6:55 a.m., Tvrtko Ursulin wrote:
>>>> On 28/01/2022 22:10, Michael Cheng wrote:
>>>>> Use drm_clflush_virt_range instead of clflushopt and remove the memory
>>>>> barrier, since drm_clflush_virt_range takes care of that.
>>>>>
>>>>> Signed-off-by: Michael Cheng <michael.cheng at intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +++-----
>>>>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> index 498b458fd784..0854276ff7ba 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> @@ -1332,10 +1332,8 @@ static void *reloc_vaddr(struct i915_vma *vma,
>>>>>   static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
>>>>>   {
>>>>>       if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) {
>>>>> -        if (flushes & CLFLUSH_BEFORE) {
>>>>> -            clflushopt(addr);
>>>>> -            mb();
>>>>> -        }
>>>>> +        if (flushes & CLFLUSH_BEFORE)
>>>>> +            drm_clflush_virt_range(addr, sizeof(addr));
>>>>>             *addr = value;
>>>>>   @@ -1347,7 +1345,7 @@ static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
>>>>>            * to ensure ordering of clflush wrt to the system.
>>>>>            */
>>>>>           if (flushes & CLFLUSH_AFTER)
>>>>> -            clflushopt(addr);
>>>>> +            drm_clflush_virt_range(addr, sizeof(addr));
>>>>>       } else
>>>>>           *addr = value;
>>>>>   }
>>>>
>>>> Slightly annoying thing here (maybe in some other patches from the 
>>>> series as well) is that the change adds a function call to x86 only 
>>>> code path, because relocations are not supported on discrete as per:
>>>>
>>>> static int
>>>> eb_validate_vma(...)
>>>>         /* Relocations are disallowed for all platforms after TGL-LP. This
>>>>          * also covers all platforms with local memory.
>>>>          */
>>>>
>>>>         if (entry->relocation_count &&
>>>>             GRAPHICS_VER(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915))
>>>>                 return -EINVAL;
>>>>
>>>> How acceptable would it be, for the whole series, to introduce a static 
>>>> inline i915 clflush wrapper and so be able to avoid function calls 
>>>> on x86? Is this something that has been discussed and discounted 
>>>> already?
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>> P.S. Hmm, I am now reminded of my really old per-platform build 
>>>> patches. With them you would be able to compile out large portions 
>>>> of the driver when building for ARM. Probably about a third, if my 
>>>> memory serves me right.
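
For reference, the static inline wrapper idea raised in the thread could look roughly like the sketch below. It is a userspace approximation, not kernel code and not what was merged: the names i915_clflush_range_sketch and clflush_write32_sketch, the SKETCH_* flag values and the 64-byte cache line size are all assumptions made for illustration. On x86-64 it uses the SSE2 intrinsics _mm_clflush and _mm_mfence in a barrier, flush, barrier shape; on other architectures it falls back to a no-op placeholder where a real port would call an architecture-specific helper.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>		/* _mm_clflush(), _mm_mfence() (SSE2) */
#endif

/* Assumed values for illustration only; not taken from the i915 headers. */
#define SKETCH_CLFLUSH_BEFORE	(1u << 0)
#define SKETCH_CLFLUSH_AFTER	(1u << 1)
#define SKETCH_CACHELINE	64u	/* assumed cache line size */

/*
 * Hypothetical inline wrapper. Because it is static inline, the x86 path
 * can be expanded at the call site instead of costing a function call.
 */
static inline void i915_clflush_range_sketch(void *addr, size_t len)
{
#if defined(__x86_64__)
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(SKETCH_CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	if (!len)
		return;

	_mm_mfence();				/* order earlier stores */
	for (; p < end; p += SKETCH_CACHELINE)
		_mm_clflush((const void *)p);	/* flush one cache line */
	_mm_mfence();				/* order the flushes */
#else
	/* Placeholder: a real port would call an arch-specific helper here. */
	(void)addr;
	(void)len;
#endif
}

/* Same control flow as clflush_write32() in the quoted patch. */
static inline void clflush_write32_sketch(uint32_t *addr, uint32_t value,
					  unsigned int flushes)
{
	if (flushes & (SKETCH_CLFLUSH_BEFORE | SKETCH_CLFLUSH_AFTER)) {
		if (flushes & SKETCH_CLFLUSH_BEFORE)
			i915_clflush_range_sketch(addr, sizeof(*addr));

		*addr = value;

		if (flushes & SKETCH_CLFLUSH_AFTER)
			i915_clflush_range_sketch(addr, sizeof(*addr));
	} else {
		*addr = value;
	}
}
```

The trade-off is the one already noted in the thread: the flush loop is duplicated in the driver rather than shared with the DRM core, in exchange for keeping the integrated GPU path free of an extra call.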

