[Intel-gfx] [PATCH v2 4/4] drm/i915/: Re-work clflush_write32
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Feb 1 16:32:14 UTC 2022
On 01/02/2022 15:41, Michael Cheng wrote:
> Ah, thanks for the clarification! While discussion goes on about the
> route you suggested, could we land these patches (after addressing the
> reviews) to unblock compiling i915 on arm?
I am 60-40 to no, since follow up can be hard. I'd prefer a little bit
of discussion before merging.
Also, what will be the Arm implementation of drm_clflush_virt_range?
Noob question - why is i915 the only driver calling it? Do other GPUs
never need to flush CPU cache?
Regards,
Tvrtko
> On 2022-02-01 1:25 a.m., Tvrtko Ursulin wrote:
>>
>> On 31/01/2022 17:02, Michael Cheng wrote:
>>> Hey Tvrtko,
>>>
>>> Are you saying when adding drm_clflush_virt_range(addr, sizeof(addr),
>>> this function forces an x86 code path only? If that is the case,
>>> drm_clflush_virt_range(addr, sizeof(addr) currently has ifdefs that
>>> seperate out x86 and powerpc, so we can add an ifdef for arm in the
>>> near future when needed.
>>
>> No, I was noticing that the change you are making in this patch, while
>> it indeed fixes a build failure, it is a code path which does not get
>> executed on Arm at all.
>>
>> So what effectively happens is a single assembly instruction gets
>> replaced with a function call on all integrated GPUs up to and
>> including Tigerlake.
>>
>> That was the slightly annoying part I was referring to and asking
>> whether it was discussed before.
>>
>> Sadly I don't think there is a super nice solution apart from
>> duplicating drm_clflush_virt_range as for example i915_clflush_range
>> and having it static inline. That would allow the integrated GPU code
>> path to remain of the same performance profile, while solving the Arm
>> problem. However it would be code duplication so might be frowned upon.
>>
>> I'd be tempted to go that route but it is something which needs a bit
>> of discussion if that hasn't happened already.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> On 2022-01-31 6:55 a.m., Tvrtko Ursulin wrote:
>>>> On 28/01/2022 22:10, Michael Cheng wrote:
>>>>> Use drm_clflush_virt_range instead of clflushopt and remove the memory
>>>>> barrier, since drm_clflush_virt_range takes care of that.
>>>>>
>>>>> Signed-off-by: Michael Cheng <michael.cheng at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +++-----
>>>>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> index 498b458fd784..0854276ff7ba 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> @@ -1332,10 +1332,8 @@ static void *reloc_vaddr(struct i915_vma *vma,
>>>>> static void clflush_write32(u32 *addr, u32 value, unsigned int
>>>>> flushes)
>>>>> {
>>>>> if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) {
>>>>> - if (flushes & CLFLUSH_BEFORE) {
>>>>> - clflushopt(addr);
>>>>> - mb();
>>>>> - }
>>>>> + if (flushes & CLFLUSH_BEFORE)
>>>>> + drm_clflush_virt_range(addr, sizeof(addr));
>>>>> *addr = value;
>>>>> @@ -1347,7 +1345,7 @@ static void clflush_write32(u32 *addr, u32
>>>>> value, unsigned int flushes)
>>>>> * to ensure ordering of clflush wrt to the system.
>>>>> */
>>>>> if (flushes & CLFLUSH_AFTER)
>>>>> - clflushopt(addr);
>>>>> + drm_clflush_virt_range(addr, sizeof(addr));
>>>>> } else
>>>>> *addr = value;
>>>>> }
>>>>
>>>> Slightly annoying thing here (maybe in some other patches from the
>>>> series as well) is that the change adds a function call to x86 only
>>>> code path, because relocations are not supported on discrete as per:
>>>>
>>>> static in
>>>> eb_validate_vma(...)
>>>> /* Relocations are disallowed for all platforms after
>>>> TGL-LP. This
>>>> * also covers all platforms with local memory.
>>>> */
>>>>
>>>> if (entry->relocation_count &&
>>>> GRAPHICS_VER(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915))
>>>> return -EINVAL;
>>>>
>>>> How acceptable would be, for the whole series, to introduce a static
>>>> inline i915 cluflush wrapper and so be able to avoid functions calls
>>>> on x86? Is this something that has been discussed and discounted
>>>> already?
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>> P.S. Hmm I am now reminded of my really old per platform build
>>>> patches. With them you would be able to compile out large portions
>>>> of the driver when building for ARM. Probably like a 3rd if my
>>>> memory serves me right.
More information about the Intel-gfx
mailing list