[PATCH] drm/xe: Use ttm_uncached for BO with NEEDS_UC flag

Matthew Auld matthew.auld at intel.com
Wed Jun 19 09:44:45 UTC 2024


On 18/06/2024 19:54, Matt Roper wrote:
> On Tue, Jun 18, 2024 at 08:29:24PM +0200, Thomas Hellström wrote:
>> Hi, Matt
>>
>> On Tue, 2024-06-18 at 09:43 -0700, Matt Roper wrote:
>>> On Tue, Jun 18, 2024 at 02:38:01PM +0200, Thomas Hellström wrote:
>>>> Hi,
>>>>
>>>> On Mon, 2024-06-17 at 13:28 -0700, Matt Roper wrote:
>>>>> On Wed, Jun 12, 2024 at 08:03:24PM +0200, Michal Wajdeczko wrote:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> On 11.06.2024 14:47, Thomas Hellström wrote:
>>>>>>> Hi, Michal,
>>>>>>>
>>>>>>> On Thu, 2024-06-06 at 21:56 +0200, Michal Wajdeczko wrote:
>>>>>>>> We should honor the requested uncached mode also at the TTM
>>>>>>>> layer. Otherwise, we risk losing updates to the memory-based
>>>>>>>> interrupt source or status vectors, as those require uncached
>>>>>>>> memory.
>>>>>>>>
>>>>>>>> Signed-off-by: Michal Wajdeczko
>>>>>>>> <michal.wajdeczko at intel.com>
>>>>>>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>>>>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/xe/xe_bo.c | 3 +++
>>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>> b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>> index 2bae01ce4e5b..2573cc118f29 100644
>>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>> @@ -378,6 +378,9 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>>>>>>>   	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_FLAG_PAGETABLE))
>>>>>>>>   		caching = ttm_write_combined;
>>>>>>>>   
>>>>>>>> +	if (bo->flags & XE_BO_FLAG_NEEDS_UC)
>>>>>>>> +		caching = ttm_uncached;
>>>>>>>> +
>>>>>>>>   	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages);
>>>>>>>>   	if (err) {
>>>>>>>>   		kfree(tt);
>>>>>>>
>>>>>>> To me the preferred method is to teach bo->cpu_caching about the
>>>>>>> uncached mode and then include it in the switch statement above.
>>>>>>>
>>>>>>
>>>>>> but bo->cpu_caching is currently documented as:
>>>>>>
>>>>>> /**
>>>>>>   * @cpu_caching: CPU caching mode. Currently only used for
>>>>>> userspace
>>>>>>   * objects.
>>>>>>   */
>>>>>>
>>>>>> and value 0 is implicitly reserved as a kind of default, so
>>>>>> 'teaching' would likely mean either extending the uapi with
>>>>>> something like:
>>>>>>
>>>>>>    #define DRM_XE_GEM_CPU_CACHING_WB                      1
>>>>>>    #define DRM_XE_GEM_CPU_CACHING_WC                      2
>>>>>> + #define DRM_XE_GEM_CPU_CACHING_UC                      3
>>>>>>
>>>>>> which would introduce a lot of currently undesired code changes,
>>>>>> or we would introduce an internal-only flag:
>>>>>>
>>>>>> + #define XE_CPU_CACHING_UC                      ((u16)~0)
>>>>>>
>>>>>> but that doesn't look like a clean solution.
>>>>>>
>>>>>>
>>>>>> OTOH, just above this new diff chunk, there is already code that
>>>>>> updates the caching mode outside the "switch statement above":
>>>>>>
>>>>>> 	if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
>>>>>> 	    (xe->info.graphics_verx100 >= 1270 &&
>>>>>> 	     bo->flags & XE_BO_FLAG_PAGETABLE))
>>>>>> 		caching = ttm_write_combined;
>>>>>>
>>>>>> so maybe, as a short-term solution, we can keep this patch, as
>>>>>> it's doing similar last-resort stuff, and return to the
>>>>>> 'preferred way' later:
>>>>>>
>>>>>> 	if (!bo->cpu_caching && bo->flags & XE_BO_FLAG_NEEDS_UC)
>>>>>> 		caching = ttm_uncached;
>>>>>
>>>>> Yeah, cpu_caching is a "uapi only" thing at the moment (and even
>>>>> then is only set in some situations).  Given the current design
>>>>> and assumptions of the code, maybe it would be clearer to add an
>>>>> assertion like this to help document why this is special?
>>>>>
>>>>>          if (bo->flags & XE_BO_FLAG_NEEDS_UC) {
>>>>>                  /*
>>>>>                   * Valid only for internally-created buffers,
>>>>>                   * for which cpu_caching is never initialized.
>>>>>                   */
>>>>>                  xe_assert(xe, bo->cpu_caching == 0);
>>>>>                  caching = ttm_uncached;
>>>>>          }
>>>>>
>>>>> If we decide we want a more general redesign of cpu_caching
>>>>> behavior, that would probably be a separate change from the direct
>>>>> functional fix here.
>>>>
>>>> I do think the change should have actually been done before the
>>>> scanout
>>>> caching hack. We shouldn't be building special cases like this, but
>>>> rather fix what's missing.
>>>
>>> I think things happened the other way around.  The scanout caching
>>> adjustment pre-dates the existence of bo->cpu_caching in the driver.
>>> When bo->cpu_caching got added in commit 622f709ca629 ("drm/xe/uapi:
>>> Add support for CPU caching mode"), it intentionally left kernel
>>> objects set to 0 by design.
>>
>> I don't really see a discussion around the kernel objects in the commit
>> message, it mentions "currently" in the caching mode doc, but that
>> sounds more like a documentation of current status than a guideline.
>> Ofc, it might be in the review discussion but I haven't looked too
>> closely TBH.
>>
>> Hmm. This commit raises some questions. Do you remember the reason
>> for leaving out the kernel bos? Was it because kernel bos rely on
>> implicit caching mode selection, whereas the user-space bo caching
>> mode selection is now explicit? Otherwise it's pretty standard in the
>> driver to map the DRM_XE user-space flags / enums to XE_ internal
>> ones.

Yeah, no real reason IIRC. I don't remember exactly, but I don't think 
there was a huge need for kmd internal objects to have explicit caching 
at the time, and it didn't seem worth adding just for that scanout case.

> 
> Unfortunately I don't remember the details here very well myself; I know
> this patch went through a lot of revisions and morphed a fair bit during
> the review cycles.  Adding a couple extra Cc's of people who were more
> actively involved in the review and may have a clearer memory of how we
> initially settled on the userspace-only design.
> 
> +cc Jose, Oak
> 
> 
> Matt
> 
>>
>> (As a complete side note it looks like the system pages for VRAM-only
>> user-bo eviction are now forced to be write-combined by the uAPI?)

Yeah, originally we had smem_cpu_caching, where it had to be left as 
zero for vram objects. vram caching was then implicit and always wc. I 
think internally the kmd would use wb if the object was evicted and 
vram-only; the idea was also to potentially allow the umd to control 
this with smem_cpu_caching.

There was some discussion here: 
https://lore.kernel.org/intel-xe/30454ab26b8df746cfcdc4c1a0548e7c1e676ac0.camel@intel.com/

Feedback was rather to have a single cpu_caching value which would 
always be respected when mmapping the object. If the cpu_caching is wc, 
then the mmap is always wc, and not sometimes wc and sometimes wb 
depending on whether it's evicted or not, which userspace is unaware 
of. The downside is that wc for evicted vram is going to be common and 
maybe not ideal. Maybe this can be revisited?

>>
>>>
>>> We can certainly change the design now if everyone agrees that it
>>> makes
>>> the code cleaner,
>>
>> This is probably a bigger change than I originally thought, and it
>> requires some additional consideration of the above.
>>
>>>   but I think that the general refactoring and
>>> repurposing of bo->cpu_caching is orthogonal to the functional fix
>>> that
>>> Michal is providing here.
>>
>> Generally I think that if something is missing that would let us add
>> a fix cleanly, then we should definitely try our best to add that
>> something _first_. In this case it turns out, though, that it
>> requires some additional thought.
>>
>> So for the patch
>>
>> Acked-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>
>>
>>
>>>
>>> +Cc Pallavi, Matt Auld as the authors of the original design in case
>>> they have any thoughts on extending the usage of bo->cpu_caching.
>>>
>>>
>>> Matt
>>>
>>>>
>>>> Can't we make bo->cpu_caching valid also for kernel bos with a new
>>>> enum and do the translation in the ioctl?
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>
>>>>>
>>>>> Matt
>>>>>
>>>>>>
>>>>>> Michal
>>>>>
>>>>
>>>
>>
> 

