[PATCH 1/2] drm/xe: Skip CCS clear for WB type BOs

Wed Aug 28 08:34:22 UTC 2024

On 8/28/2024 10:23 AM, Thomas Hellström wrote:
> Hi,
>
> On Tue, 2024-08-27 at 17:49 +0200, Nirmoy Das wrote:
>> HW treats any access to 1-way or 2-way coherent memory as compression
>> disabled memory. So for such BOs there is no need to do CCS clearing.
>>
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index cbe7bf098970..24701272e3af 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -283,6 +283,7 @@ struct xe_ttm_tt {
>>   	struct device *dev;
>>   	struct sg_table sgt;
>>   	struct sg_table *sg;
>> +	bool skip_ccs_clear:1;
>>   };
>>   
>>   static int xe_tt_map_sg(struct ttm_tt *tt)
>> @@ -404,6 +405,8 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>   	if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear_sys)
>>   		page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE;
>>   
>> +	/* compression is not allowed for cached BO so ccs clear can be skipped. */
>> +	tt->skip_ccs_clear = caching == ttm_cached;
> In theory, BOs that are promoted to fb (not created with the SCANOUT
> flag) can AFAICT have caching remaining at ttm_cached, yet still be
> sent to the display engine, which would then read uninitialized ccs.
>
> Also I think LNL will be the only HW having the "feature" that clean
> cache-lines are written back so in the future we might allow 0-coherent
> with ttm_cached.

I just read that the "no compression for 1-way or 2-way coherent memory"
behavior applies only to LNL, so it seems this is mainly applicable to LNL.

>
> So IMO we need to improve the detection of "skip_ccs_clear" here.

How do I detect when a BO is promoted to FB?
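
To make the condition under discussion concrete, here is a standalone
sketch of a tighter skip check. It is not driver code: every type, field
and function name below is hypothetical, and how the driver would learn
that a BO may reach the display engine (including a BO promoted to fb
later) is exactly the open question above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real TTM/xe types. */
enum model_caching {
	MODEL_UNCACHED,
	MODEL_WRITE_COMBINED,
	MODEL_CACHED,
};

struct model_bo {
	enum model_caching caching;
	/* Assumption: HW treats coherent access as compression disabled (LNL-like). */
	bool hw_disables_compression_for_coherent;
	/* Assumption: set if the BO can be sent to the display engine. */
	bool may_be_scanned_out;
};

/* Return true only when skipping the CCS clear looks safe in this model. */
static bool model_skip_ccs_clear(const struct model_bo *bo)
{
	/* Other HW might allow compression with cached BOs, so do not skip. */
	if (!bo->hw_disables_compression_for_coherent)
		return false;

	/* The display engine could read uninitialized CCS, so do not skip. */
	if (bo->may_be_scanned_out)
		return false;

	return bo->caching == MODEL_CACHED;
}
```

In this model a cached BO on LNL-like HW skips the clear, while the same
BO marked as possibly scanned out does not.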

Regards,

Nirmoy

> Otherwise, I'm all for the optimization.
>
> /Thomas
>
>
>>   	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages);
>>   	if (err) {
>>   		kfree(tt);
>> @@ -664,13 +667,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>   	struct ttm_resource *old_mem = ttm_bo->resource;
>>   	u32 old_mem_type = old_mem ? old_mem->mem_type : XE_PL_SYSTEM;
>>   	struct ttm_tt *ttm = ttm_bo->ttm;
>> +	struct xe_ttm_tt *xe_tt = container_of(ttm_bo->ttm, struct xe_ttm_tt,
>> +					       ttm);
>>   	struct xe_migrate *migrate = NULL;
>>   	struct dma_fence *fence;
>>   	bool move_lacks_source;
>>   	bool tt_has_data;
>>   	bool needs_clear;
>>   	bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>> -				  ttm && ttm_tt_is_populated(ttm)) ? true : false;
>> +				  ttm && ttm_tt_is_populated(ttm) &&
>> +				  !xe_tt->skip_ccs_clear) ? true : false;
>>   	bool clear_system_pages;
>>   	int ret = 0;
>>