[PATCH 1/2] drm/xe: Skip CCS clear for WB type BOs
Nirmoy Das
nirmoy.das at intel.com
Wed Aug 28 08:34:22 UTC 2024
On 8/28/2024 10:23 AM, Thomas Hellström wrote:
> Hi,
>
> On Tue, 2024-08-27 at 17:49 +0200, Nirmoy Das wrote:
>> HW treats any access to 1-way or 2-way coherent memory as compression
>> disabled memory. So for such BOs there is no need to do CCS clearing.
>>
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_bo.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index cbe7bf098970..24701272e3af 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -283,6 +283,7 @@ struct xe_ttm_tt {
>> struct device *dev;
>> struct sg_table sgt;
>> struct sg_table *sg;
>> + bool skip_ccs_clear:1;
>> };
>>
>> static int xe_tt_map_sg(struct ttm_tt *tt)
>> @@ -404,6 +405,8 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>> if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear_sys)
>> page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE;
>>
>> + /* compression is not allowed for cached BO so ccs clear can be skipped. */
>> + tt->skip_ccs_clear = caching == ttm_cached;
> In theory, BOs that are promoted to fb (not created with the SCANOUT
> flag) can AFAICT have caching remaining at ttm_cached, yet still sent
> to the display engine, reading uninitialized ccs.
>
> Also I think LNL will be the only HW having the "feature" that clean
> cache-lines are written back so in the future we might allow 0-coherent
> with ttm_cached.
I just read that "no compression for 1-way/2-way coherent memory" applies only
to LNL, it seems, so this optimization is mainly applicable to LNL.
>
> So IMO we need to improve the detection of "skip_ccs_clear" here.
How do I detect when a BO is promoted to FB?
Regards,
Nirmoy
> Otherwise, I'm all for the optimization.
>
> /Thomas
>
>
>> err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages);
>> if (err) {
>> kfree(tt);
>> @@ -664,13 +667,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>> struct ttm_resource *old_mem = ttm_bo->resource;
>> u32 old_mem_type = old_mem ? old_mem->mem_type : XE_PL_SYSTEM;
>> struct ttm_tt *ttm = ttm_bo->ttm;
>> + struct xe_ttm_tt *xe_tt = container_of(ttm_bo->ttm, struct xe_ttm_tt,
>> + ttm);
>> struct xe_migrate *migrate = NULL;
>> struct dma_fence *fence;
>> bool move_lacks_source;
>> bool tt_has_data;
>> bool needs_clear;
>> bool handle_system_ccs = (!IS_DGFX(xe) && xe_bo_needs_ccs_pages(bo) &&
>> - ttm && ttm_tt_is_populated(ttm)) ? true : false;
>> + ttm && ttm_tt_is_populated(ttm) &&
>> + !xe_tt->skip_ccs_clear) ? true : false;
>> bool clear_system_pages;
>> int ret = 0;
>>
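
For reference, the tightened "skip_ccs_clear" condition discussed above could be sketched as a stand-alone C model. This is only an illustration under stated assumptions, not driver code: every name here (model_caching, model_bo, may_be_scanned_out, hw_no_compress_on_coherent, model_skip_ccs_clear, model_handle_system_ccs) is hypothetical. It does not answer how to detect that a BO has been promoted to a framebuffer; it simply assumes that information is available as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TTM caching modes; not the kernel enum. */
enum model_caching {
	MODEL_UNCACHED,
	MODEL_WRITE_COMBINED,
	MODEL_CACHED,
};

/* Hypothetical, simplified view of the BO state discussed in the thread. */
struct model_bo {
	enum model_caching caching;
	/* Assumption: set when the BO can reach the display engine. */
	bool may_be_scanned_out;
	/* Assumption: platform treats coherent memory as uncompressed (LNL). */
	bool hw_no_compress_on_coherent;
};

/* Skip the CCS clear only when all three conditions from the review hold. */
static bool model_skip_ccs_clear(const struct model_bo *bo)
{
	assert(bo != NULL);

	if (bo->caching != MODEL_CACHED)
		return false;
	if (!bo->hw_no_compress_on_coherent)
		return false;
	return !bo->may_be_scanned_out;
}

/* Mirrors the shape of the handle_system_ccs expression in the patch. */
static bool model_handle_system_ccs(bool is_dgfx, bool needs_ccs_pages,
				    bool tt_populated,
				    const struct model_bo *bo)
{
	return !is_dgfx && needs_ccs_pages && tt_populated &&
	       !model_skip_ccs_clear(bo);
}
```

With this model, a cached BO that may be scanned out still gets its CCS handled, which is the case raised in the review.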