[PATCH] drm/xe/bo: optimise CCS case for WB pages
Matthew Auld
matthew.auld at intel.com
Mon May 19 09:58:42 UTC 2025
On 16/05/2025 20:12, Matt Roper wrote:
> On Fri, May 16, 2025 at 04:38:11PM +0100, Matthew Auld wrote:
>> Dealing with CCS state is a significant cost on LNL+: for user buffers
>> we end up clearing the compression state with the blitter on every page
>> alloc, we also save and restore it when moving between domains, and we
>> need to alloc extra pages to hold the raw CCS state for the save step.
>>
>> However, all compression PAT modes on platforms like LNL also require
>> coh_none, meaning that only WC memory can use compression in the first
>
> On PTL/Xe3 there's a new PAT entry 16 that has CCS compression + 1-way
> coherency (according to bspec page 71582). It looks like we don't have
> that in the driver yet today, but we probably need to add it since
> userspace is expected to be able to use it.
Right, for that we are also missing userptr handling, if we can't just
reject it (IIRC it will also currently throw a build error). We also
need to figure out what to do with imported dma-buf from an external
device; I assume we just reject at bind time? I assume using compression
with an external dma-buf is not going to work.
Jose had the idea of maybe adding a bo_create ioctl flag to opt out of
using compression, which we could use as a hint for when to apply this
optimisation more generally. That would also benefit the SRIOV VF case,
where we could skip dealing with potential CCS state on PTL.
>
>
> Matt
>
>> place. We can therefore be sneaky and completely ignore CCS for WB
>> buffers, which is likely the common case anyway. This skips all
>> blitter moves/clears between sys <-> tt, and also means we can drop
>> the extra CCS pages.
>>
>> This should be safe since there is no way to interact with the
>> (potentially uncleared) compression state without using a
>> compression-enabled PAT index, which is rejected at bind for WB. That
>> includes a malicious attempt to copy the raw CCS state from userspace,
>> which should give back all zeroes if the src surface (indirect) lacks
>> a compressed PAT index.
>>
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_bo.c | 8 ++++++++
>> drivers/gpu/drm/xe/xe_pat.c | 3 ++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index d99d91fe8aa9..3fafdcb8d95b 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -2982,6 +2982,14 @@ bool xe_bo_needs_ccs_pages(struct xe_bo *bo)
>> if (IS_DGFX(xe) && (bo->flags & XE_BO_FLAG_SYSTEM))
>> return false;
>>
>> + /*
>> + * Compression implies coh_none, therefore we know for sure that WB
>> + * memory can't currently use compression, which is likely one of the
>> + * common cases.
>> + */
>> + if (bo->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB)
>> + return false;
>> +
>> return true;
>> }
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
>> index 30fdbdb9341e..38a6a49c1b2a 100644
>> --- a/drivers/gpu/drm/xe/xe_pat.c
>> +++ b/drivers/gpu/drm/xe/xe_pat.c
>> @@ -103,7 +103,8 @@ static const struct xe_pat_table_entry xelpg_pat_table[] = {
>> *
>> * Note: There is an implicit assumption in the driver that compression and
>> * coh_1way+ are mutually exclusive. If this is ever not true then userptr
>> - * and imported dma-buf from external device will have uncleared ccs state.
>> + * and imported dma-buf from external device will have uncleared ccs state. See
>> + * also xe_bo_needs_ccs_pages().
>> */
>> #define XE2_PAT(no_promote, comp_en, l3clos, l3_policy, l4_policy, __coh_mode) \
>> { \
>> --
>> 2.49.0
>>
>
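A minimal, self-contained C model of the reasoning in this patch, assuming simplified hypothetical types: `pat_entry`, `bo_model`, the `coh_mode` enum, `bind_allowed()` and `compression_implies_coh_none()` are illustrative stand-ins, not the real xe_pat.c/xe_bo.c definitions, and the earlier checks in xe_bo_needs_ccs_pages() that are not shown in the hunk are omitted. It shows why skipping CCS pages for WB BOs is safe only while every compression-enabled PAT entry is coh_none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's types. */
enum coh_mode { COH_NONE, COH_1WAY, COH_2WAY };
enum cpu_caching { CPU_CACHING_WB, CPU_CACHING_WC };

struct pat_entry {
	bool comp_en;           /* compression enabled for this index */
	enum coh_mode coh;      /* coherency mode for this index */
};

struct bo_model {
	bool is_dgfx;           /* discrete GPU */
	bool system_only;       /* stands in for XE_BO_FLAG_SYSTEM */
	enum cpu_caching caching;
};

/*
 * Simplified bind-time rule: WB CPU caching needs at least 1-way
 * coherency, so a coh_none PAT index is rejected for WB BOs.
 */
static bool bind_allowed(const struct bo_model *bo, const struct pat_entry *pat)
{
	return !(bo->caching == CPU_CACHING_WB && pat->coh == COH_NONE);
}

/* The invariant the patch relies on: compression implies coh_none. */
static bool compression_implies_coh_none(const struct pat_entry *table, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].comp_en && table[i].coh != COH_NONE)
			return false;
	return true;
}

/* Model of the visible part of xe_bo_needs_ccs_pages() after the patch. */
static bool needs_ccs_pages(const struct bo_model *bo)
{
	assert(bo != NULL);

	if (bo->is_dgfx && bo->system_only)
		return false;

	/* Only safe while compression_implies_coh_none() holds. */
	if (bo->caching == CPU_CACHING_WB)
		return false;

	return true;
}
```

Under this model a WB BO can never bind a compression-enabled index, so its CCS state is unobservable and needs no pages. An entry combining compression with 1-way coherency, like the Xe3 entry mentioned above, breaks compression_implies_coh_none() and lets a WB BO bind with compression, which is why such an entry would need handling before relying on this shortcut there.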
More information about the Intel-xe mailing list