[PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Nov 1 22:22:00 UTC 2024


-----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com> 
Sent: Friday, November 1, 2024 3:15 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> 
> On 11/1/2024 15:00, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Harrison, John C <john.c.harrison at intel.com>
> > Sent: Friday, November 1, 2024 2:53 PM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> > Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
> > Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> >> On 11/1/2024 13:47, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Harrison, John C <john.c.harrison at intel.com>
> >>> Sent: Friday, November 1, 2024 1:14 PM
> >>> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> >>> Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
> >>> Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> >>>> On 11/1/2024 12:40, Cavitt, Jonathan wrote:
> >>>>> -----Original Message-----
> >>>>> From: Harrison, John C <john.c.harrison at intel.com>
> >>>>> Sent: Friday, November 1, 2024 11:46 AM
> >>>>> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> >>>>> Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
> >>>>> Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> >>>>>> On 11/1/2024 11:04, Jonathan Cavitt wrote:
> >>>>>>> When performing a guc_mmio_regset_write, we add all the registers in the
> >>>>>>> reg_sr list to the save/restore list, but do not do the same for the
> >>>>>>> whitelist registers.  Add them in.
> >>>>>>>
> >>>>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249
> >>>>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> >>>>>>> CC: Lucas de Marchi <lucas.demarchi at intel.com>
> >>>>>>> CC: Matt Roper <matthew.d.roper at intel.com>
> >>>>>>> CC: John Harrison <john.c.harrison at intel.com>
> >>>>>>> CC: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> >>>>>>> CC: Ashutosh Dixit <ashutosh.dixit at intel.com>
> >>>>>>> ---
> >>>>>>>      drivers/gpu/drm/xe/xe_guc_ads.c | 11 ++++++++++-
> >>>>>>>      1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>>>> index 943146e5b460..2fc6b1ccc8fc 100644
> >>>>>>> --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>>>> @@ -239,9 +239,12 @@ static size_t calculate_regset_size(struct xe_gt *gt)
> >>>>>>>      	enum xe_hw_engine_id id;
> >>>>>>>      	unsigned int count = 0;
> >>>>>>>      
> >>>>>>> -	for_each_hw_engine(hwe, gt, id)
> >>>>>>> +	for_each_hw_engine(hwe, gt, id) {
> >>>>>>>      		xa_for_each(&hwe->reg_sr.xa, sr_idx, sr_entry)
> >>>>>>>      			count++;
> >>>>>>> +		xa_for_each(&hwe->reg_whitelist.xa, sr_idx, sr_entry)
> >>>>>>> +			count++;
> >>>>>>> +	}
> >>>>>>>      
> >>>>>>>      	count += ADS_REGSET_EXTRA_MAX * XE_NUM_HW_ENGINES;
> >>>>>>>      
> >>>>>>> @@ -727,6 +730,12 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> >>>>>>>      	xa_for_each(&hwe->reg_sr.xa, idx, entry)
> >>>>>>>      		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
> >>>>>>>      
> >>>>>>> +	i = 0;
> >>>>>>> +	xa_for_each(&hwe->reg_whitelist.xa, idx, entry)
> >>>>>>> +		guc_mmio_regset_write_one(ads, regset_map,
> >>>>>>> +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i++),
> >>>>>>> +					  count++);
> >>>>>>> +
> >>>>>> The code that actually writes to the NONPRIV registers
> >>>>>> (xe_reg_sr_apply_whitelist() in xe_reg_sr.c) explicitly clears all the
> >>>>>> unused registers with a comment of "clear the rest in case of garbage".
> >>>>> The code in xe_reg_sr_apply_whitelist calls xe_mmio_write32 to write the
> >>>>> registers, whereas the code in guc_mmio_regset_write uses xe_map_memcpy_to
> >>>>> internally.  While the former seems to be writing to the
> >>>>> xe_mmio_adjusted_addr(mmio, reg.addr) + mmio->regs, the latter appears to be
> >>>>> writing to IOSYS_MAP_INIT_OFFSET(ads_to_map(ads), guc_ads_regset_offset(ads)).
> >>>>>
> >>>>> I'm not particularly well-versed in these functions, but it looks to me that these
> >>>>> two functions write to different locations and thus would not impact each other.
> >>>>> Or, in other words, I don't think the garbage we're clearing in xe_reg_sr_apply_whitelist
> >>>>> is the same as the data we're writing in guc_mmio_regset_write.
> >>>> No.
> >>>>
> >>>> The apply function is writing the list of whitelisted registers into the
> >>>> whitelist registers themselves. The GuC ADS code is adding lists of
> >>>> registers to the save/restore list for an engine reset.
> >>>>
> >>>> Specifically with regards to the NONPRIV registers, these are a set of
> >>>> registers which hold the addresses of other registers. When set, they
> >>>> allow untrusted users to access those 'other' registers which otherwise
> >>>> would be off limits. The whitelist code is setting up that list. E.g.
> >>>> adding the OA registers to the whitelist to allow applications to use
> >>>> the OA mechanisms. So it does "NONPRIV_REG(x) = OA_REG". It also does
> >>>> "NONPRIV(x+1 .. max) = NO_OP". That is to ensure all the NONPRIV
> >>>> registers are set to something valid and not uninitialised. Otherwise we
> >>>> potentially have unintended registers being whitelisted and users are
> >>>> able to access things they shouldn't. Whereas, setting them all to NO_OP
> >>>> means we are granting all users access to the NO_OP register which they
> >>>> already had access to anyway.
> >>>>
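For illustration, the whitelist programming described above can be modelled in
plain userspace C. This is only a sketch under stated assumptions:
MAX_NONPRIV_SLOTS (12, inferred from the NONPRIV(0..11) range mentioned in this
thread), NO_OP_REG_ADDR and apply_whitelist() are hypothetical names and values,
not the driver's actual code or the real hardware addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define MAX_NONPRIV_SLOTS 12u     /* assumed from NONPRIV(0..11) in the thread */
#define NO_OP_REG_ADDR    0x2094u /* stand-in for the harmless no-op target */

/*
 * Model of programming the NONPRIV slots: the first n_targets slots point
 * at the whitelisted registers, every remaining slot is pointed at the
 * no-op register so that no slot is left holding garbage.
 */
static void apply_whitelist(uint32_t nonpriv[MAX_NONPRIV_SLOTS],
			    const uint32_t *targets, size_t n_targets)
{
	size_t slot;

	assert(n_targets <= MAX_NONPRIV_SLOTS);

	for (slot = 0; slot < n_targets; slot++)
		nonpriv[slot] = targets[slot];

	/* "clear the rest in case of garbage" */
	for (; slot < MAX_NONPRIV_SLOTS; slot++)
		nonpriv[slot] = NO_OP_REG_ADDR;
}
```

With three whitelisted targets, slots 0..2 hold the target addresses and slots
3..11 all hold the no-op address, whatever they contained beforehand.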
> >>>> Completely separate to that, the GuC ADS code is creating a list of
> >>>> registers which GuC will save and restore across an engine reset. These
> >>>> are all the registers which get trashed by the reset but which are not
> >>>> saved and restored as part of a running context. The NONPRIV registers
> >>>> apparently fall into this category. So we need to tell GuC to preserve
> >>>> their content across a reset. Otherwise, after the reset, the whitelist
> >>>> will be lost. But, the reset state of those registers is 'undefined' as
> >>>> opposed to 'NO-OP' as suggested by the whitelist code. That means that
> >>>> any NONPRIV register which is not part of the reset save/restore list
> >>>> will no longer be set to NO-OP after a reset. Instead, it will be
> >>>> giving users access to some random register again. And we do not want to
> >>>> do that.
> >>>>
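The effect of a partial save/restore list can be sketched with a small
userspace simulation. Everything here is a model under assumptions: the
constants, RESET_GARBAGE and both helper functions are hypothetical, and the
"reset" simply overwrites every slot to stand in for the undefined reset state
described above. It is not how GuC is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define MAX_NONPRIV_SLOTS 12u
#define NO_OP_REG_ADDR    0x2094u
#define RESET_GARBAGE     0xdeadbeefu /* stand-in for "undefined" reset state */

/*
 * Model of an engine reset handled by firmware: the first n_listed slots
 * are on the save/restore list, so they are saved before the reset and
 * restored afterwards. Slots not on the list keep whatever the reset left.
 */
static void reset_with_save_restore(uint32_t nonpriv[MAX_NONPRIV_SLOTS],
				    size_t n_listed)
{
	uint32_t saved[MAX_NONPRIV_SLOTS];
	size_t slot;

	assert(n_listed <= MAX_NONPRIV_SLOTS);

	for (slot = 0; slot < n_listed; slot++)
		saved[slot] = nonpriv[slot];

	/* The reset trashes every slot. */
	for (slot = 0; slot < MAX_NONPRIV_SLOTS; slot++)
		nonpriv[slot] = RESET_GARBAGE;

	for (slot = 0; slot < n_listed; slot++)
		nonpriv[slot] = saved[slot];
}

/* Count unused slots (index >= n_used) that no longer hold the no-op target. */
static size_t count_stale_slots(const uint32_t nonpriv[MAX_NONPRIV_SLOTS],
				size_t n_used)
{
	size_t stale = 0;

	for (size_t slot = n_used; slot < MAX_NONPRIV_SLOTS; slot++)
		if (nonpriv[slot] != NO_OP_REG_ADDR)
			stale++;

	return stale;
}
```

In this model, listing only the three in-use slots leaves the nine unused
slots holding garbage after the reset, while listing all twelve leaves none.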
> >>> Okay.  It sounds to me that we aren't performing xe_reg_sr_apply_whitelist
> >>> during an engine reset, because that function should be setting the registers
> >>> to a defined reset state, rather than "undefined".  Should we be calling that
> >>> function during guc_mmio_regset_write?
> >>>
> >>> I tested it and it didn't work on my end, but I might be missing something.
> >>> -Jonathan Cavitt
> >> No. The KMD does not execute any code on an engine reset. The reset is
> >> handled by the GuC. The KMD merely gets notified that it has happened
> >> after the fact. That is the point of giving a save/restore register list
> >> to GuC. GuC will save the values of all the registers in the list before
> >> it does a reset and restore them again after the reset is complete.
> >> Therefore, any register whose value we want to be manually preserved
> >> across an engine reset must be added to the GuC's save/restore list.
> > AFAICT, that's what I'm doing in this patch, so could you please
> > clarify what it is I need to do differently from what is currently
> > present in the patch?
> > -Jonathan Cavitt
> Your patch is only adding the NONPRIV registers which have been used as 
> opposed to adding all of them. The foreach loop is iterating over the 
> list of whitelisted registers (i.e. the target registers that are 
> written into the NON_PRIV(x) registers) and adds a NONPRIV register to 
> the save/restore list for each entry found in the whitelist. So if there 
> were three entries in the whitelist, it would add NONPRIV(0..2). 
> Therefore NONPRIV(3..11) are not added to the save/restore list and will 
> be trashed on an engine reset.
> 
> The original patch was simply looping over 0..MAX_NONPRIV and would have 
> added all the NONPRIV registers to the save/restore list.
> 
> The current patch is an optimisation to only add the in-use registers. 
> My concern is that optimisation is not valid and the original version 
> was actually necessary.

I gave it some thought after sending my prior reply and had a feeling
that would be your response.  Okay, I'll send an update that iterates
over all of MAX_NONPRIV.
-Jonathan Cavitt
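
A rough userspace sketch of iterating over every NONPRIV slot, rather than
over the whitelist entries, might look like the following. It is not the
follow-up patch: nonpriv_reg_addr() is a hypothetical stand-in for
RING_FORCE_TO_NONPRIV(hwe->mmio_base, i), its offset and stride are assumed
for illustration, and regset_add_all_nonpriv() models the save/restore list
as a plain array of register addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value for illustration only. */
#define MAX_NONPRIV_SLOTS 12u

/*
 * Stand-in for RING_FORCE_TO_NONPRIV(base, slot). The 0x4d0 offset and
 * 4-byte stride are assumptions made for this sketch.
 */
static uint32_t nonpriv_reg_addr(uint32_t mmio_base, unsigned int slot)
{
	assert(slot < MAX_NONPRIV_SLOTS);
	return mmio_base + 0x4d0u + 4u * slot;
}

/*
 * Append every NONPRIV slot of one engine to the save/restore list,
 * whether or not the whitelist uses it, and return the new entry count.
 */
static size_t regset_add_all_nonpriv(uint32_t *regset, size_t capacity,
				     size_t count, uint32_t mmio_base)
{
	for (unsigned int slot = 0; slot < MAX_NONPRIV_SLOTS; slot++) {
		assert(count < capacity);
		regset[count++] = nonpriv_reg_addr(mmio_base, slot);
	}

	return count;
}
```

Under this approach the size calculation would presumably also need to
reserve MAX_NONPRIV_SLOTS entries per engine, instead of counting the
whitelist entries.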

> 
> John.
> 
> >
> >> John.
> >>
> >>>> John.
> >>>>
> >>>>
> >>>>> -Jonathan Cavitt
> >>>>>
> >>>>>> If we don't trust the reset state to be valid then we need to ensure all
> >>>>>> of them are saved/restored across a reset. Otherwise, that garbage can
> >>>>>> come back and cause problems.
> >>>>>>
> >>>>>> John.
> >>>>>>
> >>>>>>
> >>>>>>>      	for (e = extra_regs; e < extra_regs + ARRAY_SIZE(extra_regs); e++) {
> >>>>>>>      		if (e->skip)
> >>>>>>>      			continue;
> >>
> 
> 

