[PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Nov 1 22:00:45 UTC 2024


-----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com> 
Sent: Friday, November 1, 2024 2:53 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> 
> On 11/1/2024 13:47, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Harrison, John C <john.c.harrison at intel.com>
> > Sent: Friday, November 1, 2024 1:14 PM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> > Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
> > Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> >> On 11/1/2024 12:40, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Harrison, John C <john.c.harrison at intel.com>
> >>> Sent: Friday, November 1, 2024 11:46 AM
> >>> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> >>> Cc: Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dixit, Ashutosh <ashutosh.dixit at intel.com>
> >>> Subject: Re: [PATCH] drm/xe/xe_guc_ads: Add whitelist registers to write list
> >>>> On 11/1/2024 11:04, Jonathan Cavitt wrote:
> >>>>> When performing a guc_mmio_regset_write, we add all the registers in the
> >>>>> reg_sr list to the save/restore list, but do not do the same for the
> >>>>> whitelist registers.  Add them in.
> >>>>>
> >>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249
> >>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> >>>>> CC: Lucas de Marchi <lucas.demarchi at intel.com>
> >>>>> CC: Matt Roper <matthew.d.roper at intel.com>
> >>>>> CC: John Harrison <john.c.harrison at intel.com>
> >>>>> CC: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> >>>>> CC: Ashutosh Dixit <ashutosh.dixit at intel.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/xe/xe_guc_ads.c | 11 ++++++++++-
> >>>>>     1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>> index 943146e5b460..2fc6b1ccc8fc 100644
> >>>>> --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>> +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> >>>>> @@ -239,9 +239,12 @@ static size_t calculate_regset_size(struct xe_gt *gt)
> >>>>>     	enum xe_hw_engine_id id;
> >>>>>     	unsigned int count = 0;
> >>>>>     
> >>>>> -	for_each_hw_engine(hwe, gt, id)
> >>>>> +	for_each_hw_engine(hwe, gt, id) {
> >>>>>     		xa_for_each(&hwe->reg_sr.xa, sr_idx, sr_entry)
> >>>>>     			count++;
> >>>>> +		xa_for_each(&hwe->reg_whitelist.xa, sr_idx, sr_entry)
> >>>>> +			count++;
> >>>>> +	}
> >>>>>     
> >>>>>     	count += ADS_REGSET_EXTRA_MAX * XE_NUM_HW_ENGINES;
> >>>>>     
> >>>>> @@ -727,6 +730,12 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> >>>>>     	xa_for_each(&hwe->reg_sr.xa, idx, entry)
> >>>>>     		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
> >>>>>     
> >>>>> +	i = 0;
> >>>>> +	xa_for_each(&hwe->reg_whitelist.xa, idx, entry)
> >>>>> +		guc_mmio_regset_write_one(ads, regset_map,
> >>>>> +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i++),
> >>>>> +					  count++);
> >>>>> +
> >>>> The code that actually writes to the NONPRIV registers
> >>>> (xe_reg_sr_apply_whitelist() in xe_reg_sr.c) explicitly clears all the
> >>>> unused registers with a comment of "clear the rest in case of garbage".
> >>> The code in xe_reg_sr_apply_whitelist calls xe_mmio_write32 to write the
> >>> registers, whereas the code in guc_mmio_regset_write uses xe_map_memcpy_to
> >>> internally.  While the former seems to be writing to
> >>> xe_mmio_adjusted_addr(mmio, reg.addr) + mmio->regs, the latter appears to be
> >>> writing to IOSYS_MAP_INIT_OFFSET(ads_to_map(ads), guc_ads_regset_offset(ads)).
> >>>
> >>> I'm not particularly well-versed in these functions, but it looks to me that these
> >>> two functions write to different locations and thus would not impact each other.
> >>> Or, in other words, I don't think the garbage we're clearing in xe_reg_sr_apply_whitelist
> >>> is the same as the data we're writing in guc_mmio_regset_write.
> >> No.
> >>
> >> The apply function is writing the list of whitelisted registers into the
> >> whitelist registers themselves. The GuC ADS code is adding lists of
> >> registers to the save/restore list for an engine reset.
> >>
> >> Specifically with regards to the NONPRIV registers, these are a set of
> >> registers which hold the addresses of other registers. When set, they
> >> allow untrusted users to access those 'other' registers which otherwise
> >> would be off limits. The whitelist code is setting up that list. E.g.
> >> adding the OA registers to the whitelist to allow applications to use
> >> the OA mechanisms. So it does "NONPRIV_REG(x) = OA_REG". It also does
> >> "NONPRIV_REG(x+1 .. max) = NO_OP". That is to ensure all the NONPRIV
> >> registers are set to something valid and not uninitialised. Otherwise we
> >> potentially have unintended registers being whitelisted and users are
> >> able to access things they shouldn't. Whereas, setting them all to NO_OP
> >> means we are granting all users access to the NO_OP register which they
> >> already had access to anyway.
> >>
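
To make the whitelist behaviour described above concrete, here is a minimal
standalone sketch in plain C. It is not driver code: the slot count, the
no-op register address and the function name are all hypothetical and chosen
only for illustration. It models "program the used slots, then point every
remaining slot at a harmless no-op register".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only, not the hardware definitions. */
#define NUM_NONPRIV_SLOTS 12u
#define NOOP_REG_ADDR     0x2094u

/*
 * Model of programming the NONPRIV slots: the first 'count' slots receive
 * the addresses being whitelisted, and every remaining slot is pointed at a
 * no-op register so that no slot is left holding garbage.
 * Returns the number of whitelist entries actually programmed.
 */
static size_t apply_whitelist(uint32_t slots[NUM_NONPRIV_SLOTS],
			      const uint32_t *whitelist, size_t count)
{
	size_t i;

	/* Entries beyond the available slots cannot be programmed. */
	if (count > NUM_NONPRIV_SLOTS)
		count = NUM_NONPRIV_SLOTS;

	for (i = 0; i < count; i++)
		slots[i] = whitelist[i];

	/* Clear the rest in case of garbage. */
	for (; i < NUM_NONPRIV_SLOTS; i++)
		slots[i] = NOOP_REG_ADDR;

	return count;
}
```

With two whitelist entries, slots 0 and 1 hold those addresses and slots 2
through NUM_NONPRIV_SLOTS - 1 all hold NOOP_REG_ADDR, whatever they held
before the call.
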
> >> Completely separate to that, the GuC ADS code is creating a list of
> >> registers which GuC will save and restore across an engine reset. These
> >> are all the registers which get trashed by the reset but which are not
> >> saved and restored as part of a running context. The NONPRIV registers
> >> apparently fall into this category. So we need to tell GuC to preserve
> >> their content across a reset. Otherwise, after the reset, the whitelist
> >> will be lost. But, the reset state of those registers is 'undefined' as
> >> opposed to 'NO-OP' as suggested by the whitelist code. That means that
> >> any NONPRIV register which is not part of the reset save/restore list
> >> will no longer be set to NO-OP after a reset. Instead, it will be
> >> giving users access to some random register again. And we do not want to
> >> do that.
> >>
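
The save/restore behaviour described above can be sketched the same way. This
is a toy model in plain C under stated assumptions, not GuC or driver code:
the register file size, the garbage value standing in for the "undefined"
reset state, and the function name are all hypothetical. It shows that only
registers named in the list keep their value across the reset.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS      8u          /* hypothetical tiny register file */
#define RESET_GARBAGE 0xdeadbeefu /* stand-in for "undefined" reset state */

/*
 * Model of a firmware-handled engine reset: save every register named in
 * 'list', trash the whole register file, then restore only the saved ones.
 * Every index in 'list' must be below NUM_REGS.
 */
static void reset_with_save_restore(uint32_t regs[NUM_REGS],
				    const size_t *list, size_t list_len)
{
	uint32_t saved[NUM_REGS];
	size_t i;

	/* The scratch buffer holds at most NUM_REGS saved values. */
	if (list_len > NUM_REGS)
		list_len = NUM_REGS;

	for (i = 0; i < list_len; i++)
		saved[i] = regs[list[i]];

	/* The reset itself: nothing survives unless it was saved. */
	for (i = 0; i < NUM_REGS; i++)
		regs[i] = RESET_GARBAGE;

	for (i = 0; i < list_len; i++)
		regs[list[i]] = saved[i];
}
```

If the list names only registers 0 and 1, those two come back with their old
values and registers 2 through 7 are left holding RESET_GARBAGE. In this model
a slot that was set to the no-op register before the reset only stays that way
if it is also on the list.
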
> > Okay.  It sounds to me that we aren't performing xe_reg_sr_apply_whitelist
> > during an engine reset, because that function should be setting the registers
> > to a defined reset state, rather than "undefined".  Should we be calling that
> > function during guc_mmio_regset_write?
> >
> > I tested it and it didn't work on my end, but I might be missing something.
> > -Jonathan Cavitt
> No. The KMD does not execute any code on an engine reset. The reset is 
> handled by the GuC. The KMD merely gets notified that it has happened 
> after the fact. That is the point of giving a save/restore register list 
> to GuC. GuC will save the values of all the registers in the list before 
> it does a reset and restore them again after the reset is complete. 
> Therefore, any register whose value we want to be manually preserved 
> across an engine reset must be added to the GuC's save/restore list.

AFAICT, that's what I'm doing in this patch, so could you please
clarify what it is I need to do differently from what is currently
present in the patch?
-Jonathan Cavitt

> 
> John.
> 
> >
> >> John.
> >>
> >>
> >>> -Jonathan Cavitt
> >>>
> >>>> If we don't trust the reset state to be valid then we need to ensure all
> >>>> of them are saved/restored across a reset. Otherwise, that garbage can
> >>>> come back and cause problems.
> >>>>
> >>>> John.
> >>>>
> >>>>
> >>>>>     	for (e = extra_regs; e < extra_regs + ARRAY_SIZE(extra_regs); e++) {
> >>>>>     		if (e->skip)
> >>>>>     			continue;
> >>
> 
> 

