[Intel-gfx] [PATCH 00/15] HuC loading for DG2

Thu Jun 16 02:28:26 UTC 2022

> From: Ye, Tony <tony.ye at intel.com>
> Sent: Thursday, June 16, 2022 12:15 AM
> 
> 
> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
> >
> > On 15/06/2022 00:15, Ye, Tony wrote:
> >> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
> >>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
> >>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
> >>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
> >>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
> >>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
> >>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
> >>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
> >>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
> >>>>>>>>>>>>> command. The load operation itself is relatively simple
> >>>>>>>>>>>>> (just send a message to the GSC with the physical address
> >>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
> >>>>>>>>>>>>> require special attention. In particular, to send a PXP
> >>>>>>>>>>>>> command we need to first export the GSC driver and then
> >>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
> >>>>>>>>>>>>> means that HuC load will complete after i915 load is
> >>>>>>>>>>>>> complete. This means that there is a small window of time
> >>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
> >>>>>>>>>>>>> which userspace could submit and/or check the HuC load
> >>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
> >>>>>>>>>>>>> usually loaded before kernel init/resume completes).
> >>>>>>>>>>>>> We've consulted with the media team in regards to how to
> >>>>>>>>>>>>> handle this and they've asked us to do the following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
> >>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
> >>>>>>>>>>>>> way to check if HuC is enabled and then includes a
> >>>>>>>>>>>>> secondary check in the batches to get the actual status,
> >>>>>>>>>>>>> so doing it this way allows userspace to keep working
> >>>>>>>>>>>>> without changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
> >>>>>>>>>>>>> Stalls are
> >>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
> >>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
> >>>>>>>>>>>>> completed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Motivation to add these complications into i915 are not
> >>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
> >>>>>>>>>>>> the premise of the series, right? So no backwards
> >>>>>>>>>>>> compatibility concerns. In this case why jump through the
> >>>>>>>>>>>> hoops and not let userspace handle all of this by just
> >>>>>>>>>>>> leaving the getparam return the true status?
> >>>>>>>>>>>
> >>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
> >>>>>>>>>>> that HuC load is complete when i915 starts accepting
> >>>>>>>>>>> submissions are boot and suspend/resume, with the latter
> >>>>>>>>>>> being the main problem; GT reset is not a concern because
> >>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
> >>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
> >>>>>>>>>>> from loaded to not without userspace knowledge, especially
> >>>>>>>>>>> if we start going into deeper suspend states and start
> >>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
> >>>>>>>>>>> that this is different from what happens during GT reset for
> >>>>>>>>>>> older platforms, because in that scenario we guarantee that
> >>>>>>>>>>> HuC reload is complete before we restart the submission
> >>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
> >>>>>>>>>>> change. We had an internal discussion about this problem
> >>>>>>>>>>> with both media and i915 archs and the conclusion was that
> >>>>>>>>>>> the best option is for i915 to stall media submission while
> >>>>>>>>>>> HuC (re-)load is in progress.
> >>>>>>>>>>
> >>>>>>>>>> Resume is potentially a good reason - I did not pick up on
> >>>>>>>>>> that from the cover letter. I read the statement about the
> >>>>>>>>>> unlikely and small window where HuC is not loaded during
> >>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
> >>>>>>>>>> part.
> >>>>>>>>>>
> >>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
> >>>>>>>>>
> >>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
> >>>>>>>>> can't start until i915 resume completes.
> >>>>>>>>
> >>>>>>>> I'll dig into this in the next few days since I want to
> >>>>>>>> understand how exactly it works. Or someone can help explain.
> >>>>>>>>
> >>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
> >>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
> >>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
> >>>>>>>> explained, user experience post resume wouldn't be good.
> >>>>>>>
> >>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
> >>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
> >>>>>>> we want to block within the i915 resume path for that long.
> >>>>>>
> >>>>>> Yeah maybe not. But entertaining the idea that it is technically
> >>>>>> possible to block - we could perhaps add uapi for userspace to
> >>>>>> mark contexts which want HuC access. Then track if there are any
> >>>>>> such contexts with outstanding submissions and only wait in
> >>>>>> resume if there are. If that would end up significantly less code
> >>>>>> on the i915 side to maintain is an open.
> >>>>>>
> >>>>>> What would be the end result from users point of view in case
> >>>>>> where it suspended during video playback? The proposed solution
> >>>>>> from this series sees the video stuck after resume. Maybe
> >>>>>> compositor blocks as well since I am not sure how well they
> >>>>>> handle one window not providing new data. Probably depends on the
> >>>>>> compositor.
> >>>>>>
> >>>>>> And then with a simpler solution definitely the whole resume
> >>>>>> would be delayed by 300ms.
> >>>>>>
> >>>>>> With my ChromeOS hat the stalled media engines does sound like a
> >>>>>> better solution. But with the maintainer hat I'd like all options
> >>>>>> evaluated since there is attractiveness if a good enough solution
> >>>>>> can be achieved with significantly less kernel code.
> >>>>>>
> >>>>>> You say 300ms is typical time for HuC load. How long it is on
> >>>>>> other platforms? If much faster then why is it so slow here?
> >>>>>
> >>>>> The GSC itself has to come out of suspend before it can perform
> >>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
> >>>>> also slower in processing the HuC load and auth compared to the
> >>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
> >>>>> GSC FW needs from start of the resume flow to HuC load complete,
> >>>>> so I bumped that to ~300ms to account for all other SW
> >>>>> interactions, plus a bit of buffer. Note that a bit of the SW
> >>>>> overhead is caused by the fact that we have 2 mei modules in play
> >>>>> here: mei-gsc, which manages the GSC device itself (including
> >>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
> >>>>> load.
> >>>>
> >>>> And how long on other platforms (not DG2) do you know? Presumably
> >>>> there the wait is on the i915 resume path?
> >>>
> >>> I don't have "official" expected load times at hand, but looking at
> >>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
> >>> load both GuC and HuC:
> >>>
> >>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC loads huc=no
> >>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware i915/dg1_guc_70.1.1.bin version 70.1
> >>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
> >>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication [i915]] GuC communication enabled
> >>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
> >>>
> >>> Note that we increase the GT frequency all the way to the max before
> >>> starting the FW load, which speeds things up.
> >>>
> >>>>
> >>>>>>>> However, do we really need to lie in the getparam? How about
> >>>>>>>> extend or add a new one to separate the loading vs loaded
> >>>>>>>> states? Since userspace does not support DG2 HuC yet this
> >>>>>>>> should be doable.
> >>>>>>>
> >>>>>>> I don't really have a preference here. The media team asked us
> >>>>>>> to do it this way because they wouldn't have a use for the
> >>>>>>> different "in progress" and "done" states. If they're ok with
> >>>>>>> having separate flags that's fine by me.
> >>>>>>> Tony, any feedback here?
> >>>>>>
> >>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
> >>>>>>
> >>>>>> #define I915_PARAM_HUC_STATUS         42
> >>>>>>
> >>>>>> Seems to be a boolean. Status false vs true? Could you add some
> >>>>>> docs?
> >>>>>
> >>>>> There is documentation above intel_huc_check_status(), which is
> >>>>> also updated in this series. I can move that to i915_drm.h.
> >>>>
> >>>> That would be great, thanks.
> >>>>
> >>>> And with so rich return codes already documented and exposed via
> >>>> uapi - would we really need to add anything new for DG2 apart for
> >>>> userspace to know that if zero is returned (not a negative error
> >>>> value) it should retry? I mean is there another negative error
> >>>> missing which would prevent zero transitioning to one?
> >>>
> >>> I think if the auth fails we currently return 0, because the uc
> >>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
> >>> fully enabled. I don't have anything against changing the FW state
> >>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
> >>> but I'd prefer the media team to comment on their needs for this
> >>> IOCTL before committing to anything.
> >>
> >>
> >> Currently media doesn't differentiate "delayed loading is in
> >> progress" with "HuC is authenticated and running". If the HuC
> >> authentication eventually fails, the user needs to check the debugfs
> >> to know the reason. IMHO, it's not a big problem as this is what we
> >> do even when the IOCTL returns non-zero values. + Carl to comment.
> >
> > (Side note - debugfs can be assumed to not exist so it is not
> > interesting to users.)
> >
> > There isn't currently a "delayed loading is in progress" state, that's
> > the discussion in this thread, if and how to add it.
> >
> > Getparam currently documents these states:
> >
> >  -ENODEV if HuC is not present on this platform,
> >  -EOPNOTSUPP if HuC firmware is disabled,
> >  -ENOPKG if HuC firmware was not installed,
> >  -ENOEXEC if HuC firmware is invalid or mismatched,
> >  0 if HuC firmware is not running,
> >  1 if HuC firmware is authenticated and running.
> >
> > This patch proposed to change this to:
> >
> >  1 if HuC firmware is authenticated and running or if delayed load is
> > in progress,
> >  0 if HuC firmware is not running and delayed load is not in progress
> >
> > Alternative idea is for DG2 (well in general) to add some more fine
> > grained states, so that i915 does not have to use 1 for both running
> > and loading. This may be adding a new error code for auth fails as
> > Daniele mentioned. Then UMD can know that if 0 is returned and
> > platform is DG2 it needs to query it again since it will transition to
> > either 1 or error eventually. This would mean the non error states
> > would be:
> >
> >  0 not running (aka loading)
> >  1 running (and authenticated)
> >
> > @Daniele - one more thing - can you make sure in the series (if you
> > haven't already) that if HuC status was in any error before suspend
> > reload is not re-tried on resume? My thinking is that the error is
> > likely to persist and we don't want to impose long delay on every
> > resume afterwards. Makes sense to you?
> >
> > @Tony - one more question for the UMD. Or two.
> >
> > How prevalent is usage of HuC on DG2 depending on what codecs need it?
> > Do you know in advance, before creating a GEM context, that HuC
> > commands will be sent to the engine or this changes at runtime?
> 
> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in use.
> It's also used by content protection. And UMD doesn't know if it will be used
> later at context creation time.
> 
From the UMD perspective, we don't care much about the normal initialization process,
because I can't imagine a case where the system boots up, the user selects encrypted content
to play back, and HuC is still not ready.
Of course, we are also OK with querying the HuC status twice, and waiting if the status is
"0 not running", to avoid potential issues.
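
To make the "query twice and wait" idea concrete, here is a rough sketch of
what such a check could look like on the UMD side. It assumes the alternative
semantics discussed above (0 means "not running yet" and may later become 1 or
an error), which this thread has not settled. The names huc_query_fn,
huc_classify, huc_wait_ready and fake_query are hypothetical; a real UMD would
wrap the I915_PARAM_HUC_STATUS getparam call in the query callback and sleep
between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: returns what I915_PARAM_HUC_STATUS would report
 * (1, 0, or a negative errno). A real UMD would wrap the getparam ioctl. */
typedef int (*huc_query_fn)(void *ctx);

enum huc_state { HUC_RUNNING, HUC_NOT_READY, HUC_UNAVAILABLE };

/* Map a getparam-style value onto the states discussed in this thread. */
static enum huc_state huc_classify(int status)
{
    if (status < 0)
        return HUC_UNAVAILABLE;   /* -ENODEV, -EOPNOTSUPP, -ENOPKG, -ENOEXEC */
    return status == 1 ? HUC_RUNNING : HUC_NOT_READY;
}

/* Query up to max_tries times; stop early on "running" or on an error. */
static enum huc_state huc_wait_ready(huc_query_fn query, void *ctx, int max_tries)
{
    enum huc_state state = HUC_NOT_READY;

    assert(query != NULL);
    for (int i = 0; i < max_tries && state == HUC_NOT_READY; i++) {
        state = huc_classify(query(ctx));
        /* A real caller would sleep between attempts; omitted here. */
    }
    return state;
}

/* Fake query for illustration only: reports 0 until *ctx counts down to 0. */
static int fake_query(void *ctx)
{
    int *countdown = ctx;

    if (*countdown > 0) {
        (*countdown)--;
        return 0;
    }
    return 1;
}
```

With a countdown of 1, two tries are enough for huc_wait_ready() to see the
status go from 0 to 1. This only covers the explicit query path; it does not
help with the suspend/resume case described next, where the UMD never queries.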

I suppose the main possible issue will happen in the hibernation/wake-up process, which is transparent to the UMD.
The UMD will not call the ioctl to query the HuC status during this process, and will continue to send command buffers to the KMD.

> Thanks,
> 
> Tony
> 
> >
> > Regards,
> >
> > Tvrtko