[Intel-gfx] [PATCH 00/15] HuC loading for DG2

Wed Jul 6 17:26:23 UTC 2022

On 7/5/2022 4:30 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>>> On From: Ye, Tony <tony.ye at intel.com>
>>> Sent: Thursday, June 16, 2022 12:15 AM
>>>
>>>
>>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>>> (just send a message to the GSC with the physical address
>>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>>> complete. This means that there is a small window of time
>>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC load
>>>>>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
>>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
>>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
>>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
>>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
>>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
>>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
>>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>>> that this is different from what happens during GT reset for
>>>>>>>>>>>>>> older platforms, because in that scenario we guarantee that
>>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>>> the best option is for i915 to stall media submission while
>>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
>>>>>>>>>>>>> part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an 
>>>>>>>>>>>>> option?
>>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>>
>>>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
>>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
>>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
>>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>>> Yeah maybe not. But entertaining the idea that it is technically
>>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>>> mark contexts which want HuC access. Then track if there are any
>>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>>> resume if there are. If that would end up significantly less code
>>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>>
>>>>>>>>> What would be the end result from users point of view in case
>>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>>> handle one window not providing new data. Probably depends on
>>> the
>>>>>>>>> compositor.
>>>>>>>>>
>>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>>> would be delayed by 300ms.
>>>>>>>>>
>>>>>>>>> With my ChromeOS hat the stalled media engines does sound like a
>>>>>>>>> better solution. But with the maintainer hat I'd like all options
>>>>>>>>> evaluated since there is attractiveness if a good enough solution
>>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>>
>>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
>>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>>> overhead is caused by the fact that we have 2 mei modules in play
>>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
>>>>>>>> load.
>>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>>> there the wait is on the i915 resume path?
>>>>>> I don't have "official" expected load times at hand, but looking at
>>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>>> load both GuC and HuC:
>>>>>>
>>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC
>>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware
>>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[ 8.158634] i915
>>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
>>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>>> [i915]] GuC communication enabled <6>[ 8.166111] i915
>>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>>
>>>>>> Note that we increase the GT frequency all the way to the max before
>>>>>> starting the FW load, which speeds things up.
>>>>>>
>>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>>> should be doable.
>>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>>> Tony, any feedback here?
>>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>>> means:
>>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>>
>>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>>> docs?
>>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>>> That would be great, thanks.
>>>>>>>
>>>>>>> And with so rich return codes already documented and exposed via
>>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>>> value) it should retry? I mean is there another negative error
>>>>>>> missing which would prevent zero transitioning to one?
>>>>>> I think if the auth fails we currently return 0, because the uc
>>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
>>>>>> fully enabled. I don't have anything against changing the FW state
>>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
>>>>>> but I'd prefer the media team to comment on their needs for this
>>>>>> IOCTL before committing to anything.
>>>>>
>>>>> Currently media doesn't differentiate "delayed loading is in
>>>>> progress" with "HuC is authenticated and running". If the HuC
>>>>> authentication eventually fails, the user needs to check the debugfs
>>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>>> (Side note - debugfs can be assumed to not exist so it is not
>>>> interesting to users.)
>>>>
>>>> There isn't currently a "delayed loading is in progress" state, that's
>>>> the discussion in this thread, if and how to add it.
>>>>
>>>> Getparam it currently documents these states:
>>>>
>>>>   -ENODEV if HuC is not present on this platform,
>>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>>   -ENOPKG if HuC firmware was not installed,
>>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>>   0 if HuC firmware is not running,
>>>>   1 if HuC firmware is authenticated and running.
>>>>
>>>> This patch proposed to change this to:
>>>>
>>>>   1 if HuC firmware is authenticated and running or if delayed load is
>>>> in progress,
>>>>   0 if HuC firmware is not running and delayed load is not in progress
>>>>
>>>> Alternative idea is for DG2 (well in general) to add some more fine
>>>> grained states, so that i915 does not have to use 1 for both running
>>>> and loading. This may be adding a new error code for auth fails as
>>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>>> platform is DG2 it needs to query it again since it will transition to
>>>> either 1 or error eventually. This would mean the non error states
>>>> would be:
>>>>
>>>>   0 not running (aka loading)
>>>>   1 running (and authenticated)
>>>>
>>>> @Daniele - one more thing - can you make sure in the series (if you
>>>> haven't already) that if HuC status was in any error before suspend
>>>> reload is not re-tried on resume? My thinking is that the error is
>>>> likely to persist and we don't want to impose long delay on every
>>>> resume afterwards. Makes sense to you?
>>>>
>>>> @Tony - one more question for the UMD. Or two.
>>>>
>>>> How prevalent is usage of HuC on DG2 depending on what codecs need it?
>>>> Do you know in advance, before creating a GEM context, that HuC
>>>> commands will be sent to the engine or this changes at runtime?
>>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is 
>>> in use.
>>> It's also used by content protection. And UMD doesn't know if it 
>>> will be used
>>> later at context creation time.
>>>
>> from UMD perspective, We don’t care much on the normal initialization 
>> process
>> because, I could not image that a system is boot up, and user select 
>> a crypted content
>> to playback, and huc is still not ready.
>> of course, We are  also ok to query the huc status twice, and wait if 
>> the status is "0 not running"
>> to avoid potential issue.
>>
>> I suppose the main possible issue will happen in the 
>> hibernation/awake process, it is transparent to UMD.
>> UMD will not call ioctrl  to query huc status in this process, and 
>> will continue to send command buffer to KMD.
>
> I think there is an agreement that it is ok to return 0 to mark the 
> load still in progress and 1 for load & auth complete. However, double 
> checking the code it turns out that we currently return 0 on load 
> failure, even if that's not particularly clear from the comment. I can 
> easily change that to be an error code, but not sure if that's 
> considered an API breakage considering it's not a well documented 
> behavior. I believe that on pre-DG2 userspace considers 1 as ok and 
> everything else as failure, so changing the ioctl to return an error 
> code on failure and 0 for load pending (with the latter being a 
> DG2-esclusive code for now) should be safe, but I'd like confirmation 
> that I'm not breaking API before sending the relevant code.

The UMD code is like this:

     struct drm_i915_getparam gp;
     int32_t value;
     gp.param = I915_PARAM_HUC_STATUS;
     gp.value = &value;
     ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
     if (ret != 0)
         hasHuC = 0
     else
         if (value == 0)
             hasHuC = 0;
         else
             hasHuC = 1;

Currently the behavior of i915 is:

     if there is an error, ioctl returns -1, and set errno as 
ENODEV/EOPNOTSUPP/ENOPKG/ENOEXEC;

     otherwise, set *(gp.value) as 0 if HuC is not running, or 1 if HuC 
is authenticated.

Hi Daniele, which value are you going to change - the "ret" or the "value"?

Thanks,

Tony

>
> Thanks,
> Daniele
>
>>
>>> Thanks,
>>>
>>> Tony
>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>