[Intel-gfx] [PATCH 00/15] HuC loading for DG2

Ye, Tony tony.ye at intel.com
Wed Jul 6 20:11:42 UTC 2022


On 7/6/2022 12:29 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 7/6/2022 10:26 AM, Ye, Tony wrote:
>>
>> On 7/5/2022 4:30 PM, Ceraolo Spurio, Daniele wrote:
>>>
>>>
>>> On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>>>>> On From: Ye, Tony <tony.ye at intel.com>
>>>>> Sent: Thursday, June 16, 2022 12:15 AM
>>>>>
>>>>>
>>>>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>>>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>>>>> (just send a message to the GSC with the physical 
>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>>>>> complete. This means that there is a small window of 
>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC 
>>>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>>>> status, although this is quite unlikely to happen 
>>>>>>>>>>>>>>>>>> (HuC is
>>>>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if 
>>>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL 
>>>>>>>>>>>>>>>>>> as a
>>>>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is 
>>>>>>>>>>>>>>>>>> loaded.
>>>>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>>>>> The main areas impacted by the fact that we can't 
>>>>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be 
>>>>>>>>>>>>>>>> transparent to
>>>>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily 
>>>>>>>>>>>>>>>> flip
>>>>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>>>>> that this is different from what happens during GT 
>>>>>>>>>>>>>>>> reset for
>>>>>>>>>>>>>>>> older platforms, because in that scenario we guarantee 
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>>>>> the best option is for i915 to stall media submission 
>>>>>>>>>>>>>>>> while
>>>>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the 
>>>>>>>>>>>>>>> resume
>>>>>>>>>>>>>>> part.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an 
>>>>>>>>>>>>>>> option?
>>>>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If in the end conclusion will be that i915 resume indeed 
>>>>>>>>>>>>> cannot
>>>>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up 
>>>>>>>>>>>>> contexts
>>>>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't 
>>>>>>>>>>>> think
>>>>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>>>>> Yeah maybe not. But entertaining the idea that it is 
>>>>>>>>>>> technically
>>>>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>>>>> mark contexts which want HuC access. Then track if there are 
>>>>>>>>>>> any
>>>>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>>>>> resume if there are. If that would end up significantly less 
>>>>>>>>>>> code
>>>>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>>>>
>>>>>>>>>>> What would be the end result from users point of view in case
>>>>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>>>>> handle one window not providing new data. Probably depends on
>>>>> the
>>>>>>>>>>> compositor.
>>>>>>>>>>>
>>>>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>>>>> would be delayed by 300ms.
>>>>>>>>>>>
>>>>>>>>>>> With my ChromeOS hat the stalled media engines does sound 
>>>>>>>>>>> like a
>>>>>>>>>>> better solution. But with the maintainer hat I'd like all 
>>>>>>>>>>> options
>>>>>>>>>>> evaluated since there is attractiveness if a good enough 
>>>>>>>>>>> solution
>>>>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>>>>
>>>>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the 
>>>>>>>>>> GSC is
>>>>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>>>>> overhead is caused by the fact that we have 2 mei modules in 
>>>>>>>>>> play
>>>>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including 
>>>>>>>>>> HuC
>>>>>>>>>> load.
>>>>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>>>>> there the wait is on the i915 resume path?
>>>>>>>> I don't have "official" expected load times at hand, but 
>>>>>>>> looking at
>>>>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>>>>> load both GuC and HuC:
>>>>>>>>
>>>>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init 
>>>>>>>> [i915]] GSC
>>>>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC 
>>>>>>>> firmware
>>>>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[ 8.158634] i915
>>>>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 
>>>>>>>> 7.9
>>>>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>>>>> [i915]] GuC communication enabled <6>[ 8.166111] i915
>>>>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>>>>
>>>>>>>> Note that we increase the GT frequency all the way to the max 
>>>>>>>> before
>>>>>>>> starting the FW load, which speeds things up.
>>>>>>>>
>>>>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>>>>> should be doable.
>>>>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>>>>> Tony, any feedback here?
>>>>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>>>>> means:
>>>>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>>>>
>>>>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>>>>> docs?
>>>>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>>>>> That would be great, thanks.
>>>>>>>>>
>>>>>>>>> And with so rich return codes already documented and exposed via
>>>>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>>>>> value) it should retry? I mean is there another negative error
>>>>>>>>> missing which would prevent zero transitioning to one?
>>>>>>>> I think if the auth fails we currently return 0, because the uc
>>>>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete 
>>>>>>>> but not
>>>>>>>> fully enabled. I don't have anything against changing the FW state
>>>>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done 
>>>>>>>> yet",
>>>>>>>> but I'd prefer the media team to comment on their needs for this
>>>>>>>> IOCTL before committing to anything.
>>>>>>>
>>>>>>> Currently media doesn't differentiate "delayed loading is in
>>>>>>> progress" with "HuC is authenticated and running". If the HuC
>>>>>>> authentication eventually fails, the user needs to check the 
>>>>>>> debugfs
>>>>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>>>>> (Side note - debugfs can be assumed to not exist so it is not
>>>>>> interesting to users.)
>>>>>>
>>>>>> There isn't currently a "delayed loading is in progress" state, 
>>>>>> that's
>>>>>> the discussion in this thread, if and how to add it.
>>>>>>
>>>>>> Getparam it currently documents these states:
>>>>>>
>>>>>>   -ENODEV if HuC is not present on this platform,
>>>>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>>>>   -ENOPKG if HuC firmware was not installed,
>>>>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>>>>   0 if HuC firmware is not running,
>>>>>>   1 if HuC firmware is authenticated and running.
>>>>>>
>>>>>> This patch proposed to change this to:
>>>>>>
>>>>>>   1 if HuC firmware is authenticated and running or if delayed 
>>>>>> load is
>>>>>> in progress,
>>>>>>   0 if HuC firmware is not running and delayed load is not in 
>>>>>> progress
>>>>>>
>>>>>> Alternative idea is for DG2 (well in general) to add some more fine
>>>>>> grained states, so that i915 does not have to use 1 for both running
>>>>>> and loading. This may be adding a new error code for auth fails as
>>>>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>>>>> platform is DG2 it needs to query it again since it will 
>>>>>> transition to
>>>>>> either 1 or error eventually. This would mean the non error states
>>>>>> would be:
>>>>>>
>>>>>>   0 not running (aka loading)
>>>>>>   1 running (and authenticated)
>>>>>>
>>>>>> @Daniele - one more thing - can you make sure in the series (if you
>>>>>> haven't already) that if HuC status was in any error before suspend
>>>>>> reload is not re-tried on resume? My thinking is that the error is
>>>>>> likely to persist and we don't want to impose long delay on every
>>>>>> resume afterwards. Makes sense to you?
>>>>>>
>>>>>> @Tony - one more question for the UMD. Or two.
>>>>>>
>>>>>> How prevalent is usage of HuC on DG2 depending on what codecs 
>>>>>> need it?
>>>>>> Do you know in advance, before creating a GEM context, that HuC
>>>>>> commands will be sent to the engine or this changes at runtime?
>>>>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) 
>>>>> is in use.
>>>>> It's also used by content protection. And UMD doesn't know if it 
>>>>> will be used
>>>>> later at context creation time.
>>>>>
>>>> from UMD perspective, We don’t care much on the normal 
>>>> initialization process
>>>> because, I could not image that a system is boot up, and user 
>>>> select a crypted content
>>>> to playback, and huc is still not ready.
>>>> of course, We are  also ok to query the huc status twice, and wait 
>>>> if the status is "0 not running"
>>>> to avoid potential issue.
>>>>
>>>> I suppose the main possible issue will happen in the 
>>>> hibernation/awake process, it is transparent to UMD.
>>>> UMD will not call ioctrl  to query huc status in this process, and 
>>>> will continue to send command buffer to KMD.
>>>
>>> I think there is an agreement that it is ok to return 0 to mark the 
>>> load still in progress and 1 for load & auth complete. However, 
>>> double checking the code it turns out that we currently return 0 on 
>>> load failure, even if that's not particularly clear from the 
>>> comment. I can easily change that to be an error code, but not sure 
>>> if that's considered an API breakage considering it's not a well 
>>> documented behavior. I believe that on pre-DG2 userspace considers 1 
>>> as ok and everything else as failure, so changing the ioctl to 
>>> return an error code on failure and 0 for load pending (with the 
>>> latter being a DG2-esclusive code for now) should be safe, but I'd 
>>> like confirmation that I'm not breaking API before sending the 
>>> relevant code.
>>
>> The UMD code is like this:
>>
>>     struct drm_i915_getparam gp;
>>     int32_t value;
>>     gp.param = I915_PARAM_HUC_STATUS;
>>     gp.value = &value;
>>     ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
>>     if (ret != 0)
>>         hasHuC = 0
>>     else
>>         if (value == 0)
>>             hasHuC = 0;
>>         else
>>             hasHuC = 1;
>>
>> Currently the behavior of i915 is:
>>
>>     if there is an error, ioctl returns -1, and set errno as 
>> ENODEV/EOPNOTSUPP/ENOPKG/ENOEXEC;
>
> Not exactly. The current map of FW states to ioctl behavior is:
>
> FIRMWARE_NOT_SUPPORTED -> return -ENODEV;
> FIRMWARE_DISABLED -> return -EOPNOTSUPP;
> FIRMWARE_MISSING ->  return -ENOPKG;
> FIRMWARE_ERROR -> return -ENOEXEC

For these 4 errors, ioctl is returning -1 and setting the errno as one 
of the error values above.

ioctl is not returning the error values. It only returns -1.

This is exactly the behavior of i915 observed from the user space. You 
can verify it with the code snippet.

The igt huc_copy test is also checking the errno instead of the ret value:

https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/i915/gem_huc_copy.c#L66

> FIRMWARE_INIT_FAIL -> return 0, value 0
> FIRMWARE_LOAD_FAIL -> return 0, value 0

If you change these two conditions to return -EIO, then what the user 
space actually gets is going to be:

     ioctl returns -1, and errno==EIO.

It's not going to break the current UMD as you said below.

Thanks,

Tony


> FIRMWARE_RUNNING -> return 0, value 1
>
> So we do have 2 error state in which the ioctl succeeds.
>
>>
>>     otherwise, set *(gp.value) as 0 if HuC is not running, or 1 if 
>> HuC is authenticated.
>>
>> Hi Daniele, which value are you going to change - the "ret" or the 
>> "value"?
>
> The idea is to change the 2 FAIL states above to return an error 
> (probably -EIO) instead of setting value to 0. This would be 
> compatible with your code snippet, because it'll hit the ret != 0 
> condition. Value 0 can then be re-purposed for DG2 to indicate "load 
> pending", which would not be compatible with your current code, but 
> being a new addition for a new platform does not necessarily need to be.
>
> This said, I'm not sure if changing the return behavior of INIT_FAIL 
> and LOAD_FAIL is API breakage or not, given that it won't impact 
> userspace expectations. Tvrtko, any feedback here?
>
> Thanks,
> Daniele
>
>>
>>
>> Thanks,
>>
>> Tony
>>
>>>
>>> Thanks,
>>> Daniele
>>>
>>>>
>>>>> Thanks,
>>>>>
>>>>> Tony
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>
>


More information about the Intel-gfx mailing list