IOCTL feature detection (Was: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation)

Ceraolo Spurio, Daniele daniele.ceraolospurio at intel.com
Wed Apr 26 18:10:00 UTC 2023



On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
>> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
>>> (+ Faith and Daniel as they have been involved in previous discussions)
>>> Quoting Jordan Justen (2023-04-24 20:13:00)
>>>> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>>>>>
> alan:snip
>
>> - the more a feature spans drivers/modules, the more it should be
>>    discovered by trying it out, e.g. dma-buf fence import/export was a huge
>>    discussion, luckily mesa devs figured out how to transparently fall back
>>    at runtime so we didn't end up merging the separate feature flag (I
>>    think at least, can't find it). pxp being split across i915/me/fw/who
>>    knows what else is kinda similar so I'd heavily lean towards discovery
>>    by creating a context
>>
>> - pxp taking 8s to init a ctx sounds very broken, irrespective of anything
>>    else
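This is a minimal userspace sketch of the "discover by trying it out" approach: request a protected context first and, if that fails, fall back to a regular one. `create_ctx_fn`, `struct ctx_result` and the two fake backends are hypothetical stand-ins for the real i915 context-creation ioctl path, which is not reproduced here. The `-ENODEV` error in the fake backend is likewise only illustrative.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of a context-creation attempt. */
struct ctx_result {
    int err;        /* 0 on success, negative errno-style value on failure */
    bool protected; /* true if the context was created with PXP enabled */
};

/* Hypothetical callback standing in for the real creation ioctl path. */
typedef int (*create_ctx_fn)(bool want_protected, void *user);

/*
 * Discover the feature by trying it: ask for a protected context first and
 * fall back to an unprotected one if that attempt fails for any reason.
 */
static struct ctx_result create_ctx_with_fallback(create_ctx_fn create, void *user)
{
    struct ctx_result r;

    r.err = create(true, user);
    if (r.err == 0) {
        r.protected = true;
        return r;
    }

    /* Protected creation failed: treat the feature as unavailable. */
    r.err = create(false, user);
    r.protected = false;
    return r;
}

/* Illustrative fake backend: PXP unavailable, plain contexts work. */
static int fake_create_no_pxp(bool want_protected, void *user)
{
    (void)user;
    return want_protected ? -ENODEV : 0;
}

/* Illustrative fake backend: both protected and plain contexts work. */
static int fake_create_with_pxp(bool want_protected, void *user)
{
    (void)want_protected;
    (void)user;
    return 0;
}
```

The cost of this approach is that the probe is a real creation call, so any wait inside it is paid by whoever probes first.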

I think there has been a bit of confusion regarding this timeout and
where it applies, so let me try to clarify to make sure we're all on
the same page (Alan has already explained most of it below, but I'm
going to go into a bit more detail and I want to make sure it's all in one
place for reference).
Before we can do any PXP operation, dependencies need to be satisfied, 
some of which are outside of i915. For MTL, these are:

- GSC FW needs to be loaded (~250 ms)
- HuC FW needs to be authenticated for PXP ops (~20 ms)
- MEI modules need to be bound (depends on the probe ordering, but usually
  a few seconds)
- GSC SW proxy via MEI needs to be established (~500 ms normally, but can
  take a few seconds on the first boot after a firmware update)
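One way to picture the list above is as a set of readiness bits, with PXP usable only once all of them are set. The flag names and the helper below are hypothetical and do not correspond to actual i915 symbols; this is a sketch of the gating logic only.

```c
#include <stdbool.h>

/* Hypothetical flags, one per prerequisite listed above. */
enum pxp_dep {
    PXP_DEP_GSC_FW_LOADED  = 1 << 0, /* GSC FW loaded (~250 ms) */
    PXP_DEP_HUC_AUTH       = 1 << 1, /* HuC authenticated for PXP ops (~20 ms) */
    PXP_DEP_MEI_BOUND      = 1 << 2, /* MEI modules bound (probe-order dependent) */
    PXP_DEP_GSC_PROXY_INIT = 1 << 3, /* GSC SW proxy established via MEI */
};

#define PXP_DEPS_ALL (PXP_DEP_GSC_FW_LOADED | PXP_DEP_HUC_AUTH | \
                      PXP_DEP_MEI_BOUND | PXP_DEP_GSC_PROXY_INIT)

/* PXP operations are possible only once every prerequisite bit is set. */
static bool pxp_deps_satisfied(unsigned int done)
{
    return (done & PXP_DEPS_ALL) == PXP_DEPS_ALL;
}
```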

Because these can take several seconds in total to complete, and to
avoid stalling driver load/resume for that long, we moved the i915-side
operations to a separate worker and we register i915 before they've
completed. This means that we can get a PXP context creation call before
all the dependencies are in place, in which case we do need to wait, and
that's where the 8s come from. After all the pieces are in place, a PXP
context creation call is much faster (up to ~150 ms, which is the time
required to start the PXP session if it is not already running).
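The "register early, finish in a worker, wait with a timeout at context creation" pattern can be sketched in userspace with POSIX threads. The kernel uses its own wait primitives rather than pthreads, and `struct pxp_state`, `pxp_init_worker()` and `pxp_wait_ready()` are hypothetical names; the worker's sleep merely simulates the external dependencies. The timeout is a parameter, so 8 s, 1 s or 250 ms are all just different call arguments.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical readiness state shared by the init worker and callers. */
struct pxp_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;
};

struct pxp_worker_arg {
    struct pxp_state *state;
    long delay_ms; /* simulated time for GSC/HuC/MEI/proxy setup */
};

static void pxp_state_init(struct pxp_state *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->ready = false;
}

/* Async worker: runs after registration and flags completion. */
static void *pxp_init_worker(void *p)
{
    struct pxp_worker_arg *arg = p;
    struct timespec d = { .tv_sec = arg->delay_ms / 1000,
                          .tv_nsec = (arg->delay_ms % 1000) * 1000000L };

    nanosleep(&d, NULL); /* stands in for the external dependencies */

    pthread_mutex_lock(&arg->state->lock);
    arg->state->ready = true;
    pthread_cond_broadcast(&arg->state->cond);
    pthread_mutex_unlock(&arg->state->lock);
    return NULL;
}

/*
 * Context-creation path: wait at most timeout_ms for the worker.
 * Returns 0 once ready, or -ETIMEDOUT if dependencies are still pending.
 */
static int pxp_wait_ready(struct pxp_state *s, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0, ret;

    clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    while (!s->ready && rc != ETIMEDOUT) /* loop guards spurious wakeups */
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    ret = s->ready ? 0 : -ETIMEDOUT;
    pthread_mutex_unlock(&s->lock);
    return ret;
}
```

Once the worker has finished, `pxp_wait_ready()` returns immediately, which mirrors the fast path described above.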

The reason why we suggested a dedicated getparam was to avoid requiring 
early users to wait for all of that to happen just to check the 
capability. By the time a user actually wants to use PXP, we're likely
done with the prep steps (or at least we're far along with them) and 
therefore the wait will be short.
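A dedicated capability query differs from the context-creation probe in that it answers from already-known state and never waits. This is a sketch only; the three-state enum, `struct pxp_info` and `pxp_query_status()` are hypothetical and are not the actual i915 uAPI.

```c
#include <stdbool.h>

/* Hypothetical answers a non-blocking capability query could give. */
enum pxp_status {
    PXP_STATUS_UNSUPPORTED, /* no PXP on this platform/config */
    PXP_STATUS_PENDING,     /* supported, prerequisites still initializing */
    PXP_STATUS_READY,       /* prerequisites done; context creation is fast */
};

/* Hypothetical snapshot of what the driver knows at query time. */
struct pxp_info {
    bool hw_supported;
    bool init_failed; /* a prerequisite failed permanently */
    bool deps_done;   /* all prerequisites satisfied */
};

/* Answers from cached state only: never waits for the init worker. */
static enum pxp_status pxp_query_status(const struct pxp_info *info)
{
    if (!info->hw_supported || info->init_failed)
        return PXP_STATUS_UNSUPPORTED;
    return info->deps_done ? PXP_STATUS_READY : PXP_STATUS_PENDING;
}
```

Reporting "pending" separately from "unsupported" is what lets an early caller learn that PXP exists without paying for the wait.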

> Alan: Please be aware that:
> 1. the wait-timeout was changed to 1 second some time back.
> 2. I'm not the one deciding the timeout. I initially wanted to keep it at the same
> timeout as ADL (250 ms) and ask the UMD to retry if the user needs it (as per
> the same ADL behavior). Daniele requested to move it to 8 seconds, but through the
> review process we reduced it to 1 second.
> 3. In any case, that's just the wait-timeout, and we know it won't succeed until
> ~6 seconds after i915 (~9 secs after boot). The issue isn't our hardware or i915;
> it's the component driver load <-- this is what's broken.

I think the question here is whether the mei driver is taking a long 
time to probe or if it is just being probed late. In the latter case, I 
wouldn't call it broken.

>
> Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
> component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
> Most of the above steps begin rather quickly during i915 driver load; the delay
> seems to come from a very late mei-gsc-proxy component driver load. In fact the
> parent mei-me driver is only getting loaded ~6 seconds after i915 init is done. That
> blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP.
>
> That said, what's broken is that it takes so long for the component drivers
> to come up. NOTE: PXP isn't really doing anything differently in the context
> creation flow (in terms of time-consuming steps compared to ADL) besides these
> extra dependency waits.
>
> We can actually go back to the original timeout of 250 ms like we have in ADL,
> but it will fail if MESA calls in too early (it will succeed later) ... or
> we can create the GET_PARAMs.
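With the first option (a short kernel-side wait), the UMD has to retry on its own. A minimal sketch of that policy, assuming the creation call reports "not ready yet" as `-ETIMEDOUT`: retry a bounded number of times and give up at once on any other error. `create_pxp_ctx_retry()` and the fake backend are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical creation call: 0 on success, -ETIMEDOUT if PXP deps not ready yet. */
typedef int (*create_pxp_ctx_fn)(void *user);

/*
 * UMD-side policy for a short kernel-side timeout: retry on -ETIMEDOUT up to
 * max_attempts times, and stop immediately on success or any other error.
 */
static int create_pxp_ctx_retry(create_pxp_ctx_fn create, void *user, int max_attempts)
{
    int err = -ETIMEDOUT;

    for (int i = 0; i < max_attempts; i++) {
        err = create(user);
        if (err != -ETIMEDOUT)
            break; /* success or a permanent failure */
    }
    return err;
}

/* Illustrative fake: the dependencies become ready on the Nth call. */
struct fake_pending {
    int calls;
    int ready_after;
};

static int fake_create(void *user)
{
    struct fake_pending *f = user;

    f->calls++;
    return f->calls >= f->ready_after ? 0 : -ETIMEDOUT;
}
```

In practice the retries would be spread out (for example, deferred until the protected content is actually needed) rather than issued back to back; the loop only shows the error handling.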
>
> A better idea would be to figure out how to control the driver load order and
> force the mei driver + components to load right after i915. I was informed
> there is no way to control this and that changes here will likely not be accepted
> upstream.

We could add a device link to mark i915 as a consumer of mei, but I
believe that wouldn't work, for two reasons:

1 - on discrete, mei binds to a child device of i915, so the dependency
is reversed
2 - the link might just delay the i915 load until after the mei load, which
I'm not sure is something we want (and at that point we could also
just wait for mei to bind from within the i915 load).

Daniele

>
> ++ Daniele - can you chime in?
>
> Take note that ADL has the same issue, but for whatever reason the dependent
> mei component on ADL loaded much sooner, so it was never caught; it still
> existed at the time of the ADL merge (if users customize the kernel +
> compositor for fastboot, it will happen). (I realize I haven't tested ADL with the
> new kernel configs that we use to also boot PXP on MTL; I wonder if the new
> mei configs are causing the delay, i.e. ADL customers could suddenly see this
> 6 sec delay too. Something I have to check now.)



More information about the dri-devel mailing list