[Intel-gfx] [PATCH v5 5/8] drm/i915/pxp: Add ARB session creation and cleanup

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Thu Feb 23 23:27:22 UTC 2023


On Fri, 2023-02-17 at 03:12 +0000, Teres Alexis, Alan Previn wrote:
> On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> > Add MTL's function for ARB session creation using PXP firmware
> > version 4.3 ABI structure format.
> 
> alan:snip
> 
> Not part of this patch today but a new modification is required that would end up going into this patch --->
> 
> So from the internal testing we are doing on MTL, I have noticed that the first time the GSC firmware
> is requested to init the arb session (right after a cold-boot or driver-reload-after-flr), it takes much longer.
> This has resulted in the observation of the following problematic event flow:
> 
> 1. app or igt calls gem-context-create to create a protected context (after a fresh boot or driver reload).
> 2. intel_pxp_start will begin the global teardown and recreation where:
>     2-a: the first part (i.e. session teardown) is skipped (since the arb session wasn't created before this)
>     2-b: the second part (i.e. arb session init commands via the gsc firmware) does happen and takes a long time (the first time)
> 3. step 2 is queued through a worker while the main call into intel_pxp_start continues to wait for the arb
>     session to start and finally bails out with a timeout (back up through gem-context-create).
> 4. app retries and now we get a second call that repeats step 1 while 2-b is still wrapping up.
>     So depending on the race of this step 4 (step-1-recall) vs the completion of step 2-b, we could end up
>     getting a 2nd teardown (i.e. step 2-a going in) right after the first arb-session-creation completed
>     ... even though in both cases the app just wants the creation.
> 
> The simplest fix (with minimal code changes) would be to add a complementary "is_arb_creation_pending" flag
> alongside the is_arb_valid flag - with both remaining protected by the arb-mutex. That said, I'll respin rev6
> with this fix along with the other mutex fix on Patch4.
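
For reference, the flag idea quoted above could be sketched roughly as below. This is a userspace
model for illustration only, not driver code: struct arb_state, arb_request_start() and
arb_creation_done() are hypothetical names, and the arb-mutex that would guard both flags in the
driver is left out to keep the sketch self-contained.

```c
#include <stdbool.h>

/* Hypothetical model of the two flags; not the real struct intel_pxp. */
struct arb_state {
	bool is_arb_valid;            /* ARB session is up and usable */
	bool is_arb_creation_pending; /* a creation request is already in flight */
	unsigned int creations_queued;
};

/*
 * Called when a protected context is requested. In the driver this would run
 * with the arb-mutex held. Returns true only if a new creation was queued.
 */
static bool arb_request_start(struct arb_state *s)
{
	/* A valid or in-flight session means there is nothing to tear down or redo. */
	if (s->is_arb_valid || s->is_arb_creation_pending)
		return false;

	s->is_arb_creation_pending = true;
	s->creations_queued++;
	return true;
}

/* Called by the (hypothetical) worker once the firmware has answered. */
static void arb_creation_done(struct arb_state *s, bool success)
{
	s->is_arb_creation_pending = false;
	s->is_arb_valid = success;
}
```

With this model, a retry that arrives while the first creation is still pending does not queue a
second teardown-and-create cycle.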

After additional offline discussions with Daniele, we've decided against adding more complexity.
Instead we'll get the official timeout spec from the gsc-firmware and bump up the arb-session creation
timeout across the call-stack to ensure it's sufficient. If creation still fails, we return -ENODEV,
indicating that we do not have PXP support. Although this will block the gem protected-context creation,
it will not block other apps, only the ones creating protected contexts, which would end up waiting
somewhere for the gsc fw (from an e2e system level) no matter what design we employ. So bumping up the
timeout with a hard -ENODEV seems like the most straightforward approach.
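
As a rough illustration of that behaviour, here is a minimal userspace sketch. It assumes a
simulated firmware latency instead of real waiting; struct pxp_model and wait_for_arb_session()
are hypothetical names, and the millisecond values used with it are made up, not taken from the
gsc-firmware spec.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PXP state; not the real struct intel_pxp. */
struct pxp_model {
	bool arb_is_valid;           /* set once the ARB session is reported up */
	unsigned int ready_after_ms; /* simulated firmware latency (assumption) */
};

/*
 * Step through simulated time in 1 ms ticks, up to timeout_ms.
 * Returns 0 once the session is up, or -ENODEV if the timeout expires,
 * which the caller treats as "no PXP support".
 */
static int wait_for_arb_session(struct pxp_model *pxp, unsigned int timeout_ms)
{
	unsigned int elapsed_ms;

	for (elapsed_ms = 0; elapsed_ms <= timeout_ms; elapsed_ms++) {
		if (elapsed_ms >= pxp->ready_after_ms) {
			pxp->arb_is_valid = true;
			return 0;
		}
	}

	return -ENODEV;
}
```

In this model a slow first-time init fails against a short timeout and succeeds against a longer
one, which is the whole point of bumping the timeout rather than adding another state flag.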


