[Intel-gfx] [PATCH] drm/i915/pxp: limit drm-errors or warnings on firmware API failures
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Feb 3 08:24:01 UTC 2023
On 02/02/2023 17:11, Teres Alexis, Alan Previn wrote:
> On Thu, 2023-02-02 at 08:43 +0000, Tvrtko Ursulin wrote:
>>
>> On 02/02/2023 08:13, Alan Previn wrote:
>>> The MESA driver creates a protected context on every driver handle
>>> initialization to query the caps bit for apps. So when running CI tests,
>>> testers observe hundreds of drm_errors when PXP is enabled
>>> in .config but the SOC or BIOS configuration cannot support
>>> PXP sessions.
>>>
>>> Update the error handling code to be more selective about which errors
>>> are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
>>> Don't completely remove all FW error replies (keep them at least
>>> as drm_debug), or else customers who really need to know that
>>> content protection failed won't be aware of it when debugging.
>>>
>>> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
>>
>> How does this relate to b762787bf767 ("drm/i915/pxp: Use drm_dbg if arb
>> session failed due to fw version") which I thought was already fixing
>> the drm_error spam caused by userspace probing?
>>
> Good question. That previous error was specific to a board that was using
> an outdated firmware version that really needed to be upgraded.
> At that point I wasn't aware that MESA was seeing this failure
> at a high frequency, tied to platform issues
> (BIOS configuration / SOC fusing). Also, I believe that in the prior case
> PXP was not enabled by default in the .config for all testing.
>
> In this latest reported bug (I realized I forgot to include the bug number for this
> new patch: https://gitlab.freedesktop.org/drm/intel/-/issues/7706#note_1746952),
> I was informed that PXP is being enabled by default and that some
> DUT hardware was not PXP-capable (SOC fusing / BIOS config).
>
> So with this patch, I am trying to balance between issues that are critical
> but root-caused by HW/platform gaps (a louder drm-warn, but just ONCE)
> vs other cases that could also come from the hw/sw state machine (which cannot
> be a WARN_ONCE message since they can occur due to runtime operation events).
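A minimal, self-contained sketch of the severity split described above, written as plain userspace C. Everything in it is hypothetical: the `demo_*` names, the three failure classes, and the exact mapping (platform gap to WARN_ONCE, runtime event to error, userspace-reachable failure to debug) are assumptions for illustration, not the patch's code. `fprintf` stands in for drm_WARN_ONCE/drm_err/drm_dbg, and a static flag mimics the "once" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical failure classes; not taken from the i915 PXP code. */
enum demo_pxp_fail_kind {
	DEMO_FAIL_PLATFORM,  /* SOC fusing / BIOS config: fixed for this boot */
	DEMO_FAIL_RUNTIME,   /* hw/sw state-machine event: may legitimately recur */
	DEMO_FAIL_USERSPACE, /* reachable from userspace probing or invalid usage */
};

/* Stand-ins for drm_dbg, drm_err and drm_WARN_ONCE (assumed mapping). */
enum demo_log_level { DEMO_LOG_DEBUG, DEMO_LOG_ERR, DEMO_LOG_WARN_ONCE };

static enum demo_log_level demo_fail_log_level(enum demo_pxp_fail_kind kind)
{
	switch (kind) {
	case DEMO_FAIL_PLATFORM:
		return DEMO_LOG_WARN_ONCE; /* loud, but only once */
	case DEMO_FAIL_RUNTIME:
		return DEMO_LOG_ERR;       /* not _ONCE: each runtime event may matter */
	case DEMO_FAIL_USERSPACE:
	default:
		return DEMO_LOG_DEBUG;     /* userspace must not be able to spam the log */
	}
}

/* Returns 1 if a message was emitted, 0 if it was suppressed. */
static int demo_report_failure(enum demo_pxp_fail_kind kind, const char *msg)
{
	static bool warned_once; /* mimics the static flag behind WARN_ONCE */

	assert(msg != NULL);
	switch (demo_fail_log_level(kind)) {
	case DEMO_LOG_WARN_ONCE:
		if (warned_once)
			return 0;
		warned_once = true;
		fprintf(stderr, "WARN: %s\n", msg);
		return 1;
	case DEMO_LOG_ERR:
		fprintf(stderr, "ERROR: %s\n", msg);
		return 1;
	default:
		fprintf(stderr, "debug: %s\n", msg);
		return 1;
	}
}
```

Repeated platform failures (for example, every MESA context probe on a non-PXP-capable DUT) then produce a single warning, while runtime failures are still reported each time.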
>
> One thing to note: I am pushing for / waiting on our firmware team to get
> approval for more fw-error-code to error-string translations that can be allowed
> upstream, which is why I added "pxp_fw_err_to_string" and a single
> "drm_dbg": so that in future, we don't have to keep adding whole new lines of
> code to multiple functions, and can instead
> just add the new err-code-to-string entry in a single location.
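A minimal sketch of the single-table approach described above, assuming a simple linear lookup. The `demo_*` names and both status codes are hypothetical placeholders (the real PXP firmware codes and the actual body of pxp_fw_err_to_string are not shown in this thread), and `snprintf` into a caller buffer stands in for the one shared drm_dbg call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical status codes: the real PXP firmware values are defined by
 * the firmware interface and are not reproduced here.
 */
#define DEMO_FW_STATUS_PLATFORM_CONFIG_ERROR 0x1001u
#define DEMO_FW_STATUS_SESSION_BUSY          0x1002u

struct demo_fw_err_entry {
	uint32_t code;
	const char *str;
};

/* The single location where a new code-to-string translation is added. */
static const struct demo_fw_err_entry demo_fw_err_table[] = {
	{ DEMO_FW_STATUS_PLATFORM_CONFIG_ERROR,
	  "Platform (BIOS/fusing) does not support PXP" },
	{ DEMO_FW_STATUS_SESSION_BUSY, "PXP session busy" },
};

/* Returns a description, or NULL if the code has no approved translation. */
static const char *demo_fw_err_to_string(uint32_t code)
{
	size_t i;

	for (i = 0; i < sizeof(demo_fw_err_table) / sizeof(demo_fw_err_table[0]); i++)
		if (demo_fw_err_table[i].code == code)
			return demo_fw_err_table[i].str;
	return NULL;
}

/*
 * Formats the one debug message every caller shares; unknown codes fall
 * back to the raw value so nothing is lost while debugging.
 */
static int demo_format_fw_err(char *buf, size_t len, uint32_t code)
{
	const char *s = demo_fw_err_to_string(code);

	assert(buf != NULL && len > 0);
	if (s)
		return snprintf(buf, len, "fw err 0x%08x: %s", (unsigned int)code, s);
	return snprintf(buf, len, "fw err 0x%08x", (unsigned int)code);
}
```

Adding a newly approved translation then means adding one table row, with no changes to the calling functions.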
>
> Note: I will re-rev with the bug ID.
Thanks for the details. Yes, definitely avoid any drm_warn/err/WARN on
invalid conditions/usage that can be triggered from userspace.
And given the bug report is about TGL, probably try to add a Fixes: tag
with an appropriate target too, so that there are fewer bug re-reports
from released kernels.
Regards,
Tvrtko