[Intel-gfx] [PATCH v7 5/8] drm/i915/pxp: Add ARB session creation and cleanup
Ceraolo Spurio, Daniele
daniele.ceraolospurio at intel.com
Mon Apr 17 19:15:42 UTC 2023
On 4/17/2023 11:21 AM, Teres Alexis, Alan Previn wrote:
> On Mon, 2023-04-10 at 10:14 -0700, Ceraolo Spurio, Daniele wrote:
>>
>>
> alan:snip
>
>>> @@ -354,8 +368,14 @@ int intel_pxp_start(struct intel_pxp *pxp)
>>> if (!intel_pxp_is_enabled(pxp))
>>> return -ENODEV;
>>>
>>> - if (wait_for(pxp_component_bound(pxp), 250))
>>> - return -ENXIO;
>>> + if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
>>> + /* Use a larger 1 second timeout considering all the dependencies. */
>>> + if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), 1000))
>>> + return -ENXIO;
>> This needs a comment to explain that we expect userspace to retry later
>> if we return -ENXIO and therefore the timeout is smaller that what would
>> be required to guarantee that the init can complete. It also needs an
>> ack from the userspace drivers for this behavior.
>>
> alan: I agree with a requirement to comment this down. However, i believe its better
> to put this int the UAPI header file comment-documentation since it applies to both
> current MTL as well as previous ADL cases (this is not new behavior being introduced
> in MTL but it is fixing of the spec of existing behavior).
> That said, your feedback is already being addressed by patch #6 but i will ammend
> patch #6 to add the detail you highlighted WRT ~"timeout is not larger than the completion-guarantee...".
Can you move that comment update for the context getparam from the next
patch to this one? that way it's all nicely self-contained in this patch.
> alan:snip
>>> +fw_err_to_string(u32 type)
>>> +{
>>> + switch (type) {
>>> + case PXP_STATUS_ERROR_API_VERSION:
>>> + return "ERR_API_VERSION";
>>> + case PXP_STATUS_NOT_READY:
>>> + return "ERR_NOT_READY";
>>> + case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
>>> + case PXP_STATUS_PLATFCONFIG_KF1_BAD:
>>> + return "ERR_PLATFORM_CONFIG";
>>> + default:
>>> + break;
>>> + }
>>> + return NULL;
>>> +}
>>> +
>> There is a lot of replication for this error handling, I'm wondering if
>> it's worth adding a common function to handle this. Something like:
>>
>> intel_pxp_header_error(u32 header, const char *op, u32 id)
>> {
>> if (is_fw_err_platform_config(header.status)) {
>> drm_info_once(&i915->drm,
>> "PXP %s-%d failed due to BIOS/SOC:0x%08x:%s\n",
>> op, id, header.status,
>> fw_err_to_string(header.status));
>> } else {
>> drm_dbg(&i915->drm, "PXP %s-%d failed 0x%08x:%st:\n",
>> op, id, header.status,
>> fw_err_to_string(header.status));
>> drm_dbg(&i915->drm, " cmd-detail: ID=[0x%08x],API-Ver-[0x%08x]\n",
>> header.command_id, header.api_version);
>> }
>> }
>>
>>
>> Not a blocker.
> alan: Thanks - i will have to address as a stand alone patch since i have to
> think about moving this to a comment helper layer (common to both ADL-mei-comp and MTL-gsccs)
> while potentially have different set of error codes that can mean different reporting levels
> (i.e. notice once coz its a platform limit vs always err out if its runtime related).
> Once this series gets merged it will be easier to work on that patch (which would require both
> backends to be present in the baseline).
> alan:snip
>>> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
>>> @@ -10,14 +10,18 @@
>>>
>>> struct intel_pxp;
>>>
>>> -#define GSC_REPLY_LATENCY_MS 200
>>> +#define GSC_REPLY_LATENCY_MS 210
>> why move from 200 to 210? not a problem, I just want to understand why
>> this is required.
>>
>> Daniele
> alan: so 200 is based on the fw spec - and from real testing i observed the occasional highs touching ~185ms.
> However, the spec gives this number as the expected max time the firmware is going to take to process the request
> and post a reply. Therefore it doesn't include the additional overhead for the request creation, emision,
> submission via guc and the completion pipeline completion indication. All of these contribute additional layers
> of possible delay. Since arb-session creation and invalidation are not common events,
> I believe a slightly wider overhead of 10 milisec will not hurt.
Agreed. Can you add a comment? something like "Max FW response time is
200ms, to which we add 10ms to account for overhead".
With those 2 nits addressed:
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
Daniele
>
More information about the Intel-gfx
mailing list