[Intel-gfx] [PATCH v7 5/8] drm/i915/pxp: Add ARB session creation and cleanup

Mon Apr 17 19:15:42 UTC 2023

On 4/17/2023 11:21 AM, Teres Alexis, Alan Previn wrote:
> On Mon, 2023-04-10 at 10:14 -0700, Ceraolo Spurio, Daniele wrote:
>>
>>
> alan:snip
>
>>> @@ -354,8 +368,14 @@ int intel_pxp_start(struct intel_pxp *pxp)
>>>    	if (!intel_pxp_is_enabled(pxp))
>>>    		return -ENODEV;
>>>    
>>> -	if (wait_for(pxp_component_bound(pxp), 250))
>>> -		return -ENXIO;
>>> +	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
>>> +		/* Use a larger 1 second timeout considering all the dependencies. */
>>> +		if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), 1000))
>>> +			return -ENXIO;
>> This needs a comment to explain that we expect userspace to retry later
>> if we return -ENXIO and therefore the timeout is smaller that what would
>> be required to guarantee that the init can complete. It also needs an
>> ack from the userspace drivers for this behavior.
>>
> alan: I agree with a requirement to comment this down. However, i believe its better
> to put this int the UAPI header file comment-documentation since it applies to both
> current MTL as well as previous ADL cases (this is not new behavior being introduced
> in MTL but it is fixing of the spec of existing behavior).
> That said, your feedback is already being addressed by patch #6 but i will ammend
> patch #6 to add the detail you highlighted WRT ~"timeout is not larger than the completion-guarantee...".

Can you move that comment update for the context getparam from the next 
patch to this one? that way it's all nicely self-contained in this patch.

> alan:snip
>>> +fw_err_to_string(u32 type)
>>> +{
>>> +	switch (type) {
>>> +	case PXP_STATUS_ERROR_API_VERSION:
>>> +		return "ERR_API_VERSION";
>>> +	case PXP_STATUS_NOT_READY:
>>> +		return "ERR_NOT_READY";
>>> +	case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
>>> +	case PXP_STATUS_PLATFCONFIG_KF1_BAD:
>>> +		return "ERR_PLATFORM_CONFIG";
>>> +	default:
>>> +		break;
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>> There is a lot of replication for this error handling, I'm wondering if
>> it's worth adding a common function to handle this. Something like:
>>
>> intel_pxp_header_error(u32 header, const char *op, u32 id)
>> {
>> 	if (is_fw_err_platform_config(header.status)) {
>> 		drm_info_once(&i915->drm,
>> 			      "PXP %s-%d failed due to BIOS/SOC:0x%08x:%s\n",
>> 			      op, id, header.status,
>> 			      fw_err_to_string(header.status));
>> 	} else {
>> 		drm_dbg(&i915->drm, "PXP %s-%d failed 0x%08x:%st:\n",
>> 			op, id, header.status,
>> 			fw_err_to_string(header.status));
>> 		drm_dbg(&i915->drm, "     cmd-detail: ID=[0x%08x],API-Ver-[0x%08x]\n",
>> 			header.command_id, header.api_version);
>> 	}
>> }
>>
>>
>> Not a blocker.
> alan: Thanks - i will have to address as a stand alone patch since i have to
> think about moving this to a comment helper layer (common to both ADL-mei-comp and MTL-gsccs)
> while potentially have different set of error codes that can mean different reporting levels
> (i.e. notice once coz its a platform limit vs always err out if its runtime related).
> Once this series gets merged it will be easier to work on that patch (which would require both
> backends to be present in the baseline).
> alan:snip
>>> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
>>> @@ -10,14 +10,18 @@
>>>    
>>>    struct intel_pxp;
>>>    
>>> -#define GSC_REPLY_LATENCY_MS 200
>>> +#define GSC_REPLY_LATENCY_MS 210
>> why move from 200 to 210? not a problem, I just want to understand why
>> this is required.
>>
>> Daniele
> alan: so 200 is based on the fw spec - and from real testing i observed the occasional highs touching ~185ms.
> However, the spec gives this number as the expected max time the firmware is going to take to process the request
> and post a reply. Therefore it doesn't include the additional overhead for the request creation, emision,
> submission via guc and the completion pipeline completion indication. All of these contribute additional layers
> of possible delay. Since arb-session creation and invalidation are not common events,
> I believe a slightly wider overhead of 10 milisec will not hurt.

Agreed. Can you add a comment? something like "Max FW response time is 
200ms, to which we add 10ms to account for overhead".
With those 2 nits addressed:

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

Daniele

>