[Intel-gfx] [PATCH v7 5/8] drm/i915/pxp: Add ARB session creation and cleanup

Mon Apr 17 18:21:37 UTC 2023

On Mon, 2023-04-10 at 10:14 -0700, Ceraolo Spurio, Daniele wrote:
> 
> 
> 
alan:snip

> > @@ -354,8 +368,14 @@ int intel_pxp_start(struct intel_pxp *pxp)
> >   	if (!intel_pxp_is_enabled(pxp))
> >   		return -ENODEV;
> >   
> > -	if (wait_for(pxp_component_bound(pxp), 250))
> > -		return -ENXIO;
> > +	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > +		/* Use a larger 1 second timeout considering all the dependencies. */
> > +		if (wait_for(intel_pxp_gsccs_is_ready_for_sessions(pxp), 1000))
> > +			return -ENXIO;
> 
> This needs a comment to explain that we expect userspace to retry later 
> if we return -ENXIO and therefore the timeout is smaller that what would 
> be required to guarantee that the init can complete. It also needs an 
> ack from the userspace drivers for this behavior.
> 
alan: I agree with a requirement to comment this down. However, i believe its better
to put this int the UAPI header file comment-documentation since it applies to both
current MTL as well as previous ADL cases (this is not new behavior being introduced
in MTL but it is fixing of the spec of existing behavior).
That said, your feedback is already being addressed by patch #6 but i will ammend
patch #6 to add the detail you highlighted WRT ~"timeout is not larger than the completion-guarantee...".

> 
alan:snip
> > +fw_err_to_string(u32 type)
> > +{
> > +	switch (type) {
> > +	case PXP_STATUS_ERROR_API_VERSION:
> > +		return "ERR_API_VERSION";
> > +	case PXP_STATUS_NOT_READY:
> > +		return "ERR_NOT_READY";
> > +	case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
> > +	case PXP_STATUS_PLATFCONFIG_KF1_BAD:
> > +		return "ERR_PLATFORM_CONFIG";
> > +	default:
> > +		break;
> > +	}
> > +	return NULL;
> > +}
> > +
> 
> There is a lot of replication for this error handling, I'm wondering if 
> it's worth adding a common function to handle this. Something like:
> 
> intel_pxp_header_error(u32 header, const char *op, u32 id)
> {
> 	if (is_fw_err_platform_config(header.status)) {
> 		drm_info_once(&i915->drm,
> 			      "PXP %s-%d failed due to BIOS/SOC:0x%08x:%s\n",
> 			      op, id, header.status,
> 			      fw_err_to_string(header.status));
> 	} else {
> 		drm_dbg(&i915->drm, "PXP %s-%d failed 0x%08x:%st:\n",
> 			op, id, header.status,
> 			fw_err_to_string(header.status));
> 		drm_dbg(&i915->drm, "     cmd-detail: ID=[0x%08x],API-Ver-[0x%08x]\n",
> 			header.command_id, header.api_version);
> 	}
> }
> 
> 
> Not a blocker.
alan: Thanks - i will have to address as a stand alone patch since i have to
think about moving this to a comment helper layer (common to both ADL-mei-comp and MTL-gsccs)
while potentially have different set of error codes that can mean different reporting levels
(i.e. notice once coz its a platform limit vs always err out if its runtime related).
Once this series gets merged it will be easier to work on that patch (which would require both
backends to be present in the baseline).
> > 
alan:snip
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
> > @@ -10,14 +10,18 @@
> >   
> >   struct intel_pxp;
> >   
> > -#define GSC_REPLY_LATENCY_MS 200
> > +#define GSC_REPLY_LATENCY_MS 210
> 
> why move from 200 to 210? not a problem, I just want to understand why 
> this is required.
> 
> Daniele
alan: so 200 is based on the fw spec - and from real testing i observed the occasional highs touching ~185ms.
However, the spec gives this number as the expected max time the firmware is going to take to process the request
and post a reply. Therefore it doesn't include the additional overhead for the request creation, emision,
submission via guc and the completion pipeline completion indication. All of these contribute additional layers
of possible delay. Since arb-session creation and invalidation are not common events,
I believe a slightly wider overhead of 10 milisec will not hurt.