[PATCH 1/2] drm/xe/gsc: Initialize GSC proxy

Fri Jan 5 07:21:32 UTC 2024

On Mon, 2023-12-11 at 17:05 -0800, Daniele Ceraolo Spurio wrote:

alan: per offline chat, apologies on my delay causing to warrant a rebase.
alan: (nit?/not-nit?): for Xe driver, are we enforcing to ensure all
functions require documentation? (description, input, output, return...).
If so, i think many functions in this patch is lacking this.

> The GSC uC needs to communicate with the CSME to perform certain
> operations. Since the GSC can't perform this communication directly on
> platforms where it is integrated in GT, the graphics driver needs to
> transfer the messages from GSC to CSME and back. The proxy flow must be
> manually started after the GSC is loaded to signal to GSC that we're
> ready to handle its messages and allow it to query its init data from
> CSME.
alan:snip

> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 2e0b2e40d8f3..3e97bc5a5b48 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
alan:snip
> @@ -512,6 +516,14 @@ int xe_device_probe(struct xe_device *xe)
>  err_fini_display:
>  	xe_display_driver_remove(xe);
>  
> +err_fini_gt:
> +	for_each_gt(gt, xe, id) {
> +		if (id < last_gt)
> +			xe_gt_remove(gt);
alan: since xe_gt_init does a drmm_add_action_or_reset(... gt_fini ...),
is there a reason we don't want to place the new xe_uc_remove under there?
(which in turn calls the new xe_gsc_remove).

alan:snip

>  
> diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
> index a8a895cf4b44..38e9dd3129bd 100644
> +static int gsc_upload_and_init(struct xe_gsc *gsc)
> +{
> +	struct xe_gt *gt = gsc_to_gt(gsc);
> +	int ret;
> +
> +	ret = gsc_upload(gsc);
> +	if (ret)
> +		return ret;
alan: nit:going thru gsc_upload top->down, there is only one corner
case where we can fail inside xe_uc_fw_check_version_requirements without
an error message, would we perhaps wanna add a xe_gt_dbg "GSC_FW async init: load done" after this
and tweak the message at end to "GSC FW async init: proxy done"?

alan:snip

> diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c
> new file mode 100644
> index 000000000000..0983b4903cc7
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c
alan:snip

> +/* shorthand define for code compactness */
> +#define PROXY_HDR_SIZE (sizeof(struct xe_gsc_proxy_header))
alan: nit: "GSC_PROXY_HDR_SIZE"? (for no reason other than consistency).

> +static inline struct xe_device *kdev_to_xe(struct device *kdev)
> +{
> +	return dev_get_drvdata(kdev);
alan: seeing some other xe codes that convert from dev to xe, should this rather be:
	return to_xe_device(dev_get_drvdata(kdev);
> +}
> +
> +static bool gsc_proxy_init_done(struct xe_gsc *gsc)
alan: not blocking: as i compared simple hw/fw facing helpers like
this against i915 (logically the same), I notice that in xe, we dont
have the equivalent for i915's __wait_gsc_proxy_completed which
is called from selftest code. In i915 that helper calls a gsc-helper
same as this function after also checking CONFIG_INTEL_MEI_GSC_PROXY
and the fw loading state. Do we not need this type of checking for
xe BISTs that excercise all engines? To be fair that can be a separate
patch. Probably an offline to-do.

alan:snip

> +u32 emit_proxy_header(struct xe_device *xe, struct iosys_map *map, u32 offset)
> +{
> +	xe_map_memset(xe, map, offset, 0, PROXY_HDR_SIZE);
> +
> +	proxy_header_wr(xe, map, offset, hdr,
> +			FIELD_PREP(GSC_PROXY_TYPE, GSC_PROXY_MSG_TYPE_PROXY_QUERY) |
> +			FIELD_PREP(GSC_PROXY_PAYLOAD_LENGTH, 0));
alan: thinking of discrete devices, i assume proxy messages are occasional and in between
(not sure about how occasional for hdcp case), so as a low priority feedback:
perhaps its more efficient (at the hw pci bus transaction level) if we create a local
structure filled up and then use iosysmap for a memcpy as opposed to writing dwords
one at a time? or am missing something about how we are cache/flush these buffers leading
up  to submission?

alan: snip

> +static int proxy_query(struct xe_gsc *gsc)
> +{
> +	struct xe_gt *gt = gsc_to_gt(gsc);
> +	struct xe_device *xe = gt_to_xe(gt);
> +	struct xe_gsc_proxy_header *to_csme_hdr = gsc->proxy.to_csme;
> +	void *to_csme_payload = gsc->proxy.to_csme + PROXY_HDR_SIZE;
> +	u32 wr_offset;
> +	u32 rd_offset;
> +	u32 size;
> +	int ret;
> +
> +	wr_offset = xe_gsc_emit_header(xe, &gsc->proxy.to_gsc, 0,
> +				       HECI_MEADDRESS_PROXY, 0, PROXY_HDR_SIZE);
> +	wr_offset = emit_proxy_header(xe, &gsc->proxy.to_gsc, wr_offset);
> +
> +	size = wr_offset;
> +
> +	while (1) {
> +		/* clear the GSC response header space */
> +		xe_map_memset(xe, &gsc->proxy.from_gsc, 0, 0, SZ_4K);
alan: any reason we are clearing out a whole page as opposed to
just the sizeof(gsc-cmd-header) or sizeof(gsc-cmd-header) + sizeof(proxy-header)?

> +
> +		/* send proxy message to GSC */
> +		ret = proxy_send_to_gsc(gsc, size);
> +		if (ret)
> +			goto proxy_error;
> +
> +		/* check the reply from GSC */
> +		ret = xe_gsc_read_out_header(xe, &gsc->proxy.from_gsc, 0,
> +					     PROXY_HDR_SIZE, &rd_offset);
alan: nit: could we rename "rd_offset" to "gsc_payload_offset"? (i believe this
would be more descriptive of how it's used in the subsequent places in this loop)

> +		if (ret) {
> +			xe_gt_err(gt, "Invalid gsc header in proxy reply (%pe)\n",
> +				  ERR_PTR(ret));
alan: why the ERR_PTR conversion here? 
> +			goto proxy_error;
> +		}
> +
> +		/* copy the proxy header reply from GSC */
> +		xe_map_memcpy_from(xe, to_csme_hdr, &gsc->proxy.from_gsc,
> +				   rd_offset, PROXY_HDR_SIZE);
> +
> +		/* stop if this was the last message */
> +		if (FIELD_GET(GSC_PROXY_TYPE, to_csme_hdr->hdr) == GSC_PROXY_MSG_TYPE_PROXY_END)
> +			break;
> +
> +		/* make sure the GSC-to-CSME proxy header is sane */
> +		ret = validate_proxy_header(to_csme_hdr,
> +					    GSC_PROXY_ADDRESSING_GSC,
> +					    GSC_PROXY_ADDRESSING_CSME,
> +					    GSC_PROXY_BUFFER_SIZE - rd_offset);
> +		if (ret) {
> +			xe_gt_err(gt, "invalid GSC to CSME proxy header! (%pe)\n",
> +				  ERR_PTR(ret));
alan: why the ERR_PTR conversion here? 
> +			goto proxy_error;
> +		}
> +
> +		/* copy the rest of the message */
> +		size = FIELD_GET(GSC_PROXY_PAYLOAD_LENGTH, to_csme_hdr->hdr);
> +		xe_map_memcpy_from(xe, to_csme_payload, &gsc->proxy.from_gsc,
> +				   rd_offset + PROXY_HDR_SIZE, size);
> +
> +		/* send the GSC message to the CSME */
> +		ret = proxy_send_to_csme(gsc, size + PROXY_HDR_SIZE);
> +		if (ret < 0)
> +			goto proxy_error;
> +
> +		/* reply size from CSME, including the proxy header */
> +		size = ret;
> +		if (size < PROXY_HDR_SIZE) {
> +			xe_gt_err(gt, "CSME to GSC proxy msg too small: 0x%x\n", size);
alan: should we set ret = -EPROTO or other error?
This way, this function will returns an error, also it would ensure all paths in this
loop will either have ret = 0 or an error and simplify the end of the fucntion... i.e.
all cases can just "break" as opposed to "goto proxy_error" and just return 'ret' as is?
> +			goto proxy_error;
alan:snip
> +proxy_error:
> +	return ret < 0 ? ret : 0;
> +}
> +

alan:snip

> +void xe_gsc_proxy_remove(struct xe_gsc *gsc)
> +{
> +	struct xe_gt *gt = gsc_to_gt(gsc);
> +	struct xe_device *xe = gt_to_xe(gt);
> +
> +	if (gsc->proxy.component_added) {
> +		component_del(xe->drm.dev, &xe_gsc_proxy_component_ops);
alan: we ought to put the wait for the worker to complete since that worker
could take long time for the component to get added - we could get racy no?

> +		gsc->proxy.component_added = false;
> +	}
> +}
> +
> 
alan:snip

> diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h
> index 57fefd66a7ea..645166adae1f 100644
> --- a/drivers/gpu/drm/xe/xe_gsc_types.h
> +++ b/drivers/gpu/drm/xe/xe_gsc_types.h
> @@ -7,11 +7,15 @@
>  #define _XE_GSC_TYPES_H_
>  
>  #include <linux/workqueue.h>
> +#include <linux/iosys-map.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
alan: question: do we need to follow alphabetical for linux includes?
alan:snip
>