[PATCH v1 1/1] drm/xe/guc: Fix missing init value and add register order check
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Thu Oct 31 22:38:54 UTC 2024
Thanks for the clarification this morning - as such, i should have reviewed sooner.
On Wed, 2024-10-23 at 12:23 -0700, Zhanjun Dong wrote:
> Fix missing initial value for last_value.
> For GuC capture register definition, it is required to define 64bit
> register in a pair of 2 consecutive 32bit register entries, low first,
> then hi. Add code to check this order.
>
> Fixes: 0f1fdf559225 ("drm/xe/guc: Save manual engine capture into capture list")
>
alan:Should the fixes tag be applied to this instead? ->
ecb633646391 ("drm/xe/guc: Plumb GuC-capture into dev coredump")
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_capture.c | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c
> index 8b6cb786a2aa..d7ff7dd60a1d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_capture.c
> +++ b/drivers/gpu/drm/xe/xe_guc_capture.c
> @@ -102,6 +102,7 @@ struct __guc_capture_parsed_output {
> * A 64 bit register define requires 2 consecutive entries,
> * with low dword first and hi dword the second.
> * 2. Register name: null for incompleted define
> + * 3. Incorrect order will trigger XE_WARN.
> */
> #define COMMON_XELP_BASE_GLOBAL \
> { FORCEWAKE_GT, REG_32BIT, 0, 0, "FORCEWAKE_GT"}
> @@ -1678,10 +1679,10 @@ snapshot_print_by_list_order(struct xe_hw_engine_snapshot *snapshot, struct drm_
> struct xe_devcoredump *devcoredump = &xe->devcoredump;
> struct xe_devcoredump_snapshot *devcore_snapshot = &devcoredump->snapshot;
> struct gcap_reg_list_info *reginfo = NULL;
> - u32 last_value, i;
> - bool is_ext;
> + u32 i, last_value = 0;
> + bool is_ext, low32_ready = false;
>
> - if (!list || list->num_regs == 0)
> + if (!list || !list->list || list->num_regs == 0)
> return;
> XE_WARN_ON(!devcore_snapshot->matched_node);
>
> @@ -1706,11 +1707,27 @@ snapshot_print_by_list_order(struct xe_hw_engine_snapshot *snapshot, struct drm_
> value = reg->value;
> if (reg_desc->data_type == REG_64BIT_LOW_DW) {
> last_value = value;
> +
> + /*
> + * A 64 bit register define requires 2 consecutive
> + * entries in register list, with low dword first
> + * and hi dword the second, like:
> + * { XXX_REG_LO(0), REG_64BIT_LOW_DW, 0, 0, NULL},
> + * { XXX_REG_HI(0), REG_64BIT_HI_DW, 0, 0, "XXX_REG"},
> + *
> + * Incorrect order will trigger XE_WARN.
> + */
> + XE_WARN_ON(low32_ready); /* Possible double low here */
> + low32_ready = true;
> /* Low 32 bit dword saved, continue for high 32 bit */
> continue;
> } else if (reg_desc->data_type == REG_64BIT_HI_DW) {
> u64 value_qw = ((u64)value << 32) | last_value;
alan: (just a comment) i see that we continue to print the values out irrespective
of ordering issue, but i think that's perfectly fine since an attempt to mitigate
could be completely wrong without knowing how the last developer incorrectly
modified the reglist. So this is fine.
>
> + /* Incorrect 64bit register order. Possible missing low */
> + XE_WARN_ON(!low32_ready);
alan: perhaps we should catch errors in the opposite direction.
so perhaps we need something like the following before the first if(LOW_DW) check above?
if (low32_ready && reg_desc->data_type != REG_64BIT_HI_DW) {
/* Incorrect register order: higher-DW not following lower-DW */
XE_WARN_ON(!low32_ready);
low32_ready = false;
}
> + low32_ready = false;
> +
> drm_printf(p, "\t%s: 0x%016llx\n", reg_desc->regname, value_qw);
> continue;
> }
> @@ -1727,6 +1744,9 @@ snapshot_print_by_list_order(struct xe_hw_engine_snapshot *snapshot, struct drm_
> drm_printf(p, "\t%s: 0x%08x\n", reg_desc->regname, value);
> }
> }
> +
> + /* Incorrect 64bit register order. Possible missing high */
> + XE_WARN_ON(low32_ready);
> }
>
> /**
More information about the Intel-xe
mailing list