[Intel-xe] [PATCH 14/14] drm/xe: Add VM snapshot to xe_devcoredump.
Matthew Brost
matthew.brost at intel.com
Tue May 2 15:38:37 UTC 2023
On Wed, Apr 26, 2023 at 04:57:13PM -0400, Rodrigo Vivi wrote:
> With this patch, we now have some parity between xe_devcoredump
> and the simple_error_capture. The only difference is that
> xe_devcoredump will only stash the 'first' hang, which is the one
> that we care most about and should analyze first, while
> simple_error_capture will dump them all to the kernel log.
>
> But this is just a starting point for building a useful and
> organized crash dump, using standard infrastructure. Later this
> will be changed to have output that can be parsed by tools and
> used for error replay.
>
> Also, it is important to highlight that the goal is not to replace
> the simple_error_capture which is still useful for some cases.
> But simple_error_capture should be protected under DEBUG and
> EXPERT flags, while the devcoredump has its own production config
> and will be useful for bug reporting and for error replay.
>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
Again, maybe hold this off until after GPUVA, but LGTM. Also, one nit below.
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> ---
> drivers/gpu/drm/xe/xe_devcoredump.c | 6 ++++++
> drivers/gpu/drm/xe/xe_devcoredump_types.h | 3 +++
> 2 files changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index 1ffd12646a99..9dbafd586fbd 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -16,6 +16,7 @@
> #include "xe_guc_ct.h"
> #include "xe_guc_submit.h"
> #include "xe_hw_engine.h"
> +#include "xe_vm.h"
>
> /**
> * DOC: Xe device coredump
> @@ -103,6 +104,9 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
> for_each_hw_engine(hwe, e->gt, id)
> xe_hw_engine_snapshot_print(coredump->snapshot.hwe[id], &p);
>
> + drm_printf(&p, "\n**** VM ****\n");
> + xe_vm_snapshot_print(coredump->snapshot.vm, &p);
> +
> mutex_unlock(&coredump->lock);
>
> return count - iter.remain;
> @@ -124,6 +128,7 @@ static void xe_devcoredump_free(void *data)
> xe_guc_engine_snapshot_free(coredump->snapshot.ge);
> for_each_hw_engine(hwe, coredump->faulty_engine->gt, id)
> xe_hw_engine_snapshot_free(coredump->snapshot.hwe[id]);
> + xe_vm_snapshot_free(coredump->snapshot.vm);
>
> coredump->faulty_engine = NULL;
> drm_info(&coredump_to_xe(coredump)->drm,
> @@ -172,6 +177,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump)
> coredump->snapshot.hwe[id] = xe_hw_engine_snapshot_capture(hwe);
> }
>
> + coredump->snapshot.vm = xe_vm_snapshot_capture(e->vm, e->gt->info.id);
> xe_force_wake_put(gt_to_fw(e->gt), XE_FORCEWAKE_ALL);
> dma_fence_end_signalling(cookie);
> }
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> index 8b17ecf1b6e6..f508eca292f7 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
> +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> @@ -31,8 +31,11 @@ struct xe_devcoredump_snapshot {
> struct xe_guc_ct_snapshot *ct;
> /** @ge: Guc Engine snapshot */
> struct xe_guc_submit_engine_snapshot *ge;
> +
Nit: extra newline.
> /** @hwe: HW Engine snapshot array */
> struct xe_hw_engine_snapshot *hwe[XE_NUM_HW_ENGINES];
> + /** @vm: VM snapshot */
> + struct xe_vm_snapshot *vm;
> };
>
> /**
> --
> 2.39.2
>
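For readers following the thread, here is a minimal user-space sketch of the lifecycle the patch wires up: capture a VM snapshot on the first hang only, print it under a "**** VM ****" header on read, and release it on free. It is an illustration under stated assumptions, not the xe driver code: every type, field, and function name below (struct vm_snapshot, struct coredump, coredump_capture() and so on) is a hypothetical stand-in, and the snapshot contents (a GT id and a mapping count) are invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xe_vm_snapshot; the fields are invented. */
struct vm_snapshot {
	unsigned int gt_id;
	unsigned long num_mappings;
};

/* Hypothetical stand-in for struct xe_devcoredump. */
struct coredump {
	bool captured;           /* true once the first hang has been stashed */
	struct vm_snapshot *vm;  /* may be NULL if the capture allocation failed */
};

/* Allocate and fill a snapshot; returns NULL on allocation failure. */
struct vm_snapshot *vm_snapshot_capture(unsigned int gt_id,
					unsigned long num_mappings)
{
	struct vm_snapshot *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->gt_id = gt_id;
	s->num_mappings = num_mappings;
	return s;
}

/* Print a snapshot into buf, tolerating a NULL snapshot. */
int vm_snapshot_print(const struct vm_snapshot *s, char *buf, size_t len)
{
	if (!s)
		return snprintf(buf, len, "<no VM snapshot>\n");
	return snprintf(buf, len, "gt%u: %lu mappings\n",
			s->gt_id, s->num_mappings);
}

/* Free a snapshot; free(NULL) is a no-op, so NULL is accepted. */
void vm_snapshot_free(struct vm_snapshot *s)
{
	free(s);
}

/*
 * Stash only the first hang: returns true if this hang was captured,
 * false if an earlier capture is still being held.
 */
bool coredump_capture(struct coredump *cd, unsigned int gt_id,
		      unsigned long num_mappings)
{
	if (cd->captured)
		return false;
	cd->vm = vm_snapshot_capture(gt_id, num_mappings);
	cd->captured = true;
	return true;
}

/* Emit the section header followed by the snapshot contents. */
int coredump_read(const struct coredump *cd, char *buf, size_t len)
{
	int n = snprintf(buf, len, "\n**** VM ****\n");

	if (n < 0 || (size_t)n >= len)
		return n;
	return n + vm_snapshot_print(cd->vm, buf + n, len - (size_t)n);
}

/* Release the snapshot and re-arm the dump for the next hang. */
void coredump_free(struct coredump *cd)
{
	vm_snapshot_free(cd->vm);
	cd->vm = NULL;
	cd->captured = false;
}
```

In this sketch a second coredump_capture() call is ignored until coredump_free() has run, which mirrors the "first hang only" behaviour described in the commit message. The print and free helpers accept a NULL snapshot so that a failed capture does not need special handling at the call sites; whether the real helpers behave that way is not stated in this thread.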