[Intel-xe] [PATCH 2/2] drm/xe: Add coredump to wa_bb timeouts

Rodrigo Vivi rodrigo.vivi at intel.com
Wed Oct 4 13:40:40 UTC 2023


On Tue, Sep 26, 2023 at 09:20:56PM +0000, Stuart Summers wrote:
> We're seeing some hangs during driver load on some platforms
> in CI which are hard to catch manually. As such, add the dump
> at the time of the hang.
> 
> Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 1aa44d4f9ac1..80ea076197e5 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -46,6 +46,7 @@
>  #include "xe_vm.h"
>  #include "xe_wa.h"
>  #include "xe_wopcm.h"
> +#include "xe_devcoredump.h"
>  
>  struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>  {
> @@ -187,8 +188,10 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)

Please note that xe_devcoredump doesn't have any kind of locking
mechanism, because it relies on the serialization of the gt_reset.

Once you start calling it from other places, we should probably
add some data protection there.
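
To illustrate the kind of protection I mean, here is a rough user-space
sketch, not driver code. The names coredump_slot and coredump_capture()
are made up for this example, and a pthread mutex stands in for whatever
lock the driver would actually use. The point is only that the "is there
already a snapshot?" check and the write happen under the same lock, so
two callers cannot both capture.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the per-device coredump state. */
struct coredump_slot {
	pthread_mutex_t lock;
	bool captured;
	char reason[32];
};

static struct coredump_slot demo_slot = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Returns true if this call took the snapshot, false if one already exists. */
static bool coredump_capture(struct coredump_slot *slot, const char *reason)
{
	bool took = false;

	pthread_mutex_lock(&slot->lock);
	if (!slot->captured) {
		/* First caller wins; later callers leave the snapshot untouched. */
		strncpy(slot->reason, reason, sizeof(slot->reason) - 1);
		slot->reason[sizeof(slot->reason) - 1] = '\0';
		slot->captured = true;
		took = true;
	}
	pthread_mutex_unlock(&slot->lock);

	return took;
}
```

With this shape, a second caller arriving while a snapshot is pending
simply gets false back and the first capture is preserved.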

But also, maybe we should define some kind of 'type' variable that
is an argument to xe_devcoredump() and that gets printed at the top
of the dump, so that we have a clear indication of whether a dump
comes from a gt_reset or from some other timeout?
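
Something along these lines, as a standalone sketch only. The enum, its
values, the helper names and the header text are all hypothetical; none
of them exist in the driver or in this patch. It just shows a source
type being turned into the first line of the dump.

```c
#include <stdio.h>

/* Hypothetical capture sources; not part of the posted patch. */
enum coredump_source {
	COREDUMP_SOURCE_GT_RESET,
	COREDUMP_SOURCE_WA_BB_TIMEOUT,
};

static const char *coredump_source_name(enum coredump_source src)
{
	switch (src) {
	case COREDUMP_SOURCE_GT_RESET:
		return "gt_reset";
	case COREDUMP_SOURCE_WA_BB_TIMEOUT:
		return "wa_bb timeout";
	}
	return "unknown";
}

/* Writes the first line of the dump into buf; returns snprintf()'s result. */
static int coredump_format_header(char *buf, size_t len,
				  enum coredump_source src)
{
	return snprintf(buf, len, "**** Capture source: %s ****\n",
			coredump_source_name(src));
}
```

The caller in emit_wa_job() would then pass the timeout value of the
type, and the gt_reset path would pass its own.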

Cc: Maarten

>  	xe_bb_free(bb, NULL);
>  	if (timeout < 0)
>  		return timeout;
> -	else if (!timeout)
> +	else if (!timeout) {
> +		xe_devcoredump(q);
>  		return -ETIME;
> +	}
>  
>  	return 0;
>  }
> -- 
> 2.34.1
> 

