[igt-dev] [PATCH] xe_exec_reset: Fix cm-gt-reset for LR job behavior

Rodrigo Vivi rodrigo.vivi at intel.com
Wed Aug 23 19:45:07 UTC 2023


On Tue, Aug 08, 2023 at 03:27:10PM -0700, Matthew Brost wrote:
> Long running jobs in Xe are not recoverable even if the job did not
> trigger the GT reset due to DRM scheduler not tracking LR jobs. Update
> cm-gt-reset to understand all LR jobs are lost after a GT reset.
> 
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
>  tests/xe/xe_exec_reset.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/xe/xe_exec_reset.c b/tests/xe/xe_exec_reset.c
> index dfbaa6035..e8faf6209 100644
> --- a/tests/xe/xe_exec_reset.c
> +++ b/tests/xe/xe_exec_reset.c
> @@ -622,8 +622,10 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		xe_exec(fd, &exec);
>  	}
>  
> -	if (flags & GT_RESET)
> +	if (flags & GT_RESET) {
>  		xe_force_gt_reset(fd, eci->gt_id);
> +		usleep(150000);	/* Let GT reset soak */

Do we really need this here? If so, why?

> +	}
>  
>  	if (flags & CLOSE_FD) {
>  		if (flags & CLOSE_ENGINES) {
> @@ -636,7 +638,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  		return;
>  	}
>  
> -	for (i = 1; i < n_execs; i++)
> +	for (i = 1; i < n_execs && !(flags & GT_RESET); i++)
>  		xe_wait_ufence(fd, &data[i].exec_sync, USER_FENCE_VALUE,
>  			       NULL, THREE_SEC);
>  
> @@ -644,7 +646,7 @@ test_compute_mode(int fd, struct drm_xe_engine_class_instance *eci,
>  	xe_vm_unbind_async(fd, vm, 0, 0, addr, bo_size, sync, 1);
>  	xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE, NULL, THREE_SEC);
>  
> -	for (i = 1; i < n_execs; i++)
> +	for (i = 1; i < n_execs && !(flags & GT_RESET); i++)
>  		igt_assert_eq(data[i].data, 0xc0ffee);
>  
>  	for (i = 0; i < n_engines; i++)
> -- 
> 2.34.1
> 
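
For reference, the !(flags & GT_RESET) term added to both loops does not
depend on i, so each loop either runs over every exec from 1 to
n_execs - 1 or is skipped entirely. A minimal standalone sketch of that
behavior follows; the flag values and count_fence_waits() are hypothetical
stand-ins for illustration, not the definitions or helpers in
xe_exec_reset.c:

```c
#include <assert.h>

/* Illustrative flag values only; the real ones are defined in xe_exec_reset.c. */
#define GT_RESET	(0x1 << 0)
#define CLOSE_FD	(0x1 << 1)

/*
 * Count how many per-exec fence waits the patched loop would issue.
 * The GT_RESET guard is loop-invariant, so the loop body runs for
 * every i in [1, n_execs) or not at all.
 */
static int count_fence_waits(int n_execs, unsigned int flags)
{
	int i, waits = 0;

	assert(n_execs >= 1);

	for (i = 1; i < n_execs && !(flags & GT_RESET); i++)
		waits++;	/* stands in for xe_wait_ufence() on data[i].exec_sync */

	return waits;
}
```

With n_execs = 4 this returns 3 when GT_RESET is clear and 0 when it is
set, which matches the commit message: no LR job is expected to complete
after a GT reset.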

