[Mesa-dev] [PATCH] anv: Add an option to abort on device loss

Lionel Landwerlin lionel.g.landwerlin at intel.com
Thu May 18 21:04:46 UTC 2017


This looks good, but I wonder whether we're missing a vk_errorf() in
anv_QueueSubmit() when we get an error from anv_cmd_buffer_execbuf().
If that error is returned directly instead of going through vk_errorf(),
the new check in __vk_errorf() never runs and we won't abort.
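A minimal sketch of the concern above, assuming a heavily simplified model of the driver. The names `fake_device`, `fake_execbuf`, `report_error`, `submit_direct` and `submit_reported` are hypothetical stand-ins, not the real anv structures, and a local enum replaces `VkResult` so the example builds without Vulkan headers. It shows that an abort hook placed in the error-reporting helper only fires for errors actually routed through that helper; here the hook just counts hits instead of calling abort() so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for VkResult so the sketch needs no Vulkan headers. */
typedef enum {
   SKETCH_SUCCESS = 0,
   SKETCH_ERROR_DEVICE_LOST = -4, /* same value as VK_ERROR_DEVICE_LOST */
} sketch_result;

/* Hypothetical, heavily simplified device state. */
struct fake_device {
   bool lost; /* plays the role of the device_lost flag */
};

/* Counts how often the "abort on device loss" hook would have fired.
 * The patch calls abort() at the equivalent point instead. */
static int abort_hook_hits = 0;

/* Stand-in for __vk_errorf(): every error routed through here gets a
 * chance to trigger the hook. */
static sketch_result report_error(sketch_result error, const char *what)
{
   fprintf(stderr, "error: %s\n", what);
   if (error == SKETCH_ERROR_DEVICE_LOST)
      abort_hook_hits++;
   return error;
}

/* Stand-in for the execbuf call; `fail` simulates a GPU hang. */
static sketch_result fake_execbuf(bool fail)
{
   return fail ? SKETCH_ERROR_DEVICE_LOST : SKETCH_SUCCESS;
}

/* Submit path that returns the error directly: the hook never sees it. */
static sketch_result submit_direct(struct fake_device *dev, bool fail)
{
   if (dev->lost)
      return SKETCH_ERROR_DEVICE_LOST;
   if (fake_execbuf(fail) != SKETCH_SUCCESS) {
      dev->lost = true;
      return SKETCH_ERROR_DEVICE_LOST;
   }
   return SKETCH_SUCCESS;
}

/* Submit path that routes the failure through report_error(). */
static sketch_result submit_reported(struct fake_device *dev, bool fail)
{
   if (dev->lost)
      return SKETCH_ERROR_DEVICE_LOST;
   if (fake_execbuf(fail) != SKETCH_SUCCESS) {
      dev->lost = true;
      return report_error(SKETCH_ERROR_DEVICE_LOST, "execbuf failed");
   }
   return SKETCH_SUCCESS;
}
```

With the direct-return variant the caller still sees a device-lost result and the flag is set, but the hook counter stays at zero, which is the gap described above.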

On 18/05/17 21:51, Jason Ekstrand wrote:
> This is mostly for running in our CI system to prevent dEQP from
> continuing on to the next test if we get a GPU hang.  As it currently
> stands, dEQP uses the same VkDevice for almost all tests and if one of
> the tests hangs, we set the anv_device::device_lost flag and report
> VK_ERROR_DEVICE_LOST for all queue operations from that point forward
> without sending anything to the GPU.  dEQP will happily continue trying
> to run tests and reporting failures until it eventually gets a crash that
> forces the test runner to start over.  This circumvents the problem by
> just aborting the process if we ever get a GPU hang.  Since this is not
> the recommended behavior most of the time, we hide it behind an
> environment variable.
>
> Cc: Mark Janes <mark.a.janes at intel.com>
> ---
>   src/intel/vulkan/anv_util.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
> index ba91733..4b916e2 100644
> --- a/src/intel/vulkan/anv_util.c
> +++ b/src/intel/vulkan/anv_util.c
> @@ -30,6 +30,7 @@
>   
>   #include "anv_private.h"
>   #include "vk_enum_to_str.h"
> +#include "util/debug.h"
>   
>   /** Log an error message.  */
>   void anv_printflike(1, 2)
> @@ -95,5 +96,9 @@ __vk_errorf(VkResult error, const char *file, int line, const char *format, ...)
>         fprintf(stderr, "%s:%d: %s\n", file, line, error_str);
>      }
>   
> +   if (error == VK_ERROR_DEVICE_LOST &&
> +       env_var_as_boolean("ANV_ABORT_ON_DEVICE_LOSS", false))
> +      abort();
> +
>      return error;
>   }
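The patch relies on env_var_as_boolean() from Mesa's util/debug.h. The following is a hedged, standalone sketch of a parser with roughly the semantics one would expect from such a helper: an unset variable falls back to the default, and common spellings of true and false are recognised case-insensitively. The names `parse_bool_str` and `parse_bool_env`, and the exact set of accepted strings, are assumptions for illustration, not a copy of Mesa's implementation. Only standard C is used, so the case-insensitive compare is written out by hand.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Case-insensitive string equality using only standard C. */
static bool equals_ignore_case(const char *a, const char *b)
{
   while (*a && *b) {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return false;
      a++;
      b++;
   }
   return *a == *b; /* true only if both strings ended together */
}

/* Hypothetical helper: map a string to a boolean, or return default_value
 * when the string is NULL (variable unset) or not recognised. */
static bool parse_bool_str(const char *str, bool default_value)
{
   if (str == NULL)
      return default_value;
   if (equals_ignore_case(str, "1") || equals_ignore_case(str, "true") ||
       equals_ignore_case(str, "yes"))
      return true;
   if (equals_ignore_case(str, "0") || equals_ignore_case(str, "false") ||
       equals_ignore_case(str, "no"))
      return false;
   return default_value;
}

/* Hypothetical environment wrapper, similar in spirit to
 * env_var_as_boolean(name, default_value). */
static bool parse_bool_env(const char *name, bool default_value)
{
   return parse_bool_str(getenv(name), default_value);
}
```

With semantics like these, setting ANV_ABORT_ON_DEVICE_LOSS=1 (or true/yes) in the CI runner's environment would enable the abort, while leaving it unset keeps the `false` default passed in the patch.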




More information about the mesa-dev mailing list