[PATCH v2] drm/msm: Update global fault counter when faulty process has already ended

Fri Aug 15 20:10:35 UTC 2025

Hi,

Gentle ping on this patch.
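
For anyone skimming the thread, the accounting the quoted hunk aims for can
be sketched as a small userspace model. This is only an illustration: the
model_* types and helpers are hypothetical stand-ins rather than the real
msm structures, task_alive stands in for get_pid_task() returning non-NULL,
the NULL check on the VM is an extra guard in the model, and the way the
reported value is summed is an assumption about how MSM_PARAM_FAULTS is
computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the driver structures. */
struct model_vm    { int faults; bool managed; bool unusable; };
struct model_queue { int faults; };
struct model_gpu   { int global_faults; };

struct model_submit {
	struct model_queue *queue;
	struct model_vm *vm;
	bool task_alive;	/* stands in for get_pid_task() != NULL */
};

/* Per-queue counter always; then global if the task is gone, else per-VM. */
static void model_count_fault(struct model_gpu *gpu, struct model_submit *submit)
{
	assert(gpu && submit && submit->queue);

	submit->queue->faults++;

	if (!submit->task_alive) {
		gpu->global_faults++;
	} else if (submit->vm) {	/* NULL guard is an addition in this model */
		submit->vm->faults++;
		/* Userspace-managed VMs are marked unusable after a fault. */
		if (!submit->vm->managed)
			submit->vm->unusable = true;
	}
}

/* Assumed shape of a MSM_PARAM_FAULTS-style query: global plus per-VM. */
static int model_reported_faults(const struct model_gpu *gpu,
				 const struct model_vm *vm)
{
	return gpu->global_faults + (vm ? vm->faults : 0);
}
```

In this model, every fault is counted exactly once in the reported total,
whether or not the submitting process is still around.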

Best Regards,
- Maíra

On 7/20/25 18:42, Maíra Canal wrote:
> The global fault counter has been unused since commit 12578c075f89
> ("drm/msm/gpu: Skip retired submits in recover worker"). However, it's
> still needed to account for GPU faults that occur after the faulting
> process has already ended.
> 
> Hence, increment the global fault counter when the submitting process
> has already ended. This way, the number of faults returned by
> MSM_PARAM_FAULTS will stay consistent.
> 
> While here, s/unusuable/unusable.
> 
> Fixes: 12578c075f89 ("drm/msm/gpu: Skip retired submits in recover worker")
> Signed-off-by: Maíra Canal <mcanal at igalia.com>
> ---
> 
> v1 -> v2: https://lore.kernel.org/dri-devel/20250714230813.46279-1-mcanal@igalia.com/T/
> 
> * Don't delete the global fault counter, but instead, increment it when we
> 	get a fault after the faulting process has ended (Rob Clark)
> * Rewrite the commit message based on the changes.
> 
>   drivers/gpu/drm/msm/msm_gpu.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index c317b25a8162..416d47185ef0 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -465,6 +465,7 @@ static void recover_worker(struct kthread_work *work)
>   	struct msm_gem_submit *submit;
>   	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
>   	char *comm = NULL, *cmd = NULL;
> +	struct task_struct *task;
>   	int i;
>   
>   	mutex_lock(&gpu->lock);
> @@ -482,16 +483,20 @@ static void recover_worker(struct kthread_work *work)
>   
>   	/* Increment the fault counts */
>   	submit->queue->faults++;
> -	if (submit->vm) {
> +
> +	task = get_pid_task(submit->pid, PIDTYPE_PID);
> +	if (!task)
> +		gpu->global_faults++;
> +	else {
>   		struct msm_gem_vm *vm = to_msm_vm(submit->vm);
>   
>   		vm->faults++;
>   
>   		/*
>   		 * If userspace has opted-in to VM_BIND (and therefore userspace
> -		 * management of the VM), faults mark the VM as unusuable.  This
> +		 * management of the VM), faults mark the VM as unusable. This
>   		 * matches vulkan expectations (vulkan is the main target for
> -		 * VM_BIND)
> +		 * VM_BIND).
>   		 */
>   		if (!vm->managed)
>   			msm_gem_vm_unusable(submit->vm);