[PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Tue Apr 23 15:14:57 UTC 2019


On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote:
> +Monk to response this patch.
>
>
> 在 2019/4/18 23:00, Andrey Grodzovsky 写道:
>> For later driver's reference to see if the fence is signaled.
>>
>> v2: Move parent fence put to resubmit jobs.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> Reviewed-by: Christian König <christian.koenig at amd.com>
>> ---
>>    drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++--
>>    1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 7816de7..03e6bd8 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    		if (s_job->s_fence->parent &&
>>    		    dma_fence_remove_callback(s_job->s_fence->parent,
>>    					      &s_job->cb)) {
>> -			dma_fence_put(s_job->s_fence->parent);
>> -			s_job->s_fence->parent = NULL;
> I vaguely remember Monk set parent to be NULL to avoiod potiential free
> problem after callback removal.
>
>
> -David

I see, we have to avoid setting it to NULL here as in case the guilty 
job does signal and we avoid HW reset we are not going to resubmit the 
jobs and hence stay with the same parent on reattachment of the cb. So I 
need to know exactly what scenario this set to NULL fixes.

Andrey


>
>
>>    			atomic_dec(&sched->hw_rq_count);
>>    		} else {
>>    			/*
>> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    				sched->ops->free_job(s_job);
>>    		}
>>    	}
>> +
>> +	/*
>> +	 * Stop pending timer in flight as we rearm it in  drm_sched_start. This
>> +	 * avoids the pending timeout work in progress to fire right away after
>> +	 * this TDR finished and before the newly restarted jobs had a
>> +	 * chance to complete.
>> +	 */
>> +	cancel_delayed_work(&sched->work_tdr);
>>    }
>>    
>>    EXPORT_SYMBOL(drm_sched_stop);
>> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>    		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
>>    			dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>    
>> +		dma_fence_put(s_job->s_fence->parent);
>>    		s_job->s_fence->parent = sched->ops->run_job(s_job);
>>    	}
>>    }


More information about the amd-gfx mailing list