[PATCH v3 4/5] drm/xe: Wait on killed exec queues

Lucas De Marchi lucas.demarchi at intel.com
Mon Nov 4 17:54:04 UTC 2024


On Mon, Nov 04, 2024 at 08:29:40AM -0800, Matthew Brost wrote:
>On Mon, Nov 04, 2024 at 06:38:14AM -0800, Lucas De Marchi wrote:
>> When an exec queue is killed it triggers an async process of asking the
>> GuC to schedule the context out. The timestamp in the context image is
>> only updated when this process completes. In case a userspace process
>> kills an exec and tries to read the timestamp, it may not get an updated
>> runtime.
>>
>> Add synchronization between the process reading the fdinfo and the exec
>> queue being killed. After reading all the timestamps, wait on exec
>> queues in the process of being killed. When that wait is over,
>> xe_exec_queue_fini() was already called and updated the timestamps.
>>
>> v2: Do not update pending_removal before validating user args
>>     (Matthew Auld)
>>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
>>  drivers/gpu/drm/xe/xe_drm_client.c   | 7 +++++++
>>  drivers/gpu/drm/xe/xe_exec_queue.c   | 6 ++++++
>>  3 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index cb193234c7eca..ed6b34d4a8030 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -605,6 +605,11 @@ struct xe_file {
>>  		 * which does things while being held.
>>  		 */
>>  		struct mutex lock;
>> +		/**
>> +		 * @exec_queue.pending_removal: items pending to be removed to
>> +		 * synchronize GPU state update with ongoing query.
>> +		 */
>> +		atomic_t pending_removal;
>
>Would the interface in 'linux/completion.h' be better here?

I think it'd be counter-intuitive in this context

1) most of the time it's "done" rather than "pending", which is the opposite 
    of the completion interface - note that you can't call complete_all()
    multiple times

2) we'd still need a counter to support multiple queues being killed if
    we keep it on xef. Imagine userspace did:

    	for q in queues:
		ioctl_wrapper(fd, DRM_XE_EXEC_QUEUE_DESTROY, ...)

Not clear how you'd do the complete_all() thought. Could probably be
more easily done if the completion was in the exec_queue, but the task
reading the fdinfo doesn't have access since it's removed from the array

did you have a different way to integrate the complete in mind?

after writing this I'm thinking the wait could be the first thing when
reading the fdinfo, even before  taking the force wake. That would
synchronize well with the scenarion in which it's the same task doing
the destroy/read-fdinfo and for others it wouldn't really matter.

Lucas De Marchi


More information about the Intel-xe mailing list