[PATCH] drm/xe/pxp: Don't kill queues while holding the spinlock

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Thu Feb 13 17:23:26 UTC 2025



On 2/12/2025 10:42 PM, Dan Carpenter wrote:
> On Wed, Feb 12, 2025 at 05:26:55PM -0800, Matthew Brost wrote:
>> On Wed, Feb 12, 2025 at 04:40:32PM -0800, Daniele Ceraolo Spurio wrote:
>>> xe_exec_queue_kill can sleep, so we can't call it from under the lock.
>>> We can instead move the queues to a separate list and then kill them all
>>> after we release the lock.
>>>
>>> Since being in the list is used to track whether RPM cleanup is needed,
>>> we can no longer defer that to queue_destroy, so we perform it
>>> immediately instead.
>>>
>>> Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
>>> Fixes: f8caa80154c4 ("drm/xe/pxp: Add PXP queue tracking and session start")
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Patch LGTM but can this actually happen though? i.e. Can or do we enable
>> PXP on LR queues?
>>
> This isn't really an answer to your question, but when I reported this
> bug I didn't notice the if (xe_vm_in_preempt_fence_mode()) check in
> xe_vm_remove_compute_exec_queue().  So it's possible that this was a
> false positive?

We currently don't have a use-case where we need a vm in 
preempt_fence_mode for a queue that uses PXP, but I didn't block the 
combination because there is a chance we might want to use it in the 
future (compute PXP is supported by the HW, even if we don't currently 
support it in Xe), so a user can still set things up that way.

>
>> Also as a follow should be add a might_sleep() to xe_exec_queue_kill to
>> catch this type of bug immediately?
> There is a might_sleep() in down_write().  If this is a real bug that
> would have caught it.  The problem is that people don't generally test
> with CONFIG_DEBUG_ATOMIC_SLEEP so the might_sleep() calls are turned off.

We do have CONFIG_DEBUG_ATOMIC_SLEEP enabled in CI (and I have it 
locally since I use the CI config), but since PXP + preempt_fence_mode 
is not an expected use-case we don't have any tests that cover that 
combination, so we return early from that 
xe_vm_remove_compute_exec_queue() and don't hit the 
down_write/might_sleep. I'll see if I can add a test to cover it, as 
there might be other issues I've missed.
Also, I don't think it'd be right to add a might_sleep at the top of the 
exec_queue_kill() function either, because if a caller is sure that 
xe_vm_in_preempt_fence_mode() is false they should be allowed to 
call exec_queue_kill() from atomic context.

Thanks,
Daniele

>
> regards,
> dan carpenter
>



More information about the Intel-xe mailing list