[PATCH v6 07/11] drm/xe: Assert runnable state in handle_sched_done

Thu Jun 13 00:54:30 UTC 2024

On 6/12/2024 15:27, Matthew Brost wrote:
> On Wed, Jun 12, 2024 at 02:25:40PM -0700, John Harrison wrote:
>> On 6/11/2024 07:40, Matthew Brost wrote:
>>> Ensure G2H and KMD GuC machine match.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_guc_submit.c | 9 +++++++--
>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> index afd22a8d815d..ab0dc93d7740 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> @@ -1592,16 +1592,21 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>>>    	xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
>>>    }
>>> -static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q)
>>> +static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>>> +			      u32 runnable_state)
>>>    {
>>>    	trace_xe_exec_queue_scheduling_done(q);
>>>    	if (exec_queue_pending_enable(q)) {
>>> +		xe_gt_assert(guc_to_gt(guc), runnable_state == 1);
>>> +
>>>    		q->guc->resume_time = ktime_get();
>>>    		clear_exec_queue_pending_enable(q);
>>>    		smp_wmb();
>>>    		wake_up_all(&guc->ct.wq);
>>>    	} else {
>>> +		xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
>>> +
>> Isn't this the wrong way around?
>>
> These asserts are per my testing and CI.
>
>> You made an earlier comment that sounded like it is legal for an enable to
>> be queued while a disable is still pending? If so, then you would get in
> Other way around, a disable can be sent when an enable is still in
> flight in the case of a preempt fence.
>
> Enables cannot be issued when a pending disable is in flight.
>
> So I believe this patch is correct.
It might work but it does not feel correct.

On receipt of a disable notification, the code first checks to see if 
there is a pending enable. That just seems backwards. It is more logical 
to process the message according to the message type actually received 
rather than according to the message type that is expected.

If there is ever a valid reason for sending back to back 
disable-then-enable, then this will break. Whereas, coding it according 
to the notification type rather than the internal state would allow that 
sequence to work just fine.

As I mentioned earlier, this code is basically broken in i915 and can't 
be fixed without a significant re-write of the upper layers. It would be 
greatly preferable to do it properly in Xe.

John.

>
> Matt
>
>> here for the disable notification but with both enable_pending and
>> disable_pending flags set. That would hit the assert. Whereas, if the if
>> checks the runnable_state parameter and the assert then checks for pending,
>> you will not hit the assert and the code will do the correct thing.
>>
>> John.
>>
>>>    		clear_exec_queue_pending_disable(q);
>>>    		if (q->guc->suspend_pending) {
>>>    			suspend_fence_signal(q);
>>> @@ -1640,7 +1645,7 @@ int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>    		return -EPROTO;
>>>    	}
>>> -	handle_sched_done(guc, q);
>>> +	handle_sched_done(guc, q, runnable_state);
>>>    	return 0;
>>>    }