[PATCH] drm/xe/oa: Disallow OA from being enabled on active exec_queue's

Thu Nov 21 22:06:56 UTC 2024

On Tue, Nov 19, 2024 at 01:08:49PM -0800, Dixit, Ashutosh wrote:
>On Tue, 19 Nov 2024 06:44:51 -0800, Matthew Brost wrote:
>>
>> On Mon, Nov 18, 2024 at 05:32:56PM -0800, Ashutosh Dixit wrote:
>> > Enabling OA on an exec_queue toggles the OAC_CONTEXT_ENABLE bit in
>> > CTXT_SR_CTL register. Toggling this bit changes the size and layout of the
>> > underlying HW context image. Therefore, enabling OA on an already active
>> > exec_queue (as currently implemented in xe) is an invalid operation and can
>> > cause hangs. Therefore, disallow OA from being enabled on active
>> > exec_queue's (here, by active we mean a context on which submissions have
>> > previously happened).
>> >
>>
>> This is something we will need to keep an eye on then, because in various
>> experimental code I've played around with enabling exec queues upon creation,
>> e.g., if we want to allocate a doorbell. I seem to recall Habana wanting
>> to enable exec queues upon creation too.
>
>The real requirement here is that HW context image should not have been
>loaded before OA is enabled on the exec queue. That is what happens today
>in the ENABLED state, correct, when user space submissions start?
>
>If operations such as doorbell are only management requests to GuC (which
>don't cause HW context image to be loaded) and if we can name a new state
>when the exec queue is handed off to userspace for starting submissions, we
>should be able to stay with this approach.
>
>> Just curious if it was ever explored having an exec queue creation
>> extension which enables OA? It seems like this is something we may need
>> at some point if our exec queue creation semantics change, of course
>> being careful to not break existing flows.
>
>Yeah I did think of it but didn't want to change the uapi.
>
>Also, a different implementation is possible which avoids this resizing of
>the context image altogether. It requires the kernel OA code to submit its
>work on the user exec queue (and use that exec queue's VM; currently the
>OA code uses a kernel exec queue). There are some reasons I don't want to
>implement that just yet, but worst case, we can do that if absolutely
>needed.
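
For reference, the rule the patch describes (refuse to enable OA once a context has seen submissions) can be sketched roughly as below. This is only an illustration: the struct, the function names and the -EBUSY return value are hypothetical stand-ins, not the xe driver's actual types or error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an exec queue; not the xe driver's struct. */
struct sketch_exec_queue {
	bool hw_ctx_loaded;	/* set once a submission has happened */
	bool oa_enabled;
};

/* Model a submission: after this the HW context image layout is fixed. */
static void sketch_submit(struct sketch_exec_queue *q)
{
	assert(q != NULL);
	q->hw_ctx_loaded = true;
}

/*
 * Enable OA only if the HW context image cannot have been loaded yet.
 * -EBUSY is a placeholder error code chosen for this sketch.
 */
static int sketch_oa_enable(struct sketch_exec_queue *q)
{
	assert(q != NULL);
	if (q->hw_ctx_loaded)
		return -EBUSY;

	q->oa_enabled = true;
	return 0;
}
```

The point of keying on "has the context image been loaded" rather than on queue creation is that management-only operations (such as a doorbell allocation) would not trip the check.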

Just curious about why we are not going with the LRI command to set this
bit in the context control. I think it should just be an MI_LRI
submitted to the ring directly, more along the lines of
xe_lrc_write_ring(). If that works, then we can avoid this patch and
also revert the one that Jose had added as a workaround.
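
To make the suggestion concrete, here is a rough sketch of what such a command stream might look like. It only shows the dword layout and says nothing about whether the hardware tolerates the toggle on a live context. The register offset, the bit position, the masked-write convention and every sk_* helper below are assumptions for illustration and would need checking against the driver headers; sk_ring_write() is a stand-in, not xe_lrc_write_ring() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings; verify against the driver's register headers. */
#define SK_MI_LRI(n)		((0x22u << 23) | (2u * (n) - 1u))
#define SK_MI_NOOP		0u
#define SK_CTX_CONTROL_OFFSET	0x2244u		/* assumed: base 0x2000 + 0x244 */
#define SK_OAC_CONTEXT_ENABLE	(1u << 8)	/* assumed bit position */

/* Masked register: the upper 16 bits select which lower bits get written. */
#define SK_MASKED_ENABLE(b)	(((b) << 16) | (b))
#define SK_MASKED_DISABLE(b)	((b) << 16)

/* Hypothetical miniature ring, standing in for the real LRC ring. */
struct sk_ring {
	uint32_t dw[16];
	size_t tail;	/* next free dword */
};

/* Copy n dwords to the ring tail; the caller guarantees there is space. */
static void sk_ring_write(struct sk_ring *r, const uint32_t *cmd, size_t n)
{
	assert(r->tail + n <= sizeof(r->dw) / sizeof(r->dw[0]));
	for (size_t i = 0; i < n; i++)
		r->dw[r->tail++] = cmd[i];
}

/* Emit one MI_LRI that sets or clears the OAC enable bit; returns dwords used. */
static size_t sk_emit_oac_enable(struct sk_ring *r, int enable)
{
	const uint32_t cmd[4] = {
		SK_MI_LRI(1),
		SK_CTX_CONTROL_OFFSET,
		enable ? SK_MASKED_ENABLE(SK_OAC_CONTEXT_ENABLE)
		       : SK_MASKED_DISABLE(SK_OAC_CONTEXT_ENABLE),
		SK_MI_NOOP,	/* pad to an even number of dwords */
	};

	sk_ring_write(r, cmd, 4);
	return 4;
}
```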

Regards,
Umesh

>
>Thanks.
>--
>Ashutosh