[PATCH] drm/xe/oa: Disallow OA from being enabled on active exec_queue's

Fri Nov 22 02:38:36 UTC 2024

On Thu, 21 Nov 2024 14:06:56 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Tue, Nov 19, 2024 at 01:08:49PM -0800, Dixit, Ashutosh wrote:
> > On Tue, 19 Nov 2024 06:44:51 -0800, Matthew Brost wrote:
> >>
> >> On Mon, Nov 18, 2024 at 05:32:56PM -0800, Ashutosh Dixit wrote:
> >> > Enabling OA on an exec_queue toggles the OAC_CONTEXT_ENABLE bit in
> >> > CTXT_SR_CTL register. Toggling this bit changes the size and layout of the
> >> > underlying HW context image. Therefore, enabling OA on an already active
> >> > exec_queue (as currently implemented in xe) is an invalid operation and can
> >> > cause hangs. Therefore, disallow OA from being enabled on active
> >> > exec_queue's (here, by active we mean a context on which submissions have
> >> > previously happened).
> >> >
> >>
> >> This is something we will need to keep on eye on then because in various
> >> experimental code I've played around enabling exec queues upon creation.
> >> e.g., If we want to allocate a doorbell. I seem to recall Habana wanting
> >> to enable exec queues upon creation too.
> >
> > The real requirement here is that HW context image should not have been
> > loaded before OA is enabled on the exec queue. That is what happens today
> > in the ENABLED state, correct, when user space submissions start?
> >
> > If operations such as doorbell are only management requests to GuC (which
> > don't cause HW context image to be loaded) and if we can name a new state
> > when the exec queue is handed off to userspace for starting submissions, we
> > should be able to stay with this approach.
> >
> >> Just curious if it was ever explored having exec queue creation
> >> extension which enables OA? It seems like this is something we may need
> >> at some point if our exec queue creation semantics change of course
> >> being careful to not break existing flows.
> >
> > Yeah I did think of it but didn't want to change the uapi.
> >
> > Also, a different implementation is possible which avoids this resizing of
> > the context image altogether. It requires the kernel OA code submit its
> > submissions on the user exec queue (and use that exec queue's VM, currently
> > OA code uses a kernel exec queue). There are some reasons I don't want to
> > implement that just yet, but worst case, we can do that if absolutely
> > needed.
>
> Just curious about why we are not going with the LRI command to set this
> bit in the context control. I think it should just be an MI_LRI submitted
> to the ring directly. More in the lines of xe_lrc_write_ring(). If that
> works, then we can avoid this patch and also revert the one that Jose had
> added as a work around.

Hmm, xe_lrc_write_ring does a direct write to the context image
(xe_map_memcpy_to) and does not use an LRI command? Akin to
xe_lrc_write_ctx_reg, which is what you would use to toggle the bit.

My answer to why not do what I have outlined above, or what you are
suggesting, is basically that there is no UMD use case for it. Whatever is
there (submitting to a kernel exec queue on a kernel VM) is already
sufficient. Even this patch is not needed. But there is a hole which I want
to plug, in case someone submits to a exec queue on which OA has not been
already been enabled. Instead of seeing hangs, they will see an error
return.

Doing what I outlined above (submitting to the user exec queue), or doing
what you are suggesting (submitting directly to the ring), is not an
insignificant amount of work. E.g. submitting a MI_LRI command to the ring
requires allocating the batch buffer in the user/exec_queue VM. And even
then writing directly to the ring would race with the scheduler potentially
submitting simultaneously to the same ring (see guc_exec_queue_run_job).

We could do it if someone needed it, or if we had nothing better to do, but
neither of it is the case at present.

At present Mesa is the only user for OAR and they seem to be happy with
what we have currently in the kernel. It is also possible that the OAR
feature may be discontinued in the future. MDAPI doesn't find it
particularly useful and uses MMIO trigger e.g.

Note we are not changing the uapi. Uapi still supports enabling OAR on an
exec queue without restrictions. Just that the implementation backing that
general uapi is not there at present and the implementation which is there
is sufficient for current use cases.

So, in summary, we could do the general implementation if there were a need
for it, but that does not seem to be the case at present.

Thanks.
--
Ashutosh