[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)

Wed May 29 04:55:00 UTC 2024

On Tue, 28 May 2024 00:43:42 -0700, Lionel Landwerlin wrote:
>

Hi Lionel/Jose,

> On 28/05/2024 10:16, Dixit, Ashutosh wrote:
> > On Mon, 27 May 2024 23:39:50 -0700, Lionel Landwerlin wrote:
> >> On 28/05/2024 09:17, Dixit, Ashutosh wrote:
> >>> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
> >>>
> >>>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
> >>>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
> >>>>>
> >>>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
> >>>>>>> +			     struct xe_oa_open_param *param)
> >>>>>>> +{
> >>>>> /snip/
> >>>>>
> >>>>>>> +	stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
> >>>>>>> +						BIT(stream->hwe->logical_instance), 1,
> >>>>>>> +						stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
> >>>>>> Hi Ashutosh,
> >>>>>>
> >>>>>> On i915 the changes of configuration were pipelined in the application's
> >>>>>> execution just like any other submission.
> >>>>>>
> >>>>>> Creating another queue completely unsynchronized from the application's
> >>>>>> submissions makes this non usable in my opinion.
> >>>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
> >>>>> through stream properties, which can use to synchronize OA programming with
> >>>>> workload submisson.
> >>>>>
> >>>>> Would that not work? If not, we can do what was done in i915. But note that
> >>>>> i915 still has unresolved hangs, which I believe are due to the spinner
> >>>>> running on the application engine (iirc repeatedly opening/closing an OA
> >>>>> stream will hang in i915, though it could be due to other i915
> >>>>> complexity). That is why thought using drm_xe_sync array is both safer and
> >>>>> more standard way of doing what we want to achieve.
> >>>>>
> >>>>> Basically the output sync object will be signalled after registers are
> >>>>> programmed and also any additional OA programming delay (which is
> >>>>> implemented in i915 using the spinner).
> >>>>>
> >>>>> This would be done both for OA stream open and changing OA stream
> >>>>> configuration.
> >>> That is true. But now that I have other stuff like gpuvis wrapped up, I
> >>> plan to start looking these couple of missing uapi pieces (hold preemption
> >>> and synchronization, likely in that order).
> >>>
> >>> Because synchronization is not implemented I add the delay below:
> >>>
> >>>           static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
> >>>           {
> >>>           #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> >>>                   struct xe_oa_config_bo *oa_bo;
> >>>                   int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
> >>>
> >>>                   oa_bo = xe_oa_alloc_config_buffer(stream);
> >>>                   if (IS_ERR(oa_bo)) {
> >>>                           err = PTR_ERR(oa_bo);
> >>>                           goto exit;
> >>>                   }
> >>>
> >>>                   err = xe_oa_submit_bb(stream, oa_bo->bb);
> >>>
> >>>                   /* Additional empirical delay needed for NOA programming after registers are written */
> >>>                   usleep_range(us, 2 * us);
> >>>           exit:
> >>>                   return err;
> >>>           }
> >>>
> >>> I need understand this is temporaty band-aid, since it stalls the
> >>> submission pipeline and needs to be replaced by proper synchronization.
> >>>
> >>>> Just letting you know, because we cannot use the current ioctl because it
> >>>> doesn't behave as we expect
> >>> You wouldn't be able to merge the Mesa PR as per the current uapi now and
> >>> then add additional Mesa patches, when we implement these couple of missing
> >>> uapi features in KMD?
> >>
> >> We could merge only the stuff that parse the reports, that's enough to have
> >> perfetto work.
> > Hi Lionel, so this means the OAG buffer stuff, correct?
>
> Correct
>
> >> But all of the VK_KHR_performance_query won't work as far as I can tell.
> > And this is the OAR stuff, synchronization and hold preemption?
> >
>
> This is actually a mix of OAG/OAR. We do read a couple of OAG registers
> that don't come as part of MI_REPORT_PERF_COUNT.
>
> We will need a way to tell when you've added the feature we need for
> VK_KHR_performance_query.
>
> I don't think we want to test this at runtime given some of the ioctl
> take time.

Hmm, you mean like a version? We have capability bits in Xe and could
potentially add a bit if needed. It is in the oa_unit query so like init
time, not runtime.

> > Yes even if we can merge the OAG stuff first, that will get most of KMD OA
> > code merged and then we can merge the remaining stuff later. Since it's a
> > pain to maintain the code out of tree, that's why even if I can merge the
> > OAG stuff (together with Mesa) which is the majority of the OA KMD code
> > anyway, that would be a big relief.

So, do you think you Mesa guys can give us an ack to merge the uapi we have
now? And add the missing uapi bits in the second round of OA uapi changes?
Or is there anything else you want us to do in this first round?

For the second round, here is the list I have:

* hold preemption (needs some investigation as to feasibility)
* xe_sync for OA stream configuration and reconfiguration
* Equivalent of DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID. I am still trying
  to figure out if this is really needed. See here:

  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29312#note_2429650

  Since this seems to be using some MDAPI metrics in Mesa, I think it
  should be ok to do this in the second round too.

Thanks.
--
Ashutosh