[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)
Dixit, Ashutosh
ashutosh.dixit at intel.com
Wed May 29 06:11:33 UTC 2024
On Tue, 28 May 2024 21:55:00 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 28 May 2024 00:43:42 -0700, Lionel Landwerlin wrote:
> >
>
> Hi Lionel/Jose,
>
> > On 28/05/2024 10:16, Dixit, Ashutosh wrote:
> > > On Mon, 27 May 2024 23:39:50 -0700, Lionel Landwerlin wrote:
> > >> On 28/05/2024 09:17, Dixit, Ashutosh wrote:
> > >>> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
> > >>>
> > >>>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
> > >>>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
> > >>>>>
> > >>>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
> > >>>>>>> + struct xe_oa_open_param *param)
> > >>>>>>> +{
> > >>>>> /snip/
> > >>>>>
> > >>>>>>> + stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
> > >>>>>>> + BIT(stream->hwe->logical_instance), 1,
> > >>>>>>> + stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
> > >>>>>> Hi Ashutosh,
> > >>>>>>
> > >>>>>> On i915 the changes of configuration were pipelined in the application's
> > >>>>>> execution just like any other submission.
> > >>>>>>
> > >>>>>> Creating another queue completely unsynchronized from the application's
> > >>>>>> submissions makes this non usable in my opinion.
> > >>>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
> > >>>>> through stream properties, which can use to synchronize OA programming with
> > >>>>> workload submisson.
> > >>>>>
> > >>>>> Would that not work? If not, we can do what was done in i915. But note that
> > >>>>> i915 still has unresolved hangs, which I believe are due to the spinner
> > >>>>> running on the application engine (iirc repeatedly opening/closing an OA
> > >>>>> stream will hang in i915, though it could be due to other i915
> > >>>>> complexity). That is why thought using drm_xe_sync array is both safer and
> > >>>>> more standard way of doing what we want to achieve.
> > >>>>>
> > >>>>> Basically the output sync object will be signalled after registers are
> > >>>>> programmed and also any additional OA programming delay (which is
> > >>>>> implemented in i915 using the spinner).
> > >>>>>
> > >>>>> This would be done both for OA stream open and changing OA stream
> > >>>>> configuration.
> > >>> That is true. But now that I have other stuff like gpuvis wrapped up, I
> > >>> plan to start looking these couple of missing uapi pieces (hold preemption
> > >>> and synchronization, likely in that order).
> > >>>
> > >>> Because synchronization is not implemented I add the delay below:
> > >>>
> > >>> static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
> > >>> {
> > >>> #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> > >>> struct xe_oa_config_bo *oa_bo;
> > >>> int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
> > >>>
> > >>> oa_bo = xe_oa_alloc_config_buffer(stream);
> > >>> if (IS_ERR(oa_bo)) {
> > >>> err = PTR_ERR(oa_bo);
> > >>> goto exit;
> > >>> }
> > >>>
> > >>> err = xe_oa_submit_bb(stream, oa_bo->bb);
> > >>>
> > >>> /* Additional empirical delay needed for NOA programming after registers are written */
> > >>> usleep_range(us, 2 * us);
> > >>> exit:
> > >>> return err;
> > >>> }
> > >>>
> > >>> I need understand this is temporaty band-aid, since it stalls the
> > >>> submission pipeline and needs to be replaced by proper synchronization.
> > >>>
> > >>>> Just letting you know, because we cannot use the current ioctl because it
> > >>>> doesn't behave as we expect
> > >>> You wouldn't be able to merge the Mesa PR as per the current uapi now and
> > >>> then add additional Mesa patches, when we implement these couple of missing
> > >>> uapi features in KMD?
> > >>
> > >> We could merge only the stuff that parse the reports, that's enough to have
> > >> perfetto work.
> > > Hi Lionel, so this means the OAG buffer stuff, correct?
> >
> > Correct
> >
> > >> But all of the VK_KHR_performance_query won't work as far as I can tell.
> > > And this is the OAR stuff, synchronization and hold preemption?
> > >
> >
> > This is actually a mix of OAG/OAR. We do read a couple of OAG registers
> > that don't come as part of MI_REPORT_PERF_COUNT.
> >
> > We will need a way to tell when you've added the feature we need for
> > VK_KHR_performance_query.
> >
> > I don't think we want to test this at runtime given some of the ioctl
> > take time.
>
> Hmm, you mean like a version? We have capability bits in Xe and could
> potentially add a bit if needed. It is in the oa_unit query so like init
> time, not runtime.
Though, maybe, just not implement in mesa yet and implement it only after
KMD exposes that property? Till then say VK_KHR_performance_query is not
supported?
>
>
> > > Yes even if we can merge the OAG stuff first, that will get most of KMD OA
> > > code merged and then we can merge the remaining stuff later. Since it's a
> > > pain to maintain the code out of tree, that's why even if I can merge the
> > > OAG stuff (together with Mesa) which is the majority of the OA KMD code
> > > anyway, that would be a big relief.
>
> So, do you think you Mesa guys can give us an ack to merge the uapi we have
> now? And add the missing uapi bits in the second round of OA uapi changes?
> Or is there anything else you want us to do in this first round?
>
> For the second round, here is the list I have:
>
> * hold preemption (needs some investigation as to feasibility)
> * xe_sync for OA stream configuration and reconfiguration
> * Equivalent of DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID. I am still trying
> to figure out if this is really needed. See here:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29312#note_2429650
>
> Since this seems to be using some MDAPI metrics in Mesa, I think it
> should be ok to do this in the second round too.
>
> Thanks.
> --
> Ashutosh
More information about the Intel-xe
mailing list