[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Tue May 28 06:39:50 UTC 2024
On 28/05/2024 09:17, Dixit, Ashutosh wrote:
> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
> Hi Lionel,
>
>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
>>>
>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
>>>>> + struct xe_oa_open_param *param)
>>>>> +{
>>> /snip/
>>>
>>>>> + stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
>>>>> + BIT(stream->hwe->logical_instance), 1,
>>>>> + stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
>>>> Hi Ashutosh,
>>>>
>>>> On i915 the changes of configuration were pipelined in the application's
>>>> execution just like any other submission.
>>>>
>>>> Creating another queue completely unsynchronized from the application's
>>>> submissions makes this non usable in my opinion.
>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
>>> through stream properties, which can use to synchronize OA programming with
>>> workload submisson.
>>>
>>> Would that not work? If not, we can do what was done in i915. But note that
>>> i915 still has unresolved hangs, which I believe are due to the spinner
>>> running on the application engine (iirc repeatedly opening/closing an OA
>>> stream will hang in i915, though it could be due to other i915
>>> complexity). That is why thought using drm_xe_sync array is both safer and
>>> more standard way of doing what we want to achieve.
>>>
>>> Basically the output sync object will be signalled after registers are
>>> programmed and also any additional OA programming delay (which is
>>> implemented in i915 using the spinner).
>>>
>>> This would be done both for OA stream open and changing OA stream
>>> configuration.
> That is true. But now that I have other stuff like gpuvis wrapped up, I
> plan to start looking these couple of missing uapi pieces (hold preemption
> and synchronization, likely in that order).
>
> Because synchronization is not implemented I add the delay below:
>
> static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
> {
> #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> struct xe_oa_config_bo *oa_bo;
> int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
>
> oa_bo = xe_oa_alloc_config_buffer(stream);
> if (IS_ERR(oa_bo)) {
> err = PTR_ERR(oa_bo);
> goto exit;
> }
>
> err = xe_oa_submit_bb(stream, oa_bo->bb);
>
> /* Additional empirical delay needed for NOA programming after registers are written */
> usleep_range(us, 2 * us);
> exit:
> return err;
> }
>
> I need understand this is temporaty band-aid, since it stalls the
> submission pipeline and needs to be replaced by proper synchronization.
>
>> Just letting you know, because we cannot use the current ioctl because it
>> doesn't behave as we expect
> You wouldn't be able to merge the Mesa PR as per the current uapi now and
> then add additional Mesa patches, when we implement these couple of missing
> uapi features in KMD?
We could merge only the stuff that parse the reports, that's enough to
have perfetto work.
But all of the VK_KHR_performance_query won't work as far as I can tell.
-Lionel
>
> Thanks.
> --
> Ashutosh
More information about the Intel-xe
mailing list