[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)

Tue May 28 06:39:50 UTC 2024

On 28/05/2024 09:17, Dixit, Ashutosh wrote:
> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
> Hi Lionel,
>
>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
>>>
>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
>>>>> +			     struct xe_oa_open_param *param)
>>>>> +{
>>> /snip/
>>>
>>>>> +	stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
>>>>> +						BIT(stream->hwe->logical_instance), 1,
>>>>> +						stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
>>>> Hi Ashutosh,
>>>>
>>>> On i915 the changes of configuration were pipelined in the application's
>>>> execution just like any other submission.
>>>>
>>>> Creating another queue completely unsynchronized from the application's
>>>> submissions makes this non usable in my opinion.
>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
>>> through stream properties, which can use to synchronize OA programming with
>>> workload submisson.
>>>
>>> Would that not work? If not, we can do what was done in i915. But note that
>>> i915 still has unresolved hangs, which I believe are due to the spinner
>>> running on the application engine (iirc repeatedly opening/closing an OA
>>> stream will hang in i915, though it could be due to other i915
>>> complexity). That is why thought using drm_xe_sync array is both safer and
>>> more standard way of doing what we want to achieve.
>>>
>>> Basically the output sync object will be signalled after registers are
>>> programmed and also any additional OA programming delay (which is
>>> implemented in i915 using the spinner).
>>>
>>> This would be done both for OA stream open and changing OA stream
>>> configuration.
> That is true. But now that I have other stuff like gpuvis wrapped up, I
> plan to start looking these couple of missing uapi pieces (hold preemption
> and synchronization, likely in that order).
>
> Because synchronization is not implemented I add the delay below:
>
>          static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
>          {
>          #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
>                  struct xe_oa_config_bo *oa_bo;
>                  int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
>
>                  oa_bo = xe_oa_alloc_config_buffer(stream);
>                  if (IS_ERR(oa_bo)) {
>                          err = PTR_ERR(oa_bo);
>                          goto exit;
>                  }
>
>                  err = xe_oa_submit_bb(stream, oa_bo->bb);
>
>                  /* Additional empirical delay needed for NOA programming after registers are written */
>                  usleep_range(us, 2 * us);
>          exit:
>                  return err;
>          }
>
> I need understand this is temporaty band-aid, since it stalls the
> submission pipeline and needs to be replaced by proper synchronization.
>
>> Just letting you know, because we cannot use the current ioctl because it
>> doesn't behave as we expect
> You wouldn't be able to merge the Mesa PR as per the current uapi now and
> then add additional Mesa patches, when we implement these couple of missing
> uapi features in KMD?

We could merge only the stuff that parse the reports, that's enough to 
have perfetto work.

But all of the VK_KHR_performance_query won't work as far as I can tell.

-Lionel

>
> Thanks.
> --
> Ashutosh