[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)

Tue May 28 07:43:42 UTC 2024

On 28/05/2024 10:16, Dixit, Ashutosh wrote:
> On Mon, 27 May 2024 23:39:50 -0700, Lionel Landwerlin wrote:
>> On 28/05/2024 09:17, Dixit, Ashutosh wrote:
>>> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
>>> Hi Lionel,
>>>
>>>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
>>>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
>>>>>
>>>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
>>>>>>> +			     struct xe_oa_open_param *param)
>>>>>>> +{
>>>>> /snip/
>>>>>
>>>>>>> +	stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
>>>>>>> +						BIT(stream->hwe->logical_instance), 1,
>>>>>>> +						stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
>>>>>> Hi Ashutosh,
>>>>>>
>>>>>> On i915 the changes of configuration were pipelined in the application's
>>>>>> execution just like any other submission.
>>>>>>
>>>>>> Creating another queue completely unsynchronized from the application's
>>>>>> submissions makes this non usable in my opinion.
>>>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
>>>>> through stream properties, which can use to synchronize OA programming with
>>>>> workload submisson.
>>>>>
>>>>> Would that not work? If not, we can do what was done in i915. But note that
>>>>> i915 still has unresolved hangs, which I believe are due to the spinner
>>>>> running on the application engine (iirc repeatedly opening/closing an OA
>>>>> stream will hang in i915, though it could be due to other i915
>>>>> complexity). That is why thought using drm_xe_sync array is both safer and
>>>>> more standard way of doing what we want to achieve.
>>>>>
>>>>> Basically the output sync object will be signalled after registers are
>>>>> programmed and also any additional OA programming delay (which is
>>>>> implemented in i915 using the spinner).
>>>>>
>>>>> This would be done both for OA stream open and changing OA stream
>>>>> configuration.
>>> That is true. But now that I have other stuff like gpuvis wrapped up, I
>>> plan to start looking these couple of missing uapi pieces (hold preemption
>>> and synchronization, likely in that order).
>>>
>>> Because synchronization is not implemented I add the delay below:
>>>
>>>           static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
>>>           {
>>>           #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
>>>                   struct xe_oa_config_bo *oa_bo;
>>>                   int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
>>>
>>>                   oa_bo = xe_oa_alloc_config_buffer(stream);
>>>                   if (IS_ERR(oa_bo)) {
>>>                           err = PTR_ERR(oa_bo);
>>>                           goto exit;
>>>                   }
>>>
>>>                   err = xe_oa_submit_bb(stream, oa_bo->bb);
>>>
>>>                   /* Additional empirical delay needed for NOA programming after registers are written */
>>>                   usleep_range(us, 2 * us);
>>>           exit:
>>>                   return err;
>>>           }
>>>
>>> I need understand this is temporaty band-aid, since it stalls the
>>> submission pipeline and needs to be replaced by proper synchronization.
>>>
>>>> Just letting you know, because we cannot use the current ioctl because it
>>>> doesn't behave as we expect
>>> You wouldn't be able to merge the Mesa PR as per the current uapi now and
>>> then add additional Mesa patches, when we implement these couple of missing
>>> uapi features in KMD?
>>
>> We could merge only the stuff that parse the reports, that's enough to have
>> perfetto work.
> Hi Lionel, so this means the OAG buffer stuff, correct?

Correct

>
>> But all of the VK_KHR_performance_query won't work as far as I can tell.
> And this is the OAR stuff, synchronization and hold preemption?
>
> Yes even if we can merge the OAG stuff first, that will get most of KMD OA
> code merged and then we can merge the remaining stuff later. Since it's a
> pain to maintain the code out of tree, that's why even if I can merge the
> OAG stuff (together with Mesa) which is the majority of the OA KMD code
> anyway, that would be a big relief.

This is actually a mix of OAG/OAR. We do read a couple of OAG registers 
that don't come as part of MI_REPORT_PERF_COUNT.

We will need a way to tell when you've added the feature we need for 
VK_KHR_performance_query.

I don't think we want to test this at runtime given some of the ioctl 
take time.

-Lionel

>
> Thanks.
> --
> Ashutosh