[PATCH 07/17] drm/xe/oa: OA stream initialization (OAG)
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Wed May 29 06:29:02 UTC 2024
On 29/05/2024 09:11, Dixit, Ashutosh wrote:
> On Tue, 28 May 2024 21:55:00 -0700, Dixit, Ashutosh wrote:
>> On Tue, 28 May 2024 00:43:42 -0700, Lionel Landwerlin wrote:
>> Hi Lionel/Jose,
>>
>>> On 28/05/2024 10:16, Dixit, Ashutosh wrote:
>>>> On Mon, 27 May 2024 23:39:50 -0700, Lionel Landwerlin wrote:
>>>>> On 28/05/2024 09:17, Dixit, Ashutosh wrote:
>>>>>> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote:
>>>>>>
>>>>>>> On 28/05/2024 08:27, Dixit, Ashutosh wrote:
>>>>>>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote:
>>>>>>>>
>>>>>>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream,
>>>>>>>>>> + struct xe_oa_open_param *param)
>>>>>>>>>> +{
>>>>>>>> /snip/
>>>>>>>>
>>>>>>>>>> + stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
>>>>>>>>>> + BIT(stream->hwe->logical_instance), 1,
>>>>>>>>>> + stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0);
>>>>>>>>> Hi Ashutosh,
>>>>>>>>>
>>>>>>>>> On i915 the changes of configuration were pipelined in the application's
>>>>>>>>> execution just like any other submission.
>>>>>>>>>
>>>>>>>>> Creating another queue completely unsynchronized from the application's
>>>>>>>>> submissions makes this non usable in my opinion.
>>>>>>>> As we discussed previously, the plan here is to provide a drm_xe_sync array,
>>>>>>>> through stream properties, which can use to synchronize OA programming with
>>>>>>>> workload submisson.
>>>>>>>>
>>>>>>>> Would that not work? If not, we can do what was done in i915. But note that
>>>>>>>> i915 still has unresolved hangs, which I believe are due to the spinner
>>>>>>>> running on the application engine (iirc repeatedly opening/closing an OA
>>>>>>>> stream will hang in i915, though it could be due to other i915
>>>>>>>> complexity). That is why thought using drm_xe_sync array is both safer and
>>>>>>>> more standard way of doing what we want to achieve.
>>>>>>>>
>>>>>>>> Basically the output sync object will be signalled after registers are
>>>>>>>> programmed and also any additional OA programming delay (which is
>>>>>>>> implemented in i915 using the spinner).
>>>>>>>>
>>>>>>>> This would be done both for OA stream open and changing OA stream
>>>>>>>> configuration.
>>>>>> That is true. But now that I have other stuff like gpuvis wrapped up, I
>>>>>> plan to start looking these couple of missing uapi pieces (hold preemption
>>>>>> and synchronization, likely in that order).
>>>>>>
>>>>>> Because synchronization is not implemented I add the delay below:
>>>>>>
>>>>>> static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
>>>>>> {
>>>>>> #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
>>>>>> struct xe_oa_config_bo *oa_bo;
>>>>>> int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
>>>>>>
>>>>>> oa_bo = xe_oa_alloc_config_buffer(stream);
>>>>>> if (IS_ERR(oa_bo)) {
>>>>>> err = PTR_ERR(oa_bo);
>>>>>> goto exit;
>>>>>> }
>>>>>>
>>>>>> err = xe_oa_submit_bb(stream, oa_bo->bb);
>>>>>>
>>>>>> /* Additional empirical delay needed for NOA programming after registers are written */
>>>>>> usleep_range(us, 2 * us);
>>>>>> exit:
>>>>>> return err;
>>>>>> }
>>>>>>
>>>>>> I need understand this is temporaty band-aid, since it stalls the
>>>>>> submission pipeline and needs to be replaced by proper synchronization.
>>>>>>
>>>>>>> Just letting you know, because we cannot use the current ioctl because it
>>>>>>> doesn't behave as we expect
>>>>>> You wouldn't be able to merge the Mesa PR as per the current uapi now and
>>>>>> then add additional Mesa patches, when we implement these couple of missing
>>>>>> uapi features in KMD?
>>>>> We could merge only the stuff that parse the reports, that's enough to have
>>>>> perfetto work.
>>>> Hi Lionel, so this means the OAG buffer stuff, correct?
>>> Correct
>>>
>>>>> But all of the VK_KHR_performance_query won't work as far as I can tell.
>>>> And this is the OAR stuff, synchronization and hold preemption?
>>>>
>>> This is actually a mix of OAG/OAR. We do read a couple of OAG registers
>>> that don't come as part of MI_REPORT_PERF_COUNT.
>>>
>>> We will need a way to tell when you've added the feature we need for
>>> VK_KHR_performance_query.
>>>
>>> I don't think we want to test this at runtime given some of the ioctl
>>> take time.
>> Hmm, you mean like a version? We have capability bits in Xe and could
>> potentially add a bit if needed. It is in the oa_unit query so like init
>> time, not runtime.
> Though, maybe, just not implement in mesa yet and implement it only after
> KMD exposes that property? Till then say VK_KHR_performance_query is not
> supported?
Yeah but how do I know the kernel I'm running on supports this feature?
A version number could help or whatever query we can do on Xe.
Thanks,
-Lionel
>
>>
>>>> Yes even if we can merge the OAG stuff first, that will get most of KMD OA
>>>> code merged and then we can merge the remaining stuff later. Since it's a
>>>> pain to maintain the code out of tree, that's why even if I can merge the
>>>> OAG stuff (together with Mesa) which is the majority of the OA KMD code
>>>> anyway, that would be a big relief.
>> So, do you think you Mesa guys can give us an ack to merge the uapi we have
>> now? And add the missing uapi bits in the second round of OA uapi changes?
>> Or is there anything else you want us to do in this first round?
>>
>> For the second round, here is the list I have:
>>
>> * hold preemption (needs some investigation as to feasibility)
>> * xe_sync for OA stream configuration and reconfiguration
>> * Equivalent of DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID. I am still trying
>> to figure out if this is really needed. See here:
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29312#note_2429650
>>
>> Since this seems to be using some MDAPI metrics in Mesa, I think it
>> should be ok to do this in the second round too.
>>
>> Thanks.
>> --
>> Ashutosh
More information about the Intel-xe
mailing list