[PATCH v2] drm/xe: flush gtt before signalling user fence on all engines

Andrzej Hajda andrzej.hajda at intel.com
Mon Jun 3 08:11:07 UTC 2024



On 03.06.2024 09:35, Thomas Hellström wrote:
> On Thu, 2024-05-30 at 20:45 +0000, Matthew Brost wrote:
>> On Thu, May 30, 2024 at 01:17:32PM +0200, Thomas Hellström wrote:
>>> Hi, All.
>>>
>>> I was looking at this patch for drm-xe-fixes but it doesn't look
>>> correct to me.
>>>
>>> First, AFAICT, the "emit flush imm ggtt" means that we're flushing
>>> outstanding / posted writes and then writing a DW to a ggtt address,
>>> so we're not really "flushing gtt".
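For reference, a paraphrase of what that helper emits (written from memory, so the exact flag names may differ from xe_ring_ops.c): an MI_FLUSH_DW that flushes prior posted writes and then does a post-sync store of one dword to a GGTT address.

	/* Approximate shape of emit_flush_imm_ggtt(addr, value, invalidate_tlb, dw, i):
	 * flush, then post-sync store of one dword to a GGTT address. */
	dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
		  (invalidate_tlb ? MI_INVALIDATE_TLB : 0);
	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;	/* e.g. the lrc seqno GGTT address */
	dw[i++] = 0;
	dw[i++] = value;			/* dword stored once the flush completes */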
>>>
>> So, is this a bad name? I think I agree. It could have been a
>> holdover
>> from the i915 names. Maybe we should do a cleanup in xe_ring_ops
>> soon?
>>
>> Or are you saying that the existing emit_flush_imm_ggtt is not
>> sufficient to ensure all writes from batches are visible? If this
>> were
>> true, I would think we'd have all sorts of problems popping up.
> It was more the title of the patch that says "flush gtt" when I think
> it should say "flush writes" or something similar.
>
>
>>> Second, I don't think we have anything left that explicitly flushes
>>> the
>>> posted write of the user-fence value?
>>>
>> I think this might be true. So there could be a case where we get an
>> IRQ
>> and the user fence value is not yet visible?
> Yes, exactly.
>
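To make the concern concrete: with the v2 patch applied, the affected paths emit roughly the sequence below (simplified from the diff further down), and nothing after the posted user-fence store forces it to become visible before the interrupt is handled.

	/* Simplified emission order after the v2 patch: */
	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);	/* workload writes */
	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
							/* flushes prior posted writes,
							 * then stores the seqno */
	if (job->user_fence.used)
		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
						job->user_fence.value, dw, i);
							/* posted: may still be in flight... */
	i = emit_user_interrupt(dw, i);			/* ...when the IRQ reaches the CPU */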
>> I'm not an expert in ring programming, but are there store-dword
>> instructions which make these writes immediately visible? If so, I
>> think that is what should be used.
> There are various options here, using different variants of MI_FLUSH_DW
> and PIPE_CONTROL, and I'm not sure which would be the most performant,
> but I think the simplest solution would be to revert the patch and just
> emit an additional MI_FLUSH_DW as a write barrier before emitting the
> posted user-fence value.

As the patch has already landed, I have posted a fix for it:
https://patchwork.freedesktop.org/series/134354/
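The idea is along the lines Thomas describes above: keep the original ordering and emit a bare MI_FLUSH_DW as a write barrier before the posted user-fence store. As a rough sketch only (the helper name emit_flush_dw() and the exact flags are illustrative here, not necessarily what the posted series does):

/* Hypothetical helper: a bare MI_FLUSH_DW with no post-sync operation,
 * used purely as a write barrier for the workload's prior writes. */
static int emit_flush_dw(u32 *dw, int i)
{
	dw[i++] = MI_FLUSH_DW | MI_FLUSH_IMM_DW;
	dw[i++] = 0;
	dw[i++] = 0;
	dw[i++] = 0;

	return i;
}

	/* ...and in __emit_job_gen12_simple()/__emit_job_gen12_video(),
	 * with the original ordering restored (seqno flush after the
	 * user-fence store): */
	if (job->user_fence.used) {
		i = emit_flush_dw(dw, i);	/* barrier for workload writes */
		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
						job->user_fence.value,
						dw, i);
	}

	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);

	i = emit_user_interrupt(dw, i);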

Regards
Andrzej

>
>>
>>
>> We should also probably check how downstream i915 did this too.
>>
>>> and finally the seqno fence now gets flushed before the user-fence.
>>> Perhaps that's not a bad thing, though.
>>>
>> I don't think this is an issue, I can't think of a case where this
>> reordering would create a problem.
>>
>> Matt
> /Thomas
>
>>   
>>> /Thomas
>>>
>>>
>>> On Wed, 2024-05-22 at 09:27 +0200, Andrzej Hajda wrote:
>>>> Tests show that user fence signalling requires some kind of write
>>>> barrier, otherwise not all writes performed by the workload will be
>>>> available to userspace. It is already done for render and compute;
>>>> we need it also for the rest: video, gsc and copy.
>>>>
>>>> v2: added gsc and copy engines, added fixes and r-b tags
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488
>>>> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda at intel.com>
>>>> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>>>> ---
>>>> Changes in v2:
>>>> - Added fixes and r-b tags
>>>> - Link to v1:
>>>> https://lore.kernel.org/r/20240521-xu_flush_vcs_before_ufence-v1-1-ded38b56c8c9@intel.com
>>>> ---
>>>> Matthew,
>>>>
>>>> I have extended the patch to the copy and gsc engines. I have kept
>>>> your r-b since the change is similar; I hope that is OK.
>>>> ---
>>>>   drivers/gpu/drm/xe/xe_ring_ops.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
>>>> index a3ca718456f6..a46a1257a24f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>>>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>>>> @@ -234,13 +234,13 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
>>>>   
>>>>   	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
>>>>   
>>>> +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
>>>> +
>>>>   	if (job->user_fence.used)
>>>>   		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
>>>>   						job->user_fence.value,
>>>>   						dw, i);
>>>>   
>>>> -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
>>>> -
>>>>   	i = emit_user_interrupt(dw, i);
>>>>   
>>>>   	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
>>>> @@ -293,13 +293,13 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>>>>   
>>>>   	i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
>>>>   
>>>> +	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
>>>> +
>>>>   	if (job->user_fence.used)
>>>>   		i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
>>>>   						job->user_fence.value,
>>>>   						dw, i);
>>>>   
>>>> -	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
>>>> -
>>>>   	i = emit_user_interrupt(dw, i);
>>>>   
>>>>   	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
>>>>
>>>> ---
>>>> base-commit: 188ced1e0ff892f0948f20480e2e0122380ae46d
>>>> change-id: 20240521-xu_flush_vcs_before_ufence-a7b45d94cf33
>>>>
>>>> Best regards,


