<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 7/9/2024 6:35 PM, Matthew Brost
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:Zo1m7r+f0kfXsUSC@DUT025-TGLU.fm.intel.com">
<pre class="moz-quote-pre" wrap="">On Tue, Jul 09, 2024 at 06:08:54PM +0200, Nirmoy Das wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On 7/9/2024 11:57 AM, Matthew Auld wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi,
On 08/07/2024 05:03, Matthew Brost wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">While debugging [1] an issue was identified in which, if too many GT TLB
invalidations are issued to the GuC, the GuC can get overwhelmed to the
point that scheduling of jobs starts to stall. To avoid this, hold and
coalesce GT TLB invalidations in the KMD if a watermark of pending
invalidations is passed. A gitlab issue for this has also been opened
[2].
Layering issues with GT TLB invalidations are known [3] and needed to
be fixed first before adding this new feature.
- Patches 1-8 fix the layering.
- Patches 9-11 add coalescing feature.
We could merge these two as separate series if needed.
CCing various stakeholders (Farah, Michal, Nirmoy) who have raised GT
TLB invalidation issues in the past.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Maybe worth mentioning for [1]: we try to process TLB invalidations
directly from the irq, but we also only process the g2h queue in order.
So if something other than a TLB invalidation or fault sits earlier in
the queue, we do nothing useful from the irq and just return, until the
wq can eventually process the earlier items that couldn't be handled
directly from the irq. In the past
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Seen this recently:
&lt;3&gt; [3763.731822] xe 0000:03:00.0: [drm] *ERROR* GT0: g2h outstanding: 611
&lt;snip&gt;
&lt;6&gt; [3727.857273] [IGT] xe_evict: executing
&lt;3&gt; [3730.165480] xe 0000:03:00.0: [drm] *ERROR* TILE0 [GTT] GT0: TLB
invalidation time'd out, seqno=26858, recv=2685
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Missing the last digit of '2685'?</pre>
</blockquote>
oops, yes: <br>
<div
style="font-family: Roboto, Oxygen-Sans, Ubuntu, Cantarell, sans-serif; color: rgb(0, 0, 0); font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: nowrap; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><span
id="dmesg-warnings1" class="dmesg dmesg-warnings"
style="font-family: monospace; color: orangered; font-weight: bold; white-space: pre; overflow-wrap: normal;">&lt;3&gt; [3730.165480] xe 0000:03:00.0: [drm] *ERROR* TILE0 [GTT] GT0: TLB invalidation time'd out, seqno=26858, recv=26857</span></div>
<br>
<blockquote type="cite"
cite="mid:Zo1m7r+f0kfXsUSC@DUT025-TGLU.fm.intel.com">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Which I think fits your description. This series should help, but I'm not
sure how much.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
From an arch level, if this is a continued problem, perhaps we should ask
for a dedicated G2H queue for TLB invalidation done responses. It seems
like a fairly reasonable ask to me as TLB invalidations really shouldn't
get stuck behind other G2H processing...
</pre>
</blockquote>
<p>Yes, that should work really well. I am currently trying out this
series but haven't yet managed to reliably reproduce the issue with
or without the series. <br>
</p>
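<p>As a toy model of why a dedicated queue would help (a sketch only,
not driver code: the message kinds and the <code>drain_from_irq</code>
helper are made up for illustration, and it simplifies by assuming the
irq path can only consume TLB invalidation done messages): with one
in-order queue the irq handler has to stop at the first item it cannot
handle, while a dedicated queue never has anything in front of the TLB
responses.</p>
<pre>
```python
from collections import deque

# Hypothetical message kinds; these are not the real G2H message types.
TLB_DONE = "tlb_done"
OTHER = "other"  # stands in for anything that must be handled from the wq


def drain_from_irq(queue):
    """Consume messages in order, stopping at the first one the irq path cannot handle."""
    handled = []
    while queue and queue[0] == TLB_DONE:
        handled.append(queue.popleft())
    return handled


# Single shared in-order queue: the TLB response sits behind an OTHER item.
shared = deque([OTHER, TLB_DONE])
shared_handled = drain_from_irq(shared)

# Dedicated queue for TLB responses: nothing can sit in front of them.
dedicated = deque([TLB_DONE])
dedicated_handled = drain_from_irq(dedicated)

print(len(shared_handled), len(dedicated_handled))
```
</pre>
<p>In this model the shared queue leaves the TLB response queued until
the wq has dealt with the earlier item, whereas the dedicated queue lets
the irq path consume it straight away.</p>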
<p><br>
</p>
<p>Regards,</p>
<p>Nirmoy<br>
</p>
<blockquote type="cite"
cite="mid:Zo1m7r+f0kfXsUSC@DUT025-TGLU.fm.intel.com">
<pre class="moz-quote-pre" wrap="">
Matt
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Regards,
Nirmoy
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I have seen TLB timeouts where the TLB invalidation is clearly in the
g2h queue (and has been for a while), but is stuck behind something
earlier in the queue that needs the wq, and the system is under such
heavy load that the wq can't be scheduled in a timely manner.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
v2:
- Fix CI issues
- Clean up some of the series / patch structure
Matt
[1]
<a class="moz-txt-link-freetext" href="https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/799#note_2449497">https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/799#note_2449497</a>
[2] <a class="moz-txt-link-freetext" href="https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2162">https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2162</a>
[3] <a class="moz-txt-link-freetext" href="https://patchwork.freedesktop.org/series/133001/">https://patchwork.freedesktop.org/series/133001/</a>
Matthew Brost (11):
  drm/xe: Add xe_gt_tlb_invalidation_fence_init helper
  drm/xe: Drop xe_gt_tlb_invalidation_wait
  drm/xe: s/tlb_invalidation.lock/tlb_invalidation.fence_lock
  drm/xe: Add tlb_invalidation.seqno_lock
  drm/xe: Add xe_gt_tlb_invalidation_done_handler
  drm/xe: Add send tlb invalidation helpers
  drm/xe: Add xe_guc_tlb_invalidation layer
  drm/xe: Add multi-client support for GT TLB invalidations
  drm/xe: Add GT TLB invalidation coalescing
  drm/xe: Add GT TLB invalidation coalesce tracepoints
  drm/xe: Add GT TLB invalidation watermark debugfs

 drivers/gpu/drm/xe/Makefile                   |   1 +
 drivers/gpu/drm/xe/xe_debugfs.c               |  38 ++
 drivers/gpu/drm/xe/xe_device.c                |   3 +
 drivers/gpu/drm/xe/xe_device_types.h          |   5 +
 drivers/gpu/drm/xe/xe_ggtt.c                  |  21 +-
 drivers/gpu/drm/xe/xe_ggtt_types.h            |   5 +
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 641 ++++++++++++------
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 +-
 .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  41 ++
 drivers/gpu/drm/xe/xe_gt_types.h              |  43 +-
 drivers/gpu/drm/xe/xe_guc_ct.c                |   2 +-
 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.c  | 145 ++++
 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.h  |  18 +
 drivers/gpu/drm/xe/xe_pt.c                    |  33 +-
 drivers/gpu/drm/xe/xe_trace.h                 |  10 +
 drivers/gpu/drm/xe/xe_vm.c                    |  45 +-
 drivers/gpu/drm/xe/xe_vm_types.h              |   3 +
 17 files changed, 801 insertions(+), 279 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.c
 create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.h
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>