[PATCH v2 00/11] Proper GT TLB invalidation layering and new coalescing feature.
Matthew Auld
matthew.auld at intel.com
Tue Jul 9 09:57:14 UTC 2024
Hi,
On 08/07/2024 05:03, Matthew Brost wrote:
> While debugging [1] an issue was identified in which if too many GT TLB
> invalidations are issued to the GuC, the GuC can get overwhelmed to the
> point scheduling of jobs starts to stall. To avoid this, hold and
> coalesce GT TLB invalidations in the KMD if a watermark of pending
> invalidations is passed. A gitlab issue for this has also been opened
> [2].
>
> Layering issues with GT TLB invalidations are known [3] which needed to
> be fixed first before adding this new feature.
>
> - Patches 1-8 fix the layering.
> - Patches 9-11 add coalescing feature.
>
> We could merge these two as separate series if needed.
>
> CCing various stakeholders (Farah, Michal, Nirmoy) who have raised GT
> TLB invalidation issues in the past.
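
For readers skimming the cover letter, the hold-and-coalesce idea can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the series: the struct, its fields, both function names, and the choice of ">=" for "watermark passed" are all hypothetical, and it assumes that one later invalidation can stand in for every request that was held.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping; names are not taken from the series. */
struct tlb_inval_state {
	unsigned int pending;   /* invalidations sent to the GuC, not yet acked */
	unsigned int watermark; /* hold new requests once pending reaches this */
	bool held;              /* at least one request is being coalesced */
};

/*
 * Request an invalidation. Returns true if it should be sent right away,
 * false if it was held and merged with other held requests.
 */
static bool tlb_inval_request(struct tlb_inval_state *s)
{
	if (s->pending >= s->watermark) {
		s->held = true;
		return false;
	}
	s->pending++;
	return true;
}

/*
 * Handle an ack for one in-flight invalidation. Returns true if a single
 * coalesced invalidation covering all held requests should now be sent.
 */
static bool tlb_inval_ack(struct tlb_inval_state *s)
{
	if (s->pending)
		s->pending--;
	if (s->held && s->pending < s->watermark) {
		s->held = false;
		s->pending++; /* the coalesced invalidation is now in flight */
		return true;
	}
	return false;
}
```

With a watermark of 2, the third and fourth requests are held, and the first ack releases exactly one coalesced invalidation in their place.
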
Maybe worth mentioning for [1]: we try to process TLB invalidations
directly from the irq, but we also only process the g2h queue in-order.
So if something other than a TLB invalidation or fault sits earlier in
the queue, we do nothing useful from the irq and just return, until the
wq eventually processes those earlier items that couldn't be handled
directly from the irq. In the past I have seen TLB timeouts where the
TLB invalidation is clearly in the g2h queue (and has been for a while),
but is stuck behind something earlier in the queue that needs the wq,
while the system is under such heavy load that the wq can't be scheduled
in a timely manner.
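
To make the head-of-line blocking concrete, here is a small standalone model of it. It is a sketch only, not the xe_guc_ct.c code: the message kinds, the queue struct, and the function names are all made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds; not the real G2H action codes. */
enum msg_kind { MSG_TLB_INVAL_DONE, MSG_PAGE_FAULT, MSG_OTHER };

/* Hypothetical in-order queue: msgs[head] is the oldest unprocessed entry. */
struct g2h_queue {
	const enum msg_kind *msgs;
	size_t head;
	size_t count;
};

/* In this model only TLB invalidation and fault messages are irq-safe. */
static bool irq_safe(enum msg_kind kind)
{
	return kind == MSG_TLB_INVAL_DONE || kind == MSG_PAGE_FAULT;
}

/*
 * irq fast path: consume messages strictly in order and stop at the first
 * one that needs the worker. Returns the number of messages processed.
 */
static size_t irq_fast_path(struct g2h_queue *q)
{
	size_t done = 0;

	while (q->head < q->count && irq_safe(q->msgs[q->head])) {
		q->head++;
		done++;
	}
	return done;
}

/* Worker: may handle any message, so it drains everything that is left. */
static size_t worker_drain(struct g2h_queue *q)
{
	size_t done = q->count - q->head;

	q->head = q->count;
	return done;
}
```

For a queue holding { MSG_OTHER, MSG_TLB_INVAL_DONE }, irq_fast_path() processes nothing, and the TLB invalidation only completes once worker_drain() gets to run, which is exactly the window in which the timeout can fire.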
>
> v2:
> - Fix CI issues
> - Clean up some of the series / patch structure
>
> Matt
>
> [1] https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/799#note_2449497
> [2] https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2162
> [3] https://patchwork.freedesktop.org/series/133001/
>
> Matthew Brost (11):
> drm/xe: Add xe_gt_tlb_invalidation_fence_init helper
> drm/xe: Drop xe_gt_tlb_invalidation_wait
> drm/xe: s/tlb_invalidation.lock/tlb_invalidation.fence_lock
> drm/xe: Add tlb_invalidation.seqno_lock
> drm/xe: Add xe_gt_tlb_invalidation_done_handler
> drm/xe: Add send tlb invalidation helpers
> drm/xe: Add xe_guc_tlb_invalidation layer
> drm/xe: Add multi-client support for GT TLB invalidations
> drm/xe: Add GT TLB invalidation coalescing
> drm/xe: Add GT TLB invalidation coalesce tracepoints
> drm/xe: Add GT TLB invalidation watermark debugfs
>
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/xe_debugfs.c | 38 ++
> drivers/gpu/drm/xe/xe_device.c | 3 +
> drivers/gpu/drm/xe/xe_device_types.h | 5 +
> drivers/gpu/drm/xe/xe_ggtt.c | 21 +-
> drivers/gpu/drm/xe/xe_ggtt_types.h | 5 +
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 641 ++++++++++++------
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 26 +-
> .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h | 41 ++
> drivers/gpu/drm/xe/xe_gt_types.h | 43 +-
> drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
> drivers/gpu/drm/xe/xe_guc_tlb_invalidation.c | 145 ++++
> drivers/gpu/drm/xe/xe_guc_tlb_invalidation.h | 18 +
> drivers/gpu/drm/xe/xe_pt.c | 33 +-
> drivers/gpu/drm/xe/xe_trace.h | 10 +
> drivers/gpu/drm/xe/xe_vm.c | 45 +-
> drivers/gpu/drm/xe/xe_vm_types.h | 3 +
> 17 files changed, 801 insertions(+), 279 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.c
> create mode 100644 drivers/gpu/drm/xe/xe_guc_tlb_invalidation.h
>