✓ CI.checkpatch: success for series starting with [v6,1/2] drm/ttm: Add a flag to allow drivers to skip clear-on-free
Patchwork
patchwork at emeril.freedesktop.org
Fri Aug 16 14:25:51 UTC 2024
== Series Details ==
Series: series starting with [v6,1/2] drm/ttm: Add a flag to allow drivers to skip clear-on-free
URL : https://patchwork.freedesktop.org/series/137396/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
9fe5037901cabbcdf27a6fe0dfb047ca1474d363
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 7756d740c34138ac4e6b3e68bbe8ca2bd54cc8fa
Author: Nirmoy Das <nirmoy.das at intel.com>
Date: Fri Aug 16 15:51:54 2024 +0200
drm/xe/lnl: Offload system clear page activity to GPU
On LNL, because of flat CCS, the driver already creates migration jobs
to clear CCS metadata. Extend that to also clear system pages using the
GPU. Inform TTM to allocate pages without __GFP_ZERO, avoiding double
page clearing, by clearing the TTM_TT_FLAG_ZERO_ALLOC flag, and set
TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip the TTM pool's
clear-on-free, as Xe now takes care of clearing pages. If a BO is in
system placement, such as a BO created with
DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING, and there is a CPU map, the GPU
clear is avoided for that BO, because it has no DMA mapping at that
moment with which to create migration jobs.
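A minimal sketch of that flag handling, assuming hypothetical driver
hooks around the real TTM pool entry points ttm_pool_alloc() and
ttm_pool_free(); TTM_TT_FLAG_CLEARED_ON_FREE is the flag added by patch
1/2 of this series, and this is not the actual Xe change:

#include <drm/ttm/ttm_device.h>
#include <drm/ttm/ttm_pool.h>
#include <drm/ttm/ttm_tt.h>

static int example_tt_populate(struct ttm_device *bdev, struct ttm_tt *tt,
			       struct ttm_operation_ctx *ctx)
{
	/* Drop __GFP_ZERO allocation; a GPU clear job zeroes the pages. */
	tt->page_flags &= ~TTM_TT_FLAG_ZERO_ALLOC;
	return ttm_pool_alloc(&bdev->pool, tt, ctx);
}

static void example_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *tt)
{
	/* Pages were already cleared by the GPU, so tell the pool to
	 * skip its clear-on-free pass. */
	tt->page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE;
	ttm_pool_free(&bdev->pool, tt);
}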
Tested this patch with api_overhead_benchmark_l0 from
https://github.com/intel/compute-benchmarks
Without the patch:
api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
UsmMemoryAllocation(api=l0 type=Host size=4KB) 84.206 us
UsmMemoryAllocation(api=l0 type=Host size=1GB) 105775.56 us
Perf tool top 5 entries:
71.44% api_overhead_be [kernel.kallsyms] [k] clear_page_erms
6.34% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page
2.24% api_overhead_be [kernel.kallsyms] [k] cpa_flush
2.15% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable
1.94% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res
With the patch:
api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us
UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us
Perf tool top 5 entries:
11.16% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page
7.85% api_overhead_be [kernel.kallsyms] [k] cpa_flush
7.59% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res
7.24% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable
5.53% api_overhead_be [kernel.kallsyms] [k] lookup_address_in_pgd_attr
Without this patch, clear_page_erms() dominates execution time and is
not pipelined with migration jobs. With this patch, page clearing is
pipelined with the migration job, freeing the CPU for other work.
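In other words, the clear becomes an asynchronous GPU job whose
completion is tracked by a fence. A conceptual sketch under assumed
names (example_submit_gpu_clear_job() is illustrative, not Xe's actual
API; dma_fence_wait()/dma_fence_put() are the real kernel primitives):

	struct dma_fence *fence;

	/* Queue the clear on the GPU; returns immediately with a fence. */
	fence = example_submit_gpu_clear_job(bo);

	/* The CPU is free to do other allocation/setup work here... */

	/* Only the consumer of the pages has to wait for the clear. */
	dma_fence_wait(fence, false);
	dma_fence_put(fence);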
v2: Handle regression on dgfx (Himal)
Update commit message as no TTM API changes are needed.
v3: Fix KUnit test.
v4: Handle data leak on CPU mmap (Thomas)
v5: s/gpu_page_clear/gpu_page_clear_sys/ and move setting
it to xe_ttm_sys_mgr_init(), plus other nits (Matt Auld)
v6: Disable it when init_on_alloc and/or init_on_free is active (Matt);
a sketch of that guard follows this changelog.
Use compute-benchmarks since the reporter used it to report this
allocation latency issue, and it is a more suitable test application
than mine. In v5 the test showed a significant reduction in alloc
latency, but that is no longer the case; I think this was mostly
because the previous test ran on an IFWI with low memory bandwidth
from the CPU.
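A minimal sketch of the v6 guard mentioned above, using the kernel's
own want_init_on_alloc()/want_init_on_free() predicates from
linux/mm.h; the function name is illustrative, not the actual Xe code:

#include <linux/mm.h>

static bool example_can_offload_clear_to_gpu(void)
{
	/* If init_on_alloc/init_on_free hardening is active, the kernel
	 * already zeroes pages itself, so keep the GPU offload off. */
	if (want_init_on_alloc(GFP_KERNEL) || want_init_on_free())
		return false;
	return true;
}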
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom at linux.intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
+ /mt/dim checkpatch cfdb0d68f7d07eecfafb5fda99e6dc313359d425 drm-intel
5037f5c58193 drm/ttm: Add a flag to allow drivers to skip clear-on-free
7756d740c341 drm/xe/lnl: Offload system clear page activity to GPU