[Patch v3] drm/ttm: Schedule delayed_delete worker closer
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Nov 13 16:30:15 UTC 2023
On 11.11.23 at 14:08, Rajneesh Bhardwaj wrote:
> Try to allocate system memory on the NUMA node the device is closest to
> and try to run delayed_delete workers on a CPU of this node as well.
>
> When a TTM BO gets freed by the delayed_delete worker, its system memory
> may be cleared. Scheduling the worker close to the NUMA node where the
> memory was initially allocated speeds up that clearing, because it avoids
> the cases where the worker gets randomly scheduled on CPU cores that sit
> across interconnect boundaries such as xGMI or PCIe.
>
> This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
> APU platforms such as GFXIP9.4.3.
>
> Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
Reviewed-by: Christian König <christian.koenig at amd.com>
> ---
> Changes in v3:
> * Use WQ_UNBOUND to address the warning reported by CI pipeline.
>
> drivers/gpu/drm/ttm/ttm_bo.c | 8 +++++++-
> drivers/gpu/drm/ttm/ttm_device.c | 6 ++++--
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 5757b9415e37..6f28a77a565b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -370,7 +370,13 @@ static void ttm_bo_release(struct kref *kref)
>  		spin_unlock(&bo->bdev->lru_lock);
>
>  		INIT_WORK(&bo->delayed_delete, ttm_bo_delayed_delete);
> -		queue_work(bdev->wq, &bo->delayed_delete);
> +
> +		/* Schedule the worker on the closest NUMA node. This
> +		 * improves performance since system memory might be
> +		 * cleared on free and that is best done on a CPU core
> +		 * close to it.
> +		 */
> +		queue_work_node(bdev->pool.nid, bdev->wq, &bo->delayed_delete);
>  		return;
>  	}
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 43e27ab77f95..bc97e3dd40f0 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -204,7 +204,8 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>  	if (ret)
>  		return ret;
>
> -	bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);
> +	bdev->wq = alloc_workqueue("ttm",
> +				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
>  	if (!bdev->wq) {
>  		ttm_global_release();
>  		return -ENOMEM;
> @@ -213,7 +214,8 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>  	bdev->funcs = funcs;
>
>  	ttm_sys_man_init(bdev);
> -	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
> +
> +	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
>
>  	bdev->vma_manager = vma_manager;
>  	spin_lock_init(&bdev->lru_lock);
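
For readers unfamiliar with queue_work_node(), here is a simplified
userspace model of how a CPU near the requested node might be picked.
This is only a sketch and not the kernel implementation: every model_*
name, the topology struct and the two constants are hypothetical, and the
real helper works on cpumasks inside the workqueue code. The model assumes
that the current CPU is preferred when it already sits on the requested
node, that otherwise any online CPU of that node is taken, and that the
work falls back to unbound scheduling when the node is invalid or has no
online CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NUMA_NO_NODE and WORK_CPU_UNBOUND. */
#define MODEL_NO_NODE     (-1)
#define MODEL_CPU_UNBOUND (-1)

/* One entry per CPU in the modeled machine. */
struct model_cpu {
	int node;    /* NUMA node this CPU belongs to */
	bool online; /* offline CPUs are never selected */
};

struct model_topology {
	const struct model_cpu *cpus;
	size_t nr_cpus;
	int nr_nodes;
};

/*
 * Pick a CPU on @node for a work item queued from @current_cpu.
 * Returns a CPU index, or MODEL_CPU_UNBOUND when no CPU on the
 * node can be used and the scheduler should decide on its own.
 */
static int model_select_cpu_near(const struct model_topology *topo,
				 int node, int current_cpu)
{
	size_t i;

	assert(topo != NULL);

	/* No node given (or a bogus one): do not constrain the work. */
	if (node < 0 || node >= topo->nr_nodes)
		return MODEL_CPU_UNBOUND;

	/* Stay on the current CPU if it is already on the right node. */
	if (current_cpu >= 0 && (size_t)current_cpu < topo->nr_cpus &&
	    topo->cpus[current_cpu].online &&
	    topo->cpus[current_cpu].node == node)
		return current_cpu;

	/* Otherwise take the first online CPU that belongs to the node. */
	for (i = 0; i < topo->nr_cpus; i++) {
		if (topo->cpus[i].online && topo->cpus[i].node == node)
			return (int)i;
	}

	/* The node has no usable CPU. */
	return MODEL_CPU_UNBOUND;
}
```

In this model a device without NUMA affinity, for which dev_to_node() can
return NUMA_NO_NODE, simply ends up with unbound scheduling, so the change
should be harmless on non-NUMA systems.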