[PATCH] [RFC]drm/ttm: fix scheduling balance

Thomas Hellstrom thomas at shipmail.org
Thu Jan 25 14:57:00 UTC 2018


On 01/25/2018 10:59 AM, Chunming Zhou wrote:
> there is a scheduling balance issue about get node like:
> a. process A allocates full memory and use it for submission.
> b. process B tries to allocates memory, will wait for process A BO idle in eviction.
> c. process A completes the job, process B eviction will put process A BO node,
> but in the meantime, process C is comming to allocate BO, whill directly get node successfully, and do submission,
> process B will again wait for process C BO idle.
> d. repeat the above setps, process B could be delayed much more.
>
> add a mutex to gerantee the allocation sequence for same domain. But there is a possibility that
> visible vram could be evicted to invisilbe, the tricky is they are same domain manager, so which needs a special handling.
>
> Change-Id: I260e8eb704f7b4788b071d3f641f21b242912256
> Signed-off-by: Chunming Zhou <david1.zhou at amd.com>

I think this is a good approach, however there are two things that IMO 
needs fixing.

1) You must lock mutex interruptible if requested. It's important for 
fast delivery of Xserver mouse signals, and also to avoid unkillable 
processes on GPU hang.
2) How do we resolve contradictary eviction order between buffers? You'd 
probably get lockdep warnings if that should happen. Now I'm not sure 
there are
drivers that do this, and that would probably be a bug if they did. In 
principle you'd have to either establish a manager priority and document 
that drivers have to do that,
or use ww_mutexes.

/Thomas

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 16 +++++++++++++++-
>   include/drm/ttm/ttm_bo_api.h    |  2 ++
>   include/drm/ttm/ttm_bo_driver.h |  1 +
>   3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..f465f1d38711 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -969,7 +969,11 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>   		if (mem_type == TTM_PL_SYSTEM)
>   			break;
>   
> +		if (ctx->man != man)
> +			mutex_lock(&man->node_mutex);
>   		ret = (*man->func->get_node)(man, bo, place, mem);
> +		if (ctx->man != man)
> +			mutex_unlock(&man->node_mutex);
>   		if (unlikely(ret))
>   			return ret;
>   
> @@ -991,6 +995,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>   
>   	for (i = 0; i < placement->num_busy_placement; ++i) {
>   		const struct ttm_place *place = &placement->busy_placement[i];
> +		struct ttm_operation_ctx busy_ctx;
>   
>   		ret = ttm_mem_type_from_place(place, &mem_type);
>   		if (ret)
> @@ -1018,7 +1023,15 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>   			return 0;
>   		}
>   
> -		ret = ttm_bo_mem_force_space(bo, mem_type, place, mem, ctx);
> +		memcpy(&busy_ctx, ctx, sizeof(busy_ctx));
> +		if (ctx->man != man) {
> +			mutex_lock(&man->node_mutex);
> +			busy_ctx.man = man;
> +		}
> +		ret = ttm_bo_mem_force_space(bo, mem_type, place, mem,
> +					     &busy_ctx);
> +		if (ctx->man != man)
> +			mutex_unlock(&man->node_mutex);
>   		if (ret == 0 && mem->mm_node) {
>   			mem->placement = cur_flags;
>   			return 0;
> @@ -1449,6 +1462,7 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>   	man->io_reserve_fastpath = true;
>   	man->use_io_reserve_lru = false;
>   	mutex_init(&man->io_reserve_mutex);
> +	mutex_init(&man->node_mutex);
>   	spin_lock_init(&man->move_lock);
>   	INIT_LIST_HEAD(&man->io_reserve_lru);
>   
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index 2cd025c2abe7..b08629050215 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -265,6 +265,7 @@ struct ttm_bo_kmap_obj {
>    * @no_wait_gpu: Return immediately if the GPU is busy.
>    * @allow_reserved_eviction: Allow eviction of reserved BOs.
>    * @resv: Reservation object to allow reserved evictions with.
> + * @man: record ctx is under man->node_mutex.
>    *
>    * Context for TTM operations like changing buffer placement or general memory
>    * allocation.
> @@ -275,6 +276,7 @@ struct ttm_operation_ctx {
>   	bool allow_reserved_eviction;
>   	struct reservation_object *resv;
>   	uint64_t bytes_moved;
> +	struct ttm_mem_type_manager *man;
>   };
>   
>   /**
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 9b417eb2df20..d85d6ee4f54d 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -288,6 +288,7 @@ struct ttm_mem_type_manager {
>   	uint32_t default_caching;
>   	const struct ttm_mem_type_manager_func *func;
>   	void *priv;
> +	struct mutex node_mutex;
>   	struct mutex io_reserve_mutex;
>   	bool use_io_reserve_lru;
>   	bool io_reserve_fastpath;




More information about the amd-gfx mailing list