[PATCH] drm/amdgpu: Initialize schedulers before using them

Christian König ckoenig.leichtzumerken at gmail.com
Mon Oct 23 05:49:06 UTC 2023



Am 23.10.23 um 05:23 schrieb Luben Tuikov:
> Initialize ring schedulers before using them, very early in the amdgpu boot,
> at PCI probe time, specifically at frame-buffer dumb-create at fill-buffer.
>
> This was discovered by using dynamic scheduler run-queues, which showed that
> amdgpu was using a scheduler before calling drm_sched_init(), and the only
> reason it was working was because sched_rq[] was statically allocated in the
> scheduler structure. However, the scheduler structure had _not_ been
> initialized.
>
> When switching to dynamically allocated run-queues, this lack of
> initialization was causing an oops and a blank screen at boot up. This patch
> fixes this amdgpu bug.
>
> This patch depends on the "drm/sched: Convert the GPU scheduler to variable
> number of run-queues" patch, as that patch prevents subsequent scheduler
> initialization if a scheduler has already been initialized.
>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Alex Deucher <Alexander.Deucher at amd.com>
> Cc: Felix Kuehling <Felix.Kuehling at amd.com>
> Cc: AMD Graphics <amd-gfx at lists.freedesktop.org>
> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 4e51dce3aab5d6..575ef7e1e30fd4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -60,6 +60,7 @@
>   #include "amdgpu_atomfirmware.h"
>   #include "amdgpu_res_cursor.h"
>   #include "bif/bif_4_1_d.h"
> +#include "amdgpu_reset.h"
>   
>   MODULE_IMPORT_NS(DMA_BUF);
>   
> @@ -2059,6 +2060,19 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
>   
>   		ring = adev->mman.buffer_funcs_ring;
>   		sched = &ring->sched;
> +
> +		r = drm_sched_init(sched, &amdgpu_sched_ops,
> +				   DRM_SCHED_PRIORITY_COUNT,
> +				   ring->num_hw_submission, 0,
> +				   adev->sdma_timeout, adev->reset_domain->wq,
> +				   ring->sched_score, ring->name,
> +				   adev->dev);
> +		if (r) {
> +			drm_err(adev, "%s: couldn't initialize ring:%s error:%d\n",
> +				__func__, ring->name, r);
> +			return;
> +		}

That doesn't look correct either.

amdgpu_ttm_set_buffer_funcs_status() should only be called with 
enable=true as argument *after* the copy ring is initialized and valid 
to use. One part of this ring initialization is to setup the scheduler.


> +
>   		r = drm_sched_entity_init(&adev->mman.high_pr,
>   					  DRM_SCHED_PRIORITY_KERNEL, &sched,
>   					  1, NULL);

That here looks totally incorrect and misplaced to me. 
amdgpu_ttm_set_buffer_funcs_status() should only enabled/disable using 
the copy functions and not really initialize the entity.

So the entity should only be created when enable=true and it should 
especially *not* re-created all the time without properly destroying it.

Can you look at the history of the code? I'm pretty sure that this was 
at some time correctly implemented.

Thanks,
Christian.

>
> base-commit: 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
> prerequisite-patch-id: c52673df9b6fc9ee001d6261c7ac107b618912a0



More information about the amd-gfx mailing list