[PATCH] drm/amdgpu: Initialize schedulers before using them
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Oct 23 05:49:06 UTC 2023
Am 23.10.23 um 05:23 schrieb Luben Tuikov:
> Initialize ring schedulers before using them, very early in the amdgpu boot,
> at PCI probe time, specifically at frame-buffer dumb-create at fill-buffer.
>
> This was discovered by using dynamic scheduler run-queues, which showed that
> amdgpu was using a scheduler before calling drm_sched_init(), and the only
> reason it was working was because sched_rq[] was statically allocated in the
> scheduler structure. However, the scheduler structure had _not_ been
> initialized.
>
> When switching to dynamically allocated run-queues, this lack of
> initialization was causing an oops and a blank screen at boot up. This patch
> fixes this amdgpu bug.
>
> This patch depends on the "drm/sched: Convert the GPU scheduler to variable
> number of run-queues" patch, as that patch prevents subsequent scheduler
> initialization if a scheduler has already been initialized.
>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Alex Deucher <Alexander.Deucher at amd.com>
> Cc: Felix Kuehling <Felix.Kuehling at amd.com>
> Cc: AMD Graphics <amd-gfx at lists.freedesktop.org>
> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 4e51dce3aab5d6..575ef7e1e30fd4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -60,6 +60,7 @@
> #include "amdgpu_atomfirmware.h"
> #include "amdgpu_res_cursor.h"
> #include "bif/bif_4_1_d.h"
> +#include "amdgpu_reset.h"
>
> MODULE_IMPORT_NS(DMA_BUF);
>
> @@ -2059,6 +2060,19 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
>
> ring = adev->mman.buffer_funcs_ring;
> sched = &ring->sched;
> +
> + r = drm_sched_init(sched, &amdgpu_sched_ops,
> + DRM_SCHED_PRIORITY_COUNT,
> + ring->num_hw_submission, 0,
> + adev->sdma_timeout, adev->reset_domain->wq,
> + ring->sched_score, ring->name,
> + adev->dev);
> + if (r) {
> + drm_err(adev, "%s: couldn't initialize ring:%s error:%d\n",
> + __func__, ring->name, r);
> + return;
> + }
That doesn't look correct either.
amdgpu_ttm_set_buffer_funcs_status() should only be called with
enable=true as argument *after* the copy ring is initialized and valid
to use. One part of this ring initialization is to setup the scheduler.
> +
> r = drm_sched_entity_init(&adev->mman.high_pr,
> DRM_SCHED_PRIORITY_KERNEL, &sched,
> 1, NULL);
That here looks totally incorrect and misplaced to me.
amdgpu_ttm_set_buffer_funcs_status() should only enabled/disable using
the copy functions and not really initialize the entity.
So the entity should only be created when enable=true and it should
especially *not* re-created all the time without properly destroying it.
Can you look at the history of the code? I'm pretty sure that this was
at some time correctly implemented.
Thanks,
Christian.
>
> base-commit: 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
> prerequisite-patch-id: c52673df9b6fc9ee001d6261c7ac107b618912a0
More information about the amd-gfx
mailing list