[PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"
Alex Deucher
alexdeucher at gmail.com
Wed Jan 3 18:50:54 UTC 2024
Applied. Thanks!
Alex
On Wed, Jan 3, 2024 at 10:33 AM Jay Cornwall <jay.cornwall at amd.com> wrote:
>
> On 1/3/2024 09:19, Alex Deucher wrote:
> > + Jay, Felix
> >
> > On Wed, Jan 3, 2024 at 5:16 AM Kaibo Ma <ent3rm4n at gmail.com> wrote:
> >>
> >> That commit causes NULL pointer dereferences in dmesg when
> >> running applications using ROCm, including clinfo, blender,
> >> and PyTorch, since v6.6.1. Revert it to fix blender again.
> >>
> >> This reverts commit 96c211f1f9ef82183493f4ceed4e347b52849149.
> >>
> >> Closes: https://github.com/ROCm/ROCm/issues/2596
> >> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2991
> >> Signed-off-by: Kaibo Ma <ent3rm4n at gmail.com>
> >> ---
> >> drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 26 ++++++++++----------
> >> 1 file changed, 13 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> >> index 62b205dac..6604a3f99 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
> >> @@ -330,12 +330,6 @@ static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
> >> pdd->gpuvm_limit =
> >> pdd->dev->kfd->shared_resources.gpuvm_size - 1;
> >>
> >> - /* dGPUs: the reserved space for kernel
> >> - * before SVM
> >> - */
> >> - pdd->qpd.cwsr_base = SVM_CWSR_BASE;
> >> - pdd->qpd.ib_base = SVM_IB_BASE;
> >> -
> >> pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
> >> pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
> >> }
> >> @@ -345,18 +339,18 @@ static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
> >> pdd->lds_base = MAKE_LDS_APP_BASE_V9();
> >> pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
> >>
> >> - pdd->gpuvm_base = PAGE_SIZE;
> >> + /* Raven needs SVM to support graphic handle, etc. Leave the small
> >> + * reserved space before SVM on Raven as well, even though we don't
> >> + * have to.
> >> + * Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
> >> + * are used in Thunk to reserve SVM.
> >> + */
> >> + pdd->gpuvm_base = SVM_USER_BASE;
> >> pdd->gpuvm_limit =
> >> pdd->dev->kfd->shared_resources.gpuvm_size - 1;
> >>
> >> pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
> >> pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
> >> -
> >> - /*
> >> - * Place TBA/TMA on opposite side of VM hole to prevent
> >> - * stray faults from triggering SVM on these pages.
> >> - */
> >> - pdd->qpd.cwsr_base = pdd->dev->kfd->shared_resources.gpuvm_size;
> >> }
> >>
> >> int kfd_init_apertures(struct kfd_process *process)
> >> @@ -413,6 +407,12 @@ int kfd_init_apertures(struct kfd_process *process)
> >> return -EINVAL;
> >> }
> >> }
> >> +
> >> + /* dGPUs: the reserved space for kernel
> >> + * before SVM
> >> + */
> >> + pdd->qpd.cwsr_base = SVM_CWSR_BASE;
> >> + pdd->qpd.ib_base = SVM_IB_BASE;
> >> }
> >>
> >> dev_dbg(kfd_device, "node id %u\n", id);
> >> --
> >> 2.42.0
> >>
>
> I saw a segfault issue in Mesa yesterday. Not sure about the others, but I don't know how to make this change while maintaining compatibility with older UMDs.
>
> So I agree, let's revert it.
>
> Reviewed-by: Jay Cornwall <jay.cornwall at amd.com>
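
For readers following the diff, here is a minimal standalone sketch of the two address layouts it switches between on the v9 path. It is not kernel code. The 4 KiB PAGE_SIZE and the SVM_* macro definitions are assumptions modelled on the kernel headers rather than taken from this patch, and struct kfd_layout and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel takes PAGE_SIZE from the
 * architecture and defines the SVM_* macros in its own headers. */
#define PAGE_SIZE             4096ULL
#define KFD_CWSR_TBA_TMA_SIZE (2 * PAGE_SIZE)
#define SVM_USER_BASE         (KFD_CWSR_TBA_TMA_SIZE + 2 * PAGE_SIZE)
#define SVM_CWSR_BASE         (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)

/* Hypothetical stand-in for the few fields the patch touches. */
struct kfd_layout {
	uint64_t gpuvm_base;  /* first address handed to user SVM */
	uint64_t gpuvm_limit; /* last address of the lower GPUVM range */
	uint64_t cwsr_base;   /* where the TBA/TMA pages are placed */
};

/* Layout restored by the revert: TBA/TMA sit in the small reserved
 * region just below the user SVM base. */
static struct kfd_layout layout_after_revert(uint64_t gpuvm_size)
{
	struct kfd_layout l;

	assert(gpuvm_size > SVM_USER_BASE);
	l.gpuvm_base = SVM_USER_BASE;
	l.gpuvm_limit = gpuvm_size - 1;
	l.cwsr_base = SVM_CWSR_BASE;
	return l;
}

/* Layout from the reverted commit: user SVM starts at PAGE_SIZE and
 * TBA/TMA move to the first address past gpuvm_limit. */
static struct kfd_layout layout_before_revert_v9(uint64_t gpuvm_size)
{
	struct kfd_layout l;

	assert(gpuvm_size > PAGE_SIZE);
	l.gpuvm_base = PAGE_SIZE;
	l.gpuvm_limit = gpuvm_size - 1;
	l.cwsr_base = gpuvm_size;
	return l;
}
```

Under these assumptions the restored layout keeps cwsr_base below gpuvm_base, while the reverted commit put it above gpuvm_limit. That matches the "opposite side of VM hole" wording in the removed comment, and it is presumably the placement that older UMDs do not expect.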