[PATCH] drm/amdgpu: Fix incorrect return value
Zhou1, Tao
Tao.Zhou1 at amd.com
Mon Apr 8 08:40:47 UTC 2024
[AMD Official Use Only - General]
> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai at amd.com>
> Sent: Sunday, April 7, 2024 10:21 AM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Candice
> <Candice.Li at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Yang,
> Stanley <Stanley.Yang at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value
>
> [AMD Official Use Only - General]
>
> -----------------
> Best Regards,
> Thomas
>
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Sent: Wednesday, April 3, 2024 6:36 PM
> To: Chai, Thomas <YiPeng.Chai at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Candice
> <Candice.Li at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>; Yang,
> Stanley <Stanley.Yang at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: Fix incorrect return value
>
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: Chai, Thomas <YiPeng.Chai at amd.com>
> > Sent: Wednesday, April 3, 2024 3:07 PM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Chai, Thomas <YiPeng.Chai at amd.com>; Zhang, Hawking
> > <Hawking.Zhang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Li, Candice
> > <Candice.Li at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>;
> > Yang, Stanley <Stanley.Yang at amd.com>; Chai, Thomas
> > <YiPeng.Chai at amd.com>
> > Subject: [PATCH] drm/amdgpu: Fix incorrect return value
> >
> > [Why]
> > After calling amdgpu_vram_mgr_reserve_range multiple times with the
> > same address, calling amdgpu_vram_mgr_query_page_status will always
> > return - EBUSY.
>
> >[Tao] could you explain why we call amdgpu_vram_mgr_reserve_range multiple
> times with the same address? IIRC, we skip duplicate address before reserve
> memory.
>
> [Thomas]
> When poison creation interrupt is received, since some poisoning addresses may
> have been allocated by some processes, reserving these memories will fail.
> These memory will be tried to reserve again after killing the poisoned process in
> the subsequent poisoning consumption interrupt handler.
> so amdgpu_vram_mgr_reserve_range needs to be called multiple times with the
> same address.
>
> > From the second call to amdgpu_vram_mgr_reserve_range, the same
> > address will be added to the reservations_pending list again and is
> > never moved to the reserved_pages list because the address had been
> reserved.
[Tao] but if a page is added to reservations_pending list, it should also be put in data->bps array, and when we call amdgpu_ras_add_bad_pages again, amdgpu_ras_check_bad_page_unlock could ignore this page.
So except for amdgpu_ras_add_bad_pages, would you like to call amdgpu_vram_mgr_reserve_range in other place?
> >
> > [How]
> > First add the address status check before calling
> > amdgpu_vram_mgr_do_reserve, if the address is already reserved, do
> > nothing; If the address is already in the reservations_pending list,
> > directly reserve memory; only add new nodes for the addresses that are
> > not in the reserved_pages list and reservations_pending list.
> >
> > Signed-off-by: YiPeng Chai <YiPeng.Chai at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28
> > +++++++++++++-------
> > 1 file changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > index 1e36c428d254..0bf3f4092900 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct
> > ttm_resource_manager *man)
> >
> > dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
> > rsv->start, rsv->size);
> > -
> > vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
> > atomic64_add(vis_usage, &mgr->vis_usage);
> > spin_lock(&man->bdev->lru_lock); @@ -340,19 +339,30 @@
> > int amdgpu_vram_mgr_reserve_range(struct
> > amdgpu_vram_mgr *mgr,
> > uint64_t start, uint64_t size) {
> > struct amdgpu_vram_reservation *rsv;
> > + int ret = 0;
> >
> > - rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> > - if (!rsv)
> > - return -ENOMEM;
> > + ret = amdgpu_vram_mgr_query_page_status(mgr, start);
> > + if (!ret)
> > + return 0;
> > +
> > + if (ret == -ENOENT) {
> > + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> > + if (!rsv)
> > + return -ENOMEM;
> >
> > - INIT_LIST_HEAD(&rsv->allocated);
> > - INIT_LIST_HEAD(&rsv->blocks);
> > + INIT_LIST_HEAD(&rsv->allocated);
> > + INIT_LIST_HEAD(&rsv->blocks);
> >
> > - rsv->start = start;
> > - rsv->size = size;
> > + rsv->start = start;
> > + rsv->size = size;
> > +
> > + mutex_lock(&mgr->lock);
> > + list_add_tail(&rsv->blocks, &mgr->reservations_pending);
> > + mutex_unlock(&mgr->lock);
> > +
> > + }
> >
> > mutex_lock(&mgr->lock);
> > - list_add_tail(&rsv->blocks, &mgr->reservations_pending);
> > amdgpu_vram_mgr_do_reserve(&mgr->manager);
> > mutex_unlock(&mgr->lock);
> >
> > --
> > 2.34.1
>
>
More information about the amd-gfx
mailing list