[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
Christian König
ckoenig.leichtzumerken at gmail.com
Wed May 15 07:04:52 UTC 2019
Hi Prike,
No, that can lead to massive problems in a real OOM situation and is not
something we can do here.
Christian.
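
One way to see the concern with the quoted patch: its `goto retry` has no exit condition, so if ttm_bo_validate() keeps returning -ENOMEM the loop never ends. The sketch below is a userspace toy, not kernel code. `fake_mem`, `fake_validate` and `pin_with_bounded_retry` are hypothetical names made up for illustration, and the stub only mimics the return-value convention of ttm_bo_validate(). It shows nothing more than that a retry loop needs a bound to terminate when the shortage is real; it is not a suggestion that a bounded retry is the right fix for the pinning issue.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical memory model: validation fails with -ENOMEM until
 * `free_after` calls have been made. A negative `free_after` models a
 * real OOM situation where memory never becomes available. */
struct fake_mem {
	int calls;
	int free_after;
};

/* Hypothetical stand-in for ttm_bo_validate(): returns 0 or -ENOMEM. */
static int fake_validate(struct fake_mem *m)
{
	m->calls++;
	if (m->free_after >= 0 && m->calls > m->free_after)
		return 0;
	return -ENOMEM;
}

/* Retry on -ENOMEM, but give up after max_tries attempts. With the
 * unbounded `goto retry` from the quoted patch, the real-OOM case
 * would spin here forever instead of returning an error. */
static int pin_with_bounded_retry(struct fake_mem *m, int max_tries)
{
	int r = -ENOMEM;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		r = fake_validate(m);
		if (r != -ENOMEM)
			break;
	}
	return r;
}
```

With `free_after = 2` the third attempt succeeds. With `free_after = -1` every attempt fails and the caller gets -ENOMEM back after `max_tries` calls, which is the case the unbounded version cannot handle.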
On 15.05.19 at 04:00, Liang, Prike wrote:
>
> Hi Christian,
>
> I wonder whether, when we hit an ENOMEM error while pinning amdgpu BOs, we
> can retry the validation as shown below.
>
> With the following simple patch, the Abaqus pinning issue is no longer observed.
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 11cbf63..72a32f5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
>  		bo->placements[i].lpfn = lpfn;
>  		bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
>  	}
> -
> +retry:
>  	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>  	if (unlikely(r)) {
> -		dev_err(adev->dev, "%p pin failed\n", bo);
> -		goto error;
> +		if (r == -ENOMEM){
> +			goto retry;
> +		} else {
> +			dev_err(adev->dev, "%p pin failed\n", bo);
> +			goto error;
> +		}
>  	}
>
>  	bo->pin_count = 1;
>
> Thanks,
>
> Prike
>
> *From:* Marek Olšák <maraeo at gmail.com>
> *Sent:* Wednesday, May 15, 2019 3:33 AM
> *To:* Christian König <ckoenig.leichtzumerken at gmail.com>
> *Cc:* Zhou, David(ChunMing) <David1.Zhou at amd.com>; Liang, Prike
> <Prike.Liang at amd.com>; dri-devel <dri-devel at lists.freedesktop.org>;
> amd-gfx mailing list <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the
> LRU during CS
>
> This series fixes the OOM errors. However, if I torture the kernel
> driver more, I can get it to deadlock and end up with unkillable
> processes. I can also get an OOM error. I just ran the test 5 times:
>
> AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears &
> AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears &
> AMD_DEBUG=testgdsmm glxgears
>
> Marek
>
> On Tue, May 14, 2019 at 8:31 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>
> This avoids OOM situations when we have lots of threads
> submitting at the same time.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index fff558cf385b..f9240a94217b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>  	}
>
>  	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
> -				   &duplicates, true);
> +				   &duplicates, false);
>  	if (unlikely(r != 0)) {
>  		if (r != -ERESTARTSYS)
>  			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
> --
> 2.17.1
>