[PATCH 4/6] drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier

Oded Gabbay oded.gabbay at gmail.com
Mon May 14 17:20:09 UTC 2018


Cool, thanks!

On Mon, 14 May 2018, 19:07 Felix Kuehling <felix.kuehling at amd.com> wrote:

> On 2018-05-11 03:59 AM, Oded Gabbay wrote:
> > On Fri, Mar 23, 2018 at 10:32 PM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
> >> When an MMU notifier runs in memory reclaim context, it can deadlock
> >> trying to take locks that are already held in the thread causing the
> >> memory reclaim. The solution is to avoid memory reclaim while holding
> >> locks that are taken in MMU notifiers by using GFP_NOIO.
> > Which locks are problematic?
>
> The only lock I need to take in our MMU notifier is the DQM lock.
>
> >
> > The kernel recommendation is to use "memalloc_noio_{save,restore} to
> > mark the whole scope which cannot perform any IO with a short
> > explanation why"
>
> Yeah. Looking at it more, I think the correct one to use is actually
> memalloc_nofs_{save,restore}.
>
> >
> > By using the scope functions, you protect against future allocation
> > code that will be written in the critical path, without worrying about
> > the developer using the correct GFP_NOIO flag.
>
> Yes. Last time I looked into this it was broken and didn't properly
> handle kmalloc allocations. It looks like this was fixed by this commit:
>
>     commit 6d7225f0cc1a1fc32cf5dd01b4ab4b8a34c7cdb4
>     Author: Nikolay Borisov <nborisov at suse.com>
>     Date:   Wed May 3 14:53:05 2017 -0700
>
>         lockdep: teach lockdep about memalloc_noio_save
>
>
> Later NOFS was introduced, which is now used by the lockdep checker to
> detect reclaim deadlocks.
>
> Regards,
>   Felix
>
> >
> > Oded
> >
> >> This commit fixes memory allocations done while holding the dqm->lock
> >> which is needed in the MMU notifier (dqm->ops.evict_process_queues).
> >>
> >> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdkfd/kfd_device.c          | 2 +-
> >>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 2 +-
> >>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c  | 2 +-
> >>  3 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> index 334669996..0434f65 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >> @@ -652,7 +652,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
> >>         if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
> >>                 return -ENOMEM;
> >>
> >> -       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
> >> +       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> >>         if ((*mem_obj) == NULL)
> >>                 return -ENOMEM;
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> index c00c325..2bc49c6 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> >> @@ -412,7 +412,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
> >>         if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
> >>                 return NULL;
> >>
> >> -       mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
> >> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
> >>         if (!mqd)
> >>                 return NULL;
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> index 89e4242..481307b 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> >> @@ -394,7 +394,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
> >>         if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
> >>                 return NULL;
> >>
> >> -       mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
> >> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
> >>         if (!mqd)
> >>                 return NULL;
> >>
> >> --
> >> 2.7.4
> >>
>
>
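
For readers following the thread, the scoped approach discussed above (memalloc_noio_{save,restore} instead of per-call GFP_NOIO) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every TOY_* constant and toy_* function below is hypothetical, the bit values are made up, and the single global stands in for the per-task flags word. It is not the kernel implementation; it only mimics the save/mask/restore pattern so the nesting behaviour is easy to see.

```c
#include <assert.h>

/* Toy stand-ins for GFP bits; names and values are illustrative only. */
#define TOY_GFP_IO      0x1u
#define TOY_GFP_FS      0x2u
#define TOY_GFP_RECLAIM 0x4u
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOIO    (TOY_GFP_RECLAIM)

/* Toy stand-in for the per-task "no IO allocations in this scope" bit. */
#define TOY_PF_MEMALLOC_NOIO 0x1u

/* Models the task flags word of the current thread (assumption: one thread). */
static unsigned int toy_task_flags;

/* Enter a NOIO scope; returns the previous state so scopes can nest. */
static unsigned int toy_noio_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_MEMALLOC_NOIO;

	toy_task_flags |= TOY_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO scope by putting back whatever state was saved on entry. */
static void toy_noio_restore(unsigned int old)
{
	/* The saved token may only carry the NOIO bit. */
	assert((old & ~TOY_PF_MEMALLOC_NOIO) == 0);
	toy_task_flags = (toy_task_flags & ~TOY_PF_MEMALLOC_NOIO) | old;
}

/*
 * What an allocator would do with the caller's flags: inside a NOIO scope
 * the IO and FS bits are stripped, whatever the caller asked for.
 */
static unsigned int toy_effective_gfp(unsigned int gfp)
{
	if (toy_task_flags & TOY_PF_MEMALLOC_NOIO)
		gfp &= ~(TOY_GFP_IO | TOY_GFP_FS);
	return gfp;
}
```

In this model, a TOY_GFP_KERNEL request made between toy_noio_save() and toy_noio_restore() is treated as TOY_GFP_NOIO, which is the property Oded points out: allocation code added later inside the critical section is covered without each call site having to pick the right flag. In the driver, the analogous scope would presumably span the region where dqm->lock is held, using whichever of the noio/nofs variants turns out to be appropriate.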