<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 2019-12-19 21:34, Liu, Shaoyun
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:DM6PR12MB32417F69354B176C28AB3ACBF42D0@DM6PR12MB3241.namprd12.prod.outlook.com">
      
      Will it looks cleaner if we keep a pre_reset flag in per device
      structure and check it in the function when talk to hw?<br>
    </blockquote>
    <p>I was briefly considering that when I saw how many function
      needed that pre_reset flag. But this could lead to race conditions
      with other threads doing evictions concurrently (e.g. MMU
      notifiers). I think we want concurrent evictions to hang until the
      GPU reset is done, not just skip the actual eviction quietly.<br>
    </p>
    <p>Hmm, I need to think more about the interaction between evictions
      and resets.</p>
    <p>Regards,<br>
        Felix<br>
    </p>
    <blockquote type="cite" cite="mid:DM6PR12MB32417F69354B176C28AB3ACBF42D0@DM6PR12MB3241.namprd12.prod.outlook.com"><br>
      Regards<br>
      Shaoyun.liu
      <hr style="display:inline-block;width:98%" tabindex="-1">
      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
          Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a><br>
          <b>Sent:</b> December 19, 2019 9:21:08 PM<br>
          <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
          <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a>; Liu, Monk
          <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Liu, Shaoyun
          <a class="moz-txt-link-rfc2396E" href="mailto:Shaoyun.Liu@amd.com"><Shaoyun.Liu@amd.com></a>; Grodzovsky, Andrey
          <a class="moz-txt-link-rfc2396E" href="mailto:Andrey.Grodzovsky@amd.com"><Andrey.Grodzovsky@amd.com></a><br>
          <b>Subject:</b> Re: [PATCH 1/1] drm/amdkfd: Don't touch the
          hardware in pre_reset callback</font>
        <div> </div>
      </div>
      <div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
            <div class="PlainText">[+Andrey]<br>
              <br>
              Hi Shaoyun, Monk, and Andrey,<br>
              <br>
              I tested this on my bare-metal Vega10 system. GPU reset
              (using BACO) is <br>
              flaky on this system with and without this patch. The
              first reset seems <br>
              to work OK, the second one fails in different ways. In
              theory this <br>
              change should be an improvement as it eliminated at least
              one failure <br>
              mode (hanging before the actual reset). A least I don't
              think I'm making <br>
              anything worse.<br>
              <br>
              My test is to run KFDTest and trigger a GPU reset while
              it's busy with <br>
              the eviction test. This test runs multiple KFD processes
              and also <br>
              performs graphics memory allocations and command
              submissions to force <br>
              TTM to evict KFD processes. I trigger the reset by hanging
              the HWS <br>
              through a debugfs hook:<br>
              <br>
                   # cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id >
              <br>
              /sys/kernel/debug/kfd/hang_hws<br>
              <br>
              Please give this patch a try and reinstate the pre_reset
              call before the <br>
              FLR for SRIOV.<br>
              <br>
              Regards,<br>
                 Felix<br>
              <br>
              On 2019-12-19 9:09 p.m., Felix Kuehling wrote:<br>
              > The intention of the pre_reset callback is to update
              the driver<br>
              > state to reflect that all user mode queues are
              preempted and the<br>
              > HIQ is destroyed. However we should not actually
              preempt any queues<br>
              > or otherwise touch the hardware because it is
              presumably hanging.<br>
              > The impending reset will take care of actually
              stopping all queues.<br>
              ><br>
              > This should prevent KIQ register access hanging on
              SRIOV function<br>
              > level reset (FLR). It should also speed up the reset
              by avoiding<br>
              > unnecessary timeouts from a potentially hanging GPU
              scheduler.<br>
              ><br>
              > CC: shaoyunl <a class="moz-txt-link-rfc2396E" href="mailto:shaoyun.liu@amd.com"><shaoyun.liu@amd.com></a><br>
              > CC: Liu Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a><br>
              > Signed-off-by: Felix Kuehling
              <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a><br>
              > ---<br>
              >   drivers/gpu/drm/amd/amdkfd/kfd_device.c       | 24
              ++++++++++-------<br>
              >   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 27
              ++++++++++++-------<br>
              >   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  5
              ++--<br>
              >   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  8
              +++---<br>
              >   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  4
              +--<br>
              >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  8
              +++---<br>
              >   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 11
              ++++----<br>
              >   .../amd/amdkfd/kfd_process_queue_manager.c    |  2
              +-<br>
              >   8 files changed, 53 insertions(+), 36 deletions(-)<br>
              ><br>
              > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_device.c<br>
              > index c6b6901bbda3..796996a0d926 100644<br>
              > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c<br>
              > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c<br>
              > @@ -486,6 +486,7 @@ static int kfd_gtt_sa_init(struct
              kfd_dev *kfd, unsigned int buf_size,<br>
              >                                unsigned int
              chunk_size);<br>
              >   static void kfd_gtt_sa_fini(struct kfd_dev *kfd);<br>
              >   <br>
              > +static void kfd_suspend(struct kfd_dev *kfd, bool
              pre_reset);<br>
              >   static int kfd_resume(struct kfd_dev *kfd);<br>
              >   <br>
              >   struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,<br>
              > @@ -728,7 +729,7 @@ int kgd2kfd_pre_reset(struct
              kfd_dev *kfd)<br>
              >   {<br>
              >        if (!kfd->init_complete)<br>
              >                return 0;<br>
              > -     kgd2kfd_suspend(kfd);<br>
              > +     kfd_suspend(kfd, true);<br>
              >   <br>
              >        kfd_signal_reset_event(kfd);<br>
              >        return 0;<br>
              > @@ -767,13 +768,7 @@ void kgd2kfd_suspend(struct
              kfd_dev *kfd)<br>
              >        if (!kfd->init_complete)<br>
              >                return;<br>
              >   <br>
              > -     /* For first KFD device suspend all the KFD
              processes */<br>
              > -     if (atomic_inc_return(&kfd_locked) == 1)<br>
              > -             kfd_suspend_all_processes();<br>
              > -<br>
              > -     kfd->dqm->ops.stop(kfd->dqm);<br>
              > -<br>
              > -     kfd_iommu_suspend(kfd);<br>
              > +     kfd_suspend(kfd, false);<br>
              >   }<br>
              >   <br>
              >   int kgd2kfd_resume(struct kfd_dev *kfd)<br>
              > @@ -795,6 +790,17 @@ int kgd2kfd_resume(struct
              kfd_dev *kfd)<br>
              >        return ret;<br>
              >   }<br>
              >   <br>
              > +static void kfd_suspend(struct kfd_dev *kfd, bool
              pre_reset)<br>
              > +{<br>
              > +     /* For first KFD device suspend all the KFD
              processes */<br>
              > +     if (atomic_inc_return(&kfd_locked) == 1)<br>
              > +             kfd_suspend_all_processes(pre_reset);<br>
              > +<br>
              > +     kfd->dqm->ops.stop(kfd->dqm,
              pre_reset);<br>
              > +<br>
              > +     kfd_iommu_suspend(kfd);<br>
              > +}<br>
              > +<br>
              >   static int kfd_resume(struct kfd_dev *kfd)<br>
              >   {<br>
              >        int err = 0;<br>
              > @@ -877,7 +883,7 @@ int kgd2kfd_quiesce_mm(struct
              mm_struct *mm)<br>
              >        if (!p)<br>
              >                return -ESRCH;<br>
              >   <br>
              > -     r = kfd_process_evict_queues(p);<br>
              > +     r = kfd_process_evict_queues(p, false);<br>
              >   <br>
              >        kfd_unref_process(p);<br>
              >        return r;<br>
              > diff --git
              a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
              > index f7f6df40875e..3a49944164da 100644<br>
              > ---
              a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
              > +++
              b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
              > @@ -592,7 +592,8 @@ static int update_queue(struct
              device_queue_manager *dqm, struct queue *q)<br>
              >   }<br>
              >   <br>
              >   static int evict_process_queues_nocpsch(struct
              device_queue_manager *dqm,<br>
              > -                                     struct
              qcm_process_device *qpd)<br>
              > +                                     struct
              qcm_process_device *qpd,<br>
              > +                                     bool pre_reset)<br>
              >   {<br>
              >        struct queue *q;<br>
              >        struct mqd_manager *mqd_mgr;<br>
              > @@ -622,6 +623,8 @@ static int
              evict_process_queues_nocpsch(struct device_queue_manager
              *dqm,<br>
              >   <br>
              >                if (WARN_ONCE(!dqm->sched_running,
              "Evict when stopped\n"))<br>
              >                        continue;<br>
              > +             if (pre_reset)<br>
              > +                     continue;<br>
              >   <br>
              >                retval =
              mqd_mgr->destroy_mqd(mqd_mgr, q->mqd,<br>
              >                               
              KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,<br>
              > @@ -639,7 +642,8 @@ static int
              evict_process_queues_nocpsch(struct device_queue_manager
              *dqm,<br>
              >   }<br>
              >   <br>
              >   static int evict_process_queues_cpsch(struct
              device_queue_manager *dqm,<br>
              > -                                   struct
              qcm_process_device *qpd)<br>
              > +                                   struct
              qcm_process_device *qpd,<br>
              > +                                   bool pre_reset)<br>
              >   {<br>
              >        struct queue *q;<br>
              >        struct kfd_process_device *pdd;<br>
              > @@ -664,6 +668,10 @@ static int
              evict_process_queues_cpsch(struct device_queue_manager
              *dqm,<br>
              >                q->properties.is_active = false;<br>
              >                dqm->queue_count--;<br>
              >        }<br>
              > +<br>
              > +     if (pre_reset)<br>
              > +             goto out;<br>
              > +<br>
              >        retval = execute_queues_cpsch(dqm,<br>
              >                                qpd->is_debug ?<br>
              >                               
              KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :<br>
              > @@ -944,10 +952,10 @@ static int start_nocpsch(struct
              device_queue_manager *dqm)<br>
              >        return 0;<br>
              >   }<br>
              >   <br>
              > -static int stop_nocpsch(struct device_queue_manager
              *dqm)<br>
              > +static int stop_nocpsch(struct device_queue_manager
              *dqm, bool pre_reset)<br>
              >   {<br>
              >        if
              (dqm->dev->device_info->asic_family ==
              CHIP_HAWAII)<br>
              > -             pm_uninit(&dqm->packets);<br>
              > +             pm_uninit(&dqm->packets,
              pre_reset);<br>
              >        dqm->sched_running = false;<br>
              >   <br>
              >        return 0;<br>
              > @@ -1107,20 +1115,21 @@ static int start_cpsch(struct
              device_queue_manager *dqm)<br>
              >        return 0;<br>
              >   fail_allocate_vidmem:<br>
              >   fail_set_sched_resources:<br>
              > -     pm_uninit(&dqm->packets);<br>
              > +     pm_uninit(&dqm->packets, false);<br>
              >   fail_packet_manager_init:<br>
              >        return retval;<br>
              >   }<br>
              >   <br>
              > -static int stop_cpsch(struct device_queue_manager
              *dqm)<br>
              > +static int stop_cpsch(struct device_queue_manager
              *dqm, bool pre_reset)<br>
              >   {<br>
              >        dqm_lock(dqm);<br>
              > -     unmap_queues_cpsch(dqm,
              KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);<br>
              > +     if (!pre_reset)<br>
              > +             unmap_queues_cpsch(dqm,
              KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);<br>
              >        dqm->sched_running = false;<br>
              >        dqm_unlock(dqm);<br>
              >   <br>
              >        kfd_gtt_sa_free(dqm->dev,
              dqm->fence_mem);<br>
              > -     pm_uninit(&dqm->packets);<br>
              > +     pm_uninit(&dqm->packets, pre_reset);<br>
              >   <br>
              >        return 0;<br>
              >   }<br>
              > @@ -1891,7 +1900,7 @@ int kfd_process_vm_fault(struct
              device_queue_manager *dqm,<br>
              >                return -EINVAL;<br>
              >        pdd = kfd_get_process_device_data(dqm->dev,
              p);<br>
              >        if (pdd)<br>
              > -             ret =
              dqm->ops.evict_process_queues(dqm, &pdd->qpd);<br>
              > +             ret =
              dqm->ops.evict_process_queues(dqm, &pdd->qpd,
              false);<br>
              >        kfd_unref_process(p);<br>
              >   <br>
              >        return ret;<br>
              > diff --git
              a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
              b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h<br>
              > index a8c37e6da027..9f82f95f6a92 100644<br>
              > ---
              a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h<br>
              > +++
              b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h<br>
              > @@ -103,7 +103,7 @@ struct device_queue_manager_ops {<br>
              >   <br>
              >        int     (*initialize)(struct
              device_queue_manager *dqm);<br>
              >        int     (*start)(struct device_queue_manager
              *dqm);<br>
              > -     int     (*stop)(struct device_queue_manager
              *dqm);<br>
              > +     int     (*stop)(struct device_queue_manager
              *dqm, bool pre_reset);<br>
              >        void    (*uninitialize)(struct
              device_queue_manager *dqm);<br>
              >        int     (*create_kernel_queue)(struct
              device_queue_manager *dqm,<br>
              >                                        struct
              kernel_queue *kq,<br>
              > @@ -129,7 +129,8 @@ struct device_queue_manager_ops {<br>
              >                        struct qcm_process_device
              *qpd);<br>
              >   <br>
              >        int (*evict_process_queues)(struct
              device_queue_manager *dqm,<br>
              > -                                 struct
              qcm_process_device *qpd);<br>
              > +                                 struct
              qcm_process_device *qpd,<br>
              > +                                 bool pre_reset);<br>
              >        int (*restore_process_queues)(struct
              device_queue_manager *dqm,<br>
              >                                      struct
              qcm_process_device *qpd);<br>
              >   <br>
              > diff --git
              a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c<br>
              > index 2d56dc534459..dbd2e8e9ae69 100644<br>
              > --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c<br>
              > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c<br>
              > @@ -195,9 +195,9 @@ static bool kq_initialize(struct
              kernel_queue *kq, struct kfd_dev *dev,<br>
              >   }<br>
              >   <br>
              >   /* Uninitialize a kernel queue and free all its
              memory usages. */<br>
              > -static void kq_uninitialize(struct kernel_queue *kq)<br>
              > +static void kq_uninitialize(struct kernel_queue *kq,
              bool pre_reset)<br>
              >   {<br>
              > -     if (kq->queue->properties.type ==
              KFD_QUEUE_TYPE_HIQ)<br>
              > +     if (kq->queue->properties.type ==
              KFD_QUEUE_TYPE_HIQ && !pre_reset)<br>
              >               
              kq->mqd_mgr->destroy_mqd(kq->mqd_mgr,<br>
              >                                       
              kq->queue->mqd,<br>
              >                                       
              KFD_PREEMPT_TYPE_WAVEFRONT_RESET,<br>
              > @@ -337,9 +337,9 @@ struct kernel_queue
              *kernel_queue_init(struct kfd_dev *dev,<br>
              >        return NULL;<br>
              >   }<br>
              >   <br>
              > -void kernel_queue_uninit(struct kernel_queue *kq)<br>
              > +void kernel_queue_uninit(struct kernel_queue *kq,
              bool pre_reset)<br>
              >   {<br>
              > -     kq_uninitialize(kq);<br>
              > +     kq_uninitialize(kq, pre_reset);<br>
              >        kfree(kq);<br>
              >   }<br>
              >   <br>
              > diff --git
              a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
              > index 6cabed06ef5d..7732a3bbebd6 100644<br>
              > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
              > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
              > @@ -264,10 +264,10 @@ int pm_init(struct
              packet_manager *pm, struct device_queue_manager *dqm)<br>
              >        return 0;<br>
              >   }<br>
              >   <br>
              > -void pm_uninit(struct packet_manager *pm)<br>
              > +void pm_uninit(struct packet_manager *pm, bool
              pre_reset)<br>
              >   {<br>
              >        mutex_destroy(&pm->lock);<br>
              > -     kernel_queue_uninit(pm->priv_queue);<br>
              > +     kernel_queue_uninit(pm->priv_queue,
              pre_reset);<br>
              >   }<br>
              >   <br>
              >   int pm_send_set_resources(struct packet_manager
              *pm,<br>
              > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
              b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
              > index 087e96838997..9bc4ced4acba 100644<br>
              > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
              > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
              > @@ -762,9 +762,9 @@ struct kfd_process
              *kfd_get_process(const struct task_struct *);<br>
              >   struct kfd_process
              *kfd_lookup_process_by_pasid(unsigned int pasid);<br>
              >   struct kfd_process *kfd_lookup_process_by_mm(const
              struct mm_struct *mm);<br>
              >   void kfd_unref_process(struct kfd_process *p);<br>
              > -int kfd_process_evict_queues(struct kfd_process *p);<br>
              > +int kfd_process_evict_queues(struct kfd_process *p,
              bool pre_reset);<br>
              >   int kfd_process_restore_queues(struct kfd_process
              *p);<br>
              > -void kfd_suspend_all_processes(void);<br>
              > +void kfd_suspend_all_processes(bool pre_reset);<br>
              >   int kfd_resume_all_processes(void);<br>
              >   <br>
              >   int kfd_process_device_init_vm(struct
              kfd_process_device *pdd,<br>
              > @@ -883,7 +883,7 @@ struct device_queue_manager
              *device_queue_manager_init(struct kfd_dev *dev);<br>
              >   void device_queue_manager_uninit(struct
              device_queue_manager *dqm);<br>
              >   struct kernel_queue *kernel_queue_init(struct
              kfd_dev *dev,<br>
              >                                        enum
              kfd_queue_type type);<br>
              > -void kernel_queue_uninit(struct kernel_queue *kq);<br>
              > +void kernel_queue_uninit(struct kernel_queue *kq,
              bool pre_reset);<br>
              >   int kfd_process_vm_fault(struct
              device_queue_manager *dqm, unsigned int pasid);<br>
              >   <br>
              >   /* Process Queue Manager */<br>
              > @@ -974,7 +974,7 @@ extern const struct
              packet_manager_funcs kfd_vi_pm_funcs;<br>
              >   extern const struct packet_manager_funcs
              kfd_v9_pm_funcs;<br>
              >   <br>
              >   int pm_init(struct packet_manager *pm, struct
              device_queue_manager *dqm);<br>
              > -void pm_uninit(struct packet_manager *pm);<br>
              > +void pm_uninit(struct packet_manager *pm, bool
              pre_reset);<br>
              >   int pm_send_set_resources(struct packet_manager
              *pm,<br>
              >                                struct
              scheduling_resources *res);<br>
              >   int pm_send_runlist(struct packet_manager *pm,
              struct list_head *dqm_queues);<br>
              > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_process.c<br>
              > index 536a153ac9a4..7bcadd3a1e3b 100644<br>
              > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c<br>
              > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c<br>
              > @@ -948,7 +948,7 @@ struct kfd_process
              *kfd_lookup_process_by_mm(const struct mm_struct *mm)<br>
              >    * Eviction is reference-counted per
              process-device. This means multiple<br>
              >    * evictions from different sources can be nested
              safely.<br>
              >    */<br>
              > -int kfd_process_evict_queues(struct kfd_process *p)<br>
              > +int kfd_process_evict_queues(struct kfd_process *p,
              bool pre_reset)<br>
              >   {<br>
              >        struct kfd_process_device *pdd;<br>
              >        int r = 0;<br>
              > @@ -956,7 +956,8 @@ int
              kfd_process_evict_queues(struct kfd_process *p)<br>
              >   <br>
              >        list_for_each_entry(pdd,
              &p->per_device_data, per_device_list) {<br>
              >                r =
              pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,<br>
              >
              -                                                        
              &pdd->qpd);<br>
              >
              +                                                        
              &pdd->qpd,<br>
              >
              +                                                        
              pre_reset);<br>
              >                if (r) {<br>
              >                        pr_err("Failed to evict
              process queues\n");<br>
              >                        goto fail;<br>
              > @@ -1026,7 +1027,7 @@ static void
              evict_process_worker(struct work_struct *work)<br>
              >        flush_delayed_work(&p->restore_work);<br>
              >   <br>
              >        pr_debug("Started evicting pasid 0x%x\n",
              p->pasid);<br>
              > -     ret = kfd_process_evict_queues(p);<br>
              > +     ret = kfd_process_evict_queues(p, false);<br>
              >        if (!ret) {<br>
              >                dma_fence_signal(p->ef);<br>
              >                dma_fence_put(p->ef);<br>
              > @@ -1082,7 +1083,7 @@ static void
              restore_process_worker(struct work_struct *work)<br>
              >                pr_err("Failed to restore queues of
              pasid 0x%x\n", p->pasid);<br>
              >   }<br>
              >   <br>
              > -void kfd_suspend_all_processes(void)<br>
              > +void kfd_suspend_all_processes(bool pre_reset)<br>
              >   {<br>
              >        struct kfd_process *p;<br>
              >        unsigned int temp;<br>
              > @@ -1092,7 +1093,7 @@ void
              kfd_suspend_all_processes(void)<br>
              >               
              cancel_delayed_work_sync(&p->eviction_work);<br>
              >               
              cancel_delayed_work_sync(&p->restore_work);<br>
              >   <br>
              > -             if (kfd_process_evict_queues(p))<br>
              > +             if (kfd_process_evict_queues(p,
              pre_reset))<br>
              >                        pr_err("Failed to suspend
              process 0x%x\n", p->pasid);<br>
              >                dma_fence_signal(p->ef);<br>
              >                dma_fence_put(p->ef);<br>
              > diff --git
              a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
              b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c<br>
              > index d3eacf72e8db..8fa856e6a03f 100644<br>
              > ---
              a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c<br>
              > +++
              b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c<br>
              > @@ -374,7 +374,7 @@ int pqm_destroy_queue(struct
              process_queue_manager *pqm, unsigned int qid)<br>
              >                /* destroy kernel queue (DIQ) */<br>
              >                dqm = pqn->kq->dev->dqm;<br>
              >                dqm->ops.destroy_kernel_queue(dqm,
              pqn->kq, &pdd->qpd);<br>
              > -             kernel_queue_uninit(pqn->kq);<br>
              > +             kernel_queue_uninit(pqn->kq, false);<br>
              >        }<br>
              >   <br>
              >        if (pqn->q) {<br>
            </div>
          </span></font></div>
    </blockquote>
  </body>
</html>