On 30.10.23 16:16, Felix Kuehling wrote:
> On 2023-10-30 4:23, Christian König wrote:
>> On 28.10.23 00:39, Felix Kuehling wrote:
        <blockquote type="cite">Make restore workers freezable so we
          don't have to explicitly flush them <br>
          in suspend and GPU reset code paths, and we don't accidentally
          try to <br>
          restore BOs while the GPU is suspended. Not having to flush
          restore_work <br>
          also helps avoid lock/fence dependencies in the GPU reset case
          where we're <br>
          not allowed to wait for fences. <br>
          <br>
          This is an RFC and request for testing. I have not tested this
          myself yet. <br>
          My notes below: <br>
          <br>
          Restore work won't be frozen during GPU reset. Does it matter?
          Queues will <br>
          stay evicted until resume in any case. But restore_work may be
          in trouble <br>
          if it gets run in the middle of a GPU reset. In that case, if
          anything <br>
          fails, it will just reschedule itself, so should be fine as
          long as it <br>
          doesn't interfere with the reset itself (respects any
          mechanisms in place <br>
          to prevent HW access during the reset). <br>
          <br>
          What HW access does restore_work perform: <br>
          - migrating buffers: uses GPU scheduler, should be suspended
          during reset <br>
          - TLB flushes in userptr restore worker: not called directly,
          relies on <br>
             HWS to flush TLBs on VMID attachment <br>
          - TLB flushes in SVM restore worker: Does TLB flush in the
          mapping code <br>
          - Resuming user mode queues: should not happen while GPU reset
          keeps queue <br>
             eviction counter elevated <br>
          Ergo: Except for the SVM case, it's OK to not flush restore
          work before <br>
          GPU resets. I'll need to rethink the interaction of SVM
          restore_work with <br>
          GPU resets. <br>
>>
>> It also sounds like the restore work is some high-level work that
>> shouldn't interact with the lower-level GPU reset.
>
> That was my hope as well. Just trying to think it through to make
> sure I'm not making any incorrect assumptions.
>
> Do you know if there is anything preventing a TLB flush using MMIO
> from messing up a GPU reset in progress? That's the only thing in the
> SVM restore work that touches HW directly.

The TLB invalidations should be protected from the influence of a GPU
reset after my rework of the TLB flush code.

So I think we are safe here.
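
For reference, the guard boils down to this pattern (a minimal sketch
of the idea, not the exact rework; real flush paths also have per-ASIC
details and error handling):

        /* Sketch: a GPU reset holds adev->reset_domain->sem for
         * writing for its whole duration, so taking it for reading
         * excludes a concurrent reset. If we cannot get it, a reset
         * is in flight and the invalidation is pointless anyway.
         */
        static void example_flush_gpu_tlb(struct amdgpu_device *adev,
                                          uint32_t vmid)
        {
                if (!down_read_trylock(&adev->reset_domain->sem))
                        return;

                /* ... program the VM invalidation registers via MMIO ... */

                up_read(&adev->reset_domain->sem);
        }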
>>> What about cancelling p->eviction_work? Eviction work would
>>> normally be needed to signal eviction fences, but we're doing that
>>> explicitly in suspend_all_processes. Does eviction_work wait for
>>> fences anywhere? Yes, indirectly by flushing restore_work. So we
>>> should not try to cancel it during a GPU reset.
>>>
>>> Problem: accessing p->ef concurrently in evict_process_worker and
>>> suspend_all_processes. Need a spinlock to use and update it safely.
>>> Problem: What if evict_process_worker gets stuck in flushing
>>> restore_work? We can skip all of that if p->ef is NULL (which it is
>>> during the reset). Even if it gets stuck, it's no problem if the
>>> reset doesn't depend on it. It should get unstuck after the reset.
>>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  9 ++--
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  1 +
>>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 49 +++++++++++++------
>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  4 +-
>>>  4 files changed, 44 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 54f31a420229..89e632257663 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -1644,7 +1644,8 @@ int amdgpu_amdkfd_criu_resume(void *p)
>>>                  goto out_unlock;
>>>          }
>>>          WRITE_ONCE(pinfo->block_mmu_notifications, false);
>>> -        schedule_delayed_work(&pinfo->restore_userptr_work, 0);
>>> +        queue_delayed_work(system_freezable_wq,
>>> +                           &pinfo->restore_userptr_work, 0);
>>>
>>>  out_unlock:
>>>          mutex_unlock(&pinfo->lock);
>>> @@ -2458,7 +2459,8 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
>>>                                           KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>>>                  if (r)
>>>                          pr_err("Failed to quiesce KFD\n");
>>> -                schedule_delayed_work(&process_info->restore_userptr_work,
>>> +                queue_delayed_work(system_freezable_wq,
>>> +                        &process_info->restore_userptr_work,
>>>                          msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>>          }
>>>          mutex_unlock(&process_info->notifier_lock);
>>> @@ -2793,7 +2795,8 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>>
>>>          /* If validation failed, reschedule another attempt */
>>>          if (evicted_bos) {
>>> -                schedule_delayed_work(&process_info->restore_userptr_work,
>>> +                queue_delayed_work(system_freezable_wq,
>>> +                        &process_info->restore_userptr_work,
>>>                          msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>>                  kfd_smi_event_queue_restore_rescheduled(mm);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 9cc32f577e38..cf017d027fee 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -919,6 +919,7 @@ struct kfd_process {
>>>           * during restore
>>>           */
>>>          struct dma_fence *ef;
>>> +        spinlock_t ef_lock;
>>>
>>>          /* Work items for evicting and restoring BOs */
>>>          struct delayed_work eviction_work;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index fbf053001af9..a07cba58ec5e 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -664,7 +664,8 @@ int kfd_process_create_wq(void)
>>>          if (!kfd_process_wq)
>>>                  kfd_process_wq = alloc_workqueue("kfd_process_wq", 0, 0);
>>>          if (!kfd_restore_wq)
>>> -                kfd_restore_wq = alloc_ordered_workqueue("kfd_restore_wq", 0);
>>> +                kfd_restore_wq = alloc_ordered_workqueue("kfd_restore_wq",
>>> +                                                         WQ_FREEZABLE);
>>>
>>>          if (!kfd_process_wq || !kfd_restore_wq) {
>>>                  kfd_process_destroy_wq();
>>> @@ -1460,6 +1461,7 @@ static struct kfd_process *create_process(const struct task_struct *thread)
>>>
>>>          kref_init(&process->ref);
>>>          mutex_init(&process->mutex);
>>> +        spin_lock_init(&process->ef_lock);
>>>          process->mm = thread->mm;
>>>          process->lead_thread = thread->group_leader;
>>>          process->n_pdds = 0;
>>> @@ -1904,6 +1906,19 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>          return -EINVAL;
>>>  }
>>>
>>> +static void signal_eviction_fence(struct kfd_process *p)
>>> +{
>>> +        spin_lock(&p->ef_lock);
>>> +        if (!p->ef)
>>> +                goto unlock_out;
>>> +        dma_fence_signal(p->ef);
>>
>> This needs to grab the internal lock of the eviction fence; I'm not
>> sure that has correct ordering with the newly added ef_lock.
>
> Hmm, the only way we could get a circular lock dependency would be if
> we took the p->ef_lock in a fence callback. I don't think that ever
> happens, because even the eviction work runs on a worker thread
> (exactly to avoid such lock dependency issues).
>
> Anyway, I could try to move the fence_signal out of the critical
> section. The lock is only there to ensure that exactly one process
> signals and frees the fence.

So basically either the eviction worker or the GPU reset could signal
this fence.

In that case I think it would be simpler to grab the reset lock in the
eviction worker to protect against a concurrent reset.
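
Something like this, just as a sketch of the idea;
kfd_process_reset_domain() is a hypothetical helper that would look up
the reset domain through one of the process's devices:

        static void evict_process_worker(struct work_struct *work)
        {
                struct kfd_process *p =
                        container_of(to_delayed_work(work),
                                     struct kfd_process, eviction_work);
                struct amdgpu_reset_domain *domain =
                        kfd_process_reset_domain(p);

                /* A reset holds the domain semaphore for writing, so
                 * holding it for reading excludes a concurrent reset.
                 */
                if (!down_read_trylock(&domain->sem))
                        return; /* reset in progress; queues stay evicted */

                /* ... evict queues and signal the eviction fence ... */

                up_read(&domain->sem);
        }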

Regards,
Christian.

>         spin_lock(&p->ef_lock);
>         ef = p->ef;
>         WRITE_ONCE(p->ef, NULL);
>         spin_unlock(&p->ef_lock);
>         if (ef) {
>                 dma_fence_signal(ef);
>                 dma_fence_put(ef);
>         }
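
To spell out the ordering concern: dma_fence_signal() takes the
fence's internal lock and runs the callbacks under it, so an inversion
would only materialize if a fence callback ever took ef_lock. A
hypothetical illustration (ef_cb is a made-up callback member, just
for the example):

        /* Thread A: signaling with ef_lock held establishes the
         * ordering ef_lock -> fence->lock, because dma_fence_signal()
         * takes fence->lock internally.
         */
        spin_lock(&p->ef_lock);
        dma_fence_signal(p->ef);
        spin_unlock(&p->ef_lock);

        /* Thread B: a callback runs under fence->lock; taking ef_lock
         * here would establish fence->lock -> ef_lock and close the
         * cycle.
         */
        static void example_fence_cb(struct dma_fence *f,
                                     struct dma_fence_cb *cb)
        {
                struct kfd_process *p =
                        container_of(cb, struct kfd_process, ef_cb);

                spin_lock(&p->ef_lock);   /* AB-BA deadlock risk */
                spin_unlock(&p->ef_lock);
        }

Signaling outside the critical section, as in the snippet above, never
creates the ef_lock -> fence->lock ordering in the first place.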
        <blockquote type="cite">+    dma_fence_put(p->ef); <br>
          +    WRITE_ONCE(p->ef, NULL); <br>
          + <br>
          +unlock_out: <br>
          +    spin_unlock(&p->ef_lock); <br>
          +} <br>
          + <br>
            static void evict_process_worker(struct work_struct *work) <br>
            { <br>
                int ret; <br>
          @@ -1916,8 +1931,11 @@ static void evict_process_worker(struct
          work_struct *work) <br>
                 * lifetime of this thread, kfd_process p will be valid
          <br>
                 */ <br>
                p = container_of(dwork, struct kfd_process,
          eviction_work); <br>
          -    WARN_ONCE(p->last_eviction_seqno !=
          p->ef->seqno, <br>
          -          "Eviction fence mismatch\n"); <br>
          +    /* If the eviction fence is not valid, we're probably in
          a suspend <br>
          +     * or GPU reset cycle. There is nothing to do in this
          case. <br>
          +     */ <br>
          +    if (!READ_ONCE(p->ef)) <br>
          +        return; <br>
        </blockquote>
>>
>> This evict process work is high level; I don't think it should have
>> any dependency on the GPU reset.
>
> Right. This is not here to avoid issues, just a short-cut to avoid
> unnecessary work.
>
> Regards,
>   Felix
>>
>> Regards,
>> Christian.
>>>
>>>          /* Narrow window of overlap between restore and evict work
>>>           * item is possible. Once amdgpu_amdkfd_gpuvm_restore_process_bos
>>> @@ -1930,9 +1948,7 @@ static void evict_process_worker(struct work_struct *work)
>>>          pr_debug("Started evicting pasid 0x%x\n", p->pasid);
>>>          ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_TTM);
>>>          if (!ret) {
>>> -                dma_fence_signal(p->ef);
>>> -                dma_fence_put(p->ef);
>>> -                p->ef = NULL;
>>> +                signal_eviction_fence(p);
>>>                  queue_delayed_work(kfd_restore_wq, &p->restore_work,
>>>                          msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
>>>
>>> @@ -1967,9 +1983,19 @@ static void restore_process_worker(struct work_struct *work)
>>>
>>>          p->last_restore_timestamp = get_jiffies_64();
>>>          /* VMs may not have been acquired yet during debugging. */
>>> -        if (p->kgd_process_info)
>>> +        if (p->kgd_process_info) {
>>> +                struct dma_fence *ef = NULL;
>>> +
>>>                  ret = amdgpu_amdkfd_gpuvm_restore_process_bos(p->kgd_process_info,
>>> -                                                              &p->ef);
>>> +                                                              &ef);
>>> +                if (!ret) {
>>> +                        spin_lock(&p->ef_lock);
>>> +                        WARN_ONCE(p->ef, "Eviction fence is not NULL");
>>> +                        WRITE_ONCE(p->ef, ef);
>>> +                        spin_unlock(&p->ef_lock);
>>> +                }
>>> +        }
>>> +
>>>          if (ret) {
>>>                  pr_debug("Failed to restore BOs of pasid 0x%x, retry after %d ms\n",
>>>                           p->pasid, PROCESS_BACK_OFF_TIME_MS);
>>> @@ -1994,14 +2020,9 @@ void kfd_suspend_all_processes(void)
>>>
>>>          WARN(debug_evictions, "Evicting all processes");
>>>          hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> -                cancel_delayed_work_sync(&p->eviction_work);
>>> -                flush_delayed_work(&p->restore_work);
>>> -
>>>                  if (kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SUSPEND))
>>>                          pr_err("Failed to suspend process 0x%x\n", p->pasid);
>>> -                dma_fence_signal(p->ef);
>>> -                dma_fence_put(p->ef);
>>> -                p->ef = NULL;
>>> +                signal_eviction_fence(p);
>>>          }
>>>          srcu_read_unlock(&kfd_processes_srcu, idx);
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index fe937670c927..aa6c34127be9 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1869,7 +1869,7 @@ static void svm_range_restore_work(struct work_struct *work)
>>>          /* If validation failed, reschedule another attempt */
>>>          if (evicted_ranges) {
>>>                  pr_debug("reschedule to restore svm range\n");
>>> -                schedule_delayed_work(&svms->restore_work,
>>> +                queue_delayed_work(system_freezable_wq, &svms->restore_work,
>>>                          msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
>>>                  kfd_smi_event_queue_restore_rescheduled(mm);
>>> @@ -1945,7 +1945,7 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>>>                          pr_debug("failed to quiesce KFD\n");
>>>
>>>                  pr_debug("schedule to restore svm %p ranges\n", svms);
>>> -                schedule_delayed_work(&svms->restore_work,
>>> +                queue_delayed_work(system_freezable_wq, &svms->restore_work,
>>>                          msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
>>>          } else {
>>>                  unsigned long s, l;