<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2024-09-09 14:46, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:5c26a4aa-d18d-4e36-98a9-b915644cb853@amd.com">On
      09.09.24 at 18:02, Kim, Jonathan wrote:
      <br>
      <blockquote type="cite">[Public]
        <br>
        <br>
        <blockquote type="cite">-----Original Message-----
          <br>
          From: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:ckoenig.leichtzumerken@gmail.com">&lt;ckoenig.leichtzumerken@gmail.com&gt;</a>
          <br>
          Sent: Thursday, September 5, 2024 10:24 AM
          <br>
          To: Kim, Jonathan <a class="moz-txt-link-rfc2396E" href="mailto:Jonathan.Kim@amd.com">&lt;Jonathan.Kim@amd.com&gt;</a>;
          <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
          <br>
          Cc: Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com">&lt;Felix.Kuehling@amd.com&gt;</a>; Deucher,
          Alexander
          <br>
          <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com">&lt;Alexander.Deucher@amd.com&gt;</a>; Joshi, Mukul
          <a class="moz-txt-link-rfc2396E" href="mailto:Mukul.Joshi@amd.com">&lt;Mukul.Joshi@amd.com&gt;</a>
          <br>
          Subject: Re: [PATCH] drm/amdkfd: fix vm-pasid lookup for
          multiple partitions
          <br>
          <br>
          On 19.08.24 at 19:59, Jonathan Kim wrote:
          <br>
          <blockquote type="cite">Currently multiple partitions will
            incorrectly overwrite the VM lookup
            <br>
            table since the table is indexed by PASID and multiple
            partitions can
            <br>
            register different VM objects on the same PASID.
            <br>
          </blockquote>
          That's a rather bad idea. Why do we have the same PASID for
          different VM
          <br>
          objects in the first place?
          <br>
        </blockquote>
        Alex can probably elaborate on the KGD side, but from what I can
        see, the KMS driver open call has always assigned a new VM
        object per PASID on an open call.
        <br>
        The KFD acquires and replaces the KGD PASID-VMID registration on
        its own compute process open/creation call.
        <br>
        If this is the bad idea you're referring to, then someone else
        will have to chime in.  I don't have much history on this,
        unfortunately.
        <br>
      </blockquote>
      <br>
      Yeah, Felix and I designed that.
      <br>
    </blockquote>
    <p>The app opens a DRM node for each partition to create a VM, and
      each VM gets a different vm->pasid. The issue comes from the path
      kfd_ioctl_acquire_vm -> kfd_process_device_init_vm -> 
      amdgpu_amdkfd_gpuvm_set_vm_pasid, which replaces every vm->pasid
      with the KFD process->pasid obtained when the KFD node was opened.
      As a result, only one VM ends up stored in
      adev->vm_manager.pasids under the KFD process PASID, so we cannot
      retrieve the correct VM from adev->vm_manager.pasids on mGPUs or
      multiple partitions.<br>
    </p>
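    <p>To illustrate the overwrite described above, here is a minimal
      user-space sketch. It is an assumption-laden model, not the
      driver code: struct vm, struct vm_manager, set_vm_pasid,
      demo_reachable_vms, MAX_PASID and the PASID values are all
      hypothetical stand-ins, and a plain array replaces the xarray in
      adev->vm_manager.pasids.</p>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PASID 16 /* hypothetical table size, for the sketch only */

/* Simplified stand-ins; not the real amdgpu structures. */
struct vm {
	int xcp_id;         /* partition the VM was created for */
	unsigned int pasid; /* 0 means "not registered" */
};

struct vm_manager {
	struct vm *pasids[MAX_PASID]; /* models adev->vm_manager.pasids */
};

/* Move a VM to a new PASID: drop the old entry, then store the new one. */
static void set_vm_pasid(struct vm_manager *mgr, struct vm *vm,
			 unsigned int pasid)
{
	assert(pasid < MAX_PASID);
	if (vm->pasid == pasid)
		return;
	if (vm->pasid) {
		mgr->pasids[vm->pasid] = NULL;
		vm->pasid = 0;
	}
	if (pasid) {
		/* Any VM already stored under this PASID is replaced. */
		mgr->pasids[pasid] = vm;
		vm->pasid = pasid;
	}
}

/* Returns how many VMs are still reachable through the table. */
static int demo_reachable_vms(void)
{
	struct vm_manager mgr = { { NULL } };
	struct vm vm0 = { 0, 0 }, vm1 = { 1, 0 };
	int reachable = 0;
	unsigned int i;

	set_vm_pasid(&mgr, &vm0, 1); /* DRM node open, partition 0 */
	set_vm_pasid(&mgr, &vm1, 2); /* DRM node open, partition 1 */
	set_vm_pasid(&mgr, &vm0, 5); /* acquire VM: adopt KFD process PASID */
	set_vm_pasid(&mgr, &vm1, 5); /* same PASID: overwrites vm0's slot */

	for (i = 0; i < MAX_PASID; i++)
		if (mgr.pasids[i])
			reachable++;
	return reachable;
}
```

    <p>In this model two VMs were registered, but only the last one
      stored under the shared PASID can still be found through the
      table.</p>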
    <blockquote type="cite" cite="mid:5c26a4aa-d18d-4e36-98a9-b915644cb853@amd.com">
      <br>
      <blockquote type="cite">That aside, the current problem is
        that all KFD device structures are logical partitions and
        register their PASID-VM binding using this concept of a device.
        <br>
      </blockquote>
      <br>
      As far as I can see that is the fundamental problem. This needs to
      be fixed instead.
      <br>
      <br>
      <blockquote type="cite">On the KGD side however, the registration
        table is maintained in the adev struct, which is a physical
        socket.
        <br>
        So there's a mismatch in understanding of what a device is
        between the KFD & KGD with regard to the look up table that
        results in bad bindings.
        <br>
        <br>
        Adding a per-partition dimension to the existing lookup table
        resolves issues we're seeing, for example, with memory violation
        interception and XNACK, i.e., bad bindings result in the wrong VM
        object being found when setting no-retry flags on memory violations.
        <br>
      </blockquote>
    </blockquote>
    <p>The retry-fault recovery in svm_range_restore_pages uses the
      fault PASID to look up the KFD process, and then uses the fault
      node_id to get pdd->vm. Maybe you can use the same approach to
      fix the debugger issue.<br>
    </p>
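    <p>A rough sketch of that two-step lookup, under stated
      assumptions: the structures and the helpers lookup_fault_vm and
      demo_fault_vm_id below are hypothetical user-space stand-ins, not
      the KFD API. The point is only that the PASID selects the process
      and the node_id then selects the per-device entry (pdd) that
      holds the VM, so no PASID-to-VM table is needed.</p>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical limit, for the sketch only */

/* Simplified stand-ins; not the real KFD structures. */
struct vm {
	int id;
};

struct process_device {          /* models a per-node pdd */
	unsigned int node_id;
	struct vm *vm;
};

struct process {                 /* models a KFD process */
	unsigned int pasid;
	size_t n_pdds;
	struct process_device pdds[MAX_NODES];
};

/* Step 1: PASID -> process. Step 2: node_id -> pdd -> vm. */
static struct vm *lookup_fault_vm(struct process *procs, size_t n_procs,
				  unsigned int pasid, unsigned int node_id)
{
	size_t i, j;

	for (i = 0; i < n_procs; i++) {
		struct process *p = &procs[i];

		if (p->pasid != pasid)
			continue;
		assert(p->n_pdds <= MAX_NODES);
		for (j = 0; j < p->n_pdds; j++)
			if (p->pdds[j].node_id == node_id)
				return p->pdds[j].vm;
		return NULL; /* process found, but no pdd for this node */
	}
	return NULL; /* no process owns this PASID */
}

/* One process (PASID 5) with a separate VM on node 0 and node 1. */
static int demo_fault_vm_id(unsigned int pasid, unsigned int node_id)
{
	struct vm vm_a = { 10 }, vm_b = { 11 };
	struct process procs[1] = {
		{ 5, 2, { { 0, &vm_a }, { 1, &vm_b } } },
	};
	struct vm *vm = lookup_fault_vm(procs, 1, pasid, node_id);

	return vm ? vm->id : -1;
}
```

    <p>Here the same PASID resolves to a different VM depending on the
      faulting node, which is the property the debugger path seems to
      need.</p>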
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:5c26a4aa-d18d-4e36-98a9-b915644cb853@amd.com">
      <br>
      Yeah that is pretty much a no-go.
      <br>
      <br>
      The PASID and how it is used is defined by the PCIe
      specifications. If we now start to assign multiple VMs to the same
      PASID then we are violating the PCIe specification.
      <br>
      <br>
      The problems you see are most likely just the tip of the iceberg
      here.
      <br>
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
      <blockquote type="cite">
        <br>
        Jon
        <br>
        <br>
        <blockquote type="cite">Regards,
          <br>
          Christian.
          <br>
          <br>
          <blockquote type="cite">This results in loading the wrong VM
            object on PASID query.
            <br>
            <br>
            To correct this, setup the lookup table to be
            per-partition-per-PASID
            <br>
            instead.
            <br>
            <br>
            Signed-off-by: Jonathan Kim <a class="moz-txt-link-rfc2396E" href="mailto:jonathan.kim@amd.com">&lt;jonathan.kim@amd.com&gt;</a>
            <br>
            ---
            <br>
               drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    | 12 ++++
            <br>
               drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  1 +
            <br>
               drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  4 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  7 ++-
            <br>
               drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 55
            +++++++++++------
            <br>
          </blockquote>
          --
          <br>
          <blockquote type="cite">  
            drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        | 11 +++-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |  5 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c        |  5 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c        |  5 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c         |  3 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c         |  5 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         | 16 ++----
            <br>
               drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  2 +-
            <br>
               drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      |  4 +-
            <br>
               drivers/gpu/drm/amd/amdkfd/kfd_events.c       |  3 +-
            <br>
               .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c  |  8 +--
            <br>
               .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c   |  8 +--
            <br>
               drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  3 +-
            <br>
               18 files changed, 92 insertions(+), 65 deletions(-)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
          <br>
          <blockquote type="cite">index c272461d70a9..28db789610e1
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
            <br>
            @@ -887,3 +887,15 @@ int amdgpu_amdkfd_unmap_hiq(struct
            <br>
          </blockquote>
          amdgpu_device *adev, u32 doorbell_off,
          <br>
          <blockquote type="cite">       return r;
            <br>
               }
            <br>
            +
            <br>
            +int amdgpu_amdkfd_node_id_to_xcc_id(struct amdgpu_device
            *adev,
            <br>
          </blockquote>
          uint32_t node_id)
          <br>
          <blockquote type="cite">+{
            <br>
            +     if (adev->gfx.funcs->ih_node_to_logical_xcc) {
            <br>
            +             int xcc_id =
            adev->gfx.funcs->ih_node_to_logical_xcc(adev,
            node_id);
            <br>
            +
            <br>
            +             if (xcc_id >= 0)
            <br>
            +                     return xcc_id;
            <br>
            +     }
            <br>
            +
            <br>
            +     return 0;
            <br>
            +}
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
          <br>
          <blockquote type="cite">index 4ed49265c764..bf8bb45d8ab6
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
            <br>
            @@ -356,6 +356,7 @@ void
            amdgpu_amdkfd_unreserve_mem_limit(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               uint64_t size, u32
            alloc_flag, int8_t xcp_id);
            <br>
            <br>
               u64 amdgpu_amdkfd_xcp_memory_size(struct amdgpu_device
            *adev, int
            <br>
          </blockquote>
          xcp_id);
          <br>
          <blockquote type="cite">+int
            amdgpu_amdkfd_node_id_to_xcc_id(struct amdgpu_device *adev,
            <br>
          </blockquote>
          uint32_t node_id);
          <br>
          <blockquote type="cite">   #define KFD_XCP_MEM_ID(adev,
            xcp_id) \
            <br>
                           ((adev)->xcp_mgr && (xcp_id) >=
            0 ?\
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
          <br>
          <blockquote type="cite">index c6a1783fc9ef..bf9f8802e18d
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
            <br>
            @@ -37,7 +37,7 @@ static enum drm_gpu_sched_stat
            <br>
          </blockquote>
          amdgpu_job_timedout(struct drm_sched_job *s_job)
          <br>
          <blockquote type="cite">       struct amdgpu_job *job =
            to_amdgpu_job(s_job);
            <br>
                   struct amdgpu_task_info *ti;
            <br>
                   struct amdgpu_device *adev = ring->adev;
            <br>
            -     int idx;
            <br>
            +     int idx, xcp_id = !job->vm ? 0 :
            job->vm->xcp_id;
            <br>
                   int r;
            <br>
            <br>
                   if (!drm_dev_enter(adev_to_drm(adev), &idx)) {
            <br>
            @@ -62,7 +62,7 @@ static enum drm_gpu_sched_stat
            <br>
          </blockquote>
          amdgpu_job_timedout(struct drm_sched_job *s_job)
          <br>
          <blockquote type="cite">              
            job->base.sched->name,
            atomic_read(&ring->fence_drv.last_seq),
            <br>
                           ring->fence_drv.sync_seq);
            <br>
            <br>
            -     ti = amdgpu_vm_get_task_info_pasid(ring->adev,
            job->pasid);
            <br>
            +     ti = amdgpu_vm_get_task_info_pasid(ring->adev,
            job->pasid, xcp_id);
            <br>
                   if (ti) {
            <br>
                           dev_err(adev->dev,
            <br>
                                   "Process information: process %s pid
            %d thread %s pid %d\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
          <br>
          <blockquote type="cite">index d9fde38f6ee2..e413bf4a3e84
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
            <br>
            @@ -1275,17 +1275,20 @@ int amdgpu_info_ioctl(struct
            drm_device *dev,
            <br>
          </blockquote>
          void *data, struct drm_file *filp)
          <br>
          <blockquote type="cite">               struct amdgpu_vm *vm =
            &fpriv->vm;
            <br>
                           struct drm_amdgpu_info_gpuvm_fault
            gpuvm_fault;
            <br>
                           unsigned long flags;
            <br>
            +             int i;
            <br>
            <br>
                           if (!vm)
            <br>
                                   return -EINVAL;
            <br>
            <br>
                           memset(&gpuvm_fault, 0,
            sizeof(gpuvm_fault));
            <br>
            <br>
            -            
            xa_lock_irqsave(&adev->vm_manager.pasids, flags);
            <br>
            +             for (i = 0; i <
            adev->xcp_mgr->num_xcps; i++)
            <br>
            +                    
            xa_lock_irqsave(&adev->vm_manager.pasids[i], flags);
            <br>
                           gpuvm_fault.addr = vm->fault_info.addr;
            <br>
                           gpuvm_fault.status =
            vm->fault_info.status;
            <br>
                           gpuvm_fault.vmhub = vm->fault_info.vmhub;
            <br>
            -            
            xa_unlock_irqrestore(&adev->vm_manager.pasids,
            flags);
            <br>
            +             for (i = 0; i <
            adev->xcp_mgr->num_xcps; i++)
            <br>
            +                    
            xa_unlock_irqrestore(&adev->vm_manager.pasids[i],
            flags);
            <br>
            <br>
                           return copy_to_user(out, &gpuvm_fault,
            <br>
                                               min((size_t)size,
            sizeof(gpuvm_fault))) ? -EFAULT : 0;
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
          <br>
          <blockquote type="cite">index bcb729094521..f43e1c15f423
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
            @@ -146,7 +146,7 @@ int amdgpu_vm_set_pasid(struct
            amdgpu_device
            <br>
          </blockquote>
          *adev, struct amdgpu_vm *vm,
          <br>
          <blockquote type="cite">               return 0;
            <br>
            <br>
                   if (vm->pasid) {
            <br>
            -             r =
            xa_err(xa_erase_irq(&adev->vm_manager.pasids,
            vm->pasid));
            <br>
            +             r =
            xa_err(xa_erase_irq(&adev->vm_manager.pasids[vm->xcp_id],
            <br>
          </blockquote>
          vm->pasid));
          <br>
          <blockquote type="cite">               if (r < 0)
            <br>
                                   return r;
            <br>
            <br>
            @@ -154,7 +154,7 @@ int amdgpu_vm_set_pasid(struct
            amdgpu_device
            <br>
          </blockquote>
          *adev, struct amdgpu_vm *vm,
          <br>
          <blockquote type="cite">       }
            <br>
            <br>
                   if (pasid) {
            <br>
            -             r =
            xa_err(xa_store_irq(&adev->vm_manager.pasids, pasid,
            vm,
            <br>
            +             r =
            xa_err(xa_store_irq(&adev->vm_manager.pasids[vm->xcp_id],
            <br>
          </blockquote>
          pasid, vm,
          <br>
          <blockquote type="cite">                                      
            GFP_KERNEL));
            <br>
                           if (r < 0)
            <br>
                                   return r;
            <br>
            @@ -2288,14 +2288,14 @@ static void
            <br>
          </blockquote>
          amdgpu_vm_destroy_task_info(struct kref *kref)
          <br>
          <blockquote type="cite">   }
            <br>
            <br>
               static inline struct amdgpu_vm *
            <br>
            -amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32
            pasid)
            <br>
            +amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32
            pasid,
            <br>
          </blockquote>
          u32 xcp_id)
          <br>
          <blockquote type="cite">   {
            <br>
                   struct amdgpu_vm *vm;
            <br>
                   unsigned long flags;
            <br>
            <br>
            -     xa_lock_irqsave(&adev->vm_manager.pasids,
            flags);
            <br>
            -     vm = xa_load(&adev->vm_manager.pasids, pasid);
            <br>
            -     xa_unlock_irqrestore(&adev->vm_manager.pasids,
            flags);
            <br>
            +    
            xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
            flags);
            <br>
            +     vm = xa_load(&adev->vm_manager.pasids[xcp_id],
            pasid);
            <br>
            +    
            xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
            flags);
            <br>
            <br>
                   return vm;
            <br>
               }
            <br>
            @@ -2343,10 +2343,10 @@ amdgpu_vm_get_task_info_vm(struct
            <br>
          </blockquote>
          amdgpu_vm *vm)
          <br>
          <blockquote type="cite">    * referenced down with
            amdgpu_vm_put_task_info.
            <br>
                */
            <br>
               struct amdgpu_task_info *
            <br>
            -amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev,
            u32 pasid)
            <br>
            +amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev,
            u32 pasid,
            <br>
          </blockquote>
          u32 xcp_id)
          <br>
          <blockquote type="cite">   {
            <br>
                   return amdgpu_vm_get_task_info_vm(
            <br>
            -                     amdgpu_vm_get_vm_from_pasid(adev,
            pasid));
            <br>
            +                     amdgpu_vm_get_vm_from_pasid(adev,
            pasid, xcp_id));
            <br>
               }
            <br>
            <br>
               static int amdgpu_vm_create_task_info(struct amdgpu_vm
            *vm)
            <br>
            @@ -2481,6 +2481,8 @@ int amdgpu_vm_init(struct
            amdgpu_device
            <br>
          </blockquote>
          *adev, struct amdgpu_vm *vm,
          <br>
          <blockquote type="cite">      
            amdgpu_bo_unreserve(vm->root.bo);
            <br>
                   amdgpu_bo_unref(&root_bo);
            <br>
            <br>
            +     vm->xcp_id = xcp_id < 0 ? 0 : xcp_id;
            <br>
            +
            <br>
                   return 0;
            <br>
            <br>
               error_free_root:
            <br>
            @@ -2695,8 +2697,8 @@ void amdgpu_vm_manager_init(struct
            <br>
          </blockquote>
          amdgpu_device *adev)
          <br>
          <blockquote type="cite">   #else
            <br>
                   adev->vm_manager.vm_update_mode = 0;
            <br>
               #endif
            <br>
            -
            <br>
            -     xa_init_flags(&adev->vm_manager.pasids,
            XA_FLAGS_LOCK_IRQ);
            <br>
            +     for (i = 0; i < MAX_XCP; i++)
            <br>
            +            
            xa_init_flags(&(adev->vm_manager.pasids[i]),
            <br>
          </blockquote>
          XA_FLAGS_LOCK_IRQ);
          <br>
          <blockquote type="cite">   }
            <br>
            <br>
               /**
            <br>
            @@ -2708,10 +2710,15 @@ void amdgpu_vm_manager_init(struct
            <br>
          </blockquote>
          amdgpu_device *adev)
          <br>
          <blockquote type="cite">    */
            <br>
               void amdgpu_vm_manager_fini(struct amdgpu_device *adev)
            <br>
               {
            <br>
            -     WARN_ON(!xa_empty(&adev->vm_manager.pasids));
            <br>
            -     xa_destroy(&adev->vm_manager.pasids);
            <br>
            +     int i;
            <br>
            +
            <br>
            +     for (i = 0; i < MAX_XCP; i++) {
            <br>
            +            
            WARN_ON(!xa_empty(&adev->vm_manager.pasids[i]));
            <br>
            +            
            xa_destroy(&adev->vm_manager.pasids[i]);
            <br>
            +     }
            <br>
            <br>
                   amdgpu_vmid_mgr_fini(adev);
            <br>
            +
            <br>
               }
            <br>
            <br>
               /**
            <br>
            @@ -2778,17 +2785,18 @@ bool amdgpu_vm_handle_fault(struct
            <br>
          </blockquote>
          amdgpu_device *adev, u32 pasid,
          <br>
          <blockquote type="cite">       unsigned long irqflags;
            <br>
                   uint64_t value, flags;
            <br>
                   struct amdgpu_vm *vm;
            <br>
            -     int r;
            <br>
            +     int r, xcp_id;
            <br>
            <br>
            -     xa_lock_irqsave(&adev->vm_manager.pasids,
            irqflags);
            <br>
            -     vm = xa_load(&adev->vm_manager.pasids, pasid);
            <br>
            +     xcp_id = amdgpu_amdkfd_node_id_to_xcc_id(adev,
            node_id)/adev->gfx.num_xcc_per_xcp;
            <br>
            +    
            xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
            irqflags);
            <br>
            +     vm = xa_load(&adev->vm_manager.pasids[xcp_id],
            pasid);
            <br>
                   if (vm) {
            <br>
                           root = amdgpu_bo_ref(vm->root.bo);
            <br>
                           is_compute_context =
            vm->is_compute_context;
            <br>
                   } else {
            <br>
                           root = NULL;
            <br>
                   }
            <br>
            -     xa_unlock_irqrestore(&adev->vm_manager.pasids,
            irqflags);
            <br>
            +    
            xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
            irqflags);
            <br>
            <br>
                   if (!root)
            <br>
                           return false;
            <br>
            @@ -2806,11 +2814,11 @@ bool amdgpu_vm_handle_fault(struct
            <br>
          </blockquote>
          amdgpu_device *adev, u32 pasid,
          <br>
          <blockquote type="cite">               goto error_unref;
            <br>
            <br>
                   /* Double check that the VM still exists */
            <br>
            -     xa_lock_irqsave(&adev->vm_manager.pasids,
            irqflags);
            <br>
            -     vm = xa_load(&adev->vm_manager.pasids, pasid);
            <br>
            +    
            xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
            irqflags);
            <br>
            +     vm = xa_load(&adev->vm_manager.pasids[xcp_id],
            pasid);
            <br>
                   if (vm && vm->root.bo != root)
            <br>
                           vm = NULL;
            <br>
            -     xa_unlock_irqrestore(&adev->vm_manager.pasids,
            irqflags);
            <br>
            +    
            xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
            irqflags);
            <br>
                   if (!vm)
            <br>
                           goto error_unlock;
            <br>
            <br>
            @@ -2968,14 +2976,15 @@ void
            amdgpu_vm_update_fault_cache(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                                
            unsigned int pasid,
            <br>
                                             uint64_t addr,
            <br>
                                             uint32_t status,
            <br>
            -                               unsigned int vmhub)
            <br>
            +                               unsigned int vmhub,
            <br>
            +                               uint32_t xcp_id)
            <br>
               {
            <br>
                   struct amdgpu_vm *vm;
            <br>
                   unsigned long flags;
            <br>
            <br>
            -     xa_lock_irqsave(&adev->vm_manager.pasids,
            flags);
            <br>
            +    
            xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
            flags);
            <br>
            <br>
            -     vm = xa_load(&adev->vm_manager.pasids, pasid);
            <br>
            +     vm = xa_load(&adev->vm_manager.pasids[xcp_id],
            pasid);
            <br>
                   /* Don't update the fault cache if status is 0.  In
            the multiple
            <br>
                    * fault case, subsequent faults will return a 0
            status which is
            <br>
                    * useless for userspace and replaces the useful
            fault status, so
            <br>
            @@ -3008,7 +3017,7 @@ void
            amdgpu_vm_update_fault_cache(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                       WARN_ONCE(1,
            "Invalid vmhub %u\n", vmhub);
            <br>
                           }
            <br>
                   }
            <br>
            -     xa_unlock_irqrestore(&adev->vm_manager.pasids,
            flags);
            <br>
            +    
            xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
            flags);
            <br>
               }
            <br>
            <br>
               /**
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
          <br>
          <blockquote type="cite">index 046949c4b695..1499f5f731e9
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
            @@ -35,6 +35,7 @@
            <br>
               #include "amdgpu_sync.h"
            <br>
               #include "amdgpu_ring.h"
            <br>
               #include "amdgpu_ids.h"
            <br>
            +#include "amdgpu_xcp.h"
            <br>
            <br>
               struct drm_exec;
            <br>
            <br>
            @@ -418,6 +419,9 @@ struct amdgpu_vm {
            <br>
            <br>
                   /* cached fault info */
            <br>
                   struct amdgpu_vm_fault_info fault_info;
            <br>
            +
            <br>
            +     /* XCP ID */
            <br>
            +     int xcp_id;
            <br>
               };
            <br>
            <br>
               struct amdgpu_vm_manager {
            <br>
            @@ -456,7 +460,7 @@ struct amdgpu_vm_manager {
            <br>
                   /* PASID to VM mapping, will be used in interrupt
            context to
            <br>
                    * look up VM of a page fault
            <br>
                    */
            <br>
            -     struct xarray                           pasids;
            <br>
            +     struct xarray                          
            pasids[MAX_XCP];
            <br>
                   /* Global registration of recent page fault
            information */
            <br>
                   struct amdgpu_vm_fault_info     fault_info;
            <br>
               };
            <br>
            @@ -550,7 +554,7 @@ bool amdgpu_vm_need_pipeline_sync(struct
            <br>
          </blockquote>
          amdgpu_ring *ring,
          <br>
          <blockquote type="cite">   void
            amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
            <br>
            <br>
               struct amdgpu_task_info *
            <br>
            -amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev,
            u32 pasid);
            <br>
            +amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev,
            u32 pasid,
            <br>
          </blockquote>
          u32 xcp_id);
          <br>
          <blockquote type="cite">   struct amdgpu_task_info *
            <br>
               amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm);
            <br>
            @@ -649,7 +653,8 @@ void amdgpu_vm_update_fault_cache(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                                
            unsigned int pasid,
            <br>
                                             uint64_t addr,
            <br>
                                             uint32_t status,
            <br>
            -                               unsigned int vmhub);
            <br>
            +                               unsigned int vmhub,
            <br>
            +                               uint32_t xcp_id);
            <br>
               void amdgpu_vm_tlb_fence_create(struct amdgpu_device
            *adev,
            <br>
                                            struct amdgpu_vm *vm,
            <br>
                                            struct dma_fence **fence);
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
          <br>
          <blockquote type="cite">index f0ceab3ce5bf..24b042febf5c
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
            <br>
            @@ -151,7 +151,8 @@ static int
            gmc_v10_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">              
            WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
            <br>
            <br>
                           amdgpu_vm_update_fault_cache(adev,
            entry->pasid, addr, status,
            <br>
            -                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0));
          <br>
          <blockquote type="cite">+                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0),
          <br>
          <blockquote type="cite">+                                         
            0);
            <br>
                   }
            <br>
            <br>
                   if (!printk_ratelimit())
            <br>
            @@ -161,7 +162,7 @@ static int
            gmc_v10_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               "[%s] page fault
            (src_id:%u ring:%u vmid:%u pasid:%u)\n",
            <br>
                           entry->vmid_src ? "mmhub" : "gfxhub",
            <br>
                           entry->src_id, entry->ring_id,
            entry->vmid, entry->pasid);
            <br>
            -     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid);
            <br>
            +     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid, 0);
            <br>
                   if (task_info) {
            <br>
                           dev_err(adev->dev,
            <br>
                                   " in process %s pid %d thread %s pid
            %d\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
          <br>
          <blockquote type="cite">index 2797fd84432b..3507046d33e6
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
            <br>
            @@ -122,7 +122,8 @@ static int
            gmc_v11_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">              
            WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
            <br>
            <br>
                           amdgpu_vm_update_fault_cache(adev,
            entry->pasid, addr, status,
            <br>
            -                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0));
          <br>
          <blockquote type="cite">+                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0),
          <br>
          <blockquote type="cite">+                                         
            0);
            <br>
                   }
            <br>
            <br>
                   if (printk_ratelimit()) {
            <br>
            @@ -132,7 +133,7 @@ static int
            gmc_v11_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                       "[%s] page
            fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
            <br>
                                   entry->vmid_src ? "mmhub" :
            "gfxhub",
            <br>
                                   entry->src_id, entry->ring_id,
            entry->vmid, entry->pasid);
            <br>
            -             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
            <br>
            +             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid, 0);
            <br>
                           if (task_info) {
            <br>
                                   dev_err(adev->dev,
            <br>
                                           " in process %s pid %d thread
            %s pid %d)\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
          <br>
          <blockquote type="cite">index 60acf676000b..9844564c6c74
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
            <br>
            @@ -115,7 +115,8 @@ static int
            gmc_v12_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">              
            WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
            <br>
            <br>
                           amdgpu_vm_update_fault_cache(adev,
            entry->pasid, addr, status,
            <br>
            -                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0));
          <br>
          <blockquote type="cite">+                                         
            entry->vmid_src ? AMDGPU_MMHUB0(0) :
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0),
          <br>
          <blockquote type="cite">+                                         
            0);
            <br>
                   }
            <br>
            <br>
                   if (printk_ratelimit()) {
            <br>
            @@ -125,7 +126,7 @@ static int
            gmc_v12_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                       "[%s] page
            fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
            <br>
                                   entry->vmid_src ? "mmhub" :
            "gfxhub",
            <br>
                                   entry->src_id, entry->ring_id,
            entry->vmid, entry->pasid);
            <br>
            -             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
            <br>
            +             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid, 0);
            <br>
                           if (task_info) {
            <br>
                                   dev_err(adev->dev,
            <br>
                                           " in process %s pid %d thread
            %s pid %d)\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
          <br>
          <blockquote type="cite">index 994432fb57ea..2cdb0cbb7c4d
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
            <br>
            @@ -1268,7 +1268,8 @@ static int
            gmc_v7_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               return 0;
            <br>
            <br>
                   amdgpu_vm_update_fault_cache(adev, entry->pasid,
            <br>
            -                                  ((u64)addr) <<
            AMDGPU_GPU_PAGE_SHIFT, status,
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0));
          <br>
          <blockquote type="cite">+                                 
            ((u64)addr) << AMDGPU_GPU_PAGE_SHIFT, status,
            <br>
            +                                  AMDGPU_GFXHUB(0), 0);
            <br>
            <br>
                   if (amdgpu_vm_fault_stop ==
            AMDGPU_VM_FAULT_STOP_FIRST)
            <br>
                           gmc_v7_0_set_fault_enable_default(adev,
            false);
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
          <br>
          <blockquote type="cite">index 86488c052f82..6855caeb7f74
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
            <br>
            @@ -1437,7 +1437,8 @@ static int
            gmc_v8_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               return 0;
            <br>
            <br>
                   amdgpu_vm_update_fault_cache(adev, entry->pasid,
            <br>
            -                                  ((u64)addr) <<
            AMDGPU_GPU_PAGE_SHIFT, status,
            <br>
          </blockquote>
          AMDGPU_GFXHUB(0));
          <br>
          <blockquote type="cite">+                                 
            ((u64)addr) << AMDGPU_GPU_PAGE_SHIFT, status,
            <br>
            +                                  AMDGPU_GFXHUB(0), 0);
            <br>
            <br>
                   if (amdgpu_vm_fault_stop ==
            AMDGPU_VM_FAULT_STOP_FIRST)
            <br>
                           gmc_v8_0_set_fault_enable_default(adev,
            false);
            <br>
            @@ -1448,7 +1449,7 @@ static int
            gmc_v8_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               dev_err(adev->dev,
            "GPU fault detected: %d 0x%08x\n",
            <br>
                                   entry->src_id,
            entry->src_data[0]);
            <br>
            <br>
            -             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
            <br>
            +             task_info =
            amdgpu_vm_get_task_info_pasid(adev, entry->pasid, 0);
            <br>
                           if (task_info) {
            <br>
                                   dev_err(adev->dev, " for process
            %s pid %d thread %s pid %d\n",
            <br>
                                           task_info->process_name,
            task_info->tgid,
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
          <br>
          <blockquote type="cite">index b73136d390cc..e183e08b2c02
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
            <br>
            @@ -556,10 +556,12 @@ static int
            gmc_v9_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">       unsigned int vmhub;
            <br>
                   u64 addr;
            <br>
                   uint32_t cam_index = 0;
            <br>
            -     int ret, xcc_id = 0;
            <br>
            -     uint32_t node_id;
            <br>
            +     int ret;
            <br>
            +     uint32_t node_id, xcc_id, xcp_id;
            <br>
            <br>
                   node_id = entry->node_id;
            <br>
            +     xcc_id = amdgpu_amdkfd_node_id_to_xcc_id(adev,
            node_id);
            <br>
            +     xcp_id = xcc_id/adev->gfx.num_xcc_per_xcp;
            <br>
            <br>
                   addr = (u64)entry->src_data[0] << 12;
            <br>
                   addr |= ((u64)entry->src_data[1] & 0xf)
            << 44;
            <br>
            @@ -572,12 +574,6 @@ static int
            gmc_v9_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               vmhub =
            AMDGPU_MMHUB1(0);
            <br>
                   } else {
            <br>
                           hub_name = "gfxhub0";
            <br>
            -             if
            (adev->gfx.funcs->ih_node_to_logical_xcc) {
            <br>
            -                     xcc_id =
            adev->gfx.funcs->ih_node_to_logical_xcc(adev,
            <br>
            -                             node_id);
            <br>
            -                     if (xcc_id < 0)
            <br>
            -                             xcc_id = 0;
            <br>
            -             }
            <br>
                           vmhub = xcc_id;
            <br>
                   }
            <br>
                   hub = &adev->vmhub[vmhub];
            <br>
            @@ -631,7 +627,7 @@ static int
            gmc_v9_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">               retry_fault ? "retry" :
            "no-retry",
            <br>
                           entry->src_id, entry->ring_id,
            entry->vmid, entry->pasid);
            <br>
            <br>
            -     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid);
            <br>
            +     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid,
            <br>
          </blockquote>
          xcp_id);
          <br>
          <blockquote type="cite">       if (task_info) {
            <br>
                           dev_err(adev->dev,
            <br>
                                   " for process %s pid %d thread %s pid
            %d)\n",
            <br>
            @@ -675,7 +671,7 @@ static int
            gmc_v9_0_process_interrupt(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">       if (!amdgpu_sriov_vf(adev))
            <br>
                           WREG32_P(hub->vm_l2_pro_fault_cntl, 1,
            ~1);
            <br>
            <br>
            -     amdgpu_vm_update_fault_cache(adev, entry->pasid,
            addr, status,
            <br>
          </blockquote>
          vmhub);
          <br>
          <blockquote type="cite">+    
            amdgpu_vm_update_fault_cache(adev, entry->pasid, addr,
            status,
            <br>
          </blockquote>
          vmhub, xcp_id);
          <br>
          <blockquote type="cite">       dev_err(adev->dev,
            <br>
                           "VM_L2_PROTECTION_FAULT_STATUS:0x%08X\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
          <br>
          <blockquote type="cite">index 23ef4eb36b40..1ac4224bbe5b
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
            <br>
            @@ -2182,7 +2182,7 @@ static int
            sdma_v4_0_print_iv_entry(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                          instance,
            addr, entry->src_id, entry->ring_id, entry->vmid,
            <br>
                                      entry->pasid);
            <br>
            <br>
            -     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid);
            <br>
            +     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid, 0);
            <br>
                   if (task_info) {
            <br>
                           dev_dbg_ratelimited(adev->dev,
            <br>
                                               " for process %s pid %d
            thread %s pid %d\n",
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
          <br>
          <blockquote type="cite">index 57f16c09abfc..c8b5c0302ca7
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
            <br>
            @@ -1683,6 +1683,8 @@ static int
            sdma_v4_4_2_print_iv_entry(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">       int instance;
            <br>
                   struct amdgpu_task_info *task_info;
            <br>
                   u64 addr;
            <br>
            +     uint32_t xcc_id =
            amdgpu_amdkfd_node_id_to_xcc_id(adev, entry->node_id);
            <br>
            +     uint32_t xcp_id = xcc_id/adev->gfx.num_xcc_per_xcp;
            <br>
            <br>
                   instance = sdma_v4_4_2_irq_id_to_seq(adev,
            entry->client_id);
            <br>
                   if (instance < 0 || instance >=
            adev->sdma.num_instances) {
            <br>
            @@ -1698,7 +1700,7 @@ static int
            sdma_v4_4_2_print_iv_entry(struct
            <br>
          </blockquote>
          amdgpu_device *adev,
          <br>
          <blockquote type="cite">                           instance,
            addr, entry->src_id, entry->ring_id, entry->vmid,
            <br>
                                       entry->pasid);
            <br>
            <br>
            -     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid);
            <br>
            +     task_info = amdgpu_vm_get_task_info_pasid(adev,
            entry->pasid,
            <br>
          </blockquote>
          xcp_id);
          <br>
          <blockquote type="cite">       if (task_info) {
            <br>
                           dev_dbg_ratelimited(adev->dev, " for
            process %s pid %d thread %s
            <br>
          </blockquote>
          pid %d\n",
          <br>
          <blockquote type="cite">                                  
            task_info->process_name, task_info->tgid,
            <br>
            diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
          <br>
          <blockquote type="cite">index ea3792249209..c098fbaf0e1c
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
            <br>
            @@ -1262,8 +1262,9 @@ void kfd_signal_reset_event(struct
            kfd_node
            <br>
          </blockquote>
          *dev)
          <br>
          <blockquote type="cite">               if
            (dev->dqm->detect_hang_count) {
            <br>
                                   struct amdgpu_task_info *ti;
            <br>
            +                     uint32_t xcp_id = dev->xcp ?
            dev->xcp->id : 0;
            <br>
            <br>
            -                     ti =
            amdgpu_vm_get_task_info_pasid(dev->adev, p->pasid);
            <br>
            +                     ti =
            amdgpu_vm_get_task_info_pasid(dev->adev, p->pasid,
            <br>
          </blockquote>
          xcp_id);
          <br>
          <blockquote type="cite">                       if (ti) {
            <br>
                                           dev_err(dev->adev->dev,
            <br>
                                                   "Queues reset on
            process %s tid %d thread %s pid %d\n",
            <br>
            diff --git
            a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
          <br>
          <blockquote type="cite">index 8e0d0356e810..d7cbf9525698
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
            <br>
            @@ -377,12 +377,8 @@ static void
            event_interrupt_wq_v10(struct
            <br>
          </blockquote>
          kfd_node *dev,
          <br>
          <blockquote type="cite">               struct
            kfd_hsa_memory_exception_data exception_data;
            <br>
            <br>
                           /* gfxhub */
            <br>
            -             if (!vmid_type &&
            dev->adev->gfx.funcs->ih_node_to_logical_xcc) {
            <br>
            -                     hub_inst =
            dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
            <br>
            -                             node_id);
            <br>
            -                     if (hub_inst < 0)
            <br>
            -                             hub_inst = 0;
            <br>
            -             }
            <br>
            +             if (!vmid_type)
            <br>
            +                     hub_inst =
            amdgpu_amdkfd_node_id_to_xcc_id(dev->adev,
            <br>
          </blockquote>
          node_id);
          <br>
          <blockquote type="cite">               /* mmhub */
            <br>
                           if (vmid_type && client_id ==
            SOC15_IH_CLIENTID_VMC)
            <br>
            diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
          <br>
          <blockquote type="cite">index a9c3580be8c9..4708b8c811a5
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
            <br>
            @@ -437,12 +437,8 @@ static void
            event_interrupt_wq_v9(struct
            <br>
          </blockquote>
          kfd_node *dev,
          <br>
          <blockquote type="cite">               struct
            kfd_hsa_memory_exception_data exception_data;
            <br>
            <br>
                           /* gfxhub */
            <br>
            -             if (!vmid_type &&
            dev->adev->gfx.funcs->ih_node_to_logical_xcc) {
            <br>
            -                     hub_inst =
            dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
            <br>
            -                             node_id);
            <br>
            -                     if (hub_inst < 0)
            <br>
            -                             hub_inst = 0;
            <br>
            -             }
            <br>
            +             if (!vmid_type)
            <br>
            +                     hub_inst =
            amdgpu_amdkfd_node_id_to_xcc_id(dev->adev,
            <br>
          </blockquote>
          node_id);
          <br>
          <blockquote type="cite">               /* mmhub */
            <br>
                           if (vmid_type && client_id ==
            SOC15_IH_CLIENTID_VMC)
            <br>
            diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
            <br>
          </blockquote>
          b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
          <br>
          <blockquote type="cite">index ea6a8e43bd5b..b5f2f5b1069c
            100644
            <br>
            --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
            <br>
            @@ -251,8 +251,9 @@ void
            <br>
          </blockquote>
          kfd_smi_event_update_thermal_throttling(struct kfd_node *dev,
          <br>
          <blockquote type="cite">   void
            kfd_smi_event_update_vmfault(struct kfd_node *dev, uint16_t
            pasid)
            <br>
               {
            <br>
                   struct amdgpu_task_info *task_info;
            <br>
            +     uint32_t xcp_id = dev->xcp ? dev->xcp->id :
            0;
            <br>
            <br>
            -     task_info =
            amdgpu_vm_get_task_info_pasid(dev->adev, pasid);
            <br>
            +     task_info =
            amdgpu_vm_get_task_info_pasid(dev->adev, pasid, xcp_id);
            <br>
                   if (task_info) {
            <br>
                           /* Report VM faults from user applications,
            not retry from kernel */
            <br>
                           if (task_info->pid)
            <br>
          </blockquote>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>