<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    On 11.09.24 at 15:40, Philip Yang wrote:<br>
    <blockquote type="cite" cite="mid:2c6fe61b-d223-40a8-d8ba-974857676512@amd.com">
      
      <p>On 2024-09-11 02:54, Christian König wrote: </p>
      <blockquote type="cite" cite="mid:a0c252aa-e232-4d61-8dad-3782fe864ba5@amd.com"> Yeah, I
        completely agree with Xiaogang.<br>
        <br>
        The PASID is an identifier of an address space. And the idea of
        the KFD was that we can just use the same address space and with
        it the page tables for multiple execution devices, e.g. CPUs,
        GPUs etc...<br>
        <br>
        That idea turned out to be a bad one because it clashes with
        some use cases (e.g. native context virtualization). The better
        approach is to see the CPU and GPU processes as separate things
        which just share the same underlying data.<br>
        <br>
        Opening the KFD node multiple times currently results in the
        same KFD process being used. We should probably consider
        changing that.<br>
      </blockquote>
      <p>One KFD process is bound to one app process, with a count to
        support multiple opens and closes by the same process.<br>
      </p>
    </blockquote>
    <br>
    Yeah and to repeat myself: That seems to be a bad idea.<br>
    <br>
    <blockquote type="cite" cite="mid:2c6fe61b-d223-40a8-d8ba-974857676512@amd.com">
      <p> </p>
      <p>The IOMMU most likely uses the Linux process pid, not the KFD
        process->pasid. The KFD process->pasid is passed to the
        firmware to map queues and flush the TLB.</p>
    </blockquote>
    <br>
    Actually no, the IOMMU uses driver assigned PASIDs.<br>
    <br>
    <blockquote type="cite" cite="mid:2c6fe61b-d223-40a8-d8ba-974857676512@amd.com">
      <p>The reason for replacing vm->pasid with the KFD process->pasid
        is to find the VM from the fault PASID and then, for a compute
        VM, find the KFD process from that PASID.</p>
    </blockquote>
    <br>
    Why aren't we doing it the other way around? In other words start
    using the VM assigned PASID?<br>
    <br>
    IIRC the KFD overwriting the PASID inside the VM was only done
    for the ATC/IOMMU handling, which is now completely gone.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <blockquote type="cite" cite="mid:2c6fe61b-d223-40a8-d8ba-974857676512@amd.com">
      <p>I can see a bug in amdgpu_vm_handle_fault that affects only
        compute VMs: with multiple partitions, the forced
        no-retry-fault PTE update is applied to the wrong VM. This
        patch would fix that bug, but a simpler fix is possible.<br>
      </p>
      <p>Regards,</p>
      <p>Philip<br>
      </p>
      <blockquote type="cite" cite="mid:a0c252aa-e232-4d61-8dad-3782fe864ba5@amd.com"> <br>
        Regards,<br>
        Christian.<br>
        <br>
        <div class="moz-cite-prefix">On 11.09.24 at 01:59, Chen,
          Xiaogang wrote:<br>
        </div>
        <blockquote type="cite" cite="mid:be8eb0ef-2368-47c2-b4bb-d24f8cb0165d@amd.com">
          <p><br>
          </p>
          <p>You want a 1:1 mapping between VM and PASID so that the VM
            can be queried from the PASID. I think there is a basic
            existing issue: we cannot have a 1:1 correspondence between
            VM and PASID.</p>
          <p>PASIDs are global address space identifiers that can be
            shared between the GPU, an IOMMU and the driver. One app
            should have one PASID, which the IOMMU uses to decide which
            page table to use when a device accesses system resources.
            But one app can open the render/KFD node multiple times,
            even for one GPU. That means one app can have multiple GPU
            VMs.</p>
          <p>I think we have not hit this issue before because an app
            usually opens a render node or KFD node only once. When one
            adev has multiple partitions, there are multiple VMs on one
            adev, so the issue shows up. But the root cause is not
            multiple partitions, and the solution is not to introduce
            multiple PASIDs. I think we should have one PASID per app
            and use a different way to get the VM from the PASID.</p>
          <p><br>
          </p>
          <p>Regards</p>
          <p>Xiaogang<br>
          </p>
          <div class="moz-cite-prefix">On 9/10/2024 3:47 PM, Kim,
            Jonathan wrote:<br>
          </div>
          <blockquote type="cite" cite="mid:CY8PR12MB743552BE87CE07964CC89558859A2@CY8PR12MB7435.namprd12.prod.outlook.com">
            <meta name="Generator" content="Microsoft Word 15 (filtered
            medium)">
            <style>@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
        {font-family:"Malgun Gothic";
        panose-1:2 11 5 3 2 0 0 2 0 4;}@font-face
        {font-family:Aptos;}@font-face
        {font-family:"\@Malgun Gothic";}p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:12.0pt;
        font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Arial",sans-serif;
        color:windowtext;}.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        mso-ligatures:none;}div.WordSection1
        {page:WordSection1;}</style>
            <p style="font-family:Calibri;font-size:10pt;color:#008000;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left"> [Public]<br>
            </p>
            <br>
            <div>
              <div>
                <div>
                  <div class="WordSection1">
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">KMS
                        open still sets the PASID-VM bindings per adev
                        (socket), so I don’t see how the per-partition
                        PASID overwrite issue is primarily a KFD
                        concern.<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">Are
                        you saying the KFD process device holds a
                        shadow copy of the correct VM for page
                        restore during a fault?<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">Doesn’t
                        it acquire the wrong VM object on process init
                        in the first place?<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">Even
                        if the KFD did have a separate VM
                        reference, the underlying IRQ fault handling is
                        still broken.<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">We
                        probably don’t want to bandage over something to
                        fix one symptom.<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif">Jon<o:p></o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
                    <p class="MsoNormal"><span style="font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
                    <div style="border:none;border-left:solid blue
                    1.5pt;padding:0in 0in 0in 4.0pt">
                      <div>
                        <div style="border:none;border-top:solid #E1E1E1
                        1.0pt;padding:3.0pt 0in 0in 0in">
                          <p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
                              Yang, Philip <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com" moz-do-not-send="true"><Philip.Yang@amd.com></a>
                              <br>
                              <b>Sent:</b> Tuesday, September 10, 2024
                              11:24 AM<br>
                              <b>To:</b> Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
                              Kim, Jonathan <a class="moz-txt-link-rfc2396E" href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true"><Jonathan.Kim@amd.com></a>;
                              <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                              <b>Cc:</b> Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true"><Felix.Kuehling@amd.com></a>;
                              Deucher, Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                              Joshi, Mukul <a class="moz-txt-link-rfc2396E" href="mailto:Mukul.Joshi@amd.com" moz-do-not-send="true"><Mukul.Joshi@amd.com></a><br>
                              <b>Subject:</b> Re: [PATCH] drm/amdkfd:
                              fix vm-pasid lookup for multiple
                              partitions<o:p></o:p></span></p>
                        </div>
                      </div>
                      <p class="MsoNormal"><o:p> </o:p></p>
                      <p><o:p> </o:p></p>
                      <div>
                        <p class="MsoNormal">On 2024-09-09 14:46,
                          Christian König wrote:<o:p></o:p></p>
                      </div>
                      <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                        <p class="MsoNormal">On 09.09.24 at 18:02,
                          Kim, Jonathan wrote: <br>
                          <br>
                          <o:p></o:p></p>
                        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                          <p class="MsoNormal">[Public] <br>
                            <br>
                            <br>
                            <o:p></o:p></p>
                          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                            <p class="MsoNormal">-----Original
                              Message----- <br>
                              From: Christian König <a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true"><ckoenig.leichtzumerken@gmail.com></a>
                              <br>
                              Sent: Thursday, September 5, 2024 10:24 AM
                              <br>
                              To: Kim, Jonathan <a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true"><Jonathan.Kim@amd.com></a>;
                              <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true" class="moz-txt-link-freetext">amd-gfx@lists.freedesktop.org</a>
                              <br>
                              Cc: Kuehling, Felix <a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true"><Felix.Kuehling@amd.com></a>;
                              Deucher, Alexander <br>
                              <a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                              Joshi, Mukul <a href="mailto:Mukul.Joshi@amd.com" moz-do-not-send="true"><Mukul.Joshi@amd.com></a>
                              <br>
                              Subject: Re: [PATCH] drm/amdkfd: fix
                              vm-pasid lookup for multiple partitions <br>
                              <br>
                              <br>
                              On 19.08.24 at 19:59, Jonathan Kim wrote:
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">Currently multiple
                                partitions will incorrectly overwrite
                                the VM lookup <br>
                                table since the table is indexed by
                                PASID and multiple partitions can <br>
                                register different VM objects on the
                                same PASID. <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">That's a rather bad
                              idea. Why do we have the same PASID for
                              different VM <br>
                              objects in the first place? <o:p></o:p></p>
                          </blockquote>
                          <p class="MsoNormal">Alex can probably
                            elaborate on the KGD side, but from what I
                            can see, the KMS driver open call has always
                            assigned a new VM object per PASID. <br>
                            The KFD acquires and replaces the KGD
                            PASID-VMID registration on its own compute
                            process open/creation call. <br>
                            If this is the bad idea you're referring to,
                            then someone else will have to chime in.  I
                            don't have much history on this,
                            unfortunately. <o:p></o:p></p>
                        </blockquote>
                        <p class="MsoNormal"><br>
                          Yeah, Felix and I designed that. <o:p></o:p></p>
                      </blockquote>
                      <p>The app opens the DRM node to create a VM for
                        each partition, with a different vm->pasid for
                        each VM. The issue comes from
                        kfd_ioctl_acquire_vm ->
                        kfd_process_device_init_vm -> 
                        amdgpu_amdkfd_gpuvm_set_vm_pasid, which replaces
                        every vm->pasid with the KFD process->pasid
                        obtained when the KFD node was opened. As a
                        result, only one VM is stored in
                        adev->vm_manager.pasids under the KFD process
                        PASID, so we cannot retrieve the correct VM from
                        adev->vm_manager.pasids on mGPUs or multiple
                        partitions.<o:p></o:p></p>
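                      <p>To make the overwrite concrete, here is a
                        minimal user-space sketch. It is not the amdgpu
                        code: demo_vm, demo_vm_manager,
                        demo_set_vm_pasid and demo_vm_from_pasid are
                        hypothetical names, and a small fixed array is
                        assumed as a stand-in for
                        adev->vm_manager.pasids. It only models the
                        behaviour described above, where each VM is
                        re-registered under the single KFD process
                        PASID.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PASID 16 /* tiny table, for illustration only */

/* Hypothetical stand-in for a per-partition GPU VM. */
struct demo_vm {
	int xcp_id;     /* partition that owns this VM */
	uint32_t pasid; /* PASID the VM is currently registered under */
};

/* Hypothetical stand-in for the per-adev PASID -> VM lookup table. */
struct demo_vm_manager {
	struct demo_vm *pasids[MAX_PASID];
};

/*
 * Re-register vm under new_pasid: drop the old entry if it still points
 * at this VM, then store the VM at the new index. An entry that already
 * exists at new_pasid is silently replaced.
 */
static void demo_set_vm_pasid(struct demo_vm_manager *mgr,
			      struct demo_vm *vm, uint32_t new_pasid)
{
	assert(vm->pasid < MAX_PASID);
	assert(new_pasid < MAX_PASID);

	if (mgr->pasids[vm->pasid] == vm)
		mgr->pasids[vm->pasid] = NULL;
	vm->pasid = new_pasid;
	mgr->pasids[new_pasid] = vm;
}

/* Look up the VM registered under pasid, or NULL if there is none. */
static struct demo_vm *demo_vm_from_pasid(struct demo_vm_manager *mgr,
					  uint32_t pasid)
{
	return pasid < MAX_PASID ? mgr->pasids[pasid] : NULL;
}
```

                      <p>With two VMs first registered under PASIDs 1
                        and 2, calling demo_set_vm_pasid on both with
                        PASID 5 leaves only the second VM reachable when
                        looking up PASID 5. The first VM is no longer in
                        the table at all, which is the situation a fault
                        handler runs into with multiple partitions.</p>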
                      <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                        <p class="MsoNormal"><br>
                          <br>
                          <o:p></o:p></p>
                        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                          <p class="MsoNormal">That aside, the current
                            problem is that all KFD device
                            structures are logical partitions and
                            register their PASID-VM binding using this
                            concept of a device. <o:p></o:p></p>
                        </blockquote>
                        <p class="MsoNormal"><br>
                          As far as I can see that is the fundamental
                          problem. This needs to be fixed instead. <br>
                          <br>
                          <br>
                          <o:p></o:p></p>
                        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                          <p class="MsoNormal">On the KGD side, however,
                            the registration table is maintained in the
                            adev struct, which is a physical socket. <br>
                            So there's a mismatch between the KFD and
                            KGD in what a device is with regard to the
                            lookup table, and that results in bad
                            bindings. <br>
                            <br>
                            Adding a per-partition dimension to the
                            existing lookup table resolves issues we're
                            seeing, for example, with memory violation
                            interception and XNACK, i.e. bad bindings
                            result in the wrong VM object being found
                            when setting no-retry flags on memory
                            violations. <o:p></o:p></p>
                        </blockquote>
                      </blockquote>
                      <p>The retry-fault recovery in
                        svm_range_restore_pages uses the fault PASID to
                        get the KFD process and then the fault node_id
                        to get pdd->vm. Maybe you can use the same
                        approach to fix the debugger issue.<o:p></o:p></p>
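                      <p>The two-step lookup can be sketched as
                        follows. This is a simplified user-space model,
                        not the KFD implementation: demo_process,
                        demo_pdd, demo_process_from_pasid and
                        demo_vm_from_fault are hypothetical names, and
                        the linear searches are an assumption made only
                        to keep the example short.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES 8 /* illustration only */

/* Hypothetical stand-in for a GPU VM. */
struct demo_vm {
	int id;
};

/* Hypothetical per-process, per-node data (models the role of a pdd). */
struct demo_pdd {
	uint32_t node_id;
	struct demo_vm *vm;
};

/* Hypothetical process: one PASID shared by all of its nodes. */
struct demo_process {
	uint32_t pasid;
	size_t n_pdds;
	struct demo_pdd pdds[DEMO_MAX_NODES];
};

/* Step 1: the fault PASID selects the process. */
static struct demo_process *demo_process_from_pasid(struct demo_process *procs,
						    size_t n_procs,
						    uint32_t pasid)
{
	for (size_t i = 0; i < n_procs; i++)
		if (procs[i].pasid == pasid)
			return &procs[i];
	return NULL;
}

/* Step 2: the fault node_id selects the per-node VM inside that process. */
static struct demo_vm *demo_vm_from_fault(struct demo_process *procs,
					  size_t n_procs,
					  uint32_t pasid, uint32_t node_id)
{
	struct demo_process *p = demo_process_from_pasid(procs, n_procs, pasid);

	if (!p)
		return NULL;
	assert(p->n_pdds <= DEMO_MAX_NODES);
	for (size_t i = 0; i < p->n_pdds; i++)
		if (p->pdds[i].node_id == node_id)
			return p->pdds[i].vm;
	return NULL;
}
```

                      <p>Because the VM is picked by node_id inside the
                        process, one PASID can cover several partitions
                        without any PASID-indexed VM table being
                        overwritten.</p>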
                      <p>Regards,<o:p></o:p></p>
                      <p>Philip<o:p></o:p></p>
                      <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                        <p class="MsoNormal"><br>
                          Yeah that is pretty much a no-go. <br>
                          <br>
                          The PASID and how it is used is defined by the
                          PCIe specifications. If we now start to assign
                          multiple VMs to the same PASID then we are
                          violating the PCIe specification. <br>
                          <br>
                          The problems you see are most likely just the
                          tip of the iceberg here. <br>
                          <br>
                          Regards, <br>
                          Christian. <br>
                          <br>
                          <br>
                          <o:p></o:p></p>
                        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                          <p class="MsoNormal"><br>
                            Jon <br>
                            <br>
                            <br>
                            <o:p></o:p></p>
                          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                            <p class="MsoNormal">Regards, <br>
                              Christian. <br>
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">This results in
                                loading the wrong VM object on PASID
                                query. <br>
                                <br>
                                To correct this, set up the lookup table
                                to be per-partition-per-PASID <br>
                                instead. <br>
                                <br>
                                Signed-off-by: Jonathan Kim <a href="mailto:jonathan.kim@amd.com" moz-do-not-send="true"><jonathan.kim@amd.com></a>
                                <br>
                                --- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   
                                | 12 ++++ <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   
                                |  1 + <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      
                                |  4 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c      
                                |  7 ++- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c       
                                | 55 +++++++++++------ <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">-- <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h       
                                | 11 +++- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c       
                                |  5 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c       
                                |  5 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c       
                                |  5 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c        
                                |  3 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c        
                                |  5 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c        
                                | 16 ++---- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c       
                                |  2 +- <br>
                                  
                                drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c     
                                |  4 +- <br>
                                  
                                drivers/gpu/drm/amd/amdkfd/kfd_events.c      
                                |  3 +- <br>
                                  
                                .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c 
                                |  8 +-- <br>
                                  
                                .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c  
                                |  8 +-- <br>
                                  
                                drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c  
                                |  3 +- <br>
                                   18 files changed, 92 insertions(+),
                                65 deletions(-) <br>
                                <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                c272461d70a9..28db789610e1 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
                                <br>
                                @@ -887,3 +887,15 @@ int
                                amdgpu_amdkfd_unmap_hiq(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev,
                              u32 doorbell_off, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       return r; <br>
                                   } <br>
                                + <br>
                                +int
                                amdgpu_amdkfd_node_id_to_xcc_id(struct
                                amdgpu_device *adev, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">uint32_t node_id) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+{ <br>
                                +     if
                                (adev->gfx.funcs->ih_node_to_logical_xcc)
                                { <br>
                                +             int xcc_id =
                                adev->gfx.funcs->ih_node_to_logical_xcc(adev,
                                node_id); <br>
                                + <br>
                                +             if (xcc_id >= 0) <br>
                                +                     return xcc_id; <br>
                                +     } <br>
                                + <br>
                                +     return 0; <br>
                                +} <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                4ed49265c764..bf8bb45d8ab6 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
                                <br>
                                @@ -356,6 +356,7 @@ void
                                amdgpu_amdkfd_unreserve_mem_limit(struct
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                uint64_t size, u32 alloc_flag, int8_t
                                xcp_id); <br>
                                <br>
                                   u64
                                amdgpu_amdkfd_xcp_memory_size(struct
                                amdgpu_device *adev, int <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">xcp_id); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+int
                                amdgpu_amdkfd_node_id_to_xcc_id(struct
                                amdgpu_device *adev, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">uint32_t node_id); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   #define
                                KFD_XCP_MEM_ID(adev, xcp_id) \ <br>
                                               ((adev)->xcp_mgr
                                && (xcp_id) >= 0 ?\ <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                c6a1783fc9ef..bf9f8802e18d 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
                                <br>
                                @@ -37,7 +37,7 @@ static enum
                                drm_gpu_sched_stat <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_job_timedout(struct
                              drm_sched_job *s_job) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       struct
                                amdgpu_job *job = to_amdgpu_job(s_job);
                                <br>
                                       struct amdgpu_task_info *ti; <br>
                                       struct amdgpu_device *adev =
                                ring->adev; <br>
                                -     int idx; <br>
                                +     int idx, xcp_id = !job->vm ? 0
                                : job->vm->xcp_id; <br>
                                       int r; <br>
                                <br>
                                       if
                                (!drm_dev_enter(adev_to_drm(adev),
                                &idx)) { <br>
                                @@ -62,7 +62,7 @@ static enum
                                drm_gpu_sched_stat <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_job_timedout(struct
                              drm_sched_job *s_job) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                job->base.sched->name,
                                atomic_read(&ring->fence_drv.last_seq),
                                <br>
                                              
                                ring->fence_drv.sync_seq); <br>
                                <br>
                                -     ti =
                                amdgpu_vm_get_task_info_pasid(ring->adev,
                                job->pasid); <br>
                                +     ti =
                                amdgpu_vm_get_task_info_pasid(ring->adev,
                                job->pasid, xcp_id); <br>
                                       if (ti) { <br>
                                               dev_err(adev->dev, <br>
                                                       "Process
                                information: process %s pid %d thread %s
                                pid %d\n", <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                d9fde38f6ee2..e413bf4a3e84 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
                                <br>
                                @@ -1275,17 +1275,20 @@ int
                                amdgpu_info_ioctl(struct drm_device
                                *dev, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">void *data, struct
                              drm_file *filp) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               struct
                                amdgpu_vm *vm = &fpriv->vm; <br>
                                               struct
                                drm_amdgpu_info_gpuvm_fault gpuvm_fault;
                                <br>
                                               unsigned long flags; <br>
                                +             int i; <br>
                                <br>
                                               if (!vm) <br>
                                                       return -EINVAL; <br>
                                <br>
                                               memset(&gpuvm_fault,
                                0, sizeof(gpuvm_fault)); <br>
                                <br>
                                -            
                                xa_lock_irqsave(&adev->vm_manager.pasids,
                                flags); <br>
                                +             for (i = 0; i <
                                adev->xcp_mgr->num_xcps; i++) <br>
                                +                    
                                xa_lock_irqsave(&adev->vm_manager.pasids[i],
                                flags); <br>
                                               gpuvm_fault.addr =
                                vm->fault_info.addr; <br>
                                               gpuvm_fault.status =
                                vm->fault_info.status; <br>
                                               gpuvm_fault.vmhub =
                                vm->fault_info.vmhub; <br>
                                -            
                                xa_unlock_irqrestore(&adev->vm_manager.pasids,
                                flags); <br>
                                +             for (i = 0; i <
                                adev->xcp_mgr->num_xcps; i++) <br>
                                +                    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids[i],
                                flags); <br>
                                <br>
                                               return copy_to_user(out,
                                &gpuvm_fault, <br>
                                                                  
                                min((size_t)size, sizeof(gpuvm_fault)))
                                ? -EFAULT : 0; <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                bcb729094521..f43e1c15f423 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
                                <br>
                                @@ -146,7 +146,7 @@ int
                                amdgpu_vm_set_pasid(struct amdgpu_device
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">*adev, struct amdgpu_vm
                              *vm, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               return
                                0; <br>
                                <br>
                                       if (vm->pasid) { <br>
                                -             r =
                                xa_err(xa_erase_irq(&adev->vm_manager.pasids,
                                vm->pasid)); <br>
                                +             r =
                                xa_err(xa_erase_irq(&adev->vm_manager.pasids[vm->xcp_id],
                                <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">vm->pasid)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               if (r
                                < 0) <br>
                                                       return r; <br>
                                <br>
                                @@ -154,7 +154,7 @@ int
                                amdgpu_vm_set_pasid(struct amdgpu_device
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">*adev, struct amdgpu_vm
                              *vm, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       } <br>
                                <br>
                                       if (pasid) { <br>
                                -             r =
                                xa_err(xa_store_irq(&adev->vm_manager.pasids,
                                pasid, vm, <br>
                                +             r =
                                xa_err(xa_store_irq(&adev->vm_manager.pasids[vm->xcp_id],
                                <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">pasid, vm, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                                      
                                GFP_KERNEL)); <br>
                                               if (r < 0) <br>
                                                       return r; <br>
                                @@ -2288,14 +2288,14 @@ static void <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_vm_destroy_task_info(struct
                              kref *kref) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   } <br>
                                <br>
                                   static inline struct amdgpu_vm * <br>
                                -amdgpu_vm_get_vm_from_pasid(struct
                                amdgpu_device *adev, u32 pasid) <br>
                                +amdgpu_vm_get_vm_from_pasid(struct
                                amdgpu_device *adev, u32 pasid, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">u32 xcp_id) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   { <br>
                                       struct amdgpu_vm *vm; <br>
                                       unsigned long flags; <br>
                                <br>
                                -    
                                xa_lock_irqsave(&adev->vm_manager.pasids,
                                flags); <br>
                                -     vm =
                                xa_load(&adev->vm_manager.pasids,
                                pasid); <br>
                                -    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids,
                                flags); <br>
                                +    
                                xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
                                flags); <br>
                                +     vm =
                                xa_load(&adev->vm_manager.pasids[xcp_id],
                                pasid); <br>
                                +    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
                                flags); <br>
                                <br>
                                       return vm; <br>
                                   } <br>
                                @@ -2343,10 +2343,10 @@
                                amdgpu_vm_get_task_info_vm(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_vm *vm) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">    * referenced down
                                with amdgpu_vm_put_task_info. <br>
                                    */ <br>
                                   struct amdgpu_task_info * <br>
                                -amdgpu_vm_get_task_info_pasid(struct
                                amdgpu_device *adev, u32 pasid) <br>
                                +amdgpu_vm_get_task_info_pasid(struct
                                amdgpu_device *adev, u32 pasid, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">u32 xcp_id) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   { <br>
                                       return
                                amdgpu_vm_get_task_info_vm( <br>
                                -                    
                                amdgpu_vm_get_vm_from_pasid(adev,
                                pasid)); <br>
                                +                    
                                amdgpu_vm_get_vm_from_pasid(adev, pasid,
                                xcp_id)); <br>
                                   } <br>
                                <br>
                                   static int
                                amdgpu_vm_create_task_info(struct
                                amdgpu_vm *vm) <br>
                                @@ -2481,6 +2481,8 @@ int
                                amdgpu_vm_init(struct amdgpu_device <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">*adev, struct amdgpu_vm
                              *vm, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">      
                                amdgpu_bo_unreserve(vm->root.bo); <br>
                                       amdgpu_bo_unref(&root_bo); <br>
                                <br>
                                +     vm->xcp_id = xcp_id < 0 ? 0
                                : xcp_id; <br>
                                + <br>
                                       return 0; <br>
                                <br>
                                   error_free_root: <br>
                                @@ -2695,8 +2697,8 @@ void
                                amdgpu_vm_manager_init(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   #else <br>
                                      
                                adev->vm_manager.vm_update_mode = 0;
                                <br>
                                   #endif <br>
                                - <br>
                                -    
                                xa_init_flags(&adev->vm_manager.pasids,
                                XA_FLAGS_LOCK_IRQ); <br>
                                +     for (i = 0; i < MAX_XCP; i++) <br>
                                +            
                                xa_init_flags(&(adev->vm_manager.pasids[i]),
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">XA_FLAGS_LOCK_IRQ); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   } <br>
                                <br>
                                   /** <br>
                                @@ -2708,10 +2710,15 @@ void
                                amdgpu_vm_manager_init(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev) <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">    */ <br>
                                   void amdgpu_vm_manager_fini(struct
                                amdgpu_device *adev) <br>
                                   { <br>
                                -    
                                WARN_ON(!xa_empty(&adev->vm_manager.pasids));
                                <br>
                                -    
                                xa_destroy(&adev->vm_manager.pasids);
                                <br>
                                +     int i; <br>
                                + <br>
                                +     for (i = 0; i < MAX_XCP; i++) {
                                <br>
                                +            
                                WARN_ON(!xa_empty(&adev->vm_manager.pasids[i]));
                                <br>
                                +            
                                xa_destroy(&adev->vm_manager.pasids[i]);
                                <br>
                                +     } <br>
                                <br>
                                       amdgpu_vmid_mgr_fini(adev); <br>
                                + <br>
                                   } <br>
                                <br>
                                   /** <br>
                                @@ -2778,17 +2785,18 @@ bool
                                amdgpu_vm_handle_fault(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev,
                              u32 pasid, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       unsigned long
                                irqflags; <br>
                                       uint64_t value, flags; <br>
                                       struct amdgpu_vm *vm; <br>
                                -     int r; <br>
                                +     int r, xcp_id; <br>
                                <br>
                                -    
                                xa_lock_irqsave(&adev->vm_manager.pasids,
                                irqflags); <br>
                                -     vm =
                                xa_load(&adev->vm_manager.pasids,
                                pasid); <br>
                                +     xcp_id =
                                amdgpu_amdkfd_node_id_to_xcc_id(adev,
                                node_id)/adev->gfx.num_xcc_per_xcp; <br>
                                +    
                                xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
                                irqflags); <br>
                                +     vm =
                                xa_load(&adev->vm_manager.pasids[xcp_id],
                                pasid); <br>
                                       if (vm) { <br>
                                               root =
                                amdgpu_bo_ref(vm->root.bo); <br>
                                               is_compute_context =
                                vm->is_compute_context; <br>
                                       } else { <br>
                                               root = NULL; <br>
                                       } <br>
                                -    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids,
                                irqflags); <br>
                                +    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
                                irqflags); <br>
                                <br>
                                       if (!root) <br>
                                               return false; <br>
                                @@ -2806,11 +2814,11 @@ bool
                                amdgpu_vm_handle_fault(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev,
                              u32 pasid, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               goto
                                error_unref; <br>
                                <br>
                                       /* Double check that the VM still
                                exists */ <br>
                                -    
                                xa_lock_irqsave(&adev->vm_manager.pasids,
                                irqflags); <br>
                                -     vm =
                                xa_load(&adev->vm_manager.pasids,
                                pasid); <br>
                                +    
                                xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
                                irqflags); <br>
                                +     vm =
                                xa_load(&adev->vm_manager.pasids[xcp_id],
                                pasid); <br>
                                       if (vm && vm->root.bo
                                != root) <br>
                                               vm = NULL; <br>
                                -    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids,
                                irqflags); <br>
                                +    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
                                irqflags); <br>
                                       if (!vm) <br>
                                               goto error_unlock; <br>
                                <br>
                                @@ -2968,14 +2976,15 @@ void
                                amdgpu_vm_update_fault_cache(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                                
                                unsigned int pasid, <br>
                                                                
                                uint64_t addr, <br>
                                                                
                                uint32_t status, <br>
                                -                               unsigned
                                int vmhub) <br>
                                +                               unsigned
                                int vmhub, <br>
                                +                               uint32_t
                                xcp_id) <br>
                                   { <br>
                                       struct amdgpu_vm *vm; <br>
                                       unsigned long flags; <br>
                                <br>
                                -    
                                xa_lock_irqsave(&adev->vm_manager.pasids,
                                flags); <br>
                                +    
                                xa_lock_irqsave(&adev->vm_manager.pasids[xcp_id],
                                flags); <br>
                                <br>
                                -     vm =
                                xa_load(&adev->vm_manager.pasids,
                                pasid); <br>
                                +     vm =
                                xa_load(&adev->vm_manager.pasids[xcp_id],
                                pasid); <br>
                                       /* Don't update the fault cache
                                if status is 0.  In the multiple <br>
                                        * fault case, subsequent faults
                                will return a 0 status which is <br>
                                        * useless for userspace and
                                replaces the useful fault status, so <br>
                                @@ -3008,7 +3017,7 @@ void
                                amdgpu_vm_update_fault_cache(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                      
                                WARN_ONCE(1, "Invalid vmhub %u\n",
                                vmhub); <br>
                                               } <br>
                                       } <br>
                                -    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids,
                                flags); <br>
                                +    
                                xa_unlock_irqrestore(&adev->vm_manager.pasids[xcp_id],
                                flags); <br>
                                   } <br>
                                <br>
                                   /** <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                046949c4b695..1499f5f731e9 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
                                <br>
                                @@ -35,6 +35,7 @@ <br>
                                   #include "amdgpu_sync.h" <br>
                                   #include "amdgpu_ring.h" <br>
                                   #include "amdgpu_ids.h" <br>
                                +#include "amdgpu_xcp.h" <br>
                                <br>
                                   struct drm_exec; <br>
                                <br>
                                @@ -418,6 +419,9 @@ struct amdgpu_vm { <br>
                                <br>
                                       /* cached fault info */ <br>
                                       struct amdgpu_vm_fault_info
                                fault_info; <br>
                                + <br>
                                +     /* XCP ID */ <br>
                                +     int xcp_id; <br>
                                   }; <br>
                                <br>
                                   struct amdgpu_vm_manager { <br>
                                @@ -456,7 +460,7 @@ struct
                                amdgpu_vm_manager { <br>
                                       /* PASID to VM mapping, will be
                                used in interrupt context to <br>
                                        * look up VM of a page fault <br>
                                        */ <br>
                                -     struct
                                xarray                           pasids;
                                <br>
                                +     struct
                                xarray                          
                                pasids[MAX_XCP]; <br>
                                       /* Global registration of recent
                                page fault information */ <br>
                                       struct amdgpu_vm_fault_info    
                                fault_info; <br>
                                   }; <br>
                                @@ -550,7 +554,7 @@ bool
                                amdgpu_vm_need_pipeline_sync(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_ring *ring, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   void
                                amdgpu_vm_check_compute_bug(struct
                                amdgpu_device *adev); <br>
                                <br>
                                   struct amdgpu_task_info * <br>
                                -amdgpu_vm_get_task_info_pasid(struct
                                amdgpu_device *adev, u32 pasid); <br>
                                +amdgpu_vm_get_task_info_pasid(struct
                                amdgpu_device *adev, u32 pasid, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">u32 xcp_id); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">   struct
                                amdgpu_task_info * <br>
                                   amdgpu_vm_get_task_info_vm(struct
                                amdgpu_vm *vm); <br>
                                @@ -649,7 +653,8 @@ void
                                amdgpu_vm_update_fault_cache(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                                
                                unsigned int pasid, <br>
                                                                
                                uint64_t addr, <br>
                                                                
                                uint32_t status, <br>
                                -                               unsigned
                                int vmhub); <br>
                                +                               unsigned
                                int vmhub, <br>
                                +                               uint32_t
                                xcp_id); <br>
                                   void
                                amdgpu_vm_tlb_fence_create(struct
                                amdgpu_device *adev, <br>
                                                                struct
                                amdgpu_vm *vm, <br>
                                                                struct
                                dma_fence **fence); <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                f0ceab3ce5bf..24b042febf5c 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                                <br>
                                @@ -151,7 +151,8 @@ static int
                                gmc_v10_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                WREG32_P(hub->vm_l2_pro_fault_cntl,
                                1, ~1); <br>
                                <br>
                                              
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, addr, status, <br>
-                                          entry->vmid_src ?
                                AMDGPU_MMHUB0(0) : <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                entry->vmid_src ? AMDGPU_MMHUB0(0) :
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0), <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                0); <br>
                                       } <br>
                                <br>
                                       if (!printk_ratelimit()) <br>
                                @@ -161,7 +162,7 @@ static int
                                gmc_v10_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               "[%s]
                                page fault (src_id:%u ring:%u vmid:%u
                                pasid:%u)\n", <br>
                                               entry->vmid_src ?
                                "mmhub" : "gfxhub", <br>
                                               entry->src_id,
                                entry->ring_id, entry->vmid,
                                entry->pasid); <br>
                                -     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, 0); <br>
                                       if (task_info) { <br>
                                               dev_err(adev->dev, <br>
                                                       " in process %s
                                pid %d thread %s pid %d\n", <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                2797fd84432b..3507046d33e6 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                                <br>
                                @@ -122,7 +122,8 @@ static int
                                gmc_v11_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                WREG32_P(hub->vm_l2_pro_fault_cntl,
                                1, ~1); <br>
                                <br>
                                              
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, addr, status, <br>
-                                          entry->vmid_src ?
                                AMDGPU_MMHUB0(0) : <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                entry->vmid_src ? AMDGPU_MMHUB0(0) :
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0), <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                0); <br>
                                       } <br>
                                <br>
                                       if (printk_ratelimit()) { <br>
                                @@ -132,7 +133,7 @@ static int
                                gmc_v11_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                      
                                "[%s] page fault (src_id:%u ring:%u
                                vmid:%u pasid:%u)\n", <br>
                                                      
                                entry->vmid_src ? "mmhub" : "gfxhub",
                                <br>
                                                       entry->src_id,
                                entry->ring_id, entry->vmid,
                                entry->pasid); <br>
                                -             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, 0); <br>
                                               if (task_info) { <br>
                                                      
                                dev_err(adev->dev, <br>
                                                               " in
                                process %s pid %d thread %s pid %d)\n",
                                <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                60acf676000b..9844564c6c74 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
                                <br>
                                @@ -115,7 +115,8 @@ static int
                                gmc_v12_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                WREG32_P(hub->vm_l2_pro_fault_cntl,
                                1, ~1); <br>
                                <br>
                                              
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, addr, status, <br>
-                                          entry->vmid_src ?
                                AMDGPU_MMHUB0(0) : <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                entry->vmid_src ? AMDGPU_MMHUB0(0) :
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0), <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                         
                                0); <br>
                                       } <br>
                                <br>
                                       if (printk_ratelimit()) { <br>
                                @@ -125,7 +126,7 @@ static int
                                gmc_v12_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                      
                                "[%s] page fault (src_id:%u ring:%u
                                vmid:%u pasid:%u)\n", <br>
                                                      
                                entry->vmid_src ? "mmhub" : "gfxhub",
                                <br>
                                                       entry->src_id,
                                entry->ring_id, entry->vmid,
                                entry->pasid); <br>
                                -             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, 0); <br>
                                               if (task_info) { <br>
                                                      
                                dev_err(adev->dev, <br>
                                                               " in
                                process %s pid %d thread %s pid %d)\n",
                                <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                994432fb57ea..2cdb0cbb7c4d 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
                                <br>
                                @@ -1268,7 +1268,8 @@ static int
                                gmc_v7_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               return
                                0; <br>
                                <br>
                                      
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, <br>
                                -                                 
                                ((u64)addr) <<
                                AMDGPU_GPU_PAGE_SHIFT, status, <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                 
                                ((u64)addr) <<
                                AMDGPU_GPU_PAGE_SHIFT, status, <br>
                                +                                 
                                AMDGPU_GFXHUB(0), 0); <br>
                                <br>
                                       if (amdgpu_vm_fault_stop ==
                                AMDGPU_VM_FAULT_STOP_FIRST) <br>
                                              
                                gmc_v7_0_set_fault_enable_default(adev,
                                false); <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                86488c052f82..6855caeb7f74 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
                                <br>
                                @@ -1437,7 +1437,8 @@ static int
                                gmc_v8_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               return
                                0; <br>
                                <br>
                                      
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, <br>
                                -                                 
                                ((u64)addr) <<
                                AMDGPU_GPU_PAGE_SHIFT, status, <o:p> </o:p></p>
                            </blockquote>
                            <p class="MsoNormal">AMDGPU_GFXHUB(0)); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+                                 
                                ((u64)addr) <<
                                AMDGPU_GPU_PAGE_SHIFT, status, <br>
                                +                                 
                                AMDGPU_GFXHUB(0), 0); <br>
                                <br>
                                       if (amdgpu_vm_fault_stop ==
                                AMDGPU_VM_FAULT_STOP_FIRST) <br>
                                              
                                gmc_v8_0_set_fault_enable_default(adev,
                                false); <br>
                                @@ -1448,7 +1449,7 @@ static int
                                gmc_v8_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                dev_err(adev->dev, "GPU fault
                                detected: %d 0x%08x\n", <br>
                                                       entry->src_id,
                                entry->src_data[0]); <br>
                                <br>
                                -             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +             task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, 0); <br>
                                               if (task_info) { <br>
                                                      
                                dev_err(adev->dev, " for process %s
                                pid %d thread %s pid %d\n", <br>
                                                              
                                task_info->process_name,
                                task_info->tgid, <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                b73136d390cc..e183e08b2c02 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
                                <br>
                                @@ -556,10 +556,12 @@ static int
                                gmc_v9_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       unsigned int
                                vmhub; <br>
                                       u64 addr; <br>
                                       uint32_t cam_index = 0; <br>
                                -     int ret, xcc_id = 0; <br>
                                -     uint32_t node_id; <br>
                                +     int ret; <br>
                                +     uint32_t node_id, xcc_id, xcp_id;
                                <br>
                                <br>
                                       node_id = entry->node_id; <br>
                                +     xcc_id =
                                amdgpu_amdkfd_node_id_to_xcc_id(adev,
                                node_id); <br>
                                +     xcp_id =
                                xcc_id/adev->gfx.num_xcc_per_xcp; <br>
                                <br>
                                       addr = (u64)entry->src_data[0]
                                << 12; <br>
                                       addr |=
                                ((u64)entry->src_data[1] & 0xf)
                                << 44; <br>
                                @@ -572,12 +574,6 @@ static int
                                gmc_v9_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">               vmhub
                                = AMDGPU_MMHUB1(0); <br>
                                       } else { <br>
                                               hub_name = "gfxhub0"; <br>
                                -             if
                                (adev->gfx.funcs->ih_node_to_logical_xcc)
                                { <br>
                                -                     xcc_id =
                                adev->gfx.funcs->ih_node_to_logical_xcc(adev,
                                <br>
                                -                             node_id);
                                <br>
                                -                     if (xcc_id < 0)
                                <br>
                                -                             xcc_id =
                                0; <br>
                                -             } <br>
                                               vmhub = xcc_id; <br>
                                       } <br>
                                       hub = &adev->vmhub[vmhub];
                                <br>
                                @@ -631,7 +627,7 @@ static int
                                gmc_v9_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">              
                                retry_fault ? "retry" : "no-retry", <br>
                                               entry->src_id,
                                entry->ring_id, entry->vmid,
                                entry->pasid); <br>
                                <br>
                                -     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">xcp_id); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       if (task_info)
                                { <br>
                                               dev_err(adev->dev, <br>
                                                       " for process %s
                                pid %d thread %s pid %d)\n", <br>
                                @@ -675,7 +671,7 @@ static int
                                gmc_v9_0_process_interrupt(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">       if
                                (!amdgpu_sriov_vf(adev)) <br>
                                              
                                WREG32_P(hub->vm_l2_pro_fault_cntl,
                                1, ~1); <br>
                                <br>
                                -     amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, addr, status, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">vmhub); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">+    
                                amdgpu_vm_update_fault_cache(adev,
                                entry->pasid, addr, status, <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">vmhub, xcp_id); <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">      
                                dev_err(adev->dev, <br>
                                              
                                "VM_L2_PROTECTION_FAULT_STATUS:0x%08X\n",
                                <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
                                <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
                              <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">index
                                23ef4eb36b40..1ac4224bbe5b 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
                                <br>
                                @@ -2182,7 +2182,7 @@ static int
                                sdma_v4_0_print_iv_entry(struct <o:p></o:p></p>
                            </blockquote>
                            <p class="MsoNormal">amdgpu_device *adev, <br>
                              <br>
                              <o:p></o:p></p>
                            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <p class="MsoNormal">                         
                                instance, addr, entry->src_id,
                                entry->ring_id, entry->vmid, <br>
                                                         
                                entry->pasid); <br>
                                <br>
                                -     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, 0); <br>
                                       if (task_info) { <br>
                                              
                                dev_dbg_ratelimited(adev->dev, <br>
                                                                   " for
                                process %s pid %d thread %s pid %d\n", <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
                                b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
                                <br>
                                index
                                57f16c09abfc..c8b5c0302ca7 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
                                <br>
                                @@ -1683,6 +1683,8 @@ static int
                                sdma_v4_4_2_print_iv_entry(struct amdgpu_device *adev, <br>
                                       int instance;
                                <br>
                                       struct amdgpu_task_info
                                *task_info; <br>
                                       u64 addr; <br>
                                +     uint32_t xcc_id =
                                amdgpu_amdkfd_node_id_to_xcc_id(adev,
                                entry->node_id); <br>
                                +     uint32_t xcp_id =
                                xcc_id/adev->gfx.num_xcc_per_xcp; <br>
                                <br>
                                       instance =
                                sdma_v4_4_2_irq_id_to_seq(adev,
                                entry->client_id); <br>
                                       if (instance < 0 || instance
                                >= adev->sdma.num_instances) { <br>
                                @@ -1698,7 +1700,7 @@ static int
                                sdma_v4_4_2_print_iv_entry(struct amdgpu_device *adev, <br>
                                instance, addr, entry->src_id,
                                entry->ring_id, entry->vmid, <br>
                                                          
                                entry->pasid); <br>
                                <br>
                                -     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid); <br>
                                +     task_info =
                                amdgpu_vm_get_task_info_pasid(adev,
                                entry->pasid, xcp_id); <br>
                                       if (task_info)
                                { <br>
                                              
                                dev_dbg_ratelimited(adev->dev, " for
                                process %s pid %d thread %s pid %d\n", <br>
                                task_info->process_name,
                                task_info->tgid, <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
                                b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
                                <br>
                                index
                                ea3792249209..c098fbaf0e1c 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
                                <br>
                                @@ -1262,8 +1262,9 @@ void
                                kfd_signal_reset_event(struct kfd_node *dev) <br>
                                               if
                                (dev->dqm->detect_hang_count) { <br>
                                                       struct
                                amdgpu_task_info *ti; <br>
                                +                     uint32_t xcp_id =
                                dev->xcp ? dev->xcp->id : 0; <br>
                                <br>
                                -                     ti =
                                amdgpu_vm_get_task_info_pasid(dev->adev,
                                p->pasid); <br>
                                +                     ti =
                                amdgpu_vm_get_task_info_pasid(dev->adev,
                                p->pasid, xcp_id); <br>
                                if (ti) { <br>
                                                              
                                dev_err(dev->adev->dev, <br>
                                                                      
                                "Queues reset on process %s tid %d
                                thread %s pid %d\n", <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
                                b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
                                <br>
                                index
                                8e0d0356e810..d7cbf9525698 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
                                <br>
                                @@ -377,12 +377,8 @@ static void
                                event_interrupt_wq_v10(struct kfd_node *dev, <br>
                                               struct
                                kfd_hsa_memory_exception_data
                                exception_data; <br>
                                <br>
                                               /* gfxhub */ <br>
                                -             if (!vmid_type &&
dev->adev->gfx.funcs->ih_node_to_logical_xcc) { <br>
                                -                     hub_inst =
                                dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
                                <br>
                                -                             node_id);
                                <br>
                                -                     if (hub_inst <
                                0) <br>
                                -                             hub_inst =
                                0; <br>
                                -             } <br>
                                +             if (!vmid_type) <br>
                                +                     hub_inst =
                                amdgpu_amdkfd_node_id_to_xcc_id(dev->adev,
                                node_id); <br>
                                               /*
                                mmhub */ <br>
                                               if (vmid_type &&
                                client_id == SOC15_IH_CLIENTID_VMC) <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
                                b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
                                <br>
                                index
                                a9c3580be8c9..4708b8c811a5 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
                                <br>
                                @@ -437,12 +437,8 @@ static void
                                event_interrupt_wq_v9(struct kfd_node *dev, <br>
                                               struct
                                kfd_hsa_memory_exception_data
                                exception_data; <br>
                                <br>
                                               /* gfxhub */ <br>
                                -             if (!vmid_type &&
dev->adev->gfx.funcs->ih_node_to_logical_xcc) { <br>
                                -                     hub_inst =
                                dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
                                <br>
                                -                             node_id);
                                <br>
                                -                     if (hub_inst <
                                0) <br>
                                -                             hub_inst =
                                0; <br>
                                -             } <br>
                                +             if (!vmid_type) <br>
                                +                     hub_inst =
                                amdgpu_amdkfd_node_id_to_xcc_id(dev->adev,
                                node_id); <br>
                                               /*
                                mmhub */ <br>
                                               if (vmid_type &&
                                client_id == SOC15_IH_CLIENTID_VMC) <br>
                                diff --git
                                a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
                                b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
                                <br>
                                index
                                ea6a8e43bd5b..b5f2f5b1069c 100644 <br>
                                ---
                                a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
                                <br>
                                +++
                                b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
                                <br>
                                @@ -251,8 +251,9 @@ void
                                kfd_smi_event_update_thermal_throttling(struct
                                kfd_node *dev, <br>
                                   void
                                kfd_smi_event_update_vmfault(struct
                                kfd_node *dev, uint16_t pasid) <br>
                                   { <br>
                                       struct amdgpu_task_info
                                *task_info; <br>
                                +     uint32_t xcp_id = dev->xcp ?
                                dev->xcp->id : 0; <br>
                                <br>
                                -     task_info =
                                amdgpu_vm_get_task_info_pasid(dev->adev,
                                pasid); <br>
                                +     task_info =
                                amdgpu_vm_get_task_info_pasid(dev->adev,
                                pasid, xcp_id); <br>
                                       if (task_info) { <br>
                                               /* Report VM faults from
                                user applications, not retry from kernel
                                */ <br>
                                               if (task_info->pid) <o:p></o:p></p>
                            </blockquote>
                          </blockquote>
                        </blockquote>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
        </blockquote>
        <br>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>