<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>On 2022-04-27 05:20, Shuotao Xu wrote:<br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      
      Hi Andrey,
      <div class=""><br class="">
      </div>
      <div class="">Sorry that I did not have time to work on this for a
        few days.</div>
      <div class=""><br class="">
      </div>
      <div class="">I just tried the sysfs crash fix on Radeon VII and
        it seems to have worked. It did not pass the last hotplug test,
        but my version has 4 tests instead of the 3 in your case.</div>
    </blockquote>
    <p><br>
    </p>
    <p>That's because the 4th one is only enabled when there are 2 cards in
      the system, to test DRI_PRIME export. This time I tested with
      only one card.<br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      <div class="">
        <div class=""><br class="">
        </div>
        <div class=""><br class="">
        </div>
        <div class="">Suite: Hotunplug Tests</div>
        <div class="">  Test: Unplug card and rescan the bus to plug it
          back .../usr/local/share/libdrm/amdgpu.ids: No such file or
          directory</div>
        <div class="">passed</div>
        <div class="">  Test: Same as first test but with command
          submission .../usr/local/share/libdrm/amdgpu.ids: No such file
          or directory</div>
        <div class="">passed</div>
        <div class="">  Test: Unplug with exported bo
          .../usr/local/share/libdrm/amdgpu.ids: No such file or
          directory</div>
        <div class="">passed</div>
        <div class="">  Test: Unplug with exported fence
          .../usr/local/share/libdrm/amdgpu.ids: No such file or
          directory</div>
        <div class="">amdgpu_device_initialize: amdgpu_get_auth (1)
          failed (-1)</div>
      </div>
    </blockquote>
    <p><br>
      On the kernel side, the IOCTL returning this is drm_getclient.
      Maybe take a look at why it can't find the client? As far as I
      remember, I didn't have such an issue when testing. </p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      <div class="">
        <div class="">FAILED</div>
        <div class="">    1. ../tests/amdgpu/hotunplug_tests.c:368  -
          CU_ASSERT_EQUAL(r,0)</div>
        <div class="">    2. ../tests/amdgpu/hotunplug_tests.c:411  -
          CU_ASSERT_EQUAL(amdgpu_cs_import_syncobj(device2, shared_fd,
          &sync_obj_handle2),0)</div>
        <div class="">    3. ../tests/amdgpu/hotunplug_tests.c:423  -
          CU_ASSERT_EQUAL(amdgpu_cs_syncobj_wait(device2,
          &sync_obj_handle2, 1, 100000000, 0, NULL),0)</div>
        <div class="">    4. ../tests/amdgpu/hotunplug_tests.c:425  -
          CU_ASSERT_EQUAL(amdgpu_cs_destroy_syncobj(device2,
          sync_obj_handle2),0)</div>
        <div class=""><br class="">
        </div>
        <div class="">Run Summary:    Type  Total    Ran Passed Failed
          Inactive</div>
        <div class="">              suites     14      1    n/a      0  
               0</div>
        <div class="">               tests     71      4      3      1  
               0</div>
        <div class="">             asserts     39     39     35      4  
             n/a</div>
        <div class=""><br class="">
        </div>
        <div class="">Elapsed time =   17.321 seconds</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class="">For kfd compute, there is a problem, which I did
        not see on MI100, after I killed the hung application following hot
        plug-out. I was using the rocm5.0.2 driver for the MI100 card, and I am not
        sure if it is a regression from the newer driver.</div>
      <div class="">After pkill, one of the children of the user process would be
        stuck in zombie state (Z), understandably because of the bug, and
        any future ROCm application started after plug-back would be in uninterruptible
        sleep (D) because it would not return from the syscall into kfd.</div>
      <div class=""><br class="">
      </div>
      <div class="">However, the drm tests for amdgpu run fine
        after plug-back, even with the dangling kfd state. <br>
      </div>
    </blockquote>
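    <p>To confirm which state a stuck process is in, the state letter can
      be read from /proc/&lt;pid&gt;/stat. This is only a minimal sketch:
      parse_proc_state and read_proc_state are hypothetical helper names,
      and it assumes the usual Linux layout where the state letter
      follows the closing parenthesis of the command name.</p>

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the one-letter process state (e.g. 'Z' zombie, 'D'
 * uninterruptible sleep) from the contents of /proc/<pid>/stat.
 * The command name may itself contain spaces or parentheses, so the
 * state is taken from after the LAST ')'. Returns '\0' if malformed.
 */
char parse_proc_state(const char *stat_line)
{
    const char *p = strrchr(stat_line, ')');

    if (!p || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Read /proc/<pid>/stat and return its state letter, or '\0' on error. */
char read_proc_state(int pid)
{
    char path[64];
    char line[1024];
    char *ok;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return '\0';
    ok = fgets(line, sizeof(line), f);
    fclose(f);
    return ok ? parse_proc_state(line) : '\0';
}
```

    <p>Calling read_proc_state() on the leftover child and on a newly
      started ROCm process should show whether they really are in Z and D
      respectively.</p>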
    <p><br>
    </p>
    <p>I am not clear on when the crash below happens. Is it related to
      what you describe above?</p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      <div class=""><br class="">
      </div>
      <div class="">I don’t know if there is a quick fix for it. I was
        thinking of adding drm_dev_enter/drm_dev_exit to amdgpu_device_rreg.</div>
    </blockquote>
    <p><br>
    </p>
    <p>Try adding a drm_dev_enter/exit pair at the highest level that
      attempts to access HW; in this case it's
      amdgpu_amdkfd_set_compute_idle. We always try to avoid calling
      any HW access functions after the backing device is gone.</p>
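    <p>Roughly, the idea looks like the sketch below. This is a userspace
      model only, not amdgpu code: every model_* name is made up for
      illustration, and the real drm_dev_enter()/drm_dev_exit() pair
      additionally takes an index and is SRCU based. It only shows why
      one guard at the top-level entry point keeps the whole call chain
      away from the register read that faults in the trace.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
    bool unplugged;        /* set once the backing device is gone */
    const uint32_t *mmio;  /* register window, invalid after unplug */
    int hw_reads;          /* counts register reads that actually ran */
};

/* Models drm_dev_enter(): false means the device is already gone. */
bool model_dev_enter(const struct model_dev *dev)
{
    return !dev->unplugged;
}

/* Models drm_dev_exit(): nothing to release in this simplified model. */
void model_dev_exit(const struct model_dev *dev)
{
    (void)dev;
}

/* Low-level register read; it must never run after unplug. */
uint32_t model_rreg(struct model_dev *dev, size_t reg)
{
    dev->hw_reads++;
    return dev->mmio[reg];
}

/*
 * Top-level entry point, standing in for the role that
 * amdgpu_amdkfd_set_compute_idle plays in the trace. The guard is
 * taken once here, so nothing below it needs its own check.
 * Returns 0 if the HW was touched, -1 if the call was skipped.
 */
int model_set_compute_idle(struct model_dev *dev, bool idle)
{
    (void)idle;
    if (!model_dev_enter(dev))
        return -1;
    (void)model_rreg(dev, 0);
    model_dev_exit(dev);
    return 0;
}
```

    <p>With the guard at this level, a process that exits after the card
      was pulled skips the power profile switch instead of reaching the
      register read.</p>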
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      <div class="">Also, my attempt to fix the hotplug issue for kfd
        applications has been going on for a long time. </div>
      <div class="">I don’t know 1) if I will be able to get to MI100
        (fixing Radeon VII would mean something, but MI100 is more
        important for us); 2) in what direction the patch for this issue
        will move forward.</div>
    </blockquote>
    <p><br>
    </p>
    <p>I will go to the office tomorrow to pick up an MI-100. Time and
      priorities permitting, I will then try to test it and fix any
      bugs so that it passes all hotplug libdrm tests at the
      tip of public amd-staging-drm-next -
      <a class="moz-txt-link-freetext" href="https://gitlab.freedesktop.org/agd5f/linux">https://gitlab.freedesktop.org/agd5f/linux</a>. After that you can
      continue working on ROCm enablement on top of that. <br>
    </p>
    <p>For now I suggest you move on with Radeon 7 as your
      development ASIC and use the fix I mentioned above.<br>
    </p>
    <p>Andrey</p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:FF40C1DB-326C-45F5-9B59-14C39E17359D@microsoft.com">
      <div class=""><br class="">
      </div>
      <div class="">Regards,</div>
      <div class="">Shuotao</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div class="">[  +0.001645] BUG: unable to handle page fault for
          address: 0000000000058a68</div>
        <div class="">[  +0.001298] #PF: supervisor read access in
          kernel mode</div>
        <div class="">[  +0.001252] #PF: error_code(0x0000) -
          not-present page</div>
        <div class="">[  +0.001248] PGD 8000000115806067 P4D
          8000000115806067 PUD 109b2d067 PMD 0</div>
        <div class="">[  +0.001270] Oops: 0000 [#1] PREEMPT SMP PTI</div>
        <div class="">[  +0.001256] CPU: 5 PID: 13818 Comm:
          tf_cnn_benchmar Tainted: G        W   E     5.16.0+ #3</div>
        <div class="">[  +0.001290] Hardware name: Dell Inc. PowerEdge
          R730/0H21J3, BIOS 1.5.4 [FPGA Test BIOS] 10/002/2015</div>
        <div class="">[  +0.001309] RIP:
          0010:amdgpu_device_rreg.part.24+0xa9/0xe0 [amdgpu]</div>
        <div class="">[  +0.001562] Code: e8 8c 7d 02 00 65 ff 0d 65 e0
          7f 3f 75 ae 0f 1f 44 00 00 eb a7 83 e2 02 75 09 f6 87 10 69 01
          00 10 75 0d 4c 03 a3 a0 09 00 00 <45> 8b 24 24 eb 8a 4c
          8d b7 b0 6b 01 00 4c 89 f7 e8 a2 4c 2e ca 85</div>
        <div class="">[  +0.002751] RSP: 0018:ffffb58fac313928 EFLAGS:
          00010202</div>
        <div class="">[  +0.001388] RAX: ffffffffc09a4270 RBX:
          ffff8b0c9c840000 RCX: 00000000ffffffff</div>
        <div class="">[  +0.001402] RDX: 0000000000000000 RSI:
          000000000001629a RDI: ffff8b0c9c840000</div>
        <div class="">[  +0.001418] RBP: ffffb58fac313948 R08:
          0000000000000021 R09: 0000000000000001</div>
        <div class="">[  +0.001421] R10: ffffb58fac313b30 R11:
          ffffffff8c065b00 R12: 0000000000058a68</div>
        <div class="">[  +0.001400] R13: 000000000001629a R14:
          0000000000000000 R15: 000000000001629a</div>
        <div class="">[  +0.001397] FS:  0000000000000000(0000)
          GS:ffff8b4b7fa80000(0000) knlGS:0000000000000000</div>
        <div class="">[  +0.001411] CS:  0010 DS: 0000 ES: 0000 CR0:
          0000000080050033</div>
        <div class="">[  +0.001405] CR2: 0000000000058a68 CR3:
          000000010a2c8001 CR4: 00000000001706e0</div>
        <div class="">[  +0.001422] Call Trace:</div>
        <div class="">[  +0.001407]  <TASK></div>
        <div class="">[  +0.001391]  amdgpu_device_rreg+0x17/0x20
          [amdgpu]</div>
        <div class="">[  +0.001614]  amdgpu_cgs_read_register+0x14/0x20
          [amdgpu]</div>
        <div class="">[  +0.001735]
           phm_wait_for_register_unequal.part.1+0x58/0x90 [amdgpu]</div>
        <div class="">[  +0.001790]
           phm_wait_for_register_unequal+0x1a/0x30 [amdgpu]</div>
        <div class="">[  +0.001800]  vega20_wait_for_response+0x28/0x80
          [amdgpu]</div>
        <div class="">[  +0.001757]
           vega20_send_msg_to_smc_with_parameter+0x21/0x110 [amdgpu]</div>
        <div class="">[  +0.001838]
           smum_send_msg_to_smc_with_parameter+0xcd/0x100 [amdgpu]</div>
        <div class="">[  +0.001829]  ? kvfree+0x1e/0x30</div>
        <div class="">[  +0.001462]
           vega20_set_power_profile_mode+0x58/0x330 [amdgpu]</div>
        <div class="">[  +0.001868]  ? kvfree+0x1e/0x30</div>
        <div class="">[  +0.001462]  ? ttm_bo_release+0x261/0x370 [ttm]</div>
        <div class="">[  +0.001467]
           pp_dpm_switch_power_profile+0xc2/0x170 [amdgpu]</div>
        <div class="">[  +0.001863]
           amdgpu_dpm_switch_power_profile+0x6b/0x90 [amdgpu]</div>
        <div class="">[  +0.001866]
           amdgpu_amdkfd_set_compute_idle+0x1a/0x20 [amdgpu]</div>
        <div class="">[  +0.001784]  kfd_dec_compute_active+0x2c/0x50
          [amdgpu]</div>
        <div class="">[  +0.001744]
           process_termination_cpsch+0x2f9/0x3a0 [amdgpu]</div>
        <div class="">[  +0.001728]
           kfd_process_dequeue_from_all_devices+0x49/0x70 [amdgpu]</div>
        <div class="">[  +0.001730]
           kfd_process_notifier_release+0x91/0xe0 [amdgpu]</div>
        <div class="">[  +0.001718]  __mmu_notifier_release+0x77/0x1f0</div>
        <div class="">[  +0.001411]  exit_mmap+0x1b5/0x200</div>
        <div class="">[  +0.001396]  ? __switch_to+0x12d/0x3e0</div>
        <div class="">[  +0.001388]  ? __switch_to_asm+0x36/0x70</div>
        <div class="">[  +0.001372]  ? preempt_count_add+0x74/0xc0</div>
        <div class="">[  +0.001364]  mmput+0x57/0x110</div>
        <div class="">[  +0.001349]  do_exit+0x33d/0xc20</div>
        <div class="">[  +0.001337]  ? _raw_spin_unlock+0x1a/0x30</div>
        <div class="">[  +0.001346]  do_group_exit+0x43/0xa0</div>
        <div class="">[  +0.001341]  get_signal+0x131/0x920</div>
        <div class="">[  +0.001295]
           arch_do_signal_or_restart+0xb1/0x870</div>
        <div class="">[  +0.001303]  ? do_futex+0x125/0x190</div>
        <div class="">[  +0.001285]
           exit_to_user_mode_prepare+0xb1/0x1c0</div>
        <div class="">[  +0.001282]  syscall_exit_to_user_mode+0x2a/0x40</div>
        <div class="">[  +0.001264]  do_syscall_64+0x46/0xb0</div>
        <div class="">[  +0.001236]
           entry_SYSCALL_64_after_hwframe+0x44/0xae</div>
        <div class="">[  +0.001219] RIP: 0033:0x7f6aff1d2ad3</div>
        <div class="">[  +0.001177] Code: Unable to access opcode bytes
          at RIP 0x7f6aff1d2aa9.</div>
        <div class="">[  +0.001166] RSP: 002b:00007f6ab2029d20 EFLAGS:
          00000246 ORIG_RAX: 00000000000000ca</div>
        <div class="">[  +0.001170] RAX: fffffffffffffe00 RBX:
          0000000004f542b0 RCX: 00007f6aff1d2ad3</div>
        <div class="">[  +0.001168] RDX: 0000000000000000 RSI:
          0000000000000080 RDI: 0000000004f542d8</div>
        <div class="">[  +0.001162] RBP: 0000000004f542d4 R08:
          0000000000000000 R09: 0000000000000000</div>
        <div class="">[  +0.001152] R10: 0000000000000000 R11:
          0000000000000246 R12: 0000000004f542d8</div>
        <div class="">[  +0.001176] R13: 0000000000000000 R14:
          0000000004f54288 R15: 0000000000000000</div>
        <div class="">[  +0.001152]  </TASK></div>
        <div class="">[  +0.001113] Modules linked in: veth amdgpu(E)
          nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype
          br_netfilter xt_CHECKSUM iptable_mangle xt_MASQUERADE
          iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6
          nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp
          llc ebtable_filter ebtables ip6table_filter ip6_tables
          iptable_filter overlay esp6_offload esp6 esp4_offload esp4
          xfrm_algo intel_rapl_msr intel_rapl_common sb_edac
          x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi
          snd_hda_intel ipmi_ssif snd_intel_dspcfg coretemp
          snd_hda_codec kvm_intel snd_hda_core snd_hwdep snd_pcm
          snd_timer snd kvm soundcore irqbypass ftdi_sio usbserial
          input_leds iTCO_wdt iTCO_vendor_support joydev mei_me rapl
          lpc_ich intel_cstate mei ipmi_si ipmi_devintf ipmi_msghandler
          mac_hid acpi_power_meter sch_fq_codel ib_iser rdma_cm iw_cm
          ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi
          scsi_transport_iscsi ip_tables x_tables autofs4 btrfs
          blake2b_generic zstd_compress raid10 raid456</div>
        <div class="">[  +0.000102]  async_raid6_recov async_memcpy
          async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0
          multipath linear iommu_v2 gpu_sched drm_ttm_helper mgag200 ttm
          drm_shmem_helper drm_kms_helper syscopyarea sysfillrect
          sysimgblt fb_sys_fops crct10dif_pclmul hid_generic
          crc32_pclmul ghash_clmulni_intel usbhid uas aesni_intel
          crypto_simd igb ahci hid drm usb_storage cryptd libahci dca
          megaraid_sas i2c_algo_bit wmi [last unloaded: amdgpu]</div>
        <div class="">[  +0.016626] CR2: 0000000000058a68</div>
        <div class="">[  +0.001550] ---[ end trace ff90849fe0a8b3b4 ]---</div>
        <div class="">[  +0.024953] RIP:
          0010:amdgpu_device_rreg.part.24+0xa9/0xe0 [amdgpu]</div>
        <div class="">[  +0.001814] Code: e8 8c 7d 02 00 65 ff 0d 65 e0
          7f 3f 75 ae 0f 1f 44 00 00 eb a7 83 e2 02 75 09 f6 87 10 69 01
          00 10 75 0d 4c 03 a3 a0 09 00 00 <45> 8b 24 24 eb 8a 4c
          8d b7 b0 6b 01 00 4c 89 f7 e8 a2 4c 2e ca 85</div>
        <div class="">[  +0.003255] RSP: 0018:ffffb58fac313928 EFLAGS:
          00010202</div>
        <div class="">[  +0.001641] RAX: ffffffffc09a4270 RBX:
          ffff8b0c9c840000 RCX: 00000000ffffffff</div>
        <div class="">[  +0.001656] RDX: 0000000000000000 RSI:
          000000000001629a RDI: ffff8b0c9c840000</div>
        <div class="">[  +0.001681] RBP: ffffb58fac313948 R08:
          0000000000000021 R09: 0000000000000001</div>
        <div class="">[  +0.001662] R10: ffffb58fac313b30 R11:
          ffffffff8c065b00 R12: 0000000000058a68</div>
        <div class="">[  +0.001650] R13: 000000000001629a R14:
          0000000000000000 R15: 000000000001629a</div>
        <div class="">[  +0.001648] FS:  0000000000000000(0000)
          GS:ffff8b4b7fa80000(0000) knlGS:0000000000000000</div>
        <div class="">[  +0.001668] CS:  0010 DS: 0000 ES: 0000 CR0:
          0000000080050033</div>
        <div class="">[  +0.001673] CR2: 0000000000058a68 CR3:
          000000010a2c8001 CR4: 00000000001706e0</div>
        <div class="">[  +0.001740] Fixing recursive fault but reboot is
          needed!</div>
        <div class=""><br class="">
        </div>
        <div class=""><br class="">
        </div>
        <div>
          <blockquote type="cite" class="">
            <div class="">On Apr 21, 2022, at 2:41 AM, Andrey Grodzovsky
              <<a href="mailto:andrey.grodzovsky@amd.com" class="moz-txt-link-freetext" moz-do-not-send="true">andrey.grodzovsky@amd.com</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div class="">
                <p class="">I retested the hotplug tests at the commit I
                  mentioned below and they look OK. My ASIC is Navi 10; I
                  also tested using Vega 10 and older Polaris ASICs
                  (whatever I had at home at the time). It's possible
                  there are extra issues in ASICs like yours which I didn't
                  cover during tests. <br class="">
                </p>
                <p class="">andrey@andrey-test:~/drm$ sudo
                  ./build/tests/amdgpu/amdgpu_test -s 13<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  <br class="">
                  <br class="">
                  The ASIC NOT support UVD, suite disabled<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  <br class="">
                  <br class="">
                  The ASIC NOT support VCE, suite disabled<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  <br class="">
                  <br class="">
                  The ASIC NOT support UVD ENC, suite disabled.<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  <br class="">
                  <br class="">
                  Don't support TMZ (trust memory zone), security suite
                  disabled<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  /usr/local/share/libdrm/amdgpu.ids: No such file or
                  directory<br class="">
                  Peer device is not opened or has ASIC not supported by
                  the suite, skip all Peer to Peer tests.<br class="">
                  <br class="">
                  <br class="">
                       CUnit - A unit testing framework for C - Version
                  2.1-3<br class="">
                       <a class="moz-txt-link-freetext" href="http://cunit.sourceforge.net/" moz-do-not-send="true">
                    http://cunit.sourceforge.net/</a><br class="">
                  <br class="">
                  <br class="">
                  <b class="">Suite: Hotunplug Tests</b><b class=""><br class="">
                  </b><b class="">  Test: Unplug card and rescan the bus
                    to plug it back
                    .../usr/local/share/libdrm/amdgpu.ids: No such file
                    or directory</b><b class=""><br class="">
                  </b><b class="">passed</b><b class=""><br class="">
                  </b><b class="">  Test: Same as first test but with
                    command submission
                    .../usr/local/share/libdrm/amdgpu.ids: No such file
                    or directory</b><b class=""><br class="">
                  </b><b class="">passed</b><b class=""><br class="">
                  </b><b class="">  Test: Unplug with exported bo
                    .../usr/local/share/libdrm/amdgpu.ids: No such file
                    or directory</b><b class=""><br class="">
                  </b><b class="">passed</b><br class="">
                  <br class="">
                  Run Summary:    Type  Total    Ran Passed Failed
                  Inactive<br class="">
                                suites     14      1    n/a     
                  0        0<br class="">
                                 tests     71      3      3     
                  0        1<br class="">
                               asserts     21     21     21      0     
                  n/a<br class="">
                  <br class="">
                  Elapsed time =    9.195 seconds<br class="">
                </p>
                <p class=""><br class="">
                </p>
                <p class="">Andrey<br class="">
                </p>
                <div class="moz-cite-prefix">On 2022-04-20 11:44, Andrey
                  Grodzovsky wrote:<br class="">
                </div>
                <blockquote type="cite" cite="mid:34789d77-b8ee-1e4f-c5c2-f32f42f923dc@amd.com" class="">
                  <p class="">The only one I see in Radeon 7 is the same
                    sysfs crash we already fixed, so you can use the same
                    fix. I haven't seen the MI200 issue before, but I also
                    haven't tested MI200. I need to
                    test when I get the time.
                    <br class="">
                  </p>
                  <p class="">So try that fix with Radeon 7 again to see
                    if you pass the tests (the warnings should all be
                    minor issues).</p>
                  <p class="">Andrey</p>
                  <p class=""><br class="">
                  </p>
                  <div class="moz-cite-prefix">On 2022-04-20 05:24,
                    Shuotao Xu wrote:<br class="">
                  </div>
                  <blockquote type="cite" cite="mid:549246A3-B326-47CC-92FD-608708E1876B@microsoft.com" class="">
                    <div class="">
                      <blockquote type="cite" class="">
                        <div class="">
                          <div class="">
                            <p class="">That's a problem. The latest working
                              baseline I tested and confirmed passing the
                              hotplug tests is this branch and commit
                              <a class="moz-txt-link-freetext" href="https://gitlab.freedesktop.org/agd5f/linux/-/commit/86e12a53b73135806e101142e72f3f1c0e6fa8e6" moz-do-not-send="true">
https://gitlab.freedesktop.org/agd5f/linux/-/commit/86e12a53b73135806e101142e72f3f1c0e6fa8e6</a>
                              which is amd-staging-drm-next. 5.14 was
                              the branch where we upstreamed the hotplug code,
                              but it has had a lot of regressions over time
                              due to new changes (that's why I added the
                              hotplug test, to try and catch them early).
                              It would be best to run this branch on
                              MI-100 so we have a clean baseline, and
                              only after confirming that this particular
                              branch at this commit passes the libdrm
                              tests start adding the KFD
                              specific addons. Another option, if you
                              can't work with MI-100 and this branch, is
                              to try a different ASIC that does work
                              with this branch (if possible).</p>
                            <p class="">Andrey<br class="">
                            </p>
                          </div>
                        </div>
                      </blockquote>
                      OK, I tried both this commit and the HEAD of
                      amd-staging-drm-next on two GPUs (MI100 and Radeon
                      VII); both did not pass the hot plug-out libdrm test. I
                      might be able to gain access to MI200, but I
                      suspect it would work. </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">I copied the complete dmesgs as
                      follows. I highlighted the OOPSES for you.</div>
                    <div class=""><br class="">
                    </div>
                    <div class=""><span style="background-color:
                        rgb(255, 38, 0);" class="">Radeon VII:</span></div>
                  </blockquote>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
  </body>
</html>