<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Thanks for catching this! You're correct; I've reverted my
      request to remove <code>sh_mem != NULL</code> since the <code>igt_fixture</code>
      isn't executed when the <code>--list-subtests</code> parameter is
      passed. I'll merge the changes tomorrow. Thanks again!</p>
    <p><br>
    </p>
    <p>Vitaly<br>
    </p>
    <div class="moz-cite-prefix">On 2024-08-27 22:00, Zhang, Jesse(Jie)
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:DM4PR12MB5152F4675A22C9079370877AE3952@DM4PR12MB5152.namprd12.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">[AMD Official Use Only - AMD Internal Distribution Only]

Hi Vitaly,

-----Original Message-----
From: Prosyak, Vitaly <a class="moz-txt-link-rfc2396E" href="mailto:Vitaly.Prosyak@amd.com"><Vitaly.Prosyak@amd.com></a>
Sent: Wednesday, August 28, 2024 9:51 AM
To: Zhang, Jesse(Jie) <a class="moz-txt-link-rfc2396E" href="mailto:Jesse.Zhang@amd.com"><Jesse.Zhang@amd.com></a>; Kamil Konieczny <a class="moz-txt-link-rfc2396E" href="mailto:kamil.konieczny@linux.intel.com"><kamil.konieczny@linux.intel.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:igt-dev@lists.freedesktop.org">igt-dev@lists.freedesktop.org</a>
Cc: Prosyak, Vitaly <a class="moz-txt-link-rfc2396E" href="mailto:Vitaly.Prosyak@amd.com"><Vitaly.Prosyak@amd.com></a>; Deucher, Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com"><Alexander.Deucher@amd.com></a>; Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: Re: [PATCH i-g-t] test/amdgpu: fix unknow test issue for amdgpu queue test

Hi Jesse,

The changes look good.

Could you please remove the condition check for sh_mem? This check is redundant because we already have igt_require(sh_mem != NULL); in the igt_fixture.


when we run sudo ./tests/amdgpu/amd_queue_reset --list-subtests, the sh_mem is NULL, and it should not call set_next_test_to_skip.

if remove the check for sh_mem, it will have segmentation fault, like this:

jenkins@image-update:~/workspace/tools/igt-gpu-tools/6code/igt-gpu-tools/build$ sudo ./tests/amdgpu/amd_queue_reset --list-subtests
amdgpu-COMPUTE-CMD_STREAM_EXEC_INVALID_PACKET_LENGTH
amdgpu-COMPUTE-CMD_STREAM_EXEC_INVALID_OPCODE
amdgpu-COMPUTE-BACKEND_SE_GC_SHADER_INVALID_PROGRAM_ADDR
amdgpu-COMPUTE-BACKEND_SE_GC_SHADER_INVALID_USER_DATA
amdgpu-COMPUTE-BACKEND_SE_GC_SHADER_INVALID_SHADER
amdgpu-GFX-CMD_STREAM_EXEC_INVALID_PACKET_LENGTH
amdgpu-GFX-CMD_STREAM_EXEC_INVALID_OPCODE
amdgpu-GFX-BACKEND_SE_GC_SHADER_INVALID_PROGRAM_ADDR
amdgpu-GFX-BACKEND_SE_GC_SHADER_INVALID_USER_DATA
amdgpu-GFX-BACKEND_SE_GC_SHADER_INVALID_SHADER
Received signal SIGSEGV.
Stack trace:
 #0 [fatal_sig_handler+0x17b]
 #1 [__sigaction+0x50]
 #2 [__igt_unique____real_main1025+0x27e]
 #3 [main+0x2d]
 #4 [__libc_init_first+0x90]
 #5 [__libc_start_main+0x80]
 #6 [_start+0x25]
Segmentation fault

Thanks
Jesse

With that adjustment, the patch is:

Reviewed-by: Vitaly Prosyak <a class="moz-txt-link-rfc2396E" href="mailto:vitaly.prosyak@amd.com"><vitaly.prosyak@amd.com></a>

Thanks


Vitaly




On 2024-08-27 03:54, Zhang, Jesse(Jie) wrote:
</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">[AMD Official Use Only - AMD Internal Distribution Only]

Hi Kamil

-----Original Message-----
From: Kamil Konieczny <a class="moz-txt-link-rfc2396E" href="mailto:kamil.konieczny@linux.intel.com"><kamil.konieczny@linux.intel.com></a>
Sent: Tuesday, August 27, 2024 3:24 PM
To: <a class="moz-txt-link-abbreviated" href="mailto:igt-dev@lists.freedesktop.org">igt-dev@lists.freedesktop.org</a>
Cc: Zhang, Jesse(Jie) <a class="moz-txt-link-rfc2396E" href="mailto:Jesse.Zhang@amd.com"><Jesse.Zhang@amd.com></a>; Prosyak, Vitaly
<a class="moz-txt-link-rfc2396E" href="mailto:Vitaly.Prosyak@amd.com"><Vitaly.Prosyak@amd.com></a>; Deucher, Alexander
<a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com"><Alexander.Deucher@amd.com></a>; Koenig, Christian
<a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: Re: [PATCH i-g-t] test/amdgpu: fix unknow test issue for
amdgpu queue test

Hi Jesse.zhang,
On 2024-08-27 at 13:19:32 +0800, <a class="moz-txt-link-abbreviated" href="mailto:Jesse.zhang@amd.com">Jesse.zhang@amd.com</a> wrote:
</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">Queue reset does not exit properly when executing unknown subtests.
Because other processes are still functioning.

It should exit the other three processes (test, background, and
monitor) for this case.

Cc: Vitaly Prosyak <a class="moz-txt-link-rfc2396E" href="mailto:vitaly.prosyak@amd.com"><vitaly.prosyak@amd.com></a>
Cc: Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a>
Cc: Christian Koenig <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>

Signed-off-by: Jesse Zhang <a class="moz-txt-link-rfc2396E" href="mailto:jesse.zhang@amd.com"><jesse.zhang@amd.com></a>
---
 tests/amdgpu/amd_queue_reset.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/amdgpu/amd_queue_reset.c
b/tests/amdgpu/amd_queue_reset.c index 60208e085..85408e3ff 100644
--- a/tests/amdgpu/amd_queue_reset.c
+++ b/tests/amdgpu/amd_queue_reset.c
@@ -70,6 +70,7 @@ struct shmbuf {
      int count;
      bool sub_test_completed;
      bool sub_test_is_skipped;
+     bool sub_test_is_existed;
      unsigned int test_flags;
      int test_error_code;
      bool reset_completed;
@@ -148,6 +149,7 @@ skip_sub_test(struct shmbuf *sh_mem)  {
      sem_wait(&sh_mem->sem_state_mutex);
      sh_mem->sub_test_is_skipped = true;
+     sh_mem->sub_test_is_existed = true;
      sem_post(&sh_mem->sem_state_mutex);
 }
</pre>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">Do you re-implement igt infra?

Hi Kamil

No, in the queue reset test, we start three processes (test process,
background process, and monitoring process) when running any test (including unknown tests, such as such as:  sudo amd_queue_reset --run-subtest amdgpu_testxxx).

The known process can exit with the other three processes.

The unknown process can exit, but the other processes will not exit.

This patch fixes the issue of other processes exiting in the unknown case.

Regards
Jesse

Regards,
Kamil

</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">@@ -327,6 +329,7 @@ static void set_next_test_to_run(struct shmbuf *sh_mem, unsigned int error,
      sh_mem->good_job.ip = ip_good;
      sh_mem->good_job.ring_id = ring_id_good;
      sh_mem->sub_test_is_skipped = false;
+     sh_mem->sub_test_is_existed = true;
      sem_post(&sh_mem->sem_state_mutex);

      //sync and wait for complete
@@ -405,6 +408,7 @@ shared_mem_create(struct shmbuf **ppbuf)
      shmp->sub_test_completed = false;
      shmp->reset_completed = false;
      shmp->sub_test_is_skipped = false;
+     shmp->sub_test_is_existed = false;

      *ppbuf = shmp;
      return shm_fd;
@@ -1128,7 +1132,6 @@ igt_main
                      create_contexts(device, &arr_context_handle, const_num_of_tests);
              else if (process == PROCESS_BACKGROUND)
                      fd_shm = shared_mem_open(&sh_mem);
-
              igt_require(fd_shm != -1);
              igt_require(sh_mem != NULL);

@@ -1136,7 +1139,6 @@ igt_main
                      process, sh_mem, const_num_of_tests, info[0].hw_ip_version_major,
                      &monitor_child, &test_child);
      }
-
      for (int i = 0; i < ARRAY_SIZE(ip_tests); i++) {
              reset_rings_numbers(&ring_id_good, &ring_id_bad, &ring_id_job_good, &ring_id_job_bad);
              for (struct dynamic_test *it = &arr_err[0]; it->name;
it++) { @@
-1154,6 +1156,10 @@ igt_main
                      }
              }
      }
+
</pre>
        </blockquote>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">Please, remove

sh_mem

</pre>
      <blockquote type="cite">
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">+     if (sh_mem &&( !sh_mem->sub_test_is_existed))
+             set_next_test_to_skip(sh_mem);
+
      igt_fixture {
              if (process == PROCESS_TEST) {
                      waitpid(monitor_child, &monitorExitMethod, 0);
--
2.25.1

</pre>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>