[PATCH i-g-t] tests/intel/xe_vm: Fix Sync Issue between Unbind and Hammer Thread
Matthew Brost
matthew.brost at intel.com
Fri Apr 5 21:56:37 UTC 2024
On Fri, Apr 05, 2024 at 02:06:08PM -0700, Jagmeet Randhawa wrote:
> This patch addresses a critical synchronization issue
> between the "test_munmap_style_unbind" function and
> the "hammer_thread" function. Previously, "test_munmap_style_unbind"
> would proceed with its execution after launching
> "hammer_thread". However, the "hammer_thread" in its
> initial iteration encountered an error during the syncobj_wait()
> call halting its execution prematurely. So we never returned
> back to the "hammer_thread" from "test_munmap_style_unbind".
>
> We resolved this error by adding a syncobj_signal() call in our
> "hammer_thread" function, allowing "hammer_thread" to send the
> signal to "test_munmap_style_unbind" therefore ensuring the
> seamless operation of both threads and correct synchronization.
>
This explanation does make sense, see below.
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Stuart Summers <stuart.summers at intel.com>
> Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa at intel.com>
> ---
> VLK-54352 and VLK-55620
>
> tests/intel/xe_vm.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> index ecb2a783c..a25878cd8 100644
> --- a/tests/intel/xe_vm.c
> +++ b/tests/intel/xe_vm.c
> @@ -1153,6 +1153,7 @@ static void *hammer_thread(void *tdata)
> } else {
> exec.num_syncs = 1;
> err = __xe_exec(t->fd, &exec);
> + syncobj_signal(t->fd, &sync[0].handle, 1);
This doesn't look right.
This thread is doing execs as fast as possible, waiting on every 32nd
exec. The main thread (test_munmap_style_unbind) is modifying the VM's
bindings in a way that creates scheduling dependencies between the
threads. The KMD is designed to enforce these scheduling dependencies
while both threads run fully async. If syncobj_wait hangs, there is
likely a KMD or hardware issue here.
This code signals the syncobj from software on every 32nd exec, bypassing
the hardware / KMD signaling of the sync. This breaks the design of the
test and masks a likely KMD / hardware issue.
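To illustrate the concern, here is a toy model in plain C. It is only a sketch and does not use the IGT or Xe API: fake_fence, device_complete(), fence_wait() and run_iteration() are hypothetical names invented for this example. The fence stands in for the syncobj, and device_complete() stands in for the KMD / hardware signaling it when the exec finishes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a syncobj: just a "signaled" flag. */
struct fake_fence {
	bool signaled;
};

/*
 * Hypothetical stand-in for the KMD / hardware completing an exec.
 * A healthy device signals the fence; a broken one never does.
 */
static void device_complete(struct fake_fence *f, bool device_healthy)
{
	if (device_healthy)
		f->signaled = true;
}

/*
 * Hypothetical stand-in for a wait on the fence. Returns true if the
 * fence was signaled, false if the wait would have timed out or hung.
 */
static bool fence_wait(const struct fake_fence *f)
{
	return f->signaled;
}

/*
 * One "every 32nd exec" iteration: submit, optionally signal the fence
 * from the CPU (what the patch adds), then wait on it.
 */
static bool run_iteration(bool device_healthy, bool cpu_signal_before_wait)
{
	struct fake_fence f = { .signaled = false };

	device_complete(&f, device_healthy);
	if (cpu_signal_before_wait)
		f.signaled = true; /* software signal, independent of the device */

	return fence_wait(&f);
}
```

In this model, run_iteration(false, false) returns false, so a broken device is detected by the wait. run_iteration(false, true) returns true, so the wait passes even though the device never signaled, and the wait no longer checks anything.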
Do the VLK failures occur on every engine instance / class?
Matt
> igt_assert(syncobj_wait(t->fd, &sync[0].handle, 1,
> INT64_MAX, 0, NULL));
> syncobj_reset(t->fd, &sync[0].handle, 1);
> --
> 2.25.1
>