[PATCH i-g-t] tests/intel/xe_vm: Fix Sync Issue between Unbind and Hammer Thread

Matthew Brost matthew.brost at intel.com
Fri Apr 5 21:56:37 UTC 2024


On Fri, Apr 05, 2024 at 02:06:08PM -0700, Jagmeet Randhawa wrote:
> This patch addresses a critical synchronization issue
> between the "test_munmap_style_unbind" function and
> the "hammer_thread" function. Previously, "test_munmap_style_unbind"
> would proceed with its execution after launching
> "hammer_thread". However, "hammer_thread" encountered an error
> in its initial iteration during the syncobj_wait() call,
> halting its execution prematurely. So we never returned
> to "hammer_thread" from "test_munmap_style_unbind".
> 
> We resolved this error by adding a syncobj_signal() call to the
> "hammer_thread" function, allowing "hammer_thread" to send the
> signal to "test_munmap_style_unbind" and thereby ensuring
> seamless operation of both threads and correct synchronization.
>

This explanation does make sense; see below.
 
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Stuart Summers <stuart.summers at intel.com>
> Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa at intel.com>
> ---
> VLK-54352 and VLK-55620
> 
>  tests/intel/xe_vm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> index ecb2a783c..a25878cd8 100644
> --- a/tests/intel/xe_vm.c
> +++ b/tests/intel/xe_vm.c
> @@ -1153,6 +1153,7 @@ static void *hammer_thread(void *tdata)
>  		} else {
>  			exec.num_syncs = 1;
>  			err = __xe_exec(t->fd, &exec);
> +			syncobj_signal(t->fd, &sync[0].handle, 1);

This doesn't look right.

This thread is doing execs as fast as possible, waiting on every 32nd
exec. The main thread (test_munmap_style_unbind) is modifying the VM's
bindings in a way that creates scheduling dependencies between the
threads. The KMD is designed to enforce these scheduling dependencies
while both threads run fully async. If syncobj_wait hangs, there is
likely a KMD or hardware issue here.
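A minimal, self-contained model of the exec / wait pattern described above may help show who is responsible for signaling the fence. It is a sketch only: fake_syncobj, fake_kmd and hammer_model are hypothetical pthreads stand-ins, not the IGT syncobj helpers or the Xe KMD, and the unbind thread is deliberately left out. The "KMD" thread retires execs in order and is the only party that signals the fence attached to every 32nd exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DRM syncobj; not the IGT/libdrm API. */
struct fake_syncobj {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

static void fake_signal(struct fake_syncobj *s)
{
	pthread_mutex_lock(&s->lock);
	s->signaled = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

/* Block until signaled, then reset (like syncobj_wait() + syncobj_reset()). */
static void fake_wait_and_reset(struct fake_syncobj *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->signaled)
		pthread_cond_wait(&s->cond, &s->lock);
	s->signaled = false;
	pthread_mutex_unlock(&s->lock);
}

/* Hypothetical in-order exec queue retired by a "KMD" thread. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int submitted, completed;
	int fence_job;          /* exec number carrying the fence, 0 = none */
	bool stop;
	struct fake_syncobj sync;
};

static void *fake_kmd(void *arg)
{
	struct fake_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->completed == q->submitted && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (q->completed == q->submitted)
			break;                  /* stop requested and queue drained */
		q->completed++;                 /* "execute" the oldest exec */
		if (q->completed == q->fence_job) {
			q->fence_job = 0;
			fake_signal(&q->sync);  /* completion side signals */
		}
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Submit execs back to back; attach a fence to, and wait on, every 32nd. */
static int hammer_model(int n_execs, int *completed)
{
	struct fake_queue q = { .submitted = 0 };
	pthread_t kmd;
	int waits = 0, rc;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	pthread_mutex_init(&q.sync.lock, NULL);
	pthread_cond_init(&q.sync.cond, NULL);
	rc = pthread_create(&kmd, NULL, fake_kmd, &q);
	assert(rc == 0);

	for (int i = 0; i < n_execs; i++) {
		bool fenced = (i % 32) == 0;

		pthread_mutex_lock(&q.lock);
		q.submitted++;
		if (fenced)
			q.fence_job = q.submitted;    /* like num_syncs = 1 */
		pthread_cond_signal(&q.cond);
		pthread_mutex_unlock(&q.lock);

		if (fenced) {
			fake_wait_and_reset(&q.sync); /* only the "KMD" wakes us */
			waits++;
		}
	}

	pthread_mutex_lock(&q.lock);
	q.stop = true;
	pthread_cond_signal(&q.cond);
	pthread_mutex_unlock(&q.lock);
	pthread_join(kmd, NULL);

	*completed = q.completed;
	return waits;
}
```

For example, hammer_model(100, &completed) returns 4 (execs 0, 32, 64 and 96 carry the fence) and leaves completed at 100. If the "KMD" side never retired a fenced exec, fake_wait_and_reset() would block forever, which is the hang being reported.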

This code signals the syncobj in software on every 32nd exec, bypassing
the hardware / KMD signaling of the sync. This breaks the design of the
test and masks a likely KMD / hardware issue.
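A minimal sketch of why a software signal hides the problem, assuming a hypothetical mutex/condvar stand-in (fake_syncobj, fake_wait and fenced_exec_on_hung_kmd are illustrative names, not the IGT syncobj helpers). The exec is never retired, as with a KMD / hardware hang. With the software signal the wait succeeds anyway; without it, a bounded wait times out and exposes the stall. The 100 ms timeout is arbitrary; the real test waits with INT64_MAX, which is why it hangs instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a DRM syncobj; not the IGT/libdrm API. */
struct fake_syncobj {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

static void fake_signal(struct fake_syncobj *s)
{
	pthread_mutex_lock(&s->lock);
	s->signaled = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

/* Returns true if signaled within timeout_ms, false on timeout. */
static bool fake_wait(struct fake_syncobj *s, long timeout_ms)
{
	struct timespec deadline;
	bool signaled;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->signaled && rc == 0)            /* rc == 0: keep waiting */
		rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
	assert(rc == 0 || rc == ETIMEDOUT);
	signaled = s->signaled;
	pthread_mutex_unlock(&s->lock);
	return signaled;
}

/* Models one fenced exec that the "KMD" never retires (a hang). */
static bool fenced_exec_on_hung_kmd(bool sw_signal)
{
	struct fake_syncobj s = { .signaled = false };
	bool ok;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.cond, NULL);

	/* "Submit" the exec: no completion path will ever signal s. */
	if (sw_signal)
		fake_signal(&s);  /* what the patch's syncobj_signal() amounts to */

	ok = fake_wait(&s, 100);
	pthread_cond_destroy(&s.cond);
	pthread_mutex_destroy(&s.lock);
	return ok;
}
```

Here fenced_exec_on_hung_kmd(true) returns true even though nothing ever retired the exec, while fenced_exec_on_hung_kmd(false) returns false after the timeout, surfacing the stall the software signal would have hidden.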

Do the VLK failures occur on every engine instance / class?

Matt

>  			igt_assert(syncobj_wait(t->fd, &sync[0].handle, 1,
>  						INT64_MAX, 0, NULL));
>  			syncobj_reset(t->fd, &sync[0].handle, 1);
> -- 
> 2.25.1
> 


More information about the igt-dev mailing list