[PATCH i-g-t] tests/intel/xe_vm: Fix Sync Issue between Unbind and Hammer Thread
Matthew Brost
matthew.brost at intel.com
Fri Apr 5 21:58:15 UTC 2024
On Fri, Apr 05, 2024 at 09:56:37PM +0000, Matthew Brost wrote:
> On Fri, Apr 05, 2024 at 02:06:08PM -0700, Jagmeet Randhawa wrote:
> > This patch addresses a critical synchronization issue
> > between the "test_munmap_style_unbind" function and
> > the "hammer_thread" function. Previously, "test_munmap_style_unbind"
> > would proceed with its execution after launching
> > "hammer_thread". However, the "hammer_thread" in its
> > initial iteration encountered an error during the syncobj_wait()
> > call halting its execution prematurely. So we never returned
> > back to the "hammer_thread" from "test_munmap_style_unbind".
> >
> > We resolved this error by adding a syncobj_signal() call in our
> > "hammer_thread" function, allowing "hammer_thread" to send the
> > signal to "test_munmap_style_unbind" therefore ensuring the
> > seamless operation of both threads and correct synchronization.
> >
>
> This explanation does make sense, see below.
Typo, does *not* make sense.
Matt
>
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > Cc: Stuart Summers <stuart.summers at intel.com>
> > Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa at intel.com>
> > ---
> > VLK-54352 and VLK-55620
> >
> > tests/intel/xe_vm.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> > index ecb2a783c..a25878cd8 100644
> > --- a/tests/intel/xe_vm.c
> > +++ b/tests/intel/xe_vm.c
> > @@ -1153,6 +1153,7 @@ static void *hammer_thread(void *tdata)
> > } else {
> > exec.num_syncs = 1;
> > err = __xe_exec(t->fd, &exec);
> > + syncobj_signal(t->fd, &sync[0].handle, 1);
>
> This doesn't look right.
>
> This thread is doing execs as fast as possible, waiting on every 32nd
> exec. The main thread (test_munmap_style_unbind) is modifying the VMs
> bindings in a way that creates scheduling dependencies between the
> threads. The KMD is designed to enforce these scheduling dependencies
> while both threads run fully async. If syncobj_wait hangs, there is
> likely a KMD or hardware issue here.
>
> This code signals the syncobj from every 32nd exec in software, bypassing
> the hardware / KMD signaling the sync. This breaks the design of the
> test and masks a likely KMD / hardware issue.
>
> Do the VLK failures occur on every engine instance / class?
>
> Matt
>
> > igt_assert(syncobj_wait(t->fd, &sync[0].handle, 1,
> > INT64_MAX, 0, NULL));
> > syncobj_reset(t->fd, &sync[0].handle, 1);
> > --
> > 2.25.1
> >
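To illustrate the concern with signaling from the CPU, here is a toy model.
It is not IGT or Xe code: every type and function name in it (toy_syncobj,
toy_exec, toy_iteration, and so on) is hypothetical, and the real syncobj is
reduced to a single flag. The sketch only shows that once the CPU signals the
sync before the wait, the wait returns whether or not the exec ever completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a DRM syncobj: only tracks whether it is signaled. */
struct toy_syncobj {
	bool signaled;
};

/* Toy stand-in for one exec; 'hung' models a KMD or hardware fault. */
struct toy_exec {
	bool hung;
};

/* Models the KMD signaling the out-sync once the exec completes. */
static void toy_exec_retire(const struct toy_exec *e, struct toy_syncobj *s)
{
	if (!e->hung)
		s->signaled = true;
}

/* Models a CPU-side signal, like the call added by the patch. */
static void toy_cpu_signal(struct toy_syncobj *s)
{
	s->signaled = true;
}

/* Models the wait: true if it would return, false if it would hang. */
static bool toy_wait_returns(const struct toy_syncobj *s)
{
	return s->signaled;
}

/* One "exec, optionally signal from the CPU, then wait" iteration. */
static bool toy_iteration(bool hung, bool cpu_signal)
{
	struct toy_syncobj sync = { .signaled = false };
	struct toy_exec exec = { .hung = hung };

	toy_exec_retire(&exec, &sync);
	if (cpu_signal)
		toy_cpu_signal(&sync);

	return toy_wait_returns(&sync);
}

/* Walks through the three cases that matter for the review comment. */
static void toy_demo(void)
{
	assert(toy_iteration(false, false));  /* healthy exec: wait returns */
	assert(!toy_iteration(true, false));  /* hung exec: wait exposes it */
	assert(toy_iteration(true, true));    /* hung exec, CPU signal: hidden */
}
```

In this model the only case where the wait fails is a hung exec with no CPU
signal, which is the case the test is meant to catch.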