[PATCH i-g-t] tests/intel/xe_vm: Fix Sync Issue between Unbind and Hammer Thread

Matthew Brost matthew.brost at intel.com
Fri Apr 5 21:58:15 UTC 2024


On Fri, Apr 05, 2024 at 09:56:37PM +0000, Matthew Brost wrote:
> On Fri, Apr 05, 2024 at 02:06:08PM -0700, Jagmeet Randhawa wrote:
> > This patch addresses a critical synchronization issue
> > between the "test_munmap_style_unbind" function and
> > the "hammer_thread" function. Previously, "test_munmap_style_unbind"
> > would proceed with its execution after launching
> > "hammer_thread". However, the "hammer_thread" in its
> > initial iteration encountered an error during the syncobj_wait()
> > call, halting its execution prematurely. So we never returned
> > back to the "hammer_thread" from "test_munmap_style_unbind".
> > 
> > We resolved this error by adding a syncobj_signal() call in our
> > "hammer_thread" function, allowing "hammer_thread" to send the
> > signal to "test_munmap_style_unbind", therefore ensuring the
> > seamless operation of both threads and correct synchronization.
> >
> 
> This explanation does make sense, see below.

Typo, does *not* make sense.

Matt

>  
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > Cc: Stuart Summers <stuart.summers at intel.com>
> > Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa at intel.com>
> > ---
> > VLK-54352 and VLK-55620
> > 
> >  tests/intel/xe_vm.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> > index ecb2a783c..a25878cd8 100644
> > --- a/tests/intel/xe_vm.c
> > +++ b/tests/intel/xe_vm.c
> > @@ -1153,6 +1153,7 @@ static void *hammer_thread(void *tdata)
> >  		} else {
> >  			exec.num_syncs = 1;
> >  			err = __xe_exec(t->fd, &exec);
> > +			syncobj_signal(t->fd, &sync[0].handle, 1);
> 
> This doesn't look right.
> 
> This thread is doing execs as fast as possible, waiting on every 32nd
> exec. The main thread (test_munmap_style_unbind) is modifying the VM's
> bindings in a way that creates scheduling dependencies between the
> threads. The KMD is designed to enforce these scheduling dependencies
> while both threads run fully async. If syncobj_wait hangs, there is
> likely a KMD or hardware issue here.
> 
> This code signals the syncobj from every 32nd exec in software, bypassing
> the hardware / KMD signaling of the sync. This breaks the design of the
> test and masks a likely KMD / hardware issue.
> 
> Do the VLK failures occur on every engine instance / class?
> 
> Matt
> 
> >  			igt_assert(syncobj_wait(t->fd, &sync[0].handle, 1,
> >  						INT64_MAX, 0, NULL));
> >  			syncobj_reset(t->fd, &sync[0].handle, 1);
> > -- 
> > 2.25.1
> > 
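
For illustration only (not taken from the patch or from IGT): a minimal,
kernel-free sketch in C of the concern raised in the review, namely that
signaling the syncobj in software lets the wait succeed even when the job
behind it never completes. Every name below (toy_syncobj, toy_job,
wait_masks_stuck_job, wait_reflects_completion) is hypothetical, and the
flag-based "syncobj" is an assumption standing in for the real DRM object
and for a syncobj_wait() with zero timeout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a DRM syncobj: a plain flag, no kernel involved. */
struct toy_syncobj {
	bool signaled;
};

/* Hypothetical stand-in for a submitted exec whose out-fence is the syncobj. */
struct toy_job {
	struct toy_syncobj *out_sync;
	bool completed;
};

static void toy_syncobj_signal(struct toy_syncobj *s) { s->signaled = true; }
static void toy_syncobj_reset(struct toy_syncobj *s) { s->signaled = false; }

/* Non-blocking check, loosely modeling a wait with a zero timeout. */
static bool toy_syncobj_poll(const struct toy_syncobj *s) { return s->signaled; }

/* Driver/hardware side: finishing the job is what should signal the fence. */
static void toy_job_complete(struct toy_job *j)
{
	j->completed = true;
	toy_syncobj_signal(j->out_sync);
}

/*
 * The job is stuck: toy_job_complete() is never called.
 * Returns true when the wait succeeds although the job never completed,
 * i.e. when the failure has been hidden from the test.
 */
static bool wait_masks_stuck_job(bool signal_in_software)
{
	struct toy_syncobj sync = { .signaled = false };
	struct toy_job job = { .out_sync = &sync, .completed = false };

	if (signal_in_software)
		toy_syncobj_signal(&sync);	/* what the proposed change does */

	return toy_syncobj_poll(&sync) && !job.completed;
}

/* Healthy path: the fence is signaled only because the job finished. */
static bool wait_reflects_completion(void)
{
	struct toy_syncobj sync = { .signaled = false };
	struct toy_job job = { .out_sync = &sync, .completed = false };

	toy_job_complete(&job);
	bool ok = toy_syncobj_poll(&sync) && job.completed;

	toy_syncobj_reset(&sync);	/* re-arm for the next batch of execs */
	return ok && !toy_syncobj_poll(&sync);
}
```

Under these assumptions, wait_masks_stuck_job(true) reports a successful
wait for a job that never ran, while wait_masks_stuck_job(false) does not.
The latter corresponds to the case where the real test would block in
syncobj_wait() and thereby expose the underlying problem.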

