[PATCH i-g-t] tests/intel/xe_vm: Fix Sync Issue between Unbind and Hammer Thread

Randhawa, Jagmeet jagmeet.randhawa at intel.com
Mon Apr 8 21:37:31 UTC 2024


On 4/5/2024 2:56 PM, Matthew Brost wrote:
> On Fri, Apr 05, 2024 at 02:06:08PM -0700, Jagmeet Randhawa wrote:
>> This patch addresses a critical synchronization issue
>> between the "test_munmap_style_unbind" function and
>> the "hammer_thread" function. Previously, "test_munmap_style_unbind"
>> would proceed with its execution after launching
>> "hammer_thread". However, the "hammer_thread" in its
>> initial iteration encountered an error during the syncobj_wait()
>> call, halting its execution prematurely. So we never returned
>> back to the "hammer_thread" from "test_munmap_style_unbind".
>>
>> We resolved this error by adding a syncobj_signal() call in our
>> "hammer_thread" function, allowing "hammer_thread" to send the
>> signal to "test_munmap_style_unbind" therefore ensuring the
>> seamless operation of both threads and correct synchronization.
>>
> This explanation does make sense, see below.
>   
>> Cc: Matthew Auld <matthew.auld at intel.com>
>> Cc: Stuart Summers <stuart.summers at intel.com>
>> Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa at intel.com>
>> ---
>> VLK-54352 and VLK-55620
>>
>>   tests/intel/xe_vm.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
>> index ecb2a783c..a25878cd8 100644
>> --- a/tests/intel/xe_vm.c
>> +++ b/tests/intel/xe_vm.c
>> @@ -1153,6 +1153,7 @@ static void *hammer_thread(void *tdata)
>>   		} else {
>>   			exec.num_syncs = 1;
>>   			err = __xe_exec(t->fd, &exec);
>> +			syncobj_signal(t->fd, &sync[0].handle, 1);
> This doesn't look right.
>
> This thread is doing execs as fast as possible, waiting on every 32nd
> exec. The main thread (test_munmap_style_unbind) is modifying the VM's
> bindings in a way that creates scheduling dependencies between the
> threads. The KMD is designed to enforce these scheduling dependencies
> while both threads run fully async. If syncobj_wait hangs, there is
> likely a KMD or hardware issue here.
>
> This code signals the syncobj from every 32nd exec in software, bypassing
> the hardware / KMD signaling the sync. This breaks the design of the
> test and masks a likely KMD / hardware issue.
>
> Do the VLK failures occur on every engine instance / class?
>
> Matt

Thank you for the review. The KMD is enforcing the scheduling
dependencies, so this patch is not addressing the real issue here; it is
likely just masking it. We can probably discard this patch. The VLK
failures require running on non-copy engines, and they seem to occur
on every non-copy engine instance.
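
As a rough illustration of the masking concern, here is a toy model in plain C. It is not IGT or KMD code: every name below is hypothetical, the "device" is just a boolean, and the wait is non-blocking where the real syncobj_wait() would block. It only shows that once userspace signals the sync object itself, the wait passes whether or not the device ever completed the exec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sync object: just a "signaled" flag.
 * Hypothetical; this is not the DRM syncobj API. */
struct model_syncobj {
	bool signaled;
};

/* Toy exec: submission always succeeds, but the sync object is only
 * signaled if the modeled device actually completes the work. */
static int model_exec(struct model_syncobj *s, bool device_healthy)
{
	assert(s != NULL);
	if (device_healthy)
		s->signaled = true;	/* completion reported by the "device" */
	return 0;
}

/* Models the line the patch adds: the CPU signals the sync object. */
static void model_signal_from_cpu(struct model_syncobj *s)
{
	assert(s != NULL);
	s->signaled = true;
}

/* Non-blocking check; a real wait would block or time out instead. */
static bool model_wait(const struct model_syncobj *s)
{
	assert(s != NULL);
	return s->signaled;
}

/* One modeled hammer iteration; returns what the wait would observe. */
static bool model_iteration(bool device_healthy, bool cpu_signal)
{
	struct model_syncobj s = { .signaled = false };

	model_exec(&s, device_healthy);
	if (cpu_signal)
		model_signal_from_cpu(&s);
	return model_wait(&s);
}
```

In this model, model_iteration(false, false) reports an unsignaled sync object, which is the failure the test is meant to expose, while model_iteration(false, true) reports success even though the device never completed anything.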

Jagmeet


>
>>   			igt_assert(syncobj_wait(t->fd, &sync[0].handle, 1,
>>   						INT64_MAX, 0, NULL));
>>   			syncobj_reset(t->fd, &sync[0].handle, 1);
>> -- 
>> 2.25.1
>>

