[PATCH i-g-t 2/3] lib/xe_spin: Fix xe_spin_wait_started()

Thu Aug 8 06:00:13 UTC 2024

On Thu, Aug 01, 2024 at 07:42:54AM GMT, Zbigniew Kempczyński wrote:
>On Wed, Jul 31, 2024 at 08:06:52AM +0200, Zbigniew Kempczyński wrote:
>> On Tue, Jul 30, 2024 at 08:35:03PM -0700, Lucas De Marchi wrote:
>> > Use READ_ONCE() to guarantee we are actually reading the memory written
>> > by the GPU and compiler can't optimize it away.
>> >
>> > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> > ---
>> >  lib/xe/xe_spin.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
>> > index 3adacc3a8..43a7a43c2 100644
>> > --- a/lib/xe/xe_spin.c
>> > +++ b/lib/xe/xe_spin.c
>> > @@ -166,7 +166,7 @@ void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
>> >   */
>> >  bool xe_spin_started(struct xe_spin *spin)
>> >  {
>> > -	return spin->start != 0;
>> > +	return READ_ONCE(spin->start) != 0;
>> >  }
>> >
>> >  /**
>> > --
>> > 2.43.0
>> >
>>
>> Hmm, function is exported and would expect compiler will always
>> read the memory before return. This is part of .so and I don't

right... I was thinking the whole thing would be statically linked to
the tests. Now I see we actually have a libigt.so

>> think this will be inlined to the caller. Where do you observe
>> gpu memory write is not reflected to the cpu?

I didn't. This was just my braindump while trying to understand the
xe_spin and going out on vacations. Now I'm back...

>>
>> --
>> Zbigniew
>>
>
>I wondered about local inlining the function or optimizing which
>will prevent from looping the read (endless loop after single read)
>and I got this when code is optimized but not position independent.
>I mean -O2 produces:
>
>xe_spin_wait_started:
>	cmp     DWORD PTR [rdi], 1
>	jne     .L6
>        ret
>.L6:
>        jmp     .L6

I was actually thinking about xe_spin.o and calls to
xe_spin_wait_started() in the same compilation unit. Could that
function could be inlined and folded in others like
xe_cork_wait_started(). Disassembling the build I get:

void xe_cork_wait_started(struct xe_cork *cork)
{
    d1bc0:       f3 0f 1e fa             endbr64
         xe_spin_wait_started(cork->spin);
    d1bc4:       48 8b 3f                mov    (%rdi),%rdi
    d1bc7:       e9 b4 15 f6 ff          jmp    33180 <xe_spin_wait_started at plt>
    d1bcc:       0f 1f 40 00             nopl   0x0(%rax)

As you said, apparently no, as the function is exported. And xe_spin_wait_started
does do the right loop.

>
>Only -fpic prevents from this optimization.
>
>As Jonathan gave you r-b I'm not going to block this series,
>it is just not necessary imo in this particular case.

Yeah, I don't think this is needed at all. I will drop these 2
patches and keep only the first one.

Thanks
Lucas De Marchi

>
>--
>Zbigniew