[PATCH] x86: Add an explicit barrier() to clflushopt()

Linus Torvalds torvalds at linux-foundation.org
Tue Jan 12 18:06:34 PST 2016


On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> The double clflush() remains a mystery.

Actually, I think it's explainable.

It's wrong to do the clflush *after* the GPU has done the write, which
seems to be what you are doing.

Why?

If the GPU really isn't cache coherent, what can happen is:

 - the CPU has the line cached

 - the GPU writes the data

 - you do the clflushopt to invalidate the cacheline

 - you expect to see the GPU data.

Right?

Wrong. The above is complete crap.

Why?

Very simple reason: the CPU may have had the cacheline dirty at some
level in its caches, so when you did the clflushopt, it didn't just
invalidate the CPU cacheline, it wrote it back to memory, and in the
process overwrote the data that the GPU had written.
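Roughly, the broken ordering looks like this (just a sketch - the
wait_for_gpu_write() helper is made up to stand in for whatever
completion wait the driver really does; clflush_cache_range() is the
real x86 helper):

    /* BROKEN: "flush and read" */
    wait_for_gpu_write(obj);            /* made-up: GPU write is done  */
    clflush_cache_range(vaddr, size);   /* too late: if the CPU still  */
                                        /* holds the line dirty, this  */
                                        /* writes the stale CPU copy   */
                                        /* back over the GPU's data    */
    val = READ_ONCE(*(u32 *)vaddr);     /* may see old CPU data        */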

Now you can say "but the CPU never wrote to the cacheline, so it's not
dirty in the CPU caches". That may or may not be true. The CPU may
have written to it quite a long time ago.

So if you are doing a GPU write, and you want to see the data that the
GPU wrote, you had better do the clflushopt long *before* the GPU ever
writes to memory.

Your pattern of doing "flush and read" is simply fundamentally buggy.
There are only two valid CPU flushing patterns:

 - write and flush (to make the writes visible to the GPU)

 - flush before starting GPU accesses, and then read

At no point can "flush and read" be right.
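In code, the two sane orderings look something like this (again only a
sketch, with the same made-up submit/wait helpers standing in for the
driver's real submission and completion code):

    /* Valid #1: CPU write, then flush, then let the GPU read */
    memcpy(vaddr, src, size);
    clflush_cache_range(vaddr, size);   /* push dirty lines out so the */
                                        /* GPU sees the new data       */
    submit_gpu_read(obj);               /* made-up: start the GPU read */

    /* Valid #2: flush *before* the GPU writes, then read */
    clflush_cache_range(vaddr, size);   /* no dirty/stale lines left   */
    submit_gpu_write(obj);              /* made-up: start the GPU      */
    wait_for_gpu_write(obj);            /* made-up: wait for the write */
    val = READ_ONCE(*(u32 *)vaddr);     /* now this really is GPU data */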

Now, I haven't actually seen your code, so I'm just going by your
high-level description of where the CPU flush and CPU read were done,
but it *sounds* like you did that invalid "flush and read" behavior.

                 Linus
