[Intel-gfx] GEM object write

Ma, Ling ling.ma at intel.com
Wed Apr 1 03:59:41 CEST 2009

If the data is in the cache, movnti will invalidate the cache line and then write to memory;
otherwise it writes the data to memory directly. So once all stores have completed,
we only need an mfence instruction to drain the remaining data from the write-combining buffers, instead of a clflush for every cache line.
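A minimal sketch of the scheme described above. It assumes an x86 CPU with SSE2 and uses the compiler intrinsics from `<emmintrin.h>`: `_mm_stream_si32` emits movnti and `_mm_mfence` emits mfence. `write_nt` is a hypothetical helper name, not a function from the i915 driver or libdrm. This version issues no clflush at all, which is the part Keith questions in the quoted reply.

```c
#include <emmintrin.h> /* SSE2: _mm_stream_si32 (movnti), _mm_mfence (mfence) */
#include <stddef.h>

/* Hypothetical helper: copy `count` ints into dst with non-temporal
 * stores, then a single mfence so the write-combining buffers are
 * drained before the buffer is handed to the GPU. No clflush is done. */
static void write_nt(int *dst, const int *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        _mm_stream_si32(&dst[i], src[i]); /* movnti */
    _mm_mfence(); /* order/drain all preceding stores */
}
```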

Ma Ling

-----Original Message-----
From: intel-gfx-bounces at lists.freedesktop.org [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf Of Keith Packard
Sent: Tuesday, March 31, 2009 10:33 PM
To: Ma, Ling
Cc: intel-gfx at lists.freedesktop.org
Subject: Re: [Intel-gfx] GEM object write

On Tue, 2009-03-31 at 14:56 +0800, Ma, Ling wrote:
> Hi,
> I wrote another test program based on the original one.
> The test result shows WB is faster than WC: WC/WB is about 8369/4421.
> In this file I use the movnti instruction for writes in order to avoid
> issuing many clflush instructions. Maybe we can do some optimization on it.

That's a good thought, but we've learned from the CPU architects that
non-temporal stores aren't guaranteed to bypass the cache, they just
avoid pulling memory into cache if it isn't already there. So, it's the
right instruction to use, you just have to combine that with clflush as

keith.packard at intel.com
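A sketch of combining the two, as Keith suggests: write with movnti, then clflush every cache line the buffer spans, so that any line that was already resident in the cache is also written back and evicted. It makes the same SSE2 assumptions as above, plus a 64-byte cache line. That line size is an assumption; real code would query the clflush line size, for example via CPUID leaf 1. `write_nt_flush` and `CACHELINE` are hypothetical names. The fence before the flush loop is a conservative choice rather than a documented requirement.

```c
#include <emmintrin.h> /* SSE2: _mm_stream_si32, _mm_clflush, _mm_mfence */
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical helper: non-temporal writes followed by clflush of every
 * cache line covering dst, bracketed by fences. */
static void write_nt_flush(int *dst, const int *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        _mm_stream_si32(&dst[i], src[i]); /* movnti */

    /* Conservatively order the weakly-ordered NT stores before the flushes. */
    _mm_mfence();

    /* Start at the cache line containing dst and flush up to the end. */
    const char *p = (const char *)((uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1));
    const char *end = (const char *)(dst + count);
    for (; p < end; p += CACHELINE)
        _mm_clflush(p);

    _mm_mfence(); /* wait for the flushes to complete */
}
```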
