[Intel-gfx] GEM object write
ling.ma at intel.com
Wed Apr 1 03:59:41 CEST 2009
If the data is in the cache, movnti will invalidate the cache line and then write to memory;
otherwise it will write the data directly to memory. So after all stores complete,
we only need an mfence instruction to drain the data left in the write-combining buffers, instead of a clflush on every cache line.
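
A minimal sketch of the idea above, assuming an x86 CPU with SSE2 and assuming movnti behaves as described. The helper name stream_copy_u32 is hypothetical and is not taken from the test program; the intrinsics _mm_stream_si32 (which compiles to movnti) and _mm_mfence come from <emmintrin.h>. sfence is generally considered enough to order non-temporal stores; mfence is used here only to match the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 (movnti), _mm_mfence */

/* Hypothetical helper (an assumption for illustration, not the author's code):
 * copy `count` 32-bit words from `src` to `dst` using non-temporal stores. */
static void stream_copy_u32(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++) {
        /* Emits movnti: a 32-bit store carrying a non-temporal hint. */
        _mm_stream_si32((int *)&dst[i], (int)src[i]);
    }

    /* A single fence after all stores, rather than a clflush per cache line. */
    _mm_mfence();
}
```

A caller would invoke stream_copy_u32(dst, src, n) once per upload, so the fence cost is paid once per call instead of once per cache line.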
From: intel-gfx-bounces at lists.freedesktop.org [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf Of Keith Packard
Sent: Tuesday, March 31, 2009 10:33 PM
To: Ma, Ling
Cc: intel-gfx at lists.freedesktop.org
Subject: Re: [Intel-gfx] GEM object write
On Tue, 2009-03-31 at 14:56 +0800, Ma, Ling wrote:
> I wrote another test program based on the original one.
> The test result shows WB is faster than WC: WC/WB is about 8369/4421.
> In this file I use the movnti instruction for the writes in order to avoid many clflush instructions.
> Maybe we can do some optimization on it.
That's a good thought, but we've learned from the CPU architects that
non-temporal stores aren't guaranteed to bypass the cache, they just
avoid pulling memory into cache if it isn't already there. So, it's the
right instruction to use, you just have to combine that with clflush as
keith.packard at intel.com