[Intel-gfx] GEM object write

Shaohua Li shaohua.li at intel.com
Tue Mar 31 03:50:03 CEST 2009


On Tue, 2009-03-31 at 00:32 +0800, Keith Packard wrote:
> On Mon, 2009-03-30 at 09:19 +0800, Shaohua Li wrote:
> > Hi,
> > I recently benchmarked several ways of writing to a GEM object:
> > 
> > 1. bo_map
> > This approach memory-maps the GEM object as write-back and then
> > flushes the cache to memory. I benchmarked a write-back mapping
> > (followed by clflush) against a write-combining one. In my test,
> > write-combining is about 3 times faster than write-back (please try
> > the attached module). The results are quite stable in my test.
> > 
> > 2. pwrite
> > pwrite almost always maps the GEM object write-combining (when the
> > GEM object is in the GTT, which is nearly always true in the general
> > case), but it adds an extra copy. In my XvMC test, the pwrite
> > approach caused a 20% performance loss.
> > 
> > 3. bo_map_gtt
> > This approach binds the GEM object into the GTT and maps it
> > write-combining. It is the fastest approach, matching the performance
> > without GEM, but the object must stay bound in the GTT and can't be
> > swapped out, since the mapping goes through the device.
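For reference, method 1 (bo_map) with the one-byte-per-cache-line access pattern that Keith criticizes in his reply can be sketched in userspace C. This is an approximation under stated assumptions, not the attached module: an ordinary cached buffer stands in for the write-back-mapped GEM object, flushing uses `_mm_clflush` and `_mm_mfence` from `<emmintrin.h>` (x86 with SSE2 only; elsewhere the flush step is compiled out), and `wb_write_and_flush` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

#define CACHE_LINE 64

/* bo_map-style update (hypothetical helper): store one byte per 64-byte
 * cache line through a write-back (cached) pointer, then flush each touched
 * line so the data reaches memory where the GPU could read it. Without SSE2
 * the flush loop is compiled out and only the store pattern remains. */
static void wb_write_and_flush(uint8_t *obj, size_t len, uint8_t value)
{
    assert(obj != NULL);

    for (size_t off = 0; off < len; off += CACHE_LINE)
        obj[off] = value;              /* 1 byte written per 64 */

#if defined(__SSE2__)
    for (size_t off = 0; off < len; off += CACHE_LINE)
        _mm_clflush(obj + off);        /* write back and evict the line */
    _mm_mfence();                      /* wait for the flushes to complete */
#endif
}
```

Note that each `clflush` here handles a full 64-byte line that carries only one modified byte.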
> 
> Your example is biased in favor of WC mapping as it only writes 1 byte
> in 64 to the object. I propose a slightly different test which would
> model how we expect rendering operations to access memory (at least for
> streaming data from CPU to GPU):
> 
>      1. Allocate a large pool of memory to simulate pages behind the
>         aperture
>      2. For pwrite mappings:
>              1. allocate a small (32kB) pool of memory
>              2. Write data to the small buffer
>              3. Copy that data to the "aperture"
>              4. clflush
>      3. For WB mappings
>              1. Write data to the "aperture"
>              2. clflush
>      4. For WC mappings
>              1. Write data to the "aperture"
> 
> In each case, the writes should be 4 bytes, aligned on 4-byte
> boundaries, and the writes should fill the nominal buffer size (32kB),
> and you should use a different section of the aperture, as a streaming
> application would.
> 
> Given that WC mapping is only 3x slower than WB mapping + clflush, when
> writing only 1/64 of a cache line each time, I think it will be
> interesting to see how this works when writing the full amount of data.
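A rough userspace approximation of the proposed test, in C like the attached test.c, is sketched here. It is hedged in several ways: an ordinary 4-byte-aligned buffer stands in for the pages behind the aperture, `memcpy` stands in for the pwrite copy, clflush comes from `_mm_clflush` (x86 with SSE2 only, a no-op elsewhere), and `stream`, `fill_chunk` and `flush_range` are hypothetical helpers. The WC case is omitted because a plain userspace allocation cannot be made write-combining; that part still needs a kernel module or a GTT mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

#define CACHE_LINE 64
#define CHUNK (32 * 1024)              /* nominal buffer size from the proposal */

enum write_mode { MODE_PWRITE, MODE_WB };

/* clflush every line of [p, p + len); a no-op where SSE2 is unavailable. */
static void flush_range(void *p, size_t len)
{
#if defined(__SSE2__)
    uint8_t *b = p;
    for (size_t off = 0; off < len; off += CACHE_LINE)
        _mm_clflush(b + off);
    _mm_mfence();
#else
    (void)p;
    (void)len;
#endif
}

/* Fill one 32 kB chunk with aligned 4-byte stores. */
static void fill_chunk(uint32_t *dst, uint32_t seed)
{
    for (size_t i = 0; i < CHUNK / sizeof *dst; i++)
        dst[i] = seed ^ (uint32_t)i;
}

/* Stream `chunks` buffers into successive 32 kB windows of `aperture`
 * (which must be 4-byte aligned, e.g. a uint32_t array), wrapping at the
 * end, and return the elapsed wall-clock seconds. */
static double stream(enum write_mode mode, uint8_t *aperture,
                     size_t aperture_len, uint32_t *staging, size_t chunks)
{
    assert(aperture_len >= CHUNK && aperture_len % CHUNK == 0);

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);

    size_t off = 0;
    for (size_t c = 0; c < chunks; c++) {
        uint8_t *window = aperture + off;
        if (mode == MODE_PWRITE) {
            fill_chunk(staging, (uint32_t)c);             /* write the small buffer */
            memcpy(window, staging, CHUNK);               /* copy it to the "aperture" */
        } else {
            fill_chunk((uint32_t *)window, (uint32_t)c);  /* write in place */
        }
        flush_range(window, CHUNK);                       /* clflush in both modes */
        off = (off + CHUNK) % aperture_len;               /* advance to the next section */
    }

    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For meaningful numbers, the aperture should be much larger than the last-level cache so each 32 kB window starts cold, and `chunks` should be large enough that timer resolution does not dominate.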
I just tried 4-byte accesses. The WB/WC result doesn't change: WC
mapping is still about 3x faster than WB mapping + clflush. Please
give the attached test a try.
I'll benchmark the pwrite path later.

Thanks,
Shaohua
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 1467 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20090331/5cb893e1/attachment.c>

