[Intel-gfx] Why the memcpy from a mapped GPU memory is so slow on Intel Bay Trail?

Wed May 21 20:25:24 CEST 2014

三月！ <sunnymarch at qq.com> writes:

> Hello!  I'm developing an OpenCL application with Beignet on Ubuntu
> 14.04 x64 Desktop on a Bay Trail E3825.  I found that reading data
> from GPU memory through either drm_intel_gem_bo_map or
> drm_intel_gem_bo_get_subdata costs about 0.002 ~ 0.003 seconds to fetch
> a 7MiB array, which is not quite satisfying.  Could anybody help solve
> this problem?

GPUs (except in the case of SNB/IVB/HSW, where the CPU is coherent with
the GPU apart from the GPU's L1/L2 caches) are extremely slow to read
from, because write-combining memory effectively gives uncached
performance for reads.  You can get better streaming read performance
using the movntdqa instruction, and you can see an example of code using
it in streaming-load-memcpy.c in mesa (though it looks like that code is
missing an mfence, which iirc is required).
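
For illustration, here is a minimal sketch of a movntdqa-based copy.  It
is not the mesa code: the function name streaming_read_copy is made up
for this example, the target attribute assumes GCC or Clang, and the
placement of the mfence before the loads only follows the iirc remark
above and is not a verified requirement.  The source pointer must be
16-byte aligned, which movntdqa requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128 (movntdqa) */

/* Hypothetical helper, not the mesa function.
 * src must be 16-byte aligned; dst may have any alignment. */
__attribute__((target("sse4.1")))
static void streaming_read_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(((uintptr_t)s & 15) == 0);

    /* Assumption: fence before the streaming loads so that they are
     * ordered after earlier memory operations. */
    _mm_mfence();

    while (len >= 16) {
        /* Streaming (non-temporal) 16-byte load from the mapping. */
        __m128i v = _mm_stream_load_si128((__m128i *)s);
        /* Ordinary unaligned store into normal cached memory. */
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        len -= 16;
    }

    /* Copy the remaining 0..15 bytes the ordinary way. */
    memcpy(d, s, len);
}
#else
/* Non-x86 fallback so the sketch still builds: plain memcpy. */
static void streaming_read_copy(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)src & 15) == 0);
    memcpy(dst, src, len);
}
#endif
```

On ordinary cacheable memory movntdqa behaves like a normal load, so any
speedup should only show up when src points into a write-combining
mapping, and the CPU has to support SSE4.1.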