[Intel-gfx] gem clflush optimization for media encoding

Wed Jun 22 06:29:21 CEST 2011

>>-----Original Message-----
>>From: Keith Packard [mailto:keithp at keithp.com]
>>Sent: June 22, 2011 12:14
>>To: Zou, Nanhai; intel-gfx at lists.freedesktop.org
>>Cc: Anholt, Eric
>>Subject: Re: [Intel-gfx] gem clflush optimization for media encoding
>>
>>On Wed, 22 Jun 2011 11:13:09 +0800, "Zou, Nanhai" <nanhai.zou at intel.com> wrote:
>>
>>> 	If I upload input buffer with movnti or movntdq (bypass cache) +
>>> 	sfence(clear write combine buffer) in the end, clflush should
>>> 	not be needed.
>>
>>Alas, neither of these will flush existing cached data, so you must
>>still use clflush to ensure that the data makes it out to memory. All
>>that they do is avoid consuming additional cache lines.
>>
  As I understand it,
  with movnti + sfence the data should surely reach memory. The cache should stay coherent in this case.
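
  To make the upload path under discussion concrete, here is a minimal sketch
  of a copy loop that uses non-temporal stores followed by sfence. The helper
  name upload_nontemporal and the alignment requirements are assumptions made
  for illustration; this is not the actual libva or gem code. Whether such a
  loop removes the need for clflush is exactly the question in this thread,
  so the sketch only shows the store pattern, not a claim about coherency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128 (movntdq), _mm_sfence */
#endif

/*
 * Hypothetical helper: copy len bytes from src to dst with non-temporal
 * stores. Assumes dst is 16-byte aligned and len is a multiple of 16.
 */
static void upload_nontemporal(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert((len & 15) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary load from the source */
        _mm_stream_si128(d + i, v);         /* movntdq: non-temporal store */
    }
    _mm_sfence(); /* order the streaming stores before anything that follows */
#else
    memcpy(dst, src, len); /* fallback for builds without SSE2: plain cached copy */
#endif
}
```

  The sfence is issued once at the end of the copy rather than per store,
  which matches the "sfence in the end" pattern described above.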

>>You want to use a write combining mapping, which should give you full
>>bandwidth access to memory without hitting any caches. You can use the GTT
>>mapping as the aperture is configured for write combining access, or we
>>can figure out how to make PAT work.
>>
	map_gtt in the current gem is very slow.
	I've tried map_gtt, but the speed is unacceptable.

>>> 	Since it is a CPU read-only surface, clflush is not needed at all.
>>
>>You'd still have to invalidate cache lines using clflush to avoid using
>>stale data in the CPU cache.
>>
>>--
  Yes, you are right; in this case clflush is still needed to invalidate the CPU cache.

  The problem is that we do not know how large the coded output buffer will be before we do the encoding.
  So we have to allocate a large enough gem object before encoding. In most
cases the encoding result is less than 1/10 of the safe buffer size, so 9/10 of the buffer is clflushed unnecessarily.
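
  As a rough illustration of the cost, the sketch below counts how many cache
  lines cover a byte range. The function name cache_lines_covering, the
  64-byte line size and the 1 MiB buffer size are assumptions chosen only for
  the arithmetic; they do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of cache lines touched by the byte range
 * [offset, offset + len). line_size must be a power of two.
 */
static size_t cache_lines_covering(size_t offset, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;

    size_t first = offset & ~(line_size - 1);             /* start of first line */
    size_t last = (offset + len - 1) & ~(line_size - 1);  /* start of last line */

    return (last - first) / line_size + 1;
}
```

  With these assumed numbers, a full 1 MiB buffer spans 16384 lines, while
  the first tenth of it (104857 bytes) spans 1639 lines. If only the bytes
  actually written by the encoder had to be invalidated, roughly nine tenths
  of the clflush work would go away.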

  A fast map_gtt implementation could be the best choice here.

Thanks
Zou Nanhai

>>keith.packard at intel.com