[Intel-gfx] [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back

Thu Jun 9 12:54:44 UTC 2016

On Thu, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> If we quickly switch from writing through the GTT to a read of the
> physical page directly with the CPU (e.g. performing relocations through
> the GTT and then running the command parser), we can observe that the
> writes are not visible to the CPU. It is not a coherency problem, as
> extensive investigations with clflush have demonstrated, but a mere
> timing issue - we have to wait for the GTT to complete its write before
> we start our read from the CPU.
> 
> The issue can be illustrated in userspace with:
> 
> 	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> 
> 	for (i = 0; i < OBJECT_SIZE / 64; i++) {
> 		int x = 16*i + (i%16);
> 		gtt[x] = i;
> 		clflush(&cpu[x], sizeof(cpu[x]));
> 		assert(cpu[x] == i);
> 	}
> 
> Experimenting with that shows that this behaviour is indeed limited to
> recent Atom-class hardware.
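
For anyone reading along, the index arithmetic in the loop above can be checked in isolation. The sketch below is only an illustration: it assumes gtt and cpu are uint32_t pointers (the snippet does not show their declarations), and the helper names test_index() and count_layout_violations() are made up for this example. Under that assumption each iteration touches exactly one dword in cacheline i, at a dword offset that rotates with i % 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u

/* Assumption: the mappings are indexed as uint32_t, i.e. 16 dwords per line. */
#define DWORDS_PER_LINE (CACHELINE_BYTES / sizeof(uint32_t))
static_assert(CACHELINE_BYTES / sizeof(uint32_t) == 16, "expected 16 dwords per cacheline");

/* Same expression as "int x = 16*i + (i%16);" in the quoted test loop. */
static size_t test_index(size_t i)
{
	return 16 * i + (i % 16);
}

/*
 * Hypothetical helper: walk the same range as the test loop and count
 * iterations whose index is not in cacheline i, not at dword offset
 * i % 16 within that line, or outside the object.
 */
static size_t count_layout_violations(size_t object_size)
{
	size_t bad = 0;

	for (size_t i = 0; i < object_size / CACHELINE_BYTES; i++) {
		size_t x = test_index(i);

		if (x / DWORDS_PER_LINE != i)
			bad++;
		else if (x % DWORDS_PER_LINE != i % 16)
			bad++;
		else if (x * sizeof(uint32_t) >= object_size)
			bad++;
	}
	return bad;
}
```

Calling count_layout_violations() with any object size that is a multiple of 64 bytes should return 0, which is why the loop bound is OBJECT_SIZE / 64 rather than OBJECT_SIZE / 4.
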
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 18b4a684ddde..ffe3d3e9d69d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2898,20 +2898,30 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  static void
>  i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
>  {
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	uint32_t old_write_domain;
>  
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
>  		return;
>  
>  	/* No actual flushing is required for the GTT write domain.  Writes
> -	 * to it immediately go to main memory as far as we know, so there's
> +	 * to it "immediately" go to main memory as far as we know, so there's
>  	 * no chipset flush.  It also doesn't land in render cache.
>  	 *
>  	 * However, we do have to enforce the order so that all writes through
>  	 * the GTT land before any writes to the device, such as updates to
>  	 * the GATT itself.
> +	 *
> +	 * We also have to wait a bit for the writes to land from the GTT.
> +	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
> +	 * timing. This issue has only been observed when switching quickly
> +	 * between GTT writes and CPU reads from inside the kernel on recent hw,
> +	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
> +	 * system agents we cannot reproduce this behaviour).

This screams for a Tested-by: tag before merging...

>  	 */
>  	wmb();
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))

Please use INTEL_GEN(dev_priv) here instead of INTEL_INFO(dev_priv)->gen.

With this fixed and the Testcase: label added,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>

> +		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
>  
>  	old_write_domain = obj->base.write_domain;
>  	obj->base.write_domain = 0;
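
To spell out the condition that gates the posting read: it is taken only on gen6+ parts without an LLC. The sketch below models just that predicate outside the kernel. struct fake_device_info, FAKE_INTEL_GEN(), FAKE_HAS_LLC() and needs_posting_read() are hypothetical stand-ins invented for this illustration, not the driver's real types or macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's device info; not the real i915 struct. */
struct fake_device_info {
	int gen;
	bool has_llc;
};

/* Stand-ins that mimic the shape of INTEL_GEN() and HAS_LLC(). */
#define FAKE_INTEL_GEN(info) ((info)->gen)
#define FAKE_HAS_LLC(info)   ((info)->has_llc)

/* Mirrors the patch's condition: gen >= 6 and no LLC. */
static bool needs_posting_read(const struct fake_device_info *info)
{
	return FAKE_INTEL_GEN(info) >= 6 && !FAKE_HAS_LLC(info);
}

/* Small usage example covering the three interesting cases. */
static void needs_posting_read_selfcheck(void)
{
	const struct fake_device_info no_llc = { .gen = 8, .has_llc = false };
	const struct fake_device_info llc    = { .gen = 8, .has_llc = true };
	const struct fake_device_info old    = { .gen = 5, .has_llc = false };

	assert(needs_posting_read(&no_llc));
	assert(!needs_posting_read(&llc));
	assert(!needs_posting_read(&old));
}
```

In the driver itself the same check would read INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv), as suggested in the comment above.
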
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation