[Intel-gfx] [PATCH 00/15] [v2] Broadwell HW semaphore
Daniel Vetter
daniel at ffwll.ch
Tue Dec 17 10:17:38 CET 2013
On Mon, Dec 16, 2013 at 08:50:36PM -0800, Ben Widawsky wrote:
> Reposting this as a new series since two of the patches dropped off
> since last time.
>
> Functionally it's the same as before. Like before, the patch "drm/i915:
> unleash semaphores on gen8" should probably not be merged as it's not
> 100% clear where the hang is currently coming from. Everything else
> should be pretty benign for other platforms.
I've pulled in the first two patches already. Damien has signed up for
review (although he goes on vacation soon), and for a greater learning
experience he's also agreed to throw in a testcase on top.
We already have a nice stress test for semaphores (gem_ring_sync_loop), but
no real functional test that checks that the batches are indeed correctly
ordered. For gpu vs. cpu races we already have a fairly complete set in
gem_concurrent_blt, but that has many additional complications we don't
really care about for ring2ring syncing.
For each pair of rings R1, R2 where we have copy support (i.e. blt,
rendercpy and mediafill) do the following (a rough sketch in C follows
after the list):
- Throw a busy load onto R1. gem_concurrent_blt just uses lots of buffers
for this effect.
- Fill three buffers A, B, C with unique data.
- Copy A to B on ring R1
Then come the three different variants.
- Copy B to C on ring R2, check that C now contains what A originally
contained. This is the write->read hazard. gem_concurrent_blt calls this
early read.
- Copy C to A on ring R2, check that B now contains what A originally
  contained. This is the read->write hazard; gem_concurrent_blt calls it
  overwrite_source.
- Copy C to B on ring R2 and check that B contains what C originally
  contained. This is the write/write hazard. gem_concurrent_blt doesn't
  have this one, since for the cpu case it's too boring.
- The remaining case is read->read: as long as we don't allow concurrent
  reads on different rings, testing that one isn't worth it. And even then
  we could only check whether the ring without the busy load indeed
  completes much earlier than the other (i.e. both rings would copy a
  shared buffer to a private buffer). Not worth it at all.
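To make that concrete, here's a rough C sketch of the control flow. All
the helpers (fill_unique, busy_load, copy_on_ring, check_contents) are
made-up placeholders, not existing igt library calls; a real test would
implement them on top of the usual igt buffer and batchbuffer
infrastructure.

#include <stdint.h>

struct buf;	/* opaque gem buffer handle, placeholder */

/* Hypothetical helpers, to be filled in with igt infrastructure: */
extern struct buf *fill_unique(int fd, uint32_t seed);
extern void busy_load(int fd, int ring);
extern void copy_on_ring(int fd, int ring, struct buf *src, struct buf *dst);
extern void check_contents(struct buf *b, uint32_t seed);

enum hazard { WRITE_READ, READ_WRITE, WRITE_WRITE };

static void test_ring2ring(int fd, int r1, int r2, enum hazard h)
{
	struct buf *A, *B, *C;

	/* Keep R1 busy so the first copy queues up behind it and the
	 * second copy on R2 actually has to wait. */
	busy_load(fd, r1);

	A = fill_unique(fd, 0xa);
	B = fill_unique(fd, 0xb);
	C = fill_unique(fd, 0xc);

	copy_on_ring(fd, r1, A, B);

	switch (h) {
	case WRITE_READ:	/* "early read" */
		copy_on_ring(fd, r2, B, C);	/* R2 reads R1's target */
		check_contents(C, 0xa);
		break;
	case READ_WRITE:	/* "overwrite_source" */
		copy_on_ring(fd, r2, C, A);	/* R2 overwrites R1's source */
		check_contents(B, 0xa);
		break;
	case WRITE_WRITE:	/* no cpu analogue in gem_concurrent_blt */
		copy_on_ring(fd, r2, C, B);	/* R2 overwrites R1's target */
		check_contents(B, 0xc);
		break;
	}
}

/* Run every variant for each ordered pair of distinct rings with copy
 * support. */
static void run_all(int fd, const int *rings, int nrings)
{
	int r1, r2;
	enum hazard h;

	for (r1 = 0; r1 < nrings; r1++)
		for (r2 = 0; r2 < nrings; r2++) {
			if (r1 == r2)
				continue;
			for (h = WRITE_READ; h <= WRITE_WRITE; h++)
				test_ring2ring(fd, rings[r1], rings[r2], h);
		}
}

The important bit is that the busy load on R1 makes the R1 copy complete
late, so a missing or broken semaphore would let the R2 copy execute first
and the content check would catch the misordering.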
We also have some other tests for cpu access that check for specific bugs
where we've in the past lost the last gpu read/last gpu write access
breadcrumb. But those only make sense once we have bugs (or more
complicated code, e.g. with the scheduler).
Cheers, Daniel
>
> Ben Widawsky (15):
> drm/i915: Reorder/respace MI instruction definition
> drm/i915: Don't emit mbox updates without semaphores
> drm/i915: Move semaphore specific ring members to struct
> drm/i915: Virtualize the ringbuffer signal func
> drm/i915: Move ring_begin to signal()
> drm/i915: Make semaphore updates more precise
> drm/i915: gen specific ring init
> drm/i915/bdw: implement semaphore signal
> drm/i915/bdw: implement semaphore wait
> drm/i915: FORCE_RESTORE for gen8 semaphores
> drm/i915/bdw: poll semaphores
> drm/i915: Extract semaphore error collection
> drm/i915/bdw: collect semaphore error state
> drm/i915: unleash semaphores on gen8
> drm/i915: semaphore debugfs
>
> drivers/gpu/drm/i915/i915_debugfs.c | 69 +++++++
> drivers/gpu/drm/i915/i915_drv.c | 6 -
> drivers/gpu/drm/i915/i915_drv.h | 2 +
> drivers/gpu/drm/i915/i915_gem.c | 10 +-
> drivers/gpu/drm/i915/i915_gem_context.c | 9 +
> drivers/gpu/drm/i915/i915_gpu_error.c | 75 ++++++--
> drivers/gpu/drm/i915/i915_reg.h | 58 +++---
> drivers/gpu/drm/i915/intel_ringbuffer.c | 329 ++++++++++++++++++++++++--------
> drivers/gpu/drm/i915/intel_ringbuffer.h | 87 ++++++++-
> 9 files changed, 508 insertions(+), 137 deletions(-)
>
> --
> 1.8.5.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch