[PATCH v10b 0/6] Convert requests to use struct fence
John.C.Harrison at Intel.com
Fri Jun 17 09:44:52 UTC 2016
From: John Harrison <John.C.Harrison at Intel.com>
There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware, i.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress, so it is
very definitely still required. However, the basic completion status
side could be updated to use the ready-made fence implementation and
gain all the advantages that provides.
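As a rough illustration of the split described above, here is a minimal
userspace sketch (not the driver code) in which a request embeds a
fence and delegates only its completion status to it. All 'model_*'
names and the 'ring_tail' field are hypothetical stand-ins; the real
'struct fence' and 'struct drm_i915_gem_request' carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct fence (assumption). */
struct model_fence {
	bool signaled;		/* set once the hardware work is done */
	uint32_t seqno;		/* position on the fence's timeline */
};

/*
 * Hypothetical request: the embedded fence carries the completion
 * status, the request keeps the driver-specific bookkeeping.
 */
struct model_request {
	struct model_fence fence;
	uint32_t ring_tail;	/* example of driver-only state */
};

/*
 * Recover the enclosing request from a pointer to its embedded fence,
 * the same idea as the kernel's container_of().
 */
static struct model_request *to_request(struct model_fence *f)
{
	assert(f != NULL);
	return (struct model_request *)
		((char *)f - offsetof(struct model_request, fence));
}

/* Mark the fence as complete. */
static void model_fence_signal(struct model_fence *f)
{
	f->signaled = true;
}

/* Request completion is now just a query on the embedded fence. */
static bool model_request_completed(const struct model_request *req)
{
	return req->fence.signaled;
}
```

The point of the embedding is that code which only holds a fence
pointer can still get back to the request, while code outside the
driver never needs to know the request exists.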
Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
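The asynchronous notification described above can be modelled as a
fence that holds a list of callbacks and runs them when it is
signalled. The following is a simplified userspace sketch under that
assumption; the 'cb_fence_*' names, the fixed-size callback array and
the 'count_completion' helper are hypothetical and are not the kernel
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cb_fence;
typedef void (*cb_fence_func)(struct cb_fence *f, void *data);

#define CB_FENCE_MAX_CBS 4

/* Hypothetical fence with a small fixed table of waiters. */
struct cb_fence {
	bool signaled;
	size_t nr_cbs;
	struct {
		cb_fence_func fn;
		void *data;
	} cbs[CB_FENCE_MAX_CBS];
};

/*
 * Register a callback to run on completion. Returns false if the fence
 * has already signalled or the table is full, in which case the caller
 * must handle completion itself.
 */
static bool cb_fence_add_callback(struct cb_fence *f, cb_fence_func fn,
				  void *data)
{
	assert(f != NULL && fn != NULL);
	if (f->signaled || f->nr_cbs == CB_FENCE_MAX_CBS)
		return false;
	f->cbs[f->nr_cbs].fn = fn;
	f->cbs[f->nr_cbs].data = data;
	f->nr_cbs++;
	return true;
}

/* Signal once; every registered callback runs exactly one time. */
static void cb_fence_signal(struct cb_fence *f)
{
	if (f->signaled)
		return;
	f->signaled = true;
	for (size_t i = 0; i < f->nr_cbs; i++)
		f->cbs[i].fn(f, f->cbs[i].data);
	f->nr_cbs = 0;
}

/* Example waiter: counts how many completions it has seen. */
static void count_completion(struct cb_fence *f, void *data)
{
	(void)f;
	(*(int *)data)++;
}
```

A waiter registers 'count_completion' once and is then notified when
the work finishes, with no polling and no call back into the producer.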
Note that in order to allow the full fence API to be used (e.g.
merging multiple fences together), the driver needs to provide an
incrementing timeline for the fence. Currently this timeline is
specific to the fence code as it must be per context. There is future
work planned to make the driver's internal seqno value also be per
context rather than driver global (VIZ-7443). Once this is done the
fence specific timeline code can be dropped in favour of just using
the driver's seqno value.
This is work that has been planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.
An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.
v2: Updated for review comments by various people and to add support
for Android style 'native sync'.
v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.
v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.
Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
v6: Updated to newer nightly and resolved conflicts around updates
to the wait_request optimisations.
v7: Updated to newer nightly and resolved conflicts around massive
ring -> engine rename and interface change to get_seqno(). Also fixed
up a race condition issue with stale request pointers in file client
lists and added a minor optimisation to not acquire spinlocks when a
list is empty and does not need processing.
v8: Updated to yet another nightly and resolved the merge conflicts.
Dropped 'delay freeing of requests' patch as it is no longer needed due
to changes in the request clean up code. Likewise with the deferred
processing of the fence signalling. Also moved the fence timeline
patch to before the fence conversion. It now means the timeline is
initially added with no actual user but also means the fence
conversion patch does not need to add a horrid hack timeline which is
then removed again in a subsequent patch.
Added support for possible RCU usage of fence object (Review comments
by Maarten Lankhorst).
v9: Updated to another newer nightly (changes to context structure
Moved the request completion processing out of the interrupt handler
and into a worker thread (Chris Wilson).
v10: Removed obsolete fields from timeline structure and a couple of
functions. Corrected some comments and debug prints. Removed duplicate
rcu_head field from request - there is already one in the fence
structure for this exact purpose. Improved/added some comments and
WARNs. Changed to an un-ordered work queue to allow parallel
processing of different engines. Also set the high priority flag for
reduced latency. Removed some unnecessary checks for invalid seqno
values. Moved a spinlock release a few lines later to make the
'locked' parameter of i915_gem_request_enable_interrupt redundant and
removed it. Also shuffled the function around in the file so as to
make it static and remove it from the header file. Corrected the use
of fence_signal_locked() to fence_signal() in the retire code. Dropped
the irq save part of the spin lock calls in the notify code as this is
no longer called from the ISR. Changed the call of
i915_gem_retire_requests_ring() in the reset cleanup code to
i915_gem_request_notify() instead as the former is just duplicating a
lot of operations. Dropped the 'is_empty' flag from
trace_i915_gem_request_notify() as it is now redundant - 'seqno == 0'
[Review comments from Maarten Lankhorst & Tvrtko Ursulin]
Added extra checks and re-instated the lazy_coherency flag to the call
of i915_gem_request_notify() from i915_gem_retire_requests_ring() on
the grounds that it happens lots and lots and mostly does not actually
need to do anything.
Updated for yet more nightly changes (u64 for fence context).
v10b: Fix race condition shown up during BAT testing. I did manage to
reproduce it locally as well but only after many runs of the BAT
suite. Annoyingly, it was rare enough not to be noticed before letting
the BAT farm run lots of tests across lots of different machines.
[Patches against drm-intel-nightly tree fetched 09/06/2016]
John Harrison (6):
  drm/i915: Add per context timelines for fence objects
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead
 drivers/gpu/drm/i915/i915_debugfs.c     |   7 +-
 drivers/gpu/drm/i915/i915_dma.c         |  14 +-
 drivers/gpu/drm/i915/i915_drv.h         |  61 ++---
 drivers/gpu/drm/i915/i915_gem.c         | 414 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_context.c |  16 ++
 drivers/gpu/drm/i915/i915_irq.c         |   3 +-
 drivers/gpu/drm/i915/i915_trace.h       |   6 +-
 drivers/gpu/drm/i915/intel_display.c    |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  14 ++
 drivers/gpu/drm/i915/intel_pm.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  12 +
 12 files changed, 493 insertions(+), 66 deletions(-)