[Intel-gfx] [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Jun 2 11:07:35 UTC 2016
On 01/06/16 18:07, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the driver's 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the
> request. It replaces the explicit reference count with that of the
> fence. It also replaces the 'is completed' test with the fence's
> equivalent. Currently, that simply chains on to the original request
> implementation. A future patch will improve this.
>
> v3: Updated after review comments by Tvrtko Ursulin. Added fence
> context/seqno pair to the debugfs request info. Renamed fence 'driver
> name' to just 'i915'. Removed BUG_ONs.
>
> v5: Changed seqno format in debugfs to %x rather than %u as that is
> apparently the preferred appearance. Line wrapped some long lines to
> keep the style checker happy.
>
> v6: Updated to newer nightly and resolved conflicts. The biggest issue
> was with the re-worked busy spin precursor to waiting on a request. In
> particular, the addition of a 'request_started' helper function. This
> has no corresponding concept within the fence framework. However, it
> is only ever used in one place and the whole point of that place is to
> always directly read the seqno for absolutely lowest latency possible.
> So the simple solution is to just make the seqno test explicit at that
> point now rather than later in the series (it was previously being
> done anyway when fences become interrupt driven).
>
> v7: Rebased to newer nightly - lots of ring -> engine renaming and
> interface change to get_seqno().
>
> v8: Rebased to newer nightly - no longer needs to worry about mutex
> locking in the request free code path. Moved to after fence timeline
> patch so no longer needs to add a horrid hack timeline.
>
> Removed commented out code block. Added support for possible RCU usage
> of fence object (Review comments by Maarten Lankhorst).
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
Was it an r-b or an ack from Jesse? If the former, does it need a "(v?)"
suffix, depending on the amount of code that changed after his r-b?
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 5 +-
> drivers/gpu/drm/i915/i915_drv.h | 43 +++++---------
> drivers/gpu/drm/i915/i915_gem.c | 101 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/intel_lrc.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
> 6 files changed, 115 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ac7e569..844cc4b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
> task = NULL;
> if (req->pid)
> task = pid_task(req->pid, PIDTYPE_PID);
> - seq_printf(m, " %x @ %d: %s [%d]\n",
> + seq_printf(m, " %x @ %d: %s [%d], fence = %x:%x\n",
In the previous patch the fence context and seqno were printed as %d in
timeline->name, so using %d here as well would probably be more
consistent.
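Something like this, purely as a sketch of what I mean:

  seq_printf(m, " %x @ %d: %s [%d], fence = %d:%d\n",
             req->seqno,
             (int) (jiffies - req->emitted_jiffies),
             task ? task->comm : "<unknown>",
             task ? task->pid : -1,
             req->fence.context, req->fence.seqno);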
> req->seqno,
> (int) (jiffies - req->emitted_jiffies),
> task ? task->comm : "<unknown>",
> - task ? task->pid : -1);
> + task ? task->pid : -1,
> + req->fence.context, req->fence.seqno);
> rcu_read_unlock();
> }
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a5f8ad8..905feae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -42,6 +42,7 @@
> #include <linux/kref.h>
> #include <linux/pm_qos.h>
> #include <linux/shmem_fs.h>
> +#include <linux/fence.h>
>
> #include <drm/drmP.h>
> #include <drm/intel-gtt.h>
> @@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
> * initial reference taken using kref_init
> */
> struct drm_i915_gem_request {
> - struct kref ref;
> + /**
> + * Underlying object for implementing the signal/wait stuff.
> + */
> + struct fence fence;
> + struct rcu_head rcu_head;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
> struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct i915_gem_context *ctx);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> + bool lazy_coherency)
> +{
> + return fence_is_signaled(&req->fence);
> +}
I would squash the following patch into this one; it makes no sense to
keep a function with an unused parameter. And fewer patches in the
series makes it less scary to review. :) Provided, of course, that they
do not get too big. :D
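After squashing, the helper could presumably shrink to just this (a
sketch only, with the callers updated to drop the unused argument):

  static inline bool
  i915_gem_request_completed(struct drm_i915_gem_request *req)
  {
          return fence_is_signaled(&req->fence);
  }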
> +
> int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
> struct drm_file *file);
>
> @@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
> i915_gem_request_reference(struct drm_i915_gem_request *req)
> {
> if (req)
> - kref_get(&req->ref);
> + fence_get(&req->fence);
> return req;
> }
>
> static inline void
> i915_gem_request_unreference(struct drm_i915_gem_request *req)
> {
> - kref_put(&req->ref, i915_gem_request_free);
> + fence_put(&req->fence);
> }
>
> static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> @@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> }
>
> /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
> * A command that requires special handling by the command parser.
> */
> struct drm_i915_cmd_descriptor {
> @@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
> return (int32_t)(seq1 - seq2) >= 0;
> }
>
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> - bool lazy_coherency)
> -{
> - if (!lazy_coherency && req->engine->irq_seqno_barrier)
> - req->engine->irq_seqno_barrier(req->engine);
> - return i915_seqno_passed(req->engine->get_seqno(req->engine),
> - req->previous_seqno);
> -}
> -
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> - bool lazy_coherency)
> -{
> - if (!lazy_coherency && req->engine->irq_seqno_barrier)
> - req->engine->irq_seqno_barrier(req->engine);
> - return i915_seqno_passed(req->engine->get_seqno(req->engine),
> - req->seqno);
> -}
> -
> int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
> int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 57d3593..b67fd7c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> {
> unsigned long timeout;
> unsigned cpu;
> + uint32_t seqno;
>
> /* When waiting for high frequency requests, e.g. during synchronous
> * rendering split between the CPU and GPU, the finite amount of time
> @@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> return -EBUSY;
>
> /* Only spin if we know the GPU is processing this request */
> - if (!i915_gem_request_started(req, true))
> + seqno = req->engine->get_seqno(req->engine);
> + if (!i915_seqno_passed(seqno, req->previous_seqno))
> return -EAGAIN;
>
> timeout = local_clock_us(&cpu) + 5;
> while (!need_resched()) {
> - if (i915_gem_request_completed(req, true))
> + seqno = req->engine->get_seqno(req->engine);
> + if (i915_seqno_passed(seqno, req->seqno))
> return 0;
>
> if (signal_pending_state(state, current))
> @@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> cpu_relax_lowlatency();
> }
>
> - if (i915_gem_request_completed(req, false))
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = req->engine->get_seqno(req->engine);
> + if (i915_seqno_passed(seqno, req->seqno))
> return 0;
>
> return -EAGAIN;
> @@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
> }
> }
>
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free_rcu(struct rcu_head *head)
> {
> - struct drm_i915_gem_request *req = container_of(req_ref,
> - typeof(*req), ref);
> + struct drm_i915_gem_request *req;
> +
> + req = container_of(head, typeof(*req), rcu_head);
> kmem_cache_free(req->i915->requests, req);
> }
>
> +static void i915_gem_request_free(struct fence *req_fence)
> +{
> + struct drm_i915_gem_request *req;
> +
> + req = container_of(req_fence, typeof(*req), fence);
> + call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
> +}
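On the possible RCU usage mentioned in the commit message - deferring
the actual free via call_rcu is what would make a lockless lookup safe,
since the request memory then cannot disappear while inside a read-side
critical section. A sketch of the kind of user I assume this is for
(fence_ptr here is hypothetical, not something in this series):

  struct fence *fence;

  rcu_read_lock();
  fence = rcu_dereference(fence_ptr);
  if (fence)
          /* fence_get_rcu() returns NULL if we raced with the final
           * fence_put(), so the caller must handle that. */
          fence = fence_get_rcu(fence);
  rcu_read_unlock();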
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> + /* Interrupt driven fences are not implemented yet.*/
> + WARN(true, "This should not be called!");
> + return true;
> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> + struct drm_i915_gem_request *req = container_of(req_fence,
> + typeof(*req), fence);
> + u32 seqno;
> +
> + seqno = req->engine->get_seqno(req->engine);
> +
> + return i915_seqno_passed(seqno, req->seqno);
> +}
> +
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> + return "i915";
> +}
> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> + struct drm_i915_gem_request *req;
> + struct i915_fence_timeline *timeline;
> +
> + req = container_of(req_fence, typeof(*req), fence);
> + timeline = &req->ctx->engine[req->engine->id].fence_timeline;
> +
> + return timeline->name;
> +}
> +
> +static void i915_gem_request_timeline_value_str(struct fence *req_fence,
> + char *str, int size)
> +{
> + struct drm_i915_gem_request *req;
> +
> + req = container_of(req_fence, typeof(*req), fence);
> +
> + /* Last signalled timeline value ??? */
> + snprintf(str, size, "? [%d]"/*, timeline->value*/,
Is the reference to timeline->value a leftover from the past?
Is the string format defined by the API? Asking because "? [%d]" looks
intriguing.
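If the format is free-form (as far as I can see it is only consumed by
debug/trace output), maybe just drop the leftover comment and the
question mark, e.g. something like:

  snprintf(str, size, "%d", req->engine->get_seqno(req->engine));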
> + req->engine->get_seqno(req->engine));
> +}
> +
> +static void i915_gem_request_fence_value_str(struct fence *req_fence,
> + char *str, int size)
> +{
> + struct drm_i915_gem_request *req;
> +
> + req = container_of(req_fence, typeof(*req), fence);
> +
> + snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
Is it OK to put req->seqno in this one? Or is it just for debug anyway,
so it helps us and the fence framework does not care?
> +}
> +
> +static const struct fence_ops i915_gem_request_fops = {
> + .enable_signaling = i915_gem_request_enable_signaling,
> + .signaled = i915_gem_request_is_completed,
> + .wait = fence_default_wait,
> + .release = i915_gem_request_free,
> + .get_driver_name = i915_gem_request_get_driver_name,
> + .get_timeline_name = i915_gem_request_get_timeline_name,
> + .fence_value_str = i915_gem_request_fence_value_str,
> + .timeline_value_str = i915_gem_request_timeline_value_str,
> +};
> +
> int i915_create_fence_timeline(struct drm_device *dev,
> struct i915_gem_context *ctx,
> struct intel_engine_cs *engine)
> @@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
> return 0;
> }
>
> -unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
> {
> unsigned seqno;
>
> @@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
> if (ret)
> goto err;
>
> - kref_init(&req->ref);
> req->i915 = dev_priv;
> req->engine = engine;
> req->reset_counter = reset_counter;
> req->ctx = ctx;
> i915_gem_context_reference(req->ctx);
>
> + fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
> + ctx->engine[engine->id].fence_timeline.fence_context,
> + i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
> +
> /*
> * Reserve space in the ring buffer for all the commands required to
> * eventually emit this request. This is to guarantee that the
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 14bcfb7..f126bcb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
> INIT_LIST_HEAD(&engine->buffers);
> INIT_LIST_HEAD(&engine->execlist_queue);
> spin_lock_init(&engine->execlist_lock);
> + spin_lock_init(&engine->fence_lock);
>
> tasklet_init(&engine->irq_tasklet,
> intel_lrc_irq_handler, (unsigned long)engine);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8d35a39..fbd3f12 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
> INIT_LIST_HEAD(&engine->request_list);
> INIT_LIST_HEAD(&engine->execlist_queue);
> INIT_LIST_HEAD(&engine->buffers);
> + spin_lock_init(&engine->fence_lock);
> i915_gem_batch_pool_init(dev, &engine->batch_pool);
> memset(engine->semaphore.sync_seqno, 0,
> sizeof(engine->semaphore.sync_seqno));
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b33c876..3f39daf 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -345,6 +345,8 @@ struct intel_engine_cs {
> * to encode the command length in the header).
> */
> u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> + spinlock_t fence_lock;
Why is this lock per-engine, and not for example per-timeline? Aren't
fences living completely isolated in their per-context-per-engine
domains? So presumably there is something somewhere which is shared
outside that domain and needs a lock at the engine level?
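In other words I would have expected something along these lines (just
a sketch, field names and sizes guessed from the previous patch):

  struct i915_fence_timeline {
          char            name[32];
          unsigned        fence_context;
          unsigned        next;
          spinlock_t      lock; /* protects fences on this timeline */
  };

  ...

  fence_init(&req->fence, &i915_gem_request_fops,
             &ctx->engine[engine->id].fence_timeline.lock,
             ctx->engine[engine->id].fence_timeline.fence_context,
             i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));

Unless there is something which needs to take the lock across
timelines, in which case it would be good to document what that is.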
> };
>
> static inline bool
>
Regards,
Tvrtko