[Intel-gfx] [RFC 17/44] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two

Daniel Vetter daniel at ffwll.ch
Wed Jul 23 20:14:59 CEST 2014


On Wed, Jul 23, 2014 at 05:33:42PM +0100, John Harrison wrote:
> 
> On 07/07/2014 20:21, Daniel Vetter wrote:
> >On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> >>On Thu, 26 Jun 2014 18:24:08 +0100
> >>John.C.Harrison at Intel.com wrote:
> >>
> >>>From: John Harrison <John.C.Harrison at Intel.com>
> >>>
> >>>The scheduler decouples the submission of batch buffers to the driver with their
> >>>submission to the hardware. This basically means splitting the execbuffer()
> >>>function in half. This change rearranges some code ready for the split to occur.
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index ec274ef..fda9187 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -32,6 +32,7 @@
> >>>  #include "i915_trace.h"
> >>>  #include "intel_drv.h"
> >>>  #include <linux/dma_remapping.h>
> >>>+#include "i915_scheduler.h"
> >>>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >>>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> >>>@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >>>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >>>  		wmb();
> >>>-	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>-	 * any residual writes from the previous batch.
> >>>-	 */
> >>>-	return intel_ring_invalidate_all_caches(ring);
> >>>+	return 0;
> >>>  }
> >>>  static bool
> >>>@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  		}
> >>>  	}
> >>>-	intel_runtime_pm_get(dev_priv);
> >>>-
> >>>  	ret = i915_mutex_lock_interruptible(dev);
> >>>  	if (ret)
> >>>  		goto pre_mutex_err;
> >>>@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	if (ret)
> >>>  		goto err;
> >>>+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>+
> >>>+	/* To be split into two functions here... */
> >>>+
> >>>+	intel_runtime_pm_get(dev_priv);
> >>>+
> >>>+	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>+	 * any residual writes from the previous batch.
> >>>+	 */
> >>>+	ret = intel_ring_invalidate_all_caches(ring);
> >>>+	if (ret)
> >>>+		goto err;
> >>>+
> >>>+	/* Switch to the correct context for the batch */
> >>>  	ret = i915_switch_context(ring, ctx);
> >>>  	if (ret)
> >>>  		goto err;
> >>>@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >>>-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >>>  err:
> >>I'd like Chris to take a look too, but it looks safe afaict.
> >>
> >>Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> >switch_context can fail with EINTR so we really can't move stuff to the
> >active list before that point. Or we need to make sure that all the stuff
> >between the old and new move_to_active callsite can't fail.
> >
> >Or we need to track this and tell userspace with an EIO and adjusted reset
> >stats that something between our point of no return where the kernel
> >committed to executing the batch failed.
> >
> >Or we need to unrol move_to_active (which is currently not really
> >possible).
> >-Daniel
> 
> switch_context can fail with quite a lot of different error codes. Is there
> anything particularly special about EINTR? I can't spot that particular code
> path at the moment.
> 
> The context switch is done at the point of submission to the hardware. As
> batch buffers can be re-ordered between submission to driver and submission
> to hardware, there is no point choosing a context any earlier. Whereas the
> move to active needs to be done at the point of submission to the driver.
> The object needs to be marked as in use even though the batch buffer that
> actually uses it might not be executed for some time. From the software
> viewpoint, the object is in use and all the syncrhonisation code needs to
> know that.
> 
> The scheduler makes the batch buffer execution asynchronous to its
> submission to the driver. There is no way to communicate back a return code
> to user land. Instead, it is up to the scheduler to check the return codes
> from all the execution paths and to retry later if something fails for a
> temporary reason. Or to discard the buffer if it is truly toast.

EINTR is simply really easy to test&hit since you can provoke it with
signals. And X uses signals excessively. One point where EINTR might
happen is in intel_ring_begin, the other when we try to pin the context
into ggtt. The other error codes are true exceptions and will much harder
to hit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list