[Intel-gfx] [PATCH 2/3] drm/i915: s/seqno/request/ tracking inside objects

John Harrison John.C.Harrison at Intel.com
Tue Sep 9 16:14:29 CEST 2014


I pulled a fresh tree on Monday and applied this set of patches. There 
were two conflicts. It looks like nightly does not have 
'i915_gem_context_setparam_ioctl' yet, but the tree the patches came from 
does. Also, my tree has 'DRM_I915_CTX_BAN_PERIOD' instead of 
'ctx->hang_stats.ban_period_seconds'.

However, I can only boot if I have both execlists and PPGTT disabled. 
With just PPGTT enabled, I get continuous GPU hangs and nothing ever 
gets rendered. With execlists enabled, I get a null pointer dereference 
in 'execlists_get_ring'. With both disabled, I can boot to an Ubuntu 
desktop but shortly after it goes pop with 'BUG_ON(obj->active == 0)' in 
'i915_gem_object_retire__read'.

This is running on BDW.

Am I missing some critical earlier patches?

Thanks,
John.


On 06/09/2014 10:28, Chris Wilson wrote:
> At the heart of this change is that the seqno is too low-level an
> abstraction to handle the growing complexities of command tracking, both
> with the introduction of multiple command queues with execlists and the
> potential for reordering with a scheduler. On top of the seqno we have
> the request. Conceptually this is just a fence, but it also has
> substantial bookkeeping of its own in order to track the context and
> batch in flight, for example. It is the central structure upon which we
> can extend with dependency tracking et al.
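
For reference, my reading of what such a request carries; the field names
below are taken from the hunks further down where they appear, the rest is
illustrative rather than the exact struct added by the patch:

struct i915_gem_request {
	struct kref kref;		/* unlocked waiters/probes hold a reference */
	struct drm_i915_private *i915;
	struct intel_engine_cs *engine;	/* engine the commands were emitted to */
	struct intel_context *ctx;	/* context the request executes in */
	struct list_head engine_list;	/* per-engine retirement order */

	u32 seqno;			/* breadcrumb written on completion */
	u32 tail;			/* ring tail after the commands were emitted */
	u32 reset_counter;		/* reset epoch sampled at allocation (see v2 below) */
	unsigned long emitted_jiffies;
};
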
>
> As regards the objects, they were using the seqno as a simple fence,
> upon which we check or even wait for command completion. This patch
> exchanges that seqno/ring pair with the request itself. For the
> majority, the lifetime of the request is ordered by how we retire objects
> then requests. However, both the unlocked waits and probing elsewhere do
> not tie into the normal request lifetimes and so we need to introduce a
> kref. Extending the objects to use the request as the fence naturally
> extends to segregating read/write fence tracking. This is significant
> as it reduces the number of semaphores we need to emit, reducing the
> likelihood of #54226, and improving performance overall.
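
In other words, the object side trades its single (ring, seqno) pair for
per-engine read fences plus a single write fence. A rough picture, using the
names visible in the describe_obj() and error-state hunks below (other
members elided):

struct drm_i915_gem_object {
	/* ... */

	/* One read fence per engine: reads on different engines can now
	 * proceed in parallel without emitting semaphores between them. */
	struct {
		struct i915_gem_request *request;
	} last_read[I915_NUM_ENGINES];

	/* A single write fence (only one engine writes at a time) and the
	 * request that last touched the fence register. */
	struct {
		struct i915_gem_request *request;
	} last_write, last_fence;

	/* Bitmask of engines with outstanding reads, replacing active:1. */
	unsigned int active:I915_NUM_ENGINE_BITS;

	/* ... */
};
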
>
> v2: Rebase and split out the orthogonal tweaks.
>
> A silly thing happened with this patch. It seemed to nullify our earlier
> seqno-vs-interrupt w/a. I could not spot why, but gen6+ started to fail
> with missed interrupts (a good test of our robustness handling). So I
> ripped out the existing ACTHD read and replaced it with a RING_HEAD to
> manually check whether the request is complete. That also had the nice
> consequence of forcing __wait_request() to be the central arbiter of
> request completion. Note that during testing, it was not enough to
> re-enable the old workaround of keeping a forcewake reference whilst
> waiting upon the interrupt+seqno.
>
> The keener-eyed reviewer will also spot that the reset_counter is moved
> into the request, simplifying __wait_request() callsites and reducing the
> number of atomic reads by virtue of moving the check for a pending GPU
> reset to the endpoints of GPU access.
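
So each request presumably records the reset epoch at allocation, and
__wait_request() only needs a single comparison against it. Something like
this sketch (the helper name is mine, not the patch's):

/* Detect a GPU reset that happened between queueing the request and
 * waiting on it, without re-reading the atomic at every callsite. */
static int check_reset_epoch(struct i915_gem_request *rq)
{
	unsigned int reset = atomic_read(&rq->i915->gpu_error.reset_counter);

	if (reset == rq->reset_counter)
		return 0;	/* no reset since this request was constructed */

	/* The GPU was (or is being) reset underneath us; the request is
	 * void, so report -EAGAIN and let the caller restart the ioctl. */
	return -EAGAIN;
}
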
>
> v3: Implement the grand plan
>
> Since execlists landed with its upside-down abstraction, unveil the power
> of the request to remove all the duplication. To gain access to a ring,
> you must allocate a request. To allocate a request you must specify the
> context. Ergo all ring commands are carefully tracked by individual
> requests (which demarcate a single complete transaction with the GPU) in
> a known context (logical partitioning of the GPU with its own set of
> registers and rings - which may be shared with other partitions for
> backwards compatibility).
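
So the rough shape of a ring transaction becomes the following (pseudo-code;
the function names here are my shorthand, not necessarily what the patch
calls them):

	/* Every ring access starts by allocating a request for a context. */
	rq = i915_request_alloc(engine, ctx);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* Build up the transaction; flushes and serialisation against the
	 * GPU caches, other requests and the CPU are tracked on rq. */
	ret = emit_commands(rq);
	if (ret) {
		i915_request_put(rq);	/* requests are kref'd, just drop it */
		return ret;
	}

	/* Commit: emit the breadcrumb and put rq on engine->requests. */
	i915_request_commit(rq);
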
>
> v4:
>
> Tweak locking around execlist submission and request lists and
> remove duplicated execlist code and the peppering of execlist
> specific code throughout the core.
>
> To simplify rebasing, I pulled in the s/ring/engine/ rename; it adds
> a fair amount of noise, but it is of little significance and easy to tune out.
>
> The patch itself consists of 3 heavily intertwined parts:
>
> 0. Rename ring and engine variables to be consistent with their usage.
> 1. Change the ring access API to require the context under which we
>     are operating. This generates a request which we use to build up a
>     ring transaction. The request tracks required flushes and
>     serialisation with the GPU caches, other requests and the CPU.
> 2. Reorder initialisation such that we have a clearly defined context
>     and engines for the early ring access on module load, resume and
>     reset.
> 3. Convert the seqno tracking over to using requests (a la explicit
>     fencing).
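
Concretely, for part 3, a callsite that today waits on a (ring, seqno) pair
should end up waiting on the request instead; roughly (the names on the new
side are my guess, based on the i915_gem_request.c file added below):

	/* before: the object carries a raw seqno plus the ring it came from */
	ret = i915_wait_seqno(obj->ring, obj->last_read_seqno);

	/* after: wait on the request, which already knows its engine,
	 * context and reset epoch, so the callsite carries no extra state */
	ret = i915_request_wait(obj->last_read[engine->id].request);
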
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Damien Lespiau <damien.lespiau at intel.com>
> Cc: Oscar Mateo <oscar.mateo at intel.com>
> Cc: Brad Volkin <bradley.d.volkin at intel.com>
> Cc: "Kukanova, Svetlana" <svetlana.kukanova at intel.com>
> Cc: Akash Goel <akash.goel at intel.com>
> Cc: "Daniel, Thomas" <thomas.daniel at intel.com>
> Cc: "Siluvery, Arun" <arun.siluvery at linux.intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile                |    4 +-
>   drivers/gpu/drm/i915/i915_cmd_parser.c       |  150 +-
>   drivers/gpu/drm/i915/i915_debugfs.c          |  388 ++--
>   drivers/gpu/drm/i915/i915_dma.c              |   18 +-
>   drivers/gpu/drm/i915/i915_drv.c              |   46 +-
>   drivers/gpu/drm/i915/i915_drv.h              |  406 ++--
>   drivers/gpu/drm/i915/i915_gem.c              | 1759 +++++---------
>   drivers/gpu/drm/i915/i915_gem_context.c      |  508 +++--
>   drivers/gpu/drm/i915/i915_gem_debug.c        |  118 -
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  517 ++---
>   drivers/gpu/drm/i915/i915_gem_gtt.c          |  140 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h          |    4 +-
>   drivers/gpu/drm/i915/i915_gem_render_state.c |   69 +-
>   drivers/gpu/drm/i915/i915_gem_render_state.h |   47 -
>   drivers/gpu/drm/i915/i915_gem_request.c      |  651 ++++++
>   drivers/gpu/drm/i915/i915_gem_tiling.c       |    2 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c        |  396 ++--
>   drivers/gpu/drm/i915/i915_irq.c              |  341 +--
>   drivers/gpu/drm/i915/i915_reg.h              |    3 +-
>   drivers/gpu/drm/i915/i915_trace.h            |  215 +-
>   drivers/gpu/drm/i915/intel_display.c         |  355 ++-
>   drivers/gpu/drm/i915/intel_drv.h             |   14 +-
>   drivers/gpu/drm/i915/intel_lrc.c             | 1689 +++-----------
>   drivers/gpu/drm/i915/intel_lrc.h             |   80 +-
>   drivers/gpu/drm/i915/intel_overlay.c         |  200 +-
>   drivers/gpu/drm/i915/intel_pm.c              |   90 +-
>   drivers/gpu/drm/i915/intel_renderstate.h     |    8 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c      | 3171 +++++++++++++-------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h      |  391 ++--
>   29 files changed, 5397 insertions(+), 6383 deletions(-)
>   delete mode 100644 drivers/gpu/drm/i915/i915_gem_debug.c
>   delete mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.h
>   create mode 100644 drivers/gpu/drm/i915/i915_gem_request.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index c1dd485aeb6c..225e8a8206b2 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -17,14 +17,14 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
>   
>   # GEM code
>   i915-y += i915_cmd_parser.o \
> +	  i915_gem.o \
>   	  i915_gem_context.o \
>   	  i915_gem_render_state.o \
> -	  i915_gem_debug.o \
>   	  i915_gem_dmabuf.o \
>   	  i915_gem_evict.o \
>   	  i915_gem_execbuffer.o \
>   	  i915_gem_gtt.o \
> -	  i915_gem.o \
> +	  i915_gem_request.o \
>   	  i915_gem_stolen.o \
>   	  i915_gem_tiling.o \
>   	  i915_gem_userptr.o \
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index c45856bcc8b9..408e0bdba48c 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -501,7 +501,7 @@ static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
>   	return 0;
>   }
>   
> -static bool validate_cmds_sorted(struct intel_engine_cs *ring,
> +static bool validate_cmds_sorted(struct intel_engine_cs *engine,
>   				 const struct drm_i915_cmd_table *cmd_tables,
>   				 int cmd_table_count)
>   {
> @@ -523,7 +523,7 @@ static bool validate_cmds_sorted(struct intel_engine_cs *ring,
>   
>   			if (curr < previous) {
>   				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X prev=0x%08X\n",
> -					  ring->id, i, j, curr, previous);
> +					  engine->id, i, j, curr, previous);
>   				ret = false;
>   			}
>   
> @@ -555,11 +555,11 @@ static bool check_sorted(int ring_id, const u32 *reg_table, int reg_count)
>   	return ret;
>   }
>   
> -static bool validate_regs_sorted(struct intel_engine_cs *ring)
> +static bool validate_regs_sorted(struct intel_engine_cs *engine)
>   {
> -	return check_sorted(ring->id, ring->reg_table, ring->reg_count) &&
> -		check_sorted(ring->id, ring->master_reg_table,
> -			     ring->master_reg_count);
> +	return check_sorted(engine->id, engine->reg_table, engine->reg_count) &&
> +		check_sorted(engine->id, engine->master_reg_table,
> +			     engine->master_reg_count);
>   }
>   
>   struct cmd_node {
> @@ -583,13 +583,13 @@ struct cmd_node {
>    */
>   #define CMD_HASH_MASK STD_MI_OPCODE_MASK
>   
> -static int init_hash_table(struct intel_engine_cs *ring,
> +static int init_hash_table(struct intel_engine_cs *engine,
>   			   const struct drm_i915_cmd_table *cmd_tables,
>   			   int cmd_table_count)
>   {
>   	int i, j;
>   
> -	hash_init(ring->cmd_hash);
> +	hash_init(engine->cmd_hash);
>   
>   	for (i = 0; i < cmd_table_count; i++) {
>   		const struct drm_i915_cmd_table *table = &cmd_tables[i];
> @@ -604,7 +604,7 @@ static int init_hash_table(struct intel_engine_cs *ring,
>   				return -ENOMEM;
>   
>   			desc_node->desc = desc;
> -			hash_add(ring->cmd_hash, &desc_node->node,
> +			hash_add(engine->cmd_hash, &desc_node->node,
>   				 desc->cmd.value & CMD_HASH_MASK);
>   		}
>   	}
> @@ -612,21 +612,21 @@ static int init_hash_table(struct intel_engine_cs *ring,
>   	return 0;
>   }
>   
> -static void fini_hash_table(struct intel_engine_cs *ring)
> +static void fini_hash_table(struct intel_engine_cs *engine)
>   {
>   	struct hlist_node *tmp;
>   	struct cmd_node *desc_node;
>   	int i;
>   
> -	hash_for_each_safe(ring->cmd_hash, i, tmp, desc_node, node) {
> +	hash_for_each_safe(engine->cmd_hash, i, tmp, desc_node, node) {
>   		hash_del(&desc_node->node);
>   		kfree(desc_node);
>   	}
>   }
>   
>   /**
> - * i915_cmd_parser_init_ring() - set cmd parser related fields for a ringbuffer
> - * @ring: the ringbuffer to initialize
> + * i915_cmd_parser_init_engine() - set cmd parser related fields for a ringbuffer
> + * @engine: the ringbuffer to initialize
>    *
>    * Optionally initializes fields related to batch buffer command parsing in the
>    * struct intel_engine_cs based on whether the platform requires software
> @@ -634,18 +634,18 @@ static void fini_hash_table(struct intel_engine_cs *ring)
>    *
>    * Return: non-zero if initialization fails
>    */
> -int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
> +int i915_cmd_parser_init_engine(struct intel_engine_cs *engine)
>   {
>   	const struct drm_i915_cmd_table *cmd_tables;
>   	int cmd_table_count;
>   	int ret;
>   
> -	if (!IS_GEN7(ring->dev))
> +	if (!IS_GEN7(engine->i915))
>   		return 0;
>   
> -	switch (ring->id) {
> +	switch (engine->id) {
>   	case RCS:
> -		if (IS_HASWELL(ring->dev)) {
> +		if (IS_HASWELL(engine->i915)) {
>   			cmd_tables = hsw_render_ring_cmds;
>   			cmd_table_count =
>   				ARRAY_SIZE(hsw_render_ring_cmds);
> @@ -654,26 +654,26 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
>   			cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
>   		}
>   
> -		ring->reg_table = gen7_render_regs;
> -		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
> +		engine->reg_table = gen7_render_regs;
> +		engine->reg_count = ARRAY_SIZE(gen7_render_regs);
>   
> -		if (IS_HASWELL(ring->dev)) {
> -			ring->master_reg_table = hsw_master_regs;
> -			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
> +		if (IS_HASWELL(engine->i915)) {
> +			engine->master_reg_table = hsw_master_regs;
> +			engine->master_reg_count = ARRAY_SIZE(hsw_master_regs);
>   		} else {
> -			ring->master_reg_table = ivb_master_regs;
> -			ring->master_reg_count = ARRAY_SIZE(ivb_master_regs);
> +			engine->master_reg_table = ivb_master_regs;
> +			engine->master_reg_count = ARRAY_SIZE(ivb_master_regs);
>   		}
>   
> -		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> +		engine->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
>   		break;
>   	case VCS:
>   		cmd_tables = gen7_video_cmds;
>   		cmd_table_count = ARRAY_SIZE(gen7_video_cmds);
> -		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		engine->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
>   		break;
>   	case BCS:
> -		if (IS_HASWELL(ring->dev)) {
> +		if (IS_HASWELL(engine->i915)) {
>   			cmd_tables = hsw_blt_ring_cmds;
>   			cmd_table_count = ARRAY_SIZE(hsw_blt_ring_cmds);
>   		} else {
> @@ -681,68 +681,68 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
>   			cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
>   		}
>   
> -		ring->reg_table = gen7_blt_regs;
> -		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
> +		engine->reg_table = gen7_blt_regs;
> +		engine->reg_count = ARRAY_SIZE(gen7_blt_regs);
>   
> -		if (IS_HASWELL(ring->dev)) {
> -			ring->master_reg_table = hsw_master_regs;
> -			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
> +		if (IS_HASWELL(engine->i915)) {
> +			engine->master_reg_table = hsw_master_regs;
> +			engine->master_reg_count = ARRAY_SIZE(hsw_master_regs);
>   		} else {
> -			ring->master_reg_table = ivb_master_regs;
> -			ring->master_reg_count = ARRAY_SIZE(ivb_master_regs);
> +			engine->master_reg_table = ivb_master_regs;
> +			engine->master_reg_count = ARRAY_SIZE(ivb_master_regs);
>   		}
>   
> -		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> +		engine->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
>   		break;
>   	case VECS:
>   		cmd_tables = hsw_vebox_cmds;
>   		cmd_table_count = ARRAY_SIZE(hsw_vebox_cmds);
>   		/* VECS can use the same length_mask function as VCS */
> -		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		engine->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
>   		break;
>   	default:
> -		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
> -			  ring->id);
> +		DRM_ERROR("CMD: cmd_parser_init with unknown engine: %d\n",
> +			  engine->id);
>   		BUG();
>   	}
>   
> -	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
> -	BUG_ON(!validate_regs_sorted(ring));
> +	BUG_ON(!validate_cmds_sorted(engine, cmd_tables, cmd_table_count));
> +	BUG_ON(!validate_regs_sorted(engine));
>   
> -	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
> +	ret = init_hash_table(engine, cmd_tables, cmd_table_count);
>   	if (ret) {
>   		DRM_ERROR("CMD: cmd_parser_init failed!\n");
> -		fini_hash_table(ring);
> +		fini_hash_table(engine);
>   		return ret;
>   	}
>   
> -	ring->needs_cmd_parser = true;
> +	engine->needs_cmd_parser = true;
>   
>   	return 0;
>   }
>   
>   /**
> - * i915_cmd_parser_fini_ring() - clean up cmd parser related fields
> - * @ring: the ringbuffer to clean up
> + * i915_cmd_parser_fini_engine() - clean up cmd parser related fields
> + * @engine: the ringbuffer to clean up
>    *
>    * Releases any resources related to command parsing that may have been
> - * initialized for the specified ring.
> + * initialized for the specified engine.
>    */
> -void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
> +void i915_cmd_parser_fini_engine(struct intel_engine_cs *engine)
>   {
> -	if (!ring->needs_cmd_parser)
> +	if (!engine->needs_cmd_parser)
>   		return;
>   
> -	fini_hash_table(ring);
> +	fini_hash_table(engine);
>   }
>   
>   static const struct drm_i915_cmd_descriptor*
> -find_cmd_in_table(struct intel_engine_cs *ring,
> +find_cmd_in_table(struct intel_engine_cs *engine,
>   		  u32 cmd_header)
>   {
>   	struct cmd_node *desc_node;
>   
> -	hash_for_each_possible(ring->cmd_hash, desc_node, node,
> +	hash_for_each_possible(engine->cmd_hash, desc_node, node,
>   			       cmd_header & CMD_HASH_MASK) {
>   		const struct drm_i915_cmd_descriptor *desc = desc_node->desc;
>   		u32 masked_cmd = desc->cmd.mask & cmd_header;
> @@ -759,23 +759,23 @@ find_cmd_in_table(struct intel_engine_cs *ring,
>    * Returns a pointer to a descriptor for the command specified by cmd_header.
>    *
>    * The caller must supply space for a default descriptor via the default_desc
> - * parameter. If no descriptor for the specified command exists in the ring's
> + * parameter. If no descriptor for the specified command exists in the engine's
>    * command parser tables, this function fills in default_desc based on the
> - * ring's default length encoding and returns default_desc.
> + * engine's default length encoding and returns default_desc.
>    */
>   static const struct drm_i915_cmd_descriptor*
> -find_cmd(struct intel_engine_cs *ring,
> +find_cmd(struct intel_engine_cs *engine,
>   	 u32 cmd_header,
>   	 struct drm_i915_cmd_descriptor *default_desc)
>   {
>   	const struct drm_i915_cmd_descriptor *desc;
>   	u32 mask;
>   
> -	desc = find_cmd_in_table(ring, cmd_header);
> +	desc = find_cmd_in_table(engine, cmd_header);
>   	if (desc)
>   		return desc;
>   
> -	mask = ring->get_cmd_length_mask(cmd_header);
> +	mask = engine->get_cmd_length_mask(cmd_header);
>   	if (!mask)
>   		return NULL;
>   
> @@ -832,17 +832,17 @@ finish:
>   }
>   
>   /**
> - * i915_needs_cmd_parser() - should a given ring use software command parsing?
> - * @ring: the ring in question
> + * i915_needs_cmd_parser() - should a given engine use software command parsing?
> + * @engine: the engine in question
>    *
>    * Only certain platforms require software batch buffer command parsing, and
>    * only when enabled via module paramter.
>    *
> - * Return: true if the ring requires software command parsing
> + * Return: true if the engine requires software command parsing
>    */
> -bool i915_needs_cmd_parser(struct intel_engine_cs *ring)
> +bool i915_needs_cmd_parser(struct intel_engine_cs *engine)
>   {
> -	if (!ring->needs_cmd_parser)
> +	if (!engine->needs_cmd_parser)
>   		return false;
>   
>   	/*
> @@ -850,13 +850,13 @@ bool i915_needs_cmd_parser(struct intel_engine_cs *ring)
>   	 * disabled. That will cause all of the parser's PPGTT checks to
>   	 * fail. For now, disable parsing when PPGTT is off.
>   	 */
> -	if (USES_PPGTT(ring->dev))
> +	if (USES_PPGTT(engine->dev))
>   		return false;
>   
>   	return (i915.enable_cmd_parser == 1);
>   }
>   
> -static bool check_cmd(const struct intel_engine_cs *ring,
> +static bool check_cmd(const struct intel_engine_cs *engine,
>   		      const struct drm_i915_cmd_descriptor *desc,
>   		      const u32 *cmd,
>   		      const bool is_master,
> @@ -893,16 +893,16 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>   				*oacontrol_set = (cmd[2] != 0);
>   		}
>   
> -		if (!valid_reg(ring->reg_table,
> -			       ring->reg_count, reg_addr)) {
> +		if (!valid_reg(engine->reg_table,
> +			       engine->reg_count, reg_addr)) {
>   			if (!is_master ||
> -			    !valid_reg(ring->master_reg_table,
> -				       ring->master_reg_count,
> +			    !valid_reg(engine->master_reg_table,
> +				       engine->master_reg_count,
>   				       reg_addr)) {
> -				DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
> +				DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (engine=%d)\n",
>   						 reg_addr,
>   						 *cmd,
> -						 ring->id);
> +						 engine->id);
>   				return false;
>   			}
>   		}
> @@ -931,11 +931,11 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>   				desc->bits[i].mask;
>   
>   			if (dword != desc->bits[i].expected) {
> -				DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> +				DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (engine=%d)\n",
>   						 *cmd,
>   						 desc->bits[i].mask,
>   						 desc->bits[i].expected,
> -						 dword, ring->id);
> +						 dword, engine->id);
>   				return false;
>   			}
>   		}
> @@ -948,7 +948,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>   
>   /**
>    * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
> - * @ring: the ring on which the batch is to execute
> + * @engine: the engine on which the batch is to execute
>    * @batch_obj: the batch buffer in question
>    * @batch_start_offset: byte offset in the batch at which execution starts
>    * @is_master: is the submitting process the drm master?
> @@ -958,7 +958,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
>    *
>    * Return: non-zero if the parser finds violations or otherwise fails
>    */
> -int i915_parse_cmds(struct intel_engine_cs *ring,
> +int i915_parse_cmds(struct intel_engine_cs *engine,
>   		    struct drm_i915_gem_object *batch_obj,
>   		    u32 batch_start_offset,
>   		    bool is_master)
> @@ -995,7 +995,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>   		if (*cmd == MI_BATCH_BUFFER_END)
>   			break;
>   
> -		desc = find_cmd(ring, *cmd, &default_desc);
> +		desc = find_cmd(engine, *cmd, &default_desc);
>   		if (!desc) {
>   			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
>   					 *cmd);
> @@ -1017,7 +1017,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>   			break;
>   		}
>   
> -		if (!check_cmd(ring, desc, cmd, is_master, &oacontrol_set)) {
> +		if (!check_cmd(engine, desc, cmd, is_master, &oacontrol_set)) {
>   			ret = -EINVAL;
>   			break;
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2cbc85f3b237..4d0b5cff5291 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -123,19 +123,22 @@ static void
>   describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   {
>   	struct i915_vma *vma;
> -	int pin_count = 0;
> +	int pin_count = 0, n;
>   
> -	seq_printf(m, "%pK: %s%s%s %8zdKiB %02x %02x %u %u %u%s%s%s",
> +	seq_printf(m, "%pK: %s%s%s %8zdKiB %02x %02x [",
>   		   &obj->base,
>   		   get_pin_flag(obj),
>   		   get_tiling_flag(obj),
>   		   get_global_flag(obj),
>   		   obj->base.size / 1024,
>   		   obj->base.read_domains,
> -		   obj->base.write_domain,
> -		   obj->last_read_seqno,
> -		   obj->last_write_seqno,
> -		   obj->last_fenced_seqno,
> +		   obj->base.write_domain);
> +	for (n = 0; n < ARRAY_SIZE(obj->last_read); n++)
> +		seq_printf(m, " %x",
> +			   i915_request_seqno(obj->last_read[n].request));
> +	seq_printf(m, " ] %x %x%s%s%s",
> +		   i915_request_seqno(obj->last_write.request),
> +		   i915_request_seqno(obj->last_fence.request),
>   		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
>   		   obj->dirty ? " dirty" : "",
>   		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
> @@ -168,15 +171,15 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		*t = '\0';
>   		seq_printf(m, " (%s mappable)", s);
>   	}
> -	if (obj->ring != NULL)
> -		seq_printf(m, " (%s)", obj->ring->name);
> +	if (obj->last_write.request)
> +		seq_printf(m, " (%s)", obj->last_write.request->engine->name);
>   	if (obj->frontbuffer_bits)
>   		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
>   }
>   
>   static void describe_ctx(struct seq_file *m, struct intel_context *ctx)
>   {
> -	seq_putc(m, ctx->legacy_hw_ctx.initialized ? 'I' : 'i');
> +	seq_putc(m, ctx->ring[RCS].initialized ? 'I' : 'i');
>   	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
>   	seq_putc(m, ' ');
>   }
> @@ -336,7 +339,7 @@ static int per_file_stats(int id, void *ptr, void *data)
>   			if (ppgtt->file_priv != stats->file_priv)
>   				continue;
>   
> -			if (obj->ring) /* XXX per-vma statistic */
> +			if (obj->active) /* XXX per-vma statistic */
>   				stats->active += obj->base.size;
>   			else
>   				stats->inactive += obj->base.size;
> @@ -346,7 +349,7 @@ static int per_file_stats(int id, void *ptr, void *data)
>   	} else {
>   		if (i915_gem_obj_ggtt_bound(obj)) {
>   			stats->global += obj->base.size;
> -			if (obj->ring)
> +			if (obj->active)
>   				stats->active += obj->base.size;
>   			else
>   				stats->inactive += obj->base.size;
> @@ -544,14 +547,14 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>   				seq_printf(m, "Flip pending (waiting for vsync) on pipe %c (plane %c)\n",
>   					   pipe, plane);
>   			}
> -			if (work->flip_queued_ring) {
> +			if (work->flip_queued_request) {
> +				struct i915_gem_request *rq =
> +					work->flip_queued_request;
>   				seq_printf(m, "Flip queued on %s at seqno %u, next seqno %u [current breadcrumb %u], completed? %d\n",
> -					   work->flip_queued_ring->name,
> -					   work->flip_queued_seqno,
> -					   dev_priv->next_seqno,
> -					   work->flip_queued_ring->get_seqno(work->flip_queued_ring, true),
> -					   i915_seqno_passed(work->flip_queued_ring->get_seqno(work->flip_queued_ring, true),
> -							     work->flip_queued_seqno));
> +					   rq->engine->name,
> +					   rq->seqno, rq->i915->next_seqno,
> +					   rq->engine->get_seqno(rq->engine),
> +					   __i915_request_complete__wa(rq));
>   			} else
>   				seq_printf(m, "Flip not associated with any ring\n");
>   			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
> @@ -588,8 +591,8 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	struct drm_i915_gem_request *gem_request;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_request *rq;
>   	int ret, count, i;
>   
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -597,17 +600,15 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   		return ret;
>   
>   	count = 0;
> -	for_each_ring(ring, dev_priv, i) {
> -		if (list_empty(&ring->request_list))
> +	for_each_engine(engine, dev_priv, i) {
> +		if (list_empty(&engine->requests))
>   			continue;
>   
> -		seq_printf(m, "%s requests:\n", ring->name);
> -		list_for_each_entry(gem_request,
> -				    &ring->request_list,
> -				    list) {
> +		seq_printf(m, "%s requests:\n", engine->name);
> +		list_for_each_entry(rq, &engine->requests, engine_list) {
>   			seq_printf(m, "    %d @ %d\n",
> -				   gem_request->seqno,
> -				   (int) (jiffies - gem_request->emitted_jiffies));
> +				   rq->seqno,
> +				   (int)(jiffies - rq->emitted_jiffies));
>   		}
>   		count++;
>   	}
> @@ -619,13 +620,17 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>   
> -static void i915_ring_seqno_info(struct seq_file *m,
> -				 struct intel_engine_cs *ring)
> +static void i915_engine_seqno_info(struct seq_file *m,
> +				   struct intel_engine_cs *engine)
>   {
> -	if (ring->get_seqno) {
> -		seq_printf(m, "Current sequence (%s): %u\n",
> -			   ring->name, ring->get_seqno(ring, false));
> -	}
> +	seq_printf(m, "Current sequence (%s): seqno=%u, tag=%u [last breadcrumb %u, last request %u], next seqno=%u, next tag=%u\n",
> +		   engine->name,
> +		   engine->get_seqno(engine),
> +		   engine->tag,
> +		   engine->breadcrumb[engine->id],
> +		   engine->last_request ? engine->last_request->seqno : 0,
> +		   engine->i915->next_seqno,
> +		   engine->next_tag);
>   }
>   
>   static int i915_gem_seqno_info(struct seq_file *m, void *data)
> @@ -633,7 +638,7 @@ static int i915_gem_seqno_info(struct seq_file *m, void *data)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int ret, i;
>   
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -641,8 +646,8 @@ static int i915_gem_seqno_info(struct seq_file *m, void *data)
>   		return ret;
>   	intel_runtime_pm_get(dev_priv);
>   
> -	for_each_ring(ring, dev_priv, i)
> -		i915_ring_seqno_info(m, ring);
> +	for_each_engine(engine, dev_priv, i)
> +		i915_engine_seqno_info(m, engine);
>   
>   	intel_runtime_pm_put(dev_priv);
>   	mutex_unlock(&dev->struct_mutex);
> @@ -656,7 +661,7 @@ static int i915_interrupt_info(struct seq_file *m, void *data)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int ret, i, pipe;
>   
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -823,13 +828,13 @@ static int i915_interrupt_info(struct seq_file *m, void *data)
>   		seq_printf(m, "Graphics Interrupt mask:		%08x\n",
>   			   I915_READ(GTIMR));
>   	}
> -	for_each_ring(ring, dev_priv, i) {
> +	for_each_engine(engine, dev_priv, i) {
>   		if (INTEL_INFO(dev)->gen >= 6) {
>   			seq_printf(m,
>   				   "Graphics Interrupt mask (%s):	%08x\n",
> -				   ring->name, I915_READ_IMR(ring));
> +				   engine->name, I915_READ_IMR(engine));
>   		}
> -		i915_ring_seqno_info(m, ring);
> +		i915_engine_seqno_info(m, engine);
>   	}
>   	intel_runtime_pm_put(dev_priv);
>   	mutex_unlock(&dev->struct_mutex);
> @@ -871,12 +876,12 @@ static int i915_hws_info(struct seq_file *m, void *data)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	const u32 *hws;
>   	int i;
>   
> -	ring = &dev_priv->ring[(uintptr_t)node->info_ent->data];
> -	hws = ring->status_page.page_addr;
> +	engine = &dev_priv->engine[(uintptr_t)node->info_ent->data];
> +	hws = engine->status_page.page_addr;
>   	if (hws == NULL)
>   		return 0;
>   
> @@ -1000,7 +1005,7 @@ i915_next_seqno_set(void *data, u64 val)
>   	struct drm_device *dev = data;
>   	int ret;
>   
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	ret = i915_mutex_lock_interruptible(dev);
>   	if (ret)
>   		return ret;
>   
> @@ -1701,12 +1706,10 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>   
> -static void describe_ctx_ringbuf(struct seq_file *m,
> -				 struct intel_ringbuffer *ringbuf)
> +static void describe_ring(struct seq_file *m, struct intel_ringbuffer *ring)
>   {
>   	seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: %d)",
> -		   ringbuf->space, ringbuf->head, ringbuf->tail,
> -		   ringbuf->last_retired_head);
> +		   ring->space, ring->head, ring->tail, ring->retired_head);
>   }
>   
>   static int i915_context_status(struct seq_file *m, void *unused)
> @@ -1714,7 +1717,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	struct intel_context *ctx;
>   	int ret, i;
>   
> @@ -1728,42 +1731,26 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   		seq_putc(m, '\n');
>   	}
>   
> -	if (dev_priv->ips.renderctx) {
> -		seq_puts(m, "render context ");
> -		describe_obj(m, dev_priv->ips.renderctx);
> -		seq_putc(m, '\n');
> -	}
> -
>   	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> -		if (!i915.enable_execlists &&
> -		    ctx->legacy_hw_ctx.rcs_state == NULL)
> -			continue;
> -
>   		seq_puts(m, "HW context ");
>   		describe_ctx(m, ctx);
> -		for_each_ring(ring, dev_priv, i) {
> -			if (ring->default_context == ctx)
> +		for_each_engine(engine, dev_priv, i) {
> +			if (engine->default_context == ctx)
>   				seq_printf(m, "(default context %s) ",
> -					   ring->name);
> +					   engine->name);
>   		}
>   
> -		if (i915.enable_execlists) {
> +		seq_putc(m, '\n');
> +		for_each_engine(engine, dev_priv, i) {
> +			struct drm_i915_gem_object *obj = ctx->ring[i].state;
> +			struct intel_ringbuffer *ring = ctx->ring[i].ring;
> +
> +			seq_printf(m, "%s: ", engine->name);
> +			if (obj)
> +				describe_obj(m, obj);
> +			if (ring)
> +				describe_ring(m, ring);
>   			seq_putc(m, '\n');
> -			for_each_ring(ring, dev_priv, i) {
> -				struct drm_i915_gem_object *ctx_obj =
> -					ctx->engine[i].state;
> -				struct intel_ringbuffer *ringbuf =
> -					ctx->engine[i].ringbuf;
> -
> -				seq_printf(m, "%s: ", ring->name);
> -				if (ctx_obj)
> -					describe_obj(m, ctx_obj);
> -				if (ringbuf)
> -					describe_ctx_ringbuf(m, ringbuf);
> -				seq_putc(m, '\n');
> -			}
> -		} else {
> -			describe_obj(m, ctx->legacy_hw_ctx.rcs_state);
>   		}
>   
>   		seq_putc(m, '\n');
> @@ -1778,45 +1765,50 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>   {
>   	struct drm_info_node *node = (struct drm_info_node *) m->private;
>   	struct drm_device *dev = node->minor->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	struct intel_context *ctx;
> +	struct intel_engine_cs *engine;
>   	int ret, i;
>   
> -	if (!i915.enable_execlists) {
> -		seq_printf(m, "Logical Ring Contexts are disabled\n");
> -		return 0;
> -	}
> -
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
>   	if (ret)
>   		return ret;
>   
> -	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> -		for_each_ring(ring, dev_priv, i) {
> -			struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> +	for_each_engine(engine, to_i915(dev), i) {
> +		struct intel_ringbuffer *ring;
>   
> -			if (ring->default_context == ctx)
> -				continue;
> +		list_for_each_entry(ring, &engine->rings, engine_list) {
> +			struct intel_context *ctx = ring->ctx;
> +			struct task_struct *task;
> +
> +			seq_printf(m, "CONTEXT: %s", engine->name);
> +
> +			rcu_read_lock();
> +			task = ctx->file_priv ? pid_task(ctx->file_priv->file->pid, PIDTYPE_PID) : NULL;
> +			seq_printf(m, " %d:%d\n", task ? task->pid : 0, ctx->file_priv ? ctx->user_handle : 0);
> +			rcu_read_unlock();
>   
> -			if (ctx_obj) {
> -				struct page *page = i915_gem_object_get_page(ctx_obj, 1);
> -				uint32_t *reg_state = kmap_atomic(page);
> +			if (engine->execlists_enabled &&
> +			    ctx->ring[engine->id].state) {
> +				struct drm_i915_gem_object *obj;
> +				struct page *page;
> +				uint32_t *reg_state;
>   				int j;
>   
> -				seq_printf(m, "CONTEXT: %s %u\n", ring->name,
> -						intel_execlists_ctx_id(ctx_obj));
> +				obj = ctx->ring[engine->id].state;
> +				page = i915_gem_object_get_page(obj, 1);
> +				reg_state = kmap_atomic(page);
>   
> +				seq_printf(m, "\tLRCA:\n");
>   				for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
>   					seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
> -					i915_gem_obj_ggtt_offset(ctx_obj) + 4096 + (j * 4),
> +					i915_gem_obj_ggtt_offset(obj) + 4096 + (j * 4),
>   					reg_state[j], reg_state[j + 1],
>   					reg_state[j + 2], reg_state[j + 3]);
>   				}
>   				kunmap_atomic(reg_state);
>   
>   				seq_putc(m, '\n');
> -			}
> +			} else
> +				seq_puts(m, "\tLogical Ring Contexts are disabled\n");
>   		}
>   	}
>   
> @@ -1830,7 +1822,7 @@ static int i915_execlists(struct seq_file *m, void *data)
>   	struct drm_info_node *node = (struct drm_info_node *)m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	u32 status_pointer;
>   	u8 read_pointer;
>   	u8 write_pointer;
> @@ -1840,31 +1832,31 @@ static int i915_execlists(struct seq_file *m, void *data)
>   	int ring_id, i;
>   	int ret;
>   
> -	if (!i915.enable_execlists) {
> -		seq_puts(m, "Logical Ring Contexts are disabled\n");
> -		return 0;
> -	}
> -
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
>   	if (ret)
>   		return ret;
>   
> -	for_each_ring(ring, dev_priv, ring_id) {
> -		struct intel_ctx_submit_request *head_req = NULL;
> +	for_each_engine(engine, dev_priv, ring_id) {
> +		struct i915_gem_request *rq = NULL;
>   		int count = 0;
>   		unsigned long flags;
>   
> -		seq_printf(m, "%s\n", ring->name);
> +		seq_printf(m, "%s\n", engine->name);
> +
> +		if (!engine->execlists_enabled) {
> +			seq_puts(m, "\tExeclists are disabled\n");
> +			continue;
> +		}
>   
> -		status = I915_READ(RING_EXECLIST_STATUS(ring));
> -		ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
> +		status = I915_READ(RING_EXECLIST_STATUS(engine));
> +		ctx_id = I915_READ(RING_EXECLIST_STATUS(engine) + 4);
>   		seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
>   			   status, ctx_id);
>   
> -		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +		status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
>   		seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
>   
> -		read_pointer = ring->next_context_status_buffer;
> +		read_pointer = engine->next_context_status_buffer;
>   		write_pointer = status_pointer & 0x07;
>   		if (read_pointer > write_pointer)
>   			write_pointer += 6;
> @@ -1872,29 +1864,33 @@ static int i915_execlists(struct seq_file *m, void *data)
>   			   read_pointer, write_pointer);
>   
>   		for (i = 0; i < 6; i++) {
> -			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
> -			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i + 4);
> +			status = I915_READ(RING_CONTEXT_STATUS_BUF(engine) + 8*i);
> +			ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(engine) + 8*i + 4);
>   
>   			seq_printf(m, "\tStatus buffer %d: 0x%08X, context: %u\n",
>   				   i, status, ctx_id);
>   		}
>   
> -		spin_lock_irqsave(&ring->execlist_lock, flags);
> -		list_for_each(cursor, &ring->execlist_queue)
> +		spin_lock_irqsave(&engine->irqlock, flags);
> +		list_for_each(cursor, &engine->pending)
>   			count++;
> -		head_req = list_first_entry_or_null(&ring->execlist_queue,
> -				struct intel_ctx_submit_request, execlist_link);
> -		spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +		rq = list_first_entry_or_null(&engine->pending, typeof(*rq), engine_list);
> +		spin_unlock_irqrestore(&engine->irqlock, flags);
>   
>   		seq_printf(m, "\t%d requests in queue\n", count);
> -		if (head_req) {
> -			struct drm_i915_gem_object *ctx_obj;
> -
> -			ctx_obj = head_req->ctx->engine[ring_id].state;
> -			seq_printf(m, "\tHead request id: %u\n",
> -				   intel_execlists_ctx_id(ctx_obj));
> -			seq_printf(m, "\tHead request tail: %u\n",
> -				   head_req->tail);
> +		if (rq) {
> +			struct intel_context *ctx = rq->ctx;
> +			struct task_struct *task;
> +
> +			seq_printf(m, "\tHead request ctx:");
> +
> +			rcu_read_lock();
> +			task = ctx->file_priv ? pid_task(ctx->file_priv->file->pid, PIDTYPE_PID) : NULL;
> +			seq_printf(m, " %d:%d\n", task ? task->pid : 0, ctx->file_priv ? ctx->user_handle : 0);
> +			rcu_read_unlock();
> +
> +			seq_printf(m, "\tHead request tail: %u\n", rq->tail);
> +			seq_printf(m, "\tHead request seqno: %d\n", rq->seqno);
>   		}
>   
>   		seq_putc(m, '\n');
> @@ -2025,7 +2021,7 @@ static int per_file_ctx(int id, void *ptr, void *data)
>   static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>   	int unused, i;
>   
> @@ -2034,13 +2030,13 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   
>   	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
>   	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
> -	for_each_ring(ring, dev_priv, unused) {
> -		seq_printf(m, "%s\n", ring->name);
> +	for_each_engine(engine, dev_priv, unused) {
> +		seq_printf(m, "%s\n", engine->name);
>   		for (i = 0; i < 4; i++) {
>   			u32 offset = 0x270 + i * 8;
> -			u64 pdp = I915_READ(ring->mmio_base + offset + 4);
> +			u64 pdp = I915_READ(engine->mmio_base + offset + 4);
>   			pdp <<= 32;
> -			pdp |= I915_READ(ring->mmio_base + offset);
> +			pdp |= I915_READ(engine->mmio_base + offset);
>   			seq_printf(m, "\tPDP%d 0x%016llx\n", i, pdp);
>   		}
>   	}
> @@ -2049,20 +2045,20 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	struct drm_file *file;
>   	int i;
>   
>   	if (INTEL_INFO(dev)->gen == 6)
>   		seq_printf(m, "GFX_MODE: 0x%08x\n", I915_READ(GFX_MODE));
>   
> -	for_each_ring(ring, dev_priv, i) {
> -		seq_printf(m, "%s\n", ring->name);
> +	for_each_engine(engine, dev_priv, i) {
> +		seq_printf(m, "%s\n", engine->name);
>   		if (INTEL_INFO(dev)->gen == 7)
> -			seq_printf(m, "GFX_MODE: 0x%08x\n", I915_READ(RING_MODE_GEN7(ring)));
> -		seq_printf(m, "PP_DIR_BASE: 0x%08x\n", I915_READ(RING_PP_DIR_BASE(ring)));
> -		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
> -		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
> +			seq_printf(m, "GFX_MODE: 0x%08x\n", I915_READ(RING_MODE_GEN7(engine)));
> +		seq_printf(m, "PP_DIR_BASE: 0x%08x\n", I915_READ(RING_PP_DIR_BASE(engine)));
> +		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(engine)));
> +		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(engine)));
>   	}
>   	if (dev_priv->mm.aliasing_ppgtt) {
>   		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
> @@ -2549,67 +2545,62 @@ static int i915_semaphore_status(struct seq_file *m, void *unused)
>   	struct drm_info_node *node = (struct drm_info_node *) m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
>   	int i, j, ret;
>   
> -	if (!i915_semaphore_is_enabled(dev)) {
> -		seq_puts(m, "Semaphores are disabled\n");
> -		return 0;
> -	}
> -
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
>   	if (ret)
>   		return ret;
>   	intel_runtime_pm_get(dev_priv);
>   
> -	if (IS_BROADWELL(dev)) {
> -		struct page *page;
> -		uint64_t *seqno;
> +	seq_puts(m, "  Last breadcrumb:");
> +	for_each_engine(engine, dev_priv, i)
> +		for (j = 0; j < num_rings; j++)
> +			seq_printf(m, "0x%08x\n",
> +				   engine->breadcrumb[j]);
> +	seq_putc(m, '\n');
>   
> -		page = i915_gem_object_get_page(dev_priv->semaphore_obj, 0);
> +	if (engine->semaphore.wait) {
> +		if (IS_BROADWELL(dev)) {
> +			struct page *page;
> +			uint64_t *seqno;
>   
> -		seqno = (uint64_t *)kmap_atomic(page);
> -		for_each_ring(ring, dev_priv, i) {
> -			uint64_t offset;
> +			page = i915_gem_object_get_page(dev_priv->semaphore_obj, 0);
>   
> -			seq_printf(m, "%s\n", ring->name);
> +			seqno = (uint64_t *)kmap_atomic(page);
> +			for_each_engine(engine, dev_priv, i) {
> +				uint64_t offset;
>   
> -			seq_puts(m, "  Last signal:");
> -			for (j = 0; j < num_rings; j++) {
> -				offset = i * I915_NUM_RINGS + j;
> -				seq_printf(m, "0x%08llx (0x%02llx) ",
> -					   seqno[offset], offset * 8);
> -			}
> -			seq_putc(m, '\n');
> +				seq_printf(m, "%s\n", engine->name);
>   
> -			seq_puts(m, "  Last wait:  ");
> -			for (j = 0; j < num_rings; j++) {
> -				offset = i + (j * I915_NUM_RINGS);
> -				seq_printf(m, "0x%08llx (0x%02llx) ",
> -					   seqno[offset], offset * 8);
> -			}
> -			seq_putc(m, '\n');
> +				seq_puts(m, "  Last signal:");
> +				for (j = 0; j < num_rings; j++) {
> +					offset = i * I915_NUM_ENGINES + j;
> +					seq_printf(m, "0x%08llx (0x%02llx) ",
> +						   seqno[offset], offset * 8);
> +				}
> +				seq_putc(m, '\n');
>   
> -		}
> -		kunmap_atomic(seqno);
> -	} else {
> -		seq_puts(m, "  Last signal:");
> -		for_each_ring(ring, dev_priv, i)
> -			for (j = 0; j < num_rings; j++)
> -				seq_printf(m, "0x%08x\n",
> -					   I915_READ(ring->semaphore.mbox.signal[j]));
> -		seq_putc(m, '\n');
> -	}
> +				seq_puts(m, "  Last wait:  ");
> +				for (j = 0; j < num_rings; j++) {
> +					offset = i + (j * I915_NUM_ENGINES);
> +					seq_printf(m, "0x%08llx (0x%02llx) ",
> +						   seqno[offset], offset * 8);
> +				}
> +				seq_putc(m, '\n');
>   
> -	seq_puts(m, "\nSync seqno:\n");
> -	for_each_ring(ring, dev_priv, i) {
> -		for (j = 0; j < num_rings; j++) {
> -			seq_printf(m, "  0x%08x ", ring->semaphore.sync_seqno[j]);
> +			}
> +			kunmap_atomic(seqno);
> +		} else {
> +			seq_puts(m, "  Last signal:");
> +			for_each_engine(engine, dev_priv, i)
> +				for (j = 0; j < num_rings; j++)
> +					seq_printf(m, "0x%08x\n",
> +						   I915_READ(engine->semaphore.mbox.signal[j]));
> +			seq_putc(m, '\n');
>   		}
> -		seq_putc(m, '\n');
>   	}
> -	seq_putc(m, '\n');
>   
>   	intel_runtime_pm_put(dev_priv);
>   	mutex_unlock(&dev->struct_mutex);
> @@ -3826,7 +3817,6 @@ i915_drop_caches_set(void *data, u64 val)
>   {
>   	struct drm_device *dev = data;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_gem_object *obj, *next;
>   	int ret;
>   
>   	DRM_DEBUG("Dropping caches: 0x%08llx\n", val);
> @@ -3847,10 +3837,18 @@ i915_drop_caches_set(void *data, u64 val)
>   		i915_gem_retire_requests(dev);
>   
>   	if (val & DROP_BOUND) {
> -		list_for_each_entry_safe(obj, next, &dev_priv->mm.bound_list,
> -					 global_list) {
> +		struct list_head still_in_list;
> +
> +		INIT_LIST_HEAD(&still_in_list);
> +		while (!list_empty(&dev_priv->mm.bound_list)) {
> +			struct drm_i915_gem_object *obj;
>   			struct i915_vma *vma, *v;
>   
> +			obj = list_first_entry(&dev_priv->mm.bound_list,
> +					       typeof(*obj), global_list);
> +
> +			list_move_tail(&obj->global_list, &still_in_list);
> +
>   			ret = 0;
>   			drm_gem_object_reference(&obj->base);
>   			list_for_each_entry_safe(vma, v, &obj->vma_list, vma_link) {
> @@ -3865,16 +3863,30 @@ i915_drop_caches_set(void *data, u64 val)
>   			if (ret)
>   				goto unlock;
>   		}
> +
> +		list_splice(&still_in_list, &dev_priv->mm.bound_list);
>   	}
>   
>   	if (val & DROP_UNBOUND) {
> -		list_for_each_entry_safe(obj, next, &dev_priv->mm.unbound_list,
> -					 global_list)
> +		struct list_head still_in_list;
> +
> +		INIT_LIST_HEAD(&still_in_list);
> +		while (!list_empty(&dev_priv->mm.unbound_list)) {
> +			struct drm_i915_gem_object *obj;
> +
> +			obj = list_first_entry(&dev_priv->mm.unbound_list,
> +					       typeof(*obj), global_list);
> +
> +			list_move_tail(&obj->global_list, &still_in_list);
> +
>   			if (obj->pages_pin_count == 0) {
>   				ret = i915_gem_object_put_pages(obj);
>   				if (ret)
>   					goto unlock;
>   			}
> +		}
> +
> +		list_splice(&still_in_list, &dev_priv->mm.unbound_list);
>   	}
>   
>   unlock:
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index a729721595b0..681e7416702c 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -142,13 +142,13 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   		value = 1;
>   		break;
>   	case I915_PARAM_HAS_BSD:
> -		value = intel_ring_initialized(&dev_priv->ring[VCS]);
> +		value = intel_engine_initialized(&dev_priv->engine[VCS]);
>   		break;
>   	case I915_PARAM_HAS_BLT:
> -		value = intel_ring_initialized(&dev_priv->ring[BCS]);
> +		value = intel_engine_initialized(&dev_priv->engine[BCS]);
>   		break;
>   	case I915_PARAM_HAS_VEBOX:
> -		value = intel_ring_initialized(&dev_priv->ring[VECS]);
> +		value = intel_engine_initialized(&dev_priv->engine[VECS]);
>   		break;
>   	case I915_PARAM_HAS_RELAXED_FENCING:
>   		value = 1;
> @@ -178,7 +178,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   		value = 1;
>   		break;
>   	case I915_PARAM_HAS_SEMAPHORES:
> -		value = i915_semaphore_is_enabled(dev);
> +		value = RCS_ENGINE(dev_priv)->semaphore.wait != NULL;
>   		break;
>   	case I915_PARAM_HAS_PRIME_VMAP_FLUSH:
>   		value = 1;
> @@ -512,8 +512,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>   
>   cleanup_gem:
>   	mutex_lock(&dev->struct_mutex);
> -	i915_gem_cleanup_ringbuffer(dev);
> -	i915_gem_context_fini(dev);
> +	i915_gem_fini(dev);
>   	mutex_unlock(&dev->struct_mutex);
>   cleanup_irq:
>   	drm_irq_uninstall(dev);
> @@ -698,6 +697,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
>   	if (!drm_core_check_feature(dev, DRIVER_MODESET) && !dev->agp)
>   		return -EINVAL;
>   
> +	BUILD_BUG_ON(I915_NUM_ENGINES >= (1 << I915_NUM_ENGINE_BITS));
> +
>   	dev_priv = kzalloc(sizeof(*dev_priv), GFP_KERNEL);
>   	if (dev_priv == NULL)
>   		return -ENOMEM;
> @@ -997,8 +998,7 @@ int i915_driver_unload(struct drm_device *dev)
>   		flush_workqueue(dev_priv->wq);
>   
>   		mutex_lock(&dev->struct_mutex);
> -		i915_gem_cleanup_ringbuffer(dev);
> -		i915_gem_context_fini(dev);
> +		i915_gem_fini(dev);
>   		mutex_unlock(&dev->struct_mutex);
>   		i915_gem_cleanup_stolen(dev);
>   	}
> @@ -1084,8 +1084,6 @@ void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
>   {
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   
> -	if (file_priv && file_priv->bsd_ring)
> -		file_priv->bsd_ring = NULL;
>   	kfree(file_priv);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4f9c2478aba1..ab504ecc848e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -473,30 +473,6 @@ void intel_detect_pch(struct drm_device *dev)
>   	pci_dev_put(pch);
>   }
>   
> -bool i915_semaphore_is_enabled(struct drm_device *dev)
> -{
> -	if (INTEL_INFO(dev)->gen < 6)
> -		return false;
> -
> -	if (i915.semaphores >= 0)
> -		return i915.semaphores;
> -
> -	/* TODO: make semaphores and Execlists play nicely together */
> -	if (i915.enable_execlists)
> -		return false;
> -
> -	/* Until we get further testing... */
> -	if (IS_GEN8(dev))
> -		return false;
> -
> -#ifdef CONFIG_INTEL_IOMMU
> -	/* Enable semaphores on SNB when IO remapping is off */
> -	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped)
> -		return false;
> -#endif
> -
> -	return true;
> -}
>   
>   void intel_hpd_cancel_work(struct drm_i915_private *dev_priv)
>   {
> @@ -795,7 +771,6 @@ static int i915_resume_legacy(struct drm_device *dev)
>   int i915_reset(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	bool simulated;
>   	int ret;
>   
>   	if (!i915.reset)
> @@ -803,14 +778,16 @@ int i915_reset(struct drm_device *dev)
>   
>   	mutex_lock(&dev->struct_mutex);
>   
> -	i915_gem_reset(dev);
> -
> -	simulated = dev_priv->gpu_error.stop_rings != 0;
> -
>   	ret = intel_gpu_reset(dev);
>   
> +	/* Clear the reset counter. Before anyone else
> +	 * can grab the mutex, we will declare whether or
> +	 * not the GPU is wedged.
> +	 */
> +	atomic_inc(&dev_priv->gpu_error.reset_counter);
> +
>   	/* Also reset the gpu hangman. */
> -	if (simulated) {
> +	if (dev_priv->gpu_error.stop_rings) {
>   		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
>   		dev_priv->gpu_error.stop_rings = 0;
>   		if (ret == -ENODEV) {
> @@ -820,6 +797,8 @@ int i915_reset(struct drm_device *dev)
>   		}
>   	}
>   
> +	i915_gem_reset(dev);
> +
>   	if (ret) {
>   		DRM_ERROR("Failed to reset chip: %i\n", ret);
>   		mutex_unlock(&dev->struct_mutex);
> @@ -843,14 +822,7 @@ int i915_reset(struct drm_device *dev)
>   	if (drm_core_check_feature(dev, DRIVER_MODESET) ||
>   			!dev_priv->ums.mm_suspended) {
>   		dev_priv->ums.mm_suspended = 0;
> -
> -		/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
> -		dev_priv->gpu_error.reload_in_reset = true;
> -
>   		ret = i915_gem_init_hw(dev);
> -
> -		dev_priv->gpu_error.reload_in_reset = false;
> -
>   		mutex_unlock(&dev->struct_mutex);
>   		if (ret) {
>   			DRM_ERROR("Failed hw init on reset %d\n", ret);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 19d2b060c18c..9529b6b0fef6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -37,7 +37,6 @@
>   #include "intel_ringbuffer.h"
>   #include "intel_lrc.h"
>   #include "i915_gem_gtt.h"
> -#include "i915_gem_render_state.h"
>   #include <linux/io-mapping.h>
>   #include <linux/i2c.h>
>   #include <linux/i2c-algo-bit.h>
> @@ -194,6 +193,7 @@ enum hpd_pin {
>   struct drm_i915_private;
>   struct i915_mm_struct;
>   struct i915_mmu_object;
> +struct i915_gem_request;
>   
>   enum intel_dpll_id {
>   	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
> @@ -323,6 +323,7 @@ struct drm_i915_error_state {
>   	u32 pgtbl_er;
>   	u32 ier;
>   	u32 gtier[4];
> +	u32 gtimr[4];
>   	u32 ccid;
>   	u32 derrmr;
>   	u32 forcewake;
> @@ -340,23 +341,26 @@ struct drm_i915_error_state {
>   	struct drm_i915_error_object *semaphore_obj;
>   
>   	struct drm_i915_error_ring {
> +		int id;
>   		bool valid;
>   		/* Software tracked state */
>   		bool waiting;
>   		int hangcheck_score;
> -		enum intel_ring_hangcheck_action hangcheck_action;
> +		enum intel_engine_hangcheck_action hangcheck_action;
>   		int num_requests;
>   
>   		/* our own tracking of ring head and tail */
>   		u32 cpu_ring_head;
>   		u32 cpu_ring_tail;
> -
> -		u32 semaphore_seqno[I915_NUM_RINGS - 1];
> +		u32 interrupts;
> +		u32 irq_count;
>   
>   		/* Register state */
>   		u32 tail;
>   		u32 head;
> +		u32 start;
>   		u32 ctl;
> +		u32 mode;
>   		u32 hws;
>   		u32 ipeir;
>   		u32 ipehr;
> @@ -364,13 +368,15 @@ struct drm_i915_error_state {
>   		u32 bbstate;
>   		u32 instpm;
>   		u32 instps;
> -		u32 seqno;
> +		u32 seqno, request, tag, hangcheck;
> +		u32 breadcrumb[I915_NUM_ENGINES];
>   		u64 bbaddr;
>   		u64 acthd;
>   		u32 fault_reg;
>   		u64 faddr;
>   		u32 rc_psmi; /* sleep state */
> -		u32 semaphore_mboxes[I915_NUM_RINGS - 1];
> +		u32 semaphore_mboxes[I915_NUM_ENGINES];
> +		u32 semaphore_sync[I915_NUM_ENGINES];
>   
>   		struct drm_i915_error_object {
>   			int page_count;
> @@ -380,8 +386,14 @@ struct drm_i915_error_state {
>   
>   		struct drm_i915_error_request {
>   			long jiffies;
> -			u32 seqno;
> +			long pid;
> +			u32 batch;
> +			u32 head;
>   			u32 tail;
> +			u32 seqno;
> +			u32 breadcrumb[I915_NUM_ENGINES];
> +			u32 complete;
> +			u32 tag;
>   		} *requests;
>   
>   		struct {
> @@ -394,12 +406,12 @@ struct drm_i915_error_state {
>   
>   		pid_t pid;
>   		char comm[TASK_COMM_LEN];
> -	} ring[I915_NUM_RINGS];
> +	} ring[I915_NUM_ENGINES];
>   
>   	struct drm_i915_error_buffer {
>   		u32 size;
>   		u32 name;
> -		u32 rseqno, wseqno;
> +		u32 rseqno[I915_NUM_ENGINES], wseqno, fseqno;
>   		u32 gtt_offset;
>   		u32 read_domains;
>   		u32 write_domain;
> @@ -471,10 +483,10 @@ struct drm_i915_display_funcs {
>   			  struct drm_display_mode *mode);
>   	void (*fdi_link_train)(struct drm_crtc *crtc);
>   	void (*init_clock_gating)(struct drm_device *dev);
> -	int (*queue_flip)(struct drm_device *dev, struct drm_crtc *crtc,
> +	int (*queue_flip)(struct i915_gem_request *rq,
> +			  struct intel_crtc *crtc,
>   			  struct drm_framebuffer *fb,
>   			  struct drm_i915_gem_object *obj,
> -			  struct intel_engine_cs *ring,
>   			  uint32_t flags);
>   	void (*update_primary_plane)(struct drm_crtc *crtc,
>   				     struct drm_framebuffer *fb,
> @@ -626,24 +638,18 @@ struct i915_ctx_hang_stats {
>    */
>   struct intel_context {
>   	struct kref ref;
> +	struct drm_i915_private *i915;
>   	int user_handle;
>   	uint8_t remap_slice;
>   	struct drm_i915_file_private *file_priv;
>   	struct i915_ctx_hang_stats hang_stats;
>   	struct i915_hw_ppgtt *ppgtt;
>   
> -	/* Legacy ring buffer submission */
> -	struct {
> -		struct drm_i915_gem_object *rcs_state;
> -		bool initialized;
> -	} legacy_hw_ctx;
> -
> -	/* Execlists */
> -	bool rcs_initialized;
> -	struct {
> +	struct intel_engine_context {
> +		struct intel_ringbuffer *ring;
>   		struct drm_i915_gem_object *state;
> -		struct intel_ringbuffer *ringbuf;
> -	} engine[I915_NUM_RINGS];
> +		bool initialized;
> +	} ring[I915_NUM_ENGINES];
>   
>   	struct list_head link;
>   };
> @@ -1028,7 +1034,6 @@ struct intel_ilk_power_mgmt {
>   	int r_t;
>   
>   	struct drm_i915_gem_object *pwrctx;
> -	struct drm_i915_gem_object *renderctx;
>   };
>   
>   struct drm_i915_private;
> @@ -1253,9 +1258,6 @@ struct i915_gpu_error {
>   
>   	/* For missed irq/seqno simulation. */
>   	unsigned int test_irq_rings;
> -
> -	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset   */
> -	bool reload_in_reset;
>   };
>   
>   enum modeset_restore {
> @@ -1460,9 +1462,10 @@ struct drm_i915_private {
>   	wait_queue_head_t gmbus_wait_queue;
>   
>   	struct pci_dev *bridge_dev;
> -	struct intel_engine_cs ring[I915_NUM_RINGS];
> +	struct intel_engine_cs engine[I915_NUM_ENGINES];
> +	struct intel_context *default_context;
>   	struct drm_i915_gem_object *semaphore_obj;
> -	uint32_t last_seqno, next_seqno;
> +	uint32_t next_seqno;
>   
>   	drm_dma_handle_t *status_page_dmah;
>   	struct resource mch_res;
> @@ -1673,21 +1676,6 @@ struct drm_i915_private {
>   
>   	/* Old ums support infrastructure, same warning applies. */
>   	struct i915_ums_state ums;
> -
> -	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
> -	struct {
> -		int (*do_execbuf)(struct drm_device *dev, struct drm_file *file,
> -				  struct intel_engine_cs *ring,
> -				  struct intel_context *ctx,
> -				  struct drm_i915_gem_execbuffer2 *args,
> -				  struct list_head *vmas,
> -				  struct drm_i915_gem_object *batch_obj,
> -				  u64 exec_start, u32 flags);
> -		int (*init_rings)(struct drm_device *dev);
> -		void (*cleanup_ring)(struct intel_engine_cs *ring);
> -		void (*stop_ring)(struct intel_engine_cs *ring);
> -	} gt;
> -
>   	/*
>   	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>   	 * will be rejected. Instead look for a better place.
> @@ -1700,9 +1688,11 @@ static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
>   }
>   
>   /* Iterate over initialised rings */
> -#define for_each_ring(ring__, dev_priv__, i__) \
> -	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
> -		if (((ring__) = &(dev_priv__)->ring[(i__)]), intel_ring_initialized((ring__)))
> +#define for_each_engine(engine__, dev_priv__, i__) \
> +	for ((i__) = 0; (i__) < I915_NUM_ENGINES; (i__)++) \
> +		if (((engine__) = &(dev_priv__)->engine[(i__)]), intel_engine_initialized((engine__)))
> +
> +#define RCS_ENGINE(x) (&__I915__(x)->engine[RCS])
>   
>   enum hdmi_force_audio {
>   	HDMI_AUDIO_OFF_DVI = -2,	/* no aux data for HDMI-DVI converter */
> @@ -1767,16 +1757,15 @@ struct drm_i915_gem_object {
>   	struct drm_mm_node *stolen;
>   	struct list_head global_list;
>   
> -	struct list_head ring_list;
>   	/** Used in execbuf to temporarily hold a ref */
>   	struct list_head obj_exec_link;
>   
>   	/**
>   	 * This is set if the object is on the active lists (has pending
> -	 * rendering and so a non-zero seqno), and is not set if it i s on
> -	 * inactive (ready to be unbound) list.
> +	 * rendering and so a submitted request), and is not set if it is on
> +	 * inactive (ready to be unbound) list. We track activity per engine.
>   	 */
> -	unsigned int active:1;
> +	unsigned int active:I915_NUM_ENGINE_BITS;
>   
>   	/**
>   	 * This is set if the object has been written to since last bound
> @@ -1844,13 +1833,11 @@ struct drm_i915_gem_object {
>   	void *dma_buf_vmapping;
>   	int vmapping_count;
>   
> -	struct intel_engine_cs *ring;
> -
> -	/** Breadcrumb of last rendering to the buffer. */
> -	uint32_t last_read_seqno;
> -	uint32_t last_write_seqno;
> -	/** Breadcrumb of last fenced GPU access to the buffer. */
> -	uint32_t last_fenced_seqno;
> +	/** Breadcrumbs of last rendering to the buffer. */
> +	struct {
> +		struct i915_gem_request *request;
> +		struct list_head engine_list;
> +	} last_write, last_read[I915_NUM_ENGINES], last_fence;
>   
>   	/** Current tiling stride for the object, if it's tiled. */
>   	uint32_t stride;
> @@ -1888,44 +1875,13 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   		       unsigned frontbuffer_bits);
>   
>   /**
> - * Request queue structure.
> - *
> - * The request queue allows us to note sequence numbers that have been emitted
> - * and may be associated with active buffers to be retired.
> - *
> - * By keeping this list, we can avoid having to do questionable
> - * sequence-number comparisons on buffer last_rendering_seqnos, and associate
> - * an emission time with seqnos for tracking how far ahead of the GPU we are.
> + * Returns true if seq1 is later than, or the same as, seq2.
>    */
> -struct drm_i915_gem_request {
> -	/** On Which ring this request was generated */
> -	struct intel_engine_cs *ring;
> -
> -	/** GEM sequence number associated with this request. */
> -	uint32_t seqno;
> -
> -	/** Position in the ringbuffer of the start of the request */
> -	u32 head;
> -
> -	/** Position in the ringbuffer of the end of the request */
> -	u32 tail;
> -
> -	/** Context related to this request */
> -	struct intel_context *ctx;
> -
> -	/** Batch buffer related to this request if any */
> -	struct drm_i915_gem_object *batch_obj;
> -
> -	/** Time at which this request was emitted, in jiffies. */
> -	unsigned long emitted_jiffies;
> -
> -	/** global list entry for this request */
> -	struct list_head list;
> -
> -	struct drm_i915_file_private *file_priv;
> -	/** file_priv list entry for this request */
> -	struct list_head client_list;
> -};
> +static inline bool
> +__i915_seqno_passed(uint32_t seq1, uint32_t seq2)
> +{
> +	return (int32_t)(seq1 - seq2) >= 0;
> +}
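
The wrap-safe comparison deserves a note: it treats the 32-bit seqno space as a
circle and relies on the (two's-complement) signed interpretation of the unsigned
difference, so the ordering stays correct across wraparound as long as the two
values are less than 2^31 apart. A quick standalone illustration in plain
userspace C (not kernel code, names invented):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copy of the helper above, for illustration only. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	/* Relies on the usual two's-complement wrap of the subtraction. */
	return (int32_t)(seq1 - seq2) >= 0;
}

int main(void)
{
	assert(seqno_passed(100, 90));          /* plainly later */
	assert(!seqno_passed(90, 100));
	assert(seqno_passed(5, 0xfffffff0u));   /* later, across the wrap */
	assert(!seqno_passed(0xfffffff0u, 5));
	assert(seqno_passed(42, 42));           /* equal counts as passed */
	return 0;
}
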
>   
>   struct drm_i915_file_private {
>   	struct drm_i915_private *dev_priv;
> @@ -1939,7 +1895,7 @@ struct drm_i915_file_private {
>   	struct idr context_idr;
>   
>   	atomic_t rps_wait_boost;
> -	struct  intel_engine_cs *bsd_ring;
> +	struct intel_engine_cs *bsd_engine;
>   };
>   
>   /*
> @@ -2119,7 +2075,7 @@ struct drm_i915_cmd_table {
>   				 to_i915(dev)->ellc_size)
>   #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
>   
> -#define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
> +#define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 5)
>   #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
>   #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >= 6)
>   #define HAS_PPGTT(dev)		(INTEL_INFO(dev)->gen >= 7 && !IS_GEN8(dev))
> @@ -2227,7 +2183,7 @@ struct i915_params {
>   };
>   extern struct i915_params i915 __read_mostly;
>   
> -				/* i915_dma.c */
> +/* i915_dma.c */
>   extern int i915_driver_load(struct drm_device *, unsigned long flags);
>   extern int i915_driver_unload(struct drm_device *);
>   extern int i915_driver_open(struct drm_device *dev, struct drm_file *file);
> @@ -2297,20 +2253,6 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>   			      struct drm_file *file_priv);
>   int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>   			     struct drm_file *file_priv);
> -void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> -					struct intel_engine_cs *ring);
> -void i915_gem_execbuffer_retire_commands(struct drm_device *dev,
> -					 struct drm_file *file,
> -					 struct intel_engine_cs *ring,
> -					 struct drm_i915_gem_object *obj);
> -int i915_gem_ringbuffer_submission(struct drm_device *dev,
> -				   struct drm_file *file,
> -				   struct intel_engine_cs *ring,
> -				   struct intel_context *ctx,
> -				   struct drm_i915_gem_execbuffer2 *args,
> -				   struct list_head *vmas,
> -				   struct drm_i915_gem_object *batch_obj,
> -				   u64 exec_start, u32 flags);
>   int i915_gem_execbuffer(struct drm_device *dev, void *data,
>   			struct drm_file *file_priv);
>   int i915_gem_execbuffer2(struct drm_device *dev, void *data,
> @@ -2397,22 +2339,12 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>   
>   int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>   int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -			 struct intel_engine_cs *to);
> -void i915_vma_move_to_active(struct i915_vma *vma,
> -			     struct intel_engine_cs *ring);
> +			 struct i915_gem_request *rq);
>   int i915_gem_dumb_create(struct drm_file *file_priv,
>   			 struct drm_device *dev,
>   			 struct drm_mode_create_dumb *args);
>   int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
>   		      uint32_t handle, uint64_t *offset);
> -/**
> - * Returns true if seq1 is later than seq2.
> - */
> -static inline bool
> -i915_seqno_passed(uint32_t seq1, uint32_t seq2)
> -{
> -	return (int32_t)(seq1 - seq2) >= 0;
> -}
>   
>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
> @@ -2422,24 +2354,33 @@ int __must_check i915_gem_object_put_fence(struct drm_i915_gem_object *obj);
>   bool i915_gem_object_pin_fence(struct drm_i915_gem_object *obj);
>   void i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj);
>   
> -struct drm_i915_gem_request *
> -i915_gem_find_active_request(struct intel_engine_cs *ring);
> -
>   bool i915_gem_retire_requests(struct drm_device *dev);
> -void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
> -int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
> -				      bool interruptible);
> -int __must_check i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno);
> +void i915_gem_retire_requests__engine(struct intel_engine_cs *engine);
> +
> +static inline bool __i915_reset_in_progress(unsigned x)
> +{
> +	return unlikely(x & I915_RESET_IN_PROGRESS_FLAG);
> +}
>   
>   static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
>   {
> -	return unlikely(atomic_read(&error->reset_counter)
> -			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
> +	return __i915_reset_in_progress(atomic_read(&error->reset_counter));
> +}
> +
> +static inline bool __i915_terminally_wedged(unsigned x)
> +{
> +	return unlikely(x & I915_WEDGED);
>   }
>   
>   static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>   {
> -	return atomic_read(&error->reset_counter) & I915_WEDGED;
> +	return __i915_terminally_wedged(atomic_read(&error->reset_counter));
> +}
> +
> +static inline bool i915_recovery_pending(struct i915_gpu_error *error)
> +{
> +	unsigned x = atomic_read(&error->reset_counter);
> +	return __i915_reset_in_progress(x) && !__i915_terminally_wedged(x);
>   }
>   
>   static inline u32 i915_reset_count(struct i915_gpu_error *error)
> @@ -2463,21 +2404,11 @@ void i915_gem_reset(struct drm_device *dev);
>   bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>   int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>   int __must_check i915_gem_init(struct drm_device *dev);
> -int i915_gem_init_rings(struct drm_device *dev);
>   int __must_check i915_gem_init_hw(struct drm_device *dev);
> -int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice);
> +void i915_gem_fini(struct drm_device *dev);
>   void i915_gem_init_swizzling(struct drm_device *dev);
> -void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
>   int __must_check i915_gpu_idle(struct drm_device *dev);
>   int __must_check i915_gem_suspend(struct drm_device *dev);
> -int __i915_add_request(struct intel_engine_cs *ring,
> -		       struct drm_file *file,
> -		       struct drm_i915_gem_object *batch_obj,
> -		       u32 *seqno);
> -#define i915_add_request(ring, seqno) \
> -	__i915_add_request(ring, NULL, NULL, seqno)
> -int __must_check i915_wait_seqno(struct intel_engine_cs *ring,
> -				 uint32_t seqno);
>   int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
>   int __must_check
>   i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
> @@ -2487,7 +2418,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
>   int __must_check
>   i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>   				     u32 alignment,
> -				     struct intel_engine_cs *pipelined);
> +				     struct i915_gem_request *pipelined);
>   void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj);
>   int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
>   				int align);
> @@ -2534,13 +2465,10 @@ static inline bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj) {
>   }
>   
>   /* Some GGTT VM helpers */
> -#define i915_obj_to_ggtt(obj) \
> -	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
> +#define i915_obj_to_ggtt(obj) (&to_i915((obj)->base.dev)->gtt.base)
>   static inline bool i915_is_ggtt(struct i915_address_space *vm)
>   {
> -	struct i915_address_space *ggtt =
> -		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
> -	return vm == ggtt;
> +	return vm == &to_i915(vm->dev)->gtt.base;
>   }
>   
>   static inline struct i915_hw_ppgtt *
> @@ -2589,12 +2517,12 @@ void i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj);
>   /* i915_gem_context.c */
>   int __must_check i915_gem_context_init(struct drm_device *dev);
>   void i915_gem_context_fini(struct drm_device *dev);
> -void i915_gem_context_reset(struct drm_device *dev);
>   int i915_gem_context_open(struct drm_device *dev, struct drm_file *file);
>   int i915_gem_context_enable(struct drm_i915_private *dev_priv);
>   void i915_gem_context_close(struct drm_device *dev, struct drm_file *file);
> -int i915_switch_context(struct intel_engine_cs *ring,
> -			struct intel_context *to);
> +int i915_request_switch_context(struct i915_gem_request *rq);
> +void i915_request_switch_context__commit(struct i915_gem_request *rq);
> +void i915_request_switch_context__undo(struct i915_gem_request *rq);
>   struct intel_context *
>   i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
>   void i915_gem_context_free(struct kref *ctx_ref);
> @@ -2624,6 +2552,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file_priv);
>   
> +/* i915_gem_render_state.c */
> +int i915_gem_render_state_init(struct i915_gem_request *rq);
>   /* i915_gem_evict.c */
>   int __must_check i915_gem_evict_something(struct drm_device *dev,
>   					  struct i915_address_space *vm,
> @@ -2643,6 +2573,160 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
>   		intel_gtt_chipset_flush();
>   }
>   
> +/* i915_gem_request.c */
> +
> +/**
> + * Request queue structure.
> + *
> + * The request queue allows us to note sequence numbers that have been emitted
> + * and may be associated with active buffers to be retired.
> + *
> + * By keeping this list, we can avoid having to do questionable
> + * sequence-number comparisons on buffer last_rendering_seqnos, and associate
> + * an emission time with seqnos for tracking how far ahead of the GPU we are.
> + */
> +struct i915_gem_request {
> +	struct kref kref;
> +
> +	/** On which ring/engine/ctx this request was generated */
> +	struct drm_i915_private *i915;
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *engine;
> +	struct intel_ringbuffer *ring;
> +
> +	/** How many GPU resets ago was this request first constructed? */
> +	unsigned reset_counter;
> +
> +	/** GEM sequence number/breadcrumb associated with this request. */
> +	u32 seqno;
> +	u32 breadcrumb[I915_NUM_ENGINES];
> +	u32 semaphore[I915_NUM_ENGINES];
> +
> +	/** Position in the ringbuffer of the request */
> +	u32 head, tail;
> +
> +	/** Batch buffer and objects related to this request if any */
> +	struct i915_vma *batch;
> +	struct list_head vmas;
> +
> +	/** Time at which this request was emitted, in jiffies. */
> +	unsigned long emitted_jiffies;
> +
> +	/** global list entry for this request */
> +	struct list_head engine_list;
> +	struct list_head breadcrumb_link;
> +
> +	struct drm_i915_file_private *file_priv;
> +	/** file_priv list entry for this request */
> +	struct list_head client_list;
> +
> +	u16 tag;
> +	unsigned remap_l3:8;
> +	unsigned pending_flush:4;
> +	bool outstanding:1;
> +	bool has_ctx_switch:1;
> +
> +	bool completed; /* kept separate for atomicity */
> +};
> +
> +static inline struct intel_engine_cs *i915_request_engine(struct i915_gem_request *rq)
> +{
> +	return rq ? rq->engine : NULL;
> +}
> +
> +static inline int i915_request_engine_id(struct i915_gem_request *rq)
> +{
> +	return rq ? rq->engine->id : -1;
> +}
> +
> +static inline u32 i915_request_seqno(struct i915_gem_request *rq)
> +{
> +	return rq ? rq->seqno : 0;
> +}
> +
> +bool __i915_request_complete__wa(struct i915_gem_request *rq);
> +
> +static inline bool
> +i915_request_complete(struct i915_gem_request *rq)
> +{
> +	if (!rq->completed && rq->engine->is_complete(rq)) {
> +		trace_i915_gem_request_complete(rq);
> +		rq->completed = true;
> +	}
> +	return rq->completed;
> +}
> +
> +static inline struct i915_gem_request *
> +i915_request_get(struct i915_gem_request *rq)
> +{
> +	if (rq)
> +		kref_get(&rq->kref);
> +	return rq;
> +}
> +
> +void __i915_request_free(struct kref *kref);
> +
> +static inline void
> +i915_request_put(struct i915_gem_request *rq)
> +{
> +	if (rq == NULL)
> +		return;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +	kref_put(&rq->kref, __i915_request_free);
> +}
> +
> +static inline void
> +i915_request_put__unlocked(struct i915_gem_request *rq)
> +{
> +	if (!atomic_add_unless(&rq->kref.refcount, -1, 1)) {
> +		struct drm_device *dev = rq->i915->dev;
> +
> +		mutex_lock(&dev->struct_mutex);
> +		if (likely(atomic_dec_and_test(&rq->kref.refcount)))
> +			__i915_request_free(&rq->kref);
> +		mutex_unlock(&dev->struct_mutex);
> +	}
> +}
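
The __unlocked variant is the subtle part of the refcounting: it drops the
reference without struct_mutex whenever it cannot be the last one, and only
falls back to taking the mutex (which the free path expects, per the lockdep
assertion above) when the count might hit zero. A minimal userspace sketch of
the same pattern, using C11 atomics and a pthread mutex in place of kref and
struct_mutex; every name below is invented:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; all names invented for illustration. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct toy_request {
	atomic_int refcount;
};

/* Decrement *v unless it currently equals 1; true if we decremented. */
static bool dec_unless_last(atomic_int *v)
{
	int c = atomic_load(v);

	while (c != 1)
		if (atomic_compare_exchange_weak(v, &c, c - 1))
			return true;
	return false;
}

static void toy_request_put_unlocked(struct toy_request *rq)
{
	if (dec_unless_last(&rq->refcount))
		return; /* fast path: we were not the last reference */

	/* Slow path: we may be dropping the final reference, so take the
	 * lock that protects the free path before deciding. */
	pthread_mutex_lock(&big_lock);
	if (atomic_fetch_sub(&rq->refcount, 1) == 1)
		free(rq);
	pthread_mutex_unlock(&big_lock);
}

int main(void)
{
	struct toy_request *rq = malloc(sizeof(*rq));

	atomic_init(&rq->refcount, 2);
	toy_request_put_unlocked(rq); /* fast path: refcount drops to 1 */
	toy_request_put_unlocked(rq); /* slow path: frees rq under the lock */
	return 0;
}
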
> +
> +int __must_check
> +i915_request_add_vma(struct i915_gem_request *rq,
> +		     struct i915_vma *vma,
> +		     unsigned fenced);
> +#define VMA_IS_FENCED 0x1
> +#define VMA_HAS_FENCE 0x2
> +int __must_check
> +i915_request_emit_flush(struct i915_gem_request *rq,
> +			unsigned flags);
> +int __must_check
> +__i915_request_emit_breadcrumb(struct i915_gem_request *rq, int id);
> +static inline int __must_check
> +i915_request_emit_breadcrumb(struct i915_gem_request *rq)
> +{
> +	return __i915_request_emit_breadcrumb(rq, rq->engine->id);
> +}
> +static inline int __must_check
> +i915_request_emit_semaphore(struct i915_gem_request *rq, int id)
> +{
> +	return __i915_request_emit_breadcrumb(rq, id);
> +}
> +int __must_check
> +i915_request_emit_batchbuffer(struct i915_gem_request *rq,
> +			      struct i915_vma *batch,
> +			      uint64_t start, uint32_t len,
> +			      unsigned flags);
> +int __must_check
> +i915_request_commit(struct i915_gem_request *rq);
> +struct i915_gem_request *
> +i915_request_get_breadcrumb(struct i915_gem_request *rq);
> +int __must_check
> +i915_request_wait(struct i915_gem_request *rq);
> +int __i915_request_wait(struct i915_gem_request *rq,
> +			bool interruptible,
> +			s64 *timeout,
> +			struct drm_i915_file_private *file);
> +void i915_request_retire(struct i915_gem_request *rq);
> +
>   /* i915_gem_stolen.c */
>   int i915_gem_init_stolen(struct drm_device *dev);
>   int i915_gem_stolen_setup_compression(struct drm_device *dev, int size, int fb_cpp);
> @@ -2669,13 +2753,6 @@ void i915_gem_detect_bit_6_swizzle(struct drm_device *dev);
>   void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj);
>   void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj);
>   
> -/* i915_gem_debug.c */
> -#if WATCH_LISTS
> -int i915_verify_lists(struct drm_device *dev);
> -#else
> -#define i915_verify_lists(dev) 0
> -#endif
> -
>   /* i915_debugfs.c */
>   int i915_debugfs_init(struct drm_minor *minor);
>   void i915_debugfs_cleanup(struct drm_minor *minor);
> @@ -2710,10 +2787,10 @@ const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
>   
>   /* i915_cmd_parser.c */
>   int i915_cmd_parser_get_version(void);
> -int i915_cmd_parser_init_ring(struct intel_engine_cs *ring);
> -void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
> -bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
> -int i915_parse_cmds(struct intel_engine_cs *ring,
> +int i915_cmd_parser_init_engine(struct intel_engine_cs *engine);
> +void i915_cmd_parser_fini_engine(struct intel_engine_cs *engine);
> +bool i915_needs_cmd_parser(struct intel_engine_cs *engine);
> +int i915_parse_cmds(struct intel_engine_cs *engine,
>   		    struct drm_i915_gem_object *batch_obj,
>   		    u32 batch_start_offset,
>   		    bool is_master);
> @@ -2812,14 +2889,11 @@ extern void intel_detect_pch(struct drm_device *dev);
>   extern int intel_trans_dp_port_sel(struct drm_crtc *crtc);
>   extern int intel_enable_rc6(const struct drm_device *dev);
>   
> -extern bool i915_semaphore_is_enabled(struct drm_device *dev);
>   int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>   			struct drm_file *file);
>   int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
>   			       struct drm_file *file);
>   
> -void intel_notify_mmio_flip(struct intel_engine_cs *ring);
> -
>   /* overlay */
>   extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
>   extern void intel_overlay_print_error_state(struct drm_i915_error_state_buf *e,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f4553b2bee8e..46d3aced7a50 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -44,9 +44,6 @@ static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *o
>   static __must_check int
>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   			       bool readonly);
> -static void
> -i915_gem_object_retire(struct drm_i915_gem_object *obj);
> -
>   static void i915_gem_write_fence(struct drm_device *dev, int reg,
>   				 struct drm_i915_gem_object *obj);
>   static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
> @@ -108,23 +105,95 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
>   	spin_unlock(&dev_priv->mm.object_stat_lock);
>   }
>   
> +static void
> +i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
> +{
> +	intel_fb_obj_flush(obj, true);
> +	list_del_init(&obj->last_write.engine_list);
> +	i915_request_put(obj->last_write.request);
> +	obj->last_write.request = NULL;
> +}
> +
> +static void
> +i915_gem_object_retire__fence(struct drm_i915_gem_object *obj)
> +{
> +	list_del_init(&obj->last_fence.engine_list);
> +	i915_request_put(obj->last_fence.request);
> +	obj->last_fence.request = NULL;
> +}
> +
> +static void
> +i915_gem_object_retire__read(struct drm_i915_gem_object *obj,
> +			     struct intel_engine_cs *engine)
> +{
> +	struct i915_vma *vma;
> +
> +	BUG_ON(obj->active == 0);
> +
> +	list_del_init(&obj->last_read[engine->id].engine_list);
> +	i915_request_put(obj->last_read[engine->id].request);
> +	obj->last_read[engine->id].request = NULL;
> +
> +	if (obj->last_write.request &&
> +	    obj->last_write.request->engine == engine)
> +		i915_gem_object_retire__write(obj);
> +
> +	if (obj->last_fence.request &&
> +	    obj->last_fence.request->engine == engine)
> +		i915_gem_object_retire__fence(obj);
> +
> +	if (--obj->active)
> +		return;
> +
> +	list_for_each_entry(vma, &obj->vma_list, vma_link)
> +		if (!list_empty(&vma->mm_list))
> +			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
> +
> +	drm_gem_object_unreference(&obj->base);
> +}
> +
> +static void
> +i915_gem_object_retire(struct drm_i915_gem_object *obj)
> +{
> +	struct i915_gem_request *rq;
> +	int i;
> +
> +	/* We should only be called from code paths where we know we
> +	 * hold both the active reference *and* a user reference.
> +	 * Therefore we can safely access the object after retiring as
> +	 * we will hold a second reference and not free the object.
> +	 */
> +
> +	rq = obj->last_write.request;
> +	if (rq && i915_request_complete(rq))
> +		i915_gem_object_retire__write(obj);
> +
> +	rq = obj->last_fence.request;
> +	if (rq && i915_request_complete(rq))
> +		i915_gem_object_retire__fence(obj);
> +
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		rq = obj->last_read[i].request;
> +		if (rq && i915_request_complete(rq))
> +			i915_gem_object_retire__read(obj, rq->engine);
> +	}
> +
> +	if (!obj->active)
> +		i915_gem_retire_requests(obj->base.dev);
> +}
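
To spell out the bookkeeping behind the BUG_ON above: the object keeps one
last_read slot per engine plus a small active count of occupied slots, and a
single GEM reference pins the object while that count is non-zero; only when
the final engine retires does the object move back to the inactive lists and
drop that reference. A toy userspace model of just that counting (names
invented, not driver code):

#include <assert.h>
#include <stdbool.h>

#define NUM_ENGINES 4 /* illustrative engine count */

/* Toy model: one read slot per engine plus a count of occupied slots. */
struct toy_obj {
	bool busy[NUM_ENGINES];
	unsigned active;
	int refcount; /* stands in for the GEM reference */
};

static void toy_mark_read(struct toy_obj *obj, int engine)
{
	if (!obj->active)
		obj->refcount++; /* first engine to use the object */
	if (!obj->busy[engine]) {
		obj->busy[engine] = true;
		obj->active++;
	}
}

static void toy_retire_read(struct toy_obj *obj, int engine)
{
	assert(obj->active && obj->busy[engine]);
	obj->busy[engine] = false;
	if (--obj->active)
		return; /* still busy on another engine */
	obj->refcount--; /* last engine done: drop the pinning reference */
}

int main(void)
{
	struct toy_obj obj = { .refcount = 1 };

	toy_mark_read(&obj, 0); /* e.g. the render engine */
	toy_mark_read(&obj, 2); /* and a copy engine */
	toy_retire_read(&obj, 0);
	assert(obj.active == 1 && obj.refcount == 2);
	toy_retire_read(&obj, 2);
	assert(obj.active == 0 && obj.refcount == 1);
	return 0;
}
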
> +
>   static int
>   i915_gem_wait_for_error(struct i915_gpu_error *error)
>   {
>   	int ret;
>   
> -#define EXIT_COND (!i915_reset_in_progress(error) || \
> -		   i915_terminally_wedged(error))
> -	if (EXIT_COND)
> -		return 0;
> -
>   	/*
>   	 * Only wait 10 seconds for the gpu reset to complete to avoid hanging
>   	 * userspace. If it takes that long something really bad is going on and
>   	 * we should simply try to bail out and fail as gracefully as possible.
>   	 */
>   	ret = wait_event_interruptible_timeout(error->reset_queue,
> -					       EXIT_COND,
> +					       !i915_recovery_pending(error),
>   					       10*HZ);
>   	if (ret == 0) {
>   		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
> @@ -132,7 +201,6 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
>   	} else if (ret < 0) {
>   		return ret;
>   	}
> -#undef EXIT_COND
>   
>   	return 0;
>   }
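
For what it's worth, !i915_recovery_pending() is exactly the old EXIT_COND:
negating "in progress AND NOT wedged" gives "not in progress OR wedged", so the
wake-up condition here is unchanged. A throwaway check, assuming the usual bit
layout (bit 0 for reset-in-progress, the top bit for wedged):

#include <assert.h>
#include <stdbool.h>

/* Assumed flag layout: bit 0 = reset in progress, bit 31 = wedged. */
#define IN_PROGRESS (1u << 0)
#define WEDGED      (1u << 31)

static bool recovery_pending(unsigned x)
{
	return (x & IN_PROGRESS) && !(x & WEDGED);
}

int main(void)
{
	unsigned flags[] = { 0, IN_PROGRESS, WEDGED, IN_PROGRESS | WEDGED };

	for (int i = 0; i < 4; i++) {
		unsigned x = flags[i];
		bool old_exit_cond = !(x & IN_PROGRESS) || (x & WEDGED);

		assert(old_exit_cond == !recovery_pending(x));
	}
	return 0;
}
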
> @@ -152,7 +220,6 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
>   	if (ret)
>   		return ret;
>   
> -	WARN_ON(i915_verify_lists(dev));
>   	return 0;
>   }
>   
> @@ -476,8 +543,6 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   		ret = i915_gem_object_wait_rendering(obj, true);
>   		if (ret)
>   			return ret;
> -
> -		i915_gem_object_retire(obj);
>   	}
>   
>   	ret = i915_gem_object_get_pages(obj);
> @@ -893,8 +958,6 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>   		ret = i915_gem_object_wait_rendering(obj, false);
>   		if (ret)
>   			return ret;
> -
> -		i915_gem_object_retire(obj);
>   	}
>   	/* Same trick applies to invalidate partially written cachelines read
>   	 * before writing. */
> @@ -1073,235 +1136,6 @@ unlock:
>   	return ret;
>   }
>   
> -int
> -i915_gem_check_wedge(struct i915_gpu_error *error,
> -		     bool interruptible)
> -{
> -	if (i915_reset_in_progress(error)) {
> -		/* Non-interruptible callers can't handle -EAGAIN, hence return
> -		 * -EIO unconditionally for these. */
> -		if (!interruptible)
> -			return -EIO;
> -
> -		/* Recovery complete, but the reset failed ... */
> -		if (i915_terminally_wedged(error))
> -			return -EIO;
> -
> -		/*
> -		 * Check if GPU Reset is in progress - we need intel_ring_begin
> -		 * to work properly to reinit the hw state while the gpu is
> -		 * still marked as reset-in-progress. Handle this with a flag.
> -		 */
> -		if (!error->reload_in_reset)
> -			return -EAGAIN;
> -	}
> -
> -	return 0;
> -}
> -
> -/*
> - * Compare seqno against outstanding lazy request. Emit a request if they are
> - * equal.
> - */
> -int
> -i915_gem_check_olr(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	int ret;
> -
> -	BUG_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> -
> -	ret = 0;
> -	if (seqno == ring->outstanding_lazy_seqno)
> -		ret = i915_add_request(ring, NULL);
> -
> -	return ret;
> -}
> -
> -static void fake_irq(unsigned long data)
> -{
> -	wake_up_process((struct task_struct *)data);
> -}
> -
> -static bool missed_irq(struct drm_i915_private *dev_priv,
> -		       struct intel_engine_cs *ring)
> -{
> -	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
> -}
> -
> -static bool can_wait_boost(struct drm_i915_file_private *file_priv)
> -{
> -	if (file_priv == NULL)
> -		return true;
> -
> -	return !atomic_xchg(&file_priv->rps_wait_boost, true);
> -}
> -
> -/**
> - * __wait_seqno - wait until execution of seqno has finished
> - * @ring: the ring expected to report seqno
> - * @seqno: duh!
> - * @reset_counter: reset sequence associated with the given seqno
> - * @interruptible: do an interruptible wait (normally yes)
> - * @timeout: in - how long to wait (NULL forever); out - how much time remaining
> - *
> - * Note: It is of utmost importance that the passed in seqno and reset_counter
> - * values have been read by the caller in an smp safe manner. Where read-side
> - * locks are involved, it is sufficient to read the reset_counter before
> - * unlocking the lock that protects the seqno. For lockless tricks, the
> - * reset_counter _must_ be read before, and an appropriate smp_rmb must be
> - * inserted.
> - *
> - * Returns 0 if the seqno was found within the alloted time. Else returns the
> - * errno with remaining time filled in timeout argument.
> - */
> -static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
> -			unsigned reset_counter,
> -			bool interruptible,
> -			s64 *timeout,
> -			struct drm_i915_file_private *file_priv)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
> -	DEFINE_WAIT(wait);
> -	unsigned long timeout_expire;
> -	s64 before, now;
> -	int ret;
> -
> -	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
> -
> -	if (i915_seqno_passed(ring->get_seqno(ring, true), seqno))
> -		return 0;
> -
> -	timeout_expire = timeout ? jiffies + nsecs_to_jiffies((u64)*timeout) : 0;
> -
> -	if (INTEL_INFO(dev)->gen >= 6 && ring->id == RCS && can_wait_boost(file_priv)) {
> -		gen6_rps_boost(dev_priv);
> -		if (file_priv)
> -			mod_delayed_work(dev_priv->wq,
> -					 &file_priv->mm.idle_work,
> -					 msecs_to_jiffies(100));
> -	}
> -
> -	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
> -		return -ENODEV;
> -
> -	/* Record current time in case interrupted by signal, or wedged */
> -	trace_i915_gem_request_wait_begin(ring, seqno);
> -	before = ktime_get_raw_ns();
> -	for (;;) {
> -		struct timer_list timer;
> -
> -		prepare_to_wait(&ring->irq_queue, &wait,
> -				interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE);
> -
> -		/* We need to check whether any gpu reset happened in between
> -		 * the caller grabbing the seqno and now ... */
> -		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) {
> -			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
> -			 * is truely gone. */
> -			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
> -			if (ret == 0)
> -				ret = -EAGAIN;
> -			break;
> -		}
> -
> -		if (i915_seqno_passed(ring->get_seqno(ring, false), seqno)) {
> -			ret = 0;
> -			break;
> -		}
> -
> -		if (interruptible && signal_pending(current)) {
> -			ret = -ERESTARTSYS;
> -			break;
> -		}
> -
> -		if (timeout && time_after_eq(jiffies, timeout_expire)) {
> -			ret = -ETIME;
> -			break;
> -		}
> -
> -		timer.function = NULL;
> -		if (timeout || missed_irq(dev_priv, ring)) {
> -			unsigned long expire;
> -
> -			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
> -			expire = missed_irq(dev_priv, ring) ? jiffies + 1 : timeout_expire;
> -			mod_timer(&timer, expire);
> -		}
> -
> -		io_schedule();
> -
> -		if (timer.function) {
> -			del_singleshot_timer_sync(&timer);
> -			destroy_timer_on_stack(&timer);
> -		}
> -	}
> -	now = ktime_get_raw_ns();
> -	trace_i915_gem_request_wait_end(ring, seqno);
> -
> -	if (!irq_test_in_progress)
> -		ring->irq_put(ring);
> -
> -	finish_wait(&ring->irq_queue, &wait);
> -
> -	if (timeout) {
> -		s64 tres = *timeout - (now - before);
> -
> -		*timeout = tres < 0 ? 0 : tres;
> -	}
> -
> -	return ret;
> -}
> -
> -/**
> - * Waits for a sequence number to be signaled, and cleans up the
> - * request and object lists appropriately for that event.
> - */
> -int
> -i915_wait_seqno(struct intel_engine_cs *ring, uint32_t seqno)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	bool interruptible = dev_priv->mm.interruptible;
> -	int ret;
> -
> -	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> -	BUG_ON(seqno == 0);
> -
> -	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
> -	if (ret)
> -		return ret;
> -
> -	ret = i915_gem_check_olr(ring, seqno);
> -	if (ret)
> -		return ret;
> -
> -	return __wait_seqno(ring, seqno,
> -			    atomic_read(&dev_priv->gpu_error.reset_counter),
> -			    interruptible, NULL, NULL);
> -}
> -
> -static int
> -i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj,
> -				     struct intel_engine_cs *ring)
> -{
> -	if (!obj->active)
> -		return 0;
> -
> -	/* Manually manage the write flush as we may have not yet
> -	 * retired the buffer.
> -	 *
> -	 * Note that the last_write_seqno is always the earlier of
> -	 * the two (read/write) seqno, so if we haved successfully waited,
> -	 * we know we have passed the last write.
> -	 */
> -	obj->last_write_seqno = 0;
> -
> -	return 0;
> -}
> -
>   /**
>    * Ensures that all rendering to the object has completed and the object is
>    * safe to unbind from the GTT or access from the CPU.
> @@ -1310,19 +1144,30 @@ static __must_check int
>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   			       bool readonly)
>   {
> -	struct intel_engine_cs *ring = obj->ring;
> -	u32 seqno;
> -	int ret;
> +	int i, ret;
>   
> -	seqno = readonly ? obj->last_write_seqno : obj->last_read_seqno;
> -	if (seqno == 0)
> +	if (!obj->active)
>   		return 0;
>   
> -	ret = i915_wait_seqno(ring, seqno);
> -	if (ret)
> -		return ret;
> +	if (readonly) {
> +		if (obj->last_write.request) {
> +			ret = i915_request_wait(obj->last_write.request);
> +			if (ret)
> +				return ret;
> +		}
> +	} else {
> +		for (i = 0; i < I915_NUM_ENGINES; i++) {
> +			if (obj->last_read[i].request == NULL)
> +				continue;
> +
> +			ret = i915_request_wait(obj->last_read[i].request);
> +			if (ret)
> +				return ret;
> +		}
> +	}
>   
> -	return i915_gem_object_wait_rendering__tail(obj, ring);
> +	i915_gem_object_retire(obj);
> +	return 0;
>   }
>   
>   /* A nonblocking variant of the above wait. This is a highly dangerous routine
> @@ -1335,34 +1180,51 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_device *dev = obj->base.dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = obj->ring;
> -	unsigned reset_counter;
> -	u32 seqno;
> -	int ret;
> +	struct i915_gem_request *rq[I915_NUM_ENGINES] = {};
> +	int i, n, ret;
>   
>   	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>   	BUG_ON(!dev_priv->mm.interruptible);
>   
> -	seqno = readonly ? obj->last_write_seqno : obj->last_read_seqno;
> -	if (seqno == 0)
> +	n = 0;
> +	if (readonly) {
> +		if (obj->last_write.request) {
> +			rq[n] = i915_request_get_breadcrumb(obj->last_write.request);
> +			if (IS_ERR(rq[n]))
> +				return PTR_ERR(rq[n]);
> +			n++;
> +		}
> +	} else {
> +		for (i = 0; i < I915_NUM_ENGINES; i++) {
> +			if (obj->last_read[i].request == NULL)
> +				continue;
> +
> +			rq[n] = i915_request_get_breadcrumb(obj->last_read[i].request);
> +			if (IS_ERR(rq[n])) {
> +				ret = PTR_ERR(rq[n]);
> +				goto out;
> +			}
> +			n++;
> +		}
> +	}
> +	if (n == 0)
>   		return 0;
>   
> -	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
> -	if (ret)
> -		return ret;
> +	mutex_unlock(&dev->struct_mutex);
>   
> -	ret = i915_gem_check_olr(ring, seqno);
> -	if (ret)
> -		return ret;
> +	for (i = 0; i < n; i++) {
> +		ret = __i915_request_wait(rq[i], true, NULL, file_priv);
> +		if (ret)
> +			break;
> +	}
>   
> -	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> -	mutex_unlock(&dev->struct_mutex);
> -	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file_priv);
>   	mutex_lock(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
>   
> -	return i915_gem_object_wait_rendering__tail(obj, ring);
> +out:
> +	for (i = 0; i < n; i++)
> +		i915_request_put(rq[i]);
> +
> +	return ret;
>   }
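
The shape of the rewrite matters here: the breadcrumb requests are referenced
while struct_mutex is held, the waits then run with the mutex dropped, and the
references are released once the lock is retaken, so retirement on another path
cannot free a request the waiter is still watching. A userspace sketch of that
snapshot-then-wait-unlocked shape (all names invented):

#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENGINES 4

/* Toy pieces; every name is invented for illustration. */
struct toy_request { int refcount; bool done; };
struct toy_obj { struct toy_request *last_read[NUM_ENGINES]; };

static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

static struct toy_request *toy_get(struct toy_request *rq)
{
	rq->refcount++; /* caller holds struct_mutex */
	return rq;
}

static void toy_put(struct toy_request *rq)
{
	rq->refcount--; /* caller holds struct_mutex */
}

static void toy_wait(struct toy_request *rq)
{
	while (!rq->done) /* stands in for an unlocked request wait */
		sched_yield();
}

/* Snapshot the outstanding requests under the lock, wait with it dropped. */
static void toy_wait_rendering_nonblocking(struct toy_obj *obj)
{
	struct toy_request *rq[NUM_ENGINES] = { NULL };
	int i, n = 0;

	pthread_mutex_lock(&struct_mutex);
	for (i = 0; i < NUM_ENGINES; i++)
		if (obj->last_read[i])
			rq[n++] = toy_get(obj->last_read[i]);
	pthread_mutex_unlock(&struct_mutex);

	for (i = 0; i < n; i++)
		toy_wait(rq[i]);

	pthread_mutex_lock(&struct_mutex);
	for (i = 0; i < n; i++)
		toy_put(rq[i]);
	pthread_mutex_unlock(&struct_mutex);
}

int main(void)
{
	struct toy_request r0 = { .refcount = 1, .done = true };
	struct toy_obj obj = { .last_read = { &r0 } };

	toy_wait_rendering_nonblocking(&obj);
	return 0;
}
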
>   
>   /**
> @@ -2165,459 +2027,115 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>   	return 0;
>   }
>   
> -static void
> -i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
> -			       struct intel_engine_cs *ring)
> +int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
>   {
> -	u32 seqno = intel_ring_get_seqno(ring);
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *signaller, *waiter;
> +	int ret, i, j;
>   
> -	BUG_ON(ring == NULL);
> -	if (obj->ring != ring && obj->last_write_seqno) {
> -		/* Keep the seqno relative to the current ring */
> -		obj->last_write_seqno = seqno;
> -	}
> -	obj->ring = ring;
> +	if (seqno == 0)
> +		return -EINVAL;
>   
> -	/* Add a reference if we're newly entering the active list. */
> -	if (!obj->active) {
> -		drm_gem_object_reference(&obj->base);
> -		obj->active = 1;
> -	}
> +	if (seqno == dev_priv->next_seqno)
> +		return 0;
>   
> -	list_move_tail(&obj->ring_list, &ring->active_list);
> +	do {
> +		/* Flush the breadcrumbs */
> +		ret = i915_gpu_idle(dev);
> +		if (ret)
> +			return ret;
>   
> -	obj->last_read_seqno = seqno;
> -}
> +		if (!i915_gem_retire_requests(dev))
> +			return -EIO;
>   
> -void i915_vma_move_to_active(struct i915_vma *vma,
> -			     struct intel_engine_cs *ring)
> -{
> -	list_move_tail(&vma->mm_list, &vma->vm->active_list);
> -	return i915_gem_object_move_to_active(vma->obj, ring);
> -}
> +		/* Update all semaphores to the current value */
> +		for_each_engine(signaller, to_i915(dev), i) {
> +			struct i915_gem_request *rq;
>   
> -static void
> -i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
> -{
> -	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> -	struct i915_address_space *vm;
> -	struct i915_vma *vma;
> +			if (!signaller->semaphore.signal)
> +				continue;
>   
> -	BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS);
> -	BUG_ON(!obj->active);
> +			rq = intel_engine_alloc_request(signaller,
> +							signaller->default_context);
> +			if (IS_ERR(rq))
> +				return PTR_ERR(rq);
>   
> -	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
> -		vma = i915_gem_obj_to_vma(obj, vm);
> -		if (vma && !list_empty(&vma->mm_list))
> -			list_move_tail(&vma->mm_list, &vm->inactive_list);
> -	}
> +			for_each_engine(waiter, to_i915(dev), j) {
> +				if (signaller == waiter)
> +					continue;
>   
> -	intel_fb_obj_flush(obj, true);
> +				if (!waiter->semaphore.wait)
> +					continue;
>   
> -	list_del_init(&obj->ring_list);
> -	obj->ring = NULL;
> +				ret = i915_request_emit_semaphore(rq, waiter->id);
> +				if (ret)
> +					break;
> +			}
>   
> -	obj->last_read_seqno = 0;
> -	obj->last_write_seqno = 0;
> -	obj->base.write_domain = 0;
> +			if (ret == 0)
> +				ret = i915_request_commit(rq);
> +			i915_request_put(rq);
> +			if (ret)
> +				return ret;
> +		}
>   
> -	obj->last_fenced_seqno = 0;
> +		/* We can only roll seqno forwards across a wraparound.
> +		 * This ship is not for turning!
> +		 */
> +		if (!__i915_seqno_passed(dev_priv->next_seqno, seqno))
> +			break;
>   
> -	obj->active = 0;
> -	drm_gem_object_unreference(&obj->base);
> +		dev_priv->next_seqno += 0x40000000;
> +	} while (1);
>   
> -	WARN_ON(i915_verify_lists(dev));
> +	dev_priv->next_seqno = seqno;
> +	return 0;
>   }
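
The replacement set_seqno never moves the counter backwards: whenever the
requested value is at or behind next_seqno in wrap-safe terms, it idles the
GPU, refreshes the semaphores, and then advances next_seqno by a quarter of the
seqno space, repeating until the target is strictly ahead. The arithmetic on
its own, as a userspace sketch (no idling, purely illustrative):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Advance `next` in quarter-space steps until `target` is strictly ahead,
 * mirroring the loop above minus the GPU idling between steps. */
static int quarter_turns_needed(uint32_t next, uint32_t target)
{
	int steps = 0;

	while (seqno_passed(next, target)) {
		next += 0x40000000;
		steps++;
	}
	return steps;
}

int main(void)
{
	/* Target that looks "behind" the current position: more than half
	 * the 32-bit circle away, so two quarter-turns are needed before it
	 * appears ahead again. */
	assert(quarter_turns_needed(0x1000, 0xf0000000u) == 2);

	/* Target already ahead in wrap-safe terms: no rolling required. */
	assert(quarter_turns_needed(0xf0000000u, 0x10) == 0);
	return 0;
}
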
>   
> -static void
> -i915_gem_object_retire(struct drm_i915_gem_object *obj)
> +void i915_gem_restore_fences(struct drm_device *dev)
>   {
> -	struct intel_engine_cs *ring = obj->ring;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int i;
>   
> -	if (ring == NULL)
> -		return;
> +	for (i = 0; i < dev_priv->num_fence_regs; i++) {
> +		struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];
>   
> -	if (i915_seqno_passed(ring->get_seqno(ring, true),
> -			      obj->last_read_seqno))
> -		i915_gem_object_move_to_inactive(obj);
> +		/*
> +		 * Commit delayed tiling changes if we have an object still
> +		 * attached to the fence, otherwise just clear the fence.
> +		 */
> +		if (reg->obj) {
> +			i915_gem_object_update_fence(reg->obj, reg,
> +						     reg->obj->tiling_mode);
> +		} else {
> +			i915_gem_write_fence(dev, i, NULL);
> +		}
> +	}
>   }
>   
> -static int
> -i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
> +void i915_gem_reset(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	int ret, i, j;
> +	struct intel_engine_cs *engine;
> +	int i;
>   
> -	/* Carefully retire all requests without writing to the rings */
> -	for_each_ring(ring, dev_priv, i) {
> -		ret = intel_ring_idle(ring);
> -		if (ret)
> -			return ret;
> -	}
> -	i915_gem_retire_requests(dev);
> +	for_each_engine(engine, dev_priv, i) {
> +		/* Clearing the read list will also clear the write
> +		 * and fence lists, 3 birds with one stone.
> +		 */
> +		while (!list_empty(&engine->read_list)) {
> +			struct drm_i915_gem_object *obj;
> +
> +			obj = list_first_entry(&engine->read_list,
> +					       struct drm_i915_gem_object,
> +					       last_read[i].engine_list);
>   
> -	/* Finally reset hw state */
> -	for_each_ring(ring, dev_priv, i) {
> -		intel_ring_init_seqno(ring, seqno);
> +			i915_gem_object_retire__read(obj, engine);
> +		}
>   
> -		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
> -			ring->semaphore.sync_seqno[j] = 0;
> +		intel_engine_reset(engine);
>   	}
>   
> -	return 0;
> -}
> -
> -int i915_gem_set_seqno(struct drm_device *dev, u32 seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> -
> -	if (seqno == 0)
> -		return -EINVAL;
> -
> -	/* HWS page needs to be set less than what we
> -	 * will inject to ring
> -	 */
> -	ret = i915_gem_init_seqno(dev, seqno - 1);
> -	if (ret)
> -		return ret;
> -
> -	/* Carefully set the last_seqno value so that wrap
> -	 * detection still works
> -	 */
> -	dev_priv->next_seqno = seqno;
> -	dev_priv->last_seqno = seqno - 1;
> -	if (dev_priv->last_seqno == 0)
> -		dev_priv->last_seqno--;
> -
> -	return 0;
> -}
> -
> -int
> -i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	/* reserve 0 for non-seqno */
> -	if (dev_priv->next_seqno == 0) {
> -		int ret = i915_gem_init_seqno(dev, 0);
> -		if (ret)
> -			return ret;
> -
> -		dev_priv->next_seqno = 1;
> -	}
> -
> -	*seqno = dev_priv->last_seqno = dev_priv->next_seqno++;
> -	return 0;
> -}
> -
> -int __i915_add_request(struct intel_engine_cs *ring,
> -		       struct drm_file *file,
> -		       struct drm_i915_gem_object *obj,
> -		       u32 *out_seqno)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	struct drm_i915_gem_request *request;
> -	struct intel_ringbuffer *ringbuf;
> -	u32 request_ring_position, request_start;
> -	int ret;
> -
> -	request = ring->preallocated_lazy_request;
> -	if (WARN_ON(request == NULL))
> -		return -ENOMEM;
> -
> -	if (i915.enable_execlists) {
> -		struct intel_context *ctx = request->ctx;
> -		ringbuf = ctx->engine[ring->id].ringbuf;
> -	} else
> -		ringbuf = ring->buffer;
> -
> -	request_start = intel_ring_get_tail(ringbuf);
> -	/*
> -	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> -	 * after having emitted the batchbuffer command. Hence we need to fix
> -	 * things up similar to emitting the lazy request. The difference here
> -	 * is that the flush _must_ happen before the next request, no matter
> -	 * what.
> -	 */
> -	if (i915.enable_execlists) {
> -		ret = logical_ring_flush_all_caches(ringbuf);
> -		if (ret)
> -			return ret;
> -	} else {
> -		ret = intel_ring_flush_all_caches(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	/* Record the position of the start of the request so that
> -	 * should we detect the updated seqno part-way through the
> -	 * GPU processing the request, we never over-estimate the
> -	 * position of the head.
> -	 */
> -	request_ring_position = intel_ring_get_tail(ringbuf);
> -
> -	if (i915.enable_execlists) {
> -		ret = ring->emit_request(ringbuf);
> -		if (ret)
> -			return ret;
> -	} else {
> -		ret = ring->add_request(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	request->seqno = intel_ring_get_seqno(ring);
> -	request->ring = ring;
> -	request->head = request_start;
> -	request->tail = request_ring_position;
> -
> -	/* Whilst this request exists, batch_obj will be on the
> -	 * active_list, and so will hold the active reference. Only when this
> -	 * request is retired will the the batch_obj be moved onto the
> -	 * inactive_list and lose its active reference. Hence we do not need
> -	 * to explicitly hold another reference here.
> -	 */
> -	request->batch_obj = obj;
> -
> -	if (!i915.enable_execlists) {
> -		/* Hold a reference to the current context so that we can inspect
> -		 * it later in case a hangcheck error event fires.
> -		 */
> -		request->ctx = ring->last_context;
> -		if (request->ctx)
> -			i915_gem_context_reference(request->ctx);
> -	}
> -
> -	request->emitted_jiffies = jiffies;
> -	list_add_tail(&request->list, &ring->request_list);
> -	request->file_priv = NULL;
> -
> -	if (file) {
> -		struct drm_i915_file_private *file_priv = file->driver_priv;
> -
> -		spin_lock(&file_priv->mm.lock);
> -		request->file_priv = file_priv;
> -		list_add_tail(&request->client_list,
> -			      &file_priv->mm.request_list);
> -		spin_unlock(&file_priv->mm.lock);
> -	}
> -
> -	trace_i915_gem_request_add(ring, request->seqno);
> -	ring->outstanding_lazy_seqno = 0;
> -	ring->preallocated_lazy_request = NULL;
> -
> -	if (!dev_priv->ums.mm_suspended) {
> -		i915_queue_hangcheck(ring->dev);
> -
> -		cancel_delayed_work_sync(&dev_priv->mm.idle_work);
> -		queue_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.retire_work,
> -				   round_jiffies_up_relative(HZ));
> -		intel_mark_busy(dev_priv->dev);
> -	}
> -
> -	if (out_seqno)
> -		*out_seqno = request->seqno;
> -	return 0;
> -}
> -
> -static inline void
> -i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> -{
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> -
> -	if (!file_priv)
> -		return;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> -	spin_unlock(&file_priv->mm.lock);
> -}
> -
> -static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
> -				   const struct intel_context *ctx)
> -{
> -	unsigned long elapsed;
> -
> -	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
> -
> -	if (ctx->hang_stats.banned)
> -		return true;
> -
> -	if (ctx->hang_stats.ban_period_seconds &&
> -	    elapsed <= ctx->hang_stats.ban_period_seconds) {
> -		if (!i915_gem_context_is_default(ctx)) {
> -			DRM_DEBUG("context hanging too fast, banning!\n");
> -			return true;
> -		} else if (i915_stop_ring_allow_ban(dev_priv)) {
> -			if (i915_stop_ring_allow_warn(dev_priv))
> -				DRM_ERROR("gpu hanging too fast, banning!\n");
> -			return true;
> -		}
> -	}
> -
> -	return false;
> -}
> -
> -static void i915_set_reset_status(struct drm_i915_private *dev_priv,
> -				  struct intel_context *ctx,
> -				  const bool guilty)
> -{
> -	struct i915_ctx_hang_stats *hs;
> -
> -	if (WARN_ON(!ctx))
> -		return;
> -
> -	hs = &ctx->hang_stats;
> -
> -	if (guilty) {
> -		hs->banned = i915_context_is_banned(dev_priv, ctx);
> -		hs->batch_active++;
> -		hs->guilty_ts = get_seconds();
> -	} else {
> -		hs->batch_pending++;
> -	}
> -}
> -
> -static void i915_gem_free_request(struct drm_i915_gem_request *request)
> -{
> -	list_del(&request->list);
> -	i915_gem_request_remove_from_client(request);
> -
> -	if (request->ctx)
> -		i915_gem_context_unreference(request->ctx);
> -
> -	kfree(request);
> -}
> -
> -struct drm_i915_gem_request *
> -i915_gem_find_active_request(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_gem_request *request;
> -	u32 completed_seqno;
> -
> -	completed_seqno = ring->get_seqno(ring, false);
> -
> -	list_for_each_entry(request, &ring->request_list, list) {
> -		if (i915_seqno_passed(completed_seqno, request->seqno))
> -			continue;
> -
> -		return request;
> -	}
> -
> -	return NULL;
> -}
> -
> -static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv,
> -				       struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_gem_request *request;
> -	bool ring_hung;
> -
> -	request = i915_gem_find_active_request(ring);
> -
> -	if (request == NULL)
> -		return;
> -
> -	ring_hung = ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG;
> -
> -	i915_set_reset_status(dev_priv, request->ctx, ring_hung);
> -
> -	list_for_each_entry_continue(request, &ring->request_list, list)
> -		i915_set_reset_status(dev_priv, request->ctx, false);
> -}
> -
> -static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> -					struct intel_engine_cs *ring)
> -{
> -	while (!list_empty(&ring->active_list)) {
> -		struct drm_i915_gem_object *obj;
> -
> -		obj = list_first_entry(&ring->active_list,
> -				       struct drm_i915_gem_object,
> -				       ring_list);
> -
> -		i915_gem_object_move_to_inactive(obj);
> -	}
> -
> -	/*
> -	 * We must free the requests after all the corresponding objects have
> -	 * been moved off active lists. Which is the same order as the normal
> -	 * retire_requests function does. This is important if object hold
> -	 * implicit references on things like e.g. ppgtt address spaces through
> -	 * the request.
> -	 */
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> -
> -		i915_gem_free_request(request);
> -	}
> -
> -	while (!list_empty(&ring->execlist_queue)) {
> -		struct intel_ctx_submit_request *submit_req;
> -
> -		submit_req = list_first_entry(&ring->execlist_queue,
> -				struct intel_ctx_submit_request,
> -				execlist_link);
> -		list_del(&submit_req->execlist_link);
> -		intel_runtime_pm_put(dev_priv);
> -		i915_gem_context_unreference(submit_req->ctx);
> -		kfree(submit_req);
> -	}
> -
> -	/* These may not have been flush before the reset, do so now */
> -	kfree(ring->preallocated_lazy_request);
> -	ring->preallocated_lazy_request = NULL;
> -	ring->outstanding_lazy_seqno = 0;
> -}
> -
> -void i915_gem_restore_fences(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int i;
> -
> -	for (i = 0; i < dev_priv->num_fence_regs; i++) {
> -		struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];
> -
> -		/*
> -		 * Commit delayed tiling changes if we have an object still
> -		 * attached to the fence, otherwise just clear the fence.
> -		 */
> -		if (reg->obj) {
> -			i915_gem_object_update_fence(reg->obj, reg,
> -						     reg->obj->tiling_mode);
> -		} else {
> -			i915_gem_write_fence(dev, i, NULL);
> -		}
> -	}
> -}
> -
> -void i915_gem_reset(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	int i;
> -
> -	/*
> -	 * Before we free the objects from the requests, we need to inspect
> -	 * them for finding the guilty party. As the requests only borrow
> -	 * their reference to the objects, the inspection must be done first.
> -	 */
> -	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_status(dev_priv, ring);
> -
> -	for_each_ring(ring, dev_priv, i)
> -		i915_gem_reset_ring_cleanup(dev_priv, ring);
> -
> -	i915_gem_context_reset(dev);
> -
>   	i915_gem_restore_fences(dev);
>   }
>   
> @@ -2625,100 +2143,95 @@ void i915_gem_reset(struct drm_device *dev)
>    * This function clears the request list as sequence numbers are passed.
>    */
>   void
> -i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> +i915_gem_retire_requests__engine(struct intel_engine_cs *engine)
>   {
> -	uint32_t seqno;
> -
> -	if (list_empty(&ring->request_list))
> +	if (engine->last_request == NULL)
>   		return;
>   
> -	WARN_ON(i915_verify_lists(ring->dev));
> -
> -	seqno = ring->get_seqno(ring, true);
> +	if (!intel_engine_retire(engine, engine->get_seqno(engine)))
> +		return;
>   
> -	/* Move any buffers on the active list that are no longer referenced
> -	 * by the ringbuffer to the flushing/inactive lists as appropriate,
> -	 * before we free the context associated with the requests.
> -	 */
> -	while (!list_empty(&ring->active_list)) {
> +	while (!list_empty(&engine->write_list)) {
>   		struct drm_i915_gem_object *obj;
>   
> -		obj = list_first_entry(&ring->active_list,
> -				      struct drm_i915_gem_object,
> -				      ring_list);
> +		obj = list_first_entry(&engine->write_list,
> +				       struct drm_i915_gem_object,
> +				       last_write.engine_list);
>   
> -		if (!i915_seqno_passed(seqno, obj->last_read_seqno))
> +		if (!obj->last_write.request->completed)
>   			break;
>   
> -		i915_gem_object_move_to_inactive(obj);
> +		i915_gem_object_retire__write(obj);
>   	}
>   
> +	while (!list_empty(&engine->fence_list)) {
> +		struct drm_i915_gem_object *obj;
>   
> -	while (!list_empty(&ring->request_list)) {
> -		struct drm_i915_gem_request *request;
> -		struct intel_ringbuffer *ringbuf;
> -
> -		request = list_first_entry(&ring->request_list,
> -					   struct drm_i915_gem_request,
> -					   list);
> +		obj = list_first_entry(&engine->fence_list,
> +				       struct drm_i915_gem_object,
> +				       last_fence.engine_list);
>   
> -		if (!i915_seqno_passed(seqno, request->seqno))
> +		if (!obj->last_fence.request->completed)
>   			break;
>   
> -		trace_i915_gem_request_retire(ring, request->seqno);
> +		i915_gem_object_retire__fence(obj);
> +	}
>   
> -		/* This is one of the few common intersection points
> -		 * between legacy ringbuffer submission and execlists:
> -		 * we need to tell them apart in order to find the correct
> -		 * ringbuffer to which the request belongs to.
> -		 */
> -		if (i915.enable_execlists) {
> -			struct intel_context *ctx = request->ctx;
> -			ringbuf = ctx->engine[ring->id].ringbuf;
> -		} else
> -			ringbuf = ring->buffer;
> -
> -		/* We know the GPU must have read the request to have
> -		 * sent us the seqno + interrupt, so use the position
> -		 * of tail of the request to update the last known position
> -		 * of the GPU head.
> -		 */
> -		ringbuf->last_retired_head = request->tail;
> +	while (!list_empty(&engine->read_list)) {
> +		struct drm_i915_gem_object *obj;
>   
> -		i915_gem_free_request(request);
> -	}
> +		obj = list_first_entry(&engine->read_list,
> +				       struct drm_i915_gem_object,
> +				       last_read[engine->id].engine_list);
>   
> -	if (unlikely(ring->trace_irq_seqno &&
> -		     i915_seqno_passed(seqno, ring->trace_irq_seqno))) {
> -		ring->irq_put(ring);
> -		ring->trace_irq_seqno = 0;
> -	}
> +		if (!obj->last_read[engine->id].request->completed)
> +			break;
>   
> -	WARN_ON(i915_verify_lists(ring->dev));
> +		i915_gem_object_retire__read(obj, engine);
> +	}
>   }
>   
>   bool
>   i915_gem_retire_requests(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	bool idle = true;
>   	int i;
>   
> -	for_each_ring(ring, dev_priv, i) {
> -		i915_gem_retire_requests_ring(ring);
> -		idle &= list_empty(&ring->request_list);
> +	for_each_engine(engine, dev_priv, i) {
> +		i915_gem_retire_requests__engine(engine);
> +		idle &= engine->last_request == NULL;
>   	}
>   
>   	if (idle)
>   		mod_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.idle_work,
> -				   msecs_to_jiffies(100));
> +				 &dev_priv->mm.idle_work,
> +				 msecs_to_jiffies(100));
>   
>   	return idle;
>   }
>   
>   static void
> +i915_gem_flush_requests(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_engine_cs *engine;
> +	int i, ignored;
> +
> +	for_each_engine(engine, dev_priv, i) {
> +		if (engine->last_request == NULL)
> +			continue;
> +
> +		if (engine->last_request->breadcrumb[engine->id])
> +			continue;
> +
> +		ignored = intel_engine_flush(engine, engine->last_request->ctx);
> +	}
> +	(void)ignored;
> +}
> +
> +static void
>   i915_gem_retire_work_handler(struct work_struct *work)
>   {
>   	struct drm_i915_private *dev_priv =
> @@ -2730,10 +2243,13 @@ i915_gem_retire_work_handler(struct work_struct *work)
>   	idle = false;
>   	if (mutex_trylock(&dev->struct_mutex)) {
>   		idle = i915_gem_retire_requests(dev);
> +		if (!idle)
> +			i915_gem_flush_requests(dev);
>   		mutex_unlock(&dev->struct_mutex);
>   	}
>   	if (!idle)
> -		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
> +		queue_delayed_work(dev_priv->wq,
> +				   &dev_priv->mm.retire_work,
>   				   round_jiffies_up_relative(HZ));
>   }
>   
> @@ -2756,14 +2272,16 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   {
>   	int ret;
>   
> -	if (obj->active) {
> -		ret = i915_gem_check_olr(obj->ring, obj->last_read_seqno);
> +	if (!obj->active)
> +		return 0;
> +
> +	if (obj->last_write.request) {
> +		ret = i915_request_emit_breadcrumb(obj->last_write.request);
>   		if (ret)
>   			return ret;
> -
> -		i915_gem_retire_requests_ring(obj->ring);
>   	}
>   
> +	i915_gem_object_retire(obj);
>   	return 0;
>   }
>   
> @@ -2792,13 +2310,10 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   int
>   i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_gem_wait *args = data;
>   	struct drm_i915_gem_object *obj;
> -	struct intel_engine_cs *ring = NULL;
> -	unsigned reset_counter;
> -	u32 seqno = 0;
> -	int ret = 0;
> +	struct i915_gem_request *rq[I915_NUM_ENGINES] = {};
> +	int i, n, ret = 0;
>   
>   	ret = i915_mutex_lock_interruptible(dev);
>   	if (ret)
> @@ -2815,13 +2330,8 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	if (ret)
>   		goto out;
>   
> -	if (obj->active) {
> -		seqno = obj->last_read_seqno;
> -		ring = obj->ring;
> -	}
> -
> -	if (seqno == 0)
> -		 goto out;
> +	if (!obj->active)
> +		goto out;
>   
>   	/* Do this after OLR check to make sure we make forward progress polling
>   	 * on this IOCTL with a timeout <=0 (like busy ioctl)
> @@ -2831,12 +2341,31 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		goto out;
>   	}
>   
> +	for (i = n = 0; i < I915_NUM_ENGINES; i++) {
> +		if (obj->last_read[i].request == NULL)
> +			continue;
> +
> +		rq[n] = i915_request_get_breadcrumb(obj->last_read[i].request);
> +		if (IS_ERR(rq[n])) {
> +			ret = PTR_ERR(rq[n]);
> +			break;
> +		}
> +		n++;
> +	}
> +
>   	drm_gem_object_unreference(&obj->base);
> -	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
>   	mutex_unlock(&dev->struct_mutex);
>   
> -	return __wait_seqno(ring, seqno, reset_counter, true, &args->timeout_ns,
> -			    file->driver_priv);
> +	for (i = 0; i < n; i++) {
> +		if (ret == 0)
> +			ret = __i915_request_wait(rq[i], true,
> +						  &args->timeout_ns,
> +						  file->driver_priv);
> +
> +		i915_request_put__unlocked(rq[i]);
> +	}
> +
> +	return ret;
>   
>   out:
>   	drm_gem_object_unreference(&obj->base);
> @@ -2844,6 +2373,50 @@ out:
>   	return ret;
>   }
>   
> +static int
> +__i915_request_sync(struct i915_gem_request *waiter,
> +		    struct i915_gem_request *signaller,
> +		    struct drm_i915_gem_object *obj,
> +		    bool *retire)
> +{
> +	int ret;
> +
> +	if (signaller == NULL || i915_request_complete(signaller))
> +		return 0;
> +
> +	if (waiter == NULL)
> +		goto wait;
> +
> +	/* XXX still true with execlists? */
> +	if (waiter->engine == signaller->engine)
> +		return 0;
> +
> +	if (!waiter->engine->semaphore.wait)
> +		goto wait;
> +
> +	/* Try to emit only one wait per request per ring */
> +	if (waiter->semaphore[signaller->engine->id] &&
> +	    __i915_seqno_passed(waiter->semaphore[signaller->engine->id],
> +				signaller->seqno))
> +		return 0;
> +
> +	ret = i915_request_emit_semaphore(signaller, waiter->engine->id);
> +	if (ret)
> +		goto wait;
> +
> +	trace_i915_gem_ring_wait(signaller, waiter);
> +	if (waiter->engine->semaphore.wait(waiter, signaller))
> +		goto wait;
> +
> +	waiter->pending_flush &= ~I915_COMMAND_BARRIER;
> +	waiter->semaphore[signaller->engine->id] = signaller->breadcrumb[waiter->engine->id];
> +	return 0;
> +
> +wait:
> +	*retire = true;
> +	return i915_request_wait(signaller);
> +}
> +
>   /**
>    * i915_gem_object_sync - sync an object to a ring.
>    *
> @@ -2858,38 +2431,23 @@ out:
>    */
>   int
>   i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -		     struct intel_engine_cs *to)
> +		     struct i915_gem_request *rq)
>   {
> -	struct intel_engine_cs *from = obj->ring;
> -	u32 seqno;
> -	int ret, idx;
> -
> -	if (from == NULL || to == from)
> -		return 0;
> +	int ret = 0, i;
> +	bool retire = false;
>   
> -	if (to == NULL || !i915_semaphore_is_enabled(obj->base.dev))
> -		return i915_gem_object_wait_rendering(obj, false);
> -
> -	idx = intel_ring_sync_index(from, to);
> -
> -	seqno = obj->last_read_seqno;
> -	/* Optimization: Avoid semaphore sync when we are sure we already
> -	 * waited for an object with higher seqno */
> -	if (seqno <= from->semaphore.sync_seqno[idx])
> -		return 0;
> -
> -	ret = i915_gem_check_olr(obj->ring, seqno);
> -	if (ret)
> -		return ret;
> +	if (obj->base.pending_write_domain == 0) {
> +		ret = __i915_request_sync(rq, obj->last_write.request, obj, &retire);
> +	} else {
> +		for (i = 0; i < I915_NUM_ENGINES; i++) {
> +			ret = __i915_request_sync(rq, obj->last_read[i].request, obj, &retire);
> +			if (ret)
> +				break;
> +		}
> +	}
>   
> -	trace_i915_gem_ring_sync_to(from, to, seqno);
> -	ret = to->semaphore.sync_to(to, from, seqno);
> -	if (!ret)
> -		/* We use last_read_seqno because sync_to()
> -		 * might have just caused seqno wrap under
> -		 * the radar.
> -		 */
> -		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
> +	if (retire)
> +		i915_gem_object_retire(obj);
>   
>   	return ret;
>   }
> @@ -2983,19 +2541,22 @@ int i915_vma_unbind(struct i915_vma *vma)
>   
>   int i915_gpu_idle(struct drm_device *dev)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	int ret, i;
> +	struct intel_engine_cs *engine;
> +	int i;
>   
> -	/* Flush everything onto the inactive list. */
> -	for_each_ring(ring, dev_priv, i) {
> -		if (!i915.enable_execlists) {
> -			ret = i915_switch_context(ring, ring->default_context);
> -			if (ret)
> -				return ret;
> -		}
> +	/* Flush everything including contexts onto the inactive list. */
> +	for_each_engine(engine, to_i915(dev), i) {
> +		struct i915_gem_request *rq;
> +		int ret;
> +
> +		rq = intel_engine_alloc_request(engine,
> +						engine->default_context);
> +		if (IS_ERR(rq))
> +			return PTR_ERR(rq);
> +
> +		ret = i915_request_wait(rq);
> +		i915_request_put(rq);
>   
> -		ret = intel_ring_idle(ring);
>   		if (ret)
>   			return ret;
>   	}
> @@ -3199,14 +2760,16 @@ static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
>   static int
>   i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
>   {
> -	if (obj->last_fenced_seqno) {
> -		int ret = i915_wait_seqno(obj->ring, obj->last_fenced_seqno);
> -		if (ret)
> -			return ret;
> +	int ret;
>   
> -		obj->last_fenced_seqno = 0;
> -	}
> +	if (obj->last_fence.request == NULL)
> +		return 0;
>   
> +	ret = i915_request_wait(obj->last_fence.request);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_object_retire__fence(obj);
>   	return 0;
>   }
>   
> @@ -3641,7 +3204,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   	if (ret)
>   		return ret;
>   
> -	i915_gem_object_retire(obj);
>   	i915_gem_object_flush_cpu_write_domain(obj, false);
>   
>   	/* Serialise direct access to this object with the barriers for
> @@ -3660,14 +3222,12 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   	BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_GTT) != 0);
>   	obj->base.read_domains |= I915_GEM_DOMAIN_GTT;
>   	if (write) {
> +		intel_fb_obj_invalidate(obj, NULL);
>   		obj->base.read_domains = I915_GEM_DOMAIN_GTT;
>   		obj->base.write_domain = I915_GEM_DOMAIN_GTT;
>   		obj->dirty = 1;
>   	}
>   
> -	if (write)
> -		intel_fb_obj_invalidate(obj, NULL);
> -
>   	trace_i915_gem_object_change_domain(obj,
>   					    old_read_domains,
>   					    old_write_domain);
> @@ -3739,7 +3299,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   		 * in obj->write_domain and have been skipping the clflushes.
>   		 * Just set it to the CPU cache for now.
>   		 */
> -		i915_gem_object_retire(obj);
>   		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
>   
>   		old_read_domains = obj->base.read_domains;
> @@ -3865,17 +3424,15 @@ static bool is_pin_display(struct drm_i915_gem_object *obj)
>   int
>   i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>   				     u32 alignment,
> -				     struct intel_engine_cs *pipelined)
> +				     struct i915_gem_request *pipelined)
>   {
>   	u32 old_read_domains, old_write_domain;
>   	bool was_pin_display;
>   	int ret;
>   
> -	if (pipelined != obj->ring) {
> -		ret = i915_gem_object_sync(obj, pipelined);
> -		if (ret)
> -			return ret;
> -	}
> +	ret = i915_gem_object_sync(obj, pipelined);
> +	if (ret)
> +		return ret;
>   
>   	/* Mark the pin_display early so that we account for the
>   	 * display coherency whilst setting up the cache domains.
> @@ -3971,7 +3528,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>   	if (ret)
>   		return ret;
>   
> -	i915_gem_object_retire(obj);
>   	i915_gem_object_flush_gtt_write_domain(obj);
>   
>   	old_write_domain = obj->base.write_domain;
> @@ -3984,78 +3540,25 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>   		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>   	}
>   
> -	/* It should now be out of any other write domains, and we can update
> -	 * the domain values for our changes.
> -	 */
> -	BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0);
> -
> -	/* If we're writing through the CPU, then the GPU read domains will
> -	 * need to be invalidated at next use.
> -	 */
> -	if (write) {
> -		obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> -		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> -	}
> -
> -	if (write)
> -		intel_fb_obj_invalidate(obj, NULL);
> -
> -	trace_i915_gem_object_change_domain(obj,
> -					    old_read_domains,
> -					    old_write_domain);
> -
> -	return 0;
> -}
> -
> -/* Throttle our rendering by waiting until the ring has completed our requests
> - * emitted over 20 msec ago.
> - *
> - * Note that if we were to use the current jiffies each time around the loop,
> - * we wouldn't escape the function with any frames outstanding if the time to
> - * render a frame was over 20ms.
> - *
> - * This should get us reasonable parallelism between CPU and GPU but also
> - * relatively low latency when blocking on a particular request to finish.
> - */
> -static int
> -i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
> -	struct drm_i915_gem_request *request;
> -	struct intel_engine_cs *ring = NULL;
> -	unsigned reset_counter;
> -	u32 seqno = 0;
> -	int ret;
> -
> -	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
> -	if (ret)
> -		return ret;
> -
> -	ret = i915_gem_check_wedge(&dev_priv->gpu_error, false);
> -	if (ret)
> -		return ret;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
> -		if (time_after_eq(request->emitted_jiffies, recent_enough))
> -			break;
> -
> -		ring = request->ring;
> -		seqno = request->seqno;
> -	}
> -	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> -	spin_unlock(&file_priv->mm.lock);
> -
> -	if (seqno == 0)
> -		return 0;
> +	/* It should now be out of any other write domains, and we can update
> +	 * the domain values for our changes.
> +	 */
> +	BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0);
>   
> -	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
> -	if (ret == 0)
> -		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
> +	/* If we're writing through the CPU, then the GPU read domains will
> +	 * need to be invalidated at next use.
> +	 */
> +	if (write) {
> +		intel_fb_obj_invalidate(obj, NULL);
> +		obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> +		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> +	}
>   
> -	return ret;
> +	trace_i915_gem_object_change_domain(obj,
> +					    old_read_domains,
> +					    old_write_domain);
> +
> +	return 0;
>   }
>   
>   static bool
> @@ -4268,7 +3771,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>   {
>   	struct drm_i915_gem_busy *args = data;
>   	struct drm_i915_gem_object *obj;
> -	int ret;
> +	int ret, i;
>   
>   	ret = i915_mutex_lock_interruptible(dev);
>   	if (ret)
> @@ -4287,10 +3790,16 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>   	 */
>   	ret = i915_gem_object_flush_active(obj);
>   
> -	args->busy = obj->active;
> -	if (obj->ring) {
> -		BUILD_BUG_ON(I915_NUM_RINGS > 16);
> -		args->busy |= intel_ring_flag(obj->ring) << 16;
> +	args->busy = 0;
> +	if (obj->active) {
> +		BUILD_BUG_ON(I915_NUM_ENGINES > 16);
> +		args->busy |= 1;
> +		for (i = 0; i < I915_NUM_ENGINES; i++) {
> +			if (obj->last_read[i].request == NULL)
> +				continue;
> +
> +			args->busy |= 1 << (16 + i);
> +		}
>   	}
>   
>   	drm_gem_object_unreference(&obj->base);
> @@ -4299,11 +3808,58 @@ unlock:
>   	return ret;
>   }
>   
> +/* Throttle our rendering by waiting until the ring has completed our requests
> + * emitted over 20 msec ago.
> + *
> + * Note that if we were to use the current jiffies each time around the loop,
> + * we wouldn't escape the function with any frames outstanding if the time to
> + * render a frame was over 20ms.
> + *
> + * This should get us reasonable parallelism between CPU and GPU but also
> + * relatively low latency when blocking on a particular request to finish.
> + */
>   int
>   i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
> -			struct drm_file *file_priv)
> +			struct drm_file *file)
>   {
> -	return i915_gem_ring_throttle(dev, file_priv);
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
> +	struct i915_gem_request *rq, *tmp;
> +	int ret;
> +
> +	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
> +	if (ret)
> +		return ret;
> +
> +	/* used by legacy userspace to query whether the GPU is wedged */
> +	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +		return -EIO;
> +
> +	spin_lock(&file_priv->mm.lock);
> +	rq = NULL;
> +	list_for_each_entry(tmp, &file_priv->mm.request_list, client_list) {
> +		if (time_after_eq(tmp->emitted_jiffies, recent_enough))
> +			break;
> +		rq = tmp;
> +	}
> +	rq = i915_request_get(rq);
> +	spin_unlock(&file_priv->mm.lock);
> +
> +	if (rq != NULL) {
> +		if (rq->breadcrumb[rq->engine->id] == 0) {
> +			ret = i915_mutex_lock_interruptible(dev);
> +			if (ret == 0) {
> +				ret = i915_request_emit_breadcrumb(rq);
> +				mutex_unlock(&dev->struct_mutex);
> +			}
> +		}
> +		if (ret == 0)
> +			ret = __i915_request_wait(rq, true, NULL, NULL);
> +		i915_request_put__unlocked(rq);
> +	}
> +
> +	return ret;
>   }
>   
>   int
> @@ -4356,8 +3912,13 @@ unlock:
>   void i915_gem_object_init(struct drm_i915_gem_object *obj,
>   			  const struct drm_i915_gem_object_ops *ops)
>   {
> +	int i;
> +
>   	INIT_LIST_HEAD(&obj->global_list);
> -	INIT_LIST_HEAD(&obj->ring_list);
> +	INIT_LIST_HEAD(&obj->last_fence.engine_list);
> +	INIT_LIST_HEAD(&obj->last_write.engine_list);
> +	for (i = 0; i < I915_NUM_ENGINES; i++)
> +		INIT_LIST_HEAD(&obj->last_read[i].engine_list);
>   	INIT_LIST_HEAD(&obj->obj_exec_link);
>   	INIT_LIST_HEAD(&obj->vma_list);
>   
> @@ -4543,121 +4104,59 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
>   }
>   
>   static void
> -i915_gem_stop_ringbuffers(struct drm_device *dev)
> +i915_gem_cleanup_engines(struct drm_device *dev)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
>   	int i;
>   
> -	for_each_ring(ring, dev_priv, i)
> -		dev_priv->gt.stop_ring(ring);
> -}
> -
> -int
> -i915_gem_suspend(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret = 0;
> -
> -	mutex_lock(&dev->struct_mutex);
> -	if (dev_priv->ums.mm_suspended)
> -		goto err;
> -
> -	ret = i915_gpu_idle(dev);
> -	if (ret)
> -		goto err;
> -
> -	i915_gem_retire_requests(dev);
> -
> -	/* Under UMS, be paranoid and evict. */
> -	if (!drm_core_check_feature(dev, DRIVER_MODESET))
> -		i915_gem_evict_everything(dev);
> -
> -	i915_gem_stop_ringbuffers(dev);
> -
> -	/* Hack!  Don't let anybody do execbuf while we don't control the chip.
> -	 * We need to replace this with a semaphore, or something.
> -	 * And not confound ums.mm_suspended!
> -	 */
> -	dev_priv->ums.mm_suspended = !drm_core_check_feature(dev,
> -							     DRIVER_MODESET);
> -	mutex_unlock(&dev->struct_mutex);
> -
> -	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
> -	cancel_delayed_work_sync(&dev_priv->mm.retire_work);
> -	flush_delayed_work(&dev_priv->mm.idle_work);
> +	/* Not the regular for_each_engine so we can clean up a failed setup */
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		struct intel_engine_cs *engine = &to_i915(dev)->engine[i];
>   
> -	return 0;
> +		if (engine->i915 == NULL)
> +			continue;
>   
> -err:
> -	mutex_unlock(&dev->struct_mutex);
> -	return ret;
> +		intel_engine_cleanup(engine);
> +	}
>   }
>   
> -int i915_gem_l3_remap(struct intel_engine_cs *ring, int slice)
> +static int
> +i915_gem_resume_engines(struct drm_device *dev)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
> -	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
> +	struct intel_engine_cs *engine;
>   	int i, ret;
>   
> -	if (!HAS_L3_DPF(dev) || !remap_info)
> -		return 0;
> -
> -	ret = intel_ring_begin(ring, GEN7_L3LOG_SIZE / 4 * 3);
> -	if (ret)
> -		return ret;
> -
> -	/*
> -	 * Note: We do not worry about the concurrent register cacheline hang
> -	 * here because no other code should access these registers other than
> -	 * at initialization time.
> -	 */
> -	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> -		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -		intel_ring_emit(ring, reg_base + i);
> -		intel_ring_emit(ring, remap_info[i/4]);
> +	for_each_engine(engine, to_i915(dev), i) {
> +		ret = intel_engine_resume(engine);
> +		if (ret)
> +			return ret;
>   	}
>   
> -	intel_ring_advance(ring);
> -
> -	return ret;
> +	return 0;
>   }
>   
> -void i915_gem_init_swizzling(struct drm_device *dev)
> +static int
> +i915_gem_suspend_engines(struct drm_device *dev)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	if (INTEL_INFO(dev)->gen < 5 ||
> -	    dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_NONE)
> -		return;
> -
> -	I915_WRITE(DISP_ARB_CTL, I915_READ(DISP_ARB_CTL) |
> -				 DISP_TILE_SURFACE_SWIZZLING);
> +	struct intel_engine_cs *engine;
> +	int i, ret;
>   
> -	if (IS_GEN5(dev))
> -		return;
> +	for_each_engine(engine, to_i915(dev), i) {
> +		ret = intel_engine_suspend(engine);
> +		if (ret)
> +			return ret;
> +	}
>   
> -	I915_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_SWZCTL);
> -	if (IS_GEN6(dev))
> -		I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_SNB));
> -	else if (IS_GEN7(dev))
> -		I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_IVB));
> -	else if (IS_GEN8(dev))
> -		I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_BDW));
> -	else
> -		BUG();
> +	return 0;
>   }
>   
>   static bool
> -intel_enable_blt(struct drm_device *dev)
> +intel_enable_blt(struct drm_i915_private *dev_priv)
>   {
> -	if (!HAS_BLT(dev))
> +	if (!HAS_BLT(dev_priv))
>   		return false;
>   
>   	/* The blitter was dysfunctional on early prototypes */
> -	if (IS_GEN6(dev) && dev->pdev->revision < 8) {
> +	if (IS_GEN6(dev_priv) && dev_priv->dev->pdev->revision < 8) {
>   		DRM_INFO("BLT not supported on this pre-production hardware;"
>   			 " graphics performance will be degraded.\n");
>   		return false;
> @@ -4666,34 +4165,32 @@ intel_enable_blt(struct drm_device *dev)
>   	return true;
>   }
>   
> -static void init_unused_ring(struct drm_device *dev, u32 base)
> +static void stop_unused_ring(struct drm_i915_private *dev_priv, u32 base)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
>   	I915_WRITE(RING_CTL(base), 0);
>   	I915_WRITE(RING_HEAD(base), 0);
>   	I915_WRITE(RING_TAIL(base), 0);
>   	I915_WRITE(RING_START(base), 0);
>   }
>   
> -static void init_unused_rings(struct drm_device *dev)
> +static void stop_unused_rings(struct drm_i915_private *dev_priv)
>   {
> -	if (IS_I830(dev)) {
> -		init_unused_ring(dev, PRB1_BASE);
> -		init_unused_ring(dev, SRB0_BASE);
> -		init_unused_ring(dev, SRB1_BASE);
> -		init_unused_ring(dev, SRB2_BASE);
> -		init_unused_ring(dev, SRB3_BASE);
> -	} else if (IS_GEN2(dev)) {
> -		init_unused_ring(dev, SRB0_BASE);
> -		init_unused_ring(dev, SRB1_BASE);
> -	} else if (IS_GEN3(dev)) {
> -		init_unused_ring(dev, PRB1_BASE);
> -		init_unused_ring(dev, PRB2_BASE);
> +	if (IS_I830(dev_priv)) {
> +		stop_unused_ring(dev_priv, PRB1_BASE);
> +		stop_unused_ring(dev_priv, SRB0_BASE);
> +		stop_unused_ring(dev_priv, SRB1_BASE);
> +		stop_unused_ring(dev_priv, SRB2_BASE);
> +		stop_unused_ring(dev_priv, SRB3_BASE);
> +	} else if (IS_GEN2(dev_priv)) {
> +		stop_unused_ring(dev_priv, SRB0_BASE);
> +		stop_unused_ring(dev_priv, SRB1_BASE);
> +	} else if (IS_GEN3(dev_priv)) {
> +		stop_unused_ring(dev_priv, PRB1_BASE);
> +		stop_unused_ring(dev_priv, PRB2_BASE);
>   	}
>   }
>   
> -int i915_gem_init_rings(struct drm_device *dev)
> +static int i915_gem_setup_engines(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	int ret;
> @@ -4704,61 +4201,116 @@ int i915_gem_init_rings(struct drm_device *dev)
>   	 * will prevent c3 entry. Makes sure all unused rings
>   	 * are totally idle.
>   	 */
> -	init_unused_rings(dev);
> +	stop_unused_rings(dev_priv);
>   
> -	ret = intel_init_render_ring_buffer(dev);
> +	ret = intel_init_render_engine(dev_priv);
>   	if (ret)
> -		return ret;
> +		goto cleanup;
>   
> -	if (HAS_BSD(dev)) {
> -		ret = intel_init_bsd_ring_buffer(dev);
> +	if (HAS_BSD(dev_priv)) {
> +		ret = intel_init_bsd_engine(dev_priv);
>   		if (ret)
> -			goto cleanup_render_ring;
> +			goto cleanup;
>   	}
>   
> -	if (intel_enable_blt(dev)) {
> -		ret = intel_init_blt_ring_buffer(dev);
> +	if (intel_enable_blt(dev_priv)) {
> +		ret = intel_init_blt_engine(dev_priv);
>   		if (ret)
> -			goto cleanup_bsd_ring;
> +			goto cleanup;
>   	}
>   
> -	if (HAS_VEBOX(dev)) {
> -		ret = intel_init_vebox_ring_buffer(dev);
> +	if (HAS_VEBOX(dev_priv)) {
> +		ret = intel_init_vebox_engine(dev_priv);
>   		if (ret)
> -			goto cleanup_blt_ring;
> +			goto cleanup;
>   	}
>   
> -	if (HAS_BSD2(dev)) {
> -		ret = intel_init_bsd2_ring_buffer(dev);
> +	if (HAS_BSD2(dev_priv)) {
> +		ret = intel_init_bsd2_engine(dev_priv);
>   		if (ret)
> -			goto cleanup_vebox_ring;
> +			goto cleanup;
>   	}
>   
> -	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
> +	return 0;
> +
> +cleanup:
> +	i915_gem_cleanup_engines(dev);
> +	return ret;
> +}
> +
> +int
> +i915_gem_suspend(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret = 0;
> +
> +	mutex_lock(&dev->struct_mutex);
> +	if (dev_priv->ums.mm_suspended)
> +		goto err;
> +
> +	ret = i915_gpu_idle(dev);
>   	if (ret)
> -		goto cleanup_bsd2_ring;
> +		goto err;
>   
> -	return 0;
> +	i915_gem_retire_requests(dev);
> +
> +	/* Under UMS, be paranoid and evict. */
> +	if (!drm_core_check_feature(dev, DRIVER_MODESET))
> +		i915_gem_evict_everything(dev);
> +
> +	ret = i915_gem_suspend_engines(dev);
> +	if (ret)
> +		goto err;
> +
> +	/* Hack!  Don't let anybody do execbuf while we don't control the chip.
> +	 * We need to replace this with a semaphore, or something.
> +	 * And not confound ums.mm_suspended!
> +	 */
> +	dev_priv->ums.mm_suspended = !drm_core_check_feature(dev,
> +							     DRIVER_MODESET);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
> +	cancel_delayed_work_sync(&dev_priv->mm.retire_work);
> +	flush_delayed_work(&dev_priv->mm.idle_work);
>   
> -cleanup_bsd2_ring:
> -	intel_cleanup_ring_buffer(&dev_priv->ring[VCS2]);
> -cleanup_vebox_ring:
> -	intel_cleanup_ring_buffer(&dev_priv->ring[VECS]);
> -cleanup_blt_ring:
> -	intel_cleanup_ring_buffer(&dev_priv->ring[BCS]);
> -cleanup_bsd_ring:
> -	intel_cleanup_ring_buffer(&dev_priv->ring[VCS]);
> -cleanup_render_ring:
> -	intel_cleanup_ring_buffer(&dev_priv->ring[RCS]);
> +	return 0;
>   
> +err:
> +	mutex_unlock(&dev->struct_mutex);
>   	return ret;
>   }
>   
> +void i915_gem_init_swizzling(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	if (INTEL_INFO(dev)->gen < 5 ||
> +	    dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_NONE)
> +		return;
> +
> +	I915_WRITE(DISP_ARB_CTL, I915_READ(DISP_ARB_CTL) |
> +				 DISP_TILE_SURFACE_SWIZZLING);
> +
> +	if (IS_GEN5(dev))
> +		return;
> +
> +	I915_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_SWZCTL);
> +	if (IS_GEN6(dev))
> +		I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_SNB));
> +	else if (IS_GEN7(dev))
> +		I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_IVB));
> +	else if (IS_GEN8(dev))
> +		I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_BDW));
> +	else
> +		BUG();
> +}
> +
>   int
>   i915_gem_init_hw(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret, i;
> +	int ret;
>   
>   	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>   		return -EIO;
> @@ -4784,33 +4336,11 @@ i915_gem_init_hw(struct drm_device *dev)
>   
>   	i915_gem_init_swizzling(dev);
>   
> -	ret = dev_priv->gt.init_rings(dev);
> -	if (ret)
> -		return ret;
> -
> -	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> -		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> -
> -	/*
> -	 * XXX: Contexts should only be initialized once. Doing a switch to the
> -	 * default context switch however is something we'd like to do after
> -	 * reset or thaw (the latter may not actually be necessary for HW, but
> -	 * goes with our code better). Context switching requires rings (for
> -	 * the do_switch), but before enabling PPGTT. So don't move this.
> -	 */
> -	ret = i915_gem_context_enable(dev_priv);
> -	if (ret && ret != -EIO) {
> -		DRM_ERROR("Context enable failed %d\n", ret);
> -		i915_gem_cleanup_ringbuffer(dev);
> -
> -		return ret;
> -	}
> -
>   	ret = i915_ppgtt_init_hw(dev);
> -	if (ret && ret != -EIO) {
> -		DRM_ERROR("PPGTT enable failed %d\n", ret);
> -		i915_gem_cleanup_ringbuffer(dev);
> -	}
> +	if (ret == 0)
> +		ret = i915_gem_context_enable(dev_priv);
> +	if (ret == 0)
> +		ret = i915_gem_resume_engines(dev);
>   
>   	return ret;
>   }
> @@ -4820,9 +4350,6 @@ int i915_gem_init(struct drm_device *dev)
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	int ret;
>   
> -	i915.enable_execlists = intel_sanitize_enable_execlists(dev,
> -			i915.enable_execlists);
> -
>   	mutex_lock(&dev->struct_mutex);
>   
>   	if (IS_VALLEYVIEW(dev)) {
> @@ -4833,18 +4360,6 @@ int i915_gem_init(struct drm_device *dev)
>   			DRM_DEBUG_DRIVER("allow wake ack timed out\n");
>   	}
>   
> -	if (!i915.enable_execlists) {
> -		dev_priv->gt.do_execbuf = i915_gem_ringbuffer_submission;
> -		dev_priv->gt.init_rings = i915_gem_init_rings;
> -		dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
> -		dev_priv->gt.stop_ring = intel_stop_ring_buffer;
> -	} else {
> -		dev_priv->gt.do_execbuf = intel_execlists_submission;
> -		dev_priv->gt.init_rings = intel_logical_rings_init;
> -		dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
> -		dev_priv->gt.stop_ring = intel_logical_ring_stop;
> -	}
> -
>   	ret = i915_gem_init_userptr(dev);
>   	if (ret) {
>   		mutex_unlock(&dev->struct_mutex);
> @@ -4853,13 +4368,12 @@ int i915_gem_init(struct drm_device *dev)
>   
>   	i915_gem_init_global_gtt(dev);
>   
> -	ret = i915_gem_context_init(dev);
> -	if (ret) {
> -		mutex_unlock(&dev->struct_mutex);
> -		return ret;
> -	}
> -
> -	ret = i915_gem_init_hw(dev);
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	ret = i915_gem_setup_engines(dev);
> +	if (ret == 0)
> +		ret = i915_gem_context_init(dev);
> +	if (ret == 0)
> +		ret = i915_gem_init_hw(dev);
>   	if (ret == -EIO) {
>   		/* Allow ring initialisation to fail by marking the GPU as
>   		 * wedged. But we only want to do this where the GPU is angry,
> @@ -4869,20 +4383,16 @@ int i915_gem_init(struct drm_device *dev)
>   		atomic_set_mask(I915_WEDGED, &dev_priv->gpu_error.reset_counter);
>   		ret = 0;
>   	}
> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>   	mutex_unlock(&dev->struct_mutex);
>   
>   	return ret;
>   }
>   
> -void
> -i915_gem_cleanup_ringbuffer(struct drm_device *dev)
> +void i915_gem_fini(struct drm_device *dev)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	int i;
> -
> -	for_each_ring(ring, dev_priv, i)
> -		dev_priv->gt.cleanup_ring(ring);
> +	i915_gem_context_fini(dev);
> +	i915_gem_cleanup_engines(dev);
>   }
>   
>   int
> @@ -4901,26 +4411,12 @@ i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
>   	}
>   
>   	mutex_lock(&dev->struct_mutex);
> -	dev_priv->ums.mm_suspended = 0;
> -
>   	ret = i915_gem_init_hw(dev);
> -	if (ret != 0) {
> -		mutex_unlock(&dev->struct_mutex);
> -		return ret;
> -	}
> -
> +	if (ret == 0)
> +		ret = drm_irq_install(dev, dev->pdev->irq);
> +	if (ret == 0)
> +		dev_priv->ums.mm_suspended = 0;
>   	BUG_ON(!list_empty(&dev_priv->gtt.base.active_list));
> -
> -	ret = drm_irq_install(dev, dev->pdev->irq);
> -	if (ret)
> -		goto cleanup_ringbuffer;
> -	mutex_unlock(&dev->struct_mutex);
> -
> -	return 0;
> -
> -cleanup_ringbuffer:
> -	i915_gem_cleanup_ringbuffer(dev);
> -	dev_priv->ums.mm_suspended = 1;
>   	mutex_unlock(&dev->struct_mutex);
>   
>   	return ret;
> @@ -4954,10 +4450,13 @@ i915_gem_lastclose(struct drm_device *dev)
>   }
>   
>   static void
> -init_ring_lists(struct intel_engine_cs *ring)
> +init_null_engine(struct intel_engine_cs *engine)
>   {
> -	INIT_LIST_HEAD(&ring->active_list);
> -	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&engine->read_list);
> +	INIT_LIST_HEAD(&engine->write_list);
> +	INIT_LIST_HEAD(&engine->fence_list);
> +	INIT_LIST_HEAD(&engine->requests);
> +	INIT_LIST_HEAD(&engine->rings);
>   }
>   
>   void i915_init_vm(struct drm_i915_private *dev_priv,
> @@ -4991,8 +4490,8 @@ i915_gem_load(struct drm_device *dev)
>   	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
>   	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
>   	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
> -	for (i = 0; i < I915_NUM_RINGS; i++)
> -		init_ring_lists(&dev_priv->ring[i]);
> +	for (i = 0; i < I915_NUM_ENGINES; i++)
> +		init_null_engine(&dev_priv->engine[i]);
>   	for (i = 0; i < I915_MAX_NUM_FENCES; i++)
>   		INIT_LIST_HEAD(&dev_priv->fence_regs[i].lru_list);
>   	INIT_DELAYED_WORK(&dev_priv->mm.retire_work,
> @@ -5052,13 +4551,13 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
>   	 */
>   	spin_lock(&file_priv->mm.lock);
>   	while (!list_empty(&file_priv->mm.request_list)) {
> -		struct drm_i915_gem_request *request;
> +		struct i915_gem_request *rq;
>   
> -		request = list_first_entry(&file_priv->mm.request_list,
> -					   struct drm_i915_gem_request,
> -					   client_list);
> -		list_del(&request->client_list);
> -		request->file_priv = NULL;
> +		rq = list_first_entry(&file_priv->mm.request_list,
> +				      struct i915_gem_request,
> +				      client_list);
> +		list_del(&rq->client_list);
> +		rq->file_priv = NULL;
>   	}
>   	spin_unlock(&file_priv->mm.lock);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 958d2cfad61a..c9b2a12be660 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -96,9 +96,9 @@
>   #define GEN6_CONTEXT_ALIGN (64<<10)
>   #define GEN7_CONTEXT_ALIGN 4096
>   
> -static size_t get_context_alignment(struct drm_device *dev)
> +static size_t get_context_alignment(struct drm_i915_private *i915)
>   {
> -	if (IS_GEN6(dev))
> +	if (IS_GEN6(i915))
>   		return GEN6_CONTEXT_ALIGN;
>   
>   	return GEN7_CONTEXT_ALIGN;
> @@ -111,6 +111,9 @@ static int get_context_size(struct drm_device *dev)
>   	u32 reg;
>   
>   	switch (INTEL_INFO(dev)->gen) {
> +	case 5:
> +		ret = ILK_CXT_TOTAL_SIZE;
> +		break;
>   	case 6:
>   		reg = I915_READ(CXT_SIZE);
>   		ret = GEN6_CXT_TOTAL_SIZE(reg) * 64;
> @@ -134,16 +137,22 @@ static int get_context_size(struct drm_device *dev)
>   
>   void i915_gem_context_free(struct kref *ctx_ref)
>   {
> -	struct intel_context *ctx = container_of(ctx_ref,
> -						 typeof(*ctx), ref);
> -
> -	if (i915.enable_execlists)
> -		intel_lr_context_free(ctx);
> +	struct intel_context *ctx =
> +		container_of(ctx_ref, typeof(*ctx), ref);
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +	int i;
>   
>   	i915_ppgtt_put(ctx->ppgtt);
>   
> -	if (ctx->legacy_hw_ctx.rcs_state)
> -		drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		if (intel_engine_initialized(&dev_priv->engine[i]) &&
> +		    ctx->ring[i].ring != NULL)
> +			dev_priv->engine[i].put_ring(ctx->ring[i].ring, ctx);
> +
> +		if (ctx->ring[i].state != NULL)
> +			drm_gem_object_unreference(&ctx->ring[i].state->base);
> +	}
> +
>   	list_del(&ctx->link);
>   	kfree(ctx);
>   }
> @@ -192,15 +201,16 @@ __create_hw_context(struct drm_device *dev,
>   
>   	kref_init(&ctx->ref);
>   	list_add_tail(&ctx->link, &dev_priv->context_list);
> +	ctx->i915 = dev_priv;
>   
>   	if (dev_priv->hw_context_size) {
>   		struct drm_i915_gem_object *obj =
> -				i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
> +			i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
>   		if (IS_ERR(obj)) {
>   			ret = PTR_ERR(obj);
>   			goto err_out;
>   		}
> -		ctx->legacy_hw_ctx.rcs_state = obj;
> +		ctx->ring[RCS].state = obj;
>   	}
>   
>   	/* Default context will never have a file_priv */
> @@ -228,18 +238,11 @@ err_out:
>   	return ERR_PTR(ret);
>   }
>   
> -/**
> - * The default context needs to exist per ring that uses contexts. It stores the
> - * context state of the GPU for applications that don't utilize HW contexts, as
> - * well as an idle case.
> - */
>   static struct intel_context *
>   i915_gem_create_context(struct drm_device *dev,
>   			struct drm_i915_file_private *file_priv)
>   {
> -	const bool is_global_default_ctx = file_priv == NULL;
>   	struct intel_context *ctx;
> -	int ret = 0;
>   
>   	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>   
> @@ -247,82 +250,29 @@ i915_gem_create_context(struct drm_device *dev,
>   	if (IS_ERR(ctx))
>   		return ctx;
>   
> -	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
> -		/* We may need to do things with the shrinker which
> -		 * require us to immediately switch back to the default
> -		 * context. This can cause a problem as pinning the
> -		 * default context also requires GTT space which may not
> -		 * be available. To avoid this we always pin the default
> -		 * context.
> -		 */
> -		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
> -					    get_context_alignment(dev), 0);
> -		if (ret) {
> -			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
> -			goto err_destroy;
> -		}
> -	}
> -
>   	if (USES_FULL_PPGTT(dev)) {
>   		struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
>   
>   		if (IS_ERR_OR_NULL(ppgtt)) {
>   			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
>   					 PTR_ERR(ppgtt));
> -			ret = PTR_ERR(ppgtt);
> -			goto err_unpin;
> +			i915_gem_context_unreference(ctx);
> +			return ERR_CAST(ppgtt);
>   		}
>   
>   		ctx->ppgtt = ppgtt;
>   	}
>   
>   	return ctx;
> -
> -err_unpin:
> -	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state)
> -		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
> -err_destroy:
> -	i915_gem_context_unreference(ctx);
> -	return ERR_PTR(ret);
> -}
> -
> -void i915_gem_context_reset(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int i;
> -
> -	/* In execlists mode we will unreference the context when the execlist
> -	 * queue is cleared and the requests destroyed.
> -	 */
> -	if (i915.enable_execlists)
> -		return;
> -
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -		struct intel_context *lctx = ring->last_context;
> -
> -		if (lctx) {
> -			if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> -				i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
> -
> -			i915_gem_context_unreference(lctx);
> -			ring->last_context = NULL;
> -		}
> -	}
>   }
>   
>   int i915_gem_context_init(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_context *ctx;
> -	int i;
> -
> -	/* Init should only be called once per module load. Eventually the
> -	 * restriction on the context_disabled check can be loosened. */
> -	if (WARN_ON(dev_priv->ring[RCS].default_context))
> -		return 0;
> +	int i, ret;
>   
> -	if (i915.enable_execlists) {
> +	if (RCS_ENGINE(dev_priv)->execlists_enabled) {
>   		/* NB: intentionally left blank. We will allocate our own
>   		 * backing objects as we need them, thank you very much */
>   		dev_priv->hw_context_size = 0;
> @@ -335,83 +285,112 @@ int i915_gem_context_init(struct drm_device *dev)
>   		}
>   	}
>   
> -	ctx = i915_gem_create_context(dev, NULL);
> +	/*
> +	 * The default context needs to exist per ring that uses contexts.
> +	 * It stores the context state of the GPU for applications that don't
> +	 * utilize HW contexts or per-process VM, as well as an idle case.
> +	 */
> +	ctx = __create_hw_context(dev, NULL);
>   	if (IS_ERR(ctx)) {
>   		DRM_ERROR("Failed to create default global context (error %ld)\n",
>   			  PTR_ERR(ctx));
>   		return PTR_ERR(ctx);
>   	}
>   
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct intel_engine_cs *ring = &dev_priv->ring[i];
> +	if (dev_priv->hw_context_size) {
> +		/* We may need to do things with the shrinker which
> +		 * require us to immediately switch back to the default
> +		 * context. This can cause a problem as pinning the
> +		 * default context also requires GTT space which may not
> +		 * be available. To avoid this we always pin the default
> +		 * context.
> +		 */
> +		ret = i915_gem_obj_ggtt_pin(ctx->ring[RCS].state,
> +					    get_context_alignment(dev_priv), 0);
> +		if (ret) {
> +			DRM_ERROR("Failed to pin global default context\n");
> +			i915_gem_context_unreference(ctx);
> +			return ret;
> +		}
> +	}
>   
> -		/* NB: RCS will hold a ref for all rings */
> -		ring->default_context = ctx;
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		struct intel_engine_cs *engine = &dev_priv->engine[i];
> +
> +		if (engine->i915 == NULL)
> +			continue;
> +
> +		engine->default_context = ctx;
> +		i915_gem_context_reference(ctx);
>   	}
>   
> +	dev_priv->default_context = ctx;
> +
>   	DRM_DEBUG_DRIVER("%s context support initialized\n",
> -			i915.enable_execlists ? "LR" :
> -			dev_priv->hw_context_size ? "HW" : "fake");
> +			 RCS_ENGINE(dev_priv)->execlists_enabled ? "LR" :
> +			 dev_priv->hw_context_size ? "HW" : "fake");
>   	return 0;
>   }
>   
>   void i915_gem_context_fini(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_context *dctx = dev_priv->ring[RCS].default_context;
> +	struct intel_engine_cs *engine;
>   	int i;
>   
> -	if (dctx->legacy_hw_ctx.rcs_state) {
> +	if (dev_priv->hw_context_size)
>   		/* The only known way to stop the gpu from accessing the hw context is
>   		 * to reset it. Do this as the very last operation to avoid confusing
>   		 * other code, leading to spurious errors. */
>   		intel_gpu_reset(dev);
>   
> -		/* When default context is created and switched to, base object refcount
> -		 * will be 2 (+1 from object creation and +1 from do_switch()).
> -		 * i915_gem_context_fini() will be called after gpu_idle() has switched
> -		 * to default context. So we need to unreference the base object once
> -		 * to offset the do_switch part, so that i915_gem_context_unreference()
> -		 * can then free the base object correctly. */
> -		WARN_ON(!dev_priv->ring[RCS].last_context);
> -		if (dev_priv->ring[RCS].last_context == dctx) {
> -			/* Fake switch to NULL context */
> -			WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
> -			i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> -			i915_gem_context_unreference(dctx);
> -			dev_priv->ring[RCS].last_context = NULL;
> -		}
> -
> -		i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> +	for_each_engine(engine, dev_priv, i) {
> +		i915_gem_context_unreference(engine->default_context);
> +		engine->default_context = NULL;
>   	}
>   
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -
> -		if (ring->last_context)
> -			i915_gem_context_unreference(ring->last_context);
> -
> -		ring->default_context = NULL;
> -		ring->last_context = NULL;
> +	if (dev_priv->default_context) {
> +		if (dev_priv->hw_context_size)
> +			i915_gem_object_ggtt_unpin(dev_priv->default_context->ring[RCS].state);
> +		i915_gem_context_unreference(dev_priv->default_context);
> +		dev_priv->default_context = NULL;
>   	}
> -
> -	i915_gem_context_unreference(dctx);
>   }
>   
>   int i915_gem_context_enable(struct drm_i915_private *dev_priv)
>   {
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int ret, i;
>   
> -	BUG_ON(!dev_priv->ring[RCS].default_context);
> +	for_each_engine(engine, dev_priv, i) {
> +		struct intel_context *ctx = engine->default_context;
> +		struct i915_gem_request *rq;
>   
> -	if (i915.enable_execlists)
> -		return 0;
> +		if (HAS_L3_DPF(dev_priv))
> +			ctx->remap_slice = (1 << NUM_L3_SLICES(dev_priv)) - 1;
>   
> -	for_each_ring(ring, dev_priv, i) {
> -		ret = i915_switch_context(ring, ring->default_context);
> -		if (ret)
> +		rq = intel_engine_alloc_request(engine, ctx);
> +		if (IS_ERR(rq)) {
> +			ret = PTR_ERR(rq);
> +			goto err;
> +		}
> +
> +		ret = 0;
> +		/*
> +		 * Workarounds applied in this fn are part of register state context,
> +		 * Workarounds applied in this fn are part of the register state
> +		 * context; they need to be re-initialized following a GPU reset,
> +		 * suspend/resume or module reload.
> +		if (engine->init_context)
> +			ret = engine->init_context(rq);
> +		if (ret == 0)
> +			ret = i915_request_commit(rq);
> +		i915_request_put(rq);
> +		if (ret) {
> +err:
> +			DRM_ERROR("failed to enable contexts (%s): %d\n", engine->name, ret);
>   			return ret;
> +		}
>   	}
>   
>   	return 0;
> @@ -421,7 +400,9 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   {
>   	struct intel_context *ctx = p;
>   
> +	ctx->file_priv = NULL;
>   	i915_gem_context_unreference(ctx);
> +
>   	return 0;
>   }
>   
> @@ -465,41 +446,48 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>   }
>   
>   static inline int
> -mi_set_context(struct intel_engine_cs *ring,
> -	       struct intel_context *new_context,
> -	       u32 hw_flags)
> +mi_set_context(struct i915_gem_request *rq,
> +	       struct intel_engine_context *new_context,
> +	       u32 flags)
>   {
> -	u32 flags = hw_flags | MI_MM_SPACE_GTT;
> -	int ret;
> +	struct intel_ringbuffer *ring;
> +	int len;
>   
>   	/* w/a: If Flush TLB Invalidation Mode is enabled, driver must do a TLB
>   	 * invalidation prior to MI_SET_CONTEXT. On GEN6 we don't set the value
> -	 * explicitly, so we rely on the value at ring init, stored in
> +	 * explicitly, so we rely on the value at engine init, stored in
>   	 * itlb_before_ctx_switch.
>   	 */
> -	if (IS_GEN6(ring->dev)) {
> -		ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, 0);
> -		if (ret)
> -			return ret;
> -	}
> +	if (IS_GEN6(rq->i915))
> +		rq->pending_flush |= I915_INVALIDATE_CACHES;
>   
> -	/* These flags are for resource streamer on HSW+ */
> -	if (!IS_HASWELL(ring->dev) && INTEL_INFO(ring->dev)->gen < 8)
> -		flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> +	len = 3;
> +	switch (INTEL_INFO(rq->i915)->gen) {
> +	case 8:
> +	case 7:
> +	case 5: len += 2;
> +		break;
> +	}
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, len);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
> -	if (INTEL_INFO(ring->dev)->gen >= 7)
> +	switch (INTEL_INFO(rq->i915)->gen) {
> +	case 8:
> +	case 7:
> +		/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
>   		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
> -	else
> -		intel_ring_emit(ring, MI_NOOP);
> +		break;
> +	case 5:
> +		intel_ring_emit(ring, MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN);
> +		break;
> +	}
>   
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_emit(ring, MI_SET_CONTEXT);
> -	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->legacy_hw_ctx.rcs_state) |
> +	intel_ring_emit(ring,
> +			i915_gem_obj_ggtt_offset(new_context->state) |
> +			MI_MM_SPACE_GTT |
>   			flags);
>   	/*
>   	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
> @@ -507,60 +495,106 @@ mi_set_context(struct intel_engine_cs *ring,
>   	 */
>   	intel_ring_emit(ring, MI_NOOP);
>   
> -	if (INTEL_INFO(ring->dev)->gen >= 7)
> +	switch (INTEL_INFO(rq->i915)->gen) {
> +	case 8:
> +	case 7:
>   		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> -	else
> -		intel_ring_emit(ring, MI_NOOP);
> +		break;
> +	case 5:
> +		intel_ring_emit(ring, MI_SUSPEND_FLUSH);
> +		break;
> +	}
>   
>   	intel_ring_advance(ring);
>   
> -	return ret;
> +	rq->pending_flush &= ~I915_COMMAND_BARRIER;
> +	return 0;
>   }
>   
> -static int do_switch(struct intel_engine_cs *ring,
> -		     struct intel_context *to)
> +static int l3_remap(struct i915_gem_request *rq, int slice)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	struct intel_context *from = ring->last_context;
> +	const u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
> +	const u32 *remap_info;
> +	struct intel_ringbuffer *ring;
> +	int i;
> +
> +	remap_info = rq->i915->l3_parity.remap_info[slice];
> +	if (remap_info == NULL)
> +		return 0;
> +
> +	ring = intel_ring_begin(rq, GEN7_L3LOG_SIZE / 4 * 3);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
> +
> +	/*
> +	 * Note: We do not worry about the concurrent register cacheline hang
> +	 * here because no other code should access these registers other than
> +	 * at initialization time.
> +	 */
> +	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> +		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> +		intel_ring_emit(ring, reg_base + i);
> +		intel_ring_emit(ring, remap_info[i/4]);
> +	}
> +
> +	intel_ring_advance(ring);
> +	return 0;
> +}
> +
> +/**
> + * i915_request_switch_context() - perform a GPU context switch.
> + * @rq: request and ring/ctx for which we'll execute the context switch
> + *
> + * The context life cycle is simple. The context refcount is incremented and
> + * decremented by 1 and create and destroy. If the context is in use by the GPU,
> + * it will have a refoucnt > 1. This allows us to destroy the context abstract
> + * object while letting the normal object tracking destroy the backing BO.
> + */
> +int i915_request_switch_context(struct i915_gem_request *rq)
> +{
> +	struct intel_context *to = rq->ctx;
> +	struct intel_engine_context *ctx = &to->ring[rq->engine->id];
> +	struct intel_context *from;
>   	u32 hw_flags = 0;
> -	bool uninitialized = false;
>   	int ret, i;
>   
> -	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> -		BUG_ON(from->legacy_hw_ctx.rcs_state == NULL);
> -		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
> -	}
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (ctx->state == NULL)
> +		return 0;
>   
> -	if (from == to && !to->remap_slice)
> +	if (rq->ring->last_context == to && !to->remap_slice)
>   		return 0;
>   
>   	/* Trying to pin first makes error handling easier. */
> -	if (ring == &dev_priv->ring[RCS]) {
> -		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
> -					    get_context_alignment(ring->dev), 0);
> -		if (ret)
> -			return ret;
> -	}
> +	ret = i915_gem_obj_ggtt_pin(ctx->state,
> +				    get_context_alignment(rq->i915), 0);
> +	if (ret)
> +		return ret;
>   
>   	/*
>   	 * Pin can switch back to the default context if we end up calling into
>   	 * evict_everything - as a last ditch gtt defrag effort that also
>   	 * switches to the default context. Hence we need to reload from here.
>   	 */
> -	from = ring->last_context;
> +	from = rq->ring->last_context;
> +
> +	/* With execlists enabled, the ring, vm and logical state are
> +	 * interwined and we do not need to explicitly load the mm or
> +	 * logical state as it is loaded along with the LRCA.
> +	 *
> +	 * But we still want to pin the state (for global usage tracking)
> +	 * whilst in use and reload the l3 mapping if it has changed.
> +	 */
> +	if (rq->engine->execlists_enabled)
> +		goto load_l3_map;
>   
>   	if (to->ppgtt) {
> -		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
> +		ret = to->ppgtt->switch_mm(rq, to->ppgtt);
>   		if (ret)
>   			goto unpin_out;
>   	}
>   
> -	if (ring != &dev_priv->ring[RCS]) {
> -		if (from)
> -			i915_gem_context_unreference(from);
> -		goto done;
> -	}
> -
>   	/*
>   	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
>   	 * that thanks to write = false in this call and us not setting any gpu
> @@ -569,33 +603,39 @@ static int do_switch(struct intel_engine_cs *ring,
>   	 *
>   	 * XXX: We need a real interface to do this instead of trickery.
>   	 */
> -	ret = i915_gem_object_set_to_gtt_domain(to->legacy_hw_ctx.rcs_state, false);
> +	ret = i915_gem_object_set_to_gtt_domain(ctx->state, false);
>   	if (ret)
>   		goto unpin_out;
>   
> -	if (!to->legacy_hw_ctx.rcs_state->has_global_gtt_mapping) {
> -		struct i915_vma *vma = i915_gem_obj_to_vma(to->legacy_hw_ctx.rcs_state,
> -							   &dev_priv->gtt.base);
> -		vma->bind_vma(vma, to->legacy_hw_ctx.rcs_state->cache_level, GLOBAL_BIND);
> +	if (!ctx->state->has_global_gtt_mapping) {
> +		struct i915_vma *vma = i915_gem_obj_to_vma(ctx->state,
> +							   &rq->i915->gtt.base);
> +		vma->bind_vma(vma, ctx->state->cache_level, GLOBAL_BIND);
>   	}
>   
> -	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
> +	if (!ctx->initialized || i915_gem_context_is_default(to))
>   		hw_flags |= MI_RESTORE_INHIBIT;
>   
> -	ret = mi_set_context(ring, to, hw_flags);
> +	/* These flags are for resource streamer on HSW+ */
> +	if (!IS_HASWELL(rq->i915) && INTEL_INFO(rq->i915)->gen < 8) {
> +		if (ctx->initialized)
> +			hw_flags |= MI_RESTORE_EXT_STATE_EN;
> +		hw_flags |= MI_SAVE_EXT_STATE_EN;
> +	}
> +
> +	trace_i915_gem_ring_switch_context(rq->engine, to, hw_flags);
> +	ret = mi_set_context(rq, ctx, hw_flags);
>   	if (ret)
>   		goto unpin_out;
>   
> +load_l3_map:
>   	for (i = 0; i < MAX_L3_SLICES; i++) {
>   		if (!(to->remap_slice & (1<<i)))
>   			continue;
>   
> -		ret = i915_gem_l3_remap(ring, i);
>   		/* If it failed, try again next round */
> -		if (ret)
> -			DRM_DEBUG_DRIVER("L3 remapping failed\n");
> -		else
> -			to->remap_slice &= ~(1<<i);
> +		if (l3_remap(rq, i) == 0)
> +			rq->remap_l3 |= 1 << i;
>   	}
>   
>   	/* The backing object for the context is done after switching to the
> @@ -605,8 +645,16 @@ static int do_switch(struct intel_engine_cs *ring,
>   	 * MI_SET_CONTEXT instead of when the next seqno has completed.
>   	 */
>   	if (from != NULL) {
> -		from->legacy_hw_ctx.rcs_state->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> -		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->legacy_hw_ctx.rcs_state), ring);
> +		struct drm_i915_gem_object *from_obj = from->ring[rq->engine->id].state;
> +
> +		from_obj->base.pending_read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> +		/* obj is kept alive until the next request by its active ref */
> +		ret = i915_request_add_vma(rq,
> +					   i915_gem_obj_to_ggtt(from_obj),
> +					   0);
> +		if (ret)
> +			goto unpin_out;
> +
>   		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
>   		 * whole damn pipeline, we don't need to explicitly mark the
>   		 * object dirty. The only exception is that the context must be
> @@ -614,79 +662,61 @@ static int do_switch(struct intel_engine_cs *ring,
>   		 * able to defer doing this until we know the object would be
>   		 * swapped, but there is no way to do that yet.
>   		 */
> -		from->legacy_hw_ctx.rcs_state->dirty = 1;
> -		BUG_ON(from->legacy_hw_ctx.rcs_state->ring != ring);
> -
> -		/* obj is kept alive until the next request by its active ref */
> -		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
> -		i915_gem_context_unreference(from);
> -	}
> -
> -	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
> -	to->legacy_hw_ctx.initialized = true;
> -
> -done:
> -	i915_gem_context_reference(to);
> -	ring->last_context = to;
> -
> -	if (uninitialized) {
> -		if (ring->init_context) {
> -			ret = ring->init_context(ring);
> -			if (ret)
> -				DRM_ERROR("ring init context: %d\n", ret);
> -		}
> -
> -		ret = i915_gem_render_state_init(ring);
> -		if (ret)
> -			DRM_ERROR("init render state: %d\n", ret);
> +		from_obj->dirty = 1;
>   	}
>   
> +	rq->has_ctx_switch = true;
>   	return 0;
>   
>   unpin_out:
> -	if (ring->id == RCS)
> -		i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
> +	i915_gem_object_ggtt_unpin(ctx->state);
>   	return ret;
>   }
>   
>   /**
> - * i915_switch_context() - perform a GPU context switch.
> - * @ring: ring for which we'll execute the context switch
> - * @to: the context to switch to
> - *
> - * The context life cycle is simple. The context refcount is incremented and
> - * decremented by 1 and create and destroy. If the context is in use by the GPU,
> - * it will have a refcount > 1. This allows us to destroy the context abstract
> - * object while letting the normal object tracking destroy the backing BO.
> - *
> - * This function should not be used in execlists mode.  Instead the context is
> - * switched by writing to the ELSP and requests keep a reference to their
> - * context.
> + * i915_request_switch_context__commit() - commit the context switch
> + * @rq: request for which we have executed the context switch
>    */
> -int i915_switch_context(struct intel_engine_cs *ring,
> -			struct intel_context *to)
> +void i915_request_switch_context__commit(struct i915_gem_request *rq)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_context *ctx;
>   
> -	WARN_ON(i915.enable_execlists);
> -	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
>   
> -	if (to->legacy_hw_ctx.rcs_state == NULL) { /* We have the fake context */
> -		if (to != ring->last_context) {
> -			i915_gem_context_reference(to);
> -			if (ring->last_context)
> -				i915_gem_context_unreference(ring->last_context);
> -			ring->last_context = to;
> -		}
> -		return 0;
> -	}
> +	if (!rq->has_ctx_switch)
> +		return;
> +
> +	ctx = rq->ring->last_context;
> +	if (ctx)
> +		i915_gem_object_ggtt_unpin(ctx->ring[rq->engine->id].state);
>   
> -	return do_switch(ring, to);
> +	ctx = rq->ctx;
> +	ctx->remap_slice &= ~rq->remap_l3;
> +	ctx->ring[rq->engine->id].initialized = true;
> +
> +	rq->has_ctx_switch = false;
> +}
> +
> +/**
> + * i915_request_switch_context__undo() - unwind the context switch
> + * @rq: request for which we have executed the context switch
> + */
> +void i915_request_switch_context__undo(struct i915_gem_request *rq)
> +{
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (!rq->has_ctx_switch)
> +		return;
> +
> +	i915_gem_object_ggtt_unpin(rq->ctx->ring[rq->engine->id].state);
>   }
>   
> -static bool contexts_enabled(struct drm_device *dev)
> +static bool contexts_enabled(struct drm_i915_private *dev_priv)
>   {
> -	return i915.enable_execlists || to_i915(dev)->hw_context_size;
> +	if (RCS_ENGINE(dev_priv)->execlists_enabled)
> +		return true;
> +
> +	return dev_priv->hw_context_size;
>   }
>   
>   int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> @@ -697,7 +727,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   	struct intel_context *ctx;
>   	int ret;
>   
> -	if (!contexts_enabled(dev))
> +	if (!contexts_enabled(to_i915(dev)))
>   		return -ENODEV;
>   
>   	ret = i915_mutex_lock_interruptible(dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
> deleted file mode 100644
> index f462d1b51d97..000000000000
> --- a/drivers/gpu/drm/i915/i915_gem_debug.c
> +++ /dev/null
> @@ -1,118 +0,0 @@
> -/*
> - * Copyright © 2008 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the next
> - * paragraph) shall be included in all copies or substantial portions of the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> - * IN THE SOFTWARE.
> - *
> - * Authors:
> - *    Keith Packard <keithp at keithp.com>
> - *
> - */
> -
> -#include <drm/drmP.h>
> -#include <drm/i915_drm.h>
> -#include "i915_drv.h"
> -
> -#if WATCH_LISTS
> -int
> -i915_verify_lists(struct drm_device *dev)
> -{
> -	static int warned;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_gem_object *obj;
> -	int err = 0;
> -
> -	if (warned)
> -		return 0;
> -
> -	list_for_each_entry(obj, &dev_priv->render_ring.active_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed render active %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0) {
> -			DRM_ERROR("invalid render active %p (a %d r %x)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.read_domains);
> -			err++;
> -		} else if (obj->base.write_domain && list_empty(&obj->gpu_write_list)) {
> -			DRM_ERROR("invalid render active %p (w %x, gwl %d)\n",
> -				  obj,
> -				  obj->base.write_domain,
> -				  !list_empty(&obj->gpu_write_list));
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &dev_priv->mm.flushing_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed flushing %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0 ||
> -			   list_empty(&obj->gpu_write_list)) {
> -			DRM_ERROR("invalid flushing %p (a %d w %x gwl %d)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.write_domain,
> -				  !list_empty(&obj->gpu_write_list));
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &dev_priv->mm.gpu_write_list, gpu_write_list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed gpu write %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0) {
> -			DRM_ERROR("invalid gpu write %p (a %d w %x)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.write_domain);
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &i915_gtt_vm->inactive_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed inactive %p\n", obj);
> -			err++;
> -			break;
> -		} else if (obj->pin_count || obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS)) {
> -			DRM_ERROR("invalid inactive %p (p %d a %d w %x)\n",
> -				  obj,
> -				  obj->pin_count, obj->active,
> -				  obj->base.write_domain);
> -			err++;
> -		}
> -	}
> -
> -	return warned = err;
> -}
> -#endif /* WATCH_LIST */
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index c9016c439649..5ee96db71b37 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -42,6 +42,7 @@
>   
>   struct eb_vmas {
>   	struct list_head vmas;
> +	struct i915_vma *batch;
>   	int and;
>   	union {
>   		struct i915_vma *lut[0];
> @@ -88,6 +89,26 @@ eb_reset(struct eb_vmas *eb)
>   		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
>   }
>   
> +static struct i915_vma *
> +eb_get_batch(struct eb_vmas *eb)
> +{
> +	struct i915_vma *vma =
> +		list_entry(eb->vmas.prev, typeof(*vma), exec_list);
> +
> +	/*
> +	 * SNA is doing fancy tricks with compressing batch buffers, which leads
> +	 * to negative relocation deltas. Usually that works out ok since the
> +	 * relocate address is still positive, except when the batch is placed
> +	 * very low in the GTT. Ensure this doesn't happen.
> +	 *
> +	 * Note that actual hangs have only been observed on gen7, but for
> +	 * paranoia do it everywhere.
> +	 */
> +	vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
> +
> +	return vma;
> +}
> +
>   static int
>   eb_lookup_vmas(struct eb_vmas *eb,
>   	       struct drm_i915_gem_exec_object2 *exec,
> @@ -165,6 +186,9 @@ eb_lookup_vmas(struct eb_vmas *eb,
>   		++i;
>   	}
>   
> +	/* take note of the batch buffer before we might reorder the lists */
> +	eb->batch = eb_get_batch(eb);
> +
>   	return 0;
>   
>   
> @@ -256,7 +280,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_device *dev = obj->base.dev;
>   	uint32_t page_offset = offset_in_page(reloc->offset);
> -	uint64_t delta = reloc->delta + target_offset;
> +	uint64_t delta = (int)reloc->delta + target_offset;
>   	char *vaddr;
>   	int ret;
>   
> @@ -292,7 +316,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_device *dev = obj->base.dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	uint64_t delta = reloc->delta + target_offset;
> +	uint64_t delta = (int)reloc->delta + target_offset;
>   	uint64_t offset;
>   	void __iomem *reloc_page;
>   	int ret;
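
The (int) casts added to the two relocate_entry_*() paths are easy to miss
but matter: reloc->delta comes from userspace as a u32, and since the sum is
now computed in 64 bits a negative delta (the SNA case mentioned in
eb_get_batch) has to be sign-extended first or it gets zero-extended instead.
A standalone illustration of the difference, not kernel code:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t delta = (uint32_t)-4096;	/* negative delta from userspace */
		uint64_t target_offset = 0x10000;

		/* zero-extended: 0x10000f000, the relocation lands nowhere useful */
		printf("0x%llx\n", (unsigned long long)(delta + target_offset));

		/* sign-extended first, as the patch now does: 0xf000 */
		printf("0x%llx\n", (unsigned long long)((int)delta + target_offset));
		return 0;
	}
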
> @@ -422,13 +446,11 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   		ret = relocate_entry_cpu(obj, reloc, target_offset);
>   	else
>   		ret = relocate_entry_gtt(obj, reloc, target_offset);
> -
>   	if (ret)
>   		return ret;
>   
>   	/* and update the user's relocation entry */
>   	reloc->presumed_offset = target_offset;
> -
>   	return 0;
>   }
>   
> @@ -521,7 +543,7 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>   
>   static int
>   i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> -				struct intel_engine_cs *ring,
> +				struct intel_engine_cs *engine,
>   				bool *need_reloc)
>   {
>   	struct drm_i915_gem_object *obj = vma->obj;
> @@ -610,7 +632,7 @@ eb_vma_misplaced(struct i915_vma *vma)
>   }
>   
>   static int
> -i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
> +i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
>   			    struct list_head *vmas,
>   			    bool *need_relocs)
>   {
> @@ -618,10 +640,10 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>   	struct i915_vma *vma;
>   	struct i915_address_space *vm;
>   	struct list_head ordered_vmas;
> -	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
> +	bool has_fenced_gpu_access = INTEL_INFO(engine->i915)->gen < 4;
>   	int retry;
>   
> -	i915_gem_retire_requests_ring(ring);
> +	i915_gem_retire_requests__engine(engine);
>   
>   	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
>   
> @@ -676,7 +698,7 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>   			if (eb_vma_misplaced(vma))
>   				ret = i915_vma_unbind(vma);
>   			else
> -				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
> +				ret = i915_gem_execbuffer_reserve_vma(vma, engine, need_relocs);
>   			if (ret)
>   				goto err;
>   		}
> @@ -686,7 +708,7 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>   			if (drm_mm_node_allocated(&vma->node))
>   				continue;
>   
> -			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
> +			ret = i915_gem_execbuffer_reserve_vma(vma, engine, need_relocs);
>   			if (ret)
>   				goto err;
>   		}
> @@ -706,10 +728,10 @@ err:
>   }
>   
>   static int
> -i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
> +i915_gem_execbuffer_relocate_slow(struct drm_i915_private *i915,
>   				  struct drm_i915_gem_execbuffer2 *args,
>   				  struct drm_file *file,
> -				  struct intel_engine_cs *ring,
> +				  struct intel_engine_cs *engine,
>   				  struct eb_vmas *eb,
>   				  struct drm_i915_gem_exec_object2 *exec)
>   {
> @@ -731,7 +753,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		drm_gem_object_unreference(&vma->obj->base);
>   	}
>   
> -	mutex_unlock(&dev->struct_mutex);
> +	mutex_unlock(&i915->dev->struct_mutex);
>   
>   	total = 0;
>   	for (i = 0; i < count; i++)
> @@ -742,7 +764,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   	if (reloc == NULL || reloc_offset == NULL) {
>   		drm_free_large(reloc);
>   		drm_free_large(reloc_offset);
> -		mutex_lock(&dev->struct_mutex);
> +		mutex_lock(&i915->dev->struct_mutex);
>   		return -ENOMEM;
>   	}
>   
> @@ -757,7 +779,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		if (copy_from_user(reloc+total, user_relocs,
>   				   exec[i].relocation_count * sizeof(*reloc))) {
>   			ret = -EFAULT;
> -			mutex_lock(&dev->struct_mutex);
> +			mutex_lock(&i915->dev->struct_mutex);
>   			goto err;
>   		}
>   
> @@ -775,7 +797,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   					   &invalid_offset,
>   					   sizeof(invalid_offset))) {
>   				ret = -EFAULT;
> -				mutex_lock(&dev->struct_mutex);
> +				mutex_lock(&i915->dev->struct_mutex);
>   				goto err;
>   			}
>   		}
> @@ -784,9 +806,9 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		total += exec[i].relocation_count;
>   	}
>   
> -	ret = i915_mutex_lock_interruptible(dev);
> +	ret = i915_mutex_lock_interruptible(i915->dev);
>   	if (ret) {
> -		mutex_lock(&dev->struct_mutex);
> +		mutex_lock(&i915->dev->struct_mutex);
>   		goto err;
>   	}
>   
> @@ -797,7 +819,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
>   		goto err;
>   
>   	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> -	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
> +	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, &need_relocs);
>   	if (ret)
>   		goto err;
>   
> @@ -822,17 +844,19 @@ err:
>   }
>   
>   static int
> -i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> -				struct list_head *vmas)
> +vmas_move_to_rq(struct list_head *vmas,
> +		struct i915_gem_request *rq)
>   {
>   	struct i915_vma *vma;
>   	uint32_t flush_domains = 0;
>   	bool flush_chipset = false;
>   	int ret;
>   
> +	/* 1: flush/serialise damage from other sources */
>   	list_for_each_entry(vma, vmas, exec_list) {
>   		struct drm_i915_gem_object *obj = vma->obj;
> -		ret = i915_gem_object_sync(obj, ring);
> +
> +		ret = i915_gem_object_sync(obj, rq);
>   		if (ret)
>   			return ret;
>   
> @@ -840,18 +864,39 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>   			flush_chipset |= i915_gem_clflush_object(obj, false);
>   
>   		flush_domains |= obj->base.write_domain;
> +		if (obj->last_read[rq->engine->id].request == NULL)
> +			rq->pending_flush |= I915_INVALIDATE_CACHES;
>   	}
>   
>   	if (flush_chipset)
> -		i915_gem_chipset_flush(ring->dev);
> +		i915_gem_chipset_flush(rq->i915->dev);
>   
>   	if (flush_domains & I915_GEM_DOMAIN_GTT)
>   		wmb();
>   
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return intel_ring_invalidate_all_caches(ring);
> +	/* 2: invalidate the caches from this ring after emitting semaphores */
> +	ret = i915_request_emit_flush(rq, I915_INVALIDATE_CACHES);
> +	if (ret)
> +		return ret;
> +
> +	/* 3: track flushes and objects for this rq */
> +	list_for_each_entry(vma, vmas, exec_list) {
> +		struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> +		unsigned fenced;
> +
> +		fenced = 0;
> +		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
> +			fenced |= VMA_IS_FENCED;
> +			if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
> +				fenced |= VMA_HAS_FENCE;
> +		}
> +
> +		ret = i915_request_add_vma(rq, vma, fenced);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
>   }
>   
>   static bool
> @@ -864,7 +909,7 @@ i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
>   }
>   
>   static int
> -validate_exec_list(struct drm_device *dev,
> +validate_exec_list(struct drm_i915_private *dev_priv,
>   		   struct drm_i915_gem_exec_object2 *exec,
>   		   int count)
>   {
> @@ -874,7 +919,7 @@ validate_exec_list(struct drm_device *dev,
>   	int i;
>   
>   	invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
> -	if (USES_FULL_PPGTT(dev))
> +	if (USES_FULL_PPGTT(dev_priv))
>   		invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
>   
>   	for (i = 0; i < count; i++) {
> @@ -912,13 +957,14 @@ validate_exec_list(struct drm_device *dev,
>   }
>   
>   static struct intel_context *
> -i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
> -			  struct intel_engine_cs *ring, const u32 ctx_id)
> +i915_gem_validate_context(struct drm_file *file,
> +			  struct intel_engine_cs *engine,
> +			  const u32 ctx_id)
>   {
>   	struct intel_context *ctx = NULL;
>   	struct i915_ctx_hang_stats *hs;
>   
> -	if (ring->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
> +	if (engine->id != RCS && ctx_id != DEFAULT_CONTEXT_HANDLE)
>   		return ERR_PTR(-EINVAL);
>   
>   	ctx = i915_gem_context_get(file->driver_priv, ctx_id);
> @@ -931,86 +977,23 @@ i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
>   		return ERR_PTR(-EIO);
>   	}
>   
> -	if (i915.enable_execlists && !ctx->engine[ring->id].state) {
> -		int ret = intel_lr_context_deferred_create(ctx, ring);
> -		if (ret) {
> -			DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
> -			return ERR_PTR(ret);
> -		}
> -	}
> -
>   	return ctx;
>   }
>   
> -void
> -i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> -				   struct intel_engine_cs *ring)
> -{
> -	u32 seqno = intel_ring_get_seqno(ring);
> -	struct i915_vma *vma;
> -
> -	list_for_each_entry(vma, vmas, exec_list) {
> -		struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> -		struct drm_i915_gem_object *obj = vma->obj;
> -		u32 old_read = obj->base.read_domains;
> -		u32 old_write = obj->base.write_domain;
> -
> -		obj->base.write_domain = obj->base.pending_write_domain;
> -		if (obj->base.write_domain == 0)
> -			obj->base.pending_read_domains |= obj->base.read_domains;
> -		obj->base.read_domains = obj->base.pending_read_domains;
> -
> -		i915_vma_move_to_active(vma, ring);
> -		if (obj->base.write_domain) {
> -			obj->dirty = 1;
> -			obj->last_write_seqno = seqno;
> -
> -			intel_fb_obj_invalidate(obj, ring);
> -
> -			/* update for the implicit flush after a batch */
> -			obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
> -		}
> -		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
> -			obj->last_fenced_seqno = seqno;
> -			if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
> -				struct drm_i915_private *dev_priv = to_i915(ring->dev);
> -				list_move_tail(&dev_priv->fence_regs[obj->fence_reg].lru_list,
> -					       &dev_priv->mm.fence_list);
> -			}
> -		}
> -
> -		trace_i915_gem_object_change_domain(obj, old_read, old_write);
> -	}
> -}
> -
> -void
> -i915_gem_execbuffer_retire_commands(struct drm_device *dev,
> -				    struct drm_file *file,
> -				    struct intel_engine_cs *ring,
> -				    struct drm_i915_gem_object *obj)
> -{
> -	/* Unconditionally force add_request to emit a full flush. */
> -	ring->gpu_caches_dirty = true;
> -
> -	/* Add a breadcrumb for the completion of the batch buffer */
> -	(void)__i915_add_request(ring, file, obj, NULL);
> -}
> -
>   static int
> -i915_reset_gen7_sol_offsets(struct drm_device *dev,
> -			    struct intel_engine_cs *ring)
> +reset_sol_offsets(struct i915_gem_request *rq)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret, i;
> +	struct intel_ringbuffer *ring;
> +	int i;
>   
> -	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
> +	if (!IS_GEN7(rq->i915) || rq->engine->id != RCS) {
>   		DRM_DEBUG("sol reset is gen7/rcs only\n");
>   		return -EINVAL;
>   	}
>   
> -	ret = intel_ring_begin(ring, 4 * 3);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4 * 3);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	for (i = 0; i < 4; i++) {
>   		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> @@ -1019,74 +1002,119 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
>   	}
>   
>   	intel_ring_advance(ring);
> -
>   	return 0;
>   }
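
Worth calling out once since it repeats all through the patch:
intel_ring_begin() no longer takes the engine and returns an errno, it takes
the request and hands back the ringbuffer to emit into (or an ERR_PTR). A
minimal sketch of the new pattern, using only calls that appear in the patch
(the wrapper function itself is just for illustration):

	static int emit_noops(struct i915_gem_request *rq, int count)
	{
		struct intel_ringbuffer *ring;
		int i;

		/* reserve space in the ring backing this request */
		ring = intel_ring_begin(rq, count);
		if (IS_ERR(ring))
			return PTR_ERR(ring);

		for (i = 0; i < count; i++)
			intel_ring_emit(ring, MI_NOOP);
		intel_ring_advance(ring);

		return 0;
	}
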
>   
>   static int
> -i915_emit_box(struct intel_engine_cs *ring,
> -	      struct drm_clip_rect *box,
> -	      int DR1, int DR4)
> +emit_box(struct i915_gem_request *rq,
> +	 struct drm_clip_rect *box,
> +	 int DR1, int DR4)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
>   	if (box->y2 <= box->y1 || box->x2 <= box->x1 ||
>   	    box->y2 <= 0 || box->x2 <= 0) {
> -		DRM_ERROR("Bad box %d,%d..%d,%d\n",
> +		DRM_DEBUG("Bad box %d,%d..%d,%d\n",
>   			  box->x1, box->y1, box->x2, box->y2);
>   		return -EINVAL;
>   	}
>   
> -	if (INTEL_INFO(ring->dev)->gen >= 4) {
> -		ret = intel_ring_begin(ring, 4);
> -		if (ret)
> -			return ret;
> +	if (INTEL_INFO(rq->i915)->gen >= 4) {
> +		ring = intel_ring_begin(rq, 4);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
>   
>   		intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO_I965);
> -		intel_ring_emit(ring, (box->x1 & 0xffff) | box->y1 << 16);
> -		intel_ring_emit(ring, ((box->x2 - 1) & 0xffff) | (box->y2 - 1) << 16);
> -		intel_ring_emit(ring, DR4);
>   	} else {
> -		ret = intel_ring_begin(ring, 6);
> -		if (ret)
> -			return ret;
> +		ring = intel_ring_begin(rq, 5);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
>   
>   		intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO);
>   		intel_ring_emit(ring, DR1);
> -		intel_ring_emit(ring, (box->x1 & 0xffff) | box->y1 << 16);
> -		intel_ring_emit(ring, ((box->x2 - 1) & 0xffff) | (box->y2 - 1) << 16);
> -		intel_ring_emit(ring, DR4);
> -		intel_ring_emit(ring, 0);
>   	}
> +	intel_ring_emit(ring, (box->x1 & 0xffff) | box->y1 << 16);
> +	intel_ring_emit(ring, ((box->x2 - 1) & 0xffff) | (box->y2 - 1) << 16);
> +	intel_ring_emit(ring, DR4);
>   	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
> +static int set_constants_base(struct i915_gem_request *rq,
> +			     struct drm_i915_gem_execbuffer2 *args)
> +{
> +	int mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> +	u32 mask = I915_EXEC_CONSTANTS_MASK;
>   
> -int
> -i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
> -			       struct intel_engine_cs *ring,
> -			       struct intel_context *ctx,
> -			       struct drm_i915_gem_execbuffer2 *args,
> -			       struct list_head *vmas,
> -			       struct drm_i915_gem_object *batch_obj,
> -			       u64 exec_start, u32 flags)
> +	switch (mode) {
> +	case I915_EXEC_CONSTANTS_REL_GENERAL:
> +	case I915_EXEC_CONSTANTS_ABSOLUTE:
> +	case I915_EXEC_CONSTANTS_REL_SURFACE:
> +		if (mode != 0 && rq->engine->id != RCS) {
> +			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> +			return -EINVAL;
> +		}
> +
> +		if (mode != rq->engine->i915->relative_constants_mode) {
> +			if (INTEL_INFO(rq->engine->i915)->gen < 4) {
> +				DRM_DEBUG("no rel constants on pre-gen4\n");
> +				return -EINVAL;
> +			}
> +
> +			if (INTEL_INFO(rq->engine->i915)->gen > 5 &&
> +			    mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> +				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> +				return -EINVAL;
> +			}
> +
> +			/* The HW changed the meaning on this bit on gen6 */
> +			if (INTEL_INFO(rq->i915)->gen >= 6)
> +				mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> +		}
> +		break;
> +	default:
> +		DRM_DEBUG("execbuf with unknown constants: %d\n", mode);
> +		return -EINVAL;
> +	}
> +
> +	/* XXX INSTPM is per-context not global etc */
> +	if (rq->engine->id == RCS && mode != rq->i915->relative_constants_mode) {
> +		struct intel_ringbuffer *ring;
> +
> +		ring = intel_ring_begin(rq, 3);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
> +
> +		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> +		intel_ring_emit(ring, INSTPM);
> +		intel_ring_emit(ring, mask << 16 | mode);
> +		intel_ring_advance(ring);
> +
> +		rq->i915->relative_constants_mode = mode;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +submit_execbuf(struct intel_engine_cs *engine,
> +	       struct intel_context *ctx,
> +	       struct drm_i915_gem_execbuffer2 *args,
> +	       struct eb_vmas *eb,
> +	       u64 exec_start, u32 flags)
>   {
>   	struct drm_clip_rect *cliprects = NULL;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	u64 exec_len;
> -	int instp_mode;
> -	u32 instp_mask;
> +	struct i915_gem_request *rq = NULL;
>   	int i, ret = 0;
>   
>   	if (args->num_cliprects != 0) {
> -		if (ring != &dev_priv->ring[RCS]) {
> +		if (engine->id != RCS) {
>   			DRM_DEBUG("clip rectangles are only valid with the render ring\n");
>   			return -EINVAL;
>   		}
>   
> -		if (INTEL_INFO(dev)->gen >= 5) {
> +		if (INTEL_INFO(engine->i915)->gen >= 5) {
>   			DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
>   			return -EINVAL;
>   		}
> @@ -1108,7 +1136,6 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>   		if (copy_from_user(cliprects,
>   				   to_user_ptr(args->cliprects_ptr),
>   				   sizeof(*cliprects)*args->num_cliprects)) {
> -			ret = -EFAULT;
>   			goto error;
>   		}
>   	} else {
> @@ -1123,168 +1150,108 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>   		}
>   	}
>   
> -	ret = i915_gem_execbuffer_move_to_gpu(ring, vmas);
> -	if (ret)
> -		goto error;
> +	rq = intel_engine_alloc_request(engine, ctx);
> +	if (IS_ERR(rq)) {
> +		kfree(cliprects);
> +		return PTR_ERR(rq);
> +	}
>   
> -	ret = i915_switch_context(ring, ctx);
> +	ret = vmas_move_to_rq(&eb->vmas, rq);
>   	if (ret)
>   		goto error;
>   
> -	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> -	instp_mask = I915_EXEC_CONSTANTS_MASK;
> -	switch (instp_mode) {
> -	case I915_EXEC_CONSTANTS_REL_GENERAL:
> -	case I915_EXEC_CONSTANTS_ABSOLUTE:
> -	case I915_EXEC_CONSTANTS_REL_SURFACE:
> -		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
> -			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> -			ret = -EINVAL;
> -			goto error;
> -		}
> -
> -		if (instp_mode != dev_priv->relative_constants_mode) {
> -			if (INTEL_INFO(dev)->gen < 4) {
> -				DRM_DEBUG("no rel constants on pre-gen4\n");
> -				ret = -EINVAL;
> -				goto error;
> -			}
> -
> -			if (INTEL_INFO(dev)->gen > 5 &&
> -			    instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> -				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> -				ret = -EINVAL;
> -				goto error;
> -			}
> -
> -			/* The HW changed the meaning on this bit on gen6 */
> -			if (INTEL_INFO(dev)->gen >= 6)
> -				instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> -		}
> -		break;
> -	default:
> -		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
> -		ret = -EINVAL;
> +	ret = set_constants_base(rq, args);
> +	if (ret)
>   		goto error;
> -	}
> -
> -	if (ring == &dev_priv->ring[RCS] &&
> -			instp_mode != dev_priv->relative_constants_mode) {
> -		ret = intel_ring_begin(ring, 4);
> -		if (ret)
> -			goto error;
> -
> -		intel_ring_emit(ring, MI_NOOP);
> -		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -		intel_ring_emit(ring, INSTPM);
> -		intel_ring_emit(ring, instp_mask << 16 | instp_mode);
> -		intel_ring_advance(ring);
> -
> -		dev_priv->relative_constants_mode = instp_mode;
> -	}
>   
>   	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> -		ret = i915_reset_gen7_sol_offsets(dev, ring);
> +		ret = reset_sol_offsets(rq);
>   		if (ret)
>   			goto error;
>   	}
>   
> -	exec_len = args->batch_len;
>   	if (cliprects) {
>   		for (i = 0; i < args->num_cliprects; i++) {
> -			ret = i915_emit_box(ring, &cliprects[i],
> -					    args->DR1, args->DR4);
> +			ret = emit_box(rq, &cliprects[i],
> +				       args->DR1, args->DR4);
>   			if (ret)
>   				goto error;
>   
> -			ret = ring->dispatch_execbuffer(ring,
> -							exec_start, exec_len,
> -							flags);
> +			ret = i915_request_emit_batchbuffer(rq, eb->batch,
> +							    exec_start,
> +							    args->batch_len,
> +							    flags);
>   			if (ret)
>   				goto error;
>   		}
>   	} else {
> -		ret = ring->dispatch_execbuffer(ring,
> -						exec_start, exec_len,
> -						flags);
> +		ret = i915_request_emit_batchbuffer(rq, eb->batch,
> +						    exec_start,
> +						    args->batch_len,
> +						    flags);
>   		if (ret)
> -			return ret;
> +			goto error;
>   	}
>   
> -	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> +	ret = i915_request_commit(rq);
> +	if (ret)
> +		goto error;
> +
> +	i915_queue_hangcheck(rq->i915->dev);
>   
> -	i915_gem_execbuffer_move_to_active(vmas, ring);
> -	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> +	cancel_delayed_work_sync(&rq->i915->mm.idle_work);
> +	queue_delayed_work(rq->i915->wq,
> +			   &rq->i915->mm.retire_work,
> +			   round_jiffies_up_relative(HZ));
> +	intel_mark_busy(rq->i915->dev);
>   
>   error:
> +	i915_request_put(rq);
>   	kfree(cliprects);
>   	return ret;
>   }
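
Stripped of the cliprect and SOL legacy handling, the request lifecycle that
submit_execbuf() follows is quite small. My reading of it, condensed (the
names are all from the patch, the wrapper itself is not):

	static int submit_sketch(struct intel_engine_cs *engine,
				 struct intel_context *ctx,
				 struct eb_vmas *eb,
				 struct drm_i915_gem_execbuffer2 *args,
				 u64 exec_start, u32 flags)
	{
		struct i915_gem_request *rq;
		int ret;

		/* allocating a request requires both the engine and the context */
		rq = intel_engine_alloc_request(engine, ctx);
		if (IS_ERR(rq))
			return PTR_ERR(rq);

		/* serialise against other users and track every vma on the rq */
		ret = vmas_move_to_rq(&eb->vmas, rq);
		if (ret == 0)
			ret = i915_request_emit_batchbuffer(rq, eb->batch,
							    exec_start,
							    args->batch_len,
							    flags);
		if (ret == 0)
			/* finalise and queue the request for execution */
			ret = i915_request_commit(rq);

		/* drop the local ref; the request stays alive until retired */
		i915_request_put(rq);
		return ret;
	}
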
>   
>   /**
>    * Find one BSD ring to dispatch the corresponding BSD command.
> - * The Ring ID is returned.
>    */
> -static int gen8_dispatch_bsd_ring(struct drm_device *dev,
> -				  struct drm_file *file)
> +static struct intel_engine_cs *
> +gen8_select_bsd_engine(struct drm_i915_private *dev_priv,
> +		       struct drm_file *file)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   
> -	/* Check whether the file_priv is using one ring */
> -	if (file_priv->bsd_ring)
> -		return file_priv->bsd_ring->id;
> -	else {
> -		/* If no, use the ping-pong mechanism to select one ring */
> -		int ring_id;
> +	/* Use the ping-pong mechanism to select one ring for this client */
> +	if (file_priv->bsd_engine == NULL) {
> +		int id;
>   
> -		mutex_lock(&dev->struct_mutex);
> +		mutex_lock(&dev_priv->dev->struct_mutex);
>   		if (dev_priv->mm.bsd_ring_dispatch_index == 0) {
> -			ring_id = VCS;
> +			id = VCS;
>   			dev_priv->mm.bsd_ring_dispatch_index = 1;
>   		} else {
> -			ring_id = VCS2;
> +			id = VCS2;
>   			dev_priv->mm.bsd_ring_dispatch_index = 0;
>   		}
> -		file_priv->bsd_ring = &dev_priv->ring[ring_id];
> -		mutex_unlock(&dev->struct_mutex);
> -		return ring_id;
> +		file_priv->bsd_engine = &dev_priv->engine[id];
> +		mutex_unlock(&dev_priv->dev->struct_mutex);
>   	}
> -}
> -
> -static struct drm_i915_gem_object *
> -eb_get_batch(struct eb_vmas *eb)
> -{
> -	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
>   
> -	/*
> -	 * SNA is doing fancy tricks with compressing batch buffers, which leads
> -	 * to negative relocation deltas. Usually that works out ok since the
> -	 * relocate address is still positive, except when the batch is placed
> -	 * very low in the GTT. Ensure this doesn't happen.
> -	 *
> -	 * Note that actual hangs have only been observed on gen7, but for
> -	 * paranoia do it everywhere.
> -	 */
> -	vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
> -
> -	return vma->obj;
> +	return file_priv->bsd_engine;
>   }
>   
>   static int
> -i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> +i915_gem_do_execbuffer(struct drm_i915_private *dev_priv, void *data,
>   		       struct drm_file *file,
>   		       struct drm_i915_gem_execbuffer2 *args,
>   		       struct drm_i915_gem_exec_object2 *exec)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct eb_vmas *eb;
> -	struct drm_i915_gem_object *batch_obj;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	struct intel_context *ctx;
>   	struct i915_address_space *vm;
>   	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
>   	u64 exec_start = args->batch_start_offset;
> +	struct drm_i915_gem_object *batch;
>   	u32 flags;
>   	int ret;
>   	bool need_relocs;
> @@ -1292,7 +1259,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	if (!i915_gem_check_execbuffer(args))
>   		return -EINVAL;
>   
> -	ret = validate_exec_list(dev, exec, args->buffer_count);
> +	ret = validate_exec_list(dev_priv, exec, args->buffer_count);
>   	if (ret)
>   		return ret;
>   
> @@ -1313,18 +1280,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	}
>   
>   	if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_DEFAULT)
> -		ring = &dev_priv->ring[RCS];
> +		engine = &dev_priv->engine[RCS];
>   	else if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_BSD) {
> -		if (HAS_BSD2(dev)) {
> -			int ring_id;
> -			ring_id = gen8_dispatch_bsd_ring(dev, file);
> -			ring = &dev_priv->ring[ring_id];
> -		} else
> -			ring = &dev_priv->ring[VCS];
> +		if (HAS_BSD2(dev_priv))
> +			engine = gen8_select_bsd_engine(dev_priv, file);
> +		else
> +			engine = &dev_priv->engine[VCS];
>   	} else
> -		ring = &dev_priv->ring[(args->flags & I915_EXEC_RING_MASK) - 1];
> +		engine = &dev_priv->engine[(args->flags & I915_EXEC_RING_MASK) - 1];
>   
> -	if (!intel_ring_initialized(ring)) {
> +	if (!intel_engine_initialized(engine)) {
>   		DRM_DEBUG("execbuf with invalid ring: %d\n",
>   			  (int)(args->flags & I915_EXEC_RING_MASK));
>   		return -EINVAL;
> @@ -1337,19 +1302,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   
>   	intel_runtime_pm_get(dev_priv);
>   
> -	ret = i915_mutex_lock_interruptible(dev);
> +	ret = i915_mutex_lock_interruptible(dev_priv->dev);
>   	if (ret)
>   		goto pre_mutex_err;
>   
>   	if (dev_priv->ums.mm_suspended) {
> -		mutex_unlock(&dev->struct_mutex);
> +		mutex_unlock(&dev_priv->dev->struct_mutex);
>   		ret = -EBUSY;
>   		goto pre_mutex_err;
>   	}
>   
> -	ctx = i915_gem_validate_context(dev, file, ring, ctx_id);
> +	ctx = i915_gem_validate_context(file, engine, ctx_id);
>   	if (IS_ERR(ctx)) {
> -		mutex_unlock(&dev->struct_mutex);
> +		mutex_unlock(&dev_priv->dev->struct_mutex);
>   		ret = PTR_ERR(ctx);
>   		goto pre_mutex_err;
>   	}
> @@ -1364,7 +1329,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	eb = eb_create(args);
>   	if (eb == NULL) {
>   		i915_gem_context_unreference(ctx);
> -		mutex_unlock(&dev->struct_mutex);
> +		mutex_unlock(&dev_priv->dev->struct_mutex);
>   		ret = -ENOMEM;
>   		goto pre_mutex_err;
>   	}
> @@ -1374,12 +1339,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	if (ret)
>   		goto err;
>   
> -	/* take note of the batch buffer before we might reorder the lists */
> -	batch_obj = eb_get_batch(eb);
> -
>   	/* Move the objects en-masse into the GTT, evicting if necessary. */
>   	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> -	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
> +	ret = i915_gem_execbuffer_reserve(engine, &eb->vmas, &need_relocs);
>   	if (ret)
>   		goto err;
>   
> @@ -1388,25 +1350,25 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		ret = i915_gem_execbuffer_relocate(eb);
>   	if (ret) {
>   		if (ret == -EFAULT) {
> -			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
> +			ret = i915_gem_execbuffer_relocate_slow(dev_priv, args, file, engine,
>   								eb, exec);
> -			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> +			BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
>   		}
>   		if (ret)
>   			goto err;
>   	}
>   
>   	/* Set the pending read domains for the batch buffer to COMMAND */
> -	if (batch_obj->base.pending_write_domain) {
> +	batch = eb->batch->obj;
> +	if (batch->base.pending_write_domain) {
>   		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
>   		ret = -EINVAL;
>   		goto err;
>   	}
> -	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> +	batch->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
>   
> -	if (i915_needs_cmd_parser(ring)) {
> -		ret = i915_parse_cmds(ring,
> -				      batch_obj,
> +	if (i915_needs_cmd_parser(engine)) {
> +		ret = i915_parse_cmds(engine, batch,
>   				      args->batch_start_offset,
>   				      file->is_master);
>   		if (ret)
> @@ -1436,16 +1398,15 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		 *   fitting due to fragmentation.
>   		 * So this is actually safe.
>   		 */
> -		ret = i915_gem_obj_ggtt_pin(batch_obj, 0, 0);
> +		ret = i915_gem_obj_ggtt_pin(batch, 0, 0);
>   		if (ret)
>   			goto err;
>   
> -		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
> +		exec_start += i915_gem_obj_ggtt_offset(batch);
>   	} else
> -		exec_start += i915_gem_obj_offset(batch_obj, vm);
> +		exec_start += i915_gem_obj_offset(batch, vm);
>   
> -	ret = dev_priv->gt.do_execbuf(dev, file, ring, ctx, args,
> -				      &eb->vmas, batch_obj, exec_start, flags);
> +	ret = submit_execbuf(engine, ctx, args, eb, exec_start, flags);
>   
>   	/*
>   	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
> @@ -1454,13 +1415,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	 * active.
>   	 */
>   	if (flags & I915_DISPATCH_SECURE)
> -		i915_gem_object_ggtt_unpin(batch_obj);
> +		i915_gem_object_ggtt_unpin(batch);
>   err:
>   	/* the request owns the ref now */
>   	i915_gem_context_unreference(ctx);
>   	eb_destroy(eb);
>   
> -	mutex_unlock(&dev->struct_mutex);
> +	mutex_unlock(&dev_priv->dev->struct_mutex);
>   
>   pre_mutex_err:
>   	/* intel_gpu_busy should also get a ref, so it will free when the device
> @@ -1532,7 +1493,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
>   	exec2.flags = I915_EXEC_RENDER;
>   	i915_execbuffer2_set_context_id(exec2, 0);
>   
> -	ret = i915_gem_do_execbuffer(dev, data, file, &exec2, exec2_list);
> +	ret = i915_gem_do_execbuffer(to_i915(dev), data, file, &exec2, exec2_list);
>   	if (!ret) {
>   		struct drm_i915_gem_exec_object __user *user_exec_list =
>   			to_user_ptr(args->buffers_ptr);
> @@ -1596,7 +1557,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
>   		return -EFAULT;
>   	}
>   
> -	ret = i915_gem_do_execbuffer(dev, data, file, args, exec2_list);
> +	ret = i915_gem_do_execbuffer(to_i915(dev), data, file, args, exec2_list);
>   	if (!ret) {
>   		/* Copy the new buffer offsets back to the user's exec list. */
>   		struct drm_i915_gem_exec_object2 __user *user_exec_list =
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 6f410cfb0510..ba9bce1a2f07 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -203,30 +203,28 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>   }
>   
>   /* Broadwell Page Directory Pointer Descriptors */
> -static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
> -			   uint64_t val)
> +static int gen8_write_pdp(struct i915_gem_request *rq, unsigned entry, uint64_t val)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
>   	BUG_ON(entry >= 4);
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 5);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
> +	intel_ring_emit(ring, GEN8_RING_PDP_UDW(rq->engine, entry));
>   	intel_ring_emit(ring, (u32)(val >> 32));
> -	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
> +	intel_ring_emit(ring, GEN8_RING_PDP_LDW(rq->engine, entry));
>   	intel_ring_emit(ring, (u32)(val));
>   	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
> -static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> -			  struct intel_engine_cs *ring)
> +static int gen8_mm_switch(struct i915_gem_request *rq,
> +			  struct i915_hw_ppgtt *ppgtt)
>   {
>   	int i, ret;
>   
> @@ -235,7 +233,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>   
>   	for (i = used_pd - 1; i >= 0; i--) {
>   		dma_addr_t addr = ppgtt->pd_dma_addr[i];
> -		ret = gen8_write_pdp(ring, i, addr);
> +		ret = gen8_write_pdp(rq, i, addr);
>   		if (ret)
>   			return ret;
>   	}
> @@ -699,94 +697,81 @@ static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>   	return (ppgtt->pd_offset / 64) << 16;
>   }
>   
> -static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
> -			 struct intel_engine_cs *ring)
> +static int hsw_mm_switch(struct i915_gem_request *rq,
> +			 struct i915_hw_ppgtt *ppgtt)
>   {
> +	struct intel_ringbuffer *ring;
>   	int ret;
>   
>   	/* NB: TLBs must be flushed and invalidated before a switch */
> -	ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> +	ret = i915_request_emit_flush(rq, I915_INVALIDATE_CACHES);
>   	if (ret)
>   		return ret;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 5);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
> -	intel_ring_emit(ring, RING_PP_DIR_DCLV(ring));
> +	intel_ring_emit(ring, RING_PP_DIR_DCLV(rq->engine));
>   	intel_ring_emit(ring, PP_DIR_DCLV_2G);
> -	intel_ring_emit(ring, RING_PP_DIR_BASE(ring));
> +	intel_ring_emit(ring, RING_PP_DIR_BASE(rq->engine));
>   	intel_ring_emit(ring, get_pd_offset(ppgtt));
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
> -static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
> -			  struct intel_engine_cs *ring)
> +static int gen7_mm_switch(struct i915_gem_request *rq,
> +			  struct i915_hw_ppgtt *ppgtt)
>   {
> +	struct intel_ringbuffer *ring;
>   	int ret;
>   
>   	/* NB: TLBs must be flushed and invalidated before a switch */
> -	ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> +	ret = i915_request_emit_flush(rq, I915_INVALIDATE_CACHES);
>   	if (ret)
>   		return ret;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 5);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
> -	intel_ring_emit(ring, RING_PP_DIR_DCLV(ring));
> +	intel_ring_emit(ring, RING_PP_DIR_DCLV(rq->engine));
>   	intel_ring_emit(ring, PP_DIR_DCLV_2G);
> -	intel_ring_emit(ring, RING_PP_DIR_BASE(ring));
> +	intel_ring_emit(ring, RING_PP_DIR_BASE(rq->engine));
>   	intel_ring_emit(ring, get_pd_offset(ppgtt));
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
>   	/* XXX: RCS is the only one to auto invalidate the TLBs? */
> -	if (ring->id != RCS) {
> -		ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> -		if (ret)
> -			return ret;
> -	}
> +	if (rq->engine->id != RCS)
> +		rq->pending_flush |= I915_INVALIDATE_CACHES;
>   
>   	return 0;
>   }
>   
> -static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
> -			  struct intel_engine_cs *ring)
> +static int gen6_mm_switch(struct i915_gem_request *rq,
> +			  struct i915_hw_ppgtt *ppgtt)
>   {
> -	struct drm_device *dev = ppgtt->base.dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -
> -	I915_WRITE(RING_PP_DIR_DCLV(ring), PP_DIR_DCLV_2G);
> -	I915_WRITE(RING_PP_DIR_BASE(ring), get_pd_offset(ppgtt));
> -
> -	POSTING_READ(RING_PP_DIR_DCLV(ring));
> -
> -	return 0;
> +	return -ENODEV;
>   }
>   
>   static void gen8_ppgtt_enable(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int j;
>   
> -	for_each_ring(ring, dev_priv, j) {
> -		I915_WRITE(RING_MODE_GEN7(ring),
> +	for_each_engine(engine, dev_priv, j)
> +		I915_WRITE(RING_MODE_GEN7(engine),
>   			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> -	}
>   }
>   
>   static void gen7_ppgtt_enable(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	uint32_t ecochk, ecobits;
>   	int i;
>   
> @@ -802,10 +787,15 @@ static void gen7_ppgtt_enable(struct drm_device *dev)
>   	}
>   	I915_WRITE(GAM_ECOCHK, ecochk);
>   
> -	for_each_ring(ring, dev_priv, i) {
> +	for_each_engine(engine, dev_priv, i) {
>   		/* GFX_MODE is per-ring on gen7+ */
> -		I915_WRITE(RING_MODE_GEN7(ring),
> +		I915_WRITE(RING_MODE_GEN7(engine),
>   			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> +
> +		I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
> +		I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(dev_priv->mm.aliasing_ppgtt));
> +
> +		POSTING_READ(RING_PP_DIR_DCLV(engine));
>   	}
>   }
>   
> @@ -813,6 +803,8 @@ static void gen6_ppgtt_enable(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	uint32_t ecochk, gab_ctl, ecobits;
> +	struct intel_engine_cs *engine;
> +	int i;
>   
>   	ecobits = I915_READ(GAC_ECO_BITS);
>   	I915_WRITE(GAC_ECO_BITS, ecobits | ECOBITS_SNB_BIT |
> @@ -825,6 +817,13 @@ static void gen6_ppgtt_enable(struct drm_device *dev)
>   	I915_WRITE(GAM_ECOCHK, ecochk | ECOCHK_SNB_BIT | ECOCHK_PPGTT_CACHE64B);
>   
>   	I915_WRITE(GFX_MODE, _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> +
> +	for_each_engine(engine, dev_priv, i) {
> +		I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
> +		I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(dev_priv->mm.aliasing_ppgtt));
> +
> +		POSTING_READ(RING_PP_DIR_DCLV(engine));
> +	}
>   }
>   
>   /* PPGTT support for Sandybdrige/Gen6 and later */
> @@ -1115,18 +1114,13 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>   
>   int i915_ppgtt_init_hw(struct drm_device *dev)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> -	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
> -	int i, ret = 0;
> +	if (!USES_PPGTT(dev))
> +		return 0;
>   
>   	/* In the case of execlists, PPGTT is enabled by the context descriptor
>   	 * and the PDPs are contained within the context itself.  We don't
>   	 * need to do anything here. */
> -	if (i915.enable_execlists)
> -		return 0;
> -
> -	if (!USES_PPGTT(dev))
> +	if (RCS_ENGINE(dev)->execlists_enabled)
>   		return 0;
>   
>   	if (IS_GEN6(dev))
> @@ -1138,15 +1132,7 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
>   	else
>   		WARN_ON(1);
>   
> -	if (ppgtt) {
> -		for_each_ring(ring, dev_priv, i) {
> -			ret = ppgtt->switch_mm(ppgtt, ring);
> -			if (ret != 0)
> -				return ret;
> -		}
> -	}
> -
> -	return ret;
> +	return 0;
>   }
>   struct i915_hw_ppgtt *
>   i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
> @@ -1247,15 +1233,15 @@ static void undo_idling(struct drm_i915_private *dev_priv, bool interruptible)
>   void i915_check_and_clear_faults(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int i;
>   
>   	if (INTEL_INFO(dev)->gen < 6)
>   		return;
>   
> -	for_each_ring(ring, dev_priv, i) {
> +	for_each_engine(engine, dev_priv, i) {
>   		u32 fault_reg;
> -		fault_reg = I915_READ(RING_FAULT_REG(ring));
> +		fault_reg = I915_READ(RING_FAULT_REG(engine));
>   		if (fault_reg & RING_FAULT_VALID) {
>   			DRM_DEBUG_DRIVER("Unexpected fault\n"
>   					 "\tAddr: 0x%08lx\\n"
> @@ -1266,11 +1252,11 @@ void i915_check_and_clear_faults(struct drm_device *dev)
>   					 fault_reg & RING_FAULT_GTTSEL_MASK ? "GGTT" : "PPGTT",
>   					 RING_FAULT_SRCID(fault_reg),
>   					 RING_FAULT_FAULT_TYPE(fault_reg));
> -			I915_WRITE(RING_FAULT_REG(ring),
> +			I915_WRITE(RING_FAULT_REG(engine),
>   				   fault_reg & ~RING_FAULT_VALID);
>   		}
>   	}
> -	POSTING_READ(RING_FAULT_REG(&dev_priv->ring[RCS]));
> +	POSTING_READ(RING_FAULT_REG(RCS_ENGINE(dev_priv)));
>   }
>   
>   void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d5c14af51e99..0802832df28c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -263,8 +263,8 @@ struct i915_hw_ppgtt {
>   	struct drm_i915_file_private *file_priv;
>   
>   	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> -	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> -			 struct intel_engine_cs *ring);
> +	int (*switch_mm)(struct i915_gem_request *rq,
> +			 struct i915_hw_ppgtt *ppgtt);
>   	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
>   };
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index a9a62d75aa57..fffd26dfa4dd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -28,8 +28,15 @@
>   #include "i915_drv.h"
>   #include "intel_renderstate.h"
>   
> +struct render_state {
> +	const struct intel_renderstate_rodata *rodata;
> +	struct drm_i915_gem_object *obj;
> +	u64 ggtt_offset;
> +	int gen;
> +};
> +
>   static const struct intel_renderstate_rodata *
> -render_state_get_rodata(struct drm_device *dev, const int gen)
> +render_state_get_rodata(const int gen)
>   {
>   	switch (gen) {
>   	case 6:
> @@ -43,19 +50,19 @@ render_state_get_rodata(struct drm_device *dev, const int gen)
>   	return NULL;
>   }
>   
> -static int render_state_init(struct render_state *so, struct drm_device *dev)
> +static int render_state_init(struct render_state *so, struct i915_gem_request *rq)
>   {
>   	int ret;
>   
> -	so->gen = INTEL_INFO(dev)->gen;
> -	so->rodata = render_state_get_rodata(dev, so->gen);
> +	so->gen = INTEL_INFO(rq->i915)->gen;
> +	so->rodata = render_state_get_rodata(so->gen);
>   	if (so->rodata == NULL)
>   		return 0;
>   
>   	if (so->rodata->batch_items * 4 > 4096)
>   		return -EINVAL;
>   
> -	so->obj = i915_gem_alloc_object(dev, 4096);
> +	so->obj = i915_gem_alloc_object(rq->i915->dev, 4096);
>   	if (so->obj == NULL)
>   		return -ENOMEM;
>   
> @@ -108,10 +115,6 @@ static int render_state_setup(struct render_state *so)
>   	}
>   	kunmap(page);
>   
> -	ret = i915_gem_object_set_to_gtt_domain(so->obj, false);
> -	if (ret)
> -		return ret;
> -
>   	if (rodata->reloc[reloc_index] != -1) {
>   		DRM_ERROR("only %d relocs resolved\n", reloc_index);
>   		return -EINVAL;
> @@ -120,60 +123,46 @@ static int render_state_setup(struct render_state *so)
>   	return 0;
>   }
>   
> -void i915_gem_render_state_fini(struct render_state *so)
> +static void render_state_fini(struct render_state *so)
>   {
>   	i915_gem_object_ggtt_unpin(so->obj);
>   	drm_gem_object_unreference(&so->obj->base);
>   }
>   
> -int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> -				  struct render_state *so)
> +int i915_gem_render_state_init(struct i915_gem_request *rq)
>   {
> +	struct render_state so;
>   	int ret;
>   
> -	if (WARN_ON(ring->id != RCS))
> +	if (WARN_ON(rq->engine->id != RCS))
>   		return -ENOENT;
>   
> -	ret = render_state_init(so, ring->dev);
> +	ret = render_state_init(&so, rq);
>   	if (ret)
>   		return ret;
>   
> -	if (so->rodata == NULL)
> +	if (so.rodata == NULL)
>   		return 0;
>   
> -	ret = render_state_setup(so);
> -	if (ret) {
> -		i915_gem_render_state_fini(so);
> -		return ret;
> -	}
> -
> -	return 0;
> -}
> -
> -int i915_gem_render_state_init(struct intel_engine_cs *ring)
> -{
> -	struct render_state so;
> -	int ret;
> -
> -	ret = i915_gem_render_state_prepare(ring, &so);
> +	ret = render_state_setup(&so);
>   	if (ret)
> -		return ret;
> +		goto out;
>   
> -	if (so.rodata == NULL)
> -		return 0;
> +	if (i915_gem_clflush_object(so.obj, false))
> +		i915_gem_chipset_flush(rq->i915->dev);
>   
> -	ret = ring->dispatch_execbuffer(ring,
> -					so.ggtt_offset,
> -					so.rodata->batch_items * 4,
> -					I915_DISPATCH_SECURE);
> +	ret = i915_request_emit_batchbuffer(rq, NULL,
> +					    so.ggtt_offset,
> +					    so.rodata->batch_items * 4,
> +					    I915_DISPATCH_SECURE);
>   	if (ret)
>   		goto out;
>   
> -	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
> +	so.obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;
> +	ret = i915_request_add_vma(rq, i915_gem_obj_to_ggtt(so.obj), 0);
>   
> -	ret = __i915_add_request(ring, NULL, so.obj, NULL);
>   	/* __i915_add_request moves object to inactive if it fails */
>   out:
> -	i915_gem_render_state_fini(&so);
> +	render_state_fini(&so);
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
> deleted file mode 100644
> index c44961ed3fad..000000000000
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.h
> +++ /dev/null
> @@ -1,47 +0,0 @@
> -/*
> - * Copyright © 2014 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the next
> - * paragraph) shall be included in all copies or substantial portions of the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> - * DEALINGS IN THE SOFTWARE.
> - */
> -
> -#ifndef _I915_GEM_RENDER_STATE_H_
> -#define _I915_GEM_RENDER_STATE_H_
> -
> -#include <linux/types.h>
> -
> -struct intel_renderstate_rodata {
> -	const u32 *reloc;
> -	const u32 *batch;
> -	const u32 batch_items;
> -};
> -
> -struct render_state {
> -	const struct intel_renderstate_rodata *rodata;
> -	struct drm_i915_gem_object *obj;
> -	u64 ggtt_offset;
> -	int gen;
> -};
> -
> -int i915_gem_render_state_init(struct intel_engine_cs *ring);
> -void i915_gem_render_state_fini(struct render_state *so);
> -int i915_gem_render_state_prepare(struct intel_engine_cs *ring,
> -				  struct render_state *so);
> -
> -#endif /* _I915_GEM_RENDER_STATE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> new file mode 100644
> index 000000000000..582c5df2933e
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -0,0 +1,651 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include <drm/drmP.h>
> +#include "i915_drv.h"
> +#include <drm/i915_drm.h>
> +#include "i915_trace.h"
> +#include "intel_drv.h"
> +
> +struct i915_gem_request__vma {
> +	struct list_head link;
> +	struct i915_vma *vma;
> +	struct drm_i915_gem_object *obj;
> +	u32 write, fence;
> +};
> +
> +static bool check_reset(struct i915_gem_request *rq)
> +{
> +	unsigned reset = atomic_read(&rq->i915->gpu_error.reset_counter);
> +	return likely(reset == rq->reset_counter);
> +}
> +
> +int
> +i915_request_add_vma(struct i915_gem_request *rq,
> +		     struct i915_vma *vma,
> +		     unsigned fenced)
> +{
> +	struct drm_i915_gem_object *obj = vma->obj;
> +	u32 old_read = obj->base.read_domains;
> +	u32 old_write = obj->base.write_domain;
> +	struct i915_gem_request__vma *ref;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +	BUG_ON(!rq->outstanding);
> +
> +	obj->base.write_domain = obj->base.pending_write_domain;
> +	if (obj->base.write_domain == 0)
> +		obj->base.pending_read_domains |= obj->base.read_domains;
> +	obj->base.read_domains = obj->base.pending_read_domains;
> +
> +	obj->base.pending_read_domains = 0;
> +	obj->base.pending_write_domain = 0;
> +
> +	trace_i915_gem_object_change_domain(obj, old_read, old_write);
> +	if (obj->base.read_domains == 0)
> +		return 0;
> +
> +	ref = kmalloc(sizeof(*ref), GFP_KERNEL);
> +	if (ref == NULL)
> +		return -ENOMEM;
> +
> +	list_add(&ref->link, &rq->vmas);
> +	ref->vma = vma;
> +	ref->obj = obj;
> +	drm_gem_object_reference(&obj->base);
> +	ref->write = obj->base.write_domain & I915_GEM_GPU_DOMAINS;
> +	ref->fence = fenced;
> +
> +	if (ref->write) {
> +		rq->pending_flush |= I915_FLUSH_CACHES;
> +		intel_fb_obj_invalidate(obj, rq);
> +	}
> +
> +	/* update for the implicit flush after the rq */
> +	obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
> +	return 0;
> +}
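
The contract here, as far as I can tell, is that callers declare how the GPU
will use an object through the pending_* domain fields and then hand the vma
over; i915_request_add_vma() latches the domains and takes over the activity
tracking. For example (mirroring what the render-state code further down
does; the helper itself is mine):

	static int track_for_gpu_read(struct i915_gem_request *rq,
				      struct drm_i915_gem_object *obj)
	{
		/* declare the GPU usage before handing the vma to the request */
		obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;

		return i915_request_add_vma(rq, i915_gem_obj_to_ggtt(obj), 0);
	}
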
> +
> +static void vma_free(struct i915_gem_request__vma *ref)
> +{
> +	drm_gem_object_unreference(&ref->obj->base);
> +	list_del(&ref->link);
> +	kfree(ref);
> +}
> +
> +int
> +i915_request_emit_flush(struct i915_gem_request *rq,
> +			unsigned flags)
> +{
> +	struct intel_engine_cs *engine = rq->engine;
> +	int ret;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +	BUG_ON(!rq->outstanding);
> +
> +	if ((flags & rq->pending_flush) == 0)
> +		return 0;
> +
> +	trace_i915_gem_request_emit_flush(rq);
> +	ret = engine->emit_flush(rq, rq->pending_flush);
> +	if (ret)
> +		return ret;
> +
> +	rq->pending_flush = 0;
> +	return 0;
> +}
> +
> +int
> +__i915_request_emit_breadcrumb(struct i915_gem_request *rq, int id)
> +{
> +	struct intel_engine_cs *engine = rq->engine;
> +	u32 seqno;
> +	int ret;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (rq->breadcrumb[id])
> +		return 0;
> +
> +	if (rq->outstanding) {
> +		ret = i915_request_emit_flush(rq, I915_COMMAND_BARRIER);
> +		if (ret)
> +			return ret;
> +
> +		trace_i915_gem_request_emit_breadcrumb(rq);
> +		if (id == engine->id)
> +			ret = engine->emit_breadcrumb(rq);
> +		else
> +			ret = engine->semaphore.signal(rq, id);
> +		if (ret)
> +			return ret;
> +
> +		seqno = rq->seqno;
> +	} else if (engine->breadcrumb[id] == 0 ||
> +		   __i915_seqno_passed(rq->seqno, engine->breadcrumb[id])) {
> +		struct i915_gem_request *tmp;
> +
> +		tmp = intel_engine_alloc_request(engine,
> +						 rq->ring->last_context);
> +		if (IS_ERR(tmp))
> +			return PTR_ERR(tmp);
> +
> +		/* Masquerade as a continuation of the earlier request */
> +		tmp->reset_counter = rq->reset_counter;
> +
> +		ret = __i915_request_emit_breadcrumb(tmp, id);
> +		if (ret == 0 && id != engine->id) {
> +			/* semaphores are unstable across a wrap */
> +			if (tmp->seqno < engine->breadcrumb[id])
> +				ret = i915_request_wait(tmp);
> +		}
> +		if (ret == 0)
> +			ret = i915_request_commit(tmp);
> +
> +		i915_request_put(tmp);
> +		if (ret)
> +			return ret;
> +
> +		seqno = tmp->seqno;
> +	} else
> +		seqno = engine->breadcrumb[id];
> +
> +	rq->breadcrumb[id] = seqno;
> +	return 0;
> +}
> +
> +int
> +i915_request_emit_batchbuffer(struct i915_gem_request *rq,
> +			      struct i915_vma *batch,
> +			      uint64_t start, uint32_t len,
> +			      unsigned flags)
> +{
> +	struct intel_engine_cs *engine = rq->engine;
> +	int ret;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +	BUG_ON(!rq->outstanding);
> +	BUG_ON(rq->breadcrumb[rq->engine->id]);
> +
> +	trace_i915_gem_request_emit_batch(rq);
> +	ret = engine->emit_batchbuffer(rq, start, len, flags);
> +	if (ret)
> +		return ret;
> +
> +	/* We track the associated batch vma for debugging and error capture.
> +	 * Whilst this request exists, the batch obj will be on the active_list,
> +	 * and so will hold the active reference. Only when this request is
> +	 * retired will the batch be moved onto the inactive_list and lose
> +	 * its active reference. Hence we do not need to explicitly hold
> +	 * another reference here.
> +	 */
> +	rq->batch = batch;
> +	rq->pending_flush |= I915_COMMAND_BARRIER;
> +	return 0;
> +}
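
As an aside for my own testing (not something this hunk spells out), this is
roughly how I read the intended submission flow built from the functions
above; the locals engine, ctx, batch_vma, exec_start, exec_len and flags, and
the error handling, are my guesses:

	rq = intel_engine_alloc_request(engine, ctx);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	ret = i915_request_emit_batchbuffer(rq, batch_vma, exec_start, exec_len, flags);
	if (ret == 0)
		ret = i915_request_commit(rq);

	/* i915_request_commit() takes its own reference, and if we never got
	 * that far the final put unwinds the ring, so always drop ours here.
	 */
	i915_request_put(rq);
	return ret;

At least that matches the alloc/commit/put pattern used for the temporary
request in __i915_request_emit_breadcrumb() above.
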
> +
> +/* Track the batches submitted by clients for throttling */
> +static void
> +add_to_client(struct i915_gem_request *rq)
> +{
> +	struct drm_i915_file_private *file_priv = rq->ctx->file_priv;
> +
> +	if (file_priv) {
> +		spin_lock(&file_priv->mm.lock);
> +		list_add_tail(&rq->client_list,
> +			      &file_priv->mm.request_list);
> +		rq->file_priv = file_priv;
> +		spin_unlock(&file_priv->mm.lock);
> +	}
> +}
> +
> +static void
> +remove_from_client(struct i915_gem_request *rq)
> +{
> +	struct drm_i915_file_private *file_priv = rq->file_priv;
> +
> +	if (!file_priv)
> +		return;
> +
> +	spin_lock(&file_priv->mm.lock);
> +	if (rq->file_priv) {
> +		list_del(&rq->client_list);
> +		rq->file_priv = NULL;
> +	}
> +	spin_unlock(&file_priv->mm.lock);
> +}
> +
> +/* Activity tracking on the object so that we can serialise CPU access to
> + * the object's memory with the GPU.
> + */
> +static void
> +add_to_obj(struct i915_gem_request *rq,
> +	   struct i915_gem_request__vma *ref)
> +{
> +	struct drm_i915_gem_object *obj = ref->obj;
> +	struct intel_engine_cs *engine = rq->engine;
> +
> +	/* Add a reference if we're newly entering the active list. */
> +	if (obj->last_read[engine->id].request == NULL && obj->active++ == 0)
> +		drm_gem_object_reference(&obj->base);
> +
> +	if (ref->write) {
> +		obj->dirty = 1;
> +		i915_request_put(obj->last_write.request);
> +		obj->last_write.request = i915_request_get(rq);
> +		list_move_tail(&obj->last_write.engine_list,
> +			       &engine->write_list);
> +
> +		if (obj->active > 1) {
> +			int i;
> +
> +			for (i = 0; i < I915_NUM_ENGINES; i++) {
> +				if (obj->last_read[i].request == NULL)
> +					continue;
> +
> +				list_del_init(&obj->last_read[i].engine_list);
> +				i915_request_put(obj->last_read[i].request);
> +				obj->last_read[i].request = NULL;
> +			}
> +
> +			obj->active = 1;
> +		}
> +	}
> +
> +	if (ref->fence & VMA_IS_FENCED) {
> +		i915_request_put(obj->last_fence.request);
> +		obj->last_fence.request = i915_request_get(rq);
> +		list_move_tail(&obj->last_fence.engine_list,
> +			       &engine->fence_list);
> +		if (ref->fence & VMA_HAS_FENCE)
> +			list_move_tail(&rq->i915->fence_regs[obj->fence_reg].lru_list,
> +					&rq->i915->mm.fence_list);
> +	}
> +
> +	i915_request_put(obj->last_read[engine->id].request);
> +	obj->last_read[engine->id].request = i915_request_get(rq);
> +	list_move_tail(&obj->last_read[engine->id].engine_list,
> +		       &engine->read_list);
> +
> +	list_move_tail(&ref->vma->mm_list, &ref->vma->vm->active_list);
> +}
> +
> +static bool leave_breadcrumb(struct i915_gem_request *rq)
> +{
> +	if (rq->breadcrumb[rq->engine->id])
> +		return false;
> +
> +	/* Auto-report HEAD every 4k to make sure that we can always wait on
> +	 * some available ring space in the future. This also caps the
> +	 * latency of future waits for missed breadcrumbs.
> +	 */
> +	if (__intel_ring_space(rq->ring->tail, rq->ring->breadcrumb_tail,
> +			       rq->ring->size, 0) >= PAGE_SIZE)
> +		return true;
> +
> +	return false;
> +}
> +
> +int i915_request_commit(struct i915_gem_request *rq)
> +{
> +	int ret, n;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (!rq->outstanding)
> +		return 0;
> +
> +	if (rq->head == rq->ring->tail) {
> +		rq->completed = true;
> +		goto done;
> +	}
> +
> +	if (intel_engine_hang(rq->engine))
> +		i915_handle_error(rq->i915->dev, true, "Simulated hang");
> +
> +	if (!check_reset(rq))
> +		return rq->i915->mm.interruptible ? -EAGAIN : -EIO;
> +
> +	if (leave_breadcrumb(rq)) {
> +		ret = i915_request_emit_breadcrumb(rq);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/* TAIL must be aligned to a qword */
> +	if ((rq->ring->tail / sizeof(uint32_t)) & 1) {
> +		intel_ring_emit(rq->ring, MI_NOOP);
> +		intel_ring_advance(rq->ring);
> +	}
> +	rq->tail = rq->ring->tail;
> +	rq->emitted_jiffies = jiffies;
> +
> +	intel_runtime_pm_get(rq->i915);
> +
> +	trace_i915_gem_request_commit(rq);
> +	ret = rq->engine->add_request(rq);
> +	if (ret) {
> +		intel_runtime_pm_put(rq->i915);
> +		return ret;
> +	}
> +
> +	i915_request_get(rq);
> +
> +	rq->outstanding = false;
> +	if (rq->breadcrumb[rq->engine->id]) {
> +		list_add_tail(&rq->breadcrumb_link, &rq->ring->breadcrumbs);
> +		rq->ring->breadcrumb_tail = rq->tail;
> +	}
> +
> +	memcpy(rq->engine->semaphore.sync,
> +	       rq->semaphore,
> +	       sizeof(rq->semaphore));
> +	for (n = 0; n < ARRAY_SIZE(rq->breadcrumb); n++)
> +		if (rq->breadcrumb[n])
> +			rq->engine->breadcrumb[n] = rq->breadcrumb[n];
> +
> +	rq->ring->pending_flush = rq->pending_flush;
> +
> +	if (rq->batch) {
> +		add_to_client(rq);
> +		while (!list_empty(&rq->vmas)) {
> +			struct i915_gem_request__vma *ref =
> +				list_first_entry(&rq->vmas, typeof(*ref), link);
> +
> +			add_to_obj(rq, ref);
> +			vma_free(ref);
> +		}
> +	}
> +
> +	i915_request_switch_context__commit(rq);
> +
> +	rq->engine->last_request = rq;
> +done:
> +	rq->ring->last_context = rq->ctx;
> +	return 0;
> +}
> +
> +static void fake_irq(unsigned long data)
> +{
> +	wake_up_process((struct task_struct *)data);
> +}
> +
> +static bool missed_irq(struct i915_gem_request *rq)
> +{
> +	return test_bit(rq->engine->id, &rq->i915->gpu_error.missed_irq_rings);
> +}
> +
> +static bool can_wait_boost(struct drm_i915_file_private *file_priv)
> +{
> +	if (file_priv == NULL)
> +		return true;
> +
> +	return !atomic_xchg(&file_priv->rps_wait_boost, true);
> +}
> +
> +bool __i915_request_complete__wa(struct i915_gem_request *rq)
> +{
> +	struct drm_i915_private *dev_priv = rq->i915;
> +	unsigned head, tail;
> +
> +	if (i915_request_complete(rq))
> +		return true;
> +
> +	/* With execlists, we rely on interrupts to track request completion */
> +	if (rq->engine->execlists_enabled)
> +		return false;
> +
> +	/* As we may not emit a breadcrumb with every request, we
> +	 * often have unflushed requests. In the event of an emergency,
> +	 * just assume that if the RING_HEAD has reached the tail, then
> +	 * the request is complete. However, note that the RING_HEAD
> +	 * advances before the instruction completes, so this is quite lax,
> +	 * and should only be used carefully.
> +	 *
> +	 * As we treat this as only an advisory completion, we forgo
> +	 * marking the request as actually complete.
> +	 */
> +	head = __intel_ring_space(I915_READ_HEAD(rq->engine) & HEAD_ADDR,
> +				  rq->ring->tail, rq->ring->size, 0);
> +	tail = __intel_ring_space(rq->tail,
> +				  rq->ring->tail, rq->ring->size, 0);
> +	return head >= tail;
> +}
> +
> +/**
> + * __i915_request_wait - wait until execution of request has finished
> + * @rq: the request to wait upon
> + * @interruptible: do an interruptible wait (normally yes)
> + * @timeout_ns: in - how long to wait (NULL to wait forever); out - how much time remaining
> + * @file_priv: client whose RPS waitboost is applied while waiting (may be NULL)
> + *
> + * Returns 0 if the request was completed within the allotted time. Otherwise returns
> + * the errno, with the remaining time filled into the timeout argument.
> + */
> +int __i915_request_wait(struct i915_gem_request *rq,
> +			bool interruptible,
> +			s64 *timeout_ns,
> +			struct drm_i915_file_private *file_priv)
> +{
> +	const bool irq_test_in_progress =
> +		ACCESS_ONCE(rq->i915->gpu_error.test_irq_rings) & intel_engine_flag(rq->engine);
> +	DEFINE_WAIT(wait);
> +	unsigned long timeout_expire;
> +	unsigned long before, now;
> +	int ret = 0;
> +
> +	WARN(!intel_irqs_enabled(rq->i915), "IRQs disabled");
> +
> +	if (i915_request_complete(rq))
> +		return 0;
> +
> +	timeout_expire = timeout_ns ? jiffies + nsecs_to_jiffies((u64)*timeout_ns) : 0;
> +
> +	if (INTEL_INFO(rq->i915)->gen >= 6 && rq->engine->id == RCS && can_wait_boost(file_priv)) {
> +		gen6_rps_boost(rq->i915);
> +		if (file_priv)
> +			mod_delayed_work(rq->i915->wq,
> +					 &file_priv->mm.idle_work,
> +					 msecs_to_jiffies(100));
> +	}
> +
> +	if (!irq_test_in_progress && WARN_ON(!rq->engine->irq_get(rq->engine)))
> +		return -ENODEV;
> +
> +	/* Record current time in case interrupted by signal, or wedged */
> +	trace_i915_gem_request_wait_begin(rq);
> +	before = jiffies;
> +	for (;;) {
> +		struct timer_list timer;
> +
> +		prepare_to_wait(&rq->engine->irq_queue, &wait,
> +				interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE);
> +
> +		if (!check_reset(rq))
> +			break;
> +
> +		rq->engine->irq_barrier(rq->engine);
> +
> +		if (i915_request_complete(rq))
> +			break;
> +
> +		if (timeout_ns && time_after_eq(jiffies, timeout_expire)) {
> +			ret = -ETIME;
> +			break;
> +		}
> +
> +		if (interruptible && signal_pending(current)) {
> +			ret = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		timer.function = NULL;
> +		if (timeout_ns || missed_irq(rq)) {
> +			unsigned long expire;
> +
> +			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
> +			expire = missed_irq(rq) ? jiffies + 1 : timeout_expire;
> +			mod_timer(&timer, expire);
> +		}
> +
> +		io_schedule();
> +
> +		if (timer.function) {
> +			del_singleshot_timer_sync(&timer);
> +			destroy_timer_on_stack(&timer);
> +		}
> +	}
> +	now = jiffies;
> +	trace_i915_gem_request_wait_end(rq);
> +
> +	if (!irq_test_in_progress)
> +		rq->engine->irq_put(rq->engine);
> +
> +	finish_wait(&rq->engine->irq_queue, &wait);
> +
> +	if (timeout_ns) {
> +		s64 tres = *timeout_ns - jiffies_to_nsecs(now - before);
> +		*timeout_ns = tres <= 0 ? 0 : tres;
> +	}
> +
> +	return ret;
> +}
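
A small note on the timeout handling, mostly to check my reading: timeout_ns
is both the input budget and, on return, how much of that budget is left, so
a caller would presumably do something like the following (purely my
illustration, not a hunk from the patch):

	s64 timeout_ns = 100 * NSEC_PER_MSEC;	/* arbitrary 100ms budget */

	ret = __i915_request_wait(rq, true, &timeout_ns, NULL);
	/* ret == 0:      completed; timeout_ns holds the unused remainder
	 * ret == -ETIME: budget exhausted; timeout_ns has been clamped to 0
	 */

which is consistent with how the remainder is computed just before returning.
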
> +
> +struct i915_gem_request *
> +i915_request_get_breadcrumb(struct i915_gem_request *rq)
> +{
> +	struct list_head *list;
> +	u32 seqno;
> +	int ret;
> +
> +	/* Writes are only coherent from the cpu (in the general case) when
> +	 * the interrupt following the write to memory is complete. That is
> +	 * when the breadcrumb after the write request is complete.
> +	 *
> +	 * Reads are only complete when the command streamer barrier is
> +	 * passed.
> +	 *
> +	 * In both cases, the CPU needs to wait upon the subsequent breadcrumb,
> +	 * which ensures that all pending flushes have been emitted and are
> +	 * complete, before reporting that the request is finished and
> +	 * the CPU's view of memory is coherent with the GPU.
> +	 */
> +
> +	ret = i915_request_emit_breadcrumb(rq);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ret = i915_request_commit(rq);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	if (!list_empty(&rq->breadcrumb_link))
> +		return i915_request_get(rq);
> +
> +	seqno = rq->breadcrumb[rq->engine->id];
> +	list = &rq->ring->breadcrumbs;
> +	list_for_each_entry_reverse(rq, list, breadcrumb_link) {
> +		if (rq->seqno == seqno)
> +			return i915_request_get(rq);
> +	}
> +
> +	return ERR_PTR(-EIO);
> +}
> +
> +int
> +i915_request_wait(struct i915_gem_request *rq)
> +{
> +	int ret;
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	rq = i915_request_get_breadcrumb(rq);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	ret = __i915_request_wait(rq, rq->i915->mm.interruptible,
> +				  NULL, NULL);
> +	i915_request_put(rq);
> +
> +	return ret;
> +}
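
Since i915_request_wait() above asserts struct_mutex while
__i915_request_wait() does not, I assume an unlocked waiter is expected to
pin the request before dropping the lock, roughly like this (obj, dev and the
NULL handling are my own shorthand, not taken from the patch):

	struct i915_gem_request *rq;
	int ret;

	/* under struct_mutex: commit a breadcrumb and take a reference so
	 * the request survives being retired while we sleep */
	rq = obj->last_write.request;
	if (rq == NULL)
		return 0;
	rq = i915_request_get_breadcrumb(rq);
	if (IS_ERR(rq))
		return PTR_ERR(rq);
	mutex_unlock(&dev->struct_mutex);

	ret = __i915_request_wait(rq, true, NULL, NULL);

	mutex_lock(&dev->struct_mutex);
	i915_request_put(rq);
	return ret;
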
> +
> +void
> +i915_request_retire(struct i915_gem_request *rq)
> +{
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (!rq->completed) {
> +		trace_i915_gem_request_complete(rq);
> +		rq->completed = true;
> +	}
> +	trace_i915_gem_request_retire(rq);
> +
> +	/* We know the GPU must have read the request to have
> +	 * sent us the seqno + interrupt, we can use the position
> +	 * of the tail of the request to update the last known position
> +	 * of the GPU head.
> +	 */
> +	if (!list_empty(&rq->breadcrumb_link))
> +		rq->ring->retired_head = rq->tail;
> +
> +	rq->batch = NULL;
> +
> +	/* We need to protect against simultaneous hangcheck/capture */
> +	spin_lock(&rq->engine->lock);
> +	if (rq->engine->last_request == rq)
> +		rq->engine->last_request = NULL;
> +	list_del(&rq->engine_list);
> +	spin_unlock(&rq->engine->lock);
> +
> +	list_del(&rq->breadcrumb_link);
> +	remove_from_client(rq);
> +
> +	intel_runtime_pm_put(rq->i915);
> +	i915_request_put(rq);
> +}
> +
> +void
> +__i915_request_free(struct kref *kref)
> +{
> +	struct i915_gem_request *rq = container_of(kref, struct i915_gem_request, kref);
> +
> +	lockdep_assert_held(&rq->i915->dev->struct_mutex);
> +
> +	if (rq->outstanding) {
> +		/* Roll back this partial transaction as we never committed
> +		 * the request to the hardware queue.
> +		 */
> +		rq->ring->tail = rq->head;
> +		rq->ring->space = intel_ring_space(rq->ring);
> +	}
> +
> +	while (!list_empty(&rq->vmas))
> +		vma_free(list_first_entry(&rq->vmas,
> +					  struct i915_gem_request__vma,
> +					  link));
> +
> +	i915_request_switch_context__undo(rq);
> +	i915_gem_context_unreference(rq->ctx);
> +	kfree(rq);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
> index 2cefb597df6d..a48355c4ef88 100644
> --- a/drivers/gpu/drm/i915/i915_gem_tiling.c
> +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
> @@ -383,7 +383,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>   
>   		if (ret == 0) {
>   			obj->fence_dirty =
> -				obj->last_fenced_seqno ||
> +				obj->last_fence.request ||
>   				obj->fence_reg != I915_FENCE_REG_NONE;
>   
>   			obj->tiling_mode = args->tiling_mode;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2c87a797213f..adb6358a8f6e 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -192,15 +192,18 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
>   				struct drm_i915_error_buffer *err,
>   				int count)
>   {
> -	err_printf(m, "  %s [%d]:\n", name, count);
> +	int n;
>   
> +	err_printf(m, "  %s [%d]:\n", name, count);
>   	while (count--) {
> -		err_printf(m, "    %08x %8u %02x %02x %x %x",
> +		err_printf(m, "    %08x %8u %02x %02x [",
>   			   err->gtt_offset,
>   			   err->size,
>   			   err->read_domains,
> -			   err->write_domain,
> -			   err->rseqno, err->wseqno);
> +			   err->write_domain);
> +		for (n = 0; n < ARRAY_SIZE(err->rseqno); n++)
> +			err_printf(m, " %x", err->rseqno[n]);
> +		err_printf(m, " ] %x %x ", err->wseqno, err->fseqno);
>   		err_puts(m, pin_flag(err->pinned));
>   		err_puts(m, tiling_flag(err->tiling));
>   		err_puts(m, dirty_flag(err->dirty));
> @@ -220,11 +223,13 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
>   	}
>   }
>   
> -static const char *hangcheck_action_to_str(enum intel_ring_hangcheck_action a)
> +static const char *hangcheck_action_to_str(enum intel_engine_hangcheck_action a)
>   {
>   	switch (a) {
>   	case HANGCHECK_IDLE:
>   		return "idle";
> +	case HANGCHECK_IDLE_WAITERS:
> +		return "idle (with waiters)";
>   	case HANGCHECK_WAIT:
>   		return "wait";
>   	case HANGCHECK_ACTIVE:
> @@ -244,13 +249,19 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>   				  struct drm_device *dev,
>   				  struct drm_i915_error_ring *ring)
>   {
> +	int n;
> +
>   	if (!ring->valid)
>   		return;
>   
> -	err_printf(m, "  HEAD: 0x%08x\n", ring->head);
> -	err_printf(m, "  TAIL: 0x%08x\n", ring->tail);
> -	err_printf(m, "  CTL: 0x%08x\n", ring->ctl);
> -	err_printf(m, "  HWS: 0x%08x\n", ring->hws);
> +	err_printf(m, "%s command stream:\n", ring_str(ring->id));
> +
> +	err_printf(m, "  START: 0x%08x\n", ring->start);
> +	err_printf(m, "  HEAD:  0x%08x\n", ring->head);
> +	err_printf(m, "  TAIL:  0x%08x\n", ring->tail);
> +	err_printf(m, "  CTL:   0x%08x\n", ring->ctl);
> +	err_printf(m, "  MODE:  0x%08x [idle? %d]\n", ring->mode, !!(ring->mode & MODE_IDLE));
> +	err_printf(m, "  HWS:   0x%08x\n", ring->hws);
>   	err_printf(m, "  ACTHD: 0x%08x %08x\n", (u32)(ring->acthd>>32), (u32)ring->acthd);
>   	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
>   	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
> @@ -266,17 +277,13 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>   	if (INTEL_INFO(dev)->gen >= 6) {
>   		err_printf(m, "  RC PSMI: 0x%08x\n", ring->rc_psmi);
>   		err_printf(m, "  FAULT_REG: 0x%08x\n", ring->fault_reg);
> -		err_printf(m, "  SYNC_0: 0x%08x [last synced 0x%08x]\n",
> -			   ring->semaphore_mboxes[0],
> -			   ring->semaphore_seqno[0]);
> -		err_printf(m, "  SYNC_1: 0x%08x [last synced 0x%08x]\n",
> -			   ring->semaphore_mboxes[1],
> -			   ring->semaphore_seqno[1]);
> -		if (HAS_VEBOX(dev)) {
> -			err_printf(m, "  SYNC_2: 0x%08x [last synced 0x%08x]\n",
> -				   ring->semaphore_mboxes[2],
> -				   ring->semaphore_seqno[2]);
> -		}
> +		err_printf(m, "  SYNC_0: 0x%08x\n",
> +			   ring->semaphore_mboxes[0]);
> +		err_printf(m, "  SYNC_1: 0x%08x\n",
> +			   ring->semaphore_mboxes[1]);
> +		if (HAS_VEBOX(dev))
> +			err_printf(m, "  SYNC_2: 0x%08x\n",
> +				   ring->semaphore_mboxes[2]);
>   	}
>   	if (USES_PPGTT(dev)) {
>   		err_printf(m, "  GFX_MODE: 0x%08x\n", ring->vm_info.gfx_mode);
> @@ -291,8 +298,20 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>   				   ring->vm_info.pp_dir_base);
>   		}
>   	}
> -	err_printf(m, "  seqno: 0x%08x\n", ring->seqno);
> -	err_printf(m, "  waiting: %s\n", yesno(ring->waiting));
> +	err_printf(m, "  tag: 0x%04x\n", ring->tag);
> +	err_printf(m, "  seqno: 0x%08x [hangcheck 0x%08x, breadcrumb 0x%08x, request 0x%08x]\n",
> +		   ring->seqno, ring->hangcheck, ring->breadcrumb[ring->id], ring->request);
> +	err_printf(m, "  sem.signal: [");
> +	for (n = 0; n < ARRAY_SIZE(ring->breadcrumb); n++)
> +		err_printf(m, " %s%08x", n == ring->id ? "*" : "", ring->breadcrumb[n]);
> +	err_printf(m, " ]\n");
> +	err_printf(m, "  sem.waited: [");
> +	for (n = 0; n < ARRAY_SIZE(ring->semaphore_sync); n++)
> +		err_printf(m, " %s%08x", n == ring->id ? "*" : "", ring->semaphore_sync[n]);
> +	err_printf(m, " ]\n");
> +	err_printf(m, "  waiting: %s [irq count %d]\n",
> +		   yesno(ring->waiting), ring->irq_count);
> +	err_printf(m, "  interrupts: %d\n", ring->interrupts);
>   	err_printf(m, "  ring->head: 0x%08x\n", ring->cpu_ring_head);
>   	err_printf(m, "  ring->tail: 0x%08x\n", ring->cpu_ring_tail);
>   	err_printf(m, "  hangcheck: %s [%d]\n",
> @@ -362,11 +381,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   	err_printf(m, "EIR: 0x%08x\n", error->eir);
>   	err_printf(m, "IER: 0x%08x\n", error->ier);
>   	if (INTEL_INFO(dev)->gen >= 8) {
> -		for (i = 0; i < 4; i++)
> +		for (i = 0; i < 4; i++) {
>   			err_printf(m, "GTIER gt %d: 0x%08x\n", i,
>   				   error->gtier[i]);
> -	} else if (HAS_PCH_SPLIT(dev) || IS_VALLEYVIEW(dev))
> +			err_printf(m, "GTIMR gt %d: 0x%08x\n", i,
> +				   error->gtimr[i]);
> +		}
> +	} else if (HAS_PCH_SPLIT(dev) || IS_VALLEYVIEW(dev)) {
>   		err_printf(m, "GTIER: 0x%08x\n", error->gtier[0]);
> +		err_printf(m, "GTIMR: 0x%08x\n", error->gtimr[0]);
> +	}
>   	err_printf(m, "PGTBL_ER: 0x%08x\n", error->pgtbl_er);
>   	err_printf(m, "FORCEWAKE: 0x%08x\n", error->forcewake);
>   	err_printf(m, "DERRMR: 0x%08x\n", error->derrmr);
> @@ -388,10 +412,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   	if (INTEL_INFO(dev)->gen == 7)
>   		err_printf(m, "ERR_INT: 0x%08x\n", error->err_int);
>   
> -	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
> -		err_printf(m, "%s command stream:\n", ring_str(i));
> +	for (i = 0; i < ARRAY_SIZE(error->ring); i++)
>   		i915_ring_error_state(m, dev, &error->ring[i]);
> -	}
>   
>   	for (i = 0; i < error->vm_count; i++) {
>   		err_printf(m, "vm[%d]\n", i);
> @@ -406,48 +428,53 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   	}
>   
>   	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
> -		obj = error->ring[i].batchbuffer;
> +		const struct drm_i915_error_ring *ering = &error->ring[i];
> +		const char *name = dev_priv->engine[ering->id].name;
> +
> +		obj = ering->batchbuffer;
>   		if (obj) {
> -			err_puts(m, dev_priv->ring[i].name);
> -			if (error->ring[i].pid != -1)
> +			err_puts(m, name);
> +			if (ering->pid != -1)
>   				err_printf(m, " (submitted by %s [%d])",
> -					   error->ring[i].comm,
> -					   error->ring[i].pid);
> +					   ering->comm, ering->pid);
>   			err_printf(m, " --- gtt_offset = 0x%08x\n",
>   				   obj->gtt_offset);
>   			print_error_obj(m, obj);
>   		}
>   
> -		obj = error->ring[i].wa_batchbuffer;
> +		obj = ering->wa_batchbuffer;
>   		if (obj) {
>   			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
> -				   dev_priv->ring[i].name, obj->gtt_offset);
> +				   name, obj->gtt_offset);
>   			print_error_obj(m, obj);
>   		}
>   
> -		if (error->ring[i].num_requests) {
> +		if (ering->num_requests) {
>   			err_printf(m, "%s --- %d requests\n",
> -				   dev_priv->ring[i].name,
> -				   error->ring[i].num_requests);
> -			for (j = 0; j < error->ring[i].num_requests; j++) {
> -				err_printf(m, "  seqno 0x%08x, emitted %ld, tail 0x%08x\n",
> -					   error->ring[i].requests[j].seqno,
> -					   error->ring[i].requests[j].jiffies,
> -					   error->ring[i].requests[j].tail);
> +				   name, ering->num_requests);
> +			for (j = 0; j < ering->num_requests; j++) {
> +				err_printf(m, "  pid %ld, seqno 0x%08x, tag 0x%04x, emitted %dms ago (at %ld jiffies), head 0x%08x, tail 0x%08x, batch 0x%08x, complete? %d\n",
> +					   ering->requests[j].pid,
> +					   ering->requests[j].seqno,
> +					   ering->requests[j].tag,
> +					   jiffies_to_usecs(jiffies - ering->requests[j].jiffies) / 1000,
> +					   ering->requests[j].jiffies,
> +					   ering->requests[j].head,
> +					   ering->requests[j].tail,
> +					   ering->requests[j].batch,
> +					   ering->requests[j].complete);
>   			}
>   		}
>   
> -		if ((obj = error->ring[i].ringbuffer)) {
> +		if ((obj = ering->ringbuffer)) {
>   			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
> -				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   name, obj->gtt_offset);
>   			print_error_obj(m, obj);
>   		}
>   
> -		if ((obj = error->ring[i].hws_page)) {
> +		if ((obj = ering->hws_page)) {
>   			err_printf(m, "%s --- HW Status = 0x%08x\n",
> -				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   name, obj->gtt_offset);
>   			offset = 0;
>   			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
>   				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> @@ -462,8 +489,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   
>   		if ((obj = error->ring[i].ctx)) {
>   			err_printf(m, "%s --- HW Context = 0x%08x\n",
> -				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   name, obj->gtt_offset);
>   			print_error_obj(m, obj);
>   		}
>   	}
> @@ -561,16 +587,20 @@ static void i915_error_state_free(struct kref *error_ref)
>   
>   static struct drm_i915_error_object *
>   i915_error_object_create(struct drm_i915_private *dev_priv,
> -			 struct drm_i915_gem_object *src,
> -			 struct i915_address_space *vm)
> +			 struct i915_vma *vma)
>   {
> +	struct drm_i915_gem_object *src;
>   	struct drm_i915_error_object *dst;
>   	int num_pages;
>   	bool use_ggtt;
>   	int i = 0;
>   	u32 reloc_offset;
>   
> -	if (src == NULL || src->pages == NULL)
> +	if (vma == NULL)
> +		return NULL;
> +
> +	src = vma->obj;
> +	if (src->pages == NULL)
>   		return NULL;
>   
>   	num_pages = src->base.size >> PAGE_SHIFT;
> @@ -579,14 +609,11 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
>   	if (dst == NULL)
>   		return NULL;
>   
> -	if (i915_gem_obj_bound(src, vm))
> -		dst->gtt_offset = i915_gem_obj_offset(src, vm);
> -	else
> -		dst->gtt_offset = -1;
> +	dst->gtt_offset = vma->node.start;
>   
>   	reloc_offset = dst->gtt_offset;
>   	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
> -		    i915_is_ggtt(vm) &&
> +		    i915_is_ggtt(vma->vm) &&
>   		    src->has_global_gtt_mapping &&
>   		    reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
>   
> @@ -656,18 +683,31 @@ unwind:
>   	kfree(dst);
>   	return NULL;
>   }
> -#define i915_error_ggtt_object_create(dev_priv, src) \
> -	i915_error_object_create((dev_priv), (src), &(dev_priv)->gtt.base)
> +
> +static inline struct drm_i915_error_object *
> +i915_error_ggtt_object_create(struct drm_i915_private *i915,
> +			      struct drm_i915_gem_object *src)
> +{
> +	if (src == NULL)
> +		return NULL;
> +
> +	return i915_error_object_create(i915,
> +					i915_gem_obj_to_vma(src,
> +							    &i915->gtt.base));
> +}
>   
>   static void capture_bo(struct drm_i915_error_buffer *err,
>   		       struct i915_vma *vma)
>   {
>   	struct drm_i915_gem_object *obj = vma->obj;
> +	int n;
>   
>   	err->size = obj->base.size;
>   	err->name = obj->base.name;
> -	err->rseqno = obj->last_read_seqno;
> -	err->wseqno = obj->last_write_seqno;
> +	for (n = 0; n < ARRAY_SIZE(obj->last_read); n++)
> +		err->rseqno[n] = i915_request_seqno(obj->last_read[n].request);
> +	err->wseqno = i915_request_seqno(obj->last_write.request);
> +	err->fseqno = i915_request_seqno(obj->last_fence.request);
>   	err->gtt_offset = vma->node.start;
>   	err->read_domains = obj->base.read_domains;
>   	err->write_domain = obj->base.write_domain;
> @@ -681,7 +721,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>   	err->dirty = obj->dirty;
>   	err->purgeable = obj->madv != I915_MADV_WILLNEED;
>   	err->userptr = obj->userptr.mm != NULL;
> -	err->ring = obj->ring ? obj->ring->id : -1;
> +	err->ring = i915_request_engine_id(obj->last_write.request);
>   	err->cache_level = obj->cache_level;
>   }
>   
> @@ -745,7 +785,7 @@ static uint32_t i915_error_generate_code(struct drm_i915_private *dev_priv,
>   	 * synchronization commands which almost always appear in the case
>   	 * strictly a client bug. Use instdone to differentiate those some.
>   	 */
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
>   		if (error->ring[i].hangcheck_action == HANGCHECK_HUNG) {
>   			if (ring_id)
>   				*ring_id = i;
> @@ -793,83 +833,77 @@ static void i915_gem_record_fences(struct drm_device *dev,
>   
>   static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
>   					struct drm_i915_error_state *error,
> -					struct intel_engine_cs *ring,
> +					struct intel_engine_cs *engine,
>   					struct drm_i915_error_ring *ering)
>   {
>   	struct intel_engine_cs *to;
> +	u32 *mbox;
>   	int i;
>   
> -	if (!i915_semaphore_is_enabled(dev_priv->dev))
> +	if (dev_priv->semaphore_obj == NULL)
>   		return;
>   
> -	if (!error->semaphore_obj)
> +	if (error->semaphore_obj == NULL)
>   		error->semaphore_obj =
> -			i915_error_object_create(dev_priv,
> -						 dev_priv->semaphore_obj,
> -						 &dev_priv->gtt.base);
> -
> -	for_each_ring(to, dev_priv, i) {
> -		int idx;
> -		u16 signal_offset;
> -		u32 *tmp;
> +			i915_error_ggtt_object_create(dev_priv,
> +						      dev_priv->semaphore_obj);
> +	if (error->semaphore_obj == NULL)
> +		return;
>   
> -		if (ring == to)
> +	mbox = error->semaphore_obj->pages[0];
> +	for_each_engine(to, dev_priv, i) {
> +		if (engine == to)
>   			continue;
>   
> -		signal_offset = (GEN8_SIGNAL_OFFSET(ring, i) & (PAGE_SIZE - 1))
> -				/ 4;
> -		tmp = error->semaphore_obj->pages[0];
> -		idx = intel_ring_sync_index(ring, to);
> -
> -		ering->semaphore_mboxes[idx] = tmp[signal_offset];
> -		ering->semaphore_seqno[idx] = ring->semaphore.sync_seqno[idx];
> +		ering->semaphore_mboxes[i] =
> +			mbox[(GEN8_SEMAPHORE_OFFSET(dev_priv,
> +						    engine->id,
> +						    i) & (PAGE_SIZE - 1)) / 4];
>   	}
>   }
>   
>   static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
> -					struct intel_engine_cs *ring,
> +					struct intel_engine_cs *engine,
>   					struct drm_i915_error_ring *ering)
>   {
> -	ering->semaphore_mboxes[0] = I915_READ(RING_SYNC_0(ring->mmio_base));
> -	ering->semaphore_mboxes[1] = I915_READ(RING_SYNC_1(ring->mmio_base));
> -	ering->semaphore_seqno[0] = ring->semaphore.sync_seqno[0];
> -	ering->semaphore_seqno[1] = ring->semaphore.sync_seqno[1];
> -
> +	ering->semaphore_mboxes[0] = I915_READ(RING_SYNC_0(engine->mmio_base));
> +	ering->semaphore_mboxes[1] = I915_READ(RING_SYNC_1(engine->mmio_base));
>   	if (HAS_VEBOX(dev_priv->dev)) {
>   		ering->semaphore_mboxes[2] =
> -			I915_READ(RING_SYNC_2(ring->mmio_base));
> -		ering->semaphore_seqno[2] = ring->semaphore.sync_seqno[2];
> +			I915_READ(RING_SYNC_2(engine->mmio_base));
>   	}
>   }
>   
>   static void i915_record_ring_state(struct drm_device *dev,
>   				   struct drm_i915_error_state *error,
> -				   struct intel_engine_cs *ring,
> +				   struct intel_engine_cs *engine,
> +				   struct i915_gem_request *rq,
>   				   struct drm_i915_error_ring *ering)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ringbuffer *ring;
>   
>   	if (INTEL_INFO(dev)->gen >= 6) {
> -		ering->rc_psmi = I915_READ(ring->mmio_base + 0x50);
> -		ering->fault_reg = I915_READ(RING_FAULT_REG(ring));
> +		ering->rc_psmi = I915_READ(engine->mmio_base + 0x50);
> +		ering->fault_reg = I915_READ(RING_FAULT_REG(engine));
>   		if (INTEL_INFO(dev)->gen >= 8)
> -			gen8_record_semaphore_state(dev_priv, error, ring, ering);
> +			gen8_record_semaphore_state(dev_priv, error, engine, ering);
>   		else
> -			gen6_record_semaphore_state(dev_priv, ring, ering);
> +			gen6_record_semaphore_state(dev_priv, engine, ering);
>   	}
>   
>   	if (INTEL_INFO(dev)->gen >= 4) {
> -		ering->faddr = I915_READ(RING_DMA_FADD(ring->mmio_base));
> -		ering->ipeir = I915_READ(RING_IPEIR(ring->mmio_base));
> -		ering->ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
> -		ering->instdone = I915_READ(RING_INSTDONE(ring->mmio_base));
> -		ering->instps = I915_READ(RING_INSTPS(ring->mmio_base));
> -		ering->bbaddr = I915_READ(RING_BBADDR(ring->mmio_base));
> +		ering->faddr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> +		ering->ipeir = I915_READ(RING_IPEIR(engine->mmio_base));
> +		ering->ipehr = I915_READ(RING_IPEHR(engine->mmio_base));
> +		ering->instdone = I915_READ(RING_INSTDONE(engine->mmio_base));
> +		ering->instps = I915_READ(RING_INSTPS(engine->mmio_base));
> +		ering->bbaddr = I915_READ(RING_BBADDR(engine->mmio_base));
>   		if (INTEL_INFO(dev)->gen >= 8) {
> -			ering->faddr |= (u64) I915_READ(RING_DMA_FADD_UDW(ring->mmio_base)) << 32;
> -			ering->bbaddr |= (u64) I915_READ(RING_BBADDR_UDW(ring->mmio_base)) << 32;
> +			ering->faddr |= (u64) I915_READ(RING_DMA_FADD_UDW(engine->mmio_base)) << 32;
> +			ering->bbaddr |= (u64) I915_READ(RING_BBADDR_UDW(engine->mmio_base)) << 32;
>   		}
> -		ering->bbstate = I915_READ(RING_BBSTATE(ring->mmio_base));
> +		ering->bbstate = I915_READ(RING_BBSTATE(engine->mmio_base));
>   	} else {
>   		ering->faddr = I915_READ(DMA_FADD_I8XX);
>   		ering->ipeir = I915_READ(IPEIR);
> @@ -877,19 +911,29 @@ static void i915_record_ring_state(struct drm_device *dev,
>   		ering->instdone = I915_READ(INSTDONE);
>   	}
>   
> -	ering->waiting = waitqueue_active(&ring->irq_queue);
> -	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
> -	ering->seqno = ring->get_seqno(ring, false);
> -	ering->acthd = intel_ring_get_active_head(ring);
> -	ering->head = I915_READ_HEAD(ring);
> -	ering->tail = I915_READ_TAIL(ring);
> -	ering->ctl = I915_READ_CTL(ring);
> +	ering->waiting = waitqueue_active(&engine->irq_queue);
> +	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
> +	ering->acthd = intel_engine_get_active_head(engine);
> +	ering->seqno = engine->get_seqno(engine);
> +	ering->request = engine->last_request ? engine->last_request->seqno : 0;
> +	ering->hangcheck = engine->hangcheck.seqno;
> +	memcpy(ering->breadcrumb, engine->breadcrumb, sizeof(ering->breadcrumb));
> +	memcpy(ering->semaphore_sync, engine->semaphore.sync, sizeof(ering->semaphore_sync));
> +	ering->tag = engine->tag;
> +	ering->interrupts = atomic_read(&engine->interrupts);
> +	ering->irq_count = engine->irq_refcount;
> +	ering->start = I915_READ_START(engine);
> +	ering->head = I915_READ_HEAD(engine);
> +	ering->tail = I915_READ_TAIL(engine);
> +	ering->ctl = I915_READ_CTL(engine);
> +	if (!IS_GEN2(dev_priv))
> +		ering->mode = I915_READ_MODE(engine);
>   
>   	if (I915_NEED_GFX_HWS(dev)) {
>   		int mmio;
>   
>   		if (IS_GEN7(dev)) {
> -			switch (ring->id) {
> +			switch (engine->id) {
>   			default:
>   			case RCS:
>   				mmio = RENDER_HWS_PGA_GEN7;
> @@ -904,56 +948,67 @@ static void i915_record_ring_state(struct drm_device *dev,
>   				mmio = VEBOX_HWS_PGA_GEN7;
>   				break;
>   			}
> -		} else if (IS_GEN6(ring->dev)) {
> -			mmio = RING_HWS_PGA_GEN6(ring->mmio_base);
> +		} else if (IS_GEN6(engine->i915)) {
> +			mmio = RING_HWS_PGA_GEN6(engine->mmio_base);
>   		} else {
>   			/* XXX: gen8 returns to sanity */
> -			mmio = RING_HWS_PGA(ring->mmio_base);
> +			mmio = RING_HWS_PGA(engine->mmio_base);
>   		}
>   
>   		ering->hws = I915_READ(mmio);
>   	}
>   
> -	ering->hangcheck_score = ring->hangcheck.score;
> -	ering->hangcheck_action = ring->hangcheck.action;
> +	ring = rq ? rq->ctx->ring[engine->id].ring : engine->default_context->ring[engine->id].ring;
> +	if (ring) {
> +		ering->cpu_ring_head = ring->head;
> +		ering->cpu_ring_tail = ring->tail;
> +		ering->ringbuffer =
> +			i915_error_ggtt_object_create(dev_priv, ring->obj);
> +	}
> +
> +	ering->hws_page =
> +		i915_error_ggtt_object_create(dev_priv,
> +					      engine->status_page.obj);
> +
> +	ering->hangcheck_score = engine->hangcheck.score;
> +	ering->hangcheck_action = engine->hangcheck.action;
>   
>   	if (USES_PPGTT(dev)) {
>   		int i;
>   
> -		ering->vm_info.gfx_mode = I915_READ(RING_MODE_GEN7(ring));
> +		ering->vm_info.gfx_mode = I915_READ(RING_MODE_GEN7(engine));
>   
>   		switch (INTEL_INFO(dev)->gen) {
>   		case 8:
>   			for (i = 0; i < 4; i++) {
>   				ering->vm_info.pdp[i] =
> -					I915_READ(GEN8_RING_PDP_UDW(ring, i));
> +					I915_READ(GEN8_RING_PDP_UDW(engine, i));
>   				ering->vm_info.pdp[i] <<= 32;
>   				ering->vm_info.pdp[i] |=
> -					I915_READ(GEN8_RING_PDP_LDW(ring, i));
> +					I915_READ(GEN8_RING_PDP_LDW(engine, i));
>   			}
>   			break;
>   		case 7:
>   			ering->vm_info.pp_dir_base =
> -				I915_READ(RING_PP_DIR_BASE(ring));
> +				I915_READ(RING_PP_DIR_BASE(engine));
>   			break;
>   		case 6:
>   			ering->vm_info.pp_dir_base =
> -				I915_READ(RING_PP_DIR_BASE_READ(ring));
> +				I915_READ(RING_PP_DIR_BASE_READ(engine));
>   			break;
>   		}
>   	}
>   }
>   
> -
> -static void i915_gem_record_active_context(struct intel_engine_cs *ring,
> +static void i915_gem_record_active_context(struct intel_engine_cs *engine,
>   					   struct drm_i915_error_state *error,
>   					   struct drm_i915_error_ring *ering)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	struct drm_i915_gem_object *obj;
>   
>   	/* Currently render ring is the only HW context user */
> -	if (ring->id != RCS || !error->ccid)
> +	if (engine->id != RCS || !error->ccid)
>   		return;
>   
>   	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> @@ -971,49 +1026,40 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   				  struct drm_i915_error_state *error)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_gem_request *request;
> +	struct i915_gem_request *rq;
>   	int i, count;
>   
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -		struct intel_ringbuffer *rbuf;
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		struct intel_engine_cs *engine = &dev_priv->engine[i];
>   
>   		error->ring[i].pid = -1;
>   
> -		if (ring->dev == NULL)
> +		if (engine->i915 == NULL)
>   			continue;
>   
>   		error->ring[i].valid = true;
> +		error->ring[i].id = i;
>   
> -		i915_record_ring_state(dev, error, ring, &error->ring[i]);
> -
> -		request = i915_gem_find_active_request(ring);
> -		if (request) {
> -			struct i915_address_space *vm;
> -
> -			vm = request->ctx && request->ctx->ppgtt ?
> -				&request->ctx->ppgtt->base :
> -				&dev_priv->gtt.base;
> -
> +		spin_lock(&engine->lock);
> +		rq = intel_engine_find_active_batch(engine);
> +		if (rq) {
>   			/* We need to copy these to an anonymous buffer
>   			 * as the simplest method to avoid being overwritten
>   			 * by userspace.
>   			 */
>   			error->ring[i].batchbuffer =
> -				i915_error_object_create(dev_priv,
> -							 request->batch_obj,
> -							 vm);
> +				i915_error_object_create(dev_priv, rq->batch);
>   
>   			if (HAS_BROKEN_CS_TLB(dev_priv->dev))
>   				error->ring[i].wa_batchbuffer =
>   					i915_error_ggtt_object_create(dev_priv,
> -							     ring->scratch.obj);
> +							     engine->scratch.obj);
>   
> -			if (request->file_priv) {
> +			if (rq->file_priv) {
>   				struct task_struct *task;
>   
>   				rcu_read_lock();
> -				task = pid_task(request->file_priv->file->pid,
> +				task = pid_task(rq->file_priv->file->pid,
>   						PIDTYPE_PID);
>   				if (task) {
>   					strcpy(error->ring[i].comm, task->comm);
> @@ -1023,32 +1069,12 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   			}
>   		}
>   
> -		if (i915.enable_execlists) {
> -			/* TODO: This is only a small fix to keep basic error
> -			 * capture working, but we need to add more information
> -			 * for it to be useful (e.g. dump the context being
> -			 * executed).
> -			 */
> -			if (request)
> -				rbuf = request->ctx->engine[ring->id].ringbuf;
> -			else
> -				rbuf = ring->default_context->engine[ring->id].ringbuf;
> -		} else
> -			rbuf = ring->buffer;
> +		i915_record_ring_state(dev, error, engine, rq, &error->ring[i]);
>   
> -		error->ring[i].cpu_ring_head = rbuf->head;
> -		error->ring[i].cpu_ring_tail = rbuf->tail;
> -
> -		error->ring[i].ringbuffer =
> -			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
> -
> -		error->ring[i].hws_page =
> -			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
> -
> -		i915_gem_record_active_context(ring, error, &error->ring[i]);
> +		i915_gem_record_active_context(engine, error, &error->ring[i]);
>   
>   		count = 0;
> -		list_for_each_entry(request, &ring->request_list, list)
> +		list_for_each_entry(rq, &engine->requests, engine_list)
>   			count++;
>   
>   		error->ring[i].num_requests = count;
> @@ -1061,14 +1087,28 @@ static void i915_gem_record_rings(struct drm_device *dev,
>   		}
>   
>   		count = 0;
> -		list_for_each_entry(request, &ring->request_list, list) {
> +		list_for_each_entry(rq, &engine->requests, engine_list) {
>   			struct drm_i915_error_request *erq;
> +			struct task_struct *task;
>   
>   			erq = &error->ring[i].requests[count++];
> -			erq->seqno = request->seqno;
> -			erq->jiffies = request->emitted_jiffies;
> -			erq->tail = request->tail;
> +			erq->seqno = rq->seqno;
> +			erq->jiffies = rq->emitted_jiffies;
> +			erq->head = rq->head;
> +			erq->tail = rq->tail;
> +			erq->batch = 0;
> +			if (rq->batch)
> +				erq->batch = rq->batch->node.start;
> +			memcpy(erq->breadcrumb, rq->breadcrumb, sizeof(rq->breadcrumb));
> +			erq->complete = i915_request_complete(rq);
> +			erq->tag = rq->tag;
> +
> +			rcu_read_lock();
> +			task = rq->file_priv ? pid_task(rq->file_priv->file->pid, PIDTYPE_PID) : NULL;
> +			erq->pid = task ? task->pid : 0;
> +			rcu_read_unlock();
>   		}
> +		spin_unlock(&engine->lock);
>   	}
>   }
>   
> @@ -1175,6 +1215,7 @@ static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
>   	/* 1: Registers specific to a single generation */
>   	if (IS_VALLEYVIEW(dev)) {
>   		error->gtier[0] = I915_READ(GTIER);
> +		error->gtimr[0] = I915_READ(GTIMR);
>   		error->ier = I915_READ(VLV_IER);
>   		error->forcewake = I915_READ(FORCEWAKE_VLV);
>   	}
> @@ -1210,11 +1251,14 @@ static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
>   
>   	if (INTEL_INFO(dev)->gen >= 8) {
>   		error->ier = I915_READ(GEN8_DE_MISC_IER);
> -		for (i = 0; i < 4; i++)
> +		for (i = 0; i < 4; i++) {
>   			error->gtier[i] = I915_READ(GEN8_GT_IER(i));
> +			error->gtimr[i] = I915_READ(GEN8_GT_IMR(i));
> +		}
>   	} else if (HAS_PCH_SPLIT(dev)) {
>   		error->ier = I915_READ(DEIER);
>   		error->gtier[0] = I915_READ(GTIER);
> +		error->gtimr[0] = I915_READ(GTIMR);
>   	} else if (IS_GEN2(dev)) {
>   		error->ier = I915_READ16(IER);
>   	} else if (!IS_VALLEYVIEW(dev)) {
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a40a8c9f9758..71bdd9b3784f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1256,17 +1256,15 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
>   }
>   
>   static void notify_ring(struct drm_device *dev,
> -			struct intel_engine_cs *ring)
> +			struct intel_engine_cs *engine)
>   {
> -	if (!intel_ring_initialized(ring))
> +	if (!intel_engine_initialized(engine))
>   		return;
>   
> -	trace_i915_gem_request_complete(ring);
> +	trace_i915_gem_ring_complete(engine);
> +	atomic_inc(&engine->interrupts);
>   
> -	if (drm_core_check_feature(dev, DRIVER_MODESET))
> -		intel_notify_mmio_flip(ring);
> -
> -	wake_up_all(&ring->irq_queue);
> +	wake_up_all(&engine->irq_queue);
>   	i915_queue_hangcheck(dev);
>   }
>   
> @@ -1584,9 +1582,9 @@ static void ilk_gt_irq_handler(struct drm_device *dev,
>   {
>   	if (gt_iir &
>   	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
> -		notify_ring(dev, &dev_priv->ring[RCS]);
> +		notify_ring(dev, &dev_priv->engine[RCS]);
>   	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[VCS]);
> +		notify_ring(dev, &dev_priv->engine[VCS]);
>   }
>   
>   static void snb_gt_irq_handler(struct drm_device *dev,
> @@ -1596,11 +1594,11 @@ static void snb_gt_irq_handler(struct drm_device *dev,
>   
>   	if (gt_iir &
>   	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
> -		notify_ring(dev, &dev_priv->ring[RCS]);
> +		notify_ring(dev, &dev_priv->engine[RCS]);
>   	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[VCS]);
> +		notify_ring(dev, &dev_priv->engine[VCS]);
>   	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[BCS]);
> +		notify_ring(dev, &dev_priv->engine[BCS]);
>   
>   	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>   		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -1630,7 +1628,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   				       struct drm_i915_private *dev_priv,
>   				       u32 master_ctl)
>   {
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	u32 rcs, bcs, vcs;
>   	uint32_t tmp = 0;
>   	irqreturn_t ret = IRQ_NONE;
> @@ -1642,18 +1640,18 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   			ret = IRQ_HANDLED;
>   
>   			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[RCS];
> +			engine = &dev_priv->engine[RCS];
>   			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +				notify_ring(dev, engine);
>   			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_execlists_handle_ctx_events(ring);
> +				intel_execlists_irq_handler(engine);
>   
>   			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[BCS];
> +			engine = &dev_priv->engine[BCS];
>   			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +				notify_ring(dev, engine);
>   			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_execlists_handle_ctx_events(ring);
> +				intel_execlists_irq_handler(engine);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>   	}
> @@ -1665,18 +1663,18 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   			ret = IRQ_HANDLED;
>   
>   			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VCS];
> +			engine = &dev_priv->engine[VCS];
>   			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +				notify_ring(dev, engine);
>   			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_execlists_handle_ctx_events(ring);
> +				intel_execlists_irq_handler(engine);
>   
>   			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VCS2];
> +			engine = &dev_priv->engine[VCS2];
>   			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +				notify_ring(dev, engine);
>   			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_execlists_handle_ctx_events(ring);
> +				intel_execlists_irq_handler(engine);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>   	}
> @@ -1699,11 +1697,11 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   			ret = IRQ_HANDLED;
>   
>   			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VECS];
> +			engine = &dev_priv->engine[VECS];
>   			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +				notify_ring(dev, engine);
>   			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_execlists_handle_ctx_events(ring);
> +				intel_execlists_irq_handler(engine);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>   	}
> @@ -2021,7 +2019,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
>   
>   	if (HAS_VEBOX(dev_priv->dev)) {
>   		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -			notify_ring(dev_priv->dev, &dev_priv->ring[VECS]);
> +			notify_ring(dev_priv->dev, &dev_priv->engine[VECS]);
>   
>   		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT) {
>   			i915_handle_error(dev_priv->dev, false,
> @@ -2654,7 +2652,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>   static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>   			       bool reset_completed)
>   {
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int i;
>   
>   	/*
> @@ -2665,8 +2663,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>   	 */
>   
>   	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
> -	for_each_ring(ring, dev_priv, i)
> -		wake_up_all(&ring->irq_queue);
> +	for_each_engine(engine, dev_priv, i)
> +		wake_up_all(&engine->irq_queue);
>   
>   	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
>   	wake_up_all(&dev_priv->pending_flip_queue);
> @@ -2710,7 +2708,7 @@ static void i915_error_work_func(struct work_struct *work)
>   	 * the reset in-progress bit is only ever set by code outside of this
>   	 * work we don't need to worry about any other races.
>   	 */
> -	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
> +	if (i915_recovery_pending(error)) {
>   		DRM_DEBUG_DRIVER("resetting chip\n");
>   		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>   				   reset_event);
> @@ -2746,9 +2744,7 @@ static void i915_error_work_func(struct work_struct *work)
>   			 * updates before
>   			 * the counter increment.
>   			 */
> -			smp_mb__before_atomic();
> -			atomic_inc(&dev_priv->gpu_error.reset_counter);
> -
> +			smp_mb__after_atomic();
>   			kobject_uevent_env(&dev->primary->kdev->kobj,
>   					   KOBJ_CHANGE, reset_done_event);
>   		} else {
> @@ -3033,24 +3029,28 @@ static void gen8_disable_vblank(struct drm_device *dev, int pipe)
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
>   }
>   
> -static u32
> -ring_last_seqno(struct intel_engine_cs *ring)
> -{
> -	return list_entry(ring->request_list.prev,
> -			  struct drm_i915_gem_request, list)->seqno;
> -}
> -
>   static bool
> -ring_idle(struct intel_engine_cs *ring, u32 seqno)
> +engine_idle(struct intel_engine_cs *engine)
>   {
> -	return (list_empty(&ring->request_list) ||
> -		i915_seqno_passed(seqno, ring_last_seqno(ring)));
> +	bool ret = true;
> +
> +	spin_lock(&engine->lock);
> +	if (engine->last_request) {
> +		/* poke to make sure we retire before we wake up again */
> +		queue_delayed_work(engine->i915->wq,
> +				   &engine->i915->mm.retire_work,
> +				   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES/2));
> +		ret = __i915_request_complete__wa(engine->last_request);
> +	}
> +	spin_unlock(&engine->lock);
> +
> +	return ret;
>   }
>   
>   static bool
> -ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
> +ipehr_is_semaphore_wait(struct drm_i915_private *i915, u32 ipehr)
>   {
> -	if (INTEL_INFO(dev)->gen >= 8) {
> +	if (INTEL_INFO(i915)->gen >= 8) {
>   		return (ipehr >> 23) == 0x1c;
>   	} else {
>   		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
> @@ -3060,48 +3060,54 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
>   }
>   
>   static struct intel_engine_cs *
> -semaphore_wait_to_signaller_ring(struct intel_engine_cs *ring, u32 ipehr, u64 offset)
> +semaphore_wait_to_signaller_engine(struct intel_engine_cs *engine, u32 ipehr, u64 offset)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	struct intel_engine_cs *signaller;
>   	int i;
>   
>   	if (INTEL_INFO(dev_priv->dev)->gen >= 8) {
> -		for_each_ring(signaller, dev_priv, i) {
> -			if (ring == signaller)
> +		for_each_engine(signaller, dev_priv, i) {
> +			if (engine == signaller)
>   				continue;
>   
> -			if (offset == signaller->semaphore.signal_ggtt[ring->id])
> +			if (offset == GEN8_SEMAPHORE_OFFSET(dev_priv, signaller->id, engine->id))
>   				return signaller;
>   		}
>   	} else {
>   		u32 sync_bits = ipehr & MI_SEMAPHORE_SYNC_MASK;
>   
> -		for_each_ring(signaller, dev_priv, i) {
> -			if(ring == signaller)
> +		for_each_engine(signaller, dev_priv, i) {
> +			if(engine == signaller)
>   				continue;
>   
> -			if (sync_bits == signaller->semaphore.mbox.wait[ring->id])
> +			if (sync_bits == signaller->semaphore.mbox.wait[engine->id])
>   				return signaller;
>   		}
>   	}
>   
>   	DRM_ERROR("No signaller ring found for ring %i, ipehr 0x%08x, offset 0x%016llx\n",
> -		  ring->id, ipehr, offset);
> +		  engine->id, ipehr, offset);
>   
>   	return NULL;
>   }
>   
>   static struct intel_engine_cs *
> -semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno)
> +semaphore_waits_for(struct intel_engine_cs *engine, u32 *seqno)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct intel_ringbuffer *ring;
>   	u32 cmd, ipehr, head;
>   	u64 offset = 0;
>   	int i, backwards;
>   
> -	ipehr = I915_READ(RING_IPEHR(ring->mmio_base));
> -	if (!ipehr_is_semaphore_wait(ring->dev, ipehr))
> +	ipehr = I915_READ(RING_IPEHR(engine->mmio_base));
> +	if (!ipehr_is_semaphore_wait(engine->i915, ipehr))
> +		return NULL;
> +
> +	/* XXX execlists */
> +	ring = engine->default_context->ring[RCS].ring;
> +	if (ring == NULL)
>   		return NULL;
>   
>   	/*
> @@ -3112,19 +3118,19 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno)
>   	 * point at at batch, and semaphores are always emitted into the
>   	 * ringbuffer itself.
>   	 */
> -	head = I915_READ_HEAD(ring) & HEAD_ADDR;
> -	backwards = (INTEL_INFO(ring->dev)->gen >= 8) ? 5 : 4;
> +	head = I915_READ_HEAD(engine) & HEAD_ADDR;
> +	backwards = (INTEL_INFO(dev_priv)->gen >= 8) ? 5 : 4;
>   
>   	for (i = backwards; i; --i) {
>   		/*
>   		 * Be paranoid and presume the hw has gone off into the wild -
> -		 * our ring is smaller than what the hardware (and hence
> +		 * our ringbuffer is smaller than what the hardware (and hence
>   		 * HEAD_ADDR) allows. Also handles wrap-around.
>   		 */
> -		head &= ring->buffer->size - 1;
> +		head &= ring->size - 1;
>   
>   		/* This here seems to blow up */
> -		cmd = ioread32(ring->buffer->virtual_start + head);
> +		cmd = ioread32(ring->virtual_start + head);
>   		if (cmd == ipehr)
>   			break;
>   
> @@ -3134,32 +3140,37 @@ semaphore_waits_for(struct intel_engine_cs *ring, u32 *seqno)
>   	if (!i)
>   		return NULL;
>   
> -	*seqno = ioread32(ring->buffer->virtual_start + head + 4) + 1;
> -	if (INTEL_INFO(ring->dev)->gen >= 8) {
> -		offset = ioread32(ring->buffer->virtual_start + head + 12);
> +	*seqno = ioread32(ring->virtual_start + head + 4) + 1;
> +	if (INTEL_INFO(dev_priv)->gen >= 8) {
> +		offset = ioread32(ring->virtual_start + head + 12);
>   		offset <<= 32;
> -		offset = ioread32(ring->buffer->virtual_start + head + 8);
> +		offset |= ioread32(ring->virtual_start + head + 8);
>   	}
> -	return semaphore_wait_to_signaller_ring(ring, ipehr, offset);
> +	return semaphore_wait_to_signaller_engine(engine, ipehr, offset);
>   }
>   
> -static int semaphore_passed(struct intel_engine_cs *ring)
> +static int semaphore_passed(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	struct intel_engine_cs *signaller;
> +	struct i915_gem_request *rq;
>   	u32 seqno;
>   
> -	ring->hangcheck.deadlock++;
> +	engine->hangcheck.deadlock++;
>   
> -	signaller = semaphore_waits_for(ring, &seqno);
> +	if (engine->semaphore.wait == NULL)
> +		return -1;
> +
> +	signaller = semaphore_waits_for(engine, &seqno);
>   	if (signaller == NULL)
>   		return -1;
>   
>   	/* Prevent pathological recursion due to driver bugs */
> -	if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
> +	if (signaller->hangcheck.deadlock >= I915_NUM_ENGINES)
>   		return -1;
>   
> -	if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno))
> +	rq = intel_engine_seqno_to_request(engine, seqno);
> +	if (rq == NULL || i915_request_complete(rq))
>   		return 1;
>   
>   	/* cursory check for an unkickable deadlock */
> @@ -3172,30 +3183,29 @@ static int semaphore_passed(struct intel_engine_cs *ring)
>   
>   static void semaphore_clear_deadlocks(struct drm_i915_private *dev_priv)
>   {
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	int i;
>   
> -	for_each_ring(ring, dev_priv, i)
> -		ring->hangcheck.deadlock = 0;
> +	for_each_engine(engine, dev_priv, i)
> +		engine->hangcheck.deadlock = 0;
>   }
>   
> -static enum intel_ring_hangcheck_action
> -ring_stuck(struct intel_engine_cs *ring, u64 acthd)
> +static enum intel_engine_hangcheck_action
> +engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	u32 tmp;
>   
> -	if (acthd != ring->hangcheck.acthd) {
> -		if (acthd > ring->hangcheck.max_acthd) {
> -			ring->hangcheck.max_acthd = acthd;
> +	if (acthd != engine->hangcheck.acthd) {
> +		if (acthd > engine->hangcheck.max_acthd) {
> +			engine->hangcheck.max_acthd = acthd;
>   			return HANGCHECK_ACTIVE;
>   		}
>   
>   		return HANGCHECK_ACTIVE_LOOP;
>   	}
>   
> -	if (IS_GEN2(dev))
> +	if (IS_GEN2(dev_priv))
>   		return HANGCHECK_HUNG;
>   
>   	/* Is the chip hanging on a WAIT_FOR_EVENT?
> @@ -3203,24 +3213,24 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd)
>   	 * and break the hang. This should work on
>   	 * all but the second generation chipsets.
>   	 */
> -	tmp = I915_READ_CTL(ring);
> +	tmp = I915_READ_CTL(engine);
>   	if (tmp & RING_WAIT) {
> -		i915_handle_error(dev, false,
> +		i915_handle_error(dev_priv->dev, false,
>   				  "Kicking stuck wait on %s",
> -				  ring->name);
> -		I915_WRITE_CTL(ring, tmp);
> +				  engine->name);
> +		I915_WRITE_CTL(engine, tmp);
>   		return HANGCHECK_KICK;
>   	}
>   
> -	if (INTEL_INFO(dev)->gen >= 6 && tmp & RING_WAIT_SEMAPHORE) {
> -		switch (semaphore_passed(ring)) {
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && tmp & RING_WAIT_SEMAPHORE) {
> +		switch (semaphore_passed(engine)) {
>   		default:
>   			return HANGCHECK_HUNG;
>   		case 1:
> -			i915_handle_error(dev, false,
> +			i915_handle_error(dev_priv->dev, false,
>   					  "Kicking stuck semaphore on %s",
> -					  ring->name);
> -			I915_WRITE_CTL(ring, tmp);
> +					  engine->name);
> +			I915_WRITE_CTL(engine, tmp);
>   			return HANGCHECK_KICK;
>   		case 0:
>   			return HANGCHECK_WAIT;
> @@ -3232,7 +3242,7 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd)
>   
>   /**
>    * This is called when the chip hasn't reported back with completed
> - * batchbuffers in a long time. We keep track per ring seqno progress and
> + * batchbuffers in a long time. We keep track per engine seqno progress and
>    * if there are no progress, hangcheck score for that ring is increased.
>    * Further, acthd is inspected to see if the ring is stuck. On stuck case
>    * we kick the ring. If we see no progress on three subsequent calls
> @@ -3240,12 +3250,11 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd)
>    */
>   static void i915_hangcheck_elapsed(unsigned long data)
>   {
> -	struct drm_device *dev = (struct drm_device *)data;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct drm_i915_private *dev_priv = (struct drm_i915_private *)data;
> +	struct intel_engine_cs *engine;
>   	int i;
>   	int busy_count = 0, rings_hung = 0;
> -	bool stuck[I915_NUM_RINGS] = { 0 };
> +	bool stuck[I915_NUM_ENGINES] = { 0 };
>   #define BUSY 1
>   #define KICK 5
>   #define HUNG 20
> @@ -3253,104 +3262,108 @@ static void i915_hangcheck_elapsed(unsigned long data)
>   	if (!i915.enable_hangcheck)
>   		return;
>   
> -	for_each_ring(ring, dev_priv, i) {
> +	for_each_engine(engine, dev_priv, i) {
>   		u64 acthd;
>   		u32 seqno;
> +		u32 interrupts;
>   		bool busy = true;
>   
>   		semaphore_clear_deadlocks(dev_priv);
>   
> -		seqno = ring->get_seqno(ring, false);
> -		acthd = intel_ring_get_active_head(ring);
> -
> -		if (ring->hangcheck.seqno == seqno) {
> -			if (ring_idle(ring, seqno)) {
> -				ring->hangcheck.action = HANGCHECK_IDLE;
> -
> -				if (waitqueue_active(&ring->irq_queue)) {
> -					/* Issue a wake-up to catch stuck h/w. */
> -					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
> -						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
> -							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
> -								  ring->name);
> -						else
> -							DRM_INFO("Fake missed irq on %s\n",
> -								 ring->name);
> -						wake_up_all(&ring->irq_queue);
> -					}
> -					/* Safeguard against driver failure */
> -					ring->hangcheck.score += BUSY;
> -				} else
> -					busy = false;
> +		acthd = intel_engine_get_active_head(engine);
> +		seqno = engine->get_seqno(engine);
> +		interrupts = atomic_read(&engine->interrupts);
> +
> +		if (engine_idle(engine)) {
> +			if (waitqueue_active(&engine->irq_queue)) {
> +				/* Issue a wake-up to catch stuck h/w. */
> +				if (engine->hangcheck.action == HANGCHECK_IDLE_WAITERS &&
> +						engine->hangcheck.interrupts == interrupts &&
> +						!test_and_set_bit(engine->id, &dev_priv->gpu_error.missed_irq_rings)) {
> +					if (!(dev_priv->gpu_error.test_irq_rings & intel_engine_flag(engine)))
> +						DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
> +								engine->name);
> +					else
> +						DRM_INFO("Fake missed irq on %s\n",
> +								engine->name);
> +					wake_up_all(&engine->irq_queue);
> +				}
> +
> +				/* Safeguard against driver failure */
> +				engine->hangcheck.score += BUSY;
> +				engine->hangcheck.action = HANGCHECK_IDLE_WAITERS;
>   			} else {
> -				/* We always increment the hangcheck score
> -				 * if the ring is busy and still processing
> -				 * the same request, so that no single request
> -				 * can run indefinitely (such as a chain of
> -				 * batches). The only time we do not increment
> -				 * the hangcheck score on this ring, if this
> -				 * ring is in a legitimate wait for another
> -				 * ring. In that case the waiting ring is a
> -				 * victim and we want to be sure we catch the
> -				 * right culprit. Then every time we do kick
> -				 * the ring, add a small increment to the
> -				 * score so that we can catch a batch that is
> -				 * being repeatedly kicked and so responsible
> -				 * for stalling the machine.
> -				 */
> -				ring->hangcheck.action = ring_stuck(ring,
> -								    acthd);
> -
> -				switch (ring->hangcheck.action) {
> +				busy = false;
> +				engine->hangcheck.action = HANGCHECK_IDLE;
> +			}
> +		} else if (engine->hangcheck.seqno == seqno) {
> +			/* We always increment the hangcheck score
> +			 * if the ring is busy and still processing
> +			 * the same request, so that no single request
> +			 * can run indefinitely (such as a chain of
> +			 * batches). The only time we do not increment
> +			 * the hangcheck score on this ring, if this
> +			 * ring is in a legitimate wait for another
> +			 * ring. In that case the waiting ring is a
> +			 * victim and we want to be sure we catch the
> +			 * right culprit. Then every time we do kick
> +			 * the ring, add a small increment to the
> +			 * score so that we can catch a batch that is
> +			 * being repeatedly kicked and so responsible
> +			 * for stalling the machine.
> +			 */
> +			engine->hangcheck.action = engine_stuck(engine, acthd);
> +			switch (engine->hangcheck.action) {
>   				case HANGCHECK_IDLE:
> +				case HANGCHECK_IDLE_WAITERS:
>   				case HANGCHECK_WAIT:
>   				case HANGCHECK_ACTIVE:
>   					break;
>   				case HANGCHECK_ACTIVE_LOOP:
> -					ring->hangcheck.score += BUSY;
> +					engine->hangcheck.score += BUSY;
>   					break;
>   				case HANGCHECK_KICK:
> -					ring->hangcheck.score += KICK;
> +					engine->hangcheck.score += KICK;
>   					break;
>   				case HANGCHECK_HUNG:
> -					ring->hangcheck.score += HUNG;
> +					engine->hangcheck.score += HUNG;
>   					stuck[i] = true;
>   					break;
> -				}
>   			}
>   		} else {
> -			ring->hangcheck.action = HANGCHECK_ACTIVE;
> +			engine->hangcheck.action = HANGCHECK_ACTIVE;
>   
>   			/* Gradually reduce the count so that we catch DoS
>   			 * attempts across multiple batches.
>   			 */
> -			if (ring->hangcheck.score > 0)
> -				ring->hangcheck.score--;
> +			if (engine->hangcheck.score > 0)
> +				engine->hangcheck.score--;
>   
> -			ring->hangcheck.acthd = ring->hangcheck.max_acthd = 0;
> +			engine->hangcheck.acthd = engine->hangcheck.max_acthd = 0;
>   		}
>   
> -		ring->hangcheck.seqno = seqno;
> -		ring->hangcheck.acthd = acthd;
> +		engine->hangcheck.interrupts = interrupts;
> +		engine->hangcheck.seqno = seqno;
> +		engine->hangcheck.acthd = acthd;
>   		busy_count += busy;
>   	}
>   
> -	for_each_ring(ring, dev_priv, i) {
> -		if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
> +	for_each_engine(engine, dev_priv, i) {
> +		if (engine->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
>   			DRM_INFO("%s on %s\n",
>   				 stuck[i] ? "stuck" : "no progress",
> -				 ring->name);
> +				 engine->name);
>   			rings_hung++;
>   		}
>   	}
>   
>   	if (rings_hung)
> -		return i915_handle_error(dev, true, "Ring hung");
> +		return i915_handle_error(dev_priv->dev, true, "Ring hung");
>   
>   	if (busy_count)
>   		/* Reset timer case chip hangs without another request
>   		 * being added */
> -		i915_queue_hangcheck(dev);
> +		i915_queue_hangcheck(dev_priv->dev);
>   }
>   
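As a sanity check on my reading of the rescored loop above: the action picks an
increment (BUSY, KICK or HUNG), progress on the seqno decays the score by one,
and a hang is declared once the score crosses the threshold. A throwaway
userspace model of that accumulation (the threshold value below is a stand-in,
not necessarily HANGCHECK_SCORE_RING_HUNG):

#include <stdio.h>

#define BUSY 1
#define KICK 5
#define HUNG 20
#define SCORE_HUNG 31	/* stand-in for HANGCHECK_SCORE_RING_HUNG */

/* One hangcheck tick: 'delta' is the increment chosen by the action
 * (BUSY, KICK or HUNG; zero while legitimately waiting), 'progress'
 * means the seqno advanced since the previous tick. */
static int tick(int score, int delta, int progress)
{
	if (progress)
		return score > 0 ? score - 1 : 0;
	return score + delta;
}

int main(void)
{
	int score = 0, t;

	/* A batch that needs kicking every tick crosses the threshold even
	 * though each individual kick "works"; a merely busy engine accrues
	 * BUSY much more slowly, and any progress claws the score back. */
	for (t = 0; t < 10; t++) {
		score = tick(score, KICK, 0);
		printf("tick %2d: score %2d%s\n", t, score,
		       score >= SCORE_HUNG ? "  -> hung" : "");
	}
	return 0;
}
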
>   void i915_queue_hangcheck(struct drm_device *dev)
> @@ -4110,7 +4123,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>   		new_iir = I915_READ16(IIR); /* Flush posted writes */
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(dev, &dev_priv->engine[RCS]);
>   
>   		for_each_pipe(dev_priv, pipe) {
>   			int plane = pipe;
> @@ -4303,7 +4316,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>   		new_iir = I915_READ(IIR); /* Flush posted writes */
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(dev, &dev_priv->engine[RCS]);
>   
>   		for_each_pipe(dev_priv, pipe) {
>   			int plane = pipe;
> @@ -4533,9 +4546,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>   		new_iir = I915_READ(IIR); /* Flush posted writes */
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(dev, &dev_priv->engine[RCS]);
>   		if (iir & I915_BSD_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[VCS]);
> +			notify_ring(dev, &dev_priv->engine[VCS]);
>   
>   		for_each_pipe(dev_priv, pipe) {
>   			if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS &&
> @@ -4663,7 +4676,7 @@ void intel_irq_init(struct drm_device *dev)
>   
>   	setup_timer(&dev_priv->gpu_error.hangcheck_timer,
>   		    i915_hangcheck_elapsed,
> -		    (unsigned long) dev);
> +		    (unsigned long) dev_priv);
>   	INIT_DELAYED_WORK(&dev_priv->hotplug_reenable_work,
>   			  intel_hpd_irq_reenable);
>   
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 15c0eaa9f97f..59f0852d89d6 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -287,7 +287,7 @@
>   #define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
>   #define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
>   #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> -#define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
> +#define MI_FLUSH_DW		MI_INSTR(0x26, 0) /* for GEN6 */
>   #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
>   #define   MI_INVALIDATE_TLB		(1<<18)
>   #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
> @@ -2295,6 +2295,7 @@ enum punit_power_well {
>    *   doesn't need saving on GT1
>    */
>   #define CXT_SIZE		0x21a0
> +#define ILK_CXT_TOTAL_SIZE		(1 * PAGE_SIZE)
>   #define GEN6_CXT_POWER_SIZE(cxt_reg)	((cxt_reg >> 24) & 0x3f)
>   #define GEN6_CXT_RING_SIZE(cxt_reg)	((cxt_reg >> 18) & 0x3f)
>   #define GEN6_CXT_RENDER_SIZE(cxt_reg)	((cxt_reg >> 12) & 0x3f)
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index f5aa0067755a..8bb51dcb10f3 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -325,11 +325,10 @@ TRACE_EVENT(i915_gem_evict_vm,
>   	    TP_printk("dev=%d, vm=%p", __entry->dev, __entry->vm)
>   );
>   
> -TRACE_EVENT(i915_gem_ring_sync_to,
> -	    TP_PROTO(struct intel_engine_cs *from,
> -		     struct intel_engine_cs *to,
> -		     u32 seqno),
> -	    TP_ARGS(from, to, seqno),
> +TRACE_EVENT(i915_gem_ring_wait,
> +	    TP_PROTO(struct i915_gem_request *waiter,
> +		     struct i915_gem_request *signaller),
> +	    TP_ARGS(waiter, signaller),
>   
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
> @@ -339,18 +338,40 @@ TRACE_EVENT(i915_gem_ring_sync_to,
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = from->dev->primary->index;
> -			   __entry->sync_from = from->id;
> -			   __entry->sync_to = to->id;
> -			   __entry->seqno = seqno;
> +			   __entry->dev = waiter->i915->dev->primary->index;
> +			   __entry->sync_from = waiter->engine->id;
> +			   __entry->sync_to = signaller->engine->id;
> +			   __entry->seqno = signaller->breadcrumb[waiter->engine->id];
>   			   ),
>   
> -	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
> +	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%x",
>   		      __entry->dev,
>   		      __entry->sync_from, __entry->sync_to,
>   		      __entry->seqno)
>   );
>   
> +TRACE_EVENT(i915_gem_ring_switch_context,
> +	    TP_PROTO(struct intel_engine_cs *engine, struct intel_context *ctx, u32 flags),
> +	    TP_ARGS(engine, ctx, flags),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, ring)
> +			     __field(u32, ctx)
> +			     __field(u32, flags)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = engine->i915->dev->primary->index;
> +			   __entry->ring = engine->id;
> +			   __entry->ctx = ctx->file_priv ? ctx->user_handle : -1;
> +			   __entry->flags = flags;
> +			   ),
> +
> +	    TP_printk("dev=%u, ring=%u, ctx=%d, flags=0x%08x",
> +		      __entry->dev, __entry->ring, __entry->ctx, __entry->flags)
> +);
> +
>   TRACE_EVENT(i915_gem_ring_dispatch,
>   	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno, u32 flags),
>   	    TP_ARGS(ring, seqno, flags),
> @@ -363,66 +384,84 @@ TRACE_EVENT(i915_gem_ring_dispatch,
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> +			   __entry->dev = ring->i915->dev->primary->index;
>   			   __entry->ring = ring->id;
>   			   __entry->seqno = seqno;
>   			   __entry->flags = flags;
>   			   i915_trace_irq_get(ring, seqno);
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
> +	    TP_printk("dev=%u, ring=%u, seqno=%x, flags=%x",
>   		      __entry->dev, __entry->ring, __entry->seqno, __entry->flags)
>   );
>   
> -TRACE_EVENT(i915_gem_ring_flush,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 invalidate, u32 flush),
> -	    TP_ARGS(ring, invalidate, flush),
> +TRACE_EVENT(intel_ringbuffer_begin,
> +	    TP_PROTO(struct intel_ringbuffer *ring, int need),
> +	    TP_ARGS(ring, need),
>   
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
>   			     __field(u32, ring)
> -			     __field(u32, invalidate)
> -			     __field(u32, flush)
> +			     __field(u32, need)
> +			     __field(u32, space)
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> -			   __entry->ring = ring->id;
> -			   __entry->invalidate = invalidate;
> -			   __entry->flush = flush;
> +			   __entry->dev = ring->engine->i915->dev->primary->index;
> +			   __entry->ring = ring->engine->id;
> +			   __entry->need = need;
> +			   __entry->space = intel_ring_space(ring);
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%x, invalidate=%04x, flush=%04x",
> -		      __entry->dev, __entry->ring,
> -		      __entry->invalidate, __entry->flush)
> +	    TP_printk("dev=%u, ring=%u, need=%u, space=%u",
> +		      __entry->dev, __entry->ring, __entry->need, __entry->space)
>   );
>   
> -DECLARE_EVENT_CLASS(i915_gem_request,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno),
> +TRACE_EVENT(intel_ringbuffer_wait,
> +	    TP_PROTO(struct intel_ringbuffer *ring, int need),
> +	    TP_ARGS(ring, need),
>   
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
>   			     __field(u32, ring)
> -			     __field(u32, seqno)
> +			     __field(u32, need)
> +			     __field(u32, space)
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> -			   __entry->ring = ring->id;
> -			   __entry->seqno = seqno;
> +			   __entry->dev = ring->engine->i915->dev->primary->index;
> +			   __entry->ring = ring->engine->id;
> +			   __entry->need = need;
> +			   __entry->space = intel_ring_space(ring);
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%u, seqno=%u",
> -		      __entry->dev, __entry->ring, __entry->seqno)
> +	    TP_printk("dev=%u, ring=%u, need=%u, space=%u",
> +		      __entry->dev, __entry->ring, __entry->need, __entry->space)
>   );
>   
> -DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno)
> +TRACE_EVENT(intel_ringbuffer_wrap,
> +	    TP_PROTO(struct intel_ringbuffer *ring, int rem),
> +	    TP_ARGS(ring, rem),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, ring)
> +			     __field(u32, rem)
> +			     __field(u32, size)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = ring->engine->i915->dev->primary->index;
> +			   __entry->ring = ring->engine->id;
> +			   __entry->rem = rem;
> +			   __entry->size = ring->effective_size;
> +			   ),
> +
> +	    TP_printk("dev=%u, ring=%u, rem=%u, size=%u",
> +		      __entry->dev, __entry->ring, __entry->rem, __entry->size)
>   );
>   
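The new ringbuffer begin/wait/wrap tracepoints should finally make the space
accounting visible from userspace. For anyone following along, this is the
arithmetic I believe they are reporting, in standalone form (the struct and
field names are stand-ins, not intel_ringbuffer, and the 8-byte slack is just
my model's way of keeping "full" distinct from "empty"):

#include <stdio.h>
#include <stdint.h>

struct toy_ring {
	uint32_t size;	/* power-of-two effective size, in bytes */
	uint32_t head;	/* consumption point (GPU) */
	uint32_t tail;	/* production point (CPU) */
};

/* Free bytes between tail and head; the small slack keeps a completely
 * full ring from aliasing an empty one. */
static uint32_t toy_space(const struct toy_ring *r)
{
	return (r->head - r->tail - 8) & (r->size - 1);
}

/* Bytes left before the end of the buffer that have to be NOOP-padded
 * when 'need' does not fit without wrapping. */
static uint32_t toy_wrap_rem(const struct toy_ring *r, uint32_t need)
{
	uint32_t rem = r->size - r->tail;
	return need > rem ? rem : 0;
}

int main(void)
{
	struct toy_ring r = { .size = 4096, .head = 512, .tail = 3968 };

	printf("need=256: space=%u, wrap rem=%u\n",
	       toy_space(&r), toy_wrap_rem(&r, 256));
	return 0;
}
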
> -TRACE_EVENT(i915_gem_request_complete,
> +TRACE_EVENT(i915_gem_ring_complete,
>   	    TP_PROTO(struct intel_engine_cs *ring),
>   	    TP_ARGS(ring),
>   
> @@ -433,23 +472,68 @@ TRACE_EVENT(i915_gem_request_complete,
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> +			   __entry->dev = ring->i915->dev->primary->index;
>   			   __entry->ring = ring->id;
> -			   __entry->seqno = ring->get_seqno(ring, false);
> +			   __entry->seqno = ring->get_seqno(ring);
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%u, seqno=%u",
> +	    TP_printk("dev=%u, ring=%u, seqno=%x",
>   		      __entry->dev, __entry->ring, __entry->seqno)
>   );
>   
> +DECLARE_EVENT_CLASS(i915_gem_request,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, ring)
> +			     __field(u32, seqno)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = rq->i915->dev->primary->index;
> +			   __entry->ring = rq->engine->id;
> +			   __entry->seqno = rq->seqno;
> +			   ),
> +
> +	    TP_printk("dev=%u, ring=%u, seqno=%x",
> +		      __entry->dev, __entry->ring, __entry->seqno)
> +);
> +
> +DEFINE_EVENT(i915_gem_request, i915_gem_request_emit_flush,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_gem_request, i915_gem_request_emit_batch,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_gem_request, i915_gem_request_emit_breadcrumb,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_gem_request, i915_gem_request_commit,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_gem_request, i915_gem_request_complete,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
> +);
> +
>   DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno)
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq)
>   );
>   
>   TRACE_EVENT(i915_gem_request_wait_begin,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno),
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq),
>   
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
> @@ -465,47 +549,38 @@ TRACE_EVENT(i915_gem_request_wait_begin,
>   	     * less desirable.
>   	     */
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> -			   __entry->ring = ring->id;
> -			   __entry->seqno = seqno;
> -			   __entry->blocking = mutex_is_locked(&ring->dev->struct_mutex);
> +			   __entry->dev = rq->i915->dev->primary->index;
> +			   __entry->ring = rq->engine->id;
> +			   __entry->seqno = rq->seqno;
> +			   __entry->blocking = mutex_is_locked(&rq->i915->dev->struct_mutex);
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
> +	    TP_printk("dev=%u, ring=%u, seqno=%x, blocking?=%s",
>   		      __entry->dev, __entry->ring, __entry->seqno,
>   		      __entry->blocking ?  "yes (NB)" : "no")
>   );
>   
> -DEFINE_EVENT(i915_gem_request, i915_gem_request_wait_end,
> -	    TP_PROTO(struct intel_engine_cs *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno)
> -);
> -
> -DECLARE_EVENT_CLASS(i915_ring,
> -	    TP_PROTO(struct intel_engine_cs *ring),
> -	    TP_ARGS(ring),
> +TRACE_EVENT(i915_gem_request_wait_end,
> +	    TP_PROTO(struct i915_gem_request *rq),
> +	    TP_ARGS(rq),
>   
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
>   			     __field(u32, ring)
> +			     __field(u32, seqno)
> +			     __field(bool, completed)
>   			     ),
>   
>   	    TP_fast_assign(
> -			   __entry->dev = ring->dev->primary->index;
> -			   __entry->ring = ring->id;
> +			   __entry->dev = rq->i915->dev->primary->index;
> +			   __entry->ring = rq->engine->id;
> +			   __entry->seqno = rq->seqno;
> +			   __entry->completed = rq->completed;
>   			   ),
>   
> -	    TP_printk("dev=%u, ring=%u", __entry->dev, __entry->ring)
> -);
> -
> -DEFINE_EVENT(i915_ring, i915_ring_wait_begin,
> -	    TP_PROTO(struct intel_engine_cs *ring),
> -	    TP_ARGS(ring)
> -);
> -
> -DEFINE_EVENT(i915_ring, i915_ring_wait_end,
> -	    TP_PROTO(struct intel_engine_cs *ring),
> -	    TP_ARGS(ring)
> +	    TP_printk("dev=%u, ring=%u, seqno=%x, completed=%s",
> +		      __entry->dev, __entry->ring, __entry->seqno,
> +		      __entry->completed ?  "yes" : "no")
>   );
>   
>   TRACE_EVENT(i915_flip_request,
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 479e50a2ef98..049eb0fc09f3 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2189,7 +2189,7 @@ static int intel_align_height(struct drm_device *dev, int height, bool tiled)
>   int
>   intel_pin_and_fence_fb_obj(struct drm_device *dev,
>   			   struct drm_i915_gem_object *obj,
> -			   struct intel_engine_cs *pipelined)
> +			   struct i915_gem_request *pipelined)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	u32 alignment;
> @@ -9053,7 +9053,7 @@ out:
>    */
>   static void intel_mark_fb_busy(struct drm_device *dev,
>   			       unsigned frontbuffer_bits,
> -			       struct intel_engine_cs *ring)
> +			       struct i915_gem_request *rq)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	enum pipe pipe;
> @@ -9066,24 +9066,24 @@ static void intel_mark_fb_busy(struct drm_device *dev,
>   			continue;
>   
>   		intel_increase_pllclock(dev, pipe);
> -		if (ring && intel_fbc_enabled(dev))
> -			ring->fbc_dirty = true;
> +		if (rq && intel_fbc_enabled(dev))
> +			rq->pending_flush |= I915_KICK_FBC;
>   	}
>   }
>   
>   /**
>    * intel_fb_obj_invalidate - invalidate frontbuffer object
>    * @obj: GEM object to invalidate
> - * @ring: set for asynchronous rendering
> + * @rq: set for asynchronous rendering
>    *
>    * This function gets called every time rendering on the given object starts and
>    * frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must
> - * be invalidated. If @ring is non-NULL any subsequent invalidation will be delayed
> + * be invalidated. If @rq is non-NULL any subsequent invalidation will be delayed
>    * until the rendering completes or a flip on this frontbuffer plane is
>    * scheduled.
>    */
>   void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
> -			     struct intel_engine_cs *ring)
> +			     struct i915_gem_request *rq)
>   {
>   	struct drm_device *dev = obj->base.dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -9093,7 +9093,7 @@ void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
>   	if (!obj->frontbuffer_bits)
>   		return;
>   
> -	if (ring) {
> +	if (rq) {
>   		mutex_lock(&dev_priv->fb_tracking.lock);
>   		dev_priv->fb_tracking.busy_bits
>   			|= obj->frontbuffer_bits;
> @@ -9102,7 +9102,7 @@ void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
>   		mutex_unlock(&dev_priv->fb_tracking.lock);
>   	}
>   
> -	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
> +	intel_mark_fb_busy(dev, obj->frontbuffer_bits, rq);
>   
>   	intel_edp_psr_invalidate(dev, obj->frontbuffer_bits);
>   }
> @@ -9256,6 +9256,7 @@ static void intel_unpin_work_fn(struct work_struct *__work)
>   	intel_unpin_fb_obj(work->old_fb_obj);
>   	drm_gem_object_unreference(&work->pending_flip_obj->base);
>   	drm_gem_object_unreference(&work->old_fb_obj->base);
> +	i915_request_put(work->flip_queued_request);
>   
>   	intel_update_fbc(dev);
>   	mutex_unlock(&dev->struct_mutex);
> @@ -9379,97 +9380,86 @@ static inline void intel_mark_page_flip_active(struct intel_crtc *intel_crtc)
>   	smp_wmb();
>   }
>   
> -static int intel_gen2_queue_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> +static int intel_gen2_queue_flip(struct i915_gem_request *rq,
> +				 struct intel_crtc *crtc,
>   				 struct drm_framebuffer *fb,
>   				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
>   				 uint32_t flags)
>   {
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_ringbuffer *ring;
>   	u32 flip_mask;
> -	int ret;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 5);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	/* Can't queue multiple flips, so wait for the previous
>   	 * one to finish before executing the next.
>   	 */
> -	if (intel_crtc->plane)
> +	if (crtc->plane)
>   		flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
>   	else
>   		flip_mask = MI_WAIT_FOR_PLANE_A_FLIP;
>   	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | flip_mask);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_emit(ring, MI_DISPLAY_FLIP |
> -			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
> +			MI_DISPLAY_FLIP_PLANE(crtc->plane));
>   	intel_ring_emit(ring, fb->pitches[0]);
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> +	intel_ring_emit(ring, crtc->unpin_work->gtt_offset);
>   	intel_ring_emit(ring, 0); /* aux display base address, unused */
> +	intel_ring_advance(ring);
>   
> -	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
>   	return 0;
>   }
>   
> -static int intel_gen3_queue_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> +static int intel_gen3_queue_flip(struct i915_gem_request *rq,
> +				 struct intel_crtc *crtc,
>   				 struct drm_framebuffer *fb,
>   				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
>   				 uint32_t flags)
>   {
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_ringbuffer *ring;
>   	u32 flip_mask;
> -	int ret;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	if (intel_crtc->plane)
> +	if (crtc->plane)
>   		flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
>   	else
>   		flip_mask = MI_WAIT_FOR_PLANE_A_FLIP;
>   	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | flip_mask);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 |
> -			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
> +			MI_DISPLAY_FLIP_PLANE(crtc->plane));
>   	intel_ring_emit(ring, fb->pitches[0]);
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> -	intel_ring_emit(ring, MI_NOOP);
> +	intel_ring_emit(ring, crtc->unpin_work->gtt_offset);
> +	intel_ring_advance(ring);
>   
> -	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
>   	return 0;
>   }
>   
> -static int intel_gen4_queue_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> +static int intel_gen4_queue_flip(struct i915_gem_request *rq,
> +				 struct intel_crtc *crtc,
>   				 struct drm_framebuffer *fb,
>   				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
>   				 uint32_t flags)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct drm_i915_private *dev_priv = rq->i915;
> +	struct intel_ringbuffer *ring;
>   	uint32_t pf, pipesrc;
> -	int ret;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	/* i965+ uses the linear or tiled offsets from the
>   	 * Display Registers (which do not change across a page-flip)
>   	 * so we need only reprogram the base address.
>   	 */
>   	intel_ring_emit(ring, MI_DISPLAY_FLIP |
> -			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
> +			MI_DISPLAY_FLIP_PLANE(crtc->plane));
>   	intel_ring_emit(ring, fb->pitches[0]);
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset |
> +	intel_ring_emit(ring, crtc->unpin_work->gtt_offset |
>   			obj->tiling_mode);
>   
>   	/* XXX Enabling the panel-fitter across page-flip is so far
> @@ -9477,62 +9467,57 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
>   	 * pf = I915_READ(pipe == 0 ? PFA_CTL_1 : PFB_CTL_1) & PF_ENABLE;
>   	 */
>   	pf = 0;
> -	pipesrc = I915_READ(PIPESRC(intel_crtc->pipe)) & 0x0fff0fff;
> +	pipesrc = I915_READ(PIPESRC(crtc->pipe)) & 0x0fff0fff;
>   	intel_ring_emit(ring, pf | pipesrc);
> +	intel_ring_advance(ring);
>   
> -	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
>   	return 0;
>   }
>   
> -static int intel_gen6_queue_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> +static int intel_gen6_queue_flip(struct i915_gem_request *rq,
> +				 struct intel_crtc *crtc,
>   				 struct drm_framebuffer *fb,
>   				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
>   				 uint32_t flags)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct drm_i915_private *dev_priv = rq->i915;
> +	struct intel_ringbuffer *ring;
>   	uint32_t pf, pipesrc;
> -	int ret;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_DISPLAY_FLIP |
> -			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
> +			MI_DISPLAY_FLIP_PLANE(crtc->plane));
>   	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> +	intel_ring_emit(ring, crtc->unpin_work->gtt_offset);
>   
>   	/* Contrary to the suggestions in the documentation,
>   	 * "Enable Panel Fitter" does not seem to be required when page
>   	 * flipping with a non-native mode, and worse causes a normal
>   	 * modeset to fail.
> -	 * pf = I915_READ(PF_CTL(intel_crtc->pipe)) & PF_ENABLE;
> +	 * pf = I915_READ(PF_CTL(crtc->pipe)) & PF_ENABLE;
>   	 */
>   	pf = 0;
> -	pipesrc = I915_READ(PIPESRC(intel_crtc->pipe)) & 0x0fff0fff;
> +	pipesrc = I915_READ(PIPESRC(crtc->pipe)) & 0x0fff0fff;
>   	intel_ring_emit(ring, pf | pipesrc);
> +	intel_ring_advance(ring);
>   
> -	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
>   	return 0;
>   }
>   
> -static int intel_gen7_queue_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> +static int intel_gen7_queue_flip(struct i915_gem_request *rq,
> +				 struct intel_crtc *crtc,
>   				 struct drm_framebuffer *fb,
>   				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
>   				 uint32_t flags)
>   {
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_ringbuffer *ring;
>   	uint32_t plane_bit = 0;
>   	int len, ret;
>   
> -	switch (intel_crtc->plane) {
> +	switch (crtc->plane) {
>   	case PLANE_A:
>   		plane_bit = MI_DISPLAY_FLIP_IVB_PLANE_A;
>   		break;
> @@ -9547,16 +9532,16 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>   		return -ENODEV;
>   	}
>   
> -	len = 4;
> -	if (ring->id == RCS) {
> +	len = 3;
> +	if (rq->engine->id == RCS) {
>   		len += 6;
>   		/*
>   		 * On Gen 8, SRM is now taking an extra dword to accommodate
>   		 * 48bits addresses, and we need a NOOP for the batch size to
>   		 * stay even.
>   		 */
> -		if (IS_GEN8(dev))
> -			len += 2;
> +		if (IS_GEN8(rq->i915))
> +			len += 1;
>   	}
>   
>   	/*
> @@ -9569,13 +9554,13 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>   	 * then do the cacheline alignment, and finally emit the
>   	 * MI_DISPLAY_FLIP.
>   	 */
> -	ret = intel_ring_cacheline_align(ring);
> +	ret = intel_ring_cacheline_align(rq);
>   	if (ret)
>   		return ret;
>   
> -	ret = intel_ring_begin(ring, len);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, len);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	/* Unmask the flip-done completion message. Note that the bspec says that
>   	 * we should do this for both the BCS and RCS, and that we must not unmask
> @@ -9586,37 +9571,33 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>   	 * for the RCS also doesn't appear to drop events. Setting the DERRMR
>   	 * to zero does lead to lockups within MI_DISPLAY_FLIP.
>   	 */
> -	if (ring->id == RCS) {
> +	if (rq->engine->id == RCS) {
>   		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
>   		intel_ring_emit(ring, DERRMR);
>   		intel_ring_emit(ring, ~(DERRMR_PIPEA_PRI_FLIP_DONE |
>   					DERRMR_PIPEB_PRI_FLIP_DONE |
>   					DERRMR_PIPEC_PRI_FLIP_DONE));
> -		if (IS_GEN8(dev))
> +		if (IS_GEN8(rq->i915))
>   			intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
>   					      MI_SRM_LRM_GLOBAL_GTT);
>   		else
>   			intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) |
>   					      MI_SRM_LRM_GLOBAL_GTT);
>   		intel_ring_emit(ring, DERRMR);
> -		intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
> -		if (IS_GEN8(dev)) {
> +		intel_ring_emit(ring, rq->engine->scratch.gtt_offset + 256);
> +		if (IS_GEN8(rq->i915))
>   			intel_ring_emit(ring, 0);
> -			intel_ring_emit(ring, MI_NOOP);
> -		}
>   	}
>   
>   	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
> -	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
> -	intel_ring_emit(ring, intel_crtc->unpin_work->gtt_offset);
> -	intel_ring_emit(ring, (MI_NOOP));
> +	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
> +	intel_ring_emit(ring, crtc->unpin_work->gtt_offset);
> +	intel_ring_advance(ring);
>   
> -	intel_mark_page_flip_active(intel_crtc);
> -	__intel_ring_advance(ring);
>   	return 0;
>   }
>   
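On the ordering note above (emit the LRI/SRM first, cacheline-align, then
MI_DISPLAY_FLIP): the padding intel_ring_cacheline_align() has to insert is
just the arithmetic below. Only the helper name is taken from the patch; the
64-byte cacheline is my assumption.

#include <stdio.h>
#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumed cacheline size */

/* Number of MI_NOOP dwords needed so that the next command starts on a
 * cacheline boundary, given the current tail offset in bytes. */
static unsigned int cacheline_pad_dwords(uint32_t tail_bytes)
{
	uint32_t rem = tail_bytes & (CACHELINE_BYTES - 1);
	return rem ? (CACHELINE_BYTES - rem) / sizeof(uint32_t) : 0;
}

int main(void)
{
	uint32_t offsets[] = { 0, 4, 36, 60, 64 };
	for (unsigned int i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++)
		printf("tail=%u -> pad %u dwords\n",
		       offsets[i], cacheline_pad_dwords(offsets[i]));
	return 0;
}
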
> -static bool use_mmio_flip(struct intel_engine_cs *ring,
> +static bool use_mmio_flip(struct intel_engine_cs *engine,
>   			  struct drm_i915_gem_object *obj)
>   {
>   	/*
> @@ -9627,20 +9608,18 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
>   	 * So using MMIO flips there would disrupt this mechanism.
>   	 */
>   
> -	if (ring == NULL)
> +	if (engine == NULL)
>   		return true;
>   
> -	if (INTEL_INFO(ring->dev)->gen < 5)
> +	if (INTEL_INFO(engine->i915)->gen < 5)
>   		return false;
>   
>   	if (i915.use_mmio_flip < 0)
>   		return false;
>   	else if (i915.use_mmio_flip > 0)
>   		return true;
> -	else if (i915.enable_execlists)
> -		return true;
>   	else
> -		return ring != obj->ring;
> +		return engine != i915_request_engine(obj->last_write.request);
>   }
>   
>   static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
> @@ -9671,102 +9650,62 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
>   	POSTING_READ(DSPSURF(intel_crtc->plane));
>   }
>   
> -static int intel_postpone_flip(struct drm_i915_gem_object *obj)
> -{
> -	struct intel_engine_cs *ring;
> -	int ret;
> -
> -	lockdep_assert_held(&obj->base.dev->struct_mutex);
> -
> -	if (!obj->last_write_seqno)
> -		return 0;
> -
> -	ring = obj->ring;
> -
> -	if (i915_seqno_passed(ring->get_seqno(ring, true),
> -			      obj->last_write_seqno))
> -		return 0;
> +struct flip_work {
> +	struct work_struct work;
> +	struct i915_gem_request *rq;
> +	struct intel_crtc *crtc;
> +};
>   
> -	ret = i915_gem_check_olr(ring, obj->last_write_seqno);
> -	if (ret)
> -		return ret;
> +static void intel_mmio_flip_work(struct work_struct *work)
> +{
> +	struct flip_work *flip = container_of(work, struct flip_work, work);
>   
> -	if (WARN_ON(!ring->irq_get(ring)))
> -		return 0;
> +	if (__i915_request_wait(flip->rq, false, NULL, NULL) == 0)
> +		intel_do_mmio_flip(flip->crtc);
>   
> -	return 1;
> +	i915_request_put__unlocked(flip->rq);
> +	kfree(flip);
>   }
>   
> -void intel_notify_mmio_flip(struct intel_engine_cs *ring)
> +static int intel_queue_mmio_flip(struct intel_crtc *crtc,
> +				 struct i915_gem_request *rq)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> -	struct intel_crtc *intel_crtc;
> -	unsigned long irq_flags;
> -	u32 seqno;
> -
> -	seqno = ring->get_seqno(ring, false);
> +	struct flip_work *flip;
>   
> -	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
> -	for_each_intel_crtc(ring->dev, intel_crtc) {
> -		struct intel_mmio_flip *mmio_flip;
> -
> -		mmio_flip = &intel_crtc->mmio_flip;
> -		if (mmio_flip->seqno == 0)
> -			continue;
> -
> -		if (ring->id != mmio_flip->ring_id)
> -			continue;
> +	if (WARN_ON(crtc->mmio_flip))
> +		return -EBUSY;
>   
> -		if (i915_seqno_passed(seqno, mmio_flip->seqno)) {
> -			intel_do_mmio_flip(intel_crtc);
> -			mmio_flip->seqno = 0;
> -			ring->irq_put(ring);
> -		}
> +	if (rq == NULL) {
> +		intel_do_mmio_flip(crtc);
> +		return 0;
>   	}
> -	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
> -}
>   
> -static int intel_queue_mmio_flip(struct drm_device *dev,
> -				 struct drm_crtc *crtc,
> -				 struct drm_framebuffer *fb,
> -				 struct drm_i915_gem_object *obj,
> -				 struct intel_engine_cs *ring,
> -				 uint32_t flags)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> -	unsigned long irq_flags;
> -	int ret;
> +	if (i915_request_complete(rq)) {
> +		intel_do_mmio_flip(crtc);
> +		return 0;
> +	}
>   
> -	if (WARN_ON(intel_crtc->mmio_flip.seqno))
> -		return -EBUSY;
> +	flip = kmalloc(sizeof(*flip), GFP_KERNEL);
> +	if (flip == NULL)
> +		return -ENOMEM;
>   
> -	ret = intel_postpone_flip(obj);
> -	if (ret < 0)
> +	INIT_WORK(&flip->work, intel_mmio_flip_work);
> +	flip->crtc = crtc;
> +	flip->rq = i915_request_get_breadcrumb(rq);
> +	if (IS_ERR(flip->rq)) {
> +		int ret = PTR_ERR(flip->rq);
> +		kfree(flip);
>   		return ret;
> -	if (ret == 0) {
> -		intel_do_mmio_flip(intel_crtc);
> -		return 0;
>   	}
>   
> -	spin_lock_irqsave(&dev_priv->mmio_flip_lock, irq_flags);
> -	intel_crtc->mmio_flip.seqno = obj->last_write_seqno;
> -	intel_crtc->mmio_flip.ring_id = obj->ring->id;
> -	spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
> -
> -	/*
> -	 * Double check to catch cases where irq fired before
> -	 * mmio flip data was ready
> -	 */
> -	intel_notify_mmio_flip(obj->ring);
> +	schedule_work(&flip->work);
>   	return 0;
>   }
>   
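Much nicer than the old (seqno, ring_id) bookkeeping: take a reference on the
last write request, let a worker do the wait, flip, drop the reference. A
minimal userspace model of that shape, with a condition variable standing in
for request completion (every name below is a stand-in; only the parallels to
i915_request_get_breadcrumb() and __i915_request_wait() come from the patch):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a request: refcounted, signalled once "complete". */
struct toy_request {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int completed;
	int refcount;
};

struct toy_flip {
	struct toy_request *rq;
	int crtc;
};

static void toy_request_put(struct toy_request *rq)
{
	pthread_mutex_lock(&rq->lock);
	int last = --rq->refcount == 0;
	pthread_mutex_unlock(&rq->lock);
	if (last)
		free(rq);	/* lock/cond destruction elided for brevity */
}

static void *flip_worker(void *arg)
{
	struct toy_flip *flip = arg;

	/* __i915_request_wait() equivalent: block until the request signals. */
	pthread_mutex_lock(&flip->rq->lock);
	while (!flip->rq->completed)
		pthread_cond_wait(&flip->rq->cond, &flip->rq->lock);
	pthread_mutex_unlock(&flip->rq->lock);

	printf("doing mmio flip on crtc %d\n", flip->crtc);

	toy_request_put(flip->rq);
	free(flip);
	return NULL;
}

int main(void)
{
	struct toy_request *rq = calloc(1, sizeof(*rq));
	pthread_mutex_init(&rq->lock, NULL);
	pthread_cond_init(&rq->cond, NULL);
	rq->refcount = 1;

	struct toy_flip *flip = calloc(1, sizeof(*flip));
	flip->crtc = 0;
	pthread_mutex_lock(&rq->lock);
	rq->refcount++;		/* i915_request_get_breadcrumb() equivalent */
	pthread_mutex_unlock(&rq->lock);
	flip->rq = rq;

	pthread_t worker;
	pthread_create(&worker, NULL, flip_worker, flip);

	/* Later, the GPU retires the request... */
	pthread_mutex_lock(&rq->lock);
	rq->completed = 1;
	pthread_cond_broadcast(&rq->cond);
	pthread_mutex_unlock(&rq->lock);

	pthread_join(worker, NULL);
	toy_request_put(rq);
	return 0;
}
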
> -static int intel_default_queue_flip(struct drm_device *dev,
> -				    struct drm_crtc *crtc,
> +static int intel_default_queue_flip(struct i915_gem_request *rq,
> +				    struct intel_crtc *crtc,
>   				    struct drm_framebuffer *fb,
>   				    struct drm_i915_gem_object *obj,
> -				    struct intel_engine_cs *ring,
>   				    uint32_t flags)
>   {
>   	return -ENODEV;
> @@ -9787,9 +9726,8 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
>   		return false;
>   
>   	if (work->flip_ready_vblank == 0) {
> -		if (work->flip_queued_ring &&
> -		    !i915_seqno_passed(work->flip_queued_ring->get_seqno(work->flip_queued_ring, true),
> -				       work->flip_queued_seqno))
> +		if (work->flip_queued_request &&
> +		    !i915_request_complete(work->flip_queued_request))
>   			return false;
>   
>   		work->flip_ready_vblank = drm_vblank_count(dev, intel_crtc->pipe);
> @@ -9843,7 +9781,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>   	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   	enum pipe pipe = intel_crtc->pipe;
>   	struct intel_unpin_work *work;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_request *rq;
>   	unsigned long flags;
>   	int ret;
>   
> @@ -9930,45 +9869,63 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>   		work->flip_count = I915_READ(PIPE_FLIPCOUNT_GM45(pipe)) + 1;
>   
>   	if (IS_VALLEYVIEW(dev)) {
> -		ring = &dev_priv->ring[BCS];
> +		engine = &dev_priv->engine[BCS];
>   		if (obj->tiling_mode != work->old_fb_obj->tiling_mode)
>   			/* vlv: DISPLAY_FLIP fails to change tiling */
> -			ring = NULL;
> +			engine = NULL;
>   	} else if (IS_IVYBRIDGE(dev)) {
> -		ring = &dev_priv->ring[BCS];
> +		engine = &dev_priv->engine[BCS];
>   	} else if (INTEL_INFO(dev)->gen >= 7) {
> -		ring = obj->ring;
> -		if (ring == NULL || ring->id != RCS)
> -			ring = &dev_priv->ring[BCS];
> +		engine = i915_request_engine(obj->last_write.request);
> +		if (engine == NULL || engine->id != RCS)
> +			engine = &dev_priv->engine[BCS];
>   	} else {
> -		ring = &dev_priv->ring[RCS];
> +		engine = &dev_priv->engine[RCS];
>   	}
>   
> -	ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
> -	if (ret)
> -		goto cleanup_pending;
> +	if (use_mmio_flip(engine, obj)) {
> +		rq = i915_request_get(obj->last_write.request);
> +
> +		ret = intel_pin_and_fence_fb_obj(dev, obj, rq);
> +		if (ret)
> +			goto cleanup_rq;
>   
> -	work->gtt_offset =
> -		i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset;
> +		work->gtt_offset =
> +			i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset;
>   
> -	if (use_mmio_flip(ring, obj)) {
> -		ret = intel_queue_mmio_flip(dev, crtc, fb, obj, ring,
> -					    page_flip_flags);
> +		ret = intel_queue_mmio_flip(intel_crtc, rq);
>   		if (ret)
>   			goto cleanup_unpin;
> -
> -		work->flip_queued_seqno = obj->last_write_seqno;
> -		work->flip_queued_ring = obj->ring;
>   	} else {
> -		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
> +		struct intel_context *ctx = engine->default_context;
> +		if (obj->last_write.request)
> +			ctx = obj->last_write.request->ctx;
> +		rq = intel_engine_alloc_request(engine, ctx);
> +		if (IS_ERR(rq)) {
> +			ret = PTR_ERR(rq);
> +			goto cleanup_pending;
> +		}
> +
> +		ret = intel_pin_and_fence_fb_obj(dev, obj, rq);
> +		if (ret)
> +			goto cleanup_rq;
> +
> +		work->gtt_offset =
> +			i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset;
> +
> +		ret = dev_priv->display.queue_flip(rq, intel_crtc, fb, obj,
>   						   page_flip_flags);
>   		if (ret)
>   			goto cleanup_unpin;
>   
> -		work->flip_queued_seqno = intel_ring_get_seqno(ring);
> -		work->flip_queued_ring = ring;
> +		intel_mark_page_flip_active(intel_crtc);
> +
> +		ret = i915_request_commit(rq);
> +		if (ret)
> +			goto cleanup_unpin;
>   	}
>   
> +	work->flip_queued_request = rq;
>   	work->flip_queued_vblank = drm_vblank_count(dev, intel_crtc->pipe);
>   	work->enable_stall_check = true;
>   
> @@ -9985,6 +9942,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>   
>   cleanup_unpin:
>   	intel_unpin_fb_obj(obj);
> +cleanup_rq:
> +	i915_request_put(rq);
>   cleanup_pending:
>   	atomic_dec(&intel_crtc->unpin_work_count);
>   	crtc->primary->fb = old_fb;
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 07ce04683c30..b0115e81fb6e 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -382,11 +382,6 @@ struct intel_pipe_wm {
>   	bool sprites_scaled;
>   };
>   
> -struct intel_mmio_flip {
> -	u32 seqno;
> -	u32 ring_id;
> -};
> -
>   struct intel_crtc {
>   	struct drm_crtc base;
>   	enum pipe pipe;
> @@ -437,7 +432,7 @@ struct intel_crtc {
>   	} wm;
>   
>   	int scanline_offset;
> -	struct intel_mmio_flip mmio_flip;
> +	struct i915_gem_request *mmio_flip;
>   };
>   
>   struct intel_plane_wm_parameters {
> @@ -674,8 +669,7 @@ struct intel_unpin_work {
>   #define INTEL_FLIP_COMPLETE	2
>   	u32 flip_count;
>   	u32 gtt_offset;
> -	struct intel_engine_cs *flip_queued_ring;
> -	u32 flip_queued_seqno;
> +	struct i915_gem_request *flip_queued_request;
>   	int flip_queued_vblank;
>   	int flip_ready_vblank;
>   	bool enable_stall_check;
> @@ -795,7 +789,7 @@ bool intel_has_pending_fb_unpin(struct drm_device *dev);
>   int intel_pch_rawclk(struct drm_device *dev);
>   void intel_mark_busy(struct drm_device *dev);
>   void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
> -			     struct intel_engine_cs *ring);
> +			     struct i915_gem_request *rq);
>   void intel_frontbuffer_flip_prepare(struct drm_device *dev,
>   				    unsigned frontbuffer_bits);
>   void intel_frontbuffer_flip_complete(struct drm_device *dev,
> @@ -853,7 +847,7 @@ void intel_release_load_detect_pipe(struct drm_connector *connector,
>   				    struct intel_load_detect_pipe *old);
>   int intel_pin_and_fence_fb_obj(struct drm_device *dev,
>   			       struct drm_i915_gem_object *obj,
> -			       struct intel_engine_cs *pipelined);
> +			       struct i915_gem_request *pipelined);
>   void intel_unpin_fb_obj(struct drm_i915_gem_object *obj);
>   struct drm_framebuffer *
>   __intel_framebuffer_create(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index bd1b28d99920..d47af931d5ab 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -204,57 +204,12 @@ enum {
>   };
>   #define GEN8_CTX_ID_SHIFT 32
>   
> -/**
> - * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
> - * @dev: DRM device.
> - * @enable_execlists: value of i915.enable_execlists module parameter.
> - *
> - * Only certain platforms support Execlists (the prerequisites being
> - * support for Logical Ring Contexts and Aliasing PPGTT or better),
> - * and only when enabled via module parameter.
> - *
> - * Return: 1 if Execlists is supported and has to be enabled.
> - */
> -int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
> -{
> -	WARN_ON(i915.enable_ppgtt == -1);
> -
> -	if (enable_execlists == 0)
> -		return 0;
> -
> -	if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev) &&
> -	    i915.use_mmio_flip >= 0)
> -		return 1;
> -
> -	return 0;
> -}
> -
> -/**
> - * intel_execlists_ctx_id() - get the Execlists Context ID
> - * @ctx_obj: Logical Ring Context backing object.
> - *
> - * Do not confuse with ctx->id! Unfortunately we have a name overload
> - * here: the old context ID we pass to userspace as a handler so that
> - * they can refer to a context, and the new context ID we pass to the
> - * ELSP so that the GPU can inform us of the context status via
> - * interrupts.
> - *
> - * Return: 20-bits globally unique context ID.
> - */
> -u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
> +static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj,
> +					 u32 ctx_id)
>   {
> -	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
> -
> -	/* LRCA is required to be 4K aligned so the more significant 20 bits
> -	 * are globally unique */
> -	return lrca >> 12;
> -}
> -
> -static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
> -{
> -	uint64_t desc;
> -	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
> +	uint64_t desc, lrca;
>   
> +	lrca = i915_gem_obj_ggtt_offset(ctx_obj);
>   	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>   
>   	desc = GEN8_CTX_VALID;
> @@ -262,7 +217,7 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
>   	desc |= GEN8_CTX_L3LLC_COHERENT;
>   	desc |= GEN8_CTX_PRIVILEGE;
>   	desc |= lrca;
> -	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
> +	desc |= (u64)ctx_id << GEN8_CTX_ID_SHIFT;
>   
>   	/* TODO: WaDisableLiteRestore when we start using semaphore
>   	 * signalling between Command Streamers */
> @@ -271,26 +226,39 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
>   	return desc;
>   }
>   
> -static void execlists_elsp_write(struct intel_engine_cs *ring,
> -				 struct drm_i915_gem_object *ctx_obj0,
> -				 struct drm_i915_gem_object *ctx_obj1)
> +static u32 execlists_ctx_write_tail(struct drm_i915_gem_object *obj, u32 tail, u32 tag)
> +{
> +	uint32_t *reg_state;
> +
> +	reg_state = kmap_atomic(i915_gem_object_get_page(obj, 1));
> +	reg_state[CTX_RING_TAIL+1] = tail;
> +	kunmap_atomic(reg_state);
> +
> +	return execlists_ctx_descriptor(obj, tag);
> +}
> +
> +static void execlists_submit_pair(struct intel_engine_cs *engine,
> +				  struct i915_gem_request *rq[2])
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	uint64_t temp = 0;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	uint64_t tmp;
>   	uint32_t desc[4];
>   	unsigned long flags;
>   
>   	/* XXX: You must always write both descriptors in the order below. */
> -	if (ctx_obj1)
> -		temp = execlists_ctx_descriptor(ctx_obj1);
> -	else
> -		temp = 0;
> -	desc[1] = (u32)(temp >> 32);
> -	desc[0] = (u32)temp;
>   
> -	temp = execlists_ctx_descriptor(ctx_obj0);
> -	desc[3] = (u32)(temp >> 32);
> -	desc[2] = (u32)temp;
> +	tmp = execlists_ctx_write_tail(rq[0]->ctx->ring[engine->id].state,
> +				       rq[0]->tail, rq[0]->tag);
> +	desc[3] = upper_32_bits(tmp);
> +	desc[2] = lower_32_bits(tmp);
> +
> +	if (rq[1])
> +		tmp = execlists_ctx_write_tail(rq[1]->ctx->ring[engine->id].state,
> +					       rq[1]->tail, rq[1]->tag);
> +	else
> +		tmp = 0;
> +	desc[1] = upper_32_bits(tmp);
> +	desc[0] = lower_32_bits(tmp);
>   
>   	/* Set Force Wakeup bit to prevent GT from entering C6 while ELSP writes
>   	 * are in progress.
> @@ -304,14 +272,14 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>   		dev_priv->uncore.funcs.force_wake_get(dev_priv, FORCEWAKE_ALL);
>   	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>   
> -	I915_WRITE(RING_ELSP(ring), desc[1]);
> -	I915_WRITE(RING_ELSP(ring), desc[0]);
> -	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	I915_WRITE(RING_ELSP(engine), desc[1]);
> +	I915_WRITE(RING_ELSP(engine), desc[0]);
> +	I915_WRITE(RING_ELSP(engine), desc[3]);
>   	/* The context is automatically loaded after the following */
> -	I915_WRITE(RING_ELSP(ring), desc[2]);
> +	I915_WRITE(RING_ELSP(engine), desc[2]);
>   
>   	/* ELSP is a wo register, so use another nearby reg for posting instead */
> -	POSTING_READ(RING_EXECLIST_STATUS(ring));
> +	POSTING_READ(RING_EXECLIST_STATUS(engine));
>   
>   	/* Release Force Wakeup (see the big comment above). */
>   	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
> @@ -320,115 +288,58 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>   	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>   }
>   
> -static int execlists_ctx_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
> -{
> -	struct page *page;
> -	uint32_t *reg_state;
> -
> -	page = i915_gem_object_get_page(ctx_obj, 1);
> -	reg_state = kmap_atomic(page);
> -
> -	reg_state[CTX_RING_TAIL+1] = tail;
> -
> -	kunmap_atomic(reg_state);
> -
> -	return 0;
> -}
> -
> -static int execlists_submit_context(struct intel_engine_cs *ring,
> -				    struct intel_context *to0, u32 tail0,
> -				    struct intel_context *to1, u32 tail1)
> +static u16 next_tag(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_gem_object *ctx_obj0;
> -	struct drm_i915_gem_object *ctx_obj1 = NULL;
> -
> -	ctx_obj0 = to0->engine[ring->id].state;
> -	BUG_ON(!ctx_obj0);
> -	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
> -
> -	execlists_ctx_write_tail(ctx_obj0, tail0);
> -
> -	if (to1) {
> -		ctx_obj1 = to1->engine[ring->id].state;
> -		BUG_ON(!ctx_obj1);
> -		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
> -
> -		execlists_ctx_write_tail(ctx_obj1, tail1);
> -	}
> -
> -	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
> -
> -	return 0;
> +	/* status tags are limited to 20b, so we use a u16 for convenience */
> +	if (++engine->next_tag == 0)
> +		++engine->next_tag;
> +	WARN_ON((s16)(engine->next_tag - engine->tag) < 0);
> +	return engine->next_tag;
>   }
>   
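For reviewers less used to the idiom, the signed-difference WARN above is the
usual wrap-safe ordering trick (same as i915_seqno_passed, just 16-bit), and
skipping zero presumably keeps a zero tag meaning "none". In isolation:

#include <stdio.h>
#include <stdint.h>

/* True if tag a was assigned at or after tag b, even across the u16 wrap,
 * provided the two are less than 32768 apart. */
static int tag_after_eq(uint16_t a, uint16_t b)
{
	return (int16_t)(a - b) >= 0;
}

/* Mirrors next_tag(): tags skip 0 (presumably so zero can mean "none"). */
static uint16_t advance_tag(uint16_t tag)
{
	if (++tag == 0)
		++tag;
	return tag;
}

int main(void)
{
	uint16_t completed = 0xfffe;
	uint16_t next = advance_tag(advance_tag(completed));	/* 0xffff, then 1 */

	printf("next=0x%04x, after_eq(next, completed)=%d\n",
	       next, tag_after_eq(next, completed));
	return 0;
}
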
> -static void execlists_context_unqueue(struct intel_engine_cs *ring)
> +static void execlists_submit(struct intel_engine_cs *engine)
>   {
> -	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
> -	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct i915_gem_request *rq[2] = {};
> +	int i = 0;
>   
> -	assert_spin_locked(&ring->execlist_lock);
> -
> -	if (list_empty(&ring->execlist_queue))
> -		return;
> +	assert_spin_locked(&engine->irqlock);
>   
>   	/* Try to read in pairs */
> -	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
> -				 execlist_link) {
> -		if (!req0) {
> -			req0 = cursor;
> -		} else if (req0->ctx == cursor->ctx) {
> +	while (!list_empty(&engine->pending)) {
> +		struct i915_gem_request *next;
> +
> +		next = list_first_entry(&engine->pending,
> +					typeof(*next),
> +					engine_list);
> +
> +		if (rq[i] == NULL) {
> +new_slot:
> +			next->tag = next_tag(engine);
> +			rq[i] = next;
> +		} else if (rq[i]->ctx == next->ctx) {
>   			/* Same ctx: ignore first request, as second request
>   			 * will update tail past first request's workload */
> -			cursor->elsp_submitted = req0->elsp_submitted;
> -			list_del(&req0->execlist_link);
> -			queue_work(dev_priv->wq, &req0->work);
> -			req0 = cursor;
> +			next->tag = rq[i]->tag;
> +			rq[i] = next;
>   		} else {
> -			req1 = cursor;
> -			break;
> -		}
> -	}
> -
> -	WARN_ON(req1 && req1->elsp_submitted);
> -
> -	WARN_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
> -					 req1 ? req1->ctx : NULL,
> -					 req1 ? req1->tail : 0));
> -
> -	req0->elsp_submitted++;
> -	if (req1)
> -		req1->elsp_submitted++;
> -}
> +			if (++i == ARRAY_SIZE(rq))
> +				break;
>   
> -static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> -					   u32 request_id)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	struct intel_ctx_submit_request *head_req;
> -
> -	assert_spin_locked(&ring->execlist_lock);
> -
> -	head_req = list_first_entry_or_null(&ring->execlist_queue,
> -					    struct intel_ctx_submit_request,
> -					    execlist_link);
> -
> -	if (head_req != NULL) {
> -		struct drm_i915_gem_object *ctx_obj =
> -				head_req->ctx->engine[ring->id].state;
> -		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> -			WARN(head_req->elsp_submitted == 0,
> -			     "Never submitted head request\n");
> -
> -			if (--head_req->elsp_submitted <= 0) {
> -				list_del(&head_req->execlist_link);
> -				queue_work(dev_priv->wq, &head_req->work);
> -				return true;
> -			}
> +			goto new_slot;
>   		}
> +
> +		/* The move to the requests list is staged via the submitted list
> +		 * so that we can keep the main request list out of
> +		 * the spinlock coverage.
> +		 */
> +		list_move_tail(&next->engine_list, &engine->submitted);
>   	}
>   
> -	return false;
> +	execlists_submit_pair(engine, rq);
> +
> +	engine->execlists_submitted++;
> +	if (rq[1])
> +		engine->execlists_submitted++;
>   }
>   
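Purely as a sanity check on the desc[] ordering in execlists_submit_pair():
the second element's descriptor goes out first (upper dword then lower), then
the first element's, and the context load kicks in after the final desc[2]
write. A standalone illustration with made-up descriptor values (upper_32 and
lower_32 here only mimic the kernel helpers):

#include <stdio.h>
#include <stdint.h>

static uint32_t upper_32(uint64_t v) { return v >> 32; }
static uint32_t lower_32(uint64_t v) { return v & 0xffffffffu; }

int main(void)
{
	/* Two made-up context descriptors (valid bit + LRCA below, tag above). */
	uint64_t desc_rq0 = 0x0000000100231001ull;
	uint64_t desc_rq1 = 0x0000000200452001ull;	/* 0 if only one request */
	uint32_t desc[4];

	desc[3] = upper_32(desc_rq0);
	desc[2] = lower_32(desc_rq0);
	desc[1] = upper_32(desc_rq1);
	desc[0] = lower_32(desc_rq1);

	/* ELSP write order: element 1 first, element 0 last; the context is
	 * loaded after the final (desc[2]) write. */
	const int order[] = { 1, 0, 3, 2 };
	for (int i = 0; i < 4; i++)
		printf("ELSP <- desc[%d] = 0x%08x\n", order[i], desc[order[i]]);
	return 0;
}
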
>   /**
> @@ -438,1308 +349,378 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>    * Check the unread Context Status Buffers and manage the submission of new
>    * contexts to the ELSP accordingly.
>    */
> -void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
> +void intel_execlists_irq_handler(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	u32 status_pointer;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	unsigned long flags;
>   	u8 read_pointer;
>   	u8 write_pointer;
> -	u32 status;
> -	u32 status_id;
> -	u32 submit_contexts = 0;
>   
> -	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> -
> -	read_pointer = ring->next_context_status_buffer;
> -	write_pointer = status_pointer & 0x07;
> +	read_pointer = engine->next_context_status_buffer;
> +	write_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(engine)) & 0x07;
>   	if (read_pointer > write_pointer)
>   		write_pointer += 6;
>   
> -	spin_lock(&ring->execlist_lock);
> +	spin_lock_irqsave(&engine->irqlock, flags);
>   
> -	while (read_pointer < write_pointer) {
> -		read_pointer++;
> -		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> -				(read_pointer % 6) * 8);
> -		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> -				(read_pointer % 6) * 8 + 4);
> +	while (read_pointer++ < write_pointer) {
> +		u32 reg = (RING_CONTEXT_STATUS_BUF(engine) +
> +			   (read_pointer % 6) * 8);
> +		u32 status = I915_READ(reg);
>   
>   		if (status & GEN8_CTX_STATUS_PREEMPTED) {
> -			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> -				if (execlists_check_remove_request(ring, status_id))
> -					WARN(1, "Lite Restored request removed from queue\n");
> -			} else
> +			if (status & GEN8_CTX_STATUS_LITE_RESTORE)
> +				WARN(1, "Lite Restored request removed from queue\n");
> +			else
>   				WARN(1, "Preemption without Lite Restore\n");
>   		}
>   
> -		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> -		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
> -			if (execlists_check_remove_request(ring, status_id))
> -				submit_contexts++;
> +		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE | GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
> +			engine->tag = I915_READ(reg + 4);
> +			engine->execlists_submitted--;
>   		}
>   	}
>   
> -	if (submit_contexts != 0)
> -		execlists_context_unqueue(ring);
> -
> -	spin_unlock(&ring->execlist_lock);
> +	if (engine->execlists_submitted < 2)
> +		execlists_submit(engine);
>   
> -	WARN(submit_contexts > 2, "More than two context complete events?\n");
> -	ring->next_context_status_buffer = write_pointer % 6;
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
>   
> -	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
> -		   ((u32)ring->next_context_status_buffer & 0x07) << 8);
> +	engine->next_context_status_buffer = write_pointer % 6;
> +	I915_WRITE(RING_CONTEXT_STATUS_PTR(engine),
> +		   ((u32)engine->next_context_status_buffer & 0x07) << 8);
>   }
>   
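And one small model of the status-buffer walk: with six CSB slots, bumping
write_pointer by six when it trails read_pointer keeps the loop linear, and
the slot index is just modulo six (note the post-increment: the first slot
consumed is the one after read_pointer). In isolation:

#include <stdio.h>

#define CSB_ENTRIES 6

/* Walk the context-status-buffer slots from last-seen to current, in order,
 * handling the hardware write pointer wrapping past us. */
static void walk_csb(unsigned int read_pointer, unsigned int write_pointer)
{
	if (read_pointer > write_pointer)
		write_pointer += CSB_ENTRIES;

	while (read_pointer++ < write_pointer)
		printf("  consume slot %u\n", read_pointer % CSB_ENTRIES);
}

int main(void)
{
	printf("read=4 write=1:\n");
	walk_csb(4, 1);		/* wraps: slots 5, 0, 1 */
	printf("read=1 write=4:\n");
	walk_csb(1, 4);		/* no wrap: slots 2, 3, 4 */
	return 0;
}
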
> -static void execlists_free_request_task(struct work_struct *work)
> +static int
> +populate_lr_context(struct intel_context *ctx,
> +		    struct drm_i915_gem_object *ctx_obj,
> +		    struct intel_engine_cs *engine)
>   {
> -	struct intel_ctx_submit_request *req =
> -		container_of(work, struct intel_ctx_submit_request, work);
> -	struct drm_device *dev = req->ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	intel_runtime_pm_put(dev_priv);
> -
> -	mutex_lock(&dev->struct_mutex);
> -	i915_gem_context_unreference(req->ctx);
> -	mutex_unlock(&dev->struct_mutex);
> +	struct intel_ringbuffer *ring = ctx->ring[engine->id].ring;
> +	struct i915_hw_ppgtt *ppgtt;
> +	uint32_t *reg_state;
> +	int ret;
>   
> -	kfree(req);
> -}
> +	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
> +		return ret;
> +	}
>   
> -static int execlists_context_queue(struct intel_engine_cs *ring,
> -				   struct intel_context *to,
> -				   u32 tail)
> -{
> -	struct intel_ctx_submit_request *req = NULL, *cursor;
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	unsigned long flags;
> -	int num_elements = 0;
> +	ret = i915_gem_object_get_pages(ctx_obj);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Could not get object pages\n");
> +		return ret;
> +	}
>   
> -	req = kzalloc(sizeof(*req), GFP_KERNEL);
> -	if (req == NULL)
> -		return -ENOMEM;
> -	req->ctx = to;
> -	i915_gem_context_reference(req->ctx);
> -	req->ring = ring;
> -	req->tail = tail;
> -	INIT_WORK(&req->work, execlists_free_request_task);
> +	/* The second page of the context object contains some fields which must
> +	 * be set up prior to the first execution. */
> +	reg_state = kmap_atomic(i915_gem_object_get_page(ctx_obj, 1));
>   
> -	intel_runtime_pm_get(dev_priv);
> +	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
> +	 * commands followed by (reg, value) pairs. The values we are setting here are
> +	 * only for the first context restore: on a subsequent save, the GPU will
> +	 * recreate this batchbuffer with new values (including all the missing
> +	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
> +	if (engine->id == RCS)
> +		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
> +	else
> +		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
> +	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
>   
> -	spin_lock_irqsave(&ring->execlist_lock, flags);
> +	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(engine);
> +	reg_state[CTX_CONTEXT_CONTROL+1] =
> +			_MASKED_BIT_ENABLE((1<<3) | MI_RESTORE_INHIBIT);
>   
> -	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> -		if (++num_elements > 2)
> -			break;
> +	reg_state[CTX_RING_HEAD] = RING_HEAD(engine->mmio_base);
> +	reg_state[CTX_RING_HEAD+1] = 0;
> +	reg_state[CTX_RING_TAIL] = RING_TAIL(engine->mmio_base);
> +	reg_state[CTX_RING_TAIL+1] = 0;
> +	reg_state[CTX_RING_BUFFER_START] = RING_START(engine->mmio_base);
> +	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring->obj);
> +	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(engine->mmio_base);
> +	reg_state[CTX_RING_BUFFER_CONTROL+1] =
> +			((ring->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
>   
> -	if (num_elements > 2) {
> -		struct intel_ctx_submit_request *tail_req;
> +	reg_state[CTX_BB_HEAD_U] = engine->mmio_base + 0x168;
> +	reg_state[CTX_BB_HEAD_U+1] = 0;
> +	reg_state[CTX_BB_HEAD_L] = engine->mmio_base + 0x140;
> +	reg_state[CTX_BB_HEAD_L+1] = 0;
> +	reg_state[CTX_BB_STATE] = engine->mmio_base + 0x110;
> +	reg_state[CTX_BB_STATE+1] = (1<<5);
>   
> -		tail_req = list_last_entry(&ring->execlist_queue,
> -					   struct intel_ctx_submit_request,
> -					   execlist_link);
> +	reg_state[CTX_SECOND_BB_HEAD_U] = engine->mmio_base + 0x11c;
> +	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
> +	reg_state[CTX_SECOND_BB_HEAD_L] = engine->mmio_base + 0x114;
> +	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
> +	reg_state[CTX_SECOND_BB_STATE] = engine->mmio_base + 0x118;
> +	reg_state[CTX_SECOND_BB_STATE+1] = 0;
>   
> -		if (to == tail_req->ctx) {
> -			WARN(tail_req->elsp_submitted != 0,
> -			     "More than 2 already-submitted reqs queued\n");
> -			list_del(&tail_req->execlist_link);
> -			queue_work(dev_priv->wq, &tail_req->work);
> -		}
> +	if (engine->id == RCS) {
> +		/* TODO: according to BSpec, the register state context
> +		 * for CHV does not have these. OTOH, these registers do
> +		 * exist in CHV. I'm waiting for a clarification */
> +		reg_state[CTX_BB_PER_CTX_PTR] = engine->mmio_base + 0x1c0;
> +		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
> +		reg_state[CTX_RCS_INDIRECT_CTX] = engine->mmio_base + 0x1c4;
> +		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
> +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = engine->mmio_base + 0x1c8;
> +		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
>   	}
>   
> -	list_add_tail(&req->execlist_link, &ring->execlist_queue);
> -	if (num_elements == 0)
> -		execlists_context_unqueue(ring);
> -
> -	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
> +	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
> +	reg_state[CTX_CTX_TIMESTAMP] = engine->mmio_base + 0x3a8;
> +	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
>   
> -	return 0;
> -}
> +	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(engine, 3);
> +	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(engine, 3);
> +	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(engine, 2);
> +	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(engine, 2);
> +	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(engine, 1);
> +	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(engine, 1);
> +	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(engine, 0);
> +	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(engine, 0);
>   
> -static int logical_ring_invalidate_all_caches(struct intel_ringbuffer *ringbuf)
> -{
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	uint32_t flush_domains;
> -	int ret;
> +	ppgtt = ctx->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
>   
> -	flush_domains = 0;
> -	if (ring->gpu_caches_dirty)
> -		flush_domains = I915_GEM_GPU_DOMAINS;
> +	if (engine->id == RCS) {
> +		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> +		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> +		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
> +	}
>   
> -	ret = ring->emit_flush(ringbuf, I915_GEM_GPU_DOMAINS, flush_domains);
> -	if (ret)
> -		return ret;
> +	kunmap_atomic(reg_state);
>   
> -	ring->gpu_caches_dirty = false;
>   	return 0;
>   }
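
The comment about the context image being MI_LOAD_REGISTER_IMM packets followed by (reg, value) pairs is also why every CTX_FOO index above is immediately followed by CTX_FOO+1 holding the value. A toy of that layout in case it helps; the LRI opcode and the register offsets here are from memory and only illustrative:

#include <stdint.h>
#include <stdio.h>

/* roughly MI_LOAD_REGISTER_IMM(n): opcode plus (2n - 1) trailing dwords */
#define TOY_MI_LRI(n)	((0x22u << 23) | (2 * (n) - 1))

int main(void)
{
	uint32_t reg_state[5];
	unsigned int i = 0, n;

	reg_state[i++] = TOY_MI_LRI(2);	/* header: two pairs follow      */
	reg_state[i++] = 0x2034;	/* reg:   RING_HEAD, render ring */
	reg_state[i++] = 0;		/* value: head starts at zero    */
	reg_state[i++] = 0x2030;	/* reg:   RING_TAIL, render ring */
	reg_state[i++] = 0;		/* value: tail starts at zero    */

	for (n = 0; n < i; n++)
		printf("dword %u: 0x%08x\n", n, reg_state[n]);
	return 0;
}
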
>   
> -static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
> -				 struct list_head *vmas)
> +static uint32_t get_lr_context_size(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct i915_vma *vma;
> -	uint32_t flush_domains = 0;
> -	bool flush_chipset = false;
> -	int ret;
> -
> -	list_for_each_entry(vma, vmas, exec_list) {
> -		struct drm_i915_gem_object *obj = vma->obj;
> -
> -		ret = i915_gem_object_sync(obj, ring);
> -		if (ret)
> -			return ret;
> +	int ret = 0;
>   
> -		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
> -			flush_chipset |= i915_gem_clflush_object(obj, false);
> +	WARN_ON(INTEL_INFO(engine->i915)->gen != 8);
>   
> -		flush_domains |= obj->base.write_domain;
> +	switch (engine->id) {
> +	case RCS:
> +		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
> +		break;
> +	case VCS:
> +	case BCS:
> +	case VECS:
> +	case VCS2:
> +		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
> +		break;
>   	}
>   
> -	if (flush_domains & I915_GEM_DOMAIN_GTT)
> -		wmb();
> -
> -	/* Unconditionally invalidate gpu caches and ensure that we do flush
> -	 * any residual writes from the previous batch.
> -	 */
> -	return logical_ring_invalidate_all_caches(ringbuf);
> +	return ret;
>   }
>   
> -/**
> - * execlists_submission() - submit a batchbuffer for execution, Execlists style
> - * @dev: DRM device.
> - * @file: DRM file.
> - * @ring: Engine Command Streamer to submit to.
> - * @ctx: Context to employ for this submission.
> - * @args: execbuffer call arguments.
> - * @vmas: list of vmas.
> - * @batch_obj: the batchbuffer to submit.
> - * @exec_start: batchbuffer start virtual address pointer.
> - * @flags: translated execbuffer call flags.
> - *
> - * This is the evil twin version of i915_gem_ringbuffer_submission. It abstracts
> - * away the submission details of the execbuffer ioctl call.
> - *
> - * Return: non-zero if the submission fails.
> - */
> -int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
> -			       struct intel_engine_cs *ring,
> -			       struct intel_context *ctx,
> -			       struct drm_i915_gem_execbuffer2 *args,
> -			       struct list_head *vmas,
> -			       struct drm_i915_gem_object *batch_obj,
> -			       u64 exec_start, u32 flags)
> +static struct intel_ringbuffer *
> +execlists_get_ring(struct intel_engine_cs *engine,
> +		   struct intel_context *ctx)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> -	int instp_mode;
> -	u32 instp_mask;
> +	struct drm_i915_gem_object *ctx_obj;
> +	struct intel_ringbuffer *ring;
> +	uint32_t context_size;
>   	int ret;
>   
> -	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
> -	instp_mask = I915_EXEC_CONSTANTS_MASK;
> -	switch (instp_mode) {
> -	case I915_EXEC_CONSTANTS_REL_GENERAL:
> -	case I915_EXEC_CONSTANTS_ABSOLUTE:
> -	case I915_EXEC_CONSTANTS_REL_SURFACE:
> -		if (instp_mode != 0 && ring != &dev_priv->ring[RCS]) {
> -			DRM_DEBUG("non-0 rel constants mode on non-RCS\n");
> -			return -EINVAL;
> -		}
> -
> -		if (instp_mode != dev_priv->relative_constants_mode) {
> -			if (instp_mode == I915_EXEC_CONSTANTS_REL_SURFACE) {
> -				DRM_DEBUG("rel surface constants mode invalid on gen5+\n");
> -				return -EINVAL;
> -			}
> -
> -			/* The HW changed the meaning on this bit on gen6 */
> -			instp_mask &= ~I915_EXEC_CONSTANTS_REL_SURFACE;
> -		}
> -		break;
> -	default:
> -		DRM_DEBUG("execbuf with unknown constants: %d\n", instp_mode);
> -		return -EINVAL;
> +	ring = intel_engine_alloc_ring(engine, ctx, 32 * PAGE_SIZE);
> +	if (IS_ERR(ring)) {
> +		DRM_ERROR("Failed to allocate ringbuffer %s: %ld\n",
> +			  engine->name, PTR_ERR(ring));
> +		return ERR_CAST(ring);
>   	}
>   
> -	if (args->num_cliprects != 0) {
> -		DRM_DEBUG("clip rectangles are only valid on pre-gen5\n");
> -		return -EINVAL;
> -	} else {
> -		if (args->DR4 == 0xffffffff) {
> -			DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
> -			args->DR4 = 0;
> -		}
> +	context_size = round_up(get_lr_context_size(engine), 4096);
>   
> -		if (args->DR1 || args->DR4 || args->cliprects_ptr) {
> -			DRM_DEBUG("0 cliprects but dirt in cliprects fields\n");
> -			return -EINVAL;
> -		}
> +	ctx_obj = i915_gem_alloc_context_obj(engine->i915->dev, context_size);
> +	if (IS_ERR(ctx_obj)) {
> +		ret = PTR_ERR(ctx_obj);
> +		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
> +		return ERR_CAST(ctx_obj);
>   	}
>   
> -	if (args->flags & I915_EXEC_GEN7_SOL_RESET) {
> -		DRM_DEBUG("sol reset is gen7 only\n");
> -		return -EINVAL;
> +	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
> +		goto err_unref;
>   	}
>   
> -	ret = execlists_move_to_gpu(ringbuf, vmas);
> -	if (ret)
> -		return ret;
> +	ret = populate_lr_context(ctx, ctx_obj, engine);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
> +		goto err_unpin;
> +	}
>   
> -	if (ring == &dev_priv->ring[RCS] &&
> -	    instp_mode != dev_priv->relative_constants_mode) {
> -		ret = intel_logical_ring_begin(ringbuf, 4);
> -		if (ret)
> -			return ret;
> +	ctx->ring[engine->id].state = ctx_obj;
>   
> -		intel_logical_ring_emit(ringbuf, MI_NOOP);
> -		intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
> -		intel_logical_ring_emit(ringbuf, INSTPM);
> -		intel_logical_ring_emit(ringbuf, instp_mask << 16 | instp_mode);
> -		intel_logical_ring_advance(ringbuf);
> +	if (ctx == engine->default_context) {
> +		struct drm_i915_private *dev_priv = engine->i915;
> +		u32 reg;
>   
> -		dev_priv->relative_constants_mode = instp_mode;
> -	}
> +		/* The status page is offset 0 from the context object in LRCs. */
> +		engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
> +		engine->status_page.page_addr = kmap(sg_page(ctx_obj->pages->sgl));
> +		if (engine->status_page.page_addr == NULL) {
> +			ret = -ENOMEM;
> +			goto err_unpin;
> +		}
>   
> -	ret = ring->emit_bb_start(ringbuf, exec_start, flags);
> -	if (ret)
> -		return ret;
> +		engine->status_page.obj = ctx_obj;
>   
> -	i915_gem_execbuffer_move_to_active(vmas, ring);
> -	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> +		reg = RING_HWS_PGA(engine->mmio_base);
> +		I915_WRITE(reg, engine->status_page.gfx_addr);
> +		POSTING_READ(reg);
> +	}
>   
> -	return 0;
> +	return ring;
> +
> +err_unpin:
> +	i915_gem_object_ggtt_unpin(ctx_obj);
> +err_unref:
> +	drm_gem_object_unreference(&ctx_obj->base);
> +	return ERR_PTR(ret);
>   }
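
Minor readability note on execlists_get_ring(): it mixes pointer-or-ERR_PTR returns with the usual goto unwind, so on failure the caller gets an errno encoded in the pointer rather than NULL. For anyone not used to that convention, a userspace approximation of what ERR_PTR/IS_ERR/PTR_ERR boil down to:

#include <stdio.h>

/* Userspace sketch of the kernel's ERR_PTR convention: errnos live in
 * the last page of the pointer range, so IS_ERR() is a range check. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

int main(void)
{
	void *ring = ERR_PTR(-12);	/* pretend -ENOMEM from get_ring */

	if (IS_ERR(ring))
		printf("get_ring failed: %ld\n", PTR_ERR(ring));
	return 0;
}
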
>   
> -void intel_logical_ring_stop(struct intel_engine_cs *ring)
> +static void execlists_put_ring(struct intel_ringbuffer *ring,
> +			       struct intel_context *ctx)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	int ret;
> -
> -	if (!intel_ring_initialized(ring))
> -		return;
> -
> -	ret = intel_ring_idle(ring);
> -	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
> -		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
> -			  ring->name, ret);
> -
> -	/* TODO: Is this correct with Execlists enabled? */
> -	I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> -	if (wait_for_atomic((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> -		DRM_ERROR("%s :timed out trying to stop ring\n", ring->name);
> -		return;
> -	}
> -	I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
> +	intel_ring_free(ring);
>   }
>   
> -int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf)
> +static int execlists_add_request(struct i915_gem_request *rq)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	int ret;
> +	unsigned long flags;
>   
> -	if (!ring->gpu_caches_dirty)
> -		return 0;
> +	spin_lock_irqsave(&rq->engine->irqlock, flags);
>   
> -	ret = ring->emit_flush(ringbuf, 0, I915_GEM_GPU_DOMAINS);
> -	if (ret)
> -		return ret;
> +	list_add_tail(&rq->engine_list, &rq->engine->pending);
> +	if (rq->engine->execlists_submitted < 2)
> +		execlists_submit(rq->engine);
> +
> +	spin_unlock_irqrestore(&rq->engine->irqlock, flags);
>   
> -	ring->gpu_caches_dirty = false;
>   	return 0;
>   }
>   
> -/**
> - * intel_logical_ring_advance_and_submit() - advance the tail and submit the workload
> - * @ringbuf: Logical Ringbuffer to advance.
> - *
> - * The tail is updated in our logical ringbuffer struct, not in the actual context. What
> - * really happens during submission is that the context and current tail will be placed
> - * on a queue waiting for the ELSP to be ready to accept a new context submission. At that
> - * point, the tail *inside* the context is updated and the ELSP written to.
> - */
> -void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf)
> +static bool execlists_rq_is_complete(struct i915_gem_request *rq)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct intel_context *ctx = ringbuf->FIXME_lrc_ctx;
> -
> -	intel_logical_ring_advance(ringbuf);
> -
> -	if (intel_ring_stopped(ring))
> -		return;
> -
> -	execlists_context_queue(ring, ctx, ringbuf->tail);
> +	return (s16)(rq->engine->tag - rq->tag) >= 0;
>   }
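
The completion test just above is the usual wrapping-sequence trick: subtract and look at the sign of the 16-bit difference, so "the engine's tag has passed this request's tag" stays true across wraparound as long as fewer than 32k tags are outstanding. Standalone, just the arithmetic, assuming the (s16) cast means the tags are effectively 16-bit:

#include <assert.h>
#include <stdint.h>

/* same test as execlists_rq_is_complete(): has 'now' reached 'tag'? */
static int tag_passed(uint16_t now, uint16_t tag)
{
	return (int16_t)(now - tag) >= 0;
}

int main(void)
{
	assert(tag_passed(100, 100));		/* exactly complete        */
	assert(tag_passed(101, 100));		/* completed a while ago   */
	assert(!tag_passed(100, 101));		/* still outstanding       */
	assert(tag_passed(5, 0xfffa));		/* completed across wrap   */
	assert(!tag_passed(0xfffa, 5));		/* outstanding across wrap */
	return 0;
}
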
>   
> -static int logical_ring_alloc_seqno(struct intel_engine_cs *ring,
> -				    struct intel_context *ctx)
> +static int execlists_suspend(struct intel_engine_cs *engine)
>   {
> -	if (ring->outstanding_lazy_seqno)
> -		return 0;
> -
> -	if (ring->preallocated_lazy_request == NULL) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = kmalloc(sizeof(*request), GFP_KERNEL);
> -		if (request == NULL)
> -			return -ENOMEM;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	unsigned long flags;
>   
> -		/* Hold a reference to the context this request belongs to
> -		 * (we will need it when the time comes to emit/retire the
> -		 * request).
> -		 */
> -		request->ctx = ctx;
> -		i915_gem_context_reference(request->ctx);
> +	/* disable submitting more requests until resume */
> +	spin_lock_irqsave(&engine->irqlock, flags);
> +	engine->execlists_submitted = ~0;
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
>   
> -		ring->preallocated_lazy_request = request;
> -	}
> +	I915_WRITE(RING_MODE_GEN7(engine),
> +		   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE) |
> +		   _MASKED_BIT_DISABLE(GFX_RUN_LIST_ENABLE));
> +	POSTING_READ(RING_MODE_GEN7(engine));
> +	DRM_DEBUG_DRIVER("Execlists disabled for %s\n", engine->name);
>   
> -	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
> +	return 0;
>   }
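
Setting execlists_submitted to ~0 here does double duty: everything that submits only calls execlists_submit() when the count is below 2 (the two ELSP ports), so saturating the counter quietly blocks submission until resume() resets it to 0, and intel_engine_enable_execlists() uses the same trick to start the engine suspended. A minimal model of that gate, assuming the counter really is only ever touched under the engine irqlock as it appears to be:

#include <stdio.h>

#define ELSP_PORTS	2

struct toy_engine {
	unsigned int execlists_submitted;	/* ~0u doubles as "suspended" */
};

/* the "if (execlists_submitted < 2) execlists_submit()" pattern */
static void toy_try_submit(struct toy_engine *engine)
{
	if (engine->execlists_submitted < ELSP_PORTS) {
		engine->execlists_submitted++;
		printf("submitted, ports in use: %u\n",
		       engine->execlists_submitted);
	} else {
		printf("blocked (ports full or engine suspended)\n");
	}
}

int main(void)
{
	struct toy_engine engine = { .execlists_submitted = ~0u };

	toy_try_submit(&engine);		/* blocked: suspended        */
	engine.execlists_submitted = 0;		/* what resume() does        */
	toy_try_submit(&engine);		/* ok: first ELSP port       */
	toy_try_submit(&engine);		/* ok: second ELSP port      */
	toy_try_submit(&engine);		/* blocked: both ports busy  */
	return 0;
}
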
>   
> -static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
> -				     int bytes)
> +static int execlists_resume(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct drm_i915_gem_request *request;
> -	u32 seqno = 0;
> -	int ret;
> -
> -	if (ringbuf->last_retired_head != -1) {
> -		ringbuf->head = ringbuf->last_retired_head;
> -		ringbuf->last_retired_head = -1;
> -
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= bytes)
> -			return 0;
> -	}
> -
> -	list_for_each_entry(request, &ring->request_list, list) {
> -		if (__intel_ring_space(request->tail, ringbuf->tail,
> -				       ringbuf->size) >= bytes) {
> -			seqno = request->seqno;
> -			break;
> -		}
> -	}
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	unsigned long flags;
>   
> -	if (seqno == 0)
> -		return -ENOSPC;
> +	/* XXX */
> +	I915_WRITE(RING_HWSTAM(engine->mmio_base), 0xffffffff);
>   
> -	ret = i915_wait_seqno(ring, seqno);
> -	if (ret)
> -		return ret;
> +	I915_WRITE(RING_MODE_GEN7(engine),
> +		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
> +		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
> +	POSTING_READ(RING_MODE_GEN7(engine));
> +	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", engine->name);
>   
> -	i915_gem_retire_requests_ring(ring);
> -	ringbuf->head = ringbuf->last_retired_head;
> -	ringbuf->last_retired_head = -1;
> +	spin_lock_irqsave(&engine->irqlock, flags);
> +	engine->execlists_submitted = 0;
> +	execlists_submit(engine);
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
>   
> -	ringbuf->space = intel_ring_space(ringbuf);
>   	return 0;
>   }
>   
> -static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
> -				       int bytes)
> +static void execlists_retire(struct intel_engine_cs *engine,
> +			     u32 seqno)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	unsigned long end;
> -	int ret;
> -
> -	ret = logical_ring_wait_request(ringbuf, bytes);
> -	if (ret != -ENOSPC)
> -		return ret;
> -
> -	/* Force the context submission in case we have been skipping it */
> -	intel_logical_ring_advance_and_submit(ringbuf);
> -
> -	/* With GEM the hangcheck timer should kick us out of the loop,
> -	 * leaving it early runs the risk of corrupting GEM state (due
> -	 * to running on almost untested codepaths). But on resume
> -	 * timers don't work yet, so prevent a complete hang in that
> -	 * case by choosing an insanely large timeout. */
> -	end = jiffies + 60 * HZ;
> -
> -	do {
> -		ringbuf->head = I915_READ_HEAD(ring);
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= bytes) {
> -			ret = 0;
> -			break;
> -		}
> -
> -		msleep(1);
> -
> -		if (dev_priv->mm.interruptible && signal_pending(current)) {
> -			ret = -ERESTARTSYS;
> -			break;
> -		}
> -
> -		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> -					   dev_priv->mm.interruptible);
> -		if (ret)
> -			break;
> -
> -		if (time_after(jiffies, end)) {
> -			ret = -EBUSY;
> -			break;
> -		}
> -	} while (1);
> +	unsigned long flags;
>   
> -	return ret;
> +	spin_lock_irqsave(&engine->irqlock, flags);
> +	list_splice_tail_init(&engine->submitted, &engine->requests);
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
>   }
>   
> -static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
> +static void execlists_reset(struct intel_engine_cs *engine)
>   {
> -	uint32_t __iomem *virt;
> -	int rem = ringbuf->size - ringbuf->tail;
> -
> -	if (ringbuf->space < rem) {
> -		int ret = logical_ring_wait_for_space(ringbuf, rem);
> -
> -		if (ret)
> -			return ret;
> -	}
> -
> -	virt = ringbuf->virtual_start + ringbuf->tail;
> -	rem /= 4;
> -	while (rem--)
> -		iowrite32(MI_NOOP, virt++);
> -
> -	ringbuf->tail = 0;
> -	ringbuf->space = intel_ring_space(ringbuf);
> +	unsigned long flags;
>   
> -	return 0;
> +	spin_lock_irqsave(&engine->irqlock, flags);
> +	list_splice_tail_init(&engine->pending, &engine->submitted);
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
>   }
>   
> -static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, int bytes)
> +static bool enable_execlists(struct drm_i915_private *dev_priv)
>   {
> -	int ret;
> -
> -	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
> -		ret = logical_ring_wrap_buffer(ringbuf);
> -		if (unlikely(ret))
> -			return ret;
> -	}
> -
> -	if (unlikely(ringbuf->space < bytes)) {
> -		ret = logical_ring_wait_for_space(ringbuf, bytes);
> -		if (unlikely(ret))
> -			return ret;
> -	}
> +	if (!HAS_LOGICAL_RING_CONTEXTS(dev_priv) ||
> +	    !USES_PPGTT(dev_priv))
> +		return false;
>   
> -	return 0;
> +	return i915.enable_execlists;
>   }
>   
> -/**
> - * intel_logical_ring_begin() - prepare the logical ringbuffer to accept some commands
> - *
> - * @ringbuf: Logical ringbuffer.
> - * @num_dwords: number of DWORDs that we plan to write to the ringbuffer.
> - *
> - * The ringbuffer might not be ready to accept the commands right away (maybe it needs to
> - * be wrapped, or wait a bit for the tail to be updated). This function takes care of that
> - * and also preallocates a request (every workload submission is still mediated through
> - * requests, same as it did with legacy ringbuffer submission).
> - *
> - * Return: non-zero if the ringbuffer is not ready to be written to.
> - */
> -int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
> +static const int gen8_irq_shift[] = {
> +	[RCS] = GEN8_RCS_IRQ_SHIFT,
> +	[VCS] = GEN8_VCS1_IRQ_SHIFT,
> +	[BCS] = GEN8_BCS_IRQ_SHIFT,
> +	[VECS] = GEN8_VECS_IRQ_SHIFT,
> +	[VCS2] = GEN8_VCS2_IRQ_SHIFT,
> +};
> +
> +int intel_engine_enable_execlists(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> +	if (!enable_execlists(engine->i915))
> +		return 0;
>   
> -	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> -				   dev_priv->mm.interruptible);
> -	if (ret)
> -		return ret;
> +	if (WARN_ON(!IS_GEN8(engine->i915)))
> +		return 0;
>   
> -	ret = logical_ring_prepare(ringbuf, num_dwords * sizeof(uint32_t));
> -	if (ret)
> -		return ret;
> +	engine->irq_keep_mask |=
> +		GT_CONTEXT_SWITCH_INTERRUPT << gen8_irq_shift[engine->id];
>   
> -	/* Preallocate the olr before touching the ring */
> -	ret = logical_ring_alloc_seqno(ring, ringbuf->FIXME_lrc_ctx);
> -	if (ret)
> -		return ret;
> +	engine->get_ring = execlists_get_ring;
> +	engine->put_ring = execlists_put_ring;
> +	engine->add_request = execlists_add_request;
> +	engine->is_complete = execlists_rq_is_complete;
>   
> -	ringbuf->space -= num_dwords * sizeof(uint32_t);
> -	return 0;
> -}
> +	/* Disable semaphores until further notice */
> +	engine->semaphore.wait = NULL;
>   
> -static int gen8_init_common_ring(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	engine->suspend = execlists_suspend;
> +	engine->resume = execlists_resume;
> +	engine->reset = execlists_reset;
> +	engine->retire = execlists_retire;
>   
> -	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> -	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
> -
> -	I915_WRITE(RING_MODE_GEN7(ring),
> -		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
> -		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
> -	POSTING_READ(RING_MODE_GEN7(ring));
> -	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
> -
> -	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> -
> -	return 0;
> -}
> -
> -static int gen8_init_render_ring(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> -
> -	ret = gen8_init_common_ring(ring);
> -	if (ret)
> -		return ret;
> -
> -	/* We need to disable the AsyncFlip performance optimisations in order
> -	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
> -	 * programmed to '1' on all products.
> -	 *
> -	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
> -	 */
> -	I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
> -
> -	ret = intel_init_pipe_control(ring);
> -	if (ret)
> -		return ret;
> -
> -	I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
> -
> -	return ret;
> -}
> -
> -static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
> -			      u64 offset, unsigned flags)
> -{
> -	bool ppgtt = !(flags & I915_DISPATCH_SECURE);
> -	int ret;
> -
> -	ret = intel_logical_ring_begin(ringbuf, 4);
> -	if (ret)
> -		return ret;
> -
> -	/* FIXME(BDW): Address space and security selectors. */
> -	intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
> -	intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
> -	intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
> -	intel_logical_ring_emit(ringbuf, MI_NOOP);
> -	intel_logical_ring_advance(ringbuf);
> -
> -	return 0;
> -}
> -
> -static bool gen8_logical_ring_get_irq(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	unsigned long flags;
> -
> -	if (!dev->irq_enabled)
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
> -		POSTING_READ(RING_IMR(ring->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> -}
> -
> -static void gen8_logical_ring_put_irq(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		I915_WRITE_IMR(ring, ~ring->irq_keep_mask);
> -		POSTING_READ(RING_IMR(ring->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -}
> -
> -static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
> -			   u32 invalidate_domains,
> -			   u32 unused)
> -{
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	uint32_t cmd;
> -	int ret;
> -
> -	ret = intel_logical_ring_begin(ringbuf, 4);
> -	if (ret)
> -		return ret;
> -
> -	cmd = MI_FLUSH_DW + 1;
> -
> -	if (ring == &dev_priv->ring[VCS]) {
> -		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> -			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> -				MI_FLUSH_DW_STORE_INDEX |
> -				MI_FLUSH_DW_OP_STOREDW;
> -	} else {
> -		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> -			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> -				MI_FLUSH_DW_OP_STOREDW;
> -	}
> -
> -	intel_logical_ring_emit(ringbuf, cmd);
> -	intel_logical_ring_emit(ringbuf,
> -				I915_GEM_HWS_SCRATCH_ADDR |
> -				MI_FLUSH_DW_USE_GTT);
> -	intel_logical_ring_emit(ringbuf, 0); /* upper addr */
> -	intel_logical_ring_emit(ringbuf, 0); /* value */
> -	intel_logical_ring_advance(ringbuf);
> +	/* start suspended */
> +	engine->execlists_enabled = true;
> +	engine->execlists_submitted = ~0;
>   
>   	return 0;
>   }
> -
> -static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
> -				  u32 invalidate_domains,
> -				  u32 flush_domains)
> -{
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> -	u32 flags = 0;
> -	int ret;
> -
> -	flags |= PIPE_CONTROL_CS_STALL;
> -
> -	if (flush_domains) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> -	}
> -
> -	if (invalidate_domains) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_QW_WRITE;
> -		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> -	}
> -
> -	ret = intel_logical_ring_begin(ringbuf, 6);
> -	if (ret)
> -		return ret;
> -
> -	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
> -	intel_logical_ring_emit(ringbuf, flags);
> -	intel_logical_ring_emit(ringbuf, scratch_addr);
> -	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_advance(ringbuf);
> -
> -	return 0;
> -}
> -
> -static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> -{
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> -}
> -
> -static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> -}
> -
> -static int gen8_emit_request(struct intel_ringbuffer *ringbuf)
> -{
> -	struct intel_engine_cs *ring = ringbuf->ring;
> -	u32 cmd;
> -	int ret;
> -
> -	ret = intel_logical_ring_begin(ringbuf, 6);
> -	if (ret)
> -		return ret;
> -
> -	cmd = MI_STORE_DWORD_IMM_GEN8;
> -	cmd |= MI_GLOBAL_GTT;
> -
> -	intel_logical_ring_emit(ringbuf, cmd);
> -	intel_logical_ring_emit(ringbuf,
> -				(ring->status_page.gfx_addr +
> -				(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)));
> -	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_emit(ringbuf, ring->outstanding_lazy_seqno);
> -	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
> -	intel_logical_ring_emit(ringbuf, MI_NOOP);
> -	intel_logical_ring_advance_and_submit(ringbuf);
> -
> -	return 0;
> -}
> -
> -/**
> - * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
> - *
> - * @ring: Engine Command Streamer.
> - *
> - */
> -void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -
> -	if (!intel_ring_initialized(ring))
> -		return;
> -
> -	intel_logical_ring_stop(ring);
> -	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
> -	ring->preallocated_lazy_request = NULL;
> -	ring->outstanding_lazy_seqno = 0;
> -
> -	if (ring->cleanup)
> -		ring->cleanup(ring);
> -
> -	i915_cmd_parser_fini_ring(ring);
> -
> -	if (ring->status_page.obj) {
> -		kunmap(sg_page(ring->status_page.obj->pages->sgl));
> -		ring->status_page.obj = NULL;
> -	}
> -}
> -
> -static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
> -{
> -	int ret;
> -
> -	/* Intentionally left blank. */
> -	ring->buffer = NULL;
> -
> -	ring->dev = dev;
> -	INIT_LIST_HEAD(&ring->active_list);
> -	INIT_LIST_HEAD(&ring->request_list);
> -	init_waitqueue_head(&ring->irq_queue);
> -
> -	INIT_LIST_HEAD(&ring->execlist_queue);
> -	spin_lock_init(&ring->execlist_lock);
> -	ring->next_context_status_buffer = 0;
> -
> -	ret = i915_cmd_parser_init_ring(ring);
> -	if (ret)
> -		return ret;
> -
> -	if (ring->init) {
> -		ret = ring->init(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	ret = intel_lr_context_deferred_create(ring->default_context, ring);
> -
> -	return ret;
> -}
> -
> -static int logical_render_ring_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> -
> -	ring->name = "render ring";
> -	ring->id = RCS;
> -	ring->mmio_base = RENDER_RING_BASE;
> -	ring->irq_enable_mask =
> -		GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> -	ring->irq_keep_mask =
> -		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT;
> -	if (HAS_L3_DPF(dev))
> -		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> -
> -	ring->init = gen8_init_render_ring;
> -	ring->cleanup = intel_fini_pipe_control;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	ring->emit_request = gen8_emit_request;
> -	ring->emit_flush = gen8_emit_flush_render;
> -	ring->irq_get = gen8_logical_ring_get_irq;
> -	ring->irq_put = gen8_logical_ring_put_irq;
> -	ring->emit_bb_start = gen8_emit_bb_start;
> -
> -	return logical_ring_init(dev, ring);
> -}
> -
> -static int logical_bsd_ring_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VCS];
> -
> -	ring->name = "bsd ring";
> -	ring->id = VCS;
> -	ring->mmio_base = GEN6_BSD_RING_BASE;
> -	ring->irq_enable_mask =
> -		GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> -	ring->irq_keep_mask =
> -		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> -
> -	ring->init = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	ring->emit_request = gen8_emit_request;
> -	ring->emit_flush = gen8_emit_flush;
> -	ring->irq_get = gen8_logical_ring_get_irq;
> -	ring->irq_put = gen8_logical_ring_put_irq;
> -	ring->emit_bb_start = gen8_emit_bb_start;
> -
> -	return logical_ring_init(dev, ring);
> -}
> -
> -static int logical_bsd2_ring_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VCS2];
> -
> -	ring->name = "bds2 ring";
> -	ring->id = VCS2;
> -	ring->mmio_base = GEN8_BSD2_RING_BASE;
> -	ring->irq_enable_mask =
> -		GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> -	ring->irq_keep_mask =
> -		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> -
> -	ring->init = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	ring->emit_request = gen8_emit_request;
> -	ring->emit_flush = gen8_emit_flush;
> -	ring->irq_get = gen8_logical_ring_get_irq;
> -	ring->irq_put = gen8_logical_ring_put_irq;
> -	ring->emit_bb_start = gen8_emit_bb_start;
> -
> -	return logical_ring_init(dev, ring);
> -}
> -
> -static int logical_blt_ring_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[BCS];
> -
> -	ring->name = "blitter ring";
> -	ring->id = BCS;
> -	ring->mmio_base = BLT_RING_BASE;
> -	ring->irq_enable_mask =
> -		GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> -	ring->irq_keep_mask =
> -		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> -
> -	ring->init = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	ring->emit_request = gen8_emit_request;
> -	ring->emit_flush = gen8_emit_flush;
> -	ring->irq_get = gen8_logical_ring_get_irq;
> -	ring->irq_put = gen8_logical_ring_put_irq;
> -	ring->emit_bb_start = gen8_emit_bb_start;
> -
> -	return logical_ring_init(dev, ring);
> -}
> -
> -static int logical_vebox_ring_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VECS];
> -
> -	ring->name = "video enhancement ring";
> -	ring->id = VECS;
> -	ring->mmio_base = VEBOX_RING_BASE;
> -	ring->irq_enable_mask =
> -		GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> -	ring->irq_keep_mask =
> -		GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> -
> -	ring->init = gen8_init_common_ring;
> -	ring->get_seqno = gen8_get_seqno;
> -	ring->set_seqno = gen8_set_seqno;
> -	ring->emit_request = gen8_emit_request;
> -	ring->emit_flush = gen8_emit_flush;
> -	ring->irq_get = gen8_logical_ring_get_irq;
> -	ring->irq_put = gen8_logical_ring_put_irq;
> -	ring->emit_bb_start = gen8_emit_bb_start;
> -
> -	return logical_ring_init(dev, ring);
> -}
> -
> -/**
> - * intel_logical_rings_init() - allocate, populate and init the Engine Command Streamers
> - * @dev: DRM device.
> - *
> - * This function inits the engines for an Execlists submission style (the equivalent in the
> - * legacy ringbuffer submission world would be i915_gem_init_rings). It does it only for
> - * those engines that are present in the hardware.
> - *
> - * Return: non-zero if the initialization failed.
> - */
> -int intel_logical_rings_init(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret;
> -
> -	ret = logical_render_ring_init(dev);
> -	if (ret)
> -		return ret;
> -
> -	if (HAS_BSD(dev)) {
> -		ret = logical_bsd_ring_init(dev);
> -		if (ret)
> -			goto cleanup_render_ring;
> -	}
> -
> -	if (HAS_BLT(dev)) {
> -		ret = logical_blt_ring_init(dev);
> -		if (ret)
> -			goto cleanup_bsd_ring;
> -	}
> -
> -	if (HAS_VEBOX(dev)) {
> -		ret = logical_vebox_ring_init(dev);
> -		if (ret)
> -			goto cleanup_blt_ring;
> -	}
> -
> -	if (HAS_BSD2(dev)) {
> -		ret = logical_bsd2_ring_init(dev);
> -		if (ret)
> -			goto cleanup_vebox_ring;
> -	}
> -
> -	ret = i915_gem_set_seqno(dev, ((u32)~0 - 0x1000));
> -	if (ret)
> -		goto cleanup_bsd2_ring;
> -
> -	return 0;
> -
> -cleanup_bsd2_ring:
> -	intel_logical_ring_cleanup(&dev_priv->ring[VCS2]);
> -cleanup_vebox_ring:
> -	intel_logical_ring_cleanup(&dev_priv->ring[VECS]);
> -cleanup_blt_ring:
> -	intel_logical_ring_cleanup(&dev_priv->ring[BCS]);
> -cleanup_bsd_ring:
> -	intel_logical_ring_cleanup(&dev_priv->ring[VCS]);
> -cleanup_render_ring:
> -	intel_logical_ring_cleanup(&dev_priv->ring[RCS]);
> -
> -	return ret;
> -}
> -
> -int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> -				       struct intel_context *ctx)
> -{
> -	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> -	struct render_state so;
> -	struct drm_i915_file_private *file_priv = ctx->file_priv;
> -	struct drm_file *file = file_priv ? file_priv->file : NULL;
> -	int ret;
> -
> -	ret = i915_gem_render_state_prepare(ring, &so);
> -	if (ret)
> -		return ret;
> -
> -	if (so.rodata == NULL)
> -		return 0;
> -
> -	ret = ring->emit_bb_start(ringbuf,
> -			so.ggtt_offset,
> -			I915_DISPATCH_SECURE);
> -	if (ret)
> -		goto out;
> -
> -	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
> -
> -	ret = __i915_add_request(ring, file, so.obj, NULL);
> -	/* intel_logical_ring_add_request moves object to inactive if it
> -	 * fails */
> -out:
> -	i915_gem_render_state_fini(&so);
> -	return ret;
> -}
> -
> -static int
> -populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_obj,
> -		    struct intel_engine_cs *ring, struct intel_ringbuffer *ringbuf)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct drm_i915_gem_object *ring_obj = ringbuf->obj;
> -	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> -	struct page *page;
> -	uint32_t *reg_state;
> -	int ret;
> -
> -	if (!ppgtt)
> -		ppgtt = dev_priv->mm.aliasing_ppgtt;
> -
> -	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
> -		return ret;
> -	}
> -
> -	ret = i915_gem_object_get_pages(ctx_obj);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Could not get object pages\n");
> -		return ret;
> -	}
> -
> -	i915_gem_object_pin_pages(ctx_obj);
> -
> -	/* The second page of the context object contains some fields which must
> -	 * be set up prior to the first execution. */
> -	page = i915_gem_object_get_page(ctx_obj, 1);
> -	reg_state = kmap_atomic(page);
> -
> -	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
> -	 * commands followed by (reg, value) pairs. The values we are setting here are
> -	 * only for the first context restore: on a subsequent save, the GPU will
> -	 * recreate this batchbuffer with new values (including all the missing
> -	 * MI_LOAD_REGISTER_IMM commands that we are not initializing here). */
> -	if (ring->id == RCS)
> -		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(14);
> -	else
> -		reg_state[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(11);
> -	reg_state[CTX_LRI_HEADER_0] |= MI_LRI_FORCE_POSTED;
> -	reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
> -	reg_state[CTX_CONTEXT_CONTROL+1] =
> -			_MASKED_BIT_ENABLE((1<<3) | MI_RESTORE_INHIBIT);
> -	reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
> -	reg_state[CTX_RING_HEAD+1] = 0;
> -	reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
> -	reg_state[CTX_RING_TAIL+1] = 0;
> -	reg_state[CTX_RING_BUFFER_START] = RING_START(ring->mmio_base);
> -	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
> -	reg_state[CTX_RING_BUFFER_CONTROL] = RING_CTL(ring->mmio_base);
> -	reg_state[CTX_RING_BUFFER_CONTROL+1] =
> -			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID;
> -	reg_state[CTX_BB_HEAD_U] = ring->mmio_base + 0x168;
> -	reg_state[CTX_BB_HEAD_U+1] = 0;
> -	reg_state[CTX_BB_HEAD_L] = ring->mmio_base + 0x140;
> -	reg_state[CTX_BB_HEAD_L+1] = 0;
> -	reg_state[CTX_BB_STATE] = ring->mmio_base + 0x110;
> -	reg_state[CTX_BB_STATE+1] = (1<<5);
> -	reg_state[CTX_SECOND_BB_HEAD_U] = ring->mmio_base + 0x11c;
> -	reg_state[CTX_SECOND_BB_HEAD_U+1] = 0;
> -	reg_state[CTX_SECOND_BB_HEAD_L] = ring->mmio_base + 0x114;
> -	reg_state[CTX_SECOND_BB_HEAD_L+1] = 0;
> -	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
> -	reg_state[CTX_SECOND_BB_STATE+1] = 0;
> -	if (ring->id == RCS) {
> -		/* TODO: according to BSpec, the register state context
> -		 * for CHV does not have these. OTOH, these registers do
> -		 * exist in CHV. I'm waiting for a clarification */
> -		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
> -		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
> -		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
> -		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
> -		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
> -		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> -	}
> -	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
> -	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
> -	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
> -	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
> -	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> -	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> -	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> -	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> -	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> -	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> -	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> -	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
> -	if (ring->id == RCS) {
> -		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> -		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -		reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
> -	}
> -
> -	kunmap_atomic(reg_state);
> -
> -	ctx_obj->dirty = 1;
> -	set_page_dirty(page);
> -	i915_gem_object_unpin_pages(ctx_obj);
> -
> -	return 0;
> -}
> -
> -/**
> - * intel_lr_context_free() - free the LRC specific bits of a context
> - * @ctx: the LR context to free.
> - *
> - * The real context freeing is done in i915_gem_context_free: this only
> - * takes care of the bits that are LRC related: the per-engine backing
> - * objects and the logical ringbuffer.
> - */
> -void intel_lr_context_free(struct intel_context *ctx)
> -{
> -	int i;
> -
> -	for (i = 0; i < I915_NUM_RINGS; i++) {
> -		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
> -		struct intel_ringbuffer *ringbuf = ctx->engine[i].ringbuf;
> -
> -		if (ctx_obj) {
> -			intel_destroy_ringbuffer_obj(ringbuf);
> -			kfree(ringbuf);
> -			i915_gem_object_ggtt_unpin(ctx_obj);
> -			drm_gem_object_unreference(&ctx_obj->base);
> -		}
> -	}
> -}
> -
> -static uint32_t get_lr_context_size(struct intel_engine_cs *ring)
> -{
> -	int ret = 0;
> -
> -	WARN_ON(INTEL_INFO(ring->dev)->gen != 8);
> -
> -	switch (ring->id) {
> -	case RCS:
> -		ret = GEN8_LR_CONTEXT_RENDER_SIZE;
> -		break;
> -	case VCS:
> -	case BCS:
> -	case VECS:
> -	case VCS2:
> -		ret = GEN8_LR_CONTEXT_OTHER_SIZE;
> -		break;
> -	}
> -
> -	return ret;
> -}
> -
> -/**
> - * intel_lr_context_deferred_create() - create the LRC specific bits of a context
> - * @ctx: LR context to create.
> - * @ring: engine to be used with the context.
> - *
> - * This function can be called more than once, with different engines, if we plan
> - * to use the context with them. The context backing objects and the ringbuffers
> - * (specially the ringbuffer backing objects) suck a lot of memory up, and that's why
> - * the creation is a deferred call: it's better to make sure first that we need to use
> - * a given ring with the context.
> - *
> - * Return: non-zero on eror.
> - */
> -int intel_lr_context_deferred_create(struct intel_context *ctx,
> -				     struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_gem_object *ctx_obj;
> -	uint32_t context_size;
> -	struct intel_ringbuffer *ringbuf;
> -	int ret;
> -
> -	WARN_ON(ctx->legacy_hw_ctx.rcs_state != NULL);
> -	if (ctx->engine[ring->id].state)
> -		return 0;
> -
> -	context_size = round_up(get_lr_context_size(ring), 4096);
> -
> -	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
> -	if (IS_ERR(ctx_obj)) {
> -		ret = PTR_ERR(ctx_obj);
> -		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
> -		return ret;
> -	}
> -
> -	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n", ret);
> -		drm_gem_object_unreference(&ctx_obj->base);
> -		return ret;
> -	}
> -
> -	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
> -	if (!ringbuf) {
> -		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
> -				ring->name);
> -		i915_gem_object_ggtt_unpin(ctx_obj);
> -		drm_gem_object_unreference(&ctx_obj->base);
> -		ret = -ENOMEM;
> -		return ret;
> -	}
> -
> -	ringbuf->ring = ring;
> -	ringbuf->FIXME_lrc_ctx = ctx;
> -
> -	ringbuf->size = 32 * PAGE_SIZE;
> -	ringbuf->effective_size = ringbuf->size;
> -	ringbuf->head = 0;
> -	ringbuf->tail = 0;
> -	ringbuf->space = ringbuf->size;
> -	ringbuf->last_retired_head = -1;
> -
> -	/* TODO: For now we put this in the mappable region so that we can reuse
> -	 * the existing ringbuffer code which ioremaps it. When we start
> -	 * creating many contexts, this will no longer work and we must switch
> -	 * to a kmapish interface.
> -	 */
> -	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer obj %s: %d\n",
> -				ring->name, ret);
> -		goto error;
> -	}
> -
> -	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
> -		intel_destroy_ringbuffer_obj(ringbuf);
> -		goto error;
> -	}
> -
> -	ctx->engine[ring->id].ringbuf = ringbuf;
> -	ctx->engine[ring->id].state = ctx_obj;
> -
> -	if (ctx == ring->default_context) {
> -		/* The status page is offset 0 from the default context object
> -		 * in LRC mode. */
> -		ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
> -		ring->status_page.page_addr =
> -				kmap(sg_page(ctx_obj->pages->sgl));
> -		if (ring->status_page.page_addr == NULL)
> -			return -ENOMEM;
> -		ring->status_page.obj = ctx_obj;
> -	}
> -
> -	if (ring->id == RCS && !ctx->rcs_initialized) {
> -		ret = intel_lr_context_render_state_init(ring, ctx);
> -		if (ret) {
> -			DRM_ERROR("Init render state failed: %d\n", ret);
> -			ctx->engine[ring->id].ringbuf = NULL;
> -			ctx->engine[ring->id].state = NULL;
> -			intel_destroy_ringbuffer_obj(ringbuf);
> -			goto error;
> -		}
> -		ctx->rcs_initialized = true;
> -	}
> -
> -	return 0;
> -
> -error:
> -	kfree(ringbuf);
> -	i915_gem_object_ggtt_unpin(ctx_obj);
> -	drm_gem_object_unreference(&ctx_obj->base);
> -	return ret;
> -}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 33c3b4bf28c5..8b9f5b164ef0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -31,84 +31,8 @@
>   #define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
>   #define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
>   
> -/* Logical Rings */
> -void intel_logical_ring_stop(struct intel_engine_cs *ring);
> -void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
> -int intel_logical_rings_init(struct drm_device *dev);
> -
> -int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf);
> -void intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf);
> -/**
> - * intel_logical_ring_advance() - advance the ringbuffer tail
> - * @ringbuf: Ringbuffer to advance.
> - *
> - * The tail is only updated in our logical ringbuffer struct.
> - */
> -static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
> -{
> -	ringbuf->tail &= ringbuf->size - 1;
> -}
> -/**
> - * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
> - * @ringbuf: Ringbuffer to write to.
> - * @data: DWORD to write.
> - */
> -static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
> -					   u32 data)
> -{
> -	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
> -	ringbuf->tail += 4;
> -}
> -int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords);
> -
> -/* Logical Ring Contexts */
> -int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
> -				       struct intel_context *ctx);
> -void intel_lr_context_free(struct intel_context *ctx);
> -int intel_lr_context_deferred_create(struct intel_context *ctx,
> -				     struct intel_engine_cs *ring);
> -
>   /* Execlists */
> -int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
> -int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
> -			       struct intel_engine_cs *ring,
> -			       struct intel_context *ctx,
> -			       struct drm_i915_gem_execbuffer2 *args,
> -			       struct list_head *vmas,
> -			       struct drm_i915_gem_object *batch_obj,
> -			       u64 exec_start, u32 flags);
> -u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
> -
> -/**
> - * struct intel_ctx_submit_request - queued context submission request
> - * @ctx: Context to submit to the ELSP.
> - * @ring: Engine to submit it to.
> - * @tail: how far in the context's ringbuffer this request goes to.
> - * @execlist_link: link in the submission queue.
> - * @work: workqueue for processing this request in a bottom half.
> - * @elsp_submitted: no. of times this request has been sent to the ELSP.
> - *
> - * The ELSP only accepts two elements at a time, so we queue context/tail
> - * pairs on a given queue (ring->execlist_queue) until the hardware is
> - * available. The queue serves a double purpose: we also use it to keep track
> - * of the up to 2 contexts currently in the hardware (usually one in execution
> - * and the other queued up by the GPU): We only remove elements from the head
> - * of the queue when the hardware informs us that an element has been
> - * completed.
> - *
> - * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
> - */
> -struct intel_ctx_submit_request {
> -	struct intel_context *ctx;
> -	struct intel_engine_cs *ring;
> -	u32 tail;
> -
> -	struct list_head execlist_link;
> -	struct work_struct work;
> -
> -	int elsp_submitted;
> -};
> -
> -void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> +int intel_engine_enable_execlists(struct intel_engine_cs *engine);
> +void intel_execlists_irq_handler(struct intel_engine_cs *engine);
>   
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index dc2f4f26c961..ae0e5771f730 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -182,7 +182,7 @@ struct intel_overlay {
>   	u32 flip_addr;
>   	struct drm_i915_gem_object *reg_bo;
>   	/* flip handling */
> -	uint32_t last_flip_req;
> +	struct i915_gem_request *flip_request;
>   	void (*flip_tail)(struct intel_overlay *);
>   };
>   
> @@ -208,53 +208,86 @@ static void intel_overlay_unmap_regs(struct intel_overlay *overlay,
>   		io_mapping_unmap(regs);
>   }
>   
> -static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
> -					 void (*tail)(struct intel_overlay *))
> +/* recover from an interruption due to a signal
> + * We have to be careful not to repeat work forever and make forward progress. */
> +static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
>   {
> -	struct drm_device *dev = overlay->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
>   	int ret;
>   
> -	BUG_ON(overlay->last_flip_req);
> -	ret = i915_add_request(ring, &overlay->last_flip_req);
> -	if (ret)
> -		return ret;
> +	if (overlay->flip_request == NULL)
> +		return 0;
>   
> -	overlay->flip_tail = tail;
> -	ret = i915_wait_seqno(ring, overlay->last_flip_req);
> +	ret = i915_request_wait(overlay->flip_request);
>   	if (ret)
>   		return ret;
> -	i915_gem_retire_requests(dev);
>   
> -	overlay->last_flip_req = 0;
> +	i915_request_put(overlay->flip_request);
> +	overlay->flip_request = NULL;
> +
> +	i915_gem_retire_requests(overlay->dev);
> +
> +	if (overlay->flip_tail)
> +		overlay->flip_tail(overlay);
> +
>   	return 0;
>   }
>   
> +static int intel_overlay_add_request(struct intel_overlay *overlay,
> +				     struct i915_gem_request *rq,
> +				     void (*tail)(struct intel_overlay *))
> +{
> +	BUG_ON(overlay->flip_request);
> +	overlay->flip_request = rq;
> +	overlay->flip_tail = tail;
> +
> +	return i915_request_commit(rq);
> +}
> +
> +static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
> +					 struct i915_gem_request *rq,
> +					 void (*tail)(struct intel_overlay *))
> +{
> +	intel_overlay_add_request(overlay, rq, tail);
> +	return intel_overlay_recover_from_interrupt(overlay);
> +}
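
For the overlay conversion, the request handling reduces to a small state machine: add_request() only stashes the request plus a tail callback, and recover_from_interrupt() is the single place that waits, drops the reference and runs the callback, so a wait interrupted by a signal can simply be retried without repeating work. A rough model with a stand-in request type (i915_gem_request itself comes from elsewhere in the series):

#include <stddef.h>

struct toy_request { int unused; };

struct toy_overlay {
	struct toy_request *flip_request;
	void (*flip_tail)(struct toy_overlay *);
};

static int toy_wait(struct toy_request *rq) { (void)rq; return 0; }
static void toy_put(struct toy_request *rq) { (void)rq; }

/* add_request(): pure bookkeeping, nothing to undo if we get signalled */
static void toy_add_request(struct toy_overlay *o, struct toy_request *rq,
			    void (*tail)(struct toy_overlay *))
{
	o->flip_request = rq;
	o->flip_tail = tail;
}

/* recover_from_interrupt(): idempotent, safe to re-run after -ERESTARTSYS */
static int toy_recover(struct toy_overlay *o)
{
	int ret;

	if (o->flip_request == NULL)
		return 0;

	ret = toy_wait(o->flip_request);
	if (ret)
		return ret;

	toy_put(o->flip_request);
	o->flip_request = NULL;

	if (o->flip_tail)
		o->flip_tail(o);
	return 0;
}

int main(void)
{
	struct toy_overlay o = { NULL, NULL };
	struct toy_request rq;

	toy_add_request(&o, &rq, NULL);
	return toy_recover(&o);
}
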
> +
> +static struct i915_gem_request *
> +intel_overlay_alloc_request(struct intel_overlay *overlay)
> +{
> +	struct drm_i915_private *i915 = to_i915(overlay->dev);
> +	return intel_engine_alloc_request(RCS_ENGINE(i915),
> +					  RCS_ENGINE(i915)->default_context);
> +}
> +
>   /* overlay needs to be disable in OCMD reg */
>   static int intel_overlay_on(struct intel_overlay *overlay)
>   {
>   	struct drm_device *dev = overlay->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> -	int ret;
> +	struct i915_gem_request *rq;
> +	struct intel_ringbuffer *ring;
>   
>   	BUG_ON(overlay->active);
>   	overlay->active = 1;
>   
>   	WARN_ON(IS_I830(dev) && !(dev_priv->quirks & QUIRK_PIPEA_FORCE));
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	rq = intel_overlay_alloc_request(overlay);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	ring = intel_ring_begin(rq, 3);
> +	if (IS_ERR(ring)) {
> +		i915_request_put(rq);
> +		return PTR_ERR(ring);
> +	}
>   
>   	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_ON);
>   	intel_ring_emit(ring, overlay->flip_addr | OFC_UPDATE);
>   	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
> -	return intel_overlay_do_wait_request(overlay, NULL);
> +	return intel_overlay_do_wait_request(overlay, rq, NULL);
>   }
>   
>   /* overlay needs to be enabled in OCMD reg */
> @@ -263,10 +296,10 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
>   {
>   	struct drm_device *dev = overlay->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
>   	u32 flip_addr = overlay->flip_addr;
> +	struct i915_gem_request *rq;
> +	struct intel_ringbuffer *ring;
>   	u32 tmp;
> -	int ret;
>   
>   	BUG_ON(!overlay->active);
>   
> @@ -278,21 +311,30 @@ static int intel_overlay_continue(struct intel_overlay *overlay,
>   	if (tmp & (1 << 17))
>   		DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp);
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	rq = intel_overlay_alloc_request(overlay);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	ring = intel_ring_begin(rq, 2);
> +	if (IS_ERR(ring)) {
> +		i915_request_put(rq);
> +		return PTR_ERR(ring);
> +	}
>   
>   	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
>   	intel_ring_emit(ring, flip_addr);
>   	intel_ring_advance(ring);
>   
> -	return i915_add_request(ring, &overlay->last_flip_req);
> +	return intel_overlay_add_request(overlay, rq, NULL);
>   }
>   
>   static void intel_overlay_release_old_vid_tail(struct intel_overlay *overlay)
>   {
>   	struct drm_i915_gem_object *obj = overlay->old_vid_bo;
>   
> +	i915_gem_track_fb(obj, NULL,
> +			  INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe));
> +
>   	i915_gem_object_ggtt_unpin(obj);
>   	drm_gem_object_unreference(&obj->base);
>   
> @@ -319,10 +361,10 @@ static void intel_overlay_off_tail(struct intel_overlay *overlay)
>   static int intel_overlay_off(struct intel_overlay *overlay)
>   {
>   	struct drm_device *dev = overlay->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
>   	u32 flip_addr = overlay->flip_addr;
> -	int ret;
> +	struct i915_gem_request *rq;
> +	struct intel_ringbuffer *ring;
> +	int len;
>   
>   	BUG_ON(!overlay->active);
>   
> @@ -332,53 +374,36 @@ static int intel_overlay_off(struct intel_overlay *overlay)
>   	 * of the hw. Do it in both cases */
>   	flip_addr |= OFC_UPDATE;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	rq = intel_overlay_alloc_request(overlay);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	len = 3;
> +	if (!IS_I830(dev))
> +		len += 3;
> +
> +	ring = intel_ring_begin(rq, len);
> +	if (IS_ERR(ring)) {
> +		i915_request_put(rq);
> +		return PTR_ERR(ring);
> +	}
>   
>   	/* wait for overlay to go idle */
>   	intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_CONTINUE);
>   	intel_ring_emit(ring, flip_addr);
>   	intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
> -	/* turn overlay off */
> -	if (IS_I830(dev)) {
> -		/* Workaround: Don't disable the overlay fully, since otherwise
> -		 * it dies on the next OVERLAY_ON cmd. */
> -		intel_ring_emit(ring, MI_NOOP);
> -		intel_ring_emit(ring, MI_NOOP);
> -		intel_ring_emit(ring, MI_NOOP);
> -	} else {
> +	/* turn overlay off
> +	 * Workaround for i830: Don't disable the overlay fully, since
> +	 * otherwise it dies on the next OVERLAY_ON cmd.
> +	 */
> +	if (!IS_I830(dev)) {
>   		intel_ring_emit(ring, MI_OVERLAY_FLIP | MI_OVERLAY_OFF);
>   		intel_ring_emit(ring, flip_addr);
>   		intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
>   	}
>   	intel_ring_advance(ring);
>   
> -	return intel_overlay_do_wait_request(overlay, intel_overlay_off_tail);
> -}
> -
> -/* recover from an interruption due to a signal
> - * We have to be careful not to repeat work forever an make forward progess. */
> -static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
> -{
> -	struct drm_device *dev = overlay->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> -	int ret;
> -
> -	if (overlay->last_flip_req == 0)
> -		return 0;
> -
> -	ret = i915_wait_seqno(ring, overlay->last_flip_req);
> -	if (ret)
> -		return ret;
> -	i915_gem_retire_requests(dev);
> -
> -	if (overlay->flip_tail)
> -		overlay->flip_tail(overlay);
> -
> -	overlay->last_flip_req = 0;
> -	return 0;
> +	return intel_overlay_do_wait_request(overlay, rq, intel_overlay_off_tail);
>   }
>   
>   /* Wait for pending overlay flip and release old frame.
> @@ -387,10 +412,8 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
>    */
>   static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
>   {
> -	struct drm_device *dev = overlay->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> -	int ret;
> +	struct drm_i915_private *dev_priv = to_i915(overlay->dev);
> +	int ret = 0;
>   
>   	/* Only wait if there is actually an old frame to release to
>   	 * guarantee forward progress.
> @@ -399,27 +422,29 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
>   		return 0;
>   
>   	if (I915_READ(ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT) {
> +		struct i915_gem_request *rq;
> +		struct intel_ringbuffer *ring;
> +
> +		rq = intel_overlay_alloc_request(overlay);
> +		if (IS_ERR(rq))
> +			return PTR_ERR(rq);
> +
>   		/* synchronous slowpath */
> -		ret = intel_ring_begin(ring, 2);
> -		if (ret)
> -			return ret;
> +		ring = intel_ring_begin(rq, 1);
> +		if (IS_ERR(ring)) {
> +			i915_request_put(rq);
> +			return PTR_ERR(ring);
> +		}
>   
>   		intel_ring_emit(ring, MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP);
> -		intel_ring_emit(ring, MI_NOOP);
>   		intel_ring_advance(ring);
>   
> -		ret = intel_overlay_do_wait_request(overlay,
> +		ret = intel_overlay_do_wait_request(overlay, rq,
>   						    intel_overlay_release_old_vid_tail);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	intel_overlay_release_old_vid_tail(overlay);
> -
> +	} else
> +		intel_overlay_release_old_vid_tail(overlay);
>   
> -	i915_gem_track_fb(overlay->old_vid_bo, NULL,
> -			  INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe));
> -	return 0;
> +	return ret;
>   }
>   
>   struct put_image_params {
> @@ -821,12 +846,7 @@ int intel_overlay_switch_off(struct intel_overlay *overlay)
>   	iowrite32(0, &regs->OCMD);
>   	intel_overlay_unmap_regs(overlay, regs);
>   
> -	ret = intel_overlay_off(overlay);
> -	if (ret != 0)
> -		return ret;
> -
> -	intel_overlay_off_tail(overlay);
> -	return 0;
> +	return intel_overlay_off(overlay);
>   }
>   
>   static int check_overlay_possible_on_crtc(struct intel_overlay *overlay,
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 45f71e6dc544..46e7cbb5e4d8 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3628,9 +3628,11 @@ static int sanitize_rc6_option(const struct drm_device *dev, int enable_rc6)
>   		return enable_rc6 & mask;
>   	}
>   
> -	/* Disable RC6 on Ironlake */
> -	if (INTEL_INFO(dev)->gen == 5)
> +#ifdef CONFIG_INTEL_IOMMU
> +	/* Ironlake + RC6 + VT-d empirically blows up */
> +	if (IS_GEN5(dev) && intel_iommu_gfx_mapped)
>   		return 0;
> +#endif
>   
>   	if (IS_IVYBRIDGE(dev))
>   		return (INTEL_RC6_ENABLE | INTEL_RC6p_ENABLE);
> @@ -3781,7 +3783,7 @@ void bdw_software_turbo(struct drm_device *dev)
>   static void gen8_enable_rps(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	uint32_t rc6_mask = 0, rp_state_cap;
>   	uint32_t threshold_up_pct, threshold_down_pct;
>   	uint32_t ei_up, ei_down; /* up and down evaluation interval */
> @@ -3808,8 +3810,8 @@ static void gen8_enable_rps(struct drm_device *dev)
>   	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
>   	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
>   	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
> -	for_each_ring(ring, dev_priv, unused)
> -		I915_WRITE(RING_MAX_IDLE(ring->mmio_base), 10);
> +	for_each_engine(engine, dev_priv, unused)
> +		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
>   	I915_WRITE(GEN6_RC_SLEEP, 0);
>   	if (IS_BROADWELL(dev))
>   		I915_WRITE(GEN6_RC6_THRESHOLD, 625); /* 800us/1.28 for TO */
> @@ -3909,7 +3911,7 @@ static void gen8_enable_rps(struct drm_device *dev)
>   static void gen6_enable_rps(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	u32 rp_state_cap;
>   	u32 rc6vids, pcu_mbox = 0, rc6_mask = 0;
>   	u32 gtfifodbg;
> @@ -3947,8 +3949,8 @@ static void gen6_enable_rps(struct drm_device *dev)
>   	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
>   	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
>   
> -	for_each_ring(ring, dev_priv, i)
> -		I915_WRITE(RING_MAX_IDLE(ring->mmio_base), 10);
> +	for_each_engine(engine, dev_priv, i)
> +		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
>   
>   	I915_WRITE(GEN6_RC_SLEEP, 0);
>   	I915_WRITE(GEN6_RC1e_THRESHOLD, 1000);
> @@ -4408,7 +4410,7 @@ static void valleyview_cleanup_gt_powersave(struct drm_device *dev)
>   static void cherryview_enable_rps(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	u32 gtfifodbg, val, rc6_mode = 0, pcbr;
>   	int i;
>   
> @@ -4432,8 +4434,8 @@ static void cherryview_enable_rps(struct drm_device *dev)
>   	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
>   	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
>   
> -	for_each_ring(ring, dev_priv, i)
> -		I915_WRITE(RING_MAX_IDLE(ring->mmio_base), 10);
> +	for_each_engine(engine, dev_priv, i)
> +		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
>   	I915_WRITE(GEN6_RC_SLEEP, 0);
>   
>   	I915_WRITE(GEN6_RC6_THRESHOLD, 50000); /* 50/125ms per EI */
> @@ -4500,7 +4502,7 @@ static void cherryview_enable_rps(struct drm_device *dev)
>   static void valleyview_enable_rps(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	u32 gtfifodbg, val, rc6_mode = 0;
>   	int i;
>   
> @@ -4537,8 +4539,8 @@ static void valleyview_enable_rps(struct drm_device *dev)
>   	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
>   	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
>   
> -	for_each_ring(ring, dev_priv, i)
> -		I915_WRITE(RING_MAX_IDLE(ring->mmio_base), 10);
> +	for_each_engine(engine, dev_priv, i)
> +		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
>   
>   	I915_WRITE(GEN6_RC6_THRESHOLD, 0x557);
>   
> @@ -4581,12 +4583,6 @@ void ironlake_teardown_rc6(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   
> -	if (dev_priv->ips.renderctx) {
> -		i915_gem_object_ggtt_unpin(dev_priv->ips.renderctx);
> -		drm_gem_object_unreference(&dev_priv->ips.renderctx->base);
> -		dev_priv->ips.renderctx = NULL;
> -	}
> -
>   	if (dev_priv->ips.pwrctx) {
>   		i915_gem_object_ggtt_unpin(dev_priv->ips.pwrctx);
>   		drm_gem_object_unreference(&dev_priv->ips.pwrctx->base);
> @@ -4616,11 +4612,6 @@ static int ironlake_setup_rc6(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   
> -	if (dev_priv->ips.renderctx == NULL)
> -		dev_priv->ips.renderctx = intel_alloc_context_page(dev);
> -	if (!dev_priv->ips.renderctx)
> -		return -ENOMEM;
> -
>   	if (dev_priv->ips.pwrctx == NULL)
>   		dev_priv->ips.pwrctx = intel_alloc_context_page(dev);
>   	if (!dev_priv->ips.pwrctx) {
> @@ -4634,9 +4625,6 @@ static int ironlake_setup_rc6(struct drm_device *dev)
>   static void ironlake_enable_rc6(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> -	bool was_interruptible;
> -	int ret;
>   
>   	/* rc6 disabled by default due to repeated reports of hanging during
>   	 * boot and resume.
> @@ -4646,46 +4634,8 @@ static void ironlake_enable_rc6(struct drm_device *dev)
>   
>   	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>   
> -	ret = ironlake_setup_rc6(dev);
> -	if (ret)
> -		return;
> -
> -	was_interruptible = dev_priv->mm.interruptible;
> -	dev_priv->mm.interruptible = false;
> -
> -	/*
> -	 * GPU can automatically power down the render unit if given a page
> -	 * to save state.
> -	 */
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret) {
> -		ironlake_teardown_rc6(dev);
> -		dev_priv->mm.interruptible = was_interruptible;
> -		return;
> -	}
> -
> -	intel_ring_emit(ring, MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN);
> -	intel_ring_emit(ring, MI_SET_CONTEXT);
> -	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(dev_priv->ips.renderctx) |
> -			MI_MM_SPACE_GTT |
> -			MI_SAVE_EXT_STATE_EN |
> -			MI_RESTORE_EXT_STATE_EN |
> -			MI_RESTORE_INHIBIT);
> -	intel_ring_emit(ring, MI_SUSPEND_FLUSH);
> -	intel_ring_emit(ring, MI_NOOP);
> -	intel_ring_emit(ring, MI_FLUSH);
> -	intel_ring_advance(ring);
> -
> -	/*
> -	 * Wait for the command parser to advance past MI_SET_CONTEXT. The HW
> -	 * does an implicit flush, combined with MI_FLUSH above, it should be
> -	 * safe to assume that renderctx is valid
> -	 */
> -	ret = intel_ring_idle(ring);
> -	dev_priv->mm.interruptible = was_interruptible;
> -	if (ret) {
> +	if (ironlake_setup_rc6(dev)) {
>   		DRM_ERROR("failed to enable ironlake power savings\n");
> -		ironlake_teardown_rc6(dev);
>   		return;
>   	}
>   
> @@ -5144,7 +5094,7 @@ EXPORT_SYMBOL_GPL(i915_gpu_lower);
>   bool i915_gpu_busy(void)
>   {
>   	struct drm_i915_private *dev_priv;
> -	struct intel_engine_cs *ring;
> +	struct intel_engine_cs *engine;
>   	bool ret = false;
>   	int i;
>   
> @@ -5153,8 +5103,8 @@ bool i915_gpu_busy(void)
>   		goto out_unlock;
>   	dev_priv = i915_mch_dev;
>   
> -	for_each_ring(ring, dev_priv, i)
> -		ret |= !list_empty(&ring->request_list);
> +	for_each_engine(engine, dev_priv, i)
> +		ret |= engine->last_request != NULL;
>   
>   out_unlock:
>   	spin_unlock_irq(&mchdev_lock);
> diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
> index 6c792d3a9c9c..fd4f66231d30 100644
> --- a/drivers/gpu/drm/i915/intel_renderstate.h
> +++ b/drivers/gpu/drm/i915/intel_renderstate.h
> @@ -24,7 +24,13 @@
>   #ifndef _INTEL_RENDERSTATE_H
>   #define _INTEL_RENDERSTATE_H
>   
> -#include "i915_drv.h"
> +#include <linux/types.h>
> +
> +struct intel_renderstate_rodata {
> +	const u32 *reloc;
> +	const u32 *batch;
> +	const u32 batch_items;
> +};
>   
>   extern const struct intel_renderstate_rodata gen6_null_state;
>   extern const struct intel_renderstate_rodata gen7_null_state;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1b1180797851..ae02b1757745 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -33,86 +33,34 @@
>   #include "i915_trace.h"
>   #include "intel_drv.h"
>   
> -bool
> -intel_ring_initialized(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -
> -	if (!dev)
> -		return false;
> -
> -	if (i915.enable_execlists) {
> -		struct intel_context *dctx = ring->default_context;
> -		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
> -
> -		return ringbuf->obj;
> -	} else
> -		return ring->buffer && ring->buffer->obj;
> -}
> -
> -int __intel_ring_space(int head, int tail, int size)
> -{
> -	int space = head - (tail + I915_RING_FREE_SPACE);
> -	if (space < 0)
> -		space += size;
> -	return space;
> -}
> -
> -int intel_ring_space(struct intel_ringbuffer *ringbuf)
> -{
> -	return __intel_ring_space(ringbuf->head & HEAD_ADDR,
> -				  ringbuf->tail, ringbuf->size);
> -}
> -
> -bool intel_ring_stopped(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring);
> -}
> -
> -void __intel_ring_advance(struct intel_engine_cs *ring)
> -{
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	ringbuf->tail &= ringbuf->size - 1;
> -	if (intel_ring_stopped(ring))
> -		return;
> -	ring->write_tail(ring, ringbuf->tail);
> -}
> -
>   static int
> -gen2_render_ring_flush(struct intel_engine_cs *ring,
> -		       u32	invalidate_domains,
> -		       u32	flush_domains)
> +gen2_emit_flush(struct i915_gem_request *rq, u32 flags)
>   {
> +	struct intel_ringbuffer *ring;
>   	u32 cmd;
> -	int ret;
>   
>   	cmd = MI_FLUSH;
> -	if (((invalidate_domains|flush_domains) & I915_GEM_DOMAIN_RENDER) == 0)
> +	if ((flags & (I915_FLUSH_CACHES | I915_INVALIDATE_CACHES)) == 0)
>   		cmd |= MI_NO_WRITE_FLUSH;
>   
> -	if (invalidate_domains & I915_GEM_DOMAIN_SAMPLER)
> +	if (flags & I915_INVALIDATE_CACHES)
>   		cmd |= MI_READ_FLUSH;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 1);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, cmd);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
>   static int
> -gen4_render_ring_flush(struct intel_engine_cs *ring,
> -		       u32	invalidate_domains,
> -		       u32	flush_domains)
> +gen4_emit_flush(struct i915_gem_request *rq, u32 flags)
>   {
> -	struct drm_device *dev = ring->dev;
> +	struct intel_ringbuffer *ring;
>   	u32 cmd;
> -	int ret;
>   
>   	/*
>   	 * read/write caches:
> @@ -142,22 +90,20 @@ gen4_render_ring_flush(struct intel_engine_cs *ring,
>   	 * are flushed at any MI_FLUSH.
>   	 */
>   
> -	cmd = MI_FLUSH | MI_NO_WRITE_FLUSH;
> -	if ((invalidate_domains|flush_domains) & I915_GEM_DOMAIN_RENDER)
> -		cmd &= ~MI_NO_WRITE_FLUSH;
> -	if (invalidate_domains & I915_GEM_DOMAIN_INSTRUCTION)
> +	cmd = MI_FLUSH;
> +	if ((flags & (I915_FLUSH_CACHES | I915_INVALIDATE_CACHES)) == 0)
> +		cmd |= MI_NO_WRITE_FLUSH;
> +	if (flags & I915_INVALIDATE_CACHES) {
>   		cmd |= MI_EXE_FLUSH;
> +		if (IS_G4X(rq->i915) || IS_GEN5(rq->i915))
> +			cmd |= MI_INVALIDATE_ISP;
> +	}
>   
> -	if (invalidate_domains & I915_GEM_DOMAIN_COMMAND &&
> -	    (IS_G4X(dev) || IS_GEN5(dev)))
> -		cmd |= MI_INVALIDATE_ISP;
> -
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 1);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, cmd);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
>   	return 0;
> @@ -201,100 +147,89 @@ gen4_render_ring_flush(struct intel_engine_cs *ring,
>    * really our business.  That leaves only stall at scoreboard.
>    */
>   static int
> -intel_emit_post_sync_nonzero_flush(struct intel_engine_cs *ring)
> +gen6_emit_post_sync_nonzero_flush(struct i915_gem_request *rq)
>   {
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> -	int ret;
> +	const u32 scratch = rq->engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
> +	struct intel_ringbuffer *ring;
>   
> +	ring = intel_ring_begin(rq, 8);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> -
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(5));
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
>   	intel_ring_emit(ring, PIPE_CONTROL_CS_STALL |
>   			PIPE_CONTROL_STALL_AT_SCOREBOARD);
> -	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */
> -	intel_ring_emit(ring, 0); /* low dword */
> -	intel_ring_emit(ring, 0); /* high dword */
> -	intel_ring_emit(ring, MI_NOOP);
> -	intel_ring_advance(ring);
> -
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	intel_ring_emit(ring, scratch | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, 0);
>   
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(5));
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
>   	intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE);
> -	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */
> -	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring, scratch | PIPE_CONTROL_GLOBAL_GTT);
>   	intel_ring_emit(ring, 0);
> -	intel_ring_emit(ring, MI_NOOP);
> -	intel_ring_advance(ring);
>   
> +	intel_ring_advance(ring);
>   	return 0;
>   }
>   
>   static int
> -gen6_render_ring_flush(struct intel_engine_cs *ring,
> -                         u32 invalidate_domains, u32 flush_domains)
> +gen6_render_emit_flush(struct i915_gem_request *rq, u32 flags)
>   {
> -	u32 flags = 0;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	const u32 scratch = rq->engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
> +	struct intel_ringbuffer *ring;
> +	u32 cmd = 0;
>   	int ret;
>   
> -	/* Force SNB workarounds for PIPE_CONTROL flushes */
> -	ret = intel_emit_post_sync_nonzero_flush(ring);
> -	if (ret)
> -		return ret;
> +	if (flags & I915_FLUSH_CACHES) {
> +		/* Force SNB workarounds for PIPE_CONTROL flushes */
> +		ret = gen6_emit_post_sync_nonzero_flush(rq);
> +		if (ret)
> +			return ret;
>   
> -	/* Just flush everything.  Experiments have shown that reducing the
> -	 * number of bits based on the write domains has little performance
> -	 * impact.
> -	 */
> -	if (flush_domains) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> -		/*
> -		 * Ensure that any following seqno writes only happen
> -		 * when the render cache is indeed flushed.
> -		 */
> -		flags |= PIPE_CONTROL_CS_STALL;
> -	}
> -	if (invalidate_domains) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		cmd |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +	}
> +	if (flags & I915_INVALIDATE_CACHES) {
> +		cmd |= PIPE_CONTROL_TLB_INVALIDATE;
> +		cmd |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
>   		/*
>   		 * TLB invalidate requires a post-sync write.
>   		 */
> -		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> +		cmd |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
>   	}
> +	if (flags & I915_COMMAND_BARRIER)
> +		/*
> +		 * Ensure that any following seqno writes only happen
> +		 * when the render cache is indeed flushed.
> +		 */
> +		cmd |= PIPE_CONTROL_CS_STALL;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	if (cmd) {
> +		ring = intel_ring_begin(rq, 4);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
>   
> -	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> -	intel_ring_emit(ring, flags);
> -	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, 0);
> -	intel_ring_advance(ring);
> +		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> +		intel_ring_emit(ring, cmd);
> +		intel_ring_emit(ring, scratch | PIPE_CONTROL_GLOBAL_GTT);
> +		intel_ring_emit(ring, 0);
> +		intel_ring_advance(ring);
> +	}
>   
>   	return 0;
>   }
>   
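
So the old invalidate_domains/flush_domains pair becomes a single flags word,
and the gen6 path only emits its PIPE_CONTROL when one of the bits actually
asks for work. To make sure I have the intended semantics right, here is a
hypothetical call site under that reading (the flag names are from this patch,
the surrounding caller is made up purely for illustration):

        int ret;

        /* before the batch: invalidate caches, no barrier needed yet */
        ret = gen6_render_emit_flush(rq, I915_INVALIDATE_CACHES);
        if (ret)
                return ret;

        /* after the batch, before the breadcrumb: flush and stall */
        ret = gen6_render_emit_flush(rq, I915_FLUSH_CACHES | I915_COMMAND_BARRIER);
        if (ret)
                return ret;

If that matches the intent, fine; if not, it would be good to spell out what
each flag is supposed to guarantee somewhere near their definition.
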
>   static int
> -gen7_render_ring_cs_stall_wa(struct intel_engine_cs *ring)
> +gen7_render_ring_cs_stall_wa(struct i915_gem_request *rq)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
>   	intel_ring_emit(ring, PIPE_CONTROL_CS_STALL |
> @@ -306,35 +241,32 @@ gen7_render_ring_cs_stall_wa(struct intel_engine_cs *ring)
>   	return 0;
>   }
>   
> -static int gen7_ring_fbc_flush(struct intel_engine_cs *ring, u32 value)
> +static int gen7_ring_fbc_flush(struct i915_gem_request *rq, u32 value)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	if (!ring->fbc_dirty)
> -		return 0;
> +	ring = intel_ring_begin(rq, 6);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
>   	/* WaFbcNukeOn3DBlt:ivb/hsw */
>   	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
>   	intel_ring_emit(ring, MSG_FBC_REND_STATE);
>   	intel_ring_emit(ring, value);
>   	intel_ring_emit(ring, MI_STORE_REGISTER_MEM(1) | MI_SRM_LRM_GLOBAL_GTT);
>   	intel_ring_emit(ring, MSG_FBC_REND_STATE);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset + 256);
> +	intel_ring_emit(ring, rq->engine->scratch.gtt_offset + 256);
>   	intel_ring_advance(ring);
>   
> -	ring->fbc_dirty = false;
>   	return 0;
>   }
>   
>   static int
> -gen7_render_ring_flush(struct intel_engine_cs *ring,
> -		       u32 invalidate_domains, u32 flush_domains)
> +gen7_render_emit_flush(struct i915_gem_request *rq, u32 flags)
>   {
> -	u32 flags = 0;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	const u32 scratch_addr = rq->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	struct intel_ringbuffer *ring;
> +	u32 cmd = 0;
>   	int ret;
>   
>   	/*
> @@ -345,63 +277,71 @@ gen7_render_ring_flush(struct intel_engine_cs *ring,
>   	 * read-cache invalidate bits set) must have the CS_STALL bit set. We
>   	 * don't try to be clever and just set it unconditionally.
>   	 */
> -	flags |= PIPE_CONTROL_CS_STALL;
> +	cmd |= PIPE_CONTROL_CS_STALL;
>   
>   	/* Just flush everything.  Experiments have shown that reducing the
>   	 * number of bits based on the write domains has little performance
>   	 * impact.
>   	 */
> -	if (flush_domains) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> -	}
> -	if (invalidate_domains) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +	if (flags & I915_FLUSH_CACHES) {
> +		cmd |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		cmd |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +	}
> +	if (flags & I915_INVALIDATE_CACHES) {
> +		cmd |= PIPE_CONTROL_TLB_INVALIDATE;
> +		cmd |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
>   		/*
>   		 * TLB invalidate requires a post-sync write.
>   		 */
> -		flags |= PIPE_CONTROL_QW_WRITE;
> -		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> +		cmd |= PIPE_CONTROL_QW_WRITE;
> +		cmd |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>   
>   		/* Workaround: we must issue a pipe_control with CS-stall bit
>   		 * set before a pipe_control command that has the state cache
>   		 * invalidate bit set. */
> -		gen7_render_ring_cs_stall_wa(ring);
> +		ret = gen7_render_ring_cs_stall_wa(rq);
> +		if (ret)
> +			return ret;
>   	}
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> -	intel_ring_emit(ring, flags);
> +	intel_ring_emit(ring, cmd);
>   	intel_ring_emit(ring, scratch_addr);
>   	intel_ring_emit(ring, 0);
>   	intel_ring_advance(ring);
>   
> -	if (!invalidate_domains && flush_domains)
> -		return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
> +	if (flags & I915_KICK_FBC) {
> +		ret = gen7_ring_fbc_flush(rq, FBC_REND_NUKE);
> +		if (ret)
> +			return ret;
> +	}
>   
>   	return 0;
>   }
>   
>   static int
> -gen8_emit_pipe_control(struct intel_engine_cs *ring,
> -		       u32 flags, u32 scratch_addr)
> +gen8_emit_pipe_control(struct i915_gem_request *rq,
> +		       u32 cmd, u32 scratch_addr)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 6);
> -	if (ret)
> -		return ret;
> +	if (cmd == 0)
> +		return 0;
> +
> +	ring = intel_ring_begin(rq, 6);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
> -	intel_ring_emit(ring, flags);
> +	intel_ring_emit(ring, cmd);
>   	intel_ring_emit(ring, scratch_addr);
>   	intel_ring_emit(ring, 0);
>   	intel_ring_emit(ring, 0);
> @@ -412,31 +352,29 @@ gen8_emit_pipe_control(struct intel_engine_cs *ring,
>   }
>   
>   static int
> -gen8_render_ring_flush(struct intel_engine_cs *ring,
> -		       u32 invalidate_domains, u32 flush_domains)
> +gen8_render_emit_flush(struct i915_gem_request *rq,
> +		       u32 flags)
>   {
> -	u32 flags = 0;
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	const u32 scratch_addr = rq->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	u32 cmd = 0;
>   	int ret;
>   
> -	flags |= PIPE_CONTROL_CS_STALL;
> -
> -	if (flush_domains) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +	if (flags & I915_FLUSH_CACHES) {
> +		cmd |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		cmd |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
>   	}
> -	if (invalidate_domains) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_QW_WRITE;
> -		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> +	if (flags & I915_INVALIDATE_CACHES) {
> +		cmd |= PIPE_CONTROL_TLB_INVALIDATE;
> +		cmd |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +		cmd |= PIPE_CONTROL_QW_WRITE;
> +		cmd |= PIPE_CONTROL_GLOBAL_GTT_IVB;
>   
>   		/* WaCsStallBeforeStateCacheInvalidate:bdw,chv */
> -		ret = gen8_emit_pipe_control(ring,
> +		ret = gen8_emit_pipe_control(rq,
>   					     PIPE_CONTROL_CS_STALL |
>   					     PIPE_CONTROL_STALL_AT_SCOREBOARD,
>   					     0);
> @@ -444,304 +382,419 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
>   			return ret;
>   	}
>   
> -	ret = gen8_emit_pipe_control(ring, flags, scratch_addr);
> +	if (flags & I915_COMMAND_BARRIER)
> +		cmd |= PIPE_CONTROL_CS_STALL;
> +
> +	ret = gen8_emit_pipe_control(rq, cmd, scratch_addr);
>   	if (ret)
>   		return ret;
>   
> -	if (!invalidate_domains && flush_domains)
> -		return gen7_ring_fbc_flush(ring, FBC_REND_NUKE);
> +	if (flags & I915_KICK_FBC) {
> +		ret = gen7_ring_fbc_flush(rq, FBC_REND_NUKE);
> +		if (ret)
> +			return ret;
> +	}
>   
>   	return 0;
>   }
>   
> -static void ring_write_tail(struct intel_engine_cs *ring,
> +static void ring_write_tail(struct intel_engine_cs *engine,
>   			    u32 value)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	I915_WRITE_TAIL(ring, value);
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	I915_WRITE_TAIL(engine, value);
>   }
>   
> -u64 intel_ring_get_active_head(struct intel_engine_cs *ring)
> +u64 intel_engine_get_active_head(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	u64 acthd;
>   
> -	if (INTEL_INFO(ring->dev)->gen >= 8)
> -		acthd = I915_READ64_2x32(RING_ACTHD(ring->mmio_base),
> -					 RING_ACTHD_UDW(ring->mmio_base));
> -	else if (INTEL_INFO(ring->dev)->gen >= 4)
> -		acthd = I915_READ(RING_ACTHD(ring->mmio_base));
> +	if (INTEL_INFO(dev_priv)->gen >= 8)
> +		acthd = I915_READ64_2x32(RING_ACTHD(engine->mmio_base),
> +					 RING_ACTHD_UDW(engine->mmio_base));
> +	else if (INTEL_INFO(dev_priv)->gen >= 4)
> +		acthd = I915_READ(RING_ACTHD(engine->mmio_base));
>   	else
>   		acthd = I915_READ(ACTHD);
>   
>   	return acthd;
>   }
>   
> -static void ring_setup_phys_status_page(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	u32 addr;
> -
> -	addr = dev_priv->status_page_dmah->busaddr;
> -	if (INTEL_INFO(ring->dev)->gen >= 4)
> -		addr |= (dev_priv->status_page_dmah->busaddr >> 28) & 0xf0;
> -	I915_WRITE(HWS_PGA, addr);
> -}
> -
> -static bool stop_ring(struct intel_engine_cs *ring)
> +static bool engine_stop(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	struct drm_i915_private *dev_priv = engine->i915;
>   
> -	if (!IS_GEN2(ring->dev)) {
> -		I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
> -		if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000)) {
> -			DRM_ERROR("%s : timed out trying to stop ring\n", ring->name);
> +	if (!IS_GEN2(dev_priv)) {
> +		I915_WRITE_MODE(engine, _MASKED_BIT_ENABLE(STOP_RING));
> +		if (wait_for((I915_READ_MODE(engine) & MODE_IDLE) != 0, 1000)) {
> +			DRM_ERROR("%s : timed out trying to stop ring\n", engine->name);
>   			/* Sometimes we observe that the idle flag is not
>   			 * set even though the ring is empty. So double
>   			 * check before giving up.
>   			 */
> -			if (I915_READ_HEAD(ring) != I915_READ_TAIL(ring))
> +			if (I915_READ_HEAD(engine) != I915_READ_TAIL(engine))
>   				return false;
>   		}
>   	}
>   
> -	I915_WRITE_CTL(ring, 0);
> -	I915_WRITE_HEAD(ring, 0);
> -	ring->write_tail(ring, 0);
> +	I915_WRITE_CTL(engine, 0);
> +	I915_WRITE_HEAD(engine, 0);
> +	engine->write_tail(engine, 0);
>   
> -	if (!IS_GEN2(ring->dev)) {
> -		(void)I915_READ_CTL(ring);
> -		I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
> +	if (!IS_GEN2(dev_priv)) {
> +		(void)I915_READ_CTL(engine);
> +		I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
>   	}
>   
> -	return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0;
> +	return (I915_READ_HEAD(engine) & HEAD_ADDR) == 0;
> +}
> +
> +static int engine_suspend(struct intel_engine_cs *engine)
> +{
> +	return engine_stop(engine) ? 0 : -EIO;
>   }
>   
> -static int init_ring_common(struct intel_engine_cs *ring)
> +static int enable_status_page(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	struct drm_i915_gem_object *obj = ringbuf->obj;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	u32 mmio, addr;
>   	int ret = 0;
>   
> -	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	if (!I915_NEED_GFX_HWS(dev_priv)) {
> +		addr = dev_priv->status_page_dmah->busaddr;
> +		if (INTEL_INFO(dev_priv)->gen >= 4)
> +			addr |= (dev_priv->status_page_dmah->busaddr >> 28) & 0xf0;
> +		mmio = HWS_PGA;
> +	} else {
> +		addr = engine->status_page.gfx_addr;
> +		/* The ring status page addresses are no longer next to the rest of
> +		 * the ring registers as of gen7.
> +		 */
> +		if (IS_GEN7(dev_priv)) {
> +			switch (engine->id) {
> +			default:
> +			case RCS:
> +				mmio = RENDER_HWS_PGA_GEN7;
> +				break;
> +			case BCS:
> +				mmio = BLT_HWS_PGA_GEN7;
> +				break;
> +				/*
> +				 * VCS2 doesn't actually exist on Gen7; it is listed
> +				 * only to silence gcc's switch check warning.
> +				 */
> +			case VCS2:
> +			case VCS:
> +				mmio = BSD_HWS_PGA_GEN7;
> +				break;
> +			case VECS:
> +				mmio = VEBOX_HWS_PGA_GEN7;
> +				break;
> +			}
> +		} else if (IS_GEN6(dev_priv)) {
> +			mmio = RING_HWS_PGA_GEN6(engine->mmio_base);
> +		} else {
> +			/* XXX: gen8 returns to sanity */
> +			mmio = RING_HWS_PGA(engine->mmio_base);
> +		}
> +	}
> +
> +	I915_WRITE(mmio, addr);
> +	POSTING_READ(mmio);
> +
> +	/*
> +	 * Flush the TLB for this page
> +	 *
> +	 * FIXME: These two bits have disappeared on gen8, so a question
> +	 * arises: do we still need this and if so how should we go about
> +	 * invalidating the TLB?
> +	 */
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && INTEL_INFO(dev_priv)->gen < 8) {
> +		u32 reg = RING_INSTPM(engine->mmio_base);
> +
> +		/* ring should be idle before issuing a sync flush */
> +		WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>   
> -	if (!stop_ring(ring)) {
> +		I915_WRITE(reg,
> +			   _MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE |
> +					      INSTPM_SYNC_FLUSH));
> +		if (wait_for((I915_READ(reg) & INSTPM_SYNC_FLUSH) == 0,
> +			     1000)) {
> +			DRM_ERROR("%s: wait for SyncFlush to complete for TLB invalidation timed out\n",
> +				  engine->name);
> +			ret = -EIO;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static struct intel_ringbuffer *
> +engine_get_ring(struct intel_engine_cs *engine,
> +		struct intel_context *ctx)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct intel_ringbuffer *ring;
> +	int ret = 0;
> +
> +	ring = engine->legacy_ring;
> +	if (ring)
> +		return ring;
> +
> +	ring = intel_engine_alloc_ring(engine, ctx, 32 * PAGE_SIZE);
> +	if (IS_ERR(ring)) {
> +		DRM_ERROR("Failed to allocate ringbuffer for %s: %ld\n", engine->name, PTR_ERR(ring));
> +		return ERR_CAST(ring);
> +	}
> +
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +	if (!engine_stop(engine)) {
>   		/* G45 ring initialization often fails to reset head to zero */
>   		DRM_DEBUG_KMS("%s head not reset to zero "
>   			      "ctl %08x head %08x tail %08x start %08x\n",
> -			      ring->name,
> -			      I915_READ_CTL(ring),
> -			      I915_READ_HEAD(ring),
> -			      I915_READ_TAIL(ring),
> -			      I915_READ_START(ring));
> -
> -		if (!stop_ring(ring)) {
> +			      engine->name,
> +			      I915_READ_CTL(engine),
> +			      I915_READ_HEAD(engine),
> +			      I915_READ_TAIL(engine),
> +			      I915_READ_START(engine));
> +		if (!engine_stop(engine)) {
>   			DRM_ERROR("failed to set %s head to zero "
>   				  "ctl %08x head %08x tail %08x start %08x\n",
> -				  ring->name,
> -				  I915_READ_CTL(ring),
> -				  I915_READ_HEAD(ring),
> -				  I915_READ_TAIL(ring),
> -				  I915_READ_START(ring));
> +				  engine->name,
> +				  I915_READ_CTL(engine),
> +				  I915_READ_HEAD(engine),
> +				  I915_READ_TAIL(engine),
> +				  I915_READ_START(engine));
>   			ret = -EIO;
> -			goto out;
>   		}
>   	}
> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>   
> -	if (I915_NEED_GFX_HWS(dev))
> -		intel_ring_setup_status_page(ring);
> -	else
> -		ring_setup_phys_status_page(ring);
> +	if (ret == 0) {
> +		engine->legacy_ring = ring;
> +	} else {
> +		intel_ring_free(ring);
> +		ring = ERR_PTR(ret);
> +	}
> +
> +	return ring;
> +}
> +
> +static int engine_resume(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	struct intel_ringbuffer *ring = engine->legacy_ring;
> +	int retry = 3, ret;
> +
> +	if (WARN_ON(ring == NULL))
> +		return -ENODEV;
> +
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	ret = enable_status_page(engine);
>   
> +reset:
>   	/* Enforce ordering by reading HEAD register back */
> -	I915_READ_HEAD(ring);
> +	engine->write_tail(engine, ring->tail);
> +	I915_WRITE_HEAD(engine, ring->head);
> +	(void)I915_READ_HEAD(engine);
>   
>   	/* Initialize the ring. This must happen _after_ we've cleared the ring
>   	 * registers with the above sequence (the readback of the HEAD registers
>   	 * also enforces ordering), otherwise the hw might lose the new ring
>   	 * register values. */
> -	I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
> +	I915_WRITE_START(engine, i915_gem_obj_ggtt_offset(ring->obj));
>   
>   	/* WaClearRingBufHeadRegAtInit:ctg,elk */
> -	if (I915_READ_HEAD(ring))
> +	if (I915_READ_HEAD(engine) != ring->head)
>   		DRM_DEBUG("%s initialization failed [head=%08x], fudging\n",
> -			  ring->name, I915_READ_HEAD(ring));
> -	I915_WRITE_HEAD(ring, 0);
> -	(void)I915_READ_HEAD(ring);
> -
> -	I915_WRITE_CTL(ring,
> -			((ringbuf->size - PAGE_SIZE) & RING_NR_PAGES)
> -			| RING_VALID);
> -
> -	/* If the head is still not zero, the ring is dead */
> -	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
> -		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
> -		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
> +			  engine->name, I915_READ_HEAD(engine));
> +	I915_WRITE_HEAD(engine, ring->head);
> +	(void)I915_READ_HEAD(engine);
> +
> +	I915_WRITE_CTL(engine,
> +		       ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
> +		       | RING_VALID);
> +
> +	if (wait_for((I915_READ_CTL(engine) & RING_VALID) != 0, 50)) {
> +		if (retry-- && engine_stop(engine))
> +			goto reset;
> +	}
> +
> +	if ((I915_READ_CTL(engine) & RING_VALID) == 0 ||
> +	    I915_READ_START(engine) != i915_gem_obj_ggtt_offset(ring->obj)) {
>   		DRM_ERROR("%s initialization failed "
> -			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
> -			  ring->name,
> -			  I915_READ_CTL(ring), I915_READ_CTL(ring) & RING_VALID,
> -			  I915_READ_HEAD(ring), I915_READ_TAIL(ring),
> -			  I915_READ_START(ring), (unsigned long)i915_gem_obj_ggtt_offset(obj));
> +			  "ctl %08x (valid? %d) head %08x [expected %08x], tail %08x [expected %08x], start %08x [expected %08lx]\n",
> +			  engine->name,
> +			  I915_READ_CTL(engine), I915_READ_CTL(engine) & RING_VALID,
> +			  I915_READ_HEAD(engine), ring->head,
> +			  I915_READ_TAIL(engine), ring->tail,
> +			  I915_READ_START(engine), (unsigned long)i915_gem_obj_ggtt_offset(ring->obj));
>   		ret = -EIO;
> -		goto out;
>   	}
>   
> -	ringbuf->head = I915_READ_HEAD(ring);
> -	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
> -	ringbuf->space = intel_ring_space(ringbuf);
> -	ringbuf->last_retired_head = -1;
> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +	return ret;
> +}
>   
> -	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> +static void engine_put_ring(struct intel_ringbuffer *ring,
> +			    struct intel_context *ctx)
> +{
> +	if (ring->last_context == ctx) {
> +		struct i915_gem_request *rq;
> +		int ret = -EINVAL;
>   
> -out:
> -	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> +		rq = intel_engine_alloc_request(ring->engine,
> +						ring->engine->default_context);
> +		if (!IS_ERR(rq)) {
> +			ret = i915_request_commit(rq);
> +			i915_request_put(rq);
> +		}
> +		if (WARN_ON(ret))
> +			ring->last_context = ring->engine->default_context;
> +	}
> +}
>   
> -	return ret;
> +static int engine_add_request(struct i915_gem_request *rq)
> +{
> +	rq->engine->write_tail(rq->engine, rq->tail);
> +	list_add_tail(&rq->engine_list, &rq->engine->requests);
> +	return 0;
>   }
>   
> -void
> -intel_fini_pipe_control(struct intel_engine_cs *ring)
> +static bool engine_rq_is_complete(struct i915_gem_request *rq)
>   {
> -	struct drm_device *dev = ring->dev;
> +	return __i915_seqno_passed(rq->engine->get_seqno(rq->engine),
> +				   rq->seqno);
> +}
>   
> -	if (ring->scratch.obj == NULL)
> +static void
> +fini_pipe_control(struct intel_engine_cs *engine)
> +{
> +	if (engine->scratch.obj == NULL)
>   		return;
>   
> -	if (INTEL_INFO(dev)->gen >= 5) {
> -		kunmap(sg_page(ring->scratch.obj->pages->sgl));
> -		i915_gem_object_ggtt_unpin(ring->scratch.obj);
> +	if (INTEL_INFO(engine->i915)->gen >= 5) {
> +		kunmap(sg_page(engine->scratch.obj->pages->sgl));
> +		i915_gem_object_ggtt_unpin(engine->scratch.obj);
>   	}
>   
> -	drm_gem_object_unreference(&ring->scratch.obj->base);
> -	ring->scratch.obj = NULL;
> +	drm_gem_object_unreference(&engine->scratch.obj->base);
> +	engine->scratch.obj = NULL;
>   }
>   
> -int
> -intel_init_pipe_control(struct intel_engine_cs *ring)
> +static int
> +init_pipe_control(struct intel_engine_cs *engine)
>   {
>   	int ret;
>   
> -	if (ring->scratch.obj)
> +	if (engine->scratch.obj)
>   		return 0;
>   
> -	ring->scratch.obj = i915_gem_alloc_object(ring->dev, 4096);
> -	if (ring->scratch.obj == NULL) {
> +	engine->scratch.obj = i915_gem_alloc_object(engine->i915->dev, 4096);
> +	if (engine->scratch.obj == NULL) {
>   		DRM_ERROR("Failed to allocate seqno page\n");
>   		ret = -ENOMEM;
>   		goto err;
>   	}
>   
> -	ret = i915_gem_object_set_cache_level(ring->scratch.obj, I915_CACHE_LLC);
> +	ret = i915_gem_object_set_cache_level(engine->scratch.obj, I915_CACHE_LLC);
>   	if (ret)
>   		goto err_unref;
>   
> -	ret = i915_gem_obj_ggtt_pin(ring->scratch.obj, 4096, 0);
> +	ret = i915_gem_obj_ggtt_pin(engine->scratch.obj, 4096, 0);
>   	if (ret)
>   		goto err_unref;
>   
> -	ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(ring->scratch.obj);
> -	ring->scratch.cpu_page = kmap(sg_page(ring->scratch.obj->pages->sgl));
> -	if (ring->scratch.cpu_page == NULL) {
> +	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(engine->scratch.obj);
> +	engine->scratch.cpu_page = kmap(sg_page(engine->scratch.obj->pages->sgl));
> +	if (engine->scratch.cpu_page == NULL) {
>   		ret = -ENOMEM;
>   		goto err_unpin;
>   	}
>   
>   	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
> -			 ring->name, ring->scratch.gtt_offset);
> +			 engine->name, engine->scratch.gtt_offset);
>   	return 0;
>   
>   err_unpin:
> -	i915_gem_object_ggtt_unpin(ring->scratch.obj);
> +	i915_gem_object_ggtt_unpin(engine->scratch.obj);
>   err_unref:
> -	drm_gem_object_unreference(&ring->scratch.obj->base);
> +	drm_gem_object_unreference(&engine->scratch.obj->base);
>   err:
> +	engine->scratch.obj = NULL;
>   	return ret;
>   }
>   
> -static inline void intel_ring_emit_wa(struct intel_engine_cs *ring,
> -				       u32 addr, u32 value)
> +static int
> +emit_lri(struct i915_gem_request *rq,
> +	 int num_registers,
> +	 ...)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ringbuffer *ring;
> +	va_list ap;
>   
> -	if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
> -		return;
> +	BUG_ON(num_registers > 60);
>   
> -	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -	intel_ring_emit(ring, addr);
> -	intel_ring_emit(ring, value);
> +	ring = intel_ring_begin(rq, 2*num_registers + 1);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr;
> -	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0xFFFF;
> -	/* value is updated with the status of remaining bits of this
> -	 * register when it is read from debugfs file
> -	 */
> -	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value;
> -	dev_priv->num_wa_regs++;
> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_registers));
> +	va_start(ap, num_registers);
> +	while (num_registers--) {
> +		intel_ring_emit(ring, va_arg(ap, u32));
> +		intel_ring_emit(ring, va_arg(ap, u32));
> +	}
> +	va_end(ap);
> +	intel_ring_advance(ring);
>   
> -	return;
> +	return 0;
>   }
>   
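
For anyone else squinting at the varargs: emit_lri() takes (register, value)
pairs after the count, so a whole workaround list collapses into one
MI_LOAD_REGISTER_IMM block. A trivial example of the calling convention, using
registers this patch already writes:

        int ret;

        /* two (reg, value) pairs -> MI_LOAD_REGISTER_IMM(2) plus four dwords */
        ret = emit_lri(rq, 2,
                       CACHE_MODE_1,
                       _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE),
                       GEN7_GT_MODE,
                       GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
        if (ret)
                return ret;
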
> -static int bdw_init_workarounds(struct intel_engine_cs *ring)
> +static int bdw_render_init_context(struct i915_gem_request *rq)
>   {
>   	int ret;
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
>   
> -	/*
> -	 * workarounds applied in this fn are part of register state context,
> -	 * they need to be re-initialized followed by gpu reset, suspend/resume,
> -	 * module reload.
> -	 */
> -	dev_priv->num_wa_regs = 0;
> -	memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
> -
> -	/*
> -	 * update the number of dwords required based on the
> -	 * actual number of workarounds applied
> -	 */
> -	ret = intel_ring_begin(ring, 24);
> -	if (ret)
> -		return ret;
> +	ret = emit_lri(rq, 8,
>   
> +	/* FIXME: Unclear whether we really need this on production bdw. */
> +	GEN8_ROW_CHICKEN,
>   	/* WaDisablePartialInstShootdown:bdw */
> +	_MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) |
>   	/* WaDisableThreadStallDopClockGating:bdw */
> -	/* FIXME: Unclear whether we really need this on production bdw. */
> -	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
> -			   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE
> -					     | STALL_DOP_GATING_DISABLE));
> +	_MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE),
>   
> +	GEN7_ROW_CHICKEN2,
>   	/* WaDisableDopClockGating:bdw May not be needed for production */
> -	intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2,
> -			   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
> +	_MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE),
>   
>   	/*
>   	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
>   	 * pre-production hardware
>   	 */
> -	intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3,
> -			   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS
> -					      | GEN8_SAMPLER_POWER_BYPASS_DIS));
> +	HALF_SLICE_CHICKEN3,
> +	_MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS) |
> +	_MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS),
>   
> -	intel_ring_emit_wa(ring, GEN7_HALF_SLICE_CHICKEN1,
> -			   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
> +	GEN7_HALF_SLICE_CHICKEN1,
> +	_MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE),
>   
> -	intel_ring_emit_wa(ring, COMMON_SLICE_CHICKEN2,
> -			   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
> +	COMMON_SLICE_CHICKEN2,
> +	_MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE),
>   
>   	/* Use Force Non-Coherent whenever executing a 3D context. This is a
>   	 * workaround for a possible hang in the unlikely event a TLB
>   	 * invalidation occurs during a PSD flush.
>   	 */
> -	intel_ring_emit_wa(ring, HDC_CHICKEN0,
> -			   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
> +	HDC_CHICKEN0,
> +	_MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT),
>   
> +	CACHE_MODE_1,
>   	/* Wa4x4STCOptimizationDisable:bdw */
> -	intel_ring_emit_wa(ring, CACHE_MODE_1,
> -			   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
> +	_MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE),
>   
>   	/*
>   	 * BSpec recommends 8x4 when MSAA is used,
> @@ -751,66 +804,51 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
>   	 * disable bit, which we don't touch here, but it's good
>   	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>   	 */
> -	intel_ring_emit_wa(ring, GEN7_GT_MODE,
> -			   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> -
> -	intel_ring_advance(ring);
> -
> -	DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n",
> -			 dev_priv->num_wa_regs);
> +	GEN7_GT_MODE,
> +	GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> +	if (ret)
> +		return ret;
>   
> -	return 0;
> +	return i915_gem_render_state_init(rq);
>   }
>   
> -static int chv_init_workarounds(struct intel_engine_cs *ring)
> +static int chv_render_init_context(struct i915_gem_request *rq)
>   {
>   	int ret;
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	/*
> -	 * workarounds applied in this fn are part of register state context,
> -	 * they need to be re-initialized followed by gpu reset, suspend/resume,
> -	 * module reload.
> -	 */
> -	dev_priv->num_wa_regs = 0;
> -	memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
>   
> -	ret = intel_ring_begin(ring, 12);
> -	if (ret)
> -		return ret;
> +	ret = emit_lri(rq, 8,
>   
> +	GEN8_ROW_CHICKEN,
>   	/* WaDisablePartialInstShootdown:chv */
> -	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
> -			   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
> -
> +	_MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE) |
>   	/* WaDisableThreadStallDopClockGating:chv */
> -	intel_ring_emit_wa(ring, GEN8_ROW_CHICKEN,
> -			   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
> +	_MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE),
>   
>   	/* WaDisableDopClockGating:chv (pre-production hw) */
> -	intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2,
> -			   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
> +	GEN7_ROW_CHICKEN2,
> +	_MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE),
>   
>   	/* WaDisableSamplerPowerBypass:chv (pre-production hw) */
> -	intel_ring_emit_wa(ring, HALF_SLICE_CHICKEN3,
> -			   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
> +	HALF_SLICE_CHICKEN3,
> +	_MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
>   
> -	intel_ring_advance(ring);
> +	if (ret)
> +		return ret;
>   
> -	return 0;
> +	return i915_gem_render_state_init(rq);
>   }
>   
> -static int init_render_ring(struct intel_engine_cs *ring)
> +static int render_resume(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret = init_ring_common(ring);
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	int ret;
> +
> +	ret = engine_resume(engine);
>   	if (ret)
>   		return ret;
>   
>   	/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
> -	if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
> +	if (INTEL_INFO(dev_priv)->gen >= 4 && INTEL_INFO(dev_priv)->gen < 7)
>   		I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH));
>   
>   	/* We need to disable the AsyncFlip performance optimisations in order
> @@ -819,28 +857,22 @@ static int init_render_ring(struct intel_engine_cs *ring)
>   	 *
>   	 * WaDisableAsyncFlipPerfMode:snb,ivb,hsw,vlv,bdw,chv
>   	 */
> -	if (INTEL_INFO(dev)->gen >= 6)
> +	if (INTEL_INFO(dev_priv)->gen >= 6)
>   		I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));
>   
>   	/* Required for the hardware to program scanline values for waiting */
>   	/* WaEnableFlushTlbInvalidationMode:snb */
> -	if (INTEL_INFO(dev)->gen == 6)
> +	if (INTEL_INFO(dev_priv)->gen == 6)
>   		I915_WRITE(GFX_MODE,
>   			   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT));
>   
>   	/* WaBCSVCSTlbInvalidationMode:ivb,vlv,hsw */
> -	if (IS_GEN7(dev))
> +	if (IS_GEN7(dev_priv))
>   		I915_WRITE(GFX_MODE_GEN7,
>   			   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT) |
>   			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
>   
> -	if (INTEL_INFO(dev)->gen >= 5) {
> -		ret = intel_init_pipe_control(ring);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	if (IS_GEN6(dev)) {
> +	if (IS_GEN6(dev_priv)) {
>   		/* From the Sandybridge PRM, volume 1 part 3, page 24:
>   		 * "If this bit is set, STCunit will have LRA as replacement
>   		 *  policy. [...] This bit must be reset.  LRA replacement
> @@ -850,19 +882,40 @@ static int init_render_ring(struct intel_engine_cs *ring)
>   			   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
>   	}
>   
> -	if (INTEL_INFO(dev)->gen >= 6)
> +	if (INTEL_INFO(dev_priv)->gen >= 6)
>   		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
>   
> -	if (HAS_L3_DPF(dev))
> -		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
> +	return 0;
> +}
>   
> -	return ret;
> +static void cleanup_status_page(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_gem_object *obj;
> +
> +	obj = engine->status_page.obj;
> +	if (obj == NULL)
> +		return;
> +
> +	kunmap(sg_page(obj->pages->sgl));
> +	i915_gem_object_ggtt_unpin(obj);
> +	drm_gem_object_unreference(&obj->base);
> +	engine->status_page.obj = NULL;
> +}
> +
> +static void engine_cleanup(struct intel_engine_cs *engine)
> +{
> +	if (engine->legacy_ring)
> +		intel_ring_free(engine->legacy_ring);
> +
> +	cleanup_status_page(engine);
> +	i915_cmd_parser_fini_engine(engine);
>   }
>   
> -static void render_ring_cleanup(struct intel_engine_cs *ring)
> +static void render_cleanup(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +
> +	engine_cleanup(engine);
>   
>   	if (dev_priv->semaphore_obj) {
>   		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> @@ -870,154 +923,82 @@ static void render_ring_cleanup(struct intel_engine_cs *ring)
>   		dev_priv->semaphore_obj = NULL;
>   	}
>   
> -	intel_fini_pipe_control(ring);
> +	fini_pipe_control(engine);
>   }
>   
> -static int gen8_rcs_signal(struct intel_engine_cs *signaller,
> -			   unsigned int num_dwords)
> +static int
> +gen8_rcs_emit_signal(struct i915_gem_request *rq, int id)
>   {
> -#define MBOX_UPDATE_DWORDS 8
> -	struct drm_device *dev = signaller->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *waiter;
> -	int i, ret, num_rings;
> -
> -	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
> -	num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS;
> -#undef MBOX_UPDATE_DWORDS
> -
> -	ret = intel_ring_begin(signaller, num_dwords);
> -	if (ret)
> -		return ret;
> +	u64 offset = GEN8_SEMAPHORE_OFFSET(rq->i915, rq->engine->id, id);
> +	struct intel_ringbuffer *ring;
>   
> -	for_each_ring(waiter, dev_priv, i) {
> -		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> -		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> -			continue;
> +	ring = intel_ring_begin(rq, 8);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> -		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> -					   PIPE_CONTROL_QW_WRITE |
> -					   PIPE_CONTROL_FLUSH_ENABLE);
> -		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> -		intel_ring_emit(signaller, 0);
> -		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> -					   MI_SEMAPHORE_TARGET(waiter->id));
> -		intel_ring_emit(signaller, 0);
> -	}
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(6));
> +	intel_ring_emit(ring,
> +			PIPE_CONTROL_GLOBAL_GTT_IVB |
> +			PIPE_CONTROL_QW_WRITE |
> +			PIPE_CONTROL_FLUSH_ENABLE);
> +	intel_ring_emit(ring, lower_32_bits(offset));
> +	intel_ring_emit(ring, upper_32_bits(offset));
> +	intel_ring_emit(ring, rq->seqno);
> +	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring,
> +			MI_SEMAPHORE_SIGNAL |
> +			MI_SEMAPHORE_TARGET(id));
> +	intel_ring_emit(ring, 0);
> +	intel_ring_advance(ring);
>   
>   	return 0;
>   }
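
For anyone else reading along: every emitter in these hunks now follows the
same shape -- intel_ring_begin() takes the request rather than the engine and
hands back the ringbuffer (or an ERR_PTR) to emit into. A minimal sketch of
that calling convention as I read it from the hunks above; example_emit_noops
is a made-up helper, not something in the patch:

static int example_emit_noops(struct i915_gem_request *rq, int count)
{
	struct intel_ringbuffer *ring;

	/* reserve exactly 'count' dwords on the ring bound to this request */
	ring = intel_ring_begin(rq, count);
	if (IS_ERR(ring))
		return PTR_ERR(ring);

	while (count--)
		intel_ring_emit(ring, MI_NOOP);
	intel_ring_advance(ring);

	return 0;
}
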
>   
> -static int gen8_xcs_signal(struct intel_engine_cs *signaller,
> -			   unsigned int num_dwords)
> +static int
> +gen8_xcs_emit_signal(struct i915_gem_request *rq, int id)
>   {
> -#define MBOX_UPDATE_DWORDS 6
> -	struct drm_device *dev = signaller->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *waiter;
> -	int i, ret, num_rings;
> +	u64 offset = GEN8_SEMAPHORE_OFFSET(rq->i915, rq->engine->id, id);
> +	struct intel_ringbuffer *ring;
>   
> -	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
> -	num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS;
> -#undef MBOX_UPDATE_DWORDS
> +	ring = intel_ring_begin(rq, 6);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	ret = intel_ring_begin(signaller, num_dwords);
> -	if (ret)
> -		return ret;
> -
> -	for_each_ring(waiter, dev_priv, i) {
> -		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> -		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> -			continue;
> -
> -		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
> -					   MI_FLUSH_DW_OP_STOREDW);
> -		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
> -					   MI_FLUSH_DW_USE_GTT);
> -		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> -		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> -					   MI_SEMAPHORE_TARGET(waiter->id));
> -		intel_ring_emit(signaller, 0);
> -	}
> -
> -	return 0;
> -}
> -
> -static int gen6_signal(struct intel_engine_cs *signaller,
> -		       unsigned int num_dwords)
> -{
> -	struct drm_device *dev = signaller->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *useless;
> -	int i, ret, num_rings;
> -
> -#define MBOX_UPDATE_DWORDS 3
> -	num_rings = hweight32(INTEL_INFO(dev)->ring_mask);
> -	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
> -#undef MBOX_UPDATE_DWORDS
> -
> -	ret = intel_ring_begin(signaller, num_dwords);
> -	if (ret)
> -		return ret;
> -
> -	for_each_ring(useless, dev_priv, i) {
> -		u32 mbox_reg = signaller->semaphore.mbox.signal[i];
> -		if (mbox_reg != GEN6_NOSYNC) {
> -			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
> -			intel_ring_emit(signaller, mbox_reg);
> -			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> -		}
> -	}
> -
> -	/* If num_dwords was rounded, make sure the tail pointer is correct */
> -	if (num_rings % 2 == 0)
> -		intel_ring_emit(signaller, MI_NOOP);
> +	intel_ring_emit(ring,
> +			MI_FLUSH_DW |
> +			MI_FLUSH_DW_OP_STOREDW |
> +			(4 - 2));
> +	intel_ring_emit(ring,
> +			lower_32_bits(offset) |
> +			MI_FLUSH_DW_USE_GTT);
> +	intel_ring_emit(ring, upper_32_bits(offset));
> +	intel_ring_emit(ring, rq->seqno);
> +	intel_ring_emit(ring,
> +			MI_SEMAPHORE_SIGNAL |
> +			MI_SEMAPHORE_TARGET(id));
> +	intel_ring_emit(ring, 0);
> +	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
> -/**
> - * gen6_add_request - Update the semaphore mailbox registers
> - *
> - * @ring - ring that is adding a request
> - * @seqno - return seqno stuck into the ring
> - *
> - * Update the mailbox registers in the *other* rings with the current seqno.
> - * This acts like a signal in the canonical semaphore.
> - */
>   static int
> -gen6_add_request(struct intel_engine_cs *ring)
> +gen6_emit_signal(struct i915_gem_request *rq, int id)
>   {
> -	int ret;
> -
> -	if (ring->semaphore.signal)
> -		ret = ring->semaphore.signal(ring, 4);
> -	else
> -		ret = intel_ring_begin(ring, 4);
> +	struct intel_ringbuffer *ring;
>   
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 3);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
> -	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
> -	intel_ring_emit(ring, MI_USER_INTERRUPT);
> -	__intel_ring_advance(ring);
> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> +	intel_ring_emit(ring, rq->engine->semaphore.mbox.signal[id]);
> +	intel_ring_emit(ring, rq->seqno);
> +	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
> -static inline bool i915_gem_has_seqno_wrapped(struct drm_device *dev,
> -					      u32 seqno)
> -{
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	return dev_priv->last_seqno < seqno;
> -}
> -
>   /**
>    * intel_ring_sync - sync the waiter to the signaller on seqno
>    *
> @@ -1027,66 +1008,52 @@ static inline bool i915_gem_has_seqno_wrapped(struct drm_device *dev,
>    */
>   
>   static int
> -gen8_ring_sync(struct intel_engine_cs *waiter,
> -	       struct intel_engine_cs *signaller,
> -	       u32 seqno)
> +gen8_emit_wait(struct i915_gem_request *waiter,
> +	       struct i915_gem_request *signaller)
>   {
> -	struct drm_i915_private *dev_priv = waiter->dev->dev_private;
> -	int ret;
> +	u64 offset = GEN8_SEMAPHORE_OFFSET(waiter->i915, signaller->engine->id, waiter->engine->id);
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(waiter, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(waiter, 4);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> -	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
> -				MI_SEMAPHORE_GLOBAL_GTT |
> -				MI_SEMAPHORE_POLL |
> -				MI_SEMAPHORE_SAD_GTE_SDD);
> -	intel_ring_emit(waiter, seqno);
> -	intel_ring_emit(waiter,
> -			lower_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
> -	intel_ring_emit(waiter,
> -			upper_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
> -	intel_ring_advance(waiter);
> +	intel_ring_emit(ring,
> +			MI_SEMAPHORE_WAIT |
> +			MI_SEMAPHORE_GLOBAL_GTT |
> +			MI_SEMAPHORE_POLL |
> +			MI_SEMAPHORE_SAD_GTE_SDD);
> +	intel_ring_emit(ring, signaller->breadcrumb[waiter->engine->id]);
> +	intel_ring_emit(ring, lower_32_bits(offset));
> +	intel_ring_emit(ring, upper_32_bits(offset));
> +	intel_ring_advance(ring);
>   	return 0;
>   }
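
Both the gen8 signal and wait paths address a per-(signaller, waiter) slot in
the 4096-byte semaphore bo through GEN8_SEMAPHORE_OFFSET(i915, from, to). The
macro itself isn't in this hunk, so the layout below is only my assumption:
one qword per ordered engine pair, indexed as from * num_engines + to.

/* Standalone illustration of the assumed slot layout; not driver code. */
#include <assert.h>
#include <stdint.h>

#define NUM_ENGINES 5	/* RCS, VCS, BCS, VECS, VCS2 */

static uint64_t semaphore_offset(uint64_t ggtt_base, int from, int to)
{
	return ggtt_base + (uint64_t)(from * NUM_ENGINES + to) * 8;
}

int main(void)
{
	/* 5 * 5 qword slots fit comfortably in the 4096-byte object */
	assert(NUM_ENGINES * NUM_ENGINES * 8 <= 4096);
	/* each ordered (signaller, waiter) pair gets a distinct slot */
	assert(semaphore_offset(0, 1, 2) != semaphore_offset(0, 2, 1));
	return 0;
}
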
>   
>   static int
> -gen6_ring_sync(struct intel_engine_cs *waiter,
> -	       struct intel_engine_cs *signaller,
> -	       u32 seqno)
> +gen6_emit_wait(struct i915_gem_request *waiter,
> +	       struct i915_gem_request *signaller)
>   {
>   	u32 dw1 = MI_SEMAPHORE_MBOX |
>   		  MI_SEMAPHORE_COMPARE |
>   		  MI_SEMAPHORE_REGISTER;
> -	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
> -	int ret;
> +	u32 wait_mbox = signaller->engine->semaphore.mbox.wait[waiter->engine->id];
> +	struct intel_ringbuffer *ring;
> +
> +	WARN_ON(wait_mbox == MI_SEMAPHORE_SYNC_INVALID);
> +
> +	ring = intel_ring_begin(waiter, 3);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
> +	intel_ring_emit(ring, dw1 | wait_mbox);
>   	/* Throughout all of the GEM code, seqno passed implies our current
>   	 * seqno is >= the last seqno executed. However for hardware the
>   	 * comparison is strictly greater than.
>   	 */
> -	seqno -= 1;
> -
> -	WARN_ON(wait_mbox == MI_SEMAPHORE_SYNC_INVALID);
> -
> -	ret = intel_ring_begin(waiter, 4);
> -	if (ret)
> -		return ret;
> -
> -	/* If seqno wrap happened, omit the wait with no-ops */
> -	if (likely(!i915_gem_has_seqno_wrapped(waiter->dev, seqno))) {
> -		intel_ring_emit(waiter, dw1 | wait_mbox);
> -		intel_ring_emit(waiter, seqno);
> -		intel_ring_emit(waiter, 0);
> -		intel_ring_emit(waiter, MI_NOOP);
> -	} else {
> -		intel_ring_emit(waiter, MI_NOOP);
> -		intel_ring_emit(waiter, MI_NOOP);
> -		intel_ring_emit(waiter, MI_NOOP);
> -		intel_ring_emit(waiter, MI_NOOP);
> -	}
> -	intel_ring_advance(waiter);
> +	intel_ring_emit(ring, signaller->breadcrumb[waiter->engine->id] - 1);
> +	intel_ring_emit(ring, 0);
> +	intel_ring_advance(ring);
>   
>   	return 0;
>   }
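
The "- 1" on the emitted operand is just the comment above made concrete: we
want "signaller has reached at least seqno N" while MI_SEMAPHORE_MBOX tests
strictly greater-than, so the operand written is N - 1. A plain C check of
that arithmetic (not driver code):

#include <assert.h>
#include <stdint.h>

/* hardware compare: mailbox value strictly greater than the emitted operand */
static int mbox_passes(uint32_t mbox, uint32_t operand)
{
	return mbox > operand;
}

int main(void)
{
	uint32_t wanted = 100;			/* waiter needs seqno 100 completed */
	assert(!mbox_passes(99, wanted - 1));	/* 99 > 99 is false: keep waiting */
	assert(mbox_passes(100, wanted - 1));	/* 100 > 99: ">=" semantics achieved */
	return 0;
}
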
> @@ -1101,10 +1068,10 @@ do {									\
>   } while (0)
>   
>   static int
> -pc_render_add_request(struct intel_engine_cs *ring)
> +gen5_emit_breadcrumb(struct i915_gem_request *rq)
>   {
> -	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> -	int ret;
> +	u32 scratch_addr = rq->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	struct intel_ringbuffer *ring;
>   
>   	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
>   	 * incoherent with writes to memory, i.e. completely fubar,
> @@ -1114,16 +1081,17 @@ pc_render_add_request(struct intel_engine_cs *ring)
>   	 * incoherence by flushing the 6 PIPE_NOTIFY buffers out to
>   	 * memory before requesting an interrupt.
>   	 */
> -	ret = intel_ring_begin(ring, 32);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 32);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
>   			PIPE_CONTROL_WRITE_FLUSH |
>   			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
> +	intel_ring_emit(ring, rq->engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, rq->seqno);
>   	intel_ring_emit(ring, 0);
> +
>   	PIPE_CONTROL_FLUSH(ring, scratch_addr);
>   	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
>   	PIPE_CONTROL_FLUSH(ring, scratch_addr);
> @@ -1140,96 +1108,80 @@ pc_render_add_request(struct intel_engine_cs *ring)
>   			PIPE_CONTROL_WRITE_FLUSH |
>   			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>   			PIPE_CONTROL_NOTIFY);
> -	intel_ring_emit(ring, ring->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
> +	intel_ring_emit(ring, rq->engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, rq->seqno);
>   	intel_ring_emit(ring, 0);
> -	__intel_ring_advance(ring);
> -
> -	return 0;
> -}
>   
> -static u32
> -gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> -{
> -	/* Workaround to force correct ordering between irq and seqno writes on
> -	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> -	if (!lazy_coherency) {
> -		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> -	}
> +	intel_ring_advance(ring);
>   
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +	return 0;
>   }
>   
>   static u32
> -ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +ring_get_seqno(struct intel_engine_cs *engine)
>   {
> -	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> +	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
>   }
>   
>   static void
> -ring_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> +ring_set_seqno(struct intel_engine_cs *engine, u32 seqno)
>   {
> -	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> +	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
>   }
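
The breadcrumbs above store rq->seqno into the hardware status page at
I915_GEM_HWS_INDEX, and ring_get_seqno() simply reads that dword back; the
completion test is then a wrap-safe signed comparison. A standalone model of
that round trip -- the slot value and the seqno_passed() helper here are
illustrative, they are not lifted from this hunk:

#include <assert.h>
#include <stdint.h>

static uint32_t hws_page[1024];		/* one 4096-byte page of u32 slots */
static const int hws_index = 0x30;	/* stand-in for I915_GEM_HWS_INDEX */

static int seqno_passed(uint32_t completed, uint32_t seqno)
{
	return (int32_t)(completed - seqno) >= 0;	/* wrap-safe "completed >= seqno" */
}

int main(void)
{
	hws_page[hws_index] = 42;	/* what the MI_STORE_DWORD_INDEX breadcrumb wrote */
	assert(seqno_passed(hws_page[hws_index], 42));
	assert(!seqno_passed(hws_page[hws_index], 43));
	return 0;
}
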
>   
>   static u32
> -pc_render_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> +gen5_render_get_seqno(struct intel_engine_cs *engine)
>   {
> -	return ring->scratch.cpu_page[0];
> +	return engine->scratch.cpu_page[0];
>   }
>   
>   static void
> -pc_render_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> +gen5_render_set_seqno(struct intel_engine_cs *engine, u32 seqno)
>   {
> -	ring->scratch.cpu_page[0] = seqno;
> +	engine->scratch.cpu_page[0] = seqno;
>   }
>   
>   static bool
> -gen5_ring_get_irq(struct intel_engine_cs *ring)
> +gen5_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *i915 = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!i915->dev->irq_enabled)
>   		return false;
>   
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0)
> -		gen5_enable_gt_irq(dev_priv, ring->irq_enable_mask);
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	spin_lock_irqsave(&i915->irq_lock, flags);
> +	if (engine->irq_refcount++ == 0)
> +		gen5_enable_gt_irq(i915, engine->irq_enable_mask);
> +	spin_unlock_irqrestore(&i915->irq_lock, flags);
>   
>   	return true;
>   }
>   
>   static void
> -gen5_ring_put_irq(struct intel_engine_cs *ring)
> +gen5_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *i915 = engine->i915;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0)
> -		gen5_disable_gt_irq(dev_priv, ring->irq_enable_mask);
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	spin_lock_irqsave(&i915->irq_lock, flags);
> +	if (--engine->irq_refcount == 0)
> +		gen5_disable_gt_irq(i915, engine->irq_enable_mask);
> +	spin_unlock_irqrestore(&i915->irq_lock, flags);
>   }
>   
>   static bool
> -i9xx_ring_get_irq(struct intel_engine_cs *ring)
> +i9xx_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!dev_priv->dev->irq_enabled)
>   		return false;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		dev_priv->irq_mask &= ~ring->irq_enable_mask;
> +	if (engine->irq_refcount++ == 0) {
> +		dev_priv->irq_mask &= ~engine->irq_enable_mask;
>   		I915_WRITE(IMR, dev_priv->irq_mask);
>   		POSTING_READ(IMR);
>   	}
> @@ -1239,15 +1191,14 @@ i9xx_ring_get_irq(struct intel_engine_cs *ring)
>   }
>   
>   static void
> -i9xx_ring_put_irq(struct intel_engine_cs *ring)
> +i9xx_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		dev_priv->irq_mask |= ring->irq_enable_mask;
> +	if (--engine->irq_refcount == 0) {
> +		dev_priv->irq_mask |= engine->irq_enable_mask;
>   		I915_WRITE(IMR, dev_priv->irq_mask);
>   		POSTING_READ(IMR);
>   	}
> @@ -1255,18 +1206,17 @@ i9xx_ring_put_irq(struct intel_engine_cs *ring)
>   }
>   
>   static bool
> -i8xx_ring_get_irq(struct intel_engine_cs *ring)
> +i8xx_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!dev_priv->dev->irq_enabled)
>   		return false;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		dev_priv->irq_mask &= ~ring->irq_enable_mask;
> +	if (engine->irq_refcount++ == 0) {
> +		dev_priv->irq_mask &= ~engine->irq_enable_mask;
>   		I915_WRITE16(IMR, dev_priv->irq_mask);
>   		POSTING_READ16(IMR);
>   	}
> @@ -1276,175 +1226,120 @@ i8xx_ring_get_irq(struct intel_engine_cs *ring)
>   }
>   
>   static void
> -i8xx_ring_put_irq(struct intel_engine_cs *ring)
> +i8xx_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		dev_priv->irq_mask |= ring->irq_enable_mask;
> +	if (--engine->irq_refcount == 0) {
> +		dev_priv->irq_mask |= engine->irq_enable_mask;
>   		I915_WRITE16(IMR, dev_priv->irq_mask);
>   		POSTING_READ16(IMR);
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   }
>   
> -void intel_ring_setup_status_page(struct intel_engine_cs *ring)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	u32 mmio = 0;
> -
> -	/* The ring status page addresses are no longer next to the rest of
> -	 * the ring registers as of gen7.
> -	 */
> -	if (IS_GEN7(dev)) {
> -		switch (ring->id) {
> -		case RCS:
> -			mmio = RENDER_HWS_PGA_GEN7;
> -			break;
> -		case BCS:
> -			mmio = BLT_HWS_PGA_GEN7;
> -			break;
> -		/*
> -		 * VCS2 actually doesn't exist on Gen7. Only shut up
> -		 * gcc switch check warning
> -		 */
> -		case VCS2:
> -		case VCS:
> -			mmio = BSD_HWS_PGA_GEN7;
> -			break;
> -		case VECS:
> -			mmio = VEBOX_HWS_PGA_GEN7;
> -			break;
> -		}
> -	} else if (IS_GEN6(ring->dev)) {
> -		mmio = RING_HWS_PGA_GEN6(ring->mmio_base);
> -	} else {
> -		/* XXX: gen8 returns to sanity */
> -		mmio = RING_HWS_PGA(ring->mmio_base);
> -	}
> -
> -	I915_WRITE(mmio, (u32)ring->status_page.gfx_addr);
> -	POSTING_READ(mmio);
> -
> -	/*
> -	 * Flush the TLB for this page
> -	 *
> -	 * FIXME: These two bits have disappeared on gen8, so a question
> -	 * arises: do we still need this and if so how should we go about
> -	 * invalidating the TLB?
> -	 */
> -	if (INTEL_INFO(dev)->gen >= 6 && INTEL_INFO(dev)->gen < 8) {
> -		u32 reg = RING_INSTPM(ring->mmio_base);
> -
> -		/* ring should be idle before issuing a sync flush*/
> -		WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
> -
> -		I915_WRITE(reg,
> -			   _MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE |
> -					      INSTPM_SYNC_FLUSH));
> -		if (wait_for((I915_READ(reg) & INSTPM_SYNC_FLUSH) == 0,
> -			     1000))
> -			DRM_ERROR("%s: wait for SyncFlush to complete for TLB invalidation timed out\n",
> -				  ring->name);
> -	}
> -}
> -
>   static int
> -bsd_ring_flush(struct intel_engine_cs *ring,
> -	       u32     invalidate_domains,
> -	       u32     flush_domains)
> +bsd_emit_flush(struct i915_gem_request *rq,
> +	       u32 flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 1);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_FLUSH);
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   	return 0;
>   }
>   
>   static int
> -i9xx_add_request(struct intel_engine_cs *ring)
> +i9xx_emit_breadcrumb(struct i915_gem_request *rq)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 5);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
>   	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
> +	intel_ring_emit(ring, rq->seqno);
>   	intel_ring_emit(ring, MI_USER_INTERRUPT);
> -	__intel_ring_advance(ring);
> +	intel_ring_emit(ring, MI_NOOP);
> +	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
>   static bool
> -gen6_ring_get_irq(struct intel_engine_cs *ring)
> +gen6_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!dev_priv->dev->irq_enabled)
>   	       return false;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		if (HAS_L3_DPF(dev) && ring->id == RCS)
> -			I915_WRITE_IMR(ring,
> -				       ~(ring->irq_enable_mask |
> -					 GT_PARITY_ERROR(dev)));
> -		else
> -			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
> -		gen5_enable_gt_irq(dev_priv, ring->irq_enable_mask);
> +	if (engine->irq_refcount++ == 0) {
> +		I915_WRITE_IMR(engine,
> +			       ~(engine->irq_enable_mask |
> +				 engine->irq_keep_mask));
> +		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   
> +	/* Keep the device awake to save expensive CPU cycles when
> +	 * reading the registers.
> +	 */
> +	gen6_gt_force_wake_get(dev_priv, engine->power_domains);
>   	return true;
>   }
>   
>   static void
> -gen6_ring_put_irq(struct intel_engine_cs *ring)
> +gen6_irq_barrier(struct intel_engine_cs *engine)
> +{
> +	/* w/a for lax serialisation of GPU writes with IRQs */
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	(void)I915_READ(RING_ACTHD(engine->mmio_base));
> +}
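
This replaces the lazy_coherency special case that used to live in
gen6_ring_get_seqno(): the ordering read from a CS register is now an explicit
per-engine hook rather than a flag on every seqno query. A hedged sketch of
how I expect a caller to use it -- the helper below is hypothetical; the patch
itself routes this through __wait_request()/engine->is_complete(), which are
not in these hunks:

static bool example_seqno_landed(struct intel_engine_cs *engine, u32 seqno)
{
	/* flush the seqno write past the interrupt before trusting the HWS */
	engine->irq_barrier(engine);

	return i915_seqno_passed(engine->get_seqno(engine), seqno);
}
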
> +
> +static void
> +gen6_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> +	gen6_gt_force_wake_put(dev_priv, engine->power_domains);
> +
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		if (HAS_L3_DPF(dev) && ring->id == RCS)
> -			I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
> -		else
> -			I915_WRITE_IMR(ring, ~0);
> -		gen5_disable_gt_irq(dev_priv, ring->irq_enable_mask);
> +	if (--engine->irq_refcount == 0) {
> +		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
> +		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   }
>   
>   static bool
> -hsw_vebox_get_irq(struct intel_engine_cs *ring)
> +hsw_vebox_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!dev_priv->dev->irq_enabled)
>   		return false;
>   
> +	gen6_gt_force_wake_get(dev_priv, engine->power_domains);
> +
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
> -		gen6_enable_pm_irq(dev_priv, ring->irq_enable_mask);
> +	if (engine->irq_refcount++ == 0) {
> +		I915_WRITE_IMR(engine,
> +			       ~(engine->irq_enable_mask |
> +				 engine->irq_keep_mask));
> +		gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   
> @@ -1452,43 +1347,36 @@ hsw_vebox_get_irq(struct intel_engine_cs *ring)
>   }
>   
>   static void
> -hsw_vebox_put_irq(struct intel_engine_cs *ring)
> +hsw_vebox_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> -		return;
> -
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		I915_WRITE_IMR(ring, ~0);
> -		gen6_disable_pm_irq(dev_priv, ring->irq_enable_mask);
> +	if (--engine->irq_refcount == 0) {
> +		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
> +		gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +
> +	gen6_gt_force_wake_put(dev_priv, engine->power_domains);
>   }
>   
>   static bool
> -gen8_ring_get_irq(struct intel_engine_cs *ring)
> +gen8_irq_get(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
> -	if (!dev->irq_enabled)
> +	if (!dev_priv->dev->irq_enabled)
>   		return false;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (ring->irq_refcount++ == 0) {
> -		if (HAS_L3_DPF(dev) && ring->id == RCS) {
> -			I915_WRITE_IMR(ring,
> -				       ~(ring->irq_enable_mask |
> -					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
> -		} else {
> -			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
> -		}
> -		POSTING_READ(RING_IMR(ring->mmio_base));
> +	if (engine->irq_refcount++ == 0) {
> +		I915_WRITE_IMR(engine,
> +			       ~(engine->irq_enable_mask |
> +				 engine->irq_keep_mask));
> +		POSTING_READ(RING_IMR(engine->mmio_base));
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   
> @@ -1496,35 +1384,29 @@ gen8_ring_get_irq(struct intel_engine_cs *ring)
>   }
>   
>   static void
> -gen8_ring_put_irq(struct intel_engine_cs *ring)
> +gen8_irq_put(struct intel_engine_cs *engine)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--ring->irq_refcount == 0) {
> -		if (HAS_L3_DPF(dev) && ring->id == RCS) {
> -			I915_WRITE_IMR(ring,
> -				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> -		} else {
> -			I915_WRITE_IMR(ring, ~0);
> -		}
> -		POSTING_READ(RING_IMR(ring->mmio_base));
> +	if (--engine->irq_refcount == 0) {
> +		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
> +		POSTING_READ(RING_IMR(engine->mmio_base));
>   	}
>   	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>   }
>   
>   static int
> -i965_dispatch_execbuffer(struct intel_engine_cs *ring,
> -			 u64 offset, u32 length,
> -			 unsigned flags)
> +i965_emit_batchbuffer(struct i915_gem_request *rq,
> +		      u64 offset, u32 length,
> +		      unsigned flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 2);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring,
>   			MI_BATCH_BUFFER_START |
> @@ -1539,31 +1421,31 @@ i965_dispatch_execbuffer(struct intel_engine_cs *ring,
>   /* Just userspace ABI convention to limit the wa batch bo to a reasonable size */
>   #define I830_BATCH_LIMIT (256*1024)
>   static int
> -i830_dispatch_execbuffer(struct intel_engine_cs *ring,
> -				u64 offset, u32 len,
> -				unsigned flags)
> +i830_emit_batchbuffer(struct i915_gem_request *rq,
> +		      u64 offset, u32 len,
> +		      unsigned flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
>   	if (flags & I915_DISPATCH_PINNED) {
> -		ret = intel_ring_begin(ring, 4);
> -		if (ret)
> -			return ret;
> +		ring = intel_ring_begin(rq, 3);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
>   
>   		intel_ring_emit(ring, MI_BATCH_BUFFER);
>   		intel_ring_emit(ring, offset | (flags & I915_DISPATCH_SECURE ? 0 : MI_BATCH_NON_SECURE));
>   		intel_ring_emit(ring, offset + len - 8);
> -		intel_ring_emit(ring, MI_NOOP);
>   		intel_ring_advance(ring);
>   	} else {
> -		u32 cs_offset = ring->scratch.gtt_offset;
> +		u32 cs_offset = rq->engine->scratch.gtt_offset;
>   
>   		if (len > I830_BATCH_LIMIT)
>   			return -ENOSPC;
>   
> -		ret = intel_ring_begin(ring, 9+3);
> -		if (ret)
> -			return ret;
> +		ring = intel_ring_begin(rq, 9+3);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
> +
>   		/* Blit the batch (which has now all relocs applied) to the stable batch
>   		 * scratch bo area (so that the CS never stumbles over its tlb
>   		 * invalidation bug) ... */
> @@ -1590,15 +1472,15 @@ i830_dispatch_execbuffer(struct intel_engine_cs *ring,
>   }
>   
>   static int
> -i915_dispatch_execbuffer(struct intel_engine_cs *ring,
> -			 u64 offset, u32 len,
> -			 unsigned flags)
> +i915_emit_batchbuffer(struct i915_gem_request *rq,
> +		      u64 offset, u32 len,
> +		      unsigned flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 2);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring, MI_BATCH_BUFFER_START | MI_BATCH_GTT);
>   	intel_ring_emit(ring, offset | (flags & I915_DISPATCH_SECURE ? 0 : MI_BATCH_NON_SECURE));
> @@ -1607,492 +1489,232 @@ i915_dispatch_execbuffer(struct intel_engine_cs *ring,
>   	return 0;
>   }
>   
> -static void cleanup_status_page(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_gem_object *obj;
> -
> -	obj = ring->status_page.obj;
> -	if (obj == NULL)
> -		return;
> -
> -	kunmap(sg_page(obj->pages->sgl));
> -	i915_gem_object_ggtt_unpin(obj);
> -	drm_gem_object_unreference(&obj->base);
> -	ring->status_page.obj = NULL;
> -}
> -
> -static int init_status_page(struct intel_engine_cs *ring)
> +static int setup_status_page(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_gem_object *obj;
> +	unsigned flags;
> +	int ret;
>   
> -	if ((obj = ring->status_page.obj) == NULL) {
> -		unsigned flags;
> -		int ret;
> +	obj = i915_gem_alloc_object(engine->i915->dev, 4096);
> +	if (obj == NULL) {
> +		DRM_ERROR("Failed to allocate status page\n");
> +		return -ENOMEM;
> +	}
>   
> -		obj = i915_gem_alloc_object(ring->dev, 4096);
> -		if (obj == NULL) {
> -			DRM_ERROR("Failed to allocate status page\n");
> -			return -ENOMEM;
> -		}
> +	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> +	if (ret)
> +		goto err_unref;
>   
> -		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> -		if (ret)
> -			goto err_unref;
> -
> -		flags = 0;
> -		if (!HAS_LLC(ring->dev))
> -			/* On g33, we cannot place HWS above 256MiB, so
> -			 * restrict its pinning to the low mappable arena.
> -			 * Though this restriction is not documented for
> -			 * gen4, gen5, or byt, they also behave similarly
> -			 * and hang if the HWS is placed at the top of the
> -			 * GTT. To generalise, it appears that all !llc
> -			 * platforms have issues with us placing the HWS
> -			 * above the mappable region (even though we never
> -			 * actualy map it).
> -			 */
> -			flags |= PIN_MAPPABLE;
> -		ret = i915_gem_obj_ggtt_pin(obj, 4096, flags);
> -		if (ret) {
> +	flags = 0;
> +	if (!HAS_LLC(engine->i915))
> +		/* On g33, we cannot place HWS above 256MiB, so
> +		 * restrict its pinning to the low mappable arena.
> +		 * Though this restriction is not documented for
> +		 * gen4, gen5, or byt, they also behave similarly
> +		 * and hang if the HWS is placed at the top of the
> +		 * GTT. To generalise, it appears that all !llc
> +		 * platforms have issues with us placing the HWS
> +		 * above the mappable region (even though we never
> +		 * actually map it).
> +		 */
> +		flags |= PIN_MAPPABLE;
> +	ret = i915_gem_obj_ggtt_pin(obj, 4096, flags);
> +	if (ret) {
>   err_unref:
> -			drm_gem_object_unreference(&obj->base);
> -			return ret;
> -		}
> -
> -		ring->status_page.obj = obj;
> +		drm_gem_object_unreference(&obj->base);
> +		return ret;
>   	}
>   
> -	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
> -	ring->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
> -	memset(ring->status_page.page_addr, 0, PAGE_SIZE);
> +	engine->status_page.obj = obj;
>   
> -	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
> -			ring->name, ring->status_page.gfx_addr);
> +	engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
> +	engine->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
> +	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
>   
> +	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
> +			engine->name, engine->status_page.gfx_addr);
>   	return 0;
>   }
>   
> -static int init_phys_status_page(struct intel_engine_cs *ring)
> +static int setup_phys_status_page(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *i915 = engine->i915;
>   
> -	if (!dev_priv->status_page_dmah) {
> -		dev_priv->status_page_dmah =
> -			drm_pci_alloc(ring->dev, PAGE_SIZE, PAGE_SIZE);
> -		if (!dev_priv->status_page_dmah)
> -			return -ENOMEM;
> -	}
> +	i915->status_page_dmah =
> +		drm_pci_alloc(i915->dev, PAGE_SIZE, PAGE_SIZE);
> +	if (!i915->status_page_dmah)
> +		return -ENOMEM;
>   
> -	ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
> -	memset(ring->status_page.page_addr, 0, PAGE_SIZE);
> +	engine->status_page.page_addr = i915->status_page_dmah->vaddr;
> +	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
>   
>   	return 0;
>   }
>   
> -void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> +void intel_ring_free(struct intel_ringbuffer *ring)
>   {
> -	if (!ringbuf->obj)
> -		return;
> +	if (ring->obj) {
> +		iounmap(ring->virtual_start);
> +		i915_gem_object_ggtt_unpin(ring->obj);
> +		drm_gem_object_unreference(&ring->obj->base);
> +	}
>   
> -	iounmap(ringbuf->virtual_start);
> -	i915_gem_object_ggtt_unpin(ringbuf->obj);
> -	drm_gem_object_unreference(&ringbuf->obj->base);
> -	ringbuf->obj = NULL;
> +	list_del(&ring->engine_list);
> +	kfree(ring);
>   }
>   
> -int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> -			       struct intel_ringbuffer *ringbuf)
> +struct intel_ringbuffer *
> +intel_engine_alloc_ring(struct intel_engine_cs *engine,
> +			struct intel_context *ctx,
> +			int size)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_ringbuffer *ring;
>   	struct drm_i915_gem_object *obj;
>   	int ret;
>   
> -	if (ringbuf->obj)
> -		return 0;
> +	DRM_DEBUG("creating ringbuffer for %s, size %d\n", engine->name, size);
> +
> +	if (WARN_ON(!is_power_of_2(size)))
> +		return ERR_PTR(-EINVAL);
> +
> +	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
> +	if (ring == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ring->engine = engine;
> +	ring->ctx = ctx;
>   
> -	obj = NULL;
> -	if (!HAS_LLC(dev))
> -		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
> +	obj = i915_gem_object_create_stolen(i915->dev, size);
>   	if (obj == NULL)
> -		obj = i915_gem_alloc_object(dev, ringbuf->size);
> +		obj = i915_gem_alloc_object(i915->dev, size);
>   	if (obj == NULL)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);
>   
>   	/* mark ring buffers as read-only from GPU side by default */
>   	obj->gt_ro = 1;
>   
>   	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
> -	if (ret)
> +	if (ret) {
> +		DRM_ERROR("failed to pin ringbuffer into GGTT\n");
>   		goto err_unref;
> +	}
>   
>   	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> -	if (ret)
> +	if (ret) {
> +		DRM_ERROR("failed to mark ringbuffer for GTT writes\n");
>   		goto err_unpin;
> +	}
>   
> -	ringbuf->virtual_start =
> -		ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
> -				ringbuf->size);
> -	if (ringbuf->virtual_start == NULL) {
> +	ring->virtual_start =
> +		ioremap_wc(i915->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj),
> +			   size);
> +	if (ring->virtual_start == NULL) {
> +		DRM_ERROR("failed to map ringbuffer through GTT\n");
>   		ret = -EINVAL;
>   		goto err_unpin;
>   	}
>   
> -	ringbuf->obj = obj;
> -	return 0;
> -
> -err_unpin:
> -	i915_gem_object_ggtt_unpin(obj);
> -err_unref:
> -	drm_gem_object_unreference(&obj->base);
> -	return ret;
> -}
> -
> -static int intel_init_ring_buffer(struct drm_device *dev,
> -				  struct intel_engine_cs *ring)
> -{
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	int ret;
> -
> -	if (ringbuf == NULL) {
> -		ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
> -		if (!ringbuf)
> -			return -ENOMEM;
> -		ring->buffer = ringbuf;
> -	}
> -
> -	ring->dev = dev;
> -	INIT_LIST_HEAD(&ring->active_list);
> -	INIT_LIST_HEAD(&ring->request_list);
> -	INIT_LIST_HEAD(&ring->execlist_queue);
> -	ringbuf->size = 32 * PAGE_SIZE;
> -	ringbuf->ring = ring;
> -	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
> -
> -	init_waitqueue_head(&ring->irq_queue);
> -
> -	if (I915_NEED_GFX_HWS(dev)) {
> -		ret = init_status_page(ring);
> -		if (ret)
> -			goto error;
> -	} else {
> -		BUG_ON(ring->id != RCS);
> -		ret = init_phys_status_page(ring);
> -		if (ret)
> -			goto error;
> -	}
> -
> -	ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
> -	if (ret) {
> -		DRM_ERROR("Failed to allocate ringbuffer %s: %d\n", ring->name, ret);
> -		goto error;
> -	}
> +	ring->obj = obj;
> +	ring->size = size;
>   
>   	/* Workaround an erratum on the i830 which causes a hang if
>   	 * the TAIL pointer points to within the last 2 cachelines
>   	 * of the buffer.
>   	 */
> -	ringbuf->effective_size = ringbuf->size;
> -	if (IS_I830(dev) || IS_845G(dev))
> -		ringbuf->effective_size -= 2 * CACHELINE_BYTES;
> +	ring->effective_size = size;
> +	if (IS_I830(i915) || IS_845G(i915))
> +		ring->effective_size -= 2 * CACHELINE_BYTES;
>   
> -	ret = i915_cmd_parser_init_ring(ring);
> -	if (ret)
> -		goto error;
> +	ring->space = intel_ring_space(ring);
> +	ring->retired_head = -1;
>   
> -	ret = ring->init(ring);
> -	if (ret)
> -		goto error;
> +	INIT_LIST_HEAD(&ring->requests);
> +	INIT_LIST_HEAD(&ring->breadcrumbs);
> +	list_add_tail(&ring->engine_list, &engine->rings);
>   
> -	return 0;
> +	return ring;
>   
> -error:
> -	kfree(ringbuf);
> -	ring->buffer = NULL;
> -	return ret;
> +err_unpin:
> +	i915_gem_object_ggtt_unpin(obj);
> +err_unref:
> +	drm_gem_object_unreference(&obj->base);
> +	return ERR_PTR(ret);
> +}
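
Note the WARN_ON(!is_power_of_2(size)) above: with a power-of-two ring the
head/tail wrap arithmetic reduces to a mask with (size - 1), and the i830/845
erratum is handled by trimming effective_size rather than the allocation.
intel_ring_space() itself is not in this hunk, so the sketch below only
assumes the usual "head - tail - reserve, modulo size" form with a made-up
reserve value:

#include <assert.h>
#include <stdint.h>

#define RING_RESERVE 64	/* assumed gap so the tail never catches the head */

static uint32_t ring_space(uint32_t head, uint32_t tail, uint32_t size)
{
	/* valid only for power-of-two sizes, hence the is_power_of_2() check */
	return (head - tail - RING_RESERVE) & (size - 1);
}

int main(void)
{
	assert(ring_space(0, 0, 4096) == 4096 - RING_RESERVE);
	assert(ring_space(256, 512, 4096) == 4096 - 256 - RING_RESERVE);
	return 0;
}
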
> +
> +static void
> +nop_irq_barrier(struct intel_engine_cs *engine)
> +{
>   }
>   
> -void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
> +static int intel_engine_init(struct intel_engine_cs *engine,
> +			     struct drm_i915_private *i915)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> +	int ret;
>   
> -	if (!intel_ring_initialized(ring))
> -		return;
> +	engine->i915 = i915;
>   
> -	intel_stop_ring_buffer(ring);
> -	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
> +	INIT_LIST_HEAD(&engine->rings);
> +	INIT_LIST_HEAD(&engine->read_list);
> +	INIT_LIST_HEAD(&engine->write_list);
> +	INIT_LIST_HEAD(&engine->requests);
> +	INIT_LIST_HEAD(&engine->pending);
> +	INIT_LIST_HEAD(&engine->submitted);
>   
> -	intel_destroy_ringbuffer_obj(ringbuf);
> -	ring->preallocated_lazy_request = NULL;
> -	ring->outstanding_lazy_seqno = 0;
> +	spin_lock_init(&engine->lock);
> +	spin_lock_init(&engine->irqlock);
>   
> -	if (ring->cleanup)
> -		ring->cleanup(ring);
> +	engine->suspend = engine_suspend;
> +	engine->resume = engine_resume;
> +	engine->cleanup = engine_cleanup;
>   
> -	cleanup_status_page(ring);
> +	engine->get_seqno = ring_get_seqno;
> +	engine->set_seqno = ring_set_seqno;
>   
> -	i915_cmd_parser_fini_ring(ring);
> +	engine->irq_barrier = nop_irq_barrier;
>   
> -	kfree(ringbuf);
> -	ring->buffer = NULL;
> -}
> +	engine->get_ring = engine_get_ring;
> +	engine->put_ring = engine_put_ring;
>   
> -static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
> -{
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	struct drm_i915_gem_request *request;
> -	u32 seqno = 0;
> -	int ret;
> +	engine->semaphore.wait = NULL;
>   
> -	if (ringbuf->last_retired_head != -1) {
> -		ringbuf->head = ringbuf->last_retired_head;
> -		ringbuf->last_retired_head = -1;
> +	engine->add_request = engine_add_request;
> +	engine->write_tail = ring_write_tail;
> +	engine->is_complete = engine_rq_is_complete;
>   
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= n)
> -			return 0;
> -	}
> +	init_waitqueue_head(&engine->irq_queue);
>   
> -	list_for_each_entry(request, &ring->request_list, list) {
> -		if (__intel_ring_space(request->tail, ringbuf->tail,
> -				       ringbuf->size) >= n) {
> -			seqno = request->seqno;
> -			break;
> -		}
> +	if (I915_NEED_GFX_HWS(i915)) {
> +		ret = setup_status_page(engine);
> +	} else {
> +		BUG_ON(engine->id != RCS);
> +		ret = setup_phys_status_page(engine);
>   	}
> -
> -	if (seqno == 0)
> -		return -ENOSPC;
> -
> -	ret = i915_wait_seqno(ring, seqno);
>   	if (ret)
>   		return ret;
>   
> -	i915_gem_retire_requests_ring(ring);
> -	ringbuf->head = ringbuf->last_retired_head;
> -	ringbuf->last_retired_head = -1;
> +	ret = i915_cmd_parser_init_engine(engine);
> +	if (ret)
> +		return ret;
>   
> -	ringbuf->space = intel_ring_space(ringbuf);
>   	return 0;
>   }
>   
> -static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
> +static void gen6_bsd_ring_write_tail(struct intel_engine_cs *engine,
> +				     u32 value)
>   {
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	unsigned long end;
> -	int ret;
> -
> -	ret = intel_ring_wait_request(ring, n);
> -	if (ret != -ENOSPC)
> -		return ret;
> +	struct drm_i915_private *dev_priv = engine->i915;
>   
> -	/* force the tail write in case we have been skipping them */
> -	__intel_ring_advance(ring);
> +       /* Every tail move must follow the sequence below */
>   
> -	/* With GEM the hangcheck timer should kick us out of the loop,
> -	 * leaving it early runs the risk of corrupting GEM state (due
> -	 * to running on almost untested codepaths). But on resume
> -	 * timers don't work yet, so prevent a complete hang in that
> -	 * case by choosing an insanely large timeout. */
> -	end = jiffies + 60 * HZ;
> +	/* Disable notification that the ring is IDLE. The GT
> +	 * will then assume that it is busy and bring it out of rc6.
> +	 */
> +	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
> +		   _MASKED_BIT_ENABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
>   
> -	trace_i915_ring_wait_begin(ring);
> -	do {
> -		ringbuf->head = I915_READ_HEAD(ring);
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= n) {
> -			ret = 0;
> -			break;
> -		}
> -
> -		msleep(1);
> -
> -		if (dev_priv->mm.interruptible && signal_pending(current)) {
> -			ret = -ERESTARTSYS;
> -			break;
> -		}
> -
> -		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> -					   dev_priv->mm.interruptible);
> -		if (ret)
> -			break;
> -
> -		if (time_after(jiffies, end)) {
> -			ret = -EBUSY;
> -			break;
> -		}
> -	} while (1);
> -	trace_i915_ring_wait_end(ring);
> -	return ret;
> -}
> -
> -static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
> -{
> -	uint32_t __iomem *virt;
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	int rem = ringbuf->size - ringbuf->tail;
> -
> -	if (ringbuf->space < rem) {
> -		int ret = ring_wait_for_space(ring, rem);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	virt = ringbuf->virtual_start + ringbuf->tail;
> -	rem /= 4;
> -	while (rem--)
> -		iowrite32(MI_NOOP, virt++);
> -
> -	ringbuf->tail = 0;
> -	ringbuf->space = intel_ring_space(ringbuf);
> -
> -	return 0;
> -}
> -
> -int intel_ring_idle(struct intel_engine_cs *ring)
> -{
> -	u32 seqno;
> -	int ret;
> -
> -	/* We need to add any requests required to flush the objects and ring */
> -	if (ring->outstanding_lazy_seqno) {
> -		ret = i915_add_request(ring, NULL);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	/* Wait upon the last request to be completed */
> -	if (list_empty(&ring->request_list))
> -		return 0;
> -
> -	seqno = list_entry(ring->request_list.prev,
> -			   struct drm_i915_gem_request,
> -			   list)->seqno;
> -
> -	return i915_wait_seqno(ring, seqno);
> -}
> -
> -static int
> -intel_ring_alloc_seqno(struct intel_engine_cs *ring)
> -{
> -	if (ring->outstanding_lazy_seqno)
> -		return 0;
> -
> -	if (ring->preallocated_lazy_request == NULL) {
> -		struct drm_i915_gem_request *request;
> -
> -		request = kmalloc(sizeof(*request), GFP_KERNEL);
> -		if (request == NULL)
> -			return -ENOMEM;
> -
> -		ring->preallocated_lazy_request = request;
> -	}
> -
> -	return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
> -}
> -
> -static int __intel_ring_prepare(struct intel_engine_cs *ring,
> -				int bytes)
> -{
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	int ret;
> -
> -	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
> -		ret = intel_wrap_ring_buffer(ring);
> -		if (unlikely(ret))
> -			return ret;
> -	}
> -
> -	if (unlikely(ringbuf->space < bytes)) {
> -		ret = ring_wait_for_space(ring, bytes);
> -		if (unlikely(ret))
> -			return ret;
> -	}
> -
> -	return 0;
> -}
> -
> -int intel_ring_begin(struct intel_engine_cs *ring,
> -		     int num_dwords)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -	int ret;
> -
> -	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> -				   dev_priv->mm.interruptible);
> -	if (ret)
> -		return ret;
> -
> -	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
> -	if (ret)
> -		return ret;
> -
> -	/* Preallocate the olr before touching the ring */
> -	ret = intel_ring_alloc_seqno(ring);
> -	if (ret)
> -		return ret;
> -
> -	ring->buffer->space -= num_dwords * sizeof(uint32_t);
> -	return 0;
> -}
> -
> -/* Align the ring tail to a cacheline boundary */
> -int intel_ring_cacheline_align(struct intel_engine_cs *ring)
> -{
> -	int num_dwords = (ring->buffer->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
> -	int ret;
> -
> -	if (num_dwords == 0)
> -		return 0;
> -
> -	num_dwords = CACHELINE_BYTES / sizeof(uint32_t) - num_dwords;
> -	ret = intel_ring_begin(ring, num_dwords);
> -	if (ret)
> -		return ret;
> -
> -	while (num_dwords--)
> -		intel_ring_emit(ring, MI_NOOP);
> -
> -	intel_ring_advance(ring);
> -
> -	return 0;
> -}
> -
> -void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno)
> -{
> -	struct drm_device *dev = ring->dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -	BUG_ON(ring->outstanding_lazy_seqno);
> -
> -	if (INTEL_INFO(dev)->gen == 6 || INTEL_INFO(dev)->gen == 7) {
> -		I915_WRITE(RING_SYNC_0(ring->mmio_base), 0);
> -		I915_WRITE(RING_SYNC_1(ring->mmio_base), 0);
> -		if (HAS_VEBOX(dev))
> -			I915_WRITE(RING_SYNC_2(ring->mmio_base), 0);
> -	}
> -
> -	ring->set_seqno(ring, seqno);
> -	ring->hangcheck.seqno = seqno;
> -}
> -
> -static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
> -				     u32 value)
> -{
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -
> -       /* Every tail move must follow the sequence below */
> -
> -	/* Disable notification that the ring is IDLE. The GT
> -	 * will then assume that it is busy and bring it out of rc6.
> -	 */
> -	I915_WRITE(GEN6_BSD_SLEEP_PSMI_CONTROL,
> -		   _MASKED_BIT_ENABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
> -
> -	/* Clear the context id. Here be magic! */
> -	I915_WRITE64(GEN6_BSD_RNCID, 0x0);
> +	/* Clear the context id. Here be magic! */
> +	I915_WRITE64(GEN6_BSD_RNCID, 0x0);
>   
>   	/* Wait for the ring not to be idle, i.e. for it to wake up. */
>   	if (wait_for((I915_READ(GEN6_BSD_SLEEP_PSMI_CONTROL) &
> @@ -2101,8 +1723,8 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
>   		DRM_ERROR("timed out waiting for the BSD ring to wake up\n");
>   
>   	/* Now that the ring is fully powered up, update the tail */
> -	I915_WRITE_TAIL(ring, value);
> -	POSTING_READ(RING_TAIL(ring->mmio_base));
> +	I915_WRITE_TAIL(engine, value);
> +	POSTING_READ(RING_TAIL(engine->mmio_base));
>   
>   	/* Let the ring send IDLE messages to the GT again,
>   	 * and so let it sleep to conserve power when idle.
> @@ -2111,73 +1733,72 @@ static void gen6_bsd_ring_write_tail(struct intel_engine_cs *ring,
>   		   _MASKED_BIT_DISABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
>   }
>   
> -static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
> -			       u32 invalidate, u32 flush)
> +static int gen6_bsd_emit_flush(struct i915_gem_request *rq,
> +			       u32 flags)
>   {
> +	struct intel_ringbuffer *ring;
>   	uint32_t cmd;
> -	int ret;
> -
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
>   
> -	cmd = MI_FLUSH_DW;
> -	if (INTEL_INFO(ring->dev)->gen >= 8)
> +	cmd = 3;
> +	if (INTEL_INFO(rq->i915)->gen >= 8)
>   		cmd += 1;
> +
> +	ring = intel_ring_begin(rq, cmd);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
> +
>   	/*
>   	 * Bspec vol 1c.5 - video engine command streamer:
>   	 * "If ENABLED, all TLBs will be invalidated once the flush
>   	 * operation is complete. This bit is only valid when the
>   	 * Post-Sync Operation field is a value of 1h or 3h."
>   	 */
> -	if (invalidate & I915_GEM_GPU_DOMAINS)
> -		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> -			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +	cmd = MI_FLUSH_DW | (cmd - 2);
> +	if (flags & I915_INVALIDATE_CACHES)
> +		cmd |= (MI_INVALIDATE_TLB |
> +			MI_INVALIDATE_BSD |
> +			MI_FLUSH_DW_STORE_INDEX |
> +			MI_FLUSH_DW_OP_STOREDW);
>   	intel_ring_emit(ring, cmd);
>   	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
> -	if (INTEL_INFO(ring->dev)->gen >= 8) {
> +	if (INTEL_INFO(rq->i915)->gen >= 8)
>   		intel_ring_emit(ring, 0); /* upper addr */
> -		intel_ring_emit(ring, 0); /* value */
> -	} else  {
> -		intel_ring_emit(ring, 0);
> -		intel_ring_emit(ring, MI_NOOP);
> -	}
> +	intel_ring_emit(ring, 0); /* value */
>   	intel_ring_advance(ring);
>   	return 0;
>   }
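
The double duty of 'cmd' here (first the dword count passed to
intel_ring_begin(), then folded into the header as 'cmd - 2') leans on the MI
convention that a command's length field encodes total dwords minus two:
three dwords on gen6/7, four on gen8 with the extra upper-address dword. The
real MI_FLUSH_DW define is outside this hunk, so the header builder below is
only illustrative:

#include <assert.h>
#include <stdint.h>

static uint32_t mi_header(uint32_t opcode, int total_dwords)
{
	return (opcode << 23) | (uint32_t)(total_dwords - 2);
}

int main(void)
{
	assert((mi_header(0x26, 3) & 0x3f) == 1);	/* gen6/7 flush: 3 dwords */
	assert((mi_header(0x26, 4) & 0x3f) == 2);	/* gen8: extra upper-address dword */
	return 0;
}
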
>   
>   static int
> -gen8_ring_dispatch_execbuffer(struct intel_engine_cs *ring,
> -			      u64 offset, u32 len,
> -			      unsigned flags)
> +gen8_emit_batchbuffer(struct i915_gem_request *rq,
> +		      u64 offset, u32 len,
> +		      unsigned flags)
>   {
> -	bool ppgtt = USES_PPGTT(ring->dev) && !(flags & I915_DISPATCH_SECURE);
> -	int ret;
> +	struct intel_ringbuffer *ring;
> +	bool ppgtt = USES_PPGTT(rq->i915) && !(flags & I915_DISPATCH_SECURE);
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 3);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	/* FIXME(BDW): Address space and security selectors. */
>   	intel_ring_emit(ring, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
>   	intel_ring_emit(ring, lower_32_bits(offset));
>   	intel_ring_emit(ring, upper_32_bits(offset));
> -	intel_ring_emit(ring, MI_NOOP);
>   	intel_ring_advance(ring);
>   
>   	return 0;
>   }
>   
>   static int
> -hsw_ring_dispatch_execbuffer(struct intel_engine_cs *ring,
> -			      u64 offset, u32 len,
> -			      unsigned flags)
> +hsw_emit_batchbuffer(struct i915_gem_request *rq,
> +		     u64 offset, u32 len,
> +		     unsigned flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 2);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring,
>   			MI_BATCH_BUFFER_START | MI_BATCH_PPGTT_HSW |
> @@ -2190,15 +1811,15 @@ hsw_ring_dispatch_execbuffer(struct intel_engine_cs *ring,
>   }
>   
>   static int
> -gen6_ring_dispatch_execbuffer(struct intel_engine_cs *ring,
> -			      u64 offset, u32 len,
> -			      unsigned flags)
> +gen6_emit_batchbuffer(struct i915_gem_request *rq,
> +		      u64 offset, u32 len,
> +		      unsigned flags)
>   {
> -	int ret;
> +	struct intel_ringbuffer *ring;
>   
> -	ret = intel_ring_begin(ring, 2);
> -	if (ret)
> -		return ret;
> +	ring = intel_ring_begin(rq, 2);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
>   
>   	intel_ring_emit(ring,
>   			MI_BATCH_BUFFER_START |
> @@ -2212,60 +1833,102 @@ gen6_ring_dispatch_execbuffer(struct intel_engine_cs *ring,
>   
>   /* Blitter support (SandyBridge+) */
>   
> -static int gen6_ring_flush(struct intel_engine_cs *ring,
> -			   u32 invalidate, u32 flush)
> +static int gen6_blt_emit_flush(struct i915_gem_request *rq,
> +			       u32 flags)
>   {
> -	struct drm_device *dev = ring->dev;
> +	struct intel_ringbuffer *ring;
>   	uint32_t cmd;
> -	int ret;
>   
> -	ret = intel_ring_begin(ring, 4);
> -	if (ret)
> -		return ret;
> -
> -	cmd = MI_FLUSH_DW;
> -	if (INTEL_INFO(ring->dev)->gen >= 8)
> +	cmd = 3;
> +	if (INTEL_INFO(rq->i915)->gen >= 8)
>   		cmd += 1;
> +
> +	ring = intel_ring_begin(rq, cmd);
> +	if (IS_ERR(ring))
> +		return PTR_ERR(ring);
> +
>   	/*
>   	 * Bspec vol 1c.3 - blitter engine command streamer:
>   	 * "If ENABLED, all TLBs will be invalidated once the flush
>   	 * operation is complete. This bit is only valid when the
>   	 * Post-Sync Operation field is a value of 1h or 3h."
>   	 */
> -	if (invalidate & I915_GEM_DOMAIN_RENDER)
> -		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> -			MI_FLUSH_DW_OP_STOREDW;
> +	cmd = MI_FLUSH_DW | (cmd - 2);
> +	if (flags & I915_INVALIDATE_CACHES)
> +		cmd |= (MI_INVALIDATE_TLB |
> +			MI_FLUSH_DW_STORE_INDEX |
> +			MI_FLUSH_DW_OP_STOREDW);
>   	intel_ring_emit(ring, cmd);
>   	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
> -	if (INTEL_INFO(ring->dev)->gen >= 8) {
> +	if (INTEL_INFO(rq->i915)->gen >= 8)
>   		intel_ring_emit(ring, 0); /* upper addr */
> -		intel_ring_emit(ring, 0); /* value */
> -	} else  {
> -		intel_ring_emit(ring, 0);
> -		intel_ring_emit(ring, MI_NOOP);
> -	}
> +	intel_ring_emit(ring, 0); /* value */
>   	intel_ring_advance(ring);
>   
> -	if (IS_GEN7(dev) && !invalidate && flush)
> -		return gen7_ring_fbc_flush(ring, FBC_REND_CACHE_CLEAN);
> +	if (IS_GEN7(rq->i915) && flags & I915_KICK_FBC)
> +		return gen7_ring_fbc_flush(rq, FBC_REND_CACHE_CLEAN);
>   
>   	return 0;
>   }
>   
> -int intel_init_render_ring_buffer(struct drm_device *dev)
> +static void gen8_engine_init_semaphore(struct intel_engine_cs *engine)
> +{
> +	if (engine->i915->semaphore_obj == NULL)
> +		return;
> +
> +	engine->semaphore.wait = gen8_emit_wait;
> +	engine->semaphore.signal =
> +		engine->id == RCS ? gen8_rcs_emit_signal : gen8_xcs_emit_signal;
> +}
> +
> +static bool semaphores_enabled(struct drm_i915_private *dev_priv)
> +{
> +	if (INTEL_INFO(dev_priv)->gen < 6)
> +		return false;
> +
> +	if (i915.semaphores >= 0)
> +		return i915.semaphores;
> +
> +	/* Until we get further testing... */
> +	if (IS_GEN8(dev_priv))
> +		return false;
> +
> +#ifdef CONFIG_INTEL_IOMMU
> +	/* Enable semaphores on SNB when IO remapping is off */
> +	if (INTEL_INFO(dev_priv)->gen == 6 && intel_iommu_gfx_mapped)
> +		return false;
> +#endif
> +
> +	return true;
> +}
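
semaphores_enabled() reads i915.semaphores as a tri-state: any explicit value
(0 or 1) wins outright, and only the default -1 ("auto") falls through to the
platform policy -- gen8 off for now, and SNB off when the IOMMU has graphics
mapped. A standalone sketch of that parameter handling with the platform
checks stubbed out (auto_policy() below is a rough stand-in, not the real
logic):

#include <assert.h>
#include <stdbool.h>

static int param_semaphores = -1;	/* -1 = auto, 0 = off, 1 = on */

static bool auto_policy(int gen)
{
	return gen != 8;		/* stand-in for the checks above */
}

static bool check_semaphores(int gen)
{
	if (gen < 6)
		return false;
	if (param_semaphores >= 0)	/* an explicit setting always wins */
		return param_semaphores;
	return auto_policy(gen);
}

int main(void)
{
	param_semaphores = -1;
	assert(check_semaphores(7));
	assert(!check_semaphores(8));	/* auto: gen8 stays off for now */
	param_semaphores = 1;
	assert(check_semaphores(8));	/* explicit on overrides auto */
	param_semaphores = 0;
	assert(!check_semaphores(7));
	return 0;
}
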
> +
> +int intel_init_render_engine(struct drm_i915_private *dev_priv)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[RCS];
> +	struct intel_engine_cs *engine = &dev_priv->engine[RCS];
>   	struct drm_i915_gem_object *obj;
>   	int ret;
>   
> -	ring->name = "render ring";
> -	ring->id = RCS;
> -	ring->mmio_base = RENDER_RING_BASE;
> +	ret = intel_engine_init(engine, dev_priv);
> +	if (ret)
> +		return ret;
> +
> +	engine->name = "render ring";
> +	engine->id = RCS;
> +	engine->power_domains = FORCEWAKE_RENDER;
> +	engine->mmio_base = RENDER_RING_BASE;
> +
> +	engine->init_context = i915_gem_render_state_init;
> +
> +	if (HAS_L3_DPF(dev_priv)) {
> +		if (INTEL_INFO(dev_priv)->gen >= 8)
> +			engine->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> +		else
> +			engine->irq_keep_mask |= GT_PARITY_ERROR(dev_priv);
> +	}
>   
> -	if (INTEL_INFO(dev)->gen >= 8) {
> -		if (i915_semaphore_is_enabled(dev)) {
> -			obj = i915_gem_alloc_object(dev, 4096);
> +	if (INTEL_INFO(dev_priv)->gen >= 8) {
> +		if (semaphores_enabled(dev_priv)) {
> +			obj = i915_gem_alloc_object(dev_priv->dev, 4096);
>   			if (obj == NULL) {
>   				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
>   				i915.semaphores = 0;
> @@ -2280,36 +1943,28 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   					dev_priv->semaphore_obj = obj;
>   			}
>   		}
> -		if (IS_CHERRYVIEW(dev))
> -			ring->init_context = chv_init_workarounds;
> +		if (IS_CHERRYVIEW(dev_priv))
> +			engine->init_context = chv_render_init_context;
>   		else
> -			ring->init_context = bdw_init_workarounds;
> -		ring->add_request = gen6_add_request;
> -		ring->flush = gen8_render_ring_flush;
> -		ring->irq_get = gen8_ring_get_irq;
> -		ring->irq_put = gen8_ring_put_irq;
> -		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			WARN_ON(!dev_priv->semaphore_obj);
> -			ring->semaphore.sync_to = gen8_ring_sync;
> -			ring->semaphore.signal = gen8_rcs_signal;
> -			GEN8_RING_SEMAPHORE_INIT;
> -		}
> -	} else if (INTEL_INFO(dev)->gen >= 6) {
> -		ring->add_request = gen6_add_request;
> -		ring->flush = gen7_render_ring_flush;
> -		if (INTEL_INFO(dev)->gen == 6)
> -			ring->flush = gen6_render_ring_flush;
> -		ring->irq_get = gen6_ring_get_irq;
> -		ring->irq_put = gen6_ring_put_irq;
> -		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		ring->get_seqno = gen6_ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			ring->semaphore.sync_to = gen6_ring_sync;
> -			ring->semaphore.signal = gen6_signal;
> +			engine->init_context = bdw_render_init_context;
> +		engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +		engine->emit_flush = gen8_render_emit_flush;
> +		engine->irq_get = gen8_irq_get;
> +		engine->irq_put = gen8_irq_put;
> +		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> +		gen8_engine_init_semaphore(engine);
> +	} else if (INTEL_INFO(dev_priv)->gen >= 6) {
> +		engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +		engine->emit_flush = gen7_render_emit_flush;
> +		if (INTEL_INFO(dev_priv)->gen == 6)
> +			engine->emit_flush = gen6_render_emit_flush;
> +		engine->irq_get = gen6_irq_get;
> +		engine->irq_barrier = gen6_irq_barrier;
> +		engine->irq_put = gen6_irq_put;
> +		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> +		if (semaphores_enabled(dev_priv)) {
> +			engine->semaphore.wait = gen6_emit_wait;
> +			engine->semaphore.signal = gen6_emit_signal;
>   			/*
>   			 * The current semaphore is only applied on pre-gen8
>   			 * platform.  And there is no VCS2 ring on the pre-gen8
> @@ -2317,63 +1972,62 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   			 * initialized as INVALID.  Gen8 will initialize the
>   			 * sema between VCS2 and RCS later.
>   			 */
> -			ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_RV;
> -			ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_RB;
> -			ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_RVE;
> -			ring->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.signal[RCS] = GEN6_NOSYNC;
> -			ring->semaphore.mbox.signal[VCS] = GEN6_VRSYNC;
> -			ring->semaphore.mbox.signal[BCS] = GEN6_BRSYNC;
> -			ring->semaphore.mbox.signal[VECS] = GEN6_VERSYNC;
> -			ring->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
> +			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_RV;
> +			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_RB;
> +			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_RVE;
> +			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.signal[RCS] = GEN6_NOSYNC;
> +			engine->semaphore.mbox.signal[VCS] = GEN6_VRSYNC;
> +			engine->semaphore.mbox.signal[BCS] = GEN6_BRSYNC;
> +			engine->semaphore.mbox.signal[VECS] = GEN6_VERSYNC;
> +			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
>   		}
> -	} else if (IS_GEN5(dev)) {
> -		ring->add_request = pc_render_add_request;
> -		ring->flush = gen4_render_ring_flush;
> -		ring->get_seqno = pc_render_get_seqno;
> -		ring->set_seqno = pc_render_set_seqno;
> -		ring->irq_get = gen5_ring_get_irq;
> -		ring->irq_put = gen5_ring_put_irq;
> -		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
> -					GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
> +	} else if (IS_GEN5(dev_priv)) {
> +		engine->emit_breadcrumb = gen5_emit_breadcrumb;
> +		engine->emit_flush = gen4_emit_flush;
> +		engine->get_seqno = gen5_render_get_seqno;
> +		engine->set_seqno = gen5_render_set_seqno;
> +		engine->irq_get = gen5_irq_get;
> +		engine->irq_put = gen5_irq_put;
> +		engine->irq_enable_mask =
> +			GT_RENDER_USER_INTERRUPT |
> +			GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
>   	} else {
> -		ring->add_request = i9xx_add_request;
> -		if (INTEL_INFO(dev)->gen < 4)
> -			ring->flush = gen2_render_ring_flush;
> +		engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +		if (INTEL_INFO(dev_priv)->gen < 4)
> +			engine->emit_flush = gen2_emit_flush;
>   		else
> -			ring->flush = gen4_render_ring_flush;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
> -		if (IS_GEN2(dev)) {
> -			ring->irq_get = i8xx_ring_get_irq;
> -			ring->irq_put = i8xx_ring_put_irq;
> +			engine->emit_flush = gen4_emit_flush;
> +		if (IS_GEN2(dev_priv)) {
> +			engine->irq_get = i8xx_irq_get;
> +			engine->irq_put = i8xx_irq_put;
>   		} else {
> -			ring->irq_get = i9xx_ring_get_irq;
> -			ring->irq_put = i9xx_ring_put_irq;
> +			engine->irq_get = i9xx_irq_get;
> +			engine->irq_put = i9xx_irq_put;
>   		}
> -		ring->irq_enable_mask = I915_USER_INTERRUPT;
> -	}
> -	ring->write_tail = ring_write_tail;
> -
> -	if (IS_HASWELL(dev))
> -		ring->dispatch_execbuffer = hsw_ring_dispatch_execbuffer;
> -	else if (IS_GEN8(dev))
> -		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
> -	else if (INTEL_INFO(dev)->gen >= 6)
> -		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> -	else if (INTEL_INFO(dev)->gen >= 4)
> -		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
> -	else if (IS_I830(dev) || IS_845G(dev))
> -		ring->dispatch_execbuffer = i830_dispatch_execbuffer;
> +		engine->irq_enable_mask = I915_USER_INTERRUPT;
> +	}
> +
> +	if (IS_GEN8(dev_priv))
> +		engine->emit_batchbuffer = gen8_emit_batchbuffer;
> +	else if (IS_HASWELL(dev_priv))
> +		engine->emit_batchbuffer = hsw_emit_batchbuffer;
> +	else if (INTEL_INFO(dev_priv)->gen >= 6)
> +		engine->emit_batchbuffer = gen6_emit_batchbuffer;
> +	else if (INTEL_INFO(dev_priv)->gen >= 4)
> +		engine->emit_batchbuffer = i965_emit_batchbuffer;
> +	else if (IS_I830(dev_priv) || IS_845G(dev_priv))
> +		engine->emit_batchbuffer = i830_emit_batchbuffer;
>   	else
> -		ring->dispatch_execbuffer = i915_dispatch_execbuffer;
> -	ring->init = init_render_ring;
> -	ring->cleanup = render_ring_cleanup;
> +		engine->emit_batchbuffer = i915_emit_batchbuffer;
> +
> +	engine->resume = render_resume;
> +	engine->cleanup = render_cleanup;
>   
>   	/* Workaround batchbuffer to combat CS tlb bug. */
> -	if (HAS_BROKEN_CS_TLB(dev)) {
> -		obj = i915_gem_alloc_object(dev, I830_BATCH_LIMIT);
> +	if (HAS_BROKEN_CS_TLB(dev_priv)) {
> +		obj = i915_gem_alloc_object(dev_priv->dev, I830_BATCH_LIMIT);
>   		if (obj == NULL) {
>   			DRM_ERROR("Failed to allocate batch bo\n");
>   			return -ENOMEM;
> @@ -2386,158 +2040,155 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   			return ret;
>   		}
>   
> -		ring->scratch.obj = obj;
> -		ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
> +		engine->scratch.obj = obj;
> +		engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
> +	}
> +
> +	if (INTEL_INFO(dev_priv)->gen >= 5) {
> +		ret = init_pipe_control(engine);
> +		if (ret)
> +			return ret;
>   	}
>   
> -	return intel_init_ring_buffer(dev, ring);
> +	return intel_engine_enable_execlists(engine);
>   }
>   
> -int intel_init_bsd_ring_buffer(struct drm_device *dev)
> +int intel_init_bsd_engine(struct drm_i915_private *dev_priv)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VCS];
> +	struct intel_engine_cs *engine = &dev_priv->engine[VCS];
> +	int ret;
> +
> +	ret = intel_engine_init(engine, dev_priv);
> +	if (ret)
> +		return ret;
> +
> +	engine->name = "bsd ring";
> +	engine->id = VCS;
> +	engine->power_domains = FORCEWAKE_MEDIA;
>   
> -	ring->name = "bsd ring";
> -	ring->id = VCS;
> +	if (INTEL_INFO(dev_priv)->gen >= 6) {
> +		engine->mmio_base = GEN6_BSD_RING_BASE;
>   
> -	ring->write_tail = ring_write_tail;
> -	if (INTEL_INFO(dev)->gen >= 6) {
> -		ring->mmio_base = GEN6_BSD_RING_BASE;
>   		/* gen6 bsd needs a special wa for tail updates */
> -		if (IS_GEN6(dev))
> -			ring->write_tail = gen6_bsd_ring_write_tail;
> -		ring->flush = gen6_bsd_ring_flush;
> -		ring->add_request = gen6_add_request;
> -		ring->get_seqno = gen6_ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
> -		if (INTEL_INFO(dev)->gen >= 8) {
> -			ring->irq_enable_mask =
> +		if (IS_GEN6(dev_priv))
> +			engine->write_tail = gen6_bsd_ring_write_tail;
> +		engine->emit_flush = gen6_bsd_emit_flush;
> +		engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +		if (INTEL_INFO(dev_priv)->gen >= 8) {
> +			engine->irq_enable_mask =
>   				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> -			ring->irq_get = gen8_ring_get_irq;
> -			ring->irq_put = gen8_ring_put_irq;
> -			ring->dispatch_execbuffer =
> -				gen8_ring_dispatch_execbuffer;
> -			if (i915_semaphore_is_enabled(dev)) {
> -				ring->semaphore.sync_to = gen8_ring_sync;
> -				ring->semaphore.signal = gen8_xcs_signal;
> -				GEN8_RING_SEMAPHORE_INIT;
> -			}
> +			engine->irq_get = gen8_irq_get;
> +			engine->irq_put = gen8_irq_put;
> +			engine->emit_batchbuffer = gen8_emit_batchbuffer;
> +			gen8_engine_init_semaphore(engine);
>   		} else {
> -			ring->irq_enable_mask = GT_BSD_USER_INTERRUPT;
> -			ring->irq_get = gen6_ring_get_irq;
> -			ring->irq_put = gen6_ring_put_irq;
> -			ring->dispatch_execbuffer =
> -				gen6_ring_dispatch_execbuffer;
> -			if (i915_semaphore_is_enabled(dev)) {
> -				ring->semaphore.sync_to = gen6_ring_sync;
> -				ring->semaphore.signal = gen6_signal;
> -				ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VR;
> -				ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID;
> -				ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VB;
> -				ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_VVE;
> -				ring->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> -				ring->semaphore.mbox.signal[RCS] = GEN6_RVSYNC;
> -				ring->semaphore.mbox.signal[VCS] = GEN6_NOSYNC;
> -				ring->semaphore.mbox.signal[BCS] = GEN6_BVSYNC;
> -				ring->semaphore.mbox.signal[VECS] = GEN6_VEVSYNC;
> -				ring->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
> +			engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
> +			engine->irq_get = gen6_irq_get;
> +			engine->irq_barrier = gen6_irq_barrier;
> +			engine->irq_put = gen6_irq_put;
> +			engine->emit_batchbuffer = gen6_emit_batchbuffer;
> +			if (semaphores_enabled(dev_priv)) {
> +				engine->semaphore.wait = gen6_emit_wait;
> +				engine->semaphore.signal = gen6_emit_signal;
> +				engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VR;
> +				engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID;
> +				engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VB;
> +				engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_VVE;
> +				engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> +				engine->semaphore.mbox.signal[RCS] = GEN6_RVSYNC;
> +				engine->semaphore.mbox.signal[VCS] = GEN6_NOSYNC;
> +				engine->semaphore.mbox.signal[BCS] = GEN6_BVSYNC;
> +				engine->semaphore.mbox.signal[VECS] = GEN6_VEVSYNC;
> +				engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
>   			}
>   		}
>   	} else {
> -		ring->mmio_base = BSD_RING_BASE;
> -		ring->flush = bsd_ring_flush;
> -		ring->add_request = i9xx_add_request;
> -		ring->get_seqno = ring_get_seqno;
> -		ring->set_seqno = ring_set_seqno;
> -		if (IS_GEN5(dev)) {
> -			ring->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
> -			ring->irq_get = gen5_ring_get_irq;
> -			ring->irq_put = gen5_ring_put_irq;
> +		engine->mmio_base = BSD_RING_BASE;
> +
> +		engine->emit_flush = bsd_emit_flush;
> +		engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +		if (IS_GEN5(dev_priv)) {
> +			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
> +			engine->irq_get = gen5_irq_get;
> +			engine->irq_put = gen5_irq_put;
>   		} else {
> -			ring->irq_enable_mask = I915_BSD_USER_INTERRUPT;
> -			ring->irq_get = i9xx_ring_get_irq;
> -			ring->irq_put = i9xx_ring_put_irq;
> +			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
> +			engine->irq_get = i9xx_irq_get;
> +			engine->irq_put = i9xx_irq_put;
>   		}
> -		ring->dispatch_execbuffer = i965_dispatch_execbuffer;
> +		engine->emit_batchbuffer = i965_emit_batchbuffer;
>   	}
> -	ring->init = init_ring_common;
>   
> -	return intel_init_ring_buffer(dev, ring);
> +	return intel_engine_enable_execlists(engine);
>   }
>   
>   /**
>    * Initialize the second BSD ring for Broadwell GT3.
>    * It is noted that this only exists on Broadwell GT3.
>    */
> -int intel_init_bsd2_ring_buffer(struct drm_device *dev)
> +int intel_init_bsd2_engine(struct drm_i915_private *dev_priv)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VCS2];
> +	struct intel_engine_cs *engine = &dev_priv->engine[VCS2];
> +	int ret;
>   
> -	if ((INTEL_INFO(dev)->gen != 8)) {
> +	if ((INTEL_INFO(dev_priv)->gen != 8)) {
>   		DRM_ERROR("No dual-BSD ring on non-BDW machine\n");
>   		return -EINVAL;
>   	}
>   
> -	ring->name = "bsd2 ring";
> -	ring->id = VCS2;
> +	ret = intel_engine_init(engine, dev_priv);
> +	if (ret)
> +		return ret;
> +
> +	engine->name = "bsd2 ring";
> +	engine->id = VCS2;
> +	engine->power_domains = FORCEWAKE_MEDIA;
> +	engine->mmio_base = GEN8_BSD2_RING_BASE;
>   
> -	ring->write_tail = ring_write_tail;
> -	ring->mmio_base = GEN8_BSD2_RING_BASE;
> -	ring->flush = gen6_bsd_ring_flush;
> -	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
> -	ring->irq_enable_mask =
> +	engine->emit_flush = gen6_bsd_emit_flush;
> +	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +	engine->emit_batchbuffer = gen8_emit_batchbuffer;
> +	engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> -	ring->irq_get = gen8_ring_get_irq;
> -	ring->irq_put = gen8_ring_put_irq;
> -	ring->dispatch_execbuffer =
> -			gen8_ring_dispatch_execbuffer;
> -	if (i915_semaphore_is_enabled(dev)) {
> -		ring->semaphore.sync_to = gen8_ring_sync;
> -		ring->semaphore.signal = gen8_xcs_signal;
> -		GEN8_RING_SEMAPHORE_INIT;
> -	}
> -	ring->init = init_ring_common;
> +	engine->irq_get = gen8_irq_get;
> +	engine->irq_put = gen8_irq_put;
> +	gen8_engine_init_semaphore(engine);
>   
> -	return intel_init_ring_buffer(dev, ring);
> +	return intel_engine_enable_execlists(engine);
>   }
>   
> -int intel_init_blt_ring_buffer(struct drm_device *dev)
> +int intel_init_blt_engine(struct drm_i915_private *dev_priv)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[BCS];
> +	struct intel_engine_cs *engine = &dev_priv->engine[BCS];
> +	int ret;
> +
> +	ret = intel_engine_init(engine, dev_priv);
> +	if (ret)
> +		return ret;
>   
> -	ring->name = "blitter ring";
> -	ring->id = BCS;
> +	engine->name = "blitter ring";
> +	engine->id = BCS;
> +	engine->power_domains = FORCEWAKE_MEDIA;
> +	engine->mmio_base = BLT_RING_BASE;
>   
> -	ring->mmio_base = BLT_RING_BASE;
> -	ring->write_tail = ring_write_tail;
> -	ring->flush = gen6_ring_flush;
> -	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
> -	if (INTEL_INFO(dev)->gen >= 8) {
> -		ring->irq_enable_mask =
> +	engine->emit_flush = gen6_blt_emit_flush;
> +	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +	if (INTEL_INFO(dev_priv)->gen >= 8) {
> +		engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> -		ring->irq_get = gen8_ring_get_irq;
> -		ring->irq_put = gen8_ring_put_irq;
> -		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			ring->semaphore.sync_to = gen8_ring_sync;
> -			ring->semaphore.signal = gen8_xcs_signal;
> -			GEN8_RING_SEMAPHORE_INIT;
> -		}
> +		engine->irq_get = gen8_irq_get;
> +		engine->irq_put = gen8_irq_put;
> +		engine->emit_batchbuffer = gen8_emit_batchbuffer;
> +		gen8_engine_init_semaphore(engine);
>   	} else {
> -		ring->irq_enable_mask = GT_BLT_USER_INTERRUPT;
> -		ring->irq_get = gen6_ring_get_irq;
> -		ring->irq_put = gen6_ring_put_irq;
> -		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			ring->semaphore.signal = gen6_signal;
> -			ring->semaphore.sync_to = gen6_ring_sync;
> +		engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
> +		engine->irq_get = gen6_irq_get;
> +		engine->irq_barrier = gen6_irq_barrier;
> +		engine->irq_put = gen6_irq_put;
> +		engine->emit_batchbuffer = gen6_emit_batchbuffer;
> +		if (semaphores_enabled(dev_priv)) {
> +			engine->semaphore.signal = gen6_emit_signal;
> +			engine->semaphore.wait = gen6_emit_wait;
>   			/*
>   			 * The current semaphore is only applied on pre-gen8
>   			 * platform.  And there is no VCS2 ring on the pre-gen8
> @@ -2545,124 +2196,510 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>   			 * initialized as INVALID.  Gen8 will initialize the
>   			 * sema between BCS and VCS2 later.
>   			 */
> -			ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_BR;
> -			ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_BV;
> -			ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_BVE;
> -			ring->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.signal[RCS] = GEN6_RBSYNC;
> -			ring->semaphore.mbox.signal[VCS] = GEN6_VBSYNC;
> -			ring->semaphore.mbox.signal[BCS] = GEN6_NOSYNC;
> -			ring->semaphore.mbox.signal[VECS] = GEN6_VEBSYNC;
> -			ring->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
> +			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_BR;
> +			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_BV;
> +			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_BVE;
> +			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.signal[RCS] = GEN6_RBSYNC;
> +			engine->semaphore.mbox.signal[VCS] = GEN6_VBSYNC;
> +			engine->semaphore.mbox.signal[BCS] = GEN6_NOSYNC;
> +			engine->semaphore.mbox.signal[VECS] = GEN6_VEBSYNC;
> +			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
>   		}
>   	}
> -	ring->init = init_ring_common;
>   
> -	return intel_init_ring_buffer(dev, ring);
> +	return intel_engine_enable_execlists(engine);
>   }
>   
> -int intel_init_vebox_ring_buffer(struct drm_device *dev)
> +int intel_init_vebox_engine(struct drm_i915_private *dev_priv)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_engine_cs *ring = &dev_priv->ring[VECS];
> +	struct intel_engine_cs *engine = &dev_priv->engine[VECS];
> +	int ret;
>   
> -	ring->name = "video enhancement ring";
> -	ring->id = VECS;
> +	ret = intel_engine_init(engine, dev_priv);
> +	if (ret)
> +		return ret;
>   
> -	ring->mmio_base = VEBOX_RING_BASE;
> -	ring->write_tail = ring_write_tail;
> -	ring->flush = gen6_ring_flush;
> -	ring->add_request = gen6_add_request;
> -	ring->get_seqno = gen6_ring_get_seqno;
> -	ring->set_seqno = ring_set_seqno;
> +	engine->name = "video enhancement ring";
> +	engine->id = VECS;
> +	engine->power_domains = FORCEWAKE_MEDIA;
> +	engine->mmio_base = VEBOX_RING_BASE;
>   
> -	if (INTEL_INFO(dev)->gen >= 8) {
> -		ring->irq_enable_mask =
> +	engine->emit_flush = gen6_blt_emit_flush;
> +	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> +
> +	if (INTEL_INFO(dev_priv)->gen >= 8) {
> +		engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> -		ring->irq_get = gen8_ring_get_irq;
> -		ring->irq_put = gen8_ring_put_irq;
> -		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			ring->semaphore.sync_to = gen8_ring_sync;
> -			ring->semaphore.signal = gen8_xcs_signal;
> -			GEN8_RING_SEMAPHORE_INIT;
> -		}
> +		engine->irq_get = gen8_irq_get;
> +		engine->irq_put = gen8_irq_put;
> +		engine->emit_batchbuffer = gen8_emit_batchbuffer;
> +		gen8_engine_init_semaphore(engine);
>   	} else {
> -		ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
> -		ring->irq_get = hsw_vebox_get_irq;
> -		ring->irq_put = hsw_vebox_put_irq;
> -		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> -		if (i915_semaphore_is_enabled(dev)) {
> -			ring->semaphore.sync_to = gen6_ring_sync;
> -			ring->semaphore.signal = gen6_signal;
> -			ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VER;
> -			ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_VEV;
> -			ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VEB;
> -			ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> -			ring->semaphore.mbox.signal[RCS] = GEN6_RVESYNC;
> -			ring->semaphore.mbox.signal[VCS] = GEN6_VVESYNC;
> -			ring->semaphore.mbox.signal[BCS] = GEN6_BVESYNC;
> -			ring->semaphore.mbox.signal[VECS] = GEN6_NOSYNC;
> -			ring->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
> +		engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
> +		engine->irq_get = hsw_vebox_irq_get;
> +		engine->irq_barrier = gen6_irq_barrier;
> +		engine->irq_put = hsw_vebox_irq_put;
> +		engine->emit_batchbuffer = gen6_emit_batchbuffer;
> +		if (semaphores_enabled(dev_priv)) {
> +			engine->semaphore.wait = gen6_emit_wait;
> +			engine->semaphore.signal = gen6_emit_signal;
> +			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VER;
> +			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_VEV;
> +			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VEB;
> +			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
> +			engine->semaphore.mbox.signal[RCS] = GEN6_RVESYNC;
> +			engine->semaphore.mbox.signal[VCS] = GEN6_VVESYNC;
> +			engine->semaphore.mbox.signal[BCS] = GEN6_BVESYNC;
> +			engine->semaphore.mbox.signal[VECS] = GEN6_NOSYNC;
> +			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
>   		}
>   	}
> -	ring->init = init_ring_common;
>   
> -	return intel_init_ring_buffer(dev, ring);
> +	return intel_engine_enable_execlists(engine);
>   }
>   
>   int
> -intel_ring_flush_all_caches(struct intel_engine_cs *ring)
> +intel_engine_flush(struct intel_engine_cs *engine,
> +		   struct intel_context *ctx)
>   {
> +	struct i915_gem_request *rq;
>   	int ret;
>   
> -	if (!ring->gpu_caches_dirty)
> +	rq = intel_engine_alloc_request(engine, ctx);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	ret = i915_request_emit_breadcrumb(rq);
> +	if (ret == 0)
> +		ret = i915_request_commit(rq);
> +	i915_request_put(rq);
> +
> +	return ret;
> +}
> +
> +int intel_engine_sync(struct intel_engine_cs *engine)
> +{
> +	/* Wait upon the last request to be completed */
> +	if (engine->last_request == NULL)
>   		return 0;
>   
> -	ret = ring->flush(ring, 0, I915_GEM_GPU_DOMAINS);
> +	return i915_request_wait(engine->last_request);
> +}
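
As an aside, if I'm reading the new API right, the old "flush all caches then idle
the ring" dance becomes a flush request plus a sync. A rough sketch of a caller, my
own example rather than anything in the patch (and assuming struct_mutex is held as
before):

static int example_engine_quiesce(struct intel_engine_cs *engine)
{
	int ret;

	/* emit an (empty) request so any dirty GPU caches get flushed */
	ret = intel_engine_flush(engine, engine->default_context);
	if (ret)
		return ret;

	/* then block until engine->last_request has completed */
	return intel_engine_sync(engine);
}
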
> +
> +static u32
> +next_seqno(struct drm_i915_private *i915)
> +{
> +	/* reserve 0 for non-seqno */
> +	if (++i915->next_seqno == 0)
> +		++i915->next_seqno;
> +	return i915->next_seqno;
> +}
> +
> +struct i915_gem_request *
> +intel_engine_alloc_request(struct intel_engine_cs *engine,
> +			   struct intel_context *ctx)
> +{
> +	struct intel_ringbuffer *ring;
> +	struct i915_gem_request *rq;
> +	int ret, n;
> +
> +	ring = ctx->ring[engine->id].ring;
> +	if (ring == NULL) {
> +		ring = engine->get_ring(engine, ctx);
> +		if (IS_ERR(ring))
> +			return ERR_CAST(ring);
> +
> +		ctx->ring[engine->id].ring = ring;
> +	}
> +
> +	rq = kzalloc(sizeof(*rq), GFP_KERNEL);
> +	if (rq == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&rq->kref);
> +	INIT_LIST_HEAD(&rq->vmas);
> +	INIT_LIST_HEAD(&rq->breadcrumb_link);
> +
> +	rq->i915 = engine->i915;
> +	rq->ring = ring;
> +	rq->engine = engine;
> +
> +	rq->reset_counter = atomic_read(&rq->i915->gpu_error.reset_counter);
> +	if (rq->reset_counter & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED)) {
> +		ret = rq->reset_counter & I915_WEDGED ? -EIO : -EAGAIN;
> +		goto err;
> +	}
> +
> +	rq->seqno = next_seqno(rq->i915);
> +	memcpy(rq->semaphore, engine->semaphore.sync, sizeof(rq->semaphore));
> +	for (n = 0; n < ARRAY_SIZE(rq->semaphore); n++)
> +		if (__i915_seqno_passed(rq->semaphore[n], rq->seqno))
> +			rq->semaphore[n] = 0;
> +	rq->head = ring->tail;
> +	rq->outstanding = true;
> +	rq->pending_flush = ring->pending_flush;
> +
> +	rq->ctx = ctx;
> +	i915_gem_context_reference(rq->ctx);
> +
> +	ret = i915_request_switch_context(rq);
>   	if (ret)
> -		return ret;
> +		goto err_ctx;
> +
> +	return rq;
> +
> +err_ctx:
> +	i915_gem_context_unreference(ctx);
> +err:
> +	kfree(rq);
> +	return ERR_PTR(ret);
> +}
> +
> +struct i915_gem_request *
> +intel_engine_seqno_to_request(struct intel_engine_cs *engine,
> +			      u32 seqno)
> +{
> +	struct i915_gem_request *rq;
> +
> +	list_for_each_entry(rq, &engine->requests, engine_list) {
> +		if (rq->seqno == seqno)
> +			return rq;
> +
> +		if (__i915_seqno_passed(rq->seqno, seqno))
> +			break;
> +	}
> +
> +	return NULL;
> +}
> +
> +void intel_engine_cleanup(struct intel_engine_cs *engine)
> +{
> +	WARN_ON(engine->last_request);
> +
> +	if (engine->cleanup)
> +		engine->cleanup(engine);
> +}
> +
> +static void intel_engine_clear_rings(struct intel_engine_cs *engine)
> +{
> +	struct intel_ringbuffer *ring;
> +
> +	list_for_each_entry(ring, &engine->rings, engine_list) {
> +		if (ring->retired_head != -1) {
> +			ring->head = ring->retired_head;
> +			ring->retired_head = -1;
> +
> +			ring->space = intel_ring_space(ring);
> +		}
> +
> +		if (ring->last_context != NULL) {
> +			struct drm_i915_gem_object *obj;
> +
> +			obj = ring->last_context->ring[engine->id].state;
> +			if (obj)
> +				i915_gem_object_ggtt_unpin(obj);
> +
> +			ring->last_context = NULL;
> +		}
> +	}
> +}
> +
> +int intel_engine_suspend(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	int ret = 0;
> +
> +	if (WARN_ON(!intel_engine_initialized(engine)))
> +		return 0;
> +
> +	I915_WRITE_IMR(engine, ~0);
> +
> +	if (engine->suspend)
> +		ret = engine->suspend(engine);
> +
> +	intel_engine_clear_rings(engine);
> +
> +	return ret;
> +}
> +
> +int intel_engine_resume(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	int ret = 0;
> +
> +	if (WARN_ON(!intel_engine_initialized(engine)))
> +		return 0;
> +
> +	if (engine->resume)
> +		ret = engine->resume(engine);
> +
> +	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
> +	return ret;
> +}
> +
> +int intel_engine_retire(struct intel_engine_cs *engine,
> +			u32 seqno)
> +{
> +	int count;
> +
> +	if (engine->retire)
> +		engine->retire(engine, seqno);
> +
> +	count = 0;
> +	while (!list_empty(&engine->requests)) {
> +		struct i915_gem_request *rq;
> +
> +		rq = list_first_entry(&engine->requests,
> +				      struct i915_gem_request,
> +				      engine_list);
> +
> +		if (!__i915_seqno_passed(seqno, rq->seqno))
> +			break;
> +
> +		i915_request_retire(rq);
> +		count++;
> +	}
> +
> +	if (unlikely(engine->trace_irq_seqno &&
> +		     __i915_seqno_passed(seqno, engine->trace_irq_seqno))) {
> +		engine->irq_put(engine);
> +		engine->trace_irq_seqno = 0;
> +	}
> +
> +	return count;
> +}
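
(Quick sanity check on the retire loop: with requests 10, 11 and 12 outstanding,
intel_engine_retire(engine, 11) should retire 10 and 11, leave 12 on the list and
return 2, with __i915_seqno_passed() taking care of any wraparound.)
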
> +
> +static struct i915_gem_request *
> +find_active_batch(struct list_head *list)
> +{
> +	struct i915_gem_request *rq, *last = NULL;
> +
> +	list_for_each_entry(rq, list, engine_list) {
> +		if (rq->batch == NULL)
> +			continue;
>   
> -	trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS);
> +		if (!__i915_request_complete__wa(rq))
> +			return rq;
> +
> +		last = rq;
> +	}
> +
> +	return last;
> +}
> +
> +static bool context_is_banned(const struct intel_context *ctx,
> +			      unsigned long now)
> +{
> +	const struct i915_ctx_hang_stats *hs = &ctx->hang_stats;
> +
> +	if (hs->banned)
> +		return true;
> +
> +	if (hs->ban_period_seconds == 0)
> +		return false;
> +
> +	if (now - hs->guilty_ts <= hs->ban_period_seconds) {
> +		if (!i915_gem_context_is_default(ctx)) {
> +			DRM_DEBUG("context hanging too fast, banning!\n");
> +			return true;
> +		} else if (i915_stop_ring_allow_ban(ctx->i915)) {
> +			if (i915_stop_ring_allow_warn(ctx->i915))
> +				DRM_ERROR("gpu hanging too fast, banning!\n");
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
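
(Worked example of the ban window, as I read it: a non-default context marked guilty
at guilty_ts = 100s that hangs again at now = 104s with ban_period_seconds = 6 hits
now - guilty_ts = 4 <= 6 and is banned; the default context is only banned when the
stop_rings override allows it. Looks equivalent to the old behaviour.)
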
> +
> +static void
> +intel_engine_hangstats(struct intel_engine_cs *engine)
> +{
> +	struct i915_ctx_hang_stats *hs;
> +	struct i915_gem_request *rq;
> +
> +	rq = find_active_batch(&engine->requests);
> +	if (rq == NULL)
> +		return;
> +
> +	hs = &rq->ctx->hang_stats;
> +	if (engine->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
> +		unsigned long now = get_seconds();
> +		hs->banned = context_is_banned(rq->ctx, now);
> +		hs->guilty_ts = now;
> +		hs->batch_active++;
> +	} else
> +		hs->batch_pending++;
> +
> +	list_for_each_entry_continue(rq, &engine->requests, engine_list) {
> +		if (rq->batch == NULL)
> +			continue;
> +
> +		if (__i915_request_complete__wa(rq))
> +			continue;
> +
> +		rq->ctx->hang_stats.batch_pending++;
> +	}
> +}
> +
> +void intel_engine_reset(struct intel_engine_cs *engine)
> +{
> +	if (WARN_ON(!intel_engine_initialized(engine)))
> +		return;
> +
> +	if (engine->reset)
> +		engine->reset(engine);
> +
> +	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
> +	intel_engine_hangstats(engine);
> +
> +	intel_engine_retire(engine, engine->i915->next_seqno);
> +	intel_engine_clear_rings(engine);
> +}
> +
> +static int ring_wait(struct intel_ringbuffer *ring, int n)
> +{
> +	int ret;
> +
> +	trace_intel_ringbuffer_wait(ring, n);
> +
> +	do {
> +		struct i915_gem_request *rq;
> +
> +		i915_gem_retire_requests__engine(ring->engine);
> +		if (ring->retired_head != -1) {
> +			ring->head = ring->retired_head;
> +			ring->retired_head = -1;
> +
> +			ring->space = intel_ring_space(ring);
> +			if (ring->space >= n)
> +				return 0;
> +		}
> +
> +		list_for_each_entry(rq, &ring->breadcrumbs, breadcrumb_link)
> +			if (__intel_ring_space(rq->tail, ring->tail,
> +					       ring->size, I915_RING_RSVD) >= n)
> +				break;
> +
> +		if (WARN_ON(&rq->breadcrumb_link == &ring->breadcrumbs))
> +			return -EDEADLK;
> +
> +		ret = i915_request_wait(rq);
> +	} while (ret == 0);
> +
> +	return ret;
> +}
> +
> +static int ring_wrap(struct intel_ringbuffer *ring, int bytes)
> +{
> +	uint32_t __iomem *virt;
> +	int rem;
> +
> +	rem = ring->size - ring->tail;
> +	if (unlikely(ring->space < rem)) {
> +		rem = ring_wait(ring, rem);
> +		if (rem)
> +			return rem;
> +	}
> +
> +	trace_intel_ringbuffer_wrap(ring, rem);
> +
> +	virt = ring->virtual_start + ring->tail;
> +	rem = ring->size - ring->tail;
> +
> +	ring->space -= rem;
> +	ring->tail = 0;
> +
> +	rem /= 4;
> +	while (rem--)
> +		iowrite32(MI_NOOP, virt++);
>   
> -	ring->gpu_caches_dirty = false;
>   	return 0;
>   }
>   
> -int
> -intel_ring_invalidate_all_caches(struct intel_engine_cs *ring)
> +static int __intel_ring_prepare(struct intel_ringbuffer *ring,
> +				int bytes)
>   {
> -	uint32_t flush_domains;
>   	int ret;
>   
> -	flush_domains = 0;
> -	if (ring->gpu_caches_dirty)
> -		flush_domains = I915_GEM_GPU_DOMAINS;
> +	trace_intel_ringbuffer_begin(ring, bytes);
>   
> -	ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, flush_domains);
> -	if (ret)
> -		return ret;
> +	if (unlikely(ring->tail + bytes > ring->effective_size)) {
> +		ret = ring_wrap(ring, bytes);
> +		if (unlikely(ret))
> +			return ret;
> +	}
>   
> -	trace_i915_gem_ring_flush(ring, I915_GEM_GPU_DOMAINS, flush_domains);
> +	if (unlikely(ring->space < bytes)) {
> +		ret = ring_wait(ring, bytes);
> +		if (unlikely(ret))
> +			return ret;
> +	}
>   
> -	ring->gpu_caches_dirty = false;
>   	return 0;
>   }
>   
> -void
> -intel_stop_ring_buffer(struct intel_engine_cs *ring)
> +struct intel_ringbuffer *
> +intel_ring_begin(struct i915_gem_request *rq,
> +		 int num_dwords)
>   {
> +	struct intel_ringbuffer *ring = rq->ring;
>   	int ret;
>   
> -	if (!intel_ring_initialized(ring))
> -		return;
> +	/* TAIL updates must be aligned to a qword, so make sure we
> +	 * reserve space for any implicit padding required for this
> +	 * command.
> +	 */
> +	ret = __intel_ring_prepare(ring,
> +				   ALIGN(num_dwords, 2) * sizeof(uint32_t));
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ring->space -= num_dwords * sizeof(uint32_t);
> +
> +	return ring;
> +}
> +
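
Returning the ring from intel_ring_begin() does make the emitters read nicely. For
reference, an emit function under the new scheme would look roughly like this
(hand-rolled sketch, example_emit is my name and not from the patch):

static int example_emit(struct i915_gem_request *rq)
{
	struct intel_ringbuffer *ring;

	/* reserve two dwords in the request's ring */
	ring = intel_ring_begin(rq, 2);
	if (IS_ERR(ring))
		return PTR_ERR(ring);

	intel_ring_emit(ring, MI_NOOP);	/* placeholder commands */
	intel_ring_emit(ring, MI_NOOP);
	intel_ring_advance(ring);

	return 0;
}
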
> +/* Align the ring tail to a cacheline boundary */
> +int intel_ring_cacheline_align(struct i915_gem_request *rq)
> +{
> +	struct intel_ringbuffer *ring;
> +	int tail, num_dwords;
> +
> +	do {
> +		tail = rq->ring->tail;
> +		num_dwords = (tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
> +		if (num_dwords == 0)
> +			return 0;
> +
> +		num_dwords = CACHELINE_BYTES / sizeof(uint32_t) - num_dwords;
> +		ring = intel_ring_begin(rq, num_dwords);
> +		if (IS_ERR(ring))
> +			return PTR_ERR(ring);
> +	} while (tail != rq->ring->tail);
> +
> +	while (num_dwords--)
> +		intel_ring_emit(ring, MI_NOOP);
> +
> +	intel_ring_advance(ring);
> +
> +	return 0;
> +}
> +
> +struct i915_gem_request *
> +intel_engine_find_active_batch(struct intel_engine_cs *engine)
> +{
> +	struct i915_gem_request *rq;
> +	unsigned long flags;
>   
> -	ret = intel_ring_idle(ring);
> -	if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
> -		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
> -			  ring->name, ret);
> +	spin_lock_irqsave(&engine->irqlock, flags);
> +	rq = find_active_batch(&engine->submitted);
> +	spin_unlock_irqrestore(&engine->irqlock, flags);
> +	if (rq)
> +		return rq;
>   
> -	stop_ring(ring);
> +	return find_active_batch(&engine->requests);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d689bac5c84f..46c8d2288821 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -20,61 +20,47 @@
>    * "If the Ring Buffer Head Pointer and the Tail Pointer are on the same
>    * cacheline, the Head Pointer must not be greater than the Tail
>    * Pointer."
> + *
> + * To also accommodate errata on 830/845 which make the last pair of cachelines
> + * in the ringbuffer unavailable, reduce the available space further.
>    */
> -#define I915_RING_FREE_SPACE 64
> +#define I915_RING_RSVD (2*CACHELINE_BYTES)
>   
> -struct  intel_hw_status_page {
> +struct intel_hw_status_page {
>   	u32		*page_addr;
>   	unsigned int	gfx_addr;
>   	struct		drm_i915_gem_object *obj;
>   };
>   
> -#define I915_READ_TAIL(ring) I915_READ(RING_TAIL((ring)->mmio_base))
> -#define I915_WRITE_TAIL(ring, val) I915_WRITE(RING_TAIL((ring)->mmio_base), val)
> +#define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
> +#define I915_WRITE_TAIL(engine, val) I915_WRITE(RING_TAIL((engine)->mmio_base), val)
>   
> -#define I915_READ_START(ring) I915_READ(RING_START((ring)->mmio_base))
> -#define I915_WRITE_START(ring, val) I915_WRITE(RING_START((ring)->mmio_base), val)
> +#define I915_READ_START(engine) I915_READ(RING_START((engine)->mmio_base))
> +#define I915_WRITE_START(engine, val) I915_WRITE(RING_START((engine)->mmio_base), val)
>   
> -#define I915_READ_HEAD(ring)  I915_READ(RING_HEAD((ring)->mmio_base))
> -#define I915_WRITE_HEAD(ring, val) I915_WRITE(RING_HEAD((ring)->mmio_base), val)
> +#define I915_READ_HEAD(engine)  I915_READ(RING_HEAD((engine)->mmio_base))
> +#define I915_WRITE_HEAD(engine, val) I915_WRITE(RING_HEAD((engine)->mmio_base), val)
>   
> -#define I915_READ_CTL(ring) I915_READ(RING_CTL((ring)->mmio_base))
> -#define I915_WRITE_CTL(ring, val) I915_WRITE(RING_CTL((ring)->mmio_base), val)
> +#define I915_READ_CTL(engine) I915_READ(RING_CTL((engine)->mmio_base))
> +#define I915_WRITE_CTL(engine, val) I915_WRITE(RING_CTL((engine)->mmio_base), val)
>   
> -#define I915_READ_IMR(ring) I915_READ(RING_IMR((ring)->mmio_base))
> -#define I915_WRITE_IMR(ring, val) I915_WRITE(RING_IMR((ring)->mmio_base), val)
> +#define I915_READ_IMR(engine) I915_READ(RING_IMR((engine)->mmio_base))
> +#define I915_WRITE_IMR(engine, val) I915_WRITE(RING_IMR((engine)->mmio_base), val)
>   
> -#define I915_READ_MODE(ring) I915_READ(RING_MI_MODE((ring)->mmio_base))
> -#define I915_WRITE_MODE(ring, val) I915_WRITE(RING_MI_MODE((ring)->mmio_base), val)
> +#define I915_READ_MODE(engine) I915_READ(RING_MI_MODE((engine)->mmio_base))
> +#define I915_WRITE_MODE(engine, val) I915_WRITE(RING_MI_MODE((engine)->mmio_base), val)
>   
>   /* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
>    * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
>    */
>   #define i915_semaphore_seqno_size sizeof(uint64_t)
> -#define GEN8_SIGNAL_OFFSET(__ring, to)			     \
> -	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> -	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) +	\
> -	(i915_semaphore_seqno_size * (to)))
> -
> -#define GEN8_WAIT_OFFSET(__ring, from)			     \
> -	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> -	((from) * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> -	(i915_semaphore_seqno_size * (__ring)->id))
> -
> -#define GEN8_RING_SEMAPHORE_INIT do { \
> -	if (!dev_priv->semaphore_obj) { \
> -		break; \
> -	} \
> -	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
> -	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
> -	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
> -	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \
> -	ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(ring, VCS2); \
> -	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
> -	} while(0)
> -
> -enum intel_ring_hangcheck_action {
> +#define GEN8_SEMAPHORE_OFFSET(__dp, __from, __to)			     \
> +	(i915_gem_obj_ggtt_offset((__dp)->semaphore_obj) + \
> +	 ((__from) * I915_NUM_ENGINES + (__to)) * i915_semaphore_seqno_size)
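
(The flattened gen8 semaphore table took me a moment: with I915_NUM_ENGINES == 5 and
i915_semaphore_seqno_size == sizeof(uint64_t), the slot RCS signals for VCS to wait
on works out as

	GEN8_SEMAPHORE_OFFSET(dev_priv, RCS, VCS)
		= ggtt_offset(semaphore_obj) + (0 * 5 + 1) * 8
		= ggtt_offset(semaphore_obj) + 8

i.e. a 5x5 array of qwords indexed [from][to], if I've read the macro correctly.)
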
> +
> +enum intel_engine_hangcheck_action {
>   	HANGCHECK_IDLE = 0,
> +	HANGCHECK_IDLE_WAITERS,
>   	HANGCHECK_WAIT,
>   	HANGCHECK_ACTIVE,
>   	HANGCHECK_ACTIVE_LOOP,
> @@ -84,47 +70,61 @@ enum intel_ring_hangcheck_action {
>   
>   #define HANGCHECK_SCORE_RING_HUNG 31
>   
> -struct intel_ring_hangcheck {
> +struct intel_engine_hangcheck {
>   	u64 acthd;
>   	u64 max_acthd;
>   	u32 seqno;
> +	u32 interrupts;
>   	int score;
> -	enum intel_ring_hangcheck_action action;
> +	enum intel_engine_hangcheck_action action;
>   	int deadlock;
>   };
>   
> +struct i915_gem_request;
> +struct intel_context;
> +struct intel_engine_cs;
> +
>   struct intel_ringbuffer {
> +	struct intel_context *last_context;
> +
> +	struct intel_engine_cs *engine;
> +	struct intel_context *ctx;
> +	struct list_head engine_list;
> +
>   	struct drm_i915_gem_object *obj;
>   	void __iomem *virtual_start;
>   
> -	struct intel_engine_cs *ring;
> -
> -	/*
> -	 * FIXME: This backpointer is an artifact of the history of how the
> -	 * execlist patches came into being. It will get removed once the basic
> -	 * code has landed.
> +	/**
> +	 * List of breadcrumbs associated with GPU requests currently
> +	 * outstanding.
>   	 */
> -	struct intel_context *FIXME_lrc_ctx;
> +	struct list_head requests;
> +	struct list_head breadcrumbs;
>   
> -	u32 head;
> -	u32 tail;
> +	int head;
> +	int tail;
>   	int space;
> +
>   	int size;
>   	int effective_size;
>   
>   	/** We track the position of the requests in the ring buffer, and
> -	 * when each is retired we increment last_retired_head as the GPU
> +	 * when each is retired we increment retired_head as the GPU
>   	 * must have finished processing the request and so we know we
>   	 * can advance the ringbuffer up to that position.
>   	 *
> -	 * last_retired_head is set to -1 after the value is consumed so
> +	 * retired_head is set to -1 after the value is consumed so
>   	 * we can detect new retirements.
>   	 */
> -	u32 last_retired_head;
> +	int retired_head;
> +	int breadcrumb_tail;
> +
> +	unsigned pending_flush:4;
>   };
>   
> -struct  intel_engine_cs {
> -	const char	*name;
> +struct intel_engine_cs {
> +	struct drm_i915_private *i915;
> +	const char *name;
>   	enum intel_ring_id {
>   		RCS = 0x0,
>   		VCS,
> @@ -132,46 +132,82 @@ struct  intel_engine_cs {
>   		VECS,
>   		VCS2
>   	} id;
> -#define I915_NUM_RINGS 5
> +#define I915_NUM_ENGINES 5
> +#define I915_NUM_ENGINE_BITS 4
>   #define LAST_USER_RING (VECS + 1)
> -	u32		mmio_base;
> -	struct		drm_device *dev;
> -	struct intel_ringbuffer *buffer;
> +	u32 mmio_base;
> +	u32 power_domains;
>   
> -	struct intel_hw_status_page status_page;
> +	/* protects requests against hangcheck */
> +	spinlock_t lock;
> +	/* protects execlists: pending + submitted */
> +	spinlock_t irqlock;
>   
> -	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
> -	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
> -	u32		trace_irq_seqno;
> -	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
> -	void		(*irq_put)(struct intel_engine_cs *ring);
> +	atomic_t interrupts;
> +	u32 breadcrumb[I915_NUM_ENGINES];
> +	u16 tag, next_tag;
> +
> +	struct list_head rings;
> +	struct list_head requests;
> +	struct list_head pending, submitted;
> +	struct i915_gem_request *last_request;
>   
> -	int		(*init)(struct intel_engine_cs *ring);
> +	struct intel_hw_status_page status_page;
>   
> -	int		(*init_context)(struct intel_engine_cs *ring);
> +	struct intel_ringbuffer *legacy_ring;
> +
> +	unsigned irq_refcount; /* protected by i915->irq_lock */
> +	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
> +	u32             irq_keep_mask; /* never mask these interrupts */
> +	u32		trace_irq_seqno;
> +	bool __must_check (*irq_get)(struct intel_engine_cs *engine);
> +	void		(*irq_barrier)(struct intel_engine_cs *engine);
> +	void		(*irq_put)(struct intel_engine_cs *engine);
> +
> +	struct intel_ringbuffer *
> +			(*get_ring)(struct intel_engine_cs *engine,
> +				    struct intel_context *ctx);
> +	void		(*put_ring)(struct intel_ringbuffer *ring,
> +				    struct intel_context *ctx);
> +
> +	void		(*retire)(struct intel_engine_cs *engine,
> +				  u32 seqno);
> +	void		(*reset)(struct intel_engine_cs *engine);
> +	int		(*suspend)(struct intel_engine_cs *engine);
> +	int		(*resume)(struct intel_engine_cs *engine);
> +	void		(*cleanup)(struct intel_engine_cs *engine);
>   
> -	void		(*write_tail)(struct intel_engine_cs *ring,
> -				      u32 value);
> -	int __must_check (*flush)(struct intel_engine_cs *ring,
> -				  u32	invalidate_domains,
> -				  u32	flush_domains);
> -	int		(*add_request)(struct intel_engine_cs *ring);
>   	/* Some chipsets are not quite as coherent as advertised and need
>   	 * an expensive kick to force a true read of the up-to-date seqno.
>   	 * However, the up-to-date seqno is not always required and the last
>   	 * seen value is good enough. Note that the seqno will always be
>   	 * monotonic, even if not coherent.
>   	 */
> -	u32		(*get_seqno)(struct intel_engine_cs *ring,
> -				     bool lazy_coherency);
> -	void		(*set_seqno)(struct intel_engine_cs *ring,
> +	u32		(*get_seqno)(struct intel_engine_cs *engine);
> +	void		(*set_seqno)(struct intel_engine_cs *engine,
>   				     u32 seqno);
> -	int		(*dispatch_execbuffer)(struct intel_engine_cs *ring,
> -					       u64 offset, u32 length,
> -					       unsigned flags);
> +
> +	int		(*init_context)(struct i915_gem_request *rq);
> +
> +	int __must_check (*emit_flush)(struct i915_gem_request *rq,
> +				       u32 domains);
> +#define I915_FLUSH_CACHES 0x1
> +#define I915_INVALIDATE_CACHES 0x2
> +#define I915_KICK_FBC 0x4
> +#define I915_COMMAND_BARRIER 0x8
> +	int __must_check (*emit_batchbuffer)(struct i915_gem_request *rq,
> +					     u64 offset, u32 length,
> +					     unsigned flags);
> +	int __must_check (*emit_breadcrumb)(struct i915_gem_request *rq);
> +
> +	int __must_check (*add_request)(struct i915_gem_request *rq);
> +	void		(*write_tail)(struct intel_engine_cs *engine,
> +				      u32 value);
> +
> +	bool (*is_complete)(struct i915_gem_request *rq);
> +
>   #define I915_DISPATCH_SECURE 0x1
>   #define I915_DISPATCH_PINNED 0x2
> -	void		(*cleanup)(struct intel_engine_cs *ring);
>   
>   	/* GEN8 signal/wait table - never trust comments!
>   	 *	  signal to	signal to    signal to   signal to      signal to
> @@ -211,38 +247,24 @@ struct  intel_engine_cs {
>   	 *  ie. transpose of f(x, y)
>   	 */
>   	struct {
> -		u32	sync_seqno[I915_NUM_RINGS-1];
> -
> -		union {
> -			struct {
> -				/* our mbox written by others */
> -				u32		wait[I915_NUM_RINGS];
> -				/* mboxes this ring signals to */
> -				u32		signal[I915_NUM_RINGS];
> -			} mbox;
> -			u64		signal_ggtt[I915_NUM_RINGS];
> -		};
> -
> -		/* AKA wait() */
> -		int	(*sync_to)(struct intel_engine_cs *ring,
> -				   struct intel_engine_cs *to,
> -				   u32 seqno);
> -		int	(*signal)(struct intel_engine_cs *signaller,
> -				  /* num_dwords needed by caller */
> -				  unsigned int num_dwords);
> +		struct {
> +			/* our mbox written by others */
> +			u32		wait[I915_NUM_ENGINES];
> +			/* mboxes this ring signals to */
> +			u32		signal[I915_NUM_ENGINES];
> +		} mbox;
> +
> +		int	(*wait)(struct i915_gem_request *waiter,
> +				struct i915_gem_request *signaller);
> +		int	(*signal)(struct i915_gem_request *rq, int id);
> +
> +		u32 sync[I915_NUM_ENGINES];
>   	} semaphore;
>   
>   	/* Execlists */
> -	spinlock_t execlist_lock;
> -	struct list_head execlist_queue;
> +	bool execlists_enabled;
> +	u32 execlists_submitted;
>   	u8 next_context_status_buffer;
> -	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
> -	int		(*emit_request)(struct intel_ringbuffer *ringbuf);
> -	int		(*emit_flush)(struct intel_ringbuffer *ringbuf,
> -				      u32 invalidate_domains,
> -				      u32 flush_domains);
> -	int		(*emit_bb_start)(struct intel_ringbuffer *ringbuf,
> -					 u64 offset, unsigned flags);
>   
>   	/**
>   	 * List of objects currently involved in rendering from the
> @@ -254,28 +276,13 @@ struct  intel_engine_cs {
>   	 *
>   	 * A reference is held on the buffer while on this list.
>   	 */
> -	struct list_head active_list;
> -
> -	/**
> -	 * List of breadcrumbs associated with GPU requests currently
> -	 * outstanding.
> -	 */
> -	struct list_head request_list;
> -
> -	/**
> -	 * Do we have some not yet emitted requests outstanding?
> -	 */
> -	struct drm_i915_gem_request *preallocated_lazy_request;
> -	u32 outstanding_lazy_seqno;
> -	bool gpu_caches_dirty;
> -	bool fbc_dirty;
> +	struct list_head read_list, write_list, fence_list;
>   
>   	wait_queue_head_t irq_queue;
>   
>   	struct intel_context *default_context;
> -	struct intel_context *last_context;
>   
> -	struct intel_ring_hangcheck hangcheck;
> +	struct intel_engine_hangcheck hangcheck;
>   
>   	struct {
>   		struct drm_i915_gem_object *obj;
> @@ -317,49 +324,32 @@ struct  intel_engine_cs {
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
>   };
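
The emit_flush() flag bits are a welcome change from the old invalidate/flush domain
pairs. If I follow the intent, an invalidate plus FBC kick ahead of a batch ends up
as something like the below (direct vfunc call purely for illustration; presumably
this normally goes through a request-level helper elsewhere in the series):

	ret = rq->engine->emit_flush(rq, I915_INVALIDATE_CACHES | I915_KICK_FBC);
	if (ret)
		return ret;
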
>   
> -bool intel_ring_initialized(struct intel_engine_cs *ring);
> -
> -static inline unsigned
> -intel_ring_flag(struct intel_engine_cs *ring)
> +static inline bool
> +intel_engine_initialized(struct intel_engine_cs *engine)
>   {
> -	return 1 << ring->id;
> +	return engine->default_context;
>   }
>   
> -static inline u32
> -intel_ring_sync_index(struct intel_engine_cs *ring,
> -		      struct intel_engine_cs *other)
> +static inline unsigned
> +intel_engine_flag(struct intel_engine_cs *engine)
>   {
> -	int idx;
> -
> -	/*
> -	 * rcs -> 0 = vcs, 1 = bcs, 2 = vecs, 3 = vcs2;
> -	 * vcs -> 0 = bcs, 1 = vecs, 2 = vcs2, 3 = rcs;
> -	 * bcs -> 0 = vecs, 1 = vcs2. 2 = rcs, 3 = vcs;
> -	 * vecs -> 0 = vcs2, 1 = rcs, 2 = vcs, 3 = bcs;
> -	 * vcs2 -> 0 = rcs, 1 = vcs, 2 = bcs, 3 = vecs;
> -	 */
> -
> -	idx = (other - ring) - 1;
> -	if (idx < 0)
> -		idx += I915_NUM_RINGS;
> -
> -	return idx;
> +	return 1 << engine->id;
>   }
>   
>   static inline u32
> -intel_read_status_page(struct intel_engine_cs *ring,
> +intel_read_status_page(struct intel_engine_cs *engine,
>   		       int reg)
>   {
>   	/* Ensure that the compiler doesn't optimize away the load. */
>   	barrier();
> -	return ring->status_page.page_addr[reg];
> +	return engine->status_page.page_addr[reg];
>   }
>   
>   static inline void
> -intel_write_status_page(struct intel_engine_cs *ring,
> +intel_write_status_page(struct intel_engine_cs *engine,
>   			int reg, u32 value)
>   {
> -	ring->status_page.page_addr[reg] = value;
> +	engine->status_page.page_addr[reg] = value;
>   }
>   
>   /**
> @@ -381,64 +371,77 @@ intel_write_status_page(struct intel_engine_cs *ring,
>   #define I915_GEM_HWS_SCRATCH_INDEX	0x30
>   #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>   
> -void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
> -int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> -			       struct intel_ringbuffer *ringbuf);
> +struct intel_ringbuffer *
> +intel_engine_alloc_ring(struct intel_engine_cs *engine,
> +			struct intel_context *ctx,
> +			int size);
> +void intel_ring_free(struct intel_ringbuffer *ring);
>   
> -void intel_stop_ring_buffer(struct intel_engine_cs *ring);
> -void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
> -
> -int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
> -int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
> -static inline void intel_ring_emit(struct intel_engine_cs *ring,
> +struct intel_ringbuffer *__must_check
> +intel_ring_begin(struct i915_gem_request *rq, int n);
> +int __must_check intel_ring_cacheline_align(struct i915_gem_request *rq);
> +static inline void intel_ring_emit(struct intel_ringbuffer *ring,
>   				   u32 data)
>   {
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
> -	ringbuf->tail += 4;
> +	iowrite32(data, ring->virtual_start + ring->tail);
> +	ring->tail += 4;
>   }
> -static inline void intel_ring_advance(struct intel_engine_cs *ring)
> +static inline void intel_ring_advance(struct intel_ringbuffer *ring)
>   {
> -	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	ringbuf->tail &= ringbuf->size - 1;
> +	ring->tail &= ring->size - 1;
>   }
> -int __intel_ring_space(int head, int tail, int size);
> -int intel_ring_space(struct intel_ringbuffer *ringbuf);
> -bool intel_ring_stopped(struct intel_engine_cs *ring);
> -void __intel_ring_advance(struct intel_engine_cs *ring);
> -
> -int __must_check intel_ring_idle(struct intel_engine_cs *ring);
> -void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
> -int intel_ring_flush_all_caches(struct intel_engine_cs *ring);
> -int intel_ring_invalidate_all_caches(struct intel_engine_cs *ring);
> -
> -void intel_fini_pipe_control(struct intel_engine_cs *ring);
> -int intel_init_pipe_control(struct intel_engine_cs *ring);
> -
> -int intel_init_render_ring_buffer(struct drm_device *dev);
> -int intel_init_bsd_ring_buffer(struct drm_device *dev);
> -int intel_init_bsd2_ring_buffer(struct drm_device *dev);
> -int intel_init_blt_ring_buffer(struct drm_device *dev);
> -int intel_init_vebox_ring_buffer(struct drm_device *dev);
>   
> -u64 intel_ring_get_active_head(struct intel_engine_cs *ring);
> -void intel_ring_setup_status_page(struct intel_engine_cs *ring);
> -
> -static inline u32 intel_ring_get_tail(struct intel_ringbuffer *ringbuf)
> +static inline int __intel_ring_space(int head, int tail, int size, int rsvd)
>   {
> -	return ringbuf->tail;
> +	int space = head - (tail + 8);
> +	if (space < 0)
> +		space += size;
> +	return space - rsvd;
>   }
>   
> -static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
> +static inline int intel_ring_space(struct intel_ringbuffer *ring)
>   {
> -	BUG_ON(ring->outstanding_lazy_seqno == 0);
> -	return ring->outstanding_lazy_seqno;
> +	return __intel_ring_space(ring->head, ring->tail,
> +				  ring->size, I915_RING_RSVD);
>   }
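
(Worked example for the new space accounting, mostly to convince myself about the
reserve: taking CACHELINE_BYTES as 64, I915_RING_RSVD is 128, so with size = 4096,
head = 256 and tail = 4000:

	space = 256 - (4000 + 8);	/* -3752 */
	space += 4096;			/*   344 */
	space -= 128;			/*   216 usable bytes */

i.e. the two reserved cachelines for the 830/845 erratum come straight off the top,
which looks right to me.)
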
>   
> -static inline void i915_trace_irq_get(struct intel_engine_cs *ring, u32 seqno)
> +
> +struct i915_gem_request * __must_check __attribute__((nonnull))
> +intel_engine_alloc_request(struct intel_engine_cs *engine,
> +			   struct intel_context *ctx);
> +
> +struct i915_gem_request *
> +intel_engine_find_active_batch(struct intel_engine_cs *engine);
> +
> +struct i915_gem_request *
> +intel_engine_seqno_to_request(struct intel_engine_cs *engine,
> +			      u32 seqno);
> +
> +int intel_init_render_engine(struct drm_i915_private *i915);
> +int intel_init_bsd_engine(struct drm_i915_private *i915);
> +int intel_init_bsd2_engine(struct drm_i915_private *i915);
> +int intel_init_blt_engine(struct drm_i915_private *i915);
> +int intel_init_vebox_engine(struct drm_i915_private *i915);
> +
> +#define intel_engine_hang(engine) \
> +	(engine->i915->gpu_error.stop_rings & intel_engine_flag(engine))
> +int __must_check intel_engine_sync(struct intel_engine_cs *engine);
> +int __must_check intel_engine_flush(struct intel_engine_cs *engine,
> +				    struct intel_context *ctx);
> +
> +int intel_engine_retire(struct intel_engine_cs *engine, u32 seqno);
> +void intel_engine_reset(struct intel_engine_cs *engine);
> +int intel_engine_suspend(struct intel_engine_cs *engine);
> +int intel_engine_resume(struct intel_engine_cs *engine);
> +void intel_engine_cleanup(struct intel_engine_cs *engine);
> +
> +
> +u64 intel_engine_get_active_head(struct intel_engine_cs *engine);
> +
> +static inline void i915_trace_irq_get(struct intel_engine_cs *engine, u32 seqno)
>   {
> -	if (ring->trace_irq_seqno == 0 && ring->irq_get(ring))
> -		ring->trace_irq_seqno = seqno;
> +	if (engine->trace_irq_seqno || engine->irq_get(engine))
> +		engine->trace_irq_seqno = seqno;
>   }
>   
>   #endif /* _INTEL_RINGBUFFER_H_ */



