[Intel-gfx] [PATCH] drm/i915: Use hash tables for the command parser

Mon May 12 18:41:02 CEST 2014

On Mon, May 12, 2014 at 09:24:06AM -0700, Daniel Vetter wrote:
> On Thu, May 08, 2014 at 09:02:18AM -0700, Volkin, Bradley D wrote:
> > On Thu, May 08, 2014 at 08:45:07AM -0700, Ville Syrjälä wrote:
> > > On Thu, May 08, 2014 at 08:27:16AM -0700, Volkin, Bradley D wrote:
> > > > On Thu, May 08, 2014 at 02:56:05AM -0700, Tvrtko Ursulin wrote:
> > > > > 
> > > > > Hi Brad,
> > > > > 
> > > > > On 04/28/2014 04:22 PM, bradley.d.volkin at intel.com wrote:
> > > > > [snip]
> > > > > > -	BUG_ON(!validate_cmds_sorted(ring));
> > > > > > +	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
> > > > > >   	BUG_ON(!validate_regs_sorted(ring));
> > > > > > +
> > > > > > +	BUG_ON(init_hash_table(ring, cmd_tables, cmd_table_count));
> > > > > 
> > > > > Is a BUG_ON a bit harsh since the above fails only on ENOMEM condition?
> > > > > 
> > > > > If the concern is not allowing any command execution if parser setup has 
> > > > > failed, it would be nicer to the system as whole to just keep rejecting 
> > > > > everything, but let the rest of the kernel live to enable debug or whatever?
> > > > > 
> > > > > I know it won't happen almost ever so it's a minor point really. I just 
> > > > > dislike actively hosing the whole system if it is avoidable.
> > > > 
> > > > Hi Tvrtko,
> > > > 
> > > > I agree that a BUG_ON might be harsh here. I suppose we could log an
> > > > error and disable the command parser. Most command buffers would
> > > > still go through fine but HW parsing would reject some that the SW
> > > > parser might otherwise allow. That could be a bit odd if we ever did
> > > > get a failure - apps/functionality that worked the last time I booted
> > > > suddenly don't this time. The issue would be in the log though.
> > > > 
> > > > I don't have a strong preference on this. Whatever people prefer.
> > > 
> > > If the memory allocation fails there's probably not much point in
> > > trying to limp along and continue the driver init. So just pass error
> > > up and let the caller deal with it.
> > > 
> > > Looking at the error paths up from ring init, we probably leak a ton
> > > of junk but at least the kernel should remain otherwise operational.
> > 
> > Passing the error up is probably a good option. There was a recent change
> > to mark the GPU as wedged if ring init fails, which seems like a reasonable
> > enough response to parser init failure. I'll have to look at the cleanup
> > paths. The per-ring parser init can probably move around to avoid excessive
> > additional cleanup code on failure paths.
> 
> With the latest ring init rework from Chris we should just fail the gem
> side of things and declare the gpu terminally wedged. But the modeset side
> should keep on working, which means users will be able to take a look at
> dmesg instead of a black screen (since that's what you get from BUG_ON in
> driver load often).
> 
> Working modeset with broken gpu is infinitely better than a black screen,
> so no BUG_ON here please. At least if the error code isn't much more than
> a simply return, we always have to balance bugs in the error code with
> better abilities to limp on long enough for a useful bug report.

Ok. I sent a v2 and in that I have i915_cmd_parser_init_ring() pass the
error returned by init_hash_table() back to the caller, who passes it up,
etc. When I looked more closely at Chris' change, the wedge behavior only
happens if the return is -EIO, which it will not be in this case. The only
failure from init_hash_table() is -ENOMEM so if we specifically want the
wedging behavior (vs whatever would normally happen if ring init fails with
-ENOMEM) then we need to modify the return from i915_cmd_parser_init_ring()
to be -EIO. Probably also add a comment explaining why we do the modification.

I'm fine either way. The feedback earlier seemed to be that if we have an
allocation failure we probably can't do much else anyhow, hence just returning
the -ENOMEM.

Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch