[Intel-gfx] [PATCH 06/18] drm/i915: Defer default hardware context initialisation until first open

Tue Mar 31 06:11:38 PDT 2015

On Mon, Mar 30, 2015 at 12:11:51PM -0700, Yu Dai wrote:
> 
> 
> On 03/27/2015 01:45 AM, Daniel Vetter wrote:
> >On Thu, Mar 26, 2015 at 12:41:13PM -0700, yu.dai at intel.com wrote:
> >> From: Dave Gordon <david.s.gordon at intel.com>
> >>
> >> In order to fully initialise the default contexts, we have to execute
> >> batchbuffer commands on the GPU engines. But we can't do that until any
> >> required firmware has been loaded, which may not be possible during
> >> driver load, because the filesystem(s) containing the firmware may not
> >> be mounted until later.
> >>
> >> Therefore, we now allow the first call to the firmware-loading code to
> >> return -EAGAIN to indicate that it's not yet ready, and that it should
> >> be retried when the device is first opened from user code, by which
> >> time we expect that all required filesystems will have been mounted.
> >> The late-retry code will then re-attempt to load the firmware if the
> >> early attempt failed.
> >
> >We've tried a similar approach a while back and it doesn't work well in
> >conjunction with rps - the hw tends to fall over if the context state
> >isn't properly initialized when going into rc6.
> I believe patch 18/18 of this series is to notify GuC when RC6 is on and
> off. I don't know much details of implementation inside firmware, but I
> believe GuC will take care of it properly.
> >Why exactly can't we load that firmware right at boot-up, or at least
> >stall correctly until it's there?
> Dave G. wrote a very good comment about this. Sorry I lost it during patch
> squashing. Here is a copy of it. I will amend the comment in next version.
> 
> The GuC loader uses an asynchronous thread to fetch the firmware image
> (aka "binary blob") from a file. This thread has then had to wait for
> the mainline driver loading code to complete GEM initialisation before
> it can convert the blob into a GEM object and transfer it to the GuC's
> memory.
> 
> Unfortunately, with this scheme, the GuC loading was occurring*after*
> the internally-generated batches used to initialise contexts had already
> been submitted (using direct access to the ELSP, since the GuC wasn't
> ready).
> 
> In addition, the one-way synchronisation mechanism resulted in the
> firmware image being transfeered to the GuC at an indeterminate time
> (just "sometime" after the mainline thread releases the device's
> struct_mutex), with consequent confusion.
> 
> Rather than complicate the loader further by adding a second sync point
> and arranging the handover of the struct_mutex from one thread to the
> other, this commit reverses the synchronisation so that the mainline
> thread waits for the asynchronous thread rather than vice versa. The
> firmware loader now only has to save the reference to the blob and
> signal a completion; the mainline thread can then continue with the rest
> of the loading process when it catches up. The result is a much simpler
> (and fully deterministic) process for loading the GuC firmware.

Still sounds rather fragile. Chris suggestion to just not submit anything
until guc firmware is loaded and then kick of a submission of all
outstanding requests sounds much better. That would naturally insert
minimal amounts of sync points everywhere you need them using the normal
GEM execution tracking.

The only other change we'd need (I think at least, didn't check details)
is to wait for the hw context init asynchronously in the rps worker.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch