[Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support

Wed Jun 24 03:29:16 PDT 2015

On Fri, Jun 19, 2015 at 09:43:11AM +0100, Dave Gordon wrote:
> On 18/06/15 15:49, Daniel Vetter wrote:
> > On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
> >> On 17/06/15 13:05, Daniel Vetter wrote:
> >>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> >>>> Current devices may contain one or more programmable microcontrollers
> >>>> that need to have a firmware image (aka "binary blob") loaded from an
> >>>> external medium and transferred to the device's memory.
> >>>>
> >>>> This file provides generic support functions for doing this; they can
> >>>> then be used by each uC-specific loader, thus reducing code duplication
> >>>> and testing effort.
> >>>>
> >>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
> >>>> Signed-off-by: Alex Dai <yu.dai at intel.com>
> >>>
> >>> Given that I'm just shredding the synchronization used by the dmc loader
> >>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> >>> copy-paste for similar sounding but slightly different things doesn't
> >>> sound awful to me. And the critical bit in all the firmware loading I've
> >>> seen thus far is in synchronizing the loading with other operations;
> >>> hiding that isn't a good idea. It's worse if we enforce stuff like requiring
> >>> dev->struct_mutex.
> >>> -Daniel
> >>
> >> It's precisely because it's in some sense "trivial-but-tricky" that we
> >> should write it once, get it right, and use it everywhere. Copypaste
> >> /does/ sound awful; I've seen how the code this was derived from had
> >> already been cloned into three flavours, all different and all wrong.
> >>
> >> It's a very simple abstraction: one early call to kick things off as
> >> early as possible, no locking required. One late call with the
> >> struct_mutex held to complete the synchronisation and actually do the
> >> work, thus guaranteeing that the transfer to the target uC is done in a
> >> controlled fashion, at a time of the caller's choice, and by the
> >> driver's mainline thread, NOT by an asynchronous thread racing with
> >> other activity (which was one of the things wrong with the original
> >> version).
> > 
> > Yeah I've seen the origins of this in the display code, and that code gets
> > the syncing wrong. The only thing that one has to do is grab a runtime pm
> > reference for the appropriate power well to prevent dc5 entry, and release
> > it when the firmware is loaded and initialized.
> 
> Agreed.
> 
> > Which means any kind of firmware loader which requires/uses
> > dev->struct_mutex gets stuff wrong and is not appropriate everywhere.
> 
> BUT, the loading of the firmware into any uC MUST be done in a
> controlled manner i.e. at a time when no other thread is touching the
> h/w. Otherwise the f/w load and whatever else is concurrently accessing
> the h/w could in some cases interfere disastrously. Examples of
> interference might be:
> 
> * interleaved accesses to the ELSP (in the case of the GuC)
> * incorrect handover of power management (DMC, GuC)
> * erroneous management of forcewake state
> 
> In general the f/w that is just starting on the uC may have certain
> expectations about the initial state of the h/w, which may not be met if
> other threads are accessing various bits of h/w while the uC is booting up.
> 
> So we absolutely need to guarantee that the f/w load is done by a thread
> which has exclusive ownership of any bit of the h/w that the f/w is
> going to make assumptions about. With the current locking structure of
> the driver, that means holding the struct_mutex (it shouldn't really,
> there should be a separate mutex for h/w register access vs.
> driver-private data structures, but there isn't).

If you really need this guarantee (and I seriously hope not) then the only
option is a synchronous firmware load at driver init _before_ we launch
any of the asynchronous setup code. And there is already a lot of that,
and we're adding more all the time.

What I expect we need is synchronization of just the relevant part with
the firmware loading, which necessarily needs to be somewhat async to be
able to support cros/android requirements. And yes that needs to be done
in a controlled manner, but most likely we need very specific solutions
for the problem at hand. Unconditionally holding dev->struct_mutex isn't
that solution.
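As an illustration of synchronizing "just the relevant part", here is a minimal userspace sketch of the power-reference approach mentioned earlier in the thread for DMC: take a reference before kicking off the asynchronous load and drop it once the firmware is loaded, so the deep power state (DC5) cannot be entered in between. It uses pthreads as a stand-in; `power_get()`, `power_put()`, `may_enter_deep_power_state()`, `fw_load_async()` and the worker are hypothetical names, not i915 or runtime-PM APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical power-well reference count: while any reference is
 * held, the deep power state (DC5 in this thread) must not be entered. */
static pthread_mutex_t pw_lock = PTHREAD_MUTEX_INITIALIZER;
static int pw_refcount;

static void power_get(void)
{
	pthread_mutex_lock(&pw_lock);
	pw_refcount++;
	pthread_mutex_unlock(&pw_lock);
}

static void power_put(void)
{
	pthread_mutex_lock(&pw_lock);
	assert(pw_refcount > 0);	/* an unbalanced put is a bug */
	pw_refcount--;
	pthread_mutex_unlock(&pw_lock);
}

static bool may_enter_deep_power_state(void)
{
	pthread_mutex_lock(&pw_lock);
	bool ok = (pw_refcount == 0);
	pthread_mutex_unlock(&pw_lock);
	return ok;
}

static bool fw_loaded;	/* only read after pthread_join() */

/* Async worker: fetch and program the firmware (elided here), then
 * drop the reference taken on its behalf by fw_load_async(). */
static void *fw_load_worker(void *arg)
{
	(void)arg;
	/* ... request the image and transfer it to the uC here ... */
	fw_loaded = true;
	power_put();
	return NULL;
}

/* Take the reference *before* starting the worker, so there is no
 * window in which the deep power state could be entered mid-load. */
static int fw_load_async(pthread_t *worker)
{
	power_get();
	if (pthread_create(worker, NULL, fw_load_worker, NULL) != 0) {
		power_put();	/* worker never ran; undo our reference */
		return -1;
	}
	return 0;
}
```

Nothing here holds a global lock across the load; only the one resource the firmware cares about (the power state) is pinned, and only for as long as the load takes.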

The other problem with dev->struct_mutex is that it's a gigantic lock with
ill-defined coverage and semantics. It's imo the biggest piece of
technical debt we carry around in i915.ko, and we pay the price for that
dearly and daily. Which is why, for a few years now, any code which
extended dev->struct_mutex to anything that isn't clearly a core gem data
structure has been rejected.

> >> We should convert the DMC loader to use this too, so there need be only
> >> one bit of code in the whole driver that needs to understand how to use
> >> completions to get correct handover from a free-running no-locks-held
> >> thread to the properly disciplined environment of driver mainline for
> >> purposes of programming the h/w.
> > 
> > Nack on using this for dmc, since I want them to convert it to the above
> > synchronization, since that's how all the other async power initialization
> > is done.
> > 
> > Guc is different since we really must have it ready for execbuf, and for
> > that use case a completion at drm_open time sounds like the right thing.
> > 
> > As a rule of thumb for refactoring and sharing infrastructure we use the
> > following recipe in drm:
> > - first driver implements things as straightforward as possible
> > - 2nd user copypastes
> > - 3rd one has the duty to figure out whether some refactoring is in order
> >   or not.
> > Imo that approach strikes a really good balance between avoiding
> > overengineering and having maintainable code.
> 
> We've already been through these phases; the code has already been
> cloned twice (and then changed, but not enough to fix the problems with
> the original) and then when I found the issues with the GuC loader and
> noticed the hilarious ownership dance it was doing during handover I
> realised it was time to fix it in one place rather than several, and
> posted a patchset to the internal mailing list on 2015-02-24 with this
> commentary:
> 
> > The GuC loader uses an asynchronous thread to fetch the firmware image
> > (aka "binary blob") from a file and load it into the GuC's memory.
> > Unfortunately the GuC loading occurs *after* the internally-generated
> > batches used to initialise contexts have already been submitted using
> > direct access to the ELSP.  Also, the firmware ends up being loaded at
> > an indeterminate time, with consequent potential for confusion in the
> > switchover from ELSP- to GuC-based submission.
> > 
> > This patch series therefore reorganises the GuC loader to ensure that
> > the loading process occurs both early enough and at a well-defined
> > point in the sequence of operations during driver initialisation,
> > specifically *before* any batches are submitted to hardware.
> > 
> > [PATCH 1/3] GuC: reorganise source before rewriting this code
> > [PATCH 2/3] GuC: load firmware image from main thread
> > [PATCH 3/3] GuC: update names & comments ("load" => "fetch")
> 
> followed by [PATCH 0/2] unify and tidy firmware loading code
> on 2015-03-02.
> 
> For the DMC module, the basic conversion process is to separate
> intel_csr_load_program() from finish_csr_load(). The latter would remain
> as the callback in the async thread loading process that has to validate
> the loaded image; the former would then become the callback for the
> synchronous post-handover transfer of the image to the h/w.
> 
> BTW, the existing DMC loader probably won't work on Android :(
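The split Dave describes for the DMC loader (validate in the asynchronous callback, transfer to the h/w later from the mainline thread) can be sketched in userspace C as a two-phase handover over a completion. The `struct completion` below is a simplified pthread stand-in that behaves like the kernel's `complete_all()`/`wait_for_completion()`; `struct uc_fw`, `fw_fetch_thread()`, `fw_load()` and the "FW" header check are hypothetical illustrations, not the actual CSR or GuC code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Wakes every waiter and stays signalled, like complete_all(). */
static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical per-uC firmware state. */
struct uc_fw {
	struct completion fetched;
	const char *blob;		/* image obtained by the async thread */
	bool valid;			/* validation result from the async thread */
	pthread_t programmed_by;	/* which thread touched the "h/w" */
	char hw_mem[64];		/* stand-in for the uC's memory */
};

static void uc_fw_init(struct uc_fw *fw)
{
	memset(fw, 0, sizeof(*fw));
	init_completion(&fw->fetched);
}

/* Phase 1, async, no locks held: fetch and validate only, never touch
 * the h/w (the role proposed above for finish_csr_load()). */
static void *fw_fetch_thread(void *arg)
{
	struct uc_fw *fw = arg;

	fw->blob = "FW-IMAGE";	/* pretend this came from the filesystem */
	fw->valid = strncmp(fw->blob, "FW", 2) == 0;
	complete_all(&fw->fetched);
	return NULL;
}

/* Phase 2, mainline thread, at a point of the caller's choosing: wait
 * for the fetch, then transfer the image to the h/w (the role proposed
 * above for intel_csr_load_program()). */
static int fw_load(struct uc_fw *fw)
{
	wait_for_completion(&fw->fetched);
	assert(fw->blob != NULL);	/* completion implies fetch finished */
	if (!fw->valid)
		return -1;
	snprintf(fw->hw_mem, sizeof(fw->hw_mem), "%s", fw->blob);
	fw->programmed_by = pthread_self();
	return 0;
}
```

Where `fw_load()` is called (driver init, or drm_open for the GuC) and which lock, if any, is held at that point is exactly what is being debated in this thread; the sketch deliberately leaves that to the caller.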

Yeah I completely missed out on this fun since I presumed that firmware
loading is easy and simple. And if you look around on other drm drivers it
indeed is, they all use a synchronous request_firmware and if the firmware
isn't there, they just fall over (fully in the case of radeon.ko,
partially in the case of nouveau.ko since they have all the support in
place for handling a kms-only accel-less gpu in userspace anyway, like we
do). But for a bunch of reasons (afaik it's "you can't include a blob in a
gpl-ed kernel image") we need async firmware loading for cros & android.

That leaves us with a situation where we should have done a special design
discussion about asynchronous firmware loading, but somehow failed to do that.
Which leaves us in a very ugly position.

I talked with a bunch of people over the past few days to figure out how
this is supposed to work and also figure out why it's being done like that
today. I think I have a reasonably good plan for moving forward too. I'll
start a new top-level thread here to discuss this.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch