[Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support

Fri Jun 19 01:43:11 PDT 2015

On 18/06/15 15:49, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
>> On 17/06/15 13:05, Daniel Vetter wrote:
>>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
>>>> Current devices may contain one or more programmable microcontrollers
>>>> that need to have a firmware image (aka "binary blob") loaded from an
>>>> external medium and transferred to the device's memory.
>>>>
>>>> This file provides generic support functions for doing this; they can
>>>> then be used by each uC-specific loader, thus reducing code duplication
>>>> and testing effort.
>>>>
>>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>>>> Signed-off-by: Alex Dai <yu.dai at intel.com>
>>>
>>> Given that I'm just shredding the synchronization used by the dmc loader
>>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
>>> copy-paste for similar sounding but slightly different things doesn't
>>> sound awful to me. And the critical bit in all the firmware loading I've
>>> seen thus far is in synchronizing the loading with other operations,
>>> hiding that isn't a good idea. Worse if we enforce stuff like requiring
>>> dev->struct_mutex.
>>> -Daniel
>>
>> It's precisely because it's in some sense "trivial-but-tricky" that we
>> should write it once, get it right, and use it everywhere. Copypaste
>> /does/ sound awful; I've seen how the code this was derived from had
>> already been cloned into three flavours, all different and all wrong.
>>
>> It's a very simple abstraction: one early call to kick things off as
>> early as possible, no locking required. One late call with the
>> struct_mutex held to complete the synchronisation and actually do the
>> work, thus guaranteeing that the transfer to the target uC is done in a
>> controlled fashion, at a time of the caller's choice, and by the
>> driver's mainline thread, NOT by an asynchronous thread racing with
>> other activity (which was one of the things wrong with the original
>> version).
> 
> Yeah I've seen the origins of this in the display code, and that code gets
> the syncing wrong. The only thing that one has do to is grab a runtime pm
> reference for the appropriate power well to prevent dc5 entry, and release
> it when the firmware is loaded and initialized.

Agreed.

> Which means any kind of firmware loader which requires/uses
> dev->struct_mutex get stuff wrong and is not appropriate everywhere.

BUT, the loading of the firmware into any uC MUST be done in a
controlled manner i.e. at a time when no other thread is touching the
h/w. Otherwise the f/w load and whatever else is concurrently accessing
the h/w could in some cases interfere disastrously. Examples of
interference might be:

* interleaved accesses to the ELSP (in the case of the GuC)
* incorrect handover of power management (DMC, GuC)
* erroneous management of forcewake state

In general the f/w that is just starting on the uC may have certain
expectations about the initial state of the h/w, which may not be met if
other threads are accessing various bits of h/w while the uC is booting up.

So we absolutely need to guarantee that the f/w load is done by a thread
which has exclusive ownership of any bit of the h/w that the f/w is
going to make assumptions about. With the current locking structure of
the driver, that means holding the struct_mutex (it shouldn't really,
there should be a separate mutex for h/w register access vs.
driver-private data structures, but there isn't).

>> We should convert the DMC loader to use this too, so there need be only
>> one bit of code in the whole driver that needs to understand how to use
>> completions to get correct handover from a free-running no-locks-held
>> thread to the properly disciplined environment of driver mainline for
>> purposes of programming the h/w.
> 
> Nack on using this for dmc, since I want them to convert it to the above
> synchronization, since that's how all the other async power initialization
> is done.
> 
> Guc is different since we really must have it ready for execbuf, and for
> that usecase a completion at drm_open time sounds like the right thing.
> 
> As a rule of thumb for refactoring and share infastructure we use the
> following recipe in drm:
> - first driver implements things as straightforward as possible
> - 2nd user copypastes
> - 3rd one has the duty to figure out whether some refactoring is in order
>   or not.
> Imo that approach leads a really good balance between avoiding
> overengineering and having maintainable code.
> -Daniel

We've already been through these phases; the code has already been
cloned twice (and then changed, but not enough to fix the problems with
the original) and then when I found the issues with the GuC loader and
noticed the hilarious ownership dance it was doing during handover I
realised it was time to fix it in one place rather than several, and
posted a patchset to the internal mailing list on 2015-02-24 with this
commentary:

> The GuC loader uses an asynchronous thread to fetch the firmware image
> (aka "binary blob") from a file and load it into the GuC's memory.
> Unfortunately the GuC loading occurs *after* the internally-generated
> batches used to initialise contexts have already been submitted using
> direct access to the ELSP.  Also, the firmware ends up being loaded at
> an indeterminate time, with consequent potential for confusion in the
> switchover from ELSP- to GuC-based submission.
> 
> This patch series therefore reorganises the GuC loader to ensure that
> the loading process occurs both early enough and at a well-defined
> point in the sequence of operations during driver initialisation,
> specifically *before* any batches are submitted to hardware.
> 
> [PATCH 1/3] GuC: reorganise source before rewriting this code
> [PATCH 2/3] GuC: load firmware image from main thread
> [PATCH 3/3] GuC: update names & comments ("load" => "fetch")

followed by [PATCH 0/2] unify and tidy firmware loading code
on 2015-03-02.

For the DMC module, the basic conversion process is to separate
intel_csr_load_program() from finish_csr_load(). The latter would remain
as the callback in the async thread loading process that has to validate
the loaded image; the former would then become the callback for the
synchronous post-handover transfer of the image to the h/w.

BTW, the existing DMC loader probably won't work on Android :(

.Dave.