[Intel-gfx] [PATCH 3/6] drm/i915/huc: Add HuC fw loading support

Dave Gordon david.s.gordon at intel.com
Thu Jul 14 14:08:41 UTC 2016


On 13/07/16 13:48, Daniel Vetter wrote:
> On Thu, Jun 23, 2016 at 02:52:41PM +0100, Peter Antoine wrote:
>> On Thu, 23 Jun 2016, Dave Gordon wrote:
>>> On 22/06/16 09:31, Daniel Vetter wrote:
>>> No, the *correct* fix is to unify all the firmware loaders we have.
>>> There should just be ONE piece of code that can be used to fetch and
>>> load ANY firmware into ANY auxiliary microcontroller. NOT one per
>>> microcontroller, all different -- that way lies madness.
>>>
>>> We already had a unified loader for the HuC and GuC a year ago, but IIRC
>>> the party line then was "just make it (GuC) specific, then copypaste it
>>> for the second uC, and when we've got three versions we'll have learnt
>>> how we really want a unified loader to behave."
>>>
>>> Well, here's the copypaste, and we already have a different loader for
>>> the DMC/CSR, so it must be time for (re-)unification.
>>>
>>> .Dave.
>>
>> Just to add, if your uc_fw_fetch() returns an error code you will still
>> have to remember the state of the fetch at each reset/resume/etc., or you
>> will have to try the firmware load again, and that can take a long time.
>> So the state will have to be reinstated.
>>
>> Seeing that this code was written with the given goals, and in the same
>> vein as code that was deemed acceptable, it seems weird at this late
>> stage to change the design goals.
>>
>> Note: this is the third time that these patches have been posted and were
>> only rejected (as far as I know) due to no open-source user. Which there is
>> now, and is why I have reposted these patches.
>
> I never liked the guc firmware code, but figure for one copy it's not
> worth fighting over. Adding more copies (or perpetuating the design by
> making it generic) isn't what I'm looking for.

*You* asked for more copies, back when we proposed a single unified 
solution last year. We already had a *single* GuC+HuC loader which could 
also have been extended to support the DMC as well, but at the time you 
wanted a GuC-specific version -- and by implication, a separate HuC 
loader -- *in addition to* the DMC loader.

> Firmware loading shouldn't be that complicated, really.

Maybe it shouldn't be, and maybe it isn't -- you may not be seeing how 
simple this code actually is. Fetch firmware, validate it, save it in a 
GEM object; later, DMA it to the h/w; at each stage keep track of status 
so we know what has been done and what is still to do (or redo, during 
reset).
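
As a rough illustration of the fetch/validate/save/load/track-status sequence described above, here is a minimal user-space sketch. It is not the i915 code: every name in it (uc_fw, uc_fw_fetch, uc_fw_load, uc_fw_reset, the "UCFW" magic) is hypothetical, a heap buffer stands in for the GEM object, and a counter stands in for the DMA transfer. The point is only that two status fields are enough to know what has been done and what must be redone after a reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the status tracking; not the actual driver code. */
enum fw_status { FW_NONE = 0, FW_PENDING, FW_SUCCESS, FW_FAIL };

struct uc_fw {
	enum fw_status fetch_status;	/* blob fetched and validated? */
	enum fw_status load_status;	/* blob transferred to the h/w? */
	unsigned char *blob;		/* stands in for the GEM object */
	size_t size;
	int fetch_count;		/* how often the slow fetch really ran */
	int load_count;			/* how often the "DMA" really ran */
};

/* Fetch once, validate, keep a copy. The outcome is sticky either way. */
int uc_fw_fetch(struct uc_fw *fw, const unsigned char *src, size_t len)
{
	assert(fw != NULL);
	if (fw->fetch_status == FW_SUCCESS)
		return 0;		/* already have a validated copy */
	if (fw->fetch_status == FW_FAIL)
		return -1;		/* don't repeat a slow, failed fetch */

	fw->fetch_status = FW_PENDING;
	fw->fetch_count++;

	/* Assumed validation rule: a 4-byte "UCFW" magic at the start. */
	if (src == NULL || len < 4 || memcmp(src, "UCFW", 4) != 0)
		goto fail;

	fw->blob = malloc(len);
	if (fw->blob == NULL)
		goto fail;
	memcpy(fw->blob, src, len);
	fw->size = len;
	fw->fetch_status = FW_SUCCESS;
	return 0;

fail:
	fw->fetch_status = FW_FAIL;
	return -1;
}

/* Transfer the saved blob; a real driver would DMA it to the h/w here. */
int uc_fw_load(struct uc_fw *fw)
{
	assert(fw != NULL);
	if (fw->fetch_status != FW_SUCCESS) {
		fw->load_status = FW_FAIL;
		return -1;
	}
	fw->load_status = FW_PENDING;
	fw->load_count++;
	fw->load_status = FW_SUCCESS;
	return 0;
}

/* A reset loses the loaded image but not the saved blob. */
void uc_fw_reset(struct uc_fw *fw)
{
	assert(fw != NULL);
	fw->load_status = FW_NONE;
}
```

After uc_fw_reset() only uc_fw_load() has to run again; a second uc_fw_fetch() call returns immediately because the fetch status was remembered.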

Any complications are because the h/w (e.g. write-once memory) makes 
them necessary, or artefacts of the GEM object system, or because of the 
driver's byzantine sequence of operations during 
load/reset/suspend/resume/unload.

> The unified firmware loader is called request_firmware. If that's not good
> enough, pls fix the core function, not paper code over in i915.

That's exactly the function we call. Then we have to validate and save 
the blob. And remember that we've done so.

> In that regard DMC/CSR is unified, everything else isn't yet.

Unified with what? Maybe the "DMC" is unified with the "CSR" -- which 
AFAIK are the same thing -- and the software just randomly uses both 
names to maximise confusion?

	if (HAS_CSR(dev)) {
		struct intel_csr *csr = &dev_priv->csr;

		err_printf(m, "DMC loaded: %s\n",
			   yesno(csr->dmc_payload != NULL));
		err_printf(m, "DMC fw version: %d.%d\n",
			   CSR_VERSION_MAJOR(csr->version),
			   CSR_VERSION_MINOR(csr->version));
	}
...
	if (!IS_GEN9(dev_priv)) {
		DRM_ERROR("No CSR support available for this platform\n");
		return;
	}
	if (!dev_priv->csr.dmc_payload) {
		DRM_ERROR("Tried to program CSR with empty payload\n");
		return;
	}

And according to the comments in intel_csr.c -- but not the code --

/*
  * Firmware loading status will be one of the below states:
  * FW_UNINITIALIZED, FW_LOADED, FW_FAILED.
  *
  * Once the firmware is written into the registers status will
  * be moved from FW_UNINITIALIZED to FW_LOADED and for any
  * erroneous condition status will be moved to FW_FAILED.
  */

So I don't think you should hold this code up as a masterpiece of 
"unified" design -- which in any case you argued against last year, when 
we presented a unified loader. Specifically, you said, "In my experience 
trying to extract common code at all costs is harmful way too often."

Also, the approach taken in the DMC loader -- which appears to have been 
copypasted from a /very early/ version of the GuC loader, before I fixed 
the async-load problems -- just wouldn't work for the HuC/GuC, where the 
kernel needs to know when the firmware load has been completed so that 
it can start sending work to the GuC. The DMC loader only works because 
it doesn't actually matter when (or if) it's loaded. It would be 
*completely wrong* to load the HuC/GuC that way.

> Iirc the big issue is delayed firmware loading for built-in i915 and fw
> only available later on. This is an open issue in request_firmware() since
> years, and there's various patches floating around. If the problem is that
> Greg KH doesn't consider those patches, I can help with that. But not
> pushing the core fix forward isn't acceptable imo.

We're not addressing that issue at all here; for Linux we expect the 
firmware will be in the ramdisk, so it's available immediately. Android 
has an issue with that, but we already have solutions there.

> Once that fix is landed
> we can treat request_firmware as reliable (it might take a while, hence
> must be run in an async work like DMC loading), with no need to ever retry
> anything.

No, it *can't* run "in an async work like DMC loading" -- that was 
exactly what was wrong with the original GuC loader before I got 
involved. The firmware was delivered to the GuC h/w asynchronously, 
*after* the kernel had already started sending work to the engines. That 
was utterly bogus!

*This* version is fully synchronous; the kernel calls for the firmware 
using request_firmware() (and waits until it's succeeded or failed), and 
later asks for the firmware to be loaded into the GuC (again, waiting 
until it has succeeded or failed).

> If fw loading fails we can just mark the entire render part of
> the gpu as dead by injecting the equivalent of a non-recoverable hang
> (async setup) or failing engine init with -EIO (if this is still
> synchronous, which I don't expect really).

Which is just what we do. This patchset is essentially just adding HuC 
loading to the existing GuC loading process, reusing as much as possible 
of the same code.
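
The ordering and failure handling discussed above can be sketched as follows. This is a hypothetical user-space model, not driver code: uc_dev, uc_load_sync, uc_engine_init and uc_submit are invented names, and a boolean parameter stands in for whether the h/w accepts the firmware. It shows only that a synchronous load returns a definite result before any work is submitted, and that a failed load makes init fail with -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; all names are illustrative only. */
struct uc_dev {
	bool fw_loaded;		/* firmware load completed successfully */
	bool wedged;		/* marked dead after a failed load */
	int submitted;		/* work items accepted so far */
};

/* Synchronous load: returns only once success or failure is known. */
int uc_load_sync(struct uc_dev *dev, bool hw_accepts_fw)
{
	assert(dev != NULL);
	dev->fw_loaded = hw_accepts_fw;
	return hw_accepts_fw ? 0 : -EIO;
}

/* Init fails with -EIO if the load fails, and marks the device dead. */
int uc_engine_init(struct uc_dev *dev, bool hw_accepts_fw)
{
	int err = uc_load_sync(dev, hw_accepts_fw);

	if (err) {
		dev->wedged = true;
		return err;
	}
	return 0;
}

/* Work is refused until the firmware is known to be loaded. */
int uc_submit(struct uc_dev *dev)
{
	assert(dev != NULL);
	if (dev->wedged || !dev->fw_loaded)
		return -EIO;
	dev->submitted++;
	return 0;
}
```

With an asynchronous load, uc_submit() could be reached before fw_loaded is set; in this model that attempt is rejected, whereas the original problem was that such work was sent to the engines anyway.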

> If there's another reason for this complexity, please explain since I'd
> like to understand why we need this.
> -Daniel

Less complexity than you think.

   531  1698 14255 drivers/gpu/drm/i915/intel_csr.c
   751  2757 22889 drivers/gpu/drm/i915/intel_guc_loader.c

Much the same size, to within a (binary) order-of-magnitude. Obviously 
the GuC code *necessarily* does more because the f/w interfaces are much 
more complex (ctx pool, ADS, etc); but those are not optional. And the 
GuC code has to deal with reloading after RC6 or GPU reset, which AFAICT 
the DMC doesn't.

.Dave.

