[Intel-gfx] [PATCH 00/15] Batch submission via GuC

Wed Jun 24 05:16:05 PDT 2015

Hi all,

Ok top post on this firmware loading problem. First to note imo is that
off the over 80 request_firmware calls in drm drivers only one (outside of
i915) uses request_firmware_nowait. And none of the other have async
firmware loading implemented with their own driver. That was kinda my
background for all this and why I missed the entire design implications -
I expected that the entirety of the firmware loading problem would boil
down to one request_firmware() call and that's it.

Obviously things are not quite as simple, so I started to dig around.

Looking at the big drivers (nouveau and radeon/amdgpu) they only ever try
to load the firmware once and if it's not there they either fail driver
init outright (amd) or disable parts of it (nouveau can e.g. disable
accelaration). None of them ever bothers with a 2nd attempt at firmware
loading.

I confirmed with systemd/udev developers that synchronous firmware loading
is indeed how modern systems are expected to work. Which also implies that
if you want to load a driver from the initrd, then the firmware must be in
the initrd (which modern linux distros ensure). And if you build the
driver into the kernel then the firmware also must be built-in. Delayed
loading of firmware is considered broken userspace.

The reason for this design switch is that with the delayed loading done
years ago random timeouts when the firmware would never show up routinely
showed up when booting. And on a general-purpose distro there's no way to
know whether the firmware isn't loaded yet because userspace isn't ready
or because it's simply not there. Hence the switch over to immediate
failure and requiring the firmware to be present when the driver is
initializing.

So from that pov the only (and really the only thing) we need for enabling
firmware loading on a normal linux distro is a synchronous
request_firmware call in the driver load function somewhere. This might
need some adjustements to the initrd builders in userspace, but that's all
we need in the kernel.

This is also all we need to enable internal testing, since as long as you
don't ship the resulting image anywhere you can always built-in the
firmware, which also resolves any "firmware isn't there yet" bugs.

Now if you want delayed firmware loading there's still support for that,
hidden behind the FW_LOADER_USER_HELPER_FALLBACK kernel config. Which
because of the unfixable timeouts is disabled by default, and also no
longer supported on the userspace side in modern distros. But the uevent
handler is only 3 lines of bash, so not a big deal to wire up.

Except that for hysterical raisins (one special snowflake driver needs it
like that, and the patch author decided it's easier to have an inconsisten
interface than fix up that driver) this userspace fallback stuff only
works with the request_firmware interface, and _not_ with
request_firmware_nowait. Despite that both run essentially the same code,
_nowait simply wraps it up in an async work for convenience.

That means to be able to use this we need our own async work item and use
request_firmware in there. Converting the implicit async work to an
explicit one is better anyway since it makes it more likely that patch
authors and reviewers don't forget to correctly synchronize/flush with
that async work. DMC loader is a perfect example of why implicit async
callbacks are evil in kernel code.

The problem now is that doing parts of gem init asynchronously is really
scary, and we never had a proper design discussion about that. And
unsurprisingly a lot of the review comments center around that. Otoh we
don't want to block merging guc-based cmd submission just on that.
Therefore my proposal is to rip out the current firmwareloader and
async respectively re-loading code with all it's complications fromm the
current patch series, and replace it with a simple&synchronous
request_firmware. Also no fallbacks to keep everything simple. The actual
guc enabling itself should carry over unchanged.

With that we can unblock everyone (at least internally) and figure out a
proper solutions for async gem init. To kickstart the design discussion
for that a few general and gem-specific things I think are important:

- Locking, async work and coordination should imo all be done explicitly
  in the code by default. Otherwise chances are way to high that critical
  issues get overlooked in the detailed code review. That means an
  explicit schedule_work and for synchronization explicit flush_work. No
  hiding them in frameworks and helper libraries.

- dev->struct_mutex is scary and shouldn't ever be extended. New stuff
  needs new locking. Might not apply here for guc specifically, but just a
  general principle.

- I'm absolutely scared off platform/feature differences in how the
  overall driver load sequence works. Which means if we need async gem
  init for guc, then we will do async gem init for everyone. That means
  improved test coverage for corner-cases and leass headaches to keep the
  unified driver afloat since there's only one way to do things. Of course
  this only applies to hw features actually shared between platforms, e.g.
  if we do async dmc loading that doesn't mean we need to do async loading
  of some other unrelated display feature. GuC is special here because
  cmd submission is the entire point of gem and hence has a big impact.

- If we do async init at driver load time, then we'll also do the same on
  resume and gpu reset, for the same reasons as the cross-platform
  consistency. For GuC async gem init on gpu reset might not make that
  much sense, since gpu reset is pretty much just gem (re) init and it's
  already an async worker. But async gem init on resume definitely makes
  sense.

- A common confusion I see with concurrent code is mixing up data
  consistency (resolved with locking) and handling synchronization and
  lifetimes (resolved with refcounting, completions and other
  synchronization design patterns). There's grey areas and exceptions, but
  I'm fairly picky on this topic. Also a big no-go is rolling your own
  synchronization - we have two of these in i915 (pageflip completion and
  reset handling) and both are still suspect or outright broken after
  years of banging heads against these walls.

- For async gem init it's tempting to repurpose the gpu reset code with
  all the existing synchronization: We'd only need to set the
  RESET_IN_PROGRESS flag synchronously and then run a gpu reset after the
  firmware is loaded, all from the async worker. The problem with that is
  that we still have issues with rogue -EIO escaping into the modeset code
  while a reset is pending, and that would not be good since we want
  modesets to work right away ofc. Also most modeset paths need to stall
  for gpu resets synchronously, which again isn't what we want.

- We should be able to reuse at least the locking scheme gpu reset relies
  on. And since gpu resets run gem_init_hw we should be able to run that
  asynchronously, with some care.

- Beside synchronizing at open time like you're doing now, or in the guts
  of the lrc ctx sumbission code like Chris suggesteds the now merged
  anti-olr also gives us the possibility to sync at request creation. The
  problems I see with that is locking inversions between flush_work and
  dev->struct_mutex. But we could fix that with the ioctl restart handling
  we already have. But that has the same caveats as the -EIO handling -
  our modeset code doesn't cope well with -EAGAIN.

  For me the exact synchronization point with the async gem init work is
  the really big open here.

Anyway that's my thoughts on firmware loading and all that, I hope this is
useful as a basis to get this merged in an efficient way.

Yours, Daniel

On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote:
> This patch series enables command submission via the GuC. In this mode,
> instead of the host CPU driving the execlist port directly, it hands
> over work items to the GuC, using a doorbell mechanism to tell the GuC
> that new items have been added to its work queue. The GuC then dispatches
> contexts to the various GPU engines, and manages the resulting context-
> switch interrupts. Completion of a batch is however still signalled to
> the CPU; the GuC is not involved in handling user interrupts.
> 
> There are three subsequences within the patch series:
> 
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Embedded microcontroller (uC) firmware loading support
> 
> These first two patches provide a generic framework for fetching the
> firmware that may be required by any embedded microcontroller from a
> file, using an asynchronous thread so that driver initialisation can
> continue while the firmware is being fetched. It is hoped that this
> framework is sufficiently general that it can be used for all curent
> and future microcontrollers.
> 
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status
> 
> These four patches complete the GuC loader. At this point in the sequence
> we can load and activate the GuC firmware, but not submit any batches
> through it. (This is nonetheless a potentially useful state, as the GuC
> can do other useful work even when not handling batch submissions).
> 
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Move execlists defines from .c to .h
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Integrate GuC-based command submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   Documentation/drm: kerneldoc for GuC
>   drm/i915: Enable GuC submission, where supported
> 
> In the final section, we implement the GuC submission mechanism, link
> it into the (execlist-based) submission path, and finally enable it
> (on supported platforms). On platforms where there is no GuC, or if
> the GuC firmware cannot be found or is invalid, batch submission will
> revert to using the execlist mechanism directly.
> 
> The GuC firmware itself is not included in this patchset; it is or will
> be available for download from https://01.org/linuxgraphics/downloads/
> This driver works with and requires GuC firmware revision 3.x. It will
> not work with any firmware version 1.x, as the GuC protocol in those
> revisions was incompatible and is no longer supported.
> 
> Prerequisites: GuC submission will expose existing inadequacies in
> some of the existing codepaths unless certain other patches are applied.
> In particular we will require some version of Michel Thierry's patch
>   drm/i915/lrc: Update PDPx registers with lri commands
> (because the GuC support light-restore, which execlist mode doesn't),
> and my own 
>   drm/i915: Allocate OLR more safely (workaround until OLR goes away)
> because otherwise the changed timing means that there is an increased
> risk of writing to a ringbuffer that is not currently pinned & mapped,
> causing a kernel OOPS.
> 
> Alex Dai (10):
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Integrate GuC-based command submission
>   Documentation/drm: kerneldoc for GuC
> 
> Dave Gordon (5):
>   drm/i915: Embedded microcontroller (uC) firmware loading support
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   drm/i915: Enable GuC submission, where supported
> 
> Michael H. Nguyen (1):
>   drm/i915: Move execlists defines from .c to .h
> 
> Ben Widawsky
> Vinit Azad
>   created the original versions on which some of these patches are based.
> 
>  Documentation/DocBook/drm.tmpl             |   19 +
>  drivers/gpu/drm/i915/Makefile              |    7 +
>  drivers/gpu/drm/i915/i915_debugfs.c        |  109 +++-
>  drivers/gpu/drm/i915/i915_dma.c            |    4 +
>  drivers/gpu/drm/i915/i915_drv.h            |   17 +
>  drivers/gpu/drm/i915/i915_gem.c            |   39 +-
>  drivers/gpu/drm/i915/i915_gem_context.c    |   52 +-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  873 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_irq.c            |   48 ++
>  drivers/gpu/drm/i915/i915_params.c         |    9 +
>  drivers/gpu/drm/i915/i915_reg.h            |   92 ++-
>  drivers/gpu/drm/i915/intel_guc.h           |  184 ++++++
>  drivers/gpu/drm/i915/intel_guc_api.h       |  227 ++++++++
>  drivers/gpu/drm/i915/intel_guc_loader.c    |  498 ++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c           |  128 ++--
>  drivers/gpu/drm/i915/intel_lrc.h           |    8 +
>  drivers/gpu/drm/i915/intel_uc_loader.c     |  312 ++++++++++
>  drivers/gpu/drm/i915/intel_uc_loader.h     |   82 +++
>  18 files changed, 2607 insertions(+), 101 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c
>  create mode 100644 drivers/gpu/drm/i915/intel_guc.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h
> 
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch