[PATCH 00/47] GuC submission support

Matthew Brost matthew.brost at intel.com
Tue Oct 26 15:51:17 UTC 2021


On Tue, Oct 26, 2021 at 11:59:35AM +0300, Joonas Lahtinen wrote:
> Quoting Matthew Brost (2021-10-25 18:15:09)
> > On Mon, Oct 25, 2021 at 12:37:02PM +0300, Joonas Lahtinen wrote:
> > > Quoting Matthew Brost (2021-10-22 19:42:19)
> > > > On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
> > > > > Hi Matt & John,
> > > > > 
> > > > > Can you please queue patches with the right Fixes: references to convert
> > > > > all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS
> > > > > protection for now. Please do so before next Wednesday so we get it
> > > > > queued in drm-intel-next-fixes.
> > > > > 
> > > > 
> > > > Don't we already do that? I checked i915_trace.h and every tracepoint I
> > > > added (intel_context class, i915_request_guc_submit) is protected by
> > > > LOW_LEVEL_TRACEPOINTS.
> > > > 
> > > > The only thing I changed outside of that protection is adding the guc_id
> > > > field to existing i915_request class tracepoints.
> > > 
> > > It's the first search hit for "guc" inside the i915_trace.h file :)
> > > 
> > > > Without the guc_id in
> > > > those tracepoints these are basically useless with GuC submission. We
> > > > could revert that if it is a huge deal but as I said then they are
> > > > useless...
> > > 
> > > Let's eliminate it for now and restore the tracepoint exactly as it was.
> > > 
> > 
> > Don't really agree - let's render tracepoints to be useless? Are
> > tracepoints ABI? I googled this and couldn't really find a definie
> > answer. If tracepoints are ABI, then OK I can revert this change but
> > still this is a poor technical decision (tracepoints should not be ABI).
> 
> Thats a very heated discussion in general. But the fact is that if
> tracepoint changes have caused regressions to applications, they have
> been forced to be remain untouched. You are free to raise the discussion
> with Linus/LKML if you feel that should not be the case. So the end
> result is that tracepoints are effectively in limbo, not ABI unless some
> application uses them like ABI.
> 
> Feel free to search the intel-gfx/lkml for "tracepoints" keyword and look
> for threads with many replies. It's not that I would not agree, it's more
> that I'm not in the mood for repeating that discussion over and over again
> and always land in the same spot.
> 
> So for now, we don't add anything new to tracepoints we can't guarantee
> to always be there untouched. Similarly, we don't guarantee any of them
> to remain stable. So we try to be compatible with the limbo.
> 
> I'm long overdue waiting for some stable consumer to step up for the
> tracepoints, so we can then start discussion what would actually be the
> best way of getting that information out for them. In ~5 years that has
> not happened.
> 
> > > If there is an immediate need, we should instead have an auxilary tracepoint
> > > which is enabled only through LOW_LEVEL_TRACEPOINTS and that amends the
> > > information of the basic tracepoint.
> > > 
> > 
> > Regardless of what I said above, I'll post 2 patches. The 1st just
> > remove the GuC, the 2nd modify the tracepoint to include guc_id if
> > LOW_LEVEL_TRACEPOINTS is defined.
> 
> Thanks. Let's get a patch merged which simply drops the guc_id for now
> to unblock things.
> 
> For the second, an auxilary tracepoint will be preferred instead of
> mutating the existing one (regardless of the LOW_LEVEL_TRACEPOINTS).
> 
> I only noticed a patch that mutates the tracepoints, can you
> double-check sending the first patch?

Sorry for the double reply - missed this one in the first.

I changed my plans / mind after I send the original email. I only sent a
patch which includes guc_id when LOW_LEVEL_TRACEPOINTS is enabled. That
is the bear minimum I live with. Without it any time there is a problem
results in hacking the kernel. I can't do that. This is a good
compromise.

Matt

> 
> Regards, Joonas
> 
> > 
> > > For the longer term solution we should align towards the dma fence
> > > tracepoints. When those are combined with the OA information, one should
> > > be able to get a good understanding of both the software and hardware
> > > scheduling decisions.
> > > 
> > 
> > Not sure about this either. I use these tracepoins to correlate things
> > to the GuC log. Between the 2, if you know what you are doing you
> > basically can figure out everything that is happening. Fields in the
> > trace translate directly to fields in the GuC log. Some of these fields
> > are backend specific, not sure how these could be pushed the dma fence
> > tracepoints. For what it is worth, without these tracepoints we'd likely
> > still have a bunch of bugs in the GuC firmware. I understand these
> > points, several other i915 developers do, and several of the GuC
> > firmware developers do too.
> > 
> > Matt
> > 
> > > Regards, Joonas
> > > 
> > > > 
> > > > Matt
> > > > 
> > > > > There's the orthogonal track to discuss what would be the stable set of
> > > > > tracepoints we could expose. However, before that discussion is closed,
> > > > > let's keep a rather strict line to avoid potential maintenance burned.
> > > > > 
> > > > > We can then relax in the future as needed.
> > > > > 
> > > > > Regards, Joonas
> > > > > 
> > > > > Quoting Matthew Brost (2021-06-24 10:04:29)
> > > > > > As discussed in [1], [2] we are enabling GuC submission support in the
> > > > > > i915. This is a subset of the patches in step 5 described in [1],
> > > > > > basically it is absolute to enable CI with GuC submission on gen11+
> > > > > > platforms.
> > > > > > 
> > > > > > This series itself will likely be broken down into smaller patch sets to
> > > > > > merge. Likely into CTBs changes, basic submission, virtual engines, and
> > > > > > resets.
> > > > > > 
> > > > > > A following series will address the missing patches remaining from [1].
> > > > > > 
> > > > > > Locally tested on TGL machine and basic tests seem to be passing.
> > > > > > 
> > > > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > > > 
> > > > > > [1] https://patchwork.freedesktop.org/series/89844/
> > > > > > [2] https://patchwork.freedesktop.org/series/91417/
> > > > > > 
> > > > > > Daniele Ceraolo Spurio (1):
> > > > > >   drm/i915/guc: Unblock GuC submission on Gen11+
> > > > > > 
> > > > > > John Harrison (10):
> > > > > >   drm/i915/guc: Module load failure test for CT buffer creation
> > > > > >   drm/i915: Track 'serial' counts for virtual engines
> > > > > >   drm/i915/guc: Provide mmio list to be saved/restored on engine reset
> > > > > >   drm/i915/guc: Don't complain about reset races
> > > > > >   drm/i915/guc: Enable GuC engine reset
> > > > > >   drm/i915/guc: Fix for error capture after full GPU reset with GuC
> > > > > >   drm/i915/guc: Hook GuC scheduling policies up
> > > > > >   drm/i915/guc: Connect reset modparam updates to GuC policy flags
> > > > > >   drm/i915/guc: Include scheduling policies in the debugfs state dump
> > > > > >   drm/i915/guc: Add golden context to GuC ADS
> > > > > > 
> > > > > > Matthew Brost (36):
> > > > > >   drm/i915/guc: Relax CTB response timeout
> > > > > >   drm/i915/guc: Improve error message for unsolicited CT response
> > > > > >   drm/i915/guc: Increase size of CTB buffers
> > > > > >   drm/i915/guc: Add non blocking CTB send function
> > > > > >   drm/i915/guc: Add stall timer to non blocking CTB send function
> > > > > >   drm/i915/guc: Optimize CTB writes and reads
> > > > > >   drm/i915/guc: Add new GuC interface defines and structures
> > > > > >   drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
> > > > > >   drm/i915/guc: Add lrc descriptor context lookup array
> > > > > >   drm/i915/guc: Implement GuC submission tasklet
> > > > > >   drm/i915/guc: Add bypass tasklet submission path to GuC
> > > > > >   drm/i915/guc: Implement GuC context operations for new inteface
> > > > > >   drm/i915/guc: Insert fence on context when deregistering
> > > > > >   drm/i915/guc: Defer context unpin until scheduling is disabled
> > > > > >   drm/i915/guc: Disable engine barriers with GuC during unpin
> > > > > >   drm/i915/guc: Extend deregistration fence to schedule disable
> > > > > >   drm/i915: Disable preempt busywait when using GuC scheduling
> > > > > >   drm/i915/guc: Ensure request ordering via completion fences
> > > > > >   drm/i915/guc: Disable semaphores when using GuC scheduling
> > > > > >   drm/i915/guc: Ensure G2H response has space in buffer
> > > > > >   drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
> > > > > >   drm/i915/guc: Update GuC debugfs to support new GuC
> > > > > >   drm/i915/guc: Add several request trace points
> > > > > >   drm/i915: Add intel_context tracing
> > > > > >   drm/i915/guc: GuC virtual engines
> > > > > >   drm/i915: Hold reference to intel_context over life of i915_request
> > > > > >   drm/i915/guc: Disable bonding extension with GuC submission
> > > > > >   drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
> > > > > >   drm/i915/guc: Reset implementation for new GuC interface
> > > > > >   drm/i915: Reset GPU immediately if submission is disabled
> > > > > >   drm/i915/guc: Add disable interrupts to guc sanitize
> > > > > >   drm/i915/guc: Suspend/resume implementation for new interface
> > > > > >   drm/i915/guc: Handle context reset notification
> > > > > >   drm/i915/guc: Handle engine reset failure notification
> > > > > >   drm/i915/guc: Enable the timer expired interrupt for GuC
> > > > > >   drm/i915/guc: Capture error state on context reset
> > > > > > 
> > > > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   30 +-
> > > > > >  drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
> > > > > >  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
> > > > > >  drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   41 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
> > > > > >  .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_context.c       |   41 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_context.h       |   31 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_context_types.h |   49 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_engine.h        |   72 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
> > > > > >  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
> > > > > >  .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   12 +-
> > > > > >  .../drm/i915/gt/intel_execlists_submission.c  |  234 +-
> > > > > >  .../drm/i915/gt/intel_execlists_submission.h  |   11 -
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   22 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    9 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
> > > > > >  drivers/gpu/drm/i915/gt/intel_reset.c         |   20 +-
> > > > > >  .../gpu/drm/i915/gt/intel_ring_submission.c   |   28 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
> > > > > >  drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
> > > > > >  .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
> > > > > >  drivers/gpu/drm/i915/gt/mock_engine.c         |   41 +-
> > > > > >  drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
> > > > > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  |   20 +-
> > > > > >  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  106 +-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  460 +++-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  318 ++-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   22 +-
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2197 +++++++++++++++--
> > > > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   17 +-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  102 +-
> > > > > >  drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
> > > > > >  drivers/gpu/drm/i915/i915_debugfs.c           |    2 +
> > > > > >  drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
> > > > > >  drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
> > > > > >  drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
> > > > > >  drivers/gpu/drm/i915/i915_reg.h               |    2 +
> > > > > >  drivers/gpu/drm/i915/i915_request.c           |  159 +-
> > > > > >  drivers/gpu/drm/i915/i915_request.h           |   21 +
> > > > > >  drivers/gpu/drm/i915/i915_scheduler.c         |    6 +
> > > > > >  drivers/gpu/drm/i915/i915_scheduler.h         |    6 +
> > > > > >  drivers/gpu/drm/i915/i915_scheduler_types.h   |    5 +
> > > > > >  drivers/gpu/drm/i915/i915_trace.h             |  197 +-
> > > > > >  .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
> > > > > >  .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
> > > > > >  57 files changed, 4159 insertions(+), 787 deletions(-)
> > > > > > 
> > > > > > -- 
> > > > > > 2.28.0
> > > > > > 


More information about the dri-devel mailing list