[Intel-gfx] [PATCH v5 08/10] drm/i915/guc: Plumb GuC-capture into gpu_coredump

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Sat Feb 12 00:31:00 UTC 2022


Thanks Umesh for reviewing this for me and for the R-v-b.

Responding on one comment below

On Thu, 2022-02-10 at 18:11 -0800, Umesh Nerlige Ramappa wrote:
> On Wed, Jan 26, 2022 at 02:48:20AM -0800, Alan Previn wrote:
> > Add a flags parameter through all of the coredump creation
> > functions. Add a bitmask flag to indicate if the top
> > level gpu_coredump event is triggered in response to
> > a GuC context reset notification.
> > 
> > lgtm. In general the implementation differences between submission 
> backends (guc vs execlists) need to be vfuncs so we can avoid having to 
> use the flags and conditions. Regardless, with the above comments 
> addressed, this is:
> 

I do agree its much cleaner and consistent with other engine related
vfuncs that differentiate guc vs execlist however, in the case of the
gpu coredump operation, we can have cases where GuC submission is enabled
but the capture_engine function may have been triggered NOT by GuC.
Consider the following scenarios:
1. A context was reset by the GuC as per the context-level execution
   quanta + preemption timeout... OR... a context failure caught
   by HW and routed to GuC.
2. If the user had started i915 with modparam reset == 1 and hangcheck
   enabled, but the context was executed with an infinite execution
   quanta, then heartbeat could trigger a full GT reset and capture
   all engine states despite having GuC submission and in this case,
   we do want manual HW register dumps by i915 and not rely on GuC.

That said, for now, with the existing gpu_coredump framework that,
idea may not work.

> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> 
> Umesh



More information about the Intel-gfx mailing list