[Intel-gfx] [PATCH v2 1/1] drm/i915/guc: Fix GuC error capture sizing estimation and reporting
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Mon Oct 3 18:36:20 UTC 2022
Hi John - how would you like to proceed? I have re-rev'd as per your original review comment on rev1.
Shall we adopt this rev2's "drm_warn" for the worst case? We know well that gpu_core_dump is still an external subsystem
that can cull our data, but at least within this subsystem we would add this warning for the worst case and merely a debug
message when the buffer lacks room for the 3x capture. As we know, GuC operation is unaware of the gpu-core-dump
restriction, so it's not hard to imagine the GuC capturing the worst-case amount of data if we have big problems in the
workloads or hw.
Additionally, would you prefer to completely drop the spare size? Some context: with the calculation fix we are
allocating 4MB but we only need 78k as the minimum estimate.
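
For reference, the tiering discussed above can be sketched as a standalone function. This is only an illustration under stated assumptions: the enum and function names are hypothetical and are not i915 identifiers, the order of the checks mirrors the quoted hunk below, and the log level attached to each tier is the one proposed in rev2.

```c
/*
 * Hypothetical sketch, not driver code: classify a capture buffer size
 * against the minimum (worst-case) estimate and the larger spare target.
 */
enum capture_size_verdict {
	CAPTURE_SIZE_ESTIMATE_FAILED,	/* min_size is a negative error code: drm_warn */
	CAPTURE_SIZE_BELOW_MIN,		/* worst case may not fit: drm_warn */
	CAPTURE_SIZE_NO_SPARE,		/* min fits, spare target does not: drm_dbg */
	CAPTURE_SIZE_OK,		/* nothing to report */
};

static enum capture_size_verdict
classify_capture_size(int buffer_size, int min_size, int spare_size)
{
	if (min_size < 0)
		return CAPTURE_SIZE_ESTIMATE_FAILED;
	if (min_size > buffer_size)
		return CAPTURE_SIZE_BELOW_MIN;
	if (spare_size > buffer_size)
		return CAPTURE_SIZE_NO_SPARE;
	return CAPTURE_SIZE_OK;
}
```

Assuming "78k" means 78 * 1024 bytes and the spare target is 3x the minimum, a 4MB buffer falls into the last tier, so none of the messages would be printed.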
...alan
On Fri, 2022-09-30 at 14:18 -0700, Alan Previn wrote:
> + /*
> + * Don't drm_warn or drm_error here on "possible" insufficient size because we only run out
> + * of space if all engines were to suffer resets all at once AND the driver is not able to
> + * extract that data fast enough in the interrupt handler worker (moving them to the
> + * larger pool of pre-allocated capture nodes. If GuC does run out of space, we will
> + * print an error when processing the G2H event capture-notification (search for
> + * "INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE").
> + */
> if (min_size < 0)
> drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
> min_size);
> else if (min_size > buffer_size)
> - drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
> + drm_warn(&i915->drm, "GuC error state capture buffer maybe small: %d < %d\n",
> buffer_size, min_size);
> else if (spare_size > buffer_size)
> - drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
> - buffer_size, spare_size, min_size);
> + drm_dbg(&i915->drm, "GuC error state capture buffer lacks spare size: %d < %d (min = %d)\n",
> + buffer_size, spare_size, min_size);
> }
>
> /*
> --
> 2.34.1
>