[Intel-gfx] [PATCH 1/1] drm/i915/guc: Fix GuC error capture sizing estimation and reporting

John Harrison john.c.harrison at intel.com
Fri Sep 30 22:35:19 UTC 2022


On 9/30/2022 14:08, Teres Alexis, Alan Previn wrote:
> I disagree because it's unlikely that all engines can reset all at once (we probably have bigger problems at that
> point) and if they all occurred within the same G2H handler scheduled worker run, our current gpu_coredump framework
> would just discard the ones after the first one and so it wouldn't even matter if we did catch it.
So min_size is not actually the minimal size for a meaningful capture? 
So what is? And remember that for compute class engines, there is 
dependent engine reset. So a reset of CCS2 also means a reset of RCS, 
CCS0, CCS1 and CCS3. So having at least four engines per capture is not 
unreasonable.
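
To make the sizing argument concrete, here is a rough sketch of how a 
minimum could account for several engines resetting together. Everything 
in it is an assumption for illustration: the struct, the field names and 
both helpers are hypothetical and are not the i915 code, and the real 
calculation has more components than a global block plus a per-engine 
block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes in bytes; not taken from the i915 driver. */
struct capture_sizes {
	size_t global_regs;     /* counted once per capture */
	size_t per_engine_regs; /* counted once per engine in the capture */
};

/* Minimum bytes for one capture that covers num_engines engines. */
static size_t capture_min_size(const struct capture_sizes *s,
			       unsigned int num_engines)
{
	return s->global_regs + (size_t)num_engines * s->per_engine_regs;
}

/* True when the allocated buffer can hold at least one such capture. */
static bool capture_buffer_sufficient(size_t buffer_size,
				      const struct capture_sizes *s,
				      unsigned int num_engines)
{
	return buffer_size >= capture_min_size(s, num_engines);
}
```

With a dependent reset of RCS plus CCS0-3, num_engines would be five, so 
a minimum computed for a single engine would understate what a 
meaningful capture needs.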

It seems pointless to go through a lot of effort to calculate the 
minimum and recommended sizes only to basically ignore them by just 
whispering very, very quietly that there might be a problem. It also 
seems pointless to complain about a minimum size that actually isn't the 
minimum size. That's sort of worse - now you are telling the user there 
is a problem when really there isn't.

IMHO, the min_size check should be meaningful and should be visible to 
the user if it fails.

Also, are we still hitting the minimum size failure message? Now that 
the calculation has been fixed, what sizes does it come up with for min 
and spare? Are they within the allocation now or not?

John.


> But I'll go ahead and re-rev this.
> ...alan
>
> On Fri, 2022-09-30 at 10:48 -0700, Harrison, John C wrote:
>> Isn't min_size the bare minimum to get a valid capture? Surely this
>> still needs to be a warning not a debug. If we can't manage a basic
>> working error capture then there is a problem. This needs to be caught
>> by CI and logged as a bug if it is ever hit. And that means an end user
>> should never see it fire because we won't let a driver out the door
>> unless the default buffer size is sufficient.


