[Intel-gfx] [PATCH] drm/i915: Include GuC fw version in error state

Michel Thierry michel.thierry at intel.com
Fri Feb 24 17:32:19 UTC 2017



On 2/24/2017 9:15 AM, Chris Wilson wrote:
> On Fri, Feb 24, 2017 at 08:30:43AM -0800, Michel Thierry wrote:
>> On 2/24/2017 2:49 AM, Chris Wilson wrote:
>>> On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
>>>> On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
>>>>> On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
>>>>>>   Reviewed-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>>>>
>>>>>>   On 2/24/2017 4:41 AM, Michel Thierry wrote:
>>>>>>
>>>>>> There was no way to check if the platform is running the latest firmware.
>>>>>>
>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>> Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
>>>>>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>>>>>  1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> index 2b1d15668192..e022187916ee 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>>>>>>                            CSR_VERSION_MINOR(csr->version));
>>>>>>         }
>>>>>>
>>>>>> +       if (HAS_GUC_UCODE(dev_priv)) {
>>>>>> +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>>>>>> +
>>>>>> +               err_printf(m, "GuC loaded: %s\n",
>>>>>> +                          yesno(guc_fw->load_status ==
>>>>>> +                                INTEL_UC_FIRMWARE_SUCCESS));
>>>>>> +               err_printf(m, "GuC fw version: %d.%d\n",
>>>>>> +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> Hmm. The firmware may change between the hang and cat
>>>>> /sys/class/drm/card0/error (as it will be reloaded after the reset).
>>>>
>>>> Btw, maybe we should add a counter that is incremented on each fw reload
>>>> and reported here?
>>>
>>> If it occurs to you that we need it for post-mortem debugging and having
>>> it is worth more than any potential confusion....
>>>
>>> I can see the need for knowing what guc/huc/dmc/etc was running at the
>>> time of a hang - I just hope that what was previously running before an
>>> earlier reset doesn't contribute. But that's why we focus on the first
>>> error in a system...
>>
>> Can the firmware change?
>> Last time I checked, the filename was hard-coded in the driver. It's
>> true that the load process could fail, and then the information would be
>> incorrect.
>
> Assume it won't be hardcoded forever (or at least no more than a week)...
> And yes, the filesystem state may have changed since the previous load.

OK, I'll add an i915_capture_fw_state to collect this information before
the reset (for DMC/GuC/HuC).
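The patch above prints live firmware state when the error file is read, so after a reset reloads the firmware the output may describe a different blob than the one running at hang time. The following is a minimal userspace sketch of the capture-before-reset idea, with the reload counter Michal suggested folded in. All names here (fw_state, fw_snapshot, fw_load, capture_fw_state, print_fw_snapshot, reload_count) are hypothetical stand-ins, not the i915 API. In the driver, the capture step would presumably sit in the error-capture path and the printing in i915_error_state_to_str.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for driver firmware bookkeeping. Field names loosely
 * mirror intel_uc_fw from the patch, but this is NOT the i915 API. */
enum fw_load_status { FW_LOAD_NONE, FW_LOAD_SUCCESS, FW_LOAD_FAIL };

struct fw_state {
	enum fw_load_status load_status;
	unsigned int major_ver_found;
	unsigned int minor_ver_found;
	unsigned int reload_count; /* hypothetical: bumped on every (re)load */
};

/* Frozen copy kept with the rest of the captured error state. */
struct fw_snapshot {
	int loaded;
	unsigned int major, minor;
	unsigned int reload_count;
};

/* Hypothetical loader hook: records what each (re)load found. */
static void fw_load(struct fw_state *fw, unsigned int major,
		    unsigned int minor, int ok)
{
	fw->load_status = ok ? FW_LOAD_SUCCESS : FW_LOAD_FAIL;
	fw->major_ver_found = major;
	fw->minor_ver_found = minor;
	fw->reload_count++;
}

/* Runs at hang-capture time, before the reset reloads the firmware. */
static void capture_fw_state(const struct fw_state *fw,
			     struct fw_snapshot *snap)
{
	assert(fw != NULL && snap != NULL);
	memset(snap, 0, sizeof(*snap));
	snap->loaded = fw->load_status == FW_LOAD_SUCCESS;
	snap->major = fw->major_ver_found;
	snap->minor = fw->minor_ver_found;
	snap->reload_count = fw->reload_count;
}

/* Runs later, when the error file is read: formats only the snapshot,
 * never the live (possibly reloaded) firmware state. */
static int print_fw_snapshot(char *buf, size_t len,
			     const struct fw_snapshot *snap)
{
	return snprintf(buf, len,
			"GuC loaded: %s\nGuC fw version: %u.%u\nGuC fw loads: %u\n",
			snap->loaded ? "yes" : "no",
			snap->major, snap->minor, snap->reload_count);
}
```

For example, with arbitrary version numbers: load 1.2, capture on a hang, then let the reset load 1.3. Printing the snapshot still reports 1.2 and a load count of 1, which is what a post-mortem reader needs. The counter also shows whether the captured load was the first one since boot, which addresses the concern about state from an earlier reset leaking into the report.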


