[PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture

Dong, Zhanjun zhanjun.dong at intel.com
Tue Mar 26 15:40:47 UTC 2024


See my comments inline below.

Regards,
Zhanjun
On 2024-03-22 5:36 p.m., Dong, Zhanjun wrote:
> See my comments below.
> 
> Regards,
> Zhanjun
> 
> On 2024-03-22 5:28 p.m., Souza, Jose wrote:
>> On Fri, 2024-03-22 at 11:37 -0700, José Roberto de Souza wrote:
>>> On Fri, 2024-03-22 at 09:57 -0700, José Roberto de Souza wrote:
>>>> On Tue, 2024-03-19 at 07:36 -0700, Zhanjun Dong wrote:
>>>>> Port GuC based register capture for error capture from i915 to Xe.
>>>>>
>>>>> There are 3 parts inside:
>>>>> . Prepare for capture registers
>>>>>      There is a bo create at guc ads init time, that is very early
>>>>>      and engi  ne map is not ready, make it hard to calculate the
>>>>>      capture buffer size, new function created for worst case size
>>>>>      caluation. Other than that, this part basically follows the i915
>>>>>      design.
>>>>> . Process capture notification message
>>>>>      Basically follows i915 design
>>>>> . Sysfs command process.
>>>>>      Xe switched to devcoredump, adopted command line process with
>>>>>      captured node list.
>>>>
>>>> Just some notes after trying this to debug a issue with this series:
>>>>
>>>> - CONFIG_DRM_XE_CAPTURE_ERROR is not a good name for a config to 
>>>> enable GuC based capture.
>>>>     - in my opinion this should not even exist, always capture with 
>>>> GuC if xe_device_uc_enabled() is true.

3 reasons:
1. This GuC based register error capture is for debug purpose, does it 
still important for production stage? I want to leave it an option for 
users who don't need it.
2. With this opiton disabled, extra memory could be saved. It might not 
be a big deal for PC, while might be valuable for customers who want to 
save few 10KBs to 100KB RAM usage.
3. The feature with an opiton is what we have in i915, where the feature 
is ported from. So make it the same style to avoid users migrated from 
i915 complains about it.

>>>>
>>>> - coredump don't have some registers like SAMPLER_INSTDONE, 
>>>> ROW_INSTDONE, XEHPG_INSTDONE_GEOM_SVG...
> 
> Yes, those steering register print out is missing.
> 
>>>> Here the output in LNL:
>>>
>>> ah this could explain...
>>>
>>> [  152.189386] xe 0000:00:02.0: 
>>> [drm:xe_hw_engine_snapshot_from_capture [xe]] GT0: GuC error capture 
>>> is empty, take snapshot from engine.
>>> [  152.190215] xe 0000:00:02.0: [drm] Xe device coredump has been 
>>> created
>>>
>>> but both code paths needs to dump the same registers
> 
> I found you have another patch:
> drm/xe/devcoredump: Print errno if VM snapshot was not captured
> I will take a look.
> 
>>
>> Hi Zhanjun
>>
>> Mesa team needs the INSTDONE registers to debug issues and I have 
>> patches implementing the dump of this registers.
>> Do you mind if I send this patches the list?
> 
> Sure, please send it.
> 
>>
>> I guess this GuC based register capture will need another version and 
>> someone with experience on GuC to review, so it will take a while...
>>
>>>
>>>>
>>>> **** HW Engines ****
>>>> rcs0 (physical), logical instance=0
>>>>     Forcewake: domain 0x2, ref 1
>>>>     HWSTAM: 0xffffffff
>>>>     RING_HWS_PGA: 0x01693000
>>>>     RING_START: 0x029d5000
>>>>     RING_HEAD: 0x00001914
>>>>     RING_TAIL: 0x00001968
>>>>     RING_CTL: 0x00003001
>>>>     RING_MI_MODE: 0x00001000
>>>>     RING_MODE: 0x00000008
>>>>     RING_IMR: 0x00000000
>>>>     RING_ESR: 0x00000000
>>>>     RING_EMR: 0xffffffff
>>>>     RING_EIR: 0x00000000
>>>>     IPEHR: 0x7a000004
>>>>     ACTHD: 0x0000effeffff9208
>>>>     RING_BBADDR: 0x0000effeffff9209
>>>>     RING_DMA_FADD: 0x00000000029d6968
>>>>     RING_EXECLIST_STATUS: 0x000000004000279c
>>>>     RING_EXECLIST_SQ_CONTENTS: 0x00000000029d9719
>>>>
>>>> - a lot of style issues, see 'CI.checkpatch'

After review the CI.checkpatch output, all style issues reported is by 
purpose.

>>>>
>>>>>
>>>>> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
>>>>> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>>>>>
>>>>> Changes from prior revs:
>>>>>    v6:-  Change hardcoded register snapshot fill to follow mapping 
>>>>> tables
>>>>>          When capture is empty, take snapshot from engine
>>>>>    v5:-  Split dss helper code out as an standalone patch
>>>>>          Remove old platform registers definition.
>>>>>          Split register map table to 32 and 64bit each
>>>>>    v4:-  Move register map table to xe_hw_engine.c
>>>>>    v3:-  Remove condition compilation in code
>>>>>    v2:-  Split into multiple chunks
>>>>>
>>>>> Zhanjun Dong (8):
>>>>>    drm/xe/guc: Add kconfig for GuC based register capture
>>>>>    drm/xe/guc: Update GuC ADS size for error capture
>>>>>    drm/xe/guc: Add XE_LP steered register lists
>>>>>    drm/xe/guc: Add capture size check in GuC log buffer
>>>>>    drm/xe/guc: Check sizing of guc_capture output
>>>>>    drm/xe/guc: Extract GuC error capture lists on G2H notification
>>>>>    drm/xe/guc: Pre-allocate output nodes for extraction
>>>>>    drm/xe/guc: Plumb GuC-capture into dev coredump
>>>>>
>>>>>   drivers/gpu/drm/xe/Kconfig               |   11 +
>>>>>   drivers/gpu/drm/xe/Makefile              |    1 +
>>>>>   drivers/gpu/drm/xe/abi/guc_actions_abi.h |    7 +
>>>>>   drivers/gpu/drm/xe/regs/xe_gt_regs.h     |    5 +
>>>>>   drivers/gpu/drm/xe/xe_gt_printk.h        |    3 +
>>>>>   drivers/gpu/drm/xe/xe_guc.c              |    6 +
>>>>>   drivers/gpu/drm/xe/xe_guc_ads.c          |  229 +++-
>>>>>   drivers/gpu/drm/xe/xe_guc_ads_types.h    |    2 +
>>>>>   drivers/gpu/drm/xe/xe_guc_capture.c      | 1332 
>>>>> ++++++++++++++++++++++
>>>>>   drivers/gpu/drm/xe/xe_guc_capture.h      |   21 +
>>>>>   drivers/gpu/drm/xe/xe_guc_capture_fwif.h |  221 ++++
>>>>>   drivers/gpu/drm/xe/xe_guc_ct.c           |    2 +
>>>>>   drivers/gpu/drm/xe/xe_guc_fwif.h         |   68 ++
>>>>>   drivers/gpu/drm/xe/xe_guc_log.c          |  179 +++
>>>>>   drivers/gpu/drm/xe/xe_guc_log.h          |   15 +
>>>>>   drivers/gpu/drm/xe/xe_guc_log_types.h    |   26 +
>>>>>   drivers/gpu/drm/xe/xe_guc_submit.c       |   21 +-
>>>>>   drivers/gpu/drm/xe/xe_guc_submit.h       |    3 +
>>>>>   drivers/gpu/drm/xe/xe_guc_types.h        |    2 +
>>>>>   drivers/gpu/drm/xe/xe_hw_engine.c        |  261 ++++-
>>>>>   drivers/gpu/drm/xe/xe_hw_engine.h        |    4 +
>>>>>   drivers/gpu/drm/xe/xe_hw_engine_types.h  |  117 +-
>>>>>   22 files changed, 2421 insertions(+), 115 deletions(-)
>>>>>   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.c
>>>>>   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.h
>>>>>   create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_fwif.h
>>>>>
>>>>
>>>
>>


More information about the Intel-xe mailing list