[PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture
Dong, Zhanjun
zhanjun.dong at intel.com
Fri Mar 22 21:36:50 UTC 2024
See my comments below.
Regards,
Zhanjun
On 2024-03-22 5:28 p.m., Souza, Jose wrote:
> On Fri, 2024-03-22 at 11:37 -0700, José Roberto de Souza wrote:
>> On Fri, 2024-03-22 at 09:57 -0700, José Roberto de Souza wrote:
>>> On Tue, 2024-03-19 at 07:36 -0700, Zhanjun Dong wrote:
>>>> Port GuC based register capture for error capture from i915 to Xe.
>>>>
>>>> There are 3 parts inside:
>>>> . Prepare for capture registers
>>>> There is a bo create at guc ads init time, that is very early
>>>> and engi ne map is not ready, make it hard to calculate the
>>>> capture buffer size, new function created for worst case size
>>>> caluation. Other than that, this part basically follows the i915
>>>> design.
>>>> . Process capture notification message
>>>> Basically follows i915 design
>>>> . Sysfs command process.
>>>> Xe switched to devcoredump, adopted command line process with
>>>> captured node list.
>>>
>>> Just some notes after trying this to debug a issue with this series:
>>>
>>> - CONFIG_DRM_XE_CAPTURE_ERROR is not a good name for a config to enable GuC based capture.
>>> - in my opinion this should not even exist, always capture with GuC if xe_device_uc_enabled() is true.
>>>
>>> - coredump don't have some registers like SAMPLER_INSTDONE, ROW_INSTDONE, XEHPG_INSTDONE_GEOM_SVG...
Yes, those steering register print out is missing.
>>> Here the output in LNL:
>>
>> ah this could explain...
>>
>> [ 152.189386] xe 0000:00:02.0: [drm:xe_hw_engine_snapshot_from_capture [xe]] GT0: GuC error capture is empty, take snapshot from engine.
>> [ 152.190215] xe 0000:00:02.0: [drm] Xe device coredump has been created
>>
>> but both code paths needs to dump the same registers
I found you have another patch:
drm/xe/devcoredump: Print errno if VM snapshot was not captured
I will take a look.
>
> Hi Zhanjun
>
> Mesa team needs the INSTDONE registers to debug issues and I have patches implementing the dump of this registers.
> Do you mind if I send this patches the list?
Sure, please send it.
>
> I guess this GuC based register capture will need another version and someone with experience on GuC to review, so it will take a while...
>
>>
>>>
>>> **** HW Engines ****
>>> rcs0 (physical), logical instance=0
>>> Forcewake: domain 0x2, ref 1
>>> HWSTAM: 0xffffffff
>>> RING_HWS_PGA: 0x01693000
>>> RING_START: 0x029d5000
>>> RING_HEAD: 0x00001914
>>> RING_TAIL: 0x00001968
>>> RING_CTL: 0x00003001
>>> RING_MI_MODE: 0x00001000
>>> RING_MODE: 0x00000008
>>> RING_IMR: 0x00000000
>>> RING_ESR: 0x00000000
>>> RING_EMR: 0xffffffff
>>> RING_EIR: 0x00000000
>>> IPEHR: 0x7a000004
>>> ACTHD: 0x0000effeffff9208
>>> RING_BBADDR: 0x0000effeffff9209
>>> RING_DMA_FADD: 0x00000000029d6968
>>> RING_EXECLIST_STATUS: 0x000000004000279c
>>> RING_EXECLIST_SQ_CONTENTS: 0x00000000029d9719
>>>
>>> - a lot of style issues, see 'CI.checkpatch'
>>>
>>>>
>>>> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
>>>> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>>>>
>>>> Changes from prior revs:
>>>> v6:- Change hardcoded register snapshot fill to follow mapping tables
>>>> When capture is empty, take snapshot from engine
>>>> v5:- Split dss helper code out as an standalone patch
>>>> Remove old platform registers definition.
>>>> Split register map table to 32 and 64bit each
>>>> v4:- Move register map table to xe_hw_engine.c
>>>> v3:- Remove condition compilation in code
>>>> v2:- Split into multiple chunks
>>>>
>>>> Zhanjun Dong (8):
>>>> drm/xe/guc: Add kconfig for GuC based register capture
>>>> drm/xe/guc: Update GuC ADS size for error capture
>>>> drm/xe/guc: Add XE_LP steered register lists
>>>> drm/xe/guc: Add capture size check in GuC log buffer
>>>> drm/xe/guc: Check sizing of guc_capture output
>>>> drm/xe/guc: Extract GuC error capture lists on G2H notification
>>>> drm/xe/guc: Pre-allocate output nodes for extraction
>>>> drm/xe/guc: Plumb GuC-capture into dev coredump
>>>>
>>>> drivers/gpu/drm/xe/Kconfig | 11 +
>>>> drivers/gpu/drm/xe/Makefile | 1 +
>>>> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 7 +
>>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +
>>>> drivers/gpu/drm/xe/xe_gt_printk.h | 3 +
>>>> drivers/gpu/drm/xe/xe_guc.c | 6 +
>>>> drivers/gpu/drm/xe/xe_guc_ads.c | 229 +++-
>>>> drivers/gpu/drm/xe/xe_guc_ads_types.h | 2 +
>>>> drivers/gpu/drm/xe/xe_guc_capture.c | 1332 ++++++++++++++++++++++
>>>> drivers/gpu/drm/xe/xe_guc_capture.h | 21 +
>>>> drivers/gpu/drm/xe/xe_guc_capture_fwif.h | 221 ++++
>>>> drivers/gpu/drm/xe/xe_guc_ct.c | 2 +
>>>> drivers/gpu/drm/xe/xe_guc_fwif.h | 68 ++
>>>> drivers/gpu/drm/xe/xe_guc_log.c | 179 +++
>>>> drivers/gpu/drm/xe/xe_guc_log.h | 15 +
>>>> drivers/gpu/drm/xe/xe_guc_log_types.h | 26 +
>>>> drivers/gpu/drm/xe/xe_guc_submit.c | 21 +-
>>>> drivers/gpu/drm/xe/xe_guc_submit.h | 3 +
>>>> drivers/gpu/drm/xe/xe_guc_types.h | 2 +
>>>> drivers/gpu/drm/xe/xe_hw_engine.c | 261 ++++-
>>>> drivers/gpu/drm/xe/xe_hw_engine.h | 4 +
>>>> drivers/gpu/drm/xe/xe_hw_engine_types.h | 117 +-
>>>> 22 files changed, 2421 insertions(+), 115 deletions(-)
>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.c
>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.h
>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_fwif.h
>>>>
>>>
>>
>
More information about the Intel-xe
mailing list