[PATCH 3/3] drm/xe: Drop duplicated information about GT tile in devcoredump
Souza, Jose
jose.souza at intel.com
Thu Jan 23 19:35:14 UTC 2025
On Thu, 2025-01-23 at 11:27 -0800, John Harrison wrote:
> On 1/23/2025 10:56, Souza, Jose wrote:
> > On Thu, 2025-01-23 at 10:30 -0800, John Harrison wrote:
> > > On 1/23/2025 10:24, Souza, Jose wrote:
> > > > On Thu, 2025-01-23 at 10:18 -0800, John Harrison wrote:
> > > > > On 1/23/2025 09:59, José Roberto de Souza wrote:
> > > > > > The GT tile information is already available but was added again in commit
> > > > > > c28fd6c358db ("drm/xe/devcoredump: Improve section headings and add tile info"),
> > > > > > so delete the duplicate here.
> > > > > >
> > > > > > Here is a devcoredump example with the duplicated information:
> > > > > >
> > > > > > **** Xe Device Coredump ****
> > > > > > Reason: Timedout job - seqno=4294967170, lrc_seqno=4294967170, guc_id=13, flags=0x0
> > > > > > kernel: 6.13.0-zeh-xe+
> > > > > > module: xe
> > > > > > Snapshot time: 1737573530.243521319
> > > > > > Uptime: 2588.041930284
> > > > > > Process: deqp-vk [8850]
> > > > > > PCI ID: 0x64a0
> > > > > > PCI revision: 0x04
> > > > > > GT id: 0
> > > > > > Tile: 0
> > > > > > Type: main
> > > > > > IP ver: 20.4.4
> > > > > > CS reference clock: 19200000
> > > > > > GT id: 1
> > > > > > Tile: 0
> > > > > > Type: media
> > > > > > IP ver: 20.0.4
> > > > > > CS reference clock: 19200000
> > > > > This is an overview of all GTs/tiles in the device within the global
> > > > > section.
> > > > >
> > > > > > **** GT #0 ****
> > > > > This is a section header telling you that everything which follows is
> > > > > inside GT0.
> > > > >
> > > > > It is not duplicated information. And if you remove it then you now have
> > > > > the information of all GTs back to back with no indication of which GT
> > > > > they actually belong to.
> > > > Can't you get this information from Name + class + instance? If not, it should be placed in one of the sections below.
> > > No. I was trying to do that originally and determined it was impossible
> > > to do reliably for all current and future platforms. Hence the section
> > > header was added.
> > >
> > > It is also much, much, much better to be explicit about debug
> > > information than force people to guess based on heuristics or tribal
> > > knowledge.
> > >
> > > > So this should be placed in one of these sections:
> > > No. It is not information within a context or within a hardware engine.
> > > It is the reverse. All the following contexts, hardware engines, etc.
> > > are within the GT.
> > >
> > > And when you have multiple GTs in a single dump, you need a definite
> > > delimiter to say that all the following is now in a different GT from what
> > > came before.
> > Can't I do an exec over an exec_queue created over engines in different tiles?
> > I think we are allowed to, so having tile and GT in '**** HW Engines ****' would be better.
> Not that I am aware of. The whole point of the 'hw engine' section is
> that it is related to a command streamer. And those cannot bridge GTs.
No, I mean this:
- create an exec_queue with instances on CCS0 (tile 0, GT 0) + CCS5 (tile 1, GT 1)
- do an exec with one batch buffer for CCS0 + another batch buffer for CCS5
- one of those hangs
- devcoredump should dump information for both engines ("xe_engine_snapshot_capture_for_queue(struct xe_exec_queue *q)"); rough sketch below
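
Untested sketch of that scenario against the xe uAPI (include/uapi/drm/xe_drm.h).
Whether the driver really accepts instances on different GTs is exactly the open
question here; engine numbers are made up and error handling is omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

static void exec_cross_gt(int fd, uint32_t vm_id, uint64_t bb_ccs0, uint64_t bb_ccs5)
{
	/* width=2 parallel queue: one placement with one instance per GT */
	struct drm_xe_engine_class_instance instances[2] = {
		{ .engine_class = DRM_XE_ENGINE_CLASS_COMPUTE,
		  .engine_instance = 0, .gt_id = 0 },	/* CCS0: tile 0, GT 0 */
		{ .engine_class = DRM_XE_ENGINE_CLASS_COMPUTE,
		  .engine_instance = 5, .gt_id = 1 },	/* CCS5: tile 1, GT 1 */
	};
	struct drm_xe_exec_queue_create create = {
		.width = 2,
		.num_placements = 1,
		.vm_id = vm_id,
		.instances = (uintptr_t)instances,
	};
	uint64_t batches[2] = { bb_ccs0, bb_ccs5 };	/* one BB per instance */
	struct drm_xe_exec exec = {
		.num_batch_buffer = 2,
		.address = (uintptr_t)batches,
	};

	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);
	exec.exec_queue_id = create.exec_queue_id;
	ioctl(fd, DRM_IOCTL_XE_EXEC, &exec);
	/* if either batch hangs, xe_engine_snapshot_capture_for_queue()
	 * should end up snapshotting both engines in the devcoredump */
}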
so having tile and GT information in **** HW Engines **** would be better.
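Something like this in xe_hw_engine.c's snapshot print (hypothetical: it assumes
the snapshot is extended to capture the GT/tile ids, the gt_id/tile_id fields
below don't exist today):

	drm_printf(p, "%s (physical), logical instance=%d, gt=%d, tile=%d\n",
		   snapshot->name ? snapshot->name : "",
		   snapshot->logical_instance,
		   snapshot->gt_id, snapshot->tile_id);

That would make each '**** HW Engines ****' entry self-describing even when a
queue spans GTs.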
But we can work on this after the GuC log part.
>
> John.
>
> >
> > > John.
> > >
> > > > GuC ID: 13
> > > > Name: ccs13
> > > > Class: 5
> > > > Logical mask: 0x1
> > > > Width: 1
> > > > Ref: 2
> > > > Timeout: 5000 (ms)
> > > > Timeslice: 1000 (us)
> > > > Preempt timeout: 640000 (us)
> > > > HW Context Desc: 0x025e0000
> > > > HW Ring address: 0x025dc000
> > > > HW Indirect Ring State: 0x025e3000
> > > > LRC Head: (memory) 152
> > > > LRC Tail: (internal) 296, (memory) 296
> > > > Ring start: (memory) 0x025dc000
> > > > Start seqno: (memory) -126
> > > > Seqno: (memory) -127
> > > > Timestamp: 0x0000035e
> > > > Job Timestamp: 0x0000035e
> > > > Schedule State: 0x441
> > > > Flags: 0x0
> > > >
> > > > **** HW Engines ****
> > > > ccs0 (physical), logical instance=0
> > > > Capture_source: GuC
> > > > Coverage: full-capture
> > > > Forcewake: domain 0x2, ref 1
> > > > Reserved: no
> > > > FORCEWAKE_GT: 0x00010000
> > > > RCU_MODE: 0x00000001
> > > > HWSTAM: 0xffffffff
> > > > RING_HWS_PGA: 0x018db000
> > > > RING_HEAD: 0x000000ec
> > > > RING_TAIL: 0x00000128
> > > > RING_CTL: 0x00003001
> > > > RING_MI_MODE: 0x00001000
> > > > RING_MODE: 0x00000008
> > > > RING_ESR: 0x00000000
> > > > RING_EMR: 0xffffffff
> > > > RING_EIR: 0x00000000
> > > > RING_IMR: 0x00000000
> > > > IPEHR: 0x7a000a04
> > > > RING_INSTDONE: 0xffdefffe
> > > >
> > > > > John.
> > > > >
> > > > >
> > > > > > Tile: 0
> > > > > >
> > > > > > **** GuC Log ****
> > > > > > ....
> > > > > >
> > > > > > Cc: John Harrison <John.C.Harrison at Intel.com>
> > > > > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_devcoredump.c | 3 ---
> > > > > > 1 file changed, 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > index 1c86e6456d60f..2996945ffee39 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > > @@ -111,9 +111,6 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
> > > > > > drm_printf(&p, "Process: %s [%d]\n", ss->process_name, ss->pid);
> > > > > > xe_device_snapshot_print(xe, &p);
> > > > > >
> > > > > > - drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id);
> > > > > > - drm_printf(&p, "\tTile: %d\n", ss->gt->tile->id);
> > > > > > -
> > > > > > drm_puts(&p, "\n**** GuC Log ****\n");
> > > > > > xe_guc_log_snapshot_print(ss->guc.log, &p);
> > > > > > drm_puts(&p, "\n**** GuC CT ****\n");
>