[PATCH v12 4/5] drm/xe/guc: Extract GuC error capture lists
Dong, Zhanjun
zhanjun.dong at intel.com
Tue Jul 16 21:15:40 UTC 2024
See my inline comments below.
Regards,
Zhanjun Dong
On 2024-07-13 12:38 a.m., Teres Alexis, Alan Previn wrote:
> On Fri, 2024-07-12 at 09:47 -0700, Zhanjun Dong wrote:
>> Upon the G2H Notify-Err-Capture event, parse through the
>> GuC Log Buffer (error-capture-subregion) and generate one or
>> more capture-nodes. A single node represents a single "engine-
>> instance-capture-dump" and contains at least 3 register lists:
>> global, engine-class and engine-instance. An internal linked
>> list is maintained to store one or more nodes.
>>
> alan:snip
>
>> +static int guc_capture_buf_cnt(struct __guc_capture_bufstate *buf)
>> +{
>> + if (buf->wr >= buf->rd)
>> + return (buf->wr - buf->rd);
>> + return (buf->size - buf->rd) + buf->wr;
>> +}
>> +
>> +static int guc_capture_buf_cnt_to_end(struct __guc_capture_bufstate *buf)
>> +{
>> + if (buf->rd > buf->wr)
>> + return (buf->size - buf->rd);
>> + return (buf->wr - buf->rd);
>> +}
>> +
>> +/*
>> + * GuC's error-capture output is a ring buffer populated in a byte-stream fashion:
>> + *
>> + * The GuC Log buffer region for error-capture is managed like a ring buffer.
>> + * The GuC firmware dumps error capture logs into this ring in a byte-stream flow.
>> + * Additionally, for the current and foreseeable future, all packed error-
>> + * capture output structures are dword aligned.
>> + *
>> + * That said, if the GuC firmware is in the midst of writing a structure that is larger
>> + * than one dword but the tail end of the err-capture buffer-region has less space left,
>> + * we would need to extract that structure one dword at a time, straddling across the end
>> + * onto the start of the ring.
>> + *
>> + * The function below, guc_capture_log_remove_bytes, is a helper for that. All callers of
>> + * this function would typically do a straight-up memcpy from the ring contents and will only
>> + * call this helper if their structure-extraction is straddling across the end of the
>> + * ring. GuC firmware does not add any padding. The reason for the no-padding is to ease
>> + * scalability for future expansion of output data types without requiring a redesign
>> + * of the flow controls.
>> + */
> alan: I see you have decided to change the function for extracting
> data when it straddles the ring end. Please do ensure you repeat igt
> tests in a loop with instrumented kernel messages (or any other means)
> to ensure it exercises the case where the error capture data being
> extracted straddles the ring end.
Sure, thanks.
>> +static int
>> +guc_capture_log_remove_bytes(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + void *out, int bytes_needed)
>> +{
>> + int tries = 2, fill_size = 0, copy_size, avail;
>> +
>> + if (!guc_capture_buf_cnt(buf))
>> + return 0;
>> +
>> + while (bytes_needed > 0 && tries--) {
> alan: not sure if limiting "tries" to 2 is sufficient. In the
> worst case scenario (straddling the end of the ring), we will
> experience the following:
> - worst-case-#1st-try: avail < bytes_needed, copy partial
> - worst-case-#2nd-try: avail == 0
>   (since we reached the end of the ring in #1 and have not yet
>   set buf->rd to zero, which only gets done in this step)
> - worst-case-#3rd-try: avail >= bytes_needed
>   (now that we have wrapped around the ring)
>
> That said, I believe tries should be max'd at 3, which is
> different from the old "guc_capture_log_remove_dw" function that
> was designed to remove ONLY a single dword and was only called
> when the tail was at or near the ring end and needed to wrap around.
Sure, it will be 3 in the next rev; see the combined sketch further below.
>> + avail = guc_capture_buf_cnt_to_end(buf);
>> + /* wrap if at end */
>> + if (!avail) {
>> + /* output stream clipped */
>> + if (!buf->rd)
>> + return fill_size;
>> + buf->rd = 0;
>> + continue;
>> + }
>> + if (avail < sizeof(u32)) {
> alan: although this check would actually be true if we are
> in fact not 32-bit aligned, considering the prior check for
> if-zero-continue, it is nevertheless a very late place to check for
> alignment (i.e. right at the end of the ring). Might be better
> to check alignment on the input size "bytes_needed" at the start
> of the function, and also to use "if (avail % sizeof(u32))".
Will add an assert at function entry:

	xe_assert(guc_to_xe(guc), bytes_needed % sizeof(u32) == 0);
>> + xe_gt_dbg(guc_to_gt(guc), "Bytes extraction not dword aligned,skipping.\n");
>> + continue;
>> + }
>> + copy_size = avail < bytes_needed ? avail : bytes_needed;
to be:

	misaligned = avail % sizeof(u32);
	copy_size = avail < bytes_needed ? avail - misaligned : bytes_needed;

bytes_needed's alignment was already asserted at function entry, so
copy_size is always u32-aligned.
>> + xe_map_memcpy_from(guc_to_xe(guc), out, &guc->log.bo->vmap,
>> + buf->data_offset + buf->rd, copy_size);
>> + buf->rd += copy_size;
>> + fill_size += copy_size;
>> + bytes_needed -= copy_size;
and here:

	if (misaligned)
		xe_gt_dbg(guc_to_gt(guc),
			  "Bytes extraction not dword aligned, skipping.\n");

So if we get 6 bytes, we copy 4 bytes, print the debug info, skip the
last 2 bytes, and retry if the max try count has not been reached.
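Putting the pieces together, a minimal sketch of the revised helper with
tries bumped to 3, the entry assert, and the misalignment clamp (names
reused from the patch; note the destination is also advanced by fill_size
here so a straddled copy does not overwrite the first chunk). This is only
an illustration of the agreed direction, not the final revision:

static int
guc_capture_log_remove_bytes(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
			     void *out, int bytes_needed)
{
	/* worst case: partial copy to ring end, wrap, then remaining copy */
	int tries = 3, fill_size = 0, copy_size, avail, misaligned;

	/* callers must request dword-multiple sizes */
	xe_assert(guc_to_xe(guc), bytes_needed % sizeof(u32) == 0);

	if (!guc_capture_buf_cnt(buf))
		return 0;

	while (bytes_needed > 0 && tries--) {
		avail = guc_capture_buf_cnt_to_end(buf);
		/* wrap if at end */
		if (!avail) {
			/* output stream clipped */
			if (!buf->rd)
				return fill_size;
			buf->rd = 0;
			continue;
		}

		/* clamp the copy down to a dword boundary */
		misaligned = avail % sizeof(u32);
		copy_size = avail < bytes_needed ? avail - misaligned : bytes_needed;
		if (misaligned)
			xe_gt_dbg(guc_to_gt(guc),
				  "Bytes extraction not dword aligned, skipping.\n");
		if (!copy_size)
			continue;

		/* advance the destination past bytes already copied */
		xe_map_memcpy_from(guc_to_xe(guc), out + fill_size, &guc->log.bo->vmap,
				   buf->data_offset + buf->rd, copy_size);
		buf->rd += copy_size;
		fill_size += copy_size;
		bytes_needed -= copy_size;
	}

	return fill_size;
}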
>> + }
>> +
>> + return fill_size;
>> +}
>> +
>> +static bool
>> +guc_capture_data_extracted(struct xe_guc *guc, struct __guc_capture_bufstate *b,
>> + int size, void *dest)
>> +{
> alan: actually, I realize that based on the goal of guc_capture_log_remove_bytes,
> "guc_capture_data_extracted" is not needed anymore. Callers only call
> guc_capture_log_remove_bytes once, and that handles both cases (case 1 -> data to
> extract fits before the end of the ring... or case 2 -> data straddles the end of the ring).
Right, this function is no longer needed; it will be removed.
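Each extraction helper then collapses to a single
guc_capture_log_remove_bytes call, which covers both the contiguous and
the ring-end straddling case; a minimal sketch of one of them
(illustrative only, not the final revision):

static int
guc_capture_log_get_data_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
			     struct guc_state_capture_header_t *hdr)
{
	int fullsize = sizeof(struct guc_state_capture_header_t);

	if (fullsize > guc_capture_buf_cnt(buf))
		return -1;

	/* handles both the contiguous and the ring-end straddling case */
	if (guc_capture_log_remove_bytes(guc, buf, hdr, fullsize) != fullsize)
		return -1;

	return 0;
}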
>
>> + if (guc_capture_buf_cnt_to_end(b) >= size) {
>> + xe_map_memcpy_from(guc_to_xe(guc), dest, &guc->log.bo->vmap,
>> + b->data_offset + b->rd, size);
>> + b->rd += size;
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_group_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_state_capture_group_header_t *ghdr)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_state_capture_group_header_t);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)ghdr))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, ghdr, sizeof(*ghdr));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_data_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_state_capture_header_t *hdr)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_state_capture_header_t);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)hdr))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, hdr, sizeof(*hdr));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_register(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_mmio_reg *reg)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_mmio_reg);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)reg))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, reg, sizeof(*reg));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static struct __guc_capture_parsed_output *
>> +guc_capture_get_prealloc_node(struct xe_guc *guc)
>>
> alan:snip
>> +static void __guc_capture_process_output(struct xe_guc *guc)
>> +{
>> + unsigned int buffer_size, read_offset, write_offset, full_count;
>> + struct xe_uc *uc = container_of(guc, typeof(*uc), guc);
>> + struct guc_log_buffer_state log_buf_state_local;
>> + struct __guc_capture_bufstate buf;
>> + bool new_overflow;
>> + int ret, tmp;
>> + u32 log_buf_state_offset;
>> + u32 src_data_offset;
>> +
>> + log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_LOG_BUFFER_CAPTURE;
>> + src_data_offset = xe_guc_get_log_buffer_offset(&guc->log, GUC_LOG_BUFFER_CAPTURE);
>> +
>> + /*
>> + * Make a copy of the state structure, inside GuC log buffer
>> + * (which is uncached mapped), on the stack to avoid reading
>> + * from it multiple times.
>> + */
>> + xe_map_memcpy_from(guc_to_xe(guc), &log_buf_state_local, &guc->log.bo->vmap,
>> + log_buf_state_offset, sizeof(struct guc_log_buffer_state));
>> +
>> + buffer_size = xe_guc_get_log_buffer_size(&guc->log, GUC_LOG_BUFFER_CAPTURE);
>> + read_offset = log_buf_state_local.read_ptr;
>> + write_offset = log_buf_state_local.sampled_write_ptr;
>> + full_count = FIELD_GET(GUC_LOG_BUFFER_STATE_BUFFER_FULL_CNT, log_buf_state_local.flags);
>> +
>> + /* Bookkeeping stuff */
>> + tmp = FIELD_GET(GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE, log_buf_state_local.flags);
>> + guc->log.stats[GUC_LOG_BUFFER_CAPTURE].flush += tmp;
>> + new_overflow = xe_guc_check_log_buf_overflow(&guc->log, GUC_LOG_BUFFER_CAPTURE,
>> + full_count);
>> +
>> + /* Now copy the actual logs. */
>> + if (unlikely(new_overflow)) {
>> + /* copy the whole buffer in case of overflow */
>> + read_offset = 0;
>> + write_offset = buffer_size;
>> + } else if (unlikely((read_offset > buffer_size) ||
>> + (write_offset > buffer_size))) {
>> + xe_gt_err(guc_to_gt(guc),
>> + "Register capture buffer in invalid state: read = 0x%X, size = 0x%X!\n",
>> + read_offset, buffer_size);
>> + /* copy whole buffer as offsets are unreliable */
>> + read_offset = 0;
>> + write_offset = buffer_size;
>> + }
>> +
>> + buf.size = buffer_size;
>> + buf.rd = read_offset;
>> + buf.wr = write_offset;
>> + buf.data_offset = src_data_offset;
>> +
>> + if (!xe_guc_read_stopped(guc)) {
>> + do {
>> + ret = guc_capture_extract_reglists(guc, &buf);
> alan:nit: I notice that inside guc_capture_extract_reglists,
> we have many cases where errors may occur but we don't print that an error
> occurred there, and we don't do that here either. Since guc error capture is
> about debuggability, not functionality, I would recommend adding a
> drm_debug here with the error value. I know most of the error paths
> inside guc_capture_extract_reglists end up returning -EIO, but
> perhaps this first step is sufficient, i.e. to know an error occurred.
Sure, will add a dbg print on the error code.
Thanks for the advice. Actually, I found an unexpected error code: the
function returns -ENODATA when guc_capture_buf_cnt(buf) is 0, which is
expected at the end of processing.
Will fix it by making guc_capture_extract_reglists return 0 in this
normal condition.
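One possible shape for the extraction loop in __guc_capture_process_output
with the debug print added (a sketch only; here -ENODATA is kept as the
quiet end-of-stream terminator, and whether it is instead reworked to
return 0 with an adjusted loop condition is a detail for the next rev):

	if (!xe_guc_read_stopped(guc)) {
		do {
			ret = guc_capture_extract_reglists(guc, &buf);
			/* -ENODATA just means the buffer is drained */
			if (ret && ret != -ENODATA)
				xe_gt_dbg(guc_to_gt(guc),
					  "Capture extraction failed: %d\n", ret);
		} while (ret >= 0);
	}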
>> + } while (ret >= 0);
>> + }
>> +
>> + /* Update the state of log buffer err-cap state */
>> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
>> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, read_ptr), u32,
>> + write_offset);
>> + /* Clear the flush_to_file from local first, the local was loaded by above
>> + * xe_map_memcpy_from.
>> + */
> alan: the "/* Clear the flush_to_file..." comment block should start
> with "/*" on its own line, with the comment text on the following lines.
>> + log_buf_state_local.flags &= ~GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE;
>> +
>>
> alan:nit: I feel like this newline above should instead sit between the xe_map_wr for
> write_offset and the start of the "Clear the flush_to_file..." description. There is no
> need for a newline here, since the code above it is about the flush flag and the code
> below is about the flush flag; the newline looks like it is grouping things together,
> but it groups unrelated things. Alternatively, we can just remove all of these newlines
> in these final write-out-update steps. I also think it is neater to combine the comment
> below, "write out the updated local", into the same comment block as
> "Clear the flush_to_file..." since both are about flush flags.
Sure, something like the sketch below.
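Roughly, the final write-out steps would then read (a sketch of the
suggested regrouping, not the final code):

	/* Update the state of log buffer err-cap state */
	xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
		  log_buf_state_offset + offsetof(struct guc_log_buffer_state, read_ptr), u32,
		  write_offset);

	/*
	 * Clear the flush_to_file from local first (the local was loaded by
	 * the xe_map_memcpy_from above), then write out the "updated local"
	 * through xe_map_wr().
	 */
	log_buf_state_local.flags &= ~GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE;
	xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
		  log_buf_state_offset + offsetof(struct guc_log_buffer_state, flags), u32,
		  log_buf_state_local.flags);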
>> + /* Then write out the "updated local" through xe_map_wr() */
>> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
>> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, flags), u32,
>> + log_buf_state_local.flags);
>> + __guc_capture_flushlog_complete(guc);
>> +}
>> +
>> +/**
>> + * xe_guc_capture_process - Process GuC register captured data
>> + * @guc: The GuC object
>> + *
>> + * When GuC captured data is ready, GuC will send the
>> + * XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION message to the host; this
>> + * function will be called to process the data that comes with the message.
>> + *
>> + * Returns: None
>> + */
>> +void xe_guc_capture_process(struct xe_guc *guc)
>> +{
>> + if (guc->capture)
>> + __guc_capture_process_output(guc);
>> +}
>> +
>> +static struct __guc_capture_parsed_output *
>>
> alan:snip
>
>> --- a/drivers/gpu/drm/xe/xe_guc_capture.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_capture.h
>> @@ -11,6 +11,7 @@
>>
>> struct xe_guc;
>>
>> +void xe_guc_capture_process(struct xe_guc *guc);
>> int xe_guc_capture_getlist(struct xe_guc *guc, u32 owner, u32 type,
>> enum guc_capture_list_class_type capture_class, void **outptr);
>> int xe_guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type,
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 7d2e937da1d8..7bf9e45abf49 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1106,6 +1106,8 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> /* Selftest only at the moment */
>> break;
>> case XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION:
>> + ret = xe_guc_error_capture_handler(guc, payload, adj_len);
>> + break;
> alan: I notice that process_g2h_msg is in the ct queue and we are not
> making modifications to schedule error capture extraction (along with
> the G2H context reset notification) into the queue from which drm-sched
> schedules the TDR timer. I am not sure if this decision is intentional,
> because I thought we had prior offline conversations that we needed the
> alternative to avoid some races. Let's connect offline; perhaps I am
> just mistaken here.
Since rev 11, the devcoredump capture adds a pre-capture from the hwe if
the guc capture is not ready, and later on the devcoredump is called
after the event wait, which avoids the race condition.
>
>> case XE_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE:
>> /* FIXME: Handle this */
>> break;
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>
> alan:snip
>