[PATCH v12 4/5] drm/xe/guc: Extract GuC error capture lists
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Sat Jul 13 04:38:12 UTC 2024
On Fri, 2024-07-12 at 09:47 -0700, Zhanjun Dong wrote:
> Upon the G2H Notify-Err-Capture event, parse through the
> GuC Log Buffer (error-capture-subregion) and generate one or
> more capture-nodes. A single node represents a single "engine-
> instance-capture-dump" and contains at least 3 register lists:
> global, engine-class and engine-instance. An internal link
> list is maintained to store one or more nodes.
>
alan:snip
> +static int guc_capture_buf_cnt(struct __guc_capture_bufstate *buf)
> +{
> + if (buf->wr >= buf->rd)
> + return (buf->wr - buf->rd);
> + return (buf->size - buf->rd) + buf->wr;
> +}
> +
> +static int guc_capture_buf_cnt_to_end(struct __guc_capture_bufstate *buf)
> +{
> + if (buf->rd > buf->wr)
> + return (buf->size - buf->rd);
> + return (buf->wr - buf->rd);
> +}
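alan: as a side note, the byte accounting in these two helpers is easy to sanity
check outside the driver; here is a minimal user-space sketch (hypothetical
struct mirroring __guc_capture_bufstate, illustrative only):

```c
#include <assert.h>

/* Hypothetical mirror of __guc_capture_bufstate, for illustration only */
struct bufstate { int size, rd, wr; };

/* Total bytes available, accounting for wraparound */
static int buf_cnt(struct bufstate *b)
{
	if (b->wr >= b->rd)
		return b->wr - b->rd;
	return (b->size - b->rd) + b->wr;
}

/* Contiguous bytes available before the ring end */
static int buf_cnt_to_end(struct bufstate *b)
{
	if (b->rd > b->wr)
		return b->size - b->rd;
	return b->wr - b->rd;
}
```

When the write pointer has wrapped (wr < rd), buf_cnt() still reports the full
payload while buf_cnt_to_end() reports only the tail segment before the ring end.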
> +
> +/*
> + * GuC's error-capture output is a ring buffer populated in a byte-stream fashion:
> + *
> + * The GuC Log buffer region for error-capture is managed like a ring buffer.
> + * The GuC firmware dumps error capture logs into this ring in a byte-stream flow.
> + * Additionally, as per the current and foreseeable future, all packed error-
> + * capture output structures are dword aligned.
> + *
> + * That said, if the GuC firmware is in the midst of writing a structure that is larger
> + * than one dword but the tail end of the err-capture buffer-region has lesser space left,
> + * we would need to extract that structure one dword at a time straddled across the end,
> + * onto the start of the ring.
> + *
> + * Below function, guc_capture_log_remove_bytes is a helper for that. All callers of this
> + * function would typically do a straight-up memcpy from the ring contents and will only
> + * call this helper if their structure-extraction is straddling across the end of the
> + * ring. GuC firmware does not add any padding. The reason for the no-padding is to ease
> + * scalability for future expansion of output data types without requiring a redesign
> + * of the flow controls.
> + */
alan: I see you have decided to change the function for extracting
data when it straddles the ring end. Please do ensure you repeat the igt
tests in a loop with instrumented kernel messages (or any other means)
to verify they exercise the case where the error capture data being
extracted straddles the ring end.
> +static int
> +guc_capture_log_remove_bytes(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
> + void *out, int bytes_needed)
> +{
> + int tries = 2, fill_size = 0, copy_size, avail;
> +
> + if (!guc_capture_buf_cnt(buf))
> + return 0;
> +
> + while (bytes_needed > 0 && tries--) {
alan: not sure if limiting "tries" to 2 is sufficient. In the
worst case scenario (straddling the end of the ring), we will
experience the following:
- worst-case try #1: avail < bytes_needed, copy partial
- worst-case try #2: avail == 0
(since we reached the end of the ring in #1 and have
not yet set buf->rd to zero, which only gets done
in this step)
- worst-case try #3: avail >= bytes_needed
(now that we have wrapped around the ring)
That said, I believe the number of tries should be capped at 3. This is
different from the old "guc_capture_log_remove_dw" function, which was
designed to remove ONLY a single dword and was only called when the
tail was at or near the ring end and needed to wrap around.
> + avail = guc_capture_buf_cnt_to_end(buf);
> + /* wrap if at end */
> + if (!avail) {
> + /* output stream clipped */
> + if (!buf->rd)
> + return fill_size;
> + buf->rd = 0;
> + continue;
> + }
> + if (avail < sizeof(u32)) {
alan: although this check would indeed be true if we are
in fact not 32-bit aligned, considering the prior if-zero-continue
check, this is a very late place to check for alignment
(i.e. right at the end of the ring). It might be better to check
alignment of the input size "bytes_needed" at the start of the
function, and also to use "if (avail % sizeof(u32))".
> + xe_gt_dbg(guc_to_gt(guc), "Bytes extraction not dword aligned,skipping.\n");
> + continue;
> + }
> + copy_size = avail < bytes_needed ? avail : bytes_needed;
> + xe_map_memcpy_from(guc_to_xe(guc), out, &guc->log.bo->vmap,
> + buf->data_offset + buf->rd, copy_size);
> + buf->rd += copy_size;
> + fill_size += copy_size;
> + bytes_needed -= copy_size;
> + }
> +
> + return fill_size;
> +}
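alan: to illustrate the worst case I described above, here is a hypothetical
user-space model of this loop (dword-alignment check omitted; all names are
illustrative, not the driver's). With the cap at 2 a straddling read comes up
short, while a cap of 3 completes it:

```c
#include <assert.h>
#include <string.h>

struct bufstate { int size, rd, wr; const unsigned char *data; };

static int cnt(struct bufstate *b)
{
	return b->wr >= b->rd ? b->wr - b->rd : (b->size - b->rd) + b->wr;
}

static int cnt_to_end(struct bufstate *b)
{
	return b->rd > b->wr ? b->size - b->rd : b->wr - b->rd;
}

/* Model of the extraction loop with a configurable retry cap */
static int remove_bytes(struct bufstate *b, void *out, int needed, int tries)
{
	int fill = 0, copy, avail;

	if (!cnt(b))
		return 0;

	while (needed > 0 && tries--) {
		avail = cnt_to_end(b);
		if (!avail) {
			if (!b->rd)	/* output stream clipped */
				return fill;
			b->rd = 0;	/* wrap: burns one try on its own */
			continue;
		}
		copy = avail < needed ? avail : needed;
		memcpy((unsigned char *)out + fill, b->data + b->rd, copy);
		b->rd += copy;
		fill += copy;
		needed -= copy;
	}
	return fill;
}
```

Note how the wrap step consumes a try without copying anything, which is why
the straddling case needs three iterations, not two.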
> +
> +static bool
> +guc_capture_data_extracted(struct xe_guc *guc, struct __guc_capture_bufstate *b,
> + int size, void *dest)
> +{
alan: actually, I realize that given the goal of guc_capture_log_remove_bytes,
"guc_capture_data_extracted" is not needed anymore. Callers only need to call
guc_capture_log_remove_bytes once, and it handles both cases (case 1: the data
to extract fits before the end of the ring; case 2: the data straddles the end
of the ring).
> + if (guc_capture_buf_cnt_to_end(b) >= size) {
> + xe_map_memcpy_from(guc_to_xe(guc), dest, &guc->log.bo->vmap,
> + b->data_offset + b->rd, size);
> + b->rd += size;
> + return true;
> + }
> + return false;
> +}
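alan: the redundancy can be shown with a minimal user-space model (hypothetical
names, not the driver's): a single straddle-aware read covers both the
contiguous and the wrapping case, so callers never need a separate contiguous
fast path.

```c
#include <assert.h>
#include <string.h>

/* Illustrative ring reader; one call handles both cases */
struct ring { int size, rd; const unsigned char *data; };

static void ring_read(struct ring *r, void *out, int n)
{
	int to_end = r->size - r->rd;
	int first = n < to_end ? n : to_end;

	memcpy(out, r->data + r->rd, first);
	/* second memcpy is a no-op (length 0) in the contiguous case */
	memcpy((unsigned char *)out + first, r->data, n - first);
	r->rd = (r->rd + n) % r->size;
}
```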
> +
> +static int
> +guc_capture_log_get_group_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
> + struct guc_state_capture_group_header_t *ghdr)
> +{
> + int read = 0;
> + int fullsize = sizeof(struct guc_state_capture_group_header_t);
> +
> + if (fullsize > guc_capture_buf_cnt(buf))
> + return -1;
> +
> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)ghdr))
> + return 0;
alan: as mentioned above, this function is not needed anymore.
> +
> + read += guc_capture_log_remove_bytes(guc, buf, ghdr, sizeof(*ghdr));
> + if (read != fullsize)
> + return -1;
> +
> + return 0;
> +}
> +
> +static int
> +guc_capture_log_get_data_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
> + struct guc_state_capture_header_t *hdr)
> +{
> + int read = 0;
> + int fullsize = sizeof(struct guc_state_capture_header_t);
> +
> + if (fullsize > guc_capture_buf_cnt(buf))
> + return -1;
> +
> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)hdr))
> + return 0;
alan: as mentioned above, this function is not needed anymore.
> +
> + read += guc_capture_log_remove_bytes(guc, buf, hdr, sizeof(*hdr));
> + if (read != fullsize)
> + return -1;
> +
> + return 0;
> +}
> +
> +static int
> +guc_capture_log_get_register(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
> + struct guc_mmio_reg *reg)
> +{
> + int read = 0;
> + int fullsize = sizeof(struct guc_mmio_reg);
> +
> + if (fullsize > guc_capture_buf_cnt(buf))
> + return -1;
> +
> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)reg))
> + return 0;
alan: as mentioned above, this function is not needed anymore.
> +
> + read += guc_capture_log_remove_bytes(guc, buf, reg, sizeof(*reg));
> + if (read != fullsize)
> + return -1;
> +
> + return 0;
> +}
> +
> +static struct __guc_capture_parsed_output *
> +guc_capture_get_prealloc_node(struct xe_guc *guc)
>
alan:snip
> +static void __guc_capture_process_output(struct xe_guc *guc)
> +{
> + unsigned int buffer_size, read_offset, write_offset, full_count;
> + struct xe_uc *uc = container_of(guc, typeof(*uc), guc);
> + struct guc_log_buffer_state log_buf_state_local;
> + struct __guc_capture_bufstate buf;
> + bool new_overflow;
> + int ret, tmp;
> + u32 log_buf_state_offset;
> + u32 src_data_offset;
> +
> + log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_LOG_BUFFER_CAPTURE;
> + src_data_offset = xe_guc_get_log_buffer_offset(&guc->log, GUC_LOG_BUFFER_CAPTURE);
> +
> + /*
> + * Make a copy of the state structure, inside GuC log buffer
> + * (which is uncached mapped), on the stack to avoid reading
> + * from it multiple times.
> + */
> + xe_map_memcpy_from(guc_to_xe(guc), &log_buf_state_local, &guc->log.bo->vmap,
> + log_buf_state_offset, sizeof(struct guc_log_buffer_state));
> +
> + buffer_size = xe_guc_get_log_buffer_size(&guc->log, GUC_LOG_BUFFER_CAPTURE);
> + read_offset = log_buf_state_local.read_ptr;
> + write_offset = log_buf_state_local.sampled_write_ptr;
> + full_count = FIELD_GET(GUC_LOG_BUFFER_STATE_BUFFER_FULL_CNT, log_buf_state_local.flags);
> +
> + /* Bookkeeping stuff */
> + tmp = FIELD_GET(GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE, log_buf_state_local.flags);
> + guc->log.stats[GUC_LOG_BUFFER_CAPTURE].flush += tmp;
> + new_overflow = xe_guc_check_log_buf_overflow(&guc->log, GUC_LOG_BUFFER_CAPTURE,
> + full_count);
> +
> + /* Now copy the actual logs. */
> + if (unlikely(new_overflow)) {
> + /* copy the whole buffer in case of overflow */
> + read_offset = 0;
> + write_offset = buffer_size;
> + } else if (unlikely((read_offset > buffer_size) ||
> + (write_offset > buffer_size))) {
> + xe_gt_err(guc_to_gt(guc),
> + "Register capture buffer in invalid state: read = 0x%X, size = 0x%X!\n",
> + read_offset, buffer_size);
> + /* copy whole buffer as offsets are unreliable */
> + read_offset = 0;
> + write_offset = buffer_size;
> + }
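alan: for what it's worth, the fallback above boils down to clamping the read
window to the whole capture region whenever the offsets cannot be trusted; a
standalone model of that decision (hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

/* On overflow or out-of-range offsets, fall back to scanning
 * the whole capture region. */
static void sanitize_window(unsigned int size, bool overflow,
			    unsigned int *rd, unsigned int *wr)
{
	if (overflow || *rd > size || *wr > size) {
		*rd = 0;
		*wr = size;
	}
}
```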
> +
> + buf.size = buffer_size;
> + buf.rd = read_offset;
> + buf.wr = write_offset;
> + buf.data_offset = src_data_offset;
> +
> + if (!xe_guc_read_stopped(guc)) {
> + do {
> + ret = guc_capture_extract_reglists(guc, &buf);
alan:nit: I notice that inside guc_capture_extract_reglists
there are many places where errors may occur, but we don't print that an
error occurred there, and we don't do so here either. Since GuC error capture
is about debuggability, not functionality, I would recommend adding a
drm_debug here with the error value. I know most of the error paths
inside guc_capture_extract_reglists end up returning -EIO, but
perhaps this first step is sufficient, i.e. just knowing an error occurred.
> + } while (ret >= 0);
> + }
> +
> + /* Update the state of log buffer err-cap state */
> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, read_ptr), u32,
> + write_offset);
> + /* Clear the flush_to_file from local first, the local was loaded by above
alan: the "/* Clear the flush_to_file..." comment block should start
with "/*" on its own line, with the comment text beginning on the next line.
> + * xe_map_memcpy_from.
> + */
> + log_buf_state_local.flags &= ~GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE;
> +
>
alan:nit: I feel this newline above should instead sit between the xe_map_wr
for write_offset and the start of the "Clear the flush_to_file..." comment.
There is no need for a newline here, since the code above and below it are
both about the flush flag; as written, the blank line ends up grouping
unrelated things together. Alternatively, just remove all of these newlines
between the final write-out/update steps. I also think it would be neater to
merge the comment below ("write out the updated local") into the same comment
block as "Clear the flush_to_file...", since both are about the flush flag.
> + /* Then write out the "updated local" through xe_map_wr() */
> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, flags), u32,
> + log_buf_state_local.flags);
> + __guc_capture_flushlog_complete(guc);
> +}
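alan: summarizing the tail end of this function, the closing bookkeeping is a
plain read-modify-write of the shared state: advance read_ptr to the sampled
write pointer and clear the flush request. A hypothetical model (the flag bit
below is illustrative only, not the real layout):

```c
#include <assert.h>
#include <stdint.h>

#define FLUSH_TO_FILE (1u << 0)	/* illustrative flag bit only */

struct log_state { uint32_t read_ptr, flags; };

/* Everything up to the sampled write pointer has been consumed */
static void complete_extraction(struct log_state *s, uint32_t sampled_wr)
{
	s->read_ptr = sampled_wr;
	s->flags &= ~FLUSH_TO_FILE;	/* ack the flush request */
}
```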
> +
> +/*
> + * xe_guc_capture_process - Process GuC register captured data
> + * @guc: The GuC object
> + *
> + * When GuC captured data is ready, GuC will send message
> + * XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION to host, this function will be
> + * called to process the data comes with the message.
> + *
> + * Returns: None
> + */
> +void xe_guc_capture_process(struct xe_guc *guc)
> +{
> + if (guc->capture)
> + __guc_capture_process_output(guc);
> +}
> +
> +static struct __guc_capture_parsed_output *
>
alan:snip
> --- a/drivers/gpu/drm/xe/xe_guc_capture.h
> +++ b/drivers/gpu/drm/xe/xe_guc_capture.h
> @@ -11,6 +11,7 @@
>
> struct xe_guc;
>
> +void xe_guc_capture_process(struct xe_guc *guc);
> int xe_guc_capture_getlist(struct xe_guc *guc, u32 owner, u32 type,
> enum guc_capture_list_class_type capture_class, void **outptr);
> int xe_guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type,
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 7d2e937da1d8..7bf9e45abf49 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1106,6 +1106,8 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> /* Selftest only at the moment */
> break;
> case XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION:
> + ret = xe_guc_error_capture_handler(guc, payload, adj_len);
> + break;
alan: I notice that process_g2h_msg runs in the CT queue, and we are not
making modifications to schedule the error capture extraction (along with
the G2H context reset notification) onto the queue on which drm-sched
schedules the TDR timer. I am not sure if this decision is intentional,
because I thought we had prior offline conversations concluding we needed
that alternative to avoid some races. Let's connect offline; perhaps I am
just mistaken here.
> case XE_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE:
> /* FIXME: Handle this */
> break;
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
alan:snip