[PATCH v12 4/5] drm/xe/guc: Extract GuC error capture lists
Dong, Zhanjun
zhanjun.dong at intel.com
Tue Jul 16 21:15:40 UTC 2024
See my inline comments below.
Regards,
Zhanjun Dong
On 2024-07-13 12:38 a.m., Teres Alexis, Alan Previn wrote:
> On Fri, 2024-07-12 at 09:47 -0700, Zhanjun Dong wrote:
>> Upon the G2H Notify-Err-Capture event, parse through the
>> GuC Log Buffer (error-capture-subregion) and generate one or
>> more capture-nodes. A single node represents a single "engine-
>> instance-capture-dump" and contains at least 3 register lists:
>> global, engine-class and engine-instance. An internal linked
>> list is maintained to store one or more nodes.
>>
> alan:snip
>
>> +static int guc_capture_buf_cnt(struct __guc_capture_bufstate *buf)
>> +{
>> + if (buf->wr >= buf->rd)
>> + return (buf->wr - buf->rd);
>> + return (buf->size - buf->rd) + buf->wr;
>> +}
>> +
>> +static int guc_capture_buf_cnt_to_end(struct __guc_capture_bufstate *buf)
>> +{
>> + if (buf->rd > buf->wr)
>> + return (buf->size - buf->rd);
>> + return (buf->wr - buf->rd);
>> +}
>> +
>> +/*
>> + * GuC's error-capture output is a ring buffer populated in a byte-stream fashion:
>> + *
>> + * The GuC Log buffer region for error-capture is managed like a ring buffer.
>> + * The GuC firmware dumps error capture logs into this ring in a byte-stream flow.
>> + * Additionally, for the current and foreseeable future, all packed error-
>> + * capture output structures are dword aligned.
>> + *
>> + * That said, if the GuC firmware is in the midst of writing a structure that is larger
>> + * than one dword but the tail end of the err-capture buffer-region has less space left,
>> + * we would need to extract that structure one dword at a time, straddling across the end
>> + * onto the start of the ring.
>> + *
>> + * The function below, guc_capture_log_remove_bytes, is a helper for that. All callers of
>> + * this function would typically do a straight-up memcpy from the ring contents and will only
>> + * call this helper if their structure-extraction is straddling across the end of the
>> + * ring. GuC firmware does not add any padding. The reason for the no-padding is to ease
>> + * scalability for future expansion of output data types without requiring a redesign
>> + * of the flow controls.
>> + */
> alan: I see you have decided to change the function for extracting
> data when it straddles the ring end. Please do ensure you repeat igt
> tests in a loop with instrumented kernel messages (or any other means)
> to ensure it exercises the case where the error capture data being
> extracted straddles the ring end.
Sure, thanks.
>> +static int
>> +guc_capture_log_remove_bytes(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + void *out, int bytes_needed)
>> +{
>> + int tries = 2, fill_size = 0, copy_size, avail;
>> +
>> + if (!guc_capture_buf_cnt(buf))
>> + return 0;
>> +
>> + while (bytes_needed > 0 && tries--) {
> alan: not sure if limiting "tries" to 2 is sufficient. In the
> worst case scenario (straddling the end of the ring), we will
> experience the following:
> - worst-case-#1st-try: avail < bytes_needed, copy partial
> - worst-case-#2nd-try: avail == 0
>   (since we reached the end of the ring in #1 and have not yet
>   set buf->rd to zero, which only gets done in this step)
> - worst-case-#3rd-try: avail >= bytes_needed
>   (now that we have wrapped around the ring)
>
> That said, I believe tries should be max'd at 3, which is
> different from the old "guc_capture_log_remove_dw" function that
> was designed to remove ONLY a single dword and was only called
> when the tail was at or near the ring end and needed to wrap around.
Sure, it will be 3 in the next rev; see the combined sketch further below.
>> + avail = guc_capture_buf_cnt_to_end(buf);
>> + /* wrap if at end */
>> + if (!avail) {
>> + /* output stream clipped */
>> + if (!buf->rd)
>> + return fill_size;
>> + buf->rd = 0;
>> + continue;
>> + }
>> + if (avail < sizeof(u32)) {
> alan: although this check would actually be true if we are
> in fact not 32-bit aligned, considering the prior check for
> if-zero-continue, it is nevertheless a very late place to check for
> alignment (i.e. right at the end of the ring). Might be better
> to check alignment on the input size "bytes_needed" at the start
> of the function, and also to use "if (avail % sizeof(u32))".
Will add an assert at function entry:

	xe_assert(guc_to_xe(guc), bytes_needed % sizeof(u32) == 0);
>> + xe_gt_dbg(guc_to_gt(guc), "Bytes extraction not dword aligned,skipping.\n");
>> + continue;
>> + }
>> + copy_size = avail < bytes_needed ? avail : bytes_needed;
to be:

	misaligned = avail % sizeof(u32);
	copy_size = avail < bytes_needed ? avail - misaligned : bytes_needed;

bytes_needed's alignment was already asserted at function entry, so
copy_size is always u32-aligned.
>> + xe_map_memcpy_from(guc_to_xe(guc), out, &guc->log.bo->vmap,
>> + buf->data_offset + buf->rd, copy_size);
>> + buf->rd += copy_size;
>> + fill_size += copy_size;
>> + bytes_needed -= copy_size;
and here:

	if (misaligned)
		xe_gt_dbg(guc_to_gt(guc),
			  "Bytes extraction not dword aligned, skipping.\n");

So if we get 6 bytes, we copy 4 bytes, print the debug info, skip the
last 2 bytes, and retry if the max try count has not been reached.
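Putting the pieces together, a minimal sketch of the revised helper with
tries bumped to 3, the entry assert, and the misalignment clamp (names
reused from the patch; note the destination is also advanced by fill_size
here so a straddled copy does not overwrite the first chunk). This is only
an illustration of the agreed direction, not the final revision:

static int
guc_capture_log_remove_bytes(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
			     void *out, int bytes_needed)
{
	/* worst case: partial copy to ring end, wrap, then remaining copy */
	int tries = 3, fill_size = 0, copy_size, avail, misaligned;

	/* callers must request dword-multiple sizes */
	xe_assert(guc_to_xe(guc), bytes_needed % sizeof(u32) == 0);

	if (!guc_capture_buf_cnt(buf))
		return 0;

	while (bytes_needed > 0 && tries--) {
		avail = guc_capture_buf_cnt_to_end(buf);
		/* wrap if at end */
		if (!avail) {
			/* output stream clipped */
			if (!buf->rd)
				return fill_size;
			buf->rd = 0;
			continue;
		}

		/* clamp the copy down to a dword boundary */
		misaligned = avail % sizeof(u32);
		copy_size = avail < bytes_needed ? avail - misaligned : bytes_needed;
		if (misaligned)
			xe_gt_dbg(guc_to_gt(guc),
				  "Bytes extraction not dword aligned, skipping.\n");
		if (!copy_size)
			continue;

		/* advance the destination past bytes already copied */
		xe_map_memcpy_from(guc_to_xe(guc), out + fill_size, &guc->log.bo->vmap,
				   buf->data_offset + buf->rd, copy_size);
		buf->rd += copy_size;
		fill_size += copy_size;
		bytes_needed -= copy_size;
	}

	return fill_size;
}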
>> + }
>> +
>> + return fill_size;
>> +}
>> +
>> +static bool
>> +guc_capture_data_extracted(struct xe_guc *guc, struct __guc_capture_bufstate *b,
>> + int size, void *dest)
>> +{
> alan: actually, I realize that based on the goal of guc_capture_log_remove_bytes,
> "guc_capture_data_extracted" is not needed anymore. Callers only call
> guc_capture_log_remove_bytes once, and that handles both cases (case 1 -> data to
> extract fits before the end of the ring... or case 2 -> data straddles the end of the ring).
Right, this function is no longer needed; it will be removed.
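Each extraction helper then collapses to a single
guc_capture_log_remove_bytes call, which covers both the contiguous and
the ring-end straddling case; a minimal sketch of one of them
(illustrative only, not the final revision):

static int
guc_capture_log_get_data_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
			     struct guc_state_capture_header_t *hdr)
{
	int fullsize = sizeof(struct guc_state_capture_header_t);

	if (fullsize > guc_capture_buf_cnt(buf))
		return -1;

	/* handles both the contiguous and the ring-end straddling case */
	if (guc_capture_log_remove_bytes(guc, buf, hdr, fullsize) != fullsize)
		return -1;

	return 0;
}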
>
>> + if (guc_capture_buf_cnt_to_end(b) >= size) {
>> + xe_map_memcpy_from(guc_to_xe(guc), dest, &guc->log.bo->vmap,
>> + b->data_offset + b->rd, size);
>> + b->rd += size;
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_group_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_state_capture_group_header_t *ghdr)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_state_capture_group_header_t);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)ghdr))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, ghdr, sizeof(*ghdr));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_data_hdr(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_state_capture_header_t *hdr)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_state_capture_header_t);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)hdr))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, hdr, sizeof(*hdr));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +guc_capture_log_get_register(struct xe_guc *guc, struct __guc_capture_bufstate *buf,
>> + struct guc_mmio_reg *reg)
>> +{
>> + int read = 0;
>> + int fullsize = sizeof(struct guc_mmio_reg);
>> +
>> + if (fullsize > guc_capture_buf_cnt(buf))
>> + return -1;
>> +
>> + if (guc_capture_data_extracted(guc, buf, fullsize, (void *)reg))
>> + return 0;
> alan: as mentioned above, this function is not needed anymore.
To be removed.
>> +
>> + read += guc_capture_log_remove_bytes(guc, buf, reg, sizeof(*reg));
>> + if (read != fullsize)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static struct __guc_capture_parsed_output *
>> +guc_capture_get_prealloc_node(struct xe_guc *guc)
>>
> alan:snip
>> +static void __guc_capture_process_output(struct xe_guc *guc)
>> +{
>> + unsigned int buffer_size, read_offset, write_offset, full_count;
>> + struct xe_uc *uc = container_of(guc, typeof(*uc), guc);
>> + struct guc_log_buffer_state log_buf_state_local;
>> + struct __guc_capture_bufstate buf;
>> + bool new_overflow;
>> + int ret, tmp;
>> + u32 log_buf_state_offset;
>> + u32 src_data_offset;
>> +
>> + log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_LOG_BUFFER_CAPTURE;
>> + src_data_offset = xe_guc_get_log_buffer_offset(&guc->log, GUC_LOG_BUFFER_CAPTURE);
>> +
>> + /*
>> + * Make a copy of the state structure, inside GuC log buffer
>> + * (which is uncached mapped), on the stack to avoid reading
>> + * from it multiple times.
>> + */
>> + xe_map_memcpy_from(guc_to_xe(guc), &log_buf_state_local, &guc->log.bo->vmap,
>> + log_buf_state_offset, sizeof(struct guc_log_buffer_state));
>> +
>> + buffer_size = xe_guc_get_log_buffer_size(&guc->log, GUC_LOG_BUFFER_CAPTURE);
>> + read_offset = log_buf_state_local.read_ptr;
>> + write_offset = log_buf_state_local.sampled_write_ptr;
>> + full_count = FIELD_GET(GUC_LOG_BUFFER_STATE_BUFFER_FULL_CNT, log_buf_state_local.flags);
>> +
>> + /* Bookkeeping stuff */
>> + tmp = FIELD_GET(GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE, log_buf_state_local.flags);
>> + guc->log.stats[GUC_LOG_BUFFER_CAPTURE].flush += tmp;
>> + new_overflow = xe_guc_check_log_buf_overflow(&guc->log, GUC_LOG_BUFFER_CAPTURE,
>> + full_count);
>> +
>> + /* Now copy the actual logs. */
>> + if (unlikely(new_overflow)) {
>> + /* copy the whole buffer in case of overflow */
>> + read_offset = 0;
>> + write_offset = buffer_size;
>> + } else if (unlikely((read_offset > buffer_size) ||
>> + (write_offset > buffer_size))) {
>> + xe_gt_err(guc_to_gt(guc),
>> + "Register capture buffer in invalid state: read = 0x%X, size = 0x%X!\n",
>> + read_offset, buffer_size);
>> + /* copy whole buffer as offsets are unreliable */
>> + read_offset = 0;
>> + write_offset = buffer_size;
>> + }
>> +
>> + buf.size = buffer_size;
>> + buf.rd = read_offset;
>> + buf.wr = write_offset;
>> + buf.data_offset = src_data_offset;
>> +
>> + if (!xe_guc_read_stopped(guc)) {
>> + do {
>> + ret = guc_capture_extract_reglists(guc, &buf);
> alan:nit: I notice that inside guc_capture_extract_reglists,
> we have many cases where errors may occur but we don't print that an error
> occurred there, and we don't do that here either. Since guc error capture is
> about debuggability, not functionality, I would recommend adding a
> drm_debug here with the error value. I know most of the error paths
> inside guc_capture_extract_reglists end up returning -EIO, but
> perhaps this first step is sufficient, i.e. to know an error occurred.
Sure, will add a dbg print on the error code.
Thanks for the advice. Actually, I found an unexpected error code: the
function returns -ENODATA when guc_capture_buf_cnt(buf) is 0, which is
expected at the end of processing.
Will fix it by making guc_capture_extract_reglists return 0 in this
normal condition.
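One possible shape for the extraction loop in __guc_capture_process_output
with the debug print added (a sketch only; here -ENODATA is kept as the
quiet end-of-stream terminator, and whether it is instead reworked to
return 0 with an adjusted loop condition is a detail for the next rev):

	if (!xe_guc_read_stopped(guc)) {
		do {
			ret = guc_capture_extract_reglists(guc, &buf);
			/* -ENODATA just means the buffer is drained */
			if (ret && ret != -ENODATA)
				xe_gt_dbg(guc_to_gt(guc),
					  "Capture extraction failed: %d\n", ret);
		} while (ret >= 0);
	}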
>> + } while (ret >= 0);
>> + }
>> +
>> + /* Update the state of log buffer err-cap state */
>> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
>> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, read_ptr), u32,
>> + write_offset);
>> + /* Clear the flush_to_file from local first, the local was loaded by above
>> + * xe_map_memcpy_from.
>> + */
> alan: the "/* Clear the flush_to_file..." comment block should start
> with "/*" on its own line, with the comment text on the following lines.
>> + log_buf_state_local.flags &= ~GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE;
>> +
>>
> alan:nit: I feel like this newline above should instead sit between the xe_map_wr for
> write_offset and the start of the "Clear the flush_to_file..." description. There is no
> need for a newline here, since the code above it is about the flush flag and the code
> below is about the flush flag; the newline looks like it is grouping things together,
> but it groups unrelated things. Alternatively, we can just remove all of these newlines
> in these final write-out-update steps. I also think it is neater to combine the comment
> below, "write out the updated local", into the same comment block as
> "Clear the flush_to_file..." since both are about flush flags.
Sure, something like the sketch below.
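Roughly, the final write-out steps would then read (a sketch of the
suggested regrouping, not the final code):

	/* Update the state of log buffer err-cap state */
	xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
		  log_buf_state_offset + offsetof(struct guc_log_buffer_state, read_ptr), u32,
		  write_offset);

	/*
	 * Clear the flush_to_file from local first (the local was loaded by
	 * the xe_map_memcpy_from above), then write out the "updated local"
	 * through xe_map_wr().
	 */
	log_buf_state_local.flags &= ~GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE;
	xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
		  log_buf_state_offset + offsetof(struct guc_log_buffer_state, flags), u32,
		  log_buf_state_local.flags);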
>> + /* Then write out the "updated local" through xe_map_wr() */
>> + xe_map_wr(guc_to_xe(guc), &guc->log.bo->vmap,
>> + log_buf_state_offset + offsetof(struct guc_log_buffer_state, flags), u32,
>> + log_buf_state_local.flags);
>> + __guc_capture_flushlog_complete(guc);
>> +}
>> +
>> +/**
>> + * xe_guc_capture_process - Process GuC register captured data
>> + * @guc: The GuC object
>> + *
>> + * When GuC captured data is ready, GuC will send the
>> + * XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION message to the host; this
>> + * function will be called to process the data that comes with the message.
>> + *
>> + * Returns: None
>> + */
>> +void xe_guc_capture_process(struct xe_guc *guc)
>> +{
>> + if (guc->capture)
>> + __guc_capture_process_output(guc);
>> +}
>> +
>> +static struct __guc_capture_parsed_output *
>>
> alan:snip
>
>> --- a/drivers/gpu/drm/xe/xe_guc_capture.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_capture.h
>> @@ -11,6 +11,7 @@
>>
>> struct xe_guc;
>>
>> +void xe_guc_capture_process(struct xe_guc *guc);
>> int xe_guc_capture_getlist(struct xe_guc *guc, u32 owner, u32 type,
>> enum guc_capture_list_class_type capture_class, void **outptr);
>> int xe_guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type,
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 7d2e937da1d8..7bf9e45abf49 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1106,6 +1106,8 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> /* Selftest only at the moment */
>> break;
>> case XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION:
>> + ret = xe_guc_error_capture_handler(guc, payload, adj_len);
>> + break;
> alan: I notice that process_g2h_msg is in the ct queue and we are not
> making modifications to schedule error capture extraction (along with
> the G2H context reset notification) into the queue from which drm-sched
> schedules the TDR timer. I am not sure if this decision is intentional,
> because I thought we had prior offline conversations that we needed the
> alternative to avoid some races. Let's connect offline; perhaps I am
> just mistaken here.
Since rev 11, the devcoredump capture adds a pre-capture from the hwe if
the guc capture is not ready, and later on the devcoredump is called
after the event wait, which avoids the race condition.
>
>> case XE_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE:
>> /* FIXME: Handle this */
>> break;
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>
> alan:snip
>