[PATCH v8 4/6] drm/xe/guc: Extract GuC error capture lists

Dong, Zhanjun zhanjun.dong at intel.com
Wed May 15 21:45:43 UTC 2024


See my comments below.

Regards,
Zhanjun Dong

On 2024-05-10 9:43 p.m., Teres Alexis, Alan Previn wrote:
> On Mon, 2024-05-06 at 18:47 -0700, Zhanjun Dong wrote:
>> Upon the G2H Notify-Err-Capture event, parse through the
>> GuC Log Buffer (error-capture-subregion) and generate one or
>> more capture-nodes. A single node represents a single "engine-
>> instance-capture-dump" and contains at least 3 register lists:
>> global, engine-class and engine-instance. An internal linked
>> list is maintained to store one or more nodes.
>>
>> Because the linked-list node generation happens before the call
>> to devcoredump, duplicate the global and engine-class register
>> lists for each engine-instance register dump if we find
>> dependent-engine resets in an engine-capture-group.
>>
> alan:snip
>> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c
>> index d2df027081b5..71d7c4a58925 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_capture.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_capture.c
>> @@ -520,6 +520,560 @@ static void check_guc_capture_size(struct xe_guc *guc)
>>                            buffer_size, spare_size, capture_size);
>>   }
>>
> alan:snip
>> +static struct __guc_capture_parsed_output *
>> +guc_capture_get_prealloc_node(struct xe_guc *guc)
>> +{
>> +       struct __guc_capture_parsed_output *found = NULL;
>> +
>> +       if (!list_empty(&guc->capture->cachelist)) {
>> +               struct __guc_capture_parsed_output *n, *ntmp;
>> +
>> +               /* get first avail node from the cache list */
>> +               list_for_each_entry_safe(n, ntmp, &guc->capture->cachelist, link) {
>> +                       found = n;
>> +                       list_del(&n->link);
>> +                       break;
>> +               }
>> +       } else {
>> +               struct __guc_capture_parsed_output *n, *ntmp;
>> +
>> +               /* traverse down and steal back the oldest node already allocated */
>> +               list_for_each_entry_safe(n, ntmp, &guc->capture->outlist, link) {
>> +                       found = n;
>> +               }
>> +               if (found)
>> +                       list_del(&found->link);
>> +       }
>> +       if (found)
>> +               guc_capture_init_node(guc, found);
>> +
>> +       return found;
>> +}
> alan: I mentioned this in rev6: you cannot start using the pre-allocated nodelist
> anywhere in this patch when you are only allocating it in patch 6. Look back at my
> rev 6 comments on this. Also, take a look at the original i915 patch on how to
> implement guc_capture_alloc/delete_one_node without a preallocated nodelist:
> https://patchwork.freedesktop.org/patch/479022/?series=101604&rev=1
> (note: watch especially for the use of new->reginfo[i].regs, which needed an
> additional allocation step). Alternatively we could squash patch 4 and patch 6
> together and change patch 4's comment, but I am not sure, as it might be too
> large a patch (can discuss offline).

Good point. Let me try pre-alloc vs. GFP_ATOMIC allocation and I will get back to
you.
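
For the record, a rough sketch of the on-demand variant, modeled on the i915
patch linked above. GFP_ATOMIC is used on the assumption that extraction may
run from a context where sleeping is undesirable; the field names
(reginfo[i].regs, max_mmio_per_node) and GUC_CAPTURE_LIST_TYPE_MAX are taken
from that i915 rev and may not match the final xe structures:

static struct __guc_capture_parsed_output *
guc_capture_alloc_one_node(struct xe_guc *guc)
{
	struct __guc_capture_parsed_output *new;
	int i;

	/* May be called from G2H handling, so avoid sleeping allocations */
	new = kzalloc(sizeof(*new), GFP_ATOMIC);
	if (!new)
		return NULL;

	/* Each node owns register storage for every capture-list type */
	for (i = 0; i < GUC_CAPTURE_LIST_TYPE_MAX; ++i) {
		new->reginfo[i].regs = kcalloc(guc->capture->max_mmio_per_node,
					       sizeof(struct guc_mmio_reg),
					       GFP_ATOMIC);
		if (!new->reginfo[i].regs) {
			/* Unwind the partial allocation on failure */
			while (i)
				kfree(new->reginfo[--i].regs);
			kfree(new);
			return NULL;
		}
	}
	INIT_LIST_HEAD(&new->link);

	return new;
}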

> 
>> +static int
>> +guc_capture_extract_reglists(struct xe_guc *guc, struct __guc_capture_bufstate *buf)
>> +{
>> +       struct xe_gt *gt = guc_to_gt(guc);
>> +       struct guc_state_capture_group_header_t ghdr = {0};
>> +       struct guc_state_capture_header_t hdr = {0};
>> +       struct __guc_capture_parsed_output *node = NULL;
>> +       struct guc_mmio_reg *regs = NULL;
>> +       int i, numlists, numregs, ret = 0;
>> +       enum guc_capture_type datatype;
>> +       struct guc_mmio_reg tmp;
>> +       bool is_partial = false;
> alan:snip
>> +               if (!node) {
>> +                       node = guc_capture_get_prealloc_node(guc);
> alan: see above comment on the use of prealloc_node (as per rev 6's comments)
> alan:snip
> 
>> +static void __guc_capture_process_output(struct xe_guc *guc)
>> +{
>> +       unsigned int buffer_size, read_offset, write_offset, full_count;
>> +       struct xe_uc *uc = container_of(guc, typeof(*uc), guc);
>> +       struct guc_log_buffer_state log_buf_state_local;
>> +       struct guc_log_buffer_state *log_buf_state;
>> +       struct __guc_capture_bufstate buf;
>> +       bool new_overflow;
>> +       int ret;
>> +       u32 log_buf_state_offset;
>> +       u32 src_data_offset;
>> +
>> +       log_buf_state = (struct guc_log_buffer_state *)((ulong)guc->log.bo->vmap.vaddr +
>> +                       (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER));
> alan: once again, I don't think we can use vmap.vaddr directly like this anymore, right?
> I don't think we use "log_buf_state" until the end of this function, to set the new read_ptr
> and flush flag. We ought to use xe_map_wr below?
Yes, this needs an xe_map_xxx helper; as we are doing a read here, it should be
xe_map_rd.
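
Something along these lines, as an untested sketch. It assumes the
xe_map_rd/xe_map_wr helpers from xe_map.h, and that flush_to_file sits in
bit 0 of the "flags" word of struct guc_log_buffer_state, as in the i915
layout:

	struct xe_device *xe = guc_to_xe(guc);

	/* Read the sampled write pointer through the mapping helpers */
	write_offset = xe_map_rd(xe, &guc->log.bo->vmap,
				 log_buf_state_offset +
				 offsetof(struct guc_log_buffer_state,
					  sampled_write_ptr), u32);

	/* ... parse the error-capture subregion, generate nodes ... */

	/* Publish the new read pointer and clear the flush-to-file bit */
	xe_map_wr(xe, &guc->log.bo->vmap,
		  log_buf_state_offset +
		  offsetof(struct guc_log_buffer_state, read_ptr), u32,
		  write_offset);
	log_buf_state_local.flags &= ~BIT(0); /* flush_to_file assumed bit 0 */
	xe_map_wr(xe, &guc->log.bo->vmap,
		  log_buf_state_offset +
		  offsetof(struct guc_log_buffer_state, flags), u32,
		  log_buf_state_local.flags);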
>> +
>> +       log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER;
>> +       src_data_offset = xe_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
>> +
>> +       /*
>> +        * Make a copy of the state structure, inside GuC log buffer
>> +        * (which is uncached mapped), on the stack to avoid reading
>> +        * from it multiple times.
>> +        */
>> +       xe_map_memcpy_from(guc_to_xe(guc), &log_buf_state_local, &guc->log.bo->vmap,
>> +                          log_buf_state_offset, sizeof(struct guc_log_buffer_state));
> alan:snip
> 

