[Intel-gfx] [PATCH v5 07/10] drm/i915/guc: Extract GuC error capture lists on G2H notification.

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Thu Feb 17 19:21:39 UTC 2022


On Sun, Feb 13, 2022 at 11:47:00AM -0800, Teres Alexis, Alan Previn wrote:
>Thanks Umesh for reviewing the patch.
>Am fixing all the rest but a couple of comments.
>Responses to the latter and other questions below:
>
>...alan
>
>> > +enum intel_guc_state_capture_event_status {
>> > +   INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_SUCCESS = 0x0,
>> > +   INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE = 0x1,
>> > +};
>> > +
>> > +#define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x1
>>
>> MASK is not needed. See below
>
>Alan: Oh wait, actually the mask for the capture status is 0x000000FF
>(above is a typo). I'll fix above  mask and shall not change the
>code below because the upper 24 bits of the first dword of this msg
>is not defined.
>
>...
>
>
>> > +static int guc_capture_buf_cnt(struct __guc_capture_bufstate *buf)
>> > +{
>> > +   if (buf->rd == buf->wr)
>> > +           return 0;
>> > +   if (buf->wr > buf->rd)
>> > +           return (buf->wr - buf->rd);
>> > +   return (buf->size - buf->rd) + buf->wr;
>> > +}
>>
>> Is this a circular buffer shared between GuC and kmd? Since the size is
>> a power of 2, the above function is simply:
>>
>Alan: not this is not a circular buffer, so I'll keep the above
>version.
>> static u32 guc_capture_buf_count(struct __guc_capture_bufstate *buf)
>> {
>>       return (buf->wr - buf->rd) & (buf->size - 1);
>> }
>>
>
>...
>
>> > +static int
>> > +guc_capture_log_remove_dw(struct intel_guc *guc, struct __guc_capture_bufstate *buf,
>> > +                     u32 *dw)
>> > +{
>> > +   struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>> > +   int tries = 2;
>> > +   int avail = 0;
>> > +   u32 *src_data;
>> > +
>> > +   if (!guc_capture_buf_cnt(buf))
>> > +           return 0;
>> > +
>> > +   while (tries--) {
>> > +           avail = guc_capture_buf_cnt_to_end(buf);
>>
>> Shouldn't this be avail = guc_capture_buf_cnt(buf)?
>>
>
>Alan : The "guc_capture_log_get_[foo]" functions only call above
>guc_capture_log_remove_dw when there isnt sufficient space to
>copy out an entire structure from the space between the read pointer
>and the end of the subregion (before the wrap-around). Those function
>would populate the structure dword by dword by calling above func.
>(NOTE the buffer and all error capture output structs are dword
>aligned). Thats why above function tries twice and resets buf->rd = 0
>if we find no space left at the end of the subregion (i.e. need to
>wrap around) - which can only be done by calling
>"guc_capture_buf_cnt_to_end".
>
>...
>
>> > +
>> > +   /* Bookkeeping stuff */
>> > +   guc->log_state[GUC_CAPTURE_LOG_BUFFER].flush += log_buf_state_local.flush_to_file;
>> > +   new_overflow = intel_guc_check_log_buf_overflow(guc,
>> > +                                                   &guc->log_state[GUC_CAPTURE_LOG_BUFFER],
>> > +                                                   full_count);
>>
>> I am not sure how the overflow logic works here and whether it is
>> applicable to the error capture buffer. Is the guc log buffer one big
>> buffer where the error capture is just a portion of that buffer? If so,
>> is the wrap around applicable to just the errorcapture buffer or to the
>> whole buffer?
>>
>Alan: Yes, the guc log buffer is one big log buffer but there are 3 independent
>subregions within that are populated with different content and are used
>in different ways and timings. Each guc-log subregion (general-logs,
>crash-dump and error-capture) has it's own read and write pointers.

got it. I would also put this one detail in the commit message since 
it's not quickly inferred.

>
>
>> Also what is the wrap_offset field in struct guc_log_buffer_state?
>
>Alan: This is the byte offset of a location in the subregion that is the 1st byte
>after the last valid guc entry written by Guc firmware before a wraparound
>was done. This would generate a tiny hole at the end of the subregion for better
>cacheline alignment when flushing entries into the subregion. However,
>the error-capture subregion is dword aligned and all of the output structures
>used for error-capture are also dword aligned so this can never happen for the
>error-capture subregion.
>
Makes sense, thanks for clarifying.

Umesh
>
>>


More information about the Intel-gfx mailing list