[Intel-gfx] [PATCH v5 07/10] drm/i915/guc: Extract GuC error capture lists on G2H notification.

Sun Feb 13 19:47:00 UTC 2022

Thanks Umesh for reviewing the patch.
Am fixing all the rest but a couple of comments.
Responses to the latter and other questions below:

...alan

> > +enum intel_guc_state_capture_event_status {
> > +	INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_SUCCESS = 0x0,
> > +	INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE = 0x1,
> > +};
> > +
> > +#define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x1
> 
> MASK is not needed. See below

Alan: Oh wait, actually the mask for the capture status is 0x000000FF
(above is a typo). I'll fix above  mask and shall not change the 
code below because the upper 24 bits of the first dword of this msg 
is not defined.

...

> > +static int guc_capture_buf_cnt(struct __guc_capture_bufstate *buf)
> > +{
> > +	if (buf->rd == buf->wr)
> > +		return 0;
> > +	if (buf->wr > buf->rd)
> > +		return (buf->wr - buf->rd);
> > +	return (buf->size - buf->rd) + buf->wr;
> > +}
> 
> Is this a circular buffer shared between GuC and kmd? Since the size is 
> a power of 2, the above function is simply:
> 
Alan: not this is not a circular buffer, so I'll keep the above
version.
> static u32 guc_capture_buf_count(struct __guc_capture_bufstate *buf)
> {
> 	return (buf->wr - buf->rd) & (buf->size - 1);
> }
> 

...

> > +static int
> > +guc_capture_log_remove_dw(struct intel_guc *guc, struct __guc_capture_bufstate *buf,
> > +			  u32 *dw)
> > +{
> > +	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> > +	int tries = 2;
> > +	int avail = 0;
> > +	u32 *src_data;
> > +
> > +	if (!guc_capture_buf_cnt(buf))
> > +		return 0;
> > +
> > +	while (tries--) {
> > +		avail = guc_capture_buf_cnt_to_end(buf);
> 
> Shouldn't this be avail = guc_capture_buf_cnt(buf)?
> 

Alan : The "guc_capture_log_get_[foo]" functions only call above
guc_capture_log_remove_dw when there isnt sufficient space to
copy out an entire structure from the space between the read pointer
and the end of the subregion (before the wrap-around). Those function
would populate the structure dword by dword by calling above func.
(NOTE the buffer and all error capture output structs are dword
aligned). Thats why above function tries twice and resets buf->rd = 0
if we find no space left at the end of the subregion (i.e. need to
wrap around) - which can only be done by calling
"guc_capture_buf_cnt_to_end".

...

> > +
> > +	/* Bookkeeping stuff */
> > +	guc->log_state[GUC_CAPTURE_LOG_BUFFER].flush += log_buf_state_local.flush_to_file;
> > +	new_overflow = intel_guc_check_log_buf_overflow(guc,
> > +							&guc->log_state[GUC_CAPTURE_LOG_BUFFER],
> > +							full_count);
> 
> I am not sure how the overflow logic works here and whether it is 
> applicable to the error capture buffer. Is the guc log buffer one big 
> buffer where the error capture is just a portion of that buffer? If so, 
> is the wrap around applicable to just the errorcapture buffer or to the 
> whole buffer?
> 
Alan: Yes, the guc log buffer is one big log buffer but there are 3 independent
subregions within that are populated with different content and are used
in different ways and timings. Each guc-log subregion (general-logs,
crash-dump and error-capture) has it's own read and write pointers.

> Also what is the wrap_offset field in struct guc_log_buffer_state?

Alan: This is the byte offset of a location in the subregion that is the 1st byte
after the last valid guc entry written by Guc firmware before a wraparound
was done. This would generate a tiny hole at the end of the subregion for better
cacheline alignment when flushing entries into the subregion. However,
the error-capture subregion is dword aligned and all of the output structures
used for error-capture are also dword aligned so this can never happen for the
error-capture subregion.

>