[PATCH] drm/i915/perf: Update handling of MMIO triggered reports

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Tue Dec 19 06:07:38 UTC 2023


On Mon, Dec 18, 2023 at 09:48:39PM -0800, Dixit, Ashutosh wrote:
>On Mon, 18 Dec 2023 21:28:33 -0800, Dixit, Ashutosh wrote:
>>
>> On Mon, 18 Dec 2023 16:05:43 -0800, Umesh Nerlige Ramappa wrote:
>> >
>>
>> Hi Umesh,
>>
>> > On XEHP platforms user is not able to find MMIO triggered reports in the
>> > OA buffer since i915 squashes the context ID fields. These context ID
>> > fields hold the MMIO trigger markers.
>> >
>> > Update logic to not squash the context ID fields of MMIO triggered
>> > reports.
>> >
>> > Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
>> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
>> >  1 file changed, 34 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> > index 7b1c8de2f9cb..2d695818f006 100644
>> > --- a/drivers/gpu/drm/i915/i915_perf.c
>> > +++ b/drivers/gpu/drm/i915/i915_perf.c
>> > @@ -772,10 +772,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>> >		 * The reason field includes flags identifying what
>> >		 * triggered this specific report (mostly timer
>> >		 * triggered or e.g. due to a context switch).
>> > -		 *
>> > -		 * In MMIO triggered reports, some platforms do not set the
>> > -		 * reason bit in this field and it is valid to have a reason
>> > -		 * field of zero.
>> >		 */
>> >		reason = oa_report_reason(stream, report);
>> >		ctx_id = oa_context_id(stream, report32);
>> > @@ -787,8 +783,41 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>> >		 *
>> >		 * Note: that we don't clear the valid_ctx_bit so userspace can
>> >		 * understand that the ID has been squashed by the kernel.
>> > +		 *
>> > +		 * Update:
>> > +		 *
>> > +		 * On XEHP platforms the behavior of context id valid bit has
>> > +		 * changed compared to prior platforms. To describe this, we
>> > +		 * define a few terms:
>> > +		 *
>> > +		 * context-switch-report: This is a report with the reason type
>> > +		 * being context-switch. It is generated when a context switches
>> > +		 * out.
>> > +		 *
>> > +		 * context-valid-bit: A bit that is set in the report ID field
>> > +		 * to indicate that a valid context has been loaded.
>> > +		 *
>> > +		 * gpu-idle: A condition characterized by a
>> > +		 * context-switch-report with context-valid-bit set to 0.
>> > +		 *
>> > +		 * On prior platforms, context-id-valid bit is set to 0 only
>> > +		 * when GPU goes idle. In all other reports, it is set to 1.
>> > +		 *
>> > +		 * On XEHP platforms, context-valid-bit is set to 1 in a context
>> > +		 * switch report if a new context switched in. For all other
>> > +		 * reports it is set to 0.
>> > +		 *
>> > +		 * This change in behavior causes an issue with MMIO triggered
>> > +		 * reports. MMIO triggered reports have the markers in the
>> > +		 * context ID field and the context-valid-bit is 0. The logic
>> > +		 * below to squash the context ID would render the report
>> > +		 * useless since the user will not be able to find it in the OA
>> > +		 * buffer. Since MMIO triggered reports exist only on XEHP,
>> > +		 * we should avoid squashing these for XEHP platforms.
>>
>> Hmm I am wondering if this is over-information and this comment should be
>> made brief.
>
>Let me try: "For Gen's >= 12.50, the context id valid bit is reset when a
>context switches out, but the context id is still valid. Because of this we
>cannot squash the context id in this case".
>
>So this should affect both the regular as well as MMIO triggered cases
>afaiu.
>
>Anyway, please do what you think is right with the comment. I just thought
>I'll chime in.

The long and descriptive comment is entirely for my benefit. There is a 
very good chance I will forget this, so putting it down in the code.  
Also, I don't see this described in the spec, so thinking that we will 
benefit from it by having it here. I can put it in the commit msg 
instead if that helps.

Thanks,
Umesh

>
>> For the record, here's the explanation of what is happening from Robert
>> Krzemien's email (which at least makes it simpler for me to understand
>> what is happening):
>>
>>	For Gen12HP+ (ATS/DG2/PVC/MTL+) platforms, context id valid bit is
>>	set only for context switch reports and when a context is being
>>	loaded. When exiting a context, a context switch report is
>>	generated, ctx id is not zero, but the bit is not set. It allows us
>>	to distinguish whether context switch reports are generated due to
>>	entering or exiting GPU contexts. Ctx id field is non-zero for
>>	context switches and mmio triggers. Other types always have ctx id
>>	set to 0.
>>
>>	For previous platforms (like Gen12LP, Gen9/11), the bit is set for
>>	all types of reports if a context is loaded. But those older
>>	platforms don’t have mmio triggers. Ctx id field is non-zero for
>>	all types of reports if a context is loaded.
>>
>>	I don’t understand why i915 needs to set ctx id to 0xffffffff if
>>	the flag is not set. It has been removed from XE KMD as I remember
>>	correctly.
>>
>> >		 */
>> > -		if (oa_report_ctx_invalid(stream, report)) {
>> > +
>> > +		if (oa_report_ctx_invalid(stream, report) &&
>> > +		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
>> >			ctx_id = INVALID_CTX_ID;
>> >			oa_context_id_squash(stream, report32);
>> >		}
>>
>> So I am assuming there's some unknown reason (maybe not hearing from Mesa?)
>> why we can't get rid of the squashing even for legacy platforms. But that's
>> ok I guess. So this is:
>>
>> Reviewed-by: Ashutosh Dixit <ashutosh.dixit at intel.com>


More information about the Intel-gfx mailing list