[Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Tue Mar 24 03:15:34 UTC 2020


On Sat, Mar 21, 2020 at 04:26:42PM -0700, Dixit, Ashutosh wrote:
>On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote:
>>
>> From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>
>> We're about to introduce an options to open the perf stream, giving
>> the user ability to configure how often it wants the kernel to poll
>> the OA registers for available data.
>>
>> Right now the workaround against the OA tail pointer race condition
>> requires at least twice the internal kernel polling timer to make any
>> data available.
>>
>> This changes introduce checks on the OA data written into the circular
>> buffer to make as much data as possible available on the first
>> iteration of the polling timer.
>
>/snip/
>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 3222f6cd8255..c1429d3acaf9 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -223,26 +223,17 @@
>>   *
>>   * Although this can be observed explicitly while copying reports to userspace
>>   * by checking for a zeroed report-id field in tail reports, we want to account
>> - * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
>> - * read() attempts.
>> - *
>> - * In effect we define a tail pointer for reading that lags the real tail
>> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
>> - * time for the corresponding reports to become visible to the CPU.
>> - *
>> - * To manage this we actually track two tail pointers:
>> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
>> - *     can trust the corresponding data is visible to the CPU; at which point
>> - *     it is considered 'aged'.
>> - *  2) An 'aged' tail that can be used for read()ing.
>> - *
>> - * The two separate pointers let us decouple read()s from tail pointer aging.
>> - *
>> - * The tail pointers are checked and updated at a limited rate within a hrtimer
>> - * callback (the same callback that is used for delivering EPOLLIN events)
>> - *
>> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
>> - * indicates that an updated tail pointer is needed.
>> + * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of
>> + * redundant read() attempts.
>> + *
>> + * We workaround this issue in oa_buffer_check_unlocked() by reading the reports
>> + * in the OA buffer, starting from the tail reported by the HW until we find 2
>> + * consecutive reports with their first 2 dwords of not at 0. Those dwords are
>
>until we find a report with its first 2 dwords not 0 meaning its previous
>report is completely in memory and ready to be read.
>
>> + * also set to 0 once read and the whole buffer is cleared upon OA buffer
>> + * initialization. The first dword is the reason for this report while the
>> + * second is the timestamp, making the chances of having those 2 fields at 0
>> + * fairly unlikely. A more detailed explanation is available in
>> + * oa_buffer_check_unlocked().
>
>> @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>	 */
>>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>
>>	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>>
>>	hw_tail &= ~(report_size - 1);
>>
>> @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>
>>	now = ktime_get_mono_fast_ns();
>>
>> +	if (hw_tail == stream->oa_buffer.aging_tail &&
>> +	   (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
>> +		/* If the HW tail hasn't move since the last check and the HW
>> +		 * tail has been aging for long enough, declare it the new
>> +		 * tail.
>> +		 */
>> +		stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
>> +	} else {
>> +		u32 head, tail;
>>
>> +		/* NB: The head we observe here might effectively be a little
>> +		 * out of date. If a read() is in progress, the head could be
>> +		 * anywhere between this head and stream->oa_buffer.tail.
>> +		 */
>> +		head = stream->oa_buffer.head - gtt_offset;
>>
>> +		hw_tail -= gtt_offset;
>> +		tail = hw_tail;
>>
>> +		/* Walk the stream backward until we find a report with dword 0
>> +		 * & 1 not at 0. Since the circular buffer pointers progress by
>> +		 * increments of 64 bytes and that reports can be up to 256
>> +		 * bytes long, we can't tell whether a report has fully landed
>> +		 * in memory before the first 2 dwords of the following report
>> +		 * have effectively landed.
>> +		 *
>> +		 * This is assuming that the writes of the OA unit land in
>> +		 * memory in the order they were written to.
>> +		 * If not : (╯°□°)╯︵ ┻━┻
>>		 */
>> -		if (hw_tail >= gtt_offset &&
>> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
>> -			stream->oa_buffer.tails[!aged_idx].offset =
>> -				aging_tail = hw_tail;
>> -			stream->oa_buffer.aging_timestamp = now;
>> -		} else {
>> -			drm_err(&stream->perf->i915->drm,
>> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
>> -				hw_tail);
>> +		while (OA_TAKEN(tail, head) >= report_size) {
>> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>> +			u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail);
>
>Sorry, this is wrong. This should just be:
>
>			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>			report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
>Otherwise when we break out of the loop below tail is still set one
>report_size ahead. previous_tail is not needed. (In the previous version of
>the patch this used to work out correctly).

oh, that's right. tail should point to the first report that has 
non-zero dwords. everything before that should be considered landed.  

Thanks,
Umesh

>
>> +
>> +			/* Head of the report indicated by the HW tail register has
>> +			 * indeed landed into memory.
>> +			 */
>> +			if (report32[0] != 0 || report32[1] != 0)
>> +				break;
>> +
>> +			tail = previous_tail;
>>		}
>> +
>> +		if (((tail - hw_tail) & (OA_BUFFER_SIZE - 1)) > report_size &&
>
>nit: OA_TAKEN(hw_tail, tail) > report_size?
>
>> +		    __ratelimit(&stream->perf->tail_pointer_race))
>> +			DRM_NOTE("unlanded report(s) head=0x%x "
>> +				 "tail=0x%x hw_tail=0x%x\n",
>> +				 head, tail, hw_tail);
>> +
>> +		stream->oa_buffer.tail = gtt_offset + tail;
>> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
>> +		stream->oa_buffer.aging_timestamp = now;
>>	}
>>
>>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>
>> -	return aged_tail == INVALID_TAIL_PTR ?
>> -		false : OA_TAKEN(aged_tail, head) >= report_size;
>> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
>> +			stream->oa_buffer.head - gtt_offset) >= report_size;
>>  }
>
>> @@ -303,6 +292,12 @@ struct i915_perf_stream {
>>		 * OA buffer data to userspace.
>>		 */
>>		u32 head;
>> +
>> +		/**
>> +		 * @tail: The last tail verified tail that can be read by
>
>The last verified tail


More information about the Intel-gfx mailing list