[Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Tue Mar 24 03:17:46 UTC 2020


On Sat, Mar 21, 2020 at 09:44:43PM -0700, Dixit, Ashutosh wrote:
>On Sat, 21 Mar 2020 16:26:42 -0700, Dixit, Ashutosh wrote:
>>
>> On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote:
>> >
>> > From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>
>> > @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >	 */
>> >	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>> >
>> >	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>> >
>> >	hw_tail &= ~(report_size - 1);
>> >
>> > @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >
>> >	now = ktime_get_mono_fast_ns();
>> >
>> > +	if (hw_tail == stream->oa_buffer.aging_tail &&
>> > +	   (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
>> > +		/* If the HW tail hasn't moved since the last check and the HW
>> > +		 * tail has been aging for long enough, declare it the new
>> > +		 * tail.
>> > +		 */
>> > +		stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
>> > +	} else {
>> > +		u32 head, tail;
>> >
>> > +		/* NB: The head we observe here might effectively be a little
>> > +		 * out of date. If a read() is in progress, the head could be
>> > +		 * anywhere between this head and stream->oa_buffer.tail.
>> > +		 */
>> > +		head = stream->oa_buffer.head - gtt_offset;
>> >
>> > +		hw_tail -= gtt_offset;
>> > +		tail = hw_tail;
>> >
>> > +		/* Walk the stream backward until we find a report with dword 0
>> > +		 * & 1 not at 0. Since the circular buffer pointers progress by
>> > +		 * increments of 64 bytes and reports can be up to 256
>> > +		 * bytes long, we can't tell whether a report has fully landed
>> > +		 * in memory before the first 2 dwords of the following report
>> > +		 * have effectively landed.
>> > +		 *
>> > +		 * This is assuming that the writes of the OA unit land in
>> > +		 * memory in the order they were written.
>> > +		 * If not : (╯°□°)╯︵ ┻━┻
>> >		 */
>> > +		while (OA_TAKEN(tail, head) >= report_size) {
>> > +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>> > +			u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail);
>>
>> Sorry, this is wrong. This should just be:
>>
>>			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>>			report32 = (void *)(stream->oa_buffer.vaddr + tail);
>>
>> Otherwise, when we break out of the loop below, tail is still set one
>> report_size ahead. previous_tail is not needed. (In the previous version of
>> the patch this used to work out correctly.)
>>
>> > +
>> > +			/* Head of the report indicated by the HW tail register has
>> > +			 * indeed landed into memory.
>> > +			 */
>> > +			if (report32[0] != 0 || report32[1] != 0)
>> > +				break;
>> > +
>> > +			tail = previous_tail;
>> >		}
>
>Actually, a couple of further improvements to the loop above are
>possible. First, there is no reason to start at previous_tail; we can just
>start at the aligned hw_tail itself. Therefore the loop becomes:
>
>		while (OA_TAKEN(tail, head) >= report_size) {
>			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
>			if (report32[0] != 0 || report32[1] != 0)
>				break;
>
>			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>		}
>
>Further, there is no reason to walk all the way back to the head; we only
>need to go back to the old tail. Therefore:
>
>		head = stream->oa_buffer.head - gtt_offset;
>		old_tail = stream->oa_buffer.tail - gtt_offset;
>
>		hw_tail -= gtt_offset;
>		tail = hw_tail;
>
>		while (OA_TAKEN(tail, old_tail) >= report_size) {
>			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
>			if (report32[0] != 0 || report32[1] != 0)
>				break;
>
>			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>		}
>
>Please review and see if these two improvements are possible. Thanks!

I think this is possible. Checking reports only between old_tail and tail
should reduce the number of reports we have to walk. I will give this a
try; a rough standalone sketch of the walk I have in mind is below.
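
To convince myself of the walk semantics first, here is a small standalone
mock in plain userspace C. The buffer size, report size and the zero-filled
"not yet landed" reports are made-up values just for illustration, and
find_landed_tail is only a name used in this sketch, not anything in the
driver:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OA_BUFFER_SIZE		(16 * 1024)	/* power of two, illustrative only */
#define OA_TAKEN(tail, head)	(((tail) - (head)) & (OA_BUFFER_SIZE - 1))

static uint32_t oa_buffer[OA_BUFFER_SIZE / sizeof(uint32_t)];

/*
 * Walk backward from the (report-aligned) HW tail towards the old
 * software tail in report_size steps, stopping at the first report
 * whose first two dwords are non-zero. Assuming OA writes land in
 * order, everything strictly before the returned offset has fully
 * landed; the report at the returned offset may still be partial.
 */
static uint32_t find_landed_tail(uint32_t hw_tail, uint32_t old_tail,
				 uint32_t report_size)
{
	uint32_t tail = hw_tail & ~(report_size - 1);

	while (OA_TAKEN(tail, old_tail) >= report_size) {
		uint32_t *report32 = oa_buffer + tail / sizeof(uint32_t);

		if (report32[0] != 0 || report32[1] != 0)
			break;

		tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
	}

	return tail;
}

int main(void)
{
	const uint32_t report_size = 256;
	uint32_t old_tail = 0;
	uint32_t hw_tail = 3 * report_size;	/* HW claims 3 reports written */

	/* Only the first two reports have actually hit memory. */
	memset(oa_buffer, 0xff, 2 * report_size);

	/*
	 * Prints "tail = 256": the report at 256 has a non-zero head, but
	 * since the following report's first dwords are still zero we
	 * cannot confirm it landed completely, so only [0, 256) is exposed.
	 */
	printf("tail = %u\n", find_landed_tail(hw_tail, old_tail, report_size));

	return 0;
}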

Thanks,
Umesh

