[Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Wed Apr 15 22:47:57 UTC 2020


On Wed, Apr 15, 2020 at 10:16:59PM +0300, Lionel Landwerlin wrote:
>On 15/04/2020 21:55, Umesh Nerlige Ramappa wrote:
>>On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>>>On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>>>A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>waiting on the event to occur. These checks are redundant when hrtimer
>>>>events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>>>pointers. The redundant checks add cpu overhead. Simplify the check
>>>>to reduce cpu overhead when using blocking io to read oa buffer 
>>>>reports.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>>
>>>
>>>I cherry picked this patch alone and it seems to break the 
>>>disabled-read-error test.
>>
>>Strange. I don't see it fail on my CFL. I am apply this on the 
>>latest drm-tip from yesterday.
>>
>>The patch still checks if reports are available before blocking. The 
>>behavior should still be the same w.r.t this test.
>>
>>What machine did you run it on? I will try on the same. Any chance 
>>you have the debug output from the test?
>>
>>Thanks,
>>Umesh
>>
>
>I got that on SKL GT4 : http://paste.debian.net/1140604/
>

Fails always on SKL GT4. Thanks for catching this :)

Also fails without this patch if this test were to use NONBLOCKing IO.

This has to do with being able to read reports at sampling frequencies 
as high as 5 us (on some platforms, I guess).

Will look into it further and repost the series. Let me know if you have 
any other thoughts on this.

Thanks,
Umesh

>
>Thanks,
>
>
>-Lionel
>
>
>>>
>>>
>>>>---
>>>> drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index 5cde3e4e7be6..e28a3ab83fde 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>     return pollin;
>>>> }
>>>>+/**
>>>>+ * oa_buffer_check_reports - quick check if reports are available
>>>>+ * @stream: i915 stream instance
>>>>+ *
>>>>+ * The return from this function is used as a condition for
>>>>+ * wait_event_interruptible in blocking read. This is used to detect
>>>>+ * available reports.
>>>>+ *
>>>>+ * A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>+ * waiting on an event to occur. These checks are redundant 
>>>>when hrtimer events
>>>>+ * will call oa_buffer_check_unlocked to update the oa_buffer 
>>>>tail pointers. The
>>>>+ * redundant checks add cpu overhead. We simplify the check to 
>>>>reduce cpu
>>>>+ * overhead.
>>>>+ */
>>>>+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>>>+{
>>>>+    unsigned long flags;
>>>>+    bool available;
>>>>+
>>>>+    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>>>+    available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>>>+ spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>>>+
>>>>+    return available;
>>>>+}
>>>>+
>>>> /**
>>>>  * append_oa_status - Appends a status record to a userspace 
>>>>read() buffer.
>>>>  * @stream: An i915-perf stream opened for OA metrics
>>>>@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>         return -EIO;
>>>>     return wait_event_interruptible(stream->poll_wq,
>>>>-                    oa_buffer_check_unlocked(stream));
>>>>+                    oa_buffer_check_reports(stream));
>>>> }
>>>> /**
>>>
>>>
>


More information about the Intel-gfx mailing list