[igt-dev] [PATCH i-g-t 1/6] test/perf: Drop caches when closing perf stream

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Wed Mar 4 23:51:56 UTC 2020


On Thu, Mar 05, 2020 at 01:29:50AM +0200, Lionel Landwerlin wrote:
>On 05/03/2020 01:04, Umesh Nerlige Ramappa wrote:
>>On Thu, Mar 05, 2020 at 12:05:55AM +0200, Lionel Landwerlin wrote:
>>>On 04/03/2020 19:51, Umesh Nerlige Ramappa wrote:
>>>>On Wed, Mar 04, 2020 at 10:45:55AM +0200, Lionel Landwerlin wrote:
>>>>>On 04/03/2020 00:57, Umesh Nerlige Ramappa wrote:
>>>>>>On Tue, Mar 03, 2020 at 02:38:08PM -0800, Umesh Nerlige 
>>>>>>Ramappa wrote:
>>>>>>>Running ./build/tests/perf will run all the perf subtests 
>>>>>>>in a sequence.
>>>>>>>When running tests in a sequence, subsequent tests may not 
>>>>>>>run with a
>>>>>>>clean slate. For resources that are lazily released, drop caches in
>>>>>>>__perf_close.
>>>>>>
>>>>>>Hi Lionel, Chris,
>>>>>>
>>>>>>I notice an issue on TGL when running the entire suite of 
>>>>>>perf tests.  In my setup, the polling test was failing with 
>>>>>>invalid reports being seen in the beginning of the OA 
>>>>>>buffer. This issue is seen more prominently with the newly 
>>>>>>added subtests which call perf_open and perf_close a couple 
>>>>>>of times (say blocking-with-interrupt).
>>>>>>
>>>>>>What I see in some runs is that the second test would result 
>>>>>>in a bunch of unlanded reports in the beginning of the OA 
>>>>>>buffer. Assuming that we are already waiting for the NOA 
>>>>>>config with a noa_wait bo, I tried to look into this 
>>>>>>further.
>>>>>>
>>>>>>free_oa_buffer is called to free the oa_buffer bo and this 
>>>>>>work is deferred by the driver. If a test is called before 
>>>>>>this free completes, we see the issue. Just to test out this 
>>>>>>theory, if I comment out the free_oa_buffer entirely, I see 
>>>>>>that the tests pass without any issues since new gtt memory 
>>>>>>is being allocated each time.
>>>>>>
>>>>>>I guess the deferred free and the new allocation of the OA 
>>>>>>buffer for subsequent test has something missing. Maybe TLBs 
>>>>>>not being dropped? I imagine the OA unit might write valid 
>>>>>>reports somewhere based on what it sees in the TLBs and cpu 
>>>>>>is looking for them elsewhere (until the free completes). 
>>>>>>Just a theory though. Let me know what you think.
>>>>>>
>>>>>>For now igt_drop_caches_set(DROP_FREED) is what is helping 
>>>>>>and hence this patch.
>>>>>
>>>>>
>>>>>Hey Umesh,
>>>>>
>>>>>
>>>>>I guess this could be fixed by this commit :
>>>>>
>>>>>
>>>>>commit 4b4e973d5eb89244b67d3223b60f752d0479f253
>>>>>Author: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>Date:   Mon Mar 2 08:57:57 2020 +0000
>>>>>
>>>>>    drm/i915/perf: Reintroduce wait on OA configuration completion
>>>>>
>>>>>If you can give this commit a try or rebase on drm-tip it 
>>>>>would be great to confirm.
>>>>
>>>>I thought this commit was ensuring that the noa_wait is executed 
>>>>completely before we enable the OA buffer captures. That still 
>>>>does not explain why the issue goes away for me when I comment 
>>>>out free_oa_buffer.
>>>
>>>
>>>If noa_wait is not waited upon, either from CPU or GPU, then we 
>>>enable OA while the configuration is not completely applied.
>>>
>>>Hence the invalid data at the beginning of the buffer.
>>>
>>>
>>>Are you saying this commit didn't help?
>>>
>>No, I haven't tried it yet. I lost my reservation on the TGL machine 
>>:(, so waiting for another one.
>>
>>What I meant is that - not freeing the OA buffer (only for 
>>experimenting) results in a new gtt offset every time we allocate the 
>>OA buffer. When I do this, I don't see any invalid OA reports. This 
>>is without the commit you mention above. If waiting for the NOA 
>>config to complete were indeed the issue, I should have seen it even 
>>with my experiment. Right?
>
>
>I see, thanks that's a useful experiment.
>
>Only thing I can think of would be HEAD/TAIL register writes that 
>didn't land before the OA unit was turned on.
>
>I'll dig into the code.
>
>
>Is this only on Gen12?

Yes. On TGL, there should be two ways to reproduce this (without the above 
commit):

1. Just run perf so that it runs all the tests (or at least blocking 
followed by polling). The polling test fails.

2. If you are applying the interrupt patches from this and the kernel 
thread, then you just need to run the blocking-with-interrupts test.

In dmesg, you will see messages about unlanded reports. That is what I 
use to decide whether a given fix or workaround worked.
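
If it helps, the dmesg check can be scripted. Below is a minimal sketch, 
not part of the patch: count_matching_lines() is a hypothetical helper, 
and the substring "unlanded report" is only my assumption about how the 
message is worded in dmesg, so adjust it to whatever your kernel prints.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of IGT: count the lines read from 'in'
 * that contain 'needle'. Lines longer than the buffer are examined in
 * chunks, which is fine for typical dmesg output.
 */
static int count_matching_lines(FILE *in, const char *needle)
{
	char line[1024];
	int count = 0;

	while (fgets(line, sizeof(line), in)) {
		if (strstr(line, needle))
			count++;
	}

	return count;
}
```

Calling it as count_matching_lines(stdin, "unlanded report") from a small 
main() and piping dmesg into the program gives a count that should be 
zero when a fix or workaround is effective.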

Thanks,
Umesh
>
>
>Thanks,
>
>
>-Lionel
>
>
>>
>>Thanks,
>>Umesh
>>
>>>
>>>Thanks,
>>>
>>>
>>>-Lionel
>>>
>>>
>>>>
>>>>Thanks,
>>>>Umesh
>>>>
>>>>>
>>>>>Otherwise we might need more digging to figure what's going on.
>>>>>
>>>>>
>>>>>Thanks,
>>>>>
>>>>>
>>>>>-Lionel
>>>>>
>>>>>
>>>>>>
>>>>>>Thanks,
>>>>>>Umesh
>>>>>>
>>>>>>>
>>>>>>>Signed-off-by: Umesh Nerlige Ramappa 
>>>>>>><umesh.nerlige.ramappa at intel.com>
>>>>>>>---
>>>>>>>tests/perf.c | 7 ++++++-
>>>>>>>1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>>diff --git a/tests/perf.c b/tests/perf.c
>>>>>>>index 5e818030..189c6aa1 100644
>>>>>>>--- a/tests/perf.c
>>>>>>>+++ b/tests/perf.c
>>>>>>>@@ -244,6 +244,12 @@ __perf_close(int fd)
>>>>>>>        close(pm_fd);
>>>>>>>        pm_fd = -1;
>>>>>>>    }
>>>>>>>+
>>>>>>>+    /* When running tests in a sequence, subsequent tests 
>>>>>>>may not run with a
>>>>>>>+     * clean slate. For resources that are lazily 
>>>>>>>released, cleanup here.
>>>>>>>+     */
>>>>>>>+    if (drm_fd >= 0 && !getgid() && !getuid())
>>>>>>>+        gem_quiescent_gpu(drm_fd);
>>>>>>>}
>>>>>>>
>>>>>>>static int
>>>>>>>@@ -3993,7 +3999,6 @@ test_rc6_disable(void)
>>>>>>>    igt_assert_eq(n_events_end - n_events_start, 0);
>>>>>>>
>>>>>>>    __perf_close(stream_fd);
>>>>>>>-    gem_quiescent_gpu(drm_fd);
>>>>>>>
>>>>>>>    n_events_start = rc6_residency_ms();
>>>>>>>    nanosleep(&(struct timespec){ .tv_sec = 1, .tv_nsec = 
>>>>>>>0 }, NULL);
>>>>>>>-- 
>>>>>>>2.20.1
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>igt-dev mailing list
>>>>>>>igt-dev at lists.freedesktop.org
>>>>>>>https://lists.freedesktop.org/mailman/listinfo/igt-dev
>>>>>
>>>>>
>>>
>