[Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Nov 23 09:15:10 UTC 2021
On 22/11/2021 23:39, Umesh Nerlige Ramappa wrote:
> On Mon, Nov 22, 2021 at 03:44:29PM +0000, Tvrtko Ursulin wrote:
>>
>> On 11/11/2021 16:48, Umesh Nerlige Ramappa wrote:
>>> On Thu, Nov 11, 2021 at 02:37:43PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 04/11/2021 22:04, Umesh Nerlige Ramappa wrote:
>>>>> On Thu, Nov 04, 2021 at 05:37:37PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 03/11/2021 22:47, Umesh Nerlige Ramappa wrote:
>>>>>>> Since the PMU callback runs in irq context, it synchronizes with gt
>>>>>>> reset using the reset count. We could run into a case where the PMU
>>>>>>> callback could read the reset count before it is updated. This has
>>>>>>> the potential of corrupting the busyness stats.
>>>>>>>
>>>>>>> In addition to the reset count, check if the reset bit is set before
>>>>>>> capturing busyness.
>>>>>>>
>>>>>>> In addition, save the previous stats only if we intend to update
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Umesh Nerlige Ramappa
>>>>>>> <umesh.nerlige.ramappa at intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 ++++++++----
>>>>>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> index 5cc49c0b3889..d83ade77ca07 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> @@ -1183,6 +1183,7 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>  	u64 total, gt_stamp_saved;
>>>>>>>  	unsigned long flags;
>>>>>>>  	u32 reset_count;
>>>>>>> +	bool in_reset;
>>>>>>>
>>>>>>>  	spin_lock_irqsave(&guc->timestamp.lock, flags);
>>>>>>>
>>>>>>> @@ -1191,7 +1192,9 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>  	 * engine busyness from GuC, so we just use the driver stored
>>>>>>>  	 * copy of busyness. Synchronize with gt reset using reset_count.
>>>>>>>  	 */
>>>>>>> -	reset_count = i915_reset_count(gpu_error);
>>>>>>> +	rcu_read_lock();
>>>>>>> +	in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);
>>>>>>> +	rcu_read_unlock();
>>>>>>
>>>>>> I don't really understand the point of rcu_read_lock around
>>>>>> test_bit, but I guess you copied it from the trylock loop.
>>>>>
>>>>> Yes, I don't see other parts of the code using the lock though. I
>>>>> can drop it.
>>>>>
>>>>>>
>>>>>>>  	*now = ktime_get();
>>>>>>>
>>>>>>> @@ -1201,9 +1204,10 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>  	 * start_gt_clk is derived from GuC state. To get a consistent
>>>>>>>  	 * view of activity, we query the GuC state only if gt is awake.
>>>>>>>  	 */
>>>>>>> -	stats_saved = *stats;
>>>>>>> -	gt_stamp_saved = guc->timestamp.gt_stamp;
>>>>>>> -	if (intel_gt_pm_get_if_awake(gt)) {
>>>>>>> +	if (intel_gt_pm_get_if_awake(gt) && !in_reset) {
>>>>>>
>>>>>> What is the point of looking at the old value of in_reset here?
>>>>>> Gut feeling says that if there is a race, this does not fix it.
>>>>>>
>>>>>> I did not figure out from the commit message what "could read the
>>>>>> reset count before it is updated" means. I thought the point of
>>>>>> reading the reset count twice was that you are sure there was no
>>>>>> reset while in here, in which case it is safe to update the
>>>>>> software copy. I don't easily see what test_bit does on top.
>>>>>
>>>>> This is what I see in the reset flow
>>>>> ---------------
>>>>>
>>>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>>> R3) reset prepare
>>>>> R4) do the HW reset
>>>>>
>>>>> The reset count is updated only once above and that's before an
>>>>> actual HW reset happens.
>>>>>
>>>>> PMU callback flow before this patch
>>>>> ---------------
>>>>>
>>>>> P1) read reset count
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats; if not, use updated stats.
>>>>>
>>>>> I am concerned that the whole PMU flow could run between steps (R2)
>>>>> and (R4). Both reads of the count then return the same value and we
>>>>> wrongly conclude that no HW reset happened while we were sampling.
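>>>>>
>>>>> To make the window concrete, here is a condensed sketch of the
>>>>> pre-patch callback logic (simplified from guc_engine_busyness(),
>>>>> not the exact driver code):
>>>>>
>>>>> 	/* P1: sample the reset count once */
>>>>> 	reset_count = i915_reset_count(gpu_error);
>>>>>
>>>>> 	/* P2: refresh the software busyness copy from GuC state */
>>>>> 	stats_saved = *stats;
>>>>> 	gt_stamp_saved = guc->timestamp.gt_stamp;
>>>>> 	guc_update_engine_gt_clks(engine);
>>>>>
>>>>> 	/* P3/P4: discard the update if a reset raced with us */
>>>>> 	if (i915_reset_count(gpu_error) != reset_count) {
>>>>> 		*stats = stats_saved;
>>>>> 		guc->timestamp.gt_stamp = gt_stamp_saved;
>>>>> 	}
>>>>>
>>>>> If both samples land in the window between (R2) and (R4), the two
>>>>> counts match and we trust state that the HW reset is about to
>>>>> invalidate.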
>>>
>>> Here is the problematic sequence: Threads R and P.
>>> ------------
>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>> P1) read reset count
>>> P2) update stats
>>> P3) read reset count
>>> P4) if reset count changed, use old stats; if not, use updated stats.
>>> R3) reset prepare
>>> R4) do the HW reset
>>>
>>> Do you agree that this is racy? In thread P we don't know if the
>>> reset flag was set or not when we captured the reset count in P1.
>>>
>>>>>
>>>>> PMU callback flow with this patch
>>>>> ---------------
>>>>> This would rely on the reset_count only if a reset is not in progress.
>>>>>
>>>>> P0) test_bit for I915_RESET_BACKOFF
>>>>> P1) read reset count if not in reset. if in reset, use old stats
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats; if not, use updated stats.
>>>>>
>>>>> Now that I think about it more, I do see one sequence that still
>>>>> needs fixing though - P0, R1, R2, P1 - P4. For that, I think I need
>>>>> to re-read the BACKOFF bit after reading the reset_count for the
>>>>> first time.
>>>>> The modified PMU callback sequence would be:
>>>>> ----------
>>>>>
>>>>> M0) test_bit for I915_RESET_BACKOFF
>>>>> M1) read reset count if not in reset, if in reset, use old stats
>>>>>
>>>>> M1.1) test_bit for I915_RESET_BACKOFF. if set, use old stats. if
>>>>> not, use reset_count to synchronize
>>>>>
>>>>> M2) update stats
>>>>> M3) read reset count
>>>>> M4) if reset count changed, use old stats; if not, use updated stats.
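>>>>>
>>>>> In pseudo-C, the modified sequence would look roughly like this
>>>>> (a sketch only, not a finished patch):
>>>>>
>>>>> 	/* M0: bail out early if a reset is already in progress */
>>>>> 	if (test_bit(I915_RESET_BACKOFF, &gt->reset.flags))
>>>>> 		return stored_busyness;	/* hypothetical fallback */
>>>>>
>>>>> 	/* M1: sample the count */
>>>>> 	reset_count = i915_reset_count(gpu_error);
>>>>>
>>>>> 	/*
>>>>> 	 * M1.1: re-check the flag, so that a reset which started
>>>>> 	 * between M0 and M1 cannot hand us a count that was already
>>>>> 	 * bumped by (R2).
>>>>> 	 */
>>>>> 	if (test_bit(I915_RESET_BACKOFF, &gt->reset.flags))
>>>>> 		return stored_busyness;
>>>>>
>>>>> 	/* M2-M4: update stats, re-read the count, discard on mismatch */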
>>>>
>>>> You did not end up implementing this flow? Have you later changed
>>>> your mind about whether it is required? Or maybe I am not looking
>>>> at the latest patch.
>>>>
>>>> Is the below the latest?
>>>>
>>>> """
>>>> v2:
>>>> - The 2 reset counts captured in the PMU callback can end up being the
>>>> same if they were captured right after the count is incremented in the
>>>> reset flow. This can lead to a bad busyness state. Ensure that reset
>>>> is not in progress when the initial reset count is captured.
>>>> """
>>>
>>> Yes, v2 is the latest (maybe CI results re-ordered the patches).
>>> Instead of sampling the BACKOFF flag before and after the reset count
>>> (as in the modified sequence), I just sample it after. The order is
>>> critical: first sample the reset count and then the reset flag.
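>>>
>>> Condensed, the v2 flow is something like this (sketch; the flag
>>> read must come after the first count read):
>>>
>>> 	reset_count = i915_reset_count(gpu_error);
>>> 	in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);
>>>
>>> 	if (!in_reset && intel_gt_pm_get_if_awake(gt)) {
>>> 		stats_saved = *stats;
>>> 		gt_stamp_saved = guc->timestamp.gt_stamp;
>>> 		guc_update_engine_gt_clks(engine);
>>> 		intel_gt_pm_put_async(gt);
>>>
>>> 		/* a reset started after our samples: roll back */
>>> 		if (i915_reset_count(gpu_error) != reset_count) {
>>> 			*stats = stats_saved;
>>> 			guc->timestamp.gt_stamp = gt_stamp_saved;
>>> 		}
>>> 	}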
>>>
>>>>
>>>> Is the key now that you rely on ordering of atomic_inc and set_bit
>>>> in the reset path?
>>>
>>> Yes
>>>
>>>> Frankly I still don't understand why you can get away with using
>>>> stale in_reset in v2. If you acknowledge it can change between
>>>> sampling and checking, then what is the point of having it at all?
>>>> You still solely rely on the reset count in that case, no?
>>>
>>> Correct, but now I know for sure that the first sample of reset_count
>>> was captured when reset flag was not set (since I am relying on the
>>> order of sampling).
>>>
>>> About solely using the reset_count, I have listed the problematic
>>> sequence above to highlight what the issue is.
>>
>> It was this:
>>
>> """
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>> P1) read reset count
>> P2) update stats
>> P3) read reset count
>> P4) if reset count changed, use old stats; if not, use updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> Do you agree that this is racy? In thread P we don't know if the
>> reset flag was set or not when we captured the reset count in P1.
>> """
>>
>> Why does it matter if the reset flag was set or not? After this
>> patch it ends up like this:
>>
>> P1) Read and store reset bit
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>> P2) If reset bit was not set:
>> P2.1) read reset count
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>> P2.2) update stats
>> P2.3) read reset count
>> P2.4) if reset count changed, use old stats; if not, use
>> updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> So the reset bit got set between P1 and P2. How is that then not the
>> same as not looking at the reset bit at all?
>
> But the new sequence in this patch is this:
Oops, I was looking at v1. Okay, I can't come up with any further races,
acked.
Regards,
Tvrtko
>
> P0) read reset count
> P1) Read and store reset bit
> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
> P2) If reset bit was not set:
> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
> P2.2) update stats
> P2.3) read reset count
> P2.4) if reset count changed, use old stats; if not, use updated
> stats.
> R3) reset prepare
> R4) do the HW reset
>
> P2.1 moved to P0 when compared to the sequence you shared above.
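>
> In code terms, the two possible outcomes at P1 are (sketch, with a
> hypothetical live_busyness() helper):
>
> 	if (in_reset) {
> 		/* a reset is in flight: report the stored copy */
> 		total = stats->total;
> 	} else {
> 		/*
> 		 * The flag was clear when sampled after P0, so a reset
> 		 * that begins later increments reset_count after our
> 		 * first read; the P2.3 re-read then mismatches and we
> 		 * fall back to the stored copy anyway.
> 		 */
> 		total = live_busyness(engine);
> 	}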
>
> Thanks,
> Umesh
>
>>
>> Regards,
>>
>> Tvrtko