[Intel-gfx] [PATCH] drm/i915: Fix NPD in PMU during driver teardown

Thu Jul 21 07:43:40 UTC 2022

On 21/07/2022 05:30, Summers, Stuart wrote:
> On Wed, 2022-07-20 at 13:07 -0700, Umesh Nerlige Ramappa wrote:
>> On Wed, Jul 20, 2022 at 09:14:38AM +0100, Tvrtko Ursulin wrote:
>>> On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
>>>> On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
>>>>> On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
>>>>>> On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin
>>>>>> wrote:
>>>>>>> On 01/07/2022 15:54, Summers, Stuart wrote:
>>>>>>>> On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>>>>>>> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>>>>>>> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart
>>>>>>>>>> Summers wrote:
>>>>>>>>>>> In the driver teardown, we are unregistering the gt
>>>>>>>>>>> prior
>>>>>>>>>>> to unregistering the PMU. This means there is a
>>>>>>>>>>> small window
>>>>>>>>>>> of time in which the application can request
>>>>>>>>>>> metrics from the
>>>>>>>>>>> PMU, some of which are calling into the uapi
>>>>>>>>>>> engines list,
>>>>>>>>>>> while the engines are not available. In this case
>>>>>>>>>>> we can
>>>>>>>>>>> see null pointer dereferences.
>>>>>>>>>>>
>>>>>>>>>>> Fix this ordering in both the driver load and
>>>>>>>>>>> unload sequences.
>>>>>>>>>>>
>>>>>>>>>>> Additionally add a check for engine presence to
>>>>>>>>>>> prevent this
>>>>>>>>>>> NPD in the event this ordering is accidentally
>>>>>>>>>>> reversed. Print
>>>>>>>>>>> a debug message indicating when they aren't
>>>>>>>>>>> available.
>>>>>>>>>>>
>>>>>>>>>>> v1: Actually address the driver load/unload
>>>>>>>>>>> ordering issue
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stuart Summers <
>>>>>>>>>>> stuart.summers at intel.com>not yet been able to force a failure on a system with lots of RAM. My dev system has 32G of ram, and I have not been able to arrive at the level of memory pressure to apply which causes the gem cache to exceed the system memory without being killed by OOM first.
>>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> I thought this is likely happening because
>>>>>>>>>> intel_gpu_top is running
>>>>>>>>>> in
>>>>>>>>>> the background when i915 is unloaded. I tried a quick
>>>>>>>>>> repro, I
>>>>>>>>>> don't see
>>>>>>>>>> the unload succeed ("fatal module in use", not sure
>>>>>>>>>> if this was a
>>>>>>>>>> partial unload), but when I try to kill
>>>>>>>>>> intel_gpu_top, I get an
>>>>>>>>>> NPD.
>>>>>>>>>> This is in the event disable path -
>>>>>>>>>> i915_pmu_event_stop ->
>>>>>>>>>> i915_pmu_disable.
>>>>>>>>>
>>>>>>>>> So i915 failed to unload (as expected - with perf
>>>>>>>>> events open we
>>>>>>>>> elevate
>>>>>>>>> the module ref count via i915_pmu_event_init ->
>>>>>>>>> drm_dev_get), then
>>>>>>>>> you
>>>>>>>>> quit intel_gpu_top and get NPD? On the engine lookup?
>>>>>>>>> With the
>>>>>>>>> re-ordered init/fini sequence as from this patch?
>>>>>>>>>
>>>>>>>>> With elevated module count there shouldn't be any
>>>>>>>>> unloading happening
>>>>>>>>> so
>>>>>>>>> I am intrigued.
>>>>>>>>>
>>>>>>>>>> It's likely that you are seeing a different path
>>>>>>>>>> (unload) leading
>>>>>>>>>> to the
>>>>>>>>>> same issue.
>>>>>>>>>>
>>>>>>>>>> I think in i915_pmu_disable/disable should be aware
>>>>>>>>>> of event-
>>>>>>>>>>> hw.state
>>>>>>>>>> and or pmu->closed states before accessing the event.
>>>>>>>>>> Maybe like,
>>>>>>>>>>
>>>>>>>>>> if (event->hw.state != PERF_HES_STOPPED &&
>>>>>>>>>> is_engine_event(event))
>>>>>>>>>> {
>>>>>>>>>>
>>>>>>>>>> @Tvrtko, wondering if this case is tested by igt at perf
>>>>>>>>>> _pmu at module-unload.
>>>>>>>>>
>>>>>>>>> A bit yes. From what Stuart wrote it seems the test
>>>>>>>>> would need to be
>>>>>>>>> extended to cover the case where PMU is getting opened
>>>>>>>>> while module
>>>>>>>>> unload is in progress.
>>>>>>>>>
>>>>>>>>> But the NPD you saw is for the moment confusing so I
>>>>>>>>> don't know what
>>>>>>>>> is
>>>>>>>>> happening.
>>>>>>>>>
>>>>>>>>>> I am not clear if we should use event->hw.state or
>>>>>>>>>> pmu->closed here
>>>>>>>>>> and
>>>>>>>>>> if/how they are related. IMO, for this issue, the
>>>>>>>>>> engine check is
>>>>>>>>>> good
>>>>>>>>>> enough too, so we don't really need the pmu state
>>>>>>>>>> checks.
>>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Engine check at the moment feels like papering.
>>>>>>>>>
>>>>>>>>> Indeed as you say I think the pmu->closed might be the
>>>>>>>>> solution.
>>>>>>>>> Perhaps
>>>>>>>>> the race is as mentioned above. PMU open happening in
>>>>>>>>> parallel to
>>>>>>>>> unload..
>>>>>>>>>
>>>>>>>>> If the sequence of events userspace triggers is:
>>>>>>>>>
>>>>>>>>>     i915_pmu_event_init
>>>>>>>>>     i915_pmu_event_start
>>>>>>>>>     i915_pmu_enable
>>>>>>>>>     i915_pmu_event_read
>>>>>>>>>
>>>>>>>>> I guess pmu->closed can get set halfway in
>>>>>>>>> i915_pmu_event_init. What
>>>>>>>>> would be the effect of that.. We'd try to get a module
>>>>>>>>> reference
>>>>>>>>> while
>>>>>>>>> in the process of unloading. Which is probably very
>>>>>>>>> bad.. So possibly
>>>>>>>>> a
>>>>>>>>> final check on pmu->close is needed there. Ho hum.. can
>>>>>>>>> it be made
>>>>>>>>> safe
>>>>>>>>> is the question.
>>>>>>>>>
>>>>>>>>> It doesn't explain the NPD on Ctrl-C though..
>>>>>>>>> intel_gpu_top keeps
>>>>>>>>> the
>>>>>>>>> evens open all the time. So I think more info needed,
>>>>>>>>> for me at
>>>>>>>>> least.
>>>>>>>>
>>>>>>>> So one thing here is this doesn't have to do with module
>>>>>>>> unload, but
>>>>>>>> module unbind specifically (while perf is open). I don't
>>>>>>>> know if the
>>>>>>>> NPD from Umesh is the same as what we're seeing here. I'd
>>>>>>>> really like
>>>>>>>> to separate these unless you know for sure that's
>>>>>>>> related. Also it
>>>>>>>> would be interesting to know if this patch fixes your
>>>>>>>> issue as well.
>>>>>>>>
>>>>>>>> I still think the re-ordering in i915_driver.c should be
>>>>>>>> enough and we
>>>>>>>> shouldn't need to check pmu->closed. The unregister
>>>>>>>> should
>>>>>>>> be enough to
>>>>>>>> ensure the perf tools are notified that new events aren't
>>>>>>>> allowed, and
>>>>>>>> at that time the engine structures are still intact. And
>>>>>>>> even if for
>>>>>>>> some reason the perf code still calls in to our function
>>>>>>>> pointers, we
>>>>>>>> have these engine checks as a failsafe.
>>>>>>>>
>>>>>>>> I'm by the way uploading one more version here with a
>>>>>>>> drm_WARN_ONCE
>>>>>>>> instead of the debug print.
>>>>>>>
>>>>>>> Problem is I am not a fan of papering so lets get to the
>>>>>>> bottom of the issue first. (In the meantime simple patch
>>>>>>> to
>>>>>>> re-order driver fini is okay since that seems obvious
>>>>>>> enough,
>>>>>>> I tnink.)
>>>>>>>
>>>>>>> We need to see call traces from any oopses and try to
>>>>>>> extend
>>>>>>> perf_pmu to catch them. And we need to understand the
>>>>>>> problem,
>>>>>>> if it is a real problem, which I laid out last week about
>>>>>>> race
>>>>>>> between module unload and elevating the module use count
>>>>>>> from
>>>>>>> our perf event init.
>>>>>>>
>>>>>>> Without understanding the details of possible failure mode
>>>>>>> flows we don't know how much the papering with engine
>>>>>>> checks
>>>>>>> solves and how much it leaves broken.
>>>>>>>
>>>>>>> If you guys are too busy to tackle that I'll put it onto
>>>>>>> myself, but help would certainly be appreciated.
>>>>>>
>>>>>> Looks like Stuart/Chris are pointing towards the unbind as an
>>>>>> issue.
>>>>>>
>>>>>> I ran this sequence and only the modprobe showed an error
>>>>>> (FATAL: ... still in use). What happens with the unbind.
>>>>>> Should
>>>>>> pmu also handle the unbind somehow?
>>>>>>
>>>>>> - run intel_gpu_top
>>>>>> - unbind
>>>>>> - modprobe -r i915
>>>>>> - kill intel_gpu_top.
>>>>>
>>>>> And it crashes or survives in this scenario?
>>>>
>>>> hangs on adlp, haven't been able to get the serial logs
>>>>
>>>>> Module still in use here would be expected since intel_gpu_top
>>>>> is
>>>>> holding a module reference.
>>>>>
>>>>> And pmu->closed should be set at the unbind step via
>>>>> i915_pci_remove -> i915_driver_unregister ->
>>>>> i915_pmu_unregister.
>>>>
>>>> After unbind,
>>>> kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop
>>>> ->
>>>> i915_pmu_disable -> likely HANGs when dereferencing engine.
>>>>
>>>> Can we can short circuit i915_pmu_disable with
>>>> if (pmu->closed)
>>>>      return;
>>>>
>>>> since this function is also adjusting pmu->enable_count. Does it
>>>> matter after pmu is closed?
>>>
>>> Erm yes.. this sounds obvious now but why I did not put a pmu-
>>>> closed check in i915_pmu_event_stop, since read and start/init
>>> have it!? Was it a simple oversight or something more I can't
>>> remember.
>>>
>>> Try like this maybe:
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c
>>> b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 958b37123bf1..2399adf92cc0 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct
>>> perf_event *event, int flags)
>>> static void i915_pmu_event_stop(struct perf_event *event, int
>>> flags)
>>> {
>>> +       if (pmu->closed)
>>> +               goto out;
>>> +
>>>         if (flags & PERF_EF_UPDATE)
>>>                 i915_pmu_event_read(event);
>>>         i915_pmu_disable(event);
>>> +out:
>>>         event->hw.state = PERF_HES_STOPPED;
>>> }
>>>
>>> Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
>>
>> that works. I don't see a hang with the above sequence on ADLP. Do
>> you
>> want to post/merge this?
>>
>> Also what about Stuart's changes in this series. At a minimum, I
>> would
>> keep the engine checks in i915_pmu_disable (rev1)? I am not sure the
>> reorder of pmu/gt registrations is needed though.
> 
> Thanks for the help here Tvrtko/Umesh! Sorry for the late reply here.
> I've been swamped and haven't been able to get back here.
> 
> IMO we really should have all three of these, possibly in three
> separate patches. I'm happy to post any or all of these or one of you
> can - happy to review. It will be earliest some time tomorrow though.

Re-order on unbind path AFAIR yes, and pmu->closed check in either 
i915_pmu_event_stop or early return from i915_pmu_disable (I was going 
for symmetry with start, but perhaps it looks clumsy, not sure) yes. 
Those two should have a fixes tag as well. Null engine checks I still do 
not support. It adds a production build debug string for something which 
is supposed to be impossible and a programming error, and makes the code 
a bit uglier with the extra indentation.

Regards,

Tvrtko

> 
> Thanks,
> Stuart
> 
>>
>> Thanks,
>> Umesh
>>
>>> Enable count handling in i915_pmu_disable should not matter since
>>> the i915_pmu_unregister would have already been executed by this
>>> point so all we need to ensure is that pmu->closed is not use after
>>> free. And since open event hold the DRM device reference I think
>>> that is fine.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Umesh
>>>>
>>>>
>>>>> We also need to try a stress test with two threads:
>>>>>
>>>>>      Thread A        Thread B
>>>>>      -----------        -----------
>>>>>      loop:            loop:
>>>>>        open pmu event      rmmod
>>>>>        close pmu event      insmod
>>>>>
>>>>> To see if it can hit a problem with drm_dev_get from
>>>>> i915_pmu_event_init being called at a bad moment relative to
>>>>> module unload. Maybe I am confused but that seems a
>>>>> possibility
>>>>> and a serious problem currently.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko