[Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in per-gt sysfs

Dixit, Ashutosh ashutosh.dixit at intel.com
Fri May 13 05:05:42 UTC 2022


On Thu, 12 May 2022 00:48:08 -0700, Tvrtko Ursulin wrote:

Hi Tvrtko,

> On 12/05/2022 00:15, Dixit, Ashutosh wrote:
> > On Tue, 10 May 2022 03:41:57 -0700, Andrzej Hajda wrote:
> >> On 10.05.2022 11:48, Tvrtko Ursulin wrote:
> >>> On 10/05/2022 10:39, Andrzej Hajda wrote:
> >>>> On 10.05.2022 10:18, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> Was there closure/agreement on the matter of whether or not there is
> >>>>>>> a potential race between "kfree(gt)" and sysfs access (last put from
> >>>>>>> sysfs that is)? I've noticed Andrzej and Ashutosh were discussing it
> >>>>>>> but did not read all the details.
> >>>>>>>
> >>>>>>
> >>>>>> Not really :)
> >>>>>> IMO docs are against this practice, Ashutosh shows examples of this
> >>>>>> practice in code and according to his analysis it is safe.
> >>>>>> I gave up looking for contradictions :) Either it is OK, kobject is
> >>>>>> not fully shared object, docs are obsolete and needs update, either
> >>>>>> the patch is wrong.
> >>>>>> Anyway finally I tend to accept this solution, I failed to prove it is
> >>>>>> wrong :)
> >>>>>
> >>>>> Like a question of whether hotunplug can be triggered while userspace
> >>>>> is sitting in a sysfs hook? Final kfree then has to be delayed until
> >>>>> userspace exists.
> >>>>>
> >>>>> Btw where is the "kfree(gt)" for the tiles on the PCI remove path? I
> >>>>> can't find it.. Do we have a leak?
> >>>>
> >>>> intel_gt_tile_cleanup ?
> >>>
> >>> Called from intel_gt_release_all, whose only caller is the failure path
> >>> of i915_driver_probe. Feels like something is missing?
> >>
> >> This is final proof this patch is safe - no kfree, no UAF :)
> >>
> >> Apparently it is broken in internal branch as well.
> >> Should I take care of it?
> >
> > See Daniele's comment here:
> >
> > https://patchwork.freedesktop.org/patch/478856/?series=101551&rev=1
>
> Yeah we found that same leak yesterday, or the day before in this thread.
>
> > We clean up the gt sysfs during PCI device remove (i915_driver_remove ->
> > i915_driver_unregister -> intel_gt_driver_unregister ->
> > intel_gt_sysfs_unregister (added in this patch)). But from Daniele's mail
> > it appears that "kfree(gt)" can only be done from i915_driver_release().
> >
> > So as long as i915_driver_release() always happens after
> > i915_driver_remove() (which seems to be the case though I couldn't figure
> > out why (i.e. who is putting the final reference of the drm device)) there
> > is no UAF and no race. Thanks!
>
> No worried by the unknown?

Well if release() happens before or during remove() then (as is clear from
Daniele's mail) we have a much bigger problem than sysfs on our hands and
will see UAF crashes during device remove/unbind. But as far as we know no
such crashes have been reported.

> I had a quick look whether core_hotunplug tests for sysfs interactions
> but couldn't spot it. What I had in mind is userspace stuck in a sysfs
> hook (say read into a userfaultfd buffer) with device hotunplug in
> parallel. Maybe it is all handled already, not claiming that it isn't.

This is the 20 year old issue mentioned by Andrzej here:
https://lwn.net/Articles/36850/

So I thought I'll try this out today and see what actually happens to
settle this. And you will see it makes perfect sense. So this is what I
did:

* Change IGT to add a 20 second sleep after opening a sysfs file
* In that 20 second period, with an open fd, unbind the device using:
	echo -n "0000:03:00.0" > /sys/bus/pci/drivers/i915/unbind
  And also rmmod i915.

So this is what we see when we do this:
* As soon as the device is unbound, the complete i915 sysfs tree (under
  /sys/card/drm/card0) is cleanly removed (even with the open fd in IGT).
* The fd open in IGT is now orphan/invalid, so when IGT resumes and tries
  to use that fd IGT crashes.
* So no problem with device unbind but if IGT is still hanging around rmmod
  fails (saying module is in use, most likely due to the still open drm fd)
  but after IGT is completely killed rmmod is also fine.

So this confirms all this is correctly handled.

Separately, note that kobject_put's introduced in this patch are only
needed for freeing the memory allocated for the kobject's themselves (or
their containing struct's). kobject_put's don't play a role in cleaning up
the sysfs hierarchy itself (which will get cleaned up even without the
kobject_put's). Further, child kobject's take a reference on their parents
so child kobjects need a kobject_put before the parent kobject_put to free
the memory allocated for the parent (i.e. doing a kobject_put on the parent
will not automatically free all the child kobjects).

Thanks.
--
Ashutosh


More information about the Intel-gfx mailing list