[PATCH] epoll: try to be a _bit_ better about file lifetimes
Christian König
christian.koenig at amd.com
Mon May 6 16:15:06 UTC 2024
Am 05.05.24 um 22:53 schrieb Linus Torvalds:
> On Sun, 5 May 2024 at 13:30, Al Viro <viro at zeniv.linux.org.uk> wrote:
>> 0. special-cased ->f_count rule for ->poll() is a wart and it's
>> better to get rid of it.
>>
>> 1. fs/eventpoll.c is a steaming pile of shit and I'd be glad to see
>> git rm taken to it. Short of that, by all means, let's grab reference
>> in there around the call of vfs_poll() (see (0)).
> Agreed on 0/1.
>
>> 2. having ->poll() instances grab extra references to file passed
>> to them is not something that should be encouraged; there's a plenty
>> of potential problems, and "caller has it pinned, so we are fine with
>> grabbing extra refs" is nowhere near enough to eliminate those.
> So it's not clear why you hate it so much, since those extra
> references are totally normal in all the other VFS paths.
Sorry to maybe jumping into the middle of the discussion, but for
DMA-buf the behavior Al doesn't want is actually desired.
And I totally understand why Al is against it for file system based
files, but for this case it's completely intentional.
Removing the callback on close is what we used to do a long time ago,
but that turned out into a locking nightmare because it meant that we
need to be able to wait for driver specific locks from whatever non
interrupt context fput() is called from.
Regards,
Christian.
>
> I mean, they are perhaps not the *common* case, but we have a lot of
> random get_file() calls sprinkled around in various places when you
> end up passing a file descriptor off to some asynchronous operation
> thing.
>
> Yeah, I think most of them tend to be special operations (eg the tty
> TIOCCONS ioctl to redirect the console), but it's not like vfs_ioctl()
> is *that* different from vfs_poll. Different operation, not somehow
> "one is more special than the other".
>
> cachefiles and backing-file does it for regular IO, and drop it at IO
> completion - not that different from what dma-buf does. It's in
> ->read_iter() rather than ->poll(), but again: different operations,
> but not "one of them is somehow fundamentally different".
>
>> 3. dma-buf uses of get_file() are probably safe (epoll shite aside),
>> but they do look fishy. That has nothing to do with epoll.
> Now, what dma-buf basically seems to do is to avoid ref-counting its
> own fundamental data structure, and replaces that by refcounting the
> 'struct file' that *points* to it instead.
>
> And it is a bit odd, but it actually makes some amount of sense,
> because then what it passes around is that file pointer (and it allows
> passing it around from user space *as* that file).
>
> And honestly, if you look at why it then needs to add its refcount to
> it all, it actually makes sense. dma-bufs have this notion of
> "fences" that are basically completion points for the asynchronous
> DMA. Doing a "poll()" operation will add a note to the fence to get
> that wakeup when it's done.
>
> And yes, logically it takes a ref to the "struct dma_buf", but because
> of how the lifetime of the dma_buf is associated with the lifetime of
> the 'struct file', that then turns into taking a ref on the file.
>
> Unusual? Yes. But not illogical. Not obviously broken. Tying the
> lifetime of the dma_buf to the lifetime of a file that is passed along
> makes _sense_ for that use.
>
> I'm sure dma-bufs could add another level of refcounting on the
> 'struct dma_buf' itself, and not make it be 1:1 with the file, but
> it's not clear to me what the advantage would really be, or why it
> would be wrong to re-use a refcount that is already there.
>
> Linus
More information about the dri-devel
mailing list