[PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
Kasireddy, Vivek
vivek.kasireddy at intel.com
Thu Dec 19 07:02:01 UTC 2024
Hi Christian,
> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
>
> >>
> >>> I will resend the patch series. I was experiencing issues with my email
> >>> client, which inadvertently split the series into two separate emails.
> >>
> >> Alternatively I can also copy the code from the list archive and explain why
> >> that doesn't work:
> >>
> >> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
> >> +			      bool revoked)
> >> +{
> >> +	struct vfio_pci_dma_buf *priv;
> >> +	struct vfio_pci_dma_buf *tmp;
> >> +
> >> +	lockdep_assert_held_write(&vdev->memory_lock);
> >>
> >> This makes sure that the caller is holding vdev->memory_lock.
> >>
> >> +
> >> +	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> >> +		if (!dma_buf_try_get(priv->dmabuf))
> >>
> >> This here only works because vfio_pci_dma_buf_release() also grabs
> >> vdev->memory_lock and so we are protected against concurrent releases.
> >>
> >> +			continue;
> >> +		if (priv->revoked != revoked) {
> >> +			dma_resv_lock(priv->dmabuf->resv, NULL);
> >> +			priv->revoked = revoked;
> >> +			dma_buf_move_notify(priv->dmabuf);
> >> +			dma_resv_unlock(priv->dmabuf->resv);
> >> +		}
> >> +		dma_buf_put(priv->dmabuf);
> >>
> >> The problem is now that this here might drop the last reference, which
> >> in turn calls vfio_pci_dma_buf_release(). That also tries to grab
> >> vdev->memory_lock and so results in a deadlock.
> > AFAICS, vfio_pci_dma_buf_release() would not be called synchronously
> > after the last reference is dropped by dma_buf_put(). This is because
> > fput(), which is called by dma_buf_put(), triggers f_op->release()
> > asynchronously; therefore, a deadlock is unlikely to occur in this
> > scenario, unless I am overlooking something.
>
> My recollection is that the f_op->release handler is only called
> asynchronously if fput() was issued in interrupt context.
Here is the code of fput() from the current master:
void fput(struct file *file)
{
	if (file_ref_put(&file->f_ref)) {
		struct task_struct *task = current;

		if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
			file_free(file);
			return;
		}
		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			init_task_work(&file->f_task_work, ____fput);
			if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
				return;
			/*
			 * After this task has run exit_task_work(),
			 * task_work_add() will fail. Fall through to delayed
			 * fput to avoid leaking *file.
			 */
		}
		if (llist_add(&file->f_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}
IIUC, based on the above code, there are two ways in which the
f_op->release() handler is triggered from fput():
- via delayed_fput(), for kernel threads and code running in interrupt context
- via task_work_run(), just before the task/process returns to user mode

The first scenario is clearly asynchronous, as the release() handler is
called from a worker thread. But I think the second case (which is the most
common one, and the relevant one for our use-case) can also be considered
asynchronous, because by the time the release() handler runs, the system
call or ioctl that led to the context in which dma_buf_put() was called
has already completed.
Here is a trace from my light testing with the udmabuf driver, where you can
see the release() handler being called via syscall_exit_to_user_mode():
[ 158.464203] Call Trace:
[ 158.466681] <TASK>
[ 158.468815] dump_stack_lvl+0x60/0x80
[ 158.472507] dump_stack+0x14/0x16
[ 158.475853] release_udmabuf+0x2f/0x9f
[ 158.479631] dma_buf_release+0x3c/0x90
[ 158.483408] __dentry_kill+0x8f/0x180
[ 158.487098] dput+0xe7/0x1a0
[ 158.490013] __fput+0x131/0x2b0
[ 158.493178] ____fput+0x19/0x20
[ 158.496352] task_work_run+0x61/0x90
[ 158.499959] syscall_exit_to_user_mode+0x1a4/0x1b0
[ 158.504769] do_syscall_64+0x5b/0x110
[ 158.508458] entry_SYSCALL_64_after_hwframe+0x4b/0x53
And, here is the relevant syscall code (from arch/x86/entry/common.c):
__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
{
	add_random_kstack_offset();
	nr = syscall_enter_from_user_mode(regs, nr);

	instrumentation_begin();

	if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
		/* Invalid system call, but still a system call. */
		regs->ax = __x64_sys_ni_syscall(regs);
	}

	instrumentation_end();
	syscall_exit_to_user_mode(regs);
	...
I also confirmed that the release() handler is indeed called after
dma_buf_put() (and not by dma_buf_put()) by adding debug prints before and
after dma_buf_put() and one in the release() handler. Furthermore, I found
that calling close() on the dmabuf fd in user space is one scenario in
which fput() is called synchronously. Here is the relevant trace:
[ 302.770910] Call Trace:
[ 302.773389] <TASK>
[ 302.775516] dump_stack_lvl+0x60/0x80
[ 302.779209] dump_stack+0x14/0x16
[ 302.782549] release_udmabuf+0x2f/0x9f
[ 302.786329] dma_buf_release+0x3c/0x90
[ 302.790105] __dentry_kill+0x8f/0x180
[ 302.793789] dput+0xe7/0x1a0
[ 302.796703] __fput+0x131/0x2b0
[ 302.799866] __fput_sync+0x53/0x70
[ 302.803299] __x64_sys_close+0x58/0xc0
[ 302.807076] x64_sys_call+0x126a/0x17d0
[ 302.810938] do_syscall_64+0x4f/0x110
[ 302.814622] entry_SYSCALL_64_after_hwframe+0x4b/0x53
As you can see above, there is indeed a synchronous version of fput() defined
just below fput():
/*
* synchronous analog of fput(); for kernel threads that might be needed
* in some umount() (and thus can't use flush_delayed_fput() without
* risking deadlocks), need to wait for completion of __fput() and know
* for this specific struct file it won't involve anything that would
* need them. Use only if you really need it - at the very least,
* don't blindly convert fput() by kernel thread to that.
*/
void __fput_sync(struct file *file)
{
	if (file_ref_put(&file->f_ref))
		__fput(file);
}
Based on all the above, I think we can conclude that since dma_buf_put()
does not directly (or synchronously) call the f_op->release() handler, a
deadlock is unlikely to occur in the scenario you described.
Thanks,
Vivek
>
> But could be that this information is outdated.
>
> Regards,
> Christian.
>
> >
> > Thanks,
> > Vivek
> >
> >> + }
> >> +}
> >>
> >> This pattern was suggested multiple times and I had to reject it
> >> every time because the whole idea is just fundamentally broken.
> >>
> >> It's really astonishing how people always come up with the same broken
> >> pattern.
> >>
> >> Regards,
> >> Christian.
> >>
> >> Apart from that I have to reject the adding of
> >> dma_buf_try_get(), that is clearly not something we should do.
> >>
> >> Understood. It appears that Vivek has confirmed that his v2 has
> >> resolved the issue. I will follow up with him to determine if he
> >> plans to resume his patch, and if so, I will apply my last patch
> >> on top of his updated patch series.
> >>
> >> Thanks,
> >> Wei Lin
> >>
> >> Thanks,
> >> Christian.