[PATCH v4 00/14] RFC Support hot device unplug in amdgpu
daniel at ffwll.ch
Thu Feb 25 10:25:50 UTC 2021
On Wed, Feb 24, 2021 at 11:30:50AM -0500, Andrey Grodzovsky wrote:
> On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
> > On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky at amd.com> wrote:
> > > Looked a bit into it, I want to export sync_object to FD and import from that FD
> > > such that I will wait on the imported sync object handle from one thread while
> > > signaling the exported sync object handle from another (post device unplug) ?
> > >
> > > My problem is how to create a sync object with a non signaled 'fake' fence ?
> > > I only see API that creates it with already signaled fence (or none) -
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Fdrm_syncobj.c%23L56&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C5085bdd151c642514d2408d8d4c08e56%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637493270792459284%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=sZWIn0Lo7ZujBq0e7MdFPhJDARXWpOlLgLzANMS8cCY%3D&reserved=0
> > >
> > > P.S I expect the kernel to crash since unlike with dma_bufs we don't hold
> > > drm device reference here on export.
> > Well maybe there's no crash. I think if you go through all your
> > dma_fence that you have and force-complete them, then I think external
> > callers wont go into the driver anymore. But there's still pointers
> > potentially pointing at your device struct and all that, but should
> > work. Still needs some audit ofc.
> > Wrt how you get such a free-standing fence, that's amdgpu specific. Roughly
> > - submit cs
> > - get the fence for that (either sync_file, but I don't think amdgpu
> > supports that, or maybe through drm_syncobj)
> > - hotunplug
> > - wait on that fence somehow (drm_syncobj has direct uapi for this,
> > same for sync_file I think)
> > Cheers, Daniel
> Indeed worked fine, did with 2 devices. Since syncobj is refcounted, even
> after I
> destroyed the original syncobj and unplugged the device, the exported
> syncobj and the
> fence inside didn't go anywhere.
> See my 3 tests in my branch on Gitlab
> and let me know if I should go ahead and do a merge request (into which
> target project/branch ?) or you
> have more comments.
igt still works with patch submission.
> > > Andrey
> > >
> > > On 2/9/21 4:50 AM, Daniel Vetter wrote:
> > > > Yeah in the end we'd need 2 hw devices for testing full fence
> > > > functionality. A useful intermediate step would be to just export the
> > > > fence (either as sync_file, which I think amdgpu doesn't support because
> > > > no android egl support in mesa) or drm_syncobj (which you can do as
> > > > standalone fd too iirc), and then just using the fence a bit from
> > > > userspace (like wait on it or get its status) after the device is
> > > > unplugged.
Software Engineer, Intel Corporation
More information about the amd-gfx