[PATCH v4 00/14] RFC Support hot device unplug in amdgpu

Andrey Grodzovsky andrey.grodzovsky at amd.com
Wed Feb 24 16:30:50 UTC 2021

On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
> On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>> Looked a bit into it. The idea is to export a sync object to an FD and import
>> it from that FD, so that I wait on the imported sync object handle from one thread
>> while signaling the exported sync object handle from another (post device unplug)?
>> My problem is how to create a sync object with a non-signaled 'fake' fence.
>> I only see an API that creates it with an already signaled fence (or none):
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_syncobj.c#L56
>> P.S. I expect the kernel to crash since, unlike with dma_bufs, we don't hold
>> a drm device reference here on export.
> Well maybe there's no crash. I think if you go through all the
> dma_fences that you have and force-complete them, then external
> callers won't go into the driver anymore. There are still pointers
> potentially pointing at your device struct and all that, but it should
> work. Still needs some audit ofc.
> Wrt how you get such a free-standing fence, that's amdgpu specific. Roughly
> - submit cs
> - get the fence for that (either sync_file, but I don't think amdgpu
> supports that, or maybe through drm_syncobj)
> - hotunplug
> - wait on that fence somehow (drm_syncobj has direct uapi for this,
> same for sync_file I think)
> Cheers, Daniel

Indeed, this worked fine; I tested it with 2 devices. Since the syncobj is
refcounted, even after I destroyed the original syncobj and unplugged the
device, the exported syncobj and the fence inside it stayed alive.
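
To make the refcounting argument concrete, here is a toy userspace model of
the lifetime rules. It is only a sketch and not the kernel code: the toy_*
structs and functions are made up for illustration and merely stand in for
the syncobj and the fence it holds. It assumes the fence is force-completed
on unplug, as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical), NOT struct dma_fence / struct drm_syncobj. */
struct toy_fence {
    int refcount;
    bool signaled;
};

struct toy_syncobj {
    int refcount;
    struct toy_fence *fence;
};

static int live_objects; /* allocations that have not been freed yet */

static struct toy_fence *toy_fence_create(void)
{
    struct toy_fence *f = calloc(1, sizeof(*f));
    assert(f);
    f->refcount = 1;            /* reference held by the "driver" */
    live_objects++;
    return f;
}

static void toy_fence_put(struct toy_fence *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        free(f);
        live_objects--;
    }
}

static struct toy_syncobj *toy_syncobj_create(struct toy_fence *f)
{
    struct toy_syncobj *s = calloc(1, sizeof(*s));
    assert(s);
    s->refcount = 1;            /* reference held by the original handle */
    s->fence = f;
    f->refcount++;              /* the syncobj takes its own fence reference */
    live_objects++;
    return s;
}

static void toy_syncobj_put(struct toy_syncobj *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        toy_fence_put(s->fence); /* last user gone: drop the fence as well */
        free(s);
        live_objects--;
    }
}

/* Returns 1 if the exported reference survives destroy + unplug. */
int toy_unplug_scenario(void)
{
    struct toy_fence *fence = toy_fence_create();
    struct toy_syncobj *obj = toy_syncobj_create(fence);
    struct toy_syncobj *exported = obj;
    int ok;

    exported->refcount++;       /* "export": the FD holds a second reference */
    toy_syncobj_put(obj);       /* destroy the original handle */

    fence->signaled = true;     /* "unplug": force-complete the fence */
    toy_fence_put(fence);       /* the driver drops its own reference */

    /* The exported object and its fence are still valid here. */
    ok = exported->refcount == 1 && exported->fence->signaled;

    toy_syncobj_put(exported);  /* close the exported FD: everything is freed */
    return ok && live_objects == 0;
}
```

In this model, destroying the original handle and dropping the driver's
fence reference only decrement counters; nothing is freed until the exported
reference is released as well.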

See my 3 tests in my branch on Gitlab and let me know whether I should go
ahead and do a merge request (and into which target project/branch), or
whether you have more comments.


>> Andrey
>> On 2/9/21 4:50 AM, Daniel Vetter wrote:
>>> Yeah in the end we'd need 2 hw devices for testing full fence
>>> functionality. A useful intermediate step would be to just export the
>>> fence (either as sync_file, which I think amdgpu doesn't support because
>>> no android egl support in mesa) or drm_syncobj (which you can do as
>>> standalone fd too iirc), and then just using the fence a bit from
>>> userspace (like wait on it or get its status) after the device is
>>> unplugged.

More information about the amd-gfx mailing list