[PATCH v4 00/14] RFC Support hot device unplug in amdgpu

Andrey Grodzovsky andrey.grodzovsky at amd.com
Thu Feb 25 16:12:29 UTC 2021


On 2021-02-25 5:25 a.m., Daniel Vetter wrote:
> On Wed, Feb 24, 2021 at 11:30:50AM -0500, Andrey Grodzovsky wrote:
>> On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
>>> On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky
>>> <Andrey.Grodzovsky at amd.com> wrote:
>>>> I looked into it a bit. Should I export the sync object to an FD and import
>>>> it from that FD, so that I wait on the imported sync object handle from one
>>>> thread while signaling the exported sync object handle from another (post
>>>> device unplug)?
>>>>
>>>> My problem is how to create a sync object with a non-signaled 'fake' fence.
>>>> I only see an API that creates it with an already signaled fence (or none) -
>>>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_syncobj.c#L56
>>>>
>>>> P.S. I expect the kernel to crash since, unlike with dma_bufs, we don't hold
>>>> a drm device reference here on export.
>>> Well maybe there's no crash. I think if you go through all the
>>> dma_fences that you have and force-complete them, then external
>>> callers won't go into the driver anymore. There are still pointers
>>> potentially pointing at your device struct and all that, but it should
>>> work. Still needs some audit ofc.
>>>
>>> Wrt how you get such a free-standing fence, that's amdgpu specific. Roughly
>>> - submit cs
>>> - get the fence for that (either sync_file, but I don't think amdgpu
>>> supports that, or maybe through drm_syncobj)
>>> - hotunplug
>>> - wait on that fence somehow (drm_syncobj has direct uapi for this,
>>> same for sync_file I think)
>>>
>>> Cheers, Daniel
>>
>> Indeed worked fine, did with 2 devices. Since syncobj is refcounted, even
>> after I destroyed the original syncobj and unplugged the device, the
>> exported syncobj and the fence inside didn't go anywhere.
>>
>> See my 3 tests in my branch on Gitlab
>> https://gitlab.freedesktop.org/agrodzov/igt-gpu-tools/-/commits/master
>> and let me know if I should go ahead and do a merge request (into which
>> target project/branch ?) or you have more comments.
> igt still works with patch submission.
> -Daniel


I see. I need to divert to other work for a while; I will get to it once I
am back to device unplug.

Andrey
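
For readers following the thread, the refcounting behaviour reported in the
quoted test result (the exported syncobj and its fence surviving both the
destruction of the original syncobj and the device unplug) can be sketched
with a small userspace toy model. This is only an illustration under stated
assumptions: toy_fence, toy_syncobj and every toy_* helper are hypothetical
names invented here, not the kernel's dma_fence/drm_syncobj code, and the
"unplug" step simply marks the fence as signaled, as suggested earlier in
the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins only; these are NOT the kernel's dma_fence or drm_syncobj. */
struct toy_fence {
    int refcount;
    bool signaled;
};

struct toy_syncobj {
    int refcount;
    struct toy_fence *fence;
};

static int live_objects; /* allocations not yet freed */

static struct toy_fence *toy_fence_create(void)
{
    struct toy_fence *f = calloc(1, sizeof(*f));

    if (!f)
        abort();
    f->refcount = 1; /* reference held by the creator (the "device") */
    live_objects++;
    return f;
}

static void toy_fence_put(struct toy_fence *f)
{
    if (--f->refcount == 0) {
        free(f);
        live_objects--;
    }
}

static struct toy_syncobj *toy_syncobj_create(struct toy_fence *f)
{
    struct toy_syncobj *s = calloc(1, sizeof(*s));

    if (!s)
        abort();
    s->refcount = 1;
    f->refcount++; /* the syncobj takes its own fence reference */
    s->fence = f;
    live_objects++;
    return s;
}

static struct toy_syncobj *toy_syncobj_get(struct toy_syncobj *s)
{
    s->refcount++;
    return s;
}

static void toy_syncobj_put(struct toy_syncobj *s)
{
    if (--s->refcount == 0) {
        toy_fence_put(s->fence);
        free(s);
        live_objects--;
    }
}

/* Returns 1 if the exported reference stays usable after the "unplug". */
static int toy_unplug_scenario(void)
{
    struct toy_fence *fence = toy_fence_create();
    struct toy_syncobj *orig = toy_syncobj_create(fence);
    struct toy_syncobj *exported = toy_syncobj_get(orig); /* export + import */
    int ok;

    toy_syncobj_put(orig);  /* destroy the original handle */
    fence->signaled = true; /* assumed: unplug force-completes the fence */
    toy_fence_put(fence);   /* the "device" drops its reference */

    /* Both objects are still alive through the exported reference. */
    ok = live_objects == 2 && exported->fence->signaled;

    toy_syncobj_put(exported); /* last reference: everything is freed */
    return ok && live_objects == 0;
}
```

Calling toy_unplug_scenario() from a main() walks through the same order of
events as the test described above. The model deliberately ignores locking
and the pointers back into the device struct that the thread says still need
an audit.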


>
>> Andrey
>>
>>
>>>> Andrey
>>>>
>>>> On 2/9/21 4:50 AM, Daniel Vetter wrote:
>>>>> Yeah in the end we'd need 2 hw devices for testing full fence
>>>>> functionality. A useful intermediate step would be to just export the
>>>>> fence (either as sync_file, which I think amdgpu doesn't support because
>>>>> no android egl support in mesa) or drm_syncobj (which you can do as
>>>>> standalone fd too iirc), and then just using the fence a bit from
>>>>> userspace (like wait on it or get its status) after the device is
>>>>> unplugged.
>>>

