<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> On 17.01.22 at 15:50, Marek Olšák wrote: <blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com"> <div dir="ltr"> <div>I don't think fork() would work with userspace where all buffers are shared. It certainly doesn't work now. The driver needs to be notified that a buffer or texture is shared to ensure data coherency between processes, and the driver must execute decompression and other render passes when a buffer or texture is being shared for the first time. Those aren't called when fork() is called.</div> </div> </blockquote> Yeah, that's why you can install handlers which run before/after fork() is executed. But to summarize, it is illegal for OpenGL, so we don't really need to worry about it. For compute there are a couple of use cases though, but even those are not real-world ones as far as I know. But see below. <blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com"> <div dir="ltr"> <div> </div> <div>Marek </div> </div> <div class="gmail_quote"> <div dir="ltr" class="gmail_attr">On Mon, Jan 17, 2022 at 9:34 AM Felix Kuehling <<a href="mailto:felix.kuehling@amd.com" moz-do-not-send="true">felix.kuehling@amd.com</a>> wrote: </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2022-01-17 at 9:21 a.m., Christian König wrote: > On 17.01.22 at 15:17, Felix Kuehling wrote: >> On 2022-01-17 at 6:44 a.m., Christian König wrote: >>> On 14.01.22 at 18:40, Felix Kuehling wrote: >>>> On 2022-01-14 at 12:26 p.m., Christian König wrote: >>>>> On 14.01.22 at 17:44, Daniel Vetter wrote: >>>>>> Top post because I tried to catch up on the entire discussion here. >>>>>> >>>>>> So fundamentally I'm not opposed to just closing this fork() hole >>>>>> once and >>>>>> for all. 
The thing that worries me from an upstream/platform pov is >>>>>> really >>>>>> only if we don't do it consistently across all drivers. >>>>>> >>>>>> So maybe as an idea: >>>>>> - Do the original patch, but not just for ttm but all gem rendernode >>>>>> drivers at least (or maybe even all gem drivers, no idea), with >>>>>> the >>>>>> below discussion cleaned up as justification. >>>>> I know of at least one use case which this will break. >>>>> >>>>> A couple of years back we had a discussion on the Mesa mailing list >>>>> because (IIRC) Marek introduced a background thread to push command >>>>> submissions to the kernel. >>>>> >>>>> That broke because some compositor used to initialize OpenGL and then >>>>> do a fork(). This indeed worked previously (no GPUVM at that time), >>>>> but with the addition of the background thread it obviously broke. >>>>> >>>>> The conclusion back then was that the compositor is broken and needs >>>>> fixing, but it still essentially means that there could be people out >>>>> there with really old userspace where this setting would just break >>>>> the desktop. >>>>> >>>>> I'm not really against that change either, but at least in theory we >>>>> could make fork() work perfectly fine even with VMs and background >>>>> threads. >>>> You may regret this if you ever try to build a shared virtual address >>>> space between GPU and CPU. Then you have two processes (parent and >>>> child) sharing the same render context and GPU VM address space. >>>> But the >>>> CPU address spaces are different. You can't maintain consistent shared >>>> virtual address spaces for both processes when the GPU address >>>> space is >>>> shared between them. >>> That's actually not much of a problem. 
>>> >>> All you need to do is to use pthread_atfork() and do the appropriate >>> action in parent/child to clean up your context: >>> <a href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fman7.org%2Flinux%2Fman-pages%2Fman3%2Fpthread_atfork.3.html&data=04%7C01%7Cchristian.koenig%40amd.com%7Cd917b56904c64bcb501a08d9d9c8c05e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637780278519496422%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=4%2FEATucZoZnlP4t0FI6bYtCdThxC3HTOtkIcTU8G%2FqY%3D&reserved=0" originalsrc="https://man7.org/linux/man-pages/man3/pthread_atfork.3.html" shash="gvfF3OeQh16iGIjvFVC4ZAVhSf1F3gpDT27m7ux9O7pCKsFopM3cySI5ICyOdyFCFV9h1oAeHelitThv2EFXqMtGx0b1fBXcB4k/7E3YOMGF4HX0VI7qYeCs5zP2BJ5oLGiJUCA1vWaj+3XxBDH/vGWUXOWvijvEwDzcVmhCl0U=" rel="noreferrer" target="_blank" moz-do-not-send="true">https://man7.org/linux/man-pages/man3/pthread_atfork.3.html</a> >> Thunk already does that. However, it's not foolproof. pthread_atfork >> handlers aren't called when the process is forked with a clone call. > > Yeah, but that's perfectly intentional. clone() is usually used to > create threads. Clone can be used to create new processes. Maybe that's not the common use today. > >>> The rest is just to make sure that all shared and all private data are >>> kept separate all the time. Sharing virtual memory has been done this >>> way for decades; it's just that nobody ever did it with a stateful >>> device like GPUs. >> My concern is not with sharing or not sharing data. It's with sharing >> the address space itself. If you share the render node, you share GPU >> virtual address space. However, the CPU address space is not shared between >> parent and child. That's a fundamental mismatch between the CPU world >> and the current GPU driver implementation. > > Correct, but even that is easily solvable. 
As I said before, you can > hang this state on a VMA and let it be cloned together with the CPU > address space. I'm not following. The address space I'm talking about is struct amdgpu_vm. It's associated with the render node file descriptor. Inheriting and using that file descriptor in the child inherits the amdgpu_vm. I don't see how you can hang that state on any one VMA. </blockquote> </div> </blockquote> But you don't really need that. You can bind the VM to your VMA mapping and clone that as necessary. I'm not sure how else I should describe that; as far as I know the kernel, that would be rather trivial to do. Cloning all the userspace state like Marek described above is the much harder part. Regards, Christian. <blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com"> <div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> To be consistent with the CPU, you'd need to clone the GPU address space (struct amdgpu_vm) in the child process. That means you need a new render node file descriptor that imports all the BOs from the parent address space. It's a bunch of extra work to fork a process, which you're proposing to immediately undo with an atfork handler. So I really don't see the point. Regards, Felix > > Since VMAs are informed about their cloning (as opposed to file > descriptors), it's trivial to even just clone kernel data on first access. > > Regards, > Christian. > >> >> Regards, >> Felix >> >> >>> Regards, >>> Christian. >>> >>>> Regards, >>>> Felix >>>> > </blockquote> </div> </blockquote> </body> </html>