<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Am 17.01.22 um 15:50 schrieb Marek Olšák:<br>
<blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com">
<div dir="ltr">
<div>I don't think fork() would work with userspace where all
buffers are shared. It certainly doesn't work now. The driver
needs to be notified that a buffer or texture is shared to
ensure data coherency between processes, and the driver must
execute decompression and other render passes when a buffer or
texture is being shared for the first time. Those aren't
called when fork() is called.</div>
</div>
</blockquote>
<br>
Yeah, that's why you can install handlers which run before/after
fork() is executed. But to summarize it is illegal for OpenGL, so we
don't really need to worry about it.<br>
<br>
For compute there are a couple of use cases though, but even those
are not real world ones as far as I know.<br>
<br>
But see below.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>Marek<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Jan 17, 2022 at 9:34
AM Felix Kuehling <<a href="mailto:felix.kuehling@amd.com" moz-do-not-send="true">felix.kuehling@amd.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Am
2022-01-17 um 9:21 a.m. schrieb Christian König:<br>
> Am 17.01.22 um 15:17 schrieb Felix Kuehling:<br>
>> Am 2022-01-17 um 6:44 a.m. schrieb Christian König:<br>
>>> Am 14.01.22 um 18:40 schrieb Felix Kuehling:<br>
>>>> Am 2022-01-14 um 12:26 p.m. schrieb Christian
König:<br>
>>>>> Am 14.01.22 um 17:44 schrieb Daniel
Vetter:<br>
>>>>>> Top post because I tried to catch up
on the entire discussion here.<br>
>>>>>><br>
>>>>>> So fundamentally I'm not opposed to
just close this fork() hole<br>
>>>>>> once and<br>
>>>>>> for all. The thing that worries me
from a upstream/platform pov is<br>
>>>>>> really<br>
>>>>>> only if we don't do it consistently
across all drivers.<br>
>>>>>><br>
>>>>>> So maybe as an idea:<br>
>>>>>> - Do the original patch, but not just
for ttm but all gem rendernode<br>
>>>>>> drivers at least (or maybe even
all gem drivers, no idea), with<br>
>>>>>> the<br>
>>>>>> below discussion cleaned up as
justification.<br>
>>>>> I know of at least one use case which
this will break.<br>
>>>>><br>
>>>>> A couple of years back we had a
discussion on the Mesa mailing list<br>
>>>>> because (IIRC) Marek introduced a
background thread to push command<br>
>>>>> submissions to the kernel.<br>
>>>>><br>
>>>>> That broke because some compositor used
to initialize OpenGL and then<br>
>>>>> do a fork(). This indeed worked
previously (no GPUVM at that time),<br>
>>>>> but with the addition of the backround
thread obviously broke.<br>
>>>>><br>
>>>>> The conclusion back then was that the
compositor is broken and needs<br>
>>>>> fixing, but it still essentially means
that there could be people out<br>
>>>>> there with really old userspace where
this setting would just break<br>
>>>>> the desktop.<br>
>>>>><br>
>>>>> I'm not really against that change
either, but at least in theory we<br>
>>>>> could make fork() work perfectly fine
even with VMs and background<br>
>>>>> threads.<br>
>>>> You may regret this if you ever try to build
a shared virtual address<br>
>>>> space between GPU and CPU. Then you have two
processes (parent and<br>
>>>> child) sharing the same render context and
GPU VM address space.<br>
>>>> But the<br>
>>>> CPU address spaces are different. You can't
maintain consistent shared<br>
>>>> virtual address spaces for both processes
when the GPU address<br>
>>>> space is<br>
>>>> shared between them.<br>
>>> That's actually not much of a problem.<br>
>>><br>
>>> All you need to do is to use pthread_atfork() and
do the appropriate<br>
>>> action in parent/child to clean up your context:<br>
>>> <a href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fman7.org%2Flinux%2Fman-pages%2Fman3%2Fpthread_atfork.3.html&data=04%7C01%7Cchristian.koenig%40amd.com%7Cd917b56904c64bcb501a08d9d9c8c05e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637780278519496422%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=4%2FEATucZoZnlP4t0FI6bYtCdThxC3HTOtkIcTU8G%2FqY%3D&reserved=0" originalsrc="https://man7.org/linux/man-pages/man3/pthread_atfork.3.html" shash="gvfF3OeQh16iGIjvFVC4ZAVhSf1F3gpDT27m7ux9O7pCKsFopM3cySI5ICyOdyFCFV9h1oAeHelitThv2EFXqMtGx0b1fBXcB4k/7E3YOMGF4HX0VI7qYeCs5zP2BJ5oLGiJUCA1vWaj+3XxBDH/vGWUXOWvijvEwDzcVmhCl0U=" rel="noreferrer" target="_blank" moz-do-not-send="true">https://man7.org/linux/man-pages/man3/pthread_atfork.3.html</a><br>
>> Thunk already does that. However, it's not foolproof.
pthread_atfork<br>
>> hanlders aren't called when the process is forked
with a clone call.<br>
><br>
> Yeah, but that's perfectly intentional. clone() is
usually used to<br>
> create threads.<br>
<br>
Clone can be used to create new processes. Maybe not the
common use today.<br>
<br>
<br>
><br>
>>> The rest is just to make sure that all shared and
all private data are<br>
>>> kept separate all the time. Sharing virtual
memory is already done for<br>
>>> decades this way, it's just that nobody ever did
it with a statefull<br>
>>> device like GPUs.<br>
>> My concern is not with sharing or not sharing data.
It's with sharing<br>
>> the address space itself. If you share the render
node, you share GPU<br>
>> virtual address space. However CPU address space is
not shared between<br>
>> parent and child. That's a fundamental mismatch
between the CPU world<br>
>> and current GPU driver implementation.<br>
><br>
> Correct, but even that is easily solvable. As I said
before you can<br>
> hang this state on a VMA and let it be cloned together
with the CPU<br>
> address space.<br>
<br>
I'm not following. The address space I'm talking about is
struct<br>
amdgpu_vm. It's associated with the render node file
descriptor.<br>
Inheriting and using that file descriptor in the child
inherits the<br>
amdgpu_vm. I don't see how you can hang that state on any one
VMA.<br>
</blockquote>
</div>
</blockquote>
<br>
But you don't really need that. You can bind the VM to your VMA
mapping and clone that as necessary.<br>
<br>
I'm not sure how else I should describe that, as far as I know the
kernel that would be rather trivial to do.<br>
<br>
Cloning all the userspace state like Marek described above is the
much harder part.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A5XrPUJD2QJHBcF1Gd5cw6T=EmEEuVvNT3SjasSy9E8yg@mail.gmail.com">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
To be consistent with the CPU, you'd need to clone the GPU
address space<br>
(struct amdgpu_vm) in the child process. That means you need a
new<br>
render node file descriptor that imports all the BOs from the
parent<br>
address space. It's a bunch of extra work to fork a process,
that you're<br>
proposing to immediately undo with an atfork handler. So I
really don't<br>
see the point.<br>
<br>
Regards,<br>
Felix<br>
<br>
<br>
><br>
> Since VMAs are informed about their cloning (in opposite
to file<br>
> descriptors) it's trivial to even just clone kernel data
on first access.<br>
><br>
> Regards,<br>
> Christian.<br>
><br>
>><br>
>> Regards,<br>
>> Felix<br>
>><br>
>><br>
>>> Regards,<br>
>>> Christian.<br>
>>><br>
>>>> Regards,<br>
>>>> Felix<br>
>>>><br>
><br>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>