[Mesa-dev] [PATCH] nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads.

Sat Jun 27 21:25:49 PDT 2015

But that's the thing... in this case, the fd lifetime is owned by the
X server. In fact, nouveau doesn't touch that fd in mesa beyond
dup'ing it, and then exclusively using the dup'd fd.

On Sun, Jun 28, 2015 at 12:23 AM, Mario Kleiner
<mario.kleiner.de at gmail.com> wrote:
> Ok, maybe one thing for the commit message, as i just made the same mixup in
> my reply: The fd's are not owned by the x-server, but by the mesa screens
> (pipe screens? dri screens?) they represent, so if such a screen goes away,
> the fd goes away. Using "X server" would be confusing, especially under
> dri3, although each such "mesa screen" corresponds to a x-screen.
>
> -mario
>
>
> On 06/28/2015 06:03 AM, Mario Kleiner wrote:
>>
>> On 06/28/2015 05:41 AM, Ilia Mirkin wrote:
>>>
>>> Oh duh. Thanks for the super-detailed explanation. To rephrase what
>>> you said in a slightly shorter manner:
>>>
>>> See bug https://bugs.freedesktop.org/show_bug.cgi?id=79823 for why we
>>> need to dupfd (which we are already, although radeon might not be).
>>>
>>
>> Radeon doesn't so far. My 2nd patch for radeon from earlier today adds
>> that.
>>
>>> Except instead of sticking the dup'd fd into the hash table, whose
>>> lifetime matches that of the device, we stick the other one in, which
>>> is effectively owned by the X server. As a result, those fstat's might
>>> fail down the line, and all sorts of hell will break loose.
>>>
>>
>> Yes. Essentially we should make sure the fd's we keep around have the
>> same lifetime as the data structure in which we need to use it. That was
>> the point, the server owned fd went away while we still needed it.
>>
>>> Would you be opposed to me rewriting your commit message to basically
>>> reflect the above? Perhaps your original one did as well, but it
>>> wasn't clear to me. I'd also like to throw some assert's in to make
>>> sure the fstat's don't fail -- does that sound reasonable?
>>>
>>
>> Oh please, yes! My commit messages often have this disease that they are
>> either too terse, when i think the problem is obvious, or too long and
>> somewhat convoluted when i'm going overboard in the other direction. Any
>> editing/shortening for clarity more than happily accepted :)
>>
>> thanks,
>> -mario
>>
>>> Another solution, btw, is to stick <stat.st_dev, stat.st_ino,
>>> stat.st_rdev> as the real key in the hash, although that'd involve
>>> making pointers. Probably not worth it.
>>>
>>> Cheers,
>>>
>>>    -ilia
>>>
>>> On Sat, Jun 27, 2015 at 11:13 PM, Mario Kleiner
>>> <mario.kleiner.de at gmail.com> wrote:
>>>>
>>>> On 06/28/2015 03:48 AM, Ilia Mirkin wrote:
>>>>>
>>>>>
>>>>> On Fri, Jun 5, 2015 at 9:36 AM, Mario Kleiner
>>>>> <mario.kleiner.de at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> The dup'ed fd owned by the nouveau_screen for a device node
>>>>>> must also be used as key for the winsys hash table, instead
>>>>>> of using the original fd passed in for a screen, to make
>>>>>> multi-x-screen ZaphodHeads configurations work on nouveau.
>>>>>>
>>>>>> This prevents the following crash scenario that was observed
>>>>>> when a dynamically loaded rendering plugin used OpenGL on a
>>>>>> ZaphodHeads setup, e.g., on a dual x-screen setup. At first
>>>>>> load the plugin worked, but after unloading and reloading it,
>>>>>> the next rendering operation crashed:
>>>>>>
>>>>>> 1. Client, e.g., a plugin, calls glXQueryVersion.
>>>>>>
>>>>>> 2. DRI screens for the x-screens 0 and 1 are created, one shared
>>>>>>      nouveau_screen is created for the shared device node of both
>>>>>>      screens, but the original fd of x-screen 0 is used as identifying
>>>>>>      key in the hash table, instead of the dup()ed fd of x-screen 0
>>>>>>      which is owned by the nouveau_screen. nouveau_screen's refcount
>>>>>>      is now 2.
>>>>>
>>>>>
>>>>>
>>>>> See below, but it shouldn't matter which fd gets used.
>>>>>
>>>>>>
>>>>>> 3. Regular rendering happens by the client plugin, then the plugin
>>>>>>      gets unloaded.
>>>>>>
>>>>>> 4. XCloseDisplay(). x-screen 0 gets its DRI screen destroyed,
>>>>>>      nouveau_drm_screen_unref() drops the refcount to 1, calling mesa
>>>>>>      code then closes the fd of x-screen 0, so now the fd which is
>>>>>>      used as key in the hash table is invalid. x-screen 1 gets
>>>>>>      destroyed, nouveau_drm_screen_unref() drops the refcount to 0,
>>>>>>      the nouveau_screen gets destroyed, but removal of its entry
>>>>>>      in the hash table fails, because the invalid fd in the hash
>>>>>>      table no longer matches anything (fstat() on the fd is used
>>>>>>      for hashing and key comparison, but fstat() on an already closed
>>>>>>      fd fails and returns bogus results). x-screen 1 closes its fd.
>>>>>>
>>>>>>      Now all fd's are closed, the nouveau_screen destroyed, but
>>>>>>      there is a dangling reference to the nouveau_screen in the
>>>>>>      hash table.
>>>>>>
>>>>>> 5. Some OpenGL client plugin gets loaded again and calls
>>>>>>      glXQueryVersion. Step 2 above repeats, but because a
>>>>>>      dangling reference with a matching fd is found in the winsys
>>>>>>      hash table, no new nouveau_screen is created this time. Instead
>>>>>>      the invalid pointer to the old nouveau_screen is recycled,
>>>>>>      which points to nirvana -> Crash.
>>>>>>
>>>>>> This problem is avoided by use of the dup()ed fd which is
>>>>>> owned by the nouveau_screen and has the same lifetime as
>>>>>> the nouveau_screen itself.
>>>>>
>>>>>
>>>>>
>>>>> I need to think about this some more, but... this shouldn't happen :)
>>>>>
>>>>> In fact, the whole dupfd thing was added there for ZaphodHeads screens
>>>>> in the first place. See
>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=79823 and commit
>>>>> a59f2bb17bcc which fixed it.
>>>>>
>>>>> Note that the hash has the following hash/eq functions:
>>>>>
>>>>> static unsigned hash_fd(void *key)
>>>>> {
>>>>>       int fd = pointer_to_intptr(key);
>>>>>       struct stat stat;
>>>>>       fstat(fd, &stat);
>>>>>
>>>>>       return stat.st_dev ^ stat.st_ino ^ stat.st_rdev;
>>>>> }
>>>>>
>>>>> static int compare_fd(void *key1, void *key2)
>>>>> {
>>>>>       int fd1 = pointer_to_intptr(key1);
>>>>>       int fd2 = pointer_to_intptr(key2);
>>>>>       struct stat stat1, stat2;
>>>>>       fstat(fd1, &stat1);
>>>>>       fstat(fd2, &stat2);
>>>>>
>>>>>       return stat1.st_dev != stat2.st_dev ||
>>>>>              stat1.st_ino != stat2.st_ino ||
>>>>>              stat1.st_rdev != stat2.st_rdev;
>>>>> }
>>>>>
>>>>> so fd and dupfd should get hashed to the same thing. I suspect there's
>>>>> something else going on in your application...
>>>>>
>>>>
>>>> My application is a set of dynamically loaded plugins running inside
>>>> another
>>>> host app (Psychtoolbox-3 inside GNU/Octave), so what happens often is
>>>> that
>>>> the OpenGL using plugin gets unloaded and reloaded at runtime, so a
>>>> after a
>>>> OpenGL session has ended with a XCloseDisplay() tearing the winsys
>>>> down, you
>>>> can easily have it restart at plugin reload with another XOpenDisplay ->
>>>> glXQueryVersion -> .... sequence, where the new call to
>>>> glXQueryVersion will
>>>> trigger a recreation of the winsys, which will find the stale entry
>>>> in the
>>>> hash table pointing to nowhere instead of the previously released
>>>> nouveau_screen -> use-after-free -> boom!
>>>>
>>>> The reason this fails is because during destruction of the 2nd, 3rd etc.
>>>> x-screen, the already closed fd associated with the 1st x-screen is
>>>> fed into
>>>> the compare_fd function, so fstat() errors out on the invalid fd. I
>>>> added
>>>> printf's etc. on both nouveau and now radeon to verify the fstat
>>>> gives me
>>>> EBADF errors. So the hash calculation goes wrong when trying to find the
>>>> matching element in the hash table with a fd that has a matching hash
>>>> -> The
>>>> element which should be removed is ignored/not removed because it
>>>> contains
>>>> an already closed fd for which no proper hash can be calculated
>>>> anymore ->
>>>> hash comparison during search goes wrong.
>>>>
>>>> This is because multiple x-screens, e.g., 2 x-screens are destroyed
>>>> in order
>>>> 0, 1 by FreeScreenConfigs() as part of XCloseDisplay().
>>>> FreeScreenConfigs()
>>>> calls dri2DestroyScreen() in dri2.c or dri3_destroy_screen in dri3.c,
>>>> which
>>>> are essentially identical, e.g.,:
>>>>
>>>> static void
>>>> dri2DestroyScreen(struct glx_screen *base)
>>>> {
>>>>     struct dri2_screen *psc = (struct dri2_screen *) base;
>>>>
>>>>     /* Free the direct rendering per screen data */
>>>>     (*psc->core->destroyScreen) (psc->driScreen);
>>>>
>>>> --> This core->destroyScreen calls eventually into the winsys, which
>>>> then
>>>> drops the refcount of the nouveau_screen or radeon winsys structure
>>>> by one,
>>>> and tries to release the struct and remove the hash table entry once the
>>>> refcount drops to zero.
>>>>
>>>>     driDestroyConfigs(psc->driver_configs);
>>>>     close(psc->fd);
>>>>
>>>> -> This close() closes the fd of a screen immediately after closing down
>>>> that screen. At the time screen 1 is closed down, the fd associated with
>>>> screen 0 is already closed.
>>>>
>>>>     free(psc);
>>>> }
>>>>
>>>> So you have this sequence during creation:
>>>>
>>>> glXQueryVersion() -> AllocAndFetchScreenConfigs():
>>>>
>>>> create screen 0 and its fd, e.g., fd==5, dup(fd), e.g., fd==6 for the
>>>> nouveau_screen, but store x-screen 0's fd==5 *itself* in the hash
>>>> table as
>>>> key.
>>>>
>>>> create screen 1. This now creates an fd 7
>>>>
>>>> => refcount is now 2.
>>>>
>>>>
>>>> And this sequence during destruction at XCloseDisplay();
>>>>
>>>> FreeScreenConfigs():
>>>>
>>>> free screen 0: core->destroyScreen() -> ... ->
>>>> nouveau_drm_screen_unref() =>
>>>> refcount-- is now 1
>>>> close(fd==5) of screen 0 and thereby the fd==5 stored inside the hash
>>>> table
>>>> as key. From now on using fcntl(fd==5) for hash calculation on that
>>>> element
>>>> will malfunction.
>>>>
>>>> free screen 1: -> nouveau_drm_screen_unref -> refcount-- is now 0. ->
>>>> Try to
>>>> remove hash table entry => fails because fcntl(fd==5) on the already
>>>> closed
>>>> fd==5 of screen 0 fails with EBADF. => Hash table keeps its stale
>>>> element
>>>> referencing the struct nouveau_screen.
>>>>
>>>> Free nouveau_screen struct (close dup()ed fd==6 for nouveau_screen,
>>>> close
>>>> down stuff, free the struct).
>>>> -> close(fd==7) associated with screen 1.
>>>>
>>>> Now all fd's (5,6,7) are closed, the struct is gone, the hash table
>>>> contains
>>>> a stale element with a no longer existent fd==5 and a pointer to an
>>>> already
>>>> free()d struct nouveau_screen.
>>>>
>>>> Now we reload the plugin and do glXQueryVersion() again, so
>>>> AllocAndFetchScreenConfigs() is executed again to create the winsys
>>>> etc. for
>>>> screens 0 and 1.
>>>>
>>>> create screen 0 and its fd: Here the first unused fd in the
>>>> filedescriptor
>>>> table is recycled, which happens to be fd==5, just as in the first
>>>> session
>>>> with the plugin! Now suddenly fd=5 is again the valid fd for x-screen
>>>> 0, and
>>>> it points to the same /dev/drm/card0 device file as before. That
>>>> means when
>>>> nouveau_drm_screen_create(fd==5) is called, it finds the stale
>>>> element in
>>>> the hash table from the previous session, because the previously
>>>> closed fd 5
>>>> in that element is now valid and open again and points to the same
>>>> device
>>>> file as in the previous session, ergo has the same hash etc.
>>>> nouveau_drm_screen_create() concludes that a fully initialized
>>>> nouveau_screen already exists for this session, except it doesn't
>>>> exist -
>>>> the pointer in the hash table points to invalid previously freed memory.
>>>> When it tries to access that memory we have a use-after-free and the
>>>> application segfaults.
>>>>
>>>> So it depends a bit on what happens in the applications memory
>>>> between the
>>>> two consecutive OpenGL sessions etc. how long it takes for an invalid
>>>> memory
>>>> access to happen, but eventually it hits invalid data. Maybe
>>>> occassionally
>>>> it gets lucky and the freed memory is still accessible and the struct
>>>> intact, so stuff works by chance.
>>>>
>>>> If the application happened to open other files inbetween the 1st and
>>>> 2nd
>>>> session, then the relevant fd in the 2nd session may not be exactly
>>>> the same
>>>> as in the first session, because recycling that fd didn't work, so we
>>>> are
>>>> only left with a dead but harmless entry in the hash table instead of a
>>>> crasher. I assume something like that prevented my crashes on radeon
>>>> in the
>>>> past, because when i wasn't expecting/looking for trouble i didn't use a
>>>> test sequence carefully made to trigger the bug for certain.
>>>>
>>>> Similar things happen on radeon for the same reason, ergo the 2nd
>>>> patch with
>>>> a similar fix.
>>>>
>>>> I don't have the gdb backtrace for nouveau anymore, but the traces for
>>>> radeon are attached to this mail to show you the flow of execution,
>>>> similar
>>>> to nouveau.
>>>>
>>>> -mario