[Bug 37168] New: Regression: Kernel hard-lock when running Second Life

bugzilla-daemon at freedesktop.org
Fri May 13 03:40:13 PDT 2011


https://bugs.freedesktop.org/show_bug.cgi?id=37168

           Summary: Regression: Kernel hard-lock when running Second Life
           Product: Mesa
           Version: git
          Platform: Other
        OS/Version: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: Drivers/Gallium/r600
        AssignedTo: dri-devel at lists.freedesktop.org
        ReportedBy: gm.potato.ul at gmail.com


When running any of the following family of applications (the latter two are
not-too-distant forks of the first):

Second Life from www.secondlife.com version 2.6 or later (32-bit)
Imprudence from www.kokuaviewer.org version 1.3.1 or 1.4.0 (32-bit or 64-bit)
Kokua from www.kokuaviewer.org version 0.1.0 (32-bit or 64-bit)

The following results are observed:

Mesa 7.10.2 tarball OR Mesa 7.10 branch from git: Correct rendering; no
crashing.

Mesa 7.11-dev from about February 2011: Massive memory leak (upwards of 100
MB/s) in user space fills up the virtual address space of the application.
Eventually, the kernel OOM killer kills it.

The version of Mesa 7.11-dev shipped by Fedora 15: Same as above; massive
memory leak causing eventual OOM.

Mesa 7.11-dev from about April 2011, including today's git master (May 13
2011): Judging by the process's memory usage, the memory leak appears to
persist. However, a much more sinister problem appears: the kernel hard-locks.
Completely. So hard that it can't kexec the crash kernel I set up, even though
I have tested that the crash kernel works on a NULL pointer dereference. The
Caps Lock key gets no response, I can't SSH in, and Magic SysRq keys do
nothing. Completely dead kernel. The motherboard lacks an RS-232 port or serial
port header, so I can't debug over serial.
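A leak rate like the "upwards of 100 MB/s" quoted above can be estimated by
sampling the process's virtual size twice and dividing by the interval. The
Python sketch below is a hypothetical helper, not the tool used for this
report; it assumes the Linux /proc/<pid>/status format, where the VmSize line
is reported in kB, and it treats 1 MB as 1024 kB.

```python
import re
import time


def vm_size_kb(status_text: str) -> int:
    """Extract the VmSize field (in kB) from the text of /proc/<pid>/status."""
    # Assumes the Linux format, e.g. "VmSize:\t  204800 kB".
    match = re.search(r"^VmSize:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmSize line not found")
    return int(match.group(1))


def growth_mb_per_s(kb_before: int, kb_after: int, seconds: float) -> float:
    """Convert two VmSize samples taken `seconds` apart into MB/s (1 MB = 1024 kB)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (kb_after - kb_before) / 1024.0 / seconds


def sample(pid: int, interval: float = 1.0) -> float:
    """Read VmSize twice, `interval` seconds apart, and return the growth rate in MB/s."""
    path = f"/proc/{pid}/status"
    with open(path) as f:
        before = vm_size_kb(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = vm_size_kb(f.read())
    return growth_mb_per_s(before, after, interval)
```

Calling sample(pid) in a loop against the viewer's PID should show a steady
positive rate while the leak is active; VmSize is used rather than VmRSS
because the report describes the virtual address space filling up.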

Steps to reproduce the problems observed:

Simply run the application and log in to any grid (Second Life and osgrid were
tested). It doesn't appear to be specific to any client settings or particular
objects in-world. The memory leak begins as soon as 3D rendering starts (after
the login process is complete). The kernel hard-lock occurs within 5 minutes.
Panning the camera around makes the hard-lock occur more quickly, but it will
occur regardless.

Because there are two interacting bugs, one of which forces me to hold down
the power button, this problem is VERY difficult to bisect.


Test Parameters:

Application versions as stated above
AMD Radeon HD5970 (689c chipset, uses evergreen code)
Fedora 15 x86_64
Linux 2.6.38.2 through 2.6.38.5 (official Fedora build) and 2.6.39-rcX:
behavior is identical on all of these kernels, including the latest 2.6.39 RC
(rc7-git2 as of this writing)
libdrm from git master (kept updated)
xf86-video-ati from git master
Xorg Server from Fedora 15
mesa version: varies (see test results)
Driver parameters: I have tried a full factorial of the following settings:
SwapBuffersWait, EnablePageFlip, ColorTiling. Enabling or disabling each one
independently gives 2^3 = 8 possible combinations; none of them has any impact
on the result. ONLY the Mesa version has any impact whatsoever on the result.
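The full factorial above is every on/off assignment of three boolean options,
i.e. 2^3 = 8 configurations. The Python sketch below enumerates them and
renders each as an xorg.conf Device section; it is illustrative only, and the
Identifier value "Radeon" and the on/off spelling of the option values are
assumptions rather than the reporter's actual configuration.

```python
from itertools import product

# The three xf86-video-ati options toggled in the test matrix.
OPTIONS = ("SwapBuffersWait", "EnablePageFlip", "ColorTiling")


def factorial_configs():
    """Yield every on/off combination of OPTIONS as a dict (2**3 = 8 in total)."""
    for values in product((True, False), repeat=len(OPTIONS)):
        yield dict(zip(OPTIONS, values))


def device_section(config):
    """Render one combination as an xorg.conf Device section (hypothetical Identifier)."""
    lines = ['Section "Device"', '    Identifier "Radeon"', '    Driver "radeon"']
    for name, enabled in config.items():
        value = "on" if enabled else "off"
        lines.append(f'    Option "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)
```

Generating the sections mechanically makes it harder to skip or duplicate a
combination when testing each one by hand.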

Reproduced independently by a user (BioTube) on the #radeon IRC channel with an
R600-class GPU.

Troubleshooting:

The application allows you to toggle things such as various classes of shaders
(or whether to use shaders at all), framebuffer objects, and vertex buffer
objects. You can also toggle between a deferred rendering pipeline with shadows
and the classic immediate rendering pipeline. Enabling and disabling these
settings has no effect whatsoever on the outcome, except that enabling shaders
produces a SIGSEGV on older versions of Mesa (such as 7.10.2) due to a feature
missing from the shader compiler, which has since been implemented.

The renderer was *significantly* rewritten in the "2.x" versions of Second Life
and in the Kokua experimental viewer, compared to the "1.x"-based Imprudence.
None of the changes between these two major versions of the renderer has any
impact on the outcome of the tests.

I haven't even been able to diagnose what the exact problem is because the
hard-lock is so, well, hard. The memory leak might be more tractable with
valgrind and a Mesa build that ships debugging symbols.

I am allquixotic on IRC if you want me to test a patch or need help reproducing
it. Based on my diagnosis so far, you should be able to reproduce it using
*any* hardware that is supported by the r600g driver.

NB: These programs are open source software, so if you are so inclined, dive in
and take a look at what might possibly be causing this problem. The corporate
developers at Linden Lab unfortunately don't support running their client under
Mesa at all, and the Imprudence/Kokua developers lack detailed knowledge of the
internals of the open source graphics stack, so we can't rely on them either.
But it is open source, and it would be nice to get it working again (especially
since it works with the 7.10 release series), as well as to fix a potential
kernel hard-lock bug.



More information about the dri-devel mailing list