[Nouveau] "enable dri3 support without glamor" causes gnome-shell regression on nv4x

Ilia Mirkin imirkin at alum.mit.edu
Tue Aug 11 05:21:53 PDT 2015


On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
>
> On 03-08-15 20:09, Ilia Mirkin wrote:
>>
>> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>> On 03-08-15 17:36, Ilia Mirkin wrote:
>>>>
>>>>
>>>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdegoede at redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 30-07-15 16:09, Ilia Mirkin wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> FWIW this is a fail on nv50+ as well. See for example
>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445
>>>>>>
>>>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done
>>>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization
>>>>>> and so the commands never get flushed out. Easily verified by sticking
>>>>>> PUSH_KICK's everywhere.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I do not believe that that is the problem, in my case it clearly
>>>>> seems to be a pitch / swizzle problem rather than a synchronization
>>>>> problem, here is what my desktop with gnome shell looks like when
>>>>> using DRI2:
>>>>>
>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg
>>>>>
>>>>> And this is what it looks like when using DRI3:
>>>>>
>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg
>>>>>
>>>>> The DRI2 screenshot is made with Mario's 2 patches on top of
>>>>> current master:
>>>>>
>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html
>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html
>>>>>
>>>>> And then adding Option "DRI" "2" to xorg.conf.
>>>>
>>>>
>>>>
>>>> His patches should have defaulted it to DRI 2 I think, so this is
>>>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3
>>>> with his patches.
>>>>    --
>>>>>
>>>>>
>>>>>
>>>>> I've also tried disabling EXA using Option "AccelMethod" "none",
>>>>> but that seems to also automatically disable all DRI, leading to
>>>>> software rendering.
>>>>>
>>>>> I discussed this with Ben this morning and he suggested that this
>>>>> is likely a Mesa issue since with DRI3 mesa rather than the ddx
>>>>> allocs the surfaces. I've tried disabling swizzling in the
>>>>> mesa code by forcing nv30_miptree_create() to always take
>>>>> the code path for linear textures, but that leads to the exact
>>>>> same result as before that change.
>>>>
>>>>
>>>>
>>>> Ah yes. Very different problem indeed. I actually suspect it has to do
>>>> with swizzling. Look at the white pattern of the moon -- it's all in a
>>>> line. That means that it expected some locality and instead it got
>>>> drawn all on a line. If it were merely a stride problem, I'd expect to
>>>> see strips of the moon below and offset from one another.
>>>>
>>>> So... take a look at nv30_miptree_from_handle -- I wonder if it can
>>>> now receive swizzled textures where it couldn't before.
>>>
>>>
>>>
>>> Ok, that does go in the direction I am expecting the problem to be,
>>> but I'm afraid I'm going to need a bit more guidance, what exactly
>>> am I looking for in that function / which "knobs" should I try to
>>> vary / play with to maybe fix this ?
>>
>>
>> Unfortunately this is playing near (or past) the limits of my
>> knowledge as well. My understanding is that DRI3 passes pixmaps around
>> with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism
>> which is not that (I think it just copies stuff around).
>>
>> Now on nv50+, bo's have "tile flags" (and memtype and probably other
>> annoyances). The tile flags indicate the specific tiling mechanism
>> used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look
>> at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a
>> "bo config" argument. This bo config can then later be retrieved using
>> some other syscall.
>>
>> However on nv30 there appears to not be any such thing. The
>> nouveau_bo_new call just passes in NULL for creating the bo, which
>> means that there's no way to recover the "are you swizzled"
>> information after-the-fact.
>>
>> Presumably you should create a "nv04" bo config section in the union,
>
>
> That already exists, and indeed gets set by the nouveau_allocate_surface
> function from src/nv_accel_common.c from the ddx,
>
>> and just pass the single "swizzled" bit through. I'm not sure what, if
>> anything, is required on the kernel side for that. I don't think
>> there's any optionality in how the swizzling is done for pre-nv50.
>>
>> Note that in the nv30_miptree logic, mt->swizzled implies that
>> mt->uniform_pitch = 0, but the level pitch is set "properly" (again,
>> see nv30_miptree_create).
>>
>> Hope this sheds some light and doesn't cause you to go in the wrong
>> direction -- please take everything I say with a grain of salt -- I'm
>> often a bit off on some of the details.
>
>
> Thanks this was helpful, I do feel we are getting somewhere, but I do
> need a bit more help.
>
> I've added some debug printf's to nv30_miptree.c, nv30_miptree_create
> and nv30_miptree_from_handle, where the latter is only used when using
> dri2 (i.e. in the working case).
>
> Doing a diff between a log of starting gnome-shell with dri2 vs dri3
> results in this:
>
> --- mesa.log.dri2       2015-08-10 14:18:03.182712022 +0200
> +++ mesa.log.dri3       2015-08-10 14:18:33.233336338 +0200
> @@ -1,8 +1,8 @@
>  nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
> @@ -12,6 +12,7 @@
>  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
> +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
> @@ -20,29 +21,24 @@
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
> -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0
> +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
>  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
> -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
> -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
> +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>  nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
>  nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>
> I've also added logging of the surface creation in the ddx, here
> is one such log line:
>
>  Surface alloc 1920x1080 at 32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags 2
>
> What stands out to me is:
>
> 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are
>  replaced by nv30_miptree_create calls

I bet it becomes the DDX that consumes these now though? And perhaps
it's not ready to take swizzled textures (i.e. pixmaps)?

>
> 2) These replacement calls use a different pitch for the 1920x... surfaces,
> 8192 vs 7680

Looks like it gets upgraded to a POT (power-of-two) value (i.e. 2048 x whatever).
nv30_miptree will create an unswizzled surface for a non-POT-sized
texture, but I guess something somewhere is aligning it up in the DDX?
And *still* creating it as a linear texture? Weird.
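
The numbers in the logs can be checked with a few lines of C. This is
only a sketch: the helper names are made up, 4 bytes per pixel is an
assumption (1920 * 4 = 7680 matches the dri3 log), and the 1024-byte
alignment is just a guess that happens to fit both the 8192 and the
1x1 / 1024 lines in the dri2 log. A plain POT round-up also yields 8192
for the 1920-wide surface, but it would give 4 rather than 1024 for the
1x1 one.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per row with no padding; cpp = bytes per pixel (assumed 4 here). */
uint32_t tight_pitch(uint32_t width, uint32_t cpp)
{
    return width * cpp;
}

/* Round v up to the next power of two (v >= 1). */
uint32_t round_up_pot(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Round v up to a multiple of align; align must be a power of two. */
uint32_t align_up(uint32_t v, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}
```

With these, tight_pitch(1920, 4) is 7680, and both round_up_pot(7680)
and align_up(7680, 1024) come out as 8192, so the 1920-wide surfaces
alone cannot tell the two hypotheses apart; the 1x1 surface can.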

>
> 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling
> nouveau_bo_new, whereas nv30_miptree_create passes in NULL

Right. That's the bit I was talking about earlier -- you need to pass
the information that it will be swizzled. Otherwise it's lost.
nv30_miptree should probably be setting those surf_flags as well on BO
creation. Apparently there are performance implications from setting
these.

>
> 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree,
> iow it always assumes non-swizzled ?

That's right.

>
> The thing I would like to investigate first is 2), it seems to me that
> nv30_miptree_create should not be doing pitch alignment itself, so if we
> want to have the same pitch for these surfaces as in the dri2 case, we
> need to fix this in the caller. Which leads to the question where is
> this code being called from in the dri3 case. And this is where I'm
> stuck and could use your or Benjamin's input. So do you know where the
> caller is and/or how to figure out where the caller is?

Well, from a quick look at the DDX, it looks like it never swizzles its
textures. See NV40EXAPictTexture for example -- it always puts
TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly
inefficient and silly.
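
To make the linear-vs-swizzled distinction concrete, here is a small
sketch of the two addressing schemes. It assumes that pre-nv50 swizzling
amounts to plain Z-order (Morton) bit interleaving for square POT
textures; that is an assumption, not something verified in this thread,
and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Element index in a row-major (linear) layout; pitch is in elements. */
uint32_t linear_index(uint32_t x, uint32_t y, uint32_t pitch)
{
    assert(x < pitch);
    return y * pitch + x;
}

/* Spread the low 16 bits of v so they occupy the even bit positions. */
uint32_t spread_bits(uint32_t v)
{
    v &= 0xffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Z-order index for a square POT texture: x bits even, y bits odd. */
uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Under that assumption a 2x2 block of texels sits at elements 0..3 of a
swizzled texture. Sampled as linear, those four elements are four
neighbours on a single row, which is the kind of "everything drawn on
one line" smearing visible in the bad screenshot.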

Instead of doing that, you could "guess" whether it's swizzled or not
based on whether the w/h are POT. If both are, then don't put that
linear thing in. It's obviously possible to create POT-sized
non-swizzled textures, but you can see the nv30_miptree logic for when
it does and when it doesn't. I'm only suggesting this for debugging,
not for "real". The real solution to this will be to include the
swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the
bo_config and have the kernel hold on to it.
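
As a rough sketch of that debugging guess (the function names are
hypothetical and appear in neither the DDX nor mesa, and the real
nv30_miptree_create logic has more conditions than just the size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
bool is_pot(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Debug-only heuristic: treat a surface as swizzled when both
 * dimensions are powers of two, otherwise treat it as linear.
 */
bool guess_swizzled(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return is_pot(width) && is_pot(height);
}
```

This agrees with the nv30_miptree_create lines in the log above: the
512x32, 256x256, 16x16 and 1x1 surfaces report uniform_pitch 0, while
48x48, 24x24, 18x18, 6x8 and 1920x1080 all get a non-zero pitch.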

  -ilia

