[Nouveau] "enable dri3 support without glamor" causes gnome-shell regression on nv4x

Hans de Goede hdegoede at redhat.com
Tue Aug 11 06:13:30 PDT 2015


Hi,

On 11-08-15 14:21, Ilia Mirkin wrote:
> On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>>
>> On 03-08-15 20:09, Ilia Mirkin wrote:
>>>
>>> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 03-08-15 17:36, Ilia Mirkin wrote:
>>>>>
>>>>>
>>>>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdegoede at redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 30-07-15 16:09, Ilia Mirkin wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> FWIW this is a fail on nv50+ as well. See for example
>>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445
>>>>>>>
>>>>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done
>>>>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization
>>>>>>> and so the commands never get flushed out. Easily verified by sticking
>>>>>>> PUSH_KICK's everywhere.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I do not believe that that is the problem, in my case it clearly
>>>>>> seems to be a pitch / swizzle problem rather then a synchronizarion
>>>>>> problem, here is what my desktop with gnome shell looks like when
>>>>>> using DRI2:
>>>>>>
>>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg
>>>>>>
>>>>>> And this is what it looks like when using DRI3:
>>>>>>
>>>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg
>>>>>>
>>>>>> The DRI2 screenshot is made with Mario's 2 patches on top of
>>>>>> current master:
>>>>>>
>>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html
>>>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html
>>>>>>
>>>>>> And then adding Option "DRI" "2" to xorg.conf.
>>>>>
>>>>>
>>>>>
>>>>> His patches should have defaulted it to DRI 2 I think, so this is
>>>>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3
>>>>> with his patches.
>>>>>     --
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've also tried disabling EXA using Option "AccelMethod" "none",
>>>>>> but that seems to also automatically disable all DRI, leading to
>>>>>> software rendering.
>>>>>>
>>>>>> I discussed this with Ben this morning and he suggested that this
>>>>>> is likely a Mesa issue since with DRI3 mesa rather then the ddx
>>>>>> allocs the surfaces. I've tried disabling swizzling in the
>>>>>> mesa code by forcing nv30_miptree_create() to always take
>>>>>> the code path for linear textures, but that leads to the exact
>>>>>> same result as before that change.
>>>>>
>>>>>
>>>>>
>>>>> Ah yes. Very different problem indeed. I actually suspect it has to do
>>>>> with swizzling. Look at the white pattern of the moon -- it's all in a
>>>>> line. That means that it expected some locality and instead it got
>>>>> drawn all on a line. If it were merely a stride problem, I'd expect to
>>>>> see strips of the moon below and offset from one another.
>>>>>
>>>>> So... take a look at nv30_miptree_from_handle -- I wonder if it can
>>>>> now receive swizzled textures where it couldn't before.
>>>>
>>>>
>>>>
>>>> Ok, that does go in the direction I am expecting the problem to be,
>>>> but I'm afraid I'm going to need a bit more guidance, what exactly
>>>> am I looking for in that function / which "knobs" should I try to
>>>> vary / play with to maybe fix this ?
>>>
>>>
>>> Unfortunately this is playing near (or past) the limits of my
>>> knowledge as well. My understanding is that DRI3 passes pixmaps around
>>> with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism
>>> which is not that (I think it just copies stuff around).
>>>
>>> Now on nv50+, bo's have "tile flags" (and memtype and probably other
>>> annoyances). The tile flags indicate the specific tiling mechanism
>>> used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look
>>> at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a
>>> "bo config" argument. This bo config can then later be retrieved using
>>> some other syscall.
>>>
>>> However on nv30 there appears to not be any such thing. The
>>> nouveau_bo_new call just passes in NULL for creating the bo, which
>>> means that there's no way to recover the "are you swizzled"
>>> information after-the-fact.
>>>
>>> Presumably you should create a "nv04" bo config section in the union,
>>
>>
>> That already exists, and indeed gets set by the nouveau_allocate_surface
>> function from src/nv_accel_common.c from the ddx,
>>
>>> and just pass the single "swizzled" bit through. I'm not sure what, if
>>> anything, is required on the kernel side for that. I don't think
>>> there's any optionality in how the swizzling is done for pre-nv50.
>>>
>>> Note that in the nv30_miptree logic, mt->swizzled implies that
>>> mt->uniform_pitch = 0, but the level pitch is set "properly" (again,
>>> see nv30_miptree_create).
>>>
>>> Hope this sheds some light and doesn't cause you to go in the wrong
>>> direction -- please take everything I say with a grain of salt -- I'm
>>> often a bit off on some of the details.
>>
>>
>> Thanks this was helpful, I do feel we are getting somewhere, but I do
>> need a bit more help.
>>
>> I've added some debug printf's to nv30_miptree.c, nv30_miptree_create
>> and nv30_miptree_from_handle, where the latter is only used when using
>> dri2 (e.g. in the working case).
>>
>> Doing a diff between a log of starting gnome-shell with dri vs dri3
>> results in this:
>>
>> --- mesa.log.dri2       2015-08-10 14:18:03.182712022 +0200
>> +++ mesa.log.dri3       2015-08-10 14:18:33.233336338 +0200
>> @@ -1,8 +1,8 @@
>>   nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
>> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
>> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
>> -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
>> +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>> @@ -12,6 +12,7 @@
>>   nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
>> +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>> @@ -20,29 +21,24 @@
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>> -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0
>> +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>>   nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
>>   nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>> -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>> -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>>   nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>>   nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>> +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
>> +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>>   nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>>   nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>>   nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>>   nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
>>   nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
>>   nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
>>   nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>>   nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
>> -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
>>
>> I've also added logging of the surface creation in the ddx, here
>> is one such log line:
>>
>>   Surface alloc 1920x1080 at 32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags
>> 2
>>
>> What stands out to me is:
>>
>> 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are
>>   replaced by nv30_miptree_create calls
>
> I bet it becomes the DDX that consumes these now though? And perhaps
> it's not ready to take swizzled textures (i.e. pixmaps)?

Correct, it turns out these are scanout buffers, see below for a fix.

>> 2) These replacement calls use a different pitch for the 1920x... surfaces,
>> 8192 vs 7680
>
> Looks like it gets upgraded to a POT value (i.e. 2048 x whatever).
> nv30_miptree will create an unswizzled surface for a non-POT-sized
> texture, but I guess something somewhere is aligning it up in the DDX?
> And *still* creating it as a linear texture? Weird.

Correct, both on the aligning and on them still being linear, maybe
the scanout buffers always need to be linear on nv3x / nv4x ?

>> 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling
>> nouveau_bo_new, where as nv30_miptree_create passes in NULL
>
> Right. That's the bit I was talking about earlier -- you need to pass
> the information that it will be swizzled. Otherwise it's lost in the
> nv30_miptree should probably be setting those surf_flags as well on BO
> creation. Apparently there are performance implications from setting
> these.

OK, things do seem to work fine with both dri2 / dri3 without the nv30_miptree
code passing these in. But if you think it is useful I can write a patch
setting these in the non-linear case, and see what happens.

>> 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree,
>> iow
>> it always assumes non-swizzled ?
>
> That's right.

Which works because the ddx is always creating linear buffers, right ?

>> The thing I would like to investigate first is 2), it seems to me that
>> nv30_miptree_create should not be doing pitch alignment itself, so if we
>> want to have the same pitch for these surfaces as in the dri2 case, we
>> need to fix this in the caller. Which leads to the question where is
>> this code being called from in the dri3 case. And this is where I'm
>> stuck and could use your or Benjamin's input. So do you know where the
>> caller is and/or how to figure out where the caller is?
>
> Well, a quick look at the DDX, looks like it never swizzles its
> textures. See NV40EXAPictTexture for example -- it always puts
> TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly
> inefficient and silly.
>
> Instead of doing that, you could "guess" whether it's swizzled or not
> based on whether the w/h are POT. If both are, then don't put that
> linear thing in. It's obviously possible to create POT-sized
> non-swizzled textures, but you can see the nv30_miptree logic for when
> it does and when it doesn't. I'm only suggesting this for debugging,
> not for "real". The real solution to this will be to include the
> swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the
> bo_config and have the kernel hold on to it.

This assumes that it is useful / possible to use swizzled buffers
in combination with the scanout engine on nv3x/nv4x ...

Also given the age of the hardware I wonder whether it is worth the
trouble to do this.

###

Thanks for all the info, I've been working on this myself today, and
I've come up with the following fix for the immediate problem at
hand:

--- a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
@@ -28,6 +28,7 @@
  #include "util/u_surface.h"

  #include "nv_m2mf.xml.h"
+#include "nv_object.xml.h"
  #include "nv30/nv30_screen.h"
  #include "nv30/nv30_context.h"
  #include "nv30/nv30_resource.h"
@@ -362,6 +363,7 @@ nv30_miptree_create(struct pipe_screen *pscreen,
     blocksz = util_format_get_blocksize(pt->format);

     if ((pt->target == PIPE_TEXTURE_RECT) ||
+       (pt->bind & PIPE_BIND_SCANOUT) ||
         !util_is_power_of_two(pt->width0) ||
         !util_is_power_of_two(pt->height0) ||
         !util_is_power_of_two(pt->depth0) ||
@@ -369,6 +371,14 @@ nv30_miptree_create(struct pipe_screen *pscreen,
         util_format_is_float(pt->format) || mt->ms_mode) {
        mt->uniform_pitch = util_format_get_nblocksx(pt->format, w) * blocksz;
        mt->uniform_pitch = align(mt->uniform_pitch, 64);
+      if (pt->bind & PIPE_BIND_SCANOUT) {
+         struct nv30_screen *screen = nv30_screen(pscreen);
+         /* This assumes a tiled framebuffer */
+         int pitch_align = MAX2(
+               screen->eng3d->oclass >= NV40_3D_CLASS ? 1024 : 256,
+               /* round_down_pow2(mt->uniform_pitch / 4) */
+               1 << (util_last_bit(mt->uniform_pitch / 4) - 1));
+         mt->uniform_pitch = align(mt->uniform_pitch, pitch_align);
+      }
     }

     if (!mt->uniform_pitch)


What I believe is happening is that with dri3 the buffer which becomes
the buffer scanned-out by the hardware is allocated by a dri3 client,
rather then by the x-server/ddx, but the actual video-mode set is still
done by the ddx, so these scanout buffers must have a layout matching
the one expected by the video-mode as setup by the ddx.

This patch seems to fix the immediate problem at hand, it has one problem
though, that it assumes that pNv->tiled_layout in the ddx always is true,
I see no way to get to pNv->tiled_layout from the mesa code, so I think
that it may be best to fix this problem by simply always making
pNv->tiled_layout true when dri3 is active in the ddx.

Regards,

Hans


More information about the Nouveau mailing list