[git pull] drm for 5.8-rc1

Thu Jul 2 21:14:01 UTC 2020

On 7/2/20 1:22 AM, Daniel Stone wrote:
> Hi,
> 
> On Wed, 1 Jul 2020 at 20:45, James Jones <jajones at nvidia.com> wrote:
>> OK, I think I see what's going on.  In the Xorg modesetting driver, the
>> logic is basically:
>>
>> if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
>>     drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
>> } else {
>>     drmModeAddFB(...);
>> }
> 
> I read this thread expecting to explain the correct behaviour we
> implement in Weston and how modesetting needs to be fixed, but ...
> that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we
> used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'.

Yes, the hazards of reporting findings before verifying.  I now see 
modesetting does query the DRM-KMS modifiers and attempt to allocate 
with them if it found any.  However, I still see a lot of ways things 
can go wrong, but I'm not going to share my speculation again until I've 
actually verified it, which is taking a frustratingly long time.  The 
modesetting driver is not my friend right now.

>> There's no attempt to verify the DRM-KMS device supports the modifier,
>> but then, why would there be?  GBM presumably chose a supported modifier
>> at buffer creation time, and we don't know which plane the FB is going
>> to be used with yet.  GBM doesn't actually ask the kernel which
>> modifiers it supports here either though.
> 
> Right, it doesn't ask, because userspace tells it which modifiers to
> use. The correct behaviour is to take the list from the KMS
> `IN_FORMATS` property and then pass that to
> `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from
> that list and only that list. If that call does not succeed and Xorg
> falls back to `gbm_surface_create`, then it must not call
> `gbm_bo_get_modifier` - so that would be a modesetting bug. If that
> call does succeed and `gbm_bo_get_modifier` subsequently reports a
> modifier which was not in the list, that's a Mesa driver bug.
> 
>> It just goes into Mesa via
>> DRI and reports the modifier (unpatched) Mesa chose on its own.  Mesa
>> just hard-codes the modifiers in its driver backends since its thinking
>> in terms of a device's 3D engine, not display.  In theory, Mesa's DRI
>> drivers could query KMS for supported modifiers if allocating from GBM
>> using the non-modifiers path and the SCANOUT flag is set (perhaps some
>> drivers do this or its equivalent?  Haven't checked.), but that seems
>> pretty gnarly and doesn't fix the modifier-based GBM allocation path
>> AFAIK.  Bit of a mess.
> 
> Two options for GBM users:
> * call gbm_*_create_with_modifiers, it succeeds, call
> gbm_bo_get_modifier, pass modifier into AddFB
> * call gbm_*_create (without modifiers), it succeeds, do not call
> gbm_bo_get_modifier, do not pass a modifier into AddFB
> 
> Anything else is a bug in the user. Note that falling back from 1 to 2
> is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back
> to the non-modifier path, provided you don't later try to get a
> modifier back out.
>
>> For a quick userspace fix that could probably be pushed out everywhere
>> (Only affects Xorg server 1.20+ AFAIK), just retrying
>> drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on
>> failure should be sufficient.
> 
> This would break other drivers.

I think this could be done in a way that wouldn't, though it wouldn't be 
quite as simple.  Let's see what the true root cause is first though.

>> Still need to verify as I'm having
>> trouble wrangling my Xorg build at the moment and I'm pressed for time.
>> A more complete fix would be quite involved, as modesetting isn't really
>> properly plumbed to validate GBM's modifiers against KMS planes, and it
>> doesn't seem like GBM/Mesa/DRI should be responsible for this as noted
>> above given the general modifier workflow/design.
>>
>> Most importantly, options I've considered for fixing from the kernel side:
>>
>> -Accept "legacy" modifiers in nouveau in addition to the new modifiers,
>> though avoid reporting them to userspace as supported to avoid further
>> proliferation.  This is pretty straightforward.  I'll need to modify
>> both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set
>> plane validation logic (nv50_plane_format_mod_supported), but it should
>> end up just being a few lines of code.
> 
> I do think that they should also be reported to userspace if they are
> accepted. Other users can and do look at the modifier list to see if
> the buffer is acceptable for a given plane, so the consistency is good
> here. Of course, in Mesa you would want to prioritise the new
> modifiers over the legacy ones, and not allocate or return the legacy
> ones unless that was all you were asked for. This would involve
> tracking the used modifier explicitly through Mesa, rather than
> throwing it away at alloc time and then later divining it from the
> tiling mode.

Reporting them as supported is equivalent to reporting support for a 
memory layout the chips don't actually support (It corresponds to a 
valid layout on Tegra chips, but not on discrete NV chips).  This is 
what the new modifiers are trying to avoid in the first place: Implying 
buffers can be shared between these Tegra chips and discrete NV GPUs.

Thanks,
-James

> Cheers,
> Daniel
>