drm-next-misc merge breaks vmwgfx

Thu Apr 6 14:46:14 UTC 2017

On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>> Hi Thomas,
>> Bisected an offender already? Afaik there's no one else who reported
>> issues thus far, and for our own CI it seems all still fine.
>> -Daniel
> Hi, Daniel,
> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
> to be
> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
> Reverting first 1fa4da04 and then
> 38b6441e
> fixes the problem.

Yeah, we seem to have a solid functional conflict between the vmwgfx
atomic conversion, and the changes in drm-misc-next. Preliminary
analysis, but I think what's going on is:
- With the above changes in -misc we punt the deadlock retry loop to
the callers of ->set_config.
- But since fixing every caller would have been way too invasive, I
only fixed up the atomic callers (in most places we have special paths
for atomic and non-atomic due to slightly different semantics), which
means that legacy functions in some cases pass a NULL ctx down to
->set_config. Since legacy paths only get called on legacy drivers,
that's no problem.
- Well, except I did that audit before vmwgfx became atomic, so the
audit is now wrong, and I forgot to properly re-audit when the
conflicts happened all around. But since I half-expected to hit a
driver in mid-conversion with this, I did sprinkle
WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
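
To make the three points above concrete, here is a rough userspace
sketch of the pattern, not the actual kernel code. Every name in it
(mock_device, acquire_ctx, mock_set_config, atomic_caller,
legacy_caller) is made up for illustration, and the -EINVAL returned
for a NULL ctx on an atomic driver is only a stand-in, since the real
failure mode isn't spelled out here. The idea it tries to show: the
callback no longer retries on its own, the atomic caller owns the
-EDEADLK retry loop, and the legacy caller passes NULL and warns if
the driver turns out to be atomic.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a lock acquire context. */
struct acquire_ctx {
	int contended; /* simulated lock collisions still pending */
	int backoffs;  /* how often the caller had to back off */
};

/* Hypothetical stand-in for a device and its driver type. */
struct mock_device {
	int uses_atomic; /* nonzero if the driver is atomic */
	int warnings;    /* number of warnings triggered */
};

/* Mock ->set_config: no retry loop inside, -EDEADLK goes to the caller. */
static int mock_set_config(struct mock_device *dev, struct acquire_ctx *ctx)
{
	if (ctx == NULL)
		/* Assumption: model "atomic driver without ctx" as an error. */
		return dev->uses_atomic ? -EINVAL : 0;

	if (ctx->contended > 0) {
		ctx->contended--;
		return -EDEADLK;
	}
	return 0;
}

/* Atomic path: the caller owns the deadlock retry loop. */
static int atomic_caller(struct mock_device *dev, struct acquire_ctx *ctx)
{
	int ret;

	for (;;) {
		ret = mock_set_config(dev, ctx);
		if (ret != -EDEADLK)
			return ret;
		ctx->backoffs++; /* stand-in for dropping locks and retrying */
	}
}

/* Legacy path: passes a NULL ctx, and warns if the driver is atomic. */
static int legacy_caller(struct mock_device *dev)
{
	if (dev->uses_atomic) {
		dev->warnings++;
		fprintf(stderr, "WARN: legacy path hit on atomic driver\n");
	}
	return mock_set_config(dev, NULL);
}
```

With a context that simulates two collisions, atomic_caller() backs
off twice and then succeeds. Calling legacy_caller() on a device
marked atomic bumps the warning counter, which is the analogue of the
backtraces mentioned below.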

So assuming this is correct, you should see a pile of WARN_ON
backtraces in the atomic-vmwgfx+drm-misc-next combo. The proper fix
would be to switch over to atomic primitives for all these cases. On a
quick look I see some in the vmwgfx fbdev emulation code; it might even
be worth checking whether we could reuse the core helpers (which
already do this split handling) in some cases.

Cheers, Daniel
