[PATCH v14 11/41] compositor-drm: Move repaint state application to flush

Wed Jan 17 15:18:19 UTC 2018

Hi,

On 17 January 2018 at 13:58, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> this is still a tricky one, sorry.
>
> I looked at the details here hard and long, and then came to a
> revelation: is it at all possible to ever call drm_set_dpms() between
> repaint_begin and repaint_flush/cancel?
>
> I don't think it is and that would make many of my comments below just
> moot.

Yes, it is: when blanking the screen. Immediately after the output
repaint, we trigger any animation frame handlers. The fade-then-blank
animation will thus call output->set_dpms(WESTON_DPMS_OFF) immediately
after output->repaint() has completed, but before output->repaint().
If we just naïvely move repaint state application to repaint_flush()
without touching DPMS (as I'd done originally, with DPMS moved in a
follow-on patch), we turn the output off from drm_set_dpms(), and then
just turn it straight back on with a blank framebuffer in
drm_repaint_flush(). This patch is the result of trying to avoid
intermediate breakage by squashing the two patches together, as I
couldn't find a non-fragile way to do it separately.

We could avoid this issue by reworking the core animation API, giving
it a hook to be called on the next weston_output_frame_finish(), which
would give it the opportunity to disable the output and kill the
repaint cycle. That would require more work to do that and also
re-disentangle this patch, but it is doable.

As said on IRC, there's a second issue which bites us when we want to
use the atomic API. We use drmModeSetCrtc() in anything but
steady-state repaints: when we want to enable an output, change its
mode (or just its framebuffer's pitch), or disable it.
drmModeSetCrtc() synchronises in both directions, by waiting for any
previously-scheduled operations (e.g. pageflips) to complete, and also
blocking until it has completed itself. (Michel Dänzer points out the
AMD driver doesn't do this, so just how DPMS behaves on non-Intel
drivers, I have no real clue.)

Atomic trips up by demanding that we do our own synchronisation: if we
have a non-blocking operation still in flight, it's our responsibility
to not issue more commits until they have completed. One option to
circumvent this is what I've done here.

Another is to have a cut-out where we only ever use drmModeSetCrtc()
when disabling outputs (hoping that it synchronises as the legacy one
did; maybe that needs IGT), and enforce that output->set_dpms() is
never called from within the repaint loop.

The third is to continue to use the atomic API to set DPMS where
applicable, but never call weston_output_finish_frame from
drm_output_update_complete, and always do a blocking commit such that
there are no post-DPMS synchronisation issues (e.g. immediately
re-enabling the output). This still needs the dpms_off_pending flag
though, so we can avoid the inverse race, where we schedule a
non-blocking pageflip and then immediately issue DPMS-off before the
pageflip has had a chance to complete. (Related musing: I would like
the forest of int *_pending to become enum weston_drm_state
desired_state. Maybe that would make the desired core->backend
output-configuration API which also encapsulates things like
enabled/disabled and modes, easier to implement. But anyway.)

>> @@ -1325,6 +1355,21 @@ drm_output_update_complete(struct drm_output *output, uint32_t flags,
>>       } else if (output->disable_pending) {
>>               weston_output_disable(&output->base);
>>               output->disable_pending = 0;
>> +             output->dpms_off_pending = 0;
>> +             return;
>> +     } else if (output->dpms_off_pending) {
>> +             struct drm_pending_state *pending = drm_pending_state_alloc(b);
>> +             drm_output_get_disable_state(pending, output);
>> +             drm_pending_state_apply(pending);
>
> See the question about drm_pending_state_apply vs. apply_sync() far
> down below.

Should be apply_sync(), yes.

>> +             output->dpms_off_pending = 0;
>> +             return;
>> +     } else if (output->state_cur->dpms == WESTON_DPMS_OFF &&
>> +                output->base.repaint_status != REPAINT_AWAITING_COMPLETION) {
>> +             /* DPMS can happen to us either in the middle of a repaint
>> +              * cycle (when we have painted fresh content, only to throw it
>> +              * away for DPMS off), or at any other random point. If the
>> +              * latter is true, then we cannot go through finish_frame,
>> +              * because the repaint machinery does not expect this. */
>
> When turning an output off, there should be no pageflip event, so how
> can this ever be reached?
>
> drm_output_apply_state() on the DPMS-off path does not schedule a
> pageflip event.

Good catch, you're completely correct. I need to dig through to find
out what happened here: the intention was to call
drm_output_update_complete(), but that's clearly not what happens
(drm_output_apply_state calls weston_output_finish_frame directly,
which shouldn't happen!). The straight fix here would be to remove
some of the open-coding from drm_output_apply_state, and instead call
drm_output_update_complete.

>> @@ -1728,10 +1755,45 @@ drm_output_repaint(struct weston_output *output_base,
>>               output->cursor_plane->base.y = INT32_MIN;
>>       }
>>
>> -     drm_output_render(state, damage);
>> -     scanout_state = drm_output_state_get_plane(state, scanout_plane);
>> -     if (!scanout_state || !scanout_state->fb)
>> -             goto err;
>> +     if (state->dpms != WESTON_DPMS_ON) {
>> +             wl_list_for_each(ps, &state->plane_list, link) {
>> +                     p = ps->plane;
>> +                     assert(ps->fb == NULL);
>> +                     assert(ps->output == NULL);
>> +
>> +                     if (p->type != WDRM_PLANE_TYPE_OVERLAY)
>> +                             continue;
>> +
>> +
>> +                     ret = drmModeSetPlane(backend->drm.fd, p->plane_id,
>> +                                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
>> +                     if (ret)
>> +                             weston_log("drmModeSetPlane failed disable: %m\n");
>> +             }
>> +
>> +             if (output->cursor_plane) {
>> +                     ret = drmModeSetCursor(backend->drm.fd, output->crtc_id,
>> +                                            0, 0, 0);
>> +                     if (ret)
>> +                             weston_log("drmModeSetCursor failed disable: %m\n");
>> +             }
>> +
>> +             ret = drmModeSetCrtc(backend->drm.fd, output->crtc_id, 0, 0, 0,
>> +                                  &output->connector_id, 0, NULL);
>> +             if (ret)
>> +                     weston_log("drmModeSetCrtc failed disabling: %m\n");
>> +
>> +             drm_output_assign_state(state, DRM_STATE_APPLY_SYNC);
>> +             weston_compositor_read_presentation_clock(output->base.compositor, &now);
>> +             weston_output_finish_frame(&output->base,
>> +                                        &now,
>> +                                        WP_PRESENTATION_FEEDBACK_KIND_HW_COMPLETION);
>
> See the weston_output_finish_frame() comment further down below.
>
> finish_frame here is necessary if and only if drm_set_dpms(off) was
> called between repaint_begin and repaint_flush... but that's not
> possible, is it?

As above, it is. However, it is a bug to call it _unless_ it is: the
repaint machinery will assert if we call finish_frame when we're
outside repaint. This is what the guard in
drm_output_update_complete() prevents, and why we should call that
here instead ...

>> @@ -3058,9 +3229,6 @@ drm_output_find_special_plane(struct drm_backend *b, struct drm_output *output,
>>  static void
>>  drm_plane_destroy(struct drm_plane *plane)
>>  {
>> -     if (plane->plane_id > 0)
>> -             drmModeSetPlane(plane->backend->drm.fd, plane->plane_id,
>> -                             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
>
> Did you have a rationale for this change? I still found it surprising,
> and for v12 I wrote:
>
> : This was a slightly surprising piece in this patch. Is the rationale
> : here that Weston should not reset CRTC state on exit? Shouldn't that
> : wait for the "compositor-drm: Don't restore original CRTC mode" patch?
> :
> : Feels like this hunk doesn't belong in this patch.
> :
> : I suppose this might leave an overlay up, but since we have overlays
> : disabled by default, it shouldn't cause issues.

I could certainly pull it out into a separate change, yeah.

>> +             /* The repaint loop already sets DPMS on; we don't need to
>> +              * explicitly set it on here, as it will already happen
>> +              * whilst applying the repaint state. */
>> +             if (level == WESTON_DPMS_ON)
>> +                     return;
>> +
>> +             state = drm_pending_state_get_output(pending_state, output);
>> +             if (state)
>> +                     drm_output_state_free(state);
>> +             state = drm_output_get_disable_state(pending_state, output);
>
> We set the disable state here, then:
> drm_repaint_flush() -> drm_pending_state_apply() ->
> drm_output_apply_state() ->
> hit the "state->dpms != WESTON_DPMS_ON" path which will call
> weston_output_finish_frame() unconditionally.
>
> Would that not lead to a repaint state machine assertion violation,
> which Michael Tretter was complaining about?

Yes, absolutely: that would explain it. So, this should be
drm_output_update_complete() and things (should) work as expected. But
that needs more testing, and also I guess you have reservations about
it.

>> +             state->dpms = level;
>
> get_disable_state() already sets state->dpms.

Good catch.

>> +     /* If we've already got a request in the pipeline, then we need to
>> +      * park our DPMS request until that request has quiesced. */
>> +     if (output->state_last) {
>> +             output->dpms_off_pending = 1;
>> +             return;
>> +     }
>> +
>> +     pending_state = drm_pending_state_alloc(b);
>> +     drm_output_get_disable_state(pending_state, output);
>> +     ret = drm_pending_state_apply(pending_state);
>
> Should this not be calling drm_pending_state_apply_sync() instead?
>
> It's a little confusing as it does not matter whether one calls apply
> or apply_sync, it's the DPMS state that determines whether it will be
> sync or not.
>
> Otherwise this code hints to me that it's an async apply.

Yes, it should be sync for DPMS off, for consistency.

Thanks for the thorough review! I'll rethink this, to see if I can
make it more consistent, much more clear, and also less error-prone.

Cheers,
Daniel