[PATCH 7/9] drm/omap: fix omap_crtc_flush() to handle the workqueue

Tomi Valkeinen tomi.valkeinen at ti.com
Tue Sep 9 00:07:46 PDT 2014


On 08/09/14 16:31, Daniel Vetter wrote:
> On Mon, Sep 08, 2014 at 04:03:18PM +0300, Tomi Valkeinen wrote:
>> On 03/09/14 17:27, Daniel Vetter wrote:
>>> On Wed, Sep 03, 2014 at 02:55:08PM +0300, Tomi Valkeinen wrote:
>>>> omap_crtc_flush() is used to wait for scheduled work to be done for the
>>>> given crtc. However, it's not quite right at the moment.
>>>>
>>>> omap_crtc_flush() does wait for work that is run via the vsync irq to be
>>>> done. However, work is also queued to the driver's priv->wq workqueue,
>>>> which is not handled by omap_crtc_flush().
>>>>
>>>> Improve omap_crtc_flush() to flush the workqueue so that work there will
>>>> be run.
>>>>
>>>> This fixes a race issue on module unload, where an unpin work may be on
>>>> the work queue, but does not get run before the drm core starts tearing the
>>>> driver down, leading to a WARN.
>>>>
>>>> Signed-off-by: Tomi Valkeinen <tomi.valkeinen at ti.com>
>>>
>>> I didn't really dig into details, but isn't that the same workqueue as
>>> used by the async modeset code? So the same deadlocks might happen ...
>>
>> Yes, we have just one workqueue in the driver.
>>
>> Hmm, deadlocks with what lock? The modeconfig or crtc->mutex? I don't
>> think they are locked at any place where omap_crtc_flush is called.
> 
> Oh, I presumed you're using _flush in the relevant modeset functions - we

No. That's the locking issue again. We can't flush while holding the crtc
mutex, as the work items on the workqueue also try to grab it...
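To make the deadlock concrete, here's a rough sketch (hypothetical names
only, not the actual omapdrm code; "unpin_work_func", "crtc_mutex" and
"priv_wq" just stand in for the real work function, crtc->mutex and
priv->wq):

#include <linux/mutex.h>
#include <linux/workqueue.h>

static struct mutex crtc_mutex;                 /* stands in for crtc->mutex */
static struct workqueue_struct *priv_wq;        /* stands in for priv->wq */

static void unpin_work_func(struct work_struct *work)
{
        mutex_lock(&crtc_mutex);        /* the queued work needs the crtc lock */
        /* ... unpin the old framebuffer ... */
        mutex_unlock(&crtc_mutex);
}

static void broken_flush(void)
{
        mutex_lock(&crtc_mutex);
        /*
         * Deadlock: flush_workqueue() waits for unpin_work_func() to
         * finish, but unpin_work_func() is blocked on crtc_mutex,
         * which we are holding right here.
         */
        flush_workqueue(priv_wq);
        mutex_unlock(&crtc_mutex);
}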

> do that in i915 to make sure that all the pageflips and other stuff
> completed before we do another modeset. But omap only calls this at driver
> unload, so no direct problem.

At the moment, yes, but in this series I add the same omap_crtc_flush()
call in two new places: dev_preclose and omap_crtc_commit. Of those,
omap_crtc_commit is the problematic one, as discussed in the mail thread
for patch 4.

>>> lockdep won't complain though since you essentially open-code a
>>> workqueue_flush, and lockdep also doesn't complain about all possible
>>> deadlocks (due to some design issues with lockdep).
>>
>> What do you mean by "open-code a workqueue_flush"? I use flush_workqueue
>> there. We have two things to wait for: work on the workqueue and work
>> which is triggered by the vsync irq. So we loop and test for both of
>> those, until there's no more work.
> 
> Oops, missed that. Ordering looks wrong though since if the irq can latch
> the workqueue you need to wait for irqs to happen first before flushing.
> And obviously queue the work before signalling the completion of the
> interrupt. But since this seems to lack locking anyway and is only used
> for unload it doesn't really matter.

Yeah, well, the workqueue can create work for the irq also. I don't know
if it currently does, but I think it's safer to presume that both the
workqueue and the irq can create work for the other.

But that's why I have a loop there. We flush, then check if there is
work pending for the irq. If yes, we sleep a bit and go back to the
start. So if the irq work created new work for the wq, we flush that.
And if that work created new work for the irq, we check that again. Etc.
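In rough C, the loop looks something like this (a sketch only;
"irq_work_pending()" is a made-up placeholder for whatever check the
driver uses to see if vsync-driven work is still outstanding, and
"priv_wq" stands in for priv->wq):

#include <linux/delay.h>
#include <linux/types.h>
#include <linux/workqueue.h>

static struct workqueue_struct *priv_wq;        /* stands in for priv->wq */

/* placeholder: "is there still work waiting on the vsync irq?" */
static bool irq_work_pending(void)
{
        return false;
}

static void crtc_flush_sketch(void)
{
        for (;;) {
                /* run everything currently queued on the driver workqueue */
                flush_workqueue(priv_wq);

                /* nothing left for the vsync irq to do -> we're done */
                if (!irq_work_pending())
                        break;

                /*
                 * Wait a bit for the next vsync; its handler may queue
                 * new work on priv_wq, which the next iteration flushes.
                 */
                msleep(20);
        }
}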

 Tomi

