[PATCH 03/33] HACK: drm/omap: fix memory barrier bug in DMM driver

Tomi Valkeinen tomi.valkeinen at ti.com
Wed Feb 24 10:34:35 UTC 2016


On 23/02/16 23:13, Laurent Pinchart wrote:
> Hi Tomi,
> 
> Thank you for the patch.
> 
> On Friday 19 February 2016 11:47:38 Tomi Valkeinen wrote:
>> A DMM timeout "timed out waiting for done" has been observed on DRA7
>> devices. The timeout happens rarely, and only when the system is under
>> heavy load.
>>
>> Debugging showed that the timeout can be made to happen much more
>> frequently by optimizing the DMM driver, so that there's almost no code
>> between writing the last DMM descriptors to RAM and writing to the DMM
>> register that starts the DMM transaction.
>>
>> The current theory is that a wmb() does not properly ensure that the
>> data written to RAM is observable by all the components in the system.
>>
>> This DMM timeout has caused interesting (and rare) bugs as the error
>> handling was not functioning properly (the error handling has been fixed
>> in previous commits):
>>
>>  * If a DMM timeout happened when a GEM buffer was being pinned for
>>    display on the screen, a timeout error would be shown, but the driver
>>    would continue programming DSS HW with a broken buffer, leading to
>>    SYNCLOST floods and possible crashes.
>>
>>  * If a DMM timeout happened when another user (say, video decoder) was
>>    pinning a GEM buffer, a timeout would be shown but if the user
>>    handled the error properly, no other issues followed.
>>
>>  * If a DMM timeout happened when a GEM buffer was being released, the
>>    driver would not even notice the error, leading to crashes or hangs
>>    later.
>>
>> This patch adds wmb() and readl() calls after the last bit is written to
>> RAM, which should ensure that the execution proceeds only after the data
>> is actually in RAM, and thus observable by DMM.
>>
>> This patch is a HACK, as a read-back should not be needed. Further study
>> is required to understand whether DMM is somehow a special case where a
>> read-back is ok, or whether DRA7's memory barriers do not work correctly.
> 
> CONFIG_SOC_DRA7XX selects OMAP_INTERCONNECT and OMAP_INTERCONNECT_BARRIER, but
> dra7xx_map_io() doesn't call omap_barriers_init(). Could that be the root
> cause of the issue? I don't have access to a DRA7xx system, would you be able
> to test that?

No idea, but I did dig up discussions about this in my mailbox, and it
seems there's been some work done after I wrote this patch, in "Fix
OMAP4 barrier support" series last summer. I'm not sure if that's only
for OMAP4, though.

I'll drop this patch too from the series, and spend a bit more time on
it. This is again something that's a bit tricky to reproduce and test.

 Tomi
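
For readers following the thread, the "barrier plus read-back" ordering
described in the quoted commit message can be sketched roughly as below.
This is a userspace model only, not the driver code: struct demo_desc,
demo_start_reg, demo_flush() and demo_submit() are hypothetical names, the
descriptor layout is invented, and atomic_thread_fence() followed by a
volatile load merely stands in for the kernel's wmb() and readl(). It shows
the order of operations, not the hardware behaviour under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; NOT the real DMM descriptor layout. */
struct demo_desc {
	uint32_t next;	/* index of the next descriptor, 0 ends the chain */
	uint32_t data;
};

/* Hypothetical stand-in for the register that starts a transaction. */
static volatile uint32_t demo_start_reg;

/*
 * Userspace stand-in for "wmb(); readl(...);": a full fence, then a
 * volatile load of the last word written, so the load cannot be elided.
 */
static uint32_t demo_flush(const volatile uint32_t *last_word)
{
	atomic_thread_fence(memory_order_seq_cst);
	return *last_word;
}

/*
 * Fill n descriptors, apply the barrier plus read-back to the last write,
 * and only then "start" the engine. Returns the value that was read back.
 */
static uint32_t demo_submit(struct demo_desc *descs, size_t n, uint32_t seed)
{
	uint32_t seen;
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		descs[i].next = (i + 1 < n) ? (uint32_t)(i + 1) : 0;
		descs[i].data = seed + (uint32_t)i;
	}

	seen = demo_flush(&descs[n - 1].data);	/* barrier + read-back */
	demo_start_reg = 1;			/* start only afterwards */

	return seen;
}
```

The point of the ordering is that the "start" write is issued only after the
read of the last descriptor word has returned. Whether such a read-back is
actually required on DRA7, or whether it only hides a barrier setup problem,
is exactly the open question in this thread.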
