[PATCH 03/33] HACK: drm/omap: fix memory barrier bug in DMM driver
Tomi Valkeinen
tomi.valkeinen at ti.com
Wed Feb 24 10:34:35 UTC 2016
On 23/02/16 23:13, Laurent Pinchart wrote:
> Hi Tomi,
>
> Thank you for the patch.
>
> On Friday 19 February 2016 11:47:38 Tomi Valkeinen wrote:
>> A DMM timeout "timed out waiting for done" has been observed on DRA7
>> devices. The timeout happens rarely, and only when the system is under
>> heavy load.
>>
>> Debugging showed that the timeout can be made to happen much more
>> frequently by optimizing the DMM driver, so that there's almost no code
>> between writing the last DMM descriptors to RAM and writing to the DMM
>> register that starts the DMM transaction.
>>
>> The current theory is that a wmb() does not properly ensure that the
>> data written to RAM is observable by all the components in the system.
>>
>> This DMM timeout has caused interesting (and rare) bugs as the error
>> handling was not functioning properly (the error handling has been fixed
>> in previous commits):
>>
>> * If a DMM timeout happened when a GEM buffer was being pinned for
>> display on the screen, a timeout error would be shown, but the driver
>> would continue programming DSS HW with a broken buffer, leading to
>> SYNCLOST floods and possible crashes.
>>
>> * If a DMM timeout happened when another user (say, a video decoder) was
>> pinning a GEM buffer, a timeout would be shown but if the user
>> handled the error properly, no other issues followed.
>>
>> * If a DMM timeout happened when a GEM buffer was being released, the
>> driver does not even notice the error, leading to crashes or hangs
>> later.
>>
>> This patch adds wmb() and readl() calls after the last bit is written to
>> RAM, which should ensure that the execution proceeds only after the data
>> is actually in RAM, and thus observable by DMM.
>>
>> This patch is a HACK, as a read-back should not be needed. Further study
>> is required to understand whether the DMM is somehow a special case where
>> a read-back is ok, or whether DRA7's memory barriers do not work correctly.
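The ordering described in the patch text (write descriptors, wmb(), read back, then start the transaction) can be illustrated with a small userspace sketch. This is not the driver code: struct fake_dmm, fake_wmb(), fake_readl() and fake_dmm_submit() are hypothetical stand-ins invented for illustration, and a C11 fence only approximates what the kernel's wmb() does on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins: descriptor slots in RAM and a "kick" register. */
struct fake_dmm {
    volatile uint32_t desc[4];   /* descriptors the engine would fetch from RAM */
    volatile uint32_t kick_reg;  /* writing here would start the transaction */
};

/* Userspace stand-in for the kernel's wmb(); an assumption, not the real barrier. */
static inline void fake_wmb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Userspace stand-in for readl(): a volatile load the compiler cannot elide. */
static inline uint32_t fake_readl(const volatile uint32_t *addr)
{
    return *addr;
}

/*
 * Write n descriptors, then barrier plus read-back of the last one, and only
 * then write the kick register. Returns the value that was read back.
 */
static uint32_t fake_dmm_submit(struct fake_dmm *dmm, const uint32_t *src,
                                unsigned int n)
{
    uint32_t readback;
    unsigned int i;

    assert(dmm && src && n > 0 && n <= 4);

    for (i = 0; i < n; i++)
        dmm->desc[i] = src[i];

    fake_wmb();                                /* order the descriptor stores */
    readback = fake_readl(&dmm->desc[n - 1]);  /* the read-back the HACK adds */

    dmm->kick_reg = 1;                         /* start the transaction last */
    return readback;
}
```

In the sketch the read-back is the only step that would not be expected in a driver whose barriers work as documented, which is why the patch calls itself a HACK.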
>
> CONFIG_SOC_DRA7XX selects OMAP_INTERCONNECT and OMAP_INTERCONNECT_BARRIER, but
> dra7xx_map_io() doesn't call omap_barriers_init(). Could that be the root
> cause of the issue? I don't have access to a DRA7xx system; would you be able
> to test that?

No idea, but I did dig up discussions about this in my mailbox, and it
seems there's been some work done after I wrote this patch, in the "Fix
OMAP4 barrier support" series last summer. I'm not sure if that's only
for OMAP4, though.

I'll drop this patch too from the series, and spend a bit more time on
it. This is again something that's a bit tricky to reproduce and test.

 Tomi