[PATCH 00/48] Etnaviv changes RFCv1->RFCv2

Tue Oct 20 02:40:41 PDT 2015

Hi Jon,

2015-10-20 11:09 GMT+02:00 Jon Nettleton <jon.nettleton at gmail.com>:
> On Tue, Oct 20, 2015 at 11:00 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
>> Am Dienstag, den 20.10.2015, 09:20 +0200 schrieb Christian Gmeiner:
>>> Hi Lucas,
>>>
>>> 2015-10-13 10:25 GMT+02:00 Lucas Stach <l.stach at pengutronix.de>:
>>> > Am Mittwoch, den 30.09.2015, 09:53 +0200 schrieb Christian Gmeiner:
>>> >> Hi Lucas,
>>> >>
>>> >> 2015-09-28 12:39 GMT+02:00 Lucas Stach <l.stach at pengutronix.de>:
>>> >> > Hi Christian,
>>> >> >
>>> >> > Am Montag, den 28.09.2015, 11:46 +0200 schrieb Christian Gmeiner:
>>> >> >> Hi Lucas.
>>> >> >>
>>> >> >> I think I have run into a cache flush / cache coherency issue. I will
>>> >> >> try to reproduce this issue with a small example and will
>>> >> >> keep you updated.
>>> >> >
>>> >> > What are the symptoms of the issue you are hitting? Maybe I can
>>> >> > reproduce or see if I have an idea right away.
>>> >> >
>>> >>
>>> >> With the help of the etnaviv_2d_test in my libdrm repo on github I was able
>>> >> to test different bo flags.
>>> >>
>>> >> ETNA_BO_UNCACHED and ETNA_BO_CACHED are working as expected.
>>> >> The rendering result looks as expected. If I try ETNA_BO_WC the rendering
>>> >> result looks different for every run.
>>> >>
>>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>>> >> Version: 1.0.0
>>> >>   Name: etnaviv
>>> >>   Date: 20150910
>>> >>   Description: etnaviv DRM
>>> >> bo cpu prep: 0
>>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>>> >> 052880d433e1bf495e268206addd4087  /tmp/etna.bmp
>>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>>> >> Version: 1.0.0
>>> >>   Name: etnaviv
>>> >>   Date: 20150910
>>> >>   Description: etnaviv DRM
>>> >> bo cpu prep: 0
>>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>>> >> f1a02a52d81c0b79b098877e6b7d9303  /tmp/etna.bmp
>>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>>> >> Version: 1.0.0
>>> >>   Name: etnaviv
>>> >>   Date: 20150910
>>> >>   Description: etnaviv DRM
>>> >> bo cpu prep: 0
>>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>>> >> de5a428eb1f6567849ef40a944a995b8  /tmp/etna.bmp
>>> >>
>>> >> etna_cmd_stream_finish() waits until the submitted command stream was
>>> >> processed by the GPU. I tried to use etna_bo_cpu_prep(..) but I that did not
>>> >> help.
>>> >>
>>> >> I am doing something wrong? Should this work in theory?
>>> >>
>>> > For the record: this is caused by the "shared attribute override enable"
>>> > bit in the AUX_CTRL register of the PL310 cache controller not being
>>> > set, which makes it non-compliant with the ARMv7 ARM specified behavior
>>> > of conflicting memory aliases.
>>> > The etnaviv kernel driver makes sure to clean the cachable alias before
>>> > handing out the pages, but without this bit set the L2 controller will
>>> > turn the userspace bufferable reads into "cachable, no allocate" which
>>> > will lead to userspace hitting stale, non-evicted cache lines.
>>> >
>>> > This can be worked around by adding a "arm,shared-override" property to
>>> > the L2 cache DT node. This will only work if the kernel is booted in
>>> > secure state and will fault on a NS kernel. So this should be considered
>>> > a hack and the bootloader/firmware should make sure to set this bit.
>>> >
>>>
>>> Is the kernel able to detect this faulty environment? It would be great
>>> if we can warn the user about this issue and 'convert' all ETNA_BO_WC
>>> request to ETNA_BO_CACHED or ETNA_BO_UNCACHED.
>>>
>>> Else there might be some bug reports in the future where something is
>>> broken due to a bad environment and these kind of bugs are hard to
>>> sort out.
>>>
>> I don't think we should work around a platform issue in individual
>> drivers. I mean etnaviv makes the issue really visible, but not having
>> this bit set in the PL310 controller makes the whole platform
>> non-conforming to what is specified in the ARMv7 ARM, so the platform
>> may exhibit all kinds of subtle breakages anyway.
>>
>> We may check for this bit in the PL310 initialization and warn the user
>> that a firmware update might be needed to fix this. This should be
>> enough to sort out bug reports caused by this specific issue.
>>
>
> I agree, although we can also point them to the device-tree option to
> workaround the problem temporarily.
>
> Christian, fyi I have fixed this in u-boot for all the SolidRun platforms.

Great - will update all my devices.

greets
--
Christian Gmeiner, MSc

https://soundcloud.com/christian-gmeiner