[PATCH 00/48] Etnaviv changes RFCv1->RFCv2

Tue Oct 20 02:09:52 PDT 2015

On Tue, Oct 20, 2015 at 11:00 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
> Am Dienstag, den 20.10.2015, 09:20 +0200 schrieb Christian Gmeiner:
>> Hi Lucas,
>>
>> 2015-10-13 10:25 GMT+02:00 Lucas Stach <l.stach at pengutronix.de>:
>> > Am Mittwoch, den 30.09.2015, 09:53 +0200 schrieb Christian Gmeiner:
>> >> Hi Lucas,
>> >>
>> >> 2015-09-28 12:39 GMT+02:00 Lucas Stach <l.stach at pengutronix.de>:
>> >> > Hi Christian,
>> >> >
>> >> > Am Montag, den 28.09.2015, 11:46 +0200 schrieb Christian Gmeiner:
>> >> >> Hi Lucas.
>> >> >>
>> >> >> I think I have run into a cache flush / cache coherency issue. I will
>> >> >> try to reproduce this issue with a small example and will
>> >> >> keep you updated.
>> >> >
>> >> > What are the symptoms of the issue you are hitting? Maybe I can
>> >> > reproduce or see if I have an idea right away.
>> >> >
>> >>
>> >> With the help of the etnaviv_2d_test in my libdrm repo on github I was able
>> >> to test different bo flags.
>> >>
>> >> ETNA_BO_UNCACHED and ETNA_BO_CACHED are working as expected.
>> >> The rendering result looks as expected. If I try ETNA_BO_WC the rendering
>> >> result looks different for every run.
>> >>
>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>> >> Version: 1.0.0
>> >>   Name: etnaviv
>> >>   Date: 20150910
>> >>   Description: etnaviv DRM
>> >> bo cpu prep: 0
>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>> >> 052880d433e1bf495e268206addd4087  /tmp/etna.bmp
>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>> >> Version: 1.0.0
>> >>   Name: etnaviv
>> >>   Date: 20150910
>> >>   Description: etnaviv DRM
>> >> bo cpu prep: 0
>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>> >> f1a02a52d81c0b79b098877e6b7d9303  /tmp/etna.bmp
>> >> debian at cubox:~/libdrm$ tests/etnaviv/etnaviv_2d_test /dev/dri/card1
>> >> Version: 1.0.0
>> >>   Name: etnaviv
>> >>   Date: 20150910
>> >>   Description: etnaviv DRM
>> >> bo cpu prep: 0
>> >> debian at cubox:~/libdrm$ md5sum /tmp/etna.bmp
>> >> de5a428eb1f6567849ef40a944a995b8  /tmp/etna.bmp
>> >>
>> >> etna_cmd_stream_finish() waits until the submitted command stream was
>> >> processed by the GPU. I tried to use etna_bo_cpu_prep(..) but I that did not
>> >> help.
>> >>
>> >> I am doing something wrong? Should this work in theory?
>> >>
>> > For the record: this is caused by the "shared attribute override enable"
>> > bit in the AUX_CTRL register of the PL310 cache controller not being
>> > set, which makes it non-compliant with the ARMv7 ARM specified behavior
>> > of conflicting memory aliases.
>> > The etnaviv kernel driver makes sure to clean the cachable alias before
>> > handing out the pages, but without this bit set the L2 controller will
>> > turn the userspace bufferable reads into "cachable, no allocate" which
>> > will lead to userspace hitting stale, non-evicted cache lines.
>> >
>> > This can be worked around by adding a "arm,shared-override" property to
>> > the L2 cache DT node. This will only work if the kernel is booted in
>> > secure state and will fault on a NS kernel. So this should be considered
>> > a hack and the bootloader/firmware should make sure to set this bit.
>> >
>>
>> Is the kernel able to detect this faulty environment? It would be great
>> if we can warn the user about this issue and 'convert' all ETNA_BO_WC
>> request to ETNA_BO_CACHED or ETNA_BO_UNCACHED.
>>
>> Else there might be some bug reports in the future where something is
>> broken due to a bad environment and these kind of bugs are hard to
>> sort out.
>>
> I don't think we should work around a platform issue in individual
> drivers. I mean etnaviv makes the issue really visible, but not having
> this bit set in the PL310 controller makes the whole platform
> non-conforming to what is specified in the ARMv7 ARM, so the platform
> may exhibit all kinds of subtle breakages anyway.
>
> We may check for this bit in the PL310 initialization and warn the user
> that a firmware update might be needed to fix this. This should be
> enough to sort out bug reports caused by this specific issue.
>

I agree, although we can also point them to the device-tree option to
workaround the problem temporarily.

Christian, fyi I have fixed this in u-boot for all the SolidRun platforms.