dri/drm/kms question with regards to minor faults
Steven Price
steven.price at arm.com
Wed Nov 3 11:54:33 UTC 2021
On 01/11/2021 05:20, Bert Schiettecatte wrote:
> Hi John
>
>> Coincidentally, I've been looking at Panfrost on RK3288 this week as
>> well! I'm testing it with a project that has been using the binary blob
>> driver for several years and unfortunately Panfrost seems to use ~15%
>> more CPU.
>> Like you, I see a huge number of minor faults (~500/second compared with
>> ~3/second on libmali). It seems that Panfrost is mmap'ing and
>> munmap'ing buffers on every frame which doesn't happen when the same
>> application is using the binary driver.
>
> Thanks for confirming you are seeing the same issue.
>
>> Panfrost experts, is there a missed opportunity for optimisation here?
>> Or is there something applications should be doing differently to avoid
>> repeatedly mapping & unmapping the same buffers?
>
> Panfrost team - any update on this?
I was hoping Alyssa would comment since she's much more familiar with
Mesa than I am!
On the first point, libmali not performing mmap()s very often: this was
a specific design goal. For example, the kbase kernel driver provides
ioctl()s to do CPU cache maintenance so that this works on Arm platforms
(i.e. it's not a portable solution).
So the short answer is: yes, there is room for optimisation here.
However things get tricky when fitting into a portable framework. The
easiest way of ensuring cache coherency is to ensure there is a clear
owner - so the usual approach is mmap(), read/write some data on the
CPU, munmap(), GPU accesses data, repeat. The DMA framework in the
kernel will then ensure that any cache maintenance/bounce buffering or
other quirks are dealt with.
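
To make the per-frame cycle concrete, here is a minimal userspace sketch
of the map / write / unmap pattern. It is an analogy only: it uses an
anonymous mapping from Python's mmap module rather than a real GPU
buffer object, and BUF_SIZE, cpu_fill() and frames_with_remap() are
hypothetical names made up for illustration. On Linux each fresh mapping
normally has to be faulted in again when the CPU touches it, so the
returned count should grow with the number of frames.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
BUF_SIZE = 64 * PAGE  # hypothetical buffer size, chosen only for illustration


def minor_faults():
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def cpu_fill(buf, value):
    """Stand-in for the CPU writing frame data: touch one byte per page."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = value


def frames_with_remap(frames):
    """Map, write and unmap on every frame; return the minor faults incurred."""
    before = minor_faults()
    for frame in range(frames):
        buf = mmap.mmap(-1, BUF_SIZE)  # mmap(): the CPU takes ownership
        cpu_fill(buf, frame % 256)     # read/write some data on the CPU
        buf.close()                    # munmap(): ownership is handed back
        # A real driver would let the GPU access the buffer at this point.
    return minor_faults() - before


if __name__ == "__main__":
    print("minor faults over 100 frames:", frames_with_remap(100))
```

The exact numbers depend on the kernel and its configuration, so treat
the output as indicative rather than as a measurement of Panfrost.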
Having said that, we know that existing platforms don't require these
'quirks' (because libmali works on them), so in theory it should be
possible for Mesa to avoid the mmap()/munmap() dance in many cases
(where the memory is coherent with the GPU[1]). But this is where my
knowledge of Mesa is lacking, as I've no idea how to go about that.
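
For what avoiding the dance would look like from userspace, here is a
rough sketch that maps a buffer once and reuses the mapping for every
frame. It is not how Mesa does or would do it: the buffer is an
anonymous mapping from Python's mmap module, which is always coherent
for the CPU, so it says nothing about the GPU coherency requirement
above. BUF_SIZE, cpu_fill() and frames_with_persistent_map() are
hypothetical names made up for illustration.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
BUF_SIZE = 64 * PAGE  # hypothetical buffer size, chosen only for illustration


def minor_faults():
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def cpu_fill(buf, value):
    """Stand-in for the CPU writing frame data: touch one byte per page."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = value


def frames_with_persistent_map(frames):
    """Map once, write every frame; return (faults after warm-up, last byte)."""
    buf = mmap.mmap(-1, BUF_SIZE)  # single mmap() for the buffer's lifetime
    cpu_fill(buf, 0)               # first touch pays the page faults once
    before = minor_faults()
    for frame in range(frames):
        cpu_fill(buf, frame % 256)  # later frames reuse pages already mapped
    faults = minor_faults() - before
    last = buf[0]                  # value written by the final frame
    buf.close()                    # single munmap() at teardown
    return faults, last


if __name__ == "__main__":
    faults, _ = frames_with_persistent_map(100)
    print("minor faults over 100 frames after warm-up:", faults)
```

After the warm-up write the pages stay mapped, so the count should
normally stay close to zero however many frames are written.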
Regards,
Steve
[1] I think this should actually be true all the time with Panfrost as
the buffer is mapped write-combining on the CPU if the GPU isn't fully
coherent. But I haven't double checked this.