[Mesa-dev] [Bug 107670] Massive slowdown under specific memcpy implementations (32bit, no-SIMD, backward copy).
bugzilla-daemon at freedesktop.org
Fri Aug 24 00:15:07 UTC 2018
https://bugs.freedesktop.org/show_bug.cgi?id=107670
Bug ID: 107670
Summary: Massive slowdown under specific memcpy implementations
(32bit, no-SIMD, backward copy).
Product: Mesa
Version: unspecified
Hardware: x86 (IA32)
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Other
Assignee: mesa-dev at lists.freedesktop.org
Reporter: iive at yahoo.com
QA Contact: mesa-dev at lists.freedesktop.org
I've traced the massive slowdown to the memcpy() in
"mesa/src/gallium/auxiliary/util/u_upload_mgr.c::u_upload_data()", which seems
to be used to move data from host memory into video card memory.
The slowdown can be observed when a non-SIMD version of the glibc-2.27 memcpy()
is used (like the one that comes with 32-bit Slackware-current). The system
mesa3d package does not exhibit the same slowdown, but it seems to be linked to
glibc-2.5.
I suspect that the slowdown is caused by a memcpy() implementation that copies
data backwards, starting from the end and moving to the beginning. This is
likely treated as a non-sequential data transfer over the PCI bus (it probably
sends the full 32-bit address for every 32 bits of data).
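To make the two access patterns concrete, here is a minimal sketch of a 32-bit word copy in ascending and in descending address order. The helper names are hypothetical and this is not the glibc code; the destination is declared volatile only so that the compiler keeps one store per word in the order written. Both produce the same result in ordinary memory and differ only in the order in which the stores reach the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores go out at ascending addresses
 * (dst[0], dst[1], ...), i.e. a sequential write stream. */
static inline void copy_words_forward(volatile uint32_t *dst,
                                      const uint32_t *src, size_t n_words)
{
    assert(n_words == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n_words; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: stores go out at descending addresses
 * (dst[n-1], dst[n-2], ...), the pattern suspected of being slow. */
static inline void copy_words_backward(volatile uint32_t *dst,
                                       const uint32_t *src, size_t n_words)
{
    assert(n_words == 0 || (dst != NULL && src != NULL));
    for (size_t i = n_words; i > 0; i--)
        dst[i - 1] = src[i - 1];
}
```

Timing these two against a mapped upload buffer would be one way to separate the effect of copy direction from the effect of store width.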
Using an SSE2 memcpy() seems to avoid this problem, but I have no idea whether
that is because it copies more data at once or because it copies forward.
In my benchmarks, `perf top` showed that the problematic memcpy() consumes 25%
of CPU time. In one particular game benchmark, I was getting 50 fps instead of
70 fps. Just replacing that memcpy() with memmove() fixed the issue for me,
without having to recompile and replace glibc.
However, I do not consider this a reliable fix, as nothing guarantees
that memmove() would do the right thing.
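As an illustration of why the copy direction of memmove() is an implementation detail, here is a textbook-style sketch (the name sketch_memmove is hypothetical, and this is not presented as the glibc implementation). A version written this way copies forward whenever the destination does not start inside the source range, which would cover an upload from host memory into a separate buffer, but the C standard only requires the result to be correct for overlapping buffers and says nothing about direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative textbook memmove(); an assumption for discussion only. */
static inline void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Unsigned wrap-around: the difference is >= n unless dst lies in
     * [src, src + n), the only case that needs a backward copy. */
    if ((uintptr_t)d - (uintptr_t)s >= (uintptr_t)n) {
        for (size_t i = 0; i < n; i++)      /* ascending addresses */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)      /* descending addresses */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Another implementation is free to choose differently, for example by always copying backward for large sizes, so relying on this behavior is fragile.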
I think that the correct solution would be to create a new function,
memcpy_to_pci(), with assembly implementation(s) that are specifically
crafted to maximize PCI/PCIe throughput.
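A portable C fallback for such a function might look like the sketch below. Only the name memcpy_to_pci() comes from the proposal; the signature and body are assumptions for illustration, not existing Mesa code, and a tuned version would replace the inner loop with assembly or SIMD stores. The one property it tries to guarantee is that stores are issued in strictly ascending address order, using aligned 32-bit writes where possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical forward-only copy; dst and src must not overlap.
 * volatile keeps the compiler from reordering or merging the stores. */
static inline void *memcpy_to_pci(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Head: byte stores until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one aligned 32-bit store per iteration, ascending. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);        /* source may be unaligned */
        *(volatile uint32_t *)d = w;
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: remaining 0..3 bytes, still ascending. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A call site would use it like memcpy(), for example memcpy_to_pci(mapped_ptr, data, size), where mapped_ptr stands for whatever pointer the upload buffer mapping returned.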
The kernel has memcpy_toio()/memcpy_fromio(), but they don't seem to be
asm-optimized. I've seen MPlayer's MMX-optimized mem2agpcpy() in
aclib_template.c.