[Libreoffice-bugs] [Bug 137468] Severe performance degradation on a macOS with a 5K display
bugzilla-daemon at bugs.documentfoundation.org
Wed Oct 21 15:44:36 UTC 2020
https://bugs.documentfoundation.org/show_bug.cgi?id=137468
--- Comment #29 from Leo Wang <ilford at gmail.com> ---
Currently the hottest path downstream of the copyArea and copyBits calls
is:
7 CoreGraphics 2781.0 ripc_DrawLayer
6 CoreGraphics 2777.0 ripc_DrawImage
5 CoreGraphics 2774.0 ripc_RenderImage
4 CoreGraphics 2774.0 RIPLayerBltImage
3 CoreGraphics 2774.0 ripl_Mark
2 CoreGraphics 2774.0 ARGB32_image
1 CoreGraphics 2773.0 rgba32_image_mark
0 CoreGraphics 1985.0 rgba32_sample_rgba32
where the bottom rgba32_sample_rgba32 is not optimized with SIMD.
If the following modification is made:
diff --git a/vcl/quartz/salgdiutils.cxx b/vcl/quartz/salgdiutils.cxx
index 426aea29dc78..3ca31c2a3a7b 100644
--- a/vcl/quartz/salgdiutils.cxx
+++ b/vcl/quartz/salgdiutils.cxx
@@ -134,7 +134,7 @@ bool AquaSalGraphics::CheckContext()
const int nBytesPerRow = (nBitmapDepth * nScaledWidth) / 8;
void* pRawData = std::malloc(nBytesPerRow * nScaledHeight);
- const int nFlags = kCGImageAlphaNoneSkipFirst;
+ const int nFlags = kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Host;
CGContextHolder aContextHolder(CGBitmapContextCreate(
pRawData, nScaledWidth, nScaledHeight, 8, nBytesPerRow,
GetSalData()->mxRGBSpace, nFlags));
i.e., if the bitmap is created with the host byte order, then at least on my
Mac all calls to rgba32_sample_rgba32 are avoided. The profile then looks like
this:
8 CoreGraphics 2009.0 CGContextDrawLayerAtPoint
7 CoreGraphics 2009.0 ripc_DrawLayer
6 CoreGraphics 2004.0 ripc_DrawImage
5 CoreGraphics 2002.0 ripc_RenderImage
4 CoreGraphics 2002.0 RIPLayerBltImage
3 CoreGraphics 2002.0 ripl_Mark
2 CoreGraphics 2002.0 argb32_image
1 CoreGraphics 2002.0 CGSBlend8888toRGBA8888
0 vImage 2001.0 vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx512
This gives almost a 30% performance boost on my Mac Pro 2019 (2.5 GHz Intel
Xeon W). There are also optimized variants for older CPUs in vImage
(macOS 10.15.7):
0000000000663030 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx
000000000076b660 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx2
0000000000017560 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx512
0000000000583740 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_sse4_1
00000000003aa2b0 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_vec
I don't have other Macs to test on, but I think it is safe to say that this
family of functions should be faster than the unoptimized rgba32_sample_rgba32
function.
This also depends on the CGDisplayCopyColorSpace(CGMainDisplayID())
modification.
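
To illustrate what the byte-order flag changes, here is a minimal, portable
sketch of the two in-memory layouts of one XRGB pixel. It does not call
CoreGraphics. The helper names (packBigEndianXRGB, packHostOrderXRGB,
hostIsLittleEndian, selfCheck) are hypothetical, and it assumes that omitting
a byte-order flag means big-endian 32-bit pixels, while
kCGBitmapByteOrder32Host means the native word order of the machine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using PixelBytes = std::array<std::uint8_t, 4>;

// Hypothetical helper: an XRGB pixel in big-endian order, which is assumed
// here to be the layout used when no byte-order flag is given.
// Memory order: X, R, G, B.
PixelBytes packBigEndianXRGB(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return PixelBytes{{ 0x00, r, g, b }};
}

// Hypothetical helper: the same pixel stored as one native 32-bit word
// (0x00RRGGBB), i.e. in host byte order.
PixelBytes packHostOrderXRGB(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::uint32_t word = (std::uint32_t(r) << 16)
                             | (std::uint32_t(g) << 8)
                             |  std::uint32_t(b);
    PixelBytes bytes{};
    std::memcpy(bytes.data(), &word, sizeof(word));
    return bytes;
}

// True when the machine stores the least significant byte first.
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// The two layouts differ only on little-endian hosts, where the host-order
// bytes are the big-endian bytes reversed (B, G, R, X).
void selfCheck()
{
    const PixelBytes big  = packBigEndianXRGB(0x11, 0x22, 0x33);
    const PixelBytes host = packHostOrderXRGB(0x11, 0x22, 0x33);
    if (hostIsLittleEndian())
        assert(host[0] == big[3] && host[1] == big[2]
            && host[2] == big[1] && host[3] == big[0]);
    else
        assert(host == big);
}
```

On a little-endian Intel Mac the two layouts are byte-reversed relative to
each other, which would be consistent with the blit needing a per-pixel
conversion in one case and not in the other. That link to the profiles above
is my reading, not something the sketch proves.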