[Libreoffice-bugs] [Bug 137468] Severe performance degradation on a macOS with a 5K display

bugzilla-daemon at bugs.documentfoundation.org
Wed Oct 21 15:44:36 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=137468

--- Comment #29 from Leo Wang <ilford at gmail.com> ---
Currently, the hottest spot downstream of the copyArea and copyBits calls is:

   7 CoreGraphics 2781.0  ripc_DrawLayer
   6 CoreGraphics 2777.0  ripc_DrawImage
   5 CoreGraphics 2774.0  ripc_RenderImage
   4 CoreGraphics 2774.0  RIPLayerBltImage
   3 CoreGraphics 2774.0  ripl_Mark
   2 CoreGraphics 2774.0  ARGB32_image
   1 CoreGraphics 2773.0  rgba32_image_mark
   0 CoreGraphics 1985.0  rgba32_sample_rgba32

where the bottom frame, rgba32_sample_rgba32, is not SIMD-optimized.

If the following modification is made:

diff --git a/vcl/quartz/salgdiutils.cxx b/vcl/quartz/salgdiutils.cxx
index 426aea29dc78..3ca31c2a3a7b 100644
--- a/vcl/quartz/salgdiutils.cxx
+++ b/vcl/quartz/salgdiutils.cxx
@@ -134,7 +134,7 @@ bool AquaSalGraphics::CheckContext()

             const int nBytesPerRow = (nBitmapDepth * nScaledWidth) / 8;
             void* pRawData = std::malloc(nBytesPerRow * nScaledHeight);
-            const int nFlags = kCGImageAlphaNoneSkipFirst;
+            const int nFlags = kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Host;
             CGContextHolder aContextHolder(CGBitmapContextCreate(
                 pRawData, nScaledWidth, nScaledHeight, 8, nBytesPerRow, GetSalData()->mxRGBSpace, nFlags));

i.e., to create the bitmap with the host byte order, then, at least on my Mac,
all calls to rgba32_sample_rgba32 are avoided. Here is the resulting call stack:

   8 CoreGraphics 2009.0  CGContextDrawLayerAtPoint
   7 CoreGraphics 2009.0  ripc_DrawLayer
   6 CoreGraphics 2004.0  ripc_DrawImage
   5 CoreGraphics 2002.0  ripc_RenderImage
   4 CoreGraphics 2002.0  RIPLayerBltImage
   3 CoreGraphics 2002.0  ripl_Mark
   2 CoreGraphics 2002.0  argb32_image
   1 CoreGraphics 2002.0  CGSBlend8888toRGBA8888
   0 vImage 2001.0  vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx512
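A minimal sketch of why the byte-order flag matters, assuming CoreGraphics uses big-endian byte order when no kCGBitmapByteOrder flag is given. Without the flag, each skip-first pixel sits in memory as the bytes X, R, G, B. With kCGBitmapByteOrder32Host, each pixel is one native 32-bit word 0xXXRRGGBB, which on a little-endian Intel Mac means B, G, R, X in memory. The code is portable C++, not CoreGraphics code, and every name in it is hypothetical. convertRowToHost only illustrates the kind of scalar, per-pixel reshuffling that a layout mismatch forces on the consumer. It is not the actual body of rgba32_sample_rgba32, whose internals are not shown in the profile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical pixel type: colour channels only; the skipped byte is filled with 0xFF.
struct Rgb { std::uint8_t r, g, b; };

// Default (big-endian) layout with "skip first": bytes X, R, G, B on every CPU.
std::array<std::uint8_t, 4> storeBigEndianXRGB(Rgb p) {
    return {0xFF, p.r, p.g, p.b};
}

// Host-order layout: one native 32-bit word 0xXXRRGGBB, so the byte order
// in memory follows the CPU (B, G, R, X on little-endian x86).
std::array<std::uint8_t, 4> storeHostXRGB(Rgb p) {
    const std::uint32_t word = 0xFF000000u | (std::uint32_t(p.r) << 16) |
                               (std::uint32_t(p.g) << 8) | std::uint32_t(p.b);
    std::array<std::uint8_t, 4> bytes{};
    std::memcpy(bytes.data(), &word, sizeof word);
    return bytes;
}

// Scalar, one-pixel-at-a-time conversion from the big-endian layout to the
// host layout: read four bytes as a big-endian word, store it natively.
void convertRowToHost(const std::uint8_t* src, std::uint8_t* dst, std::size_t pixels) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < pixels; ++i) {
        const std::uint8_t* s = src + 4 * i;
        const std::uint32_t word = (std::uint32_t(s[0]) << 24) | (std::uint32_t(s[1]) << 16) |
                                   (std::uint32_t(s[2]) << 8) | std::uint32_t(s[3]);
        std::memcpy(dst + 4 * i, &word, sizeof word);
    }
}
```

On a little-endian host, storeHostXRGB({0x11, 0x22, 0x33}) yields the bytes 33 22 11 FF. Running convertRowToHost over the output of storeBigEndianXRGB produces the same bytes. When the bitmap is already in host order, that per-pixel step has nothing to do.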

The byte-order change brings an almost 30% performance boost on my Mac Pro 2019
(2.5 GHz Intel Xeon W). For older CPUs, vImage (macOS 10.15.7) also ships
optimized variants:

0000000000663030 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx
000000000076b660 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx2
0000000000017560 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_avx512
0000000000583740 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_sse4_1
00000000003aa2b0 t _vPremultipliedAlphaBlendWithPermute_RGBA8888_CV_vec

I don't have other Macs to test on, but I think it is safe to say that every
variant in this family should be faster than the unoptimized
rgba32_sample_rgba32.
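The per-ISA suffixes in the symbol list suggest that vImage picks one variant at run time according to the CPU's features. vImage's actual dispatch mechanism is not documented in this thread, so the following is only a generic sketch of priority-ordered dispatch. CpuFeatures, BlendVariant, pickBlendVariant and variantSuffix are hypothetical names, and the feature flags are passed in rather than queried from the CPU.

```cpp
// Hypothetical CPU feature flags. A real dispatcher would query the CPU
// (for example with CPUID) once at startup instead of taking them as input.
struct CpuFeatures {
    bool sse4_1 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512 = false;
};

// One entry per variant in the vImage symbol list, including the generic one.
enum class BlendVariant { Vec, Sse4_1, Avx, Avx2, Avx512 };

// Choose the most capable variant the CPU supports, falling back to the
// generic vector version when none of the listed extensions is available.
constexpr BlendVariant pickBlendVariant(CpuFeatures f) {
    if (f.avx512) return BlendVariant::Avx512;
    if (f.avx2) return BlendVariant::Avx2;
    if (f.avx) return BlendVariant::Avx;
    if (f.sse4_1) return BlendVariant::Sse4_1;
    return BlendVariant::Vec;
}

// Symbol suffix for each variant, as it appears in the nm listing.
constexpr const char* variantSuffix(BlendVariant v) {
    switch (v) {
        case BlendVariant::Avx512: return "avx512";
        case BlendVariant::Avx2: return "avx2";
        case BlendVariant::Avx: return "avx";
        case BlendVariant::Sse4_1: return "sse4_1";
        case BlendVariant::Vec: return "vec";
    }
    return "vec";
}
```

With all four flags set, as on an AVX-512 Xeon W, the sketch selects the _avx512 variant. That is the one that appears at the bottom of the profile above.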

The byte-order change also depends on the
CGDisplayCopyColorSpace(CGMainDisplayID()) modification.
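The "almost 30%" figure can be roughly cross-checked against the ripc_DrawLayer sample counts in the two call stacks (2781 before, 2009 after). This assumes that both profiles cover the same workload over the same duration, which the comment does not state. The constant and function names are illustrative only.

```cpp
// Sample counts at ripc_DrawLayer, copied from the two profiles above.
constexpr double kSamplesBefore = 2781.0;
constexpr double kSamplesAfter = 2009.0;

// Share of samples eliminated: 1 - after / before.
constexpr double sampleReduction(double before, double after) {
    return 1.0 - after / before;
}

// Equivalent speedup if the same work now needs fewer samples.
constexpr double speedup(double before, double after) {
    return before / after;
}
```

This works out to about 27.8% fewer samples, or roughly a 1.38x speedup for the same work. That is consistent with "almost 30%" if the figure refers to time saved.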
