BinaryDataContainer swapping ? ...

Tomaž Vajngerl tomaz.vajngerl at
Mon Apr 3 11:11:21 UTC 2023


On 03/04/2023 17:50, Michael Meeks wrote:
> Hi there,
>     I've been looking at some memory profiles, and it seems to me that 
> we're storing too much bulk image data in memory. I understand that 
> previously we had real problems with image lifecycle and loss due to 
> optimistic attempts to use the original document as a reliable source 
> for fetching data from; so it made good sense to brute-force this by 
> holding all images in memory: fair enough - I've not seen us lose an 
> image while copy/pasting/closing/opening etc. since that was done - 
> which is awesome, and I don't want to go back to that =)

Yeah, it was never intended to stay that way, but compressed images are 
usually small enough not to cause very high memory usage (at least when 
running locally on a user's computer), so bringing back swapping to disk 
wasn't really necessary - nobody complained, and everything generally 
worked much faster without unnecessary swap-outs and swap-ins when 
loading a document. The biggest problem was how to prevent those - 
things like parallel loading, which was introduced and re-introduced, 
did not help; it made everything worse, which is what I complained 
about, and I suggested how it should be done properly, but yeah.... Why 
load images in parallel when you immediately notice that an image isn't 
used and swap it out again? And even taking that into account, the 
effect of parallel loading is minimal, because it's unlikely that many 
images will be used immediately when a document is loaded. The time 
would be better spent creating a parallel image loading worker that 
loads images on demand in a separate thread - then we could load 
multiple images in parallel all the time, not just at document load.
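Just to illustrate the shape of what I mean - this is only a sketch, 
all names here are made up and not existing VCL code: a small worker 
that decodes images on demand in background threads, independent of 
document load.

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical on-demand loader: callers enqueue "decode this image"
// jobs whenever a Graphic is actually needed, and a fixed set of worker
// threads processes them in the background.
class ImageLoadWorker
{
public:
    explicit ImageLoadWorker(std::size_t nThreads = 2)
    {
        for (std::size_t i = 0; i < nThreads; ++i)
            maThreads.emplace_back([this] { run(); });
    }

    ~ImageLoadWorker()
    {
        {
            std::lock_guard<std::mutex> aGuard(maMutex);
            mbShutdown = true;
        }
        maCondition.notify_all();
        for (auto& rThread : maThreads)
            rThread.join();
    }

    // Enqueue a decode job; it runs later on one of the worker threads.
    void enqueue(std::function<void()> aLoadJob)
    {
        {
            std::lock_guard<std::mutex> aGuard(maMutex);
            maJobs.push(std::move(aLoadJob));
        }
        maCondition.notify_one();
    }

private:
    void run()
    {
        for (;;)
        {
            std::function<void()> aJob;
            {
                std::unique_lock<std::mutex> aLock(maMutex);
                maCondition.wait(aLock,
                    [this] { return mbShutdown || !maJobs.empty(); });
                if (mbShutdown && maJobs.empty())
                    return;
                aJob = std::move(maJobs.front());
                maJobs.pop();
            }
            aJob(); // decompress the image data into a bitmap here
        }
    }

    std::vector<std::thread> maThreads;
    std::queue<std::function<void()>> maJobs;
    std::mutex maMutex;
    std::condition_variable maCondition;
    bool mbShutdown = false;
};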

>     Nevertheless, the memory profiles show that for large documents 
> the compressed image data can also be large: hundreds of Mb (and yes 
> we have some customers that love copy/pasting lots of large complex 
> images into their files ;-)

Yeah - I would imagine PNG files could get super big, but mostly they 
shouldn't be that big...? That's strange. MSO automatically enables 
image re-compression on save, which drastically lowers the size of the 
images stored in the document. Maybe we should consider doing the same.
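Roughly what I mean by that, as a sketch only (the threshold, quality 
and encoder hook are assumptions, not existing code): on save, large 
lossless images without alpha could be re-encoded as JPEG, everything 
else stays untouched.

#include <cstddef>
#include <functional>
#include <vector>

using ByteVector = std::vector<unsigned char>;

// Hypothetical description of a stream as stored in the document.
struct CompressedImage
{
    ByteVector maData;
    bool mbIsLossless; // e.g. PNG / BMP / TIFF
    bool mbHasAlpha;   // alpha would be lost by JPEG re-encoding
};

// Re-encode large, lossless, opaque images with the supplied JPEG
// encoder; keep everything else byte-for-byte as it was.
ByteVector recompressForSave(const CompressedImage& rImage,
    const std::function<ByteVector(const ByteVector&, int)>& rEncodeJpeg)
{
    constexpr std::size_t nThreshold = 512 * 1024; // only streams above 512 KiB

    if (rImage.mbIsLossless && !rImage.mbHasAlpha
        && rImage.maData.size() > nThreshold)
        return rEncodeJpeg(rImage.maData, /*nQuality=*/80);

    return rImage.maData;
}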

Also, it doesn't help that we load WebP images and convert them to PNG 
(which is something I complained about and wanted to fix, but I was 
demotivated by the resistance to my idea). A lot of images on the web 
today are WebP.

>     Anyhow - I was doing some re-factoring of the BinaryDataContainer 
> to encapsulate it better; and I was thinking of adding the ability to 
> swap these out (which should be async), and to then read them back in 
> on demand which (I hope) in the era of SSDs - should be extremely 
> fast, although synchronous.
>     Then again - since I've not touched these code paths for a long 
> time; and lost any deep understanding I may once have had, I'd love 
> some advice:
>     Would be good to have two forms of that: advice for long-term 
> star-trek futures, and also advice for 
> quick-fix-that-goes-in-the-right-direction =)

It's tempting, but I'm not sure swapping should be the responsibility 
of BinaryDataContainer. BinaryDataContainer is copyable, but the 
internal "std::vector<sal_uInt8>" isn't - so you would need to wrap 
that again and put it into a shared_ptr. But then I would leave the 
choice of what kind of data (swappable or not) to the user.
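Something along these lines is what I have in mind - just a sketch of 
the shape, not the actual vcl class, and the file I/O is omitted: the 
swappable state lives in a shared holder, so every copy of the 
container sees the same swapped-out/in state.

#include <cstdint>
#include <memory>
#include <vector>

using sal_uInt8 = std::uint8_t; // stand-in for the real sal type

// Shared holder for the byte data; a swap-out/swap-in is visible to
// all copies of the container at once.
class SwappableData
{
public:
    explicit SwappableData(std::vector<sal_uInt8> aData)
        : maData(std::move(aData))
    {
    }

    // In a real implementation this would write maData to a temp file
    // and release the in-memory buffer; file I/O is left out here.
    void swapOut()
    {
        mbSwappedOut = true;
        maData.clear();
        maData.shrink_to_fit();
    }

    // In a real implementation this would read the bytes back from the
    // temp file on demand; left out here.
    void swapIn() { mbSwappedOut = false; }

    const std::vector<sal_uInt8>& getData()
    {
        if (mbSwappedOut)
            swapIn();
        return maData;
    }

private:
    std::vector<sal_uInt8> maData;
    bool mbSwappedOut = false;
};

// Cheaply copyable wrapper: all copies refer to the same SwappableData.
class SwappableDataContainer
{
public:
    explicit SwappableDataContainer(std::vector<sal_uInt8> aData)
        : mpData(std::make_shared<SwappableData>(std::move(aData)))
    {
    }

    const std::vector<sal_uInt8>& getData() const { return mpData->getData(); }

private:
    std::shared_ptr<SwappableData> mpData;
};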

>     Beyond that - I'd -really- like to get more de-compressed images 
> out of memory; the 20 MPixel image that turns into 80Mb of RAM - and 
> then is interpolated down to some 1080p screen - so ~2Mpixel once, 
> then cached, and re-rendered from the cache. I wonder - will our 
> existing usage of eg. cairo's scaling cache allow us to push the giant 
> backing bitmap out of memory, and leave the 8Mb alternative in the 
> scaling cache ? ;-)

Well, I think this is more useful than swapping (unless the compressed 
image actually takes a huge amount of memory). I wouldn't do it at the 
Cairo level but at the BitmapEx / Graphic level, because the 
GraphicManager should have a say in this, and it is independent of the 
bitmap backend. Then add the mip-mapping part, which I have been 
talking about for years: 1. it would make scaling images faster, 
because you start with the nearest 2^n scaled-down image and not the 
original resolution, and 2. it would purge large original-resolution 
images from memory and load them from the compressed source again only 
when required. This can be made quite smart - load and decompress the 
image on demand and use a scaled-down version in the meantime, and on 
load even use the thumbnail that a lot of images carry in their 
metadata - stuff like that.
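To make the mip-mapping part concrete, a minimal sketch (a made-up 
function, not existing GraphicManager API) of picking the nearest 2^n 
scaled-down level for a requested target size:

#include <algorithm>
#include <cstdint>

// Level 0 is the original resolution, level n is the original scaled
// down by 2^n. Pick the highest level (smallest image) that is still
// at least as large as the target in both axes, so rendering never
// upscales a cached level.
int selectMipLevel(std::int64_t nOrigWidth, std::int64_t nOrigHeight,
                   std::int64_t nTargetWidth, std::int64_t nTargetHeight)
{
    int nLevel = 0;
    while ((nOrigWidth >> (nLevel + 1)) >= std::max<std::int64_t>(nTargetWidth, 1)
           && (nOrigHeight >> (nLevel + 1)) >= std::max<std::int64_t>(nTargetHeight, 1))
    {
        ++nLevel;
    }
    return nLevel;
}

// Example: a ~20 MPixel source (5472 x 3648) rendered to roughly 1080p
// (1920 x 1080) picks level 1 (2736 x 1824), so the ~80 MB
// full-resolution bitmap doesn't have to stay in memory once the
// levels exist.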

> Thoughts appreciated,
>         Michael.

