[Mesa-dev] Has anyone stressed radeonsi memory?

Mon Nov 13 03:39:14 UTC 2017

Hi!

I am on a Radeon RX 560 (2GB), using Mesa git-57c8ead0cd (so, not too new, not too old) and kernel 4.12.10.

I've been getting complaints about out-of-memory crashes in our WIP branch of Ogre 2.2, and I fixed them.

I made a stress test that loads 495 textures with very different resolutions (most of them not power-of-2), totaling around 700MB (for some reason, radeontop reports that all 2GB of my card are in use during this stress test).
Additionally, 495 cubes (one per texture) are rendered to the screen to ensure the driver keeps the textures resident.
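To sanity-check a figure like ~700MB, the footprint of a texture set can be estimated by summing each texture's mip chain. This is a minimal sketch, assuming a fixed number of bytes per texel (e.g. 4 for uncompressed RGBA8) and full mip chains; the actual formats and mip counts in the stress test aren't stated here, and real drivers add padding and alignment that this ignores. `mip_chain_bytes` is a hypothetical helper, not part of Ogre or Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed by a 2D texture with a full mip chain.
 * Assumes a fixed bytes_per_texel (e.g. 4 for RGBA8) and ignores any
 * driver-side padding or alignment. Each level halves width and height
 * (rounding down, clamped to 1), as GL does for non-power-of-2 sizes. */
static uint64_t mip_chain_bytes(uint32_t width, uint32_t height,
                                uint32_t bytes_per_texel)
{
    assert(width > 0 && height > 0);
    uint64_t total = 0;
    for (;;) {
        total += (uint64_t)width * height * bytes_per_texel;
        if (width == 1 && height == 1)
            break; /* last level of the chain */
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For instance, under these assumptions a 1024x1024 RGBA8 texture with its full mip chain needs 5,592,404 bytes (about 5.3MiB); summing this over all textures gives a lower bound on what the driver must keep resident.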

The problem is that we have different loading strategies:
1. At one extreme, we load one texture at a time into a staging region, then copy from the staging region to the final texture.
2. At the other extreme, we load all textures into RAM at once and use one giant staging region.
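Between those two extremes, textures can be grouped into batches that each fit a fixed staging budget. This is a minimal sketch of that grouping, assuming texture sizes are known up front and that a texture larger than the budget simply gets a batch of its own; the GL upload and copy calls are deliberately left out, and `plan_batches` is a hypothetical name, not an Ogre function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical planner: walks textures in order and assigns each one a
 * batch index so that the sum of sizes in a batch never exceeds `budget`
 * (except for a single oversized texture, which gets its own batch).
 * Returns the number of batches. Each batch would be written into the
 * staging region, copied to its final textures, then the region reused. */
static size_t plan_batches(const uint64_t *sizes, size_t count,
                           uint64_t budget, size_t *batch_of)
{
    assert(count == 0 || (sizes != NULL && batch_of != NULL));
    size_t batch = 0;
    uint64_t used = 0;
    for (size_t i = 0; i < count; ++i) {
        if (used > 0 && used + sizes[i] > budget) {
            ++batch;  /* current batch is full: start a new one */
            used = 0;
        }
        batch_of[i] = batch;
        used += sizes[i];
    }
    return count ? batch + 1 : 0;
}
```

A smaller budget produces more batches (more upload/copy round trips) but a smaller peak staging allocation, which is the trade-off between the two strategies above.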

Loading everything at once causes GL_OUT_OF_MEMORY while creating the 700MB staging area. OK... that sounds sort of reasonable.

But things get interesting when loading with a 512MB staging area:
1. Loading goes fine.
2. For a time, everything works fine.
3. If I hide all cubes so that they aren't shown anymore:
    1. Framerate usually (though not always) drops sharply, to around 8 fps (it should be around 1000 fps with nothing on screen, and around 200 fps while showing the cubes). How slow it gets is not consistent.
    2. radeontop shows memory consumption drops a lot (by half or more).
    3. A few seconds later, I almost always get a crash (SIGBUS) while writing to a UBO that has been persistently mapped (non-coherent) since the start of the application.
    4. Running under valgrind, I don't get a crash.
    5. OpenGL reports no errors.
4. I don't get a crash if I never hide the cubes.

Using a smaller staging area (256MB or less), everything is always fine.

So... is this behavior expected?
Am I uncovering a weird bug in how radeonsi/amdgpu-pro handle memory pages?

Normally I'd update to the latest git and then write a test case if the problem persists, but after pulling the latest git I saw that it also requires recompiling LLVM... so I'm asking first, before sinking more time into this.

From my perspective, if a 256MB limit works, then I'm happy.
If you tell me this isn't normal, I'll try to find some time to update Mesa and try again, and if the problem persists, write a small test case.

Cheers
Matias