[Intel-gfx] How to avoid "GTT mapping a busy miptree BO" stalls?

Clemens Eisserer linuxhippy at gmail.com
Thu Aug 11 08:57:26 UTC 2016


Hello,

I am currently working on an application which has to stream huge
amounts of texture data to the GPU.
I use multiple buffers to avoid mapping a buffer which is currently
involved in a DMA operation, so there is typically a delay of more
than 1 frame between triggering an upload with glTexSubImage2D and
mapping the buffer again.

The typical state of my buffers during some frame could look like this:
PBO0: mapped
PBO1: mapped
PBO2: glUnmapBuffer(PBO2)
PBO3: currently used by glTexSubImage2D(PBO3, ...
PBO4: unmapped & unused, texture of PBO4 used for drawing
PBO5: unmapped & idle
PBO6: glMapBuffer(PBO6)
PBO7: mapped
PBO8: mapped
PBO9: mapped
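
To make the rotation explicit, here is a minimal sketch in plain C (no GL calls) that models the snapshot above. It assumes each buffer moves one step per frame through map -> mapped (5 frames) -> unmap -> upload -> draw -> idle -> map again; the enum and function names are made up for illustration and are not from the real application.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_PBOS 10  /* PBO0..PBO9, as in the list above */

typedef enum {
    PBO_MAP_NOW,   /* glMapBuffer is issued this frame */
    PBO_MAPPED,    /* mapped and being filled by the CPU */
    PBO_UNMAP_NOW, /* glUnmapBuffer is issued this frame */
    PBO_UPLOAD,    /* source of glTexSubImage2D this frame */
    PBO_DRAW,      /* unmapped, its texture is used for drawing */
    PBO_IDLE       /* unmapped and idle */
} pbo_state;

/* Frames since the buffer was last mapped. Frame 0 is the snapshot
 * listed above, in which PBO6 is the buffer being mapped (assumed). */
int pbo_age(int pbo, int frame)
{
    assert(pbo >= 0 && pbo < NUM_PBOS && frame >= 0);
    return (pbo + frame + NUM_PBOS - 6) % NUM_PBOS;
}

/* State of one buffer in a given frame under the assumed rotation. */
pbo_state pbo_state_at(int pbo, int frame)
{
    int age = pbo_age(pbo, frame);
    if (age == 0) return PBO_MAP_NOW;
    if (age <= 5) return PBO_MAPPED;
    if (age == 6) return PBO_UNMAP_NOW;
    if (age == 7) return PBO_UPLOAD;
    if (age == 8) return PBO_DRAW;
    return PBO_IDLE;
}

/* Number of buffers that are in state `s` during `frame`. */
int count_in_state(int frame, pbo_state s)
{
    int n = 0;
    for (int i = 0; i < NUM_PBOS; i++)
        if (pbo_state_at(i, frame) == s)
            n++;
    return n;
}

/* Prints one line per buffer, in the same order as the list above. */
void print_pbo_states(int frame)
{
    static const char *names[] = {
        "glMapBuffer", "mapped", "glUnmapBuffer",
        "glTexSubImage2D", "drawing", "idle"
    };
    for (int i = 0; i < NUM_PBOS; i++)
        printf("PBO%d: %s\n", i, names[pbo_state_at(i, frame)]);
}
```

Calling print_pbo_states(0) from main() reproduces the list above. In this model the buffer used by glTexSubImage2D in frame 0 (PBO3) is mapped again in frame 3, which is the "more than 1 frame" delay mentioned earlier.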


However, I get quite confusing results with INTEL_DEBUG=perf.

The best case seems to be when there are only 1-2 buffers mapped at a time:
    Transfer Rate: 1455.2 MB/s. (121.3 FPS)
    GTT mapping a busy miptree BO stalled and took 0.686 ms.

However, even just increasing the number of buffers mapped concurrently,
with the same delay between unmapping/using/mapping, reduces
throughput dramatically:
    GTT mapping a busy miptree BO stalled and took 7.128 ms.
    Transfer Rate: 844.4 MB/s. (70.4 FPS)
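
For reference, the per-frame figures implied by these two measurements can be worked out with a small C sketch. Both runs come out at about 12 MB per frame, while the frame time grows from roughly 8.2 ms to roughly 14.2 ms. The stall-share calculation assumes that the reported stall happens once per frame, which the perf output above does not confirm; the function names are only illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Data uploaded per frame, from the reported rate and frame rate. */
double mb_per_frame(double mb_per_s, double fps)
{
    assert(fps > 0.0);
    return mb_per_s / fps;
}

/* Average time per frame in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Fraction of a frame spent in the stall, ASSUMING one stall per frame. */
double stall_share(double stall_ms, double fps)
{
    return stall_ms / frame_time_ms(fps);
}

/* Prints the derived figures for one measurement. */
void print_run(const char *label, double mb_per_s, double fps, double stall_ms)
{
    printf("%s: %.2f MB/frame, %.2f ms/frame, stall = %.0f%% of frame\n",
           label,
           mb_per_frame(mb_per_s, fps),
           frame_time_ms(fps),
           100.0 * stall_share(stall_ms, fps));
}
```

Calling print_run("1-2 mapped", 1455.2, 121.3, 0.686) and print_run("more mapped", 844.4, 70.4, 7.128) from main() prints both cases. Under the one-stall-per-frame assumption, the stall goes from under a tenth of the frame to about half of it, so it would account for most of the lost throughput.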

Increasing the delay between glTexSubImage2D and glMapBuffer doesn't
seem to improve the situation at all.
Also, discarding the buffer contents before mapping (glBufferDataARB)
doesn't reduce the stall time of 7 ms.

Is the large number (~4-6) of simultaneously mapped buffers a problem?

Thank you in advance, Clemens

