[Mesa-dev] [PATCH 00/12] RadeonSI: Prevent IB submissions with illegal memory usage

Marek Olšák maraeo at gmail.com
Tue Aug 2 18:25:27 UTC 2016


On Tue, Aug 2, 2016 at 5:15 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 29.07.2016 23:42, Marek Olšák wrote:
>>
>> This series ensures that memory usage of gfx IBs is below the kernel-
>> exposed memory limits in most cases.
>>
>> It's not possible to prevent CS ioctl failures completely because it
>> depends on whether continuous free space for all buffers is available,
>> i.e. it depends on where some system buffers are pinned, rather than
>> how much total free space you have.
>>
>> Piglit max-texture-size and tex3d-maxsize tests easily pass with this.
>>
>> There are also some cleanups.
>
>
> Reading through the series, I believe that the individual patches are fine
> (other than some bike-shedding perhaps), but it may be time to change the CS
> submission approach into more of a transaction-with-commit/rollback one.
> This is mostly because the whole dance around
> r600_context_add_resource_size, radeon_add_to_buffer_list and
> radeon_add_to_buffer_list_check_mem just feels very fragile.
>
> Basically, radeon_winsys_cs would have a "committed" pointer into the IB in
> addition to the "current" pointer. Similarly, the buffer list would have a
> "committed" index. Winsys callbacks cs_commit and cs_rollback would do the
> obvious thing of moving the committed pointers to current or the other way
> around. (On second thought, it's probably only the buffer list that needs
> "committed" pointers.)
>
> Then, the buffer handling in the driver code would be simplified as follows:
>
> - As long as the CS already contains some committed draw/DMA call,
> radeon_add_to_buffer_list always checks whether the memory size is exceeded.
> If so, it calls cs_rollback, flushes, and returns a corresponding code.
>
> - There is no need to distinguish radeon_add_to_buffer_list and
> radeon_add_to_buffer_list_check_mem anymore, since all additions are checked
> if and only if there really is something to flush.
>
> - There is no more need for r600_context_add_resource_size. Instead, all
> draw/DMA-related buffers (vertex & index buffers, indirect data, DMA
> buffers) are added via radeon_add_to_buffer_list just before the call to
> si_need_cs_space. If radeon_add_to_buffer_list indicates that a flush
> happened, the sequence is repeated.
>
> - After each draw call, cs_commit is called.
>
> Another, tangentially related issue: we really shouldn't add buffers tied to
> descriptors in the *_begin_new_cs and even the set_* functions; instead, it
> should ideally happen at si_upload_descriptors time. Otherwise, we may end
> up adding buffers to a CS that aren't needed anymore.

Yeah, I agree it's fragile. On the other hand, moving more stuff into
draw_vbo sounds like it would increase CPU overhead.

The radeon winsys can roll back the buffer list with cs_validate.
r300_emit_buffer_validate is an easy-to-read example showing how it
works. It's pretty solid and never goes above the memory limits.

We didn't use that approach for r600 because r600 can use a lot more
buffers and tracking all of them can take a lot of CPU time. For
example, r300 is limited to only 1 active query per draw call, but
later hardware is unlimited.

Initially, the solution was "don't check memory usage", because IB
submissions almost never fail with real apps, but eventually
r600_context_add_resource_size was added as a quick hack to help with
low-memory cards and some synthetic tests.

I'm sceptical that we can have something more robust and not regress
performance, but maybe you are onto something that I'm not able to
see.

Marek


More information about the mesa-dev mailing list