[Mesa-dev] [PATCH 00/16] anv: Handle out of memory situations better

Iago Toral itoral at igalia.com
Thu Mar 9 07:04:29 UTC 2017


On Wed, 2017-03-08 at 15:00 -0800, Jason Ekstrand wrote:
> On Wed, Mar 8, 2017 at 12:24 AM, Iago Toral <itoral at igalia.com>
> wrote:
> > On Tue, 2017-03-07 at 10:31 +0200, Pohjolainen, Topi wrote:
> > > On Tue, Mar 07, 2017 at 08:15:45AM +0100, Iago Toral Quiroga
> > wrote:
> > > >
> > > > There are a number of work-in-progress CTS tests that check OOM
> > > > handling
> > > > and raised a number of issues that this series addresses.
> > > >
> > > > Particularly noteworthy is the case of command buffer
> > recording:
> > > > since the
> > > > various vkCmd*() functions do not return errors, it is expected
> > > > that drivers survive until vkEndCommanBuffer() and report any
> > > > errors
> > > > at that point. This requires that we are particularly careful
> > with
> > > > out of
> > > > memory scenarios, since we need to make sure that vkCmd*()
> > commands
> > > > do not
> > > > attempt to access memory that we have failed to allocate in a
> > > > previous step.
> > > > We achieve this by tracking errors generated during command
> > buffer
> > > > recording
> > > > into the command buffer object and, generally, avoiding to
> > execute
> > > > new
> > > > vkCmd*() functions if the command buffer recording has
> > generated an
> > > > error
> > > > before. In general, I tried to  only guard execution of
> > vkCmd*()
> > > > commands that
> > > > the tests showed needed to be guarded since most of the
> > vkCmd*()
> > > > commands are
> > > > safe to execute in any scenario.
> > > >
> > > > This series fixes all the issues raised with the new tests and
> > a
> > > > few more that
> > > > I found by inspection but I am sure we could do better in more
> > > > places.
> > > > It is a first step though and the tests are still in
> > development.
> > > >
> > > > Iago Toral Quiroga (16):
> > > >   blorp: handle out of memory in blorp_emitn()
> > > >   blorp: handle out of memory without crashing in various batch
> > > >     emissions
> > > These two go sort of half way - blorp_exec() still ignores the
> > return
> > > values of blorp_emit_vertex_buffers() and
> > > blorp_emit_vertex_elements().
> > > Neither does blorp_emit_depth_stencil_state() returning zero make
> > > blorp_exec() to take any further action. 
> > 
> > Right, I focused on avoiding the crashes only because there are a
> > lot
> > of other places where we do the same thing. Specifically, the
> > blorp_emit() macro that we use all over the place, suffers from
> > this
> > same issue: it prevents the crash by avoiding the loop body to
> > execute
> > if we failed to allocate memory, but it does nothing to report the
> > allocation failure back to its callers, so I figured it would not
> > make
> > sense to do something different for blorp_emitn().
> > 
> > I am fine with being more robust here and try to address the
> > problem
> > more thoroughly if you think it makes sense, but if we are going to
> > do
> > that, then I think we would need to look beyond this macro and do a
> > lot
> > more changes. Do you want me to work on that?
> I'm not sure what you mean.  If blorp_emit calls blorp_emit_dwords
> which calls anv_batch_emit_dwords which handles OOM by setting
> anv_batch::status to VK_OUT_OF_DEVICE_MEMORY and returning NULL, then
> that should take care of almost everything.  We would also need
> anv_cmd_buffer_alloc_dynamic_state and
> anv_cmd_buffer_alloc_blorp_binding_table to do the right thing.  I
> really don't think this will require massive BLORP changes.

Ok, I see what you mean. I understood that Topi wanted to have
functions that called blorp_emit macros to handle and return errors to
their callers explicitly, which would've meant to adds checks every
time we called any of these macros. If it is only about making sure
that we check the command buffer status after calling functions that
can emit batches I agree it should be trivial, I'll do that.

Iago

> > In any case, even if we agree on doing this, since these patches
> > put
> > blorp_emitn() at the same level as blorp_emit() and they solve some
> > problems, would it be okay to land this series first and tackle
> > that in
> > a second series?
> > 
> > Iago
> > 
> > > >
> > > >   anv: don't crash if we fail to grow the reloc list
> > > >   anv: handle out of memory situations in
> > anv_batch_emit_dwords()
> > > >   anv: do not try to ref/unref NULL shaders
> > > >   anv/blorp: return early if we failed to create the shader
> > binary
> > > >   anv/cmd_buffer: return a VkResult in
> > cmd_buffer_setup_attachments
> > > >   anv/cmd_buffer: track error scenarios during command buffer
> > > > recording
> > > >   anv: handle out of memory scenarios in anv_batch_emit_batch()
> > > >   anv/cmd_buffer: handle out of memory during
> > vkCmdPushConstants
> > > >   anv: handle out of memory situations during queue submissions
> > > >   anv: ensure that we don't ever try to adjust relocations more
> > > > than
> > > >     once
> > > >   anv/cmd_buffer: skip vkCmdDraw*() on broken command buffers
> > > >   anv/cmd_buffer: skip vkCmdDispatch() on broken command
> > buffers
> > > >   anv/cmd_buffer: skip vkCmdExecuteCommands() on broken command
> > > > buffers
> > > >   anv/cmd_buffer: handle out of device memory during binding
> > table
> > > >     emission
> > > >
> > > >  src/intel/blorp/blorp_genX_exec.h  | 25 ++++++++++-----
> > > >  src/intel/vulkan/anv_batch_chain.c | 45 +++++++++++++++++++---
> > ----
> > > > -
> > > >  src/intel/vulkan/anv_blorp.c       |  3 ++
> > > >  src/intel/vulkan/anv_cmd_buffer.c  | 21 +++++++++++--
> > > >  src/intel/vulkan/anv_private.h     | 19 ++++++++++++
> > > >  src/intel/vulkan/genX_cmd_buffer.c | 63
> > > > ++++++++++++++++++++++++++++++++------
> > > >  6 files changed, 142 insertions(+), 34 deletions(-)
> > > >
> > 


More information about the mesa-dev mailing list