[Mesa-dev] [PATCH] anv: don't crash when creating a huge image
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Fri Nov 10 05:59:03 UTC 2017
On Thu, 2017-11-09 at 16:34 -0800, Jason Ekstrand wrote:
> On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace <chadversary at chromium.or
> g>
> wrote:
>
> > On Wed 08 Nov 2017, Jason Ekstrand wrote:
> > > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez <[1]
> > > siglesias at igalia.com> wrote:
> > >
> > > The HW has some limits but, according to the spec, we can
> > > create
> > > the image as it has not yet any memory backing it. When we
> > > allocate
> > > that memory, then we fail following what Vulkan spec, "10.2.
> > > Device
> > > Memory" says when talking about vkAllocateMemory():
> > >
> > > "Some platforms may have a limit on the maximum size of a
> > > single
> > > allocation. For example, certain systems may fail to create
> > > allocations with a size greater than or equal to 4GB. Such a
> > > limit
> >
> > is
> > > implementation-dependent, and if such a failure occurs then
> > > the
> >
> > error
> > > VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> > >
> > > Fixes the crashes on BDW for the following tests:
> > >
> > > dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> > > dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> > >
> > > Signed-off-by: Samuel Iglesias Gonsálvez <[2]siglesias at igalia
> > > .com>
> > > ---
> > >
> > > Jason, I was tempted to move this piece of code to
> >
> > anv_AllocateMemory()
> > > but then I found the kernel relocation limitation of 32-
> > > bit... Is
> >
> > that
> > > limitation still applicable? Or was it from the BDW age and
> > > we forgot
> > > to update that limitation for gen9+?
> > >
> > >
> > > We're still using relocations on all hardware so it applies to
> > > everything
> > > today. One of my 2018 projects is to fix that and get rid of
> >
> > relocations on
> > > gen8+.
> > >
> > >
> > > Sam
> > >
> > > src/intel/isl/isl.c | 22 ----------------------
> > > 1 file changed, 22 deletions(-)
> > >
> > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > > index 59f512fc050..aaadcbaf991 100644
> > > --- a/src/intel/isl/isl.c
> > > +++ b/src/intel/isl/isl.c
> > > @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct
> > > isl_device *dev,
> > > base_alignment = MAX(info->min_alignment, tile_size);
> > > }
> > >
> > > - if (ISL_DEV_GEN(dev) < 9) {
> > > - /* From the Broadwell PRM Vol 5, Surface Layout:
> > > - *
> > > - * "In addition to restrictions on maximum height,
> > > width,
> >
> > and
> > > depth,
> > > - * surfaces are also restricted to a maximum size
> > > in
> >
> > bytes. This
> > > - * maximum is 2 GB for all products and all
> > > surface
> >
> > types."
> > > - *
> > > - * This comment is applicable to all Pre-gen9
> > > platforms.
> > > - */
> > > - if (size > (uint64_t) 1 << 31)
> > > - return false;
> > > - } else {
> > > - /* From the Skylake PRM Vol 5, Maximum Surface Size in
> > > Bytes:
> > > - * "In addition to restrictions on maximum height,
> > > width,
> >
> > and
> > > depth,
> > > - * surfaces are also restricted to a maximum size
> > > of 2^38
> >
> > bytes.
> > > - * All pixels within the surface must be contained
> > > within
> >
> > 2^38
> > > bytes
> > > - * of the base address."
> > > - */
> > > - if (size > (uint64_t) 1 << 38)
> > > - return false;
> > > - }
> >
> > I think it very unwise to delete code that enforces requirements
> > defined
> > by the hardware spec. Deleting the code doesn't make the hardware
> > requirements go away :)
> >
The idea was to move that code to another place, hence my question out
of the commit log message :-)
> > > I'm not sure how I feel about removing this from ISL. There are
> > > really
> >
> > two
> > > limitations going on here. One is a limitation imposed by
> > > relocations,
> >
> > and the
> > > other is some sort of fundamental hardware surface size
> > > limitation. Most
> > > likely, the surface size limitation has to do with how many bits
> > > they
> >
> > use for
> > > image address computations in the hardware. Most likely, on
> > > gen8, they
> >
> > do all
> > > of the internal calculations in 32 bits and only convert to 48 at
> > > the
> >
> > end when
> > > they need to add it to Surface Base Address.
> > >
> > > If my understanding is correct then we will still have this
> > > limitation
> >
> > on gen8
> > > even after we get rid of relocations and remove the BO size
> > > limitation.
> >
> > I see
> > > a couple of options, neither of which I like very much:
> > >
> > > 1) Take something like this patch and then keep the BO size
> > > limitation
> >
> > on BDW
> > > to 2GiB when we get rid of relocations even though it's
> > > artificial.
> > > 2) "Gracefully" handle isl_surf_init failure by doing a
> > > debug_log and
> > > succeeding but setting the image size (that will be returned by
> > > vkGetImageMemoryRequirements) to UINT64_MAX so that the client
> > > won't
> >
> > ever be
> > > able to find memory for it.
> > >
> > > My feeling is that 1) above is probably the better of the two as
> > > 2)
> >
> > seems to be
> > > a twisting of the spec. That said, I would like to keep the
> > > restriction
> >
> > in ISL
> > > somehow and we need to make sure it still gets applied in GL.
> >
> > I dislike both. I originally designed isl to mimic the VkImage API,
> > so
> > let's continue that trend.
> >
> > Option 3) Change isl_surf_init() to return a meaningful result
> > code:
> >
> > ISL_SUCCESS = 0
> > ISL_ERROR_SOMETHING_SOMETHING_THE_USUAL_FAILURES = -1
> > ISL_ERROR_SURFACE_SIZE_TOO_LARGE = -2
> >
> > I like option 3 because it avoids secret implicit contracts between
> > isl
> > and anvil, and thus avoids hidden hacks.
> >
>
> I mostly agree but that still doesn't answer the question of what do
> we do
> with that return code? We can't fail the vkCreateImage because our
> only
> options there are the two OOM errors both of which are lies. The
> only real
> option we have is to go ahead and create the VkImage but give it a
> size
> that doesn't fit in any allocatable memory object. So, we're back to
> options 1 and 2...
I'm going to implement option 1) first and then we will decide if we
like it or we go to option 2).
Sam
More information about the mesa-dev
mailing list