[Mesa-dev] [PATCH] anv: don't crash when creating a huge image

Jason Ekstrand jason at jlekstrand.net
Fri Nov 10 19:16:18 UTC 2017


On Fri, Nov 10, 2017 at 10:48 AM, Chad Versace <chadversary at chromium.org>
wrote:

> On Thu 09 Nov 2017, Jason Ekstrand wrote:
> > On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace <[1]
> chadversary at chromium.org>
> > wrote:
> >
> >     On Wed 08 Nov 2017, Jason Ekstrand wrote:
> >     > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez <[1]
> >     > [2]siglesias at igalia.com> wrote:
> >     >
> >     >     The HW has some limits but, according to the spec, we can
> create
> >     >     the image as it has not yet any memory backing it. When we
> allocate
> >     >     that memory, then we fail following what Vulkan spec, "10.2.
> Device
> >     >     Memory" says when talking about vkAllocateMemory():
> >     >
> >     >     "Some platforms may have a limit on the maximum size of a
> single
> >     >      allocation. For example, certain systems may fail to create
> >     >      allocations with a size greater than or equal to 4GB. Such a
> limit
> >     is
> >     >      implementation-dependent, and if such a failure occurs then
> the
> >     error
> >     >      VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> >     >
> >     >     Fixes the crashes on BDW for the following tests:
> >     >
> >     >     dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> >     >     dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> >     >
> >     >     Signed-off-by: Samuel Iglesias Gonsálvez <[2][3]
> siglesias at igalia.com>
> >     >     ---
> >     >
> >     >     Jason, I was tempted to move this piece of code to
> anv_AllocateMemory
> >     ()
> >     >     but then I found the kernel relocation limitation of 32-bit...
> Is
> >     that
> >     >     limitation still applicable? Or was it from the BDW age and we
> forgot
> >     >     to update that limitation for gen9+?
> >     >
> >     >
> >     > We're still using relocations on all hardware so it applies to
> everything
> >     > today.  One of my 2018 projects is to fix that and get rid of
> relocations
> >     on
> >     > gen8+.
> >     >
> >     >
> >     >     Sam
> >     >
> >     >      src/intel/isl/isl.c | 22 ----------------------
> >     >      1 file changed, 22 deletions(-)
> >     >
> >     >     diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> >     >     index 59f512fc050..aaadcbaf991 100644
> >     >     --- a/src/intel/isl/isl.c
> >     >     +++ b/src/intel/isl/isl.c
> >     >     @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct isl_device
> *dev,
> >     >            base_alignment = MAX(info->min_alignment, tile_size);
> >     >         }
> >     >
> >     >     -   if (ISL_DEV_GEN(dev) < 9) {
> >     >     -      /* From the Broadwell PRM Vol 5, Surface Layout:
> >     >     -       *
> >     >     -       *    "In addition to restrictions on maximum height,
> width,
> >     and
> >     >     depth,
> >     >     -       *     surfaces are also restricted to a maximum size in
> >     bytes. This
> >     >     -       *     maximum is 2 GB for all products and all surface
> >     types."
> >     >     -       *
> >     >     -       * This comment is applicable to all Pre-gen9 platforms.
> >     >     -       */
> >     >     -      if (size > (uint64_t) 1 << 31)
> >     >     -         return false;
> >     >     -   } else {
> >     >     -      /* From the Skylake PRM Vol 5, Maximum Surface Size in
> Bytes:
> >     >     -       *    "In addition to restrictions on maximum height,
> width,
> >     and
> >     >     depth,
> >     >     -       *     surfaces are also restricted to a maximum size
> of 2^38
> >     bytes.
> >     >     -       *     All pixels within the surface must be contained
> within
> >     2^38
> >     >     bytes
> >     >     -       *     of the base address."
> >     >     -       */
> >     >     -      if (size > (uint64_t) 1 << 38)
> >     >     -         return false;
> >     >     -   }
> >
> >     I think it very unwise to delete code that enforces requirements
> defined
> >     by the hardware spec. Deleting the code doesn't make the hardware
> >     requirements go away :)
> >
> >     > I'm not sure how I feel about removing this from ISL.  There are
> really
> >     two
> >     > limitations going on here.  One is a limitation imposed by
> relocations,
> >     and the
> >     > other is some sort of fundamental hardware surface size
> limitation.  Most
> >     > likely, the surface size limitation has to do with how many bits
> they use
> >     for
> >     > image address computations in the hardware.  Most likely, on gen8,
> they
> >     do all
> >     > of the internal calculations in 32 bits and only convert to 48 at
> the end
> >     when
> >     > they need to add it to Surface Base Address.
> >     >
> >     > If my understanding is correct then we will still have this
> limitation on
> >     gen8
> >     > even after we get rid of relocations and remove the BO size
> limitation.
> >     I see
> >     > a couple of options, neither of which I like very much:
> >     >
> >     >  1) Take something like this patch and then keep the BO size
> limitation
> >     on BDW
> >     > to 2GiB when we get rid of relocations even though it's artificial.
> >     >  2) "Gracefully" handle isl_surf_init failure by doing a debug_log
> and
> >     > succeeding but setting the image size (that will be returned by
> >     > vkGetImageMemoryRequirements) to UINT64_MAX so that the client
> won't ever
> >     be
> >     > able to find memory for it.
> >     >
> >     > My feeling is that 1) above is probably the better of the two as
> 2) seems
> >     to be
> >     > a twisting of the spec.  That said, I would like to keep the
> restriction
> >     in ISL
> >     > somehow and we need to make sure it still gets applied in GL.
> >
> >     I dislike both. I originally designed isl to mimic the VkImage API,
> so
> >     let's continue that trend.
> >
> >       Option 3) Change isl_surf_init() to return a meaningful result
> code:
> >
> >                 ISL_SUCCESS = 0
> >                 ISL_ERROR_SOMETHING_SOMETHING_THE_USUAL_FAILURES = -1
> >                 ISL_ERROR_SURFACE_SIZE_TOO_LARGE = -2
> >
> >     I like option 3 because it avoids secret implicit contracts between
> isl
> >     and anvil, and thus avoids hidden hacks.
> >
> >
> > I mostly agree but that still doesn't answer the question of what do we
> do with
> > that return code?  We can't fail the vkCreateImage because our only
> options
> > there are the two OOM errors both of which are lies.  The only real
> option we
> > have is to go ahead and create the VkImage but give it a size that
> doesn't fit
> > in any allocatable memory object.  So, we're back to options 1 and 2...
>
> Yeah, you're right.
>
> The Vulkan spec has strange language regarding
> VkImageFormatProperties::maxResourceSize, which gives the implementatin
> two options:
>
>     NOTE
>     There is no mechanism to query the size of an image before creating it,
>     to compare that size against maxResourceSize. If an application
> attempts
>     to create an image that exceeds this limit, the creation will *fail* or
>     the image will be *invalid*. While the advertised limit must be at
> least
>     2^31, it may not be possible to create an image that approaches that
>     size, particularly for VK_IMAGE_TYPE_1D.
>
>     [Emphasis on *fail* and *invalid* are mine].
>
> On the choice to fail:
>     Like you point out, vkCreateImage has no appropriate result code for
>     this failure.
>
> On the choice to return an invalid image:
>     The spec is silent on what should happen when vkCreateImage succeeds
>     and returns an invalid image. The spec never refers to an invalid
>     image (or any invalid resource) except in this paragraph.
>
>     But I think setting anv_image::size = UINT64_MAX complies with the
>     spec, because it prevents the app from binding the image to memory.
>
> So I vote for option 2.
>

Sounds good to me.  In the mean time, I filed a spec bug with more-or-less
the contents of your e-mail:  It's internal issue 1078 if you're
interested.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171110/82c17377/attachment.html>


More information about the mesa-dev mailing list