[Mesa-dev] [PATCH] anv: don't crash when creating a huge image

Chad Versace chadversary at chromium.org
Fri Nov 10 18:48:10 UTC 2017


On Thu 09 Nov 2017, Jason Ekstrand wrote:
> On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace <[1]chadversary at chromium.org>
> wrote:
> 
>     On Wed 08 Nov 2017, Jason Ekstrand wrote:
>     > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez <[1]
>     > [2]siglesias at igalia.com> wrote:
>     >
>     >     The HW has some limits but, according to the spec, we can create
>     >     the image as it has not yet any memory backing it. When we allocate
>     >     that memory, then we fail following what Vulkan spec, "10.2. Device
>     >     Memory" says when talking about vkAllocateMemory():
>     >
>     >     "Some platforms may have a limit on the maximum size of a single
>     >      allocation. For example, certain systems may fail to create
>     >      allocations with a size greater than or equal to 4GB. Such a limit
>     is
>     >      implementation-dependent, and if such a failure occurs then the
>     error
>     >      VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
>     >
>     >     Fixes the crashes on BDW for the following tests:
>     >
>     >     dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
>     >     dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
>     >
>     >     Signed-off-by: Samuel Iglesias Gonsálvez <[2][3]siglesias at igalia.com>
>     >     ---
>     >
>     >     Jason, I was tempted to move this piece of code to anv_AllocateMemory
>     ()
>     >     but then I found the kernel relocation limitation of 32-bit... Is
>     that
>     >     limitation still applicable? Or was it from the BDW age and we forgot
>     >     to update that limitation for gen9+?
>     >
>     >
>     > We're still using relocations on all hardware so it applies to everything
>     > today.  One of my 2018 projects is to fix that and get rid of relocations
>     on
>     > gen8+.
>     >  
>     >
>     >     Sam
>     >
>     >      src/intel/isl/isl.c | 22 ----------------------
>     >      1 file changed, 22 deletions(-)
>     >
>     >     diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
>     >     index 59f512fc050..aaadcbaf991 100644
>     >     --- a/src/intel/isl/isl.c
>     >     +++ b/src/intel/isl/isl.c
>     >     @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct isl_device *dev,
>     >            base_alignment = MAX(info->min_alignment, tile_size);
>     >         }
>     >
>     >     -   if (ISL_DEV_GEN(dev) < 9) {
>     >     -      /* From the Broadwell PRM Vol 5, Surface Layout:
>     >     -       *
>     >     -       *    "In addition to restrictions on maximum height, width,
>     and
>     >     depth,
>     >     -       *     surfaces are also restricted to a maximum size in
>     bytes. This
>     >     -       *     maximum is 2 GB for all products and all surface
>     types."
>     >     -       *
>     >     -       * This comment is applicable to all Pre-gen9 platforms.
>     >     -       */
>     >     -      if (size > (uint64_t) 1 << 31)
>     >     -         return false;
>     >     -   } else {
>     >     -      /* From the Skylake PRM Vol 5, Maximum Surface Size in Bytes:
>     >     -       *    "In addition to restrictions on maximum height, width,
>     and
>     >     depth,
>     >     -       *     surfaces are also restricted to a maximum size of 2^38
>     bytes.
>     >     -       *     All pixels within the surface must be contained within
>     2^38
>     >     bytes
>     >     -       *     of the base address."
>     >     -       */
>     >     -      if (size > (uint64_t) 1 << 38)
>     >     -         return false;
>     >     -   }
> 
>     I think it very unwise to delete code that enforces requirements defined
>     by the hardware spec. Deleting the code doesn't make the hardware
>     requirements go away :)
>    
>     > I'm not sure how I feel about removing this from ISL.  There are really
>     two
>     > limitations going on here.  One is a limitation imposed by relocations,
>     and the
>     > other is some sort of fundamental hardware surface size limitation.  Most
>     > likely, the surface size limitation has to do with how many bits they use
>     for
>     > image address computations in the hardware.  Most likely, on gen8, they
>     do all
>     > of the internal calculations in 32 bits and only convert to 48 at the end
>     when
>     > they need to add it to Surface Base Address.
>     >
>     > If my understanding is correct then we will still have this limitation on
>     gen8
>     > even after we get rid of relocations and remove the BO size limitation. 
>     I see
>     > a couple of options, neither of which I like very much:
>     >
>     >  1) Take something like this patch and then keep the BO size limitation
>     on BDW
>     > to 2GiB when we get rid of relocations even though it's artificial.
>     >  2) "Gracefully" handle isl_surf_init failure by doing a debug_log and
>     > succeeding but setting the image size (that will be returned by
>     > vkGetImageMemoryRequirements) to UINT64_MAX so that the client won't ever
>     be
>     > able to find memory for it.
>     >
>     > My feeling is that 1) above is probably the better of the two as 2) seems
>     to be
>     > a twisting of the spec.  That said, I would like to keep the restriction
>     in ISL
>     > somehow and we need to make sure it still gets applied in GL.
> 
>     I dislike both. I originally designed isl to mimic the VkImage API, so
>     let's continue that trend.
> 
>       Option 3) Change isl_surf_init() to return a meaningful result code:
> 
>                 ISL_SUCCESS = 0
>                 ISL_ERROR_SOMETHING_SOMETHING_THE_USUAL_FAILURES = -1
>                 ISL_ERROR_SURFACE_SIZE_TOO_LARGE = -2
> 
>     I like option 3 because it avoids secret implicit contracts between isl
>     and anvil, and thus avoids hidden hacks.
> 
> 
> I mostly agree but that still doesn't answer the question of what do we do with
> that return code?  We can't fail the vkCreateImage because our only options
> there are the two OOM errors both of which are lies.  The only real option we
> have is to go ahead and create the VkImage but give it a size that doesn't fit
> in any allocatable memory object.  So, we're back to options 1 and 2...

Yeah, you're right.

The Vulkan spec has strange language regarding
VkImageFormatProperties::maxResourceSize, which gives the implementatin
two options:

    NOTE
    There is no mechanism to query the size of an image before creating it,
    to compare that size against maxResourceSize. If an application attempts
    to create an image that exceeds this limit, the creation will *fail* or
    the image will be *invalid*. While the advertised limit must be at least
    2^31, it may not be possible to create an image that approaches that
    size, particularly for VK_IMAGE_TYPE_1D.

    [Emphasis on *fail* and *invalid* are mine].

On the choice to fail:
    Like you point out, vkCreateImage has no appropriate result code for
    this failure.

On the choice to return an invalid image:
    The spec is silent on what should happen when vkCreateImage succeeds
    and returns an invalid image. The spec never refers to an invalid
    image (or any invalid resource) except in this paragraph.

    But I think setting anv_image::size = UINT64_MAX complies with the
    spec, because it prevents the app from binding the image to memory.

So I vote for option 2.


More information about the mesa-dev mailing list