[Mesa-dev] [PATCH] anv: don't crash when creating a huge image
Jason Ekstrand
jason at jlekstrand.net
Fri Nov 10 19:16:18 UTC 2017
On Fri, Nov 10, 2017 at 10:48 AM, Chad Versace <chadversary at chromium.org>
wrote:
> On Thu 09 Nov 2017, Jason Ekstrand wrote:
> > On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace <[1]
> chadversary at chromium.org>
> > wrote:
> >
> > On Wed 08 Nov 2017, Jason Ekstrand wrote:
> > > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez <[1]
> > > [2]siglesias at igalia.com> wrote:
> > >
> > > The HW has some limits but, according to the spec, we can
> create
> > > the image as it has not yet any memory backing it. When we
> allocate
> > > that memory, then we fail following what Vulkan spec, "10.2.
> Device
> > > Memory" says when talking about vkAllocateMemory():
> > >
> > > "Some platforms may have a limit on the maximum size of a
> single
> > > allocation. For example, certain systems may fail to create
> > > allocations with a size greater than or equal to 4GB. Such a
> limit
> > is
> > > implementation-dependent, and if such a failure occurs then
> the
> > error
> > > VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> > >
> > > Fixes the crashes on BDW for the following tests:
> > >
> > > dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> > > dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> > >
> > > Signed-off-by: Samuel Iglesias Gonsálvez <[2][3]
> siglesias at igalia.com>
> > > ---
> > >
> > > Jason, I was tempted to move this piece of code to
> anv_AllocateMemory
> > ()
> > > but then I found the kernel relocation limitation of 32-bit...
> Is
> > that
> > > limitation still applicable? Or was it from the BDW age and we
> forgot
> > > to update that limitation for gen9+?
> > >
> > >
> > > We're still using relocations on all hardware so it applies to
> everything
> > > today. One of my 2018 projects is to fix that and get rid of
> relocations
> > on
> > > gen8+.
> > >
> > >
> > > Sam
> > >
> > > src/intel/isl/isl.c | 22 ----------------------
> > > 1 file changed, 22 deletions(-)
> > >
> > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > > index 59f512fc050..aaadcbaf991 100644
> > > --- a/src/intel/isl/isl.c
> > > +++ b/src/intel/isl/isl.c
> > > @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct isl_device
> *dev,
> > > base_alignment = MAX(info->min_alignment, tile_size);
> > > }
> > >
> > > - if (ISL_DEV_GEN(dev) < 9) {
> > > - /* From the Broadwell PRM Vol 5, Surface Layout:
> > > - *
> > > - * "In addition to restrictions on maximum height,
> width,
> > and
> > > depth,
> > > - * surfaces are also restricted to a maximum size in
> > bytes. This
> > > - * maximum is 2 GB for all products and all surface
> > types."
> > > - *
> > > - * This comment is applicable to all Pre-gen9 platforms.
> > > - */
> > > - if (size > (uint64_t) 1 << 31)
> > > - return false;
> > > - } else {
> > > - /* From the Skylake PRM Vol 5, Maximum Surface Size in
> Bytes:
> > > - * "In addition to restrictions on maximum height,
> width,
> > and
> > > depth,
> > > - * surfaces are also restricted to a maximum size
> of 2^38
> > bytes.
> > > - * All pixels within the surface must be contained
> within
> > 2^38
> > > bytes
> > > - * of the base address."
> > > - */
> > > - if (size > (uint64_t) 1 << 38)
> > > - return false;
> > > - }
> >
> > I think it very unwise to delete code that enforces requirements
> defined
> > by the hardware spec. Deleting the code doesn't make the hardware
> > requirements go away :)
> >
> > > I'm not sure how I feel about removing this from ISL. There are
> really
> > two
> > > limitations going on here. One is a limitation imposed by
> relocations,
> > and the
> > > other is some sort of fundamental hardware surface size
> limitation. Most
> > > likely, the surface size limitation has to do with how many bits
> they use
> > for
> > > image address computations in the hardware. Most likely, on gen8,
> they
> > do all
> > > of the internal calculations in 32 bits and only convert to 48 at
> the end
> > when
> > > they need to add it to Surface Base Address.
> > >
> > > If my understanding is correct then we will still have this
> limitation on
> > gen8
> > > even after we get rid of relocations and remove the BO size
> limitation.
> > I see
> > > a couple of options, neither of which I like very much:
> > >
> > > 1) Take something like this patch and then keep the BO size
> limitation
> > on BDW
> > > to 2GiB when we get rid of relocations even though it's artificial.
> > > 2) "Gracefully" handle isl_surf_init failure by doing a debug_log
> and
> > > succeeding but setting the image size (that will be returned by
> > > vkGetImageMemoryRequirements) to UINT64_MAX so that the client
> won't ever
> > be
> > > able to find memory for it.
> > >
> > > My feeling is that 1) above is probably the better of the two as
> 2) seems
> > to be
> > > a twisting of the spec. That said, I would like to keep the
> restriction
> > in ISL
> > > somehow and we need to make sure it still gets applied in GL.
> >
> > I dislike both. I originally designed isl to mimic the VkImage API,
> so
> > let's continue that trend.
> >
> > Option 3) Change isl_surf_init() to return a meaningful result
> code:
> >
> > ISL_SUCCESS = 0
> > ISL_ERROR_SOMETHING_SOMETHING_THE_USUAL_FAILURES = -1
> > ISL_ERROR_SURFACE_SIZE_TOO_LARGE = -2
> >
> > I like option 3 because it avoids secret implicit contracts between
> isl
> > and anvil, and thus avoids hidden hacks.
> >
> >
> > I mostly agree but that still doesn't answer the question of what do we
> do with
> > that return code? We can't fail the vkCreateImage because our only
> options
> > there are the two OOM errors both of which are lies. The only real
> option we
> > have is to go ahead and create the VkImage but give it a size that
> doesn't fit
> > in any allocatable memory object. So, we're back to options 1 and 2...
>
> Yeah, you're right.
>
> The Vulkan spec has strange language regarding
> VkImageFormatProperties::maxResourceSize, which gives the implementatin
> two options:
>
> NOTE
> There is no mechanism to query the size of an image before creating it,
> to compare that size against maxResourceSize. If an application
> attempts
> to create an image that exceeds this limit, the creation will *fail* or
> the image will be *invalid*. While the advertised limit must be at
> least
> 2^31, it may not be possible to create an image that approaches that
> size, particularly for VK_IMAGE_TYPE_1D.
>
> [Emphasis on *fail* and *invalid* are mine].
>
> On the choice to fail:
> Like you point out, vkCreateImage has no appropriate result code for
> this failure.
>
> On the choice to return an invalid image:
> The spec is silent on what should happen when vkCreateImage succeeds
> and returns an invalid image. The spec never refers to an invalid
> image (or any invalid resource) except in this paragraph.
>
> But I think setting anv_image::size = UINT64_MAX complies with the
> spec, because it prevents the app from binding the image to memory.
>
> So I vote for option 2.
>
Sounds good to me. In the mean time, I filed a spec bug with more-or-less
the contents of your e-mail: It's internal issue 1078 if you're
interested.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171110/82c17377/attachment.html>
More information about the mesa-dev
mailing list