<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Mar 29, 2017 at 1:51 AM, Chris Wilson <span dir="ltr"><<a href="mailto:chris@chris-wilson.co.uk" target="_blank">chris@chris-wilson.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Mar 28, 2017 at 05:41:12PM -0700, Jason Ekstrand wrote:<br>
> This commit adds support for using the full 48-bit address space on<br>
> Broadwell and newer hardware. Thanks to certain limitations, not all<br>
> objects can be placed above the 32-bit boundary. In particular, general<br>
> and state base address need to live within 32 bits. (See also<br>
> Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order<br>
> to handle this, we add a supports_48bit_address field to anv_bo and only<br>
> set EXEC_OBJECT_SUPPORTS_48B_<wbr>ADDRESS if that bit is set. We set the bit<br>
> for all client-allocated memory objects but leave it false for<br>
> driver-allocated objects. While this is more conservative than needed,<br>
> all driver allocations should easily fit in the first 32 bits of address<br>
> space and keeps things simple because we don't have to think about<br>
> whether or not any given one of our allocation data structures will be<br>
> used in a 48-bit-unsafe way.<br>
> ---<br>
</span><span class="">> static VkResult<br>
> -anv_execbuf_add_bo(struct anv_execbuf *exec,<br>
> +anv_execbuf_add_bo(struct anv_device *device,<br>
> + struct anv_execbuf *exec,<br>
> struct anv_bo *bo,<br>
> struct anv_reloc_list *relocs,<br>
> const VkAllocationCallbacks *alloc)<br>
> @@ -1039,6 +1040,10 @@ anv_execbuf_add_bo(struct anv_execbuf *exec,<br>
> obj->flags = bo->is_winsys_bo ? EXEC_OBJECT_WRITE : 0;<br>
> obj->rsvd1 = 0;<br>
> obj->rsvd2 = 0;<br>
> +<br>
> + if (device->instance-><wbr>physicalDevice.supports_48bit_<wbr>addresses &&<br>
> + bo->supports_48bit_address)<br>
> + obj->flags |= EXEC_OBJECT_SUPPORTS_48B_<wbr>ADDRESS;<br>
<br>
</span>Don't set bo->supports_48bit_address when<br>
!device->instance-><wbr>physicalDevice.supports_48bit_<wbr>addresses? My guess is<br>
that flagging bo is a rarer task than add_bo(), and it looks like you<br>
already have device available in the callers of bo_init(true).<br></blockquote><div><br></div><div>I thought a bout making that change right before I sent it but decided I marginally liked this better. I'm happy to change it if you'd like.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
> }<br>
><br>
> if (relocs != NULL && obj->relocation_count == 0) {<br>
> @@ -1052,7 +1057,7 @@ anv_execbuf_add_bo(struct anv_execbuf *exec,<br>
> for (size_t i = 0; i < relocs->num_relocs; i++) {<br>
> /* A quick sanity check on relocations */<br>
> assert(relocs->relocs[i].<wbr>offset < bo->size);<br>
> - anv_execbuf_add_bo(exec, relocs->reloc_bos[i], NULL, alloc);<br>
> + anv_execbuf_add_bo(device, exec, relocs->reloc_bos[i], NULL, alloc);<br>
> }<br>
> }<br>
><br>
> @@ -1264,7 +1269,8 @@ anv_cmd_buffer_execbuf(struct anv_device *device,<br>
> adjust_relocations_from_state_<wbr>pool(ss_pool, &cmd_buffer->surface_relocs,<br>
> cmd_buffer->last_ss_pool_<wbr>center);<br>
> VkResult result =<br>
> - anv_execbuf_add_bo(&execbuf, &ss_pool->bo, &cmd_buffer->surface_relocs,<br>
> + anv_execbuf_add_bo(device, &execbuf, &ss_pool->bo,<br>
> + &cmd_buffer->surface_relocs,<br>
> &cmd_buffer->pool->alloc);<br>
> if (result != VK_SUCCESS)<br>
> return result;<br>
> @@ -1277,7 +1283,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device,<br>
> adjust_relocations_to_state_<wbr>pool(ss_pool, &(*bbo)->bo, &(*bbo)->relocs,<br>
> cmd_buffer->last_ss_pool_<wbr>center);<br>
><br>
> - anv_execbuf_add_bo(&execbuf, &(*bbo)->bo, &(*bbo)->relocs,<br>
> + anv_execbuf_add_bo(device, &execbuf, &(*bbo)->bo, &(*bbo)->relocs,<br>
> &cmd_buffer->pool->alloc);<br>
> }<br>
><br>
> diff --git a/src/intel/vulkan/anv_device.<wbr>c b/src/intel/vulkan/anv_device.<wbr>c<br>
> index 4e4fa19..f9d04ee 100644<br>
> --- a/src/intel/vulkan/anv_device.<wbr>c<br>
> +++ b/src/intel/vulkan/anv_device.<wbr>c<br>
> @@ -149,6 +149,8 @@ anv_physical_device_init(<wbr>struct anv_physical_device *device,<br>
> goto fail;<br>
> }<br>
><br>
> + device->supports_48bit_<wbr>addresses = anv_gem_supports_48b_<wbr>addresses(fd);<br>
> +<br>
> if (!anv_device_get_cache_uuid(<wbr>device->uuid)) {<br>
> result = vk_errorf(VK_ERROR_<wbr>INITIALIZATION_FAILED,<br>
> "cannot generate UUID");<br>
> @@ -1396,7 +1398,7 @@ anv_bo_init_new(struct anv_bo *bo, struct anv_device *device, uint64_t size)<br>
> if (!gem_handle)<br>
> return vk_error(VK_ERROR_OUT_OF_<wbr>DEVICE_MEMORY);<br>
><br>
> - anv_bo_init(bo, gem_handle, size);<br>
> + anv_bo_init(bo, gem_handle, size, true);<br>
><br>
> return VK_SUCCESS;<br>
> }<br>
> diff --git a/src/intel/vulkan/anv_gem.c b/src/intel/vulkan/anv_gem.c<br>
> index 0dde6d9..3d45243 100644<br>
> --- a/src/intel/vulkan/anv_gem.c<br>
> +++ b/src/intel/vulkan/anv_gem.c<br>
> @@ -301,6 +301,24 @@ anv_gem_get_aperture(int fd, uint64_t *size)<br>
> return 0;<br>
> }<br>
><br>
> +bool<br>
> +anv_gem_supports_48b_<wbr>addresses(int fd)<br>
> +{<br>
> + struct drm_i915_gem_exec_object2 obj = {<br>
> + .flags = EXEC_OBJECT_SUPPORTS_48B_<wbr>ADDRESS,<br>
> + };<br>
> +<br>
> + struct drm_i915_gem_execbuffer2 execbuf = {<br>
> + .buffers_ptr = (uintptr_t)&obj,<br>
> + .buffer_count = 1,<br>
> + .rsvd1 = 0xffffffu,<br>
> + };<br>
> +<br>
> + int ret = anv_ioctl(fd, DRM_IOCTL_I915_GEM_<wbr>EXECBUFFER2, &execbuf);<br>
> +<br>
> + return ret == -1 && errno == ENOENT;<br>
> +}<br>
> +<br>
> int<br>
> anv_gem_handle_to_fd(struct anv_device *device, uint32_t gem_handle)<br>
> {<br>
> diff --git a/src/intel/vulkan/anv_intel.c b/src/intel/vulkan/anv_intel.c<br>
> index c356e84..d97fbf7 100644<br>
> --- a/src/intel/vulkan/anv_intel.c<br>
> +++ b/src/intel/vulkan/anv_intel.c<br>
> @@ -57,7 +57,7 @@ VkResult anv_CreateDmaBufImageINTEL(<br>
><br>
> uint64_t size = (uint64_t)pCreateInfo-><wbr>strideInBytes * pCreateInfo->extent.height;<br>
><br>
> - anv_bo_init(&mem->bo, gem_handle, size);<br>
> + anv_bo_init(&mem->bo, gem_handle, size, true);<br>
><br>
> anv_image_create(_device,<br>
> &(struct anv_image_create_info) {<br>
> diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
> index 27c887c..425e376 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> @@ -299,11 +299,34 @@ struct anv_bo {<br>
> * writing to them and synchronize uses on other rings (eg if the display<br>
> * server uses the blitter ring).<br>
> */<br>
> - bool is_winsys_bo;<br>
> + bool is_winsys_bo:1;<br>
> +<br>
> + /* Whether or not this BO supports having a 48-bit address. Not all<br>
> + * buffers support arbitrary 48-bit addresses. In particular, we need to<br>
> + * be careful with general and instruction state buffers because we set the<br>
> + * size in STATE_BASE_ADDRESS to 0xfffff (the maximum) even though the BO<br>
> + * is most likely significantly smaller. If we let the kernel place it<br>
> + * anywhere it wants, it will default to placing it as high up the address<br>
> + * space as possible, the range specified by STATE_BASE_ADDRESS will<br>
> + * over-flow the 48-bit address range, and the GPU will hang. In order to<br>
> + * avoid this problem, we tell the kernel that the buffer does not support<br>
> + * 48-bit addresses, and it places the buffer at a 32-bit address. While<br>
> + * this solution is probably overkill, it is effective.<br>
<br>
</div></div>How about just setting the field to the bo->size? You must know the bo<br>
already at that point so that you can set the relocation target.<br></blockquote><div><br></div><div>Actually, we don't. We have a pointer to a thing that claims to be a BO but the actual GEM handle and size aren't known until execbuf time. (Yes, that's a bit weird but there are good reasons for it and it's not likely to change. When we stop doing relocations, there's a separate plan for how to handle that.)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Or you could allocate the sparse trtt in the high range and the kernel<br>
will then never be able to place objects high enough to overflow. I<br>
presume the overflow of the bo base + limit into the trtt will not cause<br>
similar issues...<span class="HOEnZb"><font color="#888888"><br>
</font></span></blockquote></div><br></div><div class="gmail_extra">As far as I can tell, overflowing the GTT is the only issue. However, I can't claim my testing was exhaustive.<br><br></div><div class="gmail_extra">--Jason<br></div></div>