[Intel-gfx] [PATCH 14/15] drm/i915: Async GPU relocation processing
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Fri Mar 17 11:15:33 UTC 2017
On Thu, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> If the user requires patching of their batch or auxiliary buffers, we
> currently make the alterations on the CPU. If they are active on the GPU
> at the time, we wait under the struct_mutex for them to finish executing
> before we rewrite the contents. This happens if shared relocation trees
> are used between different contexts with separate address spaces (the
> buffers then have different addresses in each); the 3D state will need
> to be adjusted between executions on each context. However, we don't need
> to use the CPU to do the relocation patching, as we could queue commands
> to the GPU to perform it, and use fences to serialise the operation with
> the current and future activity - so the operation on the GPU appears
> just as atomic as performing it immediately. Performing the relocation
> rewrites on the GPU is not free: in terms of pure throughput, the number
> of relocations/s is about halved - but, more importantly, so is the time
> spent under the struct_mutex.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
<SNIP>
> /* Must be a variable in the struct to allow GCC to unroll. */
> + cache->gen = INTEL_GEN(i915);
> cache->has_llc = HAS_LLC(i915);
> - cache->has_fence = INTEL_GEN(i915) < 4;
> - cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
> + cache->has_fence = cache->gen < 4;
> + cache->use_64bit_reloc = cache->gen >= 8;
Keep using HAS_64BIT_RELOC(i915) here, rather than open-coding the gen >= 8
check...
> +static u32 *reloc_gpu(struct i915_execbuffer *eb,
> + struct i915_vma *vma,
> + unsigned int len)
> +{
> + struct reloc_cache *cache = &eb->reloc_cache;
> + u32 *cmd;
> +
> + if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))
There's no overflow risk here, so I'd write this as
rq_size + len + 1 > PAGE_SIZE/sizeof(u32).
> + reloc_gpu_flush(cache);
> +
> + if (!cache->rq) {
> + struct drm_i915_gem_object *obj;
> + struct drm_i915_gem_request *rq;
> + struct i915_vma *batch;
> + int err;
> +
> + GEM_BUG_ON(vma->obj->base.write_domain & I915_GEM_DOMAIN_CPU);
Use ==.
I'll just close my eyes for the remainder of this function and expect v2 to
have a proper teardown :P
Also, the vma/obj pair naming differs from what those names usually mean, so
I'd consider renaming one of them to lessen confusion.
> @@ -1012,6 +1148,67 @@ relocate_entry(struct i915_vma *vma,
> bool wide = eb->reloc_cache.use_64bit_reloc;
> void *vaddr;
>
> + if (!eb->reloc_cache.vaddr &&
> + (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
> + !reservation_object_test_signaled_rcu(obj->resv, true))) {
> + const unsigned int gen = eb->reloc_cache.gen;
> + unsigned int len;
> + u32 *batch;
> + u64 addr;
> +
> + if (wide)
> + len = offset & 7 ? 8 : 5;
> + else if (gen >= 4)
> + len = 4;
> + else if (gen >= 3)
> + len = 3;
> + else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
> + goto repeat;
> +
> + batch = reloc_gpu(eb, vma, len);
> + if (IS_ERR(batch))
> + goto repeat;
> +
> + addr = gen8_canonical_addr(vma->node.start + offset);
> + if (wide) {
> + if (offset & 7) {
> + *batch++ = MI_STORE_DWORD_IMM_GEN4;
This indentation level calls for a new function, e.g. __relocate_entry_gpu.
Couldn't we share some of the MI_STORE_DWORD_IMM emission code we already
have around? I don't want to have to crawl the specs every time :(
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation