[Mesa-dev] [PATCH 4/4] i965: Drop non-LLC lunacy in the program cache code.
Chris Wilson
chris at chris-wilson.co.uk
Wed Jul 12 09:40:43 UTC 2017
Quoting Kenneth Graunke (2017-07-12 08:22:25)
> The non-LLC story was a horror show. We uploaded data via pwrite
> (drm_intel_bo_subdata), which would stall if the cache BO was in
> use (being read) by the GPU. Obviously, we wanted to avoid that.
> So, we tried to detect whether the buffer was busy, and if so, we'd
> allocate a new BO, map the old one read-only (hopefully not stalling),
> copy all shaders compiled since the dawn of time to the new buffer,
> upload our new one, toss the old BO, and let the state upload code
> know that our program cache BO changed. This was a lot of extra data
> copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
> STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.
>
> Not only that, but our rudimentary busy tracking consisted of a flag
> set at execbuf time, and not cleared until we threw out the program
> cache BO. So, the first shader upload after any drawing would hit this
> "abandon the cache and start over" copying path.
>
> None of this is necessary - it's just ancient crufty code. We can
> use the same persistent mapping paths on all platforms. On non-LLC
> platforms, this should use a write combining map, which should be
> decently fast. (On ancient kernels, this will fall through to an
> uncached GTT map, which will be less efficient, but...upgrade your
> kernel, seriously...)
>
> This is not only better, but the code is significantly simpler.
Another on the insta-kill list is the handling of !llc batches.
> brw_alloc_item_data(struct brw_cache *cache, uint32_t size)
> {
> uint32_t offset;
> - struct brw_context *brw = cache->brw;
>
> /* Allocate space in the cache BO for our new program. */
> if (cache->next_offset + size > cache->bo->size) {
> @@ -301,14 +278,6 @@ brw_alloc_item_data(struct brw_cache *cache, uint32_t size)
> brw_cache_new_bo(cache, new_size);
> }
>
> - /* If we would block on writing to an in-use program BO, just
> - * recreate it.
> - */
> - if (!brw->has_llc && cache->bo_used_by_gpu) {
> - perf_debug("Copying busy program cache buffer.\n");
> - brw_cache_new_bo(cache, cache->bo->size);
> - }
cache->bo_used_by_gpu is no longer used.
> -
> offset = cache->next_offset;
>
> /* Programs are always 64-byte aligned, so set up the next one now */
> @@ -346,7 +315,6 @@ brw_upload_cache(struct brw_cache *cache,
> uint32_t *out_offset,
> void *out_aux)
> {
> - struct brw_context *brw = cache->brw;
> struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
> const struct brw_cache_item *matching_data =
> brw_lookup_prog(cache, cache_id, data, data_size);
> @@ -373,11 +341,7 @@ brw_upload_cache(struct brw_cache *cache,
> item->offset = brw_alloc_item_data(cache, data_size);
>
> /* Copy data to the buffer */
> - if (brw->has_llc) {
> - memcpy(cache->map + item->offset, data, data_size);
> - } else {
> - brw_bo_subdata(cache->bo, item->offset, data_size, data);
> - }
> + memcpy(cache->map + item->offset, data, data_size);
> }
>
> /* Set up the memory containing the key and aux_data */
> @@ -416,8 +380,8 @@ brw_init_caches(struct brw_context *brw)
> cache->bo = brw_bo_alloc(brw->bufmgr, "program cache", 4096, 64);
> if (can_do_exec_capture(brw->screen))
> cache->bo->kflags = EXEC_OBJECT_CAPTURE;
> - if (brw->has_llc)
> - cache->map = brw_bo_map(brw, cache->bo, MAP_READ | MAP_WRITE | MAP_ASYNC);
> +
> + cache->map = brw_bo_map(brw, cache->bo, MAP_READ | MAP_WRITE | MAP_ASYNC);
MAP_PERSISTENT for completeness?
-Chris
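
For readers following the thread, here is a minimal sketch of the upload path the patch converges on: one long-lived mapping, a 64-byte-aligned bump allocator, and a plain memcpy. It is not the i965 code. Heap memory stands in for the mapped BO, every toy_* name is hypothetical, and the grow-by-doubling-and-copying step is an assumption about what brw_cache_new_bo() does when the buffer is full.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the program cache. Heap memory plays the
 * role of the persistently mapped BO; no GPU or kernel is involved. */
struct toy_cache {
   uint8_t *map;          /* valid until the buffer is replaced */
   uint32_t size;         /* current buffer size in bytes */
   uint32_t next_offset;  /* where the next program will be placed */
};

#define TOY_ALIGN(v, a) (((v) + (a) - 1) & ~((uint32_t)(a) - 1))

/* size must be non-zero. Returns 1 on success, 0 on allocation failure. */
static int
toy_cache_init(struct toy_cache *cache, uint32_t size)
{
   cache->map = calloc(1, size);
   cache->size = size;
   cache->next_offset = 0;
   return cache->map != NULL;
}

/* Reserve space for a new program. The buffer is replaced only when it
 * is actually full, never because it might be busy. */
static uint32_t
toy_alloc_item_data(struct toy_cache *cache, uint32_t size)
{
   if (cache->next_offset + size > cache->size) {
      uint32_t new_size = cache->size * 2;
      while (cache->next_offset + size > new_size)
         new_size *= 2;

      /* Assumed behaviour: allocate a larger buffer and copy the old
       * contents over so existing offsets stay valid. */
      uint8_t *new_map = calloc(1, new_size);
      assert(new_map != NULL);
      memcpy(new_map, cache->map, cache->next_offset);
      free(cache->map);
      cache->map = new_map;
      cache->size = new_size;
   }

   uint32_t offset = cache->next_offset;

   /* Programs are always 64-byte aligned, so set up the next one now. */
   cache->next_offset = TOY_ALIGN(offset + size, 64);
   return offset;
}

/* Single copy path for every platform: write through the mapping. */
static uint32_t
toy_upload(struct toy_cache *cache, const void *data, uint32_t size)
{
   uint32_t offset = toy_alloc_item_data(cache, size);
   memcpy(cache->map + offset, data, size);
   return offset;
}
```

In the real driver the mapping would come from brw_bo_map() with the flags shown in the last hunk. The sketch only illustrates that, once the has_llc branch is gone, an upload reduces to "reserve an aligned offset, then memcpy".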