[Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.
Kristian Høgsberg
hoegsberg at gmail.com
Fri Sep 26 15:06:22 PDT 2014
On Fri, Sep 26, 2014 at 2:21 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Friday, September 26, 2014 09:22:31 AM Kristian Høgsberg wrote:
>> On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote:
>> > There's no reason to stall on pwrite - the CPU always appends to the
>> > buffer and never modifies existing contents, and the GPU never writes
>> > it. Further, the CPU always appends new data before submitting a batch
>> > that requires it.
>> >
>> > This code predates the unsynchronized mapping feature, so we simply
>> > didn't have the option when it was written.
>> >
>> > Ideally, we would do this for non-LLC platforms too, but unsynchronized
>> > mapping support only exists for LLC systems.
>> >
>> > Saves repeated 0.001ms stalls on program upload.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> > src/mesa/drivers/dri/i965/brw_state_cache.c | 34 +++++++++++++++++++++++------
>> > 1 file changed, 27 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
>> > index b9bb0fc..1d2d32f 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
>> > @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
>> > drm_intel_bo *new_bo;
>> >
>> > new_bo = drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);
>> > + if (brw->has_llc)
>> > + drm_intel_gem_bo_map_unsynchronized(new_bo);
>> >
>> > /* Copy any existing data that needs to be saved. */
>> > if (cache->next_offset != 0) {
>> > - brw_bo_map(brw, cache->bo, false, "program cache");
>> > - drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
>> > - drm_intel_bo_unmap(cache->bo);
>> > + if (brw->has_llc) {
>> > + memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);
>>
>> Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap()
>> calls into this block so they bracket the memcpy, as they do in the
>> subdata case below?
>>
>> Other than that,
>>
>> Reviewed-by: Kristian Høgsberg <krh at bitplanet.net>
>
> That won't work. The point is to map new_bo and leave it mapped, and to unmap the old BO before throwing it away. If I moved the map call into the if (cache->next_offset != 0) block, the initial mapping (when the cache is still empty) would never occur.
Yup, that makes sense.
Kristian
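
For reference, a minimal sketch of the lifecycle Ken describes, LLC path
only (simplified from the patch; the helper name is illustrative, and it
assumes Mesa's brw_context.h and libdrm's intel_bufmgr.h):

static void
replace_cache_bo(struct brw_cache *cache, uint32_t new_size)
{
   struct brw_context *brw = cache->brw;
   drm_intel_bo *new_bo =
      drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);

   /* Map unconditionally, not inside the copy branch below: new_bo
    * must end up mapped even when the cache is empty and there is
    * nothing to copy. */
   drm_intel_gem_bo_map_unsynchronized(new_bo);

   if (cache->next_offset != 0)
      memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);

   /* Unmap the old BO only now that we are done reading from it,
    * then drop it. */
   drm_intel_bo_unmap(cache->bo);
   drm_intel_bo_unreference(cache->bo);
   cache->bo = new_bo;
   cache->bo_used_by_gpu = false;
}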
>
>> > + } else {
>> > + brw_bo_map(brw, cache->bo, false, "program cache");
>> > + drm_intel_bo_subdata(new_bo, 0, cache->next_offset,
>> > + cache->bo->virtual);
>> > + drm_intel_bo_unmap(cache->bo);
>> > + }
>> > }
>> >
>> > + if (brw->has_llc)
>> > + drm_intel_bo_unmap(cache->bo);
>> > drm_intel_bo_unreference(cache->bo);
>> > cache->bo = new_bo;
>> > cache->bo_used_by_gpu = false;
>> > @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
>> > continue;
>> > }
>> >
>> > - brw_bo_map(brw, cache->bo, false, "program cache");
>> > + if (!brw->has_llc)
>> > + brw_bo_map(brw, cache->bo, false, "program cache");
>> > ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
>> > - drm_intel_bo_unmap(cache->bo);
>> > + if (!brw->has_llc)
>> > + drm_intel_bo_unmap(cache->bo);
>> > if (ret)
>> > continue;
>> >
>> > @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache,
>> > /* If we would block on writing to an in-use program BO, just
>> > * recreate it.
>> > */
>> > - if (cache->bo_used_by_gpu) {
>> > + if (!brw->has_llc && cache->bo_used_by_gpu) {
>> > perf_debug("Copying busy program cache buffer.\n");
>> > brw_cache_new_bo(cache, cache->bo->size);
>> > }
>> > @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache,
>> > uint32_t *out_offset,
>> > void *out_aux)
>> > {
>> > + struct brw_context *brw = cache->brw;
>> > struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
>> > GLuint hash;
>> > void *tmp;
>> > @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache,
>> > cache->n_items++;
>> >
>> > /* Copy data to the buffer */
>> > - drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
>> > + if (brw->has_llc) {
>> > + memcpy((char *) cache->bo->virtual + item->offset, data, data_size);
>> > + } else {
>> > + drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
>> > + }
>> >
>> > *out_offset = item->offset;
>> > *(void **)out_aux = (void *)((char *)item->key + item->key_size);
>> > @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw)
>> > cache->bo = drm_intel_bo_alloc(brw->bufmgr,
>> > "program cache",
>> > 4096, 64);
>> > + if (brw->has_llc)
>> > + drm_intel_gem_bo_map_unsynchronized(cache->bo);
>> >
>> > cache->aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare;
>> > cache->aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare;
>> > @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache)
>> >
>> > DBG("%s\n", __FUNCTION__);
>> >
>> > + if (brw->has_llc)
>> > + drm_intel_bo_unmap(cache->bo);
>> > drm_intel_bo_unreference(cache->bo);
>> > cache->bo = NULL;
>> > brw_clear_cache(brw, cache);
>>
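
For context, the reason the unsynchronized map is safe at all is the
append-only invariant from the commit message: the CPU only ever writes
at cache->next_offset, past everything the GPU has been asked to read.
A minimal sketch of that invariant (hypothetical helper, simplified from
brw_upload_cache; the real code also grows the BO and aligns offsets):

static uint32_t
append_program_data(struct brw_cache *cache, const void *data,
                    uint32_t size)
{
   struct brw_context *brw = cache->brw;
   uint32_t offset = cache->next_offset;  /* existing bytes are immutable */

   if (brw->has_llc)
      memcpy((char *) cache->bo->virtual + offset, data, size);
   else
      drm_intel_bo_subdata(cache->bo, offset, size, data);

   cache->next_offset = offset + size;    /* advance; never rewrite */
   return offset;
}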