[Intel-gfx] [Mesa-dev] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.

Eero Tamminen eero.t.tamminen at intel.com
Tue Mar 14 14:25:08 UTC 2017


Hi,

On 14.03.2017 12:48, Eero Tamminen wrote:
> On 11.03.2017 03:14, Kenneth Graunke wrote:
>> On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
>> had the surprising behavior of doing a synchronized GTT mapping.
>> This is obviously not what the user of the API wanted.
>>
>> Eric left a comment indicating a valid concern: if the CPU and GPU
>> caches are incoherent, we don't keep track of where the user last
>> mapped the buffer, and what caches might contain relevant data.
>>
>> Modern Atom systems still don't have LLC, but they do offer snooping,
>> which effectively makes the caches coherent.  The kernel appears to
>> set up the PTE/PPAT to enable snooping for everything where the cache
>> level is not I915_CACHE_NONE.  As far as I know, only scanout buffers
>> are marked as uncached.
>>
>> Any buffers used by scanout should be flagged as non-reusable with
>> drm_intel_bo_disable_reuse(), prime export, or flink.  So, we can
>> assume that any reusable buffer should be snooped.
>>
>> This patch enables unsynchronized mappings for reusable buffers
>> on all Gen6+ hardware (which have either LLC or snooping).
>>
>> On Broxton, this improves the performance of Unigine Valley 1.0
>> on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
>> (same settings) by about 53%.
>
> I tested it with our normal set of benchmarks.
>
> Using FullHD resolution and "high" quality settings, on Broxton, Valley
> improved by ~11% and Heaven (with tessellation enabled) by 2-3%.

BSW: Valley +10%, Heaven +4%.  Blend showed a regression, but it could
still be within variance.

BYT: Valley +5%, Heaven +11%.  The rest of the changes were within normal
variance.


	- Eero

> CarChase also seemed to improve by several percent, but everything else
> was within normal variation.
>
> I'll check BYT & BSW too.
>
>
>     - Eero
>
>
>> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: mesa-dev at lists.freedesktop.org
>> ---
>>  intel/intel_bufmgr_gem.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> It looks like Mesa and Beignet are the only callers of this function
>> (SNA and Anvil don't use libdrm, UXA and vaapi don't use this function.)
>>
>> This passed our full barrage of Piglit/dEQP/GLCTS/GLESCTS testing.
>> gnome-shell still works, as does Unigine, and GLBenchmark.
>>
>> I haven't tested any OpenCL workloads.
>>
>> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
>> index e260f2dc..f53f1fcc 100644
>> --- a/intel/intel_bufmgr_gem.c
>> +++ b/intel/intel_bufmgr_gem.c
>> @@ -1630,9 +1630,7 @@ int
>>  drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>>  {
>>      drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
>> -#ifdef HAVE_VALGRIND
>>      drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
>> -#endif
>>      int ret;
>>
>>      /* If the CPU cache isn't coherent with the GTT, then use a
>> @@ -1641,8 +1639,12 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>>       * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
>>       * we would potentially corrupt the buffer even when the user
>>       * does reasonable things.
>> +     *
>> +     * The caches are coherent on LLC platforms or snooping is enabled
>> +     * for the BO.  The kernel enables snooping for non-scanout (reusable)
>> +     * buffers on modern non-LLC systems.
>>       */
>> -    if (!bufmgr_gem->has_llc)
>> +    if (bufmgr_gem->gen < 6 || !bo_gem->reusable)
>>          return drm_intel_gem_bo_map_gtt(bo);
>>
>>      pthread_mutex_lock(&bufmgr_gem->lock);
>>
>
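For anyone skimming the diff, the change in the fallback decision can be
sketched in isolation roughly as below.  This is only an illustration under
stated assumptions: the sketch_* structs and the two helper functions are
hypothetical stand-ins, not the real drm_intel_bufmgr_gem / drm_intel_bo_gem
types, and only the two conditions themselves are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libdrm-internal state the patch reads.
 * They are NOT the real drm_intel_bufmgr_gem / drm_intel_bo_gem structs. */
struct sketch_bufmgr {
    int gen;       /* hardware generation, e.g. 9 for Broxton */
    bool has_llc;  /* the only thing the old check looked at */
};

struct sketch_bo {
    bool reusable; /* false for scanout/exported/flinked buffers */
};

/* Old condition: every non-LLC system falls back to the synchronized
 * drm_intel_gem_bo_map_gtt() path, regardless of the buffer. */
static bool old_falls_back_to_gtt(const struct sketch_bufmgr *mgr,
                                  const struct sketch_bo *bo)
{
    (void) bo; /* the buffer itself was not consulted */
    return !mgr->has_llc;
}

/* Patched condition: fall back only on pre-Gen6 hardware or for
 * non-reusable buffers (which may be uncached scanout buffers). */
static bool new_falls_back_to_gtt(const struct sketch_bufmgr *mgr,
                                  const struct sketch_bo *bo)
{
    return mgr->gen < 6 || !bo->reusable;
}
```

With a Gen9, non-LLC manager (the Broxton case) and a reusable buffer, the
old helper reports a fallback and the new one does not, which is the case
the benchmark numbers above exercise.  A non-reusable buffer still takes
the synchronized path under the new condition.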


