[Intel-gfx] [Mesa-dev] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.
Eero Tamminen
eero.t.tamminen at intel.com
Tue Mar 14 14:25:08 UTC 2017
Hi,
On 14.03.2017 12:48, Eero Tamminen wrote:
> On 11.03.2017 03:14, Kenneth Graunke wrote:
>> On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
>> had the surprising behavior of doing a synchronized GTT mapping.
>> This is obviously not what the user of the API wanted.
>>
>> Eric left a comment indicating a valid concern: if the CPU and GPU
>> caches are incoherent, we don't keep track of where the user last
>> mapped the buffer, and what caches might contain relevant data.
>>
>> Modern Atom systems still don't have LLC, but they do offer snooping,
>> which effectively makes the caches coherent. The kernel appears to
>> set up the PTE/PPAT to enable snooping for everything where the cache
>> level is not I915_CACHE_NONE. As far as I know, only scanout buffers
>> are marked as uncached.
>>
>> Any buffers used by scanout should be flagged as non-reusable with
>> drm_intel_bo_disable_reuse(), prime export, or flink. So, we can
>> assume that any reusable buffer should be snooped.
>>
>> This patch enables unsynchronized mappings for reusable buffers
>> on all Gen6+ hardware (which have either LLC or snooping).
>>
>> On Broxton, this improves the performance of Unigine Valley 1.0
>> on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
>> (same settings) by about 53%.
>
> I tested it with our normal set of benchmarks.
>
> Using FullHD resolution and "high" quality settings, on Broxton, Valley
> improved by ~11% and Heaven (with tessellation enabled) by 2-3%.

BSW: Valley +10%, Heaven +4%. Blend showed a regression, but it could
still be within variance.

BYT: Valley +5%, Heaven +11%. The rest of the changes were within normal
variance.

    - Eero

> CarChase also seemed to improve by several percent, but everything else
> was within normal variation.
>
> I'll check BYT & BSW too.
>
>
> - Eero
>
>
>> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: mesa-dev at lists.freedesktop.org
>> ---
>> intel/intel_bufmgr_gem.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> It looks like Mesa and Beignet are the only callers of this function
>> (SNA and Anvil don't use libdrm, UXA and vaapi don't use this function.)
>>
>> This passed our full barrage of Piglit/dEQP/GLCTS/GLESCTS testing.
>> gnome-shell still works, as does Unigine, and GLBenchmark.
>>
>> I haven't tested any OpenCL workloads.
>>
>> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
>> index e260f2dc..f53f1fcc 100644
>> --- a/intel/intel_bufmgr_gem.c
>> +++ b/intel/intel_bufmgr_gem.c
>> @@ -1630,9 +1630,7 @@ int
>> drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>> {
>> drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
>> -#ifdef HAVE_VALGRIND
>> drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
>> -#endif
>> int ret;
>>
>> /* If the CPU cache isn't coherent with the GTT, then use a
>> @@ -1641,8 +1639,12 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>> * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
>> * we would potentially corrupt the buffer even when the user
>> * does reasonable things.
>> + *
>> + * The caches are coherent on LLC platforms or snooping is enabled
>> + * for the BO. The kernel enables snooping for non-scanout (reusable)
>> + * buffers on modern non-LLC systems.
>> */
>> - if (!bufmgr_gem->has_llc)
>> + if (bufmgr_gem->gen < 6 || !bo_gem->reusable)
>> return drm_intel_gem_bo_map_gtt(bo);
>>
>> pthread_mutex_lock(&bufmgr_gem->lock);
>>
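
For readers skimming the diff, the change in the fallback condition can be
sketched in isolation. This is only a minimal illustration, not libdrm code:
the struct and function names below (sketch_bufmgr, sketch_bo,
sketch_old_path, sketch_new_path) are hypothetical stand-ins that model just
the three fields the check reads (gen, has_llc, reusable).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libdrm structures; only the fields the
 * check reads are modelled here. */
struct sketch_bufmgr {
    int gen;        /* hardware generation */
    bool has_llc;   /* true when CPU and GPU share a last-level cache */
};

struct sketch_bo {
    bool reusable;  /* false after disable_reuse, prime export, or flink */
};

enum sketch_map_path {
    SKETCH_MAP_GTT_SYNCHRONIZED,  /* falls back to the synchronized GTT map */
    SKETCH_MAP_UNSYNCHRONIZED,    /* proceeds with the unsynchronized map */
};

/* Condition before the patch: any system without LLC falls back. */
static enum sketch_map_path
sketch_old_path(const struct sketch_bufmgr *mgr)
{
    if (!mgr->has_llc)
        return SKETCH_MAP_GTT_SYNCHRONIZED;
    return SKETCH_MAP_UNSYNCHRONIZED;
}

/* Condition after the patch: only pre-Gen6 hardware or non-reusable
 * (potentially scanout, hence possibly uncached) buffers fall back. */
static enum sketch_map_path
sketch_new_path(const struct sketch_bufmgr *mgr, const struct sketch_bo *bo)
{
    if (mgr->gen < 6 || !bo->reusable)
        return SKETCH_MAP_GTT_SYNCHRONIZED;
    return SKETCH_MAP_UNSYNCHRONIZED;
}
```

With these stand-ins, a non-LLC Gen9 manager and a reusable buffer take the
synchronized path under the old check and the unsynchronized path under the
new one, while a non-reusable buffer still falls back in both cases.
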
>