[Intel-gfx] [RFC] drm/i915: Emit to ringbuffer directly
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Sep 9 08:32:50 UTC 2016
On 08/09/16 17:40, Chris Wilson wrote:
> On Thu, Sep 08, 2016 at 04:12:55PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> This removes the usage of intel_ring_emit in favour of
>> directly writing to the ring buffer.
>
> I have the same patch! But I called it out, for historical reasons.
Yes, I know we talked about it in the past, but I did not think you would
find the time to actually write it amongst all the other things.
> Oh, except mine uses out[0]...out[N] because gcc prefers that over
> *out++ = ...
GCC copes just fine with the latter here, for example:
*rbuf++ = cmd;
*rbuf++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
*rbuf++ = 0; /* upper addr */
*rbuf++ = 0; /* value */
compiles to:
3e9: 89 10 mov %edx,(%rax)
3eb: c7 40 04 04 01 00 00 movl $0x104,0x4(%rax)
3f2: c7 40 08 00 00 00 00 movl $0x0,0x8(%rax)
3f9: c7 40 0c 00 00 00 00 movl $0x0,0xc(%rax)
And for the record, before this patch, with intel_ring_emit:
53a: 8b 53 3c mov 0x3c(%rbx),%edx
53d: 48 8b 4b 08 mov 0x8(%rbx),%rcx
541: 89 04 11 mov %eax,(%rcx,%rdx,1)
544: 8b 43 3c mov 0x3c(%rbx),%eax
547: 48 8b 53 08 mov 0x8(%rbx),%rdx
54b: 83 c0 04 add $0x4,%eax
54e: 89 43 3c mov %eax,0x3c(%rbx)
551: c7 04 02 04 01 00 00 movl $0x104,(%rdx,%rax,1)
558: 8b 43 3c mov 0x3c(%rbx),%eax
55b: 48 8b 53 08 mov 0x8(%rbx),%rdx
55f: 83 c0 04 add $0x4,%eax
562: 89 43 3c mov %eax,0x3c(%rbx)
565: c7 04 02 00 00 00 00 movl $0x0,(%rdx,%rax,1)
56c: 8b 43 3c mov 0x3c(%rbx),%eax
56f: 48 8b 53 08 mov 0x8(%rbx),%rdx
573: 83 c0 04 add $0x4,%eax
576: 89 43 3c mov %eax,0x3c(%rbx)
579: c7 04 02 00 00 00 00 movl $0x0,(%rdx,%rax,1)
Yuck :) At least they are not function calls to iowrite any more. :)
>> intel_ring_emit was preventing the compiler from optimising the
>> fetch and increment of the current ring buffer pointer, and was
>> therefore generating very verbose code for every write.
>>
>> It had no useful purpose, since all ringbuffer operations
>> are started and ended with intel_ring_begin and
>> intel_ring_advance respectively, with no bail-out possible
>> in the middle, so it is fine to increment the tail in
>> intel_ring_begin and let the code manage the pointer
>> itself.
>>
>> Removing the useless instructions saves approximately
>> 2384 bytes of text on my build.
>>
>> Not sure if this has any measurable performance
>> implications, but executing a ton of useless instructions
>> on fast paths cannot be good.
>
> It does show up in perf.
Cool.
>> Patch is not fully polished, but it compiles and runs
>> on Gen9 at least.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_gem_context.c | 62 ++--
>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 27 +-
>> drivers/gpu/drm/i915/i915_gem_gtt.c | 57 ++--
>> drivers/gpu/drm/i915/intel_display.c | 113 ++++---
>> drivers/gpu/drm/i915/intel_lrc.c | 223 +++++++-------
>> drivers/gpu/drm/i915/intel_mocs.c | 43 +--
>> drivers/gpu/drm/i915/intel_overlay.c | 69 ++---
>> drivers/gpu/drm/i915/intel_ringbuffer.c | 480 +++++++++++++++--------------
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 19 +-
>> 9 files changed, 555 insertions(+), 538 deletions(-)
>
> Hmm, mine is bigger.
>
> drivers/gpu/drm/i915/i915_gem_context.c | 85 ++--
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 37 +-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 62 +--
> drivers/gpu/drm/i915/i915_gem_request.c | 135 ++++-
> drivers/gpu/drm/i915/i915_gem_request.h | 2 +
> drivers/gpu/drm/i915/intel_display.c | 133 +++--
> drivers/gpu/drm/i915/intel_lrc.c | 188 ++++---
> drivers/gpu/drm/i915/intel_lrc.h | 2 -
> drivers/gpu/drm/i915/intel_mocs.c | 50 +-
> drivers/gpu/drm/i915/intel_overlay.c | 77 ++-
> drivers/gpu/drm/i915/intel_ringbuffer.c | 762 ++++++++++++-----------------
> drivers/gpu/drm/i915/intel_ringbuffer.h | 36 +-
> 12 files changed, 721 insertions(+), 848 deletions(-)
>
> (this includes moving the intel_ring_begin to i915_gem_request)
>
> plus an earlier
>
> drivers/gpu/drm/i915/i915_gem_request.c | 26 ++---
> drivers/gpu/drm/i915/intel_lrc.c | 121 ++++++++---------------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 168 +++++++++++---------------------
> drivers/gpu/drm/i915/intel_ringbuffer.h | 10 +-
> 4 files changed, 112 insertions(+), 213 deletions(-)
>
> since I wanted parts of it for emitting timelines.
OK, what do you want to do?
Regards,
Tvrtko