[Intel-gfx] [PATCH v6] drm/i915: Emit to ringbuffer directly
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Feb 9 11:49:28 UTC 2017
On 09/02/2017 10:37, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
>> On Thu, Feb 09, 2017 at 10:00:35AM +0200, Joonas Lahtinen wrote:
>>> On ke, 2017-02-08 at 18:04 +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> This removes the usage of intel_ring_emit in favour of
>>>> directly writing to the ring buffer.
>>>>
>>>> intel_ring_emit was preventing the compiler from optimising
>>>> fetch and increment of the current ring buffer pointer and
>>>> therefore generating very verbose code for every write.
>>>>
>>>> It had no useful purpose since all ringbuffer operations
>>>> are started and ended with intel_ring_begin and
>>>> intel_ring_advance respectively, with no bail out in the
>>>> middle possible, so it is fine to increment the tail in
>>>> intel_ring_begin and let the code manage the pointer
>>>> itself.
>>>>
>>>> Useless instruction removal amounts to approximately
>>>> two and a half kilobytes of saved text on my build.
>>>>
>>>> Not sure if this has any measurable performance
>>>> implications but executing a ton of useless instructions
>>>> on fast paths cannot be good.
>>>>
>>>> Patch is not fully polished, but it compiles and runs
>>>> on Gen9 at least.
>>>>
>>>> v2:
>>>> * Change return from intel_ring_begin to error pointer by
>>>> popular demand.
>>>> * Move tail increment to intel_ring_advance to enable some
>>>> error checking.
>>>>
>>>> v3:
>>>> * Move tail advance back into intel_ring_begin.
>>>> * Rebase and tidy.
>>>>
>>>> v4:
>>>> * Complete rebase after a few months since v3.
>>>>
>>>> v5:
>>>> * Remove unnecessary cast and fix !debug compile. (Chris Wilson)
>>>>
>>>> v6:
>>>> * Make intel_ring_offset take request as well.
>>>> * Fix recording of request postfix plus a sprinkle of asserts.
>>>> (Chris Wilson)
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> <SNIP>
>>>
>>>> @@ -617,99 +616,92 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>>>> if (INTEL_GEN(dev_priv) >= 7)
>>>> len += 2 + (num_rings ? 4*num_rings + 6 : 0);
>>>>
>>>> - ret = intel_ring_begin(req, len);
>>>> - if (ret)
>>>> - return ret;
>>>> + out = intel_ring_begin(req, len);
>>>> + if (IS_ERR(out))
>>>> + return PTR_ERR(out);
>>>>
>>>> /* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
>>>> if (INTEL_GEN(dev_priv) >= 7) {
>>>> - intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
>>>> + *out++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
>>>
>>> I expressed my concern in the previous iteration of this series months
>>> ago, and here goes again; Let's try to keep the writes easily greppable.
>>>
>>> So intel_ring_emit (or better name) could remain as a wrapper
>>>
>>> #define (something something)_emit(x, y) *(x)++ = (y)
>>
>> My concern with intel_ring_emit() remaining is that we are no longer
>> operating on the ring. The pointer to use for emitting is retrieved from
>> the request, so I think pointer = i915_gem_request_emit(rq, num_dwords)
>> is what we want in the near future.
>>
>> I suppose if that was
>>
>> ring = i915_gem_request_emit(rq, num_dwords);
>> intel_ring_emit(ring, blah)
>> intel_ring_advance(rq, ring); /* this still needs polish */
>>
>
> Going through request feels right. For ring_emit
> we could use shorter:
>
> cs_emit and cs_advance.
>
> They are rings but for users at this level the distinction
> feels unimportant.
>
> Just my few bikesheds.

If the main concern is "grepability" then I am not sure that there is a
problem with "*out++", be it out, batch, or something (just not ring!).
There are no other such statements in the code base at the moment, so it
would be completely unique to ring emission.

On top of that, intel_ring_begin is also a current grep marker when
looking for ring emission. The only exception, I think, is breadcrumb
emission.

I can see one argument for a helper being future proof: if one day
we decide to allow wrap or something, there would be no need to change
it all back. It would mean having it as "out = intel_ring_emit(rq, out,
dw)" though, so still a churny patch.

If the last thing is something we want to consider then OK, otherwise I
am ambivalent about whether we need a helper.

I am also not sure a macro which increments its argument is that great.

1.

	*out++ = MI_USER_INTERRUPT;
	*out++ = MI_NOOP;

I don't have a problem with this one.

2.

	ring_emit(out, MI_USER_INTERRUPT);
	ring_emit(out, MI_NOOP);

Don't like the automagical increment.

3.

	ring_emit(out++, MI_USER_INTERRUPT);
	ring_emit(out++, MI_NOOP);

Better, but for me it doesn't beat 1 by a huge margin. Would it be a bit
of an unsafe macro, since we could put the argument in parentheses?

	#define ring_emit(out, dw) *((out) - 1) = dw ?

Wonder what would happen with this one. :)

4.

	out = ring_emit(out, MI_USER_INTERRUPT);
	out = ring_emit(out, MI_NOOP);

Alternative to 3, but not sure if GCC would be able to completely
optimize this. It should?

5.

	out = ring_emit(rq, out, MI_USER_INTERRUPT);
	out = ring_emit(rq, out, MI_NOOP);

Future proof version?
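
For comparison, here is a rough standalone sketch of variants 1, 2 and 4
side by side. It is not i915 code: the CMD_* values are placeholders rather
than the real MI_* encodings, and ring_emit_autoinc / ring_emit_ret are
hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder command dwords for illustration; NOT the real MI_* encodings. */
#define CMD_USER_INTERRUPT 0x01000000u
#define CMD_NOOP           0x00000000u

/* Variant 2 (hypothetical name): the macro advances 'out' as a side effect. */
#define ring_emit_autoinc(out, dw) (*(out)++ = (dw))

/* Variant 4 (hypothetical name): the helper hands back the advanced pointer. */
static inline uint32_t *ring_emit_ret(uint32_t *out, uint32_t dw)
{
	*out = dw;
	return out + 1;
}

/* Variant 1: open-coded writes through the pointer. */
uint32_t *emit_direct(uint32_t *out)
{
	assert(out != NULL);
	*out++ = CMD_USER_INTERRUPT;
	*out++ = CMD_NOOP;
	return out;
}

/* Variant 2: same writes, but the increment is hidden inside the macro. */
uint32_t *emit_autoinc(uint32_t *out)
{
	assert(out != NULL);
	ring_emit_autoinc(out, CMD_USER_INTERRUPT);
	ring_emit_autoinc(out, CMD_NOOP);
	return out;
}

/* Variant 4: the caller reassigns the pointer after every emit. */
uint32_t *emit_ret(uint32_t *out)
{
	assert(out != NULL);
	out = ring_emit_ret(out, CMD_USER_INTERRUPT);
	out = ring_emit_ret(out, CMD_NOOP);
	return out;
}

/* Returns 1 if all three variants write the same dwords and advance by two. */
int variants_agree(void)
{
	uint32_t a[2], b[2], c[2];

	/* Poison the buffers so a missed write would be visible. */
	memset(a, 0xff, sizeof(a));
	memset(b, 0xff, sizeof(b));
	memset(c, 0xff, sizeof(c));

	if (emit_direct(a) != a + 2 ||
	    emit_autoinc(b) != b + 2 ||
	    emit_ret(c) != c + 2)
		return 0;

	return !memcmp(a, b, sizeof(a)) && !memcmp(a, c, sizeof(a));
}
```

All three leave the same two dwords in the buffer and the pointer two
entries further on, so the choice is about readability and greppability
rather than behaviour. Since ring_emit_ret is a trivial static inline, I
would expect a compiler to reduce variant 4 to the same code as variant 1,
but that is an expectation and not something measured here.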
Regards,
Tvrtko