[Intel-gfx] [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
david.s.gordon at intel.com
Tue Jan 12 08:45:02 PST 2016
On 12/01/16 13:11, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
>> On 12/01/16 12:12, Chris Wilson wrote:
>>> On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> LRC lifetime is well defined so we can cache the page pointing
>>>> to the object backing store in the context in order to avoid
>>>> walking over the object SG page list from the interrupt context
>>>> without the big lock held.
>>>> v2: Also cache the mapping. (Chris Wilson)
>>>> v3: Unmap on the error path.
>>> Then we only use the lrc_state_page in the unmapping, hardly worth the
>>> cost of saving it.
>> Do you also know if this would now require any flushing or something
>> if previously kunmap_atomic might have done something under the
> kmap_atomic only changes the PTE and the unmap would flush the TLB. In
> terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
> kmap_atomic is meant to be cheaper to set up and a limited resource
> which can only be used without preemption.)
>>> The reg_state is better associated with the ring (since it basically
>>> contains the analog of the RING_HEAD and friends).
>> Hm, not sure. The page belongs to the object from that anonymous
>> struct in intel_context so I think it is best to keep them together.
> The page does sure, but the state does not.
> The result is that we get:
> ring->registers[CTX_RING_BUFFER_STATE+1] = ring->vma->node.state;
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
> ring->registers[CTX_RING_TAIL+1] = req->tail;
> which makes a lot more sense, to me, when viewed against the underlying
> architecture of the hardware and comparing against the legacy ringbuffer.
When you say "ring", do you mean "engine" or "ringbuffer"?
The register state can't be associated with the engine /per se/, because
it has to be per-context as well as per-engine. It doesn't really belong
with the ringbuffer; in particular I've seen code for allocating the
ringbuffer in stolen memory and dropping it during hibernation, whereas
this register state shouldn't be lost across hibernation. So the
lifetime of a register state image and a ringbuffer are different,
therefore they don't belong together.
The register state is pretty much the /definition/ of a context, in
hardware terms. OK, it's got an HWSP and other bits, but the register
save area is what the h/w really cares about. So it makes sense that the
kmapping for that area is also part of (the CPU's idea of) the context. Yes,
ctx.engine[engine_id].registers[regno] = ...
is a bit clumsy, but I'd expect to cache the register-image pointer in a
local here, along the lines of:
uint32_t *registers = ctx.engine[engine_id].registers;
registers[CTX_RING_TAIL+1] = req->tail;
Some of this would be much less clumsy if the anonymous engine
struct had a name, say, "engine_info", so we could just do
struct engine_info *eip = &ctx.engines[engine->id];
once at the beginning of any per-engine function and then use
eip->ringbuf, eip->state, eip->registers, etc without continually
repeating the 'ctx.' and 'engine->id' bits over and over ...
Apart from that, I think Tvrtko's patch has lost the dirtying from:
> - page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
in execlists_update_context(), so should add
instead (and that's the use for it, so you /do/ want to cache it).
More information about the Intel-gfx