[Intel-gfx] [PATCH v3 4/7] drm/i915: Cache LRC state page in the context

Tue Jan 12 08:45:02 PST 2016

On 12/01/16 13:11, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
>>
>> On 12/01/16 12:12, Chris Wilson wrote:
>>> On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> LRC lifetime is well defined so we can cache the page pointing
>>>> to the object backing store in the context in order to avoid
>>>> walking over the object SG page list from the interrupt context
>>>> without the big lock held.
>>>>
>>>> v2: Also cache the mapping. (Chris Wilson)
>>>> v3: Unmap on the error path.
>>>
>>> Then we only use the lrc_state_page in the unmapping, hardly worth the
>>> cost of saving it.
>>
>> Ok.
>>
>> Do you also know if this would now require any flushing or something
>> if previously kunmap_atomic might have done something under the
>> covers?
>
> kmap_atomic only changes the PTE and the unmap would flush the TLB. In
> terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
> kmap_atomic is meant to be cheaper to set up and a limited resource
> which can only be used without preemption.)
>
>>> The reg_state is better associated with the ring (since it basically
>>> contains the analog of the RING_HEAD and friends).
>>
>> Hm, not sure. The page belongs to the object from that anonymous
>> struct in intel_context so I think it is best to keep them together.
>
> The page does sure, but the state does not.
>
> The result is that we get:
>
> ring->registers[CTX_RING_BUFFER_STATE+1] = ring->vma->node.state;
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
> ring->registers[CTX_RING_TAIL+1] = req->tail;
>
> which makes a lot more sense, to me, when viewed against the underlying
> architecture of the hardware and comparing against the legacy ringbuffer.
> -Chris

When you say "ring", do you mean "engine" or "ringbuffer"?

The register state can't be associated with the engine /per se/, because 
it has to be per-context as well as per-engine. It doesn't really belong 
with the ringbuffer; in particular I've seen code for allocating the 
ringbuffer in stolen memory and dropping it during hibernation, whereas 
this register state shouldn't be lost across hibernation. So the 
lifetime of a register state image and a ringbuffer are different, 
therefore they don't belong together.

The register state is pretty much the /definition/ of a context, in 
hardware terms. OK, it's got an HWSP and other bits, but the register 
save area is what the h/w really cares about. So it makes sense that the 
kmapping for that area is also part of (the CPU's idea of) the context. Yes,

	ctx.engine[engine_id].registers[regno] = ...

is a bit clumsy, but I'd expect to cache the register-image pointer in a 
local here, along the lines of:

	uint32_t *registers = ctx.engine[engine_id].registers;
	registers[CTX_RING_TAIL+1] = req->tail;

etc.

[aside]
Some of this would be much less clumsy if the anonymous engine[]
struct had a name, say, "engine_info", so we could just do
	struct engine_info *eip = &ctx.engines[engine->id];
once at the beginning of any per-engine function and then use
eip->ringbuf, eip->state, eip->registers, etc without continually 
repeating the 'ctx.' and 'engine->id' bits over and over ...
[/aside]

Apart from that, I think Tvrtko's patch has lost the dirtying from:

 > -	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);

in execlists_update_context(), so should add

	set_page_dirty(lrc_state_page)

instead (and that's the use for it, so you /do/ want to cache it).

.Dave.