[Intel-gfx] [PATCH 3/6] drm/i915: Split the batch pool by engine

Thu Mar 19 04:58:17 PDT 2015

On 03/19/2015 11:46 AM, Chris Wilson wrote:
> On Thu, Mar 19, 2015 at 11:39:16AM +0000, Tvrtko Ursulin wrote:
>> On 03/19/2015 10:06 AM, Chris Wilson wrote:
>>> On Thu, Mar 19, 2015 at 09:36:14AM +0000, Tvrtko Ursulin wrote:
>>>> Well in a way at least where when we talk about LRU ordering, it
>>>> depends on retiring working properly and that is not obvious from
>>>> code layout and module separation.
>>>
>>> I've lost you. The list is in LRU submission order. With this split, the
>>> list is both in LRU submission and LRU retirement order. That the two
>>> are not the same originally is not a fault of retiring not working
>>> properly, but that the hardware is split into different units and
>>> timelines.
>>>
>>>> And then with this we move traversal inefficiency to possibly more
>>>> resource use. Would it be better to fix the cause rather than
>>>> symptoms? Is it feasible? What would be the downside of retiring all
>>>> rings before submission?
>>>
>>> Not really. Inefficient userspace is inefficient. All we want to be sure
>>> is that one abusive client doesn't cause a DoS on another, whilst making
>>> sure that good clients are not penalized.
>>
>> Not sure to which of my questions your "not really" was the answer.
>
> We do "fix" the cause later, and I've amended the throttling in mesa to
> prevent a reoccurrence.  So I was thinking of why we only retire on
> the current ring.
>
>> I understood that this is about the completed work which hasn't been
>> retired due to the latter only happening on submission to the same
>> ring, or with too low frequency from retire work handler.
>>
>> If this is true, could we just not do a retire pass on all rings on
>> any submission?
>
> No. The problem is that rings retire out of order. So a global LRU
> submission list is not strictly separated between inactive and active
> objects (in contrast to the per-engine list where it is true).

How about retiring all rings first, so that the inactive batch search
with a global pool becomes only O(num_rings) at worst? Might it be worth
saving memory (one pool instead of multiple pools) at the cost of a
trivial traversal like that?
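
To make the idea concrete, here is a rough sketch of the search over a
global pool, assuming all rings have just been retired. Everything in it
is made up for illustration: struct pool_entry, pool_find_inactive() and
NUM_RINGS are not the driver's names, and seqno wraparound is ignored.
The list is in LRU submission order, so active and inactive entries can
interleave and the walk has to skip the active ones instead of stopping
at the first. Whether that skipped count really stays bounded by the
number of rings is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RINGS 3 /* hypothetical ring count for this sketch */

/* Hypothetical stand-in for a pooled batch object, not the i915 struct. */
struct pool_entry {
	unsigned int ring; /* ring the batch was last submitted to */
	uint32_t seqno;    /* seqno of that submission on that ring */
	size_t size;       /* object size in bytes */
};

/* Inactive once its ring has completed its seqno (no wraparound handling). */
static bool entry_is_inactive(const struct pool_entry *e,
			      const uint32_t completed[NUM_RINGS])
{
	return e->seqno <= completed[e->ring];
}

/*
 * Walk a global pool kept in LRU submission order, oldest first.
 * Returns the index of the first inactive entry of at least min_size,
 * or -1 if there is none. *skipped_active is incremented for every
 * still-active entry that had to be passed over.
 */
static int pool_find_inactive(const struct pool_entry *pool, size_t count,
			      const uint32_t completed[NUM_RINGS],
			      size_t min_size, size_t *skipped_active)
{
	assert(pool != NULL && skipped_active != NULL);

	for (size_t i = 0; i < count; i++) {
		assert(pool[i].ring < NUM_RINGS);

		if (!entry_is_inactive(&pool[i], completed)) {
			/* Rings retire out of order: cannot stop here. */
			(*skipped_active)++;
			continue;
		}
		if (pool[i].size >= min_size)
			return (int)i;
	}
	return -1;
}
```

With a per-engine list the walk could stop at the first active entry;
here it cannot, which is the traversal cost being traded against the
memory of multiple pools.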

Regards,

Tvrtko