[Intel-gfx] [PATCH 1/7] drm/i915: Specify bsd rings through exec flag

Wed Dec 10 07:55:18 PST 2014

On 10/12/14 09:11, Daniel Vetter wrote:
> On Wed, Dec 10, 2014 at 02:18:15AM +0000, Gong, Zhipeng wrote:
>> On Tue, 2014-12-09 at 10:46 +0100, Daniel Vetter wrote:
>>> On Mon, Dec 08, 2014 at 01:55:56PM -0800, Rodrigo Vivi wrote:

[snip]

>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> index e1ed85a..d9081ec 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> @@ -1273,8 +1273,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>>       else if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_BSD) {
>>>>               if (HAS_BSD2(dev)) {
>>>>                       int ring_id;
>>>> -                     ring_id = gen8_dispatch_bsd_ring(dev, file);
>>>> -                     ring = &dev_priv->ring[ring_id];
>>>> +
>>>> +                     switch (args->flags & I915_EXEC_BSD_MASK) {
>>>> +                     case I915_EXEC_BSD_DEFAULT:
>>>> +                             ring_id = gen8_dispatch_bsd_ring(dev, file);
>>>> +                             ring = &dev_priv->ring[ring_id];
>>>> +                             break;
>>>> +                     case I915_EXEC_BSD_RING1:
>>>> +                             ring = &dev_priv->ring[VCS];
>>>
>>> Do we have any use-case for selecting ring1 specifically? I've thought
>>> it's only ring2 that is special?
>> The HEVC GPU commands should be dispatched to BSD RING 1 instead of BSD
>> RING2 as the two rings are asymmetrical. 
>> For the H264 decoding/encoding either ring is OK.
> 
> Well then same arguments applies with ring2 since only ring1 is special?
> It's just to minimize abi and reduce the amount of rope we hand to
> userspace.

Anyone who knows to use any of these flags is taking responsibility for
doing explicit engine allocation, so why not give them all the options
-- if for no other reason, more symmetry is good.

As an example, there could be cases where userspace knows better than
the kernel how long each batch will take, and can predict an optimal
allocation pattern rather than just flip-flopping between engines. So even
when a batch *can* run on either engine, there might be a reason to pick a
specific one.

e.g.	short-1 -> ring 1
	short-2 -> ring 1
	long-1  -> ring 2
	short-3 -> ring 1
	long-2  -> ring 1

because the program knows that the three short batches together will
take less time than the first long one.
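A rough way to check the arithmetic is a tiny C model of two engines, each running its queued batches back to back in submission order. The durations (short = 1, long = 4 time units), the assumption that all five batches are queued at time 0, and the strict alternation used as the "flip-flop" baseline are illustrative assumptions, not measurements and not necessarily what the kernel's dispatch code does; the names below are hypothetical.

```c
/* Illustrative model only: durations, queueing and the alternating
 * baseline are assumptions chosen to match the example above. */
#define SHORT_T 1   /* hypothetical duration of a short batch */
#define LONG_T  4   /* hypothetical long batch; 3 * SHORT_T < LONG_T */
#define NBATCH  5
#define NRINGS  2

/* Submission order: short-1, short-2, long-1, short-3, long-2. */
const int duration[NBATCH] = { SHORT_T, SHORT_T, LONG_T, SHORT_T, LONG_T };

/* Ring index per batch: 0 = ring 1, 1 = ring 2. */
const int alternating[NBATCH]     = { 0, 1, 0, 1, 0 }; /* naive flip-flop */
const int explicit_choice[NBATCH] = { 0, 0, 1, 0, 0 }; /* schedule above */

/* All batches are assumed to be queued at time 0. Each ring runs its own
 * batches back to back and the two rings run in parallel, so a ring
 * finishes at the sum of its batches' durations. Returns the time at
 * which the last batch completes. */
int makespan(const int ring_of[NBATCH])
{
	int busy_until[NRINGS] = { 0 };
	int end = 0;

	for (int i = 0; i < NBATCH; i++)
		busy_until[ring_of[i]] += duration[i];

	for (int r = 0; r < NRINGS; r++)
		if (busy_until[r] > end)
			end = busy_until[r];

	return end;
}
```

With these numbers, makespan(alternating) is 9, because long-2 queues behind long-1 on ring 1, while makespan(explicit_choice) is 7: long-2 starts on ring 1 at t=3, right after the three short batches, while long-1 is still running on ring 2.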

.Dave.