[Intel-gfx] [PATCH 5/5] drm/i915:Use the coarse mechanism based on drm fd to dispatch the BSD command on BDW GT3

Zhao Yakui yakui.zhao at intel.com
Fri Apr 11 02:53:21 CEST 2014


On Thu, 2014-04-10 at 03:03 -0600, Daniel Vetter wrote:
> On Thu, Apr 10, 2014 at 04:04:22PM +0800, Zhao Yakui wrote:
> > On Thu, 2014-04-10 at 00:48 -0600, Daniel Vetter wrote:
> > > On Thu, Apr 10, 2014 at 10:24:53AM +0800, Zhao Yakui wrote:
> > > > On Wed, 2014-04-09 at 08:34 -0600, Daniel Vetter wrote:
> > > > > On Wed, Apr 09, 2014 at 09:59:56AM +0800, Zhao Yakui wrote:
> > > > > > The BDW GT3 has two independent BSD rings, which can be used to process the
> > > > > > video commands. To be simpler, it is transparent to user-space driver/middleware.
> > > > > > Instead the kernel driver will decide which ring is to dispatch the BSD video
> > > > > > command.
> > > > > > 
> > > > > > As every BSD ring is powerful, it is enough to dispatch the BSD video command
> > > > > > based on the drm fd. In such case the different BSD ring is used for video playing
> > > > > > back and encoding. At the same time the coarse dispatch mechanism can help to avoid
> > > > > > the object synchronization between the BSD rings.
> > > > > > 
> > > > > > Signed-off-by: Zhao Yakui <yakui.zhao at intel.com>
> > > > > 
> > > > > This looks way too complicated. First things first please get rid of the
> > > > > atomic_t usage. If you don't have _massive_ comments explaining the memory
> > > > > barriers you're most likely using linux kernel atomic_t wrong. They are
> > > > > fully unordered.
> > > > 
> > > > Thanks for the review.
> > > > 
> > > > For the atomic_t usage:  I will remove it in next version as the counter
> > > > is already protected by the lock.  
> > > > 
> > > > > 
> > > > > With that out of the way this still looks a bit complicated really. Can't
> > > > > we just use a very simple static rule in gen8_dispatch_bsd_ring which
> > > > > hashed the pointer address of the file_priv? Just to get things going,
> > > > > once we have a clear need we can try to make things more intelligent. But
> > > > > in case of doubt I really prefer if we start with the dumbest possible
> > > > > approach first and add complexity instead of starting with something
> > > > > really complex and simplifying it.
> > > > 
> > > > Do you mean that file_priv is hashed and then is mapped to BSD 0 or 1
> > > > ring?  
> > > 
> > > Yeah, that's the idea. Get in the basic support first, make it fancy like
> > > you describe below second. This has a few upsides:
> > > - We can concentrate on validating basic support in the first round
> > >   instead of potentially fighting a bug in the load balancer.
> > > - Discussions and performance testing for the load balancer won't hold up
> > >   the entire feature.
> > > - Like I've said this might not be required. Before we add more complexity
> > >   than just hashing the file_priv I want to see some benchmarks of
> > >   expected workloads that show that the load balancing is indeed a good
> > >   idea - for the case of a transcode server I guess we should have
> > >   sufficient in-flight operations that it won't really matter. Or at least
> > >   I hope so.
> > > 
> > 
> > OK. Understand your concerns. I can split it two steps. One is to add
> > the basic support. The second step is for the optimization.
> > 
> > But I don't think that the hash of file_priv is a good idea. As it only
> > has two rings, it is possible that the hash value is always mapped to
> > BSD ring 0.  In such case when multiples video clips are played back,
> > the performance can't meet with the requirement.(For example: User can
> > play back 4 1080p video clips concurrently when only one BSD ring is
> > used. On the BDW GT3, they hope to play back 8 1080p video clips
> > concurrently. The poor hash design will cause that all the workload are
> > mapped to one BSD ring and then it can't meet with the requirement).
> > 
> > How about using the ping-pong mechanism for the file_priv? For one new
> > fd, it will use BSD ring 0 and then next file_priv will use BSD ring 1.
> > Then BSD ring 0....BSD ring 1. 
> > 
> > Does this make sense to you?
> 
> Well the point of the hash is that it's dumb and simple, but maybe too
> dumb. If we wend up with 3 streams on one vcs and 1 on the other, then we
> have a good reason to merge the 2nd patch ;-)
> 

Hi, Daniel

    Thanks for your comments. Now we get get the same point about the
support of dual BSD rings on BDW GT3 machine.  So this will be divided
into two steps. The first step is to use the simple ping-pong mechanism
to add the basic support. And the second step is for the
optimization(balance video workloads among the two rings).
    From my point the ping-pong mechanism is simpler and easier to
implement. Of course this can also be regarded as the specific hash.

> Really, the point of the first patch is just so that we have /something/
> which uses both rings with a reasonable chance, so that we can get testing
> and validation off the ground. E.g. in the test I'd use 10 or so drm fds to
> make sure that at least one of them uses the other ring, in case the hash
> function isn't great.

   Understand. We need such test case to verify it. This is already in
my plan.

Thanks.
    Yakui





> -Daniel





More information about the Intel-gfx mailing list