[PATCH RFC 102/111] staging: etnaviv: separate GPU pipes from execution state

Russell King - ARM Linux linux at arm.linux.org.uk
Tue Apr 7 14:25:57 PDT 2015


On Tue, Apr 07, 2015 at 06:59:59PM +0200, Christian Gmeiner wrote:
> Hi Lucas.
> 
> 2015-04-07 17:29 GMT+02:00 Lucas Stach <l.stach at pengutronix.de>:
> > And I don't get why each core needs to have a single device node. IMHO
> > this is purely an implementation decision whether to have one device
> > node for all cores or one device node per core.
> 
> It is an important decision. And I think that one device node per core
> reflects the hardware design to 100%.

Since when do the interfaces to userspace need to reflect the hardware
design?

Isn't the point of having a userspace interface, in part, to abstract
the hardware design details and provide userspace with something that
is relatively easy to use without needlessly exposing the variation
of the underlying hardware?

Please get away from the idea that userspace interfaces should reflect
the hardware design.

> What makes it harder to get right? The needed changes to the kernel
> driver are not that hard. The user space is another story, but that's
> because of the render-only thing, where we need to pass (prime)
> buffers around and do fence syncs etc. In the end I do not see a
> showstopper in the user space.

The fence syncs are an issue when you have multiple cores - that's
something I started to sort out in my patch series, but when you
appeared to refuse to accept some of the patches, I stopped...

The problem when you have multiple cores is that a single global fence
event counter, compared against the fence value stored in each buffer
object, no longer works.

Consider this scenario:

You have two threads, thread A making use of a 2D core, and thread B
using the 3D core.

Thread B submits a big long render operation, and the buffers get
assigned fence number 1.

Thread A submits a short render operation, and the buffers get assigned
fence number 2.

The 2D core finishes, and sends its interrupt.  Etnaviv updates the
completed fence position to 2.

At this point, we believe that fence numbers 1 and 2 are now complete,
despite the 3D core continuing to execute and operate on the buffers
with fence number 1.

I'm certain that the fence implementation we currently have can't be
made to work with multiple cores with a few tweaks - we need something
better to cater for what is essentially out-of-order completion amongst
the cores.
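The failure mode above can be sketched in a few lines of C. This is only
an illustration, not the actual etnaviv code: the structures, the names,
and the per-pipe counters are all invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only -- not the etnaviv implementation.
 * All names here are invented for the example. */

enum pipe { PIPE_2D, PIPE_3D, NR_PIPES };

struct buffer {
	enum pipe pipe;		/* pipe the buffer was last submitted to */
	uint32_t fence;		/* fence number assigned at submit time */
};

/* Broken scheme: one global completion counter shared by all pipes.
 * Any pipe's interrupt advances it for everyone. */
uint32_t global_completed;

bool buffer_idle_global(const struct buffer *b)
{
	return b->fence <= global_completed;
}

/* One possible fix: track completion per pipe, so a fast 2D job
 * finishing cannot make a still-running 3D job look complete. */
uint32_t pipe_completed[NR_PIPES];

bool buffer_idle_per_pipe(const struct buffer *b)
{
	return b->fence <= pipe_completed[b->pipe];
}
```

Replaying the scenario: the 3D buffers get fence 1, the 2D buffers fence 2;
when the 2D interrupt advances completion to 2, buffer_idle_global() wrongly
reports the 3D buffers idle, while the per-pipe check still sees them busy.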

A simple resolution to that _would_ be your argument of exposing each
GPU as a separate DRM node, because then we get completely separate
accounting of each - but it needlessly adds an expense in userspace.
Userspace would have to make multiple calls - to each GPU DRM node -
to check whether the buffer is busy on any of the GPUs as it may not
know which GPU could be using the buffer, especially if it got it via
a dmabuf fd sent over the DRI3 protocol.  To me, that sounds like a
burden on userspace.
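That burden can also be sketched. The helper below is hypothetical - it is
not a real DRM ioctl or libdrm call - but it shows the shape of the problem:
with one node per core, a client that received a buffer over dmabuf/DRI3
cannot know which core might be using it, so it must query every node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the userspace burden with one DRM node per
 * GPU core. busy_query stands in for some per-node "is this buffer
 * busy" query; it is not a real DRM interface. */

typedef bool (*busy_query)(int drm_fd, unsigned int handle);

/* One query per exposed node, even though at most one core is
 * actually using the buffer. */
bool buffer_busy_anywhere(const int *drm_fds, size_t nr_fds,
			  unsigned int handle, busy_query is_busy_on)
{
	for (size_t i = 0; i < nr_fds; i++)
		if (is_busy_on(drm_fds[i], handle))
			return true;
	return false;
}

/* Fake query for demonstration: pretend only the core behind fd 1
 * is still using the buffer. */
bool fake_is_busy_on(int drm_fd, unsigned int handle)
{
	(void)handle;
	return drm_fd == 1;
}
```

With a single node, the same answer would be one call; with N nodes it is
up to N calls for every buffer whose user is unknown.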

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

