[REPOST PATCH 4/8] android: convert sync to fence api, v5

Mon Jun 23 01:45:47 PDT 2014

Hey,

op 20-06-14 22:52, Thierry Reding schreef:
> On Thu, Jun 19, 2014 at 02:28:14PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
>> <thierry.reding at gmail.com> wrote:
>>>>> With these changes, can we pull the android sync logic out of
>>>>> drivers/staging/ now?
>>>> Afaik the google guys never really looked at this and acked it. So I'm not
>>>> sure whether they'll follow along. The other issue I have as the
>>>> maintainer of gfx driver is that I don't want to implement support for two
>>>> different sync object primitives (once for dma-buf and once for android
>>>> syncpts), and my impression thus far has been that even with this we're
>>>> not there.
>>>>
>>>> I'm trying to get our own android guys to upstream their i915 syncpts
>>>> support, but thus far I haven't managed to convince them to throw people's
>>>> time at this.
>>> This has been discussed a fair bit internally recently and some of our
>>> GPU experts have raised concerns that this may result in seriously
>>> degraded performance in our proprietary graphics stack. Now I don't care
>>> very much for the proprietary graphics stack, but by extension I would
>>> assume that the same restrictions are relevant for any open-source
>>> driver as well.
>>>
>>> I'm still trying to fully understand all the implications and at the
>>> same time get some of the people who raised concerns to join in this
>>> discussion. As I understand it the concern is mostly about explicit vs.
>>> implicit synchronization and having this mechanism in the kernel will
>>> implicitly synchronize all accesses to these buffers even in cases where
>>> it's not needed (read vs. write locks, etc.). In one particular instance
>>> it was even mentioned that this kind of implicit synchronization can
>>> lead to deadlocks in some use-cases (this was mentioned for Android
>>> compositing, but I suspect that the same may happen for Wayland or X
>>> compositors).
>> Well the implicit fences here actually can't deadlock. That's the
>> entire point behind using ww mutexes. I've also heard tons of
>> complaints about implicit enforced syncing (especially from opencl
>> people), but in the end drivers and always expose unsynchronized
>> access for specific cases. We do that in i915 for upload buffers and
>> other fun stuff. This is about shared stuff across different drivers
>> and different processes.
> Tegra K1 needs to share buffers across different drivers even for very
> basic use-cases since the GPU and display drivers are separate. So while
> I agree that the GPU driver can still use explicit synchronization for
> internal work, things aren't that simple in general.
>
> Let me try to reconstruct the use-case that caused the lock on Android:
> the compositor uses a hardware overlay to display an image. The system
> detects that there's little activity and instructs the compositor to put
> everything into one image and scan out only that (for power efficiency).
> Now with implicit locking the display driver has a lock on the image, so
> the GPU (used for compositing) needs to wait for it before it can
> composite everything into one image. But the display driver cannot
> release the lock on the image until the final image has been composited
> and can be displayed instead.
>
> This may not be technically a deadlock, but it's still a stalemate.
> Unless I'm missing something fundamental about DMA fences and ww mutexes
> I don't see how you can get out of this situation.
This sounds like a case for implicit shared fences. ;-) Reading and scanning out would
only wait for the last 'exclusive' fence, not on each other.

But in drivers/drm I can encounter a similar issue, people expect to be able to
overwrite the contents of the currently displayed buffer, so I 'solved' it by not adding
a fence on the buffer, only by waiting for buffer idle before page flipping.
The rationale is that the buffer is pinned internally, and the backing storage cannot
go away until dma_buf_unmap_attachment is called. So when you render to the
current front buffer without queuing a page flip you get exactly what you expect. ;-)

> Explicit vs. implicit synchronization may also become more of an issue
> as buffers are imported from other sources (such as cameras).
Yeah, but the kernel space primitives would in both cases be the same, so drivers don't need to implement 2 separate fencing mechanisms for that. :-)

~Maarten