[PATCH weston 3/3] gl-renderer: Emit GPU rendering begin and end timeline timepoints

Alexandros Frantzis alexandros.frantzis at collabora.com
Mon Sep 18 11:23:46 UTC 2017


On Mon, Sep 18, 2017 at 11:20:52AM +0300, Pekka Paalanen wrote:
> On Sat, 16 Sep 2017 20:46:08 +0300
> Alexandros Frantzis <alexandros.frantzis at collabora.com> wrote:
> 
> > On Fri, Sep 15, 2017 at 02:39:56PM +0300, Pekka Paalanen wrote:
> > > On Thu, 14 Sep 2017 14:07:35 +0300
> > > Alexandros Frantzis <alexandros.frantzis at collabora.com> wrote:
> > >   
> > > > Use EGL fence sync objects to emit timepoints for the beginning and the
> > > > end of rendering on the GPU. The timepoints are emitted asynchronously
> > > > using the sync file fds associated with the fence sync objects. The sync
> > > > file fds are acquired using the facilities provided by the
> > > > EGL_ANDROID_native_fence_sync extension.
> > > > 
> > > > If timelining is inactive or the required EGL extensions are not
> > > > present, then GPU timepoint processing and emission are skipped.
> > > > ---
> > > >  libweston/gl-renderer.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 106 insertions(+)  
> > > 
> > > Hi,
> > > 
> > > I am not familiar yet with the fence/sync APIs so I cannot review their
> > > usage at this time. I do review all the Weston parts.  
> > 
> > Hi Pekka,
> > 
> > thanks for the thorough review. I will work on the proposed changes for
> > v2, but in the meantime I have some thoughts about some of the review
> > comments:
> 
> Hi,
> 
> cool.

Hi Pekka,

thank you for your reply.

Please note that some of my comments below (related to timeline clock
domains, etc.) are non-blocking and only tangentially related to this
patchset, but of course I am happy to continue discussing them
independently of the decisions for v2.

> > > > diff --git a/libweston/gl-renderer.c b/libweston/gl-renderer.c
> > > > index c2e88a65..e4726d96 100644
> > > > --- a/libweston/gl-renderer.c
> > > > +++ b/libweston/gl-renderer.c
> > > > @@ -38,8 +38,13 @@
> > > >  #include <float.h>
> > > >  #include <assert.h>
> > > >  #include <linux/input.h>
> > > > +#include <linux/sync_file.h>  
> > > 
> > > How old is linux/sync_file.h?
> > > 
> > > Would we need configure.ac check for its existence?
> > > 
> > > I think we would need checks for it and make it optional, because my
> > > Gentoo system has no such header installed. Linux-headers package seems
> > > to be version 4.4 here.
> > > 
> > > Or should we have a fallback definition?  
> > 
> > I think linux/sync_file.h was introduced in kernel 4.7, released in July
> > 2016.
> > 
> > I don't know what the policy is for supporting older kernels, but, since
> > we have this build-time compatibility issue, I think it's worth hiding
> > all the details behind a linux-sync-file abstraction (e.g,
> > linux_sync_file_read_timestamp() in linux-sync-file.{c,h}).  In this
> > abstraction we can handle the compatibility issues any way we like while
> > still providing a clean interface to gl-renderer.
> 
> But if linux/sync_file.h does not exist build time in the system, what
> could we hope to do instead? There is no other timestamping mechanism
> to fall back to that would make any sense, is there?
> 
> Therefore providing an abstraction does not seem useful here. Even more
> so as we are talking about a build time dependency, not something we
> could pick at runtime which implementation to use.

I think we may be talking about different levels or kinds of
"abstraction". I am not referring to a sync fence "interface" (e.g., a
struct with function pointers) with multiple run-time implementations.
What I mean is putting the implementation in a separate
linux-sync-file.{c,h} and providing, e.g.:

int linux_sync_file_read_timestamp(int fd, uint64_t *ts);

for gl-renderer to use. We can then hide all the build-time details of
supporting sync files (fallback definitions, etc.) in linux-sync-file.c.

> > For the optional vs fallback question, my preference would be to provide
> > fallback definitions. This is a stable kernel interface, and since the
> > kernel is guaranteed not to break userspace, we can safely carry the
> > definitions for as long as we think it's necessary.
> > 
> > Providing fallback definitions makes the produced code independent of
> > the kernel version it was built with, which seems to be the policy for
> > EGL, too. We can build on an older kernel and the feature will just work
> > when running on a newer kernel. At runtime, if the kernel doesn't
> > support sync files, the code will fail gracefully and just return a
> > timestamp of 0 (or some other invalid value), which we could check for
> > and not add a timepoint to the log.
> 
> Yes, I think providing fallback definitions would be fine here. The
> alternative is to #if all the code using it or put it in a separate
> file. I would not like adding #ifs, but putting in a separate file
> built conditionally is ok in principle but a fairly heavy solution in
> this case, IMHO.
> 
> What to do depends on how it will look like, what is the least amount
> of effort to maintain and with the highest code readability.

See my reply above about moving the sync file details to a separate
file. Does this match what you had in mind?

The main gl-renderer code will not need any conditional compilation. At
runtime, if linux_sync_file_read_timestamp() fails, we just won't emit
the timepoint.
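
For illustration, the fd handler from the patch could then look roughly
like this (a sketch only; the handler and struct names are placeholders,
since they are not visible in the quoted hunk):

static int
timeline_render_point_handler(int fd, uint32_t mask, void *data)
{
	/* "struct timeline_render_point" and the handler name are
	 * placeholders for whatever the patch actually uses. */
	struct timeline_render_point *tr = data;
	const char *tp_name = tr->end ? "renderer_gpu_end" :
					"renderer_gpu_begin";
	uint64_t ts;

	/* On kernels without sync file support the helper fails and we
	 * simply skip emitting the timepoint. */
	if ((mask & WL_EVENT_READABLE) &&
	    linux_sync_file_read_timestamp(fd, &ts) == 0) {
		struct timespec tspec = { 0 };

		timespec_add_nsec(&tspec, &tspec, ts);

		TL_POINT(tp_name, TLP_GPU(&tspec),
			 TLP_OUTPUT(tr->output), TLP_END);
	}

	/* ... fd and event source cleanup as in the original patch ... */

	return 0;
}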

> > > > +	const char *tp_name = tr->end ? "renderer_gpu_end" : "renderer_gpu_begin";
> > > > +
> > > > +	if (mask & WL_EVENT_READABLE) {
> > > > +		uint64_t ts = read_fence_sync_fd_timestamp(fd);
> > > > +		struct timespec tspec = { 0 };
> > > > +
> > > > +		timespec_add_nsec(&tspec, &tspec, ts);
> > > > +
> > > > +		TL_POINT(tp_name, TLP_GPU(&tspec), TLP_OUTPUT(tr->output), TLP_END);  
> > > 
> > > What do we know about the clock domain / time base for the GPU
> > > timestamps?  
> > 
> > The clock domain for sync files is CLOCK_MONOTONIC. This is not
> > explicitly stated anywhere unfortunately, but all implementations of
> > sync files, from the early android days to the current upstream version
> > (which is backed by dmabuf fences) use ktime_get() in the kernel for the
> > timestamp (ktime_get() is essentially CLOCK_MONOTONIC).
> 
> Very good.
> 
> > > I did not see anywhere a check that the graphics clock domain we have
> > > chosen (weston_compositor::presentation_clock) is the same as the GPU
> > > timestamp domain.
> > > 
> > > I believe there should be a check, and if the domains do not match,
> > > print a message to the log saying they don't match and disable the GPU
> > > timestamping.  
> > 
> > Using different clocks would certainly limit the ability to correlate
> > events between display and gpu/rendering, but I am not sure if turning
> > off the GPU timepoints is the best approach. The duration of the
> > renderings could be interesting in themselves, even without correlation
> > with particular display events.
> 
> But we don't record the duration, we record absolute timestamps.

Since the begin/end rendering events use the same clock, we can use them
to calculate the rendering duration, which is still useful to have
regardless of whether we can correlate with other events in the
timeline.
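
For example, a tool post-processing the timeline log (or the compositor
itself) could compute the duration with the existing helpers from
shared/timespec-util.h; the function name and the gpu_begin/gpu_end
values below are hypothetical, taken from a matching pair of timepoints:

#include <stdint.h>
#include <time.h>

#include "shared/timespec-util.h"

/* Both timestamps come from the fence clock (CLOCK_MONOTONIC), so their
 * difference is meaningful even if presentation timestamps use another
 * clock domain. */
static int64_t
gpu_render_duration_ns(const struct timespec *gpu_begin,
		       const struct timespec *gpu_end)
{
	struct timespec delta;

	timespec_sub(&delta, gpu_end, gpu_begin);

	return timespec_to_nsec(&delta);
}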

> > I think one way to resolve this would be to add some indication of clock
> > domain in all timestamps in the timeline (whether timepoints or event
> > arguments). This way tools would know whether/how to interpret
> > relationships between events from different sources.
> 
> Please, no. The timelines are already complicated to interpret, let's
> not add a new dimension to the problem. All absolute timestamps must be
> kept in the same clock domain and time base.

The problem is that this is already not the case: the timeline log
files potentially contain timestamps from different clock domains. The
timepoint timestamps are in CLOCK_MONOTONIC, but the accurate
presentation timestamps (i.e., the vblank arguments) are in an arbitrary
clock depending on the compositor.

The software presentation clock, used by non-DRM platforms, prefers
other clocks, and the DRM platform may use either CLOCK_MONOTONIC or
CLOCK_REALTIME, so there are no guarantees there either.

In order for the timeline log to be generally useful, we either need to
guarantee that we use the same clock domain everywhere (I don't think
conversions are a good option, see my thoughts below), or include a
domain indication. Otherwise, the presentation data in the timeline log
is only valid when captured under a limited setup, and, most
importantly, there is no indication in the logs about the validity of
the data.

> If you really wanted to keep the GPU timestamps even with different
> clocks, libweston should do the time conversion itself.

It would be great to use conversion to maintain a single clock base for
all timestamps. However, I don't think it's easy (or even possible in
some cases) to accurately convert arbitrary past timestamps (and in a
way all events we get are in the past) between arbitrary clocks, given
all the intricacies of clock steps and slews. A possibly adequate
approximation could be achieved by continuously keeping track of the
clocks and their relative offsets. Even in that case, though, in order
to convert timestamps from the past (even the recent past, e.g., the
vblank or GPU events we receive) we would need some record of the
clocks' relative offsets at exactly that past point. I don't think it's
worth all the effort.

Conversion could be directly possible between some clocks if there is a
guarantee that all their timestamps, or at least the timestamp we are
trying to convert, are offset by a fixed known value, but that's not the
general case.

Perhaps, though, we don't need such correctness guarantees for this
scenario, and it would be OK to accept some error introduced by assuming
a fixed offset between clocks, even if that's not always accurate.

> The presentation clock comes from the DRM sub-system in the first place
> when using the DRM-backend, so getting that to be different from GPU
> timestamp clock would be quite unexpected. With other backends, it
> could be different, but I believe also less interesting.
>
> > Another point, is that, as things stand, performing the check would
> > deactivate the GPU time points on most non-DRM platforms, since they use
> > the software presentation clock which prefers other domains over
> > CLOCK_MONOTONIC.  There are some slight benefits to preferring other
> > clocks, e.g., CLOCK_MONOTONIC_RAW, when measuring short durations, but I
> > think these benefits are outweighed by the drawback of being
> > incompatible with CLOCK_MONOTONIC, which is more widely used.
> 
> I actually would prefer to disable the GPU timestamps.
> 
> DRM-backend is the only backend aside from the Wayland backend that can
> get accurate pageflip timestamps.
> 
> If the Wayland backend used the Presentation time extension, it would
> get the presentation clock id from the parent compositor, which ideally
> gets it from DRM anyway. But currently it does not, and so it does not
> receive good pageflip timestamps either.
> 
> As the timestamping on non-DRM backends is inaccurate to begin with, I
> don't think there is need to invest the effort to make GPU timestamps
> work there either. If a need arises, we can always return here and
> re-evaluate.

As explained previously, I don't agree that render begin/end GPU
timestamps are useless without presentation timestamp correlation.
However, I agree it's a small change to make, so let's not block on
this. In v2 I will disable GPU timestamps if the clocks are different,
and we can revisit as needed. The practical effect is that GPU
timepoints will be enabled only when running with the DRM backend (and
then only when DRM uses monotonic clocks).
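
The check I have in mind for v2 would be roughly the following (a
sketch only; gr->has_native_fence_sync is a placeholder for however the
extension availability ends up being tracked in struct gl_renderer):

/* During gl-renderer setup, after detecting the EGL fence sync
 * extensions: */
static void
maybe_disable_gpu_timeline(struct gl_renderer *gr,
			   struct weston_compositor *ec)
{
	if (gr->has_native_fence_sync &&
	    ec->presentation_clock != CLOCK_MONOTONIC) {
		weston_log("warning: disabling GPU timeline timepoints: "
			   "the presentation clock is not CLOCK_MONOTONIC, "
			   "so GPU timestamps cannot be correlated with "
			   "it.\n");
		gr->has_native_fence_sync = 0;
	}
}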

> > Going on a tangent here, ideally we would have a mechanism to negotiate
> > a single clock for all time points from different sources. For example,
> > the compositor (display) would list its clocks, so would the renderer
> > (gpu), and we would in some way select the best match, if there is one.
> > The timeline clock could be then set to the selected clock for maximum
> > event correlation. If there is no global match, the domain indication in
> > the timelog timestamps (discussed above) would help resolve any
> > confusion.
> 
> Domain indication does not help, unless libweston also records the
> necessary information to correlate the clocks. If libweston can record
> that, it might as well just convert all timestamps to the same base to
> begin with.

Even when global clock negotiation fails, and given that conversion is
not possible in general (as discussed above), there could still be value
in recording these events, as they could be correlated with other events
in the same clock domain. Why shouldn't we include this information and
let the tool/user decide if and how to use it? Anyway, this is beyond
the scope of this patchset.

Thanks,
Alexandros

