Apitrace-based frame retracer

Chris Holmes j.chris.holmes at gmail.com
Mon Jun 1 20:43:06 PDT 2015


(Sorry - snipped out most of the conversation here, so consider this a
fork of the thread)

>> 2) The concept of a frame is only well-defined for single OpenGL contexts /
>> single-threaded traces.  If you have multiple contexts/threads, frames are
>> ill-defined, and it's hard, if not impossible, to find a range of calls
>> where it's safe to loop.
>
> Could this case be handled by allowing the user to select which context
> they want to debug?

  Isn't this issue already handled by the run-single-threaded option?
Even given multiple contexts, I thought the tracer captured timestamps
for each command.  Given that frame boundaries fall on swapbuffers
calls, it shouldn't be particularly difficult to assume that each
context runs in parallel and to order/block execution based on the
individual context's swapbuffers timestamp.  That being said, if an app
is truly synchronizing between multiple contexts, then the
single-threaded run option should still generate the correct output,
just potentially more slowly.  Even then, the parallel option could be
expanded by detecting resources shared between the contexts.
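
  To make that concrete, here is a minimal sketch of the ordering I have
in mind, assuming each traced frame carries the capture timestamp of its
swapbuffers call.  The Frame struct, replayContext, and the assumption
of unique timestamps are all mine, not apitrace's actual types:

    #include <condition_variable>
    #include <cstdint>
    #include <functional>
    #include <mutex>
    #include <set>
    #include <thread>
    #include <vector>

    // Hypothetical frame record: the capture timestamp of the swapbuffers
    // call that ended the frame.  Apitrace's real structures differ.
    struct Frame { uint64_t swapTimestamp; /* plus the frame's GL calls */ };

    std::mutex              g_mutex;
    std::condition_variable g_cv;
    std::multiset<uint64_t> g_pendingSwaps;  // swaps not yet executed

    // Replay one context's frames on its own thread.  Work inside a frame
    // runs in parallel with the other contexts; the swap itself blocks
    // until no context still owes an earlier-stamped swap, preserving the
    // captured frame order across contexts.  Timestamps are assumed unique.
    void replayContext(const std::vector<Frame>& frames) {
        for (const Frame& f : frames) {
            // ... execute the frame's GL calls against this context ...
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [&] {
                return *g_pendingSwaps.begin() == f.swapTimestamp;
            });
            // ... execute the swapbuffers call ...
            g_pendingSwaps.erase(g_pendingSwaps.find(f.swapTimestamp));
            g_cv.notify_all();
        }
    }

    int main() {
        // Two contexts whose swaps interleave in the capture.
        std::vector<std::vector<Frame>> contexts = {{{100}, {300}},
                                                    {{200}, {400}}};
        for (const auto& ctx : contexts)
            for (const Frame& f : ctx)
                g_pendingSwaps.insert(f.swapTimestamp);
        std::vector<std::thread> threads;
        for (const auto& ctx : contexts)
            threads.emplace_back(replayContext, std::cref(ctx));
        for (std::thread& t : threads)
            t.join();
    }

  Each context replays its frame contents in parallel; only the swaps
serialize, in captured-timestamp order.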

> The multi-context games that I've seen generally have a second context
> to overlay ads or other content which is not the core workload.  The
> other example is ChromeOS, which IIRC renders each tab in a separate
> context.  My hope is that the complex GL workloads on Chrome are
> benchmarks that can be easily captured/analyzed/optimized on a more
> accessible and typical platform.
>
> Is there another category of multi-context optimization that you think
> is important?
>
>> 4) There are many applications which don't have regular frames (e.g.,
>> OpenGL-accelerated UI toolkits, web browsers using GL for composition);
>> they only render new frames in response to user events, so every frame
>> might end up being different.
>
> I'm not sure how helpful a frame analysis tool would be for these
> cases.  A system trace that includes both UI inputs and the resulting GPU
> events would be more helpful in identifying latency.

  Why wouldn't it be helpful?  The nice thing about retracing a frame
is that it throws out the idle time between frames, i.e., the time not
spent rendering.
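
  Concretely, the measurement I'd want looks something like this, where
replayFrame and waitForGpu are hypothetical stand-ins for whatever
call-range replay the retracer would actually expose:

    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-ins for the retracer's real entry points.
    void replayFrame() { /* replay the target frame's call range */ }
    void waitForGpu()  { /* e.g. glFinish(), so queued GPU work is counted */ }

    // Loop one frame back-to-back: the measurement then contains only
    // rendering, none of the idle time the app spent between frames.
    int main() {
        const int kIterations = 100;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < kIterations; ++i)
            replayFrame();
        waitForGpu();
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<double, std::milli> elapsed = end - start;
        std::printf("%.3f ms per frame\n", elapsed.count() / kIterations);
    }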

>> 5) Even "regular" applications might have "irregular" frames  -- e.g.,
>> maybe the player of a first-person shooter entered a new area, and new
> objects were created, old ones deleted -- whereby replaying that frame in
>> a loop will lead to corrupted/invalid state.
>
> I was trying to think about these cases as well.

  There is no such thing as a regular frame in anything except a truly
derivative graphics application.
http://www.hardocp.com/image.html?image=MTQzMzEyMDM2NE1BdTlPTUdLMTVfMl8xX2wuanBn
Any truly intensive app will have such significant frame-to-frame
variation, driven by the data, that the idea of a "regular" frame is
impossible to define.

>  * new objects: Retracing this frame would result in apitrace constantly
>    re-creating these objects, correct?  This would constitute a resource
>    leak during retrace, but the frame would still render correctly.  It
>    seems feasible to track resource generation during retrace.
>
>  * deletion of old resources: This would constitute a double-deletion on
>    retrace, which in many cases would be ignored or generate a GL error.
>    It would be curious for an app to use and then delete a resource in a
>    single frame.  Retrace could skip deletions in the target frame if it
>    is a problem.
>
> I think it is more common for apps to create all the resources in
> advance, then bind them as needed directly before rendering.

  Many resources, yes; geometry and textures in particular.  But there
is plenty of per-frame generated data: per-object transforms, for
example, are likely CPU-generated every frame.
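
  That said, the resource tracking mentioned above seems easy enough to
sketch.  Something like these hypothetical wrappers (my assumption, not
apitrace code) could bracket each loop iteration, using textures as the
example:

    #include <GL/gl.h>   // assumes a desktop GL header is available
    #include <vector>

    // Texture ids generated while replaying the looped frame.  The same
    // bookkeeping applies to buffers, framebuffers, and so on.
    static std::vector<GLuint> g_frameTextures;

    // Hypothetical retrace-side wrappers around the traced calls.
    void retrace_glGenTextures(GLsizei n, GLuint* ids) {
        glGenTextures(n, ids);
        g_frameTextures.insert(g_frameTextures.end(), ids, ids + n);
    }

    void retrace_glDeleteTextures(GLsizei n, const GLuint* ids) {
        // Skip deletions inside the looped frame; replaying the frame
        // again would otherwise double-delete.  Everything is reclaimed
        // below instead.
        (void)n; (void)ids;
    }

    // Call between loop iterations so the re-executed glGen* calls
    // don't leak across hundreds of iterations.
    void endFrameIteration() {
        if (!g_frameTextures.empty()) {
            glDeleteTextures((GLsizei)g_frameTextures.size(),
                             g_frameTextures.data());
            g_frameTextures.clear();
        }
    }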

>> In short, the way I see it, this "frame retracing" idea can indeed
>> speed up lookups for apitrace users in many circumstances, but it
>> should be something that users can opt in/out of, so that they can
>> still get work done when, for one reason or another, the assumptions
>> made there just don't hold.
>>
>> Of course, if your goal is just profiling (and not debugging bad
>> rendering), then it should always be possible to find frames that
>> meet the assumptions.

  Being able to extract frames would be extremely helpful.  There's
really no reason this isn't doable, and I would love to see it.

> I chose sockets to enable the remote machine use case.  Performance
> analysis is the motivating capability for my team.  It is especially
> needed on under-powered devices that would struggle running an apitrace
> UI.  Tablets and other form factors cannot be analyzed without a remote
> connection.

  I think what you're asking for is a GPUView for apitrace?
http://graphics.stanford.edu/~mdfisher/GPUView.html

  Chris

-- 
Like the famous mad philosopher said, when you stare into the void,
the void stares also; but if you cast into the void, you get a type
conversion error.  -- Charles Stross

