Apitrace based frame retracer

Mon Jun 8 17:16:01 PDT 2015

Hello José,

Thanks for the helpful response.  I didn't know about the UBJSON work,
and I agree with your point about the use of GL by QtQuick.

I think the most sensible place to begin is by exposing the existing
retrace functionality in a usable library.  I'd like to build up some
compelling functionality before attempting to incorporate it in
qapitrace.  If my work doesn't pan out, then you won't have spent much
time reviewing my patches.

I'll keep my work MIT licensed and attempt to follow the apitrace
conventions, to reduce the amount of work required to bring it into
apitrace.  And I'll keep a public clone of the work available on github
as I make progress.

The items I'd like to change in apitrace retrace:

 - remove main routines from retrace libraries, so other main routines
   can link against them.

 - encapsulate global variables used by retrace functions in a "retrace
   state" object that is passed to the generated functions.
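
As a rough illustration of the second item: the sketch below is in Python
only for brevity (the retrace code itself is C++), and every name in it is
hypothetical rather than taken from apitrace.  The point is that each
generated function receives the state explicitly instead of reaching for
globals.

```python
from dataclasses import dataclass, field


@dataclass
class RetraceState:
    """Hypothetical container for what would otherwise be globals."""
    current_context: int = 0
    frame_no: int = 0
    # Maps object names recorded in the trace to names created at replay.
    texture_map: dict = field(default_factory=dict)


def retrace_bind_texture(state: RetraceState, traced_name: int) -> int:
    """Stand-in for a generated function: reads state, touches no globals."""
    return state.texture_map.get(traced_name, traced_name)


def retrace_swap_buffers(state: RetraceState) -> None:
    """Stand-in for a generated function that advances the frame counter."""
    state.frame_no += 1


# Two independent replays can now coexist in one process.
a, b = RetraceState(), RetraceState()
a.texture_map[3] = 17
retrace_swap_buffers(a)
```

With the state passed in, a second main routine can create and drive its
own RetraceState without disturbing any other replay.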

Unless you think this is unreasonable, I'll prepare a branch for you to
review.

-Mark

José Fonseca <jose.r.fonseca at gmail.com> writes:

> On Tue, Jun 2, 2015 at 1:55 AM, Mark Janes <mark.a.janes at intel.com> wrote:
>
>> José Fonseca <jose.r.fonseca at gmail.com> writes:
>>
>> > On Fri, May 29, 2015 at 11:50 PM, Mark Janes <mark.a.janes at intel.com> wrote:
>> > 2) The concept of frame is only well-defined on single OpenGL contexts /
>> > single-threaded traces.  If you have multiple contexts/threads, frames
>> > are ill-defined, and it's hard if not impossible to find a range of
>> > calls where it's safe to loop.
>>
>> Could this case be handled by allowing the user to select which context
>> they want to debug?
>>
>
> Potentially.
>
>
>> The multi-context games that I've seen generally have a second context
>> to overlay ads or other content which is not the core workload.  The
>> other example is ChromeOS, which IIRC renders each tab in a separate
>> context.  My hope is that the complex GL workloads on Chrome are
>> benchmarks that can be easily captured/analyzed/optimized on a more
>> accessible and typical platform.
>>
>> Is there another category of multi-context optimization that you think
>> is important?
>>
>
> I was thinking more in terms of debugging than optimization.
>
> Another category is GPU emulators (virtualization, or console emulators,
> Android emulators, etc). But I admit that one is quite niche.
>
>
>> > 4) There are many applications which don't have regular frames (e.g.,
>> > OpenGL accelerated UI toolkits, web browsers using GL for composition);
>> > they only render new frames in response to user events, so every frame
>> > might end up being different.
>>
>> I'm not sure how helpful a frame analysis tool would be for these
>> cases.  A system trace that includes both UI inputs and the resulting GPU
>> events would be more helpful in identifying latency.
>>
>
> If the frame analysis tool's only concern is performance (as GPA Frame
> Analyzer's seems to be), then the answer would be: no, not useful.
>
> But if the tool is for frame debugging and optimization then yes.
>
> And in your initial email you did describe this as a "frame
> debug/optimization tool" -- that's what I've been assuming in my replies.
>
> Even if performance is the major/only use case for this tool, this idea
> seems a good way to speed up state dump lookups, which seems particularly
> useful for debugging. So even if this tool has an independent life as an
> optimization tool, I think that there's merit in getting the core concept
> of this into qapitrace so we can speed up the state dump lookups when
> possible.
>
>
>> > 5) Even "regular" applications might have "irregular" frames -- e.g.,
>> > maybe the player of a first-person shooter entered a new area, and new
>> > objects were created, old ones deleted -- whereby replaying that frame
>> > in a loop will lead to corrupted/invalid state.
>>
>> I was trying to think about these cases as well.
>>
>>  * new objects: Retracing this frame would result in apitrace constantly
>>    re-creating these objects, correct?  This would constitute a resource
>>    leak during retrace, but the frame would still render correctly.  It
>>    seems feasible to track resource generation during retrace.
>>
>>  * deletion of old resources: This would constitute a double-deletion on
>>    retrace, which in many cases would be ignored or generate a GL error.
>>    It would be curious for an app to use then delete a resource in a
>>    single frame.  Retrace could skip deletions in the target frame if it
>>    is a problem.
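
The two bullets above can be sketched as bookkeeping over the calls of the
looped frame.  This is a minimal sketch under assumptions: the
(function_name, object_id) tuple format and the loop_frame helper are
invented for illustration, and nothing here dispatches to GL.

```python
def loop_frame(frame_calls, iterations):
    """Replay one frame several times, reporting what was skipped and leaked.

    frame_calls is a list of (function_name, object_id) tuples; a real
    retracer would dispatch to GL instead of only bookkeeping.
    """
    created, skipped_deletes = [], 0
    for _ in range(iterations):
        for name, obj in frame_calls:
            if name.startswith("glDelete"):
                # Skip: a second pass would otherwise delete the object twice.
                skipped_deletes += 1
                continue
            if name.startswith("glGen") or name.startswith("glCreate"):
                # Track: each pass re-creates the object, which would leak.
                created.append(obj)
    return created, skipped_deletes


calls = [("glGenTextures", 5), ("glDrawArrays", None), ("glDeleteBuffers", 2)]
created, skipped = loop_frame(calls, 3)
```

The created list is what a retracer could release between passes.  Whether
skipping the deletions is acceptable depends on how much they contribute
to the cost of the frame being measured.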
>>
>> I think it is more common for apps to create all the resources in
>> advance, then bind them as needed directly before rendering.
>>
>
> Yes, I think so too.
>
> Also note that, if you omit these create/destroys when replaying, you'll be
> significantly adulterating the performance.
>
> I think that, rather than trying to make it work there, it might be better
> to say,
>
>
>> > In short, the way I see it, this "frame retracing" idea can indeed
>> > speed up lookups for apitrace users in many circumstances, but it should
>> > be something that users can opt in to or out of, so that they can still
>> > get work done when, for one reason or another, the assumptions made
>> > there just don't hold.
>> >
>> > Of course, if your goal is just profiling (and not debugging bad
>> > rendering), then it should always be possible to find frames that meet
>> > the assumptions.
>> >
>> >
>> >>
>> >>  * Should a tool like this be built within Apitrace, or should
>> >>    it have its own repo and link against Apitrace?  Because it
>> >>    communicates through a socket, my POC picks up protocol buffers as
>> >>    a dependency.  There is a bunch of threading/socket/rpc
>> >>    infrastructure that won't apply elsewhere in Apitrace.
>> >>
>> >
>> > I see overlap between what's being proposed and what apitrace already
>> > does or should do one day.  In particular:
>> >
>> > - no need for a separate frame retrace daemon executable -- this can
>> > easily be achieved on the existing retrace executables, by adding a new
>> > option to "keep the process alive", which would keep the retrace looping
>> > over the most recent frame, until it gets a request to dump an earlier
>> > call
>> >
>> > - sockets are not strictly necessary either -- one could use stdin
>> > (just like we use stdout for output)  (of course, we could one day
>> > replace stdin/out with sockets to cleanly allow retrace on a separate
>> > machine, but it's orthogonal to what's being proposed)
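
The "keep the process alive" idea driven over stdin could look roughly like
the sketch below.  The one-command-per-line protocol ("dump <index>",
"quit") and the serve function are assumptions made for illustration; they
are not existing glretrace options.  As a simplification, the loop waits
for a request between passes instead of replaying continuously.

```python
import io


def serve(frame_calls, requests, out):
    """Loop over one frame, answering one request per pass.

    requests: text stream with one command per line ("dump <call_index>" or
    "quit"); this line protocol is invented for the sketch.
    """
    passes = 0
    for line in requests:
        cmd = line.split()
        if not cmd:
            continue
        if cmd[0] == "quit":
            break
        target = int(cmd[1]) if cmd[0] == "dump" else None
        for index, call in enumerate(frame_calls):
            # A real retracer would execute the GL call here.
            if index == target:
                out.write(f"state after call {index}: {call}\n")
        passes += 1
    return passes


# In-memory streams stand in for stdin/stdout.
demo_out = io.StringIO()
demo_passes = serve(["glClear", "glDrawArrays"],
                    io.StringIO("dump 1\n\nquit\ndump 0\n"), demo_out)
```

Passing sys.stdin and sys.stdout gives the pipe-based variant.  Any other
pair of text streams, including ones wrapping a socket, would work the same
way.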
>>
>> I chose sockets to enable the remote machine use case.  Performance
>> analysis is the motivating capability for my team.  It is especially
>> needed on under-powered devices that would struggle running an apitrace
>> UI.  Tablets and other form factors cannot be analyzed without a remote
>> connection.
>>
>
> FYI, https://github.com/apitrace/apitrace/pull/311 already added something
> like that for Android. But it would indeed be nice to generalize.
>
>> > - I don't think there's place for another gui tool -- something that
>> > resembles qapitrace but doesn't completely replace it -- in the apitrace
>> > tree.  For this to be merged, the frame retrace UI would have to be
>> > fully integrated with qapitrace, not something on the side.
>>
>> I understand your concerns.  I have some doubts about the use of Qt4
>> widgets for qapitrace, and was planning to build features in qml.  I
>> would like to understand your thoughts on extending qapitrace as Qt
>> moves further away from widgets.
>>
>
> I actually have some concerns with QML precisely due to the use of OpenGL.
>
> But if everybody agrees QML is the future, I have no problems in migrating
> the GUI to it. (In short, I'm flexible on anything except two complete
> disjoint GUIs.)
>
>> I also think the json interface between qapitrace and glretrace may be
>> inadequate for a more complex tool.  Hand-rolling a new parser to
>> exchange a new structured data type is tedious and buggy in my
>> experience.
>>
>
> Actually that has been done already: JSON has been replaced with UBJSON on
> master for a few weeks now.  It's even possible to choose the dump format
> (JSON vs UBJSON) as a glretrace command line option, and it wouldn't be
> difficult to add another.
>
>
>> >   In fact everything that works in a "frame retrace"  should work with
>> > "full trace" mode too.  The frame vs full should be a simple switch
>> > somewhere.
>> >
>> > - Editing live state: (e.g, where you say "the user will be able to edit
>> > the bound shaders", etc), but I don't see exactly how one would achieve
>> > that. Currently qapitrace doesn't allow changing state directly, but
>> > rather allows editing the calls that set the state.
>>
>> My plan was to create an api for setting state, which would result in
>> the insertion of new GL calls directly before the target render.
>>
>> The retrace api for setting new shaders would compile and link them
>> up front, and pass back error state if any.  On retrace, the new program
>> id would be inserted with glUseProgram directly before the render.
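
A minimal sketch of that insertion step, assuming the frame is held as a
list of (function_name, args) tuples and that the replacement program was
already compiled and linked.  The with_program_override helper is
hypothetical.

```python
def with_program_override(frame_calls, render_index, program_id):
    """Return a copy of the frame with glUseProgram inserted before a render.

    frame_calls is a list of (function_name, args) tuples; program_id is
    assumed to name a program that was compiled and linked up front.
    """
    patched = list(frame_calls)
    patched.insert(render_index, ("glUseProgram", (program_id,)))
    return patched


frame = [("glUseProgram", (1,)), ("glDrawArrays", (4, 0, 3))]
patched = with_program_override(frame, 1, 9)
```

The original glUseProgram call stays in place and becomes redundant, which
is the overlapping-state case discussed next.  The recorded frame itself is
left unmodified.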
>>
>> My contention is that redundant/overlapping state and binding commands
>> have little or no performance impact.
>>
>> > - State tracking inside glretrace: state tracking might seem easy at
>> > first but it's really hard to get right in the general case (and should
>> > be avoided as much as possible since a state tracker doesn't know when
>> > the application emits errors, and can easily diverge).  So, IMO, there
>> > should be only one implementation of state tracking in apitrace, and
>> > that should be in trimming (plus double-duty of x-referencing the trace
>> > later on).  Adding a bit of state tracking here, a bit of state tracking
>> > there, is a recipe for introducing bugs and duplicate code in a lot of
>> > places.
>>
>> I agree with you completely.  I've written a gles1/2/3 state tracker,
>> and it's not a small task.  I hacked some shader tracking into my
>> prototype because it was a quick way to connect compile time shader
>> assemblies with bound shaders.
>>
>> OTOH, "query all bound state" can take a while at run time, and mixing
>> it with requests for metrics and render targets may result in latency
>> that makes the application unusable.
>>
>> > In short, to have this in apitrace tree would involve a compromise --
>> > you'd compromise some of your goals and flexibility, and in exchange
>> > have more reuse with the rest of apitrace tree, hence less code to write
>> > and maintain by yourself.  But it's really up to you.
>> >
>> >
>> > My goal here is to ensure the scope of apitrace stays within manageable
>> > limits, so that the things it can do, it can do well.  I'd rather have
>> > slow yet reliable results than very quick but "YMMV"-like results.  I
>> > also can't accept things that would make the existing functionality too
>> > complex, or a maintenance headache.
>>
>> Keeping the mechanism reliable has been a great strategy for apitrace.
>> I have a great deal of respect for the work you've done, and would be
>> very happy if I could produce something that was useful enough to
>> include in apitrace.  Also, collaboration with folks wanting to target
>> other gpus would be much more likely as part of a widely-used project
>> like apitrace.
>>
>
> Yes.
>
>
>>
>> >>  * I would like to make use of the metrics work that is being done
>> >>    this summer.  I'm eager to see the proposed metrics abstractions as
>> >>    the details are worked out.
>> >>
>> >>  * Are there more use cases that I should consider, beyond what is
>> >>    described in the wiki?
>> >
>> >
>> > Honestly, unless you command an army of developers, I suspect there are
>> > already too many use cases in there!  (In particular, the state
>> > tracking/editing as I said above.)
>> >
>> > But if you want one more, something that makes a lot of sense on a frame
>> > analysis for debugging is a pixel history --
>> > https://github.com/apitrace/apitrace/issues/317
>>
>> Pixel history is a popular feature in Frame Analyzer.  I think it would
>> be fairly easy to implement with a frame retracer:
>>
>>   * clear the framebuffer with an unusual color before each render
>>
>
> Unfortunately that won't work with all blend modes (particularly those that
> take the destination alpha to modulate the source).
>
> But the rest seems sensible.
>
>
>>
>>   * compare all pixels after each render.  Changed pixels go into a Bloom
>>     filter for that render.
>
>
>>   * repeat with a second color, if you want to be completely accurate.
>>
>>   * when user requests "select all renders that affected this pixel",
>>     iterate over the Bloom filters and check for membership.
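
The steps above can be sketched as follows.  This is an illustration under
assumptions: framebuffers are modeled as {(x, y): color} dictionaries, the
filter size and hash count are arbitrary, and the BloomFilter class and
both helpers are hypothetical.  The second-color pass is left out.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: false positives possible, false negatives not."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.field |= 1 << pos

    def __contains__(self, item):
        return all((self.field >> pos) & 1 for pos in self._positions(item))


def build_history(renders, clear_color):
    """renders[i] maps (x, y) -> color after render i drew onto a cleared buffer."""
    filters = []
    for pixels in renders:
        bloom = BloomFilter()
        for xy, color in pixels.items():
            if color != clear_color:  # pixel was touched by this render
                bloom.add(xy)
        filters.append(bloom)
    return filters


def renders_affecting(filters, xy):
    """Indices of renders that may have touched pixel xy."""
    return [i for i, bloom in enumerate(filters) if xy in bloom]


MAGENTA = (255, 0, 255)
renders = [
    {(0, 0): (10, 10, 10), (1, 0): MAGENTA},  # render 0 touches (0, 0)
    {(0, 0): MAGENTA, (1, 0): (0, 200, 0)},   # render 1 touches (1, 0)
    {(0, 0): MAGENTA, (1, 0): MAGENTA},       # render 2 touches nothing
]
filters = build_history(renders, MAGENTA)
```

A query such as renders_affecting(filters, (0, 0)) always includes render 0.
It may occasionally list a render that did not touch the pixel, which is
the usual Bloom filter trade-off for a compact per-render record.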
>>
>
> The only other difficulty is tracking FBO changes -- they can be quite
> common nowadays.
>
>
>>
>> Pixel history is more valuable when you have an overdraw visualization
>> of the frame buffer (brighter pixel => more gpu cost).  Developers will
>> want to analyze the history of the most expensive pixels.  Overdraw can
>> be built from the same data as pixel history.
>>
>> This is a feature which is more likely to benefit game developers than
>> driver developers.  I am focusing on the driver use cases first.
>>
>
>
> Right.
>
> Jose