Apitrace-based frame retracer

Mark Janes mark.a.janes at intel.com
Mon Jun 1 17:55:21 PDT 2015


José Fonseca <jose.r.fonseca at gmail.com> writes:

> On Fri, May 29, 2015 at 11:50 PM, Mark Janes <mark.a.janes at intel.com> wrote:
>
>> I have spent some time prototyping a frame debug/optimization tool
>> based on Apitrace, and I'd like to get feedback.
>>
>> Apitrace's retrace functionality limits the usability of qapitrace,
>> because it invokes a full retrace whenever qapitrace needs more
>> information.  This creates big delays as the user explores the trace
>> file.
>>
>> Finding bugs and bottlenecks in a complex GPU workload involves
>> exploration and experimentation.  Users need a more interactive
>> experience than qapitrace can provide.
>>
>> I've done some hacking to set up a server process which retraces a
>> trace file to a specified frame, then accepts subsequent retrace
>> requests for renders within the frame.  Because the server process
>> preserves the GL state from previous frames, it can execute any frame
>> retrace request in the time it took the original app to render the
>> frame.
>>
>> My proof-of-concept branch currently displays frame buffer images when
>> a user selects a render, with the minor modification that glClear is
>> called before the render, so the framebuffer only shows pixels which
>> were rendered by the selected call.  Also, it parses shader assemblies
>> from the "INTEL_DEBUG=vs,ps" setting, and displays the IR and assembly
>> for the render.  These features were chosen to minimally demonstrate
>> the interactivity that can be accomplished with this approach.
>>
>> I've set up a wiki describing my apitrace branch, and the features I'd
>> like to build with a frame retracer:
>>
>> https://github.com/janesma/apitrace/wiki/frameretrace-branch
>>
>> The wiki has some screen shots of the features I listed above.
>>
>> I'd like to get some input on the following:
>>
>
> First of all, I think there's a lot of potential on this idea.
>
> There was a feature request open on batching glretrace dumps --
> https://github.com/apitrace/apitrace/issues/51 -- but I never saw this much
> potential in it, as I failed to spot the connection between batching and
> looping over the calls within a frame.
>
> Now to specifics...
>
>
>>
>>  * Does anyone see technical issues with this approach?
>>
>
> There are a few to keep in mind:
>
> 1) In the wiki you said:
>
>   "The UI commands the server process to retrace from the beginning of the
> frame to a specified draw call and provide results (framebuffer images,
> bound state, etc) back to the UI. This is generally safe to do because of
> the repeatable nature of GL commands within a frame boundary."
>
> And indeed quite often applications reach a steady state where the calls
> executed in two successive frames are practically indistinguishable, and the
> GL state at the beginning and end of the frame is virtually the same, so one
> can replay the calls of one particular frame in a loop without altering
> behavior.
>
> But IIUC the current implementation of frameretrace does not do that: after
> dumping the state mid-frame, it will reset the trace position to the
> beginning of the frame.  That will not work -- imagine the calls at the
> beginning of the frame assume blending is disabled, and mid-frame blending
> is enabled -- so if you jump straight from mid-frame to start-of-frame, all
> those initial draw calls will misdraw.  There's an easy solution though:
> always retrace until the end of the frame before resetting the trace
> position.

I was aware of that bug, and have the same understanding of how to fix
it.  Thank you for taking the time to look at the branch as closely as
you did.
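
To make the fix concrete, the replay loop I have in mind looks roughly
like this (a sketch only; Call, retraceCall and captureResults are
placeholders, not apitrace's actual retracer API):

    #include <cstddef>
    #include <vector>

    struct Call { int no; };

    // Placeholders standing in for apitrace's real parser/retracer.
    static void retraceCall(const Call &c) { (void)c; /* execute on the live context */ }
    static void captureResults(size_t callNo) { (void)callNo; /* framebuffer, shaders, ... */ }

    // Always replay through the end of the frame, even when the UI only
    // asked about a call in the middle, so the GL state at the frame
    // boundary is correct for the next loop iteration.
    static void handleRetraceRequest(const std::vector<Call> &frame, size_t target)
    {
        for (size_t i = 0; i < frame.size(); ++i) {
            retraceCall(frame[i]);
            if (i == target)
                captureResults(i);
            // Do not stop here: the remaining calls restore the
            // end-of-frame state (blending, bound programs, ...) that the
            // next iteration depends on.
        }
    }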

>
> 2) The concept of frame is only well-defined on single OpenGL contexts /
> single-threaded traces.  If you have multiple contexts/threads, frames are
> ill-defined, and it's hard if not impossible to find a range of calls where
> it's safe to loop.

Could this case be handled by allowing the user to select which context
they want to debug?

The multi-context games that I've seen generally have a second context
to overlay ads or other content which is not the core workload.  The
other example is ChromeOS, which IIRC renders each tab in a separate
context.  My hope is that the complex GL workloads on Chrome are
benchmarks that can be easily captured/analyzed/optimized on a more
accessible and typical platform.

Is there another category of multi-context optimization that you think
is important?

> 4) There are many applications which don't have regular frames (e.g., OpenGL
> accelerated UI toolkits, web browsers using GL for composition); they only
> render new frames in response to user events, so every frame might end up
> being different.

I'm not sure how helpful a frame analysis tool would be for these
cases.  A system trace that includes both UI inputs and the resulting GPU
events would be more helpful in identifying latency.

> 5) Even "regular" applications might have "irregular" frames  -- e.g.,
> maybe the player of a first-person shooter entered a new area, and new
> objects were created, old ones deleted -- in which case replaying that frame
> in a loop will lead to corrupted/invalid state.

I was trying to think about these cases as well.

 * new objects: Retracing this frame would result in apitrace constantly
   re-creating these objects, correct?  This would constitute a resource
   leak during retrace, but the frame would still render correctly.  It
   seems feasible to track resource generation during retrace (see the
   sketch below).

 * deletion of old resources: This would constitute a double-deletion on
   retrace, which in many cases would be ignored or would generate a GL
   error.  It would be unusual for an app to use and then delete a
   resource within a single frame.  Retrace could skip deletions in the
   target frame if that is a problem.
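
Something along these lines is what I have in mind; the hook names
(onGenTextures, onDeleteTextures) are placeholders, not existing apitrace
entry points:

    #include <set>

    typedef unsigned int GLuint;   // stand-in for the GL header type

    // Names created by glGen* calls inside the looped frame, so they can
    // be released before the next iteration instead of leaking.
    static std::set<GLuint> createdThisFrame;

    // Hypothetical retracer hook invoked when a glGenTextures call is replayed.
    static void onGenTextures(int n, const GLuint *names)
    {
        for (int i = 0; i < n; ++i)
            createdThisFrame.insert(names[i]);
    }

    // Hypothetical hook for glDeleteTextures inside the target frame.
    // Returning false asks the retracer to skip the call, avoiding the
    // double-deletion (and GL errors) on subsequent loop iterations.
    static bool onDeleteTextures(int n, const GLuint *names)
    {
        (void)n; (void)names;
        return false;
    }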

I think it is more common for apps to create all the resources in
advance, then bind them as needed directly before rendering.

> In short, the way I see it, this "frame retracing" idea can indeed speed up
> lookups for apitrace users in many circumstances, but it should be something
> that users can opt in or out of, so that they can still get work done when
> for one reason or another the assumptions made there just don't hold.
>
> Of course, if your goal is just profiling (and not debugging bad rendering),
> then it should always be possible to find frames that meet the assumptions.
>
>
>>
>>  * Should a tool like this be built within Apitrace, or should
>>    it have its own repo and link against Apitrace?  Because it
>>    communicates through a socket, my POC picks up protocol buffers as
>>    a dependency.  There is a bunch of threading/socket/rpc
>>    infrastructure that won't apply elsewhere in Apitrace.
>>
>
> I see overlap between what's been proposed and what apitrace already
> does or should do one day.  In particular:
>
> - no need for a separate frame retrace daemon executable -- this can easily
> be achieved with the existing retrace executables, by adding a new option to
> "keep the process alive", which would keep the retrace looping over the most
> recent frame until it gets a request to dump an earlier call
>
> - sockets are not strictly necessary either -- one could use stdin (just
> like we use stdout for output)  (of course, we could one day replace
> stdin/out with sockets to cleanly allow retracing on a separate machine, but
> it's orthogonal to what's being proposed)

I chose sockets to enable the remote machine use case.  Performance
analysis is the motivating capability for my team.  It is especially
needed on under-powered devices that would struggle running an apitrace
UI.  Tablets and other form factors cannot be analyzed without a remote
connection.

You are right, I didn't mention remote analysis on the wiki or give a
hint in the prototype that it was a goal.  I skipped it because I didn't
want to deal with transferring the huge trace file between systems.

> - a lot of the stuff
> https://github.com/janesma/apitrace/wiki/frameretrace-use-cases overlaps
> with things qapitrace does, or we'd like it to do
>
> There are a few things being proposed that I have serious reservations about, though:
>
> - I don't think there's a place for another GUI tool -- something that
> resembles qapitrace but doesn't completely replace it -- in the apitrace
> tree.  For this to be merged, the frame retrace UI would have to be fully
> integrated with qapitrace, not something on the side.

I understand your concerns.  I have some doubts about the use of Qt4
widgets for qapitrace, and was planning to build features in qml.  I
would like to understand your thoughts on extending qapitrace as Qt
moves further away from widgets.

I also think the JSON interface between qapitrace and glretrace may be
inadequate for a more complex tool.  Hand-rolling a parser for each new
structured data type is tedious and error-prone in my experience.

>   In fact everything that works in "frame retrace" mode should work in
> "full trace" mode too.  The frame vs. full choice should be a simple switch
> somewhere.
>
> - Editing live state: (e.g., where you say "the user will be able to edit
> the bound shaders", etc.), but I don't see exactly how one would achieve
> that.  Currently qapitrace doesn't allow changing state directly, but rather
> allows editing the calls that set the state.

My plan was to create an API for setting state, which would result in
the insertion of new GL calls directly before the target render.

The retrace API for setting new shaders would compile and link them up
front, and pass back any error state.  On retrace, the new program id
would be bound with glUseProgram directly before the render.
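
For example, the shader path might look something like this (a rough
sketch only; compileReplacement is a hypothetical helper, and I'm
assuming a GL loader such as libepoxy provides the GL 2.0+ entry points):

    #include <epoxy/gl.h>
    #include <string>

    // Compile and link the user's edited shaders up front.  On failure,
    // return 0 and hand the info log back to the UI; nothing is inserted
    // into the retrace in that case.
    static GLuint compileReplacement(const char *vsSrc, const char *fsSrc,
                                     std::string &errors)
    {
        const struct { GLenum type; const char *src; } stages[] = {
            { GL_VERTEX_SHADER, vsSrc },
            { GL_FRAGMENT_SHADER, fsSrc },
        };
        GLuint prog = glCreateProgram();
        for (const auto &stage : stages) {
            GLuint shader = glCreateShader(stage.type);
            glShaderSource(shader, 1, &stage.src, NULL);
            glCompileShader(shader);
            GLint ok = 0;
            glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
            if (!ok) {
                char log[4096];
                glGetShaderInfoLog(shader, sizeof log, NULL, log);
                errors = log;
                glDeleteShader(shader);
                glDeleteProgram(prog);
                return 0;
            }
            glAttachShader(prog, shader);
            glDeleteShader(shader);   // the program keeps its reference
        }
        glLinkProgram(prog);
        GLint linked = 0;
        glGetProgramiv(prog, GL_LINK_STATUS, &linked);
        if (!linked) {
            char log[4096];
            glGetProgramInfoLog(prog, sizeof log, NULL, log);
            errors = log;
            glDeleteProgram(prog);
            return 0;
        }
        return prog;
    }

    // During the retrace itself, the only inserted call is
    // glUseProgram(prog) directly before the target render.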

My contention is that redundant/overlapping state and binding commands
have little or no performance impact.

> - State tracking inside glretrace: state tracking might seem easy at first
> but it's really hard to get right in the general case (and should be
> avoided as much as possible since a state tracker doesn't know when the
> application emits errors, and can easily diverge).  So, IMO, there should
> be only one implementation of state tracking in apitrace, and that should
> be in trimming (plus double-duty of x-referencing the trace later on).
> Adding a bit of state tracking here, a bit of state tracking there, is a
> recipe for introducing bugs and duplicate code in a lot of places.

I agree with you completely.  I've written a gles1/2/3 state tracker,
and it's not a small task.  I hacked some shader tracking into my
prototype because it was a quick way to connect compile time shader
assemblies with bound shaders.

OTOH, "query all bound state" can take a while at run time, and mixing
it with requests for metrics and render targets may result in latency
that makes the application unusable.

> In short, to have this in apitrace tree would involve a compromise -- you'd
> compromise some of your goals and flexibility, and in exchange have more
> reuse with the rest of apitrace tree, hence less code to write and maintain
> by yourself.  But it's really up to you.
>
>
> My goal here is ensure the scope of apitrace stays within manageable
> limits, so that the things it can do it can them well.  I'd rather have
> slow yet reliable results, than have very quick but "YMMV" like results. I
> also can't accept things that would make the existing functionality too
> complex, or a maintenance headache.

Keeping the mechanism reliable has been a great strategy for apitrace.
I have a great deal of respect for the work you've done, and would be
very happy if I could produce something that was useful enough to
include in apitrace.  Also, collaboration with folks wanting to target
other GPUs would be much more likely as part of a widely-used project
like apitrace.

>>  * I would like to make use of the metrics work that is being done
>>    this summer.  I'm eager to see the proposed metrics abstractions as
>>    the details are worked out.
>>
>>  * Are there more use cases that I should consider, beyond what is
>>    described in the wiki?
>
>
> Honestly, unless you command an army of developers, I suspect there are
> already too many use cases in there!  (In particular, the state
> tracking/editing as I said above.)
>
> But if you want one more, something that makes a lot of sense in a frame
> analysis tool for debugging is a pixel history --
> https://github.com/apitrace/apitrace/issues/317

Pixel history is a popular feature in Frame Analyzer.  I think it would
be fairly easy to implement with a frame retracer (a rough sketch follows
the list):

  * clear the framebuffer with an unusual color before each render

  * compare all pixels after each render.  Changed pixels go into a bloom
    filter for that render.

  * repeat with a second color, if you want to be completely accurate.

  * when the user requests "select all renders that affected this pixel",
    iterate over the bloom filters and check for membership.
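
The per-render bookkeeping could look roughly like this (an exact
std::set stands in for the bloom filter, and readFramebufferRGBA8 /
clearWithSentinel are hypothetical retracer helpers):

    #include <cstdint>
    #include <map>
    #include <set>
    #include <vector>

    // Hypothetical retracer helpers.
    static std::vector<uint32_t> readFramebufferRGBA8()
    {
        // A glReadPixels wrapper in the real retracer.
        return std::vector<uint32_t>();
    }

    static void clearWithSentinel(uint32_t color)
    {
        // glClearColor(...) + glClear(GL_COLOR_BUFFER_BIT) in the real retracer.
        (void)color;
    }

    // Linear pixel indices touched by each render; a bloom filter in the
    // real design, so a small false-positive rate would be acceptable.
    static std::map<int, std::set<uint32_t> > touchedByRender;

    static void recordRender(int renderIndex, uint32_t sentinel)
    {
        clearWithSentinel(sentinel);
        // ... replay the selected draw call here ...
        std::vector<uint32_t> pixels = readFramebufferRGBA8();
        for (size_t i = 0; i < pixels.size(); ++i)
            if (pixels[i] != sentinel)              // the render wrote this pixel
                touchedByRender[renderIndex].insert((uint32_t)i);
    }

    // "Which renders affected pixel (x, y)?" is then a membership test
    // over the per-render sets.
    static std::vector<int> historyForPixel(int x, int y, int width)
    {
        std::vector<int> hits;
        uint32_t index = (uint32_t)(y * width + x);
        for (const auto &entry : touchedByRender)
            if (entry.second.count(index))
                hits.push_back(entry.first);
        return hits;
    }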

Pixel history is more valuable when you have an overdraw visualization
of the frame buffer (brighter pixel => more GPU cost).  Developers will
want to analyze the history of the most expensive pixels.  Overdraw can
be built from the same data as pixel history.

This is a feature which is more likely to benefit game developers than
driver developers.  I am focusing on the driver use cases first.

> BTW, I think we should devise a Mesa-specific GL extension to extract GLSL
> IR & HW assembly in a clean fashion, instead of parsing driver debug output.

Your idea has been suggested by Mesa developers also, and I'm hopeful
that we can get a reliable mechanism in place, at least for Mesa.  An
early goal of this effort is to compare shader assembly from Mesa with
Intel's GL driver for Windows.  That driver is unlikely to change, so I
need to abstract and encapsulate whatever hacks are necessary.

>
>
> Jose

