<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 2, 2015 at 1:55 AM, Mark Janes <span dir="ltr"><<a href="mailto:mark.a.janes@intel.com" target="_blank">mark.a.janes@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div>José Fonseca <<a href="mailto:jose.r.fonseca@gmail.com" target="_blank">jose.r.fonseca@gmail.com</a>> writes:<br>
<br>
> On Fri, May 29, 2015 at 11:50 PM, Mark Janes <<a href="mailto:mark.a.janes@intel.com" target="_blank">mark.a.janes@intel.com</a>> wrote:<br></div></div><span>> 2) The concept of frame is only well-defined on single OpenGL contexts /<br>
> single-threaded traces. If you have multiple contexts/threads, frames are<br>
> ill-defined, and it's hard if not impossible to find a range of calls where<br>
> it's safe to loop.<br>
<br>
</span>Could this case be handled by allowing the user to select which context<br>
they want to debug?<br></blockquote><div><br></div><div>Potentially.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
The multi-context games that I've seen generally have a second context<br>
to overlay ads or other content which is not the core workload. The<br>
other example is ChromeOS, which IIRC renders each tab in a separate<br>
context. My hope is that the complex GL workloads on Chrome are<br>
benchmarks that can be easily captured/analyzed/optimized on a more<br>
accessible and typical platform.<br>
<br>
Is there another category of multi-context optimization that you think<br>
is important?<br></blockquote><div><br></div><div>I was thinking more in terms of debugging than optimization.</div><div><br></div><div>Another category is GPU emulators (virtualization, or console emulators, Android emulators, etc). But I admit that one is quite niche.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
> 4) There are many applications which don't have regular frames (e.g., OpenGL<br>
> accelerated UI toolkits, web browsers using GL for composition): they only<br>
> render new frames in response to user events, so every frame might end up<br>
> being different.<br>
<br>
</span>I'm not sure how helpful a frame analysis tool would be for these<br>
cases. A system trace that includes both UI inputs and the resulting GPU<br>
events would be more helpful in identifying latency.<br></blockquote><div><br></div><div>If the frame analysis tool's only concern is performance (as GPA Frame Analyzer's seems to be), then the answer would be: no, not useful. </div><div><br></div><div>But if the tool is for frame debugging and optimization, then yes.</div><div><br></div><div>And in your initial email you did describe this as a "frame debug/optimization tool" -- that's what I've been assuming in my replies.</div><div><br></div><div>Even if performance is the major/only use case for this tool, this idea seems a good way to speed up state dump lookups, which seems particularly useful for debugging. So even if this tool has an independent life as an optimization tool, I think there's merit in getting the core concept of this into qapitrace so we can speed up state dump lookups when possible. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
> 5) Even "regular" applications might have "irregular" frames -- e.g.,<br>
> maybe the player of a first-person shooter entered a new area, and new<br>
> objects were created, old ones deleted -- whereby replaying that frame in<br>
> a loop will lead to corrupted/invalid state.<br>
<br>
</span>I was trying to think about these cases as well.<br>
<br>
* new objects: Retracing this frame would result in apitrace constantly<br>
re-creating these objects, correct? This would constitute a resource<br>
leak during retrace, but the frame would still render correctly. It<br>
seems feasible to track resource generation during retrace.<br>
<br>
* deletion of old resources: This would constitute a double-deletion on<br>
retrace, which in many cases would be ignored or generate a GL error.<br>
It would be curious for an app to use and then delete a resource in a<br>
single frame. Retrace could skip deletions in the target frame if it<br>
is a problem.<br>
<br>
I think it is more common for apps to create all the resources in<br>
advance, then bind them as needed directly before rendering.<br></blockquote><div><br></div><div>Yes, I think so too.</div><div><br></div><div>Also note that, if you omit these create/destroy calls when replaying, you'll be significantly skewing the performance. </div><div><br></div><div>I think that, rather than trying to make it work there, it might be better to say, </div><div><br></div>
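<div>That said, if one did want to track resource generation during a frame loop, I imagine something roughly along these lines -- purely an illustrative sketch, not actual apitrace code, and the names are made up:</div><div><br></div><div><pre>// Illustrative sketch only -- not apitrace code.
// Tracks GL objects created while looping over a single frame, so each
// iteration can clean up after itself and skip double-deletions.
#include &lt;GL/gl.h&gt;
#include &lt;set&gt;

class FrameLoopTracker {
    std::set&lt;GLuint&gt; createdThisIteration;
public:
    // Called wherever the retracer replays a glGenTextures/glGenBuffers/...
    void onGen(GLsizei n, const GLuint *ids) {
        for (GLsizei i = 0; i &lt; n; ++i)
            createdThisIteration.insert(ids[i]);
    }

    // Called wherever a glDelete* is replayed inside the looped frame.
    // Returns false if the deletion should be skipped (the object was
    // created before the loop started, so deleting it would break the
    // next iteration).
    bool onDelete(GLuint id) {
        return createdThisIteration.erase(id) != 0;
    }

    // Called at the end of each loop iteration, so the objects the frame
    // itself created don't leak across iterations.
    void endIteration() {
        for (GLuint id : createdThisIteration)
            glDeleteTextures(1, &amp;id);  // analogous calls for buffers, FBOs, ...
        createdThisIteration.clear();
    }
};</pre></div><div><br></div><div>But that, of course, only addresses correctness, not the performance skew mentioned above.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">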
<span><br>
> In short, the way I see it, this "frame retracing" idea can indeed speed<br>
> up lookups for apitrace users in many circumstances, but it should be<br>
> something that users can opt-in/out, so that they can still get work done,<br>
> when for one reason or another the assumptions made there just don't hold.<br>
><br>
> Of course, if your goal is just profiling (and not debugging bad<br>
> rendering), then it should be always possible to find frames that meet the<br>
> assumptions.<br>
><br>
><br>
>><br>
>> * Should a tool like this be built within Apitrace, or should<br>
>> it have its own repo and link against Apitrace? Because it<br>
>> communicates through a socket, my POC picks up protocol buffers as<br>
>> a dependency. There is a bunch of threading/socket/rpc<br>
>> infrastructure that won't apply elsewhere in Apitrace.<br>
>><br>
><br>
> I see overlap between what's been proposed, and what apitrace already<br>
> does or should do one day. In particular:<br>
><br>
> - no need for a separate frame retrace daemon executable -- this can easily<br>
> be achieved with the existing retrace executables, by adding a new option to<br>
> "keep the process alive", which would keep the retrace looping over the<br>
> most recent frame, until it gets a request to dump an earlier call<br>
><br>
> - sockets are not strictly necessary either -- one could use stdin<br>
> (just like we use stdout for output) (of course, we could one day replace<br>
> stdin/out with sockets to cleanly allow retrace on a separate machine, but<br>
> it's orthogonal to what's being proposed)<br>
<br>
</span>I chose sockets to enable the remote machine use case. Performance<br>
analysis is the motivating capability for my team. It is especially<br>
needed on under-powered devices that would struggle running an apitrace<br>
UI. Tablets and other form factors cannot be analyzed without a remote<br>
connection.<br></blockquote><div><br></div><div>FYI, <a href="https://github.com/apitrace/apitrace/pull/311" target="_blank">https://github.com/apitrace/apitrace/pull/311</a> already added something like that for Android. But it would indeed be nice to generalize.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>> - I don't think there's a place for another gui tool -- something that resembles<br>
> qapitrace but doesn't completely replace it -- in the apitrace tree. For this<br>
> to be merged, the frame retrace UI would have to be fully integrated with<br>
> qapitrace, not something on the side.<br>
<br>
</span>I understand your concerns. I have some doubts about the use of Qt4<br>
widgets for qapitrace, and was planning to build features in QML. I<br>
would like to understand your thoughts on extending qapitrace as Qt<br>
moves further away from widgets.<br></blockquote><div><br></div><div>I actually have some concerns with QML precisely due to the use of OpenGL.</div><div><br></div><div>But if everybody agrees QML is the future, I have no problem migrating the GUI to it. (In short, I'm flexible on anything except two completely disjoint GUIs.)</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I also think the JSON interface between qapitrace and glretrace may be<br>
inadequate for a more complex tool. Hand-rolling a new parser to<br>
exchange a new structured data type is tedious and buggy in my<br>
experience.<br></blockquote><div><br></div><div>Actually that has been done already: JSON has been replaced with UBJSON on master for a few weeks now. It's even possible to choose the dump format (JSON vs UBJSON) as a glretrace command line option, and it wouldn't be difficult to add another.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>> In fact everything that works in a "frame retrace" should work with<br>
> "full trace" mode too. The frame vs full should be a simple switch<br>
> somewhere.<br>
><br>
> - Editing live state: (e.g., where you say "the user will be able to edit<br>
> the bound shaders", etc.), but I don't see exactly how one would achieve<br>
> that. Currently qapitrace doesn't allow changing state directly, but<br>
> rather allows editing the calls that set the state.<br>
<br>
</span>My plan was to create an API for setting state, which would result in<br>
the insertion of new GL calls directly before the target render.<br>
<br>
The retrace API for setting new shaders would be to compile and link them<br>
up front, and pass back error state if any. On retrace, the new program<br>
id would be inserted with glUseProgram directly before the render.<br>
<br>
My contention is that redundant/overlapping state and binding commands<br>
have little or no performance impact.<br>
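</blockquote><div><br></div><div>OK. Just to make sure we mean the same thing, here is a minimal sketch of how I read that plan -- illustrative only, using GLEW for brevity rather than the retracer's own GL dispatch, and with error handling trimmed:</div><div><br></div><div><pre>// Illustrative sketch: compile/link a replacement program up front,
// report any compiler/linker errors back to the UI, and bind it just
// before the target draw call when the frame is replayed.
#include &lt;GL/glew.h&gt;
#include &lt;string&gt;

// Returns 0 on failure and fills 'log' with the info log.
GLuint buildReplacementProgram(const char *vsSrc, const char *fsSrc,
                               std::string &amp;log) {
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &amp;vsSrc, NULL);
    glCompileShader(vs);

    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &amp;fsSrc, NULL);
    glCompileShader(fs);

    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);

    GLint ok = GL_FALSE;
    glGetProgramiv(prog, GL_LINK_STATUS, &amp;ok);
    if (!ok) {
        char buf[4096];
        glGetProgramInfoLog(prog, sizeof buf, NULL, buf);
        log = buf;
        glDeleteProgram(prog);
        prog = 0;
    }
    glDeleteShader(vs);  // flagged for deletion; kept alive while attached
    glDeleteShader(fs);
    return prog;
}

// On replay, just before the selected render:
//     glUseProgram(replacementProg);
//     ... replay the draw call as recorded ...</pre></div><div><br></div><div>If that's roughly the idea, it seems workable to me.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">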
<span><br>
> - State tracking inside glretrace: state tracking might seem easy at first<br>
> but it's really hard to get right in the general case (and should be<br>
> avoided as much as possible since a state tracker doesn't know when the<br>
> application emits errors, and can easily diverge). So, IMO, there should<br>
> be only one implementation of state tracking in apitrace, and that should<br>
> be in trimming (plus double-duty of x-referencing the trace later on).<br>
> Adding a bit of state tracking here, a bit of state tracking there, is a<br>
> recipe for introducing bugs and duplicate code in a lot of places.<br>
<br>
</span>I agree with you completely. I've written a gles1/2/3 state tracker,<br>
and it's not a small task. I hacked some shader tracking into my<br>
prototype because it was a quick way to connect compile time shader<br>
assemblies with bound shaders.<br>
<br>
OTOH, "query all bound state" can take a while at run time, and mixing<br>
it with requests for metrics and render targets may result in latency<br>
that makes the application unusable.<br>
<span><br>
> In short, to have this in the apitrace tree would involve a compromise -- you'd<br>
> compromise some of your goals and flexibility, and in exchange have more<br>
> reuse with the rest of the apitrace tree, hence less code to write and maintain<br>
> by yourself. But it's really up to you.<br>
><br>
><br>
> My goal here is to ensure the scope of apitrace stays within manageable<br>
> limits, so that the things it can do, it can do well. I'd rather have<br>
> slow yet reliable results than very quick but "YMMV"-like results. I<br>
> also can't accept things that would make the existing functionality too<br>
> complex, or a maintenance headache.<br>
<br>
</span>Keeping the mechanism reliable has been a great strategy for apitrace.<br>
I have a great deal of respect for the work you've done, and would be<br>
very happy if I could produce something that was useful enough to<br>
include in apitrace. Also, collaboration with folks wanting to target<br>
other GPUs would be much more likely as part of a widely-used project<br>
like apitrace.<br></blockquote><div><br></div><div>Yes.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span><br>
>> * I would like to make use of the metrics work that is being done<br>
>> this summer. I'm eager to see the proposed metrics abstractions as<br>
>> the details are worked out.<br>
>><br>
>> * Are there more use cases that I should consider, beyond what is<br>
>> described in the wiki?<br>
><br>
><br>
> Honestly, unless you command an army of developers, I suspect there's<br>
> already too many use cases in there! (In particular, the state<br>
> tracking/editing as I said above.)<br>
><br>
> But if you want one more, something that makes a lot of sense in a frame<br>
> analysis tool for debugging is a pixel history --<br>
> <a href="https://github.com/apitrace/apitrace/issues/317" target="_blank">https://github.com/apitrace/apitrace/issues/317</a><br>
<br>
</span>Pixel history is a popular feature in Frame Analyzer. I think it would<br>
be fairly easy to implement with a frame retracer:<br>
<br>
* clear the framebuffer with an unusual color before each render<br></blockquote><div><br></div><div>Unfortunately that won't work with all blend modes (particularly those that take the destination alpha to modulate the source).</div><div><br></div><div>But the rest seems sensible.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
* compare all pixels after each render. Changed pixels go into a bloom<br>
filter for that render. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
* repeat with a second color, if you want to be completely accurate.<br>
<br>
* when the user requests "select all renders that affected this pixel",<br>
iterate over the bloom filters and check for membership.<br></blockquote><div><br></div><div>The only other difficulty is tracking FBO changes -- they can be quite common nowadays.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Pixel history is more valuable when you have an overdraw visualization<br>
of the frame buffer (brighter pixel => more GPU cost). Developers will<br>
want to analyze the history of the most expensive pixels. Overdraw can<br>
be built from the same data as pixel history.<br>
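</blockquote><div><br></div><div>To make the bookkeeping concrete, here is roughly what I picture for the per-render pixel sets and the overdraw counts -- an illustrative sketch only; a real implementation would also have to read back whichever FBO is bound, deal with MSAA, etc.:</div><div><br></div><div><pre>// Illustrative sketch: per-render pixel membership via a small Bloom
// filter, plus an overdraw counter built from the same comparison.
#include &lt;cstdint&gt;
#include &lt;vector&gt;
#include &lt;bitset&gt;

struct RenderPixelFilter {
    std::bitset&lt;(1 &lt;&lt; 20)&gt; bits;  // ~128 KiB per render

    static size_t slot(uint32_t x, uint32_t y, uint32_t salt) {
        uint64_t h = (uint64_t)x * 73856093u ^ (uint64_t)y * 19349663u ^ salt;
        return (size_t)(h % (1u &lt;&lt; 20));
    }
    void insert(uint32_t x, uint32_t y) {
        bits.set(slot(x, y, 0));
        bits.set(slot(x, y, 1));  // two hashes to reduce false positives
    }
    bool mayContain(uint32_t x, uint32_t y) const {
        return bits.test(slot(x, y, 0)) &amp;&amp; bits.test(slot(x, y, 1));
    }
};

// After replaying render N, diff its RGBA8 readback against the previous
// one.  'overdrawCount' is a width*height array sized by the caller.
void recordRender(const std::vector&lt;uint32_t&gt; &amp;before,
                  const std::vector&lt;uint32_t&gt; &amp;after,
                  uint32_t width, uint32_t height,
                  RenderPixelFilter &amp;filter,
                  std::vector&lt;uint32_t&gt; &amp;overdrawCount) {
    for (uint32_t y = 0; y &lt; height; ++y) {
        for (uint32_t x = 0; x &lt; width; ++x) {
            size_t i = (size_t)y * width + x;
            if (before[i] != after[i]) {
                filter.insert(x, y);   // membership for pixel history
                ++overdrawCount[i];    // same data feeds the overdraw view
            }
        }
    }
}

// "Select all renders that affected this pixel" is then just a loop over
// the per-render filters calling mayContain(x, y).</pre></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">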
<br>
This is not a feature that is more likely to benefit game developers than<br>
driver developers. I am focusing on the driver use cases<br>
first.<br></blockquote><div> </div><div><br></div><div>Right.</div><div><br></div><div>Jose</div></div></div></div>