<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 2, 2015 at 1:55 AM, Mark Janes <span dir="ltr"><<a href="mailto:mark.a.janes@intel.com" target="_blank">mark.a.janes@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div>José Fonseca <<a href="mailto:jose.r.fonseca@gmail.com" target="_blank">jose.r.fonseca@gmail.com</a>> writes:<br>
<br>
> On Fri, May 29, 2015 at 11:50 PM, Mark Janes <<a href="mailto:mark.a.janes@intel.com" target="_blank">mark.a.janes@intel.com</a>> wrote:<br></div></div><span>> 2) The concept of frame is only well-defined on single OpenGL contexts /<br>
> single-threaded traces. If you have multiple contexts/threads, frames are<br>
> ill-defined, and it's hard if not impossible to find a range of calls where<br>
> it's safe to loop.<br>
<br>
</span>Could this case be handled by allowing the user to select which context<br>
they want to debug?<br></blockquote><div><br></div><div>Potentially.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
The multi-context games that I've seen generally have a second context<br>
to overlay ads or other content which is not the core workload. The<br>
other example is ChromeOS, which IIRC renders each tab in a separate<br>
context. My hope is that the complex GL workloads on Chrome are<br>
benchmarks that can be easily captured/analyzed/optimized on a more<br>
accessible and typical platform.<br>
<br>
Is there another category of multi-context optimization that you think<br>
is important?<br></blockquote><div><br></div><div>I was thinking more in terms of debugging than optimization.</div><div><br></div><div>Another category is GPU emulators (virtualization, or console emulators, Android emulators, etc). But I admit that one is quite niche.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
> 4) There are many applications which don't have regular frames (e.g., OpenGL<br>
> accelerated UI toolkits, web browsers using GL for composition): they only<br>
> render new frames in response to user events, so every frame might end up<br>
> being different.<br>
<br>
</span>I'm not sure how helpful a frame analysis tool would be for these<br>
cases. A system trace that includes both UI inputs and the resulting GPU<br>
events would be more helpful in identifying latency.<br></blockquote><div><br></div><div>If the frame analysis tool's only concern is performance (as GPA Frame Analyzer's seems to be), then the answer would be: no, not useful. </div><div><br></div><div>But if the tool is for frame debugging and optimization, then yes.</div><div><br></div><div>And in your initial email you did describe this as a "frame debug/optimization tool" -- that's what I've been assuming in my replies.</div><div><br></div><div>Even if performance is the major/only use case for this tool, this idea seems a good way to speed up state dump lookups, which seems particularly useful for debugging. So even if this tool has an independent life as an optimization tool, I think there's merit in getting the core concept of this into qapitrace so we can speed up state dump lookups when possible. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
> 5) Even "regular" applications might have "irregular" frames -- e.g.,<br>
> maybe the player of a first-person shooter entered a new area, and new<br>
> objects were created, old ones deleted -- whereby replaying that frame in<br>
> a loop will lead to corrupted/invalid state.<br>
<br>
</span>I was trying to think about these cases as well.<br>
<br>
* new objects: Retracing this frame would result in apitrace constantly<br>
re-creating these objects, correct? This would constitute a resource<br>
leak during retrace, but the frame would still render correctly. It<br>
seems feasible to track resource generation during retrace.<br>
<br>
* deletion of old resources: This would constitute a double-deletion on<br>
retrace, which in many cases would be ignored or generate a GL error.<br>
It would be curious for an app to use and then delete a resource in a<br>
single frame. Retrace could skip deletions in the target frame if it<br>
is a problem.<br>
<br>
I think it is more common for apps to create all the resources in<br>
advance, then bind them as needed directly before rendering.<br></blockquote><div><br></div><div>Yes, I think so too.</div><div><br></div><div>Also note that, if you omit these create/destroy calls when replaying, you'll be significantly skewing the performance. </div><div><br></div><div>I think that, rather than trying to make it work there, it might be better to say, </div><div><br></div>
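<div>That said, if one did want to track resource generation during a frame loop, I imagine something roughly along these lines -- purely an illustrative sketch, not actual apitrace code, and the names are made up:</div><div><br></div><div><pre>// Illustrative sketch only -- not apitrace code.
// Tracks GL objects created while looping over a single frame, so each
// iteration can clean up after itself and skip double-deletions.
#include &lt;GL/gl.h&gt;
#include &lt;set&gt;

class FrameLoopTracker {
    std::set&lt;GLuint&gt; createdThisIteration;
public:
    // Called wherever the retracer replays a glGenTextures/glGenBuffers/...
    void onGen(GLsizei n, const GLuint *ids) {
        for (GLsizei i = 0; i &lt; n; ++i)
            createdThisIteration.insert(ids[i]);
    }

    // Called wherever a glDelete* is replayed inside the looped frame.
    // Returns false if the deletion should be skipped (the object was
    // created before the loop started, so deleting it would break the
    // next iteration).
    bool onDelete(GLuint id) {
        return createdThisIteration.erase(id) != 0;
    }

    // Called at the end of each loop iteration, so the objects the frame
    // itself created don't leak across iterations.
    void endIteration() {
        for (GLuint id : createdThisIteration)
            glDeleteTextures(1, &amp;id);  // analogous calls for buffers, FBOs, ...
        createdThisIteration.clear();
    }
};</pre></div><div><br></div><div>But that, of course, only addresses correctness, not the performance skew mentioned above.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">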
<span><br>
> In short, the way I see it, this "frame retracing" idea can indeed speed<br>
> up lookups for apitrace users in many circumstances, but it should be<br>
> something that users can opt-in/out, so that they can still get work done,<br>
> when for one reason or another the assumptions made there just don't hold.<br>
><br>
> Of course, if your goal is just profiling (and not debugging bad<br>
> rendering), then it should be always possible to find frames that meet the<br>
> assumptions.<br>
><br>
><br>
>><br>
>> * Should a tool like this be built within Apitrace, or should<br>
>> it have its own repo and link against Apitrace? Because it<br>
>> communicates through a socket, my POC picks up protocol buffers as<br>
>> a dependency. There is a bunch of threading/socket/rpc<br>
>> infrastructure that won't apply elsewhere in Apitrace.<br>
>><br>
><br>
> I see overlap between what's been proposed, and what apitrace already<br>
> does or should do one day. In particular:<br>
><br>
> - no need for a separate frame retrace daemon executable -- this can easily<br>
> be achieved with the existing retrace executables, by adding a new option to<br>
> "keep the process alive", which would keep the retrace looping over the<br>
> most recent frame, until it gets a request to dump an earlier call<br>
><br>
> - sockets are not strictly necessary either -- one could use stdin<br>
> (just like we use stdout for output) (of course, we could one day replace<br>
> stdin/out with sockets to cleanly allow retrace on a separate machine, but<br>
> it's orthogonal to what's being proposed)<br>
<br>
</span>I chose sockets to enable the remote machine use case. Performance<br>
analysis is the motivating capability for my team. It is especially<br>
needed on under-powered devices that would struggle running an apitrace<br>
UI. Tablets and other form factors cannot be analyzed without a remote<br>
connection.<br></blockquote><div><br></div><div>FYI, <a href="https://github.com/apitrace/apitrace/pull/311" target="_blank">https://github.com/apitrace/apitrace/pull/311</a> already added something like that for Android. But it would indeed be nice to generalize.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>> - I don't think there's a place for another gui tool -- something that resembles<br>
> qapitrace but doesn't completely replace it -- in the apitrace tree. For this<br>
> to be merged, the frame retrace UI would have to be fully integrated with<br>
> qapitrace, not something on the side.<br>
<br>
</span>I understand your concerns. I have some doubts about the use of Qt4<br>
widgets for qapitrace, and was planning to build features in QML. I<br>
would like to understand your thoughts on extending qapitrace as Qt<br>
moves further away from widgets.<br></blockquote><div><br></div><div>I actually have some concerns with QML precisely due to the use of OpenGL.</div><div><br></div><div>But if everybody agrees QML is the future, I have no problem migrating the GUI to it. (In short, I'm flexible on anything except two completely disjoint GUIs.)</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I also think the JSON interface between qapitrace and glretrace may be<br>
inadequate for a more complex tool. Hand-rolling a new parser to<br>
exchange a new structured data type is tedious and buggy in my<br>
experience.<br></blockquote><div><br></div><div>Actually that has been done already: JSON has been replaced with UBJSON on master for a few weeks now. It's even possible to choose the dump format (JSON vs UBJSON) as a glretrace command line option, and it wouldn't be difficult to add another.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>> In fact everything that works in a "frame retrace" should work with<br>
> "full trace" mode too. The frame vs full should be a simple switch<br>
> somewhere.<br>
><br>
> - Editing live state: (e.g., where you say "the user will be able to edit<br>
> the bound shaders", etc.), but I don't see exactly how one would achieve<br>
> that. Currently qapitrace doesn't allow changing state directly, but<br>
> rather allows editing the calls that set the state.<br>
<br>
</span>My plan was to create an API for setting state, which would result in<br>
the insertion of new GL calls directly before the target render.<br>
<br>
The retrace API for setting new shaders would be to compile and link them<br>
up front, and pass back error state if any. On retrace, the new program<br>
id would be inserted with glUseProgram directly before the render.<br>
<br>
My contention is that redundant/overlapping state and binding commands<br>
have little or no performance impact.<br>
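</blockquote><div><br></div><div>OK. Just to make sure we mean the same thing, here is a minimal sketch of how I read that plan -- illustrative only, using GLEW for brevity rather than the retracer's own GL dispatch, and with error handling trimmed:</div><div><br></div><div><pre>// Illustrative sketch: compile/link a replacement program up front,
// report any compiler/linker errors back to the UI, and bind it just
// before the target draw call when the frame is replayed.
#include &lt;GL/glew.h&gt;
#include &lt;string&gt;

// Returns 0 on failure and fills 'log' with the info log.
GLuint buildReplacementProgram(const char *vsSrc, const char *fsSrc,
                               std::string &amp;log) {
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &amp;vsSrc, NULL);
    glCompileShader(vs);

    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &amp;fsSrc, NULL);
    glCompileShader(fs);

    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);

    GLint ok = GL_FALSE;
    glGetProgramiv(prog, GL_LINK_STATUS, &amp;ok);
    if (!ok) {
        char buf[4096];
        glGetProgramInfoLog(prog, sizeof buf, NULL, buf);
        log = buf;
        glDeleteProgram(prog);
        prog = 0;
    }
    glDeleteShader(vs);  // flagged for deletion; kept alive while attached
    glDeleteShader(fs);
    return prog;
}

// On replay, just before the selected render:
//     glUseProgram(replacementProg);
//     ... replay the draw call as recorded ...</pre></div><div><br></div><div>If that's roughly the idea, it seems workable to me.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">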
<span><br>
> - State tracking inside glretrace: state tracking might seem easy at first<br>
> but it's really hard to get right in the general case (and should be<br>
> avoided as much as possible since a state tracker doesn't know when the<br>
> application emits errors, and can easily diverge). So, IMO, there should<br>
> be only one implementation of state tracking in apitrace, and that should<br>
> be in trimming (plus double-duty of x-referencing the trace later on).<br>
> Adding a bit of state tracking here, a bit of state tracking there, is a<br>
> recipe for introducing bugs and duplicate code in a lot of places.<br>
<br>
</span>I agree with you completely. I've written a gles1/2/3 state tracker,<br>
and it's not a small task. I hacked some shader tracking into my<br>
prototype because it was a quick way to connect compile time shader<br>
assemblies with bound shaders.<br>
<br>
OTOH, "query all bound state" can take a while at run time, and mixing<br>
it with requests for metrics and render targets may result in latency<br>
that makes the application unusable.<br>
<span><br>
> In short, to have this in the apitrace tree would involve a compromise -- you'd<br>
> compromise some of your goals and flexibility, and in exchange have more<br>
> reuse with the rest of the apitrace tree, hence less code to write and maintain<br>
> by yourself. But it's really up to you.<br>
><br>
><br>
> My goal here is to ensure the scope of apitrace stays within manageable<br>
> limits, so that the things it can do, it can do well. I'd rather have<br>
> slow yet reliable results than very quick but "YMMV"-like results. I<br>
> also can't accept things that would make the existing functionality too<br>
> complex, or a maintenance headache.<br>
<br>
</span>Keeping the mechanism reliable has been a great strategy for apitrace.<br>
I have a great deal of respect for the work you've done, and would be<br>
very happy if I could produce something that was useful enough to<br>
include in apitrace. Also, collaboration with folks wanting to target<br>
other GPUs would be much more likely as part of a widely-used project<br>
like apitrace.<br></blockquote><div><br></div><div>Yes.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span><br>
>> * I would like to make use of the metrics work that is being done<br>
>> this summer. I'm eager to see the proposed metrics abstractions as<br>
>> the details are worked out.<br>
>><br>
>> * Are there more use cases that I should consider, beyond what is<br>
>> described in the wiki?<br>
><br>
><br>
> Honestly, unless you command an army of developers, I suspect there's<br>
> already too many use cases in there! (In particular, the state<br>
> tracking/editing as I said above.)<br>
><br>
> But if you want one more, something that makes a lot of sense in a frame<br>
> analysis tool for debugging is a pixel history --<br>
> <a href="https://github.com/apitrace/apitrace/issues/317" target="_blank">https://github.com/apitrace/apitrace/issues/317</a><br>
<br>
</span>Pixel history is a popular feature in Frame Analyzer. I think it would<br>
be fairly easy to implement with a frame retracer:<br>
<br>
* clear the framebuffer with an unusual color before each render<br></blockquote><div><br></div><div>Unfortunately that won't work with all blend modes (particularly those that take the destination alpha to modulate the source).</div><div><br></div><div>But the rest seems sensible.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
* compare all pixels after each render. Changed pixels go into a bloom<br>
filter for that render. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
* repeat with a second color, if you want to be completely accurate.<br>
<br>
* when the user requests "select all renders that affected this pixel",<br>
iterate over the bloom filters and check for membership.<br></blockquote><div><br></div><div>The only other difficulty is tracking FBO changes -- they can be quite common nowadays.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Pixel history is more valuable when you have an overdraw visualization<br>
of the frame buffer (brighter pixel => more GPU cost). Developers will<br>
want to analyze the history of the most expensive pixels. Overdraw can<br>
be built from the same data as pixel history.<br>
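</blockquote><div><br></div><div>To make the bookkeeping concrete, here is roughly what I picture for the per-render pixel sets and the overdraw counts -- an illustrative sketch only; a real implementation would also have to read back whichever FBO is bound, deal with MSAA, etc.:</div><div><br></div><div><pre>// Illustrative sketch: per-render pixel membership via a small Bloom
// filter, plus an overdraw counter built from the same comparison.
#include &lt;cstdint&gt;
#include &lt;vector&gt;
#include &lt;bitset&gt;

struct RenderPixelFilter {
    std::bitset&lt;(1 &lt;&lt; 20)&gt; bits;  // ~128 KiB per render

    static size_t slot(uint32_t x, uint32_t y, uint32_t salt) {
        uint64_t h = (uint64_t)x * 73856093u ^ (uint64_t)y * 19349663u ^ salt;
        return (size_t)(h % (1u &lt;&lt; 20));
    }
    void insert(uint32_t x, uint32_t y) {
        bits.set(slot(x, y, 0));
        bits.set(slot(x, y, 1));  // two hashes to reduce false positives
    }
    bool mayContain(uint32_t x, uint32_t y) const {
        return bits.test(slot(x, y, 0)) &amp;&amp; bits.test(slot(x, y, 1));
    }
};

// After replaying render N, diff its RGBA8 readback against the previous
// one.  'overdrawCount' is a width*height array sized by the caller.
void recordRender(const std::vector&lt;uint32_t&gt; &amp;before,
                  const std::vector&lt;uint32_t&gt; &amp;after,
                  uint32_t width, uint32_t height,
                  RenderPixelFilter &amp;filter,
                  std::vector&lt;uint32_t&gt; &amp;overdrawCount) {
    for (uint32_t y = 0; y &lt; height; ++y) {
        for (uint32_t x = 0; x &lt; width; ++x) {
            size_t i = (size_t)y * width + x;
            if (before[i] != after[i]) {
                filter.insert(x, y);   // membership for pixel history
                ++overdrawCount[i];    // same data feeds the overdraw view
            }
        }
    }
}

// "Select all renders that affected this pixel" is then just a loop over
// the per-render filters calling mayContain(x, y).</pre></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">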
<br>
This is not a feature that is more likely to benefit game developers than<br>
driver developers. I am focusing on the driver use cases<br>
first.<br></blockquote><div> </div><div><br></div><div>Right.</div><div><br></div><div>Jose</div></div></div></div>