<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 26/12/17 05:32, Sagar Arun Kamble
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <p><br>
      </p>
      <br>
      <div class="moz-cite-prefix">On 12/22/2017 3:46 PM, Lionel
        Landwerlin wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com">
        <div class="moz-cite-prefix">On 22/12/17 09:30, Sagar Arun
          Kamble wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
          <p><br>
          </p>
          <br>
          <div class="moz-cite-prefix">On 12/21/2017 6:29 PM, Lionel
            Landwerlin wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
            <div class="moz-cite-prefix">Some more findings I made while
              playing with this series & GPUTop.<br>
              Turns out the 2ms drift per second is due to timecounter.
              Adding the delta this way :<br>
              <br>
              <a class="moz-txt-link-freetext"
href="https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607"
                moz-do-not-send="true">https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607</a><br>
              <br>
              Eliminates the drift.</div>
          </blockquote>
          I see two important changes: 1. approximation of the start time
          during init_timecounter, and 2. overflow handling in the delta
          accumulation.<br>
          With these incorporated, I guess timecounter should also work
          in the same fashion.<br>
        </blockquote>
        <br>
        I think the arithmetic in timecounter is inherently lossy and
        that's why we're seeing a drift.</blockquote>
      Could you share details about the platform and scenario in which
      the 2ms drift per second is being seen with timecounter?<br>
      I did not observe this on SKL.<br>
    </blockquote>
    <br>
    The 2ms drift was on SKL GT4.<br>
    <br>
    With the patch above, I'm seeing only a ~40us drift over ~7 seconds
    of recording both perf tracepoints & i915 perf reports.<br>
    I'm tracking the kernel tracepoints emitted when gem requests are
    added, and the i915 perf reports.<br>
    Here is a screenshot at the beginning of the 7s recording :
    <a class="moz-txt-link-freetext" href="https://i.imgur.com/hnexgjQ.png">https://i.imgur.com/hnexgjQ.png</a> (you can see the gem request being added
    before the work starts in the i915 perf reports).<br>
    At the end of the recording, the gem requests appear later than the
    work in the i915 perf report : <a class="moz-txt-link-freetext" href="https://i.imgur.com/oCd0C9T.png">https://i.imgur.com/oCd0C9T.png</a><br>
    <br>
    I'll try to prepare some IGT tests that show the drift using perf
    & i915 perf, so we can run those on different platforms.<br>
    I tend to mostly test on a SKL GT4 & KBL GT2, but BXT definitely
    needs more attention...<br>
    <br>
    <blockquote type="cite"
      cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
      <blockquote type="cite"
        cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com"> Could
        we be using it wrong?<br>
        <br>
      </blockquote>
      If we use the two changes highlighted above with timecounter, maybe
      we will get the same results as your current implementation.<br>
      <blockquote type="cite"
        cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com"> In
        the patch above, I think there is still a drift because of the
        potential fractional part loss at every delta we add.<br>
        But it should only be a fraction of a nanosecond multiplied by
        the number of reports over a period of time.<br>
        With a report every 1us, that should still be much less than
        1ms of drift over 1s.<br>
        <br>
      </blockquote>
      The timecounter interface takes care of the fractional parts, so
      that should help us.<br>
      We can either go with timecounter or our own implementation,
      provided the conversions are precise.<br>
    </blockquote>
    <br>
    Looking at clocks_calc_mult_shift(), it seems clear to me that there
    is less precision when using timecounter :<br>
    <br>
     /*<br>
      * Find the conversion shift/mult pair which has the best<br>
      * accuracy and fits the maxsec conversion range:<br>
      */<br>
    <br>
    On the other hand, there is a performance penalty for doing a div64
    for every report.<br>
    <br>
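    To make the trade-off concrete, here is a minimal sketch of the two
    conversion strategies (not the actual i915 code; gpu_ts_frequency is
    just a placeholder for the real timestamp frequency in Hz) :<br>
    <br>
    <pre>#include &lt;linux/clocksource.h&gt;
#include &lt;linux/math64.h&gt;
#include &lt;linux/time.h&gt;

static u32 ts_mult, ts_shift;

static void example_init(u32 gpu_ts_frequency)
{
	/* mult/shift pair: no division per report, but only as accurate
	 * as the chosen shift allows over the maxsec range. */
	clocks_calc_mult_shift(&amp;ts_mult, &amp;ts_shift,
			       gpu_ts_frequency, NSEC_PER_SEC, 3600);
}

static u64 delta_to_ns_mult_shift(u64 delta_ticks)
{
	/* The fractional bits below ts_shift are dropped on every call,
	 * which is where the accumulated drift can come from. */
	return (delta_ticks * ts_mult) >> ts_shift;
}

static u64 delta_to_ns_div64(u64 delta_ticks, u32 gpu_ts_frequency)
{
	/* Exact to the nanosecond (as long as delta_ticks * NSEC_PER_SEC
	 * fits in 64 bits), but costs a 64-bit division per report. */
	return div_u64(delta_ticks * NSEC_PER_SEC, gpu_ts_frequency);
}
</pre>
    <br>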
    <blockquote type="cite"
      cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
      <blockquote type="cite"
        cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com"> We
        can probably do better by always computing the clock using the
        entire delta rather than the accumulated delta.<br>
        <br>
      </blockquote>
      The issue is that the reported clock cycles in the OA report are the
      32-bit LSBs of the GPU timestamp, whereas the counter is 36 bits.
      Hence we will need to<br>
      accumulate the delta. Of course, there is the assumption that two
      reports can't be spaced more than 0xffffffff counts apart.<br>
    </blockquote>
    <br>
    You're right :)<br>
    I thought maybe we could do this : <br>
    <br>
    Look at the opening period parameter; if it's greater than the
    period of the timestamps wrapping, make sure we schedule some work on
    the kernel context to generate a context switch report (at least
    once every ~6 minutes on gen9).<br>
    <br>
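    For reference, this is roughly what I mean by accumulating the delta
    out of the 32-bit LSBs (just a sketch, not the actual code; the names
    are made up) :<br>
    <br>
    <pre>#include &lt;linux/kernel.h&gt;

/* Only valid as long as two consecutive reports are never more than
 * 0xffffffff timestamp ticks apart. */
static u64 accumulate_oa_timestamp(u64 *last_full_ts, u32 report_ts_lsb)
{
	/* Unsigned 32-bit subtraction handles the wrap of the reported
	 * LSBs. */
	u32 delta = report_ts_lsb - lower_32_bits(*last_full_ts);

	*last_full_ts += delta;
	return *last_full_ts;
}
</pre>
    At a 12MHz timestamp frequency, 2^32 ticks is ~358 seconds, which is
    where the ~6 minutes figure on gen9 comes from.<br>
    <br>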
    <blockquote type="cite"
      cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
      <blockquote type="cite"
        cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com">
        <blockquote type="cite"
          cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
          <blockquote type="cite"
            cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
            <div class="moz-cite-prefix"> Timelines of perf i915
              tracepoints & OA reports now make a lot more sense.<br>
              <br>
              There is still the issue that reading the CPU clock &
              the RCS timestamp is inherently not atomic. So there is a
              delta there.<br>
              I think we should add a new i915 perf record type to
              express the delta that we measure this way :<br>
              <br>
              <a class="moz-txt-link-freetext"
href="https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475"
                moz-do-not-send="true">https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475</a><br>
              <br>
              So that userspace knows there might be a global offset
              between the 2 times and is able to present it.<br>
            </div>
          </blockquote>
          Agreed on this. The delta ns1-ns0 can be interpreted as the
          maximum drift.<br>
          <blockquote type="cite"
            cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
            <div class="moz-cite-prefix"> Measurement on my KBL system
              were in the order of a few microseconds (~30us).<br>
              I guess we might be able to setup the correlation point
              better (masking interruption?) to reduce the delta.<br>
            </div>
          </blockquote>
          We are already using spin_lock. Do you mean NMI?<br>
        </blockquote>
        <br>
        I don't actually know much about this point.<br>
        If spin_lock is the best we can do, then that's it :)<br>
        <br>
        <blockquote type="cite"
          cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
          <blockquote type="cite"
            cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
            <div class="moz-cite-prefix"> <br>
              Thanks,<br>
              <br>
              -<br>
              Lionel<br>
              <br>
              <br>
              On 07/12/17 00:57, Robert Bragg wrote:<br>
            </div>
            <blockquote type="cite"
cite="mid:CAMou1-2Z7=A_GBcD9a5AvjRGM3_bG-ezoZJnGYvXkrCqqrmT1w@mail.gmail.com">
              <div dir="ltr"><br>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">On Thu, Dec 7, 2017 at 12:48
                    AM, Robert Bragg <span dir="ltr">&lt;<a
                        href="mailto:robert@sixbynine.org"
                        target="_blank" moz-do-not-send="true">robert@sixbynine.org</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div dir="ltr"><br>
                      </div>
                    </blockquote>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> at least from what I wrote back then
                              it looks like I was seeing a drift of a
                              few milliseconds per second on SKL. I
                              vaguely recall it being much worse given
                              the frequency constants we had for
                              Haswell.<br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>Sorry I didn't actually re-read my own message
                      properly before referencing it :) Apparently the
                      2ms per second drift was for Haswell, so
                      presumably not quite so bad for SKL. <br>
                    </div>
                    <div><br>
                    </div>
                    <div>- Robert<br>
                    </div>
                  </div>
                  <br>
                </div>
              </div>
              <br>
              <fieldset class="mimeAttachmentHeader"></fieldset>
              <br>
              <pre wrap="">_______________________________________________
Intel-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Intel-gfx@lists.freedesktop.org" moz-do-not-send="true">Intel-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/intel-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/intel-gfx</a>
</pre>
            </blockquote>
            <p><br>
            </p>
          </blockquote>
          <br>
        </blockquote>
        <p><br>
        </p>
      </blockquote>
      <br>
    </blockquote>
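    <br>
    Going back to the correlation point discussion above, this is roughly
    the kind of capture I have in mind to keep the ns1-ns0 window as small
    as possible (just a sketch; read_rcs_timestamp() is a placeholder for
    the actual register read) :<br>
    <br>
    <pre>#include &lt;linux/irqflags.h&gt;
#include &lt;linux/timekeeping.h&gt;

static u64 read_rcs_timestamp(void); /* placeholder for the MMIO read */

struct clock_correlation {
	u64 cpu_ns_before;	/* ns0 */
	u64 gpu_ts;		/* raw RCS timestamp ticks */
	u64 cpu_ns_after;	/* ns1: ns1 - ns0 bounds the max drift */
};

static void capture_correlation(struct clock_correlation *c)
{
	unsigned long flags;

	/* Keep interrupts off so nothing preempts us between the two CPU
	 * clock reads (spin_lock_irqsave() in the real driver). */
	local_irq_save(flags);
	c->cpu_ns_before = ktime_get_raw_ns();
	c->gpu_ts = read_rcs_timestamp();
	c->cpu_ns_after = ktime_get_raw_ns();
	local_irq_restore(flags);
}
</pre>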
    <p><br>
    </p>
  </body>
</html>