<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 12/28/2017 10:43 PM, Lionel
      Landwerlin wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:ed173123-6d6b-9231-bbbc-4d5094c42c57@intel.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">On 26/12/17 05:32, Sagar Arun Kamble
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
        <p><br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 12/22/2017 3:46 PM, Lionel
          Landwerlin wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com">
          <div class="moz-cite-prefix">On 22/12/17 09:30, Sagar Arun
            Kamble wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
            <p><br>
            </p>
            <br>
            <div class="moz-cite-prefix">On 12/21/2017 6:29 PM, Lionel
              Landwerlin wrote:<br>
            </div>
            <blockquote type="cite"
              cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
              <div class="moz-cite-prefix">Some more findings I made
                while playing with this series and GPUTop.<br>
                It turns out the 2ms-per-second drift is due to
                timecounter. Adding the delta this way:<br>
                <br>
                <a class="moz-txt-link-freetext"
href="https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607"
                  moz-do-not-send="true">https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607</a><br>
                <br>
                eliminates the drift.</div>
            </blockquote>
            I see two important changes: 1. an approximation of the
            start time during init_timecounter; 2. overflow handling in
            the delta accumulation.<br>
            With these incorporated, I guess timecounter should also
            work the same way.<br>
          </blockquote>
          <br>
          I think the arithmetic in timecounter is inherently lossy and
          that's why we're seeing a drift.</blockquote>
        Could you share details about the platform and the scenario in
        which the 2ms-per-second drift is seen with timecounter?<br>
        I did not observe this on SKL.<br>
      </blockquote>
      <br>
      The 2ms drift was on SKL GT4.<br>
      <br>
    </blockquote>
    I have checked the timecounter arithmetic. Its accuracy is very high
    (the error is below 10^-6 ns per tick).<br>
    I had interpreted the maxsec parameter of clocks_calc_mult_shift(),
    which computes mult/shift, as the total time covered by the
    counter,<br>
    but it actually trades conversion range against conversion
    accuracy. Since we want the best possible accuracy, passing zero
    should be preferred there.<br>
    For instance, below are the mult/shift values for SKL GT2 (12 MHz)
    and the time each pair reports for 10 minutes of ticks.<br>
    As you can see, the drift due to the calculation is at most about
    2us over 10 minutes. We should check on SKL GT4 by passing zero to
    clocks_calc_mult_shift and adding the new delta handling to
    timecounter. 2ms is a huge drift and it is very unlikely to be
    related to these calculations.<br>
    <br>
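    <table border="1" cellpadding="3" cellspacing="0">
      <tr>
        <th>maxsec</th>
        <th>mult</th>
        <th>shift</th>
        <th>tick time = mult / 2^shift (ns)</th>
        <th>total time = 10*60*12000000 * tick time (ns)</th>
        <th>drift due to calculation</th>
      </tr>
      <tr>
        <td>0</td><td>2796202667</td><td>25</td>
        <td>83.33333334326</td><td>600,000,000,071.525</td><td>+71 ns</td>
      </tr>
      <tr>
        <td>3000</td><td>174762667</td><td>21</td>
        <td>83.33333349227</td><td>600,000,001,144.409</td><td>+1144 ns</td>
      </tr>
      <tr>
        <td>6000</td><td>87381333</td><td>20</td>
        <td>83.33333301544</td><td>599,999,997,711.181</td><td>-2289 ns</td>
      </tr>
    </table>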
    <blockquote type="cite"
      cite="mid:ed173123-6d6b-9231-bbbc-4d5094c42c57@intel.com"> With
      the patch above, I'm seeing only a ~40us drift over ~7 seconds of
      recording both perf tracepoints and i915 perf reports.<br>
      I'm tracking the kernel tracepoints for gem request additions and
      the i915 perf reports.<br>
      Here is a screenshot at the beginning of the 7s recording: <a
        class="moz-txt-link-freetext"
        href="https://i.imgur.com/hnexgjQ.png" moz-do-not-send="true">https://i.imgur.com/hnexgjQ.png</a>
      (you can see the gem request being added before the work starts in
      the i915 perf reports).<br>
      At the end of the recording, the gem requests appear later than
      the work in the i915 perf report: <a
        class="moz-txt-link-freetext"
        href="https://i.imgur.com/oCd0C9T.png" moz-do-not-send="true">https://i.imgur.com/oCd0C9T.png</a><br>
      <br>
    </blockquote>
    Looks like we need an error margin of only a few microseconds :)
    <br>
    <blockquote type="cite"
      cite="mid:ed173123-6d6b-9231-bbbc-4d5094c42c57@intel.com"> I'll
      try to prepare some IGT tests that show the drift using perf &
      i915 perf, so we can run those on different platforms.<br>
      I tend to mostly test on a SKL GT4 & KBL GT2, but BXT
      definitely needs more attention...<br>
      <br>
      <blockquote type="cite"
        cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
        <blockquote type="cite"
          cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com">
          Could we be using it wrong?<br>
          <br>
        </blockquote>
        If we apply the two changes highlighted above to timecounter, we
        may get the same results as your current implementation.<br>
        <blockquote type="cite"
          cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com"> In
          the patch above, I think there is still a drift because of the
          potential loss of the fractional part at every delta we add.<br>
          But it should only be a fraction of a nanosecond multiplied by
          the number of reports over a period of time.<br>
          With a report every 1us, that is at most 10^6 reports per
          second, each losing less than 1ns, so still less than 1ms of
          drift over 1s.<br>
          <br>
        </blockquote>
        The timecounter interface takes care of fractional parts, so that
        should help us.<br>
        We can either go with timecounter or our own implementation,
        provided the conversions are precise.<br>
      </blockquote>
      <br>
      Looking at clocks_calc_mult_shift(), it seems clear to me that
      there is less precision when using timecounter:<br>
      <br>
       /*<br>
        * Find the conversion shift/mult pair which has the best<br>
        * accuracy and fits the maxsec conversion range:<br>
        */<br>
      <br>
    </blockquote>
    We can improve upon this by passing zero as maxsec to
    clocks_calc_mult_shift.<br>
    <blockquote type="cite"
      cite="mid:ed173123-6d6b-9231-bbbc-4d5094c42c57@intel.com"> On the
      other hand, there is a performance penalty for doing a div64 for
      every report.<br>
      <br>
      <blockquote type="cite"
        cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
        <blockquote type="cite"
          cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com"> We
          can probably do better by always computing the clock using the
          entire delta rather than the accumulated delta.<br>
          <br>
        </blockquote>
        The issue is that the clock cycles reported in the OA report are
        the 32 LSBs of the GPU timestamp, whereas the counter is 36 bits.
        Hence we will need to accumulate the delta. Of course, this
        assumes that two reports can't be more than 0xffffffff ticks
        apart.<br>
      </blockquote>
      <br>
      You're right :)<br>
      I thought maybe we could do this:<br>
      <br>
      Look at the opening period parameter; if it's greater than the
      period at which timestamps wrap, make sure we schedule some work
      on the kernel context to generate a context switch report (at
      least once every 6 minutes on gen9).<br>
      <br>
    </blockquote>
    Looks fine to me.<br>
    <blockquote type="cite"
      cite="mid:ed173123-6d6b-9231-bbbc-4d5094c42c57@intel.com">
      <blockquote type="cite"
        cite="mid:04eca028-3705-5a28-b500-089ca19e712c@intel.com">
        <blockquote type="cite"
          cite="mid:30446e83-3d0f-ae39-5c0d-a23bc4c89557@intel.com">
          <blockquote type="cite"
            cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
            <blockquote type="cite"
              cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
              <div class="moz-cite-prefix"> Timelines of perf i915
                tracepoints & OA reports now make a lot more sense.<br>
                <br>
                There is still the issue that reading the CPU clock
                and the RCS timestamp is inherently not atomic, so
                there is a delta there.<br>
                I think we should add a new i915 perf record type to
                express the delta that we measure this way:<br>
                <br>
                <a class="moz-txt-link-freetext"
href="https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475"
                  moz-do-not-send="true">https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475</a><br>
                <br>
                This would let userspace know there might be a global
                offset between the two times and present it.<br>
              </div>
            </blockquote>
            Agreed. The delta ns1 - ns0 can be interpreted as the
            maximum drift.<br>
            <blockquote type="cite"
              cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
              <div class="moz-cite-prefix"> Measurements on my KBL system
                were on the order of a few tens of microseconds (~30us).<br>
                I guess we might be able to set up the correlation point
                better (masking interrupts?) to reduce the delta.<br>
              </div>
            </blockquote>
            We are already using spin_lock. Do you mean NMI?<br>
          </blockquote>
          <br>
          I don't actually know much about this.<br>
          If spin_lock is the best we can do, then that's it :)<br>
          <br>
          <blockquote type="cite"
            cite="mid:523e5349-2db7-747a-bea0-774227913592@intel.com">
            <blockquote type="cite"
              cite="mid:62813062-eba1-0fa2-1959-6abf19e3dcae@intel.com">
              <div class="moz-cite-prefix"> <br>
                Thanks,<br>
                <br>
                -<br>
                Lionel<br>
                <br>
                <br>
                On 07/12/17 00:57, Robert Bragg wrote:<br>
              </div>
              <blockquote type="cite"
cite="mid:CAMou1-2Z7=A_GBcD9a5AvjRGM3_bG-ezoZJnGYvXkrCqqrmT1w@mail.gmail.com">
                <div dir="ltr"><br>
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote">On Thu, Dec 7, 2017 at
                      12:48 AM, Robert Bragg <span dir="ltr">&lt;<a
                          href="mailto:robert@sixbynine.org"
                          target="_blank" moz-do-not-send="true">robert@sixbynine.org</a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <div> at least from what I wrote back then
                                it looks like I was seeing a drift of a
                                few milliseconds per second on SKL. I
                                vaguely recall it being much worse given
                                the frequency constants we had for
                                Haswell.<br>
                              </div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                      <div>Sorry I didn't actually re-read my own
                        message properly before referencing it :)
                        Apparently the 2ms per second drift was for
                        Haswell, so presumably not quite so bad for SKL.
                        <br>
                      </div>
                      <div><br>
                      </div>
                      <div>- Robert<br>
                      </div>
                    </div>
                    <br>
                  </div>
                </div>
                <br>
                <br>
                <pre wrap="">_______________________________________________
Intel-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Intel-gfx@lists.freedesktop.org" moz-do-not-send="true">Intel-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/intel-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/intel-gfx</a>
</pre>
              </blockquote>
              <p><br>
              </p>
            </blockquote>
            <br>
          </blockquote>
          <p><br>
          </p>
        </blockquote>
        <br>
      </blockquote>
      <p><br>
      </p>
    </blockquote>
    <br>
  </body>
</html>