<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Nice experiment; this is exactly what the SW scheduler can provide.<br>
    And as you said "<span style="font-family:monospace,monospace">I.e.
      your context can be scheduled into the<br>
      HW queue ahead of any other context, but everything already
      committed<br>
      to the HW queue is executed in strict FIFO order.</span>"<br>
    <br>
    If you want to keep latency <span style="font-family:monospace,monospace">consistent</span>,
    you will need to enable the hw priority queue feature.<br>
    <br>
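To illustrate the contrast with the SW scheduler, here is a toy model of what a hw priority queue buys you (illustrative only; the class and method names are made up, and real hardware arbitration and preemption granularity are far more involved): with a per-priority HW ring, high-priority work can pass work that was already committed at normal priority, which a single FIFO ring cannot do.

```python
from collections import deque

HIGH, NORMAL = 0, 1  # lower value = higher priority

class HwPriorityQueues:
    """Toy model of per-priority HW queues: the hardware always drains
    the highest-priority non-empty ring first, so high-priority work can
    pass work already committed at normal priority."""
    def __init__(self):
        self.rings = {HIGH: deque(), NORMAL: deque()}

    def commit(self, priority, job):
        self.rings[priority].append(job)

    def run_next(self):
        # Unlike a single FIFO ring, priority is honored even after commit.
        for prio in (HIGH, NORMAL):
            if self.rings[prio]:
                return self.rings[prio].popleft()
        return None
```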
    Regards,<br>
    David Zhou<br>
    <br>
    <div class="moz-cite-prefix">On 2016-12-24 06:20, Andres Rodriguez
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAFQ_0eHg=Kf5qV50cgm51m6bTcMYdkgRXkT-sykJnYNzu3Zzsg@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div>
          <div><span style="font-family:monospace,monospace">Hey John,<br>
              <br>
            </span></div>
          <span style="font-family:monospace,monospace">I've collected a
            bit of data using high priority SW scheduler queues,<br>
          </span></div>
        <div><span style="font-family:monospace,monospace">thought you
            might be interested.<br>
          </span></div>
        <div><span style="font-family:monospace,monospace"><br>
            Implementation as per the patch above.<br>
            <br>
            Control test 1<br>
            ==============<br>
            <br>
            Sascha Willems mesh sample running on its own at regular
            priority<br>
            <br>
            Results<br>
            -------<br>
            <br>
            Mesh: ~0.14ms per-frame latency<br>
            <br>
            Control test 2<br>
            ==============<br>
            <br>
            Two Sascha Willems mesh samples running simultaneously at
            regular priority<br>
            <br>
            Results<br>
            -------<br>
            <br>
            Mesh 1: ~0.26ms per-frame latency<br>
            Mesh 2: ~0.26ms per-frame latency<br>
            <br>
            Test 1<br>
            ======<br>
            <br>
            Two Sascha Willems mesh samples running simultaneously. One
            at high<br>
            priority and the other running in a regular priority
            graphics context.<br>
            <br>
            Results<br>
            -------<br>
            <br>
            Mesh High:    0.14 - 0.24ms per-frame latency<br>
            Mesh Regular: 0.24 - 0.40ms per-frame latency<br>
            <br>
            Test 2<br>
            ======<br>
            <br>
            Ten Sascha Willems mesh samples running simultaneously. One
            at high<br>
            priority and the others running in a regular priority
            graphics context.<br>
            <br>
            Results<br>
            -------<br>
            <br>
            Mesh High:    0.14 - 0.8ms per-frame latency<br>
            Mesh Regular: 1.10 - 2.05ms per-frame latency<br>
            <br>
            Test 3<br>
            ======<br>
            <br>
            Two Sascha Willems mesh samples running simultaneously. One
            at high<br>
            priority and the other running in a regular priority
            graphics context.<br>
            <br>
            Also running Unigine Heaven at the Extreme preset @ 2560x1600<br>
            <br>
            Results<br>
            -------<br>
            <br>
            Mesh High:     7 - 100ms per-frame latency (Lots of fluctuation)<br>
            Mesh Regular: 40 - 130ms per-frame latency (Lots of fluctuation)<br>
            Unigine Heaven: 20-40 fps<br>
            <br>
          </span><br>
          <span style="font-family:monospace,monospace"><span
              style="font-family:monospace,monospace">Test 4<br>
              ======<br>
              <br>
              Two Sascha Willems mesh samples running simultaneously.
              One at high<br>
              priority and the other running in a regular priority
              graphics context.<br>
              <br>
              Also running Talos Principle @ 4K<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh High:    0.14 - 3.97ms per-frame latency (mostly
              hovers around ~0.4ms)<br>
              Mesh Regular: 0.43 - 8.11ms per-frame latency (Lots of
              fluctuation)<br>
              Talos: 24.8 fps AVG</span><br>
            <br>
            Observations<br>
            ============<br>
            <br>
            The high priority queue based on the SW scheduler provides
            significant<br>
            gains when paired with tasks that submit short duration
            commands into<br>
            the queue. This can be observed in tests 1 and 2.<br>
            <br>
            When the pipe is full of long running commands, the effects
            are dampened.<br>
            As observed in test 3, the per-frame latency suffers very
            large spikes,<br>
            and the latencies are very inconsistent.<br>
            <br>
            Talos seems to be a better behaved game. It may be
            submitting shorter<br>
            draw commands and the SW scheduler is able to interleave the
            rest of<br>
            the work.<br>
            <br>
            The results seem consistent with the hypothetical advantages
            the SW<br>
            scheduler should provide. I.e. your context can be scheduled
            into the<br>
            HW queue ahead of any other context, but everything already
            committed<br>
            to the HW queue is executed in strict FIFO order.<br>
            <br>
          </span></div>
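The behavior described above can be sketched as a toy model (illustrative only; the class and method names are invented for the sketch and are not the actual amd_sched API): the SW scheduler lets a high-priority context jump ahead when pushing into the HW ring, but anything already committed to the ring drains in strict FIFO order.

```python
from collections import deque

HIGH, NORMAL = 0, 1  # lower value = higher priority

class SwScheduler:
    """Toy model: per-priority SW run queues feeding one FIFO HW ring."""
    def __init__(self):
        self.runqueues = {HIGH: deque(), NORMAL: deque()}
        self.hw_ring = deque()  # everything here executes strictly FIFO

    def submit(self, priority, job):
        self.runqueues[priority].append(job)

    def push_to_hw(self):
        # The SW scheduler lets high-priority contexts jump the SW queues...
        for prio in (HIGH, NORMAL):
            if self.runqueues[prio]:
                self.hw_ring.append(self.runqueues[prio].popleft())
                return

    def run_hw(self):
        # ...but work already committed to the HW ring cannot be passed.
        return self.hw_ring.popleft() if self.hw_ring else None
```

This is also why long-running commands dampen the gains (as in test 3): once committed, they must drain before anything behind them runs.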
        <div><span style="font-family:monospace,monospace">In order to
            deal with cases similar to Test 3, we will need to take<br>
          </span></div>
        <div><span style="font-family:monospace,monospace">advantage of
            further features.<br>
            <br>
            Notes<br>
            =====<br>
            <br>
            - Tests were run multiple times, and reboots were performed
            during tests.<br>
            - The mesh sample isn't really designed for benchmarking,
            but it should<br>
              be decent for ballpark figures<br>
            - The high priority mesh app was run with default niceness
            and also niceness<br>
              at -20. This had no effect on the results, so it was not
            added above.<br>
            - CPU usage was not saturated while running the tests<br>
            <br>
          </span></div>
        <div><span style="font-family:monospace,monospace">Regards,<br>
          </span></div>
        <div><span style="font-family:monospace,monospace">Andres<br>
          </span></div>
        <br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Fri, Dec 23, 2016 at 1:18 PM,
          Pierre-Loup A. Griffais <span dir="ltr"><<a
              moz-do-not-send="true"
              href="mailto:pgriffais@valvesoftware.com" target="_blank">pgriffais@valvesoftware.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">I hate to
            keep bringing up display topics in an unrelated
            conversation, but I'm not sure where you got "Application
            -> X server -> compositor -> X server" from. As I
            was saying before, we need to be presenting directly to the
            HMD display, as no display server can be in the way, both for
            latency and quality of service reasons (a buggy
            application cannot be allowed to accidentally display
            undistorted rendering into the HMD); we intend to do the
            necessary work for this, and the extent of X's (or a Wayland
            implementation, or any other display server) involvement will
            be to participate enough to know that the HMD display is
            off-limits. If you have more questions on the display
            aspect, or VR rendering in general, I'm happy to try to
            address them out-of-band from this conversation.
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                On 12/23/2016 02:54 AM, Christian König wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    But yes, in general you don't want another
                    compositor in the way, so<br>
                    we'll be acquiring the HMD display directly,
                    separate from any desktop<br>
                    or display server.<br>
                  </blockquote>
                  Assuming that the HMD is attached to the rendering
                  device in some<br>
                  way you have the X server and the Compositor which
                  both try to be DRM<br>
                  master at the same time.<br>
                  <br>
                  Please correct me if that was fixed in the meantime,
                  but that sounds<br>
                  like it will simply not work. Or is this what Andres
                  mentioned below that Dave<br>
                  is working on?<br>
                  <br>
                  In addition to that, a compositor in combination with X
                  is a bit<br>
                  counterproductive when you want to keep the latency low.<br>
                  <br>
                  E.g. the "normal" flow of a GL or Vulkan surface
                  filled with rendered<br>
                  data to be displayed is from the Application -> X
                  server -> compositor<br>
                  -> X server.<br>
                  <br>
                  The extra step between X server and compositor just
                  means extra latency<br>
                  and for this use case you probably don't want that.<br>
                  <br>
                  Targeting something like Wayland, and XWayland when
                  you need X compatibility,<br>
                  sounds like the much better idea.<br>
                  <br>
                  Regards,<br>
                  Christian.<br>
                  <br>
                  On 22.12.2016 at 20:54, Pierre-Loup A.
                  Griffais wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    Display concerns are a separate issue, and as Andres
                    said we have<br>
                    other plans to address. But yes, in general you
                    don't want another<br>
                    compositor in the way, so we'll be acquiring the HMD
                    display directly,<br>
                    separate from any desktop or display server. Same
                    with security, we<br>
                    can have a separate conversation about that when the
                    time comes.<br>
                    <br>
                    On 12/22/2016 08:41 AM, Serguei Sagalovitch wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      Andres,<br>
                      <br>
                      Did you measure the latency, etc. impact of __any__
                      compositor?<br>
                      <br>
                      My understanding is that VR has pretty strict
                      requirements related to<br>
                      QoS.<br>
                      <br>
                      Sincerely yours,<br>
                      Serguei Sagalovitch<br>
                      <br>
                      <br>
                      On 2016-12-22 11:35 AM, Andres Rodriguez wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        Hey Christian,<br>
                        <br>
                        We are currently interested in X, but with some
                        distros switching to<br>
                        other compositors by default, we also need to
                        consider those.<br>
                        <br>
                        We agree, running the full vrcompositor as root
                        isn't something that<br>
                        we want to do. Too many security concerns.
                        Having a small root helper<br>
                        that does the privilege escalation for us is the
                        initial idea.<br>
                        <br>
                        For a long term approach, Pierre-Loup and Dave
                        are working on dealing<br>
                        with the "two compositors" scenario a little
                        better in DRM+X.<br>
                        Fullscreen isn't really a sufficient approach,
                        since we don't want the<br>
                        HMD to be used as part of the Desktop
                        environment when a VR app is not<br>
                        in use (this is extremely annoying).<br>
                        <br>
                        When the above is settled, we should have an
                        auth mechanism besides<br>
                        DRM_MASTER or DRM_AUTH that allows the
                        vrcompositor to take over the<br>
                        HMD permanently away from X. Re-using that auth
                        method to gate this<br>
                        IOCTL is probably going to be the final
                        solution.<br>
                        <br>
                        I propose to start with ROOT_ONLY since it
                        should allow us to respect<br>
                        kernel IOCTL compatibility guidelines with the
                        most flexibility. Going<br>
                        from a restrictive to a more flexible permission
                        model would be<br>
                        inclusive, but going from a general to a
                        restrictive model may exclude<br>
                        some apps that used to work.<br>
                        <br>
                        Regards,<br>
                        Andres<br>
                        <br>
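The ROOT_ONLY gating proposed above might look roughly like the following sketch (hypothetical: the constant names and the uid check are illustrative stand-ins for a kernel-side capability check, not the actual amdgpu code):

```python
import errno

# Illustrative stand-ins, not the real amdgpu interface.
AMDGPU_CTX_PRIORITY_NORMAL = 0
AMDGPU_CTX_PRIORITY_HIGH = 1

def amdgpu_ctx_create(uid, priority):
    """Reject high-priority context creation for non-root callers with
    -EPERM; anyone may still create normal-priority contexts."""
    if priority == AMDGPU_CTX_PRIORITY_HIGH and uid != 0:
        return -errno.EPERM
    return 0  # success
```

Starting restrictive this way keeps the door open: relaxing the check later (e.g. to a dedicated auth mechanism) won't break existing users.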
                        On 12/22/2016 6:42 AM, Christian König wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          Hi Andres,<br>
                          <br>
                          well using root might cause stability and
                          security problems as well.<br>
                          We worked quite hard to avoid exactly this for
                          X.<br>
                          <br>
                          We could make this feature depend on the
                          compositor being DRM master,<br>
                          but for example with X the X server is master
                          (and e.g. can change<br>
                          resolutions etc..) and not the compositor.<br>
                          <br>
                          So another question is also what windowing
                          system (if any) are you<br>
                          planning to use? X, Wayland, Flinger or
                          something completely<br>
                          different ?<br>
                          <br>
                          Regards,<br>
                          Christian.<br>
                          <br>
                          On 20.12.2016 at 16:51, Andres
                          Rodriguez wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            Hi Christian,<br>
                            <br>
                            That is definitely a concern. What we are
                            currently thinking is to<br>
                            make the high priority queues accessible to
                            root only.<br>
                            <br>
                            Therefore, if a non-root user attempts to set
                            the high priority flag<br>
                            on context allocation, we would fail the
                            call and return EPERM.<br>
                            <br>
                            Regards,<br>
                            Andres<br>
                            <br>
                            <br>
                            On 12/20/2016 7:56 AM, Christian König
                            wrote:<br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                BTW: If there is a non-VR application
                                which uses the high-priority<br>
                                h/w queue, then the VR application will
                                suffer. Any ideas how<br>
                                to solve it?<br>
                              </blockquote>
                              Yeah, that problem came to my mind as
                              well.<br>
                              <br>
                              Basically we need to restrict those high
                              priority submissions to<br>
                              the VR compositor, otherwise any
                              malfunctioning application could<br>
                              use it.<br>
                              <br>
                              Just think about some WebGL suddenly
                              taking all our rendering away<br>
                              and we won't get anything drawn any more.<br>
                              <br>
                              Alex or Michel any ideas on that?<br>
                              <br>
                              Regards,<br>
                              Christian.<br>
                              <br>
                              On 19.12.2016 at 15:48, Serguei
                              Sagalovitch wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                > If the compute queue is occupied only
                                by you, the efficiency<br>
                                > is equal to setting the job queue to
                                high priority, I think.<br>
                                The only risk is the situation where
                                graphics will take all<br>
                                needed CUs. But in any case it should be
                                a very good test.<br>
                                <br>
                                Andres/Pierre-Loup,<br>
                                <br>
                                Did you try to do it, or is it a lot of
                                work for you?<br>
                                <br>
                                <br>
                                 BTW: If there is a non-VR application
                                 which uses the high-priority<br>
                                 h/w queue, then the VR application will
                                 suffer. Any ideas how<br>
                                 to solve it?<br>
                                <br>
                                Sincerely yours,<br>
                                Serguei Sagalovitch<br>
                                <br>
                                On 2016-12-19 12:50 AM, zhoucm1 wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  Do you encounter the priority issue
                                  for the compute queue with the<br>
                                  current driver?<br>
                                  <br>
                                  If the compute queue is occupied only by
                                  you, the efficiency is equal<br>
                                  to setting the job queue to high
                                  priority, I think.<br>
                                  <br>
                                  Regards,<br>
                                  David Zhou<br>
                                  <br>
                                  On 2016-12-19 13:29, Andres Rodriguez
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    Yes, vulkan is available on all-open
                                    through the mesa radv UMD.<br>
                                    <br>
                                    I'm not sure if I'm asking for too
                                    much, but if we can<br>
                                    coordinate a similar interface in
                                    radv and amdgpu-pro at the<br>
                                    vulkan level that would be great.<br>
                                    <br>
                                    I'm not sure what that's going to be
                                    yet.<br>
                                    <br>
                                    - Andres<br>
                                    <br>
                                    On 12/19/2016 12:11 AM, zhoucm1
                                    wrote:<br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex">
                                      <br>
                                      <br>
                                      On 2016-12-19 11:33, Pierre-Loup
                                      A. Griffais wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                        We're currently working with the
                                        open stack; I assume that a<br>
                                        mechanism could be exposed by
                                        both open and Pro Vulkan<br>
                                        userspace drivers and that the
                                        amdgpu kernel interface<br>
                                        improvements we would pursue
                                        following this discussion would<br>
                                        let both drivers take advantage
                                        of the feature, correct?<br>
                                      </blockquote>
                                      Of course.<br>
                                      Does the open stack have Vulkan
                                      support?<br>
                                      <br>
                                      Regards,<br>
                                      David Zhou<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                        <br>
                                        On 12/18/2016 07:26 PM, zhoucm1
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          By the way, are you using the
                                          all-open driver or the amdgpu-pro<br>
                                          driver?<br>
                                          <br>
                                          +David Mao, who is working on
                                          our Vulkan driver.<br>
                                          <br>
                                          Regards,<br>
                                          David Zhou<br>
                                          <br>
                                          On 2016-12-18 06:05,
                                          Pierre-Loup A. Griffais wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            Hi Serguei,<br>
                                            <br>
                                            I'm also working on
                                            bringing up our VR runtime
                                            on top of<br>
                                            amdgpu;<br>
                                            see replies inline.<br>
                                            <br>
                                            On 12/16/2016 09:05 PM,
                                            Sagalovitch, Serguei wrote:<br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              Andres,<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                 For current VR
                                                workloads we have 3
                                                separate processes<br>
                                                running<br>
                                                actually:<br>
                                              </blockquote>
                                              So we could have a potential
                                              memory overcommit case, or
                                              do<br>
                                              you do partitioning<br>
                                              on your own? I would
                                              think that there is a need
                                              to avoid<br>
                                              overcommit in<br>
                                              the VR case to<br>
                                              prevent any BO migration.<br>
                                            </blockquote>
                                            <br>
                                            You're entirely correct;
                                            currently the VR runtime is<br>
                                            setting up<br>
                                            prioritized CPU scheduling
                                            for its VR compositor, we're<br>
                                            working on<br>
                                            prioritized GPU scheduling
                                            and pre-emption (eg. this<br>
                                            thread), and in<br>
                                            the future it will make
                                            sense to do work in order to
                                            make<br>
                                            sure that<br>
                                            its memory allocations do
                                            not get evicted, to prevent
                                            any<br>
                                            unwelcome<br>
                                            additional latency in the
                                            event of needing to perform<br>
                                            just-in-time<br>
                                            reprojection.<br>
                                            <br>
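The eviction concern above can be pictured with a toy model (purely illustrative; names and the count-based accounting are made up, and real TTM eviction works on sizes and LRU order): a pinned allocation, such as the VR compositor's, is simply never considered a migration victim.

```python
class Bo:
    """Toy buffer object: pinned BOs must never be migrated."""
    def __init__(self, name, pinned=False):
        self.name = name
        self.pinned = pinned

def pick_eviction_victims(bos, needed):
    """Pick up to `needed` unpinned BOs to evict; pinned BOs (e.g. the
    VR compositor's allocations) are skipped, so they can never add
    latency to just-in-time reprojection."""
    victims = []
    for bo in bos:
        if needed <= 0:
            break
        if not bo.pinned:
            victims.append(bo.name)
            needed -= 1
    return victims
```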
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              BTW: Do you mean __real__
                                              processes or threads?<br>
                                              Based on my understanding,
                                              sharing BOs between different<br>
                                              processes could introduce
                                              additional synchronization<br>
                                              constraints. BTW: I am not
                                              sure if we are able to share<br>
                                              Vulkan sync objects across
                                              process boundaries.<br>
                                            </blockquote>
                                            <br>
                                            They are different
                                            processes; it is important
                                            for the<br>
                                            compositor that<br>
                                            is responsible for
                                            quality-of-service features
                                            such as<br>
                                            consistently<br>
                                            presenting distorted frames
                                            with the right latency,<br>
                                            reprojection, etc,<br>
                                            to be separate from the main
                                            application.<br>
                                            <br>
                                            Currently we are using
                                            unreleased cross-process
                                            memory and<br>
                                            semaphore<br>
                                            extensions to fetch updated
                                            eye images from the client<br>
                                            application,<br>
                                            but the just-in-time
                                            reprojection discussed here
                                            does not<br>
                                            actually<br>
                                            have any direct interactions
                                            with cross-process resource<br>
                                            sharing,<br>
                                            since it's achieved by using
                                            whatever is the latest, most<br>
                                            up-to-date<br>
                                            eye images that have already
                                            been sent by the client<br>
                                            application,<br>
                                            which are already available
                                            to use without additional<br>
                                            synchronization.<br>
                                            <br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                   3) System compositor
                                                (we are looking at
                                                approaches to<br>
                                                remove this<br>
                                                overhead)<br>
                                              </blockquote>
                                              Yes,  IMHO the best is to
                                              run in  "full screen
                                              mode".<br>
                                            </blockquote>
                                            <br>
                                            Yes, we are working on
                                            mechanisms to present
                                            directly to the<br>
                                            headset<br>
                                            display without any
                                            intermediaries as a separate
                                            effort.<br>
                                            <br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                 The latency is our main
                                                concern,<br>
                                              </blockquote>
                                              I would assume that this
                                              is a known problem (at least<br>
                                              for compute usage).<br>
                                              It looks like amdgpu /
                                              kernel submission is rather CPU<br>
                                              intensive (at least in the
                                              default configuration).<br>
                                            </blockquote>
                                            <br>
                                            As long as it's a consistent
                                            cost, it shouldn't be an issue.<br>
                                            However, if there's a high
                                            degree of variance then that<br>
                                            would be troublesome and we
                                            would need to account for the<br>
                                            worst case.<br>
                                            <br>
                                            Hopefully the requirements
                                            and approach we described
                                            make<br>
                                            sense, we're<br>
                                            looking forward to your
                                            feedback and suggestions.<br>
                                            <br>
                                            Thanks!<br>
                                             - Pierre-Loup<br>
                                            <br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <br>
                                              Sincerely yours,<br>
                                              Serguei Sagalovitch<br>
                                              <br>
                                              <br>
                                              From: Andres Rodriguez
                                              <<a
                                                moz-do-not-send="true"
                                                href="mailto:andresr@valvesoftware.com"
                                                target="_blank">andresr@valvesoftware.com</a>><br>
                                              Sent: December 16, 2016
                                              10:00 PM<br>
                                              To: Sagalovitch, Serguei;
                                              <a moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              Subject: RE: [RFC]
                                              Mechanism for high
                                              priority scheduling<br>
                                              in amdgpu<br>
                                              <br>
                                              Hey Serguei,<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                [Serguei] No. I mean
                                                pipe :-) as the MEC defines<br>
                                                it. As far as I understand
                                                (by simplifying),<br>
                                                some scheduling is per
                                                pipe. I know about the<br>
                                                current allocation scheme,
                                                but I do not think<br>
                                                that it is ideal. I
                                                would assume that we need to<br>
                                                switch to dynamic
                                                partitioning of resources<br>
                                                based on the workload;
                                                otherwise we will have<br>
                                                resource conflicts
                                                between Vulkan compute<br>
                                                and OpenCL.<br>
                                              </blockquote>
                                              <br>
                                              I agree the partitioning
                                              isn't ideal. I'm hoping we
                                              can<br>
                                              start with a<br>
                                              solution that assumes that<br>
                                              only pipe0 has any work
                                              and the other pipes are
                                              idle (no<br>
                                              HSA/ROCm<br>
                                              running on the system).<br>
                                              <br>
                                              This should be more or
                                              less the use case we
                                              expect from VR<br>
                                              users.<br>
                                              <br>
                                              I agree the split is
                                              currently not ideal, but
                                              I'd like to<br>
                                              consider that a separate
                                              task, because<br>
                                              making it dynamic is not
                                              straightforward :P<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                [Serguei] Vulkan works
                                                via amdgpu (kernel
                                                submissions), so<br>
                                                amdkfd will not be
                                                involved. I would
                                                assume that in the case<br>
                                                of VR we will have one main
                                                application ("console"
                                                mode(?)), so we could<br>
                                                temporarily "ignore"
                                                OpenCL/ROCm needs when
                                                VR is running.<br>
                                              </blockquote>
                                              <br>
                                              Correct, this is why we
                                              want to enable the high
                                              priority<br>
                                              compute<br>
                                              queue through<br>
                                              libdrm-amdgpu, so that we
                                              can expose it through
                                              Vulkan<br>
                                              later.<br>
                                              <br>
                                              For current VR workloads
                                              we have 3 separate
                                              processes<br>
                                              running actually:<br>
                                                  1) Game process<br>
                                                  2) VR Compositor (this
                                              is the process that will
                                              require<br>
                                              high<br>
                                              priority queue)<br>
                                                  3) System compositor
                                              (we are looking at
                                              approaches to<br>
                                              remove this<br>
                                              overhead)<br>
                                              <br>
                                              For now I think it is okay
                                              to assume no OpenCL/ROCm
                                              running<br>
                                              simultaneously, but<br>
                                              I would also like to be
                                              able to address this case
                                              in the<br>
                                              future<br>
                                              (cross-pipe priorities).<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                [Serguei]  The problem
                                                with pre-emption of
                                                graphics task:<br>
                                                (a) it<br>
                                                may take time so<br>
                                                latency may suffer<br>
                                              </blockquote>
                                              <br>
                                              The latency is our main
                                              concern, we want something
                                              that is<br>
                                              predictable. A good<br>
                                              illustration of what the
                                              reprojection scheduling
                                              looks like<br>
                                              can be<br>
                                              found here:<br>
                                              <a moz-do-not-send="true"
href="https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png"
                                                rel="noreferrer"
                                                target="_blank">https://community.amd.com/serv<wbr>let/JiveServlet/showImage/38-<wbr>1310-104754/pastedImage_3.png</a><br>
                                              <br>
                                              <br>
                                              <br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                (b) to preempt we need
                                                to have different
                                                "context" - we<br>
                                                want<br>
                                                to guarantee that
                                                submissions from the
                                                same context will<br>
                                                be executed<br>
                                                in order.<br>
                                              </blockquote>
                                              <br>
                                              This is okay, as the
                                              reprojection work doesn't
                                              have<br>
                                              dependencies on<br>
                                              the game context, and it<br>
                                              even happens in a separate
                                              process.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                BTW: (a) Do you want
                                                "preempt" and later
                                                resume or do you<br>
                                                want<br>
                                                "preempt" and<br>
                                                "cancel/abort"<br>
                                              </blockquote>
                                              <br>
                                              Preempt the game with the
                                              compositor task and then
                                              resume<br>
                                              it.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                (b) Vulkan is generic
                                                API and could be used
                                                for graphics<br>
                                                as well as<br>
                                                for plain compute tasks
                                                (VK_QUEUE_COMPUTE_BIT).<br>
                                              </blockquote>
                                              <br>
                                              Yeah, the plan is to use
                                              Vulkan compute. But if you
                                              figure<br>
                                              out a way<br>
                                              for us to get<br>
                                              a guaranteed execution
                                              time using vulkan
                                              graphics, then<br>
                                              I'll take you<br>
                                              out for a beer :)<br>
                                              <br>
                                              Regards,<br>
                                              Andres<br>
______________________________<wbr>__________<br>
                                              From: Sagalovitch, Serguei
                                              [<a moz-do-not-send="true"
href="mailto:Serguei.Sagalovitch@amd.com" target="_blank">Serguei.Sagalovitch@amd.com</a>]<br>
                                              Sent: Friday, December 16,
                                              2016 9:13 PM<br>
                                              To: Andres Rodriguez; <a
                                                moz-do-not-send="true"
                                                href="mailto:amd-gfx@lists.freedesktop.org"
                                                target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              Subject: Re: [RFC]
                                              Mechanism for high
                                              priority scheduling<br>
                                              in amdgpu<br>
                                              <br>
                                              Hi Andres,<br>
                                              <br>
                                              Please see inline (as
                                              [Serguei])<br>
                                              <br>
                                              Sincerely yours,<br>
                                              Serguei Sagalovitch<br>
                                              <br>
                                              <br>
                                              From: Andres Rodriguez
                                              <<a
                                                moz-do-not-send="true"
                                                href="mailto:andresr@valvesoftware.com"
                                                target="_blank">andresr@valvesoftware.com</a>><br>
                                              Sent: December 16, 2016
                                              8:29 PM<br>
                                              To: Sagalovitch, Serguei;
                                              <a moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              Subject: RE: [RFC]
                                              Mechanism for high
                                              priority scheduling<br>
                                              in amdgpu<br>
                                              <br>
                                              Hi Serguei,<br>
                                              <br>
                                              Thanks for the feedback.
                                              Answers inline as [AR].<br>
                                              <br>
                                              Regards,<br>
                                              Andres<br>
                                              <br>
______________________________<wbr>__________<br>
                                              From: Sagalovitch, Serguei
                                              [<a moz-do-not-send="true"
href="mailto:Serguei.Sagalovitch@amd.com" target="_blank">Serguei.Sagalovitch@amd.com</a>]<br>
                                              Sent: Friday, December 16,
                                              2016 8:15 PM<br>
                                              To: Andres Rodriguez; <a
                                                moz-do-not-send="true"
                                                href="mailto:amd-gfx@lists.freedesktop.org"
                                                target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              Subject: Re: [RFC]
                                              Mechanism for high
                                              priority scheduling<br>
                                              in amdgpu<br>
                                              <br>
                                              Andres,<br>
                                              <br>
                                              <br>
                                              Quick comments:<br>
                                              <br>
                                              1) To minimize "bubbles",
                                              etc. we need to "force" CU<br>
                                              assignments/binding<br>
                                              to the high-priority queue
                                              when it is in use, and "free"<br>
                                              them later<br>
                                              (we do not want to take CUs
                                              away from e.g. a graphics task<br>
                                              forever and degrade
                                              graphics performance).<br>
                                              <br>
                                              Otherwise we could have a
                                              scenario where a long
                                              graphics task (or<br>
                                              low-priority compute)
                                              takes all (extra) CUs and
                                              the high-priority work<br>
                                              waits for the needed
                                              resources.<br>
                                              This will not be visible
                                              with "NOP" packets but only
                                              when you submit a<br>
                                              "real" compute task,<br>
                                              so I would recommend not
                                              using "NOP" packets at
                                              all for<br>
                                              testing.<br>
                                              <br>
                                              CU assignment could
                                              be relatively easily done
                                              when<br>
                                              everything is going via
                                              the kernel<br>
                                              (e.g. as part of frame
                                              submission), but I must
                                              admit that I<br>
                                              am not sure about the best
                                              way for user level
                                              submissions (amdkfd).<br>
                                              <br>
                                              [AR] I wasn't aware of
                                              this part of the
                                              programming<br>
                                              sequence. Thanks<br>
                                              for the heads up!<br>
                                              Is this similar to the CU
                                              masking programming?<br>
                                              [Serguei] Yes. To
                                              simplify: the problem is
                                              that "scheduler"<br>
                                              when<br>
                                              deciding which<br>
                                              queue to run will check
                                              if there are enough
                                              resources and<br>
                                              if not then<br>
                                              it will begin<br>
                                              to check other queues with
                                              lower priority.<br>
                                              <br>
                                              2) I would recommend
                                              dedicating the whole pipe to<br>
                                              the high-priority<br>
                                              queue and having<br>
                                              nothing there except it.<br>
                                              <br>
                                              [AR] I'm guessing in this
                                              context you mean pipe =
                                              queue?<br>
                                              (as opposed<br>
                                              to the MEC definition<br>
                                              of pipe, which is a
                                              grouping of queues). I say
                                              this because<br>
                                              amdgpu<br>
                                              only has access to 1 pipe,<br>
                                              and the rest are
                                              statically partitioned for
                                              amdkfd usage.<br>
                                              <br>
                                              [Serguei] No. I mean pipe
                                              :-) as the MEC defines it. As
                                              far as I<br>
                                              understand (by
                                              simplifying)<br>
                                              some scheduling is per
                                              pipe.  I know about the
                                              current<br>
                                              allocation<br>
                                              scheme but I do not think<br>
                                              that it is ideal. I
                                              would assume that we need
                                              to switch to<br>
                                              dynamic partitioning<br>
                                              of resources based on the
                                              workload otherwise we will
                                              have<br>
                                              resource<br>
                                              conflict<br>
                                              between Vulkan compute
                                              and  OpenCL.<br>
                                              <br>
                                              <br>
                                              BTW: Which user level API
                                              do you want to use for
                                              compute:<br>
                                              Vulkan or<br>
                                              OpenCL?<br>
                                              <br>
                                              [AR] Vulkan<br>
                                              <br>
                                              [Serguei] Vulkan works via
                                              amdgpu (kernel
                                              submissions) so<br>
                                              amdkfd will<br>
                                              not be<br>
                                              involved.  I would assume
                                              that in the case of VR we
                                              will<br>
                                              have one main<br>
                                              application ("console"
                                              mode(?)) so we could
                                              temporarily<br>
                                              "ignore"<br>
                                              OpenCL/ROCm needs when VR
                                              is running.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                 we will not be able to
                                                provide a solution
                                                compatible with<br>
                                                GFX<br>
                                                 workloads.<br>
                                              </blockquote>
                                              I assume that you are
                                              talking about graphics? Am
                                              I right?<br>
                                              <br>
                                              [AR] Yeah, my
                                              understanding is that
                                              pre-empting the<br>
                                              currently running<br>
                                              graphics job and
                                              scheduling in<br>
                                              something else using
                                              mid-buffer pre-emption has
                                              some cases<br>
                                              where it<br>
                                              doesn't work well. But if
                                              it<br>
                                              starts working well on
                                              Polaris10, it might be
                                              a better<br>
                                              solution for<br>
                                              us (because the whole
                                              reprojection<br>
                                              work uses the vulkan
                                              graphics stack at the
                                              moment, and<br>
                                              porting it to<br>
                                              compute is not trivial).<br>
                                              <br>
                                              [Serguei] The problem
                                              with pre-emption of a
                                              graphics task:<br>
                                              (a) it may<br>
                                              take time so<br>
                                              latency may suffer (b) to
                                              preempt we need to have
                                              different<br>
                                              "context"<br>
                                              - we want<br>
                                              to guarantee that
                                              submissions from the same
                                              context will be<br>
                                              executed<br>
                                              in order.<br>
                                              BTW: (a) Do you want 
                                              "preempt" and later resume
                                              or do you<br>
                                              want<br>
                                              "preempt" and<br>
                                              "cancel/abort"?  (b)
                                              Vulkan is a generic API and
                                              could be used<br>
                                              for graphics as well as
                                              for plain compute tasks<br>
                                              (VK_QUEUE_COMPUTE_BIT).<br>
                                              <br>
                                              <br>
                                              Sincerely yours,<br>
                                              Serguei Sagalovitch<br>
                                              <br>
                                              <br>
                                              <br>
                                              From: amd-gfx <<a
                                                moz-do-not-send="true"
                                                href="mailto:amd-gfx-bounces@lists.freedesktop.org"
                                                target="_blank">amd-gfx-bounces@lists.freedes<wbr>ktop.org</a>>
                                              on<br>
                                              behalf of<br>
                                              Andres Rodriguez <<a
                                                moz-do-not-send="true"
                                                href="mailto:andresr@valvesoftware.com"
                                                target="_blank">andresr@valvesoftware.com</a>><br>
                                              Sent: December 16, 2016
                                              6:15 PM<br>
                                              To: <a
                                                moz-do-not-send="true"
                                                href="mailto:amd-gfx@lists.freedesktop.org"
                                                target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              Subject: [RFC] Mechanism
                                              for high priority
                                              scheduling in<br>
                                              amdgpu<br>
                                              <br>
                                              Hi Everyone,<br>
                                              <br>
                                              This RFC is also available
                                              as a gist here:<br>
                                              <a moz-do-not-send="true"
href="https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249"
                                                rel="noreferrer"
                                                target="_blank">https://gist.github.com/lostgo<wbr>at/7000432cd6864265dbc2c3ab932<wbr>04249</a><br>
                                              <br>
                                              <br>
                                              <br>
                                              <br>
                                              <br>
                                              [RFC] Mechanism for high
                                              priority scheduling in
                                              amdgpu<br>
                                              <a moz-do-not-send="true"
href="http://gist.github.com" rel="noreferrer" target="_blank">gist.github.com</a><br>
                                              <br>
                                              We are interested in
                                              feedback for a mechanism
                                              to<br>
                                              effectively schedule<br>
                                              high<br>
                                              priority VR reprojection
                                              tasks (also referred to as<br>
                                              time-warping) for<br>
                                              Polaris10<br>
                                              running on the amdgpu
                                              kernel driver.<br>
                                              <br>
                                              Brief context:<br>
                                              --------------<br>
                                              <br>
                                              The main objective of
                                              reprojection is to avoid
                                              motion<br>
                                              sickness for VR<br>
                                              users in<br>
                                              scenarios where the game
                                              or application would fail
                                              to finish<br>
                                              rendering a new<br>
                                              frame in time for the next
                                              VBLANK. When this happens,
                                              the<br>
                                              user's head<br>
                                              movements<br>
                                              are not reflected on the
                                              Head Mounted Display (HMD)
                                              for the<br>
                                              duration<br>
                                              of an<br>
                                              extra frame. This extended
                                              mismatch between the inner
                                              ear<br>
                                              and the<br>
                                              eyes may<br>
                                              cause the user to
                                              experience motion
                                              sickness.<br>
                                              <br>
                                              The VR compositor deals
                                              with this problem by
                                              fabricating a<br>
                                              new frame<br>
                                              using the<br>
                                              user's updated head
                                              position in combination
                                              with the<br>
                                              previous frames.<br>
                                              This<br>
                                              avoids a prolonged
                                              mismatch between the HMD
                                              output and the<br>
                                              inner ear.<br>
                                              <br>
                                              Because of the adverse
                                              effects on the user, we
                                              require high<br>
                                              confidence that the<br>
                                              reprojection task will
                                              complete before the VBLANK
                                              interval.<br>
                                              Even if<br>
                                              the GFX pipe<br>
                                              is currently full of work
                                              from the game/application
                                              (which<br>
                                              is most<br>
                                              likely the case).<br>
                                              <br>
                                              For more details and
                                              illustrations, please
                                              refer to the<br>
                                              following<br>
                                              document:<br>
                                              <a moz-do-not-send="true"
href="https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved"
                                                rel="noreferrer"
                                                target="_blank">https://community.amd.com/comm<wbr>unity/gaming/blog/2016/03/28/<wbr>asynchronous-shaders-evolved</a><br>
                                              <br>
                                              <br>
                                              <br>
                                              <br>
                                              <br>
                                              Gaming: Asynchronous
                                              Shaders Evolved |
                                              Community<br>
                                              <a moz-do-not-send="true"
href="http://community.amd.com" rel="noreferrer" target="_blank">community.amd.com</a><br>
                                              One of the most exciting
                                              new developments in GPU
                                              technology<br>
                                              over the<br>
                                              past year has been the
                                              adoption of asynchronous
                                              shaders,<br>
                                              which can<br>
                                              make more efficient use of
                                              ...<br>
                                              <br>
                                              Requirements:<br>
                                              -------------<br>
                                              <br>
                                              The mechanism must expose
                                              the following
                                              functionality:<br>
                                              <br>
                                                  * Job round trip time
                                              must be predictable, from<br>
                                              submission to<br>
                                              fence signal<br>
                                              <br>
                                                  * The mechanism must
                                              support compute workloads.<br>
                                              <br>
                                              Goals:<br>
                                              ------<br>
                                              <br>
                                                  * The mechanism should
                                              provide low submission
                                              latencies<br>
                                              <br>
                                              Test: submitting a NOP
                                              packet through the
                                              mechanism on busy<br>
                                              hardware<br>
                                              should<br>
                                              be equivalent to
                                              submitting a NOP on idle
                                              hardware.<br>
                                              <br>
                                              Nice to have:<br>
                                              -------------<br>
                                              <br>
                                                  * The mechanism should
                                              also support GFX
                                              workloads.<br>
                                              <br>
                                              My understanding is that
                                              with the current hardware<br>
                                              capabilities in<br>
                                              Polaris10 we<br>
                                              will not be able to
                                              provide a solution
                                              compatible with GFX<br>
                                              workloads.<br>
                                              <br>
                                              But I would love to hear
                                              otherwise. So if anyone
                                              has an<br>
                                              idea,<br>
                                              approach or<br>
                                              suggestion that will also
                                              be compatible with the GFX
                                              ring,<br>
                                              please let<br>
                                              us know<br>
                                              about it.<br>
                                              <br>
                                                  * The above guarantees
                                              should also be respected
                                              by<br>
                                              amdkfd workloads<br>
                                              <br>
                                              Would be good to have for
                                              consistency, but not
                                              strictly<br>
                                              necessary as<br>
                                              users running<br>
                                              games are not
                                              traditionally running HPC
                                              workloads in the<br>
                                              background.<br>
                                              <br>
                                              Proposed approach:<br>
                                              ------------------<br>
                                              <br>
                                              Similar to the windows
                                              driver, we could expose a
                                              high<br>
                                              priority<br>
                                              compute queue to<br>
                                              userspace.<br>
                                              <br>
                                              Submissions to this
                                              compute queue will be
                                              scheduled with<br>
                                              high<br>
                                              priority, and may<br>
                                              acquire hardware resources
                                              previously in use by other<br>
                                              queues.<br>
                                              <br>
                                              This can be achieved by
                                              taking advantage of the
                                              'priority'<br>
                                              field in<br>
                                              the HQDs<br>
                                              and could be programmed by
                                              amdgpu or the amdgpu
                                              scheduler.<br>
                                              The relevant<br>
                                              register fields are:<br>
                                                      *
                                              mmCP_HQD_PIPE_PRIORITY<br>
                                                      *
                                              mmCP_HQD_QUEUE_PRIORITY<br>
                                              <br>
                                              Implementation approach 1
                                              - static partitioning:<br>
------------------------------------------------<br>
                                              <br>
                                              The amdgpu driver
                                              currently controls 8
                                              compute queues from<br>
                                              pipe0. We can<br>
                                              statically partition these
                                              as follows:<br>
                                                      * 7x regular<br>
                                                      * 1x high priority<br>
                                              <br>
The relevant priorities can be set so that submissions to the high priority ring will starve the other compute rings and the GFX ring.<br>
<br>
The amdgpu scheduler will only place jobs into the high priority rings if the context is marked as high priority, and a corresponding priority should be added to keep track of this information:<br>
     * AMD_SCHED_PRIORITY_KERNEL<br>
     * -> AMD_SCHED_PRIORITY_HIGH<br>
     * AMD_SCHED_PRIORITY_NORMAL<br>
                                              <br>
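The extended enum could look like the following sketch; the ordering convention (lower value = more urgent) is an assumption based on where HIGH is slotted into the list above:<br>

```c
#include <assert.h>

enum amd_sched_priority {
	AMD_SCHED_PRIORITY_KERNEL, /* existing, most urgent */
	AMD_SCHED_PRIORITY_HIGH,   /* proposed new level */
	AMD_SCHED_PRIORITY_NORMAL, /* existing default */
};
```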
The user will request a high priority context by setting an appropriate flag in drm_amdgpu_ctx_in (AMDGPU_CTX_HIGH_PRIORITY or similar):<br>
                                              <a moz-do-not-send="true"
href="https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163"
                                                rel="noreferrer"
                                                target="_blank">https://github.com/torvalds/li<wbr>nux/blob/master/include/uapi/<wbr>drm/amdgpu_drm.h#L163</a><br>
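Hypothetical flag plumbing at context creation time; the flag bit and the helper name are placeholders for the sketch, not the real UAPI:<br>

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit; the real value must be chosen so it does not
 * collide with existing drm_amdgpu_ctx_in flags. */
#define AMDGPU_CTX_HIGH_PRIORITY (1u << 0)

enum amd_sched_priority {
	AMD_SCHED_PRIORITY_HIGH,
	AMD_SCHED_PRIORITY_NORMAL,
};

/* Derive the scheduler priority for a new context from the ioctl flags. */
static enum amd_sched_priority ctx_priority_from_flags(uint32_t flags)
{
	return (flags & AMDGPU_CTX_HIGH_PRIORITY) ? AMD_SCHED_PRIORITY_HIGH
						   : AMD_SCHED_PRIORITY_NORMAL;
}
```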
                                              <br>
The setting is at a per-context level so that we can:<br>
    * Maintain a consistent FIFO ordering of all submissions to a context<br>
    * Create high priority and non-high priority contexts in the same process<br>
                                              <br>
Implementation approach 2 - dynamic priority programming:<br>
---------------------------------------------------------<br>
<br>
Similar to the above, but instead of programming the priorities at amdgpu_init() time, the SW scheduler will reprogram the queue priorities dynamically when scheduling a task.<br>
                                              <br>
This would involve having a hardware specific callback from the scheduler to set the appropriate queue priority: set_priority(int ring, int index, int priority)<br>
                                              <br>
During this callback we would have to grab the SRBM mutex to perform the appropriate HW programming, and I'm not really sure if that is something we should be doing from the scheduler.<br>
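The shape of that callback could be sketched as below; a pthread mutex stands in for the SRBM mutex so the locking structure is testable in userspace, and the rings[] array is a placeholder for the real mmCP_HQD_* register writes:<br>

```c
#include <assert.h>
#include <pthread.h>

struct ring_state {
	int hqd_queue_priority; /* placeholder for the HW register value */
};

/* Stand-in for the SRBM mutex that serializes indexed register access. */
static pthread_mutex_t srbm_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct ring_state rings[8];

/* Sketch of the hardware-specific scheduler callback from approach 2. */
static void set_priority(int ring, int index, int priority)
{
	(void)index; /* queue index within the pipe, unused in this sketch */
	pthread_mutex_lock(&srbm_mutex);
	rings[ring].hqd_queue_priority = priority; /* real code: HQD writes */
	pthread_mutex_unlock(&srbm_mutex);
}
```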
                                              <br>
On the positive side, this approach would allow us to program a range of priorities for jobs instead of a single "high priority" value, achieving something similar to the niceness API available for CPU scheduling.<br>
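To make the niceness analogy concrete, one hypothetical mapping from a nice-style range onto a queue priority field could be; both the -20..19 range (borrowed from the CPU scheduler) and the 0..15 target range are assumptions:<br>

```c
#include <assert.h>

/* Map a niceness value (-20 = most urgent, 19 = least) onto an assumed
 * 0..15 queue priority range, where 15 is the most urgent. Out-of-range
 * input is clamped. */
static int nice_to_queue_priority(int nice)
{
	if (nice < -20)
		nice = -20;
	if (nice > 19)
		nice = 19;
	/* linear mapping: -20 -> 15, 19 -> 0 */
	return (19 - nice) * 15 / 39;
}
```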
                                              <br>
I'm not sure if this flexibility is something that we would need for our use case, but it might be useful in other scenarios (multiple users sharing compute time on a server).<br>
                                              <br>
This approach would require a new int field in drm_amdgpu_ctx_in, or repurposing of the flags field.<br>
                                              <br>
                                              Known current obstacles:<br>
                                              ------------------------<br>
                                              <br>
The SQ is currently programmed to disregard the HQD priorities, and instead it picks jobs at random. Settings from the shader itself are also disregarded, as this is considered a privileged field.<br>
                                              <br>
Effectively we can get our compute wavefront launched ASAP, but we might not get the time we need on the SQ.<br>
                                              <br>
The current programming would have to be changed to allow priority propagation from the HQD into the SQ.<br>
                                              <br>
Generic approach for all HW IPs:<br>
--------------------------------<br>
<br>
For consistency purposes, the high priority context can be enabled for all HW IPs that support the SW scheduler. This will function similarly to the current AMD_SCHED_PRIORITY_KERNEL priority, where the job can jump ahead of anything not committed to the HW queue.<br>
                                              <br>
The benefits of requesting a high priority context for a non-compute queue will be smaller (e.g. up to 10s of wait time if a GFX command is stuck in front of you), but having the API in place will allow us to easily improve the implementation in the future as new features become available in new hardware.<br>
                                              <br>
                                              Future steps:<br>
                                              -------------<br>
                                              <br>
Once we have an approach settled, I can take care of the implementation.<br>
                                              <br>
Also, once the interface is mostly decided, we can start thinking about exposing the high priority queue through radv.<br>
                                              <br>
                                              Request for feedback:<br>
                                              ---------------------<br>
                                              <br>
We aren't married to any of the approaches outlined above. Our goal is to obtain a mechanism that will allow us to complete the reprojection job within a predictable amount of time. So if anyone has any suggestions for improvements or alternative strategies we are more than happy to hear them.<br>
                                              <br>
If any of the technical information above is also incorrect, feel free to point out my misunderstandings.<br>
                                              <br>
                                              Looking forward to hearing
                                              from you.<br>
                                              <br>
                                              Regards,<br>
                                              Andres<br>
                                              <br>
______________________________<wbr>_________________<br>
                                              amd-gfx mailing list<br>
                                              <a moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                              <a moz-do-not-send="true"
href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx"
                                                rel="noreferrer"
                                                target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/amd-gfx</a><br>
                                              <br>
                                            </blockquote>
                                            <br>
                                          </blockquote>
                                          <br>
                                        </blockquote>
                                        <br>
                                      </blockquote>
                                      <br>
                                    </blockquote>
                                    <br>
                                  </blockquote>
                                  <br>
                                </blockquote>
                                <br>
                                Sincerely yours,<br>
                                Serguei Sagalovitch<br>
                                <br>
                              </blockquote>
                              <br>
                              <br>
                            </blockquote>
                            <br>
                          </blockquote>
                          <br>
                        </blockquote>
                        <br>
                      </blockquote>
                      <br>
                      Sincerely yours,<br>
                      Serguei Sagalovitch<br>
                      <br>
                    </blockquote>
                    <br>
                  </blockquote>
                  <br>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>