<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Indeed a couple of nice numbers.<br>
      <br>
      <blockquote type="cite"><span
          style="font-family:monospace,monospace">but everything already
          committed<br>
          to the HW queue is executed in strict FIFO order.</span></blockquote>
      Well, actually, if we get a high priority submission we could, in
      theory, preempt/abort everything ahead of it on the ring buffer.<br>
      <br>
      Probably not as fine-grained as the hardware scheduler, but it
      might be easier to get working.<br>
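      <br>
      For reference, the behaviour described above boils down to something
      like the priority-ordered run-queue selection sketched below. This is
      only an illustration with made-up names, not the actual amdgpu
      scheduler code:<br>
      <pre>
#include &lt;stddef.h&gt;

/* Placeholder types for illustration only. */
enum { NUM_PRIORITIES = 3 };   /* e.g. normal, high, kernel */

struct sched_entity { struct sched_entity *next; /* per-context job list */ };
struct run_queue    { struct sched_entity *head; };
struct sw_scheduler { struct run_queue rq[NUM_PRIORITIES]; };

/* Pop the first runnable entity from one run queue (simplified). */
static struct sched_entity *rq_pick_entity(struct run_queue *rq)
{
    struct sched_entity *e = rq->head;
    if (e)
        rq->head = e->next;
    return e;
}

/*
 * When picking the next job to commit to the HW ring, walk the run queues
 * from highest to lowest priority: a high priority context is scheduled
 * into the HW queue ahead of any other context, but anything already
 * committed to the ring still executes in strict FIFO order.
 */
static struct sched_entity *select_next_entity(struct sw_scheduler *sched)
{
    for (int prio = NUM_PRIORITIES - 1; prio >= 0; prio--) {
        struct sched_entity *e = rq_pick_entity(&amp;sched->rq[prio]);
        if (e)
            return e;
    }
    return NULL;
}
      </pre>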
      <br>
      Regards,<br>
      Christian.<br>
      <br>
      On 26.12.2016 at 03:26, zhoucm1 wrote:<br>
    </div>
    <blockquote cite="mid:58607FDF.2080200@amd.com" type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      Nice experiment, which shows exactly what the SW scheduler can provide.<br>
      And as you said "<span style="font-family:monospace,monospace">I.e.
        your context can be scheduled into the<br>
        HW queue ahead of any other context, but everything already
        committed<br>
        to the HW queue is executed in strict FIFO order.</span>"<br>
      <br>
      If you want to keep <span style="font-family:monospace,monospace">consistent</span>
      latency, you will need to enable the hw priority queue feature.<br>
      <br>
      Regards,<br>
      David Zhou<br>
      <br>
      <div class="moz-cite-prefix">On 2016-12-24 06:20, Andres
        Rodriguez wrote:<br>
      </div>
      <blockquote
cite="mid:CAFQ_0eHg=Kf5qV50cgm51m6bTcMYdkgRXkT-sykJnYNzu3Zzsg@mail.gmail.com"
        type="cite">
        <div dir="ltr">
          <div>
            <div><span style="font-family:monospace,monospace">Hey John,<br>
                <br>
              </span></div>
            <span style="font-family:monospace,monospace">I've collected
              a bit of data using high priority SW scheduler queues,<br>
            </span></div>
          <div><span style="font-family:monospace,monospace">thought you
              might be interested.<br>
            </span></div>
          <div><span style="font-family:monospace,monospace"><br>
              Implementation as per the patch above.<br>
              <br>
              Control test 1<br>
              ==============<br>
              <br>
              Sascha Willems mesh sample running on its own at regular
              priority<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh: ~0.14ms per-frame latency<br>
              <br>
              Control test 2<br>
              ==============<br>
              <br>
              Two Sascha Willems mesh samples running simultaneously at
              regular priority<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh 1: ~0.26ms per-frame latency<br>
              Mesh 2: ~0.26ms per-frame latency<br>
              <br>
              Test 1<br>
              ======<br>
              <br>
              Two Sascha Willems mesh samples running simultaneously.
              One at high<br>
              priority and the other running in a regular priority
              graphics context.<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh High:    0.14 - 0.24ms per-frame latency<br>
              Mesh Regular: 0.24 - 0.40ms per-frame latency<br>
              <br>
              Test 2<br>
              ======<br>
              <br>
              Ten Sascha Willems mesh samples running simultaneously.
              One at high<br>
              priority and the others running in a regular priority
              graphics context.<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh High:    0.14 - 0.8ms per-frame latency<br>
              Mesh Regular: 1.10 - 2.05ms per-frame latency<br>
              <br>
              Test 3<br>
              ======<br>
              <br>
              Two Sascha Willems mesh samples running simultaneously.
              One at high<br>
              priority and the other running in a regular priority
              graphics context.<br>
              <br>
              Also running Unigine Heaven at the Extreme preset @ 2560x1600<br>
              <br>
              Results<br>
              -------<br>
              <br>
              Mesh High:     7 - 100ms per-frame latency (Lots of
              fluctuation)<br>
              Mesh Regular: 40 - 130ms per-frame latency (Lots of
              fluctuation)<br>
              Unigine Heaven: 20-40 fps<br>
              <br>
              <br>
              <span style="font-family:monospace,monospace">Test 4<br>
                ======<br>
                <br>
                Two Sascha Willems mesh samples running simultaneously.
                One at high<br>
                priority and the other running in a regular priority
                graphics context.<br>
                <br>
                Also running Talos Principle @ 4K<br>
                <br>
                Results<br>
                -------<br>
                <br>
                Mesh High:    0.14 - 3.97ms per-frame latency (mostly
                floats around 0.4ms)<br>
                Mesh Regular: 0.43 - 8.11ms per-frame latency (Lots of
                fluctuation)<br>
                Talos: 24.8 fps AVG</span><br>
              <br>
              Observations<br>
              ============<br>
              <br>
              The high priority queue based on the SW scheduler provides
              significant<br>
              gains when paired with tasks that submit short duration
              commands into<br>
              the queue. This can be observed in tests 1 and 2.<br>
              <br>
              When the pipe is full of long running commands, the
              effects are dampened.<br>
              As observed in test 3, the per-frame latency suffers very
              large spikes,<br>
              and the latencies are very inconsistent.<br>
              <br>
              Talos seems to be a better behaved game. It may be
              submitting shorter<br>
              draw commands and the SW scheduler is able to interleave
              the rest of<br>
              the work.<br>
              <br>
              The results seem consistent with the hypothetical
              advantages the SW<br>
              scheduler should provide. I.e. your context can be
              scheduled into the<br>
              HW queue ahead of any other context, but everything
              already committed<br>
              to the HW queue is executed in strict FIFO order.<br>
              <br>
            </span></div>
          <div><span style="font-family:monospace,monospace">In order to
              deal with cases similar to Test 3, we will need to take<br>
            </span></div>
          <div><span style="font-family:monospace,monospace">advantage
              of further features.<br>
              <br>
              Notes<br>
              =====<br>
              <br>
              - Tests were run multiple times, and reboots were
              performed during tests.<br>
              - The mesh sample isn't really designed for benchmarking,
              but it should<br>
                be decent for ballpark figures<br>
              - The high priority mesh app was run with default niceness
              and also niceness<br>
                at -20. This had no effect on the results, so it was not
              added above.<br>
              - CPU usage was not saturated while running the tests<br>
              <br>
            </span></div>
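          <div><span style="font-family:monospace,monospace">For context,
              the per-frame latency numbers above can be gathered with a
              simple CPU-side timer around submit plus fence wait. The
              sketch below only illustrates that idea; it is not the mesh
              sample's actual instrumentation:<br>
            </span>
            <pre>
#include &lt;stdint.h&gt;
#include &lt;time.h&gt;
#include &lt;vulkan/vulkan.h&gt;

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 +
           (b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Time one frame from vkQueueSubmit() until its fence signals; returns the
 * submit-to-completion latency in milliseconds. */
double timed_submit(VkDevice dev, VkQueue queue,
                    const VkSubmitInfo *submit, VkFence fence)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &amp;start);
    vkQueueSubmit(queue, 1, submit, fence);               /* queue the frame */
    vkWaitForFences(dev, 1, &amp;fence, VK_TRUE, UINT64_MAX); /* GPU finished */
    clock_gettime(CLOCK_MONOTONIC, &amp;end);

    vkResetFences(dev, 1, &amp;fence);
    return elapsed_ms(start, end);
}
            </pre>
          </div>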
          <div><span style="font-family:monospace,monospace">Regards,<br>
            </span></div>
          <div><span style="font-family:monospace,monospace">Andres<br>
            </span></div>
          <br>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Fri, Dec 23, 2016 at 1:18 PM,
            Pierre-Loup A. Griffais <span dir="ltr"><<a
                moz-do-not-send="true"
                href="mailto:pgriffais@valvesoftware.com"
                target="_blank">pgriffais@valvesoftware.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">I hate
              to keep bringing up display topics in an unrelated
              conversation, but I'm not sure where you got "Application
              -> X server -> compositor -> X server" from. As I
              was saying before, we need to be presenting directly to
              the HMD display, with no display server in the way, both
              for latency and for quality-of-service reasons (a
              buggy application cannot be allowed to accidentally
              display undistorted rendering into the HMD); we intend to
              do the necessary work for this, and the extent of X's (or
              a Wayland implementation, or any other display server)
              involvement will be to participate enough to know that the
              HMD display is off-limits. If you have more questions on
              the display aspect, or VR rendering in general, I'm happy
              to try to address them out-of-band from this conversation.
              <div class="HOEnZb">
                <div class="h5"><br>
                  <br>
                  On 12/23/2016 02:54 AM, Christian König wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      But yes, in general you don't want another
                      compositor in the way, so<br>
                      we'll be acquiring the HMD display directly,
                      separate from any desktop<br>
                      or display server.<br>
                    </blockquote>
                    Assuming that the HMD is attached to the
                    rendering device in some<br>
                    way you have the X server and the Compositor which
                    both try to be DRM<br>
                    master at the same time.<br>
                    <br>
                    Please correct me if that was fixed in the meantime,
                    but that sounds<br>
                    like it will simply not work. Or is this what Andres
                    mentioned below that Dave<br>
                    is working on?<br>
                    <br>
                    In addition to that, a compositor in combination with
                    X is a bit<br>
                    counterproductive when you want to keep the latency low.<br>
                    <br>
                    E.g. the "normal" flow of a GL or Vulkan surface
                    filled with rendered<br>
                    data to be displayed is from the Application -> X
                    server -> compositor<br>
                    -> X server.<br>
                    <br>
                    The extra step between X server and compositor just
                    means extra latency<br>
                    and for this use case you probably don't want that.<br>
                    <br>
                    Targeting something like Wayland, with XWayland when
                    you need X compatibility,<br>
                    sounds like the much better idea.<br>
                    <br>
                    Regards,<br>
                    Christian.<br>
                    <br>
                    On 22.12.2016 at 20:54, Pierre-Loup A. Griffais
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      Display concerns are a separate issue, and as
                      Andres said we have<br>
                      other plans to address. But yes, in general you
                      don't want another<br>
                      compositor in the way, so we'll be acquiring the
                      HMD display directly,<br>
                      separate from any desktop or display server. Same
                      with security, we<br>
                      can have a separate conversation about that when
                      the time comes.<br>
                      <br>
                      On 12/22/2016 08:41 AM, Serguei Sagalovitch wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex"> Andres,<br>
                        <br>
                        Did you measure the latency, etc. impact of __any__
                        compositor?<br>
                        <br>
                        My understanding is that VR has pretty strict
                        requirements related to<br>
                        QoS.<br>
                        <br>
                        Sincerely yours,<br>
                        Serguei Sagalovitch<br>
                        <br>
                        <br>
                        On 2016-12-22 11:35 AM, Andres Rodriguez wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex"> Hey Christian,<br>
                          <br>
                          We are currently interested in X, but with
                          some distros switching to<br>
                          other compositors by default, we also need to
                          consider those.<br>
                          <br>
                          We agree, running the full vrcompositor as
                          root isn't something that<br>
                          we want to do. Too many security concerns.
                          Having a small root helper<br>
                          that does the privilege escalation for us is
                          the initial idea.<br>
                          <br>
                          For a long term approach, Pierre-Loup and Dave
                          are working on dealing<br>
                          with the "two compositors" scenario a little
                          better in DRM+X.<br>
                          Fullscreen isn't really a sufficient approach,
                          since we don't want the<br>
                          HMD to be used as part of the Desktop
                          environment when a VR app is not<br>
                          in use (this is extremely annoying).<br>
                          <br>
                          When the above is settled, we should have an
                          auth mechanism besides<br>
                          DRM_MASTER or DRM_AUTH that allows the
                          vrcompositor to take over the<br>
                          HMD permanently away from X. Re-using that
                          auth method to gate this<br>
                          IOCTL is probably going to be the final
                          solution.<br>
                          <br>
                          I propose to start with ROOT_ONLY since it
                          should allow us to respect<br>
                          kernel IOCTL compatibility guidelines with the
                          most flexibility. Going<br>
                          from a restrictive to a more flexible
                          permission model would be<br>
                          inclusive, but going from a general to a
                          restrictive model may exclude<br>
                          some apps that used to work.<br>
                          <br>
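                          For illustration, the ROOT_ONLY gate could look
                          roughly like the sketch below. The priority enum
                          and helper name are placeholders for whatever
                          interface we settle on; capable() and
                          CAP_SYS_ADMIN are the standard kernel facilities
                          for a root-only check:<br>
                          <pre>
#include &lt;linux/capability.h&gt;
#include &lt;linux/errno.h&gt;

/* Placeholder priority levels for the proposed context flag. */
enum ctx_priority { CTX_PRIORITY_NORMAL, CTX_PRIORITY_HIGH };

/* Return 0 if the calling process may use the requested priority. */
static int ctx_priority_permit(enum ctx_priority prio)
{
    /* Anyone may use normal priority. */
    if (prio == CTX_PRIORITY_NORMAL)
        return 0;

    /* ROOT_ONLY: only a privileged process may request high priority. */
    if (capable(CAP_SYS_ADMIN))
        return 0;

    return -EPERM;
}
                          </pre>
                          <br>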
                          Regards,<br>
                          Andres<br>
                          <br>
                          On 12/22/2016 6:42 AM, Christian König wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex"> Hi Andres,<br>
                            <br>
                            Well, using root might cause stability and
                            security problems as well.<br>
                            We worked quite hard to avoid exactly this
                            for X.<br>
                            <br>
                            We could make this feature depend on the
                            compositor being DRM master,<br>
                            but for example with X the X server is
                            master (and e.g. can change<br>
                            resolutions etc..) and not the compositor.<br>
                            <br>
                            So another question is also what windowing
                            system (if any) are you<br>
                            planning to use? X, Wayland, Flinger or
                            something completely<br>
                            different ?<br>
                            <br>
                            Regards,<br>
                            Christian.<br>
                            <br>
                            On 20.12.2016 at 16:51, Andres
                            Rodriguez wrote:<br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> Hi
                              Christian,<br>
                              <br>
                              That is definitely a concern. What we are
                              currently thinking is to<br>
                              make the high priority queues accessible
                              to root only.<br>
                              <br>
                               Therefore, if a non-root user attempts to
                              set the high priority flag<br>
                              on context allocation, we would fail the
                               call and return EPERM.<br>
                              <br>
                              Regards,<br>
                              Andres<br>
                              <br>
                              <br>
                              On 12/20/2016 7:56 AM, Christian König
                              wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex"> BTW: If there
                                   is a non-VR application which will use
                                   the high-priority<br>
                                   h/w queue then the VR application will
                                  suffer.  Any ideas how<br>
                                  to solve it?<br>
                                </blockquote>
                                Yeah, that problem came to my mind as
                                well.<br>
                                <br>
                                Basically we need to restrict those high
                                priority submissions to<br>
                                the VR compositor or otherwise any
                                malfunctioning application could<br>
                                use it.<br>
                                <br>
                                Just think about some WebGL suddenly
                                taking all our rendering away<br>
                                and we won't get anything drawn any
                                more.<br>
                                <br>
                                Alex or Michel any ideas on that?<br>
                                <br>
                                Regards,<br>
                                Christian.<br>
                                <br>
                                 On 19.12.2016 at 15:48, Serguei
                                 Sagalovitch wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex"> > If
                                   the compute queue is occupied only by you,
                                   the efficiency<br>
                                   > is equal to setting the job queue
                                   to high priority, I think.<br>
                                  The only risk is the situation when
                                  graphics will take all<br>
                                  needed CUs. But in any case it should
                                   be a very good test.<br>
                                  <br>
                                  Andres/Pierre-Loup,<br>
                                  <br>
                                   Did you try to do it, or is it a lot of
                                  work for you?<br>
                                  <br>
                                  <br>
                                   BTW: If there is a non-VR application
                                   which will use the high-priority<br>
                                   h/w queue then the VR application will
                                  suffer.  Any ideas how<br>
                                  to solve it?<br>
                                  <br>
                                  Sincerely yours,<br>
                                  Serguei Sagalovitch<br>
                                  <br>
                                  On 2016-12-19 12:50 AM, zhoucm1 wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex"> Do you
                                     encounter the priority issue for
                                     the compute queue with<br>
                                     the current driver?<br>
                                    <br>
                                     If the compute queue is occupied only by
                                     you, the efficiency is equal<br>
                                     to setting the job queue to high
                                     priority, I think.<br>
                                    <br>
                                    Regards,<br>
                                    David Zhou<br>
                                    <br>
                                     On 2016-12-19 13:29, Andres
                                    Rodriguez wrote:<br>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex"> Yes,
                                      vulkan is available on all-open
                                      through the mesa radv UMD.<br>
                                      <br>
                                      I'm not sure if I'm asking for too
                                      much, but if we can<br>
                                      coordinate a similar interface in
                                      radv and amdgpu-pro at the<br>
                                      vulkan level that would be great.<br>
                                      <br>
                                      I'm not sure what that's going to
                                      be yet.<br>
                                      <br>
                                      - Andres<br>
                                      <br>
                                      On 12/19/2016 12:11 AM, zhoucm1
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex"> <br>
                                        <br>
                                         On 2016-12-19 11:33,
                                        Pierre-Loup A. Griffais wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex"> We're
                                          currently working with the
                                          open stack; I assume that a<br>
                                          mechanism could be exposed by
                                          both open and Pro Vulkan<br>
                                          userspace drivers and that the
                                          amdgpu kernel interface<br>
                                          improvements we would pursue
                                          following this discussion
                                          would<br>
                                          let both drivers take
                                          advantage of the feature,
                                          correct?<br>
                                        </blockquote>
                                        Of course.<br>
                                        Does open stack have Vulkan
                                        support?<br>
                                        <br>
                                        Regards,<br>
                                        David Zhou<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex"> <br>
                                          On 12/18/2016 07:26 PM,
                                          zhoucm1 wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex"> By
                                            the way, are you using
                                            all-open driver or
                                            amdgpu-pro<br>
                                            driver?<br>
                                            <br>
                                            +David Mao, who is working
                                            on our Vulkan driver.<br>
                                            <br>
                                            Regards,<br>
                                            David Zhou<br>
                                            <br>
                                             On 2016-12-18 06:05,
                                            Pierre-Loup A. Griffais
                                            wrote:<br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              Hi Serguei,<br>
                                              <br>
                                              I'm also working on
                                              bringing up our VR runtime
                                              on top of<br>
                                              amdgpu;<br>
                                              see replies inline.<br>
                                              <br>
                                              On 12/16/2016 09:05 PM,
                                              Sagalovitch, Serguei
                                              wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                Andres,<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                   For current VR
                                                  workloads we have 3
                                                  separate processes<br>
                                                  running<br>
                                                  actually:<br>
                                                </blockquote>
                                                So we could have a
                                                potential memory
                                                overcommit case, or do<br>
                                                you do<br>
                                                partitioning<br>
                                                on your own?  I would
                                                think that there is a need
                                                to avoid<br>
                                                overcommit in<br>
                                                the VR case to<br>
                                                prevent any BO
                                                migration.<br>
                                              </blockquote>
                                              <br>
                                              You're entirely correct;
                                              currently the VR runtime
                                              is<br>
                                              setting up<br>
                                              prioritized CPU scheduling
                                              for its VR compositor,
                                              we're<br>
                                              working on<br>
                                              prioritized GPU scheduling
                                              and pre-emption (eg. this<br>
                                              thread), and in<br>
                                              the future it will make
                                              sense to do work in order
                                              to make<br>
                                              sure that<br>
                                              its memory allocations do
                                              not get evicted, to
                                              prevent any<br>
                                              unwelcome<br>
                                              additional latency in the
                                              event of needing to
                                              perform<br>
                                              just-in-time<br>
                                              reprojection.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                BTW: Do you mean
                                                __real__ processes or
                                                threads?<br>
                                                Based on my
                                                understanding sharing
                                                BOs between different<br>
                                                processes<br>
                                                could introduce
                                                additional
                                                synchronization
                                                 constraints. BTW:<br>
                                                I am not<br>
                                                sure<br>
                                                if we are able to share
                                                 Vulkan sync objects
                                                 across the process<br>
                                                 boundary.<br>
                                              </blockquote>
                                              <br>
                                              They are different
                                              processes; it is important
                                              for the<br>
                                              compositor that<br>
                                              is responsible for
                                              quality-of-service
                                              features such as<br>
                                              consistently<br>
                                              presenting distorted
                                              frames with the right
                                              latency,<br>
                                              reprojection, etc,<br>
                                              to be separate from the
                                              main application.<br>
                                              <br>
                                              Currently we are using
                                              unreleased cross-process
                                              memory and<br>
                                              semaphore<br>
                                              extensions to fetch
                                              updated eye images from
                                              the client<br>
                                              application,<br>
                                              but the just-in-time
                                              reprojection discussed
                                              here does not<br>
                                              actually<br>
                                              have any direct
                                              interactions with
                                              cross-process resource<br>
                                              sharing,<br>
                                              since it's achieved by
                                              using whatever is the
                                              latest, most<br>
                                              up-to-date<br>
                                              eye images that have
                                              already been sent by the
                                              client<br>
                                              application,<br>
                                              which are already
                                              available to use without
                                              additional<br>
                                              synchronization.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                     3) System
                                                  compositor (we are
                                                  looking at approaches
                                                  to<br>
                                                  remove this<br>
                                                  overhead)<br>
                                                </blockquote>
                                                Yes,  IMHO the best is
                                                to run in  "full screen
                                                mode".<br>
                                              </blockquote>
                                              <br>
                                              Yes, we are working on
                                              mechanisms to present
                                              directly to the<br>
                                              headset<br>
                                              display without any
                                              intermediaries as a
                                              separate effort.<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                   The latency is our
                                                  main concern,<br>
                                                </blockquote>
                                                I would assume that this
                                                is the known problem (at
                                                least for<br>
                                                compute<br>
                                                usage).<br>
                                                 It looks like
                                                amdgpu / kernel
                                                submission is rather CPU<br>
                                                intensive<br>
                                                (at least<br>
                                                in the default
                                                configuration).<br>
                                              </blockquote>
                                              <br>
                                              As long as it's a
                                              consistent cost, it
                                               shouldn't be an issue.<br>
                                              However, if<br>
                                               there are high degrees of
                                              variance then that would
                                              be<br>
                                              troublesome and we<br>
                                              would need to account for
                                              the worst case.<br>
                                              <br>
                                              Hopefully the requirements
                                              and approach we described
                                              make<br>
                                              sense, we're<br>
                                              looking forward to your
                                              feedback and suggestions.<br>
                                              <br>
                                              Thanks!<br>
                                               - Pierre-Loup<br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                <br>
                                                Sincerely yours,<br>
                                                Serguei Sagalovitch<br>
                                                <br>
                                                <br>
                                                From: Andres Rodriguez
                                                <<a
                                                  moz-do-not-send="true"
href="mailto:andresr@valvesoftware.com" target="_blank">andresr@valvesoftware.com</a>><br>
                                                Sent: December 16, 2016
                                                10:00 PM<br>
                                                To: Sagalovitch,
                                                Serguei; <a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                                Subject: RE: [RFC]
                                                Mechanism for high
                                                priority scheduling<br>
                                                in amdgpu<br>
                                                <br>
                                                Hey Serguei,<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  [Serguei] No. I mean
                                                   pipe :-) as the MEC defines
                                                  it.  As far<br>
                                                  as I<br>
                                                  understand (by
                                                  simplifying)<br>
                                                  some scheduling is per
                                                  pipe.  I know about
                                                  the current<br>
                                                  allocation<br>
                                                  scheme but I do not
                                                  think<br>
                                                  that it is  ideal.  I
                                                  would assume that we
                                                  need to<br>
                                                  switch to<br>
                                                   dynamic partitioning<br>
                                                  of resources  based on
                                                  the workload otherwise
                                                  we will have<br>
                                                  resource<br>
                                                  conflict<br>
                                                  between Vulkan compute
                                                  and  OpenCL.<br>
                                                </blockquote>
                                                <br>
                                                I agree the partitioning
                                                isn't ideal. I'm hoping
                                                we can<br>
                                                start with a<br>
                                                solution that assumes
                                                that<br>
                                                only pipe0 has any work
                                                and the other pipes are
                                                idle (no<br>
                                                HSA/ROCm<br>
                                                running on the system).<br>
                                                <br>
                                                This should be more or
                                                less the use case we
                                                expect from VR<br>
                                                users.<br>
                                                <br>
                                                I agree the split is
                                                currently not ideal, but
                                                I'd like to<br>
                                                consider<br>
                                                that a separate task,
                                                because<br>
                                                making it dynamic is not
                                                 straightforward :P<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  [Serguei] Vulkan works
                                                  via amdgpu (kernel
                                                  submissions) so<br>
                                                  amdkfd<br>
                                                   will not be<br>
                                                  involved.  I would
                                                  assume that in the
                                                  case of VR we will<br>
                                                  have one main<br>
                                                  application ("console"
                                                  mode(?)) so we could
                                                   temporarily<br>
                                                  "ignore"<br>
                                                  OpenCL/ROCm needs when
                                                  VR is running.<br>
                                                </blockquote>
                                                <br>
                                                Correct, this is why we
                                                want to enable the high
                                                priority<br>
                                                compute<br>
                                                queue through<br>
                                                libdrm-amdgpu, so that
                                                we can expose it through
                                                Vulkan<br>
                                                later.<br>
                                                <br>
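                                                 As a sketch, the userspace
                                                 side could end up looking
                                                 something like the snippet
                                                 below; the priority-aware
                                                 context-create entry point
                                                 and the priority flag are
                                                 assumptions here, since the
                                                 exact libdrm-amdgpu
                                                 interface is still being
                                                 defined in this thread:<br>
                                                 <pre>
#include &lt;amdgpu.h&gt;
#include &lt;amdgpu_drm.h&gt;

/*
 * Hypothetical libdrm-amdgpu usage: create a GPU context that requests
 * high scheduling priority. A non-privileged caller would get -EPERM
 * back under the root-only gating discussed elsewhere in this thread.
 */
int create_high_priority_ctx(amdgpu_device_handle dev,
                             amdgpu_context_handle *ctx)
{
    return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, ctx);
}
                                                 </pre>
                                                 <br>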
                                                For current VR workloads
                                                we have 3 separate
                                                processes<br>
                                                running actually:<br>
                                                    1) Game process<br>
                                                    2) VR Compositor
                                                (this is the process
                                                that will require<br>
                                                high<br>
                                                priority queue)<br>
                                                    3) System compositor
                                                (we are looking at
                                                approaches to<br>
                                                remove this<br>
                                                overhead)<br>
                                                <br>
                                                For now I think it is
                                                okay to assume no
                                                OpenCL/ROCm running<br>
                                                simultaneously, but<br>
                                                I would also like to be
                                                able to address this
                                                case in the<br>
                                                future<br>
                                                (cross-pipe priorities).<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  [Serguei]  The problem
                                                  with pre-emption of
                                                   graphics tasks:<br>
                                                  (a) it<br>
                                                  may take time so<br>
                                                  latency may suffer<br>
                                                </blockquote>
                                                <br>
                                                The latency is our main
                                                concern, we want
                                                something that is<br>
                                                predictable. A good<br>
                                                illustration of what the
                                                reprojection scheduling
                                                looks like<br>
                                                can be<br>
                                                found here:<br>
                                                <a
                                                  moz-do-not-send="true"
href="https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png"
                                                  rel="noreferrer"
target="_blank">https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png</a><br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  (b) to preempt we need
                                                  to have different
                                                  "context" - we<br>
                                                  want<br>
                                                  to guarantee that
                                                  submissions from the
                                                  same context will<br>
                                                  be executed<br>
                                                  in order.<br>
                                                </blockquote>
                                                <br>
                                                This is okay, as the
                                                reprojection work
                                                doesn't have<br>
                                                dependencies on<br>
                                                the game context, and it<br>
                                                even happens in a
                                                separate process.<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  BTW: (a) Do you want
                                                  "preempt" and later
                                                  resume or do you<br>
                                                  want<br>
                                                  "preempt" and<br>
                                                  "cancel/abort"<br>
                                                </blockquote>
                                                <br>
                                                Preempt the game with
                                                the compositor task and
                                                then resume<br>
                                                it.<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  (b) Vulkan is generic
                                                  API and could be used
                                                  for graphics<br>
                                                  as well as<br>
                                                  for plain compute
                                                  tasks
                                                  (VK_QUEUE_COMPUTE_BIT).<br>
                                                </blockquote>
                                                <br>
                                                Yeah, the plan is to use
                                                vulkan compute. But if
                                                you figure<br>
                                                out a way<br>
                                                for us to get<br>
                                                a guaranteed execution
                                                time using vulkan
                                                graphics, then<br>
                                                I'll take you<br>
                                                out for a beer :)<br>
                                                <br>
                                                Regards,<br>
                                                Andres<br>
______________________________<wbr>__________<br>
                                                From: Sagalovitch,
                                                Serguei [<a
                                                  moz-do-not-send="true"
href="mailto:Serguei.Sagalovitch@amd.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:Serguei.Sagalovitch@amd.com">Serguei.Sagalovitch@amd.com</a></a>]<br>
                                                Sent: Friday, December
                                                16, 2016 9:13 PM<br>
                                                To: Andres Rodriguez; <a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></a><br>
                                                Subject: Re: [RFC]
                                                Mechanism for high
                                                priority scheduling<br>
                                                in amdgpu<br>
                                                <br>
                                                Hi Andres,<br>
                                                <br>
                                                Please see inline (as
                                                [Serguei])<br>
                                                <br>
                                                Sincerely yours,<br>
                                                Serguei Sagalovitch<br>
                                                <br>
                                                <br>
                                                From: Andres Rodriguez
                                                <<a
                                                  moz-do-not-send="true"
href="mailto:andresr@valvesoftware.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:andresr@valvesoftware.com">andresr@valvesoftware.com</a></a>><br>
                                                Sent: December 16, 2016
                                                8:29 PM<br>
                                                To: Sagalovitch,
                                                Serguei; <a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></a><br>
                                                Subject: RE: [RFC]
                                                Mechanism for high
                                                priority scheduling<br>
                                                in amdgpu<br>
                                                <br>
                                                Hi Serguei,<br>
                                                <br>
                                                Thanks for the feedback.
                                                Answers inline as [AR].<br>
                                                <br>
                                                Regards,<br>
                                                Andres<br>
                                                <br>
______________________________<wbr>__________<br>
                                                From: Sagalovitch,
                                                Serguei [<a
                                                  moz-do-not-send="true"
href="mailto:Serguei.Sagalovitch@amd.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:Serguei.Sagalovitch@amd.com">Serguei.Sagalovitch@amd.com</a></a>]<br>
                                                Sent: Friday, December
                                                16, 2016 8:15 PM<br>
                                                To: Andres Rodriguez; <a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></a><br>
                                                Subject: Re: [RFC]
                                                Mechanism for high
                                                priority scheduling<br>
                                                in amdgpu<br>
                                                <br>
                                                Andres,<br>
                                                <br>
                                                <br>
                                                Quick comments:<br>
                                                <br>
                                                1) To minimize "bubbles", etc. we need to "force" CU<br>
                                                assignments/binding to the high-priority queue when it will<br>
                                                be in use and "free" them later (we do not want to take CUs<br>
                                                away from e.g. the graphics task forever and degrade<br>
                                                graphics performance).<br>
                                                <br>
                                                Otherwise we could have a scenario where a long graphics<br>
                                                task (or low-priority compute) takes all the (extra) CUs and<br>
                                                the high-priority queue has to wait for the needed resources.<br>
                                                It will not be visible with "NOP" packets but only when you<br>
                                                submit a "real" compute task, so I would recommend not using<br>
                                                "NOP" packets at all for testing.<br>
                                                <br>
                                                It (CU assignment) could be done relatively easily when<br>
                                                everything goes via the kernel (e.g. as part of frame<br>
                                                submission), but I must admit that I am not sure about the<br>
                                                best way to handle user level submissions (amdkfd).<br>
                                                <br>
                                                [AR] I wasn't aware of
                                                this part of the
                                                programming<br>
                                                sequence. Thanks<br>
                                                for the heads up!<br>
                                                Is this similar to the
                                                CU masking programming?<br>
                                                [Serguei] Yes. To simplify: the problem is that the<br>
                                                "scheduler", when deciding which queue to run, will check if<br>
                                                there are enough resources, and if not it will begin to<br>
                                                check other queues with lower priority.<br>
                                                <br>
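                                                For illustration, a purely hypothetical sketch of what<br>
                                                per-queue CU masking amounts to at the MQD level (the fields<br>
                                                are only meant to mirror the compute MQD's per-SE static<br>
                                                thread management masks; the helper itself is made up and is<br>
                                                not actual driver code):<br>
                                                <span style="font-family:monospace,monospace">
/* hypothetical sketch -- kernel types (u32, bool) assumed available */<br>
struct cu_mask_sketch {<br>
        u32 se0;   /* CU enable bits for shader engine 0 */<br>
        u32 se1;   /* CU enable bits for shader engine 1 */<br>
};<br>
<br>
static void sketch_apply_cu_mask(struct cu_mask_sketch *m,<br>
                                 bool high_prio, u32 reserved)<br>
{<br>
        u32 full = 0xffffffff;<br>
<br>
        if (high_prio) {<br>
                /* the high priority queue may run on every CU */<br>
                m->se0 = full;<br>
                m->se1 = full;<br>
        } else {<br>
                /* regular queues stay off the reserved CUs */<br>
                m->se0 = full & ~reserved;<br>
                m->se1 = full & ~reserved;<br>
        }<br>
}<br>
</span><br>
                                                <br>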
                                                2) I would recommend dedicating a whole pipe to the<br>
                                                high-priority queue and having nothing there except it.<br>
                                                <br>
                                                [AR] I'm guessing in
                                                this context you mean
                                                pipe = queue?<br>
                                                (as opposed<br>
                                                to the MEC definition<br>
                                                of pipe, which is a
                                                grouping of queues). I
                                                say this because<br>
                                                amdgpu<br>
                                                only has access to 1
                                                pipe,<br>
                                                and the rest are
                                                statically partitioned
                                                for amdkfd usage.<br>
                                                <br>
                                                [Serguei] No. I mean pipe :-) as the MEC defines it. As far<br>
                                                as I understand (simplifying), some scheduling is per pipe.<br>
                                                I know about the current allocation scheme but I do not<br>
                                                think that it is ideal. I would assume that we need to<br>
                                                switch to dynamic partitioning of resources based on the<br>
                                                workload, otherwise we will have a resource conflict between<br>
                                                Vulkan compute and OpenCL.<br>
                                                <br>
                                                <br>
                                                BTW: Which user level
                                                API do you want to use
                                                for compute:<br>
                                                Vulkan or<br>
                                                OpenCL?<br>
                                                <br>
                                                [AR] Vulkan<br>
                                                <br>
                                                [Serguei] Vulkan works via amdgpu (kernel submissions) so<br>
                                                amdkfd will not be involved. I would assume that in the case<br>
                                                of VR we will have one main application ("console" mode(?)),<br>
                                                so we could temporarily "ignore" OpenCL/ROCm needs while VR<br>
                                                is running.<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                   we will not be able to provide a solution compatible with<br>
                                                   GFX workloads.<br>
                                                </blockquote>
                                                I assume that you are
                                                talking about graphics?
                                                Am I right?<br>
                                                <br>
                                                [AR] Yeah, my understanding is that pre-empting the<br>
                                                currently running graphics job and scheduling in something<br>
                                                else using mid-buffer pre-emption has some cases where it<br>
                                                doesn't work well. But if it starts working well with<br>
                                                Polaris10, it might be a better solution for us (because the<br>
                                                whole reprojection work uses the Vulkan graphics stack at<br>
                                                the moment, and porting it to compute is not trivial).<br>
                                                <br>
                                                [Serguei]  The problem
                                                with pre-emption of
                                                graphics task:<br>
                                                (a) it may<br>
                                                take time so<br>
                                                latency may suffer (b)
                                                to preempt we need to
                                                have different<br>
                                                "context"<br>
                                                - we want<br>
                                                to guarantee that
                                                submissions from the
                                                same context will be<br>
                                                executed<br>
                                                in order.<br>
                                                BTW: (a) Do you want 
                                                "preempt" and later
                                                resume or do you<br>
                                                want<br>
                                                "preempt" and<br>
                                                "cancel/abort"?  (b)
                                                Vulkan is generic API
                                                and could be used<br>
                                                for graphics as well as
                                                for plain compute tasks<br>
                                                (VK_QUEUE_COMPUTE_BIT).<br>
                                                <br>
                                                <br>
                                                Sincerely yours,<br>
                                                Serguei Sagalovitch<br>
                                                <br>
                                                <br>
                                                <br>
                                                From: amd-gfx <<a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx-bounces@lists.freedesktop.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:amd-gfx-bounces@lists.freedes">amd-gfx-bounces@lists.freedes</a><wbr>ktop.org</a>>
                                                on<br>
                                                behalf of<br>
                                                Andres Rodriguez <<a
                                                  moz-do-not-send="true"
href="mailto:andresr@valvesoftware.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:andresr@valvesoftware.com">andresr@valvesoftware.com</a></a>><br>
                                                Sent: December 16, 2016
                                                6:15 PM<br>
                                                To: <a
                                                  moz-do-not-send="true"
href="mailto:amd-gfx@lists.freedesktop.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></a><br>
                                                Subject: [RFC] Mechanism
                                                for high priority
                                                scheduling in<br>
                                                amdgpu<br>
                                                <br>
                                                Hi Everyone,<br>
                                                <br>
                                                This RFC is also
                                                available as a gist
                                                here:<br>
                                                <a
                                                  moz-do-not-send="true"
href="https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249"
                                                  rel="noreferrer"
                                                  target="_blank"><a class="moz-txt-link-freetext" href="https://gist.github.com/lostgo">https://gist.github.com/lostgo</a><wbr>at/7000432cd6864265dbc2c3ab932<wbr>04249</a><br>
                                                <br>
                                                <br>
                                                We are interested in
                                                feedback for a mechanism
                                                to<br>
                                                effectively schedule<br>
                                                high<br>
                                                priority VR reprojection
                                                tasks (also referred to
                                                as<br>
                                                time-warping) for<br>
                                                Polaris10<br>
                                                running on the amdgpu
                                                kernel driver.<br>
                                                <br>
                                                Brief context:<br>
                                                --------------<br>
                                                <br>
                                                The main objective of
                                                reprojection is to avoid
                                                motion<br>
                                                sickness for VR<br>
                                                users in<br>
                                                scenarios where the game
                                                or application would
                                                fail to finish<br>
                                                rendering a new<br>
                                                frame in time for the
                                                next VBLANK. When this
                                                happens, the<br>
                                                user's head<br>
                                                movements<br>
                                                are not reflected on the
                                                Head Mounted Display
                                                (HMD) for the<br>
                                                duration<br>
                                                of an<br>
                                                extra frame. This
                                                extended mismatch
                                                between the inner ear<br>
                                                and the<br>
                                                eyes may<br>
                                                cause the user to
                                                experience motion
                                                sickness.<br>
                                                <br>
                                                The VR compositor deals
                                                with this problem by
                                                fabricating a<br>
                                                new frame<br>
                                                using the<br>
                                                user's updated head
                                                position in combination
                                                with the<br>
                                                previous frames.<br>
                                                This<br>
                                                avoids a prolonged
                                                mismatch between the HMD
                                                output and the<br>
                                                inner ear.<br>
                                                <br>
                                                Because of the adverse effects on the user, we require high<br>
                                                confidence that the reprojection task will complete before<br>
                                                the VBLANK interval, even if the GFX pipe is currently full<br>
                                                of work from the game/application (which is most likely the<br>
                                                case).<br>
                                                <br>
                                                For more details and
                                                illustrations, please
                                                refer to the<br>
                                                following<br>
                                                document:<br>
                                                <a
                                                  moz-do-not-send="true"
href="https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved"
                                                  rel="noreferrer"
                                                  target="_blank"><a class="moz-txt-link-freetext" href="https://community.amd.com/comm">https://community.amd.com/comm</a><wbr>unity/gaming/blog/2016/03/28/<wbr>asynchronous-shaders-evolved</a><br>
                                                <br>
                                                <br>
                                                <br>
                                                Requirements:<br>
                                                -------------<br>
                                                <br>
                                                The mechanism must expose the following functionality:<br>
                                                <br>
                                                    * Job round trip
                                                time must be
                                                predictable, from<br>
                                                submission to<br>
                                                fence signal<br>
                                                <br>
                                                    * The mechanism must
                                                support compute
                                                workloads.<br>
                                                <br>
                                                Goals:<br>
                                                ------<br>
                                                <br>
                                                    * The mechanism
                                                should provide low
                                                submission latencies<br>
                                                <br>
                                                Test: submitting a NOP
                                                packet through the
                                                mechanism on busy<br>
                                                hardware<br>
                                                should<br>
                                                be equivalent to
                                                submitting a NOP on idle
                                                hardware.<br>
                                                <br>
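                                                As a sketch, the measurement is just a round trip timer<br>
                                                around the submission; submit_nop_ib() and wait_for_fence()<br>
                                                below are hypothetical stand-ins for the real submission and<br>
                                                fence wait calls:<br>
                                                <span style="font-family:monospace,monospace">
#include <time.h><br>
<br>
/* hypothetical wrappers around the real submission/fence path */<br>
extern void submit_nop_ib(void);<br>
extern void wait_for_fence(void);<br>
<br>
static double nop_round_trip_ms(void)<br>
{<br>
        struct timespec t0, t1;<br>
<br>
        clock_gettime(CLOCK_MONOTONIC, &t0);<br>
        submit_nop_ib();        /* queue a NOP packet */<br>
        wait_for_fence();       /* block until the fence signals */<br>
        clock_gettime(CLOCK_MONOTONIC, &t1);<br>
<br>
        return (t1.tv_sec - t0.tv_sec) * 1e3 +<br>
               (t1.tv_nsec - t0.tv_nsec) / 1e6;<br>
}<br>
<br>
/* the goal: this number should look the same on idle and busy HW */<br>
</span><br>
                                                <br>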
                                                Nice to have:<br>
                                                -------------<br>
                                                <br>
                                                    * The mechanism
                                                should also support GFX
                                                workloads.<br>
                                                <br>
                                                My understanding is that with the current hardware<br>
                                                capabilities in Polaris10 we will not be able to provide a<br>
                                                solution compatible with GFX workloads.<br>
                                                <br>
                                                But I would love to hear
                                                otherwise. So if anyone
                                                has an<br>
                                                idea,<br>
                                                approach or<br>
                                                suggestion that will
                                                also be compatible with
                                                the GFX ring,<br>
                                                please let<br>
                                                us know<br>
                                                about it.<br>
                                                <br>
                                                    * The above
                                                guarantees should also
                                                be respected by<br>
                                                amdkfd workloads<br>
                                                <br>
                                                Would be good to have
                                                for consistency, but not
                                                strictly<br>
                                                necessary as<br>
                                                users running<br>
                                                games are not
                                                traditionally running
                                                HPC workloads in the<br>
                                                background.<br>
                                                <br>
                                                Proposed approach:<br>
                                                ------------------<br>
                                                <br>
                                                Similar to the Windows driver, we could expose a high<br>
                                                priority compute queue to userspace.<br>
                                                <br>
                                                Submissions to this
                                                compute queue will be
                                                scheduled with<br>
                                                high<br>
                                                priority, and may<br>
                                                acquire hardware
                                                resources previously in
                                                use by other<br>
                                                queues.<br>
                                                <br>
                                                This can be achieved by
                                                taking advantage of the
                                                'priority'<br>
                                                field in<br>
                                                the HQDs<br>
                                                and could be programmed
                                                by amdgpu or the amdgpu
                                                scheduler.<br>
                                                The relevant<br>
                                                register fields are:<br>
                                                        *
                                                mmCP_HQD_PIPE_PRIORITY<br>
                                                        *
                                                mmCP_HQD_QUEUE_PRIORITY<br>
                                                <br>
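                                                For illustration only, programming these two fields for a<br>
                                                given queue might look roughly like this (the helper and the<br>
                                                priority values are made up; only the two register names<br>
                                                come from above, and the actual values need to be checked<br>
                                                against the register spec):<br>
                                                <span style="font-family:monospace,monospace">
/* sketch only -- not actual gfx_v8_0 code; assumes the usual amdgpu */<br>
/* register macros/headers, and that the target me/pipe/queue has    */<br>
/* already been selected                                              */<br>
static void sketch_set_hqd_priority(struct amdgpu_device *adev,<br>
                                    bool high_prio)<br>
{<br>
        /* illustrative values: higher number == higher priority */<br>
        u32 pipe_prio  = high_prio ? 2 : 0;<br>
        u32 queue_prio = high_prio ? 15 : 0;<br>
<br>
        WREG32(mmCP_HQD_PIPE_PRIORITY, pipe_prio);<br>
        WREG32(mmCP_HQD_QUEUE_PRIORITY, queue_prio);<br>
}<br>
</span><br>
                                                <br>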
                                                Implementation approach
                                                1 - static partitioning:<br>
------------------------------<wbr>------------------<br>
                                                <br>
                                                The amdgpu driver
                                                currently controls 8
                                                compute queues from<br>
                                                pipe0. We can<br>
                                                statically partition
                                                these as follows:<br>
                                                        * 7x regular<br>
                                                        * 1x high
                                                priority<br>
                                                <br>
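                                                To make the 7+1 split concrete, something along these lines<br>
                                                (all names made up):<br>
                                                <span style="font-family:monospace,monospace">
/* hypothetical partitioning of the 8 compute queues on pipe0 */<br>
#define SKETCH_NUM_COMPUTE_RINGS      8<br>
#define SKETCH_NUM_REGULAR_RINGS      7<br>
#define SKETCH_HIGH_PRIO_RING_INDEX   7   /* last ring reserved */<br>
<br>
static int sketch_ring_is_high_prio(int ring)<br>
{<br>
        return ring == SKETCH_HIGH_PRIO_RING_INDEX;<br>
}<br>
</span><br>
                                                <br>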
                                                The relevant priorities
                                                can be set so that
                                                submissions to<br>
                                                the high<br>
                                                priority<br>
                                                ring will starve the
                                                other compute rings and
                                                the GFX ring.<br>
                                                <br>
                                                The amdgpu scheduler will only place jobs into the high<br>
                                                priority rings if the context is marked as high priority,<br>
                                                and a corresponding priority level should be added to keep<br>
                                                track of this information (see the sketch below):<br>
                                                     *
                                                AMD_SCHED_PRIORITY_KERNEL<br>
                                                     * ->
                                                AMD_SCHED_PRIORITY_HIGH<br>
                                                     *
                                                AMD_SCHED_PRIORITY_NORMAL<br>
                                                <br>
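                                                A rough sketch of that enum, listed from lowest to highest<br>
                                                (final ordering and values are still to be decided):<br>
                                                <span style="font-family:monospace,monospace">
/* sketch of the extended priority levels for the amd SW scheduler */<br>
enum amd_sched_priority {<br>
        AMD_SCHED_PRIORITY_NORMAL = 0,<br>
        AMD_SCHED_PRIORITY_HIGH,        /* new: high priority contexts */<br>
        AMD_SCHED_PRIORITY_KERNEL,      /* highest: kernel submissions */<br>
        AMD_SCHED_PRIORITY_MAX<br>
};<br>
</span><br>
                                                <br>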
                                                The user will request a
                                                high priority context by
                                                setting an<br>
                                                appropriate flag<br>
                                                in drm_amdgpu_ctx_in
                                                (AMDGPU_CTX_HIGH_PRIORITY
                                                or similar):<br>
                                                <a
                                                  moz-do-not-send="true"
href="https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163"
                                                  rel="noreferrer"
                                                  target="_blank"><a class="moz-txt-link-freetext" href="https://github.com/torvalds/li">https://github.com/torvalds/li</a><wbr>nux/blob/master/include/uapi/<wbr>drm/amdgpu_drm.h#L163</a><br>
                                                <br>
                                                <br>
                                                <br>
                                                <br>
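                                                From the userspace side this would look something like the<br>
                                                following (the flag name and value are placeholders; picking<br>
                                                the real one is part of this RFC):<br>
                                                <span style="font-family:monospace,monospace">
#include <xf86drm.h><br>
#include <amdgpu_drm.h><br>
<br>
/* placeholder flag -- the real name/value is what this RFC is about */<br>
#define AMDGPU_CTX_FLAGS_HIGH_PRIORITY (1u << 0)<br>
<br>
static int sketch_alloc_high_prio_ctx(int fd, unsigned *ctx_id)<br>
{<br>
        union drm_amdgpu_ctx ctx = {};<br>
        int r;<br>
<br>
        ctx.in.op    = AMDGPU_CTX_OP_ALLOC_CTX;<br>
        ctx.in.flags = AMDGPU_CTX_FLAGS_HIGH_PRIORITY;<br>
<br>
        r = drmCommandWriteRead(fd, DRM_AMDGPU_CTX, &ctx, sizeof(ctx));<br>
        if (r == 0)<br>
                *ctx_id = ctx.out.alloc.ctx_id;<br>
        return r;<br>
}<br>
</span><br>
                                                <br>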
                                                The setting is at a per-context level so that we can:<br>
                                                    * Maintain a
                                                consistent FIFO ordering
                                                of all<br>
                                                submissions to a<br>
                                                context<br>
                                                    * Create high
                                                priority and non-high
                                                priority contexts<br>
                                                in the same<br>
                                                process<br>
                                                <br>
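                                                On the kernel side this just means translating the flag into<br>
                                                a scheduler priority at context creation time, e.g. (sketch;<br>
                                                it reuses the placeholder flag and enum from the sketches<br>
                                                above):<br>
                                                <span style="font-family:monospace,monospace">
/* sketch: map the userspace flag onto a scheduler priority */<br>
static enum amd_sched_priority sketch_ctx_priority(u32 flags)<br>
{<br>
        if (flags & AMDGPU_CTX_FLAGS_HIGH_PRIORITY)  /* placeholder */<br>
                return AMD_SCHED_PRIORITY_HIGH;<br>
        return AMD_SCHED_PRIORITY_NORMAL;<br>
}<br>
</span><br>
                                                <br>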
                                                Implementation approach
                                                2 - dynamic priority
                                                programming:<br>
------------------------------<wbr>---------------------------<br>
                                                <br>
                                                Similar to the above, but instead of programming the<br>
                                                priorities at amdgpu_init() time, the SW scheduler will<br>
                                                reprogram the queue priorities dynamically when scheduling a<br>
                                                task.<br>
                                                <br>
                                                This would involve having a hardware-specific callback from<br>
                                                the scheduler to set the appropriate queue priority:<br>
                                                set_priority(int ring, int index, int priority)<br>
                                                <br>
                                                During this callback we
                                                would have to grab the
                                                SRBM mutex<br>
                                                to perform<br>
                                                the appropriate<br>
                                                HW programming, and I'm
                                                not really sure if that
                                                is<br>
                                                something we<br>
                                                should be doing from<br>
                                                the scheduler.<br>
                                                <br>
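                                                As a rough sketch of that callback (naming, locking details<br>
                                                and the ring-to-queue mapping are approximate; this is not<br>
                                                actual driver code):<br>
                                                <span style="font-family:monospace,monospace">
/* sketch only -- error handling and HQD activation details omitted */<br>
static void sketch_set_priority(struct amdgpu_device *adev,<br>
                                struct amdgpu_ring *ring, int priority)<br>
{<br>
        mutex_lock(&adev->srbm_mutex);<br>
        vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);<br>
<br>
        WREG32(mmCP_HQD_QUEUE_PRIORITY, priority);<br>
<br>
        vi_srbm_select(adev, 0, 0, 0, 0);<br>
        mutex_unlock(&adev->srbm_mutex);<br>
}<br>
</span><br>
                                                <br>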
                                                On the positive side, this approach would allow us to<br>
                                                program a range of priorities for jobs instead of a single<br>
                                                "high priority" value, achieving something similar to the<br>
                                                niceness API available for CPU scheduling.<br>
                                                <br>
                                                I'm not sure if this
                                                flexibility is something
                                                that we would<br>
                                                need for<br>
                                                our use<br>
                                                case, but it might be
                                                useful in other
                                                scenarios (multiple<br>
                                                users<br>
                                                sharing compute<br>
                                                time on a server).<br>
                                                <br>
                                                This approach would
                                                require a new int field
                                                in<br>
                                                drm_amdgpu_ctx_in, or<br>
                                                repurposing<br>
                                                of the flags field.<br>
                                                <br>
                                                Known current obstacles:<br>
                                                ------------------------<br>
                                                <br>
                                                The SQ is currently
                                                programmed to disregard
                                                the HQD<br>
                                                priorities, and<br>
                                                instead it picks<br>
                                                jobs at random. Settings
                                                from the shader itself
                                                are also<br>
                                                disregarded<br>
                                                as this is<br>
                                                considered a privileged
                                                field.<br>
                                                <br>
                                                Effectively we can get
                                                our compute wavefront
                                                launched ASAP,<br>
                                                but we<br>
                                                might not get the<br>
                                                time we need on the SQ.<br>
                                                <br>
                                                The current programming
                                                would have to be changed
                                                to allow<br>
                                                priority<br>
                                                propagation<br>
                                                from the HQD into the
                                                SQ.<br>
                                                <br>
Generic approach for all HW IPs:<br>
--------------------------------<br>
<br>
For consistency purposes, the high priority context can be enabled for<br>
all HW IPs with support of the SW scheduler. This will function<br>
similarly to the current AMD_SCHED_PRIORITY_KERNEL priority, where the<br>
job can jump ahead of anything not committed to the HW queue.<br>
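<br>
As a sketch of what that could mean on the scheduler side: assuming the<br>
UAPI priority above, and assuming a new AMD_SCHED_PRIORITY_HIGH level is<br>
added to enum amd_sched_priority between NORMAL and KERNEL (the new<br>
level, the helper below and the exact existing enum spelling are all<br>
assumptions on my part, to be checked against gpu_scheduler.h), the<br>
context code could translate one into the other along these lines:<br>
<br>
/* Sketch only: map the requested UAPI priority onto a SW scheduler<br>
 * priority.  AMDGPU_CTX_PRIORITY_* and AMD_SCHED_PRIORITY_HIGH are the<br>
 * hypothetical names introduced above.<br>
 */<br>
static enum amd_sched_priority<br>
amdgpu_to_sched_priority(__s32 ctx_prio)<br>
{<br>
        switch (ctx_prio) {<br>
        case AMDGPU_CTX_PRIORITY_HIGH:<br>
                return AMD_SCHED_PRIORITY_HIGH;<br>
        case AMDGPU_CTX_PRIORITY_NORMAL:<br>
        default:<br>
                return AMD_SCHED_PRIORITY_NORMAL;<br>
        }<br>
}<br>
<br>
amdgpu_ctx_init() could then attach each context entity to something<br>
like ring->sched.sched_rq[prio] instead of always using the NORMAL run<br>
queue; the idea is that the SW scheduler services the higher run queues<br>
first, which is what gives the queue-jumping behaviour described above.<br>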
                                                <br>
The benefits of requesting a high priority context for a non-compute<br>
queue will be smaller (e.g. up to 10s of wait time if a GFX command is<br>
stuck in front of you), but having the API in place will allow us to<br>
easily improve the implementation in the future as new features become<br>
available in new hardware.<br>
                                                <br>
Future steps:<br>
-------------<br>
<br>
Once we have an approach settled, I can take care of the implementation.<br>
<br>
Also, once the interface is mostly decided, we can start thinking about<br>
exposing the high priority queue through radv.<br>
                                                <br>
Request for feedback:<br>
---------------------<br>
<br>
We aren't married to any of the approaches outlined above. Our goal is<br>
to obtain a mechanism that will allow us to complete the reprojection<br>
job within a predictable amount of time. So if anyone has any<br>
suggestions for improvements or alternative strategies, we are more<br>
than happy to hear them.<br>
<br>
Also, if any of the technical information above is incorrect, feel free<br>
to point out my misunderstandings.<br>
<br>
Looking forward to hearing from you.<br>
<br>
Regards,<br>
Andres<br>
                                                <br>
                                              </blockquote>
                                              <br>
                                            </blockquote>
                                            <br>
                                          </blockquote>
                                          <br>
                                        </blockquote>
                                        <br>
                                      </blockquote>
                                      <br>
                                    </blockquote>
                                    <br>
                                  </blockquote>
                                  <br>
                                  Sincerely yours,<br>
                                  Serguei Sagalovitch<br>
                                  <br>
_______________________________________________<br>
                                  amd-gfx mailing list<br>
                                  <a moz-do-not-send="true"
                                    href="mailto:amd-gfx@lists.freedesktop.org"
                                    target="_blank">amd-gfx@lists.freedesktop.org</a><br>
                                  <a moz-do-not-send="true"
                                    href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx"
                                    rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/amd-gfx</a><br>
                                </blockquote>
                                <br>
                                <br>
                              </blockquote>
                              <br>
                            </blockquote>
                            <br>
                          </blockquote>
                          <br>
                        </blockquote>
                        <br>
                        Sincerely yours,<br>
                        Serguei Sagalovitch<br>
                        <br>
                      </blockquote>
                      <br>
                    </blockquote>
                    <br>
                  </blockquote>
                  <br>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </blockquote>
      <br>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>