[RFC] Mechanism for high priority scheduling in amdgpu
zhoucm1
david1.zhou at amd.com
Mon Dec 26 02:26:39 UTC 2016
Nice experiment, which is exactly what the SW scheduler can provide.
And as you said: "I.e. your context can be scheduled into the
HW queue ahead of any other context, but everything already committed
to the HW queue is executed in strict FIFO order."
If you want to keep latency consistent, you will need to enable the HW
priority queue feature.

Regards,
David Zhou

On December 24, 2016 at 06:20, Andres Rodriguez wrote:
> Hey John,
>
> I've collected a bit of data using high priority SW scheduler queues,
> thought you might be interested.
>
> Implementation as per the patch above.
>
> Control test 1
> ==============
>
> Sascha Willems mesh sample running on its own at regular priority
>
> Results
> -------
>
> Mesh: ~0.14ms per-frame latency
>
> Control test 2
> ==============
>
> Two Sascha Willems mesh samples running simultaneously at regular priority
>
> Results
> -------
>
> Mesh 1: ~0.26ms per-frame latency
> Mesh 2: ~0.26ms per-frame latency
>
> Test 1
> ======
>
> Two Sascha Willems mesh samples running simultaneously. One at high
> priority and the other running in a regular priority graphics context.
>
> Results
> -------
>
> Mesh High: 0.14 - 0.24ms per-frame latency
> Mesh Regular: 0.24 - 0.40ms per-frame latency
>
> Test 2
> ======
>
> Ten Sascha Willems mesh samples running simultaneously. One at high
> priority and the others running in a regular priority graphics context.
>
> Results
> -------
>
> Mesh High: 0.14 - 0.8ms per-frame latency
> Mesh Regular: 1.10 - 2.05ms per-frame latency
>
> Test 3
> ======
>
> Two Sascha Willems mesh samples running simultaneously. One at high
> priority and the other running in a regular priority graphics context.
>
> Also running Unigine Heaven at the Extreme preset @ 2560x1600
>
> Results
> -------
>
> Mesh High: 7 - 100ms per-frame latency (lots of fluctuation)
> Mesh Regular: 40 - 130ms per-frame latency (lots of fluctuation)
> Unigine Heaven: 20-40 fps
>
>
> Test 4
> ======
>
> Two Sascha Willems mesh samples running simultaneously. One at high
> priority and the other running in a regular priority graphics context.
>
> Also running Talos Principle @ 4K
>
> Results
> -------
>
> Mesh High: 0.14 - 3.97ms per-frame latency (mostly hovers around ~0.4ms)
> Mesh Regular: 0.43 - 8.11ms per-frame latency (lots of fluctuation)
> Talos: 24.8 fps AVG
>
> Observations
> ============
>
> The high priority queue based on the SW scheduler provides significant
> gains when paired with tasks that submit short duration commands into
> the queue. This can be observed in tests 1 and 2.
>
> When the pipe is full of long running commands, the effects are dampened.
> As observed in test 3, the per-frame latency suffers very large spikes,
> and the latencies are very inconsistent.
>
> Talos seems to be a better behaved game. It may be submitting shorter
> draw commands and the SW scheduler is able to interleave the rest of
> the work.
>
> The results seem consistent with the hypothetical advantages the SW
> scheduler should provide. I.e. your context can be scheduled into the
> HW queue ahead of any other context, but everything already committed
> to the HW queue is executed in strict FIFO order.
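>
> To make that concrete, the behavior I'm relying on boils down to
> something like the following sketch (paraphrased, not the actual
> gpu_scheduler code; rq_select() is an illustrative helper):
>
>     /* Paraphrased: the SW scheduler drains the highest priority
>      * runqueue first, but jobs already on the HW ring remain FIFO. */
>     static struct amd_sched_entity *
>     select_entity(struct amd_gpu_scheduler *sched)
>     {
>             int i;
>
>             /* assumed: index 0 is the highest priority runqueue */
>             for (i = 0; i < AMD_SCHED_MAX_PRIORITY; i++) {
>                     struct amd_sched_entity *e =
>                             rq_select(&sched->sched_rq[i]);
>                     if (e)
>                             return e; /* jumps ahead of lower priority */
>             }
>             return NULL;
>     }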
>
> In order to deal with cases similar to Test 3, we will need to take
> advantage of further features.
>
> Notes
> =====
>
> - Tests were run multiple times, and reboots were performed during tests.
> - The mesh sample isn't really designed for benchmarking, but it should
>   be decent for ballpark figures.
> - The high priority mesh app was run with default niceness and also with
>   niceness at -20. This had no effect on the results, so it was not added
>   above.
> - CPU usage was not saturated while running the tests.
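>
> (For anyone reproducing this: one way to gather comparable per-frame
> GPU-side numbers in a Vulkan sample is with timestamp queries, sketched
> below. timestamp_period is VkPhysicalDeviceLimits::timestampPeriod and
> the query pool setup is elided.)
>
>     /* Sketch: bracket the frame's work with two GPU timestamps. */
>     vkCmdResetQueryPool(cmd, query_pool, 0, 2);
>     vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>                         query_pool, 0);
>     /* ... record the frame's draw commands ... */
>     vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
>                         query_pool, 1);
>
>     /* After the submission's fence signals: */
>     uint64_t ts[2];
>     vkGetQueryPoolResults(device, query_pool, 0, 2, sizeof(ts), ts,
>                           sizeof(uint64_t), VK_QUERY_RESULT_64_BIT);
>     double frame_ms = (ts[1] - ts[0]) * timestamp_period * 1e-6;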
>
> Regards,
> Andres
>
>
> On Fri, Dec 23, 2016 at 1:18 PM, Pierre-Loup A. Griffais
> <pgriffais at valvesoftware.com> wrote:
>
> I hate to keep bringing up display topics in an unrelated conversation,
> but I'm not sure where you got "Application -> X server -> compositor ->
> X server" from. As I was saying before, we need to be presenting
> directly to the HMD display, as no display server can be in the way,
> both for latency and for quality of service reasons (a buggy
> application cannot be allowed to accidentally display undistorted
> rendering into the HMD). We intend to do the necessary work for this,
> and the extent of X's (or a Wayland implementation's, or any other
> display server's) involvement will be to participate enough to know
> that the HMD display is off-limits. If you have more questions on the
> display aspect, or VR rendering in general, I'm happy to try to address
> them out-of-band from this conversation.
>
>
> On 12/23/2016 02:54 AM, Christian König wrote:
>
> > But yes, in general you don't want another compositor in the way, so
> > we'll be acquiring the HMD display directly, separate from any
> > desktop or display server.
>
> Assuming that the HMD is attached to the rendering device in some way,
> you have the X server and the compositor which both try to be DRM
> master at the same time.
>
> Please correct me if that was fixed in the meantime, but that sounds
> like it will simply not work. Or is this what Andres mentioned below
> that Dave is working on?
>
> Additionally, a compositor in combination with X is a bit
> counterproductive when you want to keep the latency low.
>
> E.g. the "normal" flow of a GL or Vulkan surface filled with rendered
> data to be displayed is from the Application -> X server -> compositor
> -> X server.
>
> The extra step between X server and compositor just means extra
> latency, and for this use case you probably don't want that.
>
> Targeting something like Wayland, with XWayland when you need X
> compatibility, sounds like the much better idea.
>
> Regards,
> Christian.
>
> On 22.12.2016 at 20:54, Pierre-Loup A. Griffais wrote:
>
> Display concerns are a separate issue, and as Andres said we have
> other plans to address them. But yes, in general you don't want
> another compositor in the way, so we'll be acquiring the HMD display
> directly, separate from any desktop or display server. Same with
> security; we can have a separate conversation about that when the
> time comes.
>
> On 12/22/2016 08:41 AM, Serguei Sagalovitch wrote:
>
> Andres,
>
> Did you measure latency, etc. impact of __any__ compositor?
>
> My understanding is that VR has pretty strict requirements related to
> QoS.
>
> Sincerely yours,
> Serguei Sagalovitch
>
>
> On 2016-12-22 11:35 AM, Andres Rodriguez wrote:
>
> Hey Christian,
>
> We are currently interested in X, but with some distros switching to
> other compositors by default, we also need to consider those.
>
> We agree, running the full vrcompositor as root isn't something that
> we want to do. Too many security concerns. Having a small root helper
> that does the privilege escalation for us is the initial idea.
>
> For a long term approach, Pierre-Loup and Dave are working on dealing
> with the "two compositors" scenario a little better in DRM+X.
> Fullscreen isn't really a sufficient approach, since we don't want the
> HMD to be used as part of the Desktop environment when a VR app is not
> in use (this is extremely annoying).
>
> When the above is settled, we should have an auth mechanism besides
> DRM_MASTER or DRM_AUTH that allows the vrcompositor to take over the
> HMD permanently away from X. Re-using that auth method to gate this
> IOCTL is probably going to be the final solution.
>
> I propose to start with ROOT_ONLY since it should allow us to respect
> kernel IOCTL compatibility guidelines with the most flexibility. Going
> from a restrictive to a more flexible permission model would be
> inclusive, but going from a general to a restrictive model may exclude
> some apps that used to work.
>
> Regards,
> Andres
>
> On 12/22/2016 6:42 AM, Christian König wrote:
>
> Hi Andres,
>
> well using root might cause stability and security problems as well.
> We worked quite hard to avoid exactly this for X.
>
> We could make this feature depend on the compositor being DRM master,
> but for example with X the X server is master (and e.g. can change
> resolutions etc.) and not the compositor.
>
> So another question is also what windowing system (if any) are you
> planning to use? X, Wayland, Flinger or something completely
> different?
>
> Regards,
> Christian.
>
> On 20.12.2016 at 16:51, Andres Rodriguez wrote:
>
> Hi Christian,
>
> That is definitely a concern. What we are currently thinking is to
> make the high priority queues accessible to root only.
>
> Therefore, if a non-root user attempts to set the high priority flag
> on context allocation, we would fail the call and return -EPERM.
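>
> As a sketch, the check could live in the context alloc path along
> these lines (flag and capability names are placeholders following the
> RFC's AMDGPU_CTX_HIGH_PRIORITY suggestion):
>
>     /* Sketch: reject non-root requests for a high priority context. */
>     static int amdgpu_ctx_priority_permitted(uint32_t flags)
>     {
>             if (!(flags & AMDGPU_CTX_HIGH_PRIORITY))
>                     return 0;
>             if (capable(CAP_SYS_ADMIN))
>                     return 0;
>             return -EPERM;
>     }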
>
> Regards,
> Andres
>
>
> On 12/20/2016 7:56 AM, Christian König wrote:
>
> > BTW: If there is non-VR application which will use high-priority
> > h/w queue then VR application will suffer. Any ideas how to solve it?
>
> Yeah, that problem came to my mind as well.
>
> Basically we need to restrict those high priority submissions to the
> VR compositor, or otherwise any malfunctioning application could use
> it.
>
> Just think about some WebGL suddenly taking all our rendering away and
> we won't get anything drawn any more.
>
> Alex or Michel, any ideas on that?
>
> Regards,
> Christian.
>
> On 19.12.2016 at 15:48, Serguei Sagalovitch wrote:
>
> > > If compute queue is occupied only by you, the efficiency
> > > is equal with setting job queue to high priority I think.
>
> The only risk is the situation when graphics will take all needed
> CUs. But in any case it should be a very good test.
>
> Andres/Pierre-Loup,
>
> Did you try to do it, or is it a lot of work for you?
>
> BTW: If there is a non-VR application which will use the high-priority
> h/w queue then the VR application will suffer. Any ideas how to solve
> it?
>
> Sincerely yours,
> Serguei Sagalovitch
>
> On 2016-12-19 12:50 AM, zhoucm1 wrote:
>
> Do you encounter the priority issue for the compute queue with the
> current driver?
>
> If the compute queue is occupied only by you, the efficiency is equal
> with setting the job queue to high priority, I think.
>
> Regards,
> David Zhou
>
> On December 19, 2016 at 13:29, Andres Rodriguez wrote:
>
> Yes, vulkan is available on all-open through the mesa radv UMD.
>
> I'm not sure if I'm asking for too much, but if we can coordinate a
> similar interface in radv and amdgpu-pro at the vulkan level, that
> would be great.
>
> I'm not sure what that's going to be yet.
>
> - Andres
>
> On 12/19/2016 12:11 AM, zhoucm1 wrote:
>
> On December 19, 2016 at 11:33, Pierre-Loup A. Griffais wrote:
>
> > We're currently working with the open stack; I assume that a
> > mechanism could be exposed by both open and Pro Vulkan userspace
> > drivers, and that the amdgpu kernel interface improvements we would
> > pursue following this discussion would let both drivers take
> > advantage of the feature, correct?
>
> Of course.
> Does the open stack have Vulkan support?
>
> Regards,
> David Zhou
>
>
> On 12/18/2016 07:26 PM, zhoucm1 wrote:
>
> By the way, are you using the all-open driver or the amdgpu-pro
> driver?
>
> +David Mao, who is working on our Vulkan driver.
>
> Regards,
> David Zhou
>
> On December 18, 2016 at 06:05, Pierre-Loup A. Griffais wrote:
>
> Hi Serguei,
>
> I'm also working on bringing up our VR runtime on top of amdgpu; see
> replies inline.
>
> On 12/16/2016 09:05 PM, Sagalovitch, Serguei wrote:
>
> > Andres,
> >
> > > For current VR workloads we have 3 separate processes running
> > > actually:
> >
> > So we could have a potential memory overcommit case, or do you do
> > partitioning on your own? I would think that there is a need to
> > avoid overcommit in the VR case to prevent any BO migration.
>
> You're entirely correct; currently the VR runtime is setting up
> prioritized CPU scheduling for its VR compositor, we're working on
> prioritized GPU scheduling and pre-emption (e.g. this thread), and in
> the future it will make sense to do work in order to make sure that
> its memory allocations do not get evicted, to prevent any unwelcome
> additional latency in the event of needing to perform just-in-time
> reprojection.
>
> > BTW: Do you mean __real__ processes or threads? Based on my
> > understanding, sharing BOs between different processes could
> > introduce additional synchronization constraints. btw: I am not
> > sure if we are able to share Vulkan sync. objects across the
> > process boundary.
>
> They are different processes; it is important for the compositor that
> is responsible for quality-of-service features such as consistently
> presenting distorted frames with the right latency, reprojection,
> etc., to be separate from the main application.
>
> Currently we are using unreleased cross-process memory and semaphore
> extensions to fetch updated eye images from the client application,
> but the just-in-time reprojection discussed here does not actually
> have any direct interactions with cross-process resource sharing,
> since it's achieved by using whatever are the latest, most up-to-date
> eye images that have already been sent by the client application,
> which are already available to use without additional
> synchronization.
>
>
>
> > > 3) System compositor (we are looking at approaches to remove
> > > this overhead)
> >
> > Yes, IMHO the best is to run in "full screen mode".
>
> Yes, we are working on mechanisms to present directly to the headset
> display without any intermediaries as a separate effort.
>
>
> > > The latency is our main concern,
> >
> > I would assume that this is the known problem (at least for compute
> > usage). It looks like amdgpu / kernel submission is rather CPU
> > intensive (at least in the default configuration).
>
> As long as it's a consistent cost, it shouldn't be an issue. However,
> if there are high degrees of variance then that would be troublesome
> and we would need to account for the worst case.
>
> Hopefully the requirements and approach we described make sense;
> we're looking forward to your feedback and suggestions.
>
> Thanks!
>  - Pierre-Loup
>
>
> > Sincerely yours,
> > Serguei Sagalovitch
>
>
> From: Andres Rodriguez <andresr at valvesoftware.com>
> Sent: December 16, 2016 10:00 PM
> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>
> Hey Serguei,
>
> [Serguei] No. I mean pipe :-) as MEC defines it. As far as I
> understand (by simplifying) some scheduling is per pipe. I know about
> the current allocation scheme, but I do not think that it is ideal. I
> would assume that we need to switch to dynamic partitioning of
> resources based on the workload, otherwise we will have resource
> conflicts between Vulkan compute and OpenCL.
>
> I agree the partitioning isn't ideal. I'm hoping we can start with a
> solution that assumes that only pipe0 has any work and the other
> pipes are idle (no HSA/ROCm running on the system).
>
> This should be more or less the use case we expect from VR users.
>
> I agree the split is currently not ideal, but I'd like to consider
> that a separate task, because making it dynamic is not
> straightforward :P
>
> [Serguei] Vulkan works via amdgpu (kernel submissions) so amdkfd will
> not be involved. I would assume that in the case of VR we will have
> one main application ("console" mode(?)) so we could temporarily
> "ignore" OpenCL/ROCm needs when VR is running.
>
> Correct, this is why we want to enable the high priority compute
> queue through libdrm-amdgpu, so that we can expose it through Vulkan
> later.
>
> For current VR workloads we have 3 separate processes running
> actually:
>     1) Game process
>     2) VR Compositor (this is the process that will require the high
>        priority queue)
>     3) System compositor (we are looking at approaches to remove this
>        overhead)
>
> For now I think it is okay to assume no OpenCL/ROCm running
> simultaneously, but I would also like to be able to address this case
> in the future (cross-pipe priorities).
>
> [Serguei] The problem with pre-emption of graphics task:
> (a) it may take time so latency may suffer
>
> The latency is our main concern, we want something that is
> predictable. A good illustration of what the reprojection scheduling
> looks like can be found here:
> https://community.amd.com/servlet/JiveServlet/showImage/38-1310-104754/pastedImage_3.png
>
> (b) to preempt we need to have different "context" - we want to
> guarantee that submissions from the same context will be executed in
> order.
>
> This is okay, as the reprojection work doesn't have dependencies on
> the game context, and it even happens in a separate process.
>
> BTW: (a) Do you want "preempt" and later resume or do you want
> "preempt" and "cancel/abort"?
>
> Preempt the game with the compositor task and then resume it.
>
> (b) Vulkan is a generic API and could be used for graphics as well as
> for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>
> Yeah, the plan is to use vulkan compute. But if you figure out a way
> for us to get a guaranteed execution time using vulkan graphics, then
> I'll take you out for a beer :)
>
> Regards,
> Andres
> ________________________________________
> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
> Sent: Friday, December 16, 2016 9:13 PM
> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>
> Hi Andres,
>
> Please see inline (as [Serguei])
>
> Sincerely yours,
> Serguei Sagalovitch
>
>
> From: Andres Rodriguez <andresr at valvesoftware.com>
> Sent: December 16, 2016 8:29 PM
> To: Sagalovitch, Serguei; amd-gfx at lists.freedesktop.org
> Subject: RE: [RFC] Mechanism for high priority scheduling in amdgpu
>
> Hi Serguei,
>
> Thanks for the feedback. Answers inline as [AR].
>
> Regards,
> Andres
>
> ________________________________________
> From: Sagalovitch, Serguei [Serguei.Sagalovitch at amd.com]
> Sent: Friday, December 16, 2016 8:15 PM
> To: Andres Rodriguez; amd-gfx at lists.freedesktop.org
> Subject: Re: [RFC] Mechanism for high priority scheduling in amdgpu
>
> Andres,
>
> Quick comments:
>
> 1) To minimize "bubbles", etc. we need to "force" CU
> assignments/binding to the high-priority queue when it will be in use
> and "free" them later (we do not want to take CUs away from e.g. a
> graphics task forever and degrade graphics performance).
>
> Otherwise we could have a scenario where a long graphics task (or
> low-priority compute) takes all (extra) CUs and the high-priority
> work waits for the needed resources. It will not be visible with
> "NOP" packets but only when you submit a "real" compute task, so I
> would recommend not using "NOP" packets at all for testing.
>
> It (CU assignment) could be done relatively easily when everything is
> going via the kernel (e.g. as part of frame submission), but I must
> admit that I am not sure about the best way for user level
> submissions (amdkfd).
>
> [AR] I wasn't aware of this part of the programming sequence. Thanks
> for the heads up! Is this similar to the CU masking programming?
>
> [Serguei] Yes. To simplify: the problem is that the "scheduler", when
> deciding which queue to run, will check if there are enough
> resources, and if not then it will begin to check other queues with
> lower priority.
>
> 2) I would recommend dedicating the whole pipe to the high-priority
> queue and having nothing there except it.
>
> [AR] I'm guessing in this context you mean pipe = queue? (as opposed
> to the MEC definition of pipe, which is a grouping of queues). I say
> this because amdgpu only has access to 1 pipe, and the rest are
> statically partitioned for amdkfd usage.
>
> [Serguei] No. I mean pipe :-) as MEC defines it. As far as I
> understand (by simplifying) some scheduling is per pipe. I know about
> the current allocation scheme, but I do not think that it is ideal. I
> would assume that we need to switch to dynamic partitioning of
> resources based on the workload, otherwise we will have resource
> conflicts between Vulkan compute and OpenCL.
>
>
> BTW: Which user level API do you want to use for compute: Vulkan or
> OpenCL?
>
> [AR] Vulkan
>
> [Serguei] Vulkan works via amdgpu (kernel submissions) so amdkfd will
> not be involved. I would assume that in the case of VR we will have
> one main application ("console" mode(?)) so we could temporarily
> "ignore" OpenCL/ROCm needs when VR is running.
>
> > we will not be able to provide a solution compatible with GFX
> > workloads.
>
> I assume that you are talking about graphics? Am I right?
>
> [AR] Yeah, my understanding is that pre-empting the currently running
> graphics job and scheduling in something else using mid-buffer
> pre-emption has some cases where it doesn't work well. But if it
> starts working well with polaris10, it might be a better solution for
> us (because the whole reprojection work uses the vulkan graphics
> stack at the moment, and porting it to compute is not trivial).
>
> [Serguei] The problem with pre-emption of a graphics task:
> (a) it may take time so latency may suffer
> (b) to preempt we need to have a different "context" - we want to
> guarantee that submissions from the same context will be executed in
> order.
> BTW: (a) Do you want "preempt" and later resume or do you want
> "preempt" and "cancel/abort"?
> (b) Vulkan is a generic API and could be used for graphics as well as
> for plain compute tasks (VK_QUEUE_COMPUTE_BIT).
>
>
> Sincerely yours,
> Serguei Sagalovitch
>
>
>
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Andres Rodriguez <andresr at valvesoftware.com>
> Sent: December 16, 2016 6:15 PM
> To: amd-gfx at lists.freedesktop.org
> Subject: [RFC] Mechanism for high priority scheduling in amdgpu
>
> Hi Everyone,
>
> This RFC is also available as a gist here:
> https://gist.github.com/lostgoat/7000432cd6864265dbc2c3ab93204249
>
> We are interested in feedback for a mechanism to effectively schedule
> high priority VR reprojection tasks (also referred to as time-warping)
> for Polaris10 running on the amdgpu kernel driver.
>
> Brief context:
> --------------
>
> The main objective of reprojection is to avoid motion sickness for VR
> users in scenarios where the game or application would fail to finish
> rendering a new frame in time for the next VBLANK. When this happens,
> the user's head movements are not reflected on the Head Mounted
> Display (HMD) for the duration of an extra frame. This extended
> mismatch between the inner ear and the eyes may cause the user to
> experience motion sickness.
>
> The VR compositor deals with this problem by fabricating a new frame
> using the user's updated head position in combination with the
> previous frames. This avoids a prolonged mismatch between the HMD
> output and the inner ear.
>
> Because of the adverse effects on the user, we require high confidence
> that the reprojection task will complete before the VBLANK interval,
> even if the GFX pipe is currently full of work from the
> game/application (which is most likely the case).
>
> For more details and illustrations, please refer to the following
> document:
> https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
>
> Requirements:
> -------------
>
> The mechanism must expose the following functionality:
>
>     * Job round trip time must be predictable, from submission to
>       fence signal
>
>     * The mechanism must support compute workloads
>
> Goals:
> ------
>
>     * The mechanism should provide low submission latencies
>
> Test: submitting a NOP packet through the mechanism on busy hardware
> should be equivalent to submitting a NOP on idle hardware.
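>
> As a rough illustration of that test, the round trip could be sampled
> from userspace via libdrm's amdgpu wrapper along these lines (sketch
> only; device/context/IB setup is elided, and 'req' is assumed to
> already describe an IB holding a single NOP packet):
>
>     /* Sketch: time submission -> fence signal for one NOP IB. */
>     struct timespec t0, t1;
>     struct amdgpu_cs_fence fence = {
>             .context = ctx,
>             .ip_type = AMDGPU_HW_IP_COMPUTE,
>     };
>     uint32_t expired;
>
>     clock_gettime(CLOCK_MONOTONIC, &t0);
>     amdgpu_cs_submit(ctx, 0, &req, 1);
>     fence.fence = req.seq_no;
>     amdgpu_cs_query_fence_status(&fence, AMDGPU_TIMEOUT_INFINITE,
>                                  0, &expired);
>     clock_gettime(CLOCK_MONOTONIC, &t1);
>     /* (t1 - t0) is the round trip; compare busy vs. idle hardware. */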
>
> Nice to have:
> -------------
>
>     * The mechanism should also support GFX workloads
>
> My understanding is that with the current hardware capabilities in
> Polaris10 we will not be able to provide a solution compatible with
> GFX workloads.
>
> But I would love to hear otherwise. So if anyone has an idea, approach
> or suggestion that will also be compatible with the GFX ring, please
> let us know about it.
>
>     * The above guarantees should also be respected by amdkfd
>       workloads
>
> Would be good to have for consistency, but not strictly necessary as
> users running games are not traditionally running HPC workloads in
> the background.
>
> Proposed approach:
> ------------------
>
> Similar to the windows driver, we could expose a high priority compute
> queue to userspace.
>
> Submissions to this compute queue will be scheduled with high
> priority, and may acquire hardware resources previously in use by
> other queues.
>
> This can be achieved by taking advantage of the 'priority' field in
> the HQDs, and could be programmed by amdgpu or the amdgpu scheduler.
> The relevant register fields are:
>     * mmCP_HQD_PIPE_PRIORITY
>     * mmCP_HQD_QUEUE_PRIORITY
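>
> For illustration, programming these while the queue's HQD is selected
> might look roughly like this (a sketch; the priority values are made
> up and would need to come from the register spec):
>
>     /* Sketch: raise one compute queue's HQD priority. Assumes the
>      * HQD is already selected (e.g. via vi_srbm_select()). */
>     WREG32(mmCP_HQD_PIPE_PRIORITY, 0x2);  /* assumed "high" encoding */
>     WREG32(mmCP_HQD_QUEUE_PRIORITY, 0xf); /* assumed max queue level */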
>
> Implementation approach 1 - static partitioning:
> ------------------------------------------------
>
> The amdgpu driver currently controls 8 compute queues from pipe0. We
> can statically partition these as follows:
>     * 7x regular
>     * 1x high priority
>
> The relevant priorities can be set so that submissions to the high
> priority ring will starve the other compute rings and the GFX ring.
>
> The amdgpu scheduler will only place jobs into the high priority rings
> if the context is marked as high priority. A corresponding priority
> should be added to keep track of this information:
>     * AMD_SCHED_PRIORITY_KERNEL
>     * -> AMD_SCHED_PRIORITY_HIGH
>     * AMD_SCHED_PRIORITY_NORMAL
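>
> A minimal sketch of that enum change (the HIGH entry is the proposed
> addition; ordering/values are illustrative):
>
>     enum amd_sched_priority {
>             AMD_SCHED_PRIORITY_KERNEL = 0, /* highest: kernel jobs */
>             AMD_SCHED_PRIORITY_HIGH,       /* proposed: HP contexts */
>             AMD_SCHED_PRIORITY_NORMAL,     /* default for userspace */
>             AMD_SCHED_MAX_PRIORITY
>     };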
>
> The user will request a high priority context by setting an
> appropriate flag in drm_amdgpu_ctx_in (AMDGPU_CTX_HIGH_PRIORITY or
> similar):
> https://github.com/torvalds/linux/blob/master/include/uapi/drm/amdgpu_drm.h#L163
>
> The setting is at a per context level so that we can:
>     * Maintain a consistent FIFO ordering of all submissions to a
>       context
>     * Create high priority and non-high priority contexts in the same
>       process
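>
> In uapi terms the request could look roughly like this from userspace
> (hypothetical flag value; the final name/encoding would be settled
> during review):
>
>     #define AMDGPU_CTX_HIGH_PRIORITY (1 << 0) /* illustrative value */
>
>     union drm_amdgpu_ctx args = {
>             .in = {
>                     .op    = AMDGPU_CTX_OP_ALLOC_CTX,
>                     .flags = AMDGPU_CTX_HIGH_PRIORITY,
>             },
>     };
>     /* non-root callers would get -EPERM back, per the discussion */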
>
> Implementation approach 2 - dynamic priority programming:
> ---------------------------------------------------------
>
> Similar to the above, but instead of programming the priorities at
> amdgpu_init() time, the SW scheduler will reprogram the queue
> priorities dynamically when scheduling a task.
>
> This would involve having a hardware specific callback from the
> scheduler to set the appropriate queue priority:
>     set_priority(int ring, int index, int priority)
>
> During this callback we would have to grab the SRBM mutex to perform
> the appropriate HW programming, and I'm not really sure if that is
> something we should be doing from the scheduler.
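>
> To make the concern concrete, the callback might end up looking
> something like this sketch (gfx8-style names assumed, not verified
> against the actual code):
>
>     static void gfx_v8_0_ring_set_priority(struct amdgpu_device *adev,
>                                            struct amdgpu_ring *ring,
>                                            int priority)
>     {
>             mutex_lock(&adev->srbm_mutex);
>             vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
>             WREG32(mmCP_HQD_QUEUE_PRIORITY, priority);
>             vi_srbm_select(adev, 0, 0, 0, 0);
>             mutex_unlock(&adev->srbm_mutex);
>     }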
>
> On the positive side, this approach would allow us to program a range
> of priorities for jobs instead of a single "high priority" value,
> achieving something similar to the niceness API available for CPU
> scheduling.
>
> I'm not sure if this flexibility is something that we would need for
> our use case, but it might be useful in other scenarios (multiple
> users sharing compute time on a server).
>
> This approach would require a new int field in drm_amdgpu_ctx_in, or
> repurposing of the flags field.
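>
> As a sketch of what a niceness-style mapping might look like (entirely
> illustrative; the input range and HQD encoding here are invented):
>
>     /* Map a nice-style value (-16 most urgent .. 15 least) onto an
>      * assumed 0..15 HQD priority scale, higher = more urgent. */
>     static u32 ctx_priority_to_hqd_priority(int priority)
>     {
>             int clamped = clamp(priority, -16, 15);
>
>             return (u32)(15 - (clamped + 16) / 2);
>     }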
>
> Known current obstacles:
> ------------------------
>
> The SQ is currently programmed to disregard the HQD priorities, and
> instead it picks jobs at random. Settings from the shader itself are
> also disregarded, as this is considered a privileged field.
>
> Effectively we can get our compute wavefront launched ASAP, but we
> might not get the time we need on the SQ.
>
> The current programming would have to be changed to allow priority
> propagation from the HQD into the SQ.
>
> Generic approach for all HW IPs:
> --------------------------------
>
> For consistency purposes, the high priority context can be enabled
> for all HW IPs with support of the SW scheduler. This will function
> similarly to the current AMD_SCHED_PRIORITY_KERNEL priority, where
> the job can jump ahead of anything not committed to the HW queue.
>
> The benefits of requesting a high priority context for a non-compute
> queue will be lesser (e.g. up to 10s of wait time if a GFX command is
> stuck in front of you), but having the API in place will allow us to
> easily improve the implementation in the future as new features
> become available in new hardware.
>
> Future steps:
> -------------
>
> Once we have an approach settled, I can take care of the
> implementation.
>
> Also, once the interface is mostly decided, we can start thinking
> about exposing the high priority queue through radv.
>
> Request for feedback:
> ---------------------
>
> We aren't married to any of the approaches outlined above. Our goal
> is to obtain a mechanism that will allow us to complete the
> reprojection job within a predictable amount of time. So if anyone
> has any suggestions for improvements or alternative strategies, we
> are more than happy to hear them.
>
> If any of the technical information above is also incorrect, feel
> free to point out my misunderstandings.
>
> Looking forward to hearing from you.
>
> Regards,
> Andres
>
> Sincerely yours,
> Serguei Sagalovitch
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx