[PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Tue Nov 7 07:26:38 UTC 2017

Do you have any work actually going into multiple pipes? My understanding
is that OpenCL will only use one queue at a time (but I'm not really
certain about that).

You can also check whether the app works correctly when it is executed on
pipe 0 and hangs on pipe 1+. I removed all the locations where pipe 0
was hardcoded in the open driver, but it may still be hardcoded
somewhere in the closed stack.

Regards,
Andres

On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <David1.Zhou at amd.com>
wrote:

> Then synchronization should not be a problem; it may be related to a
> multipipe hw setting issue.
>
>
> Regards,
>
> David Zhou
> ------------------------------
> *From:* Andres Rodriguez <andresx7 at gmail.com>
> *Sent:* Tuesday, November 7, 2017 2:00:57 AM
> *To:* Zhou, David(ChunMing); amd-gfx list
> *Cc:* Deucher, Alexander
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on
> non PL11 asics
>
> Sorry my mail client seems to have blown up. My reply got cut off,
> here is the full version:
>
>
>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> > Hi Andres,
> >
> Hi David,
>
> > With this patch of yours, OCLperf hung.
> Is this on all ASICs or just a specific one?
>
> >
> > Could you explain more?
> >
> > If I understand correctly, the difference with and without this patch is
> > whether the first two queues or all queues of pipe0 are set in queue_bitmap.
> It is slightly different. With this patch we will also use the first
> two queues of all pipes, not just pipe 0:
>
> Pre-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11111111  00000000  00000000  00000000
>
> Post-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11000000  11000000  11000000  11000000
>
> What this means is that we are allowing real multithreading for
> compute. Jobs on different pipes allow for parallel execution of work.
> Jobs on the same pipe (but different queues) use timeslicing to share
> the hardware.
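
As a rough illustration of the two layouts in the diagram, here is a minimal standalone C sketch. It is not the driver code: the sizes (one MEC, 4 pipes, 8 queues per pipe), the flat pipe-by-pipe queue numbering, and the helper names queue_owned, build_queue_bitmap and print_queue_bitmap are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sizes taken from the diagram: one MEC, 4 pipes, 8 queues per pipe. */
#define NUM_PIPES        4
#define QUEUES_PER_PIPE  8
#define NUM_QUEUES       (NUM_PIPES * QUEUES_PER_PIPE)

/* Hypothetical helper: is flat queue index i owned by amdgpu under the policy?
 * Assumes queues are numbered pipe by pipe, as the diagram is drawn. */
bool queue_owned(int i, bool multipipe_policy)
{
    assert(i >= 0 && i < NUM_QUEUES);
    int queue = i % QUEUES_PER_PIPE;
    int pipe  = i / QUEUES_PER_PIPE;

    if (multipipe_policy)
        return queue < 2;   /* post-patch: first two queues of every pipe */
    return pipe == 0;       /* pre-patch: every queue of pipe 0 */
}

/* Build a 32-bit mask with bit i set when queue i is owned. */
uint32_t build_queue_bitmap(bool multipipe_policy)
{
    uint32_t bitmap = 0;
    for (int i = 0; i < NUM_QUEUES; i++)
        if (queue_owned(i, multipipe_policy))
            bitmap |= UINT32_C(1) << i;
    return bitmap;
}

/* Print the mask in the same left-to-right layout as the diagram. */
void print_queue_bitmap(uint32_t bitmap)
{
    for (int i = 0; i < NUM_QUEUES; i++) {
        if (i > 0 && i % QUEUES_PER_PIPE == 0)
            putchar(' ');
        putchar((bitmap >> i) & 1 ? '1' : '0');
    }
    putchar('\n');
}
```

Calling print_queue_bitmap(build_queue_bitmap(false)) and then the same with true should reproduce the two rows of the diagram, which makes it easy to see that the number of owned queues stays at eight while their placement changes.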
>
>
> >
> > Then the UMD can use a different queue number, selected by
> > amdgpu_queue_mgr_map, to submit compute commands.
> >
> > I checked the amdgpu_queue_mgr_map implementation. CS_IOCTL can map a user
> > ring to a different hw ring depending on whether it is busy or idle, right?
> Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
> the mapping is for a usermode ring to a kernel ring id.
>
> > If yes, I see a bug in it, which will cause our sched_fence to stop
> > working. Our sched fence assumes the jobs will be executed in order; your
> > queue mapping breaks this.
>
> I think here you mean that work will execute out of order because it
> will go to different rings?
>
> That should not happen, since the id mapping is permanent on a
> per-context basis. Once a mapping is decided, it will be cached for
> this context so that we keep execution order guarantees. See the
> id-caching code in amdgpu_queue_mgr.c for reference.
>
> As long as the usermode keeps submitting work to the same ring, it
> will all be executed in order (all in the same ring). There is no
> change in this guarantee compared to pre-patch. Note that even before
> this patch amdgpu_queue_mgr_map has been using an LRU policy for a
> long time now.
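
To make the ordering argument concrete, here is a small standalone C sketch of a per-context cached mapping with an LRU pick on first use. It only illustrates the idea and is not the code in amdgpu_queue_mgr.c: the struct names, the ring counts, and the tick-based LRU are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_USER_RINGS 4   /* illustrative */
#define NUM_HW_RINGS   8   /* illustrative */
#define UNMAPPED       (-1)

/* Hypothetical shared LRU state: last-use tick per hardware ring. */
struct hw_ring_lru {
    uint64_t last_used[NUM_HW_RINGS];
    uint64_t tick;
};

/* Hypothetical per-context cache: user ring id -> hardware ring id. */
struct ctx_queue_map {
    int hw_ring[NUM_USER_RINGS];
};

void ctx_queue_map_init(struct ctx_queue_map *map)
{
    for (int i = 0; i < NUM_USER_RINGS; i++)
        map->hw_ring[i] = UNMAPPED;
}

/* Pick the hardware ring that was used least recently and mark it as used. */
static int lru_pick(struct hw_ring_lru *lru)
{
    int best = 0;
    for (int i = 1; i < NUM_HW_RINGS; i++)
        if (lru->last_used[i] < lru->last_used[best])
            best = i;
    lru->last_used[best] = ++lru->tick;
    return best;
}

/* Map a user ring to a hardware ring. The choice is made once per context
 * and then reused, so later submissions on the same user ring land on the
 * same hardware ring. */
int ctx_queue_map_get(struct ctx_queue_map *map, struct hw_ring_lru *lru,
                      int user_ring)
{
    assert(user_ring >= 0 && user_ring < NUM_USER_RINGS);
    if (map->hw_ring[user_ring] == UNMAPPED)
        map->hw_ring[user_ring] = lru_pick(lru);
    return map->hw_ring[user_ring];
}
```

In this sketch two contexts can end up on different hardware rings, but repeated lookups from one context for the same user ring always return the same ring, which is the property the in-order assumption depends on.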
>
> Regards,
> Andres
>
> >> On 2017-09-27 00:22, Andres Rodriguez wrote:
> >>>
> >>> A performance regression for OpenCL tests on Polaris11 had this feature
> >>> disabled for all asics.
> >>>
> >>> Instead, disable it selectively on the affected asics.
> >>>
> >>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
> >>>   1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> index 4f6c68f..3d76e76 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s
> >>>       }
> >>>   }
> >>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
> >>> +{
> >>> +    /* FIXME: spreading the queues across pipes causes perf regressions
> >>> +     * on POLARIS11 compute workloads */
> >>> +    if (adev->asic_type == CHIP_POLARIS11)
> >>> +        return false;
> >>> +
> >>> +    return adev->gfx.mec.num_mec > 1;
> >>> +}
> >>> +
> >>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
> >>>   {
> >>>       int i, queue, pipe, mec;
> >>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> >>>       /* policy for amdgpu compute queue ownership */
> >>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
> >>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
> >>>           if (mec >= adev->gfx.mec.num_mec)
> >>>               break;
> >>> -        /* FIXME: spreading the queues across pipes causes perf regressions */
> >>> -        if (0) {
> >>> +        if (multipipe_policy) {
> >>>               /* policy: amdgpu owns the first two queues of the first MEC */
> >>>               if (mec == 0 && queue < 2)
> >>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
> >>
> >>
> >
>