[PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Tue Nov 7 09:23:40 UTC 2017

I got the following information about this issue:

"

  *   If I install #491963 (failed in the report) with “./amdgpu-pro-install -y --opencl=legacy” command, the test passed. It failed when rocm is also installed with “./amdgpu-pro-install -y --opencl=legacy,rocm” command."

So I guess the hang is related to the pipe being used by both ORCA and ROCm. But Felix said they don't support ROCm on Tonga, which could mean this issue doesn't matter currently.

Regards,

David Zhou

________________________________
From: Andres Rodriguez <andresx7 at gmail.com>
Sent: Tuesday, November 7, 2017 3:26:38 PM
To: Zhou, David(ChunMing)
Cc: amd-gfx list; Deucher, Alexander
Subject: Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Do you have any work actually going into multiple pipes? My understanding is that opencl will only use one queue at a time (but I'm not really certain about that).

What you can also check is whether the app works correctly when it is executed on pipe 0 and hangs on pipe 1+. I removed all the locations where pipe 0 was hardcoded in the open driver, but it is possible it is still hardcoded somewhere in the closed stack.

Regards,
Andres

On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <David1.Zhou at amd.com> wrote:

Then synchronization should not be a problem; it may be related to a multipipe hw setting issue.

Regards,

David Zhou

________________________________
From: Andres Rodriguez <andresx7 at gmail.com>
Sent: Tuesday, November 7, 2017 2:00:57 AM
To: Zhou, David(ChunMing); amd-gfx list
Cc: Deucher, Alexander
Subject: Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Sorry, my mail client seems to have blown up. My reply got cut off;
here is the full version:

On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> Hi Andres,
>
Hi David,

> With this patch of yours, OCLperf hung.
Is this on all ASICs or just a specific one?

>
> Could you explain more?
>
> If I am correct, the difference with and without this patch is
> setting the first two queues or setting all queues of pipe0 in queue_bitmap.
It is slightly different. With this patch we will use the first two
queues of every pipe, not just the queues of pipe 0:

Pre-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11111111  00000000  00000000  00000000

Post-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11000000  11000000  11000000  11000000

What this means is that we are allowing real multithreading for
compute. Jobs on different pipes allow for parallel execution of work.
Jobs on the same pipe (but different queues) use timeslicing to share
the hardware.
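
As a rough illustration of the two layouts above, here is a minimal C
sketch (not the driver code). It assumes a single MEC with 4 pipes of
8 queues each, as drawn in the diagram; the sketch_* names, the sizes
and the flat queue index are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes matching the diagram: one MEC, 4 pipes, 8 queues each. */
#define SKETCH_NUM_PIPES       4
#define SKETCH_QUEUES_PER_PIPE 8
#define SKETCH_NUM_QUEUES      (SKETCH_NUM_PIPES * SKETCH_QUEUES_PER_PIPE)
/* 32 digits + 3 separating spaces + terminating NUL */
#define SKETCH_FORMAT_LEN      (SKETCH_NUM_QUEUES + SKETCH_NUM_PIPES)

/* Returns a mask where bit i is set if the kernel owns flat queue index i. */
static uint32_t sketch_queue_bitmap(bool multipipe_policy)
{
    uint32_t bitmap = 0;

    for (int i = 0; i < SKETCH_NUM_QUEUES; ++i) {
        int queue = i % SKETCH_QUEUES_PER_PIPE; /* queue index within its pipe */
        int pipe  = i / SKETCH_QUEUES_PER_PIPE; /* pipe index within the MEC */
        /* multipipe: first two queues of every pipe; otherwise: all of pipe 0 */
        bool owned = multipipe_policy ? (queue < 2) : (pipe == 0);

        if (owned)
            bitmap |= UINT32_C(1) << i;
    }
    return bitmap;
}

/* Writes the mask in the diagram's layout: queue 0 of pipe 0 is leftmost. */
static void sketch_format_bitmap(uint32_t bitmap, char out[SKETCH_FORMAT_LEN])
{
    int pos = 0;

    assert(out != NULL);
    for (int i = 0; i < SKETCH_NUM_QUEUES; ++i) {
        if (i > 0 && i % SKETCH_QUEUES_PER_PIPE == 0)
            out[pos++] = ' ';
        out[pos++] = ((bitmap >> i) & 1u) ? '1' : '0';
    }
    out[pos] = '\0';
}
```

Calling sketch_queue_bitmap(false) and sketch_queue_bitmap(true) and
passing the results to sketch_format_bitmap() reproduces the two rows
of the diagram.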

>
> Then UMD can use a different queue number to submit compute commands,
> selected by amdgpu_queue_mgr_map.
>
> I checked the amdgpu_queue_mgr_map implementation: CS_IOCTL can map a user
> ring to a different hw ring depending on busy or idle, right?
Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
the mapping is for a usermode ring to a kernel ring id.

> If yes, I see a bug in it, which will result in our sched_fence not
> working. Our sched fence assumes the jobs will be executed in order; your
> queue mapping breaks this.

I think here you mean that work will execute out of order because it
will go to different rings?

That should not happen, since the id mapping is permanent on a
per-context basis. Once a mapping is decided, it will be cached for
this context so that we keep execution order guarantees. See the
id-caching code in amdgpu_queue_mgr.c for reference.

As long as the usermode keeps submitting work to the same ring, it
will all be executed in order (all in the same ring). There is no
change in this guarantee compared to pre-patch. Note that even before
this patch amdgpu_queue_mgr_map has been using an LRU policy for a
long time now.
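
To illustrate the ordering argument, here is a minimal C sketch of a
per-context cached mapping with LRU selection on first use. It is not
the code in amdgpu_queue_mgr.c: the sketch_* names, the ring counts and
the timestamp-based LRU are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for the example. */
#define SKETCH_NUM_HW_RINGS   4
#define SKETCH_NUM_USER_RINGS 4
#define SKETCH_UNMAPPED       (-1)

/* Device-wide LRU state: when each hw ring was last handed out. */
struct sketch_lru {
    uint64_t last_used[SKETCH_NUM_HW_RINGS];
    uint64_t clock;
};

/* Per-context cache: user ring id -> hw ring id, fixed once decided. */
struct sketch_ctx {
    int ring_map[SKETCH_NUM_USER_RINGS];
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
    for (int i = 0; i < SKETCH_NUM_USER_RINGS; ++i)
        ctx->ring_map[i] = SKETCH_UNMAPPED;
}

/* Picks the least recently used hw ring and marks it as just used. */
static int sketch_lru_pick(struct sketch_lru *lru)
{
    int best = 0;

    for (int i = 1; i < SKETCH_NUM_HW_RINGS; ++i)
        if (lru->last_used[i] < lru->last_used[best])
            best = i;
    lru->last_used[best] = ++lru->clock;
    return best;
}

/* The LRU is consulted only on first use; later calls return the cached id,
 * so all work a context submits to one user ring lands on one hw ring. */
static int sketch_map_ring(struct sketch_ctx *ctx, struct sketch_lru *lru,
                           int user_ring)
{
    assert(user_ring >= 0 && user_ring < SKETCH_NUM_USER_RINGS);
    if (ctx->ring_map[user_ring] == SKETCH_UNMAPPED)
        ctx->ring_map[user_ring] = sketch_lru_pick(lru);
    return ctx->ring_map[user_ring];
}
```

In this sketch two contexts that both use user ring 0 can end up on
different hw rings, but repeated calls from the same context always
return the same hw ring, which is the property the in-order assumption
relies on.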

Regards,
Andres

On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <andresx7 at gmail.com> wrote:
>> On 2017-09-27 00:22, Andres Rodriguez wrote:
>>>
>>> A performance regression for OpenCL tests on Polaris11 had this feature
>>> disabled for all asics.
>>>
>>> Instead, disable it selectively on the affected asics.
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..3d76e76 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s
>>>       }
>>>   }
>>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
>>> +{
>>> +    /* FIXME: spreading the queues across pipes causes perf regressions
>>> +     * on POLARIS11 compute workloads */
>>> +    if (adev->asic_type == CHIP_POLARIS11)
>>> +        return false;
>>> +
>>> +    return adev->gfx.mec.num_mec > 1;
>>> +}
>>> +
>>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>>>   {
>>>       int i, queue, pipe, mec;
>>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
>>>       /* policy for amdgpu compute queue ownership */
>>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
>>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>>>           if (mec >= adev->gfx.mec.num_mec)
>>>               break;
>>> -        /* FIXME: spreading the queues across pipes causes perf regressions */
>>> -        if (0) {
>>> +        if (multipipe_policy) {
>>>               /* policy: amdgpu owns the first two queues of the first MEC */
>>>               if (mec == 0 && queue < 2)
>>>                   set_bit(i, adev->gfx.mec.queue_bitmap);