<div dir="auto">Do you have any work actually going into multiple pipes? My understanding is that opencl will only use one queue at a time (but I'm not really certain about that).<div dir="auto"><br></div><div dir="auto">What you can also check is if the app works correctly when it executed on pipe0, and if it hangs on pipe 1+. I removed all the locations where pipe0 was hardcoded in the open driver, but it is possible it is still hardcoded somewhere on the closed stack.</div><div dir="auto"><br></div><div dir="auto">Regards,</div><div dir="auto">Andres </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <<a href="mailto:David1.Zhou@amd.com">David1.Zhou@amd.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div id="m_-7809043402888492486divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p>Then synchronization should not be the problem; it may be related to a multipipe hw setting issue.</p>
<p><br>
</p>
<p>Regards,</p>
<p>David Zhou<br>
</p>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-7809043402888492486divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Andres Rodriguez <<a href="mailto:andresx7@gmail.com" target="_blank">andresx7@gmail.com</a>><br>
<b>Sent:</b> Tuesday, November 7, 2017 2:00:57 AM<br>
<b>To:</b> Zhou, David(ChunMing); amd-gfx list<br>
<b>Cc:</b> Deucher, Alexander<br>
<b>Subject:</b> Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics</font>
<div> </div>
</div>
<div class="m_-7809043402888492486BodyFragment"><font size="2"><span style="font-size:10pt">
<div class="m_-7809043402888492486PlainText">Sorry my mail client seems to have blown up. My reply got cut off,<br>
here is the full version:<br>
<br>
<br>
<br>
On 2017-11-06 01:49 AM, Chunming Zhou wrote:<br>
> Hi Andres,<br>
><br>
Hi David,<br>
<br>
> With this patch applied, OCLperf hung.<br>
Is this on all ASICs or just a specific one?<br>
<br>
><br>
> Could you explain more?<br>
><br>
> If I understand correctly, the difference between with and without this patch<br>
> is setting the first two queues versus setting all queues of pipe 0 in queue_bitmap.<br>
It is slightly different. With this patch we will also use the first<br>
two queues of all pipes, not just pipe 0.<br>
<br>
Pre-patch:<br>
<br>
|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|<br>
11111111 00000000 00000000 00000000<br>
<br>
Post-patch:<br>
<br>
|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|<br>
11000000 11000000 11000000 11000000<br>
<br>
What this means is that we are allowing real multithreading for<br>
compute. Jobs on different pipes allow for parallel execution of work.<br>
Jobs on the same pipe (but different queues) use timeslicing to share<br>
the hardware.<br>
<br>
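To make the layout above concrete, here is a small standalone C sketch (my<br>
own illustration, not the kernel code) that reproduces the post-patch<br>
bitmap, assuming a single MEC with 4 pipes x 8 queues as in the diagram:<br>
<br>
#include <stdio.h><br>
<br>
#define NUM_PIPES       4<br>
#define QUEUES_PER_PIPE 8<br>
<br>
int main(void)<br>
{<br>
        unsigned long bitmap = 0;<br>
<br>
        for (int i = 0; i < NUM_PIPES * QUEUES_PER_PIPE; ++i) {<br>
                int queue = i % QUEUES_PER_PIPE; /* queue index varies fastest */<br>
<br>
                /* post-patch policy: first two queues of each pipe */<br>
                if (queue < 2)<br>
                        bitmap |= 1UL << i;<br>
        }<br>
<br>
        /* prints: 11000000 11000000 11000000 11000000 (lowest bit first) */<br>
        for (int i = 0; i < NUM_PIPES * QUEUES_PER_PIPE; ++i) {<br>
                putchar((bitmap >> i) & 1 ? '1' : '0');<br>
                if (i % QUEUES_PER_PIPE == QUEUES_PER_PIPE - 1)<br>
                        putchar(' ');<br>
        }<br>
        putchar('\n');<br>
        return 0;<br>
}<br>
<br>
The pre-patch policy corresponds to instead setting all eight bits of<br>
pipe 0 and none of the others.<br>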
<br>
><br>
> Then UMD can use a different number of queues to submit compute commands,<br>
> as selected by amdgpu_queue_mgr_map.<br>
><br>
> I checked the amdgpu_queue_mgr_map implementation; the CS_IOCTL can map a<br>
> user ring to a different hw ring depending on whether it is busy or idle, right?<br>
Yes, when a queue is first used, amdgpu_queue_mgr_map will decide the<br>
mapping from a usermode ring to a kernel ring id.<br>
<br>
> If yes, I see a bug in it, which will result in our sched_fence not<br>
> working. Our sched fence assumes jobs will be executed in order; your<br>
> queue mapping breaks this.<br>
<br>
I think here you mean that work will execute out of order because it<br>
will go to different rings?<br>
<br>
That should not happen, since the id mapping is permanent on a<br>
per-context basis. Once a mapping is decided, it will be cached for<br>
this context so that we keep execution order guarantees. See the<br>
id-caching code in amdgpu_queue_mgr.c for reference.<br>
<br>
As long as usermode keeps submitting work to the same ring, it will all<br>
be executed in order (all on the same kernel ring). There is no change in<br>
this guarantee compared to pre-patch. Note that amdgpu_queue_mgr_map has<br>
already been using an LRU policy for a long time, even before this patch.<br>
<br>
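In case it helps, here is a rough self-contained sketch of that idea (all<br>
names and sizes below are made up for illustration; the real logic lives<br>
in amdgpu_queue_mgr.c):<br>
<br>
#include <stdio.h><br>
<br>
#define NUM_USER_RINGS 8<br>
#define NUM_HW_RINGS   4<br>
<br>
static int hw_ring_last_used[NUM_HW_RINGS]; /* crude stand-in for LRU state */<br>
static int ctx_cached_map[NUM_USER_RINGS];  /* -1 means "not mapped yet" */<br>
static int tick;<br>
<br>
/* pick the hw ring that has been idle the longest (toy LRU policy) */<br>
static int pick_lru_hw_ring(void)<br>
{<br>
        int best = 0;<br>
<br>
        for (int i = 1; i < NUM_HW_RINGS; ++i)<br>
                if (hw_ring_last_used[i] < hw_ring_last_used[best])<br>
                        best = i;<br>
        return best;<br>
}<br>
<br>
/* decide the user ring -> hw ring mapping once, then reuse the cached value */<br>
static int map_user_ring(int user_ring)<br>
{<br>
        if (ctx_cached_map[user_ring] < 0)<br>
                ctx_cached_map[user_ring] = pick_lru_hw_ring();<br>
<br>
        hw_ring_last_used[ctx_cached_map[user_ring]] = ++tick;<br>
        return ctx_cached_map[user_ring];<br>
}<br>
<br>
int main(void)<br>
{<br>
        for (int i = 0; i < NUM_USER_RINGS; ++i)<br>
                ctx_cached_map[i] = -1;<br>
<br>
        /* repeated submissions on user ring 3 all land on the same hw ring */<br>
        int a = map_user_ring(3);<br>
        int b = map_user_ring(3);<br>
        int c = map_user_ring(3);<br>
<br>
        printf("%d %d %d\n", a, b, c);<br>
        return 0;<br>
}<br>
<br>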
Regards,<br>
Andres<br>
<br>
On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <<a href="mailto:andresx7@gmail.com" target="_blank">andresx7@gmail.com</a>> wrote:<br>
><br>
><br>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:<br>
>><br>
>> Hi Andres,<br>
>><br>
><br>
> Hi David,<br>
><br>
>> With this patch applied, OCLperf hung.<br>
><br>
><br>
> Is this on all ASICs or just a specific one?<br>
><br>
>><br>
>> Could you explain more?<br>
>><br>
>> If I understand correctly, the difference between with and without this patch<br>
>> is setting the first two queues versus setting all queues of pipe 0 in queue_bitmap.<br>
><br>
><br>
> It is slightly different. With this patch we will also use the first two<br>
> queues of all pipes, not just pipe 0.<br>
><br>
> Pre-patch:<br>
><br>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|<br>
> 11111111 00000000 00000000 00000000<br>
><br>
> Post-patch:<br>
><br>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|<br>
> 11000000 11000000 11000000 11000000<br>
><br>
> What this means is that we are allowing real multithreading for compute.<br>
> Jobs on different pipes allow for parallel execution of work. Jobs on the<br>
> same pipe (but different queues) use timeslicing to share the hardware.<br>
><br>
><br>
><br>
>><br>
>> Then UMD can use a different number of queues to submit compute commands,<br>
>> as selected by amdgpu_queue_mgr_map.<br>
>><br>
>> I checked the amdgpu_queue_mgr_map implementation; the CS_IOCTL can map a user<br>
>> ring to a different hw ring depending on whether it is busy or idle, right?<br>
>><br>
>> If yes, I see a bug in it, which will result in our sched_fence not working.<br>
>> Our sched fence assumes jobs will be executed in order; your queue mapping<br>
>> breaks this.<br>
>><br>
>><br>
>> Regards,<br>
>><br>
>> David Zhou<br>
>><br>
>><br>
>> On 2017-09-27 00:22, Andres Rodriguez wrote:<br>
>>><br>
>>> A performance regression for OpenCL tests on Polaris11 had this feature<br>
>>> disabled for all asics.<br>
>>><br>
>>> Instead, disable it selectively on the affected asics.<br>
>>><br>
>>> Signed-off-by: Andres Rodriguez <<a href="mailto:andresx7@gmail.com" target="_blank">andresx7@gmail.com</a>><br>
>>> ---<br>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--<br>
>>>   1 file changed, 12 insertions(+), 2 deletions(-)<br>
>>><br>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c<br>
>>> index 4f6c68f..3d76e76 100644<br>
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c<br>
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c<br>
>>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s<br>
>>>          }<br>
>>>  }<br>
>>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)<br>
>>> +{<br>
>>> +        /* FIXME: spreading the queues across pipes causes perf regressions<br>
>>> +         * on POLARIS11 compute workloads */<br>
>>> +        if (adev->asic_type == CHIP_POLARIS11)<br>
>>> +                return false;<br>
>>> +<br>
>>> +        return adev->gfx.mec.num_mec > 1;<br>
>>> +}<br>
>>> +<br>
>>>  void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)<br>
>>>  {<br>
>>>          int i, queue, pipe, mec;<br>
>>> +        bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);<br>
>>>          /* policy for amdgpu compute queue ownership */<br>
>>>          for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {<br>
>>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)<br>
>>>                  if (mec >= adev->gfx.mec.num_mec)<br>
>>>                          break;<br>
>>> -                /* FIXME: spreading the queues across pipes causes perf regressions */<br>
>>> -                if (0) {<br>
>>> +                if (multipipe_policy) {<br>
>>>                          /* policy: amdgpu owns the first two queues of the first MEC */<br>
>>>                          if (mec == 0 && queue < 2)<br>
>>>                                  set_bit(i, adev->gfx.mec.queue_bitmap);<br>
>><br>
>><br>
><br>
</div>
</span></font></div>
</div>
</blockquote></div></div>