[PATCH 18/19] drm/amdgpu: Disable GFX PG on CZ

Felix Kuehling felix.kuehling at amd.com
Sat Aug 12 01:18:39 UTC 2017


On 2017-08-11 08:54 PM, StDenis, Tom wrote:
> Hi Felix,
>
> Well it's really up to Christian and Alex but I'd keep an eye on this since it'll cause issues with embedded down the road.
>
> I happen to have a CZ system so I could possibly try and bisect 4.11/4.12 and see if there's any stable points for you guys.

I doubt there is a stable point. On the KFD branch we've always had GFX
power gating disabled, because it was causing us problems as soon as we
picked up kernel 4.6 in August 2016, which first introduced CZ power
gating to the KFD branch.
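
For anyone following along, the effect of patch 18 on the CZ pg_flags
mask can be sketched in standalone C. This is only an illustration: the
PG_* bit values are placeholders rather than the real AMD_PG_SUPPORT_*
values, cz_pg_flags() and cz_pg_selfcheck() are hypothetical helpers
that do not exist in the driver, pg_capable stands in for the
rev_id/Bristol check in vi_common_early_init(), and only the flags
visible in the patch hunk are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits, NOT the real AMD_PG_SUPPORT_* values. */
#define PG_GFX_PG       (1u << 0)
#define PG_GFX_SMG      (1u << 1)
#define PG_GFX_PIPELINE (1u << 2)
#define PG_CP           (1u << 3)
#define PG_UVD          (1u << 4)

/*
 * Hypothetical helper modeling the CZ branch of vi_common_early_init().
 * pg_capable:     stands in for the rev_id / Bristol revision check.
 * gfx_pg_enabled: true models the code before the patch, false after it.
 */
uint32_t cz_pg_flags(bool pg_capable, bool gfx_pg_enabled)
{
    uint32_t flags = 0;

    if (pg_capable) {
        /* These flags are set both before and after the patch. */
        flags |= PG_GFX_SMG | PG_GFX_PIPELINE | PG_CP | PG_UVD;
        /* Only this bit is dropped by the patch. */
        if (gfx_pg_enabled)
            flags |= PG_GFX_PG;
    }
    return flags;
}

/* Small self-check: the patch clears exactly one bit of the mask. */
void cz_pg_selfcheck(void)
{
    uint32_t before = cz_pg_flags(true, true);
    uint32_t after  = cz_pg_flags(true, false);

    assert((before & ~after) == PG_GFX_PG);
    assert((after & PG_GFX_PG) == 0);
}
```

The point of the sketch is that only the GFX PG bit changes; the other
power gating features listed in the hunk stay enabled.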

> Is there a short and simple KFD setup I can install/run to test it? Or is simply loading a KFD merged/rebased kernel enough to cause the hang (in which case I guess a bisect doesn't make sense)?

With patch 19 in this series, it's a hang during boot. Without it, you
can boot, and you'll get errors from kfdtest due to MEC hangs as soon as
a user mode queue is created. You'd need a modified Thunk and KFDTest
for this experiment. You could get both from a recent roc-master build.
The rest of the ROCm stack isn't needed.

KFDTest isn't released to the public, and the last public release
doesn't include the necessary Thunk changes yet. I think the Thunk
change will make it into ROCm 1.6.3.

I've also been able to run hsaconformance (which I think is included in
our public releases) with 74% of tests passing. OCL tests currently
segfault in the HSA runtime, as do some of the conformance tests. I'm
going to look into the HSA runtime a bit more to see if I can get OCL to
work for more realistic testing.

Regards,
  Felix

>
> Cheers,
> Tom
>
> ________________________________________
> From: Kuehling, Felix
> Sent: Friday, August 11, 2017 20:40
> To: StDenis, Tom; amd-gfx at lists.freedesktop.org; oded.gabbay at gmail.com
> Subject: Re: [PATCH 18/19] drm/amdgpu: Disable GFX PG on CZ
>
> With the next change that adds programming of RLC_CP_SCHEDULERS it's a
> VM fault and hard hang during boot, just after HWS initialization.
> Without that change it's only a MEC hang when the first application
> tries to create a user mode queue.
>
> Regards,
>   Felix
>
> On 2017-08-11 08:08 PM, StDenis, Tom wrote:
>> Hmm, I'd still be careful about disabling GFX PG since we may fail to meet Energy Star requirements.
>>
>> Does the system hard hang or simply GPU hang?
>>
>> Tom
>>
>> ________________________________________
>> From: Kuehling, Felix
>> Sent: Friday, August 11, 2017 19:56
>> To: StDenis, Tom; amd-gfx at lists.freedesktop.org; oded.gabbay at gmail.com
>> Subject: Re: [PATCH 18/19] drm/amdgpu: Disable GFX PG on CZ
>>
>> Yes, I'm up-to-date. KFD doesn't use the KIQ to map the HIQ. And HIQ
>> maps all our other queues (unless we're disabling the hardware scheduler).
>>
>> Regards,
>>   Felix
>>
>>
>> On 2017-08-11 07:45 PM, StDenis, Tom wrote:
>>> Hi Felix,
>>>
>>> I'm assuming your tree is up to date with amd-staging-4.11 or 4.12. We did previously have issues with compute rings on Carrizo when PG was enabled (specifically CGCG + PG). Then David committed some KIQ upgrades and it started working properly.
>>>
>>> Could that be related? Last I heard, the official line from the GFX IP team is that GFX PG "should work" on Carrizo.
>>>
>>> Cheers,
>>> Tom
>>> ________________________________________
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Felix Kuehling <Felix.Kuehling at amd.com>
>>> Sent: Friday, August 11, 2017 17:56
>>> To: amd-gfx at lists.freedesktop.org; oded.gabbay at gmail.com
>>> Cc: Kuehling, Felix
>>> Subject: [PATCH 18/19] drm/amdgpu: Disable GFX PG on CZ
>>>
>>> It's causing problems with user mode queues and the HIQ, and can
>>> lead to hard hangs during boot after programming RLC_CP_SCHEDULERS.
>>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/vi.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
>>> index 18bb3cb..495c8a3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>> @@ -1029,8 +1029,7 @@ static int vi_common_early_init(void *handle)
>>>                 /* rev0 hardware requires workarounds to support PG */
>>>                 adev->pg_flags = 0;
>>>                 if (adev->rev_id != 0x00 || CZ_REV_BRISTOL(adev->pdev->revision)) {
>>> -                       adev->pg_flags |= AMD_PG_SUPPORT_GFX_PG |
>>> -                               AMD_PG_SUPPORT_GFX_SMG |
>>> +                       adev->pg_flags |= AMD_PG_SUPPORT_GFX_SMG |
>>>                                 AMD_PG_SUPPORT_GFX_PIPELINE |
>>>                                 AMD_PG_SUPPORT_CP |
>>>                                 AMD_PG_SUPPORT_UVD |
>>> --
>>> 2.7.4