[PATCH] drm/amdgpu: add VM update fences back to the root PD
Luben Tuikov
luben.tuikov at amd.com
Thu Feb 20 02:55:29 UTC 2020
I was able to bisect it to this commit:
$ git bisect good
6643ba1ff05d252e451bada9443759edb95eab3b is the first bad commit
commit 6643ba1ff05d252e451bada9443759edb95eab3b
Author: Luben Tuikov <luben.tuikov at amd.com>
Date: Mon Feb 10 18:16:45 2020 -0500
drm/amdgpu: Move to a per-IB secure flag (TMZ)
Move from a per-CS secure flag (TMZ) to a per-IB
secure flag.
Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
Reviewed-by: Huang Rui <ray.huang at amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 --
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 23 ++++++++++++++++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 9 ++++-----
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 23 +++++++----------------
drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 20 ++++++--------------
include/uapi/drm/amdgpu_drm.h | 7 ++++---
10 files changed, 44 insertions(+), 52 deletions(-)
It's a bit baffling. Perhaps the new flag clashes with an existing one,
or libdrm also needs to be updated. I will look at it more tomorrow.
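For reference, a flag clash of the kind suspected here can be ruled out with a simple bit-overlap check. The following is only a minimal sketch: the EX_* names and bit positions are hypothetical and chosen for illustration, they are not copied from include/uapi/drm/amdgpu_drm.h, and the helper flag_clashes() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-IB flag bits, for illustration only. The real
 * definitions live in include/uapi/drm/amdgpu_drm.h and may differ.
 */
#define EX_IB_FLAG_A       (1u << 0)
#define EX_IB_FLAG_B       (1u << 1)
#define EX_IB_FLAG_SECURE  (1u << 5)   /* assumed position of the new flag */

/* All bits assumed to be in use before the new flag was added. */
#define EX_IB_EXISTING_MASK (EX_IB_FLAG_A | EX_IB_FLAG_B)

/* Compile-time guard: the build fails if the new bit overlaps an old one. */
static_assert((EX_IB_EXISTING_MASK & EX_IB_FLAG_SECURE) == 0,
	      "new flag overlaps an existing flag bit");

/* Run-time variant: nonzero if new_flag shares any bit with existing_mask. */
static int flag_clashes(uint32_t existing_mask, uint32_t new_flag)
{
	return (existing_mask & new_flag) != 0;
}
```

A check like this only covers overlap within one header. It would not catch the other possibility mentioned above, where userspace (libdrm) still sets the flag in its old location.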
My bisect log can be found below.
Regards,
Luben
------------
git bisect start
# good: [31866a9d7d40245316ad7c17b87961f68321cab8] drm/amd/display: Move drm_dp_mst_atomic_check() to the front of dc_validate_global_state()
git bisect good 31866a9d7d40245316ad7c17b87961f68321cab8
# bad: [7fd3b632e17e55c5ffd008f9f025754e7daa1b66] drm/amdgpu: fix colliding of preemption
git bisect bad 7fd3b632e17e55c5ffd008f9f025754e7daa1b66
# good: [41d073f29e59abdfb0d415033772c01c321086c9] drm/amdgpu/vcn2.5: fix warning
git bisect good 41d073f29e59abdfb0d415033772c01c321086c9
# good: [71da21488b65ade2b789416088b9f2493ad3e056] drm/amd/display: fix dtm unloading
git bisect good 71da21488b65ade2b789416088b9f2493ad3e056
# bad: [e3ca25cd2e75824e4dd9e6bb16013ab5f3ec63a6] drm/ttm: individualize resv objects before calling release_notify
git bisect bad e3ca25cd2e75824e4dd9e6bb16013ab5f3ec63a6
# good: [7e3452a6536ee7136a4d79f2369f15d5ce96583c] drm/amdgpu: return -EFAULT if copy_to_user() fails
git bisect good 7e3452a6536ee7136a4d79f2369f15d5ce96583c
# bad: [9b7ac0fb3bbfd6dd001423da497aafec3e8a5131] drm/amdgpu: log on non-zero error conter per IP before GPU reset
git bisect bad 9b7ac0fb3bbfd6dd001423da497aafec3e8a5131
# bad: [6643ba1ff05d252e451bada9443759edb95eab3b] drm/amdgpu: Move to a per-IB secure flag (TMZ)
git bisect bad 6643ba1ff05d252e451bada9443759edb95eab3b
# good: [3387f56e37b2fa8b0fbb3a538bc08daae923bb5f] drm/amd/powerplay: correct the way for checking SMU_FEATURE_BACO_BIT support
git bisect good 3387f56e37b2fa8b0fbb3a538bc08daae923bb5f
# first bad commit: [6643ba1ff05d252e451bada9443759edb95eab3b] drm/amdgpu: Move to a per-IB secure flag (TMZ)
------------
On 2020-02-19 8:02 p.m., Luben Tuikov wrote:
> New developments:
>
> Running "amdgpu_test -s 1 -t 4" causes timeouts and a kernel oops. Attached
> is the system log, tested on Navi 10:
>
> [ 144.484547] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
> [ 149.604641] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1459, emitted seq=1462
> [ 149.604779] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process amdgpu_test pid 2696 thread amdgpu_test pid 2696
> [ 149.604788] amdgpu 0000:0b:00.0: GPU reset begin!
> ...
>
> The kernel is at 7fd3b632e17e55c5ffd008f9f025754e7daa1b66 plus
> the patch of the original post of this thread (thus the "-dirty").
>
> Running the same test on the previous version of the kernel I was running,
> at 31866a9d7d40245316ad7c17b87961f68321cab8, succeeds as follows:
>
> Suite: Basic Tests
> Test: Command submission Test (GFX) ...passed
>
> Run Summary:    Type  Total    Ran Passed Failed Inactive
>               suites     11      0    n/a      0        0
>                tests     63      1      1      0        0
>              asserts 526725 526725 526725      0      n/a
>
> Elapsed time = 0.027 seconds
>
> Regards,
> Luben
>
> On 2020-02-19 4:40 p.m., Luben Tuikov wrote:
>> On 2020-02-19 9:44 a.m., Christian König wrote:
>>> Well it should apply on top of amd-staging-drm-next. But I haven't
>>> fetched that today yet.
>>>
>>> Give me a minute to rebase.
>>
>> This patch seems to have fixed the regression we saw yesterday.
>> It applies to amd-staging-drm-next with a small line offset:
>>
>> $ patch -p1 < /tmp/\[PATCH\]\ drm_amdgpu\:\ add\ VM\ update\ fences\ back\ to\ the\ root\ PD.eml
>> patching file amdgpu_vm.c
>> Hunk #2 succeeded at 1599 (offset -20 lines).
>>
>> I've been running 'glxgears' on the root window and 'pinion'
>> with no problems and a clean log.
>>
>> Tested-by: Luben Tuikov <luben.tuikov at amd.com>
>>
>> Regards,
>> Luben
>>
>>>
>>> Christian.
>>>
>>> On 19.02.20 at 15:27, Tom St Denis wrote:
>>>> This doesn't apply on top of 7fd3b632e17e55c5ffd008f9f025754e7daa1b66
>>>> which is the tip of drm-next
>>>>
>>>>
>>>> Tom
>>>>
>>>> On 2020-02-19 9:20 a.m., Christian König wrote:
>>>>> Add update fences to the root PD while mapping BOs.
>>>>>
>>>>> Otherwise PDs freed during the mapping won't wait for
>>>>> updates to finish and can cause corruptions.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 ++++++++++++--
>>>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> index e7ab0c1e2793..dd63ccdbad2a 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> @@ -585,8 +585,8 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
>>>>> {
>>>>> entry->priority = 0;
>>>>> entry->tv.bo = &vm->root.base.bo->tbo;
>>>>> - /* One for TTM and one for the CS job */
>>>>> - entry->tv.num_shared = 2;
>>>>> + /* Two for VM updates, one for TTM and one for the CS job */
>>>>> + entry->tv.num_shared = 4;
>>>>> entry->user_pages = NULL;
>>>>> list_add(&entry->tv.head, validated);
>>>>> }
>>>>> @@ -1619,6 +1619,16 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>> goto error_unlock;
>>>>> }
>>>>> + if (flags & AMDGPU_PTE_VALID) {
>>>>> + struct amdgpu_bo *root = vm->root.base.bo;
>>>>> +
>>>>> + if (!dma_fence_is_signaled(vm->last_direct))
>>>>> + amdgpu_bo_fence(root, vm->last_direct, true);
>>>>> +
>>>>> + if (!dma_fence_is_signaled(vm->last_delayed))
>>>>> + amdgpu_bo_fence(root, vm->last_delayed, true);
>>>>> + }
>>>>> +
>>>>> r = vm->update_funcs->prepare(&params, resv, sync_mode);
>>>>> if (r)
>>>>> goto error_unlock;
>>>
>>
>
>
>