amdgpu freezes kernel after kernel 5.7.6 changes

roland.rucky at gmail.com roland.rucky at gmail.com
Sat Jun 27 16:03:56 UTC 2020


Not sure if I am contacting the correct person,

Since I updated to kernel 5.7.6, my system started to freeze randomly.
After a couple of freezes, I noticed, that they always happen when
playing games, or during videoplayback in e.g. firefox.

I reverted to the previous kernel 5.7.5, and all issues are gone. Next
I started to revert and test single commits between the two kernel
versions, which affect amdgpu. If I revert the changes listed below,
the kernel does not freeze any more.

Sadly I can`t get any crash reports / logs. Even the magic sysrq key
does not work, when the system is frozen.

I will also attach a patch, which includes all reverted commits.


List of changes I reverted:
-----------------------------------------------------------------------

commit 6674508ba1a2ea6caca5de2bcb25bc00a050fd0a
Author: Harry Wentland <harry.wentland at amd.com>
Date:   Thu May 28 09:44:44 2020 -0400

    Revert "drm/amd/display: disable dcn20 abm feature for bring up"

    commit 14ed1c908a7a623cc0cbf0203f8201d1b7d31d16 upstream.

    This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.

    This change was used for DCN2 bringup and is no longer desired.
    In fact it breaks backlight on DCN2 systems.

    Cc: Alexander Monakov <amonakov at ispras.ru>
    Cc: Hersen Wu <hersenxs.wu at amd.com>
    Cc: Anthony Koo <Anthony.Koo at amd.com>
    Cc: Michael Chiu <Michael.Chiu at amd.com>
    Signed-off-by: Harry Wentland <harry.wentland at amd.com>
    Acked-by: Alex Deucher <alexander.deucher at amd.com>
    Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
    Reported-and-tested-by: Alexander Monakov <amonakov at ispras.ru>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
    Cc: stable at vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 7fc15b82fe48..f9f02e08054b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1334,7 +1334,7 @@ static int dm_late_init(void *handle)
 	unsigned int linear_lut[16];
 	int i;
 	struct dmcu *dmcu = adev->dm.dc->res_pool->dmcu;
-	bool ret = false;
+	bool ret;

 	for (i = 0; i < 16; i++)
 		linear_lut[i] = 0xFFFF * i / 15;
@@ -1350,13 +1350,10 @@ static int dm_late_init(void *handle)
 	 */
 	params.min_abm_backlight = 0x28F;

-	/* todo will enable for navi10 */
-	if (adev->asic_type <= CHIP_RAVEN) {
-		ret = dmcu_load_iram(dmcu, params);
+	ret = dmcu_load_iram(dmcu, params);

-		if (!ret)
-			return -EINVAL;
-	}
+	if (!ret)
+		return -EINVAL;

 	return detect_mst_link_for_all_connectors(adev->ddev);
 }

commit fba8f9ef7e1405ee6f422beb874791e8a5eb489c
Author: Alex Deucher <alexander.deucher at amd.com>
Date:   Tue Jun 2 17:22:48 2020 -0400

    drm/amdgpu/display: use blanked rather than plane state for sync
groups

    commit b7f839d292948142eaab77cedd031aad0bfec872 upstream.

    We may end up with no planes set yet, depending on the ordering,
but we
    should have the proper blanking state which is either handled by
either
    DPG or TG depending on the hardware generation.  Check both to
determine
    the proper blanked state.

    Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
    Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is
synced by plane state")
    Cc: nicholas.kazlauskas at amd.com
    Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
    Cc: stable at vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 4a619328101c..4acaf4be8a81 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1011,9 +1011,17 @@ static void program_timing_sync(
 			}
 		}

-		/* set first pipe with plane as master */
+		/* set first unblanked pipe as master */
 		for (j = 0; j < group_size; j++) {
-			if (pipe_set[j]->plane_state) {
+			bool is_blanked;
+
+			if (pipe_set[j]->stream_res.opp->funcs-
>dpg_is_blanked)
+				is_blanked =
+					pipe_set[j]->stream_res.opp-
>funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
+			else
+				is_blanked =
+					pipe_set[j]->stream_res.tg-
>funcs->is_blanked(pipe_set[j]->stream_res.tg);
+			if (!is_blanked) {
 				if (j == 0)
 					break;

@@ -1034,9 +1042,17 @@ static void program_timing_sync(
 				status->timing_sync_info.master =
false;

 		}
-		/* remove any other pipes with plane as they have
already been synced */
+		/* remove any other unblanked pipes as they have
already been synced */
 		for (j = j + 1; j < group_size; j++) {
-			if (pipe_set[j]->plane_state) {
+			bool is_blanked;
+
+			if (pipe_set[j]->stream_res.opp->funcs-
>dpg_is_blanked)
+				is_blanked =
+					pipe_set[j]->stream_res.opp-
>funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
+			else
+				is_blanked =
+					pipe_set[j]->stream_res.tg-
>funcs->is_blanked(pipe_set[j]->stream_res.tg);
+			if (!is_blanked) {
 				group_size--;
 				pipe_set[j] = pipe_set[group_size];
 				j--;

commit b5232e2ee8df85891514c73472cac09921e5d51d
Author: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
Date:   Tue Jun 2 20:42:33 2020 -0400

    drm/amd/display: Revalidate bandwidth before commiting DC updates

    [ Upstream commit a24eaa5c51255b344d5a321f1eeb3205f2775498 ]

    [Why]
    Whenever we switch between tiled formats without also switching
pixel
    formats or doing anything else that recreates the DC plane state we
    can run into underflow or hangs since we're not updating the
    DML parameters before committing to the hardware.

    [How]
    If the update type is FULL then call validate_bandwidth again to
update
    the DML parmeters before committing the state.

    This is basically just a workaround and protective measure against
    update types being added DC where we could run into this issue in
    the future.

    We can only fully validate the state in advance before applying it
to
    the hardware if we recreate all the plane and stream states since
    we can't modify what's currently in use.

    The next step is to update DM to ensure that we're creating the
plane
    and stream states for whatever could potentially be a full update
in
    DC to pre-emptively recreate the state for DC global validation.

    The workaround can stay until this has been fixed in DM.

    Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
    Reviewed-by: Hersen Wu <hersenxs.wu at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
    Signed-off-by: Sasha Levin <sashal at kernel.org>

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 47431ca6986d..4a619328101c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2517,6 +2517,12 @@ void dc_commit_updates_for_stream(struct dc *dc,

 	copy_stream_update_to_stream(dc, context, stream,
stream_update);

+	if (!dc->res_pool->funcs->validate_bandwidth(dc, context,
false)) {
+		DC_ERROR("Mode validation failed for stream
update!\n");
+		dc_release_state(context);
+		return;
+	}
+
 	commit_planes_for_stream(
 				dc,
 				srf_updates,

Details:

* Kernel: 5.7.6
* GPU: radeon 5700XT
* CPU: ryzen 3800X
* running on swaywm(wayland)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: revert-amdgpu-5.7.6.patch
Type: text/x-patch
Size: 2822 bytes
Desc: 
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20200627/6444efb5/attachment.bin>


More information about the dri-devel mailing list