Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1: - Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
Several DRM/KMS atomic commits can run in parallel if they affect different CRTC. These commits share the global HVS state, so we have some code to make sure we run commits in sequence. This synchronization code is one of the first thing that runs in vc4_atomic_commit_tail().
Another constraints we have is that we need to make sure the HVS clock gets a boost during the commit. That code relies on clk_set_min_rate and will remove the old minimum and set a new one. We also need another, temporary, minimum for the duration of the commit.
The algorithm is thus to set a temporary minimum, drop the previous one, do the commit, and finally set the minimum for the current mode.
However, the part that sets the temporary minimum and drops the older one runs before the commit synchronization code.
Thus, under the proper conditions, we can end up mixing up the minimums and ending up with the wrong one for our current step.
To avoid it, let's move the clock setup in the protected section.
Fixes: d7d96c00e585 ("drm/vc4: hvs: Boost the core clock during modeset") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index f0b3e4cf5bce..764ddb41a4ce 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -353,9 +353,6 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) vc4_hvs_mask_underrun(dev, vc4_crtc_state->assigned_channel); }
- if (vc4->hvs->hvs5) - clk_set_min_rate(hvs->core_clk, 500000000); - old_hvs_state = vc4_hvs_get_old_global_state(state); if (!old_hvs_state) return; @@ -377,6 +374,9 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) drm_err(dev, "Timed out waiting for commit\n"); }
+ if (vc4->hvs->hvs5) + clk_set_min_rate(hvs->core_clk, 500000000); + drm_atomic_helper_commit_modeset_disables(dev, state);
vc4_ctm_commit(vc4, state);
The HVS global state functions return an error pointer, but in most cases we check if it's NULL, possibly resulting in an invalid pointer dereference.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 764ddb41a4ce..3f780c195749 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -354,7 +354,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) }
old_hvs_state = vc4_hvs_get_old_global_state(state); - if (!old_hvs_state) + if (IS_ERR(old_hvs_state)) return;
for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { @@ -410,8 +410,8 @@ static int vc4_atomic_commit_setup(struct drm_atomic_state *state) unsigned int i;
hvs_state = vc4_hvs_get_new_global_state(state); - if (!hvs_state) - return -EINVAL; + if (WARN_ON(IS_ERR(hvs_state))) + return PTR_ERR(hvs_state);
for_each_new_crtc_in_state(state, crtc, crtc_state, i) { struct vc4_crtc_state *vc4_crtc_state = @@ -762,8 +762,8 @@ static int vc4_pv_muxing_atomic_check(struct drm_device *dev, unsigned int i;
hvs_new_state = vc4_hvs_get_global_state(state); - if (!hvs_new_state) - return -EINVAL; + if (IS_ERR(hvs_new_state)) + return PTR_ERR(hvs_new_state);
for (i = 0; i < ARRAY_SIZE(hvs_new_state->fifo_state); i++) if (!hvs_new_state->fifo_state[i].in_use)
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a global state for the HVS, with each FIFO storing the current CRTC commit so that we can properly synchronize commits.
However, the refcounting was off and we thus ended up leaking the drm_crtc_commit structure every commit. Add a drm_crtc_commit_put to prevent the leakage.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 3f780c195749..7c1d0c3beba2 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -361,6 +361,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) struct vc4_crtc_state *vc4_crtc_state = to_vc4_crtc_state(old_crtc_state); unsigned int channel = vc4_crtc_state->assigned_channel; + struct drm_crtc_commit *commit; int ret;
if (channel == VC4_HVS_CHANNEL_DISABLED) @@ -369,9 +370,15 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) if (!old_hvs_state->fifo_state[channel].in_use) continue;
- ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[channel].pending_commit); + commit = old_hvs_state->fifo_state[channel].pending_commit; + if (!commit) + continue; + + ret = drm_crtc_commit_wait(commit); if (ret) drm_err(dev, "Timed out waiting for commit\n"); + + drm_crtc_commit_put(commit); }
if (vc4->hvs->hvs5)
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a wait on the previous commit done on a given HVS FIFO.
However, we never cleared that pointer once done. Since drm_crtc_commit_put can free the drm_crtc_commit structure directly if we were the last user, this means that it can lead to a use-after free if we were to duplicate the state, and that stale pointer would even be copied to the new state.
Set the pointer to NULL once we're done with the wait so that we don't carry over a pointer to a free'd structure.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 7c1d0c3beba2..f80370e87e98 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -379,6 +379,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) drm_err(dev, "Timed out waiting for commit\n");
drm_crtc_commit_put(commit); + old_hvs_state->fifo_state[channel].pending_commit = NULL; }
if (vc4->hvs->hvs5)
Our HVS global state, when duplicated, will also copy the pointer to the drm_crtc_commit (and increase the reference count) for each FIFO if the pointer is not NULL.
However, our atomic_setup function will overwrite that pointer without putting the reference back leading to a memory leak.
Since the commit is only relevant during the atomic commit process, it doesn't make sense to duplicate the reference to the commit anyway. Let's remove it.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index f80370e87e98..d9b3e3ad71ea 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -676,12 +676,6 @@ vc4_hvs_channels_duplicate_state(struct drm_private_obj *obj)
for (i = 0; i < HVS_NUM_CHANNELS; i++) { state->fifo_state[i].in_use = old_state->fifo_state[i].in_use; - - if (!old_state->fifo_state[i].pending_commit) - continue; - - state->fifo_state[i].pending_commit = - drm_crtc_commit_get(old_state->fifo_state[i].pending_commit); }
return &state->base;
Our current code is supposed to serialise the commits by waiting for all the drm_crtc_commits associated to the previous HVS state.
However, assuming we have two CRTCs running and being configured and we configure each one alternatively, we end up in a situation where we're not waiting at all.
Indeed, starting with a state (state 0) where both CRTCs are running, and doing a commit (state 1) on the first CRTC (CRTC 0), we'll associate its commit to its assigned FIFO in vc4_hvs_state.
If we get a new commit (state 2), this time affecting the second CRTC (CRTC 1), the DRM core will allow both commits to execute in parallel (assuming they don't have any share resources).
Our code in vc4_atomic_commit_tail is supposed to make sure we only get one commit at a time and serialised by order of submission. It does so by using for_each_old_crtc_in_state, making sure that the CRTC has a FIFO assigned, is used, and has a commit pending. If it does, then we'll wait for the commit before going forward.
During the transition from state 0 to state 1, as our old CRTC state we get the CRTC 0 state 0, its commit, we wait for it, everything works fine.
During the transition from state 1 to state 2 though, the use of for_each_old_crtc_in_state is wrong. Indeed, while the code assumes it's returning the state of the CRTC in the old state (so CRTC 0 state 1), it actually returns the old state of the CRTC affected by the current commit, so CRTC 0 state 0 since it wasn't part of state 1.
Due to this, if we alternate between the configuration of CRTC 0 and CRTC 1, we never actually wait for anything since we should be waiting on the other every time, but it never is affected by the previous commit.
Change the logic to, at every commit, look at every FIFO in the previous HVS state, and if it's in use and has a commit associated to it, wait for that commit.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index d9b3e3ad71ea..b61792d2aa65 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -337,10 +337,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) struct drm_device *dev = state->dev; struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs; - struct drm_crtc_state *old_crtc_state; struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; struct vc4_hvs_state *old_hvs_state; + unsigned int channel; int i;
for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { @@ -357,16 +357,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) if (IS_ERR(old_hvs_state)) return;
- for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { - struct vc4_crtc_state *vc4_crtc_state = - to_vc4_crtc_state(old_crtc_state); - unsigned int channel = vc4_crtc_state->assigned_channel; + for (channel = 0; channel < HVS_NUM_CHANNELS; channel++) { struct drm_crtc_commit *commit; int ret;
- if (channel == VC4_HVS_CHANNEL_DISABLED) - continue; - if (!old_hvs_state->fifo_state[channel].in_use) continue;
Hi Maxime
On Wed, 17 Nov 2021 at 09:45, Maxime Ripard maxime@cerno.tech wrote:
Our current code is supposed to serialise the commits by waiting for all the drm_crtc_commits associated to the previous HVS state.
However, assuming we have two CRTCs running and being configured and we configure each one alternatively, we end up in a situation where we're
s/alternatively/alternately
Otherwise the series looks fine to the extent that I understand the issues. So the series is
Reviewed-by: Dave Stevenson dave.stevenson@raspberrypi.com
not waiting at all.
Indeed, starting with a state (state 0) where both CRTCs are running, and doing a commit (state 1) on the first CRTC (CRTC 0), we'll associate its commit to its assigned FIFO in vc4_hvs_state.
If we get a new commit (state 2), this time affecting the second CRTC (CRTC 1), the DRM core will allow both commits to execute in parallel (assuming they don't have any share resources).
Our code in vc4_atomic_commit_tail is supposed to make sure we only get one commit at a time and serialised by order of submission. It does so by using for_each_old_crtc_in_state, making sure that the CRTC has a FIFO assigned, is used, and has a commit pending. If it does, then we'll wait for the commit before going forward.
During the transition from state 0 to state 1, as our old CRTC state we get the CRTC 0 state 0, its commit, we wait for it, everything works fine.
During the transition from state 1 to state 2 though, the use of for_each_old_crtc_in_state is wrong. Indeed, while the code assumes it's returning the state of the CRTC in the old state (so CRTC 0 state 1), it actually returns the old state of the CRTC affected by the current commit, so CRTC 0 state 0 since it wasn't part of state 1.
Due to this, if we alternate between the configuration of CRTC 0 and CRTC 1, we never actually wait for anything since we should be waiting on the other every time, but it never is affected by the previous commit.
Change the logic to, at every commit, look at every FIFO in the previous HVS state, and if it's in use and has a commit associated to it, wait for that commit.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech
drivers/gpu/drm/vc4/vc4_kms.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index d9b3e3ad71ea..b61792d2aa65 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -337,10 +337,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) struct drm_device *dev = state->dev; struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs;
struct drm_crtc_state *old_crtc_state; struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; struct vc4_hvs_state *old_hvs_state;
unsigned int channel; int i; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
@@ -357,16 +357,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) if (IS_ERR(old_hvs_state)) return;
for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
struct vc4_crtc_state *vc4_crtc_state =
to_vc4_crtc_state(old_crtc_state);
unsigned int channel = vc4_crtc_state->assigned_channel;
for (channel = 0; channel < HVS_NUM_CHANNELS; channel++) { struct drm_crtc_commit *commit; int ret;
if (channel == VC4_HVS_CHANNEL_DISABLED)
continue;
if (!old_hvs_state->fifo_state[channel].in_use) continue;
-- 2.33.1
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214991
Tested-by: Jian-Hong Pan jhp@endlessos.org
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214991
Tested-by: Jian-Hong Pan jhp@endlessos.org
Thanks! Maxime
Maxime Ripard maxime@cerno.tech 於 2021年11月18日 週四 下午6:40寫道:
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
No. It does not appear anymore.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
I get the RPi boot firmware files from raspberrypi's repository at tag 1.20211029 [1] And, make system boots: RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
The config.txt only has: enable_uart=1 arm_64bit=1 kernel=uboot.bin
This bug can be reproduced with es2gears command easily. May need to wait it running a while.
Mesa: 21.2.2 libdrm: 2.4.107 xserver/wayland: 1.20.11 Using wayland
This bug happens with direct boot path as well: RPi firmware -> Linux kernel (aarch64) with corresponding DTB
I tried with Endless OS and Ubuntu's RPi 4B images. An easy setup is using Ubuntu 21.10 RPi 4B image [2]. Then, replace the kernel & device tree blob and edit the config.txt.
[1] https://github.com/raspberrypi/firmware/tree/1.20211029/boot [2] https://ubuntu.com/download/raspberry-pi
Jian-Hong Pan
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214991
Tested-by: Jian-Hong Pan jhp@endlessos.org
Thanks! Maxime
On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月18日 週四 下午6:40寫道:
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
No. It does not appear anymore.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
I get the RPi boot firmware files from raspberrypi's repository at tag 1.20211029 [1] And, make system boots: RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
The config.txt only has: enable_uart=1 arm_64bit=1 kernel=uboot.bin
This bug can be reproduced with es2gears command easily. May need to wait it running a while.
Mesa: 21.2.2 libdrm: 2.4.107 xserver/wayland: 1.20.11 Using wayland
This bug happens with direct boot path as well: RPi firmware -> Linux kernel (aarch64) with corresponding DTB
I tried with Endless OS and Ubuntu's RPi 4B images. An easy setup is using Ubuntu 21.10 RPi 4B image [2]. Then, replace the kernel & device tree blob and edit the config.txt.
Does it still appear if you raise the core clock in the config.txt file using: core_freq_min=600 ?
Thanks! Maxime
Maxime Ripard maxime@cerno.tech 於 2021年11月26日 週五 下午11:45寫道:
On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月18日 週四 下午6:40寫道:
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
No. It does not appear anymore.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
I get the RPi boot firmware files from raspberrypi's repository at tag 1.20211029 [1] And, make system boots: RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
The config.txt only has: enable_uart=1 arm_64bit=1 kernel=uboot.bin
This bug can be reproduced with es2gears command easily. May need to wait it running a while.
Mesa: 21.2.2 libdrm: 2.4.107 xserver/wayland: 1.20.11 Using wayland
This bug happens with direct boot path as well: RPi firmware -> Linux kernel (aarch64) with corresponding DTB
I tried with Endless OS and Ubuntu's RPi 4B images. An easy setup is using Ubuntu 21.10 RPi 4B image [2]. Then, replace the kernel & device tree blob and edit the config.txt.
Does it still appear if you raise the core clock in the config.txt file using: core_freq_min=600 ?
It still appears when I raise the core clock in the config.txt file: core_freq_min=600
Jian-Hong Pan
On Mon, Nov 29, 2021 at 04:31:57PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月26日 週五 下午11:45寫道:
On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月18日 週四 下午6:40寫道:
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Changes from v1:
- Prevent a null pointer dereference
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
No. It does not appear anymore.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
I get the RPi boot firmware files from raspberrypi's repository at tag 1.20211029 [1] And, make system boots: RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
The config.txt only has: enable_uart=1 arm_64bit=1 kernel=uboot.bin
This bug can be reproduced with es2gears command easily. May need to wait it running a while.
Mesa: 21.2.2 libdrm: 2.4.107 xserver/wayland: 1.20.11 Using wayland
This bug happens with direct boot path as well: RPi firmware -> Linux kernel (aarch64) with corresponding DTB
I tried with Endless OS and Ubuntu's RPi 4B images. An easy setup is using Ubuntu 21.10 RPi 4B image [2]. Then, replace the kernel & device tree blob and edit the config.txt.
Does it still appear if you raise the core clock in the config.txt file using: core_freq_min=600 ?
It still appears when I raise the core clock in the config.txt file: core_freq_min=600
That's weird, we had some issues like that already but could never point exactly what was going on.
Is testing the official raspberrypi kernel an option for you? If so, trying the same workload with fkms would certainly help
Maxime
Maxime Ripard maxime@cerno.tech 於 2021年12月3日 週五 下午10:03寫道:
On Mon, Nov 29, 2021 at 04:31:57PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月26日 週五 下午11:45寫道:
On Fri, Nov 19, 2021 at 06:24:34PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月18日 週四 下午6:40寫道:
On Thu, Nov 18, 2021 at 02:42:58PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月17日 週三 下午5:45寫道: > The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to > atomic helpers") introduced a number of issues in corner cases, most of them > showing themselves in the form of either a vblank timeout or use-after-free > error. > > These patches should fix most of them, some of them still being debugged. > > Maxime > > Changes from v1: > - Prevent a null pointer dereference > > Maxime Ripard (6): > drm/vc4: kms: Wait for the commit before increasing our clock rate > drm/vc4: kms: Fix return code check > drm/vc4: kms: Add missing drm_crtc_commit_put > drm/vc4: kms: Clear the HVS FIFO commit pointer once done > drm/vc4: kms: Don't duplicate pending commit > drm/vc4: kms: Fix previous HVS commit wait > > drivers/gpu/drm/vc4/vc4_kms.c | 42 ++++++++++++++++------------------- > 1 file changed, 19 insertions(+), 23 deletions(-)
I tested the v2 patches based on latest mainline kernel with RPi 4B. System can boot up into desktop environment.
So the thing that was broken initially isn't anymore?
No. It does not appear anymore.
Although it still hit the bug [1], which might be under debugging, I think these patches LGTM.
The vblank timeout stuff is a symptom of various different bugs. Can you share your setup, your config.txt, and what you're doing to trigger it?
I get the RPi boot firmware files from raspberrypi's repository at tag 1.20211029 [1] And, make system boots: RPi firmware -> U-Boot -> Linux kernel (aarch64) with corresponding DTB
The config.txt only has: enable_uart=1 arm_64bit=1 kernel=uboot.bin
This bug can be reproduced with es2gears command easily. May need to wait it running a while.
Mesa: 21.2.2 libdrm: 2.4.107 xserver/wayland: 1.20.11 Using wayland
This bug happens with direct boot path as well: RPi firmware -> Linux kernel (aarch64) with corresponding DTB
I tried with Endless OS and Ubuntu's RPi 4B images. An easy setup is using Ubuntu 21.10 RPi 4B image [2]. Then, replace the kernel & device tree blob and edit the config.txt.
Does it still appear if you raise the core clock in the config.txt file using: core_freq_min=600 ?
It still appears when I raise the core clock in the config.txt file: core_freq_min=600
That's weird, we had some issues like that already but could never point exactly what was going on.
Is testing the official raspberrypi kernel an option for you? If so, trying the same workload with fkms would certainly help
I tested the raspberrypi kernel on rpi-5.16.y branch at commit bcb52df6df52 ("xhci: add a quirk to work around a suspected cache bug on VLI controllers"). Also, enabled the fkms. So, vc4 and v3d are loaded. However, the "flip_done timed out" error does not appear like mainline kernel.
Jian-Hong Pan
On Wed, 17 Nov 2021 10:45:21 +0100, Maxime Ripard wrote:
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
[...]
Applied to drm/drm-misc (drm-misc-fixes).
Thanks! Maxime
dri-devel@lists.freedesktop.org