CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
Paul E. McKenney
paulmck at kernel.org
Tue Jun 28 18:54:37 UTC 2022
On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote:
> Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am:
> > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote:
> >> Ah, I see. I have selected the default value for
> >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not
> >> using Android; I'm not sure there exist Android devices with AMD GPUs.
> >> However, I have set CONFIG_ANDROID=y in order to use
> >> ANDROID_BINDER_IPC=m for emulation.
> >>
> >> In general, I think CONFIG_ANDROID is not a reliable method for
> >> detecting if the kernel is for an Android device; for example, Fedora
> >> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with
> >> Android userspace.
> >>
> >> On the other hand, it's not clear to me why the value 20 should be for
> >> Android only anyways. If, as you say in
> >> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/,
> >> it is related to the size of the system, perhaps some other heuristic
> >> would be more appropriate.
> >
> > It is related to the fact that quite a few Android guys want these
> > 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one
> > else does. Not yet anyway.
> >
> > And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely
> > straightforward and unmistakeable. So perhaps people not running Android
> > devices but wanting a little bit of the Android functionality should do
> > something other than setting CONFIG_ANDROID=y in their .config files. Me,
> > I am surprised that it took this long for something like this to bite you.
> >
> > But just out of curiosity, what would you suggest instead?
>
> Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If
> major distro vendors are consistently making this "mistake", then
> perhaps the problem is elsewhere.
>
> In my own opinion, assuming that binderfs means Android vendor is not a
> good assumption. The ANDROID help says:
>
> > Enable support for various drivers needed on the Android platform
>
> It doesn't say "Enable only if building an Android device", or "Enable
> only if you are Google". Isn't the traditional Linux philosophy a
> collection of pieces to be assembled, without gratuitous hidden
> dependencies? For example, [0] removes the unnecessary Android
> dependency, it doesn't block the whole thing with "depends on ANDROID".
>
> It seems to me that the proper way to set some configuration for Android
> kernels is or should be to ask the Android kernel config maintainers,
> not to set it based on an upstream kernel option. There is, after all,
> no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA.
>
> WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as
> rcu, there to see if suspends are "frequent". This seems dubious for the
> same reasons.
>
> I wonder if it might be time to retire CONFIG_ANDROID: the only
> remaining driver covered is binder, which originates from Android but
> is no longer used exclusively on Android systems. Like ufs-qcom, binder
> is no longer used exclusively on Android devices; it is also used for
> Android device emulators, which might be used on Android-like mobile
> devices, or might not.
>
> My understanding is that both Android and upstream kernel developers
> intend to add no more Android-specific drivers, so binder should be the
> only one covered for the foreseeable future.
Thank you for the perspective, but you never did suggest an alternative.
So here is what I suggest given the current setup:

config RCU_EXP_CPU_STALL_TIMEOUT
	int "Expedited RCU CPU stall timeout in milliseconds"
	depends on RCU_STALL_COMMON
	range 0 21000
	default 20 if ANDROID
	default 0 if !ANDROID
	help
	  If a given expedited RCU grace period extends more than the
	  specified number of milliseconds, a CPU stall warning is printed.
	  If the RCU grace period persists, additional CPU stall warnings
	  are printed at more widely spaced intervals.  A value of zero
	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
	  seconds to milliseconds.

The default, and only the default, is controlled by ANDROID.
All you need to do to get the previous behavior is to add something like
this to your defconfig file:
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
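
For illustration only, here is a rough sketch of how the help text above resolves the value. This is not the kernel's code: the function name and the EXP_STALL_MAX_MS macro are made up for this example, and the 21-second figure used for RCU_CPU_STALL_TIMEOUT in the note below is an assumption about the non-expedited setting.

```c
#include <assert.h>

/* Hypothetical helper mirroring the Kconfig help text; not kernel code. */
#define EXP_STALL_MAX_MS 21000	/* upper bound from "range 0 21000" */

/*
 * exp_timeout_ms:  CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, in milliseconds
 * stall_timeout_s: CONFIG_RCU_CPU_STALL_TIMEOUT, in seconds
 */
static int effective_exp_stall_timeout_ms(int exp_timeout_ms,
					  int stall_timeout_s)
{
	/* Kconfig would reject anything outside the declared range. */
	assert(exp_timeout_ms >= 0 && exp_timeout_ms <= EXP_STALL_MAX_MS);

	/* Zero means: use the normal stall timeout, seconds -> ms. */
	if (exp_timeout_ms == 0)
		return stall_timeout_s * 1000;

	/* Any nonzero value is used directly as milliseconds. */
	return exp_timeout_ms;
}
```

Under that assumption, a value of 0 with a 21-second RCU_CPU_STALL_TIMEOUT and an explicit 21000 both come out to the same 21000 milliseconds, while the ANDROID default of 20 stays at 20 milliseconds.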
Any reason why this will not work for you?
> > For that matter, why the private reply?
>
> Mail client issues, not intentional. Lists re-added, plus Android,
> WireGuard, and random.
Thank you!
Thanx, Paul
> Thanks,
> Alex.
>
> [0] https://lore.kernel.org/all/20220321151853.24138-1-krzk@kernel.org/