[RESEND v3 2/2] drm: Add CONFIG_DRM_WERROR

Nathan Chancellor nathan at kernel.org
Wed Mar 27 16:33:31 UTC 2024


On Wed, Mar 27, 2024 at 09:59:01AM +0200, Jani Nikula wrote:
> On Wed, 27 Mar 2024, Maxime Ripard <mripard at kernel.org> wrote:
> > On Tue, Mar 26, 2024 at 03:56:50PM -0700, Nathan Chancellor wrote:
> >> On Tue, Mar 05, 2024 at 11:07:36AM +0200, Jani Nikula wrote:
> >> > Add kconfig to enable -Werror subsystem wide. This is useful for
> >> > development and CI to keep the subsystem warning free, while avoiding
> >> > issues outside of the subsystem that kernel wide CONFIG_WERROR=y might
> >> > hit.
> >> > 
> >> > v2: Don't depend on COMPILE_TEST
> >> > 
> >> > Reviewed-by: Hamza Mahfooz <hamza.mahfooz at amd.com> # v1
> >> > Signed-off-by: Jani Nikula <jani.nikula at intel.com>
> >> > ---
> >> >  drivers/gpu/drm/Kconfig  | 13 +++++++++++++
> >> >  drivers/gpu/drm/Makefile |  3 +++
> >> >  2 files changed, 16 insertions(+)
> >> > 
> >> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> >> > index 6e853acf15da..c08e18108c2a 100644
> >> > --- a/drivers/gpu/drm/Kconfig
> >> > +++ b/drivers/gpu/drm/Kconfig
> >> > @@ -416,3 +416,16 @@ config DRM_LIB_RANDOM
> >> >  config DRM_PRIVACY_SCREEN
> >> >  	bool
> >> >  	default n
> >> > +
> >> > +config DRM_WERROR
> >> > +	bool "Compile the drm subsystem with warnings as errors"
> >> > +	depends on EXPERT
> >> > +	default n
> >> > +	help
> >> > +	  A kernel build should not cause any compiler warnings, and this
> >> > +	  enables the '-Werror' flag to enforce that rule in the drm subsystem.
> >> > +
> >> > +	  The drm subsystem enables more warnings than the kernel default, so
> >> > +	  this config option is disabled by default.
> >> > +
> >> > +	  If in doubt, say N.
> >> 
> >> While I understand the desire for an easy switch that maintainers and
> >> developers can use to ensure that their changes are warning free for the
> >> drm subsystem specifically, I think subsystem specific configuration
> >> options like this are actively detrimental to developers and continuous
> >> integration systems that build test the entire kernel. For example, we
> >> turned off CONFIG_WERROR for our Hexagon builds because of warnings that
> >> appear with -Wextra that are legitimate but require treewide changes to
> >> resolve in a manner sufficient for Linus:
> >> 
> >> https://github.com/ClangBuiltLinux/linux/issues/1285
> >> https://lore.kernel.org/all/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
> >> https://lore.kernel.org/all/20230522105049.1467313-1-schnelle@linux.ibm.com/
> >> 
> >> But now, due to CONFIG_DRM_WERROR getting enabled by all{mod,yes}config
> >> and -Wextra being unconditionally enabled for DRM, those warnings hard
> >> break the build despite CONFIG_WERROR=n...
> >
> > Would making DRM_WERROR depends on WERROR address your concerns?
> 
> But then what would be the point of having DRM_WERROR at all? For me the
> point is, "werror in drm, ignore the rest, they're someone else's
> problem".

Right, I do think this is a valid view point and one I am sympathetic
to, especially since it is in the pursuit of increased code quality. I
do not want to disrupt that.

> An alternative would be to "depends on !COMPILE_TEST" that we have in
> i915, but then some folks want to have COMPILE_TEST in drm, because some
> drivers are otherwise hard for people to build.

Right. I think it is unfortunate how (at least in my opinion)
CONFIG_COMPILE_TEST has two meanings: genuinely just compile testing or
"allmodconfig". For the first case, we would want CONFIG_DRM_WERROR=y
but for the second case, it would be nice for CONFIG_DRM_WERROR to
default to off (because CONFIG_WERROR is enabled) but allow developers
to turn it on explicitly.

Another lofty/wistful idea to solve this would be to implement something
similar to compiler diagnostic groups for Kconfig, where there would be
a hierarchy like

  CONFIG_WERROR
    CONFIG_DRM_WERROR
    CONFIG_SUBSYSTEM_A_WERROR
    CONFIG_SUBSYSTEM_B_WERROR

where the value of CONFIG_WERROR is the same value for all
subconfigurations under it but they could still be enabled individually
without any additional dependencies (ala something like '-Wno-unused
-Wunused-variable'), which would allow my use case of CONFIG_WERROR=n
removing all instances of -Werror to continue to work but allow other
developers and CI systems to just set their specific -Werror
configuration and be done with it. I don't think something like that
exists but maybe I don't know Kconfig as well as I think I do :)

> Nathan, we do want to fix any issues switfly. Are you hitting specific
> build problems?

Yes, I see three distinct set of problems from our CI as a direct result
of this series. I already covered two in the prior mail but I'll be a
little more expansive below.

1. Instances of -Wunused-but-set-variable from variables that only have
   unary operations applied to them. Clang can warn in this case where
   GCC cannot: https://godbolt.org/z/d368q3coP

      int main(void)
      {
          int a = 0;
          a++;
          return 0;
      }

   which shows up in a few drm drivers. Most have a patch on the mailing
   list that has not been applied.

     drivers/gpu/drm/qxl/qxl_cmd.c:424:6: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable]
       424 |         int count = 0;
           |             ^
     https://lore.kernel.org/all/20230408165023.2706235-1-trix@redhat.com/ (almost a year old)

     drivers/gpu/drm/qxl/qxl_ioctl.c:148:14: error: variable 'num_relocs' set but not used [-Werror,-Wunused-but-set-variable]
       148 |         int i, ret, num_relocs;
           |                     ^
     https://lore.kernel.org/all/20240307104119.1980621-1-colin.i.king@gmail.com/

     drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:843:6: error: variable 'out' set but not used [-Werror,-Wunused-but-set-variable]
       843 |         u64 out = dumper->iova + A6XX_CD_DATA_OFFSET;
           |             ^
     https://lore.kernel.org/all/20240326212324.185832-1-ojeda@kernel.org/ (recent patch)

     drivers/gpu/drm/panthor/panthor_sched.c:2048:6: error: variable 'csg_mod_mask' set but not used [-Werror,-Wunused-but-set-variable]
      2048 |         u32 csg_mod_mask = 0, free_csg_slots = 0;
           |             ^
     No patch, new driver, not reported yet it seems.

2. High stack usage in AMDGPU files for ARCH=powerpc allmodconfig. This
   might be a compiler issue but until now, there have been more
   important fires.

     drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: error: stack frame size (2064) exceeds limit (2048) in 'vcn_v3_0_start' [-Werror,-Wframe-larger-than]
      1096 | static int vcn_v3_0_start(struct amdgpu_device *adev)
           |            ^

     drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: error: stack frame size (2112) exceeds limit (2048) in 'vcn_v4_0_5_start' [-Werror,-Wframe-larger-than]
       955 | static int vcn_v4_0_5_start(struct amdgpu_device *adev)
           |            ^

     drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: error: stack frame size (2992) exceeds limit (2048) in 'vcn_v5_0_0_start' [-Werror,-Wframe-larger-than]
       713 | static int vcn_v5_0_0_start(struct amdgpu_device *adev)
           |            ^

   Taking a brief look at it while writing this email, it appears
   related to CONFIG_UBSAN_BOUNDS, as none of the warnings appear when
   that is disabled on top of allmodconfig. I suspect that the sanitizer
   instrumentation and inlining might be messing something up here, it
   has happened with other sanitizers like KASAN and KCSAN in the past.
   Without CONFIG_UBSAN_BOUNDS, the stack usage of these functions does
   not seem too bad:

     drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: warning: stack frame size (816) exceeds limit (512) in 'vcn_v3_0_start' [-Wframe-larger-than]
     drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: warning: stack frame size (800) exceeds limit (512) in 'vcn_v4_0_5_start' [-Wframe-larger-than]
     drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: warning: stack frame size (1040) exceeds limit (512) in 'vcn_v5_0_0_start' [-Wframe-larger-than]

3. -Wnull-pointer-arithmetic from IO port accessors on architectures
   that do not have them (such as hexagon and s390). For example:

     In file included from drivers/gpu/drm/virtio/virtgpu_plane.c:26:
     In file included from include/drm/drm_atomic_helper.h:31:
     In file included from include/drm/drm_crtc.h:32:
     In file included from include/drm/drm_modes.h:33:
     In file included from include/drm/drm_connector.h:32:
     In file included from include/drm/drm_util.h:35:
     In file included from include/linux/interrupt.h:11:
     In file included from include/linux/hardirq.h:11:
     In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
     In file included from include/asm-generic/hardirq.h:17:
     In file included from include/linux/irq.h:20:
     In file included from include/linux/io.h:13:
     In file included from arch/hexagon/include/asm/io.h:328:
     include/asm-generic/io.h:584:33: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       584 |         __raw_writeb(value, PCI_IOBASE + addr);
           |                             ~~~~~~~~~~ ^
     include/asm-generic/io.h:594:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
           |                                                       ~~~~~~~~~~ ^
     include/asm-generic/io.h:604:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
       604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
           |                                                       ~~~~~~~~~~ ^

   These warnings are numerous throughout drivers/gpu/drm/ because this
   warning is enabled with -Wextra. Again, this is not exactly your
   problem and it should eventually be fixed by [1] (it appears that
   Niklas is working on a new version at [2]) but it is exacerbated by
   the default combo of W=1 + -Werror for DRM with allmodconfig now,
   even with CONFIG_WERROR=n.

Hopefully that helps clear things up. I am more than happy to send
patches or work towards solutions that satisfies everyone (or at least a
majority/consensus). Wider testing with clang never hurts as well but I
understand increasing build matrices is not always an easy sell.

[1]: https://lore.kernel.org/all/20230522105049.1467313-45-schnelle@linux.ibm.com/
[2]: https://git.kernel.org/niks/l/has_ioport_v6

Cheers,
Nathan


More information about the Intel-gfx mailing list