[Intel-gfx] [PATCH] drm: Reduce EDID warnings from DRM_ERROR to DRM_NOTE

Tue Feb 14 21:36:09 UTC 2017

On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote:
> On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote:
> >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote:
> >> > The warnings from parsing the EDID are not driver errors, but the
> >> > "normal but significant" conditions from the external device. As such,
> >> > they do not need the ferocity of an *ERROR*, but can use the less harsh
> >> > DRM_NOTE instead.
> >> >
> >> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >> > ---
> >> >  drivers/gpu/drm/drm_edid.c | 15 ++++++++-------
> >> >  1 file changed, 8 insertions(+), 7 deletions(-)
> >>
> >> The below are all conditions that happen when the EDID is bad. I'm not
> >> sure that really qualifies as "normal".
> >
> > Often it is - a bad EDID on the monitor will always be bad. The
> > challenge is distinguishing that from silent data corruption during the
> > read - a reported read failure is trivial.
> >
> >> From a quick look through the code we don't always trigger an error from
> >> the below failure paths at higher levels, so decreasing the level here
> >> has the potential to let this kind of exceptional condition go
> >> unnoticed.
> >
> > The messages are not gone, they are higher than the default loglevel,
> > but now below the level at which they are printed to a terminal. The
> > bad EDID is either expected or recoverable, and definitely not fatal
> > so I don't think an *ERROR* is justified.
> 
> I tend to agree.
> 
> The description for the KERN_NOTICE level is "normal but significant
> condition". I might argue that the presence of these EDID messages
> represents a normal *or* significant condition (depending on why the
> EDID is bad), but I don't think it's unreasonable to expect people to
> check their logs if the display/mode is not working properly.

So for cases where we know that there is shit hw out there (specifically
KVM switches that mangle the CEA block without adjusting the EDID) we
already tune down the error to debug level. So in principle I totally agree
with tuning down to info or debug anything that happens because it's
outside of our control, but do we still need this patch after the CEA one
has landed? Our CI at least seems happy ...
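
To make the loglevel argument in this thread concrete, here is a rough
userspace sketch of the rule the kernel applies when deciding whether a
message is echoed to the console: a message is shown only if its level is
numerically lower than console_loglevel. The numeric values mirror the
kernel's KERN_* levels; the enum names and the reaches_console() helper are
made up for illustration and are not kernel API. The thresholds used below
(4 for a "quiet" boot, 7 otherwise) are assumptions about the configuration,
since distros and command lines vary.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric severities matching the kernel's KERN_* levels (lower = more severe). */
enum sketch_level {
    LVL_EMERG   = 0,
    LVL_ALERT   = 1,
    LVL_CRIT    = 2,
    LVL_ERR     = 3,  /* what DRM_ERROR maps to */
    LVL_WARNING = 4,
    LVL_NOTICE  = 5,  /* "normal but significant condition" */
    LVL_INFO    = 6,
    LVL_DEBUG   = 7,
};

/*
 * Hypothetical helper, not a kernel function: returns true if a message of
 * severity msg_level would be echoed to the console for the given
 * console_loglevel. The message still lands in the log buffer either way.
 */
static bool reaches_console(int msg_level, int console_loglevel)
{
    assert(msg_level >= LVL_EMERG && msg_level <= LVL_DEBUG);
    return msg_level < console_loglevel;
}
```

Under the assumed "quiet" threshold of 4, reaches_console(LVL_ERR, 4) is
true while reaches_console(LVL_NOTICE, 4) is false, which is the effect
being discussed: the message is still logged, but no longer shouted at the
terminal.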

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch