[Intel-gfx] [PATCH] drm: Reduce EDID warnings from DRM_ERROR to DRM_NOTE

Tue Feb 14 21:43:45 UTC 2017

On Tue, Feb 14, 2017 at 10:36:09PM +0100, Daniel Vetter wrote:
> On Mon, Feb 13, 2017 at 12:17:27PM -0500, Sean Paul wrote:
> > On Mon, Feb 13, 2017 at 3:59 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > On Mon, Feb 13, 2017 at 08:41:10AM +0100, Thierry Reding wrote:
> > >> On Fri, Feb 10, 2017 at 07:59:13PM +0000, Chris Wilson wrote:
> > >> > The warnings from parsing the EDID are not driver errors, but the
> > >> > "normal but significant" conditions from the external device. As such,
> > >> > they do not need the ferocity of an *ERROR*, but can use the less harsh
> > >> > DRM_NOTE instead.
> > >> >
> > >> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > >> > ---
> > >> >  drivers/gpu/drm/drm_edid.c | 15 ++++++++-------
> > >> >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >>
> > >> The below are all conditions that happen when the EDID is bad. I'm not
> > >> sure that really qualifies as "normal".
> > >
> > > Often it is - a bad EDID on the monitor will always be bad. The
> > > challenge is distinguishing that from silent data corruption during the
> > > read - a reported read failure is trivial.
> > >
> > >> From a quick look through the code we don't always trigger an error from
> > >> the below failure paths at higher levels, so decreasing the level here
> > >> has the potential to let this kind of exceptional condition go
> > >> unnoticed.
> > >
> > > The messages are not gone, they are higher than the default loglevel,
> > > but now below the level at which they are printed to a terminal. The
> > > bad EDID is either expected or recoverable, and definitely not fatal
> > > so I don't think an *ERROR* is justified.
> > 
> > I tend to agree.
> > 
> > The description for the KERN_NOTICE level is "normal but significant
> > condition". I might argue that the presence of these EDID messages
> > represents a normal *or* significant condition (depending on why the
> > EDID is bad), but I don't think it's unreasonable to expect people to
> > check their logs if the display/mode is not working properly.
> 
> So for cases where we know that there is shit hw out there (specifically
> kvm switches that mangle the cea block without adjusting the edid) we
> already tune down the error to debug level. So in principle totally agree
> with tuning down anything that happens because it's outside of our control
> to info or debug, but do we still need this patch after the cea one has
> landed? Our CI at least seems happy ...

Yes. The one machine with a dodgy EDID also happens to have a dodgy
BIOS. This reduces the number of consistent errors to 1, but since an
unrelated error still remains, CI doesn't detect the improvement.
https://intel-gfx-ci.01.org/CI/CI_DRM_2198/fi-skl-6700k/igt@drv_module_reload@basic-reload.html
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
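
For reference, the loglevel argument made earlier in the thread (a notice-level
message is still logged, but no longer reaches a quiet console) can be sketched
in a few lines of C. This is only an illustration under stated assumptions: the
enum mirrors the standard syslog severity numbers behind the kernel's KERN_*
prefixes, while shown_on_console() and the console loglevel values passed to it
are hypothetical and are not code from drm_edid.c or from the patch.

```c
#include <stdbool.h>

/* Syslog-style severities, matching the numbering behind the kernel's
 * KERN_* prefixes: a lower number means a more severe message. */
enum log_level {
    LVL_EMERG   = 0,
    LVL_ALERT   = 1,
    LVL_CRIT    = 2,
    LVL_ERR     = 3,  /* what an *ERROR* message is logged at */
    LVL_WARNING = 4,
    LVL_NOTICE  = 5,  /* "normal but significant condition" */
    LVL_INFO    = 6,
    LVL_DEBUG   = 7
};

/* Hypothetical helper: a message is echoed to the console only when its
 * level is numerically below the console loglevel. It is recorded in the
 * log buffer either way, so it remains visible to anyone reading the log. */
static bool shown_on_console(enum log_level msg, int console_loglevel)
{
    return (int)msg < console_loglevel;
}
```

With an assumed console loglevel of 4, shown_on_console(LVL_ERR, 4) is true and
shown_on_console(LVL_NOTICE, 4) is false, which is the distinction the patch
relies on: the EDID messages stay in the log but stop appearing on the terminal.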