[Intel-gfx] [PATCH 1/2] Give up on edid retries when i2c tells us that bus is not there

Sun Oct 30 08:53:13 PDT 2011

Hi Eugeni,

On Mon, 24 Oct 2011 12:40:14 -0200, Eugeni Dodonov wrote:
> On Thu, Oct 20, 2011 at 10:33, Jean Delvare <khali at linux-fr.org> wrote:
> 
> > Just to clarify: by "connectivity is setup", do you mean code in the
> > driver setting the GPIO pin direction etc., or a display being
> > connected to the graphics card?
> >
> > I admit I am a little surprised. I2C buses should have their lines up
> > by default, thanks to pull-up resistors, and masters and slaves should
> > merely drive the lines low as needed. The absence of slaves should have
> > no impact on the line behavior. If the Intel graphics chips (or the
> > driver) rely on the display to hold the lines high, I'd say this is a
> > seriously broken design.
> >
> > As a side note, I thought that HDMI had the capability of presence
> > detection, so there may be a better way to know if a display is
> > connected or not, and this information could used to dynamically
> > instantiate and delete the I2C buses? That way we could skip probing
> > for EDID when there is no chance of success.
> 
> Yes, I think so too.
> 
> I admit I haven't got to the root of the problem here. My test was really
> simple, borrowed from the test_bus() at i2c-algo-bit.c - without HDMI cable
> plugged in, I was getting the "SDA stuck high" messages; with the cable
> plugged in, I wasn't getting any of those.

Just the HDMI cable, or with a screen at the other end?

Either way, this smells like bad hardware design. The graphics card
itself should pull the I2C bus lines up by default, it shouldn't rely
on a cable (I'm not familiar with HDMI but I'm not sure if that makes
sense at all) or external display (more likely) to do it. But I can
also imagine that this could be a driver issue, maybe the GPIO lines
aren't setup properly by the driver? I'm not familiar enough with the
Intel graphics hardware and driver to tell, you'll know better.

> But in any case, I haven't investigated it deeper in the hardware direction
> after figuring out that drm_get_edid would retry its attempts for retreiving
> the edid for 15 times, getting the -ENXIO error for all of them.
> 
> 
> > Well, you could always do manual line testing of the I2C bus _before_
> > calling drm_do_probe_ddc_edid()? And skip the call if the test fails
> > (i.e. the bus isn't ready for use.) As said before, I am willing to
> > export bit_test if it helps any. I've attached a patch doing exactly
> > this. Let me know if you want me to commit it.
> 
> Yes, surely, I can do this. I did a similar test in the i915-specific patch,
> checking if we can talk to i2c adapter pior to call drm_do_probe_ddc_edid,
> but I thought that perhaps it would be easier for all the cards relying on
> drm_get_edid to have a faster return path in case of problems.
> 
> I am fine with any solution, if this problem is happening to appear on i915
> cards only, we could do this in our driver. It is that 15 attempts looks a
> bit overkill.

Yes, I agree, 15 retries makes no sense. And I like your original
patch, it makes a lot of sense.

> > Your proposed patch is better than I first thought. I had forgotten
> > that i2c-algo-bit tries up to 3 times to get a first ack from a slave.
> > So if i2c_transfer returns -ENXIO, this means that i2c-algo-bit has
> > already attempted 3 times to contact the slave, with no reply, so
> > there's probably no point going on. A communication problem with a
> > present slave would have returned a different error code.
> >
> > Without your patch, a missing slave would cause 3 * 5 = 15 retries,
> > which is definitely too much.
> >
> > That being said, even then the whole probe sequence shouldn't exceed
> > 10 ms, which I wouldn't expect a user to notice. The user-reported 4
> > second delay when running xrandr can't be caused by this. 4 seconds for
> > 15 attempts is 250 ms per attempt, this looks like a timeout due to
> > non-functional bus, not a nack. Not that 250 ms is 1 jiffy for HZ=250,
> > which I guess was what the reporting user was running. So even with
> > your patch, there's still 750 ms to wait, I don't think this is
> > acceptable. You really have to fix that I2C bus or skip probing it.
> 
> Yep, true. I've followed the easiest route so far - find out where the
> initial problem happens, and attempt at solving it. This change in
> drm_get_edid solves the delay at its origin, but we certainly could have
> additional delays propagated somewhere else. I couldn't reproduce the
> original reporter's huge delay, so I looked at what was within my reach.

Your fix is not really "at the origin" of the problem. Fixing it at the
origin would be not creating the I2C bus if it won't work (or marking
it as unavailable in some way, if the drm framework allows this.) If
that is not possible, testing the lines before accessing the bus would
be closer to the origin, however it might be argued that it adds delays
in the working case, and also somehow violates the I2C specification
(the line testing is not part of the specification and could confuse
some chips, in theory at least.)

> In any case - looking at a faster way to leave the drm_do_probe_ddc_edid,
> while also allowing a way for having a more detailed probing - would it be
> an acceptable solution to add a:
> 
> MODULE_PARM(skip_unresponsive_edid, "Ignore outputs which do not provide
> edid on first attempt");
> 
> and update the patch to use this option?

I am not in favor of this, as it makes the code more complex while your
original patch looks just fine to me. I no longer expect it to cause
any trouble, so I suggest applying it now. If any problem is reported
during the next two months, we can always reconsider. But even then,
expecting users to pass module parameters is usually a bad strategy, as
this means things aren't working for them out of the box in the first
place.

-- 
Jean Delvare