[Intel-gfx] X hang with quirk VT switches
Takashi Iwai
tiwai at suse.de
Thu Dec 4 03:44:23 PST 2014
At Thu, 4 Dec 2014 11:21:47 +0000,
Chris Wilson wrote:
>
> On Thu, Dec 04, 2014 at 11:53:05AM +0100, Takashi Iwai wrote:
> > At Wed, 3 Dec 2014 18:31:45 +0000,
> > Chris Wilson wrote:
> > >
> > > On Wed, Dec 03, 2014 at 03:45:35PM +0100, Takashi Iwai wrote:
> > > > Hi,
> > > >
> > > > while checking the reported bug about VT switch hang on openSUSE 13.2,
> > > > I also could reproduce a similar issue as reported: namely, X hangs
> > > > when repeatedly switching VT quickly.
> > > >
> > > > For example, running the following on KDE results in the stall of X.
> > > >
> > > > % for i in $(seq 1 100); do chvt 1; chvt 7; done
> > > >
> > > > Looking at the sysrq-t output, it stalls at drm_read(). After
> > > > adding some debug prints to the event handling code, the log looks like:
> > > >
> > > > drm_queue_vblank_event event_space=4064
> > > > send_vblank_event event_space=4064
> > > > drm_poll ENTER event_space=4064
> > > > drm_poll mask=0x41 event_space=4064
> > > > drm_poll ENTER event_space=4064
> > > > drm_poll mask=0x41 event_space=4064
> > > > drm_read ENTER event_space=4064
> > > > drm_read total=32 event_space=4096
> > > > drm_poll ENTER event_space=4096
> > > > drm_poll mask=0x0 event_space=4096
> > > > drm_read ENTER event_space=4096
> > > > drm_read ENTER event_space=4096
> > > > drm_read ENTER event_space=4096
> > > >
> > > > So, after a vblank event, two poll calls succeeded, followed by one
> > > > drm_read(). After that, there was one poll call without an event,
> > > > followed by three(!) drm_read() calls. The last three drm_read() calls
> > > > never returned, thus X stalled. So, this looks like a race or a
> > > > refcount issue somewhere.
> > >
> > > The key question is how did you get 3 calls to drm_read that each didn't
> > > return? The only place where we call drm_read without first doing a poll
> > > is in the WakeupHandler with the drm fd flagged for reads. This is
> > > broken in ZaphodHeads as the drm fd is not O_NONBLOCK without
> > >
> > > commit bd008e5b2953186fc0c6633a885ade95e7043800
> > > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > > Date: Tue Oct 7 14:13:51 2014 +0100
> > >
> > > drm: Implement O_NONBLOCK support on /dev/dri/cardN
> > >
> > > I assume that isn't the case as I expect you would have mentioned using
> > > ZaphodHeads.
> >
> > I took a look back at drm_read() code again, and I found that the
> > function doesn't care about O_NONBLOCK at all. (And there is a memory
> > leak, too.)
> >
> > So I added the support for O_NONBLOCK, and the problem seems
> > resolved.
> >
> > Although this is not the right "fix" (the caller side should be fixed), it
> > would be good to have anyway. I'm going to send patches for review
> > to the dri-devel ML, as it's not i915-specific.
>
> I disagree. drm has claimed to support O_NONBLOCK since its inception,
> but the implementation was buggy.

The nonblocking read is obviously buggy. If the current implementation
is intentional, then the nonblock flag is somehow being misused...

> However, I don't think there is a case
> in non-ZaphodHeads where we use read() without first select/poll
> reporting that there is something to use (and the problem with
> ZaphodHeads is that we have two screens that share the same drm fd
> without clearing the select read flags... hmm)

In my case, I'm using a single screen, so this can't be the cause.
My rough guess is that this isn't about a missing poll but
rather some race between poll and read, or between two reads. That would
explain why my patch worked.
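
To illustrate the kind of sequence I suspect, here is a minimal sketch. It
is only an analogy: it uses a plain pipe as a stand-in for the drm fd, the
function name is made up, and nothing here is the actual X or drm code. One
32-byte "event" is queued, poll reports it once, and then two reads are
issued. With O_NONBLOCK the second read fails with EAGAIN; without it, the
second read would simply sleep, which is what the trace looks like.

```python
import os
import select


def two_reads_after_one_poll():
    """Queue one event, poll once, then read twice (hypothetical helper)."""
    rfd, wfd = os.pipe()
    try:
        # Put the read end in non-blocking mode (sets O_NONBLOCK).
        # If this were left blocking, the second read below would hang.
        os.set_blocking(rfd, False)

        # One 32-byte "event", matching the size seen in the debug log.
        os.write(wfd, b"\0" * 32)

        poller = select.poll()
        poller.register(rfd, select.POLLIN)
        ready = poller.poll(0)  # reports the fd as readable once

        results = []
        if ready:
            # First read consumes the only queued event.
            results.append(len(os.read(rfd, 4096)))
            # Second read has nothing left to consume.
            try:
                results.append(len(os.read(rfd, 4096)))
            except BlockingIOError:
                results.append("EAGAIN")
        return results
    finally:
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    print(two_reads_after_one_poll())
```

So a second reader arriving after the event was already consumed returns
immediately in the non-blocking case instead of stalling.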

In any case, I need to trap the X stall and diagnose it, but I have to
leave my machine now. I will check it tomorrow.

Meanwhile, it would be interesting to see whether this covers Maarten's
case, too...

thanks,
Takashi