vblank problem (and proposed fix) on crtc > 1

Thu Mar 3 15:34:53 PST 2011

I would like to propose an extension of the interface between the 
libdrm and drm kernel module for VBLANK wait to address a problem that
I am seeing on screens with more than two displays. Below, I describe the 
problem, propose a (backwards compatible) solution and provide a set
of patches that implement it. I am looking for some feedback on whether 
this would be a reasonable and acceptable fix. Sorry for the length of this 
note.

Specific setup I use is Radeon HD5870 (Evergreen/Cypress) GPU in Zaphod 
mode with three displays (DVI-0, DVI-1 and DisplayPort-0), but the
problem applies to any configuration with more than two displays. 
A sample xorg.conf is pasted at the end of this note for reference. 
Everything works fine if all three displays are connected to the 
monitor.

However, if I yank out DVI-1 connector, leaving DVI-0 and DisplayPort-0
plugged in and restart X without changing the xorg.conf (fairly legitimate
action of an ordinary user) any application that needs VBLANK synchronization
(e.g. glxgears) will get stuck when started in the desktop associated with
DisplayPort-0. It will still progress on DVI-0 and DVI-1 is (of course) not
visible so it doesn't matter.

Tracing the userland and the kernel, I have found out that the reason this
happens is because in this configuration DisplayPort-0 gets assigned to 
crtc-2 (crtc-0 is on DVI-0 and crtc-1 is still on unconnected DVI-1). 
Now the problem is that DRM_IOCTL_WAIT_VBLANK only understands primary
and secondary crtc and everything that is not crtc-0 is considered
secondary. Then in the kernel, drm module maps the secondary flag to 
crtc 1, but that is a disconnected crtc on which VBLANK interrupts never
come.

I am under impression that this limitation is a legacy from the days
when GPUs had at most two connectors. However, analyzing the whole
VBLANK mechanism in the kernel, it looks like it is ready supporting 
VBLANKs on higher crtc numbers (at least I can tell that with confidence
for radeon driver, which is the one I am using as well as for the 
drm module itself; didn't check other GPUs). Also, it looks like
at least when DRI2 is used, the userland fully preserves CRTC 
information all the way from GLX/mesa down to the call into drmWaitVBlank, 
which is the function in libdrm that issues the ioctl.

Specifically in case of DRI2, all calls into drmWaitVBlank are done 
through three functions: radeon_dri2_get_msc, radeon_dri2_schedule_wait_msc 
and radeon_dri2_schedule_swap in DDX (xf86-video-ati library). These 
functions have information about the exact CRTC in which
the drawable is, but they "destroy" it by mapping it to primary/secondary 
flag in the vbl.request.type. In the kernel, the flag is mapped back
to crtc number (0 or 1), so crtc > 1 is never used.

The fix/improvement I propose is to extend the request.type field
in drmVBlank structure with additional 5 bits that I call high_crtc
(there are lots of unused bits in that field). 5 bits covers for 32
CRTCs, which seems to be the hard limit imposed by various parts of the 
Xorg and DDX (e.g. possible_crtcs mask in the display descriptor, 
and the like). If high_crtc is zero, then DRM (kernel module) 
looks at the primary/secondary flag and maps them to crtc 0 and 1
(backwards compatibility with older DDX or DDX for other device 
that does not use the new high_crtc field). If it's not zero then it 
means that the higher CRTC number is specified by DDX (i.e. userland 
is a new DDX) and vblank is waited on the specified CRTC (this is used
only for crtc > 1, crtc 0 and 1, still use the old style).

In case that DDX is ahead of the kernel and tries to use the high_crtc
field, kernel will return -EINVAL (due to failed mask check), but 
the application will progress without waiting on VBLANK, which is
still better than being stuck as it is now. On the other hand, 
if DDX that is ahead of the kernel uses CRTC 0 or 1, this won't 
cause the old kernel to complain because these two CRTCs would
still use the old-style with primary/secondary flag.

So the solution is in my opinion as graceful towards as it can be 
towards old kernel and fully backwards-compatible with old DDX.

Things seem to test very well on my machines (I am currently working 
with GPUs with many displays, so I care about this support a lot), at
least when it comes to using Radeon cards. I would like some feedback 
on whether there could be any issues that I am overseeing and also
whether it would break anyone else's DDX. I had a private conversation
with Alex Deucher and he seems to be fine as far as Radeon GPUs are 
concerned. What do other folks think ? I believe that eliminating
the above-described limitation has to start at least with some GPU
and others will follow if it works out well.

There is also a potential issue a few places in Mesa that call 
drmWaitVBlank directly from there (and they lose the CRTC 
information quite early in the call stack), but I don't think 
they are used at all in DRI2 case (so we can limit the fix to DRI2).

Anyway, I'll stop here; hopefully at least few folks made it this
far in reading this ;-) and solicit some feedback. For deference, 
below is the xorg.conf that I mentioned above and three patches 
that implement the proposed fix. They are to be applied to 
the kernel, libdrm and xf86-video-ati, respectively.

-- Ilija

------------------- xorg.conf (for reference) ---------------------
Section "Monitor"
 	Identifier   "Monitor 0"
EndSection

Section "Monitor"
 	Identifier   "Monitor 1"
EndSection

Section "Monitor"
 	Identifier   "Monitor 2"
EndSection

Section "Device"
 	Identifier      "GPU 0"
 	Driver          "radeon"
 	Screen          0
 	Option          "AccelMethod"    "EXA"
 	Option          "ZaphodHeads"    "DVI-0"
 	BusID           "PCI:8:0:0"
EndSection

Section "Device"
 	Identifier      "GPU 1"
 	Driver          "radeon"
 	Screen          1
 	Option          "AccelMethod"    "EXA"
 	Option          "ZaphodHeads"    "DVI-1"
 	BusID           "PCI:8:0:0"
EndSection

Section "Device"
         Identifier      "GPU 2"
 	Driver          "radeon"
 	Screen          2
 	Option          "AccelMethod"    "EXA"
 	Option          "ZaphodHeads"    "DisplayPort-0"
 	BusID           "PCI:8:0:0"
EndSection

Section "Screen"
 	Identifier  "Screen 0"
 	Device      "GPU 0"
 	Monitor     "Monitor 0"
EndSection

Section "Screen"
 	Identifier  "Screen 1"
 	Device      "GPU 1"
 	Monitor     "Monitor 1"
EndSection

Section "Screen"
 	Identifier "Screen 2"
 	Device     "GPU 2"
 	Monitor    "Monitor 2"
EndSection

Section "ServerLayout"
         Identifier "Layout 0"
         Screen 0 "Screen 0"
         Screen 1 "Screen 1" Above "Screen 0"
 	Screen 2 "Screen 2" RightOf "Screen 0"
EndSection

------------------------- kernel patch ----------------------------------------

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 16d5155..3b0abae 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -668,7 +668,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
  {
  	union drm_wait_vblank *vblwait = data;
  	int ret = 0;
-	unsigned int flags, seq, crtc;
+	unsigned int flags, seq, crtc, high_crtc;

  	if ((!drm_dev_to_irq(dev)) || (!dev->irq_enabled))
  		return -EINVAL;
@@ -677,16 +677,21 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
  		return -EINVAL;

  	if (vblwait->request.type &
-	    ~(_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) {
+	    ~(_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK | 
+	      _DRM_VBLANK_HIGH_CRTC_MASK)) {
  		DRM_ERROR("Unsupported type value 0x%x, supported mask 0x%x\n",
  			  vblwait->request.type,
-			  (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK));
+			  (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK | 
+			   _DRM_VBLANK_HIGH_CRTC_MASK));
  		return -EINVAL;
  	}

  	flags = vblwait->request.type & _DRM_VBLANK_FLAGS_MASK;
-	crtc = flags & _DRM_VBLANK_SECONDARY ? 1 : 0;
-
+	high_crtc = (vblwait->request.type & _DRM_VBLANK_HIGH_CRTC_MASK);
+	if (high_crtc)
+		crtc = high_crtc >> _DRM_VBLANK_HIGH_CRTC_SHIFT;
+	else
+		crtc = flags & _DRM_VBLANK_SECONDARY ? 1 : 0;
  	if (crtc >= dev->num_crtcs)
  		return -EINVAL;

diff --git a/include/drm/drm.h b/include/drm/drm.h
index e5f7061..d950581 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -469,6 +469,8 @@ enum drm_vblank_seq_type {
  	_DRM_VBLANK_SECONDARY = 0x20000000,	/**< Secondary display controller */
  	_DRM_VBLANK_SIGNAL = 0x40000000	/**< Send signal instead of blocking, unsupported */
  };
+#define _DRM_VBLANK_HIGH_CRTC_SHIFT 16
+#define _DRM_VBLANK_HIGH_CRTC_MASK 0x001F0000

  #define _DRM_VBLANK_TYPES_MASK (_DRM_VBLANK_ABSOLUTE | _DRM_VBLANK_RELATIVE)
  #define _DRM_VBLANK_FLAGS_MASK (_DRM_VBLANK_EVENT | _DRM_VBLANK_SIGNAL | \


------------------------------- libdrm patch -----------------------------------

diff --git a/xf86drm.h b/xf86drm.h
index 9b89f56..65a68bf 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -302,6 +302,8 @@ typedef enum {
      DRM_VBLANK_SECONDARY = 0x20000000,	/**< Secondary display controller */
      DRM_VBLANK_SIGNAL   = 0x40000000	/* Send signal instead of blocking */
  } drmVBlankSeqType;
+#define DRM_VBLANK_HIGH_CRTC_SHIFT 16
+#define DRM_VBLANK_HIGH_CRTC_MASK 0x001F0000

  typedef struct _drmVBlankReq {
  	drmVBlankSeqType type;


------------------------------- xf86-video-ati (DDX) patch ---------------------

diff --git a/src/radeon_dri2.c b/src/radeon_dri2.c
index 66df03c..5cbe544 100644
--- a/src/radeon_dri2.c
+++ b/src/radeon_dri2.c
@@ -783,6 +783,7 @@ static int radeon_dri2_get_msc(DrawablePtr draw, CARD64 *ust, CARD64 *msc)
      drmVBlank vbl;
      int ret;
      int crtc = radeon_dri2_drawable_crtc(draw);
+    int high_crtc = 0;

      /* Drawable not displayed, make up a value */
      if (crtc == -1) {
@@ -791,8 +792,12 @@ static int radeon_dri2_get_msc(DrawablePtr draw, CARD64 *ust, CARD64 *msc)
          return TRUE;
      }
      vbl.request.type = DRM_VBLANK_RELATIVE;
-    if (crtc > 0)
+    if (crtc == 1)
          vbl.request.type |= DRM_VBLANK_SECONDARY;
+    else if (crtc > 1)
+	high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) &
+	    DRM_VBLANK_HIGH_CRTC_MASK;
+    vbl.request.type |= high_crtc;
      vbl.request.sequence = 0;

      ret = drmWaitVBlank(info->dri2.drm_fd, &vbl);
@@ -825,6 +830,7 @@ static int radeon_dri2_schedule_wait_msc(ClientPtr client, DrawablePtr draw,
      drmVBlank vbl;
      int ret, crtc = radeon_dri2_drawable_crtc(draw);
      CARD64 current_msc;
+    int high_crtc = 0;

      /* Truncate to match kernel interfaces; means occasional overflow
       * misses, but that's generally not a big deal */
@@ -855,8 +861,12 @@ static int radeon_dri2_schedule_wait_msc(ClientPtr client, DrawablePtr draw,

      /* Get current count */
      vbl.request.type = DRM_VBLANK_RELATIVE;
-    if (crtc > 0)
+    if (crtc == 1)
          vbl.request.type |= DRM_VBLANK_SECONDARY;
+    else if (crtc > 1)
+	high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+	    DRM_VBLANK_HIGH_CRTC_MASK;
+    vbl.request.type |= high_crtc;
      vbl.request.sequence = 0;
      ret = drmWaitVBlank(info->dri2.drm_fd, &vbl);
      if (ret) {
@@ -882,8 +892,12 @@ static int radeon_dri2_schedule_wait_msc(ClientPtr client, DrawablePtr draw,
          if (current_msc >= target_msc)
              target_msc = current_msc;
          vbl.request.type = DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
-        if (crtc > 0)
+        if (crtc == 1)
              vbl.request.type |= DRM_VBLANK_SECONDARY;
+	else if (crtc > 1)
+	    high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+		DRM_VBLANK_HIGH_CRTC_MASK;
+	vbl.request.type |= high_crtc;
          vbl.request.sequence = target_msc;
          vbl.request.signal = (unsigned long)wait_info;
          ret = drmWaitVBlank(info->dri2.drm_fd, &vbl);
@@ -903,8 +917,12 @@ static int radeon_dri2_schedule_wait_msc(ClientPtr client, DrawablePtr draw,
       * so we queue an event that will satisfy the divisor/remainder equation.
       */
      vbl.request.type = DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
-    if (crtc > 0)
+    if (crtc == 1)
          vbl.request.type |= DRM_VBLANK_SECONDARY;
+    else if (crtc > 1)
+	high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+	    DRM_VBLANK_HIGH_CRTC_MASK;
+    vbl.request.type |= high_crtc;

      vbl.request.sequence = current_msc - (current_msc % divisor) +
          remainder;
@@ -1029,6 +1047,7 @@ static int radeon_dri2_schedule_swap(ClientPtr client, DrawablePtr draw,
      CARD64 current_msc;
      BoxRec box;
      RegionRec region;
+    int high_crtc = 0;

      /* Truncate to match kernel interfaces; means occasional overflow
       * misses, but that's generally not a big deal */
@@ -1068,8 +1087,12 @@ static int radeon_dri2_schedule_swap(ClientPtr client, DrawablePtr draw,

      /* Get current count */
      vbl.request.type = DRM_VBLANK_RELATIVE;
-    if (crtc > 0)
+    if (crtc == 1)
          vbl.request.type |= DRM_VBLANK_SECONDARY;
+    else if (crtc > 1)
+	high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+	    DRM_VBLANK_HIGH_CRTC_MASK;
+    vbl.request.type |= high_crtc;
      vbl.request.sequence = 0;
      ret = drmWaitVBlank(info->dri2.drm_fd, &vbl);
      if (ret) {
@@ -1111,8 +1134,12 @@ static int radeon_dri2_schedule_swap(ClientPtr client, DrawablePtr draw,
           */
          if (flip == 0)
              vbl.request.type |= DRM_VBLANK_NEXTONMISS;
-        if (crtc > 0)
+        if (crtc == 1)
              vbl.request.type |= DRM_VBLANK_SECONDARY;
+	else if (crtc > 1)
+	    high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+		DRM_VBLANK_HIGH_CRTC_MASK;
+	vbl.request.type |= high_crtc;

          /* If target_msc already reached or passed, set it to
           * current_msc to ensure we return a reasonable value back
@@ -1145,8 +1172,12 @@ static int radeon_dri2_schedule_swap(ClientPtr client, DrawablePtr draw,
      vbl.request.type = DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
      if (flip == 0)
          vbl.request.type |= DRM_VBLANK_NEXTONMISS;
-    if (crtc > 0)
+    if (crtc == 1)
          vbl.request.type |= DRM_VBLANK_SECONDARY;
+    else if (crtc > 1)
+	high_crtc = (crtc << DRM_VBLANK_HIGH_CRTC_SHIFT) & 
+	    DRM_VBLANK_HIGH_CRTC_MASK;
+    vbl.request.type |= high_crtc;

      vbl.request.sequence = current_msc - (current_msc % divisor) +
          remainder;