Performance improvement to vga arbitration

Fri Jun 11 19:38:14 PDT 2010

On 06/11/10 04:56, Tiago Vignatti wrote:
> Henry,
>
> Thanks for bring up this idea to the community. I'm cc-ing Benjamin and nVidia
> guys who were recently discussing more in-depth improvements on this area (I'm
> not exactly working on it; just doing some fixes here and there). Anyway, I'll
> post some comments bellow.
>
>
> On Thu, Jun 10, 2010 at 08:02:43AM +0200, ext Henry Zhao wrote:
>    
>> It appears  that after session is up,  in most cases,  drivers only do
>> non-legacy accesses.
>> Non-legacy accesses do not need to block each other. Blocking
>> arbitration is needed
>> mostly for session initialization and exiting.
>>      
> there's also VT switching case (or this is what you wanted to say with
> "session exiting"?). Anyway, what you said here is very true: we don't want
> VGA decoding enabled in graphics mode. See the patch next...
>    

Yes.  Here "session existing" includes VT switching.  In "graphics mode"
we still need VGA arbitration, but in a different way with a different lock
(locks2) as described.

>    
>> To improve performance, we need to treat
>> differently to legacy and non-legacy accesses, and allow non-legacy
>> accesses to proceed
>> concurrently among devices without blocking each other. Non-legacy
>> accesses is assumed
>> to be the default for operating functions after initialization.
>>      
> ... over here:
>
> http://people.freedesktop.org/~vignatti/tmp/0001-xfree86-vgaarb-disable-VGA-decoding-after-POST.patch
>
> At this point on the X server, we already POSTed all cards and we could be
> optimistic, letting the drivers say when we actually need VGA legacy. Right
> now, as you said, we're assuming legacy access as default whenever the system
> has more than one card, not driven by DRI (DRI and VGA legacy conflict with
> the current design. Dave preferred to let this away).
>    

The patch didn't go to git master yet ?  We still need arbitration once a
device gets "graphics mode".  Say devices "a" and "b" are working in
"graphics mode", then

(1) A new device "c" may start, which does legacy accesses during 
initialization;
(2) While "b" is still running, device "a" may exit, which also involves 
legacy accesses.

Therefore we need to use another lock ("locks2"), in addition to the 
current "locks"
(for legacy accesses), for non-legacy accesses such that "locks2" and 
"locks" are in
conflict, but "locks2"s themselves are not.

Besides, we cannot rule out possibilities that some drivers may use legacy
access in some operations in "graphics mode" in certain cases, therefore 
we need
to give drivers a final say on kind of accesses they use for the 
operations, although
the default is non-legacy access.

>    
>> In case
>> legacy accesses are
>> necessary for some of them, drivers can redefine them per function group
>> bases.
>> Here are some details:
>>
>> (1) New lock for non-legacy access
>>
>>    Define another lock, vgadev->locks2 (locks2), for non-legacy access
>> locking
>>    in addition to vgadev->locks (locks1), currently used for legacy access
>>    locking.
>>
>>    Non-legacy access requests from a device that does not have legacy access
>>    decoding ability should always be honored without a need of acquiring
>> a lock.
>>    Non-legacy access requests from a device that has legacy access decoding
>>    ability needs to acquire locks2 before proceeding.
>>
>>    Request for locks2 is blocked only when some other device already has
>> locks1
>>    (on the same resources).  Request for locks1 is blocked when some
>> other device
>>    already has locks1 or locks2 (on the same resource). This means
>> request for
>>    locks2 should not be blocked just because some other device already
>> has locks2
>>    (on the same resources).
>>      
> I'm not that sure we need this new lock. However I'm worried that we might
> do have the case you mentioned here: "Request for locks2 is blocked only when
> some other device already has locks1 (on the same resources)".
>
> Dave? Ben?
>    

See comments above.

>    
>>    Currently we have 4 defines for resource request:
>>
>>          VGA_RSRC_LEGACY_IO
>>          VGA_RSRC_LEGACY_MEM
>>          VGA_RSRC_NORMAL_IO
>>          VGA_RSRC_NORMAL_MEM
>>
>>    but only two strings for them, "io" and "mem". Add "IO" and "MEM" for non-
>>    legacy accesses.
>>      
> yeah, last I seen this flags were underused.
>
>
>    
>> (2) Function group based resource request
>>
>>    Need to distinguish between decoding ability and decoding request
>> (resource
>>    request). Decoding ability is still maintained in struct vga_device of
>> kernel
>>    driver with
>>
>>          unsigned int decodes;
>>
>>    and a userland copy in dev->vgaarb_rsrc.
>>
>>    Currently all lock/unlocking mechanism uses resource requests from
>>    dev->vgaarb_rsrc, which is actually decoding ability. In new design
>> however,
>>    this is only the case for xf86VGAarbiterLock() and
>> xf86VGAarbiterUnlock(), run
>>    during session initialization and exiting. During normal run, resource
>> request
>>    is determined by a resource mask associated with each function.
>>
>>    Wrapping function are grouped into MAX_VGAARB_OPS_MASK number of
>>    groups with resource masks assigned to each of them. The default
>> setting of mask is
>>    VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM, meaning non-legacy
>>    access, but drivers can redefine any of them. In an extreme if a
>> driver redefines all
>>    masks to
>>
>>    VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM|
>>    VGA_RSRC_LEGACY_IO|VGA_RSRC_LEGACY_MEM
>>
>>    we are returning to old arbitration algorithm.
>>      
> Acked.
>
> BTW, our current decoding interface is broken. Well, the kernel side is broken
> either, moving the ownership of decoding to another random card when the given
> one owning is stop to decode. I have a hack to fix this for my system:
>
> http://people.freedesktop.org/~vignatti/tmp/0003-vgaarb-HACK-assume-safe-state-if-only-one-card-dec.patch
>    

I noticed the problem.  I didn't fix it in the patch.
> Also, I posted two patches to fix the userspace decoding interface on the
> mailing list some days ago. If you're carrying about remove cards from
> arbitration and stop decoding then definitely you'll want to take a look on
> those:
>
> http://lists.x.org/archives/xorg-devel/2010-June/009559.html
>    

I had this fixed in my patch.
>    
>> (3) Other changes
>>
>>    * pci_device_vgaarb_set_target() is heavily called. Currently it
>> involves two
>>      syscalls.  These calls can be saved if the device in question is the
>> same as
>>      in the previous call (recorded in pci_sys->vga_target). This contributes
>>      to major performance improvement.
>>      
> Indeed!
>
>
>    
>>    * OpenConsole()/CloseConsole() need to be protected by lock and unlock
>> as they
>>      may have vga register accesses. Further,
>> OpenConsole()/CloseConsole() is run
>>      only on a session with primary device.
>>      
> possibly.
>
>
>    
>> I am posting the design idea for comments.
>>
>> (This has been implemented and tested on both Linux and Solaris  systems.)
>>      
> So do you have patches already? If so, please post on the mailing lists so we
> can take a look.
>
>
> Thank you,
>
>               Tiago
>    

3 patches are posted.  Some comments:

(1) Here "owns" in vga_device is defined as the actual state of PCI 
decoding bits. Since we have
     only one bit for both legacy and non-legacy accesses, there are 
only 4 possible values:

     none
     io+IO
     mem+MEM
     io+mem+IO+MEM

     These are snapshots from a testing run of 2 programs that keep 
locking/unlocking
     different kinds of resources on a 2 device system:

count:2,PCI:0000:02:00.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=io+mem(1:1),locks2=IO+MEM(1:1)
count:2,PCI:0000:01:08.0,decodes=io+mem+IO+MEM,owns=none,locks=none(0:0),locks2=none(0:0)

count:2,PCI:0000:02:00.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=none(0:0),locks2=IO+MEM(1:1)
count:2,PCI:0000:01:08.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=none(0:0),locks2=none(0:0)

count:2,PCI:0000:02:00.0,decodes=io+mem+IO+MEM,owns=none,locks=none(0:0),locks2=none(0:0)
count:2,PCI:0000:01:08.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=io+mem(1:1),locks2=none(0:0)

count:2,PCI:0000:02:00.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=none(0:0),locks2=IO+MEM(1:1)
count:2,PCI:0000:01:08.0,decodes=io+mem+IO+MEM,owns=io+mem+IO+MEM,locks=none(0:0),locks2=IO+MEM(1:1)

     It can be seen while "locks" is exclusive,  "locks2" is not.

(2) vga_update_device_decodes() needs an update, but not in patch yet.

(3) In multiple device system, saw hanging in reading config space in 
pci_device_linux_sysfs_probe()
     during system start (on some devices). Reducing reading to config 
head (64 bytes) seems to solve
     the problem.  Don't know why - don't know what exactly happens reading
     /sys/bus/pci/devices/<device_id>/conifg

(4) In pci_device_vgaarb_set_target(struct pci_device *dev, int force), 
force to read rsrc and
     count from kernel when force is 1.  When force is 0, optimization 
can be done to save
     2 syscalls.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vgaarb.patch
Type: text/x-diff
Size: 17798 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20100611/59a4355f/attachment-0003.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xserver.patch
Type: text/x-diff
Size: 24463 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20100611/59a4355f/attachment-0004.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libpciaccess.patch
Type: text/x-diff
Size: 7314 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20100611/59a4355f/attachment-0005.patch>