[Intel-gfx] [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
Ben Widawsky
ben at bwidawsk.net
Wed Aug 14 03:11:59 CEST 2013
On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> From: Ben Widawsky <ben at bwidawsk.net>
>
> In order to transition more of our code over to using a VMA instead of
> an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> until now, we've only had a VMA when actually binding an object.
>
> The previous patch helped handle the distinction on bound vs. unbound.
> This patch will help us catch leaks, and other issues before we actually
> shuffle a bunch of stuff around.
>
> This attempts to convert all the execbuf code to speak in vmas. Since
> the execbuf code is very self contained it was a nice isolated
> conversion.
>
> The meat of the code is about turning eb_objects into eb_vma, and then
> wiring up the rest of the code to use vmas instead of obj, vm pairs.
>
> Unfortunately, to do this, we must move the exec_list link from the obj
> structure. This list is reused in the eviction code, so we must also
> modify the eviction code to make this work.
>
> WARNING: This patch makes an already hotly profiled path slower. The cost is
> unavoidable. In reply to this mail, I will attach the extra data.
>
[snip]
Here is the output from gem_exec_lut_handle both before and after this
patch. The results honestly don't make sense to me, but I'll let Chris
parse them before scratching my head harder.
Before patch
============
relocation: buffers= 1: old= 8060 + 165.3*reloc, lut= 7816 + 164.8*reloc (ns)
relocation: buffers= 2: old= 6748 + 166.6*reloc, lut= 6952 + 165.4*reloc (ns)
relocation: buffers= 4: old= 8140 + 165.9*reloc, lut= 8216 + 165.4*reloc (ns)
relocation: buffers= 8: old= 10732 + 166.0*reloc, lut= 10615 + 165.2*reloc (ns)
relocation: buffers= 16: old= 15099 + 167.8*reloc, lut= 15337 + 165.3*reloc (ns)
relocation: buffers= 32: old= 26140 + 166.0*reloc, lut= 25488 + 165.5*reloc (ns)
relocation: buffers= 64: old= 46300 + 170.5*reloc, lut= 44279 + 166.7*reloc (ns)
relocation: buffers= 128: old= 84056 + 176.9*reloc, lut= 85379 + 166.3*reloc (ns)
relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
skip-relocs: buffers= 1: old= 4445 + 16.0*reloc, lut= 4433 + 15.6*reloc (ns)
skip-relocs: buffers= 2: old= 4585 + 16.0*reloc, lut= 4571 + 15.6*reloc (ns)
skip-relocs: buffers= 4: old= 5667 + 16.0*reloc, lut= 5340 + 15.6*reloc (ns)
skip-relocs: buffers= 8: old= 6051 + 16.1*reloc, lut= 6026 + 15.6*reloc (ns)
skip-relocs: buffers= 16: old= 7953 + 16.1*reloc, lut= 7914 + 15.6*reloc (ns)
skip-relocs: buffers= 32: old= 11972 + 16.2*reloc, lut= 11875 + 15.7*reloc (ns)
skip-relocs: buffers= 64: old= 19999 + 16.5*reloc, lut= 19832 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old= 37796 + 16.9*reloc, lut= 36539 + 15.9*reloc (ns)
skip-relocs: buffers= 256: old= 71604 + 18.1*reloc, lut= 71313 + 16.5*reloc (ns)
skip-relocs: buffers= 512: old= 152682 + 24.3*reloc, lut= 141379 + 27.9*reloc (ns)
skip-relocs: buffers=1024: old= 314116 + 41.7*reloc, lut= 303019 + 20.1*reloc (ns)
skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)
no-relocs: buffers= 1: old= 4194 + 5.1*reloc, lut= 4206 + 4.8*reloc (ns)
no-relocs: buffers= 2: old= 4404 + 5.1*reloc, lut= 4381 + 4.8*reloc (ns)
no-relocs: buffers= 4: old= 4926 + 5.1*reloc, lut= 4921 + 4.8*reloc (ns)
no-relocs: buffers= 8: old= 5901 + 5.1*reloc, lut= 5822 + 4.9*reloc (ns)
no-relocs: buffers= 16: old= 7840 + 5.1*reloc, lut= 7737 + 4.9*reloc (ns)
no-relocs: buffers= 32: old= 11842 + 5.1*reloc, lut= 11681 + 4.9*reloc (ns)
no-relocs: buffers= 64: old= 19741 + 5.1*reloc, lut= 19542 + 4.8*reloc (ns)
no-relocs: buffers= 128: old= 36479 + 5.2*reloc, lut= 35958 + 4.9*reloc (ns)
no-relocs: buffers= 256: old= 70171 + 5.4*reloc, lut= 69390 + 5.2*reloc (ns)
no-relocs: buffers= 512: old= 147213 + 3.5*reloc, lut= 137953 + 13.0*reloc (ns)
no-relocs: buffers=1024: old= 300165 + 4.8*reloc, lut= 293852 + 4.9*reloc (ns)
no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)
After patch
===========
relocation: buffers= 1: old= 8075 + 81.4*reloc, lut= 7592 + 80.6*reloc (ns)
relocation: buffers= 2: old= 5744 + 82.3*reloc, lut= 5837 + 81.1*reloc (ns)
relocation: buffers= 4: old= 4875 + 82.7*reloc, lut= 4871 + 81.6*reloc (ns)
relocation: buffers= 8: old= 5729 + 82.7*reloc, lut= 5698 + 81.5*reloc (ns)
relocation: buffers= 16: old= 7952 + 83.0*reloc, lut= 7809 + 81.9*reloc (ns)
relocation: buffers= 32: old= 11884 + 82.9*reloc, lut= 11702 + 81.6*reloc (ns)
relocation: buffers= 64: old= 20388 + 83.4*reloc, lut= 19995 + 82.2*reloc (ns)
relocation: buffers= 128: old= 38057 + 85.0*reloc, lut= 37675 + 83.4*reloc (ns)
relocation: buffers= 256: old= 74912 + 87.0*reloc, lut= 74064 + 85.4*reloc (ns)
relocation: buffers= 512: old= 161136 + 94.8*reloc, lut= 157046 + 87.5*reloc (ns)
relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 + 91.2*reloc (ns)
relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)
skip-relocs: buffers= 1: old= 2966 + 16.6*reloc, lut= 2963 + 15.6*reloc (ns)
skip-relocs: buffers= 2: old= 3083 + 16.5*reloc, lut= 3056 + 15.5*reloc (ns)
skip-relocs: buffers= 4: old= 3279 + 16.6*reloc, lut= 3242 + 15.6*reloc (ns)
skip-relocs: buffers= 8: old= 3692 + 16.7*reloc, lut= 3654 + 15.6*reloc (ns)
skip-relocs: buffers= 16: old= 4522 + 16.7*reloc, lut= 4461 + 15.5*reloc (ns)
skip-relocs: buffers= 32: old= 6254 + 16.7*reloc, lut= 6138 + 15.7*reloc (ns)
skip-relocs: buffers= 64: old= 10098 + 16.8*reloc, lut= 9939 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old= 17983 + 17.6*reloc, lut= 17729 + 16.3*reloc (ns)
skip-relocs: buffers= 256: old= 34388 + 18.8*reloc, lut= 33981 + 17.6*reloc (ns)
skip-relocs: buffers= 512: old= 74211 + 25.2*reloc, lut= 72185 + 18.6*reloc (ns)
skip-relocs: buffers=1024: old= 160514 + 34.1*reloc, lut= 157086 + 20.3*reloc (ns)
skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)
no-relocs: buffers= 1: old= 2840 + 5.1*reloc, lut= 2834 + 4.8*reloc (ns)
no-relocs: buffers= 2: old= 2938 + 5.1*reloc, lut= 2917 + 4.8*reloc (ns)
no-relocs: buffers= 4: old= 3220 + 5.1*reloc, lut= 3201 + 4.8*reloc (ns)
no-relocs: buffers= 8: old= 3614 + 5.1*reloc, lut= 3545 + 4.8*reloc (ns)
no-relocs: buffers= 16: old= 4437 + 5.1*reloc, lut= 4368 + 4.8*reloc (ns)
no-relocs: buffers= 32: old= 6105 + 5.1*reloc, lut= 6024 + 4.9*reloc (ns)
no-relocs: buffers= 64: old= 9864 + 5.1*reloc, lut= 9652 + 4.9*reloc (ns)
no-relocs: buffers= 128: old= 17388 + 5.1*reloc, lut= 17126 + 4.9*reloc (ns)
no-relocs: buffers= 256: old= 33087 + 5.4*reloc, lut= 32668 + 5.3*reloc (ns)
no-relocs: buffers= 512: old= 71476 + 5.0*reloc, lut= 69464 + 4.9*reloc (ns)
no-relocs: buffers=1024: old= 154379 + 4.9*reloc, lut= 152796 + 4.3*reloc (ns)
no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)
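
Each result line above reads as a fixed cost plus a per-relocation cost, in nanoseconds, for the "old" and "lut" paths. To make the before/after comparison less of an eyeball exercise, here is a minimal sketch of parsing one line and evaluating the fit at a given relocation count. It assumes the line format is exactly as pasted above. The names parse_line, predicted_ns and after_over_before are made up for illustration and are not part of gem_exec_lut_handle or intel-gpu-tools.

```python
import re

# Matches one result line as pasted above, e.g.
# "relocation: buffers= 1: old= 8060 + 165.3*reloc, lut= 7816 + 164.8*reloc (ns)"
# The exact format is an assumption taken from this mail, not from the tool's source.
LINE_RE = re.compile(
    r"(?P<mode>[\w-]+): buffers=\s*(?P<buffers>\d+): "
    r"old=\s*(?P<old_base>\d+)\s*\+\s*(?P<old_slope>[\d.]+)\*reloc, "
    r"lut=\s*(?P<lut_base>\d+)\s*\+\s*(?P<lut_slope>[\d.]+)\*reloc"
)


def parse_line(line):
    """Return the mode, buffer count and (base, slope) fits, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "mode": m["mode"],
        "buffers": int(m["buffers"]),
        "old": (int(m["old_base"]), float(m["old_slope"])),
        "lut": (int(m["lut_base"]), float(m["lut_slope"])),
    }


def predicted_ns(fit, relocs):
    """Evaluate a (base, slope) fit at a given number of relocations."""
    base, slope = fit
    return base + slope * relocs


def after_over_before(before_line, after_line, relocs, path="old"):
    """Ratio of predicted time after the patch to before; below 1.0 means faster."""
    before = parse_line(before_line)
    after = parse_line(after_line)
    return predicted_ns(after[path], relocs) / predicted_ns(before[path], relocs)


if __name__ == "__main__":
    before = "relocation: buffers= 1: old= 8060 + 165.3*reloc, lut= 7816 + 164.8*reloc (ns)"
    after = "relocation: buffers= 1: old= 8075 + 81.4*reloc, lut= 7592 + 80.6*reloc (ns)"
    print(after_over_before(before, after, relocs=1000))
```

Feeding it matching "Before patch" and "After patch" lines for the same mode and buffer count gives a single ratio per row, which is easier to scan than the two tables side by side. It says nothing about why the numbers moved.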
--
Ben Widawsky, Intel Open Source Technology Center