[Intel-gfx] [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
Ben Widawsky
ben at bwidawsk.net
Thu Aug 15 01:22:42 CEST 2013
On Wed, Aug 14, 2013 at 11:43:58PM +0100, Chris Wilson wrote:
> These are my numbers for a beefy haswell box (note the really
> interesting numbers will be on Baytrail):
>
> unpatched:
>
> relocation: buffers= 1: old= 21945 + 34.4*reloc, lut= 21814 + 34.0*reloc (ns)
> relocation: buffers= 2: old= 15947 + 36.4*reloc, lut= 16169 + 35.4*reloc (ns)
> relocation: buffers= 4: old= 12711 + 37.6*reloc, lut= 13039 + 36.7*reloc (ns)
> relocation: buffers= 8: old= 6154 + 40.9*reloc, lut= 7201 + 38.9*reloc (ns)
> relocation: buffers= 16: old= 4846 + 41.6*reloc, lut= 5337 + 40.6*reloc (ns)
> relocation: buffers= 32: old= 7097 + 41.9*reloc, lut= 6943 + 41.0*reloc (ns)
> relocation: buffers= 64: old= 13318 + 41.9*reloc, lut= 12748 + 41.2*reloc (ns)
> relocation: buffers= 128: old= 27282 + 43.0*reloc, lut= 25778 + 41.7*reloc (ns)
> relocation: buffers= 256: old= 54535 + 45.2*reloc, lut= 51912 + 43.7*reloc (ns)
> relocation: buffers= 512: old= 137447 + 53.2*reloc, lut= 129333 + 45.5*reloc (ns)
> relocation: buffers=1024: old= 307347 + 66.5*reloc, lut= 291487 + 48.1*reloc (ns)
> relocation: buffers=2048: old= 606300 + 92.1*reloc, lut= 574774 + 51.6*reloc (ns)
> skip-relocs: buffers= 1: old= 1583 + 15.6*reloc, lut= 1516 + 14.5*reloc (ns)
> skip-relocs: buffers= 2: old= 1621 + 15.6*reloc, lut= 1603 + 14.5*reloc (ns)
> skip-relocs: buffers= 4: old= 1791 + 15.6*reloc, lut= 1777 + 14.5*reloc (ns)
> skip-relocs: buffers= 8: old= 2009 + 15.6*reloc, lut= 2024 + 14.6*reloc (ns)
> skip-relocs: buffers= 16: old= 2637 + 15.7*reloc, lut= 2564 + 14.6*reloc (ns)
> skip-relocs: buffers= 32: old= 3835 + 15.8*reloc, lut= 3785 + 14.7*reloc (ns)
> skip-relocs: buffers= 64: old= 6996 + 15.8*reloc, lut= 6681 + 14.7*reloc (ns)
> skip-relocs: buffers= 128: old= 14333 + 16.4*reloc, lut= 13560 + 15.2*reloc (ns)
> skip-relocs: buffers= 256: old= 28092 + 17.7*reloc, lut= 26759 + 16.2*reloc (ns)
> skip-relocs: buffers= 512: old= 70885 + 25.2*reloc, lut= 66713 + 17.9*reloc (ns)
> skip-relocs: buffers=1024: old= 158520 + 35.2*reloc, lut= 150828 + 20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 314208 + 54.3*reloc, lut= 298343 + 22.1*reloc (ns)
> no-relocs: buffers= 1: old= 1533 + 5.2*reloc, lut= 1498 + 4.9*reloc (ns)
> no-relocs: buffers= 2: old= 1518 + 5.2*reloc, lut= 1505 + 4.9*reloc (ns)
> no-relocs: buffers= 4: old= 1647 + 5.2*reloc, lut= 1593 + 4.9*reloc (ns)
> no-relocs: buffers= 8: old= 1882 + 5.3*reloc, lut= 1874 + 5.0*reloc (ns)
> no-relocs: buffers= 16: old= 2399 + 5.3*reloc, lut= 2341 + 5.0*reloc (ns)
> no-relocs: buffers= 32: old= 3638 + 5.3*reloc, lut= 3554 + 5.0*reloc (ns)
> no-relocs: buffers= 64: old= 6622 + 5.3*reloc, lut= 6308 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old= 13584 + 5.3*reloc, lut= 12872 + 5.1*reloc (ns)
> no-relocs: buffers= 256: old= 26519 + 5.8*reloc, lut= 25234 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old= 67128 + 5.4*reloc, lut= 63054 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 146705 + 5.2*reloc, lut= 139020 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 290319 + 5.4*reloc, lut= 274705 + 5.4*reloc (ns)
>
> vma(execbuffer):
>
> relocation: buffers= 1: old= 21922 + 34.6*reloc, lut= 21510 + 34.0*reloc (ns)
> relocation: buffers= 2: old= 16851 + 37.4*reloc, lut= 17123 + 35.4*reloc (ns)
> relocation: buffers= 4: old= 13234 + 37.8*reloc, lut= 13436 + 36.9*reloc (ns)
> relocation: buffers= 8: old= 6549 + 40.8*reloc, lut= 6512 + 39.8*reloc (ns)
> relocation: buffers= 16: old= 5012 + 41.8*reloc, lut= 4883 + 41.0*reloc (ns)
> relocation: buffers= 32: old= 8591 + 42.2*reloc, lut= 8377 + 41.1*reloc (ns)
> relocation: buffers= 64: old= 16051 + 42.8*reloc, lut= 15658 + 41.7*reloc (ns)
> relocation: buffers= 128: old= 33397 + 44.5*reloc, lut= 32705 + 43.3*reloc (ns)
> relocation: buffers= 256: old= 68012 + 46.8*reloc, lut= 66904 + 45.5*reloc (ns)
> relocation: buffers= 512: old= 160162 + 56.4*reloc, lut= 155586 + 49.1*reloc (ns)
> relocation: buffers=1024: old= 348728 + 71.8*reloc, lut= 338113 + 55.1*reloc (ns)
> relocation: buffers=2048: old= 699331 + 98.7*reloc, lut= 675969 + 62.2*reloc (ns)
> skip-relocs: buffers= 1: old= 1642 + 16.5*reloc, lut= 1588 + 15.6*reloc (ns)
> skip-relocs: buffers= 2: old= 1676 + 16.4*reloc, lut= 1663 + 15.6*reloc (ns)
> skip-relocs: buffers= 4: old= 1926 + 16.4*reloc, lut= 1891 + 15.6*reloc (ns)
> skip-relocs: buffers= 8: old= 2218 + 16.6*reloc, lut= 2212 + 15.7*reloc (ns)
> skip-relocs: buffers= 16: old= 2933 + 16.6*reloc, lut= 2880 + 15.7*reloc (ns)
> skip-relocs: buffers= 32: old= 4594 + 16.6*reloc, lut= 4523 + 15.8*reloc (ns)
> skip-relocs: buffers= 64: old= 8414 + 16.8*reloc, lut= 8210 + 15.9*reloc (ns)
> skip-relocs: buffers= 128: old= 17429 + 17.9*reloc, lut= 17062 + 16.8*reloc (ns)
> skip-relocs: buffers= 256: old= 34794 + 19.8*reloc, lut= 34144 + 18.4*reloc (ns)
> skip-relocs: buffers= 512: old= 82287 + 27.6*reloc, lut= 80002 + 20.8*reloc (ns)
> skip-relocs: buffers=1024: old= 179851 + 38.0*reloc, lut= 174574 + 23.9*reloc (ns)
> skip-relocs: buffers=2048: old= 361511 + 57.2*reloc, lut= 350132 + 26.8*reloc (ns)
> no-relocs: buffers= 1: old= 1581 + 5.2*reloc, lut= 1579 + 4.9*reloc (ns)
> no-relocs: buffers= 2: old= 1609 + 5.2*reloc, lut= 1572 + 4.9*reloc (ns)
> no-relocs: buffers= 4: old= 1701 + 5.3*reloc, lut= 1685 + 4.9*reloc (ns)
> no-relocs: buffers= 8: old= 2084 + 5.3*reloc, lut= 2033 + 5.0*reloc (ns)
> no-relocs: buffers= 16: old= 2747 + 5.3*reloc, lut= 2686 + 5.0*reloc (ns)
> no-relocs: buffers= 32: old= 4379 + 5.3*reloc, lut= 4285 + 5.0*reloc (ns)
> no-relocs: buffers= 64: old= 8049 + 5.3*reloc, lut= 7850 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old= 16641 + 5.4*reloc, lut= 16301 + 5.2*reloc (ns)
> no-relocs: buffers= 256: old= 33111 + 5.7*reloc, lut= 32539 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old= 79898 + 5.4*reloc, lut= 77517 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 172199 + 5.2*reloc, lut= 166907 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 345542 + 5.2*reloc, lut= 334300 + 5.3*reloc (ns)
>
> So there is measurable degradation for the extra indirections, both for
> looking up the execbuffers and for performing the relocations. Though it
> doesn't merit anything more than a footnote in the changelog.
> -Chris
>
I'm sad I can't reproduce it. I think I already amended the commit
message; I can do more if you want.
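
For a rough sense of scale, here is a minimal sketch that compares the
constant term of the "old" fit before and after the patch, using only a
few rows copied from the results quoted above. The names `BASE_NS`,
`overhead_pct` and `summarize` are made up for this illustration and are
not part of the benchmark tool; the per-reloc slope is ignored here.

```python
# Constant term of the "old" fit in ns, copied from the quoted results.
# Keys are (mode, buffers); values are (unpatched, vma(execbuffer)).
BASE_NS = {
    ("relocation", 1): (21945, 21922),
    ("relocation", 2048): (606300, 699331),
    ("skip-relocs", 2048): (314208, 361511),
    ("no-relocs", 2048): (290319, 345542),
}


def overhead_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def summarize(table):
    """Map each (mode, buffers) key to its relative change in percent."""
    return {key: overhead_pct(b, a) for key, (b, a) in table.items()}


if __name__ == "__main__":
    for (mode, buffers), pct in summarize(BASE_NS).items():
        print(f"{mode:>12} buffers={buffers:<5} {pct:+.1f}%")
```

On these four rows the constant term grows by roughly 15-19% at 2048
buffers and is essentially unchanged for a single buffer.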
--
Ben Widawsky, Intel Open Source Technology Center