<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 19, 2017 at 7:43 AM, Chris Wilson <span dir="ltr"><<a href="mailto:chris@chris-wilson.co.uk" target="_blank">chris@chris-wilson.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Quoting Eric Engestrom (2017-06-19 15:30:46)<br> <span class="">> On Monday, 2017-06-19 13:02:11 +0100, Chris Wilson wrote:<br> > > Quoting Emil Velikov (2017-06-19 12:43:42)<br> > > > Hi Chris,<br> > > ><br> > > > On 19 June 2017 at 12:32, Chris Wilson <<a href="mailto:chris@chris-wilson.co.uk">chris@chris-wilson.co.uk</a>> wrote:<br> > > > > On linux/x86_64, calling into the kernel is just a single instruction<br> > > > > with the parameters passed via registers. We can therefre shortcircuit a<br> > > > > couple of library indirections around ioctl() which as much as the<br> > > > > switch into the kernel itself. Aside from the PLT indirection, libc's<br> > > > > ioctl() has a slight impedance mismatch with the kernel interface in<br> > > > > that it converts the -errno return into -1 + errno, which we immediately<br> > > > > convert back into -errno for ourselves!<br> > > > ><br> > > > Did you measure any performance difference with this patch/series?<br> > > > If so, please include the numbers in the commit message.<br> > ><br> > > Realistically you are not going to observe any performance improvement.<br> > > The only time when there is non-negligible time inside the drmIoct/ioctl<br> > > library functions is when the client is doing a busy-spin waiting for<br> > > results from the GPU. All this is doing is removing some junk from the<br> > > profiler -- it looks impressive in perf, but since it was GPU bound in<br> > > the first place making the busy-spinner faster is uneventful.<br> > ><br> > > Just a cosmetic improvement, but quite straightforward.<br></span></blockquote><div><br></div><div>I think it's a bit hard to call replacing a standard library function with inline assembly a "cosmetic improvement". :-)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""> > > -Chris<br> ><br> > I'm confused, if this doesn't improve performances, why reimplement<br> > drmIoctl() in Mesa? Or was the point that you made it `static inline`?<br> > Should your assembly hunk be applied in drmIoctl() in libdrm, maybe?<br> <br> </span>Because the profile was spending more time in drmIoctl() + __GI_ioctl() than<br> in the kernel, which annoyed me. drmIoctl() still has the impedance<br> mismatch, and is not used by anv.</blockquote></div><br></div><div class="gmail_extra">Getting rid of the impedance mismatch in error handling seems like the only good reason to do this. I'll have to give it more thought before I have a solid opinion on whether errno or assembly is worse. I do like pulling the anv_ioctl logic into some place common though.<br><br>At some point, I'd like to move the special-case for I915_GEM_SET_TILING to the common helper so we can stop hand-rolling things just for that.<br></div></div>