[Nouveau] [PATCH] drm/nouveau: fix __nouveau_fence_wait performance regression
Francisco Jerez
currojerez at riseup.net
Tue Mar 8 08:22:52 PST 2011
Marcin Slusarz <marcin.slusarz at gmail.com> writes:
> On Tue, Mar 08, 2011 at 01:58:50AM +0100, Francisco Jerez wrote:
>> Marcin Slusarz <marcin.slusarz at gmail.com> writes:
>>
>> > On Tue, Mar 08, 2011 at 08:24:26AM +1000, Ben Skeggs wrote:
>> >> On Mon, 2011-03-07 at 18:18 +0000, Maarten Maathuis wrote:
>> >> > On Fri, Mar 4, 2011 at 4:49 PM, Marcin Slusarz <marcin.slusarz at gmail.com> wrote:
>> >> > > On Sun, Feb 13, 2011 at 09:38:04PM +0100, Marcin Slusarz wrote:
>> >> > >> Combination of locking and interchannel synchronization changes
>> >> > >> uncovered poor behaviour of nouveau_fence_wait, which on HZ=100
>> >> > >> configuration could waste up to 10 ms per call.
>> >> > >> Depending on application, it lead to 10-30% FPS regression.
>> >> > >> To fix it, shorten thread sleep time to 0.1 ms and ensure
>> >> > >> spinning happens for at least one *full* tick.
>> >> > >>
>> >> > >> Signed-off-by: Marcin Slusarz <marcin.slusarz at gmail.com>
>> >> > >> ---
>> >> > >> drivers/gpu/drm/nouveau/nouveau_fence.c | 10 ++++++++--
>> >> > >> 1 files changed, 8 insertions(+), 2 deletions(-)
>> >> > >>
>> >> > >> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> >> > >> index 221b846..75ba5e2 100644
>> >> > >> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> >> > >> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> >> > >> @@ -27,6 +27,9 @@
>> >> > >> #include "drmP.h"
>> >> > >> #include "drm.h"
>> >> > >>
>> >> > >> +#include <linux/ktime.h>
>> >> > >> +#include <linux/hrtimer.h>
>> >> > >> +
>> >> > >> #include "nouveau_drv.h"
>> >> > >> #include "nouveau_ramht.h"
>> >> > >> #include "nouveau_dma.h"
>> >> > >> @@ -230,9 +233,12 @@ int
>> >> > >> __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
>> >> > >> {
>> >> > >> unsigned long timeout = jiffies + (3 * DRM_HZ);
>> >> > >> - unsigned long sleep_time = jiffies + 1;
>> >> > >> + unsigned long sleep_time = jiffies + 2;
>> >> > >> + ktime_t t;
>> >> > >> int ret = 0;
>> >> > >>
>> >> > >> + t = ktime_set(0, NSEC_PER_MSEC / 10);
>> >> > >> +
>> >> > >> while (1) {
>> >> > >> if (__nouveau_fence_signalled(sync_obj, sync_arg))
>> >> > >> break;
>> >> > >> @@ -245,7 +251,7 @@ __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
>> >> > >> __set_current_state(intr ? TASK_INTERRUPTIBLE
>> >> > >> : TASK_UNINTERRUPTIBLE);
>> >> > >> if (lazy && time_after_eq(jiffies, sleep_time))
>> >> > >> - schedule_timeout(1);
>> >> > >> + schedule_hrtimeout(&t, HRTIMER_MODE_REL);
>> >> > >>
>> >> > >> if (intr && signal_pending(current)) {
>> >> > >> ret = -ERESTARTSYS;
>> >> > >> --
>> >> > >> 1.7.4.rc3
>> >> > >>
>> >> > >
>> >> > > ping again
>> >> > > _______________________________________________
>> >> > > Nouveau mailing list
>> >> > > Nouveau at lists.freedesktop.org
>> >> > > http://lists.freedesktop.org/mailman/listinfo/nouveau
>> >> > >
>> >> >
>> >> > This looks ok to me, but I would like to get Ben Skeggs ok on this one
>> >> > as well. So i've CC'ed him, hopefully he'll notice :-)
>> >> Ah sorry, I have actually looked at this quite a while back but came to
>> >> no solid conclusion.
>> >>
>> >> While yes, I did see some minor performance improvement from it, I also
>> >> notice that now we once again get 100% CPU usage while an app is waiting
>> >> for the GPU a lot..
>> >
>> > It's not "minor" performance improvement:
>> >
>> > without this patch (FPS):
>> > nexuiz: 53
>> > wop: 181
>> > tremulous: 157
>> > wsw0.5: 89
>> > glxgears: 730
>> >
>> > with:
>> > nexuiz: 63 (+18%)
>> > wop: 248 (+37%)
>> > tremulous: 156 (-0.6%)
>> > wsw0.5: 91 (+2%)
>> > glxgears: 1054 (+44%)
>> >
>> >
>> > Ok, so you are worried about CPU usage... Let's see what will happen if
>> > I remove spinning added by "drm/nouveau: Spin for a bit in
>> > nouveau_fence_wait() before yielding the CPU".
>> >
>> > reduced version (attached):
>> > nexuiz: 62
>> > wop: 248
>> > trem: 157
>> > wsw0.5: 90
>> > glxgears: 1055
>> >
>> > Good enough?
>>
>> Remember to exercise some software fallbacks as well (e.g. something
>> using core fonts), software fallbacks were the main users of the
>> spinning you've removed.
>
> corefonts are pretty fast (measured "time dmesg"):
>
> without (spinning + timeout 10ms): 0.08s
> with (spinning + hrtimeout 0.1ms): 0.08s
> reduced (no spinning + hrtimeout 0.1ms): 0.25s
> old (no spinning + timeout 10ms): 13s
>
Ah, so it's still trading one performance regression for another, and
you could make everyone happy at the same time.
> So I think "no spinning + hrtimeout 0.1ms" is a reasonable compromise...
>
What's the CPU usage difference between the spinning and the no-spinning
cases? It's likely to be negligible for most applications aside from the
ones using queries and fallbacks intensively, and in those two cases I
agree with you that optimizing for low CPU usage doesn't make a huge lot
of sense, getting low latency is already hard enough.
If I'm wrong and the initial spinning affects the overall CPU usage
negatively, then we have two different use cases with different latency
requirements and the DRM API needs to be fixed (though, there're maybe
other solutions to explore first, like, start with a really small
hrtimeout and increase it exponentially up to some cut-off value).
> BTW, old behaviour (no spinning + timeout 10ms) affects other workloads too
> nexuiz: 50
> wop: 153
> tremulous: 155
> wsw0.5: 89
> glxgears: 100 (!)
>
>> Anyway, software fallbacks and occlusion queries are the only two places
>> (that I can think of now) where we need the low latency your patch
>> gives, and, as Ben already pointed out, we probably want to keep CPU
>> usage at minimum in every other case. As a middle ground, the "lazy"
>> flag (or rather, a "hog" flag?) could be exposed all the way up to
>> userspace, and those two cases fixed to set the flag differently.
>>
>> What do you think?
>
> I'm not sure. I think optimizing for low CPU usage is not the best what
> we can do right now. 3D performance is still too low behind blob.
> Let's fix 3D perf first and think about CPU usage later.
>
IMHO, switching to lazy waits was the right choice at this stage, it
doesn't make optimizing for "3D performance" any harder, quite the
opposite, it helps to pinpoint some poorly-pipelining programming
practices by making the already existing performance problem more
obvious.
>> >
>> > ---
>> > From: Marcin Slusarz <marcin.slusarz at gmail.com>
>> > Subject: [PATCH] drm/nouveau: fix __nouveau_fence_wait performance regression
>> >
>> > Combination of locking and interchannel synchronization changes
>> > uncovered poor behaviour of nouveau_fence_wait, which on HZ=100
>> > configuration could waste up to 10 ms per call.
>> > Depending on application, it lead to 10-30% FPS regression.
>> >
>> > To fix it, shorten thread sleep time to 0.1 ms.
>> >
>> > Additionally, remove spinning (added by "drm/nouveau: Spin for
>> > a bit in nouveau_fence_wait() before yielding the CPU"), because
>> > it's not needed anymore.
>> >
>> > Signed-off-by: Marcin Slusarz <marcin.slusarz at gmail.com>
>> > ---
>> > drivers/gpu/drm/nouveau/nouveau_fence.c | 11 ++++++++---
>> > 1 files changed, 8 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> > index a244702..010243b 100644
>> > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> > @@ -27,6 +27,9 @@
>> > #include "drmP.h"
>> > #include "drm.h"
>> >
>> > +#include <linux/ktime.h>
>> > +#include <linux/hrtimer.h>
>> > +
>> > #include "nouveau_drv.h"
>> > #include "nouveau_ramht.h"
>> > #include "nouveau_dma.h"
>> > @@ -229,9 +232,11 @@ int
>> > __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
>> > {
>> > unsigned long timeout = jiffies + (3 * DRM_HZ);
>> > - unsigned long sleep_time = jiffies + 1;
>> > + ktime_t t;
>> > int ret = 0;
>> >
>> > + t = ktime_set(0, NSEC_PER_MSEC / 10);
>> > +
>> > while (1) {
>> > if (__nouveau_fence_signalled(sync_obj, sync_arg))
>> > break;
>> > @@ -243,8 +248,8 @@ __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
>> >
>> > __set_current_state(intr ? TASK_INTERRUPTIBLE
>> > : TASK_UNINTERRUPTIBLE);
>> > - if (lazy && time_after_eq(jiffies, sleep_time))
>> > - schedule_timeout(1);
>> > + if (lazy)
>> > + schedule_hrtimeout(&t, HRTIMER_MODE_REL);
>> >
>> > if (intr && signal_pending(current)) {
>> > ret = -ERESTARTSYS;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20110308/8900d569/attachment.pgp>
More information about the Nouveau
mailing list