[Nouveau] [PATCH] drm/nouveau: fix __nouveau_fence_wait performance regression
Marcin Slusarz
marcin.slusarz at gmail.com
Mon Mar 7 15:27:19 PST 2011
On Tue, Mar 08, 2011 at 12:22:56AM +0100, Marcin Slusarz wrote:
> On Tue, Mar 08, 2011 at 08:24:26AM +1000, Ben Skeggs wrote:
> > On Mon, 2011-03-07 at 18:18 +0000, Maarten Maathuis wrote:
> > > On Fri, Mar 4, 2011 at 4:49 PM, Marcin Slusarz <marcin.slusarz at gmail.com> wrote:
> > > > On Sun, Feb 13, 2011 at 09:38:04PM +0100, Marcin Slusarz wrote:
> > > >> Combination of locking and interchannel synchronization changes
> > > >> uncovered poor behaviour of nouveau_fence_wait, which on HZ=100
> > > >> configuration could waste up to 10 ms per call.
> > > >> Depending on application, it lead to 10-30% FPS regression.
> > > >> To fix it, shorten thread sleep time to 0.1 ms and ensure
> > > >> spinning happens for at least one *full* tick.
> > > >>
> > > >> Signed-off-by: Marcin Slusarz <marcin.slusarz at gmail.com>
> > > >> ---
> > > >> drivers/gpu/drm/nouveau/nouveau_fence.c | 10 ++++++++--
> > > >> 1 files changed, 8 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > >> index 221b846..75ba5e2 100644
> > > >> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > >> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > >> @@ -27,6 +27,9 @@
> > > >> #include "drmP.h"
> > > >> #include "drm.h"
> > > >>
> > > >> +#include <linux/ktime.h>
> > > >> +#include <linux/hrtimer.h>
> > > >> +
> > > >> #include "nouveau_drv.h"
> > > >> #include "nouveau_ramht.h"
> > > >> #include "nouveau_dma.h"
> > > >> @@ -230,9 +233,12 @@ int
> > > >> __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
> > > >> {
> > > >> unsigned long timeout = jiffies + (3 * DRM_HZ);
> > > >> - unsigned long sleep_time = jiffies + 1;
> > > >> + unsigned long sleep_time = jiffies + 2;
> > > >> + ktime_t t;
> > > >> int ret = 0;
> > > >>
> > > >> + t = ktime_set(0, NSEC_PER_MSEC / 10);
> > > >> +
> > > >> while (1) {
> > > >> if (__nouveau_fence_signalled(sync_obj, sync_arg))
> > > >> break;
> > > >> @@ -245,7 +251,7 @@ __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
> > > >> __set_current_state(intr ? TASK_INTERRUPTIBLE
> > > >> : TASK_UNINTERRUPTIBLE);
> > > >> if (lazy && time_after_eq(jiffies, sleep_time))
> > > >> - schedule_timeout(1);
> > > >> + schedule_hrtimeout(&t, HRTIMER_MODE_REL);
> > > >>
> > > >> if (intr && signal_pending(current)) {
> > > >> ret = -ERESTARTSYS;
> > > >> --
> > > >> 1.7.4.rc3
> > > >>
> > > >
> > > > ping again
> > > > _______________________________________________
> > > > Nouveau mailing list
> > > > Nouveau at lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/nouveau
> > > >
> > >
> > > This looks ok to me, but I would like to get Ben Skeggs ok on this one
> > > as well. So i've CC'ed him, hopefully he'll notice :-)
> > Ah sorry, I have actually looked at this quite a while back but came to
> > no solid conclusion.
> >
> > While yes, I did see some minor performance improvement from it, I also
> > notice that now we once again get 100% CPU usage while an app is waiting
> > for the GPU a lot..
>
> It's not "minor" performance improvement:
>
> without this patch (FPS):
> nexuiz: 53
> wop: 181
> tremulous: 157
> wsw0.5: 89
> glxgears: 730
>
> with:
> nexuiz: 63 (+18%)
> wop: 248 (+37%)
> tremulous: 156 (-0.6%)
> wsw0.5: 91 (+2%)
> glxgears: 1054 (+44%)
>
>
> Ok, so you are worried about CPU usage... Let's see what will happen if
> I remove spinning added by "drm/nouveau: Spin for a bit in
> nouveau_fence_wait() before yielding the CPU".
>
> reduced version (attached):
> nexuiz: 62
> wop: 248
> trem: 157
> wsw0.5: 90
> glxgears: 1055
>
> Good enough?
Just for comparison:
kernel 2.6.37 + pre nvc0fied mesa had:
nexuiz: 34
wop: 139
tremulous: 82
wsw0.5: 53
glxgears: 1056
> ---
> From: Marcin Slusarz <marcin.slusarz at gmail.com>
> Subject: [PATCH] drm/nouveau: fix __nouveau_fence_wait performance regression
>
> Combination of locking and interchannel synchronization changes
> uncovered poor behaviour of nouveau_fence_wait, which on HZ=100
> configuration could waste up to 10 ms per call.
> Depending on application, it lead to 10-30% FPS regression.
>
> To fix it, shorten thread sleep time to 0.1 ms.
>
> Additionally, remove spinning (added by "drm/nouveau: Spin for
> a bit in nouveau_fence_wait() before yielding the CPU"), because
> it's not needed anymore.
>
> Signed-off-by: Marcin Slusarz <marcin.slusarz at gmail.com>
> ---
> drivers/gpu/drm/nouveau/nouveau_fence.c | 11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index a244702..010243b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -27,6 +27,9 @@
> #include "drmP.h"
> #include "drm.h"
>
> +#include <linux/ktime.h>
> +#include <linux/hrtimer.h>
> +
> #include "nouveau_drv.h"
> #include "nouveau_ramht.h"
> #include "nouveau_dma.h"
> @@ -229,9 +232,11 @@ int
> __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
> {
> unsigned long timeout = jiffies + (3 * DRM_HZ);
> - unsigned long sleep_time = jiffies + 1;
> + ktime_t t;
> int ret = 0;
>
> + t = ktime_set(0, NSEC_PER_MSEC / 10);
> +
> while (1) {
> if (__nouveau_fence_signalled(sync_obj, sync_arg))
> break;
> @@ -243,8 +248,8 @@ __nouveau_fence_wait(void *sync_obj, void *sync_arg, bool lazy, bool intr)
>
> __set_current_state(intr ? TASK_INTERRUPTIBLE
> : TASK_UNINTERRUPTIBLE);
> - if (lazy && time_after_eq(jiffies, sleep_time))
> - schedule_timeout(1);
> + if (lazy)
> + schedule_hrtimeout(&t, HRTIMER_MODE_REL);
>
> if (intr && signal_pending(current)) {
> ret = -ERESTARTSYS;
> --
> 1.7.4.rc3
>
More information about the Nouveau
mailing list