[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2

Christian König christian.koenig at amd.com
Fri Aug 15 09:54:28 PDT 2014


Am 15.08.2014 um 17:32 schrieb Grigori Goronzy:
> On 15.08.2014 17:26, Alex Deucher wrote:
>> On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <greg at chown.ath.cx> wrote:
>>> On 15.08.2014 16:11, Christian König wrote:
>>>> Hi Marco,
>>>>
>>>> the problem with an CS ioctl flag is that we sometimes don't know how
>>>> much SCLK/MCLK boost is needed, for example when we do post processing
>>>> in the player using OpenGL and UVD decoding with VDPAU. In this case
>>>> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so
>>>> can't give that info to the kernel either.
>>>>
>>> Maybe it's an acceptable workaround to simply disable dynamic UVD state
>>> selection in case the UVD states only have a single power level. That
>>> will avoid the performance issues on affected systems, while still
>>> allowing dynamic UVD states on systems that have a saner DPM table
>>> setup. I think it is mosly older systems that suffer from this.
>>>
>> That is exactly what we do now.
>>
> Is it? In 3.17-wip, dynamic UVD state selection (according to active
> streams) is still completely disabled. It will always use the generic
> UVD state. In fact wasn't it reverted again because of the performance
> issues on some systems?

This is the performance table of my laptop (at least the interesting 
parts), which I think is a typical example of the problem:

[    4.106772] == power state 1 ==
[    4.106774]     ui class: performance
[    4.106776]     internal class: none
[    4.106780]     uvd    vclk: 0 dclk: 0
[    4.106782]         power level 0    sclk: 20000 vddc_index: 2
[    4.106784]         power level 1    sclk: 50000 vddc_index: 2
[    4.106805] == power state 3 ==
[    4.106807]     ui class: none
[    4.106808]     internal class: uvd
[    4.106813]     uvd    vclk: 55000 dclk: 40000
[    4.106816]         power level 0    sclk: 50000 vddc_index: 2
[    4.106818]         power level 1    sclk: 50000 vddc_index: 2
[    4.106820]     status:
[    4.106822] == power state 4 ==
[    4.106823]     ui class: battery
[    4.106825]     internal class: uvd_hd
[    4.106831]     uvd    vclk: 40000 dclk: 30000
[    4.106833]         power level 0    sclk: 38000 vddc_index: 1
[    4.106835]         power level 1    sclk: 38000 vddc_index: 1
[    4.106839] == power state 5 ==
[    4.106841]     ui class: battery
[    4.106843]     internal class: uvd_sd
[    4.106848]     uvd    vclk: 40000 dclk: 30000
[    4.106850]         power level 0    sclk: 38000 vddc_index: 2
[    4.106853]         power level 1    sclk: 38000 vddc_index: 2

As you can see we currently always select the performance level uvd, 
which results in selecting the maximum sclk/dclk and vclk. Unfortunately 
neither uvd, uvd_sd nor uvd_hd allows the hardware to switch the sclk 
once selected (it's a hardware limitation of older uvd blocks).

So for all cases where this is interesting you actually always have only 
a single power level to choose from.

Christian.

>
> Grigori
>
>> Alex
>>
>>
>>> Best regards
>>> Grigori
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>>>>> Hey all,
>>>>>
>>>>> I also had a talk with Alex yesterday about post-processing issues
>>>>> when using dynamic UVD profiles and a chamge on CS ioctl
>>>>> including a flag to let user mode driver tell to the kernel which
>>>>> performance requirement it wants for post processing. A commom
>>>>> point for both discussion is to stablish the default values for these
>>>>> profiles, but probably this ioctl change would be more impacting/complex
>>>>> to implement than a sysfs entry.
>>>>>
>>>>> If a sysfs entry is anough for now I can handle the code to create it
>>>>> and, with your help, the code to setup the UVD profile requested
>>>>> through it.
>>>>>
>>>>> Is there any suggestion?
>>>>>
>>>>> Thanks all for your help,
>>>>>
>>>>>
>>>>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König
>>>>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>>>>
>>>>>      Hi guys,
>>>>>
>>>>>      to make a long story short every time I watch a movie my laptop
>>>>>      start to heat up because we always select the standard UVD power
>>>>>      profile without actually measuring if that is necessary.
>>>>>
>>>>>      Marco came up with a patch that seems to reliable measure the fps
>>>>>      send down to the kernel and so together with knowing the frame
>>>>>      size of the video should allow us to select the right UVD power
>>>>>      profile.
>>>>>
>>>>>      The problem is that Alex (unnoticed by me) completely disabled
>>>>>      selecting the UVD profiles because of some issues with advanced
>>>>>      post processing discussed on IRC. The problem seems to be that the
>>>>>      lower UVD profiles have a to low SCLK/MCLK to handle the 3D load
>>>>>      that comes with scaling, deinterlacing etc...
>>>>>
>>>>>      I unfortunately don't have time for it, cause this only affects
>>>>>      the hardware generations R600-SI and not the newest one CIK. So
>>>>>      could you guys stick together and come up with a solution?
>>>>>      Something like a sysfs entry that let's us select the minimum UVD
>>>>>      power level allowed?
>>>>>
>>>>>      I think Marco is happy to come up with a patch, we just need to
>>>>>      know what's really needed and what should be the default values.
>>>>>      I'm happy to review everything that comes out of it, just don't
>>>>>      have time to do it myself.
>>>>>
>>>>>      Happy discussion and thanks in advance,
>>>>>      Christian.
>>>>>
>>>>>      Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>>>>
>>>>>          On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>>>>          <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> wrote:
>>>>>
>>>>>              Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>>>
>>>>>                  On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>>>>                  <deathsimple at vodafone.de
>>>>>                  <mailto:deathsimple at vodafone.de>> wrote:
>>>>>
>>>>>                      Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>>>
>>>>>                          On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>>>>                          <deathsimple at vodafone.de
>>>>>                          <mailto:deathsimple at vodafone.de>> wrote:
>>>>>
>>>>>                              Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>>>
>>>>>                                  On Thu, Aug 7, 2014 at 7:33 AM,
>>>>>                                  Christian König
>>>>>                                  <deathsimple at vodafone.de
>>>>>                                  <mailto:deathsimple at vodafone.de>>
>>>>>                                  wrote:
>>>>>
>>>>>                                      From: Marco A Benatto
>>>>>                                      <marco.antonio.780 at gmail.com
>>>>>                                      <mailto:marco.antonio.780 at gmail.com>>
>>>>>
>>>>>                                      Adding a Frames Per Second
>>>>>                                      estimation logic on UVD handles
>>>>>                                      when it has being used. This
>>>>>                                      estimation is per handle basis
>>>>>                                      and will help on DPM profile
>>>>>                                      calculation.
>>>>>
>>>>>                                      v2 (chk): fix timestamp type, move
>>>>>                                      functions around and
>>>>>                                                    cleanup code a bit.
>>>>>
>>>>>                                  Will this really help much?  I thought
>>>>>                                  the problem was mainly due to
>>>>>                                  sclk and mclk for post processing.
>>>>>
>>>>>
>>>>>                              It should at least handle the UVD side for
>>>>>                              upclocking when you get a
>>>>>                              lot
>>>>>                              of
>>>>>                              streams / fps. And at on my NI the patch
>>>>>                              seems to do exactly that.
>>>>>
>>>>>                              Switching sclk and mclk for post
>>>>>                              processing is a different task, and I
>>>>>                              actually have no idea what to do with them.
>>>>>
>>>>>                          At this point we always choose the plain UVD
>>>>>                          state anyway so this
>>>>>                          patch would only take effect if we re-enabled
>>>>>                          the dynamic UVD state
>>>>>                          selection.
>>>>>
>>>>>
>>>>>                      Hui? I thought we already re-enabled the dynamic
>>>>>                      UVD state selection, but
>>>>>                      double checking this I found it disabled again.
>>>>>
>>>>>                      What was the problem with that? Looks like I
>>>>>                      somehow missed the
>>>>>                      discussion
>>>>>                      around it.
>>>>>
>>>>>                  We did, but after doing so a number of people
>>>>>                  complained about a
>>>>>                  regression on IRC because when apps like xmbc enabled
>>>>>                  post processing,
>>>>>                  performance went down.
>>>>>
>>>>>
>>>>>              That's strange, from my experience the different UVD
>>>>>              performance states only
>>>>>              affect UVDs dclk/vclk, not sclk/mclk. I need to get the
>>>>>              DPM dumps to
>>>>>              confirms this.
>>>>>
>>>>>          The sclks and mclks are usually different as well, especially
>>>>>          on APUs.
>>>>>          I can send you some examples.
>>>>>
>>>>>              You not off hand remember who complained on IRC? Finding
>>>>>              something in the
>>>>>              IRC logs is like searching for a needle in a haystack.
>>>>>
>>>>>          I don't remember off hand.  I think zgreg was involved in some
>>>>>          of the
>>>>>          discussions.
>>>>>
>>>>>          Alex
>>>>>
>>>>>              Thanks,
>>>>>              Christian.
>>>>>
>>>>>
>>>>>                  Alex
>>>>>
>>>>>
>>>>>                      Christian.
>>>>>
>>>>>
>>>>>                          For the post processing, we probably need a
>>>>>                          hint we can
>>>>>                          pass to the driver in the CS ioctl to denote
>>>>>                          what state we need.
>>>>>                          Although if we did that, this could would
>>>>>                          largely be moot.  That said,
>>>>>                          newer asics support dynamic UVD clocks so we
>>>>>                          really only need
>>>>>                          something like that for older asics and I
>>>>>                          guess VCE.
>>>>>
>>>>>                          Alex
>>>>>
>>>>>                              Christian.
>>>>>
>>>>>
>>>>>                                  Alex
>>>>>
>>>>>                                      Signed-off-by: Marco A Benatto
>>>>>                                      <marco.antonio.780 at gmail.com
>>>>>                                      <mailto:marco.antonio.780 at gmail.com>>
>>>>>                                      Signed-off-by: Christian König
>>>>>                                      <christian.koenig at amd.com
>>>>>                                      <mailto:christian.koenig at amd.com>>
>>>>>                                      ---
>>>>>
>>>>>                                       drivers/gpu/drm/radeon/radeon.h
>>>>>                                         | 10 ++++++
>>>>>
>>>>>                                       drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>                                      | 64
>>>>>                                      +++++++++++++++++++++++++++++++++----
>>>>>                                           2 files changed, 68
>>>>>                                      insertions(+), 6 deletions(-)
>>>>>
>>>>>                                      diff --git
>>>>>                                      a/drivers/gpu/drm/radeon/radeon.h
>>>>>                                      b/drivers/gpu/drm/radeon/radeon.h
>>>>>                                      index 9e1732e..e92f6cb 100644
>>>>>                                      --- a/drivers/gpu/drm/radeon/radeon.h
>>>>>                                      +++ b/drivers/gpu/drm/radeon/radeon.h
>>>>>                                      @@ -1617,6 +1617,15 @@ int
>>>>>                                      radeon_pm_get_type_index(struct
>>>>>                                      radeon_device
>>>>>                                      *rdev,
>>>>>                                           #define
>>>>>                                      RADEON_UVD_STACK_SIZE  (1024*1024)
>>>>>                                           #define RADEON_UVD_HEAP_SIZE
>>>>>                                       (1024*1024)
>>>>>
>>>>>                                      +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>>>>                                      +#define RADEON_UVD_DEFAULT_FPS 60
>>>>>                                      +
>>>>>                                      +struct radeon_uvd_fps {
>>>>>                                      +       uint64_t        timestamp;
>>>>>                                      +       uint8_t         event_index;
>>>>>                                      +       uint8_t
>>>>>                                       events[RADEON_UVD_FPS_EVENTS_MAX];
>>>>>                                      +};
>>>>>                                      +
>>>>>                                           struct radeon_uvd {
>>>>>                                                  struct radeon_bo
>>>>>                                        *vcpu_bo;
>>>>>                                                  void
>>>>>                                        *cpu_addr;
>>>>>                                      @@ -1626,6 +1635,7 @@ struct
>>>>>                                      radeon_uvd {
>>>>>                                                  struct drm_file
>>>>>                                       *filp[RADEON_MAX_UVD_HANDLES];
>>>>>                                                  unsigned
>>>>>                                        img_size[RADEON_MAX_UVD_HANDLES];
>>>>>                                                  struct delayed_work
>>>>>                                       idle_work;
>>>>>                                      +       struct radeon_uvd_fps
>>>>>                                       fps_info[RADEON_MAX_UVD_HANDLES];
>>>>>                                           };
>>>>>
>>>>>                                           int radeon_uvd_init(struct
>>>>>                                      radeon_device *rdev);
>>>>>                                      diff --git
>>>>>                                      a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>                                      b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>                                      index 6bf55ec..ef5667a 100644
>>>>>                                      ---
>>>>>                                      a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>                                      +++
>>>>>                                      b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>                                      @@ -237,6 +237,51 @@ void
>>>>>                                      radeon_uvd_force_into_uvd_segment(struct
>>>>>                                      radeon_bo *rbo)
>>>>>                                                  rbo->placement.lpfn =
>>>>>                                      (256 * 1024 * 1024) >> PAGE_SHIFT;
>>>>>                                           }
>>>>>
>>>>>                                      +static void
>>>>>                                      radeon_uvd_fps_clear_events(struct
>>>>>                                      radeon_device *rdev,
>>>>>                                      int
>>>>>                                      idx)
>>>>>                                      +{
>>>>>                                      +       struct radeon_uvd_fps *fps
>>>>>                                      = &rdev->uvd.fps_info[idx];
>>>>>                                      +       unsigned i;
>>>>>                                      +
>>>>>                                      +       fps->timestamp = jiffies_64;
>>>>>                                      +       fps->event_index = 0;
>>>>>                                      +       for (i = 0; i <
>>>>>                                      RADEON_UVD_FPS_EVENTS_MAX; i++)
>>>>>                                      +               fps->events[i] = 0;
>>>>>                                      +}
>>>>>                                      +
>>>>>                                      +static void
>>>>>                                      radeon_uvd_fps_note_event(struct
>>>>>                                      radeon_device *rdev,
>>>>>                                      int
>>>>>                                      idx)
>>>>>                                      +{
>>>>>                                      +       struct radeon_uvd_fps *fps
>>>>>                                      = &rdev->uvd.fps_info[idx];
>>>>>                                      +       uint64_t timestamp =
>>>>>                                      jiffies_64;
>>>>>                                      +       unsigned rate = 0;
>>>>>                                      +
>>>>>                                      +       uint8_t index =
>>>>>                                      fps->event_index++;
>>>>>                                      +       fps->event_index %=
>>>>>                                      RADEON_UVD_FPS_EVENTS_MAX;
>>>>>                                      +
>>>>>                                      +       rate = div64_u64(HZ,
>>>>>                                      max(timestamp - fps->timestamp,
>>>>>                                      1ULL));
>>>>>                                      +
>>>>>                                      +       fps->timestamp = timestamp;
>>>>>                                      +       fps->events[index] =
>>>>>                                      min(rate, 120u);
>>>>>                                      +}
>>>>>                                      +
>>>>>                                      +static unsigned
>>>>>                                      radeon_uvd_estimate_fps(struct
>>>>>                                      radeon_device *rdev,
>>>>>                                      int
>>>>>                                      idx)
>>>>>                                      +{
>>>>>                                      +       struct radeon_uvd_fps *fps
>>>>>                                      = &rdev->uvd.fps_info[idx];
>>>>>                                      +       unsigned i, valid = 0,
>>>>>                                      count = 0;
>>>>>                                      +
>>>>>                                      +       for (i = 0; i <
>>>>>                                      RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>>>>                                      +               /* We should
>>>>>                                      ignore zero values */
>>>>>                                      +               if (fps->events[i]
>>>>>                                      != 0) {
>>>>>                                      +                       count +=
>>>>>                                      fps->events[i];
>>>>>                                      +                       valid++;
>>>>>                                      +               }
>>>>>                                      +       }
>>>>>                                      +
>>>>>                                      +       if (valid > 0)
>>>>>                                      +               return count / valid;
>>>>>                                      +       else
>>>>>                                      +               return
>>>>>                                      RADEON_UVD_DEFAULT_FPS;
>>>>>                                      +}
>>>>>                                      +
>>>>>                                           void
>>>>>                                      radeon_uvd_free_handles(struct
>>>>>                                      radeon_device *rdev, struct
>>>>>                                      drm_file *filp)
>>>>>                                           {
>>>>>                                                  int i, r;
>>>>>                                      @@ -419,8 +464,10 @@ static int
>>>>>                                      radeon_uvd_cs_msg(struct
>>>>>                                      radeon_cs_parser
>>>>>                                      *p, struct radeon_bo *bo,
>>>>>
>>>>>                                                  /* create or decode,
>>>>>                                      validate the handle */
>>>>>                                                  for (i = 0; i <
>>>>>                                      RADEON_MAX_UVD_HANDLES; ++i) {
>>>>>                                      -               if
>>>>>                                      (atomic_read(&p->rdev->uvd.handles[i])
>>>>>                                      == handle)
>>>>>                                      +               if
>>>>>                                      (atomic_read(&p->rdev->uvd.handles[i])
>>>>>                                      == handle)
>>>>>                                      {
>>>>>                                      +
>>>>>                                       radeon_uvd_fps_note_event(p->rdev, i);
>>>>>                                                                  return 0;
>>>>>                                      +               }
>>>>>                                                  }
>>>>>
>>>>>                                                  /* handle not found
>>>>>                                      try to alloc a new one */
>>>>>                                      @@ -428,6 +475,7 @@ static int
>>>>>                                      radeon_uvd_cs_msg(struct
>>>>>                                      radeon_cs_parser
>>>>>                                      *p, struct radeon_bo *bo,
>>>>>                                                          if
>>>>>                                      (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>>>>                                      0,
>>>>>                                      handle)) {
>>>>>
>>>>>                                      p->rdev->uvd.filp[i] = p->filp;
>>>>>
>>>>>                                      p->rdev->uvd.img_size[i] = img_size;
>>>>>                                      +
>>>>>                                       radeon_uvd_fps_clear_events(p->rdev,
>>>>>                                      i);
>>>>>                                                                  return 0;
>>>>>                                                          }
>>>>>                                                  }
>>>>>                                      @@ -763,7 +811,7 @@ int
>>>>>                                      radeon_uvd_get_destroy_msg(struct
>>>>>                                      radeon_device
>>>>>                                      *rdev, int ring,
>>>>>                                           static void
>>>>>                                      radeon_uvd_count_handles(struct
>>>>>                                      radeon_device *rdev,
>>>>>
>>>>>                                             unsigned *sd, unsigned *hd)
>>>>>                                           {
>>>>>                                      -       unsigned i;
>>>>>                                      +       unsigned i, fps_rate = 0;
>>>>>
>>>>>                                                  *sd = 0;
>>>>>                                                  *hd = 0;
>>>>>                                      @@ -772,10 +820,13 @@ static void
>>>>>                                      radeon_uvd_count_handles(struct
>>>>>                                      radeon_device *rdev,
>>>>>                                                          if
>>>>>                                      (!atomic_read(&rdev->uvd.handles[i]))
>>>>>                                                                  continue;
>>>>>
>>>>>                                      -               if
>>>>>                                      (rdev->uvd.img_size[i] >= 720*576)
>>>>>                                      -                       ++(*hd);
>>>>>                                      -               else
>>>>>                                      -                       ++(*sd);
>>>>>                                      +               fps_rate =
>>>>>                                      radeon_uvd_estimate_fps(rdev, i);
>>>>>                                      +
>>>>>                                      +               if
>>>>>                                      (rdev->uvd.img_size[i] >= 720*576) {
>>>>>                                      +                       (*hd) +=
>>>>>                                      fps_rate > 30 ? 1 : 2;
>>>>>                                      +               } else {
>>>>>                                      +                       (*sd) +=
>>>>>                                      fps_rate > 30 ? 1 : 2;
>>>>>                                      +               }
>>>>>                                                  }
>>>>>                                           }
>>>>>
>>>>>                                      @@ -805,6 +856,7 @@ void
>>>>>                                      radeon_uvd_note_usage(struct
>>>>>                                      radeon_device
>>>>>                                      *rdev)
>>>>>                                                  set_clocks &=
>>>>>                                      schedule_delayed_work(&rdev->uvd.idle_work,
>>>>>
>>>>>                                      msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>>>
>>>>>                                      +
>>>>>                                                  if
>>>>>                                      ((rdev->pm.pm_method ==
>>>>>                                      PM_METHOD_DPM) &&
>>>>>                                      rdev->pm.dpm_enabled) {
>>>>>                                                          unsigned hd =
>>>>>                                      0, sd = 0;
>>>>>
>>>>>                                      radeon_uvd_count_handles(rdev,
>>>>>                                      &sd, &hd);
>>>>>                                      --
>>>>>                                      1.9.1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marco Antonio Benatto
>>>>> Linux user ID:#506236
>>>
>



More information about the dri-devel mailing list