[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2

Alex Deucher alexdeucher at gmail.com
Fri Aug 15 08:26:03 PDT 2014


On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> On 15.08.2014 16:11, Christian König wrote:
>> Hi Marco,
>>
>> the problem with an CS ioctl flag is that we sometimes don't know how
>> much SCLK/MCLK boost is needed, for example when we do post processing
>> in the player using OpenGL and UVD decoding with VDPAU. In this case
>> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so
>> can't give that info to the kernel either.
>>
>
> Maybe it's an acceptable workaround to simply disable dynamic UVD state
> selection in case the UVD states only have a single power level. That
> will avoid the performance issues on affected systems, while still
> allowing dynamic UVD states on systems that have a saner DPM table
> setup. I think it is mosly older systems that suffer from this.
>

That is exactly what we do now.

Alex


> Best regards
> Grigori
>
>> Regards,
>> Christian.
>>
>> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>>> Hey all,
>>>
>>> I also had a talk with Alex yesterday about post-processing issues
>>> when using dynamic UVD profiles and a chamge on CS ioctl
>>> including a flag to let user mode driver tell to the kernel which
>>> performance requirement it wants for post processing. A commom
>>> point for both discussion is to stablish the default values for these
>>> profiles, but probably this ioctl change would be more impacting/complex
>>> to implement than a sysfs entry.
>>>
>>> If a sysfs entry is anough for now I can handle the code to create it
>>> and, with your help, the code to setup the UVD profile requested
>>> through it.
>>>
>>> Is there any suggestion?
>>>
>>> Thanks all for your help,
>>>
>>>
>>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König
>>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>>
>>>     Hi guys,
>>>
>>>     to make a long story short every time I watch a movie my laptop
>>>     start to heat up because we always select the standard UVD power
>>>     profile without actually measuring if that is necessary.
>>>
>>>     Marco came up with a patch that seems to reliable measure the fps
>>>     send down to the kernel and so together with knowing the frame
>>>     size of the video should allow us to select the right UVD power
>>>     profile.
>>>
>>>     The problem is that Alex (unnoticed by me) completely disabled
>>>     selecting the UVD profiles because of some issues with advanced
>>>     post processing discussed on IRC. The problem seems to be that the
>>>     lower UVD profiles have a to low SCLK/MCLK to handle the 3D load
>>>     that comes with scaling, deinterlacing etc...
>>>
>>>     I unfortunately don't have time for it, cause this only affects
>>>     the hardware generations R600-SI and not the newest one CIK. So
>>>     could you guys stick together and come up with a solution?
>>>     Something like a sysfs entry that let's us select the minimum UVD
>>>     power level allowed?
>>>
>>>     I think Marco is happy to come up with a patch, we just need to
>>>     know what's really needed and what should be the default values.
>>>     I'm happy to review everything that comes out of it, just don't
>>>     have time to do it myself.
>>>
>>>     Happy discussion and thanks in advance,
>>>     Christian.
>>>
>>>     Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>>
>>>         On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>>         <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>>             Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>
>>>                 On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>>                 <deathsimple at vodafone.de
>>>                 <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>>                     Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>
>>>                         On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>>                         <deathsimple at vodafone.de
>>>                         <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>>                             Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>
>>>                                 On Thu, Aug 7, 2014 at 7:33 AM,
>>>                                 Christian König
>>>                                 <deathsimple at vodafone.de
>>>                                 <mailto:deathsimple at vodafone.de>>
>>>                                 wrote:
>>>
>>>                                     From: Marco A Benatto
>>>                                     <marco.antonio.780 at gmail.com
>>>                                     <mailto:marco.antonio.780 at gmail.com>>
>>>
>>>                                     Adding a Frames Per Second
>>>                                     estimation logic on UVD handles
>>>                                     when it has being used. This
>>>                                     estimation is per handle basis
>>>                                     and will help on DPM profile
>>>                                     calculation.
>>>
>>>                                     v2 (chk): fix timestamp type, move
>>>                                     functions around and
>>>                                                   cleanup code a bit.
>>>
>>>                                 Will this really help much?  I thought
>>>                                 the problem was mainly due to
>>>                                 sclk and mclk for post processing.
>>>
>>>
>>>                             It should at least handle the UVD side for
>>>                             upclocking when you get a
>>>                             lot
>>>                             of
>>>                             streams / fps. And at on my NI the patch
>>>                             seems to do exactly that.
>>>
>>>                             Switching sclk and mclk for post
>>>                             processing is a different task, and I
>>>                             actually have no idea what to do with them.
>>>
>>>                         At this point we always choose the plain UVD
>>>                         state anyway so this
>>>                         patch would only take effect if we re-enabled
>>>                         the dynamic UVD state
>>>                         selection.
>>>
>>>
>>>                     Hui? I thought we already re-enabled the dynamic
>>>                     UVD state selection, but
>>>                     double checking this I found it disabled again.
>>>
>>>                     What was the problem with that? Looks like I
>>>                     somehow missed the
>>>                     discussion
>>>                     around it.
>>>
>>>                 We did, but after doing so a number of people
>>>                 complained about a
>>>                 regression on IRC because when apps like xmbc enabled
>>>                 post processing,
>>>                 performance went down.
>>>
>>>
>>>             That's strange, from my experience the different UVD
>>>             performance states only
>>>             affect UVDs dclk/vclk, not sclk/mclk. I need to get the
>>>             DPM dumps to
>>>             confirms this.
>>>
>>>         The sclks and mclks are usually different as well, especially
>>>         on APUs.
>>>         I can send you some examples.
>>>
>>>             You not off hand remember who complained on IRC? Finding
>>>             something in the
>>>             IRC logs is like searching for a needle in a haystack.
>>>
>>>         I don't remember off hand.  I think zgreg was involved in some
>>>         of the
>>>         discussions.
>>>
>>>         Alex
>>>
>>>             Thanks,
>>>             Christian.
>>>
>>>
>>>                 Alex
>>>
>>>
>>>                     Christian.
>>>
>>>
>>>                         For the post processing, we probably need a
>>>                         hint we can
>>>                         pass to the driver in the CS ioctl to denote
>>>                         what state we need.
>>>                         Although if we did that, this could would
>>>                         largely be moot.  That said,
>>>                         newer asics support dynamic UVD clocks so we
>>>                         really only need
>>>                         something like that for older asics and I
>>>                         guess VCE.
>>>
>>>                         Alex
>>>
>>>                             Christian.
>>>
>>>
>>>                                 Alex
>>>
>>>                                     Signed-off-by: Marco A Benatto
>>>                                     <marco.antonio.780 at gmail.com
>>>                                     <mailto:marco.antonio.780 at gmail.com>>
>>>                                     Signed-off-by: Christian König
>>>                                     <christian.koenig at amd.com
>>>                                     <mailto:christian.koenig at amd.com>>
>>>                                     ---
>>>
>>>                                      drivers/gpu/drm/radeon/radeon.h
>>>                                        | 10 ++++++
>>>
>>>                                      drivers/gpu/drm/radeon/radeon_uvd.c
>>>                                     | 64
>>>                                     +++++++++++++++++++++++++++++++++----
>>>                                          2 files changed, 68
>>>                                     insertions(+), 6 deletions(-)
>>>
>>>                                     diff --git
>>>                                     a/drivers/gpu/drm/radeon/radeon.h
>>>                                     b/drivers/gpu/drm/radeon/radeon.h
>>>                                     index 9e1732e..e92f6cb 100644
>>>                                     --- a/drivers/gpu/drm/radeon/radeon.h
>>>                                     +++ b/drivers/gpu/drm/radeon/radeon.h
>>>                                     @@ -1617,6 +1617,15 @@ int
>>>                                     radeon_pm_get_type_index(struct
>>>                                     radeon_device
>>>                                     *rdev,
>>>                                          #define
>>>                                     RADEON_UVD_STACK_SIZE  (1024*1024)
>>>                                          #define RADEON_UVD_HEAP_SIZE
>>>                                      (1024*1024)
>>>
>>>                                     +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>>                                     +#define RADEON_UVD_DEFAULT_FPS 60
>>>                                     +
>>>                                     +struct radeon_uvd_fps {
>>>                                     +       uint64_t        timestamp;
>>>                                     +       uint8_t         event_index;
>>>                                     +       uint8_t
>>>                                      events[RADEON_UVD_FPS_EVENTS_MAX];
>>>                                     +};
>>>                                     +
>>>                                          struct radeon_uvd {
>>>                                                 struct radeon_bo
>>>                                       *vcpu_bo;
>>>                                                 void
>>>                                       *cpu_addr;
>>>                                     @@ -1626,6 +1635,7 @@ struct
>>>                                     radeon_uvd {
>>>                                                 struct drm_file
>>>                                      *filp[RADEON_MAX_UVD_HANDLES];
>>>                                                 unsigned
>>>                                       img_size[RADEON_MAX_UVD_HANDLES];
>>>                                                 struct delayed_work
>>>                                      idle_work;
>>>                                     +       struct radeon_uvd_fps
>>>                                      fps_info[RADEON_MAX_UVD_HANDLES];
>>>                                          };
>>>
>>>                                          int radeon_uvd_init(struct
>>>                                     radeon_device *rdev);
>>>                                     diff --git
>>>                                     a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>                                     b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>                                     index 6bf55ec..ef5667a 100644
>>>                                     ---
>>>                                     a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>                                     +++
>>>                                     b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>                                     @@ -237,6 +237,51 @@ void
>>>                                     radeon_uvd_force_into_uvd_segment(struct
>>>                                     radeon_bo *rbo)
>>>                                                 rbo->placement.lpfn =
>>>                                     (256 * 1024 * 1024) >> PAGE_SHIFT;
>>>                                          }
>>>
>>>                                     +static void
>>>                                     radeon_uvd_fps_clear_events(struct
>>>                                     radeon_device *rdev,
>>>                                     int
>>>                                     idx)
>>>                                     +{
>>>                                     +       struct radeon_uvd_fps *fps
>>>                                     = &rdev->uvd.fps_info[idx];
>>>                                     +       unsigned i;
>>>                                     +
>>>                                     +       fps->timestamp = jiffies_64;
>>>                                     +       fps->event_index = 0;
>>>                                     +       for (i = 0; i <
>>>                                     RADEON_UVD_FPS_EVENTS_MAX; i++)
>>>                                     +               fps->events[i] = 0;
>>>                                     +}
>>>                                     +
>>>                                     +static void
>>>                                     radeon_uvd_fps_note_event(struct
>>>                                     radeon_device *rdev,
>>>                                     int
>>>                                     idx)
>>>                                     +{
>>>                                     +       struct radeon_uvd_fps *fps
>>>                                     = &rdev->uvd.fps_info[idx];
>>>                                     +       uint64_t timestamp =
>>>                                     jiffies_64;
>>>                                     +       unsigned rate = 0;
>>>                                     +
>>>                                     +       uint8_t index =
>>>                                     fps->event_index++;
>>>                                     +       fps->event_index %=
>>>                                     RADEON_UVD_FPS_EVENTS_MAX;
>>>                                     +
>>>                                     +       rate = div64_u64(HZ,
>>>                                     max(timestamp - fps->timestamp,
>>>                                     1ULL));
>>>                                     +
>>>                                     +       fps->timestamp = timestamp;
>>>                                     +       fps->events[index] =
>>>                                     min(rate, 120u);
>>>                                     +}
>>>                                     +
>>>                                     +static unsigned
>>>                                     radeon_uvd_estimate_fps(struct
>>>                                     radeon_device *rdev,
>>>                                     int
>>>                                     idx)
>>>                                     +{
>>>                                     +       struct radeon_uvd_fps *fps
>>>                                     = &rdev->uvd.fps_info[idx];
>>>                                     +       unsigned i, valid = 0,
>>>                                     count = 0;
>>>                                     +
>>>                                     +       for (i = 0; i <
>>>                                     RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>>                                     +               /* We should
>>>                                     ignore zero values */
>>>                                     +               if (fps->events[i]
>>>                                     != 0) {
>>>                                     +                       count +=
>>>                                     fps->events[i];
>>>                                     +                       valid++;
>>>                                     +               }
>>>                                     +       }
>>>                                     +
>>>                                     +       if (valid > 0)
>>>                                     +               return count / valid;
>>>                                     +       else
>>>                                     +               return
>>>                                     RADEON_UVD_DEFAULT_FPS;
>>>                                     +}
>>>                                     +
>>>                                          void
>>>                                     radeon_uvd_free_handles(struct
>>>                                     radeon_device *rdev, struct
>>>                                     drm_file *filp)
>>>                                          {
>>>                                                 int i, r;
>>>                                     @@ -419,8 +464,10 @@ static int
>>>                                     radeon_uvd_cs_msg(struct
>>>                                     radeon_cs_parser
>>>                                     *p, struct radeon_bo *bo,
>>>
>>>                                                 /* create or decode,
>>>                                     validate the handle */
>>>                                                 for (i = 0; i <
>>>                                     RADEON_MAX_UVD_HANDLES; ++i) {
>>>                                     -               if
>>>                                     (atomic_read(&p->rdev->uvd.handles[i])
>>>                                     == handle)
>>>                                     +               if
>>>                                     (atomic_read(&p->rdev->uvd.handles[i])
>>>                                     == handle)
>>>                                     {
>>>                                     +
>>>                                      radeon_uvd_fps_note_event(p->rdev, i);
>>>                                                                 return 0;
>>>                                     +               }
>>>                                                 }
>>>
>>>                                                 /* handle not found
>>>                                     try to alloc a new one */
>>>                                     @@ -428,6 +475,7 @@ static int
>>>                                     radeon_uvd_cs_msg(struct
>>>                                     radeon_cs_parser
>>>                                     *p, struct radeon_bo *bo,
>>>                                                         if
>>>                                     (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>>                                     0,
>>>                                     handle)) {
>>>
>>>                                     p->rdev->uvd.filp[i] = p->filp;
>>>
>>>                                     p->rdev->uvd.img_size[i] = img_size;
>>>                                     +
>>>                                      radeon_uvd_fps_clear_events(p->rdev,
>>>                                     i);
>>>                                                                 return 0;
>>>                                                         }
>>>                                                 }
>>>                                     @@ -763,7 +811,7 @@ int
>>>                                     radeon_uvd_get_destroy_msg(struct
>>>                                     radeon_device
>>>                                     *rdev, int ring,
>>>                                          static void
>>>                                     radeon_uvd_count_handles(struct
>>>                                     radeon_device *rdev,
>>>
>>>                                            unsigned *sd, unsigned *hd)
>>>                                          {
>>>                                     -       unsigned i;
>>>                                     +       unsigned i, fps_rate = 0;
>>>
>>>                                                 *sd = 0;
>>>                                                 *hd = 0;
>>>                                     @@ -772,10 +820,13 @@ static void
>>>                                     radeon_uvd_count_handles(struct
>>>                                     radeon_device *rdev,
>>>                                                         if
>>>                                     (!atomic_read(&rdev->uvd.handles[i]))
>>>                                                                 continue;
>>>
>>>                                     -               if
>>>                                     (rdev->uvd.img_size[i] >= 720*576)
>>>                                     -                       ++(*hd);
>>>                                     -               else
>>>                                     -                       ++(*sd);
>>>                                     +               fps_rate =
>>>                                     radeon_uvd_estimate_fps(rdev, i);
>>>                                     +
>>>                                     +               if
>>>                                     (rdev->uvd.img_size[i] >= 720*576) {
>>>                                     +                       (*hd) +=
>>>                                     fps_rate > 30 ? 1 : 2;
>>>                                     +               } else {
>>>                                     +                       (*sd) +=
>>>                                     fps_rate > 30 ? 1 : 2;
>>>                                     +               }
>>>                                                 }
>>>                                          }
>>>
>>>                                     @@ -805,6 +856,7 @@ void
>>>                                     radeon_uvd_note_usage(struct
>>>                                     radeon_device
>>>                                     *rdev)
>>>                                                 set_clocks &=
>>>                                     schedule_delayed_work(&rdev->uvd.idle_work,
>>>
>>>                                     msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>
>>>                                     +
>>>                                                 if
>>>                                     ((rdev->pm.pm_method ==
>>>                                     PM_METHOD_DPM) &&
>>>                                     rdev->pm.dpm_enabled) {
>>>                                                         unsigned hd =
>>>                                     0, sd = 0;
>>>
>>>                                     radeon_uvd_count_handles(rdev,
>>>                                     &sd, &hd);
>>>                                     --
>>>                                     1.9.1
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Marco Antonio Benatto
>>> Linux user ID:#506236
>>
>
>


More information about the dri-devel mailing list