[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2

Alex Deucher alexdeucher at gmail.com
Fri Aug 15 10:01:25 PDT 2014


On Fri, Aug 15, 2014 at 12:46 PM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> For reference, I forgot to send to you all.
>
> -------- Forwarded Message --------
> Subject: Re: [PATCH] drm/radeon: Adding UVD handle basis fps estimation v2
> Date: Fri, 15 Aug 2014 18:40:02 +0200
> From: Grigori Goronzy <greg at chown.ath.cx>
> To: Alex Deucher <alexdeucher at gmail.com>
>
> On 15.08.2014 17:45, Alex Deucher wrote:
>> Maybe I'm not understanding what you are proposing them.  UVD power
>> states fall into basically 3 categories:
>>
>> 1.  R6xx/r7xx/evergreen
>> - single UVD state
>>
>> 2.  older APUs/NI/SI
>> - multiple UVD states with different combinations of slck/mclk and uvd clocks
>>
>> 3.  CI+
>> - hw managed dynamic UVD clocks
>>
>> The second category is the problematic one and the only one where the
>> UVD state selection is a factor.  We really need better logic to
>> determine what sclks/mclk and uvd clks so we know which power state to
>> select.  The logic in the kernel is fine for determining the level of
>> uvd clks we need, but when post processing is enabled you also have to
>> take into account the levels of sclk/mclk needed for the 3D engine.
>>
>
> I mean the second category. On most systems, low (uvd_sd, uvd_hd) UVD
> power states appear to have multiple 3D power levels or a single 3D
> power level with clocks equal to the highest level in the "performance"
> power state. However on some systems, low UVD power states only have a
> single 3D power level that is noticeably slower than the maximum
> "performance" state power level. It's mostly these systems that are
> problematic, as far as I can see.
>
> My proposal is to detect this condition and disable usage of the low UVD
> power states (by default at least) on affected systems. Other systems
> can dynamically choose an UVD power state according to active streams.
> Of course this affects power consumption, but at least avoids the issue
> of crippled 3D performance in some scenarios. I don't see a good way to
> automatically select an appropriate UVD power state according to 3D
> load. Video post processing isn't the only possible scenario that needs
> a lot of 3D performance. There really isn't much that can be done.
>
> What do you think about this?

I don't think it will really help in practice.  Generally the cards
that have high gfx clks in the UVD states are desktop dGPUs so heat
and power are not really a concern.  Which basically leaves APUs and
mobile parts stuck in the highest UVD state and those are the ones
were you actually want the lower UVD states.  You don't really need
complex logic in the UVD user mode driver.  Just a flag to say post
processing is enabled, use the high performance UVD state, otherwise,
let the kernel pick based on streams.

Alex

>
> Grigori
>> Alex
>>
>>>
>>> Grigori
>>>
>>>> Alex
>>>>
>>>>
>>>>> Best regards
>>>>> Grigori
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>>>>>>> Hey all,
>>>>>>>
>>>>>>> I also had a talk with Alex yesterday about post-processing issues
>>>>>>> when using dynamic UVD profiles and a chamge on CS ioctl
>>>>>>> including a flag to let user mode driver tell to the kernel which
>>>>>>> performance requirement it wants for post processing. A commom
>>>>>>> point for both discussion is to stablish the default values for these
>>>>>>> profiles, but probably this ioctl change would be more impacting/complex
>>>>>>> to implement than a sysfs entry.
>>>>>>>
>>>>>>> If a sysfs entry is anough for now I can handle the code to create it
>>>>>>> and, with your help, the code to setup the UVD profile requested
>>>>>>> through it.
>>>>>>>
>>>>>>> Is there any suggestion?
>>>>>>>
>>>>>>> Thanks all for your help,
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König
>>>>>>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>>>>>>
>>>>>>>     Hi guys,
>>>>>>>
>>>>>>>     to make a long story short every time I watch a movie my laptop
>>>>>>>     start to heat up because we always select the standard UVD power
>>>>>>>     profile without actually measuring if that is necessary.
>>>>>>>
>>>>>>>     Marco came up with a patch that seems to reliable measure the fps
>>>>>>>     send down to the kernel and so together with knowing the frame
>>>>>>>     size of the video should allow us to select the right UVD power
>>>>>>>     profile.
>>>>>>>
>>>>>>>     The problem is that Alex (unnoticed by me) completely disabled
>>>>>>>     selecting the UVD profiles because of some issues with advanced
>>>>>>>     post processing discussed on IRC. The problem seems to be that the
>>>>>>>     lower UVD profiles have a to low SCLK/MCLK to handle the 3D load
>>>>>>>     that comes with scaling, deinterlacing etc...
>>>>>>>
>>>>>>>     I unfortunately don't have time for it, cause this only affects
>>>>>>>     the hardware generations R600-SI and not the newest one CIK. So
>>>>>>>     could you guys stick together and come up with a solution?
>>>>>>>     Something like a sysfs entry that let's us select the minimum UVD
>>>>>>>     power level allowed?
>>>>>>>
>>>>>>>     I think Marco is happy to come up with a patch, we just need to
>>>>>>>     know what's really needed and what should be the default values.
>>>>>>>     I'm happy to review everything that comes out of it, just don't
>>>>>>>     have time to do it myself.
>>>>>>>
>>>>>>>     Happy discussion and thanks in advance,
>>>>>>>     Christian.
>>>>>>>
>>>>>>>     Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>>>>>>
>>>>>>>         On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>>>>>>         <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> wrote:
>>>>>>>
>>>>>>>             Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>>>>>
>>>>>>>                 On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>>>>>>                 <deathsimple at vodafone.de
>>>>>>>                 <mailto:deathsimple at vodafone.de>> wrote:
>>>>>>>
>>>>>>>                     Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>>>>>
>>>>>>>                         On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>>>>>>                         <deathsimple at vodafone.de
>>>>>>>                         <mailto:deathsimple at vodafone.de>> wrote:
>>>>>>>
>>>>>>>                             Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>>>>>
>>>>>>>                                 On Thu, Aug 7, 2014 at 7:33 AM,
>>>>>>>                                 Christian König
>>>>>>>                                 <deathsimple at vodafone.de
>>>>>>>                                 <mailto:deathsimple at vodafone.de>>
>>>>>>>                                 wrote:
>>>>>>>
>>>>>>>                                     From: Marco A Benatto
>>>>>>>                                     <marco.antonio.780 at gmail.com
>>>>>>>                                     <mailto:marco.antonio.780 at gmail.com>>
>>>>>>>
>>>>>>>                                     Adding a Frames Per Second
>>>>>>>                                     estimation logic on UVD handles
>>>>>>>                                     when it has being used. This
>>>>>>>                                     estimation is per handle basis
>>>>>>>                                     and will help on DPM profile
>>>>>>>                                     calculation.
>>>>>>>
>>>>>>>                                     v2 (chk): fix timestamp type, move
>>>>>>>                                     functions around and
>>>>>>>                                                   cleanup code a bit.
>>>>>>>
>>>>>>>                                 Will this really help much?  I thought
>>>>>>>                                 the problem was mainly due to
>>>>>>>                                 sclk and mclk for post processing.
>>>>>>>
>>>>>>>
>>>>>>>                             It should at least handle the UVD side for
>>>>>>>                             upclocking when you get a
>>>>>>>                             lot
>>>>>>>                             of
>>>>>>>                             streams / fps. And at on my NI the patch
>>>>>>>                             seems to do exactly that.
>>>>>>>
>>>>>>>                             Switching sclk and mclk for post
>>>>>>>                             processing is a different task, and I
>>>>>>>                             actually have no idea what to do with them.
>>>>>>>
>>>>>>>                         At this point we always choose the plain UVD
>>>>>>>                         state anyway so this
>>>>>>>                         patch would only take effect if we re-enabled
>>>>>>>                         the dynamic UVD state
>>>>>>>                         selection.
>>>>>>>
>>>>>>>
>>>>>>>                     Hui? I thought we already re-enabled the dynamic
>>>>>>>                     UVD state selection, but
>>>>>>>                     double checking this I found it disabled again.
>>>>>>>
>>>>>>>                     What was the problem with that? Looks like I
>>>>>>>                     somehow missed the
>>>>>>>                     discussion
>>>>>>>                     around it.
>>>>>>>
>>>>>>>                 We did, but after doing so a number of people
>>>>>>>                 complained about a
>>>>>>>                 regression on IRC because when apps like xmbc enabled
>>>>>>>                 post processing,
>>>>>>>                 performance went down.
>>>>>>>
>>>>>>>
>>>>>>>             That's strange, from my experience the different UVD
>>>>>>>             performance states only
>>>>>>>             affect UVDs dclk/vclk, not sclk/mclk. I need to get the
>>>>>>>             DPM dumps to
>>>>>>>             confirms this.
>>>>>>>
>>>>>>>         The sclks and mclks are usually different as well, especially
>>>>>>>         on APUs.
>>>>>>>         I can send you some examples.
>>>>>>>
>>>>>>>             You not off hand remember who complained on IRC? Finding
>>>>>>>             something in the
>>>>>>>             IRC logs is like searching for a needle in a haystack.
>>>>>>>
>>>>>>>         I don't remember off hand.  I think zgreg was involved in some
>>>>>>>         of the
>>>>>>>         discussions.
>>>>>>>
>>>>>>>         Alex
>>>>>>>
>>>>>>>             Thanks,
>>>>>>>             Christian.
>>>>>>>
>>>>>>>
>>>>>>>                 Alex
>>>>>>>
>>>>>>>
>>>>>>>                     Christian.
>>>>>>>
>>>>>>>
>>>>>>>                         For the post processing, we probably need a
>>>>>>>                         hint we can
>>>>>>>                         pass to the driver in the CS ioctl to denote
>>>>>>>                         what state we need.
>>>>>>>                         Although if we did that, this could would
>>>>>>>                         largely be moot.  That said,
>>>>>>>                         newer asics support dynamic UVD clocks so we
>>>>>>>                         really only need
>>>>>>>                         something like that for older asics and I
>>>>>>>                         guess VCE.
>>>>>>>
>>>>>>>                         Alex
>>>>>>>
>>>>>>>                             Christian.
>>>>>>>
>>>>>>>
>>>>>>>                                 Alex
>>>>>>>
>>>>>>>                                     Signed-off-by: Marco A Benatto
>>>>>>>                                     <marco.antonio.780 at gmail.com
>>>>>>>                                     <mailto:marco.antonio.780 at gmail.com>>
>>>>>>>                                     Signed-off-by: Christian König
>>>>>>>                                     <christian.koenig at amd.com
>>>>>>>                                     <mailto:christian.koenig at amd.com>>
>>>>>>>                                     ---
>>>>>>>
>>>>>>>                                      drivers/gpu/drm/radeon/radeon.h
>>>>>>>                                        | 10 ++++++
>>>>>>>
>>>>>>>                                      drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>                                     | 64
>>>>>>>                                     +++++++++++++++++++++++++++++++++----
>>>>>>>                                          2 files changed, 68
>>>>>>>                                     insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>>                                     diff --git
>>>>>>>                                     a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>                                     b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>                                     index 9e1732e..e92f6cb 100644
>>>>>>>                                     --- a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>                                     +++ b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>                                     @@ -1617,6 +1617,15 @@ int
>>>>>>>                                     radeon_pm_get_type_index(struct
>>>>>>>                                     radeon_device
>>>>>>>                                     *rdev,
>>>>>>>                                          #define
>>>>>>>                                     RADEON_UVD_STACK_SIZE  (1024*1024)
>>>>>>>                                          #define RADEON_UVD_HEAP_SIZE
>>>>>>>                                      (1024*1024)
>>>>>>>
>>>>>>>                                     +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>>>>>>                                     +#define RADEON_UVD_DEFAULT_FPS 60
>>>>>>>                                     +
>>>>>>>                                     +struct radeon_uvd_fps {
>>>>>>>                                     +       uint64_t        timestamp;
>>>>>>>                                     +       uint8_t         event_index;
>>>>>>>                                     +       uint8_t
>>>>>>>                                      events[RADEON_UVD_FPS_EVENTS_MAX];
>>>>>>>                                     +};
>>>>>>>                                     +
>>>>>>>                                          struct radeon_uvd {
>>>>>>>                                                 struct radeon_bo
>>>>>>>                                       *vcpu_bo;
>>>>>>>                                                 void
>>>>>>>                                       *cpu_addr;
>>>>>>>                                     @@ -1626,6 +1635,7 @@ struct
>>>>>>>                                     radeon_uvd {
>>>>>>>                                                 struct drm_file
>>>>>>>                                      *filp[RADEON_MAX_UVD_HANDLES];
>>>>>>>                                                 unsigned
>>>>>>>                                       img_size[RADEON_MAX_UVD_HANDLES];
>>>>>>>                                                 struct delayed_work
>>>>>>>                                      idle_work;
>>>>>>>                                     +       struct radeon_uvd_fps
>>>>>>>                                      fps_info[RADEON_MAX_UVD_HANDLES];
>>>>>>>                                          };
>>>>>>>
>>>>>>>                                          int radeon_uvd_init(struct
>>>>>>>                                     radeon_device *rdev);
>>>>>>>                                     diff --git
>>>>>>>                                     a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>                                     b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>                                     index 6bf55ec..ef5667a 100644
>>>>>>>                                     ---
>>>>>>>                                     a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>                                     +++
>>>>>>>                                     b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>                                     @@ -237,6 +237,51 @@ void
>>>>>>>                                     radeon_uvd_force_into_uvd_segment(struct
>>>>>>>                                     radeon_bo *rbo)
>>>>>>>                                                 rbo->placement.lpfn =
>>>>>>>                                     (256 * 1024 * 1024) >> PAGE_SHIFT;
>>>>>>>                                          }
>>>>>>>
>>>>>>>                                     +static void
>>>>>>>                                     radeon_uvd_fps_clear_events(struct
>>>>>>>                                     radeon_device *rdev,
>>>>>>>                                     int
>>>>>>>                                     idx)
>>>>>>>                                     +{
>>>>>>>                                     +       struct radeon_uvd_fps *fps
>>>>>>>                                     = &rdev->uvd.fps_info[idx];
>>>>>>>                                     +       unsigned i;
>>>>>>>                                     +
>>>>>>>                                     +       fps->timestamp = jiffies_64;
>>>>>>>                                     +       fps->event_index = 0;
>>>>>>>                                     +       for (i = 0; i <
>>>>>>>                                     RADEON_UVD_FPS_EVENTS_MAX; i++)
>>>>>>>                                     +               fps->events[i] = 0;
>>>>>>>                                     +}
>>>>>>>                                     +
>>>>>>>                                     +static void
>>>>>>>                                     radeon_uvd_fps_note_event(struct
>>>>>>>                                     radeon_device *rdev,
>>>>>>>                                     int
>>>>>>>                                     idx)
>>>>>>>                                     +{
>>>>>>>                                     +       struct radeon_uvd_fps *fps
>>>>>>>                                     = &rdev->uvd.fps_info[idx];
>>>>>>>                                     +       uint64_t timestamp =
>>>>>>>                                     jiffies_64;
>>>>>>>                                     +       unsigned rate = 0;
>>>>>>>                                     +
>>>>>>>                                     +       uint8_t index =
>>>>>>>                                     fps->event_index++;
>>>>>>>                                     +       fps->event_index %=
>>>>>>>                                     RADEON_UVD_FPS_EVENTS_MAX;
>>>>>>>                                     +
>>>>>>>                                     +       rate = div64_u64(HZ,
>>>>>>>                                     max(timestamp - fps->timestamp,
>>>>>>>                                     1ULL));
>>>>>>>                                     +
>>>>>>>                                     +       fps->timestamp = timestamp;
>>>>>>>                                     +       fps->events[index] =
>>>>>>>                                     min(rate, 120u);
>>>>>>>                                     +}
>>>>>>>                                     +
>>>>>>>                                     +static unsigned
>>>>>>>                                     radeon_uvd_estimate_fps(struct
>>>>>>>                                     radeon_device *rdev,
>>>>>>>                                     int
>>>>>>>                                     idx)
>>>>>>>                                     +{
>>>>>>>                                     +       struct radeon_uvd_fps *fps
>>>>>>>                                     = &rdev->uvd.fps_info[idx];
>>>>>>>                                     +       unsigned i, valid = 0,
>>>>>>>                                     count = 0;
>>>>>>>                                     +
>>>>>>>                                     +       for (i = 0; i <
>>>>>>>                                     RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>>>>>>                                     +               /* We should
>>>>>>>                                     ignore zero values */
>>>>>>>                                     +               if (fps->events[i]
>>>>>>>                                     != 0) {
>>>>>>>                                     +                       count +=
>>>>>>>                                     fps->events[i];
>>>>>>>                                     +                       valid++;
>>>>>>>                                     +               }
>>>>>>>                                     +       }
>>>>>>>                                     +
>>>>>>>                                     +       if (valid > 0)
>>>>>>>                                     +               return count / valid;
>>>>>>>                                     +       else
>>>>>>>                                     +               return
>>>>>>>                                     RADEON_UVD_DEFAULT_FPS;
>>>>>>>                                     +}
>>>>>>>                                     +
>>>>>>>                                          void
>>>>>>>                                     radeon_uvd_free_handles(struct
>>>>>>>                                     radeon_device *rdev, struct
>>>>>>>                                     drm_file *filp)
>>>>>>>                                          {
>>>>>>>                                                 int i, r;
>>>>>>>                                     @@ -419,8 +464,10 @@ static int
>>>>>>>                                     radeon_uvd_cs_msg(struct
>>>>>>>                                     radeon_cs_parser
>>>>>>>                                     *p, struct radeon_bo *bo,
>>>>>>>
>>>>>>>                                                 /* create or decode,
>>>>>>>                                     validate the handle */
>>>>>>>                                                 for (i = 0; i <
>>>>>>>                                     RADEON_MAX_UVD_HANDLES; ++i) {
>>>>>>>                                     -               if
>>>>>>>                                     (atomic_read(&p->rdev->uvd.handles[i])
>>>>>>>                                     == handle)
>>>>>>>                                     +               if
>>>>>>>                                     (atomic_read(&p->rdev->uvd.handles[i])
>>>>>>>                                     == handle)
>>>>>>>                                     {
>>>>>>>                                     +
>>>>>>>                                      radeon_uvd_fps_note_event(p->rdev, i);
>>>>>>>                                                                 return 0;
>>>>>>>                                     +               }
>>>>>>>                                                 }
>>>>>>>
>>>>>>>                                                 /* handle not found
>>>>>>>                                     try to alloc a new one */
>>>>>>>                                     @@ -428,6 +475,7 @@ static int
>>>>>>>                                     radeon_uvd_cs_msg(struct
>>>>>>>                                     radeon_cs_parser
>>>>>>>                                     *p, struct radeon_bo *bo,
>>>>>>>                                                         if
>>>>>>>                                     (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>>>>>>                                     0,
>>>>>>>                                     handle)) {
>>>>>>>
>>>>>>>                                     p->rdev->uvd.filp[i] = p->filp;
>>>>>>>
>>>>>>>                                     p->rdev->uvd.img_size[i] = img_size;
>>>>>>>                                     +
>>>>>>>                                      radeon_uvd_fps_clear_events(p->rdev,
>>>>>>>                                     i);
>>>>>>>                                                                 return 0;
>>>>>>>                                                         }
>>>>>>>                                                 }
>>>>>>>                                     @@ -763,7 +811,7 @@ int
>>>>>>>                                     radeon_uvd_get_destroy_msg(struct
>>>>>>>                                     radeon_device
>>>>>>>                                     *rdev, int ring,
>>>>>>>                                          static void
>>>>>>>                                     radeon_uvd_count_handles(struct
>>>>>>>                                     radeon_device *rdev,
>>>>>>>
>>>>>>>                                            unsigned *sd, unsigned *hd)
>>>>>>>                                          {
>>>>>>>                                     -       unsigned i;
>>>>>>>                                     +       unsigned i, fps_rate = 0;
>>>>>>>
>>>>>>>                                                 *sd = 0;
>>>>>>>                                                 *hd = 0;
>>>>>>>                                     @@ -772,10 +820,13 @@ static void
>>>>>>>                                     radeon_uvd_count_handles(struct
>>>>>>>                                     radeon_device *rdev,
>>>>>>>                                                         if
>>>>>>>                                     (!atomic_read(&rdev->uvd.handles[i]))
>>>>>>>                                                                 continue;
>>>>>>>
>>>>>>>                                     -               if
>>>>>>>                                     (rdev->uvd.img_size[i] >= 720*576)
>>>>>>>                                     -                       ++(*hd);
>>>>>>>                                     -               else
>>>>>>>                                     -                       ++(*sd);
>>>>>>>                                     +               fps_rate =
>>>>>>>                                     radeon_uvd_estimate_fps(rdev, i);
>>>>>>>                                     +
>>>>>>>                                     +               if
>>>>>>>                                     (rdev->uvd.img_size[i] >= 720*576) {
>>>>>>>                                     +                       (*hd) +=
>>>>>>>                                     fps_rate > 30 ? 1 : 2;
>>>>>>>                                     +               } else {
>>>>>>>                                     +                       (*sd) +=
>>>>>>>                                     fps_rate > 30 ? 1 : 2;
>>>>>>>                                     +               }
>>>>>>>                                                 }
>>>>>>>                                          }
>>>>>>>
>>>>>>>                                     @@ -805,6 +856,7 @@ void
>>>>>>>                                     radeon_uvd_note_usage(struct
>>>>>>>                                     radeon_device
>>>>>>>                                     *rdev)
>>>>>>>                                                 set_clocks &=
>>>>>>>                                     schedule_delayed_work(&rdev->uvd.idle_work,
>>>>>>>
>>>>>>>                                     msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>>>>>
>>>>>>>                                     +
>>>>>>>                                                 if
>>>>>>>                                     ((rdev->pm.pm_method ==
>>>>>>>                                     PM_METHOD_DPM) &&
>>>>>>>                                     rdev->pm.dpm_enabled) {
>>>>>>>                                                         unsigned hd =
>>>>>>>                                     0, sd = 0;
>>>>>>>
>>>>>>>                                     radeon_uvd_count_handles(rdev,
>>>>>>>                                     &sd, &hd);
>>>>>>>                                     --
>>>>>>>                                     1.9.1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Marco Antonio Benatto
>>>>>>> Linux user ID:#506236
>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
>
>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>


More information about the dri-devel mailing list