[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2

Alex Deucher alexdeucher at gmail.com
Fri Aug 15 08:30:14 PDT 2014


On Fri, Aug 15, 2014 at 10:11 AM, Christian König
<christian.koenig at amd.com> wrote:
> Hi Marco,
>
> the problem with an CS ioctl flag is that we sometimes don't know how much
> SCLK/MCLK boost is needed, for example when we do post processing in the
> player using OpenGL and UVD decoding with VDPAU. In this case VDPAU don't
> has the slightest idea how high SCLK/MCLK must be and so can't give that
> info to the kernel either.

It could be as simple as just setting a flag when any kind of post
processing is enabled so that the kernel selects the highest UVD
performance state, otherwise, let the kernel decide based on streams.

Alex


>
> Regards,
> Christian.
>
> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>
> Hey all,
>
> I also had a talk with Alex yesterday about post-processing issues when
> using dynamic UVD profiles and a chamge on CS ioctl
> including a flag to let user mode driver tell to the kernel which
> performance requirement it wants for post processing. A commom
> point for both discussion is to stablish the default values for these
> profiles, but probably this ioctl change would be more impacting/complex
> to implement than a sysfs entry.
>
> If a sysfs entry is anough for now I can handle the code to create it and,
> with your help, the code to setup the UVD profile requested through it.
>
> Is there any suggestion?
>
> Thanks all for your help,
>
>
> On Fri, Aug 15, 2014 at 5:48 AM, Christian König <christian.koenig at amd.com>
> wrote:
>>
>> Hi guys,
>>
>> to make a long story short every time I watch a movie my laptop start to
>> heat up because we always select the standard UVD power profile without
>> actually measuring if that is necessary.
>>
>> Marco came up with a patch that seems to reliable measure the fps send
>> down to the kernel and so together with knowing the frame size of the video
>> should allow us to select the right UVD power profile.
>>
>> The problem is that Alex (unnoticed by me) completely disabled selecting
>> the UVD profiles because of some issues with advanced post processing
>> discussed on IRC. The problem seems to be that the lower UVD profiles have a
>> to low SCLK/MCLK to handle the 3D load that comes with scaling,
>> deinterlacing etc...
>>
>> I unfortunately don't have time for it, cause this only affects the
>> hardware generations R600-SI and not the newest one CIK. So could you guys
>> stick together and come up with a solution? Something like a sysfs entry
>> that let's us select the minimum UVD power level allowed?
>>
>> I think Marco is happy to come up with a patch, we just need to know
>> what's really needed and what should be the default values. I'm happy to
>> review everything that comes out of it, just don't have time to do it
>> myself.
>>
>> Happy discussion and thanks in advance,
>> Christian.
>>
>> Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>
>>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>>
>>>> Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>>
>>>>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>
>>>>>> Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>>>>
>>>>>>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>
>>>>>>>> Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>>>>>>
>>>>>>>>> On Thu, Aug 7, 2014 at 7:33 AM, Christian König
>>>>>>>>> <deathsimple at vodafone.de>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> From: Marco A Benatto <marco.antonio.780 at gmail.com>
>>>>>>>>>>
>>>>>>>>>> Adding a Frames Per Second estimation logic on UVD handles
>>>>>>>>>> when it has being used. This estimation is per handle basis
>>>>>>>>>> and will help on DPM profile calculation.
>>>>>>>>>>
>>>>>>>>>> v2 (chk): fix timestamp type, move functions around and
>>>>>>>>>>               cleanup code a bit.
>>>>>>>>>
>>>>>>>>> Will this really help much?  I thought the problem was mainly due
>>>>>>>>> to
>>>>>>>>> sclk and mclk for post processing.
>>>>>>>>
>>>>>>>>
>>>>>>>> It should at least handle the UVD side for upclocking when you get a
>>>>>>>> lot
>>>>>>>> of
>>>>>>>> streams / fps. And at on my NI the patch seems to do exactly that.
>>>>>>>>
>>>>>>>> Switching sclk and mclk for post processing is a different task, and
>>>>>>>> I
>>>>>>>> actually have no idea what to do with them.
>>>>>>>
>>>>>>> At this point we always choose the plain UVD state anyway so this
>>>>>>> patch would only take effect if we re-enabled the dynamic UVD state
>>>>>>> selection.
>>>>>>
>>>>>>
>>>>>> Hui? I thought we already re-enabled the dynamic UVD state selection,
>>>>>> but
>>>>>> double checking this I found it disabled again.
>>>>>>
>>>>>> What was the problem with that? Looks like I somehow missed the
>>>>>> discussion
>>>>>> around it.
>>>>>
>>>>> We did, but after doing so a number of people complained about a
>>>>> regression on IRC because when apps like xmbc enabled post processing,
>>>>> performance went down.
>>>>
>>>>
>>>> That's strange, from my experience the different UVD performance states
>>>> only
>>>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the DPM dumps to
>>>> confirms this.
>>>>
>>> The sclks and mclks are usually different as well, especially on APUs.
>>> I can send you some examples.
>>>
>>>> You not off hand remember who complained on IRC? Finding something in
>>>> the
>>>> IRC logs is like searching for a needle in a haystack.
>>>
>>> I don't remember off hand.  I think zgreg was involved in some of the
>>> discussions.
>>>
>>> Alex
>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>> For the post processing, we probably need a hint we can
>>>>>>> pass to the driver in the CS ioctl to denote what state we need.
>>>>>>> Although if we did that, this could would largely be moot.  That
>>>>>>> said,
>>>>>>> newer asics support dynamic UVD clocks so we really only need
>>>>>>> something like that for older asics and I guess VCE.
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Marco A Benatto <marco.antonio.780 at gmail.com>
>>>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/gpu/drm/radeon/radeon.h     | 10 ++++++
>>>>>>>>>>      drivers/gpu/drm/radeon/radeon_uvd.c | 64
>>>>>>>>>> +++++++++++++++++++++++++++++++++----
>>>>>>>>>>      2 files changed, 68 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> index 9e1732e..e92f6cb 100644
>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> @@ -1617,6 +1617,15 @@ int radeon_pm_get_type_index(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>>      #define RADEON_UVD_STACK_SIZE  (1024*1024)
>>>>>>>>>>      #define RADEON_UVD_HEAP_SIZE   (1024*1024)
>>>>>>>>>>
>>>>>>>>>> +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>>>>>>>>> +#define RADEON_UVD_DEFAULT_FPS 60
>>>>>>>>>> +
>>>>>>>>>> +struct radeon_uvd_fps {
>>>>>>>>>> +       uint64_t        timestamp;
>>>>>>>>>> +       uint8_t         event_index;
>>>>>>>>>> +       uint8_t         events[RADEON_UVD_FPS_EVENTS_MAX];
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>>      struct radeon_uvd {
>>>>>>>>>>             struct radeon_bo        *vcpu_bo;
>>>>>>>>>>             void                    *cpu_addr;
>>>>>>>>>> @@ -1626,6 +1635,7 @@ struct radeon_uvd {
>>>>>>>>>>             struct drm_file         *filp[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>>             unsigned
>>>>>>>>>> img_size[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>>             struct delayed_work     idle_work;
>>>>>>>>>> +       struct radeon_uvd_fps   fps_info[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>>      };
>>>>>>>>>>
>>>>>>>>>>      int radeon_uvd_init(struct radeon_device *rdev);
>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> index 6bf55ec..ef5667a 100644
>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> @@ -237,6 +237,51 @@ void radeon_uvd_force_into_uvd_segment(struct
>>>>>>>>>> radeon_bo *rbo)
>>>>>>>>>>             rbo->placement.lpfn = (256 * 1024 * 1024) >>
>>>>>>>>>> PAGE_SHIFT;
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> +static void radeon_uvd_fps_clear_events(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> +       unsigned i;
>>>>>>>>>> +
>>>>>>>>>> +       fps->timestamp = jiffies_64;
>>>>>>>>>> +       fps->event_index = 0;
>>>>>>>>>> +       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++)
>>>>>>>>>> +               fps->events[i] = 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static void radeon_uvd_fps_note_event(struct radeon_device *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> +       uint64_t timestamp = jiffies_64;
>>>>>>>>>> +       unsigned rate = 0;
>>>>>>>>>> +
>>>>>>>>>> +       uint8_t index = fps->event_index++;
>>>>>>>>>> +       fps->event_index %= RADEON_UVD_FPS_EVENTS_MAX;
>>>>>>>>>> +
>>>>>>>>>> +       rate = div64_u64(HZ, max(timestamp - fps->timestamp,
>>>>>>>>>> 1ULL));
>>>>>>>>>> +
>>>>>>>>>> +       fps->timestamp = timestamp;
>>>>>>>>>> +       fps->events[index] = min(rate, 120u);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static unsigned radeon_uvd_estimate_fps(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> +       unsigned i, valid = 0, count = 0;
>>>>>>>>>> +
>>>>>>>>>> +       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>>>>>>>>> +               /* We should ignore zero values */
>>>>>>>>>> +               if (fps->events[i] != 0) {
>>>>>>>>>> +                       count += fps->events[i];
>>>>>>>>>> +                       valid++;
>>>>>>>>>> +               }
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>> +       if (valid > 0)
>>>>>>>>>> +               return count / valid;
>>>>>>>>>> +       else
>>>>>>>>>> +               return RADEON_UVD_DEFAULT_FPS;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>      void radeon_uvd_free_handles(struct radeon_device *rdev,
>>>>>>>>>> struct
>>>>>>>>>> drm_file *filp)
>>>>>>>>>>      {
>>>>>>>>>>             int i, r;
>>>>>>>>>> @@ -419,8 +464,10 @@ static int radeon_uvd_cs_msg(struct
>>>>>>>>>> radeon_cs_parser
>>>>>>>>>> *p, struct radeon_bo *bo,
>>>>>>>>>>
>>>>>>>>>>             /* create or decode, validate the handle */
>>>>>>>>>>             for (i = 0; i < RADEON_MAX_UVD_HANDLES; ++i) {
>>>>>>>>>> -               if (atomic_read(&p->rdev->uvd.handles[i]) ==
>>>>>>>>>> handle)
>>>>>>>>>> +               if (atomic_read(&p->rdev->uvd.handles[i]) ==
>>>>>>>>>> handle)
>>>>>>>>>> {
>>>>>>>>>> +                       radeon_uvd_fps_note_event(p->rdev, i);
>>>>>>>>>>                             return 0;
>>>>>>>>>> +               }
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             /* handle not found try to alloc a new one */
>>>>>>>>>> @@ -428,6 +475,7 @@ static int radeon_uvd_cs_msg(struct
>>>>>>>>>> radeon_cs_parser
>>>>>>>>>> *p, struct radeon_bo *bo,
>>>>>>>>>>                     if (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>>>>>>>>> 0,
>>>>>>>>>> handle)) {
>>>>>>>>>>                             p->rdev->uvd.filp[i] = p->filp;
>>>>>>>>>>                             p->rdev->uvd.img_size[i] = img_size;
>>>>>>>>>> +                       radeon_uvd_fps_clear_events(p->rdev, i);
>>>>>>>>>>                             return 0;
>>>>>>>>>>                     }
>>>>>>>>>>             }
>>>>>>>>>> @@ -763,7 +811,7 @@ int radeon_uvd_get_destroy_msg(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev, int ring,
>>>>>>>>>>      static void radeon_uvd_count_handles(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>>                                          unsigned *sd, unsigned
>>>>>>>>>> *hd)
>>>>>>>>>>      {
>>>>>>>>>> -       unsigned i;
>>>>>>>>>> +       unsigned i, fps_rate = 0;
>>>>>>>>>>
>>>>>>>>>>             *sd = 0;
>>>>>>>>>>             *hd = 0;
>>>>>>>>>> @@ -772,10 +820,13 @@ static void radeon_uvd_count_handles(struct
>>>>>>>>>> radeon_device *rdev,
>>>>>>>>>>                     if (!atomic_read(&rdev->uvd.handles[i]))
>>>>>>>>>>                             continue;
>>>>>>>>>>
>>>>>>>>>> -               if (rdev->uvd.img_size[i] >= 720*576)
>>>>>>>>>> -                       ++(*hd);
>>>>>>>>>> -               else
>>>>>>>>>> -                       ++(*sd);
>>>>>>>>>> +               fps_rate = radeon_uvd_estimate_fps(rdev, i);
>>>>>>>>>> +
>>>>>>>>>> +               if (rdev->uvd.img_size[i] >= 720*576) {
>>>>>>>>>> +                       (*hd) += fps_rate > 30 ? 1 : 2;
>>>>>>>>>> +               } else {
>>>>>>>>>> +                       (*sd) += fps_rate > 30 ? 1 : 2;
>>>>>>>>>> +               }
>>>>>>>>>>             }
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> @@ -805,6 +856,7 @@ void radeon_uvd_note_usage(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev)
>>>>>>>>>>             set_clocks &=
>>>>>>>>>> schedule_delayed_work(&rdev->uvd.idle_work,
>>>>>>>>>>
>>>>>>>>>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>>             if ((rdev->pm.pm_method == PM_METHOD_DPM) &&
>>>>>>>>>> rdev->pm.dpm_enabled) {
>>>>>>>>>>                     unsigned hd = 0, sd = 0;
>>>>>>>>>>                     radeon_uvd_count_handles(rdev, &sd, &hd);
>>>>>>>>>> --
>>>>>>>>>> 1.9.1
>>>>>>>>>>
>>
>
>
>
> --
> Marco Antonio Benatto
> Linux user ID: #506236
>
>


More information about the dri-devel mailing list