[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2
Marco Benatto
marco.antonio.780 at gmail.com
Fri Aug 15 07:51:50 PDT 2014
Hi Christian,
got it. I'll start to add a sysfs entry here called uvd_power_level where
we'll be able to change UVD performance profile, ok?
I'll need your help to define the power profiles and some one to test it on
r600-SI as I don't have anyone around here.
Thank you.
On Fri, Aug 15, 2014 at 11:11 AM, Christian König <christian.koenig at amd.com>
wrote:
> Hi Marco,
>
> the problem with an CS ioctl flag is that we sometimes don't know how much
> SCLK/MCLK boost is needed, for example when we do post processing in the
> player using OpenGL and UVD decoding with VDPAU. In this case VDPAU don't
> has the slightest idea how high SCLK/MCLK must be and so can't give that
> info to the kernel either.
>
> Regards,
> Christian.
>
> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>
> Hey all,
>
> I also had a talk with Alex yesterday about post-processing issues when
> using dynamic UVD profiles and a chamge on CS ioctl
> including a flag to let user mode driver tell to the kernel which
> performance requirement it wants for post processing. A commom
> point for both discussion is to stablish the default values for these
> profiles, but probably this ioctl change would be more impacting/complex
> to implement than a sysfs entry.
>
> If a sysfs entry is anough for now I can handle the code to create it
> and, with your help, the code to setup the UVD profile requested through it.
>
> Is there any suggestion?
>
> Thanks all for your help,
>
>
> On Fri, Aug 15, 2014 at 5:48 AM, Christian König <christian.koenig at amd.com
> > wrote:
>
>> Hi guys,
>>
>> to make a long story short every time I watch a movie my laptop start to
>> heat up because we always select the standard UVD power profile without
>> actually measuring if that is necessary.
>>
>> Marco came up with a patch that seems to reliable measure the fps send
>> down to the kernel and so together with knowing the frame size of the video
>> should allow us to select the right UVD power profile.
>>
>> The problem is that Alex (unnoticed by me) completely disabled selecting
>> the UVD profiles because of some issues with advanced post processing
>> discussed on IRC. The problem seems to be that the lower UVD profiles have
>> a to low SCLK/MCLK to handle the 3D load that comes with scaling,
>> deinterlacing etc...
>>
>> I unfortunately don't have time for it, cause this only affects the
>> hardware generations R600-SI and not the newest one CIK. So could you guys
>> stick together and come up with a solution? Something like a sysfs entry
>> that let's us select the minimum UVD power level allowed?
>>
>> I think Marco is happy to come up with a patch, we just need to know
>> what's really needed and what should be the default values. I'm happy to
>> review everything that comes out of it, just don't have time to do it
>> myself.
>>
>> Happy discussion and thanks in advance,
>> Christian.
>>
>> Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>
>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>
>>>> Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>>
>>>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>
>>>>>> Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>>>>
>>>>>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>
>>>>>>>> Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>>>>>>
>>>>>>>> On Thu, Aug 7, 2014 at 7:33 AM, Christian König
>>>>>>>>> <deathsimple at vodafone.de>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> From: Marco A Benatto <marco.antonio.780 at gmail.com>
>>>>>>>>>>
>>>>>>>>>> Adding a Frames Per Second estimation logic on UVD handles
>>>>>>>>>> when it has being used. This estimation is per handle basis
>>>>>>>>>> and will help on DPM profile calculation.
>>>>>>>>>>
>>>>>>>>>> v2 (chk): fix timestamp type, move functions around and
>>>>>>>>>> cleanup code a bit.
>>>>>>>>>>
>>>>>>>>> Will this really help much? I thought the problem was mainly due
>>>>>>>>> to
>>>>>>>>> sclk and mclk for post processing.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It should at least handle the UVD side for upclocking when you get a
>>>>>>>> lot
>>>>>>>> of
>>>>>>>> streams / fps. And at on my NI the patch seems to do exactly that.
>>>>>>>>
>>>>>>>> Switching sclk and mclk for post processing is a different task,
>>>>>>>> and I
>>>>>>>> actually have no idea what to do with them.
>>>>>>>>
>>>>>>> At this point we always choose the plain UVD state anyway so this
>>>>>>> patch would only take effect if we re-enabled the dynamic UVD state
>>>>>>> selection.
>>>>>>>
>>>>>>
>>>>>> Hui? I thought we already re-enabled the dynamic UVD state selection,
>>>>>> but
>>>>>> double checking this I found it disabled again.
>>>>>>
>>>>>> What was the problem with that? Looks like I somehow missed the
>>>>>> discussion
>>>>>> around it.
>>>>>>
>>>>> We did, but after doing so a number of people complained about a
>>>>> regression on IRC because when apps like xmbc enabled post processing,
>>>>> performance went down.
>>>>>
>>>>
>>>> That's strange, from my experience the different UVD performance states
>>>> only
>>>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the DPM dumps to
>>>> confirms this.
>>>>
>>>> The sclks and mclks are usually different as well, especially on APUs.
>>> I can send you some examples.
>>>
>>> You not off hand remember who complained on IRC? Finding something in
>>>> the
>>>> IRC logs is like searching for a needle in a haystack.
>>>>
>>> I don't remember off hand. I think zgreg was involved in some of the
>>> discussions.
>>>
>>> Alex
>>>
>>> Thanks,
>>>> Christian.
>>>>
>>>>
>>>> Alex
>>>>>
>>>>>
>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> For the post processing, we probably need a hint we can
>>>>>>> pass to the driver in the CS ioctl to denote what state we need.
>>>>>>> Although if we did that, this could would largely be moot. That
>>>>>>> said,
>>>>>>> newer asics support dynamic UVD clocks so we really only need
>>>>>>> something like that for older asics and I guess VCE.
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> Signed-off-by: Marco A Benatto <marco.antonio.780 at gmail.com>
>>>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/radeon/radeon.h | 10 ++++++
>>>>>>>>>> drivers/gpu/drm/radeon/radeon_uvd.c | 64
>>>>>>>>>> +++++++++++++++++++++++++++++++++----
>>>>>>>>>> 2 files changed, 68 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> index 9e1732e..e92f6cb 100644
>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>>>>>>>> @@ -1617,6 +1617,15 @@ int radeon_pm_get_type_index(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> #define RADEON_UVD_STACK_SIZE (1024*1024)
>>>>>>>>>> #define RADEON_UVD_HEAP_SIZE (1024*1024)
>>>>>>>>>>
>>>>>>>>>> +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>>>>>>>>> +#define RADEON_UVD_DEFAULT_FPS 60
>>>>>>>>>> +
>>>>>>>>>> +struct radeon_uvd_fps {
>>>>>>>>>> + uint64_t timestamp;
>>>>>>>>>> + uint8_t event_index;
>>>>>>>>>> + uint8_t events[RADEON_UVD_FPS_EVENTS_MAX];
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> struct radeon_uvd {
>>>>>>>>>> struct radeon_bo *vcpu_bo;
>>>>>>>>>> void *cpu_addr;
>>>>>>>>>> @@ -1626,6 +1635,7 @@ struct radeon_uvd {
>>>>>>>>>> struct drm_file *filp[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>> unsigned
>>>>>>>>>> img_size[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>> struct delayed_work idle_work;
>>>>>>>>>> + struct radeon_uvd_fps fps_info[RADEON_MAX_UVD_HANDLES];
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> int radeon_uvd_init(struct radeon_device *rdev);
>>>>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> index 6bf55ec..ef5667a 100644
>>>>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_uvd.c
>>>>>>>>>> @@ -237,6 +237,51 @@ void radeon_uvd_force_into_uvd_segment(struct
>>>>>>>>>> radeon_bo *rbo)
>>>>>>>>>> rbo->placement.lpfn = (256 * 1024 * 1024) >>
>>>>>>>>>> PAGE_SHIFT;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> +static void radeon_uvd_fps_clear_events(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> + unsigned i;
>>>>>>>>>> +
>>>>>>>>>> + fps->timestamp = jiffies_64;
>>>>>>>>>> + fps->event_index = 0;
>>>>>>>>>> + for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++)
>>>>>>>>>> + fps->events[i] = 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static void radeon_uvd_fps_note_event(struct radeon_device *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> + uint64_t timestamp = jiffies_64;
>>>>>>>>>> + unsigned rate = 0;
>>>>>>>>>> +
>>>>>>>>>> + uint8_t index = fps->event_index++;
>>>>>>>>>> + fps->event_index %= RADEON_UVD_FPS_EVENTS_MAX;
>>>>>>>>>> +
>>>>>>>>>> + rate = div64_u64(HZ, max(timestamp - fps->timestamp,
>>>>>>>>>> 1ULL));
>>>>>>>>>> +
>>>>>>>>>> + fps->timestamp = timestamp;
>>>>>>>>>> + fps->events[index] = min(rate, 120u);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static unsigned radeon_uvd_estimate_fps(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> int
>>>>>>>>>> idx)
>>>>>>>>>> +{
>>>>>>>>>> + struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];
>>>>>>>>>> + unsigned i, valid = 0, count = 0;
>>>>>>>>>> +
>>>>>>>>>> + for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>>>>>>>>> + /* We should ignore zero values */
>>>>>>>>>> + if (fps->events[i] != 0) {
>>>>>>>>>> + count += fps->events[i];
>>>>>>>>>> + valid++;
>>>>>>>>>> + }
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + if (valid > 0)
>>>>>>>>>> + return count / valid;
>>>>>>>>>> + else
>>>>>>>>>> + return RADEON_UVD_DEFAULT_FPS;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> void radeon_uvd_free_handles(struct radeon_device *rdev,
>>>>>>>>>> struct
>>>>>>>>>> drm_file *filp)
>>>>>>>>>> {
>>>>>>>>>> int i, r;
>>>>>>>>>> @@ -419,8 +464,10 @@ static int radeon_uvd_cs_msg(struct
>>>>>>>>>> radeon_cs_parser
>>>>>>>>>> *p, struct radeon_bo *bo,
>>>>>>>>>>
>>>>>>>>>> /* create or decode, validate the handle */
>>>>>>>>>> for (i = 0; i < RADEON_MAX_UVD_HANDLES; ++i) {
>>>>>>>>>> - if (atomic_read(&p->rdev->uvd.handles[i]) ==
>>>>>>>>>> handle)
>>>>>>>>>> + if (atomic_read(&p->rdev->uvd.handles[i]) ==
>>>>>>>>>> handle)
>>>>>>>>>> {
>>>>>>>>>> + radeon_uvd_fps_note_event(p->rdev, i);
>>>>>>>>>> return 0;
>>>>>>>>>> + }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> /* handle not found try to alloc a new one */
>>>>>>>>>> @@ -428,6 +475,7 @@ static int radeon_uvd_cs_msg(struct
>>>>>>>>>> radeon_cs_parser
>>>>>>>>>> *p, struct radeon_bo *bo,
>>>>>>>>>> if (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>>>>>>>>> 0,
>>>>>>>>>> handle)) {
>>>>>>>>>> p->rdev->uvd.filp[i] = p->filp;
>>>>>>>>>> p->rdev->uvd.img_size[i] = img_size;
>>>>>>>>>> + radeon_uvd_fps_clear_events(p->rdev, i);
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> @@ -763,7 +811,7 @@ int radeon_uvd_get_destroy_msg(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev, int ring,
>>>>>>>>>> static void radeon_uvd_count_handles(struct radeon_device
>>>>>>>>>> *rdev,
>>>>>>>>>> unsigned *sd, unsigned
>>>>>>>>>> *hd)
>>>>>>>>>> {
>>>>>>>>>> - unsigned i;
>>>>>>>>>> + unsigned i, fps_rate = 0;
>>>>>>>>>>
>>>>>>>>>> *sd = 0;
>>>>>>>>>> *hd = 0;
>>>>>>>>>> @@ -772,10 +820,13 @@ static void radeon_uvd_count_handles(struct
>>>>>>>>>> radeon_device *rdev,
>>>>>>>>>> if (!atomic_read(&rdev->uvd.handles[i]))
>>>>>>>>>> continue;
>>>>>>>>>>
>>>>>>>>>> - if (rdev->uvd.img_size[i] >= 720*576)
>>>>>>>>>> - ++(*hd);
>>>>>>>>>> - else
>>>>>>>>>> - ++(*sd);
>>>>>>>>>> + fps_rate = radeon_uvd_estimate_fps(rdev, i);
>>>>>>>>>> +
>>>>>>>>>> + if (rdev->uvd.img_size[i] >= 720*576) {
>>>>>>>>>> + (*hd) += fps_rate > 30 ? 1 : 2;
>>>>>>>>>> + } else {
>>>>>>>>>> + (*sd) += fps_rate > 30 ? 1 : 2;
>>>>>>>>>> + }
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> @@ -805,6 +856,7 @@ void radeon_uvd_note_usage(struct
>>>>>>>>>> radeon_device
>>>>>>>>>> *rdev)
>>>>>>>>>> set_clocks &=
>>>>>>>>>> schedule_delayed_work(&rdev->uvd.idle_work,
>>>>>>>>>>
>>>>>>>>>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> if ((rdev->pm.pm_method == PM_METHOD_DPM) &&
>>>>>>>>>> rdev->pm.dpm_enabled) {
>>>>>>>>>> unsigned hd = 0, sd = 0;
>>>>>>>>>> radeon_uvd_count_handles(rdev, &sd, &hd);
>>>>>>>>>> --
>>>>>>>>>> 1.9.1
>>>>>>>>>>
>>>>>>>>>>
>>
>
>
> --
> Marco Antonio Benatto
> Linux user ID: #506236
>
>
>
--
Marco Antonio Benatto
Linux user ID: #506236
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140815/d83086c0/attachment-0001.html>
More information about the dri-devel
mailing list