[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2
Alex Deucher
alexdeucher at gmail.com
Fri Aug 15 08:26:03 PDT 2014
On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> On 15.08.2014 16:11, Christian König wrote:
>> Hi Marco,
>>
>> the problem with an CS ioctl flag is that we sometimes don't know how
>> much SCLK/MCLK boost is needed, for example when we do post processing
>> in the player using OpenGL and UVD decoding with VDPAU. In this case
>> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so
>> can't give that info to the kernel either.
>>
>
> Maybe it's an acceptable workaround to simply disable dynamic UVD state
> selection in case the UVD states only have a single power level. That
> will avoid the performance issues on affected systems, while still
> allowing dynamic UVD states on systems that have a saner DPM table
> setup. I think it is mosly older systems that suffer from this.
>
That is exactly what we do now.
Alex
> Best regards
> Grigori
>
>> Regards,
>> Christian.
>>
>> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>>> Hey all,
>>>
>>> I also had a talk with Alex yesterday about post-processing issues
>>> when using dynamic UVD profiles and a chamge on CS ioctl
>>> including a flag to let user mode driver tell to the kernel which
>>> performance requirement it wants for post processing. A commom
>>> point for both discussion is to stablish the default values for these
>>> profiles, but probably this ioctl change would be more impacting/complex
>>> to implement than a sysfs entry.
>>>
>>> If a sysfs entry is anough for now I can handle the code to create it
>>> and, with your help, the code to setup the UVD profile requested
>>> through it.
>>>
>>> Is there any suggestion?
>>>
>>> Thanks all for your help,
>>>
>>>
>>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König
>>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>>
>>> Hi guys,
>>>
>>> to make a long story short every time I watch a movie my laptop
>>> start to heat up because we always select the standard UVD power
>>> profile without actually measuring if that is necessary.
>>>
>>> Marco came up with a patch that seems to reliable measure the fps
>>> send down to the kernel and so together with knowing the frame
>>> size of the video should allow us to select the right UVD power
>>> profile.
>>>
>>> The problem is that Alex (unnoticed by me) completely disabled
>>> selecting the UVD profiles because of some issues with advanced
>>> post processing discussed on IRC. The problem seems to be that the
>>> lower UVD profiles have a to low SCLK/MCLK to handle the 3D load
>>> that comes with scaling, deinterlacing etc...
>>>
>>> I unfortunately don't have time for it, cause this only affects
>>> the hardware generations R600-SI and not the newest one CIK. So
>>> could you guys stick together and come up with a solution?
>>> Something like a sysfs entry that let's us select the minimum UVD
>>> power level allowed?
>>>
>>> I think Marco is happy to come up with a patch, we just need to
>>> know what's really needed and what should be the default values.
>>> I'm happy to review everything that comes out of it, just don't
>>> have time to do it myself.
>>>
>>> Happy discussion and thanks in advance,
>>> Christian.
>>>
>>> Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>>
>>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>>> <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>> Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>>
>>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>>> <deathsimple at vodafone.de
>>> <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>> Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>>
>>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>>> <deathsimple at vodafone.de
>>> <mailto:deathsimple at vodafone.de>> wrote:
>>>
>>> Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>>
>>> On Thu, Aug 7, 2014 at 7:33 AM,
>>> Christian König
>>> <deathsimple at vodafone.de
>>> <mailto:deathsimple at vodafone.de>>
>>> wrote:
>>>
>>> From: Marco A Benatto
>>> <marco.antonio.780 at gmail.com
>>> <mailto:marco.antonio.780 at gmail.com>>
>>>
>>> Adding a Frames Per Second
>>> estimation logic on UVD handles
>>> when it has being used. This
>>> estimation is per handle basis
>>> and will help on DPM profile
>>> calculation.
>>>
>>> v2 (chk): fix timestamp type, move
>>> functions around and
>>> cleanup code a bit.
>>>
>>> Will this really help much? I thought
>>> the problem was mainly due to
>>> sclk and mclk for post processing.
>>>
>>>
>>> It should at least handle the UVD side for
>>> upclocking when you get a
>>> lot
>>> of
>>> streams / fps. And at on my NI the patch
>>> seems to do exactly that.
>>>
>>> Switching sclk and mclk for post
>>> processing is a different task, and I
>>> actually have no idea what to do with them.
>>>
>>> At this point we always choose the plain UVD
>>> state anyway so this
>>> patch would only take effect if we re-enabled
>>> the dynamic UVD state
>>> selection.
>>>
>>>
>>> Hui? I thought we already re-enabled the dynamic
>>> UVD state selection, but
>>> double checking this I found it disabled again.
>>>
>>> What was the problem with that? Looks like I
>>> somehow missed the
>>> discussion
>>> around it.
>>>
>>> We did, but after doing so a number of people
>>> complained about a
>>> regression on IRC because when apps like xmbc enabled
>>> post processing,
>>> performance went down.
>>>
>>>
>>> That's strange, from my experience the different UVD
>>> performance states only
>>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the
>>> DPM dumps to
>>> confirms this.
>>>
>>> The sclks and mclks are usually different as well, especially
>>> on APUs.
>>> I can send you some examples.
>>>
>>> You not off hand remember who complained on IRC? Finding
>>> something in the
>>> IRC logs is like searching for a needle in a haystack.
>>>
>>> I don't remember off hand. I think zgreg was involved in some
>>> of the
>>> discussions.
>>>
>>> Alex
>>>
>>> Thanks,
>>> Christian.
>>>
>>>
>>> Alex
>>>
>>>
>>> Christian.
>>>
>>>
>>> For the post processing, we probably need a
>>> hint we can
>>> pass to the driver in the CS ioctl to denote
>>> what state we need.
>>> Although if we did that, this could would
>>> largely be moot. That said,
>>> newer asics support dynamic UVD clocks so we
>>> really only need
>>> something like that for older asics and I
>>> guess VCE.
>>>
>>> Alex
>>>
>>> Christian.
>>>
>>>
>>> Alex
>>>
>>> Signed-off-by: Marco A Benatto
>>> <marco.antonio.780 at gmail.com
>>> <mailto:marco.antonio.780 at gmail.com>>
>>> Signed-off-by: Christian König
>>> <christian.koenig at amd.com
>>> <mailto:christian.koenig at amd.com>>
>>> ---
>>>
>>> drivers/gpu/drm/radeon/radeon.h
>>> | 10 ++++++
>>>
>>> drivers/gpu/drm/radeon/radeon_uvd.c
>>> | 64
>>> +++++++++++++++++++++++++++++++++----
>>> 2 files changed, 68
>>> insertions(+), 6 deletions(-)
>>>
>>> diff --git
>>> a/drivers/gpu/drm/radeon/radeon.h
>>> b/drivers/gpu/drm/radeon/radeon.h
>>> index 9e1732e..e92f6cb 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -1617,6 +1617,15 @@ int
>>> radeon_pm_get_type_index(struct
>>> radeon_device
>>> *rdev,
>>> #define
>>> RADEON_UVD_STACK_SIZE (1024*1024)
>>> #define RADEON_UVD_HEAP_SIZE
>>> (1024*1024)
>>>
>>> +#define RADEON_UVD_FPS_EVENTS_MAX 8
>>> +#define RADEON_UVD_DEFAULT_FPS 60
>>> +
>>> +struct radeon_uvd_fps {
>>> + uint64_t timestamp;
>>> + uint8_t event_index;
>>> + uint8_t
>>> events[RADEON_UVD_FPS_EVENTS_MAX];
>>> +};
>>> +
>>> struct radeon_uvd {
>>> struct radeon_bo
>>> *vcpu_bo;
>>> void
>>> *cpu_addr;
>>> @@ -1626,6 +1635,7 @@ struct
>>> radeon_uvd {
>>> struct drm_file
>>> *filp[RADEON_MAX_UVD_HANDLES];
>>> unsigned
>>> img_size[RADEON_MAX_UVD_HANDLES];
>>> struct delayed_work
>>> idle_work;
>>> + struct radeon_uvd_fps
>>> fps_info[RADEON_MAX_UVD_HANDLES];
>>> };
>>>
>>> int radeon_uvd_init(struct
>>> radeon_device *rdev);
>>> diff --git
>>> a/drivers/gpu/drm/radeon/radeon_uvd.c
>>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>>> index 6bf55ec..ef5667a 100644
>>> ---
>>> a/drivers/gpu/drm/radeon/radeon_uvd.c
>>> +++
>>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>>> @@ -237,6 +237,51 @@ void
>>> radeon_uvd_force_into_uvd_segment(struct
>>> radeon_bo *rbo)
>>> rbo->placement.lpfn =
>>> (256 * 1024 * 1024) >> PAGE_SHIFT;
>>> }
>>>
>>> +static void
>>> radeon_uvd_fps_clear_events(struct
>>> radeon_device *rdev,
>>> int
>>> idx)
>>> +{
>>> + struct radeon_uvd_fps *fps
>>> = &rdev->uvd.fps_info[idx];
>>> + unsigned i;
>>> +
>>> + fps->timestamp = jiffies_64;
>>> + fps->event_index = 0;
>>> + for (i = 0; i <
>>> RADEON_UVD_FPS_EVENTS_MAX; i++)
>>> + fps->events[i] = 0;
>>> +}
>>> +
>>> +static void
>>> radeon_uvd_fps_note_event(struct
>>> radeon_device *rdev,
>>> int
>>> idx)
>>> +{
>>> + struct radeon_uvd_fps *fps
>>> = &rdev->uvd.fps_info[idx];
>>> + uint64_t timestamp =
>>> jiffies_64;
>>> + unsigned rate = 0;
>>> +
>>> + uint8_t index =
>>> fps->event_index++;
>>> + fps->event_index %=
>>> RADEON_UVD_FPS_EVENTS_MAX;
>>> +
>>> + rate = div64_u64(HZ,
>>> max(timestamp - fps->timestamp,
>>> 1ULL));
>>> +
>>> + fps->timestamp = timestamp;
>>> + fps->events[index] =
>>> min(rate, 120u);
>>> +}
>>> +
>>> +static unsigned
>>> radeon_uvd_estimate_fps(struct
>>> radeon_device *rdev,
>>> int
>>> idx)
>>> +{
>>> + struct radeon_uvd_fps *fps
>>> = &rdev->uvd.fps_info[idx];
>>> + unsigned i, valid = 0,
>>> count = 0;
>>> +
>>> + for (i = 0; i <
>>> RADEON_UVD_FPS_EVENTS_MAX; i++) {
>>> + /* We should
>>> ignore zero values */
>>> + if (fps->events[i]
>>> != 0) {
>>> + count +=
>>> fps->events[i];
>>> + valid++;
>>> + }
>>> + }
>>> +
>>> + if (valid > 0)
>>> + return count / valid;
>>> + else
>>> + return
>>> RADEON_UVD_DEFAULT_FPS;
>>> +}
>>> +
>>> void
>>> radeon_uvd_free_handles(struct
>>> radeon_device *rdev, struct
>>> drm_file *filp)
>>> {
>>> int i, r;
>>> @@ -419,8 +464,10 @@ static int
>>> radeon_uvd_cs_msg(struct
>>> radeon_cs_parser
>>> *p, struct radeon_bo *bo,
>>>
>>> /* create or decode,
>>> validate the handle */
>>> for (i = 0; i <
>>> RADEON_MAX_UVD_HANDLES; ++i) {
>>> - if
>>> (atomic_read(&p->rdev->uvd.handles[i])
>>> == handle)
>>> + if
>>> (atomic_read(&p->rdev->uvd.handles[i])
>>> == handle)
>>> {
>>> +
>>> radeon_uvd_fps_note_event(p->rdev, i);
>>> return 0;
>>> + }
>>> }
>>>
>>> /* handle not found
>>> try to alloc a new one */
>>> @@ -428,6 +475,7 @@ static int
>>> radeon_uvd_cs_msg(struct
>>> radeon_cs_parser
>>> *p, struct radeon_bo *bo,
>>> if
>>> (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>>> 0,
>>> handle)) {
>>>
>>> p->rdev->uvd.filp[i] = p->filp;
>>>
>>> p->rdev->uvd.img_size[i] = img_size;
>>> +
>>> radeon_uvd_fps_clear_events(p->rdev,
>>> i);
>>> return 0;
>>> }
>>> }
>>> @@ -763,7 +811,7 @@ int
>>> radeon_uvd_get_destroy_msg(struct
>>> radeon_device
>>> *rdev, int ring,
>>> static void
>>> radeon_uvd_count_handles(struct
>>> radeon_device *rdev,
>>>
>>> unsigned *sd, unsigned *hd)
>>> {
>>> - unsigned i;
>>> + unsigned i, fps_rate = 0;
>>>
>>> *sd = 0;
>>> *hd = 0;
>>> @@ -772,10 +820,13 @@ static void
>>> radeon_uvd_count_handles(struct
>>> radeon_device *rdev,
>>> if
>>> (!atomic_read(&rdev->uvd.handles[i]))
>>> continue;
>>>
>>> - if
>>> (rdev->uvd.img_size[i] >= 720*576)
>>> - ++(*hd);
>>> - else
>>> - ++(*sd);
>>> + fps_rate =
>>> radeon_uvd_estimate_fps(rdev, i);
>>> +
>>> + if
>>> (rdev->uvd.img_size[i] >= 720*576) {
>>> + (*hd) +=
>>> fps_rate > 30 ? 1 : 2;
>>> + } else {
>>> + (*sd) +=
>>> fps_rate > 30 ? 1 : 2;
>>> + }
>>> }
>>> }
>>>
>>> @@ -805,6 +856,7 @@ void
>>> radeon_uvd_note_usage(struct
>>> radeon_device
>>> *rdev)
>>> set_clocks &=
>>> schedule_delayed_work(&rdev->uvd.idle_work,
>>>
>>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>>
>>> +
>>> if
>>> ((rdev->pm.pm_method ==
>>> PM_METHOD_DPM) &&
>>> rdev->pm.dpm_enabled) {
>>> unsigned hd =
>>> 0, sd = 0;
>>>
>>> radeon_uvd_count_handles(rdev,
>>> &sd, &hd);
>>> --
>>> 1.9.1
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Marco Antonio Benatto
>>> Linux user ID:#506236
>>
>
>
More information about the dri-devel
mailing list