[PATCH] drm/radeon: Adding UVD handle basis fps estimation v2
Grigori Goronzy
greg at chown.ath.cx
Fri Aug 15 08:20:16 PDT 2014
On 15.08.2014 16:11, Christian König wrote:
> Hi Marco,
>
> the problem with an CS ioctl flag is that we sometimes don't know how
> much SCLK/MCLK boost is needed, for example when we do post processing
> in the player using OpenGL and UVD decoding with VDPAU. In this case
> VDPAU don't has the slightest idea how high SCLK/MCLK must be and so
> can't give that info to the kernel either.
>
Maybe it's an acceptable workaround to simply disable dynamic UVD state
selection in case the UVD states only have a single power level. That
will avoid the performance issues on affected systems, while still
allowing dynamic UVD states on systems that have a saner DPM table
setup. I think it is mosly older systems that suffer from this.
Best regards
Grigori
> Regards,
> Christian.
>
> Am 15.08.2014 um 15:21 schrieb Marco Benatto:
>> Hey all,
>>
>> I also had a talk with Alex yesterday about post-processing issues
>> when using dynamic UVD profiles and a chamge on CS ioctl
>> including a flag to let user mode driver tell to the kernel which
>> performance requirement it wants for post processing. A commom
>> point for both discussion is to stablish the default values for these
>> profiles, but probably this ioctl change would be more impacting/complex
>> to implement than a sysfs entry.
>>
>> If a sysfs entry is anough for now I can handle the code to create it
>> and, with your help, the code to setup the UVD profile requested
>> through it.
>>
>> Is there any suggestion?
>>
>> Thanks all for your help,
>>
>>
>> On Fri, Aug 15, 2014 at 5:48 AM, Christian König
>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>
>> Hi guys,
>>
>> to make a long story short every time I watch a movie my laptop
>> start to heat up because we always select the standard UVD power
>> profile without actually measuring if that is necessary.
>>
>> Marco came up with a patch that seems to reliable measure the fps
>> send down to the kernel and so together with knowing the frame
>> size of the video should allow us to select the right UVD power
>> profile.
>>
>> The problem is that Alex (unnoticed by me) completely disabled
>> selecting the UVD profiles because of some issues with advanced
>> post processing discussed on IRC. The problem seems to be that the
>> lower UVD profiles have a to low SCLK/MCLK to handle the 3D load
>> that comes with scaling, deinterlacing etc...
>>
>> I unfortunately don't have time for it, cause this only affects
>> the hardware generations R600-SI and not the newest one CIK. So
>> could you guys stick together and come up with a solution?
>> Something like a sysfs entry that let's us select the minimum UVD
>> power level allowed?
>>
>> I think Marco is happy to come up with a patch, we just need to
>> know what's really needed and what should be the default values.
>> I'm happy to review everything that comes out of it, just don't
>> have time to do it myself.
>>
>> Happy discussion and thanks in advance,
>> Christian.
>>
>> Am 12.08.2014 um 15:05 schrieb Alex Deucher:
>>
>> On Tue, Aug 12, 2014 at 6:00 AM, Christian König
>> <deathsimple at vodafone.de <mailto:deathsimple at vodafone.de>> wrote:
>>
>> Am 11.08.2014 um 16:52 schrieb Alex Deucher:
>>
>> On Mon, Aug 11, 2014 at 5:08 AM, Christian König
>> <deathsimple at vodafone.de
>> <mailto:deathsimple at vodafone.de>> wrote:
>>
>> Am 07.08.2014 um 21:43 schrieb Alex Deucher:
>>
>> On Thu, Aug 7, 2014 at 11:32 AM, Christian König
>> <deathsimple at vodafone.de
>> <mailto:deathsimple at vodafone.de>> wrote:
>>
>> Am 07.08.2014 um 16:32 schrieb Alex Deucher:
>>
>> On Thu, Aug 7, 2014 at 7:33 AM,
>> Christian König
>> <deathsimple at vodafone.de
>> <mailto:deathsimple at vodafone.de>>
>> wrote:
>>
>> From: Marco A Benatto
>> <marco.antonio.780 at gmail.com
>> <mailto:marco.antonio.780 at gmail.com>>
>>
>> Adding a Frames Per Second
>> estimation logic on UVD handles
>> when it has being used. This
>> estimation is per handle basis
>> and will help on DPM profile
>> calculation.
>>
>> v2 (chk): fix timestamp type, move
>> functions around and
>> cleanup code a bit.
>>
>> Will this really help much? I thought
>> the problem was mainly due to
>> sclk and mclk for post processing.
>>
>>
>> It should at least handle the UVD side for
>> upclocking when you get a
>> lot
>> of
>> streams / fps. And at on my NI the patch
>> seems to do exactly that.
>>
>> Switching sclk and mclk for post
>> processing is a different task, and I
>> actually have no idea what to do with them.
>>
>> At this point we always choose the plain UVD
>> state anyway so this
>> patch would only take effect if we re-enabled
>> the dynamic UVD state
>> selection.
>>
>>
>> Hui? I thought we already re-enabled the dynamic
>> UVD state selection, but
>> double checking this I found it disabled again.
>>
>> What was the problem with that? Looks like I
>> somehow missed the
>> discussion
>> around it.
>>
>> We did, but after doing so a number of people
>> complained about a
>> regression on IRC because when apps like xmbc enabled
>> post processing,
>> performance went down.
>>
>>
>> That's strange, from my experience the different UVD
>> performance states only
>> affect UVDs dclk/vclk, not sclk/mclk. I need to get the
>> DPM dumps to
>> confirms this.
>>
>> The sclks and mclks are usually different as well, especially
>> on APUs.
>> I can send you some examples.
>>
>> You not off hand remember who complained on IRC? Finding
>> something in the
>> IRC logs is like searching for a needle in a haystack.
>>
>> I don't remember off hand. I think zgreg was involved in some
>> of the
>> discussions.
>>
>> Alex
>>
>> Thanks,
>> Christian.
>>
>>
>> Alex
>>
>>
>> Christian.
>>
>>
>> For the post processing, we probably need a
>> hint we can
>> pass to the driver in the CS ioctl to denote
>> what state we need.
>> Although if we did that, this could would
>> largely be moot. That said,
>> newer asics support dynamic UVD clocks so we
>> really only need
>> something like that for older asics and I
>> guess VCE.
>>
>> Alex
>>
>> Christian.
>>
>>
>> Alex
>>
>> Signed-off-by: Marco A Benatto
>> <marco.antonio.780 at gmail.com
>> <mailto:marco.antonio.780 at gmail.com>>
>> Signed-off-by: Christian König
>> <christian.koenig at amd.com
>> <mailto:christian.koenig at amd.com>>
>> ---
>>
>> drivers/gpu/drm/radeon/radeon.h
>> | 10 ++++++
>>
>> drivers/gpu/drm/radeon/radeon_uvd.c
>> | 64
>> +++++++++++++++++++++++++++++++++----
>> 2 files changed, 68
>> insertions(+), 6 deletions(-)
>>
>> diff --git
>> a/drivers/gpu/drm/radeon/radeon.h
>> b/drivers/gpu/drm/radeon/radeon.h
>> index 9e1732e..e92f6cb 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -1617,6 +1617,15 @@ int
>> radeon_pm_get_type_index(struct
>> radeon_device
>> *rdev,
>> #define
>> RADEON_UVD_STACK_SIZE (1024*1024)
>> #define RADEON_UVD_HEAP_SIZE
>> (1024*1024)
>>
>> +#define RADEON_UVD_FPS_EVENTS_MAX 8
>> +#define RADEON_UVD_DEFAULT_FPS 60
>> +
>> +struct radeon_uvd_fps {
>> + uint64_t timestamp;
>> + uint8_t event_index;
>> + uint8_t
>> events[RADEON_UVD_FPS_EVENTS_MAX];
>> +};
>> +
>> struct radeon_uvd {
>> struct radeon_bo
>> *vcpu_bo;
>> void
>> *cpu_addr;
>> @@ -1626,6 +1635,7 @@ struct
>> radeon_uvd {
>> struct drm_file
>> *filp[RADEON_MAX_UVD_HANDLES];
>> unsigned
>> img_size[RADEON_MAX_UVD_HANDLES];
>> struct delayed_work
>> idle_work;
>> + struct radeon_uvd_fps
>> fps_info[RADEON_MAX_UVD_HANDLES];
>> };
>>
>> int radeon_uvd_init(struct
>> radeon_device *rdev);
>> diff --git
>> a/drivers/gpu/drm/radeon/radeon_uvd.c
>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>> index 6bf55ec..ef5667a 100644
>> ---
>> a/drivers/gpu/drm/radeon/radeon_uvd.c
>> +++
>> b/drivers/gpu/drm/radeon/radeon_uvd.c
>> @@ -237,6 +237,51 @@ void
>> radeon_uvd_force_into_uvd_segment(struct
>> radeon_bo *rbo)
>> rbo->placement.lpfn =
>> (256 * 1024 * 1024) >> PAGE_SHIFT;
>> }
>>
>> +static void
>> radeon_uvd_fps_clear_events(struct
>> radeon_device *rdev,
>> int
>> idx)
>> +{
>> + struct radeon_uvd_fps *fps
>> = &rdev->uvd.fps_info[idx];
>> + unsigned i;
>> +
>> + fps->timestamp = jiffies_64;
>> + fps->event_index = 0;
>> + for (i = 0; i <
>> RADEON_UVD_FPS_EVENTS_MAX; i++)
>> + fps->events[i] = 0;
>> +}
>> +
>> +static void
>> radeon_uvd_fps_note_event(struct
>> radeon_device *rdev,
>> int
>> idx)
>> +{
>> + struct radeon_uvd_fps *fps
>> = &rdev->uvd.fps_info[idx];
>> + uint64_t timestamp =
>> jiffies_64;
>> + unsigned rate = 0;
>> +
>> + uint8_t index =
>> fps->event_index++;
>> + fps->event_index %=
>> RADEON_UVD_FPS_EVENTS_MAX;
>> +
>> + rate = div64_u64(HZ,
>> max(timestamp - fps->timestamp,
>> 1ULL));
>> +
>> + fps->timestamp = timestamp;
>> + fps->events[index] =
>> min(rate, 120u);
>> +}
>> +
>> +static unsigned
>> radeon_uvd_estimate_fps(struct
>> radeon_device *rdev,
>> int
>> idx)
>> +{
>> + struct radeon_uvd_fps *fps
>> = &rdev->uvd.fps_info[idx];
>> + unsigned i, valid = 0,
>> count = 0;
>> +
>> + for (i = 0; i <
>> RADEON_UVD_FPS_EVENTS_MAX; i++) {
>> + /* We should
>> ignore zero values */
>> + if (fps->events[i]
>> != 0) {
>> + count +=
>> fps->events[i];
>> + valid++;
>> + }
>> + }
>> +
>> + if (valid > 0)
>> + return count / valid;
>> + else
>> + return
>> RADEON_UVD_DEFAULT_FPS;
>> +}
>> +
>> void
>> radeon_uvd_free_handles(struct
>> radeon_device *rdev, struct
>> drm_file *filp)
>> {
>> int i, r;
>> @@ -419,8 +464,10 @@ static int
>> radeon_uvd_cs_msg(struct
>> radeon_cs_parser
>> *p, struct radeon_bo *bo,
>>
>> /* create or decode,
>> validate the handle */
>> for (i = 0; i <
>> RADEON_MAX_UVD_HANDLES; ++i) {
>> - if
>> (atomic_read(&p->rdev->uvd.handles[i])
>> == handle)
>> + if
>> (atomic_read(&p->rdev->uvd.handles[i])
>> == handle)
>> {
>> +
>> radeon_uvd_fps_note_event(p->rdev, i);
>> return 0;
>> + }
>> }
>>
>> /* handle not found
>> try to alloc a new one */
>> @@ -428,6 +475,7 @@ static int
>> radeon_uvd_cs_msg(struct
>> radeon_cs_parser
>> *p, struct radeon_bo *bo,
>> if
>> (!atomic_cmpxchg(&p->rdev->uvd.handles[i],
>> 0,
>> handle)) {
>>
>> p->rdev->uvd.filp[i] = p->filp;
>>
>> p->rdev->uvd.img_size[i] = img_size;
>> +
>> radeon_uvd_fps_clear_events(p->rdev,
>> i);
>> return 0;
>> }
>> }
>> @@ -763,7 +811,7 @@ int
>> radeon_uvd_get_destroy_msg(struct
>> radeon_device
>> *rdev, int ring,
>> static void
>> radeon_uvd_count_handles(struct
>> radeon_device *rdev,
>>
>> unsigned *sd, unsigned *hd)
>> {
>> - unsigned i;
>> + unsigned i, fps_rate = 0;
>>
>> *sd = 0;
>> *hd = 0;
>> @@ -772,10 +820,13 @@ static void
>> radeon_uvd_count_handles(struct
>> radeon_device *rdev,
>> if
>> (!atomic_read(&rdev->uvd.handles[i]))
>> continue;
>>
>> - if
>> (rdev->uvd.img_size[i] >= 720*576)
>> - ++(*hd);
>> - else
>> - ++(*sd);
>> + fps_rate =
>> radeon_uvd_estimate_fps(rdev, i);
>> +
>> + if
>> (rdev->uvd.img_size[i] >= 720*576) {
>> + (*hd) +=
>> fps_rate > 30 ? 1 : 2;
>> + } else {
>> + (*sd) +=
>> fps_rate > 30 ? 1 : 2;
>> + }
>> }
>> }
>>
>> @@ -805,6 +856,7 @@ void
>> radeon_uvd_note_usage(struct
>> radeon_device
>> *rdev)
>> set_clocks &=
>> schedule_delayed_work(&rdev->uvd.idle_work,
>>
>> msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));
>>
>> +
>> if
>> ((rdev->pm.pm_method ==
>> PM_METHOD_DPM) &&
>> rdev->pm.dpm_enabled) {
>> unsigned hd =
>> 0, sd = 0;
>>
>> radeon_uvd_count_handles(rdev,
>> &sd, &hd);
>> --
>> 1.9.1
>>
>>
>>
>>
>>
>> --
>> Marco Antonio Benatto
>> Linux user ID:#506236
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140815/e12cfa06/attachment.sig>
More information about the dri-devel
mailing list