<div dir="ltr"><div>Grigori, Alex and Christian,<br><br>are you OK if I merge the ioctl flag idea with the sysfs idea?<br><br></div>We let the system decide the state using the hint provided by the CS ioctl flag, but if performance is not as good as expected <br>
or the DPM table is not sane, we will still have a fallback way to override this decision.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 15, 2014 at 1:54 PM, Christian König <span dir="ltr"><<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 15.08.2014 at 17:32, Grigori Goronzy wrote:<div class=""><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 15.08.2014 17:26, Alex Deucher wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <<a href="mailto:greg@chown.ath.cx" target="_blank">greg@chown.ath.cx</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 15.08.2014 16:11, Christian König wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Marco,<br>
<br>
the problem with a CS ioctl flag is that we sometimes don't know how<br>
much SCLK/MCLK boost is needed, for example when we do post processing<br>
in the player using OpenGL and UVD decoding with VDPAU. In this case<br>
VDPAU doesn't have the slightest idea how high SCLK/MCLK must be and so<br>
can't give that info to the kernel either.<br>
<br>
</blockquote>
Maybe it's an acceptable workaround to simply disable dynamic UVD state<br>
selection in case the UVD states only have a single power level. That<br>
will avoid the performance issues on affected systems, while still<br>
allowing dynamic UVD states on systems that have a saner DPM table<br>
setup. I think it is mostly older systems that suffer from this.<br>
<br>
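The check suggested above could look roughly like the following. This is only a sketch under assumptions: the struct and both function names are hypothetical stand-ins, not the radeon driver's real DPM data structures, and the sclk values are taken as printed in a DPM dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one DPM power state. */
struct uvd_state_sketch {
    unsigned sclk[4];   /* engine clock of each power level, as printed in the DPM dump */
    size_t num_levels;  /* number of valid entries in sclk[] */
};

/* True when every level has the same sclk, so there is nothing to switch between. */
static bool uvd_state_is_single_level(const struct uvd_state_sketch *s)
{
    assert(s->num_levels > 0 && s->num_levels <= 4);
    for (size_t i = 1; i < s->num_levels; i++)
        if (s->sclk[i] != s->sclk[0])
            return false;
    return true;
}

/* Allow dynamic UVD state selection only if no UVD state is pinned to a single sclk. */
static bool allow_dynamic_uvd(const struct uvd_state_sketch *states, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (uvd_state_is_single_level(&states[i]))
            return false;
    return true;
}
```

A table in which any UVD state lists the same sclk for all of its levels would make allow_dynamic_uvd() return false, which corresponds to falling back to the generic UVD state.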
</blockquote>
That is exactly what we do now.<br>
<br>
</blockquote>
Is it? In 3.17-wip, dynamic UVD state selection (according to active<br>
streams) is still completely disabled. It will always use the generic<br>
UVD state. In fact wasn't it reverted again because of the performance<br>
issues on some systems?<br>
</blockquote>
<br></div>
This is the performance table of my laptop (at least the interesting parts), which I think is a typical example of the problem:<br>
<br>
[ 4.106772] == power state 1 ==<br>
[ 4.106774] ui class: performance<br>
[ 4.106776] internal class: none<br>
[ 4.106780] uvd vclk: 0 dclk: 0<br>
[ 4.106782] power level 0 sclk: 20000 vddc_index: 2<br>
[ 4.106784] power level 1 sclk: 50000 vddc_index: 2<br>
[ 4.106805] == power state 3 ==<br>
[ 4.106807] ui class: none<br>
[ 4.106808] internal class: uvd<br>
[ 4.106813] uvd vclk: 55000 dclk: 40000<br>
[ 4.106816] power level 0 sclk: 50000 vddc_index: 2<br>
[ 4.106818] power level 1 sclk: 50000 vddc_index: 2<br>
[ 4.106820] status:<br>
[ 4.106822] == power state 4 ==<br>
[ 4.106823] ui class: battery<br>
[ 4.106825] internal class: uvd_hd<br>
[ 4.106831] uvd vclk: 40000 dclk: 30000<br>
[ 4.106833] power level 0 sclk: 38000 vddc_index: 1<br>
[ 4.106835] power level 1 sclk: 38000 vddc_index: 1<br>
[ 4.106839] == power state 5 ==<br>
[ 4.106841] ui class: battery<br>
[ 4.106843] internal class: uvd_sd<br>
[ 4.106848] uvd vclk: 40000 dclk: 30000<br>
[ 4.106850] power level 0 sclk: 38000 vddc_index: 2<br>
[ 4.106853] power level 1 sclk: 38000 vddc_index: 2<br>
<br>
As you can see we currently always select the performance level uvd, which results in selecting the maximum sclk/dclk and vclk. Unfortunately neither uvd, uvd_sd nor uvd_hd allows the hardware to switch the sclk once selected (it's a hardware limitation of older uvd blocks).<br>
<br>
So for all cases where this is interesting you actually always have only a single power level to choose from.<span class="HOEnZb"><font color="#888888"><br>
<br>
Christian.</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Grigori<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Alex<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Best regards<br>
Grigori<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Regards,<br>
Christian.<br>
<br>
On 15.08.2014 at 15:21, Marco Benatto wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hey all,<br>
<br>
I also had a talk with Alex yesterday about post-processing issues<br>
when using dynamic UVD profiles and about a change to the CS ioctl<br>
that adds a flag to let the user mode driver tell the kernel which<br>
performance requirement it has for post processing. A common<br>
point in both discussions is to establish the default values for these<br>
profiles, but this ioctl change would probably be more invasive and complex<br>
to implement than a sysfs entry.<br>
<br>
If a sysfs entry is enough for now I can handle the code to create it<br>
and, with your help, the code to setup the UVD profile requested<br>
through it.<br>
<br>
Is there any suggestion?<br>
<br>
Thanks all for your help,<br>
<br>
<br>
On Fri, Aug 15, 2014 at 5:48 AM, Christian König<br>
<<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>> wrote:<br>
<br>
Hi guys,<br>
<br>
to make a long story short, every time I watch a movie my laptop<br>
starts to heat up because we always select the standard UVD power<br>
profile without actually measuring if that is necessary.<br>
<br>
Marco came up with a patch that seems to reliably measure the fps<br>
sent down to the kernel, which together with knowing the frame<br>
size of the video should allow us to select the right UVD power<br>
profile.<br>
<br>
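For anyone who wants to try the estimation idea outside the kernel, here is a minimal user-space sketch, assuming the caller passes in millisecond timestamps instead of jiffies. The names fps_sketch, fps_reset, fps_note_event and fps_estimate are made up for illustration; only the ring size of 8, the per-sample clamp at 120 and the default of 60 are taken from the patch quoted further down.

```c
#include <assert.h>
#include <stdint.h>

#define FPS_EVENTS_MAX 8    /* ring size, same value as in the patch */
#define DEFAULT_FPS    60   /* returned while no sample has been recorded */
#define MAX_FPS        120  /* per-sample clamp, as in the patch */

struct fps_sketch {
    uint64_t last_ms;                 /* time of the previous submission */
    uint8_t  index;                   /* next ring slot to overwrite */
    uint8_t  events[FPS_EVENTS_MAX];  /* recent rate samples, 0 means empty */
};

/* Forget all samples and restart timing from now_ms. */
static void fps_reset(struct fps_sketch *f, uint64_t now_ms)
{
    f->last_ms = now_ms;
    f->index = 0;
    for (unsigned i = 0; i < FPS_EVENTS_MAX; i++)
        f->events[i] = 0;
}

/* Record one submission: the rate sample is 1000 ms divided by the time since the last one. */
static void fps_note_event(struct fps_sketch *f, uint64_t now_ms)
{
    assert(now_ms >= f->last_ms);
    uint64_t delta = now_ms - f->last_ms;
    if (delta == 0)
        delta = 1;  /* avoid dividing by zero for back-to-back submissions */
    uint64_t rate = 1000 / delta;
    if (rate > MAX_FPS)
        rate = MAX_FPS;
    f->events[f->index] = (uint8_t)rate;
    f->index = (uint8_t)((f->index + 1) % FPS_EVENTS_MAX);
    f->last_ms = now_ms;
}

/* Average of the non-empty samples, or the default when there are none. */
static unsigned fps_estimate(const struct fps_sketch *f)
{
    unsigned sum = 0, valid = 0;
    for (unsigned i = 0; i < FPS_EVENTS_MAX; i++) {
        if (f->events[i] != 0) {
            sum += f->events[i];
            valid++;
        }
    }
    return valid > 0 ? sum / valid : DEFAULT_FPS;
}
```

With submissions arriving every 40 ms each sample is 25, so the estimate settles at 25; a handle that has never submitted anything reports the default of 60.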
The problem is that Alex (unnoticed by me) completely disabled<br>
selecting the UVD profiles because of some issues with advanced<br>
post processing discussed on IRC. The problem seems to be that the<br>
lower UVD profiles have too low an SCLK/MCLK to handle the 3D load<br>
that comes with scaling, deinterlacing etc...<br>
<br>
I unfortunately don't have time for it, because this only affects<br>
the hardware generations R600-SI and not the newest one, CIK. So<br>
could you guys stick together and come up with a solution?<br>
Something like a sysfs entry that lets us select the minimum UVD<br>
power level allowed?<br>
<br>
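As a rough illustration of such a minimum-level knob: whatever profile the driver would pick on its own is raised to the configured minimum. The enum, its ordering and the function name are hypothetical and not existing radeon code; where the minimum comes from (a sysfs attribute, for instance) is left out.

```c
#include <assert.h>

/* Hypothetical ordering of UVD profiles from lowest to highest clocks. */
enum uvd_level_sketch {
    UVD_LEVEL_SD = 0,
    UVD_LEVEL_HD = 1,
    UVD_LEVEL_FULL = 2
};

/* Return the profile to use: the wanted one, but never below min_allowed. */
static enum uvd_level_sketch uvd_clamp_level(enum uvd_level_sketch wanted,
                                             enum uvd_level_sketch min_allowed)
{
    assert(wanted <= UVD_LEVEL_FULL && min_allowed <= UVD_LEVEL_FULL);
    return wanted < min_allowed ? min_allowed : wanted;
}
```

Setting the minimum to the highest level would reproduce today's behaviour of always using the plain UVD state, while the lowest level would leave the decision entirely to the driver.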
I think Marco is happy to come up with a patch, we just need to<br>
know what's really needed and what should be the default values.<br>
I'm happy to review everything that comes out of it, just don't<br>
have time to do it myself.<br>
<br>
Happy discussion and thanks in advance,<br>
Christian.<br>
<br>
On 12.08.2014 at 15:05, Alex Deucher wrote:<br>
<br>
On Tue, Aug 12, 2014 at 6:00 AM, Christian König<br>
<<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
On 11.08.2014 at 16:52, Alex Deucher wrote:<br>
<br>
On Mon, Aug 11, 2014 at 5:08 AM, Christian König<br>
<<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
On 07.08.2014 at 21:43, Alex Deucher wrote:<br>
<br>
On Thu, Aug 7, 2014 at 11:32 AM, Christian König<br>
<<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
On 07.08.2014 at 16:32, Alex Deucher wrote:<br>
<br>
On Thu, Aug 7, 2014 at 7:33 AM, Christian König<br>
<<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
From: Marco A Benatto <<a href="mailto:marco.antonio.780@gmail.com" target="_blank">marco.antonio.780@gmail.com</a>><br>
<br>
Add frames-per-second estimation logic for UVD handles<br>
while they are in use. The estimation is done on a<br>
per-handle basis and will help with DPM profile<br>
calculation.<br>
<br>
v2 (chk): fix timestamp type, move<br>
functions around and<br>
cleanup code a bit.<br>
<br>
Will this really help much? I thought<br>
the problem was mainly due to<br>
sclk and mclk for post processing.<br>
<br>
<br>
It should at least handle the UVD side for<br>
upclocking when you get a lot of streams / fps.<br>
And at least on my NI the patch<br>
seems to do exactly that.<br>
<br>
Switching sclk and mclk for post<br>
processing is a different task, and I<br>
actually have no idea what to do with them.<br>
<br>
At this point we always choose the plain UVD<br>
state anyway so this<br>
patch would only take effect if we re-enabled<br>
the dynamic UVD state<br>
selection.<br>
<br>
<br>
Huh? I thought we already re-enabled the dynamic<br>
UVD state selection, but<br>
double checking this I found it disabled again.<br>
<br>
What was the problem with that? Looks like I<br>
somehow missed the<br>
discussion<br>
around it.<br>
<br>
We did, but after doing so a number of people<br>
complained about a regression on IRC because when<br>
apps like xbmc enabled post processing,<br>
performance went down.<br>
<br>
<br>
That's strange, from my experience the different UVD<br>
performance states only<br>
affect UVD's dclk/vclk, not sclk/mclk. I need to get the<br>
DPM dumps to<br>
confirm this.<br>
<br>
The sclks and mclks are usually different as well, especially<br>
on APUs.<br>
I can send you some examples.<br>
<br>
Do you remember off hand who complained on IRC? Finding<br>
something in the<br>
IRC logs is like searching for a needle in a haystack.<br>
<br>
I don't remember off hand. I think zgreg was involved in some<br>
of the<br>
discussions.<br>
<br>
Alex<br>
<br>
Thanks,<br>
Christian.<br>
<br>
<br>
Alex<br>
<br>
<br>
Christian.<br>
<br>
<br>
For the post processing, we probably need a<br>
hint we can<br>
pass to the driver in the CS ioctl to denote<br>
what state we need.<br>
Although if we did that, this code would<br>
largely be moot. That said,<br>
newer asics support dynamic UVD clocks so we<br>
really only need<br>
something like that for older asics and I<br>
guess VCE.<br>
<br>
Alex<br>
<br>
Christian.<br>
<br>
<br>
Alex<br>
<br>
Signed-off-by: Marco A Benatto <<a href="mailto:marco.antonio.780@gmail.com" target="_blank">marco.antonio.780@gmail.com</a>><br>
Signed-off-by: Christian König <<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>><br>
---<br>
<br>
 drivers/gpu/drm/radeon/radeon.h     | 10 ++++++<br>
 drivers/gpu/drm/radeon/radeon_uvd.c | 64 +++++++++++++++++++++++++++++++++----<br>
 2 files changed, 68 insertions(+), 6 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h<br>
index 9e1732e..e92f6cb 100644<br>
--- a/drivers/gpu/drm/radeon/radeon.h<br>
+++ b/drivers/gpu/drm/radeon/radeon.h<br>
@@ -1617,6 +1617,15 @@ int radeon_pm_get_type_index(struct radeon_device *rdev,<br>
 #define RADEON_UVD_STACK_SIZE (1024*1024)<br>
 #define RADEON_UVD_HEAP_SIZE (1024*1024)<br>
<br>
+#define RADEON_UVD_FPS_EVENTS_MAX 8<br>
+#define RADEON_UVD_DEFAULT_FPS 60<br>
+<br>
+struct radeon_uvd_fps {<br>
+       uint64_t timestamp;<br>
+       uint8_t event_index;<br>
+       uint8_t events[RADEON_UVD_FPS_EVENTS_MAX];<br>
+};<br>
+<br>
 struct radeon_uvd {<br>
        struct radeon_bo *vcpu_bo;<br>
        void *cpu_addr;<br>
@@ -1626,6 +1635,7 @@ struct radeon_uvd {<br>
        struct drm_file *filp[RADEON_MAX_UVD_HANDLES];<br>
        unsigned img_size[RADEON_MAX_UVD_HANDLES];<br>
        struct delayed_work idle_work;<br>
+       struct radeon_uvd_fps fps_info[RADEON_MAX_UVD_HANDLES];<br>
 };<br>
<br>
 int radeon_uvd_init(struct radeon_device *rdev);<br>
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c<br>
index 6bf55ec..ef5667a 100644<br>
--- a/drivers/gpu/drm/radeon/radeon_uvd.c<br>
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c<br>
@@ -237,6 +237,51 @@ void radeon_uvd_force_into_uvd_segment(struct radeon_bo *rbo)<br>
        rbo->placement.lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;<br>
 }<br>
<br>
+static void radeon_uvd_fps_clear_events(struct radeon_device *rdev, int idx)<br>
+{<br>
+       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
+       unsigned i;<br>
+<br>
+       fps->timestamp = jiffies_64;<br>
+       fps->event_index = 0;<br>
+       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++)<br>
+               fps->events[i] = 0;<br>
+}<br>
+<br>
+static void radeon_uvd_fps_note_event(struct radeon_device *rdev, int idx)<br>
+{<br>
+       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
+       uint64_t timestamp = jiffies_64;<br>
+       unsigned rate = 0;<br>
+<br>
+       uint8_t index = fps->event_index++;<br>
+       fps->event_index %= RADEON_UVD_FPS_EVENTS_MAX;<br>
+<br>
+       rate = div64_u64(HZ, max(timestamp - fps->timestamp, 1ULL));<br>
+<br>
+       fps->timestamp = timestamp;<br>
+       fps->events[index] = min(rate, 120u);<br>
+}<br>
+<br>
+static unsigned radeon_uvd_estimate_fps(struct radeon_device *rdev, int idx)<br>
+{<br>
+       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
+       unsigned i, valid = 0, count = 0;<br>
+<br>
+       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) {<br>
+               /* We should ignore zero values */<br>
+               if (fps->events[i] != 0) {<br>
+                       count += fps->events[i];<br>
+                       valid++;<br>
+               }<br>
+       }<br>
+<br>
+       if (valid > 0)<br>
+               return count / valid;<br>
+       else<br>
+               return RADEON_UVD_DEFAULT_FPS;<br>
+}<br>
+<br>
 void radeon_uvd_free_handles(struct radeon_device *rdev, struct drm_file *filp)<br>
 {<br>
        int i, r;<br>
@@ -419,8 +464,10 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,<br>
<br>
        /* create or decode, validate the handle */<br>
        for (i = 0; i < RADEON_MAX_UVD_HANDLES; ++i) {<br>
-               if (atomic_read(&p->rdev->uvd.handles[i]) == handle)<br>
+               if (atomic_read(&p->rdev->uvd.handles[i]) == handle) {<br>
+                       radeon_uvd_fps_note_event(p->rdev, i);<br>
                        return 0;<br>
+               }<br>
        }<br>
<br>
        /* handle not found try to alloc a new one */<br>
@@ -428,6 +475,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,<br>
                if (!atomic_cmpxchg(&p->rdev->uvd.handles[i], 0, handle)) {<br>
                        p->rdev->uvd.filp[i] = p->filp;<br>
                        p->rdev->uvd.img_size[i] = img_size;<br>
+                       radeon_uvd_fps_clear_events(p->rdev, i);<br>
                        return 0;<br>
                }<br>
        }<br>
@@ -763,7 +811,7 @@ int radeon_uvd_get_destroy_msg(struct radeon_device *rdev, int ring,<br>
 static void radeon_uvd_count_handles(struct radeon_device *rdev,<br>
                                      unsigned *sd, unsigned *hd)<br>
 {<br>
-       unsigned i;<br>
+       unsigned i, fps_rate = 0;<br>
<br>
        *sd = 0;<br>
        *hd = 0;<br>
@@ -772,10 +820,13 @@ static void radeon_uvd_count_handles(struct radeon_device *rdev,<br>
                if (!atomic_read(&rdev->uvd.handles[i]))<br>
                        continue;<br>
<br>
-               if (rdev->uvd.img_size[i] >= 720*576)<br>
-                       ++(*hd);<br>
-               else<br>
-                       ++(*sd);<br>
+               fps_rate = radeon_uvd_estimate_fps(rdev, i);<br>
+<br>
+               if (rdev->uvd.img_size[i] >= 720*576) {<br>
+                       (*hd) += fps_rate > 30 ? 1 : 2;<br>
+               } else {<br>
+                       (*sd) += fps_rate > 30 ? 1 : 2;<br>
+               }<br>
        }<br>
 }<br>
<br>
@@ -805,6 +856,7 @@ void radeon_uvd_note_usage(struct radeon_device *rdev)<br>
        set_clocks &= schedule_delayed_work(&rdev->uvd.idle_work,<br>
                                            msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));<br>
<br>
+<br>
        if ((rdev->pm.pm_method == PM_METHOD_DPM) && rdev->pm.dpm_enabled) {<br>
                unsigned hd = 0, sd = 0;<br>
                radeon_uvd_count_handles(rdev, &sd, &hd);<br>
--<br>
1.9.1<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Marco Antonio Benatto<br>
Linux user ID:#506236<br>
</blockquote></blockquote>
<br>
</blockquote></blockquote>
<br>
</blockquote>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Marco Antonio Benatto<br>Linux user ID:<font> #506236</font>
</div>