<div dir="ltr"><div>Grigori, Alex and Christian,<br><br>are you OK with me merging the ioctl flag idea with the sysfs idea?<br><br></div>We let the system decide the state using the hint provided by the CS ioctl flag, but if performance is not as good as expected <br>
or the DPM table is not sane, we still have a fallback way to override this decision.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 15, 2014 at 1:54 PM, Christian König <span dir="ltr"><<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 15.08.2014 17:32, Grigori Goronzy wrote:<div class=""><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 15.08.2014 17:26, Alex Deucher wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Fri, Aug 15, 2014 at 11:20 AM, Grigori Goronzy <<a href="mailto:greg@chown.ath.cx" target="_blank">greg@chown.ath.cx</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 15.08.2014 16:11, Christian König wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Marco,<br>
<br>
the problem with a CS ioctl flag is that we sometimes don't know how<br>
much SCLK/MCLK boost is needed, for example when we do post processing<br>
in the player using OpenGL and UVD decoding with VDPAU. In this case<br>
VDPAU doesn't have the slightest idea how high SCLK/MCLK must be and so<br>
can't give that info to the kernel either.<br>
<br>
</blockquote>
Maybe it's an acceptable workaround to simply disable dynamic UVD state<br>
selection in case the UVD states only have a single power level. That<br>
will avoid the performance issues on affected systems, while still<br>
allowing dynamic UVD states on systems that have a saner DPM table<br>
setup. I think it is mostly older systems that suffer from this.<br>
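A minimal C sketch of this check, assuming a simplified, hypothetical `uvd_power_state` struct that only records the sclk of each power level (in the same units as the dmesg dump). A state whose levels all share one sclk is treated as having a "single power level", and, as an interpretation of the suggestion, dynamic selection is only considered safe when no UVD state pins sclk that way. None of these names come from the driver.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one UVD power state from the DPM table. */
struct uvd_power_state {
	const unsigned *sclk;   /* engine clock of each power level */
	size_t num_levels;
};

/* True if DPM can still move sclk within this state, i.e. at least
 * one level runs at a different sclk than level 0. */
static bool state_has_sclk_range(const struct uvd_power_state *ps)
{
	size_t i;

	assert(ps->num_levels > 0);
	for (i = 1; i < ps->num_levels; i++)
		if (ps->sclk[i] != ps->sclk[0])
			return true;
	return false;
}

/* Suggested workaround: only allow dynamic UVD state selection if no UVD
 * state pins sclk to a single value; otherwise stay on the generic state. */
static bool dynamic_uvd_selection_safe(const struct uvd_power_state *states,
				       size_t num_states)
{
	size_t i;

	for (i = 0; i < num_states; i++)
		if (!state_has_sclk_range(&states[i]))
			return false;
	return true;
}
```

Fed the uvd, uvd_hd and uvd_sd states from the laptop dump in this thread (sclk 50000/50000, 38000/38000 and 38000/38000), `dynamic_uvd_selection_safe()` returns false, so the generic UVD state would be kept.<br>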
<br>
</blockquote>
That is exactly what we do now.<br>
<br>
</blockquote>
Is it? In 3.17-wip, dynamic UVD state selection (according to active<br>
streams) is still completely disabled. It will always use the generic<br>
UVD state. In fact, wasn't it reverted again because of the performance<br>
issues on some systems?<br>
</blockquote>
<br></div>
This is the performance table of my laptop (at least the interesting parts), which I think is a typical example of the problem:<br>
<br>
[    4.106772] == power state 1 ==<br>
[    4.106774]     ui class: performance<br>
[    4.106776]     internal class: none<br>
[    4.106780]     uvd    vclk: 0 dclk: 0<br>
[    4.106782]         power level 0    sclk: 20000 vddc_index: 2<br>
[    4.106784]         power level 1    sclk: 50000 vddc_index: 2<br>
[    4.106805] == power state 3 ==<br>
[    4.106807]     ui class: none<br>
[    4.106808]     internal class: uvd<br>
[    4.106813]     uvd    vclk: 55000 dclk: 40000<br>
[    4.106816]         power level 0    sclk: 50000 vddc_index: 2<br>
[    4.106818]         power level 1    sclk: 50000 vddc_index: 2<br>
[    4.106820]     status:<br>
[    4.106822] == power state 4 ==<br>
[    4.106823]     ui class: battery<br>
[    4.106825]     internal class: uvd_hd<br>
[    4.106831]     uvd    vclk: 40000 dclk: 30000<br>
[    4.106833]         power level 0    sclk: 38000 vddc_index: 1<br>
[    4.106835]         power level 1    sclk: 38000 vddc_index: 1<br>
[    4.106839] == power state 5 ==<br>
[    4.106841]     ui class: battery<br>
[    4.106843]     internal class: uvd_sd<br>
[    4.106848]     uvd    vclk: 40000 dclk: 30000<br>
[    4.106850]         power level 0    sclk: 38000 vddc_index: 2<br>
[    4.106853]         power level 1    sclk: 38000 vddc_index: 2<br>
<br>
As you can see, we currently always select the uvd power state, which results in selecting the maximum sclk, dclk and vclk. Unfortunately none of uvd, uvd_sd or uvd_hd allows the hardware to switch the sclk once it is selected (it's a hardware limitation of older UVD blocks).<br>

<br>
So in all the cases where this matters, you effectively have only a single power level to choose from.<span class="HOEnZb"><font color="#888888"><br>
<br>
Christian.</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Grigori<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Alex<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Best regards<br>
Grigori<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Regards,<br>
Christian.<br>
<br>
On 15.08.2014 15:21, Marco Benatto wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hey all,<br>
<br>
I also had a talk with Alex yesterday about post-processing issues<br>
when using dynamic UVD profiles, and about a change to the CS ioctl<br>
adding a flag that lets the user mode driver tell the kernel which<br>
performance level it needs for post processing. A common<br>
point of both discussions is establishing the default values for these<br>
profiles, but this ioctl change would probably be more invasive and complex<br>
to implement than a sysfs entry.<br>
<br>
If a sysfs entry is enough for now, I can write the code to create it<br>
and, with your help, the code to set up the UVD profile requested<br>
through it.<br>
<br>
Any suggestions?<br>
<br>
Thanks all for your help,<br>
<br>
<br>
On Fri, Aug 15, 2014 at 5:48 AM, Christian König<br>
<<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>> wrote:<br>
<br>
     Hi guys,<br>
<br>
     to make a long story short, every time I watch a movie my laptop<br>
     starts to heat up because we always select the standard UVD power<br>
     profile without actually measuring whether that is necessary.<br>
<br>
     Marco came up with a patch that seems to reliably measure the fps<br>
     sent down to the kernel, and together with the frame size of the<br>
     video that should allow us to select the right UVD power<br>
     profile.<br>
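To make the estimator from Marco's patch easy to try outside the kernel, here is a user-space port of the same ring-buffer logic, as a sketch. The kernel's `jiffies_64` and `HZ` become explicit `now` and `hz` parameters (an assumption made purely so it runs without the kernel), and the struct and function names are shortened, hypothetical stand-ins. As in the patch, each decode message stores one rate sample (`hz` divided by the ticks since the previous message, capped at 120) in an 8-slot ring, and the estimate is the average of the non-zero slots, falling back to 60 fps when there are none.<br>

```c
#include <assert.h>
#include <stdint.h>

#define FPS_EVENTS_MAX 8   /* RADEON_UVD_FPS_EVENTS_MAX in the patch */
#define DEFAULT_FPS    60  /* RADEON_UVD_DEFAULT_FPS in the patch */
#define FPS_CAP        120 /* per-sample clamp used by the patch */

struct fps_info {
	uint64_t timestamp;             /* tick of the previous decode message */
	uint8_t event_index;            /* ring slot to write next */
	uint8_t events[FPS_EVENTS_MAX]; /* rate samples; 0 marks an empty slot */
};

/* Reset the ring when a new stream handle is allocated. */
static void fps_clear(struct fps_info *fps, uint64_t now)
{
	unsigned i;

	fps->timestamp = now;
	fps->event_index = 0;
	for (i = 0; i < FPS_EVENTS_MAX; i++)
		fps->events[i] = 0;
}

/* Record one decode message arriving at tick 'now', where 'hz' ticks = 1 s. */
static void fps_note_event(struct fps_info *fps, uint64_t now, uint64_t hz)
{
	uint8_t index = fps->event_index;
	uint64_t delta = now - fps->timestamp;
	uint64_t rate;

	assert(hz > 0);
	fps->event_index = (uint8_t)((index + 1) % FPS_EVENTS_MAX);
	if (delta < 1)
		delta = 1; /* back-to-back messages: avoid dividing by zero */
	rate = hz / delta;
	fps->timestamp = now;
	fps->events[index] = (uint8_t)(rate < FPS_CAP ? rate : FPS_CAP);
}

/* Average of the non-empty samples, or the default when there are none. */
static unsigned fps_estimate(const struct fps_info *fps)
{
	unsigned i, valid = 0, sum = 0;

	for (i = 0; i < FPS_EVENTS_MAX; i++) {
		if (fps->events[i] != 0) {
			sum += fps->events[i];
			valid++;
		}
	}
	return valid > 0 ? sum / valid : DEFAULT_FPS;
}
```

With `hz = 1000` and a message every 33 ticks, every sample is 1000 / 33 = 30 (integer division), so the estimate settles at 30 fps.<br>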
<br>
     The problem is that Alex (unnoticed by me) completely disabled<br>
     selecting the UVD profiles because of some issues with advanced<br>
     post processing discussed on IRC. The problem seems to be that the<br>
     lower UVD profiles have too low an SCLK/MCLK to handle the 3D load<br>
     that comes with scaling, deinterlacing, etc.<br>
<br>
     Unfortunately I don't have time for it, since this only affects<br>
     the R600-SI hardware generations and not the newest one, CIK. So<br>
     could you guys get together and come up with a solution?<br>
     Something like a sysfs entry that lets us select the minimum UVD<br>
     power level allowed?<br>
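A user-space sketch of what such a knob could boil down to, with every name hypothetical: a write such as `hd` to an assumed `uvd_min_level` sysfs file is parsed into a floor, and the level DPM actually uses is the automatically chosen one, raised to that floor if needed. Sysfs writes usually carry a trailing newline, which kernel code would typically handle with `sysfs_streq()`; the small helper below only mimics that idea. The lowest floor, `UVD_LEVEL_SD`, leaves the automatic choice untouched, which would also fit keeping an automatic hint as the primary mechanism and the sysfs file as a fallback override.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical UVD levels, ordered from lowest to highest clocks. */
enum uvd_level { UVD_LEVEL_SD, UVD_LEVEL_HD, UVD_LEVEL_HD2, UVD_LEVEL_FULL, UVD_LEVEL_COUNT };

static const char *const uvd_level_names[UVD_LEVEL_COUNT] = { "sd", "hd", "hd2", "full" };

/* String equality that tolerates one trailing newline in 'buf', similar in
 * spirit to the kernel's sysfs_streq(). */
static bool streq_nl(const char *buf, const char *s)
{
	size_t n = strlen(s);

	if (strncmp(buf, s, n) != 0)
		return false;
	return buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0');
}

/* Parse a write to the hypothetical "uvd_min_level" attribute.
 * Returns the level, or -1 if the input is not recognized. */
static int parse_uvd_min_level(const char *buf)
{
	int i;

	for (i = 0; i < UVD_LEVEL_COUNT; i++)
		if (streq_nl(buf, uvd_level_names[i]))
			return i;
	return -1;
}

/* Level actually used: the automatic choice, raised to the user's floor. */
static enum uvd_level apply_uvd_floor(enum uvd_level chosen, enum uvd_level min_level)
{
	assert(chosen < UVD_LEVEL_COUNT && min_level < UVD_LEVEL_COUNT);
	return chosen < min_level ? min_level : chosen;
}
```

For example, if the automatic logic picks `UVD_LEVEL_SD` but the user has written `hd`, `apply_uvd_floor()` returns `UVD_LEVEL_HD`; a higher automatic choice such as `UVD_LEVEL_HD2` is left alone.<br>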
<br>
     I think Marco is happy to come up with a patch; we just need to<br>
     know what's really needed and what the default values should be.<br>
     I'm happy to review everything that comes out of it, I just don't<br>
     have time to do it myself.<br>
<br>
     Happy discussion and thanks in advance,<br>
     Christian.<br>
<br>
     On 12.08.2014 15:05, Alex Deucher wrote:<br>
<br>
         On Tue, Aug 12, 2014 at 6:00 AM, Christian König<br>
         <<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
             On 11.08.2014 16:52, Alex Deucher wrote:<br>
<br>
                 On Mon, Aug 11, 2014 at 5:08 AM, Christian König<br>
                 <<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
                     On 07.08.2014 21:43, Alex Deucher wrote:<br>
<br>
                         On Thu, Aug 7, 2014 at 11:32 AM, Christian König<br>
                         <<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
                             On 07.08.2014 16:32, Alex Deucher wrote:<br>
<br>
                                 On Thu, Aug 7, 2014 at 7:33 AM, Christian König<br>
                                 <<a href="mailto:deathsimple@vodafone.de" target="_blank">deathsimple@vodafone.de</a>> wrote:<br>
<br>
                                     From: Marco A Benatto <<a href="mailto:marco.antonio.780@gmail.com" target="_blank">marco.antonio.780@gmail.com</a>><br>
<br>
                                     Add frames-per-second estimation logic to UVD handles while<br>
                                     they are in use. The estimate is kept per handle and will help<br>
                                     with the DPM profile calculation.<br>
<br>
                                     v2 (chk): fix timestamp type, move functions around and<br>
                                     clean up code a bit.<br>
<br>
                                 Will this really help much? I thought the problem was mainly due to<br>
                                 sclk and mclk for post processing.<br>
<br>
<br>
                             It should at least handle the UVD side for upclocking when you get a<br>
                             lot of streams / fps. And at least on my NI the patch seems to do<br>
                             exactly that.<br>
<br>
                             Switching sclk and mclk for post processing is a different task, and I<br>
                             actually have no idea what to do with them.<br>
<br>
                         At this point we always choose the plain UVD state anyway, so this<br>
                         patch would only take effect if we re-enabled the dynamic UVD state<br>
                         selection.<br>
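For reference, a sketch of the kind of selection a re-enabled path would perform. It combines the weighting from Marco's patch (streams of at least 720x576 pixels, counted as width * height, are HD, and a stream whose estimated rate is 30 fps or less counts twice) with an SD/HD/HD2 mapping that is assumed to resemble the one the radeon DPM code used before dynamic selection was disabled. Treat the exact table, and all enum and function names, as hypothetical. With `dynamic` set to false it always returns the generic state, which is the current behavior described above.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's internal UVD state types. */
enum uvd_state { UVD_STATE_GENERIC, UVD_STATE_SD, UVD_STATE_HD, UVD_STATE_HD2 };

/* Weighted stream count, following the patch: HD means at least 720x576
 * pixels, and a stream estimated at 30 fps or less is counted twice. */
static void count_streams(const unsigned *img_size, const unsigned *fps,
			  size_t n, unsigned *sd, unsigned *hd)
{
	size_t i;

	assert(sd != NULL && hd != NULL);
	*sd = 0;
	*hd = 0;
	for (i = 0; i < n; i++) {
		unsigned weight = fps[i] > 30 ? 1 : 2;

		if (img_size[i] >= 720 * 576)
			*hd += weight;
		else
			*sd += weight;
	}
}

/* Map the counts to a UVD state; any other mix falls back to generic. */
static enum uvd_state pick_uvd_state(unsigned sd, unsigned hd, bool dynamic)
{
	if (dynamic) {
		if (sd == 1 && hd == 0)
			return UVD_STATE_SD;
		if ((sd == 2 && hd == 0) || (sd == 0 && hd == 1))
			return UVD_STATE_HD;
		if (sd == 0 && hd == 2)
			return UVD_STATE_HD2;
	}
	return UVD_STATE_GENERIC;
}
```

Note how the weighting interacts with the mapping: a single SD stream estimated at 24 fps is counted as `sd == 2` and therefore lands in the HD state.<br>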
<br>
<br>
                     Huh? I thought we had already re-enabled the dynamic UVD state selection,<br>
                     but double-checking this I found it disabled again.<br>
<br>
                     What was the problem with that? Looks like I somehow missed the<br>
                     discussion around it.<br>
<br>
                 We did, but after doing so a number of people complained on IRC about a<br>
                 regression: when apps like XBMC enabled post processing, performance<br>
                 went down.<br>
<br>
<br>
             That's strange; in my experience the different UVD performance states<br>
             only affect UVD's dclk/vclk, not sclk/mclk. I need to get the DPM dumps<br>
             to confirm this.<br>
<br>
         The sclks and mclks are usually different as well, especially on APUs.<br>
         I can send you some examples.<br>
<br>
             Do you remember offhand who complained on IRC? Finding something in<br>
             the IRC logs is like searching for a needle in a haystack.<br>
<br>
         I don't remember offhand. I think zgreg was involved in some of the<br>
         discussions.<br>
<br>
         Alex<br>
<br>
             Thanks,<br>
             Christian.<br>
<br>
<br>
                 Alex<br>
<br>
<br>
                     Christian.<br>
<br>
<br>
                         For the post processing, we probably need a hint we can pass to the<br>
                         driver in the CS ioctl to denote what state we need. Although if we<br>
                         did that, this code would largely be moot. That said, newer ASICs<br>
                         support dynamic UVD clocks, so we really only need something like<br>
                         that for older ASICs and, I guess, VCE.<br>
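A sketch of how such a hint could be carried, purely for illustration: the bit layout, mask and enum below are hypothetical and are not part of the radeon CS ioctl. The user mode driver would encode a coarse post-processing requirement into two bits of a flags word, and the kernel would decode it, treating the reserved value as the heaviest known hint. As Christian points out in this thread, the hard part is that a VDPAU decoder usually cannot know which value to send.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 8-9 of a CS flags word carry the hint. */
#define CS_PP_HINT_SHIFT 8
#define CS_PP_HINT_MASK  (0x3u << CS_PP_HINT_SHIFT)

/* Hypothetical coarse post-processing requirement sent by userspace. */
enum pp_hint { PP_HINT_NONE, PP_HINT_LIGHT, PP_HINT_HEAVY, PP_HINT_COUNT };

/* Userspace side: store the hint without touching the other flag bits. */
static uint32_t cs_encode_pp_hint(uint32_t flags, enum pp_hint hint)
{
	assert(hint < PP_HINT_COUNT);
	return (flags & ~CS_PP_HINT_MASK) | ((uint32_t)hint << CS_PP_HINT_SHIFT);
}

/* Kernel side: extract the hint; the reserved value 3 is treated as the
 * heaviest known requirement rather than rejected. */
static enum pp_hint cs_decode_pp_hint(uint32_t flags)
{
	uint32_t v = (flags & CS_PP_HINT_MASK) >> CS_PP_HINT_SHIFT;

	return v >= PP_HINT_COUNT ? PP_HINT_HEAVY : (enum pp_hint)v;
}
```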
<br>
                         Alex<br>
<br>
                             Christian.<br>
<br>
<br>
                                 Alex<br>
<br>
                                     Signed-off-by: Marco A Benatto <<a href="mailto:marco.antonio.780@gmail.com" target="_blank">marco.antonio.780@gmail.com</a>><br>
                                     Signed-off-by: Christian König <<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>><br>
                                     ---<br>
<br>
                                      drivers/gpu/drm/radeon/radeon.h     | 10 ++++++<br>
                                      drivers/gpu/drm/radeon/radeon_uvd.c | 64 +++++++++++++++++++++++++++++++++----<br>
                                      2 files changed, 68 insertions(+), 6 deletions(-)<br>
<br>
                                     diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h<br>
                                     index 9e1732e..e92f6cb 100644<br>
                                     --- a/drivers/gpu/drm/radeon/radeon.h<br>
                                     +++ b/drivers/gpu/drm/radeon/radeon.h<br>
                                     @@ -1617,6 +1617,15 @@ int radeon_pm_get_type_index(struct radeon_device *rdev,<br>
                                      #define RADEON_UVD_STACK_SIZE  (1024*1024)<br>
                                      #define RADEON_UVD_HEAP_SIZE   (1024*1024)<br>
<br>
                                     +#define RADEON_UVD_FPS_EVENTS_MAX 8<br>
                                     +#define RADEON_UVD_DEFAULT_FPS 60<br>
                                     +<br>
                                     +struct radeon_uvd_fps {<br>
                                     +       uint64_t        timestamp;<br>
                                     +       uint8_t         event_index;<br>
                                     +       uint8_t         events[RADEON_UVD_FPS_EVENTS_MAX];<br>
                                     +};<br>
                                     +<br>
                                      struct radeon_uvd {<br>
                                             struct radeon_bo        *vcpu_bo;<br>
                                             void                    *cpu_addr;<br>
                                     @@ -1626,6 +1635,7 @@ struct radeon_uvd {<br>
                                             struct drm_file         *filp[RADEON_MAX_UVD_HANDLES];<br>
                                             unsigned                img_size[RADEON_MAX_UVD_HANDLES];<br>
                                             struct delayed_work     idle_work;<br>
                                     +       struct radeon_uvd_fps   fps_info[RADEON_MAX_UVD_HANDLES];<br>
                                      };<br>
<br>
                                      int radeon_uvd_init(struct radeon_device *rdev);<br>
                                     diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c<br>
                                     index 6bf55ec..ef5667a 100644<br>
                                     --- a/drivers/gpu/drm/radeon/radeon_uvd.c<br>
                                     +++ b/drivers/gpu/drm/radeon/radeon_uvd.c<br>
                                     @@ -237,6 +237,51 @@ void radeon_uvd_force_into_uvd_segment(struct radeon_bo *rbo)<br>
                                             rbo->placement.lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;<br>
                                      }<br>
<br>
                                     +static void radeon_uvd_fps_clear_events(struct radeon_device *rdev, int idx)<br>
                                     +{<br>
                                     +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
                                     +       unsigned i;<br>
                                     +<br>
                                     +       fps->timestamp = jiffies_64;<br>
                                     +       fps->event_index = 0;<br>
                                     +       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++)<br>
                                     +               fps->events[i] = 0;<br>
                                     +}<br>
                                     +<br>
                                     +static void radeon_uvd_fps_note_event(struct radeon_device *rdev, int idx)<br>
                                     +{<br>
                                     +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
                                     +       uint64_t timestamp = jiffies_64;<br>
                                     +       unsigned rate = 0;<br>
                                     +<br>
                                     +       uint8_t index = fps->event_index++;<br>
                                     +       fps->event_index %= RADEON_UVD_FPS_EVENTS_MAX;<br>
                                     +<br>
                                     +       rate = div64_u64(HZ, max(timestamp - fps->timestamp, 1ULL));<br>
                                     +<br>
                                     +       fps->timestamp = timestamp;<br>
                                     +       fps->events[index] = min(rate, 120u);<br>
                                     +}<br>
                                     +<br>
                                     +static unsigned radeon_uvd_estimate_fps(struct radeon_device *rdev, int idx)<br>
                                     +{<br>
                                     +       struct radeon_uvd_fps *fps = &rdev->uvd.fps_info[idx];<br>
                                     +       unsigned i, valid = 0, count = 0;<br>
                                     +<br>
                                     +       for (i = 0; i < RADEON_UVD_FPS_EVENTS_MAX; i++) {<br>
                                     +               /* We should ignore zero values */<br>
                                     +               if (fps->events[i] != 0) {<br>
                                     +                       count += fps->events[i];<br>
                                     +                       valid++;<br>
                                     +               }<br>
                                     +       }<br>
                                     +<br>
                                     +       if (valid > 0)<br>
                                     +               return count / valid;<br>
                                     +       else<br>
                                     +               return RADEON_UVD_DEFAULT_FPS;<br>
                                     +}<br>
                                     +<br>
                                      void radeon_uvd_free_handles(struct radeon_device *rdev, struct drm_file *filp)<br>
                                      {<br>
                                             int i, r;<br>
                                     @@ -419,8 +464,10 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,<br>
<br>
                                             /* create or decode, validate the handle */<br>
                                             for (i = 0; i < RADEON_MAX_UVD_HANDLES; ++i) {<br>
                                     -               if (atomic_read(&p->rdev->uvd.handles[i]) == handle)<br>
                                     +               if (atomic_read(&p->rdev->uvd.handles[i]) == handle) {<br>
                                     +                       radeon_uvd_fps_note_event(p->rdev, i);<br>
                                                             return 0;<br>
                                     +               }<br>
                                             }<br>
<br>
                                             /* handle not found try to alloc a new one */<br>
                                     @@ -428,6 +475,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,<br>
                                                     if (!atomic_cmpxchg(&p->rdev->uvd.handles[i], 0, handle)) {<br>
                                                             p->rdev->uvd.filp[i] = p->filp;<br>
                                                             p->rdev->uvd.img_size[i] = img_size;<br>
                                     +                       radeon_uvd_fps_clear_events(p->rdev, i);<br>
                                                             return 0;<br>
                                                     }<br>
                                             }<br>
                                     @@ -763,7 +811,7 @@ int radeon_uvd_get_destroy_msg(struct radeon_device *rdev, int ring,<br>
                                      static void radeon_uvd_count_handles(struct radeon_device *rdev,<br>
                                                                          unsigned *sd, unsigned *hd)<br>
                                      {<br>
                                     -       unsigned i;<br>
                                     +       unsigned i, fps_rate = 0;<br>
<br>
                                             *sd = 0;<br>
                                             *hd = 0;<br>
                                     @@ -772,10 +820,13 @@ static void radeon_uvd_count_handles(struct radeon_device *rdev,<br>
                                                     if (!atomic_read(&rdev->uvd.handles[i]))<br>
                                                             continue;<br>
<br>
                                     -               if (rdev->uvd.img_size[i] >= 720*576)<br>
                                     -                       ++(*hd);<br>
                                     -               else<br>
                                     -                       ++(*sd);<br>
                                     +               fps_rate = radeon_uvd_estimate_fps(rdev, i);<br>
                                     +<br>
                                     +               if (rdev->uvd.img_size[i] >= 720*576) {<br>
                                     +                       (*hd) += fps_rate > 30 ? 1 : 2;<br>
                                     +               } else {<br>
                                     +                       (*sd) += fps_rate > 30 ? 1 : 2;<br>
                                     +               }<br>
                                             }<br>
                                      }<br>
<br>
                                     @@ -805,6 +856,7 @@ void radeon_uvd_note_usage(struct radeon_device *rdev)<br>
                                             set_clocks &= schedule_delayed_work(&rdev->uvd.idle_work,<br>
                                                                                 msecs_to_jiffies(UVD_IDLE_TIMEOUT_MS));<br>
<br>
                                     +<br>
                                             if ((rdev->pm.pm_method == PM_METHOD_DPM) && rdev->pm.dpm_enabled) {<br>
                                                     unsigned hd = 0, sd = 0;<br>
                                                     radeon_uvd_count_handles(rdev, &sd, &hd);<br>
                                     --<br>
                                     1.9.1<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Marco Antonio Benatto<br>
Linux user ID:#506236<br>
</blockquote></blockquote>
<br>
</blockquote></blockquote>
<br>
</blockquote>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Marco Antonio Benatto<br>Linux user ID:<font> #506236</font>
</div>