[Libva] Quality of the scaled H264 decoded image

Thu Mar 28 12:43:50 PDT 2013

Is there a way to do scaling without displaying the video on the monitor? I
need to show a high-res version of the video on the display with vaPutSurface
and produce a lower-res version for further processing. vaPutSurface will
display the video at a lower resolution, but the surface resolution itself is
unmodified. Unfortunately, the GPU I am using does not have the
post-processing capability; otherwise I could copy the surface to another
post-processing surface, apply the filter, and then derive the image.
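
For reference, here is a minimal sketch of how the scaling mode discussed in
the quoted replies below is carried in the flags word. The constants are the
ones quoted from the header further down; the helper names
(set_scaling_mode, get_scaling_mode, scaling_mode_name) are hypothetical and
are not part of libva. Note that the scaling modes are values inside one bit
field (NL_ANAMORPHIC is 0x300, which overlaps the FAST and HQ bits), so a
mode has to be replaced under the mask rather than OR-ed on top of another.

```c
#include <assert.h>

/* Values as quoted from the libva header later in this thread; guarded so
 * the sketch also builds where <va/va.h> has already defined them. */
#ifndef VA_FILTER_SCALING_MASK
#define VA_FILTER_SCALING_DEFAULT       0x00000000
#define VA_FILTER_SCALING_FAST          0x00000100
#define VA_FILTER_SCALING_HQ            0x00000200
#define VA_FILTER_SCALING_NL_ANAMORPHIC 0x00000300
#define VA_FILTER_SCALING_MASK          0x00000f00
#endif

/* Hypothetical helper: replace the scaling field of a flags word and
 * leave every other bit untouched. */
static inline unsigned int set_scaling_mode(unsigned int flags,
                                            unsigned int mode)
{
    /* The mode must fit entirely inside the scaling field. */
    assert((mode & ~(unsigned int)VA_FILTER_SCALING_MASK) == 0);
    return (flags & ~(unsigned int)VA_FILTER_SCALING_MASK) | mode;
}

/* Hypothetical helper: extract only the scaling field. */
static inline unsigned int get_scaling_mode(unsigned int flags)
{
    return flags & (unsigned int)VA_FILTER_SCALING_MASK;
}

/* Hypothetical helper: readable name of the selected mode, for logging. */
static inline const char *scaling_mode_name(unsigned int flags)
{
    switch (get_scaling_mode(flags)) {
    case VA_FILTER_SCALING_DEFAULT:       return "default";
    case VA_FILTER_SCALING_FAST:          return "fast";
    case VA_FILTER_SCALING_HQ:            return "hq";
    case VA_FILTER_SCALING_NL_ANAMORPHIC: return "nl_anamorphic";
    default:                              return "unknown";
    }
}
```

The resulting word would then be passed as the final flags argument of
vaPutSurface(), or stored in filter_flags of VAProcPipelineParameterBuffer.
Whether a given mode has any effect depends on the driver.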

On Mon, Feb 25, 2013 at 6:35 PM, Ratin <ratin3 at gmail.com> wrote:

> Just to let you know, I tried the VPP code for de-noise. I set the
> de-noise level all the way to max, but it doesn't seem to make any
> noticeable difference in quality. I also noticed the fast path for setting
> the scaling type in VAProcPipelineParameterBuffer, so counting the
> vaPutSurface flag and an additional filter, there seem to be three
> different ways to do this. Anyway, with VA_FILTER_SCALING_NL_ANAMORPHIC and
> de-noise, the GPU render usage increases to 31% (first measurement below).
> With VA_FILTER_SCALING_NL_ANAMORPHIC as part of the vaPutSurface flags and
> no de-noise filtering, it drops by 10 points to 21%. With
> VA_FILTER_SCALING_FAST as part of the vaPutSurface flags, it drops another
> 4 points to 17%. The stream I am decoding and rendering is 1280x720 H.264
> at 3 Mbps.
>
>
>
> clock: unknown  sampler clock: unknown
>                    render busy:  31%: ██████▎                                render space: 37/131072
>                 bitstream busy:   4%: ▉                                   bitstream space: 1/131072
>                   blitter busy:  24%: ████▉                                 blitter space: 10/131072
>
>                           task  percent busy
>                            GAM:  33%: ██████▋                 vert fetch: 0 (0/sec)
>                            TSG:  19%: ███▉                    prim fetch: 0 (0/sec)
>                            VFE:  19%: ███▉                 VS invocations: 104062 (0/sec)
>                             VF:  19%: ███▉                 GS invocations: 0 (0/sec)
>                           GAFS:  14%: ██▉                       GS prims: 0 (0/sec)
>                            TDG:   0%:                      CL invocations: 44604 (0/sec)
>                           GAFM:   0%:                           CL prims: 47270 (0/sec)
>                            SOL:   0%:                      PS invocations: 3011525360 (0/sec)
>                             GS:   0%:                      PS depth pass: 3010388146 (0/sec)
>
> render clock: unknown  sampler clock: unknown
>                    render busy:  21%: ████▎                                  render space: 20/131072
>                 bitstream busy:   4%: ▉                                   bitstream space: 1/131072
>                   blitter busy:  21%: ████▎                                 blitter space: 8/131072
>
>                           task  percent busy
>                            GAM:  23%: ████▋                   vert fetch: 0 (0/sec)
>                            TSG:  10%: ██                      prim fetch: 0 (0/sec)
>                            VFE:  10%: ██                   VS invocations: 104062 (0/sec)
>                             VF:  10%: ██                   GS invocations: 0 (0/sec)
>                           GAFS:   9%: █▉                        GS prims: 0 (0/sec)
>                            TDG:   0%:                      CL invocations: 44604 (0/sec)
>                           GAFM:   0%:                           CL prims: 47270 (0/sec)
>                             DS:   0%:                      PS invocations: 3011525360 (0/sec)
>                             GS:   0%:                      PS depth pass: 3010388146 (0/sec)
>
>
>
>
> render clock: unknown  sampler clock: unknown
>                    render busy:  17%: ███▎                                   render space: 10/131072
>                 bitstream busy:   4%: ▉                                   bitstream space: 1/131072
>                   blitter busy:  17%: ███▎                                  blitter space: 7/131072
>
>                           task  percent busy
>                            GAM:  17%: ███▌                    vert fetch: 0 (0/sec)
>                           GAFS:   4%: ▉                       prim fetch: 0 (0/sec)
>                             VS:   0%:                      VS invocations: 104062 (0/sec)
>                             VF:   0%:                      GS invocations: 0 (0/sec)
>                                                                 GS prims: 0 (0/sec)
>                                                            CL invocations: 44604 (0/sec)
>                                                                 CL prims: 47270 (0/sec)
>                                                            PS invocations: 3011525360 (0/sec)
>                                                            PS depth pass: 3010388146 (0/sec)
>
>
>
>
> On Fri, Feb 22, 2013 at 7:17 AM, Ratin <ratin3 at gmail.com> wrote:
>
>>
>>
>>
>> On Thu, Feb 21, 2013 at 5:26 PM, ykzhao <yakui.zhao at intel.com> wrote:
>>
>>> On Thu, 2013-02-21 at 06:30 -0700, Ratin wrote:
>>> > Awesome, I would like to see the result from HQ scaling sometime in the
>>> > future. I am just using vaPutSurface and don't want to go through the
>>> > proc pipeline if I don't have to. Is the performance penalty identical
>>> > either way? Is there a way I can measure how much GPU processing (%
>>> > and such) is being used?
>>>
>>> They are implemented in different ways, so it is difficult to compare the
>>> performance penalty. vaPutSurface is based on the 3D model, while the
>>> proc pipeline is based on the GPGPU model. (intel_gpu_top may help to show
>>> the GPU utilization; it can be downloaded from
>>> http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/.)
>>>
>>> Could you please check whether it meets your requirement if you use the
>>> proc VPP to do the upscaling conversion and then call vaPutSurface to
>>> display it?
>>>
>>> Thanks.
>>>     Yakui
>>>
>>
>> Hi Yakui, thanks for your reply. I just started looking into this. The
>> total number of filters available to me seems to be only two; I'm not sure
>> whether that's normal. I am using an HD4000. My lspci output shows the
>> following:
>>
>> d02788e046eb:/usr/local/bin# lspci
>> 00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM
>> Controller (rev 09)
>> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core
>> processor Graphics Controller (rev 09)
>> 00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset
>> Family USB xHCI Host Controller (rev 04)
>> 00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series
>> Chipset Family MEI Controller #1 (rev 04)
>> 00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset
>> Family USB Enhanced Host Controller #2 (rev 04)
>> 00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family
>> PCI Express Root Port 1 (rev c4)
>> 00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family
>> PCI Express Root Port 4 (rev c4)
>> 00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset
>> Family USB Enhanced Host Controller #1 (rev 04)
>> 00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller
>> (rev 04)
>> 00:1f.2 IDE interface: Intel Corporation 7 Series Chipset Family 4-port
>> SATA Controller [IDE mode] (rev 04)
>> 00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family
>> SMBus Controller (rev 04)
>> 00:1f.5 IDE interface: Intel Corporation 7 Series Chipset Family 2-port
>> SATA Controller [IDE mode] (rev 04)
>> 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>> RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 07)
>> 02:00.0 Network controller: Intel Corporation Centrino Wireless-N 2200
>> (rev c4)
>>
>> I will write more query code to find out what those two filters are and
>> will post the results here.
>>
>> Thanks
>>
>>  Ratin
>>
>>
>> On Tue, Feb 19, 2013 at 7:24 PM, Xiang, Haihao
>> > <haihao.xiang at intel.com> wrote:
>> >
>> >
>> >         > I am using Intel_driver from the staging branch, on a 3rd Gen
>> >         > Core HD4000. So other algorithms like bi-cubic are not supported?
>> >
>> >
>> >         You can select a scaling method other than the default via
>> >         the flags argument to vaPutSurface() or the filter_flags
>> >         field in VAProcPipelineParameterBuffer.
>> >
>> >         /* Scaling flags for vaPutSurface() */
>> >         #define VA_FILTER_SCALING_DEFAULT       0x00000000
>> >         #define VA_FILTER_SCALING_FAST          0x00000100
>> >         #define VA_FILTER_SCALING_HQ            0x00000200
>> >         #define VA_FILTER_SCALING_NL_ANAMORPHIC 0x00000300
>> >         #define VA_FILTER_SCALING_MASK          0x00000f00
>> >
>> >         In VAProcPipelineParameterBuffer:
>> >
>> >              * - Scaling: \c VA_FILTER_SCALING_DEFAULT, \c VA_FILTER_SCALING_FAST,
>> >              *   \c VA_FILTER_SCALING_HQ, \c VA_FILTER_SCALING_NL_ANAMORPHIC.
>> >              */
>> >             unsigned int        filter_flags;
>> >
>> >         For the Intel driver, currently only
>> >         VA_FILTER_SCALING_NL_ANAMORPHIC and
>> >         VA_FILTER_SCALING_DEFAULT/VA_FILTER_SCALING_FAST are
>> >         supported. We will add support for VA_FILTER_SCALING_HQ.
>> >
>> >         Thanks
>> >         Haihao
>> >
>> >         >
>> >         >
>> >         >
>> >         > On Mon, Feb 18, 2013 at 12:11 AM, Xiang, Haihao
>> >         > <haihao.xiang at intel.com> wrote:
>> >         >         On Fri, 2013-02-15 at 16:18 -0800, Ratin wrote:
>> >         >         > I am decoding a 720p video stream from a camera to 1080p
>> >         >         > surfaces and displaying them on the screen. I am seeing
>> >         >         > noticeable noise and pulsating which is (apparently)
>> >         >         > directly related to the I-frame interval. The lowest
>> >         >         > I-frame interval I can specify for the camera is 1 second,
>> >         >         > and selecting that in addition to a bitrate of 8192 kbps
>> >         >         > makes it slightly better, but there is still a lot of
>> >         >         > noise. A software decoded/scaled video looks all smooth.
>> >         >         >
>> >         >         >
>> >         >         > What I am wondering is what the default scaling algorithm
>> >         >         > used in the vaapi/intel driver is, how I can specify
>> >         >         > better scaling algorithms like bi-cubic, whether I can
>> >         >         > also specify the strength of the deblocking filter, and
>> >         >         > what I can do to reduce the pulsating.
>> >         >
>> >         >
>> >         >         Which driver are you using? For Intel, it is bilinear.
>> >         >
>> >         >         >
>> >         >         >
>> >         >         >
>> >         >         >
>> >         >         > Any input would be much appreciated.
>> >         >         >
>> >         >         >
>> >         >         > Thanks
>> >         >         >
>> >         >         >
>> >         >         > Ratin
>> >         >         >
>> >         >         >
>> >         >
>> >         >         > _______________________________________________
>> >         >         > Libva mailing list
>> >         >         > Libva at lists.freedesktop.org
>> >         >         >
>> >         http://lists.freedesktop.org/mailman/listinfo/libva
>> >         >
>> >         >
>> >         >
>> >         >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>