nvcodec: low resolutions transcode faster with host memory, high resolutions faster with GL memory
Sid Sethupathi
sid.sethupathi at gmail.com
Tue Apr 21 18:03:28 UTC 2020
Hello,
I noticed that when transcoding using nvh264dec and nvh264enc, lower
resolutions perform better when using system/host memory instead of GL
memory. Higher resolutions perform better when using GL memory instead of
system/host memory.
If you profile the pipelines using nvprof, the memory copy operations look
the way you'd expect: device-to-host memory copies are slower than
device-to-device copies. Since the memory copy performance is as expected,
what could be causing GL memory to be slower at the low resolution, and why
does it only affect lower resolutions?
This gist has results of my testing:
https://gist.github.com/sidsethupathi/b464a6dc30907768a074d8dc526b2b66.
I created 10-minute test sources, one at 320x240 and another at 3840x2160,
and ran them through a "filesrc ! nvh264dec ! nvh264enc ! fakesink"
pipeline, similar to Seungha's benchmarks in this MR:
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/539
The results are better formatted in the gist, but they are copied below:
320x240, host memory. Execution time = 0:00:11.252269640
gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse !
nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  49.35%   368.89ms  36000  10.246us  5.9200us  19.872us  [CUDA memcpy HtoD]
                 40.12%   299.85ms  36000  8.3290us  5.3120us  24.256us  [CUDA memcpy DtoH]
                 6.34%    47.353ms  36000  1.3150us  1.2150us  9.3440us  Convert_PL2BL
                 4.19%    31.291ms  18000  1.7380us  1.6640us  2.2400us  ConvertNV24toNV12
                 0.01%    77.632us  68     1.1410us  704ns     2.6240us  [CUDA memset]
320x240, GL memory. Execution time = 0:00:20.584277338
gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse !
nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  50.84%   79.086ms  72000  1.0980us  864ns     14.208us  [CUDA memcpy DtoD]
                 27.69%   43.070ms  36000  1.1960us  991ns     14.016us  Convert_PL2BL
                 21.41%   33.308ms  18000  1.8500us  1.5030us  2.8480us  ConvertNV24toNV12
                 0.05%    78.944us  68     1.1600us  672ns     2.6560us  [CUDA memset]
3840x2160, host memory. Execution time = 0:03:20.462018560
gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse !
nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  54.40%   46.3980s  36000  1.2888ms  738.27us  2.7441ms  [CUDA memcpy HtoD]
                 42.74%   36.4568s  36000  1.0127ms  599.52us  3.0454ms  [CUDA memcpy DtoH]
                 1.47%    1.25313s  18000  69.618us  67.584us  72.192us  ConvertNV24toNV12
                 1.39%    1.18157s  36000  32.821us  23.328us  45.856us  Convert_PL2BL
                 0.00%    81.504us  66     1.2340us  704ns     2.6560us  [CUDA memset]
3840x2160, GL memory. Execution time = 0:02:18.106101429
gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse !
nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  50.48%   2.63958s  72000  36.660us  22.976us  58.976us  [CUDA memcpy DtoD]
                 25.11%   1.31285s  36000  36.468us  23.744us  49.024us  Convert_PL2BL
                 24.41%   1.27668s  18000  70.926us  67.585us  71.872us  ConvertNV24toNV12
                 0.00%    81.536us  66     1.2350us  704ns     2.9120us  [CUDA memset]
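As a rough sanity check on the figures above, the short Python sketch below sums the GPU activity time reported by nvprof for each run and compares it with the wall-clock execution time. It is only arithmetic on the numbers already listed; the names RUNS, summarize and summary are made up for this illustration and are not part of the test setup.

```python
# Wall-clock execution time (seconds) and per-activity GPU times (seconds),
# transcribed from the execution times and nvprof output listed above.
RUNS = {
    "320x240 host": (11.252269640,
                     [368.89e-3, 299.85e-3, 47.353e-3, 31.291e-3, 77.632e-6]),
    "320x240 GL": (20.584277338,
                   [79.086e-3, 43.070e-3, 33.308e-3, 78.944e-6]),
    "3840x2160 host": (200.462018560,
                       [46.3980, 36.4568, 1.25313, 1.18157, 81.504e-6]),
    "3840x2160 GL": (138.106101429,
                     [2.63958, 1.31285, 1.27668, 81.536e-6]),
}


def summarize(runs):
    """Return {run name: (total GPU seconds, GPU share of wall-clock time)}."""
    return {name: (sum(gpu), sum(gpu) / wall)
            for name, (wall, gpu) in runs.items()}


summary = summarize(RUNS)
for name, (gpu_s, share) in summary.items():
    print(f"{name:16s} GPU {gpu_s:9.3f} s  ({share:6.2%} of wall clock)")
```

With these figures, the 320x240 GL run spends less total GPU time than the 320x240 host-memory run, and in both low-resolution runs the summed GPU activity is well under a second. The extra wall-clock time in the GL case is therefore presumably spent somewhere other than the GPU activities nvprof lists here.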
Sid
More information about the gstreamer-devel
mailing list