<div dir="ltr">Hello,<div><br></div><div>I noticed that when transcoding with nvh264dec and nvh264enc, low-resolution streams run faster with system (host) memory than with GL memory, while high-resolution streams run faster with GL memory than with system (host) memory.</div><div><br></div><div>Profiling the pipelines with nvprof shows the memory copy operations behaving as you'd expect: host-to-device and device-to-host copies are slower than device-to-device copies. Since the copies themselves perform as expected, what could be making the GL memory pipeline slower overall, and why does it only affect the lower resolution?</div><div><br></div><div>This gist has the results of my testing: <a href="https://gist.github.com/sidsethupathi/b464a6dc30907768a074d8dc526b2b66">https://gist.github.com/sidsethupathi/b464a6dc30907768a074d8dc526b2b66</a>. </div><div><br></div><div>I created two 10-minute test sources, one at 320x240 and one at 3840x2160, and ran each through a "filesrc ! nvh264dec ! nvh264enc ! fakesink" pipeline, similar to Seungha's benchmarks in this MR: <a href="https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/539">https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/539</a>.</div><div><br></div><div>The results are better formatted in the gist, but I have copied them below:</div><div><br></div><div>320x240, host memory. 
Execution time = 0:00:11.252269640</div><div><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;padding:0px;margin:0px;background:initial;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  49.35%   368.89ms  36000  10.246us  5.9200us  19.872us  [CUDA memcpy HtoD]
                 40.12%   299.85ms  36000  8.3290us  5.3120us  24.256us  [CUDA memcpy DtoH]
                 6.34%    47.353ms  36000  1.3150us  1.2150us  9.3440us  Convert_PL2BL
                 4.19%    31.291ms  18000  1.7380us  1.6640us  2.2400us  ConvertNV24toNV12
                 0.01%    77.632us  68     1.1410us  704ns     2.6240us  [CUDA memset]</code></pre></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">320x240, GL memory. Execution time = 0:00:20.584277338</div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;padding:0px;margin:0px;background:initial;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">gst-launch-1.0 filesrc location=low_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  50.84%   79.086ms  72000  1.0980us  864ns     14.208us  [CUDA memcpy DtoD]
                 27.69%   43.070ms  36000  1.1960us  991ns     14.016us  Convert_PL2BL
                 21.41%   33.308ms  18000  1.8500us  1.5030us  2.8480us  ConvertNV24toNV12
                 0.05%    78.944us  68     1.1600us  672ns     2.6560us  [CUDA memset]</code></pre></div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">3840x2160, host memory. Execution time = 0:03:20.462018560</div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;padding:0px;margin:0px;background:initial;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  54.40%   46.3980s  36000  1.2888ms  738.27us  2.7441ms  [CUDA memcpy HtoD]
                 42.74%   36.4568s  36000  1.0127ms  599.52us  3.0454ms  [CUDA memcpy DtoH]
                 1.47%    1.25313s  18000  69.618us  67.584us  72.192us  ConvertNV24toNV12
                 1.39%    1.18157s  36000  32.821us  23.328us  45.856us  Convert_PL2BL
                 0.00%    81.504us  66     1.2340us  704ns     2.6560us  [CUDA memset]</code></pre></div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">3840x2160, GL memory. Execution time = 0:02:18.106101429</div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;margin-top:0px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46);margin-bottom:0px"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,monospace;font-size:13.6px;padding:0px;margin:0px;background:initial;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">gst-launch-1.0 filesrc location=hi_res.ts ! tsdemux ! h264parse ! nvh264dec ! "video/x-raw(memory:GLMemory)" ! nvh264enc ! fakesink
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  50.48%   2.63958s  72000  36.660us  22.976us  58.976us  [CUDA memcpy DtoD]
                 25.11%   1.31285s  36000  36.468us  23.744us  49.024us  Convert_PL2BL
                 24.41%   1.27668s  18000  70.926us  67.585us  71.872us  ConvertNV24toNV12
                 0.00%    81.536us  66     1.2350us  704ns     2.9120us  [CUDA memset]</code></pre></div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><br></div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Sid<br><br></div></div></div></div>