Bottlenecked on gstglcontext thread

Michiel Konstapel michiel at aanmelder.nl
Wed Jun 30 14:33:41 UTC 2021


My pipeline isn't keeping up (I have leaky queues at the sources and 
these are overflowing) and I think I have narrowed it down to the 
gstglcontext thread hitting 100% CPU. The pipeline uses the GPU quite 
heavily:
- 3 nvh265dec decoders
- 5 nvh264enc encoders
- a bunch of glupload, gldownload and glcolorconvert elements
- about 17 glvideomixers (in 3 layers of compositing)

For some of the mixers, the pad properties are updated every frame 
(xpos, ypos, width, height, alpha).

Removing encoders and videomixers helps a bit in bringing down the CPU 
usage, but of course, they are in there for a reason :)

All this is running on an EC2 g4dn instance with a T4 GPU.
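
For reference, a per-thread CPU figure like the one mentioned above can be read with something along these lines. This is only a sketch: thread_cpu is a made-up helper name, and it assumes a Linux box with procps ps.

```shell
#!/usr/bin/env bash
# Sketch: list the threads of one process, busiest first.
# thread_cpu is a hypothetical helper name; it assumes procps "ps" on Linux.
thread_cpu() {
    local pid=$1
    # -L prints one row per thread; a trailing "=" suppresses the column header.
    ps -L -o pcpu=,tid=,comm= -p "$pid" | sort -nr
}
```

Called as thread_cpu "$(pgrep -n gst-launch-1.0)", the gstglcontext thread should show up by name in the last column. Note that ps averages CPU over the lifetime of the thread; top -H -p PID gives a live view instead.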

Q1: Is this bottleneck to be expected? I've run a separate test of just 
feeding 16x gltestsrc into 16x glvideomixer into 1x glvideomixer, and 
that sits at ~14% CPU:

gst-launch-1.0 glvideomixer name=m latency=80000000 \
     sink_0::xpos=0    sink_0::ypos=0 sink_0::width=480 sink_0::height=270 \
     sink_1::xpos=480  sink_1::ypos=0 sink_1::width=480 sink_1::height=270 \
     sink_2::xpos=960  sink_2::ypos=0 sink_2::width=480 sink_2::height=270 \
     sink_3::xpos=1440 sink_3::ypos=0 sink_3::width=480 sink_3::height=270 \
     sink_4::xpos=0    sink_4::ypos=270 sink_4::width=480 sink_4::height=270 \
     sink_5::xpos=480  sink_5::ypos=270 sink_5::width=480 sink_5::height=270 \
     sink_6::xpos=960  sink_6::ypos=270 sink_6::width=480 sink_6::height=270 \
     sink_7::xpos=1440 sink_7::ypos=270 sink_7::width=480 sink_7::height=270 \
     sink_8::xpos=0    sink_8::ypos=540 sink_8::width=480 sink_8::height=270 \
     sink_9::xpos=480  sink_9::ypos=540 sink_9::width=480 sink_9::height=270 \
     sink_10::xpos=960  sink_10::ypos=540 sink_10::width=480 sink_10::height=270 \
     sink_11::xpos=1440 sink_11::ypos=540 sink_11::width=480 sink_11::height=270 \
     sink_12::xpos=0    sink_12::ypos=810 sink_12::width=480 sink_12::height=270 \
     sink_13::xpos=480  sink_13::ypos=810 sink_13::width=480 sink_13::height=270 \
     sink_14::xpos=960  sink_14::ypos=810 sink_14::width=480 sink_14::height=270 \
     sink_15::xpos=1440 sink_15::ypos=810 sink_15::width=480 sink_15::height=270 \
     ! fpsdisplaysink \
     gltestsrc pattern=0 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=1 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=2 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=3 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=4 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=5 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=6 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=7 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=8 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=9 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=10 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=11 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=12 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
     gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m.
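
Since the command above is very repetitive, the same argument list can also be generated with a loop, which makes it easier to vary the number of tiles. A rough sketch, assuming bash; build_mixer_args is a made-up helper name, and the tile size, caps and pattern reuse simply mirror the hand-written command:

```shell
#!/usr/bin/env bash
# Sketch: emit the gst-launch-1.0 arguments for the test above, one per line.
# build_mixer_args is a hypothetical helper, not part of GStreamer.
build_mixer_args() {
    local cols=$1 rows=$2 w=$3 h=$4
    local n=$((cols * rows)) i p
    local caps='video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA'

    printf '%s\n' glvideomixer name=m latency=80000000
    # Pad properties: tile i goes to column i % cols, row i / cols.
    for ((i = 0; i < n; i++)); do
        printf '%s\n' \
            "sink_$i::xpos=$(( (i % cols) * w ))" \
            "sink_$i::ypos=$(( (i / cols) * h ))" \
            "sink_$i::width=$w" \
            "sink_$i::height=$h"
    done
    printf '%s\n' '!' fpsdisplaysink
    # One source branch per tile; pattern numbers above 13 are clamped to 13,
    # mirroring the hand-written command.
    for ((i = 0; i < n; i++)); do
        p=$(( i < 13 ? i : 13 ))
        printf '%s\n' gltestsrc "pattern=$p" is-live=true '!' "$caps" \
            '!' queue '!' glvideomixer '!' queue '!' m.
    done
}
```

To run it, read the lines into an array with mapfile -t args < <(build_mixer_args 4 4 480 270) and then call gst-launch-1.0 "${args[@]}". Emitting one argument per line keeps the caps string, which contains spaces, together as a single argument.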

However, if I replace the 16 gltestsrc sources with imagefreeze branches, 
approximating what the real pipeline is doing, it starts chugging as 
well (lots of "gst_base_sink_is_too_late" warnings) and the gstglcontext 
thread hits 100% CPU:

gst-launch-1.0 glvideomixer name=m latency=80000000 \
     sink_0::xpos=0    sink_0::ypos=0 sink_0::width=480 sink_0::height=270 \
     sink_1::xpos=480  sink_1::ypos=0 sink_1::width=480 sink_1::height=270 \
     sink_2::xpos=960  sink_2::ypos=0 sink_2::width=480 sink_2::height=270 \
     sink_3::xpos=1440 sink_3::ypos=0 sink_3::width=480 sink_3::height=270 \
     sink_4::xpos=0    sink_4::ypos=270 sink_4::width=480 sink_4::height=270 \
     sink_5::xpos=480  sink_5::ypos=270 sink_5::width=480 sink_5::height=270 \
     sink_6::xpos=960  sink_6::ypos=270 sink_6::width=480 sink_6::height=270 \
     sink_7::xpos=1440 sink_7::ypos=270 sink_7::width=480 sink_7::height=270 \
     sink_8::xpos=0    sink_8::ypos=540 sink_8::width=480 sink_8::height=270 \
     sink_9::xpos=480  sink_9::ypos=540 sink_9::width=480 sink_9::height=270 \
     sink_10::xpos=960  sink_10::ypos=540 sink_10::width=480 sink_10::height=270 \
     sink_11::xpos=1440 sink_11::ypos=540 sink_11::width=480 sink_11::height=270 \
     sink_12::xpos=0    sink_12::ypos=810 sink_12::width=480 sink_12::height=270 \
     sink_13::xpos=480  sink_13::ypos=810 sink_13::width=480 sink_13::height=270 \
     sink_14::xpos=960  sink_14::ypos=810 sink_14::width=480 sink_14::height=270 \
     sink_15::xpos=1440 sink_15::ypos=810 sink_15::width=480 sink_15::height=270 \
     ! fpsdisplaysink \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
     filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m.

Q2: I've tried a larger instance type with 4 GPUs, and moving the 
encoders to another GPU (using nvh264device1enc), but there's still only 
one gstglcontext thread. Is there an easy way (or a hard way, if 
necessary) to offload some of the load to a second GPU/gstglcontext thread?

Q3: Other than trying to remove stuff from the pipeline, what can I do 
to improve performance?

Cheers,
Michiel



More information about the gstreamer-devel mailing list