Bottlenecked on gstglcontext thread
Michiel Konstapel
michiel at aanmelder.nl
Wed Jun 30 14:33:41 UTC 2021
My pipeline isn't keeping up (I have leaky queues at the sources and
these are overflowing) and I think I have narrowed it down to the
gstglcontext thread hitting 100% CPU. The pipeline uses the GPU quite
heavily:
- 3 nvh265dec decoders
- 5 nvh264enc encoders
- a bunch of glupload, gldownload and glcolorconvert elements
- about 17 glvideomixers (in 3 layers of compositing)
For some of the mixers, the pad properties are updated every frame
(xpos, ypos, width, height, alpha).
Removing encoders and videomixers helps a bit in bringing down the CPU
usage, but of course, they are in there for a reason :)
All this is running on an EC2 g4dn instance with a T4 GPU.
Q1: Is this bottleneck to be expected? I've run a separate test of just
feeding 16x gltestsrc into 16x glvideomixer into 1x glvideomixer, and
that sits at ~14% CPU:
gst-launch-1.0 glvideomixer name=m latency=80000000 \
sink_0::xpos=0 sink_0::ypos=0 sink_0::width=480 sink_0::height=270 \
sink_1::xpos=480 sink_1::ypos=0 sink_1::width=480 sink_1::height=270 \
sink_2::xpos=960 sink_2::ypos=0 sink_2::width=480 sink_2::height=270 \
sink_3::xpos=1440 sink_3::ypos=0 sink_3::width=480 sink_3::height=270 \
sink_4::xpos=0 sink_4::ypos=270 sink_4::width=480 sink_4::height=270 \
sink_5::xpos=480 sink_5::ypos=270 sink_5::width=480 sink_5::height=270 \
sink_6::xpos=960 sink_6::ypos=270 sink_6::width=480 sink_6::height=270 \
sink_7::xpos=1440 sink_7::ypos=270 sink_7::width=480 sink_7::height=270 \
sink_8::xpos=0 sink_8::ypos=540 sink_8::width=480 sink_8::height=270 \
sink_9::xpos=480 sink_9::ypos=540 sink_9::width=480 sink_9::height=270 \
sink_10::xpos=960 sink_10::ypos=540 sink_10::width=480 sink_10::height=270 \
sink_11::xpos=1440 sink_11::ypos=540 sink_11::width=480 sink_11::height=270 \
sink_12::xpos=0 sink_12::ypos=810 sink_12::width=480 sink_12::height=270 \
sink_13::xpos=480 sink_13::ypos=810 sink_13::width=480 sink_13::height=270 \
sink_14::xpos=960 sink_14::ypos=810 sink_14::width=480 sink_14::height=270 \
sink_15::xpos=1440 sink_15::ypos=810 sink_15::width=480 sink_15::height=270 \
! fpsdisplaysink \
gltestsrc pattern=0 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=1 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=2 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=3 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=4 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=5 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=6 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=7 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=8 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=9 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=10 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=11 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=12 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m. \
gltestsrc pattern=13 is-live=true ! "video/x-raw(memory:GLMemory), width=3840, height=2160, format=RGBA" ! queue ! glvideomixer ! queue ! m.
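For anyone reproducing this, the 16 pad property groups above follow a plain 4x4 grid of 480x270 tiles, so they can be generated instead of typed out. A minimal sketch, assuming a POSIX shell; the function name grid_pad_props is made up for illustration and is not part of GStreamer:

```shell
#!/bin/sh
# Print glvideomixer pad properties for a COLS x ROWS grid of tiles.
# Usage: grid_pad_props COLS ROWS TILE_W TILE_H
# (hypothetical helper, only builds the sink_N::... arguments)
grid_pad_props() {
    cols=$1 rows=$2 w=$3 h=$4
    i=0
    while [ "$i" -lt $((cols * rows)) ]; do
        x=$(( (i % cols) * w ))   # column index times tile width
        y=$(( (i / cols) * h ))   # row index times tile height
        printf 'sink_%d::xpos=%d sink_%d::ypos=%d sink_%d::width=%d sink_%d::height=%d ' \
            "$i" "$x" "$i" "$y" "$i" "$w" "$i" "$h"
        i=$((i + 1))
    done
}
```

Used unquoted, as in `gst-launch-1.0 glvideomixer name=m latency=80000000 $(grid_pad_props 4 4 480 270) ! fpsdisplaysink` followed by the source branches, the output is split into one argument per property, which should be equivalent to the hand-written list.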
However, if I replace the 16 testsrcs with imagefreeze branches,
approximating what the real pipeline is doing, it starts chugging as
well (lots of "gst_base_sink_is_too_late") and the thread hits 100% CPU:
gst-launch-1.0 glvideomixer name=m latency=80000000 \
sink_0::xpos=0 sink_0::ypos=0 sink_0::width=480 sink_0::height=270 \
sink_1::xpos=480 sink_1::ypos=0 sink_1::width=480 sink_1::height=270 \
sink_2::xpos=960 sink_2::ypos=0 sink_2::width=480 sink_2::height=270 \
sink_3::xpos=1440 sink_3::ypos=0 sink_3::width=480 sink_3::height=270 \
sink_4::xpos=0 sink_4::ypos=270 sink_4::width=480 sink_4::height=270 \
sink_5::xpos=480 sink_5::ypos=270 sink_5::width=480 sink_5::height=270 \
sink_6::xpos=960 sink_6::ypos=270 sink_6::width=480 sink_6::height=270 \
sink_7::xpos=1440 sink_7::ypos=270 sink_7::width=480 sink_7::height=270 \
sink_8::xpos=0 sink_8::ypos=540 sink_8::width=480 sink_8::height=270 \
sink_9::xpos=480 sink_9::ypos=540 sink_9::width=480 sink_9::height=270 \
sink_10::xpos=960 sink_10::ypos=540 sink_10::width=480 sink_10::height=270 \
sink_11::xpos=1440 sink_11::ypos=540 sink_11::width=480 sink_11::height=270 \
sink_12::xpos=0 sink_12::ypos=810 sink_12::width=480 sink_12::height=270 \
sink_13::xpos=480 sink_13::ypos=810 sink_13::width=480 sink_13::height=270 \
sink_14::xpos=960 sink_14::ypos=810 sink_14::width=480 sink_14::height=270 \
sink_15::xpos=1440 sink_15::ypos=810 sink_15::width=480 sink_15::height=270 \
! fpsdisplaysink \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m. \
filesrc location="../images/blank.png" ! pngdec ! imagefreeze is-live=true ! glupload ! glcolorconvert ! glcolorscale ! "video/x-raw(memory:GLMemory), format=RGBA, framerate=25/1" ! queue ! glvideomixer ! queue ! m.
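Since all 16 imagefreeze branches are identical, the repeated part of this test can also be produced by a small loop, which makes it easier to try other branch counts while watching the gstglcontext thread. A rough sketch, assuming a POSIX shell and an image path without whitespace; image_branches is a made-up helper name, and the caps are written without spaces so each stays a single argument:

```shell
#!/bin/sh
# Print COUNT identical imagefreeze branches that all link to mixer "m".
# Usage: image_branches COUNT IMAGE_PATH
# (hypothetical helper; IMAGE_PATH must not contain whitespace)
image_branches() {
    count=$1 image=$2
    caps='video/x-raw(memory:GLMemory),format=RGBA,framerate=25/1'
    i=0
    while [ "$i" -lt "$count" ]; do
        # printf repeats the '%s ' format once per argument,
        # so every pipeline token is emitted separated by spaces.
        printf '%s ' \
            filesrc "location=$image" ! pngdec ! imagefreeze is-live=true \
            ! glupload ! glcolorconvert ! glcolorscale ! "$caps" \
            ! queue ! glvideomixer ! queue ! m.
        i=$((i + 1))
    done
}
```

Appending an unquoted `$(image_branches 16 ../images/blank.png)` after `! fpsdisplaysink` in the command above should give the same pipeline as the spelled-out version.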
Q2: I've tried a larger instance type with 4 GPUs, and moving the
encoders to another GPU (using nvh264device1enc), but there's still only
one gstglcontext thread. Is there an easy way (or a hard way, if
necessary) to offload some of the load to a second GPU/gstglcontext thread?
Q3: Other than trying to remove stuff from the pipeline, what can I do
to improve performance?
Cheers,
Michiel