High CPU utilization with raw-video (UYVY) network streaming pipeline
Tim Müller
tim at centricular.com
Wed Dec 3 05:43:38 PST 2014
On Wed, 2014-12-03 at 04:48 -0800, Amit Pandya wrote:
> Hi,
> I am trying to stream raw video (UYVY) over the network from an NVIDIA
> Jetson platform.
>
> => When using only the NVIDIA hardware-accelerated plugins, pipeline a)
> below has a very low CPU load with fakesink.
>
> a) gst-launch-1.0 filesrc location=stream264_1280x960_25fps.mkv !
> matroskademux ! queue name=txq1 ! h264parse ! omxh264dec ! nvvidconv !
> 'video/x-raw, format=(string)UYVY' ! fakesink silent=1 sync=1 -e
>
> (top: ~15-20%)
> oprofile:
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples % app name symbol name
> 43217 95.7102 no-vmlinux /no-vmlinux
> 267 0.5913 libgstreamer-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstreamer-1.0.so.0.204.0
> 166 0.3676 libglib-2.0.so.0.4002.0 /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 94 0.2082 libgobject-2.0.so.0.4002.0 /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
> 74 0.1639 libc-2.19.so __memcpy_neon
> 66 0.1462 libgstbase-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
>
> => When the packetizer element "rtpvrawpay" is added, as in pipeline b)
> (still with fakesink), the CPU load increases drastically.
>
> b) gst-launch-1.0 filesrc location=stream264_1280x960_25fps.mkv !
> matroskademux ! queue name=txq1 ! h264parse ! omxh264dec ! nvvidconv
> name=txnv ! 'video/x-raw, format=(string)UYVY' ! rtpvrawpay mtu=1472 !
> fakesink silent=1 sync=1 -e
>
> (top: ~90%)
> oprofile:
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples % app name symbol name
> 19391 39.8385 no-vmlinux /no-vmlinux
> *11885 24.4176 libgstreamer-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstreamer-1.0.so.0.204.0*
> 3838 7.8851 libglib-2.0.so.0.4002.0 /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 2882 5.9210 libgstbase-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
> 1710 3.5132 libc-2.19.so __memcpy_neon
> 1504 3.0899 libgstrtp-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstrtp-1.0.so.0.204.0
> 1438 2.9543 libgobject-2.0.so.0.4002.0 /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
>
> The GStreamer core library "libgstreamer-1.0.so.0.204.0" shows a big spike:
> its share of profile samples jumps from about 0.6% in a) to about 24% in b).
>
> I tried to analyze the symbols of the GStreamer core library
> "libgstreamer-1.0.so.0.204.0"; the details for pipeline b) are below.
>
> oprofile:
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples % app name symbol name
> 17278 31.8494 no-vmlinux /no-vmlinux
> 5274 9.7218 libc-2.19.so /lib/arm-linux-gnueabihf/libc-2.19.so
> 4416 8.1402 libglib-2.0.so.0.4002.0 /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 3088 5.6923 libgstbase-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
> 2030 3.7420 libgobject-2.0.so.0.4002.0 /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
> 2001 3.6885 libgstrtp-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstrtp-1.0.so.0.204.0
> 1694 3.1226 libgstreamer-1.0.so.0.204.0 gst_mini_object_unref
> 1687 3.1097 libpthread-2.19.so pthread_mutex_lock
> 1663 3.0655 libgstreamer-1.0.so.0.204.0 gst_mini_object_unlock
> 1546 2.8498 libgstreamer-1.0.so.0.204.0 gst_mini_object_lock
> 1291 2.3798 libpthread-2.19.so __pthread_mutex_unlock_usercnt
> 979 1.8046 libgstrtp.so /usr/lib/arm-linux-gnueabihf/gstreamer-1.0/libgstrtp.so
> 712 1.3125 libgstreamer-1.0.so.0.204.0 gst_mini_object_is_writable
> 674 1.2424 libgstreamer-1.0.so.0.204.0 gst_mini_object_ref
> 578 1.0655 libgstreamer-1.0.so.0.204.0 gst_pad_push_data
> 456 0.8406 libgstreamer-1.0.so.0.204.0 .udivsi3_skip_div0_test
> 439 0.8092 libgstreamer-1.0.so.0.204.0 gst_buffer_map_range
> 418 0.7705 libgstreamer-1.0.so.0.204.0 gst_buffer_get_sizes_range
> 324 0.5972 libpthread-2.19.so pthread_getspecific
> 309 0.5696 libgstreamer-1.0.so.0.204.0 gst_segment_clip
> 304 0.5604 libgstreamer-1.0.so.0.204.0 __udivdi3
> 266 0.4903 libgstreamer-1.0.so.0.204.0 gst_system_clock_id_wait_jitter_unlocked
> 261 0.4811 libgstreamer-1.0.so.0.204.0 gst_segment_to_running_time
> 240 0.4424 libgstreamer-1.0.so.0.204.0 gst_clock_get_time
> 224 0.4129 libgstreamer-1.0.so.0.204.0 gst_memory_get_type
> 212 0.3908 libgstreamer-1.0.so.0.204.0 gst_memory_get_sizes
> 209 0.3853 libgstreamer-1.0.so.0.204.0 gst_clock_get_type
> 195 0.3595 libgstreamer-1.0.so.0.204.0 gst_allocator_get_type
> 190 0.3502 libgstreamer-1.0.so.0.204.0 gst_segment_to_stream_time
> 184 0.3392 libgstreamer-1.0.so.0.204.0 gst_buffer_find_memory
> 181 0.3336 libgstreamer-1.0.so.0.204.0 __aeabi_uidivmod
> 180 0.3318 libgstreamer-1.0.so.0.204.0 gst_memory_map
> 178 0.3281 libgstreamer-1.0.so.0.204.0 gst_buffer_insert_memory
> 178 0.3281 libgstreamer-1.0.so.0.204.0 gst_clock_id_wait
> 177 0.3263 libgstreamer-1.0.so.0.204.0 gst_buffer_unmap
> 172 0.3171 libgstreamer-1.0.so.0.204.0 gst_object_ref
> 166 0.3060 libgstreamer-1.0.so.0.204.0 gst_pad_get_type
> 163 0.3005 libgstreamer-1.0.so.0.204.0 _get_merged_memory
> 162 0.2986 libgstreamer-1.0.so.0.204.0 _gst_util_uint64_scale
> 155 0.2857 libgstreamer-1.0.so.0.204.0 gst_mini_object_replace
> 141 0.2599 libgstreamer-1.0.so.0.204.0 gst_buffer_resize_range
> 126 0.2323 libgstreamer-1.0.so.0.204.0 __gnu_uldivmod_helper
> 121 0.2230 libgstreamer-1.0.so.0.204.0 _gst_buffer_free
> 120 0.2212 libgstreamer-1.0.so.0.204.0 _sysmem_new_block
> 118 0.2175 libgstcoreelements.so gst_fake_sink_render
> 114 0.2101 libgstreamer-1.0.so.0.204.0 gst_allocator_alloc
> 109 0.2009 libgstreamer-1.0.so.0.204.0 gst_memory_init
> 101 0.1862 libgstreamer-1.0.so.0.204.0 gst_mini_object_init
> 101 0.1862 libgstreamer-1.0.so.0.204.0 gst_util_uint64_scale_int
> 100 0.1843 libgstreamer-1.0.so.0.204.0 gst_allocator_free
> ....
>
> From the list above, the following GStreamer core functions contribute the
> most load:
> 1694 3.1226 libgstreamer-1.0.so.0.204.0 gst_mini_object_unref
> 1663 3.0655 libgstreamer-1.0.so.0.204.0 gst_mini_object_unlock
> 1546 2.8498 libgstreamer-1.0.so.0.204.0 gst_mini_object_lock
> 712 1.3125 libgstreamer-1.0.so.0.204.0 gst_mini_object_is_writable
> 674 1.2424 libgstreamer-1.0.so.0.204.0 gst_mini_object_ref
>
> Can anyone help me understand what could make the GStreamer core consume
> so much CPU just from adding the packetizer element "rtpvrawpay" (with
> fakesink)?
It's more than just a 'packetizer' element. Depending on the input
resolution it might have to create thousands of packets / memories /
buffers for every input frame, and do a lot of memcpying around. There
will be loads of mini objects created/destroyed. I have done some
optimisations in git a while back (across modules), but there are still
some things we could do better that we haven't gotten around to yet. In
any case, don't expect rtpvrawpay to be cheap. You might find the
patches in
https://bugzilla.gnome.org/show_bug.cgi?id=732152
useful as well in this case, if you haven't seen them yet.
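To put "thousands of packets per frame" into rough numbers for pipeline b), here is a back-of-the-envelope sketch in Python. It assumes UYVY at 2 bytes per pixel (4-byte pixel groups covering 2 pixels), a 12-byte RTP header with no CSRCs or extensions, the RFC 4175 2-byte extended sequence number plus one 6-byte line header per packet, and that mtu=1472 caps the whole RTP packet (1472 is a 1500-byte Ethernet MTU minus 20 bytes of IPv4 and 8 bytes of UDP header). Packets that span the end of one line and the start of the next carry an extra line header, so the real count is likely a bit higher. The helper names are hypothetical and only for illustration; this is not how rtpvrawpay itself is implemented.

```python
# Rough packet-count estimate for RFC 4175 raw video (UYVY) over RTP.
# Assumption: one line-segment header per packet (real packets spanning
# two lines carry an extra 6-byte header, so this slightly undercounts).

RTP_HEADER = 12    # fixed RTP header, no CSRCs or header extensions
EXT_SEQNUM = 2     # RFC 4175 extended sequence number
LINE_HEADER = 6    # RFC 4175 per-segment header: length, line no., offset
PGROUP_BYTES = 4   # UYVY pixel group: 2 pixels packed into 4 bytes


def frame_bytes(width: int, height: int) -> int:
    """Size of one UYVY frame in bytes (2 bytes per pixel)."""
    return width * height * 2


def payload_per_packet(mtu: int) -> int:
    """Video bytes that fit in one packet of at most `mtu` bytes.

    Rounded down to whole pixel groups, since a pgroup is never split
    across packets.
    """
    room = mtu - RTP_HEADER - EXT_SEQNUM - LINE_HEADER
    return (room // PGROUP_BYTES) * PGROUP_BYTES


def packets_per_frame(width: int, height: int, mtu: int) -> int:
    """Ceiling of frame size divided by per-packet payload (integer math)."""
    return -(-frame_bytes(width, height) // payload_per_packet(mtu))


if __name__ == "__main__":
    per_frame = packets_per_frame(1280, 960, 1472)
    print(f"packets per frame:  {per_frame}")
    print(f"packets per second: {per_frame * 25}")  # 25 fps, as in the file
```

Under these assumptions a 1280x960 frame works out to about 1,693 packets, or roughly 42,000 packets per second at 25 fps. If each of those packets involves at least one newly allocated buffer and memory that get mapped, locked and unreffed, that volume would be consistent with the gst_mini_object_* and pthread mutex functions near the top of the profile for pipeline b).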
Cheers
-Tim
--
Tim Müller, Centricular Ltd - http://www.centricular.com
More information about the gstreamer-devel
mailing list