High CPU utilization with raw video (UYVY) network streaming pipeline

Tim Müller tim at centricular.com
Wed Dec 3 05:43:38 PST 2014


On Wed, 2014-12-03 at 04:48 -0800, Amit Pandya wrote:

Hi,

> I am trying to stream raw video (UYVY) over the network from an Nvidia
> Jetson platform.
> 
> => With only the NVIDIA hardware-accelerated plugins, pipeline a) below
> uses very little CPU with fakesink.
> 
> a) gst-launch-1.0 filesrc location=stream264_1280x960_25fps.mkv !
> matroskademux ! queue name=txq1 ! h264parse ! omxh264dec ! nvvidconv !
> 'video/x-raw, format=(string)UYVY' ! fakesink silent=1 sync=1 -e
> 
> (top: ~15-20%) 
> oprofile:
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                     symbol name
> 43217    95.7102  no-vmlinux                   /no-vmlinux
> 267       0.5913  libgstreamer-1.0.so.0.204.0  /usr/lib/arm-linux-gnueabihf/libgstreamer-1.0.so.0.204.0
> 166       0.3676  libglib-2.0.so.0.4002.0      /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 94        0.2082  libgobject-2.0.so.0.4002.0   /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
> 74        0.1639  libc-2.19.so                 __memcpy_neon
> 66        0.1462  libgstbase-1.0.so.0.204.0    /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
> 
> => When I add the packetizer element "rtpvrawpay", as in pipeline b)
> (still with fakesink), the CPU load increases drastically.
> 
> b) gst-launch-1.0 filesrc location=stream264_1280x960_25fps.mkv !
> matroskademux ! queue name=txq1 ! h264parse ! omxh264dec ! nvvidconv
> name=txnv ! 'video/x-raw, format=(string)UYVY' ! rtpvrawpay mtu=1472 !
> fakesink silent=1 sync=1 -e
> 
>  (top: ~90%)  
> oprofile:
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                     symbol name
> 19391    39.8385  no-vmlinux                   /no-vmlinux
> *11885    24.4176  libgstreamer-1.0.so.0.204.0  /usr/lib/arm-linux-gnueabihf/libgstreamer-1.0.so.0.204.0*
> 3838      7.8851  libglib-2.0.so.0.4002.0      /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 2882      5.9210  libgstbase-1.0.so.0.204.0    /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
> 1710      3.5132  libc-2.19.so                 __memcpy_neon
> 1504      3.0899  libgstrtp-1.0.so.0.204.0     /usr/lib/arm-linux-gnueabihf/libgstrtp-1.0.so.0.204.0
> 1438      2.9543  libgobject-2.0.so.0.4002.0   /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
> 
> The GStreamer core library "libgstreamer-1.0.so.0.204.0" shows a big
> spike: its share of the load increases by 20-25%.
> 
> I then profiled the symbols of the core library
> "libgstreamer-1.0.so.0.204.0"; the details for pipeline b) follow.
> 
> oprofile: 
> CPU: CPU with timer interrupt, speed 2.3205e+06 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name                 symbol name
> 17278    31.8494  no-vmlinux               /no-vmlinux
> 5274      9.7218  libc-2.19.so             /lib/arm-linux-gnueabihf/libc-2.19.so
> 4416      8.1402  libglib-2.0.so.0.4002.0  /lib/arm-linux-gnueabihf/libglib-2.0.so.0.4002.0
> 3088      5.6923  libgstbase-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstbase-1.0.so.0.204.0
> 2030      3.7420  libgobject-2.0.so.0.4002.0 /usr/lib/arm-linux-gnueabihf/libgobject-2.0.so.0.4002.0
> 2001      3.6885  libgstrtp-1.0.so.0.204.0 /usr/lib/arm-linux-gnueabihf/libgstrtp-1.0.so.0.204.0
> 1694      3.1226  libgstreamer-1.0.so.0.204.0 gst_mini_object_unref
> 1687      3.1097  libpthread-2.19.so       pthread_mutex_lock
> 1663      3.0655  libgstreamer-1.0.so.0.204.0 gst_mini_object_unlock
> 1546      2.8498  libgstreamer-1.0.so.0.204.0 gst_mini_object_lock
> 1291      2.3798  libpthread-2.19.so       __pthread_mutex_unlock_usercnt
> 979       1.8046  libgstrtp.so             /usr/lib/arm-linux-gnueabihf/gstreamer-1.0/libgstrtp.so
> 712       1.3125  libgstreamer-1.0.so.0.204.0 gst_mini_object_is_writable
> 674       1.2424  libgstreamer-1.0.so.0.204.0 gst_mini_object_ref
> 578       1.0655  libgstreamer-1.0.so.0.204.0 gst_pad_push_data
> 456       0.8406  libgstreamer-1.0.so.0.204.0 .udivsi3_skip_div0_test
> 439       0.8092  libgstreamer-1.0.so.0.204.0 gst_buffer_map_range
> 418       0.7705  libgstreamer-1.0.so.0.204.0 gst_buffer_get_sizes_range
> 324       0.5972  libpthread-2.19.so       pthread_getspecific
> 309       0.5696  libgstreamer-1.0.so.0.204.0 gst_segment_clip
> 304       0.5604  libgstreamer-1.0.so.0.204.0 __udivdi3
> 266       0.4903  libgstreamer-1.0.so.0.204.0 gst_system_clock_id_wait_jitter_unlocked
> 261       0.4811  libgstreamer-1.0.so.0.204.0 gst_segment_to_running_time
> 240       0.4424  libgstreamer-1.0.so.0.204.0 gst_clock_get_time
> 224       0.4129  libgstreamer-1.0.so.0.204.0 gst_memory_get_type
> 212       0.3908  libgstreamer-1.0.so.0.204.0 gst_memory_get_sizes
> 209       0.3853  libgstreamer-1.0.so.0.204.0 gst_clock_get_type
> 195       0.3595  libgstreamer-1.0.so.0.204.0 gst_allocator_get_type
> 190       0.3502  libgstreamer-1.0.so.0.204.0 gst_segment_to_stream_time
> 184       0.3392  libgstreamer-1.0.so.0.204.0 gst_buffer_find_memory
> 181       0.3336  libgstreamer-1.0.so.0.204.0 __aeabi_uidivmod
> 180       0.3318  libgstreamer-1.0.so.0.204.0 gst_memory_map
> 178       0.3281  libgstreamer-1.0.so.0.204.0 gst_buffer_insert_memory
> 178       0.3281  libgstreamer-1.0.so.0.204.0 gst_clock_id_wait
> 177       0.3263  libgstreamer-1.0.so.0.204.0 gst_buffer_unmap
> 172       0.3171  libgstreamer-1.0.so.0.204.0 gst_object_ref
> 166       0.3060  libgstreamer-1.0.so.0.204.0 gst_pad_get_type
> 163       0.3005  libgstreamer-1.0.so.0.204.0 _get_merged_memory
> 162       0.2986  libgstreamer-1.0.so.0.204.0 _gst_util_uint64_scale
> 155       0.2857  libgstreamer-1.0.so.0.204.0 gst_mini_object_replace
> 141       0.2599  libgstreamer-1.0.so.0.204.0 gst_buffer_resize_range
> 126       0.2323  libgstreamer-1.0.so.0.204.0 __gnu_uldivmod_helper
> 121       0.2230  libgstreamer-1.0.so.0.204.0 _gst_buffer_free
> 120       0.2212  libgstreamer-1.0.so.0.204.0 _sysmem_new_block
> 118       0.2175  libgstcoreelements.so    gst_fake_sink_render
> 114       0.2101  libgstreamer-1.0.so.0.204.0 gst_allocator_alloc
> 109       0.2009  libgstreamer-1.0.so.0.204.0 gst_memory_init
> 101       0.1862  libgstreamer-1.0.so.0.204.0 gst_mini_object_init
> 101       0.1862  libgstreamer-1.0.so.0.204.0 gst_util_uint64_scale_int
> 100       0.1843  libgstreamer-1.0.so.0.204.0 gst_allocator_free
> ....
> 
> From the list above, the following GStreamer core functions contribute
> the most load:
> 1694      3.1226  libgstreamer-1.0.so.0.204.0 gst_mini_object_unref
> 1663      3.0655  libgstreamer-1.0.so.0.204.0 gst_mini_object_unlock
> 1546      2.8498  libgstreamer-1.0.so.0.204.0 gst_mini_object_lock
> 712       1.3125  libgstreamer-1.0.so.0.204.0 gst_mini_object_is_writable
> 674       1.2424  libgstreamer-1.0.so.0.204.0 gst_mini_object_ref 
> 
> Can anyone help me understand what could make the GStreamer core consume
> so much CPU just from adding the packetizer element "rtpvrawpay" (with
> fakesink)?

It's more than just a 'packetizer' element. Depending on the input
resolution it might have to create thousands of packets / memories /
buffers for every input frame, and do a lot of memcpying around. There
will be loads of mini objects created/destroyed. I did some
optimisations in git a while back (across modules), but there are still
things we could do better that we haven't gotten around to yet. In
any case, don't expect rtpvrawpay to be cheap. You might find the
patches in

https://bugzilla.gnome.org/show_bug.cgi?id=732152

useful as well in this case, if you haven't seen them yet.
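
To put a rough number on "thousands of packets" for the pipeline in
question, here is a back-of-the-envelope sketch in Python. It is an
estimate under stated assumptions, not a description of what rtpvrawpay
does internally: UYVY is taken as 2 bytes per pixel, the mtu is assumed to
cover the 12-byte fixed RTP header, and the RFC 4175 payload header is
taken as 2 bytes of extended sequence number plus 6 bytes per line
segment, with exactly one line segment per packet. That makes the result a
lower bound. The function name is hypothetical.

```python
# Fixed per-packet overhead, in bytes (assumptions, see the text above).
RTP_HEADER = 12           # fixed RTP header
RFC4175_EXT_SEQ = 2       # extended sequence number
RFC4175_LINE_HEADER = 6   # length + line number + offset, per line segment


def min_packets_per_frame(width, height, bytes_per_pixel, mtu):
    """Lower bound on RTP packets needed for one raw video frame."""
    frame_bytes = width * height * bytes_per_pixel
    # Best case: every packet carries a single line segment.
    payload = mtu - RTP_HEADER - RFC4175_EXT_SEQ - RFC4175_LINE_HEADER
    if payload <= 0:
        raise ValueError("mtu too small to carry any video data")
    # Integer ceiling division.
    return -(-frame_bytes // payload)


if __name__ == "__main__":
    fps = 25
    per_frame = min_packets_per_frame(1280, 960, 2, 1472)
    print("packets per frame :", per_frame)
    print("packets per second:", per_frame * fps)
```

For 1280x960 UYVY at 25 fps with mtu=1472 this comes to at least 1693
packets per frame, or roughly 42,000 per second. Each packet is its own
buffer, which would fit with the mini object ref/unref and lock/unlock
calls showing up so prominently in the profile.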

 Cheers
  -Tim

-- 
Tim Müller, Centricular Ltd - http://www.centricular.com


