IMX Scaler / CSC m2m driver.

Ian Molton imolton at ad-holdings.co.uk
Wed Mar 25 04:02:25 PDT 2015


On 24/03/15 16:44, Jean-Michel Hautbois wrote:
> Hi,
>
> 2015-03-24 16:51 GMT+01:00 Ian Molton <imolton at ad-holdings.co.uk>:
>> I've done some timings and discovered that an actual image
>> scaling takes about 3-4 hundredths of a second.
>>
>> This is only barely fast enough - is this expected behaviour?
>> It doesn't look like the driver does anything between kicking
>> off the frame and the completion interrupt, so I can only
>> assume the hardware is /that/ slow.
>
> If this is really the hardware scaler, it clearly is not correct.
> I had it doing 1080p conversion in ~10ms.

How are you measuring this?

I put a couple of printk()s in the code and did the math on the timestamps.

@@ -214,10 +214,11 @@ static void ipu_scaler_work(struct work_struct *work)
  			schedule_work(&ctx->skip_run);
  			return;
  		}
  	}
  
+printk("Frame Start\n");
  	ipu_image_convert_run(ipu_scaler->ipu, &in, &out, ctx->icc,
  			      ctx->num_tiles, ipu_complete, ipu_scaler, false);
  
  	if (!wait_for_completion_timeout(&ctx->completion,
  					 msecs_to_jiffies(300))) {
@@ -225,10 +226,11 @@ static void ipu_scaler_work(struct work_struct *work)
  			"Timeout waiting for scaling result\n");
  		err = -ETIMEDOUT;
  	} else {
  		err = ctx->error;
  	}
+printk("Frame end\n");
  
  	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
  	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
  
  	dst_buf->v4l2_buf.timestamp = src_buf->v4l2_buf.timestamp;
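
The "math on the timestamps" can be scripted. The following is only a sketch: it assumes printk timestamps are enabled (CONFIG_PRINTK_TIME), so each message appears in dmesg as "[  seconds.micros] text", and it pairs each "Frame Start" with the next "Frame end". The function name frame_durations_ms and the SAMPLE lines are made up for illustration; they are not real measurements.

```python
import re

# Matches a printk-timestamped line such as "[  123.456789] Frame Start".
# Assumption: printk timestamps are enabled (CONFIG_PRINTK_TIME).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(Frame Start|Frame end)\s*$")


def frame_durations_ms(log_lines):
    """Return the duration in milliseconds of each Start/end pair."""
    durations = []
    start = None
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # unrelated kernel message
        stamp, marker = float(m.group(1)), m.group(2)
        if marker == "Frame Start":
            start = stamp
        elif start is not None:
            durations.append((stamp - start) * 1000.0)
            start = None
    return durations


# Hypothetical sample input, NOT real measurements from this thread.
SAMPLE = [
    "[  100.000000] Frame Start",
    "[  100.035000] Frame end",
    "[  100.040000] some other message",
    "[  100.041000] Frame Start",
    "[  100.071000] Frame end",
]

if __name__ == "__main__":
    for ms in frame_durations_ms(SAMPLE):
        print("%.1f ms per frame, at most %.1f frames/s" % (ms, 1000.0 / ms))
```

Feeding it the output of dmesg (one line per list element) gives the per-frame time directly. 3-4 hundredths of a second is 30-40 ms, which caps the scaler at roughly 25-33 frames per second.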


> Can you use a perf command in order to see all the bottlenecks?

For the following pipeline:

gst-launch-1.0 filesrc location= Downloads/big_buck_bunny_1080p_h264.mov ! qtdemux ! h264parse ! v4l2video3videodec capture-io-mode=dmabuf ! v4l2video0convert output-io-mode=dmabuf-import ! video/x-raw,width=480,height=272 ! ximagesink sync=false


I get the following from perf:

Samples: 56K of event 'cycles', Event count (approx.): 2250401989
   44.53%  v4l2video3video  libc-2.19.so         [.] __GI___memcpy_neon
   22.16%  v4l2video3video  [kernel.kallsyms]    [k] v7_dma_clean_range
    8.42%  v4l2video3video  [kernel.kallsyms]    [k] l2c210_clean_range
    2.68%    qtdemux0:sink  [kernel.kallsyms]    [k] v7_dma_clean_range
    1.49%    qtdemux0:sink  libc-2.19.so         [.] __GI___memcpy_neon
    1.11%    qtdemux0:sink  [kernel.kallsyms]    [k] l2c210_clean_range
    0.82%  v4l2video3video  [kernel.kallsyms]    [k] _raw_spin_unlock_i
    0.81%  v4l2video3video  [kernel.kallsyms]    [k] __do_softirq
    0.77%    qtdemux0:sink  [kernel.kallsyms]    [k] __copy_to_user_std
    0.60%  v4l2video3video  [kernel.kallsyms]    [k] lock_acquire

> What is your pipeline right now, exactly?

As above :)

Obviously, that memcpy() is dominating right now.
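
To put a number on "dominating", the perf lines can be summed per symbol across threads. This is a rough sketch that assumes the five-column layout shown above (overhead, command, shared object, [k] or [.], symbol); the helper name overhead_by_symbol is made up for illustration, and the input lines are copied from the report above.

```python
from collections import defaultdict


def overhead_by_symbol(report_lines):
    """Sum the overhead percentage per symbol across all threads."""
    totals = defaultdict(float)
    for line in report_lines:
        fields = line.split()
        # Expected shape: overhead%, command, shared object, [k] or [.], symbol
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue  # header line or anything else that is not a sample row
        totals[fields[-1]] += float(fields[0].rstrip("%"))
    return dict(totals)


# Rows copied from the perf output quoted in this mail.
REPORT = """\
Samples: 56K of event 'cycles', Event count (approx.): 2250401989
   44.53%  v4l2video3video  libc-2.19.so         [.] __GI___memcpy_neon
   22.16%  v4l2video3video  [kernel.kallsyms]    [k] v7_dma_clean_range
    8.42%  v4l2video3video  [kernel.kallsyms]    [k] l2c210_clean_range
    2.68%    qtdemux0:sink  [kernel.kallsyms]    [k] v7_dma_clean_range
    1.49%    qtdemux0:sink  libc-2.19.so         [.] __GI___memcpy_neon
    1.11%    qtdemux0:sink  [kernel.kallsyms]    [k] l2c210_clean_range
""".splitlines()

if __name__ == "__main__":
    totals = overhead_by_symbol(REPORT)
    for symbol, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
        print("%6.2f%%  %s" % (pct, symbol))
```

Summed this way, memcpy accounts for about 46% of the samples and the two cache-clean routines together for about 34%, so roughly 80% of the cycles go on copying buffers and cleaning caches rather than on anything video-specific.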

I may be misunderstanding something here, but presumably all the
buffers used by the hardware are allocated in the kernel, whether they be
GFP_DMA or GFP_KERNEL. I don't see why userspace would want to
read or write a buffer between v4l2video3videodec and v4l2video0convert,
so surely all userspace needs to do is tell the kernel which buffer to
pass along? After all, it's allocated in the kernel, written in the
kernel, and will be consumed by another driver from kernel memory. Why
is userspace even trying to copy the data at all?

-Ian

