[PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors

Boris Brezillon boris.brezillon at bootlin.com
Mon Nov 5 11:36:48 UTC 2018


Hi Eric,

On Tue, 30 Oct 2018 16:12:55 -0700
Eric Anholt <eric at anholt.net> wrote:

> > +
> > +	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let  
> 
> "absolute"

Will fix the typo.

> 
> > +	 * the system work when other blocks are accessing the memory.
> > +	 */
> > +	if (load_state->membus_load > SZ_1G + SZ_512M)
> > +		return -ENOSPC;  
> 
> 
> > +	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> > +	 * consider the maximum number of cycles is 240M.
> > +	 */
> > +	if (load_state->hvs_load > 240000000ULL)
> > +		return -ENOSPC;  
> 
> Some day we should probably be using the HVS's actual clock from DT if
> available instead of a hardcoded number here.

I agree.

> Good enough for now.
> 
> > +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
> > +					   struct drm_private_state *state)
> > +{
> > +	struct vc4_load_tracker_state *load_state;
> > +
> > +	load_state = to_vc4_load_tracker_state(state);
> > +	kfree(load_state);
> > +}  
> 
> Optional: just kfree(state) for simplicity.

Hm, not sure that's a good idea. kfree(state) works as long as
drm_private_state is the first field in vc4_load_tracker_state, but it
sounds a bit fragile.

I can do

	kfree(to_vc4_load_tracker_state(state));

if you prefer.

> 
> > +static void vc4_plane_calc_load(struct drm_plane_state *state)
> > +{
> > +	unsigned int hvs_load_shift, vrefresh, i;
> > +	struct drm_framebuffer *fb = state->fb;
> > +	struct vc4_plane_state *vc4_state;
> > +	struct drm_crtc_state *crtc_state;
> > +	unsigned int vscale_factor;
> > +
> > +	vc4_state = to_vc4_plane_state(state);
> > +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> > +							state->crtc);
> > +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> > +
> > +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
> > +	 * 4 pixels/cycle otherwise.
> > +	 * Alpha blending step seems to be pipelined and it's always operating
> > +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
> > +	 * scaler block.
> > +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
> > +	 */
> > +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> > +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> > +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> > +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> > +		hvs_load_shift = 1;
> > +	else
> > +		hvs_load_shift = 2;
> > +
> > +	vc4_state->membus_load = 0;
> > +	vc4_state->hvs_load = 0;
> > +	for (i = 0; i < fb->format->num_planes; i++) {
> > +		unsigned long pixels_load;  
> 
> I'm scared any time I see longs.  Do you want 32 or 64 bits here?

I just assumed a 32bit unsigned var would be enough, so unsigned long
seemed just fine. I can use u32 or u64 if you prefer.

> 
> > +		/* Even if the bandwidth/plane required for a single frame is
> > +		 *
> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> > +		 *
> > +		 * when downscaling, we have to read more pixels per line in
> > +		 * the time frame reserved for a single line, so the bandwidth
> > +		 * demand can be punctually higher. To account for that, we
> > +		 * calculate the down-scaling factor and multiply the plane
> > +		 * load by this number. We're likely over-estimating the read
> > +		 * demand, but that's better than under-estimating it.
> > +		 */
> > +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> > +					     vc4_state->crtc_h);
> > +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> > +			      vscale_factor;  
> 
> If we're upscaling (common for video, right?), aren't we under-counting
> the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
> pixels per cycle.

That's not entirely clear to me. I'm not sure what the scaler does when
upscaling. Are the same pixels read several times from the memory? If
that's the case, then the membus load should indeed be based on the
crtc_w,h.

Also, when the spec says the HVS can process 4pixels/cycles, is it 4
input pixels or 4 output pixels per cycle?

Regards,

Boris


More information about the dri-devel mailing list