[PATCH v15 1/2] drm/tegra: dc: Support memory bandwidth management
Dmitry Osipenko
digetx at gmail.com
Mon Mar 15 18:39:06 UTC 2021
15.03.2021 01:31, Michał Mirosław пишет:
> On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
>> Display controller (DC) performs isochronous memory transfers, and thus,
>> has a requirement for a minimum memory bandwidth that shall be fulfilled,
>> otherwise framebuffer data can't be fetched fast enough and this results
>> in a DC's data-FIFO underflow that follows by a visual corruption.
> [...]
>> +static unsigned long
>> +tegra_plane_overlap_mask(struct drm_crtc_state *state,
>> + const struct drm_plane_state *plane_state)
>> +{
>> + const struct drm_plane_state *other_state;
>> + const struct tegra_plane *tegra;
>> + unsigned long overlap_mask = 0;
>> + struct drm_plane *plane;
>> + struct drm_rect rect;
>> +
>> + if (!plane_state->visible || !plane_state->fb)
>> + return 0;
>> +
>> + drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
> [...]
>> + /*
>> + * Data-prefetch FIFO will easily help to overcome temporal memory
>> + * pressure if other plane overlaps with the cursor plane.
>> + */
>> + if (tegra_plane_is_cursor(plane_state) && overlap_mask)
>> + return 0;
>> +
>> + return overlap_mask;
>> +}
>
> Since for cursor plane this always returns 0, you could test
> tegra_plane_is_cursor() at the start of the function.
Yes, thanks.
>> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
>> + struct drm_atomic_state *state)
> [...]
>> + /*
>> + * For overlapping planes pixel's data is fetched for each plane at
>> + * the same time, hence bandwidths are accumulated in this case.
>> + * This needs to be taken into account for calculating total bandwidth
>> + * consumed by all planes.
>> + *
>> + * Here we get the overlapping state of each plane, which is a
>> + * bitmask of plane indices telling with what planes there is an
>> + * overlap. Note that bitmask[plane] includes BIT(plane) in order
>> + * to make further code nicer and simpler.
>> + */
>> + drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
>> + tegra_state = to_const_tegra_plane_state(plane_state);
>> + tegra = to_tegra_plane(plane);
>> +
>> + if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
>> + return -EINVAL;
>> +
>> + plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
>> + mask = tegra_plane_overlap_mask(new_state, plane_state);
>> + overlap_mask[tegra->index] = mask;
>> +
>> + if (hweight_long(mask) != 3)
>> + all_planes_overlap_simultaneously = false;
>> + }
>> +
>> + old_state = drm_atomic_get_old_crtc_state(state, crtc);
>> + old_dc_state = to_const_dc_state(old_state);
>> +
>> + /*
>> + * Then we calculate maximum bandwidth of each plane state.
>> + * The bandwidth includes the plane BW + BW of the "simultaneously"
>> + * overlapping planes, where "simultaneously" means areas where DC
>> + * fetches from the planes simultaneously during of scan-out process.
>> + *
>> + * For example, if plane A overlaps with planes B and C, but B and C
>> + * don't overlap, then the peak bandwidth will be either in area where
>> + * A-and-B or A-and-C planes overlap.
>> + *
>> + * The plane_peak_bw[] contains peak memory bandwidth values of
>> + * each plane, this information is needed by interconnect provider
>> + * in order to set up latency allowness based on the peak BW, see
>> + * tegra_crtc_update_memory_bandwidth().
>> + */
>> + for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
>> + overlap_bw = 0;
>> +
>> + for_each_set_bit(k, &overlap_mask[i], 3) {
>> + if (k == i)
>> + continue;
>> +
>> + if (all_planes_overlap_simultaneously)
>> + overlap_bw += plane_peak_bw[k];
>> + else
>> + overlap_bw = max(overlap_bw, plane_peak_bw[k]);
>> + }
>> +
>> + new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
>> +
>> + /*
>> + * If plane's peak bandwidth changed (for example plane isn't
>> + * overlapped anymore) and plane isn't in the atomic state,
>> + * then add plane to the state in order to have the bandwidth
>> + * updated.
>> + */
>> + if (old_dc_state->plane_peak_bw[i] !=
>> + new_dc_state->plane_peak_bw[i]) {
>> + plane = tegra_crtc_get_plane_by_index(crtc, i);
>> + if (!plane)
>> + continue;
>> +
>> + plane_state = drm_atomic_get_plane_state(state, plane);
>> + if (IS_ERR(plane_state))
>> + return PTR_ERR(plane_state);
>> + }
>> + }
> [...]
>
> Does it matter to which channel (plane) the peak bw is attached? Would
> it still work if the first channel specified max(peak_bw of overlaps)
> and others only zeroes?
The latency allowance will be configured incorrectly for the case of
zeroes by the memory driver, hence it does matter.
>> + /*
>> + * Horizontal downscale needs a lower memory latency, which roughly
>> + * depends on the scaled width. Trying to tune latency of a memory
>> + * client alone will likely result in a strong negative impact on
>> + * other memory clients, hence we will request a higher bandwidth
>> + * since latency depends on bandwidth. This allows to prevent memory
>> + * FIFO underflows for a large plane downscales, meanwhile allowing
>> + * display to share bandwidth fairly with other memory clients.
>> + */
>> + if (src_w > dst_w)
>> + mul = (src_w - dst_w) * bpp / 2048 + 1;
>> + else
>> + mul = 1;
> [...]
>
> One point is unexplained yet: why is the multiplier proportional to a
> *difference* between src and dst widths? Also, I would expect max (worst
> case) is pixclock * read_size when src_w/dst_w >= read_size.
IIRC, the difference gives a more adequate/practical result than the
proportion. Although, downstream driver uses proportion. I'll try to
revisit this for the next version of the patch.
> BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ...
Indeed, and should be a bit nicer to use 'mul' in both cases.
More information about the dri-devel
mailing list