[Freedreno] [PATCH v2] drm/msm/dpu: improve DSC allocation
Kuogee Hsieh
quic_khsieh at quicinc.com
Wed Dec 6 17:34:30 UTC 2023
On 12/6/2023 1:35 AM, Dmitry Baryshkov wrote:
> On 05/12/2023 22:51, Kuogee Hsieh wrote:
>>
>> On 12/5/2023 11:23 AM, Dmitry Baryshkov wrote:
>>> On Tue, 5 Dec 2023 at 20:12, Kuogee Hsieh <quic_khsieh at quicinc.com>
>>> wrote:
>>>>
>>>> On 12/4/2023 4:23 PM, Dmitry Baryshkov wrote:
>>>>> On Tue, 5 Dec 2023 at 01:55, Kuogee Hsieh
>>>>> <quic_khsieh at quicinc.com> wrote:
>>>>>> A DCE (Display Compression Engine) contains two DSC hard slice
>>>>>> encoders. Each DCE start with even DSC encoder index followed by
>>>>>> an odd DSC encoder index. Each encoder can work independently.
>>>>>> But Only two DSC encoders from same DCE can be paired to work
>>>>>> together to support merge mode. In addition, the DSC with even
>>>>>> index have to mapping to even pingpong index and DSC with odd
>>>>>> index have to mapping to odd pingpong index at its data path.
>>>>>> This patch improve DSC allocation mechanism with consideration
>>>>>> of above factors.
>>>>>>
>>>>>> Changes in V2:
>>>>>> -- split _dpu_rm_reserve_dsc() into _dpu_rm_reserve_dsc_single() and
>>>>>> _dpu_rm_reserve_dsc_pair()
>>>>> Please don't send the new iteration of the patch if the discussion
>>>>> is ongoing.
>>>> Got it, thanks.
>>>>> Quoting v1 review:
>>>>>
>>>>> Are the limitations (odd:odd, allocation in pairs, etc) applicable to
>>>>> v1.1 encoders?
>>>>>
>>>>> I assume that at least 'allocate two consecutive DSC for DSC
>>>>> merge' is
>>>>> not applicable, since there are no separate DCE units.
>>>> yes, you are correct in the hardware point of view.
>>>>
>>>> However, at software, we can think think of dsc index 0 and 1 are
>>>> bound
>>>> to DCE #1, index 2 and 3 are bound to DCE #2 and etc in regardless of
>>>> v1.1 or v1.2.
>>>>
>>>> By doing that,this dsc allocation algorithm should be able to apply to
>>>> both.
>>>>
>>>> There is no case to have dsc index 1 and dsc index 2 bind together
>>>> (skip
>>>> dsc index 0) to support merge mode.
>>> Yes. However this might cause issues on the platforms which have DSI,
>>> DP and just two DSC encoders. Consider RM allocating two odd (or two
>>> even) PP blocks. One for DSI, one for DP. Then if we need DSC on both
>>> interfaces, the RM won't be able to allocate it.
>>
>>
>> I am not sure this case is possible.
>>
>> DSC + pingpong allocation is base on Layer mixer which is allocated
>> sequentially.
>
> Not sequentially, but also in pairs. Yes, LM_5 (a pair for LM_2 on
> sdm845) is connected to PINGPONG_3. However all this doesn't make
> things easier to understand or to snoop bugs. I'd prefer to keep a
> simple allocation code for older DSC blocks. Especially since I might
> have to touch LM allocation for MDP 1.x platforms.
1) from dpu_rm_reserve() --> _dpu_rm_make_reservation() -->
_dpu_rm_reserve_dsc() path, there is no any flag can be used to identify
is this allocation is for dsc V1.1 or V1.2
2) your earlier question -- Are the limitations (odd:odd, allocation in
pairs, etc) applicable to v1.1 encoders? <== the answer is true, my
apology to have not confirmed this earlier.
I try to said is the checking pp-index and dsc-index (odd:odd,
even:even) at new implementation is to correct original (v1.1) mistake.
>
>> ex, first lm --> pingpong --> dsc allocate completed then followed by
>> next lm --> pingpong --> dsc allocation.
>>
>> therefore it is not possible to have case with two odd pingpong index
>> to map two odd dsc index.
>>
>> With this algorithm, there is one case (below) which can not be
>> supported is,
>>
>> dsc_0 for pingpong-0 of stand alone mode + dsc-1 and dsc-2 for
>> pingpong-1 and ping pong-2 to support merge mode for DSC v1.1.
>>
>> However there is no hardware configuration which only have 3 or 5
>> dsc encoders due to dsc always come in pair except some low end chip
>> which mdp come with only one dsc encoder.
>>
>>
>>
>>
>>>
>>>>>> Signed-off-by: Kuogee Hsieh <quic_khsieh at quicinc.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c | 171
>>>>>> ++++++++++++++++++++++++++++++---
>>>>>> 1 file changed, 156 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
>>>>>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
>>>>>> index 17ecf23..dafe1ee 100644
>>>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
>>>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
>>>>>> @@ -470,33 +470,174 @@ static int _dpu_rm_reserve_ctls(
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> -static int _dpu_rm_reserve_dsc(struct dpu_rm *rm,
>>>>>> +static int _dpu_rm_reserve_dsc_single(struct dpu_rm *rm,
>>>>>> struct dpu_global_state
>>>>>> *global_state,
>>>>>> - struct drm_encoder *enc,
>>>>>> + uint32_t enc_id,
>>>>>> const struct
>>>>>> msm_display_topology *top)
>>>>>> {
>>>>>> - int num_dsc = top->num_dsc;
>>>>>> - int i;
>>>>>> + int num_dsc = 0;
>>>>>> + int i, pp_idx;
>>>>>> + int dsc_idx[DSC_MAX - DSC_0];
>>>>>> + uint32_t pp_to_enc_id[PINGPONG_MAX - PINGPONG_0];
>
> You don't need pp_to_enc_id copy in the _single usecase. Just use the
> exisiting rm data.
>
>>>>>> + int pp_max = PINGPONG_MAX - PINGPONG_0;
>
> inline
>
>>>>>> +
>>>>>> + for (i = 0; i < DSC_MAX - DSC_0; i++)
>>>>>> + dsc_idx[i] = 0;
>
> You know, this is called memset.
>
>>>>>> +
>>>>>> + /* fill working copy with pingpong list */
>>>>>> + memcpy(pp_to_enc_id, global_state->pingpong_to_enc_id,
>>>>>> sizeof(pp_to_enc_id));
>>>>>> +
>>>>>> + for (i = 0; i < ARRAY_SIZE(rm->dsc_blks) && num_dsc >=
>>>>>> top->num_dsc; i++) {
>>>>>> + if (!rm->dsc_blks[i])
>>>>>> + continue;
>>>>>>
>>>>>> - /* check if DSC required are allocated or not */
>>>>>> - for (i = 0; i < num_dsc; i++) {
>>>>>> - if (!rm->dsc_blks[i]) {
>>>>>> - DPU_ERROR("DSC %d does not exist\n", i);
>>>>>> - return -EIO;
>>>>>> + if (global_state->dsc_to_enc_id[i]) /* used */
>>>>>> + continue;
>>>>>> +
>>>>>> + /*
>>>>>> + * find the pingpong index which had been reserved
>>>>>> + * previously at layer mixer allocation
>>>>>> + */
>>>>>> + for (pp_idx = 0; pp_idx < pp_max; pp_idx++) {
>>>>>> + if (pp_to_enc_id[pp_idx] == enc_id)
>
> !reserved_by_other()
>
>>>>>> + break;
>>>>>> }
>>>>>>
>>>>>> - if (global_state->dsc_to_enc_id[i]) {
>>>>>> - DPU_ERROR("DSC %d is already
>>>>>> allocated\n", i);
>>>>>> - return -EIO;
>>>>>> + /*
>>>>>> + * dsc even index must map to pingpong even index
>>>>>> + * dsc odd index must map to pingpong odd index
>>>>>> + */
>>>>>> + if ((i & 0x01) != (pp_idx & 0x01))
>>>>>> + continue;
>>>>>> +
>>>>>> + dsc_idx[num_dsc++] = i + 1; /* found, start
>>>>>> from 1 */
>
> It's not +1. It is DSC_0 + i.
>
>>>>>> + }
>>>>>> +
>>>>>> + if (num_dsc < top->num_dsc) {
>>>>>> + DPU_ERROR("DSC allocation failed num_dsc=%d
>>>>>> required=%d\n",
>>>>>> + num_dsc,
>>>>>> top->num_dsc);
>>>>>> + return -ENAVAIL;
>>>>>> + }
>>>>>> +
>>>>>> + /* reserve dsc */
>>>>>> + for (i = 0; i < top->num_dsc; i++) {
>>>>>> + int j;
>>>>>> +
>>>>>> + j = dsc_idx[i];
>>>>>> + if (j)
>>>>>> + global_state->dsc_to_enc_id[j-1] = enc_id;
>>>>>> + }
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int _dpu_rm_reserve_dsc_pair(struct dpu_rm *rm,
>>>>>> + struct dpu_global_state
>>>>>> *global_state,
>>>>>> + uint32_t enc_id,
>>>>>> + const struct msm_display_topology
>>>>>> *top)
>
>
>>>>>> +{
>>>>>> + int num_dsc = 0;
>>>>>> + int i, pp_idx;
>>>>>> + int dsc_idx[DSC_MAX - DSC_0];
>>>>>> + uint32_t pp_to_enc_id[PINGPONG_MAX - PINGPONG_0];
>>>>>> + int pp_max = PINGPONG_MAX - PINGPONG_0;
>>>>>> +
>>>>>> + for (i = 0; i < DSC_MAX - DSC_0; i++)
>>>>>> + dsc_idx[i] = 0;
>>>>>> +
>>>>>> + /* fill working copy with pingpong list */
>>>>>> + memcpy(pp_to_enc_id, global_state->pingpong_to_enc_id,
>>>>>> sizeof(pp_to_enc_id));
>>>>>> +
>>>>>> + for (i = 0; i < ARRAY_SIZE(rm->dsc_blks) && num_dsc >=
>>>>>> top->num_dsc; i++) {
>
> Much easier. Just do i+= 2 here and in the loop check that the second
> half of DCE is not used.
>
>>>>>> + if (!rm->dsc_blks[i])
>>>>>> + continue;
>>>>>> +
>>>>>> + if (global_state->dsc_to_enc_id[i]) { /* used */
>>>>>> + /* consective dsc index to be paired */
>>>>>> + if (num_dsc) { /* already start pairing,
>>>>>> re start search */
>>>>>> + num_dsc = 0;
>>>>>> + /* fill working copy with
>>>>>> pingpong list */
>>>>>> + memcpy(pp_to_enc_id,
>>>>>> global_state->pingpong_to_enc_id,
>>>>>> + sizeof(pp_to_enc_id));
>>>>>> + }
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + /* odd index can not become start of pairing */
>>>>>> + if (i & 0x01 && !num_dsc)
>>>>>> + continue;
>>>>>> +
>>>>>> + /*
>>>>>> + * find the pingpong index which had been reserved
>>>>>> + * previously at layer mixer allocation
>>>>>> + */
>>>>>> + for (pp_idx = 0; pp_idx < pp_max; pp_idx++) {
>>>>>> + if (pp_to_enc_id[pp_idx] == enc_id)
>>>>>> + break;
>>>>>> }
>>>>>> +
>>>>>> + /*
>>>>>> + * dsc even index must map to pingpong even index
>>>>>> + * dsc odd index must map to pingpong odd index
>>>>>> + */
>>>>>> + if ((i & 0x01) != (pp_idx & 0x01))
>>>>>> + continue;
>>>>>> +
>>>>>> + /*
>>>>>> + * delete pp_idx so that next pp_idx can be
>>>>>> paired with
>>>>>> + * next dsc_idx
>>>>>> + */
>>>>>> + pp_to_enc_id[pp_idx] = 0xffff;
>>>>>> +
>>>>>> + dsc_idx[num_dsc++] = i + 1; /* found, start
>>>>>> from 1 */
>>>>>> }
>>>>>>
>>>>>> - for (i = 0; i < num_dsc; i++)
>>>>>> - global_state->dsc_to_enc_id[i] = enc->base.id;
>>>>>> + if (num_dsc < top->num_dsc) {
>>>>>> + DPU_ERROR("DSC allocation failed num_dsc=%d
>>>>>> required=%d\n",
>>>>>> + num_dsc,
>>>>>> top->num_dsc);
>>>>>> + return -ENAVAIL;
>>>>>> + }
>>>>>> +
>>>>>> + /* reserve dsc */
>>>>>> + for (i = 0; i < top->num_dsc; i++) {
>>>>>> + int j;
>>>>>> +
>>>>>> + j = dsc_idx[i];
>>>>>> + if (j)
>>>>>> + global_state->dsc_to_enc_id[j-1] = enc_id;
>>>>>> + }
>>>>>>
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +static int _dpu_rm_reserve_dsc(struct dpu_rm *rm,
>>>>>> + struct dpu_global_state
>>>>>> *global_state,
>>>>>> + uint32_t enc_id,
>>>>>> + const struct msm_display_topology
>>>>>> *top)
>>>>>> +{
>>>>>> + if (!top->num_dsc || !top->num_intf)
>>>>>> + return 0;
>>>>>> +
>>>>>> + /*
>>>>>> + * Truth:
>>>>>> + * 1) every layer mixer only connects to one pingpong
>>>>>> + * 2) no pingpong split -- which is two layer mixers
>>>>>> shared one pingpong
>>>>>> + * 3) each DSC engine contains two dsc encoders
>>>>>> + * -- index(0,1), index (2,3),... etc
>>>>> Does this apply to v1.1 encoders?
>>>>>
>>>>>> + * 4) dsc pair can only happens with same DSC engine
>>>>>> + * 5) odd pingpong connect to odd dsc
>>>>>> + * 6) even pingpong connect to even dsc
>>>>>> + * 7) pair: encoder +--> pp_idx_0 --> dsc_idx_0
>>>>>> + +--> pp_idx_1 --> dsc_idx_1
>>>>>> + */
>>>>>> +
>>>>>> + /* num_dsc should be either 1, 2 or 4 */
>>>>>> + if (top->num_dsc > top->num_intf) /* merge mode */
>>>>>> + return _dpu_rm_reserve_dsc_pair(rm, global_state,
>>>>>> enc_id, top);
>>>>>> + else
>>>>>> + return _dpu_rm_reserve_dsc_single(rm,
>>>>>> global_state, enc_id, top);
>
> Can we end up with num_dsc = 2, num_intf = 2 and
> _dpu_rm_reserve_dsc_single() allocating two DSC blocks despite its name?
>
>>>>>> +}
>>>>>> +
>>>>>> static int _dpu_rm_make_reservation(
>>>>>> struct dpu_rm *rm,
>>>>>> struct dpu_global_state *global_state,
>>>>>> @@ -518,7 +659,7 @@ static int _dpu_rm_make_reservation(
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> - ret = _dpu_rm_reserve_dsc(rm, global_state, enc,
>>>>>> &reqs->topology);
>>>>>> + ret = _dpu_rm_reserve_dsc(rm, global_state,
>>>>>> enc->base.id, &reqs->topology);
>
> Is there any reason to change this?
>
>>>>>> if (ret)
>>>>>> return ret;
>>>>>>
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>
>>>
>
More information about the Freedreno
mailing list