[Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify and optimize reference frame store updates.

Gwenole Beauchesne gb.devel at gmail.com
Thu Jun 5 06:29:14 PDT 2014


Hi,

2014-06-05 14:44 GMT+02:00 Xiang, Haihao <haihao.xiang at intel.com>:
>
>
>> -----Original Message-----
>> From: Libva [mailto:libva-bounces at lists.freedesktop.org] On Behalf Of
>> Gwenole Beauchesne
>> Sent: Thursday, June 05, 2014 5:20 PM
>> To: Zhao, Yakui
>> Cc: libva at lists.freedesktop.org
>> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify and
>> optimize reference frame store updates.
>>
>> 2014-06-04 7:30 GMT+02:00 Zhao, Yakui <yakui.zhao at intel.com>:
>> > On Tue, 2014-06-03 at 22:59 -0600, Gwenole Beauchesne wrote:
>> >> 2014-06-04 4:11 GMT+02:00 Zhao, Yakui <yakui.zhao at intel.com>:
>> >> > On Tue, 2014-06-03 at 10:43 -0600, Gwenole Beauchesne wrote:
>> >> >> Simplify and optimize the update process of the reference frame store.
>> >> >> Use less iterations to look up existing objects. Use a cache to
>> >> >> store the free'd slots.
>> >> >>
>> >> >> Prerequisite: the reference_objects[] array was previously
>> >> >> arranged in a way that the element at index i is exactly the
>> >> >> object_surface that corresponds to the VA surface identified by
>> >> >> the VAPictureH264.picture_id located at index i in the ReferenceFrames[]
>> array.
>> >> >
>> >> > As a whole, this patch looks good to me. Based on the hardware
>> >> > requirement, it will be better that one reference picture later
>> >> > used is not assigned for the other purpose even when it is not used
>> >> > in current reference frame list. In such case this patch does some
>> >> > optimization about DPB, which can defer the usage of one picture
>> >> > later used that is not in current reference frame.especially when:
>> >> > one picture is not used during decoding frame X while it is used during
>> decoding frame X + 1.
>> >>
>> >> The case you mention was supposed to be fixed with the other patch
>> >> series where the re-factorisation is a pre-requisite. :) A very
>> >> strict GenFrameStoreContext needs to be maintained with the set of
>> >> used_frames[] as it is today, and an additional set of free_frames[]
>> >> *candidates*. Though, this case is not covered by the conformance
>> >> suite, and I haven't found any real world case exhibiting that,
>> >> neither did I try to generate one yet. So we can defer this one.
>> >>
>> >> > Although it still has no improvment for the following scenario, it
>> >> > still is an improvement.
>> >> >     >One picture is not used during decoding frame X and X + 1
>> >> > while it is used during decoding frame X + 2.
>> >> >
>> >> > Of course the perfect solution is that the upper layer can assure
>> >> > that one picture later used is still added into the DPB of current
>> >> > frame although it is not used in current reference list.
>> >>
>> >> No, this is not going to happen. This would be an implementation
>> >> error of the standard.
>> >
>> > In theory it is no problem for normal H264. But it is true for MVC
>> > case as some reference frames for view 1 is not used when decoding view 0.
>>
>> When I said this was not going to happen, this concerned the "perfect
>> solution" you proposed, which is not one. That would be a workaround to
>> driver bugs, and an error of implementation of the standard. Of course, the
>> DPB holds all the necessary view components, but the
>> VAPictureH264.ReferenceFrames[] is not exactly the DPB, and it cannot be.
>
> The number of elements in VAPictureH264.ReferenceFrames[] is 16, so it is enough to hold
> All reference surfaces for H.264. and the patch works as before, I am OK for your patch.
>
>>
>> The case you mention is what I said a paragraph above, and the only "perfect"
>> way to solve it is to correctly track the references in the driver side. This was
>> what the GenFrameStoreContext was for. However, as I mentioned, there is no
>> supporting case in the conformance test suite, and I haven't come by a
>> real-world sample exhibiting this.
>
> The case Yakui mentioned is not for H.264 as a surface should be in DPB if it is referenced by later
> Surface.
>
> Xiaowei told us each view uses a separated DPB buffer for MVC

There is no such thing in MVC. There is a unique DPB with a size
scalable by a log2 number of views.

> A case might occurs as below:
>
> View0: f0, f1, f2, f3, ...
> View1: s0, s1, s2, s3, ...
>
> f0 is referenced by f1, but f0 isn't referenced by other surfaces in view0,  f1 is reference by f2.
> F0 is also referenced by s1 , both f0 and f1 are referenced by S2 in view1
>
> When decoding f1, f0 is added into View0's DPB, so it is added into frame store too, the frame store index for f0 is
> *0*.
>
> When decoding s1, f0 is added into view1's DPB, it has been in the frame store, the frame store index for f0 is still *0*.
>
> When decoding f2, f0 is removed from view0's DPB and f1 is added into view0's DPB,
> so f0 is removed from frame store and f1 is added into frame store too, now the frame store index
> For f1 is 0,
>
> When decoding s2, f0 is still in View1's DPB so it will be added into the frame store again, and now the frame
> store index for f0 is *1*
>
> The frame store index for f0 is changed from 0 to 1 and the decoding result will be wrong.

That's exactly the thing we are discussing since the beginning. :)

And there is no way to handle that from the unique GenFrameStore array
as it is today, even if you delay the marking as "free", or whatever.
You need a more elaborated data struct. And this is what was to be
provided with the GenFrameStoreContext and additional re-factoring so
that I don't have to copy code around again.

I am working on an alternative that wouldn't need too many changes at this time.

Regards,
Gwenole.

>
>>
>> >> >> Theory of operations:
>> >> >>
>> >> >> 1. Obsolete entries are removed first, i.e. entries in the internal DPB
>> >> >>    that no longer have a match in the supplied ReferenceFrames[] array.
>> >> >>    That obsolete entry index is stored in a local cache: free_slots[].
>> >> >>
>> >> >> 2. This cache is completed with entries considered as "invalid" or "not
>> >> >>    present", sequentially while traversing the frame store for obsolete
>> >> >>    entries. At the end of this removal process, the free_slots[] array
>> >> >>    represents all possible indices in there that could be re-used for
>> >> >>    new reference frames to track.
>> >> >>
>> >> >> 3. The list of ReferenceFrames[] objects is traversed for new entries
>> >> >>    that are not already in the frame store. If an entry needs to be
>> >> >>    added, it is placed at the index obtained from the next free_slots[]
>> >> >>    element. There is no need to traverse the frame store array again,
>> >> >>    the next available slot can be known from that free_slots[] cache.
>> >> >>
>> >> >> Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne at intel.com>
>> >> >> ---
>> >> >>  src/i965_decoder_utils.c |  107
>> >> >> ++++++++++++++++++++--------------------------
>> >> >>  1 file changed, 46 insertions(+), 61 deletions(-)
>> >> >>
>> >> >> diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c
>> >> >> index 2570be9..c686235 100644
>> >> >> --- a/src/i965_decoder_utils.c
>> >> >> +++ b/src/i965_decoder_utils.c
>> >> >> @@ -418,88 +418,73 @@ gen6_send_avc_ref_idx_state(  }
>> >> >>
>> >> >>  void
>> >> >> -intel_update_avc_frame_store_index(VADriverContextP ctx,
>> >> >> -                                   struct decode_state
>> *decode_state,
>> >> >> -
>> VAPictureParameterBufferH264 *pic_param,
>> >> >> -                                   GenFrameStore
>> frame_store[MAX_GEN_REFERENCE_FRAMES])
>> >> >> +intel_update_avc_frame_store_index(
>> >> >> +    VADriverContextP              ctx,
>> >> >> +    struct decode_state          *decode_state,
>> >> >> +    VAPictureParameterBufferH264 *pic_param,
>> >> >> +    GenFrameStore
>> frame_store[MAX_GEN_REFERENCE_FRAMES]
>> >> >> +)
>> >> >>  {
>> >> >> -    int i, j;
>> >> >> -
>> >> >> -    assert(MAX_GEN_REFERENCE_FRAMES ==
>> ARRAY_ELEMS(pic_param->ReferenceFrames));
>> >> >> -
>> >> >> -    for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
>> >> >> -        int found = 0;
>> >> >> -
>> >> >> -        if (frame_store[i].surface_id == VA_INVALID_ID ||
>> >> >> -            frame_store[i].obj_surface == NULL)
>> >> >> +    int free_slots[MAX_GEN_REFERENCE_FRAMES];
>> >> >> +    int i, j, n, num_free_slots;
>> >> >> +    bool found;
>> >> >> +
>> >> >
>> >> > It will be better to fill the free_slots through the following two
>> >> > loops instead of one loop, which is to try to defer the usage of
>> >> > the reference picture later used that is not used in current reference frame
>> list.
>> >> > a. check the invalid surface_id or NULL surface and then add it
>> >> > into free_slots.
>> >> > b. check whether one picture is used by the current reference frame
>> >> > list. If not, add it into the free_slots.
>> >>
>> >> The previous algorithm was filling in the holes. This one does the
>> >> same. I don't see where your proposal could bring an improvement?
>> >
>> > In your patch you use the free_slots to find one free slot to hold a
>> > new reference picture in DPB. Right? It will firstly try to find one
>> > free_slot with invalid picture_id or surface, which is used to hold a
>> > new reference picture in DPB.
>>
>> No, it will take whatever comes first between:
>> - a slot with an invalid picture_id or surface ;
>> - or a slot with a retired reference picture candidate, i.e. something that cannot
>> be found in the ReferenceFrames[] array.
>> And this works exactly the same way as before. No functional change yet.
>>
>> > It can remove the possibility that one picture later used is
>> > immediately used to hold a new reference picture in DPB. So based on
>> > the above point IMO it is better to find the free_slots through two
>> > steps to hold the new reference picture in DPB.
>>
>> This would only hide further this situation, but OK.
>>
>> Regards,
>> Gwenole.
>>
>> >> >> +    /* Remove obsolete entries from the internal DPB */
>> >> >> +    for (i = 0, n = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
>> >> >> +        GenFrameStore * const fs = &frame_store[i];
>> >> >> +        if (fs->surface_id == VA_INVALID_ID || !fs->obj_surface) {
>> >> >> +            free_slots[n++] = i;
>> >> >>              continue;
>> >> >> +        }
>> >> >>
>> >> >> -        assert(frame_store[i].frame_store_id != -1);
>> >> >> -
>> >> >> -        for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
>> >> >> -            VAPictureH264 *ref_pic =
>> &pic_param->ReferenceFrames[j];
>> >> >> -            if (ref_pic->flags & VA_PICTURE_H264_INVALID)
>> >> >> +        // Find whether the current entry is still a valid reference
>> frame
>> >> >> +        found = false;
>> >> >> +        for (j = 0; j < ARRAY_ELEMS(decode_state->reference_objects);
>> j++) {
>> >> >> +            struct object_surface * const obj_surface =
>> >> >> +                decode_state->reference_objects[j];
>> >> >> +            if (!obj_surface)
>> >> >>                  continue;
>> >> >> -
>> >> >> -            if (frame_store[i].surface_id == ref_pic->picture_id) {
>> >> >> -                found = 1;
>> >> >> +            if ((found = obj_surface == fs->obj_surface))
>> >> >>                  break;
>> >> >> -            }
>> >> >>          }
>> >> >>
>> >> >> -        /* remove it from the internal DPB */
>> >> >> +        // ... or remove it
>> >> >>          if (!found) {
>> >> >> -            frame_store[i].surface_id = VA_INVALID_ID;
>> >> >> -            frame_store[i].frame_store_id = -1;
>> >> >> -            frame_store[i].obj_surface = NULL;
>> >> >> +            fs->surface_id = VA_INVALID_ID;
>> >> >> +            fs->obj_surface = NULL;
>> >> >> +            fs->frame_store_id = -1;
>> >> >> +            free_slots[n++] = i;
>> >> >>          }
>> >> >>      }
>> >> >> +    num_free_slots = n;
>> >> >>
>> >> >> -    for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
>> >> >> -        VAPictureH264 *ref_pic = &pic_param->ReferenceFrames[i];
>> >> >> -        int found = 0;
>> >> >> -
>> >> >> -        if (ref_pic->flags & VA_PICTURE_H264_INVALID ||
>> >> >> -            ref_pic->picture_id == VA_INVALID_SURFACE ||
>> >> >> -            decode_state->reference_objects[i] == NULL)
>> >> >> +    /* Append the new reference frames */
>> >> >> +    for (i = 0, n = 0; i < ARRAY_ELEMS(decode_state->reference_objects);
>> i++) {
>> >> >> +        struct object_surface * const obj_surface =
>> >> >> +            decode_state->reference_objects[i];
>> >> >> +        if (!obj_surface)
>> >> >>              continue;
>> >> >>
>> >> >> +        // Find whether the current frame is not already in our frame
>> store
>> >> >> +        found = false;
>> >> >>          for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
>> >> >> -            if (frame_store[j].surface_id == ref_pic->picture_id) {
>> >> >> -                found = 1;
>> >> >> +            if ((found = frame_store[j].obj_surface ==
>> >> >> + obj_surface))
>> >> >>                  break;
>> >> >> -            }
>> >> >>          }
>> >> >>
>> >> >> -        /* add the new reference frame into the internal DPB */
>> >> >> +        // ... or add it
>> >> >>          if (!found) {
>> >> >> -            int frame_idx;
>> >> >> -            int slot_found;
>> >> >> -            struct object_surface *obj_surface =
>> decode_state->reference_objects[i];
>> >> >> -
>> >> >> -            slot_found = 0;
>> >> >> -            frame_idx = -1;
>> >> >> -            /* Find a free frame store index */
>> >> >> -            for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
>> >> >> -                if (frame_store[j].surface_id == VA_INVALID_ID ||
>> >> >> -                    frame_store[j].obj_surface == NULL) {
>> >> >> -                    frame_idx = j;
>> >> >> -                    slot_found = 1;
>> >> >> -                    break;
>> >> >> -                }
>> >> >> +            if (n < num_free_slots) {
>> >> >> +                GenFrameStore * const fs =
>> &frame_store[free_slots[n++]];
>> >> >> +                fs->surface_id = obj_surface->base.id;
>> >> >> +                fs->obj_surface = obj_surface;
>> >> >> +                fs->frame_store_id = fs - frame_store;
>> >> >> +            }
>> >> >> +            else {
>> >> >> +                WARN_ONCE("No free slot found for DPB reference
>> >> >> + list!!!\n");
>> >> >>              }
>> >> >> -
>> >> >> -
>> >> >> -         if (slot_found) {
>> >> >> -                frame_store[j].surface_id = ref_pic->picture_id;
>> >> >> -                frame_store[j].frame_store_id = frame_idx;
>> >> >> -                frame_store[j].obj_surface = obj_surface;
>> >> >> -            } else {
>> >> >> -                WARN_ONCE("Not free slot for DPB reference
>> list!!!\n");
>> >> >> -         }
>> >> >>          }
>> >> >>      }
>> >> >> -
>> >> >>  }
>> >> >>
>> >> >>  void
>> >> >
>> >> >
>> >
>> >
>> _______________________________________________
>> Libva mailing list
>> Libva at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/libva


More information about the Libva mailing list