[Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify and optimize reference frame store updates.

Zhao, Yakui yakui.zhao at intel.com
Thu Jun 5 18:45:07 PDT 2014


On Thu, 2014-06-05 at 10:20 -0600, Gwenole Beauchesne wrote:
> 2014-06-05 17:29 GMT+02:00 Xiang, Haihao <haihao.xiang at intel.com>:
> >
> >
> >> -----Original Message-----
> >> From: Gwenole Beauchesne [mailto:gb.devel at gmail.com]
> >> Sent: Thursday, June 05, 2014 9:29 PM
> >> To: Xiang, Haihao
> >> Cc: Zhao, Yakui; libva at lists.freedesktop.org
> >> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify and
> >> optimize reference frame store updates.
> >>
> >> Hi,
> >>
> >> 2014-06-05 14:44 GMT+02:00 Xiang, Haihao <haihao.xiang at intel.com>:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Libva [mailto:libva-bounces at lists.freedesktop.org] On Behalf Of
> >> >> Gwenole Beauchesne
> >> >> Sent: Thursday, June 05, 2014 5:20 PM
> >> >> To: Zhao, Yakui
> >> >> Cc: libva at lists.freedesktop.org
> >> >> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264:
> >> >> simplify and optimize reference frame store updates.
> >> >>
> >> >> 2014-06-04 7:30 GMT+02:00 Zhao, Yakui <yakui.zhao at intel.com>:
> >> >> > On Tue, 2014-06-03 at 22:59 -0600, Gwenole Beauchesne wrote:
> >> >> >> 2014-06-04 4:11 GMT+02:00 Zhao, Yakui <yakui.zhao at intel.com>:
> >> >> >> > On Tue, 2014-06-03 at 10:43 -0600, Gwenole Beauchesne wrote:
> >> >> >> >> Simplify and optimize the update process of the reference frame
> >> store.
> >> >> >> >> Use less iterations to look up existing objects. Use a cache to
> >> >> >> >> store the free'd slots.
> >> >> >> >>
> >> >> >> >> Prerequisite: the reference_objects[] array was previously
> >> >> >> >> arranged in a way that the element at index i is exactly the
> >> >> >> >> object_surface that corresponds to the VA surface identified by
> >> >> >> >> the VAPictureH264.picture_id located at index i in the
> >> >> >> >> ReferenceFrames[]
> >> >> array.
> >> >> >> >
> >> >> >> > As a whole, this patch looks good to me. Based on the hardware
> >> >> >> > requirement, it will be better that one reference picture later
> >> >> >> > used is not assigned for the other purpose even when it is not
> >> >> >> > used in current reference frame list. In such case this patch
> >> >> >> > does some optimization about DPB, which can defer the usage of
> >> >> >> > one picture later used that is not in current reference frame.especially
> >> when:
> >> >> >> > one picture is not used during decoding frame X while it is used
> >> >> >> > during
> >> >> decoding frame X + 1.
> >> >> >>
> >> >> >> The case you mention was supposed to be fixed with the other patch
> >> >> >> series where the re-factorisation is a pre-requisite. :) A very
> >> >> >> strict GenFrameStoreContext needs to be maintained with the set of
> >> >> >> used_frames[] as it is today, and an additional set of
> >> >> >> free_frames[] *candidates*. Though, this case is not covered by
> >> >> >> the conformance suite, and I haven't found any real world case
> >> >> >> exhibiting that, neither did I try to generate one yet. So we can defer this
> >> one.
> >> >> >>
> >> >> >> > Although it still has no improvment for the following scenario,
> >> >> >> > it still is an improvement.
> >> >> >> >     >One picture is not used during decoding frame X and X + 1
> >> >> >> > while it is used during decoding frame X + 2.
> >> >> >> >
> >> >> >> > Of course the perfect solution is that the upper layer can
> >> >> >> > assure that one picture later used is still added into the DPB
> >> >> >> > of current frame although it is not used in current reference list.
> >> >> >>
> >> >> >> No, this is not going to happen. This would be an implementation
> >> >> >> error of the standard.
> >> >> >
> >> >> > In theory it is no problem for normal H264. But it is true for MVC
> >> >> > case as some reference frames for view 1 is not used when decoding view
> >> 0.
> >> >>
> >> >> When I said this was not going to happen, this concerned the "perfect
> >> >> solution" you proposed, which is not one. That would be a workaround
> >> >> to driver bugs, and an error of implementation of the standard. Of
> >> >> course, the DPB holds all the necessary view components, but the
> >> >> VAPictureH264.ReferenceFrames[] is not exactly the DPB, and it cannot be.
> >> >
> >> > The number of elements in VAPictureH264.ReferenceFrames[] is 16, so it
> >> > is enough to hold All reference surfaces for H.264. and the patch works as
> >> before, I am OK for your patch.
> >> >
> >> >>
> >> >> The case you mention is what I said a paragraph above, and the only
> >> "perfect"
> >> >> way to solve it is to correctly track the references in the driver
> >> >> side. This was what the GenFrameStoreContext was for. However, as I
> >> >> mentioned, there is no supporting case in the conformance test suite,
> >> >> and I haven't come by a real-world sample exhibiting this.
> >> >
> >> > The case Yakui mentioned is not for H.264 as a surface should be in
> >> > DPB if it is referenced by later Surface.
> >> >
> >> > Xiaowei told us each view uses a separated DPB buffer for MVC
> >>
> >> There is no such thing in MVC. There is a unique DPB with a size scalable by a
> >> log2 number of views.
> >>
> >> > A case might occurs as below:
> >> >
> >> > View0: f0, f1, f2, f3, ...
> >> > View1: s0, s1, s2, s3, ...
> >> >
> >> > f0 is referenced by f1, but f0 isn't referenced by other surfaces in view0,  f1
> >> is reference by f2.
> >> > F0 is also referenced by s1 , both f0 and f1 are referenced by S2 in
> >> > view1
> >> >
> >> > When decoding f1, f0 is added into View0's DPB, so it is added into
> >> > frame store too, the frame store index for f0 is *0*.
> >> >
> >> > When decoding s1, f0 is added into view1's DPB, it has been in the frame
> >> store, the frame store index for f0 is still *0*.
> >> >
> >> > When decoding f2, f0 is removed from view0's DPB and f1 is added into
> >> > view0's DPB, so f0 is removed from frame store and f1 is added into
> >> > frame store too, now the frame store index For f1 is 0,
> >> >
> >> > When decoding s2, f0 is still in View1's DPB so it will be added into
> >> > the frame store again, and now the frame store index for f0 is *1*
> >> >
> >> > The frame store index for f0 is changed from 0 to 1 and the decoding result
> >> will be wrong.
> >>
> >> That's exactly the thing we are discussing since the beginning. :)
> >
> > The above case doesn't exist if MVC uses a unique DBP buffer as f0 is still in the DBP after
> > Decoding f2, hence the driver won't remove f0 from the frame store.
> 
> Not necessarily. Anyway, your described case does not exist at all
> because you can only have inter-view prediction with frames from the
> same access unit. So, you cannot have f0+f1 referenced by s2, only f2
> and s0+s1 could be referenced by s2. But anyway, that's the idea. A
> valid example could be f1+f2 for s2, then f1+f2 for f3, then f3+s1+s2
> for s3.
> 
> Because of the additional constraint I gave that ReferenceFrames[]
> cannot always be the DPB because MVC is not limited to 2 views to
> begin with, the case I describe is what will give issues. This is
> because you cannot simply always append inter-view reference
> components in the ReferenceFrames[] array because this is not scalable
> for MultiView High with a higher number of views. This could work for
> Stereo High because the DPB is always capped to 16 frames at most.

> Besides, you'd better see the ReferenceFrames[] array as RefPicList0[]
> U RefPicList1[] rather than the DPB. The DPB, as per the H.264 spec
> definition, could have more than 16 entries for Multiview High
> profile. And I don't really want to workaround/do different things for
> Stereo High profile vs. Multiview High profile. So, the
> ReferenceFrames[] is to be filled in with the reference frames
> effectively used (L0+L1) for decoding the current picture.

If the ReferenceFrames[] only holds the RefPicList0 + RefPicList1
instead of DPB, then it will trigger the issue Haihao and I mentioned
several times in previous emails. Some reference picture is not used in
RefPicList0/1 during decoding frame X while it will be used later in
decoding frame X+1. So the upper middles will add it into the
ReferenceFrames[] even although it is not used when decoding the current
frame. If not, the driver still can trigger the corresponding issue
although it can reduce the possibility of triggering issue after some
workaround. 



> 
> Regards,
> Gwenole.
> 
> >
> >>
> >> And there is no way to handle that from the unique GenFrameStore array as it
> >> is today, even if you delay the marking as "free", or whatever.
> >> You need a more elaborated data struct. And this is what was to be provided
> >> with the GenFrameStoreContext and additional re-factoring so that I don't
> >> have to copy code around again.
> >>
> >> I am working on an alternative that wouldn't need too many changes at this
> >> time.
> >>
> >> Regards,
> >> Gwenole.
> >>
> >> >
> >> >>
> >> >> >> >> Theory of operations:
> >> >> >> >>
> >> >> >> >> 1. Obsolete entries are removed first, i.e. entries in the internal DPB
> >> >> >> >>    that no longer have a match in the supplied ReferenceFrames[]
> >> array.
> >> >> >> >>    That obsolete entry index is stored in a local cache: free_slots[].
> >> >> >> >>
> >> >> >> >> 2. This cache is completed with entries considered as "invalid" or "not
> >> >> >> >>    present", sequentially while traversing the frame store for
> >> obsolete
> >> >> >> >>    entries. At the end of this removal process, the free_slots[] array
> >> >> >> >>    represents all possible indices in there that could be re-used for
> >> >> >> >>    new reference frames to track.
> >> >> >> >>
> >> >> >> >> 3. The list of ReferenceFrames[] objects is traversed for new entries
> >> >> >> >>    that are not already in the frame store. If an entry needs to be
> >> >> >> >>    added, it is placed at the index obtained from the next free_slots[]
> >> >> >> >>    element. There is no need to traverse the frame store array again,
> >> >> >> >>    the next available slot can be known from that free_slots[] cache.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Gwenole Beauchesne
> >> >> >> >> <gwenole.beauchesne at intel.com>
> >> >> >> >> ---
> >> >> >> >>  src/i965_decoder_utils.c |  107
> >> >> >> >> ++++++++++++++++++++--------------------------
> >> >> >> >>  1 file changed, 46 insertions(+), 61 deletions(-)
> >> >> >> >>
> >> >> >> >> diff --git a/src/i965_decoder_utils.c
> >> >> >> >> b/src/i965_decoder_utils.c index 2570be9..c686235 100644
> >> >> >> >> --- a/src/i965_decoder_utils.c
> >> >> >> >> +++ b/src/i965_decoder_utils.c
> >> >> >> >> @@ -418,88 +418,73 @@ gen6_send_avc_ref_idx_state(  }
> >> >> >> >>
> >> >> >> >>  void
> >> >> >> >> -intel_update_avc_frame_store_index(VADriverContextP ctx,
> >> >> >> >> -                                   struct decode_state
> >> >> *decode_state,
> >> >> >> >> -
> >> >> VAPictureParameterBufferH264 *pic_param,
> >> >> >> >> -                                   GenFrameStore
> >> >> frame_store[MAX_GEN_REFERENCE_FRAMES])
> >> >> >> >> +intel_update_avc_frame_store_index(
> >> >> >> >> +    VADriverContextP              ctx,
> >> >> >> >> +    struct decode_state          *decode_state,
> >> >> >> >> +    VAPictureParameterBufferH264 *pic_param,
> >> >> >> >> +    GenFrameStore
> >> >> frame_store[MAX_GEN_REFERENCE_FRAMES]
> >> >> >> >> +)
> >> >> >> >>  {
> >> >> >> >> -    int i, j;
> >> >> >> >> -
> >> >> >> >> -    assert(MAX_GEN_REFERENCE_FRAMES ==
> >> >> ARRAY_ELEMS(pic_param->ReferenceFrames));
> >> >> >> >> -
> >> >> >> >> -    for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
> >> >> >> >> -        int found = 0;
> >> >> >> >> -
> >> >> >> >> -        if (frame_store[i].surface_id == VA_INVALID_ID ||
> >> >> >> >> -            frame_store[i].obj_surface == NULL)
> >> >> >> >> +    int free_slots[MAX_GEN_REFERENCE_FRAMES];
> >> >> >> >> +    int i, j, n, num_free_slots;
> >> >> >> >> +    bool found;
> >> >> >> >> +
> >> >> >> >
> >> >> >> > It will be better to fill the free_slots through the following
> >> >> >> > two loops instead of one loop, which is to try to defer the
> >> >> >> > usage of the reference picture later used that is not used in
> >> >> >> > current reference frame
> >> >> list.
> >> >> >> > a. check the invalid surface_id or NULL surface and then add it
> >> >> >> > into free_slots.
> >> >> >> > b. check whether one picture is used by the current reference
> >> >> >> > frame list. If not, add it into the free_slots.
> >> >> >>
> >> >> >> The previous algorithm was filling in the holes. This one does the
> >> >> >> same. I don't see where your proposal could bring an improvement?
> >> >> >
> >> >> > In your patch you use the free_slots to find one free slot to hold
> >> >> > a new reference picture in DPB. Right? It will firstly try to find
> >> >> > one free_slot with invalid picture_id or surface, which is used to
> >> >> > hold a new reference picture in DPB.
> >> >>
> >> >> No, it will take whatever comes first between:
> >> >> - a slot with an invalid picture_id or surface ;
> >> >> - or a slot with a retired reference picture candidate, i.e.
> >> >> something that cannot be found in the ReferenceFrames[] array.
> >> >> And this works exactly the same way as before. No functional change yet.
> >> >>
> >> >> > It can remove the possibility that one picture later used is
> >> >> > immediately used to hold a new reference picture in DPB. So based
> >> >> > on the above point IMO it is better to find the free_slots through
> >> >> > two steps to hold the new reference picture in DPB.
> >> >>
> >> >> This would only hide further this situation, but OK.
> >> >>
> >> >> Regards,
> >> >> Gwenole.
> >> >>
> >> >> >> >> +    /* Remove obsolete entries from the internal DPB */
> >> >> >> >> +    for (i = 0, n = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
> >> >> >> >> +        GenFrameStore * const fs = &frame_store[i];
> >> >> >> >> +        if (fs->surface_id == VA_INVALID_ID || !fs->obj_surface) {
> >> >> >> >> +            free_slots[n++] = i;
> >> >> >> >>              continue;
> >> >> >> >> +        }
> >> >> >> >>
> >> >> >> >> -        assert(frame_store[i].frame_store_id != -1);
> >> >> >> >> -
> >> >> >> >> -        for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
> >> >> >> >> -            VAPictureH264 *ref_pic =
> >> >> &pic_param->ReferenceFrames[j];
> >> >> >> >> -            if (ref_pic->flags & VA_PICTURE_H264_INVALID)
> >> >> >> >> +        // Find whether the current entry is still a valid
> >> >> >> >> + reference
> >> >> frame
> >> >> >> >> +        found = false;
> >> >> >> >> +        for (j = 0; j <
> >> >> >> >> + ARRAY_ELEMS(decode_state->reference_objects);
> >> >> j++) {
> >> >> >> >> +            struct object_surface * const obj_surface =
> >> >> >> >> +                decode_state->reference_objects[j];
> >> >> >> >> +            if (!obj_surface)
> >> >> >> >>                  continue;
> >> >> >> >> -
> >> >> >> >> -            if (frame_store[i].surface_id == ref_pic->picture_id) {
> >> >> >> >> -                found = 1;
> >> >> >> >> +            if ((found = obj_surface == fs->obj_surface))
> >> >> >> >>                  break;
> >> >> >> >> -            }
> >> >> >> >>          }
> >> >> >> >>
> >> >> >> >> -        /* remove it from the internal DPB */
> >> >> >> >> +        // ... or remove it
> >> >> >> >>          if (!found) {
> >> >> >> >> -            frame_store[i].surface_id = VA_INVALID_ID;
> >> >> >> >> -            frame_store[i].frame_store_id = -1;
> >> >> >> >> -            frame_store[i].obj_surface = NULL;
> >> >> >> >> +            fs->surface_id = VA_INVALID_ID;
> >> >> >> >> +            fs->obj_surface = NULL;
> >> >> >> >> +            fs->frame_store_id = -1;
> >> >> >> >> +            free_slots[n++] = i;
> >> >> >> >>          }
> >> >> >> >>      }
> >> >> >> >> +    num_free_slots = n;
> >> >> >> >>
> >> >> >> >> -    for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) {
> >> >> >> >> -        VAPictureH264 *ref_pic = &pic_param->ReferenceFrames[i];
> >> >> >> >> -        int found = 0;
> >> >> >> >> -
> >> >> >> >> -        if (ref_pic->flags & VA_PICTURE_H264_INVALID ||
> >> >> >> >> -            ref_pic->picture_id == VA_INVALID_SURFACE ||
> >> >> >> >> -            decode_state->reference_objects[i] == NULL)
> >> >> >> >> +    /* Append the new reference frames */
> >> >> >> >> +    for (i = 0, n = 0; i <
> >> >> >> >> + ARRAY_ELEMS(decode_state->reference_objects);
> >> >> i++) {
> >> >> >> >> +        struct object_surface * const obj_surface =
> >> >> >> >> +            decode_state->reference_objects[i];
> >> >> >> >> +        if (!obj_surface)
> >> >> >> >>              continue;
> >> >> >> >>
> >> >> >> >> +        // Find whether the current frame is not already in
> >> >> >> >> + our frame
> >> >> store
> >> >> >> >> +        found = false;
> >> >> >> >>          for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
> >> >> >> >> -            if (frame_store[j].surface_id == ref_pic->picture_id) {
> >> >> >> >> -                found = 1;
> >> >> >> >> +            if ((found = frame_store[j].obj_surface ==
> >> >> >> >> + obj_surface))
> >> >> >> >>                  break;
> >> >> >> >> -            }
> >> >> >> >>          }
> >> >> >> >>
> >> >> >> >> -        /* add the new reference frame into the internal DPB */
> >> >> >> >> +        // ... or add it
> >> >> >> >>          if (!found) {
> >> >> >> >> -            int frame_idx;
> >> >> >> >> -            int slot_found;
> >> >> >> >> -            struct object_surface *obj_surface =
> >> >> decode_state->reference_objects[i];
> >> >> >> >> -
> >> >> >> >> -            slot_found = 0;
> >> >> >> >> -            frame_idx = -1;
> >> >> >> >> -            /* Find a free frame store index */
> >> >> >> >> -            for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) {
> >> >> >> >> -                if (frame_store[j].surface_id == VA_INVALID_ID ||
> >> >> >> >> -                    frame_store[j].obj_surface == NULL) {
> >> >> >> >> -                    frame_idx = j;
> >> >> >> >> -                    slot_found = 1;
> >> >> >> >> -                    break;
> >> >> >> >> -                }
> >> >> >> >> +            if (n < num_free_slots) {
> >> >> >> >> +                GenFrameStore * const fs =
> >> >> &frame_store[free_slots[n++]];
> >> >> >> >> +                fs->surface_id = obj_surface->base.id;
> >> >> >> >> +                fs->obj_surface = obj_surface;
> >> >> >> >> +                fs->frame_store_id = fs - frame_store;
> >> >> >> >> +            }
> >> >> >> >> +            else {
> >> >> >> >> +                WARN_ONCE("No free slot found for DPB
> >> >> >> >> + reference list!!!\n");
> >> >> >> >>              }
> >> >> >> >> -
> >> >> >> >> -
> >> >> >> >> -         if (slot_found) {
> >> >> >> >> -                frame_store[j].surface_id = ref_pic->picture_id;
> >> >> >> >> -                frame_store[j].frame_store_id = frame_idx;
> >> >> >> >> -                frame_store[j].obj_surface = obj_surface;
> >> >> >> >> -            } else {
> >> >> >> >> -                WARN_ONCE("Not free slot for DPB reference
> >> >> list!!!\n");
> >> >> >> >> -         }
> >> >> >> >>          }
> >> >> >> >>      }
> >> >> >> >> -
> >> >> >> >>  }
> >> >> >> >>
> >> >> >> >>  void
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >> _______________________________________________
> >> >> Libva mailing list
> >> >> Libva at lists.freedesktop.org
> >> >> http://lists.freedesktop.org/mailman/listinfo/libva




More information about the Libva mailing list