[Mesa-dev] [PATCH 2/2] st/vdpau: use lanczos filter for scaling v4

Fri Sep 2 15:21:29 UTC 2016

On 09/02/2016 10:48 AM, Christian König wrote:
> Am 02.09.2016 um 16:10 schrieb Leo Liu:
>>
>>
>> On 09/02/2016 09:50 AM, Christian König wrote:
>>> Am 02.09.2016 um 15:27 schrieb Leo Liu:
>>>>
>>>>
>>>> On 09/02/2016 02:11 AM, Christian König wrote:
>>>>> Am 02.09.2016 um 04:03 schrieb Michel Dänzer:
>>>>>> On 02/09/16 10:17 AM, Michel Dänzer wrote:
>>>>>>> On 02/09/16 12:58 AM, Leo Liu wrote:
>>>>>>>> On 09/01/2016 11:54 AM, Nayan Deshmukh wrote:
>>>>>>>>> I saw the code in dri3_glx.c and I could somewhat relate some basic
>>>>>>>>> code structure to the vl_winsys_dri3.c. But I am new to this and not
>>>>>>>>> aware of the terminology that you used about the buffers. Could you
>>>>>>>>> please explain what needs to be done in more detail or point me to
>>>>>>>>> where I can read about it.
>>>>>>>> I believe it's from loader_dri3_helper.c with the "is_different_gpu"
>>>>>>>> condition true, which will include the back buffer and front buffer
>>>>>>>> cases. You could try only the back buffer case for now.
>>>>>>> From a high level, PRIME mainly affects presentation, not so much the
>>>>>>> video decoding / rendering. The important thing is that the buffer used
>>>>>>> for presentation via the Present extension is linear, not tiled. I'm not
>>>>>>> sure whether it makes more sense to allocate a separate linear buffer
>>>>>>> for this purpose, as is done for GLX, or for the vl code to make the
>>>>>>> corresponding back (or front?) buffer linear in the first place.
>>>>>> A separate linear buffer is probably better, actually, since it will
>>>>>> also be pinned to system memory while it's being shared with another GPU.
>>>>>
>>>>> Yes, I agree. Nayan should also work on avoiding the extra copy
>>>>> that currently occurs because we can't allocate output buffers
>>>>> directly in the format needed for presentation.
>>>>>
>>>>> The general idea should be to check during presentation whether the
>>>>> format of the output surface is directly displayable.
>>>>
>>>> Also, we have to consider the case where the drawable is resized.
>>>
>>> Actually we don't. Take a look at the VDPAU spec: the output surface
>>> should be sent for display regardless of its size.
>>>
>>> E.g. when the window is 256x256 pixels, but the application 
>>> allocated an output surface of 1024x768 we should still send the 
>>> whole surface to the X server.
>>>
>>> It's the job of the application to resize the output surfaces, not
>>> that of the VDPAU state tracker.
>>
>> I thought this was done by the vl compositor during presentation, scaling
>> up or down from the output surface to the back buffer based on the resize.
>
> No, that is incorrect. Take a look at the VDPAU spec:
>
>> Applications may choose to allow resizing of the presentation queue 
>> target (which may be e.g. a regular Window when using an X11-based 
>> implementation).
>>
>> *clip_width* and *clip_height* may be used to limit the size of the 
>> displayed region of a surface, in order to match the specific region 
>> that was rendered to.
>>
>> In turn, this allows the application to allocate over-sized (e.g. 
>> screen-sized) surfaces, but render to a region that matches the 
>> current size of the video window.
>>
>> Using this technique, an application's response to window resizing 
>> may simply be to render to, and display, a different region of the 
>> surface, rather than de-/re-allocation of surfaces to match the 
>> updated window size.
>>
>
> This means that we should send the output surface to X at its original
> size, no matter what that size is or how large the window displaying it is.
>
> That wasn't possible with DRI2, that's why we have that workaround 
> with the delayed rendering in the mixer.
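
For reference, the clipping behaviour in the quoted spec text can be sketched roughly as below. This is only an illustration under stated assumptions: struct present_rect and displayed_region() are hypothetical names, not Mesa or VDPAU API, and treating a clip value of 0 as "no clipping" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rectangle type for this sketch; not a Mesa or VDPAU struct. */
struct present_rect {
   uint32_t x, y, width, height;
};

/*
 * Pick the region of an (possibly over-sized) output surface to display.
 * clip_width/clip_height limit the displayed region; a value of 0 is
 * treated as "no clipping" (assumption of this sketch). The surface
 * itself is never resized here, only the displayed region changes.
 */
static struct present_rect
displayed_region(uint32_t surf_width, uint32_t surf_height,
                 uint32_t clip_width, uint32_t clip_height)
{
   struct present_rect r = { 0, 0, surf_width, surf_height };

   assert(surf_width > 0 && surf_height > 0);

   if (clip_width && clip_width < surf_width)
      r.width = clip_width;
   if (clip_height && clip_height < surf_height)
      r.height = clip_height;

   return r;
}
```

With a 1024x768 output surface and a 256x256 window, an application would pass clip values of 256x256 and keep the surface allocation unchanged across resizes.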

I did a quick hack on a single GPU and tested it; this shows that the
whole idea works, including resizing.
A linear surface is still displayable, but playback looks somewhat sluggish.

Here is the hack for reference. It removes the back buffer creation and
the presentation rendering, and uses the output surface handle for X:

diff --git a/src/gallium/auxiliary/vl/vl_winsys.h b/src/gallium/auxiliary/vl/vl_winsys.h
index 26db9f2..908ec3a 100644
--- a/src/gallium/auxiliary/vl/vl_winsys.h
+++ b/src/gallium/auxiliary/vl/vl_winsys.h
@@ -59,6 +59,9 @@ struct vl_screen
     void *
     (*get_private)(struct vl_screen *vscreen);

+   void *
+   (*set_output_handle)(struct vl_screen *vscreen, struct winsys_handle whandle);
+
     struct pipe_screen *pscreen;
     struct pipe_loader_device *dev;
  };
diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..4908699 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -64,6 +64,9 @@ struct vl_dri3_screen
     xcb_connection_t *conn;
     xcb_drawable_t drawable;

+   unsigned output_handle;
+   unsigned output_stride;
+
     uint32_t width, height, depth;

     xcb_present_event_t eid;
@@ -225,6 +228,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
     if (!shm_fence)
        goto close_fd;

+#if 0
     memset(&templ, 0, sizeof(templ));
     templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
                  PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
@@ -248,6 +252,11 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
                                             usage);
     buffer_fd = whandle.handle;
     buffer->pitch = whandle.stride;
+#endif
+
+   buffer_fd = scrn->output_handle;
+   buffer->pitch = scrn->output_stride;
+
     xcb_dri3_pixmap_from_buffer(scrn->conn,
                                 (pixmap = xcb_generate_id(scrn->conn)),
                                 scrn->drawable,
@@ -591,6 +600,15 @@ vl_dri3_screen_get_private(struct vl_screen *vscreen)
  }

  static void
+vl_dri3_set_output_handle(struct vl_screen *vscreen, struct winsys_handle whandle)
+{
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   scrn->output_handle = whandle.handle;
+   scrn->output_stride = whandle.stride;
+}
+
+static void
  vl_dri3_screen_destroy(struct vl_screen *vscreen)
  {
     struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
@@ -706,6 +724,7 @@ vl_dri3_screen_create(Display *display, int screen)
     scrn->base.set_next_timestamp = vl_dri3_screen_set_next_timestamp;
     scrn->base.get_private = vl_dri3_screen_get_private;
     scrn->base.pscreen->flush_frontbuffer = vl_dri3_flush_frontbuffer;
+   scrn->base.set_output_handle = vl_dri3_set_output_handle;

     return &scrn->base;

diff --git a/src/gallium/state_trackers/vdpau/presentation.c b/src/gallium/state_trackers/vdpau/presentation.c
index 2862eaf..d5e832e 100644
--- a/src/gallium/state_trackers/vdpau/presentation.c
+++ b/src/gallium/state_trackers/vdpau/presentation.c
@@ -30,6 +30,11 @@

  #include "util/u_debug.h"
  #include "util/u_memory.h"
+#include "util/u_sampler.h"
+#include "util/u_format.h"
+#include "util/u_surface.h"
+
+#include "state_tracker/drm_driver.h"

  #include "vdpau_private.h"

@@ -216,6 +221,7 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
     struct vl_compositor *compositor;
     struct vl_compositor_state *cstate;
     struct vl_screen *vscreen;
+   struct winsys_handle whandle;

     pq = vlGetDataHTAB(presentation_queue);
     if (!pq)
@@ -231,14 +237,26 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
     vscreen = pq->device->vscreen;

     pipe_mutex_lock(pq->device->mutex);
+
+   memset(&whandle, 0, sizeof(struct winsys_handle));
+   whandle.type = DRM_API_HANDLE_TYPE_FD;
+
+   if (!vscreen->pscreen->resource_get_handle(vscreen->pscreen, surf->device->context,
+                                     surf->surface->texture, &whandle,
+                     PIPE_HANDLE_USAGE_READ_WRITE))
+      return VDP_STATUS_NO_IMPLEMENTATION;
+
+   vscreen->set_output_handle(vscreen, whandle);
+
     tex = vscreen->texture_from_drawable(vscreen, (void *)pq->drawable);
-   if (!tex) {
-      pipe_mutex_unlock(pq->device->mutex);
-      return VDP_STATUS_INVALID_HANDLE;
-   }
+//   if (!tex) {
+//      pipe_mutex_unlock(pq->device->mutex);
+//      return VDP_STATUS_INVALID_HANDLE;
+//   }

     dirty_area = vscreen->get_dirty_area(vscreen);

+#if 0
     memset(&surf_templ, 0, sizeof(surf_templ));
     surf_templ.format = tex->format;
     surf_draw = pipe->create_surface(pipe, tex, &surf_templ);
@@ -269,6 +287,7 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
        vl_compositor_set_dst_clip(cstate, &dst_clip);
        vl_compositor_render(cstate, compositor, surf_draw, dirty_area, true);
     }
+#endif

     vscreen->set_next_timestamp(vscreen, earliest_presentation_time);
     pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
@@ -294,8 +313,10 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
        framenum++;
     }

+#if 0
     pipe_resource_reference(&tex, NULL);
     pipe_surface_reference(&surf_draw, NULL);
+#endif
     pipe_mutex_unlock(pq->device->mutex);

     return VDP_STATUS_OK;


Regards,
Leo


>
> But no worries, it's only a minor issue and a good task for Nayan to get
> deeper into the graphics stack.
>
> Regards,
> Christian.
>
>>
>> Regards,
>> Leo
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> Leo
>>>>
>>>>> If that is the case, then the handle of that surface should be sent
>>>>> directly to X.
>>>>> If that isn't the case, we reallocate the backing buffer, copy the
>>>>> content of the output surface into it and then send the new handle
>>>>> to X.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>
>>>
>>>
>>
>
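
The direct-versus-copy decision described in the quoted text (send the output surface handle straight to X when it is displayable, otherwise copy into a backing buffer first) might be sketched as follows. All names here are hypothetical and not part of Mesa; the needs_linear flag is an assumption standing in for the PRIME case mentioned earlier in the thread, where the presented buffer has to be linear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an output surface; not a Mesa structure. */
struct output_surface_info {
   uint32_t format;   /* pixel format identifier */
   bool     linear;   /* true when the layout is linear rather than tiled */
};

enum present_path {
   PRESENT_DIRECT,    /* hand the surface's own handle to X */
   PRESENT_COPY       /* copy into a displayable backing buffer first */
};

/*
 * Decide at presentation time whether the output surface can be shown
 * as-is. displayable_format is the format the window system expects;
 * needs_linear models the different-GPU case (assumption of this sketch).
 */
static enum present_path
choose_present_path(const struct output_surface_info *surf,
                    uint32_t displayable_format, bool needs_linear)
{
   assert(surf);

   if (surf->format != displayable_format)
      return PRESENT_COPY;
   if (needs_linear && !surf->linear)
      return PRESENT_COPY;

   return PRESENT_DIRECT;
}
```

Only the PRESENT_COPY path would need the extra backing buffer allocation and copy; the PRESENT_DIRECT path corresponds to what the hack above does unconditionally.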
