[PATCH 01/21] vulkan: Add KHR_display extension using DRM

Wed Mar 7 20:15:15 UTC 2018

Jason Ekstrand <jason at jlekstrand.net> writes:

Thanks a million for the intense review. I've pushed 15 tiny patches
that address individual questions in this message. If you want to look
at those separately, that would be great. When we're done, I'll merge
them back into the giant patch.

Sorry for the delay in answering; you asked a hard question about GEM
handles and reference counting which I think I've resolved by
eliminating the ability to share the same fd for modesetting and
rendering.

Let me know if I've deleted too much context from your answers.

> With the exception of NIR (which is a bit odd), we usually use ALL_CAPS for
> enum values.

Thanks, fixed.

> Yes, this does seem like the only reasonable mode for now.  I suppose you
> could check to see if the platform supports ASYNC_FLIP and advertise
> IMMEDIATE in that case.  I know intel doesn't advertise it but I thought
> amdgpu does.

Ok, we'll put that on the wish list, along with MAILBOX (which
applications actually want :-)

> vk_outarray, though I suppose you've probably already made that
> change.

Yup, vk_outarray everywhere now.

> I don't see use_prime_blit being set anywhere.  Perhaps that comes in a
> later patch?  The line is obviously correct, so I'm fine with leaving
> it.

You're right -- this was a cult-n-paste error. I don't support prime at
all, so I've removed this bit.

> This will have to be updated for modifiers.  I'm reasonably happy for it to
> be the no-op update for now though KHR_display supporting real modifiers
> would be nice. :-)

Yeah, "patches welcome", of course. I don't have a requirement for them
on Radeon at this point.

>> +   if (result != VK_SUCCESS)
>> +      return result;
>> +
>> +   ret = drmPrimeFDToHandle(wsi->master_fd, image->base.fd,
>> &image_handle);
>>
>
> This opens up some interesting questions.  GEM handles are not reference
> counted by the kernel.  If the driver that we're running on is using
> master_fd, then we shouldn't close our handle because that would close the
> handle for the driver as well.  If it's not using master_fd, we should
> close the handle to avoid leaking it.  The later is only a worry in the
> prime case but given that I saw a line above about use_prime_blit, you have
> me scared. :-)

Ok, as I mentioned in the header of this message, I've eliminated any
code which talks about potentially sharing master_fd and render_fd. I
removed render_fd from the wsi layer entirely as it is no longer used
anywhere.

In the drivers, when you request KHR_display, I attempt to open the
related primary node and if that succeeds, I pass that to the wsi layer
for use there. That fd is otherwise unused by the driver, and in fact
the driver doesn't need to close it as the wsi layer will.

Obviously having two FDs is "wasteful" at some level as we really don't
need that, but that's what the acquire_xlib extension will have to do
anyways.

> One solution to this connundrum would be to simply never use the master FD
> for the Vulkan driver itself and always open a render node.  If handed a
> master FD, you could check if it matches your render node and set
> use_prime_blit accordingly.  Then the driver and WSI would have separate
> handle domains and this problem would be solved.  You could also just open
> the master FD twice, once for WSI and once for the driver.

Yup, two FDs, one master, one render. That way, the kludge
seen in this patch where it opens the master when you request the
KHR_display extension works the same as the acquire_xlib code in the
later patch.

> You have the format in create_info.  We could add some generic VkFormat
> introspection or we can just handle the 2-4 cases we care about
> directly.

Given that this layer provides the list of valid surface formats which
that image could contain, I've added depth/bpp information to that list
and the image_init code walks over the list of possible formats to find
the associated depth/bpp values.

If the application provides an invalid format, I'm returning
VK_ERROR_DEVICE_LOST as that is a valid error return and I couldn't find
anything else that seemed like a better value. If we get here, the
application made a mistake anyways.

> What happens if we close the handle immediately after calling
> drmModeAddFB?  Does the FB become invalid?  If so, then we probably want to
> stash the handle so we can clean it up when we destroy the swapchain.
> Unless, of course, we're willing to make the assumption (as stated above)
> that the driver and WSI are always using the same FD.

It looks like the FB ends up with a reference to the object, so it would
should be safe to just close the handle at this point.  However, to make
it less dependent on the kernel behavior, I've changed the code to store
the buffer_object handle in the image and then added code to free both
the buffer object and the frame buffer in wsi_display_image_finish.

Thanks for finding the leak.

>> +/* call with wait_mutex held */
>>
>
> It might be worth a line in the comment to say that this function does not
> guarntee that any particular type of event (or indeed any event at all) has
> been processed.  It's just a yield.

Yeah, more comments are always good :-)

> I don't see anywhere where you track any sort of per-swapchain error
> status.  We need something so that, once you get VK_ERROR_OUT_OF_DATE, most
> of the other functions will start bailing immediately with OUT_OF_DATE
> instead of assuming the client catches it the first time it's returned.  In
> particular, if wsi_display_queue_next fails inside an event handler, we
> need to know.  In the X11 WSI patches, we just have a VkResult called
> "status" in the swapchain.

Failure is not an option? In reality, failures in the display backend
should only result from an actual unexpected error condition, unlike the
X backend where windows get destroyed or resized without the
applications knowledge.

However, that doesn't mean we shouldn't handle them correctly. I now
record asynchronous failures in a 'chain->status' value and return that
to applications in acquire_next_image and queue_present.

> Lots of whitespace in here.  We don't usually try to line stuff up in code
> unless there's a good reason.

> Same here.

Fixed.

> I take it that 0 is not a valid crtc_id?

Correct.

>> +   mode_res = drmModeGetResources(wsi->master_fd);
>> +   if (!mode_res) {
>> +      result = VK_ERROR_INITIALIZATION_FAILED;
>>
>
> This looks more like an OUT_OF_DATE to me.  Unless, of course, it's
> actually an OUT_OF_HOST_MEMORY.

Yeah, VK_ERROR_OUT_OF_HOST_MEMORY if errno is ENOMEM, otherwise the
ioctl failed and we've lost our ability to do master operations (vt
switched away, or a revoked lease), in which case OUT_OF_DATE_KHR seems appropriate.

>> +   if (!drm_connector) {
>> +      result = VK_ERROR_INITIALIZATION_FAILED;
>>
>
> Same here.

Same fix.

> Extra new line?

There's a comment afterwards, so I like a bit of whitespace to highlight that.

>> +   if (wsi->master_fd < 0)
>> +      return VK_ERROR_INITIALIZATION_FAILED;
>>
>
> Is this possible?  If so, it should be OUT_OF_DATE

It's only possible if you call into this API without having gotten a
master_fd set, or after calling vkReleaseDisplay (part of the
direct_mode_display extension).

I don't know what the right error is in this case as the application did
something wrong or didn't have the right permissions. It "can't happen"
in normal operation, so OUT_OF_DATE seems inappropriate to me.

Suggestions welcome.

> If one half uses braces, please use braces on the other half.

I found another case doing this 

> As I stated earlier, I think we want some sort of chain->status so that we
> can easily know that things have gone out-of-date and we don't end up
> retrying just because the client called QueuePresent again.

All fixed -- queue_next returns are captured in chain->status and
returned at subsequent calls to queue_present or acquire_next_image.

>> +            ret = drmModeSetCrtc(wsi->master_fd, connector->crtc_id,
>> image->fb_id, 0, 0,
>> +                                 &connector->id, 1,
>> &connector->current_drm_mode);
>> +
>> +            if (ret == 0) {
>> +               image->state = wsi_image_displaying;
>> +               wsi_display_idle_old_displaying(image);
>>
>
> Can we really idle them immediately or do we need to wait until this image
> has begun to scan out before returning a new image from
> AcquireNextImage?

After ModeSetCrtc, the old image is no longer being used. There's no
event delivered that could tell us that the scanout started later anyways.

> I suppose we do use implicit fencing for all WSI images today so this is
> probably ok.  Still, might be worth a TODO comment or something.

Given that we're not going to receive an event telling us that the image
is idle, I'm not sure what we could ever do, but a comment is probably wise.

>> +         case EACCES:
>> +            usleep(1000 * 1000);
>>
>
> ?

We get EACCES when VT switched away, so we just want to idle until VT
switched back. After we get back, we'll get EINVAL from the page flip
ioctl, which will cause us to re-set the mode.

If there were an event delivered on VT switch, I could monitor for that
and make the switch back happen faster.

I'll add a comment.

>> +static VkResult
>> +wsi_display_queue_present(struct wsi_swapchain          *drv_chain,
>> +                          uint32_t                      image_index,
>> +                          const VkPresentRegionKHR      *damage)
>> +{
>> +   struct wsi_display_swapchain *chain = (struct wsi_display_swapchain *)
>> drv_chain;
>> +   struct wsi_display           *wsi = chain->wsi;
>> +   struct wsi_display_image     *image = &chain->images[image_index];
>> +   VkResult                     result;
>> +
>> +   assert(image->state == wsi_image_drawing);
>> +   wsi_display_debug("present %d\n", image_index);
>> +
>> +   pthread_mutex_lock(&wsi->wait_mutex);
>> +
>>
>
> assert(image->state == wsi_image_idle);

I don't understand. I'm asserting that you can't queue an image that you
haven't gotten from acquire_next_image, which puts the image in
WSI_IMAGE_DRAWING state.

> You need to clean up if you are only able to create some of the
> images.

Oops.

>> +VkResult
>> +wsi_display_init_wsi(struct wsi_device *wsi_device,
>> +                     const VkAllocationCallbacks *alloc,
>> +                     VkPhysicalDevice physical_device,
>>
>
> You can get the physical device from the wsi_device.  No need to pass it in
> separately.

No, you can't (there's very little in wsi_device), but it turns out that
I never needed this value anyways. I've removed it from the API and from
'struct wsi_display'.

> I don't see this being used anywhere.  Is it needed?

Nope. See above.

-- 
-keith
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20180307/70066da1/attachment.sig>