[PATCH 08/27] habanalabs: add info when FD released while device still in use

Stanislaw Gruszka stanislaw.gruszka at linux.intel.com
Thu Feb 16 15:04:47 UTC 2023


On Thu, Feb 16, 2023 at 04:21:48PM +0200, Oded Gabbay wrote:
> On Thu, Feb 16, 2023 at 2:25 PM Stanislaw Gruszka
> <stanislaw.gruszka at linux.intel.com> wrote:
> >
> > On Sun, Feb 12, 2023 at 10:44:35PM +0200, Oded Gabbay wrote:
> > > From: Tomer Tayar <ttayar at habana.ai>
> > >
> > > When user closes the device file descriptor, it is checked whether the
> > > device is still in use, and a message is printed if it is.
> >
> > I guess this is only for debugging your user-space component?
> > Because kernel driver should not make any assumption about
> > user-space behavior. Closing whenever user wants should be
> > no problem.
> First of all, indeed the user can close the device whatever it wants.
> We don't limit him, but we do need to track the device state, because
> we can't allow a new user to acquire the device until it is idle (due
> to h/w limitations).
> Therefore, this print is not so much for debug, as it is for letting
> the user know the device wasn't idle after he closed it, and
> therefore, we are going to reset it to make it idle.
> So, it is a notification that is important imo.

This sounds like something that should be handed in open() with -EAGAIN
error with eventual message in dmesg. But you know best what info
is needed by user-space :-)

> > > +static void print_device_in_use_info(struct hl_device *hdev, const char *message)
> > > +{
> > > +     u32 active_cs_num, dmabuf_export_cnt;
> > > +     char buf[64], *buf_ptr = buf;
> > > +     size_t buf_size = sizeof(buf);
> > > +     bool unknown_reason = true;
> > > +
> > > +     active_cs_num = hl_get_active_cs_num(hdev);
> > > +     if (active_cs_num) {
> > > +             unknown_reason = false;
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u active CS]", active_cs_num);
> > > +     }
> > > +
> > > +     dmabuf_export_cnt = atomic_read(&hdev->dmabuf_export_cnt);
> > > +     if (dmabuf_export_cnt) {
> > > +             unknown_reason = false;
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u exported dma-buf]",
> > > +                                             dmabuf_export_cnt);
> > > +     }
> > > +
> > > +     if (unknown_reason)
> > > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [unknown reason]");
> > > +
> > > +     dev_notice(hdev->dev, "%s%s\n", message, buf);
> >
> > why not print counters directly, i.e. "active cs count %u, dmabuf export count %u" ?
> Because we wanted to print the specific reason, or unknown reason, and
> not print all the possible counters in one line, because most of the
> time most of the counters will be 0.
> We plan to add more reasons so this helper simplifies the code.

Ok, just place replace compose_device_in_use_info() with snprintf().
I don't think you need custom implementation of snprintf().

> > > +             print_device_in_use_info(hdev, "User process closed FD but device still in use");
> > >               hl_device_reset(hdev, HL_DRV_RESET_HARD);
> >
> > You really need reset here ?
> Yes, our h/w requires that we reset the device after the user closed
> it. If the device is not idle after the user closed it, we hard reset
> it.
> If it is idle, we do a more graceful reset.

Hmm, ok.

Regards
Stanislaw



More information about the dri-devel mailing list