[PATCH 08/27] habanalabs: add info when FD released while device still in use

Oded Gabbay ogabbay at kernel.org
Thu Feb 16 14:21:48 UTC 2023


On Thu, Feb 16, 2023 at 2:25 PM Stanislaw Gruszka
<stanislaw.gruszka at linux.intel.com> wrote:
>
> On Sun, Feb 12, 2023 at 10:44:35PM +0200, Oded Gabbay wrote:
> > From: Tomer Tayar <ttayar at habana.ai>
> >
> > When user closes the device file descriptor, it is checked whether the
> > device is still in use, and a message is printed if it is.
>
> I guess this is only for debugging your user-space component?
> Because kernel driver should not make any assumption about
> user-space behavior. Closing whenever user wants should be
> no problem.
First of all, indeed the user can close the device whatever it wants.
We don't limit him, but we do need to track the device state, because
we can't allow a new user to acquire the device until it is idle (due
to h/w limitations).
Therefore, this print is not so much for debug, as it is for letting
the user know the device wasn't idle after he closed it, and
therefore, we are going to reset it to make it idle.
So, it is a notification that is important imo.

>
> > +static void print_device_in_use_info(struct hl_device *hdev, const char *message)
> > +{
> > +     u32 active_cs_num, dmabuf_export_cnt;
> > +     char buf[64], *buf_ptr = buf;
> > +     size_t buf_size = sizeof(buf);
> > +     bool unknown_reason = true;
> > +
> > +     active_cs_num = hl_get_active_cs_num(hdev);
> > +     if (active_cs_num) {
> > +             unknown_reason = false;
> > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u active CS]", active_cs_num);
> > +     }
> > +
> > +     dmabuf_export_cnt = atomic_read(&hdev->dmabuf_export_cnt);
> > +     if (dmabuf_export_cnt) {
> > +             unknown_reason = false;
> > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [%u exported dma-buf]",
> > +                                             dmabuf_export_cnt);
> > +     }
> > +
> > +     if (unknown_reason)
> > +             compose_device_in_use_info(&buf_ptr, &buf_size, " [unknown reason]");
> > +
> > +     dev_notice(hdev->dev, "%s%s\n", message, buf);
>
> why not print counters directly, i.e. "active cs count %u, dmabuf export count %u" ?
Because we wanted to print the specific reason, or unknown reason, and
not print all the possible counters in one line, because most of the
time most of the counters will be 0.
We plan to add more reasons so this helper simplifies the code.

>
> >       if (!hl_hpriv_put(hpriv)) {
> > -             dev_notice(hdev->dev, "User process closed FD but device still in use\n");
> > +             print_device_in_use_info(hdev, "User process closed FD but device still in use");
> >               hl_device_reset(hdev, HL_DRV_RESET_HARD);
>
> You really need reset here ?
Yes, our h/w requires that we reset the device after the user closed
it. If the device is not idle after the user closed it, we hard reset
it.
If it is idle, we do a more graceful reset.
Thanks,
Oded
>
> Regards
> Stanislaw


More information about the dri-devel mailing list