[PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
Mikko Perttunen
cyndis at kapsi.fi
Wed Feb 3 11:18:55 UTC 2021
On 1/29/21 7:30 PM, Dmitry Osipenko wrote:
> 28.01.2021 19:58, Thierry Reding writes:
>> On Thu, Jan 28, 2021 at 01:08:54PM +0200, Mikko Perttunen wrote:
>>> On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
>>>> 26.01.2021 05:45, Mikko Perttunen writes:
>>>>>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>>>>>> increments. The job's sync point will be allocated dynamically when the
>>>>>> job is submitted. We will need a flag for the sync_incr and wait_syncpt
>>>>>> commands, saying "it's a job's sync point increment/wait"
>>>>>
>>>>> Negative. Like I have explained in previous discussions, with the
>>>>> current way the usage of hardware resources is much more deterministic
>>>>> and obvious. I disagree on the point that this is much more complicated
>>>>> for the userspace. Separating syncpoint and channel allocation is one of
>>>>> the primary motivations of this series for me.
>>>>
>>>> Sync points are a limited resource. The most sensible way to work around
>>>> that is to keep sync points within the kernel as much as possible. This is
>>>> not only much simpler for user space, but also allows utilizing the DRM API
>>>> properly without re-inventing what already exists, and makes it easier to
>>>> keep the hardware in a good state.
>>>
>>> I've spent the last few years designing for automotive and industrial
>>> products, where we don't want to notice at runtime that the system is out of
>>> free syncpoints, and because of that we can only process the next camera
>>> frame in a second or two instead of 16 milliseconds. We need to know that
>>> once we have allocated the resource, it is there. The newer chips are also
>>> designed to support this.
>>>
>>> Considering Linux is increasingly being used for such applications, and they
>>> are important target markets for NVIDIA, these need to be supported.
>>>
>>> Because of the above design constraint the userspace software that runs in
>>> these environments also expects resources to be allocated up front. This
>>> isn't a matter of having to design that software according to what kind of
>>> allocation API we decide do at Linux level -- it's no use designing for
>>> dynamic allocation if it leads to you not meeting the safety requirement of
>>> needing to ensure you have all resources allocated up front.
>>>
>>> This is a good design feature not just in a car, but in anything that needs
>>> to be reliable. However, it does pose some tradeoffs, and if you think that
>>> running out of syncpoints on T20-T114 because of upfront allocation is an
>>> actual problem, I'm not opposed to having both options available.
>
> The word "reliable" contradicts the error-prone approach. On the
> other hand, it would be very difficult to change the stubborn
> downstream firmware, and we want the driver to be as usable as possible,
> so in reality not much can be done about it.
Depends on the perspective.
>
>> I think that's a fair point. I don't see why we can't support both
>> implicit and explicit syncpoint requests. If most of the use-cases can
>> work with implicit syncpoints and let the kernel handle all aspects of
>> it, that's great. But there's no reason we can't provide more explicit
>> controls for cases where it's really important that all the resources
>> are allocated upfront.
>>
>> Ultimately this is very specific to each use-case, so I think having
>> both options will provide us with the most flexibility.
> It should be fine to support both. This will add complexity to the
> driver, thus it needs to be done wisely.
>
> I'll need more time to think about it.
>
How about something like this:
Turn the syncpt_incr field back into an array of structs like

#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ      (1<<0)
#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT (1<<1)

struct drm_tegra_submit_syncpt_incr {
        /* can be left as zero if using a dynamic syncpt */
        __u32 syncpt_id;
        __u32 flags;

        struct {
                __u32 syncobj;
                __u32 value;
        } fence;

        /* patch word as such:
         * *word = *word | (syncpt_id << shift)
         */
        struct {
                __u32 gather_offset_words;
                __u32 shift;
        } patch;
};
So this will work similarly to the buffer reloc system; the kernel
driver will allocate a job syncpoint, patch in the syncpoint ID where
requested, and allow outputting a syncobj for each increment.
Mikko