[PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs

Mikko Perttunen cyndis at kapsi.fi
Wed Feb 3 11:18:55 UTC 2021


On 1/29/21 7:30 PM, Dmitry Osipenko wrote:
> 28.01.2021 19:58, Thierry Reding пишет:
>> On Thu, Jan 28, 2021 at 01:08:54PM +0200, Mikko Perttunen wrote:
>>> On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
>>>> 26.01.2021 05:45, Mikko Perttunen пишет:
>>>>>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>>>>>> increments.  The job's sync point will be allocated dynamically when job
>>>>>> is submitted.  We will need a flag for the sync_incr and wait_syncpt
>>>>>> commands, saying "it's a job's sync point increment/wait"
>>>>>
>>>>> Negative. Like I have explained in previous discussions, with the
>>>>> current way the usage of hardware resources is much more deterministic
>>>>> and obvious. I disagree on the point that this is much more complicated
>>>>> for the userspace. Separating syncpoint and channel allocation is one of
>>>>> the primary motivations of this series for me.
>>>>
>>>> Sync points are a limited resource. The most sensible way to work around
>>>> that is to keep sync points within the kernel as much as possible. This is
>>>> not only much simpler for user space, but also allows us to utilize the
>>>> DRM API properly without re-inventing what already exists, and makes it
>>>> easier to keep the hardware in a good state.
>>>
>>> I've spent the last few years designing for automotive and industrial
>>> products, where we don't want to at runtime notice that the system is out of
>>> free syncpoints and because of that we can only process the next camera
>>> frame in a second or two instead of 16 milliseconds. We need to know that
>>> once we have allocated the resource, it is there. The newer chips are also
>>> designed to support this.
>>>
>>> Considering Linux is increasingly being used for such applications, and they
>>> are important target markets for NVIDIA, these need to be supported.
>>>
>>> Because of the above design constraint the userspace software that runs in
>>> these environments also expects resources to be allocated up front. This
>>> isn't a matter of having to design that software according to what kind of
>>> allocation API we decide on at the Linux level -- it's no use designing
>>> for dynamic allocation if it means failing the safety requirement of
>>> having all resources allocated up front.
>>>
>>> This isn't a good design feature just in a car, but in anything that needs
>>> to be reliable. However, it does pose some tradeoffs, and if you think that
>>> running out of syncpoints on T20-T114 because of upfront allocation is an
>>> actual problem, I'm not opposed to having both options available.
> 
> The word "reliable" contradicts to the error-prone approach. On the
> other hand, it should be very difficult to change the stubborn
> downstream firmware and we want driver to be usable as much as possible,
> so in reality not much can be done about it.

Depends on the perspective.

> 
>> I think that's a fair point. I don't see why we can't support both
>> implicit and explicit syncpoint requests. If most of the use-cases can
>> work with implicit syncpoints and let the kernel handle all aspects of
>> it, that's great. But there's no reason we can't provide more explicit
>> controls for cases where it's really important that all the resources
>> are allocated upfront.
>>
>> Ultimately this is very specific to each use-case, so I think having
>> both options will provide us with the most flexibility.
> It should be fine to support both. This will add complexity to the
> driver, thus it needs to be done wisely.
> 
> I'll need more time to think about it.
> 

How about something like this:

Turn the syncpt_incr field back into an array of structs like

#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ		(1<<0)
#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT	(1<<1)

struct drm_tegra_submit_syncpt_incr {
	/* can be left as zero if using dynamic syncpt */
	__u32 syncpt_id;
	__u32 flags;

	struct {
		__u32 syncobj;
		__u32 value;
	} fence;

	/*
	 * Patch the word as:
	 *   *word = *word | (syncpt_id << shift)
	 */
	struct {
		__u32 gather_offset_words;
		__u32 shift;
	} patch;
};

So this will work similarly to the buffer reloc system: the kernel
driver will allocate a job syncpoint and patch the syncpoint ID into
the gather where requested, and can output a syncobj for each
increment.

Mikko


More information about the dri-devel mailing list