[RFC] Host1x/TegraDRM UAPI (drm_tegra_submit_command)
Mikko Perttunen
cyndis at kapsi.fi
Tue Jun 30 10:40:34 UTC 2020
On 6/29/20 3:00 AM, Dmitry Osipenko wrote:
> 28.06.2020 13:28, Mikko Perttunen пишет:
>> On 6/28/20 1:32 AM, Dmitry Osipenko wrote:
>>> 23.06.2020 15:09, Mikko Perttunen пишет:
>>>> /* Command is an opcode gather from a GEM handle */
>>>> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
>>>> /* Command is an opcode gather from a user pointer */
>>>> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR 1
>>>> /* Command is a wait for syncpt fence completion */
>>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT 2
>>>> /* Command is a wait for SYNC_FILE FD completion */
>>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
>>>> /* Command is a wait for DRM syncobj completion */
>>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ 4
>>>>
>>>> /*
>>>> * Allow driver to skip command execution if engine
>>>> * was not accessed by another channel between
>>>> * submissions.
>>>> */
>>>> #define DRM_TEGRA_SUBMIT_CONTEXT_SETUP (1<<0)
>>>>
>>>> struct drm_tegra_submit_command {
>>>> __u16 type;
>>>> __u16 flags;
>>>
>>> Shouldn't the "packed" attribute be needed if a non-32bit aligned fields
>>> are used?
>>
>> I guess we should change these to u32 for consistency.
>>
>>>
>>>> union {
>>>> struct {
>>>> /* GEM handle */
>>>> __u32 handle;
>>>>
>>>> /*
>>>> * Offset into GEM object in bytes.
>>>> * Must be aligned by 4.
>>>> */
>>>> __u64 offset;
>>>
>>> 64bits for a gather offset is a bit too much, in most cases gathers are
>>> under 4K.
>>>
>>> u32 should be more than enough (maybe even u16 if offset is given in a
>>> dword granularity).
>>
>> I guess this can be changed to u32, though I don't think there is any
>> particularly pressing reason not to use u64.
>>
>> In any case, I think we concluded that we'll drop the GEM gather command
>> for now.
>
> Sure, I commented this just in a case, for the future :)
Yep, OK :)
>
>>>
>>>> /*
>>>> * Length of gather in bytes.
>>>> * Must be aligned by 4.
>>>> */
>>>> __u64 length;
>>>
>>> u32/16
>>>
>>>> } gather;
>>>
>>>> struct {
>>>> __u32 reserved[1];
>>>>
>>>> /*
>>>> * Pointer to gather data.
>>>> * Must be aligned by 4 bytes.
>>>> */
>>>> __u64 base;
>>>> /*
>>>> * Length of gather in bytes.
>>>> * Must be aligned by 4.
>>>> */
>>>> __u64 length;
>>>> } gather_uptr;
>>>
>>> What about to inline the UPTR gather data and relocs into the
>>> drm_tegra_submit_command[] buffer:
>>>
>>> struct drm_tegra_submit_command {
>>> struct {
>>> u16 num_words;
>>> u16 num_relocs;
>>>
>>> gather_data[];
>>> drm_tegra_submit_relocation relocs[];
>>> } gather_uptr;
>>> };
>>>
>>> struct drm_tegra_channel_submit {
>>> __u32 num_syncpt_incrs;
>>> __u32 syncpt_idx;
>>>
>>> __u64 commands_ptr;
>>> __u32 commands_size;
>>> ...
>>> };
>>>
>>> struct drm_tegra_submit_command example[] = {
>>> cmd.gather_uptr{},
>>> ...
>>> gather_data[],
>>> gather_relocs[],
>>> cmd.wait_syncpt{},
>>> ...
>>> };
>>>
>>> This way we will have only a single copy_from_user() for the whole
>>> cmdstream, which should be more efficient to do and nicer from both
>>> userspace and kernel perspectives in regards to forming and parsing the
>>> commands.
>>>
>>
>> I'm not sure I agree it being nicer with regard to forming and parsing
>> - Can you actually place multiple unsized arrays in a struct without
>> pointers?
>
> No :) But there are no unsized arrays here. The parser will read first
> command and then move pointer to the next command based on size of the
> parsed command.
>
>> - gather_data's length is a multiple of 4, and so since we have u64's in
>> drm_tegra_submit_relocation, alignment needs to be taken care of
>> manually, both when forming and kernel needs to validate it while
>> parsing. Depending on number of words in the gather, padding would need
>> to be inserted. We could swap the two around but it still feels more
>> brittle.
>
> Yes, there will be unaligned reads on ARM64, but I don't think it's a
> problem. Isn't it?
It's not a big problem, but I think it would be less error-prone to have
naturally aligned data.
>
>> Also, if we read the gather from a separate address, userspace doesn't
>> necessarily need to know the length of the gather (and number of relocs)
>> upfront, so it's easier to have a builder pattern without needing to
>> copy things later.
>
> Usually there are 2-3 relocs per gather, so userspace could simply
> maintain a fixed-sized static buffer for the relocs (say 32 relocs).
> Once gather is sliced by userspace, the stashed relocs will be appended
> to the comands buffer and next gather will be formed.
>
> The relocs copying doesn't really costs anything for userspace, the
> price is incomparable with the cost of UPTR copying of each gather for
> the kernel.
>
> Well, the UPTR copying isn't expensive, but if there is a single copy
> for the whole IOCTL, then it's even much better!
>
>> If we allow direct page mappings for gathers later, a separate address
>> would make things also a little bit easier. For direct page mappings
>> userspace would need to keep the opcode data unchanged while the job is
>> being executed, so userspace could keep an arena where the gathers are
>> placed, and refer to segments of that, instead of having to keep the
>> drm_tegra_submit_commands alive.
>
> Okay, what about to have a single UPTR buffer for all gathers of a job?
>
> struct drm_tegra_channel_submit {
> u64 commands_ptr;
> u64 gathers_ptr;
>
> u32 commands_size;
> u32 gathers_size;
> ...
> };
>
> struct drm_tegra_submit_command {
> ...
> union {
> struct {
> /*
> * Length of gather in bytes.
> * Must be aligned by 4.
> */
> __u32 length;
> } gather_uptr;
> };
> };
>
> Now we have a single UPTR gather that could be sliced into sub-gathers
> during of job's submission and also the whole data could be copied at
> once by kernel driver for the parsing purposes.
>
> The userspace will have to preallocate a large-enough buffer upfront for
> the gathers. If it runs out of space in the buffer, then it should
> reallocate a larger buffer. Nice and simple :)
>
I think that would work fine for the usecases that I can think of.
Mikko
More information about the dri-devel
mailing list