26.01.2021 05:45, Mikko Perttunen wrote:
- We will need to allocate a host1x BO for a job's cmdstream and add a
restart command to the end of the job's stream. CDMA will jump into the job's stream from the push buffer.
We could add a flag for that to drm_tegra_submit_cmd_gather, saying that the gather should be inlined into the job's main cmdstream.
This will remove the need for a large push buffer that can easily overflow. That's a real problem: the upstream driver even has a bug where it locks up on overflow.
How it will look from CDMA perspective:
PUSHBUF   |
...       |     | JOB       |
          |     -------------     | JOB GATHER   |
RESTART ------> | CMD       |     ----------------
          |     | GATHER  ------> | DATA         |
...   <-------- | RESTART   |     |              |
          |     |           |     |              |
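A rough sketch of the job-side layout from the diagram: one GATHER per user gather, then a RESTART that sends CDMA back to the push buffer. The opcode encodings are modeled on the upstream host1x helpers and should be treated as assumptions; struct job_gather and build_job_cmdstream() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, modeled on the upstream host1x opcode helpers. */
static inline uint32_t opcode_gather(uint32_t count)
{
	return (6u << 28) | count;
}

static inline uint32_t opcode_restart(uint32_t address)
{
	/* RESTART takes a 16-byte aligned address, stored shifted by 4. */
	return (5u << 28) | (address >> 4);
}

/* Hypothetical description of one gather referenced by the job. */
struct job_gather {
	uint32_t iova;  /* device address of the gather data */
	uint32_t words; /* length of the gather in 32-bit words */
};

/*
 * Fill the job's own cmdstream: GATHER + address for each gather,
 * then a RESTART back to the push buffer.
 * Returns the number of words written, or 0 if the buffer is too
 * small or the resume address is not 16-byte aligned.
 */
static size_t build_job_cmdstream(uint32_t *cmd, size_t cap,
				  const struct job_gather *g, size_t n,
				  uint32_t pushbuf_resume)
{
	size_t pos = 0, i;

	if (cap < 2 * n + 1 || (pushbuf_resume & 0xf))
		return 0;

	for (i = 0; i < n; i++) {
		cmd[pos++] = opcode_gather(g[i].words);
		cmd[pos++] = g[i].iova;
	}

	cmd[pos++] = opcode_restart(pushbuf_resume);

	return pos;
}
```

With this layout the push buffer holds only a single RESTART per job, no matter how many gathers the job contains.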
Let me check if I understood you correctly:
- You would like to have the job's cmdbuf have further GATHER opcodes
that jump into smaller gathers?
I want jobs to be self-contained. Instead of pushing commands to the PB of the kernel driver, we'll push them to the job's cmdstream. This means that for each new job we'll need to allocate a host1x buffer.
I assume this is needed because currently WAITs are placed into the pushbuffer, so the job will take a lot of space in the pushbuffer if there are a lot of waits (and GATHERs in between these waits)?
Yes, and with drm-sched we will just need to limit the max number of jobs in the h/w queue (i.e. the push buffer), and then the push buffer won't ever overflow. Problem solved.
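To illustrate why a job limit is enough: if every job costs a fixed number of push buffer slots, capping the number of in-flight jobs bounds push buffer usage. This is only a minimal sketch. struct hw_queue and its helpers are hypothetical, and in a real driver the scheduler's own hardware-submission limit would presumably play this role.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the h/w queue backed by the push buffer. */
struct hw_queue {
	size_t pb_slots;      /* total slots in the push buffer */
	size_t slots_per_job; /* fixed push buffer cost of one job */
	size_t in_flight;     /* jobs currently queued to hardware */
};

/* Largest number of jobs that can never overflow the push buffer. */
static size_t hw_queue_max_jobs(const struct hw_queue *q)
{
	return q->slots_per_job ? q->pb_slots / q->slots_per_job : 0;
}

/* Admit a job only if it stays within the limit. */
static bool hw_queue_try_push(struct hw_queue *q)
{
	if (q->in_flight >= hw_queue_max_jobs(q))
		return false; /* job stays in the software queue */

	q->in_flight++;
	return true;
}

/* Called when hardware has finished a job and its slots are free again. */
static void hw_queue_job_done(struct hw_queue *q)
{
	if (q->in_flight)
		q->in_flight--;
}
```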
If so, perhaps as a simpler alternative we could change the firewall to allow SETCLASS into HOST1X for waiting specifically, then userspace could just submit one big cmdbuf taking only little space in the pushbuffer? Although that would only allow direct ID/threshold waits.
My solution doesn't require changes to the firewall, though I'm not sure whether it's easier.
In any case, it seems that this can be added in a later patch, so we should omit it from this series for simplicity. If it is impossible for the userspace to deal with it, we could disable the firewall temporarily, or implement the above change in the firewall.
I won't be able to test the UAPI fully until all features are at least on par with the experimental driver of the grate kernel.