[PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

Wed Jun 14 08:44:03 UTC 2017

On Wed, Jun 14, 2017 at 10:32 AM, Dmitry Osipenko <digetx at gmail.com> wrote:
> On 14.06.2017 10:56, Erik Faye-Lund wrote:
>> On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko <digetx at gmail.com> wrote:
>>> The blocking gather copy allocation is a major performance downside of the
>>> Host1x firewall, it may take hundreds milliseconds which is unacceptable
>>> for the real-time graphics operations. Let's try a non-blocking allocation
>>> first as a least invasive solution, it makes opentegra (Xorg driver)
>>> performance indistinguishable with/without the firewall.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx at gmail.com>
>>> ---
>>>
>>> The other much more invasive solution could be: to have a memory pool
>>> reserved for the gather copy and performing the firewall checks (and the
>>> gather copying) on the actual job submission to the CDMA, not on the job
>>> allocation like it is done now. This solution reduces the overhead of a
>>> gather copy allocations.
>>>
>>>  drivers/gpu/host1x/job.c | 16 ++++++++++++----
>>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
>>> index fc194c676d91..ed0575f14093 100644
>>> --- a/drivers/gpu/host1x/job.c
>>> +++ b/drivers/gpu/host1x/job.c
>>> @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev)
>>>                 size += g->words * sizeof(u32);
>>>         }
>>>
>>> +       /*
>>> +        * Try a non-blocking allocation from a higher priority pools first,
>>> +        * as awaiting for the allocation here is a major performance hit.
>>> +        */
>>>         job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
>>> -                                              GFP_KERNEL);
>>> -       if (!job->gather_copy_mapped) {
>>> -               job->gather_copy_mapped = NULL;
>>> +                                              GFP_NOWAIT);
>>> +
>>> +       /* the higher priority allocation failed, try the generic-blocking */
>>> +       if (!job->gather_copy_mapped)
>>> +               job->gather_copy_mapped = dma_alloc_wc(dev, size,
>>> +                                                      &job->gather_copy,
>>> +                                                      GFP_KERNEL);
>>> +       if (!job->gather_copy_mapped)
>>>                 return -ENOMEM;
>>> -       }
>>>
>>>         job->gather_copy_size = size;
>>
>> Can't we just have a static fixed-size buffer, and validate chunks at
>> the time? Or why can't we validate directly on the mapped pointers?
>>
>> It feels pretty silly to have to allocate memory in the first place here...
>>
>
> We can't validate the mapped pointers because userspace may write to the buffers
> memory at any time. Maybe it should be possible to write-protect cmdbuffers
> memory for the time of its submission and execution, but I'm not sure whether
> it's worth the effort. The current-proposed solution is efficient and least
> invasive.

Right, good point.

Reviewed-by: Erik Faye-Lund <kusmabite at gmail.com>