[PATCH 3/5] drm/panfrost: Add a no execute flag for BO allocations

Steven Price steven.price at arm.com
Mon Jul 22 10:07:33 UTC 2019


On 22/07/2019 10:50, Robin Murphy wrote:
> On 19/07/2019 23:07, Rob Herring wrote:
>> On Fri, Jul 19, 2019 at 4:39 AM Steven Price <steven.price at arm.com>
>> wrote:
>>>
>>> On 18/07/2019 18:03, Rob Herring wrote:
>>>> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price at arm.com>
>>>> wrote:
>>>>>
>>>>> On 17/07/2019 19:33, Rob Herring wrote:
>>>>>> Executable buffers have an alignment restriction that they can't
>>>>>> cross
>>>>>> 16MB boundary as the GPU program counter is 24-bits. This
>>>>>> restriction is
>>>>>> currently not handled and we just get lucky. As current userspace
>>>>>
>>>>> Actually it depends which GPU you are using - some do have a bigger
>>>>> program counter - so it might be the choice of GPU to test on rather
>>>>> than just luck. But kbase always assumes the worst case (24 bit) as in
>>>>> practise that's enough.
>>>>>
>>>>>> assumes all BOs are executable, that has to remain the default. So
>>>>>> add a
>>>>>> new PANFROST_BO_NOEXEC flag to allow userspace to indicate which
>>>>>> BOs are
>>>>>> not executable.
>>>>>>
>>>>>> There is also a restriction that executable buffers cannot start
>>>>>> or end
>>>>>> on a 4GB boundary. This is mostly avoided as there is only 4GB of
>>>>>> space
>>>>>> currently and the beginning is already blocked out for NULL ptr
>>>>>> detection. Add support to handle this restriction fully regardless of
>>>>>> the current constraints.
>>>>>>
>>>>>> For existing userspace, all created BOs remain executable, but the
>>>>>> GPU
>>>>>> VA alignment will be increased to the size of the BO. This shouldn't
>>>>>> matter as there is plenty of GPU VA space.
>>>>>>
>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>>> Cc: Boris Brezillon <boris.brezillon at collabora.com>
>>>>>> Cc: Robin Murphy <robin.murphy at arm.com>
>>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>>> Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>
>>>>>> Signed-off-by: Rob Herring <robh at kernel.org>
>>>>>> ---
>>>>>>   drivers/gpu/drm/panfrost/panfrost_drv.c | 20 +++++++++++++++++++-
>>>>>>   drivers/gpu/drm/panfrost/panfrost_gem.c | 18 ++++++++++++++++--
>>>>>>   drivers/gpu/drm/panfrost/panfrost_gem.h |  3 ++-
>>>>>>   drivers/gpu/drm/panfrost/panfrost_mmu.c |  3 +++
>>>>>>   include/uapi/drm/panfrost_drm.h         |  2 ++
>>>>>>   5 files changed, 42 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>> b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>> index d354b92964d5..b91e991bc6a3 100644
>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>> @@ -49,7 +49,8 @@ static int panfrost_ioctl_create_bo(struct
>>>>>> drm_device *dev, void *data,
>>>>>>        struct panfrost_gem_object *bo;
>>>>>>        struct drm_panfrost_create_bo *args = data;
>>>>>>
>>>>>> -     if (!args->size || args->flags || args->pad)
>>>>>> +     if (!args->size || args->pad ||
>>>>>> +         (args->flags & ~PANFROST_BO_NOEXEC))
>>>>>>                return -EINVAL;
>>>>>>
>>>>>>        bo = panfrost_gem_create_with_handle(file, dev, args->size,
>>>>>> args->flags,
>>>>>> @@ -367,6 +368,22 @@ static struct drm_driver panfrost_drm_driver = {
>>>>>>        .gem_prime_mmap         = drm_gem_prime_mmap,
>>>>>>   };
>>>>>>
>>>>>> +#define PFN_4G_MASK    ((SZ_4G - 1) >> PAGE_SHIFT)
>>>>>> +
>>>>>> +static void panfrost_drm_mm_color_adjust(const struct drm_mm_node
>>>>>> *node,
>>>>>> +                                      unsigned long color,
>>>>>> +                                      u64 *start, u64 *end)
>>>>>> +{
>>>>>> +     /* Executable buffers can't start or end on a 4GB boundary */
>>>>>> +     if (!color) {
>>>>>> +             if ((*start & PFN_4G_MASK) == 0)
>>>>>> +                     (*start)++;
>>>>>> +
>>>>>> +             if ((*end & PFN_4G_MASK) == 0)
>>>>>> +                     (*end)--;
>>>>>> +     }
>>>>>> +}
>>>>>
>>>>> Unless I'm mistaken this won't actually provide the guarantee if the
>>>>> memory region is >4GB (which admittedly it isn't at the moment). For
>>>>> example a 8GB region would have the beginning/end trimmed off, but
>>>>> there's another 4GB in the middle which could be allocated next to.
>>>>
>>>> Humm, other than not allowing sizes greater than 4G-(2*PAGE_SIZE), I'm
>>>> not sure how we solve that. I guess avoiding sizes of (n*4G -
>>>> PAGE_SIZE) would avoid the problem. Seems like almost 4GB executable
>>>> buffer should be enough for anyone(TM).
>>>
>>> I was thinking of something like:
>>>
>>> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>>>          if ((*start & PFN_4G_MASK) +
>>>              (SZ_16M >> PAGE_SHIFT) < PFN_4G_MASK)
>>>                  *end = (*start & ~PFN_4G_MASK) +
>>>                          (SZ_4GB - SZ_16M) >> PAGE_SHIFT;
>>>          else
>>>                  *start = (*end & ~PFN_4G_MASK) + (SZ_16M) >>
>>> PAGE_SHIFT;
>>> }
>>>
>>> So split the region depending on where we can find a free 16MB region
>>> which doesn't cross 4GB.
>>
>> Here's what I ended up with. It's slightly different in that the start
>> and end don't get 16MB aligned. The code already takes care of the
>> alignment which also is not necessarily 16MB, but 'align = size' as
>> that's sufficient to not cross a 16MB boundary.
>>
>> #define PFN_4G          (SZ_4G >> PAGE_SHIFT)
>> #define PFN_4G_MASK     (PFN_4G - 1)
>> #define PFN_16M         (SZ_16M >> PAGE_SHIFT)
>>
>> static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
>>                                           unsigned long color,
>>                                           u64 *start, u64 *end)
>> {
>>          /* Executable buffers can't start or end on a 4GB boundary */
>>          if (!(color & PANFROST_BO_NOEXEC)) {
>>                  if ((*start & PFN_4G_MASK) == 0)
>>                          (*start)++;
>>
>>                  if ((*end & PFN_4G_MASK) == 0)
>>                          (*end)--;
>>
>>                  /* Ensure start and end are in the same 4GB range */
>>                  if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>>                          if ((*start & PFN_4G_MASK) + PFN_16M <
>> PFN_4G_MASK)
>>                                  *end = (*start & ~PFN_4G_MASK) +
>> PFN_4G - 1;
>>                          else
>>                                  *start = (*end & ~PFN_4G_MASK) + 1;
>>
>>                  }
>>          }
>> }
> 
> If you're happy with the additional restriction that a buffer can't
> cross a 4GB boundary (which does seem to be required for certain kinds
> of buffers anyway), then I don't think it needs to be anywhere near that
> complex:
> 
>     if (!(*start & PFN_4G_MASK))
>         *start++
>     *end = min(*end, ALIGN(*start, PFN_4G) - 1);

What happens if *start is very near the end of a 4GB boundary? In that
case *end - *start might not be as big as 16MB and we could end up
failing the allocation IIUC.

>>>
>>> But like you say: 4GB should be enough for anyone ;)
>>>
>>>>> Also a minor issue, but we might want to consider having some
>>>>> constants
>>>>> for 'color' - it's not obvious from this code that color==no_exec.
>>>>
>>>> Yeah, I was just going worry about that when we had a second bit to
>>>> pass.
>>>
>>> One other 'minor' issue I noticed as I was writing the above. PAGE_SIZE
>>> is of course the CPU page size - we could have the situation of CPU and
>>> GPU page size being different (e.g. CONFIG_ARM64_64K_PAGES). I'm not
>>> sure whether we want to support this or not (kbase doesn't). Also in
>>> theory some GPUs do support 64K (AS_TRANSCFG_ADRMODE_AARCH64_64K) - but
>>> kbase has never used this so I don't know if it works... ;)
>>
>> Shhh! I have thought about that, but was ignoring for now.
> 
> In general, I don't think that should be too great a concern - we
> already handle it for IOMMUs, provided the minimum IOMMU page size is no
> larger than PAGE_SIZE, which should always be the case here too. Looking
> at how panfrost_mmu_map() is implemented, and given that we're already
> trying to handle stuff at 2MB granularity where possible, I don't see
> any obvious reasons for 64K pages not to work already (assuming there
> aren't any 4K assumptions baked into the userspace side). I might have
> to give it a try...

I'd be interested to know if it works. I know that kbase incorrectly
uses PAGE_SIZE in several places (in particular assuming it is the size
of the page tables). And I was aware I was in danger of slipping into
the same mindset here.

User space shouldn't care too much - other than the size of buffers
allocated being rounded up to the CPU's page size. At least the Panfrost
user/kernel ABI has sizes in bytes not pages (unlike kbase).

Steve


More information about the dri-devel mailing list