[PATCH 3/5] drm/panfrost: Add a no execute flag for BO allocations

Mon Jul 22 12:19:58 UTC 2019

On 22/07/2019 13:09, Robin Murphy wrote:
> On 22/07/2019 11:07, Steven Price wrote:
>> On 22/07/2019 10:50, Robin Murphy wrote:
>>> On 19/07/2019 23:07, Rob Herring wrote:
>>>> On Fri, Jul 19, 2019 at 4:39 AM Steven Price <steven.price at arm.com>
>>>> wrote:
>>>>>
>>>>> On 18/07/2019 18:03, Rob Herring wrote:
>>>>>> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price at arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 17/07/2019 19:33, Rob Herring wrote:
>>>>>>>> Executable buffers have an alignment restriction that they can't
>>>>>>>> cross
>>>>>>>> 16MB boundary as the GPU program counter is 24-bits. This
>>>>>>>> restriction is
>>>>>>>> currently not handled and we just get lucky. As current userspace
>>>>>>>
>>>>>>> Actually it depends which GPU you are using - some do have a bigger
>>>>>>> program counter - so it might be the choice of GPU to test on rather
>>>>>>> than just luck. But kbase always assumes the worst case (24 bit)
>>>>>>> as in
>>>>>>> practise that's enough.
>>>>>>>
>>>>>>>> assumes all BOs are executable, that has to remain the default. So
>>>>>>>> add a
>>>>>>>> new PANFROST_BO_NOEXEC flag to allow userspace to indicate which
>>>>>>>> BOs are
>>>>>>>> not executable.
>>>>>>>>
>>>>>>>> There is also a restriction that executable buffers cannot start
>>>>>>>> or end
>>>>>>>> on a 4GB boundary. This is mostly avoided as there is only 4GB of
>>>>>>>> space
>>>>>>>> currently and the beginning is already blocked out for NULL ptr
>>>>>>>> detection. Add support to handle this restriction fully
>>>>>>>> regardless of
>>>>>>>> the current constraints.
>>>>>>>>
>>>>>>>> For existing userspace, all created BOs remain executable, but the
>>>>>>>> GPU
>>>>>>>> VA alignment will be increased to the size of the BO. This
>>>>>>>> shouldn't
>>>>>>>> matter as there is plenty of GPU VA space.
>>>>>>>>
>>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>>>>> Cc: Boris Brezillon <boris.brezillon at collabora.com>
>>>>>>>> Cc: Robin Murphy <robin.murphy at arm.com>
>>>>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>>>>> Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>
>>>>>>>> Signed-off-by: Rob Herring <robh at kernel.org>
>>>>>>>> ---
>>>>>>>>    drivers/gpu/drm/panfrost/panfrost_drv.c | 20
>>>>>>>> +++++++++++++++++++-
>>>>>>>>    drivers/gpu/drm/panfrost/panfrost_gem.c | 18 ++++++++++++++++--
>>>>>>>>    drivers/gpu/drm/panfrost/panfrost_gem.h |  3 ++-
>>>>>>>>    drivers/gpu/drm/panfrost/panfrost_mmu.c |  3 +++
>>>>>>>>    include/uapi/drm/panfrost_drm.h         |  2 ++
>>>>>>>>    5 files changed, 42 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>>> index d354b92964d5..b91e991bc6a3 100644
>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>>> @@ -49,7 +49,8 @@ static int panfrost_ioctl_create_bo(struct
>>>>>>>> drm_device *dev, void *data,
>>>>>>>>         struct panfrost_gem_object *bo;
>>>>>>>>         struct drm_panfrost_create_bo *args = data;
>>>>>>>>
>>>>>>>> -     if (!args->size || args->flags || args->pad)
>>>>>>>> +     if (!args->size || args->pad ||
>>>>>>>> +         (args->flags & ~PANFROST_BO_NOEXEC))
>>>>>>>>                 return -EINVAL;
>>>>>>>>
>>>>>>>>         bo = panfrost_gem_create_with_handle(file, dev, args->size,
>>>>>>>> args->flags,
>>>>>>>> @@ -367,6 +368,22 @@ static struct drm_driver
>>>>>>>> panfrost_drm_driver = {
>>>>>>>>         .gem_prime_mmap         = drm_gem_prime_mmap,
>>>>>>>>    };
>>>>>>>>
>>>>>>>> +#define PFN_4G_MASK    ((SZ_4G - 1) >> PAGE_SHIFT)
>>>>>>>> +
>>>>>>>> +static void panfrost_drm_mm_color_adjust(const struct drm_mm_node
>>>>>>>> *node,
>>>>>>>> +                                      unsigned long color,
>>>>>>>> +                                      u64 *start, u64 *end)
>>>>>>>> +{
>>>>>>>> +     /* Executable buffers can't start or end on a 4GB boundary */
>>>>>>>> +     if (!color) {
>>>>>>>> +             if ((*start & PFN_4G_MASK) == 0)
>>>>>>>> +                     (*start)++;
>>>>>>>> +
>>>>>>>> +             if ((*end & PFN_4G_MASK) == 0)
>>>>>>>> +                     (*end)--;
>>>>>>>> +     }
>>>>>>>> +}
>>>>>>>
>>>>>>> Unless I'm mistaken this won't actually provide the guarantee if the
>>>>>>> memory region is >4GB (which admittedly it isn't at the moment). For
>>>>>>> example a 8GB region would have the beginning/end trimmed off, but
>>>>>>> there's another 4GB in the middle which could be allocated next to.
>>>>>>
>>>>>> Humm, other than not allowing sizes greater than 4G-(2*PAGE_SIZE),
>>>>>> I'm
>>>>>> not sure how we solve that. I guess avoiding sizes of (n*4G -
>>>>>> PAGE_SIZE) would avoid the problem. Seems like almost 4GB executable
>>>>>> buffer should be enough for anyone(TM).
>>>>>
>>>>> I was thinking of something like:
>>>>>
>>>>> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>>>>>           if ((*start & PFN_4G_MASK) +
>>>>>               (SZ_16M >> PAGE_SHIFT) < PFN_4G_MASK)
>>>>>                   *end = (*start & ~PFN_4G_MASK) +
>>>>>                           (SZ_4GB - SZ_16M) >> PAGE_SHIFT;
>>>>>           else
>>>>>                   *start = (*end & ~PFN_4G_MASK) + (SZ_16M) >>
>>>>> PAGE_SHIFT;
>>>>> }
>>>>>
>>>>> So split the region depending on where we can find a free 16MB region
>>>>> which doesn't cross 4GB.
>>>>
>>>> Here's what I ended up with. It's slightly different in that the start
>>>> and end don't get 16MB aligned. The code already takes care of the
>>>> alignment which also is not necessarily 16MB, but 'align = size' as
>>>> that's sufficient to not cross a 16MB boundary.
>>>>
>>>> #define PFN_4G          (SZ_4G >> PAGE_SHIFT)
>>>> #define PFN_4G_MASK     (PFN_4G - 1)
>>>> #define PFN_16M         (SZ_16M >> PAGE_SHIFT)
>>>>
>>>> static void panfrost_drm_mm_color_adjust(const struct drm_mm_node
>>>> *node,
>>>>                                            unsigned long color,
>>>>                                            u64 *start, u64 *end)
>>>> {
>>>>           /* Executable buffers can't start or end on a 4GB boundary */
>>>>           if (!(color & PANFROST_BO_NOEXEC)) {
>>>>                   if ((*start & PFN_4G_MASK) == 0)
>>>>                           (*start)++;
>>>>
>>>>                   if ((*end & PFN_4G_MASK) == 0)
>>>>                           (*end)--;
>>>>
>>>>                   /* Ensure start and end are in the same 4GB range */
>>>>                   if ((*end & ~PFN_4G_MASK) != (*start &
>>>> ~PFN_4G_MASK)) {
>>>>                           if ((*start & PFN_4G_MASK) + PFN_16M <
>>>> PFN_4G_MASK)
>>>>                                   *end = (*start & ~PFN_4G_MASK) +
>>>> PFN_4G - 1;
>>>>                           else
>>>>                                   *start = (*end & ~PFN_4G_MASK) + 1;
>>>>
>>>>                   }
>>>>           }
>>>> }
>>>
>>> If you're happy with the additional restriction that a buffer can't
>>> cross a 4GB boundary (which does seem to be required for certain kinds
>>> of buffers anyway), then I don't think it needs to be anywhere near that
>>> complex:
>>>
>>>      if (!(*start & PFN_4G_MASK))
>>>          *start++
>>>      *end = min(*end, ALIGN(*start, PFN_4G) - 1);
>>
>> What happens if *start is very near the end of a 4GB boundary? In that
>> case *end - *start might not be as big as 16MB and we could end up
>> failing the allocation IIUC.
> 
> Ahem... well, the thing about complicated-and-hard-to-read code is that
> it turns out to be complicated and hard to read correctly :)
> 
> FWIW, having taken a closer look, my interpretation would be something
> like (even more untested than before):
> 
>     u64 start_seg = ALIGN(*start, PFN_4G);
>     u64 end_seg = ALIGN_DOWN(*end, PFN_4G);
> 
>     if (start_seg - start < end - end_seg) {
>         *start = end_seg + 1;
>         *end = min(*end, end_seg + PFN_4G - 1);
>     } else {
>         *end = start_seg - 1;
>         *start = max(*start, start_seg - PFN_4G + 1);
>     }
> 
> but at this point I'm sure we're well into personal preference and
> "shorter does not necessarily imply clearer" territory. More
> importantly, though, it now occurs to me that the "pick the biggest end"
> approach seems inherently suboptimal for cases where the [start,end]
> interval crosses *more* than one boundary. For, say, start = PFN_4G - 1,
> end = 2 * PFN_4G + 1, either way we'd get caught up on the single page
> at one end and ignore the full 4GB in the middle :/

Indeed, that case was just occurring to me too! How about:

	u64 next_seg = ALIGN(*start, PFN_4G);

	if (next_seg - *start <= PFN_16M)
		*start = next_seg + 1;

	*end = min(*end, ALIGN(*start, PFN_4G) - 1);

So always allocate at the beginning, but skip past the next 4GB boundary
if there's less than 16MB left (or equal to avoid the 4GB boundary).

Steve

>>>>>
>>>>> But like you say: 4GB should be enough for anyone ;)
>>>>>
>>>>>>> Also a minor issue, but we might want to consider having some
>>>>>>> constants
>>>>>>> for 'color' - it's not obvious from this code that color==no_exec.
>>>>>>
>>>>>> Yeah, I was just going worry about that when we had a second bit to
>>>>>> pass.
>>>>>
>>>>> One other 'minor' issue I noticed as I was writing the above.
>>>>> PAGE_SIZE
>>>>> is of course the CPU page size - we could have the situation of CPU
>>>>> and
>>>>> GPU page size being different (e.g. CONFIG_ARM64_64K_PAGES). I'm not
>>>>> sure whether we want to support this or not (kbase doesn't). Also in
>>>>> theory some GPUs do support 64K (AS_TRANSCFG_ADRMODE_AARCH64_64K) -
>>>>> but
>>>>> kbase has never used this so I don't know if it works... ;)
>>>>
>>>> Shhh! I have thought about that, but was ignoring for now.
>>>
>>> In general, I don't think that should be too great a concern - we
>>> already handle it for IOMMUs, provided the minimum IOMMU page size is no
>>> larger than PAGE_SIZE, which should always be the case here too. Looking
>>> at how panfrost_mmu_map() is implemented, and given that we're already
>>> trying to handle stuff at 2MB granularity where possible, I don't see
>>> any obvious reasons for 64K pages not to work already (assuming there
>>> aren't any 4K assumptions baked into the userspace side). I might have
>>> to give it a try...
>>
>> I'd be interested to know if it works. I know that kbase incorrectly
>> uses PAGE_SIZE in several places (in particular assuming it is the size
>> of the page tables). And I was aware I was in danger of slipping into
>> the same mindset here.
>>
>> User space shouldn't care too much - other than the size of buffers
>> allocated being rounded up to the CPU's page size. At least the Panfrost
>> user/kernel ABI has sizes in bytes not pages (unlike kbase).
> 
> Sounds promising - my Juno branch is in a bit of a mess at the moment
> but once I've got that cleaned up I'll have a quick play.
> 
> Robin.
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel