[PATCH 3/5] drm/panfrost: Add a no execute flag for BO allocations

Mon Jul 22 12:09:43 UTC 2019

On 22/07/2019 11:07, Steven Price wrote:
> On 22/07/2019 10:50, Robin Murphy wrote:
>> On 19/07/2019 23:07, Rob Herring wrote:
>>> On Fri, Jul 19, 2019 at 4:39 AM Steven Price <steven.price at arm.com>
>>> wrote:
>>>>
>>>> On 18/07/2019 18:03, Rob Herring wrote:
>>>>> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price at arm.com>
>>>>> wrote:
>>>>>>
>>>>>> On 17/07/2019 19:33, Rob Herring wrote:
>>>>>>> Executable buffers have an alignment restriction that they can't
>>>>>>> cross
>>>>>>> 16MB boundary as the GPU program counter is 24-bits. This
>>>>>>> restriction is
>>>>>>> currently not handled and we just get lucky. As current userspace
>>>>>>
>>>>>> Actually it depends which GPU you are using - some do have a bigger
>>>>>> program counter - so it might be the choice of GPU to test on rather
>>>>>> than just luck. But kbase always assumes the worst case (24 bit) as in
>>>>>> practise that's enough.
>>>>>>
>>>>>>> assumes all BOs are executable, that has to remain the default. So
>>>>>>> add a
>>>>>>> new PANFROST_BO_NOEXEC flag to allow userspace to indicate which
>>>>>>> BOs are
>>>>>>> not executable.
>>>>>>>
>>>>>>> There is also a restriction that executable buffers cannot start
>>>>>>> or end
>>>>>>> on a 4GB boundary. This is mostly avoided as there is only 4GB of
>>>>>>> space
>>>>>>> currently and the beginning is already blocked out for NULL ptr
>>>>>>> detection. Add support to handle this restriction fully regardless of
>>>>>>> the current constraints.
>>>>>>>
>>>>>>> For existing userspace, all created BOs remain executable, but the
>>>>>>> GPU
>>>>>>> VA alignment will be increased to the size of the BO. This shouldn't
>>>>>>> matter as there is plenty of GPU VA space.
>>>>>>>
>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>>>> Cc: Boris Brezillon <boris.brezillon at collabora.com>
>>>>>>> Cc: Robin Murphy <robin.murphy at arm.com>
>>>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>>>> Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>
>>>>>>> Signed-off-by: Rob Herring <robh at kernel.org>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/panfrost/panfrost_drv.c | 20 +++++++++++++++++++-
>>>>>>>    drivers/gpu/drm/panfrost/panfrost_gem.c | 18 ++++++++++++++++--
>>>>>>>    drivers/gpu/drm/panfrost/panfrost_gem.h |  3 ++-
>>>>>>>    drivers/gpu/drm/panfrost/panfrost_mmu.c |  3 +++
>>>>>>>    include/uapi/drm/panfrost_drm.h         |  2 ++
>>>>>>>    5 files changed, 42 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>> index d354b92964d5..b91e991bc6a3 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>>>> @@ -49,7 +49,8 @@ static int panfrost_ioctl_create_bo(struct
>>>>>>> drm_device *dev, void *data,
>>>>>>>         struct panfrost_gem_object *bo;
>>>>>>>         struct drm_panfrost_create_bo *args = data;
>>>>>>>
>>>>>>> -     if (!args->size || args->flags || args->pad)
>>>>>>> +     if (!args->size || args->pad ||
>>>>>>> +         (args->flags & ~PANFROST_BO_NOEXEC))
>>>>>>>                 return -EINVAL;
>>>>>>>
>>>>>>>         bo = panfrost_gem_create_with_handle(file, dev, args->size,
>>>>>>> args->flags,
>>>>>>> @@ -367,6 +368,22 @@ static struct drm_driver panfrost_drm_driver = {
>>>>>>>         .gem_prime_mmap         = drm_gem_prime_mmap,
>>>>>>>    };
>>>>>>>
>>>>>>> +#define PFN_4G_MASK    ((SZ_4G - 1) >> PAGE_SHIFT)
>>>>>>> +
>>>>>>> +static void panfrost_drm_mm_color_adjust(const struct drm_mm_node
>>>>>>> *node,
>>>>>>> +                                      unsigned long color,
>>>>>>> +                                      u64 *start, u64 *end)
>>>>>>> +{
>>>>>>> +     /* Executable buffers can't start or end on a 4GB boundary */
>>>>>>> +     if (!color) {
>>>>>>> +             if ((*start & PFN_4G_MASK) == 0)
>>>>>>> +                     (*start)++;
>>>>>>> +
>>>>>>> +             if ((*end & PFN_4G_MASK) == 0)
>>>>>>> +                     (*end)--;
>>>>>>> +     }
>>>>>>> +}
>>>>>>
>>>>>> Unless I'm mistaken this won't actually provide the guarantee if the
>>>>>> memory region is >4GB (which admittedly it isn't at the moment). For
>>>>>> example a 8GB region would have the beginning/end trimmed off, but
>>>>>> there's another 4GB in the middle which could be allocated next to.
>>>>>
>>>>> Humm, other than not allowing sizes greater than 4G-(2*PAGE_SIZE), I'm
>>>>> not sure how we solve that. I guess avoiding sizes of (n*4G -
>>>>> PAGE_SIZE) would avoid the problem. Seems like almost 4GB executable
>>>>> buffer should be enough for anyone(TM).
>>>>
>>>> I was thinking of something like:
>>>>
>>>> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>>>>           if ((*start & PFN_4G_MASK) +
>>>>               (SZ_16M >> PAGE_SHIFT) < PFN_4G_MASK)
>>>>                   *end = (*start & ~PFN_4G_MASK) +
>>>>                           (SZ_4GB - SZ_16M) >> PAGE_SHIFT;
>>>>           else
>>>>                   *start = (*end & ~PFN_4G_MASK) + (SZ_16M) >>
>>>> PAGE_SHIFT;
>>>> }
>>>>
>>>> So split the region depending on where we can find a free 16MB region
>>>> which doesn't cross 4GB.
>>>
>>> Here's what I ended up with. It's slightly different in that the start
>>> and end don't get 16MB aligned. The code already takes care of the
>>> alignment which also is not necessarily 16MB, but 'align = size' as
>>> that's sufficient to not cross a 16MB boundary.
>>>
>>> #define PFN_4G          (SZ_4G >> PAGE_SHIFT)
>>> #define PFN_4G_MASK     (PFN_4G - 1)
>>> #define PFN_16M         (SZ_16M >> PAGE_SHIFT)
>>>
>>> static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
>>>                                            unsigned long color,
>>>                                            u64 *start, u64 *end)
>>> {
>>>           /* Executable buffers can't start or end on a 4GB boundary */
>>>           if (!(color & PANFROST_BO_NOEXEC)) {
>>>                   if ((*start & PFN_4G_MASK) == 0)
>>>                           (*start)++;
>>>
>>>                   if ((*end & PFN_4G_MASK) == 0)
>>>                           (*end)--;
>>>
>>>                   /* Ensure start and end are in the same 4GB range */
>>>                   if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>>>                           if ((*start & PFN_4G_MASK) + PFN_16M <
>>> PFN_4G_MASK)
>>>                                   *end = (*start & ~PFN_4G_MASK) +
>>> PFN_4G - 1;
>>>                           else
>>>                                   *start = (*end & ~PFN_4G_MASK) + 1;
>>>
>>>                   }
>>>           }
>>> }
>>
>> If you're happy with the additional restriction that a buffer can't
>> cross a 4GB boundary (which does seem to be required for certain kinds
>> of buffers anyway), then I don't think it needs to be anywhere near that
>> complex:
>>
>>      if (!(*start & PFN_4G_MASK))
>>          *start++
>>      *end = min(*end, ALIGN(*start, PFN_4G) - 1);
> 
> What happens if *start is very near the end of a 4GB boundary? In that
> case *end - *start might not be as big as 16MB and we could end up
> failing the allocation IIUC.

Ahem... well, the thing about complicated-and-hard-to-read code is that 
it turns out to be complicated and hard to read correctly :)

FWIW, having taken a closer look, my interpretation would be something 
like (even more untested than before):

	u64 start_seg = ALIGN(*start, PFN_4G);
	u64 end_seg = ALIGN_DOWN(*end, PFN_4G);

	if (start_seg - start < end - end_seg) {
		*start = end_seg + 1;
		*end = min(*end, end_seg + PFN_4G - 1);
	} else {
		*end = start_seg - 1;
		*start = max(*start, start_seg - PFN_4G + 1);
	}

but at this point I'm sure we're well into personal preference and 
"shorter does not necessarily imply clearer" territory. More 
importantly, though, it now occurs to me that the "pick the biggest end" 
approach seems inherently suboptimal for cases where the [start,end] 
interval crosses *more* than one boundary. For, say, start = PFN_4G - 1, 
end = 2 * PFN_4G + 1, either way we'd get caught up on the single page 
at one end and ignore the full 4GB in the middle :/

>>>>
>>>> But like you say: 4GB should be enough for anyone ;)
>>>>
>>>>>> Also a minor issue, but we might want to consider having some
>>>>>> constants
>>>>>> for 'color' - it's not obvious from this code that color==no_exec.
>>>>>
>>>>> Yeah, I was just going worry about that when we had a second bit to
>>>>> pass.
>>>>
>>>> One other 'minor' issue I noticed as I was writing the above. PAGE_SIZE
>>>> is of course the CPU page size - we could have the situation of CPU and
>>>> GPU page size being different (e.g. CONFIG_ARM64_64K_PAGES). I'm not
>>>> sure whether we want to support this or not (kbase doesn't). Also in
>>>> theory some GPUs do support 64K (AS_TRANSCFG_ADRMODE_AARCH64_64K) - but
>>>> kbase has never used this so I don't know if it works... ;)
>>>
>>> Shhh! I have thought about that, but was ignoring for now.
>>
>> In general, I don't think that should be too great a concern - we
>> already handle it for IOMMUs, provided the minimum IOMMU page size is no
>> larger than PAGE_SIZE, which should always be the case here too. Looking
>> at how panfrost_mmu_map() is implemented, and given that we're already
>> trying to handle stuff at 2MB granularity where possible, I don't see
>> any obvious reasons for 64K pages not to work already (assuming there
>> aren't any 4K assumptions baked into the userspace side). I might have
>> to give it a try...
> 
> I'd be interested to know if it works. I know that kbase incorrectly
> uses PAGE_SIZE in several places (in particular assuming it is the size
> of the page tables). And I was aware I was in danger of slipping into
> the same mindset here.
> 
> User space shouldn't care too much - other than the size of buffers
> allocated being rounded up to the CPU's page size. At least the Panfrost
> user/kernel ABI has sizes in bytes not pages (unlike kbase).

Sounds promising - my Juno branch is in a bit of a mess at the moment 
but once I've got that cleaned up I'll have a quick play.

Robin.