[PATCH 3/5] drm/panfrost: Add a no execute flag for BO allocations
Robin Murphy
robin.murphy at arm.com
Mon Jul 22 09:50:07 UTC 2019
On 19/07/2019 23:07, Rob Herring wrote:
> On Fri, Jul 19, 2019 at 4:39 AM Steven Price <steven.price at arm.com> wrote:
>>
>> On 18/07/2019 18:03, Rob Herring wrote:
>>> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price at arm.com> wrote:
>>>>
>>>> On 17/07/2019 19:33, Rob Herring wrote:
>>>>> Executable buffers have an alignment restriction that they can't cross
>>>>> 16MB boundary as the GPU program counter is 24-bits. This restriction is
>>>>> currently not handled and we just get lucky. As current userspace
>>>>
>>>> Actually it depends which GPU you are using - some do have a bigger
>>>> program counter - so it might be the choice of GPU to test on rather
>>>> than just luck. But kbase always assumes the worst case (24 bit) as in
>>>> practise that's enough.
>>>>
>>>>> assumes all BOs are executable, that has to remain the default. So add a
>>>>> new PANFROST_BO_NOEXEC flag to allow userspace to indicate which BOs are
>>>>> not executable.
>>>>>
>>>>> There is also a restriction that executable buffers cannot start or end
>>>>> on a 4GB boundary. This is mostly avoided as there is only 4GB of space
>>>>> currently and the beginning is already blocked out for NULL ptr
>>>>> detection. Add support to handle this restriction fully regardless of
>>>>> the current constraints.
>>>>>
>>>>> For existing userspace, all created BOs remain executable, but the GPU
>>>>> VA alignment will be increased to the size of the BO. This shouldn't
>>>>> matter as there is plenty of GPU VA space.
>>>>>
>>>>> Cc: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>> Cc: Boris Brezillon <boris.brezillon at collabora.com>
>>>>> Cc: Robin Murphy <robin.murphy at arm.com>
>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>> Cc: Alyssa Rosenzweig <alyssa at rosenzweig.io>
>>>>> Signed-off-by: Rob Herring <robh at kernel.org>
>>>>> ---
>>>>> drivers/gpu/drm/panfrost/panfrost_drv.c | 20 +++++++++++++++++++-
>>>>> drivers/gpu/drm/panfrost/panfrost_gem.c | 18 ++++++++++++++++--
>>>>> drivers/gpu/drm/panfrost/panfrost_gem.h | 3 ++-
>>>>> drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
>>>>> include/uapi/drm/panfrost_drm.h | 2 ++
>>>>> 5 files changed, 42 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>> index d354b92964d5..b91e991bc6a3 100644
>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>>>> @@ -49,7 +49,8 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
>>>>> struct panfrost_gem_object *bo;
>>>>> struct drm_panfrost_create_bo *args = data;
>>>>>
>>>>> - if (!args->size || args->flags || args->pad)
>>>>> + if (!args->size || args->pad ||
>>>>> + (args->flags & ~PANFROST_BO_NOEXEC))
>>>>> return -EINVAL;
>>>>>
>>>>> bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags,
>>>>> @@ -367,6 +368,22 @@ static struct drm_driver panfrost_drm_driver = {
>>>>> .gem_prime_mmap = drm_gem_prime_mmap,
>>>>> };
>>>>>
>>>>> +#define PFN_4G_MASK ((SZ_4G - 1) >> PAGE_SHIFT)
>>>>> +
>>>>> +static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
>>>>> + unsigned long color,
>>>>> + u64 *start, u64 *end)
>>>>> +{
>>>>> + /* Executable buffers can't start or end on a 4GB boundary */
>>>>> + if (!color) {
>>>>> + if ((*start & PFN_4G_MASK) == 0)
>>>>> + (*start)++;
>>>>> +
>>>>> + if ((*end & PFN_4G_MASK) == 0)
>>>>> + (*end)--;
>>>>> + }
>>>>> +}
>>>>
>>>> Unless I'm mistaken this won't actually provide the guarantee if the
>>>> memory region is >4GB (which admittedly it isn't at the moment). For
>>>> example a 8GB region would have the beginning/end trimmed off, but
>>>> there's another 4GB in the middle which could be allocated next to.
>>>
>>> Humm, other than not allowing sizes greater than 4G-(2*PAGE_SIZE), I'm
>>> not sure how we solve that. I guess avoiding sizes of (n*4G -
>>> PAGE_SIZE) would avoid the problem. Seems like almost 4GB executable
>>> buffer should be enough for anyone(TM).
>>
>> I was thinking of something like:
>>
>> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
>> if ((*start & PFN_4G_MASK) +
>> (SZ_16M >> PAGE_SHIFT) < PFN_4G_MASK)
>> *end = (*start & ~PFN_4G_MASK) +
>> (SZ_4GB - SZ_16M) >> PAGE_SHIFT;
>> else
>> *start = (*end & ~PFN_4G_MASK) + (SZ_16M) >> PAGE_SHIFT;
>> }
>>
>> So split the region depending on where we can find a free 16MB region
>> which doesn't cross 4GB.
>
> Here's what I ended up with. It's slightly different in that the start
> and end don't get 16MB aligned. The code already takes care of the
> alignment which also is not necessarily 16MB, but 'align = size' as
> that's sufficient to not cross a 16MB boundary.
>
> #define PFN_4G (SZ_4G >> PAGE_SHIFT)
> #define PFN_4G_MASK (PFN_4G - 1)
> #define PFN_16M (SZ_16M >> PAGE_SHIFT)
>
> static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node,
> unsigned long color,
> u64 *start, u64 *end)
> {
> /* Executable buffers can't start or end on a 4GB boundary */
> if (!(color & PANFROST_BO_NOEXEC)) {
> if ((*start & PFN_4G_MASK) == 0)
> (*start)++;
>
> if ((*end & PFN_4G_MASK) == 0)
> (*end)--;
>
> /* Ensure start and end are in the same 4GB range */
> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) {
> if ((*start & PFN_4G_MASK) + PFN_16M < PFN_4G_MASK)
> *end = (*start & ~PFN_4G_MASK) + PFN_4G - 1;
> else
> *start = (*end & ~PFN_4G_MASK) + 1;
>
> }
> }
> }
If you're happy with the additional restriction that a buffer can't
cross a 4GB boundary (which does seem to be required for certain kinds
of buffers anyway), then I don't think it needs to be anywhere near that
complex:
if (!(*start & PFN_4G_MASK))
*start++
*end = min(*end, ALIGN(*start, PFN_4G) - 1);
>>
>> But like you say: 4GB should be enough for anyone ;)
>>
>>>> Also a minor issue, but we might want to consider having some constants
>>>> for 'color' - it's not obvious from this code that color==no_exec.
>>>
>>> Yeah, I was just going worry about that when we had a second bit to pass.
>>
>> One other 'minor' issue I noticed as I was writing the above. PAGE_SIZE
>> is of course the CPU page size - we could have the situation of CPU and
>> GPU page size being different (e.g. CONFIG_ARM64_64K_PAGES). I'm not
>> sure whether we want to support this or not (kbase doesn't). Also in
>> theory some GPUs do support 64K (AS_TRANSCFG_ADRMODE_AARCH64_64K) - but
>> kbase has never used this so I don't know if it works... ;)
>
> Shhh! I have thought about that, but was ignoring for now.
In general, I don't think that should be too great a concern - we
already handle it for IOMMUs, provided the minimum IOMMU page size is no
larger than PAGE_SIZE, which should always be the case here too. Looking
at how panfrost_mmu_map() is implemented, and given that we're already
trying to handle stuff at 2MB granularity where possible, I don't see
any obvious reasons for 64K pages not to work already (assuming there
aren't any 4K assumptions baked into the userspace side). I might have
to give it a try...
Robin.
More information about the dri-devel
mailing list