[Freedreno] [PATCH 1/3] drm/msm: Fix speed-bin support not to access outside valid memory

Fri Mar 5 16:07:35 UTC 2021

On 05/03/2021 14:45, Doug Anderson wrote:
> Hi,
> 
> On Fri, Mar 5, 2021 at 2:28 AM Srinivas Kandagatla
> <srinivas.kandagatla at linaro.org> wrote:
>>
>>
>>
>> On 27/02/2021 00:26, Douglas Anderson wrote:
>>> When running the latest kernel on an sc7180 with KASAN I got this
>>> splat:
>>>     BUG: KASAN: slab-out-of-bounds in a6xx_gpu_init+0x618/0x644
>>>     Read of size 4 at addr ffffff8088f36100 by task kworker/7:1/58
>>>     CPU: 7 PID: 58 Comm: kworker/7:1 Not tainted 5.11.0+ #3
>>>     Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>>>     Workqueue: events deferred_probe_work_func
>>>     Call trace:
>>>      dump_backtrace+0x0/0x3a8
>>>      show_stack+0x24/0x30
>>>      dump_stack+0x174/0x1e0
>>>      print_address_description+0x70/0x2e4
>>>      kasan_report+0x178/0x1bc
>>>      __asan_report_load4_noabort+0x44/0x50
>>>      a6xx_gpu_init+0x618/0x644
>>>      adreno_bind+0x26c/0x438
>>>
>>> This is because the speed bin is defined like this:
>>>     gpu_speed_bin: gpu_speed_bin at 1d2 {
>>>       reg = <0x1d2 0x2>;
>>>       bits = <5 8>;
>>>     };
>>>
>>> As you can see the "length" is 2 bytes. That means that the nvmem
>>> subsystem allocates only 2 bytes. The GPU code, however, was casting
>>> the pointer allocated by nvmem to a (u32 *) and dereferencing. That's
>>> not so good.
>>>
>>> Let's fix this to just use the nvmem_cell_read_u16() accessor function
>>> which simplifies things and also gets rid of the splat.
>>>
>>> Let's also put an explicit conversion from little endian in place just
>>> to make things clear. The nvmem subsystem today is assuming little
>>> endian and this makes it clear. Specifically, the way the above sc7180
>>> cell is interpreted:
>>>
>>> NVMEM:
>>>    +--------+--------+--------+--------+--------+
>>>    | ...... | 0x1d3  | 0x1d2  | ...... | 0x000  |
>>>    +--------+--------+--------+--------+--------+
>>>                 ^       ^
>>>                msb     lsb
>>>
>>> You can see that the least significant data is at the lower address
>>> which is little endian.
>>>
>>> NOTE: someone who is truly paying attention might wonder about me
>>> picking the "u16" version of this accessor instead of the "u8" (since
>>> the value is 8 bits big) or the u32 version (just for fun). At the
>>> moment you need to pick the accessor that exactly matches the length
>>> the cell was specified as in the device tree. Hopefully future
>>> patches to the nvmem subsystem will fix this.
>>>
>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
>>> Signed-off-by: Douglas Anderson <dianders at chromium.org>
>>> ---
>>>
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++--------------------
>>>    1 file changed, 8 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> index ba8e9d3cf0fe..0e2024defd79 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> @@ -1350,35 +1350,20 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
>>>                u32 revn)
>>>    {
>>>        struct opp_table *opp_table;
>>> -     struct nvmem_cell *cell;
>>>        u32 supp_hw = UINT_MAX;
>>> -     void *buf;
>>> -
>>> -     cell = nvmem_cell_get(dev, "speed_bin");
>>> -     /*
>>> -      * -ENOENT means that the platform doesn't support speedbin which is
>>> -      * fine
>>> -      */
>>> -     if (PTR_ERR(cell) == -ENOENT)
>>> -             return 0;
>>> -     else if (IS_ERR(cell)) {
>>> -             DRM_DEV_ERROR(dev,
>>> -                             "failed to read speed-bin. Some OPPs may not be supported by hardware");
>>> -             goto done;
>>> -     }
>>> +     u16 speedbin;
>>> +     int ret;
>>>
>>> -     buf = nvmem_cell_read(cell, NULL);
>>
>> I think the issue here is not passing len pointer which should return
>> how many bytes the cell is!
>>
>> Then from there we can decide to do le16_to_cpu or le32_to_cpu or not!
>> This will also future proof the code to handle speed_bins of different
>> sizes!
> 
> I think what you're saying is that you want to copy/paste this code
> (or something similar) everywhere that accesses an nvmem cell.  Is
> that correct?  ...or maybe you can suggest some smaller / shorter code
> that I'm missing?
> 

It depends what the consumer is doing! If it is already aware of what 
size of data its expecting then you can use nvmem_cell_read_u8/16/32/64 
variants, however it wants to do bit more with the data then 
nvmem_cell_read() should give more flexibility!

> ---
> 
> {
>    struct nvmem_cell *cell;
>    ssize_t len;
>    char *ret;
>    int i;
> 
>    *data = 0;
> 
>    cell = nvmem_cell_get(dev, cname);
>    if (IS_ERR(cell)) {
>      if (PTR_ERR(cell) != -EPROBE_DEFER)
>        dev_err(dev, "undefined cell %s\n", cname);
>      return PTR_ERR(cell);
>    }
> 
>    ret = nvmem_cell_read(cell, &len);
>    nvmem_cell_put(cell);
>    if (IS_ERR(ret)) {
>      dev_err(dev, "can't read cell %s\n", cname);
>      return PTR_ERR(ret);
>    }
> 
>    for (i = 0; i < len; i++)
>      *data |= ret[i] << (8 * i);
> 
>    kfree(ret);
>    dev_dbg(dev, "efuse read(%s) = %x, bytes %zd\n", cname, *data, len);
> 
>    return 0;
> }
> 
> ---
> 
> The above code is from cpr_read_efuse() in "cpr.c".  I mentioned in
> the cover letter that I thought about doing this and decided it wasn't
> a great idea.  There should be _some_ function in the nvmem core that
> says: there's an integer that's 32-bits or less stored in nvmem.

There is no such helper function other than using the above snippet to do.

> Please read it for me.  If you don't think we can use one of the
> existing functions for that, would you be opposed to me creating a new
> one?

I have no issue in adding a new helper function in nvmem to allow such 
thing!

> 
> ---
> 
> In any case, while we discuss what we should do long term, I still
> hope that Rob can merge this patch since it fixes the bug.

Yes, I agree this definitely fixes the mentioned bug!
> 
> -Doug
>