[early pull] drm/msm: drm-msm-next-2021-07-28 for v5.15

Rob Clark robdclark at gmail.com
Thu Jul 29 02:50:52 UTC 2021


On Wed, Jul 28, 2021 at 7:18 PM Caleb Connolly
<caleb.connolly at linaro.org> wrote:
>
>
>
> On 29/07/2021 02:02, Rob Clark wrote:
> > Jordan, any idea if more frequent frequency changes would for some
> > reason make a630 grumpy?  I was expecting it should be somewhat
> > similar to a618 (same GMU fw, etc).  The main result of that patch
> > should be clamping to min freq when gpu goes idle, and then toggling
> > back to devfreq provided freq on idle->active transition.  So there
> > might be more frequent freq transitions.
> >
> > Caleb, I don't suppose you could somehow delay starting UI and get
> > some traces?  Something along the lines of:
> >
> >    localhost ~ # cd /sys/kernel/debug/tracing/
> >    localhost /sys/kernel/debug/tracing # echo 1 > events/drm_msm_gpu/enable
> >    localhost /sys/kernel/debug/tracing # echo 1 > tracing_on
> >    localhost /sys/kernel/debug/tracing # cat trace_pipe
> Sure, here's the last ~1k lines of the trace logs:
> https://paste.ubuntu.com/p/XMKjKDWxYg/
> And what I managed to get from dmesg before the crash (mostly the same
> as before): https://paste.ubuntu.com/p/kGVtRHDWKH/
> >
> > Does adding an 'if (1) return' at the top of msm_devfreq_idle() help?
> > That should bypass the clamping to min freq when the GPU isn't doing
> > anything and reduce the # of freq transitions.  I suppose we could
> > opt-in to this behavior on a per-gpu basis..
> Yeah, that seems to resolve the issue, although I got the following
> probably unrelated (?) error on rebooting the device:
> [  134.994449] [drm:dpu_encoder_vsync_event_handler:1749] [dpu
> error]invalid parameters

I think that should probably be unrelated..

Based on the traces, I'm seeing rapid toggling between idle freq and
non-idle freq.. but no invalid freqs (assuming the dts opp table is
correct) so I *guess* there is maybe some sort of race condition
communicating with GMU, or some other issue with rapid freq transitions?
Maybe Jordan has some ideas.
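
For anyone wanting to repeat that check on their own traces, here is a
minimal userspace sketch of the two things being looked for: samples that
are not in the OPP table, and how often the frequency changes.  It assumes
the frequencies have already been extracted from the trace into an array
(the trace event format is not parsed here), and the helper names and the
frequency values used with them are hypothetical, not taken from any real
dts:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if freq (in Hz) matches one of the n_opp entries in the OPP table. */
static bool freq_in_opp_table(const unsigned long *opp, size_t n_opp,
                              unsigned long freq)
{
    for (size_t i = 0; i < n_opp; i++)
        if (opp[i] == freq)
            return true;
    return false;
}

/* Number of sampled frequencies that do not appear in the OPP table. */
static size_t count_invalid(const unsigned long *opp, size_t n_opp,
                            const unsigned long *samples, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (!freq_in_opp_table(opp, n_opp, samples[i]))
            bad++;
    return bad;
}

/* Number of times two consecutive samples differ, i.e. freq transitions. */
static size_t count_transitions(const unsigned long *samples, size_t n)
{
    size_t changes = 0;

    for (size_t i = 1; i < n; i++)
        if (samples[i] != samples[i - 1])
            changes++;
    return changes;
}
```

A run with zero invalid samples but a high transition count relative to
the trace duration would match the pattern described above.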

The earlier dmesg you posted looks like gpu getting cranky about what
looks like a valid opcode, and then it goes off into the weeds.. when
you start seeing things like "0xDEAFBEAF" I think that means the GPU
has lost context (ie. power collapse and back, and now it is reading
bogus power-on default values).

I think I can put together a patch to make the "clamp to min freq when
gpu is idle" opt-in so we can enable it per-gpu once someone has
confirmed that it doesn't cause problems.  I guess that would at least
work as a short term solution.  But not sure if that is just papering
over some gpu/gmu bug (or maybe gdsc/clk bug), or if it is a legit
workaround for some limitation..
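
A minimal standalone sketch of the opt-in idea, in userspace C rather than
actual driver code.  The struct, the clamp_to_idle flag and the function
names are all hypothetical stand-ins (this is not the real struct msm_gpu
or the real msm_devfreq_idle()), and the frequencies used with it are made
up for illustration:

```c
#include <stdbool.h>

/* Hypothetical per-GPU state; a stand-in, not the real struct msm_gpu. */
struct gpu_sketch {
    bool clamp_to_idle;       /* hypothetical opt-in flag, set per GPU */
    unsigned long min_freq;   /* lowest OPP, in Hz */
    unsigned long cur_freq;   /* frequency currently programmed */
    unsigned long saved_freq; /* freq to restore on idle->active, 0 if none */
};

/* Called when the GPU goes idle: clamp to min freq only if opted in. */
static void sketch_idle(struct gpu_sketch *gpu)
{
    if (!gpu->clamp_to_idle)
        return; /* GPUs that have not opted in keep the devfreq freq */

    gpu->saved_freq = gpu->cur_freq;
    gpu->cur_freq = gpu->min_freq;
}

/* Called on idle->active: restore whatever freq was in use before idle. */
static void sketch_active(struct gpu_sketch *gpu)
{
    if (!gpu->saved_freq)
        return; /* nothing was clamped, so nothing to restore */

    gpu->cur_freq = gpu->saved_freq;
    gpu->saved_freq = 0;
}
```

With the flag clear, neither call changes the frequency, so a GPU that has
not been confirmed to tolerate the extra transitions sees none of them.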

BR,
-R

>
> I wonder if the PocoPhone F1 has the same problem...
> >
> > BR,
> > -R
> >
> > On Wed, Jul 28, 2021 at 5:35 PM Caleb Connolly
> > <caleb.connolly at linaro.org> wrote:
> >>
> >> Hi Rob,
> >>
> >> This series causes a fatal crash on my Oneplus 6, the device goes to
> >> Qualcomm crashdump mode shortly after reaching UI with the following errors:
> >>
> >> https://paste.ubuntu.com/p/HvjmzZYtgw/
> >>
> >> I did a git bisect and the patch ("drm/msm: Devfreq tuning") seems to be
> >> the cause of the crash, reverting it resolves the issue.
> >>
> >>
> >> On 28/07/2021 21:52, Rob Clark wrote:
> >>> Hi Dave & Daniel,
> >>>
> >>> An early pull for v5.15 (there'll be more coming in a week or two),
> >>> consisting of the drm/scheduler conversion and a couple other small
> >>> series that one was based on.  Mostly sending this now because IIUC
> >>> danvet wanted it in drm-next so he could rebase on it.  (Daniel, if
> >>> you disagree then speak up, and I'll instead include this in the main
> >>> pull request once that is ready.)
> >>>
> >>> This also has a core patch to drop drm_gem_object_put_locked() now
> >>> that the last use of it is removed.
> >>>
> >>> The following changes since commit ff1176468d368232b684f75e82563369208bc371:
> >>>
> >>>     Linux 5.14-rc3 (2021-07-25 15:35:14 -0700)
> >>>
> >>> are available in the Git repository at:
> >>>
> >>>     https://gitlab.freedesktop.org/drm/msm.git drm-msm-next-2021-07-28
> >>>
> >>> for you to fetch changes up to 4541e4f2225c30b0e9442be9eb2fb8b7086cdd1f:
> >>>
> >>>     drm/msm/gem: Mark active before pinning (2021-07-28 09:19:00 -0700)
> >>>
> >>> ----------------------------------------------------------------
> >>> Rob Clark (18):
> >>>         drm/msm: Let fences read directly from memptrs
> >>>         drm/msm: Signal fences sooner
> >>>         drm/msm: Split out devfreq handling
> >>>         drm/msm: Split out get_freq() helper
> >>>         drm/msm: Devfreq tuning
> >>>         drm/msm: Docs and misc cleanup
> >>>         drm/msm: Small submitqueue creation cleanup
> >>>         drm/msm: drop drm_gem_object_put_locked()
> >>>         drm: Drop drm_gem_object_put_locked()
> >>>         drm/msm/submit: Simplify out-fence-fd handling
> >>>         drm/msm: Consolidate submit bo state
> >>>         drm/msm: Track "seqno" fences by idr
> >>>         drm/msm: Return ERR_PTR() from submit_create()
> >>>         drm/msm: Conversion to drm scheduler
> >>>         drm/msm: Drop submit bo_list
> >>>         drm/msm: Drop struct_mutex in submit path
> >>>         drm/msm: Utilize gpu scheduler priorities
> >>>         drm/msm/gem: Mark active before pinning
> >>>
> >>>    drivers/gpu/drm/drm_gem.c                   |  22 --
> >>>    drivers/gpu/drm/msm/Kconfig                 |   1 +
> >>>    drivers/gpu/drm/msm/Makefile                |   1 +
> >>>    drivers/gpu/drm/msm/adreno/a5xx_debugfs.c   |   4 +-
> >>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |   6 +-
> >>>    drivers/gpu/drm/msm/adreno/a5xx_power.c     |   2 +-
> >>>    drivers/gpu/drm/msm/adreno/a5xx_preempt.c   |   7 +-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  12 +-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c       |   6 +-
> >>>    drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |   4 +-
> >>>    drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   6 +-
> >>>    drivers/gpu/drm/msm/msm_drv.c               |  30 ++-
> >>>    drivers/gpu/drm/msm/msm_fence.c             |  53 +----
> >>>    drivers/gpu/drm/msm/msm_fence.h             |  44 +++-
> >>>    drivers/gpu/drm/msm/msm_gem.c               |  94 +-------
> >>>    drivers/gpu/drm/msm/msm_gem.h               |  47 ++--
> >>>    drivers/gpu/drm/msm/msm_gem_submit.c        | 344 +++++++++++++++++-----------
> >>>    drivers/gpu/drm/msm/msm_gpu.c               | 220 ++++--------------
> >>>    drivers/gpu/drm/msm/msm_gpu.h               | 139 ++++++++++-
> >>>    drivers/gpu/drm/msm/msm_gpu_devfreq.c       | 203 ++++++++++++++++
> >>>    drivers/gpu/drm/msm/msm_rd.c                |   6 +-
> >>>    drivers/gpu/drm/msm/msm_ringbuffer.c        |  69 +++++-
> >>>    drivers/gpu/drm/msm/msm_ringbuffer.h        |  12 +
> >>>    drivers/gpu/drm/msm/msm_submitqueue.c       |  53 +++--
> >>>    include/drm/drm_gem.h                       |   2 -
> >>>    include/uapi/drm/msm_drm.h                  |  14 +-
> >>>    26 files changed, 865 insertions(+), 536 deletions(-)
> >>>    create mode 100644 drivers/gpu/drm/msm/msm_gpu_devfreq.c
> >>>
> >>
> >> --
> >> Kind Regards,
> >> Caleb (they/them)
>
> --
> Kind Regards,
> Caleb (they/them)

