[Intel-gfx] [PATCH] drm/i915: Fix NULL ptr deref by checking new_crtc_state

Lisovskiy, Stanislav stanislav.lisovskiy at intel.com
Fri May 5 18:18:02 UTC 2023


On Fri, May 05, 2023 at 07:44:11PM +0300, Ville Syrjälä wrote:
> On Fri, May 05, 2023 at 06:55:18PM +0300, Lisovskiy, Stanislav wrote:
> > On Fri, May 05, 2023 at 05:17:06PM +0300, Ville Syrjälä wrote:
> > > On Fri, May 05, 2023 at 05:05:55PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Fri, May 05, 2023 at 04:57:54PM +0300, Ville Syrjälä wrote:
> > > > > On Fri, May 05, 2023 at 04:42:33PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > On Fri, May 05, 2023 at 04:28:50PM +0300, Ville Syrjälä wrote:
> > > > > > > On Fri, May 05, 2023 at 04:21:16PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > On Fri, May 05, 2023 at 04:11:52PM +0300, Ville Syrjälä wrote:
> > > > > > > > > On Fri, May 05, 2023 at 03:54:58PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > On Fri, May 05, 2023 at 03:46:40PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > On Fri, May 05, 2023 at 03:27:51PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > On Fri, May 05, 2023 at 03:09:01PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > On Fri, May 05, 2023 at 02:41:24PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:25:46PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:20:17PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:06:34PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:05:27PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 02:02:43PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 01:58:03PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 01:54:14PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > > > > > > > > > > > On Fri, May 05, 2023 at 11:22:12AM +0300, Stanislav Lisovskiy wrote:
> > > > > > > > > > > > > > > > > > > > > > intel_atomic_get_new_crtc_state can return NULL, unless crtc state was
> > > > > > > > > > > > > > > > > > > > > > obtained previously with intel_atomic_get_crtc_state, so we must check it
> > > > > > > > > > > > > > > > > > > > > > for NULLness here, just as in many other places, where we can't guarantee
> > > > > > > > > > > > > > > > > > > > > > that intel_atomic_get_crtc_state was called.
> > > > > > > > > > > > > > > > > > > > > > We are currently getting NULL ptr deref because of that, so this fix was
> > > > > > > > > > > > > > > > > > > > > > confirmed to help.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Fixes: 74a75dc90869 ("drm/i915/display: move plane prepare/cleanup to intel_atomic_plane.c")
> > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
> > > > > > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > > > > > >  drivers/gpu/drm/i915/display/intel_atomic_plane.c | 4 ++--
> > > > > > > > > > > > > > > > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > index 9f670dcfe76e..4125ee07a271 100644
> > > > > > > > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > > > > > > > > > > > @@ -1029,7 +1029,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > > > > > > > > > > > >  	int ret;
> > > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > >  	if (old_obj) {
> > > > > > > > > > > > > > > > > > > > > > -		const struct intel_crtc_state *crtc_state =
> > > > > > > > > > > > > > > > > > > > > > +		const struct intel_crtc_state *new_crtc_state =
> > > > > > > > > > > > > > > > > > > > > >  			intel_atomic_get_new_crtc_state(state,
> > > > > > > > > > > > > > > > > > > > > >  							to_intel_crtc(old_plane_state->hw.crtc));
> > > > > > > > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > > > > > > > @@ -1044,7 +1044,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > > > > > > > > > > > >  		 * This should only fail upon a hung GPU, in which case we
> > > > > > > > > > > > > > > > > > > > > >  		 * can safely continue.
> > > > > > > > > > > > > > > > > > > > > >  		 */
> > > > > > > > > > > > > > > > > > > > > > -		if (intel_crtc_needs_modeset(crtc_state)) {
> > > > > > > > > > > > > > > > > > > > > > +		if (new_crtc_state && intel_crtc_needs_modeset(new_crtc_state)) {
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > NAK. We need to fix the bug instead of papering over it.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I had pushed this already.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > It didn't even finish CI. Please revert.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Swati did run CI and verified that fix helps. I'm _not_ going to revert.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Fine. I'll do it.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Problem is that you don't even care to explain, why this fix is wrong, but simply
> > > > > > > > > > > > > > > > act in authoritarian way, instead of having constructive discussion.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I've explained this one about a hundred times. The NULL pointer should
> > > > > > > > > > > > > > > not happen. Someone needs to actually analyze what is happening instead
> > > > > > > > > > > > > > > of just adding random NULL checks all over the place.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I do get this point. However, why are we doing those checks in other places then?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We do them when they are actually necessary.
> > > > > > > > > > > > 
> > > > > > > > > > > > Well, but for example when we do a check like if(new_bw_state) in intel_bw.c,
> > > > > > > > > > > > we might also potentially have some silent bugs.
> > > > > > > > > > > > Would you guarantee that if we remove all if(crtc_state) and if(new_bw_state) checks
> > > > > > > > > > > > in our code, that there won't be NULL pointer dereferences? I bet you don't.
> > > > > > > > > > > 
> > > > > > > > > > > We have the checks where they are needed. The check in
> > > > > > > > > > > intel_bw_atomic_check() (if that's the one you mean)
> > > > > > > > > > > looks entirely correct to me.
> > > > > > > > > > 
> > > > > > > > > > Typo in my prev message, I meant intel_atomic_get_bw_state... but the general idea is the same.
> > > > > > > > > 
> > > > > > > > > get_state() vs. get_{new,old}_state() are entirely different
> > > > > > > > > things.
> > > > > > > > > 
> > > > > > > > > You use get_state() when you really want the state to be
> > > > > > > > > included, and either
> > > > > > > > > - know the state isn't included already, or
> > > > > > > > > - you don't know whether it might have already been included
> > > > > > > > > 
> > > > > > > > > And one must of course remember that get_state() can
> > > > > > > > > - fail so error handling is needed
> > > > > > > > > - only be used during the check phase, and is illegal during the
> > > > > > > > >   commit phase.
> > > > > > > > 
> > > > > > > > Sure I know this. I even remember we discussed this many times.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > The get_{new,old}_state() (or the various for loop variants)
> > > > > > > > > you can use when you either:
> > > > > > > > > - know that the state is included already
> > > > > > > > > - are fine with the state potentially not being included
> > > > > > > > 
> > > > > > > > Don't you see that it is a bit of a contradiction in those 2 above??
> > > > > > > > 
> > > > > > > > You can't both "know that the state is included already" and
> > > > > > > > be "fine with the state potentially not being included" at the same time :)
> > > > > > > > 
> > > > > > > > Those 2 above actually mean that you CANNOT be sure, because you 
> > > > > > > > are "fine with the state potentially not being included"! 
> > > > > > > > Otherwise second one would have been redundant.
> > > > > > > 
> > > > > > > No. You are either fine with NULL, XOR you know that
> > > > > > > the state is there already. There is no contradiction.
> > > > > > 
> > > > > > I do get that. But that way of calling the function is veeery counterintuitive.
> > > > > > Means that you call it and check for NULLness..if you are fine with NULL and
> > > > > > don't check for NULL..if you aren't fine with it and expect the state to be there.
> > > > > > 
> > > > > > That is really probabilistic design.
> > > > > > I think we must enumerate all the cases where 
> > > > > 
> > > > > Not sure what you mean by enumerate. You can't just declare
> > > > > somewhere globally that in functions X and Y NULL is fine,
> > > > > and in Z it is not. It depends on how X,Y,Z are implemented
> > > > > and it may change any time the implementation is changed.
> > > > > 
> > > > > 
> > > > > > 1) we expect new_state to be there and
> > > > > >    then we don't need even any checks to be there, because we will then rely on get_state.
> > > > > > 2) we don't expect it to be there and then call get_state always.
> > > > > > 
> > > > > > Because if you are "fine" with new_state being NULL, why even call it?
> > > > > 
> > > > > Because
> > > > > !NULL -> you have some work to do
> > > > >  NULL -> you don't have work to do
> > > > 
> > > > Pretty sure we could find a way not to call it at all if no work is needed,
> > > > and call it without any checks if work is needed.
> > > > 
> > > > You typically get new bw state to recalculate and compare with old state, however
> > > > there has to be some place where you decide whether to call get_bw/crtc_state or not.
> > > > So from there, this could have been propagated to the point where we decide whether
> > > > to call get_new_bw/crtc_state or not. Then no checks would have been needed.
> > > > And NULL would always mean a bug.
> > > > Also that would be a lot simpler, following the KISS principle.
> > > 
> > > You'd need to separately track each case in some boolean/etc.
> > > in the overall atomic state. Doable? Sure. Simpler? Don't see
> > > it. It's the exact same code with the NULL check just replaced
> > > with some other check. And you must additionally remember to
> > > sprinkle those bool assignments around.
> > 
> > No-no-no. This is how intel_atomic_get_bw_state is called:
> > 
> > for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
> > 	new_bw_state = intel_atomic_get_bw_state(state);
> 
> That's just because we don't need to do anything to the 
> bw state unless some crtc is doing stuff.
> 
> > 
> > 
> > Basically, in any subsequent check that is called after that,
> > whenever it's called under for_each_new_intel_crtc_in_state, you
> > can be sure that intel_atomic_get_new_bw_state returns non-NULL.
> 
> intel_atomic_get_new_bw_state() is never called from a loop
> like that. At least I can't immediately see a single place
> where that would happen.

We used to do this before; here I just gave it as an example.

> 
> And there is no guarantee anyway that a crtc being part
> of the commit would imply that bw state is also included.
> The crtc could have been added to the commit after the
> code ran which adds the bw state.

Well, well: the crtc was added to the state after the code which adds
the bw state ran. Does that mean intel_atomic_get_new_bw_state then
actually returns NULL, even though we have a crtc in the state?
Sounds like you just described one of the possible scenarios
behind this bug.
I.e. we ran this code:

for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
     new_bw_state = intel_atomic_get_bw_state(state);

but, as you mentioned, this doesn't mean that we got a bw state,
because there might have been no crtc in the state at that point.
The crtc gets added later, then we call intel_atomic_get_new_bw_state,
and boom.
But then checking for NULL is also wrong, because we should have called
intel_atomic_get_bw_state for the newly added crtc?
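
To make the ordering problem concrete, here is a minimal toy sketch in
plain C. It is not i915 code: every toy_* name is hypothetical, and the
real helpers report errors differently. It only assumes the semantics
described in this thread: get_state() adds the state to the commit, can
fail, and is legal only during the check phase, while get_new_state()
is a pure lookup that returns NULL if the state was never added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real i915 or DRM symbols. */
struct toy_bw_state {
	int data_rate;
};

struct toy_atomic_state {
	bool check_phase;             /* true while the check phase is running */
	int num_crtcs;                /* crtcs currently part of the commit */
	struct toy_bw_state *new_bw;  /* NULL until the bw state is added */
	struct toy_bw_state storage;  /* backing storage for the toy bw state */
};

/* get_state() flavour: adds the state to the commit if needed; can fail. */
int toy_get_bw_state(struct toy_atomic_state *s, struct toy_bw_state **out)
{
	if (!s->check_phase)
		return -1;            /* illegal during the commit phase */
	if (!s->new_bw)
		s->new_bw = &s->storage;
	*out = s->new_bw;
	return 0;
}

/* get_new_state() flavour: pure lookup, NULL if the state was never added. */
struct toy_bw_state *toy_get_new_bw_state(struct toy_atomic_state *s)
{
	return s->new_bw;
}

/* Adds the bw state only if some crtc is already in the commit. */
int toy_bw_check(struct toy_atomic_state *s)
{
	struct toy_bw_state *bw;
	int i, ret;

	for (i = 0; i < s->num_crtcs; i++) {
		ret = toy_get_bw_state(s, &bw);
		if (ret)
			return ret;
	}
	return 0;
}

/* Caller that is fine with NULL: NULL simply means no work to do. */
bool toy_bw_has_work(struct toy_atomic_state *s)
{
	return toy_get_new_bw_state(s) != NULL;
}

/* Caller that requires the state: NULL here is a bug, so it asserts. */
int toy_bw_data_rate(struct toy_atomic_state *s)
{
	struct toy_bw_state *bw = toy_get_new_bw_state(s);

	assert(bw);
	return bw->data_rate;
}
```

In this toy, if toy_bw_check() runs while num_crtcs is 0 and a crtc is
added afterwards, toy_get_new_bw_state() still returns NULL even though
a crtc is now in the state. A caller like toy_bw_has_work() would
silently skip the work, and a caller like toy_bw_data_rate() would trip
its assert. Neither outcome repairs the ordering, which is the scenario
above.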

Sometimes I think we should write some kind of doc with guidelines,
similar to what we have for some other areas, describing what the code
flow should be in each of the typical scenarios and how to use these
functions.

Stan

> 
> -- 
> Ville Syrjälä
> Intel

