[Intel-gfx] [PATCH] drm/i915: Fix NULL ptr deref by checking new_crtc_state

Fri May 5 12:52:12 UTC 2023

On Fri, May 05, 2023 at 03:46:40PM +0300, Ville Syrjälä wrote:
> On Fri, May 05, 2023 at 03:27:51PM +0300, Lisovskiy, Stanislav wrote:
> > On Fri, May 05, 2023 at 03:09:01PM +0300, Ville Syrjälä wrote:
> > > On Fri, May 05, 2023 at 02:41:24PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Fri, May 05, 2023 at 02:25:46PM +0300, Ville Syrjälä wrote:
> > > > > On Fri, May 05, 2023 at 02:20:17PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > On Fri, May 05, 2023 at 02:06:34PM +0300, Ville Syrjälä wrote:
> > > > > > > On Fri, May 05, 2023 at 02:05:27PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > On Fri, May 05, 2023 at 02:02:43PM +0300, Ville Syrjälä wrote:
> > > > > > > > > On Fri, May 05, 2023 at 01:58:03PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > > > > > On Fri, May 05, 2023 at 01:54:14PM +0300, Ville Syrjälä wrote:
> > > > > > > > > > > On Fri, May 05, 2023 at 11:22:12AM +0300, Stanislav Lisovskiy wrote:
> > > > > > > > > > > > intel_atomic_get_new_crtc_state can return NULL if the crtc state wasn't
> > > > > > > > > > > > obtained previously with intel_atomic_get_crtc_state, so we must check it
> > > > > > > > > > > > for NULLness here, just as in many other places, where we can't guarantee
> > > > > > > > > > > > that intel_atomic_get_crtc_state was called.
> > > > > > > > > > > > We are currently getting NULL ptr deref because of that, so this fix was
> > > > > > > > > > > > confirmed to help.
> > > > > > > > > > > > 
> > > > > > > > > > > > Fixes: 74a75dc90869 ("drm/i915/display: move plane prepare/cleanup to intel_atomic_plane.c")
> > > > > > > > > > > > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy at intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/gpu/drm/i915/display/intel_atomic_plane.c | 4 ++--
> > > > > > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > index 9f670dcfe76e..4125ee07a271 100644
> > > > > > > > > > > > --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> > > > > > > > > > > > @@ -1029,7 +1029,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > >  	int ret;
> > > > > > > > > > > >  
> > > > > > > > > > > >  	if (old_obj) {
> > > > > > > > > > > > -		const struct intel_crtc_state *crtc_state =
> > > > > > > > > > > > +		const struct intel_crtc_state *new_crtc_state =
> > > > > > > > > > > >  			intel_atomic_get_new_crtc_state(state,
> > > > > > > > > > > >  							to_intel_crtc(old_plane_state->hw.crtc));
> > > > > > > > > > > >  
> > > > > > > > > > > > @@ -1044,7 +1044,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
> > > > > > > > > > > >  		 * This should only fail upon a hung GPU, in which case we
> > > > > > > > > > > >  		 * can safely continue.
> > > > > > > > > > > >  		 */
> > > > > > > > > > > > -		if (intel_crtc_needs_modeset(crtc_state)) {
> > > > > > > > > > > > +		if (new_crtc_state && intel_crtc_needs_modeset(new_crtc_state)) {
> > > > > > > > > > > 
> > > > > > > > > > > NAK. We need to fix the bug instead of papering over it.
> > > > > > > > > > 
> > > > > > > > > > I had pushed this already.
> > > > > > > > > 
> > > > > > > > > It didn't even finish CI. Please revert.
> > > > > > > > 
> > > > > > > > Swati did run CI and verified that fix helps. I'm _not_ going to revert.
> > > > > > > 
> > > > > > > Fine. I'll do it.
> > > > > > 
> > > > > > The problem is that you don't even care to explain why this fix is wrong, but simply
> > > > > > act in an authoritarian way, instead of having a constructive discussion.
> > > > > 
> > > > > I've explained this one about a hundred times. The NULL pointer should
> > > > > not happen. Someone needs to actually analyze what is happening instead
> > > > > of just adding random NULL checks all over the place.
> > > > 
> > > > I do get this point. However, why are we doing those checks in other places then?
> > > 
> > > We do them when they are actually necessary.
> > 
> > Well, but for example when we do a check like if(new_bw_state) in intel_bw.c,
> > we also might potentially have some silent bugs.
> > Would you guarantee that if we remove all if(crtc_state) and if(new_bw_state) checks
> > in our code, there won't be NULL pointer dereferences? I bet you don't.
> 
> We have the checks where they are needed. The check in
> intel_bw_atomic_check() (if that's the one you mean)
> looks entirely correct to me.

They are needed because there might be a case where intel_atomic_get_crtc
might not get called, right?
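
To make the distinction being argued about concrete, here is a toy model of
the two lookup styles. It is only a sketch and not i915 code: every toy_*
name, the fixed-size array and the in_use flag are made up for illustration.
It assumes that a "get" helper adds the crtc to the transaction, that a
"get new" helper is a pure lookup returning NULL if nothing was added, and
that adding a plane is expected to pull in its crtc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CRTCS 4

/* Hypothetical stand-in for a per-crtc state object. */
struct toy_crtc_state {
	bool in_use;		/* true once the crtc was added to the transaction */
	bool needs_modeset;
};

/* Hypothetical stand-in for one atomic transaction. */
struct toy_atomic_state {
	struct toy_crtc_state crtcs[TOY_MAX_CRTCS];
};

/* "get" style: adds the crtc to the transaction, never returns NULL. */
static struct toy_crtc_state *
toy_get_crtc_state(struct toy_atomic_state *state, size_t idx)
{
	assert(idx < TOY_MAX_CRTCS);
	state->crtcs[idx].in_use = true;
	return &state->crtcs[idx];
}

/* "get new" style: pure lookup, NULL unless the crtc was added earlier. */
static struct toy_crtc_state *
toy_get_new_crtc_state(struct toy_atomic_state *state, size_t idx)
{
	assert(idx < TOY_MAX_CRTCS);
	return state->crtcs[idx].in_use ? &state->crtcs[idx] : NULL;
}

/* Assumed invariant: adding a plane also adds the crtc it is on. */
static void toy_add_plane(struct toy_atomic_state *state, size_t crtc_idx)
{
	toy_get_crtc_state(state, crtc_idx);
}
```

In this model a NULL check after toy_get_new_crtc_state() is legitimate for a
crtc that nothing has touched yet. For a crtc whose plane went through
toy_add_plane(), a NULL result would mean the invariant was broken somewhere
earlier, which is the case a local NULL check would hide.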

> 
> > 
> > But IF you do, then let's remove it everywhere, why keep it there if we are sure! :))
> > 
> > > 
> > > > Moreover, I can remember that you told me to do this check when you were reviewing
> > > > my other patches, because we always have to check the result of this function, as it
> > > > can be NULL if intel_atomic_get_crtc_state wasn't called before, which
> > > > could happen even in the normal case, as I understand.
> > > 
> > > You can't apply that kind of general rule. Whether the crtc should have
> > > already been added to the state or not is case dependent. In this case
> > > that should never be the case since the plane was already added to the
> > > state, and thus its crtc should also have been added.
> > 
> > Well, it is kinda weird that we don't have clear rules here.
> > As I understand this is Bigjoiner, so most likely that was the reason why intel_get_crtc_state
> > wasn't called.
> > I mean, I was planning to continue investigating that Bigjoiner logic here anyway,
> > however that fix could at least help the CI team to continue testing further.
> 
> What's the point of testing code that is known to be broken in
> ways no one currently understands. Any results you get are entirely
> suspect.

Any code has some issues; what we do is try to gradually fix those.

> 
> > 
> > > 
> > > > 
> > > > If we want to understand why it happens here in particular, great, let's investigate.
> > > > However, I don't get why we have the same checks all over the place then,
> > > > and I can even find your words that we need to do those checks as well.
> > > > 
> > > > Also if this doesn't break anything,
> > > 
> > > You can't know that. You're trading a clearly reproducible
> > > bug with a silent bug that can cause who knows what other
> > > issues. That one will be impossible to debug.
> > 
> > Answered above...
> > 
> > > 
> > > > improves our CI results, and doesn't violate our coding
> > > > practices (once again, it is worth mentioning that we do check new_crtc_state for NULLness
> > > > in many places), then why can't it be the fix?
> > > > If we find a better solution, that's great, but there are plenty of other things as well,
> > > > if you haven't noticed.
> > > > 
> > > > Can we somehow _stop_ this childish kindergarten-level review arguing warfare, at least
> > > > for the sake of professional efficiency?
> > > 
> > > Not sure what that kindergarten level stuff is. I just
> > > NAKed the patch.
> > 
> > Well, I'm glad, we are at least discussing now, why you NAKed it, initially without
> > having discussion first.
> 
> Like I said, this specific bug has been discussed before, and IIRC 
> we have at least one internal bug report about it, not sure if
> there's also a gitlab issue. Am I to assume you haven't actually
> read those?

Well that is where I started actually.

> 
> > 
> > > 
> > > > 
> > > > For all my next patches I will always add you to CC and _personally_ will ask to review,
> > > > even though quite often when I do this - I get nothing.
> > > 
> > > I can't review everything in detail. But in any case you should
> > > at least wait a day or two for review feedback, and you definitely
> > > need to wait for CI results as well.
> > 
> > Sometimes I wait for weeks.
> 
> I presume you mean review feedback here rather than CI results?
> I would say if a week has passed by and you need more input then
> ping people directly (for me pinging on irc is probably the
> thing that works best).
> 
> If you need to wait for CI results for that long then you need
> to have a serious talk with the CI team.

Yep, regarding pinging I agree, let's discuss offline
how we could improve that.

> 
> -- 
> Ville Syrjälä
> Intel