[Intel-gfx] [PATCH v3] drm/i915: Remove unsafe i915.enable_rc6
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Wed Nov 1 16:09:47 UTC 2017
+ Kimmo and Paul
On Wed, 2017-11-01 at 07:43 -0700, Ben Widawsky wrote:
> On 17-11-01 14:07:28, Joonas Lahtinen wrote:
> > On Mon, 2017-10-30 at 10:48 -0700, Rodrigo Vivi wrote:
> > > On Mon, Oct 30, 2017 at 01:00:51PM +0000, David Weinehall wrote:
> > > > On Fri, Oct 27, 2017 at 01:57:09PM -0700, Daniele Ceraolo Spurio wrote:
> > > > >
> > > > >
> > > > > On 26/10/17 03:32, Chris Wilson wrote:
> > > > > > It has been many years since the last confirmed sighting (and fix) of an
> > > > > > RC6 related bug (usually a system hang). Remove the parameter to stop
> > > > > > users from setting dangerous values, as they often set it during triage
> > > > > > and end up disabling the entire runtime pm instead (the option is not a
> > > > > > fine scalpel!).
> > > > > >
> > > > > > Furthermore, it allows users to set known dangerous values which were
> > > > > > intended for testing and not for production use. For testing, we can
> > > > > > always patch in the required setting without having to expose ourselves
> > > > > > to random abuse.
> > > > > >
> > > > > > v2: Fixup NEEDS_WaRsDisableCoarsePowerGating fumble, and document the
> > > > > > lack of ilk support better.
> > > > > > v3: Clear intel_info->rc6p if we don't support rc6 itself.
> > > > > >
> > > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > > > Cc: Imre Deak <imre.deak at intel.com>
> > > > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > > > Acked-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > > > ---
> > > > >
> > > > > I think that for execution/debug on early silicon we might still want the
> > > > > ability to turn features like RC6 off. Maybe we can add a debug kconfig to
> > > > > force info->has_rc6 = 0? Not a blocker to this patch but worth considering
> > > > > IMO.
> > > >
> > > > Most of the BIOSes I've seen on our RVPs have had an option to disable
> > > > RC6.
> > >
> > > BIOS option don't block our code to run and set some MMIOs.
> > > Not sure how the GPU will behave on such cases.
> > >
> > > I like the idea of removing some and keeping the parameters clean.
> > > But there are few ones like RC6 and disable_power_wells that are very
> > > useful on platform enabling and also when assisting others to debug issues.
> > >
> > > For instance right now that we fixed RC6 on CNL someone told that
> > > he believe seeing more hangs, so I immediately asked to boot with
> > > i915.enable_rc6=0 to double check. It is easier and straighforward
> > > to direct them to the unsafe param than to ask them to compile the code
> > > with different options or to use some BIOS options that we are not sure.
> > >
> > > Also on bug triage some options like this are helpful.
> > >
> > > Also BIOS and compile are saved flags. So if you need to do a quick test
> > > you have to save it, and then unsave later. Parameters are very convinient
> > > for 1 boot only check.
> >
> > It's convenient for sure, but the unsafe module parameters seems to be
> > finding their way into way too many HOWTOs, and from there to some
> > "productized" use-cases. Chris states that setting .enable_rc6=0 to
> > solving an issue on publicly shipping products has been some years ago,
> > so I don't see a need for carrying this.
> >
> > We shouldn't allow the convenience of not having to change one line and
> > recompile kernel during development to affect the end-users who are
> > Googling how to get the best performance out of their hardware (I could
> > mention some distro here).
> >
> > This seems the like the best option as I don't think introducing kernel
> > parameters that only exists on debug builds would be too convenient
> > either. It'd maybe just add more confusion.
> >
> > Regards, Joonas
>
> I believe the ability to disable RC6 is valuable not just for debugging
> purposes. Folks with very latency sensitive workloads are often willing to
> forego power savings. The real problem I see is that we don't test without rc6
> in our setup, which indeed makes it unsafe. As such, I see the other option here
> going back to the ability to toggle rc6 after load (either module parameter, or
> make it sysfs), and actually run some subset of our workloads with RC6. I
> suspect people will poop on that suggestion, but I figured I'd mention.
I agree there, but by my understanding there's really no ask to support
the feature in upstream. And the original motive from Chris to drop the
feature is that it's unsafe as it currently is.
So unless we've got the resources to bring it back from the unsafe
zone, I think we should drop it like this patch proposes.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
More information about the Intel-gfx
mailing list