[Intel-gfx] [DMC_BUGFIX_SKL_V2 1/5] drm/i915/skl: Added a check for the hardware status of csr fw before loading.

Daniel Vetter daniel at ffwll.ch
Thu Sep 10 07:45:28 PDT 2015


On Thu, Sep 10, 2015 at 01:58:54AM +0530, Animesh Manna wrote:
> 
> 
> On 9/2/2015 2:24 PM, Daniel Vetter wrote:
> >On Wed, Aug 26, 2015 at 07:40:54PM +0530, Animesh Manna wrote:
> >>
> >>On 8/26/2015 6:40 PM, Daniel Vetter wrote:
> >>>On Wed, Aug 26, 2015 at 01:36:05AM +0530, Animesh Manna wrote:
> >>>>Dmc will restore the csr program except DC9, cold boot,
> >>>>warm reset, PCI function level reset, and hibernate/suspend.
> >>>>
> >>>>intel_csr_load_program() function is used to load the firmware
> >>>>data from kernel memory to csr address space.
> >>>>
> >>>>All values of csr address space will be zero if it got reset and
> >>>>the first byte of csr program is always a non-zero if firmware
> >>>>is loaded successfuly. Based on hardware status will load the
> >>>>firmware.
> >>>>
> >>>>Without this condition check if we overwrite the firmware data the
> >>>>counters exposed for dc5/dc6 (help for debugging) will be nullified.
> >>Bacause of the above reason mentioned just above we need to block firmware loading again.
> >>So only WARN_ON will not help.
> >>
> >>
> >>>>v1: Initial version.
> >>>>
> >>>>v2: Based on review comments from Daniel,
> >>>>- Added a check to know hardware status and load the firmware if not loaded.
> >>>>
> >>>>Cc: Daniel Vetter <daniel.vetter at intel.com>
> >>>>Cc: Damien Lespiau <damien.lespiau at intel.com>
> >>>>Cc: Imre Deak <imre.deak at intel.com>
> >>>>Cc: Sunil Kamath <sunil.kamath at intel.com>
> >>>>Signed-off-by: Animesh Manna <animesh.manna at intel.com>
> >>>>Signed-off-by: Vathsala Nagaraju <vathsala.nagaraju at intel.com>
> >>>>---
> >>>>  drivers/gpu/drm/i915/intel_csr.c | 9 +++++++++
> >>>>  1 file changed, 9 insertions(+)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
> >>>>index ba1ae03..682cc26 100644
> >>>>--- a/drivers/gpu/drm/i915/intel_csr.c
> >>>>+++ b/drivers/gpu/drm/i915/intel_csr.c
> >>>>@@ -252,6 +252,15 @@ void intel_csr_load_program(struct drm_device *dev)
> >>>>  		return;
> >>>>  	}
> >>>>+	/*
> >>>>+	 * Dmc will restore the csr the program except DC9, cold boot,
> >>>>+	 * warm reset, PCI function level reset, and hibernate/suspend.
> >>>>+	 * This condition will help to check if csr address space is reset/
> >>>>+	 * not loaded.
> >>>>+	 */
> >>>Atm we call this from driver load and resume, which doesn seem to cover
> >>>all the cases you mention in the comment. Should this be a WARN_ON
> >>>instead? Or do we have troubles in our init sequence where we load too
> >>>many times?
> >>Yes, the above statement taken from bspec to describe about the special cases dmc will not restore the firmware.
> >>Agree, In our cases cold boot and hibernate/suspend mainly we need to load the firmware again, so in my
> >>second sentence I wanted to comment mainly regarding this condition check added for suspend-hibernate(reset)
> >>and cold boot(not loaded).
> >>
> >>Anyways the same api later can be used to load the firmware from anywhere, so my intention to check firmware loaded or not.
> >>If already loaded then not to overwrite the csr address space to maintain the dc5/dc6 counter value.
> >>
> >>Can the below comment more clear to you.
> >>
> >>	/*
> >>	 * Dmc will restore the csr the program except DC9, cold boot,
> >>	 * warm reset, PCI function level reset, and hibernate/suspend.
> >>	 * If firmware is restored by dmc then no need to load again which
> >>	 * will keep the dc5/dc6 counter exposed by firmware.
> >>	 */
> >>
> >>No issue in init sequence.
> >That seems to still cover all the callers of the function afaics - we do
> >pci resets over suspend resume unconditionally. So I still don't
> >understand where exactly we try to load the dmc firmware in i915.ko when
> >it's already loaded.
> 
> During resume intel_csr_load_program() will be called from
> intel_runtime_resume().
> 
> intel_runtime_resume()-> skl_resume_prepare()-> intel_csr_load_program()
> 
> During Pc10 entry testing I can see dmc is restoring back the firmware always,
> but as you mentioned pci-reset can happen unconditionally, but still then
> also during resume intel_runtime_resume() will be called and based on
> register read of csr-base-address firmware loading will happen.

But in your comment you're saying it won't get restored in case of dc9 and
suspend. So that seems to mismatch what you're saying here (and what the
commit message says) and what the code does. And this function here is
called for resume after suspend/hibernate only.

So I still see a mismatch between what you're saying here and what the
code does vs. what the comment above says. It would be clear if there's
just no comment. And we don't call intel_csr_load_program from random
places just for luck, if there's unecessary calls of that its kinda a bug
in the code (and we should just remove it).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list