[PATCH v1] drm/ci: enable lockdep detection
Vignesh Raman
vignesh.raman at collabora.com
Fri Sep 13 12:19:02 UTC 2024
Hi Rob, Helen,
On 14/08/24 23:11, Rob Clark wrote:
> On Wed, Aug 14, 2024 at 2:42 AM Vignesh Raman
> <vignesh.raman at collabora.com> wrote:
>>
>> Hi Helen,
>>
>> On 14/08/24 01:44, Helen Mae Koike Fornazier wrote:
>>>
>>>
>>>
>>>
>>> ---- On Tue, 13 Aug 2024 02:26:48 -0300 Vignesh Raman wrote ---
>>>
>>> > Hi Helen,
>>> >
>>> > On 13/08/24 01:47, Helen Mae Koike Fornazier wrote:
>>> > >
>>> > > Hi Vignesh,
>>> > >
>>> > > Thanks for your patch.
>>> > >
>>> > >
>>> > > ---- On Mon, 12 Aug 2024 08:20:28 -0300 Vignesh Raman wrote ---
>>> > >
>>> > > > We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
>>> > > > This will output warnings when kernel locking errors are encountered
>>> > > > and will continue executing tests. To detect if lockdep has been
>>> > > > triggered, check the debug_locks value in /proc/lockdep_stats after
>>> > > > the tests have run. When debug_locks is 0, it indicates that lockdep
>>> > > > has detected issues and turned itself off. So check this value and
>>> > > > exit with an error if lockdep is detected.
>>> > >
>>> > > Should we exit with an error? Or with a warning? (GitLab-CI supports that).
>>> > > Well, I guess it is serious enough.
>>> >
>>> > I think we can exit with an error since we check the status at the end
>>> > of the tests.
>>>
>>> I mean, we can exit with a specific error and configure this specific error in gitlab-ci to be a warning,
>>> so the job will be yellow and not red.
>>>
>>> But maybe the lockdep issue should be a strong error.
>>
>> Yes, agreed. We can exit with an error for a lockdep issue instead of a warning.
>
> I think that is too strong, lockdep can warn about things which can
> never happen in practice. (We've never completely solved some of the
> things that lockdep complains about in runpm vs shrinker reclaim.)
>
> Surfacing it as a warning is fine.
I will send another patch that exits with an error when lockdep is
detected and configures that error as a warning in GitLab CI.
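
Roughly, the check could look like the sketch below. This is only an
illustration, not the final patch: the exit code value (97) and the
function names are placeholders I made up here, and the stats file
path is passed as an argument only so the logic can be tried outside
of a lockdep-enabled kernel.

```shell
#!/bin/sh
# Hypothetical exit code reserved for "lockdep triggered" (value is an assumption).
LOCKDEP_EXIT_CODE=97

# Print the debug_locks value from a lockdep_stats file.
# Defaults to /proc/lockdep_stats when no path is given.
lockdep_debug_locks() {
    awk '/debug_locks:/ { print $2; exit }' "${1:-/proc/lockdep_stats}" 2>/dev/null
}

# Return LOCKDEP_EXIT_CODE if lockdep has turned itself off, 0 otherwise.
check_lockdep() {
    stats_file="${1:-/proc/lockdep_stats}"
    # No stats file means lockdep is not built in; nothing to check.
    [ -f "$stats_file" ] || return 0
    debug_locks=$(lockdep_debug_locks "$stats_file")
    # String comparison avoids a shell error if the value is empty.
    if [ "$debug_locks" = "0" ]; then
        echo "LOCKDEP issue detected. Please check dmesg logs for more information."
        cat "$stats_file"
        return "$LOCKDEP_EXIT_CODE"
    fi
    return 0
}
```

In igt_runner.sh this would be called once after the tests, for
example "check_lockdep || ret=$?". On the GitLab CI side, the
allow_failure:exit_codes keyword can then list that code so the job
shows as a warning instead of a failure, assuming the exit code is
propagated unchanged from the device under test to the job.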
Regards,
Vignesh
>
> BR,
> -R
>
>>>
>>> >
>>> > >
>>> > > Should we also track on the xfail folder? So we can annotate those errors as well?
>>> >
>>> > Do you mean reporting this error in expectation files?
>>>
>>> I wonder if there will be cases where we are getting this error and we should ignore it, so in the code
>>> we should check the xfail files to see if we should exit with an error or ignore it.
>>>
>>> For instance, if we have a case where we are having this error, and it is flaky, we might want to add it
>>> to the flakes file list.
>>>
>>> Maybe this is not the case, I'm just wondering.
>>
>>
>> The tests are passing but the log shows a lockdep warning
>> (https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711).
>>
>> Moreover, once a lockdep warning is emitted, lockdep stops running, so
>> there is no need to check for this warning after each test.
>> So I added the check at the end of the tests.
>>
>>>
>>>
>>> >
>>> > > Did you have an entire pipeline with this? To see if everything is still green?
>>> >
>>> > Yes. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/62177711
>>> >
>>> > This is a test branch in which I reverted a fix for the lockdep issue.
>>> > We see 'WARNING: bad unlock balance detected!' in logs and pipeline is
>>> > still green.
>>>
>>> But with your patch, it would be red, right?
>>
>> Yes it would fail and the pipeline will be red.
>>
>>> With the current patch, is the pipeline still all green?
>>
>> With this current patch, it will fail.
>> Pipeline link to show lockdep_stats before and after tests,
>> https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>>
>> Regards,
>> Vignesh
>>
>>>
>>> Regards,
>>> Helen
>>>
>>> >
>>> > Regards,
>>> > Vignesh
>>> >
>>> > >
>>> > > Helen
>>> > >
>>> > > >
>>> > > > Signed-off-by: Vignesh Raman <vignesh.raman at collabora.com>
>>> > > > ---
>>> > > >
>>> > > > v1:
>>> > > > - Pipeline link to show lockdep_stats before and after tests,
>>> > > > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1246721
>>> > > >
>>> > > > ---
>>> > > > drivers/gpu/drm/ci/igt_runner.sh | 11 +++++++++++
>>> > > > 1 file changed, 11 insertions(+)
>>> > > >
>>> > > > diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
>>> > > > index f38836ec837c..d2c043cd8c6a 100755
>>> > > > --- a/drivers/gpu/drm/ci/igt_runner.sh
>>> > > > +++ b/drivers/gpu/drm/ci/igt_runner.sh
>>> > > > @@ -85,6 +85,17 @@ deqp-runner junit \
>>> > > > --limit 50 \
>>> > > > --template "See https://$CI_PROJECT_ROOT_NAMESPACE.pages.freedesktop.org/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/results/{{testcase}}.xml"
>>> > > >
>>> > > > +# Check if /proc/lockdep_stats exists
>>> > > > +if [ -f /proc/lockdep_stats ]; then
>>> > > > + # debug_locks is 0 when lockdep has detected an issue and turned itself off.
>>> > > > + debug_locks=$(grep 'debug_locks:' /proc/lockdep_stats | awk '{print $2}')
>>> > > > + if [ "$debug_locks" -eq 0 ]; then
>>> > > > + echo "LOCKDEP issue detected. Please check dmesg logs for more information."
>>> > > > + cat /proc/lockdep_stats
>>> > > > + ret=1
>>> > > > + fi
>>> > > > +fi
>>> > > > +
>>> > > > # Store the results also in the simpler format used by the runner in ChromeOS CI
>>> > > > #sed -r 's/(dmesg-warn|pass)/success/g' /results/results.txt > /results/results_simple.txt
>>> > > >
>>> > > > --
>>> > > > 2.43.0
>>> > > >
>>> > > >
>>> >