Long radeon stalls on recent kernels

Andy Lutomirski luto at amacapital.net
Wed Nov 19 16:07:45 PST 2014


On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer <michel at daenzer.net> wrote:
> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>
>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>
>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>> graphics intensive seems to cause my system to become unusable for
>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>> it can take several minutes for me to move my mouse far enough to
>>>> close the tab and get my computer back.
>>>>
>>>> On bootup, I get this warning:
>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>> rv770_restrict_performance_levels_before_switch failed
>>>>
>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>> giving my rather slow graphics.
>>>>
>>>> Are there known issues here?
>>>
>>>
>>>
>>> Can you bisect the kernel, or at least isolate which kernel version first
>>> introduced the problem?
>>
>>
>> With whatever userspace I'm running, I'm seeing it 3.13, 3.14, 3.15,
>> 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>
>> With radeon.dpm=0, I can still trigger short stalls (around one
>> second), but I seem unable to trigger long stalls easily.  (I say
>> easily because, just as I was typing this email, my system stalled for
>> about a minute.)
>
>
> I can only think of two things offhand that could cause such extremely long
> stalls: Swap thrashing or IRQ storms.
>
> With a setup where you can easily trigger long stalls, can you try getting a
> CPU profile for a stall with sysprof or perf?
>
>

Got one with perf:

  16.82%             Xorg  libc-2.18.so                        [.]
__memcpy_sse2_unaligned
   9.20%          swapper  [kernel.kallsyms]                   [k] intel_idle
   1.00%             Xorg  [kernel.kallsyms]                   [k]
evergreen_irq_set
   0.83%          firefox  libxul.so                           [.]
0x0000000001d93281
   0.69%          firefox  libxul.so                           [.]
0x0000000001d932ad
   0.62%          firefox  [kernel.kallsyms]                   [k]
copy_user_generic_string
   0.55%          swapper  [kernel.kallsyms]                   [k]
evergreen_irq_ack
   0.54%          firefox  libpthread-2.18.so                  [.]
pthread_mutex_lock
   0.52%          firefox  libpthread-2.18.so                  [.]
pthread_mutex_unlock
   0.45%             Xorg  [kernel.kallsyms]                   [k]
drm_mm_insert_node_in_range_generic
   0.41%             Xorg  [kernel.kallsyms]                   [k] lock_release
   0.40%             Xorg  [kernel.kallsyms]                   [k] lock_acquire
   0.35%          firefox  firefox                             [.]
0x000000000001245d
   0.33%             Xorg  [kernel.kallsyms]                   [k]
__module_address
   0.31%          firefox  [kernel.kallsyms]                   [k] clear_page_c
   0.29%             Xorg  [kernel.kallsyms]                   [k]
copy_user_generic_string
   0.28%          firefox  firefox                             [.]
0x0000000000013159

and:

Samples: 11K of event 'irq:irq_handler_entry', Event count (approx.): 11802
  87.43%          swapper  [kernel.kallsyms]  [k] handle_irq_event_percpu
   7.52%          firefox  [kernel.kallsyms]  [k] handle_irq_event_percpu
   1.84%      irq/36-ahci  [kernel.kallsyms]  [k] handle_irq_event_percpu
   1.14%             Xorg  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.75%      kworker/5:0  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.32%      gnome-shell  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.25%     kworker/5:1H  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.25%  Media D~ode #10  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.19%  ImageDe~er #330  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.07%       pulseaudio  [kernel.kallsyms]  [k] handle_irq_event_percpu

The cycles were with -e cycles:pp, so I think that iret would have
shown up if there were enough IRQs to cause the problem.

I'll build a kernel with latencytop.

--Andy


More information about the dri-devel mailing list