[PATCH 1/3] drm/amdgpu: fix a typo
Marek Olšák
maraeo at gmail.com
Thu Jun 22 23:54:15 UTC 2017
That's all nice, but does it improve performance? Have you been able
to measure some performance difference with that code? Were you
targeting a specific inefficiency you had seen e.g. with a CPU
profiler?
Marek
On Thu, Jun 22, 2017 at 8:19 PM, axie <axie at amd.com> wrote:
> To clarify, local IRQs are disabled by calling raw_local_irq_save(flags);
>
> The function __lock_acquire then double-checks that local IRQs are really disabled.
>
>
>
> On 2017-06-22 01:34 PM, axie wrote:
>>
>> Hi Marek,
>>
>> Spin lock and spin unlock are fast. But they are not so fast compared with an
>> atomic operation, which is a single CPU instruction on x86.
>>
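>> As a minimal sketch of the difference (hypothetical names, not the actual
>> amdgpu code): a counter bumped under a spinlock goes through all of the
>> machinery shown in points 1-3 below, while the atomic version is a single
>> locked instruction on x86.
>>
>> #include <linux/atomic.h>
>> #include <linux/spinlock.h>
>>
>> static DEFINE_SPINLOCK(demo_lock);              /* hypothetical */
>> static unsigned int demo_counter;
>> static atomic_t demo_atomic = ATOMIC_INIT(0);   /* hypothetical */
>>
>> static unsigned int bump_locked(void)
>> {
>>         unsigned int v;
>>
>>         spin_lock(&demo_lock);    /* preempt_disable() + (with lockdep) lock_acquire() */
>>         v = ++demo_counter;
>>         spin_unlock(&demo_lock);  /* lock_release() + preempt_enable() */
>>         return v;
>> }
>>
>> static unsigned int bump_atomic(void)
>> {
>>         /* a single "lock xadd" on x86; no preemption or lockdep bookkeeping */
>>         return atomic_inc_return(&demo_atomic);
>> }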
>>
>> 1. Spinlock does NOT allow preemption on the local CPU. Let us have a look at
>> how spin_lock is implemented.
>>
>> static inline void __raw_spin_lock(raw_spinlock_t *lock)
>> {
>>         preempt_disable();  <<<<<<-- this is a memory barrier operation too
>>         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>>         LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>> }
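>>
>> For reference, preempt_disable() itself expands to roughly the following
>> (include/linux/preempt.h, on kernels with CONFIG_PREEMPT_COUNT); note that
>> barrier() here is a compiler barrier:
>>
>> #define preempt_disable() \
>> do { \
>>         preempt_count_inc(); \
>>         barrier(); \
>> } while (0)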
>>
>> 2. The function __lock_acquire is called by spinlock (via lockdep). The
>> function is so long that I will not attach all of it here.
>>
>> There is an atomic operation inside, plus 12 metadata updates and 14 if
>> statements, and it calls quite a few other functions.
>>
>> Note that it requires local IRQs to be disabled...
>>
>> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                           int trylock, int read, int check, int hardirqs_off,
>>                           struct lockdep_map *nest_lock, unsigned long ip,
>>                           int references, int pin_count)
>> {
>>         struct task_struct *curr = current;
>>         struct lock_class *class = NULL;
>>         struct held_lock *hlock;
>>         unsigned int depth;
>>         int chain_head = 0;
>>         int class_idx;
>>         u64 chain_key;
>>
>>         if (unlikely(!debug_locks))
>>                 return 0;
>>
>>         /*
>>          * Lockdep should run with IRQs disabled, otherwise we could
>>          * get an interrupt which would want to take locks, which would
>>          * end up in lockdep and have you got a head-ache already?
>>          */
>>         if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))  <<<<<<< double-checks that IRQs are disabled
>>                 return 0;
>>
>>         ....
>>
>> 3. Another function called by spinlock, at a higher level:
>>
>> void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                   int trylock, int read, int check,
>>                   struct lockdep_map *nest_lock, unsigned long ip)
>> {
>>         unsigned long flags;
>>
>>         if (unlikely(current->lockdep_recursion))
>>                 return;
>>
>>         raw_local_irq_save(flags);
>>         check_flags(flags);
>>
>>         current->lockdep_recursion = 1;
>>         trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
>>         __lock_acquire(lock, subclass, trylock, read, check,
>>                        irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
>>         current->lockdep_recursion = 0;
>>         raw_local_irq_restore(flags);
>> }
>>
>>
>> Thanks,
>>
>> Alex Bin
>>
>>
>> On 2017-06-22 12:27 PM, Marek Olšák wrote:
>>>
>>> On Thu, Jun 22, 2017 at 5:33 PM, Xie, AlexBin <AlexBin.Xie at amd.com>
>>> wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>>
>>>> In fact, the change from spinlock to atomic is quite painful. When I
>>>> started, I thought it was easy, but later I found there might be race
>>>> conditions here and there. Now I think the change looks more robust. In the
>>>> kernel source, several other drivers use the same trick.
>>>>
>>>>
>>>> On the other hand, I think the logic itself might be optimized considering
>>>> the locking. But I have spent quite some effort to maintain the original
>>>> logic.
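>>>>
>>>> As a rough sketch of the kind of conversion being described (hypothetical
>>>> names, not the actual patch): state that used to be updated under a
>>>> spinlock can instead be published with atomic operations, e.g. an id taken
>>>> from an atomic counter and a one-shot claim done with atomic_cmpxchg():
>>>>
>>>> #include <linux/atomic.h>
>>>> #include <linux/types.h>
>>>>
>>>> static atomic64_t next_handle = ATOMIC64_INIT(0);   /* hypothetical */
>>>> static atomic_t in_use = ATOMIC_INIT(0);            /* hypothetical */
>>>>
>>>> static u64 alloc_handle(void)
>>>> {
>>>>         /* no lock needed: the atomic increment is the serialization point */
>>>>         return atomic64_inc_return(&next_handle);
>>>> }
>>>>
>>>> static bool try_claim(void)
>>>> {
>>>>         /* only the caller that flips 0 -> 1 wins; later callers see 1 */
>>>>         return atomic_cmpxchg(&in_use, 0, 1) == 0;
>>>> }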
>>>
>>> It seems quite complicated and I don't know if there is any
>>> performance benefit. Spinlocks are nice because they allow preemption.
>>>
>>> It would be more interesting to merge the CS and BO_LIST ioctls into one.
>>>
>>> Marek
>>
>>
>