[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

Wed Feb 25 07:43:23 PST 2015

Op 25-02-15 om 16:28 schreef Patrick Baggett:
> On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst <
> maarten.lankhorst at ubuntu.com> wrote:
>
>> Op 25-02-15 om 16:04 schreef Patrick Baggett:
>>> On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst <
>>> maarten.lankhorst at ubuntu.com> wrote:
>>>
>>>> Op 25-02-15 om 15:11 schreef Emil Velikov:
>>>>> On 24 February 2015 at 09:01, Maarten Lankhorst
>>>>> <maarten.lankhorst at ubuntu.com> wrote:
>>>>>> Only add wrapped bo's and bo's that have been exported through flink
>> or
>>>> dma-buf.
>>>>>> This avoids a lock in the common case, and decreases traversal needed
>>>> for importing
>>>>>> a dma-buf or flink.
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at ubuntu.com>
>>>>>> ---
>>>>>>  nouveau/nouveau.c | 47
>> +++++++++++++++++++++++------------------------
>>>>>>  1 file changed, 23 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c
>>>>>> index 1c723b9..d411523 100644
>>>>>> --- a/nouveau/nouveau.c
>>>>>> +++ b/nouveau/nouveau.c
>>>>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo *bo)
>>>>>>         struct nouveau_bo_priv *nvbo = nouveau_bo(bo);
>>>>>>         struct drm_gem_close req = { bo->handle };
>>>>>>
>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>> -       if (nvbo->name) {
>>>>>> +       if (nvbo->head.next) {
>>>>>> +               pthread_mutex_lock(&nvdev->lock);
>>>>>>                 if (atomic_read(&nvbo->refcnt) == 0) {
>>>>>>                         DRMLISTDEL(&nvbo->head);
>>>>>>                         /*
>>>>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo *bo)
>>>>>>                 }
>>>>>>                 pthread_mutex_unlock(&nvdev->lock);
>>>>>>         } else {
>>>>>> -               DRMLISTDEL(&nvbo->head);
>>>>>> -               pthread_mutex_unlock(&nvdev->lock);
>>>>>>                 drmIoctl(bo->device->fd, DRM_IOCTL_GEM_CLOSE, &req);
>>>>>>         }
>>>>>>         if (bo->map)
>>>>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct nouveau_device *dev,
>> uint32_t
>>>> flags, uint32_t align,
>>>>>>                uint64_t size, union nouveau_bo_config *config,
>>>>>>                struct nouveau_bo **pbo)
>>>>>>  {
>>>>>> -       struct nouveau_device_priv *nvdev = nouveau_device(dev);
>>>>>>         struct nouveau_bo_priv *nvbo = calloc(1, sizeof(*nvbo));
>>>>>>         struct nouveau_bo *bo = &nvbo->base;
>>>>>>         int ret;
>>>>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct nouveau_device *dev,
>>>> uint32_t flags, uint32_t align,
>>>>>>                 return ret;
>>>>>>         }
>>>>>>
>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>> -       DRMLISTADD(&nvbo->head, &nvdev->bo_list);
>>>>>> -       pthread_mutex_unlock(&nvdev->lock);
>>>>>> -
>>>>>>         *pbo = bo;
>>>>>>         return 0;
>>>>>>  }
>>>>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct nouveau_device
>> *dev,
>>>> uint32_t handle,
>>>>>>         return -ENOMEM;
>>>>>>  }
>>>>>>
>>>>>> +static void
>>>>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo)
>>>>>> +{
>>>>>> +       if (!nvbo->head.next) {
>>>>>> +               struct nouveau_device_priv *nvdev =
>>>> nouveau_device(nvbo->base.device);
>>>>>> +               pthread_mutex_lock(&nvdev->lock);
>>>>>> +               if (!nvbo->head.next)
>>> I guess the bo_make_global call is not particularly sensitive, so
>>>> removing's fine with me.
>>>>
>>> I would be worried about the duplicated check. It seems like a "smart"
>>> compiler could cache the value of "nvbo->head.next" (unless marked as
>>> volatile), rendering the second if() useless. If the field is marked
>>> volatile, then of course, this does not apply.
>>>
>> In that case I would be worried about a compiler that ignores the acquire
>> semantics of pthread_mutex_lock..
>>
> This isn't about the acquire semantics here, because you're reading it on
> both sides of the lock. The compiler need not reorder the reads at all,
> merely just save the value of the first read. Unless the compiler knows
> that the value can change after the lock is acquired, it can simply cache
> the result. Consider the nearly identical transformation:
>
> const int condition = (!nvbo->head.next);
> if(condition){
>    pthread_mutex_lock(...);
>     if(condition){ //redundant, already can assert it is true
>         ...
>
So you're saying a compiler can optimize:

- statement with memory access
- read memory barrier
- statement with memory access

To drop the second statement? I would worry about your definition of memory barrier then..

~Maarten