Modernizing atomics

Mon Oct 26 15:30:29 PDT 2015

Thanks Norbert for the detailed reply. Some comments below.

On Mon, Oct 26, 2015 at 5:28 PM, Norbert Thiebaud <nthiebaud at gmail.com>
wrote:

> On Mon, Oct 26, 2015 at 2:56 PM, Ashod Nakashian <ashnakash at gmail.com>
> wrote:
> > On Mon, Oct 26, 2015 at 2:21 PM, Norbert Thiebaud <nthiebaud at gmail.com>
> > wrote:
> >>
> >> On Mon, Oct 26, 2015 at 1:00 PM, Ashod Nakashian <ashnakash at gmail.com>
> >> wrote:
> >> > On Mon, Oct 26, 2015 at 1:35 PM, Norbert Thiebaud <
> nthiebaud at gmail.com>
> >> > wrote:
> >> >>
> >> >> On Mon, Oct 26, 2015 at 12:14 PM, Ashod Nakashian <
> ashnakash at gmail.com>
> >> >> wrote:
> >> >> > OSL provides atomic helpers (osl_atomic_xxx) in the form of a GNU
> >> >> > builtin
> >> >> > (where available) or a platform-specific implementation.
> >> >> >
> >> >> > Any reason for not using modern std::atomic (besides possible lack
> of
> >> >> > volunteers) ?
> >> >> >
> >> >> >
> >> >> > As a transitional phase, we can maintain the same interface but
> with
> >> >> > std::atomic as the implementation.
> >> >> >
> >> >> > Thoughts?
> >> >>
> >> >> osl atomic are c interface, used in c-source...
> >> >>
> >> > Thanks. Is there equivalent used in C++ ? (osl atomics only work for
> >> > sal_Int32 values, which is another potential issue for 64-bit
> >> > portability.)
> >>
> >> the c++ code use these too.
> >
> >
> > Would there be support for using std::atomic in C++ code?
> >
> > There is a case to be made in terms of performance if nothing else (in
> some
> > scenarios they are hotspots, according to my profiler).
>
> I seriously doubt that std:: will improve the performance over
> __sync_add_and_fetch((p), 1)
> and
> __sync_sub_and_fetch((p), 1)
>

Agreed, it will not. But on non-gcc it will.

> and for Windows, the only real gain would be to move the
> implementation in include/osl
> with
> #define osl_atomic_increment(p) InterlockedIncrement(p)
> and
> #define osl_atomic_decrement(p) InterlockedDecrement(p)
>
> that will give you most if not all of the gain.
>

Unfortunately, on Windows, and unlike gcc, the overhead is significant.
Ideally, the code would generate a single `lock xadd` instruction.
Currently the overhead of dispatching via function calls is very large
compared to this single instruction at the heart of the call.

Your suggestion is a good starting point and reduces the need for
std::atomic.
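
For illustration only, here is a minimal sketch of what a std::atomic-based
reference count could look like in C++ code. The names RefCount, acquire,
release and get are hypothetical and not part of sal/osl. On x86 I would
expect a compiler to inline fetch_add into a single `lock xadd`, but that
depends on the toolchain and flags, so treat it as an assumption to verify
in the profiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not part of sal/osl.
class RefCount {
public:
    explicit RefCount(std::int32_t initial = 0) : m_count(initial) {}

    // Increments the counter and returns the NEW value, matching the
    // semantics of __sync_add_and_fetch((p), 1).
    std::int32_t acquire() {
        return m_count.fetch_add(1, std::memory_order_relaxed) + 1;
    }

    // Decrements the counter and returns the NEW value, matching the
    // semantics of __sync_sub_and_fetch((p), 1).
    std::int32_t release() {
        std::int32_t previous = m_count.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0 && "release() called more often than acquire()");
        return previous - 1;
    }

    // Reads the current value; only meaningful as a snapshot.
    std::int32_t get() const {
        return m_count.load(std::memory_order_acquire);
    }

private:
    std::atomic<std::int32_t> m_count;
};
```

A caller would hold a RefCount member, call acquire() when taking a
reference, and destroy the object when release() returns 0.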

> (Note: I did not mess with Windows back then when I did that for gcc,
> as I was not in a position to test it properly,
> nor did I have the inclination to mess with Windows in general, as
> long as I can avoid it.
> but really that should be fairly easy to do)
>
> >
> >>
> >>
> >> relying on atomic on 64 bits is going to be a problem as long as we
> >> support 32 bits OS.
> >
> >
> > I believe most modern hardware support atomic operations on wide words
> (i.e.
> > 64-bit even when running in 32-bit mode).
>
> yes, but bear in mind that we had to roll back patches that required
> SSE2 on windows...
> The hardware baseline is quite old...
>

I understand the concern.

> In any case see below, osl only provides atomic increment/decrement for
> sal_uInt32 explicitly.
> If there is a need for atomic over another type of data, that will
> involve something else than sal/osl
>

This is less of a concern. Agreed.

>
> Note also that the osl API is a published API... that is why, although
> internally on gcc/clang platforms we use
> the built-in directly via macro, the entry point in osl is maintained
> for API compatibility purpose.
>

Understood.

> >
> >>
> >> and mostly these atomic are used to ref-count... and there is really
> >> no reasonable need to have 64 bits ref-count is there ?
> >
> >
> > True for ref-counting. Not so for compare-exchange obviously (but I don't
> > know if these are used and how much).
>
> osl does not implement/expose any compare-and-swap api AFAIK.
> And honestly considering the horror of your locking 'model', having
> such CAS api would be pretty silly.
>

Not sure which horror you are referring to (surely you meant 'our', for the
collective codebase).
I'm only suggesting that we improve the locking API, not that we change any
specific thread synchronization code.

I'll run some tests with
#define osl_atomic_increment(p) InterlockedIncrement(p)
and
#define osl_atomic_decrement(p) InterlockedDecrement(p)

on Windows and if it improves things in the profiler, I'll submit a patch
for consideration.
I think it's a trivial change that can have some improvement in performance
(however small) without much risk.
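
To make the idea concrete, a rough sketch of the header-level macros might
look like the following. The sketch_* names and the sketch_count_t typedef
are placeholders I made up for this mail, not the actual osl names or
types; the real change would keep the published osl entry points. The
Windows branch assumes the counter is compatible with LONG, which is what
InterlockedIncrement and InterlockedDecrement operate on.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#  include <windows.h>
// InterlockedIncrement/InterlockedDecrement take a LONG volatile* and
// return the new value.
typedef LONG sketch_count_t;
#  define sketch_atomic_increment(p) InterlockedIncrement(p)
#  define sketch_atomic_decrement(p) InterlockedDecrement(p)
#else
// gcc/clang builtins, as already quoted earlier in this thread; they also
// return the new value.
typedef std::int32_t sketch_count_t;
#  define sketch_atomic_increment(p) __sync_add_and_fetch((p), 1)
#  define sketch_atomic_decrement(p) __sync_sub_and_fetch((p), 1)
#endif

// Small usage example: increments and then decrements a local counter,
// returning the final value (equal to 'start').
inline sketch_count_t sketch_roundtrip(sketch_count_t start) {
    sketch_count_t count = start;
    sketch_count_t afterIncrement = sketch_atomic_increment(&count);
    assert(afterIncrement == start + 1);
    (void)afterIncrement; // silence unused warning when NDEBUG is set
    return sketch_atomic_decrement(&count);
}
```

Because the macros expand at the call site, no function call into the sal
library is involved, which is where I expect any gain to come from.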

Thanks again, appreciate the exchange.