<div dir="ltr">Thanks Norbert for the detailed reply. Some comments below.<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 26, 2015 at 5:28 PM, Norbert Thiebaud <span dir="ltr"><<a href="mailto:nthiebaud@gmail.com" target="_blank">nthiebaud@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Mon, Oct 26, 2015 at 2:56 PM, Ashod Nakashian <<a href="mailto:ashnakash@gmail.com">ashnakash@gmail.com</a>> wrote:<br> > On Mon, Oct 26, 2015 at 2:21 PM, Norbert Thiebaud <<a href="mailto:nthiebaud@gmail.com">nthiebaud@gmail.com</a>><br> > wrote:<br> >><br> >> On Mon, Oct 26, 2015 at 1:00 PM, Ashod Nakashian <<a href="mailto:ashnakash@gmail.com">ashnakash@gmail.com</a>><br> >> wrote:<br> >> > On Mon, Oct 26, 2015 at 1:35 PM, Norbert Thiebaud <<a href="mailto:nthiebaud@gmail.com">nthiebaud@gmail.com</a>><br> >> > wrote:<br> >> >><br> >> >> On Mon, Oct 26, 2015 at 12:14 PM, Ashod Nakashian <<a href="mailto:ashnakash@gmail.com">ashnakash@gmail.com</a>><br> >> >> wrote:<br> >> >> > OSL provides atomic helpers (osl_atomic_xxx) in the form of a GNU<br> >> >> > builtin<br> >> >> > (where available) or a platform-specific implementation.<br> >> >> ><br> >> >> > Any reason for not using modern std::atomic (besides possible lack of<br> >> >> > volunteers) ?<br> >> >> ><br> >> >> ><br> >> >> > As a transitional phase, we can maintain the same interface but with<br> >> >> > std:atomic as the implementation.<br> >> >> ><br> >> >> > Thoughts?<br> >> >><br> >> >> osl atomic are c interface, used in c-source...<br> >> >><br> >> > Thanks. Is there equivalent used in C++ ? (osl atomics only work for<br> >> > sal_Int32 values, which is another potential issue for 64-bit<br> >> > portability.)<br> >><br> >> the c++ code use these too.<br> ><br> ><br> > Would there be support for using std::atomic in C++ code?<br> ><br> > There is a case to be made in terms of performance if nothing else (in some<br> > scenarios they are hotspots, according to my profiler).<br> <br> </div></div>I seriously doubt that std:: will improve the performance over<br> __sync_add_and_fetch((p), 1)<br> and<br> __sync_sub_and_fetch((p), 1)<br> <br></blockquote><div>Agreed, it will not. But on non-gcc it will.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> and fro windows, the only real gain would be to move the<br> implementation in include/osl<br> with<br> #define osl_atomic_increment(p) InterlockedIncrement(pCount)<br> and<br> #define osl_atomic_decrement(p) InterlockedDecrement(pCount)<br> <br> that will give you most it not all of the gain.<br></blockquote><div><br></div><div>Unfortunately, on Windows, and unlike gcc, the overhead is significant.</div><div>Ideally, the code would generate a single `lock xadd` instruction.</div><div>Currently the overhead of dispatching via function calls is very large compared to this single instruction at the heart of the call.</div><div> </div><div>Your suggestion is a good starting point and makes the need for std::atomic less.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <br> (Note: I did not mess with Windows back then when I did that for gcc,<br> as I was not in a position to test it properly,<br> nor did I have the inclination to mess with Windows in general, as<br> long as I can avoid it.<br> but really that should be fairly easy to do)<br> <span class=""><br> ><br> >><br> >><br> >> relying on atomic on 64 bits is going to be a problem as long as we<br> >> support 32 bits OS.<br> ><br> ><br> > I believe most modern hardware support atomic operations on wide words (i.e.<br> > 64-bit even when running in 32-bit mode).<br> <br> </span>yes, but bear in mode that we had to rollback patches that required<br> SSE2 on windows...<br> The hardware baseline is quite old...<br></blockquote><div><br></div><div>I understand the concern.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> In any case see below, osl only provide atomic increment/decrement for<br> sal_uInt32 explicitly.<br> If there is a need for atomic over another type of data, that will<br> involve something else than sal/osl<br></blockquote><div><br></div><div>This is less of a concern. Agreed.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <br> Note also that the osl API is a published API... that is why, although<br> internally on gcc/clang platfrom we use<br> the built-in directly via macro, the entry point in osl is maintained<br> for API compatibility purpose.<br></blockquote><div><br></div><div>Understood.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""> ><br> >><br> >> and mostly these atomic are used to ref-count... and there is really<br> >> no reasonable need to have 64 bits ref-count is there ?<br> ><br> ><br> > True for ref-counting. Not so for compare-exchange obviously (but I don't<br> > know if these are used and how much).<br> <br> </span>osl does not implement/expose any compare-and-swap api AFAIK.<br> And honestly considering the horror of your locking 'model', having<br> such CAS api would be pretty silly.<br></blockquote><div><br></div><div>Not sure which horror you are referring to (surely you meant 'our', for the collective codebase).</div><div>I'm only suggesting to improve locking API, not change any specific thread synchronization code at all.</div><div> </div><div>I'll run some test with </div><div>#define osl_atomic_increment(p) InterlockedIncrement(pCount)<br>and<br>#define osl_atomic_decrement(p) InterlockedDecrement(pCount)<br></div><div><br></div><div>on Windows and if it improves things in the profiler, I'll submit a patch for consideration.</div><div>I think it's a trivial change that can have some improvement in performance (however small) without much risk.</div><div><br></div><div>Thanks again, appreciate the exchange.</div></div><br></div></div>