o3tl::make_unsigned

Thu Jan 30 11:56:11 UTC 2020

On Thursday 30 of January 2020, Stephan Bergmann wrote:
> On 29/01/2020 17:14, Luboš Luňák wrote:
> >   Exactly my point. It's just that you seem to find it guaranteed that
> > people won't mess up range checks and only likely there won't be
> > titanically huge files/allocations/containers, and I see it the other way
> > around. So far I've definitely seen more often somebody get >=0 wrong
> > than I've seen 8 exabytes of anything.
>
> My point is that, for e1 of signed type S1 (where U1 is the unsigned
> counterpart) and e2 of unsigned type U2 (where S2 is the signed
> counterpart),
>
>    e1 < 0 || U1(e1) < e2  // (*)
>
> is guaranteed to work for all types S1 and U2 and all values of e1 and
> e2, while
>
>    e1 < S2(e2)
>
> is not.  My point has nothing to do with people writing broken code, or
> how to prevent them from doing so.
>
> It is just that for the task "compare a signed e1 against an unsigned
> e2", (*) is the tool I at least reach for (naturally; without much of a
> second thought, actually).  And it has in fact been used all over the LO
> code base,

 This contradicts your original mail, where you stated that what is used 
is "if (sal_uInt32(e1) < e2) ...    // (B)" (i.e. without the <0 check). And 
that's what people will often do, and there signed is the better choice. Your 
point is valid only if it assumes some ideal reality. In this reality, people 
mess up, and APIs should make it harder to write possibly problematic code, 
not promote it.

> and the newly introduced o3tl::make_unsigned merely helps 
> write it in a better way (by not having to spell out U1).  This is
> orthogonal to the observation that signed APIs may be better than
> unsigned ones.

 But it shouldn't be. If something may be better, then we should prefer moving 
towards it and not towards some local-maximum.

 And "used all over the LO code base" is a rather weak argument, given that 
the LO code base is full of stuff that is obsolete, poorly designed or 
otherwise poor practice. We also could have made the same argument with 
RTL_CONSTASCII_STRINGPARAM and yet I hope there's nobody who'd think it was a 
bad idea to reverse away from it instead of adding more of it.

 Similarly, the problem here is fundamentally caused by the fact that we use 
unsigned types at all (which is a problem fundamentally caused by the C 
promotion rules). The more we will add of those, the more likely we'll be to 
have these problems.

> C++20 will have ssize for containers (see
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1227r2.html>
> "P1227: Signed ssize() functions, unsigned size() functions (Revision
> 2)").  Using it would probably help remove a large chunk of
> signed/unsigned mixture in existing LO code.

 Ah, good.

> (I'm fine with using it, as, sure, for containers it is clear that
> restricting maximum size to no larger than size_t/2 is feasible.  What I
> dislike is a helper function mapping from an arbitrary unsigned type to
> its signed counterpart pretending to be a total function.)

 It doesn't pretend any more than make_unsigned() does. It's just that it's 
harder to accidentally break its requirements in practice.

> If you only use unsigned->signed, the equivalent of (*) would be
> something like
>
>    e2 > std::numeric_limits<S1>::max() || e1 < S1(e2)
>
> which is why I think signed->unsigned is a better building block.

 But people will not consistently use (*), because it's more error-prone, 
longer to type and harder to read (yeah, we're all lazy). Using just "e1 < S1
(e2)" can assert in S1 and then the only disadvantage there I can see is if 
somebody actually somehow manages to trigger the assert => make_signed() is 
the better API.

-- 
 Luboš Luňák
 l.lunak at collabora.com