o3tl::make_unsigned

Thu Jan 30 12:30:18 UTC 2020

On 30/01/2020 12:56, Luboš Luňák wrote:
> On Thursday 30 of January 2020, Stephan Bergmann wrote:
>> On 29/01/2020 17:14, Luboš Luňák wrote:
>>>    Exactly my point. It's just that you seem to find it guaranteed that
>>> people won't mess up range checks and only likely there won't be
>>> titanically huge files/allocations/containers, and I see it the other way
>>> around. So far I've definitely seen more often somebody get >=0 wrong
>>> than I've seen 8 exabytes of anything.
>>
>> My point is that, for e1 of signed type S1 (where U1 is the unsigned
>> counterpart) and e2 of unsigned type U2 (where S2 is the signed
>> counterpart),
>>
>>     e1 < 0 || U1(e1) < e2  // (*)
>>
>> is guaranteed to work for all types S1 and U2 and all values of e1 and
>> e2, while
>>
>>     e1 < S2(e2)
>>
>> is not.  My point has nothing to do with people writing broken code, or
>> how to prevent them from doing so.
>>
>> It is just that for the task "compare a signed e1 against an unsigned
>> e2", (*) is the tool I at least reach for (naturally; without much of a
>> second thought, actually).  And it has in fact been used all over the LO
>> code base,
> 
>   This contradicts your original mail, where you stated that what is used
> is "if (sal_uInt32(e1) < e2) ...    // (B)" (i.e. without the <0 check). And

Maybe it is unclear, but my original mail was talking about a "correct" 
comparison

   e1 < e2

(i.e., where it is known that e1 >= 0, but where compilers might 
nevertheless emit a signed-vs.-unsigned warning).

> that's what people will often do, and there signed is the better choice. Your
> point is valid only if it assumes some ideal reality. In this reality, people
> mess up, and APIs should make it harder to write possibly problematic code,
> not promote it.

...which, again, is unrelated to my point.

>> and the newly introduced o3tl::make_unsigned merely helps
>> write it in a better way (by not having to spell out U1).  This is
>> orthogonal to the observation that signed APIs may be better than
>> unsigned ones.
> 
>   But it shouldn't be. If something may be better, then we should prefer moving
> towards it and not towards some local-maximum.

I do not argue against moving towards better APIs.

>> (I'm fine with using it, as, sure, for containers it is clear that
>> restricting maximum size to no larger than size_t/2 is feasible.  What I
>> dislike is a helper function mapping from an arbitrary unsigned type to
>> its signed counterpart pretending to be a total function.)
> 
>   It doesn't pretend any more than make_unsigned() does. It's just that it's
> harder to accidentally break its requirements in practice.

...while it's probably easier to write code that is guaranteed to comply 
with the make_unsigned preconditions.

(Also, I have no idea what the usage patterns for actual types S1 and U2 
are.  For usages with small U2, or e2 obtained from tainted sources, 
casting to S2 may well be an issue wrt preconditions.)

>> If you only use unsigned->signed, the equivalent of (*) would be
>> something like
>>
>>     e2 > std::numeric_limits<S1>::max() || e1 < S1(e2)
>>
>> which is why I think signed->unsigned is a better building block.
> 
>   But people will not consistently use (*), because it's more error-prone,
> longer to type and harder to read (yeah, we're all lazy). Using just "e1 < S1

I'm not sure about that.  As I wrote, I saw lots of occurrences of (*) 
across the LO code base.  (And, again, people writing broken code is 
something that o3tl::make_unsigned cannot and will not address.)

> (e2)" can assert in S1 and then the only disadvantage there I can see is if
> somebody actually somehow manages to trigger the assert => make_signed() is
> the better API.

I don't understand the above sentence; what do you mean with "can assert 
in S1"?