o3tl::make_unsigned

Thu Jan 30 10:39:01 UTC 2020

On 29/01/2020 17:14, Luboš Luňák wrote:
> On Wednesday 29 of January 2020, Stephan Bergmann wrote:
>> On 29/01/2020 15:20, Luboš Luňák wrote:
>>>    Which is the assumption. Using o3tl::make_signed would not require such
>>> an assumption, or it would be even less likely false.
>>
>> But a precondition-free o3tl::make_signed would need to map an unsigned
>> type to a wider signed type, which need not exist.
> 
>   That wider type would need to be larger than 63 bits, which is a value so
> large that it's extremely unlikely we'd need it in practice any time soon. I
> realize I'm now in the territory of "640K ought to be enough", but seriously,
> in which realistic scenario is a precise representation of
> 9223372036854775807 insufficient but 18446744073709551615 will still do?
> 
>   This part is about the meaning of the highest bit, and what I'm saying here
> is that in signed/unsigned comparisons you're still more likely in practice
> to encounter the highest bit set in a signed type than in unsigned. People
> are still more likely to mess up the >=0 assumption than compare with an
> unsigned value that has the highest bit set.
> 
>> The "if large enough" is the hard part.
> 
>   Only theoretically.
> 
>> Why resort to code that likely works, when we can write code that is
>> guaranteed to work?
> 
>   Exactly my point. It's just that you seem to find it guaranteed that people
> won't mess up range checks and only likely there won't be titanically huge
> files/allocations/containers, and I see it the other way around. So far I've
> definitely seen somebody get >=0 wrong more often than I've seen 8 exabytes
> of anything.

My point is that, for e1 of signed type S1 (where U1 is the unsigned 
counterpart) and e2 of unsigned type U2 (where S2 is the signed 
counterpart),

   e1 < 0 || U1(e1) < e2  // (*)

is guaranteed to work for all types S1 and U2 and all values of e1 and 
e2, while

   e1 < S2(e2)

is not.  My point has nothing to do with people writing broken code, or 
how to prevent them from doing so.
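For illustration, here is a minimal C++17 sketch of the two comparisons above. The counterpart types are derived with std::make_unsigned_t and std::make_signed_t. The names less_via_unsigned and less_via_signed are made up for this example. The counterexample assumes the modular unsigned-to-signed conversion that mainstream compilers implement and that C++20 guarantees.

```cpp
#include <limits>
#include <type_traits>

// Idiom (*): e1 < 0 || U1(e1) < e2, with U1 the unsigned counterpart of S1.
// Correct for all signed S1, unsigned U2 and all values of e1 and e2.
template <typename S1, typename U2>
constexpr bool less_via_unsigned(S1 e1, U2 e2)
{
    static_assert(std::is_signed_v<S1> && std::is_unsigned_v<U2>);
    using U1 = std::make_unsigned_t<S1>;
    return e1 < 0 || U1(e1) < e2;  // U1(e1) is only evaluated when e1 >= 0
}

// Alternative e1 < S2(e2), with S2 the signed counterpart of U2.
// Gives wrong answers once e2 exceeds std::numeric_limits<S2>::max().
template <typename S1, typename U2>
constexpr bool less_via_signed(S1 e1, U2 e2)
{
    static_assert(std::is_signed_v<S1> && std::is_unsigned_v<U2>);
    using S2 = std::make_signed_t<U2>;
    return e1 < S2(e2);
}

// 0 < UINT_MAX is true, but S2(UINT_MAX) wraps to -1 under modular
// conversion, so the second form answers "false".
static_assert(less_via_unsigned(0, std::numeric_limits<unsigned>::max()));
static_assert(!less_via_signed(0, std::numeric_limits<unsigned>::max()));
```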

It is just that for the task "compare a signed e1 against an unsigned 
e2", (*) is the tool I at least reach for (naturally; without much of a 
second thought, actually).  And it has in fact been used all over the LO 
code base, and the newly introduced o3tl::make_unsigned merely helps 
write it in a better way (by not having to spell out U1).  This is 
orthogonal to the observation that signed APIs may be better than 
unsigned ones.
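The following sketch shows what a helper in the spirit of o3tl::make_unsigned might look like. It is a hypothetical stand-in, not LibreOffice's actual code in o3tl. It converts a signed value to its unsigned counterpart under the precondition value >= 0, so (*) can be written without spelling out U1.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for o3tl::make_unsigned (not the LibreOffice code):
// returns value as the unsigned counterpart of its signed type T.
// Precondition: value >= 0, checked with assert in debug builds.
template <typename T>
constexpr std::make_unsigned_t<T> make_unsigned(T value)
{
    static_assert(std::is_signed_v<T>, "make_unsigned expects a signed type");
    assert(value >= 0);
    return static_cast<std::make_unsigned_t<T>>(value);
}

// Idiom (*) with the helper: U1 is deduced rather than spelled out.
template <typename S, typename U>
constexpr bool less_signed_unsigned(S e1, U e2)
{
    static_assert(std::is_signed_v<S> && std::is_unsigned_v<U>);
    // Short-circuiting guarantees make_unsigned is only called with e1 >= 0.
    return e1 < 0 || sketch::make_unsigned(e1) < e2;
}

} // namespace sketch
```

The precondition always holds at the call site inside (*), because the `e1 < 0` test runs first.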

>>>    For reference, both Java's and C#'s List classes use int for size and
>>> index types, and use long for file size and position types. Apparently it
>>> does the job.
>>
>> Sure.  But what we are faced with here are C/C++ APIs that use unsigned
>> types, and we have to interoperate with those.
> 
>   Sure. 'o3tl::make_signed(l.size())' wherever needed. Done. It'll generally
> need to go next to the place where we'd need to write the make_unsigned()
> variant anyway.
> 
>   My point was just that the highest bit in size_t is practically irrelevant.

C++20 will have std::ssize for containers (see 
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1227r2.html> 
"P1227: Signed ssize() functions, unsigned size() functions (Revision 
2)").  Using it would probably help remove a large chunk of 
signed/unsigned mixture in existing LO code.
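For code that cannot use C++20 yet, something close to std::ssize (declared in <iterator> from C++20 on) can be sketched as below. The names sketch::ssize and sketch::valid_index are hypothetical. The return type follows the common_type_t<ptrdiff_t, make_signed_t<...>> rule that P1227 describes, as far as I understand it.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

namespace sketch {

// Pre-C++20 stand-in for std::ssize: the container's size as a signed value.
template <typename C>
constexpr auto ssize(const C& c)
    -> std::common_type_t<std::ptrdiff_t, std::make_signed_t<decltype(c.size())>>
{
    using R = std::common_type_t<std::ptrdiff_t, std::make_signed_t<decltype(c.size())>>;
    return static_cast<R>(c.size());
}

// With a signed size, a signed (possibly negative) index is range-checked
// without any signed/unsigned mixture.
template <typename C>
constexpr bool valid_index(const C& c, int i)
{
    return i >= 0 && i < sketch::ssize(c);
}

} // namespace sketch

// std::array allows compile-time checks; std::vector works the same at run time.
static_assert(sketch::valid_index(std::array<int, 3>{}, 2));
static_assert(!sketch::valid_index(std::array<int, 3>{}, -1));
```

The calls are qualified with `sketch::` on purpose. Under C++20, an unqualified `ssize(v)` on a std container would also find std::ssize through argument-dependent lookup and become ambiguous.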

(I'm fine with using it, as, sure, for containers it is clear that 
restricting maximum size to no larger than size_t/2 is feasible.  What I 
dislike is a helper function mapping from an arbitrary unsigned type to 
its signed counterpart pretending to be a total function.)
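To make the "not a total function" point concrete, here is a hypothetical make_signed of the kind proposed in this thread. The name is assumed for illustration and is not an existing o3tl function. It maps an unsigned type to the signed type of the same rank and states its precondition explicitly.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

namespace sketch {

// Hypothetical make_signed: maps an unsigned T to the signed type of the
// same rank. NOT a total function; precondition:
// value <= std::numeric_limits<std::make_signed_t<T>>::max().
template <typename T>
constexpr std::make_signed_t<T> make_signed(T value)
{
    static_assert(std::is_unsigned_v<T>, "make_signed expects an unsigned type");
    using S = std::make_signed_t<T>;
    // The signed maximum is non-negative, so converting it to T is exact.
    assert(value <= static_cast<T>(std::numeric_limits<S>::max()));
    return static_cast<S>(value);
}

} // namespace sketch
```

A call such as `sketch::make_signed(v.size())` is only valid while the size stays at or below half of the unsigned range. Where that cannot be guaranteed statically, the caller needs the `<= std::numeric_limits<...>::max()` check discussed below.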

>> You mean, an o3tl::make_signed that maps from an unsigned type to the
>> signed type of the same rank?  What would its precondition be, require
>> that the given value is sufficiently small?  Typically not being able to
>> guarantee that statically, code would then need to first check for '<=
>> std::numeric_limits<T>::max()' before being able to call o3tl::make_signed?
> 
>   It's the same whether it's make_signed() or make_unsigned().
> 
>   Also, it seems to me that you make the mistake of assuming that using an
> unsigned type actually guarantees you anything (the "semantically makes
> sense" mistake I mentioned before). You can as easily "underflow" unsigned as
> you can overflow signed.
> 
>> I still fail to see how converting from unsigned to signed is generally
>> possible, let alone safe.
> 
>   I can write the same about the other direction. The difference is that
> signed->unsigned cuts off values that are realistic and unsigned->signed cuts
> off values that are pretty much unrealistic.
> 
>   What exactly is the base for your claim that signed->unsigned is better than
> unsigned->signed?

If you only use unsigned->signed, the equivalent of (*) would be 
something like

   e2 > std::numeric_limits<S1>::max() || e1 < S1(e2)

which is why I think signed->unsigned is a better building block.
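Written out as a C++17 sketch, the unsigned->signed formulation might look like this (less_guarded is an illustrative name). The first disjunct handles every e2 that S1 cannot represent, since in that case e1 < e2 holds trivially. Only afterwards is e2 converted to S1, and that conversion is then exact.

```cpp
#include <limits>
#include <type_traits>

// "e1 < e2" for signed e1 and unsigned e2, using only unsigned->signed:
// any e2 above std::numeric_limits<S1>::max() is greater than every e1,
// and otherwise S1(e2) is an exact conversion.
template <typename S1, typename U2>
constexpr bool less_guarded(S1 e1, U2 e2)
{
    static_assert(std::is_signed_v<S1> && std::is_unsigned_v<U2>);
    // The guard is itself a mixed signed/unsigned comparison; it gives the
    // right answer only because the signed maximum is non-negative.
    return e2 > std::numeric_limits<S1>::max() || e1 < S1(e2);
}
```

The guard still has to compare an unsigned value against a signed one before any conversion is safe. This is the extra step that (*) avoids.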