o3tl::make_unsigned
Stephan Bergmann
sbergman at redhat.com
Thu Jan 30 10:39:01 UTC 2020
On 29/01/2020 17:14, Luboš Luňák wrote:
> On Wednesday 29 of January 2020, Stephan Bergmann wrote:
>> On 29/01/2020 15:20, Luboš Luňák wrote:
>>> Which is the assumption. Using o3tl::make_signed would not require such
>>> an assumption, or it would be even less likely false.
>>
>> But a precondition-free o3tl::make_signed would need to map an unsigned
>> type to a wider signed type, which need not exist.
>
> That wider type would need to be larger than 63bits, which is a value so
> large that it's extremely unlikely we'd need it in practice any time soon. I
> realize I'm now in the territory of "640K ought to be enough", but seriously,
> in which realistic scenario is a precise representation of
> 9223372036854775807 insufficient but 18446744073709551615 will still do?
>
> This part is about the meaning of the highest bit, and what I'm saying here
> is that in signed/unsigned comparisons you're still more likely in practice
> to encounter the highest bit set in a signed type than in unsigned. People
> are still more likely to mess up the >=0 assumption than compare with an
> unsigned value that has the highest bit set.
>
>> The "if large enough" is the hard part.
>
> Only theoretically.
>
>> Why resort to code that likely works, when we can write code that is
>> guaranteed to work?
>
> Exactly my point. It's just that you seem to find it guaranteed that people
> won't mess up range checks and only likely there won't be titanically huge
> files/allocations/containers, and I see it the other way around. So far I've
> definitely seen more often somebody get >=0 wrong than I've seen 8 exabytes
> of anything.
My point is that, for e1 of signed type S1 (where U1 is the unsigned
counterpart) and e2 of unsigned type U2 (where S2 is the signed
counterpart),
    e1 < 0 || U1(e1) < e2 // (*)
is guaranteed to work for all types S1 and U2 and all values of e1 and
e2, while
    e1 < S2(e2)
is not. My point has nothing to do with people writing broken code, or
how to prevent them from doing so.
It is just that for the task "compare a signed e1 against an unsigned
e2", (*) is the tool I at least reach for (naturally; without much of a
second thought, actually). And it has in fact been used all over the LO
code base, and the newly introduced o3tl::make_unsigned merely helps
write it in a better way (by not having to spell out U1). This is
orthogonal to the observation that signed APIs may be better than
unsigned ones.
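
To make the difference concrete, here is a minimal sketch of (*) next to the cast-to-signed variant. The names to_unsigned, less_signed_unsigned and less_via_signed_cast are made up for illustration; to_unsigned is only a stand-in for o3tl::make_unsigned under the assumption that it requires a non-negative argument, not a copy of the actual o3tl code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for o3tl::make_unsigned (assumed precondition:
// value >= 0). Converts to the unsigned counterpart of T.
template <typename T>
std::make_unsigned_t<T> to_unsigned(T value)
{
    static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                  "expects a signed integer type");
    assert(value >= 0);
    return static_cast<std::make_unsigned_t<T>>(value);
}

// (*): a negative e1 is less than any unsigned e2; otherwise both
// operands are unsigned and the comparison preserves their values.
template <typename S1, typename U2>
bool less_signed_unsigned(S1 e1, U2 e2)
{
    static_assert(std::is_integral_v<U2> && std::is_unsigned_v<U2>,
                  "expects an unsigned integer type");
    return e1 < 0 || to_unsigned(e1) < e2;
}

// The alternative e1 < S2(e2): gives the wrong answer whenever e2 does
// not fit into S2 (the conversion wraps on common implementations, and
// is defined to do so from C++20 on).
template <typename S1, typename U2>
bool less_via_signed_cast(S1 e1, U2 e2)
{
    return e1 < static_cast<std::make_signed_t<U2>>(e2);
}
```

For example, with e1 = 0 of type std::int32_t and e2 = 4294967295 of type std::uint32_t, less_signed_unsigned reports that e1 is smaller, while less_via_signed_cast compares 0 against -1 and reports the opposite.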
>>> For reference, both Java's and C#'s List classes use int for size and
>>> index types, and use long for file size and position types. Apparently it
>>> does the job.
>>
>> Sure. But what we are faced with here are C/C++ APIs that use unsigned
>> types, and we have to interoperate with those.
>
> Sure. 'o3tl::make_signed(l.size())' wherever needed. Done. It'll generally
> need to go next to the place where we'd need to write the make_unsigned()
> variant anyway.
>
> My point was just that the highest bit in size_t is practically irrelevant.
C++20 will have ssize for containers (see
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1227r2.html>
"P1227: Signed ssize() functions, unsigned size() functions (Revision
2)"). Using it would probably help remove a large chunk of
signed/unsigned mixture in existing LO code.
(I'm fine with using it, as, sure, for containers it is clear that
restricting maximum size to no larger than size_t/2 is feasible. What I
dislike is a helper function mapping from an arbitrary unsigned type to
its signed counterpart pretending to be a total function.)
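
As a rough illustration of what that would look like in container code, here is a C++17 sketch modeled on the return type proposed in P1227. The names signed_size and last_index_of are hypothetical; with C++20 one would presumably call std::ssize instead.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical C++17 approximation of the proposed ssize(): returns the
// container size as a signed type at least as wide as std::ptrdiff_t.
// Assumes the size fits into that type, which holds for real containers.
template <typename C>
constexpr auto signed_size(const C& c)
    -> std::common_type_t<std::ptrdiff_t,
                          std::make_signed_t<decltype(c.size())>>
{
    using R = std::common_type_t<std::ptrdiff_t,
                                 std::make_signed_t<decltype(c.size())>>;
    return static_cast<R>(c.size());
}

// Backwards search with a signed index: "i >= 0" is a meaningful loop
// condition here, which it would not be for an unsigned index.
inline std::ptrdiff_t last_index_of(const std::vector<int>& v, int x)
{
    for (auto i = signed_size(v) - 1; i >= 0; --i)
    {
        if (v[static_cast<std::size_t>(i)] == x)
            return i;
    }
    return -1; // not found
}
```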
>> You mean, an o3tl::make_signed that maps from an unsigned type to the
>> signed type of the same rank? What would its precondition be, require
>> that the given value is sufficiently small? Typically not being able to
>> guarantee that statically, code would then need to first check for '<=
>> std::numeric_limits<T>::max()' before being able to call o3tl::make_signed?
>
> It's the same whether it's make_signed() or make_unsigned().
>
> Also, it seems to me that you make the mistake of assuming that using an
> unsigned type actually guarantees you anything (the "semantically makes
> sense" mistake I mentioned before). You can as easily "underflow" unsigned as
> you can overflow signed.
>
>> I still fail to see how converting from unsigned to signed is generally
>> possible, let alone safe.
>
> I can write the same about the other direction. The difference is that
> signed->unsigned cuts off values that are realistic and unsigned->signed cuts
> off values that are pretty much unrealistic.
>
> What exactly is the base for your claim that signed->unsigned is better than
> unsigned->signed?
If you only use unsigned->signed, the equivalent of (*) would be
something like
    e2 > std::numeric_limits<S1>::max() || e1 < S1(e2)
which is why I think signed->unsigned is a better building block.
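
Spelled out as a sketch (less_via_range_check is a made-up name, not an o3tl function), the unsigned->signed form needs the extra range check up front before the cast is safe:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Compare signed e1 against unsigned e2 using only an unsigned->signed
// conversion. The first operand guards the cast: S1(e2) is evaluated
// only when e2 is known to fit into S1.
template <typename S1, typename U2>
bool less_via_range_check(S1 e1, U2 e2)
{
    static_assert(std::is_integral_v<S1> && std::is_signed_v<S1>,
                  "e1 must have a signed integer type");
    static_assert(std::is_integral_v<U2> && std::is_unsigned_v<U2>,
                  "e2 must have an unsigned integer type");
    // The guard is itself a mixed signed/unsigned comparison; it is
    // value-preserving only because max() is never negative.
    return e2 > std::numeric_limits<S1>::max()
        || e1 < static_cast<S1>(e2);
}
```

Note that the guard compares an unsigned value against a signed one, so compilers may warn about it even though it is correct here.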