[pulseaudio-discuss] asserts active by default

Tue Sep 2 10:20:41 PDT 2008

On Tue, 2008-09-02 18:01:20 +0200, Lennart Poettering <lennart at poettering.net> wrote:
> On Tue, 02.09.08 14:58, Jan-Benedict Glaw (jbglaw at lug-owl.de) wrote:
> > But pulse behaves differently here. In a lot of circumstances where
> > everybody and her turtle would just return an error, pulse will hit an
> > assertion.
> 
> "Just return an error"?
> 
> In the PA core I use asserts wherever a programming error is
> obvious. Which is the right thing to do. (Note that I am talking about
> the core here, not the client libs!)

Please keep in mind that traditionally, assert() is only used during
the testing phase. NDEBUGging it out of the code should still give you
a working program that behaves just right[tm]. An assert() is a
construct to be used as a pessimistic check in a paranoid world to
check something that should never ever happen.

> It is naive to believe that "just returning an error" would be a good
> option in all cases including programming errors in complex programs
> like in PA. Error paths can never be tested comprehensively, which

Dump a stacktrace, use fprintf(), ...

> means they are usually much more buggy than the paths where everything
> is behaving correctly. Which means: if it is you who fucked up, then
> admit it, don't try to fuck it up even further by trying to be smart
> and come up with "Plan Bs" or something that try to fix your
> programming errors without you even knowing them. The crux of
> handling programming errors is that you can never know what their nature
> actually is. Because if you knew, you'd have fixed them anyway, right?

Sure.  Though that doesn't actually imply killing the program, when
built with standard options, in situations where it could detect the
error and handle it with a simple return -1.

> Don't try to hide your programming errors. If you try, you have already lost.

Sure.

> So, if you believe that everyone and his turtle is doing right by
> "just returning an error" everywhere, then that 'everyone' must have
> psychic powers -- or simply no clue. ;-)

I don't think so.  But ISTR that we had different opinions about that
the last time, too ;-)

> Catching common-case runtime errors is hard enough. Spend your time on
> coming up with error paths for them. Don't try to come up with error
> paths for your own programming errors! Spend the time on fixing them
> instead.

Sure. That's where you first have an assert() to simply drop into the
GDB prompt during the devel phase, and a regular if (...) check right
afterwards, containing the traditional error handling.
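
A minimal sketch of that assert-then-check pattern, to make the idea
concrete. The queue type and queue_push() are hypothetical names made up
for this illustration; they are not taken from the PA sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size byte queue, used only for illustration. */
struct queue {
    unsigned char data[4];
    size_t len;
};

/* Append one byte. Returns 0 on success, -1 on error. */
int queue_push(struct queue *q, unsigned char byte)
{
    /* Development builds: a NULL queue is a caller bug, so stop right
     * here, where a debugger can show the offending call chain. */
    assert(q != NULL);

    /* Builds with -DNDEBUG: the assert above compiles to nothing, so
     * the same condition is reported through the normal error return. */
    if (q == NULL)
        return -1;

    /* A full queue is an expected runtime condition, so it only ever
     * gets an error return, never an assert. */
    if (q->len == sizeof q->data)
        return -1;

    q->data[q->len++] = byte;
    return 0;
}
```

Without NDEBUG, queue_push(NULL, 0) aborts at the assert; with NDEBUG
defined, the same call returns -1 and the caller decides what to do.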

Note that this doesn't necessarily hold true for /all/ cases of
assert() usage. I'm not suggesting, in any way, to just replace
assert()s with if()s.

> > > If the asserts were disabled, then a significant rework in terms of 
> > > error handling would need to be performed. If a catchable error occurs 
> > > in pulse, an assert is not used.
> > 
> > Right, because error handling and assertions were mixed up. Error
> > handling is for errors that could expectedly happen (eg. malloc()
> > returns a NULL pointer, user requests an operation that's invalid in
> > this context, ...)
> 
> malloc() returning NULL is not an error that can realistically
> happen. If malloc() returns NULL you are fucked anyway.

It can easily happen if you, e.g., switch off memory overcommit. On top
of that, this situation can even clear up within a few milliseconds.

So there are chances that you, e.g., cannot accept() and handle one
incoming client due to memory allocation problems, but with the next
client, there may be memory available again.
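
A rough sketch of what I mean, under stated assumptions: the client
struct, client_new(), serve_clients() and the flaky_alloc() test
allocator are all hypothetical names invented for this example, and the
4096-byte buffer size is arbitrary. The allocator is passed in only so
that a transient failure can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature; must return free()-compatible memory or NULL. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical per-client state. */
struct client {
    int fd;
    char *buf;
};

/* Demo allocator: fails this many times, then behaves like malloc(). */
int flaky_failures_left = 0;

void *flaky_alloc(size_t size)
{
    if (flaky_failures_left > 0) {
        flaky_failures_left--;
        return NULL;
    }
    return malloc(size);
}

/* Returns a new client, or NULL if memory is short right now. */
struct client *client_new(int fd, size_t buf_size, alloc_fn alloc)
{
    struct client *c;

    assert(alloc != NULL);      /* passing no allocator is a caller bug */

    c = alloc(sizeof *c);
    if (c == NULL)
        return NULL;            /* expected runtime condition */

    c->buf = alloc(buf_size);
    if (c->buf == NULL) {
        free(c);                /* undo the partial setup */
        return NULL;
    }
    c->fd = fd;
    return c;
}

void client_free(struct client *c)
{
    if (c != NULL) {
        free(c->buf);
        free(c);
    }
}

/* Set up each client in turn; a failed allocation drops only that one
 * client. Returns how many clients could be served. */
size_t serve_clients(const int *fds, size_t n, alloc_fn alloc)
{
    size_t served = 0;

    for (size_t i = 0; i < n; i++) {
        struct client *c = client_new(fds[i], 4096, alloc);
        if (c == NULL)
            continue;           /* skip this client, keep the server up */
        served++;
        client_free(c);
    }
    return served;
}
```

With flaky_failures_left set to 1, the first of three clients is turned
away and the other two are served; the process itself keeps running.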

> Since modern operating systems work the way they work, the result of
> OOM is not that malloc() returns NULL, but that you are being killed
> by the OOM killer. The reason for that is that memory pages are only
> allocated when they are used -- not already when you call malloc(). 

This is the default behaviour of Linux right now, with overcommit
enabled. In a lot of environments, you're explicitly asked to switch
it off!
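
For reference, a small sketch of how a program could check which policy
is in effect. On Linux the setting is exposed as
/proc/sys/vm/overcommit_memory, with 0 meaning heuristic overcommit, 1
meaning always overcommit and 2 meaning strict accounting. The two
function names below are hypothetical helpers for this example, and the
short labels they return are my own wording.

```c
#include <assert.h>
#include <stdio.h>

/* Map the numeric vm.overcommit_memory value to a short label. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic";  /* kernel guesses what is reasonable */
    case 1:  return "always";     /* allocations are never refused */
    case 2:  return "never";      /* strict accounting, malloc() can fail */
    default: return "unknown";
    }
}

/* Read the mode from the given file, normally
 * /proc/sys/vm/overcommit_memory. Returns -1 if the file cannot be
 * opened or parsed (for example on a non-Linux system). */
int read_overcommit_mode(const char *path)
{
    FILE *f;
    int mode;

    assert(path != NULL);       /* a NULL path is a caller bug */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A caller would pass "/proc/sys/vm/overcommit_memory" and treat a result
of 2 as the case where a NULL return from malloc() has to be expected.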

The point is that with late allocation, you can never know when
there's an OOM situation, until it's there and the kernel needs to *kill*
a program. Ask, e.g., the Oracle guys how much they like that...

> On modern operating systems malloc() doesn't reserve memory, it only
> reserves address space. If malloc() returns NULL then you managed to
> reserve up to the limit of 2GB of memory (on x86). If you did that
> then you deserve being aborted.

You can even get beyond that IIRC. Like way above 3GB.

> Due to this all reasonable software (unless it is very low-level or
> kernel stuff) doesn't try to handle OOM. It just aborts. All GLib/Gtk
> programs are one example. PA is another one.

Database systems, or any software working with large datasets (large
enough that they could hit the address space constraints), are a
counter-example. They *rely* on overcommit being switched off.

> Appropriate error checking is for stuff like permission errors on
> open(), disk full, and so on. Testing the result of malloc is not a
> good example.

I think differently about that.

MfG, JBG

-- 
      Jan-Benedict Glaw      jbglaw at lug-owl.de              +49-172-7608481
Signature of:               http://www.eyrie.org/~eagle/faqs/questions.html
the second  :