hal not parsing netlink messages properly

Sat Oct 29 10:29:51 PDT 2005

On Fri, Oct 28, 2005 at 09:39:56PM -0400, Jon Nettleton wrote:
> In testing that Hal properly recognizes device inputs and outputs, I
> have come across another bug.  Whenever I rebooted with multiple (3 or
> more) removable media devices in my readers, something would fail to
> mount when I logged in.  The problem reared its ugly head when
> gnome-volume-manager sent as many asynchronous mount commands as
> necessary to mount all the media.  My findings show that hal's osspec.c
> reads the message buffer and only reads the first null terminated
> string.  Because the mounts were coming so fast, some of the messages are
> actually 44 bytes, which is basically two messages, so the second
> mount@/blah/blah/blah is being lost.  I have attached the hald verbose
> output that illustrates this.  

Yup. That's a bug.

> I am not sure this is a problem with netlink or hal.

It's hal.

> I started to write
> something to parse the buffer at null terminated strings and then loop
> through the messages, but then thought I should ask you all if this
> makes sense.  

No, just call recvfrom() _once_ per processing loop. These are datagrams
and it's wrong to read as long as the buffer has data if you want only
a single message.
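
A minimal sketch of the "one recvfrom() per wakeup" idea, not the actual
osspec.c code. The names read_one_event() and demo_two_queued_events() are
hypothetical, and an AF_UNIX datagram socketpair stands in for the netlink
socket so the sketch can run without real kernel events. It assumes each
event is sent as one NUL-terminated datagram, as the 22-byte messages in the
log suggest.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read exactly ONE datagram from fd into buf.
 * There is no loop and no running total; each call yields one message. */
static ssize_t read_one_event(int fd, char *buf, size_t buflen)
{
    ssize_t len;

    if (buflen == 0)
        return -1;

    len = recvfrom(fd, buf, buflen - 1, 0, NULL, NULL);
    if (len < 0)
        return -1;

    buf[len] = '\0';   /* terminate defensively */
    return len;
}

/* Demo (assumption: a local datagram socketpair replaces the netlink
 * socket). Two events are queued back to back, then read one per call.
 * Returns how many events came back intact, or -1 on setup failure. */
static int demo_two_queued_events(void)
{
    static const char ev1[] = "mount@/block/sda/sda1";
    static const char ev2[] = "mount@/block/sdc/sdc1";
    char buf[1024];
    int sv[2];
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* sizeof() includes the trailing NUL, so each datagram is 22 bytes. */
    if (send(sv[0], ev1, sizeof(ev1), 0) < 0 ||
        send(sv[0], ev2, sizeof(ev2), 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Both datagrams are already queued, yet each read returns only one. */
    if (read_one_event(sv[1], buf, sizeof(buf)) == (ssize_t)sizeof(ev1) &&
        strcmp(buf, ev1) == 0)
        ok++;
    if (read_one_event(sv[1], buf, sizeof(buf)) == (ssize_t)sizeof(ev2) &&
        strcmp(buf, ev2) == 0)
        ok++;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

In an event loop, the data-ready callback would call something like
read_one_event() once and return; if another datagram is still queued, the
loop simply fires the callback again.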

> In the debug output there should have been 5 drives mounted.  You can
> see only 3 were mounted.  The last two having a message length of
> 44 bytes, which equals two "mount@/block/sdc/sdc1" with the null
> terminators.
...

> 17:46:34.920 [I] osspec.c:243: total_read=22 buf='mount@/block/sda/sda1'
> 17:46:35.084 [I] osspec.c:243: total_read=44 buf='mount@/block/sdc/sdc1'

Good catch! This is definitely broken. The whole read loop with the total_read
logic in netlink_detection_data_ready() should be removed completely.

Care to test it and send a patch? Otherwise let me know and I can do it.

Thanks,
Kay