Tracking whether a connection is alive

Maciej Katafiasz mnews22 at wp.pl
Mon Jul 12 06:06:27 PDT 2004


On Mon, 12-07-2004 at 12:29, Owen Fraser-Green wrote:
> Hi,
> 
> > On Mon, 12-07-2004 at 10:44, David Zeuthen wrote:
> > One solution slightly more elegant than pinging periodically is to ping
> > only on an attempt to acquire the lock; this, however, introduces up to
> > $TIMEOUT latency without removing the fundamental issues, so maybe it
> > isn't all that great an optimization.
> 
> Another alternative is leasing the lock so the acquirer has to be alive in
> order to keep renewing the lease to keep the lock.

Yeah, a renewable lease seems to be the best solution. With it, the code
dealing with stalled locks in HAL can, in its smallest version, be
trivially reduced to:

bool grant_lock(lock)
{
	if(time_diff(current_time(), last_received_ping(lock)) < timeout)
	{
		/* Previous owner didn't release lock yet, cannot be granted */
		return FALSE;
	}
	/* Lock stalled, revoke it */
	(...)
}

Virtually no book-keeping is needed, apart from recording the pings
received from the lock owner. IMHO, it may be valuable to somehow signal
that a lock was revoked abnormally, maybe by an additional out param in
AcquireLock(), or by signal emission, so that the new owner can be better
prepared for possible difficulties (such as showing a dialog like
"Opening CD burner device failed. Note however that the previous
application using it did not release the drive; it may be necessary for
you to manually close that application and retry" in case the drive is
busy).
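
The out-param idea could be sketched roughly as follows. This is only
an illustration under my own assumptions: `struct lease_lock`,
`acquire_lock()` and the integer owner ids are hypothetical names, not
anything HAL actually provides, and time is passed in explicitly to
keep the logic easy to follow.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for one lock; not part of any real HAL API. */
struct lease_lock {
	bool   held;       /* is the lock currently owned? */
	int    owner;      /* id of the current owner (illustrative) */
	time_t last_ping;  /* when the owner last pinged */
};

/*
 * Try to grant `lock` to `requester` at time `now`.
 * Returns true if the lock was granted. *revoked_abnormally is set to
 * true when the previous owner stopped pinging and had to be revoked.
 */
bool acquire_lock(struct lease_lock *lock, int requester, time_t now,
                  double timeout, bool *revoked_abnormally)
{
	*revoked_abnormally = false;

	if (lock->held) {
		if (difftime(now, lock->last_ping) < timeout) {
			/* Previous owner is still pinging: cannot be granted */
			return false;
		}
		/* Lock stalled: revoke it and let the new owner know */
		*revoked_abnormally = true;
	}

	lock->held = true;
	lock->owner = requester;
	lock->last_ping = now;
	return true;
}
```

A caller that gets the lock with the flag set could then show the kind
of warning dialog described above before touching the device.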

> Anyway though, even if there's some mechanism such that HAL can be 100%
> sure that an application has hung, will it always be able to release the
> lock without the application first being killed? In that case, again, I
> think dealing with hanging apps is out of HALs scope.

That's why I propose passing additional information when a lock is
granted even though the previous owner did not release it normally. It's
the app's duty to decide whether it's safe to use. However, any service
that provides locking must be prepared for abuse of its locks. HAL does
not have the power to deal with an arbitrary stalled lock, but it should
at least be able to tell when a lock stalls, and remove it from the
system, so that it doesn't block everything. Policy enforcement
necessarily depends on the cooperation of all interested sides, but
it's the best we can do :).

IMHO the mechanism should look like this:
AcquireLockLease() - any app that is granted a lock MUST renew it at
least every INTERVAL seconds, by issuing RenewLease() with the owned lock
passed as a parameter. Failure to do so is equivalent to returning the
lease to the system, and the application MUST NOT continue to rely on a
lease it failed to renew. The application will be notified with a
LeaseRevoked message that it is no longer the owner of the lock in
question; however, it MUST cease to use the lock as soon as it fails to
renew it, not only after receiving the LeaseRevoked message.
After losing a lock, the application MUST issue AcquireLockLease() prior
to using it again, and MUST behave as if it had never been granted it. It
is however safe to invoke AcquireLockLease() on a lock that hasn't been
revoked by HAL, and a conforming HAL implementation MUST treat an
AcquireLockLease() call from the application that already owns the lock
as if RenewLease() had been invoked.
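
To make the acquire/renew rules concrete, here is a rough service-side
sketch. Everything in it is assumed for illustration: `struct lease`,
`lease_acquire()`, `lease_renew()` and the integer app ids are
hypothetical, and a real HAL would of course do this over D-BUS rather
than through plain function calls.

```c
#include <stdbool.h>
#include <time.h>

#define NO_OWNER (-1)

/* Hypothetical lease record; names are illustrative, not HAL API. */
struct lease {
	int    owner;    /* NO_OWNER when the lock is free */
	time_t renewed;  /* time of the last acquire or renewal */
};

/* A lease is live only while its owner renewed within `interval` seconds. */
static bool lease_is_live(const struct lease *l, time_t now, double interval)
{
	return l->owner != NO_OWNER && difftime(now, l->renewed) < interval;
}

/* RenewLease(): succeeds only for the current owner of a live lease. */
bool lease_renew(struct lease *l, int app, time_t now, double interval)
{
	if (!lease_is_live(l, now, interval) || l->owner != app)
		return false;
	l->renewed = now;
	return true;
}

/* AcquireLockLease(): a call by the current owner is treated as a renewal. */
bool lease_acquire(struct lease *l, int app, time_t now, double interval)
{
	if (lease_is_live(l, now, interval)) {
		/* Fails for everyone except the current owner */
		return lease_renew(l, app, now, interval);
	}
	/* Lock is free, or the old lease expired: hand it to the caller */
	l->owner = app;
	l->renewed = now;
	return true;
}
```

Note that a late RenewLease() simply fails here, which matches the rule
that an application which missed its renewal has to go through
AcquireLockLease() again.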
NOTE: Any application acquiring a lock MUST make its best effort to
provide accurate feedback with RenewLease() that reflects the true state
of the part of the application that is using the lock. An application
conforming to this spec MUST NOT create "artificial" threads of
execution that only renew the lease even when its real user effectively
fails to conform to this policy. This spec guarantees that values of
INTERVAL will always be reasonable, so that even applications with the
strictest time constraints can afford to renew the lock periodically.
One possible implementation of this is expressed in the following
pseudocode:

atomic_t ttl;

WorkThread()
{
	while(1)
	{
		/* Do lots of heavy lifting with exact timing */
		my_crucial_rt_job();
		atomic_add(ttl, 1);
	}
}

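/* Called periodically, more often than once per timeout */
SignalThread()
{
	/* Both are assumed to be set when the lease is first acquired */
	static int last_ttl;      /* ttl value seen at the last renewal */
	static time_t timestamp;  /* time of the last successful renewal */
	int cur_ttl;

	if(time_diff(time(NULL), timestamp) < timeout)
	{
		cur_ttl = atomic_read(ttl);
		if(cur_ttl != last_ttl)
		{
			/* Worker made progress. Do all magic to renew lock here */
			last_ttl = cur_ttl;
			timestamp = time(NULL);
		}
		/* Otherwise the worker made no progress, so don't renew */
	} else
	{
		/* Uh-oh, we're no longer allowed to use lock
		 * We have to deal with it
		 */
	}
}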
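
For anyone who wants something compilable, the same idea might look
roughly like this with C11 atomics. It is only a sketch under my own
assumptions: `watchdog_tick()`, `worker_step_done()` and
`struct watchdog` are made-up names, the actual renewal call is left as
a comment, and the current time is passed in by the caller.

```c
#include <stdatomic.h>
#include <time.h>

enum lease_status { LEASE_RENEWED, LEASE_WAITING, LEASE_LOST };

/* Bumped by the worker after each completed unit of real work. */
static atomic_uint progress;

/* Hypothetical watchdog state kept by the renewing side. */
struct watchdog {
	unsigned last_seen;   /* progress value at the last renewal */
	time_t   renewed_at;  /* time of the last successful renewal */
};

/* Called by the worker thread once per iteration of its main loop. */
void worker_step_done(void)
{
	atomic_fetch_add(&progress, 1);
}

/* Call periodically, more often than once per `interval` seconds. */
enum lease_status watchdog_tick(struct watchdog *w, time_t now,
                                double interval)
{
	if (difftime(now, w->renewed_at) >= interval)
		return LEASE_LOST;     /* must stop using the lock right away */

	unsigned cur = atomic_load(&progress);
	if (cur == w->last_seen)
		return LEASE_WAITING;  /* worker stalled: do not renew for it */

	/* Worker made progress: the real renewal would be sent here. */
	w->last_seen = cur;
	w->renewed_at = now;
	return LEASE_RENEWED;
}
```

The point of returning LEASE_WAITING instead of renewing anyway is that
the renewal then reflects the state of the code actually using the
lock, not just the fact that some thread is still scheduled.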

-- 
"Tautologism is something tautological"
   Maciej Katafiasz <mnews2 at wp.pl>
       http://mathrick.blog.pl




More information about the dbus mailing list