The mail-database exchanger script (manitou-mdx) periodically updates a few entries in the runtime_info table to report its status. These entries can be checked by an external program; this article demonstrates how.
The relevant keys (runtime_info.rt_key field) are:
mail=> select * from runtime_info where rt_key in ('last_alive', 'last_sent', 'last_import');
   rt_key    |  rt_value
-------------+------------
 last_alive  | 1156772479
 last_sent   | 1156754648
 last_import | 1156771322
(3 rows)
These values are Unix timestamps (seconds since the epoch). They can be converted to a human-readable form, expressed in the local time zone, with a one-line script in Perl:
$ perl -e 'print scalar(localtime(1156772479));'
Mon Aug 28 15:41:19 2006
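When several entries have to be read at once, a short script may be handier than the one-liner. The following is only a sketch: the helper name fmt_utc is made up for this illustration, the values are the ones from the query output above, and gmtime is used instead of localtime so that the result does not depend on the time zone of the machine.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a Unix timestamp as a UTC date string.
# (fmt_utc is a hypothetical helper, not part of manitou-mdx.)
sub fmt_utc {
    my ($epoch) = @_;
    return strftime("%Y-%m-%d %H:%M:%S", gmtime($epoch));
}

# Values copied from the runtime_info query shown earlier.
my %entries = (
    last_alive  => 1156772479,
    last_sent   => 1156754648,
    last_import => 1156771322,
);

for my $key (sort keys %entries) {
    printf "%-12s %s UTC\n", $key, fmt_utc($entries{$key});
}
```

For last_alive this gives 2006-08-28 13:41:19 UTC, which is the same instant as the 15:41:19 local time printed by the one-liner on a machine two hours ahead of UTC.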
The last_alive entry gets updated only if the 'alive_interval' configuration parameter is set in the manitou-mdx config file, which is not the case by default.
To check whether manitou-mdx is running, we can create a simple script that connects to the database, reads one or several of these entries, and compares them to an expected result. For last_alive, that result is the easiest to define. The configuration parameter 'alive_interval' specifies the number of seconds between two updates of 'last_alive'. If the difference between the current time and the value of 'last_alive' is significantly higher than 'alive_interval', it can be assumed that manitou-mdx is no longer running, or that something prevents it from updating the entry (it could be stuck waiting for a database lock, for example).
In addition, this script can be hosted on a different machine than manitou-mdx and the database, and so will still be able to report if one of those is down.
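The comparison described above can be isolated into a small function. This is a minimal sketch, not something taken from manitou-mdx: the name is_stale and the tolerance factor (three missed updates by default) are assumptions chosen for illustration, and "significantly higher" can be tuned to whatever margin suits the installation.

```perl
use strict;
use warnings;

# Returns 1 if $last_alive is too old relative to $now, 0 otherwise.
# $alive_interval is the value configured for manitou-mdx.
# $factor is an assumed tolerance: how many update periods may elapse
# before the entry is considered stale (defaults to 3).
sub is_stale {
    my ($last_alive, $now, $alive_interval, $factor) = @_;
    $factor = 3 unless defined $factor;
    return ($now - $last_alive) > $factor * $alive_interval ? 1 : 0;
}

# With alive_interval=60, an entry that is 200 seconds old is stale
# (200 > 3*60), while an entry that is 100 seconds old is not.
print is_stale(1156772479, 1156772479 + 200, 60) ? "stale\n" : "ok\n";
print is_stale(1156772479, 1156772479 + 100, 60) ? "stale\n" : "ok\n";
```

Passing the current time as an argument, rather than calling time inside the function, keeps the check easy to test with fixed values.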
Below is an example of such a script in Perl. It assumes that an environment variable named MANITOU_CONNECT_STRING contains a valid DBI connect string, for example: Dbi:Pg:host=pgserver;dbname=mail;user=manitou
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use POSIX qw(strftime);
# The maximum number of seconds allowed between the
# 'last_alive' value of the database and the current time.
# When the difference between these two becomes higher
# than ALIVE_INTERVAL_MAX, the alert is triggered.
my $ALIVE_INTERVAL_MAX=600;
# Change these for real addresses
my $ALERT_EMAIL="alert\@domain.tld";
my $FROM_EMAIL="alert-sender\@domain.tld";
# A file created when an alert is sent
# The alert won't be sent again until this file is removed, either
# by us when detecting that the mdx is up again, or
# by another program, for instance the mdx start script
my $ALERT_LCK="/var/tmp/manitou-alert.lck";
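sub alert {
    my $msg=shift;
    if (! -f $ALERT_LCK) {
        # If no lockfile, create one containing the current date
        if (open(my $fh, '>', $ALERT_LCK)) {
            print $fh scalar(localtime(time)), "\n";
            close($fh);
        }
        else {
            warn "cannot create $ALERT_LCK: $!";
        }
        # and send the alert
        alert_mail($msg);
    }
}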
sub alert_mail {
    my $msg=shift;
    # Pipe the message to sendmail; -t reads the recipients from the header
    open(my $fh, '|-', "/usr/sbin/sendmail -t -f $FROM_EMAIL") or die $!;
    print $fh "From: $FROM_EMAIL\n";
    print $fh "To: $ALERT_EMAIL\n";
    print $fh "Subject: alert about manitou-mdx\n";
    print $fh "\n"; # end of header
    print $fh "This is an automatically generated alert\n\n";
    print $fh "Error message:\n$msg\n";
    close($fh);
}
my $cnx_string=$ENV{'MANITOU_CONNECT_STRING'};
if (!defined($cnx_string)) {
    die "Missing MANITOU_CONNECT_STRING environment variable";
}
my $dbh=DBI->connect($cnx_string);
if (!$dbh) {
    alert("unable to connect to database: $DBI::errstr");
    exit 1;
}
my $sth=$dbh->prepare("SELECT rt_value FROM runtime_info WHERE rt_key='last_alive'");
$sth->execute;
my @r=$sth->fetchrow_array;
# if there's no entry, we consider there's no error
if (@r) {
    if (time-$r[0] > $ALIVE_INTERVAL_MAX) {
        my $d=strftime("%d/%m/%Y %H:%M:%S", localtime($r[0]));
        alert("manitou-mdx appears to be down since $d");
    }
    else {
        # if the mdx is running and there's an alert lockfile, then remove it
        # in order not to block further alerts
        if (-f $ALERT_LCK) {
            unlink($ALERT_LCK);
        }
    }
}
$sth->finish;
$dbh->disconnect;
Similarly, the last_import entry could be used to detect a problem in the mail chain. For example, if a mail system that is generally busy hasn't processed a single incoming message for several hours, that could be considered suspicious enough to trigger an alert.
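Such a check on last_import might look like the sketch below. The function name check_last_import and the four-hour threshold are assumptions made for this example; the function expects an already opened DBI database handle and relies only on the standard DBI method selectrow_array.

```perl
use strict;
use warnings;

# Assumed threshold: alert when nothing has been imported for 4 hours.
my $IMPORT_IDLE_MAX = 4 * 3600;

# Returns an alert message if 'last_import' is older than $max_idle
# seconds, or undef if the entry is recent enough or missing.
# $dbh is an open DBI database handle; $now defaults to the current time.
sub check_last_import {
    my ($dbh, $max_idle, $now) = @_;
    $now = time unless defined $now;
    my ($last) = $dbh->selectrow_array(
        "SELECT rt_value FROM runtime_info WHERE rt_key=?",
        undef, 'last_import');
    # As with last_alive, a missing entry is not treated as an error
    return undef unless defined $last;
    my $idle = $now - $last;
    return undef if $idle <= $max_idle;
    return sprintf("no message imported for %d minutes", $idle / 60);
}
```

In the script above, the result could be passed to alert() when it is defined, for instance: my $msg=check_last_import($dbh, $IMPORT_IDLE_MAX); alert($msg) if defined $msg; Since alert() uses a single lock file, a separate lock file for this second check may be preferable so that one alert does not mask the other.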