Archive for September, 2012

Upgrading attachments indexers

September 26th, 2012

Version 1.3.0 is ready to be released, with major improvements to full-text search and the full-text indexer. A couple of packaging issues are still being worked on, but the code won’t change significantly from the 1.3.0 tags on the master branches at GitHub, for both manitou-mail-mdx and manitou-mail-ui.

The list of changes from the previous version is currently visible in the NEWS file in the sources.

It should be noted that 1.3.0 will require some adjustments in how attachments in various formats are indexed with the help of user-supplied scripts.

Up to now, indexer plugins were used to index the contents of attached files in PDF, DOC, or even HTML format.

Starting with 1.3.0, this is no longer recommended: the conversion should instead be handled by a new mechanism called “words extractors”, declared in the manitou-mdx main configuration file. Example:

index_words_extractors = application/pdf: /opt/scripts/pdf2text \
	application/msword: /opt/scripts/word2text

Each user-supplied script receives the attachment contents, in their original format, on its standard input, and should write the extracted words, encoded in UTF-8, to its standard output.
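Before declaring a script in index_words_extractors, it can be useful to check that it honors this contract. The following is only a sketch: the function name check_extractor is hypothetical and not part of manitou-mdx, and it assumes that iconv is available to validate the encoding.

```shell
#!/bin/sh
# check_extractor EXTRACTOR SAMPLE
# Hypothetical helper (not part of manitou-mdx): feeds SAMPLE to EXTRACTOR
# on standard input and verifies that the extractor exits successfully
# and that its output is valid UTF-8.
check_extractor() {
    extractor=$1
    sample=$2
    out=$(mktemp) || return 1

    if ! "$extractor" <"$sample" >"$out"; then
        echo "extractor failed on $sample" >&2
        rm -f -- "$out"
        return 1
    fi

    # iconv exits with an error when it meets an invalid UTF-8 sequence.
    if ! iconv -f UTF-8 -t UTF-8 <"$out" >/dev/null 2>&1; then
        echo "output is not valid UTF-8" >&2
        rm -f -- "$out"
        return 1
    fi

    rm -f -- "$out"
    return 0
}
```

For instance, `check_extractor /opt/scripts/pdf2text sample.pdf` would exercise the PDF extractor from the configuration example above on a sample file of your choosing.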

For installs that didn’t index attachments with plugins, nothing needs to be done. After upgrading to 1.3.0, HTML contents simply start being indexed, since this is now done internally by default; no manual action is required.

On the other hand, for installs that used indexer plugins, a preliminary step before upgrading to 1.3.0 is to convert these plugins to words extractor scripts. This is not particularly difficult. As an example, here is a ready-to-use script that extracts words from MS-Word files with antiword.

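#!/bin/sh
# Words extractor for application/msword: reads a .doc file on standard
# input and writes its text to standard output using antiword.

# Store the input in a temporary file for antiword to read.
# mktemp replaces the deprecated tempfile; --suffix requires GNU coreutils.
t=$(mktemp --suffix=.doc) || exit 1
# Remove the temporary file when the script exits, whatever the outcome.
trap 'rm -f -- "$t"' EXIT
cat >"$t"
antiword -i1 "$t" || exit 1
exit 0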

This step comes first because rebuilding the inverted word index is recommended when upgrading to 1.3.0, and doing so involves reindexing attachments as well.
Up to version 1.2, that was not possible with indexer plugins, which is one of the reasons why plugins are being deprecated as a way to index attachment contents. In addition, their lack of integration with the words vectors cache was a performance drag, which 1.3 also solves with its words extractors method.
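Following the same pattern as the antiword script above, a PDF counterpart could look like the sketch below. It is not from the manitou-mdx sources: it assumes that pdftotext from poppler-utils is installed, and the function name pdf2text is only illustrative.

```shell
#!/bin/sh
# Sketch of a words extractor for application/pdf.
# Assumption: pdftotext (poppler-utils) is installed.
pdf2text() {
    t=$(mktemp) || return 1
    # The PDF arrives on standard input; store it in a temporary file.
    cat >"$t"
    # -enc UTF-8 selects the output encoding; "-" as the output file name
    # sends the extracted text to standard output.
    pdftotext -enc UTF-8 "$t" -
    status=$?
    rm -f -- "$t"
    return "$status"
}
```

Saved as /opt/scripts/pdf2text with a final line calling `pdf2text`, this would match the path used in the index_words_extractors example.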

Categories: Development, New features

Exim4 and its pipe_transport unset error

September 3rd, 2012

On Debian systems, Exim4 in its default configuration does not allow piping an incoming mail into a program defined in /etc/aliases. Instead of launching the program, it reports this type of error in /var/log/exim4/mainlog:

system_aliases defer (-30): pipe_transport unset in system_aliases router

Yet that’s the method suggested in the “Delivering incoming mail into files” section of the manitou-mdx documentation, and it has the advantage of being quite standard across most MTAs. This is also a problem for popular mail software such as Mailman, as mentioned in this Ubuntu issue.

The solution is similar to the one mentioned in that issue, except that it’s better to create a custom configuration file than to edit exim4.conf.template, as recommended by the update-exim4.conf manpage.

In the simplest case where a split configuration for Exim is not used (dc_use_split_config='false'), the fix is as simple as creating /etc/exim4/exim4.conf.localmacros containing:

SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe

SYSTEM_ALIASES_USER and SYSTEM_ALIASES_GROUP may be specified too if the defaults are not suitable, but only SYSTEM_ALIASES_PIPE_TRANSPORT is strictly necessary.

If a split configuration is used, the line should instead go into a file under /etc/exim4/conf.d, for example /etc/exim4/conf.d/main/000_localmacros.
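Both cases can be handled by a small helper, sketched below. The function name add_pipe_transport_macro is hypothetical, and the sketch assumes that dc_use_split_config is set in update-exim4.conf.conf inside the configuration directory, written with single quotes as on a stock Debian install.

```shell
#!/bin/sh
# add_pipe_transport_macro [CONFDIR]
# Hypothetical helper: picks the macro file according to
# dc_use_split_config and adds the macro only if it is not there yet.
# CONFDIR defaults to /etc/exim4; another directory allows a dry run.
add_pipe_transport_macro() {
    confdir=${1:-/etc/exim4}
    macro='SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe'

    if grep -q "^dc_use_split_config='true'" \
            "$confdir/update-exim4.conf.conf" 2>/dev/null; then
        target=$confdir/conf.d/main/000_localmacros
    else
        target=$confdir/exim4.conf.localmacros
    fi

    # Append the macro unless a definition is already present.
    if ! grep -q '^SYSTEM_ALIASES_PIPE_TRANSPORT' "$target" 2>/dev/null; then
        printf '%s\n' "$macro" >>"$target" || return 1
    fi

    # Report which file was used.
    printf '%s\n' "$target"
}
```

After running it as root against /etc/exim4, the configuration still has to be regenerated with update-exim4.conf and Exim restarted for the change to take effect.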

Categories: Administration