Version 1.6.0 released

March 20th, 2017

Manitou-Mail 1.6.0 is released and available to download.

This version adds operators in the search bar (on message senders, recipients, status, dates, attachments, tags), a statistics panel with charts and exportable results, and the creation of users and groups within the interface.

The user management features also include access rights checked at the database level, and the possibility of restricting certain accounts to certain identities, using policies based on PostgreSQL’s Row Level Security feature.

Pre-compiled binaries for Windows and macOS come ready-to-use with Qt-5.5 libraries (including WebKit), and binary packages for Debian 8 and Ubuntu 14.04 and 16.04 are also available through the APT repository (see the download page).

If upgrading from a previous version, make sure to run the server-side command:

manitou-mgr --upgrade-schema

Categories: New features

Improvements in mail deduplication

October 4th, 2016

The no_duplicate plugin detects exact duplicates, that is, incoming mail files that have the same SHA1 fingerprint as a previously imported mail file.

Up to now, such duplicates could be discarded by simply declaring in the manitou-mdx configuration file:


[name@example.com]
incoming_preprocess_plugins = no_duplicate

But when a manitou-mail database is only used to sync new messages from an IMAP server, with hierarchical tags reflecting folders, a message moved from one IMAP folder to another is interpreted as an incoming duplicate.

It’s fine and actually desirable not to import the message again, but ideally we’d want to see it in its new folder.

The no_duplicate plugin can now do that by acting both as an incoming_preprocess_plugin and as an incoming_postprocess_plugin.
The first step recognizes the duplicate and, optionally, updates the tags of the message instance already in the database.
The second step associates the SHA1 fingerprint of a newly imported message with its unique ID. This is what makes the optional tags update possible if a duplicate of this message comes in later with different tags.

The declaration taking advantage of this new feature looks like:

[name@example.com]
incoming_preprocess_plugins = no_duplicate({update_tags=>1})
incoming_postprocess_plugins = no_duplicate
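
To make the two steps concrete, here is a minimal Python sketch of the idea. It is not the plugin's code (manitou-mdx itself is written in Perl). The DuplicateTracker class, its in-memory dictionaries, and the choice to replace the tags of the existing message rather than add to them are all assumptions made for illustration.

```python
import hashlib


class DuplicateTracker:
    """Toy in-memory model of the two-step duplicate handling (hypothetical)."""

    def __init__(self):
        self.fingerprints = {}  # SHA1 hex digest -> mail id
        self.tags = {}          # mail id -> set of tag names
        self.next_id = 1

    @staticmethod
    def fingerprint(raw):
        """SHA1 fingerprint of the raw bytes of a mail file."""
        return hashlib.sha1(raw).hexdigest()

    def preprocess(self, raw, tags, update_tags=False):
        """Step 1: return the id of the existing copy, or None if raw is new.

        With update_tags, the existing message takes the tags of the duplicate
        (assumption: the old tags are replaced, not merged).
        """
        mail_id = self.fingerprints.get(self.fingerprint(raw))
        if mail_id is not None and update_tags:
            self.tags[mail_id] = set(tags)
        return mail_id

    def postprocess(self, raw, mail_id):
        """Step 2: remember which mail id a fingerprint belongs to."""
        self.fingerprints[self.fingerprint(raw)] = mail_id

    def import_mail(self, raw, tags, update_tags=False):
        """Return (mail_id, imported); imported is False for a duplicate."""
        existing = self.preprocess(raw, tags, update_tags)
        if existing is not None:
            return existing, False
        mail_id = self.next_id
        self.next_id += 1
        self.tags[mail_id] = set(tags)
        self.postprocess(raw, mail_id)
        return mail_id, True
```

Importing the same bytes twice with update_tags enabled returns the same id both times, and the stored tags end up being those of the second copy. Without the second step, the first step would have nothing to look up.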

For more information on manitou-mdx plugins, see the documentation.

Categories: Database, New features

Users management in the interface

September 21st, 2016

Starting with version 1.6, Manitou-Mail will allow the creation of users and groups from within the user interface, as shown in the screenshot below:

[Screenshot: ui-users-1]

Although PostgreSQL merged users and groups into “roles” years ago (in version 8.1), we intentionally stick to the old terminology, because it makes the model easier to describe. Users are accounts for persons connecting to the database. Groups are entities having a set of permissions, and to which users are assigned. Users can belong to multiple groups.

Under the hood, users correspond to PostgreSQL roles having the LOGIN attribute, so they can log in (assuming proper permissions), whereas groups are roles that don’t have this attribute. Permissions are granted to groups through SQL GRANT commands.

Manitou-Mail chooses to associate permissions with groups, instead of individual users, given its focus on teamwork and shared mail corpora. For a set of permissions that applies to a single person, a group has to be created with only that user as a member.
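
As an illustration of that model, the following Python sketch builds the kind of SQL statements involved. It is only an approximation: the statements that Manitou-Mail really issues are not shown in this post, and the role names, the table name and the privilege list passed in are made up for the example. Only the PostgreSQL syntax itself (CREATE ROLE with LOGIN or NOLOGIN, and the two forms of GRANT) is standard.

```python
def quote_ident(name):
    """Quote an SQL identifier for PostgreSQL, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def role_statements(group, users, table_privileges):
    """Return SQL statements creating a group, its permissions and its members.

    group: name of a role without the LOGIN attribute.
    users: names of roles with the LOGIN attribute, made members of the group.
    table_privileges: dict mapping a table name to a list of privileges.
    """
    statements = [f"CREATE ROLE {quote_ident(group)} NOLOGIN;"]
    # Permissions go to the group, never directly to a user.
    for table, privileges in table_privileges.items():
        statements.append(
            f"GRANT {', '.join(privileges)} ON {quote_ident(table)} "
            f"TO {quote_ident(group)};"
        )
    for user in users:
        statements.append(f"CREATE ROLE {quote_ident(user)} LOGIN;")
        # Membership: the user gets the permissions of the group.
        statements.append(f"GRANT {quote_ident(group)} TO {quote_ident(user)};")
    return statements


if __name__ == "__main__":
    for line in role_statements("support", ["alice"], {"mail": ["SELECT", "UPDATE"]}):
        print(line)
```

A permission set meant for one person is then just a call with a single name in users.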

The set of permissions currently handled is shown in the screenshot below. More fine-grained permissions will probably be added in the future, depending on users’ needs.

[Screenshot: ui-users-2]

Version 1.6 can be built from the git repository; pre-compiled binaries and packages will be made available soon.

Categories: New features, User Interface

Parallel import

July 18th, 2016

Importing in parallel from a single source has really been enabled in manitou-mdx since commit 6a860e, under the following conditions:

  • parallelism is driven from the outside: manitou-mdx instances run concurrently, but don’t fork and manage child workers. Workers don’t share anything. Fortunately GNU parallel can easily handle this part.
  • the custom full text indexing is done once the contents are imported, not during the import. The reason is that it absolutely needs a cache for performance, and such a cache wouldn’t work in the share-nothing implementation mentioned above.

The previous post showed how to create a list of all mail files to import from the Enron sample database.

Now instead of that, let’s create a list split into chunks of 25k messages, which will be fed separately to the parallel workers:


$ find . -type f | split -d -l 25000 - /data/enron/list-

The result is 21 numbered files of 25000 lines each, except for the last one, list-20 containing 17401 lines.
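
The chunk count can be checked with a line of arithmetic. The small Python helper below is only there to show the calculation (chunk_layout is a name made up for this example).

```python
def chunk_layout(total_lines, chunk_size):
    """Return (number of chunk files, number of lines in the last one)."""
    if total_lines == 0:
        return 0, 0
    full_chunks, remainder = divmod(total_lines, chunk_size)
    if remainder == 0:
        return full_chunks, chunk_size
    return full_chunks + 1, remainder


if __name__ == "__main__":
    count, last = chunk_layout(517401, 25000)
    print(f"{count} files, the last one with {last} lines")
```

With 517401 lines and chunks of 25000, there are 20 full chunks plus one partial chunk. Since split -d numbers its output files from 00, the partial chunk is list-20.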

The main command is essentially the same as before. As a shell variable:

cmd="mdx/script/manitou-mdx --import-list={} \
--import-basedir=$basedir/maildir \
--conf=$basedir/enron-mdx.conf \
--status=33"

Based on this, a parallel import with 8 workers can be launched through a single command:

ls "$basedir"/list-* | parallel -j 8 $cmd

This invocation will automatically launch manitou-mdx processes and feed each of them a different list of mails to import (through the --import-list={} argument). It will also make sure that 8 such processes are running whenever possible, launching a new one when another terminates.
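
For readers without GNU parallel at hand, the same scheduling behavior can be approximated in a few lines of Python. This is a sketch of the pattern only, not a replacement for the command above. The helper names (mdx_command, run_pool, report) are invented for this example, and the manitou-mdx arguments are simply copied from the shell variable shown earlier.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def mdx_command(list_file, basedir):
    """Build the manitou-mdx command line for one list file."""
    return [
        "mdx/script/manitou-mdx",
        f"--import-list={list_file}",
        f"--import-basedir={basedir}/maildir",
        f"--conf={basedir}/enron-mdx.conf",
        "--status=33",
    ]


def run_pool(list_files, make_command, jobs=8):
    """Run one command per list file, with at most `jobs` running at once.

    Returns a dict mapping each list file to the exit code of its command.
    """
    def run_one(list_file):
        return list_file, subprocess.run(make_command(list_file)).returncode

    # Each thread only waits for its child process, so threads are enough:
    # the workers themselves are separate processes that share nothing.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run_one, list_files))


def report(results):
    """Print the list files whose command failed, if any."""
    for list_file, code in results.items():
        if code != 0:
            print(f"{list_file}: exit code {code}", file=sys.stderr)
```

A call such as run_pool(sorted(glob.glob(basedir + "/list-*")), lambda f: mdx_command(f, basedir)) would then mirror the parallel invocation, after importing glob and setting basedir.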

This is very effective compared to a serial import. The chart below shows the time spent importing the entire mailset (517401 messages) for various degrees of parallelism, on a small server with a Xeon D-1540 @ 2.00GHz processor (8 cores, 16 threads).

[Chart: parallel-mdx]

Categories: Usage

Mass-importing case: the Enron mail database

July 12th, 2016

Importing mail messages en masse works best when fiddling a bit with the configuration, rather than pushing the mail messages into the normal feed.

As an example, we’re going to use the mails from Enron, the energy company that famously collapsed in 2001 amid a fraud scandal.
The mail corpus was made public as part of the judicial process:
http://www.cs.cmu.edu/~enron/

All attachments have been stripped from it, and it has gone through an additional cleaning process, done by Nuix, to remove potentially sensitive personal information.

The archive format is a 423MB .tar.gz file with an MH-style layout:
– one top-level directory per account.
– inside each account, files and directories with mail folders.

It contains 3500 directories for 151 accounts, and a total of 517401 files, taking 2.6GB on disk once uncompressed.

After unpacking the archive, follow these steps to import the mailset from scratch:

1) Create the list of files


$ cd /data/enron/maildir
$ find . -type f > /data/enron/00-list-all

2) Create a database and a dedicated configuration file for manitou-mdx


# Run this as a user with enough privileges to create
# a database (generally, postgres should do)
$ manitou-mgr --create-database --db-name=enron

Create a specific configuration file with some optimizations for mass import:


$ cat enron-mdx.conf
[common]
db_connect_string = Dbi:Pg:dbname=enron;user=manitou

update_runtime_info = no
update_addresses_last = no
apply_filters = no
index_words = no

preferred_datetime = sender

update_runtime_info is set to no to avoid needlessly updating timestamps in the runtime_info table for every imported message.

Setting update_addresses_last to no also avoids some unnecessary writes.

apply_filters set to no is again a micro-optimization, to avoid querying for filters on every message. On the other hand, it should be left to yes if you happen to have defined filters and want them to be used during this import.

index_words is key to performance. Running the full-text indexing after the import instead of during it makes it 3x faster. Also, full-text indexing as a separate process can be parallelized (more on that below).

preferred_datetime set to sender indicates that the date of a message is taken from its Date header field, as opposed to the file creation time.
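
To illustrate what the last setting means, here is a small Python sketch of the choice between the two dates. It does not reproduce manitou-mdx's own logic: the function name, the fallback rule when the header is missing or unparsable, and the use of the file's modification time as a stand-in for its creation time are assumptions.

```python
from datetime import datetime, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def message_datetime(raw, file_mtime, preferred="sender"):
    """Pick the date of a message.

    raw: the bytes of the mail file.
    file_mtime: timestamp of the file, in seconds since the epoch.
    preferred: "sender" to trust the Date header when it can be parsed.
    """
    if preferred == "sender":
        header = message_from_bytes(raw).get("Date")
        if header:
            try:
                return parsedate_to_datetime(str(header))
            except (TypeError, ValueError):
                pass  # unparsable header: fall back to the file timestamp
    return datetime.fromtimestamp(file_mtime, tz=timezone.utc)
```

For an archive like this one, the file timestamps only reflect when the tarball was unpacked, which is why the header date is the one worth keeping.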

If we were importing into a pre-existing manitou-mdx instance running in the background, we would stop it at this point, as several instances of manitou-mdx cannot work on the same database because of caching, except in specific circumstances (also more on that later).

3) Run the actual import command


$ cd /data/enron/maildir
$ time manitou-mdx --import-list=../00-list-all --conf=../enron-mdx.conf

On a low-end server, it takes about 70 minutes to import the 517401 messages with this configuration and PostgreSQL 9.5.

We can check with psql that all messages came in:

$ psql -d enron -U manitou
psql (9.5.3)
Type "help" for help.

enron=> select count(*) from mail;
count
--------
517401
(1 row)

4) Run the full text indexing

As it's a new database with no preexisting index, we don't have to worry about existing partitions. We let manitou-mgr index the messages with 4 jobs in parallel:


$ time manitou-mgr --conf=enron-mdx.conf --reindex-full-text --reindex-jobs=4

Output from time:

real 10m41.855s
user 28m22.744s
sys 1m8.476s

So this part of the process takes about 10 minutes.
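
The three figures also give a rough idea of how well the 4 jobs were kept busy. The snippet below is just that arithmetic, written in Python. Keep in mind that time only accounts for the manitou-mgr process and the child processes it waits for, not for the PostgreSQL server processes, so this understates the total load on the machine.

```python
def to_seconds(stamp):
    """Convert a duration printed by time, such as 10m41.855s, to seconds."""
    minutes, _, seconds = stamp.rstrip("s").partition("m")
    return int(minutes) * 60 + float(seconds)


real = to_seconds("10m41.855s")
user = to_seconds("28m22.744s")
system = to_seconds("1m8.476s")

# CPU seconds consumed per second of elapsed time, on the client side.
cpu_per_elapsed = (user + system) / real

if __name__ == "__main__":
    print(f"{cpu_per_elapsed:.2f} CPU-seconds per elapsed second")
```

The ratio comes out at about 2.76, meaning that on average a bit less than 3 of the 4 indexing jobs were using a CPU at any given time.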

Conclusion

With manitou-mgr, we can check the final size of the database and its main tables:

$ manitou-mgr --conf=enron-mdx.conf --print-size
-----------------------------------
addresses : 13.52 MB
attachment_contents : 0.02 MB
attachments : 0.02 MB
body : 684.98 MB
header : 402.45 MB
inverted_word_index : 2664.77 MB
mail : 250.12 MB
mail_addresses : 441.17 MB
mail_tags : 0.01 MB
pg_largeobject : 0.01 MB
raw_mail : 0.01 MB
words : 106.52 MB
-----------------------------------
Total database size : 4633 MB
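
To put these numbers in proportion, here is the same table as a Python dictionary, with the share of each table computed against the total. Nothing here comes from manitou-mgr itself: the figures are copied by hand from the output above.

```python
SIZES_MB = {
    "addresses": 13.52,
    "attachment_contents": 0.02,
    "attachments": 0.02,
    "body": 684.98,
    "header": 402.45,
    "inverted_word_index": 2664.77,
    "mail": 250.12,
    "mail_addresses": 441.17,
    "mail_tags": 0.01,
    "pg_largeobject": 0.01,
    "raw_mail": 0.01,
    "words": 106.52,
}
TOTAL_MB = 4633


def share_percent(table):
    """Percentage of the total database size taken by one table."""
    return 100.0 * SIZES_MB[table] / TOTAL_MB


if __name__ == "__main__":
    # Largest tables first.
    for table in sorted(SIZES_MB, key=SIZES_MB.get, reverse=True):
        print(f"{table:20s} {share_percent(table):5.1f}%")
```

The full-text index (inverted_word_index) alone accounts for about 57% of the database, several times more than the message bodies it indexes. The listed tables add up to roughly 4564 MB, so the rest of the total presumably corresponds to objects that the report does not itemize.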

Future posts will show how it compares to the full mailset (with attachments, 18GB of .pst files), and how to parallelize the main import itself.

Categories: Usage

Operators in the search bar

July 9th, 2016

Until now, the search bar in the user interface did not support query
terms to search on metadata.
I’m glad to say that commits 2ddddaae and a1cbe72a add support for filtering
by date and message status right from the search bar, introducing
five operators:

  • “date:” must be followed by an ISO 8601 date (format YYYY-MM-DD),
    or by a specific month (format YYYY-MM), or just a year (YYYY).
    It selects the messages from that day, month, or year, respectively.

  • “before:” has the same format but selects messages dated
    from this day/month/year or an earlier date.

  • “after:” is the opposite, selecting messages dated after
    the given day/month/year.

  • “is:” must be followed by a status among read, replied, forward, archived, sent.
    Criteria can be combined by using the operator several times, as statuses are cumulative, not mutually exclusive.

  • “isnot:” is of course the opposite of “is”. It accepts the same arguments
    and filters out the messages that have the corresponding status bit.
    “is:” and “isnot:” can also be combined, for instance: “is:archived isnot:sent”.
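
To make the date formats concrete, here is a small Python sketch of how such a query could be split into operators and plain words, and how a YYYY, YYYY-MM or YYYY-MM-DD value maps to a range of days. This is not the code from the commits mentioned above (the user interface is not written in Python). The function names are invented, and treating “after:” as strictly later than the given period is my reading of the description.

```python
import calendar
from datetime import date

OPERATORS = ("date", "before", "after", "is", "isnot")


def parse_query(query):
    """Split a search string into (operator, value) pairs and plain words."""
    criteria, words = [], []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and value and name in OPERATORS:
            criteria.append((name, value))
        else:
            words.append(token)
    return criteria, words


def period(value):
    """Return the first and last day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date format: {value}")


def matches_date(operator, value, msg_date):
    """Tell whether a message date satisfies date:, before: or after:."""
    first, last = period(value)
    if operator == "date":
        return first <= msg_date <= last
    if operator == "before":
        return msg_date <= last   # the given period or earlier
    if operator == "after":
        return msg_date > last    # assumption: strictly after the period
    raise ValueError(f"not a date operator: {operator}")
```

For instance, parse_query("budget is:archived isnot:sent date:2016-07") yields three criteria and the single word budget, and period("2016-07") covers July 1st to July 31st, 2016.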

A few more search bar operators are likely to be added to that list, as it’s a pretty handy and fast way to express basic queries.

Categories: New features, User Interface

Version 1.5.0 released

May 24th, 2016

Manitou-Mail 1.5.0 is released and available to download.

The major improvement is the move from Qt4 to Qt5, an important step
to continue to benefit from Qt’s progress.

Other changes are, in the user interface:

  • Bug fixes with Unicode characters in headers when composing.
  • Fix “Fetch more” bug.
  • Fix locale bug when retrieving FP numbers from queries.
  • Fix bug interpreting certain URLs with percent-encoded chars.
  • Desktop notifications available on all platforms.
  • Russian translation added.

In manitou-mdx:

  • Reimplement rfc2047 encoding for consecutive words.
  • Add workaround for a DBD::Pg 3.x bug with utf-8 handling.
  • Fix utf-8 encoding in HTML MIME parts.
  • Fix undesirable CRLF conversion in attachments on MS-Windows.
  • Minor parsing improvements in full text indexing.

Categories: Uncategorized

Experimental Qt5 package available for Ubuntu

March 15th, 2016

The Qt5 port of the user interface is in good shape: successful tests have been made with versions up to Qt 5.5.

The development still happens in a branch:
(github link), but that will be merged soon into the main trunk, and the Qt4.x version will become a separate branch.

In the meantime, a binary package for Ubuntu 14.04 or 15.10 on amd64 architecture is available in an experimental repository, which can be added to /etc/apt/sources.list.d/manitou.list:

deb http://manitou/apt trusty experimental

The package is manitou-ui, version 1.5.0. It’s built with Qt-5.2.1 on Ubuntu 14.04.


# apt-cache show manitou-ui
Package: manitou-ui
Version: 1.5.0
Architecture: amd64
Maintainer: Daniel Verite
Installed-Size: 2443
Depends: libc6 (>= 2.14), libgcc1 (>= 1:4.1.1), libpq5 (>= 9.0~), libqt5core5a (>= 5.0.2), libqt5gui5 (>= 5.0.2) | libqt5gui5-gles (>= 5.0.2), libqt5network5 (>= 5.0.2), libqt5printsupport5 (>= 5.0.2), libqt5webkit5, libqt5widgets5 (>= 5.2.0), libstdc++6 (>= 4.6)
Priority: extra
Section: mail
Filename: pool/experimental/m/manitou-ui/manitou-ui_1.5.0_amd64.deb
Size: 674752
SHA256: 1974bf59cc40d1da67e7fd097b41d78707c012c29d7934610b9ff3a020c14cc4
SHA1: fa6df5ac14a33de5411f06a4139ab86dff3b8536
MD5sum: 937578411e1eff15f5763eb219a1c7a9
Description: Manitou-Mail's user interface
Qt-based GUI that acts as a front-end to a Manitou-Mail database.
Description-md5: 1fcf376b591321a77a965062e2ea0b97

Categories: Development, User Interface

Improved Debian packages for manitou-mdx 1.4.0

October 15th, 2015

An APT repository is now available for manitou-mail, with the latest manitou-mdx packages for amd64 and i386 architectures.
Create /etc/apt/sources.list.d/manitou.list with these contents:

# for Debian 7 or 8
deb http://manitou-mail.org/apt/ jessie main
# for Ubuntu 14.04 and other current versions
deb http://manitou-mail.org/apt/ trusty main

(the distribution codename is obtained from `lsb_release -c` )

then run:

apt-get update
apt-get install manitou-mdx

The first time, the installer now asks if it should install the database.

[Screenshot from 2015-10-10 23:45:38]

When choosing Yes, it creates a new database as indicated (assuming a default PostgreSQL cluster is up and running), and manitou-mdx is started automatically (this is configured in /etc/default/manitou-mdx).

The default source of mail defined in /etc/manitou-mdx.conf is also the Maildir directory of the manitou user (/var/lib/manitou/Maildir). So if the system is configured to use Maildir, messages directed to the local address manitou@localhost will end up directly in the database.

Hopefully, these changes will make it easier to start with manitou-mdx compared to the previous versions of the package.

On Debian, the default maildrop destination is still mbox files (/var/mail/$USER), and delivery happens through procmail. Maildir is generally a better choice. With procmail, this is configurable by putting in /etc/procmailrc:

DEFAULT=$HOME/Maildir/

If not using procmail, for example if postfix is responsible for the maildrop, set in /etc/postfix/main.cf:

home_mailbox = Maildir/

See https://wiki.debian.org/MaildirConfiguration for more details.

Categories: Administration, New features

Installing manitou-mdx on FreeBSD 10

September 22nd, 2015

Installing manitou-mdx on FreeBSD 10.1 can be achieved by following these steps:

1) Install Perl modules

# perl -MCPAN -e shell
install DBD::Pg
install MIME::Entity
install URI::Escape
install IO::Uncompress::Gunzip
install Bit::Vector
install HTML::TreeBuilder

2) Build manitou-mdx from source

The current version is 1.3.1, available from
https://github.com/manitou-mail/manitou-mail-mdx

cd manitou-mdx-1.3.1
perl Makefile.PL
make
# make install

This will install into /usr/local.

3) Create the database, user, schemas

$ export PGDATABASE=postgres
$ manitou-mgr --create-database --db-super-user=pgsql --db-name=manitou

Categories: Administration