Archive

Archive for the ‘Development’ Category

Note to query writers about mail_status

July 8th, 2017 Comments off

The mail_status table

mail_status is a (mail_id, status) table containing the subset of the
mail that is not “current”, which in terms of status meant, technically:
status & (16+32+256) = 0

Commit de2ee18 and related commit 7804642 in the user interface remove that table
in favor of a partial index on the mail table with the expression:
(status & 32 = 0) which means exactly: “not archived”.

This might raise a few questions among users who have developed their own set of queries. Hopefully the rest of this post will answer them in advance.

Can we keep the old queries (involving mail_status) unchanged?

Yes, by creating a mail_status view emulating the old table, taking advantage of the new index:

CREATE VIEW mail_status AS
SELECT mail_id, status FROM mail WHERE status&32=0 AND status&(16+256)=0;

Simple views like that are normally inlined by the PostgreSQL optimizer, so this should perform pretty well. When in doubt, use EXPLAIN in SQL to check the execution plan.

What is the motivation behind the change?

Mostly performance. According to EXPLAIN ANALYZE, joining against a mail_status real table populated with a few thousand messages is fast (which is why this table existed in the first place), but avoiding the join and using the partial index instead is faster.

Also mail_status was maintained by triggers on INSERT, UPDATE, DELETE, and these triggers were not free in execution time. Now they’re no longer necessary and have been removed in the above-mentioned commits.

Why is the index on status&32=0, instead of status&(16+32+256)=0?

For simplicity. The triggers maintaining mail_status used the latter expression, but a message with the status “sent” (256) or “trashed” (16), but not “archived”, is a bit of a weird case, because there’s generally no action pending on a message that was sent or moved into the trashcan. It’s easier to reason about this new index knowing that it partitions the mail simply between archived and not archived, matching exactly the “archived” bit in the status.
In most cases, status&32=0 is the expression that should be used to mean this message is “current”. For exact compatibility with the old expression, status&32=0 AND status&(16+256)=0 should be used, so that the PostgreSQL optimizer can use the new index.
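
The relationship between the three expressions can be checked with plain shell arithmetic. This is only an illustrative sketch: the function names are made up for this example and are not part of Manitou-Mail, and the bit values (16 for trashed, 32 for archived, 256 for sent) are the ones quoted above.

```shell
#!/bin/sh
# Illustrative only: these function names are hypothetical.
# Bit values as quoted in this post: 16=trashed, 32=archived, 256=sent.

# Old mail_status criterion: none of the three bits is set.
is_current_old() {
  [ $(( $1 & (16 + 32 + 256) )) -eq 0 ]
}

# Split form: "not archived" first (the condition of the partial index),
# then "neither trashed nor sent".
is_current_split() {
  [ $(( $1 & 32 )) -eq 0 ] && [ $(( $1 & (16 + 256) )) -eq 0 ]
}

# New, simpler criterion: not archived.
is_not_archived() {
  [ $(( $1 & 32 )) -eq 0 ]
}

# Print the three verdicts for a few sample status values.
for status in 0 16 32 256 272 304; do
  if is_current_old "$status"; then old=yes; else old=no; fi
  if is_current_split "$status"; then split=yes; else split=no; fi
  if is_not_archived "$status"; then na=yes; else na=no; fi
  echo "status=$status old=$old split=$split not_archived=$na"
done
```

The old and split forms always agree. They differ from the plain "not archived" test only for messages that are trashed or sent without being archived, such as status 16 or 256.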

Categories: Database, Development Tags:

Experimental Qt5 package available for Ubuntu

March 15th, 2016 Comments off

The Qt5 port of the user interface is in good shape: successful tests have been made with versions up to Qt 5.5.

The development still happens in a branch:
(github link), but that will be merged soon into the main trunk, and the Qt4.x version will become a separate branch.

In the meantime, a binary package for Ubuntu 14.04 or 15.10 on amd64 architecture is available in an experimental repository, which can be added to /etc/apt/sources.list.d/manitou.list:

deb http://manitou/apt trusty experimental

The package is manitou-ui, version 1.5.0. It’s built with Qt-5.2.1 on Ubuntu 14.04.


# apt-cache show manitou-ui
Package: manitou-ui
Version: 1.5.0
Architecture: amd64
Maintainer: Daniel Verite
Installed-Size: 2443
Depends: libc6 (>= 2.14), libgcc1 (>= 1:4.1.1), libpq5 (>= 9.0~), libqt5core5a (>= 5.0.2), libqt5gui5 (>= 5.0.2) | libqt5gui5-gles (>= 5.0.2), libqt5network5 (>= 5.0.2), libqt5printsupport5 (>= 5.0.2), libqt5webkit5, libqt5widgets5 (>= 5.2.0), libstdc++6 (>= 4.6)
Priority: extra
Section: mail
Filename: pool/experimental/m/manitou-ui/manitou-ui_1.5.0_amd64.deb
Size: 674752
SHA256: 1974bf59cc40d1da67e7fd097b41d78707c012c29d7934610b9ff3a020c14cc4
SHA1: fa6df5ac14a33de5411f06a4139ab86dff3b8536
MD5sum: 937578411e1eff15f5763eb219a1c7a9
Description: Manitou-Mail's user interface
Qt-based GUI that acts as a front-end to a Manitou-Mail database.
Description-md5: 1fcf376b591321a77a965062e2ea0b97

Categories: Development, User Interface Tags:

Upgrading attachments indexers

September 26th, 2012 Comments off

Version 1.3.0 is ready to be released with some major improvements in the full-text search and full-text indexer. A couple of packaging issues are still being worked on, but the code won’t change significantly from the 1.3.0 tags on the master branches at github, for both manitou-mail-mdx and manitou-mail-ui.

The list of changes from the previous version is currently visible in the NEWS file in the sources.

It should be noted that 1.3.0 will require some adjustments in how attachments in various formats are indexed with the help of user-supplied scripts.

Up to now, indexer plugins were used to index the contents of attached files in PDF or DOC, or even HTML formats.

Starting from 1.3.0, this is no longer desirable: they should be integrated with a new method called “words extractors” and declared in manitou-mdx main configuration file. Example:

index_words_extractors = application/pdf: /opt/scripts/pdf2text \
	application/msword: /opt/scripts/word2text

The user-supplied scripts should extract words from the contents in custom format passed to their standard input, and output these words encoded in utf-8 to the standard output.

For installs that didn’t index attachments with plugins, nothing changes. Upgrading to 1.3.0 will simply start indexing HTML contents, which is now done internally by default, so no manual action is required.

On the other hand, for installs that used indexer plugins, a preliminary step to upgrade to 1.3.0 would be to convert these to word extractor scripts. It’s nothing particularly difficult. As an example, here is a ready-to-use script that extracts words from MS-Word files with antiword.

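#!/bin/sh
# Words extractor for MS-Word attachments: document on stdin, words on stdout

# Save standard input into a temporary .doc file and hand it to antiword
t=$(tempfile --suffix=.doc) || exit 1
trap "rm -f -- '$t'" EXIT
cat >"$t"
antiword -i1 "$t" || exit 1

rm -f -- "$t"
trap - EXIT
exit 0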
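
A PDF counterpart could follow the same pattern. The sketch below assumes that the pdftotext command (shipped with poppler-utils or xpdf) is installed; it is not something Manitou-Mail provides, and the function name pdf2text merely mirrors the /opt/scripts/pdf2text path used in the configuration example above. It uses mktemp with an explicit template where the script above uses tempfile.

```shell
#!/bin/sh
# Sketch of a words extractor for PDF attachments.
# Assumption: pdftotext (poppler-utils or xpdf) is available in PATH.
# Reads the PDF from standard input, writes UTF-8 text to standard output.
pdf2text() {
  tmp=$(mktemp "${TMPDIR:-/tmp}/pdf2text.XXXXXX") || return 1
  # Save standard input to a file first, as the antiword script does
  cat >"$tmp"
  # "-" as the output file sends the extracted text to standard output
  pdftotext -q -enc UTF-8 "$tmp" -
  rc=$?
  rm -f -- "$tmp"
  return "$rc"
}

# When saved as a standalone script, end the file with a call to the function:
#   pdf2text
```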

This is a preliminary step because rebuilding the inverted word index will be recommended when upgrading to 1.3.0, and doing this involves reindexing attachments as well.
Up to version 1.2, that was not possible with indexer plugins. That’s one of the reasons why plugins get deprecated as a way to index attachments contents. In addition, the lack of integration with the words vectors cache was a performance drag, and 1.3 solves that as well with its words extractors method.

Categories: Development, New features Tags:

SQL functions for tags

February 29th, 2012 Comments off

Tags in Manitou-mail are hierarchical, for several reasons such as the ability to mimic folders. There are pros and cons of this choice, but from the point of view of SQL querying, tree-like structures are clearly more complicated than flat structures. Here are two functions in the wiki that could be of help to compare tags across hierarchies:

  • tag_path(tag_id) returns the full hierarchical path of a tag, with -> as the separator between branches.
  • tag_depth(tag_id) returns the depth of the tag inside its hierarchy, starting at 1

As an example of use, in the custom queries of the user interface, we could use this query:

select mail_id from mail_tags mt join tags t on (mt.tag=t.tag_id) where tag_path(t.tag_id) ilike 'ParentTag->%'

to retrieve any message tagged with any tag whose top-level ancestor is ‘ParentTag’, no matter how deep the tag is inside the hierarchy (child, grandchild, great-grandchild…).

Categories: Database, Development Tags:

Moving the source code to github

January 10th, 2012 Comments off

Starting with v1.2.1 currently under development, subversion is replaced by the more modern git source control tool.

It’s also the opportunity to split the source code into two distinct repositories for the user interface and mdx, since they can be worked on independently. A third repository should follow for the documentation.

The master branches for both programs are accessible at:
https://github.com/manitou-mail/manitou-mail-ui and https://github.com/manitou-mail/manitou-mail-mdx

The subversion repository at SourceForge.Net is still accessible, but from now on it will probably see commits only on releases.

The github service also has an issue tracker, a wiki and other things that might get used over time, we’ll see.

Categories: Development Tags:

Word search in SQL

April 9th, 2011 Comments off

The current version of manitou-mail uses C++ code inside the interface to deal with the inverted word index. There is also a Perl version in the Manitou::Words module (see sub search), but so far it wasn’t possible to issue a search directly from inside an SQL query, making it hard to combine the results with other criteria.
I’m glad to say that it’s now possible, by using a new wordsearch() function implemented in pl/pgsql.
This opens up the possibility of doing some interesting custom queries, such as for example:

SELECT mail_id,subject,msg_date
 FROM mail m
  JOIN (SELECT * FROM wordsearch(array['foo','bar']) AS id) s ON (m.mail_id=s.id)
 WHERE m.status&16=0 AND msg_date>now()-'1 year'::INTERVAL;

This query retrieves the messages that contain both ‘foo’ and ‘bar’, excluding those that are in the trashcan, and those that are more than one year old.
This should be pretty fast, too. This one runs for me in 0.4s on a database containing about 47000 messages.
Not only can this kind of query be launched by any custom program using the mail database, but it can also be saved inside the manitou-mail user interface as a “user query” and made accessible from the quick selection panel.

The code will be shipped with future versions of manitou, but in the meantime the function source code is available in the wiki.

Categories: Database, Development Tags:

Windows development environment, final part

September 13th, 2009 Comments off

In part 1, we installed the build tools. In part 2, we built Qt and the PostgreSQL libraries from source. In this part, let’s see how to use these to finally build the Manitou-Mail user interface.

The starting point is the set of sources as fetched from the SF.net subversion repository. The latest version can be checked out with for example, Tortoise SVN, as shown in this screenshot:

Source checkout with Tortoise

The difference between a source tarball and sources from the repository is mostly that the configure script and Makefile.in inside each directory are present in the tarball but not in the repository: these files are to be generated by autoconf and automake.

Here are the steps from the checkout of the source to getting the executable file:

  1. Make sure that Qt binaries are in the path:
    export PATH=/c/Qt/4.5.2/bin:$PATH

  2. Generate configure script and Makefile.in files:

    aclocal
    autoheader
    autoconf
    automake -a -c

  3. Configure:

    ./configure --with-pgsql-includes=/usr/local/pgsql/include --with-pgsql-libs=/usr/local/pgsql/lib

  4. Compile:

    make
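
Before running these steps, it can help to confirm that every tool they rely on is reachable from the MSYS shell. A small sketch follows; the function name missing_tools is made up for this example and is not part of the Manitou-Mail sources.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found in PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Possible use with the tools named in the steps above
# (qmake comes from the Qt bin directory added to PATH in step 1):
#   missing=$(missing_tools qmake aclocal autoheader autoconf automake make)
#   [ -z "$missing" ] || echo "not found in PATH: $missing" >&2
```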

The compilation produces the final executable, manitou.exe, in the src/ directory. While it’s possible to run ‘make install’ at this point as we would do in a Unix environment, personally I don’t use it. This is because I just want to package the executable along with libraries inside an installer that is targeted at Windows systems that generally don’t have MSYS with its Unix-like filesystem layout.

To package the Manitou-Mail interface for Windows along with the required Qt, PostgreSQL and MingW libraries into a self-contained installer program, I currently use NSIS. The current NSIS script is available on the wiki.

Categories: Development Tags:

Windows development environment, part 2

September 6th, 2009 Comments off

In part 1, we installed the tools that are required to build from source.
This post shows how to use them to compile Qt and the PostgreSQL libraries we need.

Qt-4.5.2

Qt is huge and building it from scratch takes a long time, typically several hours on a current desktop machine. Using pre-compiled binaries as available on the Qt site can be preferred, although it is only since quite recently (4.5.2?) that the pre-compiled version can be used to build manitou-mail (As I mentioned in this message to qt-interest, there was a problem with the pre-compiled qmake that precluded its use with autotools scripts launched in MSYS. But this appears to have been fixed).
If however a custom compilation is chosen, here are the steps to follow:

  1. Download Qt sources into c:\qt\4.5.2 or similar
  2. Open CMD.EXE and make sure g++ is in the PATH. Add it manually if necessary.
  3. Inside c:\qt\4.5.2, run configure.exe -platform win32-g++.
  4. Run mingw32-make as told at the end of configure, and don’t expect the result before several hours :)

Beware: contrary to the Unix way of installing from source, make install is neither necessary nor possible with Qt/Windows: the build must happen directly inside the destination directory.

PostgreSQL’s libpq

  1. Grab postgresql source tar archive and untar it. I’ve used version 8.4.0, the latest available at the time of this post.
  2. If /mingw/include/zlib.h doesn’t exist, grab and unpack the zlib dev archive in /mingw.
  3. cd into postgresql source directory and run ./configure
  4. cd into src/interfaces/libpq
  5. run make && make install

Once this is done, the header files are to be found in /usr/local/pgsql/include and the libraries (.a and .dll) in /usr/local/pgsql/lib.

Conclusion of part 2

Now that Qt and PostgreSQL are built, in the next and final part of this series we’ll compile the manitou-mail application itself for Windows, with the toolchain that we installed in part 1.

Categories: Development Tags:

Windows development environment, part 1

August 21st, 2009 Comments off

This is the first post (out of 3) about setting up a development environment to build the user interface of Manitou-Mail on Windows. In this part, we’re installing the compiler (GCC packaged by MinGW), the MSYS environment, and the autotools (autoconf and automake).
In the second part, we’ll build Qt itself and the PostgreSQL client libraries from source. In the third part, we’ll build manitou.exe from the sources off the subversion repository.

MinGW (compiler) installation

The current version, 5.1.4, is available here: MingW-5.1.4 installer on sourceforge.net.
Let’s run the installer and choose “Download and install”:
MinGW install start

After accepting the license, we get to choose the destination directory. Let’s install into c:\mingw, the default choice.

MinGW destination directory

We want the “current” version:
MinGW version's choice

The next screen is about choosing the components.
We need to check the following ones:

  • MinGW base tools
  • g++ compiler
  • MinGW Make

MinGW components

After that, a rapid succession of screens appears, showing the packages that are being downloaded and installed.
MinGW packages getting installed

Then the install finishes and we end up with a c:\mingw directory containing about 60MB of programs. It turns out that in my case, I’m also finding .tar.gz archives of the compiler packages on my desktop (from which I launched the installer), so let’s delete them.
Remaining files

MSYS (shell environment) installation

MSYS stands for minimal system. It’s a light unix-like environment with a shell and enough capabilities to support the autotools.

Like MingW, MSYS has a base system and packages, except that it has a lot more packages. We need only a half-dozen of them.
The current version of the MSYS base is available at:

MSYS-1.0.11 installer on sourceforge.net

Let’s select the default directory: c:\msys\1.0
MSys destination directory
At the end of the installation, a post-installer script running in a CMD window “will try to normalize” with MingW, so we just accept what it suggests and type the path of our mingw directory as told (c:/mingw)
MSYS post-install

Once the installation is done, we have this icon on the desktop to launch an MSYS shell :
MSYS shortcut

Autotools installation

MSYS has an MSYS Developer Toolkit in the form of a .EXE that installs automake, autoconf and the perl interpreter they depend on, but unfortunately their versions are too old for manitou’s configure script. It installs autoconf version 2.56 and automake 1.7.1, which date back to 2002, while we need at least autoconf version 2.60. Perl is version 5.6.1.

Still, let’s install that package, if only to have Perl, and upgrade automake and autoconf separately.

We install MSYS DTK into the recommended location
MSYS Developer Toolkit install

Now, following the advice from the MSYS wiki, we get the source archives for autoconf and automake directly off the main GNU FTP server (so it’s the baseline, not modified versions for MSYS).
The current sources are autoconf-2.64.tar.bz2 and automake-1.11.tar.bz2.

Let’s download these archives into c:/tmp/ and unpack them (all the subsequent commands are run inside the MSYS shell)

$ cd c:/tmp
$ tar xjf autoconf-2.64.tar.bz2
$ tar xjf automake-1.11.tar.bz2

The MSYS wiki says it’s important to compile into separate build directories so we do as told:

$ mkdir build-autoconf
$ cd build-autoconf
$ ../autoconf-2.64/configure --prefix=/mingw 
$ make
$ make install

Same thing for automake:

$ mkdir build-automake
$ cd build-automake
$ ../automake-1.11/configure --prefix=/mingw 
$ make
$ make install

Now that they’re both installed, autoconf --version displays:

$ autoconf --version
autoconf (GNU Autoconf) 2.64
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David J. MacKenzie and Akim Demaille.

And automake --version:

$ automake --version
automake (GNU automake) 1.11
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Tom Tromey 
       and Alexandre Duret-Lutz .
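
Since the requirement is a minimum version (autoconf 2.60 or later), the check can also be scripted instead of read off the screen. Here is a minimal sketch in plain sh; the function name version_at_least is made up for this example, and it only handles dotted numeric versions such as 2.64 or 1.7.1.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Handles dotted numeric versions only (2.64, 1.7.1, ...).
version_at_least() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    # Take the leading component of each version; a missing one counts as 0
    h=${have%%.*}
    w=${want%%.*}
    [ -n "$h" ] || h=0
    [ -n "$w" ] || w=0
    [ "$h" -gt "$w" ] && return 0
    [ "$h" -lt "$w" ] && return 1
    # Drop the component that was just compared
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
  done
  return 0
}

# Possible use: take the last word of the first line of "autoconf --version"
#   v=$(autoconf --version | sed -n '1s/.* //p')
#   version_at_least "$v" 2.60 || echo "autoconf $v is too old" >&2
```

Components are compared as numbers, not as strings, which is why 1.11 counts as newer than 1.7.1.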

Conclusion of part 1

With MinGW and MSYS, we have an environment that is good enough to build programs on Windows the same way that they are built on Unix (basically: configure && make && make install).
With autotools we can regenerate the configure script and the Makefile.in files, which is something we need to add new source files to the project or modify the configure.in script.
Also, we’re ready to compile Qt and a PostgreSQL client, which will be done in the next part.

Categories: Development Tags:

Indexing HTML parts

August 15th, 2009 Comments off

While HTML integration is improving in Manitou-Mail, the current version (0.9.12) does not index the contents of HTML parts. This is generally not a problem because messages tend to carry a text version inside a multipart/alternative MIME construct, and that version gets indexed so that the message can still be retrieved by the words it contains. But still, some people send HTML-only messages, in which case we want to automatically extract the text from the HTML and pass it to the indexer.

It’s relatively easy to write a manitou-mdx Perl plugin that does just that, by using a CPAN module to do the HTML to text conversion: HTML::FormatText

Apart from the usual init and process functions that are described in the mdx plugins reference, we need to provide two functions: one that recursively descends the MIME tree to find the HTML parts, and another that converts them to text and passes the result to the indexer.

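use HTML::TreeBuilder;
use HTML::FormatText;

# Convert the HTML read from $fh to plain text and pass it to the indexer
sub index_contents {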
  my ($fh, $ctxt)=@_;
  my $html;
  my $text;
  {
    local $/;
    $html = $fh->getline();
  }
 
  if (defined $html) {
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);
    my $formatter = HTML::FormatText->new(leftmargin=>0, rightmargin=>78);
    $text = $formatter->format($tree);
    $tree->delete;    # release the parse tree explicitly
  }
  if (defined $text) {
    Manitou::Words::index_words($ctxt->{'dbh'}, $ctxt->{'mail_id'}, \$text);
  }
}
 
sub process_parts {
  my ($obj,$ctxt) = @_;
  if ($obj->is_multipart) {
    foreach my $subobj ($obj->parts) {
      process_parts($subobj, $ctxt);    # recurse
    }
  }
  else {
    my $type=$obj->effective_type;
    if ($type eq "text/html") {
      my $io = $obj->bodyhandle->open("r");
      index_contents($io, $ctxt);
      $io->close;
    }
  }
}

The full source code and download link are available on the wiki.

Categories: Development Tags: