User Tools

Site Tools


inverted_word_index

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
Last revisionBoth sides next revision
inverted_word_index [2011/04/09 20:31] – created danielinverted_word_index [2016/05/16 12:25] – Fix doc links daniel
Line 2: Line 2:
  
 See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables: See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables:
-  * [[http://www.manitou-mail.org/doc/table.words.html|words]] +  * [[http://www.manitou-mail.org/doc/mdx/table.words.html|words]] 
-  * [[http://www.manitou-mail.org/doc/table.non-indexable-words.html|non_indexable_words]] +  * [[http://www.manitou-mail.org/doc/mdx/table.non-indexable-words.html|non_indexable_words]] 
-  * [[http://www.manitou-mail.org/doc/table.inverted-word-index.html|inverted_word_index]]+  * [[http://www.manitou-mail.org/doc/mdx/table.inverted-word-index.html|inverted_word_index]]
  
 ===== Function ===== ===== Function =====
  
 The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text. The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text.
-<code sql> +The function is already in the Manitou-Mail database for versions 1.1.0 and above 
--- Input: an array of words to search in the entire index+<file sql wordsearch.sql> 
 +-- Input: an array of words to search within the entire index
 -- Output: the mail_id of the matching messages, as a set. -- Output: the mail_id of the matching messages, as a set.
 CREATE OR REPLACE FUNCTION wordsearch(in_words text[]) RETURNS SETOF integer CREATE OR REPLACE FUNCTION wordsearch(in_words text[]) RETURNS SETOF integer
Line 72: Line 73:
 $$ LANGUAGE plpgsql; $$ LANGUAGE plpgsql;
  
-</code>+</file> 
 +===== Notes ===== 
 +The following formula: <code>var_part_no*16384+(i*8)+j+1</code> that translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly.
  
 ===== Usage ===== ===== Usage =====
-The simplest form of usage is to search for one word with no other criteria:+The simplest form of usage would be to search for one word with no other criteria:
 <code sql> <code sql>
 select wordsearch(array['foobar']); select wordsearch(array['foobar']);
Line 81: Line 84:
  
  
-example1: get the messages that contain 'foo' and 'bar', excluding those in the trashcan+But the result of wordsearch() can be joined with other tables and filtered with criteria. For exemple, the following SQL retrieves the messages that contain 'foo' and 'bar', excluding those in the trashcan
 <code sql> <code sql>
 select m.mail_id select m.mail_id
inverted_word_index.txt · Last modified: 2016/05/16 12:27 by daniel