User Tools

Site Tools


inverted_word_index

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
inverted_word_index [2011/04/09 20:50] danielinverted_word_index [2016/05/16 12:27] (current) – remove useless j++ in loop daniel
Line 2: Line 2:
  
 See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables: See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables:
-  * [[http://www.manitou-mail.org/doc/table.words.html|words]] +  * [[http://www.manitou-mail.org/doc/mdx/table.words.html|words]] 
-  * [[http://www.manitou-mail.org/doc/table.non-indexable-words.html|non_indexable_words]] +  * [[http://www.manitou-mail.org/doc/mdx/table.non-indexable-words.html|non_indexable_words]] 
-  * [[http://www.manitou-mail.org/doc/table.inverted-word-index.html|inverted_word_index]]+  * [[http://www.manitou-mail.org/doc/mdx/table.inverted-word-index.html|inverted_word_index]]
  
 ===== Function ===== ===== Function =====
  
 The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text. The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text.
-The function is already in the Manitou-Mail database for versions 1.0.and above +The function is already in the Manitou-Mail database for versions 1.1.and above 
-<code sql>+<file sql wordsearch.sql>
 -- Input: an array of words to search within the entire index -- Input: an array of words to search within the entire index
 -- Output: the mail_id of the matching messages, as a set. -- Output: the mail_id of the matching messages, as a set.
Line 64: Line 64:
      RETURN NEXT var_part_no*16384+(i*8)+j+1; -- hit      RETURN NEXT var_part_no*16384+(i*8)+j+1; -- hit
    END IF;    END IF;
-   j:=j+1; 
  END LOOP;  END LOOP;
       END LOOP;       END LOOP;
Line 73: Line 72:
 $$ LANGUAGE plpgsql; $$ LANGUAGE plpgsql;
  
-</code>+</file>
 ===== Notes ===== ===== Notes =====
 The following formula: <code>var_part_no*16384+(i*8)+j+1</code> that translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly. The following formula: <code>var_part_no*16384+(i*8)+j+1</code> that translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly.
inverted_word_index.txt · Last modified: 2016/05/16 12:27 by daniel