This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
inverted_word_index [2011/04/09 22:49] daniel |
inverted_word_index [2016/05/16 14:27] (current) daniel remove useless j++ in loop |
||
---|---|---|---|
Line 2: | Line 2: | ||
See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables: | See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables: | ||
- | * [[http://www.manitou-mail.org/doc/table.words.html|words]] | + | * [[http://www.manitou-mail.org/doc/mdx/table.words.html|words]] |
- | * [[http://www.manitou-mail.org/doc/table.non-indexable-words.html|non_indexable_words]] | + | * [[http://www.manitou-mail.org/doc/mdx/table.non-indexable-words.html|non_indexable_words]] |
- | * [[http://www.manitou-mail.org/doc/table.inverted-word-index.html|inverted_word_index]] | + | * [[http://www.manitou-mail.org/doc/mdx/table.inverted-word-index.html|inverted_word_index]] |
===== Function ===== | ===== Function ===== | ||
The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text. | The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text. | ||
- | The function is already in the Manitou-Mail database for versions 1.0.3 and above | + | The function is already in the Manitou-Mail database for versions 1.1.0 and above |
- | <code sql> | + | <file sql wordsearch.sql> |
-- Input: an array of words to search within the entire index | -- Input: an array of words to search within the entire index | ||
-- Output: the mail_id of the matching messages, as a set. | -- Output: the mail_id of the matching messages, as a set. | ||
Line 64: | Line 64: | ||
RETURN NEXT var_part_no*16384+(i*8)+j+1; -- hit | RETURN NEXT var_part_no*16384+(i*8)+j+1; -- hit | ||
END IF; | END IF; | ||
- | j:=j+1; | ||
END LOOP; | END LOOP; | ||
END LOOP; | END LOOP; | ||
Line 73: | Line 72: | ||
$$ LANGUAGE plpgsql; | $$ LANGUAGE plpgsql; | ||
- | </code> | + | </file> |
===== Notes ===== | ===== Notes ===== | ||
- | The formula <code>var_part_no*16384+(i*8)+j+1</code> translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly. | + | The following formula: <code>var_part_no*16384+(i*8)+j+1</code> that translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly. |
===== Usage ===== | ===== Usage ===== |