Differences

This shows you the differences between two versions of the page.

--- inverted_word_index [2011/04/09 20:49] – daniel
+++ inverted_word_index [2016/05/16 12:27] (current) – remove useless j++ in loop daniel
@@ Line 2: / Line 2: @@
 See also [[http://www.manitou-mail.org/db/inverted-wordindex.html|Implementation of the inverted word index]] for how the word index is stored in the database, and the definition of tables:
-  * [[http://www.manitou-mail.org/doc/table.words.html|words]]
+  * [[http://www.manitou-mail.org/doc/mdx/table.words.html|words]]
-  * [[http://www.manitou-mail.org/doc/table.non-indexable-words.html|non_indexable_words]]
+  * [[http://www.manitou-mail.org/doc/mdx/table.non-indexable-words.html|non_indexable_words]]
-  * [[http://www.manitou-mail.org/doc/table.inverted-word-index.html|inverted_word_index]]
+  * [[http://www.manitou-mail.org/doc/mdx/table.inverted-word-index.html|inverted_word_index]]
 ===== Function =====
 The word index can be queried in SQL. Here is the code of a PL/pgSQL function that retrieves the IDs of messages containing a set of words. The words are given as an array of text.
-The function is already in the Manitou-Mail database for versions 1.0.3 and above
+The function is already in the Manitou-Mail database for versions 1.1.0 and above
-<code sql>
+<file sql wordsearch.sql>
 -- Input: an array of words to search within the entire index
 -- Output: the mail_id of the matching messages, as a set.
@@ Line 64: / Line 64: @@
 	    RETURN NEXT var_part_no*16384+(i*8)+j+1; -- hit
 	  END IF;
-	  j:=j+1;
 	END LOOP;
       END LOOP;
@@ Line 73: / Line 72: @@
 $$ LANGUAGE plpgsql;
-</code>
+</file>
 ===== Notes =====
-The formula <code>var_part_no*16384+(i*8)+j+1</code> translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly.
+The following formula: <code>var_part_no*16384+(i*8)+j+1</code> that translates a bit in the vector to a mail_id relies on the fact that the index is split into parts indexing each one at most 16384 messages. This size of parts could actually be increased for larger databases (think millions of messages), and in this case, this code should be updated accordingly.
 ===== Usage =====