Wednesday, February 4, 2009

Full-text & faceted search over In-Memory-Data-Grid

Modern in-memory data grid (IMDG) solutions provide facilities of varying sophistication for running queries over the whole stored data set. Coherence provides query facilities (one-time full scans and continuous queries, with cost-based optimization). GigaSpaces has a JDBC query interface that can use hash and B-tree indexes. These solutions work quite well for a large share of problem areas. However, under heavy load and with complex multi-criteria queries, they can quickly become a bottleneck.

And there is a class of workloads that puts exactly this kind of heavy query load on IMDGs. Item catalogs (in retail companies, for example) are hit by a diverse stream of multi-criteria queries. A typical query looks like this:

Give me cell phones with MP3 support and Java, in red.

Fortunately, the Compass framework makes such queries efficient. It builds inverted indexes with Apache Lucene and stores them in the grid. This capability comes from the very modular design of Lucene: all index I/O operations are hidden behind the Directory abstraction.
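To show why an inverted index suits multi-criteria catalog queries, here is a minimal sketch in plain Java. It is not Lucene or Compass code: the class name, the "field:value" term format and the sample catalog are all made up for illustration. Each term maps to the set of item ids that contain it, and an AND query is just the intersection of those sets.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, for illustration only; not the Lucene API).
// Each "field:value" term maps to the ids of the items that carry it.
class FacetedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index one item under each of its terms.
    void add(int itemId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(itemId);
        }
    }

    // AND query: intersect the posting sets of all requested terms.
    Set<Integer> search(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids); // copy so the index itself is not modified
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // A small made-up catalog of four items.
    static FacetedIndexSketch sampleCatalog() {
        FacetedIndexSketch index = new FacetedIndexSketch();
        index.add(1, "category:cellphone", "feature:mp3", "feature:java", "color:red");
        index.add(2, "category:cellphone", "feature:mp3", "color:black");
        index.add(3, "category:camera", "color:red");
        index.add(4, "category:cellphone", "feature:mp3", "feature:java", "color:red");
        return index;
    }

    public static void main(String[] args) {
        FacetedIndexSketch index = sampleCatalog();
        // "Cell phones with MP3 support and Java, in red"
        System.out.println(index.search(
                "category:cellphone", "feature:mp3", "feature:java", "color:red"));
        // prints [1, 4]
    }
}
```

The cost of a query depends on the size of the posting sets involved, not on the size of the catalog, which is why this structure copes with a stream of multi-criteria queries better than a full scan.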

For now, Compass provides implementations for Coherence, GigaSpaces and Terracotta, introducing an unprecedented ability to build a vertical search solution on top of an in-memory data grid.

Moreover, Compass has a sophisticated object-to-document mapping system that makes stored objects searchable just by adding Java annotations or XML mapping files. Mappings can also be built at runtime.
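The general idea of annotation-driven object-to-document mapping can be sketched in plain Java as follows. This is only an illustration of the technique, not Compass code: the `IndexedField` annotation, the `Phone` class and the `toDocument` helper are hypothetical stand-ins, and Compass ships its own annotations and a much richer mapping model.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mapper: turns annotated fields of any object into a flat document.
class MappingSketch {
    static Map<String, String> toDocument(Object obj) {
        Map<String, String> doc = new TreeMap<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            IndexedField mapping = field.getAnnotation(IndexedField.class);
            if (mapping == null) {
                continue; // fields without the annotation are not indexed
            }
            try {
                field.setAccessible(true); // allow reading private fields
                doc.put(mapping.name(), String.valueOf(field.get(obj)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new Phone("X100", "red", 7)));
        // prints {color=red, model=X100}
    }
}

// Hypothetical marker annotation (not part of Compass): index this field under the given name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexedField {
    String name();
}

// Made-up domain object; only the annotated fields end up in the document.
class Phone {
    @IndexedField(name = "model")
    private final String model;

    @IndexedField(name = "color")
    private final String color;

    private final int stockCount; // not annotated, so not searchable

    Phone(String model, String color, int stockCount) {
        this.model = model;
        this.color = color;
        this.stockCount = stockCount;
    }
}
```

The domain class stays a plain object. The mapping lives entirely in metadata, which is what lets the same information come from an XML file or be assembled at runtime instead.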

However, in spite of its great code, Compass has scarce documentation. It may take significant time digging through the code and docs to get what you want. But the results will exceed your expectations: search engines on top of data grids easily outperform old-generation search technologies.
