21 Sep 2009

CouchDB-Lucene, CouchDBX, and CouchRest

I’ve been playing a lot with CouchDB lately, and if you’re on OS X there’s really no easier way to do so than CouchDBX, a self-contained application that bundles CouchDB, Erlang, and all of the dependencies you need to run Couch.

However, I recently wanted to try couchdb-lucene for full-text indexing of a CouchDB application I was working on. It wasn’t immediately obvious to me how to make that happen, so I thought I’d share how I got it working for those who might want to do the same. For the record, I am using the following versions of things:

Unpacking CouchDB Lucene

You will need to place CouchDB Lucene in a convenient location, but first you’ll need to unpack the gzip file. Note that, at least for me, the default Mac unarchiver did not produce a usable .jar file. Instead, follow the README instructions and extract it using the following command:

unpack200 couchdb-lucene-0.4-jar-with-dependencies.jar.gz couchdb-lucene-0.4-jar-with-dependencies.jar

It doesn’t particularly matter where you put it (I put mine in /usr/local/etc) so long as you remember the location.

Editing the CouchDBX .ini File

Next you will need to edit the CouchDBX local.ini file to register the external process that handles full-text indexing. To access the file, right-click CouchDBX and select “Show Package Contents”, then navigate to Contents/Resources/couchdbx-core/couchdb/etc/couchdb/local.ini and open it in your favorite text editor. Add the following lines to this file, changing the path to match where you stored your Lucene .jar file.

[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/java -server -Xmx1g -jar /path/to/couchdb-lucene-0.4-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -server -Xmx1g -jar /path/to/couchdb-lucene-0.4-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
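Since the same jar path shows up twice in that config, here is a rough Ruby sketch that renders the stanzas from a single path so the two entries can't drift apart. The helper name lucene_ini and its java: and heap: keyword arguments are my own inventions for illustration; the flags and section names are simply the ones shown above.

```ruby
# Hypothetical helper (not part of CouchDB or couchdb-lucene): renders the
# local.ini additions from one jar path so both commands stay in sync.
def lucene_ini(jar_path, java: "/usr/bin/java", heap: "1g")
  cmd = "#{java} -server -Xmx#{heap} -jar #{jar_path}"
  <<~INI
    [couchdb]
    os_process_timeout=60000 ; increase the timeout from 5 seconds.

    [external]
    fti=#{cmd} -search

    [update_notification]
    indexer=#{cmd} -index

    [httpd_db_handlers]
    _fti = {couch_httpd_external, handle_external_req, <<"fti">>}
  INI
end

# Example path only; substitute wherever you put the jar.
puts lucene_ini("/usr/local/etc/couchdb-lucene-0.4-jar-with-dependencies.jar")
```

You would still paste the printed output into local.ini by hand; the sketch only generates the text.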

All right, that’s all the setup you need to be running CouchDB Lucene! Now you can start up CouchDBX and set up your first search indexes.

Creating a Full-Text Index View

To create a full-text index view, you simply need to add a “fulltext” field to one of your design documents. The URL structure for accessing CouchDB Lucene searches is as follows:

http://localhost:5984/database_name/_fti/design_doc_name/index_name?q=your+query+here

Here database_name is any database you have on your system, design_doc_name is any design document in that database, and index_name is a fulltext index you defined in that design document. For instance, if I had a CouchRest-generated design doc for a bunch of music, I might have a design document that looks something like this:

{
  "_id":"_design/Song",
  "_rev":"af12a4b12af1b24afbf244f1",
  "fulltext":{
    "my_search":{
      "index":"function(doc) { if (doc['couchrest-type'] != 'Song') return null; var ret = new Document(); ret.add(doc.title); ret.add(doc.artist); return ret; }"
    }
  }
}
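Writing the index function as a one-line JSON string gets painful quickly, so here is a sketch of building the same "fulltext" structure from Ruby using only the standard library. The helper name fulltext_design_doc and the INDEX_FUNCTION constant are made up for this example, and the index function skips any document whose couchrest-type is not 'Song'.

```ruby
require "json"

# The index function written readably, then collapsed to a single line
# so it can be stored as a JSON string value.
INDEX_FUNCTION = <<~JS.gsub(/\s+/, " ").strip
  function(doc) {
    if (doc['couchrest-type'] != 'Song') return null;
    var ret = new Document();
    ret.add(doc.title);
    ret.add(doc.artist);
    return ret;
  }
JS

# Hypothetical helper: builds a design document hash with one fulltext index.
# No "_rev" is set here; an existing design doc would need its current one.
def fulltext_design_doc(type, index_name, index_function)
  {
    "_id" => "_design/#{type}",
    "fulltext" => { index_name => { "index" => index_function } }
  }
end

design_doc = fulltext_design_doc("Song", "my_search", INDEX_FUNCTION)
puts JSON.pretty_generate(design_doc)
```

If the design document already exists (as it would for a CouchRest model), you would more likely fetch it and add the "fulltext" key to it rather than create a new one, so that its existing views and revision are preserved.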

I would then be able to access the search results with this URL:

http://localhost:5984/myapp/_fti/Song/my_search?q=Ben+Folds
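If you want to hit that URL from a script without CouchRest, something like the following sketch should do, using only Ruby's standard library. The method names fti_url and fti_search are hypothetical, and I am assuming the server answers with JSON, as CouchDB normally does.

```ruby
require "uri"
require "net/http"
require "json"

# Hypothetical helper: builds a search URL following the
# /database/_fti/design_doc/index?q=... structure described above.
def fti_url(base, database, design, index, query, options = {})
  params = URI.encode_www_form({ "q" => query }.merge(options))
  "#{base}/#{database}/_fti/#{design}/#{index}?#{params}"
end

# Hypothetical helper: performs the request and parses the JSON body.
# Needs a running CouchDB with couchdb-lucene configured, so it is
# defined here but not called.
def fti_search(*args)
  JSON.parse(Net::HTTP.get(URI(fti_url(*args))))
end

puts fti_url("http://localhost:5984", "myapp", "Song", "my_search", "Ben Folds")
# prints http://localhost:5984/myapp/_fti/Song/my_search?q=Ben+Folds
```

Letting URI.encode_www_form build the query string means spaces and other special characters in the query are escaped for you.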

Awesome! We now have full-text indexing up and running on CouchDBX!

Bonus: CouchRest Lucene

I use CouchRest extensively in my Ruby CouchDB projects, and I wanted to be able to integrate the new Lucene searches easily. I found this post that added a bit of functionality, but I wanted to be able to integrate with ExtendedDocument (and the snippet was also slightly outdated), so I’ve updated it. Just add this somewhere after you require CouchRest:

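class CouchRest::Database
  # Query a couchdb-lucene index through the _fti handler.
  def search(design, index, query, options={})
    CouchRest.get CouchRest.paramify_url("#{@root}/_fti/#{design}/#{index}", options.merge(:q => query))
  end
end

class CouchRest::ExtendedDocument
  # Search an index on this model's design doc and cast each row's doc to the model.
  def self.search(index, query, options={})
    # merge rather than assign, so the caller's options hash is not modified
    options = options.merge(:include_docs => true)
    ret = self.database.search(self.to_s, index, query, options)
    ret['rows'].collect!{|r| self.new(r['doc'])}
    ret
  end
end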

This snippet lets you perform searches on a database directly or by calling a search method on an extended document. Let’s look at a couple of examples to see how it works:

@db = CouchRest.database!("http://localhost:5984/myapp")

@db.search('Song','my_search', 'Ben Folds', :include_docs => true)

# The following is equivalent to the above, but will automatically
# include the docs and cast the result rows into ExtendedDocuments
Song.search('my_search', 'Ben Folds')
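To see what the row casting in ExtendedDocument.search amounts to without a running server, here is a tiny stand-alone sketch. Everything in it is a stand-in: FakeSong is a plain Struct rather than a real ExtendedDocument, cast_rows is a hypothetical helper, and the response hash is hand-written sample data that assumes only the "rows" and "doc" keys the snippet above relies on.

```ruby
# Stand-in for a CouchRest::ExtendedDocument subclass (hypothetical).
FakeSong = Struct.new(:title, :artist)

# Hypothetical helper mirroring the collect! step: replaces each result
# row with a model object built from that row's included document.
def cast_rows(response, klass)
  rows = response["rows"].map do |row|
    klass.new(row["doc"]["title"], row["doc"]["artist"])
  end
  response.merge("rows" => rows)
end

# Hand-written sample shaped like a search result fetched with include_docs.
response = {
  "rows" => [
    { "doc" => { "title" => "Landed", "artist" => "Ben Folds" } }
  ]
}

result = cast_rows(response, FakeSong)
puts result["rows"].first.title
# prints Landed
```

The rest of the response hash is passed through untouched, so any extra keys a real result carries remain available next to the cast rows.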

Now that we have this set up, we’re ready to go forth and build full-text search into our CouchDB apps. I hope this helps those who are already somewhat familiar with CouchDB and want to push it a little further in their applications.
