Otherwise known as a PITA (and I’m not talking about warm, fluffy flatbread).
Search is one of the most critical components of a modern application or platform, if not the most critical. Especially in a “big data” application, being able to collect, sort, and search that data is essential. There is simply too much data for people to process and manage unaided; powerful search capabilities make it humanly possible to turn that data into something meaningful and useful.
In my 10 years in the software industry, the “information retrieval” software library I have run into most often is Lucene. At my previous companies, since Lucene is free (as in beer) and also free (as in liberty), it was offered out of the box. The implementations were straightforward and allowed for at least basic search capabilities.
Beyond plain Lucene, I’ve touched both Solr and Elasticsearch (Episerver Find). I’ve found that Solr and Elasticsearch (both built on the Lucene libraries) are ultimately PITAs in their own ways, but when deployed, configured, and implemented correctly (or as correctly as the business need allows), they are quite powerful. Because fast, usable search is difficult to achieve, it is no surprise that both are popular choices for many projects.
Now that I’m working with Solr more, I can say it has a steep learning curve and, for older versions especially, stability issues. Starting with Solr 4.x (which Cloudera uses right now), Solr can run in distributed environments through SolrCloud. Lucidworks, who are true Solr experts, provide a solid overview of what SolrCloud is.
I still have much to learn on my part. This is only the very beginning.