How do I normalise a Solr/Lucene score?

To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages: People frequently want to compute a “Percentage” from Lucene scores to determine what is a “100% perfect” match vs a “50%” match. This is also sometimes called a “normalized score”. Don’t do this. Seriously. Stop trying to think about your problem this way, it’s not going to end well. That page does … Read more

Solr – Query over all fields best practice

The best solution is to build a field that collects the data of all fields, like this: <field name="collector" type="text_general" indexed="true" stored="false" multiValued="true" /> The only thing you have to do now is copy the contents of all fields into that field: <copyField source="notes" dest="collector"/> <copyField source="missionFocus" dest="collector"/> <copyField source="name" dest="collector"/> …. Be aware that … Read more
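Once such a catch-all field exists, a query only has to target that one field. The following is a minimal sketch, not part of the original answer, that builds a request URL using Solr's standard `q` and `df` (default field) parameters. The base URL, the core name `mycore`, and the helper name `buildSelectUrl` are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CollectorQuery {
    // Builds a /select URL whose default search field is the catch-all
    // "collector" field. baseUrl (host, port, core name) is a hypothetical
    // placeholder; adjust it to your own installation.
    static String buildSelectUrl(String baseUrl, String userInput) {
        // URL-encode the raw query text; this does not escape Lucene
        // query syntax characters, which a real application may need to do.
        String q = URLEncoder.encode(userInput, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&df=collector";
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl("http://localhost:8983/solr/mycore", "mars rover"));
    }
}
```

Using `df` keeps the field name out of the query string itself, so unqualified terms typed by a user are matched against `collector`.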

Solr documents with child elements?

As of Solr 4.7 and 4.8, Solr supports nested documents: { "id": "chapter1", "title" : "Indexing Child Documents in JSON", "content_type": "chapter", "_childDocuments_": [ { "id": "1-1", "content_type": "page", "text": "ho hum… this is page 1 of chapter 1" }, { "id": "1-2", "content_type": "page", "text": "more text… this is page 2 of chapter 1" … Read more

SolrJ API for partial document update

As it turns out, the code snippet shown above in the question actually works. I don’t know what was wrong the first time I tried it; perhaps I simply forgot to commit or my schema was misconfigured. In any case, this question is very localized. However, since the API with the hash map is so … Read more
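The question's snippet is not reproduced in this excerpt. As a rough illustration of the hash-map idea, the sketch below builds the structure a Solr atomic update uses: the field's value is a map from a modifier (here `set`) to the new value. It uses only plain Java collections rather than SolrJ, and the class name, helper names, document id, and field name are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicUpdateSketch {
    // Builds a partial-update document: the field value is itself a map
    // from the modifier "set" to the new value. Names are illustrative.
    static Map<String, Object> partialUpdate(String id, String field, Object newValue) {
        Map<String, Object> modifier = new LinkedHashMap<>();
        modifier.put("set", newValue);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(field, modifier);
        return doc;
    }

    // Minimal JSON rendering for nested maps, numbers, booleans and strings;
    // enough for this sketch, not a general-purpose serializer.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(",");
                sb.append(toJson(String.valueOf(e.getKey())))
                  .append(":")
                  .append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        String s = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        // Prints the JSON body one could send to Solr's update handler.
        System.out.println("[" + toJson(partialUpdate("doc1", "price", 99)) + "]");
    }
}
```

With SolrJ, the same inner map would be passed as the field value of a `SolrInputDocument`; as the answer notes, the change only becomes visible after a commit.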

Setup sunspot solr with rails in production environment

The Sunspot gem includes the sunspot-solr binary. The simplest setup would be just to run sunspot-solr start. Depending on how your application is deployed, you might also include a task in your Capistrano deploy that uses Sunspot’s provided rake task to start a Solr server. Namely, rake sunspot:solr:start RAILS_ENV=production. Getting more in-depth from that could … Read more

Search Engine – Lucene or Solr

Lucene: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search. Solr: Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, … Read more

Can a raw Lucene index be loaded by Solr?

Success! With Pascal’s suggestion of changes to schema.xml I got it working in no time. Thanks! Here are my complete steps for anyone interested: downloaded Solr and copied dist/apache-solr-1.4.0.war to tomcat/webapps; copied example/solr/conf to /usr/local/solr/; copied pre-existing Lucene index files to /usr/local/solr/data/index; set solr.home to /usr/local/solr; in solrconfig.xml, changed dataDir to /usr/local/solr/data (Solr looks for … Read more

Solr Custom Similarity

I figured it out on my own. I stored my own implementation of DefaultSimilarity under the /dist/ folder in Solr. Then I added <lib dir="../../../dist/org/apache/lucene/search/similarities/" regex=".*\.jar"/> to my solrconfig.xml, and everything works fine. package org.apache.lucene.search.similarities; import org.apache.lucene.index.FieldInvertState; import org.apache.lucene.search.similarities.DefaultSimilarity; public class MyNewSimilarityClass extends DefaultSimilarity { @Override public float coord(int overlap, int maxOverlap) { return 1.0f; … Read more