lucene - w3toppers.com

Is it possible to iterate through documents stored in Lucene Index?

IndexReader reader = // create IndexReader
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i))
        continue;

    Document doc = reader.document(i);
    String docId = doc.get("docId");

    // do something with docId here…
}

Lucene query fails with mixed MUST/MUST_NOT

Unlike a SQL database, Lucene doesn’t start with a full view of everything. Lucene starts with no documents matched, and finds things based on the clauses searched on. This is why -content:xyz on its own doesn’t really work: it knows not to bring in content:xyz, but hasn’t been given any documents to match. The same …
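The "starts with no documents matched" behaviour described above can be illustrated with a toy model in plain Java. This is not Lucene code: `ToyBooleanQuery`, `matchAll()` and `search()` are hypothetical names made up for the illustration, and a document is just a set of terms. The sketch only mimics the idea that positive clauses supply candidates and negative clauses can only remove them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical class, not Lucene's implementation or API.
final class ToyBooleanQuery {
    private final List<String> must = new ArrayList<>();
    private final List<String> mustNot = new ArrayList<>();
    private boolean matchAll = false;

    ToyBooleanQuery must(String term) { must.add(term); return this; }

    ToyBooleanQuery mustNot(String term) { mustNot.add(term); return this; }

    // Stands in for adding a positive "match every document" clause.
    ToyBooleanQuery matchAll() { matchAll = true; return this; }

    // Returns the positions of the matching documents in the given list.
    List<Integer> search(List<Set<String>> docs) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Set<String> doc = docs.get(i);
            // A document becomes a candidate only through a positive clause.
            boolean hasPositiveClause = matchAll || !must.isEmpty();
            if (!hasPositiveClause || !doc.containsAll(must)) {
                continue;
            }
            // Negative clauses can only remove candidates, never add them.
            boolean excluded = false;
            for (String term : mustNot) {
                if (doc.contains(term)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

In this model a query built with only `mustNot("xyz")` returns nothing, while the same query combined with `matchAll()` or with a `must(...)` clause returns the documents that do not contain `xyz`.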

Solr/Lucene Scorer

Scorers are obtained from Lucene queries through the query’s weight method. In short, the framework calls Query.weight(..).scorer(..). Have a look at:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html

To use your own Query class in Solr, you’ll need to implement your own Solr QParserPlugin that uses your own QParser, which in turn generates your previously implemented Lucene Query. You then …

Weird Solr/Lucene behaviors with boolean operators

The question has been answered very well on the Solr mailing list. They have also added an entry to the official FAQ, which says: Boolean queries must have at least one “positive” expression (ie; MUST or SHOULD) in order to match. Solr tries to help with this, and if asked to execute a BooleanQuery that does …

Remove results below a certain score threshold in Solr/Lucene?

You could write your own Collector that skips documents the scorer places below your threshold. Below is a simple example of this using Lucene.Net 2.9.1.2 and C#. You’ll need to modify the example if you want to keep the calculated score.

using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class …
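Since the example above is cut off, here is a minimal sketch of the same idea in plain Java. It is not the original C# code and does not use Lucene’s Collector API: `ThresholdCollector`, `collect(int, float)` and `docIds()` are hypothetical names, and the sketch assumes the caller passes in each matching document id together with the score the scorer produced.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: mirrors the idea of a collector, not Lucene's Collector API.
final class ThresholdCollector {
    private final float minScore;
    private final List<Integer> docIds = new ArrayList<>();

    ThresholdCollector(float minScore) {
        this.minScore = minScore;
    }

    // Assumed to be called once per matching document with its score.
    void collect(int docId, float score) {
        // Documents scoring below the threshold are ignored.
        if (score >= minScore) {
            docIds.add(docId);
        }
    }

    // Ids of the documents that met the threshold, in collection order.
    List<Integer> docIds() {
        return docIds;
    }
}
```

Like the example described above, this version throws the score away after the comparison; keeping it would mean storing the id and score together.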

Update specific field on SOLR index

Solr does not support updating individual fields yet, but there is a JIRA issue about this (almost 3 years old as of this writing). Until this is implemented, you have to update the whole document. UPDATE: this is implemented as of Solr 4; here’s the documentation.

Which are the best alternatives to Lucene? [closed]

I would need to know what problems you’re having with Lucene, but Xapian is worth a look.

get cosine similarity between two documents in lucene

As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has changed substantially. Here is a version rewritten for Lucene 4.

import java.io.IOException;
import java.util.*;
import org.apache.commons.math3.linear.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
import org.apache.lucene.util.*;

public class CosineDocumentSimilarity {
    public static final String CONTENT …
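Because the class above is cut off, the cosine step itself can be sketched separately in plain Java. This is an illustration, not the rest of the original class: `CosineSketch` is a hypothetical name, and it assumes each document has already been reduced to a map from term to raw frequency (how those frequencies are read out of the index is not shown here). The similarity is the dot product of the two frequency vectors divided by the product of their Euclidean norms.

```java
import java.util.Map;

// Sketch of cosine similarity over term-frequency maps (hypothetical helper).
final class CosineSketch {
    private CosineSketch() {}

    // Returns a value in [0, 1] for non-negative frequencies;
    // 0.0 if either document has no terms.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                // Only terms present in both documents contribute.
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero for empty documents
        }
        return dot / (normA * normB);
    }

    // Euclidean length of the frequency vector.
    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int frequency : vector.values()) {
            sumOfSquares += (double) frequency * frequency;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

Two documents with the same term frequencies score 1 (up to floating-point rounding), and documents with no terms in common score 0.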

Hibernate Search | ngram analyzer with minGramSize 1

Updated answer for Hibernate Search 6

With Hibernate Search 6, you can define a second analyzer, identical to your “ngram” analyzer except that it does not have an ngram filter, and assign it as the searchAnalyzer for your field:

public class Hospital {
    // …
    @FullTextField(analyzer = "ngram", searchAnalyzer = "my_analyzer_without_ngrams")
    private String name = …

Problem using same instance of indexSearcher for multiple requests

Try something like the following:

protected static IndexSearcher searcher = null;
…
if (searcher == null) {
    searcher = new IndexSearcher(jobIndexFolderPath);
}
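If several request threads can reach that null check at the same time, more than one instance may be created. A minimal sketch of one way to guard the lazy initialization, in plain Java, follows. `SharedInstance` is a hypothetical helper, not part of Lucene, and the factory stands in for whatever creates the searcher.

```java
import java.util.function.Supplier;

// Hypothetical helper: creates the shared object once, on first use.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized so that concurrent callers cannot both see null
    // and each create their own instance.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

Every caller of `get()` then receives the same object, and the factory runs at most once as long as it returns a non-null value.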