Which are the best alternatives to Lucene? [closed]
Would need to know what problems you’re having with Lucene, but Xapian is worth a look.
SnowballAnalyzer is deprecated; you can use the Lucene Porter stemmer instead: PorterStemmer stem = new PorterStemmer(); stem.setCurrent(word); stem.stem(); String result = stem.getCurrent(); Hope this helps!
In a nutshell, Lucene builds an inverted index using skip lists on disk, and then loads a mapping for the indexed terms into memory using a Finite State Transducer (FST). Note, however, that Lucene does not (necessarily) load all indexed terms into RAM, as described by Michael McCandless, the author of Lucene’s indexing system himself. Note …
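To make the term "inverted index" concrete, here is a minimal in-memory sketch in plain Java. It is not how Lucene stores anything: the class name `ToyInvertedIndex`, the `\W+` tokenization, and the method names are all assumptions made for illustration. It only shows the core idea of mapping each term to a sorted list of document ids and intersecting two such lists, which is the step that skip lists are meant to speed up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory inverted index: term -> ascending list of document ids.
// Illustration only; Lucene's on-disk postings, skip lists and FST are far more involved.
class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns its id. Ids only grow, so posting lists stay sorted.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    // Returns the posting list for a term, or an empty list if the term is unknown.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    // Intersects two sorted posting lists with a linear merge.
    // A skip list would let the lagging side jump ahead instead of stepping one entry at a time.
    List<Integer> and(String a, String b) {
        List<Integer> x = lookup(a);
        List<Integer> y = lookup(b);
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < x.size() && j < y.size()) {
            int dx = x.get(i);
            int dy = y.get(j);
            if (dx == dy) {
                out.add(dx);
                i++;
                j++;
            } else if (dx < dy) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

Calling `add` for each document and then `and("inverted", "lucene")` returns the ids of the documents that contain both terms.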
Try this: ?q=-id:["" TO *]
Try a PhraseQuery instead: PhraseQuery query = new PhraseQuery(); String[] words = sentence.split(" "); for (String word : words) { query.add(new Term("contents", word)); } booleanQuery.add(query, BooleanClause.Occur.MUST); Edit: I think you have a different problem. What other parts are there to your booleanQuery? Here’s a full working example of searching for a phrase: public class LucenePhraseQuery …
To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages: People frequently want to compute a “Percentage” from Lucene scores to determine what is a “100% perfect” match vs a “50%” match. This is also sometimes called a “normalized score”. Don’t do this. Seriously. Stop trying to think about your problem this way, it’s not going to end well. That page does …
The default stop words set in StandardAnalyzer and EnglishAnalyzer is from StopAnalyzer.ENGLISH_STOP_WORDS_SET, as found in the source file: “a”, “an”, “and”, “are”, “as”, “at”, “be”, “but”, “by”, “for”, “if”, “in”, “into”, “is”, “it”, “no”, “not”, “of”, “on”, “or”, “such”, “that”, “the”, “their”, “then”, “there”, “these”, “they”, “this”, “to”, “was”, “will”, “with” StopFilter itself defines no …
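For a feel of what filtering with that list does, here is a rough plain-Java sketch that does not touch Lucene at all. The class `StopWordSketch`, the method `removeStopWords`, and the lowercase-then-split-on-`\W+` tokenization are assumptions for illustration, not Lucene's tokenizer or its StopFilter; only the word list itself is taken from the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of stop-word removal using the list quoted above.
// Hypothetical helper; it does not call into Lucene.
class StopWordSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it",
            "no", "not", "of", "on", "or", "such",
            "that", "the", "their", "then", "there", "these",
            "they", "this", "to", "was", "will", "with"));

    // Lowercases the text, splits on non-word characters, and drops stop words.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With this sketch, `removeStopWords("The quick brown fox is in the box")` keeps only `quick`, `brown`, `fox` and `box`.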
A very simple way would be to use Luke. On the ‘Overview’ tab, there is a ‘Show top terms’ button that can be used for what you need.
One of our applications uses data that is stored in both Cassandra and ElasticSearch. We use Cassandra to access those records whenever we can, and have data duplicated into query tables designed to serve specific application-side requests. For a more liberal search than our query tables can allow, ElasticSearch performs that functionality nicely. We …
As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has substantial changes. Here is a version rewritten for Lucene 4. import java.io.IOException; import java.util.*; import org.apache.commons.math3.linear.*; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.core.SimpleAnalyzer; import org.apache.lucene.document.*; import org.apache.lucene.document.Field.Store; import org.apache.lucene.index.*; import org.apache.lucene.store.*; import org.apache.lucene.util.*; public class CosineDocumentSimilarity { public static final String CONTENT …
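Since the listing above is cut off, here is a separate, much smaller sketch of the underlying calculation in plain Java. It is not the answer's code and uses neither Lucene term vectors nor commons-math: the class `CosineSketch`, its method names, and the `\W+` tokenization are assumptions. It computes cosine similarity as the dot product of two raw term-frequency vectors divided by the product of their lengths.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Cosine similarity over raw term frequencies, without Lucene or commons-math.
// Hypothetical helper for illustration only.
class CosineSketch {
    // Counts how often each lowercase token occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += (double) va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Convenience wrapper: similarity of two raw strings.
    static double similarity(String s1, String s2) {
        return cosine(termFrequencies(s1), termFrequencies(s2));
    }
}
```

Identical texts score close to 1, and texts with no terms in common score 0. A real setup would usually weight terms (for example with tf-idf) rather than use raw counts.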