lucene - w3toppers.com

Which are the best alternatives to Lucene? [closed]

I would need to know what problems you’re having with Lucene, but Xapian is worth a look.

Stemming English words with Lucene

SnowballAnalyzer is deprecated; you can use Lucene’s PorterStemmer instead: PorterStemmer stem = new PorterStemmer(); stem.setCurrent(word); stem.stem(); String result = stem.getCurrent(); Hope this helps!

How does lucene index documents?

In a nutshell, Lucene builds an inverted index using Skip-Lists on disk, and then loads a mapping for the indexed terms into memory using a Finite State Transducer (FST). Note, however, that Lucene does not (necessarily) load all indexed terms to RAM, as described by Michael McCandless, the author of Lucene’s indexing system himself. Note … Read more
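To make the term "inverted index" concrete, here is a minimal in-memory sketch in plain Java. It is not Lucene's implementation: there are no skip lists, no FST and no on-disk segments, and the class name `TinyInvertedIndex` and its tokenizer are made up for illustration. It only shows the core idea of mapping each term to a sorted list of the documents that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Lucene.
class TinyInvertedIndex {
    // Sorted term dictionary: term -> ascending list of document ids (the postings list).
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Lowercases the text and splits on anything that is not a-z;
    // a crude stand-in for a real analyzer.
    // Assumes documents are added in ascending id order.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Returns the ids of the documents containing the term, or an empty list.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(0, "Lucene builds an inverted index");
        index.add(1, "An index maps terms to documents");
        System.out.println(index.lookup("index"));  // prints [0, 1]
        System.out.println(index.lookup("lucene")); // prints [0]
    }
}
```

A query for a term is then a dictionary lookup followed by a walk over the postings list, which is why the structure is called "inverted": it goes from terms to documents rather than from documents to terms.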

How to query SOLR for empty fields?

Try this: ?q=-id:["" TO *]
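The query above contains brackets, quotes and a space, so it has to be URL-encoded when it is sent from code rather than typed into a browser. Below is a small sketch of that step using only the Java standard library. The host, port and core name (`localhost:8983`, `mycore`) and the class name are placeholders, not values from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only java.net.URLEncoder is a real API here.
class SolrEmptyFieldQuery {
    // Builds a select URL whose q parameter is the negative range query from the answer.
    static String buildUrl(String coreUrl, String field) {
        String query = "-" + field + ":[\"\" TO *]";
        return coreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed base URL of a local Solr core named "mycore".
        System.out.println(buildUrl("http://localhost:8983/solr/mycore", "id"));
        // prints http://localhost:8983/solr/mycore/select?q=-id%3A%5B%22%22+TO+*%5D
    }
}
```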

Exact Phrase search using Lucene?

Try a PhraseQuery instead: PhraseQuery query = new PhraseQuery(); String[] words = sentence.split(" "); for (String word : words) { query.add(new Term("contents", word)); } booleanQuery.add(query, BooleanClause.Occur.MUST); Edit: I think you have a different problem. What other parts are there to your booleanQuery? Here’s a full working example of searching for a phrase: public class LucenePhraseQuery … Read more

how do I normalise a solr/lucene score?

To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages: People frequently want to compute a “Percentage” from Lucene scores to determine what is a “100% perfect” match vs a “50%” match. This is also sometimes called a “normalized score”. Don’t do this. Seriously. Stop trying to think about your problem this way, it’s not going to end well. That page does … Read more

What is the default list of stopwords used in Lucene’s StopFilter?

The default stop words set in StandardAnalyzer and EnglishAnalyzer is from StopAnalyzer.ENGLISH_STOP_WORDS_SET, as found in the source file: “a”, “an”, “and”, “are”, “as”, “at”, “be”, “but”, “by”, “for”, “if”, “in”, “into”, “is”, “it”, “no”, “not”, “of”, “on”, “or”, “such”, “that”, “the”, “their”, “then”, “there”, “these”, “they”, “this”, “to”, “was”, “will”, “with” StopFilter itself defines no … Read more
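For readers who want to see what filtering against that list amounts to, here is a sketch in plain Java. It does not use Lucene's StopFilter or analyzers. The class name, the whitespace tokenizer and the lowercasing step are assumptions made for illustration; only the word list is taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
class StopWordSketch {
    // The stop words quoted in the answer above.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with");

    // Lowercases the text, splits on whitespace and drops every stop word.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !ENGLISH_STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords("The quick brown fox is in the garden"));
        // prints [quick, brown, fox, garden]
    }
}
```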

Get highest frequency terms from Lucene index

A very simple way would be to use Luke. On the ‘Overview’ tab, there is a ‘Show top terms’ button that can be used for what you need.

Elasticsearch vs Cassandra vs Elasticsearch with Cassandra

One of our applications uses data that is stored in both Cassandra and Elasticsearch. We use Cassandra to access those records whenever we can, and have data duplicated into query tables designed to serve specific application-side requests. For a more liberal search than our query tables allow, Elasticsearch does the job nicely. We … Read more

get cosine similarity between two documents in lucene

As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has substantial changes. Here is a version rewritten for Lucene 4. import java.io.IOException; import java.util.*; import org.apache.commons.math3.linear.*; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.core.SimpleAnalyzer; import org.apache.lucene.document.*; import org.apache.lucene.document.Field.Store; import org.apache.lucene.index.*; import org.apache.lucene.store.*; import org.apache.lucene.util.*; public class CosineDocumentSimilarity { public static final String CONTENT … Read more
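Since the listing above is cut off, here is a separate, dependency-free sketch of the underlying computation: the cosine of the angle between two term-frequency vectors. It is not the truncated CosineDocumentSimilarity class and uses neither Lucene nor commons-math. The class name, the tokenizer and the use of raw term counts as weights are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not the code from the answer.
class CosineSketch {
    // Counts how often each lowercased word occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("lucene search index");
        Map<String, Integer> d2 = termFrequencies("lucene index internals");
        System.out.println(cosine(d1, d2));
    }
}
```

With a real index, the term counts would come from stored term vectors rather than from re-tokenizing the text, but the arithmetic stays the same.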