Get cosine similarity between two documents in Lucene

As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has changed substantially. Here is a version rewritten for Lucene 4. import java.io.IOException; import java.util.*; import org.apache.commons.math3.linear.*; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.core.SimpleAnalyzer; import org.apache.lucene.document.*; import org.apache.lucene.document.Field.Store; import org.apache.lucene.index.*; import org.apache.lucene.store.*; import org.apache.lucene.util.*; public class CosineDocumentSimilarity { public static final String CONTENT …
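The excerpt above is cut off before the similarity computation itself, so here is a minimal, library-free sketch of the underlying idea: build a term-frequency vector per document and take the cosine of the angle between the two vectors. It is not the answer's Lucene 4 code. The class name `CosineSketch` and its methods are hypothetical, and the tokenizer only approximates a lowercase, split-on-non-letters analysis.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene or of the original answer.
class CosineSketch {

    // Lowercase the text, split on runs of non-letters, and count each term.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // cosine(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("Always look on the bright side of life.");
        Map<String, Integer> d2 = termFrequencies("Always look on the dark side of the moon.");
        System.out.println(cosine(d1, d2));
    }
}
```

In a Lucene-based version, the term counts would come from the index's stored term vectors rather than from re-tokenizing the raw text.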

Hibernate Search | ngram analyzer with minGramSize 1

Updated answer for Hibernate Search 6: you can define a second analyzer, identical to your "ngram" analyzer except that it does not have an ngram filter, and assign it as the searchAnalyzer for your field: public class Hospital { // … @FullTextField(analyzer = "ngram", searchAnalyzer = "my_analyzer_without_ngrams") private String name = …
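To see why a separate search analyzer helps when minGramSize is 1, the plain-Java sketch below mimics the behaviour without touching Hibernate Search or Lucene. The class `NgramSketch`, its methods, and the sample terms are all hypothetical, and the match rule (any query token present among the indexed tokens) is a simplifying assumption, not how the real query engine scores results.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class; it only imitates what an ngram token filter emits.
class NgramSketch {

    // All substrings of `term` whose length is between min and max, inclusive.
    static Set<String> ngrams(String term, int min, int max) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = min; len <= max && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    // Simplified match rule (assumption): a hit if any query token was indexed.
    static boolean matchesAny(Set<String> indexedTokens, Set<String> queryTokens) {
        for (String q : queryTokens) {
            if (indexedTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index time: "hospital" is expanded into 1..3 character grams.
        Set<String> indexed = ngrams("hospital", 1, 3);

        // Query analyzed WITH ngrams: "hat" produces "h", which is indexed, so it matches.
        System.out.println(matchesAny(indexed, ngrams("hat", 1, 3)));

        // Query analyzed WITHOUT ngrams: the single token "hat" is not indexed.
        System.out.println(matchesAny(indexed, Collections.singleton("hat")));

        // A genuine substring still matches without ngramming the query.
        System.out.println(matchesAny(indexed, Collections.singleton("hos")));
    }
}
```

Under these assumptions, ngramming only at index time keeps substring matching while avoiding hits caused by a single shared character.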

How to control indexing of a field in Lucene 4.0

Constructors taking Field.Index arguments are available, but they are deprecated in 4.0 and should not be used. Instead, you should look to subclasses of Field to control how a field is indexed. StringField is the standard un-analyzed indexed field. The field is indexed as a single token. It is appropriate for things like identifiers, for which you …

Filename search with ElasticSearch

You have various problems with what you pasted: 1) Incorrect mapping. When creating the index, you specify: "mappings": { "files": { But your type is actually file, not files. If you checked the mapping, you would see that immediately: curl -XGET 'http://127.0.0.1:9200/files/_mapping?pretty=1' # { # "files" : { # "files" : { # "properties" : …