Save Spark Dataframe into Elasticsearch – Can’t handle type exception

The answer for this one was tricky, but thanks to samklr, I managed to figure out what the problem was. The solution isn’t straightforward, though, and might involve some “unnecessary” transformations. First, let’s talk about serialization. There are two aspects of serialization to consider in Spark: serialization of data and serialization of functions. In … Read more

Change default mapping of string to “not analyzed” in Elasticsearch

Just create a template. Run: curl -XPUT localhost:9200/_template/template_1 -d '{ "template": "*", "settings": { "index.refresh_interval": "5s" }, "mappings": { "_default_": { "_all": { "enabled": true }, "dynamic_templates": [ { "string_fields": { "match": "*", "match_mapping_type": "string", "mapping": { "index": "not_analyzed", "omit_norms": true, "type": "string" } } } ], "properties": { "@version": { "type": "string", "index": "not_analyzed" … Read more
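As a rough sketch of the dynamic-template part of that request, the body can be built as a plain Python dictionary and serialized to JSON. The helper name build_string_template is hypothetical, and the body below keeps only the keys visible in the excerpt above; it is not the full template. It also assumes an older Elasticsearch release where the "string" type and "not_analyzed" are still accepted (newer releases use the "keyword" type instead).

```python
import json


def build_string_template(index_pattern="*"):
    """Return a template body that maps every new string field as not_analyzed.

    Hypothetical helper: only the dynamic-template portion of the excerpt
    is reproduced here, using the legacy "string"/"not_analyzed" syntax.
    """
    return {
        "template": index_pattern,
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "string_fields": {
                            # Apply to every field detected as a string.
                            "match": "*",
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                                "omit_norms": True,
                            },
                        }
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON is what would be sent as the body of the PUT request.
    print(json.dumps(build_string_template(), indent=2))
```

The printed JSON could then be passed to curl with -d in place of the inline body.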

Queries vs. Filters

The difference is simple: filters are cached and don’t influence the score, so they are faster than queries. Have a look here too. Let’s say a query is usually something that users type, and is pretty much unpredictable, while filters help users narrow down the search results, for example by using facets.
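To make the distinction concrete, here is a minimal sketch of a search body that combines both: the text the user typed goes into a scored clause, and the narrowing criterion goes into a filter clause. It assumes Elasticsearch 2.0 or later, where a bool query accepts a filter clause; the function name and the field names title and category are hypothetical.

```python
import json


def build_search_body(user_text, category):
    """Combine a scored full-text query with a non-scored filter.

    Hypothetical helper: "title" and "category" are illustrative field names.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: free text typed by the user.
                "must": [{"match": {"title": user_text}}],
                # Filter clause: a yes/no restriction that does not
                # contribute to _score and can be cached.
                "filter": [{"term": {"category": category}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("spark dataframe", "elasticsearch"), indent=2))
```

Moving a clause from must to filter leaves the set of matching documents unchanged; only the scoring differs.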

Elasticsearch searchable synthetic fields

There are two steps to this: a dynamic_mapping and an ingest_pipeline. I’m assuming your field c is non-trivial, so you may want to match that field in a dynamic template using a match and assign the keyword mapping to it: PUT synthetic { "mappings": { "dynamic_templates": [ { "c_like_field": { "match_mapping_type": "string", "match": "c*", … Read more

How to update multiple documents that match a query in elasticsearch

You could use the update by query plugin to do just that. The idea is to select all documents that have no category and whose url matches a certain string, and add the category you wish. curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d ' { "query": { "filtered": { "filter": { "bool": { "must": … Read more
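The idea described above can be sketched as a request body built in Python. This is not the request from the excerpt, which uses the legacy "filtered" query and is cut off; it is an assumed equivalent for recent Elasticsearch releases, where _update_by_query is built in and scripts are written in Painless. The function name is hypothetical, and a match query on url stands in for whatever string matching the original used.

```python
import json


def build_update_by_query(url_text, category):
    """Select documents lacking a category whose url matches, then set the category.

    Hypothetical helper; assumes a recent Elasticsearch version with
    Painless scripting and the "source" script key.
    """
    return {
        "query": {
            "bool": {
                # Assumption: a plain match on the url field.
                "must": [{"match": {"url": url_text}}],
                # Only documents that do not have a category yet.
                "must_not": [{"exists": {"field": "category"}}],
            }
        },
        "script": {
            "lang": "painless",
            # The value is passed as a parameter rather than inlined.
            "source": "ctx._source.category = params.category",
            "params": {"category": category},
        },
    }


if __name__ == "__main__":
    # Body for: POST localhost:9200/webproxylog/_update_by_query
    print(json.dumps(build_update_by_query("example.com", "news"), indent=2))
```

Passing the category through params keeps the script text constant across calls, so only the parameter value changes from one request to the next.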