Mapping in Elasticsearch

Mapping is the process of defining how a document and the fields it contains are stored and indexed. A mapping lets us define, for example:

  1. Which string fields should be treated as full-text fields.
  2. Which fields contain numbers, dates, or geolocations.
  3. Whether the values of all fields in the document should be indexed into the catch-all _all field.
  4. The format of date values.
  5. Custom rules to control the mapping of dynamically added fields.

Mapping can be done in two ways in Elasticsearch.

Dynamic Mappings

This is the default behavior of Elasticsearch. Fields and mapping types do not need to be defined before they are used: new mapping types and new field names are added automatically, just by indexing a document.
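
To make the idea of dynamic mapping more concrete, here is a rough sketch of the kind of type guessing it performs. This is not Elasticsearch's actual code: the class DynamicMappingSketch and its methods are hypothetical, the type names assume the same Elasticsearch generation as the rest of this post (where text fields are of type string), and the date check is only an approximation of the default date detection. The Rating and Released fields are made-up sample data.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;

class DynamicMappingSketch {

    // Approximates the field type dynamic mapping would choose for a new field.
    static String guessType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "double";
        if (value instanceof Map) return "object";
        if (value instanceof String) {
            return looksLikeDate((String) value) ? "date" : "string";
        }
        return "unknown";
    }

    // Simplified date detection: accepts ISO dates such as 1972-03-24
    // and ISO date-times with an offset such as 1972-03-24T10:15:30Z.
    static boolean looksLikeDate(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException ignored) {
            // not a plain ISO date, try a date-time next
        }
        try {
            OffsetDateTime.parse(text);
            return true;
        } catch (DateTimeParseException ignored) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("Title", "The Godfather");
        doc.put("Year", 1972);
        doc.put("Rating", 9.2);            // hypothetical field
        doc.put("Released", "1972-03-24"); // hypothetical field
        doc.forEach((field, value) -> System.out.println(field + " -> " + guessType(value)));
    }
}
```

Note that the sketch maps Year to long rather than integer, which is one reason we may prefer an explicit mapping for fields we already know.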

Explicit Mappings

Dynamic mapping is useful for getting started, but at some point we will want to specify our own explicit mappings, since we know our data better than Elasticsearch does. We can create mapping types and field mappings when we create an index. We can also add mapping types and fields to an existing index with the PUT mapping API of Elasticsearch.

We can create a mapping for an index in two ways.

  1. Programmatically using Java
  2. Using the CLI

Programmatically using Java

A mapping can be created at the time of index creation. It is recommended to create the mapping before adding any data to the index. Since we know most of the fields beforehand, this is easy to do programmatically. If the index already exists, the create call throws an exception, so we need a safety check that creates the index and its mapping only when the index is not there yet. The code looks like the following:

import java.io.IOException;

import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class IndexAndCreateMapping {

	public void prepareIndex(Client client, String indexName, String documentType) throws IOException {
		// Check whether the index already exists
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(indexName).get();

		// Create the index and its mapping only if the index does not exist yet
		if (!indexResponse.isExists()) {
			XContentBuilder builder = getMappingBuilder(documentType);
			// Create the index together with the mapping for documentType
			client.admin().indices().prepareCreate(indexName).addMapping(documentType, builder).get();

			// Refresh the new index so that it can be accessed right away
			client.admin().indices().prepareRefresh(indexName).get();
		}
	}

	// Builds the mapping; the root object is named after the mapping type (for example "movie")
	private XContentBuilder getMappingBuilder(String documentType) throws IOException {
		return XContentFactory.jsonBuilder().prettyPrint().startObject()
				.startObject(documentType)
				.startObject("properties")
				.startObject("Director").field("type", "string").field("index", "not_analyzed").endObject()
				.startObject("Title").field("type", "string").endObject()
				.startObject("Generes").field("type", "string").endObject()
				.startObject("Year").field("type", "integer").endObject()
				.endObject()
				.endObject()
				.endObject();
	}
}
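
To see the shape of the JSON that such a mapping describes without a running cluster, here is a small sketch that uses only the JDK. It is not a replacement for XContentBuilder: MappingJsonSketch, toJson, field and movieMapping are illustrative names of my own, the renderer does no string escaping, and the field is named Director as in the CLI example later in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class MappingJsonSketch {

    // Renders nested maps of strings and numbers as compact JSON.
    // Sketch only: keys and values are assumed to need no escaping.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add("\"" + entry.getKey() + "\":" + toJson(entry.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value + "\"";
    }

    // One field definition; analyzed=false adds "index": "not_analyzed".
    static Map<String, Object> field(String type, boolean analyzed) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("type", type);
        if (!analyzed) {
            definition.put("index", "not_analyzed");
        }
        return definition;
    }

    // The movie mapping from this post, rooted at the given mapping type.
    static Map<String, Object> movieMapping(String documentType) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("Director", field("string", false));
        properties.put("Title", field("string", true));
        properties.put("Generes", field("string", true));
        properties.put("Year", field("integer", true));

        Map<String, Object> type = new LinkedHashMap<>();
        type.put("properties", properties);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put(documentType, type);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(toJson(movieMapping("movie")));
    }
}
```

LinkedHashMap is used so that the fields come out in the order they were added, which makes the output easy to compare with the hand-written JSON.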

We can also add new mapping types and fields to an existing index with the PUT mapping API.
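
As a rough illustration of what a PUT mapping call looks like from Java without the Elasticsearch client library, the sketch below builds the HTTP request with the JDK's own java.net.http client (Java 11 or later). The class PutMappingSketch and its methods are hypothetical names, the host localhost:9200 is an assumption, and Hero is just an example field. The request is only sent when the program is started with --send, because sending needs a running node.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PutMappingSketch {

    // Builds (but does not send) a PUT mapping request that adds one
    // not_analyzed string field to the given mapping type.
    static HttpRequest buildRequest(String baseUrl, String index, String type, String field) {
        String body = "{\"properties\":{\"" + field
                + "\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/" + type + "/_mapping"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Sends the request and returns the status code and response body.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:9200", "movies", "movie", "Hero");
        System.out.println(request.method() + " " + request.uri());
        // Only talk to the cluster when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println(send(request));
        }
    }
}
```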

Using CLI

We can also create the index and its mapping from the command line. The following curl command does the following:

  1. Creates an index called movies.
  2. Adds a mapping type called movie.
  3. Specifies the fields (properties) of the mapping type.
  4. Specifies the data type and mapping for each field.

curl -XPUT 'localhost:9200/movies?pretty' -d'
{
  "mappings" : {
    "movie" : {
      "properties" : {
        "Director" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "Title" : {
          "type" : "string"
        },
        "Generes" : {
          "type" : "string"
        },
        "Year" : {
          "type" : "integer"
        }
      }
    }
  }
}'

Updating Existing Mappings

Existing type and field mappings cannot be updated unless otherwise documented. Changing the mapping would mean invalidating already indexed documents. Instead, we should create a new index with correct mappings and re-index the data. However, there are some exceptions to this rule. For instance:

  1. New properties can be added to object datatype fields.
  2. New multi-fields can be added to existing fields.
  3. The ignore_above parameter can be updated.

We can add a new field to the mapping of an existing index from the CLI as follows. In this example, we add a Hero field for each movie:

curl -XPUT 'localhost:9200/movies/movie/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
}'

When we set index to not_analyzed, the string is not analyzed at index time: the whole value is indexed as a single term. This means that a search matches only the exact, complete value rather than individual words within it.
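
The difference is easier to see with a small simulation. The sketch below is only a crude stand-in for real analysis (it lowercases and splits on anything that is not a letter or digit, which is my simplification, not the actual analyzer), and AnalysisSketch with its methods are hypothetical names. "Al Pacino" is just a sample value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalysisSketch {

    // Crude approximation of an analyzed string field:
    // lowercase the value and split it into word terms.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A not_analyzed field keeps the whole value as one single term.
    static List<String> notAnalyzedTerms(String value) {
        return Collections.singletonList(value);
    }

    // A term lookup matches only if the exact term was indexed.
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String hero = "Al Pacino";
        List<String> analyzed = analyzedTerms(hero);
        List<String> notAnalyzed = notAnalyzedTerms(hero);

        System.out.println("analyzed terms:     " + analyzed);
        System.out.println("not_analyzed terms: " + notAnalyzed);
        System.out.println("analyzed matches 'pacino':        " + termMatches(analyzed, "pacino"));
        System.out.println("not_analyzed matches 'pacino':    " + termMatches(notAnalyzed, "pacino"));
        System.out.println("not_analyzed matches 'Al Pacino': " + termMatches(notAnalyzed, "Al Pacino"));
    }
}
```

In this simulation the analyzed field can be found by the single word pacino, while the not_analyzed field matches only the complete value Al Pacino.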

We can also update the mapping of an existing field, as long as the change is one of the exceptions listed above. Here we add ignore_above to the Hero field:

curl -XPUT 'localhost:9200/movies/movie/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 100
      }
    }
}'

With ignore_above set to 100, any Hero value that is longer than 100 characters is ignored, that is, it is not indexed.
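
The effect of this setting can be sketched as a simple length filter. This is a simplification under my own assumptions: IgnoreAboveSketch and indexedValues are hypothetical names, the length is measured with String.length(), and a limit of 10 is used instead of 100 to keep the sample values short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IgnoreAboveSketch {

    // Returns the values that would still be indexed under the given
    // ignore_above limit; longer values are skipped.
    static List<String> indexedValues(List<String> values, int ignoreAbove) {
        List<String> indexed = new ArrayList<>();
        for (String value : values) {
            if (value.length() <= ignoreAbove) {
                indexed.add(value);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> heroes = Arrays.asList("Al Pacino", "A name that is far too long");
        // With a limit of 10, only the first value is short enough to be indexed.
        System.out.println(indexedValues(heroes, 10));
    }
}
```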

Mapping is a one-time task. In my opinion, it is better to create the mapping from the command line when we create the index, before adding any data to Elasticsearch. We can also create it programmatically, but then the code has to check whether the index exists every time it runs.
