Testing Mapping with Embedded Elasticsearch Server

A running Elasticsearch cluster is usually not available for unit tests, and a unit test should not depend on an external running instance anyway. If you want to test real Elasticsearch behavior in a unit test, you can use an embedded Elasticsearch server instance. An embedded Elasticsearch server is a small, in-process instance of Elasticsearch that behaves just like a full Elasticsearch cluster.
You need the following dependencies in your POM file to run Elasticsearch embedded in your application for unit tests.

	<dependency>
	    <groupId>org.elasticsearch</groupId>
	    <artifactId>elasticsearch</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>org.elasticsearch.client</groupId>
	    <artifactId>transport</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>com.google.guava</groupId>
	    <artifactId>guava</artifactId>
	    <version>21.0</version>
	</dependency>

The following code tests whether the mapping has been created in Elasticsearch. We could do this with Mockito, but I don’t think Mockito alone gives us full confidence in our code. I am using embedded Elasticsearch to make sure I can create the index and mapping without any issue. If you want to see the mapping itself, see the Mapping in Elasticsearch section below.

package com.vsubedi.elasticsearch;

import static org.junit.Assert.*;

import java.io.File;
import java.io.IOException;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.io.Files;

/**
 *
 * @author vsubedi
 *
 */
public class PrepareIndexTest {

	private static final Logger logger = LoggerFactory.getLogger(PrepareIndexTest.class);

	private static Node server;
	private static Client client;
	private static File tempDir;
	private static String index = "test";
	private static String docType = "movies";

	@BeforeClass
	public static void setUpBeforeClass() throws Exception {
		logger.info("======= START INTEGRATION TEST ========");

		// spinning up the elasticsearch server for junit
		tempDir = Files.createTempDir();
		logger.info(tempDir.getAbsolutePath());
		Settings settings = Settings.builder().put("path.home", tempDir.getAbsolutePath())
				.put("transport.type", "local")
				.put("http.enabled", false)
				.build();
		server = new Node(settings);
		final String clusterName = server.settings().get("cluster.name");

		logger.info("starting server with cluster-name: [{}]", clusterName);
		server.start();

		client = server.client();
	}

	@AfterClass
	public static void tearDownAfterClass() throws Exception {
		DeleteIndexResponse deleteIndexResponse = client.admin().indices()
				.prepareDelete(index).get();
		assertTrue(deleteIndexResponse.isAcknowledged());
		tempDir.delete();
		client.close();
		server.close();
		logger.info("======= END INTEGRATION TEST ========");
	}

	@Test
	public void testPrepareIndex() throws IOException {
		//creating index
		IndexAndCreateMapping.prepareIndex(client, index, docType);

		//checking if the index has been created or not
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(index).get();

		//it should be true
		assertTrue(indexResponse.isExists());
	}

}

The above code is for Elasticsearch version 5.2.1. If you want to run against earlier versions such as 2.4.x, you need to use NodeBuilder to create an embedded Elasticsearch instance as follows:

tempDir = Files.createTempDir();
logger.info(tempDir.getAbsolutePath());
Settings settings = Settings.builder().put("path.home",
tempDir.getAbsolutePath()).build();
server = NodeBuilder.nodeBuilder().settings(settings).build();
final String clusterName = server.settings().get("cluster.name");

logger.info("starting server with cluster-name: [{}]", clusterName);
server.start();

client = server.client();

with the following dependency:

		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>2.4.1</version>
			<scope>test</scope>
			<type>test-jar</type>
		</dependency>

Mapping in Elasticsearch

Mapping is the process of defining how a document and the fields it contains are stored and indexed. Defining a mapping is very useful for the following cases:

  1. Which string fields should be treated as full-text fields?
  2. Which fields contain numbers, dates, or geolocations?
  3. Whether the values of all fields in the document should be indexed into the catch-all _all field.
  4. The format of date values.
  5. Custom rules to control the mapping for dynamically added fields.

Mapping can be done in two ways in Elasticsearch.

Dynamic Mappings

This is the default mapping behavior provided by Elasticsearch. Fields and mapping types do not need to be defined before being used. New mapping types and new field names are added automatically, just by indexing a document.

Explicit Mappings

Dynamic mapping is useful to get started. But at some point, we will want to specify our own explicit mappings, since we know our data better than Elasticsearch does. We can create mapping types and field mappings when we create an index. We can also add mapping types and fields to an existing index with the PUT mapping API of Elasticsearch.

We can create a mapping for an index in two ways.

  1. Programmatically using Java
  2. Using the CLI

Programmatically using Java

A mapping can be created at the time of index creation, and it is recommended to create the mapping before adding any data to the index. Since we know most of the fields beforehand, this is easy to do programmatically. If a mapping is already present in the index, an exception will be thrown, so we need a safety check to avoid this issue. The code looks like the following:

import java.io.IOException;

import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class IndexAndCreateMapping {

	public static void prepareIndex(Client client, String indexName, String documentType) throws IOException {
		//looking for the index if it exists already
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(indexName).get();

		//create the index and mapping only when the index does not exist yet
		if (!indexResponse.isExists()) {
			XContentBuilder builder = getMappingBuilder(documentType);
			//creating the index with the mapping
			client.admin().indices().prepareCreate(indexName).addMapping(documentType, builder).get();

			//refreshes the index so that it can be accessed instantly
			client.admin().indices().prepareRefresh(indexName).get();
		}
	}

	private static XContentBuilder getMappingBuilder(String documentType) throws IOException {
		return XContentFactory.jsonBuilder().prettyPrint().startObject()
				.startObject(documentType)
				.startObject("properties")
				.startObject("Director").field("type", "string").field("index", "not_analyzed").endObject()
				.startObject("Title").field("type", "string").endObject()
				.startObject("Generes").field("type", "string").endObject()
				.startObject("Year").field("type", "integer").endObject()
				.endObject()
				.endObject()
				.endObject();
	}
}

We can also add new mapping types and fields to an existing index with the PUT mapping API.
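
For example, here is a minimal sketch of adding a field through the Java API, re-using the client from the examples above; the Hero field mirrors the CLI example later in this post:

XContentBuilder heroMapping = XContentFactory.jsonBuilder().startObject()
		.startObject("properties")
		.startObject("Hero").field("type", "string").field("index", "not_analyzed").endObject()
		.endObject()
		.endObject();

//adds the new field to the existing mapping type without recreating the index
client.admin().indices().preparePutMapping("movies")
		.setType("movies")
		.setSource(heroMapping)
		.get();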

Using CLI

We can run the PUT mapping API from the command line. The following curl command does the following:

  1. Creates an index called movies.
  2. Adds a mapping type called movies.
  3. Specifies the fields or properties in the mapping type.
  4. Specifies the data type and mapping for each field.

curl -XPUT 'localhost:9200/movies?pretty' -d'
{
  "mappings" : {
    "movies" : {
      "properties" : {
        "Director" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "Title" : {
          "type" : "string"
        },
        "Generes" : {
          "type" : "string"
        },
        "Year" : {
          "type" : "integer"
        }
      }
    }
  }
}'

Updating Existing Mappings

Existing type and field mappings cannot be updated unless otherwise documented, because changing a mapping would invalidate documents that are already indexed. Instead, we should create a new index with the correct mappings and re-index the data. However, there are some exceptions to this rule. For instance:

  1. New properties can be added to object datatype fields.
  2. New multi-fields can be added to existing fields.
  3. The ignore_above parameter can be updated.

We can add a mapping to an existing index using the CLI as follows. In this example, we are adding a Hero field for each movie:

curl -XPUT 'localhost:9200/movies/movies/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed"
        }
    }
}'

When we add "index" : "not_analyzed", Elasticsearch will not analyze the string while indexing the data. This means that a search will match the whole value rather than individual terms.
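
For example, a term query on a not_analyzed field only matches the exact stored value; the hero name below is made up:

curl -XGET 'localhost:9200/movies/_search?pretty' -d'
{
  "query" : {
    "term" : {
      "Hero" : "Indiana Jones"
    }
  }
}'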

We can also update the existing mapping of a field, for the parameters that allow it, as follows:

curl -XPUT 'localhost:9200/movies/movies/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 100
        }
    }
}'

The ignore_above parameter tells Elasticsearch not to index any Hero value that is longer than 100 characters.

Mapping is a one-time thing. In my opinion, it is better to create the mapping from the command line when we create the index, before adding any data to Elasticsearch. However, we can also do it programmatically, which requires a constant check on whether the index already exists.

Deleting Containers and Images from Docker

If you want to delete a specific image or container from your local Docker host, it is very easy to do with the following commands.

for a container: docker rm [container name]
for an image: docker rmi [image id/name]

What if you want to delete all the images and containers from your Docker host? It is very easy to remove all containers and images at once as well. Execute the following command to remove all containers:

docker rm $(docker ps -a -q)

This command won’t delete running containers. You need to stop all the running containers first if you want to delete them as well.
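
For example, you can stop all running containers first and then remove everything:

docker stop $(docker ps -q)
docker rm $(docker ps -a -q)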

On the other hand, if you want to delete all the images from your Docker host, you can run the following command.

docker rmi $(docker images -q)

Sometimes the above command won’t delete all the images. If some images were not deleted, you can run the following command to force-delete them.

docker rmi -f $(docker images -q)


reference: https://github.com/docker/docker/issues/928#issuecomment-23538307

Connecting to Elasticsearch Cluster Using Java API

We can use an Elasticsearch client to connect to Elasticsearch. There are two options for connecting to an Elasticsearch cluster: the node client and the transport client. The node client is a lower-level client that joins the cluster and therefore knows the cluster very well. The TransportClient, in contrast, connects remotely to an Elasticsearch cluster using the transport module. It does not join the cluster, but simply gets one or more initial transport addresses and communicates with them in round-robin fashion on each action. It is recommended to use the transport client. Starting with version 5.0.0, we need to add the following dependency for the transport client to make it work along with the core jar.


<!-- https://mvnrepository.com/artifact/org.elasticsearch.client/transport -->
<dependency>
     <groupId>org.elasticsearch.client</groupId>
     <artifactId>transport</artifactId>
     <version>5.2.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch -->
<dependency>
     <groupId>org.elasticsearch</groupId>
     <artifactId>elasticsearch</artifactId>
     <version>5.2.0</version>
</dependency>

We can initialize the Elasticsearch client as follows using the Java API. It is not recommended to create a new connection for every call.


import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Class to initialize the Elasticsearch client. It will re-use the same client.
 *
 * @author vsubedi
 *
 */
public class ElasticsearchClient {

    private static Client client;

    /**
     * Initializes the Elasticsearch client if it has not been created yet.
     * If the client is not null, the existing client is re-used.
     *
     * @param clustername - cluster name
     * @param ipAddress - ES IP address
     * @param port - ES transport port
     * @return Client
     * @throws UnknownHostException
     */
    @SuppressWarnings("resource")
    public static Client getElasticsearchClient(String clustername, String ipAddress, int port) throws UnknownHostException {

        if (client != null) {
            return client;
        }

        Settings settings = Settings.builder().put("cluster.name", clustername).build();
        client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), port));

        return client;
    }

    public static void closeClient() {
        if (client != null) {
            client.close();
        }
    }
}
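
Usage then looks something like this; the cluster name, IP address, and port below are placeholders for your own setup:

Client client = ElasticsearchClient.getElasticsearchClient("my-cluster", "127.0.0.1", 9300);
//... issue index/search requests with the client ...
ElasticsearchClient.closeClient();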

Before version 5.0.0 (the 2.x versions), we could use the TransportClient that lived in the core jar. We could connect to an Elasticsearch cluster older than 5.0.0 as follows:


public static Client getElasticsearchClient(String clustername, String ipAddress, int port) throws UnknownHostException {

    if (client != null) {
        return client;
    }

    Settings settings = Settings.builder().put("cluster.name", clustername).build();
    client = TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), port));

    return client;
}

For more: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/transport-client.html

Easy Way to Write toString() Method

It starts to get boring to write the toString() method, especially if you want to include a lot of instance variables. If you are using Eclipse, you can generate the toString() method by clicking Source > Generate toString()…. The generated method looks ugly, to be honest.


@Override
public String toString() {
    return "Message [id=" + id + ", message=" + message + ", created="
            + created + ", author=" + author + ", comments=" + comments
            + ", links=" + links + "]";
}

Luckily, there is an open source library that takes care of it for you. Apache Commons Lang provides methods that you can use to write the toString() method in a cleaner way.

This is all you have to write for a toString() method using Apache Commons:

public String toString() {
   return ToStringBuilder.reflectionToString(this);
}
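
By default, the output looks something like the following; the class name, hash code, and field values here are made up for illustration:

com.vsubedi.Message@1b6d3586[id=42,message=hello world,created=2017-03-06,author=vsubedi,comments=[],links=[]]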

The above method outputs all the variables along with the class name and hash code of the object you are trying to print. We can remove that part by writing the code as follows.

public String toString() {
   return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE);
}
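
With SHORT_PREFIX_STYLE, the same object prints without the package name and hash code, something like:

Message[id=42,message=hello world,created=2017-03-06,author=vsubedi,comments=[],links=[]]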

There are other styles too, if you want to explore them. The dependency for Apache Commons Lang is as follows.

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.5</version>
</dependency>

Reading and Writing Text Files Using Java – Part 1

Manipulating text files is a core skill for a Java programmer. By manipulating, I mean reading a text file and writing into another text file. A text file is just a plain text file: you can create one in Notepad/WordPad on Windows or TextEdit on a Mac, and editors like Notepad++ and Sublime Text work on both. There are many ways to read and write text files depending on the JDK that you are using, with different classes available in the Java library. It also depends on the requirements of the project and the size of the file that you are trying to read and write. I will show you a couple of ways to read and write text files.

Let’s start with the important classes that we need to know. Most of the classes covered in the I/O Streams section are in the java.io package, while most of the classes covered in the File I/O section are in the java.nio.file package. The following are the building-block classes of Java I/O; there are a lot more on the Oracle website.

  •  Path – a file name together with its location
  •  Files – a class for operations on files
  •  Charset – for the encoding of a file
  •  Scanner – reads a file or takes input from the keyboard
  •  BufferedReader – buffered reading, e.g. readLine()
  •  BufferedWriter – buffered writing, e.g. newLine()
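
As a quick illustration of how Path, Files, and Charset fit together (JDK 7+), here is a minimal sketch; the file names are just examples:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class NioReadWriteExample {

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("myTest.txt");
        //reads the whole file into memory with an explicit encoding
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        //writes the same lines to a copy, again with an explicit encoding
        Files.write(Paths.get("myTest-copy.txt"), lines, StandardCharsets.UTF_8);
    }
}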

We always need to pay attention to exceptions, in particular IOException and FileNotFoundException. You can catch these exceptions and provide some useful information to the user. It is always best practice to handle the exception and explain its cause so that the user can understand what went wrong. Now I will show you different ways of reading and writing text files depending on the JDK that you are using.

Reading and Writing Files Using JDK < 7

Most entry-level Java programmers use FileReader and FileWriter. FileReader and FileWriter are widely used classes to read and write files, but they are a bit tricky, because they use the system’s default character encoding. If we want to read and write with a system-independent character encoding, we need to look for other classes. Here are a few recommended alternatives:

  •  FileInputStream fileInput = new FileInputStream("test.txt");
  •  InputStreamReader inputStream = new InputStreamReader(fileInput, "Your file encoding");
  •  FileOutputStream fileOutput = new FileOutputStream("test.txt");
  •  OutputStreamWriter outputStream = new OutputStreamWriter(fileOutput, "Your file encoding");
  •  Scanner scanner = new Scanner(new File("test.txt"), "Your file encoding");

You can always get the system-specific line separator by calling System.getProperty("line.separator"); it is the safest bet for the programmer. The following three methods each take a file as an argument and work with JDK 6 and lower.

Method 1: Using BufferedReader and InputStreamReader

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class ReadWriteFile {
    /**
     * Static method to read file using BufferedReader, InputStreamReader and FileInputStream
     * @param aFile
     */
    public static void readWriteFileMethod1(File aFile) {
        //reading from a file and writing into another file
        try {
            //Construct BufferedReader from InputStreamReader for fast reading
            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(aFile)));
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File("myWriteFile.txt"))));

            String line = null;
            while ((line = br.readLine()) != null) {
                /**
                 * You can split each line here if you need to, but you need to know
                 * the data delimiter ahead of time.
                 */
                System.out.println(line);

                //writing each line to myWriteFile.txt
                bw.write(line);
                bw.newLine();
            }

            //Closing resources. It is always a good idea to close resources when we no longer need them
            br.close();
            bw.close();
        } catch (FileNotFoundException e) {
            System.out.println(aFile +" file is not found in the path that you provided");
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void main(String[] args) {
        //reading a file using FileInputStream
        ReadWriteFile.readWriteFileMethod1(new File("../resources/myTest.txt"));
        System.out.println();
    }
}

Method 2: Using BufferedReader and FileReader

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class ReadWriteFile {
    /**
     * Static method to read file using BufferedReader, FileReader
     * @param aFile
     */
    public static void readWriteFileMethod2(File aFile) {
        // Construct BufferedReader from FileReader
        BufferedReader br;
        BufferedWriter bw;
        try {
            br = new BufferedReader(new FileReader(aFile));
            bw = new BufferedWriter(new FileWriter(new File("mySecondFile.txt")));

            String line = null;
            while ((line = br.readLine()) != null) {
                /**
                 * You can split each line here if you need to, but you need to know
                 * the data delimiter ahead of time.
                 */
                System.out.println(line);
                bw.write(line);
                bw.newLine();
            }

            //Closing resources. It is always a good idea to close resources when we no longer need them
            br.close();
            bw.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void main(String[] args) {
        //reading a file using FileReader and BufferedReader
        ReadWriteFile.readWriteFileMethod2(new File("../resources/myTest.txt"));
        System.out.println();
    }
}

Method 3: Using Scanner and FileInputStream

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class ReadWriteFile {
    /**
     * Static method to read file using Scanner and FileInputStream
     * @param aFile
     * @throws IOException 
     */
    public static void readWriteFileMethod3(File aFile) throws IOException {

        //String builder is just to append/concat the each line to the text
        StringBuilder text = new StringBuilder();
        Scanner scanner = new Scanner(new FileInputStream(aFile), "UTF-8");
        PrintWriter pw = new PrintWriter(new FileWriter(new File("outputFile.txt")));
        while (scanner.hasNextLine()) {
            /**
             * You can split each line here if you need to, but you need to know
             * the data delimiter ahead of time.
             */
            //read the line once so the buffer and the output file get the same line
            String line = scanner.nextLine();
            text.append(line + System.getProperty("line.separator"));
            pw.println(line);
        }
        System.out.println("The text that we read: " + text);
        scanner.close();
        pw.close();
      }

    public static void main(String[] args) throws IOException {
        //reading file using scanner
        try {
            ReadWriteFile.readWriteFileMethod3(new File("../MyWebProjectCodes/resources/myTest.txt"));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

The above code reads and writes files line by line. You can split each line into smaller pieces and put them into an array, create an object for later use, or insert them into a table using a JDBC connection.

You might ask: why so many ways to do the same thing? There are a couple of differences in thread safety and efficiency, but all of them work for reading a text file line by line. Method 1 uses InputStreamReader and Method 2 uses FileReader, and you might wonder what the difference is between those two classes. The Java doc on the Oracle website says: “An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset.” The good thing about InputStreamReader is that it can handle input streams other than files, such as network connections, classpath resources, ZIP files, etc.

On the other hand, according to the Java doc, FileReader is a “Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.” That said, FileReader does not allow you to specify an encoding other than the platform default, so if you need to read a file with a specific encoding, you can’t use FileReader. In summary, InputStreamReader is always a safer choice than FileReader because of the encoding control, and it can be used for other purposes as well, such as network connections and zipped files.
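
For instance, here is a minimal sketch of reading a classpath resource with an explicit encoding, something FileReader cannot do; the resource name myTest.txt is just a placeholder:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ClasspathReadExample {

    public static void main(String[] args) throws IOException {
        //getResourceAsStream returns an InputStream, which InputStreamReader can wrap
        InputStream in = ClasspathReadExample.class.getResourceAsStream("/myTest.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));

        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        br.close();
    }
}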

How to setup SQuirrel SQL Client for PostgreSQL/Greenplum?

SQuirreL SQL Client is a popular open source graphical application, written in Java, that allows users to browse tables and issue all kinds of SQL commands. Since it is written in Java, SQuirreL runs on a variety of operating systems. The recent version, 3.6, requires Java 1.6 or higher; if you are running an older version of Java, you should use an older SQuirreL release. Each database ships its own tool to access it, but SQuirreL SQL Client is the answer for people who want to work with all of their databases in one application, and I believe it is the best open source tool available for that. Here are a few tips to set it up to connect to a PostgreSQL/Greenplum database.

  1.  Download SQuirreL SQL Client from http://squirrel-sql.sourceforge.net/#installation
  2.  Go to the PostgreSQL website and download the right JDBC driver for your database version.
  3.  Launch the SQuirreL SQL Client.
  4.  Open the Drivers list from the left menu and click the plus sign "Create a New Driver".
  5.  Type "Greenplum Driver", or a name of your choice, in the "Name" text box.
  6.  Type jdbc:postgresql://<host>:<port>/<database> in the "Example URL" text box (for example, jdbc:postgresql://localhost:5432/mydb). The port number might be different for you.
  7.  Type org.postgresql.Driver in the "Class Name" editable drop-down list.
  8.  Click the "Extra Class Path" tab.
  9.  Click "Add" and select the PostgreSQL JDBC jar, e.g. postgresql-9.4-1201.jdbc41.jar.
  10.  Click OK, and we are done defining the driver.


At the end, create an alias for the database using the driver you just defined, providing the URL, username, and password.
