Recursion in Java

Recursion is when a method calls itself to solve a problem. A recursive solution is built from a base case and a recursive case, and every recursive solution needs both. If we miss the base case, the method will keep calling itself until it throws a StackOverflowError.

Base Case

A non-recursive branch that terminates the recursive path. It should be checked first in the method so the recursion can stop.

Recursive Case

A recursive method that calls itself one or multiple times to solve a problem.

The classic example of recursion is computing the factorial of a number. In mathematics, the factorial of a number is the product of that number and all of the positive integers below it. The factorial of 6 is 6 * 5 * 4 * 3 * 2 * 1 = 720. We can write a recursive factorial method in Java as follows:

public static int factorial(int n) {
	if (n <= 1) {
		return 1;
	} else {
		return n * factorial(n - 1);
	}
}

If we trace the calls of the recursive method above, we see the following:

factorial(6)
	factorial(5)
		factorial(4)
			factorial(3)
				factorial(2)
					factorial(1)
					return 1
				return 2*1 = 2
			return 3*2 = 6
		return 4*6 = 24
	return 5*24 = 120
return 6*120 = 720

In this example, n <= 1 is the base case, and any integer value greater than 1 triggers the recursive case.

One challenge in implementing a recursive solution is making sure that the recursive process always arrives at a base case. If the base case is never reached, the recursion continues indefinitely; in Java, this results in a StackOverflowError once the application recurses too deeply.

The following famous algorithms use a recursive strategy to solve a problem (a sketch of Euclid's algorithm is shown after the list):

  1. Euclid’s algorithm
  2. Towers of Hanoi
  3. Brownian bridge
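
As an illustration of the first item, here is a minimal recursive sketch of Euclid's algorithm for the greatest common divisor in Java; the method name and sample values are only for illustration:

public static int gcd(int a, int b) {
	// base case: when b is 0, a is the greatest common divisor
	if (b == 0) {
		return a;
	}
	// recursive case: gcd(a, b) is the same as gcd(b, a % b)
	return gcd(b, a % b);
}

For example, gcd(48, 36) calls gcd(36, 12), which calls gcd(12, 0) and returns 12.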

Properties File Formats for Java

The most common syntax is a property file containing key/value pairs in the following format. A log4j.properties file, for example, is typically defined this way:

#This is a stub log4j.properties file used by the bootstrap process

log4j.rootCategory=info, stdout, rf

#---------------------
#Log to Console
#---------------------
#Log messages to the console for priority INFO, WARN, ERROR, and FATAL
#To Log debug or trace, set log4j.rootCategory to DEBUG or TRACE
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
#
#DEFAULT - log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
#Pattern 1 - full class name - log4j.appender.stdout.layout.ConversionPattern=%d %p [%t:%c] - %m%n
#Pattern 2 - no class name
#
log4j.appender.stdout.layout.ConversionPattern=%d %-5p [%c{2}:%t] - %m%n
log4j.appender.stdout.Threshold=info


#---------------------
#Log to Rolling File
#---------------------
#Log messages to the log file (backed up each day) for priority INFO, WARN, ERROR, and FATAL
#To Log debug or trace, set log4j.rootCategory to DEBUG or TRACE
log4j.appender.rf=org.apache.log4j.DailyRollingFileAppender
log4j.appender.rf.File=log/app-bootstrap.log
log4j.appender.rf.DatePattern='.'yyyy-MM-dd
log4j.appender.rf.Append=true
log4j.appender.rf.layout=org.apache.log4j.PatternLayout
log4j.appender.rf.layout.ConversionPattern=%d %-5p [%c{2}:%t] - %m%n
log4j.appender.rf.Threshold=debug
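
If the file is not picked up automatically from the classpath, you can point log4j 1.x at it explicitly. A minimal sketch using PropertyConfigurator, with the file path as an assumption, looks like this:

import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class LoggingBootstrap {
	private static final Logger logger = Logger.getLogger(LoggingBootstrap.class);

	public static void main(String[] args) {
		// load the appender and layout definitions from the properties file above
		PropertyConfigurator.configure("log4j.properties");
		logger.info("log4j configured from the properties file");
	}
}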

There is more to it than that. There are actually two other formats you can use to express these pairs. Even if you never use them yourself, it is good to recognize them in case you encounter them in someone else's code:

name:David
name David

You might wonder how to express some other ideas in a property file. The common ones are as follows:

  1. If a line begins with # or !, it is a comment. The # syntax is used for the comments in the property file definition above.
  2. Spaces before or after the separator character are ignored.
  3. Spaces at the beginning of a line are ignored.
  4. Spaces at the end of a line are not ignored.
  5. End a line with a backslash if you want to break the line for readability.
  6. You can use normal Java escape characters like \t and \n.

If we put these concepts together, we can write the following:

# one comment
! second comment
key =    value
string = value \tafter tab
long = lkajdfljasdlj\
nandfsdlkjfdlj

Printing out these three properties in a program gives us this:

value
value [tab] after tab
lkajdfljasdljnandfsdlkjfdlj

 
It is good to know the different ways a properties file can be written. There are also various ways to read these kinds of property files. One way is to use a FileInputStream to load all the properties of a file into a Properties object, as follows:

public static void main(String[] args) {
	Properties prop = new Properties();
	InputStream input = null;
	try {
		input = new FileInputStream("config.properties");

		// load a properties file
		prop.load(input);

		// get the property value and print it out
		System.out.println(prop.getProperty("database"));
		System.out.println(prop.getProperty("dbuser"));
		System.out.println(prop.getProperty("dbpassword"));
	} catch (IOException ex) {
		ex.printStackTrace();
	} finally {
		if (input != null) {
			try {
				input.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
  }
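
For reference, a hypothetical config.properties that this snippet could read might look like the following; the keys match the getProperty() calls above and the values are made up:

database=localhost:3306/mydb
dbuser=appuser
dbpassword=secret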

Another way is to use ResourceBundle. ResourceBundle.getBundle() takes the property file base name and a Locale object to decide which property file to load. It loads the file from the project's resource folder (the classpath).


// ResourceBundle class will use SystemMessages.properties file
ResourceBundle resourceBundle = ResourceBundle.getBundle(
"SystemMessages", Locale.getDefault());
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");

// ResourceBundle class will use SystemMessages_es.properties file
resourceBundle = ResourceBundle.getBundle("SystemMessages",
Locale.forLanguageTag("es"));
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");

// ResourceBundle class will use SystemMessages_fr.properties file
resourceBundle = ResourceBundle.getBundle("SystemMessages",
Locale.FRANCE);
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");
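
For this snippet to work, the resource folder needs the corresponding bundle files. Hypothetical contents, assuming only the first_name and last_name keys, might be:

# SystemMessages.properties (default)
first_name=First name
last_name=Last name

# SystemMessages_es.properties
first_name=Nombre
last_name=Apellido

A SystemMessages_fr.properties file would provide the French values for Locale.FRANCE in the same way.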

If you want to learn more about ResourceBundle, please visit https://docs.oracle.com/javase/tutorial/i18n/resbundle/propfile.html for more information.
 

Creating a JanusGraph using Gremlin Console

JanusGraph is a new community project under the Linux Foundation, forked from the TitanDB codebase. JanusGraph supports the property graph model through Apache TinkerPop (the open source graph computing framework) and its Gremlin graph traversal language. According to the JanusGraph website:

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

I will show how you can run JanusGraph locally so that you can try it out yourself, since it might become the de-facto implementation of TinkerPop. I will be using Cassandra and Elasticsearch as backend stores.

Procedure

  1. Start Gremlin console as per Starting with Gremlin Console
  2. Define your connection properties in a properties file to connect to the Cassandra and Elasticsearch Docker images, as follows (the hostnames of Cassandra and Elasticsearch might differ on your machine):

    storage.backend=cassandrathrift
    storage.hostname=localhost
    cache.db-cache=true
    cache.db-cache-clean-wait=20
    cache.db-cache-time=180000
    cache.db-cache-size=0.25
    index.search.backend=elasticsearch
    index.search.hostname=localhost
    index.search.elasticsearch.client-only=true

  3. Run docker pull cassandra:2.1.9 to pull Cassandra from Docker Hub. Note that this is an older version of Cassandra.
  4. Run docker pull elasticsearch:2.4.4 to pull Elasticsearch from Docker Hub.
  5. After a successful pull, run the following command to start the Cassandra Docker image: docker run -e CASSANDRA_START_RPC=true --name cassandra-latest -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9042:9042 -p 9160:9160 -t -d cassandra:2.1.9
  6. Check the Cassandra container logs and confirm that it started successfully.
  7. Run docker run --name elasticsearch-2.4.4 -p 9300:9300 -p 9200:9200 -t -d elasticsearch:2.4.4 to start the Elasticsearch Docker image.
  8. Verify that both containers are running with the docker ps command.
  9. Run the following command in the Gremlin console to connect to your backend.
    gremlin> graph = JanusGraphFactory.open('.../website/janusgraph.properties')
    ==> standardjanusgraph[cassandrathrift:[localhost]]
  10. Once a graph exists, a graph traversal source g is configured that allows the graph to be traversed. A graph traversal is used to query the graph data and return results, and it is bound to a specific traversal source, here the standard JanusGraph traversal engine.

  11. Create a traversal instance so that we can query our graph database. We can also use the traversal instance to create vertices.
    gremlin> g = graph.traversal()
    ==>graphtraversalsource[standardjanusgraph[cassandrathrift:[localhost]], standard]
  12. The graph commands usually add vertices and edges to the database or retrieve other graph information; the g commands generally run queries to obtain results.

  13. Load some vertices and start playing with JanusGraph. A small example is sketched below.
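
As a starting point for the last step, a minimal sketch in the Gremlin console might look like the following; the vertex label, property names, and values are only examples:

gremlin> v1 = g.addV('movie').property('Title', 'The Matrix').next()
gremlin> v2 = g.addV('person').property('name', 'Keanu Reeves').next()
gremlin> v2.addEdge('actedIn', v1)
gremlin> graph.tx().commit()
gremlin> g.V().has('Title', 'The Matrix').values('Title')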

Starting Gremlin Console With JanusGraph in Linux

Gremlin is the graph query language used to interact with JanusGraph. One way to run Gremlin code against JanusGraph is the Gremlin Console, which ships with JanusGraph and comes with the JanusGraph plugin. The Gremlin Console is a very useful interactive environment for directly writing Gremlin queries to create a graph schema, load data, administer the graph, and retrieve traversal results.

Procedure

  1. Clone JanusGraph from its GitHub repository https://github.com/JanusGraph/janusgraph/
  2. Run the mvn clean install -DskipTests=true command from inside the janusgraph folder. -DskipTests=true skips all the tests during the build. You can run the tests if you want, but they take around 5 hours; even with -DskipTests=true the build takes around 25 minutes.
  3. Once step 2 completes successfully, navigate to the bin folder (it is only created by a successful build).
  4. Run the ./gremlin.sh command from your console. If you are not inside the bin folder, you can run bin/gremlin.sh instead.
  5. The console output shows the Gremlin logo and the list of plugins that are activated.


Note: Six plugins are activated by default. The Gremlin Server plugin, tinkerpop.server, is started so that commands can be issued to JanusGraph. The utilities plugin, tinkerpop.utilities, provides various functions, helper methods, and imports of external classes that are useful in the Gremlin console.

Discover all Gremlin console commands with the :help command. Console commands are not Gremlin language commands, but rather commands issued to the Gremlin console for shell functionality. The Gremlin console is based on the Groovy language.


Testing Void Method Using Mockito

Mockito is an industry-wide mocking framework for unit testing. It gives developers the ability to write and test their code without depending on other developers' code. Testing a void method is harder because it doesn't return a value. In this blog, I will show how we can test a void method using Mockito.

Purpose:

Unit testing void method of a class.

Dependency:

<!-- https://mvnrepository.com/artifact/org.mockito/mockito-all -->
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.10.19</version>
</dependency>

Implementation:

Class to hold void method:

package com.vsubedi.voidTest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SomeVoidMethod {
	private static final Logger logger = LoggerFactory.getLogger(SomeVoidMethod.class);

	public void printLogs(){
		logger.info("This method has been executed");
	}
}

Testing above class’s void method:

package com.vsubedi.voidTest;

import static org.mockito.Mockito.verify;

import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.mockito.InjectMocks;
import org.mockito.MockitoAnnotations;
import org.mockito.Spy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TestVoidMethod {
	private static final Logger logger = LoggerFactory.getLogger(TestVoidMethod.class);

	@InjectMocks
	@Spy
	private SomeVoidMethod someVoidMethod;

	@BeforeClass
	public static void setUpBeforeClass() throws Exception {
		logger.info("============ START UNIT TEST ==============");
	}

	@AfterClass
	public static void tearDownAfterClass() throws Exception {
		logger.info("============ END UNIT TEST ==============");
	}

	@Before
	public void setUp() throws Exception {
		MockitoAnnotations.initMocks(this);
	}

	@Test
	public void testPrintLogs() {
		someVoidMethod.printLogs();

		//verifying the interaction to the method
		verify(someVoidMethod).printLogs();
	}
}

We need the @Spy annotation because Mockito's verify() only works on mocks and spies; here it wraps a real SomeVoidMethod instance in a spy. verify(someVoidMethod).printLogs() asserts that printLogs() of the SomeVoidMethod class was executed exactly once.

This is just a very simple example of how we can test a void method. Void methods can be more complex, and we can test those with Mockito as well; one common pattern is sketched below.
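
For instance, a void method that delegates to a collaborator can be tested by mocking the collaborator and verifying the arguments passed to it. The following is a minimal sketch; EmailService and AuditLog are made-up classes for illustration:

import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.Test;
import org.mockito.ArgumentCaptor;

public class TestComplexVoidMethod {

	// hypothetical collaborator with a void method
	static class AuditLog {
		void record(String message) {
			// writes the message somewhere
		}
	}

	// hypothetical class under test whose void method delegates to the collaborator
	static class EmailService {
		private final AuditLog auditLog;

		EmailService(AuditLog auditLog) {
			this.auditLog = auditLog;
		}

		void send(String recipient) {
			auditLog.record("sent mail to " + recipient);
		}
	}

	@Test
	public void testSendRecordsAudit() {
		AuditLog auditLog = mock(AuditLog.class);
		EmailService emailService = new EmailService(auditLog);

		emailService.send("david@example.com");

		// capture the argument passed to the collaborator's void method and assert on it
		ArgumentCaptor<String> captor = ArgumentCaptor.forClass(String.class);
		verify(auditLog).record(captor.capture());
		assertTrue(captor.getValue().contains("david@example.com"));
	}
}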

Testing Mapping with Embedded Elasticsearch Server

A running Elasticsearch cluster is usually not available for unit tests, and a unit test should not depend on an external running instance anyway. If you want to test real Elasticsearch behavior in a unit test, you can use an embedded Elasticsearch server instance. An embedded Elasticsearch server is a small, in-process instance that behaves like a regular Elasticsearch cluster.
You need the following dependencies in your POM file to run Elasticsearch embedded in your application for unit tests.

	<dependency>
	    <groupId>org.elasticsearch</groupId>
	    <artifactId>elasticsearch</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>org.elasticsearch.client</groupId>
	    <artifactId>transport</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>com.google.guava</groupId>
	    <artifactId>guava</artifactId>
	    <version>21.0</version>
	</dependency>

The following code tests whether the mapping has been created in Elasticsearch. We could do this with Mockito, but I don't think Mockito alone gives us full confidence in the code, so I am using embedded Elasticsearch to make sure I can create the index and mapping without any issue. If you want to see the mapping, please click here for my mapping tutorial.

package com.vsubedi.elasticsearch;

import static org.junit.Assert.*;

import java.io.File;
import java.io.IOException;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.io.Files;

/**
 *
 * @author vsubedi
 *
 */
public class PrepareIndexTest {

	private static final Logger logger = LoggerFactory.getLogger(PrepareIndexTest.class);

	private static Node server;
	private static Client client;
	private static File tempDir;
	private static String index = "test";
	private static String docType = "movies";

	@BeforeClass
	public static void setUpBeforeClass() throws Exception {
		logger.info("======= START INTEGRATION TEST ========");

		// spinning up the elasticsearch server for junit
		tempDir = Files.createTempDir();
		logger.info(tempDir.getAbsolutePath());
		Settings settings = Settings.builder().put("path.home", tempDir.getAbsolutePath())
				.put("transport.type", "local")
				.put("http.enabled", false)
				.build();
		server = new Node(settings);
		final String clusterName = server.settings().get("cluster.name");

		logger.info("starting server with cluster-name: [{}]", clusterName);
		server.start();

		client = server.client();
	}

	@AfterClass
	public static void tearDownAfterClass() throws Exception {
		DeleteIndexResponse deleteIndexResponse = client.admin().indices()
				.prepareDelete(index).get();
		assertTrue(deleteIndexResponse.isAcknowledged());
		tempDir.delete();
		client.close();
		server.close();
		logger.info("======= END INTEGRATION TEST ========");
	}

	@Test
	public void testPrepareIndex() throws IOException {
		//creating index
		IndexAndCreateMapping.prepareIndex(client, index, docType);

		//checking if the index has been created or not
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(index).get();

		//it should be true
		assertTrue(indexResponse.isExists());
	}

}

The code above is for Elasticsearch version 5.2.1. If you want to run it against earlier versions such as 2.4.x, you need to use NodeBuilder to create an embedded Elasticsearch instance as follows:

tempDir = Files.createTempDir();
logger.info(tempDir.getAbsolutePath());
Settings settings = Settings.builder().put("path.home",
tempDir.getAbsolutePath()).build();
server = NodeBuilder.nodeBuilder().settings(settings).build();
final String clusterName = server.settings().get("cluster.name");

logger.info("starting server with cluster-name: [{}]", clusterName);
server.start();

client = server.client();

with the following dependency:

		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>2.4.1</version>
			<scope>test</scope>
			<type>test-jar</type>
		</dependency>

Mapping in Elasticsearch

Mapping is the process of defining how a document and the fields it contains are stored and indexed. Defining a mapping is useful for cases such as the following:

  1. Which string fields should be treated as full-text fields.
  2. Which fields contain numbers, dates, or geolocations.
  3. Whether the values of all fields in the document should be indexed into the catch-all _all field.
  4. The format of date values.
  5. Custom rules to control the mapping for dynamically added fields.

Mapping can be done in two ways in Elasticsearch.

Dynamic Mappings

This is the default mapping behavior provided by Elasticsearch. Fields and mapping types do not need to be defined before being used; new mapping types and new field names are added automatically, simply by indexing a document. An example of indexing into a brand-new index is shown below.
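
For example, with dynamic mapping, indexing a single document into a brand-new index (the index, type, and values here are hypothetical) is enough for Elasticsearch to create the index, the mapping type, and field mappings for Title and Year automatically:

curl -XPOST 'localhost:9200/movies/movie/1?pretty' -d'
{
  "Title" : "Forrest Gump",
  "Year" : 1994
}'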

Explicit Mappings

Dynamic mapping is useful for getting started, but at some point we will want to specify our own explicit mappings, since we know our data better than Elasticsearch does. We can create mapping types and field mappings when we create an index, and we can also add mapping types and fields to an existing index with Elasticsearch's PUT mapping API.

We can create mapping for an index in two ways.

  1. Programmatically using Java
  2. Using CLI

Programmatically using Java

Mapping can be created at the time of index creation, and it is recommended to create the mapping before adding any data to the index. Since we know most of the fields beforehand, it is easy to do this programmatically. If the index already exists, creating it again throws an exception, so we need a safety check to avoid this. The code looks like the following:

import java.io.IOException;

import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class IndexAndCreateMapping {

	public static void prepareIndex(Client client, String indexName, String documentType) throws IOException {
		//looking for index if that exists already
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(indexName).get();

		//need to have some logic to create mapping only when the index is not created yet
		if (!indexResponse.isExists()) {
			XContentBuilder builder = getMappingBuilder();
			//creating mapping
			client.admin().indices().prepareCreate(indexName).addMapping(documentType, builder).get();

			//refreshes the index so that it can be accessed instantly
			client.admin().indices().prepareRefresh().get();
		}
	}

	private static XContentBuilder getMappingBuilder() throws IOException {
		return XContentFactory.jsonBuilder().prettyPrint().startObject()
				.startObject("movie")
				.startObject("properties")
				.startObject("Directory").field("type", "string").field("index", "not_analyzed").endObject()
				.startObject("Title").field("type", "string").endObject()
				.startObject("Generes").field("type", "string").endObject()
				.startObject("Year").field("type", "integer").endObject()
				.endObject()
				.endObject()
				.endObject();
	}
}

We can add new mapping types and fields to an existing index with the PUT mapping API, as sketched below.
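
A rough sketch of doing this from Java with the transport client is shown below; preparePutMapping is the admin-client builder for the PUT mapping API, the class and method names are my own, and the Hero field mirrors the CLI example later in this post:

import java.io.IOException;

import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class AddFieldMapping {

	public static void addHeroField(Client client, String indexName, String documentType) throws IOException {
		// build the mapping fragment for the new Hero field
		XContentBuilder builder = XContentFactory.jsonBuilder().startObject()
				.startObject("properties")
				.startObject("Hero").field("type", "string").field("index", "not_analyzed").endObject()
				.endObject()
				.endObject();

		// PUT mapping API: adds the Hero field to the existing document type
		client.admin().indices().preparePutMapping(indexName)
				.setType(documentType)
				.setSource(builder)
				.get();
	}
}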

Using CLI

We can run the PUT mapping API from the command line. The following curl command does the following:

  1. Creates an index called movies.
  2. Adds a mapping type called movie.
  3. Specifies the fields or properties of the mapping type.
  4. Specifies the data type and mapping for each field.

curl -XPUT 'localhost:9200/movies?pretty' -d'
{
  "movie" : {
    "properties" : {
      "Director" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "Title" : {
        "type" : "string"
      },
      "Generes" : {
        "type" : "string"
      },
      "Year" : {
        "type" : "integer"
      }
    }
  }
}'

Updating Existing Mappings

Existing type and field mappings cannot be updated unless otherwise documented. Changing the mapping would mean invalidating already indexed documents. Instead, we should create a new index with correct mappings and re-index the data. However, there are some exceptions to this rule. For instance:

  1. New properties can be added to object datatype fields.
  2. New multi-fields can be added to existing fields.
  3. The ignore_above parameter can be updated.

We can add a mapping to an existing index using the CLI as follows. In this example, we add a Hero field for each movie:

curl -XPUT 'localhost:9200/movies/movies/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed"
        }
    }
}'

When we set index to not_analyzed, Elasticsearch will not analyze the string while indexing the data. This means that searches will match against the whole value rather than individual tokens.
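
To illustrate, the following hypothetical query (assuming a movie was indexed with Hero set to "Tom Hanks") only finds the document when the whole value is given, because the not_analyzed field is stored as a single token:

curl -XPOST 'localhost:9200/movies/_search?pretty' -d'
{
  "query" : {
    "term" : {
      "Hero" : "Tom Hanks"
    }
  }
}'

Searching for just "Tom" with this term query returns no hits, because term queries are not analyzed and the stored value is the single token "Tom Hanks".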

We can also update the existing mapping by adding an updatable parameter, such as ignore_above, to the Hero field:

curl -XPUT 'localhost:9200/movies/movies/_mapping?pretty' -d'
{
    "properties" : {
      "Hero" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 100
        }
    }
}'

The ignore_above setting tells Elasticsearch not to index any Hero value longer than 100 characters.

Mapping is mostly a one-time task. In my opinion, it is better to define the mapping from the command line when the index is created, before adding any data to Elasticsearch. However, we can also add it programmatically, which requires checking whether the index already exists first.