NavigableSet Interface

We use a Set in Java when we don’t want to store duplicate entries. The two most common implementations of Set are HashSet and TreeSet.

HashSet stores its elements in a hash table, which means it uses the hashCode() method of its objects to store and retrieve them efficiently. The main benefit of HashSet is that adding an element and searching for an element both take constant time. HashSet is the most commonly used implementation of Set.

TreeSet stores its elements in a tree structure, which means they are kept in sorted order. Compared to HashSet, the main tradeoff is the performance of adding and searching: both take O(log n) time. TreeSet implements a special interface called NavigableSet.

NavigableSet is very useful if we want to find the next lower or higher element relative to some value. This interface has some interesting methods.

Method          Description
E lower(E e)    Returns the greatest element that is < e, or null if there is no such element
E floor(E e)    Returns the greatest element that is <= e, or null if there is no such element
E ceiling(E e)  Returns the smallest element that is >= e, or null if there is no such element
E higher(E e)   Returns the smallest element that is > e, or null if there is no such element

The NavigableSet page on the Oracle site is the best Javadoc for the NavigableSet interface. Let’s look at a very simple example.

import java.util.NavigableSet;
import java.util.TreeSet;

public class NavigableSetTest {

	public static void main(String[] args) {
		NavigableSet<Integer> set = new TreeSet<>();
		for (int i = 0; i <= 20; i++) {
			set.add(i);
		}
		System.out.println(set.lower(10)); //9
		System.out.println(set.floor(10)); //10
		System.out.println(set.ceiling(20)); //20
		System.out.println(set.higher(20)); //null

	}
}

The TreeSet above contains 21 Integers, with values from 0 to 20. The call set.lower(10) must return the greatest element that is less than 10, according to the Javadoc; it returns 9, which is correct. The call set.floor(10) must return the greatest element that is no higher than 10, so it returns 10. The main difference is that one call includes the target element while the other does not.

The call set.ceiling(20) must return the smallest element that is greater than or equal to 20, so it returns 20. The call set.higher(20) must return the smallest element that is greater than 20; no element meets that criterion, so the result is null.

Tip: the lower and higher methods exclude the target element, while the ceiling and floor methods include it.
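
When the target value is not present in the set, the inclusive and exclusive variants return the same neighbor. A quick sketch; the set of even numbers is my own example:

NavigableSet<Integer> evens = new TreeSet<>();
for (int i = 0; i <= 20; i += 2) {
	evens.add(i);
}
// 7 is not in the set, so lower/floor and higher/ceiling agree
System.out.println(evens.lower(7));   // 6
System.out.println(evens.floor(7));   // 6
System.out.println(evens.ceiling(7)); // 8
System.out.println(evens.higher(7));  // 8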

Recursion in Java

Recursion is when a method calls itself to solve a problem. A recursive solution is constructed from a base case and a recursive case; every recursive solution needs both components. If we miss the base case, the method will call itself endlessly and throw a StackOverflowError.

Base Case

A non-recursive branch that terminates the recursive path. Its check should run first in the method so that the recursion can stop.

Recursive Case

The branch in which the method calls itself one or more times to solve a smaller version of the problem.

The classic example of recursion is computing the factorial of a number. In mathematics, a factorial is what you get when you multiply a number by all of the positive integers below it. The factorial of 6 is 6 * 5 * 4 * 3 * 2 * 1 = 720. We can write a recursive factorial function in Java as follows:

public static int factorial(int n) {
	if (n <= 1) {
		return 1;
	} else {
		return n * factorial(n - 1);
	}
}

If we trace the calls of the recursive function above for factorial(6), we see the following:

factorial(6)
	factorial(5)
		factorial(4)
			factorial(3)
				factorial(2)
					factorial(1)
					return 1
				return 2*1 = 2
			return 3*2 = 6
		return 4*6 = 24
	return 5*24 = 120
return 6*120 = 720

In this example, you can see that n <= 1 is the base case, and any integer value greater than 1 triggers the recursive case.

One challenge in implementing a recursive solution is making sure that the recursive process always arrives at a base case. If the base case is never reached, the solution continues infinitely and the program hangs; in Java, this results in a StackOverflowError whenever the application recurses too deeply.
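
For example, this variant (my own illustration) omits the base case and throws a StackOverflowError for any input:

public static int badFactorial(int n) {
	// no base case: the calls never stop, so the call stack
	// grows until the JVM throws a StackOverflowError
	return n * badFactorial(n - 1);
}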

The following famous algorithms use a recursive strategy to solve a problem (a sketch of the first follows the list):

  1. Euclid’s algorithm
  2. Towers of Hanoi
  3. Brownian bridge
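
As an illustration of the first of these, Euclid’s algorithm computes the greatest common divisor recursively; b == 0 is the base case:

public static int gcd(int a, int b) {
	if (b == 0) {
		return a; // base case: gcd(a, 0) = a
	}
	return gcd(b, a % b); // recursive case: shrink the problem
}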

Properties File Formats for Java

The most common syntax is a property file containing key/value pairs in the following format. For example, a log4j.properties file is typically defined this way:

#This is a stub log4j.properties file used by the bootstrap process

log4j.rootCategory=info, stdout, rf

#---------------------
#Log to Console
#---------------------
#Log messages to the console for priority INFO, WARN, ERROR, and FATAL
#To Log debug or trace, set log4j.rootCategory to DEBUG or TRACE
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
#
#DEFAULT - log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
#Pattern 1 - full class name - log4j.appender.stdout.layout.ConversionPattern=%d %p [%t:%c] - %m%n
#Pattern 2 - no class name
#
log4j.appender.stdout.layout.ConversionPattern=%d %-5p [%c{2}:%t] - %m%n
log4j.appender.stdout.Threshold=info


#---------------------
#Log to Rolling File
#---------------------
#Log messages to the log file (backed up each day) for priority INFO, WARN, ERROR, and FATAL
#To Log debug or trace, set log4j.rootCategory to DEBUG or TRACE
log4j.appender.rf=org.apache.log4j.DailyRollingFileAppender
log4j.appender.rf.File=log/app-bootstrap.log
log4j.appender.rf.DatePattern='.'yyyy-MM-dd
log4j.appender.rf.Append=true
log4j.appender.rf.layout=org.apache.log4j.PatternLayout
log4j.appender.rf.layout.ConversionPattern=%d %-5p [%c{2}:%t] - %m%n
log4j.appender.rf.Threshold=debug

There is more to it than that. There are actually two other formats that you can use to express these pairs. Even if you never use them in your own work, it is good to recognize them in case you encounter them in code:

name:David
name David

You might wonder how to express some other ideas in a property file. The common rules are as follows:

  1. If a line begins with # or !, it is a comment. I use the # syntax for comments in the property file definition above.
  2. Spaces before or after the separator character are ignored.
  3. Spaces at the beginning of a line are ignored.
  4. Spaces at the end of a line are not ignored.
  5. End a line with a backslash if you want to break the line for readability.
  6. You can use normal Java escape characters like \t and \n.

If we put these concepts together, we can write the following:

# one comment
! second comment
key =    value
string = value \tafter tab
long = lkajdfljasdlj\
nandfsdlkjfdlj

Printing out these three properties in a program gives us the following:

value
value [tab] after tab
lkajdfljasdljnandfsdlkjfdlj

It is always good to know the different ways a properties file can be written. There are also various ways to read these kinds of property files. One way is to use a FileInputStream to load all the properties from a file into a Properties object, as follows:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ReadPropertiesFile {

	public static void main(String[] args) {
		Properties prop = new Properties();
		InputStream input = null;
		try {
			input = new FileInputStream("config.properties");

			// load a properties file
			prop.load(input);

			// get the property values and print them out
			System.out.println(prop.getProperty("database"));
			System.out.println(prop.getProperty("dbuser"));
			System.out.println(prop.getProperty("dbpassword"));
		} catch (IOException ex) {
			ex.printStackTrace();
		} finally {
			if (input != null) {
				try {
					input.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}
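
On Java 7 and later, a try-with-resources block closes the stream automatically, which makes the same logic tidier. A minimal sketch of the equivalent code:

try (InputStream input = new FileInputStream("config.properties")) {
	Properties prop = new Properties();
	prop.load(input);
	// the stream is closed automatically, even if load() throws
	System.out.println(prop.getProperty("database"));
} catch (IOException ex) {
	ex.printStackTrace();
}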

Another way is to use ResourceBundle. ResourceBundle.getBundle() takes the base name of the property file and a Locale object to decide which property file to load. It loads the file from the classpath, typically the resources folder of a project.


// ResourceBundle class will use SystemMessages.properties file
ResourceBundle resourceBundle = ResourceBundle.getBundle(
		"SystemMessages", Locale.getDefault());
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");

// ResourceBundle class will use SystemMessages_es.properties file
resourceBundle = ResourceBundle.getBundle("SystemMessages",
		Locale.forLanguageTag("es"));
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");

// ResourceBundle class will use SystemMessages_fr.properties file
resourceBundle = ResourceBundle.getBundle("SystemMessages",
		Locale.FRANCE);
logger.info(resourceBundle.getString("first_name") + ": David");
logger.info(resourceBundle.getString("last_name") + ": Dang");
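
For reference, the bundle files this code reads might look like the following; the keys match the getString() calls above, and the values are my own illustration:

# SystemMessages.properties (default locale)
first_name=First Name
last_name=Last Name

# SystemMessages_es.properties (Spanish)
first_name=Nombre
last_name=Apellido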

If you want to learn more about ResourceBundle, please visit https://docs.oracle.com/javase/tutorial/i18n/resbundle/propfile.html for more information.

Creating a JanusGraph using Gremlin Console

JanusGraph is a new community project under the Linux Foundation, forked from the TitanDB code. JanusGraph supports the property graph model through Apache TinkerPop (the open source graph computing framework) and its Gremlin graph traversal language. According to the JanusGraph website:

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

I will show how you can run JanusGraph locally so that you can try it out yourself, since it might become the de facto implementation of TinkerPop. I will be using Cassandra and Elasticsearch as the backend databases.

Procedure

  1. Start the Gremlin console as described in Starting Gremlin Console With JanusGraph in Linux below.
  2. Define your connection properties in a properties file to connect to the Cassandra and Elasticsearch Docker images, as follows (the hostnames of Cassandra and Elasticsearch might differ on your machine):

    storage.backend=cassandrathrift
    storage.hostname=localhost
    cache.db-cache=true
    cache.db-cache-clean-wait=20
    cache.db-cache-time=180000
    cache.db-cache-size=0.25
    index.search.backend=elasticsearch
    index.search.hostname=localhost
    index.search.elasticsearch.client-only=true

  3. Run the docker pull cassandra:2.1.9 command to pull Cassandra from Docker Hub. Note that this is an older version of Cassandra.
  4. Run the docker pull elasticsearch:2.4.4 command to pull Elasticsearch from Docker Hub.
  5. After a successful pull, run the following command to start the Cassandra container: docker run -e CASSANDRA_START_RPC=true --name cassandra-latest -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9042:9042 -p 9160:9160 -t -d cassandra:2.1.9
  6. Check the container logs (docker logs cassandra-latest) and look for a message indicating that Cassandra has started and is listening for clients.
  7. Run the docker run --name elasticsearch-2.4.4 -p 9300:9300 -p 9200:9200 -t -d elasticsearch:2.4.4 command to start the Elasticsearch container.
  8. Verify that both containers are running with the docker ps command.
  9. Run the following command to connect to your backend using the Gremlin console:
    gremlin> graph = JanusGraphFactory.open('.../website/janusgraph.properties')
    ==> standardjanusgraph[cassandrathrift:[localhost]]
  10. Once a graph exists, a graph traversal source g can be configured. A graph traversal is used to query the graph data and return results, and it is bound to a specific traversal source, here the standard JanusGraph traversal engine.

  11. Create the traversal instance so that we can query our graph database (we can also use it to create vertices):
    gremlin> g = graph.traversal()
    ==>graphtraversalsource[standardjanusgraph[cassandrathrift:[localhost]], standard]
  12. The graph commands generally add vertices and edges to the database or fetch other graph information; the g commands generally run queries to obtain results.

  13. Load some vertices and start playing with JanusGraph, as in the sketch below.
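
A minimal console sketch of step 13; the 'person' label and 'name' property are my own example, not a schema the post defines:

    gremlin> g.addV('person').property('name', 'hercules').next()
    gremlin> graph.tx().commit()
    gremlin> g.V().has('name', 'hercules').values('name')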

Starting Gremlin Console With JanusGraph in Linux

Gremlin is the graph query language used to interact with JanusGraph. One way to run Gremlin code against JanusGraph is the Gremlin Console, which ships with JanusGraph via the JanusGraph plugin. The Gremlin Console is a very useful interactive environment for directly writing Gremlin queries to create a graph schema, load data, administer the graph, and retrieve traversal results.

Procedure

  1. Clone JanusGraph from its GitHub repository https://github.com/JanusGraph/janusgraph/
  2. Run the mvn clean install -DskipTests=true command from your console. You need to be inside the janusgraph folder to run this command. -DskipTests=true skips all the tests in the build; you can run them if you want, but they take around 5 hours. Even with -DskipTests=true, the build takes around 25 minutes.
  3. Once step 2 completes successfully, navigate to the bin folder (it is only created by a successful build).
  4. Run the ./gremlin.sh command from your console. If you are not inside the bin folder, you can run bin/gremlin.sh instead.
  5. The console starts, printing a banner followed by the list of activated plugins.


Note: Six plugins are activated by default. The Gremlin Server plugin, tinkerpop.server, is started so that commands can be issued to JanusGraph. The utilities plugin, tinkerpop.utilities, provides various functions, helper methods, and imports of external classes that are useful in the Gremlin console.

Discover all Gremlin console commands with the help command. Console commands are not Gremlin language commands, but rather commands issued to the Gremlin console for shell functionality. The Gremlin console is based on the Groovy language.
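
For example, typing :help at the prompt lists the available console commands:

gremlin> :help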


Testing Void Method Using Mockito

Mockito is an industry-standard mocking framework for unit testing. It gives developers the ability to write their code without depending on other developers’ code. Testing a void method is hard because it doesn’t return any value. In this blog, I will show how we can test a void method using Mockito.

Purpose:

Unit testing a void method of a class.

Dependency:

<!-- https://mvnrepository.com/artifact/org.mockito/mockito-all -->
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.10.19</version>
</dependency>

Implementation:

The class containing the void method:

package com.vsubedi.voidTest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SomeVoidMethod {
	private static final Logger logger = LoggerFactory.getLogger(SomeVoidMethod.class);

	public void printLogs(){
		logger.info("This method has been executed");
	}
}

Testing the void method of the class above:

package com.vsubedi.voidTest;

import static org.mockito.Mockito.verify;

import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.mockito.InjectMocks;
import org.mockito.MockitoAnnotations;
import org.mockito.Spy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TestVoidMethod {
	private static final Logger logger = LoggerFactory.getLogger(TestVoidMethod.class);

	@InjectMocks
	@Spy
	private SomeVoidMethod someVoidMethod;

	@BeforeClass
	public static void setUpBeforeClass() throws Exception {
		logger.info("============ START UNIT TEST ==============");
	}

	@AfterClass
	public static void tearDownAfterClass() throws Exception {
		logger.info("============ END UNIT TEST ==============");
	}

	@Before
	public void setUp() throws Exception {
		MockitoAnnotations.initMocks(this);
	}

	@Test
	public void testPrintLogs() {
		someVoidMethod.printLogs();

		//verifying the interaction to the method
		verify(someVoidMethod).printLogs();
	}
}

We need the @Spy annotation so that we can call verify(), which is supplied by Mockito. verify() asserts that printLogs() of the SomeVoidMethod class was executed exactly once (the default verification mode).

This is just a very simple example of how we can test a void method. Void methods can be more complex, but we can test those with Mockito too, as in the sketch below.
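
For instance, when a void method delegates to a dependency, we can mock the dependency and stub its void method with Mockito’s doThrow() to exercise the failure path. A minimal sketch; the EmailService and Notifier classes here are hypothetical, not from the example above:

import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.mock;

import org.junit.Test;

public class NotifierTest {

	// hypothetical dependency with a void method
	static class EmailService {
		void send(String message) { /* sends an email */ }
	}

	// hypothetical class under test that delegates to the dependency
	static class Notifier {
		private final EmailService emailService;
		Notifier(EmailService emailService) { this.emailService = emailService; }
		void sendAlert(String message) { emailService.send(message); }
	}

	@Test(expected = RuntimeException.class)
	public void testSendAlertPropagatesFailure() {
		EmailService emailService = mock(EmailService.class);
		// doThrow().when() is the stubbing form that works for void methods
		doThrow(new RuntimeException("smtp down")).when(emailService).send("alert");
		new Notifier(emailService).sendAlert("alert");
	}
}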

Testing Mapping with Embedded Elasticsearch Server

A running Elasticsearch cluster won’t be available for unit tests. On the other hand, a unit test shouldn’t depend on an external running instance. If you want to test real Elasticsearch behavior in a unit test, you can use an embedded Elasticsearch server instance. An embedded Elasticsearch server is a small instance of an Elasticsearch cluster, and it works exactly like a full Elasticsearch cluster.
You need the following dependencies in your POM file to run Elasticsearch embedded in your application for unit tests:

	<dependency>
	    <groupId>org.elasticsearch</groupId>
	    <artifactId>elasticsearch</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>org.elasticsearch.client</groupId>
	    <artifactId>transport</artifactId>
	    <version>5.2.1</version>
	</dependency>
	<dependency>
	    <groupId>com.google.guava</groupId>
	    <artifactId>guava</artifactId>
	    <version>21.0</version>
	</dependency>

The following code tests whether the mapping has been created in Elasticsearch. We could do this with Mockito; however, I don’t think Mockito would give us full confidence in our code. I am using embedded Elasticsearch to make sure I can create the index and mapping without any issue. If you want to see the mapping, please click here for my mapping tutorial.

package com.vsubedi.elasticsearch;

import static org.junit.Assert.*;

import java.io.File;
import java.io.IOException;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.io.Files;

/**
 *
 * @author vsubedi
 *
 */
public class PrepareIndexTest {

	private static final Logger logger = LoggerFactory.getLogger(PrepareIndexTest.class);

	private static Node server;
	private static Client client;
	private static File tempDir;
	private static String index = "test";
	private static String docType = "movies";

	@BeforeClass
	public static void setUpBeforeClass() throws Exception {
		logger.info("======= START INTEGRATION TEST ========");

		// spinning up the elasticsearch server for junit
		tempDir = Files.createTempDir();
		logger.info(tempDir.getAbsolutePath());
		Settings settings = Settings.builder().put("path.home", tempDir.getAbsolutePath())
				.put("transport.type", "local")
				.put("http.enabled", false)
				.build();
		server = new Node(settings);
		final String clusterName = server.settings().get("cluster.name");

		logger.info("starting server with cluster-name: [{}]", clusterName);
		server.start();

		client = server.client();
	}

	@AfterClass
	public static void tearDownAfterClass() throws Exception {
		DeleteIndexResponse deleteIndexResponse = client.admin().indices()
				.prepareDelete(index).get();
		assertTrue(deleteIndexResponse.isAcknowledged());
		tempDir.delete();
		client.close();
		server.close();
		logger.info("======= END INTEGRATION TEST ========");
	}

	@Test
	public void testPrepareIndex() throws IOException {
		//creating index
		IndexAndCreateMapping.prepareIndex(client, index, docType);

		//checking if the index has been created or not
		IndicesExistsResponse indexResponse = client.admin().indices().prepareExists(index).get();

		//it should be true
		assertTrue(indexResponse.isExists());
	}

}

The code above is for Elasticsearch version 5.2.1. If you want to run it against previous versions like 2.4.x, you need to use NodeBuilder to create an embedded Elasticsearch instance as follows:

tempDir = Files.createTempDir();
logger.info(tempDir.getAbsolutePath());
Settings settings = Settings.builder()
		.put("path.home", tempDir.getAbsolutePath())
		.build();
server = NodeBuilder.nodeBuilder().settings(settings).build();
final String clusterName = server.settings().get("cluster.name");

logger.info("starting server with cluster-name: [{}]", clusterName);
server.start();

client = server.client();

with the following dependency:

		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>2.4.1</version>
			<scope>test</scope>
			<type>test-jar</type>
		</dependency>