Friday, June 1, 2018

Connecting to Amazon Neptune with CloverETL

Connecting to Amazon Neptune with CloverETL

Just at the end of May 2018, AWS pushed their new service, Neptune, for general availability (as promised on reInvent 2017). I decided to play with it right away and try connect to Neptune with my “weapon of choice”, CloverETL.

What is AWS Neptune? Let me steal directly from AWS as they surely have better copywriters than I ever will be: “Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.”

So Neptune is managed (as other RDS databases) graph (like Neo4j) database. Graph databases use structures like “nodes” (or "vertices") and “edges” to store and represent data and relationships between objects.
Figure 1 via

Graph databases could be helpful whenever relationships between objects get “a bit complicated” or just plainly massive. Common use cases of usage of graph database would be social networks (you want to traverse through all followers and theirs follower of specific person), fraud detection (you want to uncover fraud rings) etc.

For this exercise I spun up Neptune cluster, following , and CloverETL server on EC2, you can use this for info.
(both Neptune and EC2 instance are living in same VPC and security group.)

Amazon Neptune supports two different graph query languages, Gremlin and SPARQL, I will be using Gremlin as I am more familiar with that one.

Check for more information and great tutorials for Gremlin.
Copyright by

There are multiple ways how to use Gremlin for connecting to Neptune, you can write some code in Java or Groovy(for that we could use a CustomJavaComponent), Python (you can call Python scripts from CloverETL with a ExecuteScript component or use, you might want to use .Net or Node.js.

I will use another functionality of Amazon Neptune which is HTTP endpoints and REST interface. 

Document tells me that I can call Amazon Neptune with HTTP request, they are using ‘curl’ in the documentation. Same job in CloverETL is doing a HTTPConnector.

So lets do quick POST call like this one:

I created a graph on my CloverETL server in my CloverETL Designer, pasted a HTTPConnector from Palette on the right.
(Tip: Use Shift + Space for Add component dialog, this could make you a bit more efficient.)

I connect first output port of the HTTPConnector to a Trash component for now. 

In the HTTPConnector I am configuring URL property with value of HTTP endpoint of Amazon Neptune ( , Request method and payload in Request content.

You might notice that I created parameter for the endpoint as this one might change whenever I will recreate Neptune cluster.

I created simple graph (like "real graph in graph database" not CloverETL graph) in my Neptune cluster following .
I might be doing all these steps in CloverETL, but using Gremlin's REPL console is a bit faster. CloverETL will help us later to enhance and automatize data ingestion and can produce reports with data in different formats which is not really possible via Gremlin console.

I am using a json document "{"gremlin" : "g.V().limit(1)"}" to run query which will return me first vertex from the graph.

After I run this graph I can check response on the edge coming from HTTPConnector directly in the CloverETL Designer.

REST API is communicating via JSON format, record produced by HTTPConnector will contain response content and headers.

I will use functionality of  CloverETL which allows me to parse JSON documents returned from REST API with another components directly in the same process.

There are two, JSONReader (uses xpath) or JSONExtract (uses SAX parser and something like "json scheme").

JSONExtract can parse JSON document, build scheme for it and show user visual tree that will help him or her map elements out of JSON elements which are of interest.

For this I will copy value from content field of record (value will be available after successful run of the graph) and store it as file, NeptuneResponse.json in data-in subfolder. 

(Tip: Whenever I work with JSON I like to add some JSON plugin that will make JSON manipulation easier. I am using that CloverETL is build on top of a Eclipse platform, so there will be a bunch of plugins available. Currently I am using Json Tools.)

I will prepare for parsing of the response from HTTPConnector by placing JSONExtract component directly on edge of CloverETL graph, just before Trash component.

Now I need to point JSONExtract to read saved file by configuring its File URL property.
(Tip: Just drag the file from data-in folder onto JSONExtract component, that's the easiest way.)

Then I open Mapping dialog by clicking on ellipsis on Mapping property and let component produce visual tree by clicking on a Generate tree structure button.

In the mapping document I will map all field which are interested of me.

This will produce 2 connected output ports with appropriate metadata.

Last thing for JSONExtract is switching value of the File URL  property from physical file to different approach when JSON document will be provided dynamically on the CloverETL edge (by running the graph) and parsed automatically at the runtime.

Syntax for it is: port:$0.content:discrete (thats the value that should be in File URL now).

I mapped out 2 ports, first one is containing parsed value returned from Neptune, second port contains response type (200 for everything went OK, different code and error message other way) for possible error handling.

This combination of components could be easily reused by wrapping them into Subgraph, which would allow bigger re-usability, but that's question for some next article. 


  1. i was fully study this article and get more ideas from this article and in this article i know what to do to improve my skills..really thanks for the information.
    aws training in chennai | aws course in chennai | aws certification in chennai | data science course in chennai | devops training in chennai | devops certification in chennai

  2. Amazing article. Cool.Looking ahead to examining more. Good article.Really thank you!Many thanks extremely for you post.Really getting excited about study more.I price the article.Really thanks! Much obliged.Really loved this article post.
    friv 4 online
    abcya xyz free
    friv 2017

  3. If you've been watching my journey or even the SLIGHTEST bit intrigued, comment below or message me to join us on Monday in my Sneak Peek into Coaching group. Here you'll find out a little more about what I do everyday and how you TOO can work from the couch in your pjs towards a financially free life!
    kizi kizi 2 games
    games free for kids
    friv Games for kids 2019

  4. I’m not that much of a internet reader to be honest but your blogs really nice, keep it up! I’ll go ahead and bookmark your site to come back down the road. Cheers

    jogos io online
    friv play
    a10 for free