Friday, January 25, 2019

CloverDX and supported Java versions

Most IT specialists with Java experience are probably aware of the impending changes to Oracle's Java licencing. A lot has been written about it, and many articles have been published.

Now there is also an official notice about CloverDX and supported Java versions. The article can be found here:

Wednesday, October 31, 2018

CloverDX Worker is not starting

New functionality was introduced in CloverDX (formerly CloverETL) version 4.9 that should increase the resiliency of server processing in difficult conditions. In older versions, CloverDX Server ran in a single process, the same one as the application container. This could be a bit problematic if you were running multiple Java applications in the same application container (in my experience, Tomcat is the one used most often). If there was an issue with CloverDX or with any of the other applications, the complete application container had to be restarted to fix it.

To mitigate this issue, a new, completely separate "runtime" was introduced for CloverDX Server. This process hosts and runs all transformations. CloverDX now runs in two processes: the master process (Core) handles the server UI and runs in the same process as Tomcat, while the "Worker" process runs data transformations.

This gives you more control over your environment: you can set resources (memory) separately for the master and the worker. In the future, multiple workers might even be possible, so you could separate scheduled processes into different segments. There could be "really really important" processes that get the most memory, and "slow and steady wins the race" transformations that run separately with less memory, so that any unforeseen issue in those wouldn't have any impact on more important jobs.

For now there is only 1 Core and 1 Worker process.

You don't need to worry about much new configuration introduced by the Worker; everything should work seamlessly, but there are some gotchas:

  • The Worker has a separate heap memory setting
  • The Worker has a separate classpath (needed when using external .jar libraries; it is not enough to copy them to the TOMCAT_HOME/lib folder)

I encountered one tricky issue recently that had me scratching my head. I installed a new version of CloverDX Server and restarted it a couple of times while I was fine-tuning the configuration, license, etc. Everything seemed fine, except that the Worker wasn't running. And that is kind of a big deal!

You can still fall back to the master (Core) process and run everything there, but you would be losing all this fancy new separation.

I looked for logs in TOMCAT_HOME/temp/cloverlogs. I couldn't find anything in worker.log (the Worker has its own log file too, but in this case the file wasn't there at all because the Worker wasn't starting). I found only this peculiar error in all.log:

Worker [worker0@node01] failed to start. Remaining restart attempts: 2

I highlighted the problematic portion of the log.

This points to a problem during deployment of CloverDX Server, more specifically its Worker portion.
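When all.log is long, grep can pull out just the Worker start failures. A minimal sketch, assuming the log line format shown above and the TOMCAT_HOME/temp/cloverlogs location used in this post; the function name find_worker_failures is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print Worker start failures from a CloverDX log file.
find_worker_failures() {
  local log="$1"
  # Matches lines such as:
  #   Worker [worker0@node01] failed to start. Remaining restart attempts: 2
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'Worker \[[^]]+\] failed to start' "$log" || true
}

# Usage (adjust TOMCAT_HOME to your installation):
# find_worker_failures "$TOMCAT_HOME/temp/cloverlogs/all.log"
```

No output means no Worker start failures were logged in that file.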

Multiple steps happen during the first deployment from the .war file. One of them is unzipping the worker.jar file into a temporary work folder, which is then used to set up the classpath.

In my case the issue was, unfortunately, between the keyboard and the chair. When I started Tomcat for the first time, I did it as the root user. I realized my mistake, created a separate user for Tomcat (as is recommended) and restarted Tomcat with that user (very originally, I called him cloveruser), but I didn't notice that some files were stuck with root ownership.

The fix was quite simple. I changed the ownership of the work folder:

sudo chown -R cloveruser:cloveruser $TOMCAT_HOME/work/Catalina/localhost/clover

I restarted Tomcat again and voilà, the Worker is alive and kicking.

Hope this helps someone else.

Thursday, October 18, 2018

CloverETL rebranded to CloverDX

Not only are trees shedding leaves this Fall; CloverETL is also losing its long-used name and is being rebranded to CloverDX.

No worries, neither Designer nor Server is going anywhere; the company just wants to share its belief that it is not only about ETL (but mostly it is ;)), but about the full experience whenever you need data integration.

More eloquently put here

BTW, with the new name, the Server UI was revamped and looks modern and pretty spiffy. Definitely check it out!

Not sure if I will ever go through the posts on this site and replace all occurrences of CloverETL with CloverDX. Maybe that is a job for long, cold winter nights. Winter is coming!!

Thursday, July 19, 2018

Limiting user access to view logs only in CloverETL server

When you are running a production installation of CloverETL Server, you might have a separation of responsibilities or different permissions for different people.

I was asked recently how to allow specific people to access only the logs of job runs on the server. This is a common use case: you have support personnel who need access to logs if something goes wrong in any of the scheduled processes. You don't want to overwhelm them with too many options, and you don't want them to modify any existing process, etc.

CloverETL sets permissions on the group level, not on the level of a specific user, e.g. you cannot give John and Amy different permissions directly; you need to put them into two separate groups first.

Whenever you create a new group, it has all permissions removed by default. You can tell by the red cross icons in the permission tree.

In our use case, you want to enable only "Unlimited access to execution history" for this particular group.

This configuration allows all users assigned to this group to view only the Execution History tab, check previous runs, see their Tracking information (how many records were processed), and view or download the log of a particular run.

You might wonder what the "Limited access to execution history list" permission does. That one gives you more control. With unlimited access, group members see all the content of the execution history, for all sandboxes. Limited access shows history only for sandboxes that the group has read access to, e.g. if you limit access to Sandbox A to Group A and John is not a member of Group A, he won't be able to see any runs of processes from that project even if he has access to Execution History.

By default, all sandboxes are visible to all groups. If you want to change that, you need to do it in the Permissions section of the Sandboxes tab.

Currently, visibility granularity is on the sandbox level, e.g. you cannot limit visibility to a specific graph only.

And this is the very limited view that John will have if he is only a member of a group that has permissions for Execution History only:

As you could see in one of the previous pictures, the granularity of permissions on CloverETL Server is pretty elaborate, so check the documentation page for more details on how to configure access permissions for your users to your liking.

Thursday, July 5, 2018

Auto start Tomcat and CloverETL on EC2 AWS Linux AMI

In one of the previous blog posts I installed an evaluation CloverETL Server on an Amazon EC2 instance. That installation is useful for evaluation and quick setup, and I mentioned at the end of the article that you might want to set CloverETL Server to start up automatically if the host gets restarted.

This article shows one way to do it.


Last time we installed Tomcat and deployed the server .war to:


(In retrospect, keeping the version in the directory name wasn't the greatest idea; you will notice when you try to upgrade. But again, it's an evaluation installation.)


The first thing I will do is create an init script:
sudo vim /etc/init.d/clover

#!/bin/bash
### BEGIN INIT INFO
# Provides:          tomcat8
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start/Stop Tomcat server
### END INIT INFO

start() {
  sh /clover/CloverETLServer.4.5.1.Tomcat-8.0.30/bin/
}

stop() {
  sh /clover/CloverETLServer.4.5.1.Tomcat-8.0.30/bin/
}

case "$1" in
  start|stop) "$1" ;;
  restart) stop; start ;;
  *) echo "Run as $0 <start|stop|restart>"; exit 1 ;;
esac
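Tomcat ships startup.sh and shutdown.sh in its bin directory, so those are the natural scripts for the start and stop functions to call. Below is a sketch of the same init script with full script paths, assuming the stock Tomcat scripts and the install directory above; the CATALINA_HOME default and the main wrapper are illustrative additions, not part of the original setup:

```sh
#!/bin/bash
### BEGIN INIT INFO
# Provides:          tomcat8
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start/Stop Tomcat server
### END INIT INFO

# Assumed Tomcat directory from the evaluation install; an existing
# CATALINA_HOME in the environment takes precedence.
CATALINA_HOME="${CATALINA_HOME:-/clover/CloverETLServer.4.5.1.Tomcat-8.0.30}"

start() {
  # Tomcat's stock startup script; it launches Tomcat in the background.
  sh "$CATALINA_HOME/bin/startup.sh"
}

stop() {
  # Tomcat's stock shutdown script; it asks the running Tomcat to stop.
  sh "$CATALINA_HOME/bin/shutdown.sh"
}

main() {
  case "${1:-}" in
    start|stop) "$1" ;;
    restart) stop; start ;;
    *) echo "Run as $0 <start|stop|restart>"; return 1 ;;
  esac
}

# Dispatch on the first argument; the script exits with main's status.
main "$@"
```

Note that shutdown.sh returns before Tomcat has fully stopped, so on a busy server a restart may need a short pause between stop and start.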

There are different variants of init scripts; this one makes sure that Tomcat is started only after the network interface on the instance is up.

The last step is to make this init script run after a reboot. A common way to do that is the update-rc.d command, but it is not installed on the Amazon Linux AMI (update-rc.d is a Debian/Ubuntu tool, while Amazon Linux is Red Hat based and uses chkconfig).

Another way that worked for me was:
sudo chmod 755 /etc/init.d/clover
sudo chkconfig --level 345 clover on

After these steps, Tomcat and CloverETL Server should start automatically after the host reboots.

(I relied heavily on the answers here.)

Friday, June 29, 2018

Healthcheck for CloverETL server

In the modern cloud-oriented world there is a need for periodic health checks of servers or services. Let's imagine that you have CloverETL Server deployed on AWS and you need to know that your host machine, your application container and CloverETL Server are running.

AWS has you covered for the first two points with load balancer health checks.

This might not cover edge cases when your host machine and application container (Tomcat or any other supported application container) are running, but CloverETL Server is not.

There are multiple ways to check health or "liveness" (if that's even a word).

Use HTTP API call

CloverETL Server supports an HTTP API for multiple operations; the one you are interested in is 'cluster_status'.
Please ignore the name; the same operation works on a cluster as well as on a single installation.

You can use the health check functionality of the load balancer to ping that endpoint and get the status.

The disadvantage of this approach is that by default the HTTP API is protected by HTTP Basic authentication, which might be a problem for some health check services.
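For a health check service that can send Basic authentication credentials, the call could look like the sketch below. It assumes the server is deployed under /clover and that the HTTP API is reachable at <base>/request_processor/<operation>; the helper names, host, port and credentials are hypothetical placeholders, so verify the URL against your server's documentation:

```sh
#!/bin/sh
# Hypothetical helper: build the HTTP API URL for an operation.
# Assumed layout: <base>/request_processor/<operation>
clover_api_url() {
  local base="$1" op="$2"
  # Strip a trailing slash from the base URL to avoid "//" in the result.
  printf '%s/request_processor/%s\n' "${base%/}" "$op"
}

# Call cluster_status with HTTP Basic authentication.
# -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
cluster_status() {
  local base="$1" user="$2" pass="$3"
  curl -fsS -u "$user:$pass" "$(clover_api_url "$base" cluster_status)"
}

# Usage (host, port and credentials are placeholders):
# cluster_status http://localhost:8080/clover healthcheck s3cret
```

A non-zero exit status from cluster_status (connection refused, 401, 5xx) can then be treated as "not healthy" by whatever runs the check.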

Calling accessibility page

CloverETL Server has a page that can be called without any authentication.

Calling this page results in one of three states:
  1. OK/200
  2. 500
  3. 503

This way you can use any health check application or service to learn whether your CloverETL Server is alive.
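If your health check tool only runs scripts, a status-code check is enough. A minimal sketch, assuming only 200 counts as healthy (as listed above) and treating every other code as a failure; the page path is a placeholder because it depends on your server, and the function names are hypothetical:

```sh
#!/bin/sh
# Map an HTTP status code to a health verdict: only 200 counts as healthy.
health_from_status() {
  case "$1" in
    200) echo healthy; return 0 ;;
    *)   echo unhealthy; return 1 ;;
  esac
}

# Fetch a URL and judge health from its status code only.
# curl prints 000 when it cannot connect, which is also reported unhealthy.
check_clover_health() {
  local url="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  health_from_status "$code"
}

# Usage: pass the accessibility page URL of your server (placeholder below).
# check_clover_health "http://localhost:8080/clover/<accessibility-page>"
```

The exit status (0 healthy, 1 otherwise) is what most health check agents look at.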

Thursday, June 21, 2018

Flat files and newlines for different OS

CloverETL can read flat files as one of its many sources, i.e. files without a hierarchical structure, with data stored in a human-readable format. A simple example is the still-popular CSV.

CSV means comma-separated values, i.e. columns of data are separated by a ',' (comma) delimiter.

For such files CloverETL has the FlatFileReader component, which can read CSV with different delimiters ('|' and ';' are other popular ones). It can read flat files not only from the local file system but also from remote ones (FTP, SFTP, S3).

For each file you want to read with FlatFileReader you need metadata. The reader provides an easy way to create metadata for an existing file via the Extract metadata functionality.

This option parses the file and produces metadata (a description) of the file, i.e. a list of fields, their data types, etc.

One issue you might encounter in the real world is that you created your metadata from one version of a flat file, but in reality files can come from various sources and various operating systems. Each operating system has its own newline delimiter (for example, Windows ends lines with \r\n, while Linux and macOS use \n).
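To check which convention an incoming file actually uses, counting carriage returns and line feeds is enough. A minimal POSIX shell sketch that runs outside CloverETL; the function name detect_newlines and its output labels are hypothetical:

```sh
#!/bin/sh
# Report the newline convention of a file: LF, CRLF, CR, mixed or none.
detect_newlines() {
  local f="$1" cr_char cr lf crlf
  cr_char=$(printf '\r')
  # Count carriage returns and line feeds in the whole file.
  cr=$(( $(tr -cd '\r' < "$f" | wc -c) ))
  lf=$(( $(tr -cd '\n' < "$f" | wc -c) ))
  # Count lines (split on LF) whose last character is CR, i.e. CRLF endings.
  crlf=$(( $(grep -c "${cr_char}\$" "$f" || true) ))
  if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then echo none
  elif [ "$cr" -eq 0 ]; then echo LF          # Linux, macOS
  elif [ "$lf" -eq 0 ]; then echo CR          # classic Mac OS
  elif [ "$crlf" -eq "$lf" ] && [ "$cr" -eq "$lf" ]; then echo CRLF  # Windows
  else echo mixed
  fi
}

# Usage: detect_newlines input.csv
```

A "mixed" result usually means a file was edited on more than one system.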

Extract metadata takes the newline delimiter from the one file you triggered it on. But don't worry, there is a way to be prepared for files from different operating systems.

You just need to:

  1. edit the created metadata (double-click the edge with the metadata)
  2. click the first row, with the name of the metadata, to show its properties in the right-hand column
  3. select the last option in the Record delimiter field

This option allows you to read files from different operating systems without issues. (You can even type into that field and use delimiters that are not in the dropdown, just give it a try!)