Wednesday, October 31, 2018

CloverDX Worker is not starting

New functionality was introduced in CloverDX (formerly CloverETL) version 4.9 that should increase the resiliency of server processing in difficult conditions. In older versions, CloverDX server ran in a single process, the same one as the application container. This could be a bit problematic if you were running multiple Java applications in the same application container (in my experience, Tomcat is used most often). If there was an issue with CloverDX or with any of the other applications, a restart of the complete application container was needed to fix it.

To mitigate this issue, a completely separate "runtime" was introduced for CloverDX server. This process hosts and runs all transformations. CloverDX now runs in two processes: the master process (Core) handles the server UI and runs in the same process as Tomcat, while the "Worker" process runs data transformations.

This gives you more control over your environment: you can set resources (memory) separately for the master and the worker. In the future, multiple workers might even be possible, so you could separate scheduled processes into different segments. There could be "really really important" processes that should get the most memory, and there could be "slow and steady wins the race" transformations that run separately with less memory, so that any unforeseen issue in those wouldn't have any impact on more important jobs.

For now there is only 1 Core and 1 Worker process.

You don't need to worry about much new configuration introduced by Worker; everything should work seamlessly, but there are some gotchas:

  • Worker has a separate heap memory setting
  • Worker has a separate classpath (needed for external .jar libraries; it is not enough to copy them to the TOMCAT_HOME/lib folder)

I recently encountered one tricky issue that had me scratching my head. I installed a new version of CloverDX server and restarted it a couple of times as I was fine-tuning the configuration, license, etc. Everything seemed fine, except that Worker wasn't running. And that is kind of a big deal!

You can still fall back to the master (Core) process and run everything there, but you would be losing all this fancy new separation.

I looked for logs in TOMCAT_HOME/temp/cloverlogs. I couldn't find anything in worker.log (Worker has a separate log file too; in this case it wasn't there at all, as Worker wasn't starting). I found only this peculiar error in all.log:


Worker [worker0@node01] failed to start. Remaining restart attempts: 2

I highlighted the problematic portion of the log.

This points to a problem during deployment of the CloverDX server, more specifically of its Worker part.
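If you want to check whether your server is hitting the same problem, you can search all.log for the message quoted above. This is only a minimal sketch: the function name worker_start_failures is my own, and it assumes the message wording matches the line shown in this post.

```shell
# Hypothetical helper: print (with line numbers) every "Worker ... failed to start"
# message found in the given log file. The pattern is based only on the single
# log line quoted in this post; other versions may word it differently.
worker_start_failures() {
    logfile="$1"
    grep -n 'Worker .*failed to start' -- "$logfile"
}

# Example call (TOMCAT_HOME must point to your Tomcat installation):
# worker_start_failures "$TOMCAT_HOME/temp/cloverlogs/all.log"
```

If the function prints nothing, the log contains no such message and the cause of your problem is probably a different one.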

Multiple steps happen during the first deployment from the .war file. One of them is unzipping the worker.jar file into a temporary work folder, which is then used to set up the classpath.

The issue in my case was, unfortunately, between keyboard and chair. When I started Tomcat for the first time, I did so as the root user. I realized my mistake, created a separate user for Tomcat (as is recommended), and restarted Tomcat with that user (I very originally called it cloveruser), but I didn't notice that some files were left with root ownership.
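A quick way to spot such leftovers is to list everything under a folder that is not owned by the expected user. The sketch below uses the standard find command; the function name list_foreign_files is hypothetical, and the path in the example is simply the work folder from my installation.

```shell
# Hypothetical helper: list all files and directories under "dir"
# that are NOT owned by "owner" (a user name or numeric user ID).
list_foreign_files() {
    dir="$1"
    owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Example call (TOMCAT_HOME must point to your Tomcat installation):
# list_foreign_files "$TOMCAT_HOME/work/Catalina/localhost/clover" cloveruser
```

Empty output means every file under the folder already belongs to the given user.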

The fix was quite simple. I changed the ownership of the work folder (note that this is chown, not chmod):

sudo chown -R cloveruser:cloveruser TOMCAT_HOME/work/Catalina/localhost/clover

I restarted Tomcat again and voilà, Worker is alive and kicking.

Hope this helps someone else.
