Wednesday, May 4, 2016

CloverETL - Replacing Legacy Software

What does legacy data software mean to you: old software that is now outdated, or existing software that works? Or, I should ask, are you a developer or a business stakeholder? No matter which side of the discussion you are on, replacing legacy software is always a difficult conversation between developers and business stakeholders. Business stakeholders see little value in developing a replacement for something that is already working and generally follow the motto ‘if it’s not broke, don’t fix it’. Developers, on the other hand, maintain the legacy software on a day-to-day basis (most likely without having designed or developed it) and are always looking for ways to improve it. As a developer, I always see the value in upgrading your software because technology is constantly changing and evolving. The requirements and constraints your systems had three years ago may no longer be constraints today. Today we’ll outline an existing legacy system process and its software, and show that it can easily be migrated into a data integration solution using CloverETL.

Background
I used to work as a government contractor at the National Oceanic and Atmospheric Administration (NOAA), where my job was maintaining and enhancing the data ingestion system. At a high level, water level data is gathered at NOAA tide stations and transmitted to local read-out ground systems, from which the data ingestion system acquires it for processing. Once the data arrived on the data ingestion servers, software would decode the messages, calculate the water level value from the raw data, and insert the data into the database. Can you count the number of software programs that were needed for this process? Disregarding the upstream software, there was a special scripting language that acquired the data, a Fortran program to decode the messages, another Fortran program to perform quality control and add offsets, and a C program to insert the data into the database. Do you know how to code in Fortran or compile Fortran code? Making a change to these programs took weeks to fully compile and test. This is a huge problem with maintaining legacy software: the technology is outdated, the business requirements are not fully understood, and enhancing the existing software costs too much money.

Wouldn’t it be nice to consolidate all of the software into a packaged solution that can be easily customized for your data needs? Using CloverETL, you can design your solution to follow the same process that’s already in place with the same check-pointing that exists in your system today.   

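Existing Process Flow:
  • Acquire the water level data (special scripting language)
  • Decode the messages (Fortran program)
  • Quality control and add offsets (Fortran program)
  • Insert the data into the database (C program)
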
Using CloverETL, it’s possible to achieve the same results with the same process flow that you have already defined.
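
To make the idea of "same process flow, same check-pointing" a bit more concrete, here is a minimal sketch in plain Python. It is not CloverETL code and not the original NOAA software. The message format (`station_id,raw_mm`), the millimeter-to-meter conversion, the quality control bounds, and every function name are assumptions I made up for illustration. The point is only that each legacy program becomes one stage, and a checkpoint is recorded after each stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Reading:
    station_id: str
    raw_mm: float          # hypothetical raw value, in millimeters
    level_m: float = 0.0   # computed water level, in meters


def decode(messages: List[str]) -> List[Reading]:
    """Stage 1: parse hypothetical 'station_id,raw_mm' messages."""
    readings = []
    for message in messages:
        station_id, raw = message.split(",")
        readings.append(Reading(station_id.strip(), float(raw)))
    return readings


def compute_levels(readings: List[Reading],
                   offsets_m: Dict[str, float]) -> List[Reading]:
    """Stage 2: convert to meters and add each station's offset."""
    for r in readings:
        r.level_m = r.raw_mm / 1000.0 + offsets_m.get(r.station_id, 0.0)
    return readings


def quality_control(readings: List[Reading],
                    min_m: float = -5.0,
                    max_m: float = 15.0) -> List[Reading]:
    """Stage 3: drop readings outside an assumed plausible range."""
    return [r for r in readings if min_m <= r.level_m <= max_m]


def insert_rows(readings: List[Reading],
                table: List[Tuple[str, float]]) -> List[Reading]:
    """Stage 4: stand-in for the database insert (appends to a list)."""
    table.extend((r.station_id, r.level_m) for r in readings)
    return readings


def run_pipeline(messages: List[str],
                 offsets_m: Dict[str, float],
                 table: List[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Run the stages in order, recording a checkpoint after each one."""
    checkpoints = []
    data = decode(messages)
    checkpoints.append(("decode", len(data)))
    data = compute_levels(data, offsets_m)
    checkpoints.append(("compute_levels", len(data)))
    data = quality_control(data)
    checkpoints.append(("quality_control", len(data)))
    data = insert_rows(data, table)
    checkpoints.append(("insert_rows", len(data)))
    return checkpoints
```

Calling `run_pipeline(["A1,1500", "B2,99000"], {"A1": 0.25}, table)` with an empty `table` list would keep the first reading and drop the second at the quality control stage, and the returned checkpoints show how many records survived each step.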

Benefits of using a CloverETL solution for your legacy data software needs:
  • Update your codebase to a modern architecture
  • Write custom logic components using Clover Transformation Language (CTL) or Java
  • Propagate metadata between your business objects and processes, which can cut down on the amount of processing your systems are doing
  • Handle data in near real time (file event listeners and/or scheduled events)
  • Remove clunky, old programming languages
  • Respond to changing business requirements in a timely fashion
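
If the "file event listener" idea is new to you, the sketch below shows the general technique in plain Python: look at a directory, and hand every file you have not seen before to a handler. This is only an illustration of the concept, not how CloverETL implements its listeners, and `poll_once`, `seen`, and `handler` are names I invented for the example.

```python
import os
from typing import Callable, Set


def poll_once(directory: str,
              seen: Set[str],
              handler: Callable[[str], None]) -> int:
    """Call handler for each file not seen before; return how many were new."""
    new_count = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files count, and each file is handled once.
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            handler(path)
            new_count += 1
    return new_count
```

A scheduler (or a simple loop with a sleep) would call `poll_once` repeatedly with the same `seen` set, so a newly arrived data file is picked up on the next pass instead of waiting for a nightly batch.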

This is one example from my previous experience where I know CloverETL would help to make a difference for business stakeholders and developers. Please let me know if you have any additional questions. 

Monday, May 2, 2016

CloverETL Products

When talking about CloverETL with various business, sales, and technical resources, it is often hard to communicate which CloverETL product a project requires. Before I dive into the reasoning I use to choose each product, I want to briefly describe each of the products.

The CloverETL Designer is a visual tool that's used for development and manual execution of graphs/jobflows that the developer builds. This is a great tool for debugging your logic in a graphical environment.

The CloverETL Corporate Server gives the developer and/or data analyst full workflow management. The user can set up schedules for jobs to execute, configure event listeners, publish web services, and create an orchestration framework for the project.

The CloverETL Cluster, in addition to all of the CloverETL Corporate Server functionality, allows users to set up job parallelism with load balancing and failover redundancy.

Now that you have a quick understanding of the products that CloverETL offers, I am going to give you a few scenarios. See if you can determine which CloverETL product you would recommend for each.

Scenario 1: I (along with tens of thousands of others) play fantasy baseball, so I like to process all historical statistics on a yearly basis to track player trends prior to the start of my fantasy baseball draft. This process will ingest all statistical categories and can estimate a particular player's value for the next year.

Scenario 2: I am a baseball organization that tracks statistics for every baseball game that is happening on a given day. At the conclusion of the final game each night, I execute a process that updates all statistical categories, hitting trends, pitching trends, and fielding trends.

Scenario 3: I am a broadcasting organization that tracks statistical trends as they happen in real time. If a player strikes out in the fourth inning, I want all statistics and trends updated by the time they bat again in the sixth inning.

Which CloverETL products would you recommend and why?

Scenario 1: I would recommend the CloverETL Designer. This is more of a manual process that is being run on a yearly basis. You can save the logic that you have created and change the input for another year's worth of data.

Scenario 2: I would recommend the CloverETL Corporate Server. This should be an automated process because it's happening every night. You can schedule the job to run at a certain time of the evening, automate all logging, and create a generic workflow for handling a number of scenarios programmatically.

Scenario 3: Personally, I would recommend the CloverETL Cluster, but you could use the CloverETL Corporate Server. I would recommend the Cluster in this instance because we are talking about a business-critical function: the broadcasting company relies on this data always being up to date. I would set up the Cluster for parallel processing with load balancing to execute the job faster, and configure the nodes so that the job could still execute if one of the nodes was unresponsive.
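
The reasoning behind these three answers can be boiled down to a rule of thumb. The sketch below is just my own shorthand for it in Python, not an official CloverETL sizing guide. The function name and the two yes/no questions are assumptions chosen to mirror the scenarios above.

```python
def recommend_product(automated: bool, business_critical: bool) -> str:
    """Rule-of-thumb product choice based on the scenarios in this post."""
    if business_critical:
        # Needs parallelism, load balancing, and failover.
        return "CloverETL Cluster"
    if automated:
        # Needs scheduling, event listeners, and workflow management.
        return "CloverETL Corporate Server"
    # Manual, occasional runs started by a developer.
    return "CloverETL Designer"
```

Under these assumptions, the yearly fantasy baseball job is neither automated nor business-critical, the nightly statistics job is automated, and the real-time broadcasting job is both, which lines up with the three recommendations above.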

I hope you have learned a bit about which CloverETL product is best for your project. If you have any questions, please don't hesitate to comment below and I would be happy to respond with any recommendations.