Monday, August 29, 2016

How to get unique values in a list with CTL

The last article in our "How to" series was about how to get sheet names from an Excel file. Today I will venture into CTL (Clover Transformation Language, a scripting language oriented toward data migration use cases and used in CloverETL).

You might know that if you need to get only unique records by some key value, you should use the Dedup component in CloverETL.
But what if you have a CTL variable that is a list and you would like to get only its unique values?

Data types in CloverETL

CloverETL offers a couple of standard data types that you can use in CTL programming.
There are simple data types:

  • boolean
  • byte
  • cbyte
  • date
  • decimal
  • integer
  • long
  • number
  • string

There are also a couple of complex types:

  • record
  • list
  • map


Today I will focus on "list". A list is a container for multiple values of the same data type. E.g. you could have a list of strings or a list of integers, but not a list of strings AND integers.
To be more precise, a list is an ordered sequence of elements, where elements are accessed by their position only.
The list container is present in almost all programming languages. In PHP it is called an indexed array, in Java you might be familiar with the ArrayList implementation of the List interface, and you can find "list" in Python too.

Declaration and usage

You can declare a CTL list variable and assign values to it like this:
string[] myList = ['a', 'b', 'c'];
Elements of a Clover list are accessed by index:
myList[1] == 'b';//as in Java, the index starts at 0
You can also use a list as a stack (or queue) with the familiar pop and poll functions:
string[] myList = ['a', 'b', 'c'];
string val = pop(myList);
//val == 'c' AND myList == ['a', 'b']
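If you come from Java, the difference between the two ends of the list can be pictured with a small sketch. This is only a Java analogue, not CTL: the class ListEnds and both helper methods are made up for illustration, and I am assuming that poll takes the element from the front of the list, as the "queue" wording suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: a Java analogue of taking elements from either end of a list.
class ListEnds {
    // Removes and returns the last element, like a stack pop.
    static <T> T pop(List<T> list) {
        return list.remove(list.size() - 1);
    }

    // Removes and returns the first element, like a queue poll (assumed behaviour).
    static <T> T poll(List<T> list) {
        return list.remove(0);
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(List.of("a", "b", "c"));
        String last = pop(myList);   // "c", myList is now [a, b]
        String first = poll(myList); // "a", myList is now [b]
        System.out.println(last + " " + first + " " + myList); // prints c a [b]
    }
}
```

Both helpers modify the list they are given, which matches the CTL example above, where myList is shorter after the call.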
And there are a lot of other helpful container functions.


A CloverETL list does not enforce uniqueness; it is not, for example, a Java Set.
So the same value may appear in it multiple times:
string[] myList = ['a', 'b', 'c', 'a', 'c', 'b'];
In CTL you can use a trick for getting only the unique values from a list that you might know from other languages: convert the list to a map (a set of "key" => "value" pairs) and take only the keys from this map. The conversion to a map takes care of duplicates automatically, as a CloverETL map can contain only one value per key.
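For readers who know the trick from Java, here is roughly how it looks there. Take it as a minimal sketch of the idea rather than of CloverETL internals: the class UniqueViaMap and the method unique are hypothetical names, and I picked LinkedHashMap on purpose so that the keys come back in the order of their first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: removes duplicates by using list elements as map keys.
class UniqueViaMap {
    // A repeated key overwrites the earlier entry instead of adding a new one,
    // so the key set ends up free of duplicates.
    static <T> List<T> unique(List<T> values) {
        Map<T, Boolean> seen = new LinkedHashMap<>(); // keeps first-insertion order
        for (T value : values) {
            seen.put(value, Boolean.TRUE); // the stored value is irrelevant
        }
        return new ArrayList<>(seen.keySet());
    }

    public static void main(String[] args) {
        List<String> myList = List.of("a", "b", "c", "a", "c", "b");
        System.out.println(unique(myList)); // prints [a, b, c]
    }
}
```

The values stored in the map are never read; only the keys matter, which is the whole point of the trick.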

Declaration and usage of map

map[string, string] urls;
urls['Google'] = '';
urls['Microsoft'] = '';

toMap usage 

The toMap function in CTL has two signatures:
map[<type of key>,<type of value>] toMap(<type of key>[] keys, <type of value>[] values);
map[<type of key>,<type of value>] toMap(<type of key>[] keys, <type of value> value);
The first signature expects two lists: one is used as the keys, the other as the values. Keys are mapped to values automatically by position. (Both lists need to have the same length!)
string[] companies = ['Google', 'Microsoft', 'Apple'];
string[] urls = ['', '', ''];
map[string, string] companyUrls = toMap(companies, urls);
//companyUrls['Microsoft'] == ''
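The pairing by position can be sketched in Java like this. It is an illustration under my own assumptions, not the CloverETL implementation: the class PairUp is a made-up name, the integer values are placeholders, and throwing an exception for lists of different lengths is this sketch's choice (the text above only says the lengths need to match).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: a Java analogue of building a map from two lists.
class PairUp {
    // Pairs keys.get(i) with values.get(i); rejects lists of different lengths.
    static <K, V> Map<K, V> toMap(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), values.get(i)); // a repeated key keeps the later value
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder values, made up for this sketch.
        Map<String, Integer> positions = toMap(
                List.of("Google", "Microsoft", "Apple"),
                List.of(1, 2, 3));
        System.out.println(positions.get("Microsoft")); // prints 2
    }
}
```

Note that in this sketch a key that occurs twice simply keeps the value from its last occurrence, so the resulting map can be smaller than the input lists.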

The second signature expects one list and one value; this value is automatically assigned to all keys.
string[] addresses = ['', ''];
map[string, boolean] enabledAddresses = toMap(addresses, TRUE);
//enabledAddresses[''] == TRUE;
So in our case:
string[] myList = ['a', 'b', 'c', 'a', 'c', 'b'];
map[string, string] myListAsMap = toMap(myList, myList);//we don't care about values
string[] uniqueMyList = getKeys(myListAsMap);
//uniqueMyList == ['a', 'b', 'c']
You can use the printLog() function to check the content of variables; the values are printed into the log at runtime. (The commented lines with '==' in the examples above denote what is expected to be in each variable.) Alternatively, you can use CTL debugging, new functionality in CloverETL 4.3.0.

I hope this short blog post showed you something new and useful. All comments are welcome. See you at the next "How to" article :).
