Tuesday, August 16, 2016

How to get all sheet names from Excel

Imagine this use case: You have an Excel file with dynamically changing number of Sheets, how you could get list with names of all sheets?

You might know that in CloverETL there is SpreadsheetReader component that can read .xlsx, .xls files. You know that you can use its graphical view to map content from an Excel file to a metadata. You might not know that you can read from multiple sheets with same SpreadsheetDataReader. It has one requirement though, all sheets need to have same looking records, there is no way how to use single SpreadsheetDataReader to parse records with different metadata! 
You can use wildcards '?' and '*' in Sheet property of a SpreadsheedDataReader to specify multiple sheets to be read. So '*' character in Sheet will be first step of our example.


Next step would be creating metadata and mapping. To get list of all sheet names you need read at least one record from the sheet. The value is not important, so I created metadata called DummyRecord with 'dummy' field. I am using this metadata in Mapping property editor where I select A1 column as a source as I expect this cell will always contain something. I used "Map by order" button to map content of A1 to "dummy" field and "Data offset (global)" set to 0 - this is to read value directly from A1 cell, other way component would read data from A2. Notice orange label of A1 cell and yellow label of A2 before you set Data offset (global) to 0. Orange means header, yellow denotes first line of data.

After this step we would read content of A column from all sheets in our excel. But we need only one record per sheet. Luckily there is Advanced property just for this: "Max number of records per spreadsheet".


No I have 1 record per sheet. But still no sheet names! 
You might notice that there is no functionality inside of Mapping editor to pull sheet name. For that we will use Autofilling property in Metadata.

Autofilling property is handy feature that creates 'virtual' fields that are not in source. These fields are filled automatically from list of possible configurations. Fields with configured Autofilling property could carry information for example about order of the record in the file, name of the source file or sheet name. And that is exactly what we need!
Lets go to metadata editor behind SpreadsheetDataReader, add there new field, name it originally 'sheetName', name of the field is not important, Autofilling property is. For that select newly created field, scroll in right column to Autofilling property and choose sheet_name. 


Finally I have 1 record per sheet with its sheet name.

I decided to make this fancier, so I wrapped this functionality in Subgraph, parametrized File URL property of SpreadsheetDataReader to select parsed file more easily. I created new metadata without dummy field in the subgraph and used Reformat to convert records between these metadata.
Result is generic subgraph component that produces list of sheet names from selected excel file.
I highly recommend check CloverETL documentation about Autofilling property: http://doc.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/autofilling.html , it is not so known feature that could help you if you know about.

See attached project for your reference: Project

No comments:

Post a Comment