gyambqt
Obsidian | Level 7

Hi Experts,

 

I receive a CSV file every morning, and the columns inside the CSV change from day to day. I have a File Reader node in SAS DI to read this CSV file. How do I solve the changing-columns issue in DI when creating a metadata source or target table?

1 ACCEPTED SOLUTION
Kurt_Bremser
Super User

@gyambqt wrote:
There could be more variables, or changed variable names.

So human intervention will be needed with every new file, and you can forget about automating it. How could you prepare any process to deal with data that is not known beforehand?


12 REPLIES
Kurt_Bremser
Super User

DI Studio is for automating processes.

If you want to automate things, those things have to be automatable. A csv file with changing structure is surely not.

Go to the source of the data and make them send consistent data.

Patrick
Opal | Level 21

An ETL process needs stable data structures, and a change to those structures is a change to the ETL process.

If you weren't using DI: could you write code that can deal with this changing .csv and that is re-runnable without you having to change anything? If so, then please explain in more detail what you're dealing with.

gyambqt
Obsidian | Level 7
The CSV file they send has changing columns every day. I just want to explore alternatives for reading a CSV with inconsistent columns.
Patrick
Opal | Level 21

But in the end you will need to map these data from the .csv into a table where the columns don't change in order to implement any reasonable process. 

So if you want to use DIS for its intended purpose (ETL), then please explain in more detail. And if you can explain the logic for mapping into a stable table structure, then that's already 75% of the solution.

 

For example, if a column name has a date component in it but it's always the first column in the .csv, then this is something that can be dealt with (see the sketch below). But you need to provide more detail for us to give any advice.
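
A minimal sketch of that case, assuming the date-stamped name only ever appears on the first column and the rest of the layout is fixed; the file path and column names here are hypothetical:

/* Read positionally and assign stable names, so whatever
   date-stamped header column 1 carries is simply ignored */
data work.daily;
    infile "/incoming/daily.csv" dsd firstobs=2 truncover;
    length amount 8 region $20 product $20;
    input amount region product;
run;

The stable WORK.DAILY structure is then something you can register as a source table in DI.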

Kurt_Bremser
Super User

Charge the <redacted> responsible for this mess for your work. Maybe losing money will cause their (obviously few) neurons to fire.

A daily changing structure means daily coding work, totally undermining the principles on which Data Integration is built (consistent predefined metadata).

So if this stupidity is going on, be prepared to fire up your Enterprise Guide each day, inspect the file, and write code to bring it into the shape you need. Then you can hand off the result to DI.

gyambqt
Obsidian | Level 7
I am wondering if I could use PROC IMPORT to read the header in the CSV file and make each column name a macro variable. In DI I would use &column1, &column2, etc. Do you think that would work? But I don't know how to deal with my SCD2 issue, as you need consistent variable names to detect changes.
Patrick
Opal | Level 21

@gyambqt wrote:
I am wondering if I could use PROC IMPORT to read the header in the CSV file and make each column name a macro variable. In DI I would use &column1, &column2, etc. Do you think that would work? But I don't know how to deal with my SCD2 issue, as you need consistent variable names to detect changes.

No, this will not work. 

Kurt_Bremser
Super User

PROC IMPORT will add even more inconsistency into this, as you can then not even be sure of variable types.
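
A quick illustration of that point (hypothetical file path; GUESSINGROWS is a standard PROC IMPORT option). PROC IMPORT infers each column's type from the values it scans, so the same column can arrive numeric one day and character the next:

/* Type guessing depends entirely on the day's content */
proc import datafile="/incoming/daily.csv"
    out=work.daily
    dbms=csv
    replace;
    guessingrows=max; /* scan all rows; a single stray character value still flips the column to character */
run;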

 

Which part(s) of the file structure will change from day to day?

If it is only the order of columns, one might be able to read the first line, join it with a dataset containing the column attributes, and then write a data step that reads the data dynamically from the resulting dataset (roughly as sketched below).
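
A minimal sketch of that approach, assuming a lookup dataset WORK.COL_ATTRS (columns NAME and INFORMAT) that describes every column that can ever appear; the file path and dataset names are hypothetical:

/* 1. Read the header record, one row per column name, keeping file order */
data work.header;
    infile "/incoming/daily.csv" dsd obs=1 truncover;
    length name $32;
    position + 1;
    input name @@;
run;

/* 2. Join with the known column attributes, preserving file order */
proc sql;
    create table work.layout as
    select h.name, a.informat
    from work.header h
    left join work.col_attrs a
      on upcase(h.name) = upcase(a.name)
    order by h.position;
quit;

/* 3. Generate and run a DATA step that reads today's column order */
data _null_;
    set work.layout end=last;
    if _n_ = 1 then call execute(
        'data work.daily; infile "/incoming/daily.csv" dsd firstobs=2 truncover; input');
    call execute(' ' || strip(name) || ' :' || strip(informat));
    if last then call execute('; run;');
run;

Any column missing from WORK.COL_ATTRS surfaces here as a blank informat, which is exactly where the eyeballs come back in.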

If variable names or attributes (like date formats) change, you're hosed, and you need to use your good ol' eyeballs Mk1 to sense what is contained in the file.

gyambqt
Obsidian | Level 7
There could be more variables, or changed variable names.
Kurt_Bremser
Super User

@gyambqt wrote:
There could be more variables, or changed variable names.

So human intervention will be needed with every new file, and you can forget about automating it. How could you prepare any process to deal with data that is not known beforehand?

AllanBowe
Barite | Level 11

Hi @gyambqt,

 

Loading Excel files (of any type), with varying column position/order, as many extra columns as you like, from any range, on any worksheet, with any filename, is perfectly possible using Data Controller for SAS.

 

This is a third-party tool (which my team maintains) that is also free for 5 users.

 

The user simply chooses the target table and drags the Excel file into the browser. Based on the target table metadata, we can extract the relevant columns, apply any data quality rules, and upload both the data and the original Excel file to SAS, using SCD2 if needed. Bitemporal, retained keys, composite keys, etc. are all supported. You can even configure a job to run after an upload.

There is a video explaining the process here:  https://youtu.be/Dhy1pAbCaEw?t=592

No need for any special ACCESS engines; we use an OEM version of SheetJS to parse the Excel file.

 

Further documentation here.

/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs
ballardw
Super User

@gyambqt wrote:

Hi Experts,

 

I receive a CSV file every morning, and the columns inside the CSV change from day to day. I have a File Reader node in SAS DI to read this CSV file. How do I solve the changing-columns issue in DI when creating a metadata source or target table?


Go to management and get them involved in fixing the changing file layouts. This sort of thing indicates process problems that cost money and man-hours for next to no valid reason. There should have been an agreed file structure put in place, with variable lengths, types, and possibly even the content of some fields.

 

I worked as a contractor where I received 3 files per week for use. The client asked why we were charging them $$$ every month for programming, as "the programming should have been done once". We explained that we would get files with columns in different order, with different column headers, and sometimes random character values in what should be numeric fields, and so had to reprogram the read program 2 or 3 times weekly. Once they figured out how many thousands of dollars that was costing them, they were able to convince their data extract people to settle on a standard file format.

