gyambqt
Obsidian | Level 7

Hi Experts,

 

I receive a CSV file every morning, and the columns inside the CSV change from day to day. I have a File Reader node in SAS DI to read this CSV file. How do I solve the changing-columns issue in DI when creating the metadata for the source or target table?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@gyambqt wrote:
You could have more variables or changing variable names.

So human intervention will be needed with every new file, and you can forget about automating that. How could you prepare any process to deal with data that is not known beforehand?


12 REPLIES
Kurt_Bremser
Super User

DI Studio is for automating processes.

If you want to automate things, those things have to be automatable. A CSV file with a changing structure surely is not.

Go to the source of the data and make them send consistent data.

Patrick
Opal | Level 21

An ETL process needs stable data structures, and a change to these structures is a change to the ETL process.

If you weren't using DI: could you write code that deals with this changing .csv and that is re-runnable without you having to change anything? If so, then please explain in more detail what you're dealing with.

gyambqt
Obsidian | Level 7
The CSV file they send has changing columns every day. I just want to explore alternatives for reading inconsistent CSV columns.
Patrick
Opal | Level 21

But in the end, to implement any reasonable process, you will need to map the data from the .csv into a table where the columns don't change.

So if you want to use DIS for its intended purpose (ETL), then please explain in more detail. And if you can explain the logic for mapping into a stable table structure, then that's already 75% of the solution.

 

For example, if a column name has a date component in it but it's always the first column in the .csv, then this is something that can be dealt with (see the sketch below). But you need to provide more detail for us to give any advice.
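
A minimal sketch of that particular case, assuming a hypothetical file daily.csv whose first header field carries a date suffix (e.g. SALES_20240101) while column order and types stay fixed; all file, column, and informat names here are illustrative:

/* Minimal sketch, all names hypothetical: the first header field */
/* changes daily (e.g. SALES_20240101) but column order and types */
/* are stable, so we ignore the header and assign fixed names.    */

/* Capture the date suffix from the first header field */
data _null_;
    infile "/data/in/daily.csv" obs=1 dsd dlm=',' truncover;
    input col1 :$40.;
    call symputx('filedate', scan(col1, -1, '_'));
run;

/* Read the data with stable column names, skipping the header */
data stable;
    infile "/data/in/daily.csv" firstobs=2 dsd dlm=',' truncover;
    input sales_amt :comma12. region :$20. product :$30.;
    load_date = input("&filedate.", yymmdd8.);
    format load_date date9.;
run;

The point is that the header is never trusted: the stable structure lives in the program (or, in DI terms, in the registered metadata), and the only moving part, the date, is captured separately.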

Kurt_Bremser
Super User

Charge the <redacted> responsible for this mess for your work. Maybe losing money will cause their (obviously few) neurons to fire.

A daily changing structure means daily coding work, totally undermining the principles on which Data Integration is built (consistent predefined metadata).

So if this stupidity is going on, be prepared to fire up your Enterprise Guide each day, inspect the file, and write code to bring it into the shape you need. Then you can hand off the result to DI.

gyambqt
Obsidian | Level 7
I am wondering if I could use PROC IMPORT to read the header of the CSV file and make each column name a macro variable. In DI I would then use &column1, &column2, etc. Do you think that would work? But I don't know how to deal with my SCD2 issue, as you need consistent variable names to detect changes.
Patrick
Opal | Level 21

@gyambqt wrote:
I am wondering if I could use PROC IMPORT to read the header of the CSV file and make each column name a macro variable. In DI I would then use &column1, &column2, etc. Do you think that would work? But I don't know how to deal with my SCD2 issue, as you need consistent variable names to detect changes.

No, this will not work. 

Kurt_Bremser
Super User

PROC IMPORT will add even more inconsistency into this, as you can then not even be sure of variable types.

 

Which part(s) of the file structure will change from day to day?

If it is only the order of columns, one might be able to read the first line, join it with a dataset containing the column attributes, and then generate the DATA step that reads the data dynamically from the resulting dataset (see the sketch below).

If variable names or attributes (like date formats) change, you're hosed, and you need to use your good ol' eyeballs Mk1 to sense what is contained in the file.
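
A minimal sketch of that approach, assuming only the column order varies and every header name appears in a lookup table of known attributes; the file name daily.csv, the lookup table col_attrs (one row per known column: NAME, INFORMAT), and the informats themselves are all hypothetical:

/* Minimal sketch, all names hypothetical. Assumes only the column */
/* order varies and every header name appears in COL_ATTRS, a      */
/* lookup table with one row per known column: NAME and INFORMAT.  */

/* 1. Split the header line into one row per column */
data header;
    infile "/data/in/daily.csv" obs=1 lrecl=32767;
    input;
    length name $32;
    do colnum = 1 to countw(_infile_, ',', 'q');
        name = upcase(strip(scan(_infile_, colnum, ',', 'q')));
        output;
    end;
run;

/* 2. Join with the known attributes, preserving the file's order */
proc sql noprint;
    select catx(' ', a.name, cats(':', a.informat))
        into :inputlist separated by ' '
        from header h inner join col_attrs a
          on h.name = upcase(a.name)
        order by h.colnum;
quit;

/* 3. Read the data with the generated INPUT statement */
data want;
    infile "/data/in/daily.csv" firstobs=2 dsd dlm=',' truncover lrecl=32767;
    input &inputlist.;
run;

Note that list input reads fields positionally, so this only works if the lookup covers every column in the file; an unknown extra column would have to be read into a dummy variable rather than simply dropped from the INPUT statement.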

gyambqt
Obsidian | Level 7
You could have more variables or changing variable names.
Kurt_Bremser
Super User

@gyambqt wrote:
You could have more variables or changing variable names.

So human intervention will be needed with every new file, and you can forget about automating that. How could you prepare any process to deal with data that is not known beforehand?

AllanBowe
Barite | Level 11

Hi @gyambqt,

 

Loading Excel files (of any type), with varying column position/order, as many extra columns as you like, from any range, on any worksheet, with any filename, is perfectly possible using Data Controller for SAS.

 

This is a third-party tool (which my team maintains) that is also free for 5 users.

 

The user simply chooses the target table and drags the Excel file into the browser. Based on the target table metadata, we can extract the relevant columns, apply any data quality rules, and upload both the data and the original Excel file to SAS, using SCD2 if needed. Bitemporal, retained keys, composite keys, etc. are all supported. You can even configure a job to run after an upload.

There is a video explaining the process here:  https://youtu.be/Dhy1pAbCaEw?t=592

No need for any special ACCESS engines; we use an OEM version of SheetJS to parse the Excel file.

 

Further documentation here.

/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs
ballardw
Super User

@gyambqt wrote:

Hi Experts,

 

I receive a CSV file every morning, and the columns inside the CSV change from day to day. I have a File Reader node in SAS DI to read this CSV file. How do I solve the changing-columns issue in DI when creating the metadata for the source or target table?


Go to management and get them involved in fixing the changing file layouts. This sort of thing indicates process problems that cost money and man-hours for next to no valid reason. There should have been an agreed file structure put in place, with variable lengths, types, and possibly even the allowed content of some fields.

 

I worked as a contractor where I received 3 files per week for use. The client asked why we were charging them $$$ every month for programming, as "the programming should have been done once". We explained that we would get files with columns in a different order, with different column headers, and sometimes random character values in what should be numeric fields, and so had to reprogram the read program 2 or 3 times weekly. Once they figured out how many thousands of dollars that was costing them, they were able to convince their data extract people to agree on a standard file format.

