11-24-2016 04:39 AM
Please correct me if I have chosen the wrong board.
I am trying to read a CSV file (delimiter = tilde "~") comprising about 200,000 (2 lakh) rows.
When I import it using SAS DI Studio, the column information is populated incorrectly when I use auto fill, because it considers only the first 200 rows by default.
My data is large, and I can't even predict its exact row count in the future.
So what is the solution to this?
For now, as a workaround, I am using SAS EG to get the column information correctly, since it considers the whole data set.
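If the 200-row scanning limit is the core problem, one option outside DI Studio's wizard is to let PROC IMPORT scan the entire file when guessing column types. A minimal sketch, assuming SAS 9.4 or later (the file path and data set name are hypothetical examples, not from this thread):

```sas
/* Sketch: scan every row when guessing column types,          */
/* instead of the default limited sample.                      */
/* The path and output name below are hypothetical.            */
proc import datafile="/data/incoming/bigfile.csv"
            out=work.bigfile
            dbms=dlm
            replace;
    delimiter='~';       /* the file is tilde-delimited        */
    guessingrows=max;    /* scan all rows before typing columns */
run;
```

Note that scanning the whole file makes the import noticeably slower on large inputs, and the guessed types can still change from one delivery to the next, which is why a fixed file specification is safer for production.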
11-24-2016 07:24 AM
DI Studio is not a data exploration tool, even though it has a quick auto fill functionality.
So I think you should use other resources to correctly define your source data; the most important is a file specification that you, your data provider, and the DW team agree upon.
SAS metadata assumes stability; you really can't build a future-proof routine, since data types and lengths are mapped forward through the ETL processes, and end-user reports/applications rely on stable data set layouts.
12-06-2016 01:18 PM
Well, what are the different ways to obtain correct column information for a CSV file automatically?
As a workaround I am using SAS EG to populate the column information automatically and then pasting the generated code into a user-written code transformation in SAS DI. It works smoothly.
But I would like to know what other ways there are to do the same. Thanks.
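For a production job, the code EG generates usually boils down to a data step with every column spelled out explicitly, which removes guessing entirely and is what you would paste into the user-written transformation. A minimal sketch (the path, column names, lengths, and formats are hypothetical illustrations of the pattern, not the poster's actual layout):

```sas
/* Sketch: read a tilde-delimited file with explicit column    */
/* definitions -- no scanning or type guessing involved.       */
/* All names, lengths, and formats here are made-up examples.  */
data work.bigfile;
    infile "/data/incoming/bigfile.csv"
           dlm='~' dsd missover firstobs=2;   /* skip header row */
    attrib cust_id   length=8   label='Customer ID'
           cust_name length=$50 label='Customer name'
           txn_date  length=8   informat=yymmdd10. format=date9.
           amount    length=8   format=comma12.2;
    input cust_id cust_name txn_date amount;
run;
```

Because the attributes are fixed in the code, the table structure stays stable across runs regardless of row count, which is exactly what downstream DI Studio metadata expects.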
12-07-2016 06:56 AM
Again, it should be a handshake with the party sending the file.
Once defined in DI Studio, it can't be dynamic.
12-09-2016 08:15 PM
DI Studio is used to implement production worthy ETL processes. It is not a data exploration tool.
You need to define and fully document your data sources as part of the design process, and you need to have done this work before you start the build. If required, use SAS EG and the like for data exploration and DQ (data quality) assessment during the design phase.
If possible, you have an interface contract in place with your upstream data providers which fully defines the data structures, data types, value ranges, delivery format, delivery method, periodicity, and so on. If it's not possible to establish such a contract, then you still need to fully document these things as part of your design and program documentation.
All ETL processes need stable and agreed data structures. If that's not what you're dealing with, then consider whether ETL processes are the right thing to use, or whether you're rather in an area of ad hoc "reporting" which requires regular user interaction and program maintenance.