Attyslogin
Obsidian | Level 7

Hey community,

Please correct me if I have chosen the wrong board.

I am trying to read a CSV file (delimiter = tilde "~") of about 2 lakh (200,000) rows.

When I import it using SAS DI, the column information is populated incorrectly when I use auto fill, because it considers only the first 200 rows by default.

In my case the data is huge, and I can't predict how many rows it will have in future.

So what's the solution to this?

For now, as a workaround, I am using SAS EG to get the column information correctly, as it considers the whole data.

5 REPLIES
LinusH
Tourmaline | Level 20

DI Studio is not a data exploration tool, even though it has a quick auto fill function.

So I think you should use other resources to define your source data correctly, the most important of which is a file specification that your data provider, you and the DW agree upon.

SAS metadata assumes stability. You really can't build a future-proof routine, since data types and lengths are mapped forward in the ETL processes, and end-user reports and applications rely upon stable data set layouts.

Data never sleeps
Attyslogin
Obsidian | Level 7

Well, what are the different ways to obtain the correct column information of a CSV file automatically?

As a workaround I am using SAS EG to populate the column information automatically and then pasting the generated code into a User Written Code transformation in SAS DI. It works smoothly.

 

But I would like to know what other ways there are to do the same. Thanks.

LinusH
Tourmaline | Level 20

Again, it should be a handshake with the sending party of the file.

Once defined in DI Studio, it can't be dynamic.

Data never sleeps
Attyslogin
Obsidian | Level 7

I love your tagline, LinusH.

"Data never sleeps". Thanks.

Patrick
Opal | Level 21

DI Studio is used to implement production-worthy ETL processes. It is not a data exploration tool.

 

You need to define and fully document your data sources as part of the design process, and you need to have done this work before you start the build. If required, use SAS EG and the like for data exploration and DQ assessment during the design phase.
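
To illustrate the kind of design-phase exploration meant here, below is a minimal sketch in Python (outside SAS, purely as an example). It scans every row of a tilde-delimited file rather than a fixed sample and reports, per column, the longest value seen and whether all non-empty values parse as numbers. The function name `profile_delimited` and the sample data are hypothetical, and it assumes the first row holds the column names.

```python
import csv
import io


def profile_delimited(lines, delimiter="~"):
    """Scan every row and return {column: {"max_len": int, "numeric": bool}}.

    `lines` is any iterable of text lines (an open file works too).
    Assumes the first row is a header. Empty values are ignored when
    deciding whether a column is numeric.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    stats = {name: {"max_len": 0, "numeric": True} for name in header}
    for row in reader:
        for name, value in zip(header, row):
            col = stats[name]
            col["max_len"] = max(col["max_len"], len(value))
            if value != "" and col["numeric"]:
                try:
                    float(value)
                except ValueError:
                    # One non-numeric value makes the column character.
                    col["numeric"] = False
    return stats


# Hypothetical sample standing in for the real file.
sample = io.StringIO("id~name~amount\n1~Al~10.5\n2~Alexandria~n/a\n")
profile = profile_delimited(sample)
for column, info in profile.items():
    print(column, info)
```

The result of such a scan is input for the documented file specification; it is not something to regenerate on every load.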

 

If possible, you have an interface contract in place with your upstream data providers which fully defines the data structures, data types, value ranges, delivery format, delivery method, periodicity and so on. If it's not possible to establish such a contract, then you still need to fully document these things as part of your design and program documentation.
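
Once such a specification exists, an incoming file can be checked against it before it is loaded. The following is only a sketch in Python under stated assumptions: the `SPEC` layout (column name, type, maximum length), the function name `check_against_spec` and the sample data are all hypothetical and stand in for whatever the real contract defines.

```python
import csv
import io

# Hypothetical agreed layout: (column name, "num" or "char", max length for char).
SPEC = [("id", "num", None), ("name", "char", 8), ("amount", "num", None)]


def check_against_spec(lines, spec, delimiter="~"):
    """Return a list of human-readable violations of the agreed layout."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    expected = [name for name, _, _ in spec]
    if header != expected:
        return [f"header mismatch: expected {expected}, got {header}"]
    problems = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(spec):
            problems.append(
                f"line {line_no}: expected {len(spec)} fields, got {len(row)}"
            )
            continue
        for (name, kind, max_len), value in zip(spec, row):
            if kind == "num" and value != "":
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"line {line_no}: {name} is not numeric: {value!r}"
                    )
            elif kind == "char" and len(value) > max_len:
                problems.append(
                    f"line {line_no}: {name} exceeds length {max_len}: {value!r}"
                )
    return problems


# Hypothetical delivery that breaks the contract on its second data row.
sample = io.StringIO("id~name~amount\n1~Al~10.5\n2~Alexandria~n/a\n")
problems = check_against_spec(sample, SPEC)
for problem in problems:
    print(problem)
```

A failed check is a reason to go back to the sending party, not to change the column definitions on the fly.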

 

All ETL processes need stable and agreed data structures. If that's not what you're dealing with, then consider whether ETL processes are the right thing to use, or whether you're rather in an area of ad hoc "reporting" which requires regular user interaction and program maintenance.

 

 
