SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

while importing raw file, column information is getting populated incorrectly.

Contributor
Posts: 29

Hey community.

Please correct me if I have chosen the wrong board.

I am trying to read a CSV file (delimiter = tilde "~") of about 2 lakh (200,000) rows.

When I import it using SAS DI Studio and use Autofill, the column information is populated incorrectly, because by default Autofill considers only the first 200 rows.

In my case the data is large, and in the future I can't even predict the exact row count.

So what's the solution to this?

For now, as a workaround, I am using SAS EG to get the column information correctly, since it considers the whole file.
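To illustrate why a 200-row guess can go wrong, here is a minimal Python sketch (not SAS, and not how DI Studio or EG work internally) that infers each column's type and maximum length either from the first N data rows or from the whole file. The function name `infer_columns`, the assumption that the first line is a header, and the simple "does `float()` accept it" numeric test are all hypothetical choices for illustration.

```python
import csv


def infer_columns(lines, delimiter="~", max_rows=None):
    """Infer (name, type, max_length) for each column of a delimited feed.

    Assumes the first line is a header. If max_rows is given, only that many
    data rows are examined (like a sampled guess); None scans every row.
    A column is "num" if every non-empty value is accepted by float().
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    is_numeric = [True] * len(header)
    max_len = [0] * len(header)
    for i, row in enumerate(reader):
        if max_rows is not None and i >= max_rows:
            break  # stop sampling, as a limited-row guess would
        for j, value in enumerate(row[: len(header)]):  # ignore extra fields
            max_len[j] = max(max_len[j], len(value))
            if value and is_numeric[j]:
                try:
                    float(value)
                except ValueError:
                    is_numeric[j] = False  # one non-numeric value makes it char
    return [(name, "num" if num else "char", length)
            for name, num, length in zip(header, is_numeric, max_len)]


# Hypothetical data: the long name and the "N/A" only appear after row 250.
rows = ["id~name~amount"]
rows += [f"{i}~abc~{i}" for i in range(250)]
rows.append("250~a much longer name~N/A")

print(infer_columns(rows, max_rows=200))
# [('id', 'num', 3), ('name', 'char', 3), ('amount', 'num', 3)]
print(infer_columns(rows))
# [('id', 'num', 3), ('name', 'char', 18), ('amount', 'char', 3)]
```

For a real file, pass an open handle, e.g. `with open(path, newline="") as f: infer_columns(f)`. A full scan only tells you what this particular file looks like; it does not guarantee the next delivery will fit, which is the point the replies below make.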

Super User
Posts: 5,426

Posted in reply to Attyslogin

DI Studio is not a data exploration tool, even though it has a quick Autofill feature.

So I think you should use other resources to define your source data correctly, the most important being a file specification that your data provider, you and the DW team agree upon.

SAS metadata assumes stability. You can't really build a future-proof routine, since data types and lengths are mapped forward through the ETL processes, and end-user reports and applications rely on stable data set layouts.

Data never sleeps
Contributor
Posts: 29

Well, what are the different ways to obtain correct column information for a CSV file automatically?

As a workaround, I am using SAS EG to populate the column information automatically and then pasting the generated code into a User Written Code transformation in SAS DI. It works smoothly.

But I would like to know what other ways there are to do the same. Thanks

Super User
Posts: 5,426

Posted in reply to Attyslogin

Again, it should be a handshake with the party sending the file.

Once defined in DI Studio, it can't be dynamic.

Data never sleeps
Contributor
Posts: 29

I love your tagline, LinusH.

Data Never Sleeps. Thanks

Respected Advisor
Posts: 4,173

Posted in reply to Attyslogin

DI Studio is used to implement production worthy ETL processes. It is not a data exploration tool.

 

You need to define and fully document your data sources as part of the design process, and this work must be done before you start the build. If required, use SAS EG and similar tools for data exploration and DQ assessment during the design phase.

 

If possible, put an interface contract in place with your upstream data providers that fully defines the data structures, data types, value ranges, delivery format, delivery method, periodicity and so on. If such a contract can't be established, you still need to fully document these things as part of your design and program documentation.
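Part of such a contract can be enforced mechanically on each delivery. A minimal Python sketch, assuming a hypothetical `ColumnSpec` record and `validate` function (neither is a SAS or DI Studio feature), that checks a tilde-delimited feed against an agreed layout and reports violations instead of re-inferring the layout on every run:

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One column of a hypothetical interface contract."""
    name: str
    kind: str        # "num" or "char"
    max_length: int  # maximum number of characters agreed with the provider


def validate(lines, contract, delimiter="~"):
    """Return a list of contract violations; an empty list means the feed conforms.

    The header must match the contract's column names exactly. Line numbers
    assume one record per physical line (no embedded newlines).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    expected = [c.name for c in contract]
    header = next(reader, None)
    if header != expected:
        return [f"header {header} does not match contract {expected}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(contract):
            errors.append(f"line {line_no}: expected {len(contract)} fields, got {len(row)}")
            continue
        for spec, value in zip(contract, row):
            if len(value) > spec.max_length:
                errors.append(f"line {line_no}: {spec.name} longer than {spec.max_length}")
            if spec.kind == "num" and value:
                try:
                    float(value)
                except ValueError:
                    errors.append(f"line {line_no}: {spec.name} is not numeric")
    return errors


contract = [ColumnSpec("id", "num", 8),
            ColumnSpec("name", "char", 20),
            ColumnSpec("amount", "num", 12)]
feed = ["id~name~amount", "1~Alice~10.5", "2~Bob~N/A"]
print(validate(feed, contract))
# ['line 3: amount is not numeric']
```

The design choice mirrors the advice in this thread: the layout is fixed up front, and a file that deviates from it is rejected and raised with the provider rather than silently changing data types and lengths downstream.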

 

All ETL processes need stable, agreed data structures. If that's not what you're dealing with, consider whether ETL processes are the right approach, or whether you are really in the area of ad hoc "reporting", which requires regular user interaction and program maintenance.
