Help using Base SAS procedures

clean PROC IMPORT without having to modify the formats

Contributor
Posts: 31

Hello,

I was wondering whether anyone has found a perfect way to import CSV/text files into SAS without having to change the formats afterwards.

Someone suggested in another post changing the default "GuessingRows" value in the SAS Registry (regedit command), but there is no "perfect" number that works efficiently for every type of file.

Sometimes the first 5 rows have missing values, sometimes the whole first half of the file does, and sometimes a qualitative variable takes short values in the first rows and much longer ones later.

Is there a way to perfectly import data without having to change some variables' format afterwards?

Thank you!

Super User
Posts: 7,997

Re: clean PROC IMPORT without having to modify the formats

Posted in reply to loredana_cornea

Hi,

As described in quite a few posts on here, proc import is shaky at best.  You are allowing a generalized procedure to guess what you want to do.  IMO you should avoid proc import altogether.  Look at your data, understand the data, and write code which imports that data.

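data want;
     length var1 var2 $10;                /* set the type and length yourself */
     infile "xyz.csv" dsd truncover;      /* dsd: comma-delimited, handles quoted and empty fields */
     input var1 $ var2 $;
run;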

This may seem like a bit more effort than letting proc import guess for you; however, in the long term you:

1) Get complete control over the import

2) Catch errors early on

3) Understand the data structure.
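As a rough sketch of what that control can look like with mixed column types: the column names (`id`, `name`, `visit_date`, `amount`), the header row, and the date layout below are all assumptions made for illustration, not something taken from the original file.

```sas
/* Hypothetical layout: id,name,visit_date,amount with one header row */
data want;
     infile "xyz.csv" dsd truncover firstobs=2;  /* firstobs=2 skips the assumed header row */
     length id $12 name $50;                     /* character lengths chosen by you, not guessed */
     informat visit_date yymmdd10.;              /* assumes dates written like 2020-01-31 */
     format visit_date date9.;
     input id $ name $ visit_date amount;
     /* _error_ is set to 1 when a value cannot be read with its informat */
     if _error_ then putlog "WARNING: invalid data in observation " _n_=;
run;
```

Because every type, length and informat is written down, a value that does not match shows up in the log on the first run instead of silently changing the column definition.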

Contributor
Posts: 31

Re: clean PROC IMPORT without having to modify the formats

@RW9

Thank you for your response. I agree with you, but it is an assignment and I'm going to have to find a way.

Super User
Posts: 7,997

Re: clean PROC IMPORT without having to modify the formats

Posted in reply to loredana_cornea

What is your assignment exactly?  If it is "write a program which imports any csv file exactly as we want", I am afraid you are on a road to nowhere.  There is no such thing as code which can handle every eventuality.  Even if you go down the road of reading the complete file character by character and having some complex algorithm work out each column, you are still going to come up with scenarios where the data just doesn't fit.  You have to have some kind of idea of what the data is going to be structure-wise.

Super User
Posts: 11,343

Re: clean PROC IMPORT without having to modify the formats

Posted in reply to loredana_cornea

For CSV files there is no need to edit the registry; the value stored there is only a default.

For CSV files, proc import accepts guessingrows as an option in the procedure call. The max value is 2147483647. If you have more rows than that before a variable changes behavior then you're likely hosed anyway.
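A minimal sketch of setting it in the call, assuming a comma-delimited file with a header row; the file path and the output dataset name `want` are placeholders.

```sas
/* Placeholder path and dataset name; adjust to your data. */
proc import datafile="xyz.csv"
            out=want
            dbms=csv
            replace;
     getnames=yes;              /* first row holds the column names */
     guessingrows=2147483647;   /* scan up to this many rows before choosing types and lengths */
run;
```

Scanning the whole file before reading it makes the import slower on large files, so this trades speed for better guesses; it still does not change how numeric-looking identifiers are typed.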

As I see it the biggest issues have to do with numerically coded data with 1) significant leading zeroes and/or 2) 15 or more digits. Things like account numbers or identifiers most of the time should not be numeric, and proc import is likely to assign them as numeric, but account 0001 and 00001 aren't different as numbers (bad numbering but the example works), or a value like 1234567891234567 may exceed storage precision for integers (hint: you'll see values like 1.23E15).
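A small sketch of both problems, reading the same made-up identifiers once as character and once as numeric; the dataset name `ids` and the variable names are hypothetical.

```sas
data ids;
     infile datalines dsd truncover;
     length acct_char $20;           /* character: keeps every digit and leading zero */
     input acct_char $ acct_num;     /* acct_num is read as an ordinary numeric */
     datalines;
0001,0001
00001,00001
12345678901234567890,12345678901234567890
;
run;

proc print data=ids;
run;
```

In the printed output the first two rows stay distinct in `acct_char` but collapse to the same number in `acct_num`, and the 20-digit value has more digits than a numeric variable can hold exactly.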
