loredana_cornea
Obsidian | Level 7

Hello,

I was wondering whether anyone has found a perfect way to import CSV/text files into SAS without having to change the formats afterwards.

Someone suggested in a post changing the default number for "GuessingRows" in SAS Registry (regedit command), but there is no "perfect" number to use efficiently on every type of file.

Sometimes the data has missing values in the first 5 rows, sometimes the whole first half of the file has missing values, and sometimes qualitative variables take short values in the first rows and much longer ones later on.

Is there a way to perfectly import data without having to change some variables' format afterwards?

Thank you!

RW9
Diamond | Level 26

Hi,

As described in quite a few posts on here, using the proc import syntax is shaky at best.  You are allowing a generalized procedure to guess what you want to do.  IMO you should avoid proc import altogether.  Look at your data, understand the data, and write code which imports that data.

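data want;
     /* declare the character lengths up front instead of letting SAS guess */
     length var1 var2 $10;
     /* dsd + dlm=',' read the file as comma-separated; add firstobs=2 if it has a header row */
     infile "xyz.csv" dsd dlm=',' truncover;
     input var1 $ var2 $;
run;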

This may seem like a bit more effort than letting proc import guess for you; however, in the long term you:

1) Get complete control over the import

2) Catch errors early on

3) Understand the data structure.
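
A slightly fuller sketch of the same approach, for a file with mixed column types. The file name orders.csv, its header row, and the four columns (order_id, customer, order_date in yyyy-mm-dd form, amount) are assumptions made up for illustration, not something from the original post:

```sas
/* Hypothetical layout: order_id,customer,order_date,amount with one header row. */
data work.orders;
    /* dsd handles quoted fields and empty values; firstobs=2 skips the header */
    infile "orders.csv" dsd dlm=',' firstobs=2 truncover;

    /* character columns get explicit lengths, so long values are not cut off */
    length order_id $12 customer $50;

    /* tell SAS how to read the date, and how to display it */
    informat order_date yymmdd10.;
    format   order_date date9.;

    /* order_id stays character even if it looks like a number */
    input order_id $ customer $ order_date amount;
run;
```

Because every type, length, and informat is stated explicitly, a value that does not match (for example text in the amount column) shows up as a note in the log rather than silently changing the column type.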

loredana_cornea
Obsidian | Level 7

@RW9

Thank you for your response. I agree with you, but it is an assignment and I'm going to have to find a way.

RW9
Diamond | Level 26

What is your assignment exactly?  If it is "write a program which imports any csv file exactly as we want", I am afraid you are on a road to nowhere.  There is no such thing as code which can handle any eventuality.  Even if you go down the road of reading the complete file character by character and having some complex algorithm work out each column, you are still going to come up with scenarios where the data just doesn't fit.  You have to have some kind of idea of what the data is going to be structure-wise.

ballardw
Super User

For CSV files there is no need to edit the registry; the value stored there is only a default.

CSV files allow the guessingrows as an option in the procedure call. The max value is 2147483647. If you have more rows than that before a variable changes behavior then you're likely hosed anyway.
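
A minimal sketch of setting the option in the procedure call. The file name xyz.csv and the output data set name are placeholders; recent SAS releases accept the keyword MAX, and on older ones the literal 2147483647 can be used instead:

```sas
/* Placeholder file and data set names, for illustration only. */
proc import datafile="xyz.csv"
            out=work.want
            dbms=csv
            replace;
    getnames=yes;        /* first row holds the column names */
    guessingrows=max;    /* scan all rows before choosing types and lengths */
run;
```

Scanning every row makes the import slower on large files, and the procedure is still guessing, only with more evidence.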

As I see it, the biggest issues have to do with numerically coded data with 1) significant leading zeroes and/or 2) 15 or more digits. Things like account numbers or identifiers should usually not be numeric, yet proc import is likely to read them as numeric. Accounts 0001 and 00001 are then no longer different as numbers (bad numbering, but the example works), and a value like 1234567891234567 may exceed the storage precision for integers (hint: you'll see values like 1.23E15).
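
Both problems can be seen in a small self-contained step. This is only a sketch: the variable names are invented, and the 17-digit value is an illustrative one chosen to be larger than the biggest integer a SAS numeric can hold exactly (2**53):

```sas
/* id_char keeps the identifier as typed; id_num is what a numeric column would store. */
data work.id_demo;
    length id_char $20;
    input id_char $;

    /* convert the same text to a number, as a numeric import would */
    id_num = input(id_char, 20.);
    format id_num best20.;

    datalines;
0001
00001
12345678901234567
;
run;

proc print data=work.id_demo;
run;
```

The first two rows stay distinct in id_char but collapse to the same number in id_num, and the long identifier cannot be stored exactly as a number. Reading such columns as character in a data step avoids both issues.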
