BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ginak
Quartz | Level 8

Hello,

 

Sometimes I receive .csv files from folks who work in R and change everything that's missing to "NA." What's an easy way to import the .csv while telling SAS to treat "NA" as ".m"? For example,

 

data have;	
input ID status (start end) (:mmddyy10.) number;
format start end yymmdd10. number ;
cards;
1A	active 	01/02/2011 02/01/2011      9495552233
1A	revoked    NA               NA     9495523344
1A	active 	05/01/2013	06/03/2016 (949)553-4488
;

Except in my data, these character variables appear deep into the data.The data I imported could look something like this and Ireceived an error because of the second line and third line, since SAS assumed two dates and a numeric variable, only to be surprised with characters in those. I usually go into the proc import and manually change each variable to a character to avoid errors, but then it's cumbersome to have to go back and change those back to numeric manually, especially when I have a lot of variables.

Thanks!

Best,

Gina

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

One way is to just pre-process the data to remove the NA. That is really easy if you know the number of columns in the source file.  So if the file has 20 columns and the longest possible value is 200 characters then a program like this will convert all the NA to blanks.  Then SAS will properly read those a missing.  If you really wanted to treat them as some type of special missing then convert them into a single letter instead.  But why would need to have missing and NA as meaning two different things?

data _null_;
   infile 'from_r.csv' dsd truncover ;
   file 'to_sas.csv' dsd ;
   length word $200 ;
   do i=1 to 20;
      input word @;
      if word='NA' then word=' ';
      put word @;
   end;
   put;
run;

Then you could still try using PROC IMPORT to guess what is actually in the file.  Note that it is not hard to figure out how many variables are in the file and put that into a macro variable.

data _null_;
  infile 'from_r.csv' obs=1;
  input;
  call symputx('ncol',countw(_infile_,',','q'));
run;

Another method is to make your own informats to handle the NA strings.

proc format;
invalue from_r
 'NA' = .
other = [comma32.]
;
invalue from_r_d
 'NA' = .
other = [anydtdte32.]
;
run;
data want ;
  infile 'from_r.csv' dsd truncover firstobs=2;
  length col1 $20 col2 8 col3 8;
  informat _numeric_ from_r.;
  informat col3 from_rd. ;
  format col3 yymmdd10.;
run;

View solution in original post

7 REPLIES 7
PGStats
Opal | Level 21

Read _INFILE_ buffer, replace NA with . and read again :

 

data have;
infile datalines dsd;
input @;
_infile_ = transtrn(_infile_, ",NA", ",."); /* Replace NA with . */
input @1 ID $ status $ (start end) (:mmddyy10.);
format start end yymmdd10.;
datalines;
1A,active,01/02/2011,02/01/2011
1A,revoked,NA,NA
1A,active,05/01/2013,06/03/2016
;
PG
ginak
Quartz | Level 8

Thank you!! This is very helpful!

 

If I were to use the proc import code, where could I denote that?

 

 data MAIN2.combined_dat ;
   %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
   infile 'B:\Gina\combined_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
      informat ID best32. ;
      informat phoneNumber best32. ;
      informat strt_dt mmddyy10. ;
      informat end_dt mmddyy10. ;
.....
ballardw
Super User

@ginak wrote:

Thank you!! This is very helpful!

 

If I were to use the proc import code, where could I denote that?

 

 data MAIN2.combined_dat ;
   %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
   infile 'B:\Gina\combined_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
      informat ID best32. ;
      informat phoneNumber best32. ;
      informat strt_dt mmddyy10. ;
      informat end_dt mmddyy10. ;
.....

To use a custom informat you need to use a data step. Proc Import is not able to use such.

If you have multiple files that should have the same structure and always read them with Proc Import you will eventually have an issue with one or more variables changing data type or character variables with different lengths causing issues unless you are very lucky. Proc Import guesses as to variable types, lengths and informats. It can be pretty good but for this specific issue you need to use a data step to use either of the approaches.

Tom
Super User Tom
Super User

Watch out for real values that happen to start with NA.

_infile_=tranwrd(cats(',',_infile_,','),',NA,',',,');
_infile_=substrn(_infile_,2,length(_infile_)-2);
Tom
Super User Tom
Super User

One way is to just pre-process the data to remove the NA. That is really easy if you know the number of columns in the source file.  So if the file has 20 columns and the longest possible value is 200 characters then a program like this will convert all the NA to blanks.  Then SAS will properly read those a missing.  If you really wanted to treat them as some type of special missing then convert them into a single letter instead.  But why would need to have missing and NA as meaning two different things?

data _null_;
   infile 'from_r.csv' dsd truncover ;
   file 'to_sas.csv' dsd ;
   length word $200 ;
   do i=1 to 20;
      input word @;
      if word='NA' then word=' ';
      put word @;
   end;
   put;
run;

Then you could still try using PROC IMPORT to guess what is actually in the file.  Note that it is not hard to figure out how many variables are in the file and put that into a macro variable.

data _null_;
  infile 'from_r.csv' obs=1;
  input;
  call symputx('ncol',countw(_infile_,',','q'));
run;

Another method is to make your own informats to handle the NA strings.

proc format;
invalue from_r
 'NA' = .
other = [comma32.]
;
invalue from_r_d
 'NA' = .
other = [anydtdte32.]
;
run;
data want ;
  infile 'from_r.csv' dsd truncover firstobs=2;
  length col1 $20 col2 8 col3 8;
  informat _numeric_ from_r.;
  informat col3 from_rd. ;
  format col3 yymmdd10.;
run;
ginak
Quartz | Level 8

Hi Tom,

 

Thank you! I ended up doing what you first suggested:

 

data _null_;
   infile 'B:\Gina\combineddata.csv' dsd truncover ;
   file 'B:\Gina\to_sas.csv' dsd ;
   length word $200 ;
   do i=1 to 140;
      input word @;
      if word='NA' then word=' ';
      put word @;
   end;
   put;
run;

I had 140 variables. This worked! I then used proc import to import my data, and am going through to change the phone numbers to character (most are in 8005525522 format, but there's some that look like (800)555-3344 and it's throwing SAS off). I wonder if there is a better way to do this and have SAS scan more records sot hat it can see that they're longer than $1 or something.
Thanks for your help!

Tom
Super User Tom
Super User

You can add the GUESSINGROWS=MAX statement to your PROC IMPORT step.

 

But a better solution would be use to use a format that preserves the data types.  There are packages for for R that will allow you write out a SAS dataset or at least a SAS transport file.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 12676 views
  • 9 likes
  • 4 in conversation