BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
denperp
Obsidian | Level 7

Hi, I am working on making my code more data driven (SAS 9.4 Windows) but am stumbling with something which could be super easy or not so much…. 

 

This all works fine until the end where SAS is reading some variables as character instead of numeric.
Here is the process:


I manually download csv data into a folder.  

denperp_0-1701289379596.png

 

filename HR pipe 'dir "[directory path]*.csv" /b';
/* This step resolves the file names in the directory (the downloaded csv files) to SAS dataset "files" - one table is created "fname" with the names of the csv files*/
data _HR;
	length fname $20;	
        infile HR  truncover length= reclen;
		input fname $varying20. reclen;
run;

 

denperp_1-1701289379597.png

 

/*extracting the year out of each of the files from the previous step*/

data _HR1;
set _HR;
ipedsyear = compress(fname,,'kd');
run;

 

denperp_2-1701289379597.png

(the &&ipedsyr&i is for later - not part of this question)

 

/*This puts all the years available into a variable for processing and number obs - */

proc sql; select distinct ipedsyear into: ipedsyr1-
from _hr1
order by ipedsyear;
%let numyrs = &sqlobs;
quit;

 

 

Here is the macro to pull all the files in:

 

/*This loops through as many files as there are to pull in the csv files*/

%macro fileinput;
            %local i;
            %do i= 1 %to &numyrs;
                        data _null_;
                                    set _HR1(firstobs= &i obs= &i);
                                                call symput('fpath', "[file path name]" || fname);
                                                call symput('foutname', scan(fname, 1, '.') );
                        run;
                        proc import out= work.&foutname
                                    datafile= "&fpath"
                                    dbms=csv replace;
                        run;
            %end;
%mend;

%fileinput

 

I then set them together:

 

data _HR2 ;

length ipedsyear $4;
length unitidc $6;
set EAP20: (keep = [variable list])  indsname = in  ;
ipedsyear =compress(in,,'kd');
unitidc = put(unitid,6.);
run;

This works….except two of the variables do not have data until about 100 rows down so it is being read as a character value with a format of $1, instead of a numeric value (values up to 9999).  I know with proc import it is just reading the first x rows and assigning a numeric/character value. 

 

How can I either increase the number of rows that SAS reads in with proc import, or adapt the following import syntax into the above macro (which worked without a macro) but was having to do this multiple times:

 

 

data IPEDS.FY&FY ;
infile "&filepath\&datapull\eap&FY..csv" delimiter=',' DSD MISSOVER firstobs=2 lrecl=32736;
informat
[specify all needed variables here]
input
[specify all needed variables here];
RUN;

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
denperp
Obsidian | Level 7

Excellent Tom, thank you.  I was reading too quickly the first time around.  I was overthinking it.

View solution in original post

7 REPLIES 7
Reeza
Super User
Don't use proc import, instead write a data step to read the files that will read each file correctly with the same types/formats. Proc import guesses, and as you see, it doesn't always make the same guess 🙂
ballardw
Super User

Proc Import examines a very few number of rows by default to set type. So if there are missing values for a variable in the first row it may become character. Which is why Proc Import has an option GUESSINROWS to override that. Use GUESSINGROWS=MAX; to use the largest value of rows to examine before setting variable types.

Personally if these CSV files are supposed to be of similar content then you really should use a data step to read them consistently as lengths of character variables can change between files read with Import causing any number of headaches. The file source really should have documentation somewhere as to maximum length of character variables, layout of date/time/datetime values, order of variables and descriptions of the variables so you use the data properly.

 

 

If the file has more than one header row then the text of the second (or third or whatever) header is going to be treated a character by Import. Which is why Proc Import has a DATAROW option to tell the procedure which is the first actual row of data in a file.

Tom
Super User Tom
Super User

Because you told PROC IMPORT to GUESS how to read the data. And to GUESS for each file independent of the other files.  And to GUESS using only the first 20 lines of the file.

 

Just write your own data step to read the files.

Here is a simple template.  

 

data EAP;
  length filename $200 fname $20 ipedsyear 8 ;
  infile "[directory path]eap*.csv" dsd truncover filename=filename ;
  input @;
  if filename ne lag(filename) then delete;
  fname = scan(filename,-1,'/\');
  ipedsyear = input(substr(fname,4),4.);
  length var1 $25 var2 8 var3 $10 var4 8 var5 8 varlast $10;
  informat var5 mmddyy.;
  format var5 yymmdd10.;
  input var1 -- varlast;
run;

Just update the LENGTH statement to use the names and types of the variables in your CSV files.  Update the INFORMAT and FORMAT for any variables that need them (normally only DATE, TIME and DATETIME values need special informats or formats).  And update the INPUT statement to use the variable list based on the first and last variable in your files.

 

For my example I assumed your CSV files have 6 variables named VAR1 to VAR5 and VARLAST.  Some of them are character and some are numeric (length=8).  One is a date string that has values stored in MDY order in the CSV file but that you want to print using YMD order so that you don't confuse half of your audience.

 

denperp
Obsidian | Level 7

Thank you all for your quick responses.   Perhaps I was too subtle in my request - I want to incorporate this data step (or the other one kindly suggested) into the macro that I have instead of the proc import.

 

Here is the macro:

 

%macro fileinput;
	%local i;
	%do i= 1 %to &numyrs;
		data _null_;
			set _HR1(firstobs= &i obs= &i);
				call symput('fpath', "L:\Metrics Calculated by KU\AAU Metrics Calculation\Faculty Counts\Faculty Count Derivation Files\Raw Data downloaded from IPEDS and AAMC\IPEDS Data Pulls\" || fname);
				call symput('foutname', scan(fname, 1, '.') );
		run;
		[old proc import syntax I am wanting to replace]
	%end;
%mend;

%fileinput

Each time I run this I will have multiple files.  I want the program to be able to read what my file names are (fname in my screenshot) automatically depending on how many files I have.   I need the infile statement to read that number accordingly and not manually put in the name of the csv file to be read as I had been doing.

 

data [new data set name or names] ;
infile "&filepath\&datapull\[csv file name]..csv" delimiter=',' DSD MISSOVER firstobs=2 lrecl=32736;
informat
[variable list]
input
[variable list];  
RUN;

Thank you.

Tom
Super User Tom
Super User

Do you plan to later combine them into one dataset? 

I already showed you how to do your whole process in one data step with no macro logic at all.

 

If you really, really need to generate a separate dataset for each CSV file then make a macro like this.

%macro read(filename,dsname);
data &dsname;
  infile &filename dsd truncover firstobs=2;
  ... rest of data step to read these files properly ...
run;
%mend read;

Then in a data step that has the list of files use CALL EXECUTE() to call the macro once for each file.

data _null_;
  set filelist;
  length dsname $32;
  dsname=scan(scan(fname,-1,'/\'),1,'.');
  call execute(cats('%nrstr(%read)(',quote(trim(fname)),',',dsname,')'));
run;

And again there is no need to make a lot of macro variables or use complex macro logic.  Just the one very simple macro definition.

 

 

denperp
Obsidian | Level 7

Excellent Tom, thank you.  I was reading too quickly the first time around.  I was overthinking it.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1040 views
  • 0 likes
  • 5 in conversation