BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xyxu
Quartz | Level 8

I have a folder of CSV files named based on daily dates, such as "data_20150101". I am trying to import them and append to one single dataset. My current code is:

%macro import(start, end);
	%let start = %sysfunc(inputn(&start, anydtdte9.));
	%let end = %sysfunc(inputn(&end, anydtdte9.));
	%let dif = %sysfunc(intck('day', &start, &end));
    	%do i = 0 %to &dif;
			%let date = %sysfunc(intnx('day', &start, &i, 'e'), yymmddn8.);
				data data_new;
					infile "&in_path\data_&date..csv" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2; 
					informat 
					var1 $10.
					var2 10.;
					input var1 var2;
				run;
				proc append base = want data = data_new; run;
		%end;
%mend;

%import(20150101, 20151231);

When I run the code, nothing happens. I got a message: 

WARNING: The quoted string currently being processed has become more than 262 bytes long. You
might have unbalanced quotation marks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If you want to read all of the files whose names start with data_ in that folder then NO explicit loop is needed;

You might try this code to see what happens:

data data_new;
	infile "&in_path\data_*.csv" delimiter = ',' 
          dsd missover lrecl = 32767 firstobs = 2; 
	informat 
	var1 $10.
	var2 10.
   ;
	input var1 var2;
run;

The * wildcard in the file name will attempt to read every file in the folder that starts with data_  .

If you run that you will get an "invalid data for var2" for each file when the header row is encountered.

 

That header row problem can be addressed with some additional options and conditional statements.

 

Obviously untested because I don't have your folders or data sets:

data data_new;
	infile "&in_path\data_*.csv" delimiter = ',' 
          dsd missover lrecl = 32767 firstobs = 2
          eov=skip; 
	informat 
	var1 $10.
	var2 10.
   ;
   input @;
   if skip then skip=0;
   else do;
      input var1 var2;
output; end; run;

The option EOV sets a variable, skip in this case to the value of 1 when the first first line of a new file is read. The Input @; starts to read the line of data but the @ holds the pointer on that line. Then we test the Skip variable. If the value is 1 then 'if skip' is true and we just reset the value otherwise we know we are not on a header row and the input reads the data line. The explicit output is to only output data when not on a header row.

View solution in original post

4 REPLIES 4
mklangley
Lapis Lazuli | Level 10

Are you missing a semicolon after your informat statement?

data data_new;
	infile "/sasnas/ovations/fic/phi2/mmr/prod/PRS_CGD_Driver_file.xlsx" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2; 
	informat 
	var1 $10.
	var2 10.;
	input var1 var2;
run; 

 

xyxu
Quartz | Level 8
Oh sorry about that. It was missed when I was simplifying that part. Added back.
ballardw
Super User

If you want to read all of the files whose names start with data_ in that folder then NO explicit loop is needed;

You might try this code to see what happens:

data data_new;
	infile "&in_path\data_*.csv" delimiter = ',' 
          dsd missover lrecl = 32767 firstobs = 2; 
	informat 
	var1 $10.
	var2 10.
   ;
	input var1 var2;
run;

The * wildcard in the file name will attempt to read every file in the folder that starts with data_  .

If you run that you will get an "invalid data for var2" for each file when the header row is encountered.

 

That header row problem can be addressed with some additional options and conditional statements.

 

Obviously untested because I don't have your folders or data sets:

data data_new;
	infile "&in_path\data_*.csv" delimiter = ',' 
          dsd missover lrecl = 32767 firstobs = 2
          eov=skip; 
	informat 
	var1 $10.
	var2 10.
   ;
   input @;
   if skip then skip=0;
   else do;
      input var1 var2;
output; end; run;

The option EOV sets a variable, skip in this case to the value of 1 when the first first line of a new file is read. The Input @; starts to read the line of data but the @ holds the pointer on that line. Then we test the Skip variable. If the value is 1 then 'if skip' is true and we just reset the value otherwise we know we are not on a header row and the input reads the data line. The explicit output is to only output data when not on a header row.

xyxu
Quartz | Level 8
This approach works amazingly well. For the header issue, I can easily delete observations that are written from headers. Thank you!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 709 views
  • 1 like
  • 3 in conversation