BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tvandaff
Calcite | Level 5

I've used the relatively popular macro for importing many CSV files into SAS files. I then used a code to alter certain facets of the files. Each of these Datasets then had three columns with NameVar (name of food), Nutrient, and Amount. I wanted to turn this into a row so I could easily merge all of these files. The merge worked on some files and not others. The log is also giving an error that the file naming convention is not correct. 

 

Then I use the PROC APPEND to add all the files together, which works for some and not others, I believe this is because some of the files did not successfully transpose. 

 

Can you help me correct these errors? I've been working on this for several days and cannot figure out where I went wrong: 

/*Create Macro to Create a List of all the CSV files in the Folder*/

%global caps;
%let caps=/folders/myshortcuts/myfolder/Capstone; 
libname capstone "&caps";


%macro list_files(dir, ext);
	%local filrf rc did memcnt name i;
	%let rc=%sysfunc(filename(filrf, &dir));
	%let did=%sysfunc(dopen(&filrf));

	%if &did eq 0 %then
		%do;
			%put Directory &dir cannot be open or does not exist;
			%return;
		%end;

	%do i=1 %to %sysfunc(dnum(&did));
		%let name=%qsysfunc(dread(&did, &i));

		%if %qupcase(%qscan(&name, -1, .))=%upcase(&ext) %then
			%do;
				%put &dir\&name;
				%let file_name =  %qscan(&name, 1, .);
				%put &file_name;

				data _tmp;
					length dir $512 name $100;
					dir=symget("dir");
					name=symget("name");
					path=catx('\', dir, name);
					the_name=substr(name, 1, find(name, '.')-1);
				run;

				proc append base=list data=_tmp force;
				run;

				quit;

				proc sql;
					drop table _tmp;
				quit;

			%end;
		%else %if %qscan(&name, 2, .)=%then
			%do;
				%list_files(&dir\&name, &ext) %end;
			%end;
		%let rc=%sysfunc(dclose(&did));
		%let rc=%sysfunc(filename(filrf));
		
%mend list_files;

/*Finish Macro and create code to import datasets, and also remove extraneous obs*/

%macro import_file(path, file_name, dataset_name);
		data &dataset_name REPLACE;
			INFILE "&path./&file_name." firstobs=5 dsd truncover end=last;
			LENGTH Nutrient Amount $20 Namevar $50;
			Namevar = "&file_name";
			Namevar = substr(Namevar,1,(length(Namevar)-4));
			INPUT Nutrient Amount;

			if not last then
				output;
		run;
		
		proc transpose data=&dataset_name out=&dataset_name;
			VAR Amount;
			ID Nutrient;
			BY Namevar;
		run;
		
		proc append base = FOOD01 FORCE 
			new = &dataset_name;
		run; 

run;

%mend;

%list_files(/folders/myshortcuts/myfolder/food_data, csv);

/*This code will iterate through the files and import each of them with a new title*/

	data _null_;
		set list;
		string=catt('%import_file(', dir, ', ', name, ', ', catt('food', put(_n_, 
			z2.)), ');');
		call execute (string);
	run;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why not just read them all at once?

Is your LIST_FILES macro searching sub directories?

If not you can just use a wildcard in the INFILE statement.  Although it might be hard to exclude that last line of the source.

But if you make a dataset with the list of files to read you can use that to control the process.  So if the dataset is named FILELIST and the filename is in a variable named FILENAME then your code looks like:

data TALL ;              
  set filelist ;
  length Nutrient Amount $20 Namevar $50;
  namevar=scan(filename,-1,'.');
  namevar=substrn(scan(filename,-1,'/\'),1,length(scan(filename,-1,'/\'))-length(namevar)+1);
  infile raw filevar=filename firstobs=5 dsd truncover end=last;
  do while (not last);
    input nutrient amount;
    if not last then output;
  end;
run;

proc transpose data=TALL out=WIDE ;
  by namevar;
  id Nutrient;
  var Amount;
run;

Here is a link to a simple data step that can get all of the file names in a directory tree.

 https://communities.sas.com/t5/SAS-Programming/Size-of-a-directory/m-p/642202#M191554

 

View solution in original post

9 REPLIES 9
ballardw
Super User

Show some of the LOG with the errors you encountered. And since you are using macros you should set Options mprint; before executing the  macros to get the generated code with the errors. Copy from the log and paste into a code box opened on the forum with </> icon to preserve the formatting of the log and some of the common diagnostic characters that SAS places in the log to help debug things.

 

I would strongly suggest NOT transposing data to have multiple "nutrients" on a row unless you have a regression model that will use all of them as independent variables. And is your "amount" supposed to be numeric? I would sort of expect an amount to be numeric.

 

Tom
Super User Tom
Super User

Why not just read them all at once?

Is your LIST_FILES macro searching sub directories?

If not you can just use a wildcard in the INFILE statement.  Although it might be hard to exclude that last line of the source.

But if you make a dataset with the list of files to read you can use that to control the process.  So if the dataset is named FILELIST and the filename is in a variable named FILENAME then your code looks like:

data TALL ;              
  set filelist ;
  length Nutrient Amount $20 Namevar $50;
  namevar=scan(filename,-1,'.');
  namevar=substrn(scan(filename,-1,'/\'),1,length(scan(filename,-1,'/\'))-length(namevar)+1);
  infile raw filevar=filename firstobs=5 dsd truncover end=last;
  do while (not last);
    input nutrient amount;
    if not last then output;
  end;
run;

proc transpose data=TALL out=WIDE ;
  by namevar;
  id Nutrient;
  var Amount;
run;

Here is a link to a simple data step that can get all of the file names in a directory tree.

 https://communities.sas.com/t5/SAS-Programming/Size-of-a-directory/m-p/642202#M191554

 

tvandaff
Calcite | Level 5
Hey Tom! Thanks for responding. I assumed that, because I needed to make all those edits, I needed to import them all individually. I wanted to PROC TRANSPOSE each file prior to merging them all, since I thought it would facilitate an easier MERGE.

But you think using an INFILE wildcard makes more sense. Is that only because there are no subdirectories?
Tom
Super User Tom
Super User

SAS only allows a single wildcard in the INFILE statement.

infile "/mydir/*.txt" dsd ... ;

So you couldn't do something like:

infile "/mydir/*/*.txt" dsd ... ;

If you do want to read multiple files using a wildcard you can use FILENAME= option on the INFILE statement to set the name of the variable to get the name of the individual files.  You could then use FILENAME=LAG(FILENAME) to check when a new file starts and read in and ignore the first 4 lines.  But it might be hard to add that step of skipping the last line in the file.

 

You only need to use code generation (via macro or other tools) only when you cannot solve the problem using normal SAS code.

Tom
Super User Tom
Super User

Here is code to get list of files without the detailed FILEINFO that the linked program included.

data filelist;
  length level dir 8 filename $256;
  retain level 0 dir 1;
  filename = "~/temp/";
run;

data filelist;
  modify filelist;
  length fileref $8 ;
  retain sep "%sysfunc(ifc(&sysscp=WIN,\,/))";
  rc1=filename(fileref,filename);
  did=dopen(fileref);
  dir = did>0;
  replace;
  if dir then do;
    level=level+1;
    dname=filename ;
    if sep ne char(dname,length(dname)) then dname=cats(dname,sep);
    do fileno=1 to dnum(did);
      filename=cats(dname,dread(did,fileno));
      output;
    end;
    rc2=dclose(did);
  end;
run;
tvandaff
Calcite | Level 5

Thanks, Tom, this is great! Do you also have any idea why the PROC TRANSPOSE function I did worked for some of the files and not for others? 

Tom
Super User Tom
Super User

You should show the actual errors from PROC TRANSPOSE.  The errors you showed before looked like mistakes in the macro that caused it to generate invalid SAS code. They did not look like errors from PROC TRANSPOSE itself.

For such a simple transpose the only expected error would be the presence of duplicate ID values.  You need to fix the data first. Either keep only one observation per ID value or add an additional BY or ID variable so that the generated variable name is unique within each BY group.

tvandaff
Calcite | Level 5
 proc append base = FOOD01 FORCE     new = food**;   run;
 
 __                                                                                                                                  
 
 22
 
 200
 
 ERROR 22-322: Syntax error, expecting one of the following: ;, (, APPENDVER, APPENDVERSION, BASE, CREATE, DATA, FORCE, GETSORT, 
               NEW, NOWARN, OUT.  
 
 ERROR 200-322: The symbol is not recognized and will be ignored.
 
 NOTE: Statements not processed because of errors noted above.
 NOTE: PROCEDURE APPEND used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 NOTE: The SAS System stopped processing this step because of errors.
 
 
 NOTE: Line generated by the CALL EXECUTE routine.
 255       + data food** REPLACE;    INFILE "/folders/myshortcuts/myfolder/food_data/ice_creams_rich_chocolate.csv" firstobs=5 dsd
                      __
                      22
                      200
 255      !+truncover end=last;    LENGTH Nutrient Amount $20 Namevar $50;    Namevar = "ice_creams_rich_chocolate.csv";    Namevar =
 255      !+ substr
 ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_.  
 
 ERROR 200-322: The symbol is not recognized and will be ignored.
 
 256       +(Namevar,1,(length(Namevar)-4));    INPUT Nutrient Amount;     if not last then     output;   run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.FOOD may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
 WARNING: Data set WORK.FOOD was not replaced because this step was stopped.
 WARNING: The data set WORK.REPLACE may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
 WARNING: Data set WORK.REPLACE was not replaced because this step was stopped.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 
 NOTE: Line generated by the CALL EXECUTE routine.

Here is some of the Error Log

And then there are a ton of the expected warnings from using the Force function

tvandaff
Calcite | Level 5
Thank you BallardW, that Amount should've been numeric

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 931 views
  • 0 likes
  • 3 in conversation