I've used the relatively popular macro for importing many CSV files into SAS files. I then used a code to alter certain facets of the files. Each of these Datasets then had three columns with NameVar (name of food), Nutrient, and Amount. I wanted to turn this into a row so I could easily merge all of these files. The merge worked on some files and not others. The log is also giving an error that the file naming convention is not correct.
Then I use the PROC APPEND to add all the files together, which works for some and not others, I believe this is because some of the files did not successfully transpose.
Can you help me correct these errors? I've been working on this for several days and cannot figure out where I went wrong:
/*Create Macro to Create a List of all the CSV files in the Folder*/
%global caps;
%let caps=/folders/myshortcuts/myfolder/Capstone;
libname capstone "&caps";
%macro list_files(dir, ext);
%local filrf rc did memcnt name i;
%let rc=%sysfunc(filename(filrf, &dir));
%let did=%sysfunc(dopen(&filrf));
%if &did eq 0 %then
%do;
%put Directory &dir cannot be open or does not exist;
%return;
%end;
%do i=1 %to %sysfunc(dnum(&did));
%let name=%qsysfunc(dread(&did, &i));
%if %qupcase(%qscan(&name, -1, .))=%upcase(&ext) %then
%do;
%put &dir\&name;
%let file_name = %qscan(&name, 1, .);
%put &file_name;
data _tmp;
length dir $512 name $100;
dir=symget("dir");
name=symget("name");
path=catx('\', dir, name);
the_name=substr(name, 1, find(name, '.')-1);
run;
proc append base=list data=_tmp force;
run;
quit;
proc sql;
drop table _tmp;
quit;
%end;
%else %if %qscan(&name, 2, .)=%then
%do;
%list_files(&dir\&name, &ext) %end;
%end;
%let rc=%sysfunc(dclose(&did));
%let rc=%sysfunc(filename(filrf));
%mend list_files;
/*Finish Macro and create code to import datasets, and also remove extraneous obs*/
%macro import_file(path, file_name, dataset_name);
data &dataset_name REPLACE;
INFILE "&path./&file_name." firstobs=5 dsd truncover end=last;
LENGTH Nutrient Amount $20 Namevar $50;
Namevar = "&file_name";
Namevar = substr(Namevar,1,(length(Namevar)-4));
INPUT Nutrient Amount;
if not last then
output;
run;
proc transpose data=&dataset_name out=&dataset_name;
VAR Amount;
ID Nutrient;
BY Namevar;
run;
proc append base = FOOD01 FORCE
new = &dataset_name;
run;
run;
%mend;
%list_files(/folders/myshortcuts/myfolder/food_data, csv);
/*This code will iterate through the files and import each of them with a new title*/
data _null_;
set list;
string=catt('%import_file(', dir, ', ', name, ', ', catt('food', put(_n_,
z2.)), ');');
call execute (string);
run;
Why not just read them all at once?
Is your LIST_FILES macro searching sub directories?
If not you can just use a wildcard in the INFILE statement. Although it might be hard to exclude that last line of the source.
But if you make a dataset with the list of files to read you can use that to control the process. So if the dataset is named FILELIST and the filename is in a variable named FILENAME then your code looks like:
data TALL ;
set filelist ;
length Nutrient Amount $20 Namevar $50;
namevar=scan(filename,-1,'.');
namevar=substrn(scan(filename,-1,'/\'),1,length(scan(filename,-1,'/\'))-length(namevar)+1);
infile raw filevar=filename firstobs=5 dsd truncover end=last;
do while (not last);
input nutrient amount;
if not last then output;
end;
run;
proc transpose data=TALL out=WIDE ;
by namevar;
id Nutrient;
var Amount;
run;
Here is a link to a simple data step that can get all of the file names in a directory tree.
https://communities.sas.com/t5/SAS-Programming/Size-of-a-directory/m-p/642202#M191554
Show some of the LOG with the errors you encountered. And since you are using macros you should set Options mprint; before executing the macros to get the generated code with the errors. Copy from the log and paste into a code box opened on the forum with </> icon to preserve the formatting of the log and some of the common diagnostic characters that SAS places in the log to help debug things.
I would strongly suggest NOT transposing data to have multiple "nutrients" on a row unless you have a regression model that will use all of them as independent variables. And is your "amount" supposed to be numeric? I would sort of expect an amount to be numeric.
Why not just read them all at once?
Is your LIST_FILES macro searching sub directories?
If not you can just use a wildcard in the INFILE statement. Although it might be hard to exclude that last line of the source.
But if you make a dataset with the list of files to read you can use that to control the process. So if the dataset is named FILELIST and the filename is in a variable named FILENAME then your code looks like:
data TALL ;
set filelist ;
length Nutrient Amount $20 Namevar $50;
namevar=scan(filename,-1,'.');
namevar=substrn(scan(filename,-1,'/\'),1,length(scan(filename,-1,'/\'))-length(namevar)+1);
infile raw filevar=filename firstobs=5 dsd truncover end=last;
do while (not last);
input nutrient amount;
if not last then output;
end;
run;
proc transpose data=TALL out=WIDE ;
by namevar;
id Nutrient;
var Amount;
run;
Here is a link to a simple data step that can get all of the file names in a directory tree.
https://communities.sas.com/t5/SAS-Programming/Size-of-a-directory/m-p/642202#M191554
SAS only allows a single wildcard in the INFILE statement.
infile "/mydir/*.txt" dsd ... ;
So you couldn't do something like:
infile "/mydir/*/*.txt" dsd ... ;
If you do want to read multiple files using a wildcard you can use FILENAME= option on the INFILE statement to set the name of the variable to get the name of the individual files. You could then use FILENAME=LAG(FILENAME) to check when a new file starts and read in and ignore the first 4 lines. But it might be hard to add that step of skipping the last line in the file.
You only need to use code generation (via macro or other tools) only when you cannot solve the problem using normal SAS code.
Here is code to get list of files without the detailed FILEINFO that the linked program included.
data filelist;
length level dir 8 filename $256;
retain level 0 dir 1;
filename = "~/temp/";
run;
data filelist;
modify filelist;
length fileref $8 ;
retain sep "%sysfunc(ifc(&sysscp=WIN,\,/))";
rc1=filename(fileref,filename);
did=dopen(fileref);
dir = did>0;
replace;
if dir then do;
level=level+1;
dname=filename ;
if sep ne char(dname,length(dname)) then dname=cats(dname,sep);
do fileno=1 to dnum(did);
filename=cats(dname,dread(did,fileno));
output;
end;
rc2=dclose(did);
end;
run;
Thanks, Tom, this is great! Do you also have any idea why the PROC TRANSPOSE function I did worked for some of the files and not for others?
You should show the actual errors from PROC TRANSPOSE. The errors you showed before looked like mistakes in the macro that caused it to generate invalid SAS code. They did not look like errors from PROC TRANSPOSE itself.
For such a simple transpose the only expected error would be the presence of duplicate ID values. You need to fix the data first. Either keep only one observation per ID value or add an additional BY or ID variable so that the generated variable name is unique within each BY group.
proc append base = FOOD01 FORCE new = food**; run; __ 22 200 ERROR 22-322: Syntax error, expecting one of the following: ;, (, APPENDVER, APPENDVERSION, BASE, CREATE, DATA, FORCE, GETSORT, NEW, NOWARN, OUT. ERROR 200-322: The symbol is not recognized and will be ignored. NOTE: Statements not processed because of errors noted above. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: The SAS System stopped processing this step because of errors. NOTE: Line generated by the CALL EXECUTE routine. 255 + data food** REPLACE; INFILE "/folders/myshortcuts/myfolder/food_data/ice_creams_rich_chocolate.csv" firstobs=5 dsd __ 22 200 255 !+truncover end=last; LENGTH Nutrient Amount $20 Namevar $50; Namevar = "ice_creams_rich_chocolate.csv"; Namevar = 255 !+ substr ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_. ERROR 200-322: The symbol is not recognized and will be ignored. 256 +(Namevar,1,(length(Namevar)-4)); INPUT Nutrient Amount; if not last then output; run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.FOOD may be incomplete. When this step was stopped there were 0 observations and 3 variables. WARNING: Data set WORK.FOOD was not replaced because this step was stopped. WARNING: The data set WORK.REPLACE may be incomplete. When this step was stopped there were 0 observations and 3 variables. WARNING: Data set WORK.REPLACE was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: Line generated by the CALL EXECUTE routine.
Here is some of the Error Log
And then there are a ton of the expected warnings from using the Force function
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.