Hello,
I have a macro that processes all the files found in a directory folder. However, I would like it to process the files by the ID variable, so that the last ID processed doesn't overwrite the previously run data. I've pasted my code below.
Note: dirlist is the dataset with names of the files and the ID.
%macro GAscoring; %do i=1 %to &num_files ; DATA _null_; set dirlist; if _n_=&i; call symput('filein',csvname); call symput('id',id); RUN; DATA DATA_IMPORT; more data and sql processing............ PROC SQL; CREATE TABLE SLEEPWINDOW AS SELECT t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp, t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h FROM WORK.DATA_RAW1 t1 NATURAL JOIN surveydata t2 WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime ORDER by id, datetimestamp ; QUIT; PROC APPEND data= sleepwindow out= SLEEP_&id ; RUN ; %end ;*of do num_files; %mend GAscoring(&id); %GAscoring;
Can someone help with what I might be missing?
Thanks!
%macro GAscoring (num_files)
%do i = 1 %to &num_files;
DATA _null_; set dirlist (firstobs = &i obs = &i); call symput('filein',csvname); call symput('id',id); RUN;
.....
%mend GAscoring;
%GAscoring(10);
This is a basic way of stepping through each row in your table assuming &i does not exceed the number of table rows. Not the most elegant of solutions but it should work. I've put &num_files in as a macro parameter as it doesn't appear to be defined anywhere.
I have it defined previously in my code (not posted). However, this will just do 10 files correct? I'd like the macro to run through every file but append only the matching IDs.
filename dirlist pipe "dir /B &dir\*.csv"; DATA dirlist ; length csvname $30; infile dirlist length=reclen; input csvname $varying30. reclen; call symput ('num_files',_n_); id= scan(csvname,1,'_'); RUN ; **DO THIS MACRO FOR EACH ID**; **note: there are multiple files per ID ; %macro GAscoring; %do i=1 %to &num_files ; DATA _null_; set dirlist; *if _n_=&i; call symput('filein',csvname); call symput('id',id); RUN; DATA DATA_IMPORT; LENGTH datetimestamp $ 23 x 8 y 8 z 8 lux 8 event 8 temp 8 ; LABEL datetimestamp = "F1" x = "F2" y = "F3" z = "F4" lux = "F5" event = "F6" temp = "F7" ; FORMAT datetimestamp $CHAR23. x BEST7. y BEST7. z BEST7. lux BEST3. event BEST3. temp BEST4. ; INFORMAT datetimestamp $CHAR23. x BEST7. y BEST7. z BEST7. lux BEST3. event BEST3. temp BEST4. ; INFILE "&dir\&filein" LRECL=32760 ENCODING="WLATIN1" TERMSTR=crlf DLM=',' MISSOVER firstobs=101 DSD ; INPUT datetimestamp : $23. x : ?? BEST7. y : ?? BEST7. z : ?? BEST7. lux : ?? BEST3. event : ?? BEST3. temp : ?? BEST4. ; RUN; DATA DATA_RAW1; set DATA_IMPORT; id= "&id"; date = input(scan(datetimestamp,1,' '), yymmdd10.); format date mmddyys10. ; timestamp = (input(substr(datetimestamp,12,8),time8. ) + (input(substr(datetimestamp,21,3),3.))/1000); format timestamp time12.3 ; newdatetimestamp = dhms(date,0,0,timestamp) ; format newdatetimestamp datetime21.3 ; timestamp = floor(timestamp); drop datetimestamp ; rename newdatetimestamp = datetimestamp ; drop date timestamp ; run; DATA surveydata ; set sleeplog.surveydata ; beddatetime = lag(beddatetime) ; format beddatetime datetime21.9 ; output surveydata ; RUN ; PROC SQL; CREATE TABLE SLEEPWINDOW AS SELECT t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp, t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h FROM WORK.DATA_RAW1 t1 NATURAL JOIN surveydata t2 WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime ORDER by id, datetimestamp ; QUIT; PROC APPEND data= sleepwindow out= SLEEP_&id ; RUN ; %end ;*of do num_files; %mend GAscoring(&id); %GAscoring;
This is still not resulting in different datasets for each ID once the macro is complete. The only data set resulting is the last one run. How can we code it to do the macro by each ID in the dirlist. There are multiple files per ID also.
Hi,
Not sure you need all that code. A datastep is a loop itself, so you have a list of files and you want to run some code on those files, a simple change to the code will suffice:
data _null_; set dirlist; call execute(cats('%GAscoring (file="',filein,'");')); run;
Change the GAscoring macro to accept the filename as a parameter and the datastep above will generate a call to that macro for every obervation in the dirlist dataset. Far simpler than all that do looping and symputting etc.
But how does the macro know to append only the files with the same ID? Doesn't there need to be some kind of class=ID designation?
Here is my code. As shown, I'm trying to process the macro only by ID, so that the last defined ID processed doesn't overwrite the previous.
filename dirlist pipe "dir /B &dir\*.csv"; DATA dirlist ; length csvname $30; infile dirlist length=reclen; input csvname $varying30. reclen; call symput ('num_files',_n_); id= scan(csvname,1,'_'); RUN ; **DO THIS MACRO FOR EACH ID**; **note: there are multiple files per ID ; %macro GAscoring; %do i=1 %to &num_files ; DATA _null_; set dirlist; *if _n_=&i; call symput('filein',csvname); call symput('id',id); RUN; DATA DATA_IMPORT; LENGTH datetimestamp $ 23 x 8 y 8 z 8 lux 8 event 8 temp 8 ; LABEL datetimestamp = "F1" x = "F2" y = "F3" z = "F4" lux = "F5" event = "F6" temp = "F7" ; FORMAT datetimestamp $CHAR23. x BEST7. y BEST7. z BEST7. lux BEST3. event BEST3. temp BEST4. ; INFORMAT datetimestamp $CHAR23. x BEST7. y BEST7. z BEST7. lux BEST3. event BEST3. temp BEST4. ; INFILE "&dir\&filein" LRECL=32760 ENCODING="WLATIN1" TERMSTR=crlf DLM=',' MISSOVER firstobs=101 DSD ; INPUT datetimestamp : $23. x : ?? BEST7. y : ?? BEST7. z : ?? BEST7. lux : ?? BEST3. event : ?? BEST3. temp : ?? BEST4. ; RUN; DATA DATA_RAW1; set DATA_IMPORT; id= "&id"; date = input(scan(datetimestamp,1,' '), yymmdd10.); format date mmddyys10. ; timestamp = (input(substr(datetimestamp,12,8),time8. ) + (input(substr(datetimestamp,21,3),3.))/1000); format timestamp time12.3 ; newdatetimestamp = dhms(date,0,0,timestamp) ; format newdatetimestamp datetime21.3 ; timestamp = floor(timestamp); drop datetimestamp ; rename newdatetimestamp = datetimestamp ; drop date timestamp ; run; DATA surveydata ; set sleeplog.surveydata ; beddatetime = lag(beddatetime) ; format beddatetime datetime21.9 ; output surveydata ; RUN ; PROC SQL; CREATE TABLE SLEEPWINDOW AS SELECT t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp, t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h FROM WORK.DATA_RAW1 t1 NATURAL JOIN surveydata t2 WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime ORDER by id, datetimestamp ; QUIT; PROC APPEND data= sleepwindow out= SLEEP_&id ; RUN ; **APPEND HAS TO ONLY APPEND sleepwindow FOR SAME IDs; %end ;*of do num_files; %mend GAscoring(&id); %GAscoring;
Hi,
Well, there are quite a few things wrong with that code I am afraid. Some obvious points:
- This is: SLEEP_&id - is not shown in your code, no dataset sleep_x - where x is the &i.?
- This is invalid: %mend GAscoring(&id)
Now what I was suggesting in my code before was that rather than loop in the macro, you use the orginal dataset to generate calls out to the macro:
%macro GAscoring (file=); data data_import; length ...l infile "&dir.\&file." lrecl=32760 encoding="wlatin1" termstr=crlf dlm=',' missover firstobs=101 dsd; ... run; ... proc append data=sleepwindow out=sleep_&id.; run; %mend GAscoring; data _null_; set dirlist; call execute(cats('%GAscoring (file="',filein,'");')); run;
This would simplfy your macro code. However, it maybe simpler than that. Can you provide a couple of example test data files? In the below, I simulate your dir list. From that I create a call execute statement which creates a file reference to all the rows in that dataset which conform to the rules given in the where clause. I then have a datastep that reads all those datafiles from the fileref.
data dirlist; length csvname $20; input csvname $; datalines; abcdef.csv rtyuio.csv werty.csv ; run; data _null_; length fn $2000; set dirlist (where=(substr(csvname,1,3) in ("abc","wer"))) end=last; retain fn; fn=catx(" ",fn,quote(strip(csvname))); if last then call execute('filename mydata ('||strip(fn)||');'); run; data total_data; length datetimestamp...; label datetimestamp...; format ...; infile mydata lrecl=32760 encoding="wlatin1" termstr=crlf dlm=',' missover firstobs=101 dsd; input ...; run;
You will of course have to modify it to your files as I don't have that, this is just to demonstrate how to do this kind of process in stadard datastep.
I tried this code and it just runs endlessly with errors for 'Apparent symbolic reference FILEIN not resolved.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.