BookmarkSubscribeRSS Feed
jkurka
Fluorite | Level 6

Hello,

I have a macro that processes all the files found in a directory folder. However, I would like it to process the files by the ID variable, so that the last ID processed doesn't overwrite the previously run data. I've pasted my code below.

Note: dirlist is the dataset with names of the files and the ID.

%macro GAscoring;
%do i=1 %to &num_files ;
   DATA _null_;
    set dirlist; if _n_=&i;
    call symput('filein',csvname);
    call symput('id',id);
   RUN;
   DATA DATA_IMPORT;
more data and sql processing............
   PROC SQL;
    CREATE TABLE SLEEPWINDOW AS
    SELECT    t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp,
            t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h
    FROM WORK.DATA_RAW1 t1
    NATURAL JOIN surveydata t2
    WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime
    ORDER by id, datetimestamp ;
   QUIT;
   PROC APPEND data= sleepwindow out= SLEEP_&id ;
   RUN ;

%end ;*of do num_files;
%mend GAscoring(&id); %GAscoring;

 

Can someone help with what I might be missing?

Thanks!

8 REPLIES 8
SASKiwi
PROC Star
   %macro GAscoring (num_files)
%do i = 1 %to &num_files;
DATA _null_; set dirlist (firstobs = &i obs = &i); call symput('filein',csvname); call symput('id',id); RUN;
.....
%mend GAscoring;
%GAscoring(10);

This is a basic way of stepping through each row in your table assuming &i does not exceed the number of table rows. Not the most elegant of solutions but it should work. I've put &num_files in as a macro parameter as it doesn't appear to be defined anywhere.

jkurka
Fluorite | Level 6

I have it defined previously in my code (not posted). However, this will just do 10 files correct? I'd like the macro to run through every file but append only the matching IDs.

filename dirlist pipe "dir /B &dir\*.csv";
DATA dirlist ;
	length csvname $30;
	infile dirlist length=reclen;
	input csvname $varying30. reclen;
	call symput ('num_files',_n_);
	id= scan(csvname,1,'_');
RUN ;

**DO THIS MACRO FOR EACH ID**;
**note: there are multiple files per ID ;
%macro GAscoring;
%do i=1 %to &num_files ;
DATA _null_;
	set dirlist; *if _n_=&i;
	call symput('filein',csvname);
	call symput('id',id);
RUN;
DATA DATA_IMPORT;
    LENGTH
        datetimestamp    $ 23
        x                  8
        y                  8
        z                  8
        lux                8
        event              8
        temp               8	;
    LABEL
		datetimestamp    = "F1"
        x                = "F2"
        y                = "F3"
        z                = "F4"
        lux              = "F5"
        event            = "F6"
        temp             = "F7" ;
    FORMAT
        datetimestamp    $CHAR23.
        x                BEST7.
        y                BEST7.
        z                BEST7.
        lux              BEST3.
        event            BEST3.
        temp             BEST4. ;
    INFORMAT
        datetimestamp    $CHAR23.
        x                BEST7.
        y                BEST7.
        z                BEST7.
        lux              BEST3.
        event            BEST3.
        temp             BEST4. ;
	INFILE "&dir\&filein" LRECL=32760 ENCODING="WLATIN1" TERMSTR=crlf DLM=',' MISSOVER firstobs=101 DSD ;
    INPUT
		datetimestamp    : $23.
        x                : ?? BEST7.
        y                : ?? BEST7.
        z                : ?? BEST7.
        lux              : ?? BEST3.
        event            : ?? BEST3.
        temp             : ?? BEST4. ;
RUN;
DATA DATA_RAW1; set DATA_IMPORT;
	id= "&id";
	date = 		input(scan(datetimestamp,1,' '), yymmdd10.);	format date mmddyys10. ;
	timestamp = (input(substr(datetimestamp,12,8),time8. )
				+ (input(substr(datetimestamp,21,3),3.))/1000); format timestamp time12.3 ;
	newdatetimestamp = dhms(date,0,0,timestamp) ;				format newdatetimestamp datetime21.3 ;
	timestamp = floor(timestamp);
	drop datetimestamp ;
	rename newdatetimestamp = datetimestamp ;
	drop date timestamp ;
run;
DATA surveydata ;
	set sleeplog.surveydata ;
	beddatetime = lag(beddatetime) ; format beddatetime datetime21.9 ;
	output surveydata ;
RUN ;
PROC SQL;
	CREATE TABLE SLEEPWINDOW AS 
	SELECT	t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp, 
			t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h
	FROM WORK.DATA_RAW1 t1
	NATURAL JOIN surveydata t2
	WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime 
	ORDER by id, datetimestamp ;
QUIT;
 
PROC APPEND data= sleepwindow out= SLEEP_&id ;
RUN ;

%end ;*of do num_files;
%mend GAscoring(&id); %GAscoring;
jkurka
Fluorite | Level 6

This is still not resulting in different datasets for each ID once the macro is complete. The only data set resulting is the last one run. How can we code it to do the macro by each ID in the dirlist. There are multiple files per ID also.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Not sure you need all that code.  A datastep is a loop itself, so you have a list of files and you want to run some code on those files, a simple change to the code will suffice:

data _null_;
  set dirlist;
  call execute(cats('%GAscoring (file="',filein,'");'));
run;

Change the GAscoring macro to accept the filename as a parameter and the datastep above will generate a call to that macro for every obervation in the dirlist dataset.  Far simpler than all that do looping and symputting etc.

jkurka
Fluorite | Level 6

But how does the macro know to append only the files with the same ID? Doesn't there need to be some kind of class=ID designation?

jkurka
Fluorite | Level 6

Here is my code. As shown, I'm trying to process the macro only by ID, so that the last defined ID processed doesn't overwrite the previous.

filename dirlist pipe "dir /B &dir\*.csv";
DATA dirlist ;
	length csvname $30;
	infile dirlist length=reclen;
	input csvname $varying30. reclen;
	call symput ('num_files',_n_);
	id= scan(csvname,1,'_');
RUN ;

**DO THIS MACRO FOR EACH ID**;
**note: there are multiple files per ID ;
%macro GAscoring;
%do i=1 %to &num_files ;
DATA _null_;
	set dirlist; *if _n_=&i;
	call symput('filein',csvname);
	call symput('id',id);
RUN;
DATA DATA_IMPORT;
    LENGTH
        datetimestamp    $ 23
        x                  8
        y                  8
        z                  8
        lux                8
        event              8
        temp               8	;
    LABEL
		datetimestamp    = "F1"
        x                = "F2"
        y                = "F3"
        z                = "F4"
        lux              = "F5"
        event            = "F6"
        temp             = "F7" ;
    FORMAT
        datetimestamp    $CHAR23.
        x                BEST7.
        y                BEST7.
        z                BEST7.
        lux              BEST3.
        event            BEST3.
        temp             BEST4. ;
    INFORMAT
        datetimestamp    $CHAR23.
        x                BEST7.
        y                BEST7.
        z                BEST7.
        lux              BEST3.
        event            BEST3.
        temp             BEST4. ;
	INFILE "&dir\&filein" LRECL=32760 ENCODING="WLATIN1" TERMSTR=crlf DLM=',' MISSOVER firstobs=101 DSD ;
    INPUT
		datetimestamp    : $23.
        x                : ?? BEST7.
        y                : ?? BEST7.
        z                : ?? BEST7.
        lux              : ?? BEST3.
        event            : ?? BEST3.
        temp             : ?? BEST4. ;
RUN;
DATA DATA_RAW1; set DATA_IMPORT;
	id= "&id";
	date = 		input(scan(datetimestamp,1,' '), yymmdd10.);	format date mmddyys10. ;
	timestamp = (input(substr(datetimestamp,12,8),time8. )
				+ (input(substr(datetimestamp,21,3),3.))/1000); format timestamp time12.3 ;
	newdatetimestamp = dhms(date,0,0,timestamp) ;				format newdatetimestamp datetime21.3 ;
	timestamp = floor(timestamp);
	drop datetimestamp ;
	rename newdatetimestamp = datetimestamp ;
	drop date timestamp ;
run;
DATA surveydata ;
	set sleeplog.surveydata ;
	beddatetime = lag(beddatetime) ; format beddatetime datetime21.9 ;
	output surveydata ;
RUN ;
PROC SQL;
	CREATE TABLE SLEEPWINDOW AS 
	SELECT	t1.x, t1.y, t1.z, t1.lux, t1.temp, t1.id, t1.datetimestamp, 
			t2.day, t2.beddatetime, t2.wakedatetime, t2.sleep_h
	FROM WORK.DATA_RAW1 t1
	NATURAL JOIN surveydata t2
	WHERE t1.datetimestamp BETWEEN t2.wakedatetime AND t2.beddatetime 
	ORDER by id, datetimestamp ;
QUIT;
 
PROC APPEND data= sleepwindow out= SLEEP_&id ;
RUN ;
**APPEND HAS TO ONLY APPEND sleepwindow FOR SAME IDs;

%end ;*of do num_files;
%mend GAscoring(&id); %GAscoring;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,


Well, there are quite a few things wrong with that code I am afraid.  Some obvious points:
- This is: SLEEP_&id - is not shown in your code, no dataset sleep_x - where x is the &i.?

- This is invalid: %mend GAscoring(&id)

 

Now what I was suggesting in my code before was that rather than loop in the macro, you use the orginal dataset to generate calls out to the macro:

%macro GAscoring (file=);
  data data_import;
    length ...l
    infile "&dir.\&file." lrecl=32760 encoding="wlatin1" termstr=crlf dlm=',' missover firstobs=101 dsd;
    ...
  run;
  ...
  proc append data=sleepwindow out=sleep_&id.;
  run;
%mend GAscoring;
data _null_;
  set dirlist;
  call execute(cats('%GAscoring (file="',filein,'");'));
run;

This would simplfy your macro code.  However, it maybe simpler than that.  Can you provide a couple of example test data files?  In the below, I simulate your dir list.  From that I create a call execute statement which creates a file reference to all the rows in that dataset which conform to the rules given in the where clause.  I then have a datastep that reads all those datafiles from the fileref.

data dirlist;
  length csvname $20;
  input csvname $;
datalines;
abcdef.csv
rtyuio.csv
werty.csv
;
run;

data _null_;
  length fn $2000;
  set dirlist (where=(substr(csvname,1,3) in ("abc","wer"))) end=last;
  retain fn;
  fn=catx(" ",fn,quote(strip(csvname)));
  if last then call execute('filename mydata ('||strip(fn)||');');
run;

data total_data;
  length datetimestamp...;
  label datetimestamp...;
  format ...;
  infile mydata lrecl=32760 encoding="wlatin1" termstr=crlf dlm=',' missover firstobs=101 dsd;
  input ...;
run;

You will of course have to modify it to your files as I don't have that, this is just to demonstrate how to do this kind of process in stadard datastep.

jkurka
Fluorite | Level 6

I tried this code and it just runs endlessly with errors for 'Apparent symbolic reference FILEIN not resolved.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1702 views
  • 1 like
  • 3 in conversation