Solved: Re: Resolving loop counter inside a DATA step without using MACRO.

sasanon1664 · Posted 09-23-2015 05:12 AM

Hello everyone,

I am currently using hash objects to join tables and I would like to loop through 5 tables (e.g. Table_2010, ..., Table_2015) inside my DATA step. To do that, I would like to concatenante the loop counter with a string, let's say "Table_".

My progress so far is the following :

DATA TABLE_2010;
   INPUT ID VAL;
   DATALINES;
1  63
2  22
3  71
;
RUN;
 
DATA TABLE_2011;
   INPUT ID VAL;
   DATALINES;
4  68
5  55
6  54
;
RUN;
 
DATA RES;
DO i = 2010 TO 2011;
	DO UNTIL (EOF) ;
		SET 'TABLE_'&i END = EOF;
		OTHER_VAL = CATS('TEST',i);
		OUTPUT;
	END;
END;
STOP;
RUN;

This code doesn't currently work (otherwise I wouldn't be posting here) and I can see at least 3 issues :

If I run this, the first error in the log indicates "Apparent symbolic reference i not resolved."
The loop looks like it stops after 2010, I suspect EOF is the cause of this and should be replaced by EOF_2010 and EOF_2011.
I'm not sure wheter STOP; instruction should be put before or after the END; of the outside loop.

Replacing "TABLE_&i" with CATS("TABLE_",i) doesn't work either ("Invalid option name "TABLE_" error).

I am kind of lost here and any help would be appreciated :).

Kurt_Bremser · Posted 09-23-2015 09:41 AM

Write this macro:

%MACRO one_table(year);
DO UNTIL (EOF_&year) ;
  SET TABLE_&year END = EOF_&year;
  CALL MISSING(NAME);
  rc = h.find ();
  OUTPUT;
END;
%MEND;

and call it for every year:

%one_table(2010)
%one_table(2011)

or use a macro %do loop within the macro to repeat the code.

This is what the macro engine is for.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

AhmedAl_Attar · Posted 09-23-2015 05:32 AM

Hi,

To mu knowledge, the SET statement is executed at compilation time, therefore you can not just change the table name on the fly!

Here is an alternative approach that could allow you to do that.

data _null_;
    do table='table1','table2','table3' ; /* Specify your list of tables */
       rc = open(table);
       if (not rc) then
       do;
           m=sysmsg();
             put m;
       end;
       else

do;

/* .... Some Other SAS Statements ....*/
rc=close(rc);

end;
end;
run;

Hope this helps,

Ahmed

Kurt_Bremser · Posted 09-23-2015 06:28 AM

Let´s dig a little into how the SAS interpreter works:

- program text is gathered

- if a macro reference (¨&¨or ¨%¨)is encountered, text is handed over to the macro processor

- if text for a data step is encountered, the data step is compiled and

- executed

So your &i is handed over to the macro processor LONG before the data step even starts to run, and would need for the macro variable i to already be defined.

Also, as alreay mentioned, theSET statement within a data step can not be done dynamically. The reason is that the table named in the SET has a major influence on the data structures present in the data step, which are determined at compile time.

To make your step dynamic, I suggest the following:

%macro dynamic_sets;

data res;

set

%do i = 2010 %to 2011;

table_&i (in=in&i)

%end;

;

%do i = 2010 %to 2011;

if in&i then other_val = ¨test&i¨;

%end;

run;

%mend;

%dynamic_sets;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 09-23-2015 06:58 AM

Could you post test data and required output. Its not entirely clear to me what your attempting to do here, why can you not use indsname for this?

data want;
length store $20;
set table_: indsname=dsname;
store=scan(dsname,2,".");
run;

Steelers_In_DC · Posted 09-23-2015 09:17 AM

The way you have it set up isn't logically correct. Here it is broken down into two steps. Hope this helps:

DATA TABLE_2010;
   INPUT ID VAL;
   DATALINES;
1 63
2 22
3 71
;

DATA TABLE_2011;
   INPUT ID VAL;
   DATALINES;
4 68
5 55
6 54
;

%macro first;
%do i = 2010 %to 2011;
data table_&i;
set table_&i;
other_val = "test_&i";
run;
%end;
%mend;
%first;

data final;
%macro second;
set

%do i = 2010 %to 2011;
table_&i
%end;
%mend;
%second;
;
run;

I would recommend writting this out long hand and seeing what you can do without macros first.

Mark

sasanon1664 · Posted 09-23-2015 09:33 AM

Thank you everyone for your answers, especially Kurt for the explanation on how things work.

I feel like I should provide a more complete code and here it is :

DATA SMALL;
   INPUT ID NAME $;
   DATALINES;
1 Red
2 Green
3 Blue
;
RUN;

DATA TABLE_2010;
   INPUT ID YEAR VAL;
   DATALINES;
1 2010 63
2 2010 22
3 2010 71

;
RUN;

DATA TABLE_2011;
   INPUT ID YEAR VAL;
   DATALINES;
1 2011 68
2 2011 55
3 2011 54
;
RUN;

DATA WANT (drop = rc);

	declare hash h (hashexp:16);
	rc = h.DefineKey ('ID');
	rc = h.DefineData ('NAME');
	rc = h.DefineDone ();

	do until (eof1);
		set SMALL end = eof1;
		rc = h.add ();
	end;

	DO UNTIL (EOF_2010) ;
		SET TABLE_2010 END = EOF_2010;
		CALL MISSING(NAME);
		rc = h.find ();
		OUTPUT;
	END;

	DO UNTIL (EOF_2011) ;
		SET TABLE_2011 END = EOF_2011;
		CALL MISSING(NAME);
		rc = h.find ();
		OUTPUT;
	END;

STOP;
RUN;

This part works as intended but my goal is to go through TABLE_2010 and TABLE_2011 without having to write the same code (DO UNTIL... END;) twice in this case. That's why some of the suggestions won't work, because their alter my code too much for what I want to do. I think I can't wrap everything in a macro because it will load the SMALL table in memory twice, which I am trying to avoid (there are actually more like 20 tables loaded and some of them are over 100 000 rows). The yearly tables are about 1M rows each.

I hope this makes things clearer for you.

Kurt_Bremser · Posted 09-23-2015 09:41 AM

Write this macro:

%MACRO one_table(year);
DO UNTIL (EOF_&year) ;
  SET TABLE_&year END = EOF_&year;
  CALL MISSING(NAME);
  rc = h.find ();
  OUTPUT;
END;
%MEND;

and call it for every year:

%one_table(2010)
%one_table(2011)

or use a macro %do loop within the macro to repeat the code.

This is what the macro engine is for.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 09-23-2015 09:58 AM

Nope, I am sorry, I still do not see the point of all that code. What you are outputting is a union of the second and third dataset merged with the first. This can be simply written as:

proc sql;
create table WANT2 as
select A.*,
B.NAME
from (select * from TABLE_2010 union all select * from TABLE_2011) A
left join SMALL B
on A.ID=B.ID
order by YEAR,ID;
quit;

Please also note that writing everything in uppercase and not finishing datasteps makes your code harder to read.

sasanon1664 · Posted 09-23-2015 10:52 AM

Thank you Kurt, this solution seems to be working on my example, I will try to make it work in my code :). One question though : is it wrong to write the entire macro inside the data step or should I only call it?

As for RW9's suggestion, my point is to avoid using SQL joins because they take way too much of time compared to joining using hash objects. Why should I do something in one hour when it can be done in two minutes (those are real numbers)? Moreover, thank you for pointing out that I forgot to finish my datasteps, I will correct them. As for writing in uppercase, I would gladly take any directions on what to write in upper/lowercase because I have no knowledge on that matter.

RW9 · Posted 09-23-2015 11:54 AM

Just use datastep version then:

data inter;

set table:;

run;

proc sort data=inter;

by id;

run;

data want;

merge inter small;

by id;

run;

As for GPP, its something you will see on a lot of code here. Avoid using upper case (except SQL) as its harder to read. Finishing datasteps, consistent indents using spaces rather than tabs (although the forum here is bad for code formatting). Just helps reading code 🐵

sasanon1664 · Posted 09-24-2015 06:13 AM

Thank you for the tips RW9, they are duly noted.

I also advise you to read the followings articles which compare hash tables vs SQL joins vs merge : http://support.sas.com/resources/papers/proceedings09/071-2009.pdf and http://www2.sas.com/proceedings/sugi31/244-31.pdf

Registration is open

SAS Training: Just a Click Away