Hello everyone,
I am currently using hash objects to join tables and I would like to loop through 5 tables (e.g. Table_2010, ..., Table_2015) inside my DATA step. To do that, I would like to concatenante the loop counter with a string, let's say "Table_".
My progress so far is the following :
DATA TABLE_2010;
INPUT ID VAL;
DATALINES;
1 63
2 22
3 71
;
RUN;
DATA TABLE_2011;
INPUT ID VAL;
DATALINES;
4 68
5 55
6 54
;
RUN;
DATA RES;
DO i = 2010 TO 2011;
DO UNTIL (EOF) ;
SET 'TABLE_'&i END = EOF;
OTHER_VAL = CATS('TEST',i);
OUTPUT;
END;
END;
STOP;
RUN;
This code doesn't currently work (otherwise I wouldn't be posting here) and I can see at least 3 issues :
Replacing "TABLE_&i" with CATS("TABLE_",i) doesn't work either ("Invalid option name "TABLE_" error).
I am kind of lost here and any help would be appreciated :).
Write this macro:
%MACRO one_table(year);
DO UNTIL (EOF_&year) ;
SET TABLE_&year END = EOF_&year;
CALL MISSING(NAME);
rc = h.find ();
OUTPUT;
END;
%MEND;
and call it for every year:
%one_table(2010)
%one_table(2011)
or use a macro %do loop within the macro to repeat the code.
This is what the macro engine is for.
Hi,
To mu knowledge, the SET statement is executed at compilation time, therefore you can not just change the table name on the fly!
Here is an alternative approach that could allow you to do that.
data _null_;
do table='table1','table2','table3' ; /* Specify your list of tables */
rc = open(table);
if (not rc) then
do;
m=sysmsg();
put m;
end;
else
do;
/* .... Some Other SAS Statements ....*/
rc=close(rc);
end;
end;
run;
Hope this helps,
Ahmed
Let´s dig a little into how the SAS interpreter works:
- program text is gathered
- if a macro reference (¨&¨or ¨%¨)is encountered, text is handed over to the macro processor
- if text for a data step is encountered, the data step is compiled and
- executed
So your &i is handed over to the macro processor LONG before the data step even starts to run, and would need for the macro variable i to already be defined.
Also, as alreay mentioned, theSET statement within a data step can not be done dynamically. The reason is that the table named in the SET has a major influence on the data structures present in the data step, which are determined at compile time.
To make your step dynamic, I suggest the following:
%macro dynamic_sets;
data res;
set
%do i = 2010 %to 2011;
table_&i (in=in&i)
%end;
;
%do i = 2010 %to 2011;
if in&i then other_val = ¨test&i¨;
%end;
run;
%mend;
%dynamic_sets;
Could you post test data and required output. Its not entirely clear to me what your attempting to do here, why can you not use indsname for this?
data want;
length store $20;
set table_: indsname=dsname;
store=scan(dsname,2,".");
run;
The way you have it set up isn't logically correct. Here it is broken down into two steps. Hope this helps:
DATA TABLE_2010;
INPUT ID VAL;
DATALINES;
1 63
2 22
3 71
;
DATA TABLE_2011;
INPUT ID VAL;
DATALINES;
4 68
5 55
6 54
;
%macro first;
%do i = 2010 %to 2011;
data table_&i;
set table_&i;
other_val = "test_&i";
run;
%end;
%mend;
%first;
data final;
%macro second;
set
%do i = 2010 %to 2011;
table_&i
%end;
%mend;
%second;
;
run;
I would recommend writting this out long hand and seeing what you can do without macros first.
Mark
Thank you everyone for your answers, especially Kurt for the explanation on how things work.
I feel like I should provide a more complete code and here it is :
DATA SMALL;
INPUT ID NAME $;
DATALINES;
1 Red
2 Green
3 Blue
;
RUN;
DATA TABLE_2010;
INPUT ID YEAR VAL;
DATALINES;
1 2010 63
2 2010 22
3 2010 71
;
RUN;
DATA TABLE_2011;
INPUT ID YEAR VAL;
DATALINES;
1 2011 68
2 2011 55
3 2011 54
;
RUN;
DATA WANT (drop = rc);
declare hash h (hashexp:16);
rc = h.DefineKey ('ID');
rc = h.DefineData ('NAME');
rc = h.DefineDone ();
do until (eof1);
set SMALL end = eof1;
rc = h.add ();
end;
DO UNTIL (EOF_2010) ;
SET TABLE_2010 END = EOF_2010;
CALL MISSING(NAME);
rc = h.find ();
OUTPUT;
END;
DO UNTIL (EOF_2011) ;
SET TABLE_2011 END = EOF_2011;
CALL MISSING(NAME);
rc = h.find ();
OUTPUT;
END;
STOP;
RUN;
This part works as intended but my goal is to go through TABLE_2010 and TABLE_2011 without having to write the same code (DO UNTIL... END;) twice in this case. That's why some of the suggestions won't work, because their alter my code too much for what I want to do. I think I can't wrap everything in a macro because it will load the SMALL table in memory twice, which I am trying to avoid (there are actually more like 20 tables loaded and some of them are over 100 000 rows). The yearly tables are about 1M rows each.
I hope this makes things clearer for you.
Write this macro:
%MACRO one_table(year);
DO UNTIL (EOF_&year) ;
SET TABLE_&year END = EOF_&year;
CALL MISSING(NAME);
rc = h.find ();
OUTPUT;
END;
%MEND;
and call it for every year:
%one_table(2010)
%one_table(2011)
or use a macro %do loop within the macro to repeat the code.
This is what the macro engine is for.
Nope, I am sorry, I still do not see the point of all that code. What you are outputting is a union of the second and third dataset merged with the first. This can be simply written as:
proc sql;
create table WANT2 as
select A.*,
B.NAME
from (select * from TABLE_2010 union all select * from TABLE_2011) A
left join SMALL B
on A.ID=B.ID
order by YEAR,ID;
quit;
Please also note that writing everything in uppercase and not finishing datasteps makes your code harder to read.
Thank you Kurt, this solution seems to be working on my example, I will try to make it work in my code :). One question though : is it wrong to write the entire macro inside the data step or should I only call it?
As for RW9's suggestion, my point is to avoid using SQL joins because they take way too much of time compared to joining using hash objects. Why should I do something in one hour when it can be done in two minutes (those are real numbers)? Moreover, thank you for pointing out that I forgot to finish my datasteps, I will correct them. As for writing in uppercase, I would gladly take any directions on what to write in upper/lowercase because I have no knowledge on that matter.
Just use datastep version then:
data inter;
set table:;
run;
proc sort data=inter;
by id;
run;
data want;
merge inter small;
by id;
run;
As for GPP, its something you will see on a lot of code here. Avoid using upper case (except SQL) as its harder to read. Finishing datasteps, consistent indents using spaces rather than tabs (although the forum here is bad for code formatting). Just helps reading code 🐵
Thank you for the tips RW9, they are duly noted.
I also advise you to read the followings articles which compare hash tables vs SQL joins vs merge : http://support.sas.com/resources/papers/proceedings09/071-2009.pdf and http://www2.sas.com/proceedings/sugi31/244-31.pdf
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.