- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SAS User, I face a problem about DO LOOP inside macro.
I generate sheet1, sheet2_outx, sheet3_outx, sheet4_outx as the code below
%macro ImportAndTranspose(
File=
, StartSheet=
, EndSheet=
);
%local i;
%do i = &StartSheet. %to &EndSheet.;
%if &i=1 %then %do;
proc import datafile= "&File."
out= sheet&i.
dbms= xlsx
replace;
range= "Sheet1$A:X";
getnames= yes;
run;
proc sort data= sheet&i.;
by Type;
run;
%end;
%else %if &i ne 1 %then %do;
proc import datafile= "&File."
out= sheet&i.
dbms= xlsx
replace;
range= "Sheet&i.$A:AG";
getnames= yes;
run;
proc sort data= sheet&i.;
by Type;
run;
proc transpose data= sheet&i.
out= sheet&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
by Type;
VAR '1988'N - '2019'N;
run;
proc print data=sheet&i._out;
run;
data sheet&i._outx;
set sheet&i._out;
if s&i. not in: ('NA', '$$') then s&i.2=input(s&i., 32.);
drop s&i.;
run;
%end;
%end;
%mend;
%ImportAndTranspose(
File= C:\Users\pnguyen\Desktop\New folder\Argentina__.xlsx
, StartSheet= 1
, EndSheet= 4);
So, Sheet1 will have 25 columns including: Type, ename, name, mnem, dscd, bdate. like the picture below
And sheet2_outx, sheet3_outx, sheet4_outx are similar as below (just difference in s22, s32, s42)
My aim is to write a SQL to merge sheet1 and sheet2_outx, sheet3_outx, sheet4_outx, to get the result like that:
Type ename mnem dscd Year s22 s32 s42 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1988 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1989 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1990 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1991 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1992 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1993 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1994 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1995 . . . 131566 CENTRAL PUERTO B DEAD - DUPL SEE 134645 NA 131566 1996 . . .
I tried to wrote a macro DO LOOP as below
%macro sqlloop (start, end);
proc sql;
create Table Argentina as
%do i = &start.+1 %to &end.;
select a.Type, a.ename,a.mnem,a.dscd, b.Year ,b.s&i.2
from sheet1 as a, sheet&i._outx as b
where a.Type=b.Type;
%end;
quit;
%mend;
%sqlloop(start=1,end=4);
However, the results is just like that
There is no error so far but it is not the result I want
I have a couple of questions about the situation:
1> I am not sure if there is anything wrong with my code?
2> In my macro for DO LOOP, it means that the program needs to run through sheet2_outx, sheet3_outx, sheet4_outx. Is there any chance that we can make it run only one time?
3> Is there any chance that I can put the macro sqlloop inside macro ImportandTranspose to get the result I want?
Many thanks in advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To see if something is wrong with the generated code, you can
- Set OPTION MPRINT and see in the log what is generated
- Look through the log to see if there are any error messages
What your code seems to do is this:
- It creates a table named Argentina, which contains the data from SHEET1 and SHEET2_OUTX
- It prints the data from merging the data from SHEET1 with the data from the other sheets one by one (you may not see the print if you have ODS ALL OFF).
It is probably a lot easier to do this as a couple of data steps. I think what you want comes down to something like this:
data merge1;
merge
sheet1(keep=Type ename mnem dscd)
sheet2_out(keep=type year s22)
;
by type;
run;
data want;
merge
merge1
sheet3_out(keep=type year s32)
sheet4_out(keep=type year s42)
;
by type year;
run;
If that works, you can then turn it into a macro, so that you can easily increase the number of sheets.
That is the general secret to writing macros that work: First, write code that works. Then, generalize it with macro statements.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As a first step, write the code without(!) any macro coding. Hint: you will have to join the "secondary" tables first on type and year, then join the result to the "primary" table on type alone.
Only when you have that working code before your eyes, you will be able to see how to automate it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for your insightful idea @Kurt_Bremser @ballardw @s_lassen .
I also chopped the code down to normalize code rather than being stuck in macro as suggested.
Hi @s_lassen , your code worked perfectly:
However, I want to challenge myself quite a bit by using proc SQL in this case, but the new code written took SAS a lot of time and performed wrongly as well as costing a lot of operating memory (more than 110 GB). However, I decided to face the problem.
Here is my code for merging (following @Kurt_Bremser 's suggestion, I merge the "secondary" first then merge with the "first"), as below
proc sql;
create table merge1 as select *
from
sheet2_outx
full join
sheet3_outx
on sheet2_outx.Type=sheet3_outx.Type
full join
sheet4_out
on sheet3_outx.Type=sheet4_out.Type
;
quit;
proc sql;
create table merge2 as
select a.*, b.*
from sheet1 as a, merge1 as b;
quit;
The log so far
WARNING: Variable Type already exists on file WORK.MERGE1.
WARNING: Variable Year already exists on file WORK.MERGE1.
WARNING: Variable Type already exists on file WORK.MERGE1.
WARNING: Variable Year already exists on file WORK.MERGE1.
NOTE: Table WORK.MERGE1 created, with 14843904 rows and 5 columns.
40 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2.88 seconds
cpu time 1.76 seconds
41
42 proc sql;
43 create table merge2 as
44
45 select a.*, b.*
46
2 The SAS System 07:57 Tuesday, January 5, 2021
47 from sheet1 as a, merge1 as b;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
WARNING: Variable Type already exists on file WORK.MERGE2.
And it took SAS a long period of time to execute this code, similar to the comment that "SQL processes the entire table in memory. As the size of your tables increase you may experience performance degradation. MERGE processes a row at a time so rarely has memory limitations." from ArtC via the link https://communities.sas.com/t5/SAS-Programming/difference-between-merg-and-proc-sql-join/td-p/33399#....
In my case, I have 34 sheets in a file and merge them together, so do you think I should stick to MERGE or PROC SQL in case?
Many thanks in advance and cheers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your first join needs to match on type and year, or you will get a cartesian join that blows up your result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To see if something is wrong with the generated code, you can
- Set OPTION MPRINT and see in the log what is generated
- Look through the log to see if there are any error messages
What your code seems to do is this:
- It creates a table named Argentina, which contains the data from SHEET1 and SHEET2_OUTX
- It prints the data from merging the data from SHEET1 with the data from the other sheets one by one (you may not see the print if you have ODS ALL OFF).
It is probably a lot easier to do this as a couple of data steps. I think what you want comes down to something like this:
data merge1;
merge
sheet1(keep=Type ename mnem dscd)
sheet2_out(keep=type year s22)
;
by type;
run;
data want;
merge
merge1
sheet3_out(keep=type year s32)
sheet4_out(keep=type year s42)
;
by type year;
run;
If that works, you can then turn it into a macro, so that you can easily increase the number of sheets.
That is the general secret to writing macros that work: First, write code that works. Then, generalize it with macro statements.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your do loop in the Proc SQL is equivalent to
Proc sql; create table Argentina as select a.typy, a.ename, a.dscd, b,year, b.<somejunk> from sheet1 as a,sheet1.outx as b where a.type=b.type; select a.typy, a.ename, a.dscd, b,year, b.<somejunk2> from sheet1 as a,sheet2.outx as b where a.type=b.type; select a.typy, a.ename, a.dscd, b,year, b.<somejunk3> from sheet1 as a,sheet3.outx as b where a.type=b.type; <repeat until &end>
If none of this generated actual syntax errors your results windows should have filled up with the select results of i=2,3,4.
You only have one Create table select operation. The remaining selects do not have a table to go to.
And if you try to send them each to the same table then only the last would be the result as you are not appending results, if that is the intent.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My code using MERGE now is:
data merge1;
merge
sheet1
sheet2_outx
;
by type;
run;
data want;
merge
merge1
sheet3_outx
sheet4_outx
;
by type year;
run;
And SAS run quickly there and then!