Hello
I'm trying to create a SELECT statement for different tables having different variables - the aim is to check if each table contains duplicates, but I'm struggling to get the right code (this community kindly helped me to find a solution when I was looking for duplicates considering all columns, but now I only need to select some columns ( i.e. keys)).
As I cannot use GROUP BY *, I need to list all the variables in the GROUP BY statement, but I can't find the way to do so.
Below is my attempt - which of course does not work, as the SELECT &Keys and GROUP BY &KEYS does not list all the variables.
/* Create the tables containing the keys for each report */ data METADATA; input report $ Keys $; datalines; CARS MAKE CARS MODEL BASEBALL NAME BASEBALL TEAM GAS FUEL CLASS NAME CLASS SEX ; run; /*create the summary (final) table*/ PROC SQL; CREATE TABLE WORK.DUPL ( Report char(7), Tot_Dups Numeric(5) ); QUIT; /*macro to loop through all tables*/ %macro makereport(Report=, Keys=); PROC SQL; CREATE TABLE POPULATION_AUX AS SELECT "&Report" AS REPORT , COUNT(A.CT) AS Tot_Dups FROM ( SELECT &Keys, COUNT(*) AS CT FROM SASHELP.&Report GROUP BY &KEYS HAVING CT>1 )A GROUP BY A.Report QUIT; PROC APPEND BASE=WORK.POPULATION DATA=WORK.POPULATION_AUX FORCE; RUN; %mend makereport; /*create the macros*/ data; set METADATA; *builds string to execute macro; str = catt('%makereport(REPORT=',Report,',Keys=', Keys, ');'); *execute macro; call execute(str); run;
Thanks in advance
Well if your macro generated SAS code instead of SQL code you wouldn't want the commas.
I would use %STR() to protect the commas. There is no need for %NRSTR() since you never want the macro processor the deal with the commas.
keys=%str(TEAM, NAME, DIVISION, LEAGUE)
You will need to protect the %STR the same way you protected the %MAKEREPORT. Either with two %NRSTR().
string = cats('%nrstr(%makereport)(Report=',Report,',keys=%nrstr(%str)(',keylist,'));');
Or in this case perhaps just one around the whole thing. Either in the string.
string = cats('%nrstr(%makereport(Report=',Report,',keys=%str(',keylist,'));)')
Or in the call execute.
string = cats('%makereport(Report=',Report,',keys=%str(',keylist,'));')
call execute(cats('%nrstr(',string,')'));
I normally don't like to wrap the whole statement like that as sometimes I want the current macro variables expanded so the lines in the log will have the values. So you get lines in the log like:
1 + %mymacro(mydataset)
Instead of
1 + %mymacro(&dsname)
The log is clearer about what code is being run.
Also if you using CALL EXECUTE() to generate multiple macro calls in one data step you would have to worry if the the first macro execution changed the value of the macro variable referenced in the next macro call.
First step you should always do is to obtain running working valid legal SAS code without macros or macro variables. It doesn't seem as if you have done this, and if you had, the whole process of creating working macros is much smoother and easier and less prone to bugs.
Next, to debug macros, you can use
options mprint;
before running the macro, this will make your life soooooo much easier. If that doesn't help, you can get even more debugging information from
options mprint symbolgen mlogic;
Your specific problem is in PROC SQL and shown below, and has nothing to do with macros. This is incorrect syntax in PROC SQL. Can you spot the error? What do you need to change to make this syntax correct?
GROUP BY A.Report
QUIT;
Thanks @PaigeMiller
you are right it should have been GROUP BY "&Report"; (I've attached the new code, as also the PROC APPEND was not quite right either).
I can now run it but the problem is still that I need the SELECT (and GROUP BY) to list all variables in the same query, while at the moment is selecting one variable at a time.
For example, for the CARS table it does
SELECT MAKE... GROUP BY MAKE
SELECT MODEL... GROUP BY MODEL
While I'd need SELECT MAKE, MODEL .... GROUP BY MAKE, MODEL..
the issue must be with the way I build the string in the metadata table, but I can't see how it should be done correctly.
/* Create the tables containing the keys for each report */ data METADATA; input report $ Keys $; datalines; CARS MAKE CARS MODEL BASEBALL NAME BASEBALL TEAM GAS FUEL CLASS NAME CLASS SEX ; run; /*create the summary (final) table*/ PROC SQL; CREATE TABLE WORK.DUPL ( Report char(7), Tot_Dups Numeric(5) ); QUIT; options mprint symbolgen mlogic; /*macro to loop through all tables*/ %macro makereport(Report=, Keys=); PROC SQL; CREATE TABLE DUPL_AUX AS SELECT "&Report" AS REPORT , COUNT(A.CT) AS Tot_Dups FROM ( SELECT &Keys, COUNT(*) AS CT FROM SASHELP.&Report GROUP BY &KEYS HAVING CT>1 )A GROUP BY "&Report"; QUIT; PROC APPEND BASE=WORK.DUPL DATA=WORK.DUPL_AUX FORCE; RUN; %mend makereport; /*create the macros*/ data; set METADATA; *builds string to execute macro; str = catt('%makereport(REPORT=',Report,',Keys=', Keys, ');'); *execute macro; call execute(str); run;
ps: aslo thanks for the debug options, I need to work on those
thanks
So show me SAS code without macros and without macro variables, that does what you want.
Hi @PaigeMiller
the code without macro would look like this:
PROC SQL;
CREATE TABLE WORK.POPULATION
(
Report char(7),
Tot_Dups Numeric(5)
);
QUIT;
PROC SQL;
insert into WORK.POPULATION (report, tot_dups)
SELECT "BASEBALL" AS Report, COUNT(*) as tot_dups
FROM (
SELECT Name, Team, COUNT(*) AS CT
FROM SASHELP.BASEBALL
GROUP BY Name, Team
HAVING CT>1
)A;
QUIT;
PROC SQL;
insert into WORK.POPULATION (report, tot_dups)
SELECT "CARS" AS Report, COUNT(*) as tot_dups
FROM (
SELECT Make, Model, COUNT(*) AS CT
FROM SASHELP.CARS
GROUP BY Make, Model
HAVING CT>1
)A;
QUIT;
PROC SQL;
insert into WORK.POPULATION (report, tot_dups)
SELECT "CLASS" AS Report, COUNT(*) as tot_dups
FROM (
SELECT Sex, Name, COUNT(*) AS CT
FROM SASHELP.CLASS
GROUP BY Sex, Name
HAVING CT>1
)A;
QUIT;
thanks
One problem I see in the code without macros is your GROUP BY statements. Here they are
GROUP BY Name, Team
GROUP BY Make, Model
GROUP BY Sex, Name
But with macros you have
GROUP BY &KEYS
so for any of this to work in a macro, you have to create &KEYS to have two variable names separated by a comma.
However, your macro variable &KEYS is not being created this way, it is being created via using each row of data set METADATA as providing the value to &KEYS, so on the first row of METADATA, &KEYS is assigned the value of MAKE. That's clearly not what you want as shown in your code without macros. This is why it is extremely important to first have code without macros that does what you want.
Perhaps you want this:
data METADATA;
input report $ Keys :$16.;
datalines;
CARS MAKE,MODEL
BASEBALL NAME,TEAM
GAS FUEL
CLASS NAME,SEX
;
run;
Thanks @PaigeMiller
actually I'm very close to getting it by using the below (using Catx would take care of the commas).
data METADATA;
input report $ Keys $;
datalines;
CARS MAKE
CARS MODEL
CARS ORIGIN
CARS TYPE
BASEBALL NAME
BASEBALL TEAM
BASEBALL DIVISION
BASEBALL LEAGUE
CLASS NAME
CLASS SEX
;
run;
data want;
length str2 $200;
do until(eof);
set METADATA end=eof;
str2 = catx(";", str2, keys);
str = catt('%makereport(REPORT=',Report,',Keys=', str2, ');');
end;
run;
so column str is almost perfect - I just need to "break it" by each report - adding BY REPORT; I think I'd need to use DO WHILE but can't get the right syntax.
thanks
Use BY group processing. So if you want to use a DOW loop then something like this:
data want;
do until(last.report);
set METADATA end=eof;
by report ;
length keylist $200;
keylist=catx(' ',keylist,keys);
end;
call execute(cats('%nrstr(%makereport)(report=',report,',keys=',keylist,')'));
run;
Thanks @Tom
this is very, very close but I cannot get the %nrstr working...
I've slightly amended your code - below -
string = %nrstr(cats('%makereport(Report=',Report,',keys=',keylist,');'));
and it gives me exactly what I need
However when passing into the macro, the commas between the keys seem still to be an issue, as I receive an "All positional parameters must precede keyword parameters" error message.
I tried to move the %nrstr everywhere in the string and inside the macro, but cannot find the right way of working.
Here is the whole code for reference:
data METADATA_AUX;
input report $ Keys $;
datalines;
CARS MAKE
CARS MODEL
CARS ORIGIN
CARS TYPE
BASEBALL NAME
BASEBALL TEAM
BASEBALL DIVISION
BASEBALL LEAGUE
CLASS NAME
CLASS SEX
;
run;
PROC SQL;
CREATE TABLE WORK.METADATA AS
SELECT t1.report,
t1.Keys
FROM WORK.METADATA_AUX t1
ORDER BY t1.report;
QUIT;
PROC SQL;
CREATE TABLE WORK.POPULATION
(
Report char(7),
Tot_Dups Numeric(5)
);
QUIT;
options mprint symbolgen mlogic;
%macro makereport(Report=, keys=);
PROC SQL;
insert into WORK.POPULATION (Report, tot_dups)
SELECT "&Report" AS Report, COUNT(*) as tot_dups
FROM (
SELECT &keys, COUNT(*) AS CT
FROM SASHELP.&Report
GROUP BY &keys
HAVING CT>1
)A;
QUIT;
%mend makereport;
data want;
do until (last.report);
set METADATA end=eof;
by Report;
length keylist $1000;
keylist=catx(',',keylist,keys);
end;
string = cats(%nrstr('%makereport(Report=',Report,',keys=',keylist,');'));
call execute(string);
run;
Many thanks
The purpose is to protect the macro call so that it is not expanded while the CALL EXECUTE is pushing the code on the stack to run. Instead the macro does not run until after the data step finishes. As I requested you just need to add the %NRSTR() around the characters %MAKEREPORT. (Just around the %M is enough but that would look goofy).
string = cats('%nrstr(%makereport)(Report=',Report,',keys=',keylist,');');
So that the actual letters %nrstr( and ) are part of the character string used in call execute().
Here is a simple example to demonstrate the issue.
%macro test(dsname);
data &dsname;
set sashelp.class;
run;
%mend test;
data _null_;
call execute('%test(one)');
call execute('%nrstr(%test)(two)');
run;
Notice the difference in the log with the lines generate by CALL EXECUTE() (the lines with the plus sign in front).
264 data _null_; 265 call execute('%test(one)'); 266 call execute('%nrstr(%test)(two)'); 267 run; NOTE: DATA statement used (Total process time): real time 0.06 seconds cpu time 0.01 seconds NOTE: CALL EXECUTE generated line. 1 + data one; set sashelp.class; run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.ONE has 19 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.05 seconds cpu time 0.03 seconds 2 + %test(two) NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.TWO has 19 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Without the macro quoting the SAS statements generated by the macro is what shows in the log. That is a sign that the macro actually ran while the call execute() function was running. With the macro quoting the macro call itself is what shows in the log. So the macro does actually run until after the data step has finished.
Thanks Tom
Yes I understand (I think!)
So I now pass the whole string enclosed to the macro, and I can see it in the log; but it still gives me the error message
I cannot see where the issue is: would it be within the macro?
Thanks again
The error message is telling you that you cannot pass parameter values by position after you have already passed other parameter values by name. You passed the REPORT and KEYS parameters by name. You set the KEYS parameter to TEAM. But then you added three more unnamed parameter values.
The problem now is in the value of the data step variable KEYLIST.
Why did you put commas in the list?
Use spaces between the names in the list.
hi @Tom
I need keys to pass all the parameters, so the query will be like
SELECT TEAM, NAME, DIVISION, LEAGUE
FROM SASHELP.BASEBALL...
If I remove the commas the PROC SQL won't work.
(I thought the %nrstr would serve this purpose?)
thanks
Well if your macro generated SAS code instead of SQL code you wouldn't want the commas.
I would use %STR() to protect the commas. There is no need for %NRSTR() since you never want the macro processor the deal with the commas.
keys=%str(TEAM, NAME, DIVISION, LEAGUE)
You will need to protect the %STR the same way you protected the %MAKEREPORT. Either with two %NRSTR().
string = cats('%nrstr(%makereport)(Report=',Report,',keys=%nrstr(%str)(',keylist,'));');
Or in this case perhaps just one around the whole thing. Either in the string.
string = cats('%nrstr(%makereport(Report=',Report,',keys=%str(',keylist,'));)')
Or in the call execute.
string = cats('%makereport(Report=',Report,',keys=%str(',keylist,'));')
call execute(cats('%nrstr(',string,')'));
I normally don't like to wrap the whole statement like that as sometimes I want the current macro variables expanded so the lines in the log will have the values. So you get lines in the log like:
1 + %mymacro(mydataset)
Instead of
1 + %mymacro(&dsname)
The log is clearer about what code is being run.
Also if you using CALL EXECUTE() to generate multiple macro calls in one data step you would have to worry if the the first macro execution changed the value of the macro variable referenced in the next macro call.
Thanks @Tom ; the %str around keys has worked!! got it, that was the issue (and I agree, it's useful to se e all the values in the log).
Thanks for all your help, very much appreciated
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.