BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MART1
Quartz | Level 8

Hello

 

I'm trying to create a SELECT statement for different tables having different variables - the aim is to check if each table contains duplicates, but I'm struggling to get the right code (this community kindly helped me to find a solution when I was looking for duplicates considering all columns, but now I only need to select some columns ( i.e. keys)).

 

As I cannot use GROUP BY *, I need to list all the variables in the GROUP BY statement, but I can't find the way to do so.

 

Below is my attempt - which of course does not work, as the SELECT &Keys and GROUP BY &KEYS does not list all the variables.

/* Create the tables containing the keys for each report */
data METADATA;
input report $ Keys $;
datalines;
CARS MAKE
CARS MODEL
BASEBALL NAME
BASEBALL TEAM
GAS FUEL
CLASS NAME
CLASS SEX
;
run;

/*create the summary (final) table*/
PROC SQL;
CREATE TABLE WORK.DUPL
(
Report char(7),
Tot_Dups Numeric(5)
);
QUIT;

/*macro to loop through all tables*/
%macro makereport(Report=, Keys=);

PROC SQL;
CREATE TABLE POPULATION_AUX AS


SELECT "&Report" AS REPORT , COUNT(A.CT) AS Tot_Dups
FROM (

SELECT &Keys, COUNT(*) AS CT
FROM SASHELP.&Report
GROUP BY &KEYS
HAVING CT>1

)A
GROUP BY A.Report

QUIT;

PROC APPEND BASE=WORK.POPULATION DATA=WORK.POPULATION_AUX FORCE;
RUN;

%mend makereport;

/*create the macros*/
data;
set METADATA;

*builds string to execute macro;
str = catt('%makereport(REPORT=',Report,',Keys=', Keys, ');');

*execute macro;
call execute(str);
run;

 

Thanks in advance

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Well if your macro generated SAS code instead of SQL code you wouldn't want the commas.

 

I would use %STR() to protect the commas. There is no need for %NRSTR() since you never want the macro processor the deal with the commas. 

keys=%str(TEAM, NAME, DIVISION, LEAGUE)

You  will need to protect the %STR the same way you protected the %MAKEREPORT.  Either with two %NRSTR().

string = cats('%nrstr(%makereport)(Report=',Report,',keys=%nrstr(%str)(',keylist,'));');

Or in this case perhaps just one around the whole thing. Either in the string.

string = cats('%nrstr(%makereport(Report=',Report,',keys=%str(',keylist,'));)')

Or in the call execute.

string = cats('%makereport(Report=',Report,',keys=%str(',keylist,'));')
call execute(cats('%nrstr(',string,')'));

I normally don't like to wrap the whole statement like that as sometimes I want the current macro variables expanded so the lines in the log will have the values.  So you get lines in the log like:

1    + %mymacro(mydataset)

Instead of

1    + %mymacro(&dsname)

The log is clearer about what code is being run. 

 

Also if you using CALL EXECUTE() to generate multiple macro calls in one data step you would have to worry if the the first macro execution changed the value of the macro variable referenced in the next macro call.

View solution in original post

17 REPLIES 17
PaigeMiller
Diamond | Level 26

First step you should always do is to obtain running working valid legal SAS code without macros or macro variables. It doesn't seem as if you have done this, and if you had, the whole process of creating working macros is much smoother and easier and less prone to bugs.

 

Next, to debug macros, you can use

options mprint;

before running the macro, this will make your life soooooo much easier. If that doesn't help, you can get even more debugging information from

options mprint symbolgen mlogic;

 

Your specific problem is in PROC SQL and shown below, and has nothing to do with macros. This is incorrect syntax in PROC SQL. Can you spot the error? What do you need to change to make this syntax correct?

 

GROUP BY A.Report

QUIT;

 

 

 

--
Paige Miller
MART1
Quartz | Level 8

Thanks @PaigeMiller 

 

you are right it should have been GROUP BY "&Report";  (I've attached the new code, as also the PROC APPEND was not quite right either).

 

I can now run it but the problem is still that I need the SELECT (and GROUP BY) to list all variables in the same query, while at the moment is selecting one variable at a time.

 

For example, for the CARS table it does

 

SELECT MAKE... GROUP BY MAKE

SELECT MODEL... GROUP BY MODEL

 

While I'd need SELECT MAKE, MODEL .... GROUP BY MAKE, MODEL..

 

the issue must be with the way I build the string in the metadata table, but I can't see how it should be done correctly.

 

/* Create the tables containing the keys for each report */
data METADATA;
	input report $ Keys $;
	datalines;
CARS MAKE
CARS MODEL
BASEBALL NAME
BASEBALL TEAM
GAS FUEL
CLASS NAME
CLASS SEX
;
run;

/*create the summary (final) table*/
PROC SQL;
	CREATE TABLE WORK.DUPL
		(
		Report    char(7),
		Tot_Dups Numeric(5)
		);
QUIT;

options mprint symbolgen mlogic;

/*macro to loop through all tables*/
%macro makereport(Report=, Keys=);


	PROC SQL;
		CREATE TABLE DUPL_AUX AS


		SELECT  "&Report" AS REPORT , COUNT(A.CT) AS Tot_Dups
FROM (

		SELECT &Keys, COUNT(*) AS CT
			FROM SASHELP.&Report
			GROUP BY &KEYS
			HAVING CT>1

)A
GROUP BY "&Report";

				QUIT;

	PROC APPEND BASE=WORK.DUPL DATA=WORK.DUPL_AUX FORCE;
	RUN;

%mend makereport;

/*create the macros*/
data;
	set METADATA;

	*builds string to execute macro;
	str = catt('%makereport(REPORT=',Report,',Keys=',  Keys,  ');');

	*execute macro;
	call execute(str);
run;

ps: aslo thanks for the debug options, I need to work on those

 

thanks

PaigeMiller
Diamond | Level 26

So show me SAS code without macros and without macro variables, that does what you want.

--
Paige Miller
MART1
Quartz | Level 8

Hi @PaigeMiller 

the code without macro would look like this:

PROC SQL;

     CREATE TABLE WORK.POPULATION

           (

           Report    char(7),

           Tot_Dups Numeric(5)

           );

QUIT;

 

     PROC SQL;

insert into WORK.POPULATION (report, tot_dups)

 

           SELECT "BASEBALL" AS Report, COUNT(*) as tot_dups

FROM (

 

           SELECT Name, Team, COUNT(*) AS CT

                FROM SASHELP.BASEBALL

                GROUP BY Name, Team

                HAVING CT>1

 

)A;       

QUIT;

 

 

     PROC SQL;

insert into WORK.POPULATION (report, tot_dups)

 

           SELECT "CARS" AS Report, COUNT(*) as tot_dups

FROM (

 

           SELECT Make, Model, COUNT(*) AS CT

                FROM SASHELP.CARS

                GROUP BY Make, Model

                HAVING CT>1

 

)A;       

QUIT;

 

     PROC SQL;

insert into WORK.POPULATION (report, tot_dups)

 

           SELECT "CLASS" AS Report, COUNT(*) as tot_dups

FROM (

 

           SELECT Sex, Name, COUNT(*) AS CT

                FROM SASHELP.CLASS

                GROUP BY Sex, Name

                HAVING CT>1

 

)A;       

QUIT;

 

thanks

PaigeMiller
Diamond | Level 26

One problem I see in the code without macros is your GROUP BY statements. Here they are

 

GROUP BY Name, Team
GROUP BY Make, Model
GROUP BY Sex, Name

But with macros you have

 

GROUP BY &KEYS

so for any of this to work in a macro, you have to create &KEYS to have two variable names separated by a comma.

 

However, your macro variable &KEYS is not being created this way, it is being created via using each row of data set METADATA as providing the value to &KEYS, so on the first row of METADATA, &KEYS is assigned the value of MAKE. That's clearly not what you want as shown in your code without macros. This is why it is extremely important to first have code without macros that does what you want.

 

Perhaps you want this:

 

data METADATA;
input report $ Keys :$16.;
datalines;
CARS MAKE,MODEL
BASEBALL NAME,TEAM
GAS FUEL
CLASS NAME,SEX
;
run;

 

--
Paige Miller
MART1
Quartz | Level 8

Thanks @PaigeMiller 

 

actually I'm very close to getting it by using the below (using Catx would take care of the commas).

 

data METADATA;

     input report $ Keys $;

     datalines;

CARS MAKE

CARS MODEL

CARS ORIGIN

CARS TYPE

BASEBALL NAME

BASEBALL TEAM

BASEBALL DIVISION

BASEBALL LEAGUE

CLASS NAME

CLASS SEX

;

run;

 
data want;

     length str2 $200;

     do until(eof);

           set METADATA end=eof;

                str2 = catx(";", str2, keys);

                str = catt('%makereport(REPORT=',Report,',Keys=',  str2,  ');');

           end;

run;

 

SAS.png

so column str is almost perfect - I just need to "break it" by each report - adding BY REPORT; I think I'd need to use DO WHILE but can't get the right syntax.

 

thanks

Tom
Super User Tom
Super User

Use BY group processing. So if you want to use a DOW loop then something like this:

data want;
  do until(last.report);
    set METADATA end=eof;
    by report ;
    length keylist $200;
    keylist=catx(' ',keylist,keys);
  end;
  call execute(cats('%nrstr(%makereport)(report=',report,',keys=',keylist,')'));
run;
MART1
Quartz | Level 8

Thanks @Tom 

 

this is very, very close but I cannot get the %nrstr working...

 

I've slightly amended your code - below -

 

 

string = %nrstr(cats('%makereport(Report=',Report,',keys=',keylist,');'));
 

and it gives me exactly what I need

ss.png

 

However when passing into the macro, the commas between the keys seem still to be an issue, as I receive an "All positional parameters must precede keyword parameters" error message.

 

I tried to move the %nrstr everywhere in the string and inside the macro, but cannot find the right way of working.

 

Here is the whole code for reference:

 

 

 

 data METADATA_AUX;

     input report $ Keys $;

     datalines;

CARS MAKE
CARS MODEL
CARS ORIGIN
CARS TYPE
BASEBALL NAME
BASEBALL TEAM
BASEBALL DIVISION
BASEBALL LEAGUE
CLASS NAME
CLASS SEX
;
run;

 

PROC SQL;

     CREATE TABLE WORK.METADATA AS

           SELECT t1.report,

                t1.Keys

           FROM WORK.METADATA_AUX t1

                ORDER BY t1.report;

QUIT;


PROC SQL;

     CREATE TABLE WORK.POPULATION

           (
           Report    char(7),
           Tot_Dups Numeric(5)
           );
QUIT;


options mprint symbolgen mlogic;


%macro makereport(Report=, keys=);

PROC SQL;

     insert into WORK.POPULATION (Report, tot_dups)

     SELECT "&Report" AS Report, COUNT(*) as tot_dups

           FROM (
           SELECT &keys, COUNT(*) AS CT

                FROM SASHELP.&Report

                     GROUP BY &keys

                           HAVING CT>1

                               )A;
QUIT;


%mend makereport;

data want;
     do until (last.report);

           set METADATA end=eof;

           by Report;

           length keylist $1000;

           keylist=catx(',',keylist,keys);

     end;

string = cats(%nrstr('%makereport(Report=',Report,',keys=',keylist,');'));

call execute(string);
run;

Many thanks

 

 

Tom
Super User Tom
Super User

The purpose is to protect the macro call so that it is not expanded while the CALL EXECUTE is pushing the code on the stack to run.  Instead the macro does not run until after the data step finishes.  As I requested you just need to add the %NRSTR() around the characters %MAKEREPORT. (Just around the %M is enough but that would look goofy).

string = cats('%nrstr(%makereport)(Report=',Report,',keys=',keylist,');');

So that the actual letters %nrstr( and ) are part of the character string used in call execute().

 

Here is a simple example to demonstrate the issue.

%macro test(dsname);
data &dsname;
  set sashelp.class;
run;
%mend test;

data _null_;
  call execute('%test(one)');
  call execute('%nrstr(%test)(two)');
run;

Notice the difference in the log with the lines generate by CALL EXECUTE() (the lines with the plus sign in front).

264   data _null_;
265     call execute('%test(one)');
266     call execute('%nrstr(%test)(two)');
267   run;

NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.01 seconds


NOTE: CALL EXECUTE generated line.
1    + data one;   set sashelp.class; run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.ONE has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds


2    + %test(two)

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TWO has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Without the macro quoting the SAS statements generated by the macro is what shows in the log.  That is a sign that the macro actually ran while the call execute() function was running.  With the macro quoting the macro call itself is what  shows in the log. So the macro does actually run until after the data step has finished.

MART1
Quartz | Level 8

Thanks Tom

 

Yes I understand (I think!) 

 

So I now pass the whole string enclosed to the macro, and I can see it in the log; but it still gives me the error message

 

ss1.png

 

I cannot see where the issue is: would it be within the macro?

 

Thanks again 

Tom
Super User Tom
Super User

The error message is telling you that you cannot pass parameter values by position after you have already passed other parameter values by name.  You passed the REPORT and KEYS parameters by name.  You set the KEYS parameter to TEAM.  But then you added three more unnamed parameter values.

 

The problem now is in the value of the data step variable KEYLIST.

Why did you put commas in the list?  

Use spaces between the names in the list.

MART1
Quartz | Level 8

hi @Tom 

 

I need keys to pass all the parameters, so the query will be like 

 

SELECT TEAM, NAME, DIVISION, LEAGUE

FROM SASHELP.BASEBALL...

 

If I remove the commas the PROC SQL won't work.

 

(I thought the %nrstr would serve this purpose?)

 

thanks

Tom
Super User Tom
Super User

Well if your macro generated SAS code instead of SQL code you wouldn't want the commas.

 

I would use %STR() to protect the commas. There is no need for %NRSTR() since you never want the macro processor the deal with the commas. 

keys=%str(TEAM, NAME, DIVISION, LEAGUE)

You  will need to protect the %STR the same way you protected the %MAKEREPORT.  Either with two %NRSTR().

string = cats('%nrstr(%makereport)(Report=',Report,',keys=%nrstr(%str)(',keylist,'));');

Or in this case perhaps just one around the whole thing. Either in the string.

string = cats('%nrstr(%makereport(Report=',Report,',keys=%str(',keylist,'));)')

Or in the call execute.

string = cats('%makereport(Report=',Report,',keys=%str(',keylist,'));')
call execute(cats('%nrstr(',string,')'));

I normally don't like to wrap the whole statement like that as sometimes I want the current macro variables expanded so the lines in the log will have the values.  So you get lines in the log like:

1    + %mymacro(mydataset)

Instead of

1    + %mymacro(&dsname)

The log is clearer about what code is being run. 

 

Also if you using CALL EXECUTE() to generate multiple macro calls in one data step you would have to worry if the the first macro execution changed the value of the macro variable referenced in the next macro call.

MART1
Quartz | Level 8

Thanks @Tom ; the %str around keys has worked!! got it, that was the issue (and I agree, it's useful to se e all the values in the log).

 

Thanks for all your help, very much appreciated

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2332 views
  • 0 likes
  • 3 in conversation