BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
fierceanalytics
Obsidian | Level 7

Hello, 

Here is generic form of the situation.

 

%macro cvn77;

%do I =1 %to 4;

data subset ; set WHOLE (where=(year=&I)); by account; if last.account;

.....

%end;

%mend ; %cvn77

 

The subset using where statement here works fine. if removing where subset on the data set WHOLE, and applying " if year=&I; by account; if last.account; ", the subset comes up with 0 observation with no error flag. Since I is 1,2,3,4, I can put on %eval or %sysevalf (not necessary since I is integer. just testing) like "if year =%eval(&I);", it still  gives 0 observation.

 

+; if we use "%put _all_", that lists all macro variables in the log file. I vaguely recall there is  a way to list macro variables produced by a specific macro block such as %macro cvn77, but could be wrong that there is no such way. Let me know. 

 

Thanks. Jia

 

Thoughts? Thanks, Jia 

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You cannot list macro variables that are scoped to a specific macro.  But you can display all local macro variables.  So all of the macro variables that are currently defined by any executing macro (you could have more than on local scope active if a macro calls another macro).  

%put _local_;

Note that just referencing a macro variable inside a macro definition will not necessarily make it a local macro variable. The only macro variable in your posted code is i.  If there exists a macro variable named i before CVN77 is called then the %DO loop would modify the value of that existing macro variable instead of creating on the CVN77 macro scope.  The macro variable i would be created local to CVN77 only if no existing macro variable named i was found at the time the %DO loop started to execute.

 

There will be an interaction between the WHERE= dataset option and the functioning of your particular IF statement.

That is because the WHERE condition will filter out observations before they even enter the data step, so can only reference variables that exist in that dataset.

The IF statement will evaluate during the current iteration of the data step (essentially while processing the current observation).  So you can reference any variable available in the data step.  And the value will be the value when the IF statement runs (which could have been changed by statements earlier in the data step).

Your particular IF statement is testing the FIRST. variable generated by SAS when you use a BY statement in a data step.

But since you have a WHERE clause being applied to what observations are considered the FIRST observation for an account that the data step sees might NOT be the first observation for that account in the dataset being read.

 

Try this example:

proc sort data=sashelp.class out=class;
  by sex age ;
run;

data test1;
  set class;
  by sex;
  if first.sex;
run;

proc print;
run;

data test2;
  set class;
  where age>11 ;
  by sex;
  if first.sex;
run;

proc print;
run;
Spoiler
Tom_0-1654885309404.png

 

 

View solution in original post

7 REPLIES 7
Reeza
Super User
Post both sets of code. As shown they should be equivalent, the primary difference between a WHERE/IF subset is performance time, results should be identical.
FreelanceReinh
Jade | Level 19

Hello @fierceanalytics,

 

Note that the BY groups are formed after the WHERE condition is applied, so last.account refers to the last of a group of observations for the same account after restricting to a particular year.

 

With the IF year=... condition, however, last.account refers to the last of a group of observations for the same account across all years. So the year of that observation can well be different from the year specified in the subsetting IF statement, hence zero observations.

 

Example:

data have;
input account year x $;
cards;
0 1 a
0 2 b
;

data want;
set have(where=(year=1));
by account;
if last.account;
run; /* 1 obs. */

data dontwant;
set have;
by account;
if year=1;
if last.account;
run; /* 0 obs. */
Tom
Super User Tom
Super User

You cannot list macro variables that are scoped to a specific macro.  But you can display all local macro variables.  So all of the macro variables that are currently defined by any executing macro (you could have more than on local scope active if a macro calls another macro).  

%put _local_;

Note that just referencing a macro variable inside a macro definition will not necessarily make it a local macro variable. The only macro variable in your posted code is i.  If there exists a macro variable named i before CVN77 is called then the %DO loop would modify the value of that existing macro variable instead of creating on the CVN77 macro scope.  The macro variable i would be created local to CVN77 only if no existing macro variable named i was found at the time the %DO loop started to execute.

 

There will be an interaction between the WHERE= dataset option and the functioning of your particular IF statement.

That is because the WHERE condition will filter out observations before they even enter the data step, so can only reference variables that exist in that dataset.

The IF statement will evaluate during the current iteration of the data step (essentially while processing the current observation).  So you can reference any variable available in the data step.  And the value will be the value when the IF statement runs (which could have been changed by statements earlier in the data step).

Your particular IF statement is testing the FIRST. variable generated by SAS when you use a BY statement in a data step.

But since you have a WHERE clause being applied to what observations are considered the FIRST observation for an account that the data step sees might NOT be the first observation for that account in the dataset being read.

 

Try this example:

proc sort data=sashelp.class out=class;
  by sex age ;
run;

data test1;
  set class;
  by sex;
  if first.sex;
run;

proc print;
run;

data test2;
  set class;
  where age>11 ;
  by sex;
  if first.sex;
run;

proc print;
run;
Spoiler
Tom_0-1654885309404.png

 

 

fierceanalytics
Obsidian | Level 7
Thank you. Tom. The Where -IF subject I think it clear. I am going to snippet your reply below
"Note that just referencing a macro variable inside a macro definition will not necessarily make it a local macro variable. The only macro variable in your posted code is i. If there exists a macro variable named i before CVN77 is called then the %DO loop would modify the value of that existing macro variable instead of creating on the CVN77 macro scope. The macro variable i would be created local to CVN77 only if no existing macro variable named i was found at the time the %DO loop started to execute."

to answer another question just asked by somebody. I habitually don't use same i, j.... crossing my macros. Somebody just asked why she cannot just use I.
Tom
Super User Tom
Super User

You can use whatever names you want for your macro variables.  Frequently there are more descriptive names than I or J that will make the code easier to understand.

 

The main point of local macro variables is to prevent your macro from accidently modifying macro variables that the caller is using.  But there is no way to prevent a macro you call from messing with your local macro variables.  

 

So the best thing to do is to always define the macro variables your macro uses as LOCAL, either as parameters on the %MACRO statement or via the %LOCAL statement.  Then at least for macros you have written you can feel safer calling them with the knowledge that they will not modify the macro variables you are using.

 

If you need values from the calling environment define them as parameters and pass them in via the macro call.

 

If you need to return macro variables be very clear in your documentation that is what you are doing.  And don't just blindly issue a %GLOBAL statement in a macro.  That will generate an error if the macro variable already exists in some running macro's scope.  You can use %SYMEXIST() to check.

%if not %symexist(returncode) %then %global returncode;

 

 

fierceanalytics
Obsidian | Level 7
Thank you.
mkeintz
PROC Star

When you use WHERE to filter data, the filtering is outsourced to whatever data engine is needed to access the data.  But the "first.", "last." and "by" process is not outsourced.  Therefore these order-based statements are applied only to the filtered data, i.e. only to those records for a given year.  You will successfully retrieve the last record for a given account/year.

 

But if you use the IF filter, the BY, FIRST and LAST statements are applied to ALL the observations, because the filtering is not outsourced.  So if the year-of-interest is not the last obs for a given account, it will not be retrieved, as per @FreelanceReinh's response.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 590 views
  • 0 likes
  • 5 in conversation