BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cosmid
Lapis Lazuli | Level 10

Hi,

 

The book has the following code:

%let job=Analyst;
data work.staff;
  keep employeeID jobtitle salary;
  set certadv.staff;
  where jobtitle contains "&job";
  total+salary;
  count+1;
  call symputx('avg',put(total/count, dollar9.));
run;

Question 1:

Why if I modify the above the code for the counter variable, count:

data work.staff;
  count = 0;
  ....orig code here...
  count = count + 1;
run;

Why wouldn't this work for the count variable? I will end up with a value of 1 for every observation. So for SAS, it has to be count+1? which is not a short version of a count=count+1.

 

Question 2:

I can use a %PUT statement to print the value of a macro variable to the log:

%put avg is &avg;

How do I print the value of a data step variable to the log or somewhere? For example, the count:

%put count is count;

That will just print count is count. Is looking at the SAS dataset itself the only way to see the values of a data step variable?

 

Question 3:

The call symputx used in the code from the book, how come it calculated the average for the total correctly? I thought an IF statement will be needed to calculate the average for the final value of Total, something like:

data work.staff;
  set certadv.staff end=last;
  ... original code...
  if last then do;
    call symputx('avg',put(total/count,dollar9.));
  end;
run;

Would the variable last make the call symputx to use the final value of Total?

 

Question 4:

I thought the way a DATA step works is that, all the code will evaluate once for each observation. Example:

data work.staff;
  ...original code..
where jobtitle contains "&job"; count+1; %put count; run;

If I add that %put statement in the code, and assume there are 7 observations that satisfied the WHERE statement, shouldn't there be 7 put statements in the log? But when I tested this, it only printed 7. I guess I may have misunderstood how SAS DATA step works and hence the confusion for Question 3.

 

Thanks for all the help!

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Good questions!  

 

Question 1:

If you modify the code to:

data work.staff;
  count = 0;
  ....orig code here...
  count = count + 1;
run;

You will end up with count=1 for every record because of the COUNT=0 assignment statement.  Remember that the DATA step is a loop.  With above code, it will set COUNT=0, then will execute the SET statement and read the first record, then at the bottom it will increment COUNT by 1.  Then DATA step loop will iterate again, and at the stop it will set COUNT=0 again before reading the second record.  

 

The statement COUNT + 1 is an example of a SUM statement.  The documentation for the SUM statement is: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/n1dfiqj146yi2cn1maeju9wo7ijs.htm. The documentation explains that the sum statement:

COUNT + 1 ;

is equivalent to: 

retain count 0;
count=sum(count,1);

In your example, you could replace your statement COUNT=0; with the statement RETAIN COUNT 0;  and your code will work. It works because the RETAIN statements works at DATA step compile time, it is not executed each time the DATA step loops.  So it sets COUNT to 0 once, before the DATA step has started executing.

 

Question 2:

The %PUT statement is part of the macro language.  You can use a DATA step PUT statement to write the value of dataset variables to the log.  For example:

data _null_ ;
  set sashelp.class (obs=3);
  put "The value of name is " name  ;
run ;

data _null_ ;
  set sashelp.class (obs=3);
  put (name age sex)(=) ;
run ;

Question 3:

I agree with you.  The original code works, but I don't like it:

%let job=Analyst;
data work.staff;
  keep employeeID jobtitle salary;
  set certadv.staff;
  where jobtitle contains "&job";
  total+salary;
  count+1;
  call symputx('avg',put(total/count, dollar9.));
run;

Again, the DATA step is a loop.  If there are 1000 records read in the above step, it will execute the CALL SYMPUTX statement 1000 times.  Each time it will overwrite the value of the macro variable AVG.  So it works, because when it reads the last record it will over-write the macro variable AVG for the 1000th time, and it will have the correct value.  But your code with IF LAST... is better, both because it will execute CALL SYMPUTX only once, and because the purpose of the code is more clear.

 

Question 4. 

This is the same answer as Question 2.  Macro statements like %PUT are not part of the DATA step code.  They are executed by the macro processor before the DATA step even compiles.  If you want to see the value of the COUNT variable, you can add a PUT statement:

put count= ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

10 REPLIES 10
Quentin
Super User

Good questions!  

 

Question 1:

If you modify the code to:

data work.staff;
  count = 0;
  ....orig code here...
  count = count + 1;
run;

You will end up with count=1 for every record because of the COUNT=0 assignment statement.  Remember that the DATA step is a loop.  With above code, it will set COUNT=0, then will execute the SET statement and read the first record, then at the bottom it will increment COUNT by 1.  Then DATA step loop will iterate again, and at the stop it will set COUNT=0 again before reading the second record.  

 

The statement COUNT + 1 is an example of a SUM statement.  The documentation for the SUM statement is: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/n1dfiqj146yi2cn1maeju9wo7ijs.htm. The documentation explains that the sum statement:

COUNT + 1 ;

is equivalent to: 

retain count 0;
count=sum(count,1);

In your example, you could replace your statement COUNT=0; with the statement RETAIN COUNT 0;  and your code will work. It works because the RETAIN statements works at DATA step compile time, it is not executed each time the DATA step loops.  So it sets COUNT to 0 once, before the DATA step has started executing.

 

Question 2:

The %PUT statement is part of the macro language.  You can use a DATA step PUT statement to write the value of dataset variables to the log.  For example:

data _null_ ;
  set sashelp.class (obs=3);
  put "The value of name is " name  ;
run ;

data _null_ ;
  set sashelp.class (obs=3);
  put (name age sex)(=) ;
run ;

Question 3:

I agree with you.  The original code works, but I don't like it:

%let job=Analyst;
data work.staff;
  keep employeeID jobtitle salary;
  set certadv.staff;
  where jobtitle contains "&job";
  total+salary;
  count+1;
  call symputx('avg',put(total/count, dollar9.));
run;

Again, the DATA step is a loop.  If there are 1000 records read in the above step, it will execute the CALL SYMPUTX statement 1000 times.  Each time it will overwrite the value of the macro variable AVG.  So it works, because when it reads the last record it will over-write the macro variable AVG for the 1000th time, and it will have the correct value.  But your code with IF LAST... is better, both because it will execute CALL SYMPUTX only once, and because the purpose of the code is more clear.

 

Question 4. 

This is the same answer as Question 2.  Macro statements like %PUT are not part of the DATA step code.  They are executed by the macro processor before the DATA step even compiles.  If you want to see the value of the COUNT variable, you can add a PUT statement:

put count= ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
cosmid
Lapis Lazuli | Level 10

Thank you for the detailed explanation to all of my questions! I understand the retain statement much better now. And that I need to use PUT for DATA SET vars and %PUT for macro vars.

 

I am still a bit confused about the CALL SYMPUTX. Let me use the same code with AVG changed to MACRO_COUNT to use as a macro variable for the DATA SET var COUNT:

 

%let job=Analyst;
data work.staff;
  keep employeeID jobtitle salary;
  set certadv.staff;
  where jobtitle contains "&job";
  total+salary;
  count+1;
put "count is " count; call symputx('macro_count',put(count, 8.)); %put &=macro_count; run;

I add a PUT for count and a %PUT for macr_count.

 

The PUT did print 1,2,3,4,5,6,7 as expected since there are 7 observations that matched the WHERE condition.

However, the %PUT only printed 7 to the log, and the result is printed before the PUT results although it appeared after the PUT statement in the code. 

Here's what the book has:

CALL SYMPUTX('macro-variable', DATA-step-variable);

This form of the CALL SYMPUTX routine creates a macro variable named macro-variable and assigns to it the current value of DATA-step-variable.

 

If it is assigning the current value, then why didn't the %PUT print 1,2,3,4,5,6,7? So, the logic is confusing me here. Maybe I missed something and need to read this chapter again.

Quentin
Super User

@cosmid wrote:

Thank you for the detailed explanation to all of my questions! I understand the retain statement much better now. And that I need to use PUT for DATA SET vars and %PUT for macro vars.

 

I am still a bit confused about the CALL SYMPUTX. Let me use the same code with AVG changed to MACRO_COUNT to use as a macro variable for the DATA SET var COUNT:

 

%let job=Analyst;
data work.staff;
  keep employeeID jobtitle salary;
  set certadv.staff;
  where jobtitle contains "&job";
  total+salary;
  count+1;
put "count is " count; call symputx('macro_count',put(count, 8.)); %put &=macro_count; run;

I add a PUT for count and a %PUT for macr_count.

 

The PUT did print 1,2,3,4,5,6,7 as expected since there are 7 observations that matched the WHERE condition.

However, the %PUT only printed 7 to the log, and the result is printed before the PUT results although it appeared after the PUT statement in the code. 

Here's what the book has:

CALL SYMPUTX('macro-variable', DATA-step-variable);

This form of the CALL SYMPUTX routine creates a macro variable named macro-variable and assigns to it the current value of DATA-step-variable.

 

If it is assigning the current value, then why didn't the %PUT print 1,2,3,4,5,6,7? So, the logic is confusing me here. Maybe I missed something and need to read this chapter again.


It's a tricky timing issue.  I agree with others, that if you're learning the SAS DATA step language, it's good to focus on that now.  The macro language (%PUT) is a completely different language, with a different purpose, and different rules.

 

That said, one of the rules of the macro language is that macro language code is executed before any DATA step language code is executed (or even compiled).  This is because the macro language is a pre-processor, with the typical purpose of generating DATA step code (or other SAS code).  If you code:

 

%symdel name /nowarn ;
data _null_;
  set sashelp.class;
  call symputx('name',name);
  put "Value of dataset var " name= ;
  %put Value of macro var: &=name ;
run;

 it looks like the %PUT statement is 'inside' the DATA step.  But even though the line number is inside the DATA step, the %PUT statement is not part of the DATA step, because it's a statement for the macro language.  So the %PUT statement is executed before the DATA step starts to execute.  That's why %PUT is only executed once, even though PUT is executed 19 times.  PUT is part of the DATA step.

 

Also running the above shows that the %PUT statement reference to the macro variable NAME will not resolve.  Why not?  It's again because the %PUT statement is executed before any of the code in the DATA step is executed.  So when %PUT is executed, the CALL SYMPUTX has not run yet, and there is no macro variable NAME.

 

If you move the %PUT statement to after the RUN; statement, then the macro variable NAME would exist, and the code would work as intended.  Because in that case, the DATA step would be executed before SAS ever sees the PUT statement.

 

Again, if you're learning SAS and the DATA step, it's best not to try to learn the macro language at the same time.  You would be better off to focus on learning the DATA step language.  The macro language adds another layer of complexity.  There are timing issues, and other tricky aspects.  Your confusion around %PUT vs PUT is common, and it's also common to have confusion around %IF vs IF, %DO vs DO, etc.  If you try to learn the macro language at the same time as you're learning the DATA step language, you're trying to learn two languages at once.  I often recommend people program in SAS for a year or two before starting to learn the macro language.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
cosmid
Lapis Lazuli | Level 10
Thanks again for the detailed explanation! I'll take everyone's advice and start learning SAS again.
PaigeMiller
Diamond | Level 26

Question 1:

Why if I modify the above the code for the counter variable, count:

data work.staff;
  count = 0;
  ....orig code here...
  count = count + 1;
run;

Why wouldn't this work for the count variable? I will end up with a value of 1 for every observation. So for SAS, it has to be count+1? which is not a short version of a count=count+1.

 

Count=Count+1 is not the same as Count+1. Each record has count set to zero initially (in your code), so when Count=Count+1 is executed in your code, it is always equal to 1 (because it was zero, and you added 1 to it). To get cumulative totals or counts, you need to remove COUNT=0 and then use Count+1.

 

How do I print the value of a data step variable to the log or somewhere? For example, the count:

In a data step, use the PUT statement (not the %PUT statement).

 

Would the variable last make the call symputx to use the final value of Total?

 

You haven't shown us the entire DATA step code, so we cannot answer this question. But you can answer the question by running the code in SAS and seeing what the answer is.

 

If I add that %put statement in the code, and assume there are 7 observations that satisfied the WHERE statement, shouldn't there be 7 put statements in the log? But when I tested this, it only printed 7. I guess I may have misunderstood how SAS DATA step works and hence the confusion for Question 3.

 

You want the PUT statement in a DATA step, PUT being a DATA step command, and it will execute on each record. %PUT is not a data step command, it is a macro command, and behaves the way you see it, it executes once when the DATA step ends.

--
Paige Miller
cosmid
Lapis Lazuli | Level 10
Hi, thanks for your answer! I now learned not to use var=0 in SAS and the difference between PUT and %PUT. Thanks!
For Question3, that's the entire DATA step code from the book. I did run the code and it give the correct average value which is why I was confused. But as Quentin explained, the result get overwritten each time and it clarified my confusion on that part.
Question 4, The macro command executing only once is where I am super confused. If it execute only once, why do they need to be used with an IF statment with the END= option variable. If they execute only once, then how do they get the current value of the Data Step variable? I did some test runs and they seems to have values as though they have been iterated the same number of times as of the regular Data Step statements. Sorry, I am so confused about this part I couldn't even ask the correct question in a logical way. I think I need to read that chapter once more and follow up.
PaigeMiller
Diamond | Level 26

First, there is no need to name a macro variable as MACRO_COUNT, as usually the name COUNT will suffice for the variable name, and COUNT is not a macro anyway, it is a macro variable.

 

Putting a macro command %PUT into a data step will not work the way you want it to. Using PUT works the way you want it to work. Macro commands do not work like data step commands; macro variables do not work the same as data step variables. Just accept that.  You can use CALL SYMPUTX to put the value of a data step variable into a macro variable.

 

I now learned not to use var=0 in SAS

 

This is not correct, and you should un-learn it. In the code you posted, var=0 was incorrect, it did things you didn't want and produced incorrect answers. In other code, that does other things, var=0 is perfectly reasonable.

--
Paige Miller
cosmid
Lapis Lazuli | Level 10
Thanks again for the follow up. I'll try to unlearn what I learned and learn SAS again. Thanks!
Tom
Super User Tom
Super User

TIP.  Learn how to use SAS code first before attempting to learn how to use the macro processor to help you generate SAS code.

 

For your questions look for the following statements in the documentation 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p10bvg3wauedhan1qly0hiokirlv.htm

 

PUT

RETAIN

sum statement

SET

IF-THEN/ELSE

IF-Subsetting

PaigeMiller
Diamond | Level 26

@Tom wrote:

TIP.  Learn how to use SAS code first before attempting to learn how to use the macro processor to help you generate SAS code.


I agree with the above. It is worth repeating. Learn SAS first, before venturing into using Macro statements.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1254 views
  • 7 likes
  • 4 in conversation