BookmarkSubscribeRSS Feed
MikeFox
Fluorite | Level 6

Hello everyone!

 

I am trying to build a data step dynamically, based on the value of the variable _N_. After some analysys and experimenting, it seems to me that I have to convert it to a macro before using it (I beg your trust on this part) but the problem is that the macro variable always resolves to the _N_ associated with the last observation of the ds in the set statement, if some_ds=SASHELP.CLASS then row_numbers resolves to 19.

 

This is very odd to me, it's my understanding that the call symput is an execution phase statement, and I expect it to run every time the data step goes through the implicit loop. In other words, for the sashelp.class example, I would expect the call symput to run 19 times, the row_number variable being progressively overswritten with values from 1 to 19. Here some code for clarification:

 

 

%macro my_macro;

data final; set some_ds; call symput("row_number",_N_);
output;
%do i=1 %to &some_value;
%if %sysevalf(&row_number=%substr(%scan(&my_peculiar_string,&i,"|"),2,2)) %then %do;
         my_field=some_calculation;
           %end;

           %else %do;
                my_field=some_other_calculations;
            %end;

%end;
%mend my_macro;

The condition inside the %sysevalf is never met because the row_numbers resolves to 19 instead than 1,2,3,4,5,6...19. I found something online stating this:

 

"The macro variable created by CALL SYMPUT can not be referenced inside the creating data step. But CALL EXECUTE, SYMGET and RESOLVE can be used to reference a macro variable with in the data step. " 

 

This is then followed up with 

 

"CALL EXECUTE statements get resolved first and then moved to the input stack while iterating the data step. If the argument is a SAS statement then it gets moved to the input stack immediately and starts to execute at the data step
boundary"

 

Maybe it's because I'm tired but I can't wrap my head around this, any bit of help would be really appreciated. Link to the article for reference: https://www.lexjansen.com/phuse/2009/po/PO15.pdf

 

 

 

 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

The call symputx does execute 19 times. On the first observation &row_number=1. And on the second observation &row_number=2 and so on until on the last row, &row_number=19. So when the data step ends, you have &row_number=19. But you can't use this macro variable inside the data step that creates it.

 

CALL EXECUTE allows you to use that row number (or any other value of variables in the SAS data set) in SAS commands that execute once the data step is over. So this pointless little piece of code runs a PROC PRINT on 19 different data sets named FAKEDATA1 to FAKEDATA19 (you get errors in the log because those data sets don't exist, but you can see what CALL EXECUTE is doing by looking at the log)

 

data _null_;
    set sashelp.class;
    row_number=_N_;
    call execute(cats('proc print data=fakedata',row_number,'; run;'));
run;

 

--
Paige Miller
Quentin
Super User

There are tricky timing issues involved in the macro language.  In particular, if you put macro language code "inside" a DATA step, the macro language code is executed before the DATA step code is executed (or even finished compiling).

 

Consider this simpler example along the lines of what you are doing:

%macro my_macro() ;
  %local row_number ;

  data _null_ ;
    set sashelp.class ;
    call symputx("row_number",_N_) ;
    %put before dataset step has compiled: &=row_number ;
  run ;

  %put after dataset step has executed: &=row_number ;
%mend my_macro ;

%my_macro()

The %PUT statement "inside" the DATA step executes before the DATA step has finished compiling.  So it executes before CALL SYMPUTX has executed even once.  It returns:

before dataset step has compiled: ROW_NUMBER=

You are correct that the CALL SYMPUTX  runs at execution time, and it will update the value of the macro variable row_number 19 times (once for each iteration of the DATA step loop). But the macro language code %DO statement does NOT rut at data step execution time, it runs before the data step has compiled.  So it runs only once, even though you've placed it 'inside' a DATA step.

 

As you saw in the docs, there are some tricky ways to look at the changing value of a macro variable while the data step is executing, but note that CALL EXECUTE, SYMGET, and RESOLVE are all data step functions, not macro functions.  Thus they run during data step execution time.  So you can do stuff like:

 

%macro my_macro() ;
  %local row_number ;

  data _null_ ;
    set sashelp.class ;
    call symputx("row_number",_N_) ;
    row_number=symget("row_number") ;
    put "For _N_=" _N_ "Macro variable row_number resolved to " row_number ;
  run ;
%mend my_macro ;

%my_macro()

While technically it is possible to generate a macro variable from data and use the macro variable in the same data step, usually when you have to resort to such contortions it's a symptom of a confused design.  For example, if you want to make some conditional decision based on the value of the DATA step iterator _N_, why do you feel the need to use the macro language %IF statement, rather than a DATA step IF statement?

 

As an aside, the reason that you see &row_number=19 is because in your code, CALL SYMPUT is creating a global macro variable.  So when you run the macro, there is already a global macro variable row_number with the value 19 (from the prior time you ran the macro).  If you add a %LOCAL row_number; statement to your macro, it will make this much easier to debug.  Global macro variables can cause all sorts of confusion, and it's generally best to avoid them when possible, to avoid collisions like this.

 

BASUG is hosting free webinars Next up: Mike Sale presenting Data Warehousing with SAS April 10 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ballardw
Super User

Suggestion: provide enough information, such as data and the value of all the macro variables other than Row_number and what you expect the result to be given that example.

 

It is real hard to imagine what you might expect given multiple macro variables that are undefined. I have to say that your use of

%substr(%scan(&my_peculiar_string,&i,"|"),2,2)

compared to the line of the data file seems to be pointing to a poor construction of &my_peculiar_string. That seems to indicate that you placed a lot of unneeded information into the string and now you need to parse it out with some potential issues such as substrings delimited by the | less than 4 characters or having non-digit characters in those positions.

yabwon
Onyx | Level 15

How your dynamically created data step would look like if you would do it by hand? As @Quentin wrote, maybe it is about the design. If you show what is your "4GL expectation" it will be easier for us to discuss the problem, and maybe propose a solution.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

Analyse the following 3 codes, look into the log and see how macro vs. 4gl execution timing works:

 

%let a = 42;
data _null_;
call symput("a", "17");
x = &a.;
put x = ;
run;


%let b = 42;
data _null_;
call symput("b", "17");
y = symget("b");
put y = ;
run;


%let c = 42;
data _null_;
call symput("c", "17");
z = resolve('&c.');
put z = ;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

So your first statement 

I am trying to build a data step dynamically, based on the value of the variable _N_. 

is the problem.

 

If by _N_ you mean the automatic variable that SAS assigns the iteration number to when a data step runs then the only way you could use it to build a data step is to build a data step that you want to run after the one that is currently running that is setting values to _N_.  You can do that by using CALL EXECUTE() or just writing the code you want to generate to a file and running it via %INCLUDE.

 

Now let's look at your macro.  First let's clean up the indentations so it is readable.

%macro my_macro;

data final;
  set some_ds;
  call symput("row_number",_N_);
  output;
  %do i=1 %to &some_value;
    %if %sysevalf(&row_number=%substr(%scan(&my_peculiar_string,&i,"|"),2,2)) %then %do;

  my_field=some_calculation;
    %end;
    %else %do;

  my_field=some_other_calculations;
    %end;
  %end;
run;

%mend my_macro;

So your macro is going to generate a data step that starts with some static statements. Then based on the value of the magic macro variable SOME_VALUE (Magic macro variables are those referenced in a macro definition that are not an input parameter and where there is no explanation in any comments in the macro about what they are and what types of values they have) a series of assignment statements that overwrite the value of MY_FIELD (after the observation has already been written to FINAL).  Since the value of ROW_NUMBER that is in place when the macro runs is a constant the %IF/%THEN/%ELSE test will go down the same path each time so all of the assignment statements will be exactly the same.

 

So you are trying to run a data step that looks like this:

data final;
  set some_ds;
  call symput("row_number",_N_);
  output;
  my_field=some_other_calculations;
  my_field=some_other_calculations;
  my_field=some_other_calculations;
  my_field=some_other_calculations;
  ...
run;

 Which will just make a copy of SOME_DS into FINAL.  If MY_FIELD exists in SOME_DS then its value will be unchanged since the OUTPUT statement is before the assignment statements.  If MY_FIELD is a new variable then its value will be missing on every observation since SAS will reset a new variable to missing when it starts a new iteration of the data step.

 

The value of ROW_NUMBER will be the number of observations read. 

Tom
Super User Tom
Super User

Perhaps you should just forget you read that paper?

 

Look at the newer, more powerful, CALL SYMPUTX() function/routine instead.  

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1079 views
  • 3 likes
  • 6 in conversation