What does this code do?

Contributor
Posts: 73

Say I have a dataset with variables a, b, c, n, n2, and I run this piece of DATA step code. What exactly does it do? Thanks.

 

data yyy;
set xxx;
by a b c;
if first.c then do;
%let n_used=,;
%let n2_used=,;
end;
if find(symget('n_used'),cat(", ",n,",")) or find(symget('n2_used'),cat(", ",n2,",")) then match=0;
else do; match=1; call symputx('n_used',cat(symget('n_used')," ",n,","));
call symputx('n2_used',cat(symget('n2_used')," ",n2,","));
end;
run;

Super User
Posts: 7,720

Re: What does this code do?

Sorry, that code is not right. You are mixing macro code in with Base SAS, and they do not work like that. Take this bit:
if first.c then do;
  %let n_used=,;
  %let n2_used=,;
end;

Now, the %let statements execute regardless of the if statement - the if is Base SAS, the % is macro. The macro language is a text generator that runs before the Base SAS code is executed, so the above resolves to this Base SAS:
if first.c then do;
end;

I.e. pointless.  So the question really is what are you trying to do, where did this code come from?

 

Your code really looks like this with some comments:

/* These create two macro variables, n_used which is set to , */
/* and n2_used which is also set to , */
%let n_used=,;
%let n2_used=,;

data yyy;
  set xxx;
  by a b c;  
  if first.c then do;
  end;
/* Searches n_used for the text ", <n>," and n2_used for ", <n2>,". On the first
  observation both macro variables hold only a comma, so nothing is found */
  if find(symget('n_used'),cat(", ",n,",")) or find(symget('n2_used'),cat(", ",n2,",")) then match=0;
  else do; 
  /* Reached whenever neither value is already in its list: flag the record and append both values */
    match=1; 
    call symputx('n_used',cat(symget('n_used')," ",n,",")); 
    call symputx('n2_used',cat(symget('n2_used')," ",n2,","));
  end;
run;
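
A small way to check the timing point above is a step where the %let sits inside a branch that can never execute. This is only a sketch: the macro variable name demo and the variable value are made up for the illustration.

```sas
data _null_;
  /* This branch never executes at run time ... */
  if 0 then do;
    /* ... but the macro processor handles %let while the step is being compiled */
    %let demo=set_at_compile_time;
  end;
  length value $40;
  /* symget reads the macro variable while the DATA step is executing */
  value = symget('demo');
  put value=;
run;
```

The log should show value=set_at_compile_time, because the %let was processed before the DATA step ran and the IF condition played no part in it.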
Respected Advisor
Posts: 3,156

Re: What does this code do?

No offence, but I need to be frank upfront. If you are new to SAS and trying to learn, learn from those who are good. Mixing macro and SAS code like this is bad practice (even if it is part of a bigger macro). Macro delivers 'SAS code', then SAS code delivers data. Trying to understand code like this is a waste of your time.

Contributor
Posts: 73

Re: What does this code do?

I have attached the dataset; please try running the following code. It does seem to run fine on my end. Thanks.

 

data yyy;
set xxx;
by a b c;
if first.c then do;
%let n_used=,;
%let n2_used=,;
end;
if find(symget('n_used'),cat(", ",n,",")) or find(symget('n2_used'),cat(", ",n2,",")) then match=0;
else do; match=1; call symputx('n_used',cat(symget('n_used')," ",n,","));
call symputx('n2_used',cat(symget('n2_used')," ",n2,","));
end;
run;

Super User
Posts: 7,720

Re: What does this code do?

This does exactly the same:

%let n_used=,;
%let n2_used=,;

data yyy;
  set a.xxx;
  by a b c;
  if find(symget('n_used'),cat(", ",n,",")) or find(symget('n2_used'),cat(", ",n2,",")) then match=0;
  else do; 
    match=1; 
    call symputx('n_used',cat(symget('n_used')," ",n,","));
    call symputx('n2_used',cat(symget('n2_used')," ",n2,","));
  end;
run;

What this is doing:

1) Create 2 macro variables.

2) for each observation in the dataset do:

3)   If the text ", <n>," is found in the macro variable n_used, or ", <n2>," is found in n2_used, set match to 0.

4)   Otherwise set match to 1, then append the value of n (preceded by a space and followed by a comma) to n_used, and likewise n2 to n2_used.

So on each iteration where neither value is already in its list, both macro variables are extended with the new values.  If either value is already there, the observation is flagged with match=0 and nothing is added.

 

As noted before, this really isn't good programming practice, and the fact that it "works" does not change that.  There are many other ways to get to a solution.  For example, comparing each record with the previous one:

data yyy1;
  set a.xxx;
  if lag(n)=n or lag(n2)=n2 then match=0;
  else match=1;
run;
Contributor
Posts: 73

Re: What does this code do?

i can easily understand your code, but i still don't see in my code, how n and n2 are compared to their previous records to generate match. could you please elaborate on how it works?

 

also, i found some differences in the results generated by my code versus yours, shown below. match_lag is the results generated by your code. thanks.

n n2 match match_lag
813948 6296283 1 1
813949 6296283 0 0
813948 6296284 0 1
813949 6296284 1 0
Super User
Posts: 7,720

Re: What does this code do?

The differences would be down to the data sorting + logic of it.

 

What the code is doing is, take this as your data:

n n2 match match_lag
813948 6296283 1 1
813949 6296283 0 0
813948 6296284 0 1
813949 6296284 1 0

 

On iteration:

1, n_used=, and n2_used=, - the logic does not find n in n_used or n2 in n2_used, so it goes to the else, which sets match to 1 and adds n to n_used and n2 to n2_used

 

2, n_used=, 813948, and n2_used=, 6296283, - the logic does not find n (813949) in n_used, but it does find n2 (6296283) in n2_used, so it sets match to 0 and adds nothing

 

...
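
To follow the remaining iterations in the log, one option is to print the macro variables as each record is processed. The step below is only a sketch: it reads the four example rows from datalines instead of the real dataset, and the variable names before_n and before_n2 are made up for the trace.

```sas
%let n_used=,;
%let n2_used=,;

data _null_;
  input n n2;
  length before_n before_n2 $200;
  /* Capture the lists as they stand before this record is checked */
  before_n  = symget('n_used');
  before_n2 = symget('n2_used');
  if find(before_n, cat(", ", n, ",")) or find(before_n2, cat(", ", n2, ",")) then match = 0;
  else do;
    match = 1;
    /* Append the new values to the macro variables, as in the original step */
    call symputx('n_used',  cat(symget('n_used'),  " ", n,  ","));
    call symputx('n2_used', cat(symget('n2_used'), " ", n2, ","));
  end;
  /* One trace line per record */
  put _n_= n= n2= match= before_n= before_n2=;
  datalines;
813948 6296283
813949 6296283
813948 6296284
813949 6296284
;
run;
```

With these four rows the trace should show match as 1, 0, 0, 1, in line with the match column in the table above. The lists only grow on the first and fourth records.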

Contributor
Posts: 73

Re: What does this code do?

so it sounds like the logic treats the first record as match = 1, sets n_used and n2_used to the values of n and n2 from the first record, and compares them to the ensuing records: if either one matches, match = 0, and if neither one matches, match = 1 and n_used and n2_used are updated with the corresponding values of the current record. does that sound about right?

 

and if so, how does the by statement on top come into play here? does this process get done for each combination of a, b and c, and start fresh every time the combination changes? thanks.

Super User
Posts: 6,845

Re: What does this code do?

It probably doesn't do what you thought or wanted it to do.  

 

As others have noted the %LET statements will run once, before the DATA step starts, and not at the beginning of each BY group.  To have the macro variables reset for each BY group you would need to use CALL SYMPUTX.

if first.c then do;
  call symputx('n_used',',');
  call symputx('n2_used',',');
end;

 

It almost looks like you are trying to generate a comma delimited list of all of the distinct values of N within each BY group.  Or perhaps you are trying to generate the MATCH variable to indicate the first time that N appeared within a group?

It would probably be better to remove the macro code completely.

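data yyy;
  set xxx;
  by a b c;
  length n_used $3000 ;
  /* keep the list across observations; without RETAIN it is reset to missing on every row */
  retain n_used ;
  if first.c then n_used=' ';
  /* FINDW with ',' as the delimiter and the T modifier (trim trailing blanks) looks for n as a whole item */
  match= not findw(n_used,cats(n),',','t');
  if match then n_used=catx(',',n_used,n);
  /* output only the last record of each BY group, which carries the full list */
  if last.c ;
  keep a b c match n_used;
run;
```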
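
If the goal is instead to keep every record and flag it the way the original step does, checking both n and n2, the same retained-string idea can be extended. This is a sketch under two assumptions: that the lists really should start fresh for each a/b/c group (the original code never resets them), and that 3000 characters is enough to hold each list.

```sas
data yyy;
  set xxx;
  by a b c;
  length n_used n2_used $3000;
  /* Retain the lists across observations within a BY group */
  retain n_used n2_used;
  /* Start both lists fresh at the beginning of each a/b/c group */
  if first.c then call missing(n_used, n2_used);
  /* match=1 only when neither n nor n2 has been seen yet in this group */
  match = not (findw(n_used,  cats(n),  ',', 't')
            or findw(n2_used, cats(n2), ',', 't'));
  if match then do;
    n_used  = catx(',', n_used,  n);
    n2_used = catx(',', n2_used, n2);
  end;
  /* The working lists are not needed in the output */
  drop n_used n2_used;
run;
```

Removing the first.c line would mimic the original behaviour, where the lists keep growing across the whole dataset.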

 
