Hi,
When I run the code below, I get an unexpected result.
data a1;
x='abc';
run;
data a2;
y='y';output;
y='';output;
run;
data a3;
set a1 a2;
if y='y' then x='yes';
run;
Since y='' in the third observation of data set a3, I expect x='', but the actual result is x='yes'.
What happens when the SET statement misbehaves—and how you can fix it!
By Kim Wilson on SAS Users February 21, 2014
https://blogs.sas.com/content/sgf/2014/02/21/what-happens-when-the-set-statement-misbehaves-and-how-...
KB0041936
Variables read using SET, MERGE, and UPDATE statements are automatically retained
https://sas.service-now.com/csm/en/variables-read-using-set-merge-and-update-statements-are-automati...
You get the results you want, with the same number of steps, when you do something like this:
data a1;
x='abc';
run;
data a2;
y='y'; x=''; output;
y='' ; x=''; output;
run;
data a3;
set a1 a2;
if y='y' then x='yes';
run;
/* end of program */
BR, Koen
In short, to understand why you get these results, you have to consider what is happening in the program data vector (PDV).
Variables read with a SET statement are automatically retained: their values are not reset to missing at the start of each iteration, and they stay in the program data vector until the next observation's values overwrite them. The program data vector is created at compile time.
In your code, X and Y do not each exist in both data sets, but when the program data vector is built it contains both variables and marks both as retained, because each exists in at least one of the data sets on the SET statement.
When you create the X and Y variables on both data sets, you get the desired results.
[EDIT] I understand that making sure X and Y exist on both data sets is an unfriendly solution for the user. I'm actually more in favor of @Ksharp's solution (reset variable X to missing), which you will find further down in this thread.
BR, Koen
Hi,
I see the expected result. In the second observation the value is "yes" but it's empty in obs 1 and 3.
If the DATA step isn't following the logic you expect, I recommend using the DATA step debugger (available in EG and in SAS Studio within Viya) to step through.
@ChrisHemedinger wrote:
If the DATA step isn't following the logic you expect, I recommend using the DATA step debugger (available in EG and in SAS Studio within Viya) to step through.
Using the data step debugger is a good idea.
You can easily follow what's happening!
For information about the SAS Studio DATA step debugger, see Using the Data Step Debugger in SAS Studio: User’s Guide.
The original DATA step debugger remains available in the SAS windowing environment (and Enterprise Guide). The documentation for the original DATA step debugger has moved to SAS Code Debugger: User’s Guide.
BR, Koen
First, note that
the variable X comes from dataset A1, so
x='yes'
is retained for all subsequent observations.
The key point is that all variables read from data sets are retained.
So to get the right result, you should reset variable X.
data a1;
x='abc';
run;
data a2;
y='y';output;
y='';output;
run;
data a3;
set a1 a2(in=ina2);
if ina2 then call missing(x);
if y='y' then x='yes';
run;
Add some PUT statements to your data step to see what is happening.
data a3;
if 0 then set a1 a2;
put _n_=;
put '1. ' (in1 in2 x y) (=) ;
set a1(in=in1) a2(in=in2);
put '2. ' (in1 in2 x y) (=) ;
if y='y' then x='yes';
put '3. ' (in1 in2 x y) (=) ;
run;

_N_=1
1. in1=0 in2=0 x= y=
2. in1=1 in2=0 x=abc y=
3. in1=1 in2=0 x=abc y=
_N_=2
1. in1=1 in2=0 x=abc y=
2. in1=0 in2=1 x= y=y
3. in1=0 in2=1 x=yes y=y
_N_=3
1. in1=0 in2=1 x=yes y=y
2. in1=0 in2=1 x=yes y=
3. in1=0 in2=1 x=yes y=
_N_=4
1. in1=0 in2=1 x=yes y=
NOTE: There were 1 observations read from the data set WORK.A1.
NOTE: There were 2 observations read from the data set WORK.A2.
NOTE: The data set WORK.A3 has 3 observations and 2 variables.
On the first iteration, before the SET statement executes, all of the variables are missing.
Once the SET executes, X gets a value and the IN= flag variables are updated.
Since Y is empty, the IF does nothing.
On the second iteration the values of X, Y, IN1 and IN2 are remembered ("retained"). Once the SET executes, X is cleared because SET has started reading A2 and A1 is no longer contributing values. The IF condition is then true, so X is assigned a new value. That value can never change again, because A1 has been exhausted and nothing else in the step writes to X.
What is it that you want to happen instead?
Do you want X to be set to missing when Y is not 'y'? Then say that in the code:
if y='y' then x='yes';
else x=' ';
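Put into a complete step, that could look like the sketch below. It assumes the a1 and a2 data sets from the question. Note that the ELSE branch also blanks X for the observation that comes from A1, because Y is missing there too.

```sas
data a3;
  set a1 a2;
  /* X is retained across iterations, so assign it on every pass */
  if y='y' then x='yes';
  else x=' ';   /* clears X whenever Y is not 'y', including the row from A1 */
run;
```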
Do you want X to remember the value from the last observation of A1 when Y is not 'y'? If so then you probably need to introduce a new variable.
if in1 then oldx=x;
retain oldx;
if y='y' then x='yes';
else x=oldx;
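As a complete step, a minimal sketch of that idea might look like this. It assumes the a1 and a2 data sets from the question; the IN= name in1 and the helper variable OLDX are only illustrative, and OLDX stays in the output unless you drop it.

```sas
data a3;
  set a1(in=in1) a2;        /* IN1 is 1 only while reading from A1 */
  if in1 then oldx=x;       /* remember the latest X value seen in A1 */
  retain oldx;              /* keep OLDX from one iteration to the next */
  if y='y' then x='yes';
  else x=oldx;              /* fall back to the remembered A1 value */
run;
```

With the sample data, the intent is that X ends up as abc, yes, abc across the three observations, rather than carrying 'yes' forward.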