I met a strange issue when I merge two datasets. Below is a simple example
data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sort data=b; by subjid visit; run;
proc sort data=a; by subjid visit; run;
data c;
merge a b;
by subjid visit;
if param='aaa' then call missing(flag);
/* if visit='vis01' then call missing(flag);*/
run;
I merged a and b. I hope only when param='aaa' then flag= missing. but if I run above code, all flag values are missing. I know if I do two data steps, I can resolve this issue. But I am not sure why this happened, and how I can avoild this issue. Please help me out. Thanks
George
All variables that are sourced from input dataset are "retained".
That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.
Once you change the value of FLAG it keeps that value until it is changed.
SAS will not read the next observation from B until the BY group values change.
To see if better use IF PARAM='bbb' instead of 'aaa'.
Just use a NEW variable for your desired result.
data c;
merge a b;
by subjid visit;
if param ne 'aaa' then new_flag=flag;
run;
Result
Obs subjid visit param flag new_flag 1 101-001 vis01 aaa Y 2 101-001 vis01 bbb Y Y 3 101-001 vis01 ccc Y Y 4 101-001 vis02 aaa Y 5 101-001 vis02 bbb Y Y 6 101-001 vis02 ccc Y Y
For each match, the dataset b is read only once (as there is only one observation for each BY group). After that, the value you set is retained throughout the group. As an exercise, set flag to some non-missing value.
All variables that are sourced from input dataset are "retained".
That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.
Once you change the value of FLAG it keeps that value until it is changed.
SAS will not read the next observation from B until the BY group values change.
To see if better use IF PARAM='bbb' instead of 'aaa'.
Just use a NEW variable for your desired result.
data c;
merge a b;
by subjid visit;
if param ne 'aaa' then new_flag=flag;
run;
Result
Obs subjid visit param flag new_flag 1 101-001 vis01 aaa Y 2 101-001 vis01 bbb Y Y 3 101-001 vis01 ccc Y Y 4 101-001 vis02 aaa Y 5 101-001 vis02 bbb Y Y 6 101-001 vis02 ccc Y Y
Your code could work as is - only if the param "aaa" is the last value in each id. I.e. you could replace the
proc sort data=a; by subjid visit; run;
with
proc sort data=a; by subjid visit descending param; run;
Not an appealing solution. You'll be better off with something like @Tom 's suggestion.
Hello @Niugg2010
A simple way is to use Proc SQL. It works without any issue.
data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sql;
create table c as
select a.*,
case
when a.param="aaa" then ' '
else b.flag
end as Flag
from a left join b
on a.subjid=b.subjid
and a.visit=b.visit;
quit;
The output would be
I paid less attention to this point in merge in my previous coding. Thanks everyone for help.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.