- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I met a strange issue when I merge two datasets. Below is a simple example
data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sort data=b; by subjid visit; run;
proc sort data=a; by subjid visit; run;
data c;
merge a b;
by subjid visit;
if param='aaa' then call missing(flag);
/* if visit='vis01' then call missing(flag);*/
run;
I merged a and b. I hope only when param='aaa' then flag= missing. but if I run above code, all flag values are missing. I know if I do two data steps, I can resolve this issue. But I am not sure why this happened, and how I can avoild this issue. Please help me out. Thanks
George
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
All variables that are sourced from input dataset are "retained".
That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.
Once you change the value of FLAG it keeps that value until it is changed.
SAS will not read the next observation from B until the BY group values change.
To see if better use IF PARAM='bbb' instead of 'aaa'.
Just use a NEW variable for your desired result.
data c;
merge a b;
by subjid visit;
if param ne 'aaa' then new_flag=flag;
run;
Result
Obs subjid visit param flag new_flag 1 101-001 vis01 aaa Y 2 101-001 vis01 bbb Y Y 3 101-001 vis01 ccc Y Y 4 101-001 vis02 aaa Y 5 101-001 vis02 bbb Y Y 6 101-001 vis02 ccc Y Y
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For each match, the dataset b is read only once (as there is only one observation for each BY group). After that, the value you set is retained throughout the group. As an exercise, set flag to some non-missing value.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
All variables that are sourced from input dataset are "retained".
That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.
Once you change the value of FLAG it keeps that value until it is changed.
SAS will not read the next observation from B until the BY group values change.
To see if better use IF PARAM='bbb' instead of 'aaa'.
Just use a NEW variable for your desired result.
data c;
merge a b;
by subjid visit;
if param ne 'aaa' then new_flag=flag;
run;
Result
Obs subjid visit param flag new_flag 1 101-001 vis01 aaa Y 2 101-001 vis01 bbb Y Y 3 101-001 vis01 ccc Y Y 4 101-001 vis02 aaa Y 5 101-001 vis02 bbb Y Y 6 101-001 vis02 ccc Y Y
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your code could work as is - only if the param "aaa" is the last value in each id. I.e. you could replace the
proc sort data=a; by subjid visit; run;
with
proc sort data=a; by subjid visit descending param; run;
Not an appealing solution. You'll be better off with something like @Tom 's suggestion.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Niugg2010
A simple way is to use Proc SQL. It works without any issue.
data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sql;
create table c as
select a.*,
case
when a.param="aaa" then ' '
else b.flag
end as Flag
from a left join b
on a.subjid=b.subjid
and a.visit=b.visit;
quit;
The output would be
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I paid less attention to this point in merge in my previous coding. Thanks everyone for help.