BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Niugg2010
Obsidian | Level 7

I met a strange issue when I merge two datasets. Below is a simple example 


data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sort data=b; by subjid visit; run;
proc sort data=a; by subjid visit; run;

data c;
merge a b;
by subjid visit;
if param='aaa' then call missing(flag);
/* if visit='vis01' then call missing(flag);*/
run;

I merged a and b. I hope only when param='aaa'  then flag= missing. but if I run above code, all flag values are missing. I know if I do two data steps, I can resolve this issue. But I am not sure why this happened, and how I can avoild this issue. Please help me out. Thanks

Capture.JPG

 

George

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

All variables that are sourced from input dataset are "retained".

That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.

 

Once you change the value of FLAG it keeps that value until it is changed.  

SAS will not read the next observation from B until the BY group values change.

 

To see if better use IF PARAM='bbb' instead of 'aaa'.

 

Just use a NEW variable for your desired result.

data c;
  merge a b;
  by subjid visit;
  if param ne 'aaa' then new_flag=flag;
run;

Result

Obs    subjid     visit    param    flag    new_flag

 1     101-001    vis01     aaa      Y
 2     101-001    vis01     bbb      Y         Y
 3     101-001    vis01     ccc      Y         Y
 4     101-001    vis02     aaa      Y
 5     101-001    vis02     bbb      Y         Y
 6     101-001    vis02     ccc      Y         Y

 

 

 

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

For each match, the dataset b is read only once (as there is only one observation for each BY group). After that, the value you set is retained throughout the group. As an exercise, set flag to some non-missing value.

Tom
Super User Tom
Super User

All variables that are sourced from input dataset are "retained".

That is what allows you attach the FLAG value from the one observation in B to ALL of the observations in A for that BY group.

 

Once you change the value of FLAG it keeps that value until it is changed.  

SAS will not read the next observation from B until the BY group values change.

 

To see if better use IF PARAM='bbb' instead of 'aaa'.

 

Just use a NEW variable for your desired result.

data c;
  merge a b;
  by subjid visit;
  if param ne 'aaa' then new_flag=flag;
run;

Result

Obs    subjid     visit    param    flag    new_flag

 1     101-001    vis01     aaa      Y
 2     101-001    vis01     bbb      Y         Y
 3     101-001    vis01     ccc      Y         Y
 4     101-001    vis02     aaa      Y
 5     101-001    vis02     bbb      Y         Y
 6     101-001    vis02     ccc      Y         Y

 

 

 

mkeintz
PROC Star

Your code could work as is - only if the param "aaa" is the last value in each id.  I.e. you could replace the 

 

proc sort data=a; by subjid visit; run;

with

proc sort data=a; by subjid visit descending param; run;

Not an appealing solution.  You'll be better off with something like @Tom 's suggestion.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Sajid01
Meteorite | Level 14

Hello @Niugg2010 
A simple way is to use Proc SQL. It works without any issue.

data a;
subjid="101-001";
do visit='vis01', 'vis02';
do param="aaa", 'bbb', 'ccc';
output;
end;
end;
run;
data b;
subjid="101-001";
do visit='vis01', 'vis02';
flag="Y";
output;
end;
run;
proc sql;
create table c as 
select a.*,
case 
when a.param="aaa" then ' '
else b.flag
end as Flag
from a left join b 
on a.subjid=b.subjid
and a.visit=b.visit;
quit;

The output would be

Sajid01_0-1651247097926.png

 

Niugg2010
Obsidian | Level 7

I paid less attention to this point in merge in my previous coding. Thanks everyone for help.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1414 views
  • 0 likes
  • 5 in conversation