BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
d0816
Quartz | Level 8

HI,

here is the sample dataset which I want to subset into dataset A and B. In dataset B, keep PINs if all obs for session is blank. If one of the Obs in "Session" for a  PIN is not blank then keep all the obs for that PIN in dataset B. Please suggest.

dataset  
IDLevelSession
1Z 
2X 
2Y 
2Z 
3Z 
3X 
4XA
4X 
4YB

 

 

dataset B  
IDLevelSession
1Z 
2X 
2Y 
2Z 
3Z 
3X 

 

 

dataset A  
IDLevelSession
4XA
4X 
4YB
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Merge a subset of non-blanks with the entire dataset:

 

dm 'clear log';
data have;
infile datalines missover;
input id	Level :$1.	Session :$1.;
datalines;
1	Z	 
2	X	 
2	Y	 
2	Z	 
3	Z	 
3	X	 
4	X	A
4	X	 
4	Y	B
run;
data a b;
   merge have (where=(session^=' ') in=anynonblanks)  have;
   by id;
   if anynonblanks then output b;
   else output a;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

Merge a subset of non-blanks with the entire dataset:

 

dm 'clear log';
data have;
infile datalines missover;
input id	Level :$1.	Session :$1.;
datalines;
1	Z	 
2	X	 
2	Y	 
2	Z	 
3	Z	 
3	X	 
4	X	A
4	X	 
4	Y	B
run;
data a b;
   merge have (where=(session^=' ') in=anynonblanks)  have;
   by id;
   if anynonblanks then output b;
   else output a;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
d0816
Quartz | Level 8
This worked in my original dataset. Thank you so much.
novinosrin
Tourmaline | Level 20

Nice one @mkeintz how did  4 X "' "  get to be true nonblanks. Where am i missing the point?

mkeintz
PROC Star

@novinosrin

 

There are two attributes of MERGE being utilized here:

 

  1. Values in the "right" dataset supersede those in the "left", for all common variables.   In this case, for id 4 the "left" has the two non-blank observations, in this order:
      1st rec:  4 X A
      2nd rec:  4 Y B
    but the "right" dataset has all the ID=4 observations in this order:
      1st rec:    4 X A
      2nd rec:    4 X blank
      3rd rec:    4 X B

    Within a BY group 1st rec matches with first rec, 2nd with 2nd, etc.  So the merge results are
       1st rec:   4 X A   is "superseded" by 4 X A
       2nd rec:  4 Y B  is "superseded" by 4 X blank
       3rd rec:   4 Y B (see below) is "supersede" by 4 X B

  2. When either the left or right merged dataset is shorter within a BY group, then it's last observation is repeatedly matched to all the "excess" observations of the dataset with the longer BY group.  That's why in my example of the 3rd rec above you see "4 Y B" propagated.   And the "in=" parameter is also propagated.  So the "excess" observation goes to the same destination as all the other ID=4 obs.

In fact, if you merge more than two datasets, then the principle is extended: for any common variable in multiple datasets, the rightmost value prevails, assuming the variable is of the same type (numeric or character) for all datasets.  If a variable is of both types, the merge fails.  In the case of other attributes (length, label, format), they are inherited from the leftmost dataset, since that's the first encounter the sas compiler has with the variable.


values  supersede

What does MERGE X Y; BY ID; do when an ID group has fewer observations in (say)  X than in Y.   It propagates the last observation in X to be associated with the "excess" observations in Y. 

 

When an ID group is being MERGEd, and  there are no an equal number of observations in the "left" and "right" datasets,

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

@mkeintz Thank you so much for explaining at length. I really appreciate the privilege of receiving your time and knowledge.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1774 views
  • 1 like
  • 3 in conversation