BookmarkSubscribeRSS Feed
Corinthian94
Obsidian | Level 7

Hi all,

 

I have a question regarding removing certain observations from a dataset that has too many observations so that it matches up with another dataset that has the correct number of observations. Essentially, I'm trying to merge the two, but without all the extra observations, so that the variables from both datasets end up in the same one. They have a common variable which is what I am trying tot use to parse out (IDs of participants), of which the first dataset has some duplicates and extras that I am trying to parse out. Would anyone have advice on how to do this? I tried to do a proc compare and it shows me the difference in the individual observations but doesn't give me a list of which specific ones do not match up, which would be ideal. Thanks!

3 REPLIES 3
Reeza
Super User
PROC COMPARE gives you a full list of differences in the data sets, if you request them.

Look at the OUT options in PROC COMPARE.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n0c1y14wyd3u7yn1dmfcpaejllsn.htm
ballardw
Super User

Is the ID of participants duplicated in the set with extra observations or additional Ids?

 

If they are "additional" then you can filter the results.

To use Merge first sort both data sets by the Id variable. Then something like this should work:

 

data want;
    merge dataset1 (in=in1)
          extradataset (in=in2)
   ;
   by id;
   if in1 and in2;
run;

The IN= data set option creates a temporary variable with the name after the equal sign that is a numeric 1/0 for true/false that the current observation comes from that data set. With two such as above then when both the In1 and In2 variables are 1 then it means both records contribute and the IF is only true when both and the match you want may work. An IF statement like this example is called a "subsetting IF" and only keeps records in the output where the value is true. Details abound related to names and content of other variables.

 

If the "extra" is duplicated ID values then you need to provide some rule on how to know which of them is the desired of the multiple records, which you have not discussed.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 771 views
  • 2 likes
  • 3 in conversation