BookmarkSubscribeRSS Feed
rrevans
Calcite | Level 5

Hi,

I want to merge two data sets in the usual way, by a certain criterion, matching only the observations (by id) that are in both data sets. What I would like to know is how I might go about collecting the data that does not make it through, i.e. the "id's" that are not in both sets. We would like to see both data sets for our study. I was directed toward the tables statement as a way of doing this:

tables var / out = newdata;

But, the more I look into it, it doesn't seem like what I want to use. If I were to use it, where should I place this in my merge step? What should I use instead?

Thank you very much.

3 REPLIES 3
Haikuo
Onyx | Level 15

data both a_only b_only;

  merge a(in=a) b(in=b);

by id;

  if a and b then output both;

else if a and not b then output a_only;

else if b and not a then output b_only;

run;

Haikuo

rrevans
Calcite | Level 5

I've been trying this in a few different ways and still cannot get a useful output.

this is what i've been trying:

data merged_data_set_i_want;

     merge a (in=ina) b (in=inb);

     by id;

     if ina and inn then output;

     else if ina and not inb then output;

     else if inb and not ina then output;

run;

i get 'merged_data_set_i_want' to be actually bigger than either a or b. Maybe I worded my question wrong. What I'd like is the first merged data set 'merged_data_set_i_want' which I could get easily without the else if statements. What I'd also is a separate output file for the ids that did not merge. I've tried, output out = thing, but I'm not there just yet.

ballardw
Super User

By using just an output and a single dataset on the datastatement and not the separate dataset names with each output statement then the data is sent to the only set on the DATA statement. That's why you are getting more records. That is why Hai.Kuo had output <datasetname> for each statement with multiple dataset names on the data statement, that's the part that splits the output to the desired in both or only in one of the contributing data sets.

You also have a typo with "If ina and inn then" where the inn should be inb;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 432 views
  • 0 likes
  • 3 in conversation