Obtaining the data that does not meet the criteria: tables statement within a merge?

Reply
Contributor
Posts: 20

Obtaining the data that does not meet the criteria: tables statement within a merge?

Hi,

I want to merge two data sets in the usual way, by a certain criterion, matching only the observations (by id) that are in both data sets. What I would like to know is how I might go about collecting the data that does not make it through, i.e. the "id's" that are not in both sets. We would like to see both data sets for our study. I was directed toward the tables statement as a way of doing this:

tables var / out = newdata;

But, the more I look into it, it doesn't seem like what I want to use. If I were to use it, where should I place this in my merge step? What should I use instead?

Thank you very much.

Respected Advisor
Posts: 3,124

Re: Obtaining the data that does not meet the criteria: tables statement within a merge?

data both a_only b_only;

  merge a(in=a) b(in=b);

by id;

  if a and b then output both;

else if a and not b then output a_only;

else if b and not a then output b_only;

run;

Haikuo

Contributor
Posts: 20

Re: Obtaining the data that does not meet the criteria: tables statement within a merge?

I've been trying this in a few different ways and still cannot get a useful output.

this is what i've been trying:

data merged_data_set_i_want;

     merge a (in=ina) b (in=inb);

     by id;

     if ina and inn then output;

     else if ina and not inb then output;

     else if inb and not ina then output;

run;

i get 'merged_data_set_i_want' to be actually bigger than either a or b. Maybe I worded my question wrong. What I'd like is the first merged data set 'merged_data_set_i_want' which I could get easily without the else if statements. What I'd also is a separate output file for the ids that did not merge. I've tried, output out = thing, but I'm not there just yet.

Super User
Posts: 10,539

Re: Obtaining the data that does not meet the criteria: tables statement within a merge?

By using just an output and a single dataset on the datastatement and not the separate dataset names with each output statement then the data is sent to the only set on the DATA statement. That's why you are getting more records. That is why Hai.Kuo had output <datasetname> for each statement with multiple dataset names on the data statement, that's the part that splits the output to the desired in both or only in one of the contributing data sets.

You also have a typo with "If ina and inn then" where the inn should be inb;

Ask a Question
Discussion stats
  • 3 replies
  • 171 views
  • 0 likes
  • 3 in conversation