04-09-2013 04:28 PM
I want to merge two data sets in the usual way, by a certain criterion, matching only the observations (by id) that are in both data sets. What I would like to know is how I might go about collecting the data that does not make it through, i.e. the "id's" that are not in both sets. We would like to see both data sets for our study. I was directed toward the tables statement as a way of doing this:
tables var / out = newdata;
But, the more I look into it, it doesn't seem like what I want to use. If I were to use it, where should I place this in my merge step? What should I use instead?
Thank you very much.
04-09-2013 05:25 PM
data both a_only b_only;
merge a(in=a) b(in=b);
if a and b then output both;
else if a and not b then output a_only;
else if b and not a then output b_only;
04-18-2013 12:17 PM
I've been trying this in a few different ways and still cannot get a useful output.
this is what i've been trying:
merge a (in=ina) b (in=inb);
if ina and inn then output;
else if ina and not inb then output;
else if inb and not ina then output;
i get 'merged_data_set_i_want' to be actually bigger than either a or b. Maybe I worded my question wrong. What I'd like is the first merged data set 'merged_data_set_i_want' which I could get easily without the else if statements. What I'd also is a separate output file for the ids that did not merge. I've tried, output out = thing, but I'm not there just yet.
04-18-2013 01:55 PM
By using just an output and a single dataset on the datastatement and not the separate dataset names with each output statement then the data is sent to the only set on the DATA statement. That's why you are getting more records. That is why Hai.Kuo had output <datasetname> for each statement with multiple dataset names on the data statement, that's the part that splits the output to the desired in both or only in one of the contributing data sets.
You also have a typo with "If ina and inn then" where the inn should be inb;