Solved: MERGE

robertrao · Posted 12-16-2013 03:58 PM

Hi,

I am pulling info from two datasets(sources) and i am suspecting some common records(repeats) by the encounters variable.

we need to count only once when there are repeats

Can we do the simple merge step like shown below?????

/*DOES THIS TESTER DATASET GIVE EXCLUSIVES TO EACH OF THE TWO DATASETS AND ALSO THE COMMONS
data tester;========getting 193
merge one two;
by encounters;
run;

When i split the data i am getting like this...........total number here match the count we get above

data test1 test2 test3;

merge one(in=a) two(in=b);

by encounters;

if a and not b then output test1;========62

if b and not a then output test2;========48

if a and b then output test3;============83

run;

Thanks

Tom · Posted 12-16-2013 05:02 PM

What is the question?

If you want to count unique values of ENCOUNTERS? Overall and by each dataset?

data test1;

if eof then put n= a= b= ;

merge one(in=in1) two(in=in2) end=eof;

by encounters;

if first.encounters then do;

n+1;a+in1; b+in2;

end;

run;

View solution in original post

Vince28_Statcan · Posted 12-16-2013 04:47 PM

If the encounter repeats in both data sources, there will still be a duplicate entry in your merged data.

data one;

input encounter fakedata;

datalines;

1 1

1 2

;

run;

data two;

input encounter fakedata2;

datalines;

1 3

1 4

1 5

1 6

;

run;

data tester;

merge one two;

by encounter;

run;

Data step merge does NOT support many to many appropriately and still will manage one-to-many without failing output.

If you notice the TESTER result, this is what happens: starting with the first record for a by variable in each dataset, merges 1:1 until either data set runs out of records for the by group at which points the last value is retained and output with all subsequent records from the other source that have the same by value. Hence this does by no mean kill duplicates from either set.

So in short, no it does not solve your issue. At least not how you've worded it.

robertrao · Posted 12-16-2013 07:19 PM

Hi,

I think I put it in wrong words.....

when I meant repeats ...........I mean the same encounter is present in source1 and also source 2 .

I mean if we set those two datasets we would have repeats by encounters

so I was asking if the simple merge I showed would bring up exclusive encounters to both as well as common to both???????

in total I was wondering if it would bring up the unique records from both the datasets???????

Thanks

Tom · Posted 12-16-2013 09:14 PM

If each individual file does not have duplicates then the example you showed will separate them into the three possible groups.

Tom · Posted 12-16-2013 05:02 PM

What is the question?

If you want to count unique values of ENCOUNTERS? Overall and by each dataset?

data test1;

if eof then put n= a= b= ;

merge one(in=in1) two(in=in2) end=eof;

by encounters;

if first.encounters then do;

n+1;a+in1; b+in2;

end;

run;

umashankersaini · Posted 12-17-2013 04:19 AM

Hi,

If counting the number of duplicate row in a table can help you then please try it.

It will generate an output column that shows how many times each row occurs.

Proc SQL;

select * , count (*) as count

from DSN

group by var1, var2

having count (*) >1;

quit;

Regards

Uma Shanker Saini

MERGE

Re: MERGE

Re: MERGE

Re: MERGE

Re: MERGE

Re: MERGE

Re: MERGE

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!