Hi,
I am pulling info from two datasets(sources) and i am suspecting some common records(repeats) by the encounters variable.
we need to count only once when there are repeats
Can we do the simple merge step like shown below?????
/*DOES THIS TESTER DATASET GIVE EXCLUSIVES TO EACH OF THE TWO DATASETS AND ALSO THE COMMONS
data tester;========getting 193
merge one two;
by encounters;
run;
When i split the data i am getting like this...........total number here match the count we get above
data test1 test2 test3;
merge one(in=a) two(in=b);
by encounters;
if a and not b then output test1;========62
if b and not a then output test2;========48
if a and b then output test3;============83
run;
Thanks
What is the question?
If you want to count unique values of ENCOUNTERS? Overall and by each dataset?
data test1;
if eof then put n= a= b= ;
merge one(in=in1) two(in=in2) end=eof;
by encounters;
if first.encounters then do;
n+1;a+in1; b+in2;
end;
run;
If the encounter repeats in both data sources, there will still be a duplicate entry in your merged data.
data one;
input encounter fakedata;
datalines;
1 1
1 1
1 2
;
run;
data two;
input encounter fakedata2;
datalines;
1 3
1 4
1 5
1 6
;
run;
data tester;
merge one two;
by encounter;
run;
Data step merge does NOT support many to many appropriately and still will manage one-to-many without failing output.
If you notice the TESTER result, this is what happens: starting with the first record for a by variable in each dataset, merges 1:1 until either data set runs out of records for the by group at which points the last value is retained and output with all subsequent records from the other source that have the same by value. Hence this does by no mean kill duplicates from either set.
So in short, no it does not solve your issue. At least not how you've worded it.
Hi,
I think I put it in wrong words.....
when I meant repeats ...........I mean the same encounter is present in source1 and also source 2 .
I mean if we set those two datasets we would have repeats by encounters
so I was asking if the simple merge I showed would bring up exclusive encounters to both as well as common to both???????
in total I was wondering if it would bring up the unique records from both the datasets???????
Thanks
If each individual file does not have duplicates then the example you showed will separate them into the three possible groups.
What is the question?
If you want to count unique values of ENCOUNTERS? Overall and by each dataset?
data test1;
if eof then put n= a= b= ;
merge one(in=in1) two(in=in2) end=eof;
by encounters;
if first.encounters then do;
n+1;a+in1; b+in2;
end;
run;
Hi,
If counting the number of duplicate row in a table can help you then please try it.
It will generate an output column that shows how many times each row occurs.
Proc SQL;
select * , count (*) as count
from DSN
group by var1, var2
having count (*) >1;
quit;
Regards
Uma Shanker Saini
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.