DATA Step, Macro, Functions and more

distinct id's

Reply
Super Contributor
Posts: 673

distinct id's

Dataset A has a million Id's and Dataset B has million id's.
how to find the count of distinct id's that are common in two datasets?
Super Contributor
Super Contributor
Posts: 3,174

Re: distinct id's

PROC SQL - two SELECT DINSTINCT(keyvar1 keyvar2), one for each file and then a JOIN, possibly using a sub-query in the process.

For a DATA step approach, suggesting setting a VIEW for each file, then do two PROC SORT NODUPKEY with your BY variable list, then a MERGE with a BY statement, and using the IN= dataset option, you can then test your IN= variables for both files contributing to the MERGE.

Scott Barry
SBBWorks, Inc.
Respected Advisor
Posts: 4,173

Re: distinct id's

Hi

A SQL approach:

data haveA;
do id=1,2,2,3,4,5,5,5,6;
output;
end;
run;

data haveB;
do id=1,1,1,3,3,4,6;
output;
end;
run;

proc sql feedback;
select COUNT(*) as N_UniqueIds
from
( select distinct id from work.haveA) as A,
( select distinct id from work.haveB) as B
where A.id = B.id;
;
quit;


HTH
Patrick
Super Contributor
Posts: 673

Re: distinct id's

Thanks Patrick.
Ask a Question
Discussion stats
  • 3 replies
  • 170 views
  • 0 likes
  • 3 in conversation