distinct id's

SASPhile · Posted 11-05-2010 10:27 AM

Dataset A has a million IDs and dataset B has a million IDs.
How do I find the count of distinct IDs that are common to the two datasets?

sbb · Posted 11-05-2010 10:55 AM

PROC SQL: two SELECT DISTINCT queries on your key variable(s), one for each file, and then a JOIN, possibly using a sub-query in the process.

For a DATA step approach, I suggest defining a VIEW for each file, then running two PROC SORT NODUPKEY steps with your BY variable list, then a MERGE with a BY statement. With the IN= data set option on each input, you can then test the IN= variables to see whether both files contributed to a given BY group in the MERGE.

Scott Barry
SBBWorks, Inc.
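
The DATA step approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the dataset names work.haveA and work.haveB and the key variable id are borrowed from the sample data posted later in this thread, the output names idsA, idsB and n_common are made up for the example, and the VIEW step is skipped in favor of a KEEP= option on the input.

```sas
/* De-duplicate each dataset on the key; OUT= leaves the inputs untouched */
proc sort data=work.haveA(keep=id) out=work.idsA nodupkey;
  by id;
run;

proc sort data=work.haveB(keep=id) out=work.idsB nodupkey;
  by id;
run;

/* Match-merge by id; inA and inB flag which dataset contributed each id */
data _null_;
  merge work.idsA(in=inA) work.idsB(in=inB) end=last;
  by id;
  if inA and inB then n_common + 1;   /* sum statement: starts at 0 and is retained */
  if last then put 'Common distinct ids: ' n_common;
run;
```

The count is written to the log. To keep it in a dataset instead, replace `_null_` with a dataset name and use `if last then output;`.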

Patrick · Posted 11-05-2010 04:06 PM

Hi

A SQL approach:

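/* Sample data: ids with duplicates in both datasets */
data haveA;
  do id=1,2,2,3,4,5,5,5,6;
    output;
  end;
run;

data haveB;
  do id=1,1,1,3,3,4,6;
    output;
  end;
run;

/* De-duplicate each side in an inline view, then join on id and count. */
/* With the sample data the common ids are 1, 3, 4 and 6. */
proc sql feedback;
  select count(*) as N_UniqueIds
  from
    (select distinct id from work.haveA) as A,
    (select distinct id from work.haveB) as B
  where A.id = B.id;
quit;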

HTH
Patrick
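
As a possible variation on the query above, the same count can be obtained with the INTERSECT set operator, which in PROC SQL returns only unique rows unless ALL is specified. This is a sketch under the same assumptions (datasets work.haveA and work.haveB with key variable id); the alias common is just a label chosen for the example.

```sas
proc sql;
  /* INTERSECT keeps the ids present in both datasets, without duplicates */
  select count(*) as N_UniqueIds
  from (
    select id from work.haveA
    intersect
    select id from work.haveB
  ) as common;
quit;
```

Whether this runs faster than the join on a million rows per side depends on the data and the environment, so it is worth timing both.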

SASPhile · Posted 11-05-2010 04:52 PM

Thanks Patrick.
