Solved: Multiple Match Merging?

djbateman · Posted 01-16-2014 03:38 PM

Suppose I have two datasets with dates as follows:

Dates1:

Subject	Date1
001	29AUG2013
002	21MAY2013
002	04JUL2013
002	20SEP2013
003	17JUN2013
004	16DEC2013

Dates2:

Subject	Date2
001	08May2013
001	25JUN2013
001	13AUG2013
002	17JAN2013
002	09APR2013
002	21MAY2013
003	24JAN2013
003	18MAR2013
003	17JUN2013
004	17JUN2013
004	27JUN2013
004	19AUG2013
004	30SEP2013
004	11NOV2013

I need to create a new dataset that compares DATES1 and DATES2. If any of the dates in DATES1 are found in DATES2 for a given subject, then I want to flag that subject. In other words, for the above example, my output would look like this:

Subject	Match
001	0
002	1
003	1
004	0

I have been playing with this for a couple of days now, and I can't seem to find a solution. What would be the best way to merge these two datasets so I can compare Date1 and Date2? My main obstacle is the fact that there can be multiple entries for a single subject in both datasets, and I need to compare all dates in both datasets for a match. Any ideas?

PGStats · Posted 01-16-2014 04:13 PM

you could use this:

data date1;
input sub date :anydtdte.;
datalines;
001 29AUG2013
002 21MAY2013
002 04JUL2013
002 20SEP2013
003 17JUN2013
004 16DEC2013
;

data date2;
input sub date :anydtdte.;
datalines;
001 08May2013
001 25JUN2013
001 13AUG2013
002 17JAN2013
002 09APR2013
002 21MAY2013
003 24JAN2013
003 18MAR2013
003 17JUN2013
004 17JUN2013
004 27JUN2013
004 19AUG2013
004 30SEP2013
004 11NOV2013
;

proc sql;
create table subCount as
select sub, sum(date in (select date from date2 where sub=d1.sub)) as match
from date1 as d1
group by sub;
select * from subCount;
quit;

PG

View solution in original post

PGStats · Posted 01-16-2014 04:13 PM

you could use this:

data date1;
input sub date :anydtdte.;
datalines;
001 29AUG2013
002 21MAY2013
002 04JUL2013
002 20SEP2013
003 17JUN2013
004 16DEC2013
;

data date2;
input sub date :anydtdte.;
datalines;
001 08May2013
001 25JUN2013
001 13AUG2013
002 17JAN2013
002 09APR2013
002 21MAY2013
003 24JAN2013
003 18MAR2013
003 17JUN2013
004 17JUN2013
004 27JUN2013
004 19AUG2013
004 30SEP2013
004 11NOV2013
;

proc sql;
create table subCount as
select sub, sum(date in (select date from date2 where sub=d1.sub)) as match
from date1 as d1
group by sub;
select * from subCount;
quit;

PG

djbateman · Posted 01-16-2014 04:20 PM

I never would have thought of that on my own. Thank you so much!

PGStats · Posted 01-16-2014 04:57 PM

You are welcome. You can reverse the role of date1 and date2 to get the match counts of table2 subjects. - PG

PG

Astounding · Posted 01-16-2014 04:56 PM

Call me old-fashioned ?? I probably would use a DATA step instead of SQL:

data want;

merge date1 (in=in1) date2 (in=in2 rename=(date2=date1));

by subject date1;

if first.subject then match=0;

if in1 and in2 then match=1;

retain match;

keep subject match;

if last.subject;

run;

PGStats · Posted 01-16-2014 05:15 PM

FAIW - Replace match=1 with match+1 to get the count of matching dates for each subject and render the retain statement redundant. - PG

PG

Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Re: Multiple Match Merging?

Registration is open

SAS Training: Just a Click Away