Solved
SAS Employee
Posts: 4

compare two datasets

Compare data from different datasets:

1. Dataset 1 contains history of customer records – (Number of kids(kid_count) who ever in care of the customer and the date(FCH_date) they came into care.)
 ID FCH_Date KID_COUNT 945666 20060517 1 945666 20060614 1 945666 20071005 1 945666 20080701 1 945666 20080726 2 945666 20081025 2 945666 20081209 2 945666 20090205 2 945666 20090313 2 945666 20100719 3 945666 20110517 3 945666 20110619 4 945666 20110918 4 945666 20120101 4 945666 20120223 4 945666 20120405 4 945666 20120713 2 945666 20120715 1 945666 20120810 3 945666 20120820 4 945666 20121120 3 945666 20140901 3 945666 20141201 2 945666 20150223 1 945666 20150701 2 945666 20160619 3

1. Dataset 2 contains 1 record before and one after the activation date(PDNS_DOE) transactions of customers
 ID PDNS_DOE kid_count_after FCH_Date_after KID_Count_before FCH_Date_before 945666 20141130 3 20141201 3 20140901 945666 20071019 2 20081025 1 20071005

Need to select customers where the kid count has increased since the FCH_Date_after, the tricky bit is “since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1. If there is an increase then output ID and PDNS_DOE.

Help !!!!!!!!

Accepted Solutions
Solution
‎06-06-2016 07:50 PM
Super User
Posts: 10,850

Re: compare two datasets

[ Edited ]

Here is Hash Table way .

This ought to be right .

``````
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();

if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;``````

All Replies
Super User
Posts: 10,850

Re: compare two datasets

What kind of output would you like to get? I don't understand your question completely. Can you explain it more ? What do you mean "since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1" ?
SAS Employee
Posts: 4

Re: compare two datasets

Sorry I should have explained it a bit better.

Desired output is just the ID and PDNS_DOE

Well we are looking for increase in number of kids after the FCH_DATE_Before

This could even be something like

 ID FCH_DATE KID_COUNT 945666 20140901 3 945666 20140908 4 945666 20140915 3 945666 20150901 2 945666 20151011 1 945666 20160101 2

in this case I select the ID and the PDNS_DOE from dataset 2 as there is an increase in the kid_count when the last iteration happened.

If the data is something like this then we WONTselect the customer( ID) or discard the record and move to the next record.

 ID FCH_DATE KID_COUNT 945666 20140901 4 945666 20140908 1 945666 20140915 1 945666 20150901 2 945666 20151011 6 945666 20160101 5

hope this explains a bit better..

Super User
Posts: 10,850

Re: compare two datasets

OK. Here is an question. If the data looks like the following ,what you gonna do ? Assuming you want output this obs.

 945666 20140901 3 945666 20140908 3 945666 20140915 4

``````data table1;
input ID	FCH_DATE : yymmdd10.	KID_COUNT;
format FCH_DATE  yymmdd10.;
cards;
945666
20060517
1
945666
20060614
1
945666
20071005
1
945666
20080701
1
945666
20080726
2
945666
20081025
2
945666
20081209
2
945666
20090205
2
945666
20090313
2
945666
20100719
3
945666
20110517
3
945666
20110619
4
945666
20110918
4
945666
20120101
4
945666
20120223
4
945666
20120405
4
945666
20120713
2
945666
20120715
1
945666
20120810
3
945666
20120820
4
945666
20121120
3
945666
20140901
3
945666
20141201
2
945666
20150223
1
945666
20150701
2
945666
20160619
3
;
run;
data table2;
input ID
PDNS_DOE
kid_count_after
FCH_Date_after
KID_Count_before
FCH_Date_before  : yymmdd10.;
format FCH_Date_before  yymmdd10.;
cards;
945666
20141130
3
20141201
3
20140901
945666
20071019
2
20081025
1
20071005
;
run;

proc sort data=table1 ;by  id FCH_DATE;run;
proc sort data=table2 ;by  id FCH_Date_before;run;
data temp;
set table1 table2(in=inb keep=ID PDNS_DOE KID_Count_before FCH_Date_before
rename=(FCH_Date_before=FCH_DATE KID_Count_before=KID_COUNT));
by id FCH_DATE	;
intwo=inb;
if first.id or intwo then group+1;
run;
data want;
n=0;
do i=1 by 1 until(last.group);
set temp;
by group;
if i=1 then _KID_COUNT=KID_COUNT;
if KID_COUNT ne _KID_COUNT then n+1;
if n=1 and KID_COUNT gt _KID_COUNT then y=1;
end;

do until(last.group);
set temp;
by group;
if y=1 and intwo then output;
end;
keep id PDNS_DOE ;
run;
``````
Super User
Posts: 5,919

Re: compare two datasets

Please provide the desired output and motivate why just those records should be selected.
Data never sleeps
Solution
‎06-06-2016 07:50 PM
Super User
Posts: 10,850

Re: compare two datasets

[ Edited ]

Here is Hash Table way .

This ought to be right .

``````
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();

if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;``````
SAS Employee
Posts: 4

Re: compare two datasets

thanks Ksharp. Sorry took a while to reply back.

🔒 This topic is solved and locked.