Solved: compare two datasets

varunsudham · Posted 05-24-2016 03:33 AM

Compare data from different datasets:

Dataset 1 contains history of customer records – (Number of kids(kid_count) who ever in care of the customer and the date(FCH_date) they came into care.)

ID	FCH_Date	KID_COUNT
945666	20060517	1
945666	20060614	1
945666	20071005	1
945666	20080701	1
945666	20080726	2
945666	20081025	2
945666	20081209	2
945666	20090205	2
945666	20090313	2
945666	20100719	3
945666	20110517	3
945666	20110619	4
945666	20110918	4
945666	20120101	4
945666	20120223	4
945666	20120405	4
945666	20120713	2
945666	20120715	1
945666	20120810	3
945666	20120820	4
945666	20121120	3
945666	20140901	3
945666	20141201	2
945666	20150223	1
945666	20150701	2
945666	20160619	3

Dataset 2 contains 1 record before and one after the activation date(PDNS_DOE) transactions of customers

ID	PDNS_DOE	kid_count_after	FCH_Date_after	KID_Count_before	FCH_Date_before
945666	20141130	3	20141201	3	20140901
945666	20071019	2	20081025	1	20071005

Need to select customers where the kid count has increased since the FCH_Date_after, the tricky bit is “since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1. If there is an increase then output ID and PDNS_DOE.

Help !!!!!!!!

Ksharp · Posted 05-25-2016 11:49 PM

Here is Hash Table way .

This ought to be right .


proc summary data=table1 nway;
 class id;
 var FCH_DATE;
 output out=max_date(drop=_:) max=max_date;
run;
data want;
 if _n_=1 then do;
  if 0 then set max_date ;
  declare hash h1(dataset:'max_date');
  h1.definekey('id');
  h1.definedata('max_date');
  h1.definedone();
  
  if 0 then set table1;
  declare hash h2(dataset:'table1');
  h2.definekey('id','FCH_DATE');
  h2.definedata('KID_COUNT');
  h2.definedone();
 end;
 set table2;
 rc=h1.find();
 if rc=0 then do;
  do date=FCH_Date_before+1 to max_date;
   if h2.find(key:id,key:date)=0 then do;
     if KID_COUNT>KID_Count_before then do;output;leave;end;
     if KID_COUNT<KID_Count_before then leave;
   end;
  end;
 end;
keep id PDNS_DOE ;
run;

View solution in original post

Ksharp · Posted 05-24-2016 04:14 AM

What kind of output would you like to get? I don't understand your question completely. Can you explain it more ? What do you mean "since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1" ?

varunsudham · Posted 05-25-2016 08:01 PM

Sorry I should have explained it a bit better.

Desired output is just the ID and PDNS_DOE

Well we are looking for increase in number of kids after the FCH_DATE_Before

This could even be something like

ID	FCH_DATE	KID_COUNT
945666	20140901	3
945666	20140908	4
945666	20140915	3
945666	20150901	2
945666	20151011	1
945666	20160101	2

in this case I select the ID and the PDNS_DOE from dataset 2 as there is an increase in the kid_count when the last iteration happened.

If the data is something like this then we WONTselect the customer( ID) or discard the record and move to the next record.

ID	FCH_DATE	KID_COUNT
945666	20140901	4
945666	20140908	1
945666	20140915	1
945666	20150901	2
945666	20151011	6
945666	20160101	5

hope this explains a bit better..

Ksharp · Posted 05-25-2016 10:31 PM

OK. Here is an question. If the data looks like the following ,what you gonna do ? Assuming you want output this obs.

945666	20140901	3
945666	20140908	3
945666	20140915	4

data table1;
input ID	FCH_DATE : yymmdd10.	KID_COUNT;
format FCH_DATE  yymmdd10.;
cards;
945666
20060517
1
945666
20060614
1
945666
20071005
1
945666
20080701
1
945666
20080726
2
945666
20081025
2
945666
20081209
2
945666
20090205
2
945666
20090313
2
945666
20100719
3
945666
20110517
3
945666
20110619
4
945666
20110918
4
945666
20120101
4
945666
20120223
4
945666
20120405
4
945666
20120713
2
945666
20120715
1
945666
20120810
3
945666
20120820
4
945666
20121120
3
945666
20140901
3
945666
20141201
2
945666
20150223
1
945666
20150701
2
945666
20160619
3
;
run;
data table2;
input ID
PDNS_DOE
kid_count_after
FCH_Date_after
KID_Count_before
FCH_Date_before  : yymmdd10.;
format FCH_Date_before  yymmdd10.;
cards;
945666
20141130
3
20141201
3
20140901
945666
20071019
2
20081025
1
20071005
;
run;


proc sort data=table1 ;by  id FCH_DATE;run;
proc sort data=table2 ;by  id FCH_Date_before;run;
data temp;
 set table1 table2(in=inb keep=ID PDNS_DOE KID_Count_before FCH_Date_before
                   rename=(FCH_Date_before=FCH_DATE KID_Count_before=KID_COUNT));
 by id FCH_DATE	;
 intwo=inb;
 if first.id or intwo then group+1;
run;
data want;
n=0;
 do i=1 by 1 until(last.group);
  set temp;
  by group;
  if i=1 then _KID_COUNT=KID_COUNT;
  if KID_COUNT ne _KID_COUNT then n+1;
  if n=1 and KID_COUNT gt _KID_COUNT then y=1;
 end;
 
 do until(last.group);
  set temp;
  by group;
  if y=1 and intwo then output;
 end;
keep id PDNS_DOE ;
run;

LinusH · Posted 05-24-2016 06:22 AM

Please provide the desired output and motivate why just those records should be selected.

Data never sleeps

Ksharp · Posted 05-25-2016 11:49 PM

Here is Hash Table way .

This ought to be right .


proc summary data=table1 nway;
 class id;
 var FCH_DATE;
 output out=max_date(drop=_:) max=max_date;
run;
data want;
 if _n_=1 then do;
  if 0 then set max_date ;
  declare hash h1(dataset:'max_date');
  h1.definekey('id');
  h1.definedata('max_date');
  h1.definedone();
  
  if 0 then set table1;
  declare hash h2(dataset:'table1');
  h2.definekey('id','FCH_DATE');
  h2.definedata('KID_COUNT');
  h2.definedone();
 end;
 set table2;
 rc=h1.find();
 if rc=0 then do;
  do date=FCH_Date_before+1 to max_date;
   if h2.find(key:id,key:date)=0 then do;
     if KID_COUNT>KID_Count_before then do;output;leave;end;
     if KID_COUNT<KID_Count_before then leave;
   end;
  end;
 end;
keep id PDNS_DOE ;
run;

varunsudham · Posted 06-06-2016 07:51 PM

thanks Ksharp. Sorry took a while to reply back.

compare two datasets

Re: compare two datasets

Re: compare two datasets

Re: compare two datasets

Re: compare two datasets

Re: compare two datasets

Re: compare two datasets

Re: compare two datasets

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away