compare two datasets

Accepted Solution Solved
Reply
SAS Employee
Posts: 4
Accepted Solution

compare two datasets

Compare data from different datasets:

 

  1. Dataset 1 contains history of customer records – (Number of kids(kid_count) who ever in care of the customer and the date(FCH_date) they came into care.)

ID

FCH_Date

KID_COUNT

945666

20060517

1

945666

20060614

1

945666

20071005

1

945666

20080701

1

945666

20080726

2

945666

20081025

2

945666

20081209

2

945666

20090205

2

945666

20090313

2

945666

20100719

3

945666

20110517

3

945666

20110619

4

945666

20110918

4

945666

20120101

4

945666

20120223

4

945666

20120405

4

945666

20120713

2

945666

20120715

1

945666

20120810

3

945666

20120820

4

945666

20121120

3

945666

20140901

3

945666

20141201

2

945666

20150223

1

945666

20150701

2

945666

20160619

3

 

 

  1. Dataset 2 contains 1 record before and one after the activation date(PDNS_DOE) transactions of customers

ID

PDNS_DOE

kid_count_after

FCH_Date_after

KID_Count_before

FCH_Date_before

945666

20141130

3

20141201

3

20140901

945666

20071019

2

20081025

1

20071005

 

 

 

 

 

Need to select customers where the kid count has increased since the FCH_Date_after, the tricky bit is “since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1. If there is an increase then output ID and PDNS_DOE.

 

 Help !!!!!!!!


Accepted Solutions
Solution
‎06-06-2016 07:50 PM
Super User
Posts: 9,687

Re: compare two datasets

[ Edited ]

Here is Hash Table way .

This ought to be right .

 

 


proc summary data=table1 nway;
 class id;
 var FCH_DATE;
 output out=max_date(drop=_:) max=max_date;
run;
data want;
 if _n_=1 then do;
  if 0 then set max_date ;
  declare hash h1(dataset:'max_date');
  h1.definekey('id');
  h1.definedata('max_date');
  h1.definedone();
  
  if 0 then set table1;
  declare hash h2(dataset:'table1');
  h2.definekey('id','FCH_DATE');
  h2.definedata('KID_COUNT');
  h2.definedone();
 end;
 set table2;
 rc=h1.find();
 if rc=0 then do;
  do date=FCH_Date_before+1 to max_date;
   if h2.find(key:id,key:date)=0 then do;
     if KID_COUNT>KID_Count_before then do;output;leave;end;
     if KID_COUNT<KID_Count_before then leave;
   end;
  end;
 end;
keep id PDNS_DOE ;
run;

View solution in original post


All Replies
Super User
Posts: 9,687

Re: compare two datasets

What kind of output would you like to get? I don't understand your question completely. Can you explain it more ? What do you mean "since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1" ?
SAS Employee
Posts: 4

Re: compare two datasets

Sorry I should have explained it a bit better.

Desired output is just the ID and PDNS_DOE

 

Well we are looking for increase in number of kids after the FCH_DATE_Before

This could even be something like

ID FCH_DATE KID_COUNT
945666 20140901 3
945666 20140908 4
945666 20140915 3
945666 20150901 2
945666 20151011 1
945666 20160101 2

 

in this case I select the ID and the PDNS_DOE from dataset 2 as there is an increase in the kid_count when the last iteration happened.

 

If the data is something like this then we WONTselect the customer( ID) or discard the record and move to the next record.

ID FCH_DATE KID_COUNT
945666 20140901 4
945666 20140908 1
945666 20140915 1
945666 20150901 2
945666 20151011 6
945666 20160101 5

 

hope this explains a bit better..

 

 

 

 

 

Super User
Posts: 9,687

Re: compare two datasets

OK. Here is an question. If the data looks like the following ,what you gonna do ? Assuming you want output this obs.

 

945666 20140901 3
945666 20140908 3
945666 20140915 4

 

 

data table1;
input ID	FCH_DATE : yymmdd10.	KID_COUNT;
format FCH_DATE  yymmdd10.;
cards;
945666
20060517
1
945666
20060614
1
945666
20071005
1
945666
20080701
1
945666
20080726
2
945666
20081025
2
945666
20081209
2
945666
20090205
2
945666
20090313
2
945666
20100719
3
945666
20110517
3
945666
20110619
4
945666
20110918
4
945666
20120101
4
945666
20120223
4
945666
20120405
4
945666
20120713
2
945666
20120715
1
945666
20120810
3
945666
20120820
4
945666
20121120
3
945666
20140901
3
945666
20141201
2
945666
20150223
1
945666
20150701
2
945666
20160619
3
;
run;
data table2;
input ID
PDNS_DOE
kid_count_after
FCH_Date_after
KID_Count_before
FCH_Date_before  : yymmdd10.;
format FCH_Date_before  yymmdd10.;
cards;
945666
20141130
3
20141201
3
20140901
945666
20071019
2
20081025
1
20071005
;
run;


proc sort data=table1 ;by  id FCH_DATE;run;
proc sort data=table2 ;by  id FCH_Date_before;run;
data temp;
 set table1 table2(in=inb keep=ID PDNS_DOE KID_Count_before FCH_Date_before
                   rename=(FCH_Date_before=FCH_DATE KID_Count_before=KID_COUNT));
 by id FCH_DATE	;
 intwo=inb;
 if first.id or intwo then group+1;
run;
data want;
n=0;
 do i=1 by 1 until(last.group);
  set temp;
  by group;
  if i=1 then _KID_COUNT=KID_COUNT;
  if KID_COUNT ne _KID_COUNT then n+1;
  if n=1 and KID_COUNT gt _KID_COUNT then y=1;
 end;
 
 do until(last.group);
  set temp;
  by group;
  if y=1 and intwo then output;
 end;
keep id PDNS_DOE ;
run;
 
Super User
Posts: 5,260

Re: compare two datasets

Please provide the desired output and motivate why just those records should be selected.
Data never sleeps
Solution
‎06-06-2016 07:50 PM
Super User
Posts: 9,687

Re: compare two datasets

[ Edited ]

Here is Hash Table way .

This ought to be right .

 

 


proc summary data=table1 nway;
 class id;
 var FCH_DATE;
 output out=max_date(drop=_:) max=max_date;
run;
data want;
 if _n_=1 then do;
  if 0 then set max_date ;
  declare hash h1(dataset:'max_date');
  h1.definekey('id');
  h1.definedata('max_date');
  h1.definedone();
  
  if 0 then set table1;
  declare hash h2(dataset:'table1');
  h2.definekey('id','FCH_DATE');
  h2.definedata('KID_COUNT');
  h2.definedone();
 end;
 set table2;
 rc=h1.find();
 if rc=0 then do;
  do date=FCH_Date_before+1 to max_date;
   if h2.find(key:id,key:date)=0 then do;
     if KID_COUNT>KID_Count_before then do;output;leave;end;
     if KID_COUNT<KID_Count_before then leave;
   end;
  end;
 end;
keep id PDNS_DOE ;
run;
SAS Employee
Posts: 4

Re: compare two datasets

thanks Ksharp. Sorry took a while to reply back.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 434 views
  • 0 likes
  • 3 in conversation