Compare data from different datasets:
ID |
FCH_Date |
KID_COUNT |
945666 |
20060517 |
1 |
945666 |
20060614 |
1 |
945666 |
20071005 |
1 |
945666 |
20080701 |
1 |
945666 |
20080726 |
2 |
945666 |
20081025 |
2 |
945666 |
20081209 |
2 |
945666 |
20090205 |
2 |
945666 |
20090313 |
2 |
945666 |
20100719 |
3 |
945666 |
20110517 |
3 |
945666 |
20110619 |
4 |
945666 |
20110918 |
4 |
945666 |
20120101 |
4 |
945666 |
20120223 |
4 |
945666 |
20120405 |
4 |
945666 |
20120713 |
2 |
945666 |
20120715 |
1 |
945666 |
20120810 |
3 |
945666 |
20120820 |
4 |
945666 |
20121120 |
3 |
945666 |
20140901 |
3 |
945666 |
20141201 |
2 |
945666 |
20150223 |
1 |
945666 |
20150701 |
2 |
945666 |
20160619 |
3 |
ID |
PDNS_DOE |
kid_count_after |
FCH_Date_after |
KID_Count_before |
FCH_Date_before |
945666 |
20141130 |
3 |
20141201 |
3 |
20140901 |
945666 |
20071019 |
2 |
20081025 |
1 |
20071005 |
Need to select customers where the kid count has increased since the FCH_Date_after, the tricky bit is “since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1. If there is an increase then output ID and PDNS_DOE.
Help !!!!!!!!
Here is Hash Table way .
This ought to be right .
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();
if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;
Sorry I should have explained it a bit better.
Desired output is just the ID and PDNS_DOE
Well we are looking for increase in number of kids after the FCH_DATE_Before
This could even be something like
ID | FCH_DATE | KID_COUNT |
945666 | 20140901 | 3 |
945666 | 20140908 | 4 |
945666 | 20140915 | 3 |
945666 | 20150901 | 2 |
945666 | 20151011 | 1 |
945666 | 20160101 | 2 |
in this case I select the ID and the PDNS_DOE from dataset 2 as there is an increase in the kid_count when the last iteration happened.
If the data is something like this then we WONTselect the customer( ID) or discard the record and move to the next record.
ID | FCH_DATE | KID_COUNT |
945666 | 20140901 | 4 |
945666 | 20140908 | 1 |
945666 | 20140915 | 1 |
945666 | 20150901 | 2 |
945666 | 20151011 | 6 |
945666 | 20160101 | 5 |
hope this explains a bit better..
OK. Here is an question. If the data looks like the following ,what you gonna do ? Assuming you want output this obs.
945666 | 20140901 | 3 |
945666 | 20140908 | 3 |
945666 | 20140915 | 4 |
data table1;
input ID FCH_DATE : yymmdd10. KID_COUNT;
format FCH_DATE yymmdd10.;
cards;
945666
20060517
1
945666
20060614
1
945666
20071005
1
945666
20080701
1
945666
20080726
2
945666
20081025
2
945666
20081209
2
945666
20090205
2
945666
20090313
2
945666
20100719
3
945666
20110517
3
945666
20110619
4
945666
20110918
4
945666
20120101
4
945666
20120223
4
945666
20120405
4
945666
20120713
2
945666
20120715
1
945666
20120810
3
945666
20120820
4
945666
20121120
3
945666
20140901
3
945666
20141201
2
945666
20150223
1
945666
20150701
2
945666
20160619
3
;
run;
data table2;
input ID
PDNS_DOE
kid_count_after
FCH_Date_after
KID_Count_before
FCH_Date_before : yymmdd10.;
format FCH_Date_before yymmdd10.;
cards;
945666
20141130
3
20141201
3
20140901
945666
20071019
2
20081025
1
20071005
;
run;
proc sort data=table1 ;by id FCH_DATE;run;
proc sort data=table2 ;by id FCH_Date_before;run;
data temp;
set table1 table2(in=inb keep=ID PDNS_DOE KID_Count_before FCH_Date_before
rename=(FCH_Date_before=FCH_DATE KID_Count_before=KID_COUNT));
by id FCH_DATE ;
intwo=inb;
if first.id or intwo then group+1;
run;
data want;
n=0;
do i=1 by 1 until(last.group);
set temp;
by group;
if i=1 then _KID_COUNT=KID_COUNT;
if KID_COUNT ne _KID_COUNT then n+1;
if n=1 and KID_COUNT gt _KID_COUNT then y=1;
end;
do until(last.group);
set temp;
by group;
if y=1 and intwo then output;
end;
keep id PDNS_DOE ;
run;
Here is Hash Table way .
This ought to be right .
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();
if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;
thanks Ksharp. Sorry took a while to reply back.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.