Compare data from different datasets:
ID |
FCH_Date |
KID_COUNT |
945666 |
20060517 |
1 |
945666 |
20060614 |
1 |
945666 |
20071005 |
1 |
945666 |
20080701 |
1 |
945666 |
20080726 |
2 |
945666 |
20081025 |
2 |
945666 |
20081209 |
2 |
945666 |
20090205 |
2 |
945666 |
20090313 |
2 |
945666 |
20100719 |
3 |
945666 |
20110517 |
3 |
945666 |
20110619 |
4 |
945666 |
20110918 |
4 |
945666 |
20120101 |
4 |
945666 |
20120223 |
4 |
945666 |
20120405 |
4 |
945666 |
20120713 |
2 |
945666 |
20120715 |
1 |
945666 |
20120810 |
3 |
945666 |
20120820 |
4 |
945666 |
20121120 |
3 |
945666 |
20140901 |
3 |
945666 |
20141201 |
2 |
945666 |
20150223 |
1 |
945666 |
20150701 |
2 |
945666 |
20160619 |
3 |
ID |
PDNS_DOE |
kid_count_after |
FCH_Date_after |
KID_Count_before |
FCH_Date_before |
945666 |
20141130 |
3 |
20141201 |
3 |
20140901 |
945666 |
20071019 |
2 |
20081025 |
1 |
20071005 |
Need to select customers where the kid count has increased since the FCH_Date_after, the tricky bit is “since FCH_Date_before need to check for increase in count, and this could be as demonstrated in the dataset1. If there is an increase then output ID and PDNS_DOE.
Help !!!!!!!!
Here is Hash Table way .
This ought to be right .
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();
if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;
Sorry I should have explained it a bit better.
Desired output is just the ID and PDNS_DOE
Well we are looking for increase in number of kids after the FCH_DATE_Before
This could even be something like
ID | FCH_DATE | KID_COUNT |
945666 | 20140901 | 3 |
945666 | 20140908 | 4 |
945666 | 20140915 | 3 |
945666 | 20150901 | 2 |
945666 | 20151011 | 1 |
945666 | 20160101 | 2 |
in this case I select the ID and the PDNS_DOE from dataset 2 as there is an increase in the kid_count when the last iteration happened.
If the data is something like this then we WONTselect the customer( ID) or discard the record and move to the next record.
ID | FCH_DATE | KID_COUNT |
945666 | 20140901 | 4 |
945666 | 20140908 | 1 |
945666 | 20140915 | 1 |
945666 | 20150901 | 2 |
945666 | 20151011 | 6 |
945666 | 20160101 | 5 |
hope this explains a bit better..
OK. Here is an question. If the data looks like the following ,what you gonna do ? Assuming you want output this obs.
945666 | 20140901 | 3 |
945666 | 20140908 | 3 |
945666 | 20140915 | 4 |
data table1;
input ID FCH_DATE : yymmdd10. KID_COUNT;
format FCH_DATE yymmdd10.;
cards;
945666
20060517
1
945666
20060614
1
945666
20071005
1
945666
20080701
1
945666
20080726
2
945666
20081025
2
945666
20081209
2
945666
20090205
2
945666
20090313
2
945666
20100719
3
945666
20110517
3
945666
20110619
4
945666
20110918
4
945666
20120101
4
945666
20120223
4
945666
20120405
4
945666
20120713
2
945666
20120715
1
945666
20120810
3
945666
20120820
4
945666
20121120
3
945666
20140901
3
945666
20141201
2
945666
20150223
1
945666
20150701
2
945666
20160619
3
;
run;
data table2;
input ID
PDNS_DOE
kid_count_after
FCH_Date_after
KID_Count_before
FCH_Date_before : yymmdd10.;
format FCH_Date_before yymmdd10.;
cards;
945666
20141130
3
20141201
3
20140901
945666
20071019
2
20081025
1
20071005
;
run;
proc sort data=table1 ;by id FCH_DATE;run;
proc sort data=table2 ;by id FCH_Date_before;run;
data temp;
set table1 table2(in=inb keep=ID PDNS_DOE KID_Count_before FCH_Date_before
rename=(FCH_Date_before=FCH_DATE KID_Count_before=KID_COUNT));
by id FCH_DATE ;
intwo=inb;
if first.id or intwo then group+1;
run;
data want;
n=0;
do i=1 by 1 until(last.group);
set temp;
by group;
if i=1 then _KID_COUNT=KID_COUNT;
if KID_COUNT ne _KID_COUNT then n+1;
if n=1 and KID_COUNT gt _KID_COUNT then y=1;
end;
do until(last.group);
set temp;
by group;
if y=1 and intwo then output;
end;
keep id PDNS_DOE ;
run;
Here is Hash Table way .
This ought to be right .
proc summary data=table1 nway;
class id;
var FCH_DATE;
output out=max_date(drop=_:) max=max_date;
run;
data want;
if _n_=1 then do;
if 0 then set max_date ;
declare hash h1(dataset:'max_date');
h1.definekey('id');
h1.definedata('max_date');
h1.definedone();
if 0 then set table1;
declare hash h2(dataset:'table1');
h2.definekey('id','FCH_DATE');
h2.definedata('KID_COUNT');
h2.definedone();
end;
set table2;
rc=h1.find();
if rc=0 then do;
do date=FCH_Date_before+1 to max_date;
if h2.find(key:id,key:date)=0 then do;
if KID_COUNT>KID_Count_before then do;output;leave;end;
if KID_COUNT<KID_Count_before then leave;
end;
end;
end;
keep id PDNS_DOE ;
run;
thanks Ksharp. Sorry took a while to reply back.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.