Hi Experts,
I have two tables. I will have to append table B to table A and continue to calculate the variable DPD.
Table A
CID | SID | Date | Final_flg | DPD |
aa | b | 20200101 | Y | 5 |
aa | b | 20200102 | Y | 6 |
aa | b | 20200103 | Y | 7 |
aa | b | 20200104 | N | 0 |
bb | c | 20200101 | Y | 25 |
bb | c | 20200102 | Y | 26 |
bb | c | 20200103 | Y | 27 |
aa | d | 20200101 | N | 0 |
aa | d | 20200102 | N | 0 |
aa | d | 20200103 | Y | 1 |
aa | d | 20200104 | Y | 2 |
Table B
CID | SID | Date | Final_flg |
aa | b | 20200105 | Y |
aa | b | 20200106 | Y |
aa | b | 20200107 | Y |
aa | d | 20200105 | Y |
aa | d | 20200106 | Y |
aa | d | 20200107 | Y |
bb | c | 20200105 | N |
bb | c | 20200106 | Y |
bb | c | 20200107 | Y |
ee | f | 20200105 | N |
ee | f | 20200106 | Y |
Table Want
CID | SID | Date | Final_flg | DPD |
aa | b | 20200101 | Y | 5 |
aa | b | 20200102 | Y | 6 |
aa | b | 20200103 | Y | 7 |
aa | b | 20200104 | N | 0 |
aa | b | 20200105 | Y | 1 |
aa | b | 20200106 | Y | 2 |
aa | b | 20200107 | Y | 3 |
aa | d | 20200101 | N | 0 |
aa | d | 20200102 | N | 0 |
aa | d | 20200103 | Y | 1 |
aa | d | 20200104 | Y | 2 |
aa | d | 20200105 | Y | 3 |
aa | d | 20200106 | Y | 4 |
aa | d | 20200107 | Y | 5 |
bb | c | 20200101 | Y | 25 |
bb | c | 20200102 | Y | 26 |
bb | c | 20200103 | Y | 27 |
bb | c | 20200105 | N | 0 |
bb | c | 20200106 | Y | 1 |
bb | c | 20200107 | Y | 2 |
ee | f | 20200105 | N | 0 |
ee | f | 20200106 | Y | 1 |
DPD variable has to count whenever there is Final_flag ="Y", it resets to 0 when Final_flag ="N". Following code is used to calculate DPD.
data Table_A;
set input;
by CID SID;
if first.CID | final_flg=0 then DPD=0;
if final_flg then DPD+1;
run;
I cannot rerun the whole table for DPD variable. Once we appended Table B, DPD needs to be updated for the data derived from table B. count should start from least DPD count available for the CID and SID.
Thank you so much for your time and expertise help.
Regards,
Myu
data a;
input (CID SID) ($) Date yymmdd10. Final_flg $ DPD ;
format date yymmdd10.;
cards;
aa b 20200101 Y 5
aa b 20200102 Y 6
aa b 20200103 Y 7
aa b 20200104 N 0
bb c 20200101 Y 25
bb c 20200102 Y 26
bb c 20200103 Y 27
aa d 20200101 N 0
aa d 20200102 N 0
aa d 20200103 Y 1
aa d 20200104 Y 2
;
data b;
input (CID SID) ($) Date yymmdd10. Final_flg $;
format date yymmdd10.;
cards;
aa b 20200105 Y
aa b 20200106 Y
aa b 20200107 Y
aa d 20200105 Y
aa d 20200106 Y
aa d 20200107 Y
bb c 20200105 N
bb c 20200106 Y
bb c 20200107 Y
ee f 20200105 N
ee f 20200106 Y
;
proc sort data=a out=a_sort;
by cid sid date;
run;
proc sort data=b out=b_sort;
by cid sid date;
run;
data want;
set a_sort b_sort;
by cid sid ;
if first.sid then _dpd=.;
If dpd>. then _dpd=dpd;
else if dpd=. then if Final_flg='N' then _dpd=0;
else _dpd+1;
rename _dpd=dpd;
drop dpd;
run;
proc print noobs;run;
CID | SID | Date | Final_flg | dpd |
---|---|---|---|---|
aa | b | 2020-01-01 | Y | 5 |
aa | b | 2020-01-02 | Y | 6 |
aa | b | 2020-01-03 | Y | 7 |
aa | b | 2020-01-04 | N | 0 |
aa | b | 2020-01-05 | Y | 1 |
aa | b | 2020-01-06 | Y | 2 |
aa | b | 2020-01-07 | Y | 3 |
aa | d | 2020-01-01 | N | 0 |
aa | d | 2020-01-02 | N | 0 |
aa | d | 2020-01-03 | Y | 1 |
aa | d | 2020-01-04 | Y | 2 |
aa | d | 2020-01-05 | Y | 3 |
aa | d | 2020-01-06 | Y | 4 |
aa | d | 2020-01-07 | Y | 5 |
bb | c | 2020-01-01 | Y | 25 |
bb | c | 2020-01-02 | Y | 26 |
bb | c | 2020-01-03 | Y | 27 |
bb | c | 2020-01-05 | N | 0 |
bb | c | 2020-01-06 | Y | 1 |
bb | c | 2020-01-07 | Y | 2 |
ee | f | 2020-01-05 | N | 0 |
ee | f | 2020-01-06 | Y | 1 |
data a;
input (CID SID) ($) Date yymmdd10. Final_flg $ DPD ;
format date yymmdd10.;
cards;
aa b 20200101 Y 5
aa b 20200102 Y 6
aa b 20200103 Y 7
aa b 20200104 N 0
bb c 20200101 Y 25
bb c 20200102 Y 26
bb c 20200103 Y 27
aa d 20200101 N 0
aa d 20200102 N 0
aa d 20200103 Y 1
aa d 20200104 Y 2
;
data b;
input (CID SID) ($) Date yymmdd10. Final_flg $;
format date yymmdd10.;
cards;
aa b 20200105 Y
aa b 20200106 Y
aa b 20200107 Y
aa d 20200105 Y
aa d 20200106 Y
aa d 20200107 Y
bb c 20200105 N
bb c 20200106 Y
bb c 20200107 Y
ee f 20200105 N
ee f 20200106 Y
;
proc sort data=a out=a_sort;
by cid sid date;
run;
proc sort data=b out=b_sort;
by cid sid date;
run;
data want;
set a_sort b_sort;
by cid sid ;
if first.sid then _dpd=.;
If dpd>. then _dpd=dpd;
else if dpd=. then if Final_flg='N' then _dpd=0;
else _dpd+1;
rename _dpd=dpd;
drop dpd;
run;
proc print noobs;run;
CID | SID | Date | Final_flg | dpd |
---|---|---|---|---|
aa | b | 2020-01-01 | Y | 5 |
aa | b | 2020-01-02 | Y | 6 |
aa | b | 2020-01-03 | Y | 7 |
aa | b | 2020-01-04 | N | 0 |
aa | b | 2020-01-05 | Y | 1 |
aa | b | 2020-01-06 | Y | 2 |
aa | b | 2020-01-07 | Y | 3 |
aa | d | 2020-01-01 | N | 0 |
aa | d | 2020-01-02 | N | 0 |
aa | d | 2020-01-03 | Y | 1 |
aa | d | 2020-01-04 | Y | 2 |
aa | d | 2020-01-05 | Y | 3 |
aa | d | 2020-01-06 | Y | 4 |
aa | d | 2020-01-07 | Y | 5 |
bb | c | 2020-01-01 | Y | 25 |
bb | c | 2020-01-02 | Y | 26 |
bb | c | 2020-01-03 | Y | 27 |
bb | c | 2020-01-05 | N | 0 |
bb | c | 2020-01-06 | Y | 1 |
bb | c | 2020-01-07 | Y | 2 |
ee | f | 2020-01-05 | N | 0 |
ee | f | 2020-01-06 | Y | 1 |
You say that your TABLE_A is created by setting an input BY CID SID, but your TABLE_A is not sorted by those two variables.
I assume that TABLE_A is a large master table, which has been modified in some other way after appending, and that you need to append more data and update it without recreating (e.g. sorting) the master table. In that case, indexed access may be the way to go:
data TABLE_A;
informat CID $2. SID SID $1. Date yymmdd8. Final_flg $1. DPD 2.;
input CID SID Date Final_flg DPD;
format date yymmddn8.;
cards;
aa b 20200101 Y 5
aa b 20200102 Y 6
aa b 20200103 Y 7
aa b 20200104 N 0
bb c 20200101 Y 25
bb c 20200102 Y 26
bb c 20200103 Y 27
aa d 20200101 N 0
aa d 20200102 N 0
aa d 20200103 Y 1
aa d 20200104 Y 2
;run;
data TABLE_B;
informat CID $2. SID SID $1. Date yymmdd8. Final_flg $1. ;
input CID SID Date Final_flg ;
format date yymmddn8.;
cards;
aa b 20200105 Y
aa b 20200106 Y
aa b 20200107 Y
aa d 20200105 Y
aa d 20200106 Y
aa d 20200107 Y
bb c 20200105 N
bb c 20200106 Y
bb c 20200107 Y
ee f 20200105 N
ee f 20200106 Y
;run;
proc sql;
create unique index idx on TABLE_A(CID,SID,Date);
quit;
proc append base=TABLE_A data=TABLE_B force;
run;
data TABLE_A;
set TABLE_A(cntllev=rec);
by CID SID;
if first.CID | final_flg='N' then _DPD=0;
if final_flg='Y' then _DPD+1;
if missing(DPD);
modify TABLE_A(cntllev=rec) key=idx;
DPD=_DPD;
replace;
run;
The TABLE_A does not have the same sort order as your WANT table, but you can still access it BY CID SID DATE, using the index:
data something;
set TABLE_A;
by CID SID DATE;
/* more statements */
It is also possible to do the append and the update in one step, if the transaction table (TABLE_B) is sorted:
data TABLE_A;
if 0 then modify TABLE_A(cntllev=rec);
set TABLE_A(cntllev=rec) TABLE_B;
by CID SID date;
if first.CID | final_flg='N' then _DPD=0;
if final_flg='Y' then _DPD+1;
if missing(DPD);
DPD=_DPD;
output;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.