@novinosrin Thank you so much for your generous help.
I have the following question to adjust the code to include new_data into the master_data. Since I cannot run the logic after appending the tables, I would like to do the calculation only on the new_data table and then append this to the master table.
Master_Table, new_table and the code to get the master_table with the calculated variables (probation_count, probation_flag, and cure_count)
data Master_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $;
format date ddmmyy10.;
datalines;
111 04/04/2021 87 N
111 05/04/2021 88 N
111 06/04/2021 89 N
111 07/04/2021 90 Y
111 08/04/2021 91 Y
111 09/04/2021 92 Y
111 10/04/2021 93 Y
111 11/04/2021 00 N
111 12/04/2021 00 N
111 13/04/2021 00 N
111 14/04/2021 00 N
111 15/04/2021 00 N
111 16/04/2021 01 N
111 17/04/2021 02 N
111 18/04/2021 00 N
111 19/04/2021 00 N
111 20/04/2021 00 N
111 21/04/2021 00 N
111 22/04/2021 00 N
111 23/04/2021 00 N
111 24/04/2021 00 N
222 04/04/2021 86 N
222 05/04/2021 87 N
222 06/04/2021 88 N
222 07/04/2021 89 N
222 08/04/2021 90 Y
222 09/04/2021 91 Y
222 10/04/2021 92 Y
222 11/04/2021 00 N
222 12/04/2021 01 N
222 13/04/2021 02 N
222 14/04/2021 03 N
222 15/04/2021 04 N
222 16/04/2021 05 N
333 04/04/2021 87 N
333 05/04/2021 88 N
333 06/04/2021 89 N
333 07/04/2021 90 Y
333 08/04/2021 91 Y
333 09/04/2021 92 Y
333 10/04/2021 00 N
333 11/04/2021 00 N
333 12/04/2021 00 N
333 13/04/2021 00 N
333 14/04/2021 00 N
333 15/04/2021 00 N
333 16/04/2021 01 N
333 17/04/2021 02 N
333 18/04/2021 03 Y
333 19/04/2021 04 Y
333 20/04/2021 05 Y
333 21/04/2021 00 Y
333 22/04/2021 01 N
333 23/04/2021 02 N
333 24/04/2021 00 N
333 25/04/2021 00 N
333 26/04/2021 00 N
333 27/04/2021 00 N
333 28/04/2021 00 N
333 29/04/2021 00 N
333 30/04/2021 00 N
333 01/05/2021 00 N
333 02/05/2021 00 N
333 03/05/2021 00 N
333 04/05/2021 90 Y
111 25/04/2021 00 N
444 10/12/2021 00 N
444 11/12/2021 90 Y
444 12/12/2021 91 Y
;
Run;
proc sort data=master_data;
by cid date;
run;
data want;
if 0 then set master_data;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set master_data;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
data New_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $;
format date ddmmyy10.;
datalines;
111 26/04/2021 00 N
333 05/05/2021 91 Y
222 17/04/2021 06 Y
444 13/12/2021 00 N
;
Run;
I want the code to pick the latest values from master table and continue the calculations in the new_data table.
I am in a same situation as this thread https://communities.sas.com/t5/SAS-Programming/Append-two-tables-and-increment-count-on-the-new-data... , where you helped me to solve.
Thank you so much again for your help.
Good morning @Myurathan , Not sure if I am reading your points correctly. Please clarify
1. "I want the code to pick the latest values from master table and continue the calculations in the new_data table. "-- Do you want to replicate the same solution in the other thread just using the Master_table and New_data.
2. Does the existing WANT code you posted has got to do anything in this ? or is the WANT table that goes in as Master table?
3 Or is it a rewrite of the same WANT code logic here, however with inclusion of New_data?
Can you please make it clearer
Hi @Myurathan Thank you.
To address " I want the code that picks the latest probation_count and cure_count for the given CID from master_table and continue the calculation in the new_data table creating probation_count, probation_flag and cure_count." , you would still need to read and process the master_table to determine/capture the record that has the latest probation_count and cure_count for the given CID.
Of course you may do so by -
proc sql;
create table latest as
select *
from want
group by cid
having date=max(date);
quit;
which will give you the latest for each CID i.e. -
CID | date | DPD | Default_Flag | Probation_count | Probation_Flag | Cure_count |
111 | 25/04/2021 | 0 | N | 0 | 5 | |
222 | 16/04/2021 | 5 | N | 0 | Y | 0 |
333 | 4/5/2021 | 90 | Y | 0 | 0 | |
444 | 12/12/2021 | 91 | Y | 0 | 0 |
Then we can proceed with a LOOKUP/JOIN operation from New_data to the above result fetching the latest information for each CID and continue with the necessary calculations from there onwards.
However, I would like you to be aware that in any case, there is a requirement to have at least one pass of the dataset WANT aka Master table with the count info to fetch the latest record and then use that as a "start" to process new_data.
Please think through, review your requirement and let me know what you think is best for you. I hope you get what I am saying to make sense of it.
@novinosrin I have a latest_flag in the master_table already. Therefore, we can fetch the latest information from master_table by filtering. Then we can proceed with a LOOKUP/JOIN operation from New_data to the above result fetching the latest information for each CID and continue with the necessary calculations from there onwards.
I have updated the master_data accordingly.
data Master_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $ latest_flag $;
format date ddmmyy10.;
datalines;
111 04/04/2021 87 N N
111 05/04/2021 88 N N
111 06/04/2021 89 N N
111 07/04/2021 90 Y N
111 08/04/2021 91 Y N
111 09/04/2021 92 Y N
111 10/04/2021 93 Y N
111 11/04/2021 00 N N
111 12/04/2021 00 N N
111 13/04/2021 00 N N
111 14/04/2021 00 N N
111 15/04/2021 00 N N
111 16/04/2021 01 N N
111 17/04/2021 02 N N
111 18/04/2021 00 N N
111 19/04/2021 00 N N
111 20/04/2021 00 N N
111 21/04/2021 00 N N
111 22/04/2021 00 N N
111 23/04/2021 00 N N
111 24/04/2021 00 N N
111 25/04/2021 00 N Y
222 04/04/2021 86 N N
222 05/04/2021 87 N N
222 06/04/2021 88 N N
222 07/04/2021 89 N N
222 08/04/2021 90 Y N
222 09/04/2021 91 Y N
222 10/04/2021 92 Y N
222 11/04/2021 00 N N
222 12/04/2021 01 N N
222 13/04/2021 02 N N
222 14/04/2021 03 N N
222 15/04/2021 04 N N
222 16/04/2021 05 N Y
333 04/04/2021 87 N N
333 05/04/2021 88 N N
333 06/04/2021 89 N N
333 07/04/2021 90 Y N
333 08/04/2021 91 Y N
333 09/04/2021 92 Y N
333 10/04/2021 00 N N
333 11/04/2021 00 N N
333 12/04/2021 00 N N
333 13/04/2021 00 N N
333 14/04/2021 00 N N
333 15/04/2021 00 N N
333 16/04/2021 01 N N
333 17/04/2021 02 N N
333 18/04/2021 03 Y N
333 19/04/2021 04 Y N
333 20/04/2021 05 Y N
333 21/04/2021 00 Y N
333 22/04/2021 01 N N
333 23/04/2021 02 N N
333 24/04/2021 00 N N
333 25/04/2021 00 N N
333 26/04/2021 00 N N
333 27/04/2021 00 N N
333 28/04/2021 00 N N
333 29/04/2021 00 N N
333 30/04/2021 00 N N
333 01/05/2021 00 N N
333 02/05/2021 00 N N
333 03/05/2021 00 N N
333 04/05/2021 90 Y Y
444 10/12/2021 00 N N
444 11/12/2021 90 Y N
444 12/12/2021 91 Y Y
;
Run;
Data latest_data;
set master_data(where=(latest_flag="Y"));
run;
proc sort data=latest_data;
by cid date;
run;
data New_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $;
format date ddmmyy10.;
datalines;
111 26/04/2021 00 N
333 05/05/2021 91 Y
222 17/04/2021 06 Y
444 13/12/2021 00 N
555 07/04/2021 00 Y
555 08/04/2021 00 N
555 09/04/2021 00 N
;
Run;
using latest_data as a "start" to process new_data. logic needs to be adjusted to take the starting points as follows;
data want;
if 0 then set master_data;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set master_data;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
Please let me know, I am still being unclear. Thank you so much for your help.
Good morning @Myurathan You weren't unclear at all. I was just stating the extra pass i.e. processing of the master dataset though. However, it seems you are quite content to process the master dataset to get the latest information. This is what typically we would want to avoid downstream. An ideal situation is to have process done for all data in one go upstream rather.
Hey no worries, Let me review some of my day's work and then respond to you with the solution
Hi @Myurathan Pardon me as my day was very long and extremely busy today and so I'm afraid I couldn't get my hands on SAS much. I'll try and work on your requirement tomorrow morning. Hang in there!
@Myurathan Since you have provided the latest_data, we could keep it simple by
1. Interleave aka sorted append by group
data
New_data; input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $; format date ddmmyy10.; datalines; 111 26/04/2021 00 N 333 05/05/2021 91 Y 222 17/04/2021 06 Y 444 13/12/2021 00 N 555 07/04/2021 00 Y 555 08/04/2021 00 N 555 09/04/2021 00 N ; Run; proc sort data=new_data out=_new_data; by cid date; run; data combine; set latest_data _new_data; by cid date; run; We would get the following- CID date DPD Default_Flag latest_flag 111 25/04/2021 0 N Y 111 26/04/2021 0 N 222 16/04/2021 5 N Y 222 17/04/2021 6 Y 333 04/05/2021 90 Y Y 333 05/05/2021 91 Y 444 12/12/2021 91 Y Y 444 13/12/2021 0 N 555 07/04/2021 0 Y 555 08/04/2021 0 N 555 09/04/2021 0 N
Then we apply the existing logic, just by removing not first.cid condition from the below
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
for the reason, the new_Data is a trail.
so -
if Default_Flag='N' and first.Default_Flag then _k=1;
Therefore, the edited code would be-
data new_want;
if 0 then set combine;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set combine;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
CID date DPD Default_Flag latest_flag Probation_count Probation_Flag Cure_count
111 25/04/2021 0 N Y 1 Y 0
111 26/04/2021 0 N 2 Y 0
222 16/04/2021 5 N Y 0 Y 0
222 17/04/2021 6 Y 0 0
333 04/05/2021 90 Y Y 0 0
333 05/05/2021 91 Y 0 0
444 12/12/2021 91 Y Y 0 0
444 13/12/2021 0 N 1 Y 0
555 07/04/2021 0 Y 0 0
555 08/04/2021 0 N 1 Y 0
555 09/04/2021 0 N 2 Y 0
Please check if the above is what you are looking for and let me know
@novinosrin First of all, Thank you so so much for your time and help. Unfortunately, the results are not the same as expected. I am sure it is me clumsy in explaining the details. Let me explain in a language you understand best; SAS codes.
Here is the solution you have suggested;
data Master_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $ latest_flag $;
format date ddmmyy10.;
datalines;
111 04/04/2021 87 N N
111 05/04/2021 88 N N
111 06/04/2021 89 N N
111 07/04/2021 90 Y N
111 08/04/2021 91 Y N
111 09/04/2021 92 Y N
111 10/04/2021 93 Y N
111 11/04/2021 00 N N
111 12/04/2021 00 N N
111 13/04/2021 00 N N
111 14/04/2021 00 N N
111 15/04/2021 00 N N
111 16/04/2021 01 N N
111 17/04/2021 02 N N
111 18/04/2021 00 N N
111 19/04/2021 00 N N
111 20/04/2021 00 N N
111 21/04/2021 00 N N
111 22/04/2021 00 N N
111 23/04/2021 00 N N
111 24/04/2021 00 N N
111 25/04/2021 00 N Y
222 04/04/2021 86 N N
222 05/04/2021 87 N N
222 06/04/2021 88 N N
222 07/04/2021 89 N N
222 08/04/2021 90 Y N
222 09/04/2021 91 Y N
222 10/04/2021 92 Y N
222 11/04/2021 00 N N
222 12/04/2021 01 N N
222 13/04/2021 02 N N
222 14/04/2021 03 N N
222 15/04/2021 04 N N
222 16/04/2021 05 N Y
333 04/04/2021 87 N N
333 05/04/2021 88 N N
333 06/04/2021 89 N N
333 07/04/2021 90 Y N
333 08/04/2021 91 Y N
333 09/04/2021 92 Y N
333 10/04/2021 00 N N
333 11/04/2021 00 N N
333 12/04/2021 00 N N
333 13/04/2021 00 N N
333 14/04/2021 00 N N
333 15/04/2021 00 N N
333 16/04/2021 01 N N
333 17/04/2021 02 N N
333 18/04/2021 03 Y N
333 19/04/2021 04 Y N
333 20/04/2021 05 Y N
333 21/04/2021 00 Y N
333 22/04/2021 01 N N
333 23/04/2021 02 N N
333 24/04/2021 00 N N
333 25/04/2021 00 N N
333 26/04/2021 00 N N
333 27/04/2021 00 N N
333 28/04/2021 00 N N
333 29/04/2021 00 N N
333 30/04/2021 00 N N
333 01/05/2021 00 N N
333 02/05/2021 00 N N
333 03/05/2021 00 N N
333 04/05/2021 90 Y Y
444 10/12/2021 00 N N
444 11/12/2021 90 Y N
444 12/12/2021 91 Y Y
;
Run;
Data latest_data;
set master_data(where=(latest_flag="Y"));
run;
proc sort data=latest_data;
by cid date;
run;
data New_data;
input @1 CID @5 date ddmmyy10. @16 DPD @19 Default_Flag $;
format date ddmmyy10.;
datalines;
111 26/04/2021 00 N
333 05/05/2021 91 Y
222 17/04/2021 06 Y
444 13/12/2021 00 N
555 07/04/2021 00 Y
555 08/04/2021 00 N
555 09/04/2021 00 N
;
Run;
proc sort data=new_data out=_new_data;
by cid date;
run;
data combine;
set latest_data _new_data;
by cid date;
run;
data new_want;
if 0 then set combine;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set combine;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
proc sql;
create table latest_after_execution as
select *
from new_want
group by cid
having date=max(date);
quit;
This is what I would expect to get as a results;
data combine_new;
set master_data(drop=latest_flag) _new_data;
by cid date;
run;
proc sort data=combine_new out=_combine_new;
by cid date;
run;
data want;
if 0 then set _combine_new;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set _combine_new;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
proc sql;
create table unwantedmethod_after_execution as
select *
from want
group by cid
having date=max(date);
quit;
I want the table latest_after_execution to be same as table unwantedmethod_after_execution.
Note that Master table has been executed and contains probation_count, probation_flag and cure_count.
If I am gonna compute the variables for new_data, the following would be my initial table
data initial_table;
if 0 then set Master_data;
Probation_count=0;
length Probation_Flag $1;
Cure_count=0;
do until(last.Default_Flag);
set Master_data;
by cid Default_Flag notsorted;
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
else if Default_Flag='Y' then _k=.;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
I really really hope I have made myself clear now. If not I will try harder.
Thank you so so much for your help.
@Myurathan Got it. Thank you. Please bear with me until late Fri(Today evening) or sat as I have some tight deadline to meet certain requirements at work to be delivered by today. I will respond at the soonest convenience.
Hi @Myurathan Upon looking back the entire thread and reviewing once again, your additional request isn't quite a mere adjustment to the existing thread and rather would involve a new logic i.e. a look up of sorts. I do understand and acknowledge you wouldn't want to run the initial table again and rather process only the subsequent delta records for future dates or new CIDs. This process I am afraid complicates much more than having to process upstream that would mandate a redesign of your upstream processing and rather not make the downstream processing heavy. Typically, the design of the process flow is best when it conforms to a neat water fall.
Also, if you were able to understand and observe the initial solution that encompasses the series of _K: flag variables being tightly coded with a forward looking continuous logic. On the other hand, the new_data needing a look up to the latest from previous and then process for newer or existing CIDs for future dates would rather mandate a pressing need for an overhaul of the process. I guess that's the gap most business functions don't seem to comprehend up front in the design leading to a spaghetti design, which is not desirable in most cases.
Nevertheless, with all the above being said, I have modified the code to produce the results you expect albeit I am really not sure how convenient it is. Since I have been pretty much swamped at work, I haven't really been able to test thoroughly, so would like you to test and let me know and we shall fix it on the go.
/*Sort new_date by cid and date*/
proc sort data=new_data out=_new_data;
by cid date;
run;
/*Subset only the latest record from initial table*/
data temp;
set initial_table;
where latest_flag='Y';
keep cid probation_count probation_flag cure_count;
run;
/*Process the _new_data */
data result_for_new_data;
if _n_=1 then do;
if 0 then set _new_data temp;
dcl hash H (dataset:'temp') ;
h.definekey ("cid") ;
h.definedata ("probation_count", "probation_flag", "cure_count") ;
h.definedone () ;
end;
do until(last.Default_Flag);
set _new_data;
by cid Default_Flag notsorted;
if first.cid then do;
_iorc_= h.find();
if _iorc_ ne 0 then do;
probation_count=0;
probation_flag=' ';
cure_count=0;
end;
end;
if _iorc_ ne 0 then do;
if Default_Flag='N' and first.Default_Flag and not first.cid then _k=1;
else if Default_Flag='Y' then _k=.;
end;
else do;
if Default_Flag='N' then do;
_k=1;
if Cure_count then _k1=1;
end;
else do;
_k=.;
probation_count=0;
probation_flag=' ';
cure_count=0;
end;
end;
if _k then do;
if _k1=. then do;
Probation_count+1;
Probation_Flag='Y';
end;
if Probation_count>10 then _k1=1;
if _k1 then do;
Probation_Flag=' ';
if _k2=. then Cure_count+1;
else Cure_count=0;
end;
if Cure_count=10 then _k2=1;
if dpd>3 or Probation_count>10 then Probation_count=0;
end;
output;
end;
drop _:;
run;
Also, in my humble opinion, you should try and break down your requirement into smaller portions and post as independent threads for the reason, most people would be encouraged to take a stab. The reason is, presenting two different scopes that involving different logic is best addressed as independent thread/questions.
I hope that helps. Best!
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.