Hi.
I have the following table
For a given id, the RSN codes and RSN amounts remain constant across all the lines (line). I need to create additional lines based on following conditions.
For an ID, If RSN_CD_1 = XA and RSN_AMT_1 is not NULL , insert a row with line = 1001, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_1.
For an ID, If RSN_CD_2 = XZ and RSN_AMT_2 is not NULL, insert a row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_2.
For an ID, If RSN_CD_1 and RSN_CD_2 are populated with different values (XY and XZ) including their amounts, then insert 2 rows with line = 1001 and 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with individual RSN_AMTs.
For an ID, If RSN_CD_1 and RSN_CD_2 are populated with same values (XZ for eg) including their amounts, then insert 1 row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with sum of RSN_AMT_1 and RSN_AMT_2.
In the new lines created, the rest of the amt columns/ SVC_CD columns are set to NULL or 0. Below is the final output I am looking for
Can some one please help me out how to do it?
Sure, here you go
data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50
D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
Sorry, this is much better formatted.
data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50
D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
If something more elegant occurs, I will post later. I am sure you will hear from some more people with nifty solutions. In the meantime try this:
data want ; do _n_ = 1 by 1 until (last.id) ; set demo end = eof1 ; by id ; array r(2) ; array rsn_cd_ (2) $ ; array rsn_amt_ (2) ; array chkc (2) $ ("XA" "XZ") ; r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ; maxdt = max(maxdt,svc_dt) ; end ; do _n_ = 1 by 1 until (last.id) ; set demo end = eof2 ; by id ; final_amt = 0 ; output ; if r(_n_) then do ; line = 1000 + _n_ ; svc_dt = maxdt ; final_amt = rsn_amt_(_n_) ; call missing (of rsn_cd_(*)) ; do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ; output ; end ; end ; drop maxdt chkc1 chkc2 r1 r2 i ; run ; proc sort ; by id line ; run ;
Earlier code was copied and pasted code from one version too early:
data want ; do _n_ = 1 by 1 until (last.id) ; set demo end = eof1 ; by id ; array r(2) ; array rsn_cd_ (2) $ ; array rsn_amt_ (2) ; array chkc (2) $ ("XA" "XZ") ; r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ; maxdt = max(maxdt,svc_dt) ; end ; do _n_ = 1 by 1 until (last.id) ; set demo end = eof2 ; by id ; final_amt = 0 ; output ; if r(_n_) then do ; line = 1000 + _n_ ; svc_dt = maxdt ; final_amt = rsn_amt_(_n_) ; if rsn_cd_1 = rsn_cd_2 then final_amt = sum(of rsn_amt_(*)) ; call missing (of rsn_cd_(*)) ; do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ; output ; end ; end ; drop maxdt chkc1 chkc2 r1 r2 i ; run ; proc sort ; by id line ; run ;
Not recommending this approach, however methinks this is no more than a solution that encompasses a one pass of by group for the Max(svc date) and then followed by boring IF THEN verbose, which the following is little incomplete for the non missing RSN_CD series as I am very lazy. Feel free to play around and complete if you would like to try.
data demo;
infile datalines missover dlm='';
input @1 id $ line rsn_cd_1 $ rsn_cd_2 $ svc_cd $ svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA . ABC 11JAN2020 10 20 30 40 0
D100 2 XA . ABC 15JAN2020 0 0 0 40 0
D200 1 . XZ DEF 13MAY2020 0 0 0 0 50
D200 2 . XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
data want;
do until(last.id);
set demo;
by id;
_m=svc_dt max _m;
Final_amt=0;
output;
end;
retain _iorc_ 1000;
array t RSN_AMT_:;
if cmiss(of rsn_cd_:) then do;
_n_=whichc(coalescec(of rsn_cd_:),of rsn_cd_:);
line=_iorc_+_n_;
Final_amt=t(_n_);
svc_dt=_m;
call missing(of rsn_cd_:,of t(*),svc_cd);
call pokelong(repeat(put(0,rb8.),1),addrlong(t(1)),16);
output;
end;
/*One can continue with else do; for non missing rsn_cd: series*/
/*blah blah blah and so forth*/
drop _:;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.