Hi.
I have the following table
For a given id, the RSN codes and RSN amounts remain constant across all the lines (line). I need to create additional lines based on following conditions.
For an ID, If RSN_CD_1 = XA and RSN_AMT_1 is not NULL , insert a row with line = 1001, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_1.
For an ID, If RSN_CD_2 = XZ and RSN_AMT_2 is not NULL, insert a row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_2.
For an ID, If RSN_CD_1 and RSN_CD_2 are populated with different values (XY and XZ) including their amounts, then insert 2 rows with line = 1001 and 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with individual RSN_AMTs.
For an ID, If RSN_CD_1 and RSN_CD_2 are populated with same values (XZ for eg) including their amounts, then insert 1 row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with sum of RSN_AMT_1 and RSN_AMT_2.
In the new lines created, the rest of the amt columns/ SVC_CD columns are set to NULL or 0. Below is the final output I am looking for
Can some one please help me out how to do it?
Sure, here you go
data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50
D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
Sorry, this is much better formatted.
data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50
D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
If something more elegant occurs, I will post later. I am sure you will hear from some more people with nifty solutions. In the meantime try this:
data want ;
do _n_ = 1 by 1 until (last.id) ;
set demo end = eof1 ;
by id ;
array r(2) ;
array rsn_cd_ (2) $ ;
array rsn_amt_ (2) ;
array chkc (2) $ ("XA" "XZ") ;
r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ;
maxdt = max(maxdt,svc_dt) ;
end ;
do _n_ = 1 by 1 until (last.id) ;
set demo end = eof2 ;
by id ;
final_amt = 0 ;
output ;
if r(_n_) then
do ;
line = 1000 + _n_ ;
svc_dt = maxdt ;
final_amt = rsn_amt_(_n_) ;
call missing (of rsn_cd_(*)) ;
do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ;
output ;
end ;
end ;
drop maxdt chkc1 chkc2 r1 r2 i ;
run ;
proc sort ;
by id line ;
run ;
Earlier code was copied and pasted code from one version too early:
data want ;
do _n_ = 1 by 1 until (last.id) ;
set demo end = eof1 ;
by id ;
array r(2) ;
array rsn_cd_ (2) $ ;
array rsn_amt_ (2) ;
array chkc (2) $ ("XA" "XZ") ;
r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ;
maxdt = max(maxdt,svc_dt) ;
end ;
do _n_ = 1 by 1 until (last.id) ;
set demo end = eof2 ;
by id ;
final_amt = 0 ;
output ;
if r(_n_) then
do ;
line = 1000 + _n_ ;
svc_dt = maxdt ;
final_amt = rsn_amt_(_n_) ;
if rsn_cd_1 = rsn_cd_2 then final_amt = sum(of rsn_amt_(*)) ;
call missing (of rsn_cd_(*)) ;
do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ;
output ;
end ;
end ;
drop maxdt chkc1 chkc2 r1 r2 i ;
run ;
proc sort ;
by id line ;
run ;
Not recommending this approach, however methinks this is no more than a solution that encompasses a one pass of by group for the Max(svc date) and then followed by boring IF THEN verbose, which the following is little incomplete for the non missing RSN_CD series as I am very lazy. Feel free to play around and complete if you would like to try.
data demo;
infile datalines missover dlm='';
input @1 id $ line rsn_cd_1 $ rsn_cd_2 $ svc_cd $ svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA . ABC 11JAN2020 10 20 30 40 0
D100 2 XA . ABC 15JAN2020 0 0 0 40 0
D200 1 . XZ DEF 13MAY2020 0 0 0 0 50
D200 2 . XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;
data want;
do until(last.id);
set demo;
by id;
_m=svc_dt max _m;
Final_amt=0;
output;
end;
retain _iorc_ 1000;
array t RSN_AMT_:;
if cmiss(of rsn_cd_:) then do;
_n_=whichc(coalescec(of rsn_cd_:),of rsn_cd_:);
line=_iorc_+_n_;
Final_amt=t(_n_);
svc_dt=_m;
call missing(of rsn_cd_:,of t(*),svc_cd);
call pokelong(repeat(put(0,rb8.),1),addrlong(t(1)),16);
output;
end;
/*One can continue with else do; for non missing rsn_cd: series*/
/*blah blah blah and so forth*/
drop _:;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.