Re: Add a new row based on condition from previous rows

shanmukh2 · Posted 05-27-2020 01:01 PM

Hi.

I have the following table

For a given id, the RSN codes and RSN amounts remain constant across all the lines (line). I need to create additional lines based on following conditions.

For an ID, If RSN_CD_1 = XA and RSN_AMT_1 is not NULL , insert a row with line = 1001, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_1.

For an ID, If RSN_CD_2 = XZ and RSN_AMT_2 is not NULL, insert a row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with RSN_AMT_2.

For an ID, If RSN_CD_1 and RSN_CD_2 are populated with different values (XY and XZ) including their amounts, then insert 2 rows with line = 1001 and 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with individual RSN_AMTs.

For an ID, If RSN_CD_1 and RSN_CD_2 are populated with same values (XZ for eg) including their amounts, then insert 1 row with line = 1002, SVC_DT as max(svc_dt) of that ID and create a new column FINAL_AMT and populate it with sum of RSN_AMT_1 and RSN_AMT_2.

In the new lines created, the rest of the amt columns/ SVC_CD columns are set to NULL or 0. Below is the final output I am looking for

Can some one please help me out how to do it?

biopharma · Posted 05-27-2020 01:34 PM

Can you post code to create the dataset that you HAVE? For example see how this poster has created.

https://communities.sas.com/t5/SAS-Programming/Covert-date-from-character-or-numeric-before-append/m...

General instructions to create code out of a dataset are here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

shanmukh2 · Posted 05-27-2020 02:20 PM

Sure, here you go

data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50
D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;

shanmukh2 · Posted 05-27-2020 02:47 PM

Sorry, this is much better formatted.

data demo;
infile datalines missover dlm='';
input @1 id $1-4 @6 line @8 rsn_cd_1 $8-9 @11 rsn_cd_2 $11-12 @14 svc_cd $14-16 @18 svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA ABC 11JAN2020 10 20 30 40 0
D100 2 XA ABC 15JAN2020 0 0 0 40 0
D200 1 XZ DEF 13MAY2020 0 0 0 0 50

D200 2 XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;

shanmukh2 · Posted 05-27-2020 03:04 PM

I have formatted better, in the below message

biopharma · Posted 05-27-2020 03:07 PM

If something more elegant occurs, I will post later. I am sure you will hear from some more people with nifty solutions. In the meantime try this:

data want ;
   do _n_ = 1 by 1 until (last.id) ;
      set demo end = eof1 ;
      by id ;
      array r(2) ;
      array rsn_cd_ (2) $ ;
      array rsn_amt_ (2) ;
      array chkc (2) $ ("XA" "XZ") ;
      r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ;
      maxdt = max(maxdt,svc_dt) ;
   end ;
   
   do _n_ = 1 by 1 until (last.id) ;
      set demo end = eof2 ;
      by id ;
      final_amt = 0 ;
      output ;
      if r(_n_) then 
         do ;
            line = 1000 + _n_ ;
            svc_dt = maxdt ;
            final_amt = rsn_amt_(_n_) ;
            call missing (of rsn_cd_(*)) ;
            do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ;
            output ;
         end ;
   end ;
   drop maxdt chkc1 chkc2 r1 r2 i ;
run ;

proc sort ; 
  by id line ;
run ;

biopharma · Posted 05-27-2020 03:23 PM

Earlier code was copied and pasted code from one version too early:

data want ;
   do _n_ = 1 by 1 until (last.id) ;
      set demo end = eof1 ;
      by id ;
      array r(2) ;
      array rsn_cd_ (2) $ ;
      array rsn_amt_ (2) ;
      array chkc (2) $ ("XA" "XZ") ;
      r(_n_) = (rsn_cd_(_n_) = chkc(_n_)) * rsn_amt_(_n_) ;
      maxdt = max(maxdt,svc_dt) ;
   end ;
   
   do _n_ = 1 by 1 until (last.id) ;
      set demo end = eof2 ;
      by id ;
      final_amt = 0 ;
      output ;
      if r(_n_) then 
         do ;
            line = 1000 + _n_ ;
            svc_dt = maxdt ;
            final_amt = rsn_amt_(_n_) ;
            if rsn_cd_1 = rsn_cd_2 then final_amt = sum(of rsn_amt_(*)) ;
            call missing (of rsn_cd_(*)) ;
            do i = 1 to 2 ; rsn_amt_(i) = min(0,rsn_amt_(i)) ; end ;
            output ;
         end ;
   end ;
   drop maxdt chkc1 chkc2 r1 r2 i ;
run ;

proc sort ; 
  by id line ;
run ;

novinosrin · Posted 05-27-2020 05:19 PM

Not recommending this approach, however methinks this is no more than a solution that encompasses a one pass of by group for the Max(svc date) and then followed by boring IF THEN verbose, which the following is little incomplete for the non missing RSN_CD series as I am very lazy. Feel free to play around and complete if you would like to try.


data demo;
infile datalines missover dlm='';
input @1 id $ line  rsn_cd_1 $  rsn_cd_2 $ svc_cd $ svc_dt date9. amt_1 amt_2 amt_3 RSN_AMT_1 RSN_AMT_2;
format svc_dt mmddyy10.;
datalines;
D100 1 XA  .  ABC 11JAN2020 10 20 30 40 0
D100 2 XA  .  ABC 15JAN2020 0 0 0 40 0
D200 1  .  XZ DEF 13MAY2020 0 0 0 0 50
D200 2  .  XZ GHI 25MAY2020 0 0 0 0 50
D500 1 XA XZ PQR 20JUN2020 0 0 0 10 30
D500 2 XA XZ GHI 28JUN2020 0 0 0 10 30
D800 1 XZ XZ ABC 27MAY2020 0 0 0 20 40
D800 2 XZ XZ PQR 20JUN2020 0 0 0 20 40
;
run;

data want;
 do until(last.id);
  set demo;
  by id;
  _m=svc_dt max _m;
  Final_amt=0;
  output;
 end;
 retain _iorc_ 1000;
 array t RSN_AMT_:;
 if cmiss(of rsn_cd_:) then do;
  _n_=whichc(coalescec(of rsn_cd_:),of rsn_cd_:);
  line=_iorc_+_n_;
  Final_amt=t(_n_);
  svc_dt=_m;
  call missing(of rsn_cd_:,of t(*),svc_cd);
  call pokelong(repeat(put(0,rb8.),1),addrlong(t(1)),16);
  output;
 end;
/*One can continue with else do; for non missing rsn_cd: series*/
/*blah blah blah and so forth*/
 drop _:;
run;

SAS Training: Just a Click Away