DATA Step, Macro, Functions and more

"Normalize" values of a variable across a BY GROUP

Reply
Contributor
Posts: 53

"Normalize" values of a variable across a BY GROUP

Hello all,

 

In the dataset HAVE, on 31/05/2017 there was no PMT for id=101 (system automatically assigns negative values for the required amounts).

Subsequently id=101 paid the missing installments on 31/07/2017 (so I have four PMT records)

 

I would like to create the dataset WANT, where:

 

1.__  I will delete negative pmts at this date,31/05/2017 , and re-create the erased records with the double ones from 31/07/2017 

2.---  Populate the missing two PMTs for 30/10/2017

 

I would be grateful for any hint.

 

Thanking you in advance.

 

 

data have;

input id date ddmmyy10. pmt;

format date date9.;

datalines;

 

105 30/11/2017 24.49

105 30/11/2017 10.51

105 30/09/2017 24.49

105 30/09/2017 10.51

105 31/08/2017 24.49

105 31/08/2017 10.51

105 31/07/2017 24.49

105 31/07/2017 10.51

105 31/07/2017 10.51

105 31/07/2017 24.49

105 30/06/2017 24.49

105 30/06/2017 10.51

105 31/05/2017 -24.49

105 31/05/2017 -10.51

105 30/04/2017 24.49

105 30/04/2017 10.51

105 31/03/2017 24.49

105 31/03/2017 10.51

105 28/02/2017 24.49

105 28/02/2017 10.51

105 31/01/2017 24.49

;

run;

data want;

input id date ddmmyy10. pmt;

format date date9.;

datalines;

 

105 30/11/2017 24.49

105 30/11/2017 10.51

105 30/10/2017 24.49

105 30/102017 10.51

105 30/09/2017 24.49

105 30/09/2017 10.51

105 31/08/2017 24.49

105 31/08/2017 10.51

105 31/07/2017 24.49

105 31/07/2017 10.51

105 30/06/2017 24.49

105 30/06/2017 10.51

105 31/05/2017 24.49

105 31/05/2017 10.51

105 30/04/2017 24.49

105 30/04/2017 10.51

105 31/03/2017 24.49

105 31/03/2017 10.51

105 28/02/2017 24.49

105 28/02/2017 10.51

105 31/01/2017 24.49

;

run;

 

PROC Star
Posts: 1,300

Re: "Normalize" values of a variable across a BY GROUP

[ Edited ]
Posted in reply to Zeus_Olympous

is it just one scenario where missing two PMTs for 30/10/2017 occur or missing may occur for various dates? May i ask have you covered the entire scenarios? Just curious. sorry.

 

Also, should i assume the payment is the last day of the month? If yes, shouldn't it be 31st oct as opposed to 30/10/2017 coz some months have 30 days and some have 31

Contributor
Posts: 53

Re: "Normalize" values of a variable across a BY GROUP

Posted in reply to novinosrin

Thank you for your message.

 

Yes, your observations are correct.

 

1.--Always at the end of the month (sorry for the ending months date mistakes.

2.-  Missing PMTs may occur for various dates and maybe more than two negative number but always they will be offset by the same number of positive PMTS at a later month (enddate).

 

Of course there are too many IDs that this situation can happen.

 

Thank you.

Super Contributor
Super Contributor
Posts: 260

Re: "Normalize" values of a variable across a BY GROUP

Posted in reply to Zeus_Olympous

It looks like you are essentially trying to reconstruct what should have occurred (payment schedule) from what actually occurred (payment record).  Perhaps your goal is some sort of calculation regarding the value of delayed payments- we don't know.

 

If I were doing something like this, I would guess a payment schedule already existed some place, that I didn't need to reconstruct it, and that instead I needed to seek it out. 

 

Perhaps I don't have enough information about your situation.

PROC Star
Posts: 1,300

Re: "Normalize" values of a variable across a BY GROUP

[ Edited ]
Posted in reply to Zeus_Olympous

Not the best to my satisfaction, but see if this helps: 

My observations: 1. seems like an ad-hoc fix using code .2 not a great database /DWH/DM design 3. Perhaps a DIY design without focusing on rules and some other considerations as my professor would say 

 

data have;

input id date ddmmyy10. pmt;

format date date9.;

datalines;

105 30/11/2017 24.49

105 30/11/2017 10.51

105 30/09/2017 24.49

105 30/09/2017 10.51

105 31/08/2017 24.49

105 31/08/2017 10.51

105 31/07/2017 24.49

105 31/07/2017 10.51

105 31/07/2017 10.51

105 31/07/2017 24.49

105 30/06/2017 24.49

105 30/06/2017 10.51

105 31/05/2017 -24.49

105 31/05/2017 -10.51

105 30/04/2017 24.49

105 30/04/2017 10.51

105 31/03/2017 24.49

105 31/03/2017 10.51

105 28/02/2017 24.49

105 28/02/2017 10.51

105 31/01/2017 24.49

;

run;

 

data want;

do  _n_=1 by 1 until(last.id);

     set have;

     by id descending date;

     _k=mod(_n_,2) ne 0;

     _prev_pmt1=lag1(pmt);

     _prev_pmt2=lag2(pmt);

     _k2=lag(date);

     _k3=lag2(date);

     if _k and date ne _end and date ne _k2 and not missing(_prev_pmt1) and not missing(_prev_pmt2) then

           do;

                _date=date;

                _pmt=pmt;

                date=_end;

                pmt=_prev_pmt2;

                output;

                pmt=_prev_pmt1;

                output;

                date=_date;

                pmt=_pmt;

                output;

           end;

    

          

     else if _k and date eq _k2 and date eq _k3 then

           do;

                _impute_for_negative1=pmt;

                continue;

           end;

     else if not _k and date eq _k2 and date eq _k3 then

           do;

                _impute_for_negative2=pmt;

                continue;

           end;

     else if _k and date eq _end and pmt<0 then

           do;

                pmt=_impute_for_negative1;

                output;

           end;

     else if not _k and pmt<0 then

           do;

                pmt=_impute_for_negative2;

                output;

           end;

     else output;

     _end=intnx('month', date, -1, 'e');

 

end;

drop _:;

run;

PROC Star
Posts: 1,300

Re: "Normalize" values of a variable across a BY GROUP

[ Edited ]
Posted in reply to Zeus_Olympous

Zeus_Olympous wrote:

Thank you for your message.

 

Yes, your observations are correct.

 

1.--Always at the end of the month (sorry for the ending months date mistakes.

2.-  Missing PMTs may occur for various dates and maybe more than two negative number but always they will be offset by the same number of positive PMTS at a later month (enddate). I think this will prompt a change to the code i gave you Smiley Sad

 

Of course there are too many IDs that this situation can happen.

 

Thank you.


 

Super User
Posts: 13,018

Re: "Normalize" values of a variable across a BY GROUP

Posted in reply to Zeus_Olympous

Zeus_Olympous wrote:

Thank you for your message.

 

2.-  Missing PMTs may occur for various dates and maybe more than two negative number but always they will be offset by the same number of positive PMTS at a later month (enddate).

 

 


And if there are more than two negative time periods how to identify which with the proper time period?? Is there some rule that would identify those? If the data is fixed payments such as car loans or similar with identical values as shown for your example data attaching them to the correct period gets a tad tricky. If the payments varied then it might be possible to identify based on value= - value. Maybe.

Contributor
Posts: 53

Re: "Normalize" values of a variable across a BY GROUP

If this is the case i.e. multiple months with negative values always there will be future months with double PMTs or triple PMTs.

 

Thank you.

Super User
Posts: 13,018

Re: "Normalize" values of a variable across a BY GROUP

Posted in reply to Zeus_Olympous

Zeus_Olympous wrote:

If this is the case i.e. multiple months with negative values always there will be future months with double PMTs or triple PMTs.

 

Thank you.


And how to identify which ones replace which?

PROC Star
Posts: 1,300

Re: "Normalize" values of a variable across a BY GROUP

Posted in reply to Zeus_Olympous

Hi, I would appreciate if you could make a more clear and comprehensive sample HAVE and WANT with all possible business scenarios to tackle. Also, please briefly explain the convert logic in some bullet points.

I am keen and willing to write a code at least for my learning purpose however to make use of such codes as a solution in communities, it can be helped to avoid rework or going back and forth. Thank you!

Ask a Question
Discussion stats
  • 9 replies
  • 153 views
  • 0 likes
  • 4 in conversation