BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ertr
Quartz | Level 8

Hello everyone,

 

I try to bring the YEARMONTHDATE of the latest payment as a new variable when there is no payment in my data set. I created a sample table like the one below.

 

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

 

The code I wrote is as follows but for some reason, it didn't work

DATA WANT;
SET HAVE;
IF PAYMENT>0 OR PAYMENT NE . THEN DO;
PAYMENT_EXIST_DATE=YEARMONTHDATE;
CALL SYMPUTX("PAYMENT_EXIST_DATE",PUT(PAYMENT_EXIST_DATE,DATE9.));
%PUT &PAYMENT_EXIST_DATE.;
END;
ELSE DO;
PAYMENT_EXIST_DATE="&PAYMENT_EXIST_DATE."D;
END;
TIMESINCELASTPAYMENT=INTCK("month",PAYMENT_EXIST_DATE,YEARMONTHDATE);
FORMAT PAYMENT_EXIST_DATE DATE9.;
RUN;

The table I desire is as follows, can you help me, please?

 

Desired.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The first 2 observations for ID2 (JAN and FEB 2020) have no payments.  You assign those obs a payment_exist_date in the future, namely 31DEC2020.  I assume that when an ID starts out with no payment, you retrieve the date of the latest (future) payment.

 

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

data want ;
  set have (where=(payment^=.) in=firstpass) 
      have (in=secondpass);
  by id;
  retain payment_exist_date;

  if first.id then call missing(payment_exist_date);

  if firstpass then payment_exist_date=yearmonthdate;

  if secondpass;
  if payment^=. then payment_exist_date=yearmonthdate;
  if yearmonthdate>=payment_exist_date then time_since=intck('month',payment_exist_date,yearmonthdate);
  format payment_exist_date date9. ;
run;

This program passes through each ID twice.  The first pass reads only dates with payments, and constantly reassigns payment_exist_date to the progression of yearmonthdate values.  Then the second pass will start out with the latest value for that id, and will update payment_exist_date only for valid payments.  It calculates the interval only when payment_exist_date is not a date in the future compared to yearmonthdate.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
Shmuel
Garnet | Level 18

Whenever you need to do something with an observation coming prior to the one by which yo make the decision when to do it use sort ascending, thus the condition is done prion the observation to deal.

Check next code:

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
; run;
proc sort data=have out=temp;
 by ID descending yearmonthdate;
run;
data want;
 set temp;
  by ID;
     retain payment_exist_date;
     FORMAT YEARMONTHDATE payment_exist_date DATE9.;
     if first.ID then payment_exist_date = yearmonthdate; /* ??? */
     if not missing(payment) 
     then payment_exist_date=yearmonthdate;
run;    
proc sort data=want;
 by ID yearmonthdate;
run;     

I'm note sure I understood the exact condition and which observations will get the correct payment_exist_date.

 

ertr
Quartz | Level 8

Hello Shumel,

 

Everything seems correct but there is only one thing that I should fix, since ID 2 begins without any payment, the code calculates like there is a payment at the beginning of ID 2`s date of inclusion in the data.

 

DATA WANT;
 SET HAVE;
  BY ID;
     RETAIN PAYMENT_EXIST_DATE;
     FORMAT YEARMONTHDATE PAYMENT_EXIST_DATE DATE9.;
     IF FIRST.ID THEN PAYMENT_EXIST_DATE = YEARMONTHDATE; 
     IF PAYMENT >0 
     THEN PAYMENT_EXIST_DATE=YEARMONTHDATE;
	 TIMESINCELASTPAYMENT=INTCK("MONTH",PAYMENT_EXIST_DATE,YEARMONTHDATE);
	 KEEP ID YEARMONTHDATE PAYMENT PAYMENT_EXIST_DATE TIMESINCELASTPAYMENT;
RUN; 

Thanks

 

 

mkeintz
PROC Star

The first 2 observations for ID2 (JAN and FEB 2020) have no payments.  You assign those obs a payment_exist_date in the future, namely 31DEC2020.  I assume that when an ID starts out with no payment, you retrieve the date of the latest (future) payment.

 

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

data want ;
  set have (where=(payment^=.) in=firstpass) 
      have (in=secondpass);
  by id;
  retain payment_exist_date;

  if first.id then call missing(payment_exist_date);

  if firstpass then payment_exist_date=yearmonthdate;

  if secondpass;
  if payment^=. then payment_exist_date=yearmonthdate;
  if yearmonthdate>=payment_exist_date then time_since=intck('month',payment_exist_date,yearmonthdate);
  format payment_exist_date date9. ;
run;

This program passes through each ID twice.  The first pass reads only dates with payments, and constantly reassigns payment_exist_date to the progression of yearmonthdate values.  Then the second pass will start out with the latest value for that id, and will update payment_exist_date only for valid payments.  It calculates the interval only when payment_exist_date is not a date in the future compared to yearmonthdate.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 601 views
  • 2 likes
  • 3 in conversation