Solved: How to Get one row before depending on a condition?

ertr · Posted 02-19-2021 08:20 PM

Hello everyone,

I try to bring the YEARMONTHDATE of the latest payment as a new variable when there is no payment in my data set. I created a sample table like the one below.

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

The code I wrote is as follows but for some reason, it didn't work

DATA WANT;
SET HAVE;
IF PAYMENT>0 OR PAYMENT NE . THEN DO;
PAYMENT_EXIST_DATE=YEARMONTHDATE;
CALL SYMPUTX("PAYMENT_EXIST_DATE",PUT(PAYMENT_EXIST_DATE,DATE9.));
%PUT &PAYMENT_EXIST_DATE.;
END;
ELSE DO;
PAYMENT_EXIST_DATE="&PAYMENT_EXIST_DATE."D;
END;
TIMESINCELASTPAYMENT=INTCK("month",PAYMENT_EXIST_DATE,YEARMONTHDATE);
FORMAT PAYMENT_EXIST_DATE DATE9.;
RUN;

The table I desire is as follows, can you help me, please?

mkeintz · Posted 02-20-2021 02:27 PM

The first 2 observations for ID2 (JAN and FEB 2020) have no payments. You assign those obs a payment_exist_date in the future, namely 31DEC2020. I assume that when an ID starts out with no payment, you retrieve the date of the latest (future) payment.

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

data want ;
  set have (where=(payment^=.) in=firstpass) 
      have (in=secondpass);
  by id;
  retain payment_exist_date;

  if first.id then call missing(payment_exist_date);

  if firstpass then payment_exist_date=yearmonthdate;

  if secondpass;
  if payment^=. then payment_exist_date=yearmonthdate;
  if yearmonthdate>=payment_exist_date then time_since=intck('month',payment_exist_date,yearmonthdate);
  format payment_exist_date date9. ;
run;

This program passes through each ID twice. The first pass reads only dates with payments, and constantly reassigns payment_exist_date to the progression of yearmonthdate values. Then the second pass will start out with the latest value for that id, and will update payment_exist_date only for valid payments. It calculates the interval only when payment_exist_date is not a date in the future compared to yearmonthdate.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Shmuel · Posted 02-20-2021 03:02 AM

Whenever you need to do something with an observation coming prior to the one by which yo make the decision when to do it use sort ascending, thus the condition is done prion the observation to deal.

Check next code:

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
; run;
proc sort data=have out=temp;
 by ID descending yearmonthdate;
run;
data want;
 set temp;
  by ID;
     retain payment_exist_date;
     FORMAT YEARMONTHDATE payment_exist_date DATE9.;
     if first.ID then payment_exist_date = yearmonthdate; /* ??? */
     if not missing(payment) 
     then payment_exist_date=yearmonthdate;
run;    
proc sort data=want;
 by ID yearmonthdate;
run;

I'm note sure I understood the exact condition and which observations will get the correct payment_exist_date.

ertr · Posted 02-21-2021 07:08 AM

Hello Shumel,

Everything seems correct but there is only one thing that I should fix, since ID 2 begins without any payment, the code calculates like there is a payment at the beginning of ID 2`s date of inclusion in the data.

DATA WANT;
 SET HAVE;
  BY ID;
     RETAIN PAYMENT_EXIST_DATE;
     FORMAT YEARMONTHDATE PAYMENT_EXIST_DATE DATE9.;
     IF FIRST.ID THEN PAYMENT_EXIST_DATE = YEARMONTHDATE; 
     IF PAYMENT >0 
     THEN PAYMENT_EXIST_DATE=YEARMONTHDATE;
	 TIMESINCELASTPAYMENT=INTCK("MONTH",PAYMENT_EXIST_DATE,YEARMONTHDATE);
	 KEEP ID YEARMONTHDATE PAYMENT PAYMENT_EXIST_DATE TIMESINCELASTPAYMENT;
RUN;

Thanks

mkeintz · Posted 02-20-2021 02:27 PM

The first 2 observations for ID2 (JAN and FEB 2020) have no payments. You assign those obs a payment_exist_date in the future, namely 31DEC2020. I assume that when an ID starts out with no payment, you retrieve the date of the latest (future) payment.

DATA HAVE;
LENGTH ID 8 PAYMENT 8 YEARMONTHDATE 8;
INFILE DATALINES MISSOVER;
INPUT ID PAYMENT YEARMONTHDATE;
FORMAT YEARMONTHDATE DATE9.;
DATALINES;
1 1000 21945
1 . 21974
1 1000 22005
1 . 22035
1 1000 22066
1 1000 22096
1 . 22127
1 1000 22158
1 1000 22188
1 1000 22219
1 . 22249
1 1000 22280
2 . 21945
2 . 21974
2 1000 22005
2 . 22035
2 1000 22066
2 1000 22096
2 . 22127
2 1000 22158
2 1000 22188
2 1000 22219
2 . 22249
2 1000 22280
;

data want ;
  set have (where=(payment^=.) in=firstpass) 
      have (in=secondpass);
  by id;
  retain payment_exist_date;

  if first.id then call missing(payment_exist_date);

  if firstpass then payment_exist_date=yearmonthdate;

  if secondpass;
  if payment^=. then payment_exist_date=yearmonthdate;
  if yearmonthdate>=payment_exist_date then time_since=intck('month',payment_exist_date,yearmonthdate);
  format payment_exist_date date9. ;
run;

This program passes through each ID twice. The first pass reads only dates with payments, and constantly reassigns payment_exist_date to the progression of yearmonthdate values. Then the second pass will start out with the latest value for that id, and will update payment_exist_date only for valid payments. It calculates the interval only when payment_exist_date is not a date in the future compared to yearmonthdate.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

Re: How to Get one row before depending on a condition?

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away