Solved: Re: Getting max value by group and with a predefined time horizon.

inid · Posted 05-12-2019 04:42 PM

I have a data set structured like below.

ID	Date	Status
x	1/31/2017	0
x	1/31/2017	0
x	2/28/2017	1
y	1/31/2017	0
y	2/28/2017	1
z	1/31/2017	0
z	1/31/2017	0

I want to get the maximum value of column status 1 year from actual date. to do this i am using proc sql but it isn't functioning as it should (i am quite new to this).

what i want to get is:

ID	Date	Status	Status after 1 year
1	12/31/2016	0	0
1	1/31/2017	0	1
1	1/31/2018	1

what i get:

ID	Date	Status	Status after 1 year
1	12/31/2016	0	1
1	1/31/2017	0	1
1	1/31/2018	1	1

The code i have used:

Proc sql;

Select Customer ID, Date, Status, Max (Status) as Status after 1 year

from db t1

where Date between t1.Date and t1.date + 365

group by Customer ID, Date

thank you in advance. is there any way to do this without proc sql? if not could you please provide any hint on how to proceed? I am using sas enterprise guide 7.1

Patrick · Posted 05-13-2019 06:07 AM

@inid

Nothing to be sorry about. Below code should return what you're after.

The yrdif() function as I've used it calculates the AGE between two dates. As I understand you this should be something between 0 and exactly 1 (if the end date is one day later yrdif() would return a value of 1 with some decimals).

data have;
 infile datalines truncover dlm=' ' dsd;
 input ID Observation_date:anydtdte. Status;
 format Observation_date date9.;
 datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;

proc sql;
  select distinct
    l.*, 
    max(r.status) as max_status_within_year
  from 
    have l
      left join
    have r
      on 
        l.id=r.id 
        and yrdif(l.Observation_date,r.Observation_date) between 0 and 1
/*        and yrdif(intnx('month',l.Observation_date,0,'b'),intnx('month',r.Observation_date,0,'b')) between 0 and 1*/
/*        and intck('month',l.Observation_date,r.Observation_date) between 0 and 12*/
  group by l.id,l.Observation_date
  having max(r.status)=r.status
  ;
quit;

The first alternative condition in comments first aligns the two dates to the beginning of the month and only then calculates the AGE.

The 2nd alternative condition calculates the number of moths boundaries between two dates (31Jan2017, 01Feb2017 would return 1).

View solution in original post

ChrisNZ · Posted 05-12-2019 06:41 PM

Welcome to the forums!

1. Your input and output dates are different

2. Your input and output IDs are different

3. Post input data as a data step

How to convert datasets to data steps (You can use this macro)
How to post code

4. What does "1 year from actual date" mean? On the very same date a year later? Is there an interval to consider?

High-Performance SAS Coding - Third Edition

inid · Posted 05-12-2019 07:08 PM

Hi. thank you for the comment.

The initial data look something like below:

ID	Observation Date	Status
1	12/31/2016	0
1	1/31/2017	0
1	12/31/2017	0
1	1/31/2018	1

For each ID and for each data date (i have monthly data) i have their status. I want to add a column where it is shown the maximum value of Status 1 year from the observation date. For example in the first observation I want to know ID=1 status 1 year from observation date which would be 12/31/2017 and would be equal to 0. The same for the second observation i want to know its status 1 year from the 1/31/2017 so until 1/31/2018 and in this case it would be equal to 1.

1 year from now would mean 370 days added to observation date (considering that the monthly date is always on end of month but may vary with 1-2 days..

What I would like to have as a result is:

ID	Date	Status	Status after 1 year
1	12/31/2016	0	0
1	1/31/2017	0	1
1	12/31/2017	0	1
1	1/31/2018	1	1

for the last 2 observations even though 1 full year has not i want to have the maximum value of the available dates.

ScottBass · Posted 05-12-2019 09:52 PM

I pasted this into SAS:

ID	Observation Date	Status
1	12/31/2016	0
1	1/31/2017	0
1	12/31/2017	0
1	1/31/2018	1

but I get errors.

Edit your post so we can cut-and-paste your code into SAS. Don't make us do your work converting your post into valid SAS code.

Same goes for your desired results.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

inid · Posted 05-13-2019 03:11 AM

ID Observation_date Status

1 31/12/2016 0

1 31/01/2017 0

1 31/12/2017 0

1 31/01/2018 1

Sorry I did not have access to SAS when I wrote the issue.

this is the data copy pasted from SAS data.

The code I use is Below:

Proc sql;

SELECT

ID,OBSERVATION_DATE, Status, MAX(Status) FORMAT=BEST4. AS Status_after_1_year

FROM work.datatest t1

where Observation_date between t1.Observation_date and t1.Observation_date+365

GROUP BY ID

oRDER BY Observation_date asc;

QUIT;

the results i get:

ID	Observation_date	Status	Status_after_1_year
1	31/12/2016	0	1
1	31/01/2017	0	1
1	31/12/2017	0	1
1	31/01/2018	1	1

In the first observation i want the "Status_after_1_year" to be =0 as the maximum value 1 is reached only on 31 January 2018 and therefore the value=1 should start at the second observation.

I look forward to your feedback.

thank you in advance

PeterClemmensen · Posted 05-13-2019 03:24 AM

And you want to do so without PROC SQL, correct?

The DATA to DATA Step Macro
Blog: SASnrd

inid · Posted 05-13-2019 03:36 AM

No with Proc SQL is fine as well.

PeterClemmensen · Posted 05-13-2019 03:51 AM

Ok. Here is a data step approach, that gives you what you want

data have;
input ID Observation_date:ddmmyy10. Status;
format Observation_date ddmmyy10.;
datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 0
1 31/01/2018 1
2 31/12/2016 0
2 31/01/2017 0
2 31/12/2017 0
2 31/01/2018 1
;

data want(drop=max_date);
   do until (last.id);
      set have;
      by id;
      if Observation_date ge Max_date then Max_date=Observation_date;
   end;

   do until (last.id);
      set have;
      by id;
      Status_after_1_year=0;
      if intck('day', Observation_date, Max_date) le 365 then
         Status_after_1_year=1;
      output;
   end;
run;

This gives

The DATA to DATA Step Macro
Blog: SASnrd

inid · Posted 05-13-2019 04:30 AM

Maybe I was not clear but this does not work from me since i want to know the maximum value column "status" has and not just the difference between the two dates. For example if the data would be as below:

ID Observation_date Status

1 31/12/2016 0

1 31/01/2017 0

1 31/12/2017 1

1 31/01/2018 2

then the desired output wold be:

ID	Observation_date	Status	Status_after_1_year
1	31/12/2016	0	1
1	31/01/2017	0	2
1	31/12/2017	0	2
1	31/01/2018	1	2

Patrick · Posted 05-13-2019 04:47 AM

@inid

To better derive the rules you want to implement can you please show us how the desired result would look like for below sample data.

data have;
 infile datalines truncover dlm=' ' dsd;
 input ID Observation_date:anydtdte. Status;
 format Observation_date date9.;
 datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;

inid · Posted 05-13-2019 05:14 AM

Hi. the results woul look like below: (Considering that in my database the end of month date is not always the same i want the difference between the two observation date considered to be <=370. )

ID	Observation_date	Status	Status_after_1_year	Logic
1	31/12/2016	0	1	Maximum Value of Status from 31/12/2016 to 31/12/2017
1	31/01/2017	0	1	Maximum Value of Status from 31/01/2017 to 31/01/2018
1	31/12/2017	1	2	Maximum Value of Status from 31/12/2017 to 31/12/2018
1	31/01/2018	1	2	Maximum Value of Status from 31/01/2018 to 31/01/2019
2	31/03/2018	2	2	Maximum Value of Status from 31/03/2018 to 31/03/2019
3	31/01/2020	3	3	Maximum Value of Status from 31/01/2020 to 31/01/2021

Patrick · Posted 05-13-2019 05:20 AM

@inid

Sorry for continuing asking instead of providing a solution. I just try to get full clarity first.

Soo... about the 370 days and "Considering that in my database the end of month date is not always the same":

Are you aware that SAS provides calendar functions which would make it very simple to shift any date to the beginning or end of month before a date comparison? There is also a function which could just return the number of years or months or ... between two dates.

Could any of this be useful for your task? Would shifting the dates prior to comparing them help?

inid · Posted 05-13-2019 05:26 AM

I am sorry I am quite new to sas.
It would definitely help t have the end of month date and i was not aware i could achieve it. Never the less my main concern would be get the maximum value of the given column 1 year from the observation date of each row for each ID.
Thank you for the time spent.

Patrick · Posted 05-13-2019 06:07 AM

@inid

Nothing to be sorry about. Below code should return what you're after.

The yrdif() function as I've used it calculates the AGE between two dates. As I understand you this should be something between 0 and exactly 1 (if the end date is one day later yrdif() would return a value of 1 with some decimals).

data have;
 infile datalines truncover dlm=' ' dsd;
 input ID Observation_date:anydtdte. Status;
 format Observation_date date9.;
 datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;

proc sql;
  select distinct
    l.*, 
    max(r.status) as max_status_within_year
  from 
    have l
      left join
    have r
      on 
        l.id=r.id 
        and yrdif(l.Observation_date,r.Observation_date) between 0 and 1
/*        and yrdif(intnx('month',l.Observation_date,0,'b'),intnx('month',r.Observation_date,0,'b')) between 0 and 1*/
/*        and intck('month',l.Observation_date,r.Observation_date) between 0 and 12*/
  group by l.id,l.Observation_date
  having max(r.status)=r.status
  ;
quit;

The first alternative condition in comments first aligns the two dates to the beginning of the month and only then calculates the AGE.

The 2nd alternative condition calculates the number of moths boundaries between two dates (31Jan2017, 01Feb2017 would return 1).

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away