I have a data set structured like below.
ID | Date | Status |
x | 1/31/2017 | 0 |
x | 1/31/2017 | 0 |
x | 2/28/2017 | 1 |
y | 1/31/2017 | 0 |
y | 2/28/2017 | 1 |
z | 1/31/2017 | 0 |
z | 1/31/2017 | 0 |
I want to get the maximum value of column status 1 year from actual date. to do this i am using proc sql but it isn't functioning as it should (i am quite new to this).
what i want to get is:
ID | Date | Status | Status after 1 year |
1 | 12/31/2016 | 0 | 0 |
1 | 1/31/2017 | 0 | 1 |
1 | 1/31/2018 | 1 |
what i get:
ID | Date | Status | Status after 1 year |
1 | 12/31/2016 | 0 | 1 |
1 | 1/31/2017 | 0 | 1 |
1 | 1/31/2018 | 1 | 1 |
The code i have used:
Proc sql;
Select Customer ID, Date, Status, Max (Status) as Status after 1 year
from db t1
where Date between t1.Date and t1.date + 365
group by Customer ID, Date
thank you in advance. is there any way to do this without proc sql? if not could you please provide any hint on how to proceed? I am using sas enterprise guide 7.1
Nothing to be sorry about. Below code should return what you're after.
The yrdif() function as I've used it calculates the AGE between two dates. As I understand you this should be something between 0 and exactly 1 (if the end date is one day later yrdif() would return a value of 1 with some decimals).
data have;
infile datalines truncover dlm=' ' dsd;
input ID Observation_date:anydtdte. Status;
format Observation_date date9.;
datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;
proc sql;
select distinct
l.*,
max(r.status) as max_status_within_year
from
have l
left join
have r
on
l.id=r.id
and yrdif(l.Observation_date,r.Observation_date) between 0 and 1
/* and yrdif(intnx('month',l.Observation_date,0,'b'),intnx('month',r.Observation_date,0,'b')) between 0 and 1*/
/* and intck('month',l.Observation_date,r.Observation_date) between 0 and 12*/
group by l.id,l.Observation_date
having max(r.status)=r.status
;
quit;
The first alternative condition in comments first aligns the two dates to the beginning of the month and only then calculates the AGE.
The 2nd alternative condition calculates the number of moths boundaries between two dates (31Jan2017, 01Feb2017 would return 1).
Welcome to the forums!
1. Your input and output dates are different
2. Your input and output IDs are different
3. Post input data as a data step
How to convert datasets to data steps (You can use this macro)
How to post code
4. What does "1 year from actual date" mean? On the very same date a year later? Is there an interval to consider?
Hi. thank you for the comment.
The initial data look something like below:
ID | Observation Date | Status |
1 | 12/31/2016 | 0 |
1 | 1/31/2017 | 0 |
1 | 12/31/2017 | 0 |
1 | 1/31/2018 | 1 |
For each ID and for each data date (i have monthly data) i have their status. I want to add a column where it is shown the maximum value of Status 1 year from the observation date. For example in the first observation I want to know ID=1 status 1 year from observation date which would be 12/31/2017 and would be equal to 0. The same for the second observation i want to know its status 1 year from the 1/31/2017 so until 1/31/2018 and in this case it would be equal to 1.
1 year from now would mean 370 days added to observation date (considering that the monthly date is always on end of month but may vary with 1-2 days..
What I would like to have as a result is:
ID | Date | Status | Status after 1 year |
1 | 12/31/2016 | 0 | 0 |
1 | 1/31/2017 | 0 | 1 |
1 | 12/31/2017 | 0 | 1 |
1 | 1/31/2018 | 1 | 1 |
for the last 2 observations even though 1 full year has not i want to have the maximum value of the available dates.
I pasted this into SAS:
ID Observation Date Status
1 12/31/2016 0
1 1/31/2017 0
1 12/31/2017 0
1 1/31/2018 1
but I get errors.
Edit your post so we can cut-and-paste your code into SAS. Don't make us do your work converting your post into valid SAS code.
Same goes for your desired results.
ID Observation_date Status
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 0
1 31/01/2018 1
Sorry I did not have access to SAS when I wrote the issue.
this is the data copy pasted from SAS data.
The code I use is Below:
Proc sql;
SELECT
ID,OBSERVATION_DATE, Status, MAX(Status) FORMAT=BEST4. AS Status_after_1_year
FROM work.datatest t1
where Observation_date between t1.Observation_date and t1.Observation_date+365
GROUP BY ID
oRDER BY Observation_date asc;
QUIT;
the results i get:
ID | Observation_date | Status | Status_after_1_year |
1 | 31/12/2016 | 0 | 1 |
1 | 31/01/2017 | 0 | 1 |
1 | 31/12/2017 | 0 | 1 |
1 | 31/01/2018 | 1 | 1 |
In the first observation i want the "Status_after_1_year" to be =0 as the maximum value 1 is reached only on 31 January 2018 and therefore the value=1 should start at the second observation.
I look forward to your feedback.
thank you in advance
And you want to do so without PROC SQL, correct?
Ok. Here is a data step approach, that gives you what you want
data have;
input ID Observation_date:ddmmyy10. Status;
format Observation_date ddmmyy10.;
datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 0
1 31/01/2018 1
2 31/12/2016 0
2 31/01/2017 0
2 31/12/2017 0
2 31/01/2018 1
;
data want(drop=max_date);
do until (last.id);
set have;
by id;
if Observation_date ge Max_date then Max_date=Observation_date;
end;
do until (last.id);
set have;
by id;
Status_after_1_year=0;
if intck('day', Observation_date, Max_date) le 365 then
Status_after_1_year=1;
output;
end;
run;
This gives
Maybe I was not clear but this does not work from me since i want to know the maximum value column "status" has and not just the difference between the two dates. For example if the data would be as below:
ID Observation_date Status
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 2
then the desired output wold be:
ID | Observation_date | Status | Status_after_1_year |
1 | 31/12/2016 | 0 | 1 |
1 | 31/01/2017 | 0 | 2 |
1 | 31/12/2017 | 0 | 2 |
1 | 31/01/2018 | 1 | 2 |
To better derive the rules you want to implement can you please show us how the desired result would look like for below sample data.
data have;
infile datalines truncover dlm=' ' dsd;
input ID Observation_date:anydtdte. Status;
format Observation_date date9.;
datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;
Hi. the results woul look like below: (Considering that in my database the end of month date is not always the same i want the difference between the two observation date considered to be <=370. )
ID | Observation_date | Status | Status_after_1_year | Logic |
1 | 31/12/2016 | 0 | 1 | Maximum Value of Status from 31/12/2016 to 31/12/2017 |
1 | 31/01/2017 | 0 | 1 | Maximum Value of Status from 31/01/2017 to 31/01/2018 |
1 | 31/12/2017 | 1 | 2 | Maximum Value of Status from 31/12/2017 to 31/12/2018 |
1 | 31/01/2018 | 1 | 2 | Maximum Value of Status from 31/01/2018 to 31/01/2019 |
2 | 31/03/2018 | 2 | 2 | Maximum Value of Status from 31/03/2018 to 31/03/2019 |
3 | 31/01/2020 | 3 | 3 | Maximum Value of Status from 31/01/2020 to 31/01/2021 |
Sorry for continuing asking instead of providing a solution. I just try to get full clarity first.
Soo... about the 370 days and "Considering that in my database the end of month date is not always the same":
Are you aware that SAS provides calendar functions which would make it very simple to shift any date to the beginning or end of month before a date comparison? There is also a function which could just return the number of years or months or ... between two dates.
Could any of this be useful for your task? Would shifting the dates prior to comparing them help?
Nothing to be sorry about. Below code should return what you're after.
The yrdif() function as I've used it calculates the AGE between two dates. As I understand you this should be something between 0 and exactly 1 (if the end date is one day later yrdif() would return a value of 1 with some decimals).
data have;
infile datalines truncover dlm=' ' dsd;
input ID Observation_date:anydtdte. Status;
format Observation_date date9.;
datalines;
1 31/12/2016 0
1 31/01/2017 0
1 31/12/2017 1
1 31/01/2018 1
1 31/03/2018 2
1 31/01/2020 3
;
run;
proc sql;
select distinct
l.*,
max(r.status) as max_status_within_year
from
have l
left join
have r
on
l.id=r.id
and yrdif(l.Observation_date,r.Observation_date) between 0 and 1
/* and yrdif(intnx('month',l.Observation_date,0,'b'),intnx('month',r.Observation_date,0,'b')) between 0 and 1*/
/* and intck('month',l.Observation_date,r.Observation_date) between 0 and 12*/
group by l.id,l.Observation_date
having max(r.status)=r.status
;
quit;
The first alternative condition in comments first aligns the two dates to the beginning of the month and only then calculates the AGE.
The 2nd alternative condition calculates the number of moths boundaries between two dates (31Jan2017, 01Feb2017 would return 1).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.