Good morning.
I am attempting to find the first day of each month for which a record exits. For instance, person_1 may have been involved from 1/10/2012 through 8/17/2020 while another may have very different dates of involvement. What I need is the first day of each month in the range where the person existed. So person_1 would exist in 2/1/2012, 3/1/2012, 4/1/2012 all the way to 8/1/2020 and be counted on each first of each month. Unfortunately I don't have a clear "date" from which to count from I only have a range to count within.
Could anyone point me in the right direction?
Thanks very much.
Have
person | start | end |
PERSON_1 | 1/10/2012 | 8/17/2020 |
PERSON_2 | 10/2/2020 | 10/30/2020 |
PERSON_3 | 9/30/2009 | 9/30/2021 |
PERSON_4 | 10/15/2013 | 2/19/2014 |
PERSON_5 | 2/1/2017 | 12/31/2021 |
PERSON_6 | 2/1/2017 | 10/3/2021 |
PERSON_7 | 6/29/2012 | 11/30/2018 |
PERSON_8 | 7/6/2014 | 11/15/2016 |
PERSON_9 | 7/29/2013 | 6/14/2017 |
PERSON_10 | 1/6/2014 | 6/19/2019 |
data have;
infile cards dlm='09'x;
input person :$10. start :mmddyy10. end :mmddyy10.;
format start end date9.;
cards;
PERSON_1 1/10/2012 8/17/2020
PERSON_2 10/2/2020 10/30/2020
PERSON_3 9/30/2009 9/30/2021
PERSON_4 10/15/2013 2/19/2014
PERSON_5 2/1/2017 12/31/2021
PERSON_6 2/1/2017 10/3/2021
PERSON_7 6/29/2012 11/30/2018
PERSON_8 7/6/2014 11/15/2016
PERSON_9 7/29/2013 6/14/2017
PERSON_10 1/6/2014 6/19/2019
PERSON_11 1/6/2014 1/20/2014
;
;
;;
run;
data want;
set have;
i=1;
if put(start, yymmn6.) ne put(end, yymmn6.) then /*what happens if the start is same year month as end?*/
do while (date<end);
date=intnx('month', start, i, 'b');
i+1;
output;
end;
else
output;
format date date9.;
run;
You don't indicate output requirements.
Do you just want to count the number of persons per month?
So something like this as output?
Obs date n_person 115 2019-03-01 5 116 2019-04-01 5 117 2019-05-01 5 118 2019-06-01 5 119 2019-07-01 4 120 2019-08-01 4 121 2019-09-01 4 122 2019-10-01 4 123 2019-11-01 4 124 2019-12-01 4
You can find the range of dates from the data:
proc sql ;
create table dates as
select min( min(start,end) ) as start format=yymmdd10.
, max( max(start,end) ) as end format=yymmdd10.
from have
;
quit;
And then blow that up to one observation per month.
data dates;
set dates;
do offset = 0 to intck('month',start,end);
date=intnx('month',start,offset);
output;
end;
format date yymmdd10.;
run;
Now you can join it with your actual data and count the number of distinct PERSON values per month.
proc sql ;
create table want as
select a.date
, count(distinct b.person) as n_person
from dates a
left join have b
on a.date between b.start and b.end
group by 1
;
quit;
data have;
infile cards dlm='09'x;
input person :$10. start :mmddyy10. end :mmddyy10.;
format start end date9.;
cards;
PERSON_1 1/10/2012 8/17/2020
PERSON_2 10/2/2020 10/30/2020
PERSON_3 9/30/2009 9/30/2021
PERSON_4 10/15/2013 2/19/2014
PERSON_5 2/1/2017 12/31/2021
PERSON_6 2/1/2017 10/3/2021
PERSON_7 6/29/2012 11/30/2018
PERSON_8 7/6/2014 11/15/2016
PERSON_9 7/29/2013 6/14/2017
PERSON_10 1/6/2014 6/19/2019
PERSON_11 1/6/2014 1/20/2014
;
;
;;
run;
data want;
set have;
i=1;
if put(start, yymmn6.) ne put(end, yymmn6.) then /*what happens if the start is same year month as end?*/
do while (date<end);
date=intnx('month', start, i, 'b');
i+1;
output;
end;
else
output;
format date date9.;
run;
You don't indicate output requirements.
I so wish there was a way to accept two solutions as correct. Both of your solutions worked so honestly I flipped a coin (who has coins these days?).
Thank you both very much for your help. Both solutions give me exactly what I need.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.