I have a dataset with a list of members and spans they were enrolled in a certain program. Start2 and End2 are populated if this member was enrolled in another span in addition to the first span, and if it does not overlap.
member | start1 | end1 | start2 | end2 |
123 | 01MAR2020 | 01APR2020 | 01FEB2022 | 30JUN2023 |
456 | 01JAN2019 | 01JAN2020 | ||
789 | 13APR2023 | 15APR2023 |
I want to count the number of days that the member was enrolled for a given year, like below:
member | start1 | end1 | start2 | end2 | days2019 | days2020 | days2021 | days2022 | days2023 |
123 | 01MAR2020 | 01APR2020 | 01FEB2022 | 30JUN2023 | 0 | 31 | 0 | 333 | 180 |
456 | 01JAN2019 | 01JAN2020 | 365 | 1 | 0 | 0 | 0 | ||
789 | 13APR2023 | 15APR2023 | 0 | 0 | 0 | 0 | 3 |
I've tried creating an array with the full list of days that the member was enrolled and matching to the days of a given year, but that seems inefficient. Would appreciate any help!
Please provide examples of data you have in the form of working data step code, in a text box opened on the forum with the </> icon above the message window, that duplicates or at least mimics your data. That will reduce a lot of questions about things like variable names, value types and formats. The text box is important as the message windows will reformat pasted text and that reformatting can result in code that will no longer run correctly.
Your current data is an example of why "wide" data is harder to work with. This gets much simpler if you make a data set that looks more like:
Member PeriodNumber StartDate EndDate
Then the values can be calculated by PERIOD and then combined to get the count of days by Member.
Should be simple with a an array and some DO loops.
data have;
input member $ (start1 end1 start2 end2) (:date.);
format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .
789 13APR2023 15APR2023 . .
;
data want;
set have;
array days[2019:2024] days2019-days2024;
array start start: ;
array end end: ;
do year=2019 to 2024;
days[year]=0;
end;
do period=1 to dim(start) while(not missing(start[period]));
do day=max('01JAN2019'd,start[period]) to min('31DEC2024'd,end[period]);
year=year(day);
days[year]+1;
end;
end;
drop period day year;
run;
Result:
Obs member start1 end1 start2 end2 days2019 days2020 days2021 days2022 days2023 days2024 1 123 01MAR2020 01APR2020 01FEB2022 30JUN2023 0 32 0 334 181 0 2 456 01JAN2019 01JAN2020 . . 365 1 0 0 0 0 3 789 13APR2023 15APR2023 . . 0 0 0 0 3 0
Instead of iterating once per day in each span (per @Tom 's suggestion), this iterates once per span in each year. A bit more complex, but many fewer iterations.
data have;
input member $ (start1 end1 start2 end2) (:date.);
format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .
789 13APR2023 15APR2023 . .
;
data want (drop=_:);
set have;
array days {2019:2023} days2019-days2023 ;
array start{2};
array end{2} ;
do _y=2019 to 2023;
days{_y}=0;
do _s=1 to 2 while (start{_s}^=.); /*Add until a missing span is encountered */
days{_y}=days{_y} + ifn(start{_s}<=mdy(12,31,_y) and end{_s}>=mdy(1,1,_y)
,1+min(end{_s},mdy(12,31,_y))-max(start{_s},mdy(1,1,_y))
,0);
end;
end;
run;
If the start of a span occurs on or before the end of a year, AND the end of a span occurs on or after the first of a year, then calculate the relevant part of that year
data have;
input member $ (start1 end1 start2 end2) (:date.);
format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .
789 13APR2023 15APR2023 . .
;
data temp;
set have;
if not missing(start1) and not missing(end1) then do;
do date=start1 to end1;output;end;
end;
if not missing(start2) and not missing(end2) then do;
do date=start2 to end2;output;end;
end;
run;
proc sql;
create table temp2 as
select member,year(date) as year,count(distinct date) as count
from temp
group by member,calculated year;
create table temp3 as
select a.*,coalesce(count,0) as count
from (select * from (select distinct member from temp2),(select distinct year from temp2)) as a
natural left join temp2;
quit;
proc transpose data=temp3 out=want(drop=_:) prefix=day;
by member;
id year;
var count;
run;
It's with SAS data steps and especially SAS proc's often easier to work with narrow data structures. That's what below code creates.
data have;
infile datalines dsd truncover;
input member (start1 end1 start2 end2) (:date11.) /*days2019 days2020 days2021 days2022 days2023*/ ;
format start1 end1 start2 end2 date11.;
datalines;
123,1-Mar-20,1-Apr-20,1-Feb-22,30-Jun-23,0,31,0,333,180
456,1-Jan-19,1-Jan-20,,,365,1,0,0,0
789,13-Apr-23,15-Apr-23,,,0,0,0,0,3
;
data want;
set have;
array dates{2,2} start1 end1 start2 end2;
format start_dt end_dt from_dt to_dt date11.;
do k=1 to dim1(dates);
if nmiss(dates[k,1],dates[k,2]) then continue;
start_dt =dates[k,1];
end_dt =dates[k,2];
from_dt=start_dt;
do i=0 by 1;
to_dt=min(intnx('year',start_dt,i+1,'b'),end_dt);
year=year(from_dt);
days=to_dt-from_dt;
output;
if to_dt=end_dt then leave;
from_dt=intnx('year',start_dt,i+1,'b');
end;
end;
drop start1 end1 start2 end2 k i;
run;
proc print data=want;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.