BookmarkSubscribeRSS Feed
gabagotati
Calcite | Level 5

I have a dataset with a list of members and spans they were enrolled in a certain program. Start2 and End2 are populated if this member was enrolled in another span in addition to the first span, and if it does not overlap.

 

memberstart1end1start2end2
12301MAR202001APR202001FEB202230JUN2023
45601JAN201901JAN2020  
78913APR202315APR2023  

 

I want to count the number of days that the member was enrolled for a given year, like below:

 

memberstart1end1start2end2days2019days2020days2021days2022days2023
12301MAR202001APR202001FEB202230JUN20230310333180
45601JAN201901JAN2020  3651000
78913APR202315APR2023  00003

 

I've tried creating an array with the full list of days that the member was enrolled and matching to the days of a given year, but that seems inefficient. Would appreciate any help!

5 REPLIES 5
ballardw
Super User

Please provide examples of data you have in the form of working data step code, in a text box opened on the forum with the </> icon above the message window, that duplicates or at least mimics your data. That will reduce a lot of questions about things like variable names, value types and formats. The text box is important as the message windows will reformat pasted text and that reformatting can result in code that will no longer run correctly.

 

Your current data is an example of why "wide" data is harder to work with. This gets much simpler if you make a data set that looks more like:

Member PeriodNumber StartDate EndDate

Then the values can be calculated by PERIOD and then combined to get the count of days by Member.

Tom
Super User Tom
Super User

Should be simple with a an array and some DO loops.

data have;
  input member $ (start1 end1 start2 end2) (:date.);
  format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .  
789 13APR2023 15APR2023 . .
;

data want;
  set have;
  array days[2019:2024] days2019-days2024;
  array start start: ;
  array end   end: ;
  do year=2019 to 2024;
    days[year]=0;
  end;
  do period=1 to dim(start) while(not missing(start[period]));
    do day=max('01JAN2019'd,start[period]) to min('31DEC2024'd,end[period]);
      year=year(day);
      days[year]+1;
    end;
  end;
  drop period day year;
run;

Result:

Obs   member      start1        end1      start2        end2   days2019   days2020   days2021   days2022   days2023   days2024

 1     123     01MAR2020   01APR2020   01FEB2022   30JUN2023        0        32          0         334        181         0
 2     456     01JAN2019   01JAN2020           .           .      365         1          0           0          0         0
 3     789     13APR2023   15APR2023           .           .        0         0          0           0          3         0

 

mkeintz
PROC Star

Instead of iterating once per day in each span (per @Tom 's suggestion), this iterates once per span in each year.  A bit more complex, but many fewer iterations.  

 

 

data have;
  input member $ (start1 end1 start2 end2) (:date.);
  format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .  
789 13APR2023 15APR2023 . .
;

data want (drop=_:);
  set have;
  array days {2019:2023} days2019-days2023 ;
  array start{2};
  array end{2} ;
  do _y=2019 to 2023;
    days{_y}=0;
    do _s=1 to 2 while (start{_s}^=.);    /*Add until a missing span is encountered */
      days{_y}=days{_y} + ifn(start{_s}<=mdy(12,31,_y) and end{_s}>=mdy(1,1,_y)
                             ,1+min(end{_s},mdy(12,31,_y))-max(start{_s},mdy(1,1,_y))
                             ,0);
    end;
  end;
run;

If the start of a span occurs on or before the end of a year, AND the end of a span occurs on or after the first of a year, then calculate the relevant part of that year

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
  input member $ (start1 end1 start2 end2) (:date.);
  format start: end: date9.;
cards;
123 01MAR2020 01APR2020 01FEB2022 30JUN2023
456 01JAN2019 01JAN2020 . .  
789 13APR2023 15APR2023 . .
;
data temp;
set have;
if not missing(start1) and not missing(end1) then do;
 do date=start1 to end1;output;end;
end;
if not missing(start2) and not missing(end2) then do;
 do date=start2 to end2;output;end;
end;
run;
proc sql;
create table temp2 as
select member,year(date) as year,count(distinct date) as count
 from temp
  group by member,calculated year;
create table temp3 as
select a.*,coalesce(count,0) as count
 from (select * from (select distinct member from temp2),(select distinct year from temp2)) as a
  natural left join temp2;
quit;
proc transpose data=temp3 out=want(drop=_:) prefix=day;
by member;
id year;
var count;
run;
Patrick
Opal | Level 21

It's with SAS data steps and especially SAS proc's often easier to work with narrow data structures. That's what below code creates.

data have;
  infile datalines dsd truncover;
  input member (start1 end1 start2 end2) (:date11.) /*days2019 days2020 days2021 days2022 days2023*/ ;
  format start1 end1 start2 end2 date11.;
  datalines;
123,1-Mar-20,1-Apr-20,1-Feb-22,30-Jun-23,0,31,0,333,180
456,1-Jan-19,1-Jan-20,,,365,1,0,0,0
789,13-Apr-23,15-Apr-23,,,0,0,0,0,3
;

data want;
  set have;
  array dates{2,2} start1 end1 start2 end2;

  format start_dt end_dt from_dt to_dt date11.;
  do k=1 to dim1(dates);
    if nmiss(dates[k,1],dates[k,2]) then continue;
    start_dt =dates[k,1];
    end_dt   =dates[k,2];
    from_dt=start_dt;
    do i=0 by 1;
      to_dt=min(intnx('year',start_dt,i+1,'b'),end_dt);
      year=year(from_dt);
      days=to_dt-from_dt; 
      output;
      if to_dt=end_dt then leave;
      from_dt=intnx('year',start_dt,i+1,'b');
    end;
  end;
  drop start1 end1 start2 end2 k i;
run;

proc print data=want;
run;

Patrick_0-1711941724065.png

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 382 views
  • 3 likes
  • 6 in conversation