Dear SAS users,
Do you have suggestions on how I can create different start and stop dates based on noncontinuous data points? For example, for each observation I want to create variables mon_start1, mon_end1, mon_start2, mon_end2, mon_start3, mon_end3, etc. I can create mon_start1 and mon_end1, but cannot figure out how to calculate the remaining variables (mon_start2, mon_end2, mon_start3, mon_end3, etc.)
Here's a sample dataset for months Jan 2015 - Dec 2016 along with the beginning sas code where I created variables mon_start1 and mon_end1. For example, for ID=AA, the start/end variables should be as follows:
mon_start1=1, mon_end1=6,
mon_start2=10, mon_end2=18,
mon_start3=22, mon_end3=24
data have;
input id $ dob mmddyy10. total_mon mon1-mon24;
format dob monyy7.;
datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
;
data have2;
set have;
start=1;
Mon_Start1=(intck('month','01Jan2015'd,dob))+1;
array month mon:;
do m=Mon_Start1-start+1 to dim(month) while(month[m]);
end;
if m>=24 then Mon_end1=total_mon;
else Mon_End1=m-1;
run;
Thanks!
Like any question where you have many columns, it is better to drop the "Excel thinking" and normalise your data, i.e. have data going down in observations. Now the below code, loops over the array and creates a normalised data structure with start/stop times, if the data was normalised in the first place it would be even simpler than this. Remember your output report files can be transposed before output and so don't need to look like the data you work with.
data have; input id $ dob mmddyy10. total_mon mon1-mon24; format dob monyy7.; datalines; AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1 BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . . DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 . ; run; data want (keep=id dob start_month end_month); set have; array mon{24}; length start_month end_month $20; do i=1 to 24; if mon{i}=1 and start_month="" then start_month=vname(mon{i}); else if (mon{i}=. and end_month="" and start_month ne "") then do; end_month=vname(mon{i-1}); output; end_month=""; start_month=""; end; else if i=24 and end_month="" and start_month ne "" then do; end_month=vname(mon{i}); output; end; end; run;
Like any question where you have many columns, it is better to drop the "Excel thinking" and normalise your data, i.e. have data going down in observations. Now the below code, loops over the array and creates a normalised data structure with start/stop times, if the data was normalised in the first place it would be even simpler than this. Remember your output report files can be transposed before output and so don't need to look like the data you work with.
data have; input id $ dob mmddyy10. total_mon mon1-mon24; format dob monyy7.; datalines; AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1 BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . . DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 . ; run; data want (keep=id dob start_month end_month); set have; array mon{24}; length start_month end_month $20; do i=1 to 24; if mon{i}=1 and start_month="" then start_month=vname(mon{i}); else if (mon{i}=. and end_month="" and start_month ne "") then do; end_month=vname(mon{i-1}); output; end_month=""; start_month=""; end; else if i=24 and end_month="" and start_month ne "" then do; end_month=vname(mon{i}); output; end; end; run;
This is a beautiful code and does even more than I had hoped for with my initial question! And yes, you are correct, I need to reframe how I think of my data - i.e. drop the Excel thinking.'
Thank you for putting it together so quickly.
I wouldn't be so quick in deciding that @RW9's code does everything you want to accomplish. I added two example records where the code fails to work.
Of course, I'm not suggesting that my code is perfect either, but it doesn't require restructuring your data:
data have; input id $ dob mmddyy10. total_mon mon1-mon24; format dob monyy7.; datalines; AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1 BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . . DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 . FF 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . 1 GG 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . . ; data want (drop=i start_month period); set have; array month{24} mon1-mon24; array mon_start(3); array mon_end(3); period=1; call missing(start_month); do i=1 to 24; if month{i}=1 then do; if missing(start_month) then do; start_month=i; mon_start{period}=i; end; if i eq 24 then mon_end{period}=i; end; else if missing(month{i}) and missing(mon_end{period}) and not missing(start_month) then do; mon_end{period}=i-1; period+1; call missing(start_month); end; end; run;
Art, CEO, AnalystFinder.com
options missing='x';
data have;
input id $ dob mmddyy10. total_mon mon1-mon24;
format dob monyy7.;
datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
FF 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . 1
GG 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . .
;
data want;
set have;
temp=cats(of mon1-mon24);
pid=prxparse('/1+/');
s=1;e=length(temp);
call prxnext(pid,s,e,temp,p,l);
do while(p>0);
start=p;end=p+l-1; output;
call prxnext(pid,s,e,temp,p,l);
end;
drop temp pid s e p l;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.